Early Review of draft-ietf-rtgwg-segment-routing-ti-lfa-11
review-ietf-rtgwg-segment-routing-ti-lfa-11-opsdir-early-mishra-2023-08-25-00

Versions:
Request	Review of	draft-ietf-rtgwg-segment-routing-ti-lfa-09
	Requested revision	09 (document currently at 13)
	Type	Early Review
	Team	Ops Directorate (opsdir)
	Deadline	2023-04-14
	Requested	2023-03-20
	Requested by	Yingzhen Qu
	Authors	Ahmed Bashandy , Stephane Litkowski , Clarence Filsfils , Pierre Francois , Bruno Decraene , Daniel Voyer
	I-D last updated	2023-08-25
	Completed reviews	Opsdir Early review of -12 by Gyan Mishra (diff) Secdir Last Call review of -13 by Wes Hardaker Rtgdir Last Call review of -13 by Ben Niven-Jenkins Genart Last Call review of -13 by Roni Even Opsdir Early review of -11 by Gyan Mishra (diff) Secdir Early review of -10 by Wes Hardaker (diff) Rtgdir Early review of -09 by Andy Smith (diff) Intdir Early review of -11 by Antoine Fressancourt (diff)
	Comments	The RTGWG chairs would like to request some early directorate reviews to prepare the draft for WGLC, which will start after IETF 116.
Assignment	Reviewer	Gyan Mishra
	State	Completed
	Request	Early review on draft-ietf-rtgwg-segment-routing-ti-lfa by Ops Directorate Overtaken by Events
	Posted at	https://mailarchive.ietf.org/arch/msg/ops-dir/IOr-6_PpOesLm_osd8QDSuGD3Nk
	Reviewed revision	11 (document currently at 13)
	Result	Serious Issues
	Completed	2023-08-25
TI-LFA Draft Version 11 review:
Review Result: Serious Issues

This draft has serious issues and is not ready for publication.  The
interactions between TI-LFA and uLoop avoidance need to be detailed in the
specification; I expand on this topic below.  The draft should also clearly
state what “post convergence” means, and how it extends the IP FRR base
specification (RFC 5286) by replacing T-LDP based RLFA (RFC 7490) with SR
based TI-LFA RLFA.

Below is an overview of the draft and a description of the issues found that
should be addressed, which I hope will improve the draft's readiness for
publication:

The basic high-level concept behind TI-LFA is to provide a post-convergence
RLFA (DFA) that is guaranteed loop-free and stateless, rather than a “stateful”
pre-convergence, “pre-computed”, “pre-programmed” backup path that depends on
the IGP LSDB SPF topology tree, IGP metrics and possibly other constraints.  By
having TI-LFA use a “post convergence” calculation for the RLFA PQ node, we can
provide 100% coverage of all prefixes, compared to T-LDP based RLFA.  SR’s
statelessness makes the backup path “topology, IGP metric & constraints
independent” by using a static SID list to build the path.  In most cases this
is a single node sid, which can leverage ECMP, or a node sid + adjacency sid,
or at most 4 SIDs for the backup path.  TI-LFA works in conjunction with uLoop
(micro-loop) avoidance.  Later in this review I discuss the interaction between
the two, why I believe it is important for the draft to reference that
interaction, and the possibility of combining the two specifications.

A deeper dive into the draft and the details of the TI-LFA specification
follows.

IP FRR (Base LFA) RFC 5286, T-LDP based RLFA and TI-LFA all provide per-prefix
“Local Protection”: link, node and SRLG protection.  This is similar to the
link, node and SRLG protection provided by RSVP-TE FRR (RFC 4090).  RSVP-TE can
also provide path protection, whereas all LFA styles are “Local Protection”
only.  TI-LFA is an extension of LFA in which T-LDP based RLFA is replaced with
Segment Routing based RLFA, using a static sid list from the extended P space
to the RLFA PQ node obtained from a post-convergence calculation.  The draft
should express throughout that TI-LFA is an extension of LFA: the original base
“IP FRR” LFA (RFC 5286) is not changing and continues to exist with its
standards-based pre-programmed backup path.  We are only replacing the T-LDP
based RLFA pre-convergence backup path algorithm for the RLFA PQ node
calculation with the TI-LFA post-convergence backup path RLFA PQ node
calculation.

T-LDP based RLFA is stateful with an MPLS data plane: the backup path
calculation depends on the underlying topology, IGP metrics and constraints.
TI-LFA based RLFA is stateless with an SR data plane (SR-MPLS or SRv6) and is
completely topology independent, using SR mechanics via a static SID list.
AFAIK, TI-LFA uses SR mechanics to build the static sid based backup path,
which makes it local “RIB/FIB independent” and therefore independent of the IGP
LSDB link state topology, metrics & constraints.  Is that true?  If so, that
should be added to provide clarity as to what is meant by “topology
independent”.

IP FRR (LFA-Base) comes into play when you have multiple uplinks and downlinks
with criss-cross links between nodes (more redundancy).  RLFA (TI-LFA RLFA /
T-LDP RLFA) comes into play with loop or ring topologies where you have a
single uplink and a single downlink and less redundancy, which is a very common
situation with Service Providers' physical fiber plant infrastructure.  Thus
TI-LFA + uLoop play an important role in link, node & srlg protection to
provide sub 50ms convergence.  Capacity planning always has to be taken into
account with link & node failures, when traffic reroutes across a failover path
that may not be sized appropriately.  Just as with IP FRR Base RFC 5286 and
T-LDP RLFA RFC 7490, the goal of TI-LFA is to provide link, node and srlg
protection and sub 50ms convergence to a failover backup path.  So if capacity
issues exist before turning on any LFA flavor, those same capacity issues would
exist with any flavor of LFA, including TI-LFA.  The advantage related to
distribution of traffic over ECMP paths using a single node sid exists with
T-LDP RLFA as well, so the use of ECMP with TI-LFA is no different than with
T-LDP RLFA.

IP FRR LFA Base RFC 5286 and T-LDP RLFA RFC 7490 require tiebreaking rules to
decide which LFA style (LFA or RLFA) is prioritized, which is no longer
required with TI-LFA.  That provides much simplicity and optimization for
TI-LFA.  TI-LFA also has a configurable tiebreaking knob for
implementation-specific protection schemes (link, node, srlg), where link is
the default protection and link, node and srlg can be prioritized to prefer one
protection mechanism over another.

An implementation consideration which should be discussed in detail in the
draft: if all nodes have TI-LFA enabled, which is typical in a service provider
network, and a protected SID is used in the repair path, TI-LFA can itself be
triggered on the protected SIDs.  The draft should discuss the complexity and
the pros and cons of using protected versus unprotected sids for the repair
path, and any issues that could arise with link congestion when nested TI-LFAs
are triggered.  It should also cover any limitations on having many TI-LFAs and
multi-layer nested TI-LFAs due to protected SIDs along a path with SR policy.
It may be an implementation and deployment consideration to not use, or maybe
not allow, protected SIDs for the repair path.  Another possibility is using a
stateful PCE / SDN controller to instantiate the triggered TI-LFAs and deeply
nested TI-LFAs if they occur, so that the controller could manage the bandwidth
and constraints along the repair paths.

Applicability of RFC 7916 to TI-LFA.  This should be discussed in detail, and I
think it even warrants a separate section on applicability.  RFC 7916 is
mentioned, but its applicability is not described in detail.  SR Policy can
have link coloring similar to the RSVP-TE IGP extensions and, AFAIK, should be
able to take advantage of link coloring to control the choice of TI-LFA paths
for the RLFA PQ node calculation, including or excluding links based on link
colors (administrative groups) to help build and optimize the repair path.  A
PCE could also gather link speeds for bandwidth-based TI-LFA to avoid congested
links as well as congestion caused by FRR activation.  The “no transit”
condition on the LFA computing node described in RFC 7916 is applicable to
TI-LFA and should be included in the operational considerations section, using
the ISIS overload bit or OSPF R bit.

What I describe in the above paragraphs is, AFAIK, how TI-LFA works, and I
believe it should be discussed in the abstract, introduction and throughout the
document.

Major issues:
TI-LFA by itself is optimized to use ECMP and the least number of SIDs for the
IGP ECMP path.  The problem is that during convergence uLoops exist in the
diagram below on R1, R4, R5 until the nodes are converged.  With SR uLoop
avoidance, a timer is set at time T and the TI-LFA backup path is replaced with
a uLoop avoidance path by installing a static sid list that statically routes
traffic across the nodes that are not converged, avoiding the local FIB
programming on R1, R4, R5 until they are completely converged.  After that, the
path can revert to the optimized loose prefix sid TI-LFA post convergence path
and stateless flows can be built on the post convergence LSP path.  So with
uLoop avoidance a temporary static sid list strict path is created along the
path from the extended P space to the RLFA PQ node, and a timer is set which
pops in T seconds once all nodes are converged; the static strict sid list is
then removed, reverting back to the original TI-LFA post convergence backup
path.

If you do not have a uLoop static sid list tunnel to statically tunnel across
all the non-converged nodes (in this case R1, R4, R5, which have microloops),
the result would be an outage and black hole of traffic until all P & Q space
nodes are converged.  This is a problem as I see it with TI-LFA working
independently without SR uLoop avoidance.

R1, R4, R5 –Extended P Space
R4, R5, R6 – Q Space
R4,R5 – PQ Space

In this example the link between R2-R3 has link failure

CE –R1- R2 – R3-CE
    |         |
    R4 – R5 -R6

AFAIK, TI-LFA cannot work without uLoop avoidance, as you need a static sid
list tunnel across the entire path.  TI-LFA plus uLoop avoidance would be a
better comparison to the RLFA T-LDP tunnel and/or RSVP-TE tunnel link and node
protection.  In the RLFA T-LDP case we create a tunnel from the extended P
space node to the RLFA calculated PQ space node, and with RSVP-TE we create a
bypass tunnel from the PLR node to the merge point.  Both RLFA T-LDP & RSVP-TE
FRR use a tunnel with additional labels to tunnel across the intermediate
nodes, and thus do not use IGP FIB entries for forwarding, as traffic is
tunneled from the PLR to the merge point, bypassing S-F.  With TI-LFA, as the
SID list is optimized to a single prefix-sid or prefix-sid + adj-sid, the
intermediate nodes are ECMP forwarding using IGP FIB / LFIB entries, so the
uLoop and subsequent outage occur until all intermediate nodes along the path
to the destination are converged.  That is the main reason why SR uLoop
avoidance is so critical and why TI-LFA cannot be used without uLoop avoidance.

After reading through this document many times, I think the uLoop draft should
be merged with TI-LFA, as the two are so closely coupled that they should be
integrated into the TI-LFA solution.  The other alternative is that, when the
TI-LFA RLFA PQ node is calculated, the static sid list is built with adj-sids
only from the PLR node to the TI-LFA calculated PQ node.  That would eliminate
the need for an extra step with the separate uLoop avoidance specification, as
well as the operator complexity of having to configure both.  The issue with
that is MSD and having an optimized SID list.  However, a PCE/SDN controller
along with IGP MSD signaling could be used to signal MSD and manage the SID
list platform limitations, just as is done today with SR policy on the head
end; MSD limitations could be handled by the PCE as well.

Minor issues:
Better clarity is needed on the post convergence backup path and on what is
meant by topology independent, per my description above.  Better clarity is
also needed that TI-LFA is an extension of the base specification IP FRR RFC
5286.  Mention in the draft that, even though TI-LFA uses a post convergence
backup path, as TI-LFA is an extension of LFA the semantics of a pre-programmed
backup path exist at time T1 once configured, and then when a failure occurs at
time T2 the backup path is updated with the post convergence backup path
information.  This may be implementation specific, but I think it should be
included in the specification.  I recommend removing any marketing language
and/or subjective language and keeping to the details of the specification.
Below I have given rewrites of the language to make the document clearer.

Section 1 & 3 I recommend should be combined

Abstract
I tried to make this section clearer with my rewrite.
Why are we saying between two networks?
Is it talking about the source and destination that the TI-LFA coverage applies
to?  I think of coverage as one of the improvements from LFA to TI-LFA: it
provides 100% coverage for all prefixes where base IP LFA does not.  I think
the abstract is lengthy and could be made briefer by moving the last sentence
to the introduction.  I have rewritten the abstract including the last
sentence, which should be moved to the introduction.

Old

This document presents Topology Independent Loop-free Alternate Fast Re-route
(TI-LFA), aimed at providing protection of node and adjacency segments within
the Segment Routing (SR) framework.  This Fast Re-route (FRR) behavior
builds on proven IP-FRR concepts being LFAs, remote LFAs (RLFA), and remote
LFAs with directed forwarding (DLFA).  It extends these concepts to provide
guaranteed coverage in any two connected networks using a link-state IGP. A key
aspect of TI-LFA is the FRR path selection approach establishing protection
over the expected post-convergence paths from the point of local repair,
reducing the operational need to control the tie-breaks among various FRR
options.

New

This document defines Topology Independent Loop-free Alternate Fast Re-route
(TI-LFA), aimed at providing link, Node and SRLG protection using prefix and
adjacency segments within the Segment Routing (SR) framework.  TI-LFA Fast
Re-route (FRR) behavior is an extension to the base IP-FRR framework using
LFAs, remote LFAs (RLFA), and remote LFAs with directed forwarding (DLFA).  It
extends these LFA concepts by now providing 100% full coverage to all prefixes.
 A key aspect of TI-LFA extension to base IP FRR (LFA) is that now a tiebreaker
is not required for LFA and RLFA.

Introduction

Old

By relying on SR this document provides a local repair mechanism for standard
link-state IGP shortest path capable of restoring end-to-end connectivity in
the case of a sudden directly connected failure of a network component.  Non-SR
mechanisms for local repair are beyond the scope of this document. Non-local
failures are addressed in a separate document
[I-D.bashandy-rtgwg-segment-routing-uloop].

The term topology independent (TI) refers to the ability to provide a loop free
backup path irrespective of the topologies used in the network. This provides a
major improvement compared to LFA [RFC5286] and remote LFA [RFC7490] which
cannot provide a complete protection coverage in some topologies as described
in [RFC6571]. For each destination in the network, TI-LFA pre-installs a
backup forwarding entry for each protected destination ready to be activated
upon detection of the failure of a link used to reach the destination.

New

Using SR this document provides a local repair mechanism for standard SPF path
calculation capable of restoring end-to-end connectivity in the case of a
sudden directly connected failure of a network component.  Non-SR mechanisms
for local repair are beyond the scope of this document. Micro Loop avoidance is
a critical component of TI-LFA post convergence by providing a temporary SR
policy across intermediate P and Q space nodes that have not converged
[I-D.bashandy-rtgwg-segment-routing-uloop].

The term topology independent (TI) refers to the ability to provide a loop free
backup path that is independent of underlying link state database IGP metric &
constraints which is now based on segment routing policy.  This solution is an
extension of the base IP FRR (LFA) [RFC5286] and replaces T-LDP based remote
LFA [RFC7490] which cannot provide a complete protection coverage in some
topologies as described in [RFC6571]. For each destination in the network,
TI-LFA pre-installs a backup forwarding entry for each protected destination
ready to be activated after a failure and updated based on current SPF for post
computation backup path upon detection of the failure of a link, node or srlg
used to reach the destination.

**The next few paragraphs are rewritten to make clear that TI-LFA is an
extension to base IP FRR and that T-LDP based RLFA is replaced with TI-LFA
based RLFA**

Old
By using SR, TI-LFA does not require the establishment of TLDP sessions
(Targeted Label Distribution Protocol) with remote nodes in order to take
advantage of the applicability of remote LFAs (RLFA) [RFC7490][RFC7916] or
remote LFAs with directed forwarding (DLFA)[RFC5714]. All the Segment
Identifiers (SIDs) are available in the link state database (LSDB) of the IGP.
As a result, preferring LFAs over RLFAs or DLFAs, as well as minimizing the
number of RLFA or DLFA repair nodes is not required anymore. By using SR,
there is no need to create state in the network in order to enforce an explicit
FRR path. This relieves the nodes themselves from having to maintain extra
state, and it relieves the operator from having to deploy an extra protocol or
extra protocol sessions just to enhance the protection coverage.

New
TI-LFA replaces RLFA provided by the establishment of TLDP sessions (Targeted
Label Distribution Protocol) with remote nodes in order to take advantage of
the applicability of remote LFAs (RLFA) [RFC7490][RFC7916] or remote LFAs with
directed forwarding (DLFA)[RFC5714] with SR based RLFA.   All the Segment
Identifiers (SIDs) are available in the link state database (LSDB) of the IGP.
Thus, tiebreaking of LFAs over RLFAs or DLFAs, as well as minimizing the number
of RLFA or DLFA repair nodes is no longer required as SR framework advertises
the SIDs via IGP extension. **The sentence below promotes the technology
and is well known to anyone familiar with SR reading this document, so I think
it can be excluded** By using SR, there is no need to create state in the network
in order to enforce an explicit FRR path. This relieves the nodes themselves
from having to maintain extra state, and it relieves the operator from having
to deploy an extra protocol or extra protocol sessions just to enhance the
protection coverage.

Section Terminology:

*Is there any change in the definition of P space, Extended P space & Q space
from RFC 7490?  If so, that should be specified.

Similar to [RFC7490], we use the concept of P-Space and Q-Space for TI-LFA.

*R,X: please define R, which is the source node, and state what R is within the
topology (R = PLR).  Also, the resource X is the link, node or srlg; this is
not completely clear, so I think it should be explicitly spelled out.

The P-space P(R,X) of a router R with regard to a resource X (e.g. a link S-F,
a node F, or a SRLG) is the set of routers reachable from R using the
pre-convergence shortest paths without any of those paths (including equal-cost
path splits) transiting through X.

*This is the union of the P spaces of the neighbors of R, but how is that the
“reduced set of neighbors”?  The end of the sentence is confusing.

Consider the set of neighbors of a router R and a resource X. Exclude from that
set of neighbors that are reachable from R using X. The Extended P-Space
P'(R,X) of a node R with regard to a resource X is the union of the P-spaces of
the neighbors in that reduced set of neighbors with regard to the resource X.

*Should this be Q(D,X), where D is the destination and Q is the set of routers
reachable from D?

The Q-space Q(R,X) of a router R with regard to a resource X is the set of
routers from which R can be reached without any path (including equal-cost path
splits) transiting through X.
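
As a reading aid for the three quoted definitions, here is a minimal Python
sketch of how P-space, extended P-space and Q-space could be computed for a
protected link.  Everything in it is an assumption made for illustration: the
five-node ring (S, A, B, C, F), the unit metrics, the function names, and the
restriction to a symmetric, connected network where X is a single link.  It is
not the draft's algorithm and not taken from any implementation.

```python
import heapq

def shortest_dists(graph, src):
    """Dijkstra: cost of the shortest path from src to every reachable node."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def p_space(graph, r, link):
    """P(R,X) for a link X=(a,b): nodes R reaches with no shortest path
    (ECMP splits included) crossing X in either direction."""
    a, b = link
    w = graph[a][b]
    from_r, from_a, from_b = (shortest_dists(graph, n) for n in (r, a, b))

    def crosses_link(y):
        return (from_r[a] + w + from_b[y] == from_r[y]
                or from_r[b] + w + from_a[y] == from_r[y])

    return {y for y in graph if not crosses_link(y)}

def extended_p_space(graph, r, link):
    """P'(R,X): union of the P-spaces of R's neighbors, after dropping the
    neighbors that R reaches through X."""
    reduced = [n for n in graph[r] if n in p_space(graph, r, link)]
    result = set()
    for n in reduced:
        result |= p_space(graph, n, link)
    return result

def q_space(graph, d, link):
    """Q(D,X).  Assumes symmetric metrics, so 'nodes that reach D without X'
    equals 'nodes D reaches without X'."""
    return p_space(graph, d, link)

# Hypothetical ring S - F - C - B - A - S, all metrics 1; protected link is S-F.
ring = {
    "S": {"F": 1, "A": 1},
    "F": {"S": 1, "C": 1},
    "C": {"F": 1, "B": 1},
    "B": {"C": 1, "A": 1},
    "A": {"B": 1, "S": 1},
}
link = ("S", "F")
p = p_space(ring, "S", link)
ext_p = extended_p_space(ring, "S", link)
q = q_space(ring, "F", link)
print(sorted(p))          # ['A', 'B', 'S']
print(sorted(ext_p))      # ['A', 'B', 'C', 'S']
print(sorted(q))          # ['B', 'C', 'F']
print(sorted(ext_p & q))  # ['B', 'C']  candidate PQ nodes
```

In this toy ring the plain P-space gives a single PQ candidate (B), while the
extended P-space adds C.  The sets depend entirely on the assumed metrics.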

*What does EP mean – “explicit path”, the path from the P node to the Q node?
Please explicitly define EP.

*AFAIK, would the explicit path not be from the PLR, which is R, to the
RLFA PQ node calculated by SPF?

EP(P, Q) is an explicit SR-based path from a node P to a node Q.

*Should we mention that asymmetric is where the metric is not the same on both
ends of a link?

A symmetric network is a network such that the IGP metric of each link is the
same in both directions of the link.

Section 5

Should the section 5.2 Q space computation be all the nodes reachable by
Destination D w/o using resource X?

Should there be a section on the TI-LFA calculation to find the intersection of
the extended P space & Q space, which would be the PQ space RLFA node
calculated?

Section 5.1 defines the extended P space set of nodes – how is that any
different than the extended P space in RFC 7490?  (This is just a set of nodes
that are part of the P space.)

Section 5.2 defines the extended Q space set of nodes – how is that any
different than the extended P space in RFC 7490?  (This is just a set of nodes
that are part of the Q space.)

So would the static SID list be built from the PLR to the calculated RLFA PQ
space node, traversing intermediate nodes within the section 5.1 extended P
space and 5.2 extended Q space?

Section 5.3  Scaling considerations
With RFC 7490 RLFA we are just building a T-LDP tunnel from the PLR to the RLFA
calculated PQ space node, which is used for each prefix – so it's an RLFA
tunnel per prefix.  With TI-LFA we have a static sid list that is built per
prefix from the PLR to the RLFA calculated PQ space node, so that seems like a
lot of heavy lifting building that many steered-path static LSPs.

Is it computing the RLFA PQ space per destination and not the Q space per
destination?  We need the intersection of the P space & Q space – which is the
RLFA node in the calculation – to build the SID list from the PLR to the
destination node.

Section 6 – I think this should be renamed TI-LFA node protection
S and D should be defined as Source & Destination
R = PLR = S: we should try to keep the terminology and naming conventions for
nodes consistent throughout the document.

*I would rewrite the sentence below, since “header” is SRv6 specific (the SRH,
or the SRv6 compression C-SID uSID or G-SID carrier), whereas SR-MPLS uses a
label stack with no header.

Old
The TI-LFA repair path (RP) consists of an outgoing interface and a list of
segments (repair list (RL)) to insert on the SR header in accordance with the
dataplane used.

New
The TI-LFA repair path (RP) consists of an outgoing interface and a list of
segments (segment list (SL)), applied for SRv6 using the T.Insert behavior or,
depending on hardware capabilities, T.encap, and for SR-MPLS as a SID list
represented by a label stack.

s/repair list/sid list/: replace “repair list” with “sid list” everywhere in
the document

I understand the reason to use RL (repair list) is that it matches the RP
(repair path) nomenclature; however, you could call the repair path “bypass
loop”, which is a common term used for local repair.  “Repair list” refers to
the sid list, but says “list” without saying explicitly of what, whereas the
sid list is really what you are after, so I think “sid list” is a more
appropriate term than “repair list”.  SL (SID list) refers to the active path
but can also refer to the backup path, as SL simply refers to the SID list.
This can be added to the terminology section.

*The sentence below is not entirely true, as it depends on the implementation;
also, most implementations, to avoid scalability issues with MSD, will prefer
to use a mix of node-sid & adj-sid and not use all adj-sids.  When building the
static SID list from the PLR node to the TI-LFA calculated RLFA PQ node, if
adj-sids are not used for the entire path, then the intermediate nodes will
rely on ECMP / IGP programmed FIB entries, resulting in “microloops” while P
and Q space nodes are not converged.  This is one of the major issues I see in
the specification: uLoop avoidance must be included as a dependency for this
specification.

“The repair list encodes the explicit post-convergence path to the destination,
which avoids the protected resource X and, at the same time, is guaranteed to
be loop-free irrespective of the state of FIBs along the nodes belonging to the
explicit path.”  not true unless you use all adj-sid which is not scalable and
the only solution is uLoop

Old
As an example, in Figure 1, we are interested by the TI-LFA backup from S to D
considering the failure of node N1.

New
As an example, in Figure 1, we are interested in the TI-LFA backup from S to D
considering the failure of node N1.

Section 6.1
Would this be IP FRR base RFC 5286?

Section 6.2 FRR path using a PQ node
In section 6, was that not describing the PQ node scenario?
I would call the remote node something different than R, since R is reserved
for the PLR / S node.  Maybe call it Y = remote node.

This is comparable to a post-convergence RLFA repair tunnel.

**This goes back to a point I made throughout this review: TI-LFA is an
extension of base IP FRR (LFA) RFC 5286, and T-LDP RLFA is replaced by TI-LFA
RLFA, so TI-LFA is an “RLFA”; it's just an RLFA that uses SR.  I would not say
comparable…

I would say ..

This is a post-convergence RLFA repair tunnel.

Section 6.3 FRR path with P node and Q node adjacent
Here also I would not say “comparable”
This is comparable to a post-convergence DLFA (LFA with directed forwarding)
repair tunnel.

This is a post-convergence DLFA (LFA with directed forwarding) repair tunnel.

Section 6.4 Connecting distant P & Q nodes
I think you should say the P and Q spaces are not adjacent, since the P space
represents all P nodes and the Q space represents all Q nodes, but within those
2 spaces there are no intersecting common nodes.  So here we are saying there
are no intersecting P and Q space nodes, and so they are not adjacent.  For
RLFA, my understanding is that you should have an intersection of the P & Q
spaces, which is the PQ space, and that is what is calculated as your RLFA PQ
node to build the tunnel or static sid list.  If the P & Q spaces are not
adjacent to each other, which I guess is possible, then as mentioned it is
still possible to create a static sid list from P to Q.

Section 7. Building TI LFA repair list
So the crux of this section is explaining the procedure for inserting the TI
LFA repair path sid list (repair list) into an existing SID list, i.e. into
the SR-MPLS label stack or SRv6 SRH header; once you exit the bypass loop
(repair path) you are back on the original SR policy path.  I think the
procedures may vary, since MPLS processing has PUSH, NEXT, CONTINUE operations,
whereas with the SRv6 SRH we move the SL pointer (SL=SL-1) as we process the
SIDs.  Also, for C-SID, with the Next SID (uSID) flavor or Replace SID (G-SID)
flavor, the operation happens within the uSID carrier.  As this process of
building the repair list is different for SR-MPLS label switching and for SRv6
Programming endpoint behavior & SRH SID list processing, my suggestion is to
take section 7 and add it to the section 8 data plane sections.  In doing so I
think it would also make sense to separate SR-MPLS & SRv6 into separate
sections.  The sections show the scenario of node sid and adjacency sid, but I
think they should cover prefix-sid as well.  A node sid is a type of prefix
sid, but the prefix sid would be an intermediate steering hop, whereas the
final destination would be the node sid of the egress PE FEC Loopback0.  So I
think the prefix-sid scenario should be added, as it is more pertinent than
node sid, since TI-LFA link, node, srlg protection could occur on any transit
node along the path.  I think you could replace the node sid with prefix sid.

Section 7.1 Active segment is a node segment
Wherever “SR header” is mentioned, for better readability and accuracy we
should replace “header” with “sid list”, as sid list is applicable to both
SR-MPLS & SRv6. “The active segment MUST be kept on the SR header unchanged and
the repair list MUST be added.” Also maybe mention that the reason why the
active segment must be kept in the header (segment list) unchanged is that once
the TI-LFA segment list is popped, the original active segment can be processed
and the SR policy steering can continue where it left off.  I am not following
why the active segment should be the first segment of the repair list.  This
question applies to the other sections as well.  My thought process on TI-LFA
is as follows.  Let's say the head end node has an SR Policy with a single sid
to steer the traffic, and the active segment is a node sid; in this case the
node sid is for the egress PE FEC loopback0, so an ECMP path to the final
destination.  However, the S-F link is down, so now we should be trying to
steer to the node sid along the repair path bypass loop.  So the PLR source
router does a CONTINUE swap operation on the active sid and the packet is
forwarded to the next router, which processes the 1st segment in the repair
list for the repair path, which takes you to the merge point at the far end of
the bypass loop.  Once you have completed processing all the sids in the repair
path, the repair list is empty.  What happens next, and how do you forward to
the egress PE using the original SR policy, which had the active prefix-sid?

In this example the link between R2-R3 has link failure

CE –R1- R2-   –R3-CE
     |         |
      R4 – R5 -R6

R1 is the PLR source node
So R1-R4-R5-R6-R3 is the bypass loop that the repair path takes using the
repair list of sids.  R1 is the source node and has an SR policy with label
16001 for the single node sid in the SR policy, which is bound to prefix
3.3.3.3 on node R3.  The link R1-R2 goes down.  R1 has its pre-programmed
backup path already configured with TI-LFA, and when R1-R2 goes down it
calculates the post convergence backup path to the PQ node and installs in the
FIB node sid 16002, the label binding to R6 6.6.6.6.  We then perform a
CONTINUE label swap across R4 and R5, and when we arrive at the R6 merge point
we are back on the SR Policy pre-failure path.  However, now we don’t have the
original SR Policy node sid 16001 to forward the traffic to R3.  My thought on
how this should work is that if the SR Policy has a node sid and a TI-LFA
failure occurs, the repair path would be processed and then popped, and then
the active node sid would be processed afterwards.  So in this case the repair
list during the S-F failure would be 16002 followed by 16001 in the label
stack.  16002 would take you along the bypass loop repair path to R6, and then
16001 CONTINUE would be processed and take you to the egress PE R3.
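
The ordering suggested above can be sketched as a toy label-stack walk in
Python.  This is only an illustration of the reviewer's example, not the
draft's procedure: the labels 16001 (R3) and 16002 (R6) come from the text
above, while the function names, the fixed hop list R4-R5-R6-R3 (no FIB lookup
is modelled) and the absence of penultimate hop popping are assumptions.

```python
# Node SID ownership as stated in the example above: 16002 -> R6, 16001 -> R3.
SID_OWNER = {16002: "R6", 16001: "R3"}

def plr_push_repair(stack, repair_list):
    """PLR keeps the active segment and places the repair list on top of it.
    Index 0 is the top of the label stack."""
    return list(repair_list) + list(stack)

def forward(stack, path):
    """Walk the packet along a fixed hop list (an assumption of this sketch).
    A node pops (NEXT) every top label it owns; otherwise the stack is left
    unchanged (CONTINUE).  Returns the per-hop trace and the final stack."""
    trace = []
    stack = list(stack)
    for node in path:
        trace.append((node, tuple(stack)))  # stack as seen on arrival
        while stack and SID_OWNER.get(stack[0]) == node:
            stack.pop(0)
    return trace, stack

repair_path = ["R4", "R5", "R6", "R3"]

# Repair list on top, original active node sid kept underneath.
kept = plr_push_repair([16001], [16002])
trace_kept, left_kept = forward(kept, repair_path)

# Repair list only: the active segment is lost once R6 pops 16002.
trace_lost, left_lost = forward([16002], repair_path)

for hop in trace_kept:
    print(hop)
```

With the active segment kept, the packet still carries 16001 when it leaves R6,
so it can be steered on to R3.  With the repair list alone, it arrives at R3
with an empty stack, which is the concern raised in the paragraph above.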

Section 7.2 Active segment is the adjacency segment
I am trying to understand this paragraph:
The simplest approach for link protection of an adjacency segment S-F is to
create a repair list that will carry the traffic to F. To do so, one or more
“PUSH” operations are performed. If the repair list, while avoiding S-F,
terminates on F, S only pushes segments of the repair list. Otherwise, S pushes
a node segment of F, followed by the segments of the repair list. For details
on the "NEXT" and "PUSH" operations, refer to [RFC8402].

“If the repair list, while avoiding S-F, terminates on F, S only pushes
segments of the repair list.”  This makes sense: it pushes the repair list,
which in this use case is all adj-sids – correct?  “Otherwise, S pushes a node
segment of F, followed by the segments of the repair list.”  So it is saying S
only pushes the repair list, which is all adj-sids; otherwise it pushes the
node sid followed by the repair list, which is all adj-sids.  This does not
make sense, as it runs into the same issue that I described in section 7.1:
once the repair path sid list is processed, how do you get to the egress PE
final destination, which is what the node sid / adj-sid points to?  So I would
think that S would push the repair list of adj-sids followed by the node sid to
F.  I think what I may be missing, and what is unclear, is that when the repair
path sid list is generated it should be just the sid list needed to get to the
pre-failure SR policy path “merge point”, so once we get there the SR policy
should take over.  So I think what is happening is that we are describing just
how to get to F, the far-end node of the S-F link, for the TI-LFA, and not the
final destination.  Once the repair path sid list is processed and we are back
on the pre-failure path, the remaining sid list in the label stack is then
processed.  When an SR policy is in place, how single or multiple TI-LFAs
occur, where we have multiple switchovers, really needs to be addressed, in the
context of the SR policy and how the end-to-end steering works: once the TI-LFA
repair path sid lists are processed and we are back on the pre-failure path at
router F of the S-F link, how is the rest of the sid list (label stack or SRH
sid list) processed?  A picture of the SR-MPLS label stack, the SRv6 SRH header
and the SRv6 C-SID Next SID (uSID) or Replace SID (G-SID) would be very helpful
in understanding the overall sid list processing along the repair path.

Old
The simplest approach for link protection of an adjacency segment S-F is to
create a repair list that will carry the traffic to F. To do so, one or more
“PUSH” operations are performed. If the repair list, while avoiding S-F,
terminates on F, S only pushes segments of the repair list. Otherwise, S pushes
a node segment of F, followed by the segments of the repair list. For details
on the "NEXT" and "PUSH" operations, refer to [RFC8402].

New
 The simplest approach for link protection of an adjacency segment S-F is to
 create a repair list that will carry the traffic to F. To do so, one or more
 “PUSH” operations are performed. If the repair list, while avoiding S-F,
 terminates on F, S only pushes segments of the repair list, which in this use
 case are only Adj-SIDs. Otherwise, as described in Section 7.1, S pushes a node
 segment of F, followed by the segments of the repair list. For details on the
 "NEXT" and "PUSH" operations, refer to [RFC8402].

It is not clear why the node SID has to be pushed before the repair path.

Section 7.2.1  Protecting [Adj, Adj] segment list
Whether to use protected or unprotected SIDs for the repair path is an
important consideration.  I think the draft should address why protected SIDs
would be used, along with the pros and cons of protected versus unprotected
SIDs.  Use of protected SIDs could result in complex failure scenarios that go
many layers deep in TI-LFA, which can get overly complicated. I don’t think the
description accurately covers the protected SID scenario: it seems to describe
the TI-LFA activation path from S-F, but not the case where a protected SID
fails and triggers a nested TI-LFA activation on that SID for a new bypass loop
repair path.

Section 7.2.2 Protecting [Adj, Node] segment list
Whether to use protected or unprotected SIDs for the repair path is an
important consideration.  I think the draft should address why protected SIDs
would be used, along with the pros and cons of protected versus unprotected
SIDs.  Use of protected SIDs could result in complex failure scenarios that go
many layers deep in TI-LFA, which can get overly complicated. I don’t think the
description accurately covers the protected SID scenario: it seems to describe
the TI-LFA activation path from S-F, but not the case where a protected SID
fails and triggers a nested TI-LFA activation on that SID for a new bypass loop
repair path.

Section 8.1 MPLS data plane considerations
I recommend combining the parts of Section 7 & Section 8 related to the SR-MPLS
data plane into a new SR-MPLS data plane section.

1.      Section
How is the active segment signaled with PHP, i.e. implicit null value 3?  The
egress PE node must signal PHP per RFC 3032 to the PHP node, and the PHP node
then performs the POP operation.  I think we are talking about the S node of
S-F, the PLR, so what the text says is that if the active segment is signaled
with PHP and the repair list ends with an Adj-SID, then on the PLR source node
S the active segment must be popped before pushing the repair list. TI-LFA
activation could happen on any transit node along the path, and there could
theoretically be many TI-LFA and even nested TI-LFA activations occurring
simultaneously.  So why does PHP come into play here? AFAIK PHP implicit null
signaling should only come into play at the PHP node, which has been signaled
by the egress PE with implicit null value 3 to pop the topmost label. What if
TI-LFA activation happened on the PHP node?

2.      Section
Only one condition is listed. What other conditions are we referring to, other
than the active segment on the source node S being signaled with PHP? Please
specify them. As I read it, depending on those other conditions the active
segment is popped on the source node S and then pushed again with a label from
the SRGB of Q, where Q is the endpoint of the repair list.  Is this the RLFA Q
node, or the PQ node at the intersection of the P and Q spaces?  If it is the
PQ node, then we should say so explicitly and state which use case this is;
here I am guessing the PQ node.  If it differs per use case, then we should
specify it for each use case from Section 6: direct, PQ node, P and Q that are
adjacent, and distant P & Q nodes.
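
One possible reading of the two rules discussed above can be sketched as
follows. This is an illustration under stated assumptions, not the draft's
normative behavior: the function name, the label values, and the SRGB bases are
hypothetical, and the re-pushed label is taken to be the SRGB base of Q plus
the SID index, as is usual for SR-MPLS prefix SIDs.

```python
def rewrite_at_plr(stack, repair_list, php_and_adj_condition,
                   sid_index, srgb_base_q):
    """Return the label stack sent by the PLR when the top label is a node SID.

    stack                 -- incoming label stack, active segment first
    repair_list           -- labels of the repair path, outermost first
    php_and_adj_condition -- True when the active segment was signaled with PHP
                             and the repair list ends with an Adj-SID
                             (the condition of item 1 as read in this review)
    sid_index             -- SID index of the active node segment
    srgb_base_q           -- first label of Q's SRGB (assumed value)
    """
    rest = list(stack[1:])  # the active segment is popped in both cases
    if php_and_adj_condition:
        # Item 1: pop the active segment, then push only the repair list.
        return list(repair_list) + rest
    # Item 2: pop, then push the segment again using Q's SRGB, then the
    # repair list on top of it.
    return list(repair_list) + [srgb_base_q + sid_index] + rest


# Hypothetical values: active node SID index 3, one more label below it.
incoming = [16003, 24000]
print(rewrite_at_plr(incoming, [24010], True, 3, 20000))   # [24010, 24000]
print(rewrite_at_plr(incoming, [24010], False, 3, 20000))  # [24010, 20003, 24000]
```

Spelling the rules out this way makes it visible that the text needs to state
which node Q is in each of the Section 6 cases, since the re-pushed label
depends on Q's SRGB.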

Section 8.2 SRv6 data plane considerations
I recommend combining the parts of Section 7 & Section 8 related to the SRv6
data plane into a new SRv6 data plane section. This section does shed some
light on the data plane dependency: with SR-MPLS you need a node SID followed
by an Adj-SID, but with SRv6 you can use just the Adj-SID, since with SRv6 both
the Adj-SID and the node SID (locator) are advertised in the IGP.  With SR-MPLS
both the Adj-SID and the node SID are also advertised in the IGP, and the SRGB
can be used for both, so that they come out of the same label block on all
nodes and are globally advertised.  Since the Adj-SID is advertised by the IGP
it is dynamically learned; however, for persistence across reboots operators
generally configure a static manual Adj-SID, whereas the prefix / node SID is
always static and globally advertised.

I think the points below should be noted, as they are a critical part of the
TI-LFA implementation:

Full SID: T.insert and T.encap reduced. T.insert is the recommended
implementation for the repair list if the hardware supports it; otherwise
T.encap reduced.

SRv6 compression CSID Next SID (uSID): T.insert and T.encap reduced. T.insert
is the recommended implementation for the repair list if the hardware supports
it; otherwise T.encap reduced.

SRv6 compression CSID Replace SID (G-SID): T.insert and T.encap reduced.
T.insert is the recommended implementation for the repair list if the hardware
supports it; otherwise T.encap reduced.

If there are any special cases where T.encap should be used instead of
T.insert, they should be noted.

If there are any special considerations for TI-LFA with the PSP and USP
endpoint operations, they should be noted.

Section 9 TI-LFA and SR Algorithms (Flex Algo)
Since we are talking about TI-LFA with SR Algo here, I don’t think we need to
reference RFC 8402, so I would remove this line: “SR allows an operator to bind
an algorithm to a prefix SID (as defined in [RFC8402]).” I would put this
sentence at the beginning of this section: “[RFC9350] defines a flexible
algorithm (FlexAlgo) framework to be associated with Prefix SIDs. FlexAlgo
allows a user to associate a constrained path to a Prefix SID rather than using
the regular IGP shortest path.”

I think the entire paragraph below can be removed from this section.  The
entire document pertains to the default algo 0, so this text would be more
appropriate further up in the document, in the introduction.   Also, the
statement that there is no guarantee that the path will be loop-free when
TI-LFA uses a node SID, because local policy may have overridden the expected
path, is applicable to any algo AFAIK, not only the default algo. “The SR
default algorithm allows an operator to override the IGP shortest path by using
local policies. When TI-LFA uses Node-SIDs associated with the default
algorithm, there is no guarantee that the path will be loop-free as a local
policy may have overriden the expected IGP path.”

The last two sentences, quoted below, should be placed under a new operational
considerations section in the draft, as local policy is applicable to any algo
and not just algo 0.

As the local policies are defined by the operator, it becomes the
responsibility of this operator to ensure that the deployed policies do not
affect the TI-LFA deployment. It should be noted that such situation can
already happen today with existing mechanisms as remote LFA.

Why would the Adj-SID have to be unprotected?   Please add text explaining the
reason.  Also, is the sentence trying to say that the node SIDs and Adj-SIDs
must be part of the FlexAlgo sub-topology? If so, the sentence should be
rewritten as below.

Old
An implementation MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs
that are unprotected bound to the FlexAlgo to build the repair list.

New
An implementation MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs
that are unprotected to build the repair list.

Section 10 Usage of Adjacency Segments in repair list
Why would TI-LFA be only for a single planned failure?  Would it not also be
for unplanned failures, which is the major benefit of TI-LFA? Here we are
confusing two different scenarios.   At the beginning we mention that an
Adj-SID can be protected or unprotected.  That refers to a single TI-LFA
activation where a protected SID can itself fail, resulting in a nested TI-LFA
activation.  The topic of protected SIDs and multiple nested failures should be
added to this section.

In this section we are talking about multiple simultaneous failures along a
path from S-F. There may be cases where you have a very long path from S-F with
many intermediate nodes that have TI-LFA configured, so how do you pick and
choose which nodes to enable TI-LFA on?  Also, once it is configured, how can
you guarantee that you will not have multiple simultaneous FRR activations
caused by unplanned failures?  You could also have nested failures within the
same TI-LFA activation if the SID is protected, as I mentioned above.  I don’t
understand why TI-LFA activation will not work if you have multiple unplanned
failures along a path from S-F, and this needs to be explained.

TI-LFA provides a protection optimization through FRR. If you had multiple link
failures and FRR was not configured, failover would still work but would take
much longer to recover.  In a large SP network with a very long path and many
simultaneous link failures going across bypass loops, the network would still
recover without FRR enabled, but recovery would take much longer; with FRR it
would be instantaneous with link, node, and SRLG protection. In the example
below we have 2 simultaneous FRR activations.  This can work, and I don’t see
any issue with it w/ or w/o FRR enabled.  You can extrapolate this same
scenario to 100 FRR activations happening simultaneously, and the network
should still recover w/ or w/o FRR. Here we have 2 bypass loops (“repair
paths”), R2-R4-R3 and R5-R8-R6, and both link R2-R3 and link R5-R6 fail.

  R1 - R2 - R3 - R5 - R6 - R7
       |    |    |    |
       +-R4-+    +-R8-+
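
The reachability argument for this example can be checked with a short Python
sketch. It assumes the topology is as drawn (R4 attached to R2 and R3, R8
attached to R5 and R6) and uses a plain hop-count breadth-first search in a
hypothetical helper; it says nothing about IGP metrics, FRR timing, or
micro-loops, only that a path from R1 to R7 still exists when both links fail.

```python
from collections import deque

# Links of the example topology (assumed from the diagram above).
LINKS = [("R1", "R2"), ("R2", "R3"), ("R3", "R5"), ("R5", "R6"), ("R6", "R7"),
         ("R2", "R4"), ("R4", "R3"), ("R5", "R8"), ("R8", "R6")]


def shortest_path(links, failed, src, dst):
    """Hop-count shortest path from src to dst, skipping failed links."""
    failed_set = {frozenset(link) for link in failed}
    neighbors = {}
    for a, b in links:
        if frozenset((a, b)) in failed_set:
            continue
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # destination unreachable


print(shortest_path(LINKS, [], "R1", "R7"))
# ['R1', 'R2', 'R3', 'R5', 'R6', 'R7']
print(shortest_path(LINKS, [("R2", "R3"), ("R5", "R6")], "R1", "R7"))
# ['R1', 'R2', 'R4', 'R3', 'R5', 'R8', 'R6', 'R7']
```

With both links down the only remaining path uses both bypass loops, which is
the two-activation case described above.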

Section 11 Advantages of using expected post convergence path during FRR
Capacity planning is always a consideration when designing a network. AFAIK, as
far as the specification is concerned, capacity planning is something that
operators have to be cognizant of regardless of FRR, for any type of link,
node, or SRLG failure.  That being said, I don’t think that capacity planning,
whether under- or over-provisioning, is any different w/ or w/o FRR, as the
failure path from S-F will be close to the same along a bypass loop.   ECMP
exists for LFA and RLFA, as well as now TI-LFA, to distribute the load during a
failure, so that does not change.  I think what I have stated here should be
mentioned from a capacity planning perspective.

Section 12 Analysis based on real world topologies
Section 12 provides an analysis based on real world topologies, similar to the
one in RFC 7490 RLFA.  I agree that this data is important to the specification
and should remain in the body of the document.  I recommend mentioning that
T1-T9 represent the 9 Service Provider network topology use cases studied.  As
the number of links and nodes specified per topology is plenty, I don’t think
it is pertinent, and it may not even be possible, to provide the actual
topologies, as that is NDA information. In each of the tables, should the
number of SIDs % columns all add up to 100%?

I noticed that in the tables the % is the % of prefixes that fall in each
category, so if you total all the columns it falls short of 100%. I thought
that TI-LFA should yield 100% prefix coverage due to the post-convergence path
and static SID list, so all prefixes should be covered.  Why would it not be
100% coverage, as that is one of the main advantages of using TI-LFA as opposed
to T-LDP RLFA? The tables are very confusing and hard to follow, and I am not
seeing in the tables that a 1 SID or 2 SID repair path yields 99% coverage in
all topology cases. Does the comment below apply to all tables? If so, I am not
seeing 99% for the 1 SID or 2 SID column. “The measurements listed in the
tables indicate that for link and local SRLG protection, 1 SID repair path is
sufficient to protect more than 99% of the prefix in almost all cases. For node
protection 2 SIDs repair paths yield 99% coverage.”

Section 13 Security considerations
No issues

Recommendation for Operational Considerations section
I think an operational considerations section should be added to the draft. It
should detail the possible caveats with multiple layers of nested TI-LFAs
within a single repair path, including complexity and scale, as well as the
scalability of the number of TI-LFAs and nested LFAs within a single SR Policy
path. It should include a graphical representation of the SR-MPLS data plane
repair path and the SRv6 data plane repair path for the case where you have
multi-layer nested TI-LFAs and multiple TI-LFAs in a single SR Policy using all
protected SIDs.  It should also cover the use of a PCE/SDN controller for SR
Policy & TI-LFA to aid in FRR activation instantiation and management of
bandwidth and capacity.

Nits:
There are minor grammatical errors, which I addressed in the rewrites discussed
in the Minor issues section.