
Topology Independent Fast Reroute using Segment Routing
draft-ietf-rtgwg-segment-routing-ti-lfa-21

Yes

Jim Guichard

No Objection

Deb Cooley
Erik Kline
Orie Steele
(Francesca Palombini)
(Zaheduzzaman Sarker)

Note: This ballot was opened for revision 13 and is now closed.

Jim Guichard
Yes
Deb Cooley
No Objection
Erik Kline
No Objection
Gunter Van de Velde
(was Discuss) No Objection
Comment (2024-05-08 for -14) Sent for earlier
[8-May-2024 - Updated Position: No Objection]
# Gunter Van de Velde, RTG AD, comments for draft-ietf-rtgwg-segment-routing-ti-lfa-13

[Resolved] Please find below two blocking DISCUSS points (easy to address), and a series of 
non-blocking COMMENTs and some nits.

Many thanks for the RTGDIR reviews from Stewart Bryant, 
Andy Smith and Ben Niven-Jenkins during the seven-year development 
period of the TI-LFA specification. Also many thanks to Stewart Bryant 
for the shepherd write-up, which provides a brief overview of the 
progress of the draft through the WG and the current state of the art.

Thank you to the authors of this document. I really appreciate the 
effort and believe it captures the TI-LFA normative procedures well. 
Reviewing it with fresh eyes, I've made several comments that could 
help further improve the quality. I hope these insights will be 
valuable for the authors and the Working Group as you continue 
to refine the document.

DISCUSS:
========

[Resolved] DISCUSS#1
In section '9. TI-LFA and SR algorithms' I found the text written from an sr-mpls 
perspective. SRv6 has different considerations.

637	   and Q-Space as well as the post-convergence path.  An implementation
638	   MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that
639	   are unprotected to build the repair list.

The above seems written from an sr-mpls perspective. For SRv6 the Adj-SID is bound 
to a Locator and consequently bound to an algorithm. As a result, the observed limitation 
of sr-mpls does not really apply to SRv6. For SRv6 an implementation can 
use a protected Adj-SID in the repair path without breaking algorithm-aware 
topology requirements. Consider allowing protected SRv6 Adj-SIDs for TI-LFA.

In addition, some text about Adj-SIDs and locators in 
"Section 8.2. SRv6 dataplane considerations" could be beneficial. 
With sr-mpls there is no correlation between an Adj-SID and the segment routing 
algorithm; however, with the SRv6 dataplane the Adj-SID's Locator is correlated to an algorithm.

[Resolved] DISCUSS#2
Sections 11 and 12 do not introduce any supplementary artifacts to the normative 
procedures outlined for TI-LFA. The information within Sections 11 and 12 is provided 
in extensive detail. Should the Working Group (WG) prefer to maintain this 
level of specificity, it is advisable to consider relocating the detailed 
content to an appendix unless there is a strong reason to keep it in the main 
body of the document.

High level comments:
====================

* TI-LFA is based upon Segment Routing; however, the document mostly uses sr-mpls 
dataplane language. The SRv6 dataplane is first mentioned on line 493, almost 
halfway through the document. Consider mentioning support for the SRv6 
dataplane earlier.
* Six people are listed on the front page. Did all authors edit text in the draft?
* The operational impact text may want to explicitly mention that there is no interop 
complexity, because TI-LFA is a node-local operation.
* The document uses the term 'we' and other anthropomorphisms. This may not be the best 
approach in a formal document. Who is 'we': the editor, authors, WG, IETF community, operators, etc.? 
Policies have no awareness or emotions.

Detailed review COMMENTS ([minor] and [major])
==============================================
(Line numbers are rendered using idnits rendering)

19	   This document presents Topology Independent Loop-free Alternate Fast
20	   Re-route (TI-LFA), aimed at providing protection of node and
21	   adjacency segments within the Segment Routing (SR) framework.  This

[minor]
s/Re-route/Reroute/

[major]
The description indicates that TI-LFA provides protection of node and adjacency segments.
It does not specify what 'protection' entails, or that 'protection' is 
constrained to single link or node failures; e.g., RFC 5286 has explicit text in its 
abstract about single-failure applicability.

24	   (DLFA).  It extends these concepts to provide guaranteed coverage in
25	   any two connected networks using a link-state IGP.  A key aspect of

[major]
In this sentence 'two connected networks' is referenced, while earlier in the 
paragraph there is an indication of 'protection of node and adjacency segments'.
How do two connected networks correlate with the segments?

25	   any two connected networks using a link-state IGP.  A key aspect of
26	   TI-LFA is the FRR path selection approach establishing protection
27	   over the expected post-convergence paths from the point of local
28	   repair, reducing the operational need to control the tie-breaks among
29	   various FRR options.

[minor]
suggested rewrite to make the text better readable:
A principal attribute of TI-LFA is the FRR path selection methodology, which 
establishes protection over the anticipated post-convergence paths from the 
point of local repair. This approach diminishes the operational necessity 
to manage the tie-breaks among various FRR alternatives.

[minor]
Why is the path selection better? Can a hint be given as to why it is better, 
beyond a statement proclaiming that it is? 

138	   *  TI-LFA: Topology Independant LFA.

[minor]
s/Independant/Independent/

144	   Segment Routing aims at supporting services with tight SLA guarantees
145	   [RFC8402].  By relying on SR this document provides a local repair

[major]
The term SLA does not appear even once in RFC8402. How can the claim of 
tight SLA guarantees be justified with RFC8402? Can a better pointer for the claim be inserted?

[minor] 
s/Segment Routing/Segment Routing (SR)/

145	   [RFC8402].  By relying on SR this document provides a local repair
146	   mechanism for standard link-state IGP shortest path capable of
147	   restoring end-to-end connectivity in the case of a sudden directly
148	   connected failure of a network component.  Non-SR mechanisms for

[minor]
readability rewrite:
This document outlines a local repair mechanism that leverages Segment 
Routing (SR) to restore end-to-end connectivity in the event of an 
abrupt failure involving a directly connected network component. 
This mechanism is designed for standard link-state Interior Gateway 
Protocol (IGP) shortest path scenarios.

153	   The term topology independent (TI) refers to the ability to provide a
154	   loop free backup path irrespective of the topologies used in the
155	   network.  This provides a major improvement compared to LFA [RFC5286]
156	   and remote LFA [RFC7490] which cannot provide a complete protection
157	   coverage in some topologies as described in [RFC6571].

[minor]
I think what is being said is:
The term topology independent (TI) describes the capability of 
providing a loop-free backup path that is effective across all network 
topologies. This represents a significant enhancement over Loop-Free 
Alternate (LFA) [RFC5286] and Remote LFA as outlined in 
[RFC7490], both of which do not offer comprehensive protection coverage 
in certain topological configurations as detailed in [RFC6571]. TI-LFA 
ensures the availability of a backup path if a post-convergence path 
exists, regardless of the network topology.

167	   TI-LFA is a local operation applied by the PLR when it detects
168	   failure of one of its local links.  As such, it does not affect:

[minor]
It would be welcome to explicitly state that TI-LFA provides protection against 
a single local link failure.

[minor]
It was mentioned that TI-LFA provides protection against link and node failures.
In this section the abrupt failure of a link is mentioned as the FRR trigger. How is 
node protection achieved with TI-LFA, and how is the PLR triggered when a neighboring 
node is no longer operational? This is elaborated upon later in this 
section, but maybe a brief hint could be provided here too?

167	   TI-LFA is a local operation applied by the PLR when it detects
168	   failure of one of its local links.  As such, it does not affect:

170	   *  Micro-loops that appear - or do not appear – as part of the
171	      distributed IGP convergence [RFC5715] on the paths to the
172	      destination that do not pass thru TI-LFA paths:

174	      -  As explained in [RFC5714], such micro-loops may result in the
175	         traffic not reaching the PLR and therefore not following TI-LFA
176	         paths.

178	   *  Micro-loops that appear – or do not appear - when the failed link
179	      is repaired.

[minor]
This does not parse very well. I read this paragraph a few times 
and believe it could be rewritten as follows:

"TI-LFA operates locally at the Point of Local Repair (PLR) upon detecting 
a failure in one of its direct links. Consequently, this local operation 
does not influence:

* Micro-loops that may or may not form during the distributed Interior 
Gateway Protocol (IGP) convergence as delineated in RFC 5715. 

- These micro-loops occur on routes directed towards the destination that 
do not traverse TI-LFA-configured paths. According to [RFC5714], the formation 
of such micro-loops can prevent traffic from reaching the PLR, thereby 
bypassing the TI-LFA paths established for rerouting.

* Micro-loops that may or may not develop when the previously failed link 
is restored to functionality.

This specification highlights that while TI-LFA effectively addresses specific 
link failures, it does not extend its impact to managing micro-loops 
associated with broader IGP convergence issues or subsequent link repairs."

181	   TI-LFA paths are loop-free.  What’s more, they follow the post-
182	   convergence paths, and, therefore, not subject to micro-loops due to
183	   difference in the IGP convergence times of the nodes thru which they
184	   pass.

[minor]
This is a rather informal writing style. What about the following:

TI-LFA paths are inherently loop-free and align with post-convergence routes. 
Consequently, they are not susceptible to micro-loops that may arise due to 
variations in the IGP convergence times across different nodes through 
which these paths traverse. This ensures a stable and predictable routing 
environment, minimizing disruptions typically associated with asynchronous 
network behavior.

186	   TI-LFA paths are applied from the moment the PLR detects failure of a
187	   local link and until IGP convergence at the PLR is completed.

[minor]
readability rewrite:
TI-LFA paths are activated from the instant the PLR detects a failure in a 
local link and remain in effect until the Interior Gateway Protocol (IGP) 
convergence at the PLR is fully achieved.

190	   micro-loops, especially if these paths have been computed using the
191	   methods described in Section Section 6.2, Section 6.3, or Section 6.4
192	   of the draft.  One of the possible ways to prevent such micro-loops

[minor]
Instead of simply referencing Sections 6.2, 6.3 and 6.4, consider listing the 
conditions under which this occurs together with the section references, for 
example in the style 'if the FRR path does not use a direct neighbor, 
then...'.

206	   For each destination in the network, TI-LFA pre-installs a backup

[minor]
What exactly does 'destination' mean? Is it a /32 or /128 node prefix? Is it 
a router ID? Is any other abstraction intended?
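For illustration only (not text from the draft): a minimal Python sketch of what a pre-installed per-destination backup could look like, assuming a destination is an IGP prefix such as a /32 or /128 node prefix. The FibEntry structure, its field names, the prefix and the SID value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FibEntry:
    """Hypothetical FIB entry for one destination prefix."""
    prefix: str          # assumed: an IGP prefix, e.g. a /32 or /128 node prefix
    primary_link: str    # primary outgoing link
    backup_link: str     # pre-installed backup outgoing link
    repair_list: list = field(default_factory=list)  # SIDs pushed on the backup path


def forward(entry, failed_links):
    """Return (outgoing link, SIDs to push) for a packet toward entry.prefix."""
    if entry.primary_link in failed_links:
        # Local failure detected: switch to the pre-computed backup immediately.
        return entry.backup_link, list(entry.repair_list)
    return entry.primary_link, []


entry = FibEntry("192.0.2.1/32", "S-F", "S-A", [16002])
print(forward(entry, set()))    # primary path, nothing pushed
print(forward(entry, {"S-F"}))  # backup path with the repair list
```

Whether one such entry exists per prefix, per router ID or per some other abstraction is exactly what the question above asks the draft to pin down.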

224	   By using SR, TI-LFA does not require the establishment of TLDP
225	   sessions (Targeted Label Distribution Protocol) with remote nodes in
226	   order to take advantage of the applicability of remote LFAs (RLFA)
227	   [RFC7490][RFC7916] or remote LFAs with directed forwarding
228	   (DLFA)[RFC5714].  All the Segment Identifiers (SIDs) are available in
229	   the link state database (LSDB) of the IGP.  As a result, preferring
230	   LFAs over RLFAs or DLFAs, as well as minimizing the number of RLFA or
231	   DLFA repair nodes is not required anymore.

[minor]
possible rewrite for readability and simplicity:

"
By utilizing Segment Routing (SR), TI-LFA eliminates the need to establish 
Targeted Label Distribution Protocol (TLDP) sessions with remote nodes for 
leveraging the benefits of Remote Loop-Free Alternates (RLFA) [RFC7490][RFC7916] 
or Directed Loop-Free Alternates (DLFA) [RFC5714]. All the Segment Identifiers 
(SIDs) required are present within the Link State Database (LSDB) of the 
Interior Gateway Protocol (IGP). Consequently, there is no longer a necessity 
to prefer LFAs over RLFAs or DLFAs, nor is there a need to minimize the number 
of RLFA or DLFA repair nodes.
"

233	   By using SR, there is no need to create state in the network in order
234	   to enforce an explicit FRR path.  This relieves the nodes themselves
235	   from having to maintain extra state, and it relieves the operator
236	   from having to deploy an extra protocol or extra protocol sessions
237	   just to enhance the protection coverage.

[minor]
what about this blob of text:
"
Utilizing SR removes the need to establish additional
state within the network for enforcing explicit Fast Reroute (FRR) paths. 
This spares the nodes from maintaining supplementary state and 
frees the operator from the necessity to implement additional protocols or 
protocol sessions solely to augment protection coverage.
"

239	   Although not a Ti-LFA requirement or constraint, TI-LFA also brings

s/Ti-LFA/TI-LFA/

242	   reduces the need of locally configured policies that drive the backup

[minor]
I am unsure what 'drive' means here. Would it be better to say 'describe the backup...'?

243	   path selection ([RFC7916]).  The easiest way to express the expected
244	   post-convergence path in a loop-free manner is to encode it as a list
245	   of adjacency segments.  However, this may create a long SID list that

[major]
You write 'is to encode it'. What is 'it'? I understand this suggests Adj-SIDs.
I also believe that simply having a list of Adj-SIDs is not sufficient; an "ordered" 
list of Adj-SIDs is needed. 
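For illustration only: a minimal Python sketch of encoding a post-convergence path as an ordered list of Adj-SIDs, which shows why ordering matters and how the list grows with path length. The adjacency names and SID values are hypothetical, and the PLR is assumed to forward the first hop itself, so only the remaining adjacencies are pushed.

```python
def path_to_adj_sid_list(path, adj_sid):
    """Encode a hop-by-hop path as (first hop, ordered Adj-SID list).

    path    -- node names from the PLR to the destination, e.g. ["S", "A", "B", "D"]
    adj_sid -- hypothetical map (from_node, to_node) -> Adj-SID value
    """
    first_hop = path[1]  # the PLR simply sends the packet out of this adjacency
    # Each downstream node consumes the active SID in turn, so the list must
    # follow the hop order exactly; reordering it would describe another path.
    sids = [adj_sid[(a, b)] for a, b in zip(path[1:], path[2:])]
    return first_hop, sids


adj_sid = {("A", "B"): 24001, ("B", "C"): 24002, ("C", "D"): 24003}
print(path_to_adj_sid_list(["S", "A", "B", "C", "D"], adj_sid))
```

In this naive form a path of n hops needs n-1 Adj-SIDs, which is where the concern about hardware SID-list depth arises.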

245	   of adjacency segments.  However, this may create a long SID list that
246	   some hardware may not be able to push.  One of the challenges of TI-

[minor] 
Should we say 'push' or 'program'? 'Push' seems specific to the sr-mpls dataplane, 
while TI-LFA is also applicable to SRv6.

248	   adjacency segments and node segments.  Each implementation will be
249	   free to have its own SID list optimization algorithm.  This document
250	   details the basic concepts that could be used to build the SR backup
251	   path as well as the associated dataplane procedures

possible rewrite:
"
Each implementation may independently develop its own algorithm for 
optimizing the ordered SID list. This document provides an outline of the 
fundamental concepts applicable to constructing the SR backup path, along 
with the related dataplane procedures.
"

288	   We define the main notations used in this document as the following.

290	   We refer to "old" and "new" topologies as the LSDB state before and
291	   after the considered failure.

[minor]
I would prefer not to use the word 'we'. It is undefined who 
that is: the editor, the authors, the WG, the Internet community, etc.

286	3.  Terminology

[minor]
Would section 3 be better located before section 2 for clarity?

[major]
Later in the document there is usage of P(S,X) and Q(D,X) while 
the terminology section only documents P(R,X). Maybe add some text 
to clarify the intended use.

321	   EP(P, Q) is an explicit SR-based path from a node P to a node Q.

[minor]
Why not simply use 'SR path' instead of 'SR-based path'? Does the 
suffix '-based' add any value?

335	   An implementation is free to use any local optimization to provide
336	   smaller SID lists by combining Node SIDs and Adjacency SIDs.  In

[minor]
The intent seems to be to integrate Adj-SIDs and Node-SIDs into the SID lists;
it is not clear that multiple SIDs are being combined into fewer SIDs:
"An implementation may employ any local optimization strategy to reduce 
the size of SID lists by integrating Node SIDs and Adjacency SIDs into 
the SID lists."

342	5.  Intersecting P-Space and Q-Space with post-convergence paths
343
344	   One of the challenges of defining an SR path following the expected
345	   post-convergence path is to reduce the size of the segment list.  In

[minor]
At the end of Section 4 it is written "These optimizations are out of scope of this document,"
and then the first paragraph of Section 5 identifies reducing the SID lists as one of the challenges.
For something that is out of scope of the document, it is perceived as a rather important 
problem to address. If it is truly out of scope of this document, then maybe state 
explicitly that Section 5 is informational.

[minor]
in some places the term 'segment lists' is used, in others 'SID lists'. Could a single 
terminology be used throughout the document? 

[major]
In the Terminology section the P-space, extended P-space and Q-space are explained.
It is not clear why all this is explained again in more explicit steps. It makes me wonder 
whether Section 5 can be reduced by reusing the terminology in Section 3 and focusing on it.

356	   We want to determine which nodes on the post-convergence path from

[minor]
Who is 'we'?

358	   regard to resource X (X can be a link or a set of links adjacent to
359	   the PLR, or a neighbor node of the PLR).

[minor]
In Section 3 (Terminology), resource X is defined differently: 
'resource X (e.g. a link S-F, a node F, or a SRLG)'.
Which one is correct? Maybe reuse the Terminology definition for consistency.

378	   This can be found by intersecting the set of nodes belonging to the
379	   post-convergence path from R to D, assuming the failure of X, with
380	   Q(D, X).

[minor]
In Section 3 (Terminology), Q(R, X) is described using 'R', while 
in Section 5.2 the term Q(D, X) uses 'D'.
Is this intentional? Why not add this to the Terminology 
section too, or make the Terminology section agnostic 
to the letter used (e.g. 'R' or 'D') and describe the 
intent of the Q(...) function?
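For illustration only: a minimal Python sketch of the intersection described in the quoted text, restricted to link protection (X = link S-F) on an undirected graph with symmetric metrics. P-space and Q-space use RFC 7490 style strict inequalities, so nodes with an equal-cost path over S-F are excluded. The six-node ring topology and all function names are hypothetical.

```python
import heapq


def spf(graph, src):
    """Dijkstra: (distance, predecessor) maps from src over graph {node: {nbr: metric}}."""
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def without_link(graph, a, b):
    """Copy of graph with the bidirectional link a-b removed."""
    g = {u: dict(nbrs) for u, nbrs in graph.items()}
    del g[a][b]
    del g[b][a]
    return g


def p_space(graph, s, f):
    """P(S, X), X = link S-F: nodes with no shortest path (incl. ECMP) from S over S-F."""
    ds, df = spf(graph, s)[0], spf(graph, f)[0]
    return {n for n in graph if ds[n] < graph[s][f] + df[n]}


def q_space(graph, d, s, f):
    """Q(D, X), X = link S-F: nodes with no shortest path to D over S-F, either direction."""
    dd, ds, df = spf(graph, d)[0], spf(graph, s)[0], spf(graph, f)[0]
    c = graph[s][f]
    return {n for n in graph if dd[n] < min(ds[n] + c + df[d], df[n] + c + ds[d])}


def pq_nodes(graph, s, d, f):
    """Post-convergence path S->D after losing S-F, and its nodes lying in both P and Q."""
    prev = spf(without_link(graph, s, f), s)[1]
    path = [d]
    while path[-1] != s:
        path.append(prev[path[-1]])
    path.reverse()
    p, q = p_space(graph, s, f), q_space(graph, d, s, f)
    return path, [n for n in path if n in p and n in q]


# Hypothetical six-node ring S-F-D-C-B-A-S, all metrics 1; protect link S-F toward D.
ring = {"S": {"F": 1, "A": 1}, "F": {"S": 1, "D": 1}, "D": {"F": 1, "C": 1},
        "C": {"D": 1, "B": 1}, "B": {"C": 1, "A": 1}, "A": {"B": 1, "S": 1}}
print(pq_nodes(ring, "S", "D", "F"))
```

Here A is not in Q(D, X) because one of its equal-cost paths to D runs back over S-F, whereas B lies in both spaces, so a single Node-SID of B would suffice. The same function reads the same whether the destination is called R or D, which is the point of the comment above.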

397	   protected resource X and, at the same time, is guaranteed to be loop-
398	   free irrespective of the state of FIBs along the nodes belonging to
399	   the explicit path.  Thus, there is no need for any co-ordination or

[minor]
There is an assumption here that only SR programs the FIB. There may be out-of-band 
FIB programming that does cause loops. Maybe frame the 
claim better by stating the assumption made to guarantee loop-free paths.

460	6.2.  FRR path using a PQ node

[minor]
Is there a reason that there are no considerations for an implementer 
to select the PQ node closest to the S or closest to the D?

499	   interface for the packet, S-F.  The failure of the primary outgoing

[minor]
what is the 'F' in the S-F?

512	   We define hereafter the FRR behavior applied by S for any packet
513	   received with an active adjacency segment S-F for which protection
514	   was enabled.  As protection has been enabled for the segment S-F and
515	   signaled in the IGP (for instance using protocol extensions from
516	   [RFC8667] and [RFC8665]), any SR policy using this segment knows that
517	   it may be transiently rerouted out of S-F in case of S-F failure.

[minor]
A policy is a configuration. A policy does not 'know' anything. Can the statement 
be made without anthropomorphism?

637	   and Q-Space as well as the post-convergence path.  An implementation
638	   MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that
639	   are unprotected to build the repair list.

[major]
This is written from an sr-mpls perspective. For SRv6 the Adj-SID is bound to an algorithm (through its Locator), so this condition does not apply.
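For illustration only: a minimal Python sketch contrasting the quoted sr-mpls rule with the SRv6 relaxation suggested in this comment. The Sid record and its fields are hypothetical, and the SRv6 branch reflects this comment's proposal, not text from the draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sid:
    kind: str              # "node" or "adj"
    dataplane: str         # "mpls" or "srv6"
    protected: bool        # whether the SID is itself FRR-protected
    algo: Optional[int]    # algorithm bound to the SID; None for sr-mpls Adj-SIDs


def usable_in_repair_list(sid, flex_algo):
    """Hypothetical eligibility check for SIDs in a FlexAlgo TI-LFA repair list."""
    if sid.kind == "node":
        return sid.algo == flex_algo  # Node-SIDs must be bound to the FlexAlgo
    if not sid.protected:
        return True  # the quoted draft rule: unprotected Adj-SIDs are allowed
    # Suggested relaxation for SRv6: the Adj-SID's Locator binds it to an algorithm,
    # so a protected one of the same algorithm does not break algorithm awareness.
    return sid.dataplane == "srv6" and sid.algo == flex_algo


print(usable_in_repair_list(Sid("adj", "mpls", True, None), 128))
print(usable_in_repair_list(Sid("adj", "srv6", True, 128), 128))
```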

647	           S --- R2 --- R3 --- R4 --- R5 --- D
648	                    \    |  \  /
649	                       R7 -- R8
650	                        |    |
651	                       R9 -- R10

653	                                  Figure 2

655	   In Figure 2, all the metrics are equal to 1 except
656	   R2-R7,R7-R8,R8-R4,R7-R9 which have a metric of 1000.  Considering R2

[minor]
The drawing here uses a different style from Figure 1, where - and * are used to visualize the different link metrics.
Maybe a consistent drawing style should be used throughout the document?

665	   To avoid the possibility of this double FRR activation, an
666	   implementation of TI-LFA MAY pick only non protected adjacency
667	   segments when building the repair list.  However, this is important

[minor]
While double failures may initially sound like an exotic event, they may be 
more frequent than initially assumed when SRLGs are considered. Some operators 
run multiple links over the same optical cable, and if one fiber gets cut, 
many links may be impacted, causing double failures. It may be worth mentioning 
that double failures are not as rare as one may believe.
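For illustration only: a minimal Python sketch of why SRLGs make double failures more likely than they look. The link names and SRLG values are hypothetical; a link is assumed to fail together with every link that shares at least one SRLG with it.

```python
def links_failing_together(srlgs, failed):
    """Links sharing at least one SRLG with `failed` (they may go down with it)."""
    groups = srlgs.get(failed, set())
    return {link for link, g in srlgs.items() if link != failed and g & groups}


def repair_avoids_shared_risk(srlgs, protected, repair_links):
    """True if no link on the repair path shares an SRLG with the protected link."""
    return not (links_failing_together(srlgs, protected) & set(repair_links))


# Hypothetical: S-F and S-A ride the same fiber (SRLG 100); S-B uses another (SRLG 200).
srlgs = {"S-F": {100}, "S-A": {100}, "S-B": {200}, "A-D": set(), "B-D": set()}
print(links_failing_together(srlgs, "S-F"))
print(repair_avoids_shared_risk(srlgs, "S-F", ["S-A", "A-D"]))
print(repair_avoids_shared_risk(srlgs, "S-F", ["S-B", "B-D"]))
```

A single fiber cut here takes down both S-F and S-A, so a repair path via S-A would itself be hit: a "double failure" caused by one physical event.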

676	11.  Advantages of using the expected post-convergence path during FRR

[minor]
This section is a complex, detailed read and seems overly detailed. 
Can the description of the advantages not be simplified? Is this level of detail necessary at this place in the document? 
Alternatively, consider moving this section into an appendix.
Consider removing anthropomorphism in this section. TI-LFA has no awareness; it may, 
however, be opaque to constraints (see 'TI-LFA cannot be aware of such path constraints and').

783	12.  Analysis based on real network topologies

[major]
Consider placing this section into an appendix. The shared information 
does not add additional considerations to the TI-LFA procedure description.
Orie Steele
No Objection
Roman Danyliw
No Objection
Comment (2024-04-17 for -13) Sent
Thank you to Roni Evans for the GENART review.

** Section 6.1 – 6.3 prescribe behavior that SHOULD happen. What is the consequence if that guidance is not followed?

** Section 9
   An implementation MAY support TI-LFA to protect Node-
   SIDs associated to a FlexAlgo.  In such a case, rather than computing
   the expected post-convergence path based on the regular SPF, an
   implementation SHOULD use the constrained SPF algorithm bound to the
   FlexAlgo (using the Flex Algo Definition) instead of the regular
   Dijkstra in all the SPF/rSPF computations that are occurring during
   the TI-LFA computation.

Why isn’t the above SHOULD a MUST? If an implementation uses a FlexAlgo (per the first sentence), in what case would that implementation not use the constrained SPF algorithm bound to the FlexAlgo?
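For illustration only: a minimal Python sketch of why the choice of SPF matters for a FlexAlgo Node-SID. The Flex-Algorithm Definition is reduced here to a single exclude-any affinity rule (a hypothetical simplification; real definitions also carry a metric type, include/exclude rules and more), and the topology and the flex_algo_128 predicate are invented.

```python
import heapq


def spf(links, src, allowed=lambda attrs: True):
    """Dijkstra over undirected links {(u, v): attrs}, pruning links rejected by `allowed`."""
    adj = {}
    for (u, v), attrs in links.items():
        if allowed(attrs):
            adj.setdefault(u, []).append((v, attrs["metric"]))
            adj.setdefault(v, []).append((u, attrs["metric"]))
    dist, heap = {src: 0}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def flex_algo_128(attrs):
    """Hypothetical FlexAlgo constraint: exclude links colored "red"."""
    return "red" not in attrs["affinity"]


links = {("S", "A"): {"metric": 1, "affinity": {"red"}},
         ("A", "D"): {"metric": 1, "affinity": set()},
         ("S", "B"): {"metric": 2, "affinity": set()},
         ("B", "D"): {"metric": 2, "affinity": set()}}
print(spf(links, "S")["D"])                 # regular SPF: cheapest path uses red link S-A
print(spf(links, "S", flex_algo_128)["D"])  # constrained SPF: only S-B-D is allowed
```

A post-convergence path computed with the regular SPF would steer FlexAlgo traffic over a link the FlexAlgo excludes, which is the kind of case the question above is about.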
Éric Vyncke
No Objection
Comment (2024-04-10 for -13) Sent
# Éric Vyncke, INT AD, comments for draft-ietf-rtgwg-segment-routing-ti-lfa-13

Thank you for the work put into this document. The flow appears logical and the text is well explained, but to be honest it is too specific and too acronym-heavy for me, i.e., my review is rather superficial and I am trusting the RTG ADs for their content review. Nevertheless, I like the clarity of section 10.

Please find below some non-blocking COMMENT points (but replies would be appreciated even if only for my own education), and some nits.

Special thanks to Stewart Bryant for the shepherd's detailed write-up including the WG consensus *but it lacks* the justification of the intended status. I like the `This is a deployed protocol.` ;-) OTOH, the justification for *6* authors is rather weak: `The document has taken seven year to get to this point and seems to have settled at this number of authors.`

I hope that this review helps to improve the document,

Regards,

-éric

# COMMENTS (non-blocking)

## Abstract

Should "IP-FRR" be expanded?

## Section 6

This section contains multiple "SHOULD" but does not explain when the "SHOULD" can be bypassed.

## Section 8.2

I am afraid I cannot parse `Then the packet is protected as if its were a transit packet.`

## Section 12

This is valuable information of course, even if the actual networks are not referenced. Besides the depth of the SID list, I would have welcomed the number of additional repair entries required in the node (is it simply destinations * links?) as it could have an impact on the amount of state in the routers.

# NITS (non-blocking / cosmetic)

## Section 2

Usually acronyms are introduced *after* the expansion, e.g., not as in `TLDP sessions (Targeted Label Distribution Protocol)`

## Section 2.1

This BCP14 template should probably be placed after the acronyms.
Francesca Palombini Former IESG member
No Objection
No Objection (for -13) Not sent

John Scudder Former IESG member
(was Discuss) No Objection
No Objection (2025-02-16) Sent
Thanks for all your work. My concerns are sufficiently resolved to let me ballot No Objection.
Murray Kucherawy Former IESG member
(was Discuss) No Objection
No Objection (2025-01-11 for -19) Sent
Thanks for this work.  It was an interesting read, which is uncommon for those of us up in Applications space where the air is thin.

Thanks for addressing my DISCUSS points about use of BCP 14 language.  The remainder of my original comment follows.

==

I have the same question as others about having six authors on this document.  I concur in particular with Eric's comments.

I have some concerns with the shepherd writeup.  Question #11 is incomplete; question #13 has me concerned that the question was not directly asked of the authors.

Section 3 defines "SPT_old()" but it doesn't appear anywhere else in this document.  Also it seems to me the definitions of "Primary interface" and "Primary link" could be merged, because the former term doesn't appear anywhere in the document other than in the definition of the latter.  And a very minor point: We define "adj-sid()", but throughout the document it's variably that or "Adj-Sid()".  We should probably pick one and use it consistently.
Zaheduzzaman Sarker Former IESG member
No Objection
No Objection (for -13) Not sent