Ballot for draft-ietf-rtgwg-segment-routing-ti-lfa
Summary: Has 2 DISCUSSes. Needs one more YES or NO OBJECTION position to pass.
# John Scudder, RTG AD, comments for draft-ietf-rtgwg-segment-routing-ti-lfa-13
CC @jgscudder

Thanks for this document. The technology is valuable, and the underlying techniques are sound. Despite what you might guess from my abundance of comments, I like this document. I suspect it has suffered from having been edited repeatedly by the same set of experts; it becomes hard to see it as a new reader might. So, please accept my comments in the spirit of a new set of eyes looking over some long-established text.

Although most of my remarks are non-blocking COMMENTs, I am putting two in as DISCUSS points. Some of my other comments are of a similar nature, but I think they're less serious for the overall coherence of the document.

## DISCUSS

### Whole document, is post-convergence of the essence, or not?

The document seems to be arguing with itself about whether following the post-convergence path is, or is not, an essential/required feature. In the Abstract:

> A key aspect of TI-LFA is the FRR path selection approach establishing protection over the expected post-convergence paths from the point of local repair

It's a *key* aspect! OK! But then, in the Introduction:

> Although not a Ti-LFA requirement or constraint, TI-LFA also brings the benefit of the ability to provide a backup path that follows the expected post-convergence path

Wait, it's "key" but "not a requirement or constraint"? Moving on to Section 6,

> The repair list encodes the explicit post-convergence path to the destination

So it "encodes the explicit post-convergence path". "Encodes", not "might encode" or "can encode". So the Abstract is right and Section 2 is wrong. But wait, there's more! Later in Section 11,

> traffic can be steered by the PLR onto its expected post-convergence path during the FRR phase

So it "can"... which implies "doesn't have to be". There's more; for example, all of Section 5 talks about post-convergence paths, and there are many more mentions in Section 6 too.

Given that Sections 5-8 seem to come closest to being the normative ones (though the document is sadly not very precise in this regard), I'm left with the impression that the Abstract is right ("key"), and the quoted passages of Sections 2 and 11 are wrong. In any case, I think this needs to be resolved in some way.

### Section 10, multiple unrelated failures

> Implementations of TI-LFA should deal with the occurence of multiple unrelated failures in accordance to the IP Fast Reroute Framework [RFC5714].

(Nit, you misspelled "occurrence".)

Can you explain what you mean by this sentence? I haven't reviewed it carefully, but RFC 5714 is a framework, and I don't know what it means to be "in accordance to" it. The only relevant text I was able to find in it was RFC 5714 Section 5.2.6,

> However, it is important that the occurrence of a second failure while one failure is undergoing repair should not result in a level of service which is significantly worse than that which would have been achieved in the absence of any repair strategy.

Putting that together with the quote from your specification, I come up with an interpretation like "it's important to behave reasonably in the face of multiple failures; we aren't going to tell you how to do it, this other document we are citing isn't going to tell you how to do it either, it's just going to tell you that our specification was supposed to cover this, but we didn't."
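To make the terms at stake in the first DISCUSS point concrete, the following Python sketch computes, for a single link failure on a toy topology, the post-convergence path and the P-space and Q-space. It assumes the usual definitions: P(S,X) is the set of nodes that S reaches on pre-convergence shortest paths (including equal-cost splits) without transiting X, and Q(D,X) is the set of nodes that reach D that way. The topology, node names and helper functions are hypothetical; metrics are assumed positive, the graph connected, and ECMP ties in the post-convergence path are broken arbitrarily.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbor: IGP metric}

# Hypothetical toy topology; the failed resource X is the link S-F.
TOPOLOGY: Graph = {
    "S": {"F": 1, "A": 1},
    "F": {"S": 1, "D": 1},
    "D": {"F": 1, "B": 1},
    "A": {"S": 1, "B": 10},
    "B": {"A": 10, "D": 1},
}
X = ("S", "F")


def dijkstra(graph: Graph, src: str) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Distances from src and one predecessor per node (ECMP ties broken arbitrarily)."""
    dist: Dict[str, int] = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def without_link(graph: Graph, link: Tuple[str, str]) -> Graph:
    """Copy of the graph with both directions of the link removed."""
    return {u: {v: w for v, w in nbrs.items() if {u, v} != set(link)}
            for u, nbrs in graph.items()}


def post_convergence_path(graph: Graph, src: str, dst: str, link: Tuple[str, str]) -> List[str]:
    """Shortest path from src to dst on the topology without the failed link."""
    _, prev = dijkstra(without_link(graph, link), src)
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def _avoids(d, frm: str, to: str, graph: Graph, link: Tuple[str, str]) -> bool:
    """True if no shortest frm->to path (ECMP included) crosses the link in either direction."""
    a, b = link
    return (d[frm][to] < d[frm][a] + graph[a][b] + d[b][to]
            and d[frm][to] < d[frm][b] + graph[b][a] + d[a][to])


def p_space(graph: Graph, root: str, link: Tuple[str, str]) -> Set[str]:
    d = {n: dijkstra(graph, n)[0] for n in graph}  # pre-convergence distances
    return {n for n in graph if _avoids(d, root, n, graph, link)}


def q_space(graph: Graph, dest: str, link: Tuple[str, str]) -> Set[str]:
    d = {n: dijkstra(graph, n)[0] for n in graph}  # pre-convergence distances
    return {n for n in graph if _avoids(d, n, dest, graph, link)}


path = post_convergence_path(TOPOLOGY, "S", "D", X)
p, q = p_space(TOPOLOGY, "S", X), q_space(TOPOLOGY, "D", X)
print(path, [n for n in path if n in p], [n for n in path if n in q])
# ['S', 'A', 'B', 'D'] ['S', 'A'] ['B', 'D']
```

In this toy case no node on the post-convergence path is in both P(S,X) and Q(D,X); the last P node (A) and the first Q node (B) are adjacent, which is the situation where a node segment plus an adjacency segment would be needed.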
## COMMENTS

### General Gripe, Y U no XML?

It looks like you must have uploaded the TXT rendering instead of the XML source. Whenever it is you upload your next revision, please consider uploading the XML source instead. The set of renderings available when you upload text is inferior to the renderings available when you upload source (notably, the modern HTML rendering is not available). If there's some reason uploading source is difficult or impossible for you, disregard this request of course. (The only reason I can think of for it to be hard is if you used a less-common tool to produce your draft, for example, nroff or the Microsoft Word template. If you use the more mainstream XML or MD workflows, it should be easy enough.)

### Abstract, for want of a hyphen the meaning was lost

> It extends these concepts to provide guaranteed coverage in any two connected networks using a link-state IGP.

The only way I can make sense of this is to add a hyphen:

> It extends these concepts to provide guaranteed coverage in any two-connected networks using a link-state IGP.

Is that what you meant? I would also suggest changing "using" to "that uses", as in,

> It extends these concepts to provide guaranteed coverage in any two-connected network that uses a link-state IGP.

I'm not sure the concept of two-connectedness is universally understood enough that it's suitable for use in an Abstract, so I'd support further editing to get rid of the graph theory term entirely, but at least this edit makes it correct.

### Section 1, orphan definitions

- RSPT is defined, but never used. Delete?
- SLA is defined, only used once (and in a comment below I suggest deleting that use!). Delete and just expand on first use, if the first-and-only use is kept at all?
- SPT is defined, but never used. Delete?
- SRGB is defined, only used once. Delete and just expand on first use?
- TLDP is defined, only used once. Delete and just expand on first use?
  (If you keep it, please correct the nit from Ben Niven-Jenkins' RTGAREA review.)

### Section 2, "aims at"

> Segment Routing aims at supporting services with tight SLA guarantees [RFC8402].

This seems to be rewriting history. That is to say, sure, SR "aims at supporting services with tight SLA guarantees", but only in the sense that every general-purpose packet transport does; without specifics this isn't very meaningful. In any case, the citation doesn't supply evidence for the statement; indeed, the string "SLA" or "service level" never occurs in RFC 8402. Maybe just remove this sentence? It doesn't appear essential in any case.

### Section 2, can't parse this sentence

> By relying on SR this document provides a local repair mechanism for standard link-state IGP shortest path capable of restoring end-to-end connectivity in the case of a sudden directly connected failure of a network component.

I can't parse this sentence. I am *guessing* that what you mean is something like,

NEW:
> This document leverages Segment Routing (SR) to provide a local repair mechanism for a shortest path computed by a standard link-state IGP. The local repair is capable of restoring end-to-end connectivity in the case of a failure of a directly connected network component.

If this is what you mean, you're welcome to use this text if you want. If it's not what you mean, please explain? Note I removed "sudden" in my NEW text because I'm guessing you don't mean to exclude a gradual or a foreseeable failure. A failure is a failure, after all.

### Section 2, When the network reconverges

> When the network reconverges

I suggest,

NEW:
> When the network reconverges after a failure

See also the next comment.

### Section 2, microloops

Much of Section 2 is a discussion of microloops and exactly how TI-LFA relates to them. This doesn't seem like introductory material, especially because the rest of the specification doesn't talk about microloops at all.
I suggest moving that material to a new (sub)section for that purpose and just mentioning it in the Introduction, as in something like "microloops are not addressed by TI-LFA and can be a concern in some deployments. This is discussed in <xref>." I don't insist that you make this change; the document is still usable without it. However, I think that the Introduction as it stands is not very useful *as an introduction* because of the abundance of non-introductory material. (Some of my subsequent comments fall under the same heading.)

### Section 2, "primary link"

You mention the "primary link" in a few places here (and nowhere else). What is the "primary link"? Please clarify or re-word. For example, maybe you mean "the link whose failure is detected".

### Section 2, what value does comparing to older FRR techniques add to an intro?

> By using SR, TI-LFA does not require the establishment of TLDP sessions (Targeted Label Distribution Protocol) with remote nodes in order to take advantage of the applicability of remote LFAs (RLFA) [RFC7490][RFC7916] or remote LFAs with directed forwarding (DLFA)[RFC5714]. All the Segment Identifiers (SIDs) are available in the link state database (LSDB) of the IGP. As a result, preferring LFAs over RLFAs or DLFAs, as well as minimizing the number of RLFA or DLFA repair nodes is not required anymore.

I see why you wanted this paragraph during the process of developing this spec and persuading the WG of its value. I don't see how it contributes any value to the final spec. Similarly,

> By using SR, there is no need to create state in the network in order to enforce an explicit FRR path. This relieves the nodes themselves from having to maintain extra state, and it relieves the operator from having to deploy an extra protocol or extra protocol sessions just to enhance the protection coverage.

This appears to just be a restatement of the value proposition of SR itself. I don't see value in restating it in this document. I think you could remove both these paragraphs without harm, and it would make the document a quicker and clearer read.

### Section 2, encoding challenges

> One of the challenges of TI-LFA is to encode the expected post-convergence path by combining adjacency segments and node segments.

Do you mean "compactly encode", "efficiently encode", or similar? Is encoding, per se, the challenge?

### Section 2, roadmap should be a roadmap

I agree with Ben Niven-Jenkins that the omission of Sections 8-10 in the overview list at the end of the Introduction is a bit jarring to the reader. (I would add Section 13, too.)

### Section 3, exclude from that set

I think this is wrong:

> Exclude from that set of neighbors that are reachable from R using X.

Did you mean,

NEW:
> Exclude from that set, the neighbors that are reachable from R using X.

### Section 3, defined but not used

> A symmetric network is a network such that the IGP metric of each link is the same in both directions of the link.

This definition is never used.

### Section 6, you can't guarantee that

> is guaranteed to be loop-free irrespective of the state of FIBs along the nodes belonging to the explicit path.

As written, there's no way to guarantee that. (Trivial proof: one possible state of a FIB is to point back to the preceding node along the path. That might not be an *expected* state, but it is *a* state.) I think this is just a case of overly casual writing, and you mean that the loop-free property will exist regardless of whether the nodes belonging to the explicit path have converged to recognize failure X or not. Consider rewriting along those lines?

### Section 6 and others

Please supply definitions for "P node" and "Q node".

### Section 6, terminology is inconsistent and unclear

I can understand NodeSID(R1) well enough even though you haven't supplied a definition; it's the node SID for router R1. But what is "Node_SID(P)"? P isn't a router, it's a set of routers, or a space.
Please clarify, whether with a definition or otherwise. Probably your clarification will fix both this and the previous point. While you're at it, you might as well make your terminology consistent. In one of the node SID cases above you use an underscore, and in the other, you don't. Also, your node SID notation is inconsistent with your adjacency SID notation, which looks like AdjSID_R1R2; so in one case an underscore and parentheses, in another case parentheses with no underscore, and in the final case an underscore with no parentheses. Pick one. (And then there's Section 12 with "node-SID"...)

### Sections 6.1, 6.2, 6.3, SHOULD NOT use SHOULD

What work are the SHOULDs doing here? Considering that in other parts of the document you leave the computation of the repair path up to the implementation, why are you mandating it here? And, if you're mandating it, why not mandate it all the way with a MUST? It seems to me that if you don't want to mandate implementation, it would be sensible to take the RFC 2119 language out altogether. If you do want to mandate implementation, I don't see why you wouldn't make this a MUST. If you really do want it to be SHOULD, please explain your reasoning.

### Section 7, primary outgoing interface

The existence of a "primary outgoing interface" seems to imply the existence of a secondary outgoing interface, tertiary outgoing interface, etc. Please define primary outgoing interface, or if this isn't an important distinction, consider whether you can simplify to just say "outgoing interface".

### Section 7.1, first

> The active segment becomes the first segment of the repair list.

By "first" do you mean "first to be pushed, last to be processed"? If so, I suggest clarifying that in the text, because the plain English reading of "first" is the opposite.

### Section 7.2, "as stated"... where?

> As stated in Section 2, when SR policies are involved and a strict compliance of the policy is required, an end-to-end protection should be preferred over a local repair mechanism.

I don't see this in Section 2 (I searched for "end-to-end", "prefer", "policy" and "policies"). Can you help me understand what text in Section 2 you're talking about?

### Section 7.2.1, what's an Adj()?

Please define Adj(). Is this yet another terminology variation? (cf. Section 6 comment on AdjSID_R1R2)

### Section 8.1, what's the tail end of a node segment?

> 1. If the active segment is a node segment that has been signaled with penultimate hop popping and the repair list ends with an adjacency segment terminating on the tail-end of the active segment, then the active segment MUST be popped before pushing the repair list.

What is the tail end of a node segment? I can't figure out what that means. I think I know what you're trying to say, but please find a way to reword it that doesn't end up making me try to parse the above with my "what did the authors actually *mean*?" glasses on.

### Section 8.2 and others, description of SRv6 behaviors

RFC 8754 describes forwarding behaviors using a kind of line-numbered pseudocode, and later documents that modify forwarding behaviors specify updates to the pseudocode. (Examples: RFC 8986, draft-ietf-spring-srv6-srh-compression-15, draft-ietf-rtgwg-srv6-egress-protection-16, draft-ietf-spring-sr-redundancy-protection-03, draft-ietf-spring-srv6-path-segment-07) You don't do this; you use a descriptive approach instead. I'm OK with this in isolation, but I'd like to know if you made an affirmative decision to diverge from the usual SRv6 way of doing things, and if so, why, and if the SPRING working group specifically considered this and is OK with it.

### Section 8.2, shorter than what?

> In such case, there is no need for a preceding Prefix SID and the resulting repair list is likely shorter.

Shorter than what?
This is the first place in this document the string "Prefix SID" occurs, so I'm confused.

### Section 11, limit the implementation of local FRR policies

> Based on this assumption, in order to facilitate the operation of FRR, and limit the implementation of local FRR policies

Do you mean "limit the need for implementation of local FRR policies"? (And you could drop "implementation of", for that matter.)

### Section 11, TI-LFA and SR policies don't mix?

The last paragraph, regarding the use of SR policies, and also Section 9, leave me wondering whether a simpler statement would be that TI-LFA is inappropriate for use in a network that makes use of SR policies. Is this a fair characterization?

### Section 12, can't this be an appendix?

Shouldn't this be an Appendix? In general, that seems like a common (and good!) practice for inessential information like this, especially when it has a potentially limited shelf-life. Also, although I appreciate that you provided some rudimentary parameterization of the topologies in Table 1, I think it would be helpful to at least say what time period the topologies reflect. draft-francois-rtgwg-segment-routing-ti-lfa-00 dates to summer of 2015; are we talking about the topologies that were in vogue in 2015? Those of 2023? Etc.

### Section 12, granularity wut

> We do not cover the case for 2 SIDs (Section 6.3) separately because there was no granularity in the result.

I don't understand what this means. Can you rephrase it? Generally, I find the words "granular", "granularity", and related have almost zero descriptive power. :-(

### Section 12, "2 or more" or "2", "3", and no more?

In your description of the table, you say,

> The convention that we use is as follows
>
> * 0 SIDs: the calculated repair path starts with a directly ...
> * 1 SIDs: the repair node is a PQ node, in which case only 1 SID is ...
> * 2 or more SIDs: The repair path consists of 2 or more SIDs as described in Section 6.3 and Section 6.4.
>
> We do not cover the case for 2 SIDs (Section 6.3) separately ...

But the table headers show:

    +-------------+------------+------------+------------+------------+
    | Network     | 0 SIDs     | 1 SID      | 2 SIDs     | 3 SIDs     |
    +-------------+------------+------------+------------+------------+

I.e., the table headers don't show "2 or more"; they show 2 and 3, broken out distinctly, and no "or more" case. Seems like these need to be reconciled.

### Section 13, guaranteed upper bound

> The techniques described in this document are internal functionalities to a router that result in the ability to guarantee an upper bound on the time taken to restore traffic flow upon the failure of a directly connected link or node.

This is the only place in the document where you talk about a guaranteed upper bound. This is a fairly strong promise to make; I think you shouldn't be mentioning it unless you provide some kind of support for how the guarantee is provided. Note, I don't question that TI-LFA can be part of the machinery providing such a guarantee, but without showing your work I don't think you can make this claim.

## Notes

This review is in the ["IETF Comments" Markdown format][ICMF]. You can use the [`ietf-comments` tool][ICT] to automatically convert this review into individual GitHub issues.

[ICMF]: https://github.com/mnot/ietf-comments/blob/main/format.md
[ICT]: https://github.com/mnot/ietf-comments
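The Section 7.1 ("first") and Section 8.1 ("tail end") comments above both concern how a repair list is imposed on a packet. The following Python sketch shows one possible reading, not the draft's data model: the repair list is written in processing order, so its first element is the last one pushed, while the original active segment, pushed earlier, stays underneath and is processed last. The PHP pop rule is applied with "tail-end" taken to mean the node that advertised the node SID. The `Segment` class and `impose_repair_list` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical segment model, for illustration only."""
    kind: str          # "node" or "adj"
    node: str          # node SID: the advertising node; adj SID: the far end of the adjacency
    php: bool = False  # node SID signaled with penultimate hop popping


def impose_repair_list(stack: List[Segment], repair_list: List[Segment]) -> List[Segment]:
    """Return the new stack, index 0 = top (the segment processed first).

    repair_list is in processing order: repair_list[0] is acted on first,
    so it is the LAST segment pushed.
    """
    new_stack = list(stack)
    active = new_stack[0] if new_stack else None
    last = repair_list[-1] if repair_list else None
    # One reading of the Section 8.1 rule: pop a PHP node segment when the
    # repair list already ends with an adjacency onto that segment's node.
    if (active is not None and last is not None
            and active.kind == "node" and active.php
            and last.kind == "adj" and last.node == active.node):
        new_stack.pop(0)
    # Push from the end of the repair list so repair_list[0] ends up on top.
    for segment in reversed(repair_list):
        new_stack.insert(0, segment)
    return new_stack


# Active segment: PHP node SID of D. Repair list: node SID of P, then an adjacency onto D.
active_d = Segment("node", "D", php=True)
repair = [Segment("node", "P"), Segment("adj", "D")]
print([(s.kind, s.node) for s in impose_repair_list([active_d], repair)])
# [('node', 'P'), ('adj', 'D')]
```

If the node SID of D were not signaled with PHP, the same call would leave it at the bottom of the stack, below the two repair segments.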
John commented on this, but I think it's worth discussing briefly for clarification: there are several unsupported SHOULDs in this document. By "unsupported" I'm referring to the fact that SHOULD presents the implementer with a choice, and it's in our best interests to provide them with enough context to make an informed one.

For instance, in Section 6.1:

> When a remote node R is in P(S,X) and Q(D,x) and on the post-convergence path, the repair list SHOULD be made of a single node segment to R and the outgoing interface SHOULD be set to the outgoing interface used to reach R.

Why is this a SHOULD? Will this still interoperate properly if the implementer doesn't do this? If so, why not use MAY? If not, why not use MUST? Why/when might someone implementing this specification legitimately not do what it says here?

I have the same questions about the SHOULDs in 6.2 and 6.3. The one in Section 9 is closer to a more complete presentation.

If the reason for doing this is to accommodate extant deployments that don't, I suggest something like "New implementations MUST do X, but for backward compatibility SHOULD continue to support Y which the deployed base uses".

I saw some replies to John's comments along the lines of "computation are part of the implementation", but I didn't understand how that answers the question.

Thanks for this work. It was an interesting read, which is uncommon for those of us up in Applications space where the air is thin.

I have the same question as others about having six authors on this document. I concur in particular with Eric's comments.

I have some concerns with the shepherd writeup. Question #11 is incomplete; question #13 has me concerned that the question was not directly asked of the authors.

Section 3 defines "SPT_old()", but it doesn't appear anywhere else in this document. Also, it seems to me the definitions of "Primary interface" and "Primary link" could be merged, because the former term doesn't appear anywhere in the document other than in the definition of the latter.

And a very minor point: we define "adj-sid()", but throughout the document it's variably that or "Adj-Sid()". We should probably pick one and use it consistently.
[8-May-2024 - Updated Position: No Objection]

# Gunter Van de Velde, RTG AD, comments for draft-ietf-rtgwg-segment-routing-ti-lfa-13

[Resolved] Please find below two blocking DISCUSS points (easy to address), and a series of non-blocking COMMENTs and some nits.

Many thanks for the RTGDIR reviews from Stewart Bryant, Andy Smith and Ben Niven-Jenkins during the 7-year development period of the TI-LFA specification. Also many thanks for the shepherd write-up by Stewart Bryant, which provides a brief overview of the progress of the draft through the WG and the current state of the art.

Thank you to the authors of this document. I really appreciate the effort and believe it captures the TI-LFA normative procedures well. Reviewing it with fresh eyes, I've made several comments that could help further improve the quality. I hope these insights will be valuable for the authors and the Working Group as you continue to refine the document.

DISCUSS:
========

[Resolved] DISCUSS#1

In section '9. TI-LFA and SR algorithms' I found the text written from an sr-mpls perspective. SRv6 has different considerations.

637 and Q-Space as well as the post-convergence path. An implementation
638 MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that
639 are unprotected to build the repair list.

The above seems written from an sr-mpls perspective. For SRv6 the Adj-SID is bound to a Locator and consequently bound to an algorithm. As a result, the observed limitation of sr-mpls does not really apply to SRv6. For SRv6 an implementation can use a protected Adj-SID in the repair path without breaking algorithm-aware topology requirements. Consider allowing protected SRv6 Adj-SIDs for TI-LFA.

In addition, consider adding some text about Adj-SIDs and locators in "section 8.2. SRv6 dataplane considerations". With sr-mpls there is no correlation to the segment routing algorithm; however, when using the SRv6 dataplane, the Adj-SID Locator is correlated to an algorithm.
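To illustrate the distinction DISCUSS#1 draws between SR-MPLS and SRv6, here is a hypothetical Python filter for candidate repair-list SIDs. With `srv6_locator_rule=False` it follows the quoted Section 9 text (Node-SIDs bound to the FlexAlgo, or unprotected Adj-SIDs); with `srv6_locator_rule=True` it also applies the proposed relaxation, in which a protected SRv6 Adj-SID is usable when its locator is bound to the same algorithm. The `Sid` record, the function and the parameter name are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sid:
    """Hypothetical SID record, for illustration only."""
    kind: str                # "node" or "adj"
    dataplane: str           # "mpls" or "srv6"
    algo: Optional[int]      # algorithm the SID is bound to (None for an SR-MPLS Adj-SID)
    protected: bool = False  # only meaningful for Adj-SIDs


def usable_for_repair(sid: Sid, flex_algo: int, srv6_locator_rule: bool = False) -> bool:
    """Can this SID appear in a repair list computed for flex_algo?"""
    if sid.kind == "node":
        # Quoted text: Node-SIDs must be bound to the FlexAlgo.
        return sid.algo == flex_algo
    if not sid.protected:
        # Quoted text: unprotected Adj-SIDs may be used.
        return True
    # Protected Adj-SID: excluded by the quoted text. Under the proposed SRv6
    # relaxation it is allowed when its locator is bound to the same algorithm.
    return srv6_locator_rule and sid.dataplane == "srv6" and sid.algo == flex_algo


protected_srv6_adj = Sid("adj", "srv6", algo=128, protected=True)
print(usable_for_repair(protected_srv6_adj, 128))                          # False
print(usable_for_repair(protected_srv6_adj, 128, srv6_locator_rule=True))  # True
```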
[Resolved] DISCUSS#2

Sections 11 and 12 do not introduce any supplementary artifacts to the normative procedures outlined for TI-LFA. The information within sections 11 and 12 is provided in extensive detail. Should the Working Group (WG) prefer to maintain this level of specificity, it is advisable to consider relocating the detailed content to an appendix unless there is a strong reason to keep it in the main body of the document.

High level comments:
====================

* TI-LFA is based upon Segment Routing; however, the document mostly uses sr-mpls dataplane type language. The SRv6 dataplane is first mentioned on line 493, almost halfway through the document. Maybe consider mentioning support for the SRv6 dataplane earlier on.
* 6 people on front page. Did all authors edit text in the draft?
* The operational impact discussion may want to explicitly mention that there is no interop complexity, because TI-LFA is a node-local operation.
* The document makes use of the term 'we' and other anthropomorphism. Maybe not the best approach in a formal document. Who is 'we'? Editor, authors, WG, IETF community, operators, etc.? Policies have no awareness or emotions.

Detailed review COMMENTS ([minor] and [major])
==============================================

(Line numbers are rendered using idnits rendering)

19 This document presents Topology Independent Loop-free Alternate Fast
20 Re-route (TI-LFA), aimed at providing protection of node and
21 adjacency segments within the Segment Routing (SR) framework. This

[minor] s/Re-route/Reroute/

[major] The description indicates that TI-LFA provides protection of node and adjacency segments. It does not specify what 'protection' is all about, or that 'protection' is constrained to single link|node failures; i.e., RFC 5286 has explicit text in the abstract about single failure applicability.

24 (DLFA). It extends these concepts to provide guaranteed coverage in
25 any two connected networks using a link-state IGP. A key aspect of

[major] In this sentence 'two connected networks' is referenced, while earlier in the paragraph there is an indication of 'protection of node and adjacency segments'. How do two connected networks correlate with the segments?

25 any two connected networks using a link-state IGP. A key aspect of
26 TI-LFA is the FRR path selection approach establishing protection
27 over the expected post-convergence paths from the point of local
28 repair, reducing the operational need to control the tie-breaks among
29 various FRR options.

[minor] Suggested rewrite to make the text more readable:

A principal attribute of TI-LFA is the FRR path selection methodology, which establishes protection over the anticipated post-convergence paths from the point of local repair. This approach diminishes the operational necessity to manage the tie-breaks among various FRR alternatives.

[minor] Why is the path selection better? Can a hint be given as to why it is better, beyond a statement proclaiming that it is better?

138 * TI-LFA: Topology Independant LFA.

[minor] s/Independant/Independent/

144 Segment Routing aims at supporting services with tight SLA guarantees
145 [RFC8402]. By relying on SR this document provides a local repair

[major] The term SLA does not appear even once in RFC 8402. How can the claim of tight SLA be justified with RFC 8402? Can a better pointer for the claim be inserted?

[minor] s/Segment Routing/Segment Routing (SR)/

145 [RFC8402]. By relying on SR this document provides a local repair
146 mechanism for standard link-state IGP shortest path capable of
147 restoring end-to-end connectivity in the case of a sudden directly
148 connected failure of a network component. Non-SR mechanisms for

[minor] Readability rewrite:

This document outlines a local repair mechanism that leverages Segment Routing (SR) to restore end-to-end connectivity in the event of an abrupt failure involving a directly connected network component.
This mechanism is designed for standard link-state Interior Gateway Protocol (IGP) shortest path scenarios.

153 The term topology independent (TI) refers to the ability to provide a
154 loop free backup path irrespective of the topologies used in the
155 network. This provides a major improvement compared to LFA [RFC5286]
156 and remote LFA [RFC7490] which cannot provide a complete protection
157 coverage in some topologies as described in [RFC6571].

[minor] I think what is being said is:

The term topology independent (TI) describes the capability of providing a loop-free backup path that is effective across all network topologies. This represents a significant enhancement over Loop-Free Alternate (LFA) [RFC5286] and Remote LFA as outlined in [RFC7490], both of which do not offer comprehensive protection coverage in certain topological configurations as detailed in [RFC6571]. TI-LFA ensures the availability of a backup path if a post-convergence path exists, regardless of the network topology.

167 TI-LFA is a local operation applied by the PLR when it detects
168 failure of one of its local links. As such, it does not affect:

[minor] It would be welcome to explicitly spell out that TI-LFA is protection against a single local link failure.

[minor] It was mentioned that TI-LFA provides protection against link and node failure. In this section the abrupt failure of a link is mentioned as the FRR trigger. How is node protection with TI-LFA achieved, and how is the PLR triggered when a neighboring node is no longer operational? It is elaborated upon later in this section, but maybe a brief hint could be provided here too?

167 TI-LFA is a local operation applied by the PLR when it detects
168 failure of one of its local links. As such, it does not affect:

170 * Micro-loops that appear - or do not appear – as part of the
171 distributed IGP convergence [RFC5715] on the paths to the
172 destination that do not pass thru TI-LFA paths:

174 - As explained in [RFC5714], such micro-loops may result in the
175 traffic not reaching the PLR and therefore not following TI-LFA
176 paths.

178 * Micro-loops that appear – or do not appear - when the failed link
179 is repaired.

[minor] This does not parse very well. I tried reading this paragraph a few times and believe what is mentioned could be rewritten as follows:

"TI-LFA operates locally at the Point of Local Repair (PLR) upon detecting a failure in one of its direct links. Consequently, this local operation does not influence:

* Micro-loops that may or may not form during the distributed Interior Gateway Protocol (IGP) convergence as delineated in RFC 5715.
  - These micro-loops occur on routes directed towards the destination that do not traverse TI-LFA-configured paths. According to [RFC5714], the formation of such micro-loops can prevent traffic from reaching the PLR, thereby bypassing the TI-LFA paths established for rerouting.
* Micro-loops that may or may not develop when the previously failed link is restored to functionality.

This specification highlights that while TI-LFA effectively addresses specific link failures, it does not extend its impact to managing micro-loops associated with broader IGP convergence issues or subsequent link repairs."

181 TI-LFA paths are loop-free. What’s more, they follow the post-
182 convergence paths, and, therefore, not subject to micro-loops due to
183 difference in the IGP convergence times of the nodes thru which they
184 pass.

[minor] This is a rather informal writing style. What about the following:

TI-LFA paths are inherently loop-free and align with post-convergence routes. Consequently, they are not susceptible to micro-loops that may arise due to variations in the IGP convergence times across different nodes through which these paths traverse. This ensures a stable and predictable routing environment, minimizing disruptions typically associated with asynchronous network behavior.

186 TI-LFA paths are applied from the moment the PLR detects failure of a
187 local link and until IGP convergence at the PLR is completed.

[minor] Readability rewrite:

TI-LFA paths are activated from the instant the PLR detects a failure in a local link and remain in effect until the Interior Gateway Protocol (IGP) convergence at the PLR is fully achieved.

190 micro-loops, especially if these paths have been computed using the
191 methods described in Section Section 6.2, Section 6.3, or Section 6.4
192 of the draft. One of the possible ways to prevent such micro-loops

[minor] Instead of simply referencing sections 6.2, 6.3 and 6.4, maybe list the conditions under which this occurs, combined with the section references. This could be something in the style of 'if the FRR path is not using a direct neighbor then... etc.'

206 For each destination in the network, TI-LFA pre-installs a backup

[minor] What does destination exactly mean? Is that a /32 or /128 node? Or is it router-ids? Any other abstraction intended?

224 By using SR, TI-LFA does not require the establishment of TLDP
225 sessions (Targeted Label Distribution Protocol) with remote nodes in
226 order to take advantage of the applicability of remote LFAs (RLFA)
227 [RFC7490][RFC7916] or remote LFAs with directed forwarding
228 (DLFA)[RFC5714]. All the Segment Identifiers (SIDs) are available in
229 the link state database (LSDB) of the IGP. As a result, preferring
230 LFAs over RLFAs or DLFAs, as well as minimizing the number of RLFA or
231 DLFA repair nodes is not required anymore.
[minor] possible rewrite for readability and simplicity: " By utilizing Segment Routing (SR), TI-LFA eliminates the need to establish Targeted Label Distribution Protocol (TLDP) sessions with remote nodes for leveraging the benefits of Remote Loop-Free Alternates (RLFA) [RFC7490][RFC7916] or Directed Loop-Free Alternates (DLFA) [RFC5714]. All the Segment Identifiers (SIDs) required are present within the Link State Database (LSDB) of the Interior Gateway Protocol (IGP). Consequently, there is no longer a necessity to prefer LFAs over RLFAs or DLFAs, nor is there a need to minimize the number of RLFA or DLFA repair nodes. " 233 By using SR, there is no need to create state in the network in order 234 to enforce an explicit FRR path. This relieves the nodes themselves 235 from having to maintain extra state, and it relieves the operator 236 from having to deploy an extra protocol or extra protocol sessions 237 just to enhance the protection coverage. [minor] what about this blob of text: " Utilizing SR makes the requirement unnecessary to establish additional state within the network for enforcing explicit Fast Reroute (FRR) paths. This alleviation spares the nodes from maintaining supplementary state and frees the operator from the necessity to implement additional protocols or protocol sessions solely to augment protection coverage. " 239 Although not a Ti-LFA requirement or constraint, TI-LFA also brings s/Ti-LFA/TI-LFA/ 242 reduces the need of locally configured policies that drive the backup [minor] unsure what is meant with 'drive' means here. Would it be better to day that 'describe the backup...' 243 path selection ([RFC7916]). The easiest way to express the expected 244 post-convergence path in a loop-free manner is to encode it as a list 245 of adjacency segments. However, this may create a long SID list that [major] you write 'is to encode it'. What is the 'it'? I understand this is a suggesting Adj SIDs. 
I also believe that simply having a list of Adj SIDs is not sufficient; an "ordered" list of Adj SIDs is needed.

245 of adjacency segments. However, this may create a long SID list that
246 some hardware may not be able to push. One of the challenges of TI-

[minor] Should we say 'push' or 'program'? 'Push' seems specific to the SR-MPLS data plane, while TI-LFA is also applicable to SRv6.

248 adjacency segments and node segments. Each implementation will be
249 free to have its own SID list optimization algorithm. This document
250 details the basic concepts that could be used to build the SR backup
251 path as well as the associated dataplane procedures

Possible rewrite:

"Each implementation may independently develop its own algorithm for optimizing the ordered SID list. This document outlines the fundamental concepts applicable to constructing the SR backup path, along with the related dataplane procedures."

288 We define the main notations used in this document as the following.

290 We refer to "old" and "new" topologies as the LSDB state before and
291 after the considered failure.

[minor] I would prefer not to use the word 'we'. It is undefined who that is: the editors, the authors, the WG, the Internet community, etc.

286 3. Terminology

[minor] Would Section 3 be better located before Section 2 for clarity?

[major] Later in the document, P(S,X) and Q(D,X) are used, while the Terminology section only documents P(R,X). Maybe add some text to clarify the intended use.

321 EP(P, Q) is an explicit SR-based path from a node P to a node Q.

[minor] Why not simply use 'SR path' instead of 'SR-based path'? Does the suffix '-based' add any value?

335 An implementation is free to use any local optimization to provide
336 smaller SID lists by combining Node SIDs and Adjacency SIDs. In

[minor] The intent seems to be to integrate Adj SIDs and Node SIDs into the SID lists.
I am not sure that we are combining multiple SIDs into fewer SIDs: "An implementation may employ any local optimization strategy to reduce the size of SID lists by integrating Node SIDs and Adjacency SIDs into the SID lists."

342 5. Intersecting P-Space and Q-Space with post-convergence paths
343
344 One of the challenges of defining an SR path following the expected
345 post-convergence path is to reduce the size of the segment list. In

[minor] The end of Section 4 states "These optimizations are out of scope of this document," and then the first paragraph of Section 5 identifies reducing the SID list as one of the challenges. For something that is out of scope of the document, it is perceived as a rather important and tough problem to address. If it is truly out of scope of this document, then maybe state explicitly that Section 5 is entirely informational.

[minor] In some places the term 'segment lists' is used, in others 'SID lists'. Could a single term be used throughout the document?

[major] The Terminology section explains the P-space, extended P-space, and Q-space. I am not sure why all of this is explained again in more explicit steps. It makes me wonder whether Section 5 can be reduced by reusing the Terminology in Section 3 and focusing on those definitions.

356 We want to determine which nodes on the post-convergence path from

[minor] Who is 'we'?

358 regard to resource X (X can be a link or a set of links adjacent to
359 the PLR, or a neighbor node of the PLR).

[minor] Section 3 (Terminology) defines resource X differently: 'resource X (e.g. a link S-F, a node F, or a SRLG)'. Which one is correct? Maybe reuse the Terminology definition for consistency.

378 This can be found by intersecting the set of nodes belonging to the
379 post-convergence path from R to D, assuming the failure of X, with
380 Q(D, X).

[minor] In Section 3 (Terminology), Q(R, X) is described using 'R', while this section (5.2) uses Q(D, X) with 'D'.
Is this intentional? Why not add this to the Terminology section as well? Or make the Terminology section agnostic to the letter used (e.g. 'R' or 'D') and describe the intent of the Q(...) function?

397 protected resource X and, at the same time, is guaranteed to be loop-
398 free irrespective of the state of FIBs along the nodes belonging to
399 the explicit path. Thus, there is no need for any co-ordination or

[minor] There is an assumption here that only SR programs the FIB. There may be out-of-band FIB programming that does cause loops. Maybe frame the claim better by stating the assumption under which the paths are guaranteed to be loop-free.

460 6.2. FRR path using a PQ node

[minor] Is there a reason why there are no considerations for an implementer on whether to select the PQ node closest to S or the one closest to D?

499 interface for the packet, S-F. The failure of the primary outgoing

[minor] What is the 'F' in S-F?

512 We define hereafter the FRR behavior applied by S for any packet
513 received with an active adjacency segment S-F for which protection
514 was enabled. As protection has been enabled for the segment S-F and
515 signaled in the IGP (for instance using protocol extensions from
516 [RFC8667] and [RFC8665]), any SR policy using this segment knows that
517 it may be transiently rerouted out of S-F in case of S-F failure.

[minor] A policy is a configuration. A policy does not 'know' anything. Can the statement be made without anthropomorphism?

637 and Q-Space as well as the post-convergence path. An implementation
638 MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that
639 are unprotected to build the repair list.

[major] This is written from an SR-MPLS perspective. For SRv6, the Adj is bound to an algorithm, and this condition does not apply.

647 S --- R2 --- R3 --- R4 --- R5 --- D
648 \ | \ /
649 R7 -- R8
650 | |
651 R9 -- R10

653 Figure 2

655 In Figure 2, all the metrics are equal to 1 except
656 R2-R7,R7-R8,R8-R4,R7-R9 which have a metric of 1000.
Considering R2

[minor] The drawing here is in a different style from Figure 1, where '-' and '*' are used to visualize the different link metrics. Maybe a consistent drawing style should be used throughout the document?

665 To avoid the possibility of this double FRR activation, an
666 implementation of TI-LFA MAY pick only non protected adjacency
667 segments when building the repair list. However, this is important

[minor] While double failures may initially sound like an exotic event, they may be more frequent than initially assumed when SRLGs are considered. In some operator networks, multiple links use the same optical cable, and if one fiber gets cut, many links may be impacted, causing double failures. It may be worth mentioning that double failures are not as rare as one may believe.

676 11. Advantages of using the expected post-convergence path during FRR

[minor] This section is a complex, detailed read and seems overly detailed for its purpose. Can the description of the advantages be simplified? Is this level of detail necessary at this place in the document? Alternatively, consider moving this section into an appendix.

Consider removing anthropomorphism in this section (e.g. 'TI-LFA cannot be aware of such path constraints and'). TI-LFA has no awareness; it may, however, be opaque to constraints.

783 12. Analysis based on real network topologies

[major] Consider placing this section into an appendix. The shared information does not add additional considerations to the TI-LFA procedure description.
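For readers following the P-space/Q-space comments above, here is a minimal Python sketch of intersecting P(S, X) and Q(D, X) with the post-convergence path. It assumes link protection of a link S-F, symmetric metrics, a connected topology, and the RFC 7490 definitions of P-space and Q-space, under which all equal-cost shortest paths must avoid the protected link. The toy topology and all function names (`spf`, `p_space`, `q_space`, `post_convergence_path`) are hypothetical and are not taken from the draft. This is an illustration, not the draft's algorithm.

```python
import heapq

# Hypothetical toy topology (undirected, symmetric metrics); not from the draft.
GRAPH = {
    "S": {"F": 1, "A": 1},
    "F": {"S": 1, "D": 1},
    "A": {"S": 1, "B": 1},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "D": 1},
    "D": {"F": 1, "C": 1},
}


def spf(graph, src, failed=None):
    """Dijkstra from src, optionally ignoring the link `failed` = (u, v).
    Returns (distance map, predecessor map)."""
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if failed and {u, v} == set(failed):
                continue  # link removed by the simulated failure
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def p_space(graph, s, f):
    """P(S, X) for X = link S-F: nodes that S reaches with no pre-failure
    shortest path (equal-cost splits included) crossing S-F."""
    ds, _ = spf(graph, s)
    df, _ = spf(graph, f)
    return {n for n in graph if ds[n] < graph[s][f] + df[n]}


def q_space(graph, d, s, f):
    """Q(D, X) for X = link S-F: nodes whose pre-failure shortest paths to D
    (equal-cost splits included) avoid S-F in either direction."""
    dist = {n: spf(graph, n)[0] for n in graph}
    w = graph[s][f]
    return {
        n for n in graph
        if dist[n][d] < min(dist[n][s] + w + dist[f][d],
                            dist[n][f] + w + dist[s][d])
    }


def post_convergence_path(graph, s, d, f):
    """Shortest path from S to D once link S-F has been removed."""
    _, prev = spf(graph, s, failed=(s, f))
    path = [d]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1]


path = post_convergence_path(GRAPH, "S", "D", "F")
P = p_space(GRAPH, "S", "F")
Q = q_space(GRAPH, "D", "S", "F")
print("post-convergence path:", path)  # ['S', 'A', 'B', 'C', 'D']
print("PQ nodes on path:", [n for n in path if n in P and n in Q])  # ['B']
```

In this toy topology, P(S, X) is {S, A, B} and Q(D, X) is {B, C, D, F}. B is therefore the only PQ node on the post-convergence path, which suggests that a single Node-SID for B would be enough here. The strict inequalities are what exclude nodes reachable over an equal-cost split through S-F.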
Thank you to Roni Evans for the GENART review.

** Sections 6.1–6.3 prescribe behavior that SHOULD happen. What is the consequence if that guidance is not followed?

** Section 9

An implementation MAY support TI-LFA to protect Node-SIDs associated to a FlexAlgo. In such a case, rather than computing the expected post-convergence path based on the regular SPF, an implementation SHOULD use the constrained SPF algorithm bound to the FlexAlgo (using the Flex Algo Definition) instead of the regular Dijkstra in all the SPF/rSPF computations that are occurring during the TI-LFA computation.

Why isn’t the above SHOULD a MUST? If an implementation uses a FlexAlgo (per the first sentence), in what case would it not use the constrained SPF algorithm bound to the FlexAlgo?
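To show what is at stake in the SHOULD/MUST question above, here is a minimal Python sketch. It assumes a Flex Algo Definition reduced to a single exclude-any affinity constraint. Real Flex Algo Definitions (RFC 9350) carry more, for example the metric type and include constraints. The links, the colors, and `constrained_spf` are hypothetical and do not come from the draft.

```python
import heapq

# Hypothetical links: (u, v) -> (metric, affinity colors). Not from the draft.
LINKS = {
    ("S", "A"): (1, {"red"}),
    ("A", "D"): (1, set()),
    ("S", "B"): (2, set()),
    ("B", "D"): (2, set()),
}


def constrained_spf(links, src, exclude_any=frozenset(), failed=None):
    """Dijkstra restricted to links whose affinities avoid `exclude_any`
    (a simplified stand-in for a Flex Algo Definition); `failed` optionally
    removes one link to model the protected failure."""
    adj = {}
    for (u, v), (metric, colors) in links.items():
        if colors & exclude_any:
            continue  # pruned: link violates the algorithm's constraint
        if failed and {u, v} == set(failed):
            continue  # pruned: link is the simulated failure
        adj.setdefault(u, []).append((v, metric))
        adj.setdefault(v, []).append((u, metric))
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


print(constrained_spf(LINKS, "S")["D"])                       # 2 (via S-A-D)
print(constrained_spf(LINKS, "S", exclude_any={"red"})["D"])  # 4 (via S-B-D)
```

In this toy case, the regular SPF reaches D over the "red" link S-A, which the algorithm excludes. A backup path computed with the regular SPF could therefore leave the algorithm's topology, whereas the constrained SPF stays within it.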
# Éric Vyncke, INT AD, comments for draft-ietf-rtgwg-segment-routing-ti-lfa-13

Thank you for the work put into this document. The flow appears to be logical and the text well explained, but to be honest it is too specific and too acronym-heavy for me, i.e., my review is rather superficial and I am trusting the RTG ADs for their content review. Nevertheless, I like the clarity of section 10.

Please find below some non-blocking COMMENT points (but replies would be appreciated even if only for my own education), and some nits.

Special thanks to Stewart Bryant for the shepherd's detailed write-up including the WG consensus *but it lacks* the justification of the intended status. I like the `This is a deployed protocol.` ;-) OTOH, the justification for *6* authors is rather weak: `The document has taken seven year to get to this point and seems to have settled at this number of authors.`

I hope that this review helps to improve the document,

Regards,

-éric

# COMMENTS (non-blocking)

## Abstract

Should "IP-FRR" be expanded?

## Section 6

This section contains multiple "SHOULD" but does not explain when the "SHOULD" can be bypassed.

## Section 8.2

I am afraid I cannot parse `Then the packet is protected as if its were a transit packet.`

## Section 12

This is valuable information, of course, even if the actual networks are not referenced. Besides the depth of the SID list, I would have welcomed the number of additional repair entries required in the node (is it simply destinations * links?) as it could have an impact on the amount of state in the routers.

# NITS (non-blocking / cosmetic)

## Section 2

Usually acronyms are introduced *after* the expansion, e.g., not as in `TLDP sessions (Targeted Label Distribution Protocol)`

## Section 2.1

This BCP14 template would probably be better placed after the acronyms.