Early Review of draft-ietf-mpls-ri-rsvp-frr-18
review-ietf-mpls-ri-rsvp-frr-18-rtgdir-early-talaulikar-2024-05-27-00

Request
  Review of: draft-ietf-mpls-ri-rsvp-frr
  Requested revision: No specific revision (document currently at 22)
  Type: Early Review
  Team: Routing Area Directorate (rtgdir)
  Deadline: 2024-05-20
  Requested: 2024-05-01
  Requested by: Jim Guichard
  Authors: Chandrasekar R, Tarek Saad, Ina Minei, Dante Pacella
  I-D last updated: 2024-05-27
  Completed reviews:
    Secdir Early review of -17 by David Mandelberg
    Rtgdir Early review of -18 by Ketan Talaulikar
    Rtgdir Last Call review of -05 by Julien Meuric
    Genart Last Call review of -07 by Reese Enghardt
    Rtgdir Last Call review of -16 by Carlos Pignataro

Assignment
  Reviewer: Ketan Talaulikar
  State: Completed
  Request: Early review on draft-ietf-mpls-ri-rsvp-frr by Routing Area Directorate Assigned
  Posted at: https://mailarchive.ietf.org/arch/msg/rtg-dir/PRZJa7LH9b3J1aRFo3BvYZ3DQUQ
  Reviewed revision: 18 (document currently at 22)
  Result: Has issues
  Completed: 2024-05-27

Hello,

I've been asked to review this document for the RTGDIR.

Thanks to the authors and the WG members for a very well-written and thorough
document.

My main concern is the overloading of the I-flag with additional semantics
instead of defining a new flag; most of the major comments relate to
this. I believe the remaining comments are minor and should be
easy to address or clarify.

Please find my comments inline below in the idnits output (with portions
snipped out).

Thanks,
Ketan



15	Abstract

17	   The RSVP-TE Fast Reroute extensions specified in RFC 4090 defines two
18	   local repair techniques to reroute Label Switched Path (LSP) traffic
19	   over pre-established backup tunnel.  Facility backup method allows
20	   one or more LSPs traversing a connected link or node to be protected
21	   using a bypass tunnel.  The many-to-one nature of local repair
22	   technique is attractive from scalability point of view.  This
23	   document enumerates facility backup procedures in RFC 4090 that rely
24	   on refresh timeout and hence make facility backup method refresh-
25	   interval dependent.  The RSVP-TE extensions defined in this document
26	   will enhance the facility backup protection mechanism by making the
27	   corresponding procedures refresh-interval independent and hence
28	   compatible with Refresh-interval Independent RSVP (RI-RSVP) specified
29	   in RFC 8370.  Hence, this document updates RFC 4090 in order to
30	   support RI-RSVP capability specified in RFC 8370.

< major > Since RFC8370 does not formally update the base RSVP-TE specs, is it
appropriate for this document to formally update RFC4090? I would think that
it enhances RFC4090 rather than updates it. I'll let the responsible AD
determine whether this is appropriate.



360	4.1.  Requirement on RFC 4090 Capable Node to advertise RI-RSVP
361	      Capability

363	   A node supporting facility backup protection [RFC4090] MUST set the
364	   RI-RSVP flag (I bit) that is defined in Section 3.1 of RSVP-TE
365	   Scaling Techniques [RFC8370] only if it supports all the extensions
366	   specified in the rest of this document.  Hence, this document updates

< major > Was defining a new, separate capability considered instead of
overloading the existing I bit with additional requirements? That would have
taken care of some of the backward compatibility issues described later in
the document. There is no discussion of why this isn't being done; I assume
there is/was a good reason for it.

367	   [RFC4090] by defining extensions and additional procedures over
368	   facility backup protection [RFC4090] in order to advertise RI-RSVP
369	   capability [RFC8370].  However, if a node supporting facility backup
370	   protection [RFC4090] does set the RI-RSVP capability (I bit) but does
371	   not support all the extensions specified in the rest of this
372	   document, then it leaves room for stale state to linger around for an
373	   inordinate period of time given the long refresh intervals
374	   recommended by [RFC8370] or disruption of normal FRR operation.
375	   Procedures for backward compatibility (see Section 4.6.2.3 of this
376	   document) delves on this in detail.

378	4.2.  Signaling Handshake between PLR and MP

380	4.2.1.  PLR Behavior

382	   As per the facility backup procedures [RFC4090], when an LSP becomes
383	   operational on a node and the "local protection desired" flag has
384	   been set in the SESSION_ATTRIBUTE object carried in the Path message
385	   corresponding to the LSP, then the node attempts to make local
386	   protection available for the LSP.

388	   -  If the "node protection desired" flag is set, then the node tries
389	      to become a PLR by attempting to create a NP-bypass LSP to the
390	      NNhop node avoiding the Nhop node on protected LSP path.  In case
391	      node protection could not be made available, the node attempts to
392	      create an LP-bypass LSP to the Nhop node avoiding only the link
393	      that the protected LSP takes to reach the Nhop

395	   -  If the "node protection desired" flag is not set, then the PLR
396	      attempts to create an LP-bypass LSP to the Nhop node avoiding the
397	      link that the protected LSP takes to reach the Nhop

399	   With regard to the PLR procedures described above and that are
400	   specified in [RFC4090], this document specifies the following
401	   additional procedures to support RI-RSVP [RFC8370].

403	   -  While selecting the destination address of the bypass LSP, the PLR
404	      MUST select the router ID of the NNhop or Nhop node from the Node-
405	      ID sub-object included in the RRO object carried in the most
406	      recent Resv message corresponding to the LSP.  If the MP has not
407	      included a Node-ID sub-object in the Resv RRO and if the PLR and
408	      the MP are in the same area, then the PLR may utilize the TED to

< minor > Shouldn't this be MAY?

409	      determine the router ID corresponding to the interface address
410	      included by the MP in the RRO object.  If the NP-MP in a different
411	      IGP area has not included a Node-ID sub-object in RRO object, then
412	      the PLR MUST execute backward compatibility procedures as if the
413	      downstream nodes along the LSP do not support the extensions
414	      defined in the document (see Section 4.6.2.1).

416	   -  The PLR MUST also include its router ID in a Node-ID sub-object in
417	      RRO object carried in any subsequent Path message corresponding to
418	      the LSP.  While including its router ID in the Node-ID sub-object
419	      carried in the outgoing Path message, the PLR MUST include the
420	      Node-ID sub-object after including its IPv4/IPv6 address or
421	      unnumbered interface ID sub-object.

423	   -  In parallel to the attempt made to create NP-bypass or LP-bypass,
424	      the PLR MUST initiate a Node-ID based Hello session to the NNhop
425	      or Nhop node respectively along the LSP to establish the RSVP-TE
426	      signaling adjacency.  This Hello session is used to detect MP node
427	      failure as well as determine the capability of the MP node.  If
428	      the MP has set the I-bit in the CAPABILITY object [RFC8370]
429	      carried in Hello message corresponding to the Node-ID based Hello
430	      session, then the PLR MUST conclude that the MP supports refresh-
431	      interval independent FRR procedures defined in this document.  If
432	      the MP has not sent Node-ID based Hello messages or has not set
433	      the I-bit in CAPABILITY object [RFC8370], then the PLR MUST
434	      execute backward compatibility procedures defined in
435	      Section 4.6.2.1 of this document.

437	   -  When the PLR associates a bypass to a protected LSP, it MUST
438	      include a B-SFRR-Ready Extended Association object [RFC8796] and
439	      trigger a Path message to be sent for the LSP.  If a B-SFRR-Ready
440	      Extended Association object is included in the Path message
441	      corresponding to the LSP, the encoding and object ordering rules
442	      specified in RSVP-TE Summary FRR [RFC8796] MUST be followed.  In
443	      addition to those rules, the PLR MUST set the Association Source
444	      in the object to its Node-ID address.
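
For illustration, the capability check in the Hello-session bullet above
reduces to a small decision. The sketch below is not from the draft: the
names PlrMode, HelloState and plr_mode are hypothetical, and it only models
the two conditions quoted above (no Node-ID Hello received, or I-bit not
set). It also shows why the I-bit overloading matters: the I-bit is the only
input from which the PLR infers support for the FRR extensions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PlrMode(Enum):
    RI_RSVP_FRR = "ri-rsvp-frr"          # refresh-interval independent FRR procedures
    BACKWARD_COMPAT = "backward-compat"  # procedures of Section 4.6.2.1


@dataclass
class HelloState:
    """Hypothetical record of what the PLR learned over the Node-ID Hello session."""
    hello_received: bool  # MP has sent Node-ID based Hello messages
    i_bit_set: bool       # I-bit seen in the CAPABILITY object [RFC8370]


def plr_mode(mp_hello: Optional[HelloState]) -> PlrMode:
    """Pick the PLR procedures based solely on the MP's Hello behavior."""
    if mp_hello is None or not mp_hello.hello_received:
        return PlrMode.BACKWARD_COMPAT
    if not mp_hello.i_bit_set:
        return PlrMode.BACKWARD_COMPAT
    return PlrMode.RI_RSVP_FRR
```

An MP that sets the I-bit but does not implement the extensions is
indistinguishable, in this model, from one that does.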

446	4.2.2.  Remote Signaling Adjacency

448	   A Node-ID based RSVP-TE Hello session is one in which Node-ID is used
449	   in the source and the destination address fields of RSVP Hello
450	   messages [RFC4558].  This document extends Node-ID based RSVP Hello
451	   session to track the state of any RSVP-TE neighbor that is not
452	   directly connected by at least one interface.  In order to apply
453	   Node-ID based RSVP-TE Hello session between any two routers that are
454	   not immediate neighbors, the router that supports the extensions
455	   defined in the document MUST set TTL to 255 in all outgoing Node-ID
456	   based Hello messages exchanged between the PLR and the MP.  The
457	   default hello interval for this Node-ID hello session MUST be set to
458	   the default specified in RSVP-TE Scaling Techniques [RFC8370].

< minor > Is it possible that an RSVP Hello session already exists
between the PLR and MP (for reasons other than FRR)?

< major > Shouldn't some text be added to indicate when this RSVP
Hello session state needs to be cleaned up/removed?
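
To make the two questions above concrete, here is one possible way an
implementation could handle them; the draft does not specify this. The
HelloSessions class and the user labels are hypothetical. The sketch assumes
a session to a given neighbor is shared by all of its users (FRR for one or
more LSPs, or anything else) and is removed when the last user releases it.

```python
from typing import Dict, Set


class HelloSessions:
    """Hypothetical registry of Node-ID Hello sessions keyed by neighbor router ID."""

    def __init__(self) -> None:
        self._users: Dict[str, Set[str]] = {}

    def acquire(self, neighbor: str, user: str) -> bool:
        """Register `user` on the session to `neighbor`.

        Returns True if a new session had to be created, False if an
        existing session (possibly created for non-FRR reasons) was reused.
        """
        created = neighbor not in self._users
        self._users.setdefault(neighbor, set()).add(user)
        return created

    def release(self, neighbor: str, user: str) -> bool:
        """Drop `user`; returns True if the session was removed as a result."""
        users = self._users.get(neighbor)
        if users is None:
            return False
        users.discard(user)
        if users:
            return False  # still needed by another user
        del self._users[neighbor]
        return True
```

Whether the session should instead persist, or be torn down after a delay,
is exactly the kind of behavior the requested text would need to pin down.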


460	   In the rest of the document the term "signaling adjacency", or
461	   "remote signaling adjacency" refers specifically to the RSVP-TE
462	   signaling adjacency.


511	4.2.4.  "Remote" State on MP

513	   Once a router concludes it is the MP for a PLR running refresh-
514	   interval independent FRR procedures as described in the preceding
515	   section, it MUST create a remote path state for the LSP.  The only
516	   difference between the "remote" path state and the LSP state is the
517	   RSVP_HOP object.  The RSVP_HOP object in a "remote" path state
518	   contains the address that the PLR uses to send Node-ID hello messages
519	   to the MP.

521	   The MP MUST consider the "remote" path state corresponding to the LSP
522	   automatically deleted if:

524	   -  The MP later receives a Path message for the LSP with no matching
525	      B-SFRR-Ready Extended Association object corresponding to the
526	      PLR's IP address contained in the Path RRO, or

528	   -  The Node-ID signaling adjacency with the PLR goes down, or

< minor > I assume the above also includes "down" due to something like BFD?

530	   -  The MP receives backup LSP signaling for the LSP from the PLR or

< minor > Does the above assume that the PLR is signaling a different backup
LSP?

532	   -  The MP receives a PathTear for the LSP, or

534	   -  The MP deletes the LSP state on a local policy or an exception
535	      event

537	   The purpose of "remote" path state is to enable the PLR to explicitly
538	   tear down the path and reservation states corresponding to the LSP by
539	   sending a tear message for the "remote" path state.  Such a message
540	   tearing down "remote" path state is called "Remote" PathTear.

542	   The scenarios in which a "Remote" PathTear is applied are described
543	   in Section 4.5 of this document.
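
The deletion conditions listed in this section can be summarized as a set of
triggering events. The sketch below is illustrative only: the MpEvent names
are hypothetical labels for the five bullets quoted above, and the last
member is added purely as a contrasting event that does not delete the
state.

```python
from enum import Enum, auto


class MpEvent(Enum):
    PATH_WITHOUT_MATCHING_B_SFRR_READY = auto()  # Path lacks a matching B-SFRR-Ready object
    NODE_ID_ADJACENCY_DOWN = auto()              # Node-ID signaling adjacency with the PLR goes down
    BACKUP_LSP_SIGNALING_RECEIVED = auto()       # backup LSP signaling received from the PLR
    PATH_TEAR_RECEIVED = auto()                  # PathTear received for the LSP
    LOCAL_POLICY_OR_EXCEPTION = auto()           # LSP state deleted locally
    PATH_WITH_MATCHING_B_SFRR_READY = auto()     # contrast case: not a deletion trigger


# Events after which the MP considers the "remote" path state deleted.
DELETION_TRIGGERS = frozenset({
    MpEvent.PATH_WITHOUT_MATCHING_B_SFRR_READY,
    MpEvent.NODE_ID_ADJACENCY_DOWN,
    MpEvent.BACKUP_LSP_SIGNALING_RECEIVED,
    MpEvent.PATH_TEAR_RECEIVED,
    MpEvent.LOCAL_POLICY_OR_EXCEPTION,
})


def remote_state_deleted(event: MpEvent) -> bool:
    """True if the event implicitly deletes the "remote" path state at the MP."""
    return event in DELETION_TRIGGERS
```

The sketch deliberately does not model what causes the adjacency to go down
(Hello timeout, BFD, or otherwise), which is the open point in the comment
on the second bullet.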


869	4.6.  Backward Compatibility Procedures

871	   "Refresh interval Independent FRR" or RI-RSVP-FRR refers to the set
872	   of procedures defined in this document to eliminate the reliance of
873	   periodic refreshes.  The extensions proposed in RSVP-TE Summary FRR
874	   [RFC8796] may apply to implementations that do not support RI-RSVP-
875	   FRR.  On the other hand, RI-RSVP-FRR extensions relating to LSP state
876	   cleanup namely Conditional and "Remote" PathTear require support from
877	   one-hop and two-hop neighboring nodes along the LSP path.  So
878	   procedures that fall under LSP state cleanup category MUST NOT be
879	   turned on if any of the nodes involved in the node protection FRR
880	   i.e.  the PLR, the MP and the intermediate node in the case of NP,
881	   DOES NOT support RI-RSVP-FRR extensions.  Note that for LSPs
882	   requesting link protection, only the PLR and the LP-MP MUST support
883	   the extensions.

885	4.6.1.  Detecting Support for Refresh interval Independent FRR

887	   An implementation supporting RI-RSVP-FRR extensions SHOULD set the

< minor > I believe this is a MUST per RFC8370 sec 3.1, or am I missing
something?

888	   flag "Refresh interval Independent RSVP" or RI-RSVP flag in the
889	   CAPABILITY object carried in Hello messages as specified in RSVP-TE
890	   Scaling Techniques [RFC8370].  If an implementation does not set the
891	   flag even if it supports RI-RSVP-FRR extensions, then its neighbors
892	   will view the node as any node that does not support the extensions.

< major > The above seems to conflict with RFC8370 in that it changes
the meaning of the I-flag. See previous comments as well.

894	   -  As nodes supporting the RI-RSVP-FRR extensions initiate Node-ID
895	      based signaling adjacency with all immediate neighbors, such a
896	      node on the path of a protected LSP can determine whether its Phop
897	      and Nhop neighbors support RI-RSVP-FRR enhancements.

899	   -  As nodes supporting the RI-RSVP-FRR extensions also initiate Node-
900	      ID based signaling adjacency with the NNhop along the path of the
901	      LSP requested node protection (see Section 4.2.1 of this
902	      document), each node along the LSP path can determine whether its
903	      NNhop node supports RI-RSVP-FRR enhancements.  If the NNhop (a)
904	      does not reply to remote Node-ID Hello messages or (b) does not
905	      set the RI-RSVP flag in the CAPABILITY object carried in its Node-
906	      ID Hello messages, then the node acting as the PLR can conclude
907	      that NNhop does not support RI-RSVP-FRR extensions.

909	   -  If node protection is requested for an LSP and if (a) the PPhop
910	      node has not included a matching B-SFRR-Ready Extended Association
911	      object in its Path messages or (b) the PPhop node has not
912	      initiated remote Node-ID Hello messages or (c) the PPhop node does
913	      not set the RI-RSVP flag in the CAPABILITY object carried in its
914	      Node-ID Hello messages, then the node MUST conclude that the PLR
915	      does not support RI-RSVP-FRR extensions.
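
The detection rules in the last two bullets above amount to simple
conjunctions of observed conditions. A minimal sketch follows, using
hypothetical record and function names and covering only the conditions
quoted above. Note that the RI-RSVP flag is the only explicit capability
signal in either check.

```python
from dataclasses import dataclass


@dataclass
class NnhopObservations:
    """What a node acting as PLR has observed from its NNhop (hypothetical record)."""
    replied_to_remote_hello: bool
    ri_rsvp_flag_in_hello: bool


@dataclass
class PphopObservations:
    """What a node has observed from its PPhop for a node-protected LSP (hypothetical record)."""
    matching_b_sfrr_ready_in_path: bool
    remote_hello_initiated: bool
    ri_rsvp_flag_in_hello: bool


def nnhop_supports_ri_rsvp_frr(obs: NnhopObservations) -> bool:
    # Failing either (a) or (b) means "does not support".
    return obs.replied_to_remote_hello and obs.ri_rsvp_flag_in_hello


def pphop_plr_supports_ri_rsvp_frr(obs: PphopObservations) -> bool:
    # Failing any of (a), (b) or (c) means "does not support".
    return (
        obs.matching_b_sfrr_ready_in_path
        and obs.remote_hello_initiated
        and obs.ri_rsvp_flag_in_hello
    )
```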


980	4.6.2.3.  Advertising RI-RSVP without RI-RSVP-FRR

982	   If a node supporting facility backup protection [RFC4090] sets the
983	   RI-RSVP capability (I bit) but does not support the RI-RSVP-FRR
984	   extensions, then it leaves room for stale state to linger around for
985	   an inordinate period of time or disruption of normal FRR operation
986	   (see Section 3 of this document).  Consider the example topology
987	   Figure 1 provided in this document.

989	   -  Assume node B does set RI-RSVP capability in its Node-ID based
990	      Hello messages even though it does not support RI-RSVP-FRR
991	      extensions.  When B detects the failure of its Phop link along an
992	      LSP, it will not send Conditional PathTear to C as required by the
993	      RI-RSVP-FRR procedures.  If B simply leaves the LSP state without
994	      deleting, then B may end up holding on to the stale state until
995	      the (long) refresh timeout.

997	   -  Instead of node B, assume node C does set RI-RSVP capability in
998	      its Node-id based Hello messages even though it does not support
999	      RI-RSVP-FRR extensions.  When B details the failure of its Phop
1000	      link along an LSP, it will send Conditional PathTear to C as
1001	      required by the RI-RSVP-FRR procedures.  But, C would not
1002	      recognize the condition encoded in the PathTear and end up tearing
1003	      down the LSP.

1005	   -  Assume node B does set RI-RSVP capability in its Node-ID based
1006	      Hello messages even though it does not support RI-RSVP-FRR
1007	      extensions.  Also assume local repair is about to commence on node
1008	      B for an LSP that has only requested link protection.  That is, B
1009	      has not initiated the backup LSP signaling for the LSP.  If node B
1010	      receives a normal PathTear at this time from ingress A because of
1011	      a management event initiated on A, then B simply deletes the LSP
1012	      state without sending a Remote PathTear to the LP-MP C.  So, C may
1013	      end up holding on to the stale state until the (long) refresh
1014	      timeout.

< major > As mentioned in a previous comment, wouldn't all these backward
compatibility issues have been mitigated by using a new flag rather than the
existing I flag?


1089	6.2.  CONDITIONS Flags

1091	   Apart from allocating Class-Number for the CONDITIONS object, the
1092	   allocation of the Merge-point condition bit or M-bit (see Section 4.4
1093	   of this document) will also be done by IANA.

1095	   Flag: 0x1 Name: Merge-point condition bit or M-bit

< major > This seems like a new registry. What is the allocation policy for
it? On the other hand, from the picture this seems like a single flag, since
the rest is marked as reserved and not as a flags field. In that case, we
don't need anything from IANA, right?
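
As a sketch of the "single flag plus reserved bits" reading: the only value
taken from the draft is 0x1 for the M-bit. The 32-bit width of the word and
the function names are assumptions made for illustration.

```python
M_BIT = 0x1               # Merge-point condition bit, per Section 6.2 of the draft
WORD_MASK = 0xFFFFFFFF    # assumption: the flag lives in a 32-bit word


def m_bit_set(word: int) -> bool:
    """True if the M-bit is set in the (assumed 32-bit) word."""
    if not 0 <= word <= WORD_MASK:
        raise ValueError("word must fit in 32 bits")
    return bool(word & M_BIT)


def reserved_bits_clear(word: int) -> bool:
    """True if every bit other than the M-bit is zero."""
    if not 0 <= word <= WORD_MASK:
        raise ValueError("word must fit in 32 bits")
    return (word & ~M_BIT & WORD_MASK) == 0
```

Under this reading there is nothing further to allocate; if the remaining
bits are instead meant to be assignable flags, a registry and an allocation
policy would be needed.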