Networking Working Group
             Internet Draft                                        Reshad Rahman,
                                                                     Anca Zamfir,
                                                                     Junaid Israr
                                                                    Cisco Systems
             Document: draft-rahman-rsvp-restart-extensions.txt
             Expires: April 2004                                     October 2003
          
          
                           RSVP Graceful Restart Extensions
          
          
                      draft-rahman-rsvp-restart-extensions-00.txt
          
          
          Status of this Memo
          
             This document is an Internet-Draft and is in full conformance
             with all provisions of Section 10 of RFC2026.  Internet-Drafts
             are working documents of the Internet Engineering Task Force
             (IETF), its areas, and its working groups.  Note that other
             groups may also distribute working documents as Internet-
             Drafts.
             Internet-Drafts are draft documents valid for a maximum of six
             months and may be updated, replaced, or obsoleted by other
             documents at any time.  It is inappropriate to use Internet-
             Drafts as reference material or to cite them other than as
             "work in progress."
             The list of current Internet-Drafts can be accessed at
                  http://www.ietf.org/ietf/1id-abstracts.txt
             The list of Internet-Draft Shadow Directories can be accessed
             at
                  http://www.ietf.org/shadow.html.
          
          
          Abstract
          
             This document describes the extensions needed by certain
             features for the purpose of RSVP Graceful Restart. One of
             these extensions refers to the ability of a node to recover
             the ERO in the case it has performed an ERO expansion before
             control plane restart. Also a small modification is proposed
          
          
          
          Rahman, R., Zamfir, A., Israr, J.                            [Page 1]


          draft-rahman-rsvp-restart-extensions-00.txt              October 2003
          
          
             in the basic procedure to support simultaneous restarts of
             multiple nodes in a network. Specifically, a node should use a
             non-zero Recovery Time while in the recovery phase. This allows
             a node to determine at restart time whether any of its
             neighbors has previously restarted and is currently in the
             recovery phase.
          
          Conventions used in this document
          
                The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
             "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
             "OPTIONAL" in this document are to be interpreted as described
             in RFC 2119 [RFC2119].
          
          Sub-IP ID Summary
          
             (This section to be removed before publication.)
          
             SUMMARY
          
                This document specifies extensions and mechanisms to RSVP
             Graceful Restart to provide support for ERO Recovery and
             multiple node restart.
          
             WHERE DOES IT FIT IN THE PICTURE OF THE SUB-IP WORK?
          
                This work fits in the MPLS box.
          
             WHY IS IT TARGETED AT THIS WG?
          
                This draft is targeted at this WG because it specifies
             extensions to the RSVP-TE signaling protocol for control
             plane graceful restart.
          
             RELATED REFERENCES
          
             Please refer to the reference section.
          
          
          1.                 Terminology
          
             GR - Graceful Restart procedure for RSVP as specified in
             [RFC3473].
          
          
          
          
          
          
          2.                 RSVP Graceful Restart
          
                The procedure for RSVP Graceful Restart (GR) is described
             in [RFC3473]. Its purpose is to allow a node that has
             experienced a control plane failure, but has preserved its
             forwarding plane, to recreate its state from the RSVP
             messages replayed by its neighbors and from information
             retrieved from the forwarding plane. Typically, the data that
             can be obtained from the forwarding plane for each LSP
             endpoint is (port, label). At a midpoint node, the control
             plane also obtains the cross-connect information:
             (ingress-port, ingress-label) X (egress-port, egress-label)
          
             While most of the objects can be recovered based on these two
             sources of information, the route information that is
             contained in the ERO object cannot be recovered if the
             restarting node has modified the ERO content prior to restart.
             Section 3 proposes a solution for this problem.
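
             The forwarding-plane data described above can be pictured
             with a minimal sketch. The class and field names below are
             hypothetical and are not defined by [RFC3473] or by this
             draft. The sketch only illustrates that a lookup keyed on
             (ingress-port, ingress-label) yields the egress pair and
             nothing about the route beyond the next hop.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Endpoint:
    """One side of a cross-connect: (port, label). Names are illustrative."""
    port: str
    label: int


class ForwardingPlane:
    """State assumed to survive a control plane restart."""

    def __init__(self) -> None:
        # (ingress-port, ingress-label) -> (egress-port, egress-label)
        self._xconnects: Dict[Endpoint, Endpoint] = {}

    def add_cross_connect(self, ingress: Endpoint, egress: Endpoint) -> None:
        self._xconnects[ingress] = egress

    def lookup(self, ingress: Endpoint) -> Optional[Endpoint]:
        """Return the egress side for a Recovery Label, or None if absent."""
        return self._xconnects.get(ingress)


fp = ForwardingPlane()
fp.add_cross_connect(Endpoint("L1", 100), Endpoint("L2", 200))
```

             A hit on lookup() recovers the outgoing port and label, but
             the ERO that was sent downstream is not part of this state.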
          
             The GR procedure described in [RFC3473] handles some cases of
             multiple node restart. In the example below, assume that one
             LSP was signaled by R1 to span L1, R2, L2, R3.
          
                  L1      L2
             R1 ----- R2 ----- R3
          
             If R2 restarts and then R3 restarts after the Hellos have come
             up, then as described in [RFC3473], R2 is able to detect that
             R3 has restarted. If R2 is still in recovery mode when R3
             restarts, then when R2 receives a recovery message from R1 it
             sends a recovery message to R3. For the LSP described above,
             R1 sends the Path message including a Recovery Label, and R2
             also includes a Recovery Label in the Path message it sends to
             R3.
          
             If R2 restarts and then R3 restarts before the Hellos have
             come up, then there is no specified way for R2 to detect that
             R3 has restarted and is currently in Recovery Mode. Therefore,
             when R2 receives the Path message with the Recovery Label from
             R1, after processing it, R2 sends the corresponding outgoing
             Path message to R3 with Suggested Label instead of the
             Recovery Label. This is incorrect since this message must
             include the Recovery Label in order to help R3 recover its
             state.
          
             A solution for this problem is described in Section 4.
          
          
          
          
          
          
          3.                  ERO Recovery
          
             If a node experiences a control plane failure and restarts,
             the existing GR procedures do not ensure that ERO expansion
             yields the same result before and after the failure.  A change
             in ERO expansion should be avoided as it may lead to
             undesirable results.

             This section describes how such a change can be prevented.  To
             support this solution, a new RSVP object, called Recovery ERO,
             is introduced.
          
          3.1               Recovery ERO Object
          
             The Recovery ERO object is used during the nodal fault
             recovery process.  The format of the Recovery ERO object is
             identical to that of the ERO object described in [RFC3209].  A
             Recovery ERO object uses Class-Number ?? (of form 10bbbbbb)
             and the same C-Type as the ERO object it is trying to recover.
             Only C-Type = 1 is currently supported.
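
             As an illustration of the object layout, the sketch below
             encodes a Recovery ERO carrying IPv4 prefix subobjects in the
             ERO format of [RFC3209]. The Class-Number is not assigned in
             this draft, so RECOVERY_ERO_CLASS_NUM is a placeholder chosen
             only to satisfy the 10bbbbbb form. The function names are
             likewise assumptions made for this example.

```python
import socket
import struct
from typing import List

# PLACEHOLDER: the draft leaves the Class-Number as "??". Any value of the
# form 10bbbbbb (128..191) fits; 0xBF is used here purely for illustration.
RECOVERY_ERO_CLASS_NUM = 0xBF
ERO_C_TYPE = 1  # the only C-Type currently supported


def ipv4_subobject(addr: str, loose: bool = False, prefix_len: int = 32) -> bytes:
    """IPv4 prefix subobject: L bit + type 1, length 8, address, prefix, pad."""
    first = (0x80 if loose else 0x00) | 0x01
    return struct.pack("!BB4sBB", first, 8, socket.inet_aton(addr), prefix_len, 0)


def rsvp_object(class_num: int, c_type: int, body: bytes) -> bytes:
    """Prefix a body with the RSVP object header: Length, Class-Num, C-Type."""
    return struct.pack("!HBB", 4 + len(body), class_num, c_type) + body


def recovery_ero(subobjects: List[bytes]) -> bytes:
    """Build a Recovery ERO whose body is laid out exactly like an ERO."""
    if (RECOVERY_ERO_CLASS_NUM & 0xC0) != 0x80:
        raise ValueError("Class-Number must be of form 10bbbbbb")
    return rsvp_object(RECOVERY_ERO_CLASS_NUM, ERO_C_TYPE, b"".join(subobjects))


obj = recovery_ero([ipv4_subobject("10.0.0.3"),
                    ipv4_subobject("10.0.0.6", loose=True)])
```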
          
          3.2               Procedures at the Restarting Node and its Neighbors
          
             When a node experiences a control plane restart and receives a
             Path message with a Recovery Label from the upstream node, it
             searches for matching forwarding state as per [RFC3473].  If
             no matching state is found and ERO expansion is required,
             then the node treats the Path message as a request for a new
             LSP: it processes the incoming Path message and performs ERO
             expansion as specified in [RFC3473] and [RFC3209].  If the
             forwarding state is found and ERO expansion is required, then
             the node processes the incoming Path message as specified in
             [RFC3473] and [RFC3209], with the following modifications:

             1. It performs a partial ERO expansion at this point to
                include:
                - the strict next hop that is contained in the forwarding
                  state.
                - the loose hop as in the ERO of the received Path message.
             2. It includes the result of the previous step in the Recovery
                ERO object to be sent as part of the outgoing Path message.
             3. The restarting node sends the outgoing Path message.
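
             Steps 1 and 2 can be sketched as follows. This is one
             possible reading of the partial expansion, chosen to match
             the worked example later in this section (the local hop is
             kept, followed by the strict next hop and then the hops of
             the received ERO). The Hop type and the function name are
             hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Hop:
    node: str
    loose: bool = False


def build_recovery_ero(received_ero: List[Hop],
                       local_node: str,
                       forwarding_next_hop: Optional[str]) -> Optional[List[Hop]]:
    """Partial ERO expansion at a restarting node.

    Assumes ERO expansion is required, i.e. the hop after the local node
    is not already the strict next hop. Returns None when no forwarding
    state was found, in which case the Path is handled as a new LSP.
    """
    if forwarding_next_hop is None:
        return None
    if not received_ero or received_ero[0].node != local_node:
        raise ValueError("first ERO hop must be the local node")
    # Strict next hop from the forwarding state, then the received hops.
    return [received_ero[0], Hop(forwarding_next_hop)] + received_ero[1:]


received = [Hop("R2"), Hop("R6", loose=True)]
recovery = build_recovery_ero(received, "R2", "R3")
```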
          
             When the Path message is received by the neighbor downstream
             of the restarting node, the following processing occurs:
          
             4. If this message has associated incoming Path and forwarding
             states, the neighbor node retrieves the ERO object as it was
             previously created by the restarting node and formats a new
             Recovery ERO object with this content to be sent upstream.
          
          
             5. If this message has neither an associated incoming Path
             state nor a forwarding state, then this should be treated as a
             normal setup.
          
             6. If this message has no associated Path state but forwarding
             state is present, then this node is restarting as well and the
             procedure of the restarting node applies. Once the outgoing
             Path state is recovered, this node retrieves the outgoing ERO
             and creates the Recovery ERO object by prepending one or more
             strict elements as identified by forwarding entry associated
             with this LSP on this abstract node.
          
             If a Recovery ERO object is available after executing steps
             4, 5 and 6, then the neighbor node includes it in the Resv
             message sent upstream.
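
             The three cases in steps 4, 5 and 6 depend only on whether
             Path state and forwarding state exist for the LSP. A compact
             sketch of that decision is shown below. The enum and function
             names are hypothetical, and the combination "Path state
             without forwarding state" is not covered by the text above,
             so it is reported as unspecified rather than guessed.

```python
from enum import Enum


class Action(Enum):
    RETURN_STORED_ERO = "step 4: copy stored ERO into a Recovery ERO"
    NORMAL_SETUP = "step 5: treat as a normal LSP setup"
    RESTARTING_NODE_PROCEDURE = "step 6: this node is restarting as well"
    UNSPECIFIED = "not covered by the procedure text"


def classify_path(has_path_state: bool, has_forwarding_state: bool) -> Action:
    """Map the downstream neighbor's local state to the step that applies."""
    if has_path_state and has_forwarding_state:
        return Action.RETURN_STORED_ERO
    if not has_path_state and not has_forwarding_state:
        return Action.NORMAL_SETUP
    if not has_path_state and has_forwarding_state:
        return Action.RESTARTING_NODE_PROCEDURE
    return Action.UNSPECIFIED
```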
          
             When the restarting node receives the Resv message (after step
             3), it removes the Recovery ERO object before creating the
             Reservation State and uses its content to update the ERO in
             the associated Path State.
          
             In the case where the restarting node determines that the
             downstream node has not been able to include the expanded ERO
             (e.g., the downstream node is also a restarting node and
             forwarding has not been preserved), the restarting node
             performs the expansion as described in [RFC3473] and
             [RFC3209]. In this case the recovery of the LSP is not
             guaranteed.
          
             Below is an example that covers the steps above:
          
             Assume a simple topology as follows:
          
             R1 ----- R2 ----- R3 ----- R5 ----- R6
                                |        |
                                ----R4----
          
             1. R1 sends a Path message to R2 with an ERO containing [R2,
             R6(loose)].  R2 performs ERO expansion [R3, R4, R5, R6] and
             forwards the Path message to R3.  R3 stores this ERO and LSP
             gets established using normal LSP setup procedures.
          
             2. The control plane on R2 restarts.
          
             3. R2 receives a Path message with Recovery Label and ERO =
             [R2, R6(loose)].  R2 finds the forwarding state and creates
             the incoming Path state with ERO = [R2, R6].
          
          
          
             4. Based on the forwarding state, R2 determines that the next
             hop is R3.  R2 creates an outgoing Path State with a Recovery
             ERO = [R2, R3, R6] and forwards the Path message to R3.
          
             5. R3 finds the associated incoming Path State with ERO = [R3,
             R4, R5, R6] and creates a Recovery ERO with this content. R3
             sends the Resv message upstream including the Recovery ERO
             object that contains: [R3, R4, R5, R6]
          
             6. R2 receives the Resv message, removes the Recovery ERO
             object and creates the Reservation state.
          
             7. R2 uses the Recovery ERO from the Resv message to create
             the ERO of the outgoing Path State as [R3, R4, R5, R6] and
             removes the Recovery ERO.
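
             The example can be replayed with a small simulation of steps
             3 through 7. It is a sketch under simplifying assumptions:
             nodes are plain strings, loose/strict flags are omitted, and
             the LSP identifier, class and function names are all
             hypothetical.

```python
from typing import Dict, List, Optional

LSP = "lsp1"  # hypothetical LSP identifier

# R3 did not restart, so it still holds the ERO it received at setup time.
r3_incoming_ero: Dict[str, List[str]] = {LSP: ["R3", "R4", "R5", "R6"]}


def r3_build_resv_recovery_ero(lsp: str) -> Optional[List[str]]:
    """Step 5: copy the stored ERO into a Recovery ERO for the Resv."""
    stored = r3_incoming_ero.get(lsp)
    return list(stored) if stored is not None else None


class RestartingNode:
    """R2 after its control plane restart (steps 3, 4, 6 and 7)."""

    def __init__(self, forwarding_next_hop: Dict[str, str]) -> None:
        self.forwarding_next_hop = forwarding_next_hop  # survived the restart
        self.outgoing_ero: Dict[str, List[str]] = {}

    def build_path_recovery_ero(self, lsp: str, received_ero: List[str]) -> List[str]:
        # Steps 3-4: insert the strict next hop taken from forwarding state.
        next_hop = self.forwarding_next_hop[lsp]
        return [received_ero[0], next_hop] + received_ero[1:]

    def handle_resv(self, lsp: str, recovery_ero: Optional[List[str]]) -> None:
        # Steps 6-7: the Recovery ERO becomes the ERO of the outgoing Path
        # state. Without it, a fresh expansion would be needed instead.
        if recovery_ero is not None:
            self.outgoing_ero[lsp] = list(recovery_ero)


r2 = RestartingNode({LSP: "R3"})
path_recovery_ero = r2.build_path_recovery_ero(LSP, ["R2", "R6"])
r2.handle_resv(LSP, r3_build_resv_recovery_ero(LSP))
print(path_recovery_ero)     # ['R2', 'R3', 'R6']
print(r2.outgoing_ero[LSP])  # ['R3', 'R4', 'R5', 'R6']
```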
          
          
          4.                 Handling Multiple Restarts
          
                If a node (R2 below) experiences a control plane failure
             and if this node implements Graceful Restart procedure
             described in [RFC3473], then it can correctly recover all RSVP
             states it had prior to restart.
          
                During recovery, new LSPs may be established as long as
             they do not collide with the LSPs that are in the process of
             being recovered. If a Path message that includes a Suggested
             Label is received and the restarting node determines from its
             forwarding state that a previous LSP was using the
             (ingress-port, ingress-label), the new request may be
             rejected.  It may also be accepted with an upstream label
             different from the one carried in the Suggested Label.
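
             The collision check described above might look like the
             sketch below. The policy of trying another free label before
             rejecting is an assumption made for illustration; the text
             permits either outcome. The function name and the
             representation of preserved forwarding entries are
             hypothetical, and labels already handed to other new LSPs
             are not tracked here.

```python
from typing import Optional, Set, Tuple


def admit_new_lsp(port: str,
                  suggested_label: int,
                  preserved: Set[Tuple[str, int]],
                  label_range: range) -> Optional[int]:
    """Return the upstream label to allocate, or None to reject.

    `preserved` holds the (ingress-port, ingress-label) pairs found in the
    forwarding state, i.e. LSPs that may still be recovered.
    """
    if (port, suggested_label) not in preserved:
        return suggested_label          # no collision: honor the suggestion
    for label in label_range:
        if (port, label) not in preserved:
            return label                # accept with a different label
    return None                         # nothing free: reject the request


preserved = {("L1", 100), ("L1", 101)}
```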
          
          
             R0 ----- R1 ----- R2 ----- R3
          
          
                With the current specification, if a node R1 upstream of
             the restarting node R2 is also in the recovery phase, then the
             only case where the LSP is recovered is when R1 has restarted
             prior to R2. In this case R1 determines based on the Hello
             session that R2 has restarted and therefore it sends Path
             Messages with Recovery Label for the LSPs that need to be
             recovered.
          
                In the case where R1 restarted after R2, R1 cannot
             determine based on the Hello Instance that R2 is a restarting
             node. When R1 receives recovery Path messages from its
             upstream node, it sends Path messages with Suggested Label to
             R2 as indicated in [RFC3473]. R2 does not use the Suggested
             Label in this case, because its forwarding state indicates
             that a recovery Path message may be received from R1. In
             fact, it may set up a new LSP using a different label.
          
                In the case of GMPLS, messages from different upstream
             neighbors may be received on the same interface which may be
             different from the interface hosting the LSP to be recovered.
          
                This draft proposes that a restarting node use the Recovery
             Time value it receives as an indication that its downstream
             neighbor is in recovery mode.

                A node that complies with this specification sends a
             non-zero Recovery Time while in recovery mode and MUST set it
             to 0 once its recovery has completed.
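
             The rule above can be sketched as a pair of small functions:
             one choosing the Recovery Time to advertise, and one choosing
             which label object a restarting node places in the outgoing
             Path message. The names are hypothetical. The Recovery Time
             is taken to be in milliseconds, as in the Restart_Cap object
             of [RFC3473].

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    name: str
    recovery_time_ms: int  # from the neighbor's most recent Restart_Cap


def advertised_recovery_time(in_recovery: bool, configured_ms: int) -> int:
    """Non-zero while recovering, 0 once recovery has completed."""
    return configured_ms if in_recovery else 0


def outgoing_label_object(self_in_recovery: bool,
                          received_recovery_label: bool,
                          downstream: Neighbor) -> str:
    """Pick the label object for the Path message sent downstream."""
    downstream_in_recovery = downstream.recovery_time_ms > 0
    if self_in_recovery and received_recovery_label and downstream_in_recovery:
        return "RECOVERY_LABEL"
    return "SUGGESTED_LABEL"


r2 = Neighbor("R2", recovery_time_ms=advertised_recovery_time(True, 120000))
```

             With R2 still advertising a non-zero value, a restarting R1
             that received a Recovery Label from upstream would forward a
             Recovery Label to R2 rather than a Suggested Label.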
          
                An alternative solution is to require restarting nodes
             that receive a Path with Recovery Label to forward, after
             processing, a Path with Recovery Label and not Suggested
             Label.

                To allow recovery to complete, a restarting node may
             wish to adjust its advertised Recovery Time when an upstream
             node restarts, in an attempt to prevent the downstream nodes
             from expiring state that may be recovered later than
             expected.
          
          
          5.                 Forward Compatibility Note
          
                A node that does not support the Recovery ERO object and
             the procedure described in this draft will ignore the
             Recovery ERO object and respond with a Resv. A restarting
             node may choose to continue the recovery by performing a new
             ERO expansion. In the case where the new ERO matches the ERO
             before restart, the LSP is recovered. Otherwise, depending on
             the downstream node implementations, the LSP may be torn down.

             The extensions specified in this draft do not affect the
             processing of the Restart_Cap object at nodes that do not
             support them.
          
             A node that does not comply with this specification and has no
             proprietary way to detect at restart time the downstream
             neighbors that have previously restarted and that are in the
             recovery mode, may ignore the Recovery Time in the Restart_Cap
             object and may forward only Path messages with Suggested
             Label.
          
          
          
             A node that does comply with this specification and that
             receives a Restart_Cap object with a non-zero Recovery Time
             from a downstream node that does not comply with this
             specification, forwards Path messages with Recovery Label
             included for all recovered LSPs while in the recovery period.
             If in fact the downstream node is not in Recovery mode and
             receives a Path message with a Recovery Label, it should
             generate a Resv message and normal state processing continues.
          
          6.                 Security Considerations
          
               This document does not introduce new security issues. The
             security considerations pertaining to the original RSVP
             protocol [RFC2205] remain relevant.
          
          
          References
          
          
             [RFC2205] "Resource ReSerVation Protocol (RSVP) - Version 1,
                Functional Specification", RFC 2205, Braden, et al,
                September 1997.
             [RFC3209] "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC
                3209, D. Awduche, et al, December 2001.
             [RFC3471] "Generalized Multi-Protocol Label Switching (GMPLS)
                Signaling Functional Description", RFC 3471, L. Berger, et
                al, January 2003.
             [RFC3473] "Generalized Multi-Protocol Label Switching (GMPLS)
                Signaling Resource ReserVation Protocol-Traffic Engineering
                (RSVP-TE) Extensions", RFC 3473, L. Berger, et al, January
                2003.
             [RFC3477] "Signaling Unnumbered Links in Resource ReSerVation
                Protocol - Traffic Engineering (RSVP-TE) ", RFC 3477, K.
                Kompella, Y. Rekhter, January 2003.
             [RFC2119] "Key words for use in RFCs to Indicate Requirement
                Levels", RFC 2119, S. Bradner, March 1997.
          
          
          Authors' Addresses
          
          
             Reshad Rahman
             Cisco Systems Inc.
             2000 Innovation Dr.,
             Kanata, Ontario, K2K 3E8
             Canada.
             Phone: (613)-254-3519
             Email: rrahman@cisco.com
          
          
          
             Anca Zamfir
             Cisco Systems Inc.
             2000 Innovation Dr.,
             Kanata, Ontario, K2K 3E8
             Canada.
             Phone: (613)-254-3484
             Email: ancaz@cisco.com
          
             Junaid Israr
             Cisco Systems Inc.
             2000 Innovation Dr.,
             Kanata, Ontario, K2K 3E8
             Canada.
             Phone: (613)-254-3693
             Email: jisrar@cisco.com