Network Working Group               Jonathan P. Lang (Calient Networks)
Internet Draft                            John Drake (Calient Networks)
Expiration Date: August 2001           Yakov Rekhter (Juniper Networks)







                                                          February 2001


                  Generalized MPLS Recovery Mechanisms

                    draft-lang-ccamp-recovery-00.txt


 Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [RFC2026].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet- Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

 Abstract

   This draft discusses protection and restoration mechanisms for fault
   management within the GMPLS framework [GMPLS].










Lang, J., Drake, J., Rekhter, Y.                              [Page 1]


Internet Draft     draft-lang-ccamp-recovery-00.txt      February 2001

1. Introduction

   A key requirement for the development of a common control plane for
   both optical and electronic networks is that there must be features
   in the signaling, routing, and link management protocols to enable
   intelligent fault management.  Fault management requires four steps:
   fault detection, fault localization, fault notification, and fault
   recovery.  Fault detection should be handled at the layer closest to
   the failure; for optical networks, this is the physical (optical)
   layer.  One measure of fault detection at the physical layer is
   detecting loss of light (LOL); other techniques based on, for
   example, OSNR, BER, dispersion, crosstalk, and attenuation are still
   being investigated (see, for example, [OLCP] and [LMP-DWDM]).  Fault
   localization requires communication between nodes to determine where
   the failure has occurred (for example, SONET AIS is used to localize
   failures between SONET terminating devices).  One interesting
   consequence of using LOL to detect failures in optical networks is
   that LOL propagates downstream along the connectionÆs path.  The
   Link Management Protocol (LMP) [LMP] includes a fault localization
   procedure that is designed to localize failures in both transparent
   (all-optical) and opaque (opto-electrical) networks, and is
   independent of the data encoding scheme.  Fault notification is the
   Communication of a failure between the node detecting it and a node
   equipped to deal with the failure.  Fast fault notification is
   essential for rapid recovery.  The Notify mechanism of [RSVP-GEN] is
   designed to support fast notification of non-adjacent nodes.

   Once a failure has been detected and localized, and the responsible
   node has been notified, protection and restoration can be used to
   recover from the failure. We make the distinction between protection
   and restoration by the time scales in which they operate.
   Protection is designed to react to failures rapidly (say, in less
   than a couple hundred milliseconds) and often involves 100% resource
   redundancy.  For example, SONET automatic protection switching (APS)
   is designed to switch the traffic from a primary (working) path to a
   secondary (protection) path in less than 50ms.  This requires
   simultaneous transmission along both the primary and secondary paths
   (called 1+1 protection) with a selector at the receiving node, and
   uses twice as many network resources as a non-APS protected path.
   Restoration, on the other hand, is designed to react to failures
   quickly, but it typically takes an order of magnitude longer to
   restore the connection compared to protection switching.  This is
   because restoration typically utilizes pools of shared resources
   that are more efficient in terms of the network utilization.  In
   addition, restoration may involve rerouting connections, which can
   be computationally expensive if the paths are not pre-calculated or
   if the pre-calculated resources are no longer available.

   Protection and restoration methods have traditionally been addressed
   using two techniques:  path-level recovery, where the failure is
   addressed at the end nodes (i.e., the initiating and terminating
   nodes of the path); and span-level recovery, where the failure is

Lang, J., Drake, J., Rekhter, Y.                              [Page 2]


Internet Draft     draft-lang-ccamp-recovery-00.txt      February 2001

   addressed at an intermediate or transit node.  Path-level recovery
   can be further subdivided into path protection, where secondary (or
   protection) paths are pre-allocated, and path restoration, where
   connections are rerouted, either dynamically or using pre-calculated
   (but not pre-allocated) paths.  Span-level recovery can be
   subdivided into span protection, where traffic is switched to an
   alternate channel or link connecting the same two nodes, and span
   restoration, where traffic is switched to an alternate route between
   the two nodes (this involves passing through additional intermediate
   nodes).

   To effectively use protection, there must be mechanisms to configure
   protected links on a span between nodes, advertise the protection
   bandwidth of a link so that it may be used by a class of traffic
   that has different availability requirements, establish secondary
   (protection) LSPs to protect primary LSPs, allow the resources of
   secondary LSPs to be used by lower priority traffic until a
   switchover occurs, and signal protection switchover when necessary.
   In this draft, we discuss protection and restoration in the context
   of GMPLS signaling.  Specifically, we address these issues in the
   context of RSVP signaling and OSPF and IS-IS routing.

2. Protection Mechanisms

   Protection is designed to react to failures at the fastest timescale
   and typically involves pre-provisioning protection resources.  In
   this section we discuss both span and path protection and present
   mechanisms within GMPLS to implement both protection schemes.

2.1 Span Protection

   A span consists of a number of channels between two adjacent nodes
   that are bundled into a single link called a TE link (see [LMP]).
   Span protection involves switching to a protection channel when a
   failure occurs on a working channel.  At the span level, both
   dedicated (1+1, 1:1) and shared (M:N) protection may be implemented.
   The protection type supported by a TE link (LPT) will be advertised
   throughout the network using an IGP so that intelligent routing
   decisions can be made (see Section 4).  The desired protection for a
   path is signaled as part of the Generalized Label Request in GMPLS
   signaling.  This is needed in signaling if a link supports multiple
   protection types or if loose routing is used.

   For dedicated 1+1 span protection, each node must replicate the data
   onto two separate channels (possibly using separate component links
   of a bundled link or separate ports of a TE link) and the adjacent
   node must select the data from only one channel based on the signal
   integrity.  This is the fastest protection mechanism, however, it
   requires using twice the LSP bandwidth between each pair of nodes
   and the ability to replicate the data on two separate channels.



Lang, J., Drake, J., Rekhter, Y.                              [Page 3]


Internet Draft     draft-lang-ccamp-recovery-00.txt      February 2001

   For shared M:N protection, M protection links are shared between N
   primary links.  Since data is not replicated on both the primary and
   secondary links, failures must first be isolated before the
   switchover can occur.  LMP can be used for fault isolation, and the
   upstream node (upstream in terms of the direction an RSVP Path
   message traverses) will initiate the local span protection.  To
   initiate span protection, the upstream node SHOULD send an RSVP Path
   message with a Label Set object including the labels for the
   available secondary links.  If more than one label is included in
   the Label Set object, the Suggested Label object should be used to
   indicate the preferred secondary label.  If the failure affected a
   bi-directional LSP, a new Upstream Label may also need to be
   transmitted.  In addition, new LinkId, PHOP, and modified ERO may
   also need to be included based on the shared protection
   configuration.  Note that the benefit of exchanging the shared
   protection configuration in advance using LMP is that it minimizes
   the potential label conflict when protection switching.  When the
   downstream node receives the Path message with the new objects, it
   MUST verify the parameters, update the RSVP Path state, and respond
   with either an RSVP Resv message with a new label or it should
   generate a PathError message if the resources are not available.

2.2 Path Protection

   Path protection is addressed at the end nodes of an LSP (i.e., LSP
   initiator and terminator) and requires switching to an alternate
   path when a failure occurs.  For 1+1 path protection, a signal is
   transmitted simultaneously over two disjoint paths and a selector is
   used at the receiving node to choose the better signal.  For M:N
   path protection, N primary signals are transmitted along disjoint
   paths, and M secondary paths are pre-established for shared
   protection switching among the N primary paths.

2.2.1 1+1 Path Protection

   There are a number of 1+1 path protection variations that may be
   implemented that provide different levels of protection.  The most
   common notion of 1+1 path protection is to select two disjoint
   paths, one primary and one secondary, where each link along both
   paths is unprotected.  This protects against a single link or node
   failure, depending on how the two paths are disjoint.  One variation
   of 1+1 path protection is to select a single path where each link
   individually supports 1+1 span protection as discussed in Subsection
   2.1.  This protects against a single link failure, but not a node
   failure.  One may also combine the two approaches by ensuring that
   for every contiguous segment of the path that includes only the
   links that don't support 1+1 span protection, the head-end LSR has
   to compute a link-disjoint segment, with the constraint that none of
   the links in the newly computed segments have 1+1 protection.

   After the two paths are computed, the head-end LSR will originate
   two LSPs with dedicated 1+1 and unprotected bits set in the LPT. The

Lang, J., Drake, J., Rekhter, Y.                              [Page 4]


Internet Draft     draft-lang-ccamp-recovery-00.txt      February 2001

   setup will indicate that these two paths request Shared-Explicit
   reservations (see [TUNNEL]).  At each node where the two paths
   branch out, the node must replicate the data into both branches.  At
   each node where the two paths merge, the node must select the data
   from only one path based on the integrity of the signal.

   For SONET/SDH, LSPs are bi-directional and each branching point is
   also a merging point and vice versa.

   As an example consider the following:

                                     M
                                    / \
                              A---B   C----D
                                    \ /
                                     N

   Only links A-B and C-D support 1+1 protection. Node A wants to
   establish a 1+1 protected path to D.  In this case, A computes a
   primary path, A, B, M, C, D where the segment B, M, C has links that
   do not support 1+1 protection. Therefore, A computes a link-disjoint
   segment, B, N, C, and uses it to construct a secondary path, A, B,
   N, C, D.  A initiates a setup of two LSPs indicating the desire for
   Shared Explicit (SE) reservations - the first path is routed along
   A, B, M, C, D, and the second path is routed along A, B, N, C, D.

   Since the two LSPs branch out at node B, B sends the data it
   receives from A to both M and N.  At node C, the two LSPs merge and
   C selects the data received over one of these LSPs (based on the
   integrity of the signal), and forwards this data to D.

   When the LSP from A to D is bi-directional, then C must also send
   the data it receives from D to both M and N, and B must select the
   data received from either M or N, and forward it the to A.

2.2.2. M:N Path Protection

   There are a number of M:N path protection variations that may be
   implemented to provide different levels of protection and to address
   different network configurations.  The most common notion of M:N
   path protection is to route N node-disjoint primary paths and pre-
   establish M backup paths that are node disjoint from the primary
   paths.  This protects against M path failures.  Another variation of
   M:N path protection is to select a single path where each link
   individually supports M:N span protection.  This protects against M
   link failures over each span, but is not robust to node failures.
   One may also combine the two approaches by ensuring that for every
   contiguous segment of the path that includes only the links that
   donÆt support M:N span protection, the head-end node has to compute
   a node- or link-disjoint segment, with the constraint that none of
   the links in the newly computed segments need to be protected.


Lang, J., Drake, J., Rekhter, Y.                              [Page 5]


Internet Draft     draft-lang-ccamp-recovery-00.txt      February 2001

   An important feature of the GMPLS work is that it allows pre-
   configuring secondary (backup) LSPs to protect primary LSPs.  This
   is done by indicating the LSP is of type Secondary in the protection
   field of the Generalized Label Request.  Secondary LSPs are used for
   fast switchover when primary LSPs fail.  Although the resources for
   the secondary LSPs are pre-allocated, lower priority traffic may use
   the resources with the caveat that the lower priority traffic will
   be preempted if the primary LSP fails.  If lower priority traffic is
   using resources along the secondary LSPs, the end nodes may need to
   be notified of the failure in order to complete the switchover.

   The setup of the primary LSP SHOULD indicate that the LSP initiator
   and terminator wish to receive Notify messages using the Notify
   Request object.  If a failure occurs, LMP can be used to isolate the
   failure.  Once the failure is isolated, the upstream node (upstream
   in terms of the direction an RSVP Path message traverses) SHOULD
   send an RSVP Notify message to the LSP initiator, and the downstream
   node SHOULD send an RSVP Notify message to the LSP terminator.  Upon
   receipt of the Notify messages, the source and destination nodes
   MUST switch the traffic from the primary LSP to the pre-configured
   secondary LSP.  Note that if a common initiator-terminator is used
   for all N primary paths sharing the secondary path (assuming 1:N
   protection), no further notification is required to indicate that
   the N primary LSPs are no longer protected.

   As an example consider the following:

                            A---B       E---F
                           /     \     /     \
                       I---       C----D       ---T
                           \     /     \     /
                            J---K       L---M

   Two node-disjoint routes from initiator I to terminator T cannot be
   found; however, two node-disjoint routes can be found from node I to
   node C and from node D to node T.  Furthermore, the link from node C
   to node D is protected using dedicated 1:1 protection.  In this
   case, I computes the primary route R1={I,A,B,C,D,E,F,T} and
   secondary route R2={I,J,K,C,D,L,M,T} where the segment {C,D}
   supports 1:1 span protection.  A initiates a setup of two LSPs
   indicating the desire for Shared Explicit reservations; the primary
   LSP is routed along R1 and the secondary LSP is routed along R2.

3. Restoration Mechanisms

   Restoration is designed to react to failures quickly and use
   bandwidth efficiently, but typically involves dynamic resource
   establishment and route calculation, and therefore, takes more time
   to switch to an alternate path than protection techniques.
   Restoration can be implemented at the intiator node or at an
   intermediate node once the responsible node has been notified.
   Failure notification can be done using the Notify procedures of

Lang, J., Drake, J., Rekhter, Y.                              [Page 6]


Internet Draft     draft-lang-ccamp-recovery-00.txt      February 2001

   [GMPLS] or using the standard RSVP PathError messages.  In the
   section, we briefly discuss span and path restoration and highlight
   the RSVP mechanisms that can be used to implement them.

   To support span restoration, where traffic is switched to an
   alternate route around a failure, a new LSP is established at an
   intermediate node that involves passing through additional
   intermediate nodes.  Span restoration may be beneficial for LSPs
   that span multiple hops and/or large distances because the latency
   incurred for failure notification may be significantly reduced and
   only segments of the LSP are rerouted instead of the entire path.
   The RSVP Notify Request object can be used by an intermediate node
   to request that it be the target of an RSVP Notify message.  Span
   restoration may break traffic-engineering (TE) requirements if a
   strict-hop route is defined for the connection.  Furthermore, the
   constraints used for routing the connection must be forwarded so
   that an intermediate node doing span restoration is able to
   calculate an appropriate alternate route; this is similar to the
   problems when establishing/maintaining TE requirements that span
   mult-areas (see [MULTI] for a proposed mechanism).

   Path restoration, on the other hand, switches traffic to an
   alternate route around a failure, where the new route is selected at
   the LSP initiator and may reuse intermediate nodes used by the
   original LSP and it may include additional intermediate nodes.  For
   strict-hop routing, TE requirements can be directly applied to the
   route calculation, and the filed node or link can be avoided.
   However, if the failure occurred within a loose-routed hop, the
   source node may not have enough information to reroute the
   connection around the failure.

   Restoration (span or path) will be initiated by the node that has
   isolated the failure or by the node that has received either an RSVP
   Notify message or an RSVP Path Error message indicating that a
   failure has occurred.  The new resources can be established in a
   make-before-break fashion, where the new LSP is setup before the old
   LSP is torn down, using the mechanisms of the LSP_Tunnel Session
   object (see [TUNNEL]) and the Shared-Explicit reservation style.
   Both the new and old LSPs share resources at nodes common to both
   LSPs.  The Tunnel end point addresses, Tunnel Id, Extended Tunnel
   Id, Tunnel sender address, and LSP Id are all used to uniquely
   identify both the old and new LSPs; this ensures new resources are
   established without double counting resource requirements along
   common segments.

4. Routing Enhancements

   The GMPLS extensions to OSPF [OSPF-GE] and IS-IS [ISIS-GE] include
   the advertisement of the LPT.  The LPT field is a bit vector that
   indicates the protection capabilities that are supported for the
   link.  The LPT field may be configured with Dedicated 1+1, Dedicated
   1:1, Shared M:N, and Enhanced protection, as well as Unprotected.

Lang, J., Drake, J., Rekhter, Y.                              [Page 7]


Internet Draft     draft-lang-ccamp-recovery-00.txt      February 2001

   For a link that has dedicated 1+1 protection or is unprotected, this
   advertisement provides a complete description of the link
   capabilities and the usable bandwidth.  However, a key argument for
   using dedicated 1:1 or shared M:N is the efficiency gained by
   reusing the protection bandwidth for lower priority traffic when the
   bandwidth would otherwise be idle.

   To advertise the protection bandwidth for a link that has dedicated
   1:1 or shared M:N protection, a link with LPT field Extra Traffic
   should be advertised.  This indicates that bandwidth can be used by
   LSPs, with the caveat that any LSPs routed over this link will be
   preempted if the resources are needed as a result of a failure over
   the primary link.

   When a failure occurs on a dedicated 1:1 or shared M:N link, the
   LSPs routed over the link will automatically be switched to the
   Extra Traffic link that is protecting it.

   To support the routing of Secondary LSPs for M:N path protection (as
   described in Section 2.2.2), new extensions must be added to the
   current GMPLS routing extensions.  In particular, there must be a
   mechanism to advertise secondary bandwidth and processing rules must
   be defined for bandwidth accounting when LSP requests arrive at a
   node.  See [BWAcct] for a proposal addressing these issues.


5. Acknowledgments

   We would like to thank Kireeti Kompella and Ayan Banerjee for their
   comments and fruitful discussions.

6. References

   [RFC2026] Bradner, S., "The Internet Standards Process -- Revision
             3," BCP 9, RFC 2026, October 1996.
   [GMPLS]   Ashwood-Smith, P., Banerjee, A., Berger, L., et al,
             "Generalized MPLS - signaling functional description,"
             Internet Draft, draft-ietf-mpls-generalized-mpls-
             signaling-01.txt, (work in progress).
   [OLCP]    Chiu, A., Strand, J., Tkach, R., ôUnique Features and
             Requirements for The Optical Layer Control Plane, Internet
             Draft, draft-chiu--strand-unique-OLCP-01.txt, (work in
             progress).
   [LMP-DWDM] Fredette, A., Snyder, E., Shantigram, J., et al, ôLink
             Management Protocol (LMP) for WDM Transmission Systems,ö
             Internet Draft, draft-fredette-lmp-wdm-00.txt, (work in
             progress).
   [LMP]     Lang, J. P., Mitra, K., Drake, J., Kompella, K., et al,
             ôLink Management Protocol (LMP),ö Internet Draft, draft-
             ietf-mpls-lmp-01.txt, (work in progress).



Lang, J., Drake, J., Rekhter, Y.                              [Page 8]


Internet Draft     draft-lang-ccamp-recovery-00.txt      February 2001


   [RSVP-GEN] Ashwood-Smith, P., Banerjee, A., Berger, L., et al, "
             Generalized MPLS Signaling - RSVP-TE Extensions," Internet
             Draft, draft-ietf-mpls-generalized-rsvp-te-00.txt, (work
             in progress).
   [TUNNEL]  Awduche, D., Berger, L., Gan, D-H., Li. T., Srinivasan,
             V., Swallow, G., ôRSVP-TE: Extensions to RSVP for LSP
             Tunnels,ö Internet Draft, draft-ietf-mpls-rsvp-lsp-tunnel-
             07.txt, (work in progress).
   [MULTI]   Kompella, K., Rekhter, Y., ôMulti-area MPLS Traffic
             Engineering,ö Internet Draft, draft-kompella-mpls-
             multiarea-te-00.txt, (work in progress).
   [OSPF-GE] Kompella, K., Rekhter, Y., Banerjee, A., Drake, J., et al,
             ôOSPF Extensions in Support of MPL(ambda)S," Internet
             Draft, draft-kompella-ospf-ompls-extensions-00.txt, (work
             in progress).
   [ISIS-GE] Kompella, K., Rekhter, Y., Banerjee, A., Drake, J., et al,
             ôIS-IS Extensions in Support of Generalized MPLS,ö
             Internet Draft, draft-ietf-isis-gmpls-extensions-01.txt,
             (work in progress).
   [BWAcct]  Kompella, K., Lang, J.P., Drake, J., ôBandwidth Accouting
             in Support of Secondary LSPs,ö  Internet Draft, (work in
             progress).






























Lang, J., Drake, J., Rekhter, Y.                              [Page 9]


7. Author's Addresses

   Jonathan P. Lang                        John Drake
   Calient Networks                        Calient Networks
   25 Castilian Drive                      5853 Rue Ferrari
   Goleta, CA 93117                        San Jose, CA 95138
   email: jplang@calient.net               email: jdrake@calient.net

   Yakov Rekhter
   Juniper Networks
   1194 N. Mathilda Avenue
   Sunnyvale, CA 94089
   email: yakov@juniper.net








































Lang, J., Drake, J., Rekhter, Y.                             [Page 10]