Network Working Group                    Jonathan P. Lang (Calient Networks)
Internet Draft                                 John Drake (Calient Networks)
Expiration Date: August 2001                Yakov Rekhter (Juniper Networks)
                                                               February 2001
Generalized MPLS Recovery Mechanisms
draft-lang-ccamp-recovery-00.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026 [RFC2026].
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This draft discusses protection and restoration mechanisms for fault
management within the GMPLS framework [GMPLS].
Lang, J., Drake, J., Rekhter, Y. [Page 1]
Internet Draft draft-lang-ccamp-recovery-00.txt February 2001
1. Introduction
A key requirement for the development of a common control plane for
both optical and electronic networks is that there must be features
in the signaling, routing, and link management protocols to enable
intelligent fault management. Fault management requires four steps:
fault detection, fault localization, fault notification, and fault
recovery. Fault detection should be handled at the layer closest to
the failure; for optical networks, this is the physical (optical)
layer. One measure of fault detection at the physical layer is
detecting loss of light (LOL); other techniques based on, for
example, OSNR, BER, dispersion, crosstalk, and attenuation are still
being investigated (see, for example, [OLCP] and [LMP-DWDM]). Fault
localization requires communication between nodes to determine where
the failure has occurred (for example, SONET AIS is used to localize
failures between SONET terminating devices). One interesting
consequence of using LOL to detect failures in optical networks is
that LOL propagates downstream along the connection's path. The
Link Management Protocol (LMP) [LMP] includes a fault localization
procedure that is designed to localize failures in both transparent
(all-optical) and opaque (opto-electrical) networks, and is
independent of the data encoding scheme. Fault notification is the
communication of a failure between the node detecting it and a node
equipped to deal with the failure. Fast fault notification is
essential for rapid recovery. The Notify mechanism of [RSVP-GEN] is
designed to support fast notification of non-adjacent nodes.
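Because LOL propagates downstream, every node past a failed link sees loss of light, so the failure can be placed on the link entering the first node along the path that reports it. The Python sketch below illustrates only that reasoning; it is not the LMP fault localization procedure, and the per-node `lol` reports it takes as input are a hypothetical abstraction.

```python
from typing import Dict, Optional, Sequence, Tuple


def localize_failure(path: Sequence[str],
                     lol: Dict[str, bool]) -> Optional[Tuple[str, str]]:
    """Return the (upstream, downstream) link on which light was first lost.

    `path` lists nodes from initiator to terminator; `lol[n]` is True when
    node n detects loss of light on its incoming channel (hypothetical input).
    Since LOL propagates downstream, the failed link is the one entering the
    first node that reports LOL while its upstream neighbor does not.
    """
    for upstream, downstream in zip(path, path[1:]):
        if lol.get(downstream, False) and not lol.get(upstream, False):
            return (upstream, downstream)
    return None  # no LOL transition observed along this path


# Link B-C is cut: C, D, and E all report LOL.
print(localize_failure(["A", "B", "C", "D", "E"],
                       {"A": False, "B": False, "C": True, "D": True, "E": True}))
```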
Once a failure has been detected and localized, and the responsible
node has been notified, protection and restoration can be used to
recover from the failure. We make the distinction between protection
and restoration by the time scales in which they operate.
Protection is designed to react to failures rapidly (say, in less
than a couple hundred milliseconds) and often involves 100% resource
redundancy. For example, SONET automatic protection switching (APS)
is designed to switch the traffic from a primary (working) path to a
secondary (protection) path in less than 50ms. This requires
simultaneous transmission along both the primary and secondary paths
(called 1+1 protection) with a selector at the receiving node, and
uses twice as many network resources as a non-APS protected path.
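The resource cost of the protection schemes discussed in this draft can be expressed as the ratio of protection capacity to working capacity: 1 for 1+1 (and for dedicated 1:1), which is the doubling noted above, and M/N for shared M:N. A minimal sketch of that arithmetic, using a hypothetical helper `protection_overhead` and ignoring any reuse of idle protection capacity by lower priority traffic:

```python
from fractions import Fraction


def protection_overhead(m: int, n: int) -> Fraction:
    """Protection capacity divided by working capacity for M:N protection.

    1+1 and dedicated 1:1 both reserve one protection resource per working
    resource (m = n = 1), so the overhead is 1 and total capacity doubles.
    """
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive integers")
    return Fraction(m, n)


print(protection_overhead(1, 1))  # 1+1 or 1:1: one extra unit per working unit
print(protection_overhead(1, 4))  # shared 1:4: one protection link for four
```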
Restoration, on the other hand, is designed to react to failures
quickly, but it typically takes an order of magnitude longer to
restore the connection compared to protection switching. This is
because restoration typically utilizes pools of shared resources
that are more efficient in terms of the network utilization. In
addition, restoration may involve rerouting connections, which can
be computationally expensive if the paths are not pre-calculated or
if the pre-calculated resources are no longer available.
Protection and restoration methods have traditionally been addressed
using two techniques: path-level recovery, where the failure is
addressed at the end nodes (i.e., the initiating and terminating
nodes of the path); and span-level recovery, where the failure is
addressed at an intermediate or transit node. Path-level recovery
can be further subdivided into path protection, where secondary (or
protection) paths are pre-allocated, and path restoration, where
connections are rerouted, either dynamically or using pre-calculated
(but not pre-allocated) paths. Span-level recovery can be
subdivided into span protection, where traffic is switched to an
alternate channel or link connecting the same two nodes, and span
restoration, where traffic is switched to an alternate route between
the two nodes (this involves passing through additional intermediate
nodes).
To effectively use protection, there must be mechanisms to configure
protected links on a span between nodes, advertise the protection
bandwidth of a link so that it may be used by a class of traffic
that has different availability requirements, establish secondary
(protection) LSPs to protect primary LSPs, allow the resources of
secondary LSPs to be used by lower priority traffic until a
switchover occurs, and signal protection switchover when necessary.
In this draft, we discuss protection and restoration in the context
of GMPLS signaling. Specifically, we address these issues in the
context of RSVP signaling and OSPF and IS-IS routing.
2. Protection Mechanisms
Protection is designed to react to failures at the fastest timescale
and typically involves pre-provisioning protection resources. In
this section we discuss both span and path protection and present
mechanisms within GMPLS to implement both protection schemes.
2.1 Span Protection
A span consists of a number of channels between two adjacent nodes
that are bundled into a single link called a TE link (see [LMP]).
Span protection involves switching to a protection channel when a
failure occurs on a working channel. At the span level, both
dedicated (1+1, 1:1) and shared (M:N) protection may be implemented.
The link protection type (LPT) supported by a TE link will be advertised
throughout the network using an IGP so that intelligent routing
decisions can be made (see Section 4). The desired protection for a
path is signaled as part of the Generalized Label Request in GMPLS
signaling. This is needed in signaling if a link supports multiple
protection types or if loose routing is used.
For dedicated 1+1 span protection, each node must replicate the data
onto two separate channels (possibly using separate component links
of a bundled link or separate ports of a TE link) and the adjacent
node must select the data from only one channel based on the signal
integrity. This is the fastest protection mechanism; however, it
requires using twice the LSP bandwidth between each pair of nodes
and the ability to replicate the data on two separate channels.
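At the receiving node, 1+1 span protection reduces to choosing between two copies of the same data. A minimal sketch, assuming a hypothetical per-channel signal-integrity flag (for example, derived from LOL) and a non-revertive policy, which this draft does not specify, under which the selector stays on its current channel while that channel remains healthy:

```python
from typing import Dict


def select_channel(current: str, healthy: Dict[str, bool]) -> str:
    """Non-revertive 1+1 selector at the receiving node.

    `healthy` maps each of the two channel names to True when the copy
    arriving on it has acceptable signal integrity (a hypothetical input).
    """
    if healthy.get(current, False):
        return current               # stay put while the current copy is good
    for channel, ok in healthy.items():
        if ok and channel != current:
            return channel           # switch to the other, healthy copy
    return current                   # both copies failed; nothing better
```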
For shared M:N protection, M protection links are shared between N
primary links. Since data is not replicated on both the primary and
secondary links, failures must first be isolated before the
switchover can occur. LMP can be used for fault isolation, and the
upstream node (upstream in terms of the direction an RSVP Path
message traverses) will initiate the local span protection. To
initiate span protection, the upstream node SHOULD send an RSVP Path
message with a Label Set object including the labels for the
available secondary links. If more than one label is included in
the Label Set object, the Suggested Label object should be used to
indicate the preferred secondary label. If the failure affected a
bi-directional LSP, a new Upstream Label may also need to be
transmitted. In addition, new LinkId, PHOP, and modified ERO may
also need to be included based on the shared protection
configuration. Note that the benefit of exchanging the shared
protection configuration in advance using LMP is that it minimizes
the potential label conflict when protection switching. When the
downstream node receives the Path message with the new objects, it
MUST verify the parameters, update the RSVP Path state, and respond
with either an RSVP Resv message carrying a new label or, if the
resources are not available, a PathError message.
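The shared M:N switchover exchange can be sketched as follows. The dataclasses only loosely mirror the RSVP objects named above (Label Set, Suggested Label, Resv label, PathError); the class names, their fields, and the policy of suggesting the first free label are hypothetical simplifications, not the RSVP encoding.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Union


@dataclass
class PathMsg:
    label_set: List[int]                   # labels of free secondary links
    suggested_label: Optional[int] = None


@dataclass
class ResvMsg:
    label: int


@dataclass
class PathErrorMsg:
    reason: str


def build_switchover_path(free_secondary: List[int]) -> PathMsg:
    """Upstream node: offer every free secondary label.

    A Suggested Label is included only when there is more than one choice.
    """
    suggested = free_secondary[0] if len(free_secondary) > 1 else None
    return PathMsg(label_set=list(free_secondary), suggested_label=suggested)


def handle_switchover_path(msg: PathMsg,
                           free_here: Set[int]) -> Union[ResvMsg, PathErrorMsg]:
    """Downstream node: verify the offered labels, answer Resv or PathError."""
    usable = [label for label in msg.label_set if label in free_here]
    if not usable:
        return PathErrorMsg("no secondary resources available")
    if msg.suggested_label is not None and msg.suggested_label in usable:
        return ResvMsg(label=msg.suggested_label)
    return ResvMsg(label=usable[0])
```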
2.2 Path Protection
Path protection is addressed at the end nodes of an LSP (i.e., LSP
initiator and terminator) and requires switching to an alternate
path when a failure occurs. For 1+1 path protection, a signal is
transmitted simultaneously over two disjoint paths and a selector is
used at the receiving node to choose the better signal. For M:N
path protection, N primary signals are transmitted along disjoint
paths, and M secondary paths are pre-established for shared
protection switching among the N primary paths.
2.2.1 1+1 Path Protection
There are a number of 1+1 path protection variations that may be
implemented that provide different levels of protection. The most
common notion of 1+1 path protection is to select two disjoint
paths, one primary and one secondary, where each link along both
paths is unprotected. This protects against a single link or node
failure, depending on how the two paths are disjoint. One variation
of 1+1 path protection is to select a single path where each link
individually supports 1+1 span protection as discussed in Subsection
2.1. This protects against a single link failure, but not a node
failure. One may also combine the two approaches by ensuring that
for every contiguous segment of the path that includes only the
links that don't support 1+1 span protection, the head-end LSR
computes a link-disjoint segment, with the constraint that none of
the links in the newly computed segments have 1+1 protection.
After the two paths are computed, the head-end LSR will originate
two LSPs with dedicated 1+1 and unprotected bits set in the LPT. The
setup will indicate that these two paths request Shared-Explicit
reservations (see [TUNNEL]). At each node where the two paths
branch out, the node must replicate the data into both branches. At
each node where the two paths merge, the node must select the data
from only one path based on the integrity of the signal.
For SONET/SDH, LSPs are bi-directional and each branching point is
also a merging point and vice versa.
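A node can determine whether it is a branching or a merging point by comparing its neighbors on the two paths. The Python sketch below assumes both paths are given as node lists in the Path-message direction; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def branch_and_merge(p1: Sequence[str],
                     p2: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (branching nodes, merging nodes) for two paths.

    A node on both paths branches when its next hop differs between the
    paths, and merges when its previous hop differs.
    """
    next1 = dict(zip(p1, p1[1:]))
    next2 = dict(zip(p2, p2[1:]))
    prev1 = dict(zip(p1[1:], p1))
    prev2 = dict(zip(p2[1:], p2))
    on_p2 = set(p2)
    common = [n for n in p1 if n in on_p2]
    branch = [n for n in common
              if n in next1 and n in next2 and next1[n] != next2[n]]
    merge = [n for n in common
             if n in prev1 and n in prev2 and prev1[n] != prev2[n]]
    return branch, merge
```

Reversing both paths swaps the two roles, which matches the observation that for bi-directional SONET/SDH LSPs every branching point is also a merging point.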
As an example consider the following:
            M
           / \
      A---B   C----D
           \ /
            N
Only links A-B and C-D support 1+1 protection. Node A wants to
establish a 1+1 protected path to D. In this case, A computes a
primary path, A, B, M, C, D where the segment B, M, C has links that
do not support 1+1 protection. Therefore, A computes a link-disjoint
segment, B, N, C, and uses it to construct a secondary path, A, B,
N, C, D. A initiates a setup of two LSPs indicating the desire for
Shared Explicit (SE) reservations - the first path is routed along
A, B, M, C, D, and the second path is routed along A, B, N, C, D.
Since the two LSPs branch out at node B, B sends the data it
receives from A to both M and N. At node C, the two LSPs merge and
C selects the data received over one of these LSPs (based on the
integrity of the signal), and forwards this data to D.
When the LSP from A to D is bi-directional, then C must also send
the data it receives from D to both M and N, and B must select the
data received from either M or N, and forward it to A.
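The head-end computation in this example can be sketched in Python. The sketch assumes the head-end knows the topology and which links support 1+1 span protection (for instance, from the advertised LPT). It replaces each maximal run of unprotected links on the primary path with a detour between the run's end points that avoids both the run's links and all 1+1-protected links, using breadth-first search for each detour. It does not check that a detour avoids nodes elsewhere on the path; all names are hypothetical.

```python
from collections import deque
from typing import AbstractSet, Dict, FrozenSet, List, Optional, Set

Adjacency = Dict[str, Set[str]]


def _detour(adj: Adjacency, src: str, dst: str,
            banned: AbstractSet[FrozenSet[str]]) -> Optional[List[str]]:
    """Fewest-hop route from src to dst that uses no banned link."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            route: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                route.append(cur)
                cur = prev[cur]
            return route[::-1]
        for nxt in sorted(adj.get(node, set())):
            if frozenset((node, nxt)) not in banned and nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def secondary_path(primary: List[str], adj: Adjacency,
                   protected: Set[FrozenSet[str]]) -> Optional[List[str]]:
    """Build the secondary path for the combined 1+1 scheme described above."""
    def hop(k: int) -> FrozenSet[str]:
        return frozenset((primary[k], primary[k + 1]))

    result = [primary[0]]
    i = 0
    while i < len(primary) - 1:
        if hop(i) in protected:          # 1+1 span protection covers this hop
            result.append(primary[i + 1])
            i += 1
            continue
        j = i                            # find the maximal unprotected run i..j
        while j < len(primary) - 1 and hop(j) not in protected:
            j += 1
        banned = {hop(k) for k in range(i, j)} | protected
        detour = _detour(adj, primary[i], primary[j], banned)
        if detour is None:
            return None
        result.extend(detour[1:])
        i = j
    return result


# Topology of the example: only A-B and C-D support 1+1 span protection.
ADJ = {"A": {"B"}, "B": {"A", "M", "N"}, "M": {"B", "C"},
       "N": {"B", "C"}, "C": {"M", "N", "D"}, "D": {"C"}}
PROTECTED = {frozenset({"A", "B"}), frozenset({"C", "D"})}
print(secondary_path(["A", "B", "M", "C", "D"], ADJ, PROTECTED))
```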
2.2.2 M:N Path Protection
There are a number of M:N path protection variations that may be
implemented to provide different levels of protection and to address
different network configurations. The most common notion of M:N
path protection is to route N node-disjoint primary paths and pre-
establish M backup paths that are node disjoint from the primary
paths. This protects against M path failures. Another variation of
M:N path protection is to select a single path where each link
individually supports M:N span protection. This protects against M
link failures over each span, but is not robust to node failures.
One may also combine the two approaches by ensuring that for every
contiguous segment of the path that includes only the links that
don't support M:N span protection, the head-end node computes
a node- or link-disjoint segment, with the constraint that none of
the links in the newly computed segments need to be protected.
An important feature of the GMPLS work is that it allows pre-
configuring secondary (backup) LSPs to protect primary LSPs. This
is done by indicating the LSP is of type Secondary in the protection
field of the Generalized Label Request. Secondary LSPs are used for
fast switchover when primary LSPs fail. Although the resources for
the secondary LSPs are pre-allocated, lower priority traffic may use
the resources with the caveat that the lower priority traffic will
be preempted if the primary LSP fails. If lower priority traffic is
using resources along the secondary LSPs, the end nodes may need to
be notified of the failure in order to complete the switchover.
The setup of the primary LSP SHOULD indicate that the LSP initiator
and terminator wish to receive Notify messages using the Notify
Request object. If a failure occurs, LMP can be used to isolate the
failure. Once the failure is isolated, the upstream node (upstream
in terms of the direction an RSVP Path message traverses) SHOULD
send an RSVP Notify message to the LSP initiator, and the downstream
node SHOULD send an RSVP Notify message to the LSP terminator. Upon
receipt of the Notify messages, the source and destination nodes
MUST switch the traffic from the primary LSP to the pre-configured
secondary LSP. Note that if a common initiator-terminator is used
for all N primary paths sharing the secondary path (assuming 1:N
protection), no further notification is required to indicate that
the N primary LSPs are no longer protected.
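The notification step can be sketched as follows: once the failed link is isolated, the node at its upstream end notifies the LSP initiator, the node at its downstream end notifies the LSP terminator, and both end nodes then move traffic to the secondary LSP. The function names and the dictionary used to track the active LSP are hypothetical; only the roles come from the text.

```python
from typing import Dict, List, Sequence, Tuple


def notify_plan(primary: Sequence[str],
                failed_link: Tuple[str, str]) -> List[Tuple[str, str]]:
    """Return (sender, receiver) pairs for the RSVP Notify messages.

    `failed_link` is (upstream, downstream) in the Path-message direction.
    """
    if failed_link not in list(zip(primary, primary[1:])):
        raise ValueError("failed link is not on the primary LSP")
    upstream, downstream = failed_link
    return [(upstream, primary[0]), (downstream, primary[-1])]


def on_notify(active: Dict[str, str], node: str) -> None:
    """At an end node, switch traffic from the primary to the secondary LSP."""
    active[node] = "secondary"
```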
As an example consider the following:
      A---B        E---F
     /     \      /     \
  I---      C----D       ---T
     \     /      \     /
      J---K        L---M
Two node-disjoint routes from initiator I to terminator T cannot be
found; however, two node-disjoint routes can be found from node I to
node C and from node D to node T. Furthermore, the link from node C
to node D is protected using dedicated 1:1 protection. In this
case, I computes the primary route R1={I,A,B,C,D,E,F,T} and
secondary route R2={I,J,K,C,D,L,M,T} where the segment {C,D}
supports 1:1 span protection. I initiates a setup of two LSPs
indicating the desire for Shared Explicit reservations; the primary
LSP is routed along R1 and the secondary LSP is routed along R2.
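A quick consistency check of this example: the two routes should share only the end points and the nodes of the 1:1-protected span, and C-D should be the only common link. A minimal Python sketch with hypothetical helper names:

```python
from typing import FrozenSet, Sequence, Set

R1 = ["I", "A", "B", "C", "D", "E", "F", "T"]
R2 = ["I", "J", "K", "C", "D", "L", "M", "T"]


def shared_nodes(r1: Sequence[str], r2: Sequence[str]) -> Set[str]:
    """Nodes that appear on both routes."""
    return set(r1) & set(r2)


def shared_links(r1: Sequence[str], r2: Sequence[str]) -> Set[FrozenSet[str]]:
    """Undirected links that appear on both routes."""
    links1 = {frozenset(hop) for hop in zip(r1, r1[1:])}
    links2 = {frozenset(hop) for hop in zip(r2, r2[1:])}
    return links1 & links2


print(sorted(shared_nodes(R1, R2)))                     # ['C', 'D', 'I', 'T']
print([sorted(link) for link in shared_links(R1, R2)])  # [['C', 'D']]
```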
3. Restoration Mechanisms
Restoration is designed to react to failures quickly and use
bandwidth efficiently, but typically involves dynamic resource
establishment and route calculation, and therefore, takes more time
to switch to an alternate path than protection techniques.
Restoration can be implemented at the initiator node or at an
intermediate node once the responsible node has been notified.
Failure notification can be done using the Notify procedures of
[GMPLS] or using the standard RSVP PathError messages. In this
section, we briefly discuss span and path restoration and highlight
the RSVP mechanisms that can be used to implement them.
To support span restoration, where traffic is switched to an
alternate route around a failure, a new LSP is established at an
intermediate node that involves passing through additional
intermediate nodes. Span restoration may be beneficial for LSPs
that span multiple hops and/or large distances because the latency
incurred for failure notification may be significantly reduced and
only segments of the LSP are rerouted instead of the entire path.
The RSVP Notify Request object can be used by an intermediate node
to request that it be the target of an RSVP Notify message. Span
restoration may break traffic-engineering (TE) requirements if a
strict-hop route is defined for the connection. Furthermore, the
constraints used for routing the connection must be forwarded so
that an intermediate node doing span restoration is able to
calculate an appropriate alternate route; this is similar to the
problems encountered when establishing or maintaining TE requirements
that span multiple areas (see [MULTI] for a proposed mechanism).
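Span restoration can be sketched as splicing a locally computed detour into the existing route. The Python sketch below assumes, hypothetically, that the node upstream of the failed link knows the topology and rejoins the LSP at the node immediately downstream of the failure. It avoids the other nodes of the LSP to prevent loops, but does not apply the forwarded TE constraints discussed above. All names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Set, Tuple


def span_restore(path: Sequence[str], adj: Dict[str, Set[str]],
                 failed: Tuple[str, str]) -> Optional[List[str]]:
    """Reroute only the failed hop, keeping the rest of `path` unchanged."""
    up, down = failed
    i = list(path).index(up)
    if i + 1 >= len(path) or path[i + 1] != down:
        raise ValueError("failed link is not a hop of the LSP")
    avoid = set(path) - {up, down}    # other LSP nodes: avoid loops
    prev: Dict[str, Optional[str]] = {up: None}
    queue = deque([up])
    while queue:
        node = queue.popleft()
        if node == down:
            detour: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                detour.append(cur)
                cur = prev[cur]
            detour.reverse()          # up ... down
            return list(path[:i]) + detour + list(path[i + 2:])
        for nxt in sorted(adj.get(node, set())):
            if {node, nxt} == {up, down}:
                continue              # skip the failed link itself
            if nxt in avoid or nxt in prev:
                continue
            prev[nxt] = node
            queue.append(nxt)
    return None
```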
Path restoration, on the other hand, switches traffic to an
alternate route around a failure, where the new route is selected at
the LSP initiator and may reuse intermediate nodes used by the
original LSP and it may include additional intermediate nodes. For
strict-hop routing, TE requirements can be directly applied to the
route calculation, and the failed node or link can be avoided.
However, if the failure occurred within a loose-routed hop, the
source node may not have enough information to reroute the
connection around the failure.
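With a strict-hop route, the initiator knows exactly which node or link failed and can exclude it from a fresh computation. The Python sketch below uses Dijkstra's algorithm over hypothetical TE metrics; the graph encoding and function name are assumptions, and other TE constraints are omitted. It cannot help when the failure lies inside a loose hop, since the initiator then does not know which resource to exclude.

```python
import heapq
from typing import AbstractSet, Dict, FrozenSet, List, Optional, Set, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbor: TE metric}


def path_restore(graph: Graph, src: str, dst: str,
                 failed_nodes: AbstractSet[str] = frozenset(),
                 failed_links: AbstractSet[FrozenSet[str]] = frozenset()
                 ) -> Optional[List[str]]:
    """Recompute the end-to-end route at the initiator, avoiding failures."""
    dist: Dict[str, int] = {src: 0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[int, str]] = [(0, src)]
    done: Set[str] = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue                  # stale heap entry
        done.add(node)
        if node == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for nxt, cost in graph.get(node, {}).items():
            if nxt in failed_nodes or frozenset((node, nxt)) in failed_links:
                continue              # exclude the failed resources
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return None
```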
Restoration (span or path) will be initiated by the node that has
isolated the failure or by the node that has received either an RSVP
Notify message or an RSVP Path Error message indicating that a
failure has occurred. The new resources can be established in a
make-before-break fashion, where the new LSP is set up before the old
LSP is torn down, using the mechanisms of the LSP_Tunnel Session
object (see [TUNNEL]) and the Shared-Explicit reservation style.
Both the new and old LSPs share resources at nodes common to both
LSPs. The Tunnel end point addresses, Tunnel Id, Extended Tunnel
Id, Tunnel sender address, and LSP Id are all used to uniquely
identify both the old and new LSPs; this ensures new resources are
established without double counting resource requirements along
common segments.
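The effect of the Shared-Explicit style on resource accounting can be sketched as follows: on a link used by both the old and the new LSP of the same session, one reservation is shared rather than two being summed. The sketch is a simplification under stated assumptions: the session is keyed by the tunnel end point, Tunnel Id, and Extended Tunnel Id; the LSP Id distinguishes the old and new LSPs; and a shared reservation is taken to be the largest request among the session's LSPs on that link. The class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple

Session = Tuple[str, int, str]  # (tunnel end point, Tunnel Id, Extended Tunnel Id)


@dataclass(frozen=True)
class Lsp:
    session: Session
    sender: str               # tunnel sender address
    lsp_id: int               # distinguishes old and new LSPs of one tunnel
    route: Tuple[str, ...]
    bandwidth: int


def reserved_per_link(lsps: Iterable[Lsp]) -> Dict[FrozenSet[str], int]:
    """Bandwidth reserved per link when every session uses SE style.

    LSPs of one session share a single reservation on each link (assumed to
    be the largest request), so old and new LSPs are not double counted.
    """
    shared: Dict[Tuple[FrozenSet[str], Session], int] = defaultdict(int)
    for lsp in lsps:
        for a, b in zip(lsp.route, lsp.route[1:]):
            key = (frozenset((a, b)), lsp.session)
            shared[key] = max(shared[key], lsp.bandwidth)
    totals: Dict[FrozenSet[str], int] = defaultdict(int)
    for (link, _session), bw in shared.items():
        totals[link] += bw    # different sessions do add up
    return dict(totals)
```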
4. Routing Enhancements
The GMPLS extensions to OSPF [OSPF-GE] and IS-IS [ISIS-GE] include
the advertisement of the LPT. The LPT field is a bit vector that
indicates the protection capabilities that are supported for the
link. The LPT field may be configured with Dedicated 1+1, Dedicated
1:1, Shared M:N, and Enhanced protection, as well as Unprotected.
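A minimal sketch of the LPT as a bit vector. This draft does not assign bit values; the ones below are those later assigned to the Link Protection Type field in RFC 3471 and are used here only as an assumption for illustration. The class and helper names are hypothetical.

```python
from enum import IntFlag


class Lpt(IntFlag):
    """Link Protection Type bits (values assumed from RFC 3471)."""
    EXTRA_TRAFFIC = 0x01
    UNPROTECTED = 0x02
    SHARED = 0x04
    DEDICATED_1_TO_1 = 0x08
    DEDICATED_1_PLUS_1 = 0x10
    ENHANCED = 0x20


def supports(advertised: int, wanted: Lpt) -> bool:
    """True when every bit of `wanted` is set in the advertised LPT value."""
    return (int(advertised) & int(wanted)) == int(wanted)


# A link advertising both shared M:N and dedicated 1:1 capability.
link_lpt = Lpt.SHARED | Lpt.DEDICATED_1_TO_1
print(hex(link_lpt))                           # 0xc
print(supports(link_lpt, Lpt.DEDICATED_1_PLUS_1))  # False
```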
For a link that has dedicated 1+1 protection or is unprotected, this
advertisement provides a complete description of the link
capabilities and the usable bandwidth. However, a key argument for
using dedicated 1:1 or shared M:N is the efficiency gained by
reusing the protection bandwidth for lower priority traffic when the
bandwidth would otherwise be idle.
To advertise the protection bandwidth for a link that has dedicated
1:1 or shared M:N protection, a link with LPT field Extra Traffic
should be advertised. This indicates that bandwidth can be used by
LSPs, with the caveat that any LSPs routed over this link will be
preempted if the resources are needed as a result of a failure over
the primary link.
When a failure occurs on a dedicated 1:1 or shared M:N link, the
LSPs routed over the link will automatically be switched to the
Extra Traffic link that is protecting it.
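The preemption rule can be sketched as follows. The model is hypothetical: an Extra Traffic link records which lower priority LSPs currently use it, and when the primary link it protects fails, those LSPs are preempted and the failed link's LSPs take over. A real node might preempt only as much bandwidth as the switched-over LSPs need; this sketch preempts everything for simplicity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExtraTrafficLink:
    """Protection link advertised with LPT Extra Traffic (hypothetical model)."""
    name: str
    extra_traffic: List[str] = field(default_factory=list)   # preemptable LSPs
    protected_lsps: List[str] = field(default_factory=list)  # switched-over LSPs


def fail_over(primary_lsps: List[str], link: ExtraTrafficLink) -> List[str]:
    """Switch the failed primary link's LSPs onto `link`.

    Returns the lower priority LSPs that had to be preempted.
    """
    preempted = list(link.extra_traffic)
    link.extra_traffic.clear()
    link.protected_lsps.extend(primary_lsps)
    return preempted
```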
To support the routing of Secondary LSPs for M:N path protection (as
described in Section 2.2.2), new extensions must be added to the
current GMPLS routing extensions. In particular, there must be a
mechanism to advertise secondary bandwidth and processing rules must
be defined for bandwidth accounting when LSP requests arrive at a
node. See [BWAcct] for a proposal addressing these issues.
5. Acknowledgments
We would like to thank Kireeti Kompella and Ayan Banerjee for their
comments and fruitful discussions.
6. References
[RFC2026] Bradner, S., "The Internet Standards Process -- Revision
3," BCP 9, RFC 2026, October 1996.
[GMPLS] Ashwood-Smith, P., Banerjee, A., Berger, L., et al,
"Generalized MPLS - signaling functional description,"
Internet Draft, draft-ietf-mpls-generalized-mpls-
signaling-01.txt, (work in progress).
[OLCP] Chiu, A., Strand, J., Tkach, R., "Unique Features and
Requirements for The Optical Layer Control Plane," Internet
Draft, draft-chiu--strand-unique-OLCP-01.txt, (work in
progress).
[LMP-DWDM] Fredette, A., Snyder, E., Shantigram, J., et al, "Link
Management Protocol (LMP) for WDM Transmission Systems,"
Internet Draft, draft-fredette-lmp-wdm-00.txt, (work in
progress).
[LMP] Lang, J. P., Mitra, K., Drake, J., Kompella, K., et al,
"Link Management Protocol (LMP)," Internet Draft,
draft-ietf-mpls-lmp-01.txt, (work in progress).
[RSVP-GEN] Ashwood-Smith, P., Banerjee, A., Berger, L., et al,
"Generalized MPLS Signaling - RSVP-TE Extensions," Internet
Draft, draft-ietf-mpls-generalized-rsvp-te-00.txt, (work
in progress).
[TUNNEL] Awduche, D., Berger, L., Gan, D-H., Li, T., Srinivasan,
V., Swallow, G., "RSVP-TE: Extensions to RSVP for LSP
Tunnels," Internet Draft, draft-ietf-mpls-rsvp-lsp-tunnel-07.txt,
(work in progress).
[MULTI] Kompella, K., Rekhter, Y., "Multi-area MPLS Traffic
Engineering," Internet Draft,
draft-kompella-mpls-multiarea-te-00.txt, (work in progress).
[OSPF-GE] Kompella, K., Rekhter, Y., Banerjee, A., Drake, J., et al,
"OSPF Extensions in Support of MPL(ambda)S," Internet
Draft, draft-kompella-ospf-ompls-extensions-00.txt, (work
in progress).
[ISIS-GE] Kompella, K., Rekhter, Y., Banerjee, A., Drake, J., et al,
"IS-IS Extensions in Support of Generalized MPLS,"
Internet Draft, draft-ietf-isis-gmpls-extensions-01.txt,
(work in progress).
[BWAcct] Kompella, K., Lang, J.P., Drake, J., "Bandwidth Accounting
in Support of Secondary LSPs," Internet Draft, (work in
progress).
7. Author's Addresses
Jonathan P. Lang
Calient Networks
25 Castilian Drive
Goleta, CA 93117
email: jplang@calient.net

John Drake
Calient Networks
5853 Rue Ferrari
San Jose, CA 95138
email: jdrake@calient.net

Yakov Rekhter
Juniper Networks
1194 N. Mathilda Avenue
Sunnyvale, CA 94089
email: yakov@juniper.net