IETF Draft Changcheng Huang
Multi-Protocol Label Switching Vishal Sharma
Expires: September 2000 Srinivas Makam
Ken Owens
Tellabs
March 2000
A Path Protection/Restoration Mechanism for MPLS Networks
<draft-chang-mpls-path-protection-00.txt>
Status of this memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Abstract
To deliver reliable service, multi-protocol label switching (MPLS)0
requires a set of procedures to provide protection of the traffic
carried on the label switched paths (LSPs). This imposes certain
requirements on the path recovery process and procedures 0, and
requires signaling support for: the configuration of working and
protection paths, the communication of fault information, and
appropriate switchover action. This document specifies a mechanism
for path protection switching and restoration in MPLS.
Table of Contents
1.0 Introduction
2.0 Core Path Protection Components
2.1 Reverse Notification Tree (RNT)
2.2 Protection Domain
2.2.1 Relationship between protection domains with different RNTs
2.2.2 Relationship between protection domains with the same RNT
2.3 Multiple Faults
2.4 Timers and Thresholds
3.0 Configuration
3.1 Establishing a Recovery/Protection Path
3.2 Creating the RNT
3.3 Engineering a Protection Domain
3.4 Configuring Timers
4.0 Fault Detection
4.1 Unidirectional Link Fault
4.1.1 Downlink Fault
4.1.2 Uplink Fault
4.2 Bi-directional Link Fault or Node Fault
5.0 Fault Notification
6.0 Switch Over
7.0 Switch Back
8.0 Security considerations
9.0 Acknowledgement
10.0 Intellectual Property Consideration
11.0 Authors' Addresses
12.0 References
1.0 Introduction
With the migration of real-time and high-priority traffic to IP
networks, and with the need for IP networks to increasingly carry
mission-critical business data, network survivability has become
critical for future IP networks. Current routing algorithms, despite
being robust and survivable, can take a substantial amount of time,
on the order of several seconds to minutes, to recover from a failure,
causing serious disruption of service in the interim. This is
unacceptable for many applications that require a highly reliable
service, and has motivated network providers to give serious
consideration to the issue of network survivability.
Path-oriented technologies, such as MPLS, can be used to support
advanced survivability requirements and enhance the reliability of IP
networks. Different from legacy IP networks, MPLS networks pre-
establish label switched paths (LSPs), with packets with the same
label following the same path. This potentially allows MPLS to pre-
establish protection LSPs for working LSPs, and achieve better
protection switching times than legacy IP networks.
This contribution describes an MPLS path recovery mechanism that can
facilitate fast protection switching. The mechanism supports both 1:1
and bypass tunneling, and contains timers to enable it to inter-work
with protection mechanisms at other layers. Some of the key features
of this protection mechanism are:
- A liveness message to detect faults.
Our assumption is that faults fall into different classes, and that
different faults may be detected and corrected by different layers.
Some faults (for example, the loss of signal or transmitter faults)
may be detected and corrected by lower layer mechanisms (such as
SONET), while others (for example, failure of the reverse link) may
be detected (but may not be corrected) by lower layers and may be
communicated to the MPLS layer. Still other faults (such as node
failures or faults on the reverse link) may not be detected by lower
layers, and will have to be detected and corrected at the MPLS layer.
Therefore, we adopt the liveness message as a complementary fault
detection mechanism.
- Special tree structure to distribute fault and/or recovery
information.
Existing published proposals for MPLS recovery have not addressed
the issue of fault notification in detail. Specifically, none of
these proposals has discussed how to perform fault notification for
the label merging case. In this draft, we propose a new fault
notification structure, called the reverse notification tree (RNT),
which makes fault notification more efficient and scalable (we
provide details of the RNT in subsequent sections).
- Ability to permit recovery mechanisms at different layers to
coexist.
In the evolving IP network infrastructure, recovery will increasingly
be possible at different layers, and the interworking of recovery
mechanisms at different layers will be needed to ensure smooth
network operation in the presence of faults. For example, optical
layer or SONET layer recovery mechanisms could be used to recover
optical paths or SONET channels. However, these mechanisms may
initially be limited to ring topologies and may not provide the right
level of granularity at which recovery might be desired, making MPLS
layer protection desirable.
While MPLS-layer protection can be faster than IP layer rerouting, it
may be more costly to use (it may need to reserve extra bandwidth, for
example). In certain cases it may become too complicated for MPLS
protection to be effective (for example, when there are multiple
faults, or faults on both the working and protection paths), making
it necessary for IP layer rerouting to take over. In this draft,
we assume that MPLS layer protection is used first, and, if
necessary, recovery may be handed over to IP layer rerouting following
that.
- Lightweight notification mechanism.
Reliable transport mechanisms, such as TCP, are typically state-
oriented and therefore difficult to scale. It is also very difficult
to support point-to-multipoint communications based on reliable
transport mechanisms. In our scheme, therefore, we use a stateless
notification mechanism to achieve scalability.
- Minimize delays of a recovery cycle.
In the mechanism proposed in this draft we attempt to minimize the
duration of the recovery cycle. To minimize the notification delay
we use a stateless transport mechanism together with high priority
for the control traffic. We also use a simple label merging approach
to handle the traffic on the working and protection paths, thereby
eliminating the need for synchronization (or handshaking) between the
LSRs at the two ends of a recovery path.
2.0 Core MPLS Path Protection Components
This document assumes the terminology given in 0,0 and 0, and
introduces some additional terms.
- Path Switch LSR (PSL)
An LSR that is the transmitter of both the working path traffic and
its corresponding recovery path traffic. The PSL is responsible for
switching of the traffic between the working path and the recovery
path. The PSL is the origin of the recovery path, but may or may not
be the origin of the working path (that is, the working path may be
transiting the PSL).
- Path Merge LSR (PML)
An LSR that receives both working path traffic and its corresponding
recovery path traffic, and either merges their traffic into a single
outgoing path, or, if it is itself the destination, passes the
traffic on to the higher layer protocols. The PML is the destination
of the recovery path, but may or may not be the destination of the
working path (that is, the working path may be transiting through the
PML).
- Intermediate LSR
An LSR on a working or recovery path that is neither a PSL nor a PML
for that path.
- Working Path
A working path is denoted by the sequence of LSRs through which it
passes. For example, in Fig. 1, the working path that starts at LSR 1
and terminates at LSR 7 is denoted by (1-2-3-4-6-7).
- Recovery Path
A recovery path is also denoted by the sequence of LSRs through which
it passes. Again, in Fig. 1, the recovery path that starts at LSR 1
and terminates at LSR 7 is denoted by (1-5-7).
2.1 Reverse Notification Tree (RNT)
Since LSPs are unidirectional entities and recovery requires the
notification of faults to the LSR responsible for switchover to the
recovery path, a mechanism must be provided for the fault indication
and the fault recovery notification to travel from the point of
occurrence of the fault back to the PSL(s). The situation is
complicated with label merging, because in this case multiple working
paths converge to form a multipoint-to-point tree, with the PSLs as
the leaves and the PML as the root. In this case, therefore, the
fault indication and recovery notification should be able to travel
along a reverse path of the working path to all the PSLs affected by
the fault. Such a path is provided by the reverse notification tree
(RNT), which is a point-to-multipoint tree rooted at the PML that is
an exact mirror image of the converged multipoint-to-point working
paths, along which the FIS and the FRS travel (see Fig. 1). There are
several advantages to having an RNT:
- The RNT can be established in association with the working path,
simply by making each LSR along a working path remember its upstream
neighbor (or the collection of upstream neighbors whose working paths
converge at the LSR and exit as one). Thus, no multicast routing is
required. We elaborate more on the RNT in Section 3.
- Only one RNT is required for all the working paths that merge to
form the multipoint-to-point forward path. The RNT is rooted at the
PML and terminated at the PSLs. All intermediate LSRs on the
converged working paths share the same RNT.
Therefore, the RNT enables a reduction in the signaling overhead
associated with recovery. Unlike schemes that treat each LSP
independently, and require signaling between a PSL and the PML for
each LSP individually, the RNT allows for only one (or a small
number of) signaling messages on the shared segments of the LSPs.
- The RNT can be implemented either at Layer 3 or at Layer 2. In
either case, the delay along the RNT needs to be carefully
controlled. This may be ensured by giving the highest priority to the
fault and recovery notification packets, which travel along the RNT.
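As an illustration of the first advantage above, the following Python
sketch derives an RNT purely from the upstream-neighbor information
that each LSR remembers during working path setup. The function name
build_rnt and the dictionary representation are assumptions made for
this example and are not part of the mechanism; the node numbers are
those of the two converging working paths of Fig. 1.

```python
from collections import defaultdict

def build_rnt(working_paths):
    """Return {lsr: set of upstream neighbors} for converging working paths.

    Each working path is a sequence of LSR identifiers ordered from the
    PSL to the PML. (Hypothetical representation, for illustration only.)
    """
    upstream = defaultdict(set)
    for path in working_paths:
        # Each LSR remembers the neighbor that precedes it on the path.
        for prev_hop, lsr in zip(path, path[1:]):
            upstream[lsr].add(prev_hop)
    return dict(upstream)

rnt = build_rnt([[1, 2, 3, 4, 6, 7], [9, 3, 4, 6, 7]])
# A notification entering at LSR 3 fans out to both upstream neighbors.
print(sorted(rnt[3]))
```

In this sketch LSR 3 records LSRs 2 and 9 as upstream neighbors, while
the PSLs (LSRs 1 and 9) have no entry of their own, which matches their
role as leaves of the tree.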
2.2 Protection Domain
The protection domain is defined by the set of LSRs over which the
working path and its corresponding recovery path are routed. Thus, a
protection domain is bounded by the LSRs that provide the switching
and merging functions for MPLS protection, namely, the PSL and the
PML, respectively. The PSL and the PML are identified during the
setting up of an LSP, either via an offline algorithm or an algorithm
that runs at the head-end of an LSP to decide the specific nodes that
the LSP must pass through. (Note that segments of the LSP between the
PSL and the PML may be loosely-explicitly routed, as long as the PSL
and PML are known). Recovery should ideally be performed between the
source and destination (end-to-end), but in some cases segment
recovery may be desired (for example, when certain segments are more
unreliable than others) or may be the only option (due to the topology
of the network, see Fig. 1). For example, in Fig. 1, the working path
9-3-4-6-7 can only have protection on the segment 9-3-4-6-7.
Note that when multiple LSPs merge into a single LSP, the working
paths corresponding to these LSPs also converge. As explained in
Section 2.1, an RNT is needed in this case for propagating the
failure and recovery notification back to the concerned PSL(s). We
can therefore have a situation where different protection domains
share a common RNT. A protection domain is denoted by specifying
the working path and the recovery path. For example, in Fig. 1, the
protection domain bounded by LSR 1 and LSR 7 is denoted by
(1-2-3-4-6-7, 1-5-7).
Figure 1: Illustration of MPLS protection configuration.
2.2.1 Relationship between protection domains with different RNTs
When protection domains have different RNTs, two cases may arise,
depending on whether or not any portions of the two domains overlap,
that is, have a node or link in common. If the protection domains do
not overlap, the protection domains are independent (note that by
virtue of the RNTs in the two domains being different, neither the
working paths nor the RNTs in the two domains can overlap). In other
words, failures in one domain do not interact with failures in the
other domain. For example, the protection domain defined by (9-3-4-6-
7, 9-10-7) is completely independent of the domain defined by (11-13-
5-15, 11-13-14-15). As a result, as long as faults occur in
independent domains, the network shown in Fig. 1 can tolerate
multiple faults (for example, simultaneous failures on the working
path in each domain).
If protection domains with different RNTs overlap, it is still the
case that failures on the working paths of the two domains do not
affect one another. However, failures on the protection path of one
may affect the working path of the other and vice versa. For example,
the protection domain defined by (1-2-3-4-6-7, 1-5-7) is not
independent of the domain defined by (11-13-5-15, 11-13-14-15) since
LSR 5 lies on the protection path of the former domain and on the
working path of the latter domain.
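The overlap test described above can be sketched in a few lines of
Python. The helper names domain_nodes and shared_nodes are hypothetical;
each protection domain is written as a (working path, recovery path)
pair using the node numbers of Fig. 1. Since two domains that share a
link also share its end nodes, comparing node sets is sufficient here.

```python
def domain_nodes(domain):
    """All LSRs on the working and recovery paths of a protection domain."""
    working, recovery = domain
    return set(working) | set(recovery)

def shared_nodes(domain_a, domain_b):
    """LSRs common to two domains; an empty set means they are independent."""
    return domain_nodes(domain_a) & domain_nodes(domain_b)

d_1_7 = ((1, 2, 3, 4, 6, 7), (1, 5, 7))
d_9_7 = ((9, 3, 4, 6, 7), (9, 10, 7))
d_11_15 = ((11, 13, 5, 15), (11, 13, 14, 15))

print(shared_nodes(d_9_7, d_11_15))   # independent domains
print(shared_nodes(d_1_7, d_11_15))   # overlap through LSR 5
```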
2.2.2 Relationship between protection domains with the same RNT
When protection domains have the same RNT, different failures along
the working paths may affect the two domains differently. As shown in
Fig. 1, for example, working paths 1-2-3-4-6-7 and 9-3-4-6-7 share
the same RNT. As a result, for a failure on some segment of the
working path, both domains will be affected, resulting in a
protection switch in both (for example, the segment 3-4-6-7 in Fig.
1). Likewise, for failures on other segments of the working path,
only one domain may be affected (for example, failure on segment 2-3
affects only the first working path 1-2-3-4-6-7, whereas failure on
the segment 9-3 affects only the second working path 9-3-4-6-7).
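The following Python sketch reproduces the examples above by listing
which working paths traverse a failed link. The function names
path_links and affected_paths are assumptions introduced for
illustration; links are written as (upstream LSR, downstream LSR)
pairs.

```python
def path_links(path):
    """Directed links of a working path, as (upstream, downstream) pairs."""
    return set(zip(path, path[1:]))

def affected_paths(working_paths, failed_link):
    """Working paths that traverse the failed link."""
    return [p for p in working_paths if failed_link in path_links(p)]

paths = [(1, 2, 3, 4, 6, 7), (9, 3, 4, 6, 7)]

print(affected_paths(paths, (2, 3)))  # only the first working path
print(affected_paths(paths, (9, 3)))  # only the second working path
print(affected_paths(paths, (4, 6)))  # shared segment: both paths
```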
2.3 Multiple Faults
We note that transferring the working traffic to the recovery path is
enough to take care of multiple faults on the working path. However,
if multiple faults happen such that there is at least one failure on
both the working and recovery paths, MPLS layer recovery may no
longer suffice. In this case, the PSL will either have to allow for
Layer 3 rerouting or inform the administrator via an alarm, thus
enabling the manual reconfiguration of a different working and
backup path. Note that for the PSL to be able to generate an alarm,
it must have a mechanism for detecting multiple faults on the
recovery path, such as an RNT for the recovery path (to allow for the
fault notification on the recovery path to be propagated to the PSL).
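The decision described in this section can be summarized by the
following Python sketch. The function name and the returned strings are
hypothetical, and the behavior when only the recovery path has faults
(traffic simply stays on the working path) is an assumption, since the
text above does not address that case explicitly.

```python
def psl_recovery_action(working_faults, recovery_faults):
    """Choose the PSL's action from the fault counts on each path."""
    if working_faults and recovery_faults:
        # MPLS layer recovery no longer suffices.
        return "layer-3-reroute-or-alarm"
    if working_faults:
        # Any number of faults on the working path is handled by one switch.
        return "switch-to-recovery-path"
    # Assumption: no action is needed while the working path is healthy.
    return "stay-on-working-path"

print(psl_recovery_action(working_faults=3, recovery_faults=0))
print(psl_recovery_action(working_faults=1, recovery_faults=1))
```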
2.4 Timers and Thresholds
For its proper operation, the protection mechanism described in this
contribution uses the following timers and thresholds:
Timer or          Symbol  Function
Threshold

Inter FIS         t1      Interval at which successive FIS
packet timer              packets are transmitted by an LSR to
                          its upstream neighbor.

Max. FIS          t2      Max. time for which FIS packets are
duration timer            transmitted by an LSR to its upstream
                          peer.

Inter FRS         t1'     Interval at which successive FRS
packet timer              packets are sent by an LSR to its
                          upstream neighbor.

Max. FRS          t2'     Max. time for which the FRS packets
duration timer            are sent by an LSR to its upstream
                          neighbor.

Protection        t3      Time interval between receipt of a
switching                 protection switch trigger and the
dampening timer           initiation of the protection switch,
                          thereby allowing traffic on the
                          working path (downstream of the
                          fault) to clear out.

Restoration       t3'     Time interval between the initiation
dampening timer           of the restoration switch and the
                          actual flow of data on the working
                          path, thereby allowing traffic on the
                          recovery path to clear out.

Liveness msg.     t4      Interval at which successive liveness
sender timer              messages are sent by an LSR to peer
                          LSRs that have a working path (and
                          RNT) through this LSR.

Liveness msg.     t4'     A timer set to count down the
receiver                  interval at the end of which a
timeout timer             liveness message should be received.

Hold-off Timer    T2      Interval between the detection of a
0                         failure at an LSR, and the generation
                          of the first FIS message, to allow
                          time for lower layer protection to
                          take effect.

Wait-to-Restore   T8      Interval between the detection of a
Timer 0                   recovery/failure at an LSR, and the
                          generation of the first FRS message,
                          to allow time for the stability of
                          restoration.

Lost liveness     K       No. of liveness messages that can be
message                   lost before an LSR will declare a
threshold                 fault and generate the first FIS.

Table 1. Timers and Thresholds
3.0 Configuration
In the following sections, we describe the operation of the path
protection mechanism, and explain the various steps involved with
reference to Fig. 1.
Protection configuration consists of two aspects: establishing the
protection path and creating the reverse notification tree.
3.1 Establishing a Recovery/Protection Path
The establishment of the recovery path requires the identification of
the working path, and hence the protection domain. In most cases, the
working path and its corresponding recovery path would be specified
during LSP setup, either via a path selection algorithm (running at a
centralized location or at an ingress LSR) or via administrative
configuration. Observe that the specification of the path does not,
strictly speaking, require the entire path to be explicitly
specified. Rather, it requires only that the PSL and PML be
specified, with the segments between them being loosely routed, if
required. In other words, the path would be established between the
two nodes at the boundaries of the protection domain (namely, the PSL
and the PML) via explicit (or source) routing using LDP 0, 0/RSVP 0,
0 signaling (alternatively, via constraint-based routing (with the
requirement that the path pass through the PSL and the PML), or using
manual configuration). The signaling would be used to specify both
PMTPs and working paths, where the working paths could span either an
entire LSP or a segment of an LSP.
Ingress   Ingress    Egress    Egress     Egress    Egress
Label of  Interface  Label of  Interface  Label of  Interface
RNT       of RNT     RNT       of RNT     RNT       of RNT

N43       I34        N32       I23        N39       I93
Table 2. An example inverse cross-connect table for LSR 3 using MPLS
(Layer 2) RNT
Egress    Egress      Next Hop    Egress     Next Hop    Egress
Label of  Interface   IP Address  Interface  IP Address  Interface
Working   of Working  of RNT      of RNT     of RNT      of RNT
Path      Path

L34       I34         I2          I23        I9          I93
Table 3. An example inverse cross-connect table for LSR 3 using hop-
by-hop (Layer 3) RNT
The roles of the various core protection/recovery components are:
PSL: The PSL initiates the working LSP and the recovery LSP. It is
also responsible for storing information about which LSPs (or
portions thereof) have protection enabled, and for maintaining a
binding between outgoing labels specifying the working path and the
protection/recovery path. The latter enables the switchover to the
recovery path upon the receipt of a protection switch trigger. The
PSL also maintains the timers t3, t3', t4, t4', T2, T8, and the
threshold K.
PML: The PML participates in the setting up of a recovery path as a
merging LSR. Therefore, it learns during signaling (or configuration)
about which working and protection paths are merged to the same
outgoing LSP. The PML also maintains timers t1, t1', t2, t2', t4, t4',
T2, T8, and the threshold K.
Intermediate LSR: An intermediate LSR participates in the setup of
the recovery path, either as a normal LSR or as a merging LSR. It
also maintains timers t1, t1', t2, t2', t4, t4', T2, T8, and the
threshold K.
3.2 Creating the RNT
The RNT is used for propagating the FIS and the FRS, and can be
created by a simple extension to the LSP setup process. During the
establishment of the working path, the signaling message carries with
it the identity (address) of the upstream node that sent it (for
example, via the path attribute in RSVP). Each LSR along the path
simply remembers the identity of its immediately prior upstream
neighbor on each incoming link. Through the neighbor discovery
mechanism of the routing protocol, each LSR finds the interface
connecting it to an upstream LSR. (It is assumed in this draft that
there is a bi-directional connection between two neighboring LSRs,
such as a bi-directional SONET link, a bi-directional lower layer
network link (e.g., an ATM VP), or a pair of bi-directional tunnels
over an IP subnetwork.) The node then creates an "inverse"
cross-connect table that, for each protected outgoing LSP, maintains a
list of the incoming LSPs that merge into that outgoing LSP, together
with the identity of the upstream node and incoming interface that
each incoming LSP comes through. Upon receiving an FIS, an LSR
extracts the labels contained in it (which are the labels of the
protected LSPs that use the outgoing link that the FIS was received
on) and checks whether the current LSR is the PSL for that LSP. If it
is, it terminates the FIS. Otherwise, it consults its inverse
cross-connect table to determine the identity of the upstream nodes
that the protected LSPs come from, and creates and transmits an FIS to
each of them.
Therefore, based on whether the RNT is implemented at Layer 3 or
Layer 2, two cases arise:
If the RNT is implemented by a point-to-multipoint LSP, then the
working path can be bound to the ingress label and interface of the
RNT LSP at an LSR. The ingress label and interface can then be used as
an index in the "inverse" cross-connect table to find the egress
labels and interfaces of the RNT LSP as shown in Table 2. Upon
receiving an FIS, an LSR extracts the labels and checks whether it is
the PSL for that LSP. If it is, it terminates the FIS. Otherwise, it
consults its inverse cross-connect table to determine the outgoing
labels and interfaces, inserts them into the FIS and forwards it to
the appropriate upstream node(s).
If the RNT is implemented by a hop-by-hop Layer 3 mechanism, using,
for example, UDP packets (with a specific port number to identify the
notification message type), then the egress label and interface of
the working path can be used as an index into the inverse cross-
connect table to obtain the IP addresses of the previous hop(s) and
the associated outgoing interface(s), as shown in Table 3. On each
hop, the FIS carried in the UDP packet will carry the label and
interface of the working path for that hop. Thus, if the receiving
node is not a PSL, the label and interface in the FIS can be
extracted to access the inverse cross-connect table, and the label
and interface used by the working LSP on the hop(s) to the upstream
node(s) can be inserted into FIS packet(s). The FIS packet(s) are
then transmitted to the appropriate upstream node(s).
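To make the hop-by-hop case concrete, the following Python sketch
walks an FIS upstream through per-LSR inverse cross-connect tables
until it is terminated at the PSLs. It is a minimal sketch under
stated assumptions: the function name propagate_fis, the table layout
(label to a list of upstream node and upstream label pairs), and the
label names L12, L23 and L93 are all hypothetical; only L34 appears in
Table 3. Interfaces and UDP transport are omitted.

```python
def propagate_fis(node, label, inverse_xc, psl_labels):
    """Return the sorted list of PSLs that an FIS eventually reaches.

    inverse_xc[node][label] lists (upstream_node, upstream_label) pairs;
    psl_labels[node] is the set of labels for which node is the PSL.
    """
    reached = set()
    pending = [(node, label)]
    while pending:
        current, lbl = pending.pop()
        if lbl in psl_labels.get(current, set()):
            reached.add(current)            # a PSL terminates the FIS
            continue
        # Otherwise rewrite the label per hop and forward upstream.
        pending.extend(inverse_xc[current][lbl])
    return sorted(reached)

# Hypothetical labels for the working paths of Fig. 1 that merge at LSR 3.
inverse_xc = {
    3: {"L34": [(2, "L23"), (9, "L93")]},
    2: {"L23": [(1, "L12")]},
}
psl_labels = {1: {"L12"}, 9: {"L93"}}

# LSR 4 detects a fault and sends an FIS carrying label L34 to LSR 3.
print(propagate_fis(3, "L34", inverse_xc, psl_labels))
```

With these tables the FIS received at LSR 3 is copied towards LSRs 2
and 9, and is terminated at the PSLs, LSR 1 and LSR 9.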
The roles of the various core protection/recovery components are:
PSL: The PSL must be able to correlate the RNT with the working and
recovery paths. To this end, it maintains a table with a list of
working LSPs protected by an RNT, and the identity of the recovery
LSPs that each working path is to be switched to in the event of a
failure on the working path. It need not maintain an inverse cross-
connect table (for those LSPs and working paths for which it is the
PSL).
PML: The PML is the root of the RNT, and has to associate each of its
upstream nodes with a working path and RNT. It need not maintain an
inverse cross-connect table (for those LSPs and working paths for
which it is a PML).
Intermediate LSR: An intermediate LSR has only to remember all of its
upstream neighbors and associate them with the appropriate working
paths and RNTs, and has to maintain an "inverse" cross-connect table.
3.3 Engineering a Protection Domain
For 1:1 protection, the bandwidth reserved for a protection/recovery
path should be the same as the bandwidth reserved for its
corresponding working path. This guarantees the same bandwidth for
the protected traffic after protection switching. If the LSRs on the
protection path support excess mode 0, the bandwidth reserved on the
protection path for protecting high priority traffic can be used by
other lower priority traffic streams. That is, low priority traffic
that is destined for the same node as the working traffic can be sent
on the protection path, and is transmitted onward by the PML after
merging with the working traffic. Also, if delay, jitter or other QoS
parameters are to be satisfied, the protection path in 1:1 protection
should be chosen such that these requirements are satisfied.
Since the volume of signaling traffic (e.g., FIS/FRS messages, or
liveness messages) is small, in general bandwidth need not be
reserved for the signaling traffic provided that there exist other
mechanisms that can ensure that the delay requirements of signaling
messages are met (by using, for example, the highest priority for
signaling messages).
For bypass tunneling protection, multiple working LSPs may share the
same protection bandwidth by tunneling protection LSPs over a common
path. This requires that the paths of these working LSPs be
disjoint, except at the PSL and PML, so that they can be assumed to
not all fail at the same time. In this case, the bandwidth reserved
will be the maximum of all individual paths. Otherwise, a bypass
tunnel could be created to carry all the backup paths, with the
bandwidth reserved for the tunnel being the maximum bandwidth
required over all failure scenarios of the working LSPs.
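The two bandwidth rules above can be illustrated with the following
Python sketch. It assumes, purely for illustration, that failure
scenarios are limited to single-link failures, so that the tunnel must
carry the sum of the bandwidths of all working LSPs that traverse the
failed link. The function name tunnel_bandwidth, the link tuples and
the bandwidth figures are hypothetical.

```python
def tunnel_bandwidth(lsps):
    """Bandwidth to reserve on a shared bypass tunnel.

    lsps is a list of (bandwidth, set_of_links) pairs, one per working
    LSP. Assumption: only single-link failures are considered.
    """
    demand = {}
    for bandwidth, lsp_links in lsps:
        for link in lsp_links:
            # A failure of this link moves this LSP onto the tunnel.
            demand[link] = demand.get(link, 0) + bandwidth
    return max(demand.values(), default=0)

# Disjoint working LSPs: the maximum of the individual bandwidths.
disjoint = [(10, {(1, 2), (2, 7)}), (30, {(1, 3), (3, 7)})]
# Working LSPs sharing link (1, 2): both may fail together.
sharing = [(10, {(1, 2), (2, 7)}), (30, {(1, 2), (2, 8)})]

print(tunnel_bandwidth(disjoint), tunnel_bandwidth(sharing))
```

For disjoint working LSPs the result reduces to the maximum of the
individual bandwidths, as stated above; when LSPs share a link, the
worst-case scenario adds their bandwidths.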
3.4 Configuring Timers
The purpose of timers t1/t1' is to control the tradeoff between
notification delay of the FIS/FRS and the resources consumed when
sending the FIS/FRS. If t1/t1' is large, it may take a relatively
long time for the initiation node to send the second FIS/FRS if
the first FIS/FRS message is lost, thereby increasing notification
delay. On the other hand, if t1/t1' is small, the repetitive sending
of FIS/FRS messages may waste bandwidth and processing power because
the first message may already have reached the PSL(s).
It is assumed that after t2/t2' it is not necessary to do protection
at the MPLS layer, either because it is no longer useful or because by
that time an upper layer protection mechanism will have been
triggered.
The purpose of timers t3/t3' is to minimize the misordering of
packets at a PML following a protection (restoration) switch from the
working (backup) to the backup (working) path. This is because
packets buffered on the working (backup) path, downstream of the
fault, may continue to arrive at the PML even as working traffic
begins to arrive on the protection (working) path. Therefore, forcing
the PSL to hold off the protection (or restoration) switching action,
gives the buffers on the working (protection) path time to clear
before data on the protection (working) path begins to arrive.
The timers t4/t4' are used to control the frequency of liveness
messages sent between neighboring LSRs, where t4 controls how often the
liveness message should be sent out from the sender side and t4' is
the timeout timer on the receiver side. While frequent exchanges of
liveness messages can unnecessarily consume network resources, too
few exchanges may delay the discovery of faults. To accommodate delay
jitter, t4' may be set at a slightly different value from t4.
The timers T2/T8 are used to allow the lower layer protection to
take effect before initiating MPLS layer recovery mechanisms (for
example, an automatic protection switching between fibers that
comprise a link between two LSRs). Following the detection of a
fault/fault recovery, an LSR waits for T2/T8 time units before
issuing the first FIS/FRS packet, respectively. This allows for the
lower layer protection to take effect and for the LSR to learn this
through one of several ways: via an indication from a lower layer, or
by the resumption of the reception of a liveness message, or by the
lack of LF, LD, PF or PD conditions.
The threshold K helps to minimize false alarms due to the occasional
loss of a liveness message, which may occur, for example, due to a
temporary impairment in a link or a peer LSR or due to a buffer
overflow.
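The interaction between the receiver timeout t4' and the threshold K
can be sketched as follows. The class name LivenessMonitor and its
methods are hypothetical. The sketch assumes that a fault is declared
on the K-th consecutive expiry of t4'; the definition of K in Table 1
could also be read as tolerating K losses and declaring the fault on
the next one.

```python
class LivenessMonitor:
    """Counts consecutive missed liveness messages at the receiver side."""

    def __init__(self, k):
        self.k = k          # lost liveness message threshold K
        self.missed = 0     # consecutive expirations of t4'

    def on_liveness_message(self):
        """A liveness message arrived before t4' expired."""
        self.missed = 0

    def on_timeout(self):
        """t4' expired without a message; return True if a fault is declared."""
        self.missed += 1
        return self.missed >= self.k

monitor = LivenessMonitor(k=3)
monitor.on_timeout()             # one occasional loss
monitor.on_liveness_message()    # peer is alive, counter resets
results = [monitor.on_timeout() for _ in range(3)]
print(results)
```

An isolated loss followed by a received message does not raise an
alarm; only the third consecutive timeout in this example declares the
fault.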
4.0 Fault Detection
Each LSR must be able to detect certain types of faults, such as PF,
PD, LF, and LD 0, and propagate an FIS message towards the PSL. Here
we consider unidirectional link faults, bi-directional (or complete)
link faults, and node faults.
4.1 Unidirectional Link Fault
A uni-directional link fault implies that only one direction of a bi-
directional link has experienced a fault.
4.1.1 Downlink Fault
A fault on a link in the downstream direction will be detected by the
node downstream of the faulty link, either via the PF or PD condition
being detected at the MPLS layer, or via LF or LD signals being
propagated to the MPLS layer by the lower layer or via the absence of
liveness messages. The downstream node will then periodically
transmit FIS messages to its upstream neighbor (via the uplink),
which will propagate these further upstream (using its inverse cross-
connect table) until they eventually reach the appropriate PSLs,
which will perform the protection switch.
Therefore, in Fig. 1, if link L34 has a fault, LSR 4 will detect the
fault via one of the means described above, and start transmitting an
FIS packet once every t1 time units back to LSR 3 over link L43. The
traffic in the queues of LSR 4 will continue to be serviced. LSR 3 in
turn will propagate the FIS over the RNT back to LSR 2 and LSR 9. The
actual protection switch will be performed by LSRs 9 and 1, t3 time
units after the receipt of the first FIS. LSR 4 will stop
transmitting FIS messages t2 time units after the transmission of the
first FIS message.
4.1.2 Uplink Fault
A fault on a link in the upstream direction will be detected by a
node upstream of the faulty link, either via a LF or LD being
detected at the lower layer and propagated to the MPLS layer (if
there was traffic on this reverse link), or via the PD or PF
condition being detected at the MPLS layer, or via absence of
liveness messages. The upstream node will then periodically send out
FIS messages to the node upstream of it, which in turn will propagate
these further until eventually the PSL(s) learns of the failure and
performs the protection switch.
Therefore, in Fig. 1, if link L43 experiences a fault, LSR 3 will
detect the fault, and transmit an FIS to nodes 2 and 9. Node 2, in
turn, will transmit an FIS to node 1, and nodes 1 and 9 will perform
the actual protection switch.
4.2 Bi-directional Link Fault or Node Fault
When both directions of the link have a fault (as in the case of a
fiber cut), nodes at both ends of the link will detect the fault
either due to the LF or PF signal or due to the absence of liveness
messages. Both will transmit FIS messages to their upstream nodes.
However, it is only the node upstream of the failed link whose FIS
messages will propagate further upstream, eventually reaching the
appropriate PSLs, which will perform the protection switch to the
recovery path.
The case of a node fault is similar, with the node upstream of the
failed node detecting the failure (due to loss of liveness messages,
for example) and propagating that information via the FIS message.
For example in Fig. 1, when both directions of the link between nodes
3 and 4 experience a fault (or when node 4 has a fault), LSR 3 will
detect this failure via the non-reception of the liveness message,
and transmit FIS messages to nodes 2 and 9 as before. When nodes 1
and 9 receive the FIS message they will perform the protection switch
after waiting for an interval of t3 time units.
The roles of the various core protection components in failure
detection are the same. The PSL, PML, and intermediate LSR must all
be able to detect PF and PD conditions and/or be able to interpret
and respond to the LF and LD indications received from the lower
layers.
5.0 Fault Notification
The rapid notification of a fault is effected by the propagation of
the FIS message along the RNT. Due to the timers built into the
FIS/FRS propagation mechanism, the transportation of FIS/FRS messages
does not require a reliable mechanism like TCP. Any LSR may generate
an FIS, but a PSL is the only LSR that may terminate it.
For instance, in Fig. 1 if link L23 fails, LSR 3 will detect it and,
after waiting for time T2, transmit an FIS to LSR 2, its upstream
neighbor along link L23. The FIS will contain the incoming labels (at
node 3) of those LSPs on link L23 that have protection enabled. Upon
receiving the FIS message, LSR 2 will consult its inverse
cross-connect table and generate an FIS message for LSR 1, which on
receiving the first FIS packet will wait for time t3 before
performing a protection switch. The node which initiates the FIS will
continue to send FIS messages at an interval of t1 until timer t2
expires. After t2 expires it is assumed that either upper layer
protection will be triggered or a sufficient number of FIS messages
will have been sent to reach the desired reliability in conveying
fault information to the PSL(s).
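The repeat schedule governed by t1 and t2 can be illustrated with a
small Python sketch. It is not a normative algorithm. The function
name fis_send_times is hypothetical, times are offsets from the first
FIS transmission in arbitrary integer units, and the treatment of the
boundary (no FIS is sent at the instant t2 expires) is an assumption,
since the text does not define that case.

```python
def fis_send_times(t1, t2):
    """Offsets, relative to the first FIS, at which an FIS is sent.

    t1: interval between successive FIS messages.
    t2: time after the first FIS at which transmission stops.
    Assumption: a transmission that would fall exactly at t2 is not sent.
    """
    if t1 <= 0 or t2 < 0:
        raise ValueError("t1 must be positive and t2 non-negative")
    times = [0]          # the first FIS is always sent
    t = t1
    while t < t2:        # keep repeating until timer t2 expires
        times.append(t)
        t += t1
    return times


schedule = fis_send_times(10, 35)
```

The number of entries in the schedule is the number of chances the
PSL has to receive the notification, which is why unreliable
transport can be sufficient for FIS messages.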
The roles of the various core protection switching components are:
PSL: The PSL does not generate an FIS message, but must be able to
detect FIS packets.
PML: The PML must be able to generate FIS packets in response to
detecting a failure, and should transmit them over the RNT. The PML
begins FIS transmission after continuously detecting a fault for T2
time units, and does so every t1 time units for a maximum of t2 time
units.
Intermediate LSR: An intermediate LSR must be able to
generate/forward FIS packets, either as a result of continuously
detecting a fault for T2 time units or in response to a received FIS
packet. It must transmit these to all its affected upstream neighbors
as per its inverse cross-connect table. Again, it does so every t1
time units for a maximum of t2 time units.
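The relay step at an intermediate LSR can be sketched as a lookup in
the inverse cross-connect table. The layout used below is an
assumption made for illustration: the table is keyed by (downstream
link, label carried in the received FIS) and yields (upstream link,
incoming label at this LSR). The link names, label values, and the
function name build_upstream_fis are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inverse cross-connect table at one intermediate LSR:
# (downstream link, label named in received FIS)
#     -> (upstream link, incoming label at this LSR)
INVERSE_XC = {
    ("down", 17): ("up-A", 5),
    ("down", 18): ("up-A", 6),
    ("down", 19): ("up-B", 7),
}


def build_upstream_fis(received_link, failed_labels, table):
    """Group the affected incoming labels by upstream link.

    Each key of the result is an upstream link that needs an FIS;
    each value is the list of labels that FIS would carry.
    """
    per_link = defaultdict(list)
    for label in failed_labels:
        entry = table.get((received_link, label))
        if entry is None:
            continue  # no protected LSP uses this label here
        in_link, in_label = entry
        per_link[in_link].append(in_label)
    return dict(per_link)


fis_by_link = build_upstream_fis("down", [17, 19, 99], INVERSE_XC)
```

Only upstream neighbors that carry an affected LSP appear in the
result, which matches the requirement that the FIS be sent to
affected upstream neighbors only.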
6.0 Switch Over
The switch over is the actual switching of the working traffic from
the working path to the recovery path. This is performed by a PSL, t3
time units after the reception of the first FIS packet.
For example, in Fig. 1, consider protection domain (1-2-3-4-6-7, 1-5-
7). When link L34 fails, the PSL LSR 1 on learning of the failure
will perform a protection switch of the protected traffic from the
working path 1-2-3-4-6-7 to the backup path 1-5-7. Notice that LSR 7
acts as a protection merge LSR, merging traffic from the working and
backup paths. Since buffered packets from LSR 4 may continue to
arrive at LSR 7 even after the protection switch (the dampening timer
t3 at the PSL tends to mitigate this), a short-term misordering of
packets may happen at LSR 7, until the buffers on the working path
drain out.
The role of the core protection components is as follows:
PSL: Performs the protection switch t3 time units after receiving the
first FIS message.
PML: The PML automatically merges protection traffic with working
traffic. For a short period of time this may cause misordering of
packets, since packets buffered at LSRs downstream of the fault may
continue to arrive at the PML along the working path.
Intermediate LSR: The intermediate LSR has no special action.
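The PSL behavior described in this section amounts to a small timer
driven state machine. The Python sketch below is one possible
reading, not a definitive implementation. The class name Psl and its
method names are hypothetical, time is supplied by the caller in
arbitrary units, and it is assumed that repeated FIS messages for the
same fault do not restart the t3 timer.

```python
class Psl:
    """Minimal model of a path switch LSR for one protected path."""

    def __init__(self, t3):
        self.t3 = t3                  # dampening delay before switching
        self.first_fis_at = None      # arrival time of the first FIS
        self.active_path = "working"

    def on_fis(self, now):
        # Only the first FIS starts the timer; later copies are ignored
        # (assumption: retransmissions do not restart t3).
        if self.first_fis_at is None:
            self.first_fis_at = now

    def tick(self, now):
        # Switch to the recovery path once t3 has elapsed since the
        # first FIS; return the path currently carrying traffic.
        if (self.active_path == "working"
                and self.first_fis_at is not None
                and now - self.first_fis_at >= self.t3):
            self.active_path = "recovery"
        return self.active_path


psl = Psl(t3=5)
psl.on_fis(100)                  # first FIS arrives
psl.on_fis(103)                  # retransmitted FIS, ignored
before = psl.tick(104)           # t3 not yet elapsed
after = psl.tick(105)            # t3 elapsed, switch performed
```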
7.0 Switch Back
Switch back or restoration is the transfer of working traffic from
the recovery path to the working path, once the working path is
repaired. This may be desirable because the recovery path is a
limited recovery path, or because the working path is deemed to be
preferred in some respect. Restoration may be automatic, it may be
performed by manual intervention, or it may not be performed at all.
In the revertive mode, restoration is performed upon the receipt of
the FRS message, while in the non-revertive mode it may be performed
by operator intervention.
The role of the core protection components is similar here to what it
is for protection switching. The PML does not need to do anything,
unless it was the node that detected the failure, in which case it
transmits an FRS upstream T8 time units after continuously detecting
a recovery signal from the lower layer or after detecting liveness
messages from its peers. The intermediate LSR generates an FRS
message if it was the node that detected the recovery, or to relay
the restoration status received from a downstream node. The PSL
performs the restoration switch t3' seconds after receiving the first
FRS message.
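The difference between the revertive and non-revertive modes can be
summarized as a decision made at the PSL. The Python sketch below is
illustrative only. The function name restoration_action, the mode
strings, and the operator_request flag are assumptions introduced for
this example, and the waiting interval that precedes the actual
switch is deliberately left out.

```python
def restoration_action(mode, frs_received, operator_request=False):
    """Decide whether the PSL should move traffic back to the working path.

    mode: "revertive" or "non-revertive".
    frs_received: True once an FRS for the working path has arrived.
    operator_request: True if an operator has asked for switch back.
    """
    if mode == "revertive":
        # Restoration is triggered by receipt of the FRS message.
        return "switch_back" if frs_received else "stay"
    if mode == "non-revertive":
        # Restoration, if performed at all, is operator driven.
        return "switch_back" if operator_request else "stay"
    raise ValueError("unknown mode: %r" % (mode,))


auto = restoration_action("revertive", frs_received=True)
manual = restoration_action("non-revertive", frs_received=True)
```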
8.0 Security Considerations
The MPLS protection that is specified herein does not raise any
security issues that are not already present in the MPLS
architecture.
9.0 Acknowledgement
We would like to thank Mr. Ben Mack-Crane from Tellabs.
10.0 Intellectual Property Consideration
In accordance with the intellectual property rights procedures of the
IETF standards process, to the extent that Tellabs has patents,
pending applications and/or other intellectual property rights that
are essential to implementation of any subject matter submitted by
Tellabs that is included in a standard, Tellabs is prepared to grant,
on the basis of reciprocity (grantback), a license on such subject
matter under terms and conditions that are reasonable and non-
discriminatory.
11.0 Authors' Addresses
Changcheng Huang
Tellabs Operations, Inc.
4951 Indiana Avenue
Lisle, IL 60532
Email: Changcheng.Huang@tellabs.com
Ph: 630-512-7954
Vishal Sharma
Tellabs Research Center
One Kendall Square
Bldg. 100, Suite 121
Cambridge, MA 02139
Email: Vishal.Sharma@trc.tellabs.com
Ph: 617-577-8760
Srinivas Makam
Tellabs Operations, Inc.
4951 Indiana Avenue
Lisle, IL 60532
Email: Srinivas.Makam@tellabs.com
Ph: 630-512-7217
Ken Owens
Tellabs
1106 Fourth Street
St. Louis, MO 63126
Email: Ken.Owens@tellabs.com
Ph: 314-825-7009
12.0 References
[1] Rosen, E., Viswanathan, A., and Callon, R., "Multiprotocol Label
Switching Architecture", Work in Progress, Internet Draft <draft-ietf-
mpls-arch-06.txt>, August 1999.
[2] Callon, R., Doolan, P., Feldman, N., Fredette, A., Swallow, G.,
Viswanathan, A., "A Framework for Multiprotocol Label Switching",
Work in Progress, Internet Draft <draft-ietf-mpls-framework-05.txt>,
September 1999.
[3] Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M., McManus, J.,
"Requirements for Traffic Engineering Over MPLS", RFC 2702, September
1999.
[4] Andersson, L., Doolan, P., Feldman, N., Fredette, A., Thomas, B.,
"LDP Specification", Work in Progress, Internet Draft <draft-ietf-
mpls-ldp-06.txt>, September 1999.
[5] Jamoussi, B., "Constraint-Based LSP Setup using LDP", Work in
Progress, Internet Draft <draft-ietf-mpls-cr-ldp-03.txt>, September
1999.
[6] Makam, V., Sharma, V., Huang, C., Owens, K., Mack-Crane, B., et
al, "A Framework for MPLS-based Recovery", Work in Progress, Internet
Draft <draft-makam-mpls-recovery-frmwrk-00.txt>, February 2000.
[7] Braden, R., Zhang, L., Berson, S., Herzog, S., "Resource
ReSerVation Protocol (RSVP) -- Version 1 Functional Specification",
RFC 2205, September 1997.
[8] Awduche, D. et al, "Extensions to RSVP for LSP Tunnels", Work in
Progress, Internet Draft <draft-ietf-mpls-rsvp-lsp-tunnel-04.txt>,
September 1999.