Networking Working Group
Internet Draft Reshad Rahman,
Anca Zamfir,
Junaid Israr
Cisco Systems
Document: draft-rahman-rsvp-restart-
extensions.txt
Expires: April 2004 October 2003
RSVP Graceful Restart Extensions
draft-rahman-rsvp-restart-extensions-00.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance
with all provisions of Section 10 of RFC2026. Internet-Drafts
are working documents of the Internet Engineering Task Force
(IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed
at
http://www.ietf.org/shadow.html.
Abstract
This document describes the extensions needed by certain
features for the purpose of RSVP Graceful Restart. One of
these extensions gives a node the ability to recover the ERO
when it has performed an ERO expansion before a control plane
restart. In addition, a small modification is proposed
in the basic procedure to support simultaneous multiple node
restarts in a network. Specifically, a node should use a
non-zero Recovery Time while in the recovery phase. This allows
a node to determine at restart time whether any of its
neighbors has previously restarted and is currently in the
recovery phase.
Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described
in RFC 2119 [RFC2119].
Sub-IP ID Summary
(This section to be removed before publication.)
SUMMARY
This document specifies extensions and mechanisms to RSVP
Graceful Restart to provide support for ERO Recovery and
multiple node restart.
WHERE DOES IT FIT IN THE PICTURE OF THE SUB-IP WORK?
This work fits in the MPLS box.
WHY IS IT TARGETED AT THIS WG?
This draft is targeted at this WG because it specifies
extensions to the RSVP-TE signaling protocol for control plane
graceful restart.
RELATED REFERENCES
Please refer to the reference section.
1. Terminology
GR - Graceful Restart procedure for RSVP as specified in
[RFC3473].
2. RSVP Graceful Restart
The procedure for RSVP Graceful Restart (GR) is described
in [RFC3473]. The purpose of this procedure is to allow a node
that has experienced a failure in the control plane, but that
has preserved its forwarding plane, to recreate its state
based on replayed RSVP messages received from its neighbors
and on information retrieved from the forwarding plane.
Typically, the data that can be obtained from the forwarding
plane for each LSP endpoint is (port, label). At a transit
(midpoint) node, the control plane also obtains the
cross-connect information:
(ingress-port, ingress-label) X (egress-port, egress-label)
While most of the objects can be recovered based on these two
sources of information, the route information that is
contained in the ERO object cannot be recovered if the
restarting node has modified the ERO content prior to restart.
Section 3 proposes a solution for this problem.
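As an illustration only, the forwarding-plane information
described above can be modeled as follows. This is a minimal
sketch in Python; the names ForwardingEntry, ForwardingTable and
lookup_ingress are hypothetical and are not defined by [RFC3473]
or by this document.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ForwardingEntry:
    """One cross-connect preserved across a control plane restart.

    Hypothetical model: (ingress-port, ingress-label) X
    (egress-port, egress-label). Note that no route (ERO)
    information is stored here.
    """
    ingress_port: str
    ingress_label: int
    egress_port: str
    egress_label: int


class ForwardingTable:
    """Cross-connects indexed by (ingress-port, ingress-label)."""

    def __init__(self) -> None:
        self._by_ingress: Dict[Tuple[str, int], ForwardingEntry] = {}

    def add(self, entry: ForwardingEntry) -> None:
        key = (entry.ingress_port, entry.ingress_label)
        self._by_ingress[key] = entry

    def lookup_ingress(self, port: str, label: int) -> Optional[ForwardingEntry]:
        # Returns None when no forwarding state matches.
        return self._by_ingress.get((port, label))
```

The entry carries ports and labels only, which is why an ERO
that was expanded before the restart cannot be rebuilt from the
forwarding plane alone.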
The GR procedure described in [RFC3473] handles some cases of
multiple node restart. In the example below, assume that one
LSP was signaled by R1 to span L1, R2, L2, R3.
     L1       L2
R1 ----- R2 ----- R3
If R2 restarts and R3 then restarts after the Hellos have come
up, R2 is able to detect that R3 has restarted, as described in
[RFC3473]. If R2 is still in recovery mode when R3 restarts,
then when R2 receives a recovery message from R1 it sends a
recovery message to R3. For the LSP described above, R1 sends
the Path message including a Recovery Label, and R2 also
includes a Recovery Label in the Path message sent to R3.
If R2 restarts and then R3 restarts before the Hellos have
come up, there is no specified way for R2 to detect that R3
has restarted and is currently in recovery mode. Therefore,
when R2 receives the Path message with the Recovery Label from
R1, after processing it, R2 sends the corresponding outgoing
Path message to R3 with a Suggested Label instead of the
Recovery Label. This is incorrect, since this message must
include the Recovery Label in order to help R3 recover its
state.
A solution for this problem is described in Section 4.
3. ERO Recovery
If a node experiences a control plane failure and restarts,
the existing GR procedures do not ensure that ERO expansion
before and after the failure yields the same result. A change
in ERO expansion should be avoided as it may lead to
undesirable results.
This section describes how such a change can be prevented. To
support this solution, a new RSVP object, called Recovery ERO,
is introduced.
3.1 Recovery ERO Object
The Recovery ERO object is used during the nodal fault
recovery process. The format of the Recovery ERO object is
identical to that of the ERO object described in [RFC3209]. A
Recovery ERO object uses Class-Number ?? (of form 10bbbbbb) and
the same C-Type as the ERO object it is trying to recover.
Only C-Type = 1 is currently supported.
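The following sketch shows how such an object could be encoded,
assuming the standard RSVP object header (Length, Class-Num,
C-Type) and the IPv4 prefix subobject layout of [RFC3209]. The
function names are hypothetical. Because the Class-Number is not
yet assigned in this document, the caller must supply it; the
code only checks that it has the form 10bbbbbb.

```python
import socket
import struct
from typing import Iterable, Tuple

ERO_SUBOBJ_IPV4 = 1  # IPv4 prefix subobject type from [RFC3209]
C_TYPE = 1           # the only C-Type this document supports


def ipv4_subobject(addr: str, loose: bool, prefix_len: int = 32) -> bytes:
    """Encode one 8-byte IPv4 prefix subobject.

    Layout: L bit + 7-bit type, length, address, prefix length,
    reserved byte.
    """
    first = (0x80 if loose else 0x00) | ERO_SUBOBJ_IPV4
    return struct.pack("!BB4sBB", first, 8,
                       socket.inet_aton(addr), prefix_len, 0)


def encode_recovery_ero(class_num: int,
                        hops: Iterable[Tuple[str, bool]]) -> bytes:
    """Build a Recovery ERO: object header followed by subobjects.

    class_num is an input because this document leaves the value
    unassigned ("??"); only the 10bbbbbb form is verified.
    """
    if class_num & 0xC0 != 0x80:
        raise ValueError("Class-Number must be of form 10bbbbbb")
    body = b"".join(ipv4_subobject(addr, loose) for addr, loose in hops)
    # Length covers the 4-byte object header plus all subobjects.
    return struct.pack("!HBB", 4 + len(body), class_num, C_TYPE) + body
```

Since the body is built exactly like an ERO body, an
implementation could reuse its existing ERO encoder and decoder
and change only the Class-Number.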
3.2 Procedures at the Restarting Node and its Neighbors
When a node experiences a control plane restart and receives a
Path message with a Recovery Label from the upstream node, it
searches for a matching forwarding state as per [RFC3473]. If
no matching state is found and if ERO expansion is required,
then the node considers the Path message as a new LSP. It
processes the incoming Path message and performs ERO expansion
as specified in [RFC3473], [RFC3209]. If the forwarding state
is found and if ERO expansion is required, then the node
processes the incoming Path message as specified in [RFC3473],
[RFC3209] with the following modifications:
1. It performs partial ERO expansion at this point to include:
   - the strict next hop that is contained in the forwarding
     state.
   - the loose hop as in the ERO of the received Path message.
2. It includes the result of the previous step in the Recovery
ERO object to be sent as part of the outgoing Path message.
3. The restarting node sends the outgoing Path message out.
When the Path message is received by the neighbor downstream
of the restarting node, the following processing occurs:
4. If this message has associated incoming Path and forwarding
states, the neighbor node retrieves the ERO object as it was
previously created by the restarting node and formats a new
Recovery ERO object with this content to be sent upstream.
5. If this message has neither an associated incoming Path
state nor a forwarding state, then this should be treated as a
normal setup.
6. If this message has no associated Path state but forwarding
state is present, then this node is restarting as well and the
procedure of the restarting node applies. Once the outgoing
Path state is recovered, this node retrieves the outgoing ERO
and creates the Recovery ERO object by prepending one or more
strict elements, as identified by the forwarding entry
associated with this LSP on this abstract node.
If upstream Recovery ERO content is available after executing
steps 4, 5 and 6, then the neighbor node includes that content
in a Recovery ERO object to be sent upstream in the Resv
message.
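The three cases handled by the downstream neighbor (steps 4, 5
and 6) can be sketched as a small decision function. This is
an illustrative sketch only: the names are hypothetical, EROs
are modeled as lists of node names, and the combination "Path
state present, forwarding state absent" is not covered by the
text above, so it is reported as unspecified.

```python
from enum import Enum
from typing import List, Optional, Sequence


class NeighborAction(Enum):
    RETURN_STORED_ERO = "step 4"
    NORMAL_SETUP = "step 5"
    RESTARTING_NODE_PROCEDURE = "step 6"
    UNSPECIFIED = "not covered by this document"


def classify_path(has_path_state: bool,
                  has_forwarding_state: bool) -> NeighborAction:
    """Map the state found for a received Path message to a step."""
    if has_path_state and has_forwarding_state:
        return NeighborAction.RETURN_STORED_ERO
    if not has_path_state and not has_forwarding_state:
        return NeighborAction.NORMAL_SETUP
    if not has_path_state and has_forwarding_state:
        return NeighborAction.RESTARTING_NODE_PROCEDURE
    return NeighborAction.UNSPECIFIED


def upstream_recovery_ero(action: NeighborAction,
                          stored_incoming_ero: Sequence[str] = (),
                          outgoing_ero: Sequence[str] = (),
                          local_strict_hops: Sequence[str] = ()
                          ) -> Optional[List[str]]:
    """Return the Recovery ERO content to place in the Resv, if any."""
    if action is NeighborAction.RETURN_STORED_ERO:
        # Step 4: echo the ERO previously created by the restarting node.
        return list(stored_incoming_ero)
    if action is NeighborAction.RESTARTING_NODE_PROCEDURE:
        # Step 6: prepend the strict hops known from forwarding state
        # to the ERO of the recovered outgoing Path state.
        return list(local_strict_hops) + list(outgoing_ero)
    # Step 5 (normal setup) produces no Recovery ERO.
    return None
```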
When the restarting node receives the Resv message (after step
3), it removes the Recovery ERO object before creating the
Reservation State and uses its content to update the ERO in
the associated Path State.
In the case where the restarting node determines that the
downstream node has not been able to include the expanded ERO
(e.g., the downstream node is also a restarting node and
forwarding has not been preserved), the restarting node
performs the expansion as described in [RFC3473], [RFC3209].
In this case the recovery of the LSP is not guaranteed.
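A minimal sketch of the Resv handling at the restarting node,
including the fallback just described, is given below. The
message and Path state are modeled as plain dictionaries, and
the key "recovery_ero" and the callback expand_ero are
hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List


def process_resv(resv: Dict[str, object],
                 path_state: Dict[str, object],
                 expand_ero: Callable[[], List[str]]) -> bool:
    """Update the outgoing ERO from a received Resv message.

    Returns True when the ERO was recovered from the downstream
    neighbor, False when a fresh expansion had to be performed
    (in which case recovery of the LSP is not guaranteed).
    """
    # The Recovery ERO is removed before Reservation state is created.
    recovery_ero = resv.pop("recovery_ero", None)
    if recovery_ero is not None:
        path_state["ero"] = list(recovery_ero)
        return True
    # Downstream could not supply the expanded ERO: expand again.
    path_state["ero"] = expand_ero()
    return False
```

Returning a flag lets the caller log or count LSPs whose route
was recomputed rather than recovered.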
Below is an example that covers the steps above:
Assume a simple topology as follows:
R1 ----- R2 ----- R3 ----- R5 ----- R6
                  |        |
                  ----R4----
1. R1 sends a Path message to R2 with an ERO containing [R2,
R6(loose)]. R2 performs ERO expansion, yielding [R3, R4, R5,
R6], and forwards the Path message to R3. R3 stores this ERO
and the LSP is established using normal LSP setup procedures.
2. The control plane on R2 restarts.
3. R2 receives a Path message with Recovery Label and ERO =
[R2, R6(loose)]. R2 finds the forwarding state and creates
the incoming Path state with ERO = [R2, R6(loose)].
4. Based on the forwarding state, R2 determines that the next
hop is R3. R2 creates an outgoing Path State with a Recovery
ERO = [R2, R3, R6] and forwards the Path message to R3.
5. R3 finds the associated incoming Path State with ERO = [R3,
R4, R5, R6] and creates a Recovery ERO with this content. R3
sends the Resv message upstream including the Recovery ERO
object that contains: [R3, R4, R5, R6]
6. R2 receives the Resv message, removes the Recovery ERO
object and creates the Reservation state.
7. R2 uses the Recovery ERO from the Resv message to create
the ERO of the outgoing Path State as [R3, R4, R5, R6] and
removes the Recovery ERO.
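The ERO values in this example can be reproduced with the short
sketch below. It is illustrative only: hops are modeled as
(node, is_loose) pairs, the helper names are hypothetical, and
R6 is assumed to remain a loose hop in the Recovery ERO that R2
sends in step 4.

```python
from typing import List, Tuple

Hop = Tuple[str, bool]  # (node, is_loose)


def partial_expansion(received_ero: List[Hop],
                      strict_next_hop: str) -> List[Hop]:
    """Insert the next hop known from forwarding state.

    The hop is placed after the local hop; the remaining hops of
    the received ERO (here the loose hop) are kept unchanged.
    """
    local_hop, remaining = received_ero[0], received_ero[1:]
    return [local_hop, (strict_next_hop, False)] + remaining


def show(ero: List[Hop]) -> str:
    names = [node + ("(loose)" if loose else "") for node, loose in ero]
    return "[" + ", ".join(names) + "]"


# Step 3: ERO received by R2 after its control plane restart.
received = [("R2", False), ("R6", True)]
# Step 4: forwarding state on R2 says the next hop is R3.
recovery_ero_sent = partial_expansion(received, "R3")
# Step 5: R3 still holds the ERO that R2 computed before restart.
recovery_ero_from_r3 = [("R3", False), ("R4", False),
                        ("R5", False), ("R6", False)]
# Step 7: R2 installs that content in its outgoing Path state.
outgoing_ero = list(recovery_ero_from_r3)

print(show(recovery_ero_sent))
print(show(outgoing_ero))
```

The first line printed corresponds to the Recovery ERO of step 4
and the second to the recovered outgoing ERO of step 7.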
4. Handling Multiple Restarts
If a node (R2 below) experiences a control plane failure
and if this node implements the Graceful Restart procedure
described in [RFC3473], then it can correctly recover all RSVP
states it had prior to restart.
During recovery, new LSPs may be established as long as
they do not collide with the LSPs that are in the process of
being recovered. If a Path message that includes a Suggested
Label is received and if the restarting node checks its
forwarding state and determines that a previous LSP was using
the (ingress-port, ingress-label), the new request may get
rejected. It may also get accepted with an upstream label
different from the one suggested in the Suggested Label.
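One possible way to implement this check is sketched below. The
policy shown (try the suggested label, then any free label, else
reject) is only one of the behaviors permitted by the text
above, and the function and parameter names are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple


def assign_upstream_label(in_use: Set[Tuple[str, int]],
                          ingress_port: str,
                          suggested_label: int,
                          candidate_labels: Iterable[int]
                          ) -> Optional[int]:
    """Pick the label for a new LSP received during recovery.

    in_use holds the (ingress-port, ingress-label) pairs found in
    the preserved forwarding state. Returns the label to assign,
    or None when the request is rejected.
    """
    if (ingress_port, suggested_label) not in in_use:
        return suggested_label
    # The suggested label belongs to an LSP that may still be
    # recovered, so try a different label instead.
    for label in candidate_labels:
        if (ingress_port, label) not in in_use:
            return label
    return None
```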
R0 ----- R1 ----- R2 ----- R3
With the current specification, if a node R1 upstream of
the restarting node R2 is also in the recovery phase, then the
only case where the LSP is recovered is when R1 has restarted
prior to R2. In this case R1 determines based on the Hello
session that R2 has restarted and therefore it sends Path
Messages with Recovery Label for the LSPs that need to be
recovered.
In the case where R1 restarted after R2, R1 cannot determine
based on the Hello Instance that R2 is a restarting node. When
R1 receives recovery Path messages from its upstream node, it
sends Path messages with a Suggested Label to R2 as indicated in
[RFC3473]. R2 does not use the Suggested Label in this case,
because its forwarding state indicates that a recovery Path
message may still be received from R1. In fact, R2 may set up
a new LSP using a different label.
In the case of GMPLS, messages from different upstream
neighbors may be received on the same interface which may be
different from the interface hosting the LSP to be recovered.
This draft proposes the use of the Recovery Time value received
by a restarting node as an indication that its downstream
neighbor is in recovery mode.
A node that complies with this specification sends a non-zero
Recovery Time while in recovery mode and MUST set it to 0 once
its recovery has completed.
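The rule above can be sketched as two small functions. This is
an illustration under stated assumptions: the names are
hypothetical, the Recovery Time is taken to be in milliseconds
as in the Restart_Cap object of [RFC3473], and the configured
value is assumed to be greater than zero.

```python
def advertised_recovery_time(in_recovery: bool,
                             configured_ms: int) -> int:
    """Recovery Time to advertise in the Restart_Cap object."""
    # Non-zero while recovering, 0 once recovery has completed.
    return configured_ms if in_recovery else 0


def label_object_for_downstream(path_has_recovery_label: bool,
                                neighbor_recovery_time_ms: int) -> str:
    """Choose the label object for the outgoing Path message.

    A non-zero Recovery Time received from the downstream
    neighbor is treated as a sign that it is in recovery mode.
    """
    if path_has_recovery_label and neighbor_recovery_time_ms > 0:
        return "RECOVERY_LABEL"
    return "SUGGESTED_LABEL"
```

In the R0 - R1 - R2 - R3 example, R1 restarting after R2 would
see a non-zero Recovery Time from R2 and therefore forward the
Recovery Label even though the Hello Instance alone does not
reveal that R2 has restarted.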
The alternative solution is to require restarting nodes
that receive a Path with Recovery Label to forward, after
processing, a Path with Recovery Label and not Suggested
Label.
To allow for recovery to complete, a restarting node may
wish to adjust its advertised Recovery Time when an upstream
node restarts, in an attempt to prevent the downstream nodes
from expiring state that may be recovered later than expected.
5. Forward Compatibility Note
A node that does not support the Recovery ERO object and
the procedure described in this draft will ignore the
Recovery ERO object and respond with a Resv. A restarting
node may choose to continue the recovery by performing a new
ERO expansion. If the new ERO matches the ERO in use before
the restart, the LSP is recovered. Otherwise, depending on
the downstream node implementations, the LSP may be torn down.
The extensions specified in this draft do not affect the
processing of the Restart_Cap object at nodes that do not
support them.
A node that does not comply with this specification, and that
has no proprietary way to detect at restart time which
downstream neighbors have previously restarted and are in
recovery mode, may ignore the Recovery Time in the Restart_Cap
object and may forward only Path messages with a Suggested
Label.
A node that does comply with this specification and that
receives a Restart_Cap object with a non-zero Recovery Time
from a downstream node that does not comply with this
specification forwards Path messages with the Recovery Label
included for all recovered LSPs while in the recovery period.
If the downstream node is in fact not in recovery mode and
receives a Path message with a Recovery Label, it should
generate a Resv message, and normal state processing continues.
6. Security Considerations
This document does not introduce new security issues. The
security considerations pertaining to the original RSVP
protocol [RFC2205] remain relevant.
References
[RFC2205] "Resource ReSerVation Protocol (RSVP) - Version 1,
Functional Specification", RFC 2205, R. Braden, et al,
September 1997.
[RFC3209] "RSVP-TE: Extensions to RSVP for LSP Tunnels",
RFC 3209, D. Awduche, et al, December 2001.
[RFC3471] "Generalized Multi-Protocol Label Switching (GMPLS)
Signaling Functional Description", RFC 3471, L. Berger, et
al, January 2003.
[RFC3473] "Generalized Multi-Protocol Label Switching (GMPLS)
Signaling Resource ReserVation Protocol-Traffic Engineering
(RSVP-TE) Extensions", RFC 3473, L. Berger, et al, January
2003.
[RFC3477] "Signaling Unnumbered Links in Resource ReSerVation
Protocol - Traffic Engineering (RSVP-TE)", RFC 3477, K.
Kompella, Y. Rekhter, January 2003.
[RFC2119] "Key words for use in RFCs to Indicate Requirement
Levels", RFC 2119, S. Bradner, March 1997.
Authors' Addresses
Reshad Rahman
Cisco Systems Inc.
2000 Innovation Dr.,
Kanata, Ontario, K2K 3E8
Canada.
Phone: (613)-254-3519
Email: rrahman@cisco.com
Anca Zamfir
Cisco Systems Inc.
2000 Innovation Dr.,
Kanata, Ontario, K2K 3E8
Canada.
Phone: (613)-254-3484
Email: ancaz@cisco.com
Junaid Israr
Cisco Systems Inc.
2000 Innovation Dr.,
Kanata, Ontario, K2K 3E8
Canada.
Phone: (613)-254-3693
Email: jisrar@cisco.com