CCAMP Working Group                          Richard Rabbat, Ed. (FLA)
Internet Draft                     Vishal Sharma, Ed. (Metanoia, Inc.)
Expires: November 2004                        Norihiko Shinomiya (FLL)
                                                   Ching-Fong Su (FLA)
                                                              May 2004
Fault Notification Protocol for GMPLS-Based Recovery in Shared Mesh
Networks
draft-rabbat-fault-notification-protocol-05.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026 [1].
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This draft presents a fault notification protocol for use in a GMPLS-
based failure recovery scheme. The protocol guarantees activation of
the recovery path(s) within a bounded time in the event of single
resource failures. These failures include fiber cuts, transponder
failures, and node failures. Bounded recovery time is achieved by pre-signaling
recovery paths whose nodes can be reached within a specific time,
based on the physical capabilities of the nodes and the delay
characteristics of the control plane. We propose using a flooding
protocol for fault notification to allow for per-failure notification
and to speed up the recovery process. We justify choices made for
the notification method and the messaging required for the protocol.
The draft does not mandate a specific implementation of the Fault
Notification Protocol.
Rabbat & Sharma (Eds.) Expires November 2004 [Page 1]
draft-rabbat-fault-notification-protocol-05.txt May
2004
Table of Contents
1. Overview
2. Terminology
3. Glossary of Terms Used
4. Requirements at Recovery Path Setup Time
5. Steps in Failure Notification and Service Recovery
   5.1 T1: Fault Detection Time
   5.2 T2: Hold-Off Time
   5.3 Tt: Transfer Time
   5.4 T5: Traffic Recovery Time
6. Fault Notification Protocol (FNP)
   6.1 FNP Flooding Operation
   6.2 Delays Incurred by Messages
   6.3 Notification Message Data
7. Reversion (Normalization)
8. Discussion on Database Updates
9. Security Considerations
10. Conclusion
11. Acknowledgments
12. Intellectual Property Considerations
13. Authors' Addresses
Appendix A. Fault Notification Message Delays on a Path
   A.1 Delays Associated with Link Traversal
   A.2 Delays Incurred at the Nodes
Full Copyright Statement
Changes from Previous Version
- Updated Intellectual Property section
- Added discussion about database updating to explain that IGP is
sole updating agent
- Updated terminology to be synchronized with [2]
1. Overview
Recovery (protection and restoration) in optical switching networks
under tight time constraints has been recognized as a challenging
issue [2, 3] that is crucial to enable fast path restoration and meet
requirements for high-availability and service-level guarantees.
Several mechanisms have been devised for recovery in mesh and ring
topologies. The CCAMP WG has produced a collection of drafts that
address the issue of recovery in networks featuring a Generalized
Multi-Protocol Label Switching (GMPLS) control-plane: a terminology
draft for GMPLS-based recovery [2], an analysis draft [4] that looks
at differences between protection, restoration, path-based, link-
based and span-based approaches, a functional specification draft [5]
that presents a functional description of some of the protocol
extensions needed to support GMPLS-based recovery and solutions for
edge-to-edge and segment recovery [6, 7]. The requirements for
recovery in optical networks that this draft addresses are presented
in [8].
In general, a fault notification protocol for optical transport
networks should address recovery requirements falling into three main
categories:
- Timing requirements: it must meet adequate bounds on timing to
enable fast path restoration
- Control plane resources: it must use control plane resources
efficiently
- Design of recovery schemes: it must allow for the design of
flexible recovery schemes
Protection and restoration algorithms can be used for local repair
(link-based or node-based), span recovery, and path recovery. This
document presents a fault notification protocol and recovery scheme
designed to ensure bounded recovery times (e.g., 50 ms), which are
comparable to recovery times in the ring-based SONET/SDH networks
that implement 1+1 or 1:1 protection schemes.
Link-based recovery can handle faults such as fiber link failures and
transponder failures. However, in the case of a node failure, the
control plane uses either node-based or path-based recovery. The
advantage of path-based recovery lies in its ability to reduce
wavelength redundancy (wavelengths that are reserved for possible
failures), but its disadvantage is the potentially lengthy delay
incurred in notifying all nodes along the recovery path of the
failure of a remote resource. Span-based protection allows the
protection of independent segments on the working path, thereby
decreasing the recovery time, but requires more resources for
protection. In addition, the provider must do more planning
to protect the same resource. In some applications,
recovery paths need to be chosen carefully to meet certain recovery
time requirements.
This document presents a fault notification protocol that applies to
intra-domain recovery, and that we will call FNP. (We shall use the
term fault notification protocol, when referring to a generic scheme
for notification, and the term FNP, when referring to the specific
scheme discussed in this document). The protocol applies to networks
that implement shared recovery, and deals with both ring and mesh-
based recovery. Multi-domain recovery is not within the scope of
this draft. In addition, this proposal focuses on scalability, an
important issue that arises when using signaling for fault
notification. Implementation of the protocol is left for further
drafts. For details about the applicability of FNP, please refer to
the accompanying draft [9].
We assume unidirectional traffic through Label Switched Paths (LSPs)
and assume that bidirectional traffic is carried by two
unidirectional LSPs. Assumptions made in this draft are also valid
for bidirectional LSPs. For the purpose of illustration, we use a
mesh Wavelength Division Multiplexing (WDM) network with OEO
switching (i.e., wavelength termination at nodes); applicability to
ring-topology networks follows directly.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [10].
3. Glossary of Terms Used
In addition to the terminology for GMPLS-based recovery that is
documented in [2], this draft uses the following acronyms:
o AIS: Alarm Indication Signal, a signal at the SONET/SDH
transport layer
o BDI: Backward Defect Indication, a signal at the transport
layer sent upstream
o LSP: Label Switched Path
o MEMS: Micro-Electro Mechanical System
o WDM: Wavelength Division Multiplexing
4. Requirements at Recovery Path Setup Time
A request for a working path signaled into the network indicates the
type of protection or restoration it requires, and, optionally, a
recovery priority value. The recovery priority is useful if, during
the recovery process, a node has to decide which (of many) working
paths to protect. After the recovery route computation algorithm
calculates the protection or restoration path, the link resources
(wavelengths, wavebands, etc.) along that path are reserved and
possibly activated. When the recovery path is not activated, these
link resources may be used to carry preemptible best-effort traffic
to increase network utilization. This traffic is generally
identified as "extra traffic." Alternatively, the same link resource
may be reserved by multiple recovery LSPs for different link failures
as long as these recovery LSPs do not need to be activated
simultaneously (e.g., M:N shared protection). In either case, proper
link resources need to be activated upon notification of failure.
When a label for a recovery LSP is set up on a certain node A by
RSVP-TE, node A should be aware of the network resource that this LSP
is protecting. For example, when RSVP-TE is used, the protection Path
message may notify all nodes on the protection path of this
information at path setup time, as proposed in [11]. This allows node
A to bind (or group together) labels (as well as link resources) that
protect a particular network resource. For example, if two labels j
and k correspond to two LSPs used to protect working paths from the
failure of link (X,Y), then they belong to the set L(X,Y). This
allows node A to process, in its control plane, the joint event of
the two LSP failures and possibly jointly activate/cross-connect both
LSPs referenced by labels j and k when it receives notification of
the failure of link (X,Y).
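The label grouping described above can be sketched as follows. This is a minimal illustration only: the class and method names (ProtectionBindings, bind, on_failure) are hypothetical and are not defined by this draft or by RSVP-TE, and links are represented as simple tuples.

```python
from collections import defaultdict


class ProtectionBindings:
    """Per-node table mapping a protected resource to its recovery labels.

    Hypothetical structure: the draft only requires that a node can group
    the labels that protect the same resource, e.g. the set L(X,Y).
    """

    def __init__(self):
        # Key: protected link as a (node, node) tuple; value: set of labels.
        self._labels = defaultdict(set)

    def bind(self, protected_link, label):
        """Record, at path setup time, that `label` protects `protected_link`."""
        self._labels[protected_link].add(label)

    def on_failure(self, failed_link):
        """Return every label to activate jointly for one failure notification."""
        return sorted(self._labels.get(failed_link, set()))


# Node A holds labels j and k, both protecting against failure of link (X,Y).
node_a = ProtectionBindings()
node_a.bind(("X", "Y"), "j")
node_a.bind(("X", "Y"), "k")
node_a.bind(("Y", "Z"), "m")
labels_to_activate = node_a.on_failure(("X", "Y"))
```

A single notification naming link (X,Y) is enough for node A to find both labels, which is what makes per-failure notification possible.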
This document proposes a method for per-failure fault notification
(as compared to per-LSP fault notification); hence such aggregated
label information is essential. The main difference between
"per-failure" and "per-LSP" notification is in the number of
notification mechanisms that have to run concurrently. Per-failure fault
notification allows the engaging of one mechanism to notify all
relevant nodes of the fault. On the other hand, per-LSP notification
requires activating as many mechanisms as the number of failed LSPs
(for example, all LSPs that failed due to a link failure). In an
optical network, carrying possibly hundreds of wavelengths per fiber
or TDM channels per wavelength, per-LSP notification can be taxing on
the hardware and resource-intensive.
In the case of GMPLS-based signaling, there is generally one fault
notification (using the Notify message) per disrupted Label Switched
Path. One could achieve some amount of efficiency by bundling
notification messages by correlating fault information and sending
one Notify message per source node. Each of these nodes, upon
receipt of the Notify, would have to initiate a handshake process
through RSVP-TE Path and Resv signaling messages to complete the
activation of the backup path before working traffic could be
switched to it. While some recovery LSPs may be the same and could
be signaled together during the handshake phase, this is generally
restrictive in a mesh network. In general, the notification followed
by the handshake mechanism increases recovery time. The restrictions
on the network topology and on the choice of recovery paths and
end-nodes make these scalability enhancements hard to realize in a
network that implements shared-mesh restoration.
Hence, signaling does not scale well with the number of connections;
in addition, the message processing delay is less predictable. As
explained later, the flooding approach decreases the recovery time by
removing the need for such a mechanism. For details about the
notification methods and the choice of flooding for the current
mechanism, the reader is encouraged to refer to [12].
A companion draft [13] explains the need for an expedited flooding
mechanism to realize FNP. The document outlines how a flooding
protocol implementation must balance the need for fast flooding of
failure notifications with the need for controlling the frequency of
flooding message transmission, so that it maintains network
stability.
5. Steps in Failure Notification and Service Recovery
The steps described in this section detail the different times
between the occurrence of a network impairment and completion of all
recovery actions. The failure sequence is based on the timing
sequence in the ITU-T recommendation G.808.1 [14], adapted for the
purposes of this draft. The critical component in guaranteeing time
constraints to service recovery is the fault notification process.
The following sequence of events MUST be followed in order to ensure
that the recovery process is completed within a specific amount of
time.
   +-Network Impairment
   |    +-Fault Detected
   |    |    +-Start of Fault Notification and Recovery
   |    |    |    +-Recovery Operation Complete
   |    |    |    |    +-Traffic Recovered
   |    |    |    |    |
   |    |    |    |    |
   v    v    v    v    v
------------------------------------------------>
   | T1 | T2 | Tt | T5 |                     time
Figure 1. Recovery Temporal Model
5.1 T1: Fault Detection Time
This is the period of time between the impairment in a network and
the detection of signals triggered by that impairment. An example of
such network impairment is a fiber cut in a WDM network with
termination at all nodes (nodes are OEO switches). In general, if a
bi-directional link is cut, both its upstream and downstream nodes
will detect the fault. A unidirectional link failure will be
detected by the downstream node.
To support the failure detection requirement, nodes MUST implement
per-channel monitoring that will pinpoint the failure and report it.
5.2 T2: Hold-Off Time
This is the period of time between the detection of a fault and the
start of the fault recovery process. In other words, it is the period
of time that the reporting entity waits before starting the fault
recovery process. This allows the fault recovery process at a given
layer to wait for recovery to occur at a lower layer. In the case of
WDM-based networks with no data recovery mechanism, this time should
be 0 s. In other networks that use SONET-based recovery, T2 may be
set to 50 ms so that the SONET protection scheme can complete
before any IP-based recovery is triggered.
5.3 Tt: Transfer Time
Tt (Transfer time) is the sum of T3 (fault notification time) and T4
(completion of recovery operation time). Tt is the period of time
between the start of the fault notification process and the
completion of recovery switching of the traffic on the protection
path. This includes the transmission and processing of the control
signals required to effect recovery. In other words, it is the
interval between the time when the detecting entity starts sending
out a fault notification message and the time when every node,
including ingress nodes and intermediate nodes on the corresponding
recovery paths, has been notified of the failure and has finished
reconfiguring itself to carry restored traffic. This interval
includes the time the end-nodes take to switch traffic onto the
backup path.
5.4 T5: Traffic Recovery Time
T5 is the period of time between the completion of the recovery
actions and the full restoration of working traffic. In other words,
this is the time between the last recovery action at the end-nodes
and the time that the traffic (if present) is completely recovered.
This interval is intended to account for the time required for
traffic to once again arrive at the point in the network that
experienced disrupted or degraded service due to the occurrence of
the fault, i.e. the egress node.
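The temporal model of this section can be summarized in a short sketch. The class and field names below are hypothetical, and the numeric values are illustrative only; none of them are taken from this draft or from G.808.1.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTimes:
    """Intervals of the recovery temporal model, all in milliseconds."""
    t1_detection: float
    t2_hold_off: float
    t3_notification: float
    t4_recovery_operation: float
    t5_traffic_recovery: float

    @property
    def transfer_time(self):
        # Tt = T3 + T4
        return self.t3_notification + self.t4_recovery_operation

    @property
    def total(self):
        # Time from network impairment to traffic recovered.
        return (self.t1_detection + self.t2_hold_off
                + self.transfer_time + self.t5_traffic_recovery)


# Illustrative values only (assumed, not measured).
times = RecoveryTimes(
    t1_detection=10.0,
    t2_hold_off=0.0,
    t3_notification=20.0,
    t4_recovery_operation=15.0,
    t5_traffic_recovery=5.0,
)
```

With these assumed values, Tt is 35 ms and the end-to-end time is 50 ms. Of the five intervals, only Tt is under the control of the notification protocol.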
6. Fault Notification Protocol (FNP)
The Fault Notification Protocol is a series of procedures designed to
be executed during Tt (the transfer time that consists of the fault
notification and completion of recovery operation time) and to effect
timely notification of network faults. This protocol is used to
notify nodes of the resource failure and to activate the recovery
LSPs.
For link-based recovery, the ingress node to the recovery LSP is the
upstream detecting node. If the recovery time is strictly
constrained, the ingress node SHOULD be as close to the link failure
as possible. This reduces the recovery time since no messages have
to be relayed to a remote or centralized authority to initiate
recovery.
A fault detecting entity may be at different locations depending on
the type of fault. That detecting entity will initiate the
notification process.
The detecting entity MAY use several fault notification methods to
notify other nodes of the failure, including GMPLS-based signaling or
flooding, as discussed earlier in Section 4.
In the case of flooding, the message sent from the detecting entity
to all nodes on the various recovery paths reaches each node k
within the specified recovery time T-rec minus the reconfiguration
time T-cfg(k) needed at node k after it is notified of a fault. We
define this bound as the fault notification time T-ntf(k):
T-ntf(k) = T-rec - T-cfg(k)
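As a worked illustration, the notification time available to each node is the recovery time requirement minus that node's reconfiguration time. The function name and the sample values below are assumptions made for this sketch.

```python
def notification_budget(t_rec, t_cfg):
    """Return T-ntf(k) = T-rec - T-cfg(k) for every node k.

    t_rec: recovery time requirement T-rec, in milliseconds.
    t_cfg: dict mapping node id to its reconfiguration time T-cfg(k), in ms.
    """
    budgets = {}
    for node, cfg in t_cfg.items():
        if cfg > t_rec:
            # Such a node can never meet T-rec, however fast it is notified.
            raise ValueError(f"node {node} cannot reconfigure within T-rec")
        budgets[node] = t_rec - cfg
    return budgets


# Illustrative values: T-rec = 50 ms, two nodes with different fabrics.
budgets = notification_budget(50, {"A": 10, "B": 25})
```

Here node A may be notified up to 40 ms after the start of notification, while the slower node B must be reached within 25 ms.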
Nodes on a recovery path (including the ingress node) are aware that
they are protecting against the failure of a particular resource.
All nodes notified of the failure will de-correlate the fault
information to learn what LSPs are impacted and activate the recovery
LSPs by performing any required hardware reconfiguration (e.g.,
moving mirrors in the case of a MEMS-based switching fabric or cross-
connecting TDM channels). The approach outlined in this draft
supports node reconfiguration whether applied sequentially (e.g.,
when parallel movement of the mirrors is not available) or in
parallel (e.g., with an electronic switching fabric). The notified
nodes also stop carrying extra traffic, if any, on the recovery
channel. An algorithm that computes
the constrained recovery path SHOULD take the physical capability of
nodes into account in its path calculation.
The ingress node starts sending data on the recovery LSP at the start
time T-start(I) specified in the next paragraph. If one of the
detecting entities at the ingress or egress node detects, at the data
plane, a failure in the recovery LSP to be activated, it MUST raise
an alarm that may be dealt with at the management plane. The
management plane will take appropriate remediation action. Alarm and
remediation are outside the scope of this draft.
The nodes that belong to recovery LSPs receive the fault
notification within a deterministic time. This time delay is
calculated by each node as explained in Appendix A. To avoid complex
clock synchronization, an ingress node, identified as node I, that
receives the notification from a detecting node, node J, calculates
the start time T-start(I) at which it must switch traffic to the
recovery LSP as follows:
T-start(I) = time-of-notification(I) - min-delay-between(J,I) + T-rec
Where
- time-of-notification(I) returns the clock time at node I.
- min-delay-between(J,I) returns the minimum time needed for the
notification from node J to reach node I; this value is
dependent on the topology and the different equipment in the
network. It is calculated offline based on the topology and
hardware information, and is stored as a static table at every
node.
Note that (time-of-notification(I) - min-delay-between(J,I)) will
give the time when failure was detected at J, and T-rec is the
recovery time requirement.
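The start-time calculation can be sketched as below. The function name, the table layout, and the numbers are hypothetical; times are in milliseconds on node I's local clock, so no clock synchronization with node J is assumed.

```python
def start_time(time_of_notification, min_delay_from_detector, t_rec):
    """T-start(I) = time-of-notification(I) - min-delay-between(J,I) + T-rec."""
    # Estimate, on the local clock, when node J detected the failure.
    estimated_detection_time = time_of_notification - min_delay_from_detector
    return estimated_detection_time + t_rec


# Hypothetical static table of min-delay-between(J, I), computed offline.
MIN_DELAY_MS = {("J", "I"): 4}

t_start = start_time(
    time_of_notification=1004,
    min_delay_from_detector=MIN_DELAY_MS[("J", "I")],
    t_rec=50,
)
```

Because the table holds the minimum delay, the estimated detection time is never earlier than the true one if the notification took longer than the minimum to arrive.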
Our scheme, therefore, works in the following manner:
1. Given the topology and the equipment in the network, it is
possible to calculate T-rec and T-ntf for a given failure.
2. An offline or online algorithm outside the scope of this draft
may calculate the recovery path using this information.
3. Upon the occurrence of a failure, when flooding-based
notification is used as described above, a node I on the
recovery path is guaranteed that at T-start(I), all other nodes
along the recovery path have been informed of the failure and
have taken the appropriate action to move traffic onto the
recovery path.
6.1 FNP Flooding Operation
Fault notification is done via flooding as follows. The detecting
entity sends a notification packet to its neighbors on all outgoing
links. The notification packet is a high-priority packet, and
contains the unique identifier of the link at fault. Each node that
receives such a packet sends an acknowledgement to the sender (its
neighbor) and transmits duplicates of the notification to all other
neighboring nodes. To reduce the amount of fault notification
traffic that is flooded, the nodes avoid re-broadcasting packets
notifying about the same fault and decrement a time-to-live field in
the packets as they are received.
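The flooding behavior can be illustrated with a small simulation. This is a sketch under simplifying assumptions: a single fault, no link or node delays, and no acknowledgements or retransmissions. The function name and the sample topology are hypothetical.

```python
from collections import deque


def flood(adjacency, detector, ttl):
    """Simulate flooding of one fault; return {node: hops at first receipt}.

    adjacency: dict mapping each node to a list of its neighbors.
    ttl: initial time-to-live carried in the notification.
    """
    seen = {detector: 0}          # nodes that already know about this fault
    queue = deque((detector, nbr, ttl) for nbr in adjacency[detector])
    while queue:
        sender, node, remaining = queue.popleft()
        if node in seen:
            continue              # duplicate for a known fault: not re-flooded
        seen[node] = seen[sender] + 1
        remaining -= 1            # TTL is decremented on receipt
        if remaining <= 0:
            continue              # TTL exhausted: do not forward further
        for nbr in adjacency[node]:
            if nbr != sender:     # forward to all other neighbors
                queue.append((node, nbr, remaining))
    return seen


# Hypothetical topology: a four-node ring A-B-D-C-A with a tail D-E.
topology = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
reached = flood(topology, "A", ttl=3)
```

Node D receives the notification from both B and C but forwards it only once, which is the duplicate suppression described above.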
6.2 Delays Incurred by Messages
The above discussion suggests that in order for the recovery path
calculation algorithm to abide by the T-rec recovery requirement, it
needs to adopt one of two methods:
1. Be aware of timing issues to be able to select a proper path.
2. Only consider the nodes and links that satisfy the timing
constraints.
Due to the complexity of the first method, we believe that the second
method will be easier to develop and implement. For example, a
pruned topology may be considered for recovery path computation,
where links/nodes that violate the strict recovery time requirements
are excluded. A database of link information should hold the fiber
physical length and the capacity of each link (or channel) as well as
the notification message processing time. The total time needed by a
notification packet to travel from source to destination can be
broken into two delay components: the time needed to traverse each
link and the time needed to go through each node. While the
different delay calculations are discussed in Appendix A, the
algorithm for computing the recovery LSPs is outside the scope of
this draft.
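Although the path computation algorithm is out of scope, the pruning step of the second method can be sketched. The example below is one possible illustration, not the authors' algorithm: it uses Dijkstra's algorithm to find the minimum notification delay from the detecting node, then keeps only nodes that can be notified within T-rec minus their reconfiguration time. All names and values are assumptions.

```python
import heapq


def min_notification_delay(links, node_delay, source):
    """Minimum delay from `source` to each node (link plus node delays)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, link_delay in links.get(u, {}).items():
            nd = d + link_delay + node_delay[v]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def prune(links, node_delay, t_cfg, t_rec, detector):
    """Keep nodes k whose minimum notification delay fits T-rec - T-cfg(k)."""
    delay = min_notification_delay(links, node_delay, detector)
    return {k for k, d in delay.items() if d <= t_rec - t_cfg[k]}


# Illustrative values in milliseconds (assumed).
links = {
    "J": {"A": 1.0, "B": 30.0},
    "A": {"J": 1.0, "B": 2.0},
    "B": {"J": 30.0, "A": 2.0},
}
node_delay = {"J": 0.5, "A": 0.5, "B": 0.5}
t_cfg = {"J": 10.0, "A": 10.0, "B": 48.0}
eligible = prune(links, node_delay, t_cfg, t_rec=50.0, detector="J")
```

Node B is reachable in 4 ms via A, but its reconfiguration time leaves only 2 ms for notification, so it is excluded from the pruned topology.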
6.3 Notification Message Data
Two types of messages are needed for reliable communication of fault
notifications:
- A Fault Notify Message to carry the information regarding the
failure from each node on each of its outgoing links to its
neighboring node(s).
- A Fault Notify Acknowledge Message to indicate that the
notification message was properly received by a neighboring
node.
Aside from implementation-dependent constructs, the data to be
carried in these messages is presented in Table 1 below.
Table 1. Required and Optional Data for Fault Notifications
--------------------------------------------------------------------
Data Object        Fault    Fault Notify  Description
                   Notify   Acknowledge
--------------------------------------------------------------------
Message ID           R           R        Identifies notification
                                          messages
Fault Link ID        R           -        Identifies the failed link
Fault ID             R           -        Identifies sequence of
                                          failure
Channel Status       O           -        Indication of link fault
                                          status
Detecting Node ID    O           -        Identifies the original
                                          node that is reporting
                                          the failure
TTL                  O           -        Time To Live field
--------------------------------------------------------------------
R: required, O: optional, -: not applicable
A node keeps sending Fault Notify messages at separate intervals
until it receives a Fault Notify Acknowledgement response or the
control channel connectivity is declared lost.
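The two messages and the retransmission rule can be sketched as follows. The field types, the `send` callback, and the `max_attempts` limit are assumptions made for illustration; in particular, we assume that the acknowledgement echoes the Message ID of the Fault Notify message it answers, and we use an attempt limit as a stand-in for the control channel being declared lost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaultNotify:
    message_id: int                          # required
    fault_link_id: str                       # required
    fault_id: int                            # required
    channel_status: Optional[str] = None     # optional
    detecting_node_id: Optional[str] = None  # optional
    ttl: Optional[int] = None                # optional


@dataclass
class FaultNotifyAck:
    message_id: int                          # required


def send_until_acked(notify, send, max_attempts):
    """Resend `notify` until a matching acknowledgement arrives.

    `send` is a hypothetical transport callback that returns a
    FaultNotifyAck, or None when no acknowledgement arrived in time.
    Returns the number of attempts used, or None if max_attempts is
    exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        ack = send(notify)
        if ack is not None and ack.message_id == notify.message_id:
            return attempt
    return None


# Example: a neighbor that only answers the third transmission.
sent = []


def lossy_neighbor(msg):
    sent.append(msg)
    return FaultNotifyAck(msg.message_id) if len(sent) >= 3 else None


notify = FaultNotify(message_id=7, fault_link_id="X-Y", fault_id=1)
attempts_used = send_until_acked(notify, lossy_neighbor, max_attempts=5)
```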
7. Reversion (Normalization)
Most of the current literature recommends that for resource
efficiency, the traffic should be moved back to the original path
when the failed link or node is back online. Although reversion is
an optional step, it is typically employed. If reversion is not
used, the "orphaned" bandwidth on the failed working paths should be
reclaimed as these paths are repaired. The notification of fault
repair is similar to fault notification. However, the
reversion phase does not have strict time constraints and can start
at a later time, allowing detecting nodes to spend more time in the
correlation phase before sending the fault repair notification.
During that period of time, the IGP may have notified different nodes
that the failed resource is up again. In that respect, nodes will
not engage in any reversion action until notified by FNP. They will
however update their TE databases as usual and use them for any new
path calculations. Therefore, no interference in the database
information will occur as a result of using FNP.
8. Discussion on Database Updates
The FNP flooding operation discussed in this draft is used for rapid
fault recovery. It does not preempt the regular IGP flooding process
that occurs periodically to update the Traffic Engineering databases
with new network information. Therefore, FNP does not impact the
regular IGP process. The TE database is only updated through the IGP
process, thereby preserving consistency.
A network operator MAY have a policy that allows the information
relayed to a node through FNP to be used to temporarily exclude the
failed resource from path computation, for example when calculating
new paths. In such a case, the TE database is not updated by the
path computation, but the computation algorithm excludes the failed
resource from path calculation. The nodes that receive notification
through FNP about the failed resource in that case MUST start a timer
that expires after a set period of a few seconds. When the timer
expires, the exclusion is no longer used and the computation
algorithm will only consider the information stored in the TE
database for its path computation. In the meantime, the IGP may or
may not have updated the node about the resource failure, as is
generally the case. The objective of this optional strategy is to
increase the path setup success rate.
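The optional exclusion policy can be sketched as a timer-based filter placed in front of path computation. The class name, method names, and the hold time are hypothetical; the sketch never writes to the TE database, in keeping with the rule that only the IGP updates it.

```python
import time


class TemporaryExclusions:
    """Resources reported failed by FNP, excluded from path computation
    until a timer expires. The TE database itself is never modified."""

    def __init__(self, hold_seconds, clock=time.monotonic):
        self._hold = hold_seconds
        self._clock = clock
        self._expiry = {}  # resource id -> time at which exclusion ends

    def on_fnp_notification(self, resource):
        """Start (or restart) the exclusion timer for a failed resource."""
        self._expiry[resource] = self._clock() + self._hold

    def usable_links(self, te_database_links):
        """Return the TE database links minus currently excluded ones."""
        now = self._clock()
        # Drop exclusions whose timer has expired.
        self._expiry = {r: t for r, t in self._expiry.items() if t > now}
        return [link for link in te_database_links
                if link not in self._expiry]


# Example with a controllable clock so the behavior is deterministic.
now = [0.0]
exclusions = TemporaryExclusions(hold_seconds=3.0, clock=lambda: now[0])
te_links = [("X", "Y"), ("Y", "Z")]
exclusions.on_fnp_notification(("X", "Y"))
during_hold = exclusions.usable_links(te_links)
now[0] = 3.5
after_expiry = exclusions.usable_links(te_links)
```

After the timer expires, path computation again relies solely on what the TE database contains.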
9. Security Considerations
This draft proposes a scheme for rapid fault notification in GMPLS
networks, and does not have any known security issues. Detailed
analysis of any security impact is TBD.
10. Conclusion
This draft presented the Fault Notification Protocol for IP-
controlled optical networks that implement shared recovery. It
described the steps required in the notification process and how they
lead to the recovery of service within specific time bounds. A
"per-failure" approach (as opposed to the "per-LSP" approach) to fault
notification was proposed in order to improve scalability and
recovery time guarantees.
11. Acknowledgments
The authors would like to thank Jonathan Lang, Adrian Farrel, Neil
Harisson, Jonathan Sadler, Fabio Ricciato, Zafar Ali and Roberto
Albanese for feedback and helpful comments on the fault notification
protocol, and Takafumi Chujo, Peter Czezowski, and Akira Chugo for
valuable inputs to this draft. We would also like to acknowledge the
feedback of George Newsome, Deborah Brungard and John Drake.
12. Intellectual Property Considerations
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
References
[1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP
9, RFC 2026, October 1996.
[2] Mannie, E., ed., et al, "Recovery (Protection and Restoration)
Terminology for Generalized Multi-Protocol Label Switching
(GMPLS)", Internet Draft, work in progress, draft-ietf-ccamp-
gmpls-recovery-terminology-04.txt, April 2004.
[3] Lai, W.S., and D. McDysan (Eds.), "Network Hierarchy and
Multilayer Survivability", RFC 3386, November 2002.
[4] Papadimitriou, D., et al, "Analysis of Generalized MPLS-based
Recovery Mechanisms (including Protection and Restoration)",
Internet draft, work in progress, draft-ietf-ccamp-gmpls-
recovery-analysis-03.txt, April 2004.
[5] Lang, J., and Rajagopalan, B. (Eds.) "Generalized MPLS recovery
functional specification," Internet Draft, work in progress,
draft-ietf-ccamp-gmpls-recovery-functional-02.txt, April 2004.
[6] Lang, J. (ed) et al, "RSVP-TE Extensions in support of End-to-
End GMPLS-based Recovery," Internet Draft, work in progress,
draft-ietf-ccamp-gmpls-recovery-e2e-signaling-01.txt, May 2004.
[7] Berger, L. et al, "GMPLS Based Segment Recovery," Internet
Draft, work in progress, draft-ietf-ccamp-gmpls-segment-
recovery-00.txt, March 2004.
[8] Rabbat, R. and Soumiya, T., (Eds.), "Optical transport network
failure recovery requirements", Internet Draft, work in
progress, draft-rabbat-optical-recovery-reqs-01.txt, January
2004.
[9] Rabbat, R., Su, C.-F., Sharma, V., "Observations on the
Applicability of the Fault Notification Protocol", Internet
Draft, work in progress, draft-rabbat-fnp-applicability-01.txt,
June 2004.
[10] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[11] Li, G., J. Yates, et al, "Experiments in Fast Restoration using
GMPLS in Optical/Electronic Mesh Networks", Post-deadline Papers
Digest, OFC 2001, Anaheim, CA, March 2001.
[12] Rabbat, R., et al., "Fault Notification and Service Recovery in
WDM Networks", white paper available at:
http://perth.mit.edu/~richard/wp-ietf-fault-notification.pdf.
[13] Rabbat, R., Sharma, V., and Ali, Z., "Expedited Flooding for
Restoration in Shared-Mesh Transport Networks", Internet Draft,
work in progress, draft-rabbat-expedited-flooding-01.txt,
February 2004.
[14] ITU-T, G.808.1, "Generic Protection Switching - Linear trail and
subnetwork protection", December 2003.
13. Authors' Addresses
Richard Rabbat
Fujitsu Labs of America, Inc.
1240 E. Arques Ave, MS 345
Sunnyvale, CA 94085
United States of America
Phone: +1-408-530-4537
Email: rabbat@alum.mit.edu

Vishal Sharma
Metanoia, Inc.
888 Villa Street, Suite 200B
Mtn. View, CA 94041
United States of America
Phone: +1-408-530-8313
Email: v.sharma@ieee.org

Norihiko Shinomiya
Fujitsu Laboratories Ltd.
1-1, Kamikodanaka 4-Chome
Nakahara-ku, Kawasaki
211-8588, Japan
Phone: +81-44-754-2635
Email: shinomi@jp.fujitsu.com

Ching-Fong Su
Fujitsu Labs of America, Inc.
1240 E. Arques Ave
Sunnyvale, CA 94085
United States of America
Phone: +1-408-530-4572
Email: csu@fla.fujitsu.com
Appendix A. Fault Notification Message Delays on a Path
This appendix describes the delays incurred on a path. Two types of
delays occur on the path between any two nodes. They are delays
incurred during traversal of the links on that path, and delays that
occur at the nodes along the path. The following presents the
computations and expected values for the different delays.
A.1 Delays Associated with Link Traversal
The time needed to traverse each link is the sum of the transmission
time and the link propagation delay:
1. The transmission time is a value based on link capacity. The
calculation is as follows: D_trans = (packet size) / (link
speed).
2. The link propagation delay is due to the physical length of the
link: D_prop = length / (light propagation speed on fiber).
The length of a notification packet is expected to be of the order of
a hundred bytes (about 10^3 bits). As an example, for a link speed
of 1 Gbps,
D_trans ~= 10^3 / 10^9 = 10^-6 s = 1 microsecond.
This value can therefore safely be ignored in calculating delays. On
the other hand, the link propagation delay in metropolitan area and
long-haul networks affects the total delay. For a distance of 100 km,
with the speed of light in fiber at about 2/3 of its speed in free
space (about 200,000 km/s),
D_prop ~= 10^2 / (2 * 10^5) = 0.5 * 10^-3 s = 500 microseconds.
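The two link-traversal computations above can be reproduced with a short sketch. The function and constant names below are illustrative assumptions and are not part of the protocol; the numeric inputs are the example values used in this appendix (a 10^3-bit packet, a 1 Gbps link, and a 100 km span).

```python
# Illustrative sketch only: names and defaults are assumptions, not part
# of the Fault Notification Protocol.

# Approximate speed of light in fiber (about 2/3 of its free-space speed).
FIBER_SPEED_KM_PER_S = 200_000.0


def transmission_delay_s(packet_bits: float, link_bps: float) -> float:
    """D_trans = (packet size) / (link speed), in seconds."""
    return packet_bits / link_bps


def propagation_delay_s(length_km: float,
                        speed_km_per_s: float = FIBER_SPEED_KM_PER_S) -> float:
    """D_prop = length / (light propagation speed on fiber), in seconds."""
    return length_km / speed_km_per_s


def link_delay_s(packet_bits: float, link_bps: float, length_km: float) -> float:
    """Time to traverse one link: transmission time plus propagation delay."""
    return (transmission_delay_s(packet_bits, link_bps)
            + propagation_delay_s(length_km))


# Example values from the text above.
d_trans = transmission_delay_s(1e3, 1e9)   # about 1 microsecond
d_prop = propagation_delay_s(100.0)        # about 500 microseconds
d_link = link_delay_s(1e3, 1e9, 100.0)

if __name__ == "__main__":
    print(f"D_trans = {d_trans * 1e6:.1f} us")
    print(f"D_prop  = {d_prop * 1e6:.1f} us")
    print(f"D_link  = {d_link * 1e6:.1f} us")
```

With these inputs the propagation term is several hundred times larger than the transmission term, which is why the transmission time is neglected.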
A.2 Delays Incurred at the Nodes
At each node, two delays are important: queuing delay and processing
time. The processing time D_proc has been identified in the
literature [11] as a few tenths of a millisecond in the case of an
RSVP object. This value is smaller in the case of a simpler LMP
message requesting the activation of an LSP.
The issue of queuing delay is important at all intermediate nodes.
Fault notification messages should be queued at the front of the
buffer that holds other control packets in order to avoid queuing
delays. (Those messages do not have to contend with data packets,
since no data are sent over the control channel.) A queuing
discipline such as priority queuing would allow those packets to
be admitted at the head of the queue, based on the priority set in
the packet. A simple mechanism such as setting the priority bits in
the IP header, that is, the IP precedence bits or DSCP code points
of the TOS (Type Of Service) byte, would be appropriate. Using
priority queuing for fault notification messages ensures that their
queuing delay is bounded. In the case of flooding for fault
notification, D_queue(A) = 0 sec. If other fault
notification messages are in the queue as well, this implies multiple
failures, where the recovery time guarantee does not apply.
Alternatively, it may indicate that multiple messages are
traveling on different recovery LSPs to notify the same link failure,
as is the case when a signaling protocol is used for fault
notification. In the case of per-LSP fault notification, just as in
the case of using a signaling protocol, the maximum queuing delay at
node A is:
D_queue_max(A) = (number of recovery LSPs) * (packet size) /
(link bandwidth).
This provides the mathematical basis for using flooding for fault
notification; flooding allows this value to be 0 seconds. In the
absence of priority queuing, the maximum queuing delay at node A can
be calculated as follows, assuming fair queuing at the FIFO
buffers of all control channels and assuming input buffers only:
D_queue_max(A) = (number of queues) * (queue size) / (link
bandwidth).
This value is an upper bound and depends on hardware buffer
implementations.
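The queuing delay bounds of this section can be compared numerically with a minimal sketch. All names and input values below (100 recovery LSPs, 8 input queues of 64 KB each, a 1 Gbps control link) are assumptions chosen for illustration only; they are not values taken from this draft or from any implementation.

```python
# Illustrative sketch only: function names and example inputs are
# assumptions, not part of the Fault Notification Protocol.

# With flooding and priority queuing, a notification does not wait
# behind other notifications for the same failure.
FLOODING_QUEUE_DELAY_S = 0.0


def per_lsp_queue_delay_max_s(num_recovery_lsps: int,
                              packet_bits: float,
                              link_bps: float) -> float:
    """Bound for per-LSP (signaling-based) fault notification:
    (number of recovery LSPs) * (packet size) / (link bandwidth)."""
    return num_recovery_lsps * packet_bits / link_bps


def fifo_queue_delay_max_s(num_queues: int,
                           queue_size_bits: float,
                           link_bps: float) -> float:
    """Bound without priority queuing, fair queuing over FIFO input buffers:
    (number of queues) * (queue size) / (link bandwidth)."""
    return num_queues * queue_size_bits / link_bps


# Hypothetical inputs for comparison.
LINK_BPS = 1e9                    # 1 Gbps control link
PACKET_BITS = 1e3                 # notification packet of about 100 bytes
QUEUE_SIZE_BITS = 64 * 1024 * 8   # assumed 64 KB buffer per queue

d_per_lsp = per_lsp_queue_delay_max_s(100, PACKET_BITS, LINK_BPS)
d_fifo = fifo_queue_delay_max_s(8, QUEUE_SIZE_BITS, LINK_BPS)

if __name__ == "__main__":
    print(f"flooding: {FLOODING_QUEUE_DELAY_S * 1e3:.3f} ms")
    print(f"per-LSP : {d_per_lsp * 1e3:.3f} ms")
    print(f"FIFO    : {d_fifo * 1e3:.3f} ms")
```

Under these assumed inputs, the per-LSP bound grows linearly with the number of recovery LSPs sharing the failed resource, while the FIFO bound is governed by buffer sizes rather than by the number of notifications.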
Full Copyright Statement
"Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."