CCAMP Working Group                           Richard Rabbat, Ed. (FLA)
Internet Draft                      Vishal Sharma, Ed. (Metanoia, Inc.)
Expires: November 2004                         Norihiko Shinomiya (FLL)
                                                    Ching-Fong Su (FLA)

                                                               May 2004

    Fault Notification Protocol for GMPLS-Based Recovery in Shared Mesh
                                 Networks
              draft-rabbat-fault-notification-protocol-05.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
        http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
        http://www.ietf.org/shadow.html.


Abstract

   This draft presents a fault notification protocol for use in a GMPLS-
   based failure recovery scheme.  The protocol guarantees activation of
   the recovery path(s) within a bounded time in the event of a single
   resource failure, such as a fiber cut, a transponder failure, or a
   node failure.  Bounded recovery time is achieved by pre-signaling
   recovery paths whose nodes can be reached within a specific time,
   based on the physical capabilities of the nodes and the delay
   characteristics of the control plane.  We propose using a flooding
   protocol for fault notification to allow for per-failure notification
   and to speed up the recovery process.  We justify the choices made
   for the notification method and the messaging required for the
   protocol.

   The draft does not mandate a specific implementation of the Fault
   Notification Protocol.




Rabbat & Sharma (Eds.) Expires  November 2004               [Page 1]


         draft-rabbat-fault-notification-protocol-05.txt      May 2004



Table of Contents

   1. Overview
   2. Terminology
   3. Glossary of Terms Used
   4. Requirements at Recovery Path Setup Time
   5. Steps in Failure Notification and Service Recovery
   5.1 T1: Fault Detection Time
   5.2 T2: Hold-Off Time
   5.3 Tt: Transfer Time
   5.4 T5: Traffic Recovery Time
   6. Fault Notification Protocol (FNP)
   6.1 FNP Flooding Operation
   6.2 Delays Incurred by Messages
   6.3 Notification Message Data
   7. Reversion (Normalization)
   8. Discussion on Database Updates
   9. Security Considerations
   10. Conclusion
   11. Acknowledgments
   12. Intellectual Property Considerations
   13. Authors' Addresses
   Appendix A. Fault Notification Message Delays on a Path
   A.1 Delays Associated with Link Traversal
   A.2 Delays Incurred at the Nodes
   Full Copyright Statement

Changes from Previous Version

   - Updated Intellectual Property section

   - Added discussion about database updating to explain that IGP is
     sole updating agent

   - Updated terminology to be synchronized with [2]

1. Overview

   Recovery (protection and restoration) in optical switching networks
   under tight time constraints has been recognized as a challenging
   issue [2, 3] whose resolution is crucial for fast path restoration
   and for meeting requirements for high availability and service-level
   guarantees.  Several mechanisms have been devised for recovery in
   mesh and ring topologies.  The CCAMP WG has produced a collection of
   drafts that address the issue of recovery in networks featuring a
   Generalized Multi-Protocol Label Switching (GMPLS) control plane: a
   terminology draft for GMPLS-based recovery [2]; an analysis draft [4]
   that looks at the differences between protection, restoration,
   path-based, link-based and span-based approaches; a functional
   specification draft [5] that presents a functional description of
   some of the protocol extensions needed to support GMPLS-based
   recovery; and solutions for edge-to-edge and segment recovery [6, 7].
   The requirements for recovery in optical networks that this draft
   addresses are presented in [8].

   In general, a fault notification protocol for optical transport
   networks should address recovery requirements falling into three main
   categories:

      - Timing requirements: it must meet adequate bounds on timing to
        enable fast path restoration
      - Control plane resources: it must use control plane resources
        efficiently
      - Design of recovery schemes: it must allow for the design of
        flexible recovery schemes

   Protection and restoration algorithms can be used for local repair
   (link-based or node-based), span recovery, and path recovery.  This
   document presents a fault notification protocol and recovery scheme
   designed to ensure bounded recovery times (e.g., 50 ms), which are
   comparable to recovery times in the ring-based SONET/SDH networks
   that implement 1+1 or 1:1 protection schemes.

   Link-based recovery can handle faults such as fiber link failures and
   transponder failures.  However, in the case of a node failure, the
   control plane uses either node-based or path-based recovery.  The
   advantage of path-based recovery lies in its ability to reduce
   wavelength redundancy (wavelengths that are reserved for possible
   failures), but its disadvantage is the potentially lengthy delay
   incurred in notifying all nodes along the recovery path of the
   failure of a remote resource.  Span-based protection allows the
   protection of independent segments on the working path, thereby
   decreasing the recovery time, but requires more resources for
   protection.  In addition, the provider has to do more planning to
   protect the same resource.  In some applications, recovery paths need
   to be chosen carefully to meet certain recovery time requirements.

   This document presents a fault notification protocol that applies to
   intra-domain recovery, and that we will call FNP.  (We use the term
   "fault notification protocol" when referring to a generic scheme for
   notification, and the term "FNP" when referring to the specific
   scheme discussed in this document.)  The protocol applies to networks
   that implement shared recovery, and deals with both ring-based and
   mesh-based recovery.  Multi-domain recovery is not within the scope
   of this draft.  In addition, this proposal focuses on scalability, an
   important issue that arises when using signaling for fault
   notification.  Implementation of the protocol is left for further
   drafts.  For details about the applicability of FNP, please refer to
   the accompanying draft [9].

   We assume unidirectional traffic through Label Switched Paths (LSPs)
   and assume that bidirectional traffic is carried by two
   unidirectional LSPs.  Assumptions made in this draft are also valid
   for bidirectional LSPs.  For the purpose of illustration, we use a
   mesh Wavelength Division Multiplexing (WDM) network with OEO
   switching (i.e., wavelength termination at nodes); the scheme carries
   over directly to ring-topology networks.


2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [10].


3. Glossary of Terms Used

   In addition to the terminology for GMPLS-based recovery that is
   documented in [2], this draft uses the following acronyms:

      o AIS:   Alarm Indication Signal, a signal at the SONET/SDH
               transport layer
      o BDI:   Backward Defect Indication, a signal at the transport
               layer sent upstream
      o LSP:   Label Switched Path
      o MEMS:  Micro-Electro-Mechanical System
      o WDM:   Wavelength Division Multiplexing


4. Requirements at Recovery Path Setup Time

   A request for a working path signaled into the network indicates the
   type of protection or restoration it requires, and, optionally, a
   recovery priority value.  The recovery priority is useful if, during
   the recovery process, a node has to decide which of many working
   paths to protect.  After the recovery route computation algorithm
   calculates the protection or restoration path, the link resources
   (wavelengths, wavebands, etc.) along that path are reserved and
   possibly activated.  When the recovery path is not activated, these
   link resources may be used to carry preemptible best-effort traffic
   to increase network utilization.  This traffic is generally
   identified as "extra traffic."  Alternatively, the same link resource
   may be reserved by multiple recovery LSPs for different link failures
   as long as these recovery LSPs do not need to be activated
   simultaneously (e.g., M:N shared protection).  In either case, the
   proper link resources need to be activated upon notification of
   failure.

   When a label for a recovery LSP is set up on a certain node A by
   RSVP-TE, node A should be aware of the network resource that this LSP
   is protecting.  When using RSVP-TE, for example, the protection PATH
   message may notify all nodes on the protection path of this
   information at path setup time as proposed in [11].  This allows node
   A to bind (or group together) labels (as well as link resources) that
   protect a particular network resource.  For example, if two labels j
   and k correspond to two LSPs used to protect working paths from the
   failure of link (X,Y), then they belong to the set L(X,Y).  This
   allows node A to process, in its control plane, the joint event of
   the two LSP failures and possibly jointly activate/cross-connect both
   LSPs referenced by labels j and k when it receives notification of
   the failure of link (X,Y).
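
   The label grouping described above can be illustrated with a minimal
   Python sketch.  The class name RecoveryLabelTable, its methods, and
   the activate callback are hypothetical names chosen for this example;
   the draft does not define a data structure for the set L(X,Y).

```python
from collections import defaultdict


class RecoveryLabelTable:
    """Groups recovery-LSP labels by the resource they protect, i.e. L(X,Y).

    Hypothetical helper for illustration only; not defined by the draft.
    """

    def __init__(self):
        # protected resource, e.g. ("X", "Y") -> set of recovery labels
        self._labels_by_resource = defaultdict(set)

    def bind(self, resource, label):
        """Record, at path setup time, that `label` protects `resource`."""
        self._labels_by_resource[resource].add(label)

    def labels_for(self, resource):
        """Return a copy of L(resource); empty if nothing is bound."""
        return set(self._labels_by_resource.get(resource, ()))

    def on_fault(self, resource, activate):
        """Jointly activate every recovery LSP bound to the failed resource."""
        labels = sorted(self.labels_for(resource))
        for label in labels:
            activate(label)  # e.g. cross-connect the recovery LSP
        return labels
```

   With labels j and k bound to link (X,Y), a single call to
   on_fault(("X", "Y"), activate) handles both recovery LSPs in response
   to one notification.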

   This document proposes a method for per-failure fault notification
   (as compared to per-LSP fault notification); hence such aggregated
   label information is essential.  The main difference between
   "per-failure" and "per-LSP" notification is in the number of
   notification mechanisms that must run concurrently.  Per-failure
   fault notification engages a single mechanism to notify all relevant
   nodes of the fault.  Per-LSP notification, on the other hand,
   requires activating as many mechanisms as there are failed LSPs (for
   example, all LSPs that failed due to a link failure).  In an optical
   network, carrying possibly hundreds of wavelengths per fiber or TDM
   channels per wavelength, per-LSP notification can be taxing on the
   hardware and resource-intensive.

   In the case of GMPLS-based signaling, there is generally one fault
   notification (using the Notify message) per disrupted Label Switched
   Path.  Some efficiency can be gained by correlating fault information
   and bundling notifications, so that one Notify message is sent per
   source node.  Each of these nodes, upon receipt of the Notify, would
   have to initiate a handshake process through RSVP-TE Path and Resv
   signaling messages to complete the activation of the backup path
   before working traffic could be switched to it.  While some recovery
   LSPs may be the same and could be signaled together during the
   handshake phase, this is generally restrictive in a mesh network.  In
   general, the notification followed by the handshake mechanism
   increases recovery time.  The restrictions on the network topology
   and on the choice of recovery paths and end-nodes make these
   scalability enhancements hard to realize in a network that implements
   shared-mesh restoration.

   Hence, signaling does not scale well with the number of connections;
   in addition, the message processing delay is less predictable.  As
   explained later, the flooding approach decreases the recovery time by
   removing the need for such a handshake.  For details about the
   notification methods and the choice of flooding for the current
   mechanism, the reader is encouraged to refer to [12].

   A companion draft [13] explains the need for an expedited flooding
   mechanism to realize FNP.  The document outlines how a flooding
   protocol implementation must balance the need for fast flooding of
   failure notifications with the need for controlling the frequency of
   flooding message transmission, so that it maintains network
   stability.


5. Steps in Failure Notification and Service Recovery

   The steps described in this section detail the different times
   between the occurrence of a network impairment and completion of all
   recovery actions.  The failure sequence is based on the timing
   sequence in the ITU-T recommendation G.808.1 [14], adapted for the
   purposes of this draft.  The critical component in guaranteeing time
   constraints for service recovery is the fault notification process.
   The following sequence of events MUST be followed in order to ensure
   that the recovery process is completed within a specific amount of
   time.


         +-Network Impairment
         |    +-Fault Detected
         |    |    +-Start of Fault Notification and Recovery
         |    |    |    +-Recovery Operation Complete
         |    |    |    |    +-Traffic Recovered
         |    |    |    |    |
         |    |    |    |    |
         v    v    v    v    v
        ------------------------------------------------>
         | T1 | T2 | Tt | T5 |                    time

     Figure 1. Recovery Temporal Model


5.1 T1: Fault Detection Time

   This is the period of time between the impairment in a network and
   the detection of signals triggered by that impairment.  An example of
   such a network impairment is a fiber cut in a WDM network with
   termination at all nodes (nodes are OEO switches).  In general, if a
   bidirectional link is cut, both its upstream and downstream nodes
   will detect the fault.  A unidirectional link failure will be
   detected by the downstream node.


   To support the failure detection requirement, nodes MUST implement
   per-channel monitoring that will pinpoint the failure and report it.

5.2 T2: Hold-Off Time

   This is the period of time between the detection of a fault and the
   start of the fault recovery process.  In other words, it is the
   period of time that the reporting entity waits before starting the
   fault recovery process.  This allows the fault recovery process at a
   given layer to wait for recovery to occur at a lower layer.  In the
   case of WDM-based networks with no data recovery mechanism, this time
   should be 0 sec.  In other networks that use SONET-based recovery, T2
   may be set to 50 ms so that the SONET protection scheme can complete
   before any IP-based recovery is triggered.

5.3 Tt: Transfer Time

   Tt (Transfer time) is the sum of T3 (fault notification time) and T4
   (completion of recovery operation time).  Tt is the period of time
   between the start of the fault notification process and the
   completion of recovery switching of the traffic onto the protection
   path.  This includes the transmission and processing of the control
   signals required to effect recovery.  In other words, it is the
   interval between the time when the detecting entity starts sending
   out a fault notification message and the time when every node,
   including ingress nodes and intermediate nodes on the corresponding
   recovery paths, has been notified of the failure and has finished
   reconfiguring itself to carry restored traffic.  This interval also
   covers the switching of traffic onto the backup path by the
   end-nodes.

5.4 T5: Traffic Recovery Time

   T5 is the period of time between the completion of the recovery
   actions and the full restoration of working traffic. In other words,
   this is the time between the last recovery action at the end-nodes
   and the time that the traffic (if present) is completely recovered.
   This interval is intended to account for the time required for
   traffic to once again arrive at the point in the network that
   experienced disrupted or degraded service due to the occurrence of
   the fault, i.e., the egress node.
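
   As an illustration of the temporal model of Figure 1, the following
   Python sketch adds up the intervals T1, T2, Tt (= T3 + T4) and T5 and
   compares the total against a bound supplied by the caller.  The class
   and field names are hypothetical, and the choice to compare the sum
   of all five intervals against the bound is an assumption made for
   this example, not a requirement stated in this draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTimeline:
    """Intervals of the recovery temporal model, all in milliseconds."""
    t1_detection: float
    t2_hold_off: float
    t3_notification: float
    t4_recovery_operation: float
    t5_traffic_recovery: float

    @property
    def transfer_time(self) -> float:
        """Tt = T3 + T4, as defined in Section 5.3."""
        return self.t3_notification + self.t4_recovery_operation

    @property
    def total(self) -> float:
        """Time from network impairment to traffic recovered."""
        return (self.t1_detection + self.t2_hold_off
                + self.transfer_time + self.t5_traffic_recovery)

    def within(self, bound: float) -> bool:
        """True if the whole sequence fits in `bound` (assumed budget)."""
        return self.total <= bound
```

   For instance, with made-up values T1 = 10, T2 = 0, T3 = 15, T4 = 20
   and T5 = 3 (all in ms), Tt is 35 ms and the total is 48 ms.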


6. Fault Notification Protocol (FNP)

   The Fault Notification Protocol is a series of procedures designed to
   be executed during Tt (the transfer time that consists of the fault
   notification and completion of recovery operation time) and to effect
   timely notification of network faults.  This protocol is used for
   notifying nodes of the resource failure and for activating the
   recovery LSPs.

   For link-based recovery, the ingress node to the recovery LSP is the
   upstream detecting node.  If the recovery time is strictly
   constrained, the ingress node SHOULD be as close to the link failure
   as possible.  This reduces the recovery time since no messages have
   to be relayed to a remote or centralized authority to initiate
   recovery.

   A fault detecting entity may be at different locations depending on
   the type of fault.  That detecting entity will initiate the
   notification process.

   The detecting entity MAY use several fault notification methods to
   notify other nodes of the failure, including GMPLS-based signaling or
   flooding as discussed earlier in Section 4.

   In the case of flooding, the message sent from the detecting entity
   to all nodes on the various recovery paths reaches each of them
   within the specified recovery time T-rec minus the reconfiguration
   time T-cfg(k) needed at each node k after it is notified of a fault.
   We define this time as the fault notification time T-ntf(k):

   T-ntf(k) = T-rec - T-cfg(k)
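
   The relation above can be evaluated per node as in the following
   Python sketch.  The function name and the dictionary of per-node
   reconfiguration times are hypothetical; rejecting a node whose
   T-cfg(k) exceeds T-rec is an assumption of this example.

```python
def notification_deadlines(t_rec_ms, t_cfg_ms_by_node):
    """Return T-ntf(k) = T-rec - T-cfg(k) for every node k.

    t_cfg_ms_by_node maps a node name to its reconfiguration time in ms.
    """
    deadlines = {}
    for node, t_cfg_ms in t_cfg_ms_by_node.items():
        if t_cfg_ms > t_rec_ms:
            # Assumption: such a node can never meet the recovery bound.
            raise ValueError(f"T-cfg({node}) exceeds T-rec")
        deadlines[node] = t_rec_ms - t_cfg_ms
    return deadlines
```

   With T-rec = 50 ms, a node that needs 10 ms to reconfigure must be
   notified within 40 ms, while a node that needs 25 ms must be
   notified within 25 ms.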

   Nodes on a recovery path (including the ingress node) are aware that
   they are protecting against the failure of a particular resource.
   All nodes notified of the failure will de-correlate the fault
   information to learn which LSPs are impacted and activate the
   recovery LSPs by performing any required hardware reconfiguration
   (e.g., moving mirrors in the case of a MEMS-based switching fabric or
   cross-connecting TDM channels).  These nodes also stop carrying extra
   traffic, if any, on the recovery channel.  The approach outlined in
   this draft supports node reconfiguration whether applied sequentially
   (e.g., when parallel movement of the mirrors is not available) or in
   parallel (e.g., with an electronic switching fabric).  An algorithm
   that computes the constrained recovery path SHOULD take the physical
   capability of nodes into account in its path calculation.

   The ingress node starts sending data on the recovery LSP at the start
   time T-start(I) specified below.  If one of the detecting entities at
   the ingress or egress node detects, at the data plane, a failure in
   the recovery LSP to be activated, it MUST raise an alarm that may be
   dealt with at the management plane.  The management plane will take
   appropriate remediation action.  Alarm and remediation are outside
   the scope of this draft.



   The nodes that belong to recovery LSPs receive the fault
   notification within a deterministic time.  This time delay is
   calculated by each node as explained in Appendix A.  To avoid complex
   clock synchronization, an ingress node, identified as node I, that
   receives the notification from a detecting node, node J, calculates
   the start time T-start(I) at which it must switch traffic to the
   recovery LSP as follows:

   T-start(I) = time-of-notification(I) - min-delay-between(J,I)
                + T-rec

   where

      - time-of-notification(I) is the local clock time at node I when
        it receives the notification.

      - min-delay-between(J,I) returns the minimum time needed for the
        notification from node J to reach node I; this value is
        dependent on the topology and the different equipment in the
        network.  It is calculated offline based on the topology and
        hardware information, and is stored as a static table at every
        node.

   Note that (time-of-notification(I) - min-delay-between(J,I)) gives
   node I's estimate, on its own clock, of the time when the failure was
   detected at J, and T-rec is the recovery time requirement.
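
   A minimal Python sketch of the start-time calculation follows.  The
   function names and the layout of the static delay table, keyed by
   (detecting node, ingress node), are assumptions made for this
   example; only the arithmetic comes from the formula above.

```python
def start_time(time_of_notification_ms, min_delay_ms, t_rec_ms):
    """T-start(I) = time-of-notification(I) - min-delay-between(J,I) + T-rec."""
    return time_of_notification_ms - min_delay_ms + t_rec_ms


def start_time_at_ingress(ingress, detector, time_of_notification_ms,
                          t_rec_ms, min_delay_table):
    """Look up min-delay-between(J,I) in a static table, then apply the formula.

    min_delay_table is a hypothetical dict {(J, I): delay_ms} that would be
    computed offline from topology and hardware information.
    """
    min_delay_ms = min_delay_table[(detector, ingress)]
    return start_time(time_of_notification_ms, min_delay_ms, t_rec_ms)
```

   For example, if node I receives the notification at local time
   1000.0 ms, the table entry for (J, I) is 4.0 ms and T-rec is 50.0 ms,
   node I switches traffic at local time 1046.0 ms.  Only node I's own
   clock is read, which is why no clock synchronization is needed.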

   Our scheme, therefore, works in the following manner:

     1. Given the topology and the equipment in the network, it is
        possible to calculate T-rec and T-ntf for a given failure.

     2. An offline or online algorithm outside the scope of this draft
        may calculate the recovery path using this information.

     3. Upon the occurrence of a failure, when flooding-based
        notification is used as described above, a node I on the
        recovery path is guaranteed that at T-start(I), all other nodes
        along the recovery path have been informed of the failure and
        have taken the appropriate action to move traffic onto the
        recovery path.

6.1 FNP Flooding Operation

   Fault notification is done via flooding as follows.  The detecting
   entity sends a notification packet to its neighbors on all outgoing
   links.  The notification packet is a high-priority packet, and
   contains the unique identifier of the failed link.  Each node that
   receives such a packet sends an acknowledgement to the sender (its
   neighbor) and transmits duplicates of the notification to all other
   neighboring nodes.  To reduce the amount of fault notification
   traffic that is flooded, the nodes avoid re-broadcasting packets
   that notify about the same fault, and they decrement a time-to-live
   field in the packets as they are received.
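
   The flooding rules above can be simulated with the short Python
   sketch below.  It is not an implementation of FNP: the function name,
   the adjacency-list representation, and the rule that a packet whose
   time-to-live reaches zero is not forwarded are assumptions made for
   this example.

```python
from collections import deque


def flood_fault(adjacency, detecting_node, fault_link_id, ttl):
    """Simulate flooding of one fault; return (notified nodes, ack count).

    adjacency maps each node to a list of its neighbors.
    """
    # Per-node record of faults already seen, used to suppress re-broadcast.
    seen = {node: set() for node in adjacency}
    seen[detecting_node].add(fault_link_id)
    acks = 0
    # Each queue entry is one packet in flight: (sender, receiver, ttl).
    queue = deque((detecting_node, nbr, ttl)
                  for nbr in adjacency[detecting_node])
    while queue:
        sender, receiver, remaining = queue.popleft()
        acks += 1  # every received packet is acknowledged to its sender
        if fault_link_id in seen[receiver]:
            continue  # same fault already known: do not re-broadcast
        seen[receiver].add(fault_link_id)
        remaining -= 1  # decrement TTL on receipt
        if remaining <= 0:
            continue  # assumed rule: expired packets are not forwarded
        for nbr in adjacency[receiver]:
            if nbr != sender:
                queue.append((receiver, nbr, remaining))
    notified = {n for n, faults in seen.items() if fault_link_id in faults}
    return notified, acks
```

   On a four-node ring, a fault detected at one node reaches the other
   three, and the duplicate copies that meet on the far side of the ring
   are acknowledged but not forwarded again.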

6.2 Delays Incurred by Messages

   The above discussion suggests that in order for the recovery path
   calculation algorithm to abide by the T-rec recovery requirement, it
   needs to adopt one of two methods:

     1. Be aware of timing issues to be able to select a proper path.
     2. Only consider the nodes and links that satisfy the timing
        constraints.

   Due to the complexity of the first method, we believe that the second
   method will be easier to develop and implement.  For example, a
   pruned topology may be considered for recovery path computation,
   where links/nodes that violate the strict recovery time requirements
   are excluded.  A database of link information should hold the fiber
   physical length and the capacity of each link (or channel) as well as
   the notification message processing time.  The total time needed by a
   notification packet to travel from source to destination can be
   broken into two delay components: the time needed to traverse each
   link and the time needed to go through each node.  While the
   different delay calculations are discussed in Appendix A, the
   algorithm for computing the recovery LSPs is outside the scope of
   this draft.
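
   One possible way to build such a pruned topology is sketched below in
   Python.  It accumulates a per-link delay and a per-node processing
   delay along the fastest route from the detecting node, then keeps
   only the nodes whose accumulated delay does not exceed their
   notification time T-ntf(k).  The function names, the input layout,
   and the use of a shortest-path search are assumptions of this
   example; the actual delay calculations are those of Appendix A.

```python
import heapq


def notification_delays(links, node_delay_ms, source):
    """Delay of the fastest notification route from `source` to each node.

    links: {(u, v): link_delay_ms}, treated as bidirectional.
    node_delay_ms: {node: processing delay charged when the node receives}.
    """
    adjacency = {}
    for (u, v), delay in links.items():
        adjacency.setdefault(u, []).append((v, delay))
        adjacency.setdefault(v, []).append((u, delay))
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, link_delay in adjacency.get(node, []):
            candidate = dist + link_delay + node_delay_ms[nbr]
            if candidate < best.get(nbr, float("inf")):
                best[nbr] = candidate
                heapq.heappush(heap, (candidate, nbr))
    return best


def eligible_nodes(delays, t_ntf_ms_by_node):
    """Nodes that can be notified within their T-ntf(k)."""
    return {n for n, d in delays.items() if d <= t_ntf_ms_by_node[n]}
```

   A recovery path computation restricted to the returned set only
   considers nodes that satisfy the timing constraints.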

6.3 Notification Message Data

   Two types of messages are needed for reliable communication of fault
   notifications:

      - A Fault Notify Message to carry the information regarding the
        failure from each node on each of its outgoing links to its
        neighboring node(s).

      - A Fault Notify Acknowledge Message to indicate that the
        notification message was properly received by a neighboring
        node.

   Aside from implementation-dependent constructs, the data to be
   carried in these messages is presented in Table 1 below.


   Table 1. Required and Optional Data for Fault Notifications
   --------------------------------------------------------------------
   Data Object    Fault   Fault Notify  Description
                  Notify  Acknowledge
   --------------------------------------------------------------------
   Message ID        R         R        Identifies notification messages
   Fault Link ID     R         -        Identifies the failed link
   Fault ID          R         -        Identifies sequence of failure
   Channel Status    O         -        Indication of link fault status
   Detecting Node ID O         -        Identifies the original node
                                        that is reporting the failure
   TTL               O         -        Time To Live field
   --------------------------------------------------------------------
   R: required, O: optional, -: not applicable

   A node keeps retransmitting the Fault Notify message at intervals
   until it receives a Fault Notify Acknowledge message or the control
   channel connectivity is declared lost.
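
   The two message types of Table 1 and the retransmission rule can be
   sketched in Python as follows.  The field types, the callables passed
   to send_until_acked, and the convention that an acknowledgement
   echoes the Message ID of the message it acknowledges are assumptions
   made for this example; the draft does not specify an encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaultNotify:
    message_id: int                          # required
    fault_link_id: str                       # required
    fault_id: int                            # required
    channel_status: Optional[str] = None     # optional
    detecting_node_id: Optional[str] = None  # optional
    ttl: Optional[int] = None                # optional


@dataclass(frozen=True)
class FaultNotifyAck:
    message_id: int  # required; assumed to echo the acknowledged message


def send_until_acked(message, send, wait_for_ack, channel_is_up):
    """Retransmit until acknowledged or the control channel is lost.

    send(message) transmits one copy; wait_for_ack() blocks for one
    retransmission interval and returns a FaultNotifyAck or None;
    channel_is_up() reports control channel connectivity.
    Returns (acknowledged, number of transmissions).
    """
    attempts = 0
    while channel_is_up():
        send(message)
        attempts += 1
        ack = wait_for_ack()
        if ack is not None and ack.message_id == message.message_id:
            return True, attempts
    return False, attempts
```

   The loop has no retry limit of its own, matching the rule above: it
   ends only on an acknowledgement or on loss of the control channel.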


7. Reversion (Normalization)

   Most of the current literature recommends that, for resource
   efficiency, the traffic be moved back to the original path when the
   failed link or node is back online.  Although reversion is an
   optional step, it is typically employed.  If reversion is not used,
   the "orphaned" bandwidth on the failed working paths should be
   reclaimed as these paths are repaired.  Notification of fault repair
   is similar to fault notification.  However, the reversion phase does
   not have strict time constraints and can start at a later time,
   allowing detecting nodes to spend more time in the correlation phase
   before sending the fault repair notification.

   During that period of time, the IGP may have notified different nodes
   that the failed resource is up again.  Even so, nodes will not engage
   in any reversion action until notified by FNP.  They will, however,
   update their TE databases as usual and use them for any new path
   calculations.  Therefore, no interference in the database information
   will occur as a result of using FNP.


8. Discussion on Database Updates

   The FNP flooding operation discussed in this draft is used for rapid
   fault recovery.  It does not preempt the regular IGP flooding process
   that occurs periodically to update the Traffic Engineering databases
   with new network information.  Therefore, FNP does not impact the
   regular IGP process.  The TE database is only updated through the IGP
   process, thereby preserving consistency.

   A network operator MAY have a policy that allows the information
   relayed to a node through FNP to be used to temporarily exclude the
   failed resource, for example when calculating new paths.  In such a
   case, the TE database is not updated by the path computation, but the
   computation algorithm excludes the failed resource from path
   calculation.  The nodes that receive notification through FNP about
   the failed resource in that case MUST start a timer that expires
   after a set period of a few seconds.  When the timer expires, the
   exclusion is no longer used and the computation algorithm will only
   consider the information stored in the TE database for its path
   computation.  In the meantime, the IGP will generally, though not
   necessarily, have updated the node about the resource failure.  The
   objective of this optional strategy is to increase the path setup
   success rate.
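
   The optional exclusion policy can be sketched in Python as shown
   below.  The class FaultExclusionList, the usable_links helper, and
   the injectable clock are hypothetical names introduced for this
   example.  The sketch keeps the exclusions in a structure separate
   from the TE database, which is never modified.

```python
import time


class FaultExclusionList:
    """Resources learned through FNP, excluded for a limited hold time."""

    def __init__(self, hold_seconds, clock=time.monotonic):
        self._hold = hold_seconds
        self._clock = clock        # injectable so the timer can be tested
        self._expiry = {}          # resource -> time at which exclusion ends

    def on_fnp_notification(self, resource):
        """Start (or restart) the exclusion timer for a failed resource."""
        self._expiry[resource] = self._clock() + self._hold

    def excluded(self):
        """Return resources whose timer has not yet expired."""
        now = self._clock()
        self._expiry = {r: t for r, t in self._expiry.items() if t > now}
        return set(self._expiry)


def usable_links(te_database_links, exclusions):
    """Links offered to path computation; the TE database is left as-is."""
    blocked = exclusions.excluded()
    return [link for link in te_database_links if link not in blocked]
```

   Once the timer for a resource expires, usable_links again returns
   exactly what the TE database contains, so path computation relies
   solely on the information provided by the IGP.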


9. Security Considerations

   This draft proposes a scheme for rapid fault notification in GMPLS
   networks, and does not have any known security issues. Detailed
   analysis of any security impact is TBD.


10. Conclusion

   This draft presented the Fault Notification Protocol for
   IP-controlled optical networks that implement shared recovery.  It
   described the steps required in the notification process and how they
   lead to the recovery of service within specific time bounds.  A
   "per-failure" approach (as opposed to the "per-LSP" approach) to
   fault notification was proposed in order to improve scalability and
   recovery time guarantees.


11. Acknowledgments

   The authors would like to thank Jonathan Lang, Adrian Farrel, Neil
   Harisson, Jonathan Sadler, Fabio Ricciato, Zafar Ali and Roberto
   Albanese for feedback and helpful comments on the fault notification
   protocol, and Takafumi Chujo, Peter Czezowski, and Akira Chugo for
   valuable inputs to this draft.  We would also like to acknowledge the
   feedback of George Newsome, Deborah Brungard and John Drake.


12. Intellectual Property Considerations

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


   References

   [1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP
       9, RFC 2026, October 1996.

   [2] Mannie, E., ed., et al, "Recovery (Protection and Restoration)
       Terminology for Generalized Multi-Protocol Label Switching
       (GMPLS)", Internet Draft, work in progress, draft-ietf-ccamp-
       gmpls-recovery-terminology-04.txt, April 2004.

   [3] Lai, W.S., and D. McDysan (Eds.), "Network Hierarchy and
       Multilayer Survivability", RFC 3386, November 2002.

   [4] Papadimitriou, D., et al, "Analysis of Generalized MPLS-based
       Recovery Mechanisms (including Protection and Restoration)",
       Internet draft, work in progress, draft-ietf-ccamp-gmpls-
       recovery-analysis-03.txt, April 2004.

   [5] Lang, J., and Rajagopalan, B. (Eds.) "Generalized MPLS recovery
       functional specification," Internet Draft, work in progress,
       draft-ietf-ccamp-gmpls-recovery-functional-02.txt, April 2004.

   [6] Lang, J. (ed) et al, "RSVP-TE Extensions in support of End-to-
       End GMPLS-based Recovery," Internet Draft, work in progress,
       draft-ietf-ccamp-gmpls-recovery-e2e-signaling-01.txt, May 2004.

   [7] Berger, L. et al, "GMPLS Based Segment Recovery," Internet
       Draft, work in progress, draft-ietf-ccamp-gmpls-segment-
       recovery-00.txt, March 2004.

   [8] Rabbat, R. and Soumiya, T., (Eds.), "Optical transport network
       failure recovery requirements", Internet Draft, work in
       progress, draft-rabbat-optical-recovery-reqs-01.txt, January
       2004.

   [9] Rabbat, R., Su, C.-F., Sharma, V., "Observations on the
       Applicability of the Fault Notification Protocol", Internet
       Draft, work in progress, draft-rabbat-fnp-applicability-01.txt,
       June 2004.

   [10] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.

   [11] Li, G., J. Yates, et al, "Experiments in Fast Restoration using
       GMPLS in Optical/Electronic Mesh Networks", Post-deadline Papers
       Digest, OFC 2001, Anaheim, CA, March 2001.

   [12] Rabbat, R. et al, "Fault Notification and Service Recovery in
       WDM Networks", white paper available at:
       http://perth.mit.edu/~richard/wp-ietf-fault-notification.pdf.

   [13] Rabbat, R., Sharma, V. and Ali, Z., "Expedited Flooding for
       Restoration in Shared-Mesh Transport Networks", Internet draft,
       work in progress, draft-rabbat-expedited-flooding-01.txt,
       February 2004.

   [14] ITU-T, G.808.1, "Generic Protection Switching - Linear trail and
       subnetwork protection", December 2003.


13. Authors' Addresses

   Richard Rabbat                      Vishal Sharma
   Fujitsu Labs of America, Inc.       Metanoia, Inc.
   1240 E. Arques Ave, MS 345          888 Villa Street, Suite 200B
   Sunnyvale, CA 94085                 Mtn. View, CA 94041
   United States of America            United States of America
   Phone: +1-408-530-4537              Phone: +1-408-530-8313
   Email: rabbat@alum.mit.edu          Email: v.sharma@ieee.org

   Norihiko Shinomiya                  Ching-Fong Su
   Fujitsu Laboratories Ltd.           Fujitsu Labs of America, Inc.
   1-1, Kamikodanaka 4-Chome           1240 E. Arques Ave
   Nakahara-ku, Kawasaki               Sunnyvale, CA 94085
   211-8588, Japan                     United States of America
   Phone: +81-44-754-2635              Phone: +1-408-530-4572
   Email: shinomi@jp.fujitsu.com       Email: csu@fla.fujitsu.com


Appendix A. Fault Notification Message Delays on a Path

   This appendix describes the delays incurred on a path.  Two types of
   delays occur on the path between any two nodes.  They are delays
   incurred during traversal of the links on that path, and delays that
   occur at the nodes along the path.  The following presents the
   computations and expected values for the different delays.

A.1 Delays Associated with Link Traversal

   The time needed to traverse each link is the sum of the transmission
   time and the link propagation delay:

   1. The transmission time is a value based on link capacity.  The
      calculation is as follows:
      D trans = (packet size) / (link speed).
   2. The link propagation delay is due to the physical length of the
      link:
      D prop = length / (light propagation speed on fiber).

   The length of a notification packet is expected to be of the order of
   a hundred bytes (about 10^3 bits).  As an example, for a link speed
   of 1 Gbps,

   D trans ~= 10^3 / 10^9 = 10^-6 s = 1 microsecond.

   This value can therefore safely be ignored in calculating delays.  On
   the other hand, the link propagation delay in metropolitan area and
   long-haul networks affects total delay.  For a distance of 100 km,
   with the speed of light in fiber at about 2/3 of its speed in free
   space (about 200,000 km/s),

   D prop ~= 10^2 / (2 * 10^5) = 0.5 * 10^-3 s = 500 microseconds.
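
   The two link-traversal computations above can be reproduced with a
   short Python sketch.  It is illustrative only: the function and
   constant names are assumptions made for this example and are not
   defined by the protocol; the numeric inputs are the ones used in the
   text (a 10^3-bit packet, a 1 Gbps link, 100 km of fiber).

```python
# Illustrative sketch only; names below are hypothetical, not part of
# the protocol.  Values follow the worked example in the text.

# Speed of light in fiber, about 2/3 of its free-space speed (km/s).
FIBER_LIGHT_SPEED_KM_PER_S = 2.0e5


def transmission_delay_s(packet_size_bits, link_speed_bps):
    """D trans = (packet size) / (link speed), in seconds."""
    return packet_size_bits / link_speed_bps


def propagation_delay_s(length_km,
                        speed_km_per_s=FIBER_LIGHT_SPEED_KM_PER_S):
    """D prop = length / (light propagation speed on fiber), in seconds."""
    return length_km / speed_km_per_s


def link_delay_s(packet_size_bits, link_speed_bps, length_km):
    """Time to traverse one link: transmission time plus propagation delay."""
    return (transmission_delay_s(packet_size_bits, link_speed_bps)
            + propagation_delay_s(length_km))


if __name__ == "__main__":
    d_trans = transmission_delay_s(1e3, 1e9)   # 10^3 bits at 1 Gbps
    d_prop = propagation_delay_s(100)          # 100 km of fiber
    print("D trans = %.1f microseconds" % (d_trans * 1e6))
    print("D prop  = %.1f microseconds" % (d_prop * 1e6))
```

   With these inputs the propagation delay is several hundred times
   larger than the transmission time, which is why the transmission
   time is neglected in the delay calculations.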

A.2 Delays Incurred at the Nodes

   At each node, two delays are important: queuing delay and processing
   time.  The processing time D proc has been identified in the
   literature [11] as a few tenths of a millisecond in the case of an
   RSVP object.  This value is smaller in the case of a simpler LMP
   message requesting the activation of an LSP.

   The issue of queuing delay is important at all intermediate nodes.
   To avoid queuing delays, fault notification messages should be
   queued at the front of the buffer that holds other control packets.
   (Those messages do not have to contend with data packets, since no
   data are sent over the control channel.)  A queuing process such as
   priority queuing would allow those packets to be admitted at the head
   of the queue, based on the priority set in the packet.  A simple
   mechanism such as setting the priority bits in the IP header, for
   example the IP precedence bits or DSCP code points of the TOS (Type
   Of Service) byte, would be appropriate.  Using priority queuing for
   fault notification messages will ensure that their queuing delay
   is bounded.  In the case of flooding for fault notification,
   D queue(A) = 0 sec.  If other fault notification messages are in the
   queue as well, this implies multiple failures, where the recovery
   time guarantee does not apply.  Alternatively, it may indicate that
   multiple messages are traveling on different recovery LSPs to notify
   the same link failure, as is the case when a signaling protocol is
   used for fault notification.  In the case of per-LSP fault
   notification, just as in the case of using a signaling protocol, the
   maximum queuing delay at node A is:

   D queue max(A)= (number of recovery LSPs) * (packet size) /
                   (link bandwidth).

   This provides the mathematical basis for using flooding for fault
   notification; flooding allows this value to be 0 seconds.  In the
   absence of priority queuing, the maximum queue delay can be
   calculated as follows at node A, assuming fair queuing at the FIFO
   buffers of all control channels and assuming input buffers only:

   D queue max(A)= (number of queues) * (queue size) /
                   (link bandwidth).

   This value is an upper bound, and is dependent on hardware buffer
   implementations.
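
   The two queuing bounds in this section can be compared numerically
   with the Python sketch below.  It is a minimal illustration under
   stated assumptions: the function names are hypothetical, and the
   input values (1000 recovery LSPs, 4 queues of 10^6 bits each, a
   1 Gbps control link) are invented for the example and do not come
   from measurements or from this draft.

```python
# Illustrative sketch only; function names and the example inputs in
# the __main__ block are assumptions, not values from the draft.


def per_lsp_queue_delay_max_s(num_recovery_lsps, packet_size_bits,
                              link_bandwidth_bps):
    """Per-LSP notification bound:
    D queue max(A) = (number of recovery LSPs) * (packet size)
                     / (link bandwidth)."""
    return num_recovery_lsps * packet_size_bits / link_bandwidth_bps


def fifo_queue_delay_max_s(num_queues, queue_size_bits,
                           link_bandwidth_bps):
    """Bound without priority queuing:
    D queue max(A) = (number of queues) * (queue size)
                     / (link bandwidth)."""
    return num_queues * queue_size_bits / link_bandwidth_bps


if __name__ == "__main__":
    # Hypothetical inputs: 1000 recovery LSPs, 10^3-bit notification
    # packets, a 1 Gbps control link, 4 input queues of 10^6 bits each.
    per_lsp = per_lsp_queue_delay_max_s(1000, 1e3, 1e9)
    fifo = fifo_queue_delay_max_s(4, 1e6, 1e9)
    print("per-LSP bound: %.3f ms" % (per_lsp * 1e3))
    print("FIFO bound:    %.3f ms" % (fifo * 1e3))
    # With flooding and priority queuing, the text takes
    # D queue(A) = 0 s for a single failure.
```

   Under these assumed inputs both bounds are of the order of
   milliseconds, that is, comparable to or larger than the propagation
   delay of a 100 km link, whereas flooding with priority queuing
   removes the queuing term for a single failure.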




Full Copyright Statement

   "Copyright (C) The Internet Society (2003). All Rights Reserved.
   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."