Network Working Group                         Thomas D. Nadeau
Internet Draft                                Monique Morrow
Expires: August 2003                          George Swallow
                                               Cisco Systems, Inc.

                                               David Allan
                                               Nortel Networks

                                               February 2003


            OAM Requirements for MPLS Networks
          draft-ietf-mpls-oam-requirements-00.txt


Status of this Memo

    This document is an Internet-Draft and is in full
    conformance with all provisions of Section 10 of RFC 2026
    [RFC2026].

    Internet-Drafts are working documents of the Internet
    Engineering Task Force (IETF), its areas, and its working
    groups.  Note that other groups may also distribute working
    documents as Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of
    six months and may be updated, replaced, or obsoleted by
    other documents at any time.  It is inappropriate to use
    Internet-Drafts as reference material or to cite them other
    than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt.

    The list of Internet-Draft Shadow Directories can be
    accessed at http://www.ietf.org/shadow.html.

Abstract

    As transport of diverse traffic types such as voice, frame
    relay, and ATM over MPLS becomes more common, the ability to detect,
    handle and diagnose control and data plane defects becomes critical.
    Detecting such defects and specifying how to handle them is
    important not only because they may affect the fundamental
    operation of an MPLS network, but also because they may impact
    SLA commitments for customers of that network.

    This Internet draft describes requirements for user and data
    plane operations and management (OAM) for Multi-Protocol
    Label Switching (MPLS). These requirements have been gathered
    from network operators who have extensive experience deploying



Nadeau et al.              Expires August 2003               [Page 1]


Internet Draft         MPLS OAM Requirements        February 26, 2003



    MPLS networks. Some of these requirements have also
    appeared in other documents [Y1710]. This draft specifies OAM
    requirements for MPLS, as well as for applications of MPLS such
    as pseudowire, voice, and VPN services. Those interested in specific
    issues relating to instrumenting MPLS for OAM purposes are directed
    to [FRAMEWORK].


    Table of Contents

    Introduction
    Terminology
    Motivations
    Requirements
    Security Considerations
    Acknowledgments
    References
    Authors' Addresses
    Full Copyright Statement
    Intellectual Property Rights Notices

1. Introduction

    This Internet draft describes requirements for user and data
    plane operations and management (OAM) for Multi-Protocol
    Label Switching (MPLS). These requirements have been gathered
    from network operators who have extensive experience deploying
    MPLS networks. This draft specifies OAM requirements
    for MPLS, as well as for applications of MPLS such as
    pseudowire [PWE3FRAME], voice, and VPN services.

    No specific mechanisms are proposed to address these
    requirements at this time.  The goal of this draft is to
    identify a commonly applicable set of requirements for MPLS
    OAM; specifically, a set of requirements that applies to
    the most common MPLS networks deployed by service
    provider organizations today. These requirements can then be used
    as a base for network management tool development and to guide
    the evolution of currently specified tools, as well as the
    specification of OAM functions that are intrinsic to protocols
    used in MPLS networks.

    Comments should be made directly to the MPLS mailing list
    at mpls@uu.net.

    This memo does not, in its draft form, specify a standard
    for the Internet community.


2. Terminology



    CE:     Customer Edge

    Defect:   Any error condition that prevents an LSP from
              functioning correctly. For example, loss of an
              IGP path will most likely also result in an LSP
              not being able to deliver traffic to its
              destination. Another example is the breakage of
              a TE tunnel.  These may be due to physical
              circuit failures or failure of switching nodes
              to operate as expected.

              Multi-vendor/multi-provider network operation typically
              requires agreed upon definitions of defects (when it is
              broken and when it is not) such that both recovery
              procedures and SLA impacts can be specified.

    ECMP:  Equal Cost Multipath

    LSP:   Label Switch Path

    LSR:   Label Switch Router

    OAM:   Operations and Management

    PE:    Provider Edge

    PW:    Pseudowire

    SLA:   Service Level Agreement

    VCC:   Virtual Channel Connection

    VPC:   Virtual Path Connection


3. Motivations

    MPLS OAM has been tackled in numerous Internet drafts.
    However, all existing drafts focus either on single-provider
    solutions or on a single aspect of the MPLS architecture
    or application of MPLS. For example, the use of RSVP or LDP
    signaling and defects may be covered in some deployments,
    and a corresponding SNMP MIB module exists to manage this
    application; however, the handling of defects and specification
    of which types of defects are interesting to operational
    networks may not have been created in concert with those for
    other applications of MPLS such as L3 VPN.  This leads to
    inconsistent and inefficient applicability across the MPLS
    architecture, and/or requires significant modifications to
    operational procedures and systems in order to provide consistent
    and useful OAM functionality. As MPLS matures, relationships
    between providers have become more complex. Furthermore, the
    deployment of multiple concurrent applications of MPLS is
    commonplace. This has led to a need to consider deployments
    that span arbitrary networking arrangements and boundaries, so
    that OAM can be applied more broadly and uniformly across the
    MPLS architecture.


3. Requirements

    The following sections enumerate the OAM requirements
    gathered from service providers. Each requirement is
    then specified in detail to clarify its applicability.

3.1 Detection of Broken Label Switch Paths

    The ability to detect a broken Label Switch Path (LSP)
    should not require manual hop-by-hop troubleshooting of
    each LSR used to switch traffic for that LSP. For example,
    it is not desirable to manually visit each LSR
    along the data plane path used to transport an LSP; instead,
    this function should be automated and performed from the
    origination of that LSP.  Furthermore, the automation of
    path liveness testing is desired in cases where large numbers of
    LSPs might be tested. For example, automated PE-to-PE
    LSP testing functionality is desired. The goal is to detect LSP
    problems before customers do, and this requires detection of
    problems in a "reasonable" amount of time. One useful definition
    of reasonable is both predictable and consistent. If the time to
    detect defects is specified and tools are designed accordingly, then
    a harmonized operational framework can be established both
    within MPLS levels and with MPLS applications. If the time to
    detect is known, then automated responses can be
    specified with regard to both resiliency and SLA
    reporting. One consequence is that ambiguity in maintenance
    procedures MUST be minimized, as ambiguity in test results impacts
    detection time.
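
    As a non-normative illustration of what a "predictable" detection
    time could mean, the following Python sketch bounds the time from
    failure to detection for a simple periodic probe. The probing model
    and all names (detection_time_bounds, probe_interval, reply_timeout,
    misses_to_declare) are assumptions made for this example only; this
    draft does not specify any mechanism, and transit delay is ignored.

```python
def detection_time_bounds(probe_interval, reply_timeout, misses_to_declare):
    """Return (best_case, worst_case) seconds from failure to detection.

    Assumed model: probes are sent every probe_interval seconds, a probe
    counts as lost if no reply arrives within reply_timeout seconds, and a
    defect is declared after misses_to_declare consecutive lost probes.
    """
    # Best case: the failure occurs just before the next probe is sent.
    best = (misses_to_declare - 1) * probe_interval + reply_timeout
    # Worst case: the failure occurs just after a probe got through.
    worst = misses_to_declare * probe_interval + reply_timeout
    return best, worst

best, worst = detection_time_bounds(probe_interval=1.0, reply_timeout=0.5,
                                    misses_to_declare=3)
print(best, worst)  # 2.5 3.5
```

    Under these assumptions, the gap between the two bounds is exactly
    one probe interval, so any automated response keyed to detection
    time would need to tolerate that much variation.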

    Although ICMP-based ping can be sent through an LSP, the use of
    this tool to verify LSP path liveness has the potential
    for returning erroneous results (both positive and negative)
    given the nature of MPLS LSPs. For example, failures
    may occur where inconsistencies exist between the IP and MPLS
    forwarding tables or between the MPLS control and data
    planes, or where there are problems with the reply path (i.e.,
    a reverse MPLS path does not exist). Detection tools should have
    minimal dependencies on network components that do not implement
    the LSP.



    Furthermore, the path liveness function
    MUST have the ability to support equal cost multipath
    (ECMP) scenarios within the operator's network. Specifically,
    it MUST be able to detect failures on any of the parallel (i.e.,
    equal IGP cost) paths used to load share traffic in order to more
    efficiently use the network. Load sharing algorithms commonly
    select a path by examining certain fields within
    the packet header. Unfortunately, there is no standard for this
    algorithm, but it is important that any function be capable
    of detecting failures on all operational paths, as failure of
    any branch may lead to loss of traffic, regardless of load sharing
    algorithm. This introduces complexity into ensuring that ECMP
    connectivity permutations are exercised, and that defect
    detection occurs in a reasonable amount of time. [GUIDELINES]
    discusses some of the issues and offers suggestions for ensuring
    mutual compatibility of ECMP and maintenance functions (both
    detection and diagnostic).
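
    As a non-normative illustration of the ECMP concern above, the
    following Python sketch searches for probe header values that
    together exercise every parallel path. The hash function, the
    header fields it uses, the addresses and port numbers, and all
    names (ecmp_next_hop, probes_covering_all_paths) are hypothetical;
    as noted above, no standard load sharing algorithm exists, and a
    real LSR may hash on entirely different fields.

```python
import hashlib

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, next_hops):
    """Hypothetical load sharing function: hash header fields to pick a path."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

def probes_covering_all_paths(src_ip, dst_ip, dst_port, next_hops,
                              port_range=range(49152, 65536)):
    """Return {next_hop: src_port} so that one probe traverses each path.

    Only the source port is varied; raises if some path is never selected.
    """
    chosen = {}
    for src_port in port_range:
        hop = ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, next_hops)
        chosen.setdefault(hop, src_port)  # keep the first port found per path
        if len(chosen) == len(next_hops):
            return chosen
    missing = set(next_hops) - set(chosen)
    raise RuntimeError(f"no probe found for paths: {sorted(missing)}")

# Illustrative values only (documentation addresses, arbitrary port).
paths = ["lsr-a", "lsr-b", "lsr-c"]
plan = probes_covering_all_paths("192.0.2.1", "198.51.100.7", 5000, paths)
for hop, port in sorted(plan.items()):
    print(hop, "is exercised by a probe with source port", port)
```

    The sketch only works because the prober knows the hash function.
    In a real network that knowledge is usually unavailable, which is
    one reason the requirement above is difficult to satisfy.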

3.2 Diagnosis of a Broken Label Switch Path

    The ability to diagnose a broken LSP and to isolate the failed
    resource in the path is required. This is particularly true for
    misbranching defects, for which recovery actions are particularly
    difficult to specify in an LDP network.
    Experience suggests that this is best accomplished via a path
    trace function that can return the entire list of LSRs and links
    used by a certain LSP (or at least the set of LSRs/links up to the
    location of the defect). The tracing capability should
    include the ability to trace recursive paths, such as when nested
    LSPs are used, or when LSPs enter and exit traffic-engineered
    tunnels [TUNTRACE]. This path trace function must also be
    capable of diagnosing LSP mis-merging by permitting comparison
    of expected vs. actual forwarding behavior at any LSR in the path.
    The path trace capability should be executable
    from both the head end Label Switch Router (LSR) and any
    mid-point LSR. Additionally, the path trace function MUST have
    the ability to support equal cost multipath scenarios as described
    above in section 3.1.
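
    The comparison of expected vs. actual forwarding behavior
    described above can be sketched, non-normatively, as locating the
    first hop at which a traced path departs from the expected one.
    The function name first_divergence and the representation of a
    path as a list of LSR names are assumptions made for this example;
    no trace mechanism or output format is specified by this draft.

```python
def first_divergence(expected_path, traced_path):
    """Return (hop_index, expected_lsr, actual_lsr) at the first mismatch.

    actual_lsr is None when the trace ended early (no reply from that hop
    onward); expected_lsr is None when the trace continued past the
    expected egress. Returns None when the two paths are identical.
    """
    for index in range(max(len(expected_path), len(traced_path))):
        expected = expected_path[index] if index < len(expected_path) else None
        actual = traced_path[index] if index < len(traced_path) else None
        if expected != actual:
            return index, expected, actual
    return None

expected = ["PE1", "P1", "P2", "PE2"]
print(first_divergence(expected, ["PE1", "P1", "P2", "PE2"]))  # None
print(first_divergence(expected, ["PE1", "P1", "P3", "PE4"]))  # (2, 'P2', 'P3')
print(first_divergence(expected, ["PE1", "P1"]))               # (2, 'P2', None)
```

    In this sketch the second call resembles a misbranching or
    mis-merging defect, and the third a break in the path after P1.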

3.3 Path Characterization

    The ability of a path trace function to reveal details of LSR
    forwarding operations relevant to OAM functionality is required.
    This would include but not be limited to:
      - use of pipe or uniform TTL models by an LSR
      - externally visible aspects of load spreading (such as
        ECMP), including:
          - type of algorithm used
          - examples of how the algorithm will spread traffic
      - data/control plane OAM capabilities of the LSR
      - stack operations performed by the LSR (pushes and pops)


3.4 Service Level Agreement Measurement

    Mechanisms are required to measure diverse aspects of Service
    Level Agreements:
      - availability - whether the service is considered available,
        in which case the other aspects of performance measurement
        listed below have meaning, or unavailable, in which case
        they do not
      - latency - amount of time required for traffic to transit
        the network
      - packet loss
      - jitter - measurement of latency variation

    Such measurements can be made independently of the user traffic
    or via a hybrid of user traffic measurement and OAM probing.

    At least one mechanism is required to measure the quantity
    (i.e., number of packets) of OAM packets. In addition, the
    ability to measure the qualitative aspects of OAM probing must
    be available, specifically to compute the latency of OAM packets
    generated and received at each end of a tested LSP. Latency is
    considered in this context as a measurable parameter for SLA reporting.
    There is no assumption that bursts of OAM packets are required to
    characterize the performance of an LSP, but it is suggested that any
    method considered be capable of measuring the latency of an LSP with
    minimal impact on network resources.
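
    As a non-normative illustration, the following Python sketch
    derives packet loss, latency and jitter from a series of OAM
    probes. The function name sla_summary, the input format, and the
    jitter definition used (mean absolute difference between the
    latencies of consecutive received probes) are assumptions made for
    this example; this draft says only that jitter is a measurement of
    latency variation. The sketch also assumes that the sending and
    receiving clocks are synchronized.

```python
from statistics import mean

def sla_summary(send_times, recv_times):
    """Summarize OAM probe results for one LSP.

    send_times: list of transmit timestamps in seconds.
    recv_times: list of receive timestamps, with None for a lost probe.
    """
    latencies = [rx - tx for tx, rx in zip(send_times, recv_times)
                 if rx is not None]
    sent = len(send_times)
    lost = sent - len(latencies)
    # Assumed jitter definition: mean absolute difference between the
    # latencies of consecutive received probes.
    deltas = [abs(b - a) for a, b in zip(latencies, latencies[1:])]
    return {
        "sent": sent,
        "lost": lost,
        "loss_ratio": lost / sent if sent else 0.0,
        "mean_latency": mean(latencies) if latencies else None,
        "jitter": mean(deltas) if deltas else None,
    }

# Four probes sent one second apart; the third is lost.
summary = sla_summary([0.0, 1.0, 2.0, 3.0], [0.010, 1.014, None, 3.012])
print(summary)
```

    Availability is deliberately left out of the sketch, since the
    criteria for declaring a service unavailable are not defined here.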

3.5 Frequency of OAM Execution

    The operator MUST have the flexibility to configure OAM
    parameters and the frequency of the execution of any OAM
    functions, provided that some synchronization of tool usage
    is possible for availability metrics. The motivation for this
    is to permit the network to function as a system of harmonious
    OAM functions consistent across the entire network.

    To elaborate, there are defect conditions (specifically
    misbranching or misdirection of traffic) for which probe-based
    detection mechanisms combined with automated network response
    require harmonization of probe insertion rates and probe handling
    across the network in order to avoid flapping.

    One observation would be that commoditization of MPLS, common
    optimized implementation of monitoring tools and the need for
    inter-carrier harmonization of defect and SLA handling will drive
    specification of OAM parameters to commonly agreed on values, and
    such values will have to be harmonized with the surrounding
    technologies (e.g., SONET/SDH, ATM, etc.) in order to be useful.
    This will become particularly important as networks scale
    and misconfiguration can result in churn, alarm flapping, etc.


3.5 Alarm Suppression and Layer Coordination

    Devices must provide alarm suppression functionality that
    prevents the generation of superfluous alarms.
    When viewed in conjunction with requirement 3.6 below, this
    typically requires fault notification to the LSP egress, which
    may have specific time constraints if the client PW independently
    implements path continuity testing (for example, ATM I.610
    Continuity Check (CC) [I610]).

    This would also be true for LSPs that have client LSPs that are
    monitored. The arbitrary hierarchy of MPLS introduces the
    opportunity for multiple MPLS levels to attempt to respond to
    defects simultaneously. Mechanisms are required to coordinate
    network response to defects.
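
    One possible reading of the coordination requirement above, given
    as a non-normative Python sketch, is to report only the lowest
    defective level in the hierarchy and to suppress alarms from the
    client levels it carries. The function name root_cause_alarms, the
    server_of mapping, and the LSP and PW names are hypothetical.

```python
def root_cause_alarms(defective, server_of):
    """Return the defective LSPs whose alarms should NOT be suppressed.

    defective: set of LSP/PW names currently reporting a defect.
    server_of: dict mapping a client LSP/PW to the server LSP carrying it.
    An alarm is suppressed when any LSP beneath it in the hierarchy is
    also defective, since that lower-level defect is the likely cause.
    """
    def has_defective_server(name):
        seen = {name}  # guards against a misconfigured, cyclic mapping
        server = server_of.get(name)
        while server is not None and server not in seen:
            if server in defective:
                return True
            seen.add(server)
            server = server_of.get(server)
        return False

    return {name for name in defective if not has_defective_server(name)}

hierarchy = {"pw-atm-1": "lsp-pe1-pe2", "lsp-pe1-pe2": "te-tunnel-7"}
print(root_cause_alarms({"pw-atm-1", "lsp-pe1-pe2", "te-tunnel-7"}, hierarchy))
# {'te-tunnel-7'}
print(root_cause_alarms({"pw-atm-1"}, hierarchy))
# {'pw-atm-1'}
```

    The sketch presumes a single point with a view of every level; in
    a deployed network that view would have to be assembled from fault
    notifications, which is where the time constraints noted above
    apply.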


3.6 Support for OAM Interworking for Fault Notification

    An LSR supporting OAM functions for pseudowire functions that
    join one or more networking technologies over MPLS must be
    able to translate an MPLS defect into the native technology's
    error condition. For example, errors occurring over the MPLS
    transport LSP that supports an emulated ATM VC must be translated
    into native ATM OAM AIS cells at the edges of the pseudowire.
    The mechanism SHOULD consider possible bounded detection
    time parameters, e.g., a "hold off" function before reacting, so
    as to harmonize with the client OAM. One goal would be alarm
    suppression in the pseudowire's client layer. As observed in 3.5,
    this requires that the MPLS layer perform detection in a bounded
    timeframe in order to initiate alarm suppression prior to the
    pseudowire client layer independently detecting the defect.
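
    The "hold off" behavior mentioned above can be sketched,
    non-normatively, as a decision about whether and when an MPLS
    defect is propagated to the client layer. The function name
    client_notification, its parameters, and the timer values are
    assumptions made for this example; this draft specifies neither a
    mechanism nor any timer value.

```python
def client_notification(defect_start, defect_clear, hold_off):
    """Decide whether an MPLS defect is propagated to the PW client layer.

    Times are in seconds. defect_clear is None while the defect persists.
    Returns the time at which the native indication (for an emulated ATM
    VC, AIS cells) would begin, or None if the defect cleared within the
    hold-off interval and nothing is sent.
    """
    notify_at = defect_start + hold_off
    if defect_clear is not None and defect_clear <= notify_at:
        return None  # transient defect: absorbed by the hold-off
    return notify_at

print(client_notification(10.0, 10.2, hold_off=0.5))  # None
print(client_notification(10.0, None, hold_off=0.5))  # 10.5
print(client_notification(10.0, 12.0, hold_off=0.5))  # 10.5
```

    For the suppression goal above to be met, the MPLS detection time
    plus the hold-off would have to be shorter than the time the
    client layer needs to detect the defect on its own.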

3.7 Error Detection and Recovery

     Mechanisms are needed to detect an error, react to it (ideally
     in some form of automated response by the network), recover from
     it and alert the network operator prior to the customer informing
     the network operator of the error condition. The ideal situation
     would be where the network is resilient and can restore service
     prior to any significant impact on the customer perception of the
     service. There are also defects that, by virtue of available
     network resources or topology, cannot be recovered automatically.



     It is, however, sometimes a requirement that the customer be
     notified of the defect condition at the same time that the network
     operator is made aware of the defect (as in the example of alarm
     suppression for PW clients discussed above). In these situations,
     the customer network may be capable of processing automated responses
     based on notification of a defect condition.  It is preferred
     that the format of these notifications be made consistent (i.e.,
     standardized) so as to increase the applicability of such messages.
     Depending on the device's capabilities, the device may be programmed
     to take automatic corrective actions as a result of detection of
     defect conditions. These actions may be user or operator-specified,
     or may simply be inherent to the underlying transport technology
     (e.g., MPLS Fast-Reroute, graceful restart or high-availability
     functionality).

3.8 The commoditization of MPLS will require common information
     modeling of management and control of OAM functionality. This
     will be reflected in the integration of standard MPLS-related
     MIBs (e.g., [LSRMIB][TEMIB][LBMIB][FTNMIB]) for fault, statistics
     and configuration management. These standard interfaces
     provide operators with common programmatic interface access to
     operations and management functions and their status.

3.9 Detection of Denial of Service attacks is required as part of
     security management.

4. Security Considerations


    LSP mis-merging has security implications beyond that of simply
    being a network defect. LSP mis-merging can happen due to a number
    of potential sources of failure, some of which (due to MPLS label
    stacking) are new to MPLS.

    The performance of diagnostic functions and path characterization
    involves extracting a significant amount of information about
    network construction which the network operator may consider private.
    Mechanisms are required to prevent unauthorized use of either those
    tools or protocol features.


5. Acknowledgments

    The authors wish to acknowledge and thank the following
    individuals for their valuable comments on this document:
    Adrian Smith, British Telecom; Chou Lan Pok, SBC; Mr.
    Ikejiri, NTT Communications; Mr. Kumaki, KDDI;
    Hari Rakotoranto, Cisco Systems; and Danny McPherson, TCB.




6. References

    [TUNTRACE]    Bonica, R., Kompella, K., Meyer, D.,
                  "Tracing Requirements for Generic Tunnels",
                  Internet Draft <draft-bonica-tunneltrace-
                  02.txt>, November 2001.

    [LSRMIB]      Srinivasan, C., Viswanathan, A. and T.
                  Nadeau, "MPLS Label Switch Router Management
                  Information Base Using SMIv2", Internet
                  Draft <draft-ietf-mpls-lsr-mib-07.txt>,
                  January 2001.

    [TEMIB]       Srinivasan, C., Viswanathan, A. and T.
                  Nadeau, "MPLS Traffic Engineering Management
                  Information Base Using SMIv2", Internet
                  Draft <draft-ietf-mpls-te-mib-07.txt>,
                  August 2001.

    [FTNMIB]      Nadeau, T., Srinivasan, C., and A.
                  Viswanathan, "Multiprotocol Label Switching
                  (MPLS) FEC-To-NHLFE (FTN) Management
                  Information Base", Internet Draft <draft-
                  ietf-mpls-ftn-mib-03.txt>, August 2001.

    [LBMIB]       Dubuc, M., Dharanikota, S., Nadeau, T., and J.
                  Lang, "Link Bundling Management Information
                  Base Using SMIv2", Internet Draft <draft-
                  ietf-mpls-bundle-mib-00.txt>, September
                  2001.

    [PWE3FRAME]   Pate, P., Xiao, X., White, C., Kompella,
                  K., Malis, A., Johnson, T., and T. Nadeau,
                  "Framework for Pseudo Wire Emulation Edge-to-
                  Edge (PWE3)", Internet Draft <draft-ietf-
                  pwe3-framework-00.txt>, September 2001.

    [RFC2026]     Bradner, S., "The Internet Standards Process
                  -- Revision 3", RFC 2026, October 1996.

    [Y1710]       ITU-T Recommendation Y.1710, "Requirements for
                  OAM Functionality In MPLS Networks".

    [GUIDELINES]  Allan, D., "Guidelines for MPLS load
                  balancing", Internet Draft
                  <draft-allan-mpls-loadbal-01.txt>, February
                  2003.




    [I610]        ITU-T Recommendation I.610, "B-ISDN operations and
                  maintenance principles and functions", February 1999.

    [FRAMEWORK]   Allan et al., "A Framework for MPLS OAM", Internet
                  Draft <draft-allan-mpls-oam-frmwk-04.txt>, February 2003.


7. Authors' Addresses

   Thomas D. Nadeau
   Cisco Systems, Inc.
   300 Apollo Drive
   Chelmsford, MA 01824
   Phone: 978-244-3051
   Email: tnadeau@cisco.com

   Monique Jeanne Morrow
   Cisco Systems, Inc.
   Glatt-Com, 2nd Floor
   CH-8301
   Switzerland
   Voice:  (0)1 878-9412
   EMail: mmorrow@cisco.com

   George Swallow
   Cisco Systems, Inc.
   250 Apollo Drive
   Chelmsford, MA 01824
   Voice:  978 244 8143
   Email: swallow@cisco.com

   David Allan
   Nortel Networks
   3500 Carling Ave.
   Ottawa, Ontario, CANADA
   Voice: 1-613-763-6362
   Email: dallan@nortelnetworks.com


8. Full Copyright Statement

    Copyright (C) The Internet Society (2001). All Rights
    Reserved.

    This document and translations of it may be copied and
    furnished to others, and derivative works that comment on
    or otherwise explain it or assist in its implementation may
    be prepared, copied, published and distributed, in whole or
    in part, without restriction of any kind, provided that the
    above copyright notice and this paragraph are included on
    all such copies and derivative works.  However, this
    document itself may not be modified in any way, such as by
    removing the copyright notice or references to the Internet
    Society or other Internet organizations, except as needed
    for the purpose of developing Internet standards in which
    case the procedures for copyrights defined in the Internet
    Standards process must be followed, or as required to
    translate it into languages other than English.

    The limited permissions granted above are perpetual and
    will not be revoked by the Internet Society or its
    successors or assigns. This document and the information
    contained herein is provided on an "AS IS" basis and THE
    INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE
    DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
    NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
    HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
    PURPOSE.

9. Intellectual Property Rights Notices

    The IETF takes no position regarding the validity or scope of any
    intellectual property or other rights that might be claimed to
    pertain to the implementation or use of the technology described in
    this document or the extent to which any license under such rights
    might or might not be available; neither does it represent that it
    has made any effort to identify any such rights.  Information on the
    IETF's procedures with respect to rights in standards-track and
    standards-related documentation can be found in BCP-11.  Copies of
    claims of rights made available for publication and any assurances of
    licenses to be made available, or the result of an attempt made to
    obtain a general license or permission for the use of such
    proprietary rights by implementers or users of this specification can
    be obtained from the IETF Secretariat.

    The IETF invites any interested party to bring to its attention any
    copyrights, patents or patent applications, or other proprietary
    rights which may cover technology that may be required to practice
    this standard.  Please address the information to the IETF Executive
    Director.
