Internet Draft                                      David Allan(editor)
 Document: draft-allan-mpls-oam-frmwk-04.txt             Nortel Networks
 Category: Informational                                   February 2003
 Expires: August 2003
                     A Framework for MPLS User Plane OAM
 Status of this Memo
    This document is an Internet-Draft and is in full conformance with
    all provisions of Section 10 of RFC2026.
    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups.  Note that
    other groups may also distribute working documents as Internet-
    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."
    The list of current Internet-Drafts can be accessed at
    The list of Internet-Draft Shadow Directories can be accessed at
 Copyright Notice
    Copyright(C) The Internet Society (2003). All Rights Reserved.
    This Internet draft discusses many of the issues associated with
    user plane OAM for MPLS. The goal being to provide tools to perform
    "in service" maintenance of LSPs. Included in this discussion is
    some of the implications of MPLS architecture on the ability to
    support fault, diagnostic and performance management OAM
    applications, potential solutions for distinguishing user plane OAM,
    and a summary of what the authors believe can be achieved in
    addressing all aspects of the MPLS architecture.
 Sub-IP ID Summary
    (This section to be removed before publication.)
    Fits in the MPLS box.
    Allan           Expires August 2003                   Page 1
                  A Framework for MPLS User Plane OAM    February 2003
    MPLS WG is currently looking at MPLS-specific error detection and
    recovery mechanisms.  The mechanisms and design aspects discussed in
    this document are for packet-based MPLS LSPs independent of payload.
    The WG should consider this document, as it discusses the design
    aspects of error detection and measurement for packet based MPLS
 Table of Contents
 1.  Conventions used in this document...............................3
 2.  Changes since the last version..................................3
 3.  Contributors....................................................3
 4.  Requirements....................................................4
 5.  Domain Concepts.................................................4
 6.  Different deployment scenarios..................................5
 7.  MPLS architecture implications for OAM..........................6
    7.1   Topology variations within an MPLS level...................6
        7.1.1 Implications for fault management......................8
        7.1.2 Implications for performance management................8
    7.2   LSP Creation Method.......................................10
    7.3   Lack of Fixed Hierarchy...................................11
    7.4   Use of time to live (TTL).................................12
    7.5   State Association.........................................12
    7.6   Other design issues.......................................13
 8.  OAM Applications...............................................13
 9.  Ease of Implementation.........................................15
 10.  OAM Messaging.................................................15
 11.  Distinguishing OAM user plane flows...........................16
    11.1  Adding an LSP level with arbitrary label..................16
    11.2  Adding an LSP level with a reserved label value...........16
    11.3  Header modification.......................................17
 12.  The OAM return path...........................................17
 13.  Use of Hierarchy to Simplify OAM..............................19
 14.  Security Considerations.......................................20
 15.  A summary of what can be achieved.............................20
 16.  References....................................................21
 17.  Editor's Address..............................................22
    Allan et. al.           Expires April 2003               Page 2
                  A Framework for MPLS User Plane OAM    February 2003
 1. Conventions used in this document
    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    this document are to be interpreted as described in RFC-2119 [1].
    The term MPLS "level" nominally refers to the MPLS stack level
    inclusive of reserved labels. In this document the term "level" is
    used exclusive of reserved labels, therefore the term "level" is
    more precisely analogous to a specific MPLS subnetwork layer
 2. Changes since the last version
    Motivations section removed to align with [MPLSREQS]. References
 3. Contributors
    Mina Azad
    Azad-Mohtaj Consulting       Phone: 1-613-722-0878
    Ottawa, Ontario, CANADA      Email:
    Enrique G. Cuevas
    Room D3-2B25                 Phone: +1 732 420 3252
    200 S. Laurel Avenue         E-mail:
    Middletown, NJ 07748 USA
    Neil Harrison
    British Telecom              Phone: 44-1604-845933
    Heath Bank                   Email:
    Iugby Road, Harleston
    South Hampton, UK
    Sanford Goldfless
    192 Fuller St.
    Brookline, MA 02446 USA      Email:
    Eric Davalo
    Maple Optical Systems
    3200 North First Street      Phone:  408 545 3110
    San Jose CA 95134            Email:
    Arun Punj
    Marconi Communications
    1000 Marconi Drive,
    Warrandale - PA - 15086      Email:
    Marcus Brunner
    Network Laboratories - NEC Europe Ltd.
    Adenauerplatz 6              Phone: +49 (0)6221/ 9051129
    Allan et. al.           Expires April 2003               Page 3
                  A Framework for MPLS User Plane OAM    February 2003
    D-69115 Heidelberg, Germany  Email:
    Chou Lan Pok
    SBC Technology Resources, Inc.
    4698 Willow Road,            Phone: 925-598-1229
    Pleasanton, CA 94583         Email:
    Wesam Alanqar
    9300, Metcalf Ave,           Phone: +1-913-534-5623
    Overland Park, KS 66212
 4. Requirements
    MPLS user-plane OAM specific requirements and a summary of
    requirements that have appeared in numerous PPVPN, PWE3, and MPLS
    documents appear in [HARRISON-REQ] [Y1710] and [MPLSREQS]. This
    Internet draft discusses the implications of extending OAM across
    the MPLS architecture, and adds additional user-plane OAM
    requirements and capabilities for managing multi-provider networks.
    This document also broadens the scope of the requirements discussion
    in identifying where certain OAM applications simply cannot be
    implemented without modifications to current practice/architecture.
 5. Domain Concepts
    MPLS introduces a richness in layering which renders traditional
    definitions inadequate. In particular, it is noted that MPLS has no
    fixed layered hierarchy (this is a unique property that no other
    technology has offered before).
    A provider may have MPLS peer providers, use MPLS transit from
    serving providers (and require MPLS or non-MPLS client transport),
    and offer MPLS transit to MPLS or non-MPLS clients). Further, the
    same provider may use a hierarchy of LSPs within their own network.
    This Internet Draft defines the concept of an "Operations Domain"
    (to cover OAM capabilities operated by a single provider) that may
    only be a partition of the end-end LSP. Operations Domain functions
    are an interdependent mix of control-plane, user-plane (a.k.a. data-
    plane), and management-plane functions.
    An LSP "of level m" may span numerous Operational Domains
    (contiguous user plane) while the control and management planes may
    be disjoint. The goal is to provide OAM functionality for each LSP
    "of level m" regardless of "m", and independent of the LSP creation
    mechanism or payload.
    It is possible to have a hierarchy of operators (e.g. carriers of
    carriers), where overlay Operational Domains are opaque to the
    serving Domain. Therefore it is required that each LSP "of level m"
    implement its own OAM functionality, and the OAM applications are
    confined to the Operational Domains traversed at level "m".
    Allan et. al.           Expires April 2003               Page 4
                  A Framework for MPLS User Plane OAM    February 2003
    Note that this concept has subtle differences with concepts of
    horizontal and vertical hierarchy as defined in [HIERARCHY].
    Vertical hierarchy usually refers to networking layer boundaries
    distinguished by technology. An operational domain may refer to an
    operator specific hierarchical subset of the LSP levels within the
    MPLS network  and/or a horizontal partitioning within a specific
    LSP. Similarly there is a further way to consider the concept of
    operational domain and horizontal hierarchy. An operational domain
    may be hierarchically partitioned (e.g. OSPF "areas") but may be
    operationally integrated and contiguous.
 6. Different deployment scenarios
    At the present time there are a number of MPLS deployment scenarios
    each with a number of subtleties from a user plane OAM perspective.
    Each can be viewed as a characteristic of an operational domain:
    The sparse model: This can be in conjunction with control plane
    signalling (e.g. MPLS based traffic engineering applied to an IP
    network) or with simple provisioned LSPs (no control plane
    signalling). The key feature being that the MPLS operational domain
    will not have any-to-any connectivity at the MPLS layer due to the
    sparse use of LSPs to augment the served layer connectivity. This
    has operational and scalability implications as OAM connectivity
    must be explicitly added to the model, or the operator may be
    obliged to depend on "layer violations" embedded in OAM mechanisms
    which are strictly only relevant to a different higher layer network
    (e.g. [ICMP]) to generate a return path.
    The ubiquitous model: This model generally combines MPLS, integrated
    routing and control to produce ubiquitous any-to-any connectivity
    within an operational domain. This may be combined with a hierarchy
    of LSPs to modify the topology presented to the client layer. This
    offers providers the option of utilizing the resources inherent to
    all planes of the Operational Domain in designing OAM functionality.
    These two models of MPLS connectivity can be stacked or concatenated
    to support numerous configurations of peering and overlay networking
    arrangements between providers and users. A direct inference being
    that an operational domain will not necessarily have knowledge of
    the domains above and/or below it, and in the general case far less
    knowledge of (and certainly less control over) its peer domains. OAM
    applications for LSPs of a specific level are confined to an
    operational domain and its user plane peers.
    More recently there is a tendency to overlay a L2 or L3 VPN service
    level on the user-plane of an operational domain, with it's own
    identifiers and addressing, while tunneling control information
    across the control plane of the operational domain using BGP-4
    [2547][KOMPELLA] or extended LDP discovery [MARTINI][HEINANEN]. From
    a user plane OAM perspective, we would consider this to be a
    separate operational domain, and anticipate that it is only a matter
    Allan et. al.           Expires April 2003               Page 5
                  A Framework for MPLS User Plane OAM    February 2003
    of time before such service levels evolve to span multiple
    operational domains (for example, an L2 or L3 VPN that spans
    multiple providers, or the introduction of tandem points at the user
    plane of the service level).
 7. MPLS architecture implications for OAM
 7.1 Topology variations within an MPLS level
    There are a number of topology variations in the MPLS architecture
    that have OAM implications. These are:
    - Uni-directional and bi-directional LSPs. A uni-directional LSP
    only provides connectivity in one direction, and if return path
    connectivity exists, it is an attribute of the operational domain
    (e.g. signalling, management or client layers), and not a unique
    attribute of the LSP. Bi-directional LSPs or specific return path
    (e.g. [CHANG][DUBE]) have inherent symmetrical connectivity as an
    attribute of the LSP.
    - Multipoint-to-point (mp2p) LSPs are where a single LSP uses
    "merge" LSR transfer functions to provide connectivity between
    multiple ingress LSRs and a single egress LSR (sufficient
    information being present in the payload to permit higher layer
    demultiplexing at the egress). There are a number of problems
    inherent to mp2p topological constructs that cannot be addressed by
    traditional p2p mechanisms. One issue being that for some OAM
    applications (e.g. user plane fault propagation) OAM flows may
    require visibility at merge-points to limit the impact of partial
    failures or congestion.
    "Best effort" mp2p LSPs may have fairness issues with some packet
    schedulers. This may complicate obtaining consistent measurements
    under congestion conditions. Explicitly routed mp2p LSPs with
    associated resource reservations are significantly more complex to
    engineer. The resource reservations required will be cumulative at
    merge points (as will jitter), and the ability to provide
    differentiated handling for specific ingresses is lost once any
    merge point is crossed. One opinion would be that the complexity and
    difficulty in the configuration/maintenance of ER-mp2p LSPs
    significantly outweighs the scalability benefits, and would not
    likely be deployed.
    - Penultimate Hop label Popping (PHP), is an optimization in the
    architecture in which the last LSR prior to the egress removes the
    (supposedly) redundant current MPLS label from the label stack.
    Therefore the ability to infer LSP specific context (OAM and other)
    is lost prior to reaching the final destination.
    MPLS does not provide for protocol multiplexing via payload
    identification (with the exception of the explicit IPV4 and IPV6
    labels). PHP requires that the final hop have a common protocol
    payload (typically IP) or is able to map to lower layer protocol
    Allan et. al.           Expires April 2003               Page 6
                  A Framework for MPLS User Plane OAM    February 2003
    multiplexing capability (e.g. PPP Protocol Field or Ethernet
    ethertype) as the ability to infer payload type from LSP label is
    Another scenario where PHP is employed is when the egress LSR is not
    actually MPLS user plane capable. This has user-plane OAM
    implications in that any MPLS specific flows need to terminate at
    the PHP LSR. This requires that the PHP LSR proxies OAM functions on
    behalf of the egress LSR. This will introduce complexity when any
    type of consequent actions such as layer interworking of fault
    notification is required.
    - E-LSPs [MPLSDIFF] in which a single LSP supports multiple queuing
    disciplines to support multiple QoS behavior aggregates. Ability to
    perform OAM functions on a "per behavior aggregate" basis is
    critical to managing E-LSPs.
    - Management plane provisioned LSPs, vs. control plane signalled
    LSPs. In many scenarios associated with a control plane, the
    topology of the LSP varies over time. This can be due to many
    reasons, implicit routing, dynamic set up of local repair tunnels
    etc. etc.
    - The potential existence of multiple LSPs between an ingress and an
    egress LSR. This can be for many reasons, L-LSPs, equal cost
    multipath routing etc. etc.
    - The potential existence of multiple next hop label forwarding
    entries (NHLFEs) for a single incoming label. This is the scenario
    whereby the incoming label map (ILM) for an incoming label switch
    hop (LSH) maps to an inverse multiplex of NHLFEs which may be re-
    merged into a common egress or have multiple egress points. The
    mechanism for selecting the NHLFE to use may be proprietary and is
    performed on a packet by packet basis. Some implementations hash
    both the label stack and any IP payload source and destination
    addresses in order to preserve flow ordering while achieving good
    fan out. However this means that predictability of any nested LSPs
    degrades in the presence of problems.
    OAM tools not specifically aware of this construct may produce
    random results (insufficient frequency of failure to trigger
    threshold detection), or pathologically may only test a subset of
    the NHLFEs impacting both the detection and diagnosis of defects.
    Similarly performance monitoring is impacted as packets in flight
    cannot accurately be accounted for. The ramifications are
    comprehensively discussed in [ALLAN].
    - Use of per-platform label space. A per-platform label has
    significance at a nodal level and not just an interface level. Some
    of the more interesting applications being the ability to create
    unsignalled facility backup LSPs in "bypass tunnels" [SWALLOW].
    Traffic arriving on multiple interfaces and/or LSP tunnels may use a
    common per-platform label and will have a common ILM and NHLFEs.
    Allan et. al.           Expires April 2003               Page 7
                  A Framework for MPLS User Plane OAM    February 2003
    This can have implications similar to mp2p and PHP depending on how
    it is used; LSP origin information is not conserved when multiple
    sources use a common label.
    - p2mp and mp2mp LSPs (a.k.a. MPLS Multicast) is for further study.
    At the present time what placeholders exist in the architecture for
    multicast treat it as a separate protocol from "unicast" MPLS (with
    the exception of ATM variations of MPLS).
    These topological variations introduce complexity when attempting to
    instrument OAM applications within a specific MPLS level such as
    performance management, fault detection, fault isolation/diagnosis,
    fault handling (e.g. consequent actions taken to avoid raising
    unnecessary alarms in client layers) and fault notification.
 7.1.1 Implications for fault management
    mp2p, E-LSPs and PHP have implications for fault management,
    specifically if an LSR is required to have knowledge of both the
    ingress LSR and the specific LSP that an OAM message arrived on, or
    is expected to have knowledge of, and maintain state about the set
    of ingress LSRs for an LSP. OAM messaging needs mechanisms to
    distinguish both the ingress LSR and the specific LSP. (This ability
    is expressed on these terms as LSPs are typically not given globally
    unique identifiers, more frequently some locally administered LSR-ID
    is used).
    Connectivity verification requires testing of connectivity between
    all possible ingress/egress combinations. Frequently it will not be
    possible to infer the ingress LSR and specific LSP directly at the
    egress as such information may be lost at merge points in mp2p LSPs
    or due to PHP. This is true for both OAM messaging, and normal user
    plane payloads. There may be numerous reasons why an ingress-egress
    pair may have a plurality of LSPs between them, so the ability to
    distinguish the source and purpose of specific probes beyond mere
    knowledge of the originating LSR is required.
    The ability to distinguish the ingress can be achieved via modifying
    the OAM protocol to carry such information, or may be achieved via
    modifications to operational procedures such as overlaying p2p
 7.1.2 Implications for performance management
    Many performance management functions can be performed by obtaining
    and comparing measurements taken at different points in the network.
    Comparing ingress and egress statistics being the simplest example
    (but is usually restricted to within a single domain). The key issue
    is ensuring that "apples-to-apples" comparison of measurements is
    possible. This means that all measurement points need to be able to
    similarly classify the traffic and performance they are measuring,
    and that the measurements are synchronized in time and compensate
    for traffic in flight between the measurement points.
    Allan et. al.           Expires April 2003               Page 8
                  A Framework for MPLS User Plane OAM    February 2003
    For example, a relatively simple technique for establishing key
    performance metrics would be to compare what was sent with what was
    received. For example the PPP line quality monitoring (LQM) function
    the ingress periodically sends statistics to the egress for
    comparison subject to the same queuing discipline as the user plane
    traffic, such that traffic in flight is properly accounted for.
    (Note that re-ordering will introduce errors but is not expected to
    be frequent.)
    It is important to distinguish, and be able to measure, what
    constitutes the up and down states of an LSP.  This needs to be
    standardized so that there is unified treatment.  A key observation
    here is that QoS metrics (like loss, errored packets, delay, etc)
    are only relevant to when the LSP is in the up-state; and so any
    collection of QoS measurements is suspended when the LSP enters the
    down-state. This requires specification of the state transitions to
    achieve measurement consistency, and is a pre-requisite to QoS
    assessment. This is a particularly important metric to operators,
    since customers will be expecting operators to be able to offer both
    QoS and availability SLAs, and so these must be differentiated and
    uniquely measurable.
    A simple ingress/egress comparison is not always possible, there is
    no ability to similarly classify what is being measured at the
    ingress and egress of an LSP. mp2p LSPs and PHP do not have a 1:1
    relationship between the ingress and the egress. LSPs containing
    ILMs that map to multiple NHLFEs introduce measurement inaccuracy as
    not all packets share a common queuing discipline and where this
    results in multiple egress points from the network, there is an
    inability to synchronize measurements. Partial failure of an mp2p
    LSP (incl. ECMP) will result in the inability to successfully
    collect statistics
    So, in addition to having to define up/down-state transitions, for
    successful PM the 1:1 relationship needs to be restored by either:
    - The mp2p/PHP LSP is modeled as one LSP for measurement. This means
    that measurements performed at ingress points need to be
    synchronized and adjusted for common LSP segments such that the
    results are all presented to the egress simultaneously (again
    correcting for traffic in flight), a technique dependent on such a
    high degree of synchronization would be impossible to perfect, and
    prone to a degree of error.
    - The mp2p/PHP LSP is modeled as a collection of "ingress" LSPs for
    measurement. This means that the egress needs to be able maintain
    statistics by ingress and appropriately classify traffic
    Neither of the above is achievable at the present time without
    modifying existing operational procedures. The first approach
    involves treating the mp2p/PHP LSP as an aggregate, and as such it
    Allan et. al.           Expires April 2003               Page 9
                  A Framework for MPLS User Plane OAM    February 2003
    can partially fail and degrade. This complicates the establishment
    of performance metrics and specifying recovery procedures on errors.
    The second approach requires decomposing the mp2p/PHP LSP such that
    both payload and OAM traffic can be demultiplexed at the egress and
    correctly associated with "per-ingress" state. The ability to
    demultiplex both OAM and payload implies a common wrapper, and the
    net effect would be to overlay p2p connectivity on top of the
    merge/PHP based transport level.
    The existence of E-LSPs adds a wrinkle to the problem of measurement
    synchronization. An E-LSP may implement multiple diffserv PHBs and
    incorporate multiple queuing disciplines. An aggregate measurement
    for the entire LSP sent from ingress to egress would frequently have
    a small margin of error when compared with an aggregate measurement
    taken at the egress. Separate measurement comparisons for each
    supported EXP code point would be required to eliminate the error.
    The situation is slightly different for p2p LSPs containing ILMs
    that map to multiple NHLFEs. If all the NHLFEs are merged back into
    a single entity prior to the egress, there will inherently be a
    degree of measurement error that modifications to operational
    procedure cannot correct. However there is no guarantee that this
    will be the case, and any individual ingress measurement may be
    compared with only one of several egress measurement points (either
    random or pathological).
 7.2 LSP Creation Method
    The ability to usefully audit the constituent components of an LSP
    is dependent on the technique used to create the LSP.Presently
    defined are provisioning, LDP, CR_LDP, RSVP-TE, and BGP.
    LSP creation techniques that are currently defined fall at two
    relative extremes:
    At one extreme is explicitly routed point-to-point connection
    between fixed ingress and egress points in the network. Explicitly
    routed (ER) LSPs  (today created via provisioning, CR-LDP, RSVP-TE
    or BGP) have a significant degree of testability as the path across
    the network and the egress point is fixed and knowable to a testing
    entity. Similarly explicit pairwise and stateful
    testing/measurement relationships can be set up (e.g. connectivity
    verification) and strict criteria for failure established.
    In the middle is static mp2p constructs typically signalled via BGP
    (e.g. RFC 2547).
    At the other extreme is when LSP construction is topology driven
    (such as dynamic "shortest path first" routing combined with LDP),
    whereby the details of path construction between the ingress and
    egress points in the network will vary over time and may involve
    several stages of multiplexing with traffic from other sources. The
    Allan et. al.           Expires April 2003               Page 10
                  A Framework for MPLS User Plane OAM    February 2003
    details of path construction at any given instant are not
    necessarily knowable to an auditing entity so any attempt to
    interpret the results of an audit may generate spurious results.
    Further, the MPLS network may only be a portion of the operational
    domain, and the egress point from the network for an FEC may vary
    over time.
    The testable unit in an LDP network is the FEC not the LSP, and the
    potential existence of a many to many relationship of ingress and
    egress points limits the testability of the FEC, or at least may
    limit the frequency of using such tests.
    The connectivity instantiated in a specific LSP created by a
    topology driven control plane signalling mechanism will recover
    from many defects in the network. The quality of recovery typically
    being a function of how the network is engineered.
    Problems are typically detected by having MPLS connectivity fate
    share with the constituent physical links and routing adjacencies,
    and topology driven path re-arrangement will restore the
    connectivity (with some interruption and other side effects
    occurring between the initial failure and re-convergence of the
    network). However exclusive dependence on fate sharing for failure
    detection means that LSP components may have unique failure modes
    from which the network will not recover and can only be diagnosed
    As can be inferred from the above, what is required for topology
    driven LSPs is a test mechanism that audits forwarding policy as
    this is the metric by which some aspects of network performance can
    be measured.
 7.3 Lack of Fixed Hierarchy
    MPLS supports an arbitrary hierarchy in the form of label stacking.
    This is a facility that can be leveraged for OAM purposes. As an
    example, the section on implications for performance management has
    already outlined how p2p topology for PM can be overlaid on an
    arbitrary merged topology to add manageability of services.
    Similarly functions requiring sectionalization of an LSP or ability
    to isolate partial failure of a complex construct can be achieved by
    constructing the LSP as an overlay upon a concatenation of
    operationally significant shorter LSPs. By operationally significant
    we would refer to LSPs that spanned useful portions of the whole
    construct (e.g. a branch of an mp2p LSP, or bypassed LSRs that did
    not have OAM capability).
    This could simplify the instrumentation of level specific OAM by
    ensuring only e2e functions were required (as opposed to functions
    originating or terminating at arbitrary points in the network),
    while driving up the complexity of LSP establishment due to the
    resultant inter-level configuration issues when creating multi-level
    constructs with the desired manageability.
    Allan et. al.           Expires April 2003               Page 11
                  A Framework for MPLS User Plane OAM    February 2003
 7.4 Use of time to live (TTL)
    Experience within the IP world has suggested that TTL was a
    serendipitous feature that can be similarly leveraged by MPLS.
    However in the MPLS world, TTL suffers from inconsistent
    implementation depending on the link layer technology spanned by the
    target LSP. The existence of non-TTL capable links (e.g. MPLS/ATM)
    has impact on the utility of using TTL to augment the MPLS OAM
    toolkit. For example, use of TTL as an aid in fault sectionalization
    can only isolate a fault to the granularity of a non-TTL capable
    span of LSH or LSP segments.
    There are other variations in TTL handling that suggest interpreting
    results of TTL based tests may be problematic. As outlined in [TTL]
    there are two models of TTL handling with different implications:
    - the uniform model, in which decrement of TTL is independent of the
    MPLS level. At the ingress point to an MPLS level, the current TTL
    is copied into the new top label, and at egress is copied back to
    the revealed top level.
    - the pipe and short pipe models, whereby MPLS tunnels (aka LSPs)
    are used to hide the intermediate MPLS nodes between LSP ingress and
    egress from a TTL perspective.
    The uniform model originates with preserving IP TTL semantics when
    IP traffic transits an MPLS subnetwork. The uniform model will
    reduce the resource consumption of routing loops, but in a correctly
    operating network may lead to premature discard of packets outside
    the operational domain they originated from (due to the existence of
    an arbitrary number of serving MPLS levels). Similarly when a
    routing loop occurs, determining the MPLS level that is the source
    of the problem will be difficult as there is no method to correlate
    it with the level where the exhaustion event occurred.
    The pipe model is more consistent with the operational domain model
    in that TTL exhaustion will only occur at a specified level and the
    initial values used at LSP ingress are more likely to be reflective
    of detecting what would genuinely constitute a routing loop.
    A reasonable expectation is that the uniform model would not be used
    outside of an operational domain.
    A separate issue is that it is also possible that an LSR may
    decrement TTL by an amount other than one as a matter of policy.
    This means that the results obtained via any tools that use TTL
    exhaustion will require some interpretation.
 7.5 State Association
    Allan et. al.           Expires April 2003               Page 12
                  A Framework for MPLS User Plane OAM    February 2003
    The design of OAM flows in MPLS levels that multiplex traffic from
    multiple sources together may introduce implementation complexity
    where the flows are processed. The receiver of the OAM message will
    need to extract information from the packet to identify the LSP and
    associate it with ingress and LSP specific state. If the ingress/LSP
    identifier in the packet is not administered by the processing node,
    it will be unable to optimize the implementation of the state
    association mechanism and will be required to perform some sort of
    table search.
    If the identifier is administered by the processing node and that
    node is not the originator of the probe, some mechanism will be
    required to distribute this information uniquely to each probe
 7.6 Other design issues
    It is desirable to make the user plane OAM implementations
    independent of LSP specifics. It would be desirable to have common
    mechanisms across p2p and mp2p LSPs, PHP or no-PHP and independent
    of payload and the method of LSP creation in order to minimize
    overall complexity. The OAM application originator should not need
    (as far as is practical) any knowledge of the details of LSP
    PM may require that instrumentation of many OAM applications is only
    possible for p2p LSPs and therefore would only be possible for a
    select group of MPLS levels (e.g. overlaid service labels as per
    Fault management must be applicable across the spectrum of all label
    levels and LSR transfer functions.
    Finally, the possibility of re-ordering of OAM messaging must be
    considered. The design of OAM applications and messaging must be
    tolerant of out of order delivery and some degree of packet loss.
    For some applications the originator/termination will require a
    means to uniquely correlate requests with probe responses (including
    responses to mis-directed probes) or verify in sequence receipt.
 8. OAM Applications
    The purpose of having user plane LSP specific OAM transactions is to
    support useful OAM applications. Examples of such applications
    Fault management
    - On demand verification: the ability to perform connectivity tests
    that exercise the specific LSP and the provisioning at the ingress
    and egress. On demand suggests that verification may be performed on
    an ad-hoc basis.
   Allan et. al.           Expires April 2003               Page 13
                  A Framework for MPLS User Plane OAM    February 2003
    - Fault detection: Operators cannot expect customers to act as fault
    detectors, and so the ability to perform automated detection of the
    failure of a specific LSP is a "must have" feature (although when
    one reviews the section on LSP creation above, one realizes it will
    not be ubiquitously used). Some MPLS deployment scenarios may not
    have a control plane or may have LSP processing components not in
    common with the control plane, so fault detection procedures may
    need to be augmented with LSP specific methods.
    - Fault sectionalization: The ability to efficiently determine where
    a failure has occurred in an LSP.  Sectionalization must be able to
    be performed from an arbitrary LSR along the path of the LSP.
    - Fault Propagation: specific MPLS deployment scenarios may not have
    a control-plane to propagate LSP failure information. Fault
    propagation has numerous forms and there are variations depending on
    whether the failure is in the serving layer/level or :
    i)  Northbound from the failed level to the management plane.
    ii)  Within the failed level.
    iii) From the failed level to its clients.
    iv)  Within the client level to the LSP ingress and egress either
    via the user or control planes.
    And in all cases it is the termination of a layer that performs the
    Performance management
    - The ability to determine whether an LSP meets certain goals with
    respect to latency, packet loss etc.
    - The ability to collect information to facilitate network
    engineering decisions.
    Of the above applications, verification, detection and
    sectionalization explicitly need to exercise all components of the
    forwarding path of the target LSP, otherwise there will be failure
    scenarios that cannot be detected or properly sectionalized. These
    applications cannot be supported properly if there are differences
    in handling between user traffic and OAM probes at intermediate
    A separate and useful classification of the applications outlined
    above is to distinguish the difference between monitoring
    applications and diagnosis. Monitoring applications are typically
    unattended in operation, collect operational statistics, and upon
    detection of problems, must provide sufficient information to permit
    precise diagnosis of the problem to be performed and frequently some
    form of automated network response to problems. Diagnosis
    applications are typically attended in operation and must be able to
    authoritatively locate and isolate faults. The security implications
    of this distinction is discussed in the security considerations
    Allan et. al.           Expires April 2003               Page 14
                  A Framework for MPLS User Plane OAM    February 2003
 9. Ease of Implementation
    Complex functions are typically require software implementation and
    are not capable of handling line rate messaging. Implementations
    defend themselves via rate-limiting or similar load management
    techniques to avoid vulnerabilities to DOS attacks or simple mis-use
    by incompetent craftspersons. In many cases, the complexity of
    adding strong authentication as defense against DOS attacks may be
    less onerous than promiscuous processing of complex probes.
    Probes supporting monitoring applications gain the most benefit when
    they can run at line rate such that there are no concerns about
    processing capacity at the processing network elements. Such tests
    will generate predicable results (or at least not have results
    degraded when network elements are under stress) and automated
    procedures can be designed around such mechanisms. MP2P LSPs are an
    exemplary case where egress processing of probes may be required to
    support probes from an arbitrary number of unsynchronized sources.
    Messaging mechanisms to perform diagnostic tests (once a fault has
    been authoritatively established) tend to be more complex and
    software intensive. Diagnostic tests are frequently used by
    craftspersons, and can be more tolerant of things like discard due
    to rate limiting.
 10.OAM Messaging
    OAM should be decoupled from user behavior to ensure consistent OAM
    functional behavior (under any traffic conditions) and avoid the use
    of customers as guinea pigs. We consider it to be self evident that
    providers will require a toolkit that includes some form of user
    plane OAM messaging.
    At the specific LSP level, support of OAM applications require
    messages that flow between three entities, the LSP ingress, the
    intervening network and the LSP egress. As an LSP is unidirectional,
    it should be self evident that OAM applications that require
    feedback in the reverse direction will have such communication occur
    either at the specific LSP level, or some user plane LSP level in
    the operational domain, or one of the other planes (control or
    management) of the operational domain.
    The set of possible individual transactions (plus examples of their
    utility) is as follows:
    LSP specific user-plane transactions:
    - ingress to egress
        applicability: verification, fault detection, performance
    - ingress to network
        message will terminate at an intermediate LSR traversed by
        the LSP.
        Applicability: sectionalization from source
    Allan et. al.           Expires April 2003               Page 15
                  A Framework for MPLS User Plane OAM    February 2003
    - network to egress
        message is inserted into the LSP at an intermediate node
        and terminates at the LSP egress LSR.
        Applicability: sectionalization from arbitrary point in an
    - Network to network
        Applicability: sectionalization from arbitrary point in an
    Feedback transactions
    - egress to ingress
        applicability: verification, fault detection.
    - egress to network
        flow originates at the LSP egress and terminates at
        intermediate node traversed by the LSP.
           Applicability: sectionalization from arbitrary point in an
    - network to ingress
        flow will originate at an intermediate LSR traversed by
        the LSP and terminate at the LSP source.
        Applicability: sectionalization from ingress.
    - network to network
        Applicability: sectionalization from arbitrary point in an
 11.Distinguishing OAM user plane flows
    MPLS does not currently provide for protocol multiplexing at a
    specific LSP level. However a requirement still exists to
    distinguish per-LSP OAM messaging from user payload.
    The options for addressing the identification of OAM flows are:
 11.1 Adding an LSP level with arbitrary label
    OAM flows could be identified by adding an LSP level to the existing
    LSP using an arbitrary label value that by convention (negotiated at
    LSP establishment) carries the OAM payload. Adding an arbitrarily
    labeled LSP level to multiplex OAM flows will add complexity and
    additional failure modes that make it an undesirable solution.
 11.2 Adding an LSP level with a reserved label value
    Adding an LSP level to the existing LSP via the use of a stacked
    reserved label value that explicitly identifies OAM flows. Note that
    this approach was proposed in [HARRISON-MECH]. Adding an LSP level
    using a reserved label has a number of virtues:
    - LSP level OAM flows will explicitly exercise the components of the
      forwarding path.
    - LSP level OAM flows will not be erroneously forwarded outside the
      LSP they have been inserted into. The LSP itself may be defective
    Allan et. al.           Expires April 2003               Page 16
                  A Framework for MPLS User Plane OAM    February 2003
      or misrouted (but that is a separate issue) but we have not
      introduced an additional LSP level for OAM with its own set of
      possible defects.
    - LSP level e2e OAM flows are transparent to non-compliant/legacy
    However hop-by-hop OAM flows are not possible  (although the
    approach could be augmented via the use of MPLS TTL or the router
    alert label to gain SOME of the benefits of hop-by-hop messaging).
    By losing hop-by-hop, the ability to provide flows with preferential
    treatment when required is also lost.
 11.3 Header modification
    Modifying the MPLS header or stealing a bit from the label space to
    permit OAM payload to be uniquely distinguished by all LSRs
    traversed at the specific LSP level similarly has a number of
    advantages and disadvantages. We gain hop-by-hop visibility of
    messaging but at the expense of:
    - Losing backwards compatibility with legacy equipment.
    - Losing the ability to ensure we are fully exercising the
      path for verification and sectionalization.
    Similarly stealing a label bit for PHP and E-LSPs becomes
    problematic as the OAM identifier is lost when the label is popped
    (in the original MPLS architecture, the label was considered to have
    no informational value past the next to egress LSR). Adding special
    handling for OAM packets to specifically avoid PHP would no longer
    exercise all components of the forwarding path.
    The use of a reserved label under the top label would appear to be
    the approach that has the most utility and least impact on current
 12.The OAM return path
    The ability to use OAM applications such as single-ended monitoring
    of both directions from one end, or to support applications such as
    protection switching in a 1/N:M case, requires a return path to the
    LSP ingress. This enhances the scalability and reliability of some
    OAM applications as data plane OAM can function as a closed system.
    A specific example being use of a loopback where the only place
    state and timing need be maintained is at the loopback originator.
    This requires a return path to complete the loop between the "target
    LSP" and the OAM application originator. This will permit reliable
    transaction flows to be implemented that impose minimal state on the
    For the few OAM applications that require a return path, the return
    path can be tolerant of being topologically disjoint with the target
    LSP (providing the differential delays are small, ie <<1s),
    reachability of the application originator being the only hard
    Allan et. al.           Expires April 2003               Page 17
                  A Framework for MPLS User Plane OAM    February 2003
    requirement. Similarly, different OAM applications will have
    different return path requirements, and a hybrid of using all the
    planes of the operational domain (according to the application) may
    be significantly simpler and more operationally tractable than
    significant modifications to current usage to fill in connectivity
    gaps at the specific label level.
    This is a key point, LSPs are currently by definition uni-
    directional (bi-directional to date being a construct of multiple
    uni-directional LSPs), so for any non-ubiquitous deployment of MPLS
    connectivity, some modification of operational procedure to provide
    for OAM messaging will be required for the few applications that
    need it. Strict symmetry of connectivity at a specific label level
    is not guaranteed.
    In any type of sparse usage scenario (e.g. provisioned LSPs or use
    exclusively for TE) there will not be an inherent any-to-any
    connectivity in the user plane, and there may not be a control plane
    signalling system. Additional artificial constructs such as a
    "reverse notification tree" [CHANG] have been proposed to address
    this although these introduce additional operational complexity and
    a requirement for OAM for the OAM connectivity.
    In an implicit MPLS topology (e.g. LDP DU), any to any connectivity
    will typically exist, or will be easily available with minor
    alterations to operational procedure (LSRs advertise selves as
    FECs). This would continue to be true for an integrated model in
    which TE and an implicit topology were combined.
    In any type of multi-provider MPLS topology, the scenario is more
    complex, as for numerous reasons a provider may not wish to
    provision/advertise external connectivity to their LSRs. Similarly,
    for security reasons, providers may wish to apply some degree of
    policy or filtering of OAM traffic at operational domain boundaries.
    User plane OAM messaging should be designed to leverage as much
    "free connectivity" as can be obtained in the network, while
    ensuring the constructs have sufficient extensibility to ensure the
    corner cases are covered.
    Within the operational domain of a single provider, it is relatively
    easy to envision that a combination of user-plane, and control plane
    functionality will ensure that a user-plane return path is
    frequently available (although it may be topologically disjoint from
    the target LSP). This is less so for inter provider scenarios. Here
    there are a number of potential obstacles such as:
    - disjoint control plane
    - disjoint addressing plan
    - requirements for policy enforcement and security
    - impacts to scalability of ubiquitous visibility of individual LSRs
    across multiple operational domains.
    Allan et. al.           Expires April 2003               Page 18
                  A Framework for MPLS User Plane OAM    February 2003
    There are a number of approaches to providing inter-domain OAM
    connectivity, the following is a brief commentary on each:
    1) Reverse Notification Tree (a.k.a using bi-directional LSP)
    In this method, each LSP has a dedicated reverse path - i.e. the
    reverse path is established and associated with the LSP at the LSP
    setup time. This requires binding the reverse path to each LSR that
    is traversed by the LSP. This method is not scaleable, as it
    requires doubling the number of LSPs in the network. Moreover each
    reverse path requires its own OAM.
    2) Global OAM capability
    Similar to IP v4 to IP v6 migration methodology, this method
    proposes use of a global operations domain with control-plane, user-
    plane, and management-plane that interact with control-plane, user-
    plane, and management-plane of individual operations domains. This
    method requires commitment and buy-in from all network operators.
    3) Inter-domain OAM gateway
    This method proposes use of a gateway like functions at LSRs that
    are at operations domain boundaries. OAM gateway like functions
    includes capabilities to correlate OAM information from one
    operations domain to another and permit inter-carrier
    sectionalization problems to be resolved.
    Specification of an inter-domain OAM gateway capability would appear
    to be the most realistic solution.
 13. Use of Hierarchy to Simplify OAM
    Section 7.1 outlined some of the impacts of MPLS topological
    constructs that multiplexed traffic from multiple sources together.
    Section 7.5 identified additional complexity modifying protocols to
    address state mapping for OAM purposes could entail. The key issue
    identified is that for fault management, OAM protocol design would
    permit mp2p and PHP to be addressed (but at a specific
    implementation cost), but this is not possible for performance
    management, in particular if ingress specific traffic counts are
    Rather than attempting OAM protocol design to address what by
    definition will be an incomplete solution, it would be useful to
    define a common mechanism to demultiplex both MPLS level payload and
    OAM flows. The common mechanism ideally would be in the form of a
    wrapper that included an egress administered ingress identifier.
    One instantiation of such a wrapper would be a p2p MPLS label. The
    mechanisms exist for label distribution (in the form of extended LDP
    discovery), and LSPs are already passively instrumented (e.g. packet
    and byte counts). Similar benefits are obtained when the
    implementation is extended into the use of probe messages. State
    association at the egress becomes simple in that the state is
    associated directly with the incoming label (and can be obtained by
    augmenting the ILM lookup).
    Allan et. al.           Expires April 2003               Page 19
                  A Framework for MPLS User Plane OAM    February 2003
    The use of p2p overlays is one method of instrumenting mp2p and PHP
    LSPs that addresses all the issues outlined in section 7. It also
    significantly simplifies OAM protocol design and implementation.
 14.Security Considerations
    Support for intra-provider user plane OAM messaging does not
    introduce any new security concerns to the MPLS architecture.
    Though it does actually address some that already exist, i.e.
    through rigorous defect handling operator's can offer their
    customers a greater degree of integrity protection that their
    traffic will not be misdelivered (for example by being able to
    detect leaking LSP traffic from a VPN).
    Support for inter-provider user plane OAM messaging introduces a
    number of security concerns as by definition, portions of LSPs will
    not be in trusted space, the provider has no control over who may
    inject traffic into the LSP. This creates opportunity for malicious
    or poorly behaved users to disrupt network operations. Attempts to
    introduce filtering on target LSP OAM flows may be problematic if
    flows are not visible to intermediate LSRs. However it may be
    possible to interdict flows on the return path between providers (as
    faithfulness to the forwarding path is not a return path
    requirement) to mitigate aspects of this vulnerability.
    OAM tools may permit unauthorized or malicious users to extract
    significant amounts of information about network configuration. This
    would be especially true of IP based tools as in many network
    configurations, MPLS does not typically extend to untrusted hosts,
    but IP does. This suggests that tools used for problem diagnosis or
    which by design are capable of extracting significant amounts of
    information will require authentication and authorization of the
    originator. This may impact the scalability of such tools when
    employed for monitoring instead of diagnosis.
 15. A summary of what can be achieved.
    This draft identifies useful MPLS OAM capability that potentially
    could be provided via user plane OAM functions. In particular with
    respect to automatic fault detection and failure handling.
    This draft suggests that it may be possible to provide this
    capability for any level in the label stack either by instrumenting
    that level, or instrumenting an overlay.
    This draft also identifies that many aspects of performance
    management are intractable for some MPLS topological constructs. Any
    type of comparative measurement between an ingress and the egress of
    an LSP requires a 1:1 cardinality, or the ability of the egress to
    uniquely determine the ingress for each measured unit of
    communication, something that LSP merge, PHP and possible use of per
    platform label space at the measured LSP level undermine. Again a
    Allan et. al.           Expires April 2003               Page 20
                  A Framework for MPLS User Plane OAM    February 2003
    potential solution is to instrument a p2p overlay where such
    detailed measurements are required, and otherwise unavailable.
 16. References
    [ALLAN] Allan, D., "Guidelines for MPLS Load Balancing", draft-
      allan-mpls-loadbal-01.txt, IETF work in progress, February 2003
    [CHANG] Owens, "A Path Protection/Restoration Mechanism
      for MPLS Networks", draft-chang-mpls-path-protection-02.txt,
      IETF work in progress, November 2000.
    [DUBE] Dube, R., Costa, M. "Bi-directional LSPs for classical
      MPLS", draft-dube-bidirectional-lsp-00.txt, IETF work in
      progress, July 2002
    [HEINANEN] Heinanen, J., "Directory/LDP Based Ethernet VPNs",
      draft-heinanen-dirldp-eth-vpns-01.txt, IETF work in progress,
      November 2001
    [HIERARCHY] Lai " Network Hierarchy and Multilayer
      Survivability", draft-ietf-tewg-restore-hierarchy-00.txt, IETF
      Work in Progress, September 2001
    [ICMP] Bonica et. al. "ICMP Extensions for MultiProtocol Label
      Switching", draft-ietf-mpls-icmp-02.txt,
      IETF Work in Progress, August 2000.
    [KOMPELLA] Kompella "MPLS-based Layer 2 VPNs",
      draft-kompella-mpls-l2vpn-02.txt, IETF Work in Progress,
      December 2000
    [MARTINI]Martini "Transport of Layer 2 Frames Over
      MPLS", draft-martini-l2circuit-trans-mpls-06.txt, IETF Work
      in Progress, May 2001
    [MPLSDIFF] Le Faucheur "MPLS Support of Differentiated
      Services", IETF RFC 3270, May 2002
    [MPLSREQS] Nadeau, "OAM Requirements for MPLS Networks",
      draft-nadeau-ietf-oam-requirements-00.txt, February 2003
    [2547] Rosen, E. Rekhter, Y., "BGP/MPLS VPNs", IETF RFC 2547,
      March 1999
    [HARRISON-MECH] Harrison "OAM Functionality for MPLS
      Networks", draft-harrison-mpls-oam-00.txt, February 2001
    [HARRISON-REQ] Harrison "Requirements for OAM in MPLS
      Networks", draft-harrison-mpls-oam-req-01.txt, November 2001
    [SWALLOW] Swallow, G. and Goguen, R., "RSVP Label Allocation for
      Backup Tunnels", draft-swallow-rsvp-bypass-label-01.txt,
    Allan et. al.           Expires April 2003               Page 21
                  A Framework for MPLS User Plane OAM    February 2003
      November 2000
    [TTL] Agarwal, P., and Akyol, B., "TTL Processing in MPLS Networks",
      IETF RFC 3443, January 2003
 17. Editor's Address
    David Allan
    Nortel Networks              Phone: 1-613-763-6362
    3500 Carling Ave.            Email:
    Ottawa, Ontario, CANADA
    Allan et. al.           Expires April 2003               Page 22