Internet Draft David Allan, Editor
Document: draft-ietf-mpls-oam-frmwk-00.txt Nortel Networks
Thomas D. Nadeau, Editor
Cisco Systems, Inc.
Category: Informational
Expires: May 2005 November 2004
A Framework for MPLS Operations
and Management (OAM)
Status of this Memo
By submitting this Internet-Draft, we certify that any applicable
patent or other IPR claims of which we are aware have been disclosed,
or will be disclosed, and any of which we become aware will be
disclosed, in accordance with RFC 3668.
This document is an Internet-Draft and is subject to all
provisions of section 3 of RFC 3667. By submitting this
Internet-Draft, each author represents that any applicable patent
or other IPR claims of which he or she is aware have been or will
be disclosed, and any of which he or she become aware will be
disclosed, in accordance with RFC 3668.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other than
as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This document is a framework for how data plane OAM functions can be
applied to operations and maintenance procedures. The document is
structured to outline how OAM functionality can be used to assist in
fault management, configuration, accounting, performance management
and security, commonly known by the acronym FCAPS.
Table of Contents
MPLS Working Group Expires May 2005 [Page 1]
Internet Draft MPLS OAM Framework November, 2004
1. Introduction and Scope
2. Terminology
3. Fault Management
3.1 Fault detection
3.1.1 Enumeration and detection of types of data plane faults
3.1.2 Timeliness
3.2 Diagnosis
3.2.1 Characterization
3.2.2 Isolation
3.3 Availability
4. Configuration Management
5. Accounting
6. Performance measurement
7. Security
8. Copyright Notice
9. Intellectual Property Statement
10. Disclaimer of Validity
11. Copyright Statement
12. Acknowledgment
13. References
14. Editors' Address
1. Introduction and Scope
This memo outlines in broad terms how data plane OAM functionality
can assist in meeting the operations and management (OAM)
requirements outlined in [MPLSREQS] and how it can be applied to the
operational functions of fault, configuration, accounting,
performance and security (commonly known as FCAPS). The approach of
the document is to outline the requisite functionality, the potential
mechanisms to provide the function, and the applicability of data
plane OAM functions.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
OAM Operations and Management
FCAPS Fault, Configuration, Accounting,
Performance, and Security
ILM Incoming Label Map
NHLFE Next Hop Label Forwarding Entry
MIB Management Information Base
LSR Label Switching Router
RTT Round Trip Time
3. Fault Management
3.1 Fault detection
Fault detection encompasses identifying all causes of failure to
transfer information between the ingress and egress of an LSP.
This section enumerates common failure scenarios and explains how
one might (or might not) detect each situation.
3.1.1 Enumeration and detection of types of data plane faults
Lower layer faults:
Lower layer faults are those that impact the physical layer or
link layer that transports MPLS between adjacent LSRs. Some
physical links (such as SONET/SDH) may have link layer OAM
functionality that detects link layer faults and notifies the
LSR directly. Other physical links (such as Ethernet) may not
have this capability and require MPLS or IP layer heartbeats to
detect failures. However, once a fault is detected, the reaction
is often the same in both cases.
Node failures:
Node failures are those that impact the forwarding capability
of an entire node, including its entire set of links. This can
be due to component failure, power outage, or reset of control
processor in an LSR employing a distributed architecture, etc.
MPLS LSP misbranching:
Misbranching occurs when there is a loss of synchronization
between the data and the control planes. This can occur due to
hardware failure, software failure or configuration problems.
It will manifest itself in one of two forms:
- packets belonging to a particular LSP are cross connected
into an NHLFE for which there is no corresponding ILM at
the next downstream LSR. This can occur in cases where the
NHLFE entry is corrupted. Therefore the packet arrives at
the next LSR with a top label value for which the LSR has no
corresponding forwarding information, and is typically
dropped. This is a No Incoming Label Map (ILM) condition and
can be detected directly by the downstream LSR which
receives the incorrectly labeled packet.
- packets belonging to a particular LSP are cross connected
into an incorrect NHLFE entry for which there is a
corresponding ILM at the next downstream LSR, but which is
associated with a different LSP. This may be detected by
a number of means:
o some or all of the misdirected traffic is not routable
at the egress node.
o OAM probing detects the inconsistency between the data
path and the control plane.
Discontinuities in the MPLS Encapsulation
The forwarding path of the FEC carried by an LSP may transit
nodes for which MPLS is not configured. This may result in a
number of behaviors (most of them undesirable). When there is
only one label in the stack and the payload is IP, IP forwarding
will direct the packet to the correct interface. The same
applies if PHP is employed. Packets with a label stack will be
discarded.
MTU problems
MTU problems occur when client traffic cannot be fragmented by
intermediate LSRs, and is dropped somewhere along the path of
the LSP. MTU problems should appear as a discrepancy in the
traffic count between the set of ingresses and the egresses for
a FEC and will appear in the corresponding MIB performance
tables in the transit LSRs as discarded packets.
TTL Mishandling
Some LSRs may process TTL expiry and propagation inconsistently
at the penultimate hop. In these cases, it is possible for tools
that rely on consistent TTL processing to fail.
Congestion
Congestion occurs when the offered load on any interface
exceeds the link capacity for sufficient time that the
interface buffering is exhausted. Congestion problems will
appear as a discrepancy in the traffic count between the set of
ingresses and the egresses for a FEC and will appear in the MIB
performance tables in the transit LSRs as discarded packets.
Misordering
Misordering of LSP traffic occurs when incorrect or
inappropriate load sharing is implemented within an MPLS
network. Load sharing typically takes place when equal cost
paths exist between the ingress and egress of an LSP. In these
cases, traffic is split among these equal cost paths using a
variety of algorithms. One such algorithm relies on splitting
traffic between each path on a per-packet basis. When this is
done, it is possible for some packets along the path to be
delayed due to congestion or slower links, which may result in
packets being received out of order at the egress. Detection
and remedy of this situation may be left up to client
applications that use the LSPs. For instance, TCP is capable of
re-ordering packets belonging to a specific flow. Mis-ordering
can also be detected by sending probe traffic along the path and
verifying that all probe traffic is received in the order in
which it was transmitted.
LSRs do not normally implement mechanisms to detect misordering
of flows.
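As an illustration of probe-based detection, the following minimal
sketch counts probes that arrive out of order. It assumes that each
probe carries a monotonically increasing sequence number assigned at
the ingress; the function name count_misordered and the probe format
are hypothetical and are not defined by this document.

```python
def count_misordered(received_seq):
    """Count probes whose sequence number is lower than one already seen.

    received_seq lists the (assumed) per-probe sequence numbers in the
    order in which the egress received them.
    """
    highest = None
    misordered = 0
    for seq in received_seq:
        if highest is not None and seq < highest:
            misordered += 1      # arrived after a later-sent probe
        else:
            highest = seq
    return misordered


print(count_misordered([1, 2, 3, 4, 5]))  # 0
print(count_misordered([1, 3, 2, 4, 5]))  # 1
```

A non-zero count suggests that per-packet load sharing or a similar
effect is re-ordering traffic on the probed path.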
Payload Corruption
Payload corruption may occur and be undetectable by LSRs. Such
errors are typically detected by client payload integrity
mechanisms.
3.1.2 Timeliness
The design of SLAs and systems requires that ample headroom be
allotted in terms of processing capabilities in order to process
and handle all necessary fault conditions within the bounds
stipulated in the SLA. This includes planning for event handling
using a time budget that takes into account the overall SLA and
the time to address any defects that arise. However, it is
possible that some fault conditions may surpass this budget due
to their catastrophic nature (e.g., a fibre cut) or due to
misplanning of the time processing budget.
       ^    --------------
       |    |            ^
       |    |            |---- Time to notify NOC + process/correct
  SLA  |    |            v     defect
  Max  -    --------------
  Time |    |            ^
       |    |            |---- Time to detect/diagnose fault
       |    |            v
       v    --------------

            Figure 1: Fault Correction Budget
In Figure 1, we represent the overall fault correction time budget
by the maximum time specified in an SLA for the service in
question. This time is then divided into two subsections. The first
encompasses the total time required to detect a fault and notify an
operator (or, optionally, to correct the defect automatically). This
section may have an explicit maximum time to detect defects, arising
from either the application or a need to do alarm management (e.g.,
suppression), and this will be reflected in the frequency of OAM
execution. The second section indicates the time required to notify
the operational systems used to diagnose and correct the defect
(if it cannot be corrected automatically).
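The relationship between the time budget and the frequency of OAM
execution can be sketched as follows. This is an illustration only:
it assumes that a fault is declared after a fixed number of
consecutive missed probes, so that detection time is roughly the
probe interval multiplied by that number. The function name, the
parameter names, and the default of three missed probes are
assumptions, not values taken from this document.

```python
def max_probe_interval(sla_max_s, notify_correct_s, missed_probes=3):
    """Largest probe interval (seconds) that fits the detection budget.

    sla_max_s        -- maximum correction time allowed by the SLA
    notify_correct_s -- time reserved for notification and correction
    missed_probes    -- assumed consecutive misses before declaring a fault
    """
    detect_budget = sla_max_s - notify_correct_s
    if detect_budget <= 0:
        raise ValueError("no time left in the budget for detection")
    return detect_budget / missed_probes


print(max_probe_interval(60.0, 45.0))  # 5.0
```

With a 60 second SLA maximum and 45 seconds reserved for notification
and correction, probes would need to run at least every 5 seconds
under these assumptions.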
3.2 Diagnosis
3.2.1 Characterization
Characterization is defined as determining the forwarding path of a
packet (which may not necessarily be known). Characterization may be
performed on a working path through the network. This is done, for
example, to determine ECMP paths, the MTU of a path, or simply to
know the path occupied by a specific FEC. Characterization will be
able to leverage mechanisms used for isolation.
3.2.2 Isolation
Isolation of a fault can occur in two forms. In the first case, the
local failure is detected, and the node where the failure occurred
is capable of issuing an alarm for such an event. The node should
attempt to withdraw the defective resources and/or rectify the
situation prior to raising an alarm. Active data plane OAM
mechanisms may also detect the failure conditions remotely and issue
their own alarms if the situation is not rectified quickly enough.
In the second case, the fault has not been detected locally. The
local node therefore cannot raise an alarm, nor can it be expected
to rectify the situation. Instead, the failure may be detected
remotely via data plane OAM. This mechanism should also be able to
determine the location of the fault, perhaps on the basis of limited
information such as a customer complaint. This mechanism may also be
able to automatically remove the defective resources from the
network and restore service, but should at least provide a network
operator with enough information to perform this operation. Given
that faults should be detected as quickly as possible, tools that
possess the ability to incrementally test LSP health should be used
to uncover faults.
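One way to picture incremental testing is a probe that is limited to
one more hop on each attempt, in the manner of a traceroute. The
sketch below is a simulation under that assumption; the function
isolate_fault, the probe callback, and the LSR names are hypothetical
and do not describe any particular OAM tool.

```python
def isolate_fault(path, probe):
    """Return the first hop whose probe goes unanswered, or None.

    path  -- ordered list of LSR names along the LSP (assumed known)
    probe -- caller-supplied function; probe(n) returns True when a
             probe limited to n hops is answered
    The fault lies at the returned LSR or on the link leading to it.
    """
    for hop_count, lsr in enumerate(path, start=1):
        if not probe(hop_count):
            return lsr
    return None


path = ["LSR1", "LSR2", "LSR3", "LSR4"]
broken_at = 3  # simulated fault: probes fail from the third hop onward
print(isolate_fault(path, lambda n: n < broken_at))  # LSR3
```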
3.3 Availability
Availability is the measure of the percentage of time that a service
is operating within specification, often specified by an SLA.
MPLS has several forwarding modes (depending on the control plane
used). As such, more than one availability model may be defined.
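As a simple worked example of the definition above, availability can
be computed from the length of the measurement period and the time
spent out of specification. The function name and the sample figures
are illustrative assumptions.

```python
def availability_percent(total_s, out_of_spec_s):
    """Percentage of the period during which the service met its spec."""
    if total_s <= 0:
        raise ValueError("measurement period must be positive")
    return 100.0 * (total_s - out_of_spec_s) / total_s


# 30 days with 2592 seconds (about 43 minutes) out of specification.
print(round(availability_percent(30 * 24 * 3600, 2592), 3))  # 99.9
```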
4. Configuration Management
Data plane OAM can assist in configuration management by providing
the ability to verify configuration of an LSP or of applications
that may utilize that LSP. This would be an ad-hoc data plane probe
that should verify both path integrity (a complete path exists) and
that the path function is synchronized with the control plane. The
probe would carry, as part of its payload, relevant control plane
information that the receiver would be able to compare with the
local control plane configuration.
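The comparison performed by the receiver might look like the
following minimal sketch. The probe is modelled as a dictionary
holding the label it arrived on and the FEC the sender believes the
LSP carries; this payload layout, the function verify_probe, and the
result strings are assumptions made for illustration only.

```python
def verify_probe(probe, local_bindings):
    """Compare a received probe with local control plane state.

    probe          -- {"label": int, "fec": str}, hypothetical payload
    local_bindings -- label -> FEC map taken from the local control plane
    """
    expected_fec = local_bindings.get(probe["label"])
    if expected_fec is None:
        return "unknown-label"   # no local binding for the arriving label
    if expected_fec != probe["fec"]:
        return "mismatch"        # data plane and control plane disagree
    return "verified"


bindings = {100: "10.1.0.0/16", 200: "10.2.0.0/16"}
print(verify_probe({"label": 100, "fec": "10.1.0.0/16"}, bindings))  # verified
print(verify_probe({"label": 200, "fec": "10.1.0.0/16"}, bindings))  # mismatch
```

The fact that the probe arrived at all speaks to path integrity; the
comparison speaks to synchronization with the control plane.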
5. Accounting
The requirements for accounting as specified in [MPLSREQS] do not
place any requirements on data plane OAM.
6. Performance measurement
Performance measurement permits the information transfer
characteristics of LSPs to be measured, perhaps in order to
compare against an SLA. This falls into two categories, latency
(where jitter is considered a variation in latency) and information
loss.
Latency can be measured in two ways: one is to have precisely
synchronized clocks at the ingress and egress such that timestamps
in PDUs flowing from the ingress to the egress can be compared. The
other is to use an exchange of PING type PDUs that gives a round
trip time (RTT) measurement, and an estimate of the one way latency
can be inferred with some loss of precision. Use of load spreading
techniques such as ECMP means that any individual RTT measurement is
only representative of the typical RTT for a FEC.
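The RTT-based approach can be sketched as follows. The example
assumes symmetric forward and return paths when halving the RTT, and
it takes jitter to be the mean absolute difference between
consecutive RTT samples; both choices, and the function name
summarize_rtt, are assumptions rather than definitions from this
document.

```python
from statistics import mean


def summarize_rtt(rtts_ms):
    """Return (mean RTT, estimated one-way latency, jitter) in ms."""
    avg = mean(rtts_ms)
    # Variation between consecutive samples, used here as jitter.
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return avg, avg / 2, jitter


avg, one_way, jitter = summarize_rtt([10.0, 12.0, 11.0, 13.0])
print(avg, one_way)  # 11.5 5.75
```

Averaging over several samples reduces the effect of any single probe
having taken an unrepresentative ECMP path.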
To measure information loss, a common practice is to periodically
read ingress and egress counters (e.g., MIB module counters). This
information may also be used for offline correlation. Another common
practice is to send explicit probe traffic which traverses the data
plane path in question. This probe traffic can also be used to
measure jitter and delay.
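The counter-based approach amounts to comparing the change in the
ingress counter with the change in the egress counter over the same
interval. The sketch below assumes that both counters are read at
approximately the same instants and ignores counter wrap; the
function name loss_ratio and the sample values are hypothetical.

```python
def loss_ratio(ingress_prev, ingress_now, egress_prev, egress_now):
    """Fraction of packets lost between two pairs of counter readings."""
    sent = ingress_now - ingress_prev
    received = egress_now - egress_prev
    if sent <= 0:
        return 0.0               # nothing was sent in this interval
    # Skew between readings can make received exceed sent; clamp at 0.
    return max(sent - received, 0) / sent


print(loss_ratio(1000, 2000, 900, 1890))  # 0.01
```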
7. Security
Support for intra-provider data plane OAM messaging does not
introduce any new security concerns to the MPLS architecture.
It does, however, address some that already exist: through
rigorous defect handling, operators can offer their customers a
greater degree of integrity protection, that is, assurance that
their traffic will not be misdelivered (for example, by being able
to detect LSP traffic leaking from a VPN).
Support for inter-provider data plane OAM messaging introduces a
number of security concerns. By definition, portions of LSPs will
not be in trusted space, and the provider has no control over who
may inject traffic into the LSP, which can be exploited for denial
of service attacks. This creates opportunity for malicious or
poorly behaved users to disrupt network operations. Attempts to
introduce filtering on target LSP OAM flows may be problematic if
flows are not visible to intermediate LSRs. However it may be
possible to interdict flows on the return path between providers (as
faithfulness to the forwarding path is not a return path
requirement) to mitigate aspects of this vulnerability.
OAM tools may permit unauthorized or malicious users to extract
significant amounts of information about network configuration. This
would be especially true of IP based tools as in many network
configurations, MPLS does not typically extend to untrusted hosts,
but IP does. For example, TTL hiding at ingress and egress LSRs will
prevent external users from using TTL-based mechanisms to probe an
operator's network. This suggests that tools used for problem
diagnosis or which by design are capable of extracting significant
amounts of information will require authentication and authorization
of the originator. This may impact the scalability of such tools
when employed for monitoring instead of diagnosis.
8. Copyright Notice
Copyright (C) The Internet Society (2004). All Rights Reserved.
9. Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
10. Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
11. Copyright Statement
Copyright (C) The Internet Society (2004). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
12. Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.
13. References
13.1 Normative References
13.2 Informative References
[RFC3031] Rosen, E., Viswanathan, A., and R. Callon,
"Multiprotocol Label Switching Architecture", RFC
3031, January 2001.
[ALLAN] Allan, D., "Guidelines for MPLS Load Balancing",
draft-allan-mpls-loadbal-05.txt, work in progress,
October 2003.
[MPLSREQS] Nadeau, T., et al., "OAM Requirements for MPLS Networks",
draft-ietf-mpls-oam-requirements-01.txt, work in progress,
June 2003.
[Y1710] ITU-T Recommendation Y.1710 (2002), "Requirements for OAM
Functionality for MPLS Networks".
14. Editors' Address
David Allan
Nortel Networks Phone: +1-613-763-6362
3500 Carling Ave. Email: dallan@nortelnetworks.com
Ottawa, Ontario, CANADA
Thomas D. Nadeau
Cisco Systems Phone: +1-978-936-1470
300 Beaver Brook Drive Email: tnadeau@cisco.com
Boxborough, MA 01824