Network Working Group S. Amante
Internet-Draft Level 3 Communications, LLC
Intended status: Informational A. Atlas
Expires: August 21, 2008 BT
A. Lange
Alcatel-Lucent
D. McPherson
Arbor Networks, Inc.
February 18, 2008
Operations and Maintenance Next Generation Requirements
draft-amante-oam-ng-requirements-01
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 21, 2008.
Copyright Notice
Copyright (C) The IETF Trust (2008).
Abstract
Current IP and MPLS OAM techniques need to be extended to permit
operators to effectively diagnose load-balancing issues.
Specifically, new ad-hoc OAM techniques are needed to diagnose
Amante, et al. Expires August 21, 2008 [Page 1]
Internet-Draft OAM-NG Requirements February 2008
various link-bundling techniques, such as IP/MPLS Equal Cost Multi-
Path (ECMP) and Link Aggregation Groups (LAG). In addition, these
OAM tools should also be extended to permit performance monitoring
over longer time durations. This document defines requirements for
the next generation of OAM solutions.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Contributors . . . . . . . . . . . . . . . . . . . . . . . 4
2. Background . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1. Types of Exercise Mechanisms . . . . . . . . . . . . . . . 5
3.2. Scenario 1: Traceroute through Routed Hops . . . . . . . . 5
3.3. Scenario 2: Traceroute through One Switched Hop . . . . . 6
3.4. Scenario 3: Traceroute through Two, or More, Switched
Hops . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.5. ECMP . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.6. Proxy Traceroute/Ping Functionality . . . . . . . . . . . 10
4. Performance Monitoring . . . . . . . . . . . . . . . . . . . . 11
4.1. Proactive Network Monitoring and Verification . . . . . . 11
4.1.1. Proactive Periodic Network Monitoring and
Verification . . . . . . . . . . . . . . . . . . . . . 12
4.1.2. Proactive Perpetual Network Monitoring and
Verification . . . . . . . . . . . . . . . . . . . . . 12
4.2. Network Performance Monitoring . . . . . . . . . . . . . . 13
5. Other Requirements . . . . . . . . . . . . . . . . . . . . . . 13
5.1. Intra-AS Requirements . . . . . . . . . . . . . . . . . . 13
5.2. Inter-AS Requirements . . . . . . . . . . . . . . . . . . 16
5.3. MTU considerations . . . . . . . . . . . . . . . . . . . . 17
5.4. Extensibility . . . . . . . . . . . . . . . . . . . . . . 18
5.5. Path Capabilities . . . . . . . . . . . . . . . . . . . . 18
5.6. Per Hop Behavior Modification . . . . . . . . . . . . . . 19
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19
7. Security Considerations . . . . . . . . . . . . . . . . . . . 19
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 20
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20
9.1. Informative References . . . . . . . . . . . . . . . . . . 20
9.2. Normative References . . . . . . . . . . . . . . . . . . . 20
9.3. References . . . . . . . . . . . . . . . . . . . . . . . . 20
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 21
Intellectual Property and Copyright Statements . . . . . . . . . . 22
1. Introduction
Current networks make extensive use of multiple network paths to
create larger virtual links between network elements, in particular
when a single physical-layer link has exceeded its carrying capacity
and no larger bandwidth physical layer technologies exist. Operators
use various link bundling techniques, such as Link Aggregation Groups
(LAGs) and IP and MPLS Equal Cost Multi-Path (ECMP), to augment the
capacity between network elements when physical link-layer capacity
is exhausted. Existing troubleshooting tools, based on 'legacy' ping
and traceroute, are insufficient to effectively examine the
underlying component-links that traffic will use.
In addition, as more of the world's traffic converges around IP and
MPLS based networks, service providers need to extract temporally
aware traffic performance information.
This draft is NOT intended to address transport MPLS capabilities.
Transport-oriented requirements would be complementary to the
requirements presented here.
1.1. Contributors
The following made vital contributions to this document:
Rajeev Manur, Force10 Networks, Inc.
2. Background
The use of Link Aggregation Groups (LAG's), Equal Cost Multi-Path
(ECMP) or a combination of ECMP over LAG's is a common technique used
to bond multiple parallel circuits or paths together to achieve the
appearance of a larger aggregate link between two nodes. The
advantage of these techniques, in particular LAG's, is a reduced
number of routing and signaling protocol adjacencies between devices,
reducing control plane processing overhead. A disadvantage of these
techniques is an inability to determine the individual component-link
used for traffic forwarding inside a LAG or ECMP path, specifically
for a given microflow, between two devices using traditional
traceroute or ping utilities.
A key problem related to LAG or ECMP paths is, due to inefficiencies
in LAG or ECMP load-distribution algorithms, a particular component-
link may experience congestion or a soft-failure, which would go
unnoticed by NMS systems and, likely, IP/MPLS Control Plane
protocols. The end result is performance degradation of a subset of
end-user microflows that use the affected component-links between two
adjacent devices.
Operators need the following. First, and the most
immediate need, is a capability to determine the set of component-
links used by individual network elements through which traceroute or
ping messages are traversing. Second, a capability to specify an
end-user's microflow, e.g.: a 5-tuple "flow" in the case of IP
traffic, that will be used by intermediate devices to calculate the
component-link or ECMP path used for that flow to allow periodic or
perpetual performance monitoring. Ultimately, these capabilities are
necessary to both determine and exercise the actual path that is/was
used by an end-user's particular application through the network.
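As a non-normative illustration of the second capability, the following Python sketch shows how a 5-tuple "flow" could be mapped deterministically to one component-link. Real LAG and ECMP load-distribution algorithms are vendor-specific; the use of SHA-256 here, and the names FiveTuple and select_component_link, are assumptions made only for this example.

```python
import hashlib
from typing import NamedTuple, Sequence


class FiveTuple(NamedTuple):
    """An IP microflow identifier (hypothetical helper type)."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def select_component_link(flow: FiveTuple, links: Sequence[str]) -> str:
    """Pick one component-link for a flow.

    The same flow always maps to the same link, which is the property
    an operator relies on when exercising a customer's path.
    SHA-256 is a stand-in for a proprietary hash, not a real device algorithm.
    """
    if not links:
        raise ValueError("bundle has no component-links")
    key = "|".join(str(field) for field in flow).encode("ascii")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(links)
    return links[index]
```

A probe carrying the customer's 5-tuple would select the same component-link as the customer's traffic only if each device feeds that tuple, rather than the probe's own outer header, into its hash.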
3. Use Cases
3.1. Types of Exercise Mechanisms
This memo classifies two types of ping and traceroute requests that
are needed in modern networks where many inter-node links consist of
LAG, ECMP or ECMP over LAG paths. First, a "traditional" or "legacy"
traceroute and ping request where intermediate devices only
understand how to use outer IP header information as the input to a
LAG or ECMP hashing algorithm. This type of mechanism has limited
utility insomuch as existing devices, interior to a Service
Provider's network, only understand how to process limited
information in traceroute or ping requests. Note that when operators
originate traceroute and/or ping sessions from within their network,
requests are sourced from devices, often routers, whose interfaces
reside within their network.
On the other hand, a "next-generation" traceroute and ping request
where intermediate devices understand new information likely
contained in the payload of the traceroute and ping request, which
can then be fed as input to the LAG or ECMP hashing algorithm. This
would allow operators to, for example, specify the exact "tuple" used
by customer traffic in order to properly exercise the LAG or ECMP
paths used by a particular customer 'flow' through the network.
3.2. Scenario 1: Traceroute through Routed Hops
I1: 10.1.1.1/30 I3: 10.5.1.1/30
+------+ +------+ +------+
| |-- A1 ----------- A2 --| |-- D1 ---------- D2 --| |
| R1 |-- B1 -- LAG-1 -- B2 --| R2 | LAG-2 | R3 |
| |-- C1 ----------- C2 --| |-- E1 ---------- E2 --| |
+------+ +------+ +------+
10.1.1.2/30: I2 10.5.1.2/30: I4
Note on figures: Figures 1 through 3 represent a piece of a network
for illustrative purposes. In a real network, other nodes will be
present.
Figure 1: Traceroute through Routed Hops
In the above example, the links A1-A2, B1-B2 and C1-C2 are grouped
into a single LAG, called LAG-1, between nodes R1 and R2.
Furthermore, D1-D2 and E1-E2 are grouped into a single LAG, called
LAG-2, between nodes R2 and R3. I1 represents the IPv4 address
10.1.1.1/30 assigned to the LAG-1 interface on R1. I2 represents the
IPv4 address 10.1.1.2/30 assigned to the LAG-1 interface on R2. I3
and I4 are the IP interfaces assigned to R2 and R3, respectively, on
LAG-2. R1 and R2 will maintain a single set of routing and signaling
protocol (e.g.: IS-IS, RSVP and/or LDP) adjacencies over LAG-1,
while R2 and R3 will maintain a single set of routing and signaling
protocol adjacencies over LAG-2. Assuming the individual component
link sizes between R1, R2 and R3 are 10 Gbps, the end result is that
R1 and R2 believe they have a single 30 Gbps connection between them
and R2 and R3 believe they have a 20 Gbps connection between them.
When performing a traceroute from R1 through R2 to R3, each router
independently and automatically determines, through a proprietary LAG
or ECMP load-distribution algorithm, the outgoing component-link
inside a LAG or ECMP path to send out traceroute UDP probe packets.
Unfortunately, the details of the specific component-links are not
exposed to a user interface, which would allow operators to determine
the exact physical path used by traceroute. Furthermore, those
details also cannot be used as input to a 'ping' utility (using ICMP
echo-request and echo-reply messages [RFC792]) to test longer term
performance of a specific physical path through the network. The end
result is a network operator may believe that a given path between
devices is behaving properly when, in fact, end-user traffic is
traversing a different set of component-links and experiencing
congestion or other link-layer forwarding problems.
3.3. Scenario 2: Traceroute through One Switched Hop
I1: 10.1.1.1/30
+------+ +-------+ +------+
| |-- A1 ----------- A2 --| |-- C1 ----------- C2 --| |
| R1 | LAG-1 | SW1 |-- D1 -- LAG-2 -- D2 --| R2 |
| |-- B1 ----------- B2 --| |-- E1 ----------- E2 --| |
+------+ +-------+ +------+
10.1.1.2/30: I2
Figure 2: Traceroute through One Switched Hop
In this scenario, links A1-A2 and B1-B2 are grouped into a single 20
Gbps LAG, called LAG-1, between nodes R1 and SW1. Furthermore, links
C1-C2, D1-D2 and E1-E2 are also joined together into a single 30 Gbps
LAG, called LAG-2, between nodes SW1 and R2. I1 represents the IPv4
address 10.1.1.1/30 assigned to the LAG-1 interface on R1. I2
represents the IPv4 address 10.1.1.2/30 assigned to the LAG-2
interface on R2. As in Scenario 1, R1 and R2 will maintain a single
set of IP/MPLS routing and signaling protocol adjacencies over the
LAG's through SW1.
As in scenario 1, each device along the path R1 to SW1 to R2, (or
vice-versa), automatically and independently determines the outgoing
component-link inside a LAG or ECMP "bundle" to send out traceroute
UDP probe packets. Unfortunately, in this scenario if only the
incoming component-link interface ID is displayed to an end-user or
network operator, that will not reveal the entire physical path
traversed from R1 through SW1 to R2. This scenario highlights the
need to also show both the outgoing component-link interface ID on R1
and the incoming component-link interface ID on R2. With both of
those pieces of information, and a priori knowledge that there is
only one Layer-2 switch between R1 and R2, an operator can rely on a
"legacy" traceroute implementation to determine the actual component-
links that were used in a traceroute request.
If the operator does not have a priori knowledge that there is a
Layer-2 switch between R1 and R2, it would be useful for R1 and R2 to
include relevant Layer-2 information, learned from a Link-Layer
Discovery Protocol, on both R1 and R2 in the traceroute reply. In
this example, R1 would reply with its own outgoing component-link
name, SW1's hostname and SW1's incoming component-link name.
Furthermore, when R2 sends a traceroute reply it would respond with
its own incoming component-link name, SW1's hostname and SW1's
outgoing component-link name. This would immediately point out to an
operator the presence of one, or more, Layer-2 switches in the middle
of a Layer-3 path. Ultimately, without specific component-link
'neighbor' information, such as from a Link-Layer Discovery Protocol,
it will be difficult to rapidly determine the presence or absence of
Layer-2 switches in the interior of a Layer-3 path.
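The following non-normative Python sketch illustrates how an operator's tool might use the neighbor information in traceroute replies to infer the presence of a Layer-2 switch, as in Figure 2. The HopReply structure and its field names are hypothetical and are not defined by this document; the neighbor fields are assumed to be populated from a Link-Layer Discovery Protocol.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HopReply:
    """One traceroute reply from a Layer-3 node (hypothetical structure)."""
    node: str                             # hostname of the replying router
    local_link: str                       # component-link on the replying router
    neighbor_name: Optional[str] = None   # learned via link-layer discovery
    neighbor_link: Optional[str] = None   # neighbor's component-link name


def layer2_neighbors(replies: List[HopReply]) -> List[str]:
    """Return neighbors that were reported but never replied as a Layer-3 hop.

    Such neighbors are candidates for Layer-2 switches in the path.
    """
    layer3_nodes = {reply.node for reply in replies}
    found: List[str] = []
    for reply in replies:
        name = reply.neighbor_name
        if name and name not in layer3_nodes and name not in found:
            found.append(name)
    return found
```

For the topology in Figure 2, a reply from R1 naming SW1 and a reply from R2 naming SW1 would both point to the same intermediate device, which never appears as a routed hop.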
It's also important to point out in this particular scenario that, at
best, SW1 only understands how to parse information in the outer IP
header of a legacy traceroute UDP probe, or other data packets, for
input into its LAG hash algorithm, which ultimately determines the
outgoing component-link it will use to send packets to R2. It would
be highly desirable that SW1 was able to intercept and act upon data
fields contained in "next-generation" traceroute and/or ping probe
packets, so that operators could specify the actual 5-tuple "flow" to
be input into SW1's LAG hash algorithm in order to exercise a
specific component-link on SW1 outbound toward R2. If this approach
is not used it would likely prevent operators from periodically or
continuously exercising a specific set of component-links through a
given edge-to-edge path on the network, such as through a proactive
network monitoring system, as discussed in Section 4.1 of this
document.
3.4. Scenario 3: Traceroute through Two, or More, Switched Hops
I1: 10.1.1.1/30
+----+ +-----+ +-----+ +----+
| |-A1-------A2-| |-C1-------C2-| |-E1---------E2-| |
| R1 | LAG-1 | SW1 | LAG-2 | SW2 |-F1- LAG-3 -F2-| R2 |
| |-B1-------B2-| |-D1-------D2-| |-G1---------G2-| |
+----+ +-----+ +-----+ +----+
10.1.1.2/30: I2
Figure 3: Traceroute through Two, or More Switched Hops
In this case, two Layer-2 switches are inserted in the path between
Layer-3 nodes R1 and R2. LAG-1, between R1 and SW1, and LAG-2,
between SW1 and SW2, are each a 20 Gbps LAG. Furthermore, LAG-3,
between nodes SW2 and R2, is a single 30 Gbps LAG. Finally, I1
represents the IPv4 address 10.1.1.1/30 assigned to the LAG-1
interface on R1; in addition, I2 denotes the IPv4 address 10.1.1.2/30
assigned to the LAG-3 interface on R2.
This scenario is common in Enterprise or DataCenter environments
where R1 may be a router or server, SW1 a top-of-rack distribution
switch, SW2 an aggregation switch and, finally, R2, which is a
Layer-3 router typically providing WAN connectivity.
This particular case further highlights the need to automatically
learn the presence of Layer-2 switches and, ideally, allow one to
automatically exercise their LAG hash algorithms to fully qualify the
exact set of component-links taken between two Layer-3 devices. In
order to learn the presence of Layer-2 switches, it will be necessary
for traceroute replies to also include relevant Layer-2 information,
such as the next-hop device's hostname and incoming component-link
name, from a Link-Layer Discovery Protocol. In the case of "legacy"
traceroute, R1 would reply with its outgoing component-link name,
plus two pieces of information learned from a Link-Layer Discovery
Protocol: SW1's hostname and SW1's incoming component-link name.
Furthermore, when the next traceroute UDP probe is sent to R2, it
will reply with its incoming component-link name, SW2's hostname and
SW2's outgoing component-link name. Unfortunately, this only yields
a partial solution, because it would not reveal the actual component-
link used between SW1 and SW2, nor the presence of a third Layer-2
switch between SW1 and SW2. In this instance, an operator would want
to use Layer-2 OAM tools in an attempt to identify and diagnose the
particular component-link that is used between SW1 and SW2.
Unfortunately, Layer-2 OAM tools do not have the ability to identify
or troubleshoot component-links in an 802.3ad LAG. In addition, it is
time consuming for operators to stop using Layer-2.5 (such as LSP-
Ping or LSP-Trace) or Layer-3 ping/traceroute tools, log in to R1 and
R2 and use Layer-2 OAM tools to resume diagnosing the problem.
Furthermore, the lack of an integrated toolset prevents
operators from using an NMS to continuously monitor component-links
on paths that go over one or more Layer-2 switches.
Instead, what is needed by operators is integrated Layer-2 and
Layer-3 ping/traceroute tools, which allow for rapid and accurate
diagnosis and troubleshooting of LAG/ECMP problems. Ultimately, if
Layer-2 switches can intercept and act upon "next-generation"
traceroute and ping requests, that would enable operators to specify
the actual 5-tuple "flow" to be input into each Layer-2 switch's LAG
hash algorithm. This would allow operators to periodically or
continuously exercise a specific set of component-links over all
Layer-2 and Layer-3 devices, all at the same time, along a complete
edge-to-edge path through the network, as discussed in Section 4.1 of
this document.
It should be noted that the above presumes intermediate Layer-2
switches are capable of intercepting and acting upon NG-OAM probe-
requests, which may not be true initially in all environments.
Therefore, this document requires all NG-OAM solutions to document
how they will determine if intermediate Layer-2 switches are NG-OAM
capable and communicate that back to the initiator of an NG-OAM
request, in order that operators can tell if the complete path was
properly exercised.
3.5. ECMP
TBD
3.6. Proxy Traceroute/Ping Functionality
To enable more rapid troubleshooting and diagnosis of problems
related to LAG, ECMP and/or asymmetric paths in a large-scale
network, it is useful to use "proxy" routers/hosts within a network
that can initiate a traceroute or ping on behalf of a Network
Monitoring System (NMS), such as via [PROXY-LSP-PING]. This is
particularly valuable in the following scenarios:
o When troubleshooting problems related to asymmetric paths, it is
useful to perform a traceroute and/or ping from a source to the
destination as well as from the destination back to the source.
o Some IP/MPLS routers use 'input interface' as input into the LAG
and/or ECMP hashing algorithm; therefore, quickly exercising the
associated direction of a particular flow through the network is
required.
o When narrowing a problem down to specific sequence of links within
the network, it is useful to rapidly focus additional testing on
suspicious segments, which are a subset of an overall edge-to-edge
path.
o Periodic monitoring of a large-scale network composed of a
multitude of LAG and/or ECMP paths. In order to divide up the
periodic testing of a large set of component-links and paths while
simultaneously providing timely results, it is useful to
distribute testing out to the IP/MPLS routers in the network on or
near the paths to be tested. (See Section 4.1 for more details.)
In this scenario, there are three types of devices:
Initiator: The node which creates a proxy traceroute/ping request
with: 1) a "5-tuple" to be used as input to a LAG and/or ECMP hashing
algorithm; 2) the IP address of the Proxy IP/MPLS router that will
initiate the ping/traceroute on behalf of the Initiator; and, 3) the
IP address of the destination IP/MPLS router/host that will terminate
this ping/traceroute request.
Proxy IP/MPLS Router: The node which receives a proxy traceroute/ping
request from an Initiator. Once it has interpreted the proxy
request, it initiates a proxy ping/traceroute request from itself
toward the destination IP/MPLS router specified in the proxy ping/
traceroute request.
Proxy Request Terminator: The node(s) which terminate a proxy
traceroute/ping request received from the Proxy IP/MPLS Router. In
the case of a proxy traceroute, intermediate nodes along the path to
the final destination of proxy traceroute are considered
"Intermediate Proxy Request Terminators".
An NG-OAM solution MUST support Proxy Traceroute/Ping Functionality.
An NG-OAM solution MUST support replies from the Proxy Request
Terminator (or Intermediate Proxy Request Terminators) being sent
back to the Proxy IP/MPLS Router, before they are relayed back to the
Initiator. The advantage of this approach is that replies should
follow a symmetrical path back to the Initiator, which is useful if
the NMS is behind a stateful firewall. On the other hand, an NG-OAM
solution MAY support replies from the Proxy Request Terminator (or,
Intermediate Proxy Request Terminators) directly back to the
Initiator. The advantage of this scheme is that it does not rely on
the Proxy IP/MPLS Router to cache or relay/reformat Proxy Reply
Information, before replying back to the Initiator. This may be
useful in situations where it's desirable to reduce the load on the
Proxy IP/MPLS Router.
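The following non-normative Python sketch summarizes the three items an Initiator supplies in a proxy request and the two reply paths described above. The ProxyRequest structure, the ReplyMode names and the reply_destination function are hypothetical and are used only for illustration; this document does not define a message format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ReplyMode(Enum):
    VIA_PROXY = "via-proxy"  # replies relayed by the Proxy IP/MPLS Router (MUST)
    DIRECT = "direct"        # replies sent straight to the Initiator (MAY)


@dataclass(frozen=True)
class ProxyRequest:
    """Contents of a proxy traceroute/ping request (hypothetical layout)."""
    initiator: str                        # address of the Initiator, e.g. an NMS
    proxy: str                            # address of the Proxy IP/MPLS Router
    destination: str                      # address of the Proxy Request Terminator
    flow: Tuple[str, str, int, int, int]  # 5-tuple fed to the LAG/ECMP hash
    reply_mode: ReplyMode = ReplyMode.VIA_PROXY


def reply_destination(request: ProxyRequest) -> str:
    """Return the address a Terminator should send its reply to."""
    if request.reply_mode is ReplyMode.DIRECT:
        return request.initiator
    return request.proxy
```

Defaulting to the relayed mode in this sketch mirrors the requirement levels above: the relayed path is mandatory, while the direct path is optional.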
4. Performance Monitoring
4.1. Proactive Network Monitoring and Verification
There are two forms of Proactive Network Monitoring and Verification
(PNMV): Perpetual and Periodic. In a Perpetual PNMV case, the nodes
performing monitoring send OAM messages at a specific interval, and
record the results on a perpetual basis. In the Periodic case, the
messages are sent only on demand of an external system, such as an
NMS, or an operator's command. These forms can be implementation
cases of the same solution.
Today's solutions, such as ping, traceroute, and simulated user
traffic between management nodes, can address the case when there is
a single path between two endpoints. However, in large national and
international networks, there will exist several routed hops for
certain paths through the network. Furthermore, between each pair of
IP/MPLS routers there will exist LAG's and/or ECMP paths.
Unfortunately, at present, Network Monitoring Systems (NMS) are unable
to exercise the set of component-links through specific paths on the
network. Such a capability would allow the NMS to identify a
soft-failure on one or more component-links and notify a Network
Operations Center (NOC). The NOC could then proactively
respond to the problem by, for example, quickly taking the affected
component-link(s) out-of-service or, alternatively, administratively
disabling the link bundle or ECMP path and allowing traffic to switch
to another in-service path.
The challenge with monitoring a large set of LAG and/or ECMP paths in
a network will be to find the right balance between monitoring all
component-links in the network, minimizing the resource utilization
(e.g.: CPU, memory, network I/O) on the NMS system(s) while
simultaneously having a timely detection interval to allow for
proactive notification of problems to the NOC. Therefore, a solution
must be devised that allows an NMS to transmit multiple independent,
concurrent LAG and/or ECMP path test queries into various points in
the network. Within the network, Proxy IP/MPLS Routers will carry
out the test queries and report back the test results to the NMS.
An NG-OAM solution SHOULD support the ability to do Proactive
Perpetual Network Monitoring and Verification, again through the use
of Proxy Traceroute/Ping Functionality described in Section 3.6. It
should be noted that Perpetual PNMV may be more resource intensive on
devices, which is why that requirement is relaxed compared to
Periodic PNMV.
4.1.1. Proactive Periodic Network Monitoring and Verification
Periodic network monitoring is often done in response to a suspected
network event, or done as a sampled case of Perpetual network
monitoring when Perpetual network monitoring cannot be scaled to the
necessary level. Probes sent Periodically are often sent with a
shorter inter-message interval, and often request more information
than a test that runs on a Perpetual basis.
In order to perform periodic monitoring, the Initiator MUST send the
Proxy IP/MPLS Router the number and interval of the probe requests.
For example, the Initiator may send the Proxy IP/MPLS Router a
request to run 300 consecutive probes at an interval of 500 msec
between probes.
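As a non-normative illustration, the following Python sketch computes when a Proxy IP/MPLS Router would send each probe, given the number and interval supplied by the Initiator. The function name probe_offsets_ms is hypothetical. For the example above, 300 probes at 500 msec span 149.5 seconds from the first probe to the last.

```python
from typing import List


def probe_offsets_ms(count: int, interval_ms: int) -> List[int]:
    """Return the send time of each probe, in milliseconds after the first.

    Integer milliseconds are used so the schedule has no rounding drift.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return [i * interval_ms for i in range(count)]
```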
4.1.2. Proactive Perpetual Network Monitoring and Verification
Perpetual network monitoring is done consistently among a subset of
end points in the total network. The subset, such as sample PoP
router to sample PoP router, is selected to strike a balance between
a good view of network performance and an unmaintainable set of
messages.
In order to perform perpetual monitoring, the selected monitoring and
monitored nodes must run the test, such as NG-Ping, at a set interval
and collect and store the resulting statistics.
Network Performance Monitoring, as described in Section 4.2, is a
good example of the case where Perpetual PNMV is required.
An NG-OAM solution MUST offer the ability to change monitoring timing
intervals. Values as low as 3.3 ms have been suggested, but are
optional. Values down to 100 ms SHOULD be supported.
4.2. Network Performance Monitoring
Network Performance Monitoring (PM, or NPM) is the art and science of
recording temporally aware network performance characteristics. A
use case for the resulting statistics is for SLA verification, in
addition to proactive maintenance.
Relevant PM characteristics are typically loss, latency and jitter.
A PM solution MUST index these characteristics to time intervals.
Knowing that 100 packets were lost, but not knowing when, is not
particularly actionable. The limits of existing tools and
information often result in a NOC "clearing counters", then running a
"fast ping" for an arbitrary length of time and hoping that the error
occurs again. Keeping all results of a Perpetual PNMV test is one
possible solution; however, this volume of information can be
difficult to store or to sort through when a network event is
occurring. An NG-OAM solution SHOULD provide an easy-to-read,
temporally-aware statistic that allows an operator to easily assess
the magnitude of the problem.
An example of this sort of statistic from the world of SONET/SDH
transport is the errored second, and severely errored second.
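As a non-normative illustration of indexing loss to time intervals, the following Python sketch adapts the errored second idea to probe results. The definitions used here are assumptions made for this example only: an errored second is a one-second interval in which at least one probe was lost, and a severely errored second is one in which the lost fraction reaches a configurable threshold (0.3 is an arbitrary default, not a value taken from SONET/SDH or from this document).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def errored_seconds(
    results: Iterable[Tuple[int, bool]], severe_ratio: float = 0.3
) -> Tuple[int, int]:
    """Summarize probe loss per one-second interval.

    results: (send_time_ms, was_lost) pairs for each probe.
    Returns (errored_seconds, severely_errored_seconds).
    """
    sent: Dict[int, int] = defaultdict(int)
    lost: Dict[int, int] = defaultdict(int)
    for send_time_ms, was_lost in results:
        second = send_time_ms // 1000
        sent[second] += 1
        if was_lost:
            lost[second] += 1
    errored = sum(1 for second in lost if lost[second] > 0)
    severe = sum(1 for second in lost if lost[second] / sent[second] >= severe_ratio)
    return errored, severe
```

Two small counters per monitored path give an operator the "when" and "how bad" of a loss event without storing every probe result.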
The level of granularity of PM statistics gathering SHOULD be
configurable.
5. Other Requirements
5.1. Intra-AS Requirements
The NG-OAM solution SHOULD use the same mechanism to address both the
Intra-AS (this section) and Inter-AS (Section 5.2) requirements. An
operator MUST be able to run a traceroute from one domain and through
another. The amount of information this traceroute provides may
differ depending on where the probe is originated, and what sort of
authorization it possesses to access information in other domains.
Intra-AS requirements are applicable within an Autonomous System
(AS), where all IP/MPLS devices are expected to be under a single
administrative authority. Because devices are under a single
administrative authority, copious diagnostic information can be
returned to the Initiator of a ping/traceroute request. Ultimately,
however, an NG-OAM solution MUST ensure that extensive Intra-AS
diagnostic information is not leaked across the boundaries of the
Autonomous System, since it would provide valuable network
intelligence information. In addition, it is desirable if
lightweight authentication and/or encryption techniques can be used
to secure both probe requests and replies, in order to limit the
effects of resource exhaustion on network elements that are
processing probe request/replies.
The following is a brief summary of the minimal set of information
that an NG-OAM solution is expected to address. NG-OAM solutions MAY
capture additional information through, for example, experimental or
vendor-specific objects specified in the NG-OAM probe-request.
NG-OAM Probe Requests and Probe Replies MUST contain a "Query ID",
generated by the Probe Initiator, that can be used to associate Probe
Replies with Probe Requests.
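The following non-normative Python sketch shows one way a Probe Initiator could use a Query ID to pair replies with outstanding requests. The ProbeTracker class and the 32-bit Query ID width are assumptions made for this example; this document does not specify the size or encoding of the Query ID.

```python
import secrets
from typing import Dict, Optional


class ProbeTracker:
    """Tracks outstanding Probe Requests by Query ID (hypothetical helper)."""

    def __init__(self) -> None:
        self._pending: Dict[int, str] = {}

    def new_request(self, destination: str) -> int:
        """Allocate a Query ID that is unique among outstanding requests."""
        query_id = secrets.randbits(32)  # 32-bit width is an assumption
        while query_id in self._pending:
            query_id = secrets.randbits(32)
        self._pending[query_id] = destination
        return query_id

    def match_reply(self, query_id: int) -> Optional[str]:
        """Return the destination of the matching request, or None if unknown."""
        return self._pending.pop(query_id, None)
```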
Next-Gen Traceroute
o MUST work for IP and MPLS
o MUST be able to specify a 5-tuple IPv4 or IPv6 "flow" in a Probe
Request
o MUST be able to specify whether the IPv4 packet is a first-
fragment, or subsequent fragment, in order that intermediate
devices can adjust their LAG/ECMP calculation appropriately.
o MUST be able to specify the MPLS label stack used to identify a
"flow" across an MPLS-only portion of the network in a Probe
Request.
o MUST be able to specify the Layer-2, (e.g.: Ethernet), header used
in a Probe Request.
o MUST be able to specify a combination of label stack and IP
5-tuple, if both are used in the ECMP/LAG hash algorithm.
o MUST capture the following information in a Probe Reply:
* The specific components of Layer-2, (e.g.: Ethernet), header,
MPLS label stack and/or IP 5-tuple, that were used in the ECMP/
LAG hash algorithm at this hop
* Incoming Interface Name
* Outgoing Interface Name
* Number of component-links in a bundle
* Size (Bandwidth) of individual component-links in a bundle
* Percent bandwidth utilization on interface(s)
* Remote Link-Layer neighbor name and interface name
o SHOULD be able, on request of the source, to provide recent
performance history of the incoming or outgoing link(s)
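The Probe Reply contents listed above for Next-Gen Traceroute can be sketched, non-normatively, as a Python data structure. All field names and types are hypothetical; this document lists the information to be captured but does not define an encoding. The bundle_bandwidth_bps helper assumes equal-size component-links, as in the scenarios of Section 3.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TracerouteHopReply:
    """Per-hop information in a Next-Gen Traceroute Probe Reply (sketch)."""
    query_id: int
    hash_inputs: Tuple[str, ...]        # header fields used by the LAG/ECMP hash
    incoming_interface: str
    outgoing_interface: str
    component_link_count: int           # number of component-links in the bundle
    component_link_bandwidth_bps: int   # size of each component-link
    utilization_percent: float          # bandwidth utilization on the interface
    neighbor_name: Optional[str] = None       # remote link-layer neighbor
    neighbor_interface: Optional[str] = None  # remote neighbor's interface name

    def bundle_bandwidth_bps(self) -> int:
        """Aggregate bundle size, assuming equal-size component-links."""
        return self.component_link_count * self.component_link_bandwidth_bps
```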
Next-Gen Ping
o MUST work for IP and MPLS
o MUST be able to specify a 5-tuple IPv4 or IPv6 "flow" in a Probe
Request
o MUST be able to specify the MPLS label stack used to identify a
"flow" across an MPLS-only portion of the network in a Probe
Request.
o MUST be able to specify the Layer-2, (e.g.: Ethernet), header used
in a Probe Request.
o MUST follow the regular data-plane path for forwarding within a
network element
o MUST be able to test all links/paths concurrently, or serially,
between two network elements when operators do not know a
customer's "flow" information, which can be used as input to a LAG
and/or ECMP hash calculation.
Proxy Traceroute
o All of the requirements mentioned above for "Next-Gen Traceroute",
plus:
o The Initiator MUST be able to specify the number of Probe
Requests.
o The Initiator MAY also specify the interval between Probe
Requests, which the Proxy IP/MPLS Router is responsible for
carrying out on the Initiator's behalf.
Proxy Ping
o All of the requirements mentioned above for "Next-Gen Ping", plus:
o The Initiator MUST be able to specify the number of Probe Requests
and interval between Probe Requests, which the Proxy IP/MPLS
Router is responsible for carrying out on the Initiator's behalf.
Next-Gen OAM Traceroute/Ping Probe Replies MUST capture error
conditions that were encountered during an unsuccessful Probe
Request. Those replies are expected to capture not only the
conditions defined by classic [ICMP] (e.g., the Destination
Unreachable type), but also new error conditions specific to NG-OAM
solutions. In order to seamlessly accommodate future error
conditions, NG-OAM solutions MUST use a TLV format for specifying
error conditions in Probe Replies.
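A TLV encoding of error conditions might look like the following
minimal sketch. The 16-bit type and 16-bit length fields in network
byte order, and the two type codes, are assumptions made for
illustration; this document does not define a TLV layout or assign
any code points.

```python
import struct
from typing import List, Tuple

# Hypothetical type codes, for illustration only; none are assigned anywhere.
ERR_DEST_UNREACHABLE = 1
ERR_HASH_INPUT_UNSUPPORTED = 2

_HEADER = struct.Struct("!HH")  # assumed layout: 16-bit type, 16-bit length


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Prefix the value with its type and length."""
    return _HEADER.pack(tlv_type, len(value)) + value


def decode_tlvs(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a byte string into (type, value) pairs, rejecting truncation."""
    tlvs = []
    offset = 0
    while offset < len(data):
        if offset + _HEADER.size > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = _HEADER.unpack_from(data, offset)
        offset += _HEADER.size
        if offset + length > len(data):
            raise ValueError("truncated TLV value")
        tlvs.append((tlv_type, data[offset:offset + length]))
        offset += length
    return tlvs
```

Because each element carries its own length, a receiver that does not
recognize a type can skip it and continue parsing, which is what
allows new error conditions to be added later.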
Intra-AS probe requests (and probe replies) MUST be easily
identifiable in the data plane, in order that routers acting on NG-
traceroute or NG-ping requests (or replies) can rapidly drop them in
order to avoid resource exhaustion. NG-traceroute and NG-ping
solutions MUST provide configurable methods to rate-limit the number
of Intra-AS request (or reply) packets to prevent resource
exhaustion.
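One common way to provide a configurable rate limit is a token
bucket. The sketch below is an illustration rather than a required
mechanism; the class name and parameters are hypothetical, and the
clock is injectable only so that the behavior can be exercised
deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Admit at most `rate_pps` probe packets per second, with bursts up to `burst`."""

    def __init__(self, rate_pps: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate_pps = rate_pps
        self.burst = burst
        self._clock = clock
        self._tokens = float(burst)  # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Return True if one more packet may be processed now."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self._tokens = min(float(self.burst),
                           self._tokens + (now - self._last) * self.rate_pps)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A device would call allow() for each received NG-OAM request or reply
and drop the packet when it returns False. Both parameters would be
exposed as operator configuration.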
5.2. Inter-AS Requirements
Inter-AS requirements are applicable across administrative domains,
such as the Internet or, perhaps, several MPLS service providers
delivering a single MPLS VPN solution. Because devices are not under
a single administrative authority, only a limited amount of
diagnostic information is to be returned to the Initiator of a ping/
traceroute request. This information is primarily useful in the
context of helping the responsible party pinpoint the specific
location of a problem. For example, Customer A may be experiencing
packet loss in Service Provider A's network for its Internet service.
The link between Customer A and Service Provider A consists of an
ECMP path between SP A's ASBR and Customer A's ASBR. Customer A can
perform an NG-traceroute through this ECMP path and provide the
output to SP A's NOC in order to more rapidly identify the particular
component-link that is causing the problem. Other examples where
this is useful are over Internet (IPv4 or IPv6) peering/transit links
and within data centers, from servers through to the data center
provider's ASBR attached to several SPs, where MPLS is not used.
Inter-AS probe requests (and probe replies) MUST be easily
identifiable in the data plane, in order that routers acting on NG-
traceroute or NG-ping requests (or replies) can rapidly drop them in
order to avoid resource exhaustion. NG-traceroute and NG-ping
solutions MUST provide configurable methods to rate-limit the number
of Inter-AS request (or reply) packets to prevent resource
exhaustion.
Next-Gen Traceroute
o MUST work for IP and MPLS
o MUST be able to specify a 5-tuple IPv4 or IPv6 "flow" in a Probe
Request
o MUST be able to specify the MPLS label stack used to identify a
"flow" across an MPLS-only portion of the network in a Probe
Request.
o MUST be able to specify the Layer-2 (e.g., Ethernet) header used
in a Probe Request.
o MUST be able to specify a combination of label stack and IP
5-tuple, if both are used in the ECMP/LAG hash algorithm.
o MUST capture the following information in a Probe Reply:
* Incoming Interface Name
* Outgoing Interface Name
Next-Gen Ping
o MUST work for IP and MPLS
o MUST be able to specify a 5-tuple IPv4 or IPv6 "flow" in a Probe
Request
o MUST be able to specify the MPLS label stack used to identify a
"flow" across an MPLS-only portion of the network in a Probe
Request.
o MUST be able to specify the Layer-2 (e.g., Ethernet) header used
in a Probe Request.
Proxy Ping/Traceroute requirements are not applicable to Inter-AS
scenarios, since the risk of resource starvation is too large.
5.3. MTU considerations
Traceroute probes need to be kept to minimal size. Traceroute reply
PDUs should be no larger than 1500 bytes in order to avoid the need
for IP fragmentation. It is a safe assumption that operators have a
minimum of 1500 Bytes for IP MTU, and often significantly larger.
Optionally, path MTU discovery may be used to determine a minimum
MTU. The MTU values MUST be configurable by the operator to adjust
to unanticipated conditions. A Traceroute reply MAY span
multiple packets.
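The following sketch shows one way a reply could be spread over
several packets without exceeding a configurable MTU. It assumes the
reply is already a list of encoded elements (for example TLVs) that
must not be split, and it assumes 28 bytes of per-packet overhead (a
20-byte IPv4 header without options plus an 8-byte transport or ICMP
header). The function name and these values are illustrative
assumptions, not part of this document.

```python
from typing import List

DEFAULT_MTU = 1500  # bytes; operator-configurable


def pack_reply(elements: List[bytes], mtu: int = DEFAULT_MTU,
               overhead: int = 28) -> List[bytes]:
    """Group encoded reply elements into payloads that each fit within the MTU."""
    budget = mtu - overhead
    if budget <= 0:
        raise ValueError("MTU is smaller than the per-packet overhead")
    packets: List[bytes] = []
    current = b""
    for item in elements:
        if len(item) > budget:
            raise ValueError("single element exceeds the per-packet budget")
        # Start a new packet when the next element would not fit.
        if current and len(current) + len(item) > budget:
            packets.append(current)
            current = b""
        current += item
    if current:
        packets.append(current)
    return packets
```

Keeping each element whole means every packet can be parsed on its
own, so the loss of one packet does not invalidate the others.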
5.4. Extensibility
It would be useful for the "next-generation" traceroute and ping
protocols to carry TLVs, so that they can be easily extended in the
future to accommodate additional capabilities.
5.5. Path Capabilities
In order to be certain that NG-ping or NG-traceroute will be able to
properly exercise component-links in a LAG and/or ECMP path through
the network, it is necessary to determine if all devices along a
specific path are capable of supporting the requisite protocols and
replying with appropriate results back to the originator of the NG-
ping or NG-traceroute request. There are potentially two methods
that can be employed to determine these capabilities: 1) path
discovery; or, 2) encoding special/reserved codepoints into the
packet header of NG-OAM request/reply packets. With the first
method, the originating host/router could use a path discovery
function to determine the capabilities and properties of intermediate
and/or terminating devices prior to actually using NG-ping or NG-
traceroute to test the data path. Once the originating host/router
has learned the characteristics of intermediate and/or terminating
devices, it could then originate an NG-ping/traceroute request using
that information to exercise the actual data path.
The second method would mark NG-OAM request/reply packets with
specific values in the packet header (for example, via new ICMP
type/codes or MPLS label values). In this approach, the originating
host/router can simply launch an NG-ping/traceroute request, allowing
each intermediate and/or terminating device to independently
determine whether it is capable of supporting the NG-OAM request
and, concurrently, to exercise the component-links appropriate to
the LAG and/or ECMP path.
Although the latter approach has the potential disadvantage that it
may be more difficult to support on some existing hardware, this
document recognizes that it is the superior approach of the two
choices. If one depends on, for example, NG-traceroute to "discover"
characteristics of a path before allowing one to ping, it creates a
circular dependency. Specifically, in the case where one is doing
perpetual pings and the underlying path changes for legitimate
reasons, the NG-OAM would have to discover the change to the path,
trigger a new NG-traceroute and then resume perpetual pings along the
new path. Note that a change to the existing path could consist of
any of the following: 1) a component-link in a LAG goes down, yet
the LAG itself remains operational (e.g., a 10x LAG goes to a 9x
LAG), ultimately changing the result of the LAG hashing algorithm; or 2)
the entire LAG and/or ECMP path goes down and data packets are routed
along an alternate path. Ultimately, if each NG-OAM packet is a
self-contained, autonomous OAM unit, then each intermediate and/or
terminating device will act on it appropriately.
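The effect of a 10x LAG becoming a 9x LAG can be illustrated with a
simple modulo-based hash. The sketch assumes, for illustration only,
that the link is chosen as CRC-32 of a flow key modulo the number of
member links; actual LAG hash algorithms differ between
implementations. The function names are hypothetical.

```python
import zlib
from typing import Iterable, List


def link_for(flow_key: bytes, num_links: int) -> int:
    """Assumed hash: CRC-32 of the flow key, modulo the member-link count."""
    return zlib.crc32(flow_key) % num_links


def moved_flows(flow_keys: Iterable[bytes], before: int, after: int) -> List[bytes]:
    """Return the flows whose selected link index changes with the member count."""
    return [k for k in flow_keys
            if link_for(k, before) != link_for(k, after)]


if __name__ == "__main__":
    keys = [f"flow-{i}".encode("ascii") for i in range(1000)]
    moved = moved_flows(keys, before=10, after=9)
    print(f"{len(moved)} of {len(keys)} flows changed link")
```

With a plain modulo and uniformly distributed hash values, a flow
keeps its index only when the hash value modulo 90 is less than 9, so
roughly nine in ten flows would be expected to move. Path information
discovered before the change would therefore be stale for most flows,
which is the circular dependency described above.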
Therefore, this document specifies that an NG-OAM solution MUST
support the second method, autonomous OAM units, outlined above.
NG-OAM solutions MAY support the first method, to provide short-term
NG-OAM coverage with existing hardware.
5.6. Per Hop Behavior Modification
Modification of per-hop behavior in order to support NG-OAM is
acceptable, but not required of NG-OAM solutions. This allows
solutions where intermediate routers have to look at something new to
determine if they are looking at an OAM packet, or to determine if
they are the target or Proxy of an NG-OAM request.
6. IANA Considerations
This document makes no request of IANA.
Note to RFC Editor: this section may be removed on publication as an
RFC.
7. Security Considerations
Devices MUST rate-limit the amount of traceroute and/or ping traffic
they process to avoid DoS attacks. Those rate-limits MUST be
configurable to suit the appropriate environment in which they are
deployed. An attacker must not be allowed to force an inordinate
amount of traceroute and/or ping traffic down a single physical
component-link causing congestion. Therefore, devices MUST rate-
limit the amount of "external" traceroute and/or ping traffic through
any specific component-link or set of component-links. Note that
implementations SHOULD provide exceptions that allow a network
operator's Intra-Domain traceroute and/or ping traffic, particularly
for performance monitoring, to get through without interference by
rate-limiters.
A lightweight authentication method SHOULD be provided by an NG-OAM
solution. This mechanism can be used to defend against DoS or
insertion attacks from other systems spoofing NG-OAM information.
This can also be used in a reply message to defend against an "SLA
Violation" attack, where a malicious system could make it appear as
if an operator's network has violated the SLA when, in fact, it has
not.
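One possible lightweight mechanism is a keyed message authentication
code appended to each NG-OAM message. The sketch below uses
HMAC-SHA-256 with a pre-shared key; the choice of algorithm, the tag
placement, and the function names are assumptions for illustration,
since this document does not select an authentication method.

```python
import hashlib
import hmac

TAG_LEN = 32  # size of a SHA-256 digest in bytes


def sign(message: bytes, key: bytes) -> bytes:
    """Append an HMAC-SHA-256 tag computed over the message."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return message + tag


def verify(packet: bytes, key: bytes) -> bytes:
    """Return the message if the trailing tag is valid, else raise ValueError."""
    if len(packet) < TAG_LEN:
        raise ValueError("packet too short to carry a tag")
    message, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return message
```

A tag of this kind lets a receiver reject spoofed or altered replies,
but on its own it does not prevent replay of a previously valid
message; a sequence number or timestamp covered by the tag would be
needed for that.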
8. Acknowledgements
The authors would like to thank Nitin Bahadur, Ping Pan, Nasser El-
Aawar, and Dimitri Papadimitriou for their reviews and thoughtful
feedback.
9. References
9.1. Informative References
[BFD-BASE]
"draft-ietf-bfd-base-07.txt - Bidirectional Forwarding
Detection", January 2008.
[LLDP] "IEEE Standard - 802.1AB-2005", May 2005.
[LMP] "RFC 4204 - Link Management Protocol", October 2005.
[PROXY-LSP-PING]
George Swallow and Vanson Lim, "Proxy LSP Ping,
draft-ietf-mpls-remote-lsp-ping-01.txt", November 2007.
[RSVP-DIAG]
"RFC 2745 - RSVP Diagnostic Messages", January 2000.
9.2. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
9.3. References
[ICMP] Postel, J., "Internet Control Message Protocol", STD 5,
RFC 792, September 1981.
Authors' Addresses
Shane Amante
Level 3 Communications, LLC
1025 Eldorado Blvd
Broomfield, CO 80021
Email: shane.amante@level3.com
Alia Atlas
BT
Email: alia.atlas@bt.com
Andrew Lange
Alcatel-Lucent
Email: andrew.lange@alcatel-lucent.com
Danny McPherson
Arbor Networks, Inc.
Email: danny@arbor.net
Full Copyright Statement
Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgment
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).