A Connectivity Monitoring Metric for IPPM
draft-ietf-ippm-connectivity-monitoring-01
The information below is for an old version of the document.
| Document | Type | This is an older version of an Internet-Draft whose latest revision state is "Active". | |
|---|---|---|---|
| Author | Ruediger Geib | ||
| Last updated | 2021-02-22 (Latest revision 2020-12-23) | ||
| Replaces | draft-geib-ippm-connectivity-monitoring | ||
| RFC stream | Internet Engineering Task Force (IETF) | ||
| Formats | |||
| Additional resources | Mailing list discussion | ||
| Stream | WG state | WG Document | |
| Document shepherd | (None) | ||
| IESG | IESG state | I-D Exists | |
| Consensus boilerplate | Unknown | ||
| Telechat date | (None) | ||
| Responsible AD | (None) | ||
| Send notices to | (None) |
ippm R. Geib, Ed.
Internet-Draft Deutsche Telekom
Intended status: Standards Track February 22, 2021
Expires: August 26, 2021
A Connectivity Monitoring Metric for IPPM
draft-ietf-ippm-connectivity-monitoring-01
Abstract
Within a Segment Routing domain, segment routed measurement packets
can be sent along pre-determined paths. This enables new kinds of
measurements. Connectivity monitoring allows supervising the state
and performance of a connection or a (sub)path from one or a few
central monitoring systems. This document specifies a suitable
type-P connectivity monitoring metric.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 26, 2021.
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
Geib Expires August 26, 2021 [Page 1]
Internet-Draft Abbreviated Title February 2021
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
1.1. Requirements Language
2. A brief segment routing connectivity monitoring framework
3. Topology and measurement loop set up requirements
3.1. General network topology requirements
3.2. Sub-path Monitoring measurement loop routing requirements
3.3. Path
4. Generic Type-P-SR-Path-Periodic-* metric
4.1. Metric Name
4.2. Generic Metric Parameters
4.3. Metric Units
5. Singleton Definition for Type-P-SR-Path-Periodic-Delay
5.1. Metric Name
5.2. Metric Parameters
5.3. Delay Metric Units
5.4. Definition
5.5. Discussion
5.6. Methodologies
5.7. Errors and Uncertainties
5.8. Reporting the metric
6. Definition of Samples for Type-P-SR-Path-Periodic-Delay
6.1. Generic Type-P-SR-Path-Periodic-Delay-* metric
6.1.1. Metric Name
6.1.2. Metric Parameters
6.1.3. Metric Units
6.1.4. Metric Definition
6.1.5. Discussion
6.1.6. Errors and uncertainties
6.2. Definition of Type-P-SR-Path-Periodic-Delay-Stream
6.2.1. Metric Name
6.3. Definition of Type-P-SR-Path-Periodic-Delay-Variation
6.3.1. Metric Name
6.3.2. Methodologies
6.3.3. Discussion of SRDV
6.3.4. Errors and uncertainties
6.4. Definition of Type-P-SR-Path-Periodic-Delay-Variation-Stream
6.4.1. Metric Name
6.4.2. Metric Definition
7. Statistic Definitions for SR-Path-Periodic-*-Stream samples
7.1. SR-Path-Periodic-*-Mean
7.2. SR-Path-Periodic-*-Std
8. Sub-Path monitoring metrics derived from samples captured along the measurement loops
8.1. Baseline measurement
8.2. Discussion of the baseline measurement
8.3. Definition of SR-Path-Sub-Path-RTD-Estimate
8.4. Definition of SR-Path-Sub-Path-*-Changepoint
8.5. Discussion of SR-Path-Sub-Path-*-Changepoint
8.6. Definition of SR-Path-Sub-Path-Congestion-Location
8.7. Discussion of SR-Path-Sub-Path-*-Location
9. Discussion of Temporal Resolution
10. IANA Considerations
11. Security Considerations
12. References
12.1. Normative References
12.2. Informative References
Author's Address
1. Introduction
Within a Segment Routing domain, measurement packets can be sent
along pre-determined segment routed paths [RFC8402]. A segment
routed path may consist of pre-determined sub-paths, specific router
interfaces or a combination of both. A measurement path may also
consist of sub-paths spanning multiple routers, provided that all
segments needed to address the desired path are available and known at
the SR domain edge interface.
A Path Monitoring System or PMS (see [RFC8403]) is a dedicated
central Segment Routing (SR) domain monitoring device (as compared to
a distributed monitoring approach based on router-data and -functions
only). Monitoring individual sub-paths or point-to-point connections
is executed for different purposes. An IGP exchanges hello messages
between neighbors to keep routing adjacencies alive and to swiftly
adapt routing to topology changes. Network Operators may be
interested in monitoring connectivity and congestion of interfaces or
sub-paths at a timescale of seconds, minutes or hours. In both
cases, the measurement period is significantly shorter than that of
commodity interface monitoring based on router counters, which may be
collected on a minute timescale to keep the processor- or monitoring
data-load low.
The IPPM architecture was a first step in that direction [RFC2330].
Commodity IPPM solutions require dedicated measurement systems, a
large number of measurement agents and synchronised clocks.
Monitoring a domain from edge to edge by commodity IPPM solutions
increases scalability of the monitoring system. But localising the
site of a detected change in network behaviour may then require
network tomography methods.
The IPPM Metrics for Measuring Connectivity offer generic
connectivity metrics [RFC2678]. These metrics allow measuring
connectivity between end nodes without making any assumption on the
paths between them. The metric and the type-p packet specified by
this document follow a different approach: they are designed to
monitor connectivity and performance of a specific single link or a
path segment. The underlying definition of connectivity is partially
the same: a packet not reaching a destination indicates a loss of
connectivity. An IGP re-route may indicate a loss of a link, while
it might not cause loss of connectivity between end systems. The
metric specified here detects a link loss if the end-to-end delay
along the new route differs from that of the original path.
A Segment Routing PMS is part of an SR domain. The PMS is IGP
topology aware, covering the IP and (if present) the MPLS layer
topology [RFC8402]. This allows steering PMS measurement packets
along arbitrary pre-determined concatenated sub-paths, identified by
suitable Segment IDs. Basically, the SR connectivity metric as
specified by this document requires set up of a number of
constrained, overlaid measurement loops (or measurement paths). The
delay of the packets sent along each of these measurement loops is
measured. A single congested interface or a single loss of
connectivity of a monitored sub-path causes a delay change on several
measurement paths. Any single event of that type on one of the
monitored sub-paths changes delays of a unique subset of measurement
loops. The number of measurement loops may be limited to one per
sub-path (or connection) to be monitored, if a hub-and-spoke like
sub-path topology as described below is monitored. In addition to
information revealed by a commodity ICMP ping measurement, the
metrics and methods specified here identify the location of a
congested interface. To do so, tomography assumptions and methods
are combined to first plan the overlaid SR measurement loop set up
and later on to evaluate the captured delay measurements.
There's another difference as compared with commodity ping: the
measurement loop packets remain in the data plane of the routers they
pass. These routers only need to forward the measurement packets;
no additional processing is required.
It is recommended to consider automated measurement loop set-ups.
The methods proposed here are error-prone if the topology and
measurement loop design isn't followed properly. While details of an
automated set-up are not within scope of this document, some formal
definitions of the constraints to be respected are given.
This document specifies a type-p metric determining properties of an
SR path. The metric allows monitoring connectivity and congestion of
interfaces and further allows locating the path or interface which
caused a change in the reported type-p metric. This document is
limited to the Segment Routing MPLS layer, but the methodology may be
applied within SR domains or MPLS domains in general.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. A brief segment routing connectivity monitoring framework
The Segment Routing IGP topology information consists of the IP and
(if present) the MPLS layer topology. The minimum SR topology
information consists of Node-Segment-Identifiers (Node-SID),
identifying an SR router. The IGP exchange of Adjacency-SIDs
[RFC8667], which identify local interfaces to adjacent nodes, is
optional. It is RECOMMENDED to distribute Adj-SIDs in a domain
operating a PMS to monitor connectivity as specified below. If
Adj-SIDs aren't available, [RFC8287] provides methods to steer
packets along desired paths by the proper choice of an MPLS
Echo-request IP-destination address. A detailed description of [RFC8287]
methods as a replacement of Adj-SIDs is out of scope of this
document.
An active round trip measurement between two adjacent nodes is a
simple method to monitor connectivity of a connecting link. If
multiple links are operational between two adjacent nodes and only a
single one fails, a single plain round trip measurement may fail to
notice that or identify which link has failed. A round trip
measurement also fails to identify which interface is congested, even
if only a single link connects two adjacent nodes.
Segment Routing enables the set-up of extended measurement loops.
Several different measurement loops can be set up to form a partial
overlay. If done properly, any network change impacts more than a
single measurement loop's round trip delay (or causes drops of
packets of more than one loop). Randomly chosen measurement loop
paths including the interfaces or paths to be monitored may fail to
produce the desired unique result patterns, hence commodity network
tomography methods aren't applicable here [CommodityTomography]. The
approach pursued here uses a pre-specified measurement loop overlay
design.
A centralised monitoring approach doesn't require report collection
and result correlation from two (or more) receivers (the measured
delays of different measurement loops still need to be correlated).
An additional property of the measurement path set-up specified below
is that it allows estimating the packet round trip and the one way
delay of a monitored sub-path. The delay along a single link is not
perfectly symmetric. Packet processing causes small delay
differences per interface and direction. These cause an error, which
can't be quantified or removed by the specified method. Quantifying
this error requires a different measurement set-up. As this would
introduce additional measurement loops, packets and evaluations, the
cost in terms of reduced scalability is not felt to be worth the
benefit in measurement accuracy. IPPM metrics prefer precision to
accuracy and the mentioned processing differences are relatively
stable, resulting in relatively precise delay estimates for each
monitored sub-path.
An example hub and spoke network, operated as an SR domain, is shown
below. The PMS shown is meant to monitor the connectivity of all 6
links (a very generic kind of sub-path) attaching the spoke-nodes
L050, L060 and L070 to the hub-nodes L100 and L200.
+---+ +----+ +----+
|PMS| |L100|-----|L050|
+---+ +----+\ /+----+
| / \ \_/_____
| / \ / \+----+
+----+/ \/_ +----|L060|
|L300| / |/ +----+
+----+\ / /\_
\ / / \
\+----+ / +----+
|L200|-----|L070|
+----+ +----+
Hub and spoke connectivity verification with a PMS
Figure 1
The SID values are picked for convenient reading only. Node-SID: 100
identifies L100, Node-SID: 300 identifies L300 and so on. Adj-SID
10050: Adjacency L100 to L050, Adj-SID 10060: Adjacency L100 to L060,
Adj-SID 60200: Adjacency L060 to L200 and so on (note that Adj-SIDs
are locally assigned per node interface, meaning two per link).
Monitoring the 6 links between hub nodes Ln00 (where n=1,2) and spoke
nodes L0m0 (where m=5,6,7) requires 6 measurement loops, which have
the following properties:
o Each measurement loop follows a single round trip from one hub
Ln00 to one spoke L0m0 (e.g., between L100 and L050).
o Each measurement loop passes two more links: one between the same
hub Ln00 and another spoke L0m0 and from there to the alternate
hub Ln00 (e.g., between L100 and L060 and then from L060 to L200)
o Every monitored link is passed in round trip fashion by exactly
one measurement loop and unidirectionally by exactly two other
loops. These unidirectional measurement loop sections forward
packets in opposing directions along the monitored link. In the
end, three measurement loops pass each single monitored link
(sub-path). In Figure 1, e.g., one measurement loop has a round
trip L100 to L050 and back (M1, see below), a second loop passes
L100 to L050 only (M3) and a third loop passes L050 to L100 only
(M6).
Note that any 6 links connecting two to five nodes can be monitored
that way too. Further note that the measurement loop overlay chosen
is optimised for 6 links and a hub and spoke topology of two to five
nodes. The 'one measurement loop per measured sub-path' paradigm
only works under these conditions.
The above overlay scheme results in 6 measurement loops for the given
example. The start and end of each measurement loop is PMS to L300
to L100 or L200 and a similar sub-path on the return leg. These
parts of the measurement loops are omitted here for brevity (some
discussion may be found below). The following delays are measured
along the SR paths of each measurement loop:
1. M1 is the delay along L100 -> L050 -> L100 -> L060 -> L200
2. M2 is the delay along L100 -> L060 -> L100 -> L070 -> L200
3. M3 is the delay along L100 -> L070 -> L100 -> L050 -> L200
4. M4 is the delay along L200 -> L050 -> L200 -> L060 -> L100
5. M5 is the delay along L200 -> L060 -> L200 -> L070 -> L100
6. M6 is the delay along L200 -> L070 -> L200 -> L050 -> L100
An example of a Node-SID segment stack for the loop capturing M1 is
(top to bottom): 100 | 050 | 100 | 060 | 200 | PMS.
An example of an Adj-SID segment stack for the loop resulting in M1
is (top to bottom): 100 | 10050 | 50100 | 10060 | 60200 | PMS. As
can be seen, the Node-SIDs 100 and PMS are present at top and bottom
of the segment stack. Their purpose is to transport the packet from
the PMS to the start of the measurement loop at L100 and return it to
the PMS from its end.
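The two example stacks can be reproduced with a minimal sketch. It assumes the purely illustrative SID numbering of the example network (an Adj-SID is the concatenation of the two node numbers); the helper names are hypothetical and not part of this specification.

```python
# Illustrative only: SID values follow the example network's reading convention.

def node_sid(node: int) -> str:
    """Node-SID label as printed in the example (three digits)."""
    return f"{node:03d}"

def adj_sid(src: int, dst: int) -> str:
    """Adj-SID label of interface src -> dst, e.g. 100 -> 50 gives 10050."""
    return f"{src}{dst}"

def node_sid_stack(loop: list[int]) -> list[str]:
    """Node-SID stack (top to bottom) for a loop, returning to the PMS."""
    return [node_sid(n) for n in loop] + ["PMS"]

def adj_sid_stack(loop: list[int]) -> list[str]:
    """Node-SID of the first hub, one Adj-SID per hop, then the PMS."""
    hops = [adj_sid(a, b) for a, b in zip(loop, loop[1:])]
    return [node_sid(loop[0])] + hops + ["PMS"]

M1 = [100, 50, 100, 60, 200]  # L100 -> L050 -> L100 -> L060 -> L200
print(" | ".join(node_sid_stack(M1)))  # 100 | 050 | 100 | 060 | 200 | PMS
print(" | ".join(adj_sid_stack(M1)))   # 100 | 10050 | 50100 | 10060 | 60200 | PMS
```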
Evaluating the measurement loop Round Trip Delays M1 - M6 allows
detecting the following state changes of the monitored sub-paths:
o If the loops are set up using Node-SIDs only, any single complete
loss of connectivity caused by a failing single link between any
Ln00 and any L0m0 node briefly disturbs three loops (and changes
their measured delay). The traffic to the Node-SIDs is rerouted
(in the case of a single link loss, no node is completely
disconnected in the example network).
o If the loops are set up using Adj-SIDs only, any single complete
loss of connectivity caused by a failing single link between any
Ln00 and any L0m0 node terminates the traffic along three
measurement loops. The packets of all three loops will be
dropped, until the link gets back into service. Traffic to
Adj-SIDs is not rerouted. Note that Node-SIDs may be used to forward
the measurement packets from the PMS to the hub node, where the
first sub-path to be monitored begins and from the hub node,
receiving the measurement from the last monitored sub path, to the
PMS.
o Any congested single interface between any Ln00 and any L0m0 node
only impacts the measured delay of two measurement loops.
o As an example, the formula for a single link (sub-path) Round Trip
Delay (RTD) is shown here: 4 * RTD_L100-L050-L100 = 3 * M1 + M3 +
M6 - M2 - M4 - M5. This formula is reproducible for all other
links: take three times the RTD measured along the loop passing the
monitored link of interest in round trip fashion, and add the RTDs
of the two measurement loops passing the link of interest only in a
single direction. From this sum subtract the RTDs measured on all
loops not passing the monitored link of interest to get four times
the RTD of the monitored link of interest.
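The link RTD formula can be checked with a small sketch. The one-way link delays below are hypothetical values, assumed symmetric per link and with the PMS access legs omitted; the function names are illustrative only.

```python
# Measurement loops M1..M6 of the example network (PMS access legs omitted).
LOOPS = {
    "M1": [100, 50, 100, 60, 200],
    "M2": [100, 60, 100, 70, 200],
    "M3": [100, 70, 100, 50, 200],
    "M4": [200, 50, 200, 60, 100],
    "M5": [200, 60, 200, 70, 100],
    "M6": [200, 70, 200, 50, 100],
}

# Hypothetical one-way delays in ms, assumed equal in both directions.
ONE_WAY = {
    frozenset((100, 50)): 1.0, frozenset((100, 60)): 2.0,
    frozenset((100, 70)): 3.0, frozenset((200, 50)): 4.0,
    frozenset((200, 60)): 5.0, frozenset((200, 70)): 6.0,
}

def hops(path):
    """Directed hops (a, b) of a loop path."""
    return list(zip(path, path[1:]))

def loop_delay(path):
    """Delay a loop would measure under the assumed link delays."""
    return sum(ONE_WAY[frozenset(h)] for h in hops(path))

def link_rtd_estimate(hub, spoke, measured):
    """4*RTD = 3*M(round trip) + M(down) + M(up) - M(three other loops)."""
    total = 0.0
    for name, path in LOOPS.items():
        h = hops(path)
        down, up = (hub, spoke) in h, (spoke, hub) in h
        if down and up:
            total += 3 * measured[name]
        elif down or up:
            total += measured[name]
        else:
            total -= measured[name]
    return total / 4

measured = {name: loop_delay(path) for name, path in LOOPS.items()}
print(link_rtd_estimate(100, 50, measured))  # 2.0
```

With asymmetric per-direction delays the terms of the two links not of interest no longer cancel exactly, which is the error source discussed earlier in this section.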
A closer look reveals that each single event of interest for the
proposed metric, which is either a loss of connectivity or a case of
congestion, impacts a unique set of measurement loops which can be
determined a priori. If, e.g., connectivity is lost
between L200 and L050, measurement loops (3), (4) and (6) indicate a
change in the measured delay.
As a second example: if the interface L070 to L100 is congested,
measurement loops (3) and (5) indicate a change in the measured
delay. Without listing all events, all cases of single losses of
connectivity or single events of congestion influence only delay
measurements of a unique set of measurement loops.
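The uniqueness of the impacted loop sets can be verified mechanically. The following sketch derives, for the example loops M1 to M6, the set of loops passing each link and each directed interface, and looks up the element matching an observed set of changed loops. The function names are hypothetical.

```python
from collections import defaultdict

# Measurement loops M1..M6 of the example network (PMS access legs omitted).
LOOPS = {
    "M1": [100, 50, 100, 60, 200],
    "M2": [100, 60, 100, 70, 200],
    "M3": [100, 70, 100, 50, 200],
    "M4": [200, 50, 200, 60, 100],
    "M5": [200, 60, 200, 70, 100],
    "M6": [200, 70, 200, 50, 100],
}

def loop_signatures(loops):
    """Map each link and each directed interface to the loops passing it."""
    by_link = defaultdict(set)
    by_interface = defaultdict(set)
    for name, path in loops.items():
        for a, b in zip(path, path[1:]):
            by_link[frozenset((a, b))].add(name)
            by_interface[(a, b)].add(name)
    return dict(by_link), dict(by_interface)

def locate(changed, by_link, by_interface):
    """Return the link or interface whose loop set equals the changed loops."""
    for table in (by_link, by_interface):
        for element, signature in table.items():
            if signature == set(changed):
                return element
    return None

by_link, by_interface = loop_signatures(LOOPS)
print(locate({"M3", "M4", "M6"}, by_link, by_interface))  # link L200 - L050
print(locate({"M3", "M5"}, by_link, by_interface))        # interface L070 to L100
```

A link loss shows up as a set of three loops and a congested interface as a set of two loops, so the two event types cannot be confused with each other.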
Assume that the measurement loops are set up while there's no
congestion. In that case, the congestion free RTDs of all monitored
links can be calculated as shown above. A single congestion event
adds queuing delay to the RTD measured by two specific measurement
loops. The two measurement loops impacted allow identifying the
congested interface and calculating the queue depth in units of
time. As an example, assume a queue of an average depth of 20 ms
builds up at interface L200 to L070 after the uncongested
measurement interval T0. The measurement loops M5 and M6 are the
only ones passing the interface in that direction. Both M5 and M6
indicate a delay increase of 20 ms during measurement interval T1,
while M1-4 indicate no change. The location of the congested
interface is determined by the combination of the two (and only two)
measurement loops M5 and M6 showing an increased delay. The average
queue depth = ( M5[T1] - M5[T0] + M6[T1] - M6[T0] )/2.
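The queue depth estimate of this example can be sketched as follows. The per-loop RTD values and the 1 ms detection threshold are hypothetical; only the averaging over the two impacted loops follows the text.

```python
def impacted_loops(t0, t1, threshold_ms=1.0):
    """Loops whose delay grew by more than a (hypothetical) threshold."""
    return sorted(m for m in t0 if t1[m] - t0[m] > threshold_ms)

def average_queue_depth(t0, t1, impacted):
    """Mean delay increase over the impacted loops between T0 and T1."""
    return sum(t1[m] - t0[m] for m in impacted) / len(impacted)

# Hypothetical mean RTDs in ms per loop during intervals T0 and T1.
t0 = {"M1": 9.0, "M2": 13.0, "M3": 11.0, "M4": 15.0, "M5": 19.0, "M6": 17.0}
t1 = dict(t0, M5=39.0, M6=37.0)  # 20 ms queue at interface L200 to L070

loops = impacted_loops(t0, t1)
print(loops, average_queue_depth(t0, t1, loops))  # ['M5', 'M6'] 20.0
```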
As mentioned, a constant delay is added to each measurement loop,
which is the delay of the path traversed from PMS -> L100 + L200 ->
PMS. Please note that this added delay appears twice in the formula
resulting in the monitored link delay estimate of the example
network, where it amounts to RTD PMS -> L100 + RTD L200 -> PMS.
Both RTDs can be directly measured by two additional measurements
Cor1 = RTD (PMS -> L100 -> PMS) and Cor2 = RTD (PMS -> L200 -> PMS).
The uncorrected monitored link RTD formula was 4*linkRTDxuncor =
3*Mx + My + Mz - Ms - Mt - Mu. The corrected value is 4*linkRTDx =
4*linkRTDxuncor - Cor1 - Cor2.
If the interface between PMS and L100/L200 is congested, all
measurement loops M1-M6 as well as Cor1 and Cor2 will see a change.
A congested interface of a monitored link doesn't impact the RTDs
captured by Cor1 and Cor2.
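As a worked example of the correction, the sketch below assumes hypothetical, symmetric access delays of 5 ms (PMS to L100) and 7 ms (PMS to L200), which add 12 ms to every loop, and a true link RTD of 2 ms. The function name and argument layout are illustrative.

```python
def corrected_link_rtd(m_round, m_down, m_up, m_others, cor1, cor2):
    """Link RTD from six loop RTDs, corrected for the PMS access legs."""
    uncorrected_x4 = 3 * m_round + m_down + m_up - sum(m_others)
    return (uncorrected_x4 - cor1 - cor2) / 4

# Hypothetical loop RTDs in ms for link L100 - L050, each including the
# 12 ms access offset: Mx = M1, My = M3, Mz = M6, others = M2, M4, M5.
rtd = corrected_link_rtd(21.0, 23.0, 29.0, [25.0, 27.0, 31.0],
                         cor1=10.0, cor2=14.0)
print(rtd)  # 2.0
```

The access offset enters the uncorrected sum as 3 + 1 + 1 - 3 = 2 times 12 ms, which equals Cor1 + Cor2 under the symmetry assumption.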
The measurement loops may also be set up between hub nodes L100 and
L200, if that's preferred and supported by the nodes. In that case,
the above formulas apply without correction.
3. Topology and measurement loop set up requirements
3.1. General network topology requirements
The metric and methods specified below can be applied to monitor
networks or sub-paths forming a hub and spoke topology. A single
sub-path status change of type loss of connectivity or congestion can
be detected. The nodes don't have to act as hubs or spokes; this
terminology is only chosen to describe a topology requirement. In
detail, the topology to be monitored MUST meet the following
constraints:
o The SR domain sub-paths to be monitored create a hub and spoke
topology with a PMS connected to all hub nodes. The PMS may
reside in a hub.
o Exactly 6 (six) sub-paths are monitored.
o The monitored sub-paths connect at least two and no more than 5
nodes.
o Every spoke node MUST have at least one path to every hub node.
o Every spoke node MUST be connected to one or more hub nodes by at
least two monitored sub-paths.
o Sub-paths between spokes can't be monitored and therefore are out
of scope (the overlay measurement loops can't be set up as
desired).
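Some of the constraints above lend themselves to an automated pre-check. The sketch below is a partial check only: it assumes each monitored sub-path is given as a hypothetical (hub, spoke) pair, reads the spoke constraint as "at least two monitored sub-paths per spoke", and does not verify reachability of every hub from every spoke, which requires the full IGP topology.

```python
from collections import Counter

def topology_problems(sub_paths, hubs):
    """Return violated constraints; sub_paths is a list of (hub, spoke)."""
    problems = []
    if len(sub_paths) != 6:
        problems.append("exactly 6 sub-paths must be monitored")
    nodes = {node for pair in sub_paths for node in pair}
    if not 2 <= len(nodes) <= 5:
        problems.append("sub-paths must connect 2 to 5 nodes")
    if any(h not in hubs or s in hubs for h, s in sub_paths):
        problems.append("every sub-path must connect a hub to a spoke")
    per_spoke = Counter(s for _h, s in sub_paths)
    for spoke, count in sorted(per_spoke.items()):
        if count < 2:
            problems.append(f"spoke {spoke} has fewer than two sub-paths")
    return problems

example = [(100, 50), (100, 60), (100, 70), (200, 50), (200, 60), (200, 70)]
print(topology_problems(example, hubs={100, 200}))      # []
print(topology_problems(example[:5], hubs={100, 200}))  # two problems reported
```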
Shared resources, like a Shared Risk Link Group (e.g., a single fiber
bundle) or a shared queue passed by several logical links need to be
considered during set up. Sharing resources may be either desirable or
undesirable. As an example, if a set of logical links share one
parental scheduler queue, it is sufficient to monitor a single
logical connection to monitor the state of that parental scheduler.
3.2. Sub-path Monitoring measurement loop routing requirements
The methodologies specified by this document REQUIRE a measurement
loop path overlay of all path delay measurement streams Fi, i in [1,
2...6] as defined in this section. In the following, a path delay
measurement stream Fi is called measurement (loop) Fi for brevity.
o Define the segment routed Sub-paths SPi, i in [1, 2...6] to be
monitored. The Sub-paths SPi SHOULD NOT share resources, if the
operator isn't aware of the impact of the shared resources on the
measurement loops Fi and the methodologies defined below. The
Sub-path SPi topology SHOULD respect the general network topology
requirements as specified above.
o Set up i = 1, 2...6 measurement loops Fi such that measurement Fi
passes SPi and only SPi bidirectionally (or by a round-trip) from
Hub to Spoke and back. Note that the correspondence of SPi and Fi
isn't strictly required. Measurement Fi, however, appears in
all methodologies calculating a metric related to SPi.
o Set up the SR path per measurement loops Fj and Fk such that SPi
is passed by exactly one other measurement loop Fj unidirectionally
in direction Hub to Spoke and by exactly one other measurement
loop Fk unidirectionally in the opposite direction (Spoke to Hub).
The measurement loops Fi, Fj and Fk are pairwise different. As a
description, one measurement loop Fj passes SPi in "downstream"
direction from Hub to Spoke, whereas measurement loop Fk passes SPi
in "upstream" direction from Spoke to Hub.
o Set up each segment routed measurement loop path Fi such that it
passes SPi bidirectionally as specified above, SPj unidirectionally
from Hub to Spoke and SPk unidirectionally from Spoke to Hub. The
monitored Sub-path SPi MUST NOT be equal to SPj and MUST NOT be
equal to SPk.
o The measurement loop set up to monitor all Sub-paths SPi is
completed, if:
* Each Sub-path SPi is passed by exactly three measurement loops
Fi, Fj and Fk as specified above.
* Each segment routed measurement loop path Fi passes exactly
three concatenated Sub-paths SPi, SPm and SPn as specified
above (indices m and n are chosen here only to avoid
misconceptions which may result from picking indices j and k
already appearing before - equality of j and k with either m
or n is neither excluded nor required).
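The completion criteria above can be checked automatically, as recommended in the Introduction. The following sketch assumes that every sub-path SPi is a single hub-spoke link given as a hypothetical (hub, spoke) pair and that each loop Fi is a list of nodes; multi-hop sub-paths would need a richer representation.

```python
# Example overlay: loops F1..F6 correspond to M1..M6 of the example network.
LOOPS = {
    "F1": [100, 50, 100, 60, 200],
    "F2": [100, 60, 100, 70, 200],
    "F3": [100, 70, 100, 50, 200],
    "F4": [200, 50, 200, 60, 100],
    "F5": [200, 60, 200, 70, 100],
    "F6": [200, 70, 200, 50, 100],
}
SUB_PATHS = [(100, 50), (100, 60), (100, 70), (200, 50), (200, 60), (200, 70)]
EXPECTED = ["down", "round-trip", "up"]

def role(path, hub, spoke):
    """How a loop passes a sub-path: round-trip, down, up or not at all."""
    hops = list(zip(path, path[1:]))
    down, up = (hub, spoke) in hops, (spoke, hub) in hops
    if down and up:
        return "round-trip"
    return "down" if down else "up" if up else None

def overlay_is_complete(loops, sub_paths):
    """Check both completion criteria of the measurement loop set up."""
    for hub, spoke in sub_paths:  # each SPi: three loops, one per role
        roles = sorted(filter(None, (role(p, hub, spoke) for p in loops.values())))
        if roles != EXPECTED:
            return False
    for path in loops.values():   # each Fi: three sub-paths, one per role
        roles = sorted(filter(None, (role(path, h, s) for h, s in sub_paths)))
        if roles != EXPECTED:
            return False
    return True

print(overlay_is_complete(LOOPS, SUB_PATHS))  # True
```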
3.3. Path
This document specifies sub-path monitoring within a closed domain by
a controlled and pre-designed measurement loop set-up. The path
traversed by the packet SHOULD be reported, as verifying that data
plane forwarding is in line with the desired measurement loop set-up
is essential for an accurate evaluation of the metric.
See [RFC8287] for SR MPLS OAM and
[ID.draft-ietf-6man-spring-srv6-oam] for SRv6 OAM.
4. Generic Type-P-SR-Path-Periodic-* metric
To reduce the redundant information presented in the detailed metrics
sections that follow, this section presents the specifications that
are common to two or more metrics. The section is organized using
the same subsections as the individual metrics, to simplify
comparisons.
4.1. Metric Name
All metrics use the Type-P convention as described in [RFC2330]. The
rest of the name is unique to each metric.
4.2. Generic Metric Parameters
Refer to section 3.2. Metric Parameters: Type-P-* of [RFC6673]. The
following parameters are added, enhanced or removed:
Dst SHOULD be a diagnostic IP address as specified by [RFC8287]
and [RFC8029], if MPLS OAM is operated to capture the metric.
Fi, where i in [1, 2...6], a selection function defining
unambiguously a packet of one particular stream i forming part of
the monitoring overlay measurement loop set up.
L, a packet length in bits. The packets of all
Type-P-SR-Path-Periodic-Delay-Streams Fi SHOULD be of the same length.
MLAi, a stack of Segment IDs determining a monitoring loop Fi.
The Segment-IDs MUST be chosen so that a singleton type-p packet
of selection function Fi passes the sub-path i to be monitored.
Not supported: lambda (Poisson streams remain for further study).
4.3. Metric Units
Refer to section 3.4. Metric Units: Type-P-* of [RFC6673].
5. Singleton Definition for Type-P-SR-Path-Periodic-Delay
5.1. Metric Name
Type-P-SR-Path-Periodic-Delay
5.2. Metric Parameters
See Section 4.2.
5.3. Delay Metric Units
A sequence of consecutive time values. The value of a Type-P-SR-
Path-Periodic-Delay is either a real number or an undefined
(informally, infinite) number of seconds per singleton of each stream
Fi.
5.4. Definition
Section 3.4 of [RFC7679] applies per singleton of each stream Fi.
The additional information related to singletons of section 4.2.4 of
[RFC3432] applies too.
5.5. Discussion
See section 3.5 of [RFC7679]. One generalisation seems appropriate:
a global navigation satellite system affords one way to achieve
synchronization to within microseconds.
5.6. Methodologies
Section 3.6 of [RFC7679] applies per stream Fi with one exception: at
the Src host, select Src and Dst IP addresses, if IP-routing is
applied, or select the proper functional IP-destination address if an
[RFC8287] SR MPLS OAM packet format is applied. Further add the
appropriate stack of Segment IDs MLAi determining the monitoring loop
Fi and form a test packet of Type-P with these addresses and the
segment stack.
5.7. Errors and Uncertainties
See section 3.7 of [RFC7679] and section 4.6 of [RFC3432].
5.8. Reporting the metric
See section 3.8 of [RFC7679].
6. Definition of Samples for Type-P-SR-Path-Periodic-Delay
This section defines metric samples and metrics derived from
samples.
6.1. Generic Type-P-SR-Path-Periodic-Delay-* metric
To reduce the redundant information presented in the detailed metrics
sections that follow, this section presents the specifications that
are common to two or more metrics. The section is organized using
the same subsections as the individual metrics, to simplify
comparisons.
6.1.1. Metric Name
Type-P-SR-Path-Periodic-Delay-*
6.1.2. Metric Parameters
Src, the IP address of a host
Dst, the IP address of a host
MLAi, a stack of Segment IDs
T0, a time
Tf, a time
incT, a time
6.1.3. Metric Units
See Section 5.3.
6.1.4. Metric Definition
Given T0, Tf and the nominal inter-packet interval incT, the periodic
time values greater than or equal to T0 and less than or equal to Tf
are selected. At each of the selected times in this process, we
obtain one value of Type-P-SR-Path-Periodic-Delay. The value of the
sample is the sequence made up of the resulting [time, delay] pairs.
If there are no such pairs, the sequence is of length zero and the
sample is said to be empty.
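The sample definition can be illustrated by a short sketch. It assumes that the selected times are T0 + k*incT for k = 0, 1, 2, ... and uses a hypothetical callback measure_delay standing in for the actual singleton measurement; neither is prescribed by this document.

```python
def periodic_sample(t0, tf, inc_t, measure_delay):
    """Return the sample as a list of (time, delay) pairs.

    measure_delay(t) is a hypothetical hook returning the singleton delay
    in seconds for a packet sent at time t, or None if it is undefined.
    """
    sample = []
    k = 0
    while t0 + k * inc_t <= tf:
        t = t0 + k * inc_t
        sample.append((t, measure_delay(t)))
        k += 1
    return sample

# Constant 10 ms delay, one packet every 250 ms for one second.
sample = periodic_sample(0.0, 1.0, 0.25, lambda t: 0.010)
print(len(sample))                                        # 5
print(periodic_sample(2.0, 1.0, 0.25, lambda t: 0.010))  # [] (empty sample)
```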
6.1.5. Discussion
See section 4.4 of [RFC3432].
6.1.6. Errors and uncertainties
See section 4.6 of [RFC3432].
6.2. Definition of Type-P-SR-Path-Periodic-Delay-Stream
The only definition required for this metric is a unique metric name.
6.2.1. Metric Name
Type-P-SR-Path-Periodic-Delay-Stream
6.3. Definition of Type-P-SR-Path-Periodic-Delay-Variation
The smallest Type-P-SR-Path-Periodic-Delay-Stream sample consists of
two consecutively received values. These may be used to calculate a
Segment Routed Path Delay-Variation (SRDV) singleton, defined below.
6.3.1. Metric Name
Type-P-SR-Path-Periodic-Delay-Variation
6.3.2. Methodologies
For each pair of successive packets j-1 and j of stream Fi, j > 1,
the delay variation SRDV[i,j] is calculated as:
SRDV[i,j] = Delay[i,j] - Delay[i,j-1],
j in [2,3...N] and N the total number of packets received at Dst. If
one or more of the M packets sent by Src are lost, they are ignored
for the metric, as no reasonable metric value is defined here. If N
> 1, the metric is calculated for every valid packet received and the
preceding one.
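A minimal Python sketch of this calculation is given below. It assumes
that the input holds only the delays of packets actually received at
Dst, in order of reception, so lost packets are already excluded. The
function name srdv is hypothetical.

```python
from typing import List, Sequence


def srdv(delays: Sequence[float]) -> List[float]:
    """Return SRDV[i,2..N] for the received delays Delay[i,1..N] of loop Fi.

    Each result is the delay of a packet minus the delay of the
    previously received packet. Fewer than two delays give no singleton.
    """
    return [delays[j] - delays[j - 1] for j in range(1, len(delays))]
```

For N received delays the function returns N-1 values, and an empty
list if N is 0 or 1.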
6.3.3. Discussion of SRDV
Evaluation statistics of differential SRDV metric samples may help to
identify issues.
6.3.4. Errors and uncertainties
See section 2.7 of [RFC3393].
6.4. Definition of Type-P-SR-Path-Periodic-Delay-Variation-Stream
The only definition required for this metric is a unique metric name.
6.4.1. Metric Name
Type-P-SR-Path-Periodic-Delay-Variation-Stream
6.4.2. Metric Definition
Given T0 and Tf, those time values greater than or equal to T0 and
less than or equal to Tf are selected. At each of the selected times
in this process, we obtain one value of
Type-P-SR-Path-Periodic-Delay. The value of the sample is the
sequence made up of the resulting [time, delay-variation] pairs for
which a valid Delay-Variation singleton is calculated, with time
being set to the Dst timestamp of the Delay-Variation singleton. If
there are no such pairs, the sequence is of length zero and the
sample is said to be empty. If N Delay singletons are captured
during an interval, N-1 Delay-Variation singletons are sampled during
the same interval.
7. Statistic Definitions for SR-Path-Periodic-*-Stream samples
Change point detection requires statistical definitions. These are
provided below. The names of the statistics contain an "*"
placeholder, which may be replaced by "Delay" or "Delay-Variation"
[Editor note: a "Loss" metric remains tbd].
7.1. SR-Path-Periodic-*-Mean
For a type-p metric, the mean is specified by:
SR-*Mean = (1/N) * Sum(from i=1 to N, value[i])
o N sample size
o value sample value of a sampled [time, value] pair
7.2. SR-Path-Periodic-*-Std
For a type-p metric, the Standard-Deviation Std is specified by:
SR-*Std = sqrt( [1/(N-1)] * Sum(from i=1 to N, [SR-*Mean - value[i]]^2 ) )
o N sample size
o value sample value of a sampled [time, value] pair
o SR-*Mean sample mean of the same metric as defined above
The definition as given above requires a two-pass calculation per
sample. Algorithms estimating the standard-deviation by one-pass
calculation have been published and might be preferable, if metric
singletons and samples aren't buffered or calculations need to be
fast.
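As an illustration of a one-pass calculation, the following Python
sketch uses Welford's online algorithm. This algorithm is one
published option chosen here as an example; it is not prescribed by
this document. The function name one_pass_mean_std is hypothetical.
The returned standard deviation is the square root of the sum of
squared deviations from the mean divided by N-1.

```python
import math
from typing import Iterable, Tuple


def one_pass_mean_std(values: Iterable[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) in a single pass."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("at least two values are required")
    return mean, math.sqrt(m2 / (n - 1))
```

Only three running values are kept, so singletons do not need to be
buffered.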
8. Sub-Path monitoring metrics derived from samples captured along the
measurement loops
To produce meaningful sub-path monitoring values, the measurement
loop metrics are captured during a phase with stable networking
conditions. In a backbone network domain, the absence of congestion
often is a sufficient condition (frequent traffic shifts due to
changes in routing and traffic engineering aren't expected). This
may be different in a network based on a shared medium. It may be
outright difficult in networks with frequently changing traffic
management- and routing-policies.
In the following, the index CS indicates a statistic captured during
a measurement interval with stable routing and no congestion.
8.1. Baseline measurement
Capture a sample of delay values Type-P-SR-Path-Periodic-Delay-Stream
of sample size N for each measurement loop Fi. As a rule of thumb,
choose N in [30, 100].
For each measurement loop Fi, calculate the following metrics
characterising the monitored Sub-Paths during stable and congestion
free network conditions:
o SR-Path-Delay-MeanCSi, the mean delay captured along measurement
loop Fi
o SR-Path-Delay-StdCSi, the standard-deviation of the delay captured
along measurement loop Fi
o SR-Path-Delay-Variation-MeanCSi, the mean delay variation captured
along measurement loop Fi
o SR-Path-Delay-Variation-StdCSi, the standard-deviation of the
delay variation captured along measurement loop Fi
A stable and uncongested network should produce rather constant
delays, resulting in low standard-deviation values and almost zero
mean delay variation.
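The four baseline metrics of a single measurement loop could be
computed as in the following Python sketch. It assumes that the delay
singletons of one sample are available as a list in order of
reception, and it uses the signed delay variation. The function name
baseline_metrics and the dictionary keys are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def baseline_metrics(delays: Sequence[float]) -> Dict[str, float]:
    """Return mean and standard deviation of delay and delay variation."""
    if len(delays) < 3:
        # Two delay variation values are needed for a standard deviation.
        raise ValueError("need at least three delay singletons")
    # Delay variation between each packet and the preceding one.
    dv = [b - a for a, b in zip(delays, delays[1:])]
    return {
        "delay_mean": statistics.mean(delays),
        "delay_std": statistics.stdev(delays),
        "dv_mean": statistics.mean(dv),
        "dv_std": statistics.stdev(dv),
    }
```

statistics.stdev computes the sample standard deviation with an N-1
divisor.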
Example data was captured in a lightly loaded Gigabit network. Each
measurement loop passes 11 routers. The sample size is 30 packets,
and more than 200 samples were captured per measurement loop. The
loops were set up for a different purpose than the one specified
here; they were picked because they pass a high number of routers.
Note that SR-DV-Mean here refers to an abs(SR-DV-Mean) sample, so
small, positive, non-zero means result. The time unit is
microseconds.
Metric|Quantile|SR-D-Mean|SR-D-Std|SR-DV-Mean|SR-DV-Std
------+--------+---------+--------+----------+---------
Loop1 | 95% | 34510 | 40 | 28 | 49
------+--------+---------+--------+----------+---------
Loop2 | 95% | 35104 | 45 | 34 | 49
------+--------+---------+--------+----------+---------
Loop1 | 50% | 34495 | 17 | 15 | 13
------+--------+---------+--------+----------+---------
Loop2 | 50% | 35088 | 15 | 14 | 12
------+--------+---------+--------+----------+---------
Loop1 | 5% | 34504 | 10 | 12 | 11
------+--------+---------+--------+----------+---------
Loop2 | 5% | 35080 | 13 | 12 | 9
------+--------+---------+--------+----------+---------
Example baseline metrics for an 11 hop measurement loop
Figure 2
8.2. Discussion of the baseline measurement
Delay outliers may occur at any time in any communication network,
and the packet processing of the measurement system itself may also
produce some. In a stable, uncongested network, only isolated
outliers are to be expected. It may be worthwhile to capture several
consecutive SR-Path-Periodic-*-Stream samples and compare their
statistics before picking reasonable baseline metric values. Samples
showing higher standard deviations (compare the 95% quantile values
in the above figure to the 50% quantile values) may benefit from
removing the maximum singleton value from the sample. This will
smooth the mean and standard-deviation and, if the result is then
closer to the values of the majority of the samples, foster
confidence in the determined baseline metrics. Depending on the
preferred method of data processing and storage, this may require
capturing the sample maximum as a separate metric.
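The removal of the maximum singleton can be sketched in Python as
shown below. The sketch assumes that the delay singletons of a sample
are buffered as a list; the function name stats_without_max is
hypothetical. If several singletons share the maximum value, only one
of them is removed.

```python
import statistics
from typing import Sequence, Tuple


def stats_without_max(delays: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard deviation) after dropping one maximum value."""
    if len(delays) < 3:
        raise ValueError("need at least three delay singletons")
    trimmed = list(delays)
    trimmed.remove(max(trimmed))  # removes the first occurrence only
    return statistics.mean(trimmed), statistics.stdev(trimmed)
```

Comparing the result with the untrimmed statistics of the same sample
and with those of other samples indicates whether a single outlier
dominated the sample.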
8.3. Definition of SR-Path-Sub-Path-RTD-Estimate
Within a single evaluation interval with identical times T0 and Tf,
SR-Path-Delay-MeanCSi (from now on DMeanCSi) is the mean delay of the
measurement loop Fi passing the monitored Sub-Path SPi by a round
trip. Keeping the indexing applied above, Fj and Fk with captured
mean delays DMeanCSj and DMeanCSk pass SPi unidirectionally.
Further, 3 measurement loops Fx, Fy and Fz don't pass Sub-Path SPi at
all. The corresponding mean delays are DMeanCSx, DMeanCSy and
DMeanCSz.
The SR-Path-Sub-Path-RTD-Estimate of the Round Trip Delay along the
monitored Sub-Path SPi, denoted RTD_Fi, is
RTD_Fi=(3*DMeanCSi+DMeanCSj+DMeanCSk-DMeanCSx-DMeanCSy-DMeanCSz)/4
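Written as a Python function, the estimate reads as follows. The
function and parameter names are hypothetical; the parameters are the
six baseline mean delays of the round trip loop, the two
unidirectional loops and the three loops not passing the Sub-Path.

```python
def sub_path_rtd_estimate(
    d_i: float, d_j: float, d_k: float, d_x: float, d_y: float, d_z: float
) -> float:
    """Return RTD_Fi from the six baseline mean delays.

    d_i: loop passing the Sub-Path by a round trip
    d_j, d_k: loops passing the Sub-Path in one direction each
    d_x, d_y, d_z: loops not passing the Sub-Path at all
    """
    return (3 * d_i + d_j + d_k - d_x - d_y - d_z) / 4
```

All six means have to be taken from the same evaluation interval and
expressed in the same time unit.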
8.4. Definition of SR-Path-Sub-Path-*-Changepoint
The asterisk stands for "Interface" as well as "Connectivity". A
loss of connectivity may be indicated by packet drops, but a change
in sub-path routes accompanied by a change in measurement loop delay
may also be used to detect and locate this event.
Network changes are often characterised by a change in the mean delay
of a monitoring measurement. CUSUM (cumulative sum) charts have
been shown to be efficient in detecting shifts in the mean of a
process [NIST]. The upper bound CUSUM is defined as:
Sup(t)-Fi-Delay = max(0, Sup(t-1) + xt - SR-Path-*-MeanCSi - ki)
with Sup(0) = 0, ki = Delta * SR-Path-*-StdCSi (Delta is a
dimensionless integer number), and xt = Type-P-SR-Path-Periodic-*
singleton for measurement loop Fi at time t.
The actual SR-Path-Delay-Mean of Measurement Loop Fi is decided to be
significantly above SR-Path-*-MeanCSi, if:
Sup(t)-Fi-Delay > h_SP, with h_SP = d*ki (d is a dimensionless
integer number).
An analogous CUSUM detects changes to a lower mean delay (which may
be caused by a re-routing event):
Slo(t)-Fi-Delay = max(0, Slo(t-1) + SR-Path-*-MeanCSi - xt - ki)
with Slo(0) = 0.
The actual SR-Path-Delay-Mean of Fi is decided to be significantly
below SR-Path-*-MeanCSi, if:
Slo(t)-Fi-Delay > h_SP
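The two CUSUM recursions and their threshold tests can be sketched in
Python as follows. This is an illustration, not a reference
implementation. The class name CusumDetector and the default values
Delta = 1 and d = 5 are assumptions chosen for the example only;
suitable values depend on the network.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CusumDetector:
    mean_cs: float      # baseline mean of loop Fi (SR-Path-*-MeanCSi)
    std_cs: float       # baseline standard deviation (SR-Path-*-StdCSi)
    delta: int = 1      # assumed example value for Delta
    d: int = 5          # assumed example value for d
    s_up: float = 0.0   # upper CUSUM, starts at 0
    s_lo: float = 0.0   # lower CUSUM, starts at 0

    @property
    def k(self) -> float:
        return self.delta * self.std_cs

    @property
    def h(self) -> float:
        return self.d * self.k

    def update(self, x: float) -> Optional[str]:
        """Feed one singleton x; return 'above', 'below' or None."""
        self.s_up = max(0.0, self.s_up + x - self.mean_cs - self.k)
        self.s_lo = max(0.0, self.s_lo + self.mean_cs - x - self.k)
        if self.s_up > self.h:
            return "above"
        if self.s_lo > self.h:
            return "below"
        return None
```

With a baseline mean of 100, a standard deviation of 2 and the example
defaults, a single singleton of 110 raises the upper sum to 8, which
stays below the threshold of 10, whereas three consecutive singletons
of 106 raise it to 12 and trigger "above".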
8.5. Discussion of SR-Path-Sub-Path-*-Changepoint
CUSUM chart based changepoint detection is sensitive even to small
changes in the mean. CUSUM charts offer limited protection against
single, isolated outliers. A cumulative sum only grows if the
controlled process consistently changes its mean (or standard
deviation, respectively). Assuming that constant physical minimum
delays characterise wireline communication networks, a change in standard
deviation not affecting the mean delay doesn't seem to be caused by a
change in networking conditions.
The measured delays will change once a Sub-Path route has changed, or
once persistent congestion starts to fill a queue. Both indicate
changes in the network. As the Sub-Paths SPi form an overlay with
designed properties, every network change affecting a sub-path
creates correlated SR-Path-* metric changes. As the correspondence
of network changes to Sub-Path metrics is known a priori, detecting
correlated SR-Path-* metric changes allows locating the change.
In the absence of packet re-routing, packet loss characterises a
loss of connectivity. Detecting packet loss requires a time
threshold after which an active measurement packet is declared lost,
and detecting consecutive loss requires the receiver to be aware that
packets have been sent (this argues for the sender to be the
receiver, unless both communicate quickly and reliably out of band).
The preferred CUSUM parametrisation will depend on the kind of events
to be detected and on the outlier characteristics.
ki = Delta * SR-Path-*-StdCSi may be set to a value high enough to
prevent single outliers from triggering an alert, but low enough to
indicate persistent changes in delay. The same holds for the value
to be picked for d.
A broader discussion on CUSUM parametrisation may be found in the
literature. Networking skills are required to parametrise CUSUM, as
well as to interpret the results (notably to distinguish re-routing
from congestion).
8.6. Definition of SR-Path-Sub-Path-Congestion-Location
An interface along a single monitored Sub-Path SPi whose queue is
persistently filled adds latency to measurement loop Fi and to one of
the two unidirectional measurement loops Fj and Fk passing Sub-Path
SPi. Fj has been defined to pass SPi from Hub to Spoke and Fk to
pass SPi in the opposite direction.
IF Sup(t)_SPi_Periodic-Delay + Sup(t)_SPj_Periodic-Delay > h_SP
AND h_SP > Sup(t)_SPk_Periodic-Delay
AND h_SP > Sup(t)_SPx_Periodic-Delay
AND h_SP > Sup(t)_SPy_Periodic-Delay
AND h_SP > Sup(t)_SPz_Periodic-Delay
Then Sub-Path SPi faces congestion in direction "Hub to Spoke".
IF Sup(t)_SPi_Periodic-Delay + Sup(t)_SPk_Periodic-Delay > h_SP
AND h_SP > Sup(t)_SPj_Periodic-Delay
AND h_SP > Sup(t)_SPx_Periodic-Delay
AND h_SP > Sup(t)_SPy_Periodic-Delay
AND h_SP > Sup(t)_SPz_Periodic-Delay
Then Sub-Path SPi faces congestion in direction "Spoke to Hub".
Here, h_SP is a universal threshold in unit time to indicate a
filling queue or a significant change in delay due to a Sub-Path
reroute or another persistent change in topology (e.g. automated
Layer 1 / Layer 2 topology changes). SPx, SPy and SPz don't pass
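The two decision rules of this section can be sketched in Python as
follows. The sketch assumes that the current upper CUSUM values of the
six measurement loops are held in a dictionary keyed by the loop
indices "i", "j", "k", "x", "y" and "z". The function name
locate_congestion and this data layout are hypothetical.

```python
from typing import Mapping, Optional


def locate_congestion(s_up: Mapping[str, float], h_sp: float) -> Optional[str]:
    """Return the congested direction of Sub-Path SPi, or None."""

    def quiet(*loops: str) -> bool:
        # True if every listed loop stays below the threshold.
        return all(h_sp > s_up[name] for name in loops)

    if s_up["i"] + s_up["j"] > h_sp and quiet("k", "x", "y", "z"):
        return "Hub to Spoke"
    if s_up["i"] + s_up["k"] > h_sp and quiet("j", "x", "y", "z"):
        return "Spoke to Hub"
    return None
```

If both unidirectional loops exceed the threshold at the same time,
neither rule matches and the function returns None, as the rules only
cover congestion in a single direction.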
8.7. Discussion of SR-Path-Sub-Path-*-Location
[Editor Note: Discussion and a suitable connectivity monitoring
metric Definition+Discussion to be added.]
9. Discussion of Temporal Resolution
[Editor Note: requires a review..] The metric reports loss of
connectivity of a monitored sub-path or congestion of an interface
and identifies the sub-path and the direction of traffic in the case
of congestion.
The temporal resolution of the detected events depends on the spacing
interval of packets transmitted per measurement path. An identical
sending interval is chosen for every measurement path. As a rule of
thumb, an event is reliably detected if a sample consists of at least
5 probes indicating the same underlying change in behavior.
Depending on the underlying event either two or three measurement
paths are impacted. At least two consecutively received measurement
packets per measurement path should suffice to indicate a change.
The values chosen for an operational network will have to reflect
scalability constraints of a PMS measurement interface. As an
example, a PMS may work reliably if no more than one measurement
packet is transmitted per millisecond. Further, measurement is
configured so that the measurement packets return to the sender
interface. Assume that links are always monitored in groups of 6, as
described above, by 6 measurement paths. If one packet is sent per
measurement path within 500 ms, up to 498 links can be monitored with
a reliable temporal resolution of roughly one second per detected
event.
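The arithmetic behind the figure of 498 links can be reproduced with
the following Python sketch. The function name monitored_links and its
parameter names are hypothetical; the default values are those of the
example above.

```python
def monitored_links(
    interval_ms: int,
    packets_per_ms: int = 1,
    paths_per_group: int = 6,
    links_per_group: int = 6,
) -> int:
    """Return how many links can be monitored within one sending interval."""
    # Packets the PMS can send per interval at its sustainable rate.
    packets_per_interval = interval_ms * packets_per_ms
    # Each group needs one packet per measurement path and interval.
    groups = packets_per_interval // paths_per_group
    return groups * links_per_group
```

With a 500 ms interval, 500 packets serve 83 complete groups of 6
measurement paths, which corresponds to 498 monitored links.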
Note that per group measurement packet spacing, measurement loop
delay difference and latency caused by congestion impact the
reporting interval. If each measurement path of a single 6 link
monitoring group is addressed in consecutive milliseconds (within the
500 ms interval) and the sum of maximum physical delay of the per
group measurement paths and latency possibly added by congestion is
below 490 ms, the one second reports reliably capture 4 packets of
two different measurement paths, if two measurement paths are
congested, or 6 packets of three different measurement paths, if a
link is lost.
A variety of reporting options exist, if scalability issues and
network properties are respected.
10. IANA Considerations
If standardised, the metric will require an entry in the IPPM metric
registry.
11. Security Considerations
This draft specifies how to use methods specified or described within
[RFC8402] and [RFC8403]. It does not introduce new or additional SR
features. The security considerations of both references apply here
too.
12. References
12.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC2678] Mahdavi, J. and V. Paxson, "IPPM Metrics for Measuring
Connectivity", RFC 2678, DOI 10.17487/RFC2678, September
1999, <https://www.rfc-editor.org/info/rfc2678>.
[RFC3393] Demichelis, C. and P. Chimento, "IP Packet Delay Variation
Metric for IP Performance Metrics (IPPM)", RFC 3393,
DOI 10.17487/RFC3393, November 2002,
<https://www.rfc-editor.org/info/rfc3393>.
[RFC3432] Raisanen, V., Grotefeld, G., and A. Morton, "Network
performance measurement with periodic streams", RFC 3432,
DOI 10.17487/RFC3432, November 2002,
<https://www.rfc-editor.org/info/rfc3432>.
[RFC6673] Morton, A., "Round-Trip Packet Loss Metrics", RFC 6673,
DOI 10.17487/RFC6673, August 2012,
<https://www.rfc-editor.org/info/rfc6673>.
[RFC7679] Almes, G., Kalidindi, S., Zekauskas, M., and A. Morton,
Ed., "A One-Way Delay Metric for IP Performance Metrics
(IPPM)", STD 81, RFC 7679, DOI 10.17487/RFC7679, January
2016, <https://www.rfc-editor.org/info/rfc7679>.
[RFC7680] Almes, G., Kalidindi, S., Zekauskas, M., and A. Morton,
Ed., "A One-Way Loss Metric for IP Performance Metrics
(IPPM)", STD 82, RFC 7680, DOI 10.17487/RFC7680, January
2016, <https://www.rfc-editor.org/info/rfc7680>.
[RFC8029] Kompella, K., Swallow, G., Pignataro, C., Ed., Kumar, N.,
Aldrin, S., and M. Chen, "Detecting Multiprotocol Label
Switched (MPLS) Data-Plane Failures", RFC 8029,
DOI 10.17487/RFC8029, March 2017,
<https://www.rfc-editor.org/info/rfc8029>.
[RFC8287] Kumar, N., Ed., Pignataro, C., Ed., Swallow, G., Akiya,
N., Kini, S., and M. Chen, "Label Switched Path (LSP)
Ping/Traceroute for Segment Routing (SR) IGP-Prefix and
IGP-Adjacency Segment Identifiers (SIDs) with MPLS Data
Planes", RFC 8287, DOI 10.17487/RFC8287, December 2017,
<https://www.rfc-editor.org/info/rfc8287>.
[RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
Decraene, B., Litkowski, S., and R. Shakir, "Segment
Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
July 2018, <https://www.rfc-editor.org/info/rfc8402>.
[RFC8667] Previdi, S., Ed., Ginsberg, L., Ed., Filsfils, C.,
Bashandy, A., Gredler, H., and B. Decraene, "IS-IS
Extensions for Segment Routing", RFC 8667,
DOI 10.17487/RFC8667, December 2019,
<https://www.rfc-editor.org/info/rfc8667>.
12.2. Informative References
[CommodityTomography]
Lakhina, A., Papagiannaki, K., Crovella, M., Diot, C.,
Kolaczyk, ED., and N. Taft, "Structural analysis of
network traffic flows", 2004,
<https://www.cc.gatech.edu/classes/AY2007/cs7260_spring/
papers/odflows-sigm04.pdf>.
[ID.draft-ietf-6man-spring-srv6-oam]
Zafar, A., Filsfils, C., Matsushima, S., Voyer, D., and M.
Chen, "Operations, Administration, and Maintenance (OAM)
in Segment Routing Networks with IPv6 Data plane (SRv6)",
2021.
[NIST] NIST, "NIST/SEMATECH e-Handbook of Statistical Methods,
section CUSUM Control Charts", 2021,
<http://www.itl.nist.gov/div898/handbook/>.
[RFC2330] Paxson, V., Almes, G., Mahdavi, J., and M. Mathis,
"Framework for IP Performance Metrics", RFC 2330,
DOI 10.17487/RFC2330, May 1998,
<https://www.rfc-editor.org/info/rfc2330>.
[RFC8403] Geib, R., Ed., Filsfils, C., Pignataro, C., Ed., and N.
Kumar, "A Scalable and Topology-Aware MPLS Data-Plane
Monitoring System", RFC 8403, DOI 10.17487/RFC8403, July
2018, <https://www.rfc-editor.org/info/rfc8403>.
Author's Address
Ruediger Geib (editor)
Deutsche Telekom
Heinrich Hertz Str. 3-7
Darmstadt 64295
Germany
Phone: +49 6151 5812747
Email: Ruediger.Geib@telekom.de