Internet Engineering Task Force                               Yimin Shen
Internet-Draft                                             Zhaohui Zhang
Intended status: Standards Track                        Juniper Networks
Expires: August 6, 2020                                 February 3, 2020
Point-to-Multipoint Transport Using Chain Replication in Segment Routing
draft-shen-spring-p2mp-transport-chain-00
Abstract
This document specifies a point-to-multipoint (P2MP) transport
mechanism based on chain replication. It can be used in segment
routing to achieve traffic optimization.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 6, 2020.
Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Yimin Shen & Zhaohui ZhanExpires August 6, 2020 [Page 1]
Internet-DraftPoint-to-Multipoint Transport Using Chain RepFebruary 2020
Table of Contents
1. Introduction
2. Specification of Requirements
3. Applicability
4. P2MP Transport Using Chain Replication
4.1. Bud Segment
4.2. P2MP Chain
4.3. Example
5. Path Computation for P2MP Chains
6. IGP and BGP-LS Extensions for Bud Segment
7. IANA Considerations
8. Security Considerations
9. Acknowledgements
10. References
10.1. Normative References
10.2. Informative References
Authors' Addresses
1. Introduction
The Segment Routing Architecture [RFC8402] describes segment routing
(SR) and its instantiation in two data planes, i.e. MPLS and IPv6.
In SR, point-to-multipoint (P2MP) transport is currently achieved by
using ingress replication, where a point-to-point (P2P) SR tunnel is
constructed from a root node to each leaf node, and every ingress
packet is replicated and sent via a bundle of such P2P SR tunnels to
all the leaf nodes. Although this approach provides P2MP
reachability, it does not consider traffic optimization across the
tunnels, as the path of each tunnel is computed or decided
independently.
An alternative approach would be to use P2MP-tree based transport.
Such an approach can achieve maximum traffic optimization, but it
relies on a controller or path computation element (PCE) to
dynamically provision and manage "replication segments" on branch
nodes. The replication segments are essentially per-P2MP-tree (i.e.
per-tunnel) state on transit routers. Therefore, this approach is not
fully aligned with SR's principles of single-point (i.e. ingress
router) provisioning and a stateless core.
This document introduces a new solution for P2MP transport in SR,
based on "chain replication". In this solution, P2MP transport is
achieved by constructing a set of "P2MP chain tunnels" (or simply
"P2MP chains") from a root node to leaf nodes. Each P2MP chain is a
tunnel with a leaf node at the tail end and some transit leaf nodes
along the path, resembling a chain. A transit leaf node replicates a
packet only once for local processing off the chain, and forwards the
original packet down the chain. The root node replicates and sends
packets via the set of P2MP chains to all the leaf nodes.
As a P2MP chain can reach multiple leaf nodes, it is considered to be
more efficient than the multiple P2P tunnels which would be needed in
ingress replication to reach these leaf nodes. Compared with ingress
replication and the P2MP-tree based approach, this solution provides
a middle ground by achieving a certain level of traffic optimization,
while aligning with the fundamental principles of SR, including
single-point provisioning and stateless core. The solution can be
used to improve P2MP transport efficiency in general, and to achieve
maximum traffic optimization in certain types of topologies.
2. Specification of Requirements
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119] and
[RFC8174].
3. Applicability
The P2MP transport mechanism in this document is generally applicable
to all networks. However, it benefits certain types of topologies
more than others. These include ring topologies, linear topologies,
and topologies with leaf nodes concentrated in geographical sites
which can be modeled as leaf groups.
The mechanism is transparent to all transit routers. Leaf nodes
intended to take advantage of the mechanism will need to support the
new forwarding behavior specified in this document. The mechanism is
backward compatible with other leaf nodes, which can still be reached
by P2P tunnels using ingress replication. Path computation and P2MP
chain construction will need to be supported by a controller or by
root nodes, depending on where they are performed.
The mechanism is applicable to both SR-MPLS [RFC8660] and SRv6
[SRv6-SRH], [SRv6-Programming].
4. P2MP Transport Using Chain Replication
In this document, a P2MP transport scheme associated with a root node
and a set of leaf nodes is denoted as {root node, leaf nodes}. It is
achieved by using a bundle of P2MP chains covering all the leaf
nodes. Each P2MP chain is a tunnel starting from the root node and
reaching one or multiple leaf nodes along the path. The tail-end
node of the P2MP chain is a leaf node, called a "tail-end" leaf node.
Each leaf node traversed by the P2MP chain is called a "transit" leaf
node. As a special case, a P2MP chain may have no transit leaf node,
but only a tail-end leaf node, essentially becoming a P2P tunnel of
ingress replication.
R ------ R1 ------ R2 ------ L1 ------ R3 ------ L2 ------ L3
R : root node
Li : leaf node
Ri : transit router
Figure 1
A tail-end leaf node and a transit leaf node behave differently
when processing a received packet. In particular, a tail-end leaf
node processes the packet as a normal receiver. A transit
leaf node not only processes the packet as a receiver, but also
forwards it downstream along the P2MP chain, hence acting as a "bud
node". To achieve this, the transit leaf node needs to replicate the
packet, producing two packets, one for forwarding and the other for
local processing. Such packet replication happens on every transit
leaf node along a P2MP chain. Therefore, it is called "chain
replication".
This document introduces a new type of segment, called a "bud
segment", to facilitate the above packet processing on leaf nodes.
The segment ID (SID) of a bud segment is a "bud-SID".
4.1. Bud Segment
On a leaf node, a bud segment represents the following instructions
for forwarding hardware to execute on a received packet P. They
apply when the active SID of the packet P is the bud-SID of this bud
segment.
[1] Detect whether this leaf node is a transit or tail-end leaf
node, based on whether the bud-SID is the last SID of a P2MP
chain.
[2] If this is a transit leaf node, replicate the packet to
produce a copy P1.
[2.1] For P, perform a NEXT operation on the bud-SID, make the
next SID active, and forward the packet based on that SID.
[2.2] For P1, perform a sequence of NEXT operations on the bud-
SID and all the subsequent SIDs of the P2MP chain, and process
the packet locally.
[3] If this is a tail-end leaf node, perform a NEXT operation on
the bud-SID for P, and process the packet locally.
In [2.2], when the transit leaf node processes P1 locally, none of
the SIDs of the P2MP chain is useful. Hence, they are all removed
before the processing.
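
Instructions [1] to [3] can be sketched in a data-plane-neutral way as
follows. This is only an illustrative model, not a normative
procedure: the Packet class, its field names, and the function
process_bud_sid are hypothetical, and the packet is assumed to carry
only the remaining SIDs of the P2MP chain, with the active SID first.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Packet:
    # Hypothetical model: remaining SIDs of the P2MP chain, active SID first.
    sids: List[str]
    payload: str


def process_bud_sid(pkt: Packet, my_bud_sid: str) -> Tuple[Optional[Packet], Packet]:
    """Apply steps [1]-[3]; return (packet to forward or None, local copy)."""
    if not pkt.sids or pkt.sids[0] != my_bud_sid:
        raise ValueError("active SID is not this node's bud-SID")

    # [1] The node is a tail-end leaf if the bud-SID is the last chain SID.
    is_tail_end = len(pkt.sids) == 1

    # [2.2] / [3] The locally processed packet carries no chain SIDs.
    local = Packet(sids=[], payload=pkt.payload)
    if is_tail_end:
        return None, local

    # [2.1] NEXT on the bud-SID: the following SID becomes active.
    forward = Packet(sids=pkt.sids[1:], payload=pkt.payload)
    return forward, local


# Transit leaf node: one copy is forwarded, one is processed locally.
fwd, local = process_bud_sid(Packet(["bud-L1", "adj-1", "bud-L2"], "data"), "bud-L1")
# Tail-end leaf node: nothing is forwarded.
tail_fwd, tail_local = process_bud_sid(Packet(["bud-L2"], "data"), "bud-L2")
```

In this model the forwarded packet is a new object rather than the
original P, which is a simplification; forwarding hardware would
typically replicate once and modify the original in place.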
Bud segments are global segments of leaf nodes. They are routable
via topological shortest paths. Only one bud segment is needed per
leaf node for each data plane (SR-MPLS or SRv6). Bud-SIDs are
allocated from the SR Global Block (SRGB).
In SR-MPLS, bud-SIDs are labels. In SRv6, bud-SIDs are IPv6
addresses explicitly associated with bud segments. Therefore, the
above instructions [1] to [3] are achieved in different ways in SR-
MPLS and SRv6:
(a) In SR-MPLS, there are two cases:
(a.1) The packet has no service label, only P2MP chain labels
in the MPLS header. In [1], the bud segment SHOULD
detect whether the leaf node is a transit or tail-end leaf node
based on the S-bit (bottom of stack) of the bud-SID label. If
the S-bit is 0, the leaf node is a transit leaf node. If the
S-bit is 1, it is a tail-end leaf node. In [2.2], the bud
segment SHOULD simply pop the entire MPLS header.
(a.2) The packet may have service label(s) after the P2MP chain
labels in the MPLS header, e.g. a bridge domain label, a source
Ethernet segment label, etc. In this case, the bud segment
MUST have a way to identify the position of the last P2MP chain
label. This document introduces an "end-of-chain" (EoC) label
to facilitate the process. An EoC label is a label which is
known to all root nodes and leaf nodes in a network. It MUST
have a globally common value, via configuration on these nodes.
When a root node constructs an MPLS header for a packet, the
EoC label MUST be pushed immediately before P2MP chain labels,
making it the next label after the last P2MP chain label.
Thus, in [1], the bud segment SHOULD detect whether the leaf
node is a transit or tail-end leaf node based on whether the
next label in the current MPLS header is the EoC label. If so,
the leaf node is a tail-end leaf node. Otherwise, it is a
transit leaf node. In [2.2], the bud segment SHOULD pop labels
until the EoC label is popped. In [3], the bud segment SHOULD
pop the bud-SID label and the next label, which is the EoC
label.
(b) In SRv6, the packet is encapsulated with an outer IPv6 header
corresponding to the P2MP chain, optionally followed by a segment
routing header (SRH) containing the SIDs of the P2MP chain, and
followed by an inner header (of IPv4, IPv6, MPLS, layer-2, etc.)
associated with a service. In [1], the bud segment SHOULD detect
whether it is the last P2MP chain SID based on the SRH. If the
SRH does not exist or the Segments Left in the SRH is 0, the leaf
node is a tail-end leaf node. Otherwise, it is a transit leaf
node. In [2.2] and [3], the bud segment SHOULD simply remove the
outer IPv6 header and the SRH (if any), and hand the packet with
the inner header over to local processing.
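
The SR-MPLS cases (a.1) and (a.2) and the SRv6 test in (b) can be
illustrated with the sketch below. It is a simplified model under
stated assumptions, not a forwarding-plane implementation: the label
stack is a list with the top label first, the S-bit is implied on the
last entry, the EoC value 99 is an arbitrary placeholder (the text
above only requires a globally common configured value), and the
function names are hypothetical.

```python
from typing import List, Optional, Tuple

EOC_LABEL = 99  # hypothetical value; only a network-wide common value is required

Stack = List[int]  # top of stack first; the S-bit is implied on the last entry


def bud_mpls(stack: Stack, uses_eoc: bool) -> Tuple[Optional[Stack], Stack]:
    """Return (stack of the forwarded packet or None, stack of the local copy).

    stack[0] is assumed to be the bud-SID label of this leaf node.
    """
    if not stack:
        raise ValueError("empty label stack")
    rest = stack[1:]  # NEXT on the bud-SID label

    if not uses_eoc:
        # (a.1) The S-bit of the bud-SID label is 1 iff no label follows it.
        tail_end = len(rest) == 0
        local: Stack = []  # [2.2]: pop the entire MPLS header
    else:
        # (a.2) Tail-end iff the label after the bud-SID is the EoC label.
        if EOC_LABEL not in rest:
            raise ValueError("EoC label missing from label stack")
        tail_end = rest[0] == EOC_LABEL
        # [2.2] / [3]: pop labels until the EoC label is popped;
        # any service labels below it remain for local processing.
        local = rest[rest.index(EOC_LABEL) + 1:]

    return (None if tail_end else rest), local


def bud_srv6_is_tail_end(segments_left: Optional[int]) -> bool:
    """(b) Tail-end iff there is no SRH (None) or Segments Left is 0."""
    return segments_left is None or segments_left == 0


# Transit leaf with a service label 555 below the EoC label.
fwd, local = bud_mpls([1000, 2000, EOC_LABEL, 555], uses_eoc=True)
```

Here fwd keeps the remaining chain labels, the EoC label, and the
service label, while local retains only the service label.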
Bud segments are shared by all P2MP transport schemes, i.e. all
combinations of {root node, leaf nodes}. A leaf node SHOULD advertise
a bud segment for SR-MPLS, if its forwarding hardware supports the
above SR-MPLS processing. Likewise, it SHOULD advertise a bud
segment for SRv6, if its forwarding hardware supports the above SRv6
processing. The advertisement may be via IGP (ISIS, OSPF) or BGP-LS.
The advertisement allows the leaf node to be considered on a P2MP
chain. If a leaf node does not advertise a bud segment, it MUST be
reached via a P2P tunnel using ingress replication.
Bud segments are general-purpose segments. They may also be used in
cases other than P2MP transport, such as traffic monitoring. These
use cases are outside the scope of this document.
4.2. P2MP Chain
Construction of P2MP chains for a P2MP transport scheme is performed
by a controller or a root node based on path computation (Section 5).
The path of a P2MP chain is a single path traversing zero or more
transit leaf nodes and terminating at a tail-end leaf node. Between
the root node and the first leaf node, and between two
consecutive leaf nodes, there may be zero, one, or multiple transit
routers.
The path is then translated to a SID list to be programmed on the
root node. In the SID list, each transit leaf node has its bud-SID
in a corresponding position. Given a P2MP chain to a set of leaf
nodes in the order of L1, L2, ..., Ln, the SID list may be
represented as:
<SID_11, SID_12, ...>, bud-SID of L1, ..., <SID_i1, SID_i2, ...>,
bud-SID of Li, ..., <SID_n1, SID_n2, ...>, <bud-SID of Ln>
Where:
o <SID_11, SID_12, ...> is the sub-path from the root node to L1.
o <SID_i1, SID_i2, ...> is the sub-path from Li-1 to Li.
o Ln's bud-SID is the last SID of the list, if the sub-path from
Ln-1 to Ln is partial or empty, or if an EoC label is needed in
SR-MPLS. It is optional in other cases.
The above sub-paths are regular point-to-point paths. The SIDs in
the sub-paths are regular SIDs, such as adjacency-SIDs, node-SIDs,
binding-SIDs, etc. There is no SID specific to the given P2MP chain.
A sub-path from Li-1 to Li may have an empty SID list, if the sub-
path takes the shortest path indicated by the bud-SID of Li.
The root node then uses the SID list in packet encapsulation. Note
that in the SR-MPLS case where an EoC label is needed, the EoC label
MUST be pushed onto the MPLS label stack before the SID list is
pushed, as required in Section 4.1.
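
The translation from an ordered set of leaf nodes to a SID list, and
the placement of the EoC label in SR-MPLS, can be sketched as
follows. The function and parameter names are hypothetical. The
sketch always includes the bud-SID of Ln, which this section permits
but does not always require, and it assumes the sub-path SIDs have
already been produced by path computation.

```python
from typing import List, Optional, Sequence, Tuple

# (sub-path SIDs from the previous node to this leaf, bud-SID of this leaf)
Hop = Tuple[Sequence[int], int]


def build_sid_list(hops: Sequence[Hop]) -> List[int]:
    """Concatenate <sub-path SIDs>, bud-SID for L1..Ln in chain order."""
    if not hops:
        raise ValueError("a P2MP chain needs at least a tail-end leaf node")
    sids: List[int] = []
    for sub_path, bud_sid in hops:
        sids.extend(sub_path)  # may be empty: shortest path to the bud-SID
        sids.append(bud_sid)
    return sids


def build_label_stack(hops: Sequence[Hop],
                      service_labels: Sequence[int] = (),
                      eoc_label: Optional[int] = None) -> List[int]:
    """SR-MPLS stack, top first: chain labels, EoC (if needed), service labels.

    Labels pushed earlier sit deeper in the stack, so pushing the EoC
    label before the SID list places it right after the last chain label.
    """
    stack = build_sid_list(hops)
    if service_labels:
        if eoc_label is None:
            raise ValueError("an EoC label is needed when service labels follow")
        stack.append(eoc_label)
        stack.extend(service_labels)
    return stack


# Two leaves: explicit adjacency SIDs to L1, shortest path from L1 to L2.
hops = [((100, 200, 300), 1000), ((), 2000)]
sid_list = build_sid_list(hops)
stack = build_label_stack(hops, service_labels=[555], eoc_label=99)
```

The values 555 and 99 stand in for a service label and an EoC label
and are placeholders only.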
4.3. Example
In the following example, P2MP transport is needed from the root node
R, to leaf nodes L1, L2, L3 and L4.
R ------ R1 -------------------- R2 ------- L1
         |                       |         /
         |                       |      /
         |                       |   /
         R3 -------------------- R4 ------- L2
         |                       |
         |                       |
         |                       |
         R5 -------------------- R6 ------- L3
         |                       |         /
         |                       |      /
         |                       |   /
         R7 -------------------- R8 ------- L4
Figure 2
Path computation results in two P2MP chains:
P2MP chain 1:
Path: R -> R1 -> R2 -> L1 -> R4 -> L2, where L1 is a transit
leaf node, and L2 is the tail-end leaf node.
Assuming that the sub-path L1 -> R4 -> L2 matches the shortest
path from L1 to L2, the bud-SID of L2 is used to represent this
sub-path. The segment list applied to packets on R is:
adj-SID 100 - link from R to R1
adj-SID 200 - link from R1 to R2
adj-SID 300 - link from R2 to L1
bud-SID 1000 - L1
bud-SID 2000 - L2
P2MP chain 2:
Path: R -> R1 -> R3 -> R5 -> R6 -> L3 -> R8 -> L4, where L3 is
a transit leaf node, and L4 is the tail-end leaf node.
Assuming that the sub-path R -> R1 -> R3 -> R5 -> R6 -> L3
matches the shortest path from R to L3, the bud-SID of L3 is
used to represent this sub-path. The segment list applied to
packets on R is:
bud-SID 3000 - L3
adj-SID 600 - link from L3 to R8
adj-SID 700 - link from R8 to L4
bud-SID 4000 - L4
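
As a quick check of the example, the sketch below walks the two
segment lists and reports which leaf nodes take a local copy, and
compares the number of copies the root node sends with what ingress
replication would need. The SID values are those of the example; the
mapping table and function name are hypothetical, and adjacency SIDs
are simply treated as SIDs that deliver nothing locally.

```python
from typing import Dict, List

# bud-SID -> leaf node, using the values from the example above
BUD_SIDS: Dict[int, str] = {1000: "L1", 2000: "L2", 3000: "L3", 4000: "L4"}

CHAIN_1 = [100, 200, 300, 1000, 2000]
CHAIN_2 = [3000, 600, 700, 4000]


def leaves_served(sid_list: List[int]) -> List[str]:
    """Leaf nodes that take a local copy, in the order the packet reaches them."""
    return [BUD_SIDS[sid] for sid in sid_list if sid in BUD_SIDS]


chains = [CHAIN_1, CHAIN_2]
served = [leaves_served(chain) for chain in chains]

# The root replicates once per chain; ingress replication needs one
# P2P tunnel (and one copy) per leaf node.
root_copies_chain = len(chains)
root_copies_ingress = sum(len(leaves) for leaves in served)

print(served, root_copies_chain, root_copies_ingress)
# prints: [['L1', 'L2'], ['L3', 'L4']] 2 4
```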
5. Path Computation for P2MP Chains
Path computation for the P2MP chains of a P2MP transport scheme {root
node, leaf nodes} is the responsibility of a controller or the
root node. This document does not mandate a particular computation
algorithm. In fact, any P2P path computation algorithm may be
extended to serve the purpose.
The path computation may consider general metrics for shortest paths,
or traffic engineering (TE) constraints for TE paths. This document
recommends that the following constraints be considered as well:
- The maximum hop count of path. This SHOULD be based on the
maximum delay allowed for a packet to accumulate before reaching a
tail-end leaf node.
- The maximum length of SID list. This SHOULD be based on the
maximum header size which a root node may apply to a packet. This
is typically a limit of forwarding hardware.
Note that a SID list is translated from a computed path. Hence, the
length of the SID list and the hop count of the path are typically
not the same.
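
The two constraints can be checked independently, as in the sketch
below. It uses P2MP chain 2 of Section 4.3, whose path has 7 hops
while its SID list has only 4 entries. The Candidate class, the
function name, and the limit values are hypothetical; real limits
would come from the delay budget and the forwarding hardware.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    path: List[str]      # nodes from the root node to the tail-end leaf node
    sid_list: List[int]  # SID list translated from the path


def within_limits(c: Candidate, max_hops: int, max_sids: int) -> bool:
    """Check the hop-count and SID-list-length constraints separately."""
    hop_count = len(c.path) - 1
    return hop_count <= max_hops and len(c.sid_list) <= max_sids


chain2 = Candidate(
    path=["R", "R1", "R3", "R5", "R6", "L3", "R8", "L4"],
    sid_list=[3000, 600, 700, 4000],
)

ok = within_limits(chain2, max_hops=8, max_sids=4)          # both limits met
too_long = within_limits(chain2, max_hops=6, max_sids=4)    # 7 hops exceed 6
too_deep = within_limits(chain2, max_hops=8, max_sids=3)    # 4 SIDs exceed 3
```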
The path computation may achieve more predictable results by dividing
leaf nodes into groups based on their geographical or administrative
location. Thus, paths MAY be computed such that each P2MP chain
reaches leaf nodes of only one group, while the number of P2MP
chains needed to reach all the leaf nodes of the group is minimized.
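
One very simple way to realize such grouping is sketched below. It is
not an algorithm defined by this document: it assumes each leaf node
has already been assigned a group name, caps the number of leaf nodes
per chain with a hypothetical parameter max_leaves_per_chain (for
example, derived from the SID list length limit), and leaves the
ordering of leaf nodes within a chain and the sub-path selection to
the actual path computation.

```python
from collections import defaultdict
from typing import Dict, List


def plan_chains(leaf_group: Dict[str, str],
                max_leaves_per_chain: int) -> Dict[str, List[List[str]]]:
    """Split each group's leaf nodes into as few chains as the cap allows."""
    if max_leaves_per_chain < 1:
        raise ValueError("max_leaves_per_chain must be at least 1")

    groups: Dict[str, List[str]] = defaultdict(list)
    for leaf, group in sorted(leaf_group.items()):
        groups[group].append(leaf)

    plan: Dict[str, List[List[str]]] = {}
    for group, leaves in groups.items():
        # Fixed-size chunks give ceil(n / cap) chains, the minimum
        # number under a per-chain cap; no chain spans two groups.
        plan[group] = [
            leaves[i:i + max_leaves_per_chain]
            for i in range(0, len(leaves), max_leaves_per_chain)
        ]
    return plan


plan = plan_chains(
    {"L1": "east", "L2": "east", "L3": "west", "L4": "west", "L5": "west"},
    max_leaves_per_chain=2,
)
```

With these inputs, the "east" group is covered by one chain and the
"west" group by two.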
6. IGP and BGP-LS Extensions for Bud Segment
The protocol extensions of IGP (ISIS and OSPF) and BGP-LS for bud
segment advertisement will be specified in the next version of this
document.
7. IANA Considerations
This document requires IANA registration and allocation for the ISIS,
OSPF and BGP-LS extensions for bud segment advertisement. The
details will be provided in the next version of this document.
8. Security Considerations
This document introduces bud segments for leaf nodes to act as both
packet receivers and transit routers. An attacker may target a leaf
node by constructing malicious packets with the node's bud-SID. Such
attacks can be mitigated by restricting bud segment distribution and
P2MP chain construction to the scope of a controller and a given
network.
9. Acknowledgements
This document leverages work done by Alexander Arseniev and Ron
Bonica.
10. References
10.1. Normative References
[RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
Decraene, B., Litkowski, S., and R. Shakir, "Segment
Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
July 2018, <https://www.rfc-editor.org/info/rfc8402>.
[RFC8660] Bashandy, A., Ed., Filsfils, C., Ed., Previdi, S.,
Decraene, B., Litkowski, S., and R. Shakir, "Segment
Routing with the MPLS Data Plane", RFC 8660,
DOI 10.17487/RFC8660, December 2019,
<https://www.rfc-editor.org/info/rfc8660>.
[SRv6-SRH]
Filsfils, C., Dukes, D., Previdi, S., Leddy, J.,
Matsushima, S., and D. Voyer, "IPv6 Segment Routing
Header", draft-ietf-6man-segment-routing-header (work in
progress), 2019.
[SRv6-Programming]
Filsfils, C., Garvia, P., Leddy, J., Voyer, D.,
Matsushima, S., and Z. Li, "SRv6 Network Programming",
draft-ietf-spring-srv6-network-programming (work in
progress), 2019.
10.2. Informative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
Authors' Addresses
Yimin Shen
Juniper Networks
10 Technology Park Drive
Westford, MA 01886
USA
Email: yshen@juniper.net
Zhaohui Zhang
Juniper Networks
10 Technology Park Drive
Westford, MA 01886
USA
Email: zzhang@juniper.net