BGP MultiNexthop attribute
draft-kaliraj-idr-multinexthop-attribute-02
| Field | Value |
|---|---|
| Document type | Active Internet-Draft (individual) |
| Authors | Kaliraj Vairavakkalai, Jeyananth Minto Jeganathan, Gyan Mishra |
| Last updated | 2021-12-28 |
| IESG state | I-D Exists |
Network Working Group K. Vairavakkalai
Internet-Draft M. Jeyananth
Intended status: Standards Track Juniper Networks, Inc.
Expires: 1 July 2022 G. Mishra
Verizon Communications Inc.
28 December 2021
BGP MultiNexthop attribute
draft-kaliraj-idr-multinexthop-attribute-02
Abstract
Today, a BGP speaker can advertise one nexthop for a set of NLRIs in
an Update. This nexthop can be encoded in either the BGP-Nexthop
attribute (code 3), or inside the MP_REACH attribute (code 14).
For cases where multiple nexthops need to be advertised, BGP-Addpath
is used. Though Addpath provides a basic ability to advertise multiple
nexthops, it does not allow the sender to specify the desired
relationship between the nexthops being advertised, e.g.,
relative preference or type of load-balancing. These are local
decisions at the receiving speaker, based on local configuration and
path-selection between the various additional-paths, which may
tie-break on some arbitrary step like Router-Id or BGP nexthop address.
Some scenarios with a BGP-free core may benefit from having a
mechanism, where egress-node can signal multiple-nexthops along with
their relationship, in one BGP route, to ingress nodes. This
document defines a new BGP attribute "MultiNexthop (MNH)" that can be
used for this purpose.
This attribute can be used for both labeled and unlabeled BGP
families. The MNH can be used to advertise an MPLS label along with
the nexthop for unlabeled families (e.g. Inet Unicast, Inet6 Unicast),
so that mechanisms at the transport layer can work uniformly on
labeled and unlabeled BGP families. Service route scale can be
confined closer to the service edge nodes, making the transport layer
nodes light and nimble: they do not hold any service route state, only
service end-point state.
The MNH plays a different role in the "downstream allocation" scenario
than in the "upstream allocation" scenario. E.g. for RFC8277 families that
advertise downstream allocated labels, the MNH can play the "Label
Descriptor" role, describing the forwarding semantics of the label
being advertised. This can be useful in network visualization and
controller based traffic engineering (e.g. EPE).
Vairavakkalai, et al. Expires 1 July 2022 [Page 1]
Internet-Draft BGP MultiNexthop attribute December 2021
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 1 July 2022.
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Use-cases examples . . . . . . . . . . . . . . . . . . . . . 4
2.1. Optimal forwarding exit-points signaling to
ingress-node . . . . . . . . . . . . . . . . . . . . . . 4
2.2. Choosing a received label based on its forwarding-semantic
at advertising node . . . . . . . . . . . . . . . . . . . 5
2.3. Signaling desired forwarding behavior when installing MPLS
Upstream labels at receiving node . . . . . . . . . . . . 5
2.4. Load-balancing over EBGP parallel links . . . . . . . . . 5
2.5. Flowspec routes with multiple Redirect-IP nexthops . . . 6
2.6. Color-Only resolution nexthop . . . . . . . . . . . . . . 6
3. The "MultiNexthop (MNH)" BGP attribute encoding . . . . . . . 6
3.1. Operations . . . . . . . . . . . . . . . . . . . . . . . 8
3.1.1. BGP Capability for MNH attribute . . . . . . . . . . 8
3.1.2. Scope of use, and propagation . . . . . . . . . . . . 8
3.1.3. Interaction of MNH with Nexthop (in attr-code 3,
14) . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.1.4. Interaction with Addpath . . . . . . . . . . . . . . 9
3.1.5. Path-selection considerations . . . . . . . . . . . . 9
3.1.6. NH-Flags U bit, denoting upstream/downstream
semantics . . . . . . . . . . . . . . . . . . . . . . 9
3.2. Nexthop Forwarding Semantics TLV . . . . . . . . . . . . 10
3.3. Nexthop-Leg Descriptor TLV . . . . . . . . . . . . . . . 11
3.4. Nexthop Attributes Sub-TLV . . . . . . . . . . . . . . . 12
3.4.1. IP Address . . . . . . . . . . . . . . . . . . . . . 12
3.4.2. Labeled IP nexthop . . . . . . . . . . . . . . . . . 13
3.4.3. Transport Class ID (Color) . . . . . . . . . . . . . 14
3.4.4. Available Bandwidth . . . . . . . . . . . . . . . . . 15
3.4.5. Load balance factor . . . . . . . . . . . . . . . . . 16
3.4.6. Forwarding-context name . . . . . . . . . . . . . . . 17
3.4.7. Forwarding-context Route-Target . . . . . . . . . . . 17
4. Error handling procedures . . . . . . . . . . . . . . . . . . 18
5. Scaling considerations . . . . . . . . . . . . . . . . . . . 19
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19
7. Security Considerations . . . . . . . . . . . . . . . . . . . 20
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 20
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 20
9.1. Normative References . . . . . . . . . . . . . . . . . . 20
9.2. References . . . . . . . . . . . . . . . . . . . . . . . 20
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 21
1. Introduction
Today, a BGP speaker can advertise one nexthop for a set of NLRIs in
an Update. This nexthop can be encoded in either the top-level BGP-
Nexthop attribute (code 3), or inside the MP_REACH attribute (code
14).
For cases where multiple nexthops need to be advertised, BGP-Addpath
is used. Though Addpath provides a basic ability to advertise multiple
nexthops, it does not allow the sender to specify the desired
relationship between the nexthops being advertised, e.g.,
relative ordering, type of load-balancing, or fast-reroute. These are
local decisions at the receiving node, based on local configuration and
path-selection between the various additional-paths, which may
tie-break on some arbitrary step like Router-Id or BGP nexthop address.
Some scenarios with a BGP-free core may benefit from having a
mechanism, where egress-node can signal multiple-nexthops along with
their relationship to ingress nodes. This document defines a new BGP
attribute "MultiNexthop (MNH)" that can be used for this purpose.
This attribute can be used for both labeled and unlabeled BGP
families. The MNH can be used to advertise an MPLS label along with
the nexthop for unlabeled families (e.g. Inet Unicast, Inet6 Unicast),
so that mechanisms at the transport layer can work uniformly on
labeled and unlabeled BGP families. Service route scale can be
confined closer to the service edge nodes, making the transport layer
nodes light and nimble: they do not hold any service route state, only
service end-point state.
The MNH plays a different role in the "downstream allocation" scenario
than in the "upstream allocation" scenario. E.g. for RFC8277 families that
advertise downstream allocated labels, the MNH can play the "Label
Descriptor" role, describing the forwarding semantics of the label
being advertised. This can be useful in network visualization and
controller based traffic engineering (e.g. EPE).
A new BGP capability ([RFC3392]) called "MultiNexthop (MNH)" is
defined with type code: IANA TBD. This capability is used to express
the ability to send and receive the MNH attribute.
2. Use-cases examples
2.1. Optimal forwarding exit-points signaling to ingress-node
In a BGP-free core, one can dynamically signal to the ingress node
how traffic should be load-balanced towards a set of exit nodes, in
one BGP route containing this attribute.
For example, for prefix1, perform equal-cost load-balancing towards
exit nodes A and B; whereas for prefix2, perform unequal-cost
load-balancing (40%, 30%, 30%) towards exit nodes A, B and C.
As another example, for prefix1, use PE1 as the primary nexthop and
PE2 as the backup nexthop.
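The load-balancing examples in this use case can be illustrated with a small sketch of how an ingress node might map flows onto exit nodes in proportion to signaled percentages. This is only an illustration under stated assumptions: the function names, the CRC32 flow hash and the list-of-tuples representation are hypothetical and are not defined by this document.

```python
import zlib

def node_for_bucket(bucket: int, weighted_nodes: list[tuple[str, int]]) -> str:
    """Walk the cumulative weights and return the node that owns `bucket`."""
    for node, weight in weighted_nodes:
        if bucket < weight:
            return node
        bucket -= weight
    raise ValueError("bucket is outside the total weight")

def pick_exit_node(flow_key: bytes, weighted_nodes: list[tuple[str, int]]) -> str:
    """Hash a flow onto one exit node, so a given flow always uses the same node."""
    total = sum(weight for _, weight in weighted_nodes)
    return node_for_bucket(zlib.crc32(flow_key) % total, weighted_nodes)

# prefix1: equal-cost load-balancing towards A and B.
PREFIX1 = [("A", 50), ("B", 50)]
# prefix2: unequal-cost load-balancing (40%, 30%, 30%) towards A, B and C.
PREFIX2 = [("A", 40), ("B", 30), ("C", 30)]
```

Hashing on a flow key rather than picking a node per packet keeps the packets of one flow on one path, which is the usual reason for hash-based load-balancing.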
2.2. Choosing a received label based on its forwarding-semantic at
advertising node
In the Downstream label allocation case, the MNH plays the role of "Label
descriptor" and describes the forwarding treatment given to the label
at the advertising speaker. The receiving speaker can benefit from
this information as in the following examples:
- For a Prefix, a label with an FRR-enabled nexthop-set can be preferred
to another label with a nexthop-set that doesn't provide FRR.
- For a Prefix, a label pointing to a 10G nexthop can be preferred to
another label pointing to a 1G nexthop.
- A set of advertised labels can be aggregated if they have the same
forwarding semantics (e.g. the VPN per-prefix-label case).
2.3. Signaling desired forwarding behavior when installing MPLS
Upstream labels at receiving node
In the Upstream label allocation case, the receiving speaker's
forwarding-state can be controlled by the advertising speaker, thus
enabling a standardized API to program the desired MPLS forwarding-state
at the receiving node. This is described in [MPLS-NAMESPACES].
2.4. Load-balancing over EBGP parallel links
Consider N parallel links between two EBGP speakers. There are
different models possible to do load balancing over these links:
- N single-hop EBGP sessions over the N links. Interface addresses
are used as next-hops. N copies of the RIB are exchanged to form
N-way ECMP paths. The routes advertised on the N sessions can be
tagged with the Link Bandwidth community to perform weighted ECMP.
- 1 multi-hop EBGP session between loopback addresses, reachable via
a static route over the N links. Loopback addresses are used as
next-hops. 1 copy of the RIB is exchanged with the loopback address as
nexthop, and a static route can be configured to the loopback
address to form the desired N-way ECMP path. M loopbacks are
configured in this model to achieve M different load balancing
schemes: ECMP, weighted ECMP, Fast-reroute enabled paths etc.
- 1 multi-hop EBGP session between loopback addresses, reachable via
a static route over the N links. Interface addresses are used as
next-hops, without using additional loopbacks. 1 copy of the RIB
is exchanged with the MNH attribute to form N-way ECMP paths, weighted
ECMP, Fast-reroute backup paths etc. BFD may be run towards these
directly connected BGP nexthops to detect liveness.
2.5. Flowspec routes with multiple Redirect-IP nexthops
There is existing protocol machinery that can benefit from the
ability of MNH to clearly specify fallback behavior when multiple
nexthops are involved. One example is the scenario described in
[FLWSPC-REDIR-IP], where multiple Redirect-to-IP nexthop addresses
exist for a Flowspec prefix. In such a scenario, the receiving
speakers may redirect the traffic to different nexthops, based on
variables like IGP-cost. If the MNH were instead used to specify the
redirect-to-IP nexthops, the order of preference between the
different nexthops could be clearly specified using one Flowspec route
carrying an MNH that contains those nexthop addresses in the desired
preference order, so that, irrespective of IGP-cost, the receiving
speakers redirect the flow towards the same traffic collector device.
2.6. Color-Only resolution nexthop
Another piece of existing protocol machinery, which manufactures nexthop
addresses from an overloaded extended color community, is specified in
[SRTE-COLOR-ONLY]. In a way, the color field is overloaded to carry
one anycast BGP next-hop with pre-specified fallback options. This
approach provides only two next-hops to work with: the 'BGP nexthop
address' and the 'Color-only nexthop'.
Instead, the MNH could be used to achieve the same result with more
flexibility. Multiple BGP nexthops can be carried, each resolving
over a desired Transport class (Color), and with a customizable
fallback order. The solution also works for non-SRTE networks.
3. The "MultiNexthop (MNH)" BGP attribute encoding
"MultiNexthop (MNH)" is a new BGP optional non-transitive attribute
(code TBD) that can be used to convey multiple nexthops to a BGP
speaker. This attribute describes forwarding semantics using one or
more Nexthop-Forwarding-Semantics TLVs.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1 0 0 1(Flags) |Attr. Type Code| Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MNH-Flags | PNH-Len | ..Advertising|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PNH Address /32 or /128.. | Num-Nexthops |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...one or more "Nexthop-Forwarding-Semantics TLV"... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fig 1: MultiNexthop - BGP Attribute
- Flags
BGP Path-attribute flags. 1001 indicates Optional,
Non-Transitive, with an Extended-length field.
- Attr. Type Code
IANA TBD.
- Length
Two-byte field stating the length of the attribute value in bytes.
- MNH-Flags
16-bit flags field (UR..R).
Only the most significant bit (U) is currently defined; the others are reserved.
R: Reserved
U: 1 means Upstream-allocation: the attribute describes the
forwarding state desired at the receiving speaker.
0 means Downstream-allocation: the attribute describes the
forwarding state present at the advertising speaker.
- PNH-Len
Length in bits of the Advertising PNH address (32 for IPv4, 128 for IPv6).
- PNH-address
BGP Protocol Nexthop address (Len = 32 or 128) advertised in the NEXT_HOP or
MP_REACH_NLRI attribute. Used to sanity-check this attribute.
- Num-Nexthops
Number of nexthop addresses carried in the MNH.
>1 if ECMP or Alternate-paths.
Sec 3.2 describes the Nexthop-Forwarding-Semantics TLV.
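The layout in Fig 1 can be sketched as a small encoder. This is a non-normative illustration: the attribute type code is still "IANA TBD", so `MNH_ATTR_TYPE` below is a made-up placeholder, and the function name and parameters are hypothetical. The field widths (16-bit MNH-Flags, 8-bit PNH-Len, 8-bit Num-Nexthops) are read off the figure.

```python
import ipaddress
import struct

ATTR_FLAGS = 0x90     # 1001 0000: Optional, Non-Transitive, Extended-length
MNH_ATTR_TYPE = 0xFF  # placeholder only; the real code point is IANA TBD
MNH_FLAG_U = 0x8000   # most significant bit of the 16-bit MNH-Flags field

def encode_mnh(pnh: str, upstream: bool, num_nexthops: int, tlvs: bytes = b"") -> bytes:
    """Build the MNH path attribute around already-encoded
    Nexthop-Forwarding-Semantics TLVs."""
    addr = ipaddress.ip_address(pnh)
    mnh_flags = MNH_FLAG_U if upstream else 0
    value = (
        struct.pack("!HB", mnh_flags, addr.max_prefixlen)  # MNH-Flags, PNH-Len in bits
        + addr.packed                                      # Advertising PNH address
        + struct.pack("!B", num_nexthops)                  # Num-Nexthops
        + tlvs
    )
    # Extended-length attributes carry a two-byte Length of the value.
    return struct.pack("!BBH", ATTR_FLAGS, MNH_ATTR_TYPE, len(value)) + value
```

For an IPv4 PNH and no TLVs, the value part is 8 bytes long (2 + 1 + 4 + 1); for IPv6 it is 20 bytes.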
3.1. Operations
3.1.1. BGP Capability for MNH attribute
A new BGP capability [RFC3392] called "MultiNexthop (MNH)" is defined
with type code: IANA TBD. The MNH attribute MUST NOT be sent to a
BGP speaker that has not advertised the MNH capability. A BGP speaker
MUST ignore the MNH attribute received from a peer which has not
advertised the MNH capability.
3.1.2. Scope of use, and propagation
The MNH attribute is intended to be used in a BGP-free core, between
egress and ingress BGP speakers that understand this attribute.
Also, unintentionally leaking it to another AS over an EBGP session,
via a BGP speaker that does not understand the MNH attribute, needs
to be avoided.
To achieve this, the attribute is defined as "optional non-
transitive", and uses a new BGP capability. If a MNH-attribute is
received by a PE BGP-speaker that does not understand it, the
optional non-transitive nature avoids unintentionally propagating it
towards EBGP-peers.
This also means that a RR needs to be upgraded to support this
attribute before any PEs in the network can make use of it. When a
RR receives the MNH-attribute from a client that supports the
attribute, it propagates the attribute as-is when reflecting the
route with nexthop unchanged.
When a BGP speaker receives the MNH-attribute from another speaker
that did not advertise support of the attribute, the attribute is
ignored.
The MNH capability provides additional protection against
receiving this attribute from EBGP peers when not intended.
3.1.3. Interaction of MNH with Nexthop (in attr-code 3, 14)
When adding a MultiNexthop attribute to an advertised BGP route, the
speaker MUST put the same next-hop address in the Advertising PNH
field as it put in the Nexthop field inside NEXT_HOP attribute or
MP_REACH_NLRI attribute. Any speaker that recognizes this attribute
and changes the PNH while re-advertising the route MUST remove the
MultiNexthop-Attribute in the re-advertisement. The speaker MAY
however add a new MultiNexthop-Attribute to the re-advertisement;
while doing so the speaker MUST record in the "Advertising-PNH" field
the same next-hop address as used in NEXT_HOP field or MP_REACH_NLRI
attribute.
A speaker receiving a MNH attribute SHOULD ignore it if the next-hop
address contained in Advertising-PNH field is not the same as the
next-hop address contained in NEXT_HOP field or MP_REACH_NLRI field.
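The rules of this section can be sketched as two checks, one on the receiving side and one when re-advertising. This is a minimal illustration, assuming a route is modeled as a plain dictionary with hypothetical `nexthop` and `mnh` keys; none of these names come from this document.

```python
import ipaddress

def pnh_matches(advertising_pnh: str, route_nexthop: str) -> bool:
    """True if the Advertising-PNH equals the NEXT_HOP / MP_REACH_NLRI nexthop."""
    return ipaddress.ip_address(advertising_pnh) == ipaddress.ip_address(route_nexthop)

def accept_mnh(route: dict, advertising_pnh: str) -> bool:
    """Receiver side: the MNH SHOULD be ignored when the two addresses differ."""
    return pnh_matches(advertising_pnh, route["nexthop"])

def readvertise(route: dict, new_nexthop: str) -> dict:
    """Sender side: a speaker that changes the PNH MUST remove the MNH attribute.
    It MAY then attach a new MNH whose Advertising-PNH equals new_nexthop."""
    out = dict(route, nexthop=new_nexthop)
    if not pnh_matches(new_nexthop, route["nexthop"]):
        out.pop("mnh", None)
    return out
```

Comparing parsed addresses rather than strings avoids false mismatches between different textual forms of the same IPv6 address.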
3.1.4. Interaction with Addpath
[ADDPATH-GUIDELINES] suggests the following:
"Diverse path: A BGP path associated with a different BGP next-hop
and BGP router than some other set of paths. The BGP router
associated with a path is inferred from the ORIGINATOR_ID attribute
or, if there is none, the BGP Identifier of the peer that advertised
the path."
When selecting "diverse paths" for ADD_PATH as specified above, the
MNH attribute should also be compared if it exists, to determine if
two routes have "different BGP next-hop".
3.1.5. Path-selection considerations
When tie-breaking during path selection as described in RFC 4271,
Section 9.1.2.2, step (e) (the "IGP cost to nexthop"), use the
highest cost among the nexthop-legs present in this attribute.
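As a worked illustration of this rule, the sketch below treats the worst (highest) leg cost as the route's IGP cost and then prefers the route with the lower resulting cost. The function names and the dictionary of per-leg costs are hypothetical.

```python
def mnh_igp_cost(leg_costs: list[int]) -> int:
    """IGP cost used at step (e): the highest cost among the nexthop-legs."""
    return max(leg_costs)

def best_by_igp_cost(routes: dict[str, list[int]]) -> str:
    """Among candidate routes, prefer the one with the lowest effective IGP cost."""
    return min(routes, key=lambda name: mnh_igp_cost(routes[name]))

# r1 has legs at cost 10 and 50 (effective 50); r2 has two legs at cost 30.
candidates = {"r1": [10, 50], "r2": [30, 30]}
```

With these inputs r2 wins the tie-break even though r1 contains the single cheapest leg, because r1 is judged by its most expensive leg.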
3.1.6. NH-Flags U bit, denoting upstream/downstream semantics
The U-bit being Set indicates that this attribute describes what the
forwarding semantics of an Upstream-allocated label at the receiving
speaker should be. All other bits in NH-Flags are currently
reserved; they MUST be set to 0 by the sender and MUST be ignored by
the receiver.
This attribute can be used for both labeled and unlabeled BGP
families.
A MultiNexthop attribute with U=0 is said to play the "Label Descriptor"
role. A BGP speaker advertising a downstream-allocated label-route MAY add
this attribute to the BGP route Update, to "describe" to the
receiving speaker what the label's forwarding semantics at the
sending speaker are.
Today, the semantics of a downstream-allocated label are known only to
the egress node advertising the label. The speaker receiving the label
binding doesn't know what the label's forwarding semantics at the
advertiser are. In some environments, it may be useful to convey this
information to the receiving speaker. This may help in better
debugging and manageability, or enable the receiving speaker, which
could also be some centralized controller, to make better decisions
about which label to use, based on the label's forwarding semantics.
While doing upstream-label allocation, this attribute (U-bit Set) can
be used to convey what the forwarding semantics at the receiving node
should be. Details of the BGP protocol extensions required for
signaling upstream-label allocation are out of scope of this
document, and are described in [MPLS-NAMESPACES].
In the rest of this document, the term "Label" means a
downstream-allocated label, unless specified otherwise as an
upstream-allocated label.
When using the MultiNexthop attribute for IP routes, the U-bit is Set,
since IP prefixes are by nature upstream allocated.
3.2. Nexthop Forwarding Semantics TLV
Each Forwarding-Semantics TLV expresses a nexthop leg's forwarding
action, i.e. a "FwdAction" with an associated Nexthop. The types of
action defined by this TLV are given below. The "Nexthop-Leg" field
takes appropriate values based on the FwdAction.
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| FwdAction | Len |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...Nexthop-Leg Descriptor-TLV... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fig 2: Nexthop Forwarding Semantics TLV
FwdAction Meaning
--------- -------------
1 Forward
2 Pop-And-Forward
3 Swap
4 Push
5 Pop-And-Lookup
6 Replicate
- Len
Length of Nexthop Forwarding Semantics TLV including all
Nexthop-Leg Descriptor TLVs.
Meaning of most of the above FwdAction semantics is well understood.
FwdAction 1 is applicable for both IP and MPLS routes. FwdActions
2-5 are applicable for MPLS routes only. FwdActions 1 and 6 are
applicable for Flowspec routes for Redirect and Mirror actions.
The "Forward" action means forward the IP/MPLS packet with the
destination prefix (IP-dest-addr/MPLS-label) value unchanged. For IP
routes, this is the forwarding-action given for next-hop addresses
contained in BGP path-attributes: Nexthop (code 3) or MP_REACH_NLRI
(code 14). For MPLS routes, usage of this action is equivalent to
SWAP with same label-value; one such usage is explained in
[MPLS-NAMESPACES] when Upstream-label-allocation is in use.
The "Pop-And-Forward" action means Pop the MPLS-label and forward the
payload towards the Nexthop IP-address specified in the sub-TLV,
using appropriate encapsulation to reach the Nexthop.
The "Pop-And-Lookup" action may result in a MPLS-lookup or an upper-
layer header (like IPv4, IPv6) lookup, depending on whether the label
that was popped was the bottom of stack label.
If an incompatible FwdAction is received for a prefix-type, or an
unsupported FwdAction is received, it is considered a semantic error
and MUST be dealt with as explained in section 4.
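The applicability rules stated in this section can be captured in a small lookup. This is a sketch of one reading of the text above, not a normative table: the route-type keys ("ip", "mpls", "flowspec") and the function name are hypothetical, and a real implementation may support fewer actions.

```python
from enum import IntEnum

class FwdAction(IntEnum):
    FORWARD = 1
    POP_AND_FORWARD = 2
    SWAP = 3
    PUSH = 4
    POP_AND_LOOKUP = 5
    REPLICATE = 6

# FwdAction 1: IP and MPLS routes. FwdActions 2-5: MPLS routes only.
# FwdActions 1 and 6: Flowspec routes (Redirect and Mirror).
ALLOWED = {
    "ip": {FwdAction.FORWARD},
    "mpls": {
        FwdAction.FORWARD,
        FwdAction.POP_AND_FORWARD,
        FwdAction.SWAP,
        FwdAction.PUSH,
        FwdAction.POP_AND_LOOKUP,
    },
    "flowspec": {FwdAction.FORWARD, FwdAction.REPLICATE},
}

def is_semantic_error(route_type: str, action_code: int) -> bool:
    """True for an unknown FwdAction, or one incompatible with the route type."""
    try:
        action = FwdAction(action_code)
    except ValueError:
        return True  # unsupported FwdAction code
    return action not in ALLOWED[route_type]
```

A receiver would run such a check after syntactic parsing succeeds, and hand any failure to the error handling procedures instead of resetting the session.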
3.3. Nexthop-Leg Descriptor TLV
The Nexthop-Leg Descriptor TLV describes various attributes of the
Nexthop-legs that the FwdAction is associated with.
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| NhopDescrType | Len |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Flags | Relative-Preference |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ..Nexthop Attributes SubTLV.. |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ..Nexthop Attributes SubTLV.. |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fig 3: Nexthop-Leg Descriptor TLV
NhopDescrType Meaning
------------- ---------
1 IPv4-nexthop
2 IPv6-nexthop
3 Labeled-IP-Nexthop
4 Forwarding-Context-Nexthop
- Len (2 octets)
Length in bytes of Nexthop-Leg Descriptor TLV, including Flags, Relative-Preference and all
Nexthop Attributes SubTLVs.
- Flags
2 octets. Must send zero. Must ignore on receive.
- Relative-Preference
Unsigned 2 octet integer specifying relative order or
preference, to use in FIB. Install in the FIB all usable legs with the
lowest Relative-Preference value. If multiple legs share that value,
form ECMP.
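The Relative-Preference rule can be sketched as a selection over the received legs. The `Leg` class and its `usable` flag (for example driven by nexthop resolution or liveness detection) are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Leg:
    nexthop: str
    relative_preference: int
    usable: bool = True

def active_legs(legs: list[Leg]) -> list[Leg]:
    """Return the legs to install in the FIB: all usable legs that share the
    lowest Relative-Preference value. More than one result means ECMP."""
    usable = [leg for leg in legs if leg.usable]
    if not usable:
        return []
    best = min(leg.relative_preference for leg in usable)
    return [leg for leg in usable if leg.relative_preference == best]
```

With PE1 at preference 1 and PE2 at preference 2, only PE1 is installed; if PE1 becomes unusable, the same function returns PE2, which gives the primary/backup behavior from use case 2.1.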
3.4. Nexthop Attributes Sub-TLV
SubTLV type Meaning
----------- ----------
1 IP-Address
2 Labeled-IP-Nexthop
3 Transport Class ID (Color)
4 Bandwidth
5 Load-Balance-Factor
6 Forwarding-context Name
7 Forwarding-context Route-Target
3.4.1. IP Address
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Attr SubTLV Type = 1 | Len (2 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Flags (2 bytes) | PfxLen | ..IPv4 or |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| IPv6 Address .. |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- Len (2 octets)
Length in bytes of remaining portion of SubTLV.
- Flags
2 octets. Must send zero. Must ignore on receive.
- PfxLen (1 octet)
Length in bits of Nexthop IP-address (32 or 128)
- IPv4 or IPv6 Address
Remaining bytes in sub-TLV are the 32 bit or 128 bit Nexthop address.
Fig 4: IP-Address attribute sub-TLV
This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
with FwdAction of Pop-And-Forward or Forward.
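The IP-Address sub-TLV in Fig 4 can be sketched as follows. The function name is hypothetical; the Len field is computed as the number of bytes that follow it (Flags, PfxLen and the address), per the field description above.

```python
import ipaddress
import struct

SUBTLV_IP_ADDRESS = 1

def encode_ip_address_subtlv(nexthop: str) -> bytes:
    """Encode an IPv4 or IPv6 nexthop as an IP-Address attribute sub-TLV."""
    addr = ipaddress.ip_address(nexthop)
    body = (
        struct.pack("!HB", 0, addr.max_prefixlen)  # Flags (zero), PfxLen in bits
        + addr.packed                              # 4-byte or 16-byte address
    )
    return struct.pack("!HH", SUBTLV_IP_ADDRESS, len(body)) + body
```

For an IPv4 nexthop the Len value is 7 (2 + 1 + 4), and for IPv6 it is 19 (2 + 1 + 16).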
3.4.2. Labeled IP nexthop
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Attr SubTLV Type = 2 | Len (2 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Flags (2 bytes) | Label (20 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |Rsrv |S| PfxLen | ..IPv4 or IPv6 Address .. |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- Len (2 octets)
Length in bytes of remaining portion of SubTLV.
- Flags (2 octets):
ELC (MSB bit): indicates if this egress NH is Entropy Label Capable.
Remaining bits are Reserved. Must send zero. Must ignore on receive.
- Label:
The Label field is a 20-bit field containing an MPLS label value
(see [RFC3032]).
- Rsrv:
This 3-bit field SHOULD be set to zero on transmission and
MUST be ignored on reception.
- S:
This 1-bit field MUST be set to one on last label being pushed.
- PfxLen (1 octet)
Length in bits of Nexthop IP-address (32 or 128)
- IPv4 or IPv6 Address
Remaining bytes in sub-TLV are the 32 bit or 128 bit Nexthop address.
Fig 5: "Labeled nexthop" attribute sub-TLV
This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
with FwdAction of Swap or Push.
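The bit packing in Fig 5 can be sketched as below. The function name and keyword arguments are hypothetical. The 20-bit Label, 3-bit Rsrv and 1-bit S fields are assumed to share three bytes in the same way as the RFC 3032 label layout, which is how the figure reads.

```python
import ipaddress
import struct

SUBTLV_LABELED_IP_NEXTHOP = 2
FLAG_ELC = 0x8000  # most significant bit of Flags: Entropy Label Capable

def encode_labeled_nexthop_subtlv(label: int, nexthop: str,
                                  bottom_of_stack: bool = True,
                                  elc: bool = False) -> bytes:
    """Encode a label plus IPv4/IPv6 nexthop as a Labeled-IP-Nexthop sub-TLV."""
    if not 0 <= label < 2**20:
        raise ValueError("MPLS label must fit in 20 bits")
    addr = ipaddress.ip_address(nexthop)
    # Label (20 bits) | Rsrv (3 bits, zero) | S (1 bit)
    label_field = (label << 4) | (1 if bottom_of_stack else 0)
    body = (
        struct.pack("!H", FLAG_ELC if elc else 0)
        + label_field.to_bytes(3, "big")
        + struct.pack("!B", addr.max_prefixlen)
        + addr.packed
    )
    return struct.pack("!HH", SUBTLV_LABELED_IP_NEXTHOP, len(body)) + body
```

Label 100 with the S bit set encodes to the three bytes 00 06 41, and the Len value for an IPv4 nexthop is 10 (2 + 3 + 1 + 4).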
3.4.3. Transport Class ID (Color)
The Nexthop can be associated with a Transport Class, so as to
resolve a path that satisfies required Transport tunnel
characteristics. Transport Class is defined in [BGP-CT].
Transport Class is a per-nexthop scoped attribute. Without MNH, the
Transport class is applied to the nexthop IP-address encoded in the
BGP-Nexthop attribute (code 3), or inside the MP_REACH attribute
(code 14). With MNH, the Transport Class can be specified per
Nexthop-Leg TLV. It is applied to the IP-address encoded in the
Nexthop Attribute Sub-TLVs of type "IP Address", "Labeled IP
nexthop".
The format of the Transport Class ID Sub-TLV is as follows:
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Attr SubTLV Type = 3 | Len (2 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Transport Class ID (4 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- Len (2 octets)
Length in bytes of remaining portion of SubTLV.
- Transport Class ID (Color):
This is a 32 bit identifier, associated with the Nexthop address.
The Nexthop specified in "IP-address or Labeled Nexthop" TLVs
are resolved over tunnels of this color.
Defined in [BGP-CT] [draft-kaliraj-idr-bgp-classful-transport-planes]
Fig 6: "Transport Class ID (Color)" attribute sub-TLV
This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
with FwdAction of Forward, Swap or Push.
3.4.4. Available Bandwidth
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Attr SubTLV Type = 4 | Len (2 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Bandwidth (8 octets) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Bandwidth (contd.) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- Len (2 octets)
Length in bytes of remaining portion of SubTLV.
- Bandwidth
The bandwidth of the link expressed as 8 octets,
units being bits per second.
Fig 6: "Bandwidth" attribute sub-TLV
This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
with FwdAction of Forward, Swap or Push.
This sub-TLV would also be valid in a Label-Descriptor-attribute
whose U-bit is reset.
3.4.5. Load balance factor
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Attr SubTLV Type = 5 | Len (2 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Balance Percentage |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- Len (2 octets)
Length in bytes of remaining portion of SubTLV.
- Balance Percentage:
This is the explicit "balance percentage" requested by the sender,
for unequal load-balancing over these Nexthop-Descriptor-TLV legs.
This balance percentage would override the implicit
balance-percentage calculated using "Bandwidth" attribute
sub-TLV.
Fig 7: "Load-Balance-Factor" attribute sub-TLV
This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
with FwdAction of Forward, Swap or Push.
This is the explicit "balance percentage" requested by the sender,
for unequal load-balancing over these Nexthop-Descriptor-TLV legs.
This balance percentage would override the implicit
balance-percentage calculated using the "Bandwidth" attribute sub-TLV.
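The relationship between the "Bandwidth" and "Load-Balance-Factor" sub-TLVs can be sketched as follows. This is an illustration under assumptions: legs are modeled as dictionaries with hypothetical `bandwidth` (bits per second) and optional `balance` keys, the implicit percentage is assumed to be each leg's share of the total bandwidth, and the explicit values are used only when every leg carries one, since the mixed case is not specified here.

```python
def balance_percentages(legs: list[dict]) -> list[float]:
    """Return the load-balancing percentage for each nexthop leg."""
    # Explicit Load-Balance-Factor overrides the bandwidth-derived split.
    if all(leg.get("balance") is not None for leg in legs):
        return [float(leg["balance"]) for leg in legs]
    # Otherwise derive an implicit split from the advertised bandwidths.
    total = sum(leg["bandwidth"] for leg in legs)
    return [100.0 * leg["bandwidth"] / total for leg in legs]

# Two legs over a 10 Gbit/s link and a 30 Gbit/s link.
implicit = [{"bandwidth": 10_000_000_000}, {"bandwidth": 30_000_000_000}]
# The same links, but the sender explicitly asks for a 50/50 split.
explicit = [
    {"bandwidth": 10_000_000_000, "balance": 50},
    {"bandwidth": 30_000_000_000, "balance": 50},
]
```

With the first input the split follows the 1:3 bandwidth ratio; with the second, the sender's requested percentages take precedence.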
3.4.6. Forwarding-context name
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Attr SubTLV Type = 6 | Len (2 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| NameLen (2 octets) | ..Fwd-Context-name...(unicode)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- Len (2 octets)
Length in bytes of remaining portion of SubTLV.
- NameLen (2 octets)
Length in bytes of Fwd-Context-Name
- Forwarding Context Name:
Name of forwarding context (e.g. VRF-name) where lookup should happen.
Fig 8: Forwarding-Context name attribute sub-TLV
This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
with FwdAction of Pop-And-Lookup. Ref: usecase 2.3. The Forwarding-context
name identifies the forwarding-context (e.g. the VRF name)
where the lookup should happen after the label is popped.
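The Forwarding-context name sub-TLV in Fig 8 can be sketched as below. The figure only says the name is "unicode"; UTF-8 is an assumption made for this example, and the function name is hypothetical.

```python
import struct

SUBTLV_FWD_CONTEXT_NAME = 6

def encode_fwd_context_name_subtlv(name: str) -> bytes:
    """Encode a forwarding-context (e.g. VRF) name as an attribute sub-TLV."""
    raw = name.encode("utf-8")                  # encoding is an assumption
    body = struct.pack("!H", len(raw)) + raw    # NameLen, then the name bytes
    return struct.pack("!HH", SUBTLV_FWD_CONTEXT_NAME, len(body)) + body
```

NameLen counts only the name bytes, while Len counts everything after the Len field, so Len is always NameLen + 2 in this layout.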
3.4.7. Forwarding-context Route-Target
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Attr SubTLV Type = 7 | Len (2 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type (2 octets) | ...Route Target... (8 octets)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ..Route Target... (continued) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|...Route Target... (8 octets) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- Len (2 octets)
Length in bytes of remaining portion of SubTLV.
- Type:
value of 1 indicates Route Target follows.
- Route Target:
Import Route Target of the forwarding context
(e.g. VRF-name) where lookup should happen.
Fig 9: "Route-Target identifying the Forwarding-Context" attribute
sub-TLV
This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
with FwdAction of Pop-And-Lookup. Ref: usecase 2.3. The Route
Target identifies the forwarding-context (e.g. VRF) where the
lookup should happen after the label is popped.
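The layout in Fig 9 can be sketched as below. The function name is
hypothetical and network byte order is assumed. The example Route
Target is built as a two-octet-AS-specific extended community
(type 0x00, sub-type 0x02); the AS number 65000 and local value 100
are purely illustrative.

```python
import struct

FWD_CONTEXT_RT_SUBTLV = 7  # "Attr SubTLV Type = 7" in Fig 9
RT_FOLLOWS = 1             # Type value 1: Route Target follows


def encode_fwd_context_rt(route_target: bytes) -> bytes:
    if len(route_target) != 8:
        raise ValueError("Route Target must be exactly 8 octets")
    # Value portion: Type (2 octets) followed by the 8-octet RT.
    value = struct.pack("!H", RT_FOLLOWS) + route_target
    # Len covers the remaining portion of the sub-TLV (10 octets).
    return struct.pack("!HH", FWD_CONTEXT_RT_SUBTLV, len(value)) + value


# Illustrative Route Target 65000:100 as an 8-octet extended community.
rt = struct.pack("!BBHI", 0x00, 0x02, 65000, 100)
wire = encode_fwd_context_rt(rt)
```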
If any of these sub-TLVs or FwdAction combinations is unrecognized
or unsupported by a receiving speaker, it is considered a semantic
error for that speaker, and in such a case the error-handling
procedures described in section 4 should be followed.
4. Error handling procedures
When U-bit is Reset, this attribute is used to describe the label
advertised by the BGP-peer. If the value in the attribute is
syntactically parse-able, but not semantically valid, the receiving
speaker should deal with the error gracefully and MUST NOT tear down
the BGP session. In such cases the rest of the BGP-update can be
consumed if possible.
When U-bit is Set, this attribute is used to specify the forwarding
action at the receiving BGP-peer. If the value in the attribute is
syntactically parse-able, but not semantically valid, the receiving
speaker SHOULD deal with the error gracefully by ignoring the MNH
attribute, and continue processing the route. It MUST NOT tear down
the BGP session.
If a MNH with U-bit Reset is received for an IP-route (SAFI Unicast),
the MNH attribute SHOULD be ignored, because IP route prefixes are
upstream allocated by nature.
If a MNH with U-bit Reset is received for an [MPLS-NAMESPACES] route,
the MNH attribute SHOULD be ignored, because the label prefix in
MPLS-NAMESPACE family routes is upstream allocated.
The receiving BGP speaker MAY consider the "Num-Nexthop" value in a
MNH attribute (U-bit Set) not acceptable, based on its forwarding
capabilities. In such cases, the MNH attribute SHOULD be considered
unusable and ignored on receipt. The condition SHOULD be dealt with
gracefully, and the speaker MUST NOT tear down the BGP session.
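The rules in this section can be summarized by the following sketch.
All names are hypothetical. Modeling the graceful handling of a
semantic error with U-bit Reset as "ignore the attribute" is an
assumption; the text above only requires that the session is not
torn down. Note that no outcome resets the session.

```python
from enum import Enum


class Disposition(Enum):
    USE_MNH = "use"
    IGNORE_MNH = "ignore"  # session stays up, route processing continues


def mnh_disposition(u_bit_set: bool,
                    semantically_valid: bool,
                    prefix_upstream_allocated: bool,
                    num_nexthops: int,
                    max_nexthops: int) -> Disposition:
    # Parse-able but semantically invalid: handle gracefully.
    if not semantically_valid:
        return Disposition.IGNORE_MNH
    # U-bit Reset on an upstream-allocated prefix (IP unicast or
    # MPLS-NAMESPACES route): the attribute is ignored.
    if not u_bit_set and prefix_upstream_allocated:
        return Disposition.IGNORE_MNH
    # U-bit Set with more nexthops than the forwarding plane accepts.
    if u_bit_set and num_nexthops > max_nexthops:
        return Disposition.IGNORE_MNH
    return Disposition.USE_MNH


result = mnh_disposition(u_bit_set=True, semantically_valid=True,
                         prefix_upstream_allocated=False,
                         num_nexthops=16, max_nexthops=8)
```

In this example "result" is Disposition.IGNORE_MNH, because the
advertised Num-Nexthop exceeds the locally acceptable limit.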
5. Scaling considerations
The MNH attribute allows receiving multiple nexthops on the same BGP
session. This flexibility also opens up the possibility that a peer
can send a large number of multipath (ECMP/UCMP/FRR) nexthops that
may overwhelm the local system's forwarding plane. Prefix-limit based
checks will not avoid this situation.
To keep the scaling limits in check, a BGP speaker MAY keep count of
the number of unique multipath nexthops that are received from a BGP
peer, and impose a configurable max-limit on that count. This is
especially useful for EBGP peers.
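A minimal sketch of such per-peer accounting is given below. The
class and method names are hypothetical. Treating each distinct
ordered set of nexthop legs as one "unique multipath nexthop" is an
assumption, and so is the choice to simply report rejection; this
document does not specify the action taken when the limit is hit.

```python
class MultipathNexthopLimiter:
    """Counts unique multipath nexthops per peer against a max-limit."""

    def __init__(self, max_unique: int):
        self.max_unique = max_unique  # configurable per-peer limit
        self._seen = {}               # peer -> set of nexthop-leg tuples

    def admit(self, peer: str, nexthop_legs) -> bool:
        key = tuple(nexthop_legs)
        seen = self._seen.setdefault(peer, set())
        if key in seen:
            return True               # already accounted for
        if len(seen) >= self.max_unique:
            return False              # limit reached for this peer
        seen.add(key)
        return True


limiter = MultipathNexthopLimiter(max_unique=2)
first = limiter.admit("peer1", ["10.0.0.1", "10.0.0.2"])
second = limiter.admit("peer1", ["10.0.0.1", "10.0.0.3"])
third = limiter.admit("peer1", ["10.0.0.2", "10.0.0.3"])
```

With a limit of 2, the first two combinations from "peer1" are
admitted and the third is rejected, while other peers are counted
independently.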
A good scaling property of conveying multipath nexthops using the MNH
attribute with N nexthop legs on one BGP session, as opposed to BGP
routes received on N BGP sessions, is that it avoids the transitory
multipath combinatorial state that arises in the latter model.
Because the final multipath state is conveyed by one route update in
a deterministic manner, no transitory multipath combinatorial
explosion is created during establishment of N sessions.
6. IANA Considerations
This document requests IANA to allocate the following code in the
BGP attributes registry.
1. MultiNexthop (MNH) BGP-attribute: A new BGP attribute code TBD.
This document requests IANA to create the following sub-registries
for the MNH attribute:
1. "FwdAction" type as defined in 3.1.
2. Nexthop-Leg Descriptor TLV: "NhopDescrType" as defined in 3.2.
3. "Nexthop Attributes Sub-TLV type" as defined in 3.3.
This document requests IANA to allocate a BGP capability code TBD
for the MNH attribute.
Note to RFC Editor: this section may be removed on publication as an
RFC.
7. Security Considerations
The attribute is defined as an optional non-transitive BGP attribute,
so that it does not accidentally get propagated or leaked via BGP
speakers that do not support this feature and, in particular, does
not unintentionally leak across EBGP boundaries.
8. Acknowledgements
Thanks to Robert Raszuk, Gyan Mishra and Ron Bonica for the review,
discussions and input to the draft.
9. References
9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC3392] Chandra, R. and J. Scudder, "Capabilities Advertisement
with BGP-4", RFC 3392, DOI 10.17487/RFC3392, November
2002, <https://www.rfc-editor.org/info/rfc3392>.
9.2. References
[ADDPATH-GUIDELINES]
Uttaro, Ed., "BGP Flow-Spec Redirect to IP Action", 25
April 2016, <https://datatracker.ietf.org/doc/html/draft-
ietf-idr-add-paths-guidelines-08#section-2>.
[BGP-CT] Vairavakkalai, Ed., "BGP Classful Transport Planes", 25
August 2021, <https://datatracker.ietf.org/doc/draft-
kaliraj-idr-bgp-classful-transport-planes/12/>.
[FLWSPC-REDIR-IP]
Simpson, Ed., "BGP Flow-Spec Redirect to IP Action", 2
February 2015, <https://datatracker.ietf.org/doc/html/
draft-ietf-idr-flowspec-redirect-ip#section-3>.
[MPLS-NAMESPACES]
Vairavakkalai, Ed., "BGP signalled MPLS-namespaces", 28
December 2021, <https://datatracker.ietf.org/doc/html/
draft-kaliraj-bess-bgp-sig-private-mpls-labels-04>.
[SRTE-COLOR-ONLY]
Filsfils, Ed., "BGP Flow-Spec Redirect to IP Action", 21
February 2018, <https://tools.ietf.org/html/draft-
filsfils-spring-segment-routing-policy-06#section-8.8.1>.
Authors' Addresses
Kaliraj Vairavakkalai
Juniper Networks, Inc.
1194 N. Mathilda Ave.
Sunnyvale, CA 94089
United States of America
Email: kaliraj@juniper.net
Minto Jeyananth
Juniper Networks, Inc.
1194 N. Mathilda Ave.
Sunnyvale, CA 94089
United States of America
Email: minto@juniper.net
Gyan Mishra
Verizon Communications Inc.
13101 Columbia Pike
Silver Spring, MD 20904
United States of America
Email: gyan.s.mishra@verizon.com