Network Working Group K. Patel
Internet-Draft A. Lindem
Intended status: Standards Track A. Roy
Expires: January 9, 2017 D. Yeung
V. Venugopal
Cisco Systems
July 8, 2016
Shortest Path Routing Extensions for BGP Protocol
draft-keyupate-idr-bgp-spf-00.txt
Abstract
Many Massively Scaled Data Centers (MSDCs) have converged on
simplified layer 3 routing. Furthermore, requirements for
operational simplicity have led many of these MSDCs to converge on
BGP as their single routing protocol for both their fabric routing
and their Data Center Interconnect (DCI) routing. This document
describes a solution that leverages BGP Link-State distribution and
a Shortest Path First (SPF) algorithm similar to that used by
Interior Gateway Protocols (IGPs) such as OSPF.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 9, 2017.
Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
Patel, et al. Expires January 9, 2017 [Page 1]
Internet-Draft BGP Protocol SPF Extensions July 2016
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Table of Contents
1. Introduction
1.1. Requirements Language
2. BGP Peering Models
2.1. BGP Single-Hop Peering on Network Node Connections
2.2. BGP Peering Between Directly Connected Network Nodes
2.3. BGP Peering in Route-Reflector or Controller Topology
3. Extensions to BGP-LS
3.1. Node NLRI Usage and Modifications
3.2. Link NLRI Usage
3.3. Prefix NLRI Usage
4. Shortest Path Routing (SPF) Capability
5. Decision Process with SPF Algorithm
5.1. Impact on BGP Tie-breaking attributes
5.2. Dual Stack Support
5.3. NEXT_HOP Manipulation
5.4. Error Handling
6. IANA Considerations
7. Security Considerations
7.1. Acknowledgements
8. References
8.1. Normative References
8.2. Information References
Authors' Addresses
1. Introduction
Many Massively Scaled Data Centers (MSDCs) have converged on
simplified layer 3 routing. Furthermore, requirements for
operational simplicity have led many of these MSDCs to converge on
BGP [RFC4271] as their single routing protocol for both their fabric
routing and their Data Center Interconnect (DCI) routing.
Requirements and procedures for using BGP are described in
[I-D.ietf-rtgwg-bgp-routing-large-dc]. This document describes an
alternative solution that leverages BGP-LS [RFC7752] and a Shortest
Path First (SPF) algorithm similar to that used by Interior Gateway
Protocols (IGPs) such as OSPF [RFC2328].
[RFC4271] defines the Decision Process that is used to select routes
for subsequent advertisement by applying the policies in the local
Policy Information Base (PIB) to the routes stored in a BGP speaker's
Adj-RIBs-In. The output of the Decision Process is the set of routes
that are announced by a BGP speaker to its peers. The speaker stores
these selected routes in its Adj-RIBs-Out according to policy.
[RFC7752] describes a mechanism by which link-state and TE
information can be collected from networks and shared with external
components using BGP. This is achieved by defining an NLRI carried
within the BGP-LS AFI and BGP-LS SAFIs. The BGP-LS extensions defined
in [RFC7752] make use of the Decision Process defined in [RFC4271].
This draft modifies [RFC7752] by replacing its use of the existing
Decision Process; in particular, the Phase 1 and 2 decision functions
of the Decision Process are replaced with the Shortest Path First
(SPF) algorithm, also known as Dijkstra's algorithm. The Phase 3
decision function is also simplified since it is no longer dependent
on the previous phases. This solution combines the benefits of both
BGP and SPF-based IGPs. These include TCP-based flow control, no
periodic link-state refresh, and completely incremental NLRI
advertisement. These advantages can reduce the overhead in MSDCs
where there is a high degree of Equal-Cost Multi-Path (ECMP) routing
and the topology is very stable. Additionally, using an SPF-based
computation can support fast convergence and the computation of
Loop-Free Alternates (LFAs) [RFC5286] in the event of link failures.
Furthermore, a BGP-based solution lends itself to multiple peering
models, including those incorporating route reflectors [RFC4456] or
controllers.
Support for Multiple Topology Routing (MTR) as described in [RFC4915]
is an area for further study dependent on deployment requirements.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. BGP Peering Models
Depending on the requirements, scaling, and capabilities of the BGP
speakers, various peering models are supported. The only requirement
is that all BGP speakers in the BGP SPF routing domain receive link-
state NLRI on a timely basis, run an SPF calculation, and update
their data plane appropriately. The content of the Link NLRI is
described in Section 3.2.
2.1. BGP Single-Hop Peering on Network Node Connections
The simplest peering model is the one described in Section 5.2.1 of
[I-D.ietf-rtgwg-bgp-routing-large-dc]. In this model, EBGP single-hop
sessions are established over direct point-to-point links
interconnecting the network nodes. For the purposes of BGP SPF, Link
NLRI is only advertised if a single-hop BGP session has been
established, the Link-State address family capability has been
exchanged, and the SPF capability has been exchanged on the
corresponding session. If the session goes down, the corresponding
Link NLRI will be withdrawn.
2.2. BGP Peering Between Directly Connected Network Nodes
In this model, BGP speakers peer with all directly connected network
nodes, but the sessions may be multi-hop, and the direct connection
discovery and liveness detection for those connections are
independent of BGP. How this is accomplished is outside the scope of
this document. Consequently, there will be a single session even if
there are multiple direct connections between BGP speakers. For the
purposes of BGP SPF, Link NLRI is advertised as long as a BGP session
has been established, the Link-State address family capability has
been exchanged, the SPF capability has been exchanged, and the
corresponding link is up and considered operational.
2.3. BGP Peering in Route-Reflector or Controller Topology
In this model, BGP speakers peer solely with one or more Route
Reflectors [RFC4456] or controllers. As in the previous model,
direct connection discovery and liveness detection for those
connections are done outside of BGP. For the purposes of
BGP SPF, Link NLRI is advertised as long as the corresponding link is
up and considered operational.
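The advertisement rules of the three peering models above can be summarized as a single predicate. The following Python sketch is illustrative only: the `PeeringModel` and `LinkState` types and the `should_advertise_link_nlri()` function are hypothetical names, and how link state is learned in the second and third models remains outside the scope of this document.

```python
from dataclasses import dataclass
from enum import Enum


class PeeringModel(Enum):
    SINGLE_HOP = "2.1"        # EBGP single-hop session over each link
    DIRECT_NEIGHBOR = "2.2"   # one (possibly multi-hop) session per neighbor
    RR_OR_CONTROLLER = "2.3"  # sessions only to route reflectors/controllers


@dataclass
class LinkState:
    session_established: bool = False  # BGP session to the neighbor is Established
    ls_family_exchanged: bool = False  # Link-State address family capability exchanged
    spf_cap_exchanged: bool = False    # SPF capability exchanged
    link_operational: bool = False     # link up, as detected outside of BGP


def should_advertise_link_nlri(model: PeeringModel, s: LinkState) -> bool:
    """Return True while Link NLRI for this link should be advertised."""
    session_ok = (s.session_established and s.ls_family_exchanged
                  and s.spf_cap_exchanged)
    if model is PeeringModel.SINGLE_HOP:
        # The session runs over the link itself, so session state tracks the link.
        return session_ok
    if model is PeeringModel.DIRECT_NEIGHBOR:
        return session_ok and s.link_operational
    # Route-reflector/controller model: only the link state matters.
    return s.link_operational
```

When the predicate changes from true to false, the corresponding Link NLRI would be withdrawn.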
3. Extensions to BGP-LS
[RFC7752] describes a mechanism by which link-state and TE
information can be collected from networks and shared with external
components using BGP. It contains two parts: the definition of a new
BGP NLRI that describes links, nodes, and prefixes comprising IGP
link-state information, and the definition of a new BGP path
attribute (BGP-LS attribute) that carries link, node, and prefix
properties and attributes, such as the link and prefix metric or
auxiliary Router-IDs of nodes.
The Protocol-ID field will be set to the BGP value specified in
table 1 of [I-D.ietf-idr-bgpls-segment-routing-epe]. The local and
remote node descriptors for all NLRI will be the BGP Router-ID (TLV
516) and either the AS Number (TLV 512) [RFC7752] or the BGP
Confederation Member (TLV 517)
[I-D.ietf-idr-bgpls-segment-routing-epe]. However, if the BGP
Router-ID is known to be unique within the BGP routing domain, it can
be used as the sole descriptor.
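As an illustration of these descriptors, the sketch below encodes a Local or Remote Node Descriptors TLV (types 256 and 257 in [RFC7752]) using the 2-octet Type / 2-octet Length TLV format of [RFC7752]. The function names are hypothetical, the Router-ID, AS Number, and Confederation Member values are assumed here to be 4-octet fields, and sub-TLVs are simply emitted in ascending type order.

```python
import ipaddress
import struct
from typing import Optional

LOCAL_NODE_DESCRIPTORS = 256   # [RFC7752]
REMOTE_NODE_DESCRIPTORS = 257  # [RFC7752]
AS_NUMBER = 512                # [RFC7752]
BGP_ROUTER_ID = 516            # [I-D.ietf-idr-bgpls-segment-routing-epe]
BGP_CONFED_MEMBER = 517        # [I-D.ietf-idr-bgpls-segment-routing-epe]


def tlv(tlv_type: int, value: bytes) -> bytes:
    """BGP-LS TLV: 2-octet Type, 2-octet Length, then Value."""
    return struct.pack("!HH", tlv_type, len(value)) + value


def node_descriptors(router_id: str, asn: Optional[int] = None,
                     confed_member: Optional[int] = None,
                     local: bool = True) -> bytes:
    """Encode the BGP Router-ID plus the AS Number *or* Confederation Member.

    With neither AS option, the Router-ID is the sole descriptor, which is
    only valid if it is unique within the BGP routing domain.
    """
    if asn is not None and confed_member is not None:
        raise ValueError("use the AS Number or the Confederation Member TLV, not both")
    subtlvs = [(BGP_ROUTER_ID, ipaddress.IPv4Address(router_id).packed)]
    if asn is not None:
        subtlvs.append((AS_NUMBER, struct.pack("!I", asn)))
    if confed_member is not None:
        subtlvs.append((BGP_CONFED_MEMBER, struct.pack("!I", confed_member)))
    body = b"".join(tlv(t, v) for t, v in sorted(subtlvs))
    outer = LOCAL_NODE_DESCRIPTORS if local else REMOTE_NODE_DESCRIPTORS
    return tlv(outer, body)
```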
3.1. Node NLRI Usage and Modifications
The SPF capability is a new Node Attribute TLV that will be added to
those defined in table 7 of [RFC7752]. The new attribute TLV will
only be applicable when BGP is specified in the Node NLRI Protocol-ID
field. The TLV type (TBD) will be assigned by IANA. The new Node
Attribute TLV will contain a single-octet SPF Algorithm field:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              Type             |             Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SPF Algorithm |
   +-+-+-+-+-+-+-+-+
The SPF Algorithm may take the following values:
1 - Normal SPF
2 - Strict SPF
When computing the SPF for a given BGP routing domain, only BGP nodes
advertising the SPF capability attribute will be included in the
Shortest Path Tree (SPT).
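A minimal Python sketch of the SPF capability attribute TLV and of selecting the nodes eligible for the SPT follows. Because the TLV type is still TBD, `SPF_CAP_TLV_TYPE` is a hypothetical placeholder, and the input format of `spt_candidates()` (a node identifier mapped to its already-parsed attribute TLVs) is an assumption of this example.

```python
import struct
from typing import Dict, Set

SPF_CAP_TLV_TYPE = 0xFFFF  # hypothetical placeholder; the real type is TBD (IANA)
NORMAL_SPF = 1
STRICT_SPF = 2


def encode_spf_capability_tlv(algorithm: int) -> bytes:
    """Type (2 octets), Length (2 octets, always 1), SPF Algorithm (1 octet)."""
    if algorithm not in (NORMAL_SPF, STRICT_SPF):
        raise ValueError(f"unsupported SPF algorithm {algorithm}")
    return struct.pack("!HHB", SPF_CAP_TLV_TYPE, 1, algorithm)


def spt_candidates(node_attrs: Dict[str, Dict[int, bytes]]) -> Set[str]:
    """Return the nodes whose Node NLRI carries the SPF capability TLV.

    node_attrs maps a node identifier to its parsed BGP-LS attribute TLVs
    (TLV type -> value). Nodes without the TLV are left out of the SPT.
    """
    return {node for node, tlvs in node_attrs.items() if SPF_CAP_TLV_TYPE in tlvs}
```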
3.2. Link NLRI Usage
The criteria for advertisement of Link NLRI are discussed in
Section 2.
Link NLRI is advertised with local and remote node descriptors as
described above and with unique link identifiers that depend on the
link's addressing. For IPv4 links, the link's local IPv4 (TLV 259)
and remote IPv4 (TLV 260) addresses will be used. For IPv6 links, the
local IPv6 (TLV 261) and remote IPv6 (TLV 262) addresses will be
used. For unnumbered links, the link local/remote identifiers (TLV
258) will be used. For links with both IPv4 and IPv6 addresses, both
sets of descriptors may be included in the same Link NLRI. The link
identifiers are described in table 5 of [RFC7752].
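The choice of link descriptors can be sketched as follows. `link_descriptors()` is a hypothetical helper; it assumes the 2-octet Type / 2-octet Length TLV format of [RFC7752] and encodes the Link Local/Remote Identifiers TLV as two 4-octet identifiers. A dual-stack link simply carries both address pairs.

```python
import ipaddress
import struct
from typing import Optional, Tuple

LINK_LOCAL_REMOTE_IDS = 258  # unnumbered links
IPV4_INTERFACE_ADDR = 259    # local IPv4 address
IPV4_NEIGHBOR_ADDR = 260     # remote IPv4 address
IPV6_INTERFACE_ADDR = 261    # local IPv6 address
IPV6_NEIGHBOR_ADDR = 262     # remote IPv6 address


def tlv(tlv_type: int, value: bytes) -> bytes:
    return struct.pack("!HH", tlv_type, len(value)) + value


def link_descriptors(v4: Optional[Tuple[str, str]] = None,
                     v6: Optional[Tuple[str, str]] = None,
                     ids: Optional[Tuple[int, int]] = None) -> bytes:
    """Encode link descriptors; each tuple is (local, remote)."""
    parts = []
    if ids is not None:
        parts.append(tlv(LINK_LOCAL_REMOTE_IDS, struct.pack("!II", *ids)))
    if v4 is not None:
        parts.append(tlv(IPV4_INTERFACE_ADDR, ipaddress.IPv4Address(v4[0]).packed))
        parts.append(tlv(IPV4_NEIGHBOR_ADDR, ipaddress.IPv4Address(v4[1]).packed))
    if v6 is not None:
        parts.append(tlv(IPV6_INTERFACE_ADDR, ipaddress.IPv6Address(v6[0]).packed))
        parts.append(tlv(IPV6_NEIGHBOR_ADDR, ipaddress.IPv6Address(v6[1]).packed))
    if not parts:
        raise ValueError("a link needs addresses or local/remote identifiers")
    return b"".join(parts)
```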
The link IGP metric attribute TLV (TLV 1095), as well as any other
attribute TLVs required for non-SPF purposes, SHOULD be advertised.
Algorithms such as setting the metric inversely proportional to the
link speed, as done in the OSPF MIB [RFC4750], may be supported.
However, this is beyond the scope of this document.
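Although metric derivation is outside the scope of this document, the following sketch shows one way a metric inversely proportional to link speed could be computed, using the OSPF MIB default of 10^8 / ifSpeed. The function name, the configurable reference bandwidth, and the clamping range (at least 1, and at most the largest value that fits in the 3-octet IGP Metric TLV) are assumptions of this example.

```python
OSPF_MIB_REFERENCE_BPS = 10**8  # OSPF MIB default: 10^8 / ifSpeed
MAX_IGP_METRIC = 2**24 - 1      # largest value in a 3-octet IGP Metric TLV


def metric_from_speed(link_bps: int, reference_bps: int = OSPF_MIB_REFERENCE_BPS) -> int:
    """Metric inversely proportional to link speed, clamped to [1, MAX_IGP_METRIC]."""
    if link_bps <= 0:
        raise ValueError("link speed must be positive")
    return max(1, min(reference_bps // link_bps, MAX_IGP_METRIC))
```

With the default reference, every link of 100 Mb/s or faster gets a metric of 1, so an MSDC fabric would likely need a larger reference bandwidth (for example, `metric_from_speed(10**10, reference_bps=10**12)` yields 100 for a 10 Gb/s link).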
3.3. Prefix NLRI Usage
Prefix NLRI is advertised with a local node descriptor, as described
above, and with the prefix and prefix length carried in the IP
Reachability Information descriptor (TLV 265) [RFC7752]. The prefix
metric attribute TLV (TLV 1155), as well as any other attribute TLVs
required for non-SPF purposes, SHOULD be advertised. For loopback
prefixes, the metric should be 0. For non-loopback prefixes, the
setting of the metric is beyond the scope of this document.
4. Shortest Path Routing (SPF) Capability
In order to replace the Phase 1 and 2 decision functions of the
existing Decision Process with an SPF-based Decision Process, this
draft introduces a new capability to signal the support of an SPF
Decision Process. The SPF Capability is a new BGP Capability
[RFC5492]. The Capability Code for this capability is allocated by
IANA as specified in Section 6. The Capability Length field of
this capability has a value of 0.
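To illustrate how the SPF Capability and the Link-State address family could be advertised together in a BGP OPEN message, the sketch below builds a Capabilities Optional Parameter (Parameter Type 2 in [RFC5492]) containing a Multiprotocol Extensions capability (capability code 1) for the BGP-LS AFI/SAFI (16388/71 in [RFC7752]) and a zero-length SPF Capability. `SPF_CAPABILITY_CODE` is a hypothetical placeholder because the real code is to be allocated by IANA, and the helper function names are illustrative only.

```python
import struct
from typing import Iterable, Set

CAPABILITIES_PARAM_TYPE = 2          # Capabilities Optional Parameter [RFC5492]
MP_EXTENSIONS_CODE = 1               # Multiprotocol Extensions capability
BGP_LS_AFI, BGP_LS_SAFI = 16388, 71  # [RFC7752]
SPF_CAPABILITY_CODE = 0xFE           # hypothetical placeholder; real code TBD by IANA


def capability(code: int, value: bytes = b"") -> bytes:
    """Capability Code (1 octet), Capability Length (1 octet), Capability Value."""
    return struct.pack("!BB", code, len(value)) + value


def capabilities_param(caps: Iterable[bytes]) -> bytes:
    body = b"".join(caps)
    return struct.pack("!BB", CAPABILITIES_PARAM_TYPE, len(body)) + body


def capability_codes(param_value: bytes) -> Set[int]:
    """Collect the capability codes found in a Capabilities parameter value."""
    codes, i = set(), 0
    while i + 2 <= len(param_value):
        code, length = param_value[i], param_value[i + 1]
        codes.add(code)
        i += 2 + length
    return codes


def spf_negotiated(local_codes: Set[int], peer_codes: Set[int]) -> bool:
    """Treat the SPF Capability as exchanged only if both speakers advertised it."""
    return SPF_CAPABILITY_CODE in local_codes and SPF_CAPABILITY_CODE in peer_codes


# OPEN Optional Parameter advertising BGP-LS (AFI 16388 / SAFI 71) and SPF.
mp_bgp_ls = capability(MP_EXTENSIONS_CODE, struct.pack("!HBB", BGP_LS_AFI, 0, BGP_LS_SAFI))
open_param = capabilities_param([mp_bgp_ls, capability(SPF_CAPABILITY_CODE)])
```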
5. Decision Process with SPF Algorithm
The Decision Process described in [RFC4271] takes place in three
distinct phases. The Phase 1 decision function of the Decision
Process is responsible for calculating the degree of preference for
each route received from a Speaker's peer. The Phase 2 decision
function is invoked on completion of the Phase 1 decision function
and is responsible for choosing the best route out of all those
available for each distinct destination, and for installing each
chosen route into the Loc-RIB. The combination of the Phase 1 and 2
decision functions is also known as a path-vector algorithm.
The SPF-based Decision Process starts by selecting only those Node
NLRI whose SPF capability TLV value matches the local BGP speaker's
SPF capability TLV value. These selected Node NLRI and their Link
and Prefix NLRI are used to build a directed graph during the SPF
computation. The best paths for BGP prefixes are installed as a
result of the SPF process. The Phase 3 decision function of the
Decision Process [RFC4271] is also simplified since it is no longer
based on the output of the previous phases. Since Link-State NLRI
always contains the local descriptor [RFC7752], it will only be
originated by a single BGP speaker in the BGP routing domain. Hence,
for each valid NLRI, the Phase 3 decision function simply needs to
advertise a valid NLRI instance, subject to policy.
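The SPF-based Decision Process can be sketched with Dijkstra's algorithm as below. This is a simplified illustration, not a complete implementation: the input representation (the SPF Algorithm value per node, Link NLRI as directed (local, remote, metric) triples, Prefix NLRI as (prefix, originator, metric) triples), the rule of adding the prefix metric to the originator's distance, and the requirement that link metrics be positive are assumptions of this example. Next hops are represented as neighbor node identifiers rather than addresses, and equal-cost paths are kept so that all ECMP first hops are returned.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Route = Tuple[int, Set[str]]  # (cost, ECMP first-hop neighbors)


def spf_routes(root: str,
               spf_alg: Dict[str, int],
               links: List[Tuple[str, str, int]],
               prefixes: List[Tuple[str, str, int]]) -> Dict[str, Route]:
    """Compute routes from root.

    spf_alg:  SPF Algorithm value from each node's Node NLRI (absent = no TLV)
    links:    Link NLRI as directed (local node, remote node, IGP metric)
    prefixes: Prefix NLRI as (prefix, originating node, prefix metric)
    """
    # Only nodes whose SPF Algorithm matches the local speaker's take part.
    eligible = {n for n, alg in spf_alg.items() if alg == spf_alg[root]}
    adj: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for u, v, metric in links:
        if metric <= 0:
            raise ValueError("this sketch assumes positive link metrics")
        if u in eligible and v in eligible:
            adj[u].append((v, metric))

    dist: Dict[str, int] = {root: 0}
    first_hops: Dict[str, Set[str]] = {root: set()}
    heap = [(0, root)]
    done: Set[str] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, metric in adj[u]:
            nd = d + metric
            hops = {v} if u == root else first_hops[u]
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                first_hops[v] = set(hops)
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                first_hops[v] |= hops  # equal-cost path: keep every first hop

    routes: Dict[str, Route] = {}
    for prefix, origin, pmetric in prefixes:
        if origin not in dist:
            continue  # originator unreachable or not SPF-eligible
        cost = dist[origin] + pmetric
        best = routes.get(prefix)
        if best is None or cost < best[0]:
            routes[prefix] = (cost, set(first_hops[origin]))
        elif cost == best[0]:
            best[1].update(first_hops[origin])
    return routes
```

For instance, with unit metrics in a fabric of two leaves (L1, L2) and two spines (S1, S2), `spf_routes("L1", ...)` would return the loopback of L2 at cost 2 with first hops {S1, S2}, while a prefix originated by a node advertising a different SPF Algorithm value would not appear.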
5.1. Impact on BGP Tie-breaking attributes
The modified Decision Process with SPF algorithm uses the metric from
Link and Prefix NLRI Attribute TLVs [RFC7752]. As a result, any
attributes that would influence the Decision Process defined in
[RFC4271], such as the ORIGIN, MULTI_EXIT_DISC, and LOCAL_PREF
attributes, are ignored by the SPF algorithm. Furthermore, the
NEXT_HOP attribute value is preserved and validated but otherwise
ignored in any received BGP Update messages.
5.2. Dual Stack Support
The SPF-based decision process operates on Node, Link, and Prefix
NLRI that support both IPv4 and IPv6 addresses. Whether to run a
single SPF instance or multiple SPF instances for separate address
families is a local implementation matter. Normally, IPv4 next-hops
are calculated for IPv4 prefixes and IPv6 next-hops are calculated
for IPv6 prefixes. However, an interesting use case is the
deployment of [RFC5549], where IPv6 link-local next-hops are
calculated for both IPv4 and IPv6 prefixes. As stated in Section 1,
support for Multiple Topology Routing (MTR) is an area for future
study.
5.3. NEXT_HOP Manipulation
A BGP speaker that supports SPF extensions MAY interact with peers
that do not support SPF extensions. If the BGP Link-State address
family is advertised to a peer not supporting the SPF extensions
described herein, then the BGP speaker MUST conform to the NEXT_HOP
rules specified in [RFC4271] when announcing the Link-State address
family routes to that peer.
All BGP peers that support SPF extensions locally compute the
NEXT_HOP values as a result of the SPF process. Consequently, the
NEXT_HOP attribute is always ignored on receipt. However, BGP
speakers should set the NEXT_HOP address according to the NEXT_HOP
attribute rules specified in [RFC4271].
5.4. Error Handling
When a BGP speaker receives a BGP Update containing a malformed SPF
Capability TLV in the Node NLRI BGP-LS Attribute [RFC7752], it MUST
ignore the received TLV and the Node NLRI and not pass the Node NLRI
to other BGP peers, as specified in [RFC7606]. When discarding a Node
NLRI with a malformed TLV, a BGP speaker SHOULD log an error for
further analysis.
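A minimal sketch of this error handling follows. It assumes the BGP-LS attribute has already been split into TLVs, and it assumes that a Length other than 1 is what makes the SPF Capability TLV malformed, since this document does not enumerate the malformations. `SPF_CAP_TLV_TYPE`, `NodeVerdict`, and `check_node_nlri()` are hypothetical names.

```python
import logging
from typing import Dict, NamedTuple, Optional

log = logging.getLogger("bgp-spf")  # hypothetical logger name
SPF_CAP_TLV_TYPE = 0xFFFF            # hypothetical placeholder; real type TBD by IANA


class NodeVerdict(NamedTuple):
    keep: bool                # False: discard, neither use nor propagate
    algorithm: Optional[int]  # SPF Algorithm, or None when the TLV is absent


def check_node_nlri(node_id: str, attr_tlvs: Dict[int, bytes]) -> NodeVerdict:
    """Apply the malformed SPF Capability TLV rule to one received Node NLRI."""
    value = attr_tlvs.get(SPF_CAP_TLV_TYPE)
    if value is None:
        # Not an error: the node simply stays out of the SPT.
        return NodeVerdict(keep=True, algorithm=None)
    if len(value) != 1:
        # Assumption: any Length other than 1 makes the TLV malformed.
        log.error("discarding Node NLRI %s: SPF capability length %d, expected 1",
                  node_id, len(value))
        return NodeVerdict(keep=False, algorithm=None)
    return NodeVerdict(keep=True, algorithm=value[0])
```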
6. IANA Considerations
This document defines a new capability for BGP known as the SPF
Capability. We request IANA to assign a BGP capability number from
the BGP Capability Codes registry.
This document also defines a new attribute TLV for BGP-LS Node NLRI.
We request IANA to assign a new TLV for the SPF capability from the
"BGP-LS Node Descriptor, Link Descriptor, Prefix Descriptor, and
Attribute TLVs" registry. Additionally, IANA is requested to create
a new registry for "BGP-LS SPF Capability Algorithms" for the value
of the algorithm both in the BGP-LS Node Attribute TLV and the BGP
SPF Capability. The initial assignments are:
+-------------+-----------------------------------+
| Value(s)    | Assignment                        |
+-------------+-----------------------------------+
| 0           | Reserved (not to be assigned)     |
|             |                                   |
| 1           | SPF                               |
|             |                                   |
| 2           | Strict SPF                        |
|             |                                   |
| 3-254       | Unassigned (IETF Review)          |
|             |                                   |
| 255         | Reserved (not to be assigned)     |
+-------------+-----------------------------------+
BGP-LS SPF Capability Algorithms
7. Security Considerations
This extension to BGP does not change the underlying security issues
inherent in the existing [RFC4724] and [RFC4271] specifications.
7.1. Acknowledgements
The authors would like to thank .... for the review and comments.
8. References
8.1. Normative References
[I-D.ietf-idr-bgpls-segment-routing-epe]
Previdi, S., Filsfils, C., Ray, S., Patel, K., Dong, J.,
and M. Chen, "Segment Routing BGP Egress Peer Engineering
BGP-LS Extensions", draft-ietf-idr-bgpls-segment-routing-
epe-05 (work in progress), May 2016.
[I-D.ietf-rtgwg-bgp-routing-large-dc]
Lapukhov, P., Premji, A., and J. Mitchell, "Use of BGP for
routing in large-scale data centers", draft-ietf-rtgwg-
bgp-routing-large-dc-11 (work in progress), June 2016.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
Border Gateway Protocol 4 (BGP-4)", RFC 4271,
DOI 10.17487/RFC4271, January 2006,
<http://www.rfc-editor.org/info/rfc4271>.
[RFC5492] Scudder, J. and R. Chandra, "Capabilities Advertisement
with BGP-4", RFC 5492, DOI 10.17487/RFC5492, February
2009, <http://www.rfc-editor.org/info/rfc5492>.
[RFC7606] Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
Patel, "Revised Error Handling for BGP UPDATE Messages",
RFC 7606, DOI 10.17487/RFC7606, August 2015,
<http://www.rfc-editor.org/info/rfc7606>.
[RFC7752] Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
S. Ray, "North-Bound Distribution of Link-State and
Traffic Engineering (TE) Information Using BGP", RFC 7752,
DOI 10.17487/RFC7752, March 2016,
<http://www.rfc-editor.org/info/rfc7752>.
8.2. Information References
[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328,
DOI 10.17487/RFC2328, April 1998,
<http://www.rfc-editor.org/info/rfc2328>.
[RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route
Reflection: An Alternative to Full Mesh Internal BGP
(IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006,
<http://www.rfc-editor.org/info/rfc4456>.
[RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
DOI 10.17487/RFC4724, January 2007,
<http://www.rfc-editor.org/info/rfc4724>.
[RFC4750] Joyal, D., Ed., Galecki, P., Ed., Giacalone, S., Ed.,
Coltun, R., and F. Baker, "OSPF Version 2 Management
Information Base", RFC 4750, DOI 10.17487/RFC4750,
December 2006, <http://www.rfc-editor.org/info/rfc4750>.
[RFC4915] Psenak, P., Mirtorabi, S., Roy, A., Nguyen, L., and P.
Pillay-Esnault, "Multi-Topology (MT) Routing in OSPF",
RFC 4915, DOI 10.17487/RFC4915, June 2007,
<http://www.rfc-editor.org/info/rfc4915>.
[RFC5286] Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for
IP Fast Reroute: Loop-Free Alternates", RFC 5286,
DOI 10.17487/RFC5286, September 2008,
<http://www.rfc-editor.org/info/rfc5286>.
[RFC5549] Le Faucheur, F. and E. Rosen, "Advertising IPv4 Network
Layer Reachability Information with an IPv6 Next Hop",
RFC 5549, DOI 10.17487/RFC5549, May 2009,
<http://www.rfc-editor.org/info/rfc5549>.
Authors' Addresses
Keyur Patel
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134
USA
Email: keyupate@cisco.com
Acee Lindem
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134
USA
Email: acee@cisco.com
Abhay Roy
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134
USA
Email: akr@cisco.com
Derek Yeung
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134
USA
Email: myeung@cisco.com
Venu Venugopal
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134
USA
Email: venuv@cisco.com