Making Route Servers Aware of Data Link Failures at IXPs
The information below is for an old version of the document.
This is an older version of an Internet-Draft whose latest revision state is "Expired".
|Authors||Randy Bush , Jeffrey Haas , John Scudder , Arnold Nipper , Christoph Dietzel|
|Last updated||2018-12-02 (Latest revision 2018-10-01)|
|RFC stream||Internet Engineering Task Force (IETF)|
|Additional resources||Mailing list discussion|
|Stream||WG state||Waiting for Implementation|
|Document shepherd||Susan Hares|
|IESG||IESG state||I-D Exists|
|Send notices to||Susan Hares <firstname.lastname@example.org>|
Network Working Group R. Bush Internet-Draft Internet Initiative Japan Intended status: Standards Track J. Haas Expires: April 4, 2019 J. Scudder Juniper Networks, Inc. A. Nipper C. Dietzel DE-CIX October 1, 2018 Making Route Servers Aware of Data Link Failures at IXPs draft-ietf-idr-rs-bfd-06 Abstract When BGP route servers are used, the data plane is not congruent with the control plane. Therefore, peers at an Internet exchange can lose data connectivity without the control plane being aware of it, and packets are lost. This document proposes the use of a newly defined BGP Subsequent Address Family Identifier (SAFI) both to allow the route server to request its clients use BFD to track data plane connectivity to their peers' addresses, and for the clients to signal that connectivity state back to the route server. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in [RFC2119] only when they appear in all upper case. They may also appear in lower or mixed case as English words, without normative meaning. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." Bush, et al. Expires April 4, 2019 [Page 1] Internet-Draft Making RSes aware of IXP Data Link Failures October 2018 This Internet-Draft will expire on April 4, 2019. Copyright Notice Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 4 4. Next Hop Validation . . . . . . . . . . . . . . . . . . . . . 5 4.1. ReachAsk . . . . . . . . . . . . . . . . . . . . . . . . 6 4.2. LocReach . . . . . . . . . . . . . . . . . . . . . . . . 6 4.3. ReachTell . . . . . . . . . . . . . . . . . . . . . . . . 7 4.4. NHIB . . . . . . . . . . . . . . . . . . . . . . . . . . 7 5. Advertising NH-Reach state in BGP . . . . . . . . . . . . . . 7 6. Client Procedures for NH-Reach Changes . . . . . . . . . . . 9 7. Recommendations for Using BFD . . . . . . . . . . . . . . . . 9 8. Other Considerations . . . . . . . . . . . . . . . . . . . . 10 9. Acknolwedgments . . . . . . . . . . . . . . . . . . . . . . . 10 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 11. Security Considerations . . . . . . . . . . . . . . . . . . . 10 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 12.1. Normative References . . . . . . . . . . . . . . . . . . 11 12.2. Informative References . . . . . . . . . . . . . . . . . 12 Appendix A. Summary of Document Changes . . . . . . . . . . . . 12 Appendix B. Other Forms of Connectity Checks . . . . . . . . . . 12 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13 1. Introduction In configurations (typically Internet Exchange Points (IXPs)) where EBGP routing information is exchanged between client routers through the agency of a route server (RS) [RFC7947], but traffic is exchanged directly, operational issues can arise when partial data plane connectivity exists among the route server client routers. Since the Bush, et al. Expires April 4, 2019 [Page 2] Internet-Draft Making RSes aware of IXP Data Link Failures October 2018 data plane is not congruent with the control plane, the client routers on the IXP can lose data connectivity without the control plane - the route server - being aware of it, resulting in significant data loss. To remedy this, two basic problems need to be solved: 1. Client routers must have a means of verifying connectivity amongst themselves, and 2. Client routers must have a means of communicating the knowledge of the failure (and restoration) back to the route server. The first can be solved by application of Bidirectional Forwarding Detection [RFC5880]. The second can be solved by exchanging BGP routes which use the NH-Reach Subsequent Address Family Identifier (SAFI) defined in this document. Throughout this document, we generally assume that the route server being discussed is able to represent different RIBs towards different clients, as discussed in section 184.108.40.206 of [RFC7947]. If this is not the case, the procedures described here to allow BFD to be automatically provisioned between clients still have value; however, the procedures for signaling reachability back to the route server may not. Throughout this document, we refer to the "route server", "RS" or just "server" and the "client" to describe the two BGP routers engaging in the exchange of information. We observe that there could be other applications for this extension. Our use of terminology is intended for clarity of description, and not to limit the future applicability of the proposal. [I-D.ietf-idr-bgp-bestpath-selection-criteria] discusses enhancement of the route resolvability condition of section 220.127.116.11 of [RFC4271] to include next hop reachability and path availability checks. This specification represents in part an instance of such, implemented using BFD as the OAM mechanism. 2. Definitions o Indirect peer: If a route server is configured such that routes from a given client might be sent to some other client, or vice- versa, those two clients are considered to be indirect peers. o Indirect Peer's Address, IPA, next hop: We refer frequently to a next hop. It should generally be clear from context what is intended, almost always an address associated with an indirect peer (the exception, when an indirect peer sends a third party next hop, is discussed in Section 3). In Section 5 we discuss the Bush, et al. Expires April 4, 2019 [Page 3] Internet-Draft Making RSes aware of IXP Data Link Failures October 2018 MP-BGP [RFC4760] Next Hop field; this is distinguished by its capitalization and should also be clear from context. Later in that section we define the Indirect Peer's Address field of the NLRI, also called "IPA". It will be clear to the reader that this refers to the "next hops" discussed elsewhere in the document, but we don't use the name "next hop" for this field to avoid confusion with the pre-existing next hop path attribute of [RFC4271] and attribute field of [RFC4760]. o RS: Route Server. See [RFC7947]. 3. Overview As with the base BGP protocol, we model the function of this extension as the interaction between a conceptual set of databases: o ReachAsk: The reachability request database. A database of next hops (host addresses) for which data plane reachability is being queried. o ReachAsk-Out: A set of queries sent to the client. o ReachAsk-In: A set of queries received from the route server. o ReachTell: The reachability response database. A database of responses to ReachAsk queries, indicating what is known about data plane reachability. o ReachTell-Out: The responses being sent to the route server. o ReachTell-In: The response received from the client. o LocReach: The local reachability database. o NHIB: Next Hop Information Base. Stores what is known about the client's reachability to its next hops. Bush, et al. Expires April 4, 2019 [Page 4] Internet-Draft Making RSes aware of IXP Data Link Failures October 2018 +--------------------------------------------------------+ | +------------+ +------------+ +------------+ | | | Per- | | Configured | | Per- | | | | Client | | indirect | | Client | | | | NHIB | | peers | | RIB | | | +-----^------+ +------------+ +-----+------+ | | | \ | | | +-----+------+ `-->-----v------+ | | |ReachTell-In| |ReachAsk-Out| | | +------^-----+ Route Server +-----+------+ | +----------|----------------------------------|----------+ | | | | | | | | +----------|----------------------------------|----------+ | +------+------+ RS Client +-----v-----+ | | |ReachTell-Out| |ReachAsk-In| | | +------^------+ +-----+-----+ | | | +------------+ | | | | | | | | | `----------+ LocReach <----------' | | | | | | +------------+ | +--------------------------------------------------------+ Route Server, RS Client, and Reachability Ask and Tell databases with In/Out Queues In outline, the route server requests its client to track connectivity for all the potential next hops the RS might send to the client, by sending these next hops as ReachAsk "routes". The client tracks connectivity using BFD and reports its connectivity status to the RS using ReachTell "routes". Connectivity status may be that the next hop is reachable, unreachable, or unknown. Once the RS has been informed by the client of its connectivity, it uses this information to influence the route selection the RS performs on behalf of the client. Details are elaborated in the following sections. 4. Next Hop Validation Below, we detail procedures where a route server tells its client router about other client next hops by sending it ReachAsk routes and the client router verifies connectivity to those other client routers and communicates its findings back to the RS using ReachTell routes. The RS uses the received ReachTell routes as input to the NHIB and hence the route selection process it performs on behalf of the client. Bush, et al. Expires April 4, 2019 [Page 5] Internet-Draft Making RSes aware of IXP Data Link Failures October 2018 4.1. ReachAsk The route server maintains a ReachAsk database for each client that supports this proposal, that is, for each client that has advertised support (Section 5) for the NH-Reach SAFI. This database is the union of: o The set of next hops found in the associated per-client Loc-RIB (see section 18.104.22.168 of [RFC7947]). o The set of addresses of this client's indirect peers (Section 2). o The RS MAY also add other entries, for example under configuration control. We note that under most circumstances, the first (Loc-RIB next hops) set will be a subset of the second (indirect peers) set. For this not to be the case, a client would have to have sent a "third party" next hop [RFC4271] to the server. To cover such a case, an implementation MAY note any such next hops, and include them in its list of indirect peers. (This implies that if a third party next hop for client C is conveyed to client A, not only will C be placed in A's ReachAsk database, but A will be placed in C's ReachAsk database.) The contents of the ReachAsk database are communicated to the client using the NLRI format and procedures described in Section 5. 4.2. LocReach The client MUST attempt to track data plane connectivity to each host address depicted in the ReachAsk database. It MAY also track connectivity to other addresses. The use of BFD for this purpose is detailed in Section 6. For each address being tracked, its state is maintained by the client in a LocReach entry. The state can be: o Unknown. Connectivity status is unknown. This may be due to a temporary or permanent lack of feasible OAM mechanism to determine the status. o Up. The address has been determined to be reachable. o Down. The address has been determined to be unreachable. The LocReach database is used as input for the ReachTell database; it MAY also be used as input to the client's route resolvability condition (section 22.214.171.124 of [RFC4271]). Bush, et al. Expires April 4, 2019 [Page 6] Internet-Draft Making RSes aware of IXP Data Link Failures October 2018 4.3. ReachTell The ReachTell database contains an entry for every entry in the LocReach database. The contents of the ReachTell database are communicated to the server using the NLRI format and procedures described in Section 5. 4.4. NHIB The route server maintains a per-client Next Hop Information Base, or NHIB. This contains the information about next hop status received from ReachTell. In computing its per-client Loc-RIB, the RS uses the content of the related per-client NHIB as input to the route resolvability condition (section 126.96.36.199 of [RFC4271]). The next hop being resolved is looked up in the NHIB and its state determined: o Up next hops are considered resolvable. o Unknown next hops MAY be considered resolvable. They MAY be less preferred for selection. o Down next hops MUST NOT be considered resolvable. o If a given next hop is not present in the NHIB, but is present in ReachAsk-Out, either the client has not responded yet (a transient condition) or an error exists. Similar to Unknown next hops, such routes MAY be considered resolvable; they MAY be less preferred. 5. Advertising NH-Reach state in BGP A new BGP SAFI, the NH-Reach SAFI, is defined in this document. It has been assigned value TBD. A route server or a route server client using the procedures in this document MUST advertise support for this SAFI, for the IPv4 and/or IPv6 Address Family Identifier (AFI). The use of this SAFI with any other AFI is not defined by this document. NH-Reach NLRI "routes" have a Length of Next Hop Network Address value of 0, therefore they have an empty Network Address of Next Hop field (section 3 of [RFC4760]). Since as specified here, ReachTell "routes" from different clients populate distinct databases on the RS, there will generally be only a single path per "route"; this implies that route selection need not be performed (or equivalently, that it's trivial to perform). In the other direction, a client might peer with multiple route servers and receive differing sets of ReachAsk routes from them. An implementation MAY handle this situation by implementing a distinct Bush, et al. Expires April 4, 2019 [Page 7] Internet-Draft Making RSes aware of IXP Data Link Failures October 2018 ReachAsk and ReachTell per server, but it MAY also handle it by placing all servers' ReachAsk "routes" into a single ReachAsk, and sending the results to all servers from a single ReachTell. This would imply some route server(s) might get ReachTell results they had not asked for, but this is permissible in any case. Again, since the contents of ReachAsk are simply a set of host routes to be tested, route selection over a combined ReachAsk MAY be omitted. ReachAsk and ReachTell entries are exchanged using the NH-Reach NLRI encoding: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |T|Reserved |Sta| Indirect Peer's Address (4 or 16 octets) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ . ... Indirect Peer's Address (4 or 16 octets) ... . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NH-Reach NLRI Format o T: Type is a one-bit field that can take the value 0, meaning the NLRI is a ReachAsk entry, or 1, meaning it is a ReachTell entry. o Reserved: These five bits are reserved. They MUST be sent as zero and MUST be disregarded on receipt. o Sta: State is a two-bit field used to signal the LocReach (Section 4.2) state: * 0 or 3: Unknown. * 1: Up. * 2: Down. Although either 0 or 3 is to be interpreted as "Unknown", the value 0 MUST be used on transmission. The value 3 MUST be accepted as an alias for 0 on receipt. o The Indirect Peer's Address ("IPA") field is an IPv4 or IPv6 host route, depending on whether the AFI is IPv4 or IPv6. ReachAsk and ReachTell entries MUST NOT be propagated from one BGP peering session to another; the routes are not transitive. The IPA field is the key for the NH-Reach NLRI type; the information encoded in the top octet is non-key information. It is possible in principle (although unlikely) for two NLRI to be validly present in an UPDATE message with identical IPA fields but different types. However, two NLRI with the same IPA field and different State fields MUST NOT be encoded in the same UPDATE message. If such is Bush, et al. Expires April 4, 2019 [Page 8] Internet-Draft Making RSes aware of IXP Data Link Failures October 2018 encountered, the receiver MUST behave as though the state "Unknown" was received for the IPA in question. 6. Client Procedures for NH-Reach Changes When an entry is added to a route server client's ReachAsk-In for a route server peering session, the client will then attempt to verify connectivity to the host depicted by that entry. The procedure described in this specification utilizes BFD. If no existing BFD session exists to this next hop, a BFD session is provisioned to that IP address and the LocReach reachability state (Section 4.2) is set to Unknown. If the client cannot establish a BFD session with an entry in its ReachAsk-In, the next hop remains in LocReach with its Reachable state Unknown. Once the BFD session moves to the Up state, the LocReach reachability state is set to Up. When the BFD session transitions out of the Up state to the Down state, the LocReach reachability state is set to Down. If the BFD session transitions out of the Up state to the AdminDown state, the LocReach reachability state is set to Unknown. When entries are removed from the route server client's ReachAsk-In for a route server peering session, the client MAY delay de- provisioning the BFD peering session. If the client delays de- provisioning the session, it should remove it if the BFD session transitions to the Down or AdminDown states. 7. Recommendations for Using BFD The RECOMMENDED way a client router can confirm the data plane connectivity to its next hops is available, is the use of BFD in asynchronous mode. Echo mode MAY be used if both client routers running a BFD session support this. The use of authentication in BFD is OPTIONAL as there is a certain level of trust between the operators of the client routers at a particular IXP. If trust cannot be assumed, it is recommended to use pair-wise keys (how this can be achieved is outside the scope of this document). The ttl/hop limit values as described in section 5 [RFC5881] MUST be obeyed in order to shield BFD sessions against packets coming from outside the IXP. The following values of the BFD configuration of client routers (see section 6.8.1 [RFC5880]) are RECOMMENDED: Bush, et al. Expires April 4, 2019 [Page 9] Internet-Draft Making RSes aware of IXP Data Link Failures October 2018 o DesiredMinTxInterval: 1,000,000 (microseconds) o RequiredMinRxInterval: 1,000,000 (microseconds) o DetectMult: 3 A client router administrator MAY select more appropriate values to meet the special needs of a particular deployment. 8. Other Considerations For purposes of routing stability, implementations may wish to apply hysteresis ("holddown") to next hops that have transitioned from reachable to unreachable and back. Implementations MAY restrict the range of addresses with which they will attempt to form BFD relationships. For example, an implementation might by default only allow BFD relationships with peers that share a subnetwork with the route server. An implementation MAY apply such restrictions by default. In a route-server environment, use of this feature SHOULD be restricted to consider only routes that are advertised from within the IXP network. This might include checks on AS_PATH length. 9. Acknolwedgments The authors would like to thanks Thomas King for his contributions toward this work. 10. IANA Considerations IANA is requested to allocate a value from the Subsequent Address Family Identifiers (SAFI) Parameters registry for this proposal. Its Description in that registry shall be NH-Reach with a Reference of this RFC. 11. Security Considerations The mechanism in this document permits a route server client to influence the contents of the route server's Adj-Ribs-Out through its reports of next hop reachability state using the NH-Reach SAFI. Since this state is per-client, if a route server client is able to inject NH-Reach routes for another route server's BGP session to a client, it can cause the route server to select different forwarding than otherwise expected. This issue may be mitigated using transport security on the BGP sessions between the route server and its clients. See [RFC4272]. Bush, et al. Expires April 4, 2019 [Page 10] Internet-Draft Making RSes aware of IXP Data Link Failures October 2018 The NH-Reach SAFI enables the server to trigger creation of a BFD session on its client. A malicious or misbehaving server could trigger an unreasonable number of sessions, a potential resource exhaustion attack. The sedate default timers proposed in Section 7 mitigate this; they also mitigate concerns about use of the client as a source of packets in a flooding attack. An implementation MAY also impose limits on the number of BFD sessions it will create at the request of the server. The reachability tests between route server clients themselves may be a target for attack. Such attacks may include forcing a BFD session Down through injecting false BFD state. A less likely attack includes forcing a BFD session to stay Up when its real state is Down. These attacks may be mitigated using the BFD security mechanisms defined in [RFC5880]. 12. References 12.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc- editor.org/info/rfc2119>. [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, DOI 10.17487/RFC4271, January 2006, <https://www.rfc- editor.org/info/rfc4271>. [RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol Extensions for BGP-4", RFC 4760, DOI 10.17487/RFC4760, January 2007, <https://www.rfc- editor.org/info/rfc4760>. [RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010, <https://www.rfc-editor.org/info/rfc5880>. [RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, DOI 10.17487/RFC5881, June 2010, <https://www.rfc- editor.org/info/rfc5881>. [RFC7947] Jasinska, E., Hilliard, N., Raszuk, R., and N. Bakker, "Internet Exchange BGP Route Server", RFC 7947, DOI 10.17487/RFC7947, September 2016, <https://www.rfc- editor.org/info/rfc7947>. Bush, et al. Expires April 4, 2019 [Page 11] Internet-Draft Making RSes aware of IXP Data Link Failures October 2018 12.2. Informative References [I-D.chen-bfd-unsolicited] Chen, E., Shen, N., and R. Raszuk, "Unsolicited BFD for Sessionless Applications", draft-chen-bfd-unsolicited-02 (work in progress), January 2018. [I-D.ietf-idr-bgp-bestpath-selection-criteria] Asati, R., "BGP Bestpath Selection Criteria Enhancement", draft-ietf-idr-bgp-bestpath-selection-criteria-09 (work in progress), June 2018. [RFC4272] Murphy, S., "BGP Security Vulnerabilities Analysis", RFC 4272, DOI 10.17487/RFC4272, January 2006, <https://www.rfc-editor.org/info/rfc4272>. [RFC7880] Pignataro, C., Ward, D., Akiya, N., Bhatia, M., and S. Pallagatti, "Seamless Bidirectional Forwarding Detection (S-BFD)", RFC 7880, DOI 10.17487/RFC7880, July 2016, <https://www.rfc-editor.org/info/rfc7880>. Appendix A. Summary of Document Changes idr-06: Refresh -05. idr-04 to idr-05: Added reference to "BGP Bestpath Selection Criteria Enhancement" draft. Rename "next hop" field of NLRI to "Indirect Peer's Address". Add suggestion about AS_PATH length checks. idr-03 to idr-04: Note other forms of connectivity checks. idr-02 to idr-03: Substantial rewrite. Introduce NLRI format that embeds state. idr-01 to idr-02: Move from BGP-LS to NH-Reach SAFI. Lots of editorial changes. idr-00 to idr-01: Add BGP Capability. Move from NH-Cost to BGP-LS. ymbk-01 to idr-00: No technical changes; adopted by IDR. ymbk-00 to ymbk-01: Clarifications to BFD procedures. Use BFD state as an input to BGP route selection. Appendix B. Other Forms of Connectity Checks RFC 5880/5881 BFD is a well-deployed feature. For this reason, it was chosen as the connectivity check utilized for nexthop reachability by this document. As other forms of BFD become more widely deployed, they may also be utilized to provide the connectivity check functionality. Examples of other such BFD mechanisms include: Bush, et al. Expires April 4, 2019 [Page 12] Internet-Draft Making RSes aware of IXP Data Link Failures October 2018 o Seamless BFD [RFC7880] o Unsolicited BFD for Sessionless Applications [I-D.chen-bfd-unsolicited] Implementations MUST support RFC 5880/5881 BFD to be compliant with this specification. Implementations MAY support other forms of connectivity check, including those mechanisms listed above, so long as they provide the ability to fall-back to RFC 5880/5881 BFD. Authors' Addresses Randy Bush Internet Initiative Japan 5147 Crystal Springs Bainbridge Island, Washington 98110 US Email: email@example.com Jeffrey Haas Juniper Networks, Inc. 1133 Innovation Way Sunnyvale, CA 94089 US Email: firstname.lastname@example.org John G. Scudder Juniper Networks, Inc. 1133 Innovation Way Sunnyvale, CA 94089 US Email: email@example.com Arnold Nipper DE-CIX Management GmbH Lichtstrasse 43i Cologne 50825 Germany Email: firstname.lastname@example.org Bush, et al. Expires April 4, 2019 [Page 13] Internet-Draft Making RSes aware of IXP Data Link Failures October 2018 Christoph Dietzel DE-CIX Management GmbH Lichtstrasse 43i Cologne 50825 Germany Email: email@example.com Bush, et al. Expires April 4, 2019 [Page 14]