Network Working Group K. Patel
Internet-Draft C. Appanna
Intended status: Standards Track P. Mohapatra
Expires: February 9, 2011 Cisco Systems
J. Scudder
Juniper Networks
J. Uttaro
AT&T
August 8, 2010
Root cause notification to solve BGP path hunting
draft-keyupate-bgp-rcn-00.txt
Abstract
Whenever a prefix is withdrawn using the BGP withdrawal mechanism, it
can, in certain scenarios, trigger a number of updates before the
prefix is completely withdrawn from the entire BGP network. This
phenomenon is popularly known as _path exploration_ or _path hunting_
and occurs because of the path-vector nature of BGP. It results in a
series of unwanted or redundant transitions that overload the BGP
network. This document describes a mechanism to help limit the amount
of such path exploration by defining two optional transitive path
attributes for BGP: SPEAKERID_PATH and ROOT_CAUSE.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on February 9, 2011.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
Patel, et al. Expires February 9, 2011 [Page 1]
Internet-Draft Root Cause Notification August 2010
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Table of Contents
1. Introduction
   1.1. Requirements Language
2. Reference Diagram
3. SPEAKERID_PATH attribute
4. ROOT_CAUSE attribute
5. Operation
   5.1. Sending SPEAKERID_PATH attribute
   5.2. Sending ROOT_CAUSE attribute
      5.2.1. At the point of occurrence
      5.2.2. At an intermediate point
   5.3. Receiving ROOT_CAUSE Attribute
   5.4. Usage of BGP Aggregates
   5.5. BGP Confederation
   5.6. BGP Inactive Timer
6. Acknowledgements
7. IANA Considerations
8. Security Considerations
9. References
   9.1. Normative References
   9.2. Informative References
Authors' Addresses
1. Introduction
Whenever a prefix is withdrawn using the BGP withdrawal mechanism, it
can, in certain scenarios, trigger a number of updates before the
prefix is completely withdrawn from the entire BGP network. This
phenomenon is popularly known as _path exploration_ or _path hunting_
and occurs because of the path-vector nature of BGP. It results in a
series of unwanted or redundant transitions that overload the BGP
network ([I-D.li-bgp-stability]).
Note that these redundant transitions can end up triggering route
flap damping ([RFC2439]) if it is deployed in the network.
Additionally, route flap damping itself is known to cause path
exploration in the network because of the delay it introduces
([I-D.li-bgp-stability]). This effectively creates a spiral of BGP
instability. Both the generation of unwanted UPDATE messages and the
triggering of route flap damping can adversely affect BGP convergence
time.
The problem stems from the path-vector design of BGP. In a link-state
protocol, each router stores a complete view of the entire network
and derives reachability information from that view. In the event of
a flap, each router can therefore correctly determine all paths that
suffer from the same root cause. This approach does not scale to the
large networks in which BGP operates. By design, BGP advertises to
its neighbors, with each prefix, only the path it is using, expressed
in terms of ASes. This information is coarse, even in a simple
topology, because the number of possible paths through the routers is
large. When a route becomes unreachable, the BGP selection process,
lacking detailed route information, may choose an alternative path
that is itself no longer available. After a series of such
transitions, the BGP speaker resolves this abnormality and settles on
the correct available path based on receiver-side loop detection.
This document proposes a mechanism to identify unreachable paths for
which BGP withdrawals have not yet been received, and to prevent them
from being selected as preferred paths. This helps avoid unnecessary
route flapping within the network. A new optional transitive path
attribute, SPEAKERID_PATH, is attached to BGP announcements as the
prefix travels through the network, recording more granular
information about the routers in the path. When a prefix is
withdrawn, another optional transitive attribute, ROOT_CAUSE, is
attached to the implicit or explicit withdrawals that are generated
at different points in the network. This attribute is created once,
at the point of occurrence of the fault, and is carried unchanged in
the resulting UPDATE messages throughout the network. At a receiving
speaker, the ROOT_CAUSE attribute is matched against the
SPEAKERID_PATH attributes of the available paths to help identify and
avoid those that are unreachable because they are affected by the
same root cause.
Path exploration caused by new prefix advertisements is not discussed
in this document.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. Reference Diagram
                                +-------+
                                |  AS5  |
             .................R11       R13...............
             .                  |       |                .
             .                  +--R12--+                .
             .                      .                    .
             .                      .                    .
             .                      .                    .
             .                      .                    .
          +--R4---+             +---R7--+             +--R10--+
          |       |             |       |             |       |
         R2       R3...........R5       R6...........R8       R9
         .|  AS2  |             |  AS3  |             |  AS4  |
         .+-------+             +-------+             +-------+
         .
         .
         .
         .
     +---R1--+
     |       |
     |  AS1  |
     |       |
     +-------+
The figure above shows a topology that exhibits the classic path
hunting problem. In steady state, AS5 has three paths for prefixes
originated by AS1:
+----------+---------+
| Path | AS_PATH |
+----------+---------+
| p1(best) | 2 1 |
| p2 | 3 2 1 |
| p3 | 4 3 2 1 |
+----------+---------+
When the link between AS1 and AS2 goes down, it leads to a series of
events and actions at AS5 as follows:
+------+---------------------+--------------------------------------+
| Step | Event | Action |
+------+---------------------+--------------------------------------+
| 1 | Recv withdraw of p1 | Select p2 as best |
| | | Send AS_PATH (5 3 2 1) upstream |
| -- | -- | -- |
| 2 | Recv withdraw of p2 | Select p3 as best |
| | | Send AS_PATH (5 4 3 2 1) upstream |
| -- | -- | -- |
| 3 | Recv withdraw of p3 | Prefixes have no path |
| | | Send withdraw for the prefixes |
| | | upstream |
+------+---------------------+--------------------------------------+
Even in this simple example, AS5 sends two unnecessary UPDATE
messages upstream (steps 1 and 2) before the end state is reached.
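The churn in the table above can be reproduced with a short
simulation. This is a sketch only: it assumes a simplified bestpath
rule (shortest AS_PATH wins, ties broken by list order) and that the
withdrawals for p1, p2 and p3 reach AS5 one at a time in that order.
The function names are hypothetical.

```python
def best_path(paths):
    """Simplified bestpath: the shortest AS_PATH wins (first one on a tie)."""
    return min(paths, key=lambda p: len(p[1])) if paths else None


def path_hunting(paths, withdraw_order, local_as=5):
    """Replay withdrawals one at a time and record what AS5 sends upstream."""
    paths = list(paths)
    sent = []
    for name in withdraw_order:
        paths = [p for p in paths if p[0] != name]
        best = best_path(paths)
        # Either advertise the new best AS_PATH with AS5 prepended, or withdraw
        sent.append("withdraw" if best is None else [local_as] + best[1])
    return sent


steady_state = [("p1", [2, 1]), ("p2", [3, 2, 1]), ("p3", [4, 3, 2, 1])]
for message in path_hunting(steady_state, ["p1", "p2", "p3"]):
    print(message)
```

The loop prints [5, 3, 2, 1], then [5, 4, 3, 2, 1], then withdraw:
three messages leave AS5 where a single withdrawal would have
sufficed.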
3. SPEAKERID_PATH attribute
SPEAKERID_PATH is an optional transitive attribute that is similar to
the AS_PATH attribute in encoding and operation. It is composed of a
sequence of SPEAKERID path segments. Each segment is represented by a
triple (type, length, value) with the following format:
    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |
   |   (1 octet)   |   (1 octet)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                               ~
   ~             Value             ~
   ~                               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The type is a 1-octet field with the following value defined:
Value  Type definition
1      AS_ID_SEQUENCE: ordered set of (AS, SPEAKER-ID) pairs that a
       route in the UPDATE message has traversed.
The length is a 1-octet field containing the number of pairs in the
segment. When the type is 1, the value contains one or more entries
of the following form:
   +---------------------------------+
   |          AS (4 bytes)           |
   +---------------------------------+
   |      SPEAKER-ID (4 bytes)       |
   +---------------------------------+
The use and meaning of these fields are as follows:
AS: A four-octet field that indicates the AS number of the BGP
speaker. If this ASN is from the public ASN space, it must have been
assigned by the appropriate authority (use of ASN values from the
private ASN space is strongly discouraged). Note that when a speaker
that supports four-octet AS numbers (a NEW speaker) sends an UPDATE
to a speaker that supports only two-octet AS numbers (an OLD
speaker), it encodes AS_TRANS in place of its own AS number in the
AS_PATH attribute if that AS number cannot be represented in two
octets ([I-D.ietf-idr-rfc4893bis]). When encoding the SPEAKERID_PATH
attribute, however, it MUST put its own four-octet AS number in this
field, regardless of whether the neighbor to which the UPDATE message
is being sent is an OLD or a NEW speaker.
SPEAKER-ID: A four-octet field that indicates the router-ID of the
BGP speaker. If the router-ID is from the public address space, it
must have been assigned by the appropriate authority (use of a
private IP address as a router-ID is strongly discouraged).
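The following sketch shows one way to build and parse the value field
described above. It assumes that the Length octet counts pairs, so a
segment occupies 2 + 8 x Length octets. It encodes only the attribute
value (the path attribute header is left out because IANA has not yet
assigned a codepoint) and uses documentation AS numbers and
addresses. The function names are hypothetical.

```python
import ipaddress
import struct

AS_ID_SEQUENCE = 1  # the only segment type defined above


def encode_speakerid_path(segments):
    """Encode segments (lists of (asn, router_id) pairs, leftmost first)
    into a SPEAKERID_PATH attribute value."""
    out = bytearray()
    for pairs in segments:
        if not 1 <= len(pairs) <= 255:
            raise ValueError("a segment carries 1 to 255 pairs")
        # Type (1 octet) and Length = number of pairs (1 octet)
        out += struct.pack("!BB", AS_ID_SEQUENCE, len(pairs))
        for asn, router_id in pairs:
            # Always the full four-octet AS, never AS_TRANS
            out += struct.pack("!II", asn, router_id)
    return bytes(out)


def decode_speakerid_path(data):
    segments, offset = [], 0
    while offset < len(data):
        seg_type, count = struct.unpack_from("!BB", data, offset)
        offset += 2
        end = offset + 8 * count
        if seg_type != AS_ID_SEQUENCE or end > len(data):
            raise ValueError("unknown segment type or truncated segment")
        segments.append([struct.unpack_from("!II", data, offset + 8 * i)
                         for i in range(count)])
        offset = end
    return segments


rid_a = int(ipaddress.IPv4Address("192.0.2.1"))
rid_b = int(ipaddress.IPv4Address("198.51.100.1"))
value = encode_speakerid_path([[(65536, rid_a), (64496, rid_b)]])
print(len(value))  # 18, i.e. 2 + 2 * 8 octets
print(decode_speakerid_path(value))
```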
4. ROOT_CAUSE attribute
ROOT_CAUSE is an optional transitive attribute that is composed of
one or more triples (type, length, value) with the following format:
    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |
   |   (1 octet)   |   (1 octet)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                               ~
   ~             Value             ~
   ~                               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The type is a 1-octet field with the following value defined:
Value  Type definition
1      AS_ID_CONN: (AS, router-ID) pairs from both sides of the
       connection that is the point of occurrence for the
       withdrawal.
The length is a 1-octet field, containing the length in octets of the
value field. When the type is 1, the value contains the following:
   +--------------+
   |Flags(1 octet)|
   +---------------------------------+
   |        left AS (4 bytes)        |
   +---------------------------------+
   |    left SPEAKER-ID (4 bytes)    |
   +---------------------------------+
   |       right AS (4 bytes)        |
   +---------------------------------+
   |   right SPEAKER-ID (4 bytes)    |
   +---------------------------------+
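A sketch of the AS_ID_CONN encoding follows. The draft does not
define any Flags bits or say which end of the connection is "left",
so this example writes zero flags and takes the two ends in the order
given. It also assumes that a decoder skips TLV types it does not
recognize. The function names are hypothetical.

```python
import struct

AS_ID_CONN = 1  # the only TLV type defined above
VALUE_FMT = "!BIIII"  # Flags, left AS, left SPEAKER-ID, right AS, right SPEAKER-ID


def encode_root_cause(left, right, flags=0):
    """left and right are (asn, router_id) pairs for the two ends of the
    connection. Flag bits are not defined in this draft; 0 is a placeholder."""
    value = struct.pack(VALUE_FMT, flags, left[0], left[1], right[0], right[1])
    # Length counts the octets of the value field: 1 + 4 * 4 = 17
    return struct.pack("!BB", AS_ID_CONN, len(value)) + value


def decode_root_cause(data):
    """Return a list of (flags, left, right) tuples; unknown TLVs are skipped."""
    result, offset = [], 0
    while offset < len(data):
        tlv_type, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        if offset + length > len(data):
            raise ValueError("truncated TLV")
        if tlv_type == AS_ID_CONN and length == struct.calcsize(VALUE_FMT):
            flags, l_as, l_id, r_as, r_id = struct.unpack_from(VALUE_FMT, data, offset)
            result.append((flags, (l_as, l_id), (r_as, r_id)))
        offset += length
    return result


# 0xC0000201 is 192.0.2.1 and 0xC6336401 is 198.51.100.1
attr = encode_root_cause((64496, 0xC0000201), (64497, 0xC6336401))
print(len(attr), attr[1])  # 19 17
```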
5. Operation
5.1. Sending SPEAKERID_PATH attribute
When a BGP speaker supporting the mechanism described in this
document propagates a route it learned from another BGP speaker's
UPDATE message, it modifies the route's SPEAKERID_PATH attribute by
prepending its own (AS number, router-ID) pair as the last pair of
the sequence, i.e., in the leftmost position, as is done for AS
numbers in the AS_PATH attribute [RFC4271]. If there is no such
attribute, the local system creates the attribute, creates a new
segment of type AS_ID_SEQUENCE in it, and places its own pair into
that segment. If prepending would overflow the existing leftmost
segment (i.e., it would then contain more than 255 pairs), the
speaker MUST prepend a new segment of type AS_ID_SEQUENCE and place
its own pair in this new segment. This operation should be performed
regardless of whether the peer is an IBGP or an EBGP peer.
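The prepend rule, including the overflow case, can be sketched as
follows. The sketch models the attribute as a list of AS_ID_SEQUENCE
segments with the leftmost segment first (mirroring AS_PATH
handling), treats router-IDs as plain integers, and uses a
hypothetical function name and documentation AS numbers.

```python
MAX_PAIRS = 255  # largest value the 1-octet segment Length can hold


def prepend_speaker(segments, local_as, router_id):
    """Return a new SPEAKERID_PATH with the local (AS, router-ID) pair added
    in the leftmost position. segments: list of AS_ID_SEQUENCE segments,
    each a list of pairs, leftmost first. The input is not modified."""
    pair = (local_as, router_id)
    copied = [list(seg) for seg in segments]
    if not copied:
        # No attribute yet: create it with one segment holding our pair
        return [[pair]]
    if len(copied[0]) >= MAX_PAIRS:
        # Prepending would overflow the leftmost segment: start a new one
        return [[pair]] + copied
    copied[0].insert(0, pair)
    return copied


path = prepend_speaker([], 64496, 1)
path = prepend_speaker(path, 64497, 2)
print(path)  # [[(64497, 2), (64496, 1)]]

full = [[(64496, i) for i in range(MAX_PAIRS)]]
grown = prepend_speaker(full, 64497, 999)
print(len(grown), len(grown[0]), len(grown[1]))  # 2 1 255
```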
5.2. Sending ROOT_CAUSE attribute
5.2.1. At the point of occurrence
A BGP speaker originates the ROOT_CAUSE attribute, attaching it to an
UPDATE message, in either of the following scenarios:
o A session with a peer AS goes down, or the associated link goes
down, and the prefixes received over it need to be withdrawn or
their bestpath changes.
o It receives withdrawals for some prefixes without the ROOT_CAUSE
attribute, and those prefixes in turn need to be either withdrawn
from upstream ASes or re-advertised with new paths.
When originating the attribute, the speaker encodes the AS number and
router-ID of each side of the session.
5.2.2. At an intermediate point
Any speaker that receives a withdrawal UPDATE message carrying the
ROOT_CAUSE attribute should preserve the attribute and attach the
same, unchanged attribute value to the resulting UPDATE message it
announces. The resulting message can be either an explicit withdrawal
of the prefix or an implicit withdrawal (an advertisement of a
replacement path).
Any speaker that receives an UPDATE message advertising a reachable
route with the ROOT_CAUSE attribute should preserve the attribute,
but should not include it in the resulting UPDATE message unless that
message is an explicit withdrawal.
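The origination and propagation rules in Sections 5.2.1 and 5.2.2 can
be summarized as one decision made for each outgoing UPDATE. In this
sketch the trigger labels, the Trigger type, and the convention that
the local end of the session is encoded as "left" are all
assumptions; the draft does not say which end is which. It also
assumes the caller has already decided that the prefix must be
withdrawn or re-advertised.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Pair = Tuple[int, int]         # (AS number, router-ID)
RootCause = Tuple[Pair, Pair]  # (left, right) ends of the failed connection


@dataclass
class Trigger:
    kind: str    # hypothetical labels: "session_down", "withdrawal", "reachable"
    local: Pair  # this speaker's pair on the session involved
    peer: Pair   # the neighbor's pair on that session
    received: Optional[RootCause] = None  # ROOT_CAUSE on the received UPDATE, if any


def root_cause_to_send(trigger: Trigger, explicit_withdrawal: bool) -> Optional[RootCause]:
    """Pick the ROOT_CAUSE value (if any) for the resulting outgoing UPDATE."""
    if trigger.kind == "session_down":
        # 5.2.1, first case: originate, encoding both sides of the session
        return (trigger.local, trigger.peer)
    if trigger.kind == "withdrawal":
        if trigger.received is None:
            # 5.2.1, second case: the withdrawal arrived without ROOT_CAUSE
            return (trigger.local, trigger.peer)
        # 5.2.2: pass the received value on unchanged (explicit or implicit)
        return trigger.received
    if trigger.kind == "reachable":
        # 5.2.2: keep the attribute, announce it only with an explicit withdrawal
        return trigger.received if explicit_withdrawal else None
    raise ValueError(f"unknown trigger kind: {trigger.kind}")


me, nbr = (64496, 1), (64497, 2)
print(root_cause_to_send(Trigger("session_down", me, nbr), explicit_withdrawal=True))
# ((64496, 1), (64497, 2))
```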
5.3. Receiving ROOT_CAUSE Attribute
Whenever a BGP speaker receives an UPDATE message that withdraws
prefixes, it does the following:
o Remove the withdrawn BGP path for the prefix.
o Find all other paths for the prefix whose SPEAKERID_PATH attribute
matches the ROOT_CAUSE information received with the withdrawal of
the removed path. Start the Inactive timer (Section 5.6) for these
paths, and do not consider them in BGP bestpath selection while it
runs.
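The matching step can be illustrated on the reference topology of
Section 2. This sketch rests on assumptions the draft does not spell
out: every speaker, including the originator R1, adds its own pair to
SPEAKERID_PATH; segments are flattened into one list; a path counts
as affected when the two ROOT_CAUSE pairs appear next to each other
in its SPEAKERID_PATH; and R2 encodes itself as the left end. Router
labels stand in for real router-IDs, and the function names are
hypothetical.

```python
def traverses(speaker_path, root_cause):
    """Assumed matching rule: the path crossed the failed connection if the two
    ROOT_CAUSE pairs appear next to each other in its SPEAKERID_PATH."""
    left, right = root_cause
    return any((a, b) in ((left, right), (right, left))
               for a, b in zip(speaker_path, speaker_path[1:]))


def process_withdrawal(paths, withdrawn, root_cause):
    """paths maps a path name to its flattened SPEAKERID_PATH (leftmost first).
    Returns the remaining paths and the names to place on the Inactive timer."""
    remaining = {name: sp for name, sp in paths.items() if name != withdrawn}
    suppressed = {name for name, sp in remaining.items() if traverses(sp, root_cause)}
    return remaining, suppressed


# Reference diagram as seen at AS5; router labels stand in for router-IDs.
paths = {
    "p1": [(2, "R4"), (2, "R2"), (1, "R1")],
    "p2": [(3, "R7"), (3, "R5"), (2, "R3"), (2, "R2"), (1, "R1")],
    "p3": [(4, "R10"), (4, "R8"), (3, "R6"), (3, "R5"),
           (2, "R3"), (2, "R2"), (1, "R1")],
}
root_cause = ((2, "R2"), (1, "R1"))  # R2 reports its session with R1 as down
remaining, suppressed = process_withdrawal(paths, "p1", root_cause)
print(sorted(suppressed), sorted(set(remaining) - suppressed))  # ['p2', 'p3'] []
```

With p2 and p3 suppressed, AS5 has no eligible path and can send a
single withdrawal upstream instead of the three messages listed in
Section 2.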
5.4. Usage of BGP Aggregates
Whenever a BGP speaker creates an aggregate route from more specific
routes, it does not inherit any SPEAKERID_PATH information from the
more specific routes used for aggregation. Instead, it creates its
own SPEAKERID_PATH attribute when it announces the aggregate route to
its BGP peers, i.e., the attribute contains one segment with only its
own (AS, router-ID) pair.
5.5. BGP Confederation
A BGP confederation speaker that peers with EBGP peers and receives
routes from them also exchanges the BGP Route Originator attribute
with them. Whenever a Special Withdrawal message is received, the
speaker does the following:
o Remove the path announced by the peer that sent the Special
Withdrawal message.
o Do not select any other BGP path whose Route Originator attribute
matches the one received in the Special Withdrawal.
o If no alternate paths are available, forward the Special
Withdrawal message with the original Route Originator attribute.
5.6. BGP Inactive Timer
The BGP Inactive timer is used to keep suppressed paths out of BGP
bestpath selection. This prevents BGP from selecting alternate paths
whose withdrawals have not yet been received. A BGP speaker should
remove suppressed paths as soon as they are withdrawn. A BGP speaker
must make any suppressed paths that have still not been withdrawn
when the Inactive timer expires eligible again for BGP bestpath
selection. The Inactive timer timeout should be large enough to allow
the withdrawal information to propagate across the AS.
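A minimal sketch of the Inactive timer bookkeeping follows. It
assumes one timeout value for all paths and that the caller passes in
the current time. The class name and the 30-second value are
illustrative only; the draft does not specify a timeout.

```python
class InactiveTimers:
    """Per-path suppression for bestpath selection. Times are plain numbers
    (e.g., seconds) supplied by the caller; the timeout is a local choice."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.expiry = {}  # path id -> time at which suppression ends

    def suppress(self, path_id, now):
        self.expiry[path_id] = now + self.timeout

    def withdrawn(self, path_id):
        # The path was withdrawn while suppressed: forget its timer
        # (removing the path from the table is the caller's job).
        self.expiry.pop(path_id, None)

    def eligible(self, path_id, now):
        """True if the path may take part in bestpath selection at time now."""
        end = self.expiry.get(path_id)
        if end is None:
            return True
        if now >= end:
            # Timer expired without a withdrawal: the path is selectable again
            del self.expiry[path_id]
            return True
        return False


timers = InactiveTimers(timeout=30.0)   # arbitrary illustrative value
timers.suppress("p2", now=0.0)
timers.suppress("p3", now=0.0)
timers.withdrawn("p2")                  # p2's withdrawal arrives in time
print(timers.eligible("p3", now=10.0))  # False: still suppressed
print(timers.eligible("p3", now=30.0))  # True: timer expired, no withdrawal seen
```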
6. Acknowledgements
The authors would like to thank Robert Raszuk and Pedro Marques for
their input.
7. IANA Considerations
IANA shall assign codepoints for the SPEAKERID_PATH and ROOT_CAUSE
attributes. These codepoints will come from the "BGP Path
Attributes" registry.
8. Security Considerations
This extension to BGP does not change the underlying security issues.
9. References
9.1. Normative References
[I-D.ietf-idr-rfc4893bis]
Vohra, Q. and E. Chen, "BGP Support for Four-octet AS
Number Space", draft-ietf-idr-rfc4893bis-01 (work in
progress), October 2009.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2439] Villamizar, C., Chandra, R., and R. Govindan, "BGP Route
Flap Damping", RFC 2439, November 1998.
[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
Protocol 4 (BGP-4)", RFC 4271, January 2006.
9.2. Informative References
[I-D.li-bgp-stability]
Huston, G. and T. Li, "BGP Stability Improvements",
draft-li-bgp-stability-01 (work in progress), June 2007.
Authors' Addresses
Keyur Patel
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134
USA
Email: keyupate@cisco.com
Chandra Appanna
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134
USA
Email: chandra@cisco.com
Pradosh Mohapatra
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134
USA
Email: pmohapat@cisco.com
John Scudder
Juniper Networks
1194 N. Mathilda Ave
Sunnyvale, CA 94089
USA
Email: jgs@juniper.net
James Uttaro
AT&T
200 S. Laurel Ave
Middletown, NJ 07748
USA
Email: uttaro@att.com