Inter-Domain Routing                                               T. Li
Internet-Draft                                       Cisco Systems, Inc.
Intended status: Standards Track                               G. Huston
Expires: December 15, 2007                                         APNIC
                                                           June 13, 2007


                       BGP Stability Improvements
                       draft-li-bgp-stability-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 15, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   BGP is the routing protocol used to tie the Autonomous Systems (ASes)
   of the Internet together.  The ongoing stability of BGP in the face
   of arbitrary inputs, both malicious and unintentional, is of primary
   importance to the overall stability of the Internet.  The overall
   issue is not a new one.  Previously, one aspect of stability, known
   as route flap damping was originally discussed in RFC 2439.  In the
   intervening years, a great deal of experience with flap damping and



Li & Huston             Expires December 15, 2007               [Page 1]


Internet-Draft         BGP Stability Improvements              June 2007


   other stability concerns has been accumulated.  Most recently, the
   issue of BGP stability has been highlighted in RAWS.  This document
   describes the experience that has been gained concerning stability in
   the intervening years, hypotheses about remaining problems,
   suggestions for experiments to be performed, and proposals for
   possible alternatives.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1.  Requirements Language  . . . . . . . . . . . . . . . . . .  3
     1.2.  History  . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.3.  Observations . . . . . . . . . . . . . . . . . . . . . . .  4
       1.3.1.  Path hunting . . . . . . . . . . . . . . . . . . . . .  4
     1.4.  The wave model . . . . . . . . . . . . . . . . . . . . . .  5
       1.4.1.  Refraction and diffraction . . . . . . . . . . . . . .  5
   2.  Goals  . . . . . . . . . . . . . . . . . . . . . . . . . . . .  6
     2.1.  Flap damping . . . . . . . . . . . . . . . . . . . . . . .  6
     2.2.  Rapid convergence  . . . . . . . . . . . . . . . . . . . .  6
     2.3.  Reduced overhead . . . . . . . . . . . . . . . . . . . . .  7
   3.  Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . .  7
     3.1.  Turn it off  . . . . . . . . . . . . . . . . . . . . . . .  7
     3.2.  Alternate parameters . . . . . . . . . . . . . . . . . . .  7
     3.3.  Band-stop filtering  . . . . . . . . . . . . . . . . . . .  8
     3.4.  Path length damping  . . . . . . . . . . . . . . . . . . .  8
     3.5.  Optimal path hysteresis  . . . . . . . . . . . . . . . . . 10
       3.5.1.  Optimal path length hysteresis . . . . . . . . . . . . 11
     3.6.  Delayed best path selection  . . . . . . . . . . . . . . . 11
     3.7.  Deprecate MRAI . . . . . . . . . . . . . . . . . . . . . . 12
     3.8.  Hybrid approaches  . . . . . . . . . . . . . . . . . . . . 13
     3.9.  Other work . . . . . . . . . . . . . . . . . . . . . . . . 13
   4.  Next steps . . . . . . . . . . . . . . . . . . . . . . . . . . 13
     4.1.  Call for collaboration . . . . . . . . . . . . . . . . . . 13
     4.2.  Literature search  . . . . . . . . . . . . . . . . . . . . 13
     4.3.  Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 14
       4.3.1.  A Case Study: Path length damping  . . . . . . . . . . 14
     4.4.  Prototyping, Testing and Pilot Deployment  . . . . . . . . 19
   5.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 19
   6.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 19
   7.  Security Considerations  . . . . . . . . . . . . . . . . . . . 19
   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 20
     8.1.  Normative References . . . . . . . . . . . . . . . . . . . 20
     8.2.  Informative References . . . . . . . . . . . . . . . . . . 20
     8.3.  Potential References . . . . . . . . . . . . . . . . . . . 20
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 21
   Intellectual Property and Copyright Statements . . . . . . . . . . 23




Li & Huston             Expires December 15, 2007               [Page 2]


Internet-Draft         BGP Stability Improvements              June 2007


1.  Introduction

   BGP [RFC4271] is the routing protocol used to tie the Autonomous
   Systems (ASes) of the Internet together.  The ongoing stability of
   BGP in the face of arbitrary inputs, both malicious and
   unintentional, is of primary importance to the overall stability of
   the Internet.  The overall issue is not a new one.  Previously, one
   aspect of stability, known as route flap damping was originally
   discussed in RFC 2439 [RFC2439].  In the intervening years, a great
   deal of experience with flap damping and other stability concerns has
   been accumulated.  Most recently, the issue of BGP stability has been
   highlighted in RAWS [I-D.iab-raws-report].  This document describes
   the experience that has been gained concerning stability in the
   intervening years, hypotheses about remaining problems, suggestions
   for experiments to be performed, and proposals for possible
   alternatives.

   Please note that this document is very much a work-in-progress.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.  History

   The circuits used in computer networks have the unfortunate property
   that they can intermittently fail and then recover.  This was an
   especially common failure mode for copper-based circuits.  Under such
   circumstances, when there was a BGP speaker on both ends of the
   circuit, any prefixes advertised across the link would tend to
   oscillate at the frequency induced by the intermittent link.  The
   oscillating prefixes would then propagate across the full Internet,
   causing the entire routing subsystem to churn at the rate of the
   prefix.

   Individually, a single such prefix is not a significant issue.
   However, as the Internet continued to scale upwards, it became
   obvious that the CPU requirements to deal with the ever-increasing
   number of oscillating prefixes would quickly become onerous.  This
   was aggravated by the fact that the party responsible for the
   flapping circuit was frequently unaware of the problem, or, worse
   yet, unwilling to address the issue.

   Route flap can also be caused by policy oscillations [CN2000] or MED
   churn oscillations [RFC3345].




Li & Huston             Expires December 15, 2007               [Page 3]


Internet-Draft         BGP Stability Improvements              June 2007


   Thus, the original goal of route flap damping was to protect the
   control plane from oscillations.  This was done by determining the
   number of flaps and the time elapsed since the last transition.  This
   is fed into an exponential decay function, and, if the prefix is
   found to be flapping based on this data, the actual propagation of
   the route is suppressed.  Since the frequency information must be
   stored even if the prefix is not currently active, there is state
   overhead associated with flap damping for each prefix that has been
   oscillating.

1.3.  Observations

   Unfortunately, flap damping isn't truly discerning about the nature
   of routing changes.  Any routing change can easily be misinterpreted
   by flap damping as instability, resulting in premature damping of
   prefixes [Harmful].

1.3.1.  Path hunting

   One source of path changes is BGP's normal mechanism for _path
   exploration_ or _path hunting_.  These situations occur because BGP
   is a path-vector protocol, where each BGP speaker advertises the path
   that it is using to its neighbors, complete with the full AS path to
   the destination.  Since the number of possible paths through even a
   simple topology is large, there can be many different path
   transitions that can possibly be advertised.

   Path hunting can occur both when a prefix is first advertised or when
   a prefix is withdrawn.  At advertisement time, the prefix may
   propagate through the topology at different rates, sometimes
   resulting in it first appearing at an AS with a suboptimal path.
   Over time, optimal paths will appear where suboptimal paths were
   before, resulting in a path change that is subsequently propagated.

   Similarly, when a prefix is withdrawn from the network, as the local
   BGP speaker receives the prefix withdrawal from a BGP peer it may
   substitute a previously received announcement for this prefix from a
   different peer in its place and announce this replacement route to
   its peers in response to the received withdrawal.  If this
   replacement path is subsequently withdrawn, the local BGP speaker
   will again select another previously received announcement from a
   different peer, if one exists.  This process may continue until the
   original prefix withdrawal has propagated across the entire routing
   space.

   Interestingly, the amount of path hunting can increase dramatically
   as the meshiness of the topology increases.  It's easy to observe
   this if you first consider an acyclic topology (i.e., a tree).  In



Li & Huston             Expires December 15, 2007               [Page 4]


Internet-Draft         BGP Stability Improvements              June 2007


   such a topology, there is only one possible path, so there is no
   hunting.  If a single link is added to this topology, then there is
   one cycle in the graph and at most two possible paths for BGP to
   explore.  Subsequent links can add many more alternate paths,
   depending on their placement.

   In the worst possible case path hunting in BGP can explore every
   possible path of each path length.  More commonly it has been
   observed that path hunting in today's Internet can add an additional
   2 or 3 BGP updates to a prefix withdrawal.

1.4.  The wave model

   An intuitive means of understanding the observed behavior is by
   analogy to a wave.  Any change in the network triggers the
   dissemination of information (either updates or withdrawals) through
   the topology from the point of occurrence.  The information travels
   outwards along all of the paths supported by BGP, in much the same
   way that a wave would propagate from a pebble dropped in a lake.

   The wave expands at each BGP speaker, where the information is
   propagated to all other BGP peers, including ones that already have
   the information.  If the newly arrived information is inferior to the
   existing path information, then the wave dies out at that BGP
   speaker.  If the newly arrived path is the best path, then the wave
   continues to propagate.

1.4.1.  Refraction and diffraction

   Information does not traverse the BGP mesh at constant rates.
   Differences in implementations, processing loads, propagation delay,
   damping parameters, and policy can all contribute to delaying
   updates.  As with a physical wave, we know that the speed of
   propagation varies with the material.  This results in a bending of
   the wave, known as _refraction_.

   Similarly, since the BGP mesh is not a continuous medium we get
   _diffraction_, where the wave passes through an aperture and fans out
   from there, creating a new wavefront.  Each additional wavefront
   represents additional processing burden on the routing subsystem.

   It is interesting to note that flap damping itself may be a
   contributor to the creation of additional wavefronts.  Since a route
   that is being damped will be delayed for a long time, damping is
   effectively delaying a wave of information, possibly creating more
   refractive and diffractive effects.

   Another major contributor to the generation of additional



Li & Huston             Expires December 15, 2007               [Page 5]


Internet-Draft         BGP Stability Improvements              June 2007


   intermediate wavefronts of information is the disparate use of the
   Minimum Route Advertisement Interval (MRAI) Timer, where various
   implementations impose differing delays on update propagation.

   It should be noted that the wave analogy does break down when it
   comes to interference.  Unlike a physical wave, two waves of BGP
   information do not interfere additively or destructively.  Instead,
   as noted above, at most one wave will propagate from any point, and
   which wave will propagate may vary from point to point.


2.  Goals

2.1.  Flap damping

   As we reconsider the mechanisms that constitute flap damping, we need
   to keep in mind that the original goals of detecting and protecting
   the routing subsystem from noisy inputs is still a requirement.
   While copper circuits are now less common in the core, the overall
   network has expanded dramatically and there is a wide variance in the
   skills and experience in operational roles.

   As a result, it is still possible for an errant AS to inject flapping
   information into the BGP mesh, either as the result of policy
   misconfiguration, implementation error, an intermittent circuit, or
   even as an intentional destructive act.  Thus, it is important that
   there still be mechanisms that intervene and ameliorate these
   effects, protecting the routing subsystem.

2.2.  Rapid convergence

   While protecting the routing system is of paramount importance, it is
   vital that the routing subsystem also continue to perform its primary
   task: providing routing.  Any flap damping and path hunting
   suppression mechanism must continue to provide rapid convergence to
   some workable path so that connectivity is restored.  However, this
   goal should not be construed to require rapid optimality.  While a
   best path should eventually be selected and propagated, it is far
   more important that some connectivity be restored immediately.  Most
   applications can survive with a sub-optimal path, while no
   applications can succeed if no path is selected.

   This distinction seems vital for understanding the precise behavior
   of any mechanism, so for the sake of this discussion, we explicitly
   define two milestones for convergence:






Li & Huston             Expires December 15, 2007               [Page 6]


Internet-Draft         BGP Stability Improvements              June 2007


   reachability:  The source has a valid path to the destination.  The
      path may be suboptimal and may not respect the ultimate traffic
      engineering preferences of the ASes along the path.  As a result,
      the path may exhibit congestion, unwelcome QoS handling, or any
      other of a number of sub-optimalities.  Time-sensitive
      applications such as video or voice may require the optimal path
      for proper operation.

   optimality:  The source has the long-term best path to the
      destination.  This milestone has been reached when there is a
      stable transitive closure of the best path selection process of
      all ASes along the path from source to destination.

2.3.  Reduced overhead

   Any form of damping of BGP updates should strive to remove the
   unnecessary exchange of information.  As described above, both path
   hunting and refractive effects cause unnecessary churn in BGP.  The
   flap damping mechanism should be generalized to help suppress as much
   of this unnecessary information as possible.


3.  Hypotheses

   In this section, we put forth a number of hypotheses about possible
   mechanisms to achieve the goals above.  As of this writing, more
   investigation is needed on each of these theories, and where possible
   we've included some discussion of the experiments that we feel would
   be worthwhile.  Our goal here is to examine a number of different
   mechanisms, understand their relative benefits, and select a small
   subset to become the core set of replacement mechanisms.

3.1.  Turn it off

   Given the concern about the negative effects of path damping,
   [RIPE-378] recommends that path damping be disabled.  While this is
   not unreasonable given the lack of beneficial alternatives, we feel
   that some of the possibilities presented here will eventually prevail
   and that this sentiment can be changed over time.

3.2.  Alternate parameters

   It has been suggested in [Harmful] that the default flap damping
   parameters in existing implementations are simply too aggressive and
   quickly convert normal path hunting into a damping event that
   precludes connectivity.  Significantly increasing the parameters
   could permit significantly more churn to be passed by the routing
   subsystem while still filtering out truly periodic sources of flap.



Li & Huston             Expires December 15, 2007               [Page 7]


Internet-Draft         BGP Stability Improvements              June 2007


   It would be useful to test this by simply configuring numerous
   differing parameters and observing if there is any beneficial effect.
   At this time, we have no recommendations for possible alternative
   parameter settings.

3.3.  Band-stop filtering

   Another view is that classical flap damping isn't working as well as
   we might like because of a frequency mismatch.  The current mechanism
   looks for a number of changes in a given period of time.  If the
   route exceeds this threshold frequency, then it is damped.  The
   assumed information model behind the current BGP Flap Damping
   parameters was that of a relatively low frequency oscillation
   occurring over an extended period of time.

   Measurements of BGP activity indicate that one of the predominant
   signal components is a high frequency oscillation occurring over a
   short time interval.  This acts as a false positive for damping that
   we would like to avoid.  One alternative approach is to shift from
   looking for flapping above a given threshold frequency and simply
   accept that when there is a real topological change, there will be
   extensive high frequency path changes.  These should be dealt with by
   separate path hunting suppression techniques, as described below.

   Some time after the topological change, the high-frequency path
   changes should stop and the route should then resume its stability.
   Subsequent path changes would then be indicative of real oscillation
   and would be subject to damping.

   The implementation of this would be relatively straightforward.  When
   a change is seen on a stable route, it opens an oscillation window of
   a fixed duration (e.g., 60s).  Any changes within that window are not
   considered as contributing to flap damping.  After the window is
   closed, any subsequent changes would count as significant events
   towards damping.  Effectively, this technique creates a filter that
   passes very, very low frequencies (e.g., configuration changes) and
   defers high frequency changes to path hunting mechanisms, but will
   detect and deter ongoing route flap within a certain frequency band.
   This is normally known as a band-stop filter.

3.4.  Path length damping

   The increased meshiness of the core of the Internet has significantly
   changed the nature of path changes that are visible in BGP.  As the
   meshiness of the network increases, the number of parallel links
   between any given pair of ASes tends to increase.  This helps protect
   against single link failures between ASes.  This also reduces the
   frequency of AS path changes on transit prefixes because most of the



Li & Huston             Expires December 15, 2007               [Page 8]


Internet-Draft         BGP Stability Improvements              June 2007


   link failures in the densely meshed part of the network will not
   result in an AS path change.

   As a result, when a BGP speaker does see a change in the AS path, and
   in particular, when the AS path length increases, this would seem to
   be a good heuristic indication that there is some significant
   failure.  The ultimate trigger for the advertisement of an update
   with a longer AS path is the removal of a previously used shorter
   path.  This is either due to withdrawal at the origin, or removal of
   an interior connection.  As a result, it seems likely that such
   failures are less likely to have alternative working paths and that
   the increase in path length is a harbinger of path hunting that is
   likely to be unsuccessful.  We therefore suggest that this event
   could be used to trigger a suppression period, which would allow the
   prefix to oscillate arbitrarily without propagation to the remainder
   of the network.  The obvious risk is that this would be a false
   negative, unnecessarily disrupting connectivity.

   Observations of the time-coupling of updates in BGP that refer to the
   same prefix show that only 28% of all BGP updates occur as an
   isolated event, 26% of all updates occur as a part of a pair of
   updates occurring within 65 seconds of each other and the remaining
   46% of all BGP Updates occur within a coupled sequence of 3 or more
   updates where each update occurs within 65 seconds of the previous
   update in the sequence.

   Again, the implementation of this would be relatively
   straightforward.  When a BGP speaker found that it needed to change
   its best path for a prefix and that the new best path was longer than
   the previous best path, then it would suppress the advertisement of
   the longer AS path to its neighbor and start a timer.  Subsequent
   changes to the prefix that continue to lengthen the AS path would
   restart the timer.  If the BGP speaker installed a shorter best path,
   or undertook a local withdrawal it would remove the suppressed update
   and cancel the timer.  Otherwise, when the timer expired, the BGP
   speaker would advertise the result, if any.

   Some analysis suggests that this technique will be effective at
   suppressing about 20% of the overall churn rate.  [Potaroo0607]

   The benefits of this approach are that it would drastically reduce
   the amount of path hunting that happened when a stub site had a
   failure of its tail circuit.

   This approach has a number of drawbacks:

   1.  Failures that result in alternate best paths with path lengths
       that are equal to the previous best path are not damped, even in



Li & Huston             Expires December 15, 2007               [Page 9]


Internet-Draft         BGP Stability Improvements              June 2007


       the case of stub tail-circuit failures.

   2.  Convergence is harmed if the alternate AS paths that are damped
       out are optimal:

       A.  If the failure that triggered the path change is not due to a
           tail-circuit failure, then useful alternate AS paths will be
           ignored.

       B.  While this approach is beneficial for stub sites, such sites
           are not particularly good candidates for being explicitly
           routed.  Such sites should only ever need prefixes that are
           aggregateable.  Such prefixes may be explicitly distributed
           by BGP for the sake of traffic engineering, so understanding
           the scope of affected prefixes is left for future study.  For
           non-stub sites, this approach damages convergence.
           Unfortunately, it seems difficult to distinguish between stub
           prefixes and non-stub prefixes.  To do so would seem to
           require require explicit annotation that could only be
           reasonably generated by manual configuration and would likely
           be incorrect.  Transporting the annotation itself would
           require further protocol extension.

3.5.  Optimal path hysteresis

   It has been observed that the overall topology of the Internet at the
   AS level changes at a fairly low rate.  Thus, the optimal AS path to
   a given prefix, ignoring transient issues, changes at a very low
   rate.  This suggests that caching the optimal AS path and waiting for
   it to reappear would be another alternative heuristic to help select
   only the long-term optimal path.

   An implementation of this technique might retain a copy of the AS
   path on per-prefix basis, even if it had no active path to the
   prefix.  Because most implementations maintain a cache of AS paths,
   this is not necessarily prohibitively expensive.  When a new AS path
   is received for a prefix, the new path is compared to the cached
   optimal path.  If it matches, or it is preferable to the stored
   optimal path, then the new path is immediately accepted, advertised,
   and the cache can be updated appropriately.  However, if the new AS
   path is inferior to the cached path, then the implementation can
   infer that there is some path hunting in progress and can choose to
   either not perform best path selection, not select the new path, or
   not advertise the new path.  Again, after a suitable period has
   elapsed, the implementation may decide that the optimal path is
   unlikely to appear and may process the inferior path normally.

   The benefits of this approach are that when a site has been



Li & Huston             Expires December 15, 2007              [Page 10]


Internet-Draft         BGP Stability Improvements              June 2007


   temporarily unreachable for a short time, then when it returns to
   connectivity, only the optimal path will propagate through the
   network.

   There are three drawbacks to this approach.  First, this approach
   will delay convergence in the case where the cached optimal path is
   not going to be restored.  Second, this approach will delay
   reachability.  Third, the maintenance of the cache could be a non-
   trivial amount of overhead.  Many implementations maintain a cache of
   AS paths, where the actual path is stored a single time and then
   various prefixes point to a cache entry.  In such circumstances, if
   the AS path is maintained for some other active prefix, then the
   additional cost of caching the path is the additional entry for the
   unreachable prefix and the pointer to the cache entry.  However, if
   there are no other references for the AS path, then the storage of
   the AS path itself would be part of the overhead.  The impact of this
   overhead is tempered by the fact that a prefix that is only
   disconnected from the network for a short time would likely reuse the
   same memory shortly thereafter in any case.  An implementation could
   also alleviate the cost of this overhead by limiting the amount of
   memory spent on caching optimal paths for inactive prefixes, or
   temporally limiting the lifetime of the cached information.

3.5.1.  Optimal path length hysteresis

   A variant of this approach would be to only cache the path length of
   the optimal path.  This would allow certain suboptimal paths matching
   the cached length to pass rapidly through the system, and in
   exchange, it would decrease the amount of overhead necessary to
   maintain the cache.

3.6.  Delayed best path selection

   Another observation based on the discussion in Section 1.4.1 is that
   the amount of flap is exacerbated by each AS selecting the best
   possible path each time a new path is presented.  This is not
   strictly required by BGP, so ignoring some of the incoming paths
   would be perfectly acceptable.  Further, an implementation could
   reasonably delay performing any best path analysis for an arbitrarily
   long time, as long as it continued to advertise the path it actually
   used.  Thus, one possible policy would be to only perform best path
   selection when absolutely required.  When the first path for a prefix
   arrives, the implementation would immediately select that path,
   thereby restoring connectivity.  Subsequent paths from other
   neighbors for the same prefix would not trigger a new best path
   computation.  Rather, they would simply start a timer that would only
   expire when the paths had stabilized.




Li & Huston             Expires December 15, 2007              [Page 11]


Internet-Draft         BGP Stability Improvements              June 2007


   The benefit of this approach is that it would provide rapid
   reachability and a major reduction in churn.  By propagating one path
   and suppressing most of the intermediate paths, only a few paths are
   likely to be propagated.  The drawback to this approach is that it
   will still necessary propagate some suboptimal paths.  If the initial
   path was the long-term optimal path, then churn would not be an
   issue.  Thus, under this approach, propagating the first path helps
   to optimize reachability, but makes it very likely that the optimal
   path will subsequently follow.  Further, if the neighbor that relayed
   the initial path decides to change its path selection for any reason,
   then the local BGP speaker will have no alternative except to execute
   best path selection and propagate a new best path.  This would likely
   be another intermediate path, not necessarily the long-term optimal
   path.

   Some basic analysis shows that this technique, when combined with
   optimal path hysteresis is capable of reducing the overall BGP
   message load by 35% for prefixes that oscillate frequently.  In
   addition, when these techniques are combined with path length
   damping, there is negligible further improvement.  Examination of the
   result shows that optimal path hysteresis also effectively damps out
   much of the path hunting messaging that occurs when a prefix is being
   withdrawn, in much the same way that path length damping would do.

3.7.  Deprecate MRAI

   BGP specifies that a prefix should not be advertised multiple times
   within a fixed period of time.  This is called the Min Route
   Advertisement Interval (MRAI).  Implementations of this are sometimes
   simplistic and can result in information being delayed for the length
   of this interval even when such delay causes an increase in the
   number of updates due to diffractive replications.  It also leads to
   unnecessary delays in convergence.

   The original intent of MRAI was to rate-limit BGP updates to prevent
   thrashing.  Unfortunately, this unsophisticated control has some side
   effects that are deleterious to the overall effort.  If other can
   provide appropriate guarantees that the update rate will remain
   appropriately constrained, then the spirit of the MRAI requirement
   would be satisfied and the actual mechanism itself would not be
   necessary.

   For example, path length damping could be used to ensure that the
   rate of increases in the AS path length would be kept to a
   controllable level.  Refinements in flap damping might be used to
   deal with the case where the AS path length is constant.  No
   mechanism would be necessary to deal with a decreasing AS path
   length, since that is necessarily bounded by the AS path length



Li & Huston             Expires December 15, 2007              [Page 12]


Internet-Draft         BGP Stability Improvements              June 2007


   itself.  In such circumstances, it should be possible to remove the
   MRAI mechanism entirely, improve convergence, decrease diffraction,
   and continue to ensure overall mesh stability.

3.8.  Hybrid approaches

   The approaches listed here could, in principle, be used in
   conjunction with one another.  This would result in a hybrid behavior
   that had the benefits and drawbacks of the combined mechanisms.  For
   example, it should be possible to combine optimal path caching and
   delayed best path selection.  This approach would then propagate the
   first path seen by a BGP speaker, but would then delay subsequent
   path selection opportunities until the optimal path is seen.

3.9.  Other work

   Other work is already in progress to help reduce the amount of BGP
   messages that are generated when a large number of routes are
   withdrawn.  [AggrWithdrawl]


4.  Next steps

4.1.  Call for collaboration

   As can be seen from the above, there is a great deal of work yet to
   be done on this subject.  Collaborators are most welcome in any
   aspect of the work.

4.2.  Literature search

   There are a number of technical articles listed below that have been
   published on BGP flap damping and stability that need to be reviewed
   and included if they prove substantive.  A few known ones are listed
   here.  There are very likely a number of other articles in the
   literature that are relevant.

      [TON-1998]

      [Infocom-1999]

      [FTCS-1999]

      [Sigcomm-2000]

      [Infocom-2001]





Li & Huston             Expires December 15, 2007              [Page 13]


Internet-Draft         BGP Stability Improvements              June 2007


      [Sigcomm-2002]

      [PCC-2004]

      [Infocom-2005]

      [R-BGP]

4.3.  Analysis

   A number of projects have collected traces of BGP update messages
   that demonstrate both flap and path hunting.  It would be of great
   interest to examine the effects of some of the proposal in Section 3
   in detail on these traces.

4.3.1.  A Case Study: Path length damping

   This case study examines the BGP updates over the month of April 2007
   based on an observation point adjacent to AS 4637.

   During that month a single eBGP peering session received 1,341,520
   BGP update messages, reflecting 3,523,906 individual prefix updates
   and 627,538 individual prefix withdrawals.  Considering that there
   were an average of some 215,000 individual prefixes in the BGP
   routing table across the month, that's an average of around 19
   updates for every prefix in the month, assuming a uniform
   distribution of updates across the entire routing domain.

   Of course, the distribution of updates is not uniform, and most of
   the network is highly stable.  Half of these 210,000 prefixes had
   less than 10 routing updates across the month, and only 20,000
   prefixes had more than 40 updates for the month.  In other words,
   this is a very skewed distribution, with 10% of announced prefixes
   being responsible for 53% of all routing updates, and the busiest 1%
   of prefixes responsible for 24% of the routing updates for the month.

   The first step is to look at what kinds of updates one can expect
   from a single peer in BGP.  The following table classifies the types
   of BGP updates of interest here.












Li & Huston             Expires December 15, 2007              [Page 14]


Internet-Draft         BGP Stability Improvements              June 2007


   +------+------------------------------------------------------------+
   | Code | Description                                                |
   +------+------------------------------------------------------------+
   | AA+  | Announcement of an already announced prefix with a longer  |
   |      | AS path (update to longer path)                            |
   | AA-  | Announcement of an announced prefix with a shorter AS path |
   |      | (update to shorter path)                                   |
   | AA0  | Announcement of an announced prefix with a different path  |
   |      | of the same length (update to a different AS path of same  |
   |      | length)                                                    |
   | AA*  | Announcement of an announced prefix with the same path but |
   |      | different attributes (update of attributes)                |
   | AA   | Announcement of an announced prefix with no change in path |
   |      | or attributes (possible BGP error or data collection       |
   |      | error)                                                     |
   | WA+  | Announcement of a withdrawn prefix, with longer AS path    |
   | WA-  | Announcement of a withdrawn prefix, with shorter AS path   |
   | WA0  | Announcement of a withdrawn prefix, with different AS path |
   |      | of the same length                                         |
   | WA*  | Announcement of a withdrawn prefix with the same AS path,  |
   |      | but different attributes                                   |
   | WA   | Announcement of a withdrawn prefix with the same AS path   |
   |      | and same attributes                                        |
   | AW   | Withdrawal of an announced prefix                          |
   | WW   | Withdrawal of a withdrawn prefix (possible BGP error or a  |
   |      | data collection error)                                     |
   +------+------------------------------------------------------------+

   The following table classifies all the updates according to this
   taxonomy.

                            +------+---------+
                            | Code | Count   |
                            +------+---------+
                            | AA+  | 607,093 |
                            | AA-  | 555,609 |
                            | AA0  | 594,029 |
                            | AA*  | 782,404 |
                            | AA   | 195,707 |
                            | WA+  | 238,141 |
                            | WA-  | 190,328 |
                            | WA0  | 51,780  |
                            | WA*  | 30,797  |
                            | WA   | 77,440  |
                            | AW   | 627,538 |
                            | WW   | 0       |
                            +------+---------+




Li & Huston             Expires December 15, 2007              [Page 15]


Internet-Draft         BGP Stability Improvements              June 2007


   The interesting numbers here are those associated with BGP path
   hunting following a withdrawal, which are likely to be associated
   with the 607,093 AA+ updates and the 627,538 AW updates.  But the
   population of these update types alone is not enough on its own to
   justify a conclusion that over 1.2 million updates are associated
   with path hunting as a precursor to prefix withdrawal events.  The
   other salient factor that needs to be examined is the time
   distribution of updates, as the path hunting condition is associated
   with a rapid burst of updates.

   In looking at the time distribution of updates for the same prefix,
   there are some prominent peaks.  The operation of the 30 second MRAI
   timer appears to be very prominent, and 934,391 updates occurred
   precisely 30 seconds after the previous update for the same prefix,
   and a total 1,636,093 updates for the same prefix occurred within 31
   seconds of the previous update.  That's the equivalent to 39% of the
   entire BGP update activity for the month.  There are further local
   peaks at 30 second intervals at 60, 90 and all the way through to 240
   seconds.  Almost half of the BGP update activity occurs in a closely
   time-coupled manner.

   There are also smaller local peaks at 30 minute and 1 hour intervals.
   It is likely that these correspond to Route Flap Damping outcomes,
   where the damping period is typically one of these two values.
   Interestingly, there are also local peaks at 1, 2 and 3 day
   intervals.  This is unlikely to be an artifact of Route Flap Damping,
   and is more likely to be the outcome of some form of time-managed
   traffic system that performs routing changes on a regular 24 hour
   basis.

   Another way of looking at this time distribution of updates is to
   construct update "sequences" where a pair of updates is considered to
   be part of the same sequence if it refers to the same prefix and is
   received within 65 seconds (or slightly longer than two MRAI Timer
   intervals) of any previous update for the same prefix.  Only 28% of
   the updates for the month are not part of any sequence, while 26% of
   updates are part of a coupled update pair, and 46% of updates are
   part of sequences of 3 or more updates.  Interestingly enough,
   changing the timer as to what defines a sequence does not alter the
   profile greatly.  Extending the sequence timer to 125 seconds (or
   four MRAI Timer intervals) produces the outcome that 54% of updates
   are part of sequences of 3 or more updates, while reducing the
   sequence timer to 35 seconds produces the outcome that 36% of all
   updates are part of sequences of 3 or more updates.

   The approaches to flap damping to date have tended to look at flap
   damping as a persistent condition that lasts for hours or longer, and
   are an outcome of a strongly persistent announcement and withdrawal



Li & Huston             Expires December 15, 2007              [Page 16]


Internet-Draft         BGP Stability Improvements              June 2007


   characteristics that are assumed to be associated with some form of
   cyclical behavior of an underlying circuit.  This now appears to be
   well wide of the mark in terms of capturing the profile of what
   appears to be redundant BGP updates that reflect transitory routing
   states that are not in any converged state.

   The question this prompts is whether there is any value in looking at
   BGP update patterns in the micro view rather than the macro?  Can we
   identify, on the fly, update sequences that are highly likely to
   correlate to the BGP behavior of path hunting to a withdrawal and
   damp the intermediate path hunting states and simply propagate the
   resultant converged state?  This was part of the intent of the MRAI
   timer, but rather than simply apply a uniform damping interval to
   update propagation, can we devise a selective algorithm that attempt
   to pinpoint routing transitions that are strongly associated with BGP
   path hunting?

   There are a number of observations here that appear to point to some
   value in considering this approach:

   o  A BGP update generator may perform "update compression" by
      removing an already queued update when a further update that
      refers to the same prefix is generated.  Thus, when using the MRAI
      timer, only the most recent state for each prefix is passed to the
      BGP peers, and any intermediate state that occurred during the
      MRAI-imposed quiescent time is suppressed.

   o  Convergence in BGP appears to take longer than a single MRAI timer
      interval.  As noted in the sequencing profile for updates, some
      36% of all sequences take more than two MRAI timer intervals, or
      more than 60 seconds, to complete.

   o  Path hunting in BGP is commonly represented as an update sequence
      of the form {AA+}* AW, i.e. a sequence of lengthening AS path
      lengths followed by a withdrawal.

   o  Suppression of updates that lengthen the AS path length of a
      prefix does not implicitly create any risks of routing loop
      formation during the suppression period.  If the peer had already
      selected a different path as the best path, then the update to a
      longer path would have no impact on the previous selection.  If
      the peer was using the path advertised by this BGP speaker as its
      best path, then the suppression may cause the peer to continue to
      use this out-of-date path, but would not cause a path loop, as if
      the peer was listed on the longer path then the peer would already
      have a shorter path, and this update would not alter the peer's
      forwarding state.




Li & Huston             Expires December 15, 2007              [Page 17]


Internet-Draft         BGP Stability Improvements              June 2007


   So the profile of update sequences that appear to be effective
   targets for some form of local suppression are those that lengthen
   the AS path, and possibly also those updates that do not change the
   AS path Length, and are also part of a sequence of time-clustered
   updates for the same prefix.

   One approach to path length damping is to delay the processing an
   update if the update represents a lengthening of the AS path for an
   already announced prefix, selecting the AA+ updates.  Furthermore,
   the updates that are of interest here are those that occur during BGP
   path hunting, so the length of time of the suppression should not be
   minutes or hours, but slightly over one MRAI time interval, or 35
   seconds.  If no further updates for this prefix are generated in this
   suppression interval, then the update is processed at the end of the
   suppression time, otherwise the suppressed update is replaced by its
   successor update.

   How effective would this form of path length damping be in the
   context of the BGP Update data set we've been examining here?

   The algorithm used to implement this damping response is to suppress
   the processing all AA+ updates by up to 35 seconds.  If a further
   update for this prefix occurs during this suppression interval, then
   the suppressed update is ignored and the successor update is
   processed in stead.  If this update represents a further lengthening
   of the AS path then it, too, is suppressed for 35 seconds.  There are
   607,093 AA+ updates in this set of suppressed updates, or some 15% of
   the total update load for the month.  Path length damping would
   result in not processing 371,943 updates, or some 9.5% of the total
   update load.  This result also indicates that 61% of all AA+ updates
   are followed by a subsequent update for the same prefix within one
   MRAI time interval.

   Decreasing the sensitivity of the suppression parameters to a little
   over 2 MRAI intervals, or 65 seconds, increases the number of
   unprocessed updates to 418,805, or an additional 1%, so it appears
   that a damping sensitivity of a single MRAI interval represents a
   suitable point of compromise between maximizing the number of damped
   BGP events without making the BGP convergence process significantly
   slower.

   Of these damped updates, how many are actually path hunt updates?
   Some 96,135 of these damped updates are immediately followed by a
   withdrawal within the 35 second suppression period, and a further
   36,691 damped updates are followed by another suppressed AA+ update.

   This approach could be extended in a number of directions.  One
   approach is to regard any update that does not reduce the AS path



Li & Huston             Expires December 15, 2007              [Page 18]


Internet-Draft         BGP Stability Improvements              June 2007


   length or withdraw the prefix as being a candidate for damping.  In
   this case some 845,290 updates would be damped or 21% of all updates.
   Of these update just under one quarter, or 208,007 of these damped
   updates are followed by a withdrawal within one MRAI interval, and a
   further 474,234 of these damped updates are followed by an update
   with an AS path of the same or greater length.  The implication being
   that path length damping removes around one fifth of the total BGP
   update volume without reducing the time to convergence for route
   withdrawal, nor the time for propagation of more preferred routing
   paths.

   Using this latter form of path length damping, over the month the
   average prefix update rate per second falls from 1.60 prefix updates
   per second to 1.22 prefix updates per second, with 0.38 damped
   updates per second on average.

   The average peak update rate per hour falls from 355 to 290 prefix
   updates per second using path length damping, an average reduction of
   65 updates per second on the hourly peaks.

4.4.  Prototyping, Testing and Pilot Deployment

   After some analysis, it would then be helpful to actually implement
   the most useful possible solutions in a number of BGP
   implementations.  Since this is a change to BGP, extensive testing is
   going to be necessary and a period of pilot deployment will be
   required.  Implementers, testers, and operators could help accelerate
   this portion of the project.


5.  Acknowledgments

   This document builds on the work of [RFC2439] and we would like to
   thank Curtis Villamizar, Ravi Chandra, and Ramesh Govindan for their
   excellent work.

   We would like to thank Brian Carpenter and Robin Whittle for their
   helpful comments.


6.  IANA Considerations

   This memo includes no requests to IANA.


7.  Security Considerations

   This document raises no new security issues.



Li & Huston             Expires December 15, 2007              [Page 19]


Internet-Draft         BGP Stability Improvements              June 2007


8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

8.2.  Informative References

   [CN2000]   Varadhan, K., Govindan, R., and D. Estrin, "Persistent
              Route Oscillations in Inter-domain Routing", Computer
              Networks vol. 32, pp. 1-16, 2000, <http://
              lecs.cs.ucla.edu/~estrin/papers/rt-oscillations.ps>.

   [Harmful]  Bush, R., Griffin, T., and Z. Mao, "Route flap damping:
              harmful?", <http://www.nanog.org/mtg-0210/ppt/flap.pdf>.

   [I-D.iab-raws-report]
              Meyers, D., "Report from the IAB Workshop on Routing and
              Addressing", draft-iab-raws-report-02 (work in progress),
              April 2007.

   [Potaroo0607]
              Huston, G., "Damping BGP",
              <http://www.potaroo.net/ispcol/2007-06/dampbgp.html>.

   [RFC2439]  Villamizar, C., Chandra, R., and R. Govindan, "BGP Route
              Flap Damping", RFC 2439, November 1998.

   [RFC3345]  McPherson, D., Gill, V., Walton, D., and A. Retana,
              "Border Gateway Protocol (BGP) Persistent Route
              Oscillation Condition", RFC 3345, August 2002.

   [RIPE-378]
              Smith, P. and C. Panigl, "RIPE Routing Working Group
              Recommendations on Route-flap Damping",
              <http://www.ripe.net/ripe/docs/ripe-378.html>.

8.3.  Potential References

   [AggrWithdrawl]
              Raszuk, R., Patel, K., Appanna, C., and D. Ward, "BGP
              Aggregate Withdraw", draft-raszuk-aggr-withdraw-00 (work
              in progress), <http://tools.ietf.org/tools/rfcmarkup/
              rfcmarkup.cgi/doc/html/draft-raszuk-aggr-withdraw-00.txt>.



Li & Huston             Expires December 15, 2007              [Page 20]


Internet-Draft         BGP Stability Improvements              June 2007


   [FTCS-1999]
              Labovitz, C., Ahuja, A., and F. Jahanian, "Experimental
              Study of Internet Stability and Wide-Area Network
              Failures", FTCS 1999.

   [Infocom-1999]
              Labovitz, C., Malan, G., and F. Jahanian, "Origins of
              Internet Routing Instability", Infocom 1999.

   [Infocom-2001]
              Labovitz, C., Ahuja, A., Wattenhofer, R., and S.
              Venkatachary, "The Impact of Internet Policy and Topology
              on Delayed Routing Convergence", Infocom 2001.

   [Infocom-2005]
              Chandrashekar, J., Duan, Z., Zhang, Z., and J. Krasky,
              "Limiting path exploration in BGP", Infocom 2005.

   [PCC-2004]
              Duan, Z., Chandrashekar, J., Krasky, J., Xu, K., and Z.
              Zhang, "Damping BGP Route Flaps", IEEE International
              Conference on Performance, Computing, and
              Communications 2002.

   [R-BGP]    Kushman, N., Kandula, S., Katabi, D., and B. Maggs,
              "R-BGP: Staying Connected In a Connected World", 4th
              USENIX Symposium on Networked Systems Design &
              Implementation 2007,
              <http://nms.csail.mit.edu/~dina/pub/rbgp.pdf>.

   [Sigcomm-2000]
              Labovitz, C., Ahuja, A., Bose, A., and F. Jahanian,
              "Delayed Internet Routing Convergence", Sigcomm 2000.

   [Sigcomm-2002]
              Mao, Z., Govindan, R., Varghese, G., and R. Katz, "Route
              Flap Damping Exacerbates Internet Routing Convergence",
              Sigcomm 2002.

   [TON-1998]
              Labovitz, C., Malan, G., and F. Jahanian, "Internet
              Routing Instability", TON 1998.









Li & Huston             Expires December 15, 2007              [Page 21]


Internet-Draft         BGP Stability Improvements              June 2007


Authors' Addresses

   Tony Li
   Cisco Systems, Inc.
   170 W. Tasman Dr.
   San Jose, CA  95134
   US

   Phone: +1 408 853 1494
   Email: tli@cisco.com


   Geoff Huston
   Asia Pacific Network Information Centre

   Email: gih@apnic.net
   URI:   http://www.apnic.net


































Li & Huston             Expires December 15, 2007              [Page 22]


Internet-Draft         BGP Stability Improvements              June 2007


Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).





Li & Huston             Expires December 15, 2007              [Page 23]