Inter-Domain Routing                                      I. van Beijnum
Internet-Draft                                            IMDEA Networks
Expires: September 10, 2009                                    R. Winter
                                                         NEC Labs Europe
                                                           March 9, 2009


                     A BGP Inter-AS Cost Attribute
                      draft-van-beijnum-idr-iac-02

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 10, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   Although BGP implementations have extensive path selection
   algorithms, in practice operators have trouble performing



van Beijnum & Winter   Expires September 10, 2009               [Page 1]


Internet-Draft              BGP Inter-AS Cost                 March 2009


   satisfactory traffic engineering of incoming traffic based on BGP
   attributes that are taken into account in the path selection
   algorithm alone.  For this reason, many ASes deaggregate their
   address range(s) into smaller blocks and announce these blocks
   differently to different neighboring ASes in order to arrive at the
   desired traffic flow.  This practice contributes to the growth of the
   global routing table, which drives up capital expenditures for
   networks engaging in inter-domain routing.  This memo introduces a
   new inter-domain metric that supports finer-grained traffic
   engineering than current BGP attributes.


1.  Introduction

   An origin AS today has no appropriate means to express a preference
   for a certain path leading towards it, as the BGP decision process is
   not designed to take such a preference into account.  The only two
   means an origin AS has to influence the way traffic enters its
   network are prefix disaggregation, which results in global routing
   table growth, and AS path prepending, which is a very imprecise
   method.  It is easy to see how comparing AS path lengths is
   problematic in today's flat AS hierarchy.  Assume 10 tier-1 ISPs that
   can reach all destinations connected to the internet through peering,
   and assume that the local AS buys transit service from two tier-1
   ISPs.  The traffic to the customers of those ISPs will normally flow
   through the respective ISP.  However, for all destinations reachable
   over the 8 other tier-1s, the AS paths will have the same length over
   both transit ISPs.  This means that prepending the AS path towards
   one ISP has a very dramatic effect: as much as 80% of all traffic may
   subsequently flow over the non-prepended ISP.  A similar situation
   can occur in more complex types of connectivity.  With a finer-
   grained value that is communicated across ASes this problem would be
   reduced.

   This memo proposes such a finer-grained inter-AS metric: the inter-AS
   cost (IAC).  With this metric, it is possible for destinations of
   traffic to make precise adjustments to the metrics seen by the
   sources of traffic and thus make it possible to arrive at more
   favorable load sharing ratios between multiple links to different
   ASes without having to resort to the advertisement of more specific
   prefixes.

   In the past, efforts somewhat similar to this have been undertaken.
   In 1995, [I-D.antonov-bgp-metrics] proposed new per-hop BGP metrics.
   However, this proposal suffered from high complexity and a resulting
   risk of unforeseen consequences.  A year later, [I-D.chen-bgp-dpa]
   proposed a new inter-AS metric for the purpose of allowing symmetric
   routing and load sharing.  This proposal wasn't fleshed out in much
   detail.

   Neither proposal specifically addressed the issue of granularity in
   an inter-AS metric.

   Note that the definitions of IAC and IAClocal have been fundamentally
   changed since version -01 of this draft.  See Section 4 for more
   information about the changes.


2.  IAC and IAClocal

   The new metric is named Inter-AS Cost (IAC).  It is added to the
   prefix announcement by the origin AS as an optional transitive
   attribute.  The content of the IAC is an 8-bit signed value that
   represents the relative cost or preference towards the source of the
   associated prefix compared to other paths for reaching the same
   prefix.  However, to keep small changes in IAC from having a very
   large effect, as prepending the AS path by one AS hop does today, a
   randomization component is introduced.  The IAC, the randomization
   component and optionally a local cost towards the next hop together
   make up the IAClocal value, which is used to compare prefixes.

   The randomization component is computed by XORing all the octets in
   the AS numbers of the source of the announcement, the next hop AS and
   the local AS.  The resulting 8-bit value R is subsequently
   interpreted as a signed value with possible values -128 to 127.  The
   randomization component R makes sure that even when no policy is
   applied, destinations with the same properties will be preferred
   through different next hop addresses, and that different ASes make
   this selection differently, so there is a (very roughly) equal
   distribution of traffic over different links, both for the sending
   and the receiving ASes.
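
   As an illustration only, the computation of R could be sketched in
   Python as follows.  The function name and the choice to expand every
   AS number to four octets are assumptions of this sketch, not part of
   the specification.  Because leading zero octets do not change an XOR
   result, a 2-octet AS number yields the same R either way.  The signed
   interpretation is assumed to be two's complement.

```python
def randomization_component(origin_as: int, next_hop_as: int, local_as: int) -> int:
    """XOR all octets of the three AS numbers and return R in -128..127."""
    folded = 0
    for asn in (origin_as, next_hop_as, local_as):
        # Assumption: each AS number is taken as 4 octets in network byte order.
        for octet in asn.to_bytes(4, "big"):
            folded ^= octet
    # Reinterpret the unsigned 8-bit result as a two's complement signed value.
    return folded - 256 if folded >= 128 else folded


# AS numbers from the documentation range, used purely as sample input.
print(randomization_component(64496, 64497, 64498))  # prints 8
```

   Swapping any two of the three AS numbers gives the same R, since XOR
   is commutative.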

   The optional local cost (LC) is an integer that can take values
   between -256 and 255.  Its purpose is to give some control to the
   local AS in case the comparison of the computed IAClocal values is
   unfavorable for the local AS.  Note that the local AS also has
   another means besides the IAClocal to accomplish outbound traffic
   engineering: the LOCAL_PREF attribute.

   IAClocal is computed as follows:

   IAClocal = IAC * 2 + R (+ LC)

   Hence, IAClocal can be an integer value between -640 and 636.  Any
   IAClocal values outside this range MUST lead to the presence of the
   IAClocal attribute being ignored.  The IAClocal is stored as a 16-bit
   signed value in network byte order in the IAC BGP path attribute.
   The new IAC path attribute is an optional transitive attribute that
   can take two forms.  Over eBGP, the attribute contains only the IAC.
   When communicated through iBGP, the attribute contains both the IAC
   and the IAClocal, in that order.  When a router generates a route
   locally for announcement over BGP, an IAC of 0 MAY be included.
   However, it is recommended that an IAC attribute be generated only
   when an IAC is specified in the configuration.  A missing IAC is
   semantically distinct from an IAC of 0, so configuring a 0 IAC MUST
   result in the inclusion of the IAC attribute.
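
   The two forms of the attribute value might be encoded along the
   following lines.  This is a sketch under stated assumptions: the
   function names are hypothetical, the IAC is assumed to occupy exactly
   one octet, and the attribute header (flags, type code, length) is
   left out because no type code has been allocated.  How a receiver
   should treat a value of any other length is not specified in this
   memo; raising an error is merely this sketch's choice.

```python
import struct
from typing import Optional, Tuple


def encode_iac_value(iac: int, iac_local: Optional[int] = None) -> bytes:
    """Build the attribute value: IAC only (eBGP) or IAC + IAClocal (iBGP)."""
    if iac_local is None:
        # eBGP form: a single signed octet.
        return struct.pack("!b", iac)
    # iBGP form: signed octet, then 16-bit signed value in network byte order.
    return struct.pack("!bh", iac, iac_local)


def decode_iac_value(value: bytes) -> Tuple[int, Optional[int]]:
    """Return (IAC, IAClocal); IAClocal is None for the eBGP form."""
    if len(value) == 1:
        return struct.unpack("!b", value)[0], None
    if len(value) == 3:
        iac, iac_local = struct.unpack("!bh", value)
        return iac, iac_local
    # Assumption of this sketch: other lengths are rejected outright.
    raise ValueError("unexpected IAC attribute value length: %d" % len(value))


print(encode_iac_value(43).hex())       # eBGP form
print(encode_iac_value(43, 94).hex())   # iBGP form
```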

   The IAClocal is computed by the router receiving a prefix containing
   the IAC attribute over eBGP, or when sourcing a prefix advertisement.
   If no IAC attribute is received over eBGP, a router MUST NOT create
   one, and no IAClocal is computed.  The IAClocal MUST NOT be computed
   when no IAC attribute is present.
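
   A minimal Python sketch of the IAClocal computation described in this
   section is given below.  The names compute_iac_local and
   usable_iac_local are hypothetical, and rejecting out-of-range inputs
   with an exception is an assumption of the sketch rather than required
   behavior.

```python
from typing import Optional

IAC_LOCAL_MIN = -640   # -128 * 2 + -128 + -256
IAC_LOCAL_MAX = 636    #  127 * 2 +  127 +  255


def compute_iac_local(iac: Optional[int], r: int, lc: int = 0) -> Optional[int]:
    """Return IAClocal = IAC * 2 + R (+ LC), or None when no IAC is present."""
    if iac is None:
        # No IAC attribute was received, so no IAClocal is computed.
        return None
    if not -128 <= iac <= 127:
        raise ValueError("IAC must fit in a signed 8-bit value")
    if not -128 <= r <= 127:
        raise ValueError("R must fit in a signed 8-bit value")
    if not -256 <= lc <= 255:
        raise ValueError("LC must be between -256 and 255")
    return iac * 2 + r + lc


def usable_iac_local(value: int) -> bool:
    """True when a received IAClocal lies inside the valid range."""
    return IAC_LOCAL_MIN <= value <= IAC_LOCAL_MAX


print(compute_iac_local(43, 8))        # prints 94
print(usable_iac_local(700))           # prints False
```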

   In the BGP route selection algorithm, the IAClocal is compared
   immediately following the comparison of the IGP cost.  As such, the
   IAClocal is only considered for routes that are equal in LOCAL_PREF,
   AS_PATH length, possibly MED, whether they were learned over eBGP or
   iBGP, and more.  For these routes, without IAClocal, route
   selection would come down to the last two tie breaking steps.  In
   addition, the IAClocal is only considered when all the routes under
   consideration at this point in the selection process contain the IAC
   attribute holding an IAClocal value.  If the IAClocal is considered,
   the route with the highest IAClocal is selected.  If there are
   multiple routes that share the highest IAClocal, the remaining tie
   breaking rules are executed over the routes sharing the highest
   IAClocal.
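
   The IAClocal comparison step could be sketched as follows.  The
   Route class and the function name are hypothetical, and the input is
   assumed to be the set of routes still tied after the IGP cost
   comparison.  The function returns the routes that remain candidates
   for the later tie breaking rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    name: str
    iac_local: Optional[int] = None  # None: no IAClocal available


def iac_local_step(candidates: List[Route]) -> List[Route]:
    """Apply the IAClocal comparison to routes tied up to the IGP cost."""
    # The step is skipped unless every remaining route carries an IAClocal.
    if not candidates or any(r.iac_local is None for r in candidates):
        return list(candidates)
    best = max(r.iac_local for r in candidates)
    # Routes sharing the highest IAClocal go on to the remaining tie breakers.
    return [r for r in candidates if r.iac_local == best]


routes = [Route("via-R1", 10), Route("via-R4", 94), Route("via-R5", 94)]
print([r.name for r in iac_local_step(routes)])  # prints ['via-R4', 'via-R5']
```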

   WARNING:

   In iBGP, there is no loop detection.  As such, loops may occur when
   the tie breaking rules aren't implemented identically by all iBGP
   routers.  Consider the following topology:

   R1 --- R2 --- R3 --- R4

   If R1 and R4 have external routes towards a destination, and the IGP
   costs that R2 and R3 see over both R1 and R4 are identical, it would
   be possible for R2 to prefer the path over R4 because of the
   IAClocal, but if R3 doesn't implement the IAC attribute, it may
   prefer the path over R1 because of the existing tie breaking rules.

   This situation may occur when not all routers in an AS support the
   IAC attribute, and next hop addresses for eBGP routes are
   redistributed in the IGP using a metric that doesn't take the
   interior hops into account, such as the OSPF external type 2 metric.
   For this reason, operators MUST avoid redistributing connected
   interfaces as E2 in OSPF (or similar in other IGPs) if there is a
   mixed IAC-capable and non-IAC-capable environment in the AS.


3.  Usage guidelines

   If the distribution of AS path lengths between two or more links
   towards the rest of the internet is equal, then the randomization
   factor should make the traffic distribution between different links
   very roughly equal.  Suppose there are two links, and a traffic
   distribution of 1 : 2 (33% vs 67%) is desired.  This means that for
   67 - 33 = 34% of the ASes the route selection must be pushed towards
   the other link using the IAClocal.  That means a difference of 127 *
   34% = 43 in IAC between the route announced over the first link and
   the same route announced over the second link.  This can be
   accomplished with an IAC of +43 on the first and 0 on the second link
   or -43 on the second link and 0 on the first, or any other
   combination of IACs with a difference of 43.
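
   The arithmetic in this example can be reproduced with a few lines of
   Python.  The function name and the use of whole percentages as input
   are assumptions of this sketch; the factor 127 is taken from the
   text above.  The result is only a starting point for the adjustments
   discussed next.

```python
def iac_difference(share_a_pct: int, share_b_pct: int) -> int:
    """Estimate the IAC difference for a desired two-link traffic split."""
    # Share of ASes whose selection must be pushed to the other link.
    shift_pct = abs(share_a_pct - share_b_pct)
    # Scale by 127, as in the worked example, and round to an integer.
    return round(127 * shift_pct / 100)


print(iac_difference(33, 67))  # prints 43
```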

   However, in practice the new traffic distribution will probably not
   be immediately equal to what's desired, so additional adjustments
   will likely be necessary.  Those can be based on the difference
   between the observed traffic distribution and the desired traffic
   distribution.  So if the difference is 10%, the difference in IACs
   can be increased or decreased by 10%.  Other heuristics may prove
   useful in practice.


4.  Changes

   In the previous versions of this draft, the IAC replaced the AS path
   length in the path selection algorithm.  We changed this for two
   reasons.  The first is that the differences in path selection between
   routers that do and routers that don't support the IAC would be too
   large in this situation, making for a very challenging deployment
   scenario.  The second reason is that we believe that there is no
   real reason for intermediate ASes to update the IAC.  Intermediate
   ASes are almost always ISPs, which as a rule don't have a need for
   additional mechanisms to balance traffic between outgoing paths
   towards the same destination.  By having the IAC be considered after
   the IGP cost, existing mechanisms that ISPs use to influence the BGP
   traffic flow, such as manipulating MEDs and IGP costs, are
   maintained.


5.  IANA considerations

   IANA is requested to allocate a BGP optional transitive attribute
   type code.


6.  Security considerations

   As the IAClocal is compared so late in the BGP route selection
   process, there is little chance of the presence of the IAC being a
   security risk, other than the potential for iBGP loops as outlined
   earlier.

   It is highly recommended that implementers include a mechanism to
   remove the IAC attribute from incoming or outgoing BGP updates.  This
   mechanism MUST be disabled by default.


7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

7.2.  Informational References

   [I-D.chen-bgp-dpa]
              Chen, E. and T. Bates, "Destination Preference Attribute
              for BGP", draft-ietf-idr-bgp-dpa-05 (work in progress),
              September 1996.

   [I-D.antonov-bgp-metrics]
              Antonov, V., "BGP AS Path Metrics",
              draft-ietf-idr-bgp-metrics-00 (work in progress),
              March 1995.


Appendix A.  Document and discussion information

   The latest version of this document will always be available at
   http://www.muada.com/drafts/.  Please direct questions and comments
   to the idr or grow mailing lists or directly to the authors.


Appendix B.  Acknowledgement

   Rolf Winter and Iljitsch van Beijnum are partly funded by Trilogy, a
   research project supported by the European Commission under its
   Seventh Framework Program.


Authors' Addresses

   Iljitsch van Beijnum
   IMDEA Networks
   Avda. del Mar Mediterraneo, 22
   Leganes, Madrid  28918
   Spain

   Email: iljitsch@muada.com


   Rolf Winter
   NEC Labs Europe
   Kurfuersten-Anlage 36
   Heidelberg  69115
   Germany

   Email: rolf.winter@nw.neclab.eu

























