Network Working Group                                Pedro Marques (Ed.)
Internet Draft                                          Juniper Networks
Expiration Date: March 2005
                                                           Robert Raszuk
Luyuan Fang                                                 Luca Martini
AT&T                                                         Keyur Patel
                                                            Jim Guichard
Ronald Bonica                                         Cisco Systems Inc.
MCI

                                                          September 2004


                   Constrained VPN route distribution


                  draft-ietf-l3vpn-rt-constrain-01.txt

Status of this Memo

   By submitting this Internet-Draft, we certify that any applicable
   patent or other IPR claims of which we are aware have been disclosed,
   or will be disclosed, and any of which we become aware will be
   disclosed, in accordance with RFC 3668.

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.









Marques, et al.                                                 [Page 1]


Internet Draft    draft-ietf-l3vpn-rt-constrain-01.txt    September 2004


Abstract

   This document defines MP-BGP procedures that allow BGP speakers to
   exchange Route Target reachability information. This information can
   be used to build a route distribution graph in order to limit the
   propagation of VPN NLRI (such as VPN-IPv4, VPN-IPv6 or L2-VPN NLRI)
   between different autonomous systems or distinct clusters of the same
   autonomous system.



Table of Contents

 1      Specification of Requirements  .............................   2
 2      Intellectual Property Statement  ...........................   3
 3      Introduction  ..............................................   3
 4      NLRI Distribution  .........................................   4
 4.1    Inter-AS VPN route distribution  ...........................   4
 4.2    Intra-AS VPN route distribution  ...........................   6
 5      Route Target membership NLRI advertisements  ...............   7
 6      Capability Advertisement  ..................................   8
 7      Operation  .................................................   8
 8      Deployment considerations  .................................   9
 9      Security considerations  ...................................  10
10      Acknowledgments  ...........................................  10
11      Normative References  ......................................  10
12      Informative References  ....................................  11
13      Author Information  ........................................  11





1. Specification of Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.















2. Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any assur-
   ances of licenses to be made available, or the result of an attempt
   made to obtain a general license or permission for the use of such
   proprietary rights by implementers or users of this specification can
   be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.



3. Introduction

   In BGP/MPLS IP VPNs, PE routers use Route Target (RT) extended commu-
   nities to control the distribution of routes into VRFs. Within a
   given iBGP mesh, PE routers need only to hold routes marked with
   Route Targets pertaining to VRFs that have local CE attachments.

   It is common, however, for an autonomous system to use route
   reflection [BGP-RR] in order to simplify the process of bringing up a
   new PE router in the network and to limit the size of the iBGP
   peering mesh.

   In such a scenario, as well as when VPNs may have members in more
   than one autonomous system, the number of routes carried by the
   inter-cluster or inter-AS distribution routers is an important
   consideration.

   In order to limit the VPN routing information that is maintained at a
   given RR, RFC2547bis [RFC2547bis] suggests, in section 4.3.3, the
   use of "Cooperative Route Filtering" [ORF] between route reflectors.
   This proposal extends the [RFC2547bis] ORF work to include support
   for multiple autonomous systems and asymmetric VPN topologies such as
   hub-and-spoke. While it would be possible to extend the encoding
   currently defined for extended-community ORF in order to achieve this
   purpose, BGP itself already has all the necessary machinery for
   dissemination of arbitrary information in a loop-free fashion, both
   within a single autonomous system and across multiple autonomous
   systems.

   This document builds on the model described in [RFC2547bis] and on
   the concept of cooperative route filtering by adding the ability to
   propagate Route Target membership information between iBGP meshes.

   By using MP-BGP UPDATE messages to propagate Route Target membership
   information, it is possible to reuse all of this BGP machinery,
   including route reflection, confederations and inter-AS information
   loop detection.

   Received Route Target membership information can then be used to
   restrict advertisement of VPN NLRI to peers that have advertised
   their respective Route Targets, effectively building a route distri-
   bution graph. In this model, VPN NLRI routing information flows in
   the inverse direction of Route Target membership information.

   This mechanism is applicable to any BGP NLRI that controls the dis-
   tribution of routing information based on Route Targets, such as BGP
   L2VPNs [L2VPN] and VPLS [VPLS].

   Throughout this document, the term NLRI, which originally expands to
   "Network Layer Reachability Information", is used to describe routing
   information carried via MP-BGP updates without any assumption of
   semantics.

   An NLRI consisting of <as#, route-target> will be referred to as RT
   membership information for the purpose of the explanation in this
   document.


4. NLRI Distribution

4.1. Inter-AS VPN route distribution

   In order to better understand the problem at hand, it is helpful to
   divide it into its inter-AS and intra-AS components.  Figure 1
   represents an arbitrary graph of autonomous systems (a through j)
   interconnected in an ad-hoc fashion.  The following discussion
   ignores the complexity of intra-AS route distribution.









                  +----------------------------------+
                  | +---+    +---+    +---+          |
                  | | a | -- | b | -- | c |          |
                  | +---+    +---+    +---+          |
                  |   |        |                     |
                  |   |        |                     |
                  | +---+    +---+    +---+    +---+ |
                  | | d | -- | e | -- | f | -- | j | |
                  | +---+    +---+    +---+    +---+ |
                  |        /            |            |
                  |       /             |            |
                  | +---+    +---+    +---+          |
                  | | g | -- | h | -- | i |          |
                  | +---+    +---+    +---+          |
                  +----------------------------------+
               Figure 1. Topology of autonomous systems.


   Let us consider the simple case of a VPN with CE attachments in AS a
   and AS i, using a single Route Target to control VPN route
   distribution. Ideally we would like to build a flooding graph for the
   respective VPN routes that would not include nodes (c, g, h, j).

   In order to achieve this we will rely on AS a and AS i generating an
   NLRI consisting of <as#, route-target> (RT membership information).
   Receipt of such an advertisement by one of the ASes in the network
   will signal the need to distribute VPN routes containing this Route
   Target community to the peer that advertised this route.

   Using RT membership information that includes both route-target and
   originator AS number allows BGP speakers to use standard path
   selection rules concerning as-path length (and other policy
   mechanisms) to prune duplicate paths in the RT membership
   information flooding graph, while maintaining the information
   required to reach all autonomous systems advertising the Route
   Target.

   In the example above, AS e needs to maintain a path to AS a in order
   to flood VPN routing information originating from AS i and vice
   versa. It SHOULD however prune less preferred paths such as the
   longer path to AS i with as-path (g h i).
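
   As a minimal sketch of the pruning described above, the following
   Python fragment keys candidate paths on <as#, route-target> and keeps
   the one with the shortest as-path. The names MembershipPath and
   select_best, the label "RT1", and the tie-break on neighbor name are
   illustrative assumptions and not part of this specification; a real
   speaker would apply the full [BGP-BASE] decision process.

```python
from collections import namedtuple

# Hypothetical record for one received RT membership advertisement.
MembershipPath = namedtuple(
    "MembershipPath", "origin_as route_target neighbor as_path")


def select_best(paths):
    """Keep one path per <origin_as, route_target> key.

    Preference is the shortest as-path; ties are broken on the neighbor
    name, which is an assumption made only to keep the sketch
    deterministic.
    """
    best = {}
    for p in paths:
        key = (p.origin_as, p.route_target)
        cur = best.get(key)
        if cur is None or (len(p.as_path), p.neighbor) < (
                len(cur.as_path), cur.neighbor):
            best[key] = p
    return best


# Paths seen by AS e in Figure 1 for a single Route Target "RT1".
received_at_e = [
    MembershipPath("i", "RT1", "f", ("f", "i")),
    MembershipPath("i", "RT1", "g", ("g", "h", "i")),
    MembershipPath("a", "RT1", "b", ("b", "a")),
    MembershipPath("a", "RT1", "d", ("d", "a")),
]
best_at_e = select_best(received_at_e)
```

   Because the key includes the origin AS, AS e retains one path
   towards AS a and one towards AS i, while the longer path through
   (g h i) loses the comparison.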

   Extending the example above to include AS j as a member of the VPN
   distribution graph would cause AS f to advertise 2 RT Membership NLRI
   to AS e, one containing origin AS i and one containing origin AS j.
   While advertising a single path (let us assume (f j) is selected)
   would be sufficient to guarantee that VPN information flows to all
   VPN member ASes, the information concerning the path (f i) is
   necessary to prune the arc (e g h i) from the route distribution
   graph.





   As with other approaches for building distribution graphs, the bene-
   fits of this mechanism are directly proportional to how "sparse" the
   VPN membership is. Standard RFC2547 inter-AS behavior can be seen as
   a dense-mode approach, to make the analogy with multicast routing
   protocols.


4.2. Intra-AS VPN route distribution

   As indicated above, the inter-AS VPN route distribution graph, for a
   given route-target, is constructed by creating a directed arc in the
   inverse direction of received Route Target membership UPDATEs
   containing an NLRI of the form <as#, route-target>.

   Inside the BGP topology of a given autonomous system, as far as
   external RT membership information is concerned (route-targets where
   the as# is not the local AS), it is easy to see that standard BGP
   route selection and advertisement rules [BGP-BASE] will allow a
   transit AS to create the necessary flooding state.

   Consider an IPv4 NLRI prefix, sourced by a single AS, which is
   distributed via BGP within a given transit AS. BGP protocol rules
   guarantee that a BGP speaker has a valid route that can be used for
   forwarding of data packets for that destination prefix, in the
   inverse path of received routing updates.

   By the same token, and given that an <as#, route-target> key provides
   uniqueness between several ASes that may be sourcing this
   route-target, BGP route selection and advertisement procedures
   guarantee that a valid VPN route distribution path exists to the
   origin of the Route Target membership information advertisement.

   Route Target membership information that is originated within the
   autonomous system, however, requires more careful examination.
   Several PE routers within a given autonomous system may source the
   same NLRI <as#, route-target>, thus default route advertisement rules
   are no longer sufficient to guarantee that within the given AS each
   node in the distribution graph has selected a feasible path to each
   of the PEs that import the given route-target.

   When processing RT membership NLRIs received from iBGP peers, it is
   necessary to consider all available iBGP paths for a given RT prefix
   when building the outbound route filter, and not just the best path.
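
   The requirement to consider every iBGP path can be sketched as
   follows. The function name build_outbound_filters, the peer names and
   the textual RT values are hypothetical; the only point illustrated is
   that the per-peer filter is built from all received paths rather than
   from the single best path per RT prefix.

```python
from collections import defaultdict


def build_outbound_filters(ibgp_paths):
    """Map each iBGP peer to the set of RT prefixes it has advertised.

    ibgp_paths is an iterable of (peer, rt_prefix) tuples, one entry per
    received path, including paths that lost best-path selection.
    """
    filters = defaultdict(set)
    for peer, rt_prefix in ibgp_paths:
        filters[peer].add(rt_prefix)
    return dict(filters)


# Two PEs source the same RT; both must appear in the resulting filters.
paths = [
    ("PE1", "65000:100"),
    ("PE2", "65000:100"),
    ("PE2", "65000:200"),
]
filters = build_outbound_filters(paths)
```

   Had only the best path for "65000:100" been used, one of the two PEs
   would have been left without VPN routes for that target.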

   In addition, when advertising Route Target membership information
   sourced by the local autonomous system to an iBGP peer, a BGP speaker
   shall modify its procedure to calculate the BGP attributes such that:





        -i. When advertising RT membership NLRI to a route-reflector
            client, the Originator attribute shall be set to the
            router-id of the advertiser and the Next-hop attribute shall
            be set to the local address for that session.

       -ii. When advertising an RT membership NLRI to a non-client peer,
            if the best path, as selected by the path selection
            procedure described in section 9.1 of [BGP-BASE], is a route
            received from a non-client peer, and there is an alternative
            path to the same destination from a client, the attributes
            of the client path are advertised to the peer.

   The first of these route advertisement rules is designed such that
   the originator of RT membership NLRI does not drop an RT membership
   NLRI which is reflected back to it, thus allowing the route reflector
   to use this RT membership NLRI in order to signal the client that it
   should distribute VPN routes with the specific target towards the
   reflector.

   The second rule ensures that any BGP speaker present in an iBGP
   mesh can signal the interest of its route reflection clients in
   receiving VPN routes for that target.
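
   The two advertisement rules might be sketched as below. This is an
   illustration under stated assumptions, not a definitive
   implementation: the Path record and both function names are
   hypothetical, and only the attributes mentioned in the rules are
   modeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Path:
    originator: str     # value carried in the Originator attribute
    next_hop: str       # value carried in the Next-hop attribute
    from_client: bool   # True if learned from a route-reflector client


def attributes_for_client(path, local_router_id, local_address):
    """Rule i: advertise to a client with our own router-id and address."""
    return replace(path, originator=local_router_id, next_hop=local_address)


def path_for_non_client(best, alternatives):
    """Rule ii: if the best path is from a non-client, prefer a client path."""
    if not best.from_client:
        for alt in alternatives:
            if alt.from_client:
                return alt
    return best
```

   With rule i the client does not see its own router-id in the
   Originator attribute, so it keeps the reflected route. With rule ii a
   non-client peer still learns that some client of this speaker is
   interested in the target.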

   These procedures assume that the autonomous system route reflection
   topology is configured such that IPv4 unicast routing would work
   correctly. For instance, route reflection clusters must be
   contiguous.

   An alternative solution to the procedure given above would have been
   to source different routes per PE, such as NLRI of the form
   <originator-id, route-target>, and aggregate them at the edge of the
   network. The solution adopted is considered to be advantageous over
   this alternative given that it requires less routing information
   within a given AS.


5. Route Target membership NLRI advertisements

   Route Target membership NLRI is advertised in BGP UPDATE messages
   using the MP_REACH_NLRI and MP_UNREACH_NLRI attributes [BGP-MP]. The
   <AFI, SAFI> value pair used to identify this NLRI is (AFI=1,
   SAFI=132).

   The Next Hop field of the MP_REACH_NLRI attribute shall be
   interpreted as an IPv4 address whenever the length of the Next Hop
   address is 4 octets, and as an IPv6 address whenever the length of
   the Next Hop address is 16 octets.
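
   A minimal sketch of this length-based interpretation, using the
   Python standard ipaddress module. The function name parse_next_hop
   and the choice to raise an error for other lengths are assumptions
   of the sketch.

```python
import ipaddress


def parse_next_hop(raw: bytes):
    """Interpret a Next Hop field by its length in octets."""
    if len(raw) == 4:
        return ipaddress.IPv4Address(raw)
    if len(raw) == 16:
        return ipaddress.IPv6Address(raw)
    # Other lengths are not covered by the text; rejecting them is an
    # assumption of this sketch.
    raise ValueError("unsupported Next Hop length: %d" % len(raw))
```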

   The NLRI field in the MP_REACH_NLRI and MP_UNREACH_NLRI is a prefix
   of 0 to 96 bits encoded as defined in section 4 of [BGP-MP].





   This prefix is structured as follows:


     +-------------------------------+
     | origin as        (4 octets)   |
     +-------------------------------+
     | route target     (8 octets)   |
     +                               +
     |                               |
     +-------------------------------+


   Except for the default route target, which is encoded as a 0 length
   prefix, the minimum prefix length is 32 bits, as the origin-as field
   cannot be interpreted as a prefix.

   Route targets can then be expressed as prefixes, where for instance,
   a prefix would encompass all route target extended communities
   assigned by a given Global Administrator [BGP-EXTCOMM].
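
   The encoding can be illustrated with the following sketch, which
   emits the one-octet length in bits followed by the minimum number of
   prefix octets. The function name encode_rt_membership and the example
   Route Target (a two-octet AS specific extended community for AS 65000
   with local value 100) are illustrative assumptions. Clearing the
   unused trailing bits is a choice made here for tidiness only.

```python
import struct


def encode_rt_membership(origin_as=0, route_target=b"", prefix_len=0):
    """Encode one RT membership NLRI as <length in bits, prefix>."""
    if prefix_len == 0:
        return b"\x00"  # default route target: zero-length prefix
    if not 32 <= prefix_len <= 96:
        raise ValueError("prefix length must be 0 or 32..96 bits")
    # 4-octet origin AS followed by the 8-octet route target.
    full = struct.pack("!I", origin_as) + route_target.ljust(8, b"\x00")
    if len(full) != 12:
        raise ValueError("route target must be at most 8 octets")
    n_octets = (prefix_len + 7) // 8  # minimum octets holding the prefix
    prefix = bytearray(full[:n_octets])
    spare_bits = n_octets * 8 - prefix_len
    if spare_bits:
        prefix[-1] &= (0xFF << spare_bits) & 0xFF  # clear trailing bits
    return bytes([prefix_len]) + bytes(prefix)


# Example value only: type 0x00, sub-type 0x02, AS 65000, local value 100.
example_rt = bytes.fromhex("0002fde800000064")
full_nlri = encode_rt_membership(65000, example_rt, 96)
# A 64-bit prefix covers every Route Target assigned by AS 65000.
admin_nlri = encode_rt_membership(65000, example_rt, 64)
default_nlri = encode_rt_membership()
```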

   The default route target can be used to indicate to a peer the
   willingness to receive all VPN route advertisements, as in the case
   of a route reflector speaking to one of its PE router clients.


6. Capability Advertisement

   A BGP speaker that wishes to exchange Route Target membership
   information must use the Multiprotocol Extensions Capability Code, as
   defined in [BGP-MP], to advertise the corresponding (AFI, SAFI) pair.
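
   For illustration, the capability for this (AFI, SAFI) pair could be
   built as in the sketch below. It follows the Multiprotocol Extensions
   capability layout of [BGP-MP] (capability code 1, a 4-octet value
   holding AFI, a reserved octet and SAFI). The function and constant
   names are hypothetical.

```python
import struct

MP_EXT_CAPABILITY_CODE = 1   # Multiprotocol Extensions capability
AFI_IPV4 = 1
SAFI_RT_MEMBERSHIP = 132


def mp_capability(afi, safi):
    """Return capability code, length and value for one (AFI, SAFI)."""
    value = struct.pack("!HBB", afi, 0, safi)  # AFI, reserved, SAFI
    return struct.pack("!BB", MP_EXT_CAPABILITY_CODE, len(value)) + value


rt_membership_capability = mp_capability(AFI_IPV4, SAFI_RT_MEMBERSHIP)
```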

   A BGP speaker MAY participate in the distribution of Route Target
   information while not using the learned information for purposes of
   VPN NLRI output route filtering, although this is discouraged.


7. Operation

   A VPN NLRI route should be advertised to a peer that participates in
   the exchange of Route Target membership information if that peer has
   advertised either the default Route Target membership NLRI or a Route
   Target membership NLRI containing any of the targets contained in the
   extended communities attribute of the VPN route in question.
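
   The advertisement test above might be sketched as follows. Peer
   memberships are modeled here as (prefix octets, prefix length in
   bits) pairs over the 8-octet route target portion only, with DEFAULT
   standing for the default Route Target membership NLRI. These names
   and this representation are assumptions of the sketch, and the origin
   AS portion of the NLRI is deliberately ignored.

```python
DEFAULT = None  # stands for the zero-length default membership NLRI


def covers(member, rt):
    """True if a membership entry covers the 8-octet route target rt."""
    if member is DEFAULT:
        return True
    prefix, bits = member
    value = int.from_bytes(rt, "big")
    want = int.from_bytes(prefix.ljust(8, b"\x00"), "big")
    shift = 64 - bits
    return (value >> shift) == (want >> shift)


def should_advertise(route_targets, peer_memberships):
    """True if any target on the VPN route is covered by the peer."""
    return any(covers(m, rt)
               for m in peer_memberships
               for rt in route_targets)


# Example values only: two Route Targets administered by AS 65000.
rt_100 = bytes.fromhex("0002fde800000064")
rt_200 = bytes.fromhex("0002fde8000000c8")
```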

   When a BGP speaker receives a BGP UPDATE that advertises or withdraws
   a given Route Target membership NLRI, it should examine the RIB-OUTs
   of VPN NLRIs and reevaluate the advertisement status of routes that
   match the Route Target in question.

   A BGP speaker should generate the minimum set of BGP VPN route
   updates necessary to transition between the previous and current
   state of the route distribution graph that is derived from Route Tar-
   get membership information.
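
   One way to picture the minimum set of updates is as a set difference
   between what was advertised to a peer before the membership change
   and what is eligible afterwards. The function name vpn_route_delta
   and the string route identifiers are hypothetical.

```python
def vpn_route_delta(previously_advertised, currently_eligible):
    """Return (to_announce, to_withdraw) for one peer.

    Both arguments are sets of VPN route identifiers. Routes present in
    both sets need no update at all.
    """
    to_announce = currently_eligible - previously_advertised
    to_withdraw = previously_advertised - currently_eligible
    return to_announce, to_withdraw


before = {"vpn-route-1", "vpn-route-2"}
after = {"vpn-route-2", "vpn-route-3"}
announce, withdraw = vpn_route_delta(before, after)
```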

   In order to avoid VPN route churn when a BGP session is established,
   implementations SHOULD generate an End-of-RIB marker, as defined in
   [BGP-GR], for the Route Target membership (AFI, SAFI), regardless of
   whether graceful-restart is enabled on the BGP session. This allows
   the receiver to know when it has received the full contents of the
   peer's membership information. The exchange of VPN NLRI should
   follow the receipt of the End-of-RIB markers.
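
   As a hedged illustration, [BGP-GR] describes the End-of-RIB marker
   for address families other than IPv4 unicast as an UPDATE message
   that carries only an MP_UNREACH_NLRI attribute with no withdrawn
   routes for that <AFI, SAFI>. The sketch below builds such a message;
   the function name end_of_rib is an assumption, and the attribute
   flags value (optional, non-transitive) should be checked against
   [BGP-MP].

```python
import struct

BGP_UPDATE = 2
ATTR_FLAG_OPTIONAL = 0x80
ATTR_MP_UNREACH_NLRI = 15


def end_of_rib(afi, safi):
    """Build an UPDATE holding only an empty MP_UNREACH_NLRI."""
    # Attribute: flags, type code, length (3), then AFI and SAFI.
    mp_unreach = struct.pack(
        "!BBBHB", ATTR_FLAG_OPTIONAL, ATTR_MP_UNREACH_NLRI, 3, afi, safi)
    body = (struct.pack("!H", 0)                  # withdrawn routes length
            + struct.pack("!H", len(mp_unreach))  # path attributes length
            + mp_unreach)
    # Header: 16-octet marker, 2-octet total length, 1-octet type.
    header = b"\xff" * 16 + struct.pack("!HB", 19 + len(body), BGP_UPDATE)
    return header + body


marker = end_of_rib(1, 132)
```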


8. Deployment considerations

   This mechanism reduces the scaling requirements that are imposed on
   route reflectors by limiting the number of VPN routes and events that
   a reflector has to process to the VPN routes used by its direct
   clients. By default, a reflector must scale in terms of the total
   number of VPN routes present on the network.

   This also means that it is now possible to reduce the load imposed
   on a given reflector by dividing the PE routers present in its
   cluster into a new set of clusters. This is a localized configuration
   change that need not affect any system outside this cluster.

   The effectiveness of RT-based filtering depends on how sparse the VPN
   membership is.

   For instance, in the inter-AS case, it is likely that a given VPN is
   connected to only a subset of all participating ASes. The only
   current mechanism to limit the scope of VPN route flooding is through
   manual filtering on the EBGP border routers. With the current
   proposal such filtering can be performed based on the dynamic Route
   Target membership information.

   In some inter-AS deployments not all RTs used for a given VPN have
   external significance. For example, a VPN can use a hub RT and a
   spoke RT internally to an autonomous system. The spoke RT does not
   have meaning outside this AS and so it may be stripped at an external
   border router. The same policy rules that result in extended
   community filtering can be applied to RT membership information in
   order to avoid advertising an RT membership NLRI for the spoke RT in
   the example above.
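
   The hub-and-spoke policy above could be sketched as a simple export
   filter at the border router. The function name exportable_memberships
   and the textual RT values are hypothetical and stand in for whatever
   policy language an implementation provides.

```python
def exportable_memberships(memberships, internal_only_rts):
    """Drop RT membership entries whose target has no external meaning.

    memberships is a list of (origin_as, route_target) pairs and
    internal_only_rts is the set of targets stripped at the border.
    """
    return [(origin_as, rt)
            for origin_as, rt in memberships
            if rt not in internal_only_rts]


local = [(65000, "65000:1"),   # hub RT, externally significant
         (65000, "65000:2")]   # spoke RT, internal to the AS
exported = exportable_memberships(local, {"65000:2"})
```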






   Throughout this document, we assume that autonomous systems agree on
   an RT assignment convention. RT translation at the external border
   router boundary is considered to be a local implementation decision,
   as it should not affect interoperability.


9. Security considerations

   This document does not alter the security properties of BGP-based
   VPNs.  However, it should be taken into consideration that output
   route filters built from RT membership information NLRI are not
   intended for security purposes. When exchanging routing information
   between separate administrative domains, it is a good practice to
   filter all incoming and outgoing NLRIs by some other means in
   addition to RT membership information.  Implementations SHOULD also
   provide means to filter RT membership information.


10. Acknowledgments

   This proposal is based on the extended community route filtering
   mechanism defined in [ORF].

   Ahmed Guetari was instrumental in defining requirements for this pro-
   posal.

   The authors would also like to thank Yakov Rekhter, Dan Tappan, Dave
   Ward, John Scudder, and Jerry Ash for their comments and suggestions.


11. Normative References

   [BGP-BASE] Y. Rekhter, T. Li, S. Hares, "A Border Gateway Protocol 4
        (BGP-4)", draft-ietf-idr-bgp4-20.txt, 03/03.

   [BGP-RR] Bates, Chandra, and Chen, "BGP Route Reflection: An
        alternative to full mesh IBGP", RFC 2796.

   [BGP-CAP] R. Chandra, J. Scudder, "Capabilities Advertisement with BGP-4",
        RFC2842.

   [BGP-MP] T. Bates, R. Chandra, D. Katz, Y. Rekhter, "Multiprotocol
        Extensions for BGP-4", RFC2858.

   [BGP-GR] S. Sangli, Y. Rekhter, R. Fernando, J. Scudder, E. Chen,
        "Graceful Restart Mechanism for BGP", draft-ietf-idr-restart-10.txt, 06/04.







12. Informative References

   [RFC2547bis] Rosen et al., "BGP/MPLS VPNs",
        draft-ietf-ppvpn-rfc2547bis-03.txt, 10/02.

   [ORF] E. Chen, Y. Rekhter, "Cooperative Route Filtering Capability for
        BGP-4", draft-ietf-idr-route-filter-09.txt, 08/03.

   [BGP-EXTCOMM] S. Sangli, D. Tappan, Y. Rekhter, "BGP Extended Communities
        Attribute", draft-ietf-idr-bgp-ext-communities-05.txt, 05/02.

   [L2VPN] K. Kompella et al., "Layer 2 VPNs Over Tunnels",
        draft-kompella-ppvpn-l2vpn-02.txt, 11/01.

   [VPLS] K. Kompella (Ed.), "Virtual Private LAN Service",
        draft-kompella-ppvpn-vpls-01.txt, 11/02.


13. Author Information

Ronald P. Bonica
MCI
22001 Loudoun County Pkwy
Ashburn, Virginia, 20147
Phone: 703 886 1681
Email: ronald.p.bonica@mci.com


Luyuan Fang
AT&T
200 Laurel Avenue, Room C2-3B35
Middletown, NJ 07748
Phone: 732-420-1921
Email: luyuanfang@att.com


Luca Martini
Cisco Systems, Inc.
9155 East Nichols Avenue, Suite 400
Englewood, CO, 80112
Email: lmartini@cisco.com












Pedro Marques
Juniper Networks
1194 N. Mathilda Ave.
Sunnyvale, CA 94089
Email: roque@juniper.net


Robert Raszuk
Cisco Systems, Inc.
170 West Tasman Dr
San Jose, CA 95134
Email: rraszuk@cisco.com


Keyur Patel
Cisco Systems, Inc.
170 West Tasman Dr
San Jose, CA 95134
Email: keyupate@cisco.com


Jim Guichard
Cisco Systems, Inc.
300 Beaver Brook Road
Boxborough, MA, 01719
Email: jguichar@cisco.com
























