BESS Working Group                                        R. Sharma, Ed.
Internet-Draft                                          A. Banerjee, Ed.
Intended status: Standards Track                              A. Sajassi
Expires: December 27, 2018                                  L. Krattiger
                                                             R. Sivaramu
                                                           Cisco Systems
                                                           June 25, 2018


           Multi-site EVPN based VXLAN using Border Gateways
                  draft-sharma-bess-multi-site-evpn-00

Abstract

   This document describes the procedures for interconnecting two or
   more BGP based Ethernet VPN (EVPN) sites in a scalable fashion over
   an IP-only network.  The motivation is to support extension of EVPN
   sites without having to rely on typical Data Center Interconnect
   (DCI) technologies like MPLS/VPLS.  The requirements for such a
   deployment are very similar to the ones specified in RFC 7209 --
   "Requirements for Ethernet VPN (EVPN)".

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 27, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents



Sharma, et al.          Expires December 27, 2018               [Page 1]


Internet-Draft               Multi-site EVPN                   June 2018


   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   4
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   4
   3.  Multi-Site EVPN Overview  . . . . . . . . . . . . . . . . . .   4
     3.1.  MS-EVPN Interconnect Requirements . . . . . . . . . . . .   4
     3.2.  MS-EVPN Interconnect concept and framework  . . . . . . .   5
   4.  Multi-site EVPN Interconnect Procedures . . . . . . . . . . .   7
     4.1.  Border Gateway Discovery  . . . . . . . . . . . . . . . .   7
     4.2.  Border Gateway Provisioning . . . . . . . . . . . . . . .   8
       4.2.1.  Border Gateway Designated Forwarder Election  . . . .   9
       4.2.2.  Anycast Border Gateway  . . . . . . . . . . . . . . .   9
       4.2.3.  Multi-path Border Gateway . . . . . . . . . . . . . .  10
     4.3.  EVPN route processing at Border Gateway . . . . . . . . .  10
     4.4.  Multi-Destination tree between Border Gateways  . . . . .  12
     4.5.  Inter-site Unicast traffic  . . . . . . . . . . . . . . .  12
     4.6.  Inter-site Multi-destination traffic  . . . . . . . . . .  13
     4.7.  Host Mobility . . . . . . . . . . . . . . . . . . . . . .  13
   5.  Convergence . . . . . . . . . . . . . . . . . . . . . . . . .  13
     5.1.  Fabric to Border Gateway Failure  . . . . . . . . . . . .  13
     5.2.  Border Gateway to Border Gateway Failures . . . . . . . .  13
   6.  Interoperability  . . . . . . . . . . . . . . . . . . . . . .  13
   7.  Isolation of Fault Domains  . . . . . . . . . . . . . . . . .  14
   8.  MVPN with Multi-site EVPN . . . . . . . . . . . . . . . . . .  14
     8.1.  Inter-Site MI-PMSI  . . . . . . . . . . . . . . . . . . .  14
     8.2.  Stitching of customer multicast trees across sites  . . .  15
     8.3.  RP placement across sites . . . . . . . . . . . . . . . .  15
     8.4.  Inter-Site S-PMSI . . . . . . . . . . . . . . . . . . . .  15
   9.  Observations with Multi-site EVPN . . . . . . . . . . . . . .  15
   10. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  16
   11. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  16
   12. Security Considerations . . . . . . . . . . . . . . . . . . .  16
   13. References  . . . . . . . . . . . . . . . . . . . . . . . . .  16
     13.1.  Normative References . . . . . . . . . . . . . . . . . .  16
     13.2.  Informative References . . . . . . . . . . . . . . . . .  16
   Appendix A.  Additional Stuff . . . . . . . . . . . . . . . . . .  17
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  17

1.  Introduction

   BGP based Ethernet VPNs (EVPNs) are being used to support various VPN
   topologies with the motivation and requirements being discussed in
   RFC7209 [RFC7209].  EVPN has been used to provide a Network
   Virtualization Overlay (NVO) solution with a variety of tunnel
   encapsulation options in RFC8365 [RFC8365] for the Data center
   interconnect (DCI) at the WAN Edge.  Procedures for IP and MPLS hand-
   off at site boundaries are additionally discussed in [DCI-OVERLAY].

   In current EVPN deployments, there is a need to segment the EVPN
   domains within a Data Center (DC) primarily due to the service
   architecture and the scaling requirements around it.  The number of
   routes, tunnel end-points, and next-hops needed in the DC are
   sometimes larger than the capability of the hardware elements that
   are being deployed.  Network operators would like to inter-connect
   these domains without using traditional DCI technologies.  In
   essence, they want smaller multi-site EVPN domains with an IP
   backbone.  Additionally, they would like to have an Anycast model for
   the nodes at the gateways.  This relieves the hardware of having to
   support multi-path for overlay reachability.

   Network operators today are using the Virtual Network Identifier
   (VNI) to designate a service.  They would like to have this service
   available to a smaller set of nodes within the DC for administrative
   reasons; in essence they want to break up the EVPN domain into
   multiple smaller sites.  An advantage of a smaller footprint for
   these EVPN sites is that fault domains are constrained.  It also
   allows for re-use of VNI space across sites.

   In a traditional leaf-spine architecture, it is conceivable that the
   network operator may decide to support both the Route-Reflector and
   Gateway functionality on the spine nodes.  In such a deployment
   model, it is necessary to have a site identifier marked with each
   domain, such that route import and export rules can work effectively.

   In this document we focus primarily on the VXLAN encapsulation for
   EVPN deployments, with the underlay providing only IP connectivity.
   We describe in detail the IP/VXLAN hand-off mechanisms to
   interconnect these smaller sites within the data center itself, and
   refer to this deployment model as multi-site EVPN (MS-EVPN).  The
   procedures described here go into substantial detail regarding
   interconnecting Layer-2 (L2) and Layer-3 (L3) networks, for unicast
   and multicast domains across MS-EVPNs.  In this specification, we
   also define the use of the Type 5 Ethernet Segment Identifier (ESI)
   (Section 5 of RFC7432 [RFC7432]) between multiple sites using the
   Anycast routing model.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Terminology

   o  Border Gateway (BG): This is the node that interacts with nodes
      that are internal to a site and external to it.  It is responsible
      for functionality related to traffic entering and exiting a site.

   o  Anycast Border Gateway: A virtual set of shared BGs acting as
      multiple entry-exit points for a single site.

   o  Multipath Border Gateway: A virtual set of unique BGs acting as
      multiple entry-exit points for a single site.

   o  RT-X: Route Type X as defined for various EVPN route types.

3.  Multi-Site EVPN Overview

   In this section we describe the motivation, requirements, and
   framework for the Multi-Site EVPN (MS-EVPN) functionality.

3.1.  MS-EVPN Interconnect Requirements

   a.  Scalability: Multi-Site EVPN (MS-EVPN) should be able to
       interconnect multiple sites, allowing sites to be added or
       removed and the capacity of existing ones to be modified
       seamlessly.

   b.  Multi-Destination traffic over unicast-only cloud: MS-EVPN
       mechanisms should provide an efficient forwarding mechanism for
       multi-destination frames by using existing network elements as-
       is.  A large flat fabric rules out the option of ingress
       replication, as the number of replications becomes practically
       unachievable due to the internal hardware bandwidth needed.

   c.  Maintain Site-specific Administrative control: MS-EVPN should be
       able to interconnect fabrics from different Administrative
       domains.  The solution should allow for different sites to have
       different VLAN-VNI mappings, use different underlay routing
       protocols, and/or have different PIM-SM group ranges.

   d.  Isolate fault domains: MS-EVPN technology hand-off should have
       the capability to isolate traffic across site boundaries and
       prevent defects from percolating from one site to another.  As an
       example, a
       broadcast storm in a site should not propagate to other sites.

3.2.  MS-EVPN Interconnect concept and framework

   EVPN with IP-only interconnect is conceptualized as multiple site-
   local EVPN control planes and IP forwarding domains interconnected
   via a single common EVPN control and IP forwarding domain.  Every
   node is identified with a unique site-scope identifier.  A site-local
   EVPN domain consists of EVPN nodes with the same site identifier.

   Border Gateways (BGs) are explicitly part of a site-specific EVPN
   domain, and implicitly part of a common interconnect EVPN domain with
   BGs from other sites.  Although a BG has only a single explicit site-
   id (that of the site it is a member of, see Section 4.1), it can be
   considered to also have a second implicit site-id, that of the
   interconnect-domain which has membership of all the BGs from all
   sites that are being interconnected.  BGs discover each other through
   EVPN RT-1 A-D routes and act as both control and forwarding plane
   gateways across sites.  As a result, site-local nodes see all other
   sites as reachable only via their own site's BGs.

   We describe the MS-EVPN deployment model using the topology as shown
   in Figure 1.  In the topology there are 3 sites, Site A, Site B, and
   Site C that are inter-connected using IP.  This entire topology is
   deemed to be part of the same Data Center.  In most deployments these
   sites can be thought of as pods, which may span a rack, a row, or
   multiple rows in the data center, depending on the size of domain
   desired for scale and fault and/or administrative isolation.

   In this topology, site-local nodes are connected to each other by
   iBGP EVPN peering and BGs are connected by eBGP Multi-hop EVPN
   peering via the inter-site cloud.  We explicitly spell this out to
   ensure that we can re-use BGP semantics of route announcement
   between and across the sites.  Other BGP mechanisms to instantiate
   this will be discussed in a separate document.  This implies that
   each domain/site has its own AS number.  In the topology, only two
   Border Gateways per site are shown; this is for ease of illustration
   and explanation.  The technology poses no such limitation.  As
   mentioned earlier, a site-specific EVPN domain consists of only the
   site-local nodes in that site.  A BG is logically partitioned into a
   site-specific EVPN domain towards the site and a common EVPN domain
   towards other sites.  This allows BGs to act as control and
   forwarding plane gateways for forwarding traffic across sites.

   EVPN nodes within a site will discover each other via regular EVPN
   procedures and build site-local bidirectional VXLAN tunnels and
   multi-destination trees from leaves to BGs.  BGs will discover each
   other by RT-1 routes with unique site-identifiers and build inter-
   site bi-directional VXLAN tunnels and multi-destination trees between
   them.  We thus build an end-to-end bidirectional forwarding path
   across all sites by stitching (and not by stretching end-to-end)
   site-local VXLAN tunnels with inter-site VXLAN tunnels.  In essence,
   an MS-EVPN fabric is built in complete downstream and modular
   fashion.

   ____________________________
   | ooo Encapsulation tunnel |
   | X X X  Leaf-spine fabric |
   |__________________________|


    Site A (EVPN site A)              Site B (EVPN site B)
    ___________________________      ____________________________
   |      X X X X X X X X     |      |      X X X X X X X X     |
   |         X X X X          |      |         X X X X          |
   |        o       o         |      |        o       o         |
   |BG-1 Site A    BG-2 Site A|      |BG-1 Site B    BG-2 Site B|
    ___________________________      ____________________________
           o           o                o               o
            o           o              o               o
             o           o            o               o
              o           o          o               o
          _______________________________________________
          |                                             |
          |                                             |
          |        Inter-site common EVPN site          |
          |                                             |
          |                                             |
          _______________________________________________
                        o                   o
                         o                 o
                          o               o
                           o             o
                      ___________________________
                      | BG-1 Site C    BG-2 Site C|
                      |         X X X X           |
                      |      X X X X X X X X      |
                      _____________________________
                       Site C (EVPN site C)

                                 Figure 1

   Site-local tenant domains (for example, bridging, flood, routing, and
   multicast) are interconnected only via BGs with site-remote tenant
   domains (bridging, flood, routing, and multicast respectively) from
   other sites.  The BGs stitch such tenant domains (bridging, flood,
   routing, and multicast) in complete downstream fashion using EVPN
   route advertisements.  Such interconnects do not assume uniform
   mappings of mac-vrf (or IP-VRF) to VNI across sites.

4.  Multi-site EVPN Interconnect Procedures

   In this section we describe the new functionalities in the Border
   Gateway (BG) nodes for interconnecting EVPN sites within the DC.

   In a nutshell, BG discovery will facilitate termination and re-
   origination of inter-site VXLAN tunnels.  Such discovery provides
   flexibility for intra-site leaf-to-leaf VXLAN tunnels to co-exist
   with inter-site tunnels terminating on BGs.  Additionally, BGs need
   to discover each other such that it is possible to run the Designated
   Forwarder (DF) election between the border nodes of a site.  Each BG
   also needs to be aware of remote BGs so that it can apply the
   appropriate import/export of routes from other sites.

4.1.  Border Gateway Discovery

   BGs leverage the RT-1 A-D route type defined in RFC7432 [RFC7432].
   BGs in different sites will use RT-1 A-D routes with unique site-
   identifiers to announce themselves as "Borders" to other BGs.  Nodes
   within the same site MUST be configured with, or auto-generate, the
   same site-identifier.  Nodes that are not configured to be border
   nodes will build VXLAN tunnels only to the other members of their
   site (which they know from the site-identifier that those members
   announce).  Border nodes will additionally build VXLAN tunnels
   between themselves and other border nodes that are announced with a
   different site-identifier.  The site-identifier is encoded within
   the ESI itself, as described below.
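
   The tunnel-building rule in the paragraph above can be illustrated
   with a minimal Python sketch.  The Node tuple and the tunnel_peers()
   helper are hypothetical names introduced only for this illustration;
   they are not defined by this document or by any EVPN implementation.

```python
from typing import Iterable, List, NamedTuple


class Node(NamedTuple):
    ip: str          # VTEP address announced by the node
    site_id: int     # site-identifier announced by the node
    is_border: bool  # True if the node announced itself as a border


def tunnel_peers(local: Node, discovered: Iterable[Node]) -> List[str]:
    """Return the VTEP addresses to which `local` builds VXLAN tunnels."""
    peers = []
    for node in discovered:
        if node.ip == local.ip:
            continue  # never build a tunnel to ourselves
        same_site = node.site_id == local.site_id
        # Every node peers with members of its own site; border nodes
        # additionally peer with border nodes of other sites.
        if same_site or (local.is_border and node.is_border):
            peers.append(node.ip)
    return sorted(peers)
```

   With this rule, a leaf in one site never builds a tunnel directly
   to a node in another site; only the border nodes do.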

   In this specification, we reuse the AS-based Ethernet Segment
   Identifier (ESI) Type 5 (see Section 5 of RFC7432 [RFC7432]) that can
   be auto-generated or configured by the operator.  It is repeated here
   to illustrate the encoding of the site-identifier.

   o  Type 5 (T=0x05): The ESI value is constructed with the site-id
      parameter being embedded as follows.

      *  AS number (4 octets).  This is an AS number owned by the system
         and MUST be encoded in the high-order 4 octets of the ESI Value
         field.  If a 2-octet AS number is used, the high-order extra 2
         octets will be 0x0000.

      *  Local Discriminator/Site Identifier (4 octets): The Local
         Discriminator is also referred to as the Site Identifier and
         its value MUST be encoded as follows.  The high-order 2 octets
         will be 0x0000, and the low order 2 octets will be set to the
         site-identifier to which this node belongs.  All border
         gateways MUST announce this value.  We need the AS number and
         the site identifier together to be automatically derivable to
         no more than 6 octets; this enables automatic import and export
         of routes (see the ES-Import RT definition in RFC7432
         [RFC7432]).

      *  Reserved (1 octet): The low-order octet of the ESI Value will
         be set to 0 on transmission and will be ignored on receipt.
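
   The Type 5 layout listed above (1-octet type, 4-octet AS number,
   4-octet Local Discriminator carrying the site-identifier in its low-
   order 2 octets, 1 reserved octet) can be sketched in Python as
   follows.  The function names build_site_esi() and parse_site_esi()
   are assumptions made for this illustration only.

```python
import struct
from typing import Tuple

ESI_TYPE_AS_BASED = 0x05  # Type 5 ESI (RFC 7432, Section 5)


def build_site_esi(as_number: int, site_id: int) -> bytes:
    """Return the 10-octet Type 5 ESI carrying a 2-octet site-identifier."""
    if not 0 <= as_number <= 0xFFFFFFFF:
        raise ValueError("AS number must fit in 4 octets")
    if not 0 <= site_id <= 0xFFFF:
        raise ValueError("site-identifier must fit in 2 octets")
    # Type (1) | AS number (4) | Local Discriminator (4) | Reserved (1).
    # Packing site_id as a 4-octet integer leaves the high-order
    # 2 octets of the Local Discriminator as 0x0000.
    return struct.pack("!BIIB", ESI_TYPE_AS_BASED, as_number, site_id, 0)


def parse_site_esi(esi: bytes) -> Tuple[int, int]:
    """Return (as_number, site_id) from a 10-octet Type 5 ESI."""
    esi_type, as_number, discriminator, _reserved = struct.unpack("!BIIB", esi)
    if esi_type != ESI_TYPE_AS_BASED:
        raise ValueError("not a Type 5 ESI")
    return as_number, discriminator & 0xFFFF
```

   A 2-octet AS number such as 65001 is simply packed into the 4-octet
   field, which yields the 0x0000 high-order padding required above.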

   Along with the RT-1 Ethernet A-D routes, border nodes MUST set the
   second low-order bit of the Flags octet (bit 0: Single-Active, bit 1:
   MS-Border, counting from the low-order bit) in the ESI Label Extended
   Community attribute that is announced with them.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x06     | Sub-Type=0x01 | Flags(1 octet)|  Reserved=0   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Reserved=0   |          ESI Label                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                 Figure 2
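
   As an illustration of Figure 2, the following Python sketch encodes
   the 8-octet ESI Label Extended Community with the MS-Border bit set.
   The helper names are hypothetical, the MS-Border bit position is the
   one proposed in this document, and the 3-octet ESI Label field is
   treated as an opaque 24-bit value, which is a simplifying assumption.

```python
import struct

FLAG_SINGLE_ACTIVE = 0x01  # low-order bit of the Flags octet (RFC 7432)
FLAG_MS_BORDER = 0x02      # second low-order bit, as proposed in this document


def build_esi_label_ext_community(esi_label: int, ms_border: bool,
                                  single_active: bool = False) -> bytes:
    """Encode Type=0x06, Sub-Type=0x01, Flags, Reserved (2), ESI Label (3)."""
    if not 0 <= esi_label <= 0xFFFFFF:
        raise ValueError("ESI Label field is 3 octets wide")
    flags = 0
    if single_active:
        flags |= FLAG_SINGLE_ACTIVE
    if ms_border:
        flags |= FLAG_MS_BORDER
    # Two reserved octets follow the Flags octet and are sent as zero.
    return struct.pack("!BBBH", 0x06, 0x01, flags, 0) + esi_label.to_bytes(3, "big")


def is_ms_border(ext_community: bytes) -> bool:
    """Return True if this is an ESI Label community with MS-Border set."""
    return (len(ext_community) == 8
            and ext_community[0] == 0x06
            and ext_community[1] == 0x01
            and bool(ext_community[2] & FLAG_MS_BORDER))
```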

   The site-identifier value is unique across the deployment.  The RT-1
   Ethernet A-D route, along with (i) the MS-Border bit being set in the
   ESI Label Extended Community and (ii) the per-VNI RT Extended
   Community, will enable all BGs to be aware of all the other BGs in
   the network.  All BGs are thus able to identify the other members of
   their own site and, armed with this information, are able to run a
   Designated Forwarder (DF) election that is scoped per site and per
   VNI, as opposed to the traditional per-Ethernet-segment DF election.
   In Figure 1, nodes BG-A1, BG-A2, BG-B1, BG-B2, BG-C1, and BG-C2 will
   announce the ESI Label and the per-VNI RT Extended Communities.
   Nodes BG-A1 and BG-A2 will perform a DF election for Site A, whereas
   nodes BG-B1 and BG-B2 will perform one for Site B.  Even though all
   BG nodes are able to see all the advertisements, the site-identifier
   scopes the DF election (using RT-4 ES routes) to its site members.
   This specification uses the All-Active Redundancy Mode, especially
   when the Anycast model of route announcement is used for the local
   routes.
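
   The site-scoped election can be sketched as follows.  The sketch
   applies the default procedure of Section 8.5 of RFC7432 (candidates
   ordered by increasing IP address, winner at ordinal V mod N) and
   assumes that the VNI takes the place of the VLAN value V; that
   substitution, and the function names, are assumptions made for this
   illustration.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def candidates_by_site(adverts: Iterable[Tuple[str, int]]) -> Dict[int, List[str]]:
    """Group BG addresses by site-identifier, in increasing numeric order."""
    sites = defaultdict(set)
    for bg_ip, site_id in adverts:
        sites[site_id].add(bg_ip)
    return {
        site_id: sorted(ips, key=lambda ip: int(ipaddress.ip_address(ip)))
        for site_id, ips in sites.items()
    }


def elect_df(candidates: List[str], vni: int) -> str:
    """Pick the DF for a VNI: the candidate at ordinal (vni mod N)."""
    if not candidates:
        raise ValueError("no candidate BGs for this site")
    return candidates[vni % len(candidates)]
```

   Because each BG only places members of its own site in the candidate
   list, BGs of Site A and Site B elect their DFs independently even
   though they see each other's advertisements.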

4.2.  Border Gateway Provisioning

   Border Gateway nodes manage both the control-plane communications and
   the data forwarding plane for any inter-site traffic.  Once BGs are
   discovered (using RT-1 routes), any RT-2/RT-5 routes from other sites
   will be terminated and re-originated on such BGs.  RT-2/RT-5 routes
   carry downstream VNI labels.  As BG discovery is agnostic to
   symmetric or downstream VNI provisioning, rewriting next-hop
   attributes before re-advertising these routes from other sites to a
   given site provides flexibility to keep different mac-VRF or IP-VRF
   to VNI mappings in different sites while still interconnecting L3
   and L2 domains.

   RT-1, RT-3, and RT-4 from other sites will be terminated at the BGs.
   As has been defined in the specifications, RT-3 routes carry
   downstream VNI labels and will be used to pre-build VXLAN tunnels in
   the common EVPN domain for L2, L3, and Multi-Destination traffic.

4.2.1.  Border Gateway Designated Forwarder Election

   In the presence of more than one BG node in a site, forwarding of
   multi-destination L2 or L3 traffic both into the site and out of the
   site needs to be carried out by a single node.  This node is termed
   the Designated Forwarder (DF) and is elected per VNI according to the
   rules defined in Section 8.5 of RFC7432 [RFC7432].  RT-4 Ethernet
   Segment routes are used for the DF election.  In the multi-site
   deployment, the RT-4 Ethernet Segment routes carry an ES-Import RT
   Extended Community attribute.  We need to enforce that these routes
   are imported only by the local site members, that is, when the
   received ES-Import value matches the receiver's own value.  The
   6-byte value is generated by concatenating the 4-byte AS number to
   which the member belongs with the 2-byte site-identifier.  As a
   result, only local site members will match to form the candidate
   list.  All the BGs are able to extract the site-identifier from this
   attribute, and the list of nodes among which the election is run is
   thereby constrained to the BGs of the same site.
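
   The derivation of the 6-byte ES-Import value and the resulting import
   filter can be sketched in Python as shown below.  The function names
   are hypothetical; the Type (0x06) and Sub-Type (0x02) octets are
   those of the ES-Import Route Target in Section 7.6 of RFC7432.

```python
import struct
from typing import Iterable, List, Tuple


def es_import_value(as_number: int, site_id: int) -> bytes:
    """Concatenate the 4-byte AS number with the 2-byte site-identifier."""
    return struct.pack("!IH", as_number, site_id)


def es_import_ext_community(as_number: int, site_id: int) -> bytes:
    """Wrap the 6-byte value in the 8-octet ES-Import RT extended community."""
    return bytes([0x06, 0x02]) + es_import_value(as_number, site_id)


def df_candidates(local_value: bytes,
                  rt4_routes: Iterable[Tuple[str, bytes]]) -> List[str]:
    """Keep only the RT-4 originators whose ES-Import value matches ours."""
    return [bg_ip for bg_ip, value in rt4_routes if value == local_value]
```

   Since a BG in another site carries a different AS number and site-
   identifier, its RT-4 route fails the comparison and never enters the
   local candidate list.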

   In both modes (Anycast and Multipath), RT-3 routes will be generated
   locally and advertised by the DF winner Border Gateway with its
   unique gateway IP.  This facilitates building fast-converging flood
   domain connectivity both inter-site and intra-site, while at the same
   time avoiding duplicate traffic by electing a DF winner to forward
   multi-destination inter-site traffic.

   Failure events which lead to a BG losing all of its connectivity to
   the IP interconnect backbone should trigger the BG to withdraw its
   Border RT-4 Ethernet Segment route(s) and RT-1 A-D route, to indicate
   to other BGs of the same site that it is no longer a candidate BG and
   to indicate to BGs of different sites that it is no longer a Border
   Gateway.

4.2.2.  Anycast Border Gateway

   In this mode, all BGs share the same gateway IP and rewrite EVPN
   next-hop attributes with a shared logical next-hop entity.  However,
   each BG also maintains a unique gateway IP to facilitate building IR
   trees from site-local nodes to forward multi-destination traffic.
   EVPN RT-2 and RT-5 routes will be advertised to the nodes in the site
   from all BGs, and the BGs will run a DF election per VNI for
   multi-destination traffic.  RT-3 routes will be advertised by the DF
   winner BG for a given VNI so that only the DF will receive and
   forward inter-site traffic.  It is also possible for all BGs at a
   site to advertise and draw traffic, to improve the convergence
   properties of the network.  In the case of multi-destination trees
   built by non-EVPN procedures (say PIM), all BGs will receive traffic
   but only the DF winner will forward it.

   It is recommended that BGs be enabled in the Anycast mode, wherein
   the BG functionality is available to the rest of the network as a
   single logical entity for inter-site communication.  In the absence
   of Anycast capability, the BGs could be enabled as individual
   gateways (Single-Active BG), wherein a single node will perform the
   active BG role for a given flow at a given time.  As of now, the
   Border Gateway system MAC of the other border nodes belonging to the
   same site is expected to be configured out-of-band.

4.2.3.  Multi-path Border Gateway

   In this mode, Border Gateways will rewrite EVPN next-hop attributes
   with unique next-hop entities.  This provides flexibility to apply
   the usual policies and pick per-VRF, per-VNI, or per-flow
   primary/backup Border Gateways.  Hence, an intra-site node will see
   each BG as a next-hop for any external L2 or L3 unicast destination,
   and would perform an ECMP path selection to load-balance traffic sent
   to external destinations.  In case an intra-site node is not capable
   of performing ECMP hash-based path selection (possibly some L2
   forwarding implementations), the node is expected to choose one of
   the BGs as its designated forwarder.  EVPN RT-2 and RT-5 routes will
   be advertised to the nodes in the site from all Border Gateways, and
   the Border Gateways will run a DF election per VNI for
   multi-destination traffic.  RT-3 routes will be advertised by the DF
   winner Border Gateway for a given VNI so that only the DF will
   receive and forward inter-site traffic.  It is also possible for all
   Border Gateways at a site to advertise and draw traffic, to improve
   the convergence properties of the network.  In the case of
   multi-destination trees built by non-EVPN procedures (say PIM), all
   Border Gateways will receive traffic but only the DF winner will
   forward it.
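
   The difference between the two modes, as seen by a site-local node,
   can be sketched as follows.  In the Anycast mode the node sees one
   shared next hop; in the Multi-path mode it sees one next hop per BG
   and hashes each flow onto one of them.  The function names and the
   use of CRC-32 as the hash are assumptions for illustration only;
   real forwarding hardware uses its own hash function.

```python
import zlib
from typing import List, Sequence, Tuple

# src IP, dst IP, protocol, src port, dst port
Flow = Tuple[str, str, int, int, int]


def advertised_next_hops(mode: str, anycast_ip: str,
                         unique_ips: Sequence[str]) -> List[str]:
    """Next hops a site-local node sees for external destinations."""
    if mode == "anycast":
        return [anycast_ip]        # all BGs appear as one logical next hop
    if mode == "multipath":
        return sorted(unique_ips)  # each BG appears as its own next hop
    raise ValueError(f"unknown mode: {mode}")


def pick_next_hop(next_hops: Sequence[str], flow: Flow) -> str:
    """Hash the flow onto one next hop so a flow always takes one path."""
    digest = zlib.crc32("|".join(map(str, flow)).encode())
    return next_hops[digest % len(next_hops)]
```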

4.3.  EVPN route processing at Border Gateway

   BG functionality in an EVPN site SHOULD be enabled on more than one
   node in the network for redundancy and high-availability purposes.
   Any external RT-2/RT-5 routes that are received by the BGs of a site
   are advertised to all the intra-site nodes by all the BGs.  For
   internal RT-2/RT-5 routes received by the BGs from the intra-site
   nodes, all the BGs of a site would advertise them to the remote BGs,
   so any L2/L3 known unicast traffic to internal destinations could be
   sent to any one of the local BGs by remote sources.  For known L2 and
   L3 unicast traffic, all of the individual BGs will behave either as a
   single logical forwarding node (Anycast model) or as a set of active
   forwarding nodes (Multipath model).

   All control plane and data plane states are interconnected in a
   complete downstream fashion.  For example, BGP import rules for a
   Type 3 route should be able to extend a flood domain for a VNI, and
   flood traffic destined to the advertising EVPN node should carry the
   VNI that is announced in the Type 3 route.  Similarly, Type 2 and
   Type 5 control and forwarding states should be interconnected in a
   complete downstream fashion.

   o  Route Target processing for RT-1 routes: Every IP-VRF and MAC-VRF
      will generate RT-1 with the format described in Section 4.1.
      Route targets can be auto-derived from the Ethernet Tag ID (VLAN
      ID) for that EVPN instance as described in Section 7.10.1 of
      RFC7432 [RFC7432].  The ES-Import Route Target extended community
      described in Section 7.6 of RFC7432 [RFC7432] is optional for RT-1
      routes in this context.  The ESI Label Extended Community
      attribute MUST be present in this context, since it carries the
      MS-Border notion as a new bit.

   o  Route Target processing for RT-4 routes: Every IP-VRF and MAC-VRF
      will generate RT-4 with the format described in Section 4.1.
      Route targets can be auto-derived from the Ethernet Tag ID (VLAN
      ID) for that EVPN instance as described in Section 7.10.1 of
      RFC7432 [RFC7432].  The ES-Import Route Target extended community
      described in Section 7.6 of RFC7432 [RFC7432] is mandatory for
      RT-4 in this context.  The encoding of ES-Import is based on the
      AS number and site-identifier as described in Section 4.2.1.  Such
      an import route target allows RT-4 routes to be imported only by
      the Border Gateways of the same site.

   o  Route Target processing for RT-2, RT-3, RT-5 routes: These routes
      will carry either auto-derived route targets (based on the
      Ethernet Tag ID (VLAN ID) for that EVPN instance) or explicit
      route targets.  The usual import rules at the Border Gateways will
      import these routes, which are then re-advertised with Border
      Gateway next hops.  Border Gateways that import and re-advertise
      routes SHOULD implement a mechanism to avoid looping of updates
      should they come back to the Border Gateways.  RT-3 routes
      received from other Border Gateways will be imported and processed
      on Border Gateways but MUST NOT be advertised again.
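
   The handling of routes received from a remote BG can be sketched as
   follows.  The EvpnRoute class, the reoriginate_into_site() function,
   and the local_vni table (which stands in for the per-site mac-VRF or
   IP-VRF to VNI mapping) are hypothetical and exist only for this
   illustration.  The loop-avoidance mechanism is left unspecified by
   this document and is not modelled here.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class EvpnRoute:
    route_type: int  # EVPN route type, e.g. 2, 3, or 5
    key: str         # MAC, IP prefix, or originator; opaque in this sketch
    next_hop: str    # VTEP address of the advertiser
    vni: int         # downstream-assigned VNI chosen by the advertiser


def reoriginate_into_site(route: EvpnRoute, bg_next_hop: str,
                          local_vni: Dict[int, int]) -> Optional[EvpnRoute]:
    """Process a route from a remote BG; return what is sent into the site."""
    if route.route_type == 3:
        return None  # imported and processed locally, never advertised again
    if route.route_type not in (2, 5):
        return None  # RT-1 and RT-4 are also terminated at the BG
    # Rewrite the next hop to this BG and advertise the VNI that this
    # site uses for the same tenant domain (unchanged if no mapping
    # is configured in this hypothetical table).
    return replace(route, next_hop=bg_next_hop,
                   vni=local_vni.get(route.vni, route.vni))
```

   Site-local nodes therefore only ever see the BG as the next hop for
   remote destinations, together with a VNI that is meaningful within
   their own site.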









4.4.  Multi-Destination tree between Border Gateways

   The procedures described here recommend building an Ingress
   Replication (IR) tree between Border Gateways.  This allows every
   site to independently build site-specific multi-destination trees.
   Multi-destination end-to-end trees between leafs could be
   PIM (site 1) + IR (between Border Gateways) + PIM (site 2), or
   IR-IR-IR, or PIM-IR-IR.  However, this does not rule out using
   IR-PIM-IR or end-to-end PIM to build multi-destination trees.

   Border Gateways will generate RT-3 routes with a unique gateway IP
   and advertise them to the Border Gateways of other sites.  These RT-3
   routes will help in building IR trees between Border Gateways.
   However, only the DF winner per VNI will forward multi-destination
   traffic across sites.

   As Border Gateways are part of both site-specific and inter-site
   multi-destination IR trees, a split-horizon mechanism will be used to
   avoid loops.  The multi-destination tree with the Border Gateway as
   root towards other sites (or Border Gateways) will be in one
   split-horizon group.  Similarly, the multi-destination IR tree with
   the Border Gateway as root towards site-local nodes will be in
   another split-horizon group.

   If PIM is used to build multi-destination trees in the site-specific
   domain, all Border Gateways will join such PIM trees and draw
   multi-destination traffic.  However, only the DF Border Gateway will
   forward traffic towards other sites.
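
   The split-horizon and DF rules above can be sketched as follows.
   This is a minimal illustration, not a definitive implementation: the
   names BorderGateway, replicate, SITE_LOCAL and INTER_SITE are
   hypothetical, and the sketch assumes that the DF check also gates
   traffic arriving from other sites, which this section does not state
   explicitly.

```python
from dataclasses import dataclass, field

SITE_LOCAL = "site-local"  # horizon group toward nodes in the local site
INTER_SITE = "inter-site"  # horizon group toward Border Gateways of other sites


@dataclass
class BorderGateway:
    name: str
    # VNIs for which this node won the DF election
    df_vnis: set = field(default_factory=set)
    # members of the site-local IR tree rooted at this gateway
    local_peers: list = field(default_factory=list)
    # Border Gateways of other sites, learned from their RT-3 routes
    remote_peers: list = field(default_factory=list)

    def replicate(self, vni, arrived_from):
        """Return the peers that get a copy of a multi-destination frame."""
        if vni not in self.df_vnis:
            # Assumption: a non-DF gateway forwards in neither direction.
            return []
        if arrived_from == SITE_LOCAL:
            # Never send back into the horizon group the frame came from.
            return list(self.remote_peers)
        if arrived_from == INTER_SITE:
            return list(self.local_peers)
        raise ValueError(f"unknown horizon group: {arrived_from}")


bg = BorderGateway(
    name="BG-A1",
    df_vnis={10001},
    local_peers=["leaf-1", "leaf-2"],
    remote_peers=["BG-B1", "BG-C1"],
)
```

   With this state, a frame for VNI 10001 arriving from the site-local
   group is copied only to BG-B1 and BG-C1, while a frame for a VNI for
   which BG-A1 is not the DF is not forwarded at all.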

4.5.  Inter-site Unicast traffic

   As site-local nodes will see all inter-site EVPN routes via Border
   Gateways, VXLAN tunnels will be built between leafs and site-local
   Border Gateways, and inter-site VXLAN tunnels will be built between
   Border Gateways in different sites.  An end-to-end bidirectional
   VXLAN forwarding path between inter-site leafs will consist of a
   VXLAN tunnel from the leaf (say, in site A) to its Border Gateway
   (BG-A1), another VXLAN tunnel from that Border Gateway (BG-A1) to a
   Border Gateway (BG-B1) in another site (say, site B), and a third
   VXLAN tunnel from that Border Gateway (BG-B1) to the leaf in site B.
   Such an arrangement of tunnels is scalable because a full mesh of
   VXLAN tunnels across inter-site leafs is replaced by a combination of
   intra-site and inter-site tunnels.
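
   The scaling benefit can be made concrete with a small calculation.
   The sketch below is illustrative only: the function names are
   hypothetical, each bidirectional tunnel is counted once, and it
   assumes every leaf builds one tunnel to each Border Gateway of its
   own site and every Border Gateway builds one tunnel to each Border
   Gateway of every other site.

```python
from itertools import combinations


def full_mesh_tunnels(leafs_per_site):
    """Inter-site leaf-to-leaf tunnels if no Border Gateways were used."""
    return sum(a * b for a, b in combinations(leafs_per_site, 2))


def segmented_tunnels(leafs_per_site, gateways_per_site):
    """Leaf-to-local-gateway tunnels plus gateway-to-gateway tunnels."""
    intra = sum(l * g for l, g in zip(leafs_per_site, gateways_per_site))
    inter = sum(a * b for a, b in combinations(gateways_per_site, 2))
    return intra + inter


leafs = [100, 100, 100]   # three sites with 100 leafs each
gateways = [2, 2, 2]      # two Border Gateways per site
mesh = full_mesh_tunnels(leafs)
segmented = segmented_tunnels(leafs, gateways)
```

   For three sites of 100 leafs and 2 Border Gateways each, the full
   mesh needs 30000 inter-site tunnels, whereas the segmented model
   needs 600 intra-site plus 12 inter-site tunnels, 612 in total.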

   L2 and L3 unicast frames from site-local leafs will reach the Border
   Gateway using VXLAN encapsulation.  At the Border Gateway, the VXLAN
   header is stripped and another VXLAN header is pushed to send the
   frames to the destination site Border Gateway.  The destination site
   Border Gateway will strip the VXLAN header and push another VXLAN
   header to send the frame to the destination site leaf.
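
   The hop-by-hop re-encapsulation described above can be sketched as
   follows.  The names VxlanFrame, reencapsulate and end_to_end are
   hypothetical, and keeping the VNI unchanged across hops is an
   assumption of this sketch rather than a statement of this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VxlanFrame:
    outer_src: str   # tunnel source VTEP
    outer_dst: str   # tunnel destination VTEP
    vni: int
    payload: bytes   # inner L2/L3 frame, carried unchanged


def reencapsulate(frame, local_vtep, next_vtep):
    """Strip the outer header and push a new one toward next_vtep."""
    # Assumption: the VNI is carried over unchanged.
    return VxlanFrame(local_vtep, next_vtep, frame.vni, frame.payload)


def end_to_end(payload, vni, src_leaf, src_bg, dst_bg, dst_leaf):
    """Return the three encapsulations seen along the inter-site path."""
    hop1 = VxlanFrame(src_leaf, src_bg, vni, payload)
    hop2 = reencapsulate(hop1, src_bg, dst_bg)
    hop3 = reencapsulate(hop2, dst_bg, dst_leaf)
    return [hop1, hop2, hop3]


hops = end_to_end(b"inner-frame", 10001,
                  "leaf-A", "BG-A1", "BG-B1", "leaf-B")
```

   Each element of hops is one tunnel segment: leaf-A to BG-A1, BG-A1
   to BG-B1, and BG-B1 to leaf-B, with the same payload in all three.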

4.6.  Inter-site Multi-destination traffic

   Multi-destination traffic will be forwarded from one site to another
   only by the DF for that VNI.  As frames reach the Border Gateway from
   site-local nodes, the VXLAN header will be stripped from the payload,
   which is then encapsulated with another VXLAN header (derived from
   downstream Type 3 EVPN routes received from the Border Gateways of
   the destination site) to forward the payload to the destination site
   Border Gateway.  Similarly, the destination site Border Gateway will
   strip the VXLAN header and forward the payload, after encapsulating
   it with another VXLAN header, towards the destination leaf.

   As explained in Section 4.4, a split horizon mechanism will be used
   to avoid looping of inter-site multi-destination frames.

4.7.  Host Mobility

   Host movement handling will be the same as defined in RFC7432
   [RFC7432].  When a host moves, EVPN RT-2 routes with an updated
   sequence number will be propagated to every EVPN node.  When a host
   moves between sites, only Border Gateways may see EVPN updates in
   which both the next-hop attribute and the sequence number change,
   while leafs may see updates in which only the sequence number
   changes.  However, in other cases, both Border Gateways and leafs
   may see next-hop and sequence number changes.
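
   The difference in what Border Gateways and leafs observe can be
   illustrated with a minimal sketch.  The names Rt2, prefer and
   readvertise are hypothetical, the MAC address is from a
   documentation range, and tie-breaking between routes with equal
   sequence numbers is deliberately left out.  The scenario assumes a
   host moving from site B to site C, observed from site A.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rt2:
    mac: str
    next_hop: str
    seq: int  # MAC mobility sequence number


def prefer(current: Optional[Rt2], update: Rt2) -> Rt2:
    """Keep the route with the higher sequence number."""
    if current is None or update.seq > current.seq:
        return update
    return current


def readvertise(route: Rt2, gateway: str) -> Rt2:
    """Re-advertise a route into a site with the gateway as next hop."""
    return Rt2(route.mac, gateway, route.seq)


MAC = "00:00:5e:00:53:01"

# View at Border Gateway BG-A1: the next hop moves from BG-B1 to BG-C1.
before_at_bg = Rt2(MAC, "BG-B1", 0)
after_at_bg = prefer(before_at_bg, Rt2(MAC, "BG-C1", 1))

# View at a leaf in site A: the next hop stays BG-A1 and only the
# sequence number changes.
before_at_leaf = readvertise(before_at_bg, "BG-A1")
after_at_leaf = prefer(before_at_leaf, readvertise(after_at_bg, "BG-A1"))
```

   In this scenario BG-A1 sees both the next hop and the sequence
   number change, whereas the site A leaf sees the same next hop
   (BG-A1) with a higher sequence number.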

5.  Convergence

5.1.  Fabric to Border Gateway Failure

   If a Border Gateway is lost, the Border Gateway next hop will be
   withdrawn for RT-2/RT-5 routes.  A per-VNI DF election will also be
   triggered to choose a new DF.  The new DF winner will become the
   forwarder of multi-destination inter-site traffic.
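
   The re-election step can be sketched as follows.  This section does
   not specify the election algorithm, so the sketch assumes a modulo
   election in the style of RFC7432 with the VNI used in place of the
   Ethernet Tag.  The function name elect_df and the addresses are
   hypothetical.

```python
import ipaddress


def elect_df(vni, gateway_ips):
    """Pick the DF for a VNI among the Border Gateways still present.

    Assumption: candidates are ordered by numeric IP address and the
    winner is the one at index (vni mod number of candidates).
    """
    ordered = sorted(gateway_ips,
                     key=lambda ip: int(ipaddress.ip_address(ip)))
    return ordered[vni % len(ordered)]


gateways = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
df_before = elect_df(10001, gateways)

# The elected Border Gateway is lost; the election runs again over
# the remaining candidates.
survivors = [ip for ip in gateways if ip != df_before]
df_after = elect_df(10001, survivors)
```

   With these inputs, VNI 10001 is first served by 192.0.2.3 and moves
   to 192.0.2.2 once 192.0.2.3 is lost.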

5.2.  Border Gateway to Border Gateway Failures

   If the inter-site cloud has link failures, the direct forwarding
   path between Border Gateways can be lost.  In this case, traffic from
   one site can reach another site via a Border Gateway of an
   intermediate site.  However, this will be handled like a regular
   underlay failure, and the traffic termination end-points will stay
   the same for inter-site traffic flows.

6.  Interoperability

   The procedures defined here are only for Border Gateways.  Therefore
   other EVPN nodes in the network should be RFC7432 [RFC7432] compliant
   to operate in such topologies.

   As the procedures described here are applicable only after receiving
   a Border A-D route, other connected domains that are not capable of
   the multi-site gateway model can work in regular EVPN mode.  The
   exact procedures will be detailed in a future version of the draft.

   The procedures here provide flexibility to connect non-EVPN VXLAN
   sites by provisioning Border Gateways on such sites and
   interconnecting them with the Border Gateways of other sites.  Such
   Border Gateways in non-EVPN VXLAN sites will play the dual role of
   EVPN gateway towards the common EVPN domain and non-EVPN gateway
   towards the non-EVPN VXLAN site.

7.  Isolation of Fault Domains

   Isolation of network defects requires policies like storm control,
   security ACLs, etc., to be implemented at site boundaries.  Border
   Gateways should be capable of inspecting the inner payload of
   packets received from VXLAN tunnels and enforcing configured
   policies to prevent defects from percolating from one part of the
   network to the rest.

8.  MVPN with Multi-site EVPN

   BGP based MVPN as defined in RFC6513 [RFC6513] and RFC6514 [RFC6514]
   will coexist with Multisite-EVPN without any changes to the route
   types and encodings defined for MVPN in these RFCs.  Route
   Distinguisher and VRF Route Import extended communities will be
   attached to MVPN routes as defined in the BGP MVPN RFCs.  Import and
   export Route Targets will be attached to MVPN routes either by
   auto-generating them from the VNI or by explicit configuration per
   MVPN.  Since the BGP MVPN RFCs can use any VPN address family to
   provide RPF information for building C-Multicast trees, EVPN route
   types will be used to provide the required RPF information for
   multicast sources in MVPNs.  In order to follow the segmentation
   model of Multisite-EVPN, the following procedures are recommended to
   build provider and customer multicast trees between sources and
   receivers across sites.

8.1.  Inter-Site MI-PMSI

   As defined in the above-mentioned MVPN RFCs, I-PMSI A-D routes are
   used to signal a provider tunnel or MI-PMSI per MVPN.  Multisite-EVPN
   recommends using EVPN Type-3 routes to build such an MI-PMSI
   provider tunnel per VPN between the Border Gateways of different
   sites.  Every MVPN node will use its unique router identifier to
   build these MI-PMSI provider tunnels.  In the Anycast Border Gateway
   model as well, these MI-PMSI provider tunnels are built using the
   unique router identifiers of the Border Gateways.  In a similar
   fashion, these Type-3 routes can be used to build an MI-PMSI provider
   tunnel per MVPN within sites.

8.2.  Stitching of customer multicast trees across sites

   All Border Gateways will rewrite the next hop and re-originate MVPN
   routes received from other sites to the local site and from the
   local site to other sites.  Therefore, customer multicast trees will
   be logically built end-to-end across sites by stitching these trees
   via Border Gateways.  A C-multicast join route (say, MVPN Type 7)
   will follow the EVPN RPF path to build a C-multicast tree from a leaf
   in a site to its Border Gateway and to destination site leafs via
   the destination site Border Gateways.  Similarly, a Source-Active
   A-D MVPN route (MVPN Type 5) will have its next hop rewritten and be
   re-originated via Border Gateways so that source C-multicast trees
   will be stitched via Border Gateways.

8.3.  RP placement across sites

   Multisite-EVPN recommends only source C-Multicast trees across sites.
   Therefore, customer RP placement per MVPN should be restricted to
   within sites.  The Source-Active A-D MVPN route type (Type 5) will be
   used to signal C-Multicast sources across sites.

8.4.  Inter-Site S-PMSI

   As defined in the BGP MVPN RFCs, S-PMSI A-D routes (MVPN Type 3) will
   be used to signal selective PMSI trees for high bandwidth C-Multicast
   streams.  These S-PMSI A-D routes will be signaled across sites via
   Border Gateways, which rewrite the next hop and re-originate them to
   other sites.  The PMSI Tunnel attribute in re-originated S-PMSI
   routes will be adjusted to the provider tunnel types between Border
   Gateways across sites.

9.  Observations with Multi-site EVPN

   Since an Anycast address is now advertised in the underlay protocols
   per ES, this solution does increase the scale of routes for the
   underlay.  Furthermore, the ES failures are now conveyed via the
   underlay protocols.  To drop down to single homing mode, one would
   need to track the interfaces that are used for the inter-site
   traffic.  It is a requirement to not have intra-site and inter-site
   traffic use the same links from the nodes.  Due to the anycast
   formulation of the gateways, it is not possible to entertain any
   load-balancing per ES link for the gateway nodes.

   Loop avoidance by the use of the domain-path-id as defined in
   [EVPN-IPVPN-INTERWORKING] will be detailed in a future version of the
   draft.

10.  Acknowledgements

   The authors would like to thank Max Ardica, Murali Garimella, Anuj
   Mittal, Lilian Quan, Veera Ravinutala, and Tarun Wadhwa for their
   review and comments.

11.  IANA Considerations

   TBD.

12.  Security Considerations

   TBD.

13.  References

13.1.  Normative References

   [DCI-OVERLAY]
              A. Sajassi et al., "A Network Virtualization Overlay
              Solution using EVPN", 2018, <https://tools.ietf.org/html/
              draft-ietf-bess-dci-evpn-overlay-07>.

   [EVPN-IPVPN-INTERWORKING]
              A. Sajassi et al., "EVPN Interworking with IPVPN", 2018,
              <https://tools.ietf.org/html/
              draft-rabadan-sajassi-bess-evpn-ipvpn-interworking-00>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015, <https://www.rfc-editor.org/info/rfc7432>.

13.2.  Informative References

   [RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/
              BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February
              2012, <https://www.rfc-editor.org/info/rfc6513>.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012,
              <https://www.rfc-editor.org/info/rfc6514>.

   [RFC7209]  Sajassi, A., Aggarwal, R., Uttaro, J., Bitar, N.,
              Henderickx, W., and A. Isaac, "Requirements for Ethernet
              VPN (EVPN)", RFC 7209, DOI 10.17487/RFC7209, May 2014,
              <https://www.rfc-editor.org/info/rfc7209>.

   [RFC8365]  Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
              Uttaro, J., and W. Henderickx, "A Network Virtualization
              Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365,
              DOI 10.17487/RFC8365, March 2018,
              <https://www.rfc-editor.org/info/rfc8365>.

Appendix A.  Additional Stuff

   TBD.

Authors' Addresses

   Rajesh Sharma (editor)
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA
   USA

   Email: rajshr@cisco.com


   Ayan Banerjee (editor)
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA
   USA

   Email: ayabaner@cisco.com


   Ali Sajassi
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA
   USA

   Email: sajassi@cisco.com


   Lukas Krattiger
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA
   USA

   Email: lkrattig@cisco.com


   Raghava Sivaramu
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA
   USA

   Email: raghavas@cisco.com
