INTERNET-DRAFT                                         L. Krattiger, Ed.
Intended Status: Informational                          A. Banerjee, Ed.
Expires: November 13, 2022                                    A. Sajassi
                                                               R. Sharma
                                                             R. Sivaramu
                                                           Cisco Systems

                                                            May 12, 2022

          Multi-Site Solution for Ethernet VPN (EVPN) Overlay


   This document describes the procedures for interconnecting two or
   more Network Virtualization Overlays (NVOs) via an NVO over an
   IP-only network. The solution interconnects Ethernet VPN networks by
   using NVO with Ethernet VPN (EVPN) to facilitate the interconnect in
   a scalable fashion. The motivation is to support extension of Layer-2
   and Layer-3, Unicast and Multicast, VPNs without having to rely on
   typical Data Center Interconnect (DCI) technologies like MPLS/VPLS.
   The requirements for the interconnect are similar to the ones
   specified in [RFC7209] "Requirements for Ethernet VPN (EVPN)". In
   particular, this document describes how the Gateway (GW) procedures
   differ from, and add functionality to, [RFC9014] "Interconnect
   Solution for Ethernet VPN (EVPN) Overlay Networks", with which this
   solution is interoperable. This document updates and replaces all
   previous versions of [SHARMA-MULTI-SITE].

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at

Sharma, et al.         Expires November 13, 2022                [Page 1]

INTERNET DRAFT              Multi-Site EVPN                 May 12, 2022

   The list of Internet-Draft Shadow Directories can be accessed at

Copyright and License Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Conventions and Terminology  . . . . . . . . . . . . . . . . .  4
   3.  Multi-Site EVPN Overview . . . . . . . . . . . . . . . . . . .  5
     3.1.  MS-EVPN Interconnect Requirements  . . . . . . . . . . . .  6
     3.2.  MS-EVPN Interconnect concept and framework . . . . . . . .  7
   4.  Multi-site EVPN Interconnect Procedures  . . . . . . . . . . .  9
     4.1.  Border Gateway Discovery . . . . . . . . . . . . . . . . .  9
     4.2.  Border Gateway Provisioning  . . . . . . . . . . . . . . . 11
       4.2.1.  Border Gateway Designated Forwarder Election . . . . . 11
       4.2.2.  Anycast Border Gateway . . . . . . . . . . . . . . . . 12
       4.2.3.  Multi-path Border Gateway  . . . . . . . . . . . . . . 12
     4.3.  EVPN route processing at Border Gateway  . . . . . . . . . 13
     4.4.  Multi-Destination tree between Border Gateways . . . . . . 14
     4.5.  Inter-site Unicast traffic . . . . . . . . . . . . . . . . 14
     4.6.  Inter-site Multi-destination traffic . . . . . . . . . . . 15
     4.7.  Host Mobility  . . . . . . . . . . . . . . . . . . . . . . 15
   5.  Convergence  . . . . . . . . . . . . . . . . . . . . . . . . . 16
     5.1.  Fabric to Border Gateway Failure . . . . . . . . . . . . . 16
     5.2.  Border Gateway to Border Gateway Failures  . . . . . . . . 16
   6.  Interoperability . . . . . . . . . . . . . . . . . . . . . . . 16
   7.  Isolation of Fault Domains . . . . . . . . . . . . . . . . . . 16
   8.  MVPN with Multi-site EVPN  . . . . . . . . . . . . . . . . . . 17
     8.1.  Inter-Site MI-PMSI . . . . . . . . . . . . . . . . . . . . 17
     8.2.  Stitching of customer multicast trees across sites . . . . 17
     8.3.  RP placement across sites  . . . . . . . . . . . . . . . . 18
     8.4.  Inter-Site S-PMSI  . . . . . . . . . . . . . . . . . . . . 18
   9.  Secure Data and Control Plane  . . . . . . . . . . . . . . . . 18
   10.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . 18
   11.  Security Considerations . . . . . . . . . . . . . . . . . . . 18
   12.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18
   13.  References  . . . . . . . . . . . . . . . . . . . . . . . . . 18
     13.1.  Normative References  . . . . . . . . . . . . . . . . . . 19
     13.2.  Informative References  . . . . . . . . . . . . . . . . . 19
   14. Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 20

1.  Introduction

   Ethernet VPNs (EVPNs) are being used to support various VPN
   topologies with the motivation and requirements being discussed in
   [RFC7209]. EVPN has been used as the control plane to provide a
   Network Virtualization Overlay (NVO) solution with a variety of
   tunnel encapsulation options, as per [RFC8365]. The Layer-2 Data
   Center Interconnect (DCI) procedures for IP and MPLS hand-off at
   domain boundaries are additionally discussed in [RFC9014], which is
   complemented by [EVPN-IPVPN] for Layer-3 DCI. The Multi-Site Solution
   combines Layer-2 and Layer-3 DCI for Ethernet VPN (EVPN) Overlay.

   In current EVPN deployments, there is a need to segment the EVPN
   domains within a Data Center (DC), primarily due to the service
   architecture and the scaling requirements around it. The number of
   routes, tunnel end-points (TEPs), and next-hops needed within a DC
   domain is sometimes larger than the capability of the hardware
   elements that are being deployed. Network operators would like to
   interconnect these domains without using traditional DCI
   technologies. In essence, they want smaller EVPN domains with an
   IP-based backbone to interconnect them. Additionally, they seek a
   simple and scalable redundancy model for the interconnect gateway
   with IP-based ECMP load distribution that does not impose additional
   protocol requirements on any of the surrounding TEPs. Using Anycast
   for the gateway redundancy requires minimal state sharing and can
   scale out widely. A number of gateways participate in an Anycast set,
   which is represented by a single Anycast IP address, often also
   referred to as Virtual IP address or VIP. A group of gateways shares
   the same VIP and together represents the entry and exit of a given DC
   domain. The many TEPs within a DC domain are masqueraded behind a
   single Anycast TEP, which represents the gateway between the DC
   internal and DC external domain. Also, the Anycast gateway approach
   relieves the hardware of performing multi-path for overlay
   reachability and correspondingly reduces the number of control plane
   paths.

   Network operators today are using the Virtual Network Identifier
   (VNI) to designate a service.  They would like to have this service
   available to a smaller set of nodes within the DC for administrative
   reasons; in essence they want to break up the EVPN domain into
   multiple smaller administrative domains. An advantage of a smaller
   footprint for these EVPN sites is that fault isolation domains are
   constrained. It also allows for flexible VNI allocation across
   sites, which subsequently can be stitched together for end-to-end

   In this document we focus on the Layer-2 and Layer-3 DCI with VXLAN
   encapsulation for EVPN deployments with the underlay providing only
   IP connectivity. We describe in detail the IP/VXLAN gateway procedure
   using the Anycast mode to interconnect smaller sites within the data
   center itself, and refer to this deployment model as multi-site EVPN
   (MS-EVPN). The procedures described here go into substantial detail
   regarding interconnecting Layer-2 (L2) and Layer-3 (L3) networks, for
   unicast and multicast domains across MS-EVPNs using the Anycast
   gateway model. This specification builds on the [RFC9014] definitions
   for Layer-2 DCI, with additions for operating with an Anycast gateway
   approach. The Anycast gateway mode as described within this document
   can be extended to interoperate with a DC domain that interconnects
   with an [RFC9014] gateway.

2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   DC:  Data Center

   DCI:  Data Center Interconnect

   DF:  Designated Forwarder

   EVI:  EVPN Instance

   EVPN:  Ethernet Virtual Private Network, as in [RFC7432]

   Border Gateway (BGW): This is the gateway node that is located
   between the DC/site internal and DC/site external domain. It is
   responsible for functionality related to traffic entering and exiting
   a site.

   Anycast Border Gateway (A-BGW): A virtual set of BGWs sharing the
   same Anycast IP address (Virtual IP / VIP) acting as common
   entry/exit points for a single site.

   Multipath Border Gateway: A virtual set of unique gateways, as
   described in [RFC9014], acting as multiple individual entry/exit
   points for a single site.

   ES:  Ethernet Segment

   ESI:  Ethernet Segment Identifier

   GW:  Gateway or Data Center Gateway

   I-ES and I-ESI: Interconnect Ethernet Segment and Interconnect
   Ethernet Segment Identifier.  An I-ES is defined on the GWs for
   multihoming to/from the WAN.

   RT-X: Route Type X as defined for various EVPN route types.

   VNI: refers to VXLAN virtual identifiers

   VXLAN: Virtual eXtensible LAN

3.  Multi-Site EVPN Overview

   In this section we describe the motivation, requirements, and
   framework for the Multi-Site EVPN (MS-EVPN) functionality. To
   introduce the Multi-Site solution, we compare [RFC9014] with the
   Multi-Site solution of this I-D.

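   +--------------+--------------------+-------------------------------+
   |              | DCI EVPN-Overlay   | Multi-Site                    |
   +--------------+--------------------+-------------------------------+
   | Interconnect | Integrated (1-Box) | Integrated (1-Box)            |
   |              | Decoupled (2-Box)  |                               |
   +--------------+--------------------+-------------------------------+
   | DCI Encap    | VPLS, PBB-VPLS,    | VXLAN                         |
   |              |  EVPN-MPLS,        |                               |
   |              |  PBB-EVPN, VXLAN   |                               |
   +--------------+--------------------+---------------+---------------+
   | Gateway Mode | Multipath PIP      | Anycast VIP   | Multipath PIP |
   +--------------+--------------------+---------------+---------------+
   | ECMP         | Underlay and       | Underlay      | Underlay and  |
   |              |  Overlay           |               |  Overlay      |
   +--------------+--------------------+---------------+---------------+
   | RT-1 on GW   | Consumed           | None          | Consumed      |
   |              |  and Generated     |               |  and Generated|
   +--------------+--------------------+---------------+---------------+
   | RT-2 on GW   | Re-Originated by   | Re-Originated | Re-Originated |
   |              |  GW with I-ESI     |  with ESI 0   |  with I-ESI   |
   |              |                    |               |  (Site-ID)    |
   +--------------+--------------------+---------------+---------------+
   | RT-3 on GW   | Consumed and       | Consumed and  | Consumed and  |
   |              |  Generated         |  Generated    |  Generated    |
   +--------------+--------------------+---------------+---------------+
   | RT-4 on GW   | Consumed and       | Consumed and  | Consumed and  |
   |              |  Generated         |  Generated    |  Generated    |
   +--------------+--------------------+---------------+---------------+
   | RT-5 on GW   | [EVPN-IPVPN]       | Re-Originated | Re-Originated |
   +--------------+--------------------+---------------+---------------+
   | Route        | Separate RD for    | Separate RD for VIP and PI    |
   | Distinguisher|  Intra and Inter DC|                               |
   +--------------+--------------------+-------------------------------+
   | Route Target | Separate RT for    | Separate RT for VIP and PI    |
   |              |  Intra and Inter DC|                               |
   +--------------+--------------------+-------------------------------+
   | VNI          | Global and         | Global and Downstream         |
   |  Allocation  |  Downstream        |                               |
   +--------------+--------------------+-------------------------------+
   |  Stitching at| Gateway            | Gateway                       |
   +--------------+--------------------+-------------------------------+
   | DF Election  | Based on RT-4      | Based on RT-4                 |
   +--------------+--------------------+-------------------------------+
   |  Identifier  | I-ESI              | I-ESI (Site-ID)               |
   +--------------+--------------------+-------------------------------+
   |  Split       | Local Bias         | Local Bias                    |
   |   Horizon    |                    |                               |
   +--------------+--------------------+-------------------------------+
   |  ESI-Type    | Type 0             | Type 5 (AS Based) or          |
   |              |  (Operator Managed)|  Type 3 (MAC based)           |
   +--------------+--------------------+-------------------------------+
   | BUM Tree #   | 2, GW stitched     | 2, GW stitched                |
   |              | (Intra & Inter DC) |  (Intra & Inter DC)           |
   +--------------+--------------------+-------------------------------+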

3.1.  MS-EVPN Interconnect Requirements

   a. Scalability: Multi-Site EVPN (MS-EVPN) should be able to
   interconnect multiple sites, allowing for addition/deletion of new
   sites or modifying capacity of existing ones seamlessly.

   b. Multi-Destination traffic over unicast-only backbone: MS-EVPN
   mechanisms should provide an efficient forwarding mechanism for
   multi-destination frames by using existing network elements as-is. A
   large flat fabric rules out the option of ingress replication, as the
   number of replications becomes practically unachievable due to the
   internal hardware bandwidth needed.

   c. Maintain Site-specific Administrative control: MS-EVPN should be
   able to interconnect fabrics from different Administrative domains.
   The solution should allow for different sites to have different VLAN-
   VNI mappings, use different underlay routing protocols, and/or have
   different PIM-SM group ranges.

   d. Isolate fault domains: MS-EVPN technology hand-off should have
   the capability to isolate traffic across site boundaries and prevent
   defects from percolating from one site to another. As an example, a
   broadcast storm in a site should not propagate to other sites.

3.2.  MS-EVPN Interconnect concept and framework

   MS-EVPN is conceptualized as multiple EVPN control plane and NVO
   forwarding domains, interconnected via a single common EVPN control
   and NVO forwarding domain. A set of gateway nodes is identified by a
   unique identifier and together represents a site. A site is an EVPN
   domain, consisting of multiple EVPN nodes frontended by a set of
   Border Gateways.

   Border Gateways (BGWs) are explicitly part of one site-specific EVPN
   domain, and implicitly part of a common interconnect EVPN domain with
   BGWs from other sites. Although a BGW has only a single explicit
   site-id (that of the site it is a member of, see Section X.X), it can
   be considered to also have a second implicit site-id, that of the
   interconnect domain, which has as members all the BGWs from all
   sites that are being interconnected. BGWs act implicitly given they
   are the BGP next-hop from an entry/exit perspective; they perform
   both the control plane and the forwarding plane gateway functions.
   This lets site-internal nodes see all other sites as reachable only
   via their own BGWs.

   We describe the MS-EVPN deployment model using the topology shown in
   Figure 1. In the topology there are 3 sites, Site A, Site B, and
   Site C, that are interconnected using an IP backbone. This entire
   topology is deemed to be part of the same Data Center. In most
   deployments these sites can be thought of as pods, which may span a
   rack, a row, or multiple rows in the data center, depending on the
   size of domain desired for scale and fault and/or administrative
   isolation. Nothing prevents MS-EVPN from providing a long-distance or
   geographically dispersed Data Center Interconnect service.

   In this topology, site internal nodes are connected to each other by
   iBGP EVPN peering and BGWs are connected by eBGP Multi-hop EVPN
   peering towards remote site BGWs. We explicitly spell this out to
   ensure that we can re-use BGP semantics of route announcement between
   and across the sites. Other BGP mechanisms to instantiate this will
   be discussed in a separate document. This implies that each
   domain/site has its own AS number. In the topology, only 2 border
   gateways per site are shown; this is for ease of illustration and
   explanation. The technology poses no such limitation. As mentioned
   earlier, the site internal EVPN domain consists only of nodes within
   a site. A BGW is logically partitioned into a site internal EVPN
   domain towards the site and a common EVPN domain towards other sites
   (external). This allows the BGWs to act as control and forwarding
   plane gateways for forwarding traffic across sites.

   EVPN nodes within a site will discover each other via regular EVPN
   procedures and build site internal bidirectional VXLAN tunnels and
   multi-destination trees from leaves to BGWs. Similarly, BGWs will
   discover each other by regular EVPN procedures and build site
   external bidirectional VXLAN tunnels and multi-destination trees
   between them. We thus build an end-to-end bidirectional forwarding
   path across all sites by stitching (and not by stretching end-to-end)
   site internal VXLAN tunnels with site external VXLAN tunnels. In
   essence, an MS-EVPN fabric is built in a completely downstream and
   modular fashion.

       +----+    +----+        +----+    +----+          ___
       |    |    |    |        |    |    |    |           |
       |NVE1|    |NVE2|        |NVE3|    |NVE4|           |
       |    |    |    |        |    |    |    |           |
       +----+    +----+        +----+    +----+           |
         |         |             |         |            EVPN
     +------------------+    +------------------+        Ovl*
     |                  |    |                  |         |
     |     Site A       |    |      Site B      |         |
     | +----+    +----+ |    | +----+    +----+ |         |
     +-|    |----|    |-+    +-|    |----|    |-+         |
       |BGW1|    |BGW2|        |BGW3|    |BGW4|          ---
   +---|    |----|    |--------|    |----|    |---+       |
   |   +----+    +----+        +----+    +----+   |       |
   |                                              |       |
   |                 IP Backbone                  |      EVPN
   |                                              |      Ovl*
   |              +----+     +----+               |       |
   +--------------|    |-----|    |---------------+       |
                  |BGW5|     |BGW6|                      ---
              +---|    |-----|    |---+                   |
              |   +----+     +----+   |                   |
              |         Site C        |                   |
              |                       |                   |
              +-----------------------+                   |
                   |          |                         EVPN
                 +----+    +----+                       Ovl*
                 |    |    |    |                         |
                 |NVE5|    |NVE6|                         |
                 |    |    |    |                         |
                 +----+    +----+                        ---

   * EVPN-Ovl stands for EVPN-Overlay (and it's an interconnect option).

            Figure 1

   Intra-site tenant domains (for example, bridging, flood, routing, and
   multicast) are interconnected only via BGWs with site external tenant
   domains (bridging, flood, routing, and multicast respectively) from
   remote sites. The BGWs stitch such tenant domains (bridging, flood,
   routing, and multicast) in a completely downstream fashion using EVPN
   route advertisements. Such interconnects do not assume uniform
   mappings of MAC-VRF (or IP-VRF) to VNI across sites.

4.  Multi-site EVPN Interconnect Procedures

   In this section we describe the new functionalities in the Border
   Gateway (BGW) nodes for interconnecting EVPN sites within the DC.

   In a nutshell, BGW discovery will facilitate termination and
   re-origination of inter-site VXLAN tunnels. Such discovery provides
   flexibility for intra-site TEP-to-TEP VXLAN tunnels to co-exist with
   inter-site tunnels terminating on BGWs. Additionally, BGWs need to
   discover each other such that it is possible to run the Designated
   Forwarder (DF) election between the border nodes of a site. Each BGW
   also needs to be aware of remote BGWs such that it can allow for
   appropriate import/export of routes from other sites.

4.1.  Border Gateway Discovery

   BGW nodes of the same site MUST be configured with, or auto-generate,
   the same site-identifier. In addition, each BGW is aware of its site
   internal and site external connections. Nodes that are part of the
   same site will build VXLAN tunnels only between members of the same
   site including the BGWs; this is facilitated by site internal EVPN
   node reachability that stays site internal. BGWs will additionally
   build VXLAN tunnels between themselves and BGWs of remote sites. The
   remote BGWs are identified by the EVPN peering of type "external".
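
   The tunnel-building rule above can be illustrated with a short
   Python sketch. It is a simplified model under stated assumptions:
   the Node class, the tunnel_peers function, and the use of a site_id
   comparison in place of the "external" peering type are hypothetical
   and only serve to show which tunnel endpoints a node would hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical, reduced view of an EVPN node."""
    name: str
    site_id: int
    is_bgw: bool


def tunnel_peers(node: Node, fabric: list[Node]) -> set[str]:
    """Return the names of the nodes this node builds VXLAN tunnels to."""
    # Every node tunnels to the other members of its own site, BGWs included.
    peers = {n.name for n in fabric
             if n.site_id == node.site_id and n.name != node.name}
    # Only a BGW additionally tunnels to BGWs of remote sites.
    if node.is_bgw:
        peers |= {n.name for n in fabric
                  if n.is_bgw and n.site_id != node.site_id}
    return peers


# A subset of the Figure 1 topology: Site A is site 1, Site B is site 2.
fabric = [
    Node("NVE1", 1, False), Node("NVE2", 1, False),
    Node("BGW1", 1, True), Node("BGW2", 1, True),
    Node("NVE3", 2, False), Node("BGW3", 2, True),
]
print(sorted(tunnel_peers(fabric[0], fabric)))  # peers of NVE1
print(sorted(tunnel_peers(fabric[2], fabric)))  # peers of BGW1
```

   In this model NVE1 never holds a tunnel to NVE3 or BGW3; only the
   BGWs of Site A hold state for remote sites.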

   The site-identifier, used for BGW site participation and DF election,
   is encoded within a Site ESI label (I-ESI) itself as described below.

   In this specification, we reuse the AS-based Ethernet Segment
   Identifier (ESI) Type 5 (see Section 5 of [RFC7432]) that can be
   auto-generated or configured by the operator. It is repeated here to
   illustrate the encoding of the site-identifier.

   o Type 5 (T=0x05): The ESI value is constructed with the site-id
   parameter being embedded as follows.

    * AS number (4 octets). This is an AS number owned by the system and
   MUST be encoded in the high-order 4 octets of the ESI Value field. If
   a 2-octet AS number is used, the high-order extra 2 octets will be
   0x0000.

    * Local Discriminator/Site Identifier (4 octets): The Local
   Discriminator is also referred to as the Site Identifier and its
   value MUST be encoded as follows. The high-order 2 octets will be
   0x0000, and the low-order 2 octets will be set to the site-identifier
   to which this node belongs. All border gateways MUST announce this
   value. The AS number and the site identifier together need to be
   automatically reducible to fit within 6 octets; this enables auto
   import and export of routes (see the ES-Import RT definition in
   Section 7.6 of [RFC7432]).

    * Reserved (1 octet): The low-order octet of the ESI Value will be
   set to 0 on transmission and will be ignored on receipt.

          0   1   2   3   4   5   6   7   8   9
        +---+---+---+---+---+---+---+---+---+---+
        | T |          ESI Value                |
        +---+---+---+---+---+---+---+---+---+---+

            Figure 2
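
   As an illustration of the Type 5 encoding, the following Python
   sketch packs and unpacks the 10-octet ESI. The function names
   build_site_esi and parse_site_esi, as well as the example AS number
   and site-identifier, are assumptions made for this example and are
   not defined by this document.

```python
import struct

ESI_TYPE_AS_BASED = 0x05  # ESI Type 5, see Section 5 of RFC 7432


def build_site_esi(as_number: int, site_id: int) -> bytes:
    """Return T (1) | AS number (4) | Local Discriminator (4) | Reserved (1)."""
    if not 0 <= as_number <= 0xFFFFFFFF:
        raise ValueError("AS number must fit in 4 octets")
    if not 0 <= site_id <= 0xFFFF:
        raise ValueError("site-identifier must fit in 2 octets")
    # Big-endian packing. A 2-octet AS number or site-identifier placed in a
    # 4-octet field leaves the high-order 2 octets as 0x0000.
    return struct.pack(">BIIB", ESI_TYPE_AS_BASED, as_number, site_id, 0)


def parse_site_esi(esi: bytes) -> tuple[int, int]:
    """Return (as_number, site_id); the reserved octet is ignored on receipt."""
    if len(esi) != 10 or esi[0] != ESI_TYPE_AS_BASED:
        raise ValueError("not a 10-octet Type 5 ESI")
    _type, as_number, discriminator, _reserved = struct.unpack(">BIIB", esi)
    return as_number, discriminator & 0xFFFF


esi = build_site_esi(as_number=65001, site_id=1)
print(esi.hex())            # 050000fde90000000100
print(parse_site_esi(esi))  # (65001, 1)
```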

   The site identifier value must be globally unique within the
   deployment. Hence all BGWs are able to identify the other BGWs
   belonging to the same site and, armed with this information, are able
   to run a Designated Forwarder (DF) election that is scoped per site
   and per VNI, as opposed to the traditional Ethernet Segment DF
   election. That said, the usage of the Type 5 ESI is not mandatory;
   other ESI Types could be leveraged, as [RFC9014] describes. Such
   alternate numbering is sufficient as long as the type and value
   requirements have been satisfied globally, as well as for the set of
   BGWs serving a common site. For example, if an implementation chooses
   to leverage an ESI of Type 0 or Type 3 and encodes the
   site-identifier accordingly, this should not result in any
   disadvantage to any site internal or site external EVPN node.
   [RFC9014], for example, recommends the usage of ESI Type 0 for the
   I-ESI.

   In Figure 1, nodes BGW1, BGW2, BGW3, BGW4, BGW5, and BGW6 will
   announce the ESI Label and the per-VNI RT Extended Communities. Nodes
   BGW1 and BGW2 will perform a DF election for Site A, whereas nodes
   BGW3 and BGW4 will perform one for Site B. Even though all BGW nodes
   are able to see all the advertisements, the site identifier scopes
   the DF election (using RT-4 ES routes) to its site members. This
   specification uses the All-Active Redundancy Mode, especially when
   the Anycast model of route announcement is used for the local routes.
   It is noteworthy that even with the DF election based on RT-4, the
   EVPN RT-2 (MAC/IP route) will not carry any ESI in its NLRI and hence
   does not require a related RT-1 (EAD route) to be sent. Given the
   Anycast BGW model, no overlay multi-path is required, as the next-hop
   is always the VIP.
4.2.  Border Gateway Provisioning

   Border Gateways manage both the control-plane communications and the
   data forwarding plane for any traffic between sites.

   BGWs are implicitly discovered by any RT-2/RT-5 routes from other
   sites. Any RT-2/RT-5 route will be terminated and re-originated on
   such BGWs. RT-2/RT-5 routes carry downstream VNI labels. As BGW
   discovery is agnostic to symmetric or downstream VNI provisioning,
   rewriting next-hop attributes before re-advertising these routes from
   other sites to a given site provides the flexibility to keep
   different MAC-VRF or IP-VRF to VNI mappings in different sites and
   still be able to interconnect L3 and L2 domains.
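
   The termination and re-origination step can be sketched in Python as
   follows. This is a minimal model, not an implementation: the
   EvpnRoute fields, the reoriginate function, the VRF name, and all
   addresses and VNI values are hypothetical, and the sketch assumes
   that the BGW already knows which local VRF a received route belongs
   to and that it advertises its own site-local VNI for that VRF.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EvpnRoute:
    """Hypothetical, reduced view of an RT-2 or RT-5 route."""
    route_type: int  # 2 (MAC/IP) or 5 (IP prefix)
    key: str         # MAC address or IP prefix
    vrf: str         # MAC-VRF or IP-VRF the route belongs to
    vni: int         # downstream-assigned VNI carried in the route
    next_hop: str    # tunnel endpoint that traffic should be sent to


def reoriginate(route: EvpnRoute, local_vni_by_vrf: dict[str, int],
                gateway_ip: str) -> EvpnRoute:
    """Re-advertise a received route with the local VNI and next-hop."""
    if route.route_type not in (2, 5):
        raise ValueError("only RT-2 and RT-5 routes are re-originated")
    # The route key is preserved; the VNI and next-hop become those of the
    # re-originating BGW, so each site can keep its own VRF-to-VNI mapping.
    return replace(route, vni=local_vni_by_vrf[route.vrf], next_hop=gateway_ip)


# Route learned by a Site A BGW from a Site B BGW (VNI 20010 in Site B).
received = EvpnRoute(5, "10.20.0.0/24", "tenant-1", 20010, "192.0.2.34")
site_a_vni = {"tenant-1": 30010}  # Site A maps the same VRF to VNI 30010
advertised = reoriginate(received, site_a_vni, gateway_ip="192.0.2.12")
print(advertised)
```

   Site A internal nodes then see the prefix behind 192.0.2.12 with VNI
   30010 and hold no state for Site B's addressing or VNI plan.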

   RT-1, RT-3, and RT-4 from other sites will be terminated at the BGWs.
   As has been defined in the specifications, RT-3 routes carry
   downstream VNI labels and will be used to pre-build VXLAN tunnels in
   the common EVPN domain for L2, L3, and Multi-Destination traffic.

4.2.1.  Border Gateway Designated Forwarder Election

   In the presence of more than one BGW node in a site, forwarding of
   multi-destination L2 or L3 traffic both into the site and out of the
   site needs to be carried out by a single node. This node is termed
   the designated forwarder and is elected per VNI as per the rules
   defined in Section 8.5 of [RFC7432]. RT-4 Ethernet Segment routes are
   used for the DF election. In the multi-site deployment, the RT-4
   Ethernet Segment routes carry an ES-Import RT Extended Community
   attribute. These routes must be imported only by the local site
   members, that is, when the ES-Import value matches the receiver's own
   value. The 6-byte values are generated by concatenating the 4-byte AS
   number to which the member belongs with the 2-byte site-identifier.
   As a result, only local site members will match to form the candidate
   list. All the BGWs are able to extract the site-identifier from this
   attribute, and the list of nodes among which this election is run is
   thereby constrained to the BGWs of the same site.
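
   The derivation of the 6-byte ES-Import value and the resulting
   candidate list can be sketched as follows. The EsRoute class, the
   function names, and the example addresses and AS numbers are
   assumptions for illustration only; a real RT-4 route carries more
   fields than shown here.

```python
import ipaddress
import struct
from dataclasses import dataclass


def es_import_value(as_number: int, site_id: int) -> bytes:
    """Concatenate the 4-byte AS number with the 2-byte site-identifier."""
    return struct.pack(">IH", as_number, site_id)


@dataclass(frozen=True)
class EsRoute:
    """Hypothetical, reduced view of a received RT-4 route."""
    originator_ip: str
    es_import: bytes


def df_candidates(local_value: bytes, received: list[EsRoute]) -> list[str]:
    """Import only routes whose ES-Import value matches the local site."""
    matching = [r.originator_ip for r in received if r.es_import == local_value]
    return sorted(matching, key=ipaddress.ip_address)


site_a = es_import_value(65001, 1)
site_b = es_import_value(65002, 2)
routes = [
    EsRoute("10.0.0.2", site_a),  # BGW2
    EsRoute("10.0.0.3", site_b),  # BGW3, remote site, not imported
    EsRoute("10.0.0.1", site_a),  # BGW1
]
print(site_a.hex())                   # 0000fde90001
print(df_candidates(site_a, routes))  # ['10.0.0.1', '10.0.0.2']
```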

   In both modes (Anycast and Multipath), RT-3 routes will be generated
   locally and advertised by the DF winner Border Gateway with its
   unique gateway IP.  This facilitates building fast-converging flood
   domain connectivity inter-site and intra-site while at the same time
   avoiding duplicate traffic, because only the elected DF winner
   forwards multi-destination inter-site traffic.

   Failure events which lead to a BGW losing all of its connectivity to
   the IP interconnect backbone should trigger the BGW to withdraw its
   Border RT-4 Ethernet Segment route(s), to indicate to the other BGWs
   of the same site that it is no longer a candidate BGW.
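
   The per-VNI election and the effect of an RT-4 withdrawal can be
   illustrated with the default modulo-based procedure of Section 8.5
   of [RFC7432]. The sketch below assumes that the VNI takes the place
   of the VLAN identifier in that procedure; the function name elect_df
   and the addresses are hypothetical.

```python
import ipaddress


def elect_df(candidates: list[str], vni: int) -> str:
    """Pick the candidate whose ordinal equals (vni mod N)."""
    if not candidates:
        raise ValueError("no candidate BGWs")
    # Candidates are ordered by ascending IP address, ordinals start at 0.
    ordered = sorted(candidates, key=ipaddress.ip_address)
    return ordered[vni % len(ordered)]


site_a = ["10.0.0.2", "10.0.0.1"]
print(elect_df(site_a, 10000))  # 10.0.0.1
print(elect_df(site_a, 10001))  # 10.0.0.2

# 10.0.0.1 loses its backbone connectivity and withdraws its RT-4 route,
# so it drops out of the candidate list and the election is re-run.
remaining = [ip for ip in site_a if ip != "10.0.0.1"]
print(elect_df(remaining, 10000))  # 10.0.0.2
```

   With two candidates the VNIs are split evenly between the BGWs;
   after the withdrawal the remaining BGW becomes DF for every VNI.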

4.2.2.  Anycast Border Gateway

   In this mode all BGWs share the same gateway IP (VIP) and rewrite
   EVPN next-hop attributes with a shared logical next-hop entity.
   However, each of these BGWs will maintain a unique gateway IP (PIP)
   to facilitate building IR trees from site-local nodes to forward
   Multi-Destination traffic. EVPN RT-2 and RT-5 routes will be
   advertised to the nodes in the site from all other BGWs, and the BGWs
   will run a DF election per VNI for Multi-Destination traffic. RT-3
   routes will be advertised by the DF winner BGW for a given VNI so
   that only the DF will receive and forward inter-site traffic. It is
   also possible for all BGWs at a site to advertise and draw traffic to
   improve the convergence properties of the network. In case of
   multi-destination trees built by non-EVPN procedures (say PIM), all
   BGWs will receive but only the DF winner will forward traffic.

   It is recommended that BGW be enabled in the Anycast mode wherein the
   BGW functionality is available to the rest of the network as a single
   logical entity for inter-site communication. In the absence of
   Anycast capability the BGW could be enabled as individual gateways.
   As of now, the Border Gateway system MAC of the other border nodes
   belonging to the same site is expected to be configured out-of-band.

4.2.3.  Multi-path Border Gateway

   In this mode, Border Gateways rewrite the EVPN next-hop attribute
   with unique next-hop entities. This provides the flexibility to
   apply the usual policies and to pick per-VRF, per-VNI, or per-flow
   primary/backup Border Gateways. Hence, an intra-site node will see
   each BGW as a next hop for any external L2 or L3 unicast
   destination and will perform ECMP path selection to load-balance
   traffic sent to external destinations. If an intra-site node is not
   capable of ECMP hash-based path selection (as may be the case for
   some L2 forwarding implementations), the node is expected to choose
   one of the BGWs as its designated forwarder. EVPN RT-2 and RT-5
   routes will be advertised to the nodes in the site from all Border
   Gateways, and the Border Gateways will run a DF election per VNI for
   multi-destination traffic. RT-3 routes will be advertised by the DF
   winner Border Gateway for a given VNI so that only the DF will
   receive and forward inter-site traffic. It is also possible for all
   Border Gateways at a site to advertise and draw traffic in order to
   improve the convergence properties of the network. In the case of
   multi-destination trees built by non-EVPN procedures (say PIM), all
   Border Gateways will receive the traffic, but only the DF winner
   will forward it. The Multi-path Border Gateway follows the model of
   the Interconnect ESI (I-ESI) as described in [RFC9014]. With this
   multi-path requirement, the RT-2 routes are labeled with the I-ESI,
   and an RT-1 route is used for route resolution as described in
   [RFC7432], Section 9.2.2.
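
   The ECMP behavior of an intra-site node described above can be
   sketched as follows. This is a minimal illustration only: the flow
   key fields, the use of SHA-256, and the names Flow and pick_bgw are
   assumptions made for the example and are not defined by this
   document.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """Hypothetical flow key; the draft does not define the hash inputs."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def pick_bgw(flow: Flow, bgw_next_hops: Sequence[str]) -> str:
    """Pick one BGW next hop for a flow using a deterministic hash."""
    if not bgw_next_hops:
        raise ValueError("no Border Gateway next hop available")
    # Sort so the result does not depend on the order routes were learned.
    candidates = sorted(bgw_next_hops)
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(candidates)
    return candidates[index]
```

   A node that cannot hash per flow would instead call such a function
   with a single candidate, which corresponds to choosing one BGW as
   its designated forwarder.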

4.3.  EVPN route processing at Border Gateway

   BGW functionality in an EVPN site SHOULD be enabled on more than one
   node in the network for redundancy and high-availability purposes.
   Any external RT-2/RT-5 routes that are received by the BGWs of a site
   are advertised to all the intra-site nodes by all the BGWs. Internal
   RT-2/RT-5 routes received by the BGWs from the intra-site nodes are
   advertised by all the BGWs of the site to the remote BGWs, so any
   L2/L3 known unicast traffic to internal destinations can be sent to
   any one of the local BGWs by remote sources. For known L2 and L3
   unicast traffic, the individual BGWs will behave either as a single
   logical forwarding node (Anycast model) or as a set of active
   forwarding nodes.

   All control plane and data plane states are interconnected in a
   complete downstream fashion. For example, BGP import rules for an
   RT-3 route should be able to extend a flood domain for a VNI, and
   flood traffic destined to the advertising EVPN node should carry the
   VNI that is announced in the RT-3 route. Similarly, Type 2 and Type 5
   control and forwarding states should be interconnected in a complete
   downstream fashion.

   o Route Target processing for RT-4 routes: Every IP-VRF and MAC-VRF
   will generate an RT-4 route with the format described in Section 4.1.
   Route targets can be auto-derived from the Ethernet Tag ID (VLAN ID)
   for that EVPN instance as described in Section 7.10.1 of [RFC7432].
   The ES-Import route target extended community described in Section
   7.6 of [RFC7432] is mandatory for RT-4 routes in this context. The
   encoding of the ES-Import route target is based on the AS number and
   Site-identifier as described in Section 4.2.1. Such an import route
   target allows RT-4 routes to be imported only by the Border Gateways
   of the same site.

   o Route Target processing for RT-2, RT-3, RT-5 routes: These routes
   will carry either auto-derived route targets (based on the Ethernet
   Tag ID (VLAN ID) for that EVPN instance) or explicit route targets.
   The usual import rules on the Border Gateways will import these
   routes, and the Border Gateways will re-advertise them with Border
   Gateway next hops. Border Gateways that import and re-advertise
   routes SHOULD implement a mechanism to avoid looping of updates
   should they come back to the Border Gateways. RT-3 routes received
   from other Border Gateways will be imported and processed on Border
   Gateways but MUST NOT be advertised again.
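
   The re-advertisement rules in the second bullet can be sketched as
   follows. This is an illustration under stated assumptions: the
   EvpnRoute type, the from_remote_bgw flag, and the readvertised_by
   marker used for loop avoidance are hypothetical; this document only
   requires that some loop-avoidance mechanism exist and does not
   define one.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EvpnRoute:
    route_type: int           # EVPN route type: 2, 3, or 5
    next_hop: str
    from_remote_bgw: bool     # True if learned from a BGW of another site
    # Hypothetical loop marker: BGWs that already re-advertised the route.
    readvertised_by: frozenset = frozenset()


def readvertise(route: EvpnRoute, local_bgw: str) -> Optional[EvpnRoute]:
    """Return the route to re-advertise, or None if it is suppressed."""
    if route.route_type not in (2, 3, 5):
        return None  # other route types are outside this sketch
    if route.route_type == 3 and route.from_remote_bgw:
        # Imported and processed locally, but MUST NOT be advertised again.
        return None
    if local_bgw in route.readvertised_by:
        # The update came back to this BGW: drop it to avoid a loop.
        return None
    return replace(
        route,
        next_hop=local_bgw,  # re-advertise with the Border Gateway next hop
        readvertised_by=route.readvertised_by | {local_bgw},
    )
```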

4.4.  Multi-Destination tree between Border Gateways

   The procedures described here recommend building an Ingress
   Replication (IR) tree between Border Gateways. This allows every
   site to independently build site-specific multi-destination trees.
   End-to-end multi-destination trees between leafs could be PIM (site
   1) + IR (between Border Gateways) + PIM (site 2), IR-IR-IR, or
   PIM-IR-IR. However, this does not rule out using IR-PIM-IR or
   end-to-end PIM to build multi-destination trees end-to-end.

   Border Gateways will generate RT-3 routes with a unique gateway IP
   and advertise them to the Border Gateways of other sites. These RT-3
   routes help in building IR trees between Border Gateways. However,
   only the DF winner per VNI will forward multi-destination traffic
   across sites.

   As Border Gateways are part of both site-specific and inter-site
   multi-destination IR trees, a split-horizon mechanism will be used
   to avoid loops. The multi-destination tree with the Border Gateway as
   root towards other sites (or Border Gateways) will be in a separate
   split-horizon group. Similarly, the multi-destination IR tree with
   the Border Gateway as root towards site-local nodes will be in
   another split-horizon group.

   If PIM is used to build multi-destination trees in the site-specific
   domain, all Border Gateways will join such PIM trees and draw
   multi-destination traffic. However, only the DF Border Gateway will
   forward traffic towards other sites.
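
   The combination of split-horizon groups and DF-only forwarding
   described in this section can be sketched as follows. The names
   Group and egress_groups are hypothetical. The sketch also assumes
   that a non-DF Border Gateway does not flood frames arriving from
   other sites into the local site, so that duplicates are avoided;
   this section states the DF rule explicitly only for the direction
   towards other sites.

```python
from enum import Enum
from typing import List


class Group(Enum):
    SITE_LOCAL = "site-local"   # tree rooted at the BGW towards local nodes
    INTER_SITE = "inter-site"   # tree rooted at the BGW towards other sites


def egress_groups(ingress: Group, is_df: bool) -> List[Group]:
    """Return the split-horizon groups a BGW floods a frame into."""
    if ingress is Group.SITE_LOCAL:
        # Never flood back into the site; only the DF forwards across sites.
        return [Group.INTER_SITE] if is_df else []
    # Frame arrived from another site: never return it to the inter-site
    # tree. Assumption: only the DF floods it into the local site.
    return [Group.SITE_LOCAL] if is_df else []
```

   Because a frame is never flooded back into the group it arrived
   from, it cannot loop between the site-local and inter-site trees.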

4.5.  Inter-site Unicast traffic

   As a site-internal node will see all site-external EVPN routes via
   the Border Gateways, VXLAN tunnels will be built between leafs and
   site-internal Border Gateways, and inter-site VXLAN tunnels will be
   built between Border Gateways in different sites. An end-to-end
   bidirectional VXLAN forwarding path between inter-site leafs will
   consist of a VXLAN tunnel from the leaf (say in site A) to its Border
   Gateway (BGW1), another VXLAN tunnel from that Border Gateway (BGW1)
   to a Border Gateway (BGW3) in another site (say site B), and a third
   VXLAN tunnel from that Border Gateway (BGW3) to the leaf in site B.
   Such a hierarchical tunnel topology is more scalable, as a full mesh
   of VXLAN tunnels across inter-site leafs is replaced by a
   combination of intra-site and inter-site tunnels.
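
   The scaling argument can be made concrete with a small calculation.
   The sketch below counts bidirectional tunnels (endpoint pairs) under
   the simplifying assumptions that every site has the same number of
   leafs and of individually addressed Border Gateways; the function
   names and the example numbers are illustrative only.

```python
def inter_site_leaf_mesh(sites: int, leafs_per_site: int) -> int:
    """Tunnels needed if every leaf peers with every leaf in other sites."""
    site_pairs = sites * (sites - 1) // 2
    return site_pairs * leafs_per_site ** 2


def hierarchical_tunnels(sites: int, leafs_per_site: int,
                         bgws_per_site: int) -> int:
    """Tunnels needed with leaf-to-BGW and BGW-to-BGW tunnels only."""
    site_pairs = sites * (sites - 1) // 2
    intra_site = sites * leafs_per_site * bgws_per_site
    inter_site = site_pairs * bgws_per_site ** 2
    return intra_site + inter_site


# Example: 4 sites, 50 leafs and 2 Border Gateways per site.
print(inter_site_leaf_mesh(4, 50))       # 15000
print(hierarchical_tunnels(4, 50, 2))    # 424
```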

   L2 and L3 unicast frames from site-internal leafs will reach the
   Border Gateway using VXLAN encapsulation. At the Border Gateway, the
   VXLAN header is stripped and another VXLAN header is pushed to send
   the frames to the destination site Border Gateway. The destination
   site Border Gateway will strip the VXLAN header and push another
   VXLAN header to send the frames to the destination site leaf.
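
   The strip-and-push behavior at each Border Gateway can be sketched
   as follows. The VxlanPacket type and the reencapsulate function are
   hypothetical, and the outer header is reduced to the tunnel
   endpoints and the VNI. The VNI for the next tunnel is passed in
   explicitly because it is taken from the downstream route, which this
   sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VxlanPacket:
    outer_src: str   # source tunnel endpoint
    outer_dst: str   # destination tunnel endpoint
    vni: int
    payload: bytes   # inner frame, carried unchanged end to end


def reencapsulate(packet: VxlanPacket, local_vtep: str,
                  next_vtep: str, next_vni: int) -> VxlanPacket:
    """Strip the received VXLAN header and push one for the next tunnel."""
    inner = packet.payload  # decapsulation keeps only the inner frame
    return VxlanPacket(local_vtep, next_vtep, next_vni, inner)
```

   Applying the function once at BGW1 and once at BGW3 yields the three
   tunnel segments of the end-to-end path (leaf to BGW1, BGW1 to BGW3,
   BGW3 to leaf), with the inner frame unchanged throughout.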

4.6.  Inter-site Multi-destination traffic

   Multi-destination traffic will be forwarded from one site to another
   only by the DF for that VNI. As frames reach the Border Gateway from
   site-internal nodes, the VXLAN header will be removed from the
   payload, and the payload will be encapsulated with another VXLAN
   header (derived from the downstream RT-3 EVPN routes received from
   the Border Gateways of the destination site) and forwarded to the
   destination site Border Gateway. Similarly, the destination site
   Border Gateway will strip the VXLAN header and forward the payload,
   after encapsulating it with another VXLAN header, towards the
   destination leaf.

   As explained in Section 4.4, a split-horizon mechanism will be used
   to avoid looping of inter-site multi-destination frames.

4.7.  Host Mobility

   Host movement handling will be the same as defined in [RFC7432]. When
   a host moves, EVPN RT-2 routes with an updated sequence number will
   be propagated to every EVPN node. When a host moves inter-site, only
   the Border Gateways may see EVPN updates with both next-hop attribute
   and sequence number changes, while leafs may see updates with only
   the sequence number changed; this is as described in [RFC9014],
   Section 4.4.4. In other cases, however, both Border Gateways and
   leafs may see next-hop and sequence number changes.
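
   The sequence number comparison that drives host mobility can be
   sketched as follows, based on the MAC Mobility procedures of
   [RFC7432]: the higher sequence number wins, and on a tie the
   advertisement from the lowest IP address is selected. The MacRoute
   type and the preferred function are hypothetical, sequence number
   wrap-around is ignored, and both next hops are assumed to be of the
   same address family.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class MacRoute:
    mac: str
    next_hop: str
    sequence: int   # MAC Mobility sequence number


def preferred(current: MacRoute, update: MacRoute) -> MacRoute:
    """Select between two RT-2 routes for the same MAC address."""
    if update.sequence != current.sequence:
        return update if update.sequence > current.sequence else current
    # Same sequence number: the lowest advertising IP address wins.
    return min(current, update,
               key=lambda route: ipaddress.ip_address(route.next_hop))
```

   For an inter-site move, a leaf would typically compare two routes
   that both carry its local Border Gateway as the next hop and differ
   only in the sequence number.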

5.  Convergence

5.1.  Fabric to Border Gateway Failure

   If a Border Gateway is lost, the Border Gateway next hop will be
   withdrawn for RT-2/RT-5 routes. In addition, a per-VNI DF election
   will be triggered to choose a new DF. The new DF winner will become
   the forwarder of multi-destination inter-site traffic.
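
   The re-election after the loss of a Border Gateway can be sketched
   as follows. The sketch assumes the default modulo-based election of
   [RFC7432] with the VNI taking the place of the Ethernet Tag; that
   choice is an assumption made for illustration and is not a
   restatement of the election rules defined earlier in this document.
   The function name elect_df is hypothetical.

```python
import ipaddress
from typing import Iterable


def elect_df(vni: int, bgw_ips: Iterable[str]) -> str:
    """Elect the DF for a VNI among the Border Gateways still present."""
    # Order candidates by numeric IP address, lowest first.
    ordered = sorted(set(bgw_ips), key=ipaddress.ip_address)
    if not ordered:
        raise ValueError("no Border Gateway left in the site")
    return ordered[vni % len(ordered)]


bgws = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
print(elect_df(10001, bgws))        # 192.0.2.3
bgws.remove("192.0.2.3")            # the DF for VNI 10001 is lost
print(elect_df(10001, bgws))        # 192.0.2.2
```

   With a modulo-based scheme, the DF may also change for VNIs whose
   DF was not the lost node, because the number of candidates changes.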

5.2.  Border Gateway to Border Gateway Failures

   If the inter-site cloud has link failures, the direct forwarding
   path between Border Gateways can be lost. In this case, traffic from
   one site can reach another site via a Border Gateway of an
   intermediate site. However, this will be handled like a regular
   underlay failure, and the traffic termination endpoints will stay the
   same for inter-site traffic flows.

6.  Interoperability

   The procedures defined here apply only to Border Gateways. Therefore,
   other EVPN nodes in the network should be [RFC7432] compliant to
   operate in such topologies.

   As the procedures described here are applicable only based on the
   respective topology configuration or discovery, other connected
   domains that are not capable of such a multi-site gateway model can
   work in regular EVPN mode. In the case where remote sites operate in
   different modes, for example some in Anycast mode and others in
   Multi-Path or [RFC9014] mode, the Anycast BGW will be able to
   accommodate either and adjust to the respective mode. The signaling
   of the respective mode is driven through the presence of the ESI in
   RT-2 routes and the per-ES EAD RT-1 route.
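
   One possible reading of this mode signaling is sketched below. The
   mapping is an assumption made for illustration: a non-zero ESI in
   the RT-2 route together with a per-ES EAD RT-1 route is treated as
   the Multi-Path (I-ESI) model, an all-zero ESI as the Anycast model,
   and a non-zero ESI without the RT-1 route as not yet resolvable.
   The function name and the returned labels are hypothetical.

```python
ZERO_ESI = bytes(10)   # an ESI is 10 octets; all zeros means "no ESI"


def remote_site_mode(rt2_esi: bytes, has_per_es_ead: bool) -> str:
    """Classify how a remote site's BGWs present themselves."""
    if len(rt2_esi) != 10:
        raise ValueError("an ESI must be exactly 10 octets")
    if rt2_esi == ZERO_ESI:
        return "anycast"        # assumption: no I-ESI is signaled
    if has_per_es_ead:
        return "multi-path"     # RT-2 carries an I-ESI, RT-1 resolves it
    return "unresolved"         # I-ESI seen, but no RT-1 received yet
```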

   The procedures here provide the flexibility to connect non-EVPN
   VXLAN sites by provisioning Border Gateways on such sites and
   interconnecting those Border Gateways with the Border Gateways of
   other sites. Such Border Gateways in non-EVPN VXLAN sites will play
   the dual role of EVPN gateway towards the common EVPN domain and
   non-EVPN gateway towards the non-EVPN VXLAN site.

7.  Isolation of Fault Domains

   Isolation of network defects requires policies like storm control,
   security ACLs, etc., to be implemented at site boundaries. Border
   Gateways should be capable of inspecting the inner payload of packets
   received from VXLAN tunnels and enforcing configured policies to
   prevent defects from percolating from one part of the network to the
   rest.

8.  MVPN with Multi-site EVPN

   BGP-based MVPN as defined in [RFC6513] and [RFC6514] will coexist
   with Multisite-EVPN without any changes to the route types and
   encodings defined for MVPN in these RFCs. Route Distinguisher and VRF
   Route Import extended communities will be attached to MVPN routes as
   defined in the BGP MVPN RFCs. Import and export route targets will be
   attached to MVPN routes either by auto-generating them from the VNI
   or by explicit configuration per MVPN. Since the BGP MVPN RFCs can
   use any VPN address family to provide the RPF information needed to
   build C-multicast trees, EVPN route types will be used to provide the
   required RPF information for multicast sources in MVPNs. In order to
   follow the segmentation model of Multisite-EVPN, the following
   procedures are recommended to build provider and customer multicast
   trees between sources and receivers across sites.

8.1.  Inter-Site MI-PMSI

   As defined in the above-mentioned MVPN RFCs, I-PMSI A-D routes are
   used to signal a provider tunnel or MI-PMSI per MVPN. Multisite-EVPN
   recommends EVPN Type-3 routes to build such an MI-PMSI provider
   tunnel per VPN between the Border Gateways of different sites. Every
   MVPN node will use its unique router identifier to build these
   MI-PMSI provider tunnels. In the Anycast Border Gateway model as
   well, these MI-PMSI provider tunnels are built using the unique
   router identifiers of the Border Gateways. In a similar fashion,
   these Type-3 routes can be used to build an MI-PMSI provider tunnel
   per MVPN within sites.

8.2.  Stitching of customer multicast trees across sites

   All Border Gateways will rewrite the next hop and re-originate MVPN
   routes received from other sites to the local site and from the
   local site to other sites. Therefore, customer multicast trees will
   be logically built end-to-end across sites by stitching these trees
   via Border Gateways. A C-multicast join route (say MVPN Type 7) will
   follow the EVPN RPF path to build the C-multicast tree from a leaf in
   a site to its Border Gateway and on to the destination site leafs via
   the destination site Border Gateways. Similarly, a Source Active A-D
   MVPN route (MVPN Type 5) will be re-originated with a rewritten next
   hop via Border Gateways so that source C-multicast trees are stitched
   via Border Gateways.

8.3.  RP placement across sites

   Multisite-EVPN recommends only source C-multicast trees across
   sites. Therefore, customer RP placement per MVPN should be restricted
   to within sites. The Source Active A-D MVPN route type (Type 5) will
   be used to signal C-multicast sources across sites.

8.4.  Inter-Site S-PMSI

   As defined in the BGP MVPN RFCs, S-PMSI A-D routes (MVPN Type 3) will
   be used to signal selective PMSI trees for high-bandwidth C-multicast
   streams. These S-PMSI A-D routes will be signaled across sites via
   Border Gateways, which rewrite the next hop and re-originate them to
   other sites. The PMSI Tunnel attribute in re-originated S-PMSI routes
   will be adjusted to the provider tunnel types between Border Gateways
   across sites.

9.  Secure Data and Control Plane

   The use case in [SECURE-EVPN] is centered around providing inter-site
   and WAN connectivity over the public Internet in a secure manner,
   with the same level of privacy, integrity, and authentication for
   tenant traffic as IPsec tunneling using IKEv2. The multi-site
   enhancements in this draft, in conjunction with the definitions
   specified in [SECURE-EVPN], can provide EVPN domains with secure
   communications between them.

10.  Acknowledgements

   The authors would like to thank Max Ardica, Murali Garimella, Anuj
   Mittal, Lilian Quan, Veera Ravinutala, and Tarun Wadhwa for their
   review and comments.

11.  Security Considerations


12.  IANA Considerations


13.  References

13.1.  Normative References

   [SHARMA-MULTI-SITE] Sharma et al., "Multi-site EVPN based VXLAN
              using Border Gateways", 2017, <draft-sharma-multi-

   [RFC9014] Rabadan et al., "Interconnect Solution for Ethernet VPN
              (EVPN) Overlay Networks", RFC 9014, DOI 10.17487, May
              2021, <>.

   [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015, <>.

   [EVPN-IPVPN] J. Rabadan et. al., "EVPN Interworking with IPVPN",
              2021, <

   [SECURE-EVPN] A. Sajassi et. al., "Secure EVPN", 2021,

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, DOI
              10.17487/RFC2119, March 1997, <https://www.rfc-

   [KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, DOI
              10.17487/RFC2119, March 1997, <http://www.rfc-

   [RFC1776]  Crocker, S., "The Address is the Message", RFC 1776, DOI
              10.17487/RFC1776, April 1 1995, <http://www.rfc-

   [TRUTHS]   Callon, R., "The Twelve Networking Truths", RFC 1925, DOI
              10.17487/RFC1925, April 1 1996, <http://www.rfc-

13.2.  Informative References

   [RFC6513] Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/BGP
              IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February 2012,

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012,

   [RFC7209]  Sajassi, A., Ed., Aggarwal, R., Uttaro, J., Bitar, N.,
              Henderickx, W., and Isaac, A., "Requirements for Ethernet
              VPN (EVPN)", RFC 7209, DOI 10.17487/RFC7209, May 2014,

   [RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
              Uttaro, J., and W. Henderickx, "A Network Virtualization
              Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, DOI
              10.17487/RFC8365, March 2018, <https://www.rfc-

   [EVILBIT]  Bellovin, S., "The Security Flag in the IPv4 Header",
              RFC 3514, DOI 10.17487/RFC3514, April 1 2003,

   [RFC5513]  Farrel, A., "IANA Considerations for Three Letter
              Acronyms", RFC 5513, DOI 10.17487/RFC5513, April 1 2009,

   [RFC5514]  Vyncke, E., "IPv6 over Social Networks", RFC 5514, DOI
              10.17487/RFC5514, April 1 2009, <http://www.rfc-

14. Authors' Addresses

   Lukas Krattiger (editor)
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA


   Ayan Banerjee (editor)
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA


   Ali Sajassi
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA


   Rajesh Sharma
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA


   Raghava Sivaramu
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA

