Network Working Group Sami Boutros
INTERNET-DRAFT Ali Sajassi
Category: Standards Track Samer Salam
Dennis Cai
Samir Thoria
Cisco
John Drake
Juniper
Expires: January 16, 2014 July 16, 2013
VXLAN DCI Using EVPN
draft-boutros-l2vpn-vxlan-evpn-02.txt
Abstract
This document describes how Ethernet VPN (EVPN) technology can be
used to interconnect VXLAN or NVGRE networks over an MPLS/IP network.
This is to provide intra-subnet connectivity at Layer 2 and control-
plane separation among the interconnected VXLAN or NVGRE networks.
The scope of the learning of host MAC addresses in VXLAN or NVGRE
network is limited to data plane learning in this document.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Boutros Expires January 16, 2014 [Page 1]
INTERNET DRAFT VXLAN-EVPN July 16, 2013
Copyright and License Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1. Control Plane Separation among VXLAN/NVGRE Networks . . . . 3
2.2 All-Active Multi-homing . . . . . . . . . . . . . . . . . . 4
2.3 Layer 2 Extension of VNIs/VSIDs over the MPLS/IP Network . . 4
2.4 Support for Integrated Routing and Bridging (IRB) . . . . . 4
3. Solution Overview . . . . . . . . . . . . . . . . . . . . . . . 4
3.1. Redundancy and All-Active Multi-homing . . . . . . . . . . 5
4. EVPN Routes . . . . . . . . . . . . . . . . . . . . . . . . . 6
4.1. BGP MAC Advertisement Route . . . . . . . . . . . . . . . 6
4.2. Ethernet Auto-Discovery Route . . . . . . . . . . . . . . 7
4.3. Per VPN Route Targets . . . . . . . . . . . . . . . . . . 7
4.4 Inclusive Multicast Route . . . . . . . . . . . . . . . . . 7
4.5. Unicast Forwarding . . . . . . . . . . . . . . . . . . . . 7
4.6. Handling Multicast . . . . . . . . . . . . . . . . . . . . 8
4.6.2. Multicast Stitching with Per-VNI Load Balancing . . . . 9
5. NVGRE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10
7. Security Considerations . . . . . . . . . . . . . . . . . . . 10
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10
9.1 Normative References . . . . . . . . . . . . . . . . . . . 10
9.2 Informative References . . . . . . . . . . . . . . . . . . 10
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10
Boutros Expires January 16, 2014 [Page 2]
INTERNET DRAFT VXLAN-EVPN July 16, 2013
1 Introduction
[EVPN] introduces a solution for multipoint L2VPN services, with
advanced multi-homing capabilities, using BGP control plane over the
core MPLS/IP network. [VXLAN] defines a tunneling scheme to overlay
Layer 2 networks on top of Layer 3 networks. [VXLAN] allows for
optimal forwarding of Ethernet frames with support for multipathing
of unicast and multicast traffic. VXLAN uses UDP/IP encapsulation for
tunneling.
In this document, we discuss how Ethernet VPN (EVPN) technology can
be used to interconnect VXLAN or NVGRE networks over an MPLS/IP
network. This is achieved by terminating the VxLAN tunnel at the the
hand-off points, performing data plane MAC learning of customer
traffic and providing intra-subnet connectivity for the customers at
Layer 2 across the MPLS/IP core. The solution maintains control-plane
separation among the interconnected VXLAN or NVGRE networks. The
scope of the learning of host MAC addresses in VXLAN or NVGRE network
is limited to data plane learning in this document. The distribution
of MAC addresses in control plane using BGP in VXLAN or NVGRE network
is outside of the scope of this document and it is covered in [EVPN-
OVERLY].
1.1 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
LDP: Label Distribution Protocol
MAC: Media Access Control
MPLS: Multi Protocol Label Switching
NVO: Network Virtualization Overlay
NVE: NVO Endpoint
OAM: Operations, Administration and Maintenance
PE: Provide Edge Node
PW: PseudoWire
TLV: Type, Length, and Value
VPLS: Virtual Private LAN Services
VXLAN: Virtual eXtensible Local Area Network
VTEP: VXLAN Tunnel End Point
VNI: VXLAN Network Identifier (or VXLAN Segment ID)
ToR: Top of Rack switch
2. Requirements
2.1. Control Plane Separation among VXLAN/NVGRE Networks
Boutros Expires January 16, 2014 [Page 3]
INTERNET DRAFT VXLAN-EVPN July 16, 2013
It is required to maintain control-plane separation for the underlay
networks (e.g., among the various VXLAN/NVGRE networks) being
interconnected over the MPLS/IP network. This ensures the following
characteristics:
- scalability of the IGP control plane in large deployments and fault
domain localization, where link or node failures in one site do not
trigger re-convergence in remote sites.
- scalability of multicast trees as the number of interconnected
networks scales.
2.2 All-Active Multi-homing
It is important to allow for all-active multi-homing of the
VXLAN/NVGRE network to MPLS/IP network where traffic from a VTEP can
arrive at any of the PEs and can be forwarded accordingly over the
MPLS/IP network. Furthermore, traffic destined to a VTEP can be
received over the MPLS/IP network at any of the PEs connected to the
VXLAN/NVGRE network and be forwarded accordingly. The solution MUST
support all-active multi-homing to an VXLAN/NVGRE network.
2.3 Layer 2 Extension of VNIs/VSIDs over the MPLS/IP Network
It is required to extend the VXLAN VNIs or NVGRE VSIDs over the
MPLS/IP network to provide intra-subnet connectivity between the
hosts (e.g. VMs) at Layer 2.
2.4 Support for Integrated Routing and Bridging (IRB)
The data center WAN edge node is required to support integrated
routing and bridging in order to accommodate both inter-subnet
routing and intra-subnet bridging for a given VNI/VSID. For example,
inter-subnet switching is required when a remote host connected to an
enterprise IP-VPN site wants to access an application resided on a
VM.
3. Solution Overview
Every VXLAN/NVGRE network, which is connected to the MPLS/IP core,
runs an independent instance of the IGP control-plane. Each PE
participates in the IGP control plane instance of its VXLAN/NVGRE
network.
Each PE node terminates the VXLAN or NVGRE data-plane encapsulation
where each VNI or VSID is mapped to a bridge-domain. The PE performs
data plane MAC learning on the traffic received from the VXLAN/NVGRE
network.
Boutros Expires January 16, 2014 [Page 4]
INTERNET DRAFT VXLAN-EVPN July 16, 2013
Each PE node implements EVPN or PBB-EVPN to distribute in BGP either
the client MAC addresses learnt over the VXLAN tunnel in case of
EVPN, or the PEs' B-MAC addresses in case of PBB-EVPN. In the PBB-
EVPN case, client MAC addresses will continue to be learnt in data
plane.
Each PE node would encapsulate the Ethernet frames with MPLS when
sending the packets over the MPLS core and with the VXLAN or NVGRE
tunnel header when sending the packets over the VXLAN or NVGRE
Network.
+--------------+
| |
+---------+ +----+ MPLS +----+ +---------+
+-----+ | |---|PE1 | |PE3 |--| | +-----+
|NVE1 |--| | +----+ +----+ | |--|NVE3 |
+-----+ | VXLAN | +----+ +----+ | VXLAN | +-----+
+-----+ | |---|PE2 | |PE4 |--| | +-----+
|NVE2 |--| | +----+Backbone+----+ | |--|NVE4 |
+-----+ +---------+ +--------------+ +---------+ +-----+
|<--- Underlay IGP ---->|<-Overlay BGP->|<--- Underlay IGP --->| CP
|<----- VXLAN --------->|<EVPN/PBB-EVPN>|<------ VXLAN ------->| DP
|<----MPLS----->|
Legend: CP = Control Plane View DP = Data Plane View
Figure 1: Interconnecting VXLAN Networks with VXLAN-EVPN
3.1. Redundancy and All-Active Multi-homing
When a VXLAN network is multi-homed to two or more PEs, and provided
that these PEs have the same IGP distance to a given NVE, the
solution MUST support load-balancing of traffic between the NVE and
the MPLS network, among all the multi-homed PEs. This maximizes the
use of the bisectional bandwidth of the VXLAN network. One of the
main capabilities of EVPN/PBB-EVPN is the support for all-active
multi-homing, where the known unicast traffic to/from a multi-homed
site can be forwarded by any of the PEs attached to that site. This
ensures optimal usage of multiple paths and load balancing. EVPN/PBB-
EVPN, through its DF election and split-horizon filtering mechanisms,
ensures that no packet duplication or forwarding loops result in such
scenarios. In this solution, the VXLAN network is treated as a
multi-homed site for the purpose of EVPN operation.
Since the context of this solution is VXLAN networks with data-plane
Boutros Expires January 16, 2014 [Page 5]
INTERNET DRAFT VXLAN-EVPN July 16, 2013
learning paradigm, it is important for the multi-homing mechanism to
ensure stability of the MAC forwarding tables at the NVEs, while
supporting all-active forwarding at the PEs. For example, in Figure 1
above, if each PE uses a distinct IP address for its VTEP tunnel,
then for a given VNI, when an NVE learns a host's MAC address against
the originating VTEP source address, its MAC forwarding table will
keep flip-flopping among the VTEP addresses of the local PEs. This is
because a flow associated with the same host MAC address can arrive
at any of the PE devices. In order to ensure that there is no
flip/flopping of MAC-to-VTEP address associations, an IP Anycast
address MUST be used as the VTEP address on all PEs multi-homed to a
given VXLAN network. The use of IP Anycast address has two
advantages:
a) It prevents any flip/flopping in the forwarding tables for the
MAC-to-VTEP associations
b) It enables load-balancing via ECMP for DCI traffic among the
multi-homed PEs
In the baseline [EVPN] draft, the all-active multi-homing is
described for a multi-homed device (MHD) using [LACP] and the single-
active multi-homing is described for a multi-homed network (MHN)
using [802.1Q]. In this draft, the all-active multi-homing is
described for a VXLAN MHN. This implies some changes to the filtering
which will be described in details in the multicast section (Section
4.6.2).
The filtering used for BUM traffic of all-active multi-homing in
[EVPN] is asymmetric; where the BUM traffic from the MPLS/IP network
towards the multi-homed site is filtered on non-DF PE(s) and it
passes thorough the DF PE. There is no filtering of BUM traffic
originating from the multi-homed site because of the use of Ethernet
Link Aggregation: the MHD hashes the BUM traffic to only a single
link. However, in this solution because BUM traffic can arrive at
both PEs in both core-to-site and site-to-core directions, the
filtering needs to be symmetric just like the filtering of BUM
traffic for single-active multi-homing (on a per service
instance/VLAN basis).
4. EVPN Routes
This solution leverages the same BGP Routes and Attributes defined in
[EVPN], adapted as follows:
4.1. BGP MAC Advertisement Route
Boutros Expires January 16, 2014 [Page 6]
INTERNET DRAFT VXLAN-EVPN July 16, 2013
This route and its associated modes are used to distribute the
customer MAC addresses learnt in data plane over the VXLAN tunnel in
case of EVPN. Or can be used to distribute the provider Backbone MAC
addresses in case of PBB-EVPN.
In case of EVPN, the Ethernet Tag ID of this route is set to zero for
VNI-based mode, where there is one-to-one mapping between a VNI and
an EVI. In such case, there is no need to carry the VNI in the MAC
advertisement route because BD ID can be derived from the RT
associated with this route. However, for VNI-aware bundle mode, where
there is multiple VNIs can be mapped to the same EVI, the Ethernet
Tag ID MUST be set to the VNI. At the receiving PE, the BD ID is
derived from the combination of RT + VNI - e.g., the RT identifies
the associated EVI on that PE and the VNI identifies the
corresponding BD ID within that EVI.
4.2. Ethernet Auto-Discovery Route
When EVPN is used, the application of this route is as specified in
[EVPN]. However, when PBB-EVPN is used, there is no need for this
route per [PBB-EVPN].
4.3. Per VPN Route Targets
VXLAN-EVPN uses the same set of route targets defined in [EVPN].
4.4 Inclusive Multicast Route
The EVPN Inclusive Multicast route is used for auto-discovery of PE
devices participating in the same tenant virtual network identified
by a VNI over the MPLS network. It also enables the stitching of the
IP multicast trees, which are local to each VXLAN site, with the
Label Switched Multicast (LSM) trees of the MPLS network.
The Inclusive Multicast Route is encoded as follow:
- Ethernet Tag ID is set to zero for VNI-based mode and to VNI for
VNI-aware bundle mode.
- Originating Router's IP Address is set to one of the PE's IP
addresses.
All other fields are set as defined in [EVPN].
Please see section 4.6 "Handling Multicast"
4.5. Unicast Forwarding
Boutros Expires January 16, 2014 [Page 7]
INTERNET DRAFT VXLAN-EVPN July 16, 2013
Host MAC addresses will be learnt in data plane from the VXLAN
network and associated with the corresponding VTEP identified by the
source IP address. Host MAC addresses will be learnt in control plane
if EVPN is implemented over the MPLS/IP core, or in the data-plane if
PBB-EVPN is implemented over the MPLS core. When Host MAC addressed
are learned in data plane over MPLS/IP core [in case of PBB-EVPN],
they are associated with their corresponding BMAC addresses.
L2 Unicast traffic destined to the VXLAN network will be encapsulated
with the IP/UDP header and the corresponding customer bridge VNI.
L2 Unicast traffic destined to the MPLS/IP network will be
encapsulated with the MPLS label.
4.6. Handling Multicast
Each VXLAN network independently builds its P2MP or MP2MP shared
multicast trees. A P2MP or MP2MP tree is built for one or more VNIs
local to the VXLAN network.
In the MPLS/IP network, multiple options are available for the
delivery of multicast traffic:
- Ingress replication
- LSM with Inclusive trees
- LSM with Aggregate Inclusive trees
- LSM with Selective trees
- LSM with Aggregate Selective trees
When LSM is used, the trees are P2MP.
The PE nodes are responsible for stitching the IP multicast trees, on
the access side, to the ingress replication tunnels or LSM trees in
the MPLS/IP core. The stitching must ensure that the following
characteristics are maintained at all times:
1. Avoiding Packet Duplication: In the case where the VXLAN network
is multi-homed to multiple PE nodes, if all of the PE nodes forward
the same multicast frame, then packet duplication would arise. This
applies to both multicast traffic from site to core as well as from
core to site.
2. Avoiding Forwarding Loops: In the case of VXLAN network multi-
homing, the solution must ensure that a multicast frame forwarded by
a given PE to the MPLS core is not forwarded back by another PE (in
the same VXLAN network) to the VXLAN network of origin. The same
applies for traffic in the core to site direction.
The following approach of per-VNI load balancing can guarantee proper
Boutros Expires January 16, 2014 [Page 8]
INTERNET DRAFT VXLAN-EVPN July 16, 2013
stitching that meets the above requirements.
4.6.2. Multicast Stitching with Per-VNI Load Balancing
To setup multicast trees in the VXLAN network for DC applications,
PIM Bidir can be of special interest because it reduces the amount of
multicast state in the network significantly. Furthermore, it
alleviates any special processing for RPF check since PIM Bidir
doesn't require any RPF check. The RP for PIM Bidir can be any of the
spine nodes. Multiple trees can be built (e.g., one tree rooted per
spine node) for efficient load-balancing within the network. All PEs
participating in the multi-homing of the VXLAN network join all the
trees. Therefore, for a given tree, all PEs receive BUM traffic. DF
election procedures of [EVPN] are used to ensure that only traffic
to/from a single PE is forwarded, thus avoiding packet duplications
and forwarding loops. For load-balancing of BUM traffic, when a PE or
an NVE wants to send BUM traffic over the VXLAN network, it selects
one of the trees based on its VNI and forwards all the traffic for
that VNI on that tree. PIM SM will be described in future revision
of this draft.
Multicast traffic from VXLAN/NVGRE is first subjected to filtering
based on DF election procedures of [EVPN] using the VNI as the
Ethernet Tag. This is similar to filtering in [EVPN] in principal;
however, instead of VLAN ID, VNI is used for filtering, and instead
of being 802.1Q frame, it is a VXLAN encapsulated packet. On the DF
PE, where the multicast traffic is allowed to be forwarded, the VNI
is used to select a bridge domain,. After the packet is de-
capsulated, an L2 lookup is performed based on host MAC DA. It should
be noted that the MAC learning is performed in data-plane for the
traffic received from the VXLAN/NVGRE network and the host MAC SA is
learnt against the source VTEP address.
The PE nodes, connected to a multi-homed VXLAN network, perform BGP
DF election to decide which PE node is responsible for forwarding
multicast traffic associated with a given VNI. A PE would forward
multicast traffic for a given VNI only when it is the DF for this
VNI. This forwarding rule applies in both the site-to-core as well as
core-to-site directions.
5. NVGRE
Just like VXLAN, all the above specification would apply for NVGRE,
replacing the VNI with Virtual Subnet Identifier (VSID) and the VTEP
with NVGRE Endpoint.
Boutros Expires January 16, 2014 [Page 9]
INTERNET DRAFT VXLAN-EVPN July 16, 2013
6. Acknowledgements
TBD.
7. Security Considerations
There are no additional security aspects that need to be discussed
here.
8. IANA Considerations
TBD.
9. References
9.1 Normative References
[KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
9.2 Informative References
[EVPN] Sajassi et al., "BGP MPLS Based Ethernet VPN", draft-ietf-
l2vpn-evpn-00.txt, work in progress, February, 2012.
[TRILL] Sajassi et al., TRILL-EVPN draft-ietf-l2vpn-trill-evpn-00,
work in progress, June 2012.
[VXLAN] Mahalingam, Dutt et al., A Framework for Overlaying
Virtualized Layer 2 Networks over Layer 3 Networks draft-mahalingam-
dutt-dcops-vxlan-02.txt, work in progress, August, 2012.
[NVGRE] Sridharan et al., Network Virtualization using Generic
Routing Encapsulation draft-sridharan-virtualization-nvgre-01.txt,
work in progress, July, 2012.
Authors' Addresses
Sami Boutros
Cisco
EMail: sboutros@cisco.com
Ali Sajassi
Cisco
EMail: sajassi@cisco.com
Samer Salam
Boutros Expires January 16, 2014 [Page 10]
INTERNET DRAFT VXLAN-EVPN July 16, 2013
Cisco
EMail: ssalam@cisco.com
Dennis Cai
Cisco
EMail: dcai@cisco.com
John Drake
Juniper Networks
Email: jdrake@juniper.net
Samir Thoria
Cisco
EMail: sthoria@cisco.com
Boutros Expires January 16, 2014 [Page 11]