L2VPN Workgroup                                           J. Rabadan
Internet Draft                                         W. Henderickx
                                                     S. Palislamovic
Intended status: Standards Track                      Alcatel-Lucent
J. Drake                                                    F. Balus
Juniper                                               Nuage Networks
A. Sajassi                                                  A. Isaac
Cisco                                                      Bloomberg
Expires: January 5, 2015                                July 4, 2014
IP Prefix Advertisement in EVPN
draft-rabadan-l2vpn-evpn-prefix-advertisement-02
Abstract
EVPN provides a flexible control plane that allows intra-subnet
connectivity in an IP/MPLS and/or an NVO-based network. In NVO
networks, there is also a need for a dynamic and efficient inter-
subnet connectivity across Tenant Systems and End Devices that can be
physical or virtual and may not support their own routing protocols.
This document defines a new EVPN route type for the advertisement of
IP Prefixes and explains some use-case examples where this new route-
type is used.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
Rabadan et al. Expires January 5, 2015 [Page 1]
Internet-Draft EVPN Prefix Advertisement July 4, 2014
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on January 5, 2015.
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Terminology
2. Introduction and problem statement
   2.1 Inter-subnet connectivity requirements in Data Centers
   2.2 The requirement for advertising IP prefixes in EVPN
   2.3 The requirement for a new EVPN route type
3. The BGP EVPN IP Prefix route
   3.1 IP Prefix Route encoding
4. Benefits of using the EVPN IP Prefix route
5. IP Prefix next-hop use-cases
   5.1 TS IP address next-hop use-case
   5.2 Floating IP next-hop use-case
   5.3 IRB IP next-hop use-case
   5.4 ESI next-hop ("Bump in the wire") use-case
   5.5 IRB forwarding without core-facing IRB use-case (VRF-to-VRF)
6. Conclusions
7. Conventions used in this document
8. Security Considerations
9. IANA Considerations
10. References
   10.1 Normative References
   10.2 Informative References
11. Acknowledgments
12. Authors' Addresses
1. Terminology
GW IP: Gateway IP Address
IPL: IP address length
IRB: Integrated Routing and Bridging interface
ML: MAC address length
NVE: Network Virtualization Edge
TS: Tenant System
VA: Virtual Appliance
RT-2: EVPN route type 2, i.e. MAC/IP advertisement route
RT-5: EVPN route type 5, i.e. IP Prefix route
Overlay next-hop: object used in the IP Prefix route, as described in
this document. It can be an IP address in the tenant space or an ESI,
and identifies the next-hop to be used in IP lookups for a given IP
Prefix at the routing context importing the route.
Underlay next-hop: IP address sent by BGP along with any EVPN route,
i.e. BGP next-hop. It identifies the NVE sending the route and it is
used at the receiving NVE as the VXLAN destination VTEP or NVGRE
destination end-point.
2. Introduction and problem statement
Inter-subnet connectivity is required for certain tenants within the
Data Center. [EVPN-INTERSUBNET] defines some fairly common inter-
subnet forwarding scenarios where TSes can exchange packets with TSes
located in remote subnets. In order to meet this requirement, [EVPN-
INTERSUBNET] describes how MAC/IPs encoded in TS RT-2 routes are not
only used to populate MAC-VRF and overlay ARP tables, but also IP-VRF
tables with the encoded TS host routes (/32 or /128). In some cases,
EVPN may advertise IP Prefixes and therefore provide aggregation in
the IP-VRF tables, as opposed to programming individual host routes.
This document complements the scenarios described in [EVPN-INTERSUBNET]
and defines how EVPN may be used to advertise IP Prefixes.
Section 2.1 describes the inter-subnet connectivity requirements in
Data Centers. Sections 2.2 and 2.3 explain why neither IP-VPN nor the
existing EVPN route types meet the requirements for IP Prefix
advertisements. Once the need for a new EVPN route type is justified,
sections 3, 4 and 5 will describe this route type and how it is used in
some specific use cases.
2.1 Inter-subnet connectivity requirements in Data Centers
[EVPN] is used as the control plane for a Network Virtualization
Overlay (NVO3) solution in Data Centers (DC), where Network
Virtualization Edge (NVE) devices can be located in Hypervisors or
TORs, as described in [EVPN-OVERLAYS].
If we use the term Tenant System (TS) to designate a physical or
virtual system identified by MAC and IP addresses, and connected to
an EVPN instance, the following considerations apply:
o The Tenant Systems may be Virtual Machines (VMs) that generate
traffic from their own MAC and IP.
o The Tenant Systems may be Virtual Appliance entities (VAs) that
forward traffic to/from IP addresses of different End Devices
sitting behind them.
o These VAs can be firewalls, load balancers, NAT devices, other
appliances or virtual gateways with virtual routing instances.
o These VAs do not have their own routing protocols and hence
rely on the EVPN NVEs to advertise the routes on their behalf.
o In all these cases, the VA will forward traffic to the Data
Center using its own source MAC but the source IP will be the
one associated to the End Device sitting behind it or a
translated IP address (part of a public NAT pool) if the VA is
performing NAT.
o Note that the same IP address could exist behind two of these
TS. One example of this would be certain appliance resiliency
mechanisms, where a virtual IP or floating IP can be owned by
one of the two VAs running the resiliency protocol (the master
VA). VRRP is one particular example of this. Another example
is multi-homed subnets, i.e. the same subnet is connected to
two VAs.
o Although these VAs provide IP connectivity to VMs and subnets
behind them, they do not always have their own IP interface
connected to the EVPN NVE, e.g. layer-2 firewalls are examples
of VAs not supporting IP interfaces.
The following figure illustrates some of the examples described
above.
                     NVE1
                  +--------+
         TS1(VM)--|(EVI-10)|---------+
           IP1/M1 +--------+         |              DGW1
                                +---------+    +-------------+
                                |         |----|(EVI-10)     |
   SN1---+           NVE2       |         |    |    IRB1     |
         |        +--------+    |         |    |        (VRF)|---+
   SN2---TS2(VA)--|(EVI-10)|----|         |    +-------------+  _|_
         | IP2/M2 +--------+    |  VXLAN/ |                    (   )
   IP4---+  <-+                 |  nvGRE  |         DGW2      ( WAN )
              |                 |         |    +-------------+ (___)
           vIP23 (floating)     |         |----|(EVI-10)     |   |
              |                 +---------+    |    IRB2     |   |
   SN1---+  <-+      NVE3         |  |  |      |        (VRF)|---+
         | IP3/M3 +--------+      |  |  |      +-------------+
   SN3---TS3(VA)--|(EVI-10)|------+  |  |
         |        +--------+         |  |
   IP5---+                           |  |
                                     |  |
                     NVE4            |  |       NVE5            +--SN5
            +---------------------+  |  |    +--------+         |
   IP6------|(EVI-1)              |  |  +----|(EVI-10)|--TS4(VA)--SN6
            |     \      IRB3     |  |       +--------+         |
            |       (VRF)-(EVI-10)|--+                 ESI4     +--SN7
            |     /               |
        |---|(EVI-2)              |
     SN4|   +---------------------+
                Figure 1 DC inter-subnet use-cases
Where:
NVE1, NVE2, NVE3, NVE4, NVE5, DGW1 and DGW2 share the same EVI for a
particular tenant. EVI-10 is the corresponding EVPN MAC-VRF for the
shared EVI on each element, i.e. core-facing EVI, and all the hosts
connected to that instance belong to the same IP subnet. The hosts
connected to EVI-10 are listed below:
o TS1 is a VM that generates/receives traffic from/to IP1, where
IP1 belongs to the EVI-10 subnet.
o TS2 and TS3 are Virtual Appliances (VA) that generate/receive
traffic from/to the subnets and hosts sitting behind them
(SN1, SN2, SN3, IP4 and IP5). Their IP addresses (IP2 and IP3)
belong to the EVI-10 subnet and they can also generate/receive
traffic. When these VAs receive packets destined to their own
MAC addresses (M2 and M3) they will route the packets to the
proper subnet or host. These VAs do not support routing
protocols to advertise the subnets connected to them and can
move to a different server and NVE when the Cloud Management
System decides to do so. These VAs may also support redundancy
mechanisms for some subnets, similar to VRRP, where a floating
IP is owned by the master VA and only the master VA forwards
traffic to a given subnet. E.g.: vIP23 in figure 1 is a
floating IP that can be owned by TS2 or TS3 depending on who
the master is. Only the master will forward traffic to SN1.
o Integrated Routing and Bridging interfaces IRB1, IRB2 and IRB3
have their own IP addresses that belong to the EVI-10 subnet
too. These IRB interfaces connect the EVI-10 subnet to Virtual
Routing and Forwarding (VRF) instances that can route the
traffic to other connected subnets for the same tenant (within
the DC or at the other end of the WAN).
o TS4 is a layer-2 VA that provides connectivity to subnets SN5,
SN6 and SN7, but does not have an IP address itself in the
EVI-10. TS4 is connected to a physical port on NVE5 assigned
to Ethernet Segment Identifier 4.
All the above DC use cases require inter-subnet forwarding and
therefore the individual host routes and subnets MUST be advertised:
a) From the NVEs (since VAs and VMs do not run routing protocols) and
b) Associated to an overlay next-hop that can be a VA IP address, a
floating IP address, an IRB IP address or an ESI.
2.2 The requirement for advertising IP prefixes in EVPN
In all the inter-subnet connectivity cases discussed in section 2.1
there is a need to advertise IP prefixes in the control plane. The
advertisement of such prefixes must meet certain requirements,
specific to NVO-based Data Centers:
o The data plane in NVO-based Data Centers is not based on IP
over a GRE or MPLS tunnel as required by [RFC4364], but
Ethernet over an IP tunnel, such as VXLAN or NVGRE.
o The IP prefixes in the DC must be advertised with a
flexibility that does not exist in IP-VPNs today. For
instance:
a) The advertised overlay next-hop for a given IP prefix can
be an IRB IP address (see section 5.3), a floating IP
address (see section 5.2) or even an ESI (see section 5.4).
b) VXLAN or NVGRE virtual identifiers can have a global or a
local scope. The implementation MUST support the flexibility
to advertise IP Prefixes associated to a global identifier
(32-bit value encoded in the EVPN Ethernet Tag ID) or a
locally significant identifier (20-bit value encoded in the
MPLS label field). At the moment, [RFC4364] can only
advertise Prefixes associated to a locally significant
identifier (MPLS label).
c) Since an NVE can potentially advertise many Prefixes with
different overlay next-hops and different VXLAN/NVGRE
identifiers, it is highly desirable to be able to advertise
those prefixes with their corresponding overlay next-hops
and VXLAN/NVGRE identifiers as attributes within the same
NLRI, for a better BGP update packing. [RFC4364] does not
have the capability of advertising a flexible overlay next-
hop together with a prefix in the same NLRI.
o IP prefixes must be advertised by NVE devices that have no VRF
instances defined and no capability to process IP-VPN
prefixes. These NVE devices just support EVPN and advertise IP
Prefixes on behalf of some connected Tenant Systems. In other
words: any attempt to solve this problem by simply using
[RFC4364] routes requires that any EVPN deployment must be
accompanied with a concurrent IP-VPN topology, which is not
possible in most of the cases.
o Finally, Data Center providers want to use a single BGP
Subsequent Address Family (AFI/SAFI) for the advertisement of
addresses within the Data Center, i.e. BGP EVPN only, as
opposed to using EVPN and IP-VPN in a concurrent topology.
This minimizes the control plane overhead in TORs and
Hypervisors and simplifies the operations.
EVPN is extended - as described in this document - to advertise IP
prefixes with the flexibility required by the current and future Data
Center applications.
2.3 The requirement for a new EVPN route type
[EVPN] defines a MAC/IP route (or RT-2) where a MAC address can be
advertised together with an IP address length (IPL) and IP address
(IP). While a variable IPL might be used to indicate the presence of
an IP prefix in a route type 2, there are several specific use cases
in which using this route type to deliver IP Prefixes is not
suitable.
One example of such use cases is the "floating IP" example described
in section 2.1. In this example we need to decouple the advertisement
of the prefixes from the advertisement of the floating IP (vIP23 in
figure 1) and MAC associated to it, otherwise the solution gets
highly inefficient and does not scale.
E.g.: if we are advertising 1k prefixes from M2 (using route type 2)
and the floating IP owner changes from M2 to M3, we would need to
withdraw 1k routes from M2 and re-advertise 1k routes from M3.
However if we use a separate route type, we can advertise the 1k
routes associated to the floating IP address (vIP23) and only one
route type 2 for advertising the ownership of the floating IP, i.e.
vIP23 and M2 in the route type 2. When the floating IP owner changes
from M2 to M3, a single route type 2 withdraw/update is required to
indicate the change. The remote DGW will not change any of the 1k
prefixes associated to vIP23, but will only update the ARP resolution
entry for vIP23 (now pointing at M3).
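The difference in control plane churn can be put in numbers with a small model. The sketch below is illustrative only and not part of this specification: the function names and the counting model (one BGP route change per withdrawn or advertised route) are assumptions.

```python
def updates_coupled(num_prefixes: int) -> int:
    """Prefixes carried in route type 2, tied to the owner's MAC.

    When the floating IP owner changes, every prefix route is
    withdrawn from the old MAC and re-advertised from the new MAC.
    """
    withdrawals = num_prefixes
    advertisements = num_prefixes
    return withdrawals + advertisements


def updates_decoupled(num_prefixes: int) -> int:
    """Prefixes carried in a separate route type pointing at the floating IP.

    The prefix routes are left untouched, whatever their number; only
    the single route type 2 binding the floating IP to a MAC changes.
    """
    return 1


if __name__ == "__main__":
    n = 1000  # the "1k prefixes" of the example above
    print("coupled:", updates_coupled(n))
    print("decoupled:", updates_decoupled(n))
```

With 1k prefixes the coupled model yields 2k route changes, against a single route type 2 change in the decoupled model.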
Other reasons to decouple the IP Prefix advertisement from the MAC
route are listed below:
o Clean identification, operation and troubleshooting of IP
Prefixes, not subject to interpretation and independent of the
IPL and the IP value. E.g.: a default IP route 0.0.0.0/0 must
always be easily and clearly distinguished from the absence of
IP information.
o MAC address information must not be compared by BGP when
selecting two IP Prefix routes. If IP Prefixes are to be
advertised using MAC routes, the MAC information is always
present and part of the route key.
o IP Prefix routes must not be subject to MAC route procedures
such as MAC Mobility or aliasing. Prefixes advertised from two
different ESIs do not mean mobility; MACs advertised from two
different ESIs do mean mobility. Similarly load balancing for
IP prefixes is achieved through IP mechanisms such as ECMP,
and not through MAC route mechanisms such as aliasing.
o NVEs that do not require processing IP Prefixes must have an
easy way to identify an update with an IP Prefix and ignore
it, rather than processing the MAC route only to find out
later that it carries a Prefix that must be ignored.
The following sections describe how EVPN is extended with a new route
type for the advertisement of prefixes and how this route is used to
address the current and future inter-subnet connectivity requirements
existing in the Data Center.
3. The BGP EVPN IP Prefix route
The current BGP EVPN NLRI as defined in [EVPN] is shown below:
+-----------------------------------+
| Route Type (1 octet) |
+-----------------------------------+
| Length (1 octet) |
+-----------------------------------+
| Route Type specific (variable) |
+-----------------------------------+
Where the route type field can contain one of the following specific
values:
+ 1 - Ethernet Auto-Discovery (A-D) route
+ 2 - MAC advertisement route
+ 3 - Inclusive Multicast Route
+ 4 - Ethernet Segment Route
This document defines an additional route type that will be used for
the advertisement of IP Prefixes:
+ 5 - IP Prefix Route
The support for this new route type is OPTIONAL.
Since this new route type is OPTIONAL, an implementation not
supporting it MUST ignore the route, based on the unknown route type
value.
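A receiver can skip an unsupported route type because every EVPN NLRI carries its own length. The following is a minimal sketch of that behavior, assuming the Length octet counts the route type specific part as defined in [EVPN]; the function names and the set of supported types are hypothetical.

```python
from typing import Iterator, List, Tuple

# Route types understood by a hypothetical implementation without RT-5.
KNOWN_ROUTE_TYPES = {1, 2, 3, 4}


def iter_evpn_nlri(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (route_type, route_type_specific_bytes) for each NLRI in buf."""
    offset = 0
    while offset < len(buf):
        if offset + 2 > len(buf):
            raise ValueError("truncated NLRI header")
        route_type = buf[offset]
        length = buf[offset + 1]
        start = offset + 2
        end = start + length
        if end > len(buf):
            raise ValueError("truncated NLRI body")
        yield route_type, buf[start:end]
        offset = end


def supported_routes(buf: bytes,
                     known=KNOWN_ROUTE_TYPES) -> List[Tuple[int, bytes]]:
    """Keep the NLRIs with a known route type and ignore the others."""
    return [(t, v) for t, v in iter_evpn_nlri(buf) if t in known]
```

An update mixing route types 2, 5 and 3 would therefore be processed as if only the type 2 and type 3 routes were present.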
The detailed encoding of this route and associated procedures are
described in the following sections.
3.1 IP Prefix Route encoding
An IP Prefix advertisement route type specific EVPN NLRI consists of
the following fields:
+---------------------------------------+
| RD (8 octets) |
+---------------------------------------+
|Ethernet Segment Identifier (10 octets)|
+---------------------------------------+
| Ethernet Tag ID (4 octets) |
+---------------------------------------+
| IP Address Length (1 octet) |
+---------------------------------------+
| IP Address (4 or 16 octets) |
+---------------------------------------+
| GW IP Address (4 or 16 octets) |
+---------------------------------------+
| MPLS Label (3 octets) |
+---------------------------------------+
Where:
o RD, Ethernet Tag ID and MPLS Label fields will be used as
defined in [EVPN] and [EVPN-OVERLAYS].
o The Ethernet Segment Identifier will be a non-zero 10-byte
identifier if the ESI is used as an overlay next-hop. It will
be zero otherwise.
o The IP address length can be set to a value between 0 and 32
(bits) for IPv4 and between 0 and 128 for IPv6.
o The IP address will be a 32 or 128-bit field (IPv4 or IPv6).
o The GW IP (Gateway IP Address) will be a 32 or 128-bit field
(IPv4 or IPv6), and will encode the overlay IP next-hop for
the IP Prefixes. The GW IP field can be zero if it is not used
as an overlay next-hop.
o The total route length will indicate the type of prefix (IPv4
or IPv6) and the type of GW IP address (IPv4 or IPv6). Note
that the IP Address + the GW IP should have a length of either
64 or 256 bits, but never 160 bits (IPv4 and IPv6 mixed values
are not allowed).
The Eth-Tag ID, IP address length and IP address will be part of the
route key used by BGP to compare routes. The rest of the fields will
be out of the route key.
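The field layout above can be exercised with a short encoder. This is a sketch under stated assumptions, not a reference implementation: the function name encode_rt5 is hypothetical, the 3-octet MPLS Label field is written as a raw 24-bit value without interpreting its bits, and the addresses in the usage note are documentation prefixes chosen for illustration.

```python
import ipaddress
import struct

RT5 = 5  # route type value defined in this document


def encode_rt5(rd: bytes, esi: bytes, eth_tag: int, prefix: str,
               gw_ip: str, label_field: int) -> bytes:
    """Build an EVPN NLRI (type, length, body) for an IP Prefix route."""
    net = ipaddress.ip_network(prefix, strict=True)
    gw = ipaddress.ip_address(gw_ip)
    if len(rd) != 8 or len(esi) != 10:
        raise ValueError("RD must be 8 octets and ESI 10 octets")
    if net.version != gw.version:
        # IP Address + GW IP must total 64 or 256 bits, never 160.
        raise ValueError("mixed IPv4/IPv6 prefix and GW IP are not allowed")
    if not 0 <= label_field < (1 << 24):
        raise ValueError("label field must fit in 3 octets")
    body = (
        rd                                   # RD (8 octets)
        + esi                                # ESI (10 octets)
        + struct.pack("!I", eth_tag)         # Ethernet Tag ID (4 octets)
        + bytes([net.prefixlen])             # IP Address Length (1 octet)
        + net.network_address.packed         # IP Address (4 or 16 octets)
        + gw.packed                          # GW IP Address (4 or 16 octets)
        + label_field.to_bytes(3, "big")     # MPLS Label (3 octets)
    )
    return bytes([RT5, len(body)]) + body
```

For example, encode_rt5(bytes(8), bytes(10), 0, "192.0.2.0/24", "198.51.100.2", 0) produces a 34-octet body, while an IPv6 prefix with an IPv6 GW IP produces a 58-octet body; the receiver can tell the two apart from the length alone.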
The route will contain a single overlay next-hop, i.e. if the ESI
field is zero, the GW IP field will not, and vice versa. The
following table shows the different inter-subnet use-cases described
in this document and the corresponding coding of the overlay next-hop
in the route-type 5 (RT-5).
+----------------------------+----------------------------------+
| Overlay next-hop use-case | Field in the RT-5 |
+----------------------------+----------------------------------+
| TS IP address | GW IP Address |
| Floating IP address | GW IP Address |
| IRB IP address | GW IP Address |
| "Bump in the wire" | ESI |
| VRF-to-VRF | GW MAC Address (Tunnel Attribute)|
+----------------------------+----------------------------------+
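The single overlay next-hop rule can be checked on receipt. The sketch below is one possible reading of the table, with a hypothetical function name; in particular, treating "both fields zero" as "next-hop not carried in the NLRI" (as the VRF-to-VRF row suggests) is an assumption.

```python
def overlay_next_hop(esi: bytes, gw_ip: bytes) -> str:
    """Classify the overlay next-hop carried in an RT-5 NLRI.

    esi   -- the 10-octet Ethernet Segment Identifier field
    gw_ip -- the 4- or 16-octet GW IP Address field
    """
    esi_set = any(esi)      # True if any octet is non-zero
    gw_set = any(gw_ip)
    if esi_set and gw_set:
        raise ValueError("an RT-5 carries a single overlay next-hop")
    if esi_set:
        return "ESI"
    if gw_set:
        return "GW IP"
    # Assumption: neither field is used, so the next-hop is signaled
    # outside the NLRI (e.g. the GW MAC row of the table).
    return "none in NLRI"
```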
4. Benefits of using the EVPN IP Prefix route
This section clarifies the different functions accomplished by the
EVPN RT-2 and RT-5 routes, and provides a list of benefits derived
from using a separate route type for the advertisement of IP Prefixes
in EVPN.
[EVPN] describes the content of the BGP EVPN route type 2 specific
NLRI, i.e. MAC/IP Advertisement Route, where the IP address length
(IPL) and IP address (IP) of a specific advertised MAC are encoded.
The subject of the MAC advertisement route is the MAC address (M) and
MAC address length (ML) encoded in the route. The MAC mobility and
other complex procedures are defined around that MAC address. The IP
address information carries the host IP address required for the ARP
resolution of the MAC according to [EVPN] and the host route to
be programmed in the IP-VRF [EVPN-INTERSUBNET].
The BGP EVPN route type 5 defined in this document, i.e. IP Prefix
Advertisement route, decouples the advertisement of IP prefixes from
the advertisement of any MAC address related to it. This brings some
major benefits to NVO-based networks where certain inter-subnet
forwarding scenarios are required. Some of those benefits are:
a) Upon receiving a route type 2 or type 5, an egress NVE can easily
distinguish MACs and IPs from IP Prefixes. E.g. an IP prefix with
IPL=32 being advertised from two different ingress NVEs (as RT-5)
can be identified as such and be imported in the designated
routing context as two ECMP routes, as opposed to two MACs
competing for the same IP.
b) Similarly, upon receiving a route, an egress NVE not supporting
processing IP Prefixes can easily ignore the update, based on the
route type.
c) A MAC route includes the ML, M, IPL and IP in the route key that
is used by BGP to compare routes, whereas for IP Prefix routes,
only IPL and IP (as well as Ethernet Tag ID) are part of the route
key. Advertised IP Prefixes are imported into the designated
routing context, where there is no MAC information associated to
IP routes. In the example illustrated in figure 1, subnet SN1
should be advertised by NVE2 and NVE3 and interpreted by DGW1 as
the same route coming from two different next-hops, regardless of
the MAC address associated to TS2 or TS3. This is easily
accomplished in the route type 5 by including only the IP
information in the route key.
d) By decoupling the MAC from the IP Prefix advertisement procedures,
we can leave the IP prefix advertisements out of the MAC mobility
procedures defined in [EVPN] for MACs. In addition, this allows us
to have an indirection mechanism for IP prefixes advertised from a
MAC/IP that can move between hypervisors. E.g. if there are 1,000
prefixes sitting behind TS2 (figure 1), NVE2 will advertise all
those prefixes in RT-5 routes associated to the next-hop IP2.
Should TS2 move to a different NVE, a single MAC advertisement
route withdraw for the M2/IP2 route from NVE2 will invalidate the
1,000 prefixes, as opposed to having to wait for each individual
prefix to be withdrawn. This may be easily accomplished by using
IP Prefix routes that are not tied to a MAC address, and use a
different MAC route to advertise the location and resolution of
the overlay next-hop to a MAC address.
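Benefit c) can be illustrated by comparing route keys. The sketch below models only the key fields named in this section; the class and function names are hypothetical, and 192.0.2.0/24 stands in for SN1 as an assumed example value.

```python
from typing import NamedTuple


class Rt2Key(NamedTuple):
    """Key fields of a MAC route as listed above: ML, M, IPL, IP."""
    mac_len: int
    mac: str
    ip_len: int
    ip: str


class Rt5Key(NamedTuple):
    """Key fields of an IP Prefix route: Ethernet Tag ID, IPL, IP."""
    eth_tag: int
    ip_len: int
    ip: str


def group_by_key(adverts):
    """Group the advertised overlay next-hops by route key."""
    table = {}
    for key, next_hop in adverts:
        table.setdefault(key, []).append(next_hop)
    return table


# SN1 advertised by NVE2 (behind M2/IP2) and NVE3 (behind M3/IP3).
as_rt5 = group_by_key([
    (Rt5Key(0, 24, "192.0.2.0"), "IP2"),
    (Rt5Key(0, 24, "192.0.2.0"), "IP3"),
])
as_rt2 = group_by_key([
    (Rt2Key(48, "M2", 24, "192.0.2.0"), "IP2"),
    (Rt2Key(48, "M3", 24, "192.0.2.0"), "IP3"),
])
```

As RT-5 the two advertisements share one key and appear as a single route with two next-hops, which is what ECMP needs; carried in MAC routes they would be two unrelated entries because the MAC is part of the key.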
5. IP Prefix next-hop use-cases
The IP Prefix route can use a GW IP, an ESI or a GW MAC as an overlay
next-hop. This section describes some use-cases for these next-hop
types.
5.1 TS IP address next-hop use-case
The following figure illustrates an example of inter-subnet
forwarding for subnets sitting behind Virtual Appliances (on TS2 and
TS3).
   SN1---+           NVE2                           DGW1
         |        +--------+    +---------+    +-------------+
   SN2---TS2(VA)--|(EVI-10)|----|         |----|(EVI-10)     |
         | IP2/M2 +--------+    |         |    |    IRB1\    |
   IP4---+                      |         |    |        (VRF)|---+
                                |         |    +-------------+  _|_
                                |  VXLAN/ |                    (   )
                                |  nvGRE  |         DGW2      ( WAN )
   SN1---+           NVE3       |         |    +-------------+ (___)
         | IP3/M3 +--------+    |         |----|(EVI-10)     |   |
   SN3---TS3(VA)--|(EVI-10)|----|         |    |    IRB2\    |   |
         |        +--------+    +---------+    |        (VRF)|---+
   IP5---+                                     +-------------+
                  Figure 2 TS IP address use-case
An example of inter-subnet forwarding between subnet SN1/24 and a
subnet sitting in the WAN is described below. NVE2, NVE3, DGW1 and
DGW2 are running BGP EVPN. TS2 and TS3 do not support routing
protocols, only a static route to forward the traffic to the WAN.
(1) NVE2 advertises the following BGP routes on behalf of TS2:
o Route type 2 (MAC route) containing: ML=48, M=M2, IPL=32,
IP=IP2
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
ESI=0, GW IP address=IP2
(2) NVE3 advertises the following BGP routes on behalf of TS3:
o Route type 2 (MAC route) containing: ML=48, M=M3, IPL=32,
IP=IP3
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
ESI=0, GW IP address=IP3
(3) DGW1 and DGW2 import both received routes based on the route-target:
o Based on the EVI-10 route-target in DGW1 and DGW2, the MAC
route is imported and M2 is added to the EVI-10 MAC-VRF along
with its corresponding tunnel information. For the VXLAN use
case, the VTEP will be derived from the MAC route BGP next-hop
(underlay next-hop) and VNI from the Ethernet Tag or MPLS
fields. IP2 - M2 is added to the ARP table.
o Based on the EVI-10 route-target in DGW1 and DGW2, the IP
Prefix route is also imported and SN1/24 is added to the
designated routing context with next-hop IP2 pointing at the
local EVI-10. Should ECMP be enabled in the routing context,
SN1/24 would also be added to the routing table with next-hop
IP3.
(4) When DGW1 receives a packet from the WAN with destination IPx,
where IPx belongs to SN1/24:
o A destination IP lookup is performed on the DGW1 VRF routing
table and next-hop=IP2 is found. The tunnel information to
encapsulate the packet will be derived from the route-type 2
(MAC route) received for M2/IP2.
o IP2 is resolved to M2 in the ARP table, and M2 is resolved to
the tunnel information given by the MAC FIB (remote VTEP and
VNI for the VXLAN case).
o The IP packet destined to IPx is encapsulated with:
. Source inner MAC = IRB1 MAC
. Destination inner MAC = M2
. Tunnel information provided by the MAC-VRF (VNI, VTEP IPs
and MACs for the VXLAN case)
(5) When the packet arrives at NVE2:
o Based on the tunnel information (VNI for the VXLAN case), the
EVI-10 context is identified for a MAC lookup.
o Encapsulation is stripped off and, based on a MAC lookup
(assuming MAC forwarding on the egress NVE), the packet is
forwarded to TS2, where it will be properly routed.
(6) Should TS2 move from NVE2 to NVE3, MAC Mobility procedures will
be applied to the MAC route IP2/M2, as defined in [EVPN]. Route type
5 prefixes are not subject to MAC mobility procedures, hence no
changes in the DGW VRF routing table will occur for TS2 mobility,
i.e. all the prefixes will still be pointing at IP2 as next-hop.
There is an indirection for e.g. SN1/24, which still points at
next-hop IP2 in the routing table, but IP2 will be simply resolved to
a different tunnel, based on the outcome of the MAC mobility
procedures for the MAC route IP2/M2.
Note that in the opposite direction, TS2 will send traffic based on
its static-route next-hop information (IRB1 and/or IRB2), and regular
EVPN procedures will be applied.
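The recursive resolution of steps (3), (4) and (6) can be sketched as three tables on the DGW. This is an illustrative model only: the class name, method names, table layout and all addresses (10.0.0.2 for IP2, 192.0.2.0/24 for SN1, 203.0.113.x for the VTEPs, VNI 10) are assumptions, not values from this document.

```python
import ipaddress


class Dgw:
    """Toy model of the DGW tables used in this use-case."""

    def __init__(self):
        self.ip_vrf = {}   # ip_network -> overlay next-hop (GW IP)
        self.arp = {}      # GW IP -> MAC
        self.mac_vrf = {}  # MAC -> (remote VTEP, VNI)

    def import_rt2(self, mac, ip, vtep, vni):
        """MAC route: populate ARP and the MAC-VRF tunnel information."""
        self.arp[ip] = mac
        self.mac_vrf[mac] = (vtep, vni)

    def import_rt5(self, prefix, gw_ip):
        """IP Prefix route: populate the routing context only."""
        self.ip_vrf[ipaddress.ip_network(prefix)] = gw_ip

    def resolve(self, dst):
        """Longest-prefix match, then GW IP -> MAC -> tunnel."""
        addr = ipaddress.ip_address(dst)
        matches = [n for n in self.ip_vrf if addr in n]
        if not matches:
            return None
        best = max(matches, key=lambda n: n.prefixlen)
        gw_ip = self.ip_vrf[best]
        mac = self.arp[gw_ip]
        vtep, vni = self.mac_vrf[mac]
        return {"gw_ip": gw_ip, "dst_mac": mac, "vtep": vtep, "vni": vni}


dgw = Dgw()
dgw.import_rt2("00:00:5e:00:53:02", "10.0.0.2", "203.0.113.2", 10)  # M2/IP2 at NVE2
dgw.import_rt5("192.0.2.0/24", "10.0.0.2")                          # SN1 via IP2
first = dgw.resolve("192.0.2.77")

# TS2 moves to NVE3: only the MAC route changes, the prefix entry does not.
dgw.import_rt2("00:00:5e:00:53:02", "10.0.0.2", "203.0.113.3", 10)
after_move = dgw.resolve("192.0.2.77")
```

After the move, the SN1 entry still points at the same GW IP while the resolved tunnel end-point follows the MAC route, which is the indirection described in step (6).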
5.2 Floating IP next-hop use-case
Sometimes Tenant Systems (TS) work in active/standby mode where an
upstream floating IP - owned by the active TS - is used as the next-
hop to get to some subnets behind it. This redundancy mode, already
introduced in sections 2.1 and 2.3, is illustrated in Figure 3.
                     NVE2                           DGW1
                  +--------+    +---------+    +-------------+
     +---TS2(VA)--|(EVI-10)|----|         |----|(EVI-10)     |
     |     IP2/M2 +--------+    |         |    |    IRB1\    |
     |      <-+                 |         |    |        (VRF)|---+
     |        |                 |         |    +-------------+  _|_
    SN1    vIP23 (floating)     |  VXLAN/ |                    (   )
     |        |                 |  nvGRE  |         DGW2      ( WAN )
     |      <-+      NVE3       |         |    +-------------+ (___)
     |     IP3/M3 +--------+    |         |----|(EVI-10)     |   |
     +---TS3(VA)--|(EVI-10)|----|         |    |    IRB2\    |   |
                  +--------+    +---------+    |        (VRF)|---+
                                               +-------------+
             Figure 3 Floating IP next-hop for redundant TS
In this example, assuming TS2 is the active TS and owns IP23:
(1) NVE2 advertises the following BGP routes for TS2:
o Route type 2 (MAC route) containing: ML=48, M=M2, IPL=32,
IP=IP23
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
ESI=0, GW IP address=IP23
(2) NVE3 advertises the following BGP routes for TS3:
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
ESI=0, GW IP address=IP23
(3) DGW1 and DGW2 import both received routes based on the route-target:
o M2 is added to the EVI-10 MAC FIB along with its corresponding
tunnel information. For the VXLAN use case, the VTEP will be
derived from the MAC route BGP next-hop and VNI from the
Ethernet Tag or MPLS fields. IP23 - M2 is added to the ARP
table.
o SN1/24 is added to the designated routing context in DGW1 and
DGW2 with next-hop IP23 pointing at the local EVI-10.
(4) When DGW1 receives a packet from the WAN with destination IPx,
where IPx belongs to SN1/24:
o A destination IP lookup is performed on the DGW1 IP-VRF
routing table and next-hop=IP23 is found. The tunnel
information to encapsulate the packet will be derived from the
route-type 2 (MAC route) received for M2/IP23.
o IP23 is resolved to M2 in the ARP table, and M2 is resolved to
the tunnel information given by the MAC-VRF (remote VTEP and
VNI for the VXLAN case).
o The IP packet destined to IPx is encapsulated with:
. Source inner MAC = IRB1 MAC
. Destination inner MAC = M2
. Tunnel information provided by the MAC FIB (VNI, VTEP IPs
and MACs for the VXLAN case)
(5) When the packet arrives at NVE2:
o Based on the tunnel information (VNI for the VXLAN case), the
EVI-10 context is identified for a MAC lookup.
o Encapsulation is stripped off and, based on a MAC lookup
(assuming MAC forwarding on the egress NVE), the packet is
forwarded to TS2, where it will be properly routed.
(6) When the redundancy protocol running between TS2 and TS3 appoints
TS3 as the new active TS for SN1, TS3 will now own the floating IP23
and will signal this new ownership (GARP message or similar). Upon
receiving the new owner's notification, NVE3 will issue a route type
2 for M3-IP23. DGW1 and DGW2 will update their ARP tables with the
new MAC resolving the floating IP. No changes are carried out in the
VRF routing table.
In the DGW1/2 BGP RIB, there will be two route type 5 routes for SN1
(from NVE2 and NVE3) but only the one with the same BGP next-hop as
the IP23 route type 2 BGP next-hop will be valid.
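The validity rule in the last paragraph can be sketched as a filter over the BGP RIB. The data layout, the function name and the addresses (10.0.0.23 for IP23, 203.0.113.x for the NVE BGP next-hops) are assumptions made for illustration.

```python
def valid_rt5_routes(rt5_routes, rt2_routes):
    """Keep the RT-5 routes whose BGP next-hop matches the BGP next-hop
    of the RT-2 currently advertising their GW IP.

    rt5_routes -- dicts with keys: prefix, gw_ip, bgp_next_hop
    rt2_routes -- dicts with keys: ip, mac, bgp_next_hop
    """
    owner = {r["ip"]: r["bgp_next_hop"] for r in rt2_routes}
    return [r for r in rt5_routes
            if owner.get(r["gw_ip"]) == r["bgp_next_hop"]]


rt5 = [
    {"prefix": "192.0.2.0/24", "gw_ip": "10.0.0.23", "bgp_next_hop": "203.0.113.2"},  # NVE2
    {"prefix": "192.0.2.0/24", "gw_ip": "10.0.0.23", "bgp_next_hop": "203.0.113.3"},  # NVE3
]

# TS2 is active: NVE2 advertises M2/IP23.
active_ts2 = valid_rt5_routes(
    rt5, [{"ip": "10.0.0.23", "mac": "M2", "bgp_next_hop": "203.0.113.2"}])

# TS3 takes over: NVE3 advertises M3/IP23 instead.
active_ts3 = valid_rt5_routes(
    rt5, [{"ip": "10.0.0.23", "mac": "M3", "bgp_next_hop": "203.0.113.3"}])
```

Neither RT-5 is re-advertised during the switchover; the route type 2 change alone moves the valid entry from the NVE2 route to the NVE3 route.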
5.3 IRB IP next-hop use-case
In some other cases, the NVEs and DGWs will have just IRB interfaces
as hosts in the EVPN instance. This use-case is referred to as "IRB
forwarding on NVEs with core-facing IRB Interface" in [EVPN-
INTERSUBNET], however the new requirement here is the advertisement
of IP Prefixes as opposed to only host routes. Figure 4 illustrates
an example.
                   NVE1
         +---------------------+                    DGW1
   IP1---|(EVI-1)              |               +-------------+
         |     \      IRB3     |  +---------+  |(EVI-10)     |
         |       (VRF)-(EVI-10)|--|         |--|    IRB1\    |
         |     /               |  |         |  |        (VRF)|---+
       |-|(EVI-2)              |  |         |  +-------------+  _|_
    SN1| +---------------------+  |         |                  (   )
       | +---------------------+  |  VXLAN/ |       DGW2      ( WAN )
       |-|(EVI-2)              |  |  nvGRE  |  +-------------+ (___)
         |     \      IRB4     |  |         |  |(EVI-10)     |   |
         |       (VRF)-(EVI-10)|--|         |--|    IRB2\    |   |
         |     /               |  +---------+  |        (VRF)|---+
   SN2---|(EVI-3)              |               +-------------+
         +---------------------+
                   NVE2
                 Figure 4 IRB IP next-hop use-case
In this case:
(1) NVE1 advertises the following BGP routes for SN1 resolution:
o Route type 2 (MAC route) containing: ML=48, M=IRB3-MAC,
IPL=32, IP=IRB3-IP
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
ESI=0, GW IP address=IRB3-IP
(2) NVE2 advertises the following BGP routes for SN1 resolution:
o Route type 2 (MAC route) containing: ML=48, M=IRB4-MAC,
IPL=32, IP=IRB4-IP
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
ESI=0, GW IP address=IRB4-IP
(3) DGW1 and DGW2 import both received routes based on the RT:
o IRB3-MAC and IRB4-MAC are added to the EVI-10 MAC-VRF along
with their corresponding tunnel information. For the VXLAN use
case, the VTEP will be derived from the MAC route BGP next-hop
and VNI from the Ethernet Tag or MPLS fields. IRB3-MAC - IRB3-
IP and IRB4-MAC - IRB4-IP are added to the ARP table.
o SN1/24 is added to the designated routing context in DGW1 and
DGW2 with next-hop IRB3-IP (and/or IRB4-IP) pointing at the
local EVI-10.
Similar forwarding procedures as the ones described in the previous
use-cases are followed.
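As an illustration of step (3), the following sketch shows how a DGW
could install SN1/24 with both IRB3-IP and IRB4-IP as overlay next-hops
and resolve each of them through the imported MAC routes. The
dictionary field names, the resolve helper and VNI=10 are assumptions
made for the example; whether one or both next-hops are used is left to
the implementation ("and/or" in the text above).

```python
from collections import defaultdict

# Routes as advertised in steps (1) and (2); field names are illustrative.
rt2_routes = [
    {"mac": "IRB3-MAC", "ip": "IRB3-IP", "bgp_next_hop": "NVE1", "vni": 10},
    {"mac": "IRB4-MAC", "ip": "IRB4-IP", "bgp_next_hop": "NVE2", "vni": 10},
]
rt5_routes = [
    {"prefix": "SN1/24", "esi": 0, "gw_ip": "IRB3-IP"},
    {"prefix": "SN1/24", "esi": 0, "gw_ip": "IRB4-IP"},
]

# ARP table and EVI-10 MAC-VRF built from the route type 2 routes.
arp = {r["ip"]: r["mac"] for r in rt2_routes}
mac_vrf = {r["mac"]: (r["bgp_next_hop"], r["vni"]) for r in rt2_routes}

# VRF routing table: prefix -> list of overlay next-hops (GW IPs).
vrf = defaultdict(list)
for r in rt5_routes:
    vrf[r["prefix"]].append(r["gw_ip"])


def resolve(prefix):
    """Return (inner destination MAC, VTEP, VNI) for every usable next-hop."""
    paths = []
    for gw_ip in vrf[prefix]:
        mac = arp.get(gw_ip)
        if mac is None or mac not in mac_vrf:
            continue  # GW IP not resolved by any route type 2: skip it
        vtep, vni = mac_vrf[mac]
        paths.append((mac, vtep, vni))
    return paths


paths = resolve("SN1/24")
```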
5.4 ESI next-hop ("Bump in the wire") use-case
The following figure illustrates an example of inter-subnet
forwarding for a subnet route that uses an ESI as an overlay next-
hop. In this use-case, TS2 and TS3 are layer-2 VA devices without any
IP address that can be included as an overlay next-hop in the GW IP
field of the IP Prefix route.
NVE2 DGW1
+--------+ +---------+ +-------------+
+---TS2(VA)--|(EVI-10)|----| |----|(EVI-10) |
| ESI23 +--------+ | | | IRB1 |
| + | | | (VRF)|---+
| | | | +-------------+ _|_
SN1 | | VXLAN/ | ( )
| | | nvGRE | DGW2 ( WAN )
| + NVE3 | | +-------------+ (___)
| ESI23 +--------+ | |----|(EVI-10) | |
+---TS3(VA)--|(EVI-10)|----| | | IRB2 | |
+--------+ +---------+ | (VRF)|---+
+-------------+
Figure 5 ESI next-hop use-case
Since neither TS2 nor TS3 can run any routing protocol or has an IP
address assigned, an ESI, i.e. ESI23, will be provisioned on the
attachment ports of NVE2 and NVE3. This model supports VA redundancy
in a similar way to the one described in section 5.2 for the floating
IP next-hop use-case, only using the EVPN A-D route instead of the
MAC advertisement route to advertise the location of the overlay
next-hop. The procedure is explained below:
(1) NVE2 advertises the following BGP routes for TS2:
o Route type 1 (A-D route for EVI-10) containing: ESI=ESI23 and
the corresponding tunnel information (Ethernet Tag and/or MPLS
label). Assuming the ESI is active on NVE2, NVE2 will
advertise this route.
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
ESI=ESI23, GW IP address=0.
(2) NVE3 advertises the following BGP routes for TS3:
o Route type 1 (A-D route for EVI-10) containing: ESI=ESI23 and
the corresponding tunnel information (Ethernet Tag and/or MPLS
label). NVE3 will advertise this route assuming the ESI is
active on NVE3. Note that if the resiliency mechanism for TS2
and TS3 is in active-active mode, both NVE2 and NVE3 will send
the A-D route. Otherwise, that is, if the resiliency is active-
standby, only the NVE owning the active ESI will advertise the
A-D route for ESI23.
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
ESI=ESI23, GW IP address=0.
(3) DGW1 and DGW2 import the received routes based on the RT:
o The tunnel information to get to ESI23 is installed in DGW1
and DGW2. For the VXLAN use case, the VTEP will be derived
from the A-D route BGP next-hop and VNI from the Ethernet Tag
or MPLS fields (see [EVPN-OVERLAYS]).
o SN1/24 is added to the designated routing context in DGW1 and
DGW2 with next-hop ESI23 pointing at the local EVI-10.
(4) When DGW1 receives a packet from the WAN with destination IPx,
where IPx belongs to SN1/24:
o A destination IP lookup is performed on the DGW1 VRF routing
table and next-hop=ESI23 is found. The tunnel information to
encapsulate the packet will be derived from the route-type 1
(A-D route) received for ESI23.
o The IP packet destined to IPx is encapsulated with:
. Source inner MAC = IRB1 MAC
. Destination inner MAC = M2 (this MAC will be obtained
after a lookup in the VRF ARP table or in the EVI-10
FDB table associated to ESI23).
. Tunnel information provided by the A-D route for ESI23
(VNI, VTEP IP and MACs for the VXLAN case).
(5) When the packet arrives at NVE2:
o Based on the tunnel information (VNI for the VXLAN case), the
EVI-10 context is identified for a MAC lookup (assuming MAC
disposition model).
o Encapsulation is stripped off and, based on a MAC lookup
(assuming MAC forwarding on the egress NVE), the packet is
forwarded to TS2, where it will be properly forwarded.
(6) If the redundancy protocol running between TS2 and TS3 follows an
active/standby model and there is a failure, appointing TS3 as the
new active TS for SN1, TS3 will now own the connectivity to SN1 and
will signal this new ownership (GARP message or similar). Upon
receiving the new owner's notification, NVE3 will issue a route type
1 for ESI23, whereas NVE2 will withdraw its A-D route for ESI23.
DGW1 and DGW2 will update their tunnel information to resolve ESI23.
No changes are carried out in the VRF routing table.
In the DGW1/2 BGP RIB, there will be two route type 5 routes for SN1
(from NVE2 and NVE3) but only the one with the same BGP next-hop as
the ESI23 route type 1 BGP next-hop will be valid.
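The validity rule above and the active/standby switchover of step (6)
can be sketched as follows. This is a simplified model under stated
assumptions: the class Dgw and its methods are hypothetical names,
VNI=10 is assumed for EVI-10, and a route type 5 is considered valid
only when an A-D route for the same ESI has been received from the same
BGP next-hop.

```python
class Dgw:
    """Toy model of DGW state for the ESI next-hop use-case."""

    def __init__(self):
        self.rt5_rib = {}  # prefix -> {BGP next-hop: ESI} (route type 5)
        self.ad_rib = {}   # ESI -> {BGP next-hop: VNI} (route type 1)

    def import_rt5(self, prefix, esi, bgp_next_hop):
        self.rt5_rib.setdefault(prefix, {})[bgp_next_hop] = esi

    def import_ad(self, esi, bgp_next_hop, vni):
        self.ad_rib.setdefault(esi, {})[bgp_next_hop] = vni

    def withdraw_ad(self, esi, bgp_next_hop):
        self.ad_rib.get(esi, {}).pop(bgp_next_hop, None)

    def resolve(self, prefix):
        """Return sorted (VTEP, VNI) pairs for the valid route type 5s."""
        tunnels = []
        for next_hop, esi in self.rt5_rib.get(prefix, {}).items():
            # Valid only if the same next-hop also advertised the A-D route.
            vni = self.ad_rib.get(esi, {}).get(next_hop)
            if vni is not None:
                tunnels.append((next_hop, vni))
        return sorted(tunnels)


dgw1 = Dgw()
dgw1.import_rt5("SN1/24", "ESI23", bgp_next_hop="NVE2")
dgw1.import_rt5("SN1/24", "ESI23", bgp_next_hop="NVE3")
dgw1.import_ad("ESI23", bgp_next_hop="NVE2", vni=10)  # TS2 active
active = dgw1.resolve("SN1/24")

# Failure: NVE2 withdraws its A-D route and NVE3 advertises one.
dgw1.withdraw_ad("ESI23", bgp_next_hop="NVE2")
dgw1.import_ad("ESI23", bgp_next_hop="NVE3", vni=10)
failover = dgw1.resolve("SN1/24")
```

Both route type 5 routes stay in the RIB throughout; only the A-D
route decides which one is used, so the VRF routing table itself does
not change on failover.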
5.5 IRB forwarding without core-facing IRB use-case (VRF-to-VRF)
This use-case is referred to as "IRB forwarding on NVEs without core-
facing IRB Interface" in [EVPN-INTERSUBNET]; however, the new
requirement here is the advertisement of IP Prefixes as opposed to
only host routes. In the previous examples, the EVI can
connect IRB interfaces and any other Tenant Systems connected to it.
EVPN provides connectivity for:
a) Traffic destined to the IRB IP interfaces as well as
b) Traffic destined to IP subnets sitting behind the IRB interfaces,
e.g. SN1 or SN2.
In order to provide connectivity for (a) we need MAC/IP routes (RT-2)
distributing IRB MACs and IPs. Connectivity type (b) is accomplished
by the exchange of IP Prefix routes (route type 5) for IPs and
subnets sitting behind IRBs.
In some cases, connectivity type (a) (see above) is not required and
the EVI instance is connecting only IRB interfaces, which are never
the final destination of any packet. This use case is depicted in the
diagram below and we refer to it as the "IRB forwarding on NVEs
without core-facing IRB Interface" use-case:
NVE1
+------------+
IP1-----|(EVI-1) | DGW1
| \ | +---------+ +-----+
| (VRF)|----| |----|(VRF)|----+
| / | | | +-----+ |
|---|(EVI-2) | | | _|_
| +------------+ | | ( )
SN1| | VXLAN/ | ( WAN )
| NVE2 | nvGRE | (___)
| +------------+ | | |
|---|(EVI-2) | | | DGW2 |
| \ | | | +-----+ |
| (VRF)|----| |----|(VRF)|----+
| / | +---------+ +-----+
SN2-----|(EVI-3) |
+------------+
Figure 6 Inter-subnet forwarding without core-facing IRB interfaces
In this case, we need to provide connectivity from/to IP hosts in
SN1, SN2, IP1 and hosts sitting at the other end of the WAN. The EVI
in the core just connects all the IRBs in NVE1, NVE2, DGW1 and DGW2
but there will not be any IP host in this core EVI that is the final
destination of any IP packet.
Therefore there is no need to define IRB interfaces (IRBs are not
represented in the diagram). This is the reason why we refer to this
solution as "Inter-subnet forwarding without core-facing IRB
interfaces" or "VRF-to-VRF" solution.
In this case, the proposal is to use EVPN type 5 routes and a BGP
tunnel encapsulation attribute as in [EVPN-INTERSUBNET], where the
following information is carried:
o Route type 5 Eth-Tag ID can contain the core instance VNI (if
the VNI is global; otherwise, for locally significant VNIs, an
MPLS label field may be added with a 20-bit VNI encoded in the
label space).
o Route type 5 IP address length and IP address, as explained in
the previous sections.
o Route type 5 GW IP address=0 and ESI=0.
o Tunnel Encapsulation Attribute as per [EVPN-INTERSUBNET]
containing the following fields and including the GW MAC to be
used in the overlay encapsulation:
o Tunnel Type (2 octets) is:
+ TBD - VXLAN Encapsulation
+ TBD - NVGRE Encapsulation
o Length (2 octets): the total number of octets of the value
field.
o Address Length= 6 bytes (for MAC address)
o Address= GW MAC Address, a MAC address associated to the
system advertising the route. This MAC address identifies
the NVE/DGW and can be re-used for all the IP-VRFs in the
node.
Example of prefix advertisement for the IPv4 prefix SN1/24 advertised
from NVE1:
(1) NVE1 advertises the following BGP route for SN1:
o Route type 5 (IP Prefix route) containing: Eth-Tag=VNI=10
(assuming global VNI), IPL=24, IP=SN1. In addition to that, a
Tunnel Encapsulation Attribute will be sent, where: Tunnel-type=
VXLAN or NVGRE, and the address value will contain a GW MAC
address= NVE1 MAC.
(2) DGW1 imports the received route from NVE1 and SN1/24 is added to
the designated routing context. The next-hop for SN1/24 will be given
by the route type 5 BGP next-hop (NVE1), which is resolved to a
tunnel. For instance: if the tunnel is VXLAN based, the BGP next-hop
will be resolved to a VXLAN tunnel where: destination-VTEP= NVE1 IP,
VNI=10, inner destination MAC = NVE1 MAC (derived from the GW MAC
value in the Tunnel Encapsulation attribute).
(3) When DGW1 receives a packet from the WAN with destination IPx,
where IPx belongs to SN1/24:
o A destination IP lookup is performed on the DGW1 VRF routing table
and next-hop= "NVE1 IP" is found. The tunnel information to
encapsulate the packet will be derived from the route-type 5
received for SN1.
o The IP packet destined to IPx is encapsulated with: Source inner
MAC = DGW1 MAC, Destination inner MAC = NVE1 MAC, Source outer IP
(source VTEP) = DGW1 IP, Destination outer IP (destination VTEP) =
NVE1 IP
(4) When the packet arrives at NVE1:
o Based on the tunnel information (VNI for the VXLAN case), the
routing context is identified for an IP lookup.
o An IP lookup is performed in the routing context, where SN1 turns
out to be a local subnet associated to EVI-2. A subsequent lookup
in the ARP table and the EVI-2 MAC-VRF will return the forwarding
information for the packet in EVI-2.
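Steps (2) and (3) of this example can be sketched as follows, assuming
a VXLAN tunnel and a global VNI. All concrete values are hypothetical
stand-ins: 192.0.2.0/24 represents SN1/24, and the VTEP addresses and
MAC addresses are taken from documentation ranges. The names
VxlanNextHop, lookup and encapsulate are likewise illustrative.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class VxlanNextHop:
    dst_vtep: str       # route type 5 BGP next-hop
    vni: int            # Eth-Tag ID of the route type 5 (global VNI)
    inner_dst_mac: str  # GW MAC from the Tunnel Encapsulation Attribute


# Hypothetical values standing in for SN1/24, NVE1 and DGW1.
SN1 = ipaddress.ip_network("192.0.2.0/24")
NVE1_IP, NVE1_MAC = "198.51.100.1", "00:00:5e:00:53:01"
DGW1_IP, DGW1_MAC = "198.51.100.11", "00:00:5e:00:53:11"

# Step (2): DGW1 installs SN1/24 from the imported route type 5.
vrf = {SN1: VxlanNextHop(dst_vtep=NVE1_IP, vni=10, inner_dst_mac=NVE1_MAC)}


def lookup(dst_ip):
    """Longest-prefix match of dst_ip in the VRF; None if no route."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in vrf if addr in net]
    if not matches:
        return None
    return vrf[max(matches, key=lambda net: net.prefixlen)]


def encapsulate(dst_ip):
    """Step (3): header fields DGW1 would use for a packet to dst_ip."""
    nh = lookup(dst_ip)
    if nh is None:
        return None
    return {
        "inner_src_mac": DGW1_MAC,
        "inner_dst_mac": nh.inner_dst_mac,
        "outer_src_ip": DGW1_IP,
        "outer_dst_ip": nh.dst_vtep,
        "vni": nh.vni,
    }


headers = encapsulate("192.0.2.25")
```

Note that, unlike the previous use-cases, no ARP table or MAC-VRF
lookup is needed on DGW1: the route type 5 and its Tunnel
Encapsulation Attribute carry everything required to build the
encapsulation.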
6. Conclusions
A new EVPN route type 5 for the advertisement of IP Prefixes is
proposed in this document. This new route type will have a
differentiated role from the RT-2 route and will address all the Data
Center (or NVO-based networks in general) inter-subnet connectivity
scenarios in which IP Prefix advertisement is required. Using this
new RT-5 route, an IP Prefix will be advertised along with an overlay
next-hop that can be a GW IP address, an ESI or a GW MAC address. As
discussed throughout the document, IP-VPN cannot address all the
inter-subnet use-cases in an NVO-based DC and the existing EVPN RT-2
does not meet the requirements for all the DC use cases; therefore, a
new EVPN route type is required.
This new EVPN route type 5 decouples the IP Prefix advertisements
from the MAC route advertisements in EVPN, hence:
a) Allows the clean and clear announcement of IPv4 or IPv6 prefixes
in an NLRI with no MAC addresses in the route key, so that only IP
information is used in BGP route comparisons.
b) Since the route type is different from the MAC/IP advertisement
route, the advertisement of prefixes will be excluded from all the
procedures defined for the advertisement of VM MACs, e.g. MAC
Mobility or aliasing. As a result of that, the current EVPN
procedures do not need to be modified.
c) Allows a flexible implementation where the prefix can be linked to
different types of next-hops: MAC address, IP address, IRB IP
address, ESI, etc. and these MAC or IP addresses do not need to
reside in the advertising NVE.
d) An EVPN implementation not requiring IP Prefixes can simply
discard them by looking at the route type value. An unknown route
type MUST be ignored by the receiving NVE/PE.
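Point d) can be illustrated with a short sketch of how a receiver
might dispatch on the route type. The function name process_nlri and
the "process"/"ignore" return values are assumptions made for the
example; route types 1 to 4 are those of the base EVPN specification
and 5 is the IP Prefix route proposed here.

```python
# Route types 1-4 come from the base EVPN specification; 5 is the
# IP Prefix route proposed in this document.
ETHERNET_AD = 1
MAC_IP = 2
INCLUSIVE_MCAST = 3
ETHERNET_SEGMENT = 4
IP_PREFIX = 5


def process_nlri(route_type, supported_types):
    """Return "process" for supported route types and "ignore" otherwise."""
    return "process" if route_type in supported_types else "ignore"


# A PE that does not implement IP Prefix routes, and one that does.
legacy_pe = {ETHERNET_AD, MAC_IP, INCLUSIVE_MCAST, ETHERNET_SEGMENT}
prefix_capable_pe = legacy_pe | {IP_PREFIX}
```

With these definitions, process_nlri(IP_PREFIX, legacy_pe) yields
"ignore", while the same route is processed by a PE whose supported
set includes route type 5.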
7. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in RFC-2119
[RFC2119].
8. Security Considerations
9. IANA Considerations
10. References
10.1 Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
Networks (VPNs)", RFC 4364, February 2006.
10.2 Informative References
[EVPN] Sajassi et al., "BGP MPLS Based Ethernet VPN",
draft-ietf-l2vpn-evpn-03.txt, work in progress, February 2013.
[EVPN-OVERLAYS] Sajassi-Drake et al., "A Network Virtualization
Overlay Solution using EVPN", draft-sd-l2vpn-evpn-overlay-03.txt,
work in progress, June 2014.
[EVPN-INTERSUBNET] Sajassi et al., "IP Inter-Subnet Forwarding in
EVPN", draft-sajassi-l2vpn-evpn-inter-subnet-forwarding-04.txt,
work in progress, July 2014.
11. Acknowledgments
The authors would like to thank Mukul Katiyar and Senthil
Sathappan for their valuable feedback and contributions.
12. Authors' Addresses
Jorge Rabadan
Alcatel-Lucent
777 E. Middlefield Road
Mountain View, CA 94043 USA
Email: jorge.rabadan@alcatel-lucent.com
Wim Henderickx
Alcatel-Lucent
Email: wim.henderickx@alcatel-lucent.com
Florin Balus
Nuage Networks
Email: florin@nuagenetworks.net
Aldrin Isaac
Bloomberg
Email: aisaac71@bloomberg.net
Senad Palislamovic
Alcatel-Lucent
Email: senad.palislamovic@alcatel-lucent.com
John E. Drake
Juniper Networks
Email: jdrake@juniper.net
Ali Sajassi
Cisco
Email: sajassi@cisco.com