Overlay Network Tenant System Address Migration
draft-merged-nvo3-ts-address-migration-00
| Document | Type | Active Internet-Draft (individual) | |
|---|---|---|---|
| Authors | Yakov Rekhter , Linda Dunbar , Rahul Aggarwal , Ravi Shekhar , Wim Henderickx , Luyuan Fang , Ali Sajassi | ||
| Last updated | 2014-10-09 | ||
| Stream | (None) | ||
| Formats | plain text htmlized pdfized bibtex | ||
| Stream | Stream state | (No stream defined) | |
| Consensus boilerplate | Unknown | ||
| RFC Editor Note | (None) | ||
| IESG | IESG state | I-D Exists | |
| Telechat date | (None) | ||
| Responsible AD | (None) | ||
| Send notices to | (None) |
NVO3 Working Group Y. Rekhter
Internet Draft Juniper Networks
Intended status: Standards track L. Dunbar
Expires: April 2015 Huawei
R. Aggarwal
Arktan Inc
R. Shekhar
Juniper Networks
W. Henderickx
Alcatel-Lucent
L. Fang
Microsoft
A. Sajassi
Cisco
October 9, 2014
Overlay Network Tenant System Address Migration
draft-merged-nvo3-ts-address-migration-00.txt
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. This document may not be modified,
and derivative works of it may not be created, except to publish it
as an RFC and to translate it into languages other than English.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Lopez, et al. Expires April 9, 2015 [Page 1]
Internet-Draft NVO3 Mobility Scheme October 2014
This Internet-Draft will expire on April 9, 2015.
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Abstract
This document describes the schemes to overcome the network-related
issues to achieve seamless Virtual Machine mobility in the data
center and between data centers.
Table of Contents
1. Introduction
2. Conventions used in this document
3. Terminology
4. Scheme to resolve VLAN-IDs usage in L2 access domains
5. Layer 2 Extension
5.1. Layer 2 Extension Problem
5.2. NVA based Layer 2 Extension Solution
5.3. E-VPN based Layer 2 Extension Solution
6. Optimal IP Routing
6.1. Preserving Policies
6.2. TS Default Gateway solutions
6.2.1. E-VPN based TS Default Gateway Solutions
6.2.1.1. E-VPN based TS Default Gateway Solution 1
6.2.1.2. E-VPN based TS Default Gateway Solution 2
6.2.2. Distributed Proxy Default Gateway Solution
6.3. Triangular Routing
6.3.1. NVA based Intra Data Center Triangular Routing Solution
6.3.2. E-VPN based Intra Data Center Triangular Routing Solution
7. L3 Address Migration
8. Managing duplicated addresses
9. Manageability Considerations
10. Security Considerations
11. IANA Considerations
12. Acknowledgements
13. References
13.1. Normative References
13.2. Informative References
1. Introduction
An important feature of data centers identified in [nvo3-problem] is
the support of Tenant System (TS) mobility, i.e., Virtual Machine (VM)
mobility, within the data center and between data centers. This
document describes the schemes to
overcome the network-related issues to achieve seamless Virtual
Machine mobility in the data center and between data centers, where
seamless mobility is defined as the ability to move a TS from one
server in a data center to another server in the same or different
data center, while retaining the IP and MAC address of the TS. In
the context of this document the term mobility or a reference to
moving a TS should be considered to imply seamless mobility, unless
otherwise stated.
Note that in the scenario where a TS is moved between servers
located in different data centers, there are certain issues related
to the current state of the art of the Virtual Machine technology,
the bandwidth that may be available between the data centers, the
distance between the data centers, the ability to manage and operate
such TS mobility, storage-related issues (the moved TS has to have
access to the same virtual disk), etc. Discussion of these issues
is outside the scope of this document.
2. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [RFC2119].
In this document, these words will appear with that interpretation
only when in ALL CAPS. Lower case uses of these words are not to be
interpreted as carrying RFC-2119 significance.
DC: Data Center
DCBR: Data Center Border Router
LAG: Link Aggregation Group
POD: Modular Performance Optimized Data Center. POD and Data Center
are used interchangeably in this document.
ToR: Top of Rack switch
TS: Tenant System (used interchangeably with VM on servers
supporting Virtual Machines)
VEPA: Virtual Ethernet Port Aggregator (IEEE802.1Qbg)
VN: Virtual Network
3. Terminology
In this document the term "Top of Rack Switch (ToR)" is used to
refer to a switch in a data center that is connected to the servers
that host TSs. A data center may have multiple ToRs. Some servers
may have embedded blade switches, some servers may have virtual
switches to interconnect the TSs, and some servers may not have any
embedded switches. When External Bridge Port Extenders (as defined
by 802.1BR) are used to connect the servers to the data center
network, the ToR switch is the Controlling Bridge.
Several data centers or PODs could be connected by a network. In
addition to providing interconnect among the data centers/PODs, such
a network could provide connectivity between the TSs hosted in these
data centers and the sites that contain hosts communicating with
such TSs. Each data center has one or more Data Center Border Routers
(DCBRs) that connect the data center to the network and provide
(a) connectivity between TSs hosted in the data center and TSs
hosted in other data centers, and (b) connectivity between TSs
hosted in the data center and hosts communicating with these TSs.
The following figure illustrates the above:
           __________
          (          )
         ( Data Center)
        ( Interconnect )--------------------------
         (   Network  )                           |
          (__________)                            |
            |      |                              |
        ----        ----                          |
        |              |                          |
--------+--------------+----------------   ---------------
|       |              |    Data       |   |             |
|    ------         ------  Center     |   | Data Center |
|   | DCBR |       | DCBR | /POD       |   |    /POD     |
|    ------         ------             |   ---------------
|       |              |               |
|       ---          ---               |
|        __|________|__                |
|       (              )               |
|      (  Data Center   )              |
|       (   Network    )               |
|        (____________)                |
|           |      |                   |
|       ----        ----               |
|       |              |               |
|  ------------      -----             |
| | ToR Switch |    | ToR |            |
|  ------------      -----             |
|   |                 |                |
|   |  -----------    |  ----------    |
|   |--| Server  |    |--| Server |    |
|   |  | vSwitch |    |  ----------    |
|   |  |   ----  |    |                |
|   |  |  | TS | |    |  ----------    |
|   |  |   ----  |    ---| Server |    |
|   |  |  | TS | |       ----------    |
|   |  |   ----  |                     |
|   |  |  | TS | |                     |
|   |  |   ----  |                     |
|   |  -----------                     |
|   |                                  |
|   |  ----------                      |
|   |--| Server |                      |
|   |  ----------                      |
|   |                                  |
|   |  ----------                      |
|   ---| Server |                      |
|      ----------                      |
|                                      |
----------------------------------------
Figure 1: A Typical Data Center Network
The data centers/PODs and the network that interconnects them may be
either (a) under the same administrative control, or (b) controlled
by different administrations.
Consider a set of TSs that (as a matter of policy) are allowed to
communicate with each other, and a collection of devices that
interconnect these TSs. If communication among any TSs in that set
could be accomplished in such a way as to preserve MAC source and
destination addresses in the Ethernet header of the packets
exchanged among these TSs (as these packets traverse from their
sources to their destinations), we will refer to such a set of TSs as
a Layer 2 based Virtual Network (L2-based VN) or Layer 2 based Closed
User Group (L2-based CUG). In this document, the terms Closed User
Group and Virtual Network (VN) are used interchangeably.
A given TS may be a member of more than one VN or L2-based VN.
In terms of IP address assignment this document assumes that all TSs
of a given L2-based VN have their IP addresses assigned out of a
single IP prefix. Thus, in the context of this document a single IP
subnet corresponds to a single L2-based VN. If a given TS is a
member of more than one L2-based VN, this TS would have multiple IP
addresses and multiple logical interfaces, one IP address and one
logical interface per each such VN.
A TS that is a member of a given L2-based VN may (as a matter of
policy) be allowed to communicate with TSs that belong to other L2-
based VNs, or with other hosts. Such communication involves IP
forwarding, and thus would result in changing MAC source and
destination addresses in the Ethernet header of the packets being
exchanged.
In this document the term "L2 physical attachment" refers to a
collection of interconnected devices attached to an NVE that perform
forwarding based on the information carried in the Ethernet header.
A trivial L2 physical attachment consists of just one non-
virtualized server. In a non-trivial L2 physical attachment (domain
that contains multiple forwarding entities) forwarding could be
provided by such layer 2 technologies as Spanning Tree Protocol
(STP), VEPA (IEEE802.1Qbg), etc. Note that any multi-chassis LAG
cannot span more than one L2 physical attachment. This document
assumes that a layer 2 access domain is an L2 physical attachment.
A physical server connected to a given L2 physical domain may host
TSs that belong to different L2-based VNs (while each of these VNs
may span multiple L2 physical domains). If an L2 physical attachment
contains servers that host TSs belonging to different L2-based VNs,
then enforcing L2-based VNs boundaries among these TSs within that
domain is accomplished by relying on Layer 2 mechanisms (e.g.
VLANs).
We say that an L2 physical attachment contains a given TS (or that a
given TS is in a given L2 physical attachment), if the server
presently hosting this TS is part of that domain, or the server is
connected to a ToR that is part of that domain.
We say that a given L2-based VN is present within a given data
center if one or more TSs that are part of that VN are presently
hosted by the servers located in that data center.
In the context of this document, when we talk about the VLAN-ID used
by a given TS, we refer to the VLAN-ID carried by the traffic that is
within the same L2 physical attachment as the TS, and that is either
originated by or destined to that TS. That is, unless stated
otherwise, a VLAN-ID has only local significance within the L2
physical attachment.
Some of the TS-mobility solutions described in this document are E-
VPN based. When using E-VPN in NVO3 environment, the NVE function is
on the PE node. NVE-PE is used to describe the E-VPN PE node that
supports the NVE function.
4. Scheme to resolve VLAN-IDs usage in L2 access domains
This document assumes that within a given non-trivial L2 physical
attachment traffic from/to TSs belonging to different L2-based VNs
MUST have different VLAN-IDs.
To support tens of thousands of virtual networks, the local VLAN-ID
associated with client payload under each NVE has to be locally
significant. Therefore, the same L2-based VN MAY have either the
same or different VLAN-IDs under different NVEs. Thus when a given
TS moves from one non-trivial L2 physical attachment to another, the
VLAN-ID of the traffic from/to the TS in the former may be different
from that in the latter, and thus cannot be assumed to stay the same.
To describe the solution more clearly, the following terms are used:
- Customer administered VLAN-IDs (usually hard coded in a TS's Guest
OS, so they cannot be changed when the TS moves from one NVE to
another). Some TSs may not have a VLAN-ID attached.
- Provider administered VLAN-IDs of local significance, and
- Provider administered VN-IDs of global significance.
In the scenario where there are provider administered VLAN-IDs of
local significance (e.g. NVE in a ToR), the value is selected by the
NVA from the pool of unused VLAN-IDs when the first local TS of a VN
is added, and returned by the NVA to the pool of unused VLAN-IDs when
the last TS leaves. For TSs with a hard coded VLAN-ID, an entity, most
likely the first switch (virtual or physical) to which the TS is
attached, has to change the locally administered VLAN-IDs to the TSs'
hard coded VLAN-IDs. For untagged TSs, the first switch has to remove
the locally administered VLAN-IDs before sending packets to the TSs.
This section describes how:
. The NVA manages the pool of unused VLAN-IDs in each L2 access
domain.
. An NVE reports to the NVA when the first local TS of a VN becomes
reachable, or when no TS of a VN is reachable by the NVE any longer.
. The NVA can push the global VN ID <-> locally administered VID
mapping to the NVE, or the NVE can pull it upon detecting a newly
attached VN.
. The NVA manages, on the first switch to which a TS is attached, the
mapping between the TS's own VLAN-ID and the "locally administered
VID".
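The pool management summarized above can be sketched as follows. This is a minimal illustration, not a specified NVA interface: the class name `LocalVlanPool`, its methods, and the VN ID and VLAN-ID values are all hypothetical, and a per-VN counter of local TSs is assumed as the way to detect the "first" and "last" TS.

```python
class LocalVlanPool:
    """Hypothetical NVA-side pool of locally significant VLAN-IDs
    for one L2 access domain."""

    def __init__(self, vlan_ids):
        self.unused = sorted(vlan_ids)  # VLAN-IDs not bound to any VN
        self.vid_by_vn = {}             # global VN ID -> local VLAN-ID
        self.ts_count = {}              # global VN ID -> number of local TSs

    def ts_added(self, vn_id):
        """A TS of vn_id became reachable; allocate a VLAN-ID if it is the first."""
        if vn_id not in self.vid_by_vn:
            if not self.unused:
                raise RuntimeError("no unused VLAN-ID left in this domain")
            self.vid_by_vn[vn_id] = self.unused.pop(0)
            self.ts_count[vn_id] = 0
        self.ts_count[vn_id] += 1
        return self.vid_by_vn[vn_id]

    def ts_removed(self, vn_id):
        """A TS of vn_id left; return the VLAN-ID to the pool after the last one."""
        self.ts_count[vn_id] -= 1
        if self.ts_count[vn_id] == 0:
            del self.ts_count[vn_id]
            self.unused.append(self.vid_by_vn.pop(vn_id))
            self.unused.sort()


pool = LocalVlanPool(range(100, 103))
vid_a = pool.ts_added(vn_id=5001)        # first TS of VN 5001: allocates
vid_a_again = pool.ts_added(vn_id=5001)  # second TS of VN 5001: reuses
vid_b = pool.ts_added(vn_id=5002)        # first TS of VN 5002: allocates
pool.ts_removed(5001)
pool.ts_removed(5001)                    # last TS of VN 5001 left: VLAN-ID freed
```

Because the pool is kept per access domain, the same VN can end up with different VLAN-IDs under different NVEs, which is the behavior the text allows.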
Here is the detailed procedure:
. The NVE should get from the NVA the specific VNID for untagged data
frames arriving at each Virtual Access Point ([nvo3-framework],
Section 3.1.1) of the NVE.
Since local VLAN-IDs under each NVE are locally significant,
here are the possible ways for ingress NVE to assign VLAN-ID in
the overlay header for data frames destined to other NVEs:
a) Carry the VLAN-ID that comes in at the ingress Virtual Access
Point. Preserving the VLAN-ID can be used to provide bundled
service/PVLAN. In this case many VLAN-IDs at the ingress could map
to one logical VN (n-to-1 mapping).
b) Do not carry any VLAN-ID and use the logical VN identifier.
The egress NVE gets the VLAN-ID from the NVA to put on the packet
before sending it to the attached TSs. This is a 1-to-1 mapping
between VLAN-ID and logical VN.
. If a data frame is tagged before reaching the NVE's Virtual Access
Point by a switch port that adds a VLAN-ID to untagged data frames,
the NVA should inform that first switch port of the specific VLAN-ID
to insert into the data frames.
. If data frames from a TS are already tagged, the first port
facing the TS has to be informed by the NVA of the new local VLAN-ID
that replaces the VLAN-ID encoded in the data frames.
For data frames coming from the network side towards TSs (i.e.
inbound traffic towards TSs), the first switching port facing the
TSs has to convert the VLAN-IDs encoded in the data frames to the
VLAN-IDs used by the TSs.
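The VLAN-ID conversion at the first switch port can be sketched as below. This is an illustration under assumptions: the `Frame` and `FirstSwitchPort` names are hypothetical, an untagged frame is modeled as `vlan=None`, and both VLAN values are assumed to have been supplied by the NVA.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    vlan: Optional[int]  # None models an untagged frame
    payload: str


class FirstSwitchPort:
    """Hypothetical TS-facing port performing the VLAN-ID conversion."""

    def __init__(self, ts_vlan: Optional[int], local_vlan: int):
        self.ts_vlan = ts_vlan        # VLAN-ID hard coded in the TS, or None
        self.local_vlan = local_vlan  # locally administered VLAN-ID

    def from_ts(self, frame: Frame) -> Frame:
        """Outbound: insert or replace the tag with the local VLAN-ID."""
        if frame.vlan != self.ts_vlan:
            raise ValueError("frame does not carry the VLAN-ID expected from this TS")
        return Frame(self.local_vlan, frame.payload)

    def to_ts(self, frame: Frame) -> Frame:
        """Inbound: restore the TS's own VLAN-ID, or strip the tag."""
        if frame.vlan != self.local_vlan:
            raise ValueError("frame does not carry the local VLAN-ID of this port")
        return Frame(self.ts_vlan, frame.payload)


tagged_port = FirstSwitchPort(ts_vlan=10, local_vlan=205)
untagged_port = FirstSwitchPort(ts_vlan=None, local_vlan=205)
```

When the TS moves to another L2 physical attachment, only `local_vlan` would change in this model; the value seen by the TS stays the same.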
5. Layer 2 Extension
5.1. Layer 2 Extension Problem
Consider a scenario where a TS that is a member of a given L2-based
VN moves from one server to another, and these two servers are in
different L2 physical attachments, where these domains may be
located in the same or different data centers (or PODs). In order to
enable communication between this TS and other TSs of that L2-based
VN, the new L2 physical attachment must become interconnected with
the other L2 physical attachment(s) that presently contain the rest
of the TSs of that VN, and the interconnect must not violate the L2-
based VN requirement to preserve source and destination MAC
addresses in the Ethernet header of the packets exchanged between
this TS and other members of that VN.
Moreover, if the previous L2 physical attachment no longer contains
any TSs of that VN, the previous domain no longer needs to be
interconnected with the other L2 physical attachment(s) that
contain the rest of the TSs of that VN.
Note that supporting TS mobility implies that the set of L2 physical
attachments that contain TSs that belong to a given L2-based VN may
change over time (new domains added, old domains deleted).
We will refer to this as the "layer 2 extension problem".
Note that the layer 2 extension problem is a special case of
maintaining connectivity in the presence of TS mobility, as the
former restricts communicating TSs to a single/common L2-based VN,
while the latter does not.
5.2. NVA based Layer 2 Extension Solution
Assume NVO3's NVA has at least the following information for each
TS:
. Inner Address: TS (host) Address family (IPv4/IPv6, MAC,
virtual network Identifier MPLS/VLAN, etc)
. Outer Address: The list of locally attached edges (NVEs).
Normally one TS is attached to one edge, but a TS could also be
attached to 2 edges for redundancy (dual homing). A TS is
rarely attached to more than 2 edges, though that is possible;
. VN Context (VN ID and/or VN Name)
. Timer for NVEs to keep the entry when pushed down to or pulled
from NVEs.
. Optionally the list of interested remote edges (NVEs). This
information is for NVA to promptly update relevant edges (NVEs)
when there is any change to this TS' attachment to edges
(NVEs). However, this information doesn't have to be kept per
TS. It can be kept per VN.
The NVA can offer services in Push mode, Pull mode, or a combination
of the two.
In this solution, the NVEs are connected via underlay IP network.
For each VN, the NVA informs all the NVEs to which the TSs of the
given VN are attached.
When the last TS of a VN is moved out of an NVE, either the NVE
confirms with the NVA, or the NVA notifies the NVE, that the NVE is to
remove its connectivity to the VN. When an NVE needs to support
connectivity to a VN not currently supported (as a result of TS turn
up, or TS migration), the NVA will push the necessary VN information
into the NVE.
The term "NVE being connected to a VN" means that the NVE at least
has:
. the inner-outer address mapping information for all the TSs in
the VN, or the ability to pull the mapping from the NVA,
. the mapping of the local VLAN-ID to the VNID used by the overlay
header, and
. the VN's default gateway IP/MAC address.
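The push mode described in this section can be sketched as follows. This is a simplified model, not a specified protocol: the `Nva` and `Nve` classes and the `set_attachment` method are hypothetical, only the inner-to-outer address mapping is modeled (the VLAN-ID mapping, default gateway, and entry timer are left out), and NVEs are identified by plain names.

```python
class Nve:
    """Hypothetical NVE holding the mappings pushed by the NVA."""

    def __init__(self, name):
        self.name = name
        self.mappings = {}  # VN ID -> {inner address: set of NVE names}

    def install(self, vn_id, table):
        self.mappings[vn_id] = {inner: set(edges) for inner, edges in table.items()}

    def withdraw(self, vn_id):
        self.mappings.pop(vn_id, None)


class Nva:
    """Hypothetical NVA that pushes per-VN mappings to attached NVEs."""

    def __init__(self, nves):
        self.nves = {nve.name: nve for nve in nves}
        self.table = {}  # VN ID -> {inner address: set of NVE names}

    def _attached_nves(self, vn_id):
        return set().union(*self.table.get(vn_id, {}).values())

    def set_attachment(self, vn_id, inner, nve_names):
        """Record where a TS is attached; an empty set removes the TS."""
        before = self._attached_nves(vn_id)
        vn = self.table.setdefault(vn_id, {})
        if nve_names:
            vn[inner] = set(nve_names)
        else:
            vn.pop(inner, None)
        after = self._attached_nves(vn_id)
        for name in before - after:   # last TS of the VN left these NVEs
            self.nves[name].withdraw(vn_id)
        for name in after:            # every attached NVE gets the full table
            self.nves[name].install(vn_id, vn)


nve1, nve2, nve3 = Nve("nve1"), Nve("nve2"), Nve("nve3")
nva = Nva([nve1, nve2, nve3])
TS1, TS2 = "02:00:00:00:00:01", "02:00:00:00:00:02"
nva.set_attachment(7, TS1, {"nve1"})
nva.set_attachment(7, TS2, {"nve2"})
nva.set_attachment(7, TS1, {"nve3"})  # TS1 migrates from nve1 to nve3
```

After the migration, `nve1` no longer hosts any TS of VN 7 and loses its connectivity to the VN, while `nve3` is newly connected and `nve2` sees the updated location of TS1.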
5.3. E-VPN based Layer 2 Extension Solution
This section describes an [E-VPN] based solution for the layer 2
extension problem, i.e. the L2 sites that contain TSs of a given
L2-based VN are interconnected using E-VPN. Thus a given E-VPN
corresponds to (is associated with) one or more L2-based VNs (e.g.,
VLANs). An L2-based VN is associated with a single E-VPN Ethernet
Tag Identifier.
This section provides a brief overview of how E-VPN is used as the
solution for the "layer 2 extension problem". Details of E-VPN
operations can be found in [E-VPN].
A single L2 site could be as large as the whole network within a
single POD or a data center, in which case the DCBRs of that
POD/data center, in addition to acting as IP routers for the L2-
based VNs present in the POD/data center, also act as PEs. In this
scenario E-VPN is used to handle TS migration between servers in
different POD/data centers and the PE nodes support the NVE
function.
A single L2 site could be as small as a single ToR with the servers
connected to it or virtual switch with TSs attached, in which case
the ToR or the virtual switch acts as a PE-NVE. In this scenario E-
VPN is used to handle TS migration between servers that are either
in the same or in different data centers. Note that even in this
scenario this document assumes that DCBRs, in addition to acting as
IP routers for the L2-based VNs present in their data center, also
participate in the E-VPN procedures, acting as BGP Route Reflectors
or BGP peer to another route-reflector for the E-VPN routes
originated by the ToRs acting as PE-NVEs.
In the case where E-VPN is used to interconnect L2 sites in
different PODs/data centers, the network that interconnects DCBRs of
these data centers could provide either (a) only Ethernet or IP/MPLS
connectivity service among these DCBRs, or (b) may offer the E-VPN
service. In the former case DCBRs exchange E-VPN routes among
themselves relying only on the Ethernet or IP/MPLS connectivity
service provided by the network that interconnects these DCBRs. The
network does not directly participate in the exchange of these E-VPN
routes. In the latter case the routers at the edge of the network
may be either co-located with DCBRs, or may establish E-VPN peering
with DCBRs. Either way, in this case the network facilitates
exchange of E-VPN routes among DCBRs (as in this case DCBRs would
not need to exchange E-VPN routes directly with each other).
Please note that for the purpose of solving the layer 2 extension
problem the propagation scope of E-VPN routes for a given L2-based
VN is constrained by the scope of the PE-NVEs connected to the L2
sites that presently contain TSs of that VN. This scope is
controlled by the Route Target of the E-VPN routes. Controlling
propagation scope could be further facilitated by using Route Target
Constrain [RFC4684].
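As a rough illustration of how the Route Target bounds the propagation scope, the sketch below models only the import filter on a PE-NVE. The route dictionaries, the `import_routes` function, and the Route Target strings are hypothetical; the sender-side suppression that [RFC4684] adds is not modeled.

```python
def import_routes(received_routes, import_rts):
    """Keep only routes carrying at least one Route Target this PE-NVE imports."""
    return [r for r in received_routes if r["route_targets"] & import_rts]


received = [
    {"mac": "02:00:00:00:00:01", "route_targets": {"target:64512:100"}},
    {"mac": "02:00:00:00:00:02", "route_targets": {"target:64512:200"}},
]

# This PE-NVE is connected only to an L2 site of the VN using target:64512:100.
imported = import_routes(received, import_rts={"target:64512:100"})
```

A PE-NVE with no L2 site of a given VN has no matching import Route Target and therefore keeps none of that VN's routes.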
Use of E-VPN ensures that traffic among members of the same L2-based
VN is optimally forwarded, irrespective of whether members of that
VN are within the same or in different data centers/PODs. This
follows from the observation that E-VPN inherently enables
(disaggregated) forwarding at the granularity of the MAC address of
the TS.
Optimal forwarding among TSs of a given L2-based VN that are within
the same data center requires propagating TS MAC addresses, and
comes at the cost of disaggregated forwarding within a given data
center. However such disaggregated forwarding is not necessary
between data centers if a given L2-based VN spans multiple data
centers. For example when a given ToR acts as a PE-NVE, this ToR has
to maintain MAC advertisement routes only to the TSs within its own
data center (and furthermore, only to the TSs that belong to the L2-
based VNs whose site(s) are connected to that ToR), and then point a
"default" MAC route to one of the DCBRs of that data center. In
this scenario a DCBR of a given data center, when it receives MAC
advertisement routes from DCBR(s) in other data centers, does not
re-advertise these routes to the PE-NVEs within its own data center,
but just advertises a single "default" MAC advertisement route to
these PE-NVEs.
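The "default" MAC route behavior can be sketched as follows. This is an illustration only: the function names, the `DEFAULT` key, and the MAC and node names are hypothetical, and a plain dictionary stands in for the MAC forwarding table of a ToR acting as a PE-NVE.

```python
DEFAULT = "default"  # hypothetical key standing in for the "default" MAC route


def routes_for_local_pe_nves(local_mac_routes, remote_mac_routes, dcbr_name):
    """MAC routes visible to PE-NVEs inside the data center.

    Routes learned from DCBRs of other data centers are not re-advertised;
    they are summarized by a single default route pointing at the local DCBR.
    """
    table = dict(local_mac_routes)
    table[DEFAULT] = dcbr_name
    return table


def lookup(table, dst_mac):
    """Exact MAC match first, otherwise follow the default MAC route."""
    return table.get(dst_mac, table[DEFAULT])


local_routes = {"02:00:00:00:00:01": "tor1", "02:00:00:00:00:02": "tor2"}
remote_routes = {"02:00:00:00:00:03": "remote-dcbr"}
tor_table = routes_for_local_pe_nves(local_routes, remote_routes, "dcbr1")
```

The ToR's table size then depends only on the TSs in its own data center, regardless of how many TSs of the VN live in other data centers.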
When a given TS moves to a new L2 site, if in the new site this TS
is the only TS from its L2-based VN, then the PE-NVE(s) connected to
the new site need to be provisioned with the E-VPN Instance (EVI)
of the E-VPN associated with this L2-based VN. Likewise, if after
the move the old site no longer has any TSs that are in the same L2-
based VN as the TS that moved, the PE-NVE(s) connected to the old
site need to be de-provisioned with the EVI of the E-VPN.
Procedures to accomplish this are outside the scope of this
document.
6. Optimal IP Routing
In the context of this document optimal IP routing, or just optimal
routing, in the presence of TS mobility could be partitioned into
two problems:
- Optimal routing of a TS's outbound traffic. This means that as a
given TS moves from one server to another, the TS's default
gateway should be in a close topological proximity to the ToR that
connects the server presently hosting that TS. Note that when we
talk about optimal routing of the TS's outbound traffic, we mean
traffic from that TS to the destinations that are outside of the
TS's L2-based VN. This document refers to this problem as the TS
default gateway problem.
- Optimal routing of TS's inbound traffic. This means that as a
given TS moves from one server to another, the (inbound) traffic
originated outside of the TS's L2-based VN, and destined to that
TS be routed via the router of the TS's L2-based VN that is in a
close topological proximity to the ToR that connects the server
presently hosting that TS, without first traversing some other
router of that L2-based VN (the router of the TS's L2-based VN may
be either DCBR or ToR itself). This is also known as avoiding
"triangular routing". This document refers to this problem as the
triangular routing problem.
In order to avoid the "triangular routing", routers in the Wide Area
Network have to be aware which DCBRs can reach the designated TSs.
When TSs in a single VN are spread across many different DCBRs, all
individual TSs' addresses have to be visible to those routers, which
can dramatically increase the number of routes in those routers.
If a VN is spread across multiple DCBRs and all those DCBRs announce
the same IP prefix for the VN, several issues can arise, including:
- Traffic could go to DCBR "A" when the target is behind DCBR "B",
and DCBR "A" is connected to DCBR "B" via the WAN.
- If the majority of the members of one VN are under DCBR "A" and the
rest are spread across X other DCBRs, it is unclear whether DCBR "A"
should have the same weight as DCBR "B", "C", etc.
If all those DCBRs announce the individual IP addresses that are
directly attached, and those IP addresses are not segmented well,
then all the TSs' IP addresses have to be exposed to the WAN. The
overlay thus hides the TSs' IP addresses from the core switches in one
DC or one POD, but exposes them to the WAN. There are more routers in
the WAN than core switches in one DC/POD.
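The trade-off between announcing one prefix per VN and announcing individual TS addresses can be illustrated with a longest-prefix-match lookup on a WAN router. The prefixes, addresses, and DCBR names below are made up for the example, and `best_route` is a hypothetical helper, not part of any routing implementation.

```python
import ipaddress


def best_route(routes, dst):
    """Longest-prefix match over (prefix, next_hop) pairs."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in routes if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]


vn_prefix = ipaddress.ip_network("10.1.1.0/24")

# Only the VN prefix is known: a TS that moved behind DCBR-B is still
# reached via DCBR-A, i.e. triangular routing.
aggregate_only = [(vn_prefix, "DCBR-A")]

# A host route for the moved TS fixes the path, at the cost of one extra
# route per such TS in every WAN router.
with_host_route = aggregate_only + [
    (ipaddress.ip_network("10.1.1.7/32"), "DCBR-B"),
]
```

With `aggregate_only`, traffic to 10.1.1.7 is sent to DCBR-A; with `with_host_route`, it is sent to DCBR-B while the rest of the prefix still goes to DCBR-A.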
The ability to deliver optimal routing (as defined above) in the
presence of stateful devices is outside the scope of this document.
6.1. Preserving Policies
Moving a TS from one L2 physical attachment to another means (among
other things) that the NVE in the new domain that provides
connectivity between this TS and TSs in other L2 physical
attachments must be able to implement the policies that control
connectivity between this TS and TSs in other L2 physical
attachments. In other words, the policies that control connectivity
between a given TS and its peers MUST NOT change as the TS moves
from one L2 physical attachment to another. Moreover, policies, if
any, within the L2 physical attachment that contains a given TS MUST
NOT preclude realization of the policies that control connectivity
between this TS and its peers. All of the above is irrespective of
whether the L2 physical attachments are trivial or not.
There could be policies guarding TSs across different VNs, with some
being enforced by firewalls and some enforced by NAT/Anti-DDoS/IPS/IDS
devices, etc. The issue is less about maintaining NVE policies when
TSs move and more about dynamically changing the policies associated
with the middleboxes attached to NVEs (if those middleboxes are
distributed).
6.2. TS Default Gateway solutions
As a TS moves to a new L2 site, the default gateway IP address of the
TS may not change. Further, while with cold TS mobility one may
assume that the TS's ARP/ND cache gets flushed once the TS moves to
another server, one cannot make such an assumption with hot TS
mobility. Thus the destination MAC address in the inter-VN/inter-subnet
traffic originated by that TS would not change as the TS moves to the
new site. Given that, how would the NVE(s) connected to the new L2
site be able to recognize inter-VN/inter-subnet traffic originated by
that TS? The following describes possible solutions.
6.2.1. E-VPN based TS Default Gateway Solutions
The E-VPN based solutions assume that for inter-VN/inter-subnet
traffic between TS and its peers outside of TS's own data center,
one or more DCBRs of that data center act as fully functional
default gateways for that traffic.
Both of these solutions also assume that VLAN-aware VLAN bundling
mode of E-VPN is used as the default mode such that different L2-VNs
(different subnets) for the same tenant can be accommodated in a
single EVI. This facilitates provisioning since E-VPN related
provisioning (such as RT configuration) could be done on a per-
tenant basis as opposed to on a per-subnet (per L2-VN) basis. In
this default mode, TSs' MAC addresses are maintained on a per bridge
domain basis (per subnet) within the EVI; however, TS's IP addresses
are maintained across all the subnets of that tenant in that EVI.
In scenarios where communication among TSs of different subnets
belonging to the same tenant is to be restricted based on some
policies, the VLAN mode of E-VPN should be used, with each
VLAN/subnet mapping to its own EVI, and E-VPN RT filtering can be
leveraged to enforce flexible policy-based communication among TSs
of different subnets for that tenant.
6.2.1.1. E-VPN based TS Default Gateway Solution 1
The first solution relies on the use of an anycast default gateway
IP address and an anycast default gateway MAC address.
If DCBRs act as PE-NVEs for an E-VPN corresponding to a given L2-
based VN, then these anycast addresses are configured on these
DCBRs. Likewise, if ToRs act as PE-NVEs, then these anycast
addresses are configured on these ToRs. All TSs of that L2-based VN
are (auto) configured with the (anycast) IP address of the default
gateway.
DCBRs (or ToRs) acting as PE-NVEs use these anycast addresses as
follows:
- When a particular PE-NVE receives a packet from the local L2
attachment with the (anycast) default gateway MAC address as its
destination MAC address, the PE-NVE applies IP forwarding to the
packet, and performs the NVE function if the destination of the
packet is attached to another NVE.
- When a particular DCBR (or ToR) acting as a PE-NVE receives an
ARP/ND Request from local L2 attachment for the default gateway
(anycast) IP address, the DCBR (or ToR) generates ARP/ND Reply.
This ensures that a particular DCBR (or ToR), acting as a PE-NVE,
can always apply IP forwarding to the packets sent by a TS to the
(anycast) default gateway MAC address. It also ensures that such a
DCBR (or ToR) can respond to the ARP/ND Request generated by a TS
for the default gateway (anycast) IP address.
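The two rules above can be sketched as a small Python dispatch
function. The frame representation (a dict with type, dst_mac,
dst_ip and target_ip keys), the action names, and the example
addresses are assumptions made for illustration only.

```python
ANYCAST_GW_MAC = "00:00:5e:00:53:fe"  # assumed example value
ANYCAST_GW_IP = "192.0.2.1"           # assumed example value


def handle_frame(frame):
    """Return the action a PE-NVE takes for a frame from a local L2 attachment."""
    if frame.get("type") == "arp_request" and frame.get("target_ip") == ANYCAST_GW_IP:
        # Answer locally on behalf of the anycast gateway IP address.
        return ("arp_reply", ANYCAST_GW_MAC)
    if frame.get("dst_mac") == ANYCAST_GW_MAC:
        # Frame is addressed to the gateway: apply IP forwarding.
        return ("ip_forward", frame["dst_ip"])
    # Anything else stays within the L2 domain.
    return ("l2_forward", frame.get("dst_mac"))
```

Because every PE-NVE is configured with the same anycast addresses,
the same function yields the same result regardless of which PE-NVE
the TS is currently attached to.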
Except for gratuitous ARP/ND, DCBRs (or ToRs) acting as PE-NVEs must
never use the anycast default gateway MAC address as the source MAC
address in the packets originated by these DCBRs (or ToRs), and must
never use the anycast default gateway IP address as the source IP
address in the overlay header.
Note that multiple L2-based VNs may share the same MAC address for
use as the (anycast) MAC address of the default gateway for these
VNs.
If the default gateway functionality is not in NVEs (ToRs), then the
default gateway MAC/IP addresses need to be distributed using E-VPN
procedures. Note that with this approach, when originating E-VPN MAC
advertisement routes for the MAC address of the default gateways of
a given L2-based VN, all these routes MUST indicate that this MAC
address belongs to the same Ethernet Segment Identifier (ESI). The
ESI must be configured on all NVEs that can act as a distributed
default gateway. However, the anycast MAC and anycast IP addresses
can be configured on only a subset of these NVEs.
6.2.1.2. E-VPN based TS Default Gateway Solution 2
The second solution does not require configuring the anycast default
gateway IP and MAC address on the PE-NVEs.
Each DCBR (or each ToR) that acts as a default gateway for a given
L2-based VN advertises in the E-VPN control plane its default
gateway IP and MAC address using the MAC advertisement route, and
indicates that such route is associated with the default gateway.
The MAC advertisement route MUST be advertised as per procedures in
[E-VPN]. The MAC address in such an advertisement MUST be set to the
default gateway MAC address of the DCBR (or ToR). The IP address in
such an advertisement MUST be set to the default gateway IP address
of the DCBR (or ToR). To indicate that such a route is associated
with a default gateway, the route MUST carry the Default Gateway
extended community [Default-Gateway].
Each PE-NVE that receives this route and imports it as per
procedures of [E-VPN] MUST create MAC forwarding state that enables
it to apply IP forwarding to the packets destined to the MAC address
carried in the route. The PE-NVE that receives this E-VPN route
follows procedures in Section 12 of [E-VPN] when replying to ARP/ND
Requests that it receives if such Requests are for the IP address in
the received E-VPN route.
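The receive-side behavior can be sketched as follows. The
MacAdvRoute and PeNve classes are hypothetical; in particular, the
boolean default_gateway field merely stands in for the presence of
the Default Gateway extended community, and no real BGP encoding is
modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacAdvRoute:
    mac: str
    ip: str
    default_gateway: bool  # True if the Default Gateway community is attached


class PeNve:
    def __init__(self):
        self.gateway_macs = set()  # MACs for which IP forwarding is applied
        self.gateway_ips = {}      # gateway IP -> MAC used in ARP/ND Replies

    def import_route(self, route):
        if route.default_gateway:
            self.gateway_macs.add(route.mac)
            self.gateway_ips[route.ip] = route.mac

    def applies_ip_forwarding(self, dst_mac):
        return dst_mac in self.gateway_macs

    def arp_reply_for(self, target_ip):
        # None means this PE-NVE does not answer for target_ip as a gateway.
        return self.gateway_ips.get(target_ip)


pe = PeNve()
pe.import_route(MacAdvRoute("00:00:5e:00:53:aa", "192.0.2.1", True))
pe.import_route(MacAdvRoute("00:00:5e:00:53:01", "192.0.2.10", False))
```

Only the route flagged as a default gateway creates IP forwarding
state; the ordinary TS route does not.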
6.2.2. Distributed Proxy Default Gateway Solution
In this solution, NVEs perform the function of the default gateway
for all the TSs attached. Those NVEs are called "Proxy Default
Gateway" in this document because those NVEs might not be the
Default Gateways explicitly configured on the attached TSs. Some of
those proxy default gateway NVEs might not have the complete
inter-subnet communication policies for the attached VNs.
In order to ensure that the destination MAC address in the
inter-VN/inter-subnet traffic originated by a TS does not change as
the TS moves to a different NVE, a pseudo MAC address is assigned to
all NVE-based Proxy Default Gateways.
When a particular NVE acting as Proxy Default Gateway receives an
ARP/ND Request from the attached TSs for their default gateway IP
addresses, the NVE suppresses the ARP/ND request from being
forwarded and generates ARP/ND Reply with the pseudo MAC address.
When a particular NVE acting as a Proxy Default Gateway receives a
packet with the pseudo default gateway MAC address:
- if the NVE has all the needed policies for the source and
destination VNs, the NVE applies IP forwarding, i.e., forwards
the packet from the source VN to the destination VN, and applies the
NVE encapsulation function with the target NVE as the destination
address and the destination VN identifier in the header;
- if the NVE doesn't have the needed policies from the source VN to
the destination VN, the NVE applies the NVE encapsulation function
with the host's real default gateway as the destination address and
the source VN identifier in the header.
This solution assumes that the NVE-based proxy default gateways
obtain the mapping of hosts' default gateway IP <-> default gateway
MAC either from the corresponding NVA or via ARP/ND discovery.
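The two forwarding cases for a packet sent to the pseudo MAC address
can be sketched in Python. The function name, its parameters, and
the representation of policies as a set of (source VN, destination
VN) pairs are assumptions made for illustration.

```python
def proxy_gateway_forward(policies, nve_of, real_gateway, src_vn, dst_vn, dst_ip):
    """Return (outer destination, VN identifier) for the encapsulated packet.

    policies:     set of (src_vn, dst_vn) pairs this NVE can enforce
    nve_of:       mapping of destination IP -> NVE the destination is attached to
    real_gateway: address of the host's real default gateway
    """
    if (src_vn, dst_vn) in policies:
        # Policies are present: route locally, send to the target NVE
        # and carry the destination VN identifier.
        return (nve_of[dst_ip], dst_vn)
    # Policies are missing: hand the packet to the real default gateway
    # and carry the source VN identifier.
    return (real_gateway, src_vn)


policies = {("VN-1", "VN-2")}
nve_of = {"198.51.100.20": "NVE-7"}
```

With these inputs, traffic from VN-1 to VN-2 is sent directly to
NVE-7, while traffic from VN-1 to any VN without a local policy is
sent to the real default gateway.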
6.3. Triangular Routing
The triangular routing solution could be partitioned into two
components: intra data center triangular routing solution, and inter
data center triangular routing solution. The former handles the
situation where communicating TSs are in the same data center. The
latter handles all other cases. This draft only describes the
solution for intra data center triangular routing.
6.3.1. NVA based Intra Data Center Triangular Routing Solution
To be added.
6.3.2. E-VPN based Intra Data Center Triangular Routing Solution
This solution assumes that when a PE-NVE originates MAC advertisement
routes, such routes, in addition to MAC addresses of the TSs, also
carry IP addresses of these TSs. Procedures by which a PE-NVE can
learn the IP address associated with a given MAC address are
specified in [E-VPN].
Consider a set of L2-based VNs, such that TSs of these VNs, as a
matter of policy, are allowed to communicate with each other. To
avoid triangular routing among such TSs that are in the same data
center, this document relies on the E-VPN procedures, as follows.
Procedures in this section assume that ToRs act as PE-NVEs and are
also able to support IP forwarding functionality.
For a given set of L2-based VNs whose TSs are allowed to communicate
with each other, consider a set of E-VPN instances (EVIs) of the E-
VPNs associated with these VNs. We further restrict this set of EVIs
to only the EVIs that are within the same data center. To avoid
triangular routing among TSs within the same data center, E-VPN
routes originated by one of the EVIs within such set should be
imported by all other EVIs in that set, irrespective of whether
these other EVIs belong to the same E-VPN as the EVI that originates
the routes.
One possible way to accomplish this is to:
- for each set of L2-based VNs whose TSs are allowed to communicate
with each other, and for each data center that contains such VNs,
have a distinct RT (distinct RT per set, per data center),
- provision each EVI of the E-VPNs associated with these VNs to
import routes that carry this RT, and
- make the E-VPN routes originated by such EVIs carry this RT.
Note that these RTs are in addition to the RTs used to form
individual E-VPNs. Note also that what is described here is
conceptually similar to the notion of "extranets" in BGP/MPLS VPNs
[RFC4364].
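The RT assignment described above can be sketched as a small
provisioning helper. The function name, the input shapes, and the
"RT:<set>:<data center>" string format are hypothetical; real RTs
are BGP extended communities and are not modeled here.

```python
def provision_extranet_rts(vn_sets, evis):
    """Compute the additional RTs each EVI imports (and attaches on export).

    vn_sets: {set_name: set of VN names allowed to communicate}
    evis:    list of (evi_name, vn_name, data_center) tuples
    """
    rts = {evi_name: set() for evi_name, _, _ in evis}
    for set_name, vns in vn_sets.items():
        for evi_name, vn, data_center in evis:
            if vn in vns:
                # One distinct RT per VN set, per data center.
                rts[evi_name].add(f"RT:{set_name}:{data_center}")
    return rts


vn_sets = {"setA": {"VN-1", "VN-2"}}
evis = [
    ("EVI-1", "VN-1", "DC-1"),
    ("EVI-2", "VN-2", "DC-1"),
    ("EVI-3", "VN-2", "DC-2"),
    ("EVI-4", "VN-9", "DC-1"),
]
rts = provision_extranet_rts(vn_sets, evis)
```

EVI-1 and EVI-2 end up sharing an RT because they are in the same
set and the same data center; EVI-3 (other data center) and EVI-4
(VN outside the set) do not receive that RT.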
When a PE-NVE imports an E-VPN route into a particular EVI, and this
route is associated with a TS that is not part of the L2-based VN
associated with the E-VPN of that EVI, the PE-NVE creates IP
forwarding state to forward traffic to the IP address present in the
NLRI of the route towards the Next Hop, as specified in the route.
To illustrate how the above procedures avoid triangular routing,
consider the following example. Assume that a particular TS, TS-A,
is currently hosted by a server connected to a particular ToR-NVE,
ToR-1, and another TS, TS-B, is currently hosted by a server
connected to ToR-2 (NVE). Assume that TS-A and TS-B belong to
different L2-based VNs, and (as a matter of policy) TSs in these VNs
are allowed to communicate with each other. Now assume that TS-B
moves to another server, and this server is connected to ToR-3
(NVE). Assume that ToR-1, ToR-2, and ToR-3 are in the same data
center. While initially ToR-1 would forward data originated by TS-A
and destined to TS-B to ToR-2, after TS-B moves to the server
connected to ToR-3, using the procedures described above, ToR-1
would forward the data to ToR-3 (and not to ToR-2), thus avoiding
triangular routing.
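The example can be traced with a minimal Python sketch of ToR-1's IP
forwarding state. The ToRNve class and the TS-B address are
assumptions; route withdrawal and any E-VPN mobility procedures are
deliberately left out, so a newer advertisement simply replaces the
older next hop.

```python
class ToRNve:
    def __init__(self, name):
        self.name = name
        self.ip_routes = {}  # TS IP address -> next-hop ToR

    def import_route(self, ts_ip, next_hop):
        # A newer advertisement for the same IP replaces the old next hop.
        self.ip_routes[ts_ip] = next_hop

    def next_hop(self, ts_ip):
        return self.ip_routes[ts_ip]


TS_B = "198.51.100.20"  # assumed address of TS-B

tor1 = ToRNve("ToR-1")
tor1.import_route(TS_B, "ToR-2")   # TS-B is initially behind ToR-2
before = tor1.next_hop(TS_B)
tor1.import_route(TS_B, "ToR-3")   # TS-B moves; ToR-3 advertises it
after = tor1.next_hop(TS_B)
```

After the second import, ToR-1 sends traffic for TS-B straight to
ToR-3 rather than via ToR-2.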
Note that for the purpose of redistributing E-VPN routes among
multiple L2-based VNs, the above procedures limit the propagation
scope of routes to individual TSs to a single data center, and
furthermore, to only a subset of the PE-NVEs within that data center
- the PE-NVEs that have EVIs of the E-VPNs associated with the L2-
based VNs whose TSs are allowed to communicate with each other. As a
result, the control plane overhead needed to avoid triangular
routing within a data center is localized to these PE-NVEs.
7. L3 Address Migration
When the attachment to the NVE is L3 based, TS migration can cause
one subnetwork to be scattered among many NVEs, resulting in
fragmented addresses. The outbound traffic of fragmented L3
addresses does not have the same issues as L2 address migration, but
the inbound traffic does (Section 6). In theory, host
routing by every NVE (including the DCBR) can achieve optimal path
forwarding in a very fragmented network. But host routing can be
challenging in a very large and highly virtualized data center,
where there could be hundreds of thousands, sometimes millions, of
hosts/VMs due to business demand and highly advanced server
virtualization technologies.
Optimal routing of a TS's inbound traffic: This means that as a given
TS moves from one server to another, the (inbound) traffic
originated outside of the TS's directly attached NVE and destined
to that TS is routed optimally to the NVE to which the server
presently hosting that TS is attached, without first traversing some
other NVEs. This is also known as avoiding "triangular routing".
ECMP can be used by the DCBR or any NVEs that don't support host
routing or can't access the NVA to distribute traffic equally among
the NVEs that support the subnet (VN). If an NVE doesn't have the
destination of a data packet directly attached, it can query the NVA
for the target NVE to which the destination is attached, and
encapsulate the packet with the target NVE as the outer destination
before sending it out.
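Both behaviors can be sketched in Python. The function names, the
use of a CRC32 flow hash for ECMP, and the NVA being modeled as a
plain dictionary are assumptions made for illustration; this
document does not specify a hash function or an NVA query interface.

```python
import zlib


def ecmp_pick(subnet_nves, flow_key):
    """Hash a flow onto one of the NVEs that support the subnet (VN)."""
    ordered = sorted(subnet_nves)
    return ordered[zlib.crc32(flow_key.encode()) % len(ordered)]


def outer_destination(packet, local_ts, nva_mapping):
    """Return the target NVE for the outer header, or None if the TS is local."""
    dst = packet["dst_ip"]
    if dst in local_ts:
        return None  # directly attached: no encapsulation needed
    # Stand-in for an NVA query: look up the NVE hosting the destination.
    return nva_mapping[dst]
```

The ECMP choice is deterministic per flow key, so packets of one
flow keep using the same NVE, while the NVA lookup gives the exact
NVE and avoids a second hop.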
Another approach is to designate one or two NVEs as designated
forwarders for a specific subnet when the subnet is spread across
many NVEs. For example, suppose a high percentage of the TSs of one
subnet are attached to NVE "X", while the remaining small percentage
is spread across many NVEs. Designating NVE "X" as the designated
forwarder for the subnet can greatly reduce the "triangular routing"
for the traffic destined to TSs in this subnet.
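One way to pick such a forwarder is to count TS attachments per NVE
and choose the NVE with the most. The sketch below assumes a simple
mapping of TS to attached NVE; the function name and the selection
rule are illustrative, not a procedure defined by this document.

```python
from collections import Counter


def pick_designated_forwarder(ts_to_nve):
    """Return the NVE hosting the largest share of the subnet's TSs."""
    counts = Counter(ts_to_nve.values())
    nve, _ = counts.most_common(1)[0]
    return nve


# Eight TSs of the subnet sit behind NVE "X"; two are scattered elsewhere.
attachments = {f"ts-{i}": "X" for i in range(8)}
attachments.update({"ts-8": "Y", "ts-9": "Z"})
forwarder = pick_designated_forwarder(attachments)
```

With this distribution, traffic sent to the designated forwarder
needs a second hop only for the two TSs that are not behind "X".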
8. Managing duplicated addresses
This document assumes that during VM migration a given MAC address
within a VN can only exist at one TS at a time. As TSs move among
NVEs, it is possible that the network state may not be immediately
synchronized. It is important for NVEs to report directly attached
TSs to the NVA on a periodic basis so that the NVA can generate
alarms and fix duplicated address issues.
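A duplicate check over the periodic reports can be sketched as
follows. The report format, a sequence of (NVE, VN, MAC) tuples, and
the function name are assumptions; how the NVA then raises alarms or
repairs state is outside the sketch.

```python
from collections import defaultdict


def find_duplicates(reports):
    """Return {(vn, mac): [nves]} for MACs reported by more than one NVE."""
    seen = defaultdict(set)
    for nve, vn, mac in reports:
        seen[(vn, mac)].add(nve)
    return {key: sorted(nves) for key, nves in seen.items() if len(nves) > 1}


reports = [
    ("NVE-1", "VN-1", "00:00:5e:00:53:01"),
    ("NVE-2", "VN-1", "00:00:5e:00:53:01"),  # same MAC, same VN, other NVE
    ("NVE-2", "VN-2", "00:00:5e:00:53:01"),  # same MAC in another VN is fine
    ("NVE-3", "VN-1", "00:00:5e:00:53:02"),
]
duplicates = find_duplicates(reports)
```

Only the MAC that appears behind two NVEs within the same VN is
flagged; the same MAC in a different VN is not a conflict.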
9. Manageability Considerations
Several solutions described in this document depend on the presence
of NVA in the data center.
10. Security Considerations
In addition to the security considerations described in
[nvo3-problem], allowing TSs to migrate across data centers will
require more stringent security enforcement. The traditional
placement of security functions, e.g. firewalls, at data center
gateways is no longer enough. TS mobility will require security
functions to enforce policies on east-west traffic among TSs.
When TSs move across data centers, the associated policies have to
be updated and enforced.
11. IANA Considerations
This document requires no IANA actions. RFC Editor: Please remove
this section before publication.
12. Acknowledgements
The authors would like to thank Adrian Farrel, David Black, Dave Allen, Tom
Herbert and Larry Kreeger for their review and comments. The authors would also
like to thank Ivan Pepelnjak for his contributions to this document.
13. References
13.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
13.2. Informative References
[nvo3-problem] Narten, T., et al., "Overlays for Network
Virtualization",
draft-ietf-nvo3-overlay-problem-statement-04, July 2013.
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP VPNs", RFC 4364,
February 2006.
[RFC4684] Marques, P., et al., "Constrained Route Distribution for
Border Gateway Protocol/MultiProtocol Label Switching
(BGP/MPLS) Internet Protocol (IP) Virtual Private Networks
(VPNs)", RFC 4684, November 2006.
[E-VPN] Aggarwal, R., et al., "BGP MPLS Based Ethernet VPN",
draft-ietf-l2vpn-evpn, work in progress.
[Default-Gateway]
http://www.iana.org/assignments/bgp-extended-communities
Authors' Addresses
Yakov Rekhter
Juniper Networks
1194 North Mathilda Ave.
Sunnyvale, CA 94089
Email: yakov@juniper.net
Linda Dunbar
Huawei Technologies
5340 Legacy Drive, Suite 175
Plano, TX 75024, USA
Email: ldunbar@huawei.com
Rahul Aggarwal
Arktan, Inc
Email: raggarwa_1@yahoo.com
Wim Henderickx
Alcatel-Lucent
Email: wim.henderickx@alcatel-lucent.com
Ravi Shekhar
Juniper Networks
1194 North Mathilda Ave.
Sunnyvale, CA 94089
Email: rshekhar@juniper.net
Luyuan Fang
Cisco Systems
111 Wood Avenue South
Iselin, NJ 08830
Email: lufang@microsoft.com
Ali Sajassi
Cisco Systems
Email: sajassi@cisco.com