Network Working Group                                        R. Aggarwal
Internet Draft                                                Arktan Inc
Category: Standards Track
Expiration Date: March 2012
                                                              Y. Rekhter
                                                        Juniper Networks

                                                           W. Henderickx
                                                          Alcatel-Lucent

                                                              R. Shekhar
                                                        Juniper Networks

                                                       September 6, 2011


      Data Center Mobility based on BGP/MPLS, IP Routing and NHRP


               draft-raggarwa-data-center-mobility-01.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright and License Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents



Raggarwa                                                        [Page 1]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.



Abstract

   This document describes a set of solutions for seamless mobility in
   the data center. These solutions provide a tool-kit which is based on
   IP routing, BGP/MPLS MAC-VPNs, BGP/MPLS IP VPNs and NHRP.
























Raggarwa                                                        [Page 2]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


Table of Contents

 1          Specification of requirements  .........................   4
 2          Introduction  ..........................................   4
 2.1        Terminology  ...........................................   4
 3          Problem Statement  .....................................   5
 3.1        Layer 2 Extension  .....................................   5
 3.2        Optimal Intra-VLAN Forwarding  .........................   5
 3.3        Optimal Routing  .......................................   5
 4          Layer 2 Extension and Optimal Intra-VLAN Forwarding Solution  6
 5          Optimal VM Default Gateway Solution  ...................   8
 6          Triangular Routing Solution  ...........................  10
 7          Triangular Routing Solution Based on Host Routes  ......  10
 7.1        Scenario 1  ............................................  11
 7.2        Scenario 2: BGP as the Routing Protocol between DCBs  ..  12
 7.3        Scenario 2: OSPF/IS-IS as the Routing Protocol between DCBs  14
 7.4        Scenario 3: Using BGP as the Routing Protocol  .........  14
 7.4.1      Base Solution  .........................................  15
 7.4.2      Refinements: SP Unaware of DC Routes  ..................  15
 7.4.3      Refinements: SP Participates in DC Routing  ............  16
 7.5        VM Motion  .............................................  17
 7.6        Policy based origination of VM Host IP Address Routes  .  17
 7.7        Policy based instantiation of VM Host IP Address Forwarding State  17
 8          Triangular Routing Solution Based on NHRP  .............  17
 8.1        Overview  ..............................................  17
 8.2        Detailed Procedures  ...................................  19
 8.3        Failure scenarios  .....................................  20
 9          Acknowledgements  ......................................  21
10          References  ............................................  21
11          Authors' Addresses  ....................................  22
















Raggarwa                                                        [Page 3]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


1. Specification of requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].


2. Introduction

   This document describes solutions for seamless mobility in the data
   center.  Mobility in the data center is defined as the ability to
   move a virtual machine (VM) from one server in the data center to
   another server, in the same or a different data center, while
   retaining the IP address and the MAC address of the VM. The latter
   is necessary to provide a seamless application experience. The term
   mobility, or any reference to moving a VM in this document, should
   be understood to imply seamless mobility unless otherwise stated. It
   should also be noted that VM mobility does not change the
   VLAN/subnet associated with the VM. In fact, VM mobility requires
   the VLAN to be "extended" to the new location of the VM.

   Data center providers have expressed a desire to provide the ability
   to move VMs across data centers, where the data centers may be in
   different geographical locations. There are certain constraints on
   how far apart such data centers may be located. This distance is
   limited by the current state of the art of virtual machine
   technology, by the bandwidth that may be available between the data
   centers, by the ability to manage and operate such VM mobility, etc.
   This document provides a set of solutions for VM mobility. The
   practical applicability of these solutions will depend on these
   constraints. However, the solutions described here provide a
   framework that enables VMs to be moved across both small and large
   geographical distances. In other words, if these constraints relax
   over time, allowing VMs to move across larger geographical
   boundaries, the solutions described here will continue to be
   applicable.


2.1. Terminology

   In this document the term Data Center Switch (DCS) is used to refer
   to a switch in the data center that is connected to the servers that
   host VMs. A data center may have multiple DCSes. Each data center
   also has one or more Data Center Border Routers (DCB) that connect to
   other data centers and to the Wide Area Network (WAN). A DCS may act
   as a DCB.

   This document also uses the terms MAC-VPN and Ethernet-VPN (E-VPN)
   interchangeably.



Raggarwa                                                        [Page 4]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


3. Problem Statement

   This section describes the specific problems that need to be
   addressed to enable seamless VM mobility.


3.1. Layer 2 Extension

   The first problem is to extend the VLAN of a VM across DCSes where
   the DCSes may be located in the same or different data centers.  This
   is required to enable the VM to move between the DCSes.  We will
   refer to this as the "layer 2 extension problem".


3.2. Optimal Intra-VLAN Forwarding

   The second issue has to do with optimal forwarding in a VLAN in the
   presence of VM mobility, where VM mobility may involve multiple data
   centers.

   Optimal forwarding in a VLAN by definition implies that traffic
   between VMs that are in the same VLAN should not traverse DCBs in
   data centers that contain neither of these VMs, except if:


   The DCBs in these data centers are on the layer 2 path between the DCBs
   in the data centers that contain the VMs.


   Optimal forwarding in a VLAN also implies that traffic between a
   client and a VM that are in the same VLAN should not traverse DCBs
   in data centers that do not contain the VM, except if:


   The DCBs in these data centers are on the layer 2 path between the client
   site border router and the DCBs in the data centers that contain the VM.



3.3. Optimal Routing

   Optimal routing, in the presence of intra-data center VM mobility,
   implies that traffic between VMs that are on different VLANs/subnets
   should not traverse a DCS or DCB in that data center that does not
   host these VMs, except if:


   The DCS or DCBs are on an IP path between the DCSes that host the VMs.



Raggarwa                                                        [Page 5]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


   Optimal routing, in the presence of inter-data center VM mobility,
   implies that traffic between VMs that are on different VLANs/subnets
   should not traverse DCBs in data centers that contain neither of
   these VMs, except if:


   The DCBs in these data centers are on an IP path between the DCBs in the
   data centers that contain the VMs.


   Optimal routing also implies that traffic between a VM and a client
   that are on different VLANs/subnets should not traverse any of the
   DCBs in data centers that do not contain the VM, except if:


   The DCBs in these data centers are on an IP path between the client's
   site border router and the DCB of the data center that contains the VM.


   Specifically, optimal routing requires a mechanism that ensures that
   the default gateway of a VM can be in the geographical proximity of
   the VM as the VM moves. Consider a VM that moves from data center 1
   (DC1) to data center 2 (DC2). Further consider that the default
   gateway of the VM is located in DC1. Once the VM moves, it is
   desirable to avoid carrying traffic originating from the VM,
   destined to other subnets, back to the default gateway in DC1, as
   this may not be optimal. We will refer to this as the "VM default
   gateway problem".

   Optimal routing also requires mechanisms to avoid "triangular
   routing", to ensure that the traffic destined to a given VM does not
   traverse a DCB of a data center that does not contain the VM. For
   example, packets between VM1 and VM2, which are both in data center
   1 (DC1) but on different VLANs/subnets, should not go to data center
   2 (DC2) and back to DC1. This can be the case if VM2 moves from DC2
   to DC1, unless additional mechanisms are built to prevent it.


4. Layer 2 Extension and Optimal Intra-VLAN Forwarding Solution

   The solution for the "layer 2 extension problem", particularly when
   the DCSes are located in different data centers, relies on MAC-VPNs
   [MAC-VPN]. A DCS may be enabled with MAC-VPN, in which case it acts
   as an MPLS Edge Switch (MES). However, this is not a requirement. It
   is required for the DCBs to be enabled with MAC-VPN to enable layer 2
   extension across data centers. DCBs learn MAC routes within their
   own data center either via MAC-VPN state exchange with the DCSes,
   via data plane learning, or via other layer 2 protocols between the DCSes



Raggarwa                                                        [Page 6]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


   and the DCBs. The DCBs MUST advertise these MAC routes as MAC-VPN
   routes. This way DCBs in one data center learn about MAC routes in
   other data centers. The specifics of such advertisement depend on
   the interconnect between the DCBs, as described below.


     -  IP, MPLS or Layer 2 Interconnect between the DCBs, and
        between the client site border router and the DCBs. In this case
        the provider of the IP, MPLS or Layer 2 Interconnect does not
        participate in MAC-VPN.  The DCBs MUST exchange MAC-VPN routes
        using either IBGP or (multi-hop) EBGP peering. In addition, if
        DCSes support MAC-VPN, the DCBs MUST act as BGP Route Reflectors
        (RRs). IBGP peering may utilize additional RRs in the data center
        infrastructure (RR hierarchy). Note that in this scenario the
        provider of the IP, MPLS or Layer 2 Interconnect is not involved
        in these IBGP or EBGP peerings/exchanges.


     -  MAC-VPN as a Data Center Interconnect (DCI) service. The DCI
        service may be offered by a Service Provider (SP). There are two
        variants of this model. In the first variant the WAN Border
        Router is the same as the DCB.  In other words, the DCB is
        provided by the SP and may be used to provide DCI for DCSes
        belonging to multiple enterprises. The DCSes may connect to the
        DCBs using Layer 2 protocols or even MAC-VPN peering. The DCBs
        MUST exchange MAC-VPN routes between themselves. The DCBs may
        utilize BGP RRs to exchange such routes. If there is MAC-VPN
        peering between the DCB and the DCSes within the DCB's own data
        center, then the DCB propagates the MAC-VPN routes that it
        learns from other DCBs to the DCSes within its own data center.

        In the second variant the WAN Border Router is not the same
        device as the DCB. In this variant the DCBs may connect to the
        WAN Border Routers using layer 2 protocols. Alternatively, the
        WAN Border Routers may establish MAC-VPN peering with the DCBs,
        in which case the DCBs MUST advertise the MAC-VPN routes using
        either IBGP or (multi-hop) EBGP to the WAN Border Routers. The
        WAN Border Routers MUST exchange MAC-VPN routes between
        themselves. The WAN Border Routers may utilize BGP RRs to
        exchange such routes. A WAN Border Router propagates the MAC-VPN
        routes that it learns from other WAN Border Routers to the DCBs
        that it is connected to, if there is MAC-VPN peering between the
        DCBs and the WAN Border Routers.


   Please note that the propagation scope of MAC-VPN routes for a given
   VLAN/subnet is constrained by the scope of data centers that span
   that VLAN/subnet and this is controlled by the Route Target of the



Raggarwa                                                        [Page 7]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


   MAC-VPN routes.

   The use of MAC-VPN ensures that traffic between VMs and clients that
   are on the same VLAN is optimally forwarded irrespective of the
   geographical extension of the VLAN. This follows from the observation
   that MAC-VPN inherently enables disaggregated forwarding at the
   granularity of the MAC address of the VM. MAC-VPN also allows
   aggregating MAC addresses into MAC prefixes. Optimal intra-VLAN
   forwarding requires propagating VM MAC addresses and comes at the
   cost of disaggregated forwarding within a given data center.
   However, such disaggregated forwarding is not necessary between data
   centers. For example, a MAC-VPN enabled DCS has to maintain MAC
   routes only to the VMs within its own data center, and can then
   point a "default MAC route" to the DCB of that data center. Another
   example would be the advertisement of prefix-MAC routes by a DCS/DCB
   when it is possible to assign a structure to the MAC addresses. A
   sketch of this forwarding model is shown below.
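
   The following Python sketch illustrates this forwarding model. It
   is a minimal sketch under stated assumptions: the table layout, the
   exact-match-then-default convention, and names such as DEFAULT_MAC
   are inventions of this illustration, not part of [MAC-VPN].

      # Hypothetical MAC forwarding table on a MAC-VPN enabled DCS.
      # Exact MAC routes exist only for VMs in the local data center;
      # everything else matches a "default MAC route" that points at
      # the local DCB.
      DEFAULT_MAC = "default"

      mac_fib = {
          "00:11:22:33:44:55": "local-port-7",   # local VM
          "00:11:22:33:44:66": "local-port-9",   # local VM
          DEFAULT_MAC:         "uplink-to-DCB",  # everything remote
      }

      def lookup(dst_mac: str) -> str:
          # Exact match first, then the default MAC route.
          return mac_fib.get(dst_mac, mac_fib[DEFAULT_MAC])

      assert lookup("00:11:22:33:44:55") == "local-port-7"
      assert lookup("aa:bb:cc:dd:ee:ff") == "uplink-to-DCB"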

   This document assumes that the VM's VLAN and the policy (e.g.,
   firewalls) associated with a VM are present on the DCS to which the
   VM moves. If this is not the case, then in addition to MAC-VPNs,
   layer 2 extension requires the ability to move policies dynamically.
   The procedures for doing so are for further study.


5. Optimal VM Default Gateway Solution

   The solution for the "VM default gateway problem" relies on requiring
   the ability to perform routing at each DCB. This is in addition to
   requiring layer 2 forwarding and MAC-VPN functionality on a DCB. In
   addition it is desirable to be able to perform routing on the DCSes.

   Please note that when a VM moves, the default gateway IP address of
   the VM may not change. Further, the ARP cache of the VM may not time
   out. The rest of this section is written with this in mind.

   First consider the case where each DCB acts as a router but the
   DCSes do not act as routers. In this case the default gateway of a
   VM that moves into the geographical proximity of a new DCB may be
   the new DCB, as long as there is a mechanism for the new DCB to be
   able to route packets that the VM sends to the "original" default
   gateway's MAC address.

   Now consider the case where one or more DCSes act as routers. In
   this case the default gateway of a VM that moves to a particular
   DCS may be the new DCS, as long as there is a mechanism for the new
   DCS to be able to route packets sent by the VM to the "original"
   default gateway's MAC address.




Raggarwa                                                        [Page 8]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


   There are two mechanisms to address the above cases.

   The first mechanism relies on the use of an anycast default gateway
   IP address and an anycast default gateway MAC address. These anycast
   addresses are configured on each DCB that is part of the layer 2
   domain. This requires co-ordination to ensure that the same anycast
   addresses are configured on DCBs, which may or may not be in the
   same data center, that are part of the same layer 2 domain. The
   anycast addresses are also configured on the DCSes that act as
   routers. This ensures that a particular DCB, or DCS when the DCS
   acts as a router, can always route packets sent by a VM to the
   anycast default gateway MAC address. It also ensures that such a DCB
   or DCS can respond to an ARP request, generated by a VM, for the
   anycast IP address.

   The second mechanism lifts the restriction of configuring the
   anycast default gateway addresses on each DCB or DCS. This is
   accomplished by each DCB, and each DCS that acts as a router,
   propagating its default gateway IP and MAC address in the BGP
   MAC-VPN control plane using the MAC advertisement route. To
   accomplish this, the MAC advertisement route MUST be advertised as
   per the procedures in [MAC-VPN]. The MAC address in such an
   advertisement MUST be set to the default gateway MAC address of the
   DCB or DCS. The IP address in such an advertisement MUST be set to
   the default gateway IP address of the DCB or DCS. A new BGP
   community called the "Default Gateway Community" MUST be included
   with the route.  Each DCB or DCS that receives this route and
   imports it as per the procedures of [MAC-VPN] SHOULD (a sketch of
   this processing follows the list below):

     - Create forwarding state that enables it to route packets destined
       to the default gateway MAC address of the advertising DCB or DCS.

     -  As an optimization, optionally reply to ARP requests that it
        receives destined to the default gateway IP address of the
        advertising DCB or DCS.  The MAC address in the ARP response
        should be the MAC address associated with the IP address to
        which the ARP request was sent.
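
   The following Python sketch illustrates the receive-side processing
   described above. It is a minimal sketch: the route and table
   structures, and the community value used, are assumptions of this
   illustration and are not code points defined by this document or by
   [MAC-VPN].

      from dataclasses import dataclass

      # Placeholder marker; the actual "Default Gateway Community"
      # code point would be assigned elsewhere.
      DEFAULT_GW_COMMUNITY = "default-gateway"

      @dataclass
      class MacAdvertisementRoute:
          mac: str            # default gateway MAC of the DCB/DCS
          ip: str             # default gateway IP of the DCB/DCS
          communities: frozenset

      mac_routing_state = {}  # MAC -> route (not bridge) these frames
      arp_proxy_state = {}    # IP  -> MAC used to answer ARP locally

      def import_mac_route(route: MacAdvertisementRoute) -> None:
          if DEFAULT_GW_COMMUNITY not in route.communities:
              return          # ordinary MAC-VPN route handling
          # Route packets sent to the advertising gateway's MAC.
          mac_routing_state[route.mac] = "route-locally"
          # Optimization: answer ARP for the advertised gateway IP.
          arp_proxy_state[route.ip] = route.mac

      import_mac_route(MacAdvertisementRoute(
          "00:00:5e:00:01:01", "192.0.2.1",
          frozenset({DEFAULT_GW_COMMUNITY})))
      assert arp_proxy_state["192.0.2.1"] == "00:00:5e:00:01:01"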














Raggarwa                                                        [Page 9]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


6. Triangular Routing Solution

   There are two Triangular Routing solutions proposed in this document.

   The first Triangular Routing Solution is based on propagating routes
   to VM host IP addresses (/32 IPv4 or /128 IPv6) using IP routing or
   BGP/MPLS VPNs [RFC4364], with careful consideration given to
   constraining the propagation of these addresses.

   The second solution relies on using the Next Hop Resolution Protocol
   (NHRP) [RFC2332].

   The section "Triangular Routing Solution based on Host Routes"
   describes the details of the first solution. The section "Triangular
   Routing Solution based on NHRP" describes details of the second
   solution.


7. Triangular Routing Solution Based on Host Routes

   The solution to the triangular routing problem based on MAC-VPN, IP
   routing or BGP/MPLS VPNs [RFC4364] relies on the propagation of the
   host IP address of the VM. Further, the solution provides a toolkit
   to constrain the scope of the distribution of the host IP address of
   the VM. In other words, the solution relies on disaggregated
   routing, with the ability to control which nodes in the network have
   the disaggregated information and also the ability to aggregate this
   information as it propagates in the network.

   The solution places the following requirements on DCSes and DCBs:

     -  A given DCB MUST implement IP routing using OSPF/IS-IS and/or
        BGP. A given DCB MAY implement BGP/MPLS VPNs. A DCB MUST
        implement MAC-VPN.

     -  A given DCS MAY implement IP routing using OSPF/IS-IS. A DCS MAY
        implement IP routing using BGP. A DCS MAY implement BGP/MPLS
        VPNs.  A DCS MAY implement MAC-VPN.


   To accomplish this, each DCS/DCB SHOULD advertise the IP addresses
   of the VMs in MAC-VPN, in IP routing, or using the VPN IPv4 or VPN
   IPv6 address family as per IP VPN [RFC4364] procedures. The IP
   address of a VM may be learned by a DCS either from data plane
   packets generated by the VM, or from the control/management plane if
   there is control/management plane integration between the server
   hosting the VM and the DCS.




Raggarwa                                                       [Page 10]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


   The propagation of the VM host IP addresses advertised by a DCS/DCB
   is constrained to a set of DCSes/DCBs. Such constrained distribution
   needs to address three main scenarios:

     -  Scenario 1. Traffic between VMs that are on different
        VLANs/subnets in the same data center. This scenario assumes
        that a VM can move only among DCSes that are in the same data
        center.

     -  Scenario 2. Traffic between VMs (or between a VM and a client)
        that are on different VLANs/subnets in different DCs, but the
        DCs are in close geographical proximity. An example of this is
        multiple DCs in San Francisco, or DCs in San Francisco and Los
        Angeles. This scenario assumes that a VM can move only among
        DCs that are in close geographical proximity.

     -  Scenario 3. Traffic among VMs (or between a VM and a client)
        that are on different VLANs/subnets, in different DCs, and
        these DCs are not in close geographical proximity. An example
        of this is DCs in San Francisco and Denver.  In this scenario a
        VM may move among DCs that are not in close geographical
        proximity.



7.1. Scenario 1

   A DCS may originate /32 or /128 routes for all VMs connected to it.
   These routes may be propagated using MAC advertisement routes in
   MAC-VPN, along with the MAC address of the VM. Alternatively, they
   may be propagated using OSPF, IS-IS or BGP, or even using BGP VPN
   IPv4/IPv6 routes [RFC4364].  In any case the distribution scope of
   such routes is constrained to only the DCSes and the DCBs in the
   data center to which the DCS belongs.  If BGP is the distribution
   protocol then this can be achieved by treating the DCBs as Route
   Reflectors. If OSPF/IS-IS is the routing protocol then this can be
   achieved by treating the data center as an IGP area.

   When MAC-VPN is used by DCSes for distributing VM host IP routes
   within the data center, the Route Target of such routes must be such
   that the routes can be imported by all the DCSes and DCBs in the
   data center, even by those that do not have members in the VLAN
   associated with the MAC address in the route. When a DCS or DCB
   imports such a route, it should create IP forwarding state to route
   packets destined to the IP address present in the advertisement,
   with the next-hop set to the DCS/DCB from which the advertisement
   was received. A sketch of this import processing is shown below.
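
   The following Python sketch illustrates this import logic. It is a
   minimal sketch: the route format, the import Route Target value,
   and the FIB representation are assumptions of the illustration, not
   taken from any implementation.

      from dataclasses import dataclass

      # Assumed data-center-wide import Route Target.
      IMPORT_RTS = frozenset({"RT:64500:1"})

      @dataclass
      class HostRoute:
          prefix: str      # VM host address, e.g. "203.0.113.7/32"
          nexthop: str     # advertising DCS/DCB
          rts: frozenset   # Route Targets carried by the route

      ip_fib = {}          # prefix -> next hop

      def import_host_route(route: HostRoute) -> None:
          # Import only routes carrying one of our import RTs.
          if route.rts & IMPORT_RTS:
              ip_fib[route.prefix] = route.nexthop

      import_host_route(HostRoute("203.0.113.7/32", "DCS2",
                                  frozenset({"RT:64500:1"})))
      assert ip_fib["203.0.113.7/32"] == "DCS2"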

   Consider a VM in a VLAN connected to DCS1 that sends a packet to a
   VM, in another VLAN, connected to DCS2. Further consider that DCS1
   and DCS2 are in the same data center. Then DCS1 will be able to route



Raggarwa                                                       [Page 11]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


   the packet optimally to DCS2. For instance, this packet may be sent
   directly from DCS1 to DCS2 without having to go through a DCB, if
   there is physical connectivity between DCS1 and DCS2. This is
   because DCS1 would have received and imported the host IP route to
   reach the destination VM.


7.2. Scenario 2: BGP as the Routing Protocol between DCBs

   A DCS MAY advertise /32 or /128 routes for all VMs connected to it
   using the procedures described in "Scenario 1". Note that the DCSes
   may use OSPF or IS-IS or BGP as the routing protocol.

   If a DCS advertises host routes as described above, then the DCBs in
   the data center MUST learn the VM host routes within their data
   center from the routes advertised by the DCSes. If the DCSes do not
   advertise host routes but implement MAC-VPN, then the DCSes SHOULD
   advertise the IP address of a VM along with the MAC advertisement
   for that VM. In this case the DCBs MUST learn the VM host IP
   addresses from the MAC advertisement routes. If the DCSes neither
   advertise VM host routes nor implement MAC-VPN, then the DCBs must
   rely on data plane snooping to learn the IP addresses of the VMs.

   The DCBs in the data center originate /32 or /128 routes for all the
   VMs within their own data center as BGP IPv4/IPv6 routes or as BGP
   VPN IPv4/IPv6 routes. These routes are propagated to other DCBs that
   are in data centers in close geographical proximity of the data
   center originating the routes. To achieve this the routes carry one
   or more Route Targets (RT). These route targets control which of the
   other DCBs or Route Reflectors import the route.

   One mechanism to constrain the distribution of such routes is to
   assign an RT per DCB or per set of DCBs. This set of DCBs may be
   chosen based on geographical proximity. Note that when BGP/MPLS VPNs
   are used, this RT is actually per {VPN, DCB} tuple or {VPN, set of
   DCBs} tuple. The rest of this section will refer to this as the "DCB
   Set RT" for simplicity.

   Each DCB in a particular set of data centers is then configured with
   this RT.  A DCB may belong to multiple data center sets and hence
   may be configured with multiple DCB Set RTs. If a DCB that is in one
   or more Data Center Sets advertises a VM host IP address route, it
   MUST include with the route all the DCB Set RTs it is configured
   with. This results in each DCB that is part of one or more of these
   Data Center Sets importing the route, as sketched below.
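
   The following Python sketch illustrates the advertisement side of
   this mechanism. It is a minimal sketch; the RT strings and the
   helper names are assumptions of the illustration.

      # Hypothetical DCB configured into two Data Center Sets, e.g. a
      # "west coast" set and a "california" set.
      DCB_SET_RTS = frozenset({"RT:64500:100", "RT:64500:101"})

      def advertise_vm_host_route(prefix: str) -> dict:
          # Attach every configured DCB Set RT, so that every DCB in
          # any of these sets imports the route.
          return {"prefix": prefix, "rts": set(DCB_SET_RTS)}

      route = advertise_vm_host_route("203.0.113.7/32")

      # On the receive side a DCB imports the route if the route's RTs
      # intersect its configured import RTs (cf. Scenario 1).
      def imports(route: dict, import_rts: set) -> bool:
          return bool(route["rts"] & import_rts)

      assert imports(route, {"RT:64500:100"})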

   A DCB MAY advertise a default IP route to the DCSes in its own data
   center, employing a "virtual hub-and-spoke" methodology. Or a DCB MAY



Raggarwa                                                       [Page 12]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


   advertise the IP routes received from other DCBs to the DCSes in its
   own data center.

   Consider a VM or a client in a VLAN in an IP VPN, in a particular
   data center, that sends a packet to a VM in another VLAN. Further
   consider that the destination VM is in a data center which is in the
   same data center set as the sender VM or client. Then the DCS that
   the sender VM or client is connected to will be able to route the
   packet optimally. This is because the DCB in this DCS's data center
   would have received and imported the host IP route to reach the
   destination VM. Note that the DCS may have imported only a default
   route advertised by the DCB in the DCS's own data center.

   Now consider that the sender VM's or client's data center and the
   destination VM's data center are not in the same Data Center Set. In
   this case the packet sent by the sender VM or client will first be
   routed as per the best IP prefix route to reach the destination VM.
   The next-hop DCB of this route may be in the same Data Center Set as
   the destination VM's data center, in which case this next-hop DCB
   will be able to route the packet optimally. If this is not the case
   then the packet will be forwarded by the next-hop DCB as per its best
   route.

   Constraining the VM host IP address route using the DCB Set RT
   provides a mechanism for optimal routing within the set of data
   centers that are configured with the DCB Set RT.

   For example, consider data centers in San Francisco and Los Angeles.
   All the DCBs in these data centers may be assigned a particular Data
   Center Set import RT, RT1. Further, each DCB advertises VM host IP
   addresses with RT1.  As a result it is possible to perform optimal
   routing of packets destined to a VM in one of these data centers if
   the packet is originated by a VM or client in one of these data
   centers. It is also possible to perform this optimal routing for a
   packet that is originated outside these data centers, once the
   packet reaches a DCB in these data centers.  However, if there are
   multiple entry points, i.e., DCBs, in these data centers then this
   mechanism is not sufficient for WAN routers to optimally route the
   packet to the DCB that the VM is closest to. Please see the section
   "Scenario 3: Using BGP as the Routing Protocol" for procedures on
   how to achieve this.










Raggarwa                                                       [Page 13]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


7.3. Scenario 2: OSPF/IS-IS as the Routing Protocol between DCBs

   A DCS MAY advertise /32 or /128 routes for all VMs connected to it
   using the procedures described in "Scenario 1". Note that the DCSes
   may use OSPF or IS-IS or BGP as the routing protocol.

   If a DCS advertises host routes as described above, then the DCBs in
   the data center MUST learn the VM host routes within their data
   center from the routes advertised by the DCSes. If the DCSes do not
   advertise host routes but implement MAC-VPN, then the DCSes SHOULD
   advertise the IP address of a VM along with the MAC advertisement
   for that VM. In this case the DCBs MUST learn the VM host IP
   addresses from the MAC advertisement routes. If the DCSes neither
   advertise VM host routes nor implement MAC-VPN, then the DCBs must
   rely on data plane snooping to learn the IP addresses of the VMs.

   The DCBs must follow IGP procedures to propagate the host routes
   within the non-backbone IGP area to which they belong.

   "Geographical proximity" is defined by an IGP area. The /32 /128
   routes are only propagated in the non-backbone IGP area to which the
   DCSes and DCB belong. This assumes that geographically proximate data
   centers are in their non-backbone IGP area. This solution is a
   natural fit with the OSPF/IS-IS model of operations. It avoids
   triangular routing when the sender VM/client and destination
   VM/client are in the same IGP area using principles that are very
   similar to those described in the section "Scenario 2: BGP as the
   Routing Protocol".


7.4. Scenario 3: Using BGP as the Routing Protocol

   The mechanisms to address Scenario 2 do not address Scenario 3.
   Specifically, they do not address the distribution of VM host IP
   routes between DCBs that are not in close geographical proximity.
   This distribution may be necessary if it is desirable to ensure that
   a packet from a data center, outside the set of data centers
   described above, is routed to the optimal entry point in the set.
   For example, if a VM in VLAN1 moves from San Francisco to Los
   Angeles, then it may be desirable to route packets from New York to
   Los Angeles without going through San Francisco, if such a path
   exists from New York to Los Angeles.

   The section "Base Solution" describes the base solution to address
   Scenario 3 based on BGP as the routing protocol. The section
   "Refinements" describes the modifications to these base procedures to
   improve the scale of the solution.




Raggarwa                                                       [Page 14]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


7.4.1. Base Solution

   A given DCB MUST advertise, in IP routing, routes for the IP subnets
   configured on the DCB. These are NOT host (/32 or /128) routes;
   instead these are prefix/aggregated routes. Further, the DCB of a
   given data center MUST originate into BGP IPv4/IPv6 or VPN IPv4/IPv6
   host routes for all the VMs currently present within its own DC.
   These routes are propagated to all DCBs in all data centers. This
   requires all host routes to be maintained by all DCBs, at least in
   the control plane.

   This base solution may impose significant control plane overhead
   depending on the number of VM host IP addresses across all data
   centers. However it may be applicable as is in certain environments.

   Please see the next section "Refinements" for procedures that may be
   employed to improve the scale of this solution.


7.4.2. Refinements: SP Unaware of DC Routes

   We first consider the case where the SP does not participate in data
   center routing. Instead the SP just provides layer 2 or IP
   connectivity between the DCBs.

   In this case the VM host routes are propagated by the DCBs to Route
   Reflectors (RRs) that are part of the data center infrastructure.
   Distribution of these routes to the RRs is constrained using a Route
   Target that is configured on all the RRs. In addition, such VM host
   routes also carry the DCB Set RTs as described in "Scenario 2: BGP
   as the Routing Protocol between DCBs". The RRs propagate such routes
   to all the DCBs that belong to the DCB Set RTs present in the route.

   In addition, the propagation of these routes from the RRs to other
   DCBs and/or client site border routers is done on demand. A given
   DCB that needs to send traffic to a particular VM in some other data
   center would dynamically, on demand, request the host route to that
   VM from its RR using a "prefix-based Outbound Route Filter (ORF)". A
   DCB can determine whether it requires a VM host IP address based on
   policy. For example, the policy may be based on a high volume of
   traffic to the destination IP address of the VM.  This mechanism
   reduces the number of host routes that a DCB needs to maintain.
   Likewise, a given client site border router that needs to send
   traffic to a particular VM would dynamically, on demand, request the
   host route to that VM using a prefix-based ORF. This reduces the
   number of host routes that the client site border router needs to
   maintain. A sketch of such a policy trigger is shown below.
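
   The following Python sketch illustrates a traffic-volume policy
   triggering an on-demand ORF request. It is a minimal sketch; the
   byte-count source, the threshold, and the send_orf_request() helper
   are assumptions of the illustration, not a real BGP API.

      # Assumed per-destination counters sampled from the data plane
      # (destination IP -> bytes seen in the last interval).
      traffic_bytes = {"203.0.113.7": 5_000_000, "203.0.113.9": 1_200}

      ORF_THRESHOLD_BYTES = 1_000_000   # illustrative policy value
      requested = set()                 # host routes already requested

      def send_orf_request(prefix: str) -> None:
          # Placeholder: send a prefix-based ORF entry to the RR asking
          # it to start advertising this host route to us.
          print("ORF request to RR for", prefix)

      def apply_policy() -> None:
          for ip, nbytes in traffic_bytes.items():
              prefix = ip + "/32"
              if nbytes >= ORF_THRESHOLD_BYTES and prefix not in requested:
                  send_orf_request(prefix)
                  requested.add(prefix)

      apply_policy()   # requests only 203.0.113.7/32 in this example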




Raggarwa                                                       [Page 15]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


7.4.3. Refinements: SP Participates in DC Routing

   This section considers the case where the SP offers Inter-DC routing
   as a service. To enable this the IPv4/IPv6 or VPN IPv4/IPv6 host VM
   routes need to be propagated by the SP.

   The first variant of this is the case where the DCBs are managed by
   the SP and the WAN Border Router is the same device as the DCB. The
   procedures of this variant are the same as those in "Refinements: SP
   Unaware of DC Routes", except that the DCBs and the RR
   infrastructure are managed by the SP. In this variant it is
   desirable that the inter-DCB routing protocol be based on BGP/MPLS
   IP VPNs.

   The second variant of this is the case where the WAN Border Router
   and DCBs are separate devices and DCBs are not managed by the SP. In
   this variant the DCBs first need to propagate the routes to the WAN
   border routers. This can be done by configuring the WAN border
   routers with the Data Center Set RTs of all the data centers that the
   WAN border routers are connected to. WAN border routers would then
   need to import BGP IPv4/IPv6 or VPN IPv4/IPv6 routes that carry one
   of these RTs.

   Next, the WAN border routers may be configured to propagate such
   routes. As they propagate such routes, they MUST include an RT that
   controls which other routers in the WAN import such routes.

   One possible mechanism is to propagate such routes only to Route
   Reflectors (RRs) in the WAN. This can be accomplished by configuring
   the RRs with a particular import RT and by attaching this RT to the
   routes at the WAN border routers. Now DCBs or border routers or PEs
   in the WAN can dynamically request routes, using prefix-based ORF,
   for one or more VM host addresses.

   For instance, the policy may be to request such routes for a
   particular host address if the traffic to that host address exceeds
   a certain threshold.  This does require data plane statistics to be
   maintained for flows. This policy may be implemented on a WAN border
   router or PE, which can then dynamically request host routes from an
   RR using BGP Outbound Route Filtering (ORF).












Raggarwa                                                       [Page 16]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


7.5. VM Motion

   The procedures described in this document require that a DCS that
   originates a VM host IP route MUST be able to detect when that VM
   moves to another DCS. If the DCSes support MAC-VPN then the
   procedures in [MAC-VPN] MUST be used to detect VM motion. If the
   DCSes do not support MAC-VPN then they must rely on layer 2
   mechanisms, or on control plane/management plane interaction between
   the DCS and the VM, to detect VM motion.

   When the DCS detects such VM motion it MUST withdraw the host VM
   route that it advertised from the IGP or BGP, as sketched below.
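
   The following Python sketch illustrates this withdraw-on-move rule.
   It is a minimal sketch; the move-detection callback and the
   withdraw() helper are assumptions of the illustration.

      advertised = {"203.0.113.7/32"}   # routes this DCS originated

      def withdraw(prefix: str) -> None:
          # Placeholder for an IGP or BGP route withdrawal.
          print("withdrawing", prefix)

      def on_vm_moved_away(vm_ip: str) -> None:
          # Invoked when MAC-VPN procedures (or layer 2 / management
          # plane mechanisms) report the VM is now behind another DCS.
          prefix = vm_ip + "/32"
          if prefix in advertised:
              withdraw(prefix)
              advertised.discard(prefix)

      on_vm_moved_away("203.0.113.7")
      assert not advertised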


7.6. Policy based origination of VM Host IP Address Routes

   When a DCS/DCB learns the host IP address of a VM, it may not
   originate a corresponding VM host IP address route by default.
   Instead, it may optionally do so based on a dynamic policy. For
   example, the policy may be to originate such a route only when the
   traffic to the VM exceeds a certain threshold, as sketched below.
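
   The following Python sketch illustrates such an origination policy.
   It is a minimal sketch; the counter source and the originate()
   helper are assumptions of the illustration.

      ORIGINATE_THRESHOLD = 100_000   # bytes; illustrative value
      originated = set()

      def originate(prefix: str) -> None:
          # Placeholder: originate the route into MAC-VPN, IGP or BGP.
          print("originating", prefix)

      def on_traffic_sample(vm_ip: str, nbytes: int) -> None:
          prefix = vm_ip + "/32"
          if nbytes >= ORIGINATE_THRESHOLD and prefix not in originated:
              originate(prefix)
              originated.add(prefix)

      on_traffic_sample("203.0.113.7", 250_000)  # originates the route
      on_traffic_sample("203.0.113.9", 1_000)    # below threshold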


7.7. Policy based instantiation of VM Host IP Address Forwarding State

   When a DCS/DCB learns the host IP address of a VM from another DCS
   or DCB, it may not immediately install this route in the forwarding
   table.  Instead, it may optionally do so based on a dynamic policy.
   For example, the policy may be to install such forwarding state only
   when the first packet to that particular VM is received, as in the
   sketch below.
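
   The following Python sketch illustrates first-packet-triggered
   installation of forwarding state. It is a minimal sketch; the RIB
   and FIB dictionaries stand in for real routing and forwarding
   tables.

      # Host routes learned in the control plane but not yet installed.
      rib = {"203.0.113.7/32": "DCB-LA"}   # prefix -> next hop
      fib = {}                             # installed forwarding state

      def forward(dst_ip: str) -> str:
          prefix = dst_ip + "/32"
          if prefix not in fib and prefix in rib:
              # First packet to this VM: instantiate forwarding state.
              fib[prefix] = rib[prefix]
          # Fall back to the covering subnet route if no host route.
          return fib.get(prefix, "subnet-route")

      assert forward("203.0.113.7") == "DCB-LA"   # installed on demand
      assert forward("198.51.100.1") == "subnet-route"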


8. Triangular Routing Solution Based on NHRP

8.1. Overview

   The following describes a scenario where a client within a given
   customer site communicates with a VM, and the VM could move among
   several data centers (DCs).

   Assume that a given VLAN/subnet, subnet X, spans two DCs, one in SF
   and another in LA. DCB-SF is the DCB for the SF DC. DCB-LA is the DCB
   for the LA DC. Since X spans both the SF DC and the LA DC, both DCB-
   SF and DCB-LA advertise a route to X (this is a route to a prefix,
   and not a /32 route).

   DCB-LA and DCB-SF can determine whether a particular VM on that
   VLAN/subnet is in LA or SF by running MAC-VPN (and exchanging MAC-VPN



Raggarwa                                                       [Page 17]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


   routes among themselves).

   There is a site in Denver, and that site contains a host B that wants
   to communicate with a particular VM, VM-A, on the subnet X.

   Assume that there is an IP infrastructure that connects the border
   router of the site in Denver, DCB-SF, and DCB-LA. This
   infrastructure could be provided by BGP/MPLS VPNs [RFC4364], by
   IPSec tunnels over the Internet, or by L2 circuits. [Note that this
   infrastructure does not assume that the border router in Denver is 1
   IP hop away from either DCB-SF or DCB-LA.]

   Goal: If VM-A is in LA, then the border router in Denver sends
   traffic for VM-A via DCB-LA without going first through DCB-SF. If
   VM-A is in SF, then the border router in Denver sends traffic for
   VM-A via DCB-SF without going first through DCB-LA. This should be
   true except for some transients during the move of VM-A between SF
   and LA.

   To accomplish this we require the border router in Denver, DCB-SF,
   and DCB-LA to support NHRP [RFC2332], and to support GRE
   encapsulation.  In NHRP terminology DCB-SF and DCB-LA are Next Hop
   Servers (NHSs), while the border router in Denver is a Next Hop
   Client (NHC).

   This document does not rely on the use of NHRP Registration
   Request/Reply messages, as DCBs/NHSs rely on the information provided
   by MAC-VPN.

   DCB-SF will be an authoritative NHS for all the /32s from X that are
   presently in the SF DC. Likewise, DCB-LA will be an authoritative NHS
   for all the /32s from X that are presently in the LA DC. Note that as
   a VM moves from SF to LA, the authoritative NHS for the IP address of
   that VM moves from DCB-SF to DCB-LA.

   We assume that the border router in Denver can determine the subset
   of destinations for which it has to apply NHRP. One way to do this
   would be for DCB-SF and DCB-LA to use an OSPF tag to mark the route
   for X, and then have the border router in Denver apply NHRP to any
   destination that matches a route carrying that particular tag.
   Another way to do this would be for DCB-SF and DCB-LA to use a
   particular BGP community to mark the route for X, and then have the
   border router in Denver apply NHRP to any destination that matches a
   route carrying that particular BGP community. A sketch of this
   classification is shown below.
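
   The following Python sketch illustrates this classification. It is
   a minimal sketch: the route table layout and the community string
   are assumptions of the illustration.

      import ipaddress

      NHRP_COMMUNITY = "64500:999"   # assumed marker community for X

      # prefix -> communities carried by the route to that prefix
      routes = {
          "198.51.100.0/24": {NHRP_COMMUNITY},   # subnet X
          "0.0.0.0/0": set(),                    # default route
      }

      def apply_nhrp(dst_ip: str) -> bool:
          # Longest-prefix match, then check the marker community.
          addr = ipaddress.ip_address(dst_ip)
          best = max((ipaddress.ip_network(p) for p in routes
                      if addr in ipaddress.ip_network(p)),
                     key=lambda net: net.prefixlen)
          return NHRP_COMMUNITY in routes[str(best)]

      assert apply_nhrp("198.51.100.7")    # on subnet X: use NHRP
      assert not apply_nhrp("192.0.2.1")   # elsewhere: plain routing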









Raggarwa                                                       [Page 18]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


8.2. Detailed Procedures

   The following describes details of NHRP operations.

   When the border router in Denver first receives a packet from B
   destined to VM-A, the border router determines that VM-A falls into
   the subset of destinations for which the border router has to apply
   NHRP. Therefore, the border router originates an NHRP Request.
   [Note that the trigger for originating an NHRP Request may be either
   the first packet destined to a particular /32, or a particular rate
   threshold for the traffic to that /32.] This Request is encapsulated
   into an IP packet whose source IP address is the address of the
   border router, and whose destination IP address is the address of
   VM-A. The packet carries the Router Alert option.  NHRP is carried
   directly over IP using IP protocol number 54 [RFC1700]. A sketch of
   this encapsulation is shown below.
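
   The following Python sketch captures the encapsulation described
   above. It is a minimal sketch: the field set is reduced to what the
   text mentions, and no actual packet bytes are produced.

      from dataclasses import dataclass

      NHRP_IP_PROTOCOL = 54   # NHRP directly over IP [RFC1700]

      @dataclass
      class NhrpRequestPacket:
          src_ip: str    # address of the requesting border router
          dst_ip: str    # address of VM-A, so the packet follows the
                         # route to subnet X toward an NHS
          protocol: int = NHRP_IP_PROTOCOL
          router_alert: bool = True   # transit NHSs examine the packet

      pkt = NhrpRequestPacket(src_ip="192.0.2.100",  # Denver router
                              dst_ip="198.51.100.7") # VM-A on subnet X
      assert pkt.protocol == 54 and pkt.router_alert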

   Following the route to X, the packet will eventually get to either
   DCB-SF or DCB-LA. Let's assume that it is DCB-SF that receives the
   packet. [None of the routers, if any, between the site border router
   in Denver and DCB-SF or DCB-LA are required to support NHRP.]
   However, since both DCB-SF and DCB-LA are assumed to support NHRP,
   they are required to process the NHRP Request carried in the packet.

   If DCB-SF determines that VM-A is in LA (DCB-SF determines this from
   the information provided by MAC-VPN), then DCB-SF will forward the
   packet to DCB-LA, as DCB-SF is not an authoritative NHS for VM-A,
   while DCB-LA is. [One way for DCB-SF to forward the packet to DCB-LA
   would be for DCB-SF to change the destination address in the IP
   header of the packet to DCB-LA. Alternatively, DCB-SF could keep the
   original destination address in the IP header, but set the
   destination MAC address to the MAC address of DCB-LA.]

   When the NHRP Request reaches DCB-LA, and DCB-LA determines that
   VM-A is in LA (DCB-LA determines this from the information provided
   by MAC-VPN), and thus that DCB-LA is an authoritative NHS for VM-A,
   DCB-LA sends back to the border router in Denver an NHRP Reply
   indicating that DCB-LA should be used for forwarding traffic to VM-A
   (when sending the NHRP Reply, DCB-LA determines the address of the
   border router in Denver from the NHRP Request). Once the border
   router in Denver receives the Reply, the border router will
   encapsulate all the subsequent packets destined to VM-A into GRE,
   with the outer header carrying DCB-LA as the IP destination address.
   [In effect this means that the border router in Denver will install
   in its FIB a /32 route for VM-A indicating GRE encapsulation with
   DCB-LA as the destination IP address in the outer header.]

   Now assume that VM-A moves from LA to SF. Once DCB-LA finds this out
   (DCB-LA finds this out from the information provided by MAC-VPN),



Raggarwa                                                       [Page 19]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


   DCB-LA sends an NHRP Purge to the border router in Denver. [Note that
   DCB-LA can defer sending the Purge message until it receives GRE-
   encapsulated data destined to VM-A. Note also, that in this case DCB-
   LA does not have to keep track of all the requestors for VM-A to whom
   DCB-LA subsequently sent NHRP Replies, as DCB-LA determines the
   address of these requestors from the outer IP header of the GRE
   tunnel.]

   When the border router in Denver receives the Purge message, it will
   purge the previously received information that VM-A is reachable via
   DCB-LA. In effect this means that the border router in Denver will
   remove the /32 route for VM-A from its FIB (but will still retain a
   route for X).

   From that moment the border router in Denver will start forwarding
   packets destined to VM-A using the route to the subnet X (relying on
   plain IP routing). That means that these packets will get to DCB-SF
   (which is the desirable outcome anyway).

   However, once the border router in Denver receives the NHRP Purge,
   the border router will issue another NHRP Request. This time, once
   this NHRP Request reaches DCB-SF, DCB-SF will send back to the
   border router in Denver an NHRP Reply (as at this point DCB-SF
   determines that VM-A is in SF, and therefore DCB-SF is an
   authoritative NHS for VM-A).  Once the border router in Denver
   receives the Reply, the router will encapsulate all the subsequent
   packets destined to VM-A into GRE, with the outer header carrying
   DCB-SF as the IP destination address. In effect this means that the
   border router in Denver will install in its FIB a /32 route for VM-A
   indicating GRE encapsulation with DCB-SF as the destination IP
   address in the outer header. The NHC-side cache behavior across this
   Reply/Purge cycle is sketched below.
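
   The following Python sketch illustrates the NHC-side cache behavior
   across this Reply/Purge cycle. It is a minimal sketch; the FIB
   representation and helper names are assumptions of the
   illustration.

      fib = {"198.51.100.0/24": "plain-ip"}   # route to subnet X
      pending = set()                         # outstanding Requests

      def send_nhrp_request(vm_ip: str) -> None:
          pending.add(vm_ip)        # placeholder for the real send

      def on_nhrp_reply(vm_ip: str, nhs_ip: str) -> None:
          # Install a /32: GRE-encapsulate toward the replying NHS.
          pending.discard(vm_ip)
          fib[vm_ip + "/32"] = "gre-to-" + nhs_ip

      def on_nhrp_purge(vm_ip: str) -> None:
          # Drop the /32; traffic falls back to the route for subnet
          # X, then re-resolve the (possibly new) authoritative NHS.
          fib.pop(vm_ip + "/32", None)
          send_nhrp_request(vm_ip)

      send_nhrp_request("198.51.100.7")
      on_nhrp_reply("198.51.100.7", "203.0.113.2")   # DCB-LA replies
      assert fib["198.51.100.7/32"] == "gre-to-203.0.113.2"
      on_nhrp_purge("198.51.100.7")                  # VM-A moves to SF
      on_nhrp_reply("198.51.100.7", "203.0.113.1")   # DCB-SF replies
      assert fib["198.51.100.7/32"] == "gre-to-203.0.113.1"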


8.3. Failure scenarios

   To illustrate operations during failures, let's modify the original
   example by assuming that each DC has more than one DCB.
   Specifically, the DC in SF has DCB-SF1 and DCB-SF2.  Both of these
   are authoritative NHSs for all the VMs whose addresses are taken
   from X and who are presently in the SF DC.  Note also that both
   DCB-SF1 and DCB-SF2 advertise a route to X.

   Assume that the VM-A is presently in SF, so the border router in
   Denver tunnels the traffic to VM-A through DCB-SF1.

   Now assume that DCB-SF1 crashes. At that point the border router in
   Denver should stop tunnelling the traffic through DCB-SF1, and
   should switch to DCB-SF2. A way to accomplish this is to have each
   DCB originate a /32 route for its own IP address, which it advertises in



Raggarwa                                                       [Page 20]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


   its NHRP Replies. This way, when DCB-SF1 crashes, the route to
   DCB-SF1's IP address goes away, providing an indication to the
   border router in Denver that it can no longer use DCB-SF1. At that
   point the border router in Denver removes the /32 route for VM-A
   from its FIB (but will still retain a route for X). From that moment
   the border router in Denver will start forwarding packets destined
   to VM-A using the route to the subnet X. Since DCB-SF1 has crashed,
   these packets will be routed to DCB-SF2, as DCB-SF2 advertises a
   route to X.

   However, once the border router in Denver detects that DCB-SF1 is
   down, the border router will issue another NHRP Request. This time
   the NHRP Request reaches DCB-SF2, and DCB-SF2 will send back to the
   border router in Denver an NHRP Reply. Once the border router in
   Denver receives the Reply, the router will encapsulate all the
   subsequent packets destined to VM-A into GRE, with the outer header
   carrying DCB-SF2 as the IP destination address. In effect this means
   that the border router in Denver will install in its FIB a /32 route
   for VM-A indicating GRE encapsulation with DCB-SF2 as the
   destination IP address in the outer header.



9. Acknowledgements

   We would like to thank Dave Katz for reviewing the NHRP procedures.



10. References

   [RFC4364] "BGP/MPLS IP VPNs", Rosen, Rekhter, et. al., February 2006

   [MAC-VPN] "BGP/MPLS Based Ethernet VPN", draft-raggarwa-sajassi-
   l2vpn-evpn-01.txt, R. Aggarwal et al.

   [RFC2332] "NBMA Next Hop Resolution Protocol (NHRP)", RFC 2332, J.
   Luciani et. al.














Raggarwa                                                       [Page 21]


Internet Draft draft-raggarwa-data-center-mobility-01.txt September 2011


11. Authors' Addresses

   Rahul Aggarwal
   Arktan Inc
   Email: raggarwa_1@yahoo.com


   Yakov Rekhter
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: yakov@juniper.net

   Wim Henderickx
   Alcatel-Lucent
   e-mail: wim.henderickx@alcatel-lucent.com

   Ravi Shekhar
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: rshekhar@juniper.net






























Raggarwa                                                       [Page 22]