Network Working Group                                        B. Sarikaya
Internet-Draft                                                 L. Dunbar
Intended status: Standards Track                              Huawei USA
Expires: August 3, 2015                                    B. Khasnabish
                                                           ZTE (TX) Inc.
                                                                  F. Xia
                                                              Huawei USA
                                                        January 30, 2015


         Virtual Machine Mobility Protocol for Overlay Networks
                draft-sarikaya-nvo3-vmm-dmm-pmip-05.txt

Abstract

   This document specifies a virtual machine mobility protocol for data
   centers built with an overlay-based network virtualization approach.
   The protocol is based on the virtual machine sending a gratuitous
   Address Resolution Protocol request in IPv4, or an unsolicited
   Neighbor Advertisement message in IPv6, which is broadcast or sent to
   all nodes after the virtual machine moves to the new Network
   Virtualization Edge.  These messages enable the Network
   Virtualization Edges to update their tables that map virtual machine
   MAC addresses to tunnel endpoints.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 3, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents



Sarikaya, et al.         Expires August 3, 2015                 [Page 1]


Internet-Draft            VM Mobility Solution              January 2015


   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Conventions and Terminology . . . . . . . . . . . . . . . . .   3
   3.  Requirements  . . . . . . . . . . . . . . . . . . . . . . . .   3
   4.  Overview of the protocol  . . . . . . . . . . . . . . . . . .   4
   5.  IPv6 Operation  . . . . . . . . . . . . . . . . . . . . . . .   5
     5.1.  IPv6 Unsolicited Neighbor Advertisement . . . . . . . . .   5
     5.2.  Encapsulation . . . . . . . . . . . . . . . . . . . . . .   6
   6.  IPv4 Operation  . . . . . . . . . . . . . . . . . . . . . . .   6
     6.1.  Gratuitous ARP  . . . . . . . . . . . . . . . . . . . . .   6
     6.2.  Encapsulation . . . . . . . . . . . . . . . . . . . . . .   7
   7.  Handling Packets in Flight  . . . . . . . . . . . . . . . . .   7
   8.  Moving Local State of VM  . . . . . . . . . . . . . . . . . .   7
   9.  Handling of Hot, Warm and Cold Virtual Machine Mobility . . .   8
   10. Virtual Machine Operation . . . . . . . . . . . . . . . . . .   8
     10.1.  Virtual Machine Lifecycle Management . . . . . . . . . .   9
   11. Security Considerations . . . . . . . . . . . . . . . . . . .   9
   12. IANA Considerations . . . . . . . . . . . . . . . . . . . . .   9
   13. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   9
   14. References  . . . . . . . . . . . . . . . . . . . . . . . . .   9
     14.1.  Normative References . . . . . . . . . . . . . . . . . .   9
     14.2.  Informative references . . . . . . . . . . . . . . . . .  11
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  11

1.  Introduction

   Data center networks are being increasingly used by telecom operators
   as well as by enterprises.  In this document we are interested in
   overlay-based data center networks supporting multitenancy.  These
   networks are organized as one large Layer 2 network geographically
   distributed in several buildings.

   Virtualization, which is used in almost all of today's data centers,
   enables many virtual machines to run on a single physical computer or
   compute server.  Virtual machines (VMs) need a hypervisor running on
   the physical compute server to provide them with shared processor,
   memory, and storage resources.  Network connectivity is provided by
   the network virtualization edge (NVE) [I-D.ietf-nvo3-arch],
   [I-D.ietf-nvo3-nve-nva-cp-req].  Being able to move VMs dynamically
   from one server to another, known as live migration, allows for
   dynamic load balancing or work distribution and is thus a highly
   desirable feature [RFC7364].

   There are many challenges and requirements related to migration,
   mobility, and interconnection of Virtual Machines (VMs) and Virtual
   Network Elements (VNEs).  Retaining IP addresses after a move is a
   key requirement [RFC7364], because it allows existing transport
   connections to be maintained.

   In view of the many virtual machine mobility schemes that exist
   today, there is a desire to define a standard control plane protocol
   for virtual machine mobility.  The protocol should be based on IPv4
   or IPv6.  In this document we specify such a protocol.

2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119] and
   [I-D.ietf-nvo3-arch].

   This document uses the terminology defined in [RFC7364].  In addition
   we make the following definitions:

   Hot VM Mobility.  A given VM can be moved from one server to another
   in running state.

   Warm VM Mobility.  In case of warm VM mobility, the VM states are
   mirrored to the secondary server (or domain) at predefined
   (configurable) regular intervals.  This reduces overhead and
   complexity, but it may also lead to a situation in which the two
   servers do not contain exactly the same data (state information).

   Cold VM Mobility.  A given VM can be moved from one server to another
   in stopped or suspended state.

3.  Requirements

   This section states the requirements for virtual machine mobility in
   data center networks.

   The data center network MUST support virtual machine mobility in
   IPv6.

   IPv4 SHOULD also be supported in virtual machine mobility.

   The virtual machine mobility protocol SHOULD NOT support host routes.
   The virtual machine mobility protocol SHOULD NOT support triangular
   routing.  The virtual machine mobility protocol SHOULD NOT need to
   use tunneling except for handling packets in flight.

4.  Overview of the protocol

   Being able to move virtual machines dynamically from one server to
   another allows for dynamic load balancing or work distribution and is
   thus a highly desirable feature.  In a Layer 2 based data center
   approach, a virtual machine moving to another server does not change
   its IP address.  Because of this, an IP-based virtual machine
   mobility protocol is not needed.  However, when a virtual machine
   moves, NVEs need to update their caches that associate the VM's
   Layer 2 (MAC) address with the IP address of the NVE serving it.
   Such an update enables an NVE to send outgoing MAC frames addressed
   to the virtual machine.

   The virtual machine moves from its source NVE to a new, destination
   NVE.  The move is initiated by the source NVE and stays within the
   same L2 link, so the virtual machine's IP address(es) do not change.
   However, the virtual machine is now under a new NVE, and NVEs that
   were previously communicating with it will continue to send their
   packets to the source NVE.  The Address Resolution Protocol (ARP)
   cache in IPv4, or the neighbor cache in IPv6, in these NVEs needs to
   be updated.

   It takes a few seconds for a VM to move from its source NVE to the
   new destination NVE.  During this period, a tunnel is needed so that
   the source NVE can forward packets to the destination NVE.

   In IPv4, immediately after the move the virtual machine sends a
   gratuitous ARP request message containing its IPv4 address and its
   Layer 2 (MAC) address from its new location under the destination
   NVE.  This message is addressed to the broadcast address and reaches
   the destination NVE as a broadcast MAC frame.  The NVE forwards it
   as a MAC frame after encapsulation, such as VXLAN [RFC7348], which
   adds an outer IPv4 header and a UDP header.  The outer MAC frame
   carries the destination NVE's MAC address in the Outer Source MAC
   Address field.
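
   As an illustration of the encapsulation step, the following Python
   sketch prepends the 8-octet VXLAN header defined in [RFC7348] to an
   inner MAC frame.  The function name vxlan_encapsulate and the example
   VNI value are assumptions made for this sketch only.  The outer UDP,
   IP, and MAC headers are left to the sending stack and are not built
   here.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN destination port (RFC 7348)


def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend a VXLAN header to an inner Ethernet frame.

    Hypothetical helper: only the VXLAN header is produced; the outer
    UDP/IP/MAC headers are assumed to be added by the NVE's stack.
    """
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # First word: flags octet 0x08 (I bit set, VNI valid) + 24 reserved bits.
    # Second word: 24-bit VNI followed by 8 reserved bits.
    header = struct.pack("!II", 0x08 << 24, vni << 8)
    return header + inner_frame


# Example: wrap a (dummy) 14-octet inner frame in VNI 5000.
encapsulated = vxlan_encapsulate(5000, b"\xff" * 14)
```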

   Because the frame is a broadcast frame, it needs to be carried to all
   nodes on the whole L2 link.  One mechanism is to establish a single
   VLAN [RFC6820].  All NVEs in the VLAN receive this frame and, after
   decapsulation, obtain the ARP request.  All NVEs communicating with
   this virtual machine MUST update their ARP cache so that the virtual
   machine's MAC address maps to the destination NVE's IPv4 address.
   This update enables the communicating NVEs to send all new IP packets
   to the destination NVE, under which the virtual machine is located
   after the move.
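
   The cache update described above can be sketched as follows.  This is
   a minimal Python illustration, not a normative data structure: the
   class name NveMappingTable, its method, and the example addresses are
   assumptions.  The sketch only updates entries for VMs the NVE already
   communicates with, mirroring the rule stated above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NveMappingTable:
    """Maps a VM MAC address to the IP address of the NVE serving it."""

    entries: Dict[str, str] = field(default_factory=dict)

    def on_gratuitous_arp(self, vm_mac: str, outer_src_ip: str) -> bool:
        """Handle a decapsulated gratuitous ARP.

        vm_mac is the sender hardware address from the ARP message and
        outer_src_ip is the sending NVE's address from the outer header.
        Returns True if an existing entry now points to a different NVE.
        """
        if vm_mac not in self.entries:
            return False  # this NVE is not communicating with the VM
        changed = self.entries[vm_mac] != outer_src_ip
        self.entries[vm_mac] = outer_src_ip
        return changed


# Example: the VM was behind 192.0.2.10 and announces itself via 192.0.2.20.
table = NveMappingTable({"52:54:00:12:34:56": "192.0.2.10"})
moved = table.on_gratuitous_arp("52:54:00:12:34:56", "192.0.2.20")
```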

   IPv6 operation is slightly different:



   In IPv6, immediately after the move the virtual machine sends an
   unsolicited Neighbor Advertisement message containing its IPv6
   address and its Layer 2 (MAC) address from its new location under
   the destination NVE.  This message is sent to the link-local scope
   all-nodes multicast address (FF02::1), as detailed in Section 5.1.
   The NVE forwards it as a MAC frame after encapsulation, which
   possibly adds an outer IPv6 header and a UDP header.  The outer MAC
   frame carries the destination NVE's MAC address in the Outer Source
   MAC Address field.

   When querying for a target IP address, the Neighbor Discovery
   protocol maps the target address, i.e., the IPv6 address of the
   destination NVE, into an IPv6 Solicited-Node multicast address, which
   has the form FF02:0:0:0:0:1:FFXX:XXXX, where XX:XXXX are the
   low-order 24 bits of the target address.  Such a frame is sent as a
   multicast frame rather than as a broadcast frame as in ARP.  This has
   the benefit that the multicast frames do not necessarily need to be
   sent to all parts of the network, i.e., the frames can be sent only
   to segments where listeners for the Solicited-Node multicast address
   reside [RFC6820].
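
   The mapping from a target address to its Solicited-Node multicast
   address can be illustrated with a short Python sketch that uses the
   standard ipaddress module.  The function name and the example target
   address (taken from the 2001:db8::/32 documentation prefix) are
   assumptions made for illustration.

```python
import ipaddress

# FF02:0:0:0:0:1:FF00:0000, the fixed 104-bit Solicited-Node prefix.
SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_multicast(target: str) -> ipaddress.IPv6Address:
    """Return FF02:0:0:0:0:1:FFXX:XXXX for the given target address."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_PREFIX | low24)


# Example: the low-order 24 bits 34:5678 are copied into the group address.
group = solicited_node_multicast("2001:db8::1234:5678")
```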

   The NVE sends the packet with link-local scope, as the FF02 prefix
   indicates.  All member NVEs receive this packet after decapsulation.
   All NVEs communicating with this virtual machine MUST update their
   neighbor cache so that the virtual machine's MAC address maps to the
   destination NVE's IPv6 address.  This update enables the
   communicating NVEs to send all new IP packets to the destination NVE,
   under which the virtual machine is located after the move.

   Note that Gratuitous ARP or Unsolicited Neighbor Advertisement
   messages are normally used to announce link-layer (MAC) address
   changes [RFC4861].  In the VM mobility case, these messages are
   instead used to announce a change in the IP address of the NVE that
   serves the virtual machine.

5.  IPv6 Operation

5.1.  IPv6 Unsolicited Neighbor Advertisement

   The virtual machine, as an IPv6 node, sends an unsolicited Neighbor
   Advertisement after it moves to the destination NVE to inform
   neighboring nodes of the change in its attachment.  The Neighbor
   Advertisement contains information required by nodes to determine the
   type of Neighbor Advertisement message, the sender's role on the
   network, and typically the link-layer address of the sender.

   The IPv6 header and the body of the Neighbor Advertisement message
   carry the following settings:

   For an unsolicited Neighbor Advertisement, the Source Address field
   is set to a unicast address of the virtual machine.  The Destination
   Address field is set to the link-local scope all-nodes multicast
   address (FF02::1).  The Hop Limit field is set to 255.  The Target
   link-layer address option [RFC4861] is set to the virtual machine's
   MAC address.
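
   A minimal Python sketch of the ICMPv6 part of such a message is given
   below.  It follows the Neighbor Advertisement layout of [RFC4861]
   (Type 136, with the link-layer address carried in the Target
   Link-Layer Address option, option type 2).  The function names, the
   choice to set the Override flag, and the example addresses are
   assumptions of this sketch; the IPv6 header itself (Hop Limit 255) is
   not built here.

```python
import ipaddress
import struct

ALL_NODES = "ff02::1"   # link-local scope all-nodes multicast address
ICMPV6 = 58             # IPv6 Next Header value for ICMPv6


def checksum16(data: bytes) -> int:
    """Internet checksum: one's complement of the one's complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_unsolicited_na(vm_ip: str, vm_mac: bytes) -> bytes:
    """Build the ICMPv6 Neighbor Advertisement sent by the VM."""
    src = ipaddress.IPv6Address(vm_ip).packed
    dst = ipaddress.IPv6Address(ALL_NODES).packed

    def message(csum: int) -> bytes:
        # Flags word: R=0, S=0 (unsolicited), O=1 (assumption: override).
        fixed = struct.pack("!BBHI", 136, 0, csum, 0x20000000)
        option = struct.pack("!BB", 2, 1) + vm_mac  # type 2, length 1 (8 octets)
        return fixed + src + option                 # Target Address = VM address

    unsummed = message(0)
    # IPv6 pseudo-header: source, destination, length, 3 zero octets, 58.
    pseudo = src + dst + struct.pack("!I3xB", len(unsummed), ICMPV6)
    return message(checksum16(pseudo + unsummed))


na = build_unsolicited_na("2001:db8::10", bytes.fromhex("525400123456"))
```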

   Assuming that the local link is Ethernet, the virtual machine
   encapsulates the IPv6 datagram in a MAC frame.  The Ethernet header
   of the Neighbor Advertisement message carries the following
   settings:

   The Source Address field is set to the MAC address of the virtual
   machine.  For an unsolicited Neighbor Advertisement, the Destination
   Address field is set to 33-33-00-00-00-01, which is the Ethernet MAC
   address corresponding to the link-local scope all-nodes multicast
   address.
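
   The destination MAC address above follows the usual mapping of IPv6
   multicast addresses onto Ethernet: the octets 33-33 followed by the
   low-order 32 bits of the IPv6 multicast address.  A small Python
   sketch of that mapping, with a hypothetical function name, is:

```python
import ipaddress


def ipv6_multicast_to_mac(group: str) -> str:
    """Map an IPv6 multicast address to its Ethernet multicast MAC."""
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError("not an IPv6 multicast address")
    # 33-33 prefix followed by the last four octets of the group address.
    octets = b"\x33\x33" + addr.packed[-4:]
    return "-".join("%02X" % octet for octet in octets)


all_nodes_mac = ipv6_multicast_to_mac("ff02::1")
```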

5.2.  Encapsulation

   The encapsulation depends on the encapsulation layer protocol used in
   the data center.  Details of VXLAN-type encapsulation are TBD.

6.  IPv4 Operation

6.1.  Gratuitous ARP

   A gratuitous ARP message could be broadcast as an ARP request
   containing the sender's protocol address (SPA) in the target protocol
   address field (TPA=SPA), with the target hardware address (THA) set
   to zero.  An alternative is to broadcast an ARP reply with the
   sender's hardware and protocol addresses (SHA and SPA) duplicated in
   the target protocol address and target hardware address fields
   (TPA=SPA, THA=SHA).

   A gratuitous ARP message is not sent to solicit a reply.  Instead, it
   updates any cached entries in the ARP tables of other NVEs that
   receive the packet.  The operation code may indicate a request or a
   reply because the ARP standard specifies that the opcode is only
   processed after the ARP table has been updated from the address
   fields [RFC0826].

   The gratuitous ARP request/reply message is sent by the VM to the NVE
   in an Ethernet frame.  The destination address is set to the
   broadcast address for the hardware (all ones in the case of the
   10Mbit Ethernet).
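
   The two gratuitous ARP forms described in this section can be
   sketched in Python as follows.  The field layout is that of
   [RFC0826] for Ethernet and IPv4; the function name, the as_reply
   parameter, and the example addresses are assumptions made for
   illustration.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6   # all ones
ETHERTYPE_ARP = 0x0806


def build_gratuitous_arp(vm_mac: bytes, vm_ip: str,
                         as_reply: bool = False) -> bytes:
    """Return an Ethernet frame carrying a gratuitous ARP message.

    Request form: TPA = SPA, THA = 0.
    Reply form:   TPA = SPA, THA = SHA.
    """
    spa = socket.inet_aton(vm_ip)
    oper = 2 if as_reply else 1
    tha = vm_mac if as_reply else b"\x00" * 6
    # Hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
    # hardware length 6, protocol length 4, then the opcode.
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, oper)
    arp += vm_mac + spa + tha + spa
    ethernet = BROADCAST_MAC + vm_mac + struct.pack("!H", ETHERTYPE_ARP)
    return ethernet + arp


mac = bytes.fromhex("525400123456")
request = build_gratuitous_arp(mac, "192.0.2.5")
reply = build_gratuitous_arp(mac, "192.0.2.5", as_reply=True)
```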





6.2.  Encapsulation

   The encapsulation depends on the encapsulation layer protocol used in
   the data center.  Details of VXLAN-type encapsulation are TBD.

7.  Handling Packets in Flight

   The source hypervisor may still receive packets belonging to the
   virtual machine's ongoing communications.  These packets should not
   be lost; they should be sent to the destination hypervisor to be
   delivered to the virtual machine.  The steps involved in handling
   packets in flight are as follows:

   Preparation Step  It takes some time, possibly a few seconds, for a
      VM to move from its source hypervisor to a new destination one.
      During this period, a tunnel needs to be established so that the
      source NVE can forward packets to the destination NVE.

   Tunnel Establishment - IPv6  In-flight packets are tunneled to the
      destination NVE using an encapsulation protocol such as VXLAN in
      IPv6.  The source NVE gets the destination NVE address from the
      NVA along with the request to move the virtual machine.

   Tunnel Establishment - IPv4  In-flight packets are tunneled to the
      destination NVE using an encapsulation protocol such as VXLAN in
      IPv4.  The source NVE gets the destination NVE address from the
      gratuitous ARP message sent from the destination NVE.

   Tunneling Packets - IPv6  IPv6 packets for the migrating virtual
      machine are received at the destination NVE encapsulated in an
      IPv6 header.  The destination NVE decapsulates each packet and
      sends the inner IPv6 packet to the migrating VM.

   Tunneling Packets - IPv4  IPv4 packets for the migrating virtual
      machine are received at the destination NVE encapsulated in an
      IPv4 header.  The destination NVE decapsulates each packet and
      sends the inner IPv4 packet to the migrating VM.

   Stop Tunneling Packets  When the source NVE receives the gratuitous
      ARP or the Unsolicited Neighbor Advertisement, the source NVE MUST
      stop tunneling packets.
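
   The steps above can be read as a small state machine at the source
   NVE.  The following Python sketch is one possible rendering under
   stated assumptions: the class, state, and method names are
   hypothetical, and what the source NVE does with packets after it
   stops tunneling is not specified by this document (the sketch simply
   reports "drop").

```python
from enum import Enum, auto
from typing import Optional, Tuple


class MoveState(Enum):
    LOCAL = auto()      # VM is attached to this NVE
    TUNNELING = auto()  # VM is moving; forward packets to destination NVE
    MOVED = auto()      # announcement received; tunneling has stopped


class SourceNve:
    """Tracks in-flight packet handling for one migrating VM."""

    def __init__(self) -> None:
        self.state = MoveState.LOCAL
        self.dest_nve: Optional[str] = None

    def start_move(self, dest_nve: str) -> None:
        """Preparation and tunnel establishment toward dest_nve."""
        self.dest_nve = dest_nve
        self.state = MoveState.TUNNELING

    def on_announcement(self) -> None:
        """Gratuitous ARP or unsolicited NA received: stop tunneling."""
        if self.state is MoveState.TUNNELING:
            self.state = MoveState.MOVED

    def handle_packet(self, packet: bytes) -> Tuple[str, Optional[str]]:
        """Return the action taken for a packet addressed to the VM."""
        if self.state is MoveState.LOCAL:
            return ("deliver", None)
        if self.state is MoveState.TUNNELING:
            return ("tunnel", self.dest_nve)
        return ("drop", None)  # assumption: behavior after the move


nve = SourceNve()
before = nve.handle_packet(b"payload")
nve.start_move("2001:db8::20")
during = nve.handle_packet(b"payload")
nve.on_announcement()
after = nve.handle_packet(b"payload")
```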

8.  Moving Local State of VM

   After the VM mobility related signaling (VM Mobility Registration
   Request/Reply), the virtual machine state needs to be transferred to
   the destination hypervisor.  The state includes its memory and file
   system.  The source NVE opens a TCP connection with the destination
   NVE, over which the VM's memory state is transferred.

   The file system, or local storage, is more complicated to transfer.
   The transfer should ensure consistency, i.e., the VM at the
   destination should find the same file system it had at the source.
   Precopying is a commonly used technique for transferring the file
   system.  First, the whole disk image is transferred while the VM
   continues to run.  After the VM is moved, any changes made to the
   file system in the meantime are packaged together and sent to the
   destination hypervisor, which applies these changes to the file
   system locally at the destination.
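
   The two precopy phases can be illustrated with a toy Python model in
   which a disk is a dictionary of blocks and writes are tracked in a
   dirty set.  All names here (Disk, bulk_copy, apply_delta) are
   assumptions of this sketch; real hypervisors track dirty blocks at a
   much lower level.

```python
from typing import Dict, Set


class Disk:
    """Toy block device that remembers which blocks were written."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.blocks = dict(blocks)
        self.dirty: Set[int] = set()

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.dirty.add(index)


def bulk_copy(source: Disk) -> Dict[int, bytes]:
    """Phase 1: copy the whole image while the VM keeps running."""
    source.dirty.clear()  # changes are tracked from this point on
    return dict(source.blocks)


def apply_delta(source: Disk, dest: Dict[int, bytes]) -> int:
    """Phase 2: after the move, send only blocks changed since phase 1."""
    delta = {index: source.blocks[index] for index in source.dirty}
    dest.update(delta)
    source.dirty.clear()
    return len(delta)  # number of blocks that had to be re-sent


src = Disk({0: b"boot", 1: b"data"})
dst = bulk_copy(src)
src.write(1, b"DATA")  # the VM writes while the bulk copy is under way
src.write(2, b"log")
sent = apply_delta(src, dst)
```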

9.  Handling of Hot, Warm and Cold Virtual Machine Mobility

   Cold Virtual Machine mobility is facilitated by the VM initially
   sending an ARP or Neighbor Discovery message which enables the
   correspondents to direct their communication to the destination NVE
   in the link.  A registration message needs to be sent to the source
   NVE because the messages from all other correspondents will be routed
   to the source NVE.  Previous source NVEs in the chain (if any) need
   not be informed of the move.  Cold VM mobility also allows all
   previous source NVEs to delete binding update list entries of the VM.

   The VMs that are used for cold standby receive scheduled backup
   information, but less frequently than they would with the warm
   standby option.  Therefore, the cold mobility option can be used for
   non-critical applications and services.

   With the warm standby option, the backup VMs receive backup
   information at regular intervals.  The duration of the interval
   determines the warmth of the standby option.  The longer the
   interval, the less warm (and hence the colder) the standby option
   becomes.

   With the hot standby option, the VMs in both primary and secondary
   domains have identical information and can provide services
   simultaneously, as in a load-share mode of operation.  If the VMs in
   the primary domain fail, there is no need to actively move the VMs to
   the secondary domain because the VMs in the secondary domain already
   contain identical information.  The hot standby option is the most
   costly mechanism for providing redundancy, and hence this option is
   utilized only for mission-critical applications and services.

10.  Virtual Machine Operation

   Virtual machines are not involved in any mobility signalling.  Once
   the VM moves to the destination hypervisor, its IP address does not
   change and the VM should be able to continue to receive packets sent
   to its address(es).  This happens in hot VM mobility scenarios.



   The virtual machine sends a gratuitous Address Resolution Protocol or
   unsolicited Neighbor Advertisement message upstream after each move.

10.1.  Virtual Machine Lifecycle Management

   Managing the lifecycle of a VM includes creating the VM with all of
   the required resources and managing them seamlessly as the VM
   migrates from one service to another during its lifetime.  The
   on-boarding process includes the following steps:

   1.  Sending an allowed (authorized/authenticated) request to the
       Network Virtualization Authority (NVA) in an acceptable format
       with mandatory/optional virtualized resources {cpu, memory,
       storage, process/thread support, etc.} and interface information.

   2.  Receiving an acknowledgement from the NVA regarding availability
       and usability of virtualized resources and interface package.

   3.  Sending a confirmation message to the NVA with a request for
       approval to adapt/adjust/modify the virtualized resources and
       interface package for utilization in a service.

11.  Security Considerations

   TBD.

12.  IANA Considerations

   This document makes no request to IANA.

13.  Acknowledgements

   The authors are grateful to Tom Herbert for his comments.

14.  References

14.1.  Normative References

   [RFC0826]  Plummer, D., "Ethernet Address Resolution Protocol: Or
              converting network protocol addresses to 48.bit Ethernet
              address for transmission on Ethernet hardware", STD 37,
              RFC 826, November 1982.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              June 1999.



   [RFC5213]  Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K.,
              and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008.

   [RFC5844]  Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy
              Mobile IPv6", RFC 5844, May 2010.

   [RFC3007]  Wellington, B., "Secure Domain Name System (DNS) Dynamic
              Update", RFC 3007, November 2000.

   [RFC3022]  Srisuresh, P. and K. Egevang, "Traditional IP Network
              Address Translator (Traditional NAT)", RFC 3022, January
              2001.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
              "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
              September 2007.

   [RFC6820]  Narten, T., Karir, M., and I. Foo, "Address Resolution
              Problems in Large Data Center Networks", RFC 6820, January
              2013.

   [I-D.ietf-nvo3-vm-mobility-issues]
              Rekhter, Y., Henderickx, W., Shekhar, R., Fang, L.,
              Dunbar, L., and A. Sajassi, "Network-related VM Mobility
              Issues", draft-ietf-nvo3-vm-mobility-issues-03 (work in
              progress), June 2014.

   [I-D.ietf-nvo3-arch]
              Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T.
              Narten, "An Architecture for Overlay Networks (NVO3)",
              draft-ietf-nvo3-arch-02 (work in progress), October 2014.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, August 2014.

   [RFC7364]  Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L.,
              and M. Napierala, "Problem Statement: Overlays for Network
              Virtualization", RFC 7364, October 2014.

   [RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
              Rekhter, "Framework for Data Center (DC) Network
              Virtualization", RFC 7365, October 2014.



14.2.  Informative references

   [I-D.ietf-nvo3-nve-nva-cp-req]
              Kreeger, L., Dutt, D., Narten, T., and D. Black, "Network
              Virtualization NVE to NVA Control Protocol Requirements",
              draft-ietf-nvo3-nve-nva-cp-req-03 (work in progress),
              October 2014.

   [I-D.wkumari-dcops-l3-vmmobility]
              Kumari, W. and J. Halpern, "Virtual Machine mobility in L3
              Networks.", draft-wkumari-dcops-l3-vmmobility-00 (work in
              progress), August 2011.

   [I-D.shima-clouds-net-portability-reqs-and-models]
              Shima, K., Sekiya, Y., and K. Horiba, "Network Portability
              Requirements and Models for Cloud Environment", draft-
              shima-clouds-net-portability-reqs-and-models-01 (work in
              progress), October 2011.

   [I-D.raggarwa-data-center-mobility]
              Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R.,
              Fang, L., and A. Sajassi, "Data Center Mobility based on
              E-VPN, BGP/MPLS IP VPN, IP Routing and NHRP", draft-
              raggarwa-data-center-mobility-07 (work in progress), June
              2014.

   [I-D.khasnabish-vmmi-problems]
              Khasnabish, B., Liu, B., Lei, B., and F. Wang, "Mobility
              and Interconnection of Virtual Machines and Virtual
              Network Elements", draft-khasnabish-vmmi-problems-03 (work
              in progress), December 2012.

Authors' Addresses

   Behcet Sarikaya
   Huawei USA
   5340 Legacy Dr. Building 3
   Plano, TX  75024

   Email: sarikaya@ieee.org


   Linda Dunbar
   Huawei USA
   5340 Legacy Dr. Building 3
   Plano, TX  75024

   Email: linda.dunbar@huawei.com



   Bhumip Khasnabish
   ZTE (TX) Inc.
   55 Madison Avenue, Suite 160
   Morristown, NJ  07960

   Email: vumip1@gmail.com, bhumip.khasnabish@ztetx.com


   Frank Xia
   Huawei USA
   Nanjing, China

   Phone: +1 972-509-5599
   Email: xiayangsong@huawei.com