Network Working Group                                          W. Kumari
Internet-Draft                                                    Google
Intended status: Informational                                J. Halpern
Expires: February 12, 2012                                      Ericsson
                                                         August 11, 2011

                Virtual Machine mobility in L3 Networks.

Abstract


   This document outlines how Virtual Machine mobility can be
   accomplished in datacenter networks that are based on L3
   technologies.  It is not really intended to solve (or fully define)
   the problem, but rather to outline it at a very high level to
   determine if standardization within the IETF makes sense.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 12, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Kumari & Halpern        Expires February 12, 2012               [Page 1]

Internet-Draft               L3 VM Mobility                  August 2011

Table of Contents

   1.  Author Notes  . . . . . . . . . . . . . . . . . . . . . . . . . 3
   2.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . 3
     2.1.  Requirements notation . . . . . . . . . . . . . . . . . . . 4
   3.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 4
   4.  Overview  . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
   5.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 7
   6.  Security Considerations . . . . . . . . . . . . . . . . . . . . 7
   7.  Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
   8.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . . 7
   9.  Normative References  . . . . . . . . . . . . . . . . . . . . . 8
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . . 8

1.  Author Notes

   [ RFC Editor -- Please remove this section before publication! ]

   1.  Fix terminology section!
   2.  Rejigger Introduction into Intro and Background!
   3.  Do we need to extend the mapping service to include
       (Customer_ID)?  This will allow the use of overlapping addresses
       by customers, but *does* limit the encapsulating technologies.
   4.  Currently I'm envisioning this as IP only.  It would be fairly
       trivial to make the query be for the MAC address instead of
       the IP.  This does lead to some interesting issues, like what do
       we do with broadcast, such as ARP?  Have the mapping server reply
       with all of the destinations and then have the source replicate
       the packet?!

2.  Introduction

   There are many ways to design and build a datacenter network (and
   the definition of what exactly constitutes a datacenter network is
   very vague!), but in general they can be separated into two main
   classes: Layer-2 based and Layer-3 based.

   A Layer-2 based datacenter is one in which the majority of the
   traffic is bridged (or switched) in a large, flat Layer-2 domain or
   number of Layer-2 domains.  VLANs are often employed to provide
   customer isolation.

   A Layer-3 based datacenter is one in which much of the communication
   between hosts is routed.  In this architecture there are a large
   number of separate Layer-3 domains (for example, one subnet per
   rack) and communication between hosts in different subnets is
   routed.  Communication between hosts in the same subnet is
   (obviously) bridged / switched.  While customer isolation can be
   provided through careful layout and access control lists, in general
   this architecture is better suited to a single (or small number of)
   tenants.

   This delineation is obviously a huge simplification as the design and
   build out of a datacenter has many dimensions and most real-world
   datacenters have properties of both Layer-2 and Layer-3.

   Virtual Machines are fast gaining popularity as they allow a
   datacenter operator to more fully leverage their hardware resources
   and, in essence, provide statistical multiplexing of compute
   resources.  By selling multiple VMs on a single physical machine they
   can maximise their investment, quickly allocate resources to
   customers and potentially move VMs to other hosts when needed.

   One of the factors driving the design of datacenters is the desire
   to provide Virtual Machine Mobility.  This allows an operator to
   move the guest machine state from one machine to another, including
   all of the network state, even keeping TCP connections alive.  This
   allows a datacenter operator to dynamically move guest machines
   around to better allocate resources and take devices offline for
   maintenance without negatively impacting customers.  VM Mobility
   can even be used to move running machines around to provide better
   latency - for example an instance can be moved from the East Coast
   of the USA to Australia and back on a daily basis to "follow the
   sun".

   In many cases VM Mobility requires that the source and destination
   host machines are on the same Layer-2 network, which has led to the
   formation of large Layer-2 networks containing thousands (or tens
   of thousands) of machines.  This has led to some scaling concerns,
   such as those being addressed in the ARMD Working Group.  Some
   operators are more comfortable running Layer-3 networks (and, to be
   honest, think that big Layer-2 networks are bad juju).

   This document outlines how VM Mobility can be designed to work in a
   datacenter (or across datacenters) that is broken up into multiple
   Layer-3 domains.

2.1.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [RFC2119].

3.  Terminology

   There is a whole industry built around these technologies, and as
   with many new industries each vendor has their own, unique
   nomenclature for the various parts.  This is the terminology that we
   are using within this document -- it may not line up with what others
   call something, but "A rose by any other name..."
   Guest network  A virtual network connecting guest instances owned by
      a customer.  This is also referred to as a Guest LAN or Customer
      LAN.
   Host Machine  A machine that "hosts" guest (virtual) machines and
      runs a Hypervisor.  This is usually a powerful server, using
      software and hardware (the Hypervisor) to provide isolation
      between the guest machines.  The host machine emulates all of
      the functions of a "normal" machine so that,
      ideally, the guest OS is unaware that it is not running on
      dedicated hardware.
   Gateway  A device that provides access to external networks.  It
      encapsulates and decapsulates traffic passing between the
      virtual network and destinations outside it.
   Guest Machine  A "Virtual Machine" that runs on a Host Machine.
   Hypervisor  A somewhat loose term that encompasses the hardware and
      software that provides isolation between guest machines and
      emulates all of the functions of a bare-metal server.  This
      usually
      includes such things as a virtual Network Interface Card (NIC), a
      virtual CPU (usually assisted by specialized hardware in the host
      machine's CPU), virtual memory, etc.
   Mapping Service  A service providing a mapping between guest machines
      and host machines on which those guests are running.  This mapping
      service also provides mappings to Gateways that provide
      connectivity to devices outside the customer networks.
   Virtual Machine  A synonym for Guest Machine.
   Virtual Switch  A virtualized bridge created by the Hypervisor,
      bridging the virtual NICs in the virtual machines and providing
      access to the physical network.

4.  Overview

   By providing a "shim" layer within the network stack provided by the
   Hypervisor (or Guest machine) we can create a virtual L2 network
   connecting the machines belonging to a customer, even if these
   machines are in different L3 networks (subnets).

   When an application on a virtual machine sends a packet to a receiver
   on another virtual machine, the operating system on the sending VM
   needs to resolve the hardware address of the destination IP address
   (using ARP in IPv4 or Neighbor Discovery / Neighbor Solicitation in
   IPv6).  To do this, it generates an ARP / NS packet and broadcasts /
   multicasts it.  As with all traffic sent by the VM, this is handed
   to a virtual network card, which is simulated by the hypervisor
   (yes, some VM technologies provide direct access to hardware; this
   will be further discussed later).  The hypervisor examines the
   packet to provide access control (and similar functions) and then
   discards, rewrites, or forwards the packet on the physical network.
   So far this describes the current operation of VM networking.
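   As an illustration, the classification step the hypervisor's
   virtual switch performs on outbound frames might look like the
   following sketch.  (Python is used purely for illustration; the
   field offsets follow the standard Ethernet and ARP layouts, and the
   function name is our own invention.)

```python
import struct

ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1

def extract_arp_target(frame: bytes):
    """If `frame` is an Ethernet ARP request, return the IPv4 address
    being resolved (as a dotted quad); otherwise return None.

    This mirrors the interception step described above: the virtual
    switch inspects each outbound frame and picks out ARP requests so
    the hypervisor can answer them via the mapping service."""
    if len(frame) < 42:                      # 14B Ethernet + 28B ARP
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_ARP:
        return None
    (oper,) = struct.unpack("!H", frame[20:22])
    if oper != ARP_REQUEST:
        return None
    tpa = frame[38:42]                       # ARP Target Protocol Address
    return ".".join(str(b) for b in tpa)
```

   A frame that is not an ARP request falls through unchanged; only
   requests are diverted into the resolution path described below.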

   In order to provide Layer-2 connectivity between a set of virtual
   machines that run on host machines in different IP subnets (for
   example, in a Layer-3 based datacenter, or even in datacenters
   owned and operated by different providers), we simply build an
   overlay network connecting the host machines.

   When the VM passes the ARP / NS packet to the virtual NIC, the
   hypervisor intercepts the packet, records which VM generated the
   request and extracts the IP address to be resolved.  It then queries
   a mapping server with the guest VM identifier and requested address
   to determine the IP address of the host machine that hosts the
   requested destination VM, the VM identifier on that host, and the
   virtual MAC assigned to that virtual machine.  Once the hypervisor
   on the source host receives this information it caches it, and
   either encapsulates the original ARP / NS in an encapsulation /
   tunneling mechanism (similar to GRE) or simply synthesizes a
   response and hands that back to the source VM.
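   The query-and-cache behaviour above can be sketched as follows.
   The class and field names are assumptions for illustration; in a
   real deployment the directory would be a remote mapping server
   rather than an in-process dict.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class Mapping:
    host_ip: str       # IP address of the host machine running the guest
    vm_id: int         # VM identifier on that host
    virtual_mac: str   # virtual MAC assigned to the guest

class MappingClient:
    """Hypervisor-side view of the mapping service, including the
    cache described in the text."""

    def __init__(self, directory: Dict[Tuple[str, str], Mapping]):
        self._directory = directory   # stand-in for the remote service
        self._cache: Dict[Tuple[str, str], Mapping] = {}

    def resolve(self, customer_id: str, dest_ip: str) -> Optional[Mapping]:
        key = (customer_id, dest_ip)
        if key not in self._cache:
            hit = self._directory.get(key)   # the "remote" query
            if hit is None:
                return None                  # unknown or not authorized
            self._cache[key] = hit
        return self._cache[key]
```

   Keying every lookup by (customer, address) is also what makes the
   per-customer isolation and overlapping-address properties discussed
   below fall out naturally.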

   Presumably the source VM initiated resolution of the destination VM
   because it wanted to send traffic to it, so shortly after the source
   has resolved the destination it will try to send a data packet to
   it.  Once this data packet reaches the hypervisor on the source host
   machine, the hypervisor simply encapsulates the packet in a
   tunneling protocol and ships it over the IP network to the
   destination.  When the packet reaches the destination host machine,
   the packet is decapsulated, the VM ID is extracted and the packet is
   passed up to the destination VM.  (TODO (WK): We need a tunneling
   mechanism that has a place to put the VM ID -- find one, extend one
   or simply define a new one).  In many ways much of this is similar
   to LISP...
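   On the TODO above: one existing candidate is keyed GRE (RFC 2890),
   whose 32-bit Key field has room for a VM ID.  A sketch of the
   header handling follows; the field layout is standard GRE, while
   using the Key to carry a VM ID is our assumption, not something the
   draft specifies.

```python
import struct

GRE_KEY_PRESENT = 0x2000   # the K bit in the GRE flags/version field
ETH_BRIDGING = 0x6558      # protocol type: Transparent Ethernet Bridging

def gre_encap(vm_id: int, inner_frame: bytes) -> bytes:
    """Prepend a keyed GRE header (RFC 2890) whose Key carries vm_id.
    The outer IP header is left to the host's network stack."""
    return struct.pack("!HHI", GRE_KEY_PRESENT, ETH_BRIDGING, vm_id) \
        + inner_frame

def gre_decap(packet: bytes):
    """Return (vm_id, inner_frame) for a keyed Ethernet-in-GRE packet."""
    flags, proto, vm_id = struct.unpack("!HHI", packet[:8])
    if not flags & GRE_KEY_PRESENT or proto != ETH_BRIDGING:
        raise ValueError("not a keyed Ethernet-in-GRE packet")
    return vm_id, packet[8:]
```

   The receiving hypervisor would use the recovered vm_id to select
   which local guest gets the inner frame.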

   As the ability to resolve (and so send traffic to) a given machine
   requires getting the information from a mapping server, communication
   between hosts can be easily granted and revoked by the mapping
   server.  It is expected that the mapping server will know which VMs
   are owned by each customer and will, by default, allow access between
   only those VMs (and a gateway, see below), but if the operator so
   chooses it can (but probably shouldn't!) allow access between VMs
   owned by different customers, etc.  In addition, because the mapping
   server uses both the IP address and VM ID to look up the destination
   information (and the traffic between VMs is encapsulated),
   overlapping customer address space is seamlessly handled (other
   than in the pathological case where operators allow the customers
   to interconnect at L2!).

   Obviously just having a bunch of customer machines communicating
   just amongst themselves isn't very useful - the customer will want
   to reach them externally, they will be serving traffic to / from
   the Internet, etc.  This functionality is provided by gateway
   machines - these machines decapsulate traffic that is destined to
   locations outside the virtual network and encapsulate traffic bound
   for the destination network, etc.

   By encapsulating the packet (for example in a GRE packet) the
   Hypervisor can provide a virtual, transparent network to the
   receiver.  In order to obtain the necessary information to
   encapsulate the packet (for example, the IP address of the machine
   hosting the receiving VM) the sending Hypervisor queries the Mapping
   Service.  This service maps the tuple of (Customer_ID, Destination
   Address) to the host machine hosting the instance.

   For example, if guest machine GA, owned by customer CA on host
   machine HX wishes to send a packet to guest machine GB (also owned by
   customer CA) on host machine HY it would generate an ARP request (or,
   in IPv6 land, a neighbor solicitation) for GB.  The Hypervisor
   process on HX would intercept the ARP, and query the Mapping Service
   for (CA, GB), which would reply with the address of HY (Hypervisors
   also cache this information).  The Hypervisor on HX would then
   encapsulate the ARP request packet in a GRE packet, setting the
   destination to be HY and sending the packet.  When the Hypervisor
   process on HY receives the packet it would decapsulate the packet and
   hand it to the guest instance GB.  This process is transparent to GA
   and GB - as far as they are concerned, they are both connected to a
   single network.  While the above might sound like a heavyweight
   operation, the hypervisor is (in general) already examining all
   packets to provide a virtualized switch, performing access control
   functions and similar - performing the mapping functionality and
   encapsulation / decapsulation is not expected to be expensive.

   The Mapping Service contains information about all of the guest
   machines, which customer they are associated with, and routes to
   external networks.  If a guest machine sends a packet that is
   destined to an external network (such as a host on the Internet), the
   mapping server returns the address of a Gateway.
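   The gateway fallback can be sketched as a per-customer guest table
   with a default entry.  All names here are illustrative, not part of
   the draft; a real mapping service would also track VM IDs and
   virtual MACs as described earlier.

```python
class MappingServer:
    """Server-side sketch: registered guests resolve to the host
    carrying them; any other destination resolves to the gateway."""

    def __init__(self, gateway_host: str):
        self._guests = {}            # (customer_id, guest_ip) -> host_ip
        self._gateway = gateway_host

    def register(self, customer_id: str, guest_ip: str, host_ip: str):
        """Record (or update, after a VM move) where a guest lives."""
        self._guests[(customer_id, guest_ip)] = host_ip

    def lookup(self, customer_id: str, dest_ip: str) -> str:
        # A known guest of this customer -> the host carrying it;
        # anything else (e.g. an Internet host) -> the gateway.
        return self._guests.get((customer_id, dest_ip), self._gateway)
```

   Note that VM mobility itself then reduces to updating the
   registration for the moved guest and letting cached entries on the
   hypervisors be refreshed or invalidated.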

5.  IANA Considerations

   No action required.

6.  Security Considerations

7.  Privacy


8.  Acknowledgements

   I would like to thank Google for 20% time.

9.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

Authors' Addresses

   Warren Kumari
   1600 Amphitheatre Parkway
   Mountain View, CA  94043


   Joel M. Halpern

