Network Working Group                                           F. Baker
Internet-Draft                                                 C. Marino
Intended status: Informational                                  I. Wells
Expires: April 20, 2015                                    Cisco Systems
                                                        October 17, 2014


                A Model for IPv6 Operation in OpenStack
                  draft-baker-openstack-ipv6-model-00

Abstract

   This is an overview of a network model for OpenStack, designed to
   dramatically simplify scalable network deployment and operations.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 20, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  What is OpenStack?
     1.2.  OpenStack Scaling Issues
   2.  Requirements
     2.1.  Design approach
     2.2.  Requirements Language
     2.3.  Multiple Data Centers
     2.4.  Large Data Centers
     2.5.  Multi-tenancy
     2.6.  Isolation
       2.6.1.  Inter-tenant isolation
       2.6.2.  Intra-tenant isolation
     2.7.  Operational Simplicity
     2.8.  Address space
     2.9.  Data Center Federation
   3.  Models
     3.1.  Configuration Model
     3.2.  Data Center Model
       3.2.1.  Tenant Address Model
       3.2.2.  Use of Global Addresses by the Data Center
     3.3.  Inter-tenant security services
       3.3.1.  Label Definition
       3.3.2.  IPv6 Tenant Isolation using the Label
       3.3.3.  Isolation in Routing
     3.4.  BCP 38 Ingress Filtering
     3.5.  Moving virtual machines
       3.5.1.  Recreation of the VM
       3.5.2.  Live Migration of a Running Virtual Machine
   4.  OpenStack implications
     4.1.  Configuration implications
     4.2.  vSwitch implications
   5.  IANA Considerations
   6.  Security Considerations
   7.  Privacy Considerations
   8.  Acknowledgements
   9.  Contributors
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Change Log
   Authors' Addresses

1.  Introduction

   This section describes OpenStack and the networking issues that
   motivate this document.

1.1.  What is OpenStack?

   OpenStack is a cloud computing orchestration solution developed using
   an open source community process.  It consists of a collection of
   'projects', each implementing the creation, control and
   administration of tenant resources.  There are separate OpenStack
   projects for managing compute, storage and network resources.

   Neutron is the project that manages OpenStack networking.  It exposes
   a northbound API to the other OpenStack projects for programmatic
   control over tenant network connectivity.  The southbound interface
   is implemented as one or more device driver plugins that are built to
   interact with specific devices in the network.  This approach
   provides the flexibility to deploy OpenStack networking using a range
   of alternative techniques.

   An OpenStack tenant is required to create what OpenStack identifies
   as a 'Network' connecting their virtual machines.  This Network is
   instantiated via the plugins as either a layer 2 network, a layer 3
   network, or as an overlay network.  The actual implementation is
   unknown to the tenant.  The technology used to provide these networks
   is selected by the OpenStack operator based upon the requirements of
   the cloud deployment.

   The tenant is also required to specify a 'Subnet' for each Network,
   by providing a CIDR prefix used for IPv4 address allocation via DHCP
   and for IPv6 address allocation via DHCP or SLAAC.  This prefix may
   be drawn from the datacenter's own (non-overlapping) address space,
   or may be an overlapping RFC 1918 range.  Tenants may create
   multiple Networks, each with its own Subnet.
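
   As a concrete illustration, the following sketch creates a Network
   and an IPv6 Subnet using the python-neutronclient library.  The
   credentials, endpoint URL, and prefix are placeholders, and the
   exact parameters depend on the OpenStack release in use.

      from neutronclient.v2_0 import client

      # Placeholder credentials and endpoint; not a working deployment.
      neutron = client.Client(username="demo", password="secret",
                              tenant_name="demo",
                              auth_url="http://controller:5000/v2.0")

      # Create a tenant Network, then attach a Subnet to it.  The CIDR
      # prefix is the tenant-specified range discussed above.
      net = neutron.create_network({"network": {"name": "app-net"}})
      neutron.create_subnet({"subnet": {
          "network_id": net["network"]["id"],
          "ip_version": 6,
          "cidr": "2001:db8:1::/64",      # documentation prefix
          "ipv6_address_mode": "slaac",   # or DHCPv6
      }})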

   An OpenStack Subnet is a logical layer 2 network and requires layer 3
   routing for packets to exit the Subnet.  This is achieved by
   attaching the Subnet to a Neutron Router.  The Neutron router
   implements Network Address Translation for external traffic from
   tenant networks, as well as for providing connectivity to tenant
   networks from the outside.  Using Linux utilities, OpenStack can
   support
   overlapping RFC 1918 addresses between tenants.

   OpenStack Subnets are typically implemented as VLANs in a datacenter.
   When tenant scalability requirements grow large, an overlay approach
   is typically used.  Because of the difficulties in scaling and
   administering large layer 2 and/or overlay networks, some OpenStack
   integrations choose not to provide isolated Subnets and simply offer
   tenants a layer 3 based network alternative.

   OpenStack uses Layer 3 and Layer 2 Linux utilities on hosts to
   provide protection against IP/MAC spoofing and ARP poisoning.

1.2.  OpenStack Scaling Issues

   One of the fundamental requirements of OpenStack Networking (Neutron)
   is to provide scalable, isolated tenant networks.  Today this is
   achieved via L2 segmentation, using either a) standard 802.1Q VLANs,
   possibly stacked using IEEE 802.1ad (QinQ), or b) an overlay approach
   based on one of several L2-over-L3 encapsulation techniques available
   today, such as VXLAN, STT, or NVGRE.

   However, these approaches still struggle to provide scalable,
   transparent, manageable, high-performing isolated tenant networks.
   VLANs don't scale beyond 4096 (2^12) networks and have complex
   trunking requirements when tenants span hosts and racks.  IEEE
   802.1ad (QinQ) partially solves that, but adds another limit: at
   most 2^12 tenants, each of which may have 2^12 VLANs.  IP
   encapsulation introduces additional complexity on host computers
   running hypervisors, and impacts the performance of tenant
   applications running on virtual machines.  Overlay-based isolation
   techniques may
   also impair traditional network monitoring and performance management
   tools.  Moreover, when these isolated (L2) networks require external
   access to other networks or the public Internet, they require even
   more complex solutions to accommodate overlapping IP prefixes and
   network address translation (NAT).

   As more capabilities are built onto these layer 2 based 'virtual'
   networks, complexity continues to grow.

   This draft presents a new Layer 3 based approach to OpenStack
   networking using IPv6 that can be deployed natively on IPv6 networks.
   It will be shown that this approach can provide tenant isolation
   without the limitations of existing L2 based alternatives, as well as
   deliver high performance networks transparently using a simplified
   tenant network connectivity model without the overhead of
   encapsulation or managing overlapping IP addresses and address
   translations.  We note that some large content providers, notably
   Google and Facebook [FaceBook-IPv6], are going in exactly this
   direction.

2.  Requirements

   In this section, we attempt to list critical requirements.

2.1.  Design approach

   As a design approach, we presume an IPv6-only data center in a world
   that might have IPv4 clients outside of it.  This design explicitly
   does not depend on VLANs, QinQ, VXLAN, MPLS, Segment Routing, IP/IP
   or GRE tunnels, or anything else.  Data center operators remain free
   to use any of those tools, but they are not required.  If we can do
   everything required for OpenStack networking with IPv6 alone, these
   other networking technologies may be used as optimizations.  If we
   are unable to satisfy the OpenStack requirements, that too is
   something we wish to know and understand.

   OpenStack is designed to be used by many cloud users or tenants.
   Scalable, secure and isolated tenant networks are a requirement for
   building a multi-tenant cloud datacenter.  The OpenStack
   administrator/operator can design and configure a cloud environment
   to provide network isolation using the approach described in this
   document, alone or in combination with any of the above network
   technologies.  In either case, the underlying technology and
   implementation details are completely transparent to the tenant
   itself.

2.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.3.  Multiple Data Centers

   A common requirement in network and data center operations is
   reliability, serviceability, and maintainability of their operations
   in the presence of an outage.  At minimum, this implies multihoming
   in the sense of having multiple upstream ISPs; in many cases, it also
   implies multiple and at times duplicate data centers, and tenants
   stretched or able to be readily moved or recreated across multiple
   data centers.

2.4.  Large Data Centers

   Microsoft Azure [Microsoft-Azure] has purchased a 100 acre piece of
   land for the construction of a single data center.  In terms of
   physical space, that is enough for a data center with about half a
   million 19" RETMA racks.

   With even modest virtual machine density, infrastructure at this
   scale easily exhausts the 16M available RFC 1918 private addresses
   (i.e. 10.0.0.0/8), which explains the recent efforts by webscale
   cloud providers to deploy IPv6 throughout their new datacenters.
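
   As a rough sanity check on these figures (the rack footprint is an
   assumed value), consider:

      ACRE_M2 = 4046.86              # square metres per acre
      RACK_M2 = 0.60 * 1.07          # assumed 19" rack footprint
      print(int(100 * ACRE_M2 / RACK_M2))   # ~630,000 before aisles;
                                            # roughly half a million
                                            # once aisle space is
                                            # subtracted
      print(2 ** 24)                 # 16,777,216 addresses in 10/8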

2.5.  Multi-tenancy

   While it is possible that a single tenant would require a 100 acre
   data center, it would be unusual.  In most such data centers, one
   would expect a large number of tenants.

2.6.  Isolation

   Isolation is required between tenants, and at times between
   components of a single tenant.

2.6.1.  Inter-tenant isolation

   A "tenant" is defined as a set of resources under common
   administrative control.  It may be appropriate for tenants to
   communicate with each other within the context of an application or
   relationships among their owners or operators.  However, unless
   specified otherwise, tenants are intended to operate as if they were
   on their own company's premises and be isolated from one another.

2.6.2.  Intra-tenant isolation

   There are often security compartments within a corporate network,
   just as there are security barriers between companies.  As a result,
   there is a recursive isolation requirement: it must be possible to
   isolate an identified part of a tenant from another part of the same
   tenant.

2.7.  Operational Simplicity

   To the extent possible (and, for operators, the concept will bring a
   smile), operation of a multi-location multi-tenant data center, and
   the design of an application that runs in one, should be simple and
   uncoupled.

   As discussed in [RFC3439], this requires that the operational model
   required to support a tenant with only two physical machines, or
   virtual machines in the same physical chassis, should be the same as
   that required to support a tenant running a million machines in a
   multiple data center application.  Additionally, this same
   operational model should scale from running a single tenant up to
   many thousands of tenants.

2.8.  Address space

   As described in Section 1.1, currently, an OpenStack tenant is
   required to specify a Subnet's CIDR prefix for IP address allocation.
   With this proposal, this is no longer required.

2.9.  Data Center Federation

   It must be possible to extend the architecture across multiple data
   centers.  These data centers may be operated by distinct entities,
   with security policies that apply to their interconnection.

3.  Models

3.1.  Configuration Model

   In the OpenStack model, the cloud computing user, or tenant, is
   building something Edward Yourdon might call a "structured design"
   for the application they are building.  In the 1960's, when Yourdon
   started specifying process and data flow diagrams, these were job
   steps in a deck of Job Control Language cards; in OpenStack, they are
   multiple, individual machines, virtual or physical, running parts of
   a structured application.

   In these, one might find a load balancer that receives and
   distributes requests to request processors, a set of stored data
   processing applications, and the storage they depend on.  What is
   important to the OpenStack tenant is that "this" and "that"
   communicate, potentially using or not using multicast communications,
   and don't communicate with "the other".  The tenant typically needs
   no information about how this communication actually occurs (i.e.,
   the placement of routers, switches, IP subnets, prefixes, etc.).

   An IPv6 based networking model simplifies the configuration of tenant
   connectivity requirements.  Global reachability eliminates the need
   for network address translation devices as well as tenant-specified
   Subnet prefixes (Section 2.8), although tenant-specified ULA prefixes
   or prefixes from the owner of the tenant's address space are usable
   with it.  With the exception of network security functions, no
   network devices need to be specified or configured to provide
   connectivity.

3.2.  Data Center Model

   The premises of the routing and addressing models are that

   o  The address tells the routing system what topological location to
      deliver a packet to, and within that, what interface to deliver it
      to, and

   o  The routing system should deliver traffic to a resource if and
      only if the sender is authorized to communicate with that
      resource.

   o  Contrary to the OpenStack Neutron Networking Model, tunnels are
      not necessary to provide tenant network isolation; we include
      resources in a tenant network by a Role-based Access Control
      model, but address the tenant resources within the data center in
      a manner that scales for the data center.

   We expect the data center to be composed of some minimal unit of
   connectivity and maintenance, such as a rack or row, equipped with
   one or more Top-of-Rack or End-of-Row switches, each configured with
   at least one subnet prefix, perhaps one per such switch.  For the
   purposes of this note, these will be called Racks and Top-of-Rack
   switches; when this model is applied to other architectures, the
   appropriate translation should be made.

   Figure 1 describes a relatively typical rack design.  It is a simple
   fat-tree architecture, with every device in a pair, so that any
   failure has an immediate hot backup.  There are other common designs,
   such as those that consider each rack to be in a "row" and in a
   "column", with a distribution switch in each.

                    Distribution   Switches connecting
                    / Layer /      racks in a pod, and
                   /       /       connecting pods
                  /       /
               +-+-+   +-+-+       Mutual backup TOR
             +-+TOR+---+TOR+-+     switches
             | +---+   +---+ |
             | +-----------+ |
             +-+    host   +-+     Each host has two
             | +-----------+ |     Ethernet interfaces
             +-+    host   +-+     with separate subnets
             | +-----------+ |
             | .           . |
             | .           . |
             | .           . |     Design premise: complete
             | +-----------+ |     redundancy, with every
             +-+    host   +-+     switch and every cable
             | +-----------+ |     backed up by a doppelganger
             +-+    host   +-+
               +-----------+

                       Figure 1: Typical Rack Design

3.2.1.  Tenant Address Model

   Tenant resources need to be told, by configuration or naming, the
   addresses of resources they communicate with.  This is true
   regardless of their location or relationship to a given tenant.  In
   environments with well-known addresses, this gets complex and
   unscalable.  This was learned very early with Internet hostnames; a
   single "hostfile" was maintained by a central entity and updated
   daily, which quickly became unmanageable.  The result was the
   development of the Domain Name System; the level of indirection
   between names and addresses made the system more scalable.  It also
   facilitated ongoing maintenance.  If a service needed multiple
   servers, or a server needed to change its address, that was trivially
   solved by changing the DNS Resource Record; every resource that
   needed the new address would obtain it the next time it queried the
   DNS.  It has also facilitated the IPv4/IPv6 transition; a resource
   that has an IPv6 address is given a AAAA record in addition to, or to
   replace, its IPv4 A record.
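
   In practice, a resource exploits this indirection simply by
   resolving a name at connection time; the following minimal Python
   sketch (the hostname is a placeholder) retrieves whatever AAAA
   records the DNS currently publishes.

      import socket

      # Resolve a service name to its current IPv6 addresses; if the
      # operator later changes the AAAA record, the next lookup simply
      # returns the new address.
      for *_, sockaddr in socket.getaddrinfo("service.example.net", 443,
                                             family=socket.AF_INET6,
                                             type=socket.SOCK_STREAM):
          print(sockaddr[0])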

   Similarly, today's reliance on NAPT technology frequently limits the
   capabilities of an application.  It works reasonably well for a
   client accessing a client/server application when the protocol does
   not carry addressing information.  If there is an expectation that
   one resource's private address will be meaningful to a peer, such as
   when a SIP client presents its address in SDP or an HTTP server
   presents an address in a redirection, either the resource needs to
   understand the difference between an "inside" and an "outside"
   address and know which is which, or it needs a traversal algorithm
   that changes the addresses.  For peer-to-peer applications, this
   ultimately means providing a network design in which those issues
   don't apply.

   IPv6 provides global addresses, enough of them that there is no real
   expectation of running out any time soon, making these issues go
   away.  In addition, with the IPv4 address space running out, both
   globally and within today's large datacenters, there aren't
   necessarily addresses available for an IPv4 application to use, even
   as a floating IP address.

   Hence, the model we propose is that a resource in a tenant is told
   the addresses of the other resources with which it communicates.
   They are IPv6 addresses, and the data center takes care to ensure
   that inappropriate communications do not take place.

3.2.1.1.  Use of Global Unicast Addresses by Tenants

   A unicast address in an IP network identifies a topological location,
   by association with an IP prefix (which might be for a subnet or any
   aggregate of subnets).  It also identifies a single interface located
   within that subnet, which may or may not be instantiated at the time.
   We assume that there is a subnet associated with a top-of-rack switch
   or whatever its counterpart would be in a given network design, and
   that the physical and virtual machines located in that rack have
   addresses in that subnet.  This is the same prefix that is used by
   the datacenter administrator.

3.2.1.2.  Unique Local Addresses

   A common requirement is that tenants have the use of some form of
   private address space.  In an IPv6 network, a Unique Local IPv6
   Unicast Address [RFC4193] may be used to accomplish this.  In this
   case, however, the addresses will need to be explicitly assigned to
   the physical or virtual machines used by the tenant, perhaps using
   DHCP or YANG, whereas a standard IPv6 address could be allocated
   using SLAAC or other technologies.

   In this model, global unicast addresses (Section 2.8) are the
   primary case; ULAs are a corner case in the data center.  Their
   value is that tenants have no routing information or other awareness
   of the prefix.  This is not intended for use behind a NAPT;
   resources that need accessibility to or from resources outside the
   tenant, and especially outside the data center, need global
   addresses.


3.2.1.3.  Multicast Domains

   Multicast is a capability enjoyed by some groups of resources: they
   can send a single message and have it delivered to multiple
   destinations roughly simultaneously.  At the link layer, this means
   sending a message once and having it received by a specified set of
   recipient resources using hardware capabilities.  IP multicast can
   be implemented on a LAN as specified in [RFC4291], and can also
   cross multiple subnets directly, using routing protocols such as
   Protocol Independent Multicast [RFC4601] [RFC4602] [RFC4604]
   [RFC4605] [RFC4607].  In IPv6, the model would be that when a group
   of resources is created with a multicast capability, it is allocated
   one or more source-specific transient group addresses as defined in
   Section 2.7 of [RFC4291].

3.2.1.4.  IPv4 Interaction Model

   OpenStack IPv4 Neutron uses "floating IPv4 addresses" - global or
   public IPv4 addresses - to enable remote resources to connect to
   tenant private network endpoints.  Tenant endpoints can connect out
   to remote resources through an "External Default Gateway".  Both of
   these depend on NAPT (DNAT/SNAT) to ensure that IPv4 endpoints are
   able to communicate while at the same time ensuring tenant
   isolation.

   If IPv6 is deployed in a data center, there are fundamentally two
   ways a tenant can interact with IPv4 peers:

   o  it can run existing IPv4 OpenStack technology in parallel with the
      IPv6 deployment, or

   o  It can have a translator at the data center edge (such as
      described in [I-D.anderson-v6ops-siit-dc]) that associates an IPv4
      address or address plus port with an IPv6 address or address plus
      port.  The IPv4 address, in this model, becomes a floating IPv4
      address attached to an internal IPv6 address.  The "data center
      edge" is, by definition, a system that has IPv4 reachability to at
      least the data center's upstream ISP and all IPv4 systems in the
      data center, IPv6 connectivity to all of the IPv6 systems in the
      data center, and (if the upstream offers IPv6 service) IPv6
      connectivity to the upstream as well.

   The first model is complex, if for no other reason than that there
   are two fundamental models in use, one with various encapsulations
   hiding overlapping address space and one with non-overlapping address
   space.

   To simplify the network, as noted in Section 2.1, we suggest that the
   data center be internally IPv6-only, and IPv4 be translated to IPv6
   at the data center edge.  The advantage is that it enables IPv4
   access while that remains in use, and as IPv6 takes over, it reduces
   the impact of vestigial support for IPv4.

   The SIIT translation model in [I-D.anderson-v6ops-siit-dc] has IPv4
   traffic come to a translator [RFC6145][RFC6146] having a pre-
   configured translation, resulting in an IPv6 packet
   indistinguishable from the packet the remote resource might have
   sent had it been IPv6-capable, with one exception.  The IPv6
   destination address is that of the endpoint (the same address
   advertised in a AAAA record), but the source address is an
   IPv4-Embedded IPv6 Address [RFC6052] with the IPv4 address of the
   sender embedded in a prefix used by the translator.

   Access to external IPv4 resources is provided in the same way: a
   DNS64 [RFC6147] server is implemented that returns AAAA records
   containing an IPv4-Embedded IPv6 Address [RFC6052] with the IPv4
   address of the remote resource embedded in a prefix used by the
   translator.

   This follows the Framework for IPv4/IPv6 Translation [RFC6144],
   making the internal IPv4 address a floating IP address attached to an
   internal IPv6 address, and the external "dial-out" address
   indistinguishable from a native IPv6 address.
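
   The address arithmetic of [RFC6052] is mechanical; as a sketch, the
   following Python computes an IPv4-embedded IPv6 address for the /96
   case, here using the well-known prefix 64:ff9b::/96.

      import ipaddress

      def embed_ipv4(prefix96, v4):
          # RFC 6052, /96 case: the IPv4 address forms the low 32 bits.
          net = ipaddress.IPv6Network(prefix96)
          assert net.prefixlen == 96
          return net[int(ipaddress.IPv4Address(v4))]

      # 192.0.2.1 as seen through the well-known prefix:
      print(embed_ipv4("64:ff9b::/96", "192.0.2.1"))
      # -> 64:ff9b::c000:201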

3.2.1.5.  Legacy IPv4 OpenStack

   The other possible model, applicable to IPv4-only devices, is to run
   a legacy OpenStack environment inside IPv6 tunnels.  This keeps the
   data center IPv6-only, and enables IPv4-only applications, notably
   those whose licenses tie them to IPv4 addresses, to run.  However,
   it adds significant overhead in terms of encapsulation size and
   network management complexity.

3.2.2.  Use of Global Addresses by the Data Center

   Every rack and physical host requires an IP prefix that is reachable
   by the OpenStack operator.  This will normally be a global IPv6
   unicast address.  For scalability purposes, as isolation is handled
   separately, this is normally the same prefix as is used by tenants in
   the rack.

3.3.  Inter-tenant security services

   In this model, a label is used to identify a project/tenant or part
   of a project/tenant, and to facilitate access control based on the
   label value.  In Section 1, we noted the limitation of IEEE 802.1ad
   QinQ: in Metro Ethernet networks it assigns one VLAN ID to a
   customer and a second VLAN ID to a VLAN used by that customer, and
   is in that sense limited to 2^12 customers.  Alternatively, it could
   be considered to support 2^12 geographies, with 2^12 tenant VLANs in
   each, meaning that the data center operator has to think about
   placing tenant VMs where their VLANs reach.  Labels manage that
   space with different limits.

3.3.1.  Label Definition

   Three different types of labels are in view here: the IPv6 Flow
   Label, a federated identity in the IPv6 Destination Header, or a
   federated identity in the IPv6 Hop-by-Hop Header.  These have
   different capabilities and implications.

3.3.1.1.  Flow Label

   The IPv6 flow label may be used to identify a tenant or part of a
   tenant, and to facilitate access control based on the flow label
   value.  The flow label is a flat 20 bits, facilitating the
   designation of 2^20 (1,048,576) tenants without regard to their
   location. 1,048,576 is less than infinity, but compared to current
   data centers is large, and much simpler to manage.

   Note that this usage differs from the current IPv6 Flow Label
   Specification [RFC6437].  It also differs from the use of a flow
   label recommended by the IPv6 Specification [RFC2460], and the
   respective usages of the flow label in the Resource ReSerVation
   Protocol [RFC2205] and the previous IPv6 Flow Label Specification
   [RFC3697], and the projected usage in Low-Power and Lossy Networks
   [RFC5548][RFC5673].  Within a target domain, the usage may be
   specified by the domain.  That is the viewpoint taken in this
   specification.

3.3.1.2.  Federated Identity

   Alternatively, [I-D.baker-openstack-rbac-federated-identity] defines
   a numeric label usable in network layer Role-based Access Control.
   The syntax is a sequence of positive integers; the semantics of the
   integers are defined by the administration(s) using them.  A single
   integer may be used in the same way as the Flow Label in
   Section 3.3.1.1, but without the 20 bit limitation.  A pair of
   integers might, for example, signify a data center and its tenants,
   or a company and its departments.  Three integers might, again as an
   example, signify one data center operator among a set of data center
   operators, the second the clients of those operators, and the third
   subsets of those clients.  In any event, as in Section 3.3.1.1, it
   identifies a set of machines, physical or virtual, that are
   authorized to communicate freely among themselves, but may or may not
   be authorized to communicate with other equipment.

   Carried in the Destination Options Extension Header, this option is
   visible to Neutron in the originating and terminating chassis.  This
   limits overheads in intermediate switches, and enables filtering in
   the destination system.

   Carried in the Hop-by-Hop Extension Header, this option is visible to
   routers en route in addition to Neutron in the originating and
   terminating chassis, at the possible cost of some processing
   overhead.  This facilitates filtering at any system.

3.3.2.  IPv6 Tenant Isolation using the Label

   Neutron today already implements a form of Network Ingress Filtering
   [RFC2827].  It prevents the VM from emitting traffic with an
   unauthorized MAC, IPv4, or IPv6 source address.

   In addition to this, in this model Neutron prevents the VM from
   transmitting a network packet with an unauthorized label value.  The
   VM MAY be configured with, and authorized to use, one of a short
   list of authorized label values, as opposed to simply having its
   choice overridden; in that case, Neutron verifies the value and
   overwrites any value not in the list.

   When a hypervisor is about to deliver an IPv6 packet to a VM, it
   checks the label value against a list of values that the VM is
   permitted to receive.  If it contains an unauthorized value, the
   hypervisor discards the packet rather than deliver it.  If the Flow
   Label is in use, Neutron zeros the label prior to delivery.

   The intention is to hide the label value from malware potentially
   found in the VM, and enable the label to be used as a form of first
   and last hop security.  This provides basic tenant isolation, if the
   label is assigned as a tenant identifier, and may be used more
   creatively such as to identify a network management application as
   separate from a managed resource.
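
   The following Python sketch illustrates the transmit and receive
   checks just described, assuming the label is carried in the IPv6
   Flow Label (Section 3.3.1.1); the policy tables and packet handling
   are illustrative, not Neutron code.

      FLOW_LABEL_MASK = (1 << 20) - 1   # low 20 bits of first word

      def get_label(pkt):
          return int.from_bytes(pkt[:4], "big") & FLOW_LABEL_MASK

      def set_label(pkt, label):
          word = int.from_bytes(pkt[:4], "big")
          word = (word & ~FLOW_LABEL_MASK) | label
          pkt[:4] = word.to_bytes(4, "big")

      def on_transmit(pkt, allowed, default):
          # Egress: overwrite any label the VM may not use.
          if get_label(pkt) not in allowed:
              set_label(pkt, default)

      def on_receive(pkt, allowed):
          # Ingress: discard packets bearing an unauthorized label,
          # and zero the label to hide it from the VM.
          if get_label(pkt) not in allowed:
              return False
          set_label(pkt, 0)
          return True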

3.3.3.  Isolation in Routing

   This concept has the weakness that if a packet is not dropped at its
   source, it is dropped at its destination.  It would be preferable for
   the packet to be dropped in flight, such as at the top-of-rack switch
   or an aggregation router.

   Concepts discussed in IS-IS LSP Extendibility
   [I-D.baker-ipv6-isis-dst-flowlabel-routing][RFC5120][RFC5308] and
   OSPFv3 LSA Extendibility [I-D.baker-ipv6-ospf-dst-flowlabel-routing]
   [I-D.ietf-ospf-ospfv3-lsa-extend][RFC5340] may be used to isolate
   tenants in the routing of the data center backbone.  This is not
   strictly necessary, if Section 3.3.2 is uniformly and correctly
   implemented.  It does, however, present a second defense against
   misconfiguration, as the filter becomes ubiquitous in the data
   center and as scalable as routing.

3.4.  BCP 38 Ingress Filtering

   As noted in Section 3.3.2, Neutron today implements a form of Network
   Ingress Filtering [RFC2827].  It prevents the VM from emitting
   traffic with an unauthorized MAC, IPv4, or IPv6 source address.

   In IPv6, this is readily handled when the address or addresses used
   by a VM are selected by the OpenStack operator.  It may then
   configure a per-VM filter with the addresses it has chosen, following
   logic similar to the Source Address Validation Solution for DHCP
   [I-D.ietf-savi-dhcp] or SEND [RFC7219].  This is also true of IPv6
   Stateless Address Autoconfiguration (SLAAC) [RFC4862] when the MAC
   address is known and not shared.

   However, when SLAAC is in use and either the MAC address is unknown
   or SLAAC's Privacy Extensions [RFC4941][RFC7217], are in use, Neutron
   will need to implement the provisions of FCFS SAVI: First-Come,
   First-Served Source Address Validation [RFC6620] in order to learn
   the addresses that a VM is using and include them in the per-VM
   filter.
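
   A greatly simplified sketch of the first-come, first-served idea
   follows; [RFC6620] itself specifies Duplicate Address Detection
   snooping and a full state machine, none of which is shown here, and
   the table layout is purely illustrative.

      # Map each observed source address to the first VM port that
      # used it; later claimants of the same address are rejected.
      bindings = {}

      def source_permitted(port, src):
          owner = bindings.setdefault(src, port)  # first use wins
          return owner == port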

3.5.  Moving virtual machines

   Some tenant network requirements call for an actual layer 2 network.
   This design supports these kinds of layer 2 networks with the
   additional use of a layer 2 over layer 3 encapsulation and tunneling
   protocol, such as VXLAN [I-D.mahalingam-dutt-dcops-vxlan].  The
   important point here is that these overlays are used to address
   specific tenant network requirements, and are NOT deployed to remove
   the scalability limitations of OpenStack networking.

   There are at least three ways VM movement can be accomplished:

   o  Recreation of the VM

   o  VLAN Modification

   o  Live Migration of a Running Virtual Machine

3.5.1.  Recreation of the VM

   The simplest and most reliable approach is to

   1.  Create a new VM in the new location,

   2.  Add its address to the DNS Resource Record for the name, allowing
       new references to the name to send transactions there,

   3.  Remove the old address from the DNS Resource Record (including
       the SIIT translation, if one exists), ending the use of the old
       VM for new transactions,

   4.  Wait for the period of the DNS Resource Record's lifetime
       (including the SIIT translation, if one exists), as it will get
       new requests throughout that interval,

   5.  Wait for the old VM to finish any outstanding transactions, and
       then

   6.  Kill the old VM.

   This is obviously not movement of an existing VM, but preservation of
   the same number and function of VMs by creation of a new VM and
   killing the old.
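
   Steps 2 and 3 amount to ordinary DNS dynamic updates; the sketch
   below uses the dnspython library, with the zone, name, addresses,
   and server all placeholders.

      import dns.query
      import dns.update

      update = dns.update.Update("example.net")
      # Step 2: publish the new VM's address alongside the old one.
      update.add("app", 300, "AAAA", "2001:db8:2::10")
      # Step 3: withdraw the old VM's address.
      update.delete("app", "AAAA", "2001:db8:1::10")
      dns.query.tcp(update, "2001:db8::53")  # zone's primary server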

3.5.2.  Live Migration of a Running Virtual Machine

   At http://blogs.vmware.com/vsphere/2011/02/vmotion-whats-going-on-
   under-the-covers.html, VMware describes its capability, called
   vMotion, in the following terms:

   1.  Shadow VM created on the destination host.

   2.  Copy each memory page from the source to the destination via the
       vMotion network.  This is known as preCopy.

   3.  Perform another pass over the VM's memory, copying any pages that
       changed during the last preCopy iteration.

   4.  Continue this iterative memory copying until no changed pages
       (outstanding to be-copied pages) remain or 100 seconds elapse.

   5.  Stun the VM on the source and resume it on the destination.

   In a native-address environment, we add three steps:

   1.  Shadow VM created on the destination host.

   2.  Copy each memory page from the source to the destination via the
       vMotion network.  This is known as preCopy.

   3.  Perform another pass over the VM's memory, copying any pages that
       changed during the last preCopy iteration.

   4.  Continue this iterative memory copying until no changed pages
       (outstanding to be-copied pages) remain or 100 seconds elapse.

   5.  Stitch routing for the old address.

   6.  Stun the VM on the source and resume it on the destination.

   7.  Renumber the VM as instructed in [RFC4192].

   8.  Unstitch routing for the old address.

   If the VM is moved within the same subnet (which usually implies the
   same rack), there is no stitching or renumbering apart from ensuring
   that the MAC address moves with the VM.  When the VM moves to a
   different subnet, however, we need to restitch routing, at least
   temporarily.  This obviously calls for some definitions.

   Stitching Routing:  The VM is potentially in communication with two
      sets of peers: VMs in the same subnet, and VMs in different
      subnets.  (A sketch of the stitching and unstitching operations
      follows these definitions.)

      *  The router in the new subnet is instructed to advertise a host
         route (/128) to the moved VM, and to install a static route to
         the old address with the VM's address in the new subnet as its
         next hop address.  Traffic from VMs from other subnets will now
         follow the host route to the VM in its new location.

      *  The router in the old subnet is instructed to direct LAN
         traffic to the VM's MAC Address to its IPv6 forwarding logic.
         Traffic from other VMs in the old subnet will now follow the
         host route to the moved VM.

   Renumbering:  This step is optional, but is good hygiene if the VM
      will be there a while.  If the VM will reside in its new location
      only temporarily, it can be skipped.

      Note that every IPv6 address, unlike an IPv4 address, has a
      lifetime.  At least in theory, when the lifetime expires, neighbor
      relationships with the address must be extended or the address
      removed from the system.  The Neighbor Discovery [RFC4861] process
      in the subnet router will periodically emit a Router
      Advertisement; the VM will gain an IPv6 address in the new subnet
      at that time if not earlier.  As described in [RFC4192], DNS
      should be changed to report the new address instead of the old.
      The DNS lifetime and any ambient sessions using the old address
      are now allowed to expire.  At this point, any new sessions will
      be using the new address, and the old is vestigial.

      Waiting for sessions using the address to expire can take an
      arbitrarily long interval, because the session generally has no
      knowledge of the lifetime of the IPv6 address.

   Unstitching Routing:  This is the reverse process of stitching.  If
      the VM is renumbered, when the old address becomes vestigial, the
      address will be discarded by the VM; if the VM is subsequently
      taken out of service, it has the same effect.  At that point, the
      host route is withdrawn, and the MAC address in the old subnet
      router's tables is removed.
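
   The stitching and unstitching steps can be expressed as a small
   orchestration routine; everything below, the router-API function
   names in particular, is hypothetical pseudo-API, shown only to fix
   the order of operations.

      # Hypothetical router-control calls; no real API is implied.
      def stitch(old_rtr, new_rtr, vm_old_addr, vm_new_addr, vm_mac):
          # New subnet's router attracts traffic for the old address.
          new_rtr.advertise_host_route(vm_old_addr)              # /128
          new_rtr.add_static_route(vm_old_addr, via=vm_new_addr)
          # Old subnet's router directs LAN traffic for the VM's MAC
          # into its IPv6 forwarding logic.
          old_rtr.punt_mac_to_l3(vm_mac)

      def unstitch(old_rtr, new_rtr, vm_old_addr, vm_mac):
          new_rtr.withdraw_host_route(vm_old_addr)
          old_rtr.remove_mac_entry(vm_mac)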

4.  OpenStack implications

4.1.  Configuration implications

   1.  Neutron MUST be configured with a pre-determined default label
       value for each tenant virtual network (Section 3.3.2).

   2.  Neutron MAY be configured with a set of authorized label values
       for each tenant virtual network (Section 3.3.2).

   3.  A virtual tenant network MAY be configured with a set of
       authorized label values (Section 3.3.2).

   4.  Neutron MUST be configured with one or more label values per
       virtual tenant network that the network is permitted to receive
       (Section 3.3.2).
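
   To make the shape of such configuration concrete, the following
   sketch expresses the four items above as a per-network table; the
   key names are invented for illustration and are not actual Neutron
   configuration options.

      # Hypothetical per-network label policy; names illustrative only.
      NETWORK_LABEL_POLICY = {
          "tenant-net-a": {
              "default_tx_label": 0x00042,                # item 1
              "authorized_tx_labels": {0x00042},          # items 2-3
              "authorized_rx_labels": {0x00042, 0x00999}, # item 4
          },
      }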

4.2.  vSwitch implications

   On messages transmitted by a virtual machine:

   Label Correctness:  As described in Section 3.3.1.1 or
      Section 3.3.1.2, ensure that the label in the packet is one that
      the VM is authorized to use.  Depending on configuration, it may
      be in the IPv6 Flow Label, an option in the Hop-by-Hop header, or
      an option in the Destination Options header.  Again depending on
      configuration, the vSwitch may overwrite whatever value is there,
      or may verify that the value there is as specified in a VM-
      specific list.

   Source Address Validation:  As described in Section 3.4, force the
      source address to be among those the VM is authorized to use.  The
      VM may simultaneously be authorized to use several addresses.

   Destination Address Validation:  OpenStack for IPv4 permits a NAT
      translation, called a "floating IP address", to enable a VM to
      communicate outside the domain; without that, it cannot.  For
      IPv6, the destination address should be permitted by some access
      list, which may permit all addresses, or addresses matching one
      or more CIDR prefixes, such as permitted multicast addresses and
      the prefix of the data center.
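
   A minimal sketch of such an egress access list follows, using
   Python's ipaddress module; the permitted prefixes shown are
   placeholders (a documentation prefix standing in for the data
   center, plus the global-scope SSM range of [RFC4607]).

      import ipaddress

      # Illustrative permit list: the data center's prefix plus a
      # permitted multicast range.
      PERMITTED = [ipaddress.IPv6Network("2001:db8::/32"),
                   ipaddress.IPv6Network("ff3e::/32")]

      def destination_permitted(dst):
          addr = ipaddress.IPv6Address(dst)
          return any(addr in net for net in PERMITTED)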

   On messages received for delivery to a virtual machine:

   Label Authorization:  As described in Section 3.3.2, the vSwitch only
      delivers a packet to a VM if the VM is authorized to receive it.
      The VM may have been authorized to receive several such labels.

5.  IANA Considerations

   This document does not ask IANA to do anything.

6.  Security Considerations

   In Section 2.6 and Section 3.3, this specification considers inter-
   tenant and intra-tenant network isolation.  It is intended to
   contribute to the security of a network, much like encapsulation in a
   maze of tunnels or VLANs might, but without the complexities and
   overhead of the management of such resources.  This does not replace
   the use of IPsec, SSH, or TLS encryption, or the use of
   authentication technologies; if these would be appropriate in an
   on-premises
   corporate data center, they remain appropriate in a multi-tenant data
   center regardless of the isolation technology.  However, one can
   think of this as a simple inter-tenant firewall based on the concepts
   of role-based access control; if it can be readily determined that a
   sender is not authorized to communicate with a receiver, such a
   transmission is prevented.

7.  Privacy Considerations

   This specification places no personally identifying information in an
   unencrypted part of a packet, unless a person is explicitly
   represented by an IPv6 address or label value, which is beyond its
   scope.  That said, if the RBAC Identifier in
   [I-D.baker-openstack-rbac-federated-identity] is used, the security
   and privacy considerations of that document are relevant here.

8.  Acknowledgements

   This document grew out of a discussion among the authors and
   contributors.

9.  Contributors

      Puneet Konghot
      Cisco Systems
      San Jose, California  95134
      USA
      Email: pkonghot@cisco.com

      Rohit Agarwalla
      Cisco Systems
      San Jose, California  95134
      USA
      Email: roagarwa@cisco.com

      Shannon McFarland
      Cisco Systems
      Boulder, Colorado  80301
      USA
      Email: shmcfarl@cisco.com

                                 Figure 2

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

10.2.  Informative References

   [FaceBook-IPv6]
              Pepelnjak, I., "Facebook Is Close to Having an IPv6-only
              Data Center http://blog.ipspace.net/2014/03/
              facebook-is-close-to-having-ipv6-only.html", March 2014.

   [I-D.anderson-v6ops-siit-dc]
              Anderson, T., "SIIT-DC: Stateless IP/ICMP Translation for
              IPv6 Data Centre Environments", draft-anderson-v6ops-
              siit-dc-00 (work in progress), September 2014.

   [I-D.baker-ipv6-isis-dst-flowlabel-routing]
              Baker, F., "Using IS-IS with Role-Based Access Control",
              draft-baker-ipv6-isis-dst-flowlabel-routing-00 (work in
               progress), February 2013.

   [I-D.baker-ipv6-ospf-dst-flowlabel-routing]
              Baker, F., "Using OSPFv3 with Role-Based Access Control",
              draft-baker-ipv6-ospf-dst-flowlabel-routing-02 (work in
              progress), May 2013.

   [I-D.baker-openstack-rbac-federated-identity]
              Baker, F., "Federated Identity for IPv6 Role-base Access
              Control", September 2014.

   [I-D.ietf-ospf-ospfv3-lsa-extend]
              Lindem, A., Mirtorabi, S., Roy, A., and F. Baker, "OSPFv3
              LSA Extendibility", draft-ietf-ospf-ospfv3-lsa-extend-03
              (work in progress), May 2014.

   [I-D.ietf-savi-dhcp]
              Bi, J., Wu, J., Yao, G., and F. Baker, "SAVI Solution for
              DHCP", draft-ietf-savi-dhcp-26 (work in progress), May
              2014.

   [I-D.mahalingam-dutt-dcops-vxlan]
              Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "VXLAN: A
              Framework for Overlaying Virtualized Layer 2 Networks over
              Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-09
              (work in progress), April 2014.

   [Microsoft-Azure]
              Sverdlik, Y., "Report: Microsoft Buys 100 Acres of Iowa
              land for Data Center http://www.datacenterknowledge.com/
              archives/2014/08/01/
              report-microsoft-buys-100-acres-iowa-land-data-center/",
              August 2014.

   [RFC2205]  Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
              Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
              Functional Specification", RFC 2205, September 1997.

   [RFC2827]  Ferguson, P. and D. Senie, "Network Ingress Filtering:
              Defeating Denial of Service Attacks which employ IP Source
              Address Spoofing", BCP 38, RFC 2827, May 2000.

   [RFC3439]  Bush, R. and D. Meyer, "Some Internet Architectural
              Guidelines and Philosophy", RFC 3439, December 2002.

   [RFC3697]  Rajahalme, J., Conta, A., Carpenter, B., and S. Deering,
              "IPv6 Flow Label Specification", RFC 3697, March 2004.

   [RFC4192]  Baker, F., Lear, E., and R. Droms, "Procedures for
              Renumbering an IPv6 Network without a Flag Day", RFC 4192,
              September 2005.

   [RFC4193]  Hinden, R. and B. Haberman, "Unique Local IPv6 Unicast
              Addresses", RFC 4193, October 2005.

   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, February 2006.

   [RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
              "Protocol Independent Multicast - Sparse Mode (PIM-SM):
              Protocol Specification (Revised)", RFC 4601, August 2006.

   [RFC4602]  Pusateri, T., "Protocol Independent Multicast - Sparse
              Mode (PIM-SM) IETF Proposed Standard Requirements
              Analysis", RFC 4602, August 2006.

   [RFC4604]  Holbrook, H., Cain, B., and B. Haberman, "Using Internet
              Group Management Protocol Version 3 (IGMPv3) and Multicast
              Listener Discovery Protocol Version 2 (MLDv2) for Source-
              Specific Multicast", RFC 4604, August 2006.

   [RFC4605]  Fenner, B., He, H., Haberman, B., and H. Sandick,
              "Internet Group Management Protocol (IGMP) / Multicast
              Listener Discovery (MLD)-Based Multicast Forwarding ("IGMP
              /MLD Proxying")", RFC 4605, August 2006.

   [RFC4607]  Holbrook, H. and B. Cain, "Source-Specific Multicast for
              IP", RFC 4607, August 2006.

   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
              "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
              September 2007.

   [RFC4862]  Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless
              Address Autoconfiguration", RFC 4862, September 2007.

   [RFC4941]  Narten, T., Draves, R., and S. Krishnan, "Privacy
              Extensions for Stateless Address Autoconfiguration in
              IPv6", RFC 4941, September 2007.

   [RFC5120]  Przygienda, T., Shen, N., and N. Sheth, "M-ISIS: Multi
              Topology (MT) Routing in Intermediate System to
              Intermediate Systems (IS-ISs)", RFC 5120, February 2008.

   [RFC5308]  Hopps, C., "Routing IPv6 with IS-IS", RFC 5308, October
               2008.

   [RFC5340]  Coltun, R., Ferguson, D., Moy, J., and A. Lindem, "OSPF
              for IPv6", RFC 5340, July 2008.

   [RFC5548]  Dohler, M., Watteyne, T., Winter, T., and D. Barthel,
              "Routing Requirements for Urban Low-Power and Lossy
              Networks", RFC 5548, May 2009.

   [RFC5673]  Pister, K., Thubert, P., Dwars, S., and T. Phinney,
              "Industrial Routing Requirements in Low-Power and Lossy
              Networks", RFC 5673, October 2009.

   [RFC6052]  Bao, C., Huitema, C., Bagnulo, M., Boucadair, M., and X.
              Li, "IPv6 Addressing of IPv4/IPv6 Translators", RFC 6052,
              October 2010.

   [RFC6144]  Baker, F., Li, X., Bao, C., and K. Yin, "Framework for
              IPv4/IPv6 Translation", RFC 6144, April 2011.

   [RFC6145]  Li, X., Bao, C., and F. Baker, "IP/ICMP Translation
              Algorithm", RFC 6145, April 2011.

   [RFC6146]  Bagnulo, M., Matthews, P., and I. van Beijnum, "Stateful
              NAT64: Network Address and Protocol Translation from IPv6
              Clients to IPv4 Servers", RFC 6146, April 2011.

   [RFC6147]  Bagnulo, M., Sullivan, A., Matthews, P., and I. van
              Beijnum, "DNS64: DNS Extensions for Network Address
              Translation from IPv6 Clients to IPv4 Servers", RFC 6147,
              April 2011.

   [RFC6437]  Amante, S., Carpenter, B., Jiang, S., and J. Rajahalme,
              "IPv6 Flow Label Specification", RFC 6437, November 2011.

   [RFC6620]  Nordmark, E., Bagnulo, M., and E. Levy-Abegnoli, "FCFS
              SAVI: First-Come, First-Served Source Address Validation
              Improvement for Locally Assigned IPv6 Addresses", RFC
              6620, May 2012.

   [RFC7217]  Gont, F., "A Method for Generating Semantically Opaque
              Interface Identifiers with IPv6 Stateless Address
              Autoconfiguration (SLAAC)", RFC 7217, April 2014.

   [RFC7219]  Bagnulo, M. and A. Garcia-Martinez, "SEcure Neighbor
              Discovery (SEND) Source Address Validation Improvement
               (SAVI)", RFC 7219, May 2014.

Appendix A.  Change Log

   Initial Version:  October 2014

Authors' Addresses

   Fred Baker
   Cisco Systems
   Santa Barbara, California  93117
   USA

   Email: fred@cisco.com


   Chris Marino
   Cisco Systems
   San Jose, California  95134
   USA

   Email: chrmarin@cisco.com


   Ian Wells
   Cisco Systems
   San Jose, California  95134
   USA

   Email: iawells@cisco.com