NVO3 Working Group Yizhou Li
INTERNET-DRAFT Lucy Yong
Intended Status: Informational Huawei Technologies
Lawrence Kreeger
Cisco
Thomas Narten
IBM
David Black
EMC
Expires: May 22, 2015 November 18, 2014
Hypervisor to NVE Control Plane Requirements
draft-ietf-nvo3-hpvr2nve-cp-req-01
Abstract
In a Split-NVE architructure, the functions of the NVE are split
across the hypervisor/container on a server and an external network
equipment which is called an external NVE. A control plane
protocol(s) between a hypervisor and its associated external NVE(s)
is used for the hypervisor to distribute its virtual machine
networking state to the external NVE(s) for further handling. This
document illustrates the functionality required by this type of
control plane signaling protocol and outlines the high level
requirements. Virtual machine states as well as state transitioning
are summarized to help clarifying the needed protocol requirements.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
Li, et al [Page 1]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Copyright and License Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Target Scenarios . . . . . . . . . . . . . . . . . . . . . 5
2. VM Lifecycle . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 VM Creation Event . . . . . . . . . . . . . . . . . . . . . 7
2.2 VM Live Migration Event . . . . . . . . . . . . . . . . . . 8
2.3 VM Termination Event . . . . . . . . . . . . . . . . . . . . 9
2.4 VM Pause, Suspension and Resumption Events . . . . . . . . . 9
3. Hypervisor-to-NVE Control Plane Protocol Functionality . . . . 9
3.1 VN connect and Disconnect . . . . . . . . . . . . . . . . . 10
3.2 TSI Associate and Activate . . . . . . . . . . . . . . . . . 11
3.3 TSI Disassociate and Deactivate . . . . . . . . . . . . . . 14
4. Hypervisor-to-NVE Control Plane Protocol Requirements . . . . . 15
5. Security Considerations . . . . . . . . . . . . . . . . . . . . 16
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 17
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 17
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17
8.1 Normative References . . . . . . . . . . . . . . . . . . . 17
8.2 Informative References . . . . . . . . . . . . . . . . . . 17
Appendix A. IEEE 802.1Qbg VDP Illustration (For information
only) . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20
Li, et al [Page 2]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
1. Introduction
In the Split-NVE architecture shown in Figure 1, the functionality of
the NVE is split across an end device supporting virtualization and
an external network device which is called an external NVE. The
portion of the NVE functionality located on the hypervisor/container
is called the tNVE and the portion located on the external NVE is
called the nNVE in this document. Overlay encapsulation/decapsulation
functions are normally off-loaded to the nNVE on the external NVE.
The tNVE is normally implemented as a part of hypervisor or container
in an virtualized end device.
The problem statement [RFC7364], discusses the needs for a control
plane protocol (or protocols) to populate each NVE with the state
needed to perform the required functions. In one scenario, an NVE
provides overlay encapsulation/decapsulation packet forwarding
services to Tenant Systems (TSs) that are co-resident within the NVE
on the same End Device (e.g. when the NVE is embedded within a
hypervisor or a Network Service Appliance). In such cases, there is
no need for a standardized protocol between the hypervisor and NVE,
as the interaction is implemented via software on a single device.
While in the Split-NVE architecture scenarios, as shown in figure 2
to figure 4, a control plane protocol(s) between a hypervisor and its
associated external NVE(s) is required for the hypervisor to
distribute the virtual machines networking states to the NVE(s) for
further handling. The protocol indeed is an NVE-internal protocol and
runs between tNVE and nNVE logical entities. This protocol is
mentioned in NVO3 problem statement [RFC7364] and appears as the
third work item.
Virtual machine states and state transitioning are summarized in this
document to show events where the NVE needs to take specific actions.
Such events might correspond to actions the control plane signaling
protocols between the hypervisor and external NVE will need to take.
Then the high level requirements to be fulfilled are outlined.
Li, et al [Page 3]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
+-- -- -- -- Split-NVE -- -- -- --+
|
|
+---------------|-----+
| +------------- ----+| |
| | +--+ +---\|/--+|| +------ --------------+
| | |VM|---+ ||| | \|/ |
| | +--+ | ||| |+--------+ |
| | +--+ | tNVE |||----- - - - - - -----|| | |
| | |VM|---+ ||| || nNVE | |
| | +--+ +--------+|| || | |
| | || |+--------+ |
| +--Hpvr/Container--+| +---------------------+
+---------------------+
End Device External NVE
Figure 1 Split-NVE structure
This document uses the term "hypervisor" throughout when describing
the Split-NVE scenario where part of the NVE functionality is off-
loaded to a separate device from the "hypervisor" that contains a VM
connected to a VN. In this context, the term "hypervisor" is meant to
cover any device type where part of the NVE functionality is off-
loaded in this fashion, e.g.,a Network Service Appliance, Linux
Container.
This document often uses the term "VM" and "Tenant System" (TS)
interchangeably, even though a VM is just one type of Tenant System
that may connect to a VN. For example, a service instance within a
Network Service Appliance may be another type of TS, or a system
running on an OS-level virtualization technologies like LinuX
Containers. When this document uses the term VM, it will in most
cases apply to other types of TSs.
Section 2 describes VM states and state transitioning in its
lifecycle. Section 3 introduces Hypervisor-to-NVE control plane
protocol functionality derived from VM operations and network events.
Section 4 outlines the requirements of the control plane protocol to
achieve the required functionality.
1.1 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
Li, et al [Page 4]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
document are to be interpreted as described in RFC 2119 [RFC2119].
This document uses the same terminology as found in [RFC7365] and [I-
D.ietf-nvo3-nve-nva-cp-req]. This section defines additional
terminology used by this document.
Split-NVE: a type of NVE that the functionalities of it are split
across an end device supporting virtualization and an external
network device.
tNVE: the portion of Split-NVE functionalities located on the end
device supporting virtualization.
nNVE: the portion of Split-NVE functionalities located on the network
device which is directly or indirectly connects to the end device
holding the corresponding tNVE.
External NVE: the physical network device holding nNVE
Hypervisor/Container: the logical collection of software, firmware
and/or hardware that allows the creation and running of server or
service appliance virtualization. tNVE is located on
Hypervisor/Container. It is loosely used in this document to refer to
the end device supporting the virtualization. For simplicity, we also
use Hypervisor in this document to represent both hypervisor and
container.
VN Profile: Meta data associated with a VN that is applied to any
attachment point to the VN. That is, VAP properties that are appliaed
to all VAPs associated with a given VN and used by an NVE when
ingressing/egressing packets to/from a specific VN. Meta data could
include such information as ACLs, QoS settings, etc. The VN Profile
contains parameters that apply to the VN as a whole. Control
protocols between the NVE and NVA could use the VN ID or VN Name to
obtain the VN Profile.
VSI: Virtual Station Interface. [IEEE 802.1Qbg]
VDP: VSI Discovery and Configuration Protocol [IEEE 802.1Qbg]
1.2 Target Scenarios
In the Split-NVE architecture, an external NVE can provide an offload
of the encapsulation / decapsulation function, network policy
enforcement, as well as the VN Overlay protocol overhead. This
offloading may provide performance improvements and/or resource
savings to the End Device (e.g. hypervisor) making use of the
Li, et al [Page 5]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
external NVE.
The following figures give example scenarios of a Split-NVE
architecture.
Hypervisor Access Switch
+------------------+ +-----+-------+
| +--+ +-------+ | | | |
| |VM|---| | | VLAN | | |
| +--+ | tNVE |---------+ nNVE| +--- Underlying
| +--+ | | | Trunk | | | Network
| |VM|---| | | | | |
| +--+ +-------+ | | | |
+------------------+ +-----+-------+
Figure 2 Hypervisor with an External NVE
Hypervisor L2 Switch
+---------------+ +-----+ +----+---+
| +--+ +----+ | | | | | |
| |VM|---| | |VLAN | |VLAN | | |
| +--+ |tNVE|-------+ +-----+nNVE| +--- Underlying
| +--+ | | |Trunk| |Trunk| | | Network
| |VM|---| | | | | | | |
| +--+ +----+ | | | | | |
+---------------+ +-----+ +----+---+
Figure 3 Hypervisor with an External NVE
across an Ethernet Access Switch
Network Service Appliance Access Switch
+--------------------------+ +-----+-------+
| +------------+ | \ | | | |
| |Net Service |----| \ | | | |
| |Instance | | \ | VLAN | | |
| +------------+ |tNVE| |------+nNVE | +--- Underlying
| +------------+ | | | Trunk| | | Network
| |Net Service |----| / | | | |
| |Instance | | / | | | |
| +------------+ | / | | | |
+--------------------------+ +-----+-------+
Figure 4 Physical Network Service Appliance with an External NVE
Tenant Systems connect to external NVEs via a Tenant System Interface
Li, et al [Page 6]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
(TSI). The TSI logically connects to the external NVE via a Virtual
Access Point (VAP) [I-D.ietf-nvo3-arch]. The external NVE may provide
Layer 2 or Layer 3 forwarding. In the Split-NVE architecture, the
external NVE may be able to reach multiple MAC and IP addresses via a
TSI. For example, Tenant Systems that are providing network services
(such as transparent firewall, load balancer, VPN gateway) are likely
to have complex address hierarchy. This implies that if a given TSI
disassociates from one VN, all the MAC and/or IP addresses are also
disassociated. There is no need to signal the deletion of every MAC
or IP when the TSI is brought down or deleted. In the majority of
cases, a VM will be acting as a simple host that will have a single
TSI and single MAC and IP visible to the external NVE.
2. VM Lifecycle
Figure 2 of [I-D.ietf-opsawg-vmm-mib] shows the state transition of a
VM. Some of the VM states are of interest to the external NVE. This
section illustrates the relevant phases and events in the VM
lifecycle. It should be noted that the following subsections do not
give an exhaustive traversal of VM lifecycle state. They are intended
as the illustrative examples which are relevant to Split-NVE
architecture, not as prescriptive text; the goal is to capture
sufficient detail to set a context for the signaling protocol
functionality and requirements described in the following sections.
2.1 VM Creation Event
VM creation event makes the VM state transiting from Preparing to
Shutdown and then to Running [I-D.ietf-opsawg-vmm-mib]. The end
device allocates and initializes local virtual resources like storage
in the VM Preparing state. In Shutdown state, the VM has everything
ready except that CPU execution is not scheduled by the hypervisor
and VM's memory is not resident in the hypervisor. From the Shutdown
state to Running state, normally it requires the human execution or
system triggered event. Running state indicates the VM is in the
normal execution state. As part of transitioning the VM to the
Running state, the hypervisor must also provision network
connectivity for the VM's TSI(s) so that Ethernet frames can be sent
and received correctly. No ongoing migration, suspension or shutdown
is in process.
In the VM creation phase, the VM's TSI has to be associated with the
external NVE. Association here indicates that hypervisor and the
external NVE have signaled each other and reached some agreement.
Relevant networking parameters or information have been provisioned
properly. The External NVE should be informed of the VM's TSI MAC
address and/or IP address. In addition to external network
Li, et al [Page 7]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
connectivity, the hypervisor may provide local network connectivity
between the VM's TSI and other VM's TSI that are co-resident on the
same hypervisor. When the intra or inter-hypervisor connectivity is
extended to the external NVE, a locally significant tag, e.g. VLAN
ID, should be used between the hypervisor and the external NVE to
differentiate each VN's traffic. Both the hypervisor and external NVE
sides must agree on that tag value for traffic identification,
isolation and forwarding.
The external NVE may need to do some preparation work before it
signals successful association with TSI. Such preparation work may
include locally saving the states and binding information of the
tenant system interface and its VN, communicating with the NVA for
network provisioning, etc.
Tenant System interface association should be performed before the VM
enters running state, preferably in Shutdown state. If association
with external NVE fails, the VM should not go into running state.
2.2 VM Live Migration Event
Live migration is sometimes referred to as "hot" migration, in that
from an external viewpoint, the VM appears to continue to run while
being migrated to another server (e.g., TCP connections generally
survive this class of migration). In contrast, "cold" migration
consists of shutdown VM execution on one server and restart it on
another. For simplicity, the following abstract summary about live
migration assumes shared storage, so that the VM's storage is
accessible to the source and destination servers. Assume VM live
migrates from hypervisor 1 to hypervisor 2. Such migration event
involves the state transition on both hypervisors, source hypervisor
1 and destination hypervisor 2. VM state on source hypervisor 1
transits from Running to Migrating and then to Shutdown [I-D.ietf-
opsawg-vmm-mib]. VM state on destination hypervisor 2 transits from
Shutdown to Migrating and then Running.
The external NVE connected to destination hypervisor 2 has to
associate the migrating VM's TSI with it by discovering the TSI's MAC
and/or IP addresses, its VN, locally significant VID if any, and
provisioning other network related parameters of the TSI. The
external NVE may be informed about the VM's peer VMs, storage devices
and other network appliances with which the VM needs to communicate
or is communicating. The migrated VM on destination hypervisor 2
SHOULD not go to Running state before all the network provisioning
and binding has been done.
The migrating VM SHOULD not be in Running state at the same time on
Li, et al [Page 8]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
the source hypervisor and destination hypervisor during migration.
The VM on the source hypervisor does not transition into Shutdown
state until the VM successfully enters the Running state on the
destination hypervisor. It is possible that VM on the source
hypervisor stays in Migrating state for a while after VM on the
destination hypervisor is in Running state.
2.3 VM Termination Event
VM termination event is also referred to as "powering off" a VM. VM
termination event leads to its state going to Shutdown. There are two
possible causes to terminate a VM [I-D.ietf-opsawg-vmm-mib], one is
the normal "power off" of a running VM; the other is that VM has been
migrated to another hypervisor and the VM image on the source
hypervisor has to stop executing and to be shutdown.
In VM termination, the external NVE connecting to that VM needs to
deprovision the VM, i.e. delete the network parameters associated
with that VM. In other words, the external NVE has to de-associate
the VM's TSI.
2.4 VM Pause, Suspension and Resumption Events
The VM pause event leads to the VM transiting from Running state to
Paused state. The Paused state indicates that the VM is resident in
memory but no longer scheduled to execute by the hypervisor [I-
D.ietf-opsawg-vmm-mib]. The VM can be easily re-activated from Paused
state to Running state.
The VM suspension event leads to the VM transiting from Running state
to Suspended state. The VM resumption event leads to the VM
transiting state from Suspended state to Running state. Suspended
state means the memory and CPU execution state of the virtual machine
are saved to persistent store. During this state, the virtual
machine is not scheduled to execute by the hypervisor [I-D.ietf-
opsawg-vmm-mib].
In the Split-NVE architecture, the external NVE should keep any
paused or suspended VM in association as the VM can return to Running
state at any time.
3. Hypervisor-to-NVE Control Plane Protocol Functionality
The following subsections show the illustrative examples of the state
transitions on external NVE which are relevant to Hypervisor-to-NVE
Signaling protocol functionality. It should be noted they are not
prescriptive text for full state machines.
Li, et al [Page 9]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
3.1 VN connect and Disconnect
In Split-NVE scenario, a protocol is needed between the End
Device(e.g. Hypervisor) making use of the external NVE and the
external NVE in order to make the external NVE aware of the changing
VN membership requirements of the Tenant Systems within the End
Device.
A key driver for using a protocol rather than using static
configuration of the external NVE is because the VN connectivity
requirements can change frequently as VMs are brought up, moved and
brought down on various hypervisors throughout the data center or
external cloud.
+---------------+ Recv VN_connect; +-------------------+
|VN_Disconnected| return Local_Tag value |VN_Connected |
+---------------+ for VN if successful; +-------------------+
|VN_ID; |-------------------------->|VN_ID; |
|VN_State= | |VN_State=connected;|
|disconnected; | |Num_TSI_Associated;|
| |<----Recv VN_disconnect----|Local_Tag; |
+---------------+ |VN_Context; |
+-------------------+
Figure 5 State Transition Example of a VAP Instance
on an External NVE
Figure 5 shows the state transition for a VAP on the external NVE. An
NVE that supports the hypervisor to NVE control plane protocol should
support one instance of the state machine for each active VN. The
state transition on the external NVE is normally triggered by the
hypervisor-facing side events and behaviors. Some of the interleaved
interaction between NVE and NVA will be illustrated for better
understanding of the whole procedure; while others of them may not be
shown. More detailed information regarding that is available in [I-
D.ietf-nvo3-nve-nva-cp-req].
The external NVE must be notified when an End Device requires
connection to a particular VN and when it no longer requires
connection. In addition, the external NVE must provide a local tag
value for each connected VN to the End Device to use for exchange of
packets between the End Device and the external NVE (e.g. a locally
significant 802.1Q tag value). How "local" the significance is
depends on whether the Hypervisor has a direct physical connection to
Li, et al [Page 10]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
the external NVE (in which case the significance is local to the
physical link), or whether there is an Ethernet switch (e.g. a blade
switch) connecting the Hypervisor to the NVE (in which case the
significance is local to the intervening switch and all the links
connected to it).
These VLAN tags are used to differentiate between different VNs as
packets cross the shared access network to the external NVE. When the
external NVE receives packets, it uses the VLAN tag to identify the
VN of packets coming from a given TSI, strips the tag, and adds the
appropriate overlay encapsulation for that VN and sends it towards
the corresponding remote NVE across the underlying IP network.
The Identification of the VN in this protocol could either be through
a VN Name or a VN ID. A globally unique VN Name facilitates
portability of a Tenant's Virtual Data Center. Once an external NVE
receives a VN connect indication, the NVE needs a way to get a VN
Context allocated (or receive the already allocated VN Context) for a
given VN Name or ID (as well as any other information needed to
transmit encapsulated packets). How this is done is the subject of
the NVE-to-NVA protocol which are part of work items 1 and 2 in
[RFC7364].
VN_connect message can be explicit or implicit. Explicit means the
hypervisor sending a message explicitly to request for the connection
to a VN. Implicit means the external NVE receives other messages,
e.g. very first TSI associate message (see the next subsection) for a
given VN, to implicitly indicate its interest to connect to a VN.
A VN_disconnect message will indicate that the NVE can release all
the resources for that disconnected VN and transit to VN_disconnected
state. The local tag assigned for that VN can possibly be reclaimed
by other VN.
3.2 TSI Associate and Activate
Typically, a TSI is assigned a single MAC address and all frames
transmitted and received on that TSI use that single MAC address. As
mentioned earlier, it is also possible for a Tenant System to
exchange frames using multiple MAC addresses or packets with multiple
IP addresses.
Particularly in the case of a TS that is forwarding frames or packets
from other TSs, the external NVE will need to communicate the mapping
between the NVE's IP address (on the underlying network) and ALL the
addresses the TS is forwarding on behalf of for the corresponding VN
to the NVA.
Li, et al [Page 11]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
The NVE has two ways in which it can discover the tenant addresses
for which frames must be forwarded to a given End Device (and
ultimately to the TS within that End Device).
1. It can glean the addresses by inspecting the source addresses in
packets it receives from the End Device.
2. The hypervisor can explicitly signal the address associations of
a TSI to the external NVE. The address association includes all the
MAC and/or IP addresses possibly used as source addresses in a packet
sent from the hypervisor to external NVE. The external NVE may
further use this information to filter the future traffic from the
hypervisor.
To perform the second approach above, the "hypervisor-to-NVE"
protocol requires a means to allow End Devices to communicate new
tenant addresses associations for a given TSI within a given VN.
Figure 6 shows the example of a state transition for a TSI connecting
to a VAP on the external NVE. An NVE that supports the hypervisor to
NVE control plane protocol may support one instance of the state
machine for each TSI connecting to a given VN.
Li, et al [Page 12]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
disassociate; +--------+ disassociate
+--------------->| Init |<--------------------+
| +--------+ |
| | | |
| | | |
| +--------+ |
| | | |
| associate | | activate |
| +-----------+ +-----------+ |
| | | |
| | | |
| \|/ \|/ |
+--------------------+ +---------------------+
| Associated | | Activated |
+--------------------+ +---------------------+
|TSI_ID; | |TSI_ID; |
|Port; |-----activate---->|Port; |
|VN_ID; | |VN_ID; |
|State=associated; | |State=activated ; |-+
+-|Num_Of_Addr; |<---deactivate;---|Num_Of_Addr; | |
| |List_Of_Addr; | |List_Of_Addr; | |
| +--------------------+ +---------------------+ |
| /|\ /|\ |
| | | |
+---------------------+ +-------------------+
add/remove/updt addr; add/remove/updt addr;
or update port; or update port;
Figure 6 State Transition Example of a TSI Instance
on an External NVE
Associated state of a TSI instance on an external NVE indicates all
the addresses for that TSI have already associated with the VAP of
the external NVE on port p for a given VN but no real traffic to and
from the TSI is expected and allowed to pass through. An NVE has
reserved all the necessary resources for that TSI. An external NVE
may report the mappings of its' underlay IP address and the
associated TSI addresses to NVA and relevant network nodes may save
such information to its mapping table but not forwarding table. A NVE
may create ACL or filter rules based on the associated TSI addresses
on the attached port p but not enable them yet. Local tag for the VN
corresponding to the TSI instance should be provisioned on port p to
receive packets.
VM migration event(discussed section 2) may cause the hypervisor to
send an associate message to the NVE connected to the destination
hypervisor the VM migrates to. VM creation event may also lead to the
Li, et al [Page 13]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
same practice.
The Activated state of a TSI instance on an external NVE indicates
that all the addresses for that TSI functioning correctly on port p
and traffic can be received from and sent to that TSI via the NVE.
The mappings of the NVE's underlay IP address and the associated TSI
addresses should be put into the forwarding table rather than the
mapping table on relevant network nodes. ACL or filter rules based on
the associated TSI addresses on the attached port p in NVE are
enabled. Local tag for the VN corresponding to the TSI instance MUST
be provisioned on port p to receive packets.
The Activate message makes the state transit from Init or Associated
to Activated. VM creation, VM migration and VM resumption events
discussed in section 4 may trigger the Activate message to be sent
from the hypervisor to the external NVE.
TSI information may get updated either in Associated or Activated
state. The following are considered updates to the TSI information:
add or remove the associated addresses, update current associated
addresses (for example updating IP for a given MAC), update NVE port
information based on where the NVE receives messages. Such updates do
not change the state of TSI. When any address associated to a given
TSI changes, the NVE should inform the NVA to update the mapping
information on NVE's underlying address and the associated TSI
addresses. The NVE should also change its local ACL or filter
settings accordingly for the relevant addresses. Port information
update will cause the local tag for the VN corresponding to the TSI
instance to be provisioned on new port p and removed from the old
port.
3.3 TSI Disassociate and Deactivate
Disassociate and deactivate conceptually are the reverse behaviors of
associate and activate. From Activated state to Associated state, the
external NVE needs to make sure the resources are still reserved but
the addresses associated to the TSI are not functioning and no
traffic to and from the TSI is expected and allowed to pass through.
For example, the NVE needs to inform the NVA to remove the relevant
addresses mapping information from forwarding or routing table. ACL
or filtering rules regarding the relevant addresses should be
disabled. From Associated or Activated state to the Init state, the
NVE will release all the resources relevant to TSI instances. The NVE
should also inform the NVA to remove the relevant entries from
mapping table. ACL or filtering rules regarding the relevant
addresses should be removed. Local tag provisioning on the connecting
port on NVE should be cleared.
Li, et al [Page 14]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
A VM suspension event(discussed in section 2) may cause the relevant
TSI instance(s) on the NVE to transit from Activated to Associated
state. A VM pause event normally does not affect the state of the
relevant TSI instance(s) on the NVE as the VM is expected to run
again soon. The VM shutdown event will normally cause the relevant
TSI instance(s) on NVE transit to Init state from Activated state.
All resources should be released.
A VM migration will lead the TSI instance on the source NVE to leave
Activated state. When a VM migrates to another hypervisor connecting
to the same NVE, i.e. source and destination NVE are the same, NVE
should use TSI_ID and incoming port to differentiate two TSI
instance.
Although the triggering messages for state transition shown in Figure
6 does not indicate the difference between VM creation/shutdown event
and VM migration arrival/departure event, the external NVE can make
optimizations if it is notified of such information. For example, if
the NVE knows the incoming activate message is caused by migration
rather than VM creation, some mechanisms may be employed or triggered
to make sure the dynamic configurations or provisionings on the
destination NVE are the same as those on the source NVE for the
migrated VM. For example IGMP query [RFC2236] can be triggered by the
destination external NVE to the migrated VM on destination hypervisor
so that the VM is forced to answer an IGMP report to the multicast
router. Then multicast router can correctly send the multicast
traffic to the new external NVE for those multicast groups the VM had
joined before the migration.
4. Hypervisor-to-NVE Control Plane Protocol Requirements
Req-1: The protocol MUST support a bridged network connecting End
Devices to External NVE.
Req-2: The protocol MUST support multiple End Devices sharing the
same External NVE via the same physical port across a bridged
network.
Req-3: The protocol MAY support an End Device using multiple external
NVEs simultaneously, but only one external NVE for each VN.
Req-4: The protocol MAY support an End Device using multiple external
NVEs simultaneously for the same VN.
Req-5: The protocol MUST allow the End Device initiating a request to
its associated External NVE to be connected/disconnected to a given
VN.
Li, et al [Page 15]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
Req-6: The protocol MUST allow an External NVE initiating a request
to its connected End Devices to be disconnected to a given VN.
Req-7: When a TS attaches to a VN, the protocol MUST allow for an End
Device and its external NVE to negotiate a locally-significant tag
for carrying traffic associated with a specific VN (e.g., 802.1Q
tags).
Req-8: The protocol MUST allow an End Device initiating a request to
associate/disassociate and/or activate/deactive address(es) of a TSI
instance to a VN on an NVE port.
Req-9: The protocol MUST allow the External NVE initiating a request
to disassociate and/or deactivate address(es) of a TSI instance to a
VN on an NVE port.
Req-10: The protocol MUST allow an End Device initiating a request to
add, remove or update address(es) associated with a TSI instance on
the external NVE. Addresses can be expressed in different formats,
for example, MAC, IP or pair of IP and MAC.
Req-11: The protocol MUST allow the External NVE to authenticate the
End Device connected.
Req-12: The protocol MUST be able to run over L2 links between the
End Device and its External NVE.
Req-13: The protocol SHOULD support the End Device indicating if an
associate or activate request from it results from a VM hot migration
event.
VDP [IEEE 802.1Qbg] is a candidate protocol running on layer 2.
Appendix A illustrates VDP for reader's information. It requires
extensions to fulfill the requirements in this document.
5. Security Considerations
NVEs must ensure that only properly authorized Tenant Systems are
allowed to join and become a part of any specific Virtual Network. In
addition, NVEs will need appropriate mechanisms to ensure that any
hypervisor wishing to use the services of an NVE are properly
authorized to do so. One design point is whether the hypervisor
should supply the NVE with necessary information (e.g., VM addresses,
VN information, or other parameters) that the NVE uses directly, or
whether the hypervisor should only supply a VN ID and an identifier
for the associated VM (e.g., its MAC address), with the NVE using
that information to obtain the information needed to validate the
hypervisor-provided parameters or obtain related parameters in a
Li, et al [Page 16]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
secure manner.
6. IANA Considerations
No IANA action is required. RFC Editor: please delete this section
before publication.
7. Acknowledgements
This document was initiated and merged from the drafts draft-kreeger-
nvo3-hypervisor-nve-cp, draft-gu-nvo3-tes-nve-mechanism and draft-
kompella-nvo3-server2nve. Thanks to all the co-authors and
contributing members of those drafts.
The authors would like to specially thank Jon Hudson for his generous
help in improving the readability of this document.
8. References
8.1 Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
8.2 Informative References
[RFC7364] Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L., and
M. Napierala, "Problem Statement: Overlays for Network
Virtualization", October 2014.
[RFC7365] Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
Rekhter, "Framework for DC Network Virtualization",
October 2014.
[I-D.ietf-nvo3-nve-nva-cp-req] Kreeger, L., Dutt, D., Narten, T., and
D. Black, "Network Virtualization NVE to NVA Control
Protocol Requirements", draft-ietf-nvo3-nve-nva-cp-req-01
(work in progress), October 2013.
[I-D.ietf-nvo3-arch] Black, D., Narten, T., et al, "An Architecture
for Overlay Networks (NVO3)", draft-narten-nvo3-arch, work
in progress.
[I-D.ietf-opsawg-vmm-mib] Asai H., MacFaden M., Schoenwaelder J.,
Shima K., Tsou T., "Management Information Base for
Li, et al [Page 17]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
Virtual Machines Controlled by a Hypervisor", draft-ietf-
opsawg-vmm-mib-00 (work in progress), February 2014.
[IEEE 802.1Qbg] IEEE, "Media Access Control (MAC) Bridges and Virtual
Bridged Local Area Networks - Amendment 21: Edge Virtual
Bridging", IEEE Std 802.1Qbg, 2012
[8021Q] IEEE, "Media Access Control (MAC) Bridges and Virtual Bridged
Local Area Networks", IEEE Std 802.1Q-2011, August, 2011
Appendix A. IEEE 802.1Qbg VDP Illustration (For information only)
VDP has the format shown in Figure A.1. Virtual Station Interface (VSI)
is an interface to a virtual station that is attached to a downlink port
of an internal bridging function in server. VSI's VDP packet will be
handled by an external bridge. VDP is the controlling protocol running
between the hypervisor and the external bridge.
+--------+--------+------+----+----+------+------+------+-----------+
|TLV type|TLV info|Status|VSI |VSI |VSIID | VSIID|Filter|Filter Info|
| 7b |str len | |Type|Type|Format| | Info | |
| | 9b | 1oct |ID |Ver | | |format| |
| | | |3oct|1oct| 1oct |16oct |1oct | M oct |
+--------+--------+------+----+----+------+------+------+-----------+
| | | | |
| | |<--VSI type&instance-->|<----Filter------>|
| | |<------------VSI attributes-------------->|
|<--TLV header--->|<-------TLV info string = 23 + M octets--------->|
Figure A.1: VDP TLV definitions
There are basically four TLV types.
1. Pre-Associate: Pre-Associate is used to pre-associate a VSI instance
with a bridge port. The bridge validates the request and returns a
failure Status in case of errors. Successful pre-association does not
imply that the indicated VSI Type or provisioning will be applied to any
traffic flowing through the VSI. The pre-associate enables faster
response to an associate, by allowing the bridge to obtain the VSI Type
prior to an association.
2. Pre-Associate with resource reservation: Pre-Associate with Resource
Reservation involves the same steps as Pre-Associate, but on successful
pre-association also reserves resources in the Bridge to prepare for a
subsequent Associate request.
Li, et al [Page 18]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
3. Associate: The Associate creates and activates an association between
a VSI instance and a bridge port. The Bridge allocates any required
bridge resources for the referenced VSI. The Bridge activates the
configuration for the VSI Type ID. This association is then applied to
the traffic flow to/from the VSI instance.
4. Deassociate: The de-associate is used to remove an association
between a VSI instance and a bridge port. Pre-Associated and Associated
VSIs can be de-associated. De-associate releases any resources that were
reserved as a result of prior Associate or Pre-Associate operations for
that VSI instance.
Deassociate can be initiated by either side and the rest types of
messages can only be initiated by the server side.
Some important flag values in VDP Status field:
1. M-bit (Bit 5): Indicates that the user of the VSI (e.g., the VM) is
migrating (M-bit = 1) or provides no guidance on the migration of the
user of the VSI (M-bit = 0). The M-bit is used as an indicator relative
to the VSI that the user is migrating to.
2. S-bit (Bit 6): Indicates that the VSI user (e.g., the VM) is
suspended (S-bit = 1) or provides no guidance as to whether the user of
the VSI is suspended (S-bit = 0). A keep-alive Associate request with
S-bit = 1 can be sent when the VSI user is suspended. The S-bit is used
as an indicator relative to the VSI that the user is migrating from.
The filter information format currently supports 4 types as the
following.
1. VID Filter Info format
+---------+------+-------+--------+
| #of | PS | PCP | VID |
|entries |(1bit)|(3bits)|(12bits)|
|(2octets)| | | |
+---------+------+-------+--------+
|<--Repeated per entry->|
Figure A.2 VID Filter Info format
Li, et al [Page 19]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
2. MAC/VID filter format
+---------+--------------+------+-------+--------+
| #of | MAC address | PS | PCP | VID |
|entries | (6 octets) |(1bit)|(3bits)|(12bits)|
|(2octets)| | | | |
+---------+--------------+------+-------+--------+
|<--------Repeated per entry---------->|
Figure A.3 MAC/VID filter format
3. GroupID/VID filter format
+---------+--------------+------+-------+--------+
| #of | GroupID | PS | PCP | VID |
|entries | (4 octets) |(1bit)|(3bits)|(12bits)|
|(2octets)| | | | |
+---------+--------------+------+-------+--------+
|<--------Repeated per entry---------->|
Figure A.4 GroupID/VID filter format
4. GroupID/MAC/VID filter format
+---------+----------+-------------+------+-----+--------+
| #of | GroupID | MAC address | PS | PCP | VID |
|entries |(4 octets)| (6 octets) |(1bit)|(3b )|(12bits)|
|(2octets)| | | | | |
+---------+----------+-------------+------+-----+--------+
|<-------------Repeated per entry------------->|
Figure A.5 GroupID/MAC/VID filter format
The null VID can be used in the VDP Request sent from the hypervisor to
the external bridge. Use of the null VID indicates that the set of VID
values associated with the VSI is expected to be supplied by the Bridge.
The Bridge can obtain VID values from the VSI Type whose identity is
specified by the VSI Type information in the VDP Request. The set of VID
values is returned to the station via the VDP Response. The returned VID
value can be a locally significant value. When GroupID is used, it is
equivalent to the VN ID in NVO3. GroupID will be provided by the
hypervisor to the bridge. The bridge will map GroupID to a locally
significant VLAN ID.
The VSIID in VDP request that identify a VM can be one of the following
format: IPV4 address, IPV6 address, MAC address, UUID or locally
defined.
Authors' Addresses
Li, et al [Page 20]
INTERNET DRAFT Hypervisor to NVE CP Reqs July 2014
Yizhou Li
Huawei Technologies
101 Software Avenue,
Nanjing 210012
China
Phone: +86-25-56625409
EMail: liyizhou@huawei.com
Lucy Yong
Huawei Technologies, USA
Email: lucy.yong@huawei.com
Lawrence Kreeger
Cisco
Email: kreeger@cisco.com
Thomas Narten
IBM
Email: narten@us.ibm.com
David Black
EMC
Email: david.black@emc.com
Li, et al [Page 21]