Requirements for Extending BGP/MPLS VPNs to End-Systems
draft-fang-l3vpn-end-system-requirements-00
Network Working Group
Internet Draft
Intended status: Informational Maria Napierala
Expires: April 15, 2013 AT&T
Luyuan Fang
Cisco Systems
October 15, 2012
Abstract
Service Providers commonly use BGP/MPLS VPNs [RFC 4364] as the
control plane for wide-area virtual networks. This technology has
proven to scale to a large number of VPNs and attachment points,
and it is well suited to provide VPN service to end-systems.
Virtualized environments impose additional requirements on MPLS/BGP
VPN technology when it is applied to end-system networking. This
document defines those requirements.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Copyright and License Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Table of Contents
1. Introduction
1.1. Terminology
2. Application of MPLS/BGP VPNs to End-Systems
3. Connectivity Requirements
4. Multi-Tenancy Requirements
5. Decoupling of Virtualized Networking from Physical
Infrastructure
6. Decoupling of Layer 3 Virtualization from Layer 2 Topology
7. Encapsulation of Virtual Payloads
8. Optimal Forwarding of Traffic
9. Inter-operability with Existing MPLS/BGP VPNs
10. IP Mobility
11. BGP Requirements in a Virtualized Environment
11.1. BGP Convergence and Routing Consistency
11.2. Optimizing Route Distribution
12. Security Considerations
13. IANA Considerations
14. Normative References
15. Informative References
16. Authors' Addresses
17. Acknowledgements
Requirements Language
Although this document is not a protocol specification, the key
words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC 2119 [RFC
2119].
1. Introduction
Networks are increasingly being consolidated and outsourced in an
effort both to improve the deployment time of services and to
reduce operational costs. This coincides with an increasing demand
for compute, storage, and network resources from applications.
In order to scale compute, storage, and network service functions,
physical resources are being abstracted from their logical
representation. This is referred to as server, storage, and network
virtualization. Virtualization can be implemented in various layers
of computer systems or networks. The virtualized loads are executed
over a common physical infrastructure. Compute nodes running guest
operating systems are often executed as Virtual Machines (or VMs).
This document defines requirements for a network virtualization
solution that provides IP connectivity to virtual resources on end-
systems. The requirements address the virtual resources, defined as
Virtual Machines, applications, and appliances that require only IP
connectivity. Non-IP communication is addressed by other solutions
and is not in scope of this document.
1.1. Terminology
AS Autonomous System
End-System A device where Guest OS and Host OS/Hypervisor reside
IaaS Infrastructure as a Service
RT Route Target
ToR Top-of-Rack switch
VM Virtual Machine
Hypervisor Virtual Machine Manager
SDN Software Defined Network
VPN Virtual Private Network
2. Application of MPLS/BGP VPNs to End-Systems
MPLS/BGP VPN technology [RFC 4364] has proven to be able to scale
to a large number of VPNs (tens of thousands) and customer routes
(millions) while providing for aggregated management capability. In
traditional WAN deployments of BGP IP VPNs, a Customer Edge (CE) is
a physical device connected to a Provider Edge (PE). In addition,
the forwarding function and control function of a Provider Edge
(PE) device co-exist within a single physical router.
MPLS/BGP VPN technology should be able to evolve and adapt to new
virtualized environments by extending VPN service to end-systems.
When an end-system attaches to an MPLS/BGP VPN, the CE becomes a
Virtual Machine or an application residing on the end-system itself.
As in traditional MPLS/BGP VPN deployments, it is undesirable for
the end-system VPN forwarding knowledge to extend to the transport
network infrastructure. Hence, with regard to forwarding, the
end-system should optimally act as both the CE and the PE.
Moreover, it is a current practice to implement PE forwarding and
control functions in different processors of the same device and to
use internal (proprietary) communication between those processors.
Typically, the PE control functionality is implemented in one (or
very few) components of a device and the PE forwarding
functionality is implemented in multiple components of the same
device (a.k.a., "line cards"). In an end-system environment, a
single end-system effectively corresponds to a line card in a
traditional PE router. For scalable and cost-effective deployment
of end-system MPLS/BGP VPNs, the PE forwarding function should be
decoupled from the PE control function such that the former can be
implemented on multiple standalone devices. This separation of
functionality allows the end-system PE forwarding function to be
implemented on multiple end-system devices, for example, in
operating systems of application servers or network appliances.
The PE control plane function can itself be virtualized and run as
an application on an end-system.
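The separation of the PE control function from the PE forwarding
function can be pictured with a small Python sketch. It is only an
illustration under assumed names: ControlPlane, Forwarder, and the
route fields are hypothetical and do not correspond to any defined
protocol or interface between the two functions.

```python
from dataclasses import dataclass, field


@dataclass
class Forwarder:
    """Hypothetical PE forwarding function on one end-system ("line card")."""
    name: str
    # (vpn, prefix) -> (next_hop, label); an assumed table layout.
    fib: dict = field(default_factory=dict)

    def install(self, vpn, prefix, next_hop, label):
        self.fib[(vpn, prefix)] = (next_hop, label)


@dataclass
class ControlPlane:
    """Hypothetical PE control function, running apart from the forwarders."""
    forwarders: list = field(default_factory=list)

    def announce(self, vpn, prefix, next_hop, label):
        # One control decision is programmed into many forwarding elements.
        for forwarder in self.forwarders:
            forwarder.install(vpn, prefix, next_hop, label)


hosts = [Forwarder("server-1"), Forwarder("server-2")]
control = ControlPlane(forwarders=hosts)
control.announce("vpn-red", "10.1.2.3/32", "192.0.2.20", 200)
```

In this sketch a single control instance programs the forwarding
tables of several end-systems, in the same way that one route
processor programs several line cards in a traditional PE router.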
3. Connectivity Requirements
A network virtualization solution should be able to provide IPv4 and
IPv6 unicast connectivity between hosts in the same and different
subnets without any assumptions regarding the underlying media
layer.
Furthermore, multicast transmission, i.e., allowing IP
applications to send packets to a group of IPv4 or IPv6 addresses,
should be supported. The multicast service should also support
delivery of traffic to all endpoints of a given VPN even if those
endpoints have not sent any control messages indicating the need to
receive that traffic. In other words, the multicast service should
be capable of delivering IP broadcast traffic in a virtual
topology. A solution for supporting VPN multicast and VPN broadcast
must not require that the underlying transport network support IP
multicast transmission service.
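One possible way to deliver VPN broadcast without multicast support
in the transport network is ingress replication, where the sending
end-system unicasts one encapsulated copy to every other end-system
that hosts a member of the VPN. The Python sketch below illustrates
this idea only; the membership table, the addresses, and the function
name are assumptions made for the example and are not defined by this
document.

```python
def replicate(vpn, source_host, payload, membership):
    """Return one (underlay_destination, vpn, payload) tuple per remote host.

    membership maps a VPN name to the set of end-system underlay addresses
    that host at least one endpoint of that VPN (an assumed data model).
    """
    remote_hosts = sorted(membership.get(vpn, set()) - {source_host})
    return [(host, vpn, payload) for host in remote_hosts]


membership = {
    "vpn-red": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "vpn-blue": {"192.0.2.2"},
}
copies = replicate("vpn-red", "192.0.2.1", b"broadcast", membership)
```

Each copy is an ordinary unicast packet in the transport network, so
no multicast state is needed outside the end-systems. The cost is
that the sender transmits one copy per remote end-system.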
In some deployments, Virtual Machines or applications are
configured to belong to an IP subnet. A network virtualization
solution should support grouping of virtual resources into IP
subnets regardless of whether the underlying implementation uses a
multi-access network or not.
4. Multi-Tenancy Requirements
One of the main goals of network virtualization is to provide
traffic and routing isolation between different virtual components
that share a common physical infrastructure. A collection of
virtual resources might provide external or internal services. For
example, such collection may serve an external "customer" or
internal "tenant" to whom a Service Provider provides service(s).
We will refer to a collection of virtual resources dedicated to a
process or application as a VPN, using the terminology of IP VPNs.
Any network virtualization solution has to ensure network
isolation (in the data plane and the control plane) among tenants or
applications sharing the same data center physical resources.
Typically VPNs that belong to different external tenants do not
communicate with each other directly but they should be allowed to
access shared services or shared network resources. It is also
common for tenants to require multiple distinct VPNs. In that
scenario traffic might need to cross VPN boundaries, subject to
access controls and/or routing policies.
A tenant should be able to create multiple VPNs. A network
virtualization solution should allow a VM or application end-point
to directly access multiple VPNs without a need to traverse a
gateway. It is often the case that SP infrastructure services are
provided to multiple tenants, for example voice-over-IP gateway
services or video-conferencing services for branch offices.
A network virtualization solution should support both isolated
VPNs and overlapping VPNs (often referred to as "extranets"), as
well as both any-to-any and hub-and-spoke topologies.
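In MPLS/BGP VPNs [RFC 4364], such topologies are commonly expressed
through the Route Targets (RTs) that each VRF exports and imports.
The following Python sketch shows how isolated, extranet, and
hub-and-spoke connectivity follow from RT policy alone. The Vrf class,
the helper function, and all RT values are hypothetical and chosen
only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vrf:
    name: str
    import_rts: set = field(default_factory=set)
    export_rts: set = field(default_factory=set)


def imports_routes_of(receiver, sender):
    """True if a route exported by sender carries an RT that receiver imports."""
    return bool(receiver.import_rts & sender.export_rts)


# Two isolated tenants plus a shared service reachable by both (extranet).
tenant_a = Vrf("tenant-a", {"64512:1", "64512:100"}, {"64512:1"})
tenant_b = Vrf("tenant-b", {"64512:2", "64512:100"}, {"64512:2"})
shared = Vrf("shared-service", {"64512:1", "64512:2"}, {"64512:100"})

# Hub-and-spoke: spokes import only hub routes, the hub imports spoke routes.
hub = Vrf("hub", {"64512:20"}, {"64512:10"})
spoke_1 = Vrf("spoke-1", {"64512:10"}, {"64512:20"})
spoke_2 = Vrf("spoke-2", {"64512:10"}, {"64512:20"})
```

With these policies the two tenants never learn each other's routes,
both exchange routes with the shared service, and the spokes can reach
each other only through the hub.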
5. Decoupling of Virtualized Networking from Physical
Infrastructure
One of the main goals in designing a large scale transport network
is to minimize the cost and complexity of its "fabric". It is often
done by delegating the virtual resource communication processing to
the network edge. Networks use various VPN technologies to isolate
disjoint groups of virtual resources. Some use VLANs as a VPN
technology, others use layer 3 based solutions, often with
proprietary control planes. Service Providers are interested in
interoperability and in openly documented protocols rather than in
proprietary solutions.
The transport network infrastructure should not maintain any
information that pertains to the virtual resources in end-systems.
Decoupling of virtualized networking from the physical
infrastructure has the following advantages: 1) provides better
scalability; 2) simplifies the design and operation; 3) reduces
network cost. It has been proven (in the Internet and in large BGP
IP VPN deployments) that moving complexity to the network edge while
keeping the network core simple has very good scaling properties.
There should be a total separation between the virtualized segments
(i.e., interfaces associated with virtual resources) and the
physical network (i.e., physical interfaces associated with network
infrastructure). This separation should include the separation of
the virtual network IP address space from the physical network IP
address space. The physical infrastructure addresses should be
routable in the underlying transport network, while the virtual
network addresses should be routable only in the virtual network.
Not only should the virtual network data plane be fully decoupled
from the physical network, but its control plane should be
decoupled as well.
6. Decoupling of Layer 3 Virtualization from Layer 2 Topology
The layer 3 approach to network virtualization dictates that the
virtualized communication should be routed, not bridged. The layer
3 virtualization solution should be decoupled from the layer 2
topology. Thus, there should be no dependency on VLANs and layer 2
broadcast.
In solutions that depend on layer 2 broadcast domains, host-to-host
communication is established based on flooding and data plane MAC
learning. Layer 2 MAC information has to be maintained on every
switch where a given VLAN is present. Even if some solutions are
able to minimize data plane MAC learning and/or unicast flooding,
they still rely on MAC learning at the network edge and on
maintaining the MAC addresses on every (edge) switch where the
layer 2 VPN is present.
The MAC addresses known to the guest OS in an end-system are not
relevant to IP services and introduce unnecessary overhead. Hence,
the MAC addresses associated with virtual resources should not be
used in the virtual layer 3 networks. Rather, only what is
significant to IP communication, namely the IP addresses of the
virtual machines and application endpoints, should be maintained by
the virtual networks.
7. Encapsulation of Virtual Payloads
In a layer 3 end-system virtual network, IP packets should reach
the first-hop router in one IP-hop, regardless of whether the
first-hop router is an end-system itself (i.e., a hypervisor/Host
OS) or it is an external (to end-system) device. The first-hop
router should always perform an IP lookup on every packet it
receives from a virtual machine or an application. The first-hop
router should encapsulate the packets and route them towards the
destination end-system.
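The IP lookup performed by the first-hop router can be sketched as a
longest-prefix match in the table of the VPN to which the sender
belongs. The Python example below is a minimal illustration: the
table layout (prefix mapped to a remote end-system address and a
label) and all values are assumptions, and a real forwarder would use
a more efficient data structure than a linear scan.

```python
import ipaddress


def lookup(vrf, destination):
    """Longest-prefix match of destination in one per-VPN table.

    vrf maps prefix strings to (remote_end_system, label) tuples.
    Returns the matching entry, or None if no prefix covers the address.
    """
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, entry in vrf.items():
        network = ipaddress.ip_network(prefix)
        if address.version != network.version or address not in network:
            continue
        if best is None or network.prefixlen > best[0].prefixlen:
            best = (network, entry)
    return None if best is None else best[1]


vrf_red = {
    "10.1.0.0/16": ("192.0.2.10", 100),
    "10.1.2.3/32": ("192.0.2.20", 200),
}
```

The result of the lookup gives the first-hop router what it needs to
encapsulate the packet: the transport address of the destination
end-system and the label that identifies the virtual network there.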
In order to scale the transport networks, the virtual network
payloads must be encapsulated with headers that are routable (or
switchable) in the physical network infrastructure. The IP
addresses of the virtual resources are not to be advertized within
the physical infrastructure address space.
The encapsulation (and decapsulation) function should be
implemented on a device as close to virtualized resources as
possible. Since the hypervisors in the end-systems are the devices
at the network edge, they are the optimal location for the
encap/decap functionality. A device implementing the encap/decap
functionality acts as the first-hop router in the virtual topology.
The network virtualization solution should also support deployments
where it is not possible or not desirable to implement the virtual
payload encapsulation in the hypervisor/Host OS. In such
deployments, encap/decap functionality may be implemented in an
external device. The external device implementing encap/decap
functionality should be as close as possible to the end-system
itself. The same network virtualization solution should support
deployments with both internal (in a hypervisor) and external
(outside of a hypervisor) encap/decap devices.
Whenever the virtual forwarding functionality is implemented in an
external device, the virtual service itself must be delivered to an
end-system such that switching elements connecting the end-system
to the encap/decap device are not aware of the virtual topology.
MPLS/VPN technology based on [RFC 4364] specifies that different
encapsulation methods could be used for connecting PE routers,
namely Label Switched Paths (LSPs), IP tunneling, and GRE tunneling.
If LSPs are used in the transport network, they could be signaled
with LDP, in which case host (/32) routes to all PE routers must be
propagated throughout the network, or with RSVP-TE, in which case a
full mesh of RSVP-TE tunnels is required. If the transport network
is only IP-capable, then MPLS in IP or MPLS in GRE [RFC 4023]
encapsulation could be used. Other transport layers such as 802.1ah
might also need to be supported.
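As an illustration of the MPLS in GRE case, the Python sketch below
builds the bytes that sit between the outer IP header and the virtual
payload: a basic GRE header whose Protocol Type is the MPLS unicast
Ethertype, followed by one MPLS label stack entry. The function names
are hypothetical, the GRE header is assumed to carry no optional
fields (checksum, key, or sequence number), and the outer IP header
is deliberately left out.

```python
import struct

GRE_PROTOCOL_MPLS_UNICAST = 0x8847  # Ethertype for MPLS unicast


def mpls_label_entry(label, traffic_class=0, bottom_of_stack=True, ttl=64):
    """Pack one 32-bit label stack entry: label(20) TC(3) S(1) TTL(8)."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= traffic_class < 8 or not 0 <= ttl < 256:
        raise ValueError("traffic class or TTL out of range")
    word = (label << 12) | (traffic_class << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)


def mpls_in_gre(label, inner_ip_packet):
    """Prefix the payload with a basic GRE header and one MPLS label.

    The outer IP header toward the remote end-system is not built here.
    """
    gre_header = struct.pack("!HH", 0, GRE_PROTOCOL_MPLS_UNICAST)
    return gre_header + mpls_label_entry(label) + inner_ip_packet


frame = mpls_in_gre(16, b"inner-ip-packet")
```

Only the outer IP header, which this sketch does not build, is routed
by the transport network. The label is interpreted by the receiving
end-system alone, so the virtual addresses in the inner packet stay
out of the physical infrastructure's routing.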
8. Optimal Forwarding of Traffic
The network virtualization solutions that optimize for the maximum
utilization of compute and storage resources require that those
resources can be located anywhere in the network. The physical and
logical spreading of appliances and workloads implies a very
significant increase in the infrastructure bandwidth consumption.
Hence, it is important that virtualized networking solutions are
efficient in terms of traffic forwarding and ensure that packets
traverse the transport network only once.
It must also be possible to send traffic directly from one
end-system to another end-system without traversing a midpoint
router.
9. Inter-operability with Existing MPLS/BGP VPNs
Service Providers want to tie their server-based offerings to their
MPLS/BGP VPN services. MPLS/BGP VPNs provide secure and latency-
optimized WAN connectivity to the virtualized resources in the SP's
data center. MPLS/BGP VPN customers may require simultaneous access
to resources in both the SP's and their own data centers. The
service provider-based VPN access can provide additional value
compared with public Internet access, such as security, QoS, OAM,
multicast service, VoIP service, video conferencing, and wireless
connectivity.
Service Providers want to "spin up" the L3VPN access to data center
VPNs as dynamically as the spin up of compute and other virtualized
resources.
The network virtualization solution should be fully inter-operable
with MPLS/BGP VPNs, including Inter-AS MPLS/BGP VPN Options A, B,
or C [RFC 4364]. MPLS/BGP VPN technology is widely supported on
routers and other appliances. BGP/MPLS VPN-capable network devices
should be able to participate directly in a virtual network that
spans end-systems. The network devices should be able to
participate in isolated collections of end-systems, i.e., in
isolated VPNs, as well as in overlapping VPNs (called "extranets"
in BGP/MPLS VPN terminology).
When connecting an end-system VPN with other services/networks, it
should not be necessary to advertize the specific host routes but
rather the aggregated routing information. A BGP/MPLS VPN-capable
router or appliance can be used to aggregate a VPN's IP routing
information and advertize the aggregated prefixes. The aggregated
prefixes should be advertized with the router/appliance IP address
as the BGP next-hop and with a locally assigned aggregate 20-bit
label. The aggregate label should trigger a destination IP lookup in
its corresponding VRF for all packets entering the virtual network.
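The aggregation step can be sketched in Python with the standard
ipaddress module. The function name, the dictionary fields, and the
example addresses are assumptions for illustration. Note that
collapse_addresses() merges only prefixes that exactly cover the
given host routes, whereas a deployed gateway might instead advertise
a configured covering prefix.

```python
import ipaddress


def aggregate_advertisements(host_routes, gateway_address, aggregate_label):
    """Collapse host routes and attach the gateway as next hop.

    Returns one dict per aggregated prefix, all carrying the same
    locally assigned aggregate label (which must fit in 20 bits).
    """
    if not 0 <= aggregate_label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    networks = [ipaddress.ip_network(route) for route in host_routes]
    return [
        {"prefix": str(network), "next_hop": gateway_address,
         "label": aggregate_label}
        for network in ipaddress.collapse_addresses(networks)
    ]


host_routes = ["10.1.0.0/32", "10.1.0.1/32", "10.1.0.2/32", "10.1.0.3/32",
               "10.1.0.8/32"]
advertisements = aggregate_advertisements(host_routes, "192.0.2.1", 3001)
```

Here five host routes become two advertised prefixes, both pointing at
the gateway. A packet arriving with the aggregate label is then looked
up by destination IP address in the VRF to find the specific host.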
The inter-connection of end-system VPNs with traditional VPNs
requires an integrated control plane and unified orchestration of
network and end-system resources.
10. IP Mobility
Another reason for network virtualization is the need to support
IP mobility. IP mobility means that the IP addresses used for
communication within or between applications can be located anywhere
across the network. Using a virtual topology, i.e., abstracting the
externally visible network address from the underlying
infrastructure address, is an effective way to solve the IP mobility
problem.
IP mobility arises when a device physically moves (e.g., a roaming
wireless device) or a workload is transferred from one physical
server/appliance to another. IP mobility requires preserving the
device's active network connections (e.g., TCP and higher-level
sessions). Such mobility is also referred to as "live" migration
with respect to a Virtual Machine. IP mobility is highly desirable
for many reasons such as efficient and flexible resource sharing,
data center migration, disaster recovery, server redundancy, or
service bursting.
To accommodate live mobility of a virtual machine (or a device), it
is desirable to assign to it a permanent IP address that remains
with the VM/device after it moves. When dealing with IP-only
applications it is not only sufficient but optimal to forward the
traffic based on layer 3 rather than on layer 2 information. The
MAC addresses of devices or applications are irrelevant to IP
services and introduce unnecessary overhead and complications when
devices or VMs move (e.g., when a VM moves between physical
servers, the MAC learning tables in the switches must be updated;
also, the VM's MAC address might need to change in its new
location). In an IP-based network virtualization solution, a
device or workload move should be handled by an IP route
advertisement.
IP mobility has to be transparent to applications and any external
entity interacting with the applications. This implies that the
network connectivity restoration time is critical. Transport
sessions can typically survive several seconds of disruption;
however, applications may have sub-second latency requirements for
their correct operation.
To minimize the disruption to established communication during
workload or device mobility, the control plane of a network
virtualization solution should be able to differentiate between the
activation of a workload in a new location and the advertisement of
its route to the network. This will enable the remote end-points to
update their routing tables prior to the workload's migration and
will allow the traffic to be tunneled via the workload's old
location.
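The effect of treating route advertisement and workload activation as
separate events can be sketched in Python as follows. The class and
method names are hypothetical, and the relay behavior (an end-system
that receives traffic for a workload it does not currently host
tunnels it to the end-system that does) is an assumption of this
sketch rather than a mechanism defined by this document.

```python
class MobilityRoute:
    """Tracks one workload address during a move (hypothetical model)."""

    def __init__(self, address, location):
        self.address = address
        self.active_at = location       # where the workload currently runs
        self.advertised_at = location   # where remote end-points send traffic

    def advertise(self, location):
        """Control plane event: the route now points at location."""
        self.advertised_at = location

    def activate(self, location):
        """Workload event: the workload now runs at location."""
        self.active_at = location

    def delivery_path(self):
        """End-systems a packet visits before it reaches the workload."""
        if self.advertised_at == self.active_at:
            return [self.active_at]
        # Route and workload disagree: the end-system that receives the
        # traffic relays it through a tunnel to the active location.
        return [self.advertised_at, self.active_at]


route = MobilityRoute("10.1.2.3", "host-a")
route.activate("host-b")          # workload moved, route not yet updated
path_during_move = route.delivery_path()
route.advertise("host-b")         # route catches up with the workload
path_after_move = route.delivery_path()
```

In this run the workload is activated on the new end-system before the
route is updated, so traffic is relayed via the old location until the
advertisement arrives. Calling advertise() before activate() models
the other order, in which remote end-points update their routing
tables ahead of the migration.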
11. BGP Requirements in a Virtualized Environment
11.1. BGP Convergence and Routing Consistency
BGP was designed to carry a very large amount of routing information,
but it is not a very fast converging protocol. In addition, the
routing protocols, including BGP, have traditionally favored
convergence (i.e., responsiveness to route change due to failure or
policy change) over routing consistency. Routing consistency means
that a router forwards a packet strictly along the path adopted by
the upstream routers. When responsiveness is favored, a router
applies a received update immediately to its forwarding table
before propagating the update to other routers, including those
that potentially depend upon the outcome of the update. The route
change responsiveness comes at the cost of routing blackholes and
loops.
Routing consistency in virtualized environments is important
because multiple workloads can be simultaneously moved between
different physical servers due to maintenance activities, for
example. If packets sent by the applications that are being moved
are dropped (because they do not follow a live path), the active
network connections will be dropped. To minimize the disruption to
the established communications during VM migration or device
mobility, the live path continuity is required.
11.1.1. BGP IP Mobility Requirements
In IP mobility, the network connectivity restoration time is
critical. In fact, Service Provider networks already use routing
and forwarding plane techniques that support fast failure
restoration by pre-installing a backup path to a given destination.
These techniques allow traffic to be forwarded almost continuously
using an indirect forwarding path or a tunnel to a given destination
and, hence, are referred to as "local repair". The traffic path is
restored locally at the destination's old location while the
network converges to a backup path. Eventually, the network
converges to an optimal path and bypasses the local repair.
BGP assists in the local repair techniques by advertizing multiple
paths, and not only the best path, to a given destination.
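The benefit of having more than one path installed can be shown with
a minimal Python sketch. The function name and the data model (an
ordered list of next hops with the best path first, and a set of next
hops currently known to be reachable) are assumptions made for
illustration.

```python
def select_next_hop(installed_paths, reachable):
    """Return the first installed next hop that is still reachable.

    installed_paths is ordered best path first; the remaining entries
    are pre-installed backups learned from additional BGP paths.
    """
    for next_hop in installed_paths:
        if next_hop in reachable:
            return next_hop
    return None


installed_paths = ["192.0.2.10", "192.0.2.20"]
before_failure = select_next_hop(installed_paths, {"192.0.2.10", "192.0.2.20"})
after_failure = select_next_hop(installed_paths, {"192.0.2.20"})
```

Because the backup is already in the forwarding table, the switch to
it is a local decision that does not have to wait for BGP to
reconverge.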
11.2. Optimizing Route Distribution
When virtual networks are triggered based on the IP communication,
the Route Target Constraint extension [RFC 4684] of BGP should be
used to optimize the route distribution for sparse virtual network
events. This technique ensures that only those VPN forwarders that
have local participants in a particular data plane event receive
its routing information. This also decreases the total load on the
upstream BGP speakers.
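The outcome of Route Target Constraint can be modeled in Python as a
filter applied per VPN forwarder. This sketch shows only the
resulting route distribution; it does not model the BGP exchange of
RT membership information specified in [RFC 4684]. The function name,
the forwarder names, and the RT values are hypothetical.

```python
def distribute(routes, interests):
    """Return, per forwarder, the prefixes whose RTs it has asked for.

    routes is a list of (prefix, set_of_route_targets) tuples.
    interests maps a forwarder name to the set of RTs it imports.
    """
    return {
        forwarder: [prefix for prefix, rts in routes if rts & wanted]
        for forwarder, wanted in interests.items()
    }


routes = [
    ("10.1.0.0/24", {"64512:1"}),
    ("10.2.0.0/24", {"64512:2"}),
]
interests = {
    "server-1": {"64512:1"},
    "server-2": {"64512:1", "64512:2"},
    "server-3": set(),
}
received = distribute(routes, interests)
```

A forwarder with no local participants in a VPN, such as server-3
here, receives none of that VPN's routes.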
12. Security Considerations
This document presents the requirements for end-system MPLS/BGP
VPNs. The security considerations for specific solutions will be
documented in the relevant documents.
13. IANA Considerations
This document contains no new IANA considerations.
14. Normative References
[RFC 4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
Networks (VPNs)", RFC 4364, February 2006.
[RFC 4023] Worster, T., Rekhter, Y. and E. Rosen, "Encapsulating
MPLS in IP or Generic Routing Encapsulation (GRE)", RFC 4023, March
2005.
[RFC 4684] Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk,
R., Patel, K. and J. Guichard, "Constrained Route Distribution for
Border Gateway Protocol/Multiprotocol Label Switching (BGP/MPLS)
Internet Protocol (IP) Virtual Private Networks (VPNs)", RFC 4684,
November 2006.
15. Informative References
[RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
16. Authors' Addresses
Maria Napierala
AT&T
200 Laurel Avenue
Middletown, NJ 07748
Email: mnapierala@att.com
Luyuan Fang
Cisco Systems
111 Wood Avenue South
Iselin, NJ 08830, USA
Email: lufang@cisco.com
17. Acknowledgements
The authors would like to thank Pedro Marques for his comments and
input.