Network Working Group L. Dunbar
Internet Draft Futurewei
Intended status: Informational A. Malis
Expires: September 3, 2023 Malis Consulting
C. Jacquenet
Orange
M. Toy
Verizon
K. Majumdar
Microsoft
March 3, 2023
Dynamic Networks to Hybrid Cloud DCs: Problem Statement and
Mitigation Practices
draft-ietf-rtgwg-net2cloud-problem-statement-20
Abstract
This document describes the network-related problems enterprises
face today when interconnecting their branch offices with dynamic
workloads in third-party data centers (a.k.a. Cloud DCs) and some
mitigation practices. There can be many problems associated with
connecting to or among Cloud DCs; the Net2Cloud problem statements
are mainly for enterprises that already have traditional MPLS
services and are interested in leveraging those networks (instead of
altogether abandoning them). Other problems are out of the scope of
this document.
This document also describes the mitigation practices for getting
around the identified problems and discusses the benefits vs. costs
of the practices.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
xxx, et al. [Page 1]
Internet-Draft Net2Cloud Problem Statement
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on September 3, 2023.
Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Table of Contents
1. Introduction...................................................3
2. Definition of terms............................................3
3. Issues and Mitigation Methods of Connecting to Cloud DCs.......4
3.1. Increased BGP errors and Mitigation Methods...............4
3.2. Site failures and Methods to Minimize Impacts.............5
3.3. Optimal Paths to Cloud DC locations.......................6
3.4. Network Issues for 5G Edge Clouds and Mitigation Methods..7
3.5. Security Issues and Methods to Minimize impacts...........7
3.6. DNS Practices for Hybrid Workloads........................8
3.7. NAT Practice for Accessing Cloud Services.................9
3.8. Cloud Discovery Practices................................10
4. Dynamic Interconnecting Enterprise Sites with Cloud DCs.......10
Dunbar, et al. [Page 2]
Internet-Draft Net2Cloud Problem Statement
4.1. Sites to Cloud DC........................................11
4.2. Inter-Cloud Connection...................................13
4.3. Extending MPLS-based VPNs to Hybrid Cloud DCs............14
5. Methods to Scale IPsec tunnels to Cloud DCs...................15
5.1. Scaling Issues with IPsec Tunnels........................15
5.2. Poor performance when overlay public internet............16
6. Requirements for Dynamic Cloud Data Center VPNs...............16
7. Security Considerations.......................................17
8. IANA Considerations...........................................17
9. References....................................................17
9.1. Normative References.....................................17
9.2. Informative References...................................18
10. Acknowledgments..............................................18
1. Introduction
With the advent of widely available third-party cloud DCs and
services in diverse geographic locations and the advancement of
tools for monitoring and predicting application behaviors, it is
very attractive for enterprises to instantiate applications and
workloads in locations that are geographically closest to their end-
users. Such proximity can improve end-to-end latency and overall
user experience. Conversely, an enterprise can easily shutdown
applications and workloads whenever end-users are in motion (thereby
modifying the networking connection of subsequently relocated
applications and workloads).
Key characteristics of Cloud Services are on-demand, scalable,
highly available, and usage-based billing. Cloud Services, such as,
compute, storage, network functions (most likely virtual), third
party managed applications, etc. are usually hosted and managed by
third parties Cloud Operators. Here are some examples of Cloud
network functions: Virtual Firewall services, Virtual private
network services, Virtual PBX services including voice and video
conferencing systems, etc. Cloud Data Center (DC) is shared
infrastructure that hosts the Cloud Services to many customers.
2. Definition of terms
Cloud DC: Third party Data Centers that usually host applications
and workload owned by different organizations or
tenants.
Dunbar, et al. [Page 3]
Internet-Draft Net2Cloud Problem Statement
Controller: Used interchangeably with SD-WAN controller to manage
SD-WAN overlay path creation/deletion and monitoring the
path conditions between two or more sites.
DSVPN: Dynamic Smart Virtual Private Network. DSVPN is a secure
network that exchanges data between sites without
needing to pass traffic through an organization's
headquarter virtual private network (VPN) server or
router.
Heterogeneous Cloud: applications and workloads split among Cloud
DCs owned or managed by different operators.
Hybrid Clouds: Hybrid Clouds refers to an enterprise using its own
on-premises DCs in addition to Cloud services provided
by one or more cloud operators. (e.g. AWS, Azure,
Google, Salesforces, SAP, etc).
VPC: Virtual Private Cloud is a virtual network dedicated to
one client account. It is logically isolated from other
virtual networks in a Cloud DC. Each client can launch
his/her desired resources, such as compute, storage, or
network functions into his/her VPC. Most Cloud
operators' VPCs only support private addresses, some
support IPv4 only, others support IPv4/IPv6 dual stack.
3. Issues and Mitigation Methods of Connecting to Cloud DCs
There are many problems associated with connecting to hybrid Cloud
Services, many of which are out of the IETF scope. This section is
to identify some of the high-level problems that can be addressed by
IETF, especially by Routing area. Other problems are out of the
scope of this document. By no means has this section covered all
problems for connecting to Hybrid Cloud Services, e.g., difficulty
in managing cloud spending is not discussed here.
3.1. Increased BGP errors and Mitigation Methods
Traditional network service providers usually have prior negotiated
peering policies with their BGP peers over fixed interfaces. Cloud
GWs need to peer with more variety of parties, via private circuits
Dunbar, et al. [Page 4]
Internet-Draft Net2Cloud Problem Statement
or IPsec over public internet. Many of those peering parties may not
be traditional network service providers. Their BGP configurations
practices might not be consistent, and some are done by less
experienced personnel. All those can contribute to increased BGP
peering errors, such as capability mismatch, BGP ceasing
notification, unwanted route leaks, missing Keepalives, etc.
Capability mismatch can cause BGP sessions not established properly.
If a BGP speaker receives from its peer a capability that it does
not itself support or recognize, it MUST ignore that capability and
the BGP session MUST NOT be terminated per RFC5492. When receiving a
BGP UPDATE with a malformed attribute, the revised BGP error
handling procedure [RFC7606] should be followed instead of session
resetting.
Many Cloud DCs don't support multi hop eBGP peering with external
devices. To get around this limitation, it is necessary for
enterprises GWs to establish IP tunnels to the Cloud GWs to form IP
adjacency.
Some Cloud DC eBGP peering only supports limited number of routes
from external entities. To get around this limitation, on-premises
DCs need to set up default routes to be exchanged with the Cloud DC
eBGP peers. When inbound routes exceed the maximum routes threshold
for a peer, the current common practice is generating out of band
alerts (e.g., Syslog) via management system to the peer if the BGP
session is not ceased. If the BGP session is terminated when
exceeding the maximum routes from a peer, cease notification
messages [RFC 4486] must be sent.
3.2. Site failures and Methods to Minimize Impacts
Site failures include, but not limited to, a site capacity
degradation or entire site going down caused by a variety of
reasons, such as fiber cut connecting to the site or among pods
within the site, cooling failures, insufficient backup power, cyber
threats attacks, too many changes outside of the maintenance window,
etc. Fiber-cut is not uncommon within a Cloud site or between sites.
As described in RFC7938, Cloud DC BGP might not have an IGP to route
around link/node failures within the ASes.
When those failure events happen, the Cloud DC GW which is visible
to clients are running fine. Therefore, the Client GW can't use BFD
to detect the failures.
Dunbar, et al. [Page 5]
Internet-Draft Net2Cloud Problem Statement
When a site capacity degrades or goes dark, there are massive
numbers of routes needing to be changed.
The large number of routes switching over to another site can also
cause overloading that triggers more failures.
In addition, the routes (IP addresses) in a Cloud DC cannot be
aggregated nicely, triggering very large number of BGP UPDATE
messages when a failure occurs.
It might be more effective to do mass reroute, similar to EVPN
[RFC7432] defined mass withdraw mechanism to signal a large number
of routes being changed to remote PE nodes as quickly as possible.
3.3. Optimal Paths to Cloud DC locations
Many applications have multiple instances instantiated in different
Cloud DCs. The current state of the art solutions is typically based
on DNS assisted with load balancer by responding a FQDN (Fully
Qualified Domain Name) inquiry with an IP address of the closest or
lowest cost DC that can reach the instance. Here are some problems
associated with DNS based solutions:
- Dependent on client behavior
- Client can cache results indefinitely
- Client may not receive service even though there are
servers available (before cache timeout) in other Cloud
DCs.
- No inherent leverage of proximity information present in the
network (routing) layer, resulting in loss of performance.
- Inflexible traffic control:
Local DNS resolver become the unit of traffic management. This
requires DNS to receive periodical update of the network
condition, which is difficult.
To address the problems listed above, ANYCAST addresses can be
utilized so that network proximity and conditions can be inherently
considered in optimal path selection.
Dunbar, et al. [Page 6]
Internet-Draft Net2Cloud Problem Statement
3.4. Network Issues for 5G Edge Clouds and Mitigation Methods
The 5G edge clouds may host edge computing servers (virtual or
physical) for Ultra-low latency services that must be near the UEs
(User equipment). Those edge computing applications have low latency
connections to the UEs and regular connections to backend servers or
databases in other locations.
The low latency services traffic to/from the edge Clouds are
transported through the 5G Local Data Networks (LDN) and 5G UPFs to
the UEs. The LDN's ingress routers, directly connected to the UPFs,
might be co-located with 5G Core functions in the edge Cloud data
centers. The 5G Core functions include Radio Control Functions,
Session Management Functions (SMF), Access Mobility Functions (AMF),
User Plane Functions (UPF), etc.
Here are some network problems with connecting the services in the
5G Edge Cloud DCs:
1) The difference of routing distances to multiple server
instances in different edge Cloud is relatively small.
2) Capacity status at the Edge Cloud DC might play a bigger role
for E2E performance.
3) Source (UEs) can ingress from different LDN Ingress routers
due to mobility.
To get around those problems, the ingress routers can incorporate
the destination site's capabilities with the routing distance in
computing the optimal paths.
3.5. Security Issues and Methods to Minimize impacts
There are many aspects of security issues in terms of networking to
clouds:
- Service instances in Cloud DCs are connected to users
(enterprises) via Public IP ports which are exposed to the
following security risks:
a) Potential DDoS attack to the ports facing the untrusted
network (e.g., the public internet), which may propagate to the
cloud edge resources. To mitigate such security risk, it is
Dunbar, et al. [Page 7]
Internet-Draft Net2Cloud Problem Statement
necessary for the ports facing internet to enable Anti-DDoS
features.
b) Potential risk of augmenting the attack surface with inter-
Cloud DC connection by means of identity spoofing, man-in-the-
middle, eavesdropping or DDoS attacks. One example of
mitigating such attacks is using DTLS to authenticate and
encrypt MPLS-in-UDP encapsulation (RFC 7510).
- Potential attacks from service instances within the cloud. For
example, data breaches, compromised credentials, and broken
authentication, hacked interfaces and APIs, account hijacking.
- When IPsec tunnels established from enterprise on-premises CPEs
are terminated at the Cloud DC gateway where the workloads or
applications are hosted, traffic to/from an enterprise's
workload can be exposed to others behind the data center
gateway (e.g., exposed to other organizations that have
workloads in the same data center).
To ensure that traffic to/from workloads is not exposed to
unwanted entities, IPsec tunnels may go all the way to the
workload (servers, or VMs) within the DC.
Many Cloud operators offer monitoring services for data stored in
Clouds, such as AWS CloudTrail, Azure Monitor, and many third-party
monitoring tools to improve visibility to data stored in Clouds.
More diligent security procedures need to be considered to mitigate
all those security issues.
3.6. DNS Practices for Hybrid Workloads
DNS name resolution is essential for on-premises and cloud-based
resources. For customers with hybrid workloads, which include on-
premises and cloud-based resources, extra steps are necessary to
configure DNS to work seamlessly across both environments.
Cloud operators have their own DNS to resolve resources within their
Cloud DCs and to well-known public domains. Cloud's DNS can be
configured to forward queries to customer managed authoritative DNS
servers hosted on-premises and to respond to DNS queries forwarded
by on-premises DNS servers.
Dunbar, et al. [Page 8]
Internet-Draft Net2Cloud Problem Statement
For enterprises utilizing Cloud services by different cloud
operators, it is necessary to establish policies and rules on
how/where to forward DNS queries. When applications in one Cloud
need to communicate with applications hosted in another Cloud, there
could be DNS queries from one Cloud DC being forwarded to the
enterprises' on-premises DNS, which in turn be forwarded to the DNS
service in another Cloud. Configuration can be complex depending on
the application communication patterns.
However, even with carefully managed policies and configurations,
collisions can still occur. If you use an internal name like .cloud
and then want your services to be available via or within some other
cloud provider which also uses .cloud, then collisions might occur.
Therefore, it is better to use the global domain name even when an
organization does not make all its namespace globally resolvable. An
organization's globally unique DNS can include subdomains that
cannot be resolved outside certain restricted paths, zones that
resolve differently based on the origin of the query, and zones that
resolve the same globally for all queries from any source.
Globally unique names do not equate to globally resolvable names or
even global names that resolve the same way from every perspective.
Globally unique names can prevent any possibility of collisions at
present or in the future, and they make DNSSEC trust manageable.
Consider using a registered and fully qualified domain name (FQDN)
from global DNS as the root for enterprise and other internal
namespaces.
3.7. NAT Practice for Accessing Cloud Services
Cloud resources, such as VM instances, are usually assigned private
IP addresses. By configuration, some private subnets can have the
NAT function to reach out to external networks, and some private
subnets are internal to Cloud only.
Different Cloud operators support different levels of NAT functions.
For example, AWS NAT Gateway does not currently support connections
towards, or from VPC Endpoints, VPN, AWS Direct Connect, or VPC
Peering. https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-
nat-gateway.html#nat-gateway-other-services. AWS Direct
Connect/VPN/VPC Peering does not currently support any NAT
functionality.
Google's Cloud NAT allows Google Cloud virtual machine (VM)
instances without external IP addresses and private Google
Kubernetes Engine (GKE) clusters to connect to the Internet. Cloud
Dunbar, et al. [Page 9]
Internet-Draft Net2Cloud Problem Statement
NAT implements outbound NAT in conjunction with a default route to
allow instances to reach the Internet. It does not implement inbound
NAT. Hosts outside the VPC network can only respond to established
connections initiated by instances inside the Google Cloud; they
cannot initiate new connections to Cloud instances via NAT.
For enterprises with applications running in different Cloud DCs,
proper configuration of NAT must be performed in Cloud DCs and their
on-premises DC.
3.8. Cloud Discovery Practices
One of the concerns of using Cloud services is not aware of where
the resource is located, especially Cloud operators can move
application instances from one place to another. When applications
in Cloud communicate with on-premises applications, it may not be
clear where the Cloud applications are located or to which VPCs they
belong.
Being able to detect Cloud services location can help on-premises
gateways (routers) to switch the services to a more optimal site
when the current cloud site encounters failures or degradation. A
significant difference is that cloud discovery uses the cloud
vendor's API to extract data on your cloud services rather than the
direct access used in scanning your on-premises infrastructure.
For enterprises that instantiate virtual routers in Cloud DCs,
metadata can be attached (e.g., GENEVE header or IPv6 optional
header) to indicate Geo-location of the Cloud DCs.
4. Dynamic Interconnecting Enterprise Sites with Cloud DCs
For many enterprises with established VPNs (e.g., MPLS-based L2VPN
or L3VPN) interconnecting branch offices & on-premises data centers,
connecting to Cloud services will be mixed of different types of
networks. When an enterprise's existing VPN service providers do not
have direct connections to the desired cloud DCs that the enterprise
prefers to use, the enterprise faces additional infrastructure and
operational costs to utilize the Cloud services.
This section describes practices to connect to Cloud services.
Dunbar, et al. [Page 10]
Internet-Draft Net2Cloud Problem Statement
4.1. Sites to Cloud DC
Most Cloud operators offer some type of network gateway through
which an enterprise can reach their workloads hosted in the Cloud
DCs. For example, AWS (Amazon Web Services) offers the following
options to reach workloads in AWS Cloud DCs:
- AWS Internet gateway allows communication between instances in
AWS VPC and the internet.
- AWS Virtual gateway (vGW) where IPsec tunnels [RFC6071] are
established between an enterprise's own gateway and AWS vGW, so
that the communications between those gateways can be secured
from the underlay (which might be the public Internet).
- AWS Direct Connect, which allows enterprises to purchase direct
connect from network service providers to get a private leased
line interconnecting the enterprises gateway(s) and the AWS
Direct Connect routers. In addition, an AWS Transit Gateway can
be used to interconnect multiple VPCs in different Availability
Zones. AWS Transit Gateway acts as a hub that controls how
traffic is forwarded among all the connected networks which act
like spokes.
Microsoft's ExpressRoute allows extension of a private network to
any of the Microsoft cloud services, including Azure and Office365.
ExpressRoute is configured using Layer 3 routing. Customers can opt
for redundancy by provisioning dual links from their location to two
Microsoft Enterprise edge routers (MSEEs) located within a third-
party ExpressRoute peering location. The BGP routing protocol is
then setup over WAN links to provide redundancy to the cloud. This
redundancy is maintained from the peering data center into
Microsoft's cloud network.
Google's Cloud Dedicated Interconnect offers similar network
connectivity options as AWS and Microsoft. One distinct difference,
however, is that Google's service allows customers access to the
entire global cloud network by default. It does this by connecting
the on-premises network with the Google Cloud using BGP and Google
Cloud Routers to provide optimal paths to the different regions of
the global cloud infrastructure.
Figure below shows an example of some of a tenant's workloads are
accessible via a virtual router connected by AWS Internet Gateway;
Dunbar, et al. [Page 11]
Internet-Draft Net2Cloud Problem Statement
some are accessible via AWS vGW, and others are accessible via AWS
Direct Connect.
Different types of access require different level of security
functions. Sometimes it is not visible to end customers which type
of network access is used for a specific application instance. To
get better visibility, separate virtual routers (e.g., vR1 & vR2)
can be deployed to differentiate traffic to/from different cloud
GWs. It is important for some enterprises to be able to observe the
specific behaviors when connected by different connections.
Customer Gateway can be customer owned router or ports physically
connected to AWS Direct Connect GW.
+------------------------+
| ,---. ,---. |
| (TN-1 ) ( TN-2)|
| `-+-' +---+ `-+-' |
| +----|vR1|----+ |
| ++--+ |
| | +-+----+
| | /Internet\ For External
| +-------+ Gateway +----------------------
| \ / to reach via Internet
| +-+----+
| |
| ,---. ,---. |
| (TN-1 ) ( TN-2)|
| `-+-' +---+ `-+-' |
| +----|vR2|----+ |
| ++--+ |
| | +-+----+
| | / virtual\ For IPsec Tunnel
| +-------+ Gateway +----------------------
| | \ / termination
| | +-+----+
| | |
| | +-+----+ +------+
| | / \ For Direct /customer\
| +-------+ Gateway +----------+ gateway |
| \ / Connect \ /
| +-+----+ +------+
| |
+------------------------+
Figure 1: Examples of Multiple Cloud DC connections.
Dunbar, et al. [Page 12]
Internet-Draft Net2Cloud Problem Statement
4.2. Inter-Cloud Connection
The connectivity options to Cloud DCs described in the previous
section are for reaching Cloud providers' DCs, but not between cloud
DCs. When applications in AWS Cloud need to communicate with
applications in Azure, today's practice requires a third-party
gateway (physical or virtual) to interconnect the AWS's Layer 2
DirectConnect path with Azure's Layer 3 ExpressRoute.
Enterprises can also instantiate their own virtual routers in
different Cloud DCs and administer IPsec tunnels among them, which
by itself is not a trivial task. Or by leveraging open-source VPN
software such as strongSwan, you create an IPSec connection to the
Azure gateway using a shared key. The StrongSwan instance within AWS
not only can connect to Azure but can also be used to facilitate
traffic to other nodes within the AWS VPC by configuring forwarding
and using appropriate routing rules for the VPC.
Most Cloud operators, such as AWS VPC or Azure VNET, use non-
globally routable CIDR from private IPv4 address ranges as specified
by RFC1918. To establish IPsec tunnel between two Cloud DCs, it is
necessary to exchange Public routable addresses for applications in
different Cloud DCs.
In summary, here are some approaches, available now to interconnect
workloads among different Cloud DCs:
a) Utilize Cloud DC provided inter/intra-cloud connectivity
services (e.g., AWS Transit Gateway) to connect workloads
instantiated in multiple VPCs. Such services are provided with
the cloud gateway to connect to external networks (e.g., AWS
DirectConnect Gateway).
b) Hairpin all traffic through the customer gateway, meaning all
workloads are directly connected to the customer gateway, so
that communications among workloads within one Cloud DC must
traverse through the customer gateway.
c) Establish direct tunnels among different VPCs (AWS' Virtual
Private Clouds) and VNET (Azure's Virtual Networks) via
client's own virtual routers instantiated within Cloud DCs.
DMVPN (Dynamic Multipoint Virtual Private Network) or DSVPN
(Dynamic Smart VPN) techniques can be used to establish direct
Multi-point-to-Point or multi-point-to multi-point tunnels
among those client's own virtual routers.
Dunbar, et al. [Page 13]
Internet-Draft Net2Cloud Problem Statement
Approach a) usually does not work if Cloud DCs are owned and managed
by different Cloud providers.
Approach b) creates additional transmission delay plus incurring
cost when exiting Cloud DCs.
For the Approach c), DMVPN or DSVPN use NHRP (Next Hop Resolution
Protocol) [RFC2735] so that spoke nodes can register their IP
addresses & WAN ports with the hub node. The IETF ION
(Internetworking over NBMA (non-broadcast multiple access) WG
standardized NHRP for connection oriented NBMA network (such as ATM)
network address resolution more than two decades ago.
There are many differences between virtual routers in Public Cloud
DCs and the nodes in an NBMA network. NHRP cannot be used for
registering virtual routers in Cloud DCs unless an extension of such
protocols is developed for that purpose, e.g. taking NAT or dynamic
addresses into consideration. Therefore, DMVPN and/or DSVPN cannot
be used directly for connecting workloads in hybrid Cloud DCs.
4.3. Extending MPLS-based VPNs to Hybrid Cloud DCs
Traditional MPLS-based VPNs have been widely deployed as an
effective way to support businesses and organizations that require
network performance and reliability. MPLS shifted the burden of
managing a VPN service from enterprises to service providers. The
CPEs attached to MPLS VPNs are simpler and less expensive because
they do not need to manage routes to remote sites; they pass all
outbound traffic to the MPLS VPN PEs to which the CPEs are attached
(albeit multi-homing scenarios require more processing logic on
CPEs). MPLS has addressed the problems of scale, availability, and
fast recovery from network faults, and incorporated traffic-
engineering capabilities.
However, traditional MPLS-based VPN solutions are sub-optimized for
dynamically connecting to workloads/applications in cloud DCs. The
Provider Edge (PE) nodes of the enterprise's VPNs might not have
direct connections to the third-party cloud DCs used by the
enterprise to provide easy access to its end users. When the user
base changes, the enterprise's workloads/applications may be
migrated to a new cloud DC location closest to the new user base.
The existing MPLS VPN provider might not have PEs at the new
location. Deploying PEs routers at new locations is not trivial,
Dunbar, et al. [Page 14]
Internet-Draft Net2Cloud Problem Statement
which defeats one of the benefits of Clouds' geographically diverse
locations allowing workloads to be as close to their end-users as
possible.
To mitigate those problems, IPsec tunnels can be used to dynamically
connecting MPLS PEs with the desired Cloud DCs. As MPLS VPN provide
more secure and higher quality services, it is desirable to locate
the PEs with the least cost (including routing distance and capacity
cost) for the dynamic IPsec tunnels to the Cloud DC GW.
An enterprise can connect to multiple Cloud DC locations and
establish different BGP peers with Cloud GW routers at different
locations. As multiple Cloud DCs are interconnected by the Cloud
provider's own internal network, the Cloud GW BGP session might
advertise all of the prefixes of the enterprise's VPC, regardless of
which Cloud DC a given prefix is actually in. This can result in
inefficient routing for the end-to-end data path. To get around this
problem, virtual routers in Cloud DCs can be used to attach metadata
(e.g., GENEVE header or IPv6 optional header) to indicate Geo-
location of the Cloud DCs.
5. Methods to Scale IPsec tunnels to Cloud DCs
As described in Section 4.3, IPsec tunnels can be used to
dynamically establish connection between MPLS VPN PEs with Cloud GW.
Enterprises can also instantiate virtual routers within Cloud DCs to
connect to their on-premises devices via IPsec tunnels.
5.1. Scaling Issues with IPsec Tunnels
IPsec tunnels are a very convenient solution for an enterprise with
limited locations to reach a Cloud DC.
However, for a medium-to-large enterprise with multiple sites and
data centers to connect to multiple cloud DCs, there are N*N number
of IPsec tunnels among Cloud DC gateways and all those sites. Each
of those IPsec Tunnels requires pair-wise periodic key refreshment.
For a company with hundreds or thousands of locations, managing
hundreds (or even thousands) of IPsec tunnels can be very processing
intensive. That is why many Cloud operators only allow a limited
number of (IPsec) tunnels & bandwidth to each customer.
Dunbar, et al. [Page 15]
Internet-Draft Net2Cloud Problem Statement
To scale the IPsec key management, you could use a solution like
group encryption where a single IPsec SA is necessary at the GW. But
the drawback is key distribution and maintenance of a key server,
etc.
5.2. Poor performance when overlay public internet
When large number of IPSec encap & decap are needed, the performance
is degraded. NAT also adds performance burden.
When enterprise CPEs or gateways are far away from cloud DC gateways
or across country/continent boundaries, performance of IPsec tunnels
over the public Internet can be problematic and unpredictable. Even
though there are many monitoring tools available to measure delay
and various performance characteristics of the network, the
measurement for paths over the Internet is passive and past
measurements may not represent future performance.
Many cloud providers can replicate workloads in different available
zones. An App instantiated in a cloud DC closest to clients may have
to cooperate with another App (or its mirror image) in another
region or database server(s) in the on-premises DC. This kind of
coordination requires predicable networking behavior/performance
among those locations.
6. Requirements for Dynamic Cloud Data Center VPNs
To address the aforementioned issues, any solution for enterprise
VPNs that includes connectivity to dynamic workloads or applications
in cloud data centers should satisfy a set of requirements:
- Global reachability from different geographical zones, thereby
facilitating the proximity of applications as a function of the
end users' location, to improve latency.
- Elasticity: prompt connection to newly instantiated
applications at Cloud DCs when usages increase and prompt
release of connection after applications at locations being
removed when demands change.
- Scalable policy management: apply the appropriate polices to
the newly instantiated application instances at any Cloud DC
locations.
- The solution should allow enterprises to take advantage of the
current state-of-the-art in VPN technology, in both traditional
Dunbar, et al. [Page 16]
Internet-Draft Net2Cloud Problem Statement
MPLS-based VPNs and IPsec-based VPNs (or any combination
thereof) that run over the public Internet.
- The solution should not require an enterprise to upgrade all
their existing CPEs.
- The solution should support scalable IPsec key management among
all nodes involved in DC interconnect schemes.
- The solution needs to support easy and fast, on-the-fly, VPN
connections to dynamic workloads and applications in third
party data centers, and easily allow these workloads to migrate
both within a data center and between data centers.
- Allow VPNs to provide bandwidth and other performance
guarantees.
- Be a cost-effective solution for enterprises to incorporate
dynamic cloud-based applications and workloads into their
existing VPN environment.
7. Security Considerations
The draft discusses security requirements as a part of the problem
space, particularly in sections 3.
Solution drafts resulting from this work will address security
concerns inherent to the solution(s), including both protocol
aspects and the importance (for example) of securing workloads in
cloud DCs and the use of secure interconnection mechanisms.
8. IANA Considerations
This document requires no IANA actions. RFC Editor: Please remove
this section before publication.
9. References
9.1. Normative References
[RFC2735] B. Fox, et al "NHRP Support for Virtual Private networks".
Dec. 1999.
[RFC4271] Y. Rekhter, et al "BGP-4". Jan 2006
Dunbar, et al. [Page 17]
Internet-Draft Net2Cloud Problem Statement
[RFC5492] J. Scudder, R. Chandra "Capabilities Advertisement with
BGP-4". Feb 2009
[RFC7606] E. Chen, et al "Revised Error Handling for BGP UPDATE
Messages". Aug 2015.
[RFC7938] P. Lapukkov, et al "Use of BGP for Routing in Large-
Scale Data Centers", Aug. 2016
9.2. Informative References
[RFC8192] S. Hares, et al "Interface to Network Security Functions
(I2NSF) Problem Statement and Use Cases", July 2017
[ITU-T-X1036] ITU-T Recommendation X.1036, "Framework for creation,
storage, distribution and enforcement of policies for
network security", Nov 2007.
[RFC6071] S. Frankel and S. Krishnan, "IP Security (IPsec) and
Internet Key Exchange (IKE) Document Roadmap", Feb 2011.
[RFC4364] E. Rosen and Y. Rekhter, "BGP/MPLS IP Virtual Private
Networks (VPNs)", Feb 2006
[RFC4664] L. Andersson and E. Rosen, "Framework for Layer 2 Virtual
Private Networks (L2VPNs)", Sept 2006.
10. Acknowledgments
Many thanks to Adrian Farrel, Alia Atlas, Chris Bowers, Paul Vixie,
Paul Ebersman, Timothy Morizot, Ignas Bagdonas, Donald Eastlake,
Michael Huang, Liu Yuan Jiao, Katherine Zhao, and Jim Guichard for
the discussion and contributions.
Dunbar, et al. [Page 18]
Internet-Draft Net2Cloud Problem Statement
Authors' Addresses
Linda Dunbar
Futurewei
Email: Linda.Dunbar@futurewei.com
Andrew G. Malis
Malis Consulting
Email: agmalis@gmail.com
Christian Jacquenet
Orange
Rennes, 35000
France
Email: Christian.jacquenet@orange.com
Mehmet Toy
Verizon
One Verizon Way
Basking Ridge, NJ 07920
Email: mehmet.toy@verizon.com
Kausik Majumdar
Microsoft Azure
kmajumdar@microsoft.com
Dunbar, et al. [Page 19]