Dynamic Networks to Hybrid Cloud DCs Problem Statement
draft-ietf-rtgwg-net2cloud-problem-statement-15
The information below is for an old version of the document.
Document | Type |
This is an older version of an Internet-Draft whose latest revision state is "Active".
|
|
---|---|---|---|
Authors | Linda Dunbar , Andrew G. Malis , Christian Jacquenet , Mehmet Toy , Kausik Majumdar | ||
Last updated | 2022-07-25 | ||
Replaces | draft-dm-net2cloud-problem-statement | ||
RFC stream | Internet Engineering Task Force (IETF) | ||
Formats | |||
Reviews |
ARTART Last Call review
(of
-41)
by Rich Salz
Ready w/nits
TSVART Last Call review
(of
-32)
by Magnus Westerlund
Ready w/issues
INTDIR Early review
(of
-26)
by Benson Muite
Ready w/nits
RTGDIR Early review
(of
-22)
by Ines Robles
Has issues
OPSDIR Early review
(of
-22)
by Susan Hares
Has issues
SECDIR Early review
(of
-22)
by Deb Cooley
Has issues
DNSDIR Early review
(of
-22)
by Florian Obser
Ready w/nits
GENART Early review
(of
-21)
by Paul Kyzivat
Ready w/nits
|
||
Additional resources | Mailing list discussion | ||
Stream | WG state | WG Document | |
Document shepherd | (None) | ||
IESG | IESG state | I-D Exists | |
Consensus boilerplate | Unknown | ||
Telechat date | (None) | ||
Responsible AD | (None) | ||
Send notices to | (None) |
draft-ietf-rtgwg-net2cloud-problem-statement-15
Network Working Group L. Dunbar Internet Draft Futurewei Intended status: Informational A. Malis Expires: January 25, 2023 Malis Consulting C. Jacquenet Orange M. Toy Verizon K. Majumdar Microsoft July 25, 2022 Dynamic Networks to Hybrid Cloud DCs Problem Statement draft-ietf-rtgwg-net2cloud-problem-statement-15 Abstract This document describes the network-related problems enterprises face today when interconnecting their branch offices with dynamic workloads in third-party data centers (a.k.a. Cloud DCs). There can be many problems associated with connecting to or among Clouds; the Net2Cloud problem statements are mainly for enterprises who already have traditional MPLS services and are interested in leveraging those networks (instead of completely abandoning them). This document aims to describe the problems of continuing using the MPLS networks when connecting workloads in the Cloud, and to clarify additional work in the IETF Routing area. Other problems are out of the scope of this document. Current operational problems are examined to determine whether there is a need to improve existing protocols or whether a new protocol is necessary to solve them. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that xxx, et al. [Page 1] Internet-Draft Net2Cloud Problem Statement March 2022 other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html This Internet-Draft will expire on January 25, 2009. Copyright Notice Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction...................................................3 1.1. Key Characteristics of Cloud Services:....................3 1.2. Connecting to Cloud Services..............................3 1.3. Reaching App instances in the optimal Cloud DC locations..4 2. Definition of terms............................................5 3. High Level Issues of Connecting to Cloud DCs...................6 3.1. More BGP errors triggered by large number of peers........6 3.2. Site failures that may lead to massive routes changes.....6 3.3. 5G Edge Clouds............................................7 Dunbar, et al. [Page 2] Internet-Draft Net2Cloud Problem Statement March 2022 3.4. Security Issues...........................................7 3.5. DNS for Cloud Resources...................................8 3.6. NAT for Cloud Services....................................9 3.7. Cloud Discovery..........................................10 4. Interconnecting Enterprise Sites with Cloud DCs...............10 4.1. Sites to Cloud DC........................................10 4.2. Inter-Cloud Interconnection..............................12 5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs...14 6. Problem with using IPsec tunnels to Cloud DCs.................15 6.1. Scaling Issues with IPsec Tunnels........................15 6.2. Poor performance when overlay public internet............16 7. End-to-End Security Concerns for Data Flows...................16 8. Requirements for Dynamic Cloud Data Center VPNs...............17 9. Security Considerations.......................................17 10. IANA Considerations..........................................18 11. References...................................................18 11.1. Normative References....................................18 11.2. Informative References..................................18 12. Acknowledgments..............................................18 1. Introduction 1.1. Key Characteristics of Cloud Services: Key characteristics of Cloud Services are on-demand, scalable, highly available, and usage-based billing. Cloud Services, such as, compute, storage, network functions (most likely virtual), third party managed applications, etc. are usually hosted and managed by third parties Cloud Operators. Here are some examples of Cloud network functions: Virtual Firewall services, Virtual private network services, Virtual PBX services including voice and video conferencing systems, etc. Cloud Data Center (DC) is shared infrastructure that hosts the Cloud Services to many customers. 1.2. Connecting to Cloud Services With the advent of widely available third-party cloud DCs and services in diverse geographic locations and the advancement of tools for monitoring and predicting application behaviors, it is very attractive for enterprises to instantiate applications and workloads in locations that are geographically closest to their end- users. Such proximity can improve end-to-end latency and overall user experience. Conversely, an enterprise can easily shutdown applications and workloads whenever end-users are in motion (thereby Dunbar, et al. [Page 3] Internet-Draft Net2Cloud Problem Statement March 2022 modifying the networking connection of subsequently relocated applications and workloads). In addition, enterprises may wish to take advantage of more and more business applications offered by cloud operators. The networks that interconnect hybrid cloud DCs must address the following requirements: - to access all workloads in the desired cloud DCs: Many enterprises include cloud in their disaster recovery strategy, such as enforcing periodic backup policies within the cloud, or running backup applications in the Cloud. - Global reachability from different geographical zones, thereby facilitating the proximity of applications as a function of the end users' location, to improve latency. - Elasticity: prompt connection to newly instantiated applications at Cloud DCs when usages increase and prompt release of connection after applications at locations being removed when demands change. - Scalable policy management: apply the appropriate polices to the newly instantiated application instances at any Cloud DC location. 1.3. Reaching App instances in the optimal Cloud DC locations Many applications have multiple instances instantiated in different Cloud DCs. The current state of the art solutions is typically based on DNS assisted with load balancer by responding a FQDN (Fully Qualified Domain Name) inquiry with an IP address of the closest or lowest cost DC that can reach the instance. Here are some problems associated with DNS based solutions: - Dependent on client behavior - Client can cache results indefinitely - Client may not receive service even though there are servers available (before cache timeout) in other Cloud DCs. - No inherent leverage of proximity information present in the network (routing) layer, resulting in loss of performance Dunbar, et al. [Page 4] Internet-Draft Net2Cloud Problem Statement March 2022 - Client on the west coast can be mapped to a DC on the east coast - Inflexible traffic control: - Local DNS resolver become the unit of traffic management. This requires DNS to receive periodical update of the network condition, which is difficult. 2. Definition of terms Cloud DC: Third party Data Centers that usually host applications and workload owned by different organizations or tenants. Controller: Used interchangeably with SD-WAN controller to manage SD-WAN overlay path creation/deletion and monitoring the path conditions between two or more sites. DSVPN: Dynamic Smart Virtual Private Network. DSVPN is a secure network that exchanges data between sites without needing to pass traffic through an organization's headquarter virtual private network (VPN) server or router. Heterogeneous Cloud: applications and workloads split among Cloud DCs owned or managed by different operators. Hybrid Clouds: Hybrid Clouds refers to an enterprise using its own on-premises DCs in addition to Cloud services provided by one or more cloud operators. (e.g. AWS, Azure, Google, Salesforces, SAP, etc). VPC: Virtual Private Cloud is a virtual network dedicated to one client account. It is logically isolated from other virtual networks in a Cloud DC. Each client can launch his/her desired resources, such as compute, storage, or network functions into his/her VPC. Most Cloud operators' VPCs only support private addresses, some support IPv4 only, others support IPv4/IPv6 dual stack. Dunbar, et al. [Page 5] Internet-Draft Net2Cloud Problem Statement March 2022 3. High Level Issues of Connecting to Cloud DCs There are many problems associated with connecting to hybrid Cloud Services, many of which are out of the IETF scope. This section is to identify some of the high-level problems that can be addressed by IETF, especially by Routing area. Other problems are out of the scope of this document. By no means has this section covered all problems for connecting to Hybrid Cloud Services, e.g., difficulty in managing cloud spending is not discussed here. 3.1. More BGP errors triggered by large number of peers Many network service providers have limited number of BGP peers and usually have prior negotiated peering policies with their BGP peers. Cloud GWs need to peer with many more parties, via private circuits or IPsec over public internet. Many of those peering parties may not be traditional network service providers. Their BGP configurations practices might not be consistent, and some are done by less experienced personnel. All those can contribute to increased BGP peering errors, such as capability mismatch, BGP cease notification, unwanted route leaks, missing Keepalives, etc. 3.2. Site failures that may lead to massive routes changes Site failures include, but not limited to, a site capacity degradation or entire site going down caused by a variety of reasons, such as fiber cut connecting to the site or among pods within the site, cooling failures, insufficient backup power, cyber threats attacks, too many changes outside of the maintenance window, etc. Fiber-cut is not uncommon within a Cloud site or between sites. As described in RFC7938, Cloud DC BGP might not have an IGP to route around link/node failures within the ASes. When those failure events happen, the Cloud DC GW which is visible to clients are running fine. Therefore, the Client GW can't use BFD to detect the failures. When a site capacity degrades or goes dark, there are massive numbers of routes needing to be changed. The large number of routes switching over to another site can also cause overloading that triggers more failures. Dunbar, et al. [Page 6] Internet-Draft Net2Cloud Problem Statement March 2022 In addition, the routes (IP addresses) in a Cloud DC cannot be aggregated nicely, triggering very large number of BGP UPDATE messages when a failure occurs. It might be more effective to do mass reroute, similar to EVPN [RFC7432] defined mass withdraw mechanism to signal a large number of routes being changed to remote PE nodes as quickly as possible. 3.3. 5G Edge Clouds 5G edge cloud data centers have routers connecting to the 5G Core functions, such as Radio Control Functions, Session Management Function (SMF), Access Mobility Functions (AMF), User Plane Functions (UPF), etc. Those functions need to be connected to the Radio Data Unit (R-DU) on the Cell Tower. The UPFs need to be connected to the 5G Local Data Networks' ingress routers which might co-located the cloud edge data centers. In addition, the 5G edge cloud data centers may host edge computing servers for Ultra-low latency services that need to be near the UEs (User equipment). Those edge computing applications need to have very low latency to the UEs, and also connect to backend servers or databases in another location. Here are some network problems for 5G Edge Cloud DCs: - The difference of routing distances is relatively small - Capacity at the Edge DC plays bigger role for performance - Source (UEs) can ingress from different Ingress routers 3.4. Security Issues There are many aspects of security issues in terms of networking to clouds: - Service instances in Cloud DCs are connected to users (enterprises) via Public IP ports which are exposed to the following security risks: a) Potential DDoS attack to the ports facing the untrusted network (e.g., the public internet), which may propagate to the cloud edge resources. To mitigate such security risk, it is necessary for the ports facing internet to enable Anti-DDoS features. Dunbar, et al. [Page 7] Internet-Draft Net2Cloud Problem Statement March 2022 b) Potential risk of augmenting the attack surface with inter- Cloud DC connection by means of identity spoofing, man-in-the- middle, eavesdropping or DDoS attacks. One example of mitigating such attacks is using DTLS to authenticate and encrypt MPLS-in-UDP encapsulation (RFC 7510). - Potential attacks from service instances within the cloud. For example, data breaches, compromised credentials, and broken authentication, hacked interfaces and APIs, account hijacking. - Securing user identity management, authentication, and access control mechanisms is important. Developing appropriate security mechanisms (including tools to assess the robustness of the enforced security policies) can enhance the confidence needed by enterprises to fully take advantage of Cloud Services. Many Cloud operators offer monitoring services for data stored in Clouds, such as AWS CloudTrail, Azure Monitor, and many third-party monitoring tools to improve visibility to data stored in Clouds. But there is still underline security concerns on illegitimate data and workloads access. 3.5. DNS for Cloud Resources DNS name resolution is essential for on-premises and cloud-based resources. For customers with hybrid workloads, which include on- premises and cloud-based resources, extra steps are necessary to configure DNS to work seamlessly across both environments. Cloud operators have their own DNS to resolve resources within their Cloud DCs and to well-known public domains. Cloud's DNS can be configured to forward queries to customer managed authoritative DNS servers hosted on-premises, and to respond to DNS queries forwarded by on-premises DNS servers. For enterprises utilizing Cloud services by different cloud operators, it is necessary to establish policies and rules on how/where to forward DNS queries to. When applications in one Cloud need to communication with applications hosted in another Cloud, there could be DNS queries from one Cloud DC being forwarded to the enterprise's on-premise DNS, which in turn be forwarded to the DNS service in another Cloud. Needless to say, configuration can be complex depending on the application communication patterns. Dunbar, et al. [Page 8] Internet-Draft Net2Cloud Problem Statement March 2022 However, even with carefully managed policies and configurations, collisions can still occur. If you use an internal name like .cloud and then want your services to be available via or within some other cloud provider which also uses .cloud, then it can't work. Therefore, it is better to use the global domain name even when an organization does not make all its namespace globally resolvable. An organization's globally unique DNS can include subdomains that cannot be resolved at all outside certain restricted paths, zones that resolve differently based on the origin of the query, and zones that resolve the same globally for all queries from any source. Globally unique names do not equate to globally resolvable names or even global names that resolve the same way from every perspective. Globally unique names do prevent any possibility of collision at the present or in the future and they make DNSSEC trust manageable. Consider using a registered and fully qualified domain name (FQDN) from global DNS as the root for enterprise and other internal namespaces. 3.6. NAT for Cloud Services Cloud resources, such as VM instances, are usually assigned with private IP addresses. By configuration, some private subnets can have the NAT function to reach out to external network and some private subnets are internal to Cloud only. Different Cloud operators support different levels of NAT functions. For example, AWS NAT Gateway does not currently support connections towards, or from VPC Endpoints, VPN, AWS Direct Connect, or VPC Peering. https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc- nat-gateway.html#nat-gateway-other-services. AWS Direct Connect/VPN/VPC Peering does not currently support any NAT functionality. Google's Cloud NAT allows Google Cloud virtual machine (VM) instances without external IP addresses and private Google Kubernetes Engine (GKE) clusters to connect to the Internet. Cloud NAT implements outbound NAT in conjunction with a default route to allow instances to reach the Internet. It does not implement inbound NAT. Hosts outside of VPC network can only respond to established connections initiated by instances inside the Google Cloud; they cannot initiate their own, new connections to Cloud instances via NAT. Dunbar, et al. [Page 9] Internet-Draft Net2Cloud Problem Statement March 2022 For enterprises with applications running in different Cloud DCs, proper configuration of NAT has to be performed in Cloud DC and in their on-premises DC. 3.7. Cloud Discovery One of the concerns of using Cloud services is not aware where the resource is located, especially Cloud operators can move application instances from one place to another. When applications in Cloud communicate with on-premise applications, it may not be clear where the Cloud applications are located or to which VPCs they belong. It is highly desirable to have tools to discover cloud services in much the same way as you would discover your on-premises infrastructure. A significant difference is that cloud discovery uses the cloud vendor's API to extract data on your cloud services, rather than the direct access used in scanning your on-premises infrastructure. Being able to detect Cloud services location can also help on- premises gateways (routers) to switch the services to a more optimal site when the current cloud site encounters failures or degradation. 4. Interconnecting Enterprise Sites with Cloud DCs Considering that many enterprises already have existing VPNs (e.g. MPLS based L2VPN or L3VPN) interconnecting branch offices & on- premises data centers, connecting to Cloud services will be mixed of different types of networks. When an enterprise's existing VPN service providers do not have direct connections to the corresponding cloud DCs that the enterprise prefers to use, the enterprise has to face additional infrastructure and operational costs to utilize the Cloud services. 4.1. Sites to Cloud DC Most Cloud operators offer some type of network gateway through which an enterprise can reach their workloads hosted in the Cloud DCs. AWS (Amazon Web Services) offers the following options to reach workloads in AWS Cloud DCs: Dunbar, et al. [Page 10] Internet-Draft Net2Cloud Problem Statement March 2022 - AWS Internet gateway allows communication between instances in AWS VPC and the internet. - AWS Virtual gateway (vGW) where IPsec tunnels [RFC6071] are established between an enterprise's own gateway and AWS vGW, so that the communications between those gateways can be secured from the underlay (which might be the public Internet). - AWS Direct Connect, which allows enterprises to purchase direct connect from network service providers to get a private leased line interconnecting the enterprises gateway(s) and the AWS Direct Connect routers. In addition, an AWS Transit Gateway can be used to interconnect multiple VPCs in different Availability Zones. AWS Transit Gateway acts as a hub that controls how traffic is forwarded among all the connected networks which act like spokes. Microsoft's ExpressRoute allows extension of a private network to any of the Microsoft cloud services, including Azure and Office365. ExpressRoute is configured using Layer 3 routing. Customers can opt for redundancy by provisioning dual links from their location to two Microsoft Enterprise edge routers (MSEEs) located within a third- party ExpressRoute peering location. The BGP routing protocol is then setup over WAN links to provide redundancy to the cloud. This redundancy is maintained from the peering data center into Microsoft's cloud network. Google's Cloud Dedicated Interconnect offers similar network connectivity options as AWS and Microsoft. One distinct difference, however, is that Google's service allows customers access to the entire global cloud network by default. It does this by connecting your on-premises network with the Google Cloud using BGP and Google Cloud Routers to provide optimal paths to the different regions of the global cloud infrastructure. Figure below shows an example of some of a tenant's workloads are accessible via a virtual router connected by AWS Internet Gateway; some are accessible via AWS vGW, and others are accessible via AWS Direct Connect. Different types of access require different level of security functions. Sometimes it is not visible to end customers which type of network access is used for a specific application instance. To get better visibility, separate virtual routers (e.g. vR1 & vR2) can be deployed to differentiate traffic to/from different cloud GWs. It Dunbar, et al. [Page 11] Internet-Draft Net2Cloud Problem Statement March 2022 is important for some enterprises to be able to observe the specific behaviors when connected by different connections. Customer Gateway can be customer owned router or ports physically connected to AWS Direct Connect GW. +------------------------+ | ,---. ,---. | | (TN-1 ) ( TN-2)| | `-+-' +---+ `-+-' | | +----|vR1|----+ | | ++--+ | | | +-+----+ | | /Internet\ For External | +-------+ Gateway +---------------------- | \ / to reach via Internet | +-+----+ | | | ,---. ,---. | | (TN-1 ) ( TN-2)| | `-+-' +---+ `-+-' | | +----|vR2|----+ | | ++--+ | | | +-+----+ | | / virtual\ For IPsec Tunnel | +-------+ Gateway +---------------------- | | \ / termination | | +-+----+ | | | | | +-+----+ +------+ | | / \ For Direct /customer\ | +-------+ Gateway +----------+ gateway | | \ / Connect \ / | +-+----+ +------+ | | +------------------------+ Figure 1: Examples of Multiple Cloud DC connections. 4.2. Inter-Cloud Interconnection The connectivity options to Cloud DCs described in the previous section are for reaching Cloud providers' DCs, but not between cloud DCs. When applications in AWS Cloud need to communicate with applications in Azure, today's practice requires a third-party gateway (physical or virtual) to interconnect the AWS's Layer 2 DirectConnect path with Azure's Layer 3 ExpressRoute. Dunbar, et al. [Page 12] Internet-Draft Net2Cloud Problem Statement March 2022 Enterprises can also instantiate their own virtual routers in different Cloud DCs and administer IPsec tunnels among them, which by itself is not a trivial task. Or by leveraging open source VPN software such as strongSwan, you create an IPSec connection to the Azure gateway using a shared key. The StrongSwan instance within AWS not only can connect to Azure but can also be used to facilitate traffic to other nodes within the AWS VPC by configuring forwarding and using appropriate routing rules for the VPC. Most Cloud operators, such as AWS VPC or Azure VNET, use non- globally routable CIDR from private IPv4 address ranges as specified by RFC1918. To establish IPsec tunnel between two Cloud DCs, it is necessary to exchange Public routable addresses for applications in different Cloud DCs. In summary, here are some approaches, available now (which might change in the future), to interconnect workloads among different Cloud DCs: a) Utilize Cloud DC provided inter/intra-cloud connectivity services (e.g., AWS Transit Gateway) to connect workloads instantiated in multiple VPCs. Such services are provided with the cloud gateway to connect to external networks (e.g., AWS DirectConnect Gateway). b) Hairpin all traffic through the customer gateway, meaning all workloads are directly connected to the customer gateway, so that communications among workloads within one Cloud DC must traverse through the customer gateway. c) Establish direct tunnels among different VPCs (AWS' Virtual Private Clouds) and VNET (Azure's Virtual Networks) via client's own virtual routers instantiated within Cloud DCs. DMVPN (Dynamic Multipoint Virtual Private Network) or DSVPN (Dynamic Smart VPN) techniques can be used to establish direct Multi-point-to-Point or multi-point-to multi-point tunnels among those client's own virtual routers. Approach a) usually does not work if Cloud DCs are owned and managed by different Cloud providers. Approach b) creates additional transmission delay plus incurring cost when exiting Cloud DCs. For the Approach c), DMVPN or DSVPN use NHRP (Next Hop Resolution Protocol) [RFC2735] so that spoke nodes can register their IP Dunbar, et al. [Page 13] Internet-Draft Net2Cloud Problem Statement March 2022 addresses & WAN ports with the hub node. The IETF ION (Internetworking over NBMA (non-broadcast multiple access) WG standardized NHRP for connection oriented NBMA network (such as ATM) network address resolution more than two decades ago. There are many differences between virtual routers in Public Cloud DCs and the nodes in an NBMA network. NHRP cannot be used for registering virtual routers in Cloud DCs unless an extension of such protocols is developed for that purpose, e.g. taking NAT or dynamic addresses into consideration. Therefore, DMVPN and/or DSVPN cannot be used directly for connecting workloads in hybrid Cloud DCs. 5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs Traditional MPLS-based VPNs have been widely deployed as an effective way to support businesses and organizations that require network performance and reliability. MPLS shifted the burden of managing a VPN service from enterprises to service providers. The CPEs attached to MPLS VPNs are also simpler and less expensive, because they do not need to manage routes to remote sites; they simply pass all outbound traffic to the MPLS VPN PEs to which the CPEs are attached (albeit multi-homing scenarios require more processing logic on CPEs). MPLS has addressed the problems of scale, availability, and fast recovery from network faults, and incorporated traffic-engineering capabilities. However, traditional MPLS-based VPN solutions are sub-optimized for connecting end-users to dynamic workloads/applications in cloud DCs because: - The Provider Edge (PE) nodes of the enterprise's VPNs might not have direct connections to third party cloud DCs that are used for hosting workloads with the goal of providing an easy access to enterprises' end-users. - It takes some time to deploy provider edge (PE) routers at new locations. When enterprise's workloads are changed from one cloud DC to another (i.e., removed from one DC and re- instantiated to another location when demand changes), the enterprise branch offices need to be connected to the new cloud DC, but the network service provider might not have PEs located at the new location. Dunbar, et al. [Page 14] Internet-Draft Net2Cloud Problem Statement March 2022 One of the main drivers for moving workloads into the cloud is the widely available cloud DCs at geographically diverse locations, where apps can be instantiated so that they can be as close to their end-users as possible. When the user base changes, the applications may be migrated to a new cloud DC location closest to the new user base. - Most of the cloud DCs do not expose their internal networks. An enterprise with a hybrid cloud deployment can use an MPLS-VPN to connect to a Cloud provider at multiple locations. The connection locations often correspond to gateways of different Cloud DC locations from the Cloud provider. The different Cloud DCs are interconnected by the Cloud provider's own internal network. At each connection location (gateway), the Cloud provider uses BGP to advertise all of the prefixes in the enterprise's VPC, regardless of which Cloud DC a given prefix is actually in. This can result in inefficient routing for the end-to-end data path. Another roadblock is the lack of a standard way to express and enforce consistent security policies for workloads that not only use virtual addresses, but in which are also very likely hosted in different locations within the Cloud DC [RFC8192]. The current VPN path computation and bandwidth allocation schemes may not be flexible enough to address the need for enterprises to rapidly connect to dynamically instantiated (or removed) workloads and applications regardless of their location/nature (i.e., third party cloud DCs). 6. Problem with using IPsec tunnels to Cloud DCs As described in the previous section, many Cloud operators expose their gateways for external entities (which can be enterprises themselves) to directly establish IPsec tunnels. Enterprises can also instantiate virtual routers within Cloud DCs to connect to their on-premises devices via IPsec tunnels. 6.1. Scaling Issues with IPsec Tunnels If there is only one enterprise location that needs to reach the Cloud DC, an IPsec tunnel is a very convenient solution. Dunbar, et al. [Page 15] Internet-Draft Net2Cloud Problem Statement March 2022 However, many medium-to-large enterprises have multiple sites and multiple data centers. For multiple sites to communicate with workloads and apps hosted in cloud DCs, Cloud DC gateways have to maintain many IPsec tunnels to all those locations. In addition, each of those IPsec Tunnels requires pair-wise periodic key refreshment. For a company with hundreds or thousands of locations, there could be hundreds (or even thousands) of IPsec tunnels terminating at the cloud DC gateway, which is very processing intensive. That is why many cloud operators only allow a limited number of (IPsec) tunnels & bandwidth to each customer. Alternatively, you could use a solution like group encryption where a single IPsec SA is necessary at the GW but the drawback is key distribution and maintenance of a key server, etc. 6.2. Poor performance when overlay public internet When large number of IPSec encap & decap are needed, the performance is degraded. NAT also adds performance burden. When enterprise CPEs or gateways are far away from cloud DC gateways or across country/continent boundaries, performance of IPsec tunnels over the public Internet can be problematic and unpredictable. Even though there are many monitoring tools available to measure delay and various performance characteristics of the network, the measurement for paths over the Internet is passive and past measurements may not represent future performance. Many cloud providers can replicate workloads in different available zones. An App instantiated in a cloud DC closest to clients may have to cooperate with another App (or its mirror image) in another region or database server(s) in the on-premises DC. This kind of coordination requires predicable networking behavior/performance among those locations. 7. End-to-End Security Concerns for Data Flows Add description for Bucket 7 from Kausik When IPsec tunnels established from enterprise on-premises CPEs are terminated at the Cloud DC gateway where the workloads or applications are hosted, some enterprises have concerns regarding traffic to/from their workload being exposed to others behind the data center gateway (e.g., exposed to other organizations that have workloads in the same data center). Dunbar, et al. [Page 16] Internet-Draft Net2Cloud Problem Statement March 2022 To ensure that traffic to/from workloads is not exposed to unwanted entities, IPsec tunnels may go all the way to the workload (servers, or VMs) within the DC. 8. Requirements for Dynamic Cloud Data Center VPNs To address the aforementioned issues, any solution for enterprise VPNs that includes connectivity to dynamic workloads or applications in cloud data centers should satisfy a set of requirements: - The solution should allow enterprises to take advantage of the current state-of-the-art in VPN technology, in both traditional MPLS-based VPNs and IPsec-based VPNs (or any combination thereof) that run over the public Internet. - The solution should not require an enterprise to upgrade all their existing CPEs. - The solution should support scalable IPsec key management among all nodes involved in DC interconnect schemes. - The solution needs to support easy and fast, on-the-fly, VPN connections to dynamic workloads and applications in third party data centers, and easily allow these workloads to migrate both within a data center and between data centers. - Allow VPNs to provide bandwidth and other performance guarantees. - Be a cost-effective solution for enterprises to incorporate dynamic cloud-based applications and workloads into their existing VPN environment. 9. Security Considerations The draft discusses security requirements as a part of the problem space, particularly in sections 4, 5, and 8. Solution drafts resulting from this work will address security concerns inherent to the solution(s), including both protocol aspects and the importance (for example) of securing workloads in cloud DCs and the use of secure interconnection mechanisms. Dunbar, et al. [Page 17] Internet-Draft Net2Cloud Problem Statement March 2022 10. IANA Considerations This document requires no IANA actions. RFC Editor: Please remove this section before publication. 11. References 11.1. Normative References 11.2. Informative References [RFC2735] B. Fox, et al "NHRP Support for Virtual Private networks". Dec. 1999. [RFC8192] S. Hares, et al "Interface to Network Security Functions (I2NSF) Problem Statement and Use Cases", July 2017 [ITU-T-X1036] ITU-T Recommendation X.1036, "Framework for creation, storage, distribution and enforcement of policies for network security", Nov 2007. [RFC6071] S. Frankel and S. Krishnan, "IP Security (IPsec) and Internet Key Exchange (IKE) Document Roadmap", Feb 2011. [RFC4364] E. Rosen and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", Feb 2006 [RFC4664] L. Andersson and E. Rosen, "Framework for Layer 2 Virtual Private Networks (L2VPNs)", Sept 2006. 12. Acknowledgments Many thanks to Alia Atlas, Chris Bowers, Paul Vixie, Paul Ebersman, Timothy Morizot, Ignas Bagdonas, Michael Huang, Liu Yuan Jiao, Katherine Zhao, and Jim Guichard for the discussion and contributions. Dunbar, et al. [Page 18] Internet-Draft Net2Cloud Problem Statement March 2022 Authors' Addresses Linda Dunbar Futurewei Email: Linda.Dunbar@futurewei.com Andrew G. Malis Malis Consulting Email: agmalis@gmail.com Christian Jacquenet Orange Rennes, 35000 France Email: Christian.jacquenet@orange.com Mehmet Toy Verizon One Verizon Way Basking Ridge, NJ 07920 Email: mehmet.toy@verizon.com Kausik Majumdar Microsoft Azure kmajumdar@microsoft.com Dunbar, et al. [Page 19]