Dynamic Networks to Hybrid Cloud DCs Problem Statement
draft-ietf-rtgwg-net2cloud-problem-statement-16

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft whose latest revision state is "Active".
Authors Linda Dunbar , Andrew G. Malis , Christian Jacquenet , Mehmet Toy , Kausik Majumdar
Last updated 2023-01-09
Replaces draft-dm-net2cloud-problem-statement
RFC stream Internet Engineering Task Force (IETF)
Stream WG state WG Document
Document shepherd (None)
IESG IESG state I-D Exists
Consensus boilerplate Unknown
Telechat date (None)
Responsible AD (None)
Send notices to (None)
Network Working Group                                         L. Dunbar
Internet Draft                                                Futurewei
Intended status: Informational                               Andy Malis
Expires: July 9, 2023                                  Malis Consulting
                                                           C. Jacquenet
                                                                 Orange
                                                                 M. Toy
                                                                Verizon
                                                            K. Majumdar
                                                              Microsoft
                                                        January 9, 2023

           Dynamic Networks to Hybrid Cloud DCs Problem Statement
              draft-ietf-rtgwg-net2cloud-problem-statement-16

Abstract

   This document describes the network-related problems enterprises
   face today when interconnecting their branch offices with dynamic
   workloads in third-party data centers (a.k.a. Cloud DCs) and some
   mitigation practices. There can be many problems associated with
   connecting to or among Cloud DCs; the Net2Cloud problem statements
   are mainly for enterprises that already have traditional MPLS
   services and are interested in leveraging those networks (instead of
   altogether abandoning them). Other problems are out of the scope of
   this document.

   This document also describes the practices of getting around the
   problems when connecting workloads in the Cloud DCs.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

xxx, et al.                                                    [Page 1]
Internet-Draft        Net2Cloud Problem Statement

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on July 9, 2023.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Key Characteristics of Cloud Services:
      1.2. Connecting to Cloud Services
      1.3. Reaching App instances in the optimal Cloud DC locations
   2. Definition of terms
   3. High Level Issues of Connecting to Cloud DCs
      3.1. More BGP errors triggered by large number of peers
      3.2. Site failures that may lead to massive routes changes
      3.3. 5G Edge Clouds
      3.4. Security Issues
      3.5. DNS for Hybrid Workloads
      3.6. NAT for Cloud Services
      3.7. Cloud Discovery
   4. Interconnecting Enterprise Sites with Cloud DCs
      4.1. Sites to Cloud DC
      4.2. Inter-Cloud Interconnection
   5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs
   6. Problem with using IPsec tunnels to Cloud DCs
      6.1. Scaling Issues with IPsec Tunnels
      6.2. Poor performance when overlay public internet
   7. End-to-End Security Concerns for Data Flows
   8. Requirements for Dynamic Cloud Data Center VPNs
   9. Security Considerations
   10. IANA Considerations
   11. References
      11.1. Normative References
      11.2. Informative References
   12. Acknowledgments

1. Introduction

1.1. Key Characteristics of Cloud Services:

   Key characteristics of Cloud Services are on-demand provisioning,
   scalability, high availability, and usage-based billing. Cloud
   Services, such as compute, storage, network functions (most likely
   virtual), and third-party managed applications, are usually hosted
   and managed by third-party Cloud Operators. Examples of Cloud
   network functions include virtual firewall services, virtual private
   network services, and virtual PBX services including voice and video
   conferencing systems. A Cloud Data Center (DC) is a shared
   infrastructure that hosts Cloud Services for many customers.

1.2. Connecting to Cloud Services

   With the advent of widely available third-party cloud DCs and
   services in diverse geographic locations and the advancement of
   tools for monitoring and predicting application behaviors, it is
   very attractive for enterprises to instantiate applications and
   workloads in locations that are geographically closest to their
   end-users. Such proximity can improve end-to-end latency and overall
   user experience. Conversely, an enterprise can easily shut down
   applications and workloads whenever end-users are in motion (thereby
   modifying the networking connection of subsequently relocated
   applications and workloads). In addition, enterprises may wish to
   take advantage of more and more business applications offered by
   cloud operators.

   The networks that interconnect hybrid cloud DCs must address the
   following requirements:
     - Access to all workloads in the desired cloud DCs:
        Many enterprises include cloud in their disaster recovery
        strategy, such as enforcing periodic backup policies within the
        cloud, or running backup applications in the Cloud.

     - Global reachability from different geographical zones, thereby
        facilitating the proximity of applications as a function of the
        end users' location, to improve latency.
     - Elasticity: prompt connection to newly instantiated
        applications at Cloud DCs when usage increases, and prompt
        release of connections when applications at given locations
        are removed as demand changes.
     - Scalable policy management: apply the appropriate policies to
        the newly instantiated application instances at any Cloud DC
        location.

1.3. Reaching App instances in the optimal Cloud DC locations

   Many applications have multiple instances instantiated in different
   Cloud DCs. The current state-of-the-art solution is typically based
   on DNS assisted by a load balancer, which responds to an FQDN (Fully
   Qualified Domain Name) query with the IP address of the closest or
   lowest-cost DC that can reach the instance. Here are some problems
   associated with DNS-based solutions:
     - Dependent on client behavior
          - Client can cache results indefinitely
          - Client may not receive service even though there are
             servers available (before cache timeout) in other Cloud
             DCs.
     - No inherent leverage of proximity information present in the
        network (routing) layer, resulting in loss of performance. When
        multiple service instances are hosted in edge Cloud DCs, the
        routing distances among those edge Cloud DCs can be less
        significant than the dynamic network conditions.
     - Inflexible traffic control:
        The local DNS resolver becomes the unit of traffic management.
        This requires DNS to receive periodic updates of the network
        conditions, which is difficult.
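
   The client-caching problem listed above can be illustrated with a
   minimal sketch. The ClientCache class, the DC names, and the
   documentation-range addresses are hypothetical; the sketch only
   models a client that keeps using a cached answer until its TTL
   expires, even after the selected Cloud DC has failed.

```python
from dataclasses import dataclass


@dataclass
class CachedAnswer:
    address: str
    expires_at: float  # absolute time, in seconds


class ClientCache:
    """Hypothetical client-side DNS cache keyed by FQDN."""

    def __init__(self):
        self._entries = {}

    def resolve(self, fqdn, now, lookup, ttl):
        """Return the cached address if still fresh, else ask `lookup`."""
        entry = self._entries.get(fqdn)
        if entry is not None and now < entry.expires_at:
            # Served from cache: the health of the DC is not re-checked.
            return entry.address
        address = lookup(fqdn)
        self._entries[fqdn] = CachedAnswer(address, now + ttl)
        return address


# Assumed state of two Cloud DCs hosting the same application.
dc_address = {"dc-east": "192.0.2.10", "dc-west": "198.51.100.10"}
dc_healthy = {"dc-east": True, "dc-west": True}


def dns_lookup(fqdn):
    """Stand-in for DNS plus load balancer: the first healthy DC wins."""
    for dc, address in dc_address.items():
        if dc_healthy[dc]:
            return address
    raise LookupError("no healthy DC for " + fqdn)


cache = ClientCache()
first = cache.resolve("app.example.com", now=0, lookup=dns_lookup, ttl=60)
dc_healthy["dc-east"] = False  # dc-east fails right after the first answer
stale = cache.resolve("app.example.com", now=10, lookup=dns_lookup, ttl=60)
fresh = cache.resolve("app.example.com", now=61, lookup=dns_lookup, ttl=60)
```

   In this sketch the second query still returns the address of the
   failed DC, and only the query issued after the TTL has expired is
   steered to the other DC.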

2. Definition of terms

   Cloud DC:   Third-party Data Centers that usually host applications
               and workloads owned by different organizations or
               tenants.

   Controller: Used interchangeably with SD-WAN controller, which
               manages SD-WAN overlay path creation/deletion and
               monitors the path conditions between two or more sites.

   DSVPN:      Dynamic Smart Virtual Private Network. DSVPN is a secure
               network that exchanges data between sites without
               needing to pass traffic through an organization's
               headquarters virtual private network (VPN) server or
               router.

   Heterogeneous Cloud: applications and workloads split among Cloud
               DCs owned or managed by different operators.

   Hybrid Clouds: Refers to an enterprise using its own on-premises
               DCs in addition to Cloud services provided by one or
               more cloud operators (e.g., AWS, Azure, Google,
               Salesforce, SAP).

   VPC:        Virtual Private Cloud is a virtual network dedicated to
               one client account. It is logically isolated from other
               virtual networks in a Cloud DC. Each client can launch
               the desired resources, such as compute, storage, or
               network functions, into its VPC. Most Cloud operators'
               VPCs only support private addresses; some support IPv4
               only, while others support IPv4/IPv6 dual stack.

3. High Level Issues of Connecting to Cloud DCs

   There are many problems associated with connecting to hybrid Cloud
   Services, many of which are out of the IETF scope. This section
   identifies some of the high-level problems that can be addressed by
   the IETF, especially by the Routing Area. Other problems are out of
   the scope of this document. This section by no means covers all
   problems of connecting to Hybrid Cloud Services; e.g., the
   difficulty of managing cloud spending is not discussed here.

3.1. More BGP errors triggered by large number of peers

   Many network service providers have a limited number of BGP peers
   and usually have previously negotiated peering policies with their
   BGP peers. Cloud GWs need to peer with many more parties, via
   private circuits or IPsec over the public Internet. Many of those
   peering parties may not be traditional network service providers.
   Their BGP configuration practices might not be consistent, and some
   configurations are done by less experienced personnel.
   All of these factors can contribute to increased BGP peering errors,
   such as capability mismatches, BGP Cease notifications, unwanted
   route leaks, missing Keepalives, etc.

   Many Cloud DCs don't support multi-hop eBGP peering with external
   devices. To get around this limitation, it is necessary for
   enterprise GWs to establish IP tunnels to the Cloud GWs to form IP
   adjacency.

   Some Cloud DC eBGP peering only supports a limited number of routes
   from external entities. To get around this limitation, on-premises
   DCs need to set up default routes to be exchanged with the Cloud DC
   eBGP peers.
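
   The route-limit workaround can be sketched as follows. This is only
   an illustration under stated assumptions: the function name, the
   IPv4-only handling, and the max_routes value are hypothetical and do
   not reflect any particular Cloud operator's limit or any BGP
   implementation. The sketch first aggregates the on-premises
   prefixes and falls back to a single default route when the result
   still exceeds the limit.

```python
import ipaddress

DEFAULT_ROUTE = "0.0.0.0/0"


def routes_to_advertise(prefixes, max_routes):
    """Return the IPv4 prefixes to announce, or only a default route.

    `max_routes` is an assumed per-peer limit imposed by the Cloud DC.
    """
    networks = [ipaddress.ip_network(p) for p in prefixes]
    # Merge adjacent or nested prefixes into the smallest covering set.
    collapsed = list(ipaddress.collapse_addresses(networks))
    if len(collapsed) <= max_routes:
        return [str(n) for n in collapsed]
    # Too many routes even after aggregation: announce a default only.
    return [DEFAULT_ROUTE]


aggregated = routes_to_advertise(
    ["10.0.0.0/25", "10.0.0.128/25", "10.0.1.0/24"], max_routes=2)
fallback = routes_to_advertise(
    ["10.0.0.0/24", "10.0.2.0/24", "10.0.4.0/24"], max_routes=2)
```

   In the first call the three prefixes aggregate into one route that
   fits within the limit; in the second call they cannot be aggregated,
   so only the default route is announced.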

3.2. Site failures that may lead to massive routes changes

   Site failures include, but are not limited to, site capacity
   degradation or an entire site going down for a variety of reasons,
   such as a fiber cut connecting to the site or among pods within the
   site, cooling failures, insufficient backup power, cyber attacks,
   too many changes outside of the maintenance window, etc. Fiber cuts
   are not uncommon within a Cloud site or between sites.

   As described in RFC7938, a Cloud DC running BGP might not have an
   IGP to route around link/node failures within the ASes.

   When those failure events happen, the Cloud DC GW, which is visible
   to clients, is still running fine. Therefore, the Client GW can't
   use BFD to detect the failures.

   When a site's capacity degrades or the site goes dark, a massive
   number of routes need to be changed.

   The large number of routes switching over to another site can also
   cause overloading that triggers more failures.

   In addition, the routes (IP addresses) in a Cloud DC cannot be
   aggregated nicely, triggering a very large number of BGP UPDATE
   messages when a failure occurs.

   It might be more effective to do a mass reroute, similar to the
   mass withdraw mechanism defined for EVPN [RFC7432], to signal a
   large number of route changes to remote PE nodes as quickly as
   possible.
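
   The benefit of a mass reroute can be sketched with a toy route
   table. This is not the EVPN encoding from [RFC7432] and not a BGP
   implementation; the RemoteRouteTable class and the site identifiers
   are hypothetical. The sketch only shows that when routes are indexed
   by the site they depend on, a single site-level message can
   invalidate all of them, instead of one withdrawal per prefix.

```python
from collections import defaultdict


class RemoteRouteTable:
    """Hypothetical route table on a remote node, indexed by site."""

    def __init__(self):
        self._prefixes_by_site = defaultdict(set)

    def learn(self, prefix, site_id):
        self._prefixes_by_site[site_id].add(prefix)

    def withdraw_prefix(self, prefix, site_id):
        """Per-prefix withdrawal: one message is needed per route."""
        self._prefixes_by_site[site_id].discard(prefix)

    def withdraw_site(self, site_id):
        """Site-level withdrawal: one message invalidates every route.

        Returns the number of routes invalidated by that one message.
        """
        return len(self._prefixes_by_site.pop(site_id, set()))

    def usable_prefixes(self):
        return set().union(*self._prefixes_by_site.values())


table = RemoteRouteTable()
for i in range(1000):
    # 1000 distinct, non-aggregatable /24 prefixes hosted at site-A.
    table.learn(f"10.{i // 256}.{i % 256}.0/24", "site-A")
table.learn("192.0.2.0/24", "site-B")

invalidated = table.withdraw_site("site-A")
```

   Here one site-level withdrawal replaces 1000 per-prefix withdrawals,
   and the routes of the unaffected site remain usable.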

3.3. 5G Edge Clouds

   The 5G edge clouds may host edge computing servers (virtual or
   physical) for ultra-low latency services that must be near the UEs
   (User Equipment). Those edge computing applications need to have
   very low latency to the UEs and connect to backend servers or
   databases in another location.

   The low-latency service traffic is transported to/from the servers
   (virtual and physical) in the edge Clouds via the 5G Local Data
   Networks (LDN). The LDN's ingress routers, directly connected to the
   UPFs, might be co-located with 5G Core functions in the edge Cloud
   data centers. The 5G Core functions include Radio Control Functions,
   Session Management Functions (SMF), Access and Mobility Management
   Functions (AMF), User Plane Functions (UPF), etc.

   Here are some network problems for 5G Edge Cloud DCs:

       1) The difference of routing distances to multiple server
          instances is relatively small.
       2) Capacity at the Edge Cloud DC might play a bigger role for
          E2E performance.
       3) Source (UEs) can ingress from different LDN Ingress routers
          due to mobility.

   To get around problem 1), the ingress routers can combine the
   destination site's capabilities with the routing distance when
   choosing the optimal paths.
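
   One way to picture this combination is a cost that adds a load
   penalty to the routing distance. The sketch below is an assumption
   made for illustration only: the EdgeSite fields, the linear cost
   formula, and the load_weight value are hypothetical and are not
   defined by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeSite:
    name: str
    routing_distance: float  # e.g., an IGP metric toward the site
    load: float              # fraction of site capacity in use, 0.0 to 1.0


def site_cost(site, load_weight=10.0):
    """Combine routing distance with a penalty for a loaded site."""
    return site.routing_distance + load_weight * site.load


def pick_site(sites, load_weight=10.0):
    """Choose the site with the lowest combined cost."""
    return min(sites, key=lambda s: site_cost(s, load_weight))


sites = [
    EdgeSite("edge-dc-1", routing_distance=2.0, load=0.9),
    EdgeSite("edge-dc-2", routing_distance=3.0, load=0.2),
]
by_distance_only = pick_site(sites, load_weight=0.0)
by_combined_cost = pick_site(sites)
```

   With distance alone, the nearly saturated edge-dc-1 is chosen; once
   the load is taken into account, the slightly more distant edge-dc-2
   becomes the better choice.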

3.4. Security Issues

   Networking to clouds raises many security issues:

     - Service instances in Cloud DCs are connected to users
        (enterprises) via Public IP ports, which are exposed to the
        following security risks:

        a) Potential DDoS attacks on the ports facing the untrusted
        network (e.g., the public Internet), which may propagate to the
        cloud edge resources. To mitigate this risk, it is necessary
        to enable Anti-DDoS features on the ports facing the Internet.

        b) Potential risk of augmenting the attack surface with inter-
        Cloud DC connection by means of identity spoofing, man-in-the-
        middle, eavesdropping or DDoS attacks. One example of
        mitigating such attacks is using DTLS to authenticate and
        encrypt MPLS-in-UDP encapsulation (RFC 7510).

     - Potential attacks from service instances within the cloud, for
        example, data breaches, compromised credentials, broken
        authentication, hacked interfaces and APIs, and account
        hijacking.

     - Securing user identity management, authentication, and access
        control mechanisms is important. Developing appropriate
        security mechanisms (including tools to assess the robustness
        of the enforced security policies) can enhance the confidence
        needed by enterprises to fully take advantage of Cloud
        Services.

   Many Cloud operators offer monitoring services for data stored in
   Clouds, such as AWS CloudTrail and Azure Monitor, and many
   third-party monitoring tools are available to improve visibility
   into data stored in Clouds. More diligent security procedures need
   to be considered to mitigate all those security issues.

3.5. DNS for Hybrid Workloads

   DNS name resolution is essential for on-premises and cloud-based
   resources. For customers with hybrid workloads, which include on-
   premises and cloud-based resources, extra steps are necessary to
   configure DNS to work seamlessly across both environments.

   Cloud operators have their own DNS to resolve resources within their
   Cloud DCs and to well-known public domains. Cloud's DNS can be
   configured to forward queries to customer managed authoritative DNS
   servers hosted on-premises, and to respond to DNS queries forwarded
   by on-premises DNS servers.

   For enterprises utilizing Cloud services from different cloud
   operators, it is necessary to establish policies and rules on
   how/where to forward DNS queries. When applications in one Cloud
   need to communicate with applications hosted in another Cloud, DNS
   queries from one Cloud DC might be forwarded to the enterprise's
   on-premises DNS, which in turn forwards them to the DNS service in
   another Cloud. Configuration can be complex depending on the
   application communication patterns.
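
   Such forwarding policies are often expressed as a mapping from
   domain suffixes to resolvers, with the most specific suffix winning.
   The following is a minimal sketch of that selection logic; the
   function name, the zone names, and the documentation-range resolver
   addresses are all hypothetical and do not describe any Cloud
   operator's DNS service.

```python
def pick_forwarder(qname, rules, default):
    """Return the resolver for the longest matching domain suffix.

    `rules` maps a domain suffix to the resolver that should receive
    queries for names under that suffix.
    """
    labels = qname.rstrip(".").lower().split(".")
    best = None  # (number of matching labels, forwarder)
    for suffix, forwarder in rules.items():
        suffix_labels = suffix.rstrip(".").lower().split(".")
        # Compare whole labels so "notexample.com" never matches
        # the suffix "example.com".
        if labels[-len(suffix_labels):] == suffix_labels:
            if best is None or len(suffix_labels) > best[0]:
                best = (len(suffix_labels), forwarder)
    return best[1] if best else default


# Assumed policy: one zone per Cloud, everything else on-premises.
rules = {
    "example.com": "192.0.2.53",               # on-premises DNS
    "cloud-a.example.com": "198.51.100.53",    # DNS service in Cloud A
    "cloud-b.example.com": "203.0.113.53",     # DNS service in Cloud B
}
public_resolver = "192.0.2.1"

to_cloud_a = pick_forwarder("db.cloud-a.example.com.", rules, public_resolver)
to_on_prem = pick_forwarder("hr.example.com", rules, public_resolver)
to_public = pick_forwarder("www.example.org", rules, public_resolver)
```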

   However, even with carefully managed policies and configurations,
   collisions can still occur. If an organization uses an internal name
   like .cloud and then wants its services to be available via or
   within some other cloud provider that also uses .cloud, name
   resolution cannot work. Therefore, it is better to use a global
   domain name even when an organization does not make all its
   namespace globally resolvable. An organization's globally unique DNS
   can include subdomains that cannot be resolved at all outside
   certain restricted paths, zones that resolve differently based on
   the origin of the query, and zones that resolve the same globally
   for all queries from any source.

   Globally unique names do not equate to globally resolvable names or
   even global names that resolve the same way from every perspective.
   Globally unique names do prevent any possibility of collision at the
   present or in the future and they make DNSSEC trust manageable.
   Consider using a registered and fully qualified domain name (FQDN)
   from global DNS as the root for enterprise and other internal
   namespaces.

3.6. NAT for Cloud Services

   Cloud resources, such as VM instances, are usually assigned private
   IP addresses. By configuration, some private subnets can have the
   NAT function to reach out to external networks, and some private
   subnets are internal to Cloud only.

   Different Cloud operators support different levels of NAT functions.
   For example, AWS NAT Gateway does not currently support connections
   towards or from VPC Endpoints, VPN, AWS Direct Connect, or VPC
   Peering; see
   https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-nat-gateway.html#nat-gateway-other-services.
   AWS Direct Connect/VPN/VPC Peering does not currently support any
   NAT functionality.

   Google's Cloud NAT allows Google Cloud virtual machine (VM)
   instances without external IP addresses and private Google
   Kubernetes Engine (GKE) clusters to connect to the Internet. Cloud
   NAT implements outbound NAT in conjunction with a default route to
   allow instances to reach the Internet. It does not implement inbound
   NAT. Hosts outside the VPC network can only respond to established
   connections initiated by instances inside the Google Cloud; they
   cannot initiate new connections to Cloud instances via NAT.
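
   The outbound-only behavior described above can be modeled with a
   small translation table. This sketch is not Google's implementation;
   the OutboundNat class, the port allocation scheme, and the addresses
   are hypothetical. It only shows that inbound packets are accepted
   when they match a mapping created by an earlier outbound connection.

```python
class OutboundNat:
    """Hypothetical outbound-only NAT with a per-connection mapping."""

    def __init__(self, public_ip, first_port=1024):
        self.public_ip = public_ip
        self._next_port = first_port
        # (public_port, remote endpoint) -> (private_ip, private_port)
        self._mappings = {}

    def outbound(self, private_ip, private_port, remote):
        """Translate a new outbound connection and remember the mapping.

        Simplification: every call allocates a fresh public port.
        """
        public_port = self._next_port
        self._next_port += 1
        self._mappings[(public_port, remote)] = (private_ip, private_port)
        return (self.public_ip, public_port)

    def inbound(self, public_port, remote):
        """Return the private endpoint for a reply, or None to drop it."""
        return self._mappings.get((public_port, remote))


nat = OutboundNat("203.0.113.5")
server = ("198.51.100.7", 443)
translated_src = nat.outbound("10.0.0.4", 40000, server)

reply = nat.inbound(translated_src[1], server)          # established flow
unsolicited = nat.inbound(translated_src[1], ("198.51.100.99", 443))
```

   The reply from the contacted server is translated back to the
   private instance, while a packet from a host that the instance never
   contacted finds no mapping and is dropped.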

   For enterprises with applications running in different Cloud DCs,
   proper configuration of NAT must be performed in Cloud DCs and their
   on-premises DC.

3.7. Cloud Discovery

   One concern with using Cloud services is not being aware of where
   resources are located, especially since Cloud operators can move
   application instances from one place to another. When applications
   in the Cloud communicate with on-premises applications, it may not
   be clear where the Cloud applications are located or to which VPCs
   they belong.

   It is highly desirable to have tools to discover cloud services in
   the same way that on-premises infrastructure is discovered. A
   significant difference is that cloud discovery uses the cloud
   vendor's API to extract data on cloud services rather than the
   direct access used in scanning on-premises infrastructure.

   Being able to detect Cloud services location can also help on-
   premises gateways (routers) to switch the services to a more optimal
   site when the current cloud site encounters failures or degradation.

4. Interconnecting Enterprise Sites with Cloud DCs

   For many enterprises with established VPNs (e.g., MPLS-based L2VPN
   or L3VPN) interconnecting branch offices and on-premises data
   centers, connecting to Cloud services will involve a mix of
   different types of networks. When an enterprise's existing VPN
   service providers do not have direct connections to the cloud DCs
   that the enterprise prefers to use, the enterprise faces additional
   infrastructure and operational costs to utilize the Cloud services.

   This section describes some methods to minimize the costs of
   utilizing Cloud services.

4.1. Sites to Cloud DC

   Most Cloud operators offer some type of network gateway through
   which an enterprise can reach their workloads hosted in the Cloud
   DCs. For example, AWS (Amazon Web Services) offers the following
   options to reach workloads in AWS Cloud DCs:

     - AWS Internet gateway allows communication between instances in
        AWS VPC and the internet.
     - AWS Virtual gateway (vGW) where IPsec tunnels [RFC6071] are
        established between an enterprise's own gateway and AWS vGW, so
        that the communications between those gateways can be secured
        from the underlay (which might be the public Internet).
     - AWS Direct Connect, which allows enterprises to purchase direct
        connections from network service providers to get a private
        leased line interconnecting the enterprise's gateway(s) and the
        AWS Direct Connect routers. In addition, an AWS Transit Gateway
        can be used to interconnect multiple VPCs in different
        Availability Zones. AWS Transit Gateway acts as a hub that
        controls how traffic is forwarded among all the connected
        networks, which act like spokes.

   Microsoft's ExpressRoute allows extension of a private network to
   any of the Microsoft cloud services, including Azure and Office365.
   ExpressRoute is configured using Layer 3 routing. Customers can opt
   for redundancy by provisioning dual links from their location to two
   Microsoft Enterprise edge routers (MSEEs) located within a
   third-party ExpressRoute peering location. BGP is then set up over
   the WAN links to provide redundancy to the cloud. This redundancy is
   maintained from the peering data center into Microsoft's cloud
   network.

   Google's Cloud Dedicated Interconnect offers network connectivity
   options similar to those of AWS and Microsoft. One distinct
   difference, however, is that Google's service allows customers
   access to the entire global cloud network by default. It does this
   by connecting the on-premises network with the Google Cloud using
   BGP and Google Cloud Routers to provide optimal paths to the
   different regions of the global cloud infrastructure.

   Figure 1 below shows an example in which some of a tenant's
   workloads are accessible via a virtual router connected by an AWS
   Internet Gateway, some are accessible via an AWS vGW, and others are
   accessible via AWS Direct Connect.

   Different types of access require different levels of security
   functions. Sometimes it is not visible to end customers which type
   of network access is used for a specific application instance. To
   get better visibility, separate virtual routers (e.g., vR1 and vR2)
   can be deployed to differentiate traffic to/from different cloud
   GWs. It is important for some enterprises to be able to observe the
   specific behaviors of each type of connection.

   The customer gateway can be a customer-owned router or ports
   physically connected to the AWS Direct Connect GW.

     +------------------------+
     |    ,---.         ,---. |
     |   (TN-1 )       ( TN-2)|
     |    `-+-'  +---+  `-+-' |
     |      +----|vR1|----+   |
     |           ++--+        |
     |            |         +-+----+
     |            |        /Internet\ For External
     |            +-------+ Gateway  +----------------------
     |                     \        / to reach via Internet
     |                      +-+----+
     |                        |
     |    ,---.         ,---. |
     |   (TN-1 )       ( TN-2)|
     |    `-+-'  +---+  `-+-' |
     |      +----|vR2|----+   |
     |           ++--+        |
     |            |         +-+----+
     |            |        / virtual\ For IPsec Tunnel
     |            +-------+ Gateway  +----------------------
     |            |        \        /  termination
     |            |         +-+----+
     |            |           |
     |            |         +-+----+              +------+
     |            |        /        \ For Direct /customer\
     |            +-------+ Gateway  +----------+ gateway  |
     |                     \        /  Connect   \        /
     |                      +-+----+              +------+
     |                        |
     +------------------------+

     Figure 1: Examples of Multiple Cloud DC connections.

4.2. Inter-Cloud Interconnection

   The connectivity options to Cloud DCs described in the previous
   section are for reaching Cloud providers' DCs, but not for
   connecting cloud DCs to each other. When applications in the AWS
   Cloud need to communicate with applications in Azure, today's
   practice requires a third-party gateway (physical or virtual) to
   interconnect AWS's Layer 2 Direct Connect path with Azure's Layer 3
   ExpressRoute.

   Enterprises can also instantiate their own virtual routers in
   different Cloud DCs and administer IPsec tunnels among them, which
   by itself is not a trivial task. Alternatively, by leveraging
   open-source VPN software such as strongSwan, an enterprise can
   create an IPsec connection to the Azure gateway using a shared key.
   The strongSwan instance within AWS can not only connect to Azure but
   can also be used to facilitate traffic to other nodes within the AWS
   VPC by configuring forwarding and using appropriate routing rules
   for the VPC.

   The virtual networks of most Cloud operators, such as AWS VPC or
   Azure VNET, use non-globally routable CIDR blocks from the private
   IPv4 address ranges specified by RFC1918. To establish an IPsec
   tunnel between two Cloud DCs, it is necessary to exchange publicly
   routable addresses for applications in different Cloud DCs.
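
   Because these private ranges are reused independently in each Cloud
   DC, address blocks in different clouds can overlap. The following
   minimal sketch, which assumes IPv4-only input and uses hypothetical
   function names and CIDR values, checks whether a block falls within
   the RFC1918 ranges and lists the blocks that overlap each other.

```python
import ipaddress

# The three private IPv4 ranges specified by RFC1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(cidr):
    """True if the IPv4 block lies entirely inside an RFC1918 range."""
    network = ipaddress.ip_network(cidr)
    return any(network.subnet_of(block) for block in RFC1918_BLOCKS)


def overlapping_pairs(cidrs):
    """Return every pair of blocks that share at least one address."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    return [
        (str(a), str(b))
        for i, a in enumerate(networks)
        for b in networks[i + 1:]
        if a.overlaps(b)
    ]


# Assumed address plans of virtual networks in two different clouds.
cloud_blocks = ["10.0.0.0/16", "10.0.128.0/17", "192.168.1.0/24"]
conflicts = overlapping_pairs(cloud_blocks)
```

   Any pair reported by overlapping_pairs cannot be routed between the
   two Cloud DCs by private address alone, which is one reason publicly
   routable addresses need to be exchanged.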

   In summary, here are some approaches available now to interconnect
   workloads among different Cloud DCs:

     a) Utilize Cloud DC provided inter/intra-cloud connectivity
        services (e.g., AWS Transit Gateway) to connect workloads
        instantiated in multiple VPCs. Such services are provided with
        the cloud gateway to connect to external networks (e.g., AWS
        DirectConnect Gateway).
     b) Hairpin all traffic through the customer gateway, meaning all
        workloads are directly connected to the customer gateway, so
        that communications among workloads within one Cloud DC must
        traverse through the customer gateway.
     c) Establish direct tunnels among different VPCs (AWS' Virtual
        Private Clouds) and VNETs (Azure's Virtual Networks) via the
        client's own virtual routers instantiated within Cloud DCs.
        DMVPN (Dynamic Multipoint Virtual Private Network) or DSVPN
        (Dynamic Smart VPN) techniques can be used to establish direct
        multipoint-to-point or multipoint-to-multipoint tunnels among
        those virtual routers.

   Approach a) usually does not work if Cloud DCs are owned and managed
   by different Cloud providers.

   Approach b) adds transmission delay and incurs cost when traffic
   exits Cloud DCs.

   For Approach c), DMVPN or DSVPN use NHRP (Next Hop Resolution
   Protocol) [RFC2735] so that spoke nodes can register their IP
   addresses and WAN ports with the hub node. The IETF ION
   (Internetworking over NBMA (non-broadcast multiple access)) WG
   standardized NHRP for address resolution in connection-oriented
   NBMA networks (such as ATM) more than two decades ago.

   There are many differences between virtual routers in Public Cloud
   DCs and the nodes in an NBMA network. NHRP cannot be used for
   registering virtual routers in Cloud DCs unless an extension of such
   protocols is developed for that purpose, e.g. taking NAT or dynamic
   addresses into consideration. Therefore, DMVPN and/or DSVPN cannot
   be used directly for connecting workloads in hybrid Cloud DCs.
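
   The NAT and dynamic-address concerns can be illustrated with a toy
   hub-side registration table. This sketch is neither NHRP nor a
   proposed extension; the HubRegistry class and its fields are
   hypothetical. It only shows why a registration that carries the
   spoke's own view of its address is insufficient when a NAT rewrites
   the source, and why entries must be replaceable when addresses
   change.

```python
class HubRegistry:
    """Hypothetical hub table: tunnel address -> reachable WAN address."""

    def __init__(self):
        self._entries = {}

    def register(self, tunnel_ip, claimed_wan_ip, observed_src_ip):
        """Record a spoke; return True if a NAT appears to be in the path.

        `claimed_wan_ip` is what the spoke believes its address is;
        `observed_src_ip` is the source address the hub actually saw.
        """
        behind_nat = claimed_wan_ip != observed_src_ip
        # Store the observed address: a translated private address is
        # not reachable from outside the Cloud DC.
        self._entries[tunnel_ip] = observed_src_ip
        return behind_nat

    def resolve(self, tunnel_ip):
        return self._entries.get(tunnel_ip)


hub = HubRegistry()
nat_detected = hub.register("172.16.0.2", "10.0.1.5", "203.0.113.9")
before_change = hub.resolve("172.16.0.2")

# The virtual router is re-instantiated and gets a new public address.
hub.register("172.16.0.2", "10.0.1.5", "203.0.113.77")
after_change = hub.resolve("172.16.0.2")
```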

5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs

   Traditional MPLS-based VPNs have been widely deployed as an
   effective way to support businesses and organizations that require
   network performance and reliability. MPLS shifted the burden of
   managing a VPN service from enterprises to service providers. The
   CPEs attached to MPLS VPNs are also simpler and less expensive,
   because they do not need to manage routes to remote sites; they
   simply pass all outbound traffic to the MPLS VPN PEs to which the
   CPEs are attached (albeit multi-homing scenarios require more
   processing logic on CPEs).  MPLS has addressed the problems of
   scale, availability, and fast recovery from network faults, and
   incorporated traffic-engineering capabilities.

   However, traditional MPLS-based VPN solutions are suboptimal for
   connecting end-users to dynamic workloads/applications in cloud DCs
   because:

     - The Provider Edge (PE) nodes of the enterprise's VPNs might not
        have direct connections to the third-party cloud DCs that host
        workloads with the goal of providing easy access for the
        enterprise's end-users.

     - It takes some time to deploy provider edge (PE) routers at new
        locations. When an enterprise's workloads are moved from one
        cloud DC to another (i.e., removed from one DC and
        re-instantiated at another location when demand changes), the
        enterprise branch offices need to be connected to the new cloud
        DC, but the network service provider might not have PEs located
        at the new location.

        One of the main drivers for moving workloads into the cloud is
        the widely available cloud DCs at geographically diverse
        locations, where apps can be instantiated so that they can be
        as close to their end-users as possible. When the user base
        changes, the applications may be migrated to a new cloud DC
        location closest to the new user base.

     - Most of the cloud DCs do not expose their internal networks. An
        enterprise with a hybrid cloud deployment can use an MPLS-VPN
        to connect to a Cloud provider at multiple locations.  The
        connection locations often correspond to gateways of different
        Cloud DC locations from the Cloud provider.  The different
        Cloud DCs are interconnected by the Cloud provider's own
        internal network.  At each connection location (gateway), the
        Cloud provider uses BGP to advertise all of the prefixes in the
        enterprise's VPC, regardless of which Cloud DC a given prefix
        is actually in. This can result in inefficient routing for the
        end-to-end data path.
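
   The inefficient routing described in the last item can be sketched
   as follows. The topology, prefixes, path costs, and function names
   are all hypothetical. The sketch assumes a branch office that, given
   identical advertisements from every gateway, selects the gateway
   that is cheapest to reach.

```python
# Hypothetical topology: which Cloud DC actually hosts each VPC prefix.
PREFIX_LOCATION = {
    "10.1.0.0/16": "dc-east",
    "10.2.0.0/16": "dc-west",
}
GATEWAYS = ["dc-east", "dc-west"]


def advertisements():
    """Every gateway advertises every prefix in the VPC, so the
    enterprise cannot tell from BGP where a prefix really lives."""
    return {gw: sorted(PREFIX_LOCATION) for gw in GATEWAYS}


def chosen_gateway(cost_to_gateway):
    """With identical advertisements, the branch simply picks the
    gateway that is cheapest to reach."""
    return min(cost_to_gateway, key=cost_to_gateway.get)


def detoured_prefixes(cost_to_gateway):
    """Prefixes whose traffic enters at one DC but is hosted in
    another, and therefore crosses the Cloud provider's internal
    network."""
    entry = chosen_gateway(cost_to_gateway)
    return [p for p, dc in sorted(PREFIX_LOCATION.items()) if dc != entry]


branch_costs = {"dc-east": 10, "dc-west": 40}  # hypothetical path costs
print(advertisements())
print(detoured_prefixes(branch_costs))
```

   Here all traffic enters at dc-east, so traffic for the prefix hosted
   in dc-west takes a detour that the enterprise can neither see nor
   control.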

   Another roadblock is the lack of a standard way to express and
   enforce consistent security policies for workloads that not only use
   virtual addresses, but are also very likely hosted in different
   locations within the Cloud DC [RFC8192]. The current VPN path
   computation and bandwidth allocation schemes may not be flexible
   enough to address the need for enterprises to rapidly connect to
   dynamically instantiated (or removed) workloads and applications
   regardless of their location/nature (i.e., third-party cloud DCs).

6. Problem with using IPsec tunnels to Cloud DCs

   As described in the previous section, many Cloud operators expose
   their gateways for external entities (which can be enterprises
   themselves) to directly establish IPsec tunnels. Enterprises can
   also instantiate virtual routers within Cloud DCs to connect to
   their on-premises devices via IPsec tunnels.

6.1. Scaling Issues with IPsec Tunnels

   If there is only one enterprise location that needs to reach the
   Cloud DC, an IPsec tunnel is a very convenient solution.

   However, many medium-to-large enterprises have multiple sites and
   multiple data centers. For multiple sites to communicate with
   workloads and apps hosted in cloud DCs, Cloud DC gateways have to
   maintain many IPsec tunnels to all those locations. In addition,
   each of those IPsec tunnels requires pair-wise periodic rekeying.
   For a company with hundreds or thousands of locations, there could
   be hundreds (or even thousands) of IPsec tunnels terminating at the
   cloud DC gateway, which is very processing-intensive. That is why
   many cloud operators only allow a limited number of (IPsec) tunnels
   and limited bandwidth to each customer.

   Alternatively, a group encryption solution can be used, in which
   only a single IPsec SA is needed at the gateway; the drawback is
   the need for key distribution and the maintenance of a key server.
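
   The scaling difference between pair-wise tunnels and a single group
   SA can be estimated with simple arithmetic. The sketch below is a
   rough model under stated assumptions: one tunnel per enterprise
   site, one rekey per SA lifetime, and hypothetical values for the
   number of sites and the SA lifetime. It does not model the load on
   the key server that group encryption requires.

```python
def gateway_tunnels(sites: int) -> int:
    """Pair-wise IPsec: one tunnel per enterprise site at the gateway."""
    return sites


def rekeys_per_day(tunnels: int, sa_lifetime_s: int) -> int:
    """Number of periodic rekey exchanges the gateway takes part in
    per day, assuming each tunnel rekeys once per SA lifetime."""
    return tunnels * (86_400 // sa_lifetime_s)


SITES = 1000          # hypothetical enterprise size
SA_LIFETIME_S = 3600  # hypothetical one-hour SA lifetime

pairwise = rekeys_per_day(gateway_tunnels(SITES), SA_LIFETIME_S)
group = rekeys_per_day(1, SA_LIFETIME_S)  # one shared SA at the gateway
print(pairwise, group)
```

   Under these assumptions the gateway handles three orders of
   magnitude fewer rekey exchanges with a group SA, at the price of
   operating the key distribution infrastructure.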

6.2. Poor Performance over the Public Internet

   When a large number of IPsec encapsulation and decapsulation
   operations are needed, performance degrades. NAT adds a further
   performance burden.

   When enterprise CPEs or gateways are far away from cloud DC gateways
   or across country/continent boundaries, performance of IPsec tunnels
   over the public Internet can be problematic and unpredictable. Even
   though there are many monitoring tools available to measure delay
   and various performance characteristics of the network, the
   measurement for paths over the Internet is passive and past
   measurements may not represent future performance.

   Many cloud providers can replicate workloads in different
   availability zones. An App instantiated in a cloud DC closest to
   clients may have to cooperate with another App (or its mirror
   image) in another region or database server(s) in the on-premises
   DC. This kind of coordination requires predictable networking
   behavior/performance among those locations.

7. End-to-End Security Concerns for Data Flows

   When IPsec tunnels established from enterprise on-premises CPEs are
   terminated at the Cloud DC gateway where the workloads or
   applications are hosted, some enterprises have concerns regarding
   traffic to/from their workload being exposed to others behind the
   data center gateway (e.g., exposed to other organizations that have
   workloads in the same data center).

   To ensure that traffic to/from workloads is not exposed to unwanted
   entities, IPsec tunnels may go all the way to the workload (servers,
   or VMs) within the DC.

8. Requirements for Dynamic Cloud Data Center VPNs

   To address the aforementioned issues, any solution for enterprise
   VPNs that includes connectivity to dynamic workloads or applications
   in cloud data centers should satisfy a set of requirements:

     - The solution should allow enterprises to take advantage of the
        current state-of-the-art in VPN technology, in both traditional
        MPLS-based VPNs and IPsec-based VPNs (or any combination
        thereof) that run over the public Internet.
     - The solution should not require an enterprise to upgrade all
        their existing CPEs.
     - The solution should support scalable IPsec key management among
        all nodes involved in DC interconnect schemes.
     - The solution needs to support easy and fast, on-the-fly, VPN
        connections to dynamic workloads and applications in third
        party data centers, and easily allow these workloads to migrate
        both within a data center and between data centers.
     - The solution should allow VPNs to provide bandwidth and other
        performance guarantees.
     - The solution should be cost-effective for enterprises to
        incorporate dynamic cloud-based applications and workloads into
        their existing VPN environment.

9. Security Considerations

   The draft discusses security requirements as a part of the problem
   space, particularly in sections 4, 5, 7, and 8.

   Solution drafts resulting from this work will address security
   concerns inherent to the solution(s), including both protocol
   aspects and the importance (for example) of securing workloads in
   cloud DCs and the use of secure interconnection mechanisms.

10. IANA Considerations

   This document requires no IANA actions. RFC Editor: Please remove
   this section before publication.

11. References

11.1. Normative References

11.2. Informative References

   [RFC2735]  B. Fox, et al., "NHRP Support for Virtual Private
              Networks", Dec 1999.

   [RFC8192]  S. Hares, et al., "Interface to Network Security
              Functions (I2NSF) Problem Statement and Use Cases",
              July 2017.

   [ITU-T-X1036] ITU-T Recommendation X.1036, "Framework for creation,
              storage, distribution and enforcement of policies for
              network security", Nov 2007.

   [RFC6071]  S. Frankel and S. Krishnan, "IP Security (IPsec) and
              Internet Key Exchange (IKE) Document Roadmap", Feb 2011.

   [RFC4364]  E. Rosen and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", Feb 2006.

   [RFC4664]  L. Andersson and E. Rosen, "Framework for Layer 2 Virtual
              Private Networks (L2VPNs)", Sept 2006.

12. Acknowledgments

   Many thanks to Alia Atlas, Chris Bowers, Paul Vixie, Paul Ebersman,
   Timothy Morizot, Ignas Bagdonas, Michael Huang, Liu Yuan Jiao,
   Katherine Zhao, and Jim Guichard for the discussion and
   contributions.

Authors' Addresses

   Linda Dunbar
   Futurewei
   Email: Linda.Dunbar@futurewei.com

   Andrew G. Malis
   Malis Consulting
   Email: agmalis@gmail.com

   Christian Jacquenet
   Orange
   Rennes, 35000
   France
   Email: Christian.jacquenet@orange.com

   Mehmet Toy
   Verizon
   One Verizon Way
   Basking Ridge, NJ 07920
   Email: mehmet.toy@verizon.com

   Kausik Majumdar
   Microsoft Azure
   Email: kmajumdar@microsoft.com
