
Neotec (Network Operations in Telecom Cloud)
bofreq-xie-neotec-network-operations-in-telecom-cloud-00

The information below is for an older version of this BOF request.
Document Type Proposed BOF request Snapshot
Title Neotec (Network Operations in Telecom Cloud)
Last updated 2025-01-14
State Proposed
Editors Jie Dong , Linda Dunbar , Chongfeng Xie
Responsible leadership
Send notices to (None)

Name

Neotec (Network Operations in Telecom Cloud)

Note: After the BoF a different name may be chosen for the potential WG.
(Suggestion: keep the name Neotec at least for the BoF, which makes good use of the Neotec mailing list and the momentum of the past side meetings. If a better name is found for the WG, a new mailing list can be created and participants redirected to it.)

Description

The Neotec initiative is proposed in response to the emergence of "Telecom Cloud Service Providers (TCSPs)" in the era of cloud computing. Here, TCSPs are network operators who have deployed their own cloud infrastructure, particularly numerous edge clouds that host services with strict performance requirements, such as low latency or high bandwidth. The performance of TCSP cloud service delivery relies heavily on efficient joint management and coordination between the clouds and the network infrastructure that interconnects them.

[Problem statement] Telecom edge clouds deliver compute, storage, and networking closer to customer locations (e.g., customer premises or distributed data centers (DCs)) to fulfill the connectivity and Service Level Objective (SLO) requirements of user applications. However, telecom edge clouds can have different types of processing resources (CPU, GPU, FPGA, etc.) and are dimensioned with less resource capacity than a conventional DC. Also, there is limited interaction or coordination between the cloud system and the network controller, even though both are managed by the same operator. This poses several challenges:
1) Network operation is unaware of edge cloud resource and service status. When service function instances are scaled up or down or relocated, or when traffic matrices between service functions change, the network is unable to promptly adjust its resources to accommodate these changes.
2) Without visibility into network SLO status and available resources, the cloud system struggles to ensure that the deployment of service function instances in edge DCs meets the customer's strict SLO requirements on bandwidth, latency, and packet loss, e.g., for SD-WAN and Secure Access Service Edge (SASE) functions and AI/ML applications.
3) The network is unaware of the traffic characteristics of telecom edge cloud applications, which can reduce service performance. Take AI-cluster training as an example: the overall performance and efficiency of the training is constrained by the slowest flow, so if the network does not obtain the flow characteristics of AI training (e.g., IP addresses, volume, or timeframe), uneven load distribution and low network throughput may result.
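The "slowest flow" effect in challenge 3 can be illustrated with a small calculation. This is a minimal sketch, assuming a simplified model in which a training step completes only when every flow has finished and each flow transfers at a fixed rate; the function name and the example volumes and rates are hypothetical and are not taken from this BOF request.

```python
def step_completion_time(flows):
    """Return the seconds until the slowest flow finishes.

    flows: list of (volume_bits, rate_bits_per_second) tuples.
    Assumption: the training step ends only when all flows are done.
    """
    return max(volume / rate for volume, rate in flows)


# Two placements that both use 8 Gbit/s of total network rate
# for two flows of 8 Gbit each (illustrative numbers only).
balanced = [(8e9, 4e9), (8e9, 4e9)]  # load spread evenly
skewed = [(8e9, 7e9), (8e9, 1e9)]    # one flow stuck on a congested path

print(step_completion_time(balanced))  # 2.0
print(step_completion_time(skewed))    # 8.0
```

With the same aggregate rate, the uneven placement takes four times as long in this model, which is why knowledge of flow volume and timing matters to the network.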

[Scope] The Neotec WG focuses on introducing a cloud-aware service orchestrator and investigating the interfaces between it and existing network controllers and cloud managers. Here, the term “cloud-aware service orchestrator” refers to a system that can dynamically manage and optimize services across clouds and networks (e.g., fixed and mobile access networks, metro networks, or backbone networks). Specifically, these interfaces support the following functions:
1) Edge cloud resource and service status exposure: Expose the metrics of edge clouds and the characteristics of the services hosted in them so that the network controller can adjust a path, its bandwidth, or its priority to accommodate the service requirements. For example, computing metrics collected from a cloud system such as Kubernetes or OpenStack are transformed into ones that the network can understand.
2) Inter-edge cloud network function and status exposure: Expose, for example, the network topology and connectivity performance status among edge clouds to the cloud-aware service orchestrator for service placement and deployment, including the status of service functions associated with VPNs and the status of SD-WAN and SASE.
3) Dynamic adjustment of load-balancing policies: An effective load-balancing policy is essential to guarantee end-to-end performance and maximize overall throughput for certain flows among edge clouds during a specific time span. To achieve the best overall load balancing, the cloud-aware orchestrator uses its understanding of the bandwidth requirements of each service flow to issue a traffic scheduling policy to the network controller through the interfaces. The controller can use the policy to determine the optimal forwarding path for each flow.
As TCSPs often rely on equipment and controllers from multiple vendors, standardized and interoperable solutions are essential to seamlessly integrate cloud and network resources.
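The load-balancing function in item 3 can be sketched as follows. This is an illustrative greedy heuristic, not a method proposed by this BOF request: it assumes the orchestrator knows each flow's bandwidth demand and that the controller knows each candidate path's capacity. The names `Path` and `schedule_flows` and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """A candidate path between two edge clouds (hypothetical model)."""
    name: str
    capacity_mbps: float
    reserved_mbps: float = 0.0

    @property
    def residual_mbps(self) -> float:
        return self.capacity_mbps - self.reserved_mbps


def schedule_flows(flow_demands_mbps, paths):
    """Greedily map flows to paths, largest demand first.

    Each flow goes to the path with the most residual bandwidth,
    which keeps the load roughly even across paths.
    Returns a dict of flow id -> path name.
    """
    assignment = {}
    ordered = sorted(flow_demands_mbps.items(),
                     key=lambda item: item[1], reverse=True)
    for flow_id, demand in ordered:
        best = max(paths, key=lambda p: p.residual_mbps)
        if best.residual_mbps < demand:
            raise ValueError(f"no path can carry flow {flow_id!r}")
        best.reserved_mbps += demand
        assignment[flow_id] = best.name
    return assignment


paths = [Path("path-A", 100.0), Path("path-B", 100.0)]
demands = {"flow-1": 60.0, "flow-2": 50.0, "flow-3": 40.0}
print(schedule_flows(demands, paths))
# {'flow-1': 'path-A', 'flow-2': 'path-B', 'flow-3': 'path-B'}
```

A real controller would also weigh latency, packet loss, and the time span of each flow; the sketch only shows why per-flow bandwidth requirements are a useful input to path selection.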

[Use cases] The first use case is micro-service deployment in the clouds. This requires dynamic placement of service instances, and multiple instances of the same service function can be distributed across cloud facilities at the edge or in core clouds. The performance of micro-services provided to a customer relies both on the availability of computing resources in the cloud and on the topology and congestion conditions of the networks among the DCs, which are needed to build optimized service function paths.
A second use case is the cross-DC scheduling of computing and storage resources. Given the constraints of computing resources and storage costs, it is crucial to facilitate storage migration, analysis, and processing of large-scale data across various DCs. Using the real-time view of cloud computing and network resource status provided by the cloud, the network controller can dynamically allocate the storage and computing resources that connect to the metro area networks. This capability allows for flexible cross-DC storage, data modeling, and AI training, ensuring optimal service quality and efficient resource utilization.
A third use case is Machine Learning (ML) and Federated ML applications in 5G and beyond. These applications demand massive computing resources that may be spread across multiple cloud DCs, and they also rely on the network to provide on-demand connections with the required bandwidth and latency. The performance and efficiency of such applications can be improved by dynamic coordination between the cloud-aware service orchestrator and network controllers to optimize computing resource utilization and network throughput.

[BOF intro] The Neotec BoF will discuss several use cases where better coordination between network controllers and the cloud-aware service orchestrator is needed, enabling the exchange of resources, attributes, status, requirements, and policies between these domains. It will analyze the gaps in existing IETF work on coordination between network and clouds in management and operation, and aims to identify the potential work needed in the IETF, such as YANG model extensions and potentially a shim layer that the cloud-aware service orchestrator would use to abstract the APIs provided by different cloud implementations.

Neotec will also aim to serve as a platform for the industry to exchange requirements, challenges, and experiences related to coordinated network operations for cloud-based services.

Required Details

  • Status: Non-WG Forming
  • Responsible AD name: Mahesh Jethanandani
  • BOF proponents: Chongfeng Xie (chongfeng.xie@foxmail.com), Luis M. Contreras (luismiguel.contrerasmurillo@telefonica.com), Gyan Mishra (hayabusagsm@gmail.com), Linda Dunbar (linda.dunbar@futurewei.com)
  • Number of people expected to attend: 100
  • Length of session (1 or 2 hours): 2 hours
  • Conflicts (whole Areas and or WGs)
  • Chair Conflicts TBD
  • Technology Overlap: OPSAWG, NMOP, CATS, TEAS
  • Key Participant Conflict: Joel Halpern, Diego R. López, Mohamed Boucadair, Chongfeng Xie, Zhenbin Li, Luis M. Contreras, Linda Dunbar, Jie Dong, Gyan Mishra, Bo Wu, Ran Pang, Houda Chihi, Daniel Bernier, Benoit Claise, Peng Liu, Tianran Zhou, Qiong Sun, Guangping Huang

Information for IAB/IESG

To allow evaluation of your proposal, please include the following items

  • Any protocols or practices that already exist in this space

[Gap analysis] Several IETF WGs currently work on topics related to Neotec:
-OPSAWG deals with operational and management topics that are not in scope of an existing OPS area working group and do not justify the formation of a new working group. OPSAWG has already defined connection interfaces such as Attachment Circuits (AC) between cloud gateways and network edge devices (draft-ietf-opsawg-teas-attachment-circuit) and Service Attachment Points (SAPs) (RFC 9408), as well as interfaces for network and VPN service topology and performance status (RFC 9375). None of these documents covers the traffic flow scheduling policy interface. In addition, the WG is publishing the ACaaS YANG model. This model can be used to provision ACs before or during computing service deployment to connect a cloud infrastructure to a service provider network. Although it provides bearer services for communication between cloud DCs, it does not address the exposure of network resources to the cloud or the real-time environmental status of service function instances, which are crucial for making dynamic network path decisions.
-VPN service models (RFC 8299 L3SM, RFC 8466 L2SM) have been produced by several WGs and can be used for connection services between DCs. None of them covers the traffic flow scheduling policy interface.
-TEAS WG is responsible for defining the traffic engineering architecture and identifying the required related routing and path computation element functions. It takes network capability and information into consideration for traffic engineering in the network, and it also delivers YANG models in support of traffic engineering. TEAS WG does not consider the coordination between network and cloud in management and operation. The Network Slice service model (draft-ietf-teas-ietf-network-slice-nbi-yang) can also be used for connection services with SLA requirements; however, it does not cover the traffic flow scheduling policy interface either.
-CATS WG focuses on the problem of how the network edge can steer traffic between clients of a compute service and sites offering the service. It works on a general framework for the distribution of compute and network metrics and the transport of traffic from the network edge to a service instance, and it identifies some common metrics (draft-ysl-cats-metric-definition) to be used for traffic steering at the network edge node. It does not consider the coordination between network and cloud in management and operation.
-Network Management Operations (NMOP) WG focuses on solving network management problems faced by operators. Currently it discusses operational issues encountered in the deployment of existing network management technologies, but it does not cover the management and operation problems of telecom clouds faced by TCSPs.

  • Which (if any) modifications to existing protocols or practices are required
    None.

  • Which (if any) entirely new protocols or practices are required
    Currently the following documents are expected to be produced:
    -Use cases
    -Problem statement/Gap Analysis
    -Requirements
    -Framework or Architecture
    -YANG models
    -Applicability and Deployment

In particular, new YANG model(s) need to be defined for the interface between the cloud-aware service orchestrators and network controllers. They should:
a. Expose the resources, topology, and metrics of the edge clouds in an appropriate format.
b. Provide a dynamic traffic scheduling policy for specific flows among edge clouds during a specific time span to meet the SLAs.
c. Expose the network functions and status among edge clouds for service placement and deployment.
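As an illustration of item b, the information such a policy might carry can be sketched as a plain data structure. This is a hypothetical sketch only: no YANG model is defined in this BOF request, and every field name below (flow selector, bandwidth, latency, time span) is an assumption chosen for illustration. The prefixes come from the IP documentation ranges.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from ipaddress import ip_network


@dataclass(frozen=True)
class TrafficSchedulingPolicy:
    """Hypothetical policy passed from orchestrator to network controller."""
    src_prefix: str            # flow source, e.g. an edge cloud subnet
    dst_prefix: str            # flow destination
    min_bandwidth_mbps: float  # SLA bandwidth for the flow
    max_latency_ms: float      # SLA latency bound
    start: datetime            # beginning of the time span
    end: datetime              # end of the time span (exclusive)

    def __post_init__(self):
        # ip_network raises ValueError for malformed prefixes.
        ip_network(self.src_prefix)
        ip_network(self.dst_prefix)
        if self.end <= self.start:
            raise ValueError("time span must end after it starts")

    def is_active(self, now: datetime) -> bool:
        """True while the policy's time span covers `now`."""
        return self.start <= now < self.end


begin = datetime(2025, 3, 1, 2, 0)
policy = TrafficSchedulingPolicy(
    src_prefix="192.0.2.0/24",
    dst_prefix="198.51.100.0/24",
    min_bandwidth_mbps=400.0,
    max_latency_ms=10.0,
    start=begin,
    end=begin + timedelta(hours=4),
)
print(policy.is_active(begin + timedelta(hours=1)))  # True
print(policy.is_active(begin + timedelta(hours=5)))  # False
```

An actual model would be expressed in YANG and aligned with existing service models; the sketch only shows that a policy needs a flow selector, SLA targets, and a validity window.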

Agenda

  • BoF introduction and Administrivia [Chairs] 10 mins
  • Use Cases and Problem Statements 48 mins
    Case 1: Cloud-aware Network Resource Scheduling for AI Applications, Qiong Sun (China Telecom) 12 mins
    Case 2: DC-aware TE Topology Model for Computing Service Deployment,
    draft-llc-teas-dc-aware-topo-model, Luis M. Contreras (Telefonica) 12 mins
    Case 3: TBD, Gyan Mishra (Verizon) 12 mins
    Case 4: Other 12 mins

  • Gap Analysis and potential work in IETF (Bo Wu) 20 mins

  • Open Discussion 30 mins
  • Conclusion and Next steps [Chairs] 10 mins
  • Speaker shuffling time 3 mins