Internet-Draft Computing Resource Representation in CAN March 2022
Du & Fu Expires 4 September 2022
Workgroup: Network Working Group
Internet-Draft: draft-du-computing-resource-representation-00
Published: March 2022
Intended Status: Informational
Expires: 4 September 2022
Authors:
Z. Du
China Mobile
Y. Fu
China Mobile

Computing Resource Representation in Computing Aware Networking

Abstract

This document describes how service-specific computing information can be encoded and how it can be signaled in the network.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 4 September 2022.

1. Introduction

Traditionally, the network performs traffic engineering based only on network status. With the trend toward computing and network convergence, several proposals make the network aware of service information so that it can make better traffic steering decisions. Dynamic Anycast (Dyncast) and Computing Aware Networking (CAN) can make routing decisions based on both network and computing status, and are considered advanced mechanisms for computing and network convergence.

In the traditional network architecture, the network is only responsible for delivering packets between servers and clients and is not aware of computing information. [I-D.liu-dyncast-ps-usecases] and [I-D.liu-dyncast-reqs] show that, when service instances are deployed at multiple geographically distributed edge sites, Dyncast can achieve service equivalence and load balancing by considering both service metrics and network metrics.

However, how to notify the network of the service metrics, how to represent computing resources, and how to signal computing resource information to the network are still open questions. Answering them is important for the network domain to learn about the computing domain.

This document further explores how service metrics can be encoded and signaled.

2. Consideration and representation of computing metrics

2.1. Comparison of network metric and computing metric

The main job of the network is to forward users' packets from the source to the destination, while the main job of computing is to complete users' tasks.

Network metrics include bandwidth, latency, jitter, and so on. They describe the capabilities of the network and are independent of the details of the underlying technologies, such as the mode of the optical fiber or the structure of a switch.

Service metrics are more complex and are hard to map to QoS/QoE. For example, if the task is AI computing, such as image processing, the computing resource can be measured in FLOPS (Floating-point Operations Per Second) or TFLOPS (Tera FLOPS). However, the processing time is more difficult to obtain, because it is influenced by the current utilization of the CPU, the cache, and so on. Even when a real-time OS or protocol is used, processing may sometimes fail because of a deadlock or other OS mechanisms. This does not mean that there is a problem with the OS; rather, the environment in which it runs is complex. Service metrics therefore need to take more factors into account to judge performance, and it must also be considered how they can be used in other domains to guarantee the E2E service quality.

2.2. Representation of computing metric

Because computing resources are diverse, their information can be represented to the network in two ways.

On the one hand, we can offer general computing load information to the ingress nodes. As an example, we may need only three values:

one red value stands for the busy status,

one yellow value stands for relatively busy status,

one green value stands for free status.

Therefore, the ingress node only needs to consider the yellow and green Multi-access Edge Computing (MEC) sites when doing load balancing, and the green ones are preferred. This is similar to an SR policy, and the two could also be used together, for example, to choose a yellow path and a yellow service instance.
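
The three-value scheme can be sketched as follows. This is only an illustration under stated assumptions: the numeric values of the statuses, the site names, and the function name candidate_sites are hypothetical and are not defined by this document.

```python
from enum import IntEnum


class LoadStatus(IntEnum):
    """Coarse computing load of a service instance (lower is better).

    The numeric values are assumptions made for this sketch.
    """
    GREEN = 0   # free
    YELLOW = 1  # relatively busy
    RED = 2     # busy


def candidate_sites(status_by_site):
    """Return the usable sites, green ones first; red sites are excluded."""
    usable = [(status, site) for site, status in status_by_site.items()
              if status is not LoadStatus.RED]
    # Sort by status, then by site name so that the order is deterministic.
    return [site for status, site in sorted(usable)]


# Hypothetical site names, for illustration only.
statuses = {
    "mec-a": LoadStatus.YELLOW,
    "mec-b": LoadStatus.RED,
    "mec-c": LoadStatus.GREEN,
}
print(candidate_sites(statuses))  # ['mec-c', 'mec-a']
```

An ingress node could apply its load balancing policy to the returned list, trying the green sites before the yellow ones.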

On the other hand, we can also offer other computing-related information to the ingress nodes for a specific service, such as:

the service information deployed on MEC, for example, Service ID,

the maximum session number that the MEC can provide,

the current session number that is in use,

the CPU/GPU utilization,

the FLOPS/HASH ability of the server,

the available computing infrastructure of the server, etc.

This additional information may be optional and can be encoded as TLVs. A specific service may have a specific preferred set of TLVs. When more information is offered, the ingress node can make a better decision because it can consider more dimensions. For example, if multiple instances have the same free status, the ingress node can choose among them according to the additional TLVs. The detailed decision algorithm is out of the scope of this document.
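
A TLV encoding of this kind of information might look like the sketch below. This document does not define a TLV format, so everything here is an assumption made for illustration: the type codes, the one-octet type and length fields, and the fixed four-octet unsigned values.

```python
import struct

# Hypothetical TLV type codes; this document does not assign any values.
TLV_MAX_SESSIONS = 1
TLV_CURRENT_SESSIONS = 2
TLV_CPU_UTILIZATION = 3  # percent, 0-100


def encode_tlvs(items):
    """Encode (type, value) pairs as 1-octet type, 1-octet length, 4-octet value."""
    out = bytearray()
    for tlv_type, value in items:
        payload = struct.pack("!I", value)  # unsigned 32-bit, network byte order
        out += struct.pack("!BB", tlv_type, len(payload)) + payload
    return bytes(out)


def decode_tlvs(data):
    """Decode a TLV sequence into a {type: value} dict."""
    result, offset = {}, 0
    while offset < len(data):
        tlv_type, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV")
        result[tlv_type] = int.from_bytes(value, "big")
        offset += length
    return result


encoded = encode_tlvs([
    (TLV_MAX_SESSIONS, 1000),
    (TLV_CURRENT_SESSIONS, 250),
    (TLV_CPU_UTILIZATION, 40),
])
print(len(encoded))          # 18
print(decode_tlvs(encoded))  # {1: 1000, 2: 250, 3: 40}
```

Because each TLV carries its own length, a receiver that prefers a different set of TLVs for a service can still step over the ones it does not use.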

2.3. Example process of computing load information

For a specific service, we can offer both general computing load information and more specific computing information. A general process is described below.

Step1: The service instances are deployed in multiple MECs. The ingress nodes of the network, which act as the load balancing points, need to obtain the computing information. The service should have a specific Service ID in the network, for example ServiceID1, so that the ingress node can recognize the service request and treat it differently according to the Service ID.

Step2: After obtaining the computing information of the service identified by ServiceID1 from multiple MECs, the ingress nodes should record it. Meanwhile, an ingress node should also be able to obtain the network status, for example the latency to the egress of an MEC, and record it.

Step3: An ingress node receives a packet targeted to ServiceID1. According to the service metrics and network metrics it has recorded, the ingress node decides which MEC to use and forwards the packet to the related egress. The selection method may depend on the service. For example, it may select the MEC with the lowest latency among those that can offer the service, the MEC with the best computing resource among those whose latency fulfills the service requirements, or use a hybrid method.

The purpose of the procedure is to find an MEC that is relatively near the client and also has enough computing resources for the service. However, the MECs that provide the service may vary and may have different computing capabilities. Therefore, a load balancing method that considers the computing resources is useful in this scenario.
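
The two example selection methods in Step3 can be sketched as follows. This is a minimal sketch and not a recommended algorithm: the record layout, the use of free sessions as the measure of computing resource, and all names and numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteRecord:
    """What an ingress node might record per MEC for one service (hypothetical)."""
    site: str
    latency_ms: float   # network metric toward the egress of the MEC
    free_sessions: int  # service metric: maximum sessions minus sessions in use


def lowest_latency(records):
    """Pick the lowest-latency MEC among those that can still offer the service."""
    usable = [r for r in records if r.free_sessions > 0]
    return min(usable, key=lambda r: r.latency_ms, default=None)


def best_compute_within(records, max_latency_ms):
    """Pick the MEC with the most free sessions among those meeting the latency bound."""
    usable = [r for r in records
              if r.latency_ms <= max_latency_ms and r.free_sessions > 0]
    return max(usable, key=lambda r: r.free_sessions, default=None)


records = [
    SiteRecord("mec-a", latency_ms=4.0, free_sessions=10),
    SiteRecord("mec-b", latency_ms=9.0, free_sessions=500),
    SiteRecord("mec-c", latency_ms=2.0, free_sessions=0),
]
print(lowest_latency(records).site)             # mec-a
print(best_compute_within(records, 10.0).site)  # mec-b
print(best_compute_within(records, 5.0).site)   # mec-a
```

In this example the nearest MEC (mec-c) is never chosen because it has no free sessions, which is the case where considering computing resources changes the decision. Both functions return None when no MEC qualifies.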

3. Signaling of Computing Load Information

For the signaling, a general process is described below.

Step1: The gateway of the MEC collects the status information of a service, such as the one identified by ServiceID1. For example, the controller in the MEC can collect the information and notify the gateway of the MEC.

Step2: The egress of the MEC receives the service status information for ServiceID1 from the gateway of the MEC and notifies other network nodes, including the ingress nodes.

In the first step, the controller and the gateway may communicate by using PCEP (the PCE communication protocol) or another protocol used with SDN controllers. In the second step, an SDN method can also be used; however, communication between the controller of the MEC and the controller of the network may then be needed, which is complicated. In this document, we suggest transferring computing information by using BGP. When the MEC announces that it can support ServiceID1, i.e., advertises the route for ServiceID1, additional computing information can be included in the route's Extended Community attribute.
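
As an illustration of how computing information could fit into an Extended Community, the sketch below packs it into the 8-octet layout defined in RFC 4360 (a type octet, a sub-type octet, and a 6-octet value). This document does not define the encoding, so the following are assumptions: the use of the non-transitive Opaque type, the sub-type value (which is not an IANA assignment), and the split of the value field into a 4-octet Service ID, a 1-octet load status, and a 1-octet CPU utilization.

```python
import struct

# RFC 4360: the high-order type octet of an Opaque Extended Community is
# 0x03 (transitive) or 0x43 (non-transitive). Using the non-transitive form
# assumes that the egress and ingress nodes are in the same AS.
EXT_COMM_TYPE_OPAQUE_NON_TRANSITIVE = 0x43
# Hypothetical sub-type for illustration only; not an IANA assignment.
SUBTYPE_COMPUTING_LOAD = 0xF0

# type, sub-type, 4-octet Service ID, 1-octet status, 1-octet CPU percent
_LAYOUT = struct.Struct("!BBIBB")


def pack_computing_community(service_id, load_status, cpu_percent):
    """Build the 8-octet community for one service instance."""
    return _LAYOUT.pack(EXT_COMM_TYPE_OPAQUE_NON_TRANSITIVE,
                        SUBTYPE_COMPUTING_LOAD,
                        service_id, load_status, cpu_percent)


def unpack_computing_community(data):
    """Return (service_id, load_status, cpu_percent), or None if not ours."""
    if len(data) != _LAYOUT.size:
        raise ValueError("an Extended Community is exactly 8 octets")
    ctype, subtype, service_id, load_status, cpu_percent = _LAYOUT.unpack(data)
    if (ctype, subtype) != (EXT_COMM_TYPE_OPAQUE_NON_TRANSITIVE,
                            SUBTYPE_COMPUTING_LOAD):
        return None
    return service_id, load_status, cpu_percent


community = pack_computing_community(service_id=1, load_status=1, cpu_percent=40)
print(community.hex())                        # 43f0000000010128
print(unpack_computing_community(community))  # (1, 1, 40)
```

The 6-octet value field leaves little room, so a fixed layout like this one can carry only the general load status and one or two extra values; richer TLV information would need a different container.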

7. References

7.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.

7.2. Informative References

[I-D.liu-dyncast-ps-usecases]
Liu, P., Willis, P., Trossen, D., and C. Li, "Dynamic-Anycast (Dyncast) Use Cases & Problem Statement", Work in Progress, Internet-Draft, draft-liu-dyncast-ps-usecases-02, <https://www.ietf.org/archive/id/draft-liu-dyncast-ps-usecases-02.txt>.

[I-D.liu-dyncast-reqs]
Liu, P., Jiang, T., Willis, P., Trossen, D., and C. Li, "Dynamic-Anycast (Dyncast) Requirements", Work in Progress, Internet-Draft, draft-liu-dyncast-reqs-01, <https://www.ietf.org/archive/id/draft-liu-dyncast-reqs-01.txt>.

Authors' Addresses

Zongpeng Du
China Mobile
No.32 XuanWuMen West Street
Beijing
100053
China

Yuexia Fu
China Mobile
No.32 XuanWuMen West Street
Beijing
100053
China