Internet-Draft Abbreviated Title October 2023
Huang, et al. Expires 25 April 2024 [Page]
Workgroup:
RTGWG
Internet-Draft:
draft-ietf-xml2rfc-template-06
Published:
Intended Status:
Standards Track
Expires:
Authors:
D.H. Huang
ZTE Corporation
G.C. Chen
China Telecom
J.L. Liang
China Telecom
Y.Z. Zhang
China Unicom
D.Y. Dong
Beijing Jiaotong University
Y.DY. Yuan
ZTE Corporation
F.HK. Fu
ZTE Corporation
C. Huang
ZTE Corporation
Y. Guo
ZTE Corporation

Service ID for Addressing and Networking

Abstract

A growing number of emerging applications demand network connections anywhere and anytime, alongside highly distributed any-cloud services. This demand motivates the need to efficiently interconnect heterogeneous entities, e.g., network domains and clouds owned by different providers, with the goal of reducing cost, e.g., overhead and end-to-end latency, while ensuring that overall performance satisfies application requirements. Because different network domains and cloud providers may adopt different technologies, the key to interconnection and efficient coordination is a unified interface that heterogeneous parties can all understand, so that each party can derive consistent requirements for the same service and treat the service traffic appropriately using its own proprietary policies and technologies.

A service ID is therefore a promising candidate for the unified interface, since it can be designed to be lightweight and secure and to enable fast and efficient packet treatment. With a service ID, addressing and networking among heterogeneous network domains and cloud providers can be accomplished by establishing a mapping between the unified service ID and the specific technologies used by each network domain or cloud provider.

This document provides typical use cases of a unified service ID for addressing and networking (SIAN), illustrating that different network domains or cloud providers can be interconnected at lower cost, without sacrificing application performance, compared with existing methods, whose problems and gaps are also described. The requirements for SIAN are then derived from these scenarios. Finally, a framework solution is presented.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 25 April 2024.

1. Introduction

Emerging applications impose stringent requirements, such as data rates of 1 Gbps or higher and delay below 10 ms. Three major trends have emerged to satisfy these requirements. First, the cloud-native paradigm allows an application to be decomposed into multiple microservices, each performing an independent piece of functionality. Second, virtualization technologies decouple logical functions from the physical infrastructure, so that microservices can be deployed in multiple network locations while their aggregate performance is expected to match that of the monolithic application. Third, cloud computing tasks are offloaded to the edge, such as base stations, vehicles, or even handheld devices, which brings microservices closer to clients. Together, these trends lead to highly distributed any-cloud services and to a demand for Internet connections anywhere and anytime.

However, different entities, i.e., network domains and cloud providers, adopt heterogeneous technologies, and appropriate service nodes must be selected dynamically when clients move or the available resources change. It therefore remains a challenge to interconnect different entities efficiently with a consistent SLA guarantee. Currently, a packet delivered from one network domain to another is generally sent through a tunnel whose two endpoints are located in the two domains. The tunnel is unaware of the underlying technologies used by the two domains, but encapsulation and decapsulation at the endpoints add to the end-to-end delay. Moreover, establishing and tearing down a tunnel takes time, which makes the tunneling approach poorly suited to dynamically selecting appropriate network domains.

To achieve efficient inter-domain or inter-cloud communication, it is critical to design a unified interface that can be understood by any network domain or cloud provider. Among the available technologies, we consider the service ID a promising candidate for this unified interface and select typical use cases to demonstrate its advantages. With a service ID, addressing and networking among heterogeneous network domains and cloud providers, with a consistent service SLA guarantee, could be accomplished by establishing a mapping between the unified service ID and the specific technologies used by each network domain or cloud provider.

[I-D.trossen-rtgwg-rosa-arch] describes a service address used as an anycast address in an overlay network for service-oriented addressing, with the benefit of decoupling it from specific networking and computing resources. [I-D.ldbc-cats-framework] uses a computing service ID as an index for computing-aware traffic steering, and [I-D.li-apn-framework] designs an APP ID as an interface between an application, together with its networking requirements, and the underlying network. Rather than relying on the conventional 5-tuple for traffic steering or as an interface between parties, employing a lightweight, standalone service ID in the routing network could close the gaps described in this document with significant benefits. This document describes the main gaps and the overall requirements for a service ID, focuses on use cases that the above drafts have not yet covered, and outlines end-to-end solution considerations from the perspective of interconnecting networks, clouds, and terminals.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Terminology

  • Service ID: Service Identifier
  • SIAN: Service ID for Addressing and Networking
  • SCMS: Service Control and Management System
  • SSMC: SIAN Service Metric Collector
  • SNMC: SIAN Network Metric Collector
  • SUTF: SIAN User Traffic Forwarder
  • SPIS: SIAN Path and Instance Selector
  • OAM: Operations, Administration, and Maintenance
  • FM: Failure Management
  • PM: Performance Management
  • CSP: Cloud Service Provider
  • Multi-cloud [ITU-T Y.3537]: Use of cloud services in the public cloud from two or more independent cloud service providers (CSPs) at the same time for business.

3. Use Case Scenarios

This section presents two typical use cases that require the interconnection of different entities such as network domains and CSPs. For each case, we describe the complexity and the possible hindrances to service performance with existing methods and illustrate how a service ID facilitates interconnection between different parties.

3.1. Service Mesh Management in Multi-Cloud

In the cloud-native paradigm, an application is generally decomposed into multiple microservice components, each performing an independent piece of functionality, and a service mesh is used to manage inter-service communication. Because of constraints on computing resources (e.g., processor types or storage capacity) and on the capabilities of different CSPs, and especially because of cost and energy constraints for edge computing providers, deploying all of an application’s functionality at one site is often not economically feasible. Instantiating and executing microservice components in multi-cloud environments, and performing inter-service networking over the Internet, therefore becomes attractive. Moreover, to ensure that the aggregate performance of microservices deployed across multiple clouds matches that of the monolithic cloud service, service mesh management in multi-cloud is an essential use case.

The existing service mesh management approach has two limitations. First, the service mesh runs within each CSP’s internal SDN domain, which is separated from those of other CSPs or external third parties. The service access point (at either layer 4 or layer 7 from the client’s point of view) currently resides in the API gateway that each CSP operates separately. Before a client’s request is processed, one CSP’s API gateway must first be appointed as the service access point to intercept all incoming requests. The API gateway then processes the request according to the capabilities and container orchestration of its own CSP. Consequently, if a client’s front-end application tries to access ubiquitous computing resources and use back-end microservice instances deployed in different CSPs, this approach limits the client’s ability to switch CSPs, and makes switching more complex, even when better choices are available elsewhere. Second, inter-service communication is generally conducted using sidecar proxies that are collocated with service pods. As shown in Figure 1, a sidecar proxy, e.g., sidecar proxy A, intercepts all incoming traffic of the collocated service pod, e.g., microservice A, decapsulates the packets, performs processing such as service discovery, routing, or rate limiting, and sends the packets to the appropriate service instance. After receiving packets from a service instance, sidecar proxy A encapsulates them and sends them to another sidecar proxy, e.g., sidecar proxy B, using layer-7 protocols such as gRPC or a REST API.


                    +-----------------+     +-----------------+
                    |   cloud GW A    |     |   cloud GW B    |
                    +-----------------+     +-----------------+
                             |      intercept        |
                             v   incoming traffic    v
                    +-----------------+     +-----------------+
                    | sidecar proxy A |---->| sidecar proxy B |
                    +-----------------+     +-----------------+
                           |  ^                    |  ^
                           v  |                    v  |
                    +-----------------+     +-----------------+
                    | microservice A  |     | microservice B  |
                    +-----------------+     +-----------------+

Figure 1

It can be seen from the above procedure that each hop of inter-service communication incurs additional delay, including the processing time of two sidecar proxies. If a composite cloud-native service requires meshed or multi-hop inter-service communication in a multi-cloud environment, managing the composite service becomes highly complex and its end-to-end delay can easily become intolerable.

3.2. Multi-Domain and Multi-Cloud interconnection

In industry, there is a growing interest in connecting factories that are located in different areas to achieve smart manufacturing and fast logistics. Since the distance between factories may range from several kilometers to thousands of kilometers, the communications among factories generally involve multiple network domains and CSPs that adopt heterogeneous technologies. Moreover, the requirements of inter-factory communications are diverse. For discrete automation applications, the end-to-end delay is required to be less than 10ms and the data rate is less than 10Mbps, while for process automation systems, the end-to-end delay is about 60ms and the data rate can be as high as 100Mbps.

To accommodate such diverse applications, tunnels are generally established to connect heterogeneous domains in existing approaches. As shown in Fig. 2, a tunnel is established between network domains A and B to deliver packets for factories A and B. If the two network domains use different protocols, two gateways also need to be established at the two endpoints of the tunnel, respectively. When factory A sends a remote control message to factory B, the packets are encapsulated at GW A and then sent to the ingress of the tunnel. The packets are delivered through the tunnel and when they reach the egress, they are sent to GW B and further decapsulated.




         +----------------------------------+                    +---------------------------------+
         |                                  |        Tunnel      |                                 |
         |   _____________          ____    |--------------------|  _______         ____________   |
         |   |_Factory A_|-------->|_GWA_|  |    --------------> |  |_GWB_|-------->|_Factory B_|  |
         |                                  |--------------------|                                 |
         |   network domain A               |                    |    network domain B             |
         +----------------------------------+                    +---------------------------------+

Figure 2

It can be seen from the above procedure that each hop of cross-domain communication incurs additional delay, including the time to encapsulate and decapsulate packets. If a remote control application requires multi-hop cross-domain communication, for example because the application involves sequential execution across multiple factories, or because the device that triggers the application moves from one network domain to another, managing the application becomes highly complex and its end-to-end delay can easily exceed the maximum tolerable latency.
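
The effect of per-hop overhead on the delay budget can be illustrated with a small calculation. The following Python sketch is illustrative only: the function name and the base delay and per-crossing overhead values are assumptions chosen for the example, not measurements. Only the 10 ms bound is taken from the discrete automation requirement stated in this section.

```python
def end_to_end_delay_ms(base_delay_ms: float,
                        per_crossing_overhead_ms: float,
                        crossings: int) -> float:
    """Base delay plus encapsulation/decapsulation overhead per domain crossing."""
    return base_delay_ms + per_crossing_overhead_ms * crossings


BUDGET_MS = 10.0     # discrete automation bound cited in this section
BASE_DELAY_MS = 4.0  # assumed propagation and forwarding delay
OVERHEAD_MS = 2.5    # assumed encapsulation + decapsulation cost per crossing

one_crossing = end_to_end_delay_ms(BASE_DELAY_MS, OVERHEAD_MS, crossings=1)
three_crossings = end_to_end_delay_ms(BASE_DELAY_MS, OVERHEAD_MS, crossings=3)

print(one_crossing <= BUDGET_MS)     # within budget under these assumptions
print(three_crossings <= BUDGET_MS)  # over budget under these assumptions
```

Under these assumed values, a single tunnel crossing fits the budget while three sequential crossings do not, which is the multi-hop effect described above.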

4. Problems and gap analysis of the existing service identification mechanism

This section illustrates the problems and gap analysis of the conventional 5-tuple service identification mechanism in terms of the emerging use cases.

4.1. State burden of service identification

No explicit service identification scheme has been designed for the L3 routing network. Existing L3 identifiers, such as IP addresses and labels, are designed for devices, traffic flows, or network sections, while L4 ports are always associated with specific protocols (TCP/UDP) rather than with service identification. Therefore, to identify a service in the routing network, mapping state has to be maintained for a combination of tuples selected from these L3/L4 identifiers, and such a combination actually identifies a traffic flow rather than a service. As cloud resources and their associated services and applications migrate from centralized sites to edge sites, the mapping state needed for tuple-based service identification grows dramatically and places a heavy state burden on the routing network, limiting its scalability.

4.2. Granularity and traffic engineering of service identification

4.2.1. Granularity of service identification

Conventional methods for distinguishing traffic flows rely mainly on the 5-tuple of incoming packets. For instance, ACL (Access Control List) and PBR (Policy-Based Routing) apply 5-tuple matching strategies. However, a 5-tuple, which consists of source IP address, destination IP address, source port, destination port, and protocol, does not carry explicit information about application-layer services. Its elements belong to the network and transport layers and can only serve as an estimate or inference of the application-layer service. Thus, the current 5-tuple scheme is not sufficient for fine-grained service provisioning.

In particular, it is important to identify a key sub-flow that is more sensitive to the networking SLA guarantee than other sub-flows sharing the same 5-tuple. Because all of these sub-flows carry the same 5-tuple, network nodes are not able to single out the key sub-flow.
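
The granularity gap can be sketched as follows. This Python example is a minimal illustration, not a proposed encoding: the `service_id` field, its values, and the addresses are hypothetical. It shows two sub-flows that a 5-tuple classifier places in the same class, while a classifier keyed on a service ID separates them.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


class Packet(NamedTuple):
    five_tuple: FiveTuple
    service_id: int  # hypothetical field; its encoding is not defined here


# Two sub-flows of one connection share the same 5-tuple.
flow = FiveTuple("192.0.2.10", "198.51.100.20", 40000, 443, "TCP")
control = Packet(flow, service_id=0x0101)  # assumed ID: delay-sensitive sub-flow
bulk = Packet(flow, service_id=0x0102)     # assumed ID: bulk-transfer sub-flow


def classify_by_five_tuple(pkt: Packet) -> FiveTuple:
    return pkt.five_tuple


def classify_by_service_id(pkt: Packet) -> int:
    return pkt.service_id


same_5tuple_class = classify_by_five_tuple(control) == classify_by_five_tuple(bulk)
same_service_class = classify_by_service_id(control) == classify_by_service_id(bulk)
```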

4.2.2. Traffic Engineering of service identification

The existing SRv6 technology provides the SRv6 TE policy capability to implement differentiated network services. In general, the following traffic engineering methods are used with an SRv6 TE policy:

  • Binding SID-based traffic engineering: In general, it is used for network-side tunnel concatenation, cross-domain path concatenation, and SD-WAN scenarios. It involves security authorization and accounting management, and is thus rarely feasible for user traffic steering.
  • Color-based traffic engineering: The device looks up an SRv6 TE policy that matches the color and endpoint address. If a matching SRv6 TE policy exists, the device steers the service traffic into the policy and forwards it through that policy. As a result, service- and application-specific SLA requirements have to be anchored to the existing TE policy capabilities rather than the other way around.
  • DSCP-based traffic engineering: DSCP bits in the service packets are used to further distinguish services. However, DSCP values range only from 0 to 63, so the differentiation and diversity they can indicate are quite limited.
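
The color-based lookup and the DSCP limit can be sketched as follows. This is a simplified Python illustration, not router behavior: the policy table, the color values, the policy names, and the `lookup_te_policy` helper are all hypothetical. The DSCP field width of 6 bits is what yields the 0 to 63 range mentioned above.

```python
from typing import Dict, Optional, Tuple

# (color, endpoint) -> policy name; every entry is a made-up example.
TE_POLICIES: Dict[Tuple[int, str], str] = {
    (100, "2001:db8::1"): "low-delay-path",
    (200, "2001:db8::1"): "high-bandwidth-path",
}


def lookup_te_policy(color: int, endpoint: str) -> Optional[str]:
    """Return the policy matching both color and endpoint, if one exists."""
    return TE_POLICIES.get((color, endpoint))


DSCP_BITS = 6
DSCP_CODEPOINTS = 2 ** DSCP_BITS  # number of distinct DSCP values

matched = lookup_te_policy(100, "2001:db8::1")
unmatched = lookup_te_policy(300, "2001:db8::1")  # no policy for this color
```

A service whose requirements do not correspond to a preconfigured (color, endpoint) pair finds no policy, which illustrates why the service has to adapt to the existing TE policies.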

4.3. Service operation and fulfillment

From the perspective of service operation and fulfillment, a simple interface is imperative, one that exposes both the underlying network capabilities and the key services and applications that require tailored and guaranteed networking and cloud resources. However, a selected tuple combination indicates either particular paths or traffic flows, and thus cannot be exposed directly to third parties for the fulfillment and operation of these services and networking capabilities.

In the case of segment routing over IPv6 (SRv6), binding segment identifiers (BSIDs) have been designed and, on some occasions, offered to third parties as a network service and capability. Nevertheless, an SRv6 BSID remains within the network scheduling and orchestration domain, so what it exposes is in practice restricted to the network operator itself rather than serving as a straightforward fulfillment interface to third parties.

4.4. Convergence of network and cloud

Currently, the original 5-tuple of a packet is terminated when the packet enters the cloud. Kubernetes, for instance, provides a NodePort mechanism in which the destination IP address used for access may be the IP address of any host in the cluster. The packets are then steered to a specific Pod according to iptables rules configured in the cluster. In addition, SNAT operations are typically required when a node load-balances by distributing packets to Pods deployed on other nodes. Another typical example is the interconnection of different clusters, for instance with Istio, as illustrated in Figure 3. The remote service is registered in the local cluster with the IP address of the remote gateway. Thus, the 5-tuple of the packets sent only implies the remote gateway and ends at the edge of the remote cluster. Therefore, in the current scheme, the semantics that the 5-tuple carries in the network domain are not preserved or inherited in the cloud.
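
The loss of 5-tuple semantics at the cloud edge can be sketched as follows. This Python example is a simplified model and does not reproduce actual kube-proxy or iptables behavior: the `dnat` and `snat` helpers, all addresses and ports, and the `service_id` field are assumptions for illustration. It shows that after destination and source translation none of the original address or port fields survive, whereas an identifier carried independently of the 5-tuple would.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    service_id: int  # hypothetical field carried outside the 5-tuple


def dnat(pkt: Packet, pod_ip: str, pod_port: int) -> Packet:
    """Rewrite the destination, as when a node port is mapped to a Pod."""
    return replace(pkt, dst_ip=pod_ip, dst_port=pod_port)


def snat(pkt: Packet, node_ip: str, node_port: int) -> Packet:
    """Rewrite the source, as when traffic is sent on to a Pod on another node."""
    return replace(pkt, src_ip=node_ip, src_port=node_port)


original = Packet("203.0.113.5", 51000, "192.0.2.1", 30080, "TCP", 0x0101)
inside = snat(dnat(original, "10.244.1.7", 8080), "10.0.0.2", 61000)

addresses_preserved = (
    (inside.src_ip, inside.src_port, inside.dst_ip, inside.dst_port)
    == (original.src_ip, original.src_port, original.dst_ip, original.dst_port)
)
service_id_preserved = inside.service_id == original.service_id
```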


        +----------------------+           +--------------+
        |   Cluster 1          |  Network  |   Cluster 2  |
        |                      |           |              |
        |        +-+      +---+----+   +---+----+         |
        |       (   )---->|GateWay +-->|GateWay |         |
        |        +-+      +---+----+   +---+----+         |
        | sleep.sample         |           |\     +-+     |
        | Round Robin between: |           | \-->(   )    |
        | local Pod(Pod IP)    |           |      +-+     |
        | Remote Cluster(GW IP)|           |      Pod     |
        |                      |           |              |
        +----------------------+           +--------------+

Figure 3

4.5. L4/L7 gateway in the way of end to end service traffic

End-to-end service traffic is typically terminated at L4/L7 gateways in cloud sites for routing and forwarding, because under the current service and application governance mechanism the cloud and its applications are operated in a separate domain. This comes at a significant cost in end-to-end forwarding performance: L3-based hardware forwarding could deliver higher performance than L4/L7-based software forwarding. In addition to the routing termination at the L4/L7 gateway, two different sets of IP addresses exist for the same service and application, so Network Address Translation (NAT) is involved in forwarding service traffic both into and out of the cloud.

In inter-cloud and client-cloud service traffic forwarding scenarios, the L4/L7 gateway and NAT impose a forwarding performance burden that could hinder some sensitive services and applications.

5. Requirements of service identification for addressing and networking

In this section, requirements of service identification for the routing network are identified based on the use cases and on the problems and gap analysis of the existing service identification mechanisms.

REQ1 Service identification SHOULD have standalone semantics against 5-tuples.

REQ2 Service identification SHOULD have global and unified semantics across terminal, network and cloud.

REQ3 Service identification SHOULD be able to index the specified service profile in terms of its SLA requirements.

REQ4 Service identification might indicate specified networking capabilities and specified applications as well as application components such as micro-services.

REQ5 Service identification might cover only the selected services and applications which have been designated to be networking and computing sensitive.

6. Framework consideration of service identification for addressing and networking

6.1. Service ID over existing networking IDs and labels

It is important to make the routing network aware of the service identification in a straightforward way, so that a network node does not have to maintain heavy state for service-specific routing and forwarding and, more importantly, so that the decoupling between application and network is preserved.

A standalone service identification entity is employed in the network control and data planes, with the following features:

  • Location and device independent.
  • Carries only the semantics of the service type and its networking and computing SLA requirements.
  • Globally unique within a controlled network and possibly across multiple domains and across terminal and cloud.

6.2. Service ID Management and maintenance

Edge computing is expanding from a single edge site to multiple networked and collaborating edge sites, in order to address problems such as high cost, poor service experience, and low resource utilization. Large-scale edge sites require interconnection and coordination, and dynamic services require optimal service access and load balancing. Based on computing capability, network conditions, and the actual processing delay, services can be dynamically scheduled to appropriate service instances to improve resource utilization and user experience. Service identification based addressing and networking is employed to facilitate this interconnection and coordination.

A service ID indicates a common type of fundamental service and has global semantics across terminal, network, and cloud. In addition, a local service ID may be assigned by the operation and management system in the service domain. The service ID provides an effective link between networks and services. Based on the attributes associated with a service ID, the network can perceive the resources provided by the service, the quality of the service, and the service requirements. From the perspective of the service platforms, an overall view of the computing and network resources associated with the service ID can be established.

6.3. Lifecycle and governance of service ID

Registration: The service ID is assigned by the SCMS system when a service provider registers a cloud service or a network operator registers a networking connection service.

Publish: The service ID can be published after the service has been identified, authenticated, and authorized. Network operators can configure specific network policies for a service according to the requirements associated with a service ID, and service providers can also orchestrate specific service instances for a service according to the resource status associated with a service ID.

Subscription: The terminal application system subscribes to the service ID from the SCMS, integrates the service ID into the application client, and encapsulates the service ID in the protocol header of the data flow.

Update: As the service is used, the attributes associated with the service ID are updated, and the network policy is updated in real time based on the network status.

Revocation: The service ID is not revoked merely because a particular service terminates; it is terminated and revoked only by the SCMS, in accordance with the operating agreement and business contract of the service.
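
The lifecycle above can be sketched as a small state machine. The following Python example is a toy model under stated assumptions: the `Scms` class, its method names, the three states, and the `max_delay_ms` attribute are hypothetical and do not describe an SCMS interface defined by this document.

```python
from enum import Enum
from typing import Dict, Set


class State(Enum):
    REGISTERED = "registered"
    PUBLISHED = "published"
    REVOKED = "revoked"


class Scms:
    """Toy Service Control and Management System (hypothetical interface)."""

    def __init__(self) -> None:
        self._next_id = 1
        self.state: Dict[int, State] = {}
        self.attributes: Dict[int, Dict[str, float]] = {}
        self.subscribers: Dict[int, Set[str]] = {}

    def register(self, sla: Dict[str, float]) -> int:
        # Registration: the SCMS assigns the service ID.
        sid = self._next_id
        self._next_id += 1
        self.state[sid] = State.REGISTERED
        self.attributes[sid] = dict(sla)
        self.subscribers[sid] = set()
        return sid

    def publish(self, sid: int, authorized: bool) -> None:
        # Publish: only after authentication and authorization succeed.
        if self.state[sid] is not State.REGISTERED or not authorized:
            raise ValueError("service ID cannot be published")
        self.state[sid] = State.PUBLISHED

    def subscribe(self, sid: int, client: str) -> int:
        # Subscription: clients obtain a published service ID.
        if self.state[sid] is not State.PUBLISHED:
            raise ValueError("service ID is not published")
        self.subscribers[sid].add(client)
        return sid

    def update(self, sid: int, **attrs: float) -> None:
        # Update: attributes change while the service is in use.
        if self.state[sid] is State.REVOKED:
            raise ValueError("service ID is revoked")
        self.attributes[sid].update(attrs)

    def revoke(self, sid: int) -> None:
        # Revocation: performed only by the SCMS itself.
        self.state[sid] = State.REVOKED
        self.subscribers[sid].clear()


scms = Scms()
sid = scms.register({"max_delay_ms": 10.0})
scms.publish(sid, authorized=True)
scms.subscribe(sid, "client-a")
scms.update(sid, max_delay_ms=8.0)
scms.revoke(sid)
```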

6.4. Key Processes of service

Service initiation: A SIAN service is initiated by the service data traffic itself, so no separate signaling exchange is needed to initiate it. The terminal application carries the subscribed service ID in the service packet and sends the service data traffic to the SIAN network.

Service awareness: Using the service ID, the network senses the resource metrics of the corresponding service instances and distributes these metrics on the control plane, so that a service routing table indexed by service ID can be computed from the network and service metrics. In addition, the network senses the network SLA requirements at the granularity of the service type and enforces the SLA policy for the service flow.

Service routing: The forwarding plane uses the lightweight service ID as an index into L3 forwarding entries. To this end, the SIAN architecture logically introduces a service routing sub-layer, in which a routing protocol uses the service ID as a routing identifier. Logically, the service routing sub-layer only implements service routing, that is, service IDs are used as the index for computing, scheduling, and routing. Specifically, the sub-layer jointly selects service instances and network paths for service data traffic, enabling efficient service-centric computing, network scheduling, and routing.

Service delivery: After a service flow is forwarded to the service server through the SIAN network, the service server completes service routing and scheduling within the cloud based on the service ID.

Service OAM: SIAN provides OAM to measure and monitor the health of network links and service instances. OAM measurements are reported to the control plane to update the network metrics and the resource metrics of service instances in real time, and to adjust the SLA status and service routing tables of the service flow in a timely manner. Network-level OAM is also supported; it is used to detect service quality and to trigger service route re-convergence and self-healing.
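
The interplay between service routing and service OAM can be sketched as follows. This Python example is a minimal illustration under assumptions: the `ServiceRoutingTable` class, the entry layout, the segment identifiers, and the "first healthy entry" rule are hypothetical and are not a forwarding behavior specified by this document. It shows a table indexed by service ID and an OAM-triggered change that moves traffic to another instance.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence


class RouteEntry(NamedTuple):
    instance: str      # selected service instance
    path: List[str]    # e.g., a segment list toward the egress
    healthy: bool      # maintained from OAM reports


class ServiceRoutingTable:
    """Toy service routing table indexed by service ID."""

    def __init__(self) -> None:
        self._entries: Dict[int, List[RouteEntry]] = {}

    def install(self, service_id: int, entries: Sequence[RouteEntry]) -> None:
        # Entries are assumed to be ordered by preference.
        self._entries[service_id] = list(entries)

    def lookup(self, service_id: int) -> Optional[RouteEntry]:
        # Return the most preferred healthy entry, if any.
        for entry in self._entries.get(service_id, []):
            if entry.healthy:
                return entry
        return None

    def mark_down(self, service_id: int, instance: str) -> None:
        # OAM reported a failure: stop using this instance.
        self._entries[service_id] = [
            e._replace(healthy=False) if e.instance == instance else e
            for e in self._entries.get(service_id, [])
        ]


table = ServiceRoutingTable()
table.install(1, [
    RouteEntry("SI-ID-1 1", ["2001:db8:a::1", "2001:db8:e1::1"], True),
    RouteEntry("SI-ID-1 2", ["2001:db8:b::1", "2001:db8:e2::1"], True),
])

before = table.lookup(1)
table.mark_down(1, "SI-ID-1 1")
after = table.lookup(1)
unknown = table.lookup(99)
```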

6.5. Service ID based Routing and Forwarding reference framework and work flow

This section proposes a reference framework and workflow to demonstrate the end-to-end service ID based routing and forwarding process, as illustrated in Figure 4.


           | Service S-ID 1,instance SI-ID-1 1[metrics] |
           |<------------------------------------------>|
           | Service S-ID 2,instance SI-ID-2 1[metrics] |
         +-+------+          +--------+          +------+-+       +---------+
         |  SIAN  |          |        |          |  SIAN  |       |  Edge   |
         | Ingress|          |        |          | Egress |       |  Site   |
         | +----+ |          |        |          |        |       |         |
         | |SPIS| |          |        |          |        |       ++-------++
+------+ | +----+ |  Network |underlay|  Network | +----+ |Service||S-ID 1 ||
|client+-+ +----+ |<---------+ domain |<---------+ |SSMC| |<------+|SI-ID-1||
+------+ | |SNMC| |  metrics |        |  metrics | +----+ |metric |+-------+|
         | +----+ |          |        |          | +----+ |       |+-------+|
         | +----+ |          |        |          | |SUTF| |       ||S-ID 2 ||
         | |SUTF| |          |        |          | +----+ |       ||SI-ID-2||
         | +----+ |          |        |          |        |       |+-------+|
         +-+----+-+          +--------+          +--------+       +---------+

Figure 4

6.5.1. Components and working mechanism

  • Service client: A host requests service identification information of a specific application from a management and control system, and generates a data packet that carries the service identification information. If the information is carried in the data packet, the information is used by the SIAN ingress gateway node to determine the address of the instance of the service and the path between the SIAN ingress and egress gateway nodes, so as to forward the data packet to the destination of the service instance. That is, after the service instance is selected, the data packet is directed to the corresponding SR path that meets the application requirements. In the SIAN architecture, service identities must be client-aware, and there are various schemes for carrying them.
  • SIAN Ingress : Receives the service compute network SLA parameters delivered by the service control and management system, and generates the service routing table indexed by service ID in accordance with the compute network resource status on the control plane. Receives and parses the service identifier carried in the user service packet, searches for a service routing entry according to the service identifier, and forwards the service packet.
  • SIAN Egress : The specific service path is terminated at the tail node, and packets are forwarded to the Service server. SIAN Egress connects to a plurality of computing resources and senses status information of the computing resources.
  • Edge site and service instances: The Edge site is usually deployed near the user to install various services (such as AR/VR) that are extremely sensitive to delay and bandwidth, so that users can have better experience in accessing the network. Service instance is an instance resource that provides the service, and can accept, process, and respond to service requests. Generally, a same Edge site may deploy service instances (SI-ID-1 1 and SI-ID-1 2 in FIG. 2) that provide a same service type, or may deploy service instances (SI-ID-1 3 and SI-ID-2 1 in FIG. 2) that provide different service types.
  • SSMC (SIAN Service Metric Collector): Deployed on the SIAN egress to collect service metric information, including resource usage, slow request ratio, and average service completion time. Because this information changes frequently, it is recommended that updates be held back until a change threshold is crossed or a long period (on the order of minutes) has elapsed, so that frequent updates do not put excessive pressure on the network.
  • SNMC (SIAN Network Metric Collector): Deployed on the SIAN ingress to collect the network metric information advertised by transport network devices and SIAN gateways. The information includes link bandwidth, physical link delay, and link occupation. It is usually flooded within the domain through an IGP, and a TE-DB is formed on each network node.
  • SPIS (SIAN Path and Instance Selector): Deployed on the SIAN ingress or on a centralized server. In some cases, for example across domains, the SPIS must be deployed on the server. Using the metric information recorded by the SSMC and SNMC, the SPIS computes the path and service instance selection in the control plane with the service identifier mapping algorithm and delivers the result to the SUTF in the forwarding plane.
  • SUTF (SIAN User Traffic Forwarder): Usually deployed on the SIAN ingress and egress gateways to identify client service request traffic and to select a path and a service instance in accordance with the service forwarding table. The underlay network does not distinguish service traffic; it forwards packets along the path carried in the packets, for example in an SRH.
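
The threshold-or-period update behaviour recommended for the SSMC can be sketched as follows. This is only an illustration under stated assumptions: the class name MetricDamper, the 10% relative threshold, and the 60-second refresh period are hypothetical and are not defined by this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricDamper:
    """Decides whether a new service metric sample should be advertised."""
    threshold: float = 0.10      # assumed: relative change that forces an update
    min_interval: float = 60.0   # assumed: seconds between periodic refreshes
    last_value: Optional[float] = None
    last_time: float = 0.0

    def should_advertise(self, value: float, now: float) -> bool:
        if self.last_value is None:
            changed = True  # the first sample is always advertised
        else:
            # Compare against the last *advertised* value, not the last sample.
            base = max(abs(self.last_value), 1e-9)
            changed = abs(value - self.last_value) / base >= self.threshold
        expired = (now - self.last_time) >= self.min_interval
        if changed or expired:
            self.last_value, self.last_time = value, now
            return True
        return False
```

Comparing each sample with the last advertised value (rather than the previous sample) prevents a slow drift from going unreported indefinitely.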

6.5.2. Control Plane Consideration

Service-ID-based metric notification and forwarding policy can be achieved by extending existing routing protocols and mechanisms as follows:

  • Service metric distribution: The SSMC on the SIAN egress detects service metric changes and distributes the information within the network domain by using IGP or BGP. In the overlay model, to keep the distribution of service metrics from affecting the underlay network, it is recommended that service metrics bypass the underlay. To reduce resource consumption on the network control plane and forwarding plane, it is recommended that the SIAN egress aggregate service instances and their metric information before distributing them.
  • Distributed service route calculation and delivery: The SIAN ingress and egress are deployed by using the overlay model. However, as network devices, they can peer with the underlay network through IGPs to obtain network metric information. The SPIS obtains the metric information recorded by the SSMC and SNMC, calculates service routes by using a constraint-based algorithm, and delivers the results to the SUTF for service access. This overlay model still achieves joint service and network calculation, and thus joint traffic engineering of computing and networks.
  • Deployment of centralized service route computation: The SPIS is deployed on the compute network controller. The metric information collected by the SNMC and the compute network controller is reported through BGP-LS. The metric information collected by the SSMC can also be reported through the extended BGP-LS protocol. In a cross-domain scenario, this is the only option for implementing service routing.
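
As a rough illustration of the constraint-based calculation performed by the SPIS, the sketch below combines SNMC path metrics with SSMC instance metrics and picks the pair with the lowest total latency that satisfies the SLA. The record fields, the additive cost model, and the 0.9 usage ceiling are assumptions made for this example; the document does not specify the algorithm.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PathMetric:              # hypothetical SNMC record
    egress: str
    delay_ms: float
    free_bandwidth_mbps: float


@dataclass(frozen=True)
class InstanceMetric:          # hypothetical SSMC record
    instance: str
    egress: str
    avg_completion_ms: float
    resource_usage: float      # 0.0 .. 1.0


def select_path_and_instance(
    paths: List[PathMetric],
    instances: List[InstanceMetric],
    max_latency_ms: float,
    min_bandwidth_mbps: float,
    max_usage: float = 0.9,    # assumed overload ceiling
) -> Optional[Tuple[PathMetric, InstanceMetric]]:
    """Return the feasible (path, instance) pair with the lowest total latency."""
    best, best_cost = None, math.inf
    for inst in instances:
        if inst.resource_usage > max_usage:
            continue  # instance is considered overloaded
        for path in paths:
            if path.egress != inst.egress:
                continue  # path does not reach this instance's egress
            if path.free_bandwidth_mbps < min_bandwidth_mbps:
                continue
            cost = path.delay_ms + inst.avg_completion_ms
            if cost <= max_latency_ms and cost < best_cost:
                best, best_cost = (path, inst), cost
    return best
```

The same function could run on the ingress (distributed model) or on the controller (centralized model); only the source of the metric records differs.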

6.5.3. Data Plane Consideration

A Service ID defined in this draft has its own unique semantics in the forwarding procedure. The forwarding plane regards the Service ID as a simple indicator for steering traffic in the overlay service routing layer. Basing the whole service process on the Service ID may also bring incremental benefits, for instance scalability and security assurance.

6.5.3.1. Service ID Encapsulated in The IP Address Field

Service ID can be encapsulated in the IP address field of an IPv6 header. A typical encapsulation is shown below. When the Service ID is encapsulated in the IP address field of an IPv6 header, its semantics are preserved and maintained from the client to the network domain.


                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |                       |               |               |       |
        |         Prefix        |      Node     |   Service ID  |Padding|
        |                       |               |               |       |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 5
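
The layout in Figure 5 can be illustrated with a small packing routine. The figure does not give field widths, so the 48/32/32/16-bit split below is purely an assumption chosen to sum to 128 bits, and the function names are hypothetical.

```python
import ipaddress

# Assumed field widths (not specified by this document); they sum to 128.
PREFIX_BITS, NODE_BITS, SERVICE_BITS, PAD_BITS = 48, 32, 32, 16


def pack_service_address(prefix: int, node: int, service_id: int) -> ipaddress.IPv6Address:
    """Build an IPv6 address laid out as Prefix | Node | Service ID | Padding."""
    for value, bits, name in (
        (prefix, PREFIX_BITS, "prefix"),
        (node, NODE_BITS, "node"),
        (service_id, SERVICE_BITS, "service_id"),
    ):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    raw = (
        (prefix << (NODE_BITS + SERVICE_BITS + PAD_BITS))
        | (node << (SERVICE_BITS + PAD_BITS))
        | (service_id << PAD_BITS)       # padding bits are left as zero
    )
    return ipaddress.IPv6Address(raw)


def unpack_service_id(addr: ipaddress.IPv6Address) -> int:
    """Recover the Service ID field from an address built as above."""
    return (int(addr) >> PAD_BITS) & ((1 << SERVICE_BITS) - 1)
```

For example, pack_service_address(0x20010DB80000, 1, 0x2A) yields an address inside the 2001:db8::/48 documentation prefix, from which unpack_service_id() returns 0x2A.
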
6.5.3.2. Service ID Standalone Encapsulation

Service ID can be encapsulated in a standalone position that is decoupled from IP addresses. One option is to carry the Service ID in the Flow Label field of the IPv6 header:


0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| Traffic Class |      Flow Label(Service ID)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Payload Length          |  Next Header  |   Hop Limit   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                             Source                            |
|                             Address                           |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                           Destination                         |
|                             Address                           |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 6
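
A minimal sketch of the Flow Label option in Figure 6 follows. The Flow Label is 20 bits wide in the IPv6 header, so a Service ID carried this way must fit in 20 bits; the function names are hypothetical.

```python
import struct

FLOW_LABEL_MASK = 0xFFFFF  # the IPv6 Flow Label field is 20 bits wide


def pack_first_word(traffic_class: int, service_id: int) -> bytes:
    """Encode Version(4) | Traffic Class(8) | Flow Label(20) with the Service ID as label."""
    if not 0 <= traffic_class <= 0xFF:
        raise ValueError("traffic class must fit in 8 bits")
    if not 0 <= service_id <= FLOW_LABEL_MASK:
        raise ValueError("service ID must fit in the 20-bit Flow Label")
    word = (6 << 28) | (traffic_class << 20) | service_id
    return struct.pack("!I", word)  # network byte order


def service_id_from_header(header: bytes) -> int:
    """Read the Service ID back from the first 4 bytes of an IPv6 header."""
    (word,) = struct.unpack("!I", header[:4])
    return word & FLOW_LABEL_MASK
```

The 20-bit limit is the main design constraint of this option: it caps the Service ID space at roughly one million values.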

Service ID can also be encapsulated and carried in IPv6 extension headers as follows:


0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next Header  |  Hdr Ext Len  |      Options(variable)        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                                        (Service ID)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 7
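
The options-style encoding in Figure 7 can be sketched as a type-length-value option inside an IPv6 options header (Hop-by-Hop or Destination Options), padded to a multiple of 8 octets. The option type 0x1E is only a placeholder taken from the range reserved for experimentation; this document does not define or request a code point, and no option alignment requirement is assumed.

```python
from typing import Optional

SERVICE_ID_OPT_TYPE = 0x1E  # assumed placeholder (experimental value), not an assigned code point


def build_options_header(next_header: int, service_id: bytes) -> bytes:
    """Build an options header carrying one Service ID option."""
    if len(service_id) > 255:
        raise ValueError("option data is limited to 255 octets")
    body = bytes([next_header, 0, SERVICE_ID_OPT_TYPE, len(service_id)]) + service_id
    pad = (-len(body)) % 8
    if pad == 1:
        body += b"\x00"                                # Pad1 option
    elif pad > 1:
        body += bytes([1, pad - 2]) + bytes(pad - 2)   # PadN option
    # Hdr Ext Len counts 8-octet units, not including the first 8 octets.
    return bytes([next_header, len(body) // 8 - 1]) + body[2:]


def find_service_id(header: bytes) -> Optional[bytes]:
    """Walk the options and return the Service ID option data, if present."""
    if len(header) < 2:
        return None
    end = (header[1] + 1) * 8
    if len(header) < end:
        return None  # truncated header
    i = 2
    while i < end:
        opt_type = header[i]
        if opt_type == 0:          # Pad1 has no length octet
            i += 1
            continue
        if i + 1 >= end:
            return None
        opt_len = header[i + 1]
        if opt_type == SERVICE_ID_OPT_TYPE:
            return header[i + 2 : i + 2 + opt_len]
        i += 2 + opt_len
    return None
```

With a 4-octet Service ID the header is exactly 8 octets long and needs no padding, which is why Hdr Ext Len is 0 in that case.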

When the Service ID is encapsulated in a standalone position in an IPv6 header, the corresponding unique service semantics are preserved and maintained from the client to the network and can also be delivered into the cloud. As illustrated in Section 4, a standalone Service ID enables global service provisioning throughout the whole service process across network and cloud sites.

6.5.3.3. Service ID-based forwarding

Service forwarding table: In a traditional network, services are identified through 5-tuples. If the network needs to distinguish services, a QoS policy re-marks the DSCP, and traffic is handed over to the SR-TE ingress gateway of the underlay in IP+DSCP mode. Automatic steering of the marked traffic implements fine-grained mapping, and SR-TE ensures the scalability of the solution. Although this solution has clear management boundaries, its network device resource consumption and configuration complexity cannot be ignored. Because the service ID maps directly to the SLA for service access, SIAN instead uses a routing table indexed by the service ID, whose entries are bound directly to the SR-TE policy that meets the SLA. The user carries the service ID in the packet, and the SIAN ingress looks it up in this table to provide the service, which removes the limitations mentioned above. Depending on whether service information is aggregated, two models are supported:

  • Model 1 (non-aggregated): The service forwarding table carries the policy path and the selected service instance.
  • Model 2 (aggregated): The service forwarding table carries the policy path or the service site identification.

Service ID encapsulation: Because of their powerful programmability, SR-MPLS and SRv6 are currently the technologies selected for the SIAN gateway and transport network. Terminals and cloud services currently use IPv4 for access and interconnection and are expected to transition gradually to IPv6. Therefore, SR-MPLS/SRv6 needs to support both IPv4 and IPv6 scenarios, and multiple encapsulation modes exist.

Service packet forwarding: After receiving a service request packet, the SIAN ingress obtains the service identifier and searches the service forwarding table. With service forwarding table model 1, the SIAN ingress rewrites the destination of the service request packet to the service instance carried in the forwarding table entry, encapsulates the tunnel header in accordance with the policy information, and forwards the packet. After decapsulating the tunnel header, the SIAN egress forwards the packet in accordance with standard IP routing. With service forwarding table model 2, the SIAN ingress does not modify the service request packet itself; it encapsulates the tunnel header in accordance with the policy information and forwards the packet. After decapsulating the tunnel header, the SIAN egress searches its local service forwarding table by service identifier, rewrites the destination IP address of the service packet to the IP address in that table, and forwards the packet based on standard IP routing. Regardless of the forwarding model, the underlay nodes forward the packet without inspecting the information inside the tunnel.
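
The difference between the two forwarding models can be sketched with a toy packet model. All names here (IngressEntry, Packet, the policy strings) are hypothetical, and the tunnel is reduced to a single label for brevity; this is an illustration of the lookup order, not an implementation of SR encapsulation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IngressEntry:
    policy: str                         # SR policy (tunnel) toward the egress
    instance_ip: Optional[str] = None   # present in model 1 only


@dataclass
class Packet:
    service_id: int
    dst: str
    tunnel: Optional[str] = None


def ingress_forward(pkt: Packet, table: Dict[int, IngressEntry]) -> Packet:
    """Look up the service ID and encapsulate toward the egress."""
    entry = table[pkt.service_id]
    if entry.instance_ip is not None:
        pkt.dst = entry.instance_ip     # model 1: ingress selects the instance
    pkt.tunnel = entry.policy           # both models: encapsulate per policy
    return pkt


def egress_forward(pkt: Packet, local_table: Dict[int, str]) -> Packet:
    """Decapsulate; in model 2 the egress selects the instance locally."""
    pkt.tunnel = None
    if pkt.service_id in local_table:   # model 2 only; empty table in model 1
        pkt.dst = local_table[pkt.service_id]
    return pkt
```

In model 1 the egress runs with an empty local table and simply routes on the destination already set by the ingress; in model 2 the ingress table stays smaller because it holds one entry per service site rather than per instance.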

Service flow affinity in the service forwarding table: Flow affinity means that packets from the same flow are always sent to the same egress and processed by the same service instance.

For service forwarding table model 1, when a new flow arrives at the ingress and the best service instance and egress have been determined, the ingress records the flow identifier (5-tuple), preferred egress, and affinity timeout in the flow binding table. Because the packet already targets the actual service instance behind the selected egress, the egress does not need to search the flow affinity table.

For service forwarding table model 2, when a new flow arrives at the ingress, only the best egress is determined, and the ingress records the flow identifier (for example, a 5-tuple), preferred egress, and affinity timeout in the flow binding table. Because the service instance has not yet been determined, the egress still needs to search its own flow affinity table to obtain the service instance, update the flow affinity table, and then look up its forwarding table to forward the packet.
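
A flow binding table with an affinity timeout might look like the sketch below. The class name, the 30-second default, and the choice to refresh the timeout on every packet are assumptions for illustration; the document does not specify them.

```python
from typing import Callable, Dict, Tuple

# (source address, destination address, source port, destination port, protocol)
FiveTuple = Tuple[str, str, int, int, int]


class FlowBindingTable:
    """Keeps packets of one flow on the same egress until the binding expires."""

    def __init__(self, timeout: float = 30.0) -> None:   # assumed default
        self.timeout = timeout
        self._bindings: Dict[FiveTuple, Tuple[str, float]] = {}

    def egress_for(self, flow: FiveTuple, now: float,
                   choose_best: Callable[[], str]) -> str:
        bound = self._bindings.get(flow)
        if bound is not None and now < bound[1]:
            egress = bound[0]          # live binding: keep affinity
        else:
            egress = choose_best()     # new or expired flow: select again
        # Assumption: every packet refreshes the affinity timeout.
        self._bindings[flow] = (egress, now + self.timeout)
        return egress
```

The same structure could be used at the egress in model 2, with a service instance stored in place of the egress.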

6.6. OAM Consideration

The main function of OAM is to detect network defects before they develop into service-affecting events. It confines correctable errors and timing errors to a bounded scope so that they do not interfere with network operation, thus ensuring that the operator fulfills its QoS commitment and meets the agreed SLA.

OAM generally includes a fault management (FM) function and a performance management (PM) function. FM features such as Continuity Check (CC), Connectivity Verification (CV), and Remote Defect Indication (RDI) automatically detect and locate defects in the network. PM features such as Loss Measurement (LM), Delay Measurement (DM), and throughput measurement can diagnose service degradation. OAM is also key to network survivability and to triggering network protection.


   |                |                     |                     |
   | Access network |  Transport network  |  Data center network|
   |<-------------->|<------------------->|<------------------->|
   |                |                     |                     |
+--+---+        +---+---+  +--------+  +--+---+            +----+---+
|client+--------+ SIAN  +--+underlay+--+ SIAN +------------+Services|
+--+---+        |Ingress|  |  node  |  |Egress|            +----+---+
   |            +---+---+  +---+----+  +--+---+                 |
   |                | Link OAM | Link OAM |    Service OAM      |
   |                |<-------->|<-------->|<------------------->|
   |                |          |          |                     |
   |                |   Network E2E OAM   |                     |
   |                |<------------------->|                     |
   |                |                     |                     |
   |                |         Network to Service E2E OAM        |
   |                |<----------------------------------------->|
   |                |                                           |
   |                 Client to Service E2E OAM                  |
   |<---------------------------------------------------------->|
   |                                                            |

Figure 8

In addition to conventional network domain OAM technology, the SIAN OAM also introduces computing-related OAM. As shown in the SIAN OAM architecture in Figure 8, the SIAN OAM includes the following layers:

  • The base-layer OAM: includes the network Link OAM (such as BFD, EFM, and MPLS-LM-DM) and the Service OAM (such as ping and keepalive) from the SIAN egress gateway to the service instance. The OAM detection results serve as inputs to the joint service and network traffic engineering calculation, and fault detection also triggers fast convergence.
  • The network-layer OAM: includes the Network E2E OAM (such as BFD, INT, TWAMP, SR-PM, and MPLS-LM-DM) and the Network to Service E2E OAM (such as ping, INT, and RTT measurement). It detects network faults and quality deterioration, and is used to verify the network-segment SLA and the network-to-service SLA, respectively. The results trigger recalculation of network paths, service paths, and instances so that the service SLA is met. They also serve as the network's own proof of SLA compliance.
  • The application-layer OAM: includes the Client to Service E2E OAM (such as ping, INT, and HTTP ping), which implements application-level end-to-end detection and evaluates whether the application-level SLA is achieved. In most cases, instrumentation points in the application software can be used to implement end-to-end QoS detection for applications.

6.7. End to end service flow upon service ID

As illustrated above, the service ID can be carried in different fields of the data packet. The end-to-end service flow differs considerably depending on whether the service ID is placed in a fixed field, such as the destination address field, or in an extension header as a standalone encapsulation.

6.7.1. Service ID in destination address

The client on the terminal obtains the service ID through a DNS query or another subscription process, and encapsulates the service ID in the destination address field. When the service request arrives at an ingress gateway that is aware of the service ID, the ingress retrieves the service ID and treats the request, as well as the subsequent flow, according to the service-ID-specific policy maintained at the ingress. In particular, the policy here is service-ID-based addressing, which involves both the service ID and the corresponding service requirements that the network can satisfy. Service ID awareness is needed only at the ingress and egress; the traditional underlay network nodes transmit the service flow under the scheduled networking policy without being aware of the service ID. When the service request arrives at the egress gateway, the egress can continue forwarding the request according to the constraints associated with the service ID beyond the network domain; otherwise, the policy is terminated at the egress. The key point about carrying the service ID in the destination address is that the traditional service discovery process, such as DNS, can stay as it is, and therefore the client on the terminal is not impacted.

6.7.2. Service ID in flow label and extension headers

The service ID encapsulated in a standalone way in the extension header by the client on the terminal can remain intact through the entire network as well as the cloud site, and can thus be processed at service-ID-aware nodes, which retrieve it and steer the service traffic according to the service-ID-specific policy. The service workflow is the same as that of the service ID in the destination address, except for the following steps:

  • Adaptation is required at the client on the terminal, because an additional extension header carrying the service ID must be added to the original data packet header.
  • At the egress, when the service traffic continues to be forwarded to other service-ID-unaware network domains or cloud sites, the service ID can remain in the user packet header even when network address translation is performed.

7. Acknowledgements

To be added upon contributions, comments and suggestions.

8. IANA Considerations

This memo includes no request to IANA.

9. Security Considerations

A standalone service ID in the routing network adds a new threat exposure in terms of network security. However, the service ID in this proposal is governed and managed by the network and cloud platform, so it should be handled strictly within a closed system. The security-related behaviors of networking nodes will be proposed in other documents.

10. Informative References

[I-D.ldbc-cats-framework]
Li, C.L., "A Framework for Computing-Aware Traffic Steering (CATS)", <https://datatracker.ietf.org/doc/draft-ldbc-cats-framework/>.
[I-D.li-apn-framework]
Li, ZB.L., "Application-aware Networking (APN) Framework", <https://datatracker.ietf.org/doc/draft-li-apn-framework/>.
[I-D.trossen-rtgwg-rosa-arch]
Trossen, D.T., "Architecture for Routing on Service Addresses", <https://datatracker.ietf.org/doc/draft-trossen-rtgwg-rosa-arch/>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.

Authors' Addresses

Daniel Huang
ZTE Corporation
Nanjing
China
Ge Chen
China Telecom
Guangzhou
China
Jie Liang
China Telecom
Guangzhou
China
Yan Zhang
China Unicom
Beijing
China
Dong Yang
Beijing Jiaotong University
Beijing
Dongyu Yuan
ZTE Corporation
Nanjing
China
Fu Huakai
ZTE Corporation
Wuhan
China
Cheng Huang
ZTE Corporation
Shanghai
Yong Guo
ZTE Corporation
Shanghai