BGP Extension for 5G Edge Service Metadata
draft-ietf-idr-5g-edge-service-metadata-11

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft whose latest revision state is "Active".
Authors Linda Dunbar, Kausik Majumdar, Haibo Wang, Gyan Mishra, Zongpeng Du
Last updated 2023-10-23 (Latest revision 2023-10-19)
RFC stream Internet Engineering Task Force (IETF)
Stream WG state WG Document
Document shepherd (None)
IESG IESG state I-D Exists
Network Working Group                                          L. Dunbar
Internet-Draft                                                 Futurewei
Intended status: Standards Track                             K. Majumdar
Expires: 25 April 2024                                   Microsoft Azure
                                                                 H. Wang
                                                                  Huawei
                                                               G. Mishra
                                                                 Verizon
                                                                   Z. Du
                                                            China Mobile
                                                         23 October 2023

               BGP Extension for 5G Edge Service Metadata
               draft-ietf-idr-5g-edge-service-metadata-11

Abstract

   This draft describes a new Metadata Path Attribute and some Sub-TLVs
   for egress routers to advertise the Metadata about the attached edge
   services (ES).  The Edge Service Metadata can be used by the ingress
   routers in the 5G Local Data Network to make path selections not only
   based on the routing cost but also the running environment of the
   edge services.  The goal is to improve latency and performance for 5G
   edge services.

   The extension enables an edge service at one specific location to be
   more preferred than the others with the same IP address (ANYCAST) to
   receive data flow from a specific source, like a specific User
   Equipment (UE).

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119] [RFC8174]
   when, and only when, they appear in all capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

Dunbar, et al.            Expires 25 April 2024                 [Page 1]
Internet-Draft                Metadata Path                 October 2023

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 25 April 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
   2.  Conventions used in this document . . . . . . . . . . . . . .   3
   3.  Metadata Influenced Ingress Node Behavior . . . . . . . . . .   4
     3.1.  Metadata Influenced BGP Path Selection  . . . . . . . . .   5
     3.2.  Ingress Router Forwarding Behavior  . . . . . . . . . . .   5
     3.3.  Forwarding Behavior when UEs Move . . . . . . . . . . . .   5
   4.  Edge Service Metadata Encoding  . . . . . . . . . . . . . . .   6
     4.1.  Metadata Path Attribute . . . . . . . . . . . . . . . . .   6
       4.1.1.  Metadata Path Attribute Handling Procedure  . . . . .   6
       4.1.2.  TLV Format  . . . . . . . . . . . . . . . . . . . . .   7
       4.1.3.  Error Handling  . . . . . . . . . . . . . . . . . . .   8
     4.2.  The Site Preference Index Sub-TLV . . . . . . . . . . . .   8
     4.3.  Capacity Availability Index Metadata  . . . . . . . . . .   9
       4.3.1.  Site Index Associated to Routes . . . . . . . . . . .  10
       4.3.2.  BGP UPDATE with standalone Site Availability Index  .  10
     4.4.  Service Delay Prediction Index  . . . . . . . . . . . . .  11
       4.4.1.  Service Delay Prediction Sub-TLV  . . . . . . . . . .  12
       4.4.2.  Service Delay Prediction Based on Load Measurement  .  12
       4.4.3.  Raw Load Measurement Sub-TLV  . . . . . . . . . . . .  13
   5.  Service Metadata Influenced Decision Process  . . . . . . . .  14
     5.1.  Integrating Network Delay with the Service Metrics  . . .  14
     5.2.  Integrating with BGP decision process . . . . . . . . . .  15
   6.  Service Metadata Propagation Scope  . . . . . . . . . . . . .  16
   7.  Minimum Interval for Metrics Change Advertisement . . . . . .  17
   8.  Validation and Error Handling . . . . . . . . . . . . . . . .  17
   9.  Manageability Considerations  . . . . . . . . . . . . . . . .  18
   10. Security Considerations . . . . . . . . . . . . . . . . . . .  18
   11. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  18
     11.1.  Metadata Path Attribute  . . . . . . . . . . . . . . . .  18
     11.2.  Metadata Path Attribute Sub-Types  . . . . . . . . . . .  18
   12. References  . . . . . . . . . . . . . . . . . . . . . . . . .  19
     12.1.  Normative References . . . . . . . . . . . . . . . . . .  19
     12.2.  Informative References . . . . . . . . . . . . . . . . .  20
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  21

1.  Introduction

   [CATS-Edge-Service] describes the 5G Edge Computing background and
   how BGP can be used to advertise the running status and environment
   of the directly attached 5G edge services.  Besides the Radio Access,
   5G [TS.23.501-3GPP] is characterized by having edge services close to
   the Cell Towers, reachable through Local Data Networks (LDN).  From
   an IP network perspective, the 5G LDN is a limited domain [RFC8799]
   with edge services a few hops away from the ingress nodes.  Only
   selected UE services are considered 5G low latency Edge Services.

   This document describes a new Metadata Path Attribute added to a BGP
   UPDATE message [RFC4271] for egress routers to advertise the Metadata
   about the directly attached edge services.  The Edge Service Metadata
   in this document includes the site availability index, the site
   preference, and the service delay prediction index, which are further
   explained in Section 4.

   Note: The proposed Edge Service Metadata are not intended for the
   best-effort services reachable via the public internet.  The Edge
   Service Metadata can be used by the ingress routers to make path
   selections for selected low latency services based on not only the
   network distance but also the running environment of the edge cloud
   sites.  The goal is to improve latency and performance for 5G ultra-
   low latency services.

   The extension is targeted for a single domain with a Route Reflector
   (RR) controlling the propagation of the BGP UPDATE.  The Edge Service
   Metadata is only attached to the services (routes) hosted in the 5G
   edge cloud sites, which are only a small subset of the services
   initiated from UEs.  For example, it is not attached to routes for
   the many internet sites that UEs access.

2.  Conventions used in this document

   The following conventions are used in this document.

   Edge DC:  Edge Data Center, which provides the hosting environment
      for the edge services.  An Edge DC might host 5G core functions in
      addition to the frequently used edge services.

   gNB:  next generation Node B [TS.23.501-3GPP]

   RTT:  Round-trip Time

   PSA:  PDU Session Anchor (UPF) [TS.23.501-3GPP]

   UE:  User Equipment

   UPF:  User Plane Function [TS.23.501-3GPP]

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC8174] when, and only when, they appear in all capitals, as
   shown here.

3.  Metadata Influenced Ingress Node Behavior

   The goal of this Edge Service Metadata Path Attribute is for egress
   routers to propagate the metrics about their running environment to
   ingress routers so that the ingress routers can make path selections
   based on not only the routing cost but also the running environment
   of the edge services.  Multiple metrics can be attached to one
   Metadata Path Attribute.  One Metadata Path Attribute can contain
   computing service capability information, computing service states,
   computing resource states of the corresponding edge site, or more.
   Computing service capability information can be used to record
   information of the computing power node or initialization deployment
   information for computing service initialization.  Computing service
   states can include the number of service connections, the service
   duration, and so on.  Computing resource states can be detailed
   information on computing resources such as CPU/GPU.  They can also be
   an abstract metric derived from these detailed parameters to indicate
   the resource status of the edge site.  Many more metrics about the
   running environment are being discussed at CATS WG [draft-ldbc-cats-
   framework].  This document illustrates a few examples of Sub-TLVs of
   the metrics under the Edge Service Metadata Path Attribute:

   -  the site capacity availability index

   -  the site preference index

   -  the service delay prediction index, and

   -  the raw load measurement.

   This section specifies how those Metadata impact the ingress node's
   path selections.

3.1.  Metadata Influenced BGP Path Selection

   When an ingress router receives BGP updates for the same IP prefix
   from multiple egress routers, all these egress routers' loopback
   addresses are considered as the next hops for the IP prefix.  For the
   selected low latency edge services, the ingress router BGP engine
   would call an Edge Service Management function that can select paths
   based on the Edge Service Metadata received.  [CATS-Edge-Service] has
   an exemplary algorithm to compute the weighted path cost based on the
   Edge Service Metadata carried by the Sub-TLV(s) specified in this
   document.

   Section 5 has the detailed description of the Edge Service Metadata
   influenced optimal path selection.

3.2.  Ingress Router Forwarding Behavior

   When the ingress router receives a packet and does a lookup on the
   route in the FIB, it gets the destination prefix's whole path.  It
   encapsulates the packet destined towards the optimal egress node.

   For subsequent packets belonging to the same flow, the ingress router
   needs to forward them to the same egress router unless the selected
   egress router is no longer reachable.  Keeping packets of one flow on
   the same egress router, a.k.a. Flow Affinity, is supported by many
   commercial routers.  Most registered EC services have relatively
   short flows.

   How Flow Affinity is implemented is out of the scope of this
   document.  Appendix A has one example illustrating how flow affinity
   can be achieved.

3.3.  Forwarding Behavior when UEs Move

   When a UE moves to a new 5G gNB which is anchored to the same UPF,
   the packets from the UE traverse to the same ingress router.  Path
   selection and forwarding behavior are the same as before.

   If the UE maintains the same IP address when anchored to a new UPF,
   the directly connected ingress router might use the information
   passed from a neighboring router to derive the optimal Next Hop for
   this route.  The detailed algorithm is out of the scope of this
   document.

4.  Edge Service Metadata Encoding

4.1.  Metadata Path Attribute

   The Metadata Path Attribute is an optional transitive BGP Path
   attribute to carry metrics and metadata about the edge services
   attached to the egress router.  The Metadata Path Attribute, to be
   assigned by IANA [RFC2042], consists of a set of Sub-TLVs, and each
   Sub-TLV contains information for specific metrics of the edge
   services.

4.1.1.  Metadata Path Attribute Handling Procedure

   Most BGP UPDATE messages don't include the Metadata Path Attribute.
   For the limited edge services that need to advertise the metadata
   about the services, the Metadata Path Attribute can be included in a
   BGP UPDATE message [RFC4271] together with other BGP Path Attributes
   [IANA-BGP-PARAMS], such as Extended Communities [RFC4360], NEXT_HOP,
   Tunnel Encapsulation Path Attribute [RFC9012], etc.

   The BGP Metadata Path attribute MAY be attached to BGP IPv4/IPv6
   Unicast prefixes, BGP Labeled IPv4/IPv6 prefixes [RFC8277], and IPv4/
   IPv6 Anycast prefixes [RFC4786].  In order to prevent distribution of
   the BGP Metadata Path Attribute beyond its intended scope of
   applicability, attribute filtering SHOULD be deployed to remove the
   BGP Metadata Path attribute at the administrative boundary.

   A BGP speaker that advertises a path received from one of its
   neighbors SHOULD advertise the BGP Metadata Path attribute received
   with the path without modification as long as the BGP Metadata Path
   attribute was acceptable.  If the path did not come with a BGP
   Metadata Path attribute, the speaker MAY attach a BGP Metadata Path
   attribute to the path if configured to do so.

   The Metadata Path Attribute MUST contain at least one metadata Sub-
   TLV.  Multiple Metadata Sub-TLVs can be included in a Metadata Path
   Attribute in one BGP UPDATE message.  The content of the Sub-TLVs
   present in the BGP Metadata Path attribute is determined by the
   configuration.  When a BGP Speaker does not recognize some of the
   Sub-TLVs within one Metadata Path Attribute in a BGP UPDATE message,
   the BGP Speaker should forward the received BGP UPDATE message
   without any change if the BGP UPDATE message is marked as transitive.
   The domain ingress nodes SHOULD process the recognized Sub-TLVs
   carried by the Metadata Path Attribute and ignore the unrecognized
   Sub-TLVs.  By default, a BGP speaker does not report any unrecognized
   Sub-TLVs within a Metadata Path Attribute unless configured to send a
   notification to its management system.  The ingress node should be
   configured with an algorithm to combine the recognized metrics
   carried by the Sub-TLVs within a Metadata Path Attribute of the
   received BGP UPDATE message.

   The metrics Sub-TLVs included in the Metadata Path Attribute apply to
   all the address families carried in the NLRI field of the BGP UPDATE
   message [RFC4271].  For a multi-protocol BGP UPDATE message [RFC4760]
   [RFC7606], the metrics Sub-TLVs included in the Metadata Path
   Attribute apply to all the AFI/SAFI address families carried by the
   MP_REACH_NLRI.

4.1.2.  TLV Format

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |   Attr. Flags |MetaDataPathAtt|        Length (2 Octets)      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |         Value (multiple Metadata Sub-TLVs)                    |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 1: Metadata Path Attribute

   Attr.Flags:  Attribute flags, defined as:

   -  The high-order bit (bit 0): set to 1 to indicate that the
      attribute is optional.

   -  The second high-order bit (bit 1): set to 0 to indicate that the
      service-metadata is not transitive.  It is only intended for the
      receiving router.

   -  The third high-order bit (bit 2): same as specified by [RFC4271].

   -  The fourth high-order bit (bit 3): set to 1 to indicate there are
      two octets for the Length field.

   MetaDataPathAtt:  Metadata Path Attribute type code: TBD1 (to be
      assigned by IANA).

   Length:  the total number of octets of the value field.

   All values in the Sub-TLVs are unsigned 32-bit integers.
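
   The attribute layout in Figure 1 can be sketched in Python as
   follows.  This is only an illustration: the type code 255 is a
   hypothetical placeholder for TBD1, and the function and constant
   names are not part of this document.  Bit 0 and bit 3 are set and
   bit 1 is left at 0, following the flag list above.

```python
import struct

# Hypothetical placeholder: the real type code (TBD1) is not yet assigned
# by IANA, so 255 is used purely for illustration.
METADATA_PATH_ATTR_TYPE = 255

FLAG_OPTIONAL = 0x80         # bit 0 (high-order bit)
FLAG_EXTENDED_LENGTH = 0x10  # bit 3: Length field is two octets


def build_metadata_path_attribute(sub_tlvs: bytes,
                                  type_code: int = METADATA_PATH_ATTR_TYPE) -> bytes:
    """Prepend Attr. Flags, type code, and a 2-octet Length to the Sub-TLVs."""
    if len(sub_tlvs) > 0xFFFF:
        raise ValueError("value does not fit in a 2-octet Length field")
    flags = FLAG_OPTIONAL | FLAG_EXTENDED_LENGTH  # bit 1 and bit 2 left at 0
    return struct.pack("!BBH", flags, type_code, len(sub_tlvs)) + sub_tlvs


def parse_metadata_path_attribute(data: bytes):
    """Return (flags, type_code, value) for an attribute built as above."""
    flags, type_code, length = struct.unpack("!BBH", data[:4])
    if not flags & FLAG_EXTENDED_LENGTH:
        raise ValueError("expected the extended-length flag to be set")
    value = data[4:4 + length]
    if len(value) != length:
        raise ValueError("attribute is shorter than its Length field")
    return flags, type_code, value
```

   With these flag settings the first octet of the attribute is 0x90.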

4.1.3.  Error Handling

   This section specifies a set of metadata Sub-TLVs for the 5G edge
   services.  A BGP speaker MUST NOT include multiple instances with the
   same type for the Sub-TLVs specified in this document in one Metadata
   Path Attribute.  A BGP speaker SHOULD NOT include more than one
   Metadata Path Attribute in one BGP Update message.

   A BGP UPDATE message that includes the Metadata Path Attribute
   doesn't change the BGP Error Handling procedure specified in
   [RFC7606].  When more than one of the Sub-TLVs specified in this
   document is present in a Metadata Path Attribute, they are processed
   independently.  If one of the Sub-TLVs has an invalid value, e.g.,
   outside its specified range, the Sub-TLV with the invalid value is
   ignored by the BGP receiver.  By default, no notification is required
   unless the receiver is configured to send a notification to its
   management system.  All other Sub-TLVs within the Metadata Path
   Attribute with valid values MUST be processed.
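
   The independent processing described above can be sketched as
   follows.  The sketch works on already decoded (sub-type, value)
   pairs and uses the value ranges given for Sub-Types 1, 2, and 3 in
   this section.  The function name, the optional notify callback, and
   the choice to keep the first instance when a Sub-Type is repeated
   are assumptions made for illustration only.

```python
# Valid ranges as stated for each Sub-TLV in this section.
VALID_RANGES = {
    1: (1, 100),  # Site Preference Index
    2: (0, 100),  # Capacity Availability Index (percentage)
    3: (0, 100),  # Service Delay Prediction
}


def filter_sub_tlvs(decoded, notify=None):
    """Return {sub_type: value} for recognized Sub-TLVs with valid values.

    decoded is an iterable of (sub_type, value) pairs.  An invalid value
    only discards its own Sub-TLV; the others are still processed.
    """
    accepted = {}
    for sub_type, value in decoded:
        bounds = VALID_RANGES.get(sub_type)
        if bounds is None:
            continue  # unrecognized Sub-TLV: ignored
        low, high = bounds
        if not low <= value <= high:
            if notify is not None:  # optional report to a management system
                notify(sub_type, value)
            continue
        # Assumption: if a sender wrongly repeats a Sub-Type, keep the first.
        accepted.setdefault(sub_type, value)
    return accepted
```

   For example, a Preference Index of 150 is dropped while a Capacity
   Availability Index of 50 in the same attribute is still used.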

4.2.  The Site Preference Index Sub-TLV

   Different services might have different preference index values
   configured for the same site.  For example, Service-A requires high
   computing power, Service-B requires high bandwidth among its
   microservices, and Service-C requires high volume storage capacity.
   For a DC with relatively low storage capacity but high bisection
   bandwidth, its preference index value is higher for Service-B and
   lower for Service-C.  Site Preference Index can also be used to
   achieve stickiness for some services.

   It is out of the scope of this document how the preference index is
   determined or configured.

   The Preference Index Sub-TLV has the following format:

      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |Site-Preference-Index Sub-Type |               Length          |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                   Preference Index value                      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 2: Preference Index Sub-TLV

   -  Site-Preference-Index Sub-Type =1 (specified in this document).

   -  Preference Index value: 1-100, with 1 being the least preferred,
      and 100 being the most preferred.

   When the Preference Index value is outside the range of 1-100, the
   value carried in this Sub-TLV is ignored.
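
   A minimal sketch of encoding and decoding this Sub-TLV is shown
   below.  It assumes that Sub-Type and Length are two octets each, as
   drawn in Figure 2, and that Length counts only the four value octets;
   the latter is an assumption, as are the function names.

```python
import struct

SITE_PREFERENCE_SUB_TYPE = 1  # Sub-Type value given in the text above


def encode_site_preference(value: int) -> bytes:
    """Encode a Preference Index Sub-TLV (2-octet type, 2-octet length)."""
    if not 1 <= value <= 100:
        raise ValueError("Preference Index must be in the range 1-100")
    # Assumption: Length counts only the 4 octets of the value field.
    return struct.pack("!HHI", SITE_PREFERENCE_SUB_TYPE, 4, value)


def decode_site_preference(data: bytes):
    """Return the Preference Index, or None when it must be ignored."""
    sub_type, length, value = struct.unpack("!HHI", data[:8])
    if sub_type != SITE_PREFERENCE_SUB_TYPE or length != 4:
        raise ValueError("not a Preference Index Sub-TLV")
    # Values outside 1-100 are ignored rather than treated as an error.
    return value if 1 <= value <= 100 else None
```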

4.3.  Capacity Availability Index Metadata

   Capacity Availability Index indicates if an edge site, which can be a
   building, a floor, a pod, a row of server racks, etc., has full
   capacity, reduced capacity, or is completely out of service.
   Therefore, the value is 0-100, with 100% indicating the site is fully
   functional, 0% indicating the site is entirely out of service, and
   50% indicating the site is 50% degraded.

   Cloud Site/Pod failures and degradation include, but are not limited
   to, a site capacity degradation or an entire site going down caused
   by a variety of reasons, such as a fiber cut connecting to the site
   or among pods, cooling failures, insufficient backup power, cyber
   attacks, too many changes outside of the maintenance window, etc.
   Fiber cuts are not uncommon within a Cloud site or between sites.

   When those failure events happen, the edge (egress) router itself
   might still be running fine.  Therefore, the ingress routers with
   paths to the egress router can't use BFD to detect the failures.

   When there is a failure occurring at an edge site (or a pod), many
   instances can be impacted.  In addition, the routes (i.e., the IP
   addresses) in the site might not be aggregated nicely.  Instead of
   many BGP UPDATE messages to the ingress routers for all the instances
   impacted, the egress router can send one single BGP UPDATE indicating
   the capacity availability of the site.  The ingress routers can
   switch all or a portion of the instances that are associated with the
   site depending on how much the site is degraded.

   The Capacity Availability Index Sub-TLV:

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      CapAvailIdx Sub-Type     |         Reserved              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Site-ID (2 octets)     | Site Availability Percentage  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 3: Capacity Availability Index Sub-TLV

   - CapAvailIdx:  Capacity-Availability-Index Sub-Type=2 (specified
      in this document).

   - Site ID:  identifier for a site, which can be a pod, a row of
      server racks, a floor, or an entire DC.  There could be multiple
      sites connected to the egress router (a.k.a. Edge DC GW).

   - Site Availability Percentage:  represents the percentage of the
      site availability, e.g., 100%, 50%, or 0%.  When a site goes dark,
      the Index is set to 0.  50 means 50% capacity functioning.  When
      the value is outside the 0-100% range, the value carried in this
      Sub-TLV is ignored.
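
   The layout in Figure 3 can be sketched as below.  The field widths
   follow the figure (four 2-octet fields).  Sending the Reserved field
   as zero and ignoring it on receipt is an assumption, and the function
   names are illustrative only.

```python
import struct

CAP_AVAIL_SUB_TYPE = 2  # Sub-Type value given in the text above


def encode_capacity_availability(site_id: int, percentage: int) -> bytes:
    """Encode Sub-Type, Reserved, Site-ID, and Site Availability Percentage."""
    if not 0 <= site_id <= 0xFFFF:
        raise ValueError("Site-ID must fit in 2 octets")
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be in the range 0-100")
    # Assumption: the Reserved field is sent as zero.
    return struct.pack("!HHHH", CAP_AVAIL_SUB_TYPE, 0, site_id, percentage)


def decode_capacity_availability(data: bytes):
    """Return (site_id, percentage), or None when the value must be ignored."""
    sub_type, _reserved, site_id, percentage = struct.unpack("!HHHH", data[:8])
    if sub_type != CAP_AVAIL_SUB_TYPE:
        raise ValueError("not a Capacity Availability Index Sub-TLV")
    if percentage > 100:
        return None  # outside 0-100: the value is ignored
    return site_id, percentage
```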

4.3.1.  Site Index Associated to Routes

   An egress router must append the Site Capacity Availability Index
   Sub-TLV to a BGP UPDATE message for the registered low latency edge
   services so that the ingress routers can associate the Site
   reference Identifier with the route in the Routing table.

   However, it is unnecessary to include the Site Capacity Availability
   Index for every BGP Update message if there is no change to the site-
   reference identifier or the Capacity Availability value for the
   service instances.

4.3.2.  BGP UPDATE with standalone Site Availability Index

   When an ingress router receives a BGP UPDATE message from Router-X
   with a prefix of the loopback for Router-X and the Metadata Path
   Attribute with the Capacity Availability Index Sub-TLV, the new
   capacity availability index value is applied to all routes that meet
   the following two constraints: a) they have Router-X as their next
   hop, and b) they are associated with the Site-ID carried in the
   Sub-TLV.  When there are failures or degradation at a site, the
   corresponding egress router can send one BGP UPDATE with the Capacity
   Availability Index Sub-TLV for the egress router's loopback address.
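
   The two matching constraints can be sketched as follows.  The Route
   record and the function name are hypothetical; an actual
   implementation would operate on its own RIB structures.

```python
from dataclasses import dataclass


@dataclass
class Route:
    """Hypothetical routing-table entry, for illustration only."""
    prefix: str
    next_hop: str              # loopback address of the egress router
    site_id: int               # Site-ID learned with the route
    capacity_availability: int = 100


def apply_site_availability(routes, egress_loopback, site_id, percentage):
    """Apply a standalone index to every route that a) has the egress
    router as next hop and b) is associated with the given Site-ID.

    Returns the number of routes updated.
    """
    if not 0 <= percentage <= 100:
        return 0  # out-of-range value is ignored
    updated = 0
    for route in routes:
        if route.next_hop == egress_loopback and route.site_id == site_id:
            route.capacity_availability = percentage
            updated += 1
    return updated
```

   A single update for one Site-ID therefore touches only the routes of
   that site behind that egress router and leaves all others unchanged.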

4.4.  Service Delay Prediction Index

   It is desirable for an ingress router to select a site with the
   shortest processing time for an ultra-low latency service.  But it is
   not easy to predict which site has "the fastest processing time" or
   "the shortest processing delay" for an incoming service request
   because:

   -  The given service instance shares the same physical infrastructure
      with many other applications and service instances.  Service
      requests from other applications or UEs, or the runtime behavior
      of other applications, can impact the processing time for the
      given service instance.

   -  The given service instance can be served by a cluster of servers
      behind a Load Balancer.  To the network, the service is identified
      by one service ID.

   -  The service complexity is different.  One service may call many
      microservices, need to access multiple backend databases, and need
      to go through sophisticated security scrubbing functions, etc.
      Another service can be processed by a few simple steps.  Without
      the application internal logic, it is not easy to estimate the
      processing time for future service requests.

   Even though utilization measurements, like those below, are collected
   by most data centers, they cannot indicate which site has the
   shortest processing time.  A service request might be processed
   faster on Site-A even if Site-A is overutilized.

   o  Server utilization for the server where the instance is
      instantiated.

   o  The network utilization for the links to the server where the
      instance is instantiated.

   o  The number of databases that the service instance will access.

   o  The memory utilization of the databases.

   The remaining available resources at a site are a more reasonable
   indication of the processing delay for future service requests.

   o  The remaining available Server resources.

   o  The remaining available network capacity of the links to the
      server where the instance is instantiated.

   o  The number of databases that the service instance will access.

   o  The remaining storage available for the databases.

   The Service Delay Prediction Index is a value that predicts
   processing delays at the site for future service requests.  The
   higher the value, the longer the delay.

4.4.1.  Service Delay Prediction Sub-TLV

   While out of scope, we assume there is an algorithm that can derive
   the Service Delay Prediction Index that can be assigned to the egress
   router.  When the Service Delay Prediction value is updated, which
   can be triggered by a change in the available resources, etc., the
   egress router can attach the updated Service Delay Prediction value
   in a Sub-TLV under the Metadata Path Attribute of the BGP UPDATE
   message to the ingress routers.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ServiceDelayPredict Sub-Type  |               Length          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Service Delay Prediction Value                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 4: Service Delay Prediction Index Sub-TLV

   - ServiceDelayPredict:  Service Delay Prediction Sub-Type=3
      (specified in this document).

   - Service Delay Prediction Value:  an integer in the range of 0-100,
      with 0 indicating that the service delay is negligible and 100
      indicating that the site has the most significant delay compared
      to all other sites for the same service.  When the value is
      outside the 0-100 range, the value carried in this Sub-TLV is
      ignored.

4.4.2.  Service Delay Prediction Based on Load Measurement

   When the detailed running status of data centers is not exposed to
   the network operator, historic traffic patterns through the egress
   nodes can be utilized to predict the load to a specific service.  For
   example, when the traffic volume to one service at one data center
   suddenly increases by a large percentage compared with the average
   over the past 24 hours, it is likely caused by a larger than normal
   demand for the service.  When this happens, another data center with
   lower-than-average traffic volume for the same service might have a
   shorter processing time for the same service.

   Here are some measurements that can be utilized to derive the Service
   Delay Prediction for a service ID:

   -  Total number of packets to the attached service instance
      (ToPackets);

   -  Total number of packets from the attached service instance
      (FromPackets);

   -  Total number of bytes to the attached service instance (ToBytes);

   -  Total number of bytes from the attached service instance
      (FromBytes).

   The actual load measurement to the service instance attached to a
   CATS-ER can be based on one of the metrics above or on all four
   metrics with different weights applied to each, such as:

      LoadIndex = w1*ToPackets + w2*FromPackets
                  + w3*ToBytes + w4*FromBytes

   where each of w1, w2, w3, and w4 is between 0 and 1, and
   w1 + w2 + w3 + w4 = 1.

   The weights of each metric contributing to the index of the
   service instance attached to a CATS-ER can be configured or
   learned by self-adjusting based on user feedback.

   The Service Delay Prediction Index can be derived from
   LoadIndex/24Hour-Average.  A higher value means a longer delay
   prediction.  The egress router can use the ServiceDelayPredict
   Sub-TLV to inform the ingress routers of the delay prediction derived
   from the traffic pattern.

   Note: The proposed IP layer load measurement is only an estimate
   based on the amount of traffic through the egress router, which might
   not truly reflect the load of the servers attached to the egress
   routers.  They are listed here only for some special deployments
   where those metrics are helpful to the ingress routers in selecting
   the optimal paths.
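
   The weighted LoadIndex and its ratio to the 24-hour average can be
   sketched as follows.  The equal default weights and the function
   names are assumptions for illustration.  How the ratio is mapped to
   the 0-100 Service Delay Prediction range is not specified here and is
   left out of the sketch.

```python
def load_index(to_packets, from_packets, to_bytes, from_bytes,
               weights=(0.25, 0.25, 0.25, 0.25)):
    """LoadIndex = w1*ToPackets + w2*FromPackets + w3*ToBytes + w4*FromBytes."""
    if len(weights) != 4:
        raise ValueError("exactly four weights are required")
    if any(not 0 <= w <= 1 for w in weights):
        raise ValueError("each weight must be between 0 and 1")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("the weights must sum to 1")
    w1, w2, w3, w4 = weights
    return w1 * to_packets + w2 * from_packets + w3 * to_bytes + w4 * from_bytes


def load_ratio(current_index, average_24h):
    """LoadIndex / 24Hour-Average; a higher ratio suggests a longer delay."""
    if average_24h <= 0:
        raise ValueError("the 24-hour average must be positive")
    return current_index / average_24h
```

   A ratio well above 1 indicates that the current load is higher than
   the recent average for that service at that site.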

4.4.3.  Raw Load Measurement Sub-TLV

   When ingress routers have embedded analytics tools that rely on the
   raw measurements, it is useful for the egress router to send the raw
   measurements.

   Raw Load Measurement Sub-TLV has the following format:

      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | Raw-Load-Measurement Sub-Type |               Length          |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                   Measurement Period                          |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |           total number of packets to the Edge Service         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |           total number of packets from the Edge Service       |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |           total number of bytes to the Edge Service           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |           total number of bytes from the Edge Service         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 5: Raw Load Measurement Sub-TLV

   - Raw-Load-Measurement Sub-Type = 4 (specified in this document): Raw
   measurements of packets/bytes to/from the Edge Service address.

   - The receiver nodes can compute the Service Delay Prediction for the
   Service based on the raw measurements sent from the egress node and
   preconfigured algorithms.

   - Measurement Period: BGP Update period in Seconds or user-specified
   period.
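
   The layout in the figure above can be encoded and decoded as
   sketched below.  The sketch assumes network byte order, a 2-octet
   Sub-Type and a 2-octet Length as drawn in the figure, and a Length
   value that counts only the 20 octets following the Length field.
   The function names are hypothetical.

```python
import struct

RAW_LOAD_SUBTYPE = 4
# 2-octet Sub-Type, 2-octet Length, then five 4-octet unsigned fields.
_FORMAT = "!HH5I"
_VALUE_LENGTH = 5 * 4  # assumption: Length excludes the 4-octet header


def encode_raw_load(period, to_packets, from_packets, to_bytes, from_bytes):
    """Pack the Raw Load Measurement Sub-TLV into 24 octets."""
    return struct.pack(_FORMAT, RAW_LOAD_SUBTYPE, _VALUE_LENGTH, period,
                       to_packets, from_packets, to_bytes, from_bytes)


def decode_raw_load(data):
    """Unpack a Raw Load Measurement Sub-TLV; raise ValueError if invalid."""
    if len(data) != struct.calcsize(_FORMAT):
        raise ValueError("unexpected Sub-TLV size")
    subtype, length, period, to_p, from_p, to_b, from_b = struct.unpack(
        _FORMAT, data)
    if subtype != RAW_LOAD_SUBTYPE or length != _VALUE_LENGTH:
        raise ValueError("not a Raw Load Measurement Sub-TLV")
    return {"period": period, "to_packets": to_p, "from_packets": from_p,
            "to_bytes": to_b, "from_bytes": from_b}
```

   Each counter occupies 32 bits in the figure, so a sender has to
   decide how to handle counters that exceed that range; the sketch
   leaves this to the caller.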

5.  Service Metadata Influenced Decision Process

5.1.  Integrating Network Delay with the Service Metrics

   As the service metrics and network delays are in different units,
   here is an exemplary algorithm for an ingress router to compare the
   cost to reach the service instances at Site-i or Site-j.

                   ServD-i * CP-j               Pref-j * NetD-i
   Cost-i=min(w *(----------------) + (1-w) *(------------------))
                   ServD-j * CP-i               Pref-i * NetD-j

   CP-i:  Capacity Availability Index at Site-i.  A higher value means
      higher capacity available.

   NetD-i:  Network latency measurement (RTT) to the Egress Router at
      the site-i.

   Pref-i:  Preference Index for Site-i, a higher value means higher
      preference.

   ServD-i:  Service Delay Prediction Index at Site-i for the service,
      i.e., the ANYCAST address [RFC4786] for the service.

   w:  Weight is a value between 0 and 1.  If smaller than 0.5, Network
      latency and the site Preference have more influence; otherwise,
      Service Delay and capacity availability have more influence.

   When a set of service Metadata is converted to a simple metric, a
   decision process is determined by the metric semantics and deployment
   situations.  The goal is to integrate the conventional network
   decision process with the service Metadata into a unified decision-
   making process for path selection.
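
   One possible reading of the exemplary algorithm is sketched below:
   the expression inside min() is computed for Site-i relative to one
   other Site-j, and a result below 1 favors Site-i.  The names
   SiteMetrics and relative_cost are hypothetical, and the sketch
   assumes all four indexes are positive for both sites.

```python
from dataclasses import dataclass


@dataclass
class SiteMetrics:
    serv_d: float  # ServD: Service Delay Prediction Index
    cp: float      # CP: Capacity Availability Index
    net_d: float   # NetD: network latency (RTT) to the egress router
    pref: float    # Pref: Preference Index


def relative_cost(site_i, site_j, w):
    """Cost of Site-i relative to Site-j, with weight w between 0 and 1."""
    if not 0 <= w <= 1:
        raise ValueError("w must be between 0 and 1")
    # Service term: higher delay at i or higher capacity at j raises cost.
    service = (site_i.serv_d * site_j.cp) / (site_j.serv_d * site_i.cp)
    # Network term: higher latency to i or higher preference for j
    # raises cost.
    network = (site_j.pref * site_i.net_d) / (site_i.pref * site_j.net_d)
    return w * service + (1 - w) * network
```

   With identical metrics at both sites the result is 1.  Both terms
   are ratios, which is what allows indexes in different units to be
   combined.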

5.2.  Integrating with BGP decision process

   When an ingress router receives BGP updates for the same IP address
   from multiple egress routers, all those egress routers are considered
   as the next hops for the IP address.  For the selected services
   configured to be influenced by the Edge Service Metadata, the ingress
   router BGP Decision process [IDR-CUSTOM-DECISION] would trigger the
   Edge Service Management function to compute the weight to be applied
   to the route's next hop in the forwarding plane.  The decision
   process is influenced by the Edge Service Metadata associated with
   the client routes, such as Capacity Availability Index, Site
   Preference, and Service Delay Prediction Index, in addition to the
   traditional BGP multipath computation algorithm, such as the Weight,
   Local preference, Origin, MED, etc., shown below:

                        BGP ANYCAST Update
      +--------+ with Metadata    +---------------+
      | BGP    |----------------->| EdgeServiceMgn|
      |Decision|< - - - - - - - - |               |
      +---^-|--+                  +-------|-------+
          | | BGP ANYCAST                 | Update Anycast
          | | Route                       | Route Nexthops
          | | Multi-path NH install       | with weight
      +---|-V--+                          |
      |   RIB  |                          |
      +----+---+                          |
           |                              |
       +---V------------------------------V-------+
       |               Forwarding Plane           |
       |                                          |
       +------------------------------------------+

                   Figure 6: Metadata Influenced Decision

   When any of those metadata values goes to 0, the effect is the same
   as the routes becoming ineligible via the egress router that
   originates the metadata UPDATE.  But when any of those metadata
   values merely degrades, there is still a possibility, even though
   smaller, for the egress router to continue as the optimal next hop.

   Suppose a destination address aa08::4450 can be reached by three
   next hops (R1, R2, R3).  Further, suppose the local BGP's Decision
   Process based on the traditional network layer policies and metrics
   identifies R1 as the optimal next hop for this destination
   (aa08::4450).  If the Edge Service Metadata results in R2 as the
   optimal next hop for the prefix, the Forwarding Plane will have R2 as
   the next hop for the destination address aa08::4450.
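
   The R1/R2/R3 example can be sketched as follows.  The Candidate
   fields and the per-next-hop cost are hypothetical inputs that an
   Edge Service Management function might produce.  The sketch applies
   the "value goes to 0" rule only to the capacity and preference
   indexes, which is an assumption of this illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    next_hop: str
    capacity_index: int
    preference_index: int
    cost: float  # lower is better; derived from the Edge Service Metadata


def select_next_hop(bgp_best, candidates, is_edge_service_route):
    """Return the next hop to install for one route, or None."""
    if not is_edge_service_route:
        # Routes not configured as edge services keep the BGP result.
        return bgp_best
    # A metadata value of 0 makes the route via that egress ineligible.
    eligible = [c for c in candidates
                if c.capacity_index > 0 and c.preference_index > 0]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.cost).next_hop
```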

   The Edge Service Metadata influencing next hop selection is different
   from the metric (or weight) to the next hop.  The metric to a next
   hop can impact many (sometimes, tens of thousands) routes that have
   the node as their next hop, whereas the Edge Service Metadata only
   impacts the optimal next hop selection for a subset of client routes
   that are identified as the edge services.

   When the BGP custom decision [IDR-CUSTOM-DECISION] is used, the Edge
   Service Management function would have an algorithm to combine the
   Edge Service Metadata attributes with the custom decision to derive
   the optimal next hop for the Edge Service routes.

   Note: For a BGP UPDATE message that includes the Edge Service
   Metadata Path Attribute with the egress router's loopback prefix, the
   Site Capacity Availability Index value is applied to all the NLRIs
   with the Site-ID indicated in the Edge Service Metadata Path
   Attribute.

6.  Service Metadata Propagation Scope

   Service Metadata are only distributed to the relevant ingress nodes
   interested in the Service.  The group of interested ingress nodes
   can be configured or automatically formed.

   For each registered low-latency Service, BGP RT Constrained
   Distribution [RFC4684] can be used to form the Group interested in
   the Service.  The "Service ID", an IP address prefix, is the Route
   Target.  When an ingress router receives the first packet of a flow
   destined to a Service ID, the ingress router sends a BGP UPDATE that
   advertises the Route Target membership NLRI per RFC4684.  The ingress
   router must assign a Timer for the Service ID, as the UE that uses
   the Service ID might move away.  Upon receiving a packet destined for
   the Service ID, the ingress router must refresh the Timer.  The
   ingress router must send a BGP Withdraw UPDATE for the Service ID
   upon expiration of the Timer.
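
   The per-Service-ID Timer behavior can be sketched as below.  The
   class name, the timeout value, and the two callbacks that stand in
   for sending the Route Target membership advertisement and its
   withdrawal are all hypothetical; time is passed in explicitly so the
   logic is independent of any particular clock.

```python
class ServiceInterestTable:
    """Tracks interest in Service IDs at an ingress router (sketch)."""

    def __init__(self, timeout, send_membership, send_withdraw):
        self._timeout = timeout
        self._send_membership = send_membership
        self._send_withdraw = send_withdraw
        self._expiry = {}  # Service ID -> time at which the Timer expires

    def on_packet(self, service_id, now):
        """Called for each packet destined to a Service ID."""
        if service_id not in self._expiry:
            # First packet of a flow: advertise RT membership.
            self._send_membership(service_id)
        # Start or refresh the Timer.
        self._expiry[service_id] = now + self._timeout

    def expire(self, now):
        """Withdraw membership for every Service ID whose Timer expired."""
        expired = [s for s, t in self._expiry.items() if t <= now]
        for service_id in expired:
            del self._expiry[service_id]
            self._send_withdraw(service_id)
```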

7.  Minimum Interval for Metrics Change Advertisement

   Because a change in metrics can impact path selection, the Minimum
   Interval for Metrics Change Advertisement is configured to control
   the update frequency and avoid route oscillations.  The default is
   30 seconds.
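
   A minimal sketch of such an interval check follows.  The class and
   method names are hypothetical, and the choice to drop, rather than
   queue, a change that arrives inside the interval is an assumption of
   this sketch; the caller is expected to offer the current value
   again later.

```python
class MetricAdvertiser:
    """Suppresses metric advertisements sent too close together."""

    def __init__(self, min_interval=30.0):
        self.min_interval = min_interval  # seconds; 30 s is the default
        self._last_sent_at = None
        self._last_sent_value = None

    def update(self, value, now):
        """Return the value to advertise now, or None to stay silent."""
        if value == self._last_sent_value:
            return None  # nothing changed since the last advertisement
        if (self._last_sent_at is not None
                and now - self._last_sent_at < self.min_interval):
            return None  # still inside the minimum interval
        self._last_sent_at = now
        self._last_sent_value = value
        return value
```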

   Significant load changes at EC data centers can be triggered by
   short-term gatherings of UEs, like conventions, lasting a few hours
   or days, which are too short to justify adjusting EC server
   capacities among DCs.  Therefore, the load metrics change rate can be
   in the magnitude of hours or days.

8.  Validation and Error Handling

   The Metadata Path Attribute contains a sequence of Sub-TLVs.  The
   Metadata Path Attribute's length determines the total number of
   octets for all the Sub-TLVs under the Metadata Path Attribute.  The
   sum of the lengths from all the Sub-TLVs under the Metadata Path
   Attribute should equal the length of the Metadata Path Attribute.  If
   this is not the case, the TLV should be considered malformed, and the
   "Treat-as-withdraw" procedure of [RFC7606] is applied.

   If a Metadata Path attribute can be parsed correctly but contains a
   Sub-TLV whose type is not recognized by a particular BGP speaker,
   that BGP speaker MUST NOT consider the attribute to be malformed.
   Rather, it MUST interpret the attribute as if that Sub-TLV had not
   been present.  If the route carrying the Metadata path attribute is
   propagated with the attribute, the unrecognized Sub-TLV remains in
   the attribute.
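
   The two rules above can be sketched as a single walk over the
   attribute value.  The sketch assumes that every Sub-TLV starts with
   a 2-octet Sub-Type and a 2-octet Length, as drawn for the Raw Load
   Measurement Sub-TLV, and that Length counts only the octets after
   the Length field.  The exception and function names are
   hypothetical.

```python
import struct

KNOWN_SUBTYPES = {1, 2, 3, 4}


class MalformedAttribute(Exception):
    """Signals that the caller should apply treat-as-withdraw."""


def parse_metadata_attribute(value):
    """Return (sub_type, value) pairs for the recognized Sub-TLVs."""
    recognized = []
    offset = 0
    while offset < len(value):
        if len(value) - offset < 4:
            raise MalformedAttribute("truncated Sub-TLV header")
        sub_type, length = struct.unpack_from("!HH", value, offset)
        offset += 4
        if offset + length > len(value):
            # Sub-TLV lengths do not add up to the attribute length.
            raise MalformedAttribute("Sub-TLV overruns the attribute")
        if sub_type in KNOWN_SUBTYPES:
            recognized.append((sub_type, value[offset:offset + length]))
        # Unrecognized Sub-TLVs are skipped, not treated as errors.
        offset += length
    return recognized
```

   The function only reads the attribute value, so the original octets,
   including any unrecognized Sub-TLV, remain available if the route is
   propagated.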

9.  Manageability Considerations

   The Edge Service Metadata described in this document are only
   intended for propagation between the ingress and egress routers of a
   single BGP domain, i.e., the 5G Local Data Network, which is a
   limited domain with edge services a few hops away from the ingress
   nodes.  Only selected services used by UEs are considered 5G Edge
   Services.  The 5G LDN is usually managed by one operator, even though
   the routers can be from different vendors.

10.  Security Considerations

   The proposed Edge Service Metadata are advertised within the trusted
   domain of 5G LDN's ingress and egress routers.  The ingress routers
   should not propagate the Edge Service Metadata to any nodes that are
   not within the trusted domain.

11.  IANA Considerations

11.1.  Metadata Path Attribute

   IANA is requested to assign a new path attribute from the "BGP Path
   Attributes" registry.  The symbolic name of the attribute is
   "Metadata", and the reference is [this document].

      +=======+======================================+=================+
      | Value |             Description              |    Reference    |
      +=======+======================================+=================+
      |  TBD1 |      Metadata Path Attribute         | [this document] |
      +-------+--------------------------------------+-----------------+

11.2.  Metadata Path Attribute Sub-Types

   IANA is requested to create a new sub-registry under the Metadata
   Path Attribute registry as follows:

   Name:  Sub-TLVs under the "Metadata Path Attribute"

   Registration Procedure:  Expert Review [RFC8126].

      Detailed Expert Review procedure will be added per RFC8126.

   Reference:  [this document]

   +========+==========================+=================+
   |Sub-Type|   Description            | Reference       |
   +========+==========================+=================+
   |      0 | reserved                 | [this document] |
   +--------+--------------------------+-----------------+
   |      1 | Site Preference Index    | [this document] |
   +--------+--------------------------+-----------------+
   |      2 | Site Availability Index  | [this document] |
   +--------+--------------------------+-----------------+
   |      3 | Service Delay Prediction | [this document] |
   +--------+--------------------------+-----------------+
   |      4 | Raw Load Measurement     | [this document] |
   +--------+--------------------------+-----------------+
   |  5-254 | unassigned               | [this document] |
   +--------+--------------------------+-----------------+
   |    255 | reserved                 | [this document] |
   +--------+--------------------------+-----------------+

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006,
              <https://www.rfc-editor.org/info/rfc4271>.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, DOI 10.17487/RFC4360,
              February 2006, <https://www.rfc-editor.org/info/rfc4360>.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760,
              DOI 10.17487/RFC4760, January 2007,
              <https://www.rfc-editor.org/info/rfc4760>.

   [RFC4786]  Abley, J. and K. Lindqvist, "Operation of Anycast
              Services", BCP 126, RFC 4786, DOI 10.17487/RFC4786,
              December 2006, <https://www.rfc-editor.org/info/rfc4786>.

   [RFC7606]  Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
              Patel, "Revised Error Handling for BGP UPDATE Messages",
              RFC 7606, DOI 10.17487/RFC7606, August 2015,
              <https://www.rfc-editor.org/info/rfc7606>.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017,
              <https://www.rfc-editor.org/info/rfc8126>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC8277]  Rosen, E., "Using BGP to Bind MPLS Labels to Address
              Prefixes", RFC 8277, DOI 10.17487/RFC8277, October 2017,
              <https://www.rfc-editor.org/info/rfc8277>.

   [RFC9012]  Patel, K., Van de Velde, G., Sangli, S., and J. Scudder,
              "The BGP Tunnel Encapsulation Attribute", RFC 9012,
              DOI 10.17487/RFC9012, April 2021,
              <https://www.rfc-editor.org/info/rfc9012>.

12.2.  Informative References

   [CATS-Edge-Service]
              L. Dunbar, K. Majumdar, H. Wang, and G. Mishra, "5G Edge
              Service use Cases", July 2023,
              <https://datatracker.ietf.org/doc/draft-dunbar-cats-edge-
              service-metrics/>.

   [draft-ldbc-cats-framework]
              C. Li, et al, "A Framework for Computing-Aware Traffic
              Steering", August 2023, <https://datatracker.ietf.org/doc/
              draft-ldbc-cats-framework/>.

   [IANA-BGP-PARAMS]
              IANA, "BGP Path Attributes",
              <https://www.iana.org/assignments/bgp-parameters/>.

   [IDR-CUSTOM-DECISION]
              A. Retana, R. White, "BGP Custom Decision Process", August
              2017, <https://datatracker.ietf.org/doc/draft-ietf-idr-
              custom-decision/>.

   [RFC2042]  Manning, B., "Registering New BGP Attribute Types",
              RFC 2042, DOI 10.17487/RFC2042, January 1997,
              <https://www.rfc-editor.org/info/rfc2042>.

   [RFC4684]  Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk,
              R., Patel, K., and J. Guichard, "Constrained Route
              Distribution for Border Gateway Protocol/MultiProtocol
              Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual
              Private Networks (VPNs)", RFC 4684, DOI 10.17487/RFC4684,
              November 2006, <https://www.rfc-editor.org/info/rfc4684>.

   [RFC8799]  Carpenter, B. and B. Liu, "Limited Domains and Internet
              Protocols", RFC 8799, DOI 10.17487/RFC8799, July 2020,
              <https://www.rfc-editor.org/info/rfc8799>.

   [TS.23.501-3GPP]
              3rd Generation Partnership Project (3GPP), "System
              Architecture for 5G System; Stage 2, 3GPP TS 23.501
              v2.0.1", December 2017.

Authors' Addresses

   Linda Dunbar
   Futurewei
   Dallas, TX,
   United States of America
   Email: ldunbar@futurewei.com

   Kausik Majumdar
   Microsoft Azure
   California,
   United States of America
   Email: kmajumdar@microsoft.com

   Haibo Wang
   Huawei
   Beijing
   China
   Email: rainsword.wang@huawei.com

   Gyan Mishra
   Verizon
   United States of America
   Email: gyan.s.mishra@verizon.com

   Zongpeng Du
   China Mobile
   Beijing
   China
   Email: duzongpeng@foxmail.com
