BGP Extension for 5G Edge Service Metadata
draft-ietf-idr-5g-edge-service-metadata-21

Document Type Active Internet-Draft (idr WG)
Authors Linda Dunbar, Kausik Majumdar, Cheng Li, Gyan Mishra, Zongpeng Du
Last updated 2024-07-08
RFC stream Internet Engineering Task Force (IETF)
Intended RFC status (None)
Stream WG state WG Document
Document shepherd (None)
IESG IESG state I-D Exists
Consensus boilerplate Unknown
Telechat date (None)
Responsible AD (None)
Send notices to (None)
Network Working Group                                          L. Dunbar
Internet-Draft                                                 Futurewei
Intended status: Standards Track                             K. Majumdar
Expires: 9 January 2025                                  Microsoft Azure
                                                                   C. Li
                                                     Huawei Technologies
                                                               G. Mishra
                                                                 Verizon
                                                                   Z. Du
                                                            China Mobile
                                                             8 July 2024

               BGP Extension for 5G Edge Service Metadata
               draft-ietf-idr-5g-edge-service-metadata-21

Abstract

   This draft describes a new Metadata Path Attribute and some Sub-TLVs
   for egress routers to advertise the Metadata about the attached edge
   services (ES).  The edge service Metadata can be used by the ingress
   routers in the 5G Local Data Network to make path selections not only
   based on the routing cost but also the running environment of the
   edge services.  The goal is to improve latency and performance for 5G
   edge services.

   The extension enables an edge service at one specific location to be
   more preferred than the others with the same IP address (ANYCAST) to
   receive data flow from a specific source, like a specific User
   Equipment (UE).

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119] [RFC8174] when, and only when, they appear in all capitals,
   as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

Dunbar, et al.           Expires 9 January 2025                 [Page 1]
Internet-Draft                Metadata Path                    July 2024

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 9 January 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
   2.  Conventions used in this document . . . . . . . . . . . . . .   4
   3.  Metadata Influenced Ingress Node Behavior . . . . . . . . . .   4
     3.1.  Metadata Influenced BGP Path Selection  . . . . . . . . .   5
     3.2.  Ingress Router Forwarding Behavior  . . . . . . . . . . .   5
     3.3.  Forwarding Behavior when UEs Move . . . . . . . . . . . .   6
   4.  Edge Service Metadata Encoding  . . . . . . . . . . . . . . .   6
     4.1.  Metadata Path Attribute . . . . . . . . . . . . . . . . .   6
       4.1.1.  Metadata Path Attribute Characteristics . . . . . . .   6
       4.1.2.  Metadata Path Attribute Processing  . . . . . . . . .   7
       4.1.3.  Sub-TLVs Data Processing  . . . . . . . . . . . . . .   7
       4.1.4.  Metadata Path Attribute Handling Procedure  . . . . .   7
     4.2.  Metadata Path Attribute TLV . . . . . . . . . . . . . . .   8
     4.3.  The Site Preference Index Sub-TLV . . . . . . . . . . . .   9
     4.4.  Site Physical Availability Index Metadata . . . . . . . .  10
       4.4.1.  Site Index Associated to Routes . . . . . . . . . . .  12
       4.4.2.  BGP UPDATE with standalone Site Availability Index  .  12
     4.5.  Service Delay Prediction Index  . . . . . . . . . . . . .  12
       4.5.1.  Service Delay Prediction Sub-TLV  . . . . . . . . . .  13
       4.5.2.  Service Delay Prediction Based on Load Measurement  .  14
     4.6.  Raw Measurement Sub-TLV . . . . . . . . . . . . . . . . .  15
     4.7.  Service-Oriented Capability Sub-TLV . . . . . . . . . . .  17
     4.8.  Service-Oriented Available Resource Sub-TLV . . . . . . .  18
   5.  Service Metadata Influenced Decision Process  . . . . . . . .  20
     5.1.  Egress Node Behavior  . . . . . . . . . . . . . . . . . .  20
     5.2.  Integrating Network Delay with the Service Metrics  . . .  21
     5.3.  Integrating with BGP decision process . . . . . . . . . .  22
   6.  Service Metadata Propagation Scope  . . . . . . . . . . . . .  23
   7.  Minimum Interval for Metrics Change Advertisement . . . . . .  24
   8.  Validation and Error Handling . . . . . . . . . . . . . . . .  25
   9.  Manageability Considerations  . . . . . . . . . . . . . . . .  25
   10. Security Considerations . . . . . . . . . . . . . . . . . . .  25
   11. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  27
     11.1.  Metadata Path Attribute  . . . . . . . . . . . . . . . .  27
     11.2.  Metadata Path Attribute Sub-Types  . . . . . . . . . . .  27
   12. Contributors  . . . . . . . . . . . . . . . . . . . . . . . .  28
   13. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  28
   14. References  . . . . . . . . . . . . . . . . . . . . . . . . .  28
     14.1.  Normative References . . . . . . . . . . . . . . . . . .  28
     14.2.  Informative References . . . . . . . . . . . . . . . . .  30
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  30

1.  Introduction

   This document describes a new Metadata Path Attribute added to a BGP
   UPDATE message [RFC4271] for egress routers to advertise the Metadata
   about 5G low latency edge services directly attached to the egress
   routers.  5G [TS.23.501-3GPP] is characterized by having edge
   services closer to the Cell Towers, reachable via Local Data Networks
   (LDN).  From an IP network perspective, the 5G LDN is a limited
   domain [RFC8799] with edge services a few hops away from the ingress
   nodes.  Only selected UE services are considered 5G low latency edge
   services.

   Note: The proposed edge service Metadata Path Attribute is not
   intended for the best-effort services reachable via the public
   Internet.  The information carried by the Metadata Path Attribute can
   be used by the ingress routers to make path selections for selected
   low latency services based on not only the network distance but also
   the running environment of the edge cloud sites.  The goal is to
   improve latency and performance for 5G ultra-low latency services.

   The extension is targeted at a single domain in which a Route
   Reflector (RR) controls the propagation of the BGP UPDATE messages.
   The edge service Metadata Path Attribute is attached only to the
   routes of low latency services hosted in the 5G edge cloud sites.
   Those services are only a small subset of the services initiated from
   UEs; the attribute is not meant for general UE access to internet
   sites.

   While the proposed Metadata Path Attribute is particularly beneficial
   for low latency services, the metadata path attributes can be
   expanded to propagate information about GPU availability, power, or
   other resources necessary for compute-intensive services such as AI
   and machine learning.  This flexibility makes it a valuable tool for
   a wide range of applications beyond just low latency services.

2.  Conventions used in this document

   The following conventions are used in this document.

   Edge DC:  Edge Data Center, which provides the hosting environment
      for the edge services.  An Edge DC might host 5G core functions in
      addition to the frequently used edge services.

   gNB:  next generation Node B [TS.23.501-3GPP]

   RTT:  Round-trip Time

   PSA:  PDU Session Anchor (UPF) [TS.23.501-3GPP]

   UE:  User Equipment

   UPF:  User Plane Function [TS.23.501-3GPP]

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC8174] when, and only when, they appear in all capitals, as
   shown here.

3.  Metadata Influenced Ingress Node Behavior

   The goal of this edge service Metadata Path Attribute is for egress
   routers to propagate the metrics about the running environment for a
   subset of edge services to ingress routers so that the ingress
   routers can make path selections based on not only the routing cost
   but also the running environment for those edge services.  The BGP
   speakers that do not support the Metadata Path Attribute can ignore
   the Metadata Path Attribute in a BGP UPDATE Message.  All
   intermediate nodes can forward the entire BGP UPDATE as it is.
   Multiple metrics can be attached to one Metadata Path Attribute.  One
   Metadata Path Attribute can contain computing service capability
   information, computing service states, computing resource states of
   the corresponding edge site, or more.  Computing service capability
   information can be used to record information of the computing power
   node or initialization deployment information for computing service
   initialization.  Computing service states can include items such as
   the number of service connections, the service duration, and so on.
   Computing resource states can be detailed information on computing
   resources such as CPU/GPU.  They can also be an abstract metric
   derived from these detailed parameters to indicate the resource
   status of the edge site.  More metrics about the running environment
   could be attached to the Metadata Path Attribute, e.g., some of the
   metrics being discussed by the CATS WG.  This document illustrates a
   few examples of Sub-TLVs of the metrics under the edge service
   Metadata Path Attribute:

   -  the site physical availability index

   -  the site preference index

   -  the service delay prediction index, and

   -  the raw load measurement.

   This section specifies how those Metadata impact the ingress node's
   path selections.

3.1.  Metadata Influenced BGP Path Selection

   When an ingress router receives BGP updates for the same IP prefix
   from multiple egress routers, all these egress routers' loopback
   addresses are considered as the next hops for the IP prefix.  For the
   selected low latency edge services, the ingress router BGP engine
   would call an edge service Management function that can select paths
   based on the edge service Metadata received.  Section 5.1 has an
   example algorithm to compute the weighted path cost based on the
   edge service Metadata carried by the Sub-TLV(s) specified in this
   document.

   Section 5 has the detailed description of the edge service Metadata
   influenced optimal path selection.

3.2.  Ingress Router Forwarding Behavior

   When the ingress router receives a packet and looks up the route in
   the FIB, it obtains the whole path for the destination prefix.  It
   then encapsulates the packet and sends it towards the optimal egress
   node.

   For subsequent packets belonging to the same flow, the ingress router
   needs to forward them to the same egress router unless the selected
   egress router is no longer reachable.  Keeping packets from one flow
   to the same egress router, a.k.a.  Flow Affinity, is supported by
   many commercial routers.  Most registered EC services have relatively
   short flows.

   How Flow Affinity is implemented is out of scope for this
   document.
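
   Although the implementation is out of scope, the behavior described
   above can be illustrated with a minimal Python sketch.  Everything in
   it is an assumption made for illustration only: the 5-tuple flow key,
   the FlowAffinityTable class, and the two callbacks (select_egress for
   Metadata-based path selection, is_reachable for next-hop liveness)
   are hypothetical and are not defined by this document.

```python
from typing import Callable, Dict, Tuple

# Hypothetical flow key: (src_ip, dst_ip, protocol, src_port, dst_port)
FlowKey = Tuple[str, str, int, int, int]


class FlowAffinityTable:
    """Pins each flow to the egress router chosen for its first packet."""

    def __init__(self,
                 select_egress: Callable[[str], str],
                 is_reachable: Callable[[str], bool]) -> None:
        self._select_egress = select_egress  # assumed path-selection hook
        self._is_reachable = is_reachable    # assumed reachability check
        self._pinned: Dict[FlowKey, str] = {}

    def egress_for(self, flow: FlowKey) -> str:
        egress = self._pinned.get(flow)
        # Re-select only for a new flow or when the pinned egress is gone.
        if egress is None or not self._is_reachable(egress):
            egress = self._select_egress(flow[1])
            self._pinned[flow] = egress
        return egress
```

   In this sketch a change in the preferred egress router affects only
   new flows; an existing flow moves only when its pinned egress router
   becomes unreachable.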

3.3.  Forwarding Behavior when UEs Move

   When a UE moves to a new 5G gNB that is anchored to the same UPF,
   the packets from the UE still arrive at the same ingress router.
   Path selection and forwarding behavior are the same as before.

   If the UE maintains the same IP address when anchored to a new UPF,
   the directly connected ingress router might use the information
   passed from a neighboring router to derive the optimal Next Hop for
   this route.  The detailed algorithm is out of the scope of this
   document.

4.  Edge Service Metadata Encoding

4.1.  Metadata Path Attribute

   The Metadata Path Attribute is an optional non-transitive BGP Path
   Attribute that carries metrics and metadata about the edge services
   attached to the egress router.  The Metadata Path Attribute, whose
   type code is to be assigned by IANA [RFC2042], consists of a set of
   Sub-TLVs, and each Sub-TLV contains information for specific metrics
   of the edge services.

4.1.1.  Metadata Path Attribute Characteristics

   Only a small subset of BGP UPDATE messages include the Metadata Path
   Attribute.  The choice of which prefix to carry the Metadata Path
   Attribute is determined by local policies.  The Metadata Path
   Attribute can be included in a BGP UPDATE message [RFC4271] together
   with other BGP Path Attributes [IANA-BGP-PARAMS], such as Extended
   Communities [RFC4360], NEXT_HOP, Tunnel Encapsulation Path Attribute
   [RFC9012], etc.

   The Metadata Path Attribute has the following characteristics:

   -  Non-transitive

   -  Boundary node filtering SHOULD be deployed to remove the BGP
      Metadata Path attribute at the administrative boundary to prevent
      the distribution of the BGP Metadata Path Attribute beyond its
      intended scope of applicability.

   -  Can be packed with NLRI(AFI/SAFI) Unicast (1/1, 2/1), Label
      Unicast (AFI/SAFI - ) [RFC8277], IPv6 Anycast [RFC4786].

   -  MUST contain at least one Metadata Sub-TLV.  Multiple Metadata
      Sub-TLVs can be included in a Metadata Path Attribute in one BGP
      UPDATE message.  The choice of the Sub-TLVs present in the BGP
      Metadata Path Attribute is determined by the local policies.

   The metrics Sub-TLVs included in the Metadata Path Attribute apply to
   all the address families carried in the NLRI field of the BGP UPDATE
   message [RFC4271].  For a multi-protocol BGP UPDATE message [RFC4760]
   [RFC7606], the metrics Sub-TLVs included in the Metadata Path
   Attribute apply to all the AFI/SAFI address families carried by the
   MP_REACH_NLRI.

4.1.2.  Metadata Path Attribute Processing

   A BGP speaker that advertises a path received from one of its
   neighbors SHOULD advertise the BGP Metadata Path attribute received
   with the path without modification as long as the BGP Metadata Path
   attribute was acceptable.  If the path did not come with a BGP
   Metadata Path attribute, the speaker MAY attach a BGP Metadata Path
   Attribute to the path if configured to do so.

   A BGP Peer receiving a BGP Metadata Path attribute should ignore Sub-
   TLVs with unknown types and process the recognized Sub-TLVs.  BGP
   Peers should not delete any Sub-TLV from the BGP Metadata Path
   Attribute.
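
   The receive-side rule above (process recognized Sub-TLVs, ignore
   unknown ones, delete nothing) can be sketched in Python as follows.
   This is an illustration under stated assumptions, not a normative
   parser: it uses the Sub-TLV layouts given later in this document
   (Sub-Types 1 and 3 carry a one-octet Length after a two-octet Sub-
   Type, while Sub-Type 2 has a fixed length of 8 octets), and it
   assumes that unknown Sub-Types also carry a one-octet Length, which
   this document does not state.  The function name split_sub_tlvs is
   hypothetical.

```python
import struct
from typing import Dict, List, Tuple

SITE_PREFERENCE = 1
SITE_PHY_AVAILABILITY = 2       # fixed 8 octets, no Length field
SERVICE_DELAY_PREDICTION = 3
RECOGNIZED = {SITE_PREFERENCE, SITE_PHY_AVAILABILITY,
              SERVICE_DELAY_PREDICTION}


def split_sub_tlvs(value: bytes) -> Tuple[Dict[int, bytes], List[int]]:
    """Return (recognized Sub-TLV bodies by Sub-Type, ignored Sub-Types).

    The input buffer is never modified, so the attribute can still be
    re-advertised with every Sub-TLV intact.
    """
    recognized: Dict[int, bytes] = {}
    ignored: List[int] = []
    offset = 0
    while offset < len(value):
        if offset + 2 > len(value):
            raise ValueError("truncated Sub-Type")
        (sub_type,) = struct.unpack_from("!H", value, offset)
        if sub_type == SITE_PHY_AVAILABILITY:
            body_start = offset + 2
            end = offset + 8
        else:
            # Assumption: every other Sub-Type has a one-octet Length.
            if offset + 3 > len(value):
                raise ValueError("truncated Length")
            body_start = offset + 3
            end = body_start + value[offset + 2]
        if end > len(value):
            raise ValueError("Sub-TLV overruns the attribute")
        if sub_type in RECOGNIZED:
            recognized[sub_type] = value[body_start:end]
        else:
            ignored.append(sub_type)   # skipped, not deleted
        offset = end
    return recognized, ignored
```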

4.1.3.  Sub-TLVs Data Processing

   By default, a BGP speaker does not report any unrecognized Sub-TLVs
   within a Metadata Path Attribute unless configured to send a
   notification to its management system.  The ingress node should be
   configured with an algorithm to combine the recognized metrics
   carried by the Sub-TLVs within a Metadata Path Attribute of the
   received BGP UPDATE message.

4.1.4.  Metadata Path Attribute Handling Procedure

   The Metadata Path Attribute MUST contain at least one metadata Sub-
   TLV.  Multiple Metadata Sub-TLVs can be included in a Metadata Path
   Attribute in one BGP UPDATE message.  The content of the Sub-TLVs
   present in the BGP Metadata Path attribute is determined by the
   configuration.  When a BGP Speaker does not recognize some of the
   Sub-TLVs within one Metadata Path Attribute in a BGP UPDATE message,
   the BGP Speaker should forward the received BGP UPDATE message
   without any change if the transitive bit is set to 1 [RFC4271].  The
   domain ingress nodes should process the recognized Sub-TLVs carried
   by the Metadata Path Attribute and ignore the unrecognized Sub-TLVs.

4.2.  Metadata Path Attribute TLV

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |AttFlag|Reserve|MetaDataPathAtt|Length(1 Octet)| Reserved      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |         Value (multiple Metadata Sub-TLVs)                    |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 1: Metadata Path Attribute

   AttFlag:  Attribute flags, defined as:

   -  The high-order bit (bit 0): set to 1.

   -  The second high-order bit (bit 1): set to 0 to indicate that the
      Metadata Path Attribute is non-transitive.  This means that a BGP
      speaker that does not recognize the attribute will not propagate
      it to other BGP peers [RFC4271].  This non-transitive setting
      prevents the Metadata Path Attribute from being leaked to peers
      outside the domain, ensuring it remains contained within the set
      of BGP speakers that understand it.

   -  The third high-order bit (bit 2): same as specified by [RFC4271].

   -  The fourth high-order bit (bit 3): set to 0 to indicate there is
      one octet for the Length field [RFC4271].

   -  The fifth through eighth high-order bits (bits 4-7) are reserved.

   MetaDataPathAtt:  Metadata Path Attribute: TBD1 (assigned by IANA).

   Length:  Specifies the length of the value field in octets, not
      including the first three octets of the AttFlag, Type, and Length
      fields.  The Length value is the total length of the Value field
      plus one reserved octet.

   Reserved:  For future expansion.

   All values in the Sub-TLVs are unsigned 32-bit integers.
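
   As an illustration of Figure 1, the following Python sketch builds
   the attribute header in front of already-encoded Sub-TLVs.  It is a
   sketch under assumptions: the type code 0xFD is a placeholder chosen
   only for this example (the real value is TBD1, to be assigned by
   IANA), the Partial bit is left at 0 as an originating speaker would
   set it, and the function name is hypothetical.

```python
import struct

OPTIONAL = 0x80          # bit 0: set to 1
TRANSITIVE = 0x40        # bit 1: left clear, the attribute is non-transitive
EXTENDED_LENGTH = 0x10   # bit 3: left clear, Length is one octet
METADATA_PATH_ATTR_TYPE = 0xFD   # placeholder only; real code point is TBD1


def encode_metadata_path_attribute(sub_tlvs: bytes) -> bytes:
    """Prefix encoded Sub-TLVs with AttFlag, Type, Length and Reserved."""
    if not sub_tlvs:
        raise ValueError("at least one Metadata Sub-TLV is required")
    # Length covers the Value field plus the one Reserved octet.
    length = len(sub_tlvs) + 1
    if length > 255:
        raise ValueError("content does not fit in a one-octet Length")
    flags = OPTIONAL     # TRANSITIVE and EXTENDED_LENGTH stay 0
    header = struct.pack("!BBBB", flags, METADATA_PATH_ATTR_TYPE, length, 0)
    return header + sub_tlvs
```

   For example, a single 8-octet Sub-TLV yields a 12-octet attribute
   whose Length octet is 9.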

4.3.  The Site Preference Index Sub-TLV

   Different services might have different preference index values
   configured for the same site.  For example, Service-A requires high
   computing power, Service-B requires high bandwidth among its
   microservices, and Service-C requires high volume storage capacity.
   For a DC with relatively low storage capacity but high bisection
   bandwidth, its preference index value is higher for Service-B and
   lower for Service-C.  The Site Preference Index can also be used to
   achieve stickiness for some services.

   It is out of the scope of this document how the preference index is
   determined or configured.

   The Preference Index Sub-TLV has the following format:

      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |Site-Preference-Index Sub-Type | Length        | Reserved      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                   Preference Index value                      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 2: Preference Index Sub-TLV

   -  Site-Preference-Index Sub-Type =1 (specified in this document).

   -  Length: Specifies the total length in octets of the value field
      (not including the Type and Length fields).  The Length = 5 for
      the Site-Preference-Index Sub-Type.

   -  Reserved: Reserved for future use.

   -  Preference Index value: 1 .. (2^32-1); the higher the value, the
      more preferred the site.  Preference Index value == 0 is
      reserved.
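
   The layout in Figure 2 can be exercised with a short Python sketch.
   The function names are hypothetical; the field widths, the Sub-Type
   value 1, the Length value 5, and the reserved value 0 are taken from
   the description above.

```python
import struct

SITE_PREFERENCE_SUB_TYPE = 1
SITE_PREFERENCE_LENGTH = 5   # one Reserved octet plus the 4-octet value


def encode_site_preference(value: int) -> bytes:
    """Encode Sub-Type (2), Length (1), Reserved (1), value (4)."""
    if not 1 <= value <= 2**32 - 1:
        raise ValueError("value must be in 1..2^32-1; 0 is reserved")
    return struct.pack("!HBBI", SITE_PREFERENCE_SUB_TYPE,
                       SITE_PREFERENCE_LENGTH, 0, value)


def decode_site_preference(data: bytes) -> int:
    """Return the Preference Index value from an 8-octet Sub-TLV."""
    sub_type, length, _reserved, value = struct.unpack("!HBBI", data)
    if (sub_type != SITE_PREFERENCE_SUB_TYPE
            or length != SITE_PREFERENCE_LENGTH):
        raise ValueError("not a Site Preference Index Sub-TLV")
    return value
```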

4.4.  Site Physical Availability Index Metadata

   The Site Physical Availability Index indicates the percentage of
   impact on a group of routes associated with a common physical
   characteristic, for example, a pod, a row of server racks, a floor,
   or an entire DC.  The purpose is to use one UPDATE message to
   indicate a group of routes of different NLRIs impacted by a physical
   event.  For example, a power outage to a pod can cause the Site
   Physical Availability Index to be 0% for all the routes in the pod.
   A partial fiber cut to a row of shelves can cause the Site Physical
   Availability Index to be 50% for all the routes in those shelves.
   The value is 0-100, with 100% indicating the site is fully
   functional, 0% indicating the site is entirely out of service, and
   50% indicating the site is 50% degraded.

   It is recommended to assign each route with one Site-ID.  Depending
   on deployment, one DC can use POD number as Site-ID, another DC can
   use Row of Shelves as the Site-ID.

   Cloud site/pod failures and degradation include, but are not limited
   to, a site degradation or an entire site going down for a variety of
   reasons, such as a fiber cut on the links connecting to the site or
   among pods, cooling failures, insufficient backup power, cyber
   attacks, too many changes outside of the maintenance window, etc.
   Fiber cuts are not uncommon within a Cloud site or between sites.

   When those failure events happen, the edge (egress) router itself
   can still be running fine.  Therefore, the ingress routers with paths
   to the egress router can't use BFD to detect the failures.

   When there is a failure occurring at an edge site (or a pod), many
   instances can be impacted.  In addition, the routes (i.e., the IP
   addresses) in the site might not be aggregated nicely.  Instead of
   sending many BGP UPDATE messages to the ingress routers for all the
   impacted instances (i.e., routes), the egress router can send one
   single BGP UPDATE to indicate the capacity availability of the site.
   The ingress routers can switch all or a portion of the instances
   associated with the site depending on how much the site is degraded.

   The BGP UPDATE for the individual instances (i.e., the routes) can
   include the Site Physical Availability Index solely for ingress
   routers to associate the routes with the Site-ID.  The actual Site
   Physical Availability Index value, i.e., the percentage for all the
   routes associated with the Site-ID, is generated by the egress
   routers with the egress routers' loopback address as the NLRI.

   The Site Physical Availability Index Sub-TLV has a fixed length of 8
   octets, including the Type field.  Therefore, a Length field is not
   needed.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      PhyAvailIdx Sub-Type     |I|         Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Site-ID (2 octets)     | Site Availability Percentage  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 3: Site Physical Availability Index Sub-TLV

   - PhyAvailIdx Sub-Type (16 bits):  Indicates the Site-Physical-
      Availability-Index Sub-Type=2 (Specified in this document).

   Route-Flag I (1 bit):  is a flag bit.  When set to 1, the Site
      Availability Index is for BGP speakers (receivers) to associate
      the routes with the Site-ID.  The Site Availability Percentage
      value is ignored.  When set to 0, the BGP speakers (receivers)
      should apply the Site Availability Index value to all the routes
      associated with the Site-ID.

   Reserved (15 bits):  Reserved for future use.  The bits are set to
      zero upon transmission, and ignored upon reception.

   - Site ID (16 bits):  is an identifier for a group of routes
      associated with a common physical characteristic, for example, a
      pod, a row of server racks, a floor, or an entire DC.  The purpose
      is to use one UPDATE message to indicate a group of routes
      impacted by a physical event.  Those routes might be from
      different address families or NLRIs.  There could be multiple
      sites connected to one egress router (a.k.a.  Edge DC GW).

   Site Availability Percentage (16 bits):  When the Route-Flag I is 1,
      the Site Availability Percentage is ignored by the ingress
      routers.  When the Route-Flag I is set to 0, the Site Availability
      Percentage represents the percentage of the site availability for
      all the routes associated with the Site-ID, e.g., 100%, 50%, or
      0%.  When a site goes dark, the value is set to 0.  A value of 50
      means the site is 50% functioning.  When the value is outside the
      0-100 range, the value carried in this Sub-TLV is ignored.
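
   The fixed 8-octet layout in Figure 3 and the field rules above can be
   sketched in Python as follows.  The names SiteAvailability,
   encode_site_availability, and decode_site_availability are
   hypothetical; the sketch assumes the I flag is the most significant
   bit of the 16 bits that follow the Sub-Type, as drawn in the figure.

```python
import struct
from typing import NamedTuple, Optional

PHY_AVAIL_SUB_TYPE = 2
I_FLAG = 0x8000   # first bit after the Sub-Type; other 15 bits reserved


class SiteAvailability(NamedTuple):
    site_id: int
    association_only: bool       # True when Route-Flag I is 1
    percentage: Optional[int]    # None when the value must be ignored


def encode_site_availability(site_id: int, association_only: bool,
                             percentage: int = 0) -> bytes:
    """Encode Sub-Type, I flag + reserved bits, Site-ID, percentage."""
    flags = I_FLAG if association_only else 0   # reserved bits sent as 0
    return struct.pack("!HHHH", PHY_AVAIL_SUB_TYPE, flags,
                       site_id, percentage)


def decode_site_availability(data: bytes) -> SiteAvailability:
    sub_type, flags, site_id, pct = struct.unpack("!HHHH", data)
    if sub_type != PHY_AVAIL_SUB_TYPE:
        raise ValueError("not a Site Physical Availability Index Sub-TLV")
    association_only = bool(flags & I_FLAG)   # reserved bits are ignored
    # The percentage is ignored when I is 1 or when it exceeds 100.
    if association_only or pct > 100:
        return SiteAvailability(site_id, association_only, None)
    return SiteAvailability(site_id, association_only, pct)
```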

4.4.1.  Site Index Associated to Routes

   An egress router sets itself as the next hop for a BGP peer before
   sending an UPDATE with the Metadata Path Attribute that includes the
   Site Physical Availability Index Sub-TLV.  The Site Physical
   Availability Index Sub-TLV (with RouteFlag-I=1) is for ingress
   routers to associate the Site Identifier with the prefixes.

   However, it is not necessary to include the Site Physical
   Availability Index Sub-TLV in every BGP UPDATE message if there is
   no change to the Site Identifier or the Site Physical Availability
   value for the prefixes.

4.4.2.  BGP UPDATE with standalone Site Availability Index

   When a BGP UPDATE message received from Router-X contains the
   Metadata Path Attribute with the Site Physical Availability Index
   Sub-TLV, the next hop should be the loopback of Router-X.  The local
   BGP peer uses local policy to evaluate the route (prefix and path
   attributes).  When the local policy processes the Metadata Path
   Attribute with the Site Physical Availability Index, it uses the
   site availability index to efficiently reduce or increase the
   preference for all BGP routes with the Router-X next hop (loopback).

   The BGP UPDATE with a standalone Site Availability Index is NOT
   intended for resolving NextHop.
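
   The two-step procedure in Sections 4.4.1 and 4.4.2 can be illustrated
   with a minimal Python sketch of the state an ingress router might
   keep.  The class and method names are hypothetical, and treating a
   route with no reported value as 100% available is an assumption made
   for this example only.  How the percentage then changes route
   preference is left to local policy and is not modeled here.

```python
from typing import Dict, Tuple

RouteKey = Tuple[str, str]   # (prefix, egress loopback used as next hop)
SiteKey = Tuple[str, int]    # (egress loopback, Site-ID)


class SiteAvailabilityTable:
    def __init__(self) -> None:
        self._site_of_route: Dict[RouteKey, int] = {}
        self._availability: Dict[SiteKey, int] = {}

    def associate(self, prefix: str, next_hop: str, site_id: int) -> None:
        """Route UPDATE whose Sub-TLV has Route-Flag I set to 1."""
        self._site_of_route[(prefix, next_hop)] = site_id

    def update_site(self, next_hop: str, site_id: int,
                    percentage: int) -> None:
        """Standalone UPDATE (Route-Flag I = 0) covering a whole site."""
        if 0 <= percentage <= 100:   # out-of-range values are ignored
            self._availability[(next_hop, site_id)] = percentage

    def availability(self, prefix: str, next_hop: str) -> int:
        """Percentage for one route; assumed 100 if nothing was reported."""
        site_id = self._site_of_route.get((prefix, next_hop))
        if site_id is None:
            return 100
        return self._availability.get((next_hop, site_id), 100)
```

   With this structure, one call to update_site() changes the value seen
   by every route previously associated with that Site-ID, which mirrors
   the single-UPDATE behavior described above.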

4.5.  Service Delay Prediction Index

   It is desirable for an ingress router to select a site with the
   shortest processing time for an ultra-low latency service.  But it is
   not easy to predict which site has "the fastest processing time" or
   "the shortest processing delay" for an incoming service request
   because:

   -  The given service instance shares the same physical infrastructure
      with many other applications and service instances.  Service
      requests from other applications or UEs, or the runtime behavior
      of those applications, can impact the processing time for the
      given service instance.

   -  The given service instance can be served by a cluster of servers
      behind a Load Balancer.  To the network, the service is identified
      by one service ID.

   -  Service complexity varies.  One service may call many
      microservices, need to access multiple backend databases, and need
      to go through sophisticated security scrubbing functions, etc.
      Another service can be processed in a few simple steps.  Without
      knowledge of the application's internal logic, it is not easy to
      estimate the processing time for future service requests.

   Even though utilization measurements, like those below, are collected
   by most data centers, they cannot indicate which site has the
   shortest processing time.  A service request might be processed
   faster on Site-A even if Site-A is overutilized.

   o  Server utilization for the server where the instance is
      instantiated.

   o  The network utilization for the links to the server where the
      instance is instantiated.

   o  The number of databases that the service instance will access.

   o  The memory utilization of the databases.

   The remaining available resources at a site are a more reasonable
   indication of the processing delay for future service requests.

   o  The remaining available Server resources.

   o  The remaining available capacity of the network links to the
      server where the instance is instantiated.

   o  The number of databases that the service instance will access.

   o  The remaining storage available for the databases.

   The Service Delay Prediction Index is a value that predicts
   processing delays at the site for future service requests.  The
   higher the value, the longer the delay.

4.5.1.  Service Delay Prediction Sub-TLV

   Although the algorithm itself is out of scope, this document assumes
   that an algorithm exists to derive the Service Delay Prediction Index
   and that the resulting value can be assigned to the egress router.
   When the Service Delay Prediction value is updated, for example
   because the available resources change, the egress router can carry
   the updated Service Delay Prediction value in a Sub-TLV under the
   Metadata Path Attribute of the BGP UPDATE message sent to the ingress
   routers.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ServiceDelayPredict Sub-Type  |   Length      |F|L|Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Service Delay Prediction Value                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 4: Service Delay Prediction Index Sub-TLV

   - ServiceDelayPredict:  (Service Delay Prediction) Sub-type=3
      (specified in this document).

   - Length:  specifies the total length in octets of the value field,
      not including the Sub-Type and Length fields.  The Length is 5 or
      9, depending on the format used by the Service Delay Prediction
      Value.

   - Flag (F):  A single-bit flag that indicates the format of the
      Service Delay Prediction Value.  When set to 1, the value is an
      integer in the range 0-100; when set to 0, the value is an
      estimated delay time in NTP format.

   - Flag (L):  A single-bit flag that, when set to 1, indicates that
      the Service Delay Prediction Value field uses the 64-bit NTP
      Timestamp Format.  It is valid only when the F-flag is set to 0.

   - Reserved (6 bits):  Reserved for future use.

   - Service Delay Prediction Value (when the F-flag is set to 1):  an
      integer in the range of 0-100, with 0 indicating that the service
      delay is negligible and 100 indicating that the site has the most
      significant delay compared to all other sites for the same
      service.  When the value is outside the 0-100 range, the value
      carried in this Sub-TLV is ignored.

   - Service Delay Prediction Value (when the F-flag is set to 0):  the
      estimated delay time encoded in the NTP format as defined in
      [RFC5905].  When the L-flag is 1, it is the 64-bit timestamp
      format; otherwise, it is the 32-bit short format.
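
   The field layout in Figure 4 can be illustrated with a minimal
   Python sketch.  It is not a normative encoding.  The bit positions
   chosen for the F and L flags (the two leftmost bits of the fourth
   octet) are an assumption read off the figure, and the function names
   are hypothetical.  The NTP short format (16-bit seconds, 16-bit
   fraction) and timestamp format (32-bit seconds, 32-bit fraction)
   follow [RFC5905].

```python
import struct

SERVICE_DELAY_PREDICT_SUBTYPE = 3  # Sub-type value given in this section
F_FLAG = 0x80  # assumption: F is the leftmost bit of the flags octet
L_FLAG = 0x40  # assumption: L is the second bit of the flags octet


def encode_delay_index(index: int) -> bytes:
    """Encode a 0-100 delay index (F=1); Length is 5 octets."""
    if not 0 <= index <= 100:
        raise ValueError("index must be in the range 0-100")
    return struct.pack("!HBBI", SERVICE_DELAY_PREDICT_SUBTYPE, 5, F_FLAG, index)


def encode_delay_seconds(seconds: float, use_64bit: bool = False) -> bytes:
    """Encode an estimated delay in NTP format (F=0)."""
    if use_64bit:
        # 64-bit NTP timestamp format: 32-bit seconds, 32-bit fraction.
        value = round(seconds * 2**32)
        return struct.pack("!HBBQ", SERVICE_DELAY_PREDICT_SUBTYPE, 9, L_FLAG, value)
    # 32-bit NTP short format: 16-bit seconds, 16-bit fraction.
    value = round(seconds * 2**16)
    return struct.pack("!HBBI", SERVICE_DELAY_PREDICT_SUBTYPE, 5, 0, value)


def decode(data: bytes):
    """Return ("index", n), ("seconds", s), or None if the value is ignored."""
    subtype, length, flags = struct.unpack_from("!HBB", data)
    if subtype != SERVICE_DELAY_PREDICT_SUBTYPE or length not in (5, 9):
        raise ValueError("not a Service Delay Prediction Sub-TLV")
    if flags & F_FLAG:
        (index,) = struct.unpack_from("!I", data, 4)
        # Values outside 0-100 are ignored, as stated above.
        return ("index", index) if index <= 100 else None
    if flags & L_FLAG:
        (value,) = struct.unpack_from("!Q", data, 4)
        return ("seconds", value / 2**32)
    (value,) = struct.unpack_from("!I", data, 4)
    return ("seconds", value / 2**16)
```

   In this sketch, decode(encode_delay_index(42)) yields ("index", 42),
   and a half-second delay in the short format round-trips through
   decode(encode_delay_seconds(0.5)) as ("seconds", 0.5).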

4.5.2.  Service Delay Prediction Based on Load Measurement

   When the detailed running status of data centers is not exposed to
   the network operator, historic traffic patterns through the egress
   nodes can be used to predict the load on a specific service.  For
   example, when the traffic volume to one service at one data center
   suddenly increases by a large percentage compared with the average
   over the past 24 hours, the increase is likely caused by
   larger-than-normal demand for the service.  When this happens,
   another data center with lower-than-average traffic volume for the
   same service might have a shorter processing time for that service.

   Here are some measurements that can be utilized to derive the Service
   Delay Prediction for a service ID:

   -  Total number of packets to the attached service instance
      (ToPackets);

   -  Total number of packets from the attached service instance
      (FromPackets);

   -  Total number of bytes to the attached service instance (ToBytes);

   -  Total number of bytes from the attached service instance
      (FromBytes);

   -  The actual load measurement for the service instance attached to
      an egress router can be based on one of the metrics above or on
      all four metrics, with a different weight applied to each, such
      as:

      LoadIndex = w1*ToPackets+w2*FromPackets+w3*ToBytes+w4*FromBytes

      where w1, w2, w3, and w4 are each between 0 and 1, and
      w1 + w2 + w3 + w4 = 1.

      The weight of each metric contributing to the index of the service
      instance attached to an egress router can be configured or learned
      through self-adjustment based on user feedback.

   The Service Delay Prediction Index can be derived from
   LoadIndex/24Hour-Average.  A higher value means a longer delay
   prediction.  The egress router can use the ServiceDelayPredict
   Sub-TLV to inform the ingress routers of the delay prediction derived
   from the traffic pattern.
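
   The LoadIndex formula and the ratio to the 24-hour average can be
   sketched as follows.  This is an illustration only.  The function
   names and the equal default weights are assumptions, and the way the
   24-hour average is collected is left to the caller.

```python
import math


def load_index(to_packets, from_packets, to_bytes, from_bytes,
               weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four counters (w1..w4 in 0-1, summing to 1)."""
    if any(not 0 <= w <= 1 for w in weights):
        raise ValueError("each weight must be between 0 and 1")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    w1, w2, w3, w4 = weights
    return w1 * to_packets + w2 * from_packets + w3 * to_bytes + w4 * from_bytes


def delay_prediction_index(current_load_index, average_24h_load_index):
    """LoadIndex divided by its 24-hour average; higher means longer delay."""
    if average_24h_load_index <= 0:
        raise ValueError("24-hour average must be positive")
    return current_load_index / average_24h_load_index
```

   For example, a current LoadIndex of 250 against a 24-hour average of
   125 gives a ratio of 2.0, which suggests roughly twice the usual
   load on that service instance.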

   Note: The proposed IP layer load measurement is only an estimate
   based on the amount of traffic through the egress router, which might
   not truly reflect the load of the servers attached to the egress
   routers.  These measurements are listed here only for special
   deployments where they help the ingress routers select the optimal
   paths.

4.6.  Raw Measurement Sub-TLV

   When ingress routers have embedded analytics tools that rely on raw
   measurements, it is useful for the egress router to send the raw
   measurements.

   Raw Measurement Sub-TLV has the following format:

      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   Raw-Measurement Sub-Type    | Length        |  Reserved     |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                           Value                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 5: Service Delay Prediction Raw Measurements Sub-TLV

   - Raw-Measurement Sub-Type = 4 (specified in this document):  Raw
      measurement metadata from the edge service address.

   - Length:  specifies the total length in octets of the value field,
      i.e., not including the Sub-Type and Length fields.

   - Reserved:  Reserved for future use.

   - Value:  The value field can contain multiple types of sub-TLVs,
      which are used to describe the raw metadata.

   A typical raw measurement metadata sub-TLV is defined below.

      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |    Sub-Type   | Length        |                  Reserved     |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                   Measurement Period                          |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |           total number of packets to the Edge Service         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |           total number of packets from the Edge Service       |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |           total number of bytes to the Edge Service           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |           total number of bytes from the Edge Service         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 6: Packets and Bytes Measurements Sub-TLV

   - Sub-Type =4 (specified in this document): Type = 0, Raw
   measurements of packets/bytes to/from the edge service address.

   - Length:  specifies the total length in octets of the value field,
      i.e., not including the Sub-Type and Length fields.  The value is
      22.

   - Reserved:  Reserved for future use.

   - Measurement Period:  BGP Update period in seconds or a
      user-specified period.

   - Total number of packets and bytes:  The receiver nodes can compute
      the needed metrics, such as the Service Delay Prediction, for the
      service based on the raw measurements sent from the egress node
      and preconfigured algorithms.
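
   The layout in Figure 6 can be packed and unpacked as in the
   following sketch.  It assumes the field widths drawn in the figure:
   an 8-bit Sub-Type, an 8-bit Length, a 16-bit Reserved field, and
   five 32-bit fields, which together match the stated Length of 22.
   The function names are hypothetical, and the Sub-Type value is left
   as a caller-supplied parameter.

```python
import struct

PACKETS_BYTES_LENGTH = 22  # Length value stated in the text above


def encode_packets_bytes(sub_type, period, to_packets, from_packets,
                         to_bytes, from_bytes):
    """Pack Figure 6; each counter must fit in an unsigned 32-bit field."""
    return struct.pack("!BBH5I", sub_type, PACKETS_BYTES_LENGTH, 0,
                       period, to_packets, from_packets, to_bytes, from_bytes)


def decode_packets_bytes(data):
    """Unpack Figure 6 into a dict; raise ValueError on an unexpected Length."""
    (sub_type, length, _reserved, period,
     to_packets, from_packets, to_bytes, from_bytes) = struct.unpack_from(
        "!BBH5I", data)
    if length != PACKETS_BYTES_LENGTH:
        raise ValueError("unexpected Length")
    return {
        "sub_type": sub_type,
        "period": period,
        "to_packets": to_packets,
        "from_packets": from_packets,
        "to_bytes": to_bytes,
        "from_bytes": from_bytes,
    }
```

   A receiver could feed the decoded counters into its own
   preconfigured algorithm, for example a weighted load index, to
   derive the metrics it needs.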

4.7.  Service-Oriented Capability Sub-TLV

   The service-oriented capability Sub-TLV is for distributing
   information regarding the capabilities of a specific service in a
   deployment environment.  Depending on the deployment, a deployment
   environment can be an edge site or other types of environments.  This
   information provides ingress routers or controllers with the
   available resources for the specific service in each deployment
   environment.  It enables them to make well-informed decisions for the
   optimal paths to the selected deployment environment.  Currently, the
   Sub-TLV only has an abstract value derived from various metrics,
   although the specifics of this derivation are beyond the scope of
   this document.  Importantly, this value is significant only when
   comparing multiple data center sites for the same service; it is not
   meaningful when comparing different services, meaning the capability
   value relevant to Service A cannot be directly compared with that for
   Service B.  Future enhancements may expand this sub-TLV to include
   more types of metrics or even raw data that represents direct
   metrics.  This information is important in 5G network environments
   where efficient resource utilization is crucial for enhancing
   performance and service quality.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ServiceOriented Cap Sub-Type  |   Length      | Res   |  MT   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       SO-CapValue                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 7: Service-Oriented Capability Sub-TLV

   - ServiceOriented Cap Sub-Type (16 bits):  Indicates the Service-
      Oriented Capability Sub-type=5 (specified in this document).

   - Length (8 bits):  Specifies the total length in octets, not
      including the sub-Type and Length fields.  The Length = 5 for the
      ServiceOriented Cap Sub-Type.

   - Res (4 bits):  Reserved for future use.

   - MT (4 bits):  Metric Type.  This document defines a default metric
      type as value 0, indicating that this is the normalized metric
      generated from multiple types of metrics.  The rules for
      generating the normalized metric are out of scope of this document
      and are defined per service.  Other Metric Types could be defined
      by other documents in the future.

   - SO-CapValue (32 bits):  The Service-Oriented Capability Abstract
      Value is an integer between 0 and 2**32-1.  A larger number means
      greater capability, and a value of 0 indicates the site has the
      lowest relative capability for the service.  The method used to
      derive this value is beyond the scope of this document.

   Multiple Service-Oriented Capability Sub-TLVs with different metric
   types can be encoded in a Metadata Path Attribute, indicating that
   multiple metrics are carried.  However, if more than one Service-
   Oriented Capability Sub-TLV with the same metric type is encoded in a
   Metadata Path Attribute, only the first one is processed and the
   others are ignored.
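
   The "first one wins" rule for duplicate metric types can be sketched
   as follows.  The sketch assumes the layout in Figure 7, with MT in
   the low 4 bits of the fourth octet.  The function names are
   hypothetical.

```python
import struct

SO_CAP_SUBTYPE = 5  # Sub-type value given in this section


def encode_so_cap(metric_type, value):
    """Pack Figure 7: 16-bit Sub-Type, 8-bit Length, Res|MT octet, 32-bit value."""
    return struct.pack("!HBBI", SO_CAP_SUBTYPE, 5, metric_type & 0x0F, value)


def capabilities_by_metric_type(sub_tlvs):
    """Map metric type to SO-CapValue, keeping only the first Sub-TLV per type."""
    result = {}
    for raw in sub_tlvs:
        subtype, length, res_mt, value = struct.unpack("!HBBI", raw)
        if subtype != SO_CAP_SUBTYPE or length != 5:
            continue  # not a Service-Oriented Capability Sub-TLV
        # setdefault leaves an existing entry untouched, so later
        # Sub-TLVs with the same metric type are ignored.
        result.setdefault(res_mt & 0x0F, value)
    return result
```

   Given Sub-TLVs with metric types 0, 1, and 0 again, only the first
   value for metric type 0 is kept.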

4.8.  Service-Oriented Available Resource Sub-TLV

   The "Service-Oriented Available Resource Sub-TLV" is for distributing
   a metric that measures the real-time available resources allocated
   for processing specific services or applications at an edge site.
   This Sub-TLV complements the "Service-Oriented Capability Sub-TLV"
   described in Section 4.7, which addresses the static resource
   capability of a site for a service.  While the Capability Abstract
   Value provides a baseline understanding of a site's potential to
   handle a service, the Available Resource metric offers a dynamic
   perspective by quantifying how much of this capacity is currently
   available.  This distinction is crucial for managing resource
   efficiency and responsiveness in network operations, ensuring that
   capabilities are not only available but also optimally used to meet
   the actual service demands.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |ServiceOriented Avail Sub-Type |   Length      |P| Res |  MT   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         SO-AvailRes                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure 8: Service-Oriented Available Resource Sub-TLV

   - ServiceOriented Avail Sub-Type:  (Service-Oriented Available
      Resource Sub-Type) Sub-type=6 (specified in this document).

   - Length (8 bits):  Specifies the total length in octets, excluding
      the Sub-Type and Length fields.  The Length is 5 for the
      ServiceOriented Available Resource Sub-Type.

   - Flag (P):  A single-bit Percentage flag.  When it is set to 1, the
      value is the Service-Oriented Available Resource expressed as a
      percentage.  When the P-flag is set to 0, the value in this
      Sub-TLV is the abstract value of the available resource.

   - Res (3 bits):  Reserved for future use.

   - MT (4 bits):  Metric Type.  This document defines a default metric
      type as value 0, indicating that this is the normalized metric
      generated from multiple types of metrics.  The rules for
      generating the normalized metric are out of scope of this document
      and are defined by the service.  Other Metric Types could be
      defined by other documents in the future.

   - SO-AvailRes (32 bits):  When the P-flag is set to 1, the Service-
      Oriented Available Resource Value is a percentage (0-100), with 0
      indicating that 0% of the capability is available and 100
      indicating that 100% of the capability is available.  When the
      value is outside the 0-100 range, the value carried in this
      Sub-TLV is ignored.  For example, if the capacity value is 50 and
      the SO-AvailRes is 50 with the P-flag set, then 50% of the 50
      units of resource is available, i.e., 25 units of resource are
      available at this site for the service.  When the P-flag is 0, the
      value of this field is the abstract value of the available
      resource.  For example, if the capacity value is 50 and the
      SO-AvailRes is 50, all of the resource is available.

   Multiple Service-Oriented Available Resource Sub-TLVs with different
   metric types can be encoded in a Metadata Path Attribute, indicating
   that multiple metrics are carried.  However, if more than one
   Service-Oriented Available Resource Sub-TLV with the same metric type
   is encoded in a Metadata Path Attribute, only the first one is
   processed and the others are ignored.
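
   The two interpretations of SO-AvailRes, selected by the P-flag, can
   be sketched as follows.  The function name is hypothetical, and the
   sketch only mirrors the worked examples in the SO-AvailRes
   description above.

```python
def available_units(capacity_value, so_avail_res, p_flag):
    """Return the available resource in abstract units, or None if ignored."""
    if p_flag:
        if not 0 <= so_avail_res <= 100:
            return None  # a percentage outside 0-100 is ignored
        # Percentage of the advertised capacity.
        return capacity_value * so_avail_res / 100
    # With P=0, the field already carries the abstract available value.
    return so_avail_res
```

   With a capacity value of 50 and an SO-AvailRes of 50, the sketch
   returns 25.0 units when the P-flag is set and 50 units when it is
   not.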

5.  Service Metadata Influenced Decision Process

5.1.  Egress Node Behavior

   Multiple instances of the same service could be attached to one
   egress router.  When all instances of the same service are grouped
   behind one application layer load balancer, they appear as one single
   route to the egress router, i.e., the application load balancer's
   prefix.  In this scenario, the compute metrics for all the instances
   behind the load balancer are aggregated by the load balancer and are
   visible to the egress router as associated with the load balancer's
   prefix.  How the application layer load balancers distribute the
   traffic among different instances is out of the scope of this
   document.

   When multiple instances of the same service are reachable from the
   egress router through different paths or links, multiple groups of
   metrics from the respective paths could be exposed to the egress
   router.  The egress router can have preconfigured policies for
   aggregating the metrics from different paths and corresponding
   policies for selecting a path on which to forward the packets
   received from ingress routers.  The aggregated metrics can be carried
   in the BGP UPDATE messages instead of detailed measurements to reduce
   the number of entries advertised by the control plane and to dampen
   route updates in the forwarding plane.  Upon receiving packets from
   ingress routers, the egress router can use its policies to choose an
   optimal path to one service instance.  It is out of the scope of this
   document how the measurements are aggregated on egress routers and
   how ingress routers are configured with the algorithms to integrate
   the aggregated metrics with network layer metrics.

   Many measurements could impact and correspondingly reflect service
   performance.  To simplify the selection of an optimal path, egress
   routers can have preconfigured policies or algorithms to aggregate
   multiple metrics into one simple metric that is advertised to ingress
   routers.  Though out of the scope of this document, an egress router
   can also have an algorithm to convert multiple metrics into a network
   metric, such as an IGP cost for each instance, to pass to ingress
   nodes.  This decision-making process integrates network metrics
   computed by traditional IGP/BGP and the service delay metrics from
   egress routers to achieve a well-informed and adaptive routing
   approach.  This orchestration at the edge enhances the service's
   overall performance and optimizes resource utilization across the
   distributed infrastructure.

   When the egress router has merged the compute metrics from the local
   sites behind it, it can include one or more aggregated compute
   metrics in the Metadata Path Attribute in the BGP UPDATE to the
   ingress router.  Also, an identifier or flag can be carried to
   indicate that the metrics are merged ones.  After receiving the
   routes for the Service ID with the identifier, the ingress router
   performs route selection based on preconfigured algorithms (see
   Section 3 of this document).

5.2.  Integrating Network Delay with the Service Metrics

   Because the service metrics and network delays are in different
   units, the following is an example algorithm for an ingress router to
   compare the cost of reaching the service instances at Site-i and
   Site-j.

                   ServD-i * CP-j               Pref-j * NetD-i
   Cost-i=min(w *(----------------) + (1-w) *(------------------))
                   ServD-j * CP-i               Pref-i * NetD-j

   CP-i:  Capacity Availability Index at Site-i.  A higher value means
      higher capacity available.

   NetD-i:  Network latency measurement (RTT) to the Egress Router at
      the site-i.

   Pref-i:  Preference Index for Site-i, a higher value means higher
      preference.

   ServD-i:  Service Delay Prediction Index at Site-i for the service,
      i.e., the ANYCAST address [RFC4786] for the service.

   w:  Weight is a value between 0 and 1.  If smaller than 0.5, Network
      latency and the site Preference have more influence; otherwise,
      Service Delay and capacity availability have more influence.
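
   The bracketed expression in the formula above can be sketched for a
   single pair of sites as follows.  The class and function names are
   hypothetical.  How the "min" in the formula is taken over candidate
   sites is not detailed here, so the sketch only computes the weighted
   ratio for Site-i relative to Site-j and assumes all indices are
   non-zero.

```python
from dataclasses import dataclass


@dataclass
class SiteMetrics:
    serv_d: float  # ServD: Service Delay Prediction Index
    cp: float      # CP: Capacity Availability Index
    pref: float    # Pref: Preference Index
    net_d: float   # NetD: network latency (RTT) to the egress router


def relative_cost(i: SiteMetrics, j: SiteMetrics, w: float) -> float:
    """Weighted cost of Site-i relative to Site-j, per the formula above."""
    if not 0 <= w <= 1:
        raise ValueError("w must be between 0 and 1")
    service_term = (i.serv_d * j.cp) / (j.serv_d * i.cp)
    network_term = (j.pref * i.net_d) / (i.pref * j.net_d)
    return w * service_term + (1 - w) * network_term
```

   Under this reading, two sites with identical metrics give a value of
   1, and a value below 1 suggests that Site-i is the cheaper of the
   two for the chosen weight.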

   When a set of service Metadata is converted to a simple metric, a
   decision process is determined by the metric semantics and deployment
   situations.  The goal is to integrate the conventional network
   decision process with the service Metadata into a unified decision-
   making process for path selection.

5.3.  Integrating with BGP decision process

   When an ingress router receives BGP updates for the same IP address
   from multiple egress routers, all those egress routers are considered
   as the next hops for the IP address.  For the selected services
   configured to be influenced by the edge service Metadata, the ingress
   router's BGP Decision Process [IDR-CUSTOM-DECISION] would trigger the
   edge service Management function to compute the weight to be applied
   to the route's next hop in the forwarding plane.  The decision
   process is influenced by the edge service Metadata associated with
   the client routes, such as the Capacity Availability Index, Site
   Preference, and Service Delay Prediction Index, in addition to the
   attributes used by the traditional BGP multipath computation
   algorithm, such as Weight, Local Preference, Origin, and MED, as
   shown below:

                        BGP ANYCAST Update
      +--------+ with Metadata    +---------------+
      | BGP    |----------------->| EdgeServiceMgn|
      |Decision|< - - - - - - - - |               |
      +---^-|--+                  +-------|-------+
          | | BGP ANYCAST                 | Update Anycast
          | | Route                       | Route Nexthops
          | | Multi-path NH install       | with weight
      +---|-V--+                          |
      |   RIB  |                          |
      +----+---+                          |
           |                              |
       +---V------------------------------V-------+
       |               Forwarding Plane           |
       |                                          |
       +------------------------------------------+

                   Figure 9: Metadata Influenced Decision

   When any of those metadata values goes to 0, the effect is the same
   as the routes becoming ineligible via the egress router that
   originated the metadata UPDATE.  When any of those metadata values
   merely degrades, there is still a possibility, albeit a smaller one,
   that the egress router remains the optimal next hop.

   Suppose the destination address aa08::4450 can be reached via three
   next hops (R1, R2, R3).  Further, suppose the local BGP Decision
   Process, based on the traditional network layer policies and metrics,
   identifies R1 as the optimal next hop for this destination
   (aa08::4450).  If the edge service Metadata results in R2 being the
   optimal next hop for the prefix, the Forwarding Plane will have R2 as
   the next hop for the destination address aa08::4450.
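
   The R1/R2/R3 example can be sketched as follows.  The function name
   and the per-next-hop cost values are hypothetical.  The sketch
   assumes that the edge service Management function has already
   reduced the Metadata to one cost per next hop, where lower is
   better.

```python
def forwarding_next_hop(bgp_best, metadata_cost, is_edge_service):
    """Pick the next hop to install in the forwarding plane for one prefix."""
    if not is_edge_service or not metadata_cost:
        # Ordinary routes keep the result of the BGP Decision Process.
        return bgp_best
    # Edge service routes use the next hop with the lowest Metadata cost.
    return min(metadata_cost, key=metadata_cost.get)


# Illustrative values only: BGP prefers R1, the Metadata favors R2.
costs = {"R1": 1.2, "R2": 0.6, "R3": 0.9}
chosen = forwarding_next_hop("R1", costs, is_edge_service=True)
```

   Only prefixes identified as edge services take the Metadata branch,
   so other routes that share R1 as their next hop are unaffected.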

   The edge service Metadata influencing next hop selection is different
   from the metric (or weight) to the next hop.  The metric to a next
   hop can impact many (sometimes, tens of thousands) routes that have
   the node as their next hop, whereas the edge service Metadata only
   impacts the optimal next hop selection for the subset of client
   routes that are identified as edge services.

   When the BGP custom decision [IDR-CUSTOM-DECISION] is used, the edge
   service Management function would have an algorithm to combine the
   edge service Metadata attributes with the custom decision to derive
   the optimal next hop for the edge service routes.

   Note: For a BGP UPDATE message that includes the edge service
   Metadata Path Attribute with the RouteFlag-I=0 and the egress
   router's loopback prefix as the NLRI, the Site Capacity Availability
   Index value is applied to all the routes associated with the Site-ID.

6.  Service Metadata Propagation Scope

   The propagation scope of the Metadata Path Attribute needs careful
   consideration to ensure it does not inadvertently leak to other BGP
   domains.  According to Section 3 of [ATTRIBUTE-ESCAPE], it is
   necessary for the Route Reflector (RR) to be upgraded to constrain
   the propagation scope when propagating the metadata path attributes.
   Therefore, the Metadata Path Attribute originator sets the attribute
   as Non-transitive when sending the BGP UPDATE message to its
   corresponding RR.  Non-transitive attributes are only guaranteed to
   be dropped during BGP route propagation by implementations that do
   not recognize them, ensuring that the metadata path attributes do not
   propagate beyond the intended scope.

   The RR can append the NO-ADVERTISE well-known community to the BGP
   UPDATE message with the Metadata Path Attribute when forwarding it to
   the ingress routers.  This signals to the ingress nodes that the
   associated route's Metadata Path Attribute should not be further
   advertised beyond their scope.  This precautionary measure ensures
   that the receiver of the BGP UPDATE message refrains from forwarding
   the received update to its peers, preventing the undesired
   propagation of the information carried by the Metadata Path
   Attribute.

   In addition, the Service Metadata propagation can be further
   constrained to a set of ingress nodes that are specifically
   interested in the services.  For example, for each registered low-
   latency Service, BGP RT Constrained Distribution [RFC4684] can be
   used to form a Group interested in the Service.  The "Service ID", an
   IP address prefix, is the Route Target.  When an ingress router
   receives the first packet of a flow destined to a Service ID (i.e.,
   IP prefix), the ingress router sends a BGP UPDATE that advertises the
   Route Target membership NLRI per [RFC4684].  The ingress router must
   assign a Timer for the Service ID, as the UE that uses the Service ID
   might move away.  Upon receiving a packet destined for the Service
   ID, the ingress router must refresh the Timer.  The ingress router
   must send a BGP Withdraw UPDATE for the Service ID upon expiration of
   the Timer.
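
   The Timer handling described above can be sketched as follows.  The
   class name, method names, and timeout value are hypothetical, since
   this document does not specify a Timer duration.  Sending the
   actual BGP messages is left to the caller.

```python
class ServiceInterestTable:
    """Track which Service IDs an ingress router is currently interested in."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.expiry = {}  # Service ID prefix -> time at which the Timer expires

    def on_packet(self, service_id, now):
        """Refresh the Timer; return True if this is the first packet seen."""
        first = service_id not in self.expiry
        self.expiry[service_id] = now + self.timeout
        return first  # True: advertise the Route Target membership NLRI

    def expire(self, now):
        """Remove timed-out Service IDs and return them for withdrawal."""
        expired = [sid for sid, t in self.expiry.items() if t <= now]
        for sid in expired:
            del self.expiry[sid]
        return expired
```

   A caller would advertise membership when on_packet returns True and
   send a BGP Withdraw UPDATE for each Service ID returned by expire.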

   [RFC4684] specifies SAFI=132 for the Route Target membership NLRI
   Advertisements.

7.  Minimum Interval for Metrics Change Advertisement

   Route Churn Considerations

   While the mechanism detailed in this document aims to provide dynamic
   metrics like Capacity Availability Index, Site Delay Prediction
   Index, Service Delay Prediction Index, and Raw Measurement to
   optimize path selection, it is essential to consider the broader
   implications of metric-induced churn.  Particularly, in the context
   of routes used for BGP nexthop resolution (e.g., labeled unicast),
   frequent changes in these metrics can lead to significant churn not
   only for the prefixes carrying the data but also for dependent
   routes.

   This behavior is analogous to the impacts observed with RSVP auto-
   bandwidth, which can introduce considerable instability within a
   network.  Such route churn can propagate through the network, causing
   a cascade of updates and potential route flaps, thereby affecting
   overall network stability and performance.

   To mitigate these effects, network operators should carefully manage
   the advertisement intervals of these dynamic metrics, ensuring they
   are set to avoid unnecessary churn.  The default minimum interval for
   metrics change advertisement, set at 30 seconds, is designed to
   balance responsiveness with stability.  However, in scenarios with
   higher sensitivity to route stability, operators may consider
   increasing this interval further to reduce the frequency of updates.
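
   A minimum advertisement interval can be enforced with a small gate
   such as the following sketch.  The class and method names are
   hypothetical.  The 30-second default is the value stated above.  A
   suppressed change is not queued in this sketch; the caller is
   assumed to offer the latest value again later.

```python
class MetricAdvertiser:
    """Suppress metric advertisements that arrive too soon after the last one."""

    def __init__(self, min_interval=30.0):
        self.min_interval = min_interval
        self.last_sent_at = None
        self.last_value = None

    def should_advertise(self, value, now):
        """Return True if a changed value may be advertised at time `now`."""
        if value == self.last_value:
            return False  # nothing changed, so there is nothing to send
        if (self.last_sent_at is not None
                and now - self.last_sent_at < self.min_interval):
            return False  # still inside the minimum interval
        self.last_value = value
        self.last_sent_at = now
        return True
```

   Operators who need more stability could construct the gate with a
   larger min_interval.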

   Furthermore, operators should implement robust route damping and
   filtering policies to control the propagation of changes and minimize
   the impact on dependent routes.  By acknowledging and planning for
   these broader impacts, the mechanism can be deployed more
   effectively, ensuring optimal performance without compromising
   network stability.

   Significant load changes at EC data centers can be triggered by
   short-term gatherings of UEs, like conventions, lasting a few hours
   or days.  Therefore, the load metrics change rate can be in the
   magnitude of hours or days.

8.  Validation and Error Handling

   In addition to the Error Handling procedure described in [RFC7606], a
   BGP speaker should ignore the Metadata Path Attribute if more than
   one Metadata Path Attribute is within one BGP Update message.

   The Metadata Path Attribute contains a sequence of Sub-TLVs.  The
   Metadata Path Attribute's length minus 1 determines the total number
   of octets for all the Sub-TLVs under the Metadata Path Attribute.
   The sum of the lengths from all the Sub-TLVs under the Metadata Path
   Attribute plus 1 should equal the length of the Metadata Path
   Attribute.  If this is not the case, the TLV should be considered
   malformed, and the "Treat-as-withdraw" procedure of [RFC7606] is
   applied.

   When more than one Sub-TLV is present in a Metadata Path Attribute,
   they are processed independently.  Suppose a Metadata Path Attribute
   can be parsed correctly but contains a Sub-TLV whose type is not
   recognized by a particular BGP speaker; that BGP speaker MUST NOT
   consider the attribute malformed.  Instead, it MUST interpret the
   attribute as if that Sub-TLV had not been present.  Logging the error
   locally or to a management system is optional.  If the route carrying
   the Metadata Path Attribute is propagated with the attribute, the
   unrecognized Sub-TLV remains in the attribute.
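
   The length check and the handling of unrecognized Sub-TLVs can be
   sketched as follows.  The sketch assumes that each top-level Sub-TLV
   starts with a 16-bit Sub-Type and an 8-bit Length, as drawn in
   Figures 4, 7, and 8, and that the caller passes only the octets that
   hold the Sub-TLVs.  The function name is hypothetical.

```python
import struct


def parse_sub_tlvs(region, known_types):
    """Return recognized (sub_type, value) pairs, or None if malformed."""
    recognized = []
    offset = 0
    while offset < len(region):
        if offset + 3 > len(region):
            return None  # truncated Sub-TLV header: treat as malformed
        sub_type, length = struct.unpack_from("!HB", region, offset)
        end = offset + 3 + length
        if end > len(region):
            return None  # Sub-TLV lengths do not add up: treat as malformed
        if sub_type in known_types:
            recognized.append((sub_type, region[offset + 3:end]))
        # An unrecognized Sub-TLV is skipped as if it were not present.
        offset = end
    return recognized
```

   A None result corresponds to the malformed case, for which the
   "Treat-as-withdraw" procedure of [RFC7606] applies.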

9.  Manageability Considerations

   The edge service Metadata described in this document is intended only
   for propagation between ingress and egress routers of a single BGP
   domain, i.e., the 5G Local Data Network (LDN), which is a limited
   domain with edge services a few hops away from the ingress nodes.
   Only the services selected by UEs are considered 5G edge services.
   The 5G LDN is usually managed by one operator, even though the
   routers can be from different vendors.

10.  Security Considerations

   The proposed edge service Metadata are advertised within the trusted
   domain of 5G LDN's ingress and egress routers.  The ingress routers
   should not propagate the edge service Metadata to any nodes that are
   not within the trusted domain.

   To prevent the BGP UPDATE receivers (a.k.a. ingress routers in this
   document) from leaking the Metadata Path Attribute by accident to
   nodes outside the trusted domain [ATTRIBUTE-ESCAPE], the following
   practice should be enforced:

   -  The Metadata Path Attribute originator sets the attribute as Non-
      transitive when sending the BGP UPDATE message to its
      corresponding RR.  According to [RFC4271], Non-transitive Path
      Attributes are only guaranteed to be dropped during BGP route
      propagation by implementations that do not recognize them.

   -  The RR (Route Reflector) can append the NO-ADVERTISE well-known
      community to the BGP UPDATE message with Metadata Path Attribute
      when forwarding to the ingress routers.  By doing so, the Route
      Reflector signals to ingress nodes that the associated route's
      Metadata Path Attribute should not be further advertised beyond
      their scope.  This precautionary measure ensures that the receiver
      of the BGP UPDATE message refrains from forwarding the received
      update to its peers, preventing the undesired propagation of the
      information carried by the Metadata Path Attribute.
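
   As a minimal sketch of the two practices above, the following Python
   fragment encodes a path attribute with the Optional bit set and the
   Transitive bit clear, and builds a COMMUNITIES attribute carrying
   the well-known community that RFC 1997 names NO_ADVERTISE
   (0xFFFFFF02).  The flag layout follows [RFC4271].  The type code 255
   is only a placeholder for TBD1, and the function names are
   illustrative assumptions, not part of this specification.

```python
import struct
from typing import List

# Attribute flag bits from RFC 4271, Section 4.3.
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_PARTIAL = 0x20
FLAG_EXTENDED_LENGTH = 0x10

# COMMUNITIES attribute type code and well-known value from RFC 1997.
COMMUNITIES_ATTR_TYPE = 8
NO_ADVERTISE = 0xFFFFFF02

# Placeholder only: the real code point is TBD1 until IANA assigns it.
METADATA_ATTR_TYPE_PLACEHOLDER = 255


def encode_path_attribute(flags: int, type_code: int, value: bytes) -> bytes:
    """Encode one path attribute, choosing the 1- or 2-octet length form."""
    if len(value) > 0xFFFF:
        raise ValueError("attribute value too long")
    if len(value) > 255:
        flags |= FLAG_EXTENDED_LENGTH
        return struct.pack("!BBH", flags, type_code, len(value)) + value
    flags &= ~FLAG_EXTENDED_LENGTH & 0xFF
    return struct.pack("!BBB", flags, type_code, len(value)) + value


def metadata_attribute(value: bytes) -> bytes:
    """Originator side: Optional bit set, Transitive bit clear."""
    return encode_path_attribute(
        FLAG_OPTIONAL, METADATA_ATTR_TYPE_PLACEHOLDER, value
    )


def add_no_advertise(communities: List[int]) -> List[int]:
    """RR side: return a copy of the list that includes NO_ADVERTISE."""
    if NO_ADVERTISE in communities:
        return list(communities)
    return list(communities) + [NO_ADVERTISE]


def communities_attribute(communities: List[int]) -> bytes:
    """Encode COMMUNITIES (optional transitive) as 4-octet values."""
    value = b"".join(struct.pack("!I", c) for c in communities)
    return encode_path_attribute(
        FLAG_OPTIONAL | FLAG_TRANSITIVE, COMMUNITIES_ATTR_TYPE, value
    )


def is_transitive(attribute: bytes) -> bool:
    """Check the Transitive bit in the first (flags) octet."""
    return bool(attribute[0] & FLAG_TRANSITIVE)
```

   The encoding alone does not enforce anything: a receiver that
   recognizes the attribute still has to honor the Non-transitive flag
   and the community in its own export logic.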

   BGP Route Filtering or BGP Route Policies [RFC5291] can also be used
   to ensure that BGP UPDATE messages with the Metadata Path Attribute
   attached are not forwarded out of the administrative domain.  Route
   filtering allows network administrators to control the advertisement
   and acceptance of BGP routes, ensuring that specific routes do not
   leak outside the intended administrative domain.  The steps are as
   follows:

   -  Use Route Filtering: Implement route filtering policies on the
      ingress routers to restrict the propagation of BGP UPDATE messages
      for the registered 5G edge services beyond the administrative
      domain.  Access control lists (ACLs), prefix lists, or route maps
      can be used to filter the BGP routes classified as 5G edge
      services, i.e., the routes whose Metadata Path Attributes need to
      be distributed from egress routers to ingress routers.

   -  Filter by Prefix: Use prefix filtering to specify which IP
      prefixes should be advertised to peers and which should be
      suppressed.  This step ensures that only authorized routes are
      sent to external peers.

   -  Use Route Maps: Route maps provide a flexible way to filter and
      manipulate BGP route advertisements.  Route maps can be created
      to match specific conditions and then applied to the BGP
      configuration.
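
   The filtering steps above could be expressed as an export policy
   along the following lines.  This is a sketch under stated
   assumptions: the Route and Peer types, the edge service prefix
   list, and the placeholder type code 255 (standing in for TBD1) are
   hypothetical, and a real deployment would use the router's own
   policy language.

```python
from dataclasses import dataclass, field
from ipaddress import ip_network
from typing import Dict, List, Optional

# Placeholder only: the real code point is TBD1 until IANA assigns it.
METADATA_ATTR_TYPE_PLACEHOLDER = 255


@dataclass
class Route:
    """Hypothetical route: a prefix plus path attributes keyed by type."""
    prefix: str
    attributes: Dict[int, bytes] = field(default_factory=dict)


@dataclass
class Peer:
    """Hypothetical peer; internal means inside the 5G LDN domain."""
    name: str
    internal: bool


def is_edge_service_prefix(prefix: str, edge_prefixes: List[str]) -> bool:
    """Prefix filter: True if prefix falls within a registered range."""
    net = ip_network(prefix)
    for candidate in map(ip_network, edge_prefixes):
        # subnet_of() raises TypeError across IP versions, so check first.
        if net.version == candidate.version and net.subnet_of(candidate):
            return True
    return False


def export_policy(
    route: Route, peer: Peer, edge_prefixes: List[str]
) -> Optional[Route]:
    """Return the route to advertise to peer, or None to suppress it."""
    if peer.internal:
        return route
    # Toward external peers, suppress registered edge service prefixes
    # and any route that still carries the Metadata Path Attribute.
    if is_edge_service_prefix(route.prefix, edge_prefixes):
        return None
    if METADATA_ATTR_TYPE_PLACEHOLDER in route.attributes:
        return None
    return route
```

   Suppressing the whole route, rather than stripping the attribute, is
   a design choice made here for simplicity; either approach keeps the
   Metadata inside the administrative domain.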

11.  IANA Considerations

11.1.  Metadata Path Attribute

   IANA is requested to assign a new path attribute from the "BGP Path
   Attributes" registry.  The symbolic name of the attribute is
   "Metadata", and the reference is [this document].

      +=======+======================================+=================+
      | Value |             Description              |    Reference    |
      +=======+======================================+=================+
      |  TBD1 |      Metadata Path Attribute         | [this document] |
      +-------+--------------------------------------+-----------------+

11.2.  Metadata Path Attribute Sub-Types

   IANA is requested to create a new sub-registry under the Metadata
   Path Attribute registry as follows:

   Name:  Sub-TLVs under the "Metadata Path Attribute"

   Registration Procedure:  Expert Review [RFC8126].

      Detailed Expert Review procedure will be added per [RFC8126].

   Reference:  [this document]

   +========+==========================+=================+
   |Sub-Type|   Description            | Reference       |
   +========+==========================+=================+
   |      0 | reserved                 | [this document] |
   +--------+--------------------------+-----------------+
   |      1 | Site Preference Index    | [this document] |
   +--------+--------------------------+-----------------+
   |      2 | Site Availability Index  | [this document] |
   +--------+--------------------------+-----------------+
   |      3 | Service Delay Prediction | [this document] |
   +--------+--------------------------+-----------------+
   |      4 | Raw Load Measurement     | [this document] |
   +--------+--------------------------+-----------------+
   |      5 | Service-Oriented         |                 |
   |        | Capability               | [this document] |
   +--------+--------------------------+-----------------+
   |      6 | Service-Oriented         |                 |
   |        | Utilization              | [this document] |
   +--------+--------------------------+-----------------+
   |  7-254 | unassigned               | [this document] |
   +--------+--------------------------+-----------------+
   |    255 | reserved                 | [this document] |
   +--------+--------------------------+-----------------+
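
   For illustration, an implementation might mirror the registry above
   with a small lookup such as the following Python sketch.  The
   identifier names are assumptions; only the numeric values and their
   reserved, assigned, or unassigned status come from the table.

```python
from enum import IntEnum


class MetadataSubType(IntEnum):
    """Assigned Sub-Type values from the registry table."""
    SITE_PREFERENCE_INDEX = 1
    SITE_AVAILABILITY_INDEX = 2
    SERVICE_DELAY_PREDICTION = 3
    RAW_LOAD_MEASUREMENT = 4
    SERVICE_ORIENTED_CAPABILITY = 5
    SERVICE_ORIENTED_UTILIZATION = 6


def classify_sub_type(value: int) -> str:
    """Return the assigned name, 'reserved', or 'unassigned'."""
    if not 0 <= value <= 255:
        raise ValueError("Sub-Type must be in the range 0-255")
    if value in (0, 255):
        return "reserved"
    try:
        return MetadataSubType(value).name
    except ValueError:
        # Values 7-254 are unassigned in the table.
        return "unassigned"
```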

12.  Contributors

   Changwang Lin

   New H3C Technologies

   China

   Email: linchangwang.04414@h3c.com

13.  Acknowledgements

   Acknowledgements to Jeff Haas, Tom Petch, Adrian Farrel, Alvaro
   Retana, Robert Raszuk, Sue Hares, Shunwan Zhuang, Donald Eastlake,
   Dhruv Dhody, Cheng Li, DongYu Yuan, and Vincent Shi for their
   suggestions and contributions.

14.  References

14.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006,
              <https://www.rfc-editor.org/info/rfc4271>.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, DOI 10.17487/RFC4360,
              February 2006, <https://www.rfc-editor.org/info/rfc4360>.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760,
              DOI 10.17487/RFC4760, January 2007,
              <https://www.rfc-editor.org/info/rfc4760>.

   [RFC4786]  Abley, J. and K. Lindqvist, "Operation of Anycast
              Services", BCP 126, RFC 4786, DOI 10.17487/RFC4786,
              December 2006, <https://www.rfc-editor.org/info/rfc4786>.

   [RFC5291]  Chen, E. and Y. Rekhter, "Outbound Route Filtering
              Capability for BGP-4", RFC 5291, DOI 10.17487/RFC5291,
              August 2008, <https://www.rfc-editor.org/info/rfc5291>.

   [RFC5905]  Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
              "Network Time Protocol Version 4: Protocol and Algorithms
              Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010,
              <https://www.rfc-editor.org/info/rfc5905>.

   [RFC7606]  Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
              Patel, "Revised Error Handling for BGP UPDATE Messages",
              RFC 7606, DOI 10.17487/RFC7606, August 2015,
              <https://www.rfc-editor.org/info/rfc7606>.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017,
              <https://www.rfc-editor.org/info/rfc8126>.

   [RFC8277]  Rosen, E., "Using BGP to Bind MPLS Labels to Address
              Prefixes", RFC 8277, DOI 10.17487/RFC8277, October 2017,
              <https://www.rfc-editor.org/info/rfc8277>.

   [RFC9012]  Patel, K., Van de Velde, G., Sangli, S., and J. Scudder,
              "The BGP Tunnel Encapsulation Attribute", RFC 9012,
              DOI 10.17487/RFC9012, April 2021,
              <https://www.rfc-editor.org/info/rfc9012>.

14.2.  Informative References

   [ATTRIBUTE-ESCAPE]
              J. Haas, "BGP Attribute Escape", July 2023,
              <https://datatracker.ietf.org/doc/draft-haas-idr-bgp-
              attribute-escape/>.

   [IANA-BGP-PARAMS]
              IANA, "BGP Path Attributes",
              <https://www.iana.org/assignments/bgp-parameters/>.

   [IDR-CUSTOM-DECISION]
              A. Retana, R. White, "BGP Custom Decision Process", August
              2017, <https://datatracker.ietf.org/doc/draft-ietf-idr-
              custom-decision/>.

   [RFC2042]  Manning, B., "Registering New BGP Attribute Types",
              RFC 2042, DOI 10.17487/RFC2042, January 1997,
              <https://www.rfc-editor.org/info/rfc2042>.

   [RFC4684]  Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk,
              R., Patel, K., and J. Guichard, "Constrained Route
              Distribution for Border Gateway Protocol/MultiProtocol
              Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual
              Private Networks (VPNs)", RFC 4684, DOI 10.17487/RFC4684,
              November 2006, <https://www.rfc-editor.org/info/rfc4684>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC8799]  Carpenter, B. and B. Liu, "Limited Domains and Internet
              Protocols", RFC 8799, DOI 10.17487/RFC8799, July 2020,
              <https://www.rfc-editor.org/info/rfc8799>.

   [TS.23.501-3GPP]
              3rd Generation Partnership Project (3GPP), "System
              Architecture for 5G System; Stage 2, 3GPP TS 23.501
              v2.0.1", December 2017.

Authors' Addresses

   Linda Dunbar
   Futurewei
   Dallas, TX
   United States of America
   Email: ldunbar@futurewei.com

   Kausik Majumdar
   Microsoft Azure
   California
   United States of America
   Email: kausikm.ietf@gmail.com

   Cheng Li
   Huawei Technologies
   Beijing
   China
   Email: c.l@huawei.com

   Gyan Mishra
   Verizon
   United States of America
   Email: gyan.s.mishra@verizon.com

   Zongpeng Du
   China Mobile
   Beijing
   China
   Email: duzongpeng@chinamobile.com
