Adaptive Routing Framework
draft-cheng-rtgwg-adaptive-routing-framework-03

Document Type Active Internet-Draft (individual)
Authors Weiqiang Cheng, Changwang Lin, Kevin Wang, Jiaming Ye, Rui Zhuang, PengFei Huo
Last updated 2024-10-20
Network Working Group                                          W. Cheng
Internet Draft                                             China Mobile
Intended status: Informational                                   C. Lin
Expires: April 28, 2025                            New H3C Technologies
                                                                K. Wang
                                                       Juniper Networks
                                                                  J. Ye
                                                              R. Zhuang
                                                           China Mobile
                                                                 P. Huo
                                                              ByteDance
                                                       October 20, 2024

                        Adaptive Routing Framework
              draft-cheng-rtgwg-adaptive-routing-framework-03

Abstract

   In many cases, ECMP (Equal-Cost Multi-Path) flow-based hashing leads
   to high congestion and variable flow completion time, which reduces
   application performance. Load balancing based on local link quality
   is not always optimal. A global view of congestion, with information
   from remote links, is needed for optimal balancing. Adaptive routing
   is a technology that makes dynamic routing decisions based on changes
   in traffic load and network topology.

   This document describes a framework for Adaptive Routing.
   Specifically, it identifies a set of adaptive routing components,
   explains their interactions, and exemplifies the workflow mechanism.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 28, 2025.

Cheng, et al.          Expires April 28, 2025                 [Page 1]
Internet-Draft        Adaptive Routing Framework          October 2024

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Revised BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Revised BSD License.

Table of Contents

   1. Introduction
      1.1. Requirements Language
   2. Problem Analysis
      2.1. Use Case 1
      2.2. Use Case 2
   3. Solution
      3.1. Flow-based solution
         3.1.1. Weight-Based Dynamic ECMP Flow Mode
         3.1.2. Flow Redirection Mode
      3.2. Packet-based solution
   4. Framework
      4.1. Framework Overview
      4.2. Remote Path Info
      4.3. Routing Plane
      4.4. Forwarding Plane
      4.5. Adaptive Routing Mode
         4.5.1. Flow-Based Adjustment Mode
         4.5.2. Packet-Based Adjustment Mode
      4.6. Congestion Detection
      4.7. Congestion definition
      4.8. Congestion Notify
   5. Work Flow
      5.1. Weight-Based Dynamic ECMP Flow Adjustment Mode
      5.2. Flow Redirect Mode
      5.3. Packet-Based Adjustment Mode
   6. Security Considerations
   7. IANA Considerations
   8. References
      8.1. Normative References
      8.2. Informative References
   9. Acknowledgments
   Authors' Addresses

1. Introduction

   In many cases, ECMP (Equal-Cost Multi-Path) flow-based hashing leads
   to high congestion and variable flow completion time, which reduces
   application performance. Load balancing based on local link quality
   is not always optimal. A global view of congestion, with information
   from remote links, is needed for optimal balancing.

   Adaptive routing is a network routing mechanism that dynamically
   adjusts routing paths based on changes in network conditions,
   thereby optimizing network performance and resource utilization.

   This document defines a framework for Adaptive Routing.
   Specifically, it identifies adaptive routing components, explains
   their interactions, and illustrates the workflow mechanism. It
   focuses exclusively on dynamic load balancing for existing loop-free
   multiple paths, allowing adjustments based on remote link quality.
   The formation of loop-free paths is outside the scope of this
   document.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2. Problem Analysis

   Current AI networks carry a small number of flows, each with a heavy
   load. The commonly used load balancing strategy applies an N-tuple
   hash algorithm to forward traffic on a per-flow basis. In such
   networks, this strategy can easily lead to load imbalance and, in
   turn, network congestion.
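
   The imbalance can be illustrated with a minimal sketch. It assumes
   a SHA-256-based hash over the five-tuple and a hypothetical workload
   of four 100G flows on four equal-cost links; the function names and
   addresses are illustrative and not part of this framework. Because
   the hash ignores flow size and current link load, nothing prevents
   several heavy flows from landing on the same link.

```python
import hashlib


def ecmp_pick(flow, num_links):
    """Static N-tuple hashing: a given flow always maps to the same link."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_links


def link_loads(flows, num_links):
    """Sum the offered load (Gbps) that hashing places on each link."""
    loads = {link: 0 for link in range(num_links)}
    for flow, gbps in flows:
        loads[ecmp_pick(flow, num_links)] += gbps
    return loads


# Hypothetical workload: four 100G flows over four equal-cost links.
flows = [
    (("10.0.0.%d" % i, "10.0.1.1", 4791 + i, 4791, "UDP"), 100)
    for i in range(4)
]
loads = link_loads(flows, num_links=4)
for link, gbps in sorted(loads.items()):
    print("link", link, "carries", gbps, "Gbps")
```

   With many small flows the per-link sums tend to even out; with a
   few heavy flows a single collision already doubles the load on one
   link while another stays idle.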

   When network congestion occurs, the current load balancing
   adjustment strategy typically has the devices adjacent to the
   congestion point switch links based on the local link congestion
   state. This approach is inefficient because adjustments made by
   devices near the congestion point have limited impact. If load
   balancing adjustments were instead initiated at the devices closest
   to the traffic source, load balancing would be significantly more
   efficient.

2.1. Use Case 1

            +--+        +--+
     Spine  |R1|        |R2|
            +--+        +--+
             | \        / |
             |   \    /   |
             |     \/     |
             |     /\     X <- congested
             |   /    \   |
             | /        \ |
            +--+        +--+
     Leaf   |R3|        |R4|
            +--+        +--+
             ^            |
             |            v
            Source      Destination

              Figure 1 Spine-Leaf network

   In the Spine-Leaf network shown in Figure 1, assume that the R2-R4
   link becomes congested. R3 will continue to send traffic to both R1
   and R2. Continuing to forward traffic through R2 at the current rate
   will exacerbate the link congestion and lead to the loss of some
   traffic.

2.2. Use Case 2

        Source
          |
          v
     +---------+
     |         |
     | Group 1 |-------------+
     |         |             |
     +---------+             |
          |             +---------+
          |             |         |
          X<- congested | Group 3 |
          |             |         |
          |             +---------+
     +---------+             |
     |         |             |
     | Group 2 |-------------+
     |         |
     +---------+
          |
          v
     Destination

              Figure 2 Dragon-fly network

   In the dragon-fly network shown in Figure 2, the ECMP paths include
   Group1->Group2 and Group1->Group3->Group2 for load balancing. When
   the link between Group1 and Group2 becomes congested, Group1
   continues to send traffic at the current rate through the
   Group1->Group2 link, exacerbating the congestion and causing the
   loss of some traffic.

3. Solution

   To address the problem of load imbalance mentioned above, solutions
   can be classified into two types: flow-based adjustments and packet-
   based adjustments. In flow-based adjustment, each flow is forwarded
   along a single path, and the dynamic adjustment is in the load
   distribution across multiple paths. In packet-based adjustment,
   packets can be forwarded across multiple paths on a per-packet
   basis, and if ordering is required, the receiving end must handle
   the reordering.

3.1. Flow-based solution

   Flow-based load balancing adjustments can be further categorized
   into weight-based dynamic ECMP flow mode and flow redirection mode.
   The weight-based dynamic ECMP flow mode adjusts the forwarding ratio
   across multiple paths in real time to prevent further congestion on
   remote paths. In contrast, the flow redirection mode reroutes the
   flows that are actually congested from the congested link to other
   links, which suits scenarios with relatively stable flows.

3.1.1. Weight-Based Dynamic ECMP Flow Mode

   When congestion occurs and nearby devices detect it, congestion
   information is sent to remote devices. The remote devices then
   dynamically adjust the forwarding weights of the ECMP paths for this
   link according to the congestion status. This reduces traffic
   through the congested link and alleviates the load. Using a weighted
   load balancing strategy instead of a hash-based strategy can more
   effectively utilize the bandwidth resources of multiple links. By
   assigning forwarding weights based on the status of each link, the
   load can be more evenly balanced.

   The disadvantage of the weight-based dynamic ECMP flow mode is that
   it affects only new flows; existing flows are not adjusted.

   Examples of weight-based dynamic ECMP flow mode are as follows:

   Example 1:

   In Figure 1, when R2 detects congestion on the R2->R4 link, R2 sends
   the congestion information to R3 via the control plane. R3 then
   dynamically adjusts the forwarding weights of the ECMP paths based
   on the congestion status, reducing the forwarding weight for the
   congested link. This decreases the traffic on that link and
   alleviates its load. Once the congestion is cleared, R2 sends a
   congestion clearance message to R3 via the control plane, and R3
   restores the original forwarding weight for that link.

   Example 2:

   In Figure 2, when the egress router in Group 1 detects inter-group
   link congestion, it sends a congestion message to the ingress router
   via the control plane. The ingress router dynamically adjusts the
   forwarding weights of the ECMP paths based on the congestion status,
   reducing the traffic through the Group1->Group2 link to alleviate
   the load on the congested link. Once the congestion is cleared, the
   egress router in Group 1 notifies the ingress router in Group 1 of
   the congestion-clearance message, and the ingress router restores
   the ECMP link weights.
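
   The behavior in the examples above can be sketched as follows. This
   is a minimal illustration, not a definitive implementation: the
   class WeightedEcmp, the hash_unit helper, the use of floating-point
   weights, and the pinning of existing flows in a flow table are all
   assumptions made for the sketch. New flows are placed in proportion
   to the current weights, while flows that were already placed keep
   their path, which reflects the limitation noted above.

```python
import hashlib


def hash_unit(flow):
    """Map a flow identifier to a stable value in [0, 1)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


class WeightedEcmp:
    """Weighted path choice for new flows; existing flows stay pinned."""

    def __init__(self, next_hops, initial_weight=1.0):
        self.initial = {nh: initial_weight for nh in next_hops}
        self.weights = dict(self.initial)
        self.flow_table = {}  # flow -> next hop already chosen

    def on_congestion(self, next_hop, reduced_weight):
        """Lower the weight of the next hop leading to the congested link."""
        self.weights[next_hop] = reduced_weight

    def on_clear(self, next_hop):
        """Restore the original weight once congestion is cleared."""
        self.weights[next_hop] = self.initial[next_hop]

    def forward(self, flow):
        if flow in self.flow_table:
            return self.flow_table[flow]
        total = sum(self.weights.values())
        if total <= 0:
            raise RuntimeError("no usable next hop")
        point = hash_unit(flow) * total
        cumulative = 0.0
        chosen = None
        for next_hop, weight in self.weights.items():
            if weight <= 0:
                continue  # a zero-weight path receives no new flows
            chosen = next_hop
            cumulative += weight
            if point < cumulative:
                break
        self.flow_table[flow] = chosen
        return chosen


ecmp = WeightedEcmp(["R1", "R2"])
old_flow = ("10.0.0.1", "10.0.1.1", 1000, 4791, "UDP")
pinned = ecmp.forward(old_flow)
ecmp.on_congestion("R2", 0.0)  # extreme case: no new flows via R2
new_paths = {
    ecmp.forward(("10.0.0.2", "10.0.1.1", port, 4791, "UDP"))
    for port in range(2000, 2050)
}
```

   In practice the reduced weight would be derived from the reported
   congestion level rather than set to zero.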

3.1.2. Flow Redirection Mode

   When congestion occurs and nearby devices detect congestion in a
   specific flow, they send congestion information to remote devices.
   The remote devices then recompute the load balancing for the
   congested flow and select a less loaded ECMP link for forwarding the
   congested flow.

   The advantage of the flow redirection mode is that it specifically
   adjusts the congested flows, which can quickly alleviate congestion.

   Examples of flow redirection mode are as follows:

   Example 1:

   In Figure 1, when R2 detects congestion in a specific flow S on the
   R2->R4 link, R2 sends congestion information about flow S to R3 via
   the control plane. R3 then redirects flow S by selecting a less
   loaded ECMP path to forward flow S.

   Example 2:

   In Figure 2, when the egress router in Group 1 detects congestion in
   a specific flow S on the inter-group link, it sends a congestion
   message about flow S to the ingress router via the control plane.
   The ingress router then redirects flow S by selecting a less loaded
   link to forward flow S.
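
   A minimal sketch of flow redirection follows. The class
   FlowRedirector, the per-path capacity, and the tracking of a rate
   for each flow are assumptions introduced for illustration; the
   framework itself does not prescribe how load is measured. On a
   congestion message for flow S, the sketch moves S to the least
   loaded alternative path that still has room for it.

```python
class FlowRedirector:
    """Per-flow path table with redirection to the least loaded path."""

    def __init__(self, capacity_gbps, paths):
        self.capacity = capacity_gbps
        self.load = {path: 0.0 for path in paths}
        self.flow_table = {}  # five-tuple -> (path, rate in Gbps)

    def install(self, flow, path, rate_gbps):
        """Record a flow on a path and account for its rate."""
        self.flow_table[flow] = (path, rate_gbps)
        self.load[path] += rate_gbps

    def on_flow_congestion(self, flow):
        """Move the congested flow if another path can absorb it."""
        current, rate = self.flow_table[flow]
        candidates = [
            path for path in self.load
            if path != current and self.load[path] + rate <= self.capacity
        ]
        if not candidates:
            return current  # no path has enough remaining bandwidth
        target = min(candidates, key=lambda path: self.load[path])
        self.load[current] -= rate
        self.load[target] += rate
        self.flow_table[flow] = (target, rate)
        return target


r3 = FlowRedirector(capacity_gbps=400, paths=["R1", "R2"])
flow_s = ("10.0.0.1", "10.0.1.1", 1000, 4791, "UDP")
other = ("10.0.0.2", "10.0.1.1", 1001, 4791, "UDP")
r3.install(flow_s, "R2", 300)
r3.install(other, "R1", 50)
new_path = r3.on_flow_congestion(flow_s)
```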

3.2. Packet-based solution

   The packet-based adjustment method does not select load-balancing
   paths based on flows; instead, it chooses the load-balancing path
   for each individual packet. This approach helps prevent congestion
   caused by imbalanced load distribution from extremely large flows.

   For multiple ECMP links, link load status is monitored based on link
   quality. When selecting the forwarding path for each packet, the
   ECMP link with the lowest load is chosen. After forwarding a packet,
   the recorded load for that link is correspondingly increased.

   When responding to congestion notifications from remote devices, the
   load on the ECMP links is adjusted to influence the path selection
   for subsequent packets. This dynamic adjustment of link load helps
   achieve load balancing. Upon receiving a packet, a device can apply
   per-packet forwarding based on either the packet's markers or the
   currently configured forwarding mode. For each packet, the least
   loaded link is selected, so that the overall load is balanced.

   Example:

   In Figure 1, when R2 detects congestion on the R2->R4 link, it sends
   congestion information to R3. R3 then dynamically adjusts the load
   status of the ECMP paths according to the congestion status,
   increasing the load for the congested link. As a result, subsequent
   packets will choose other links due to the higher load on the
   congested link, thus relieving congestion. Once the congestion is
   cleared, R2 sends a congestion clearance message to R3 via the
   control plane, and R3 decreases the load for that link.
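
   The per-packet behavior described in this section can be sketched
   as follows. The class PerPacketBalancer, the use of bytes as the
   load unit, and the fixed penalty applied on a congestion
   notification are assumptions for illustration only. A real
   implementation would also age the recorded load over time, which
   is omitted here.

```python
class PerPacketBalancer:
    """Send each packet on the path with the lowest recorded load."""

    def __init__(self, paths):
        self.load = {path: 0 for path in paths}

    def send(self, packet_bytes):
        """Pick the least loaded path, then charge the packet to it."""
        path = min(self.load, key=self.load.get)
        self.load[path] += packet_bytes
        return path

    def on_congestion(self, path, penalty):
        """A remote congestion notification raises the recorded load."""
        self.load[path] += penalty

    def on_clear(self, path, penalty):
        """A congestion clearance lowers the recorded load again."""
        self.load[path] = max(0, self.load[path] - penalty)


r3 = PerPacketBalancer(["R1", "R2"])
before = [r3.send(1500) for _ in range(4)]
r3.on_congestion("R2", penalty=15000)
during = [r3.send(1500) for _ in range(5)]
```

   Before the notification the packets alternate between R1 and R2;
   after it, the penalty on R2 steers the following packets to R1
   until the recorded loads converge again.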

4. Framework

4.1. Framework Overview

   A high-level view of the Adaptive Routing framework, without
   expanding the functional entities in the network, is illustrated in
   Figure 3.

     +-------------+
     |Routing Plane|
     +-------------+
             |
             | Remote Path Info
             v
     +----------------+       +-----------------------+
     |Forwarding Plane|<------|Adaptive Routing Policy|
     +----------------+       +-----------------------+
                                        ^
                                        | Congestion Notify
                                        |
                            +----------------------------+
                            |Remote Congestion Detection |
                            +----------------------------+

              Figure 3 Adaptive Routing Framework Overview

   As shown in Figure 3, the following components are defined:

     * Routing Plane: Responsible for the transmission and calculation
       of routes. The calculated routes should include remote path
       information. The routes and remote path info should be correlated
       and updated to the Forwarding Plane. Control plane protocols can
       statically specify remote paths or, through protocol extensions,
       calculate remote path information and update it to the forwarding
       plane.

     * Forwarding Plane: Responsible for path adjustments based on
       Adaptive Routing policies and remote link congestion information,
       implementing adjusted forwarding strategies for traffic. In
       addition to the traditional next-hop in routing, extra
       information is needed to describe remote link details. Beyond the
       direct next-hop, an additional remote next-hop is required to
       convey remote path information.

     * Adaptive Routing Policy: Handles remote link congestion or flow
       information, dynamically adjusting routing and updating the
       Forwarding Plane. There are two adjustment modes: flow-based and
       packet-based. Congestion information and adjustment modes update
       local forwarding table information.

     * Remote Congestion Detection: Detects link congestion and sends
       Congestion Notifications to neighboring devices. It dynamically
       senses congestion information, defines it consistently, and
       promptly informs surrounding devices.

4.2. Remote Path Info

   Currently, the forwarding table contains information about the route
   destination, next hop, and exit interface. Local dynamic load
   balancing can dynamically adjust the weight of load distribution
   based on the link metric of local interfaces, such as interface
   traffic load and queue size.

   Load balancing based on local link quality is not always optimal.
   Global congestion awareness, with information from remote links, is
   needed for optimal balancing. Therefore, the forwarding table needs
   to contain not only local exit interface information but also remote
   path info and remote link congestion information.

   Remote path info can be remote links or remote nodes, specifically
   as follows:

    * For BGP-based networks: Remote path info can be the BGP identifier
     corresponding to the next-next-hop, as described in [I-D.wang-idr-
     next-next-hop-nodes]. It can also be the BGP AS-PATH information
     or BGP router-id, which is not detailed in this document.

    * For IGP-based networks: Remote path info can be the interface
     on the next-hop device that leads to the next-next-hop device,
     identified by, for example, the interface index or the
     interface's local address.

   By using remote path info, routes can be associated with remote
   paths.

4.3. Routing Plane

   When calculating routes, the routing plane needs to learn the path,
   and the path information is attached to the next hop.

   In a BGP-based network, a BGP route may carry the router-id of the
   peer from which that route is received, and the router-id is added
   to the path information when that route is calculated. BGP may need
   extensions to support such a feature. For specific extensions, see
   [I-D.wang-idr-next-next-hop-nodes]; other extensions are possible
   but are not detailed in this document.

   In an IGP-based network, a router may compute the path information
   based on the SPF tree and attach it to the next hop. Path info can
   be a link-local address, interface ID, or Link Local Identifier, or
   other extensions. The detailed mechanisms are out of the scope of
   this document.

4.4. Forwarding Plane

   Taking Figure 1 as an example, the forwarding table on R3 is
   illustrated in Figure 4. Below is an explanation of the table's
   structure.

   For each prefix, the next hop and the weight corresponding to each
   path are recorded. If per-packet load balancing is supported, the
   load on the path is also recorded.

   The next hop for the prefix is constructed from the local next hop
   and remote path information. The forwarding weight and load are
   determined by the quality of the local next-hop interface (local(q))
   and the quality of the remote link in the remote path (remote(q)).
   Additionally, the load varies with the total traffic forwarded to
   this path during per-packet forwarding.

   When responding to local congestion events, the next-hop address in
   the congestion event is used to find the corresponding ECMP entry,
   and the weight and load of this ECMP entry are modified according to
   the congestion level.

   When responding to remote congestion events, the path info in the
   congestion message is used to find the corresponding ECMP entry. The
   link quality of the remote path is updated, and a new weight value
   and new load value are calculated based on the local and remote link
   quality. Then the weight and load of this ECMP entry are modified
   according to the congestion level.

   In per-packet load balancing mode, each time the path with the
   smallest load is selected for forwarding. Each time a packet is
   forwarded, the load for the corresponding path is increased
   accordingly. When a link congestion event is received, the load for
   that path is increased according to the level of congestion.

     +------+       +--------------------------+ local(q)+remote(q)
     |Prefix|---+-->|Next-hop: to R1, Weight w1|<----------------|
     +------+   |   |                 Load   l1|                 |
                |   +--------------------------+                 |
                |           |           +------------+   +--------+
                |           +---------->|Path: R1->R4|-->|Quality1|
                |                       +------------+   +--------+
                |   +--------------------------+ local(q)+remote(q)
                +-->|Next-hop: to R2, Weight w2|<----------------|
                    |                 Load   l2|                 |
                    +--------------------------+                 |
                            |           +------------+   +--------+
                            +---------->|Path: R2->R4|-->|Quality2|
                                        +------------+   +--------+
           Figure 4 Forwarding table for Adaptive Routing
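
   The table in Figure 4 can be modeled with a small data structure.
   This is a sketch under stated assumptions: the names EcmpEntry and
   ForwardingTable are hypothetical, quality is an integer level where
   a higher value means a better link, and the weight is taken as the
   minimum of local(q) and remote(q). The framework only states that
   the weight is determined by both qualities and does not define the
   combining function.

```python
from dataclasses import dataclass


@dataclass
class EcmpEntry:
    next_hop: str        # local next hop, e.g. "R2"
    remote_path: tuple   # remote path info, e.g. ("R2", "R4")
    local_q: int = 15    # quality of the local next-hop interface
    remote_q: int = 15   # quality of the remote link
    load: int = 0        # used only for per-packet load balancing

    @property
    def weight(self):
        # Assumption: the worse of the two links bounds the whole path.
        return min(self.local_q, self.remote_q)


class ForwardingTable:
    def __init__(self):
        self.entries = {}  # prefix -> list of EcmpEntry

    def add(self, prefix, entry):
        self.entries.setdefault(prefix, []).append(entry)

    def on_local_event(self, next_hop, quality):
        """Local congestion events are matched by next-hop address."""
        for paths in self.entries.values():
            for entry in paths:
                if entry.next_hop == next_hop:
                    entry.local_q = quality

    def on_remote_event(self, remote_path, quality):
        """Remote congestion events are matched by path info."""
        for paths in self.entries.values():
            for entry in paths:
                if entry.remote_path == remote_path:
                    entry.remote_q = quality


fib = ForwardingTable()
fib.add("192.0.2.0/24", EcmpEntry("R1", ("R1", "R4")))
fib.add("192.0.2.0/24", EcmpEntry("R2", ("R2", "R4")))
fib.on_remote_event(("R2", "R4"), quality=3)
weights = {e.next_hop: e.weight for e in fib.entries["192.0.2.0/24"]}
```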

   When the number of flows is small or when there are elephant flows,
   adaptive routing needs to be performed through flow redirection.
   Figure 5 is a schematic of the flow table maintained in the
   forwarding plane. The flow table is keyed by the five-tuple of the
   traffic and records the path information corresponding to each
   flow.

   When responding to remote flow congestion events as described in
   Section 4.8, the flow is rehashed and redirected to the least
   loaded ECMP path.

     +------+
     |SAddr |
     |DAddr |
     |SPort |       +------------------+
     |DPort |------>|Next-hop: to R1   |
     |Proto |       +------------------+
     +------+
               Figure 5 Flow table

4.5. Adaptive Routing Mode

   Adaptive routing adjustment modes can be categorized into flow-based
   adjustment modes and packet-based adjustment modes. Flow-based
   adjustments can be further divided into link congestion-based load
   balancing weight adjustments and flow redirection-based adjustments.
   In packet-based adjustment modes, each time a packet is forwarded,
   the link with the lightest load is chosen for forwarding. This can
   lead to out-of-order reception at the receiving end, requiring the
   application side to handle out-of-order issues.

4.5.1. Flow-Based Adjustment Mode

   In flow-based adjustment modes, the load balancing weight of links
   can be adjusted, or specific flows can be redirected.

   Weight-Based Dynamic ECMP Flow Mode: For link-level congestion
   events, based on the congestion status of remote links and combined
   with the local link congestion status, the forwarding weights of the
   corresponding ECMP links in the forwarding table are dynamically
   adjusted. This helps to achieve global link load balancing by
   reducing the flow weights on congested links. The forwarding weight
   is calculated based on the quality of the local and remote links.

   Flow Redirection Mode: For specific flow congestion events, the
   congested flow is redirected to ECMP links with lighter loads.
   During flow redirection, the quality, such as the remaining
   bandwidth, of these links must be considered to avoid congestion on
   the redirected link.

4.5.2. Packet-Based Adjustment Mode

   Based on the congestion status of local and remote links, the load
   on corresponding ECMP links is dynamically adjusted. During data
   forwarding, using a per-packet forwarding model, each data packet is
   sent through the link with the lightest load. The application side
   must be capable of handling out-of-order packets when receiving
   them.

4.6. Congestion Detection

   Congestion detection is generally performed by devices near the
   congestion point and covers both the onset of link congestion and
   its clearance. Network performance and congestion points can be
   identified by sending test traffic. A queue whose depth exceeds a
   threshold may trigger a congestion notification. Congestion can
   also be inferred by monitoring the packet loss rate of a link.
   Congestion detection also includes flow-based congestion detection.
   Specific congestion detection methods are beyond the scope of this
   document.
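
   As one possible illustration of queue-based detection, the sketch
   below reports congestion when the queue depth rises above a
   threshold and reports clearance when it falls below a second, lower
   threshold. The class name, the two-threshold (hysteresis) design,
   and the threshold values are assumptions; this document does not
   mandate any particular detection method.

```python
class QueueCongestionDetector:
    """Report congestion onset and clearance from queue depth samples."""

    def __init__(self, set_threshold, clear_threshold):
        if clear_threshold > set_threshold:
            raise ValueError("clear threshold must not exceed set threshold")
        self.set_threshold = set_threshold
        self.clear_threshold = clear_threshold
        self.congested = False

    def observe(self, queue_depth):
        """Return "congested", "cleared", or None if the state is unchanged."""
        if not self.congested and queue_depth > self.set_threshold:
            self.congested = True
            return "congested"
        if self.congested and queue_depth < self.clear_threshold:
            self.congested = False
            return "cleared"
        return None


detector = QueueCongestionDetector(set_threshold=80, clear_threshold=50)
events = [detector.observe(depth) for depth in (10, 90, 85, 60, 40)]
```

   Using a lower threshold for clearance avoids a burst of alternating
   notifications when the queue depth hovers around a single value.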

4.7. Congestion definition

   The definition of congestion can be based on interface bandwidth or
   forwarding buffer utilization, measured using a quality level. This
   level can be tailored so that lower levels indicate poorer path
   quality and can be calculated based on current bandwidth and buffer
   usage, using a specific ratio.

   For instance, with 16 quality levels, on a 400G interface, level 0
   could represent 25G and level 15 could represent 400G.

   The exact method for calculating the quality level is beyond this
   document's scope, but the rules must be consistent among routers
   exchanging this information.
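
   One possible mapping that matches the 16-level example above is
   sketched below. It assumes that the quality level is derived from
   the available bandwidth alone, in equal steps of capacity divided
   by the number of levels (25G per level on a 400G interface). The
   function name and this linear rule are illustrative assumptions;
   buffer utilization is not modeled.

```python
import math


def quality_level(available_gbps, capacity_gbps=400, levels=16):
    """Map available bandwidth to a level; lower means poorer quality."""
    step = capacity_gbps / levels
    level = math.ceil(available_gbps / step) - 1
    return max(0, min(levels - 1, level))


for gbps in (0, 25, 200, 210, 400):
    print(gbps, "Gbps available -> level", quality_level(gbps))
```

   Whatever rule is chosen, every router exchanging levels has to
   apply the same one, as stated above.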

4.8. Congestion Notify

   When a change in congestion status is detected, it needs to be
   communicated to remote devices in order to adjust traffic scheduling
   from the source.

   Congestion messages can be of two types:

   1) The first type includes path information, which helps identify
      the corresponding route for adjustment. It also includes the
      congestion information of the link corresponding to the path.
      With this information, global congestion calculation can be
      performed to derive the weight information for the forwarding
      table. For details, refer to Section 4.4.

   2) The second type includes the five-tuple of the congested flow.
      With this information, the congested flow can be redirected.
      For details, refer to Sections 4.4 and 4.5.
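
   The two message types can be sketched as simple records with a
   dispatcher. The class and field names below are hypothetical and do
   not represent a wire format; the encoding of these messages is left
   to the protocol extensions discussed in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkCongestion:
    """Type 1: path information plus the congestion level of its link."""
    path: tuple  # remote path info, e.g. ("R2", "R4")
    level: int


@dataclass(frozen=True)
class FlowCongestion:
    """Type 2: the five-tuple of the congested flow."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: str


def dispatch(message, remote_quality, flows_to_redirect):
    """Route a congestion message to the matching adjustment mode."""
    if isinstance(message, LinkCongestion):
        remote_quality[message.path] = message.level
        return "weight-update"
    if isinstance(message, FlowCongestion):
        flows_to_redirect.append(message)
        return "flow-redirect"
    raise TypeError("unknown congestion message type")


remote_quality = {}
flows_to_redirect = []
first = dispatch(LinkCongestion(("R2", "R4"), 3),
                 remote_quality, flows_to_redirect)
second = dispatch(FlowCongestion("10.0.0.1", "10.0.1.1", 1000, 4791, "UDP"),
                  remote_quality, flows_to_redirect)
```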

   Congestion notification can be achieved by extending an IGP to
   transmit link state information within the IGP domain, or by
   extending BGP and setting up BGP route reflectors to facilitate
   communication between BGP neighbors. However, transmitting
   congestion information through traditional routing protocols
   presents performance challenges. On one hand, congestion
   notifications need to be sent more frequently than routing
   information. On the other hand, congestion messages should ideally
   be processed in the forwarding plane. Therefore, to improve the
   performance of adaptive routing, new protocols can be designed
   specifically for this purpose. For specific extensions, see
   [I-D. draft-zzhang-rtgwg-router-info]; other extensions are
   possible but are not detailed in this document.

   Congestion messages can be transmitted in-band or out-of-band. For
   high-performance solutions, additional protocols may be needed for
   efficient out-of-band message transmission. Specific methods are
   beyond the scope of this document.

5. Work Flow

   Taking the network shown in Figure 1 as an example, the neighbor ID
   of R4 is included in route distribution and calculation to indicate
   the path from R2 to R4. The router ID can be used as the neighbor
   ID. The entire working process is explained below using BGP routing
   as an example.

   When congestion occurs on the link between R2 and R4, R2 uses R4's
   neighbor ID to notify R3 of the link congestion.

   Below is a description of the workflow for handling congestion
   information in various modes:

5.1. Weight-Based Dynamic ECMP Flow Adjustment Mode

   R3 <---BGP Route--- R2                              [S1]

     <NLRI: Prefix, Next-hop BGP ID: R2, Next-next-hop BGP ID: R4>

   R3 <---BGP Route--- R1                              [S1]

     <NLRI: Prefix, Next-hop BGP ID: R1, Next-next-hop BGP ID: R4>

        |
        | Per-neighbor Path Information
        v

   R3's Forwarding Plane:                              [S2]

     +------+       +--------------------------+ local(q)+remote(q)
     |Prefix|---+-->|Next-hop: to R1, Weight w1|<----------------|
     +------+   |   +--------------------------+                 |
                |           |           +------------+   +--------+
                |           +---------->|Path: R1->R4|-->|Quality1|
                |                       +------------+   +--------+
                |   +--------------------------+ local(q)+remote(q)
                +-->|Next-hop: to R2, Weight w2|<----------------|
                    +--------------------------+                 |
                            |           +------------+   +--------+
                            +---------->|Path: R2->R4|-->|Quality2|[S4]
                                        +------------+   +--------+
        ^
        | Link Congestion
        |

   R3 <---Control Plane Notification--- R2:

     Link Congestion: Link ID, Congestion Level             [S3]

                           Figure 6

   As shown in Figure 6, the workflow for handling remote link
   congestion in this mode is as follows:

   [S1]: There are two paths from R3 to R4: R3->R1->R4 and R3->R2->R4.
   Link information is advertised by R2 through BGP.

   When R2 delivers BGP routes to R3, the NNHN Capability TLV is
   carried in the attributes [I-D.wang-idr-next-next-hop-nodes],
   indicating that the next-hop is R2 and the next-next-hop is R4.

   When R1 delivers BGP routes to R3, the NNHN Capability TLV is
   carried in the attributes [I-D.wang-idr-next-next-hop-nodes],
   indicating that the next-hop is R1 and the next-next-hop is R4.

   [S2]: R3 learns the routes and maintains two ECMP paths, both with
   the same initial weight of 50.

   [S3]: R2 detects a change in congestion on the R2->R4 link using
   congestion detection methods and classifies the congestion into
   levels according to severity. This information, including the
   detecting node, the congested link, and the congestion level, is
   sent to R3 through the control plane
   [I-D. draft-zzhang-rtgwg-router-info].

   [S4]: R3 receives the remote notification and, based on the
   congested node (R2) and its next-hop information (R4), looks up its
   local forwarding table. It then adjusts the forwarding weights of
   the corresponding ECMP entries according to the congestion level.
   In this example, the weight of the next hop toward R2 is set to 10,
   as shown in Figure 7.

     +------+       +--------------------------+
     |Prefix|---+-->|Next-hop: to R1, Weight 50|
     +------+   |   +--------------------------+
                |           |           +----------------+
                |           +---------->|Path: R1->R4    |
                |                       +----------------+
                |   +--------------------------+
                +-->|Next-hop: to R2, Weight 10|
                    +--------------------------+
                            |           +----------------+
                            +---------->|Path: R2->R4    |
                                        +----------------+

           Figure 7 Adaptive forwarding table
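
   The weight adjustment in [S4] can be sketched as follows. This is
   an illustration under stated assumptions, not a definitive
   implementation: the class EcmpGroup and its methods are
   hypothetical, the weights 50 and 10 are the values shown in
   Figure 7, the mapping from congestion level to weight is left to
   the caller, and flows are spread with a hash-based weighted
   selection that this document does not mandate. Weights are assumed
   to be positive integers.

```python
import hashlib


class EcmpGroup:
    def __init__(self, next_hops, initial_weight):
        # next_hops maps each next hop to its next-next hop.
        self.nnh = dict(next_hops)
        self.weight = {nh: initial_weight for nh in next_hops}

    def on_link_congestion(self, node, neighbor, new_weight):
        """[S4]: adjust entries whose path uses node->neighbor."""
        for nh, nnh in self.nnh.items():
            if (nh, nnh) == (node, neighbor):
                self.weight[nh] = new_weight

    def select(self, flow_key):
        """Pick a next hop for a flow in proportion to the weights."""
        total = sum(self.weight.values())
        digest = hashlib.sha256(flow_key.encode()).digest()
        point = int.from_bytes(digest[:8], "big") % total
        for nh in sorted(self.weight):
            if point < self.weight[nh]:
                return nh
            point -= self.weight[nh]


group = EcmpGroup({"R1": "R4", "R2": "R4"}, initial_weight=50)
group.on_link_congestion("R2", "R4", new_weight=10)

# Count how many of 6000 synthetic flows land on each next hop.
counts = {"R1": 0, "R2": 0}
for i in range(6000):
    counts[group.select(f"flow-{i}")] += 1
```

   Because a flow key always hashes to the same point, packets of one
   flow stay on one path until the weights change, while new weights
   shift the share of flows sent toward the congested link.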

5.2. Flow Redirect Mode

   R3 <---BGP Route--- R2                              [S1]

     <NLRI: Prefix, Next-hop BGP ID: R2, Next-next-hop BGP ID: R4>

   R3 <---BGP Route--- R1                              [S1]

     <NLRI: Prefix, Next-hop BGP ID: R1, Next-next-hop BGP ID: R4>

        |
        | Per-neighbor Path Information
        v

   R3's Forwarding Plane:                           [S2]

     +------+       +--------------------------+
     |Flow 1|------>|Next-hop: to R2           |
     +------+       +--------------------------+
                            |           +------------+
                            +---------->|Path: R2->R4|     [S4]
                                        +------------+
        ^
        | Flow Congestion
        |

   R3 <---Control Plane Notification--- R2:

     Flow Congestion: Flow 1, Congestion             [S3]

                           Figure 8

   As shown in Figure 8, the workflow for handling remote flow
   congestion is as follows:

   [S1]: There are two paths from R3 to R4: R3->R1->R4 and R3->R2->R4.
   Link information is advertised by R2 through BGP.

   When R2 delivers BGP routes to R3, the NNHN Capability TLV is
   carried in the attributes [I-D.wang-idr-next-next-hop-nodes],
   indicating that the next-hop is R2 and the next-next-hop is R4.

   When R1 delivers BGP routes to R3, the NNHN Capability TLV is
   carried in the attributes [I-D.wang-idr-next-next-hop-nodes],
   indicating that the next-hop is R1 and the next-next-hop is R4.

   [S2]: Initially, Flow 1 is forwarded along the path R2->R4, and R3
   creates a flow table entry for it.

   [S3]: R2 detects congestion affecting a specific flow that passes
   through the R2->R4 link, using congestion detection methods. R2
   notifies the remote device R3 of the congestion change event,
   including the congested path information and the flow information
   [I-D. draft-zzhang-rtgwg-router-info].

   [S4]: R3 receives the flow congestion event, looks up the flow
   table based on the flow information, and redirects the flow to the
   least loaded link among the ECMP links. Subsequently, the flow is
   forwarded according to the new flow table entry.
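
   Steps [S2] to [S4] can be sketched as a small flow table. The
   class FlowTable, its method names, and the load values 20 and 90
   are hypothetical and chosen only for illustration. The sketch also
   assumes that the congested next hop is excluded when the least
   loaded ECMP link is chosen, which this document does not state
   explicitly.

```python
class FlowTable:
    def __init__(self, link_load):
        # link_load maps each ECMP next hop to its current load.
        self.link_load = dict(link_load)
        self.entries = {}  # flow ID -> next hop

    def install(self, flow_id, next_hop):
        """[S2]: record the path initially chosen for a flow."""
        self.entries[flow_id] = next_hop

    def on_flow_congestion(self, flow_id, congested_next_hop):
        """[S4]: move the flow to the least loaded other link."""
        if self.entries.get(flow_id) != congested_next_hop:
            # Unknown flow, or the flow has already been moved.
            return self.entries.get(flow_id)
        candidates = {nh: load
                      for nh, load in self.link_load.items()
                      if nh != congested_next_hop}
        if candidates:
            best = min(candidates, key=candidates.get)
            self.entries[flow_id] = best
        return self.entries[flow_id]


table = FlowTable({"R1": 20, "R2": 90})
table.install("Flow 1", "R2")
new_next_hop = table.on_flow_congestion("Flow 1", "R2")
```

   Only the flow named in the event is moved; other flows that use
   the same next hop keep their existing entries.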

5.3. Packet-Based Adjustment Mode

   R3 <---BGP Route--- R2                              [S1]

     <NLRI: Prefix, Next-hop BGP ID: R2, Next-next-hop BGP ID: R4>

   R3 <---BGP Route--- R1                              [S1]

     <NLRI: Prefix, Next-hop BGP ID: R1, Next-next-hop BGP ID: R4>

        |
        | Per-neighbor Path Information
        v

   R3's Forwarding Plane:                              [S2]

     +------+       +--------------------------+ local(q)+remote(q)
     |Prefix|---+-->|Next-hop: to R1, Load l1  |<----------------|
     +------+   |   +--------------------------+                 |
                |           |           +------------+   +--------+
                |           +---------->|Path: R1->R4|-->|Quality1|
                |                       +------------+   +--------+
                |   +--------------------------+ local(q)+remote(q)
                +-->|Next-hop: to R2, Load l2  |<----------------|
                    +--------------------------+                 |
                            |           +------------+   +--------+
                            +---------->|Path: R2->R4|-->|Quality2|[S4]
                                        +------------+   +--------+
        ^
        | Link Congestion
        |

   R3 <---Control Plane Notification--- R2:

     Link Congestion: Link ID, Congestion Level             [S3]

                           Figure 9

   As shown in Figure 9, the workflow for handling remote link
   congestion in packet-based adjustment mode is as follows:

   [S1]: There are two paths from R3 to R4: R3->R1->R4 and R3->R2->R4.
   Link information is advertised by R2 through BGP.

   When R2 delivers BGP routes to R3, the NNHN Capability TLV is
   carried in the attributes [I-D.wang-idr-next-next-hop-nodes],
   indicating that the next-hop is R2 and the next-next-hop is R4.

   When R1 delivers BGP routes to R3, the NNHN Capability TLV is
   carried in the attributes [I-D.wang-idr-next-next-hop-nodes],
   indicating that the next-hop is R1 and the next-next-hop is R4.

   [S2]: R3 learns the routes and maintains two ECMP paths, both with
   the same initial load of 50.

   [S3]: R2 detects a change in congestion on the R2->R4 link using
   congestion detection methods and classifies the congestion into
   levels according to severity. This information, including the
   detecting node, the congested link, and the congestion level, is
   sent to R3 through the control plane
   [I-D. draft-zzhang-rtgwg-router-info].

   [S4]: R3 receives the remote notification and, based on the
   congested node (R2) and its next-hop information (R4), looks up its
   local forwarding table. It then adjusts the forwarding load of the
   corresponding ECMP entries according to the congestion level. In
   this example, the load of the next hop toward R2 is adjusted to 100,
   as shown in Figure 10. Afterwards, when a path is selected for
   per-packet forwarding, the lower-load path R1->R4 is chosen until
   its load becomes higher than that of R2->R4.

     +------+       +--------------------------+
     |Prefix|---+-->|Next-hop: to R1, load 50  |
     +------+   |   +--------------------------+
                |           |           +----------------+
                |           +---------->|Path: R1->R4    |
                |                       +----------------+
                |   +--------------------------+
                +-->|Next-hop: to R2, load 100 |
                    +--------------------------+
                            |           +----------------+
                            +---------->|Path: R2->R4    |
                                        +----------------+
                           Figure 10
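
   The per-packet selection described in [S4] can be sketched as
   follows, using the load values 50 and 100 from Figure 10. The
   class PacketSprayer is hypothetical. The sketch assumes that the
   load is a simple counter incremented by one for each forwarded
   packet and that ties go to the next hop that sorts first; an
   implementation would more likely derive the load from queue depth
   or link utilization, which this document does not specify.

```python
class PacketSprayer:
    def __init__(self, loads):
        self.load = dict(loads)  # next hop -> current load metric

    def on_link_congestion(self, next_hop, new_load):
        """[S4]: raise the load recorded for the congested path."""
        self.load[next_hop] = new_load

    def forward(self, packet_cost=1):
        """Send one packet on the lowest-load next hop."""
        nh = min(sorted(self.load), key=self.load.get)
        self.load[nh] += packet_cost
        return nh


sprayer = PacketSprayer({"R1": 50, "R2": 50})
sprayer.on_link_congestion("R2", 100)

# Forward 60 packets and record the next hop chosen for each one.
choices = [sprayer.forward() for _ in range(60)]
```

   In this run the first packets all take R1, and R2 is used again
   only once the load recorded for R1 has risen above that of R2.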

6. Security Considerations

   TBD.

7. IANA Considerations

   TBD.

8. References

8.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
             2119 Key Words", BCP 14, RFC 8174, May 2017.

8.2. Informative References

   [I-D.wang-idr-next-next-hop-nodes] Wang, K., Haas, J., and C. Lin,
             "BGP Next-next Hop Nodes", Work in Progress, Internet-
             Draft, draft-wang-idr-next-next-hop-nodes-01, 4 September
             2024,
             <https://datatracker.ietf.org/doc/html/draft-wang-idr-next-next-hop-nodes-01>.

   [I-D. draft-zzhang-rtgwg-router-info] Zhang, Z., Wang, K., and C.
             Lin, "Advertising Router Information", Work in Progress,
             Internet-Draft, draft-zzhang-rtgwg-router-info-01, 18
             September 2024,
             <https://datatracker.ietf.org/doc/html/draft-zzhang-rtgwg-router-info-01>.

9. Acknowledgments

   TBD.

Authors' Addresses

   Weiqiang Cheng
   China Mobile
   China
   Email: chengweiqiang@chinamobile.com

   Changwang Lin
   New H3C Technologies
   China
   Email: linchangwang.04414@h3c.com

   Kevin F. Wang
   Juniper Networks
   Email: kfwang@juniper.net

   Jiaming Ye
   China Mobile
   China
   Email: yejiaming@chinamobile.com

   Rui Zhuang
   China Mobile
   China
   Email: zhuangruiyjy@chinamobile.com

   PengFei Huo
   ByteDance
   China
   Email: huopengfei@bytedance.com
