SCONE-Based Rate Advice for RoCEv2 Networks
draft-zz-scone-rate-advice-rocev2-00

Document Type: Active Internet-Draft (individual)
Authors: Guangyu Zhao, Cheng Zhou
Last updated: 2026-07-02
IESG state: I-D Exists
draft-zz-scone-rate-advice-rocev2-00
scone                                                            G. Zhao
Internet-Draft                                                   C. Zhou
Intended status: Informational                              China Mobile
Expires: 3 January 2027                                      2 July 2026

              SCONE-Based Rate Advice for RoCEv2 Networks
                  draft-zz-scone-rate-advice-rocev2-00

Abstract

   This document describes the applicable scenarios of Standard
   Communication with Network Elements (SCONE) in RoCEv2 networks.
   SCONE defines a mechanism that enables network elements on the
   forwarding path to deliver throughput guidance to endpoints; this
   document applies that mechanism to RoCEv2 endpoints and specifies
   the method for carrying Rate Advice in RoCEv2 packets.  Rate Advice
   is generated by network nodes (e.g., switches) and can be either a
   rate limit defined by network policy or a quantitative rate
   adjustment recommendation derived from network status information.

   This document specifies the packet format for Rate Advice and the
   calculation method for the advised rate.

Note

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 3 January 2027.

Zhao & Zhou              Expires 3 January 2027                 [Page 1]
Internet-Draft           Rate Advice for RoCEv2                July 2026

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Deployment Scenarios
     3.1.  Deployment within Intelligent Computing Data Centers
     3.2.  Deployment across Intelligent Computing Data Centers
   4.  Overview of Rate Advice Framework
   5.  Acquisition of Advised Rate
     5.1.  Rate Policy Based on Network Element Configuration
     5.2.  Calculation Based on Egress Bandwidth Congestion Status
   6.  Rate Advice Message
     6.1.  SCONE-RoCEv2 Packet Format
     6.2.  Definition of SCONE-RoCEv2 Fields
   7.  Security Considerations
   8.  IANA Considerations
   9.  Contributors
   10. Informative References
   Authors' Addresses

1.  Introduction

   Remote Direct Memory Access (RDMA) enables data to be read from and
   written to remote memory without involving the CPU, which reduces
   latency, improves throughput, and lowers the computing overhead of
   devices.  It is widely deployed in high-performance computing (HPC)
   and artificial intelligence (AI) scenarios.  InfiniBand
   [INFINIBAND], a natively lossless network for HPC and AI, supports
   RDMA directly.  The RDMA over Converged Ethernet (RoCE) standard
   brings RDMA capabilities to Ethernet.  Among its versions, RoCEv2
   carries the InfiniBand transport layer over UDP/IP and has become
   the most widely adopted version in the industry.

   Data Center Quantized Congestion Notification (DCQCN) is the default
   congestion control algorithm for RoCEv2 networks [DCQCN].  A DCQCN
   flow starts transmitting at the physical line rate by default.  It
   reduces the transmission rate upon receiving Congestion Notification
   Packets (CNPs) and restores the rate quickly when the network is
   free of congestion, forming an end-to-end rate adjustment mechanism.
   However, Explicit Congestion Notification (ECN) marking is binary:
   it indicates only the presence or absence of congestion, not its
   degree, which limits the timeliness and effectiveness of rate
   adjustment on the sender side.

   Following the concept of SCONE [SCONE-PROTOCOL], network nodes can
   provide advised transmission rates to RoCEv2 endpoints, giving
   senders a quantitative target instead of a binary signal.  This
   document defines a Rate Advice mechanism adapted to the RoCEv2
   protocol to deliver such quantitative rate recommendations.

2.  Terminology

   The following terms are used in this document:

   *  Rate Advice: A quantitative rate adjustment recommendation
      generated by network nodes (e.g., switches) and delivered to the
      sender.

   *  Rate Advice Message: An extended message based on the RoCEv2
      format, which carries Rate Advice information from network nodes
      to the sender.

3.  Deployment Scenarios

3.1.  Deployment within Intelligent Computing Data Centers

   In this scenario, RoCEv2 flows stay within a single intelligent
   computing data center, whose network is mainly built with two-tier
   Leaf-Spine or three-tier Clos architectures.  Network nodes such as
   Spine and Leaf switches can monitor egress queue depth, queuing
   delay, and packet loss rate.  When an uplink becomes congested, the
   switch generates Rate Advice Messages based on real-time monitoring
   results and sends them directly to the senders, as a supplement to
   DCQCN.

   In this deployment scenario, SCONE-enabled network nodes reside on
   the data plane of the data center network and coexist with the
   existing DCQCN protocol stack.  Senders can receive both CNPs from
   DCQCN and Rate Advice from SCONE.  The recommended implementation
   policy is that senders react first to the urgent congestion
   indications carried in CNPs and use the advised rate in SCONE Rate
   Advice as the reference for gradual rate adjustment.
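
   As a non-normative illustration of this policy, the following Python
   sketch shows one way a sender might combine the two signals.  The
   class name, the cut factor, and the step size are assumptions made
   for this example; the cut applied on a CNP is a deliberate
   simplification and is not the DCQCN rate update rule.

```python
from dataclasses import dataclass


@dataclass
class SenderRateState:
    """Per-flow sender state. All names and constants are illustrative."""
    line_rate_mbps: float
    current_mbps: float
    cnp_cut_factor: float = 0.5  # assumed multiplicative decrease on a CNP
    advice_step: float = 0.25    # assumed fraction of the gap closed per advice

    def on_cnp(self) -> float:
        # A CNP signals urgent congestion: cut the rate immediately.
        self.current_mbps *= self.cnp_cut_factor
        return self.current_mbps

    def on_rate_advice(self, advised_mbps: float) -> float:
        # Rate Advice is a target: approach it gradually and never
        # exceed the physical line rate.
        target = min(advised_mbps, self.line_rate_mbps)
        self.current_mbps += self.advice_step * (target - self.current_mbps)
        return self.current_mbps
```

   With these assumed constants, a flow at 100000 Mbps drops to 50000
   Mbps on a CNP and then moves to 57500 Mbps after one Rate Advice of
   80000 Mbps.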

3.2.  Deployment across Intelligent Computing Data Centers

   To interconnect distributed intelligent computing clusters,
   multiple intelligent computing data centers are connected via WAN
   links (e.g., dedicated lines, SRv6 tunnels) to form a cross-domain
   logical intelligent computing pool.  As large language models keep
   growing, a single data center is constrained by power supply, heat
   dissipation, and physical space, so cross-data center collaborative
   training has become an industry demand.  As a recent example,
   Google DeepMind used Decoupled DiLoCo [DiLoCo] to train a
   12-billion-parameter model across four regions in the United
   States, reporting training more than 20x faster than traditional
   synchronous methods.

   In this deployment model, data center gateways or WAN edge routers
   act as aggregation nodes for RoCEv2 traffic.  The Round-Trip Time
   (RTT) of WAN links can reach tens of milliseconds, much longer than
   the RTT inside a data center.  Because end-to-end congestion
   feedback takes about one RTT to reach the sender, a long RTT delays
   rate reduction.  In extreme cases, when the RTT is 20 ms and the
   packet loss rate is 0.1%, the throughput may drop to nearly zero
   [URDMA].
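
   To give a sense of scale, the sketch below computes how much data a
   sender places in flight during one RTT, before any end-to-end
   feedback can arrive.  The 100 Gbps line rate and the 10 microsecond
   intra-data-center RTT are assumed values chosen for illustration;
   only the 20 ms RTT comes from the text above.

```python
def bytes_in_flight(rate_mbps: int, rtt_us: int) -> int:
    """Bytes sent during one RTT at a constant rate.

    Mbps multiplied by microseconds gives bits directly
    (1e6 bit/s * 1e-6 s), so integer arithmetic is exact.
    """
    bits = rate_mbps * rtt_us
    return bits // 8


# Assumed 100 Gbps sender: WAN path (20 ms) versus intra-DC path (10 us).
wan_bytes = bytes_in_flight(100_000, 20_000)
dc_bytes = bytes_in_flight(100_000, 10)
```

   Under these assumptions the WAN path holds 250,000,000 bytes in
   flight per RTT, compared with 125,000 bytes inside the data center.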

   Deploying SCONE-enabled network nodes on central gateways or WAN edge
   routers to generate path-based Rate Advice gives endpoints an
   explicit reference for their transmission rate.  Two deployment
   modes are available:

   *  Near-source Gateway Generation: The source gateway sends Rate
      Advice to senders, reflecting the congestion status of the WAN
      ingress path.

   *  Remote Gateway Generation: The WAN egress gateway sends
      backpressure signals to the source gateway, and then the source
      gateway forwards Rate Advice to senders.

   The near-source gateway mode has a shorter control loop and responds
   faster, so it is the preferred mode.

4.  Overview of Rate Advice Framework

   This section describes the overall architecture of the Rate Advice
   framework.

   The Rate Advice mechanism introduces a dedicated direct signaling
   path from network nodes (switches) to senders, as shown in Figure 1.

          +----------+         +-----------+         +----------+
          |          <--------->  Network  <--------->          |
          |  Sender  |  RoCEv2 |  Element  |  RoCEv2 | Receiver |
          |          |  Data   |           |  Data   |          |
          +-----^----+         +-----+-----+         +----------+
                |                    |
                |  Rate Advice Msg   |
                +--------------------+

                    Figure 1: Rate Advice Signaling Path

   The sender indicates its support for the Rate Advice capability in
   the RoCEv2 packet header.  Network nodes parse this indication and
   enable the Rate Advice function only for capable senders.  Network
   nodes on the data path calculate the advised rate for each flow
   according to pre-defined rate limits of network policies or
   real-time egress queue status.  The network node encapsulates the
   Rate Advice into a SCONE-RoCEv2 message and transmits it to the
   sender.  The sender parses the advised rate from the SCONE-RoCEv2
   message and adjusts its transmission rate accordingly.

5.  Acquisition of Advised Rate

   The advised rate can be obtained in two ways: by looking up the
   advised rate or rate upper limit for each flow in the rate policies
   configured on network elements, or by calculating it from the
   egress queue depth, queuing delay, and packet loss rate of network
   elements.

5.1.  Rate Policy Based on Network Element Configuration

   Network nodes directly obtain the advised rate or rate upper limit
   for each flow from rate policies pre-configured by administrators,
   such as per-flow rate limiting and priority bandwidth allocation.
   This method applies to scenarios where operators have explicit
   Service Level Agreements (SLAs) and traffic engineering policies.
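
   A policy lookup of this kind could be sketched as follows.  The
   table names, the flow key (source IP, destination IP, destination
   QP), and all values are hypothetical; this document does not define
   a policy data model.  The class share is returned as a rate upper
   limit, not as a per-flow allocation.

```python
from typing import Dict, Optional, Tuple

FlowKey = Tuple[str, str, int]  # (source IP, destination IP, destination QP)

# Hypothetical administrator-configured policies.
PER_FLOW_LIMIT_MBPS: Dict[FlowKey, int] = {
    ("10.0.0.1", "10.0.1.1", 17): 40_000,
}
CLASS_SHARE_PERCENT: Dict[int, int] = {3: 60, 0: 10}  # traffic class -> % of port


def policy_rate_mbps(flow: FlowKey, traffic_class: int,
                     port_mbps: int) -> Optional[int]:
    """Return the configured rate (Mbps), or None if no policy applies."""
    # A per-flow limit is the most specific policy, so it wins.
    if flow in PER_FLOW_LIMIT_MBPS:
        return PER_FLOW_LIMIT_MBPS[flow]
    # Otherwise fall back to the bandwidth share of the traffic class.
    share = CLASS_SHARE_PERCENT.get(traffic_class)
    if share is None:
        return None
    return port_mbps * share // 100
```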

5.2.  Calculation Based on Egress Bandwidth Congestion Status

   Network nodes calculate the advised rate based on the real-time
   status of local egress queues, including queue depth, queuing delay
   and packet loss rate.  This method is applicable to Rate Advice
   scenarios that require real-time response to network congestion.
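
   This document does not fix a formula for this calculation.  Purely
   as an illustration, the sketch below derives an advised rate from
   queue depth alone: it starts from an equal share of the egress port
   and scales it down linearly between two queue thresholds.  The
   equal-share model, the thresholds, and the floor are all
   assumptions; queuing delay and packet loss rate are not used here.

```python
def advised_rate_mbps(port_mbps: int, active_flows: int, queue_bytes: int,
                      q_low: int = 100_000, q_high: int = 1_000_000,
                      floor_mbps: int = 100) -> int:
    """Hypothetical advised rate (Mbps) for one flow on an egress port."""
    # Start from an equal share of the port (assumed fairness model).
    fair_share = port_mbps // max(active_flows, 1)
    if queue_bytes <= q_low:
        # Queue is shallow: advise the full fair share.
        return fair_share
    if queue_bytes >= q_high:
        # Queue is at or above the upper threshold: advise the floor.
        return min(floor_mbps, fair_share)
    # In between, scale the fair share down linearly with queue depth.
    scaled = fair_share * (q_high - queue_bytes) // (q_high - q_low)
    return min(fair_share, max(scaled, floor_mbps))
```

   For a 100000 Mbps port shared by four flows, these assumed defaults
   advise 25000 Mbps at a 50000-byte queue and 12500 Mbps at a
   550000-byte queue.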

6.  Rate Advice Message

   This section defines the format of the SCONE-RoCEv2 Rate Advice
   Message.

6.1.  SCONE-RoCEv2 Packet Format

   The Base Transport Header (BTH) is the transport layer header of
   InfiniBand, which is adopted by both RoCEv1 and RoCEv2.  A standard
   RoCEv2 packet carries a UDP header, with the structure as follows:

       [ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + ICRC)]

   The Rate Advice Message reuses the long BTH header format of RoCEv2
   and is identified by a new OpCode value (RATE_ADVICE).  The structure
   of the message is defined below:

  [ETH + IP + UDP(dport 4791) +
                IB(BTH(OpCode = RATE_ADVICE) + Rate Advice Packet + ICRC)]

   As a control message type of RoCEv2, the Rate Advice Message is
   encapsulated in the UDP payload with the UDP destination port set to
   4791.  The Rate Advice Packet that follows the BTH has the following
   layout, with field lengths given in bits:

                         Rate Advice Packet {
                             Rate (32),
                             Version (32),
                             Destination QP ID (32),
                             Source QP ID (32),
                         }

   A new OpCode value (e.g., 0x1D) shall be assigned to identify a
   packet as a Rate Advice Message and is carried in the BTH OpCode
   field.  Other BTH fields (such as P_Key and FECN) shall be set in
   compliance with the standard RoCEv2 specifications.
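
   The following non-normative sketch builds a 12-byte BTH carrying the
   example OpCode, using the standard BTH layout (OpCode, flags byte,
   P_Key, reserved byte, 24-bit destination QP, AckReq bit with 7
   reserved bits, 24-bit PSN).  The function name, the use of the
   default P_Key 0xFFFF, and the zeroed flag, reserved, and PSN values
   are assumptions for illustration.  The ICRC is not computed here.

```python
import struct

RATE_ADVICE_OPCODE = 0x1D  # example value from the text, not an assigned one


def build_bth(dest_qp: int, opcode: int = RATE_ADVICE_OPCODE,
              p_key: int = 0xFFFF, psn: int = 0) -> bytes:
    """Pack a 12-byte Base Transport Header in network byte order."""
    if not 0 <= dest_qp < 1 << 24:
        raise ValueError("destination QP must fit in 24 bits")
    if not 0 <= psn < 1 << 24:
        raise ValueError("PSN must fit in 24 bits")
    flags = 0        # SE, M, PadCnt, and TVer all left as zero
    word1 = dest_qp  # top byte (reserved / congestion bits) left as zero
    word2 = psn      # AckReq bit and 7 reserved bits left as zero
    return struct.pack("!BBHII", opcode, flags, p_key, word1, word2)
```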

6.2.  Definition of SCONE-RoCEv2 Fields

   *  Rate (32 bits): The advised transmission rate, measured in Mbps.
      This value can be calculated by network nodes based on congestion
      metrics (queue depth, queuing delay, packet loss rate), or
      directly derived from rate policies configured by administrators.

   *  Version (32 bits): Version number.  The initial version defined in
      this specification is 0x00000001.  The version number will be
      incremented for future backward-incompatible modifications.

   *  Destination QP ID (32 bits): Destination Queue Pair Identifier.
      This field should be filled with the Queue Pair (QP) number
      corresponding to the sender.  When generating a Rate Advice
      Message, the network node extracts this field from the BTH of the
      original RoCEv2 data packet and copies the value.  Upon reception,
      the sender uses this field to associate the Rate Advice with the
      corresponding flow.

   *  Source QP ID (32 bits): Source Queue Pair Identifier.  This field
      is generally set to the QP number used by the receiver or the
      network node.
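
   The four fields above can be encoded and decoded as sketched below.
   The type and function names are hypothetical, and network byte order
   is an assumption, since this document does not state the byte order
   of the fields.

```python
import struct
from typing import NamedTuple

# Rate, Version, Destination QP ID, Source QP ID: four unsigned 32-bit
# fields, assumed to be in network byte order.
_FORMAT = "!IIII"
SUPPORTED_VERSION = 0x00000001


class RateAdvice(NamedTuple):
    rate_mbps: int
    version: int
    dest_qp: int
    src_qp: int


def encode(advice: RateAdvice) -> bytes:
    """Serialize a Rate Advice Packet (16 bytes)."""
    return struct.pack(_FORMAT, *advice)


def decode(payload: bytes) -> RateAdvice:
    """Parse a Rate Advice Packet, rejecting short or unknown versions."""
    if len(payload) < struct.calcsize(_FORMAT):
        raise ValueError("Rate Advice Packet is too short")
    advice = RateAdvice(*struct.unpack_from(_FORMAT, payload))
    if advice.version != SUPPORTED_VERSION:
        raise ValueError("unsupported Rate Advice version")
    return advice
```

   For example, RateAdvice(40000, 1, 0x12, 0x34) encodes to the 16
   bytes 00009c40 00000001 00000012 00000034 (hexadecimal).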

7.  Security Considerations

   (TBD)

8.  IANA Considerations

   This document does not require any IANA actions.

9.  Contributors

   The following people have substantially contributed to this document:

   Hongwei Yang
   China Mobile
   Beijing
   China
   Email: yanghongwei@chinamobile.com

   Zhiqiang Li
   China Mobile
   Beijing
   China
   Email: lizhiqiangyjy@chinamobile.com

10.  Informative References

   [SCONE-PROTOCOL]
              Thomson, M., "Standard Communication with Network Elements
              (SCONE) Protocol", Work in Progress, Internet-Draft,
              draft-ietf-scone-protocol-04, May 2025,
              <https://datatracker.ietf.org/doc/html/draft-ietf-scone-
              protocol-04>.

   [DiLoCo]   Douillard, A., Rush, K., and Y. Donchev, "Decoupled DiLoCo
              for Resilient Distributed Pre-training", April 2026.

   [INFINIBAND]
              InfiniBand Trade Association, "InfiniBand Architecture
              Specification Volume 1, Release 1.5", June 2020.

   [DCQCN]    Zhu, Y., et al., "Congestion Control for Large-Scale RDMA
              Deployments", ACM SIGCOMM Proceedings, 2015.

   [URDMA]    Duan, X D., Lu, L., Sun, T., Li, Z Q., Yang, H W., and Z
              P. Du, "URDMA technologies for wide-area high-throughput
              network", ZTE Technology Journal, Volume 30, Issue 6,
              Pages 23-30, June 2024.

Authors' Addresses

   Guangyu Zhao
   China Mobile
   Beijing
   China
   Email: zhaoguangyu@chinamobile.com

   Cheng Zhou
   China Mobile
   Beijing
   China
   Email: zhouchengyjy@chinamobile.com