SCONE-Based Rate Advice for RoCEv2 Networks
draft-zz-scone-rate-advice-rocev2-00
scone G. Zhao
Internet-Draft C. Zhou
Intended status: Informational China Mobile
Expires: 3 January 2027 2 July 2026
SCONE-Based Rate Advice for RoCEv2 Networks
draft-zz-scone-rate-advice-rocev2-00
Abstract
This document describes the applicable scenarios of the Standard
Communication with Network Elements (SCONE) in RoCEv2 networks. SCONE
defines a mechanism that enables network elements on the forwarding
path to deliver throughput guidance to endpoints; this document
applies that mechanism to RoCEv2 endpoints. This document further
specifies the method for carrying Rate Advice in RoCEv2 packets.
Rate Advice is generated by network nodes (e.g., switches) and can be
either a rate limit defined by network policy or a quantitative rate
adjustment recommendation derived from network status information.
This document specifies the packet format for Rate Advice and
describes how the advised rate is obtained.
Note
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 3 January 2027.
Zhao & Zhou Expires 3 January 2027 [Page 1]
Internet-Draft Rate Advice for RoCEv2 July 2026
Copyright Notice
Copyright (c) 2026 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Deployment Scenarios . . . . . . . . . . . . . . . . . . . . 3
3.1. Deployment within Intelligent Computing Data Centers . . 3
3.2. Deployment across Intelligent Computing Data Centers . . 4
4. Overview of Rate Advice Framework . . . . . . . . . . . . . . 5
5. Acquisition of Advised Rate . . . . . . . . . . . . . . . . . 5
5.1. Rate Policy Based on Network Element Configuration . . . 5
5.2. Calculation Based on Egress Bandwidth Congestion
Status . . . . . . . . . . . . . . . . . . . . . . . . . 6
6. Rate Advice Message . . . . . . . . . . . . . . . . . . . . . 6
6.1. SCONE-RoCEv2 Packet Format . . . . . . . . . . . . . . . 6
6.2. Definition of SCONE-RoCEv2 Fields . . . . . . . . . . . . 7
7. Security Considerations . . . . . . . . . . . . . . . . . . . 7
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7
9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 7
10. Informative References . . . . . . . . . . . . . . . . . . . 8
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8
1. Introduction
Remote Direct Memory Access (RDMA) enables data to be read from and
written to remote memory without involving the remote host's CPU,
which reduces latency, improves throughput, and lowers CPU overhead.
It is widely deployed in high-performance computing (HPC) and
artificial intelligence (AI) scenarios. InfiniBand [INFINIBAND], a
lossless network technology widely used for HPC and AI, supports RDMA
natively. The RDMA over Converged Ethernet (RoCE) standard brings RDMA
capabilities to Ethernet. Among its versions, RoCEv2 carries the
InfiniBand transport layer over UDP/IP and has become the most widely
adopted version in the industry.
Data Center Quantized Congestion Notification (DCQCN) is the default
congestion control algorithm for RoCEv2 networks [DCQCN]. When a
DCQCN flow starts, it transmits at the physical line rate. Switches
mark packets with Explicit Congestion Notification (ECN) as congestion
builds, and the receiver converts these marks into Congestion
Notification Packets (CNPs) sent back to the sender. The sender
reduces its rate upon receiving CNPs and quickly restores it once CNPs
stop arriving, forming an end-to-end rate adjustment loop. However,
ECN marking is binary: it indicates only the presence or absence of
congestion, not its degree, which limits the timeliness and
effectiveness of rate adjustment at the sender.
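The DCQCN sender behavior described above can be sketched as follows. This is a simplified, illustrative model of the DCQCN reaction point based on the update rules in [DCQCN]; it omits the byte counter, additive increase, and hyper increase stages, and the gain g = 1/256 is an assumed parameter value. It shows that each CNP triggers a cut whose size depends only on the sender's internal congestion estimate (alpha), not on how much capacity is actually available at the bottleneck.

```python
from dataclasses import dataclass


@dataclass
class DcqcnSender:
    """Simplified DCQCN reaction point; all rates are in Mbps."""
    line_rate: float
    g: float = 1 / 256         # EWMA gain for alpha (assumed value)
    alpha: float = 1.0         # congestion estimate, initialized to 1
    current_rate: float = 0.0  # R_C, set to the line rate below
    target_rate: float = 0.0   # R_T, set to the line rate below

    def __post_init__(self) -> None:
        # A new flow starts at the physical line rate.
        self.current_rate = self.line_rate
        self.target_rate = self.line_rate

    def on_cnp(self) -> None:
        # A CNP only says "congestion occurred": the cut size depends on
        # alpha, not on how much bandwidth is actually available.
        self.target_rate = self.current_rate
        self.current_rate *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_alpha_timer(self) -> None:
        # No CNP arrived during the alpha update period: decay alpha.
        self.alpha *= 1 - self.g

    def on_recovery_timer(self) -> None:
        # Fast recovery: move halfway back toward the pre-cut rate.
        self.current_rate = (self.target_rate + self.current_rate) / 2


sender = DcqcnSender(line_rate=100_000.0)
sender.on_cnp()             # 100000 -> 50000 Mbps
sender.on_cnp()             # 50000 -> 25000 Mbps
sender.on_recovery_timer()  # 25000 -> 37500 Mbps
print(sender.current_rate)
```

Because the sender only learns that congestion exists, it typically needs several cut and recovery cycles to settle near its fair share; a quantitative advised rate is intended to shorten that search.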
Following the SCONE concept [SCONE-PROTOCOL], in which network
elements provide throughput advice to endpoints, supplying RoCEv2
senders with an explicit advised rate can help address this
limitation. This document defines a Rate Advice mechanism adapted to
RoCEv2 that delivers quantitative rate recommendations.
2. Terminology
The following terms are used in this document:
* Rate Advice: A quantitative rate adjustment recommendation
generated by network nodes (e.g., switches) and delivered to the
sender.
* Rate Advice Message: An extended message based on the RoCEv2
format, which carries Rate Advice information from network nodes
to the sender.
3. Deployment Scenarios
3.1. Deployment within Intelligent Computing Data Centers
Within an intelligent computing data center, RoCEv2 traffic generally
stays inside that data center, and the network is mainly built with a
two-tier Leaf-Spine or three-tier Clos architecture. Network nodes
such as Spine and Leaf switches can monitor egress queue depth,
queuing delay, and packet loss rate. When an uplink becomes
congested, the switch generates Rate Advice Messages based on
real-time monitoring results and sends them directly to the senders,
complementing DCQCN.
In this deployment scenario, SCONE-enabled network nodes reside on
the data plane of the data center network and coexist with the
existing DCQCN protocol stack. Senders can receive both CNP
congestion notifications from DCQCN and Rate Advice from SCONE. The
recommended implementation policy is that senders respond first to
the urgent congestion indications carried in CNPs, and use the
advised rate in SCONE Rate Advice as the reference for gradual rate
adjustment.
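The recommended policy above can be sketched as a small sender-side state machine. This is a hypothetical illustration, not behavior specified by this document: the class name, the step fraction (0.25), and the CNP hold-off window (50 microseconds) are assumptions. CNPs are applied immediately, Rate Advice moves the rate only part of the way toward the advised value, and upward advice is ignored shortly after a CNP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SconeAwareSender:
    """Hypothetical sender policy combining DCQCN CNPs and SCONE Rate Advice.

    Rates are in Mbps and times in microseconds. The step fraction and the
    hold-off window are illustrative assumptions, not values from the draft.
    """
    rate: float
    step: float = 0.25            # fraction of the gap closed per advice
    cnp_holdoff_us: float = 50.0  # ignore upward advice this soon after a CNP
    last_cnp_us: Optional[float] = None

    def on_cnp(self, now_us: float, dcqcn_rate: float) -> None:
        # CNPs carry urgent congestion signals: apply the DCQCN cut at once.
        self.rate = min(self.rate, dcqcn_rate)
        self.last_cnp_us = now_us

    def on_rate_advice(self, now_us: float, advised_mbps: float) -> None:
        recent_cnp = (self.last_cnp_us is not None
                      and now_us - self.last_cnp_us < self.cnp_holdoff_us)
        if advised_mbps > self.rate and recent_cnp:
            return  # do not ramp up while CNP-signaled congestion is fresh
        # Adjust gradually toward the advised rate instead of jumping to it.
        self.rate += self.step * (advised_mbps - self.rate)


s = SconeAwareSender(rate=100_000.0)
s.on_rate_advice(now_us=0.0, advised_mbps=60_000.0)    # -> 90000
s.on_cnp(now_us=10.0, dcqcn_rate=45_000.0)             # -> 45000
s.on_rate_advice(now_us=20.0, advised_mbps=60_000.0)   # ignored (hold-off)
s.on_rate_advice(now_us=100.0, advised_mbps=61_000.0)  # -> 49000
print(s.rate)
```

Moving only a fraction of the gap per message keeps the sender from oscillating if successive advice values are noisy; a real implementation would tune the step and hold-off to the network's RTT.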
3.2. Deployment across Intelligent Computing Data Centers
To interconnect distributed intelligent computing clusters, multiple
intelligent computing data centers are connected over WAN links
(e.g., dedicated lines, SRv6 tunnels) to form a cross-domain logical
intelligent computing pool. As large language models continue to
grow, a single data center is constrained by power supply, heat
dissipation, and physical space, so cross-data center collaborative
training has become a pressing industry requirement. As a recent
example, Google DeepMind used Decoupled DiLoCo [DiLoCo] to train a
12-billion-parameter model across four regions in the United States,
achieving more than 20x faster training than traditional synchronous
methods.
In this deployment model, data center gateways or WAN edge routers
act as aggregation nodes for RoCEv2 traffic. The Round-Trip Time
(RTT) of WAN links can reach tens of milliseconds, far longer than
inside a data center. Because congestion feedback needs roughly one
RTT to reach the sender, a long RTT delays rate reduction. In extreme
cases, with an RTT of 20 ms and a packet loss rate of 0.1%,
throughput may drop to nearly zero [URDMA].
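The effect of RTT on feedback delay can be made concrete with a back-of-the-envelope calculation: a sender transmitting at line rate emits roughly one bandwidth-delay product of data before feedback triggered by its own packets can return. The 400 Gbps link speed and the 10 microsecond intra-data-center RTT below are illustrative assumptions; the 20 ms WAN RTT is the figure cited above.

```python
def bytes_before_feedback(link_gbps: float, rtt_s: float) -> float:
    """Bytes sent at line rate during one RTT (the bandwidth-delay product),
    i.e. roughly what is emitted before congestion feedback can return."""
    return link_gbps * 1e9 * rtt_s / 8


# Illustrative inputs: 400 Gbps sender; ~10 us intra-DC RTT (assumed)
# versus the 20 ms WAN RTT cited in the text.
intra_dc = bytes_before_feedback(400, 10e-6)
wan = bytes_before_feedback(400, 20e-3)
print(f"intra-DC: {intra_dc / 1e6:.1f} MB, WAN: {wan / 1e9:.1f} GB")
print(f"ratio: {wan / intra_dc:.0f}x")
```

Under these assumptions, about 0.5 MB is in flight before feedback inside the data center, versus about 1 GB across the WAN, a 2000-fold difference that the congested queue must absorb.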
Deploying SCONE-enabled network nodes on central gateways or WAN edge
routers to generate path-based Rate Advice gives endpoints a useful
reference for setting their transmission rates. Two deployment modes
are available:
* Near-source Gateway Generation: The source gateway sends Rate
Advice to senders, reflecting the congestion status of the WAN
ingress path.
* Remote Gateway Generation: The WAN egress gateway sends
backpressure signals to the source gateway, which then forwards
Rate Advice to senders.
The near-source gateway mode has a shorter control loop and therefore
responds faster; it is the preferred mode.
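The difference in control-loop length between the two modes can be illustrated with a simple delay model. This sketch is a deliberate simplification: it counts only propagation delay for the advice signal, ignores processing time, and uses hypothetical values (5 microseconds from gateway to sender, and 10 ms one-way across the WAN, i.e. half of a 20 ms RTT).

```python
def advice_delay_us(gw_to_sender_us: float, wan_one_way_us: float,
                    remote_mode: bool) -> float:
    """One-way delay for Rate Advice to reach the sender (propagation only).

    Near-source mode: the source gateway signals the sender directly.
    Remote mode: the WAN egress gateway first signals the source gateway
    across the WAN, and the source gateway then forwards Rate Advice.
    """
    delay = gw_to_sender_us
    if remote_mode:
        delay += wan_one_way_us  # extra WAN crossing for the backpressure
    return delay


near = advice_delay_us(5, 10_000, remote_mode=False)
remote = advice_delay_us(5, 10_000, remote_mode=True)
print(f"near-source: {near} us, remote: {remote} us")
```

The trade-off is visibility: as described above, near-source advice reflects the congestion status of the WAN ingress path, while remote generation can also reflect conditions observed at the WAN egress, at the cost of an additional WAN crossing.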
4. Overview of Rate Advice Framework
This section describes the overall architecture of the Rate Advice
framework.
The Rate Advice mechanism introduces a dedicated direct signaling
path from network nodes (switches) to senders, as shown in Figure 1.
+----------+           +-----------+           +----------+
|          <-----------> Network   <----------->          |
| Sender   |  RoCEv2   |  Element  |  RoCEv2   | Receiver |
|          |   Data    |           |   Data    |          |
+-----^----+           +-----+-----+           +----------+
      |                      |
      |   Rate Advice Msg    |
      +----------------------+
Figure 1: Rate Advice Signaling Path
The sender indicates its support for the Rate Advice capability in
the RoCEv2 packet header. Network nodes parse this indication and
enable the Rate Advice function only for capable senders. Network
devices on the data path calculate the advised rate for each flow
according to pre-defined rate limits of network policies or real-time
egress queue status. The network device encapsulates the Rate Advice
into a SCONE-RoCEv2 message and transmits it to the sender. The
sender parses the advised rate from the SCONE-RoCEv2 message and
adjusts its transmission rate accordingly.
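The processing at a network element described above can be outlined as follows. This is a hypothetical sketch: this document does not yet define how the sender signals Rate Advice capability, so the `rate_advice_capable` flag, the `advised_rate` callback, and the default Source QP ID of 0 are assumptions. The callback stands in for either of the methods in Section 5.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DataPacket:
    """Fields a network node is assumed to read from a RoCEv2 data packet."""
    dest_qp: int
    rate_advice_capable: bool  # hypothetical capability indication


@dataclass
class RateAdvice:
    """Contents of a Rate Advice Packet (see Section 6)."""
    rate_mbps: int
    version: int
    dest_qp_id: int
    src_qp_id: int


def maybe_generate_advice(pkt: DataPacket,
                          advised_rate: Callable[[DataPacket], Optional[int]],
                          src_qp_id: int = 0) -> Optional[RateAdvice]:
    # Rate Advice is enabled only for senders that signaled support.
    if not pkt.rate_advice_capable:
        return None
    # Policy lookup (Section 5.1) or congestion-based estimate (Section 5.2).
    rate = advised_rate(pkt)
    if rate is None:
        return None  # nothing to advise for this flow right now
    # The Destination QP ID is copied from the data packet's BTH.
    return RateAdvice(rate_mbps=rate, version=1,
                      dest_qp_id=pkt.dest_qp, src_qp_id=src_qp_id)


adv = maybe_generate_advice(DataPacket(dest_qp=0x12, rate_advice_capable=True),
                            advised_rate=lambda p: 25_000)
print(adv)
```

Keeping the rate source behind a callback lets the same generation path serve both policy-based and congestion-based advice.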
5. Acquisition of Advised Rate
The advised rate can be obtained in two ways: acquiring the advised
rate or rate upper limit for each flow according to the rate policies
configured on network elements, or calculating the value based on the
egress queue depth, queuing delay and packet loss rate of network
elements.
5.1. Rate Policy Based on Network Element Configuration
Network nodes directly obtain the advised rate or rate upper limit
for each flow from rate policies pre-configured by administrators,
such as per-flow rate limiting and priority-based bandwidth
allocation. This method applies to scenarios where operators have
explicit Service Level Agreements (SLAs) and traffic engineering
policies.
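A policy-based lookup might look like the following. The table layout is hypothetical: flows are keyed by (source IP, destination IP, destination QP), and a per-priority cap stands in for priority bandwidth allocation. When several policies match, this sketch returns the most restrictive one, which is one reasonable choice rather than a rule from this document.

```python
from typing import Dict, Optional, Tuple

# Hypothetical flow key: (source IP, destination IP, destination QP number).
FlowKey = Tuple[str, str, int]


class RatePolicyTable:
    """Administrator-configured rate policies (illustrative structure)."""

    def __init__(self) -> None:
        self.per_flow_limit_mbps: Dict[FlowKey, int] = {}
        self.per_priority_cap_mbps: Dict[int, int] = {}

    def advised_rate(self, flow: FlowKey, priority: int) -> Optional[int]:
        limits = []
        if flow in self.per_flow_limit_mbps:
            limits.append(self.per_flow_limit_mbps[flow])
        if priority in self.per_priority_cap_mbps:
            limits.append(self.per_priority_cap_mbps[priority])
        # The most restrictive matching policy wins; None means no policy.
        return min(limits) if limits else None


table = RatePolicyTable()
table.per_flow_limit_mbps[("10.0.0.1", "10.0.1.1", 0x12)] = 40_000
table.per_priority_cap_mbps[3] = 100_000
print(table.advised_rate(("10.0.0.1", "10.0.1.1", 0x12), priority=3))
```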
5.2. Calculation Based on Egress Bandwidth Congestion Status
Network nodes calculate the advised rate based on the real-time
status of local egress queues, including queue depth, queuing delay
and packet loss rate. This method is applicable to Rate Advice
scenarios that require real-time response to network congestion.
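This document does not define a specific calculation. As one possible illustration, the sketch below uses a simple fair-share heuristic that is not a method from this document: it reserves part of the egress capacity to drain the standing queue, scales down by the observed loss rate, and divides the remainder among active flows. The target utilization, drain time, minimum rate, and the active flow count input are all assumed parameters.

```python
from dataclasses import dataclass


@dataclass
class EgressStatus:
    capacity_mbps: float  # egress link capacity
    queue_bytes: int      # current egress queue depth
    loss_rate: float      # recent packet loss ratio, 0.0 to 1.0
    active_flows: int     # flows currently sharing this egress port


def advised_rate_mbps(s: EgressStatus,
                      target_utilization: float = 0.95,
                      drain_time_s: float = 100e-6,
                      min_rate_mbps: float = 100.0) -> float:
    """Illustrative fair-share heuristic; not defined by this document."""
    # Queuing delay implied by the current queue depth.
    queuing_delay_s = s.queue_bytes * 8 / (s.capacity_mbps * 1e6)
    # Capacity to set aside so the standing queue drains within drain_time_s.
    drain_mbps = s.capacity_mbps * queuing_delay_s / drain_time_s
    usable_mbps = s.capacity_mbps * target_utilization - drain_mbps
    # Back off further in proportion to the observed loss rate.
    usable_mbps *= 1 - min(max(s.loss_rate, 0.0), 1.0)
    share = usable_mbps / max(s.active_flows, 1)
    # Never advise below a floor, so flows keep probing the path.
    return max(share, min_rate_mbps)


status = EgressStatus(capacity_mbps=100_000, queue_bytes=250_000,
                      loss_rate=0.0, active_flows=3)
print(round(advised_rate_mbps(status)))
```

With a 100 Gbps port, a 250 KB queue (20 microseconds of queuing delay), and three flows, this heuristic reserves 20 Gbps for draining and advises about 25 Gbps per flow.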
6. Rate Advice Message
This section defines the format of the SCONE-RoCEv2 Rate Advice
Message.
6.1. SCONE-RoCEv2 Packet Format
The Base Transport Header (BTH) is the InfiniBand transport layer
header and is used by both RoCEv1 and RoCEv2. A standard RoCEv2
packet carries the InfiniBand transport headers inside UDP and is
structured as follows:
[ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + ICRC)]
The Rate Advice Message reuses the standard RoCEv2 BTH format and is
identified by a new OpCode value (RATE_ADVICE). The message is
structured as follows:
[ETH + IP + UDP(dport 4791) +
IB(BTH(OpCode = RATE_ADVICE) + Rate Advice Packet + ICRC)]
As a RoCEv2 control message, the Rate Advice Message is carried in
the UDP payload with the UDP destination port set to 4791. The Rate
Advice Packet that follows the BTH has the following format:
Rate Advice Packet {
Rate (32),
Version (32),
Destination QP ID (32),
Source QP ID (32),
}
A new OpCode value (e.g., 0x1D) shall be assigned to the BTH OpCode
field to identify the packet as a Rate Advice Message. The other BTH
fields (e.g., P_Key and FECN) shall be set in accordance with the
standard RoCEv2 specifications.
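The encapsulation above can be sketched with Python's `struct` module. This is an illustrative encoder, not a reference implementation: it builds the 12-byte BTH (OpCode, flags, P_Key, Destination QP, PSN) followed by the 16-byte Rate Advice Packet in network byte order. The OpCode 0x1D is only the example value given above; the default P_Key of 0xFFFF and the `bth_dest_qp` parameter (the text does not say how the BTH Destination QP of the advice message itself is chosen) are assumptions, and the ICRC is not computed.

```python
import struct

RATE_ADVICE_OPCODE = 0x1D  # example value from the text; not yet assigned
RATE_ADVICE_VERSION = 1

# BTH, 12 bytes: OpCode | SE,M,PadCnt,TVer | P_Key | FECN,BECN,Resv + DestQP(24)
#                | AckReq,Resv + PSN(24)
BTH_FORMAT = "!BBHII"
# Rate Advice Packet: Rate, Version, Destination QP ID, Source QP ID.
RATE_ADVICE_FORMAT = "!IIII"


def build_rate_advice(rate_mbps: int, dest_qp_id: int, src_qp_id: int,
                      bth_dest_qp: int, psn: int = 0,
                      p_key: int = 0xFFFF) -> bytes:
    """Return BTH + Rate Advice Packet (the UDP payload without the ICRC)."""
    for name, value in (("Rate", rate_mbps), ("Destination QP ID", dest_qp_id),
                        ("Source QP ID", src_qp_id)):
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} must fit in 32 bits")
    bth = struct.pack(BTH_FORMAT,
                      RATE_ADVICE_OPCODE,
                      0,                       # SE=0, M=0, PadCnt=0, TVer=0
                      p_key,
                      bth_dest_qp & 0xFFFFFF,  # FECN=BECN=0, reserved=0
                      psn & 0xFFFFFF)          # AckReq=0, reserved=0
    body = struct.pack(RATE_ADVICE_FORMAT, rate_mbps, RATE_ADVICE_VERSION,
                       dest_qp_id, src_qp_id)
    # The 4-byte ICRC follows on the wire; computing it (a CRC-32 over the
    # packet with variant fields masked) is left to the NIC or a fuller encoder.
    return bth + body


payload = build_rate_advice(rate_mbps=25_000, dest_qp_id=0x12,
                            src_qp_id=0, bth_dest_qp=0x12)
print(len(payload), payload.hex())
```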
6.2. Definition of SCONE-RoCEv2 Fields
* Rate (32 bits): The advised transmission rate, measured in Mbps.
This value can be calculated by network nodes based on congestion
metrics (queue depth, queuing delay, packet loss rate), or
directly derived from rate policies configured by administrators.
* Version (32 bits): Version number. The initial version defined in
this specification is 0x00000001. The version number will be
incremented for future backward-incompatible modifications.
* Destination QP ID (32 bits): Destination Queue Pair Identifier.
This field identifies the sender's flow. When generating a Rate
Advice Message, the network node copies the Destination QP value
from the BTH of the original RoCEv2 data packet into this field.
Upon reception, the sender matches this value against the
destination QP numbers of its active QPs to associate the Rate
Advice with the corresponding flow.
* Source QP ID (32 bits): Source Queue Pair Identifier. This field
is generally set to the QP number used by the receiver or the
network node.
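On the receiving side, a sender might parse and apply Rate Advice as sketched below. This is again illustrative: it assumes the ICRC has already been verified and stripped, discards messages with an unsupported Version (one reasonable reading of the backward-incompatibility rule above), and keeps flows in a hypothetical dictionary keyed by the destination QP number the sender uses for each flow, matching the Destination QP ID field.

```python
import struct
from typing import Dict, NamedTuple, Optional

RATE_ADVICE_OPCODE = 0x1D  # example value from the text; not yet assigned
SUPPORTED_VERSION = 1
BTH_LEN = 12
BODY_LEN = 16


class RateAdvice(NamedTuple):
    rate_mbps: int
    version: int
    dest_qp_id: int
    src_qp_id: int


def parse_rate_advice(udp_payload: bytes) -> Optional[RateAdvice]:
    """Parse BTH + Rate Advice Packet; the ICRC is assumed already removed."""
    if len(udp_payload) < BTH_LEN + BODY_LEN:
        return None
    if udp_payload[0] != RATE_ADVICE_OPCODE:
        return None  # not a Rate Advice Message
    advice = RateAdvice(*struct.unpack_from("!IIII", udp_payload, BTH_LEN))
    if advice.version != SUPPORTED_VERSION:
        return None  # unknown, possibly backward-incompatible version
    return advice


def apply_advice(advice: RateAdvice, flows_by_dest_qp: Dict[int, dict]) -> bool:
    """Attach the advised rate to the flow identified by Destination QP ID."""
    flow = flows_by_dest_qp.get(advice.dest_qp_id)
    if flow is None:
        return False  # no matching flow; ignore the advice
    flow["advised_bps"] = advice.rate_mbps * 1_000_000  # Mbps -> bit/s
    return True


msg = (bytes([RATE_ADVICE_OPCODE]) + bytes(11)
       + struct.pack("!IIII", 25_000, 1, 0x12, 0))
flows = {0x12: {}}
advice = parse_rate_advice(msg)
print(advice, apply_advice(advice, flows), flows)
```

Rejecting unknown versions outright is a conservative choice; since the Version field is only incremented for backward-incompatible changes, a sender cannot safely interpret the remaining fields of a newer version.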
7. Security Considerations
(TBD)
8. IANA Considerations
This document does not require any IANA actions.
9. Contributors
The following people have substantially contributed to this document:
Hongwei Yang
China Mobile
Beijing
China
Email: yanghongwei@chinamobile.com
Zhiqiang Li
China Mobile
Beijing
China
Email: lizhiqiangyjy@chinamobile.com
10. Informative References
[SCONE-PROTOCOL]
Thomson, M., "Standard Communication with Network Elements
(SCONE) Protocol", Work in Progress, Internet-Draft,
draft-ietf-scone-protocol-04, May 2025,
<https://datatracker.ietf.org/doc/html/draft-ietf-scone-
protocol-04>.
[DiLoCo] Douillard, A., Rush, K., and Y. Donchev, "Decoupled DiLoCo
for Resilient Distributed Pre-training", April 2026.
[INFINIBAND]
InfiniBand Trade Association, "InfiniBand Architecture
Specification Volume 1, Release 1.5", June 2020.
[DCQCN] Zhu, Y., et al., "Congestion Control for Large-Scale RDMA
Deployments", ACM SIGCOMM Proceedings, 2015.
[URDMA] Duan, X. D., Lu, L., Sun, T., Li, Z. Q., Yang, H. W., and
Z. P. Du, "URDMA technologies for wide-area high-throughput
network", ZTE Technology Journal, Volume 30, Issue 6,
Pages 23-30, June 2024.
Authors' Addresses
Guangyu Zhao
China Mobile
Beijing
China
Email: zhaoguangyu@chinamobile.com
Cheng Zhou
China Mobile
Beijing
China
Email: zhouchengyjy@chinamobile.com