Adaptive Routing Notification
draft-wh-rtgwg-adaptive-routing-arn-05
| Document | Type |
Expired Internet-Draft
(individual)
Expired & archived
|
|
|---|---|---|---|
| Authors | Haibo Wang , Xuesong Geng , Xiaohu Xu , Yinben Xia , Jie Dong , Hongyi Huang | ||
| Last updated | 2026-06-11 (Latest revision 2025-12-08) | ||
| RFC stream | (None) | ||
| Intended RFC status | (None) | ||
| Formats | |||
| Stream | Stream state | (No stream defined) | |
| Consensus boilerplate | Unknown | ||
| RFC Editor Note | (None) | ||
| IESG | IESG state | Expired | |
| Telechat date | (None) | ||
| Responsible AD | (None) | ||
| Send notices to | (None) |
This Internet-Draft is no longer active. A copy of the expired Internet-Draft is available in these formats:
Abstract
Large-scale supercomputing and AI data centers utilize multipath to implement load balancing and/or improve transport reliability. Adaptive routing (AR), widely used in direct topologies such as dragonfly, is growing popular in commodity data centers to dynamically adjust routing policies based on path congestion and failures. When congestion or failure occurs, the sensing node can not only apply AR locally but also send the congestion/failure information to other nodes in a timely and accurate manner to enforce AR on other nodes, thus avoiding exacerbating congestion on the reported path. This document specifies Adaptive Routing Notification (ARN), a general mechanism to proactively disseminate congestion detection and congestion elimination information for remote nodes to perform re-routing policies. Particularly for AI workloads like DeepSeek's MoE models that exhibit dynamic all-to-all communication patterns with bursty traffic characteristics, such mechanisms become crucial to enable immediate network response to transient congestion conflicts.
Authors
Haibo Wang
Xuesong Geng
Xiaohu Xu
Yinben Xia
Jie Dong
Hongyi Huang
(Note: The e-mail addresses provided for the authors of this Internet-Draft may no longer be valid.)