
Fully Adaptive Routing Ethernet using BGP
draft-xu-idr-fare-02

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft whose latest revision state is "Active".
Expired & archived
Authors Xiaohu Xu, Shraddha Hegde, Zongying He, Junjie Wang, Hongyi Huang, Qingliang Zhang, Hang Wu, Yadong Liu, Yinben Xia, Peilong Wang, Tiezheng
Last updated 2025-03-17 (Latest revision 2024-09-01)
RFC stream (None)
Stream state (No stream defined)
Consensus boilerplate Unknown
RFC Editor Note (None)
IESG state Expired
Telechat date (None)
Responsible AD (None)
Send notices to (None)

This Internet-Draft is no longer active.

Abstract

Large language models (LLMs) like ChatGPT have become increasingly popular in recent years due to their impressive performance in various natural language processing tasks. These models, which often consist of billions or even trillions of parameters, are built by training deep neural networks on massive amounts of text data. However, the training process for these models can be extremely resource-intensive, requiring the deployment of thousands or even tens of thousands of GPUs in a single AI training cluster. Three-stage or even five-stage Clos networks are therefore commonly adopted for AI networks. The non-blocking nature of the network becomes increasingly critical for large-scale AI models. Adaptive routing is therefore necessary to dynamically distribute traffic bound for the same destination over multiple equal-cost paths, based on the network capacity and even congestion information along those paths.
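
To make the idea of distributing traffic over equal-cost paths by capacity and congestion more concrete, the following is a minimal sketch, not the procedure defined by the draft. The Path fields, the weighting rule (capacity scaled by one minus congestion), and the names path_weights and pick_path are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Path:
    next_hop: str          # hypothetical next-hop label
    capacity_gbps: float   # assumed available path capacity
    congestion: float      # assumed congestion level in [0, 1]; 1 means fully congested


def path_weights(paths):
    """Return normalized weights proportional to capacity * (1 - congestion)."""
    raw = [p.capacity_gbps * (1.0 - p.congestion) for p in paths]
    total = sum(raw)
    if total == 0:
        # Every path is unusable by this metric: fall back to an even split.
        return [1.0 / len(paths)] * len(paths)
    return [r / total for r in raw]


def pick_path(paths, flow_hash):
    """Map a flow hash in [0, 1) onto a path using the cumulative weights."""
    cumulative = 0.0
    for path, weight in zip(paths, path_weights(paths)):
        cumulative += weight
        if flow_hash < cumulative:
            return path
    return paths[-1]  # guard against floating-point rounding
```

With two equal-capacity paths where one reports 50% congestion, this rule sends roughly two thirds of flows over the uncongested path. A real implementation would also need a source for the capacity and congestion values, which this sketch simply takes as inputs.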

Authors

Xiaohu Xu
Shraddha Hegde
Zongying He
Junjie Wang
Hongyi Huang
Qingliang Zhang
Hang Wu
Yadong Liu
Yinben Xia
Peilong Wang
Tiezheng

(Note: The e-mail addresses provided for the authors of this Internet-Draft may no longer be valid.)