Collective Communication Optimizations: Requirement and Analysis
draft-yao-tsvwg-cco-requirement-and-analysis-02
Document | Type | Expired Internet-Draft (individual); expired and archived | |
---|---|---|---|
Authors | Kehan Yao, Xu Shiping, Liu Chang, Yizhou Li, Hongyi Huang, Weifeng Wang, Dirk KUTSCHER | ||
Last updated | 2025-01-09 (Latest revision 2024-07-08) | ||
RFC stream | (None) | ||
Intended RFC status | (None) | ||
Stream | Stream state | (No stream defined) | |
Consensus boilerplate | Unknown | ||
RFC Editor Note | (None) | ||
IESG | IESG state | Expired | |
Telechat date | (None) | ||
Responsible AD | (None) | ||
Send notices to | (None) |
This Internet-Draft is no longer active.
Abstract
Generative AI applications depend on large-scale parallel computing clusters for model training and inference. Existing implementations of collective communication in parallel computing are built on top of RDMA, the most widely adopted AI transport protocol. However, One-to-Many, Many-to-One, and Many-to-Many collective operations all depend on the point-to-point transport semantics of RDMA, which inevitably increases bandwidth occupancy and transmission overhead. Emerging approaches to collective communication optimization focus on network-assisted collective acceleration and can work compatibly with RDMA. This document analyzes different technical schemes for network-assisted collective acceleration based on RDMA, and presents the gap between this work and current IETF standards, notably iWARP. Requirements for designing new standards are proposed accordingly.
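The bandwidth argument in the abstract can be illustrated with a deliberately simplified model. The sketch below is not taken from the draft: the function names, the 1 MB message size, and the 8-receiver group are all hypothetical, and it compares only a naive One-to-Many emulation (one point-to-point send per receiver) against an idealized case in which the network replicates a single copy. Real collective libraries typically use tree or ring schedules, which spread this cost differently.

```python
def sender_link_bytes_point_to_point(message_bytes: int, receivers: int) -> int:
    """Bytes crossing the sender's access link when a One-to-Many
    operation is emulated with one point-to-point send per receiver."""
    return message_bytes * receivers


def sender_link_bytes_network_replicated(message_bytes: int, receivers: int) -> int:
    """Bytes crossing the sender's access link in an idealized model where
    the sender emits one copy and the network replicates it downstream."""
    return message_bytes if receivers > 0 else 0


# Hypothetical workload: a 1 MB message delivered to 8 receivers.
message_bytes = 1_000_000
receivers = 8

p2p = sender_link_bytes_point_to_point(message_bytes, receivers)
replicated = sender_link_bytes_network_replicated(message_bytes, receivers)

print(f"point-to-point emulation: {p2p} bytes on the sender link")
print(f"network-replicated:       {replicated} bytes on the sender link")
print(f"ratio:                    {p2p // replicated}x")
```

Under these assumptions the sender-side load grows linearly with the group size in the point-to-point case and stays constant in the replicated case, which is the kind of overhead the abstract attributes to building collectives on point-to-point semantics.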
Authors
Kehan Yao
Xu Shiping
Liu Chang
Yizhou Li
Hongyi Huang
Weifeng Wang
Dirk KUTSCHER