RTGWG Y. Liu
Internet-Draft China Mobile
Intended status: Informational Z. Zhang
Expires: 16 August 2026 ZTE Corporation
J. Zhang
China Mobile
12 February 2026
Multicast Use Cases for Large Language Model Synchronization
draft-liu-rtgwg-llmsync-multicast-00
Abstract
Large Language Model (LLM) deployments are becoming increasingly
widespread, with inference services being the most common
application. This document discusses multicast use cases for
inference cloud services.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 16 August 2026.
Copyright Notice
Copyright (c) 2026 IETF Trust and the persons identified as the
document authors. All rights reserved.
Liu, et al. Expires 16 August 2026 [Page 1]
Internet-Draft Multicast for LLM Synchronization February 2026
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 2
2. LLM Synchronization . . . . . . . . . . . . . . . . . . . . . 2
3. Applying Multicast Technologies . . . . . . . . . . . . . . 4
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5
5. Security Considerations . . . . . . . . . . . . . . . . . . . 5
6. References . . . . . . . . . . . . . . . . . . . . . . . . . 5
6.1. Normative References . . . . . . . . . . . . . . . . . . 5
6.2. Informative References . . . . . . . . . . . . . . . . . 5
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 6
1. Introduction
With the rapid development of AI and the widespread application of
large language models (LLMs), inference is now the most frequently
used AI service. Different users may use different LLMs, and the
same user may query multiple LLMs simultaneously to obtain the best
answer. AI inference cloud providers offer large-scale real-time
inference, fine-tuning, and model optimization services on GPU cloud
platforms. However, the GPU infrastructure of an AI inference cloud
provider may span multiple cloud platforms and regions, which creates
significant deployment and operational challenges, including highly
concurrent model loading and severe cold-start latency.
This document discusses multicast use cases for inference cloud
services.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
2. LLM Synchronization
+------------------+
| Model Repository |
+--+-----+-----+---+
| | |
| | |
+----------------+ | +----------------+
| | |
| | |
+-------------------+ +-------------------+ +-------------------+
| +---------------+ | | +---------------+ | | +---------------+ |
| | Local Storage | | | | Local Storage | | | | Local Storage | |
| +---------------+ | | +---------------+ | | +---------------+ |
| | | | | |
| +---------------+ | | +---------------+ | | +---------------+ |
| | GPU Cluster | | | | GPU Cluster | | | | GPU Cluster | |
| +---------------+ | | +---------------+ | | +---------------+ |
| | | | | |
| GPU Cloud 1 | | GPU Cloud 2 | | GPU Cloud n |
+-------------------+ +-------------------+ +-------------------+
Figure 1: Model synchronization from a central repository to GPU clouds
Highly concurrent model loading refers to the peak-load and
concurrency challenge of downloading the same popular large model
simultaneously across dozens of GPU cloud platforms. A single
deployment of a model may involve 10 to 100+ replicas, each requiring
a complete copy of the model file. Hundreds of GPU servers within a
single cluster simultaneously downloading the same model file
(typically 70 GB to 1 TB in size) generate enormous bandwidth
demand and create I/O bottlenecks.
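To make the scale concrete, a back-of-the-envelope sketch (all
figures are illustrative assumptions, not measurements from the
scenario above) shows how much data the repository must emit with
per-replica unicast downloads versus a single multicast stream:

```python
# Rough illustration of why concurrent model loading stresses the network.
# Model size and replica count are assumed values for illustration only.

def unicast_bytes(model_gb: float, replicas: int) -> float:
    """Total data the repository sends if every replica pulls its own copy."""
    return model_gb * replicas

def multicast_bytes(model_gb: float) -> float:
    """With multicast, the repository sends one copy; the network replicates."""
    return model_gb

model_gb = 200.0   # assumed size, within the 70 GB - 1 TB range above
replicas = 100     # assumed number of GPU servers loading the model

print(unicast_bytes(model_gb, replicas))   # 20000.0 (GB leaving the repository)
print(multicast_bytes(model_gb))           # 200.0
```

The two-orders-of-magnitude gap at the source is the core motivation
for replicating inside the network rather than at the repository.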
Severe cold-start latency refers to the high delay of an initial
model deployment caused by slow download speeds. Replica startup
time is limited by inbound network bandwidth, which varies
significantly between cloud providers and is typically much lower
than a cluster's internal bandwidth. This significantly reduces the
download efficiency of large models in practice.
Highly concurrent model downloading is a typical multicast
application, with the following characteristics:
* Large data volume: a single model typically ranges from 70 GB to
1 TB, placing extremely high demands on network bandwidth.
* Short transmission time: because of cold-start latency
requirements, data transmission needs to complete as quickly as
possible. Transmission times beyond tens of minutes significantly
degrade the user experience; the shorter the time, the better.
Therefore, when applying multicast technology to this scenario, both
high bandwidth and low latency must be ensured.
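The transmission-time requirement can be sketched with simple
arithmetic (model sizes and link speeds below are assumptions for
illustration): at typical inbound link speeds, a large model already
takes on the order of an hour to transfer, which is exactly the
cold-start delay the scenario needs to avoid.

```python
def transfer_minutes(model_gb: float, link_gbps: float) -> float:
    """Minutes to move a model over one link, ignoring protocol overhead
    (a simplifying assumption)."""
    seconds = (model_gb * 8) / link_gbps   # GB -> gigabits, then / (Gb/s)
    return seconds / 60.0

# Assumed sizes and speeds, for illustration only.
print(round(transfer_minutes(70, 10), 1))    # 0.9  (70 GB over 10 Gb/s)
print(round(transfer_minutes(1000, 2), 1))   # 66.7 (1 TB over a 2 Gb/s inbound link)
```

The second case shows how a slow provider-ingress link alone can push
a single cold start past the "tens of minutes" threshold noted above.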
3. Applying Multicast Technologies
Given the need to conserve network bandwidth, ingress (head-end)
replication is not suitable for this scenario. PIM-SM, SR P2MP, or
BIER should be considered instead.
Protocol Independent Multicast - Sparse Mode (PIM-SM) [RFC7761] is a
traditional multicast technology. It is widely used in scenarios
where the receivers are relatively fixed, such as IPTV systems. When
the network topology of the multicast tree changes, a new multicast
tree must be established for each multicast stream via PIM-SM
signaling after the BGP/IGP protocols converge. The convergence time
of the multicast tree is therefore much longer than that of the IGP.
SR P2MP (Segment Routing Replication for Multipoint Service Delivery)
[I-D.ietf-pim-sr-p2mp-policy] is a newer tunneling technology that
carries multicast traffic over SR-MPLS or SRv6 (Segment Routing over
IPv6) tunnels. A controller or the ingress node calculates the path
of the multicast traffic and then installs a SID (Segment Identifier)
for the replication operation on each replication point in the
network. When multicast traffic enters the tunnel, it is replicated
and forwarded at each replication point according to that replication
SID. When the network topology changes, the controller or ingress
node must recalculate the replication points and install the
replication SID on the changed replication points, so that subsequent
multicast traffic is forwarded along the new path.
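The replication state described above can be pictured as a simple
mapping: at each replication point, a replication SID identifies the
set of downstream branches a packet is copied to. The sketch below
is conceptual only; the node names, topology, and SID value are
hypothetical and not taken from the SR P2MP specification.

```python
# Conceptual sketch of SR-P2MP replication state (hypothetical topology).
# At each node, a replication SID maps to the downstream branches that a
# packet carrying that SID is copied onto.

replication_state = {
    # node: {replication_sid: [downstream next hops]}
    "ingress": {16001: ["r1"]},
    "r1":      {16001: ["r2", "r3"]},            # r1 is a replication point
    "r2":      {16001: ["gpu-cloud-1"]},
    "r3":      {16001: ["gpu-cloud-2", "gpu-cloud-n"]},
}

def deliver(node: str, sid: int, leaves: list) -> None:
    """Follow the replication state recursively, collecting reached leaves."""
    branches = replication_state.get(node, {}).get(sid)
    if branches is None:           # no state for this SID: node is a leaf
        leaves.append(node)
        return
    for nxt in branches:           # one copy of the packet per branch
        deliver(nxt, sid, leaves)

leaves = []
deliver("ingress", 16001, leaves)
print(leaves)   # ['gpu-cloud-1', 'gpu-cloud-2', 'gpu-cloud-n']
```

A topology change is then visible as an edit to this state: the
controller rewrites the affected entries, and subsequent packets
follow the new branches.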
BIER (Bit Index Explicit Replication) [RFC8279] is an architecture
that provides optimal multicast forwarding through a "multicast
domain", without requiring intermediate routers to maintain any per-
flow state or to engage in an explicit tree-building protocol. BIER
is more flexible than PIM-SM and SR P2MP. When a link failure or
other issue occurs on the multicast forwarding path, BIER converges
together with the IGP, which is far faster than PIM-SM or SR P2MP.
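The stateless property comes from carrying the receiver set in the
packet itself: each egress router owns one bit, the packet carries a
bitstring of intended egresses, and each router forwards one masked
copy per neighbor. The following is a much-simplified sketch of that
forwarding procedure from RFC 8279; the topology, bit assignments,
and table contents are hypothetical.

```python
# Simplified sketch of BIER forwarding (after RFC 8279).  Each egress
# router (BFER) owns one bit; a packet carries a bitstring of intended
# egresses.  Each router holds a Bit Index Forwarding Table (BIFT):
# neighbor -> forwarding bit mask (the egress bits reachable via that
# neighbor).  Topology and bit layout here are hypothetical.

BIFT = {
    "A": {"B": 0b0011, "C": 0b0100},
    "B": {"egress1": 0b0001, "egress2": 0b0010},
    "C": {"egress3": 0b0100},
}

def forward(router: str, bitstring: int, reached: set) -> None:
    if router not in BIFT:             # no BIFT entry: an egress (BFER)
        reached.add(router)
        return
    remaining = bitstring
    for neighbor, fbm in BIFT[router].items():
        copy = remaining & fbm         # bits served via this neighbor
        if copy:
            forward(neighbor, copy, reached)
            remaining &= ~fbm          # never send the same bit twice

reached = set()
forward("A", 0b0101, reached)          # want egress1 (bit 0) and egress3 (bit 2)
print(sorted(reached))                 # ['egress1', 'egress3']
```

Because the BIFT is derived from the unicast routing table, a link
failure only requires the IGP to reconverge; no multicast tree has to
be rebuilt, which is the convergence advantage noted above.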
When applying multicast technology to large model synchronization
scenarios, if the model is synchronized to the same set of
destination GPU clouds each time, a multicast tree can be pre-
established (or the SR replication path pre-computed by the
controller), and PIM-SM or SR P2MP can be used for model copying.
If the destination GPU clouds differ for each model synchronization,
pre-establishing a multicast tree or path each time with PIM-SM or
SR P2MP may be inefficient, because tree establishment takes time.
In this case, BIER is the better choice.
4. IANA Considerations
This document has no IANA actions.
5. Security Considerations
This document introduces no new security considerations.
6. References
6.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC7761] Fenner, B., Handley, M., Holbrook, H., Kouvelas, I.,
Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent
Multicast - Sparse Mode (PIM-SM): Protocol Specification
(Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March
2016, <https://www.rfc-editor.org/info/rfc7761>.
[RFC8279] Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
Przygienda, T., and S. Aldrin, "Multicast Using Bit Index
Explicit Replication (BIER)", RFC 8279,
DOI 10.17487/RFC8279, November 2017,
<https://www.rfc-editor.org/info/rfc8279>.
6.2. Informative References
[I-D.ietf-pim-sr-p2mp-policy]
Parekh, R., Voyer, D., Filsfils, C., Bidgoli, H., and Z.
J. Zhang, "Segment Routing Point-to-Multipoint Policy",
Work in Progress, Internet-Draft,
draft-ietf-pim-sr-p2mp-policy-22, 4 September 2025,
<https://datatracker.ietf.org/doc/html/draft-ietf-pim-sr-
p2mp-policy-22>.
Authors' Addresses
Yisong Liu
China Mobile
China
Email: liuyisong@chinamobile.com
Zheng Zhang
ZTE Corporation
China
Email: zhang.zheng@zte.com.cn
Junye Zhang
China Mobile
China
Email: zhangjunye@chinamobile.com