Requirements Analysis of System and Network for Large Language Model Inference Service
draft-liu-nmrg-ai-llm-inference-requirements-02
| Section | Field | Value |
|---|---|---|
| Document | Type | Expired Internet-Draft (individual); expired & archived |
| | Authors | Liu Chang, Chuyi Guo |
| | Last updated | 2026-05-07 (latest revision 2025-11-03) |
| | RFC stream | (None) |
| | Intended RFC status | (None) |
| Stream | Stream state | (No stream defined) |
| | Consensus boilerplate | Unknown |
| | RFC Editor Note | (None) |
| IESG | IESG state | Expired |
| | Telechat date | (None) |
| | Responsible AD | (None) |
| | Send notices to | (None) |
This Internet-Draft is no longer active.
Abstract
With the rise of ChatGPT, DeepSeek, and other Large Language Models (LLMs), and with the proliferation of inference applications, inference serving for large user populations has become increasingly critical. However, because inference places extreme demands on computing power and communication, deploying LLM services at scale poses significant challenges. To address these challenges, different vendors have adopted diverse inference service architectures, such as vLLM, SGLang, and Mooncake. This paper surveys mainstream inference frameworks, summarizes their core design principles and research questions, and analyzes the challenges and requirements they impose on network management. The goal is to lay a foundation for defining a unified LLM inference architecture in the future.
Authors
(Note: The e-mail addresses provided for the authors of this Internet-Draft may no longer be valid.)