Requirements Analysis of System and Network for Large Language Model Inference Service
draft-liu-nmrg-ai-llm-inference-requirements-02

Document Type Expired Internet-Draft (individual)
Expired & archived
Authors Liu Chang, Chuyi Guo
Last updated 2026-05-07 (Latest revision 2025-11-03)
RFC stream (None)
Intended RFC status (None)
Stream state (No stream defined)
Consensus boilerplate Unknown
RFC Editor Note (None)
IESG state Expired
Telechat date (None)
Responsible AD (None)
Send notices to (None)

This Internet-Draft is no longer active. A copy of the expired Internet-Draft remains available from the IETF archive.

Abstract

With the rise of ChatGPT, DeepSeek, and other Large Language Models (LLMs), and with the proliferation of inference applications, serving inference to large numbers of users has become increasingly critical. However, because inference places extreme demands on computing power and communication, deploying LLM services at scale poses significant challenges. To address these challenges, different vendors have adopted diverse inference service architectures, such as vLLM, SGLang, and Mooncake. This document surveys mainstream inference frameworks, summarizes their core design principles and research questions, and analyzes the challenges and requirements they impose on network management. The goal is to lay a foundation for defining a unified LLM inference architecture in the future.

Authors

Liu Chang
Chuyi Guo

(Note: The e-mail addresses provided for the authors of this Internet-Draft may no longer be valid.)