
An Open, Decentralized, and Scalable Framework for Large Language Model Inference
draft-wang-cats-innetwork-infer-01

This document is an Internet-Draft (I-D). Anyone may submit an I-D to the IETF. This I-D is not endorsed by the IETF and has no formal standing in the IETF standards process.

Document	Type	Replaced Internet-Draft (individual) Expired & archived
	Authors	Hanling Wang, Qing Li, Yong Jiang, Mingwei Xu
	Last updated	2026-03-02
	Replaced by	draft-wang-cats-odsi
	RFC stream	(None)
	Intended RFC status	(None)
	Formats	txt html xml htmlized bibtex bibxml
Stream	Stream state	(No stream defined)
	Consensus boilerplate	Unknown
	RFC Editor Note	(None)
IESG	IESG state	Replaced by draft-wang-cats-odsi
	Telechat date	(None)
	Responsible AD	(None)
	Send notices to	(None)

This Internet-Draft is no longer active. A copy of the expired Internet-Draft is available in the following formats: txt, html, xml, htmlized, bibtex, bibxml.

Abstract

Large Language Model (LLM) inference is increasingly deployed as a networked service, yet existing deployments rely primarily on centralized infrastructure and trusted operators. Such designs limit openness, concentrate resource ownership, and constrain scalability to the capacity of individual providers. At the same time, LLM inference introduces execution characteristics (e.g., strict sequential dependencies, large intermediate activations, and tight latency requirements) that are not well supported by existing network, transport, or coordination mechanisms in open environments.

This document specifies an open, decentralized, and scalable framework for executing LLM inference across independently operated and mutually untrusted participants. The framework treats inference as a distributed, layer-wise execution process subject to explicit deadlines, rather than as a monolithic computation or best-effort service. It combines layer-aware activation transport and routing, decentralized coordination among heterogeneous compute resources, and security mechanisms that provide accountability and correctness without assuming trusted execution.

This document focuses on the architectural framework, design rationale, problem definition, challenges, and solution space of the Open, Decentralized, and Scalable Inference framework (ODSI). It does not specify concrete wire protocols, message formats, or protocol state machines. Such protocol-level specifications are to be defined in separate documents that build upon the framework described herein.
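
As an informal illustration of the layer-wise, deadline-bound execution model summarized above, the following Python sketch passes an activation through a sequence of stages and stops once an explicit deadline has passed. It is not taken from the draft, which defines no protocol or API: the names `Stage`, `run_pipeline`, and `DeadlineExceeded` are hypothetical, each stage stands in for an independently operated participant hosting a contiguous range of layers, and the network transport, routing, and verification steps are omitted entirely.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Assumption: an activation is modeled as a plain list of floats.
Activation = List[float]
Layer = Callable[[Activation], Activation]


class DeadlineExceeded(Exception):
    """Raised when the request deadline passes before inference completes."""


@dataclass
class Stage:
    """Hypothetical participant hosting a contiguous range of model layers."""
    name: str
    layers: Sequence[Layer]

    def run(self, activation: Activation) -> Activation:
        # Layers have strict sequential dependencies: each consumes
        # the output of the previous one.
        for layer in self.layers:
            activation = layer(activation)
        return activation


def run_pipeline(
    stages: Sequence[Stage],
    activation: Activation,
    deadline: float,
    clock: Callable[[], float] = time.monotonic,
) -> Activation:
    """Run stages in order, checking the deadline before each hand-off."""
    for stage in stages:
        if clock() >= deadline:
            raise DeadlineExceeded(f"deadline passed before stage {stage.name!r}")
        activation = stage.run(activation)
    return activation


if __name__ == "__main__":
    demo_stages = [
        Stage("participant-a", [lambda x: [v * 2 for v in x]]),
        Stage("participant-b", [lambda x: [v + 1 for v in x]]),
    ]
    result = run_pipeline(demo_stages, [1.0, 2.0], deadline=time.monotonic() + 60.0)
    print(result)
```

In this sketch the deadline is only checked at stage boundaries, which mirrors the idea that activations are handed from one participant to the next; a real design would also need to account for transfer time between participants.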

Authors

Hanling Wang
Qing Li
Yong Jiang
Mingwei Xu

