Network Working Group                                     Yixin Lin, Ed.
Internet-Draft                                        Chenchen Yang, Ed.
Intended status: Standards Track                         Kuan Zhang, Ed.
Expires: 3 September 2026                                  Xueli An, Ed.
                                                         Shaoyun Wu, Ed.
                                           Huawei Technologies Co., Ltd.
                                              Muhammad Awais Jadoon, Ed.
                                                Sebastian Robitzsch, Ed.
                                                InterDigital Europe Ltd.
                                                            2 March 2026

                 AI Agent Protocols for Multi-modality
                       draft-hw-protocol-agent-00

Abstract

   With the advancement of AI technologies, AI Agent traffic is expected
   to account for the majority of network traffic, driving increasing
   demand for high-quality multi-modal data transmission.  Current
   networks lack awareness of the diverse transmission quality
   requirements of the multi-modal data carried within a single traffic
   flow, leading to degraded service quality and inefficient utilization
   of network resources.  This document proposes methods that enable
   networks (e.g., mobile networks, transport networks) to recognize the
   characteristics and transmission requirements of AI Agent multi-modal
   data, and outlines the necessary capabilities and features of AI
   Agent protocols.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 3 September 2026.

Yixin Lin, et al.       Expires 3 September 2026                [Page 1]
Internet-Draft               Agent Protocol                   March 2026

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   3
   2.  Agent Protocol Gap Analysis . . . . . . . . . . . . . . . . .   3
     2.1.  Multi-modality of AI Agent Traffic  . . . . . . . . . . .   3
     2.2.  Indistinguishable Multi-modal AI Agent Traffic  . . . . .   3
   3.  AI Agent Traffic Transmission Requirement on Underlying
           Networks  . . . . . . . . . . . . . . . . . . . . . . . .   4
     3.1.  Quality of Service (QoS)  . . . . . . . . . . . . . . . .   5
     3.2.  Awareness of AI Agent Traffic Multi-modality  . . . . . .   5
   4.  Agent Protocols for Multi-modal Traffic . . . . . . . . . . .   5
     4.1.  Multiple Streams for Multi-modal Agent Traffic  . . . . .   5
       4.1.1.  Mapping Information Notification to Underlying
               Networks  . . . . . . . . . . . . . . . . . . . . . .   5
       4.1.2.  Inter-stream Relationship Indicator . . . . . . . . .   7
     4.2.  Single Stream with Multi-modality . . . . . . . . . . . .   7
   5.  Summary . . . . . . . . . . . . . . . . . . . . . . . . . . .  10
   6.  Reference . . . . . . . . . . . . . . . . . . . . . . . . . .  10
   7.  Security Considerations . . . . . . . . . . . . . . . . . . .  11
   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  11
   9.  Normative References  . . . . . . . . . . . . . . . . . . . .  11
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  11

1.  Introduction

   Agent protocols have garnered significant attention across diverse
   technical domains.  They are being introduced by open-source and
   standardization organizations, e.g., the A2A protocol by Google [1]
   and Agent protocol proposals in the IETF [2] [3] and ETSI [4].  For
   mobile telecommunications, 3GPP has also expressed substantial
   interest in incorporating Agent capabilities within the future 6G
   network.  Numerous use cases for Agent-centric scenarios have been
   documented within the 3GPP SA1 working group [5].  The 3GPP SA2
   working group has reached consensus that the 6G network should
   support Agentic network and Agent communication [6].  Adaptations to
   end-to-end Agent protocols are needed to better support AI Agent
   traffic transmission and to ensure the quality of such transmission,
   e.g., based on the characteristics of AI Agent traffic.
   Additionally, the 3GPP CT working groups are organizing studies on AI
   Agent protocols for the 6G network.  In this context, IETF AI Agent
   protocols are highly recommended as a reference [7].

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Agent Protocol Gap Analysis

2.1.  Multi-modality of AI Agent Traffic

   AI Agent traffic may consist of multiple "Parts" carrying multi-modal
   data (e.g., text, file, sensor data, structured data, video, audio,
   and image) [3].  Different Parts of the same AI Agent traffic may be
   delivered with different transmission modes (e.g., Streaming, Push
   Notification, Burst).

   Taking the Google A2A protocol as an example [1], a Message
   constitutes a communication unit between client and server, and
   Artifacts represent concrete outputs of task execution.  Both
   Messages and Artifacts transmit content composed of one or more
   Parts:

   *  A Part serves as a container for a discrete segment of
      communication content, and MUST contain exactly one content type:
      text (purely textual), raw (file-based: images, videos, etc.), url
      (resource locator), or data (structured blobs, e.g., JSON).
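
   The "exactly one content type" rule can be illustrated with a
   minimal sketch.  The class name Part and its field names mirror the
   four content types listed above, but the class itself is a
   hypothetical illustration and is not taken from any A2A SDK.

```python
from dataclasses import dataclass
from typing import Any, Optional

# The four content types named in the text above.
CONTENT_FIELDS = ("text", "raw", "url", "data")


@dataclass(frozen=True)
class Part:
    """Hypothetical container that carries exactly one kind of content."""
    text: Optional[str] = None
    raw: Optional[bytes] = None
    url: Optional[str] = None
    data: Optional[Any] = None

    def __post_init__(self) -> None:
        present = [f for f in CONTENT_FIELDS if getattr(self, f) is not None]
        if len(present) != 1:
            raise ValueError(
                f"a Part must carry exactly one content type, got {present}")

    @property
    def content_type(self) -> str:
        """Name of the single content field that is set."""
        return next(f for f in CONTENT_FIELDS if getattr(self, f) is not None)


# A message made of three Parts with different modalities.
message_parts = [
    Part(text="Summarize the clip"),
    Part(raw=b"\x00\x01"),
    Part(data={"fps": 30}),
]
print([p.content_type for p in message_parts])
```

   Constructing a Part with zero or several content fields raises
   ValueError in this sketch, which is one way to enforce the rule at
   the sender.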

2.2.  Indistinguishable Multi-modal AI Agent Traffic

   There are two distinct methods by which an AI Agent client or server,
   acting as the source, can encapsulate multi-modal data.

   a) The multi-modal data of an AI Agent traffic can be encapsulated
   into multiple lower-layer streams (e.g., HTTP streams, MoQ streams,
   QUIC streams):

   *  The underlying network lacks awareness that these streams belong
      to the same AI Agent traffic.  Consequently, while network nodes
      (e.g., core network user plane functions, radio access network
      base stations) can guarantee the QoS of each individual stream
      during data forwarding, the overall performance of the traffic
      may still degrade (e.g., its end-to-end latency may increase).

   *  Moreover, some scenarios, such as multi-Agent systems, may require
      the multi-modal data to arrive at different receiving Agent
      endpoints as simultaneously as possible.  If the underlying
      network lacks this awareness, existing techniques are prone to
      significant degradation of overall performance (e.g., increased
      latency).

   b) The multi-modal data of an AI Agent traffic can be encapsulated
   within a single lower-layer stream (e.g., an HTTP stream, MoQ stream,
   or QUIC stream).  This stream exhibits inherent multi-modal
   characteristics:

   *  This manifests as significant variance in Part sizes (e.g., text
      snippets vs. video files) and divergent quality of service (QoS)
      criticality across Parts, e.g., regarding transmission urgency and
      latency tolerance thresholds.

   *  Limitation: Because the contents of different Parts are
      encapsulated within a single lower-layer stream (e.g., as the
      payload of HTTP, MoQ, or QUIC), the underlying network (e.g.,
      mobile network, transport network) remains unaware of the distinct
      Parts and their individual requirements, such as specific QoS
      needs.

   _The following figure illustrates the interaction between the Agent
   Client and the Agent Server via the underlying network:_

   +--------+                  +--------+
   | Agent  |                  | Agent  |
   | Client |                  | Server |
   +--------+                  +--------+
       |                            |
       |       +------------+       |
       +-------| Underlying |-------+
               |  Network   |
               +------------+

   _Figure 1.  Interaction between Agent Client and Agent Server via
   underlying Network._

3.  AI Agent Traffic Transmission Requirement on Underlying Networks

3.1.  Quality of Service (QoS)

   Multi-modal AI Agent traffic should be transmitted appropriately by
   the underlying network.  Taking the 3GPP mobile network as an example
   of an underlying network, the QoS framework controls and manages the
   transmission of traffic to ensure performance.  It governs how
   diverse traffic is treated to meet specific performance requirements
   (e.g., latency, reliability, data rate, packet loss) [5], [6], [8].
   It operates primarily through QoS Flows, which are granular data
   streams bound to specific QoS parameters and characteristics.  These
   parameters and characteristics indicate the packet forwarding
   treatment (e.g., scheduling weights, admission thresholds, queue
   management) applied throughout the network, from the User Equipment
   (UE) to the mobile network (e.g., core network, radio access
   network).  The multi-modal data of an AI Agent traffic should be
   transmitted with differentiated QoS to improve overall performance
   and resource utilization.

3.2.  Awareness of AI Agent Traffic Multi-modality

   The underlying network should be aware of the AI Agent traffic
   characteristics.  Different transmission treatment policies may be
   configured and bound to the different multi-modal data of an AI
   Agent traffic based on their traffic characteristics (e.g., delay
   budget, priority, error rate).  Taking the mobile network as an
   example, the Application Server may influence QoS treatment by
   notifying the mobile network functions of its service requirements.
   Based on this, the mobile network provisions and configures the
   corresponding QoS enforcement policies.  Subsequently, the mobile
   network maps incoming packets of application traffic to the
   appropriate QoS Flows by matching the traffic information (typically
   the IP 5-tuple) against the enforcement policies.  Similarly, the AI
   Agent client and server need to notify the underlying network of the
   necessary information about their AI Agent traffic for better
   performance.
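
   The packet-to-QoS-Flow mapping described above can be sketched as a
   simple classifier.  This is an illustration only: the names
   FiveTuple, EnforcementPolicy, classify and DEFAULT_QOS_FLOW_ID, the
   precedence rule, and the example addresses are assumptions and do
   not reproduce any 3GPP-defined data model.

```python
from dataclasses import dataclass
from typing import List, Optional

FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port", "protocol")
DEFAULT_QOS_FLOW_ID = 0  # hypothetical flow for traffic with no matching policy


@dataclass(frozen=True)
class FiveTuple:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


@dataclass(frozen=True)
class EnforcementPolicy:
    """A packet filter bound to a QoS Flow; None acts as a wildcard."""
    qos_flow_id: int
    precedence: int  # assumption: lower values are evaluated first
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    protocol: Optional[str] = None

    def matches(self, pkt: FiveTuple) -> bool:
        return all(
            getattr(self, f) is None or getattr(self, f) == getattr(pkt, f)
            for f in FIELDS
        )


def classify(pkt: FiveTuple, policies: List[EnforcementPolicy]) -> int:
    """Return the QoS Flow of the first matching policy, by precedence."""
    for policy in sorted(policies, key=lambda p: p.precedence):
        if policy.matches(pkt):
            return policy.qos_flow_id
    return DEFAULT_QOS_FLOW_ID


policies = [
    EnforcementPolicy(qos_flow_id=1, precedence=10,
                      dst_ip="198.51.100.7", dst_port=443, protocol="udp"),
    EnforcementPolicy(qos_flow_id=2, precedence=20,
                      dst_ip="198.51.100.7", protocol="tcp"),
]
video = FiveTuple("10.0.0.2", "198.51.100.7", 50000, 443, "udp")
text = FiveTuple("10.0.0.2", "198.51.100.7", 50001, 8443, "tcp")
other = FiveTuple("10.0.0.2", "203.0.113.9", 50002, 53, "udp")
print(classify(video, policies), classify(text, policies),
      classify(other, policies))
```

   Because the classifier sees only the 5-tuple, two Parts that share
   one stream necessarily land in the same QoS Flow, which is the gap
   that Section 2.2 describes.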

4.  Agent Protocols for Multi-modal Traffic

4.1.  Multiple Streams for Multi-modal Agent Traffic

4.1.1.  Mapping Information Notification to Underlying Networks

   Using method a) described in Section 2.2, a source Agent (client or
   server) transmits the multi-modal data of an AI Agent traffic over
   distinct lower-layer streams based on modality and QoS requirements.

   *  For example, a source Agent transmits the multi-modal data of an
      AI Agent traffic over distinct HTTP streams.  Each HTTP stream is
      characterized by a unique IP 5-tuple and carries data of only one
      modality.

   *  As another example, the multi-modal data of an AI Agent traffic
      is transmitted via distinct transport-layer streams (e.g., MoQ
      streams, QUIC streams) or connections (e.g., TCP connections).

   The source Agent needs to notify the underlying network of the
   mapping information so that both share the same view of the traffic.
   Based on the mapping information, the underlying network can
   identify the correlation among the different lower-layer streams and
   apply the corresponding QoS guarantees.
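
   As a rough illustration of what such mapping information might look
   like, the sketch below serializes one entry per lower-layer stream
   and lets the network index the entries by 5-tuple.  The JSON schema,
   the field names and the helper functions are all hypothetical; this
   document does not define a notification format or the interface over
   which it would be sent.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StreamMapping:
    """One lower-layer stream and the modality it carries (assumed schema)."""
    modality: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def build_notification(traffic_id: str, streams: List[StreamMapping]) -> str:
    """Source Agent side: serialize the mapping information as JSON."""
    return json.dumps({"traffic_id": traffic_id,
                       "streams": [asdict(s) for s in streams]})


def index_by_five_tuple(notification: str) -> Dict[Tuple, Tuple[str, str]]:
    """Network side: map each 5-tuple to (traffic_id, modality)."""
    message = json.loads(notification)
    return {
        (s["src_ip"], s["dst_ip"], s["src_port"], s["dst_port"],
         s["protocol"]): (message["traffic_id"], s["modality"])
        for s in message["streams"]
    }


traffic_id = uuid.uuid4().hex  # identifies one AI Agent traffic
notification = build_notification(traffic_id, [
    StreamMapping("text", "10.0.0.2", "198.51.100.7", 50001, 443, "tcp"),
    StreamMapping("video", "10.0.0.2", "198.51.100.7", 50000, 443, "udp"),
])
lookup = index_by_five_tuple(notification)
print(lookup[("10.0.0.2", "198.51.100.7", 50000, 443, "udp")][1])
```

   With such an index, a network node that observes either 5-tuple can
   tell that both streams belong to the same traffic and which modality
   each one carries.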

   The AI Agent protocols in the Agent client and server should have the
   capabilities and features to perform the actions above, e.g., mapping
   multi-modal data of Agent traffic to different lower layer streams,
   notifying the mapping relationship to the underlying network (e.g.,
   via control plane of mobile network), and notifying necessary
   information to lower layers to enable the latter to encapsulate Agent
   traffic assistance information.  The capabilities and features of the
   AI Agent protocol should be highlighted and standardized, e.g., in
   IETF working group(s).

   Moreover, it is possible that the Agent protocol layer and/or lower
   layers need some enhancements, e.g., to encapsulate AI Agent
   assistance information (e.g., traffic characteristics, multi-modal
   data types, correlation information).

+------------------------------------------------------+
|                    Agent Protocol                    |
|               {Correlation Information}              |
+------------------------------------------------------+   --+
| e.g., HTTP Protocol       | e.g., MoQ                |     |
| {Correlation Information} | {Correlation Information}|     |
+------------------------------------------------------+     |-- Lower Layers
|                   Transport Protocol                 |     |
|               {Correlation Information}              |     |
+------------------------------------------------------+   --+

   _Figure 2.  Enhancements in Agent protocol layer and/or lower
   layers._

4.1.2.  Inter-stream Relationship Indicator

   As outlined in Section 2.2, there is a potential concern associated
   with correlated multi-stream transmission.  To address it, an
   inter-stream relationship indicator, as an example of the correlation
   information, is introduced to explicitly signal that multiple
   distinct streams belong to the same AI Agent traffic.  This indicator
   could take the form of a unique ID that is determined by the source
   Agent(s) or assigned by the underlying network.  The sending entity
   encapsulates this indicator within the headers of all associated
   streams.  When processing streams containing this indicator, the
   transmission nodes of the underlying network can leverage it to
   perform coordinated transmission operations.  For example, base
   stations in a mobile network can hold data from related streams to
   facilitate synchronization, and apply scheduling algorithms that
   consider the temporal relationship between streams.  This enables the
   base station to orchestrate the transmission of data from these
   related streams to minimize the variance in arrival time at the
   receiver(s), so that data arrives as simultaneously as possible at
   the intended endpoint(s).
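
   The "hold data from related streams" behaviour can be sketched in
   its simplest form: delay every stream so that it arrives together
   with the slowest one.  The function name hold_times_ms and the
   latency estimates are assumptions made for this example; a real
   scheduler would also have to respect each stream's own delay budget.

```python
from typing import Dict


def hold_times_ms(expected_latency_ms: Dict[str, float]) -> Dict[str, float]:
    """Return how long to hold each stream so all arrive with the slowest."""
    if not expected_latency_ms:
        return {}
    target = max(expected_latency_ms.values())
    return {stream: target - latency
            for stream, latency in expected_latency_ms.items()}


# Hypothetical delivery-latency estimates for three streams that carry
# the same inter-stream relationship indicator.
estimates = {"text": 5.0, "audio": 20.0, "video": 35.0}
holds = hold_times_ms(estimates)
arrivals = {s: estimates[s] + holds[s] for s in estimates}
print(holds)     # {'text': 30.0, 'audio': 15.0, 'video': 0.0}
print(arrivals)  # every stream arrives at 35.0 ms
```

   In this sketch the arrival time variance drops to zero at the cost
   of delaying the faster streams, which is the trade-off a coordinated
   scheduler would need to weigh.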

   This inter-stream relationship indicator can be encapsulated in the
   header of Agent Protocol Layer or lower layers, e.g., HTTP Layer,
   Transport Layer, etc.

   +-------------------------------------------------+
   | POST /rpc/ HTTP/2.0                             |
   +-------------------------------------------------+
   | Host: agent.example.com                         |
   | Content-Type: application/json                  |
   | Content-Length: xxx                             |
   | Correlation Information: Relationship indicator |
   +-------------------------------------------------+

   _Figure 3. *Correlation Information* in the HTTP header._
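
   A minimal sketch of how a sender could stamp the same indicator on
   every associated stream, and how a node that can read the header
   could group the streams, is given below.  The field name
   "Agent-Correlation-Id" is a placeholder chosen for this example; no
   such header field is defined by this document or registered for
   HTTP.

```python
import uuid
from collections import defaultdict
from typing import Dict, List

CORRELATION_HEADER = "Agent-Correlation-Id"  # hypothetical field name


def new_relationship_indicator() -> str:
    """Source Agent side: pick a unique ID for one AI Agent traffic."""
    return uuid.uuid4().hex


def build_headers(indicator: str, content_type: str) -> Dict[str, str]:
    """Headers for one stream, carrying the shared indicator."""
    return {
        "Host": "agent.example.com",
        "Content-Type": content_type,
        CORRELATION_HEADER: indicator,
    }


def group_streams(observed: List[Dict[str, str]]) -> Dict[str, List[int]]:
    """Network side: group stream indexes by indicator, skipping others."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, headers in enumerate(observed):
        indicator = headers.get(CORRELATION_HEADER)
        if indicator is not None:
            groups[indicator].append(index)
    return dict(groups)


indicator = new_relationship_indicator()
observed = [
    build_headers(indicator, "application/json"),
    {"Host": "other.example.com", "Content-Type": "text/plain"},
    build_headers(indicator, "video/mp4"),
]
print(group_streams(observed) == {indicator: [0, 2]})  # True
```

   The sketch assumes the node can read the header at all; how the
   indicator is exposed to the underlying network is outside its scope.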

4.2.  Single Stream with Multi-modality

   Using method b) mentioned in 2.2, the source Agent can encapsulate
   multi-modal data of an AI Agent traffic (e.g., as separate Parts)
   within a single lower layer stream.  One of the limitations of this
   method is that the underlying network remains unaware of individual
   Parts and their associated QoS requirements.

   To address this limitation, the source Agent is required to perform
   Part segmentation during encapsulation: it should separate different
   Parts with clear boundaries and embed supplementary metadata (e.g.,
   traffic information, QoS requirements) corresponding to each Part
   within the packet header or sub-headers.  This supplementary metadata
   enables the underlying network to identify assistance information
   about specific Parts (e.g., their different traffic characteristics
   and respective QoS requirements).

   a) In one implementation, supplementary metadata of each Part can be
   encapsulated in the packet header (e.g., Agent protocol header, or
   lower layer header) as in Figure 4.  For example, the header contains
   fields such as:

   *  the total number of Parts,

   *  the sequence number of each Part together with the corresponding
      traffic information (e.g., size, modality, and transmission mode
      of the Part), QoS requirements (e.g., latency, error rate,
      priority, privacy), etc.

   The contents of the different Parts are encapsulated in the payload
   in the same order as their supplementary metadata in the header.

+----------------------------+-------------+-------------+-------------+
|Header{                     |             |             |             |
|   Number of Parts,         |             |             |             |
|   {Part 1,                 |             |             |             |
|      {Traffic information, |             |             |             |
|       QoS requirements},   |   Payload   |   Payload   |     ...     |
|    Part 2,                 |   (Part1)   |   (Part2)   |             |
|      {...},                |             |             |             |
|   ...                      |             |             |             |
|   }                        |             |             |             |
|}                           |             |             |             |
+----------------------------+-------------+-------------+-------------+

   _Figure 4.  Supplementary metadata in the *header*._
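
   One possible byte layout for the header of Figure 4 is sketched
   below.  The field widths, the modality codes and the use of a single
   priority value to stand in for the QoS requirements are assumptions
   made for this example; this document does not define a wire format.

```python
import struct
from typing import List, Tuple

# Assumed layout: 1-byte Part count, then one 7-byte entry per Part.
COUNT = struct.Struct("!B")
PART_META = struct.Struct("!BBBI")  # sequence no., modality, priority, size
MODALITY_CODES = {"text": 0, "audio": 1, "video": 2, "image": 3}


def encode(parts: List[Tuple[str, int, bytes]]) -> bytes:
    """Build header (metadata of all Parts) followed by payloads in order."""
    header = COUNT.pack(len(parts))
    for seq, (modality, priority, payload) in enumerate(parts, start=1):
        header += PART_META.pack(seq, MODALITY_CODES[modality], priority,
                                 len(payload))
    return header + b"".join(payload for _, _, payload in parts)


def read_metadata(pdu: bytes) -> List[Tuple[int, int, int, int, int]]:
    """Read only the header; return (seq, modality, priority, offset, size)."""
    (count,) = COUNT.unpack_from(pdu, 0)
    entries = [PART_META.unpack_from(pdu, COUNT.size + i * PART_META.size)
               for i in range(count)]
    offset = COUNT.size + count * PART_META.size  # first payload byte
    located = []
    for seq, modality, priority, size in entries:
        located.append((seq, modality, priority, offset, size))
        offset += size
    return located


pdu = encode([("text", 1, b"hello"), ("video", 5, b"\x00" * 1000)])
for seq, modality, priority, offset, size in read_metadata(pdu):
    print(seq, modality, priority, offset, size)
```

   Because all metadata sits at the front, a node can locate every Part
   by parsing a fixed-size prefix without scanning the payloads.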

   b) In another implementation, supplementary metadata of each Part can
   be encapsulated in the sub-header of the sub-PDU as in Figure 5.  For
   example, each sub-header contains fields such as:

   *  the sequence number of the Part together with the corresponding
      traffic information (e.g., size, modality, and transmission mode
      of the Part), QoS requirements (e.g., latency, error rate,
      priority, privacy), etc.

   The contents of each Part, in turn, are encapsulated in the payload
   that follows the corresponding sub-header.  In this case, a common
   header (e.g., in front of the sub-PDUs) can contain a field that
   indicates the total number of Parts of an AI Agent traffic.

                       +------------------------+
                       |Sub-header{             |
                       |   Part ID / SN,        |
                       |   Traffic information, |
                       |   QoS requirements     |
                       |}                       |
                       +------------------------+
                       |                       /
                       |                    /
                       |                 /
                       |              /
                       |           /
                       +------------------+  +------------------+
                       |Sub-header|Payload|  |Sub-header|Payload|
                       +------------------+  +------------------+
                       |                 /  /                  /
                       |                / /                   /
                       |               //                    /
   +---------------------------------------------------------------+
   |       Header      |               |                     |     |
   | number of subPDUs |     subPDU    |       subPDU        | ... |
   +---------------------------------------------------------------+

   _Figure 5.  Supplementary metadata in the *sub-headers*._
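
   The sub-header variant of Figure 5 can be sketched in the same
   spirit.  As before, the field widths, the numeric modality codes and
   the single priority value are assumptions for illustration and do
   not represent a defined format.

```python
import struct
from typing import Iterator, List, Tuple

COMMON_HEADER = struct.Struct("!B")  # assumed: number of sub-PDUs
SUB_HEADER = struct.Struct("!BBBI")  # Part SN, modality, priority, size


def encode_sub_pdus(parts: List[Tuple[int, int, bytes]]) -> bytes:
    """Emit the common header, then sub-header + payload for each Part."""
    out = COMMON_HEADER.pack(len(parts))
    for sn, (modality, priority, payload) in enumerate(parts, start=1):
        out += SUB_HEADER.pack(sn, modality, priority, len(payload))
        out += payload
    return out


def iter_sub_pdus(pdu: bytes) -> Iterator[Tuple[int, int, int, bytes]]:
    """Walk the sub-PDUs, yielding (SN, modality, priority, payload)."""
    (count,) = COMMON_HEADER.unpack_from(pdu, 0)
    offset = COMMON_HEADER.size
    for _ in range(count):
        sn, modality, priority, size = SUB_HEADER.unpack_from(pdu, offset)
        offset += SUB_HEADER.size
        yield sn, modality, priority, pdu[offset:offset + size]
        offset += size


# Modality codes 0 (text) and 2 (video) are arbitrary example values.
pdu = encode_sub_pdus([(0, 1, b"hello"), (2, 5, b"\x00" * 8)])
for sn, modality, priority, payload in iter_sub_pdus(pdu):
    print(sn, modality, priority, len(payload))
```

   Compared with the single-header layout, each Part here is
   self-describing, so a sender can emit sub-PDUs as Parts become
   available, while a node must walk the sub-headers one by one to
   reach a later Part.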

   Moreover, the source Agent may also inform the underlying network in
   advance about the supplementary metadata (e.g., traffic information,
   QoS requirements) corresponding to each distinct Part.  The
   underlying network then provisions QoS parameters for each Part.
   Upon receiving the stream, the underlying network can distinguish
   these multi-modal Parts based on their specific characteristics and
   associated QoS requirements.

   When data arrives at the receiving endpoint, an entity of the
   underlying network can reassemble the data packets associated with
   an AI Agent traffic before delivering them to the receiving Agent
   client/server.  Alternatively, the Agent protocol layer entity in
   the receiving Agent client/server can reassemble the data packets.
   Which option is preferable is left for further study.

5.  Summary

   This document addresses the challenge that the underlying network
   lacks awareness of the multi-modality of AI Agent traffic.  It
   describes two ways for the source Agent client or server to
   encapsulate multi-modal data: one is to encapsulate the data into
   multiple streams, and the other is to encapsulate it into a single
   stream.  In the former case, the source Agent notifies the underlying
   network of the stream correlation information.  In the latter case,
   the source Agent separates the multi-modal data into different Parts
   and embeds supplementary metadata (e.g., traffic information and QoS
   requirements) within the header and/or sub-headers of the Agent
   protocol layer or lower layers.  With both methods, upon receiving
   the stream(s), the underlying network can apply differentiated
   treatment to individual Parts, e.g., based on their specific QoS
   requirements.

6.  Reference

   [1] A2A Protocol, "Specification",
   https://a2a-protocol.org/v0.2.5/specification/, 2025.

   [2] Jonathan Rosenberg, Cullen Fluffy Jennings, "Framework, Use
   Cases and Requirements for AI Agent Protocols",
   https://datatracker.ietf.org/doc/draft-rosenberg-ai-protocols/,
   2025.05.

   [3] Chenchen Yang, Huanhuan Huang, Arashmid Akhavain, Faye Liu,
   Xueli An, Weijun Xing, Jinyan Li, Aijun Wang, Yang Wencong,
   "Requirements and Enabling Technologies of Agent Protocols for 6G
   Networks".

   [4] ETSI ENI ISG 055 Early Draft, "Use Cases and Requirements for AI
   Agents based Core Network", 2025.06.04.

   [5] 3GPP TR 22.870 (V1.1.0), "Study on 6G Use Cases and Service
   Requirements; Stage 1 (Release 20)".

   [6] 3GPP TR 23.801-01 (V0.3.0), "Study on Architecture for 6G System;
   Stage 2 (Release 20)".

   [7] C3-260051, C4-260225, C1-260122, "New SID on Study on the
   Protocol for AI in the 6G System", 2026.02.

   [8] 3GPP TS 23.501 (V20.0.0), "System architecture for the 5G System
   (5GS); Stage 2 (Release 20)".

7.  Security Considerations

   This document does not introduce any new security considerations.

8.  IANA Considerations

   This document has no IANA actions.

9.  Normative References

   [RFC8986]  Filsfils, C., Ed., Camarillo, P., Ed., Leddy, J., Voyer,
              D., Matsushima, S., and Z. Li, "Segment Routing over IPv6
              (SRv6) Network Programming", RFC 8986,
              DOI 10.17487/RFC8986, February 2021,
              <https://www.rfc-editor.org/rfc/rfc8986>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

Authors' Addresses

   Yixin Lin (editor)
   Huawei Technologies Co., Ltd.
   Shanghai
   China
   Email: linyixin6@huawei.com

   Chenchen Yang (editor)
   Huawei Technologies Co., Ltd.
   Shanghai
   China
   Email: yangchenchen7@huawei.com

   Kuan Zhang (editor)
   Huawei Technologies Co., Ltd.
   Shanghai
   China
   Email: zhangkuan3@huawei.com

   Xueli An (editor)
   Huawei Technologies Co., Ltd.
   Munich
   Germany
   Email: Xueli.An@huawei.com

   Shaoyun Wu (editor)
   Huawei Technologies Co., Ltd.
   Shanghai
   China
   Email: wushaoyun@huawei.com

   Muhammad Awais Jadoon (editor)
   InterDigital Europe Ltd.
   London
   United Kingdom
   Email: Muhammad.AwaisJadoon@InterDigital.com

   Sebastian Robitzsch (editor)
   InterDigital Europe Ltd.
   London
   United Kingdom
   Email: Sebastian.Robitzsch@InterDigital.com
