MoQ Multimodal Feedback
draft-jiang-moq-multimodal-feedback-00
| Field | Value |
|---|---|
| Document type | Active Internet-Draft (individual) |
| Authors | Minghui Jiang, Yanmei Liu, Ronghua Wu |
| Last updated | 2026-03-16 |
| RFC stream | (None) |
| Intended RFC status | (None) |
| Stream state | (No stream defined) |
| Consensus boilerplate | Unknown |
| RFC Editor Note | (None) |
| IESG state | I-D Exists |
| Telechat date | (None) |
| Responsible AD | (None) |
| Send notices to | (None) |
Media Over QUIC M. Jiang
Internet-Draft Y. Liu
Intended status: Standards Track Alibaba Inc.
Expires: 18 September 2026 R. Wu
Ant Group.
17 March 2026
Abstract
This document defines an extension to Media over QUIC Transport
(MOQT) that enables MoQ receivers to report delivery quality
information for media Objects to senders. The MoQ layer synthesizes
this MoQ Multimodal Feedback (MMF) with local congestion control (CC)
output to compute control decisions such as bitrate, frame rate, and
pacing, and informs the CC algorithm module of them via a cross-layer
control interface. This mechanism reuses the MOQT Track/Object data
model without introducing new control message types. While QUIC ACK
and reception timestamp extensions continue to provide per-packet CC
signals, this mechanism adds per-Object media semantic feedback when
the MMF extension is negotiated and enabled.
Discussion Venues
This note is to be removed before publishing as an RFC.
Discussion of this document takes place on the Media Over QUIC
Working Group mailing list (moq@ietf.org), which is archived at
https://mailarchive.ietf.org/arch/browse/moq/.
Source for this draft and an issue tracker can be found at
https://github.com/Yanmei-Liu/draft-moq-multimodal-feedback.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Jiang, et al. Expires 18 September 2026 [Page 1]
Internet-Draft MoQ Multimodal Feedback March 2026
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 18 September 2026.
Copyright Notice
Copyright (c) 2026 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Conventions and Definitions . . . . . . . . . . . . . . . 4
2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 5
3. Architecture . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1. Three-Layer Architecture: Application / MoQ /
Transport . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1.1. Application Layer: Media Sources . . . . . . . . . . 5
3.1.2. MoQ Layer: Semantic Hub . . . . . . . . . . . . . . . 6
3.1.3. Transport Layer: CC Algorithms . . . . . . . . . . . 7
3.2. Dual-Layer Feedback Model: QUIC receive-ts and MMF . . . 7
3.2.1. QUIC receive-ts Requirements . . . . . . . . . . . . 8
3.3. Cross-Layer Control Interface . . . . . . . . . . . . . . 8
3.4. Bidirectional MMF: Output Feedback and Input Feedback . . 10
4. Feedback Track . . . . . . . . . . . . . . . . . . . . . . . 10
4.1. Track Definition . . . . . . . . . . . . . . . . . . . . 10
4.2. Track Naming . . . . . . . . . . . . . . . . . . . . . . 11
4.3. Track Establishment and Lifecycle . . . . . . . . . . . . 11
4.4. Transport and Priority . . . . . . . . . . . . . . . . . 12
5. Feedback Report Format . . . . . . . . . . . . . . . . . . . 12
5.1. Report Structure . . . . . . . . . . . . . . . . . . . . 13
5.2. Object Entry . . . . . . . . . . . . . . . . . . . . . . 13
5.2.1. Object ID . . . . . . . . . . . . . . . . . . . . . . 13
5.2.2. Status . . . . . . . . . . . . . . . . . . . . . . . 14
5.2.3. Receive Timestamp Delta . . . . . . . . . . . . . . . 14
5.3. Delivery Status Codes . . . . . . . . . . . . . . . . . . 14
5.3.1. RECEIVED_LATE Determination . . . . . . . . . . . . . 15
5.3.2. NOT_RECEIVED Reporting Timing . . . . . . . . . . . . 15
5.3.3. PARTIALLY_RECEIVED . . . . . . . . . . . . . . . . . 15
5.4. Summary Stats Block . . . . . . . . . . . . . . . . . . . 16
5.4.1. Report Interval . . . . . . . . . . . . . . . . . . . 16
5.4.2. Total Objects Evaluated . . . . . . . . . . . . . . . 16
5.4.3. Objects Received . . . . . . . . . . . . . . . . . . 17
5.4.4. Objects Received Late . . . . . . . . . . . . . . . . 17
5.4.5. Objects Lost . . . . . . . . . . . . . . . . . . . . 17
5.4.6. Avg Inter-Arrival Delta . . . . . . . . . . . . . . . 17
5.5. Optional Media Metrics . . . . . . . . . . . . . . . . . 18
5.6. Report Size Control . . . . . . . . . . . . . . . . . . . 20
5.6.1. Encoding Example . . . . . . . . . . . . . . . . . . 21
6. Negotiation . . . . . . . . . . . . . . . . . . . . . . . . . 22
6.1. Setup Parameter . . . . . . . . . . . . . . . . . . . . . 22
6.2. Capability Negotiation Rules . . . . . . . . . . . . . . 23
6.3. Behavior When Parameter Not Declared . . . . . . . . . . 23
6.4. Runtime Capability Change . . . . . . . . . . . . . . . . 24
7. Receiver Behavior . . . . . . . . . . . . . . . . . . . . . . 24
7.1. Arrival Time Recording . . . . . . . . . . . . . . . . . 24
7.2. Delivery Status Determination . . . . . . . . . . . . . . 24
7.3. MMF Generation Frequency . . . . . . . . . . . . . . . . 24
7.4. Object Entry Selection . . . . . . . . . . . . . . . . . 25
7.5. Exception Handling . . . . . . . . . . . . . . . . . . . 26
8. Sender Behavior . . . . . . . . . . . . . . . . . . . . . . . 26
8.1. Object to QUIC Packet Mapping . . . . . . . . . . . . . . 26
8.1.1. Object Granularity . . . . . . . . . . . . . . . . . 26
8.1.2. Packet Not Crossing Object Boundaries . . . . . . . . 26
8.1.3. Sender-Side per-Object Transmission Statistics . . . 27
8.1.4. Frame-Level Pacing . . . . . . . . . . . . . . . . . 28
8.2. Application-Layer Consumption . . . . . . . . . . . . . . 28
8.3. Transport-Layer Consumption (Cross-Layer Control) . . . . 29
8.4. Example: MoQ Layer Controlling BBR . . . . . . . . . . . 29
9. Application Scenarios: Streaming Media and AI Inference . . . 30
9.1. MMF-Driven Rate-Quality Adaptation . . . . . . . . . . . 30
9.2. Typical Use Cases . . . . . . . . . . . . . . . . . . . . 32
9.2.1. Use Case A: Video Live Streaming ABR Adaptation . . . 32
9.2.2. Use Case B: Bandwidth Drop (Audio-Video Mixed) . . . 32
9.2.3. Use Case C: Multi-Layer Quality Adaptation (AI
Inference) . . . . . . . . . . . . . . . . . . . . . 33
9.2.4. Use Case D: Generation Rate Overload (AI
Inference) . . . . . . . . . . . . . . . . . . . . . 33
9.2.5. Use Case E: Streaming Input Inference (Bidirectional
MMF) . . . . . . . . . . . . . . . . . . . . . . . . 33
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 34
11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 34
12. Normative References . . . . . . . . . . . . . . . . . . . . 35
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 35
1. Introduction
Media over QUIC Transport (MOQT, [MoQTransport]) is a QUIC-based
publish/subscribe media transport framework. In low-latency
interactive scenarios, senders need to obtain media delivery quality
information from the peer to adjust their sending strategies.
Adjustments occur at two layers:
* *Application layer:* Encoders adjust bitrate/frame rate, inference
systems adjust generation rate, and ABR switches Tracks.
* *Transport layer:* Congestion control (CC) algorithms adjust cwnd/
pacing rate.
QUIC transport-layer feedback (ACK frames and receive timestamps
[quic-receive-ts]) covers only the transport layer, leaving blind
spots at the MoQ semantic level (see Section 2 for details). This
document defines a MoQ-layer feedback mechanism that provides per-
Object media semantic feedback to the application layer. The MoQ
layer also serves as the control layer for CC algorithms,
synthesizing MMF signals and local transport state to issue control
commands such as pacing rate and pacing gain to CC (see Section 3.3
for details).
1.1. Conventions and Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
The following terms are used throughout this document:
* *MOQT:* Media over QUIC Transport, following [MoQTransport].
* *Object:* The smallest semantic unit of data delivery in MOQT.
* *Feedback Track:* A MoQ Track that carries MMF feedback.
* *MMF (MoQ Multimodal Feedback):* A receiver report at the MoQ
layer containing per-Object delivery status.
* *Cross-Layer Control Interface:* A mechanism for the MoQ layer to
issue control commands to CC algorithms; the specific form is
implementation-defined.
2. Motivation
QUIC layer feedback mechanisms (ACK, Receive Timestamps) operate at
the packet level, leaving the following blind spots at the MoQ layer:
+==================+===============================================+
| Blind Spot | Description |
+==================+===============================================+
| Object Semantics | QUIC ACK confirms packets but cannot perceive |
| | frame integrity, type, or deadline |
+------------------+-----------------------------------------------+
| Frame-Level | QUIC ACK provides packet-level delay but |
| Timing | cannot provide inter-Object arrival timing |
+------------------+-----------------------------------------------+
| Playback | QUIC layer cannot know peer's playback buffer |
| Progress | level or application-layer consumption state |
+------------------+-----------------------------------------------+
Table 1
This document defines a MoQ-layer feedback mechanism (MMF) that
supplements QUIC feedback with Object-level semantic signals that are
unavailable at the QUIC layer. MMF serves both the application layer
(per-Object delivery status) and CC algorithms (aggregate statistics
injection). With MMF, senders can reduce sending quality before the
peer's playback buffer is depleted, rather than reacting only after
stuttering occurs.
3. Architecture
3.1. Three-Layer Architecture: Application / MoQ / Transport
This mechanism adopts a three-layer architecture: the application
layer serves as the media source, the transport layer implements CC
algorithms, and the MoQ layer resides in between, responsible for
semantic understanding and signal distribution.
3.1.1. Application Layer: Media Sources
MMF does not restrict the type of application-layer media sources.
Two typical media source types are:
+==================+====================+===========================+
| Media Source | Characteristics | MMF-Driven Adjustments |
| Type | | |
+==================+====================+===========================+
| Traditional | Unidirectional | Unidirectional |
| Media | output, | adjustment: encoding |
| (encoder/camera/ | controllable | bitrate, frame rate, |
| live streaming) | frame rate/ | resolution, ABR Track |
| | bitrate | switching |
+------------------+--------------------+---------------------------+
| AI Inference | Bidirectional | Bidirectional adjustment: |
| Pipeline | interaction, | above general adjustments |
| (multimodal | tunable | + inference parameters |
| inference | generation | (chunk_size, flush |
| engine) | parameters | strategy) |
+------------------+--------------------+---------------------------+
Table 2
Both media source types share the same MMF format; the difference
lies only in the adjustable actions that can be taken after consuming
MMF.
3.1.2. MoQ Layer: Semantic Hub
The MoQ layer assumes two responsibilities in the three-layer
architecture:
* *Semantic Translation:* The MoQ Track/Object/Group model provides
feedback with frame-level semantics. MMF reports Object delivery
status (received on time, received late, partially received, or not
received) rather than packet-level status.
* *Control Hub:* The MoQ layer synthesizes MMF feedback and local
transport state, driving application-layer adjustments upward via
callbacks (encoding bitrate, frame rate) and instructing CC to
adjust sending rate downward via the cross-layer control interface
(pacing_rate, pacing_gain). The MoQ layer can also directly
execute certain adjustments (ABR Track switching, Object
transmission frequency control) without requiring application-
layer media source cooperation.
The Feedback Track is a normal MoQ Track, at the same level as
audio/video/text Tracks. This mechanism introduces no new QUIC
frames or MOQT control messages.
3.1.3. Transport Layer: CC Algorithms
CC algorithms (BBR, GCC, etc.) are responsible for congestion
detection and bandwidth estimation based on local QUIC ACK and
receive-ts. In real-time media scenarios, the MoQ layer issues
control commands (pacing_rate, pacing_gain) to CC via the cross-layer
control interface (Section 3.3); CC algorithms SHOULD execute these
commands. The MoQ layer makes decisions by synthesizing three
sources of information: MMF feedback, local CC output (BWE), and
frame-level statistics (Section 8.1), which gives it more complete
information than CC algorithms have on their own. An integration
example is provided in Section 8.4.
3.2. Dual-Layer Feedback Model: QUIC receive-ts and MMF
CC algorithms rely on per-packet delay signals for bandwidth
estimation and congestion detection; this signal is provided by QUIC
receive-ts ([quic-receive-ts]). MMF operates at the MoQ layer and,
together with QUIC receive-ts, forms a dual-layer feedback model.
Implementations can enable both mechanisms simultaneously.
+==========+=============+============+=============+=============+
|Feedback | Granularity | Hop | CC Role | Application |
|Layer | | | | Layer Role |
+==========+=============+============+=============+=============+
|QUIC | per-packet | QUIC | Primary | None |
|receive-ts| (~us) | connection | signal | |
| | | layer | (delay | |
| | | | gradient, | |
| | | | BW est.) | |
+----------+-------------+------------+-------------+-------------+
|MoQ MMF | per-Object | Client/ | MoQ layer | Primary |
| | (~ms) | Server | synthesizes | signal |
| | | direct | and issues | (bitrate/ |
| | | | control | frame |
| | | | commands | rate/ABR/ |
| | | | | inference |
| | | | | parameter |
| | | | | adjustment) |
+----------+-------------+------------+-------------+-------------+
Table 3
The two layers cover different granularities:
QUIC receive-ts provides per-packet inter-arrival delta, enabling CC
algorithms to perform delay-based congestion detection and bandwidth
estimation.
MMF supplements additional signals:
* *Per-Object Signals:* QUIC transport layer cannot determine
whether an Object is complete or arrived within deadline. MMF
provides per-Object status (RECEIVED_LATE, NOT_RECEIVED, etc.).
* *Application-Layer Metrics:* Playout headroom (PLAYOUT_AHEAD),
receiver-side bandwidth estimation, etc.
3.2.1. QUIC receive-ts Requirements
When both extensions are negotiated, implementations need to support
QUIC receive-ts ([quic-receive-ts]) and MMF simultaneously.
QUIC receive-ts carries per-packet reception timestamps (Timestamp
Range / Timestamp Delta) via ACK_EXTENDED frames, enabling CC
algorithms to compute inter-arrival delta (delay gradient) and
bandwidth estimation. Its role is analogous to that of Transport-wide
Congestion Control (TWCC) feedback in WebRTC:
* *QUIC receive-ts:* per-packet reception timestamps --> delay-based
CC (GCC/SQP, etc.)
* *MMF:* per-Object delivery status --> application-layer adaptation
+ MoQ-layer CC control
Compared to WebRTC GCC+TWCC, this framework adds a per-Object
semantic feedback layer on top of per-packet feedback. TWCC only
provides packet-level arrival times and cannot express Object
completeness, expiration status, or playout headroom.
3.3. Cross-Layer Control Interface
MoQ implementations can provide a cross-layer control interface
that enables the MoQ layer to issue control commands to CC
algorithms. The specific form of the interface is
implementation-defined.
Unlike approaches that pass raw feedback data to CC algorithms for
independent judgment, this mechanism places the control decision in
the MoQ layer, which can improve performance because the MoQ layer
has access to three sources of information: MMF feedback, local CC
output (BWE), and frame-level statistics (Section 8.1). This gives it
more complete information than CC algorithms have on their own.
CC algorithms can still run their own congestion detection logic, or
be adapted to accept control commands issued by the MoQ layer.
*Control Commands:*
+================+================================================+
| Command | Description |
+================+================================================+
| target_bitrate | Target encoding bitrate computed by MoQ layer |
| | from BWE and MMF; notifies CC of current |
| | application-layer sending budget |
+----------------+------------------------------------------------+
| pacing_gain | Sending gain coefficient; MoQ layer calculates |
| | from MMF signals (Object loss rate, expiration |
| | rate, playout headroom); CC may execute as |
| | pacing_rate = BWE x pacing_gain |
+----------------+------------------------------------------------+
| pacing_rate | Directly specify sending rate; MoQ layer |
| | calculates and issues in scenarios like frame- |
| | level pacing (Section 8.1.4) |
+----------------+------------------------------------------------+
Table 4
*MoQ Layer Decision Inputs:*
The MoQ layer synthesizes the following signals for control
decisions:
* *MMF Signals (from peer):* Object loss rate, expiration rate, Avg
Inter-Arrival Delta, PLAYOUT_AHEAD_MS
* *CC Output (from local):* BWE, RTT, loss rate
* *Frame-Level Statistics (from local, Section 8.1.3):* per-Object
transmission/loss/delivery duration
*CC Algorithm Behavior:*
CC algorithms SHOULD execute control commands after receiving them:
* *Upon pacing_gain:* Update sending rate as pacing_rate = BWE x
pacing_gain
* *Upon pacing_rate:* Use this value directly as sending rate
* *Upon target_bitrate:* Record current application-layer sending
budget for internal decision reference
When no control commands are issued by the MoQ layer, CC algorithms
operate normally according to their own logic.
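The command handling described above can be sketched in Python as follows. The interface form is implementation-defined, so `CcState` and `apply_command` are hypothetical names, passing commands as strings is an assumption, and rates are assumed to be in bits per second. This is an illustration, not a normative CC integration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CcState:
    """Hypothetical CC-side state touched by cross-layer control commands."""
    bwe_bps: float                               # CC's own bandwidth estimate (BWE)
    pacing_rate_bps: float                       # sending rate used by the pacer
    target_bitrate_bps: Optional[float] = None   # budget recorded for reference


def apply_command(state: CcState, command: str, value: float) -> None:
    """Apply one control command issued by the MoQ layer (Table 4)."""
    if command == "pacing_gain":
        # pacing_rate = BWE x pacing_gain
        state.pacing_rate_bps = state.bwe_bps * value
    elif command == "pacing_rate":
        # Use the MoQ-supplied value directly as the sending rate.
        state.pacing_rate_bps = value
    elif command == "target_bitrate":
        # Only record the application-layer budget; pacing is unchanged.
        state.target_bitrate_bps = value
    else:
        raise ValueError(f"unknown control command: {command}")
```

For example, with a BWE of 4 Mbit/s, a pacing_gain of 0.75 yields a pacing rate of 3 Mbit/s, while a later target_bitrate command only updates the recorded budget.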
The cross-layer control interface is optional. CC algorithms that do
not support this interface can still operate independently but cannot
benefit from MMF-driven frame-level control capabilities.
3.4. Bidirectional MMF: Output Feedback and Input Feedback
The MMF report direction described in the preceding sections is from
client to server, reporting delivery quality of downstream media
(audio_response, text_response). In streaming input scenarios, the
server simultaneously subscribes to upstream media (audio_input,
video_input) and MAY establish a reverse feedback Track to report
upstream media delivery quality to the client.
   Client                                Server
     |                                     |
     |----- audio_input ------------------>|  Server subscribes
     |<---- input_feedback (MMF) ----------|  Server reports input
     |                                     |  delivery quality
     |                                     |
     |<---- audio_response ----------------|  Client subscribes
     |-- multimodal_feedback (MMF) ------->|  Client reports output
     |                                     |  delivery quality
     |                                     |
Output MMF (client-->server) and Input MMF (server-->client) use the
same report format (Section 5). Their purposes differ:
* *Output MMF:* Server adjusts encoding bitrate, inference
parameters, and CC based on it.
* *Input MMF:* Client adjusts upstream behavior based on it:
- *Audio Input:* Adjust chunk size, encoding bitrate,
transmission frequency
- *Video Input:* Adjust frame rate, resolution, pause/resume
Input MMF is optional. Support for Input MMF is declared via bit 2
of the Setup Parameter bitmap.
4. Feedback Track
4.1. Track Definition
The Feedback Track is a normal MoQ Track where the Payload of each
Object is an MMF. Each MMF is published as an independent Object
with monotonically increasing Object ID within a Group.
The Group division strategy for Feedback Track is implementation-
defined. It is RECOMMENDED to use a single Group (Group ID = 0) with
continuously incrementing Object IDs to simplify implementation.
Each Feedback Track MUST be associated 1:1 with a media Track. The
association is established during the PUBLISH phase (Section 4.3) and
is not repeated in MMF reports.
When feedback is needed for multiple media Tracks, independent
Feedback Tracks MUST be established separately.
4.2. Track Naming
Feedback Track naming SHOULD follow these conventions:
* *Namespace:* Same Namespace as the media Track being fed back.
* *Track Name:* multimodal-feedback/<media_track_name>.
Examples: multimodal-feedback/audio_response, multimodal-feedback/
video_response.
Media Track Names MUST NOT contain the / character to avoid parsing
ambiguity in Feedback Track Names.
When reverse feedback exists in a session (see Section 3.4), the
Input Feedback Track Name is input-feedback/<media_track_name>.
When a sender receives a PUBLISH request for a Feedback Track, it
MUST identify the associated media Track via the <media_track_name>
portion of the Track Name. If no matching established media Track is
found, it SHOULD respond with REQUEST_ERROR.
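A minimal sketch of building Feedback Track Names and of the sender-side lookup described above. It assumes Track Names can be handled as Python strings (on the wire, MOQT Track Names are byte sequences), and the helper names are hypothetical. A `None` result stands for the case where the sender would respond with REQUEST_ERROR.

```python
from typing import Optional, Set

FEEDBACK_PREFIX = "multimodal-feedback/"
INPUT_FEEDBACK_PREFIX = "input-feedback/"


def feedback_track_name(media_track_name: str, input_feedback: bool = False) -> str:
    """Build the Feedback Track Name for a media Track (Section 4.2)."""
    if not media_track_name or "/" in media_track_name:
        # Media Track Names MUST NOT contain "/", or parsing becomes ambiguous.
        raise ValueError("media Track Name must be non-empty and contain no '/'")
    prefix = INPUT_FEEDBACK_PREFIX if input_feedback else FEEDBACK_PREFIX
    return prefix + media_track_name


def associated_media_track(feedback_name: str, established: Set[str]) -> Optional[str]:
    """Return the media Track Name a PUBLISHed Feedback Track refers to,
    or None when no established media Track matches."""
    for prefix in (FEEDBACK_PREFIX, INPUT_FEEDBACK_PREFIX):
        if feedback_name.startswith(prefix):
            media = feedback_name[len(prefix):]
            return media if media in established else None
    return None  # the name does not use a Feedback Track prefix
```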
4.3. Track Establishment and Lifecycle
The Feedback Track is established by the feedback generator
(typically the media receiver) as Publisher through MOQT's PUBLISH /
PUBLISH_OK negotiation.
*Establishment Flow Example:*
   Media Receiver                         Media Sender
   (Media Subscriber /                    (Media Publisher /
    Feedback Publisher)                    Feedback Subscriber)
          |                                     |
          | SUBSCRIBE(track=audio_response)     | (1) Subscribe media
          |------------------------------------>|     Track
          |                        SUBSCRIBE_OK |
          |<------------------------------------|
          |                                     |
          | PUBLISH(track=multimodal-feedback/  | (2) Publish Feedback Track
          |         audio_response,             |     (Role reversal: receiver
          |         namespace=same_as_media)    |     becomes feedback
          |------------------------------------>|     Publisher)
          |                          PUBLISH_OK |
          |<------------------------------------|
          |                                     |
          | [Object: MMF seq=0]                 | (3) Send MMF
          |------------------------------------>|
          | [Object: MMF seq=1]                 |
          |------------------------------------>|
          | ...                                 |
*Lifecycle Rules:*
The Feedback Track lifecycle SHOULD align with the subscribed media
Track it covers. After the media Track publisher sends PUBLISH_DONE
for the media Track, the Feedback Track publisher (i.e., the media
receiver) SHOULD send PUBLISH_DONE for the Feedback Track after
transmitting the final MMF and stop publishing Objects.
When re-establishing a Feedback Track after connection interruption,
the Report Sequence SHOULD restart from 0.
4.4. Transport and Priority
The Feedback Track SHOULD be carried over QUIC streams (consistent
with ordinary MoQ Objects).
The Feedback Track SHOULD be given a lower Subscriber Priority than
media Tracks, i.e., a numerically larger priority value, since lower
values indicate higher priority ([MoQTransport], Section 7). Example
priority assignment:
assignment:
+=====================+=====================+================+
| Track Type | Subscriber Priority | Description |
+=====================+=====================+================+
| audio_response | 0 (highest) | Audio media |
+---------------------+---------------------+----------------+
| video_response | 1 | Video media |
+---------------------+---------------------+----------------+
| multimodal-feedback | 3 | Feedback Track |
+---------------------+---------------------+----------------+
Table 5
When bandwidth contention occurs, media data SHOULD take precedence
over feedback data transmission.
5. Feedback Report Format
This section defines the binary encoding format of MMF. All integer
fields use QUIC Variable-Length Integer encoding (RFC 9000,
Section 16) unless otherwise specified.
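For reference, the QUIC variable-length integer codec from RFC 9000, Section 16 can be sketched as below. `encode_varint` and `decode_varint` are illustrative helper names, not part of any library.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a QUIC variable-length integer (RFC 9000, Section 16)."""
    if value < 0:
        raise ValueError("QUIC varints are unsigned")
    if value < 1 << 6:
        return value.to_bytes(1, "big")                            # prefix 0b00
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")                 # prefix 0b01
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")            # prefix 0b10
    if value < 1 << 62:
        return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")  # prefix 0b11
    raise ValueError("value exceeds 2^62 - 1")


def decode_varint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode the varint at buf[offset]; return (value, offset after it)."""
    first = buf[offset]
    length = 1 << (first >> 6)        # two high bits select 1, 2, 4 or 8 bytes
    end = offset + length
    if end > len(buf):
        raise ValueError("truncated varint")
    value = first & 0x3F              # drop the 2-bit length prefix
    for byte in buf[offset + 1:end]:
        value = (value << 8) | byte
    return value, end
```

The returned offset lets a parser walk the fields of an MMF one after another.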
Fields marked as "signed encoding" use ZigZag mapping and are
transmitted as unsigned QUIC varint:
* *Encoding:* unsigned = (signed << 1) ^ (signed >> 63)
* *Decoding:* signed = (unsigned >>> 1) ^ -(unsigned & 1)
Mapping examples: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4.
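The ZigZag mapping above translates directly to Python. Because the result is carried as a QUIC varint, whose maximum value is 2^62 - 1, this sketch restricts signed inputs to [-2^61, 2^61 - 1]; that range check is inferred from the varint limit rather than stated by this document. The function names are illustrative.

```python
def zigzag_encode(signed: int) -> int:
    """Map a signed value to unsigned: 0->0, -1->1, 1->2, -2->3, 2->4."""
    # The output is sent as a QUIC varint (max 2^62 - 1), which limits the
    # signed input to [-2^61, 2^61 - 1].
    if not -(1 << 61) <= signed < (1 << 61):
        raise ValueError("ZigZag output would not fit in a QUIC varint")
    # Python's >> is an arithmetic shift, matching (signed >> 63) for
    # 64-bit two's-complement values in this range.
    return (signed << 1) ^ (signed >> 63)


def zigzag_decode(unsigned: int) -> int:
    """Inverse mapping; >> on a non-negative int acts as the logical >>>."""
    return (unsigned >> 1) ^ -(unsigned & 1)
```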
5.1. Report Structure
MoQ Multimodal Feedback {
  Report Timestamp (i),
  Report Sequence (i),
  Object Entry Count (i),
  Object Entry (..) ...,
  Summary Stats (..),
  Optional Metric Count (i),
  Optional Metric (..) ...,
}
The MMF format version is negotiated via Setup Parameter
(Section 6.1) and is not repeated in each report. This document
defines version 0.
Each MMF reports Object delivery status for a single media Track
associated with its Feedback Track (1:1 association, see
Section 4.1).
*Report Timestamp (varint):* The time at which the report is
generated, taken from the receiver's local monotonic clock, in
microseconds. A monotonic clock MUST be used. This value is used only
for anchoring the Receive Timestamp Delta chain and for report
ordering, not for time alignment between endpoints.
*Report Sequence (varint):* Monotonically increasing from 0. Senders
detect feedback loss via sequence gaps (Section 10.1).
*Object Entry Count (varint):* Number of Object Entries that follow.
When the value is 0, the report contains only Summary Stats (for
heartbeat purposes).
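One way a sender could detect lost feedback from Report Sequence gaps is sketched below. `FeedbackLossDetector` is a hypothetical helper, and treating a sequence value at or below the highest one seen as a duplicate or late (reordered) report is an assumption of this sketch.

```python
from typing import List, Optional


class FeedbackLossDetector:
    """Hypothetical sender-side tracker of Report Sequence gaps."""

    def __init__(self) -> None:
        self.highest: Optional[int] = None   # highest Report Sequence seen so far

    def on_report(self, sequence: int) -> List[int]:
        """Return the Report Sequence values skipped before this report."""
        if self.highest is None:
            missing = list(range(sequence))  # sequences start at 0
        elif sequence > self.highest:
            missing = list(range(self.highest + 1, sequence))
        else:
            return []                        # duplicate or reordered report
        self.highest = sequence
        return missing
```

Since the Report Sequence restarts from 0 when a Feedback Track is re-established (Section 4.3), a new detector instance would be created for the new Track.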
5.2. Object Entry
Object Entry {
  Object ID (i),
  Status (i),
  [Receive Timestamp Delta (i)],
}
Object Entries within the same MMF MUST be sorted in ascending order
by Object ID.
5.2.1. Object ID
The Object ID (varint) within the media Track.
5.2.2. Status
Delivery status code (varint), see Section 5.3 for values.
5.2.3. Receive Timestamp Delta
Conditionally present field (varint, signed encoding): present only
when Status is RECEIVED (0x00) or RECEIVED_LATE (0x01). Encoding
rules:
* *First received Object within the same MMF* (i.e., the first entry
with Status RECEIVED or RECEIVED_LATE when iterating through the
list): Delta is the offset of the Object's arrival time relative
to Report Timestamp in microseconds, encoded as signed varint
(negative values indicate arrival time earlier than report
generation time).
* *Subsequent received Objects:* Delta is the offset of the Object's
arrival time relative to the arrival time of the most recent entry
with Status RECEIVED or RECEIVED_LATE that precedes it in the
list, encoded as signed varint.
NOT_RECEIVED and PARTIALLY_RECEIVED entries are skipped in the delta
chain.
Since Object Entries are sorted by Object ID in ascending order while
Objects may arrive out of order, delta values can be negative
(indicating the Object arrived earlier than the previous received
Object in the list). Negative delta from reordering is a valid
network state signal.
This delta chain encoding is consistent with QUIC Receive Timestamps
(draft-ietf-quic-receive-ts-00), but operates at Object granularity
rather than packet granularity. For small Objects (e.g., audio frames
of about 200 bytes, roughly one packet per Object), the delta
sequence approaches per-packet precision.
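The delta chain rules can be sketched as follows: deltas are computed over entries already sorted by Object ID, entries without a delta are skipped in the chain, and `arrival_times` inverts the chain on the sender side. The function names and the tuple layout are illustrative assumptions; the returned deltas are signed values that would still need ZigZag and varint encoding before transmission.

```python
from typing import List, Optional, Tuple

RECEIVED, RECEIVED_LATE, NOT_RECEIVED, PARTIALLY_RECEIVED = 0x00, 0x01, 0x02, 0x03

# (object_id, status, arrival_us); arrival_us is None when there is no delta
Entry = Tuple[int, int, Optional[int]]


def delta_chain(report_ts_us: int, entries: List[Entry]) -> List[Optional[int]]:
    """Signed Receive Timestamp Delta per entry, or None if the field is absent.

    Entries must already be sorted by ascending Object ID (Section 5.2).
    """
    deltas: List[Optional[int]] = []
    anchor = report_ts_us            # the first received entry uses Report Timestamp
    for _object_id, status, arrival_us in entries:
        if status in (RECEIVED, RECEIVED_LATE):
            deltas.append(arrival_us - anchor)
            anchor = arrival_us      # later entries chain off this arrival
        else:
            deltas.append(None)      # NOT_RECEIVED / PARTIALLY_RECEIVED are skipped
    return deltas


def arrival_times(report_ts_us: int, deltas: List[Optional[int]]) -> List[Optional[int]]:
    """Sender-side inverse: rebuild arrival times from the delta chain."""
    times: List[Optional[int]] = []
    anchor = report_ts_us
    for delta in deltas:
        if delta is None:
            times.append(None)
        else:
            anchor += delta
            times.append(anchor)
    return times
```

For example, with a Report Timestamp of 1000000 and entries (10, RECEIVED, 900000), (11, NOT_RECEIVED), (12, RECEIVED_LATE, 880000), the deltas are -100000, absent, and -20000; the negative second delta shows that Object 12 arrived before Object 10.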
5.3. Delivery Status Codes
+=======+====================+============================+
| Value | Name | Description |
+=======+====================+============================+
| 0x00 | RECEIVED | Completely received and |
| | | within delivery deadline |
+-------+--------------------+----------------------------+
| 0x01 | RECEIVED_LATE | Completely received but |
| | | exceeded delivery deadline |
+-------+--------------------+----------------------------+
| 0x02 | NOT_RECEIVED | No bytes received at |
| | | report generation time |
+-------+--------------------+----------------------------+
| 0x03 | PARTIALLY_RECEIVED | Partial bytes received but |
| | | Object is incomplete |
+-------+--------------------+----------------------------+
Table 6
5.3.1. RECEIVED_LATE Determination
The receiver MUST determine Object expiration based on its local
playback deadline: an Object is RECEIVED_LATE when its arrival time
is later than its expected playback moment.
When no playback deadline is available (e.g., in non-real-time
playback scenarios), the receiver SHOULD report RECEIVED rather than
RECEIVED_LATE.
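A minimal sketch of the RECEIVED_LATE decision for a completely received Object, assuming the arrival time and the playback deadline are both read from the receiver's local monotonic clock in microseconds. The function name is hypothetical.

```python
from typing import Optional

RECEIVED, RECEIVED_LATE = 0x00, 0x01


def complete_object_status(arrival_us: int, playback_deadline_us: Optional[int]) -> int:
    """Status of a completely received Object (RECEIVED or RECEIVED_LATE)."""
    if playback_deadline_us is None:
        # No playback deadline (e.g. non-real-time playback): report RECEIVED.
        return RECEIVED
    # Late only if the arrival is strictly after the expected playback moment.
    return RECEIVED_LATE if arrival_us > playback_deadline_us else RECEIVED
```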
5.3.2. NOT_RECEIVED Reporting Timing
The receiver SHOULD report an Object as NOT_RECEIVED when either of
the following conditions is met:
* A subsequent Object with a larger Object ID has arrived, but this
Object has not.
* More than 2 x expected_interval (see Section 5.4.6) has elapsed
since the Object's expected arrival time without the Object
arriving, and no subsequent Object is available for reference.
The second condition covers the "last Object lost" scenario, where no
subsequent Object exists to trigger the first condition.
The receiver MUST NOT report NOT_RECEIVED for Objects that are not
yet reasonably expected to arrive.
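The two conditions can be sketched as a single predicate for an Object of which no bytes have arrived. The sketch reads the second condition as "more than 2 x expected_interval has elapsed since the Object's expected arrival time"; that reading, the parameter names, and the use of microseconds are assumptions.

```python
from typing import Optional


def should_report_not_received(object_id: int,
                               expected_arrival_us: int,
                               now_us: int,
                               expected_interval_us: int,
                               highest_arrived_id: Optional[int]) -> bool:
    """True if an Object with no bytes received should be NOT_RECEIVED."""
    # First condition: an Object with a larger Object ID has already arrived.
    if highest_arrived_id is not None and highest_arrived_id > object_id:
        return True
    # Second condition (no later Object to reference): more than
    # 2 x expected_interval has passed since the expected arrival time.
    return now_us - expected_arrival_us > 2 * expected_interval_us
```

Returning False leaves the Object unreported, which matches the rule against reporting Objects that are not yet reasonably expected.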
5.3.3. PARTIALLY_RECEIVED
For Objects carried over QUIC streams: when a stream is terminated by
RESET_STREAM or closed due to timeout, Objects with partially
received data SHOULD be reported as PARTIALLY_RECEIVED.
The timeout threshold used to determine whether a partially received
Object should be reported as PARTIALLY_RECEIVED is application-
defined. Different applications may have different latency
requirements (e.g., real-time voice vs. file transfer), and the
receiver's application layer is best positioned to decide when an
incomplete Object is no longer useful. Implementations SHOULD allow
the application layer to configure or signal this timeout value.
For Objects carried over QUIC Datagram: Since QUIC Datagrams are not
retransmitted by the transport layer, any Object whose Datagram(s)
are lost results in incomplete data at the receiver. If the receiver
detects that some but not all Datagrams constituting an Object have
arrived, and the remaining Datagrams are not expected to arrive
(e.g., a subsequent Object has arrived or the application-defined
timeout has expired), the Object SHOULD be reported as
PARTIALLY_RECEIVED. If no Datagrams for the Object have arrived, the
Object SHOULD be reported as NOT_RECEIVED instead.
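For the Datagram case, the final status decision can be sketched as follows. The sketch assumes the receiver knows how many Datagrams make up an Object (for example, from application framing), which this document does not specify; the function and parameter names are hypothetical.

```python
from typing import Optional

NOT_RECEIVED, PARTIALLY_RECEIVED = 0x02, 0x03


def incomplete_datagram_status(arrived: int, total: int,
                               later_object_arrived: bool,
                               timeout_expired: bool) -> Optional[int]:
    """Status for an incomplete Datagram-carried Object, or None to keep waiting.

    `total` (Datagrams per Object) is assumed to be known from application
    framing; this document does not say how a receiver learns it.
    """
    if arrived >= total:
        raise ValueError("Object is complete: use RECEIVED or RECEIVED_LATE")
    if not (later_object_arrived or timeout_expired):
        return None                  # the missing Datagrams may still arrive
    # QUIC does not retransmit Datagrams, so the gap is now final.
    return PARTIALLY_RECEIVED if arrived > 0 else NOT_RECEIVED
```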
5.4. Summary Stats Block
   Summary Stats {
     Report Interval (i),
     Total Objects Evaluated (i),
     Objects Received (i),
     Objects Received Late (i),
     Objects Lost (i),
     Avg Inter-Arrival Delta (i),
   }
Summary Stats MUST be included in every MMF; its presence is not
controlled by the negotiation bitmap. It provides windowed aggregate
information, allowing lightweight CC consumers that do not parse
Object Entries to obtain useful signals. The MoQ layer can compute
control decisions based on this block and issue them to CC algorithms
via the cross-layer control interface (Section 3.3).
The Report Interval window of Summary Stats is independent of the
coverage of Object Entries. Object Entries MAY include Objects
outside the Report Interval window (e.g., NOT_RECEIVED entries that
are reported continuously, see Section 7.4).
5.4.1. Report Interval
The length of the time window covered by this report (varint), in
microseconds. The window spans from Report Timestamp - Report
Interval to Report Timestamp. The RECOMMENDED value is 50000-200000
(50-200 ms).
5.4.2. Total Objects Evaluated
Total number of Objects (varint) evaluated within the Report Interval
window, including Objects of all statuses. It MUST equal Objects
Received + Objects Received Late + Objects Lost.
5.4.3. Objects Received
Number of Objects (varint) with status RECEIVED within the window.
5.4.4. Objects Received Late
Number of Objects (varint) with status RECEIVED_LATE within the
window.
5.4.5. Objects Lost
Number of Objects (varint) with status NOT_RECEIVED or
PARTIALLY_RECEIVED within the window.
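The four counters and their invariant can be computed in one pass over the statuses of the Objects evaluated in the window. A minimal sketch, assuming the caller has already selected those Objects; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

# Delivery Status codes (Section 5.3).
RECEIVED, RECEIVED_LATE, NOT_RECEIVED, PARTIALLY_RECEIVED = 0x00, 0x01, 0x02, 0x03


def summary_counts(statuses: Sequence[int]) -> Dict[str, int]:
    """Aggregate the statuses of Objects evaluated in the Report Interval."""
    counts = Counter(statuses)
    received = counts[RECEIVED]
    late = counts[RECEIVED_LATE]
    # Objects Lost counts both NOT_RECEIVED and PARTIALLY_RECEIVED.
    lost = counts[NOT_RECEIVED] + counts[PARTIALLY_RECEIVED]
    total = len(statuses)
    # Total Objects Evaluated MUST equal Received + Received Late + Lost.
    if total != received + late + lost:
        raise ValueError("unknown delivery status code in window")
    return {"total": total, "received": received, "late": late, "lost": lost}
```

For example, `summary_counts([0x00, 0x02, 0x01, 0x00, 0x00])` gives a total of 5 with 3 received, 1 late, and 1 lost, the same counts as the Summary Stats in the encoding example of Section 5.6.1.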
5.4.6. Avg Inter-Arrival Delta
Average arrival interval deviation (varint, signed encoding) of
consecutive received Object pairs within the window, in microseconds.
Calculation method:
For consecutive received Object pairs (i-1, i), sorted by arrival
time within the window:
   delta(i) = (A(i) - A(i-1)) - expected_interval
   Avg Inter-Arrival Delta = mean(delta(i)) over all such i
where A(i) is the arrival time of Object i, and expected_interval is
the expected arrival interval.
Methods to determine expected_interval (in priority order):
1. Known media frame rate (e.g., 50 Objects/s --> 20000 us).
2. Historical average arrival interval within the current session
(sliding window).
If the media frame rate is not known, the receiver SHOULD use method
2; if neither method is available, the receiver SHOULD set this field
to 0.
Positive values indicate that Objects arrive further apart than
expected (increased queuing); negative values indicate that they
arrive closer together than expected. When fewer than 2 received
Objects exist in the window, this field MUST be 0.
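The calculation can be sketched as follows, together with the ZigZag mapping used for signed fields. Because the deltas telescope, the mean equals (A(last) - A(first)) / (n - 1) - expected_interval for n received Objects. Truncating the mean toward zero is an assumption of this sketch, since this section does not specify a rounding rule; function names are hypothetical.

```python
from typing import Iterable


def zigzag_encode(n: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return z >> 1 if (z & 1) == 0 else -((z + 1) >> 1)


def avg_inter_arrival_delta(arrivals_us: Iterable[int],
                            expected_interval_us: int) -> int:
    """Mean of (A(i) - A(i-1)) - expected_interval over received Objects.

    arrivals_us: arrival times (monotonic microseconds) of the received
    Objects in the window; they are sorted by arrival time here.
    """
    a = sorted(arrivals_us)
    if len(a) < 2:
        return 0  # MUST be 0 with fewer than 2 received Objects
    deltas = [(a[i] - a[i - 1]) - expected_interval_us for i in range(1, len(a))]
    total, count = sum(deltas), len(deltas)
    # Integer mean truncated toward zero (rounding rule is an assumption).
    return total // count if total >= 0 else -((-total) // count)
```

For example, four Objects arriving 23 ms apart with a 20 ms expected interval give +3000 us, carried on the wire as ZigZag(3000) = 6000.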
5.5. Optional Media Metrics
Optional Media Metrics immediately follow Summary Stats. Optional
Metric Count (varint) specifies the number of subsequent Optional
Metrics; a value of 0 indicates no optional metrics are included.
Each metric uses Key-Value-Pair encoding (draft-ietf-moq-transport-
17, Section 1.4.2):
   Optional Metric {
     Metric Type (i),
     Metric Value (i),
   }
Defined metric types are divided into two categories: Application-
Layer Metrics and QUIC Layer Summary Metrics.
*Application-Layer Metrics:*
+======+==========================+==============+==================+
| Type | Name | Unit | Description |
+======+==========================+==============+==================+
| 0x02 | PLAYOUT_AHEAD_MS | milliseconds | Remaining time |
| | | | until playback |
| | | | stall at |
| | | | receiver, i.e., |
| | | | buffered but |
| | | | not yet played |
| | | | media duration. |
| | | | Smaller values |
| | | | indicate closer |
| | | | to stall (0 = |
| | | | currently |
| | | | stalled). |
+------+--------------------------+--------------+------------------+
| 0x04 | ESTIMATED_BANDWIDTH_KBPS | kbps | Available |
| | | | bandwidth |
| | | | estimate |
| | | | observed at |
| | | | receiver. Can |
| | | | be calculated |
| | | | as bytes |
| | | | received in |
| | | | window / window |
| | | | duration. For |
| | | | sender to |
| | | | cross-reference |
| | | | with local |
| | | | bandwidth |
| | | | estimation. |
+------+--------------------------+--------------+------------------+
Table 7
*QUIC Layer Summary Metrics:*
The following metrics expose the receiver's local QUIC connection
transport state to the sender for CC algorithm cross-validation.
+======+================+==============+=======================+
| Type | Name | Unit | Description |
+======+================+==============+=======================+
| 0x10 | PEER_RTT_US | microseconds | Receiver's local QUIC |
| | | | connection smoothed |
| | | | RTT, corresponding to |
| | | | smoothed_rtt in RFC |
| | | | 9002. For sender to |
| | | | cross-validate with |
| | | | local RTT estimation. |
+------+----------------+--------------+-----------------------+
| 0x12 | PEER_LOSS_RATE | per mille | Receiver's local QUIC |
| | | | connection packet |
| | | | loss rate within |
| | | | Report Interval, |
| | | | expressed in per |
| | | | mille (e.g., 50 = |
| | | | 5.0%). |
+------+----------------+--------------+-----------------------+
Table 8
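The unit conversions in Tables 7 and 8 reduce to integer arithmetic. The sketch below is illustrative only: it assumes the receiver counts the bytes received during the Report Interval and, for PEER_LOSS_RATE, that its QUIC stack reports how many packets it expected in the same window and how many of those were lost (e.g., inferred from packet number gaps). Function names are hypothetical, and results are truncated to integers.

```python
def estimated_bandwidth_kbps(bytes_received: int, window_us: int) -> int:
    """ESTIMATED_BANDWIDTH_KBPS = bytes received in window / window duration.

    bits/s = bytes * 8 * 1_000_000 / window_us, and kbps = bits/s / 1000,
    so kbps = bytes * 8000 / window_us.
    """
    if window_us <= 0:
        return 0
    return bytes_received * 8000 // window_us


def peer_loss_rate_per_mille(lost_packets: int, expected_packets: int) -> int:
    """PEER_LOSS_RATE in per mille, e.g. 5 lost of 100 -> 50 (5.0%)."""
    if expected_packets <= 0:
        return 0
    return lost_packets * 1000 // expected_packets
```

For instance, 10000 bytes received in a 100 ms window corresponds to 800 kbps, and 5 lost packets out of 100 to a PEER_LOSS_RATE of 50.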
The core CC signals of MMF are the Receive Timestamp Delta in Object
Entries (Section 5.2.3, following the per-packet delta encoding
approach of QUIC receive timestamps and WebRTC TWCC) and the Summary
Stats (Section 5.4), not the Optional Metrics, which serve only as
supplements.
Type values 0x00-0x1f are reserved for this specification. Values
0x20 and above are available for application-layer custom use. An
endpoint receiving an MMF MUST ignore unrecognized Metric Types.
Optional Media Metrics MAY be included in MMF only when both parties
have declared bit1=1 in Setup negotiation (Section 6). When not
negotiated or bit1=0, Optional Metric Count MUST be 0.
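The following sketch encodes and decodes the Optional Media Metrics section (Optional Metric Count followed by the metrics) using QUIC variable-length integers (RFC 9000, Section 16) and skips unrecognized Metric Types. It assumes the MOQT Key-Value-Pair rule that an even Type is followed by a single varint Value and an odd Type by a Length and that many bytes; all types defined here are even, so the odd branch only matters when skipping unknown types. Function names are hypothetical.

```python
from typing import Dict, List, Tuple, Union

KNOWN_METRICS = {0x02: "PLAYOUT_AHEAD_MS", 0x04: "ESTIMATED_BANDWIDTH_KBPS",
                 0x10: "PEER_RTT_US", 0x12: "PEER_LOSS_RATE"}


def encode_varint(v: int) -> bytes:
    """QUIC variable-length integer (RFC 9000, Section 16)."""
    for size, prefix in ((1, 0x00), (2, 0x40), (4, 0x80), (8, 0xC0)):
        if v < 1 << (8 * size - 2):
            return (v | (prefix << (8 * size - 8))).to_bytes(size, "big")
    raise ValueError("value exceeds 2^62 - 1")


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next position)."""
    length = 1 << (buf[pos] >> 6)
    value = buf[pos] & 0x3F
    for b in buf[pos + 1:pos + length]:
        value = (value << 8) | b
    return value, pos + length


def encode_optional_metrics(metrics: List[Tuple[int, int]]) -> bytes:
    out = bytearray(encode_varint(len(metrics)))
    for mtype, value in metrics:
        if mtype % 2:
            raise ValueError("this sketch only encodes even (varint) types")
        out += encode_varint(mtype) + encode_varint(value)
    return bytes(out)


def decode_optional_metrics(buf: bytes, pos: int = 0
                            ) -> Tuple[Dict[str, Union[int, bytes]], int]:
    count, pos = decode_varint(buf, pos)
    known: Dict[str, Union[int, bytes]] = {}
    for _ in range(count):
        mtype, pos = decode_varint(buf, pos)
        if mtype % 2 == 0:
            value, pos = decode_varint(buf, pos)
        else:  # odd type: Length, then opaque bytes (assumed KVP rule)
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        if mtype in KNOWN_METRICS:
            known[KNOWN_METRICS[mtype]] = value
        # Unrecognized Metric Types are skipped (MUST ignore).
    return known, pos
```

A decoder built this way keeps working when a peer adds application-defined metrics (0x20 and above) that it does not understand.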
5.6. Report Size Control
It is RECOMMENDED that a single MMF not exceed 1200 bytes, so that it
fits within a single QUIC packet.
When the number of Objects to report exceeds the capacity of a single
MMF, the receiver SHOULD:
* Prioritize including recent Object Entries (largest Object IDs).
* Trim the oldest Object Entries.
* Ensure Summary Stats covers the complete Report Interval window
(Summary Stats is not affected by Object Entry trimming).
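The trimming rule can be sketched as a greedy pass from the newest Object Entry to the oldest. This sketch assumes the caller already knows the encoded size of each Object Entry and of everything that is never trimmed (header fields, Summary Stats, Optional Metrics), excluding the Object Entry Count field, whose own varint length is accounted for here. Names are hypothetical.

```python
from typing import Dict, List, Tuple


def varint_len(v: int) -> int:
    """Encoded size of a QUIC variable-length integer (RFC 9000)."""
    if v < 0x40:
        return 1
    if v < 0x4000:
        return 2
    if v < 0x4000_0000:
        return 4
    return 8


def select_entries_for_budget(entry_sizes: Dict[int, int], fixed_bytes: int,
                              budget: int = 1200) -> Tuple[List[int], int]:
    """Keep the newest Object Entries that fit into `budget` bytes.

    entry_sizes: Object ID -> encoded size of its Object Entry in bytes.
    fixed_bytes: size of all untrimmed parts except Object Entry Count.
    Returns (kept Object IDs in ascending order, total MMF size).
    """
    kept: List[int] = []
    used = fixed_bytes + varint_len(0)  # Object Entry Count with 0 entries
    for oid in sorted(entry_sizes, reverse=True):  # largest Object ID first
        # One more entry may also lengthen the Object Entry Count varint.
        growth = varint_len(len(kept) + 1) - varint_len(len(kept))
        cost = entry_sizes[oid] + growth
        if used + cost > budget:
            break  # this entry and everything older is trimmed
        kept.append(oid)
        used += cost
    return sorted(kept), used
```

Stopping at the first entry that does not fit, rather than skipping ahead to smaller and older entries, keeps the reported Objects contiguous, which matches "trim the oldest".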
5.6.1. Encoding Example
The following shows the encoded structure of a typical MMF reporting
the delivery status of the 5 most recent Objects on the
audio_response Track. Object Entries are sorted in ascending order of
Object ID. Signed fields carry ZigZag-encoded unsigned values (see the
encoding convention at the beginning of Section 5).
```
MoQ Multimodal Feedback:
  Report Timestamp: 2000000 (2000000us = 2s since setup)
  Report Sequence: 10 (10th report)
  Object Entry Count: 5 (5 Objects)
  Object Entry [0]: (smallest Object ID, first in list)
    Object ID: 96
    Status: 0x00 (RECEIVED)
    Recv Ts Delta: 189999 (ZigZag(-95000): 95ms before Report Timestamp)
      (first received Object, baseline = Report Timestamp)
  Object Entry [1]:
    Object ID: 97
    Status: 0x02 (NOT_RECEIVED)
    (no Recv Ts Delta)
  Object Entry [2]:
    Object ID: 98
    Status: 0x01 (RECEIVED_LATE)
    Recv Ts Delta: 100000 (ZigZag(+50000): 50ms later than Object 96)
      (NOT_RECEIVED Object 97 skipped, baseline = Object 96)
  Object Entry [3]:
    Object ID: 99
    Status: 0x00 (RECEIVED)
    Recv Ts Delta: 40000 (ZigZag(+20000): 20ms later than Object 98)
  Object Entry [4]:
    Object ID: 100
    Status: 0x00 (RECEIVED)
    Recv Ts Delta: 40000 (ZigZag(+20000): 20ms later than Object 99)
  Summary Stats:
    Report Interval: 100000 (100000us = 100ms)
    Total Objects Evaluated: 5
    Objects Received: 3
    Objects Received Late: 1
    Objects Lost: 1
    Avg Inter-Arrival Delta: 6000 (ZigZag(+3000): average arrival
      interval 3ms larger than expected)
  Optional Metric Count: 2
  Optional Metric [0]:
    Metric Type: 0x02 (PLAYOUT_AHEAD_MS)
    Metric Value: 150 (playout headroom 150ms)
  Optional Metric [1]:
    Metric Type: 0x04 (ESTIMATED_BANDWIDTH_KBPS)
    Metric Value: 800 (estimated bandwidth 800kbps)
```
In this example, Object 97 is lost and Object 98 arrives late.
Object 98's delta (+50ms) is significantly larger than the normal
interval (~20ms), and the positive Avg Inter-Arrival Delta (+3ms)
indicates larger-than-expected arrival intervals. PLAYOUT_AHEAD_MS is
only 150ms. Combined, these signals indicate deteriorating network
conditions.
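To make the byte layout concrete, the following sketch serializes an MMF with every field encoded as a QUIC varint in the order used by the example, and with signed fields ZigZag-encoded. The field order and the varint encoding of Status are assumptions inferred from the example, not normative text. Whether an entry carries a Recv Ts Delta is left to the caller (None means omitted, as for the NOT_RECEIVED entry in the example). Names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Entry = Tuple[int, int, Optional[int]]  # (Object ID, Status, signed delta us)


def encode_varint(v: int) -> bytes:
    """QUIC variable-length integer (RFC 9000, Section 16)."""
    for size, prefix in ((1, 0x00), (2, 0x40), (4, 0x80), (8, 0xC0)):
        if v < 1 << (8 * size - 2):
            return (v | (prefix << (8 * size - 8))).to_bytes(size, "big")
    raise ValueError("value exceeds 2^62 - 1")


def zigzag(n: int) -> int:
    """Signed -> unsigned ZigZag mapping (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...)."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def encode_mmf(report_ts_us: int, report_seq: int, entries: Sequence[Entry],
               summary: Tuple[int, int, int, int, int, int],
               metrics: Sequence[Tuple[int, int]]) -> bytes:
    """summary = (Report Interval, Total, Received, Late, Lost, signed Avg Delta)."""
    out = bytearray()
    for v in (report_ts_us, report_seq, len(entries)):
        out += encode_varint(v)
    for object_id, status, delta_us in entries:  # ascending Object ID
        out += encode_varint(object_id) + encode_varint(status)
        if delta_us is not None:
            out += encode_varint(zigzag(delta_us))
    *counts, avg_delta_us = summary
    for v in (*counts, zigzag(avg_delta_us)):
        out += encode_varint(v)
    out += encode_varint(len(metrics))
    for metric_type, value in metrics:
        out += encode_varint(metric_type) + encode_varint(value)
    return bytes(out)
```

Under these assumptions, the five entries, Summary Stats, and two Optional Metrics of the example encode to 54 bytes, well within the 1200-byte recommendation of Section 5.6.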
6. Negotiation
6.1. Setup Parameter
During the MOQT Setup phase, both endpoints declare their Multimodal
Feedback capabilities via the following Setup Parameter:
   MOQT_MULTIMODAL_FEEDBACK Setup Parameter {
     Type = TBD1 (i),
     Length (i),
     Value (i),
   }
The Value field is a capability bitmap (varint) with the following
bit definitions:
+======+==================+========================================+
| Bit | Name | Description |
+======+==================+========================================+
| 0 | OUTPUT_FEEDBACK | Support output direction Feedback |
| | | Track (receiver-->sender) |
+------+------------------+----------------------------------------+
| 1 | OPTIONAL_METRICS | Support Optional Media Metrics |
| | | (Section 5.5) |
+------+------------------+----------------------------------------+
| 2 | INPUT_FEEDBACK | Support input direction Feedback Track |
| | | (sender-->receiver, Section 3.4) |
+------+------------------+----------------------------------------+
| 3-61 | Reserved | Sender MUST set to 0, receiver MUST |
| | | ignore |
+------+------------------+----------------------------------------+
Table 9
*Negotiation Example:*
Client                                                         Server
  |                                                               |
  | CLIENT_SETUP(                                                 |
  |   version=16,                                                 |
  |   params=[                                                    |
  |     {type=0x00, value=0x02},  (ROLE=Publisher)                |
  |     {type=TBD1, value=0x03}   (MULTIMODAL_FEEDBACK: bit0|1=1) |
  |   ])                                                          |
  |-------------------------------------------------------------->|
  |                                                               |
  | SERVER_SETUP(                                                 |
  |   version=16,                                                 |
  |   params=[                                                    |
  |     {type=0x00, value=0x03},  (ROLE=PubSub)                   |
  |     {type=TBD1, value=0x01}   (MULTIMODAL_FEEDBACK: bit0=1)   |
  |   ])                                                          |
  |<--------------------------------------------------------------|
  |                                                               |
  | Negotiation Result:                                           |
  |   bit0=1 (Output Feedback)                                    |
  |   bit1=0 (Client declared but Server did not,                 |
  |           Optional Metrics disabled)                          |
6.2. Capability Negotiation Rules
Feature enable conditions (both parties MUST declare corresponding
bit as 1):
+========================+==================+=================+
| Feature | Enable Condition | Dependency |
+========================+==================+=================+
| Output Feedback | Both bit0=1 | None |
+------------------------+------------------+-----------------+
| Optional Media Metrics | Both bit1=1 | Output Feedback |
| | | enabled |
+------------------------+------------------+-----------------+
| Input Feedback | Both bit2=1 | None |
+------------------------+------------------+-----------------+
Table 10
When a feature is not enabled:
* The receiver MUST NOT publish a Feedback Track in the corresponding
direction.
* When the sender receives a PUBLISH request for a Feedback Track in
an un-negotiated direction, it SHOULD respond with REQUEST_ERROR
(draft-ietf-moq-transport-17, Section 9.8).
* An MMF MUST NOT include un-negotiated optional fields (e.g.,
Optional Metrics).
6.3. Behavior When Parameter Not Declared
When the peer's Setup does not include the MOQT_MULTIMODAL_FEEDBACK
Parameter, this is equivalent to Value=0 (no Multimodal Feedback
capability supported).
In this case, the local endpoint MUST NOT proactively establish a
Feedback Track.
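The rules in Sections 6.1 through 6.3 amount to a bitwise AND of the two declared bitmaps plus one dependency. A minimal sketch, in which `None` stands for a peer whose Setup omitted the parameter; the function name and the returned keys are hypothetical.

```python
from typing import Dict, Optional

OUTPUT_FEEDBACK = 1 << 0   # bit 0
OPTIONAL_METRICS = 1 << 1  # bit 1
INPUT_FEEDBACK = 1 << 2    # bit 2
DEFINED_BITS = OUTPUT_FEEDBACK | OPTIONAL_METRICS | INPUT_FEEDBACK


def negotiate(local: int, peer: Optional[int]) -> Dict[str, bool]:
    """Features enabled for the session (both sides must set the bit)."""
    peer_bits = 0 if peer is None else peer  # absent parameter == Value 0
    common = local & peer_bits & DEFINED_BITS  # reserved bits are ignored
    output = bool(common & OUTPUT_FEEDBACK)
    return {
        "output_feedback": output,
        # Optional Media Metrics also require Output Feedback (Table 10).
        "optional_metrics": output and bool(common & OPTIONAL_METRICS),
        "input_feedback": bool(common & INPUT_FEEDBACK),
    }
```

With the values from the negotiation example in Section 6.1 (client 0x03, server 0x01), only Output Feedback is enabled.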
6.4. Runtime Capability Change
This version does not support runtime changes to Multimodal Feedback
capabilities. If a change is needed, the MoQ session MUST be
re-established.
7. Receiver Behavior
7.1. Arrival Time Recording
The receiver MUST record the arrival time of each Object. Arrival
time is defined as the moment when the last byte of the Object
arrives at the receiver's MoQ layer. Implementations MUST use a
monotonic clock that is unaffected by system time adjustments.
Timestamp resolution SHOULD be 1 millisecond or finer; microsecond
resolution is RECOMMENDED.
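A minimal sketch of arrival-time recording using Python's `time.monotonic_ns()`, which is not affected by system clock changes. The choice of origin (here, when the recorder is created, e.g., at session setup), the `(track, object_id)` key, and the class name are assumptions. Keeping only the first arrival also covers the duplicate-Object rule in Section 7.5.

```python
import time
from typing import Dict, Hashable


class ArrivalRecorder:
    """Records per-Object arrival times in microseconds on a monotonic clock."""

    def __init__(self) -> None:
        self._origin_ns = time.monotonic_ns()  # e.g. taken at session setup
        self.arrivals: Dict[Hashable, int] = {}

    def now_us(self) -> int:
        return (time.monotonic_ns() - self._origin_ns) // 1000

    def on_last_byte(self, key: Hashable) -> int:
        """Call when the last byte of an Object reaches the MoQ layer.

        Returns the recorded arrival time; a duplicate arrival keeps the
        first recorded time.
        """
        return self.arrivals.setdefault(key, self.now_us())
```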
7.2. Delivery Status Determination
The receiver MUST maintain the delivery status of each known Object:
* The last byte of the Object arrives within the deadline: RECEIVED
(0x00).
* The last byte of the Object arrives but exceeds the deadline:
RECEIVED_LATE (0x01).
* The Object has not arrived but is reasonably considered lost (see
Section 5.3): NOT_RECEIVED (0x02).
* The Object arrived only partially and its carrying Stream was reset
or closed, or, for Datagrams, the remaining data is no longer
expected (see Section 5.3.3): PARTIALLY_RECEIVED (0x03).
Object status MAY be updated in a subsequent MMF, for example from
NOT_RECEIVED to RECEIVED when a delayed Object is eventually
received. Therefore, counts such as Objects Lost in Summary Stats
reflect a snapshot taken at report generation time and are not
guaranteed to match the final statistics. The sender SHOULD treat
them as immediate signals rather than precise statistics.
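The deadline comparison and the later status update can be sketched as follows. The sketch assumes a per-Object playback deadline on the same monotonic clock as the arrival time, or `None` when no deadline exists (in which case RECEIVED is reported rather than RECEIVED_LATE, as stated in Section 5.3). Treating an arrival exactly at the deadline as on time is an assumption, and all names are hypothetical.

```python
from typing import Dict, Optional

RECEIVED, RECEIVED_LATE, NOT_RECEIVED, PARTIALLY_RECEIVED = 0x00, 0x01, 0x02, 0x03


def completed_status(arrival_us: int, deadline_us: Optional[int]) -> int:
    """Status of an Object whose last byte has arrived."""
    if deadline_us is None:
        return RECEIVED  # no playback deadline: never RECEIVED_LATE
    # Arrival exactly at the deadline counts as on time (assumption).
    return RECEIVED if arrival_us <= deadline_us else RECEIVED_LATE


class StatusTable:
    """Current status snapshot per Object ID; later MMFs may report changes."""

    def __init__(self) -> None:
        self.status: Dict[int, int] = {}

    def mark_lost(self, object_id: int, partial: bool = False) -> None:
        # Never downgrade an Object that has already completed.
        if self.status.get(object_id) not in (RECEIVED, RECEIVED_LATE):
            self.status[object_id] = PARTIALLY_RECEIVED if partial else NOT_RECEIVED

    def on_complete(self, object_id: int, arrival_us: int,
                    deadline_us: Optional[int]) -> None:
        # May replace an earlier NOT_RECEIVED snapshot when a delayed
        # Object eventually arrives.
        self.status[object_id] = completed_status(arrival_us, deadline_us)
```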
7.3. MMF Generation Frequency
The receiver SHOULD generate MMFs at the following recommended
frequencies:
+==================+=============+==================================+
| Scenario | Recommended | Description |
| | Frequency | |
+==================+=============+==================================+
| Audio Track (~50 | Every | High-frequency Objects |
| Object/s) | 50-100ms | require dense feedback |
+------------------+-------------+----------------------------------+
| Video Track | Every | Low-frequency Objects can |
| (~2-30 Object/s) | 100-200ms | reduce feedback frequency |
+------------------+-------------+----------------------------------+
| No new Objects | Every | Heartbeat to prevent |
| arriving | 500ms-1s | sender from misjudging |
| | | connection state |
+------------------+-------------+----------------------------------+
Table 11
MMFs SHOULD NOT be generated more often than once per 50ms, to avoid
the feedback itself consuming excessive bandwidth, and SHOULD NOT be
generated less often than once per 2s, to ensure the sender receives
timely feedback.
When receiver detects rapid deterioration in delivery quality (e.g.,
consecutive Object losses), it MAY immediately generate an additional
MMF (without waiting for the scheduled cycle) to accelerate sender
response.
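A minimal sketch of these timing rules, assuming microsecond monotonic timestamps. The per-scenario intervals are hypothetical picks from the ranges in Table 11, and applying the 50 ms floor to the additional MMF triggered by deterioration is a design choice of this sketch rather than a requirement of this section.

```python
MIN_GAP_US = 50_000        # SHOULD NOT generate more than once per 50 ms
MAX_GAP_US = 2_000_000     # SHOULD NOT generate less than once per 2 s

# Hypothetical targets chosen from the ranges in Table 11.
SCHEDULE_US = {"audio": 50_000, "video": 150_000, "idle": 1_000_000}


def should_send_mmf(now_us: int, last_sent_us: int, scenario: str,
                    deteriorating: bool = False) -> bool:
    """Decide whether to generate an MMF now."""
    elapsed = now_us - last_sent_us
    if deteriorating:
        # Extra, unscheduled MMF to speed up the sender's reaction.
        return elapsed >= MIN_GAP_US
    # Clamp the configured interval into the [50 ms, 2 s] bounds.
    interval = min(max(SCHEDULE_US[scenario], MIN_GAP_US), MAX_GAP_US)
    return elapsed >= interval
```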
7.4. Object Entry Selection
When generating MMF, receiver SHOULD follow these Object Entry
selection strategies:
* Prioritize covering recent Objects (most recent in time).
* It is RECOMMENDED that an MMF contain no more than 50 Object
Entries.
* Objects already reported in a previous MMF whose status has not
changed MAY be omitted.
* Objects with status NOT_RECEIVED SHOULD be reported continuously
(for at least 3 MMF cycles) until their status changes or they fall
outside the report window.
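These selection rules can be sketched as below. The sketch assumes that Object IDs increase with time within the reported Track, so the largest IDs are the most recent, and that the caller keeps Objects in `current` for at least 3 cycles and removes them once they fall outside the report window; under those assumptions NOT_RECEIVED entries are simply re-reported on every cycle until their status changes. Names are hypothetical.

```python
from typing import Dict, List

NOT_RECEIVED = 0x02  # Delivery Status code (Section 5.3)


def select_object_entries(current: Dict[int, int], previous: Dict[int, int],
                          max_entries: int = 50) -> List[int]:
    """Choose the Object IDs to report in the next MMF.

    current: Object ID -> current status, for Objects still tracked.
    previous: Object ID -> status reported in the previous MMF.
    """
    chosen: List[int] = []
    for object_id in sorted(current, reverse=True):  # most recent first
        status = current[object_id]
        unchanged = previous.get(object_id) == status
        # Unchanged entries MAY be omitted, except NOT_RECEIVED entries,
        # which keep being reported until their status changes.
        if unchanged and status != NOT_RECEIVED:
            continue
        chosen.append(object_id)
        if len(chosen) == max_entries:
            break
    return sorted(chosen)  # Object Entries are listed in ascending order
```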
7.5. Exception Handling
* *Object Out-of-Order Arrival:* The receiver MUST record actual
arrival times without reordering. Arrival time reflects actual
network behavior; out-of-order arrival is itself a valid network
signal.
* *Duplicate Objects:* The receiver SHOULD ignore duplicate arrivals
and retain the first arrival time.
* *Feedback Track Publish Failure:* The receiver SHOULD retry
establishing the Feedback Track using exponential backoff; an
initial retry interval of 1s and a maximum interval of 30s are
RECOMMENDED.
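A minimal sketch of the retry loop for Feedback Track establishment with the RECOMMENDED exponential backoff (1 s initial, 30 s cap). `try_publish` is a hypothetical stand-in for whatever call the MoQ implementation uses to publish the Feedback Track, and `sleep` is injectable so the schedule can be checked without waiting.

```python
import time
from typing import Callable


def publish_with_backoff(try_publish: Callable[[], bool],
                         max_attempts: int = 10,
                         initial_s: float = 1.0,
                         max_s: float = 30.0,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry Feedback Track establishment with capped exponential backoff."""
    delay = initial_s
    for attempt in range(max_attempts):
        if try_publish():
            return True
        if attempt + 1 < max_attempts:  # no wait after the final attempt
            sleep(delay)
            delay = min(delay * 2, max_s)
    return False
```

With six consecutive failures the waits are 1, 2, 4, 8, 16, and 30 seconds; every later retry also waits 30 seconds.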
8. Sender Behavior
8.1. Object to QUIC Packet Mapping
If a sender needs to correlate MMF feedback with local transmission
events, it MUST maintain the mapping relationship between Objects and
QUIC packets. Without this mapping, the sender cannot determine the
number of packets, packet loss count, and delivery duration for a
single video frame, and the per-Object status reported by MMF cannot
be aligned with sender-side statistics.
8.1.1. Object Granularity
In real-time media scenarios, a MoQ Object SHOULD correspond to an
independently decodable media unit (a video frame or an audio frame).
This enables per-Object feedback in MMF to directly carry frame-level
semantics: Object loss = frame loss, Object expiration = frame
expiration. If an Object contains multiple frames or a single frame
spans multiple Objects, there will be deviation between the delivery
status reported by MMF and the actual media quality.
8.1.2. Packet Not Crossing Object Boundaries
Senders SHOULD avoid merging data from different Objects into the
same QUIC packet.
If a packet contains data from two Objects, the loss or delay of that
packet cannot be attributed to a single Object, leading to:
* Distorted frame-level packet loss statistics (a single packet loss
affects the statistics of both frames)
* Inaccurate frame-level delivery time measurement (the transmission
times of the two frames overlap)
* Inaccurate per-Object arrival times at the receiver
In practice, when the QUIC send queue attempts to append new data to
an existing packet, it SHOULD check whether both belong to the same
Object; if not, it SHOULD start a new packet.
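The packing rule can be illustrated with a toy packetizer that appends Object data to the current packet only when that packet already belongs to the same Object. This is a simplified model, not a QUIC stack: packets are built from a queue before anything is sent, `max_payload` stands in for the space left for STREAM or DATAGRAM frame data after headers, and the `Packet` type is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Packet:
    object_id: int
    payload: bytearray = field(default_factory=bytearray)


def packetize(chunks: Iterable[Tuple[int, bytes]],
              max_payload: int = 1200) -> List[Packet]:
    """Pack (object_id, data) chunks, in send order, without mixing Objects."""
    packets: List[Packet] = []
    for object_id, data in chunks:
        view = memoryview(data)
        while len(view) > 0:
            current = packets[-1] if packets else None
            # Open a new packet if there is none, the last one is full, or
            # it carries data of a different Object.
            if (current is None or current.object_id != object_id
                    or len(current.payload) >= max_payload):
                current = Packet(object_id)
                packets.append(current)
            room = max_payload - len(current.payload)
            current.payload += view[:room]
            view = view[room:]
    return packets
```

For example, a 1500-byte Object followed by a 100-byte Object yields packets of 1200, 300, and 100 bytes rather than filling the second packet with data from both Objects.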
8.1.3. Sender-Side per-Object Transmission Statistics
Senders SHOULD maintain the following transmission statistics for
each Object:
+=================+=============================================+
| Statistic | Description |
+=================+=============================================+
| sent_packets | Number of QUIC packets sent for this Object |
+-----------------+---------------------------------------------+
| lost_packets | Number of QUIC packets lost for this Object |
+-----------------+---------------------------------------------+
| first_sent_time | Transmission time of the first packet for |
| | this Object |
+-----------------+---------------------------------------------+
| all_acked_time | Time when all packets for this Object are |
| | acknowledged |
+-----------------+---------------------------------------------+
Table 12
Based on these statistics, the sender can compute:
* *Frame-level bandwidth sample:* object_size / (all_acked_time -
first_sent_time)
* *Frame-level packet loss rate:* lost_packets / sent_packets
* *Frame-level delivery duration:* all_acked_time - first_sent_time
These metrics provide the foundation for frame-level BWE and frame-
level pacing (Section 8.1.4). Standard CC algorithms (BBR, CUBIC)
sample bandwidth at the packet level. Frame-level bandwidth sampling
serves as a complementary approach, suitable for real-time video
scenarios with large frame size variations (I-frames may be 10x
larger than P-frames), helping to reduce noise from single-packet
sampling.
Whether a CC algorithm provides BWE output depends on the algorithm
type. Model-based algorithms such as BBR and GCC provide bandwidth
estimation; pure loss-based algorithms like CUBIC only output cwnd
without explicit BWE. When the CC algorithm does not provide BWE,
the MoQ layer MAY use frame-level bandwidth sampling as the bandwidth
estimation source.
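A minimal sketch of the per-Object statistics in Table 12 and the three derived metrics. It assumes the sender's QUIC stack reports send, loss, and acknowledgment events per Object (Section 8.1), with times in microseconds; the class and method names are hypothetical. The bandwidth sample is converted from bytes per unit time to bits per second.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectTxStats:
    object_size: int                         # bytes in the Object
    sent_packets: int = 0
    lost_packets: int = 0
    first_sent_time_us: Optional[int] = None
    all_acked_time_us: Optional[int] = None

    def on_packet_sent(self, now_us: int) -> None:
        self.sent_packets += 1
        if self.first_sent_time_us is None:
            self.first_sent_time_us = now_us

    def on_packet_lost(self) -> None:
        self.lost_packets += 1

    def on_all_acked(self, now_us: int) -> None:
        self.all_acked_time_us = now_us

    def delivery_duration_us(self) -> Optional[int]:
        # all_acked_time - first_sent_time
        if self.first_sent_time_us is None or self.all_acked_time_us is None:
            return None
        return self.all_acked_time_us - self.first_sent_time_us

    def bandwidth_sample_bps(self) -> Optional[int]:
        # object_size / (all_acked_time - first_sent_time), in bits per second.
        duration = self.delivery_duration_us()
        if not duration:
            return None
        return self.object_size * 8 * 1_000_000 // duration

    def packet_loss_rate(self) -> float:
        # lost_packets / sent_packets
        return self.lost_packets / self.sent_packets if self.sent_packets else 0.0
```

For instance, a 30000-byte frame sent in 25 packets whose last acknowledgment arrives 60 ms after the first send gives a 4 Mbps frame-level bandwidth sample.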
8.1.4. Frame-Level Pacing
Real-time video exhibits significant frame size variations. When
using a global fixed pacing rate for transmission, large frames
(I-frames) will burst a large number of packets in a short time,
while small frames (P-frames) underutilize the sending window.
Senders SHOULD calculate pacing intervals at Object granularity.
For an Object sent in more than one packet, the packet sending
interval is RECOMMENDED to be computed as follows:
   pkt_send_interval = media_pace_duration / (sent_packets - 1)
where media_pace_duration is the media duration corresponding to this
Object (e.g., 33ms@30fps for video, 20ms@50fps for audio) and
sent_packets is the number of QUIC packets for this Object. The first
packet of an Object is sent immediately, and subsequent packets are
sent at equal intervals of pkt_send_interval. An Object that fits in
a single packet is simply sent immediately.
This approach distributes an Object's data uniformly across its frame
interval. I-frames have denser packet intervals, P-frames have
sparser intervals, but neither produces bursts.
Frame-level pacing may conflict with CC's pacing rate. When they are
inconsistent, senders SHOULD use the lower sending rate to prevent
frame-level pacing from bypassing CC's congestion control.
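The per-Object pacing rule and the "lower rate wins" reconciliation with the CC pacing rate can be sketched as follows. Converting the CC pacing rate into a per-packet gap for a given packet size is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List


def packet_send_offsets_us(num_packets: int,
                           media_pace_duration_us: int) -> List[int]:
    """Send offsets relative to the Object's first packet (frame-level pacing)."""
    if num_packets <= 0:
        return []
    if num_packets == 1:
        return [0]  # single-packet Object: sent immediately
    interval = media_pace_duration_us / (num_packets - 1)
    return [round(i * interval) for i in range(num_packets)]


def effective_interval_us(num_packets: int, media_pace_duration_us: int,
                          packet_bytes: int, cc_pacing_rate_bps: int) -> float:
    """Inter-packet gap honoring both frame-level pacing and the CC pacing rate."""
    frame_gap = (media_pace_duration_us / (num_packets - 1)
                 if num_packets > 1 else 0.0)
    cc_gap = packet_bytes * 8 * 1_000_000 / cc_pacing_rate_bps
    # A lower sending rate means a longer gap, so take the maximum.
    return max(frame_gap, cc_gap)
```

For an I-frame of 11 packets at 30 fps (33 ms), frame-level pacing spaces packets 3.3 ms apart; if the CC pacing rate were only 1.92 Mbps, 1200-byte packets would instead be spaced 5 ms apart.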
8.2. Application-Layer Consumption
After the sender's MoQ layer receives MMF, it exposes the following
information to the inference scheduler/ABR through application-layer
callbacks:
* Delivery quality (loss rate, expiration rate) of the associated
media Track
* Per-Object status and timestamps (can be used for detailed
analysis)
* Optional media metrics (playout headroom, bandwidth estimation,
etc.)
The inference scheduler makes decisions based on the above
information (see Section 9 for details).
8.3. Transport-Layer Consumption (Cross-Layer Control)
The MoQ layer synthesizes MMF signals and local CC output, computes
control decisions, and issues them to CC algorithms via the cross-
layer control interface (Section 3.3). CC algorithms do not directly
parse MMF, but instead execute control commands from the MoQ layer.
Decision inputs, issuable commands, and CC behavior are described in
Section 3.3.
8.4. Example: MoQ Layer Controlling BBR
The core formula of BBR is pacing_rate = bandwidth x pacing_gain.
BBR itself estimates bandwidth through ACK sampling and controls
pacing_gain via its state machine. In cross-layer control mode, the
MoQ layer takes over control of pacing_gain. BBR remains responsible
for bandwidth estimation, while pacing_gain is determined by the MoQ
layer based on MMF feedback:
```
MoQ Layer:
  1. Read BBR's BWE
  2. Read MMF: Objects Lost, Late, PLAYOUT_AHEAD_MS, Delta
  3. Compute pacing_gain (comprehensive judgment)
  4. Issue pacing_gain --> CC
BBR:
  1. Receive pacing_gain
  2. pacing_rate = BWE x pacing_gain
  3. Send at pacing_rate
```
Example logic for MoQ layer computing pacing_gain:
1. *Objects Lost > 0 and BBR local loss = 0* --> Reduce pacing_gain
to 1.0 (CC local ACK is normal, but peer is actually losing
frames; stop probing upward)
2. *PLAYOUT_AHEAD_MS < 100ms* --> Reduce pacing_gain to 1.0
(Insufficient playout headroom; avoid aggressive probing)
3. *Avg Inter-Arrival Delta consistently positive* --> Reduce
pacing_gain to 0.9 (Receiver-side queuing is worsening;
proactively reduce speed)
4. *High proportion of Objects Received Late* --> Reduce
target_bitrate (Transmission has no packet loss but delay exceeds
deadline; reduce per-frame data volume at the source)
5. *None of the above conditions met* --> Do not issue commands; BBR
runs according to its own state machine
When the MoQ layer issues no commands, BBR operates autonomously
according to its own state machine. The MoQ layer overrides
pacing_gain only when MMF signals indicate that intervention is
needed.
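The example logic above can be sketched as a pure function from one MMF's signals to a command. The thresholds ("consistently positive" as three consecutive positive reports, "high proportion" as at least 20% late Objects), the choice to take the lowest gain when several rules fire, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

LATE_RATIO_THRESHOLD = 0.2   # hypothetical "high proportion" of late Objects
POSITIVE_DELTA_REPORTS = 3   # hypothetical "consistently positive" window


@dataclass
class MmfSignals:
    objects_lost: int
    objects_late: int
    total_evaluated: int
    playout_ahead_ms: Optional[int]        # None if not reported
    recent_avg_deltas_us: List[int]        # Avg Inter-Arrival Delta history


@dataclass
class Command:
    pacing_gain: Optional[float] = None    # None: let BBR run on its own
    reduce_target_bitrate: bool = False


def decide(sig: MmfSignals, bbr_local_loss: int) -> Command:
    gains: List[float] = []
    # Rule 1: peer loses Objects although local ACKs show no loss.
    if sig.objects_lost > 0 and bbr_local_loss == 0:
        gains.append(1.0)
    # Rule 2: little playout headroom left; stop probing upward.
    if sig.playout_ahead_ms is not None and sig.playout_ahead_ms < 100:
        gains.append(1.0)
    # Rule 3: receiver-side queuing keeps growing.
    recent = sig.recent_avg_deltas_us[-POSITIVE_DELTA_REPORTS:]
    if len(recent) == POSITIVE_DELTA_REPORTS and all(d > 0 for d in recent):
        gains.append(0.9)
    # Rule 4: many Objects miss their deadline; shrink frames at the source.
    late_ratio = (sig.objects_late / sig.total_evaluated
                  if sig.total_evaluated else 0.0)
    return Command(
        pacing_gain=min(gains) if gains else None,  # Rule 5 when empty
        reduce_target_bitrate=late_ratio >= LATE_RATIO_THRESHOLD,
    )
```

Keeping the function free of side effects makes it straightforward to replay recorded MMF sequences against it when tuning the thresholds.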
9. Application Scenarios: Streaming Media and AI Inference
This section describes the usage of MMF in streaming media and AI
inference scenarios. Section 9.1 presents a general adaptation
framework, and Section 9.2 illustrates with concrete use cases.
9.1. MMF-Driven Rate-Quality Adaptation
The following adaptation rules apply to all MoQ streaming media
scenarios:
+==================+===============+==============+==============+
| MMF Field | Adaptation | Effect | Applicable |
| (Section) | Action | | Scenarios |
+==================+===============+==============+==============+
| PLAYOUT_AHEAD_MS | Reduce | Less data | Video/Audio/ |
| downward trend | encoding | per frame | Inference |
| (5.5) | bitrate | | |
+------------------+---------------+--------------+--------------+
| PLAYOUT_AHEAD_MS | Reduce frame | Reduced | Video |
| downward trend | rate / Object | transmission | |
| (5.5) | sending | volume | |
| | frequency | | |
+------------------+---------------+--------------+--------------+
| Objects Lost > 0 | Reduce | Match | All |
| (5.4.5) | sending rate | available | |
| | (with CC) | bandwidth | |
+------------------+---------------+--------------+--------------+
| Objects Received | Reduce | Trade | All |
| Late > 0 (5.4.4) | quality to | quality for | |
| | meet deadline | timeliness | |
+------------------+---------------+--------------+--------------+
| Avg Inter- | Preventive | Avoid sudden | All |
| Arrival Delta | quality | degradation | |
| increasing | reduction | | |
| (5.4.6) | (before | | |
| | packet loss) | | |
+------------------+---------------+--------------+--------------+
| PLAYOUT_AHEAD_MS | Gradually | Improve | All |
| recovery (5.5) | restore | experience | |
| | bitrate/frame | | |
| | rate | | |
+------------------+---------------+--------------+--------------+
Table 13
These adaptations can be executed at the MoQ publishing layer with
per-Object granularity and latency < 1 frame cycle, without requiring
application-layer media source cooperation. When the application
layer is an AI inference pipeline, MMF can also adjust inference
pipeline parameters (real-time flush, dynamic chunk_size).
Unidirectional and bidirectional adaptation modes are described in
Section 3.4 (Bidirectional MMF).
9.2. Typical Use Cases
The following use cases assume BBR as the CC algorithm (integration
method described in Section 8.4) and illustrate the effects of MMF in
different scenarios. All cases follow the strategy of prioritizing
audio output and progressively reducing quality.
9.2.1. Use Case A: Video Live Streaming ABR Adaptation
* *Scenario:* Sender publishes video live streaming with two Tracks:
1080p (3Mbps) and 720p (1.5Mbps). Receiver is currently
subscribed to the 1080p Track.
* *Trigger condition:* Available network bandwidth drops from 4Mbps
to 2Mbps.
* *MMF signals:* Proportion of Objects Received Late increases
(video frames arrive but exceed deadline), PLAYOUT_AHEAD_MS
gradually decreases.
* *CC response:* BBR reduces pacing_rate to match new bandwidth.
* *Application-layer response:* MoQ publishing layer detects
persistently high Objects Received Late on the 1080p Track,
triggers ABR switching: switches to 720p Track at the next Group
boundary (native MoQ Track switching mechanism).
* *Recovery:* After MMF reports Objects Received Late returning to
normal and PLAYOUT_AHEAD_MS recovering, sender may attempt to
switch back to 1080p.
* *Benefit:* Per-frame integrity and deadline information provided
by MMF enables higher ABR decision accuracy than pure CC-driven
approaches.
9.2.2. Use Case B: Bandwidth Drop (Audio-Video Mixed)
* *Trigger condition:* Available bandwidth drops sharply.
* *MMF signals:* Objects Lost increases, Avg Inter-Arrival Delta
increases, PLAYOUT_AHEAD_MS decreases.
* *CC response:* MoQ layer issues pacing_gain=1.0 based on MMF loss
signal, pauses upward probing.
* *Application-layer response:* Reduce Opus bitrate, pause video
Track (prioritize audio).
* *Benefit:* Peer-side frame loss signal from MMF and congestion
signal from BBR local ACK can cross-validate. Gradually restore
bitrate after network recovery.
9.2.3. Use Case C: Multi-Layer Quality Adaptation (AI Inference)
* *Trigger condition:* Persistent congestion.
* *MMF signals:* Objects Lost persistently high, PLAYOUT_AHEAD_MS at
low level.
* *CC response:* MoQ layer continuously issues low pacing_gain,
pauses upward probing.
* *Application-layer response:* Based on general adaptation from Use
Case B, inference pipeline reduces chunk_size, accelerates audio
flush, reduces output latency. CC and application layer adjust
synchronously.
* *Benefit:* Frame-level adaptation takes effect within one frame
cycle, and audio remains uninterrupted throughout. During recovery,
chunk_size and bitrate are gradually restored.
9.2.4. Use Case D: Generation Rate Overload (AI Inference)
* *Trigger condition:* Inference model suddenly generates long
response, audio_response traffic spikes.
* *MMF signals:* High proportion of Objects Received Late (frames
arrive but exceed playback deadline), Avg Inter-Arrival Delta is
large.
* *CC response:* Reduce pacing_rate.
* *Application-layer response:* Reduce encoding bitrate, accelerate
flush, guide inference to generate more concise responses (reduce
audio duration at the source).
* *Benefit:* Trade quality for timeliness, avoid playback
stuttering.
9.2.5. Use Case E: Streaming Input Inference (Bidirectional MMF)
* *Trigger condition:* Uplink network jitter (streaming input
inference scenario).
* *MMF signals:* Input MMF reports high proportion of NOT_RECEIVED.
* *Application-layer response:*
- *Client:* Increase chunk size, reduce uplink bitrate, pause
video input.
- *Server:* Manage KV cache, adjust handling strategy for
incomplete input.
* *Benefit:* Bidirectional MMF coordinates quality of both uplink
and downlink paths. MMF session activity signals (feedback
frequency, PLAYOUT_AHEAD_MS) can be used for KV cache eviction and
priority decisions.
10. IANA Considerations
This document defines the following code points for registration:
* *TBD1:* MOQT_MULTIMODAL_FEEDBACK Setup Parameter Type
(Section 6.1)
* *Delivery Status Code Registry* (Section 5.3):
- 0x00: RECEIVED
- 0x01: RECEIVED_LATE
- 0x02: NOT_RECEIVED
- 0x03: PARTIALLY_RECEIVED
* *Optional Metrics Type Registry* (Section 5.5):
- 0x02: PLAYOUT_AHEAD_MS
- 0x04: ESTIMATED_BANDWIDTH_KBPS
- 0x10: PEER_RTT_US
- 0x12: PEER_LOSS_RATE
11. Acknowledgments
The design of this document references QUIC Extended ACK Receive
Timestamps, RTP Transport Congestion Control Feedback, and drafts
from the MoQ Working Group (MOQT / MSF / Metrics), as well as real-
time multimodal inference systems (LongCat-Flash-Omni, Qwen3-Omni,
Voxtral-Realtime), the vLLM-Omni Stage Pipeline framework, and vLLM
streaming input mode.
12. Normative References
[MoQTransport]
Nandakumar, S., Vasiliev, V., Swett, I., and A. Frindell,
"Media over QUIC Transport", Work in Progress, Internet-
Draft, draft-ietf-moq-transport-17, 2 March 2026,
<https://datatracker.ietf.org/doc/html/draft-ietf-moq-
transport-17>.
[quic-receive-ts]
Swett, I. and J. Beshay, "QUIC Extended Acknowledgement
for Reporting Packet Receive Timestamps", Work in
Progress, Internet-Draft, draft-ietf-quic-receive-ts-01, 2
March 2026, <https://datatracker.ietf.org/doc/html/draft-
ietf-quic-receive-ts-01>.
[QUIC-TRANSPORT]
Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
Multiplexed and Secure Transport", RFC 9000,
DOI 10.17487/RFC9000, May 2021,
<https://www.rfc-editor.org/rfc/rfc9000>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.
Authors' Addresses
Minghui Jiang
Alibaba Inc.
Email: shimei.jmh@alibaba-inc.com
Yanmei Liu
Alibaba Inc.
Email: miaoji.lym@alibaba-inc.com
Ronghua Wu
Ant Group.
Email: r.wu@antgroup.com