Independent Submission L. Curley
Internet-Draft Twitch
Intended status: Informational 9 February 2022
Expires: 13 August 2022
Warp - Segmented Live Video Transport
draft-lcurley-warp-00
Abstract
This document defines the core behavior for Warp, a segmented live
video transport protocol. Warp maps live media to QUIC streams based
on the underlying media encoding. Media is prioritized to minimize
latency during congestion.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 13 August 2022.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Curley Expires 13 August 2022 [Page 1]
Internet-Draft WARP February 2022
Table of Contents
1. Overview
1.1. Terms and Definitions
2. Segments
2.1. Initialization
2.2. Media
2.2.1. Video
2.2.2. Audio
3. Streams
3.1. Messages
3.2. Segments
3.3. Prioritization
3.3.1. Live Content
3.3.2. Recorded Content
3.4. Cancellation
3.5. Middleware
4. Messages
4.1. init
4.2. media
4.3. priority
4.4. Extensions
5. Security Considerations
6. IANA Considerations
7. References
7.1. Normative References
7.2. Informative References
Contributors
Author's Address
1. Overview
Warp is a live video transport protocol that utilizes the [QUIC]
network protocol.
The live stream is split into segments (Section 2) at I-frame
boundaries. These are fragmented MP4 files as defined in [ISOBMFF].
Initialization segments contain track metadata while media segments
contain either video or audio samples.
QUIC streams (Section 3) are used to transfer messages and segments
between endpoints. These streams are prioritized based on the
contents, such that the most important media is delivered during
congestion.
Messages (Section 4) are sent over streams alongside segments. These
are used to carry necessary metadata and control messages.
1.1. Terms and Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
Commonly used terms in this document are described below.
Frame: An image to be rendered at a specific point in time.
I-frame: A frame that does not depend on the contents of other
frames.
Group of pictures (GOP): An I-frame followed by a sequential series
of dependent frames.
Group of samples: A sequential series of audio samples starting at a
given timestamp.
Segment: A sequence of video frames and/or audio samples serialized
into a container.
Presentation Timestamp (PTS): A point in time when video/audio
should be presented to the viewer.
Media producer: An endpoint sending media over the network.
Media consumer: An endpoint receiving media over the network.
Congestion: Packet loss and queuing caused by degraded or overloaded
networks.
2. Segments
The live stream is split into segments before being transferred over
the network. Segments are fragmented MP4 files as defined by
[ISOBMFF].
There are two types of segments: initialization and media.
2.1. Initialization
Initialization segments contain track metadata but no sample data.
Initialization segments MUST consist of a File Type Box ('ftyp')
followed by a Movie Box ('moov'). This Movie Box consists of Movie
Header Boxes ('mvhd'), Track Header Boxes ('tkhd'), Track Boxes
('trak'), followed by a final Movie Extends Box ('mvex'). These
boxes MUST NOT contain any samples and MUST have a duration of zero.
Note that a Common Media Application Format Header [CMAF] meets all
these requirements.
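As a non-normative illustration, the top-level layout required above
can be checked with a short Python sketch. The function names are
hypothetical, and the check covers only the 'ftyp' then 'moov'
ordering; it does not inspect the contents of the Movie Box, so the
zero-duration and no-samples requirements are not verified here.

```python
import struct


def top_level_boxes(data: bytes):
    """Yield (type, payload) for each top-level ISOBMFF box in data."""
    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError("truncated box header")
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit largesize follows the type field
            if offset + 16 > len(data):
                raise ValueError("truncated box header")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # the box extends to the end of the data
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("invalid box size")
        yield box_type.decode("latin-1"), data[offset + header:offset + size]
        offset += size


def is_init_segment(data: bytes) -> bool:
    """Check that the top-level layout is exactly 'ftyp' then 'moov'."""
    return [box_type for box_type, _ in top_level_boxes(data)] == ["ftyp", "moov"]
```

Malformed input raises ValueError rather than returning False, so a
caller can distinguish a damaged segment from a segment of the wrong
type.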
2.2. Media
Media segments contain media samples for a single track.
Media segments MUST consist of a Segment Type Box ('styp') followed
by at least one media fragment. Each media fragment consists of a
Movie Fragment Box ('moof') followed by a Media Data Box ('mdat').
The Movie Fragment Box MUST contain a Movie Fragment Header Box
('mfhd') and a Track Fragment Box ('traf') with a Track ID
('track_ID') matching a Track Box in the initialization segment.
Note that a Common Media Application Format Segment [CMAF] meets all
these requirements.
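The media segment layout can be sketched in the same non-normative
way. The Python below assumes 32-bit box sizes only and uses
hypothetical function names. It checks that a 'styp' box is followed
by one or more ('moof', 'mdat') pairs; it does not look inside the
Movie Fragment Box, so the Track ID match is not verified.

```python
import struct


def box_types(data: bytes) -> list[str]:
    """Return the types of the top-level boxes in data (32-bit sizes only)."""
    types, offset = [], 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError("truncated box header")
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > len(data):
            raise ValueError("unsupported or invalid box size")
        types.append(box_type.decode("latin-1"))
        offset += size
    return types


def is_media_segment(data: bytes) -> bool:
    """Check for 'styp' followed by one or more ('moof', 'mdat') pairs."""
    types = box_types(data)
    if len(types) < 3 or types[0] != "styp":
        return False
    fragments = types[1:]
    if len(fragments) % 2 != 0:
        return False
    pairs = zip(fragments[0::2], fragments[1::2])
    return all(pair == ("moof", "mdat") for pair in pairs)
```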
2.2.1. Video
Media segments containing video data MUST start with an I-frame.
Media fragments MAY contain a single frame, minimizing latency at the
cost of a small increase in segment size. Video frames MUST be in
decode order.
2.2.2. Audio
Media fragments MAY contain a single group of audio samples,
minimizing latency at the cost of a small increase in segment size.
3. Streams
Warp uses unidirectional QUIC streams to transfer messages and
segments over the network. The establishment of the QUIC connection
is outside the scope of this document.
An endpoint MAY both send media (producer) and receive media
(consumer). This is accomplished by sending messages and segments
over unidirectional streams. Streams contain any number of messages
and segments concatenated together.
3.1. Messages
Messages are used to control playback or carry metadata about
upcoming segments.
A Warp Box ('warp') is a top-level MP4 box as defined in [ISOBMFF].
The contents of this box are a Warp message. See the messages section
(Section 4) for the encoding and types available.
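One possible framing of a message is sketched below in Python. It
assumes that the JSON text is UTF-8 encoded and placed directly in the
box body, and that the box uses a 32-bit size field; neither detail is
stated in this document. The function names are hypothetical.

```python
import json
import struct


def encode_warp_box(message: dict) -> bytes:
    """Wrap a JSON-encoded Warp message in a top-level 'warp' box."""
    payload = json.dumps(message, separators=(",", ":")).encode("utf-8")
    # ISOBMFF box header: 32-bit big-endian size (header included), then type.
    return struct.pack(">I4s", 8 + len(payload), b"warp") + payload


def decode_warp_box(data: bytes) -> tuple[dict, bytes]:
    """Parse one 'warp' box from the front of data; return (message, rest)."""
    if len(data) < 8:
        raise ValueError("truncated box header")
    size, box_type = struct.unpack_from(">I4s", data)
    if box_type != b"warp":
        raise ValueError(f"expected 'warp' box, got {box_type!r}")
    if size < 8 or size > len(data):
        raise ValueError("invalid or truncated box")
    return json.loads(data[8:size]), data[size:]
```

Returning the unparsed remainder lets a reader consume a message and
then hand the bytes that follow, such as a segment, to the MP4 parser.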
3.2. Segments
Segments are transferred over streams alongside messages. Each
segment MUST be preceded by an init (Section 4.1) or media
(Section 4.2) message, indicating the type of segment and providing
additional metadata.
The media producer SHOULD send each segment as a unique stream to
avoid head-of-line blocking. The media producer MAY send multiple
segments over a single stream, for simplicity, when head-of-line
blocking is desired.
A segment is the smallest unit of delivery, as the tail of a segment
can be safely delayed/dropped without decode errors. A future
version of Warp will support layered coding (additional QUIC streams)
to enable dropping or downscaling frames in the middle of a segment.
3.3. Prioritization
Warp utilizes precedence to deliver the most important content during
congestion.
The media producer assigns a numeric precedence to each stream. This
is a strict prioritization scheme, such that any available bandwidth
is allocated to streams in descending order of precedence. QUIC
supports stream prioritization but does not standardize any
mechanisms; see Section 2.3 in [QUIC]. The media producer MUST
support sending prioritized streams. The media producer MAY choose to
delay retransmitting lower priority streams when possible within QUIC
flow control limits.
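The strict scheme can be illustrated with a small Python sketch. The
OutgoingStream type and the schedule function are hypothetical and
model only the ordering decision: a byte budget, standing in for the
bandwidth currently available, is spent on streams in descending
order of precedence. A real sender would make this decision inside
its QUIC library.

```python
from dataclasses import dataclass, field


@dataclass
class OutgoingStream:
    precedence: int
    pending: bytearray = field(default_factory=bytearray)


def schedule(streams: list[OutgoingStream], budget: int) -> list[tuple[int, bytes]]:
    """Spend a byte budget on streams in descending order of precedence.

    Returns (precedence, chunk) pairs in the order they would be sent.
    """
    sent = []
    for stream in sorted(streams, key=lambda s: s.precedence, reverse=True):
        if budget <= 0:
            break
        chunk = bytes(stream.pending[:budget])
        if not chunk:
            continue  # nothing queued on this stream
        del stream.pending[:len(chunk)]
        budget -= len(chunk)
        sent.append((stream.precedence, chunk))
    return sent
```

A lower precedence stream receives data only after every higher
precedence stream has been drained, which is what starves old media
during congestion.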
The media consumer determines how long to wait for a given segment
(buffer size) before skipping ahead. The media consumer MAY cancel a
skipped segment to save bandwidth, or leave it downloading in the
background (e.g., to support rewind).
Prioritization allows a single media producer to support multiple
media consumers with different latency targets. For example, one
consumer could have a 1s buffer to minimize latency, while another
consumer could have a 5s buffer to improve quality, while yet
another consumer could have a 30s buffer to receive all media (e.g.,
a VOD recorder).
3.3.1. Live Content
Live content is encoded and delivered in real-time. Media delivery
is blocked on encoder throughput, except during congestion, when it
is blocked on the limited network throughput. To best deliver live
content:
* Audio streams SHOULD be prioritized over video streams. This
allows the media consumer to skip video while audio continues
uninterrupted during congestion.
* Newer video streams SHOULD be prioritized over older video
streams. This allows the media consumer to skip older video
content during congestion.
For example, this formula will prioritize audio segments, but only up
to 3s in the future:
if is_audio:
  precedence = timestamp + 3s
else:
  precedence = timestamp
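A runnable Python version of this formula is sketched below. It
assumes that timestamps are expressed in milliseconds, as in the media
message (Section 4.2), so the 3s offset becomes 3000. The names
AUDIO_BOOST_MS and live_precedence are hypothetical.

```python
AUDIO_BOOST_MS = 3000  # the "3s" from the formula, assumed to be in milliseconds


def live_precedence(timestamp_ms: int, is_audio: bool) -> int:
    """Higher values are sent first; audio is boosted by a fixed offset."""
    return timestamp_ms + AUDIO_BOOST_MS if is_audio else timestamp_ms


# (kind, timestamp in ms) for segments that are queued at the same time.
segments = [
    ("video", 0), ("audio", 0),
    ("video", 2000), ("audio", 2000),
    ("video", 4000),
]
order = sorted(
    segments,
    key=lambda s: live_precedence(s[1], s[0] == "audio"),
    reverse=True,
)
```

In this example the video segment at 4000 ms is sent ahead of the
audio segment at 0 ms, because the audio boost only reaches 3s into
the future.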
3.3.2. Recorded Content
Recorded content has already been encoded. Media delivery is blocked
exclusively on network throughput.
Warp is primarily designed for live content, but can switch to head-
of-line blocking by changing stream prioritization. This is also
useful for content that should not be skipped over, such as
advertisements. To enable head-of-line blocking:
* Older streams SHOULD be prioritized over newer streams.
For example, this formula will prioritize older segments:
precedence = -timestamp
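The effect of negating the timestamp can be seen in a short Python
sketch. The function name and the millisecond unit are assumptions
made for illustration.

```python
def recorded_precedence(timestamp_ms: int) -> int:
    """Negate the timestamp so that older segments get higher precedence."""
    return -timestamp_ms


timestamps = [4000, 0, 2000]
# Bandwidth goes to streams in descending order of precedence.
send_order = sorted(timestamps, key=recorded_precedence, reverse=True)
```

Here send_order lists the segments from oldest to newest, which
reproduces in-order delivery.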
3.4. Cancellation
During congestion, prioritization intentionally causes stream
starvation for the lowest priority streams. Some form of starvation
will last until the network fully recovers, which may be indefinite.
The media consumer SHOULD cancel a stream (via a QUIC STOP_SENDING
frame) after it has been skipped to save bandwidth. The media
producer SHOULD reset the lowest priority stream (via QUIC
RESET_STREAM frame) when nearing resource limits. Both of these
actions will effectively drop the tail of the segment.
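One way a producer might choose which streams to reset is sketched
below in Python. The sketch treats the resource limit as a cap on
buffered bytes, which is an assumption; the function name and the
shape of the input are hypothetical. Sending the RESET_STREAM frame
itself is left to the QUIC library in use.

```python
def streams_to_reset(buffered: dict[int, tuple[int, int]], limit: int) -> list[int]:
    """Pick stream IDs to reset, lowest precedence first, until under limit.

    buffered maps a stream ID to (precedence, buffered bytes).
    """
    total = sum(size for _, size in buffered.values())
    victims = []
    # Visit streams in ascending order of precedence.
    for stream_id, (_, size) in sorted(buffered.items(), key=lambda kv: kv[1][0]):
        if total <= limit:
            break
        victims.append(stream_id)
        total -= size
    return victims
```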
3.5. Middleware
Media may go through multiple hops and processing steps on the path
from the broadcaster to the player. The full effectiveness of Warp as
an end-to-end protocol depends on middleware support.
* Middleware MUST maintain stream independence to avoid introducing
head-of-line blocking.
* Middleware SHOULD maintain stream prioritization when traversing
networks susceptible to congestion.
* Middleware MUST forward the priority message (Section 4.3) for
downstream servers.
4. Messages
Warp endpoints communicate via messages contained in the top-level
Warp Box ('warp').
A Warp message is a JSON object, where the key defines the message type
and the value depends on the message type. Unknown messages MUST be
ignored.
An endpoint MUST send messages sequentially over a single stream when
ordering is required. Messages MAY be combined into a single JSON
object when ordering is not required.
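A receiver's handling of these rules can be sketched in Python as
follows. The sketch assumes the payload is strict JSON with quoted
keys; the handle_message function and the handlers mapping are
hypothetical.

```python
import json


def handle_message(payload: bytes, handlers: dict) -> list[str]:
    """Dispatch each key of a Warp message; return the message types handled.

    handlers maps a message type to a callable taking the message value.
    Unknown message types are ignored.
    """
    message = json.loads(payload)
    if not isinstance(message, dict):
        raise ValueError("a Warp message must be a JSON object")
    handled = []
    for message_type, value in message.items():
        handler = handlers.get(message_type)
        if handler is None:
            continue  # unknown message: ignore
        handler(value)
        handled.append(message_type)
    return handled
```

Because a combined JSON object carries no ordering guarantee, a
handler written this way should not depend on the order in which the
keys are visited.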
4.1. init
The init message indicates that the remainder of the stream contains
an initialization segment.
{
  init: {
    id: int
  }
}
id: Incremented by 1 for each initialization segment.
4.2. media
The media message contains metadata about the next media segment in
the stream.
{
  segment: {
    init: int,
    timestamp: int,
  }
}
init: The id of the corresponding initialization segment. A decoder
MUST block until the corresponding init message arrives.
timestamp: The presentation timestamp in milliseconds for the first
frame/sample in the next segment. This timestamp MUST be used
when it does not match the timestamp in the media container.
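The blocking requirement can be sketched in Python with a small
holding structure. The SegmentGate class is hypothetical, and the
sketch reads the message under the segment key shown in the schema
above. Media that arrives before its initialization segment is queued
and released once that segment arrives.

```python
class SegmentGate:
    """Hold media segments until their initialization segment has arrived."""

    def __init__(self):
        self.inits = {}    # init id -> initialization segment bytes
        self.waiting = {}  # init id -> list of (timestamp_ms, segment bytes)

    def on_init(self, init_id: int, segment: bytes) -> list:
        """Record an init segment and release any media that waited for it."""
        self.inits[init_id] = segment
        return self.waiting.pop(init_id, [])

    def on_media(self, message: dict, segment: bytes) -> list:
        """Return media that can be decoded now, or queue it and return []."""
        meta = message["segment"]
        # The message timestamp is kept alongside the segment so the
        # decoder can prefer it over the timestamp in the container.
        item = (meta["timestamp"], segment)
        if meta["init"] in self.inits:
            return [item]
        self.waiting.setdefault(meta["init"], []).append(item)
        return []
```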
4.3. priority
The priority message informs middleware about the intended priority
of the current stream. Any middleware MAY ignore this value but
SHOULD forward it.
{
  priority: {
    precedence: int,
  }
}
precedence: An integer value, indicating that any available
bandwidth SHOULD be allocated to streams in descending order.
4.4. Extensions
Custom messages MUST start with x-, that is, Unicode LATIN SMALL
LETTER X (U+0078) followed by HYPHEN-MINUS (U+002D).
Custom messages SHOULD use a unique prefix to reduce collisions. For
example: x-twitch-load would contain identification required to start
playback of a Twitch stream.
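A receiver could tell custom messages apart from standard ones with a
check like the Python sketch below. The classify function is
hypothetical, and the set of standard types is taken from the message
keys shown in Section 4.

```python
STANDARD_TYPES = {"init", "segment", "priority"}


def classify(message_type: str) -> str:
    """Classify a message type as 'custom', 'standard', or 'unknown'."""
    if message_type.startswith("x-"):  # U+0078 followed by U+002D
        return "custom"
    if message_type in STANDARD_TYPES:
        return "standard"
    return "unknown"
```

The prefix comparison is case-sensitive, so a name beginning with an
uppercase X is not treated as custom.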
5. Security Considerations
TODO
6. IANA Considerations
This document has no IANA actions.
7. References
7.1. Normative References
[ISOBMFF] "Information technology — Coding of audio-visual objects —
Part 12: ISO Base Media File Format", December 2015.
[QUIC] Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
Multiplexed and Secure Transport", RFC 9000,
DOI 10.17487/RFC9000, May 2021,
<https://www.rfc-editor.org/rfc/rfc9000>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
7.2. Informative References
[CMAF] "Information technology -- Multimedia application format
(MPEG-A) -- Part 19: Common media application format
(CMAF) for segmented media", March 2020.
Contributors
* Michael Thornburgh
Author's Address
Luke Curley
Twitch
Email: lcurley@twitch.tv