MoQ Media Interop
draft-cenzano-moq-media-interop-00

Document Type Active Internet-Draft (individual)
Authors Jorge Cenzano Ferret, Alan Frindell
Last updated 2024-10-21
Media Over QUIC                                        J. Cenzano-Ferret
Internet-Draft                                               A. Frindell
Intended status: Informational                                      Meta
Expires: 24 April 2025                                   21 October 2024

                           MoQ Media Interop
                   draft-cenzano-moq-media-interop-00

Abstract

   This protocol can be used to send and receive video and audio over
   Media over QUIC Transport [MOQT].

About This Document

   This note is to be removed before publishing as an RFC.

   The latest revision of this draft can be found at
   https://afrind.github.io/draft-cenzano-media-interop/draft-cenzano-
   moq-media-interop.html.  Status information for this document may be
   found at https://datatracker.ietf.org/doc/draft-cenzano-moq-media-
   interop/.

   Discussion of this document takes place on the Media Over QUIC
   Working Group mailing list (mailto:moq@ietf.org), which is archived
   at https://mailarchive.ietf.org/arch/browse/moq/.  Subscribe at
   https://www.ietf.org/mailman/listinfo/moq/.

   Source for this draft and an issue tracker can be found at
   https://github.com/afrind/draft-cenzano-media-interop.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 24 April 2025.

Cenzano-Ferret & Frindell Expires 24 April 2025                 [Page 1]
Internet-Draft                   moq-mi                     October 2024

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Protocol Operation
     2.1.  Track Names
     2.2.  Mapping Tracks to MoQT Object Model
     2.3.  Timestamps
     2.4.  Object Format
   3.  Video H264 in AVCC with WCP Payload Format
   4.  Audio Opus bitstream Payload Format
   5.  Conventions and Definitions
   6.  Security Considerations
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Acknowledgments
   Authors' Addresses

1.  Introduction

   This protocol specifies a simple mechanism for sending media (video
   and audio) over MOQT for both live-streaming and video conferencing
   (VC) use cases.  The protocol is flexible in order to support this
   range of use cases.

   Media parameters (e.g., frame rate, resolution, or codec) can be
   updated in the middle of a track.

   The protocol defines a low overhead packaging format optimized for
   WebCodecs called WCP that is extensible to other formats such as
   FMP4.  This is not LoC [LOC], but will eventually be merged with that
   specification.

2.  Protocol Operation

2.1.  Track Names

   The publisher selects a namespace of its choosing and sends an
   ANNOUNCE message for this namespace.  Since MoQT tracks are
   immutable, each new broadcast MUST have a unique namespace.  It is
   RECOMMENDED that the last element of the track namespace tuple
   contain a broadcast timestamp to ensure uniqueness.

   Within the namespace the publisher offers media tracks named videoX
   and audioX, where X is an integer starting at 0 and increasing by 1
   for each additional track of a given type.

   For example, if the publisher issues 2 audio tracks and 1 video
   track, the track names available will be video0, audio0, and audio1.

   The subscriber treats all tracks that belong to the same namespace
   as part of the same synchronization group (that is, their timestamps
   are aligned to the same timeline).
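
   The naming rules above can be sketched as follows.  This is a minimal
   illustration, not part of the protocol: the helper names, the
   namespace prefix, and the choice of decimal epoch seconds as the
   broadcast timestamp are all assumptions made for the example.

```python
import time


def broadcast_namespace(prefix, start_epoch_s=None):
    """Return a namespace tuple whose last element is a broadcast timestamp.

    Assumption: the timestamp is written as decimal epoch seconds; the
    draft only recommends that some broadcast timestamp be present.
    """
    if start_epoch_s is None:
        start_epoch_s = int(time.time())
    return tuple(prefix) + (str(start_epoch_s),)


def track_names(num_video, num_audio):
    """List the track names videoX and audioX, with X starting at 0."""
    video = [f"video{i}" for i in range(num_video)]
    audio = [f"audio{i}" for i in range(num_audio)]
    return video + audio


# One video track and two audio tracks, as in the example above.
print(track_names(1, 2))
print(broadcast_namespace(["live", "demo"], 1729500000))
```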

2.2.  Mapping Tracks to MoQT Object Model

   For the video track, the publisher begins a new group at the start of
   each IDR (so object 0 is always an IDR keyframe), and each group
   contains a single subgroup.  Each object has the format described in
   Section 2.4.

   For the audio track, the publisher begins a new group with each audio
   object, and each group contains a single subgroup.  Each object has
   the format described in Section 2.4.
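
   The mapping above can be sketched as a small ID assignment routine.
   The function names are hypothetical, and starting group IDs at 0 is
   an assumption of this example rather than a requirement stated here.

```python
def assign_video_ids(is_idr_flags):
    """Map each video frame to (group_id, object_id).

    A new group starts at every IDR frame, so object 0 of each group is
    always an IDR.  Assumption: group IDs start at 0 and increase by 1.
    """
    ids = []
    group_id = -1
    object_id = 0
    for is_idr in is_idr_flags:
        if is_idr:
            group_id += 1
            object_id = 0
        elif group_id < 0:
            # No group can be opened until the first IDR arrives.
            raise ValueError("video track must start with an IDR frame")
        ids.append((group_id, object_id))
        object_id += 1
    return ids


def assign_audio_ids(num_frames):
    """Map each audio frame to its own group, always as object 0."""
    return [(group_id, 0) for group_id in range(num_frames)]


# IDR, P, P, IDR, P  ->  two groups of three and two objects.
print(assign_video_ids([True, False, False, True, False]))
print(assign_audio_ids(3))
```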

   TODO: Datagram forwarding preference could be used, but has problems
   if an audio frame does not fit in a single UDP payload.

2.3.  Timestamps

   To avoid using fractional numbers and having to deal with rounding
   errors, timestamps will be expressed with two integers:

   *  timestamp numerator (ex: PTS, DTS, duration)

   *  timebase

   To convert a timestamp into seconds, divide the numerator by the
   timebase: timestamp(s) = timestamp numerator / timebase

   _Example:_

   PTS = 11, timebase = 30

   PTS(s) = 11/30 = 0.3666... (approximately 0.3667)
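
   The conversion can be kept exact by using rational arithmetic, as in
   the sketch below.  The function names are hypothetical, and the
   rescaling helper (with its rounding behavior) is an addition made
   for illustration only.

```python
from fractions import Fraction


def to_seconds(numerator: int, timebase: int) -> Fraction:
    """Return the exact time in seconds for a (numerator, timebase) pair."""
    if timebase <= 0:
        raise ValueError("timebase must be a positive integer")
    return Fraction(numerator, timebase)


def rescale(numerator: int, from_timebase: int, to_timebase: int) -> int:
    """Re-express a timestamp in another timebase.

    The result is rounded to the nearest integer (Python rounds exact
    ties to the even value), so rescaling can lose precision.
    """
    return round(Fraction(numerator * to_timebase, from_timebase))


pts_seconds = to_seconds(11, 30)
print(pts_seconds)           # exact rational value
print(float(pts_seconds))    # floating-point approximation, display only
print(rescale(11, 30, 90000))
```

   Converting to floating point only at the point of display avoids
   accumulating rounding errors across many frames.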

2.4.  Object Format

   All objects in this protocol have the following format.

   {
     Media Type (i)
     Media payload (..)
   }

                        Figure 1: MOQT Media object

   *  Media Type: Indicates what kind of media payload will follow.

                  +======+=============================+
                  | Code | Value                       |
                  +======+=============================+
                  |  0x0 | Video H264 in AVCC with WCP |
                  +------+-----------------------------+
                  |  0x1 | Audio Opus bitstream        |
                  +------+-----------------------------+

                           Table 1: Media Types

   *  Media payload: Media type specific payload
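
   The object layout can be sketched as below.  This example assumes
   that the (i) notation denotes the QUIC variable-length integer
   encoding (RFC 9000, Section 16), following the convention used by
   MOQT.  The function and constant names are hypothetical.

```python
MEDIA_TYPE_VIDEO_H264_AVCC = 0x0
MEDIA_TYPE_AUDIO_OPUS = 0x1


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a QUIC variable-length integer."""
    if value < 0:
        raise ValueError("varints cannot represent negative values")
    if value < 1 << 6:
        return value.to_bytes(1, "big")
    if value < 1 << 14:
        return (value | (0b01 << 14)).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | (0b10 << 30)).to_bytes(4, "big")
    if value < 1 << 62:
        return (value | (0b11 << 62)).to_bytes(8, "big")
    raise ValueError("value does not fit in 62 bits")


def decode_varint(buf: bytes, offset: int = 0):
    """Decode one varint; return (value, offset of the next field)."""
    if offset >= len(buf):
        raise ValueError("buffer too short")
    first = buf[offset]
    length = 1 << (first >> 6)      # top two bits select 1, 2, 4 or 8 bytes
    if offset + length > len(buf):
        raise ValueError("truncated varint")
    value = first & 0x3F
    for byte in buf[offset + 1:offset + length]:
        value = (value << 8) | byte
    return value, offset + length


def pack_object(media_type: int, media_payload: bytes) -> bytes:
    """Prefix a media payload with its Media Type."""
    return encode_varint(media_type) + media_payload


def unpack_object(data: bytes):
    """Split an object into (media_type, media_payload)."""
    media_type, offset = decode_varint(data)
    return media_type, data[offset:]


obj = pack_object(MEDIA_TYPE_AUDIO_OPUS, b"opus-bytes")
print(unpack_object(obj))
```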

3.  Video H264 in AVCC with WCP Payload Format

   {
     Seq ID (i)
     PTS Timestamp (i)
     DTS Timestamp (i)
     Timebase (i)
     Duration (i)
     Wall Clock (i)
     Metadata Size (i)
     Metadata (..)
     Payload (..)
   }

                    Figure 2: MOQT Media video h264 WCP

   *  Seq ID: Monotonically increasing counter for this media track.

   *  PTS Timestamp: Presentation timestamp in timebase.

   TODO: Varints cannot easily represent negative values, so negative
   timestamps at the start of a stream (priming) could be challenging
   to encode.

   *  DTS Timestamp: Decoding timestamp in timebase.  If B-frames are
      not used, the encoder SHOULD set this to the same value as PTS.

   TODO: Varints cannot easily represent negative values, so negative
   timestamps at the start of a stream (priming) could be challenging
   to encode.

   *  Timebase: Denominator used to calculate PTS, DTS, and Duration.

   *  Duration: Duration of Payload in timebase.  It will be 0 if not
      set.

   *  Wall Clock: Epoch time in ms when this frame started being
      captured.  It will be 0 if not set.

   *  Metadata Size: Size in bytes of the metadata section.  It will be
      0 when no metadata is present.

   *  Metadata: Extra data needed to decode this stream.  This is an
      AVCDecoderConfigurationRecord as described in [ISO14496] section
      5.3.3.1, with the field lengthSizeMinusOne = 3 (so each NAL unit
      length field is 4 bytes).  If the AVCDecoderConfigurationRecord
      indicates any other length size, the receiver SHOULD close the
      session with a Protocol Violation error.  The publisher MUST send
      a new AVCDecoderConfigurationRecord whenever the encoding
      parameters change.

   *  Payload: H264 bitstream in AVC1 format as described in
      [ISO14496] section 5.3, using a 4-byte length field for each NAL
      unit.
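
   The field list above can be serialized and parsed as in the
   following sketch.  It covers only the media payload (the Media Type
   prefix from Section 2.4 is not included) and assumes that (i) is a
   QUIC variable-length integer.  All class and function names are
   hypothetical, and ValueError stands in for closing the session with
   a Protocol Violation error.

```python
import struct
from dataclasses import dataclass


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a QUIC variable-length integer."""
    for prefix, size in ((0, 1), (1, 2), (2, 4), (3, 8)):
        if 0 <= value < 1 << (8 * size - 2):
            return (value | (prefix << (8 * size - 2))).to_bytes(size, "big")
    raise ValueError("value out of varint range")


def decode_varint(buf: bytes, offset: int):
    """Decode one varint; return (value, offset of the next field)."""
    if offset >= len(buf):
        raise ValueError("buffer too short")
    length = 1 << (buf[offset] >> 6)
    if offset + length > len(buf):
        raise ValueError("truncated varint")
    value = int.from_bytes(buf[offset:offset + length], "big")
    return value & ((1 << (8 * length - 2)) - 1), offset + length


@dataclass
class VideoFrame:
    seq_id: int
    pts: int
    dts: int
    timebase: int
    duration: int = 0
    wall_clock_ms: int = 0
    metadata: bytes = b""    # AVCDecoderConfigurationRecord, may be empty
    payload: bytes = b""     # length-prefixed NAL units


def pack_video(frame: VideoFrame) -> bytes:
    """Serialize the fields in the order shown in Figure 2."""
    fields = (frame.seq_id, frame.pts, frame.dts, frame.timebase,
              frame.duration, frame.wall_clock_ms, len(frame.metadata))
    header = b"".join(encode_varint(v) for v in fields)
    return header + frame.metadata + frame.payload


def unpack_video(data: bytes) -> VideoFrame:
    """Parse a video media payload and check lengthSizeMinusOne."""
    values, offset = [], 0
    for _ in range(7):
        value, offset = decode_varint(data, offset)
        values.append(value)
    end = offset + values[6]
    if end > len(data):
        raise ValueError("metadata size exceeds object size")
    metadata = data[offset:end]
    # lengthSizeMinusOne is the low 2 bits of the fifth byte of the record.
    if metadata and (len(metadata) < 5 or metadata[4] & 0x03 != 3):
        raise ValueError("lengthSizeMinusOne must be 3")
    return VideoFrame(*values[:6], metadata=metadata, payload=data[end:])


def split_nal_units(payload: bytes):
    """Split a payload into NAL units using 4-byte big-endian lengths."""
    units, offset = [], 0
    while offset < len(payload):
        if offset + 4 > len(payload):
            raise ValueError("truncated NAL unit length")
        (size,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        if offset + size > len(payload):
            raise ValueError("truncated NAL unit")
        units.append(payload[offset:offset + size])
        offset += size
    return units
```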

4.  Audio Opus bitstream Payload Format

   {
     Seq ID (i)
     PTS Timestamp (i)
     Timebase (i)
     Sample Freq (i)
     Num Channels (i)
     Duration (i)
     Wall Clock (i)
     Payload (..)
   }

                    Figure 3: MOQT Media audio Opus WCP

   *  Seq ID: Monotonically increasing counter for this media track.

   *  PTS Timestamp: Indicates PTS in timebase.

   TODO: Varints cannot easily represent negative values, so negative
   timestamps at the start of a stream (priming) could be challenging
   to encode.

   *  Timebase: Denominator used to calculate PTS and Duration.

   *  Sample Freq: Sample frequency used in the original signal (before
      encoding).

   *  Num Channels: Number of channels in the original signal (before
      encoding).

   *  Duration: Duration of Payload in timebase.  It will be 0 if not
      set.

   *  Wall Clock: Epoch time in ms when this frame started being
      captured.  It will be 0 if not set.

   *  Payload: Opus packets, as described in Section 3 of [RFC6716].
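
   A minimal sketch of the audio media payload follows, again assuming
   that (i) is a QUIC variable-length integer and leaving out the Media
   Type prefix from Section 2.4.  The class and function names are
   hypothetical, and the Opus bytes in the usage example are
   placeholders rather than a real packet.

```python
from dataclasses import dataclass


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a QUIC variable-length integer."""
    for prefix, size in ((0, 1), (1, 2), (2, 4), (3, 8)):
        if 0 <= value < 1 << (8 * size - 2):
            return (value | (prefix << (8 * size - 2))).to_bytes(size, "big")
    raise ValueError("value out of varint range")


def decode_varint(buf: bytes, offset: int):
    """Decode one varint; return (value, offset of the next field)."""
    if offset >= len(buf):
        raise ValueError("buffer too short")
    length = 1 << (buf[offset] >> 6)
    if offset + length > len(buf):
        raise ValueError("truncated varint")
    value = int.from_bytes(buf[offset:offset + length], "big")
    return value & ((1 << (8 * length - 2)) - 1), offset + length


@dataclass
class OpusFrame:
    seq_id: int
    pts: int
    timebase: int
    sample_freq: int
    num_channels: int
    duration: int = 0        # 0 means "not set"
    wall_clock_ms: int = 0   # 0 means "not set"
    payload: bytes = b""


def pack_audio(frame: OpusFrame) -> bytes:
    """Serialize the fields in the order shown in Figure 3."""
    fields = (frame.seq_id, frame.pts, frame.timebase, frame.sample_freq,
              frame.num_channels, frame.duration, frame.wall_clock_ms)
    return b"".join(encode_varint(v) for v in fields) + frame.payload


def unpack_audio(data: bytes) -> OpusFrame:
    """Parse an audio media payload; the remainder is the Opus packet."""
    values, offset = [], 0
    for _ in range(7):
        value, offset = decode_varint(data, offset)
        values.append(value)
    return OpusFrame(*values, payload=data[offset:])


# Fourth 20 ms frame of a 48 kHz stereo stream, with timebase 48000:
# each frame lasts 960 ticks, so the PTS is 3 * 960.
frame = OpusFrame(seq_id=3, pts=3 * 960, timebase=48000, sample_freq=48000,
                  num_channels=2, duration=960, payload=b"\xfc\x01")
print(unpack_audio(pack_audio(frame)) == frame)
```

   Choosing a timebase equal to the sample rate, as in this example,
   keeps PTS and Duration as exact sample counts.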

5.  Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

6.  Security Considerations

   TODO Security

7.  IANA Considerations

   This document has no IANA actions.

8.  References

8.1.  Normative References

   [ISO14496] ISO, "Carriage of network abstraction layer (NAL) unit
              structured video in the ISO base media file format", 2019,
              <https://www.iso.org/standard/74429.html>.

   [MOQT]     Curley, L., Pugin, K., Nandakumar, S., Vasiliev, V., and
              I. Swett, "Media over QUIC Transport", Work in Progress,
              Internet-Draft, draft-ietf-moq-transport-07, 21 October
              2024, <https://datatracker.ietf.org/doc/html/draft-ietf-
              moq-transport-07>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
              Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
              September 2012, <https://www.rfc-editor.org/rfc/rfc6716>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

8.2.  Informative References

   [LOC]      Zanaty, M., Nandakumar, S., and P. Thatcher, "Low Overhead
              Media Container", Work in Progress, Internet-Draft, draft-
              mzanaty-moq-loc-03, 4 March 2024,
              <https://datatracker.ietf.org/doc/html/draft-mzanaty-moq-
              loc-03>.

Acknowledgments

   TODO acknowledge.

Authors' Addresses

   Jordi Cenzano-Ferret
   Meta
   Email: jcenzano@meta.com

   Alan Frindell
   Meta
   Email: afrind@meta.com
