Internet-Draft	Matroska Codec	May 2022
Lhomme, et al.	Expires 2 November 2022

Workgroup: cellar
Internet Draft: draft-ietf-cellar-codec-09
Published: 1 May 2022
Intended Status: Standards Track
Expires: 2 November 2022
Authors: S. Lhomme, M. Bunkus, D. Rice

Matroska Media Container Codec Specifications

Abstract

This document defines the Matroska codec mappings, including the codec ID, layout of data in a Block Element and in an optional CodecPrivate Element.¶

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶

This Internet-Draft will expire on 2 November 2022.¶

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶

1. Introduction

Matroska aims to become THE standard of multimedia container formats. It stores interleaved and timestamped audio/video/subtitle data using various codecs. To interpret the codec data, a mapping between the way the data is stored in Matroska and how it is understood by such a codec is necessary.¶

This document intends to define this mapping for many commonly used codecs in Matroska.¶

2. Status of this document

This document is a work-in-progress specification defining the Matroska file format as part of the IETF Cellar working group. It uses basic elements and concepts already defined in the Matroska specifications defined by this workgroup.¶

3. Notation and Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶

4. Codec Mappings

A Codec Mapping is a set of attributes to identify, name, and contextualize the format and characteristics of encoded data that can be contained within Matroska Clusters.¶

Each TrackEntry used within Matroska MUST reference a defined Codec Mapping using the Codec ID to identify and describe the format of the encoded data in its associated Clusters. This Codec ID is a unique registered identifier that represents the encoding stored within the Track. Certain encodings MAY also require some form of codec initialization in order to provide its decoder with context and technical metadata.¶

The intention behind this list is not to list all existing audio and video codecs, but rather to list those codecs that are currently supported in Matroska and therefore need a well-defined Codec ID so that all developers supporting Matroska will use the same Codec ID. If you feel we missed support for a very important codec, please tell us on our development mailing list (cellar at ietf.org).¶

4.1. Defining Matroska Codec Support

Support for a codec is defined in Matroska with the following values.¶

4.1.1. Codec ID

Each codec supported for storage in Matroska MUST have a unique Codec ID. Each Codec ID MUST be prefixed with the string from the following table according to the associated type of the codec. All characters of a Codec ID Prefix MUST be capital letters (A-Z) except for the last character of a Codec ID Prefix which MUST be an underscore ("_").¶

Table 1
Codec Type	Codec ID Prefix
Video	"V_"
Audio	"A_"
Subtitle	"S_"
Button	"B_"

Each Codec ID MUST include a Major Codec ID immediately following the Codec ID Prefix. A Major Codec ID MAY be followed by an OPTIONAL Codec ID Suffix to communicate a refinement of the Major Codec ID. If a Codec ID Suffix is used, then the Codec ID MUST include a forward slash ("/") as a separator between the Major Codec ID and the Codec ID Suffix. The Major Codec ID MUST be composed of only capital letters (A-Z) and numbers (0-9). The Codec ID Suffix MUST be composed of only capital letters (A-Z), numbers (0-9), underscore ("_"), and forward slash ("/").¶

The following table provides examples of valid Codec IDs and their components:¶

Table 2
Codec ID Prefix	Major Codec ID	Separator	Codec ID Suffix	Codec ID
A_	AAC	/	MPEG2/LC/SBR	A_AAC/MPEG2/LC/SBR
V_	MPEG4	/	ISO/ASP	V_MPEG4/ISO/ASP
V_	MPEG1			V_MPEG1
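The composition rules above can be sketched as a small parser. This is only an illustration and not part of the specification; the names `parse_codec_id` and `CodecID` are hypothetical, and the sketch accepts only the four prefixes listed in Table 1.

```python
import re
from typing import NamedTuple, Optional

# Prefix: capital letters followed by "_"; Major Codec ID: A-Z and 0-9;
# optional Suffix after the "/" separator: A-Z, 0-9, "_" and "/".
_CODEC_ID_RE = re.compile(r"([A-Z]+_)([A-Z0-9]+)(?:/([A-Z0-9_/]+))?")

# Codec ID Prefixes from Table 1.
_KNOWN_PREFIXES = {"V_": "Video", "A_": "Audio", "S_": "Subtitle", "B_": "Button"}


class CodecID(NamedTuple):
    """Hypothetical container for the components of a Codec ID."""
    prefix: str
    major: str
    suffix: Optional[str]
    codec_type: str


def parse_codec_id(codec_id: str) -> CodecID:
    """Split a Codec ID into its components, or raise ValueError."""
    match = _CODEC_ID_RE.fullmatch(codec_id)
    if match is None:
        raise ValueError(f"malformed Codec ID: {codec_id!r}")
    prefix, major, suffix = match.groups()
    if prefix not in _KNOWN_PREFIXES:
        raise ValueError(f"unknown Codec ID Prefix: {prefix!r}")
    return CodecID(prefix, major, suffix, _KNOWN_PREFIXES[prefix])
```

For example, `parse_codec_id("V_MPEG4/ISO/ASP")` yields the prefix `V_`, the Major Codec ID `MPEG4`, and the suffix `ISO/ASP`, matching the second row of Table 2.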

4.1.2. Codec Name

Each encoding supported for storage in Matroska MUST have a Codec Name. The Codec Name provides a readable label for the encoding.¶

4.1.3. Description

An optional description for the encoding. This value is only intended for human consumption.¶

4.1.4. Initialization

Each encoding supported for storage in Matroska MUST have a defined Initialization. The Initialization MUST describe the storage of data necessary to initialize the decoder, which MUST be stored within the CodecPrivate Element. When the Initialization is updated within a track, then that updated Initialization data MUST be written into the CodecState Element of the first Cluster to require it. If the encoding does not require any form of Initialization, then none MUST be used to define the Initialization and the CodecPrivate Element SHOULD NOT be written and MUST be ignored. Data that the Initialization defines to be stored in the CodecPrivate Element is known as Private Data.¶

4.1.5. Codec BlockAdditions

Additional data that contextualizes or supplements a Block can be stored within the BlockAdditional Element of a BlockMore Element. This BlockAdditional data MAY be passed to the associated decoder along with the content of the Block Element. Each BlockAdditional is coupled with a BlockAddID that identifies the kind of data it contains. The following table defines the meanings of BlockAddID values.¶

Table 3
BlockAddID Value	Definition
0	Invalid.
1	Indicates that the context of the `BlockAdditional` data is defined by the corresponding `Codec Mapping`.
2 or greater	`BlockAddID` values of 2 and greater are mapped to the `BlockAddIDValue` of the `BlockAdditionMapping` of the associated Track.

The values of BlockAddID that are 2 or greater have no semantic meaning, but simply associate the BlockMore Element with a BlockAdditionMapping of the associated Track. See Section 6 on Block Additional Mappings for more information.¶

The following XML depicts the nested Elements of a BlockGroup Element with an example of BlockAdditions:¶

<BlockGroup>
  <Block>{Binary data of a VP9 video frame in YUV}</Block>
  <BlockAdditions>
    <BlockMore>
      <BlockAddID>1</BlockAddID>
      <BlockAdditional>
        {alpha channel encoding to supplement the VP9 frame}
      </BlockAdditional>
    </BlockMore>
  </BlockAdditions>
</BlockGroup>
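The three cases of Table 3 can be sketched as a small dispatch function. The function name and the returned labels are hypothetical and chosen only for this illustration.

```python
def classify_block_add_id(block_add_id: int) -> str:
    """Return which rule of Table 3 applies to a BlockAddID value."""
    if block_add_id < 0:
        raise ValueError("BlockAddID is an unsigned integer")
    if block_add_id == 0:
        return "invalid"
    if block_add_id == 1:
        # Meaning of the BlockAdditional data is defined by the Codec Mapping.
        return "codec-mapping"
    # 2 or greater: look up the BlockAdditionMapping of the associated Track
    # whose BlockAddIDValue equals this BlockAddID.
    return "block-addition-mapping"
```

In the XML example above, the BlockAddID of 1 falls into the second case, so the alpha channel data is interpreted as defined by the VP9 Codec Mapping.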

4.1.6. Citation

Documentation of the associated normative and informative references for the codec is RECOMMENDED.¶

4.1.7. Deprecation Date

A timestamp, expressed in the format defined by [RFC3339], that notes when support for the Codec Mapping within Matroska was deprecated. If a Codec Mapping is defined with a Deprecation Date, then Matroska writers SHOULD NOT use the Codec Mapping after the Deprecation Date.¶

4.1.8. Superseded By

A Codec Mapping MAY only be defined with a Superseded By value if it has an expressed Deprecation Date. If used, the Superseded By value MUST store the Codec ID of another Codec Mapping that has superseded the Codec Mapping.¶
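The attributes defined in Section 4.1 can be collected into a single record. The following is a minimal sketch under stated assumptions: the class name, field names, and defaults are hypothetical, and only the dependency between Superseded By and Deprecation Date is checked.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class CodecMapping:
    """Hypothetical record holding the attributes of one Codec Mapping."""
    codec_id: str
    codec_name: str
    initialization: str = "none"
    description: Optional[str] = None
    citation: Optional[str] = None
    deprecation_date: Optional[datetime] = None  # parsed RFC 3339 timestamp
    superseded_by: Optional[str] = None          # Codec ID of the successor

    def __post_init__(self) -> None:
        # Superseded By is only allowed together with a Deprecation Date.
        if self.superseded_by is not None and self.deprecation_date is None:
            raise ValueError("Superseded By requires a Deprecation Date")
```

A registry built on such records could reject an entry at load time if it names a successor without stating when the mapping was deprecated.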

4.2. Recommendations for the Creation of New Codec Mappings

Creators of new Codec Mappings to be used in the context of Matroska:¶

SHOULD assume that all Codec Mappings they create might become standardized, public, commonly deployed, or usable across multiple implementations.¶
SHOULD employ meaningful values for Codec ID and Codec Name that they have reason to believe are currently unused.¶
SHOULD NOT prefix their Codec ID with "X_" or similar constructs.¶

These recommendations are based upon Section 3 of [RFC6648].¶

4.3. Video Codec Mappings

4.3.1. V_MS/VFW/FOURCC

Codec ID: V_MS/VFW/FOURCC¶

Codec Name: Microsoft (TM) Video Codec Manager (VCM)¶

Description: The private data contains the VCM structure BITMAPINFOHEADER including the extra private bytes, as defined by Microsoft. The data are stored in little-endian format (like on IA32 machines). Where is the Huffman table stored in HuffYUV, not AVISTREAMINFO ??? And the FourCC, not in AVISTREAMINFO.fccHandler ???¶

Initialization: Private Data contains the VCM structure BITMAPINFOHEADER including the extra private bytes, as defined by Microsoft in https://msdn.microsoft.com/en-us/library/windows/desktop/dd183376(v=vs.85).aspx.¶

Citation: https://msdn.microsoft.com/en-us/library/windows/desktop/dd183376(v=vs.85).aspx ¶
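As an illustration, the fixed 40-byte part of a BITMAPINFOHEADER can be unpacked from the Private Data as sketched below. The names `BitmapInfoHeader` and `parse_bitmapinfoheader` are hypothetical, and the sketch assumes that the extra private bytes start immediately after the 40 fixed bytes.

```python
import struct
from typing import NamedTuple

# Little-endian layout of BITMAPINFOHEADER: biSize, biWidth, biHeight,
# biPlanes, biBitCount, biCompression (kept as 4 raw bytes, i.e. the FourCC),
# biSizeImage, biXPelsPerMeter, biYPelsPerMeter, biClrUsed, biClrImportant.
_BITMAPINFOHEADER = struct.Struct("<IiiHH4sIiiII")


class BitmapInfoHeader(NamedTuple):
    size: int
    width: int
    height: int
    planes: int
    bit_count: int
    compression: bytes
    size_image: int
    x_pels_per_meter: int
    y_pels_per_meter: int
    clr_used: int
    clr_important: int
    extra: bytes  # assumption: everything after the fixed 40 bytes


def parse_bitmapinfoheader(private_data: bytes) -> BitmapInfoHeader:
    """Unpack the fixed header fields and keep the trailing extra bytes."""
    if len(private_data) < _BITMAPINFOHEADER.size:
        raise ValueError("Private Data is shorter than a BITMAPINFOHEADER")
    fields = _BITMAPINFOHEADER.unpack_from(private_data)
    extra = private_data[_BITMAPINFOHEADER.size:]
    return BitmapInfoHeader(*fields, extra)
```

Reading biCompression as four raw bytes keeps the FourCC in its on-disk character order, which is convenient when comparing against codes such as `b"HFYU"`.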

4.3.2. V_UNCOMPRESSED

Codec ID: V_UNCOMPRESSED¶

Codec Name: Video, raw uncompressed video frames¶

Description: All details about the used color specs and bit depth are to be put/read from the TrackEntry\Video\UncompressedFourCC elements.¶

Initialization: none¶

4.3.3. V_MPEG4/ISO/SP

Codec ID: V_MPEG4/ISO/SP¶

Codec Name: MPEG4 ISO simple profile (DivX4)¶

Description: Stream was created via an improved codec API (UCI) or even transmuxed from AVI (there are no B-frames in Simple Profile); the frame order is the coding order.¶

Initialization: none¶

4.3.4. V_MPEG4/ISO/ASP

Codec ID: V_MPEG4/ISO/ASP¶

Codec Name: MPEG4 ISO advanced simple profile (DivX5, XviD, FFMPEG)¶

Description: Stream was created via an improved codec API (UCI) or transmuxed from MP4, not simply transmuxed from AVI. Note that B-frames are handled differently in these native streams than in a VfW-created stream: no dummy frames are inserted, and the frame order is exactly the same as the coding order, as in MP4 streams.¶

Initialization: none¶

4.3.5. V_MPEG4/ISO/AP

Codec ID: V_MPEG4/ISO/AP¶

Codec Name: MPEG4 ISO advanced profile¶

Initialization: none¶

4.3.6. V_MPEG4/MS/V3

Codec ID: V_MPEG4/MS/V3¶

Codec Name: Microsoft (TM) MPEG4 V3¶

Description: Microsoft (TM) MPEG4 V3 and derivatives, i.e., DivX3, Angelpotion, SMR, etc.; the stream was created using a VfW codec or transmuxed from AVI. Note that V1/V2 are covered in VfW compatibility mode.¶

Initialization: none¶

4.3.7. V_MPEG1

Codec ID: V_MPEG1¶

Codec Name: MPEG 1¶

Description: The Matroska video stream will contain a demuxed Elementary Stream (ES), where block boundaries are still to be defined. It is RECOMMENDED to use MPEG2MKV.exe for creating those files, and to compare the results with self-made implementations.¶

Initialization: none¶

4.3.8. V_MPEG2

Codec ID: V_MPEG2¶

Codec Name: MPEG 2¶

Initialization: none¶

4.3.9. V_MPEG4/ISO/AVC

Codec ID: V_MPEG4/ISO/AVC¶

Codec Name: AVC/H.264¶

Description: Individual pictures (which could be a frame, a field, or 2 fields having the same timestamp) of AVC/H.264 stored as described in [ISO.14496-15].¶

Initialization: The Private Data contains an AVCDecoderConfigurationRecord structure, as defined in [ISO.14496-15]. For legacy reasons, because Block Addition Mappings are preferred (see Section 4.7), the AVCDecoderConfigurationRecord structure MAY be followed by an extension block. The extension block begins with a 4-byte extension block size field in big-endian byte order, which is the size of the extension block minus 4 (excluding the size of the extension block size field), followed by a 4-byte field corresponding to a BlockAddIDType of "mvcC", followed by content corresponding to the content of BlockAddIDExtraData for mvcC; see Section 4.7.8.¶
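The layout of the legacy extension block can be sketched as follows. This is an illustration only: the function name is hypothetical, and the caller is assumed to already know the offset at which the AVCDecoderConfigurationRecord ends, since locating that offset requires parsing the record as defined in [ISO.14496-15].

```python
import struct
from typing import Tuple


def parse_avc_extension_block(private_data: bytes, offset: int) -> Tuple[bytes, bytes]:
    """Parse the extension block starting at `offset`.

    Returns (type, content), where type is the 4-byte BlockAddIDType field
    and content is the data that follows it.
    """
    if len(private_data) < offset + 8:
        raise ValueError("extension block header is truncated")
    # 4-byte big-endian size: counts the type field and the content,
    # but not the size field itself.
    (size,) = struct.unpack_from(">I", private_data, offset)
    if size < 4 or offset + 4 + size > len(private_data):
        raise ValueError("extension block size is inconsistent")
    block_type = private_data[offset + 4:offset + 8]
    content = private_data[offset + 8:offset + 4 + size]
    return block_type, content
```

A reader would typically check that the returned type equals `b"mvcC"` before treating the content as BlockAddIDExtraData.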

4.3.10. V_MPEGH/ISO/HEVC

Codec ID: V_MPEGH/ISO/HEVC¶

Codec Name: HEVC/H.265¶

Description: Individual pictures (which could be a frame, a field, or 2 fields having the same timestamp) of HEVC/H.265 stored as described in [ISO.14496-15].¶

Initialization: The Private Data contains an HEVCDecoderConfigurationRecord structure, as defined in [ISO.14496-15].¶

4.3.11. V_AVS2

Codec ID: V_AVS2¶

Codec Name: AVS2-P2/IEEE.1857.4¶

Description: Individual pictures of AVS2-P2 stored as described in the second part of [IEEE.1857-4].¶

Initialization: none¶

4.3.12. V_AVS3

Codec ID: V_AVS3¶

Codec Name: AVS3-P2/IEEE.1857.10¶

Description: Individual pictures of AVS3-P2 stored as described in the second part of [IEEE.1857-10].¶

Initialization: none¶

4.3.13. V_REAL/RV10

Codec ID: V_REAL/RV10¶

Codec Name: RealVideo 1.0 aka RealVideo 5¶

Description: Individual slices from the Real container are combined into a single frame.¶

Initialization: The Private Data contains a real_video_props_t structure in big-endian byte order as found in librmff.¶

4.3.14. V_REAL/RV20

Codec ID: V_REAL/RV20¶

Codec Name: RealVideo G2 and RealVideo G2+SVT¶

Description: Individual slices from the Real container are combined into a single frame.¶

Initialization: The Private Data contains a real_video_props_t structure in big-endian byte order as found in librmff.¶

4.3.15. V_REAL/RV30

Codec ID: V_REAL/RV30¶

Codec Name: RealVideo 8¶

Description: Individual slices from the Real container are combined into a single frame.¶

Initialization: The Private Data contains a real_video_props_t structure in big-endian byte order as found in librmff.¶

4.3.16. V_REAL/RV40

Codec ID: V_REAL/RV40¶

Codec Name: rv40 : RealVideo 9¶

Description: Individual slices from the Real container are combined into a single frame.¶

Initialization: The Private Data contains a real_video_props_t structure in big-endian byte order as found in librmff.¶

4.3.17. V_QUICKTIME

Codec ID: V_QUICKTIME¶

Codec Name: Video taken from QuickTime(TM) files¶

Description: Several codecs as stored in QuickTime, e.g., Sorenson or Cinepak.¶

Initialization: The Private Data contains all additional data that is stored in the 'stsd' (sample description) atom in the QuickTime file after the mandatory video descriptor structure (starting with the size and FourCC fields). For an explanation of the QuickTime file format read QuickTime File Format Specification.¶

4.3.18. V_THEORA

Codec ID: V_THEORA¶

Codec Name: Theora¶

Initialization: The Private Data contains the first three Theora packets in order. The lengths of the packets precede them. The actual layout is:¶

Byte 1: number of distinct packets #p minus one inside the CodecPrivate block. This MUST be "2" for current (as of 2016-07-08) Theora headers.¶
Bytes 2..n: lengths of the first #p packets, coded in Xiph-style lacing. The length of the last packet is the length of the CodecPrivate block minus the lengths coded in these bytes minus one.¶
Bytes n+1..: The Theora identification header, followed by the comment header, followed by the codec setup header. Those are described in the Theora specs.¶
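The layout above can be read back with a short routine. The sketch below is illustrative; the function name is hypothetical, and Xiph-style lacing is taken to mean that each length is the sum of a run of bytes that ends with the first byte smaller than 255.

```python
from typing import List


def split_xiph_private_data(private: bytes) -> List[bytes]:
    """Split a CodecPrivate block into its header packets."""
    if not private:
        raise ValueError("empty CodecPrivate block")
    packet_count = private[0] + 1  # Byte 1 stores the count minus one
    pos = 1
    lengths = []
    # All lengths except the last one are stored explicitly.
    for _ in range(packet_count - 1):
        length = 0
        while True:
            if pos >= len(private):
                raise ValueError("truncated length field")
            byte = private[pos]
            pos += 1
            length += byte
            if byte != 255:
                break
        lengths.append(length)
    # The last packet takes whatever remains after the coded lengths.
    remaining = len(private) - pos - sum(lengths)
    if remaining < 0:
        raise ValueError("coded lengths exceed the CodecPrivate size")
    lengths.append(remaining)
    packets = []
    for length in lengths:
        packets.append(private[pos:pos + length])
        pos += length
    return packets
```

For Theora, the three returned packets are the identification header, the comment header, and the codec setup header, in that order.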

4.3.19. V_PRORES

Codec ID: V_PRORES¶

Codec Name: Apple ProRes¶

Initialization: The Private Data contains the FourCC as found in MP4 movies:¶

ap4x: ProRes 4444 XQ¶
ap4h: ProRes 4444¶
apch: ProRes 422 High Quality¶
apcn: ProRes 422 Standard Definition¶
apcs: ProRes 422 LT¶
apco: ProRes 422 Proxy¶
aprh: ProRes RAW High Quality¶
aprn: ProRes RAW Standard Definition¶

this page for more technical details on ProRes ¶
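The FourCC list above lends itself to a simple lookup table. The sketch below is illustrative; the names are hypothetical, and it assumes that the FourCC occupies the first four bytes of the Private Data.

```python
# FourCC values and labels exactly as listed above.
PRORES_PROFILES = {
    b"ap4x": "ProRes 4444 XQ",
    b"ap4h": "ProRes 4444",
    b"apch": "ProRes 422 High Quality",
    b"apcn": "ProRes 422 Standard Definition",
    b"apcs": "ProRes 422 LT",
    b"apco": "ProRes 422 Proxy",
    b"aprh": "ProRes RAW High Quality",
    b"aprn": "ProRes RAW Standard Definition",
}


def prores_profile(private_data: bytes) -> str:
    """Map the FourCC in the Private Data to its profile label."""
    if len(private_data) < 4:
        raise ValueError("Private Data is too short to hold a FourCC")
    fourcc = bytes(private_data[:4])  # assumption: FourCC comes first
    return PRORES_PROFILES.get(fourcc, "unknown")
```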

4.3.20. V_VP8

Codec ID: V_VP8¶

Codec Name: VP8 Codec format¶

Description: VP8 is an open and royalty free video compression format developed by Google and created by On2 Technologies as a successor to VP7. [RFC6386]¶

Codec BlockAdditions: A single-channel encoding of an alpha channel MAY be stored in BlockAdditions. The BlockAddID of the BlockMore containing these data MUST be 1.¶

Initialization: none¶

4.3.21. V_VP9

Codec ID: V_VP9¶

Codec Name: VP9 Codec format¶

Description: VP9 is an open and royalty free video compression format developed by Google as a successor to VP8. See the Draft VP9 Bitstream and Decoding Process Specification.¶

Codec BlockAdditions: A single-channel encoding of an alpha channel MAY be stored in BlockAdditions. The BlockAddID of the BlockMore containing these data MUST be 1.¶

Initialization: none¶

4.3.22. V_FFV1

Codec ID: V_FFV1¶

Codec Name: FF Video Codec 1¶

Description: FFV1 is a lossless intra-frame video encoding format designed to efficiently compress video data in a variety of pixel formats. Compared to uncompressed video, FFV1 offers storage compression, frame fixity, and self-description, which makes FFV1 useful as a preservation or intermediate video format. See the Draft FFV1 Specification.¶

Initialization: For FFV1 versions 0 or 1, Private Data SHOULD NOT be written. For FFV1 version 3 or greater, the Private Data MUST contain the FFV1 Configuration Record structure, as defined in https://tools.ietf.org/html/draft-ietf-cellar-ffv1-04#section-4.2, and no other data.¶

4.4. Audio Codec Mappings

4.4.1. A_MPEG/L3

Codec ID: A_MPEG/L3¶

Codec Name: MPEG Audio 1, 2, 2.5 Layer III¶

Description: The data contain everything needed for playback in the MPEG Audio header of each frame. Corresponding ACM wFormatTag : 0x0055¶

Initialization: none¶

4.4.2. A_MPEG/L2

Codec ID: A_MPEG/L2¶

Codec Name: MPEG Audio 1, 2 Layer II¶

Description: The data contain everything needed for playback in the MPEG Audio header of each frame. Corresponding ACM wFormatTag : 0x0050¶

Initialization: none¶

4.4.3. A_MPEG/L1

Codec ID: A_MPEG/L1¶

Codec Name: MPEG Audio 1, 2 Layer I¶

Description: The data contain everything needed for playback in the MPEG Audio header of each frame. Corresponding ACM wFormatTag : 0x0050¶

Initialization: none¶

4.4.4. A_PCM/INT/BIG

Codec ID: A_PCM/INT/BIG¶

Codec Name: PCM Integer Big Endian¶

Description: The audio bit depth MUST be read and set from the BitDepth Element. Audio samples MUST be considered as signed values, except if the audio bit depth is 8 which MUST be interpreted as unsigned values. Corresponding ACM wFormatTag : ???¶

Initialization: none¶
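The sample interpretation described above can be sketched as follows. The function name is hypothetical, and the sketch assumes that the bit depth is a multiple of 8 so that each sample occupies a whole number of bytes; samples of all channels are returned in stored order without de-interleaving.

```python
from typing import List


def decode_pcm_int_big(data: bytes, bit_depth: int) -> List[int]:
    """Decode big-endian integer PCM samples from a Block payload."""
    if bit_depth <= 0 or bit_depth % 8 != 0:
        raise ValueError("this sketch only handles whole-byte bit depths")
    width = bit_depth // 8
    if len(data) % width != 0:
        raise ValueError("payload is not a whole number of samples")
    # Samples are signed, except 8-bit samples, which are unsigned.
    signed = bit_depth != 8
    return [
        int.from_bytes(data[i:i + width], "big", signed=signed)
        for i in range(0, len(data), width)
    ]
```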

4.4.5. A_PCM/INT/LIT

Codec ID: A_PCM/INT/LIT¶

Codec Name: PCM Integer Little Endian¶

Initialization: none¶

4.4.6. A_PCM/FLOAT/IEEE

Codec ID: A_PCM/FLOAT/IEEE¶

Codec Name: Floating Point, IEEE compatible¶

Description: The audio bit depth MUST be read and set from the BitDepth Element (32 bit in most cases). The floats are stored as defined in [IEEE.754] and in little-endian order. Corresponding ACM wFormatTag : 0x0003¶

Initialization: none¶
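Reading little-endian IEEE floating-point samples can be sketched with the standard `struct` module. The function name is hypothetical, and the sketch only covers bit depths of 32 and 64.

```python
import struct
from typing import List


def decode_pcm_float_ieee(data: bytes, bit_depth: int) -> List[float]:
    """Decode little-endian IEEE 754 float samples from a Block payload."""
    formats = {32: "f", 64: "d"}
    if bit_depth not in formats:
        raise ValueError("this sketch only handles 32-bit and 64-bit floats")
    width = bit_depth // 8
    if len(data) % width != 0:
        raise ValueError("payload is not a whole number of samples")
    count = len(data) // width
    # "<" selects little-endian byte order with standard sizes.
    return list(struct.unpack(f"<{count}{formats[bit_depth]}", data))
```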

4.4.7. A_MPC

Codec ID: A_MPC¶

Codec Name: MPC (musepack) SV8¶

Description: The main developer for musepack has requested that we wait until the SV8 framing has been fully defined for musepack before defining how to store it in Matroska.¶

4.4.8. A_AC3

Codec ID: A_AC3¶

Codec Name: (Dolby™) AC3¶

Description: Covers AC3 streams with BSID <= 8. The private data is void. Corresponding ACM wFormatTag: 0x2000. The number of channels has to be read from the corresponding audio element.¶

4.4.9. A_AC3/BSID9

Codec ID: A_AC3/BSID9¶

Codec Name: (Dolby™) AC3¶

Description: The AC3 frame header has, similar to the MPEG audio header, a version field. Normal AC3 is defined as bitstream ID 8 (5 bits, numbers are 0-15). Everything below 8 is still compatible with all decoders that handle 8 correctly. Everything higher is an addition that breaks decoder compatibility. For the sample rates 24 kHz (00), 22.05 kHz (01), and 16 kHz (10), the BSID is 9. For the sample rates 12 kHz (00), 11.025 kHz (01), and 8 kHz (10), the BSID is 10.¶

Initialization: none¶
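The relation between the BSID, the Codec ID, and the reduced sample rates can be sketched as two lookups. The names are hypothetical, and the two-bit codes (00, 01, 10) are assumed to be the sample rate code read from the frame header.

```python
# Sample rates in Hz for the reduced-rate variants, keyed by BSID and then
# by the two-bit sample rate code given in the description above.
REDUCED_SAMPLE_RATES = {
    9: {0b00: 24000, 0b01: 22050, 0b10: 16000},
    10: {0b00: 12000, 0b01: 11025, 0b10: 8000},
}


def ac3_codec_id(bsid: int) -> str:
    """Pick the Codec ID for an AC3 stream from its bitstream ID."""
    if bsid < 0:
        raise ValueError("BSID cannot be negative")
    if bsid <= 8:
        return "A_AC3"
    if bsid == 9:
        return "A_AC3/BSID9"
    if bsid == 10:
        return "A_AC3/BSID10"
    raise ValueError(f"no Codec ID is defined here for BSID {bsid}")


def reduced_sample_rate(bsid: int, rate_code: int) -> int:
    """Return the sample rate in Hz for BSID 9 or 10."""
    try:
        return REDUCED_SAMPLE_RATES[bsid][rate_code]
    except KeyError:
        raise ValueError("unsupported BSID or sample rate code") from None
```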

4.4.10. A_AC3/BSID10

Codec ID: A_AC3/BSID10¶

Codec Name: (Dolby™) AC3¶

Initialization: none¶

4.4.11. A_ALAC

Codec ID: A_ALAC¶

Codec Name: ALAC (Apple Lossless Audio Codec)¶

Initialization: The Private Data contains ALAC's magic cookie (both the codec specific configuration as well as the optional channel layout information). Its format is described in ALAC's official source code.¶

4.4.12. A_DTS

Codec ID: A_DTS¶

Codec Name: Digital Theatre System¶

Description: Supports DTS, DTS-ES, DTS-96/24, DTS-HD High Resolution Audio and DTS-HD Master Audio. The private data is void. Corresponding ACM wFormatTag : 0x2001¶

Initialization: none¶

4.4.13. A_DTS/EXPRESS

Codec ID: A_DTS/EXPRESS¶

Codec Name: Digital Theatre System Express¶

Description: DTS Express (a.k.a. LBR) audio streams. The private data is void. Corresponding ACM wFormatTag : 0x2001¶

Initialization: none¶

4.4.14. A_DTS/LOSSLESS

Codec ID: A_DTS/LOSSLESS¶

Codec Name: Digital Theatre System Lossless¶

Description: DTS Lossless audio that does not have a core substream. The private data is void. Corresponding ACM wFormatTag : 0x2001¶

Initialization: none¶

4.4.15. A_VORBIS

Codec ID: A_VORBIS¶

Codec Name: Vorbis¶

Initialization: The Private Data contains the first three Vorbis packets in order. The lengths of the packets precede them. The actual layout is:¶

Byte 1: number of distinct packets #p minus one inside the CodecPrivate block. This MUST be "2" for current (as of 2016-07-08) Vorbis headers.¶
Bytes 2..n: lengths of the first #p packets, coded in Xiph-style lacing. The length of the last packet is the length of the CodecPrivate block minus the lengths coded in these bytes minus one.¶
Bytes n+1..: The Vorbis identification header, followed by the Vorbis comment header, followed by the codec setup header.¶
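From the writer's side, the Private Data layout described for Vorbis can be assembled as sketched below. The function names are hypothetical, and Xiph-style lacing is taken to mean that a length is written as a run of 255-valued bytes followed by one byte smaller than 255.

```python
from typing import Sequence


def xiph_lace_length(length: int) -> bytes:
    """Encode one packet length in Xiph-style lacing."""
    if length < 0:
        raise ValueError("length cannot be negative")
    return b"\xff" * (length // 255) + bytes([length % 255])


def build_xiph_private_data(headers: Sequence[bytes]) -> bytes:
    """Assemble a CodecPrivate block from the header packets, in order."""
    if not 1 <= len(headers) <= 256:
        raise ValueError("the packet count must fit in one byte")
    out = bytearray([len(headers) - 1])  # Byte 1: packet count minus one
    # The length of the last packet is implicit and therefore not written.
    for header in headers[:-1]:
        out += xiph_lace_length(len(header))
    for header in headers:
        out += header
    return bytes(out)
```

For Vorbis, `headers` would hold the identification header, the comment header, and the codec setup header, so the first byte of the result is 2.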

4.4.16. A_FLAC

Codec ID: A_FLAC¶

Codec Name: FLAC (Free Lossless Audio Codec)¶

Initialization: The Private Data contains all the header/metadata packets before the first data packet. These include the first header packet containing only the word fLaC as well as all metadata packets.¶