Audio/Video Payload WG T. Schierl
Internet Draft Fraunhofer HHI
Intended status: Standards track S. Wenger
Expires: April 2013 Vidyo
Y.-K. Wang
Qualcomm
M. M. Hannuksela
Nokia
October 22, 2012
RTP Payload Format for High Efficiency Video Coding
draft-schierl-payload-rtp-h265-01.txt
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 22, 2013.
Copyright and License Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
Schierl, et al Expires April 22, 2013 [Page 1]
Internet-Draft RTP Payload Format for HEVC October 2012
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Abstract
This memo describes an RTP payload format for High Efficiency Video
Coding (HEVC) [HEVC], which is currently being developed by the
Joint Collaborative Team on Video Coding (JCT-VC). The RTP payload
format allows for packetization of one or more Network Abstraction
Layer (NAL) units in each RTP packet payload, as well as
fragmentation of a NAL unit into multiple RTP packets. Furthermore,
it supports transmission of an HEVC stream over a single as well as
multiple RTP flows. The payload format has wide applicability in
videoconferencing, Internet video streaming, and high bit-rate
entertainment-quality video, among others.
Table of Contents
Status of this Memo
Abstract
Table of Contents
1. Introduction
1.1. The HEVC Codec
1.1.1 Overview
1.1.2 Parallel Processing Support
1.1.3 Parameter Sets
1.1.4 NAL Unit Header
1.2. Overview of the Payload Format
2. Conventions
3. Definitions and Abbreviations
3.1 Definitions
3.1.1 Definitions from the HEVC Specification
3.1.2 Definitions Specific to This Memo
3.2 Abbreviations
4. RTP Payload Format
4.1 RTP Header Usage
4.2 NAL Unit Header Usage
4.3 Payload Structures
4.4 Transmission Modes
4.5 Packetization Modes
4.6 Decoding Order
4.7 Aggregation Packets
4.7.1 Single Time Aggregation Packet (STAP)
4.8 Fragmentation Units (FUs)
5. Packetization Rules
5.1 Common Packetization Rules
5.2 Non-Interleaved mode
5.3 Interleaved mode
6. De-Packetization Process
6.1 Non-Interleaved Mode
6.2 Interleaved Mode
6.2.1 Size of the De-interleaving Buffer
6.2.2 De-interleaving Process
6.3 Additional De-Packetization Guidelines
7. Payload Format Parameters
7.1 Media Type Registration
7.2 SDP Parameters
7.2.1 Mapping of Payload Type Parameters to SDP
7.2.2 Usage with the SDP Offer/Answer Model
7.2.3 Usage with SDP Offer/Answer Model
7.2.4 Usage in Declarative Session Descriptions
7.2.5 Signaling of Parallel Processing
7.3 Examples
7.4 Parameter Set Considerations
8. Security Considerations
9. Congestion Control
10. IANA Consideration
11. Informative Appendix: Application Examples
11.1 Introduction
11.2 Streaming
11.3 Videoconferencing (Unicast to MANE, Unicast to Endpoints)
11.4 Mobile TV (Multicast to MANE, Unicast to Endpoint)
12. Acknowledgements
13. References
13.1 Normative References
13.2 Informative References
14. Authors' Addresses
1. Introduction
1.1. The HEVC Codec
1.1.1 Overview
High Efficiency Video Coding [HEVC] is a forthcoming video coding
standard under development by the Joint Collaborative Team on Video
Coding (JCT-VC) formed by the ITU-T and ISO/IEC. It is reported to
provide significant coding efficiency gains over H.264 [H.264].
The standard, once ratified, will officially be known as ISO/IEC
23008-2, informally as MPEG-H Part 2. ITU-T is expected to decide
soon on the final Recommendation number.
As both H.264 [H.264] and its RTP payload format [RFC6184] are
widely deployed and generally known in the relevant implementer
community, we frequently highlight only the differences to those two
specifications in non-normative, explanatory parts of this memo.
Basic familiarity with both specifications is assumed. The
normative parts of this memo do not require study of H.264 or its
payload format.
H.264 and HEVC share a similar hybrid video codec design.
Conceptually, both technologies include a video coding layer (VCL),
and a network abstraction layer (NAL).
The VCL of HEVC includes a prediction stage that involves motion
compensation and spatial intra-prediction, integer transforms
applied to prediction residuals, and an entropy coding stage that
uses arithmetic coding. As in H.264, in-loop deblocking filtering
is applied to the reconstructed picture.
An important difference of HEVC compared to H.264 is the coding
structure within a picture. In HEVC each picture is divided into
treeblocks of up to 64x64 luma samples. Treeblocks can be
recursively split into smaller Coding Units (CUs) using a generic
quad-tree segmentation structure. CUs can be further split into
Prediction Units (PUs) used for intra- and inter-prediction and
Transform Units (TUs) defined for transform and quantization. HEVC
includes integer transforms for a number of TU sizes. HEVC also
includes a new in-loop filter known as Sample Adaptive Offset (SAO)
that may be applied after the deblocking filtering.
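To illustrate the quad-tree segmentation of a treeblock into CUs, the
following Python sketch recursively splits one 64x64 treeblock. It is
not taken from [HEVC]: the split decision is a hypothetical callback
standing in for an encoder's mode decision, and the 8x8 minimum CU
size is an assumption of this sketch. CUs are returned in z-order
(top-left, top-right, bottom-left, bottom-right); the further split
into PUs and TUs is not modeled.

```python
from typing import Callable, List, Tuple

CodingUnit = Tuple[int, int, int]  # (x, y, size) in luma samples


def split_treeblock(
    x: int,
    y: int,
    size: int,
    should_split: Callable[[int, int, int], bool],
    min_size: int = 8,
) -> List[CodingUnit]:
    """Recursively quad-tree split a square block into coding units.

    should_split is a hypothetical encoder decision; min_size is an assumed
    lower bound on the CU size.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus: List[CodingUnit] = []
        # Visit the four quadrants in z-order.
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(split_treeblock(x + dx, y + dy, half,
                                           should_split, min_size))
        return cus
    # Leaf of the quad-tree: this block is coded as one CU.
    return [(x, y, size)]
```

For example, splitting only the treeblock itself and its top-left
32x32 quadrant yields four 16x16 CUs followed by three 32x32 CUs.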
For random access, HEVC introduces, in addition to Instantaneous
Decoder Refresh (IDR) pictures, the Clean Random Access (CRA)
picture, which is similar to what has conventionally been called an
open Group-of-Pictures (GOP) intra picture. Whereas in H.264 such a
picture may be signaled using a recovery point Supplemental
Enhancement Information (SEI) message, HEVC uses a distinct NAL unit
type to indicate a CRA picture. Furthermore, HEVC allows a
conforming bitstream to start with a CRA picture, whereas in H.264 a
conforming bitstream must start with an IDR picture.
Temporal layer access (TLA) pictures were introduced in HEVC to
indicate temporal layer switching points.
Predictively coded pictures can include uni-predicted and bi-
predicted slices. The flexibility in creating picture coding
structures is roughly comparable to H.264.
The VCL generates and consumes syntax structures designed to be
adaptable to MTU sizes commonly found in IP networks, irrespective
of the size of a coded picture. Picture segmentation is achieved
through slices. The Network Abstraction Layer (NAL) is responsible
for information that is required for the decoding of more than one
slice; such information is collected in parameter sets. A number of
data structures that are not strictly required for the decoding
process, but are potentially helpful in decoding systems, can be
conveyed in data structures such as Supplemental Enhancement
Information (SEI) messages, access unit delimiters, and so on.
All the aforementioned MTU-sized (or smaller) data structures are
available in the form of NAL units.
The single distinguishing difference between HEVC and H.264 with
respect to the RTP payload format design is the availability of VCL-
based coding tools that are specifically designed to enable
processing on high-level parallel architectures. These tools are
described below in sufficient detail to provide motivation for the
parallel processing signaling support that is described in section
7.2.5.
1.1.2 Parallel Processing Support
The reportedly significantly higher computational demand of HEVC
over H.264 (especially with respect to encoders), in conjunction
with the ever increasing video resolution (both spatially and
temporally) required by the market, led to the adoption of VCL
coding tools specifically targeted to allow for parallelization on
the sub-picture level. That is, parallelization occurs, at the
minimum, at the granularity of an integer number of treeblocks. The
targets for this type of high-level parallelization are multicore
CPUs and DSPs as well as multiprocessor systems. In a system
design, to be useful, these tools require signaling support, which
is provided in section 7.2.5 of this memo. This section provides a
brief overview of the tools available in [HEVC]. This section is
expected to be updated frequently as the HEVC draft evolves.
For parallelization, four picture partition strategies are
available.
Regular slices are segments of the bitstream that can be
reconstructed independently from other regular slices within the
same picture (though there may still be interdependencies through
loop filtering operations). Regular slices are the only
parallelization tool that is also available, in virtually identical
form, in H.264. Regular-slice-based parallelization does not require
much inter-processor or inter-core communication (except for
inter-processor or inter-core data sharing for motion compensation
when decoding a predictively coded picture, which is typically much
heavier than inter-processor or inter-core data sharing due to
in-picture prediction), as slices are designed to be independently
decodable. However, for the same reason, regular slices can incur
some coding overhead. Further, regular slices (in contrast to some
of the other tools mentioned below) also serve as the key mechanism
for bitstream partitioning to match MTU size requirements, because
regular slices are independent of each other within a picture and
each regular slice is encapsulated in its own NAL unit. In many
cases, the goal of parallelization and the goal of MTU size matching
place conflicting demands on the slice layout in a picture. The
realization of this situation led to the development of the more
advanced tools mentioned below. This payload format does not contain
any specific mechanisms aiding parallelization through regular
slices.
Dependent slices allow for the fragmentation of a coded bitstream
into fragments at treeblock boundaries, without breaking any
in-picture prediction mechanism. They are complementary to the
fragmentation mechanism described in this memo in that they need
the cooperation of the encoder, or parsing of the slice header in a
Media Aware Network Element (MANE), so as to identify coded treeblock
boundaries and enable byte alignment. Because a dependent slice
necessarily contains an integer number of coded treeblocks, a
decoder using multiple cores operating on treeblocks can process a
dependent slice if entropy and intra/inter coding information from
preceding treeblocks is available. Fragmentation, as specified in
this memo, in contrast, does not guarantee that a fragment contains
an integer number of treeblocks.
In Wavefront Parallel Processing (WPP), the picture is partitioned
into rows of treeblocks. Entropy decoding and prediction are allowed
to use data from treeblocks in other partitions. Parallel processing
is possible through parallel decoding of rows of treeblocks, where
the start of the decoding of a row is delayed by two treeblocks, so
as to ensure that data related to the treeblocks above and
above-right of the subject treeblock is available before the subject
treeblock is decoded. Using this staggered start (which appears like
a wavefront when represented graphically), parallelization is
possible with up to as many processors/cores as the picture contains
treeblock rows.
Because in-picture prediction between neighboring treeblock rows
within a picture is allowed, the required inter-processor/inter-core
communication to enable in-picture prediction can be substantial.
The wavefront parallel processing partitioning does not result in
more NAL units compared to when it is not applied; thus, wavefront
parallel processing cannot be used for MTU size matching on its own,
but may be used for MTU size matching in combination with dependent
slices.
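The two-treeblock delay can be reproduced with a simple dependency
model. The Python sketch below assumes (a simplification, not the
normative [HEVC] process) that each treeblock takes one time unit,
that every row has its own core, and that a treeblock can start only
once its left neighbor and its above-right neighbor (the treeblock
directly above, in the last column) are finished. Under these
assumptions, row r finishes its first treeblock at time 1 + 2r,
which is the staggered wavefront start.

```python
from typing import List


def wpp_finish_times(width: int, height: int) -> List[List[int]]:
    """Earliest finish time of each treeblock under a simplified WPP model.

    Assumes one time unit per treeblock, one core per treeblock row, and
    dependencies only on the left and above-right neighbors.
    """
    finish = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            left = finish[r][c - 1] if c > 0 else 0
            # Above-right neighbor; in the last column, the one directly above.
            above_right = finish[r - 1][min(c + 1, width - 1)] if r > 0 else 0
            finish[r][c] = 1 + max(left, above_right)
    return finish
```

For a 1920x1080 picture with 64x64 treeblocks (30 x 17 treeblocks),
the last treeblock finishes at time 62 under this model, compared
with 510 time units for sequential decoding; a real decoder also pays
the inter-core communication cost described above.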
Tiles define horizontal and vertical boundaries that partition a
picture into tile columns and rows. The scan order of treeblocks is
changed to be local within a tile (in the order of a treeblock
raster scan of a tile), before decoding the top-left treeblock of
the next tile in the order of tile raster scan of a picture. Similar
to regular slices, tiles break in-picture prediction dependencies
(including entropy decoding dependencies). However, they do not
need to be included in individual NAL units (same as wavefront
parallel processing in this regard); hence, tiles cannot be used for
MTU size matching. Each tile can be processed by one processor/core,
and the inter-processor/inter-core communication required for
in-picture prediction between processing units decoding neighboring
tiles is limited to conveying the shared slice header in cases where
a slice spans more than one tile, and to loop-filtering-related
sharing of reconstructed samples and metadata. Consequently, tiles
are less demanding in terms of memory bandwidth compared to WPP, due
to the in-picture independence between two neighboring partitions.
Tiles are included in the (single) existing profile of [HEVC], and
their support in the context of this memo will be specified in
section 7 of this memo.
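The tile scan described above can be made concrete with a small
Python sketch that lists treeblock raster-scan addresses in tile
scan order. The column widths and row heights (in treeblocks) are
illustrative inputs chosen for this sketch; how [HEVC] signals them
in the parameter sets is not modeled here.

```python
from typing import List, Sequence


def tile_scan_order(col_widths: Sequence[int],
                    row_heights: Sequence[int]) -> List[int]:
    """Return picture raster-scan treeblock addresses in tile scan order.

    Tiles are visited in tile raster scan of the picture; treeblocks inside
    each tile are visited in raster scan of that tile.
    """
    pic_width = sum(col_widths)  # picture width in treeblocks
    order: List[int] = []
    row_start = 0
    for tile_height in row_heights:
        col_start = 0
        for tile_width in col_widths:
            for y in range(row_start, row_start + tile_height):
                for x in range(col_start, col_start + tile_width):
                    order.append(y * pic_width + x)
            col_start += tile_width
        row_start += tile_height
    return order
```

For a picture four treeblocks wide and two high, split into two tile
columns of width two, the treeblocks are visited as 0, 1, 4, 5 (first
tile) and then 2, 3, 6, 7 (second tile).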
The interaction between regular slices and tiles is simplified by
constraints of the HEVC draft. Specifically, for each slice and
tile, either or both of the following conditions must be fulfilled:
1) all coded treeblocks in a slice belong to the same tile; 2) all
coded treeblocks in a tile belong to the same slice.
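One way to check this constraint, given a slice index and a tile
index for every coded treeblock of a picture, is the following Python
sketch. It tests each slice/tile pair that shares at least one
treeblock, which is how this sketch interprets the constraint; it is
not a conformance checker for [HEVC], and the function name is
illustrative.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set


def slices_and_tiles_compatible(slice_of_ctb: Sequence[int],
                                tile_of_ctb: Sequence[int]) -> bool:
    """Check that every overlapping slice/tile pair nests one in the other."""
    slice_ctbs: Dict[int, Set[int]] = defaultdict(set)
    tile_ctbs: Dict[int, Set[int]] = defaultdict(set)
    for ctb, (s, t) in enumerate(zip(slice_of_ctb, tile_of_ctb)):
        slice_ctbs[s].add(ctb)
        tile_ctbs[t].add(ctb)
    # Only slice/tile pairs that share at least one treeblock are checked.
    for s, t in set(zip(slice_of_ctb, tile_of_ctb)):
        slice_in_tile = slice_ctbs[s] <= tile_ctbs[t]
        tile_in_slice = tile_ctbs[t] <= slice_ctbs[s]
        if not (slice_in_tile or tile_in_slice):
            return False
    return True
```

A slice that starts in the middle of one tile and continues into the
next tile violates the constraint, whereas one slice covering several
complete tiles, or several slices inside one tile, satisfies it.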
1.1.3 Parameter Sets
The parameter set concept is borrowed from [H.264] with no
conceptual changes. In addition to Sequence Parameter Sets (SPS),
carrying data valid for the whole video sequence, and Picture
Parameter Sets (PPS), carrying information valid on a
picture-by-picture basis, the new Video Parameter Set (VPS) has been
introduced.
At the time of writing, the VPS includes information about maximum
profile and level as well as information related to temporal
scalability and Hypothetical Reference Decoder (HRD) parameters.
For the HEVC extensions for scalable (SHVC) and 3D coding, the VPS
is planned to also convey information about non-temporal layer
dependency, and related side information.
1.1.4 NAL Unit Header
HEVC maintains the NAL unit concept of H.264 with modifications.
HEVC uses a two-byte NAL unit header. Table 1 lists the allocation
of NAL unit types for VCL NAL units and non-VCL NAL units.
Table 1. NAL unit types in HEVC
Values marked as "Unspecified" are intended for use by
specifications other than HEVC, for example by this RTP payload
format.
Type     NAL unit name   Content of NAL unit                     Class
------------------------------------------------------------------------
0        TRAIL_N         Coded slice segment of a non-TSA,       VCL
                         non-STSA trailing picture
1        TRAIL_R         Coded slice segment of a non-TSA,       VCL
                         non-STSA trailing picture
2        TSA_N           Coded slice segment of a TSA picture    VCL
3        TSA_R           Coded slice segment of a TSA picture    VCL
4        STSA_N          Coded slice segment of an STSA picture  VCL
5        STSA_R          Coded slice segment of an STSA picture  VCL
6        RADL_N          Coded slice segment of an RADL picture  VCL
7        RADL_R          Coded slice segment of an RADL picture  VCL
8        RASL_N          Coded slice segment of an RASL picture  VCL
9        RASL_R          Coded slice segment of an RASL picture  VCL
10,12,14 RSV_VCL_N10,    Reserved N VCL                          VCL
         RSV_VCL_N12,
         RSV_VCL_N14
11,13,15 RSV_VCL_R11,    Reserved R VCL                          VCL
         RSV_VCL_R13,
         RSV_VCL_R15
16       BLA_W_TFD       Coded slice segment of a BLA picture    VCL
17       BLA_W_DLP       Coded slice segment of a BLA picture    VCL
18       BLA_N_LP        Coded slice segment of a BLA picture    VCL
19       IDR_W_LP        Coded slice segment of an IDR picture   VCL
20       IDR_N_LP        Coded slice segment of an IDR picture   VCL
21       CRA_NUT         Coded slice segment of a CRA picture    VCL
22..23   RSV_RAP_VCL22,  Reserved RAP VCL                        VCL
         RSV_RAP_VCL23
24..31   RSV_VCL24..     Reserved VCL                            VCL
         RSV_VCL31
32       VPS_NUT         Video parameter set                     non-VCL
33       SPS_NUT         Sequence parameter set                  non-VCL
34       PPS_NUT         Picture parameter set                   non-VCL
35       AUD_NUT         Access unit delimiter                   non-VCL
36       EOS_NUT         End of sequence                         non-VCL
37       EOB_NUT         End of bitstream                        non-VCL
38       FD_NUT          Filler data                             non-VCL
39       PREFIX_SEI_NUT  Prefix supplemental enhancement         non-VCL
                         information (SEI)
40       SUFFIX_SEI_NUT  Suffix supplemental enhancement         non-VCL
                         information (SEI)
41..47   RSV_NVCL41..    Reserved                                non-VCL
         RSV_NVCL47
48..63   UNSPEC48..      Unspecified                             non-VCL
         UNSPEC63
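As a reading aid for Table 1, the following Python sketch maps a
nal_unit_type value to its class and to a coarse category. The
category labels ("RAP", "non-RAP", "specified", "reserved",
"unspecified") are this sketch's own shorthand rather than [HEVC]
terminology; the RAP range 16..21 (plus reserved RAP types 22..23)
follows the table above.

```python
from typing import Tuple


def classify_nal_unit_type(nal_unit_type: int) -> Tuple[str, str]:
    """Return (type class, coarse category) for a value from Table 1."""
    if not 0 <= nal_unit_type <= 63:
        raise ValueError("nal_unit_type is a 6-bit field (0..63)")
    if nal_unit_type <= 31:
        # Reserved VCL types: 10..15, reserved RAP 22..23, and 24..31.
        if 10 <= nal_unit_type <= 15 or 22 <= nal_unit_type <= 31:
            return ("VCL", "reserved")
        if 16 <= nal_unit_type <= 21:  # BLA, IDR, and CRA slice segments
            return ("VCL", "RAP")
        return ("VCL", "non-RAP")
    if nal_unit_type <= 40:  # VPS, SPS, PPS, AUD, EOS, EOB, FD, SEI
        return ("non-VCL", "specified")
    if nal_unit_type <= 47:
        return ("non-VCL", "reserved")
    return ("non-VCL", "unspecified")
```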
The syntax and semantics of the NAL unit header are specified in
[HEVC], but the essential properties of the NAL unit header are
summarized below for convenience.
+---------------+---------------+
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F| Type | R | TID |
+-------------+-----------------+
The semantics of the components of the NAL unit header octets, as
specified in [HEVC], are described briefly below. In addition to
the name and size of each field, the corresponding syntax element
name in [HEVC] is also provided.
F: 1 bit
forbidden_zero_bit. MUST be zero. HEVC declares a value of 1 as
a syntax violation. Note: the bit is wasted for compatibility
with MPEG-2 transport systems.
Type: 6 bits
nal_unit_type. This component specifies the NAL unit type as
defined in Table 7-1 of [HEVC], and in Table 1 in this memo. For
a reference of all currently defined NAL unit types and their
semantics, please refer to Section 7.4.1 in [HEVC].
R: 6 bits
reserved_6 bits. Reserved bits for future extension (such as
scalability and three-dimensional video extensions). R MUST be
equal to "000000" (in binary form).
TID: 3 bits
temporal_id. This component indicates the temporal identifier of
the NAL unit in the coded sequence, plus 1. A TID value of 0 is
illegal to prevent start code emulations in MPEG-2 systems.
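The bit layout above can be decoded with a few shifts and masks. The
Python sketch below parses the two-octet header and flags values that
the field descriptions above forbid (F set, R non-zero, TID zero); it
does not model the extended F and TID semantics introduced in
Section 4.2. The names NalUnitHeader, parse_nal_unit_header, and
header_violations are illustrative.

```python
from typing import List, NamedTuple


class NalUnitHeader(NamedTuple):
    f: int         # forbidden_zero_bit (1 bit)
    nal_type: int  # nal_unit_type (6 bits)
    r: int         # reserved bits (6 bits)
    tid: int       # temporal identifier plus 1 (3 bits)


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Decode the two-octet HEVC NAL unit header (F | Type | R | TID)."""
    if len(data) < 2:
        raise ValueError("NAL unit header is two octets long")
    b0, b1 = data[0], data[1]
    return NalUnitHeader(
        f=b0 >> 7,
        nal_type=(b0 >> 1) & 0x3F,
        # R straddles the octet boundary: 1 bit from b0, 5 bits from b1.
        r=((b0 & 0x01) << 5) | (b1 >> 3),
        tid=b1 & 0x07,
    )


def header_violations(h: NalUnitHeader) -> List[str]:
    """List the header constraints described above that h breaks."""
    problems = []
    if h.f != 0:
        problems.append("F must be zero")
    if h.r != 0:
        problems.append("R must be zero")
    if h.tid == 0:
        problems.append("TID of 0 is illegal")
    return problems
```

For example, the octets 0x40 0x01 decode to Type 32 (VPS_NUT) with
R equal to 0 and TID equal to 1.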
This memo extends the semantics of F and TID, as described in
Section 4.2.
1.2. Overview of the Payload Format
This payload format defines the following processes required for
transport of HEVC coded data over RTP [RFC3550]:
o Usage of RTP header with this payload format
o Packetization of HEVC coded NAL units into RTP packets
o Transmission of HEVC NAL units of the same bitstream within a
single RTP session
o Payload format parameters to be used within the Session
Description Protocol (SDP) [RFC4566].
2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119
[RFC2119].
This specification uses the notion of setting and clearing a bit
when bit fields are handled. Setting a bit is the same as assigning
that bit the value of 1 (On). Clearing a bit is the same as
assigning that bit the value of 0 (Off).
3. Definitions and Abbreviations
3.1 Definitions
This document uses the terms and definitions of [HEVC]. Section
3.1.1 lists relevant definitions copied from [HEVC] for convenience.
Section 3.1.2 gives definitions specific to this memo.
3.1.1 Definitions from the HEVC Specification
access unit: A set of NAL units that are consecutive in decoding
order and contain exactly one coded picture. In addition to the
coded slice NAL units of the coded picture, the access unit may
also contain other NAL units not containing slices of the coded
picture. The decoding of an access unit always results in a
decoded picture.
coded video sequence: A sequence of access units that consists,
in decoding order, of a CRA access unit that is the first access
unit in the bitstream, an IDR access unit, or a BLA access unit,
followed by zero or more non-IDR and non-BLA access units
including all subsequent access units up to but not including any
subsequent IDR or BLA access unit.
CRA access unit: An access unit in which the coded picture is a
CRA picture.
CRA picture: A RAP picture for which each slice has nal_unit_type
equal to CRA_NUT.
dependent slice segment: A slice segment for which the values of
some syntax elements of the slice segment header are inferred
from the values for the preceding independent slice segment in
decoding order.
IDR access unit: An access unit in which the coded picture is an
IDR picture.
IDR picture: A RAP picture for which each slice has nal_unit_type
equal to IDR_W_LP or IDR_N_LP.
independent slice segment: A slice segment for which the values
of the syntax elements of the slice segment header are not
inferred from the values for a preceding slice segment.
random access: The act of starting the decoding process for a
bitstream at a point other than the beginning of the stream.
RAP access unit: An access unit in which the coded picture is a
RAP picture.
RAP picture: A coded picture containing only I slices and for
which each slice has nal_unit_type in the range of 16 to 23,
inclusive.
slice: An integer number of coding tree units contained in one
independent slice segment and all subsequent dependent slice
segments (if any) that precede the next independent slice segment
(if any) within the same access unit.
slice segment: An integer number of coding tree units ordered
consecutively in the tile scan and contained in a single NAL
unit. The division of each picture into slice segments is a
partitioning.
tile: An integer number of coding tree blocks co-occurring in one
column and one row, ordered consecutively in coding tree block
raster scan of the tile. The division of each picture into tiles
is a partitioning. Tiles in a picture are ordered consecutively
in tile raster scan of the picture.
3.1.2 Definitions Specific to This Memo
media aware network element (MANE): A network element, such as a
middlebox or application layer gateway that is capable of parsing
certain aspects of the RTP payload headers or the RTP payload and
reacting to their contents.
Informative note: The concept of a MANE goes beyond normal
routers or gateways in that a MANE has to be aware of the
signaling (e.g., to learn about the payload type mappings of
the media streams), and in that it has to be trusted when
working with SRTP. The advantage of using MANEs is that they
allow packets to be dropped according to the needs of the
media coding. For example, if a MANE has to drop packets due
to congestion on a certain link, it can identify and remove
those packets whose elimination produces the least adverse
effect on the user experience. After dropping packets, MANEs
must rewrite RTCP packets to match the changes to the RTP
packet stream as specified in Section 7 of [RFC3550].
NAL unit decoding order: A NAL unit order that conforms to the
constraints on NAL unit order given in Section 7.4.1.2.3 in
[HEVC].
NALU-time: The value that the RTP timestamp would have if the NAL
unit were transported in its own RTP packet.
RTP packet stream: A sequence of RTP packets with increasing
sequence numbers (except for wrap-around), identical PT and
identical SSRC (Synchronization Source), carried in one RTP
session. Within the scope of this memo, one RTP packet stream is
utilized to transport one or more layers.
transmission order: The order of packets in ascending RTP
sequence number order (in modulo arithmetic). Within an
aggregation packet, the NAL unit transmission order is the same
as the order of appearance of NAL units in the packet.
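Because the 16-bit RTP sequence number wraps around, ascending order
"in modulo arithmetic" needs care when packets are sorted. The Python
sketch below sorts sequence numbers relative to a reference packet
(for example, the first packet received), assuming that all packets
lie within 2^15 sequence numbers after the reference; that window,
borrowed from serial number arithmetic, is an assumption of this
sketch and not a rule of this memo. Ordering within an aggregation
packet is not modeled.

```python
from typing import Iterable, List

SEQ_MODULUS = 1 << 16  # RTP sequence numbers are 16 bits wide


def transmission_order(seq_numbers: Iterable[int], reference: int) -> List[int]:
    """Sort sequence numbers in ascending modulo-2^16 order from reference."""
    return sorted(seq_numbers, key=lambda s: (s - reference) % SEQ_MODULUS)


def seq_precedes(a: int, b: int) -> bool:
    """True if a comes before b, assuming they are less than 2^15 apart."""
    diff = (b - a) % SEQ_MODULUS
    return 0 < diff < SEQ_MODULUS // 2
```

For example, with reference 65534 the numbers 0, 65535, 1, 65534 are
ordered as 65534, 65535, 0, 1.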
3.2 Abbreviations
TBD
4. RTP Payload Format
4.1 RTP Header Usage
The format of the RTP header is specified in [RFC3550] and reprinted
in Figure 1 for convenience. This payload format uses the fields of
the header in a manner consistent with that specification.
The RTP payload (and the settings for some RTP header bits) for
aggregation packets and fragmentation units are specified in
Sections 4.7 and 4.8, respectively.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC |M| PT | sequence number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| synchronization source (SSRC) identifier |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| contributing source (CSRC) identifiers |
| .... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1 RTP header according to [RFC3550]
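For orientation, the following Python sketch parses the RTP header
of Figure 1 with the standard struct module and returns the payload
(here, the HEVC payload structure). It follows the header layout of
[RFC3550], including skipping a header extension when X is set and
removing padding when P is set; the function and field names are
this sketch's own, and validation is minimal.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_packet(packet: bytes) -> Tuple[RtpHeader, bytes]:
    """Split an RTP packet into its header fields and its payload."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-octet fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F  # CSRC count
    offset = 12 + 4 * cc
    if len(packet) < offset:
        raise ValueError("truncated CSRC list")
    csrcs = struct.unpack(f"!{cc}I", packet[12:offset])
    extension = bool(b0 & 0x10)
    if extension:
        # Extension header: 16-bit profile field, 16-bit length in 32-bit words.
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        _, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    padding = bool(b0 & 0x20)
    if padding:
        # The last octet counts the padding octets, including itself.
        end -= packet[-1]
    if end < offset:
        raise ValueError("inconsistent padding or extension length")
    header = RtpHeader(
        version=b0 >> 6,
        padding=padding,
        extension=extension,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )
    return header, packet[offset:end]
```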
The RTP header information to be set according to this RTP payload
format is set as follows:
Marker bit (M): 1 bit
Set for the last packet of the access unit indicated by the RTP
timestamp, in line with the normal use of the M bit in video
formats, to allow efficient playout buffer handling. For
aggregation packets (STAP), the marker bit in the RTP header MUST
be set to the value that the marker bit of the last NAL unit of
the aggregation packet would have been if it were transported in
its own RTP packet. Decoders MAY use this bit as an early
indication of the last packet of an access unit but MUST NOT rely
on this property.
Informative note: Only one M bit is associated with an
aggregation packet carrying multiple NAL units. Thus, if a
gateway has re-packetized an aggregation packet into several
packets, it cannot reliably set the M bit of those packets.
Payload type (PT): 7 bits
The assignment of an RTP payload type for this new packet format
is outside the scope of this document and will not be specified
here. The assignment of a payload type has to be performed
either through the profile used or in a dynamic way.
Sequence number (SN): 16 bits
Set and used in accordance with RFC 3550. In some packetization
modes (list TBD), the sequence number is used to determine
decoding order for the NALUs.
Timestamp: 32 bits
The RTP timestamp is set to the sampling timestamp of the
content. A 90 kHz clock rate MUST be used.
If the NAL unit has no timing properties of its own (e.g.,
parameter set and SEI NAL units), the RTP timestamp is set to the
RTP timestamp of the coded picture of the access unit in which
the NAL unit is included, according to Section 7.4.1.2.3 of
[HEVC].
Receivers SHOULD ignore any picture timing SEI messages included
in access units that have only one display timestamp. Instead,
receivers SHOULD use the RTP timestamp for synchronizing the
display process. If one access unit has more than one display
timestamp carried in a picture timing SEI message, then the
information in the SEI message SHOULD be treated as relative to
the RTP timestamp, with the earliest event occurring at the time
given by the RTP timestamp and subsequent events later, as given
by the difference in picture time values carried in the picture
timing SEI message. Let tSEI1, tSEI2, ..., tSEIn be the display
timestamps carried in the SEI message of an access unit, where
tSEI1 is the earliest of all such timestamps. Let tmadjst() be a
function that adjusts the SEI message time scale to a 90-kHz
time scale. Let TS be the RTP timestamp. Then, the display time
for the event associated with tSEI1 is TS. The display time for
the event with tSEIx, where x is [2..n], is TS + tmadjst (tSEIx -
tSEI1).
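The display-time rule above can be written as a small function. In
the Python sketch below, the SEI display timestamps are assumed to be
integer ticks of a clock running at sei_clock_hz; tmadjst() is
modeled, as an assumption of this sketch, as a rescaling to the
90 kHz RTP clock with rounding, and results are reduced modulo 2^32
because RTP timestamps are 32 bits wide.

```python
from fractions import Fraction
from typing import List, Sequence

RTP_CLOCK_HZ = 90_000


def tmadjst(delta_ticks: int, sei_clock_hz: int) -> int:
    """Hypothetical time-scale adjustment: SEI clock ticks -> 90 kHz ticks."""
    return round(Fraction(delta_ticks * RTP_CLOCK_HZ, sei_clock_hz))


def display_times(ts: int, sei_times: Sequence[int],
                  sei_clock_hz: int) -> List[int]:
    """Display time TS + tmadjst(tSEIx - tSEI1) for each SEI timestamp."""
    t_sei1 = min(sei_times)  # tSEI1 is the earliest timestamp
    return [(ts + tmadjst(t - t_sei1, sei_clock_hz)) % (1 << 32)
            for t in sei_times]
```

For example, with TS equal to 1000 and three SEI timestamps one tick
apart on a 60 Hz clock, the display times are 1000, 2500, and 4000.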
4.2 NAL Unit Header Usage
The structure and semantics of the NAL unit header according to the
HEVC specification [HEVC] were introduced in Section 1.1.4. This
section specifies the extended semantics of the NAL unit header
fields.
4.3 Payload Structures
The NAL unit structure is central to HEVC [HEVC]: all HEVC coded
bits for representing a video signal are encapsulated in NAL units.
Therefore, each RTP packet payload is structured as a NAL unit, which
contains one NAL unit or a part of one NAL unit specified in HEVC,
or aggregates one or more NAL units specified in HEVC.
4.4 Transmission Modes
This memo enables transmission of an HEVC bitstream over a single
RTP session or multiple RTP sessions.
4.5 Packetization Modes
This memo specifies the following packetization modes:
o Non-interleaved mode
o Interleaved mode
In the non-interleaved mode, NAL units are transmitted in NAL unit
decoding order. The interleaved mode allows transmission of NAL
units outside of NAL unit decoding order.
The packetization mode in use MAY be signaled by the value of the
OPTIONAL packetization-mode media type parameter. The packetization
mode in use governs which NAL unit types are allowed in RTP
payloads. Table 2 summarizes the allowed packet payload types for
each packetization mode. Packetization modes are explained in more
detail in section 5.
Table 2. Summary of allowed NAL unit types for each packetization
mode (yes = allowed, no = disallowed, ig = ignore)
Payload   Packet     Non-Interleaved   Interleaved
Type      Type       Mode              Mode
-------------------------------------------------
0         reserved   ig                ig
1-47      NAL unit   yes               no
48        STAP-A     yes               no
49        STAP-B     no                yes
50        FU-A       yes               yes
51        FU-B       no                yes
52-63     reserved   ig                ig
Some NAL unit or payload type values (indicated as reserved in
Table 2) are reserved for future extensions. NAL units of those
types SHOULD NOT be sent by a sender (directly as packet payloads,
as aggregation units in aggregation packets, or as fragmented units
in FU packets), MUST be ignored by a receiver, and SHOULD be
forwarded unchanged by a MANE.
For example, the payload types 1-47, with the associated packet type
"NAL unit", are allowed in "Non-Interleaved Mode", but disallowed in
"Interleaved Mode". However, NAL units of NAL unit types 1-47 can
be used in "Interleaved Mode" as aggregation units in STAP-B packets
as well as fragmented units in FU-A and FU-B packets. Similarly,
NAL units of NAL unit types 1-47 can also be used in the "Non-
Interleaved Mode" as aggregation units in STAP-A packets or
fragmented units in FU-A packets, in addition to being directly used
as packet payloads.
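Table 2 lends itself to a small lookup function. The following Python sketch is illustrative only: the names Mode and packet_disposition are hypothetical, and the enum values merely mirror the packetization-mode values given in sections 5.2 and 5.3. It returns the Table 2 entry ("yes", "no", or "ig") for the 6-bit Type value of an RTP payload header.

```python
from enum import Enum


class Mode(Enum):
    NON_INTERLEAVED = 1  # packetization-mode = 1 (section 5.2)
    INTERLEAVED = 2      # packetization-mode = 2 (section 5.3)


# Table 2 rows for payload types 48..51: (non-interleaved, interleaved)
_AGGREGATION_AND_FU = {
    48: ("yes", "no"),   # STAP-A
    49: ("no", "yes"),   # STAP-B
    50: ("yes", "yes"),  # FU-A
    51: ("no", "yes"),   # FU-B
}


def packet_disposition(payload_type: int, mode: Mode) -> str:
    """Return 'yes', 'no', or 'ig' for a payload type, following Table 2."""
    if not 0 <= payload_type <= 63:
        raise ValueError("the Type field is 6 bits wide")
    if payload_type == 0 or payload_type >= 52:
        return "ig"  # reserved: ignored by receivers, forwarded by MANEs
    col = 0 if mode is Mode.NON_INTERLEAVED else 1
    if payload_type <= 47:
        return ("yes", "no")[col]  # single NAL unit packet
    return _AGGREGATION_AND_FU[payload_type][col]
```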
4.6 Decoding Order
In the interleaved packetization mode, the transmission order of NAL
units is allowed to differ from the decoding order of the NAL units.
Decoding order number (DON) is a field in the payload structure or a
derived variable that indicates the NAL unit decoding order.
Rationale and examples of use cases for transmission out of decoding
order and for the use of DON are given in section 13.
The coupling of transmission and decoding order is controlled by the
OPTIONAL sprop-interleaving-depth media type parameter as follows.
When the value of sprop-interleaving-depth is equal to 0 (explicitly
or by default), the transmission order of NAL units MUST conform to
the NAL unit decoding order. When the value of
sprop-interleaving-depth is greater than 0,
o the order of NAL units generated by de-packetizing STAP-Bs and
FUs in two consecutive packets is NOT REQUIRED to be the NAL unit
decoding order.
The RTP payload structures for STAP-A and FU-A do not include DON.
STAP-B and FU-B structures include DON.
Informative note: When an FU-A occurs in interleaved mode, it
always follows an FU-B, which sets its DON.
Informative note: If a transmitter wants to encapsulate a single
NAL unit per packet and transmit packets out of their decoding
order, the STAP-B packet type can be used.
In the non-interleaved packetization mode, the transmission order of
NAL units in single NAL unit packets, STAP-As, and FU-As MUST be the
same as their NAL unit decoding order. The NAL units within an STAP
MUST appear in the NAL unit decoding order. Thus, the decoding
order is first provided through the implicit order within a STAP,
and second provided through the RTP sequence number for the order
between STAPs, FUs, and single NAL unit packets.
Signaling of the value of DON for NAL units carried in STAP-Bs and
in a series of fragmentation units starting with an FU-B is specified
in sections 4.7.1 and 4.8, respectively. The DON value of the first
NAL unit in transmission order MAY be set to any value. Values of
DON are in the range of 0 to 65535, inclusive. After reaching the
maximum value, the value of DON wraps around to 0.
The decoding order of two NAL units contained in any STAP-B, or a
series of fragmentation units starting with an FU-B is determined as
follows. Let DON(i) be the decoding order number of the NAL unit
having index i in the transmission order. Function don_diff(m,n) is
specified as follows:
If DON(m) == DON(n), don_diff(m,n) = 0
If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
   don_diff(m,n) = DON(n) - DON(m)
If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
   don_diff(m,n) = 65536 - DON(m) + DON(n)
If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
   don_diff(m,n) = - (DON(m) + 65536 - DON(n))
If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
   don_diff(m,n) = - (DON(m) - DON(n))
A positive value of don_diff(m,n) indicates that the NAL unit having
transmission order index n follows, in decoding order, the NAL unit
having transmission order index m. When don_diff(m,n) is equal to
0, then the NAL unit decoding order of the two NAL units can be in
either order. A negative value of don_diff(m,n) indicates that the
NAL unit having transmission order index n precedes, in decoding
order, the NAL unit having transmission order index m.
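A direct Python transcription of don_diff() is given below as a sketch. Unlike the text, which indexes NAL units by transmission order, this hypothetical function takes the two 16-bit DON values themselves.

```python
def don_diff(don_m: int, don_n: int) -> int:
    """Signed decoding-order distance from DON(m) to DON(n), per section 4.6.

    Both arguments are 16-bit DON values (0..65535). A positive result means
    the NAL unit with DON(n) follows the one with DON(m) in decoding order.
    """
    if don_m == don_n:
        return 0
    if don_m < don_n and don_n - don_m < 32768:
        return don_n - don_m
    if don_m > don_n and don_m - don_n >= 32768:
        return 65536 - don_m + don_n
    if don_m < don_n and don_n - don_m >= 32768:
        return -(don_m + 65536 - don_n)
    # Remaining case: don_m > don_n and don_m - don_n < 32768
    return -(don_m - don_n)
```

Because DON wraps at 65536, don_diff(65535, 0) is 1, i.e., DON 0 follows DON 65535 in decoding order.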
Values of the DON field MUST be such that the decoding order
determined by the values of DON, as specified above, conforms to the
NAL unit decoding order. If the order of two NAL units in NAL unit
decoding order is switched and the new order does not conform to the
NAL unit decoding order, the NAL units MUST NOT have the same value
of DON. If the order of two consecutive NAL units in the NAL unit
stream is switched and the new order still conforms to the NAL unit
decoding order, the NAL units MAY have the same value of DON.
Consequently, NAL units having the same value of DON can be decoded
in any order, and two NAL units having a different value of DON
should be passed to the decoder in the order specified above. When
two consecutive NAL units in the NAL unit decoding order have a
different value of DON, the value of DON for the second NAL unit in
decoding order SHOULD be the value of DON for the first, incremented
by one.
An example of the de-packetization process to recover the NAL unit
decoding order is given in section 6.
Informative note: Receivers should not expect that the absolute
difference of values of DON for two consecutive NAL units in the
NAL unit decoding order will be equal to one, even in error-free
transmission. An increment by one is not required, as at the
time of associating values of DON to NAL units, it may not be
known whether all NAL units are delivered to the receiver. For
example, a gateway may not forward coded slice NAL units of non-
reference pictures or SEI NAL units when there is a shortage of
bit rate in the network to which the packets are forwarded. In
another example, a live broadcast is interrupted by pre-encoded
content, such as commercials, from time to time. The first intra
picture of a pre-encoded clip is transmitted in advance to ensure
that it is readily available in the receiver. When transmitting
the first intra picture, the originator does not know exactly how
many NAL units will be encoded before the first intra picture of
the pre-encoded clip follows in decoding order. Thus, the values
of DON for the NAL units of the first intra picture of the pre-
encoded clip have to be estimated when they are transmitted, and
gaps in values of DON may occur.
4.7 Aggregation Packets
Aggregation packets are the NAL unit aggregation scheme of this
payload specification. The scheme is introduced to reduce the
packetization overhead for small NAL units, such as most non-VCL
NAL units, which are often only a few octets long.
The single-time aggregation packet (STAP) aggregates NAL units with
identical NALU-time. Two types of STAPs are defined, one without
DON (STAP-A) and another including DON (STAP-B).
Each NAL unit to be carried in an aggregation packet is encapsulated
in an aggregation unit. The structure of the RTP payload format for
aggregation packets is presented in Figure 2.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   Type    |     R     | TID |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                                                               |
|                 one or more aggregation units                 |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2 RTP payload format for aggregation packets
STAPs have the following packetization rules: The Type field of the
NAL unit header MUST be set to the appropriate value for STAP, as
indicated in Table 2. The F bit MUST be cleared if all F bits of
the aggregated NAL units are zero; otherwise, it MUST be set. The
value of R MUST be the lowest value of R among all aggregated NAL
units.
The marker bit in the RTP header is set to the value that the marker
bit of the last NAL unit of the aggregation packet would have if it
were transported in its own RTP packet.
The payload of an aggregation packet consists of one or more
aggregation units as described below in section 4.7.1. An
aggregation packet can carry as many aggregation units as necessary;
however, the total amount of data in an aggregation packet MUST fit
into an IP packet, and the size SHOULD be chosen so that the
resulting IP packet is smaller than the MTU size, so as to avoid IP
layer fragmentation. An aggregation packet MUST NOT contain
fragmentation units specified in section 4.8. Aggregation packets
MUST NOT be nested; i.e., an aggregation packet MUST NOT contain
another aggregation packet.
4.7.1 Single Time Aggregation Packet (STAP)
The payload of an STAP-A consists of at least one single-time
aggregation unit, with a format as presented in Figure 3. The
payload of an STAP-B consists of a 16-bit unsigned decoding order
number (DON) (in network byte order) followed by at least one
single-time aggregation unit, as presented in Figure 4.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                :                                               |
+-+-+-+-+-+-+-+-+                                               |
|                                                               |
|                 single-time aggregation units                 |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3 Payload format for STAP-A
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                :  decoding order number (DON)  |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                                                               |
|                 single-time aggregation units                 |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4 Payload format for STAP-B
The DON field specifies the value of DON for the first NAL unit in
an STAP-B in transmission order. For each successive NAL unit in
appearance order in an STAP-B, the value of DON is equal to (the
value of DON of the previous NAL unit in the STAP-B + 1) % 65536, in
which '%' stands for the modulo operation.
A single-time aggregation unit consists of 16-bit unsigned size
information (in network byte order) that indicates the size of the
following NAL unit in bytes (excluding these two octets, but
including the NAL unit header of the NAL unit), followed by the
NAL unit itself, including its NAL unit header. A single-time
aggregation unit is byte aligned within the RTP payload, but it may
not be aligned on a 32-bit word boundary. Figure 5 presents the
structure of the single-time aggregation unit.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                :         NAL unit size         |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                                                               |
|                           NAL unit                            |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5 Structure for single-time aggregation unit (STAU)
Figure 6 presents an example of an RTP packet that contains an STAP-
A. The STAP-A contains two single-time aggregation units, labeled
as 1 and 2 in the figure.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         STAP NAL HDR          |          NALU 1 Size          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          NALU 1 HDR           |          NALU 1 Data          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             . . .                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     . . .     |          NALU 2 Size          |  NALU 2 HDR   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  NALU 2 HDR   |                  NALU 2 Data                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             . . .                             |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6 An example of an RTP packet including an STAP-A containing
two single-time aggregation units
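The STAP-A layout of Figures 2, 5, and 6, together with the STAP header rules given above, can be illustrated with a short Python sketch. It assumes the two-octet NAL unit header layout of Figure 2 (F: 1 bit, Type: 6 bits, R: 6 bits, TID: 3 bits, in network byte order). The text does not say how the TID of the STAP header is chosen; taking the lowest TID of the aggregated NAL units is an assumption of this sketch, and all function names are hypothetical.

```python
import struct

STAP_A = 48  # Table 2


def parse_nal_header(hdr: bytes) -> tuple[int, int, int, int]:
    """Split a two-octet NAL unit header into (F, Type, R, TID).

    Assumes the Figure 2 layout: F (1 bit), Type (6), R (6), TID (3).
    """
    (word,) = struct.unpack("!H", hdr[:2])
    return word >> 15, (word >> 9) & 0x3F, (word >> 3) & 0x3F, word & 0x07


def make_nal_header(f: int, nal_type: int, r: int, tid: int) -> bytes:
    """Pack (F, Type, R, TID) into a two-octet header in network byte order."""
    return struct.pack("!H", (f << 15) | (nal_type << 9) | (r << 3) | tid)


def build_stap_a(nal_units: list[bytes]) -> bytes:
    """Aggregate NAL units, given in decoding order, into one STAP-A payload."""
    if not nal_units:
        raise ValueError("an STAP carries at least one aggregation unit")
    fields = [parse_nal_header(nalu) for nalu in nal_units]
    f = 1 if any(fld[0] for fld in fields) else 0  # cleared only if all F are 0
    r = min(fld[2] for fld in fields)               # lowest R of the units
    tid = min(fld[3] for fld in fields)             # assumption: not specified
    payload = bytearray(make_nal_header(f, STAP_A, r, tid))
    for nalu in nal_units:
        if not 2 <= len(nalu) <= 0xFFFF:
            raise ValueError("NAL unit size must fit the 16-bit size field")
        payload += struct.pack("!H", len(nalu))  # size counts the NAL header too
        payload += nalu                          # NAL unit including its header
    return bytes(payload)
```

The MTU check recommended in section 4.7 is left to the caller in this sketch.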
Figure 7 presents an example of an RTP packet that contains an STAP-
B. The STAP-B contains two single-time aggregation units, labeled as
1 and 2 in the figure.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        STAP-B NAL HDR         |              DON              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          NALU 1 Size          |          NALU 1 HDR           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          NALU 1 Data                          |
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |          NALU 2 Size          |  NALU 2 HDR   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  NALU 2 HDR   |                  NALU 2 Data                  |
+-+-+-+-+-+-+-+-+                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 7 An example of an RTP packet including an STAP-B containing
two single-time aggregation units
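The DON rule for STAP-Bs (the first NAL unit takes the DON field, each following one the previous value plus one, modulo 65536) can be shown with a minimal Python parser. This sketch follows the layout of Figure 7, i.e., a two-octet STAP-B NAL unit header followed by the 16-bit DON and then the single-time aggregation units. It assumes RTP padding has already been removed; the function name is hypothetical.

```python
import struct


def parse_stap_b(payload: bytes) -> list[tuple[int, bytes]]:
    """Split an STAP-B payload (RTP padding removed) into (DON, NAL unit) pairs."""
    if len(payload) < 4:
        raise ValueError("truncated STAP-B: header and DON need 4 octets")
    (don,) = struct.unpack_from("!H", payload, 2)  # DON of the first NAL unit
    pos, units = 4, []
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        # A NAL unit holds at least its two-octet header (Figure 2 assumption)
        if size < 2 or pos + size > len(payload):
            raise ValueError("invalid NAL unit size")
        units.append((don, payload[pos:pos + size]))
        pos += size
        don = (don + 1) % 65536  # successive units: previous DON + 1, mod 65536
    if not units:
        raise ValueError("an STAP-B carries at least one aggregation unit")
    return units
```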
4.8 Fragmentation Units (FUs)
This payload type allows fragmenting a NAL unit into several RTP
packets. Doing so on the application layer instead of relying on
lower-layer fragmentation (e.g., by IP) enables the following use
cases:
o The payload format is capable of transporting NAL units larger
than 64 kbytes over an IPv4 network. Such NAL units may be present
in pre-recorded video, particularly in High Definition formats:
the number of slices per picture is limited, which limits the
number of NAL units per picture and may therefore result in large
NAL units.
o The fragmentation mechanism allows fragmenting a single NAL unit
and applying generic forward error correction.
Note: Please see section 1.1.2 for the relationship between
fragmentation and dependent slices.
Fragmentation is defined only for a single NAL unit and not for any
aggregation packets. A fragment of a NAL unit consists of an
integer number of consecutive octets of that NAL unit. Each octet
of the NAL unit MUST be part of exactly one fragment of that NAL
unit. Fragments of the same NAL unit MUST be sent in consecutive
order with ascending RTP sequence numbers (with no other RTP packets
within the same RTP packet stream being sent between the first and
last fragment). Similarly, a NAL unit MUST be reassembled in RTP
sequence number order.
When a NAL unit is fragmented and conveyed within fragmentation
units (FUs), it is referred to as a fragmented NAL unit. STAPs MUST
NOT be fragmented. FUs MUST NOT be nested; i.e., an FU MUST NOT
contain another FU.
The RTP timestamp of an RTP packet carrying an FU is set to the
NALU-time of the fragmented NAL unit.
Figure 8 presents the RTP payload format for FU-A. An FU-A consists
of a fragmentation unit NAL unit header, a fragmentation unit header
of one octet, and a fragmentation unit payload.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          FU NAL HDR           |   FU header   |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                                                               |
|                          FU payload                           |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 8 RTP payload format for FU-A
Figure 9 presents the RTP payload format for FU-Bs. An FU-B
consists of a fragmentation unit NAL unit header, a fragmentation
unit header of one octet, a decoding order number (DON) (in network
byte order), and a fragmentation unit payload. In other words, the
structure of FU-B is the same as the structure of FU-A, except for
the additional DON field.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      FU NAL unit header       |   FU header   |      DON      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      DON      |                                               |
+-+-+-+-+-+-+-+-+                                               |
|                          FU payload                           |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 9 RTP payload format for FU-B
NAL unit type FU-B MUST be used in the interleaved packetization
mode for the first fragmentation unit of a fragmented NAL unit. NAL
unit type FU-B MUST NOT be used in any other case. In other words,
in the interleaved packetization mode, each NALU that is fragmented
has an FU-B as the first fragment, followed by one or more FU-A
fragments.
The FU NAL unit header has the same format as any NAL unit header,
as described in section 1.1.4 above. A value equal to 50 in the
Type field of the FU NAL unit header identifies an FU-A packet, and a
value of 51 identifies an FU-B packet. The use of the F bit is
described in section 5. The value of the N field MUST be set
according to the value of the N field in the fragmented NAL unit.
The FU header has the following format:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|S|E|   Type    |
+---------------+
S: 1 bit
When set to one, the Start bit indicates the start of a
fragmented NAL unit. When the following FU payload is not the
start of a fragmented NAL unit payload, the Start bit is set to
zero.
E: 1 bit
When set to one, the End bit indicates the end of a fragmented
NAL unit, i.e., the last byte of the payload is also the last
byte of the fragmented NAL unit. When the following FU payload
is not the last fragment of a fragmented NAL unit, the End bit is
set to zero.
Type: 6 bits
The NAL unit type of the fragmented NAL unit, as defined in Table
7-1 of [HEVC].
The value of DON in FU-Bs is selected as described in section 4.6.
Informative note: The DON field in FU-Bs allows gateways to
fragment NAL units into FU-Bs without first reordering the incoming
NAL units into the NAL unit decoding order.
A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.,
the Start bit and End bit MUST NOT both be set to one in the same FU
header.
The FU payload consists of fragments of the payload of the
fragmented NAL unit so that if the fragmentation unit payloads of
consecutive FUs are sequentially concatenated, the payload of the
fragmented NAL unit can be reconstructed. The NAL unit header of
the fragmented NAL unit is not included as such in the FU payload;
rather, the information of the NAL unit header of the fragmented NAL
unit is conveyed in the F and N fields of the FU NAL unit header and
in the Type field of the FU header. An FU payload MAY have any number
of octets and MAY be empty.
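The fragmentation rules above (S bit on the first fragment, E bit on the last, never both in one FU, original Type carried in the FU header) can be sketched in Python for FU-A. The text refers to F and N fields; this sketch instead assumes the Figure 2 header layout and copies F, R, and TID of the fragmented NAL unit into the FU NAL unit header, which is an assumption rather than a rule stated here. The function name and the max_fragment parameter are hypothetical.

```python
import struct

FU_A = 50  # Table 2


def fragment_fu_a(nalu: bytes, max_fragment: int) -> list[bytes]:
    """Split one NAL unit into FU-A payloads, each carrying at most
    max_fragment octets of FU payload.

    Assumptions: Figure 2 header layout (F, Type, R, TID), and F, R and TID
    of the FU NAL unit header copied from the fragmented NAL unit.
    """
    if len(nalu) < 2 or max_fragment < 1:
        raise ValueError("need a two-octet header and a positive max_fragment")
    (word,) = struct.unpack("!H", nalu[:2])
    nal_type = (word >> 9) & 0x3F
    # Keep F, R and TID (mask 0x81FF); replace Type with 50 (FU-A)
    fu_nal_hdr = struct.pack("!H", (word & 0x81FF) | (FU_A << 9))
    body = nalu[2:]  # the original header is conveyed by the FU headers instead
    chunks = [body[i:i + max_fragment] for i in range(0, len(body), max_fragment)]
    if len(chunks) < 2:
        # S and E MUST NOT both be set, so a one-fragment FU is not allowed
        raise ValueError("NAL unit fits in one packet; do not fragment it")
    packets = []
    for i, chunk in enumerate(chunks):
        s_bit = 0x80 if i == 0 else 0
        e_bit = 0x40 if i == len(chunks) - 1 else 0
        packets.append(fu_nal_hdr + bytes([s_bit | e_bit | nal_type]) + chunk)
    return packets
```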
If a fragmentation unit is lost, the receiver SHOULD discard all
following fragmentation units in transmission order corresponding to
the same fragmented NAL unit, unless the decoder in the receiver is
known to be prepared to gracefully handle incomplete NAL units.
A receiver in an endpoint or in a MANE MAY aggregate the first n-1
fragments of a NAL unit to an (incomplete) NAL unit, even if
fragment n of that NAL unit is not received. In this case, the
forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
syntax violation.
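The loss handling described above can be sketched as a Python reassembly routine. It follows the two options in the text: on a gap in the RTP sequence numbers the remaining fragments are discarded, and, optionally, the fragments received before the gap are returned as an incomplete NAL unit with forbidden_zero_bit set to one. As in the other sketches, the Figure 2 header layout and the copying of F, R, and TID into the FU NAL unit header are assumptions, and the function name is hypothetical.

```python
import struct
from typing import Optional


def reassemble_fu(fragments: list[tuple[int, bytes]],
                  allow_partial: bool = False) -> Optional[bytes]:
    """Rebuild one NAL unit from (RTP sequence number, FU payload) pairs.

    Fragments are expected in RTP sequence number order. At the first gap
    the remaining fragments are discarded. If allow_partial is True, the
    fragments received before the gap are returned with forbidden_zero_bit
    (F) set to one; otherwise None is returned.
    """
    if not fragments:
        return None
    first_seq, first = fragments[0]
    if len(first) < 3 or not (first[2] & 0x80):  # no S bit: start is missing
        return None
    (word,) = struct.unpack("!H", first[:2])
    header = (word & 0x81FF) | ((first[2] & 0x3F) << 9)  # restore the Type
    body = bytearray()
    complete = False
    for i, (seq, payload) in enumerate(fragments):
        if seq != (first_seq + i) % 65536 or len(payload) < 3:
            break  # lost fragment: discard the rest of this NAL unit
        body += payload[3:]
        if payload[2] & 0x40:  # E bit: last fragment reached
            complete = True
            break
    if not complete:
        if not allow_partial:
            return None
        header |= 0x8000  # forbidden_zero_bit = 1 flags the syntax violation
    return struct.pack("!H", header) + bytes(body)
```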
5. Packetization Rules
The packetization modes are introduced in section 4.5. The
packetization rules common to more than one of the packetization
modes are specified in section 5.1. The packetization rules for the
non-interleaved mode are specified in section 5.2, and the
packetization rules for the interleaved mode are specified in
section 5.3.
5.1 Common Packetization Rules
All senders MUST enforce the following packetization rules
regardless of the packetization mode in use:
o VCL NAL units belonging to the same coded picture (and thus
sharing the same RTP timestamp value) SHOULD be sent in their
original decoding order to minimize the delay. Note that the
decoding order is the order of the NAL units in the bitstream.
o Parameter sets are handled in accordance with the rules and
recommendations given in section 7.4.
o MANEs MUST NOT duplicate any NAL unit except for sequence or
picture parameter set NAL units, as neither this memo nor the
HEVC specification provides means to identify duplicated NAL
units. Sequence and picture parameter set NAL units MAY be
duplicated to make their correct reception more likely, but any
such duplication MUST NOT affect the contents of any active
sequence or picture parameter set and the additional bandwidth
taken by the duplication MUST NOT increase network congestion
beyond what is "allowed" for the session (see section xxx for
details).
Senders using either the non-interleaved mode or the interleaved mode MUST
enforce the following packetization rule:
o MANEs MAY convert single NAL unit packets into one aggregation
packet, convert an aggregation packet into several single NAL
unit packets, or mix both concepts, in an RTP translator. The
RTP translator SHOULD take into account at least the following
parameters: path MTU size, unequal protection mechanisms (e.g.,
through packet-based FEC according to [RFC5109], especially for
sequence and picture parameter set NAL units and coded slice data
partition A NAL units), bearable latency of the system, and
buffering capabilities of the receiver.
Informative note: An RTP translator is required to handle RTCP
as per [RFC3550].
5.2 Non-Interleaved mode
This mode MUST be supported. This mode is in use when the value of
the OPTIONAL packetization-mode media type parameter is equal to 1.
It is primarily intended for low-delay applications. Only single
NAL unit packets, STAP-As, and FU-As MAY be used in this mode. The
transmission order of NAL units MUST comply with the NAL unit
decoding order.
5.3 Interleaved mode
This mode is in use when the value of the OPTIONAL packetization-
mode media type parameter is equal to 2. Some receivers MAY support
this mode. STAP-Bs, FU-As, and FU-Bs MAY be used. STAP-As and
single NAL unit packets MUST NOT be used. The transmission order of
packets and NAL units is constrained as specified in section 4.6.
6. De-Packetization Process
The de-packetization process is implementation dependent.
Therefore, the following description should be seen as an example of
a suitable implementation. Other schemes may be used as well as
long as the output for the same input is the same as the process
described below. Here, the same output means that both the number of
NAL units and their order are identical. Optimizations relative to
the described algorithms are likely possible. Section
6.1 presents the de-packetization process for the non-interleaved
packetization mode and section 6.2 presents the de-packetization
process for the interleaved packetization mode.
All normal RTP mechanisms related to buffer management apply. In
particular, duplicated or outdated RTP packets (as indicated by the
RTP sequence number and the RTP timestamp) are removed. To
determine the exact time for decoding, factors such as a possible
intentional delay to allow for proper inter-stream synchronization
must be taken into account.
6.1 Non-Interleaved Mode
The receiver includes a receiver buffer to compensate for
transmission delay jitter. The receiver stores incoming packets in
reception order into the receiver buffer. Packets are de-packetized
in RTP sequence number order. If a de-packetized packet is a single
NAL unit packet, the NAL unit contained in the packet is passed
directly to the decoder. If a de-packetized packet is an STAP-A,
the NAL units contained in the packet are passed to the decoder in
the order in which they are encapsulated in the packet. For all the
FU-A packets containing fragments of a single NAL unit, the de-
packetized fragments are concatenated in their sending order to
recover the NAL unit, which is then passed to the decoder.
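The non-interleaved de-packetization steps above can be sketched as a single Python loop. This is a minimal illustration under stated assumptions: no packet loss, RTP padding already removed, payloads already sorted by RTP sequence number, and the Figure 2 header layout (Type in bits 1 to 6 of the first octet). Packet types that Table 2 marks as disallowed or reserved for this mode are simply skipped. The function name is hypothetical.

```python
import struct

STAP_A, FU_A = 48, 50  # payload types from Table 2


def depacketize_non_interleaved(payloads: list[bytes]) -> list[bytes]:
    """Turn RTP payloads (in RTP sequence number order) into NAL units."""
    nal_units: list[bytes] = []
    fu_buf = bytearray()
    for p in payloads:
        if len(p) < 2:
            continue  # malformed packet; not handled further in this sketch
        ptype = (p[0] >> 1) & 0x3F
        if 1 <= ptype <= 47:        # single NAL unit packet: pass it on as-is
            nal_units.append(bytes(p))
        elif ptype == STAP_A:       # aggregation: emit NAL units in packet order
            pos = 2
            while pos + 2 <= len(p):
                (size,) = struct.unpack_from("!H", p, pos)
                nal_units.append(bytes(p[pos + 2:pos + 2 + size]))
                pos += 2 + size
        elif ptype == FU_A and len(p) >= 3:  # fragments: concatenate in order
            fu_hdr = p[2]
            if fu_hdr & 0x80:       # S bit: rebuild the original NAL header
                (word,) = struct.unpack("!H", p[:2])
                orig = (word & 0x81FF) | ((fu_hdr & 0x3F) << 9)
                fu_buf = bytearray(struct.pack("!H", orig))
            fu_buf += p[3:]
            if fu_hdr & 0x40:       # E bit: the NAL unit is complete
                nal_units.append(bytes(fu_buf))
                fu_buf = bytearray()
        # types 0, 49, 51 and 52-63 are skipped in this mode by this sketch
    return nal_units
```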
6.2 Interleaved Mode
The general concept behind these de-packetization rules is to
reorder NAL units from transmission order to the NAL unit decoding
order.
The receiver includes a receiver buffer, which is used to compensate
for transmission delay jitter and to reorder NAL units from
transmission order to the NAL unit decoding order. In this section,
the receiver operation is described under the assumption that there
is no transmission delay jitter. To distinguish it from a practical
receiver buffer that is also used for compensation of transmission
delay jitter, the receiver buffer is hereafter called the
de-interleaving buffer in this section. Receivers SHOULD also
prepare for transmission delay jitter; i.e., either reserve separate
buffers for transmission delay jitter buffering and de-interleaving
buffering or use a receiver buffer for both transmission delay
jitter and de-interleaving. Moreover, receivers SHOULD take
transmission delay jitter into account in the buffering operation,
e.g., by additional initial buffering before starting decoding and
playback.
This section is organized as follows: subsection 6.2.1 presents how
to calculate the size of the de-interleaving buffer. Subsection
6.2.2 specifies how the receiver reorders received NAL units into
the NAL unit decoding order.
6.2.1 Size of the De-interleaving Buffer
When the SDP Offer/Answer model or any other capability exchange
procedure is used in session setup, the properties of the received
stream SHOULD be such that the receiver capabilities are not
exceeded. In the SDP Offer/Answer model, the receiver can indicate
its capabilities to allocate a de-interleaving buffer with the
deint-buf-cap media type parameter. The sender indicates the
requirement for the de-interleaving buffer size with the sprop-
deint-buf-req media type parameter. It is therefore RECOMMENDED to
set the de-interleaving buffer size, in terms of number of bytes,
equal to or greater than the value of sprop-deint-buf-req media type
parameter. See section 7.1 for further information on deint-buf-cap
and sprop-deint-buf-req media type parameters and section 8.2.2 for
further information on their use in the SDP Offer/Answer model.
When a declarative session description is used in session setup, the
sprop-deint-buf-req media type parameter signals the requirement for
the de-interleaving buffer size. It is therefore RECOMMENDED to set
the de-interleaving buffer size, in terms of number of bytes, equal
to or greater than the value of sprop-deint-buf-req media type
parameter.
6.2.2 De-interleaving Process
There are two buffering states in the receiver: initial buffering
and buffering while playing. Initial buffering occurs when the RTP
session is initialized. After initial buffering, decoding and
playback are started, and the buffering-while-playing mode is used.
Regardless of the buffering state, the receiver stores incoming NAL
units, in reception order, in the de-interleaving buffer as follows.
NAL units of aggregation packets are stored in the de-interleaving
buffer individually. The value of DON is calculated and stored for
each NAL unit.
The receiver operation is described below with the help of the
following functions and constants:
o Function AbsDON is specified in section 7.1.
o Function don_diff is specified in section 4.6.
o Constant N is the value of the OPTIONAL sprop-interleaving-depth
media type parameter (see section 7.1) incremented by 1.
Initial buffering lasts until one of the following conditions is
fulfilled:
o There are N or more VCL NAL units in the de-interleaving buffer.
o If sprop-max-don-diff is present, don_diff(m,n) is greater than
the value of sprop-max-don-diff, in which n corresponds to the
NAL unit having the greatest value of AbsDON among the received
NAL units and m corresponds to the NAL unit having the smallest
value of AbsDON among the received NAL units.
o Initial buffering has lasted for a duration equal to or greater
than the value of the OPTIONAL sprop-init-buf-time media type
parameter.
The NAL units to be removed from the de-interleaving buffer are
determined as follows:
o If the de-interleaving buffer contains at least N VCL NAL units,
NAL units are removed from the de-interleaving buffer and passed
to the decoder in the order specified below until the buffer
contains N-1 VCL NAL units.
o If sprop-max-don-diff is present, all NAL units m for which
don_diff(m,n) is greater than sprop-max-don-diff are removed from
the de-interleaving buffer and passed to the decoder in the order
specified below. Herein, n corresponds to the NAL unit having
the greatest value of AbsDON among the NAL units in the de-
interleaving buffer.
The order in which NAL units are passed to the decoder is specified
as follows:
o Let PDON be a variable that is initialized to 0 at the beginning
of the RTP session.
o For each NAL unit associated with a value of DON, a DON distance
is calculated as follows. If the value of DON of the NAL unit is
larger than the value of PDON, the DON distance is equal to DON -
PDON. Otherwise, the DON distance is equal to 65535 - PDON + DON
+ 1.
o NAL units are delivered to the decoder in ascending order of DON
distance. If several NAL units share the same value of DON
distance, they can be passed to the decoder in any order.
o When a desired number of NAL units have been passed to the
decoder, the value of PDON is set to the value of DON for the
last NAL unit passed to the decoder.
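The removal and ordering rules of this section can be sketched as a small Python class. Only the "N or more VCL NAL units" trigger is modelled; the sprop-max-don-diff trigger and the initial-buffering timer are left out. The constructor argument n is sprop-interleaving-depth + 1, as defined above. DON distances are computed literally as specified, so a NAL unit whose DON equals PDON gets the largest distance, 65536. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DeinterleavingBuffer:
    """Sketch of the section 6.2.2 removal and ordering rules."""
    n: int                                        # sprop-interleaving-depth + 1
    pdon: int = 0                                 # PDON starts at 0 per session
    units: list = field(default_factory=list)     # (don, is_vcl, nal) tuples

    @staticmethod
    def don_distance(don: int, pdon: int) -> int:
        return don - pdon if don > pdon else 65535 - pdon + don + 1

    def push(self, don: int, is_vcl: bool, nal: bytes) -> list[bytes]:
        """Store one NAL unit; return the NAL units released to the decoder."""
        self.units.append((don, is_vcl, nal))
        out: list[bytes] = []
        if sum(u[1] for u in self.units) < self.n:
            return out
        # Ascending DON distance relative to PDON; ties keep arrival order
        self.units.sort(key=lambda u: self.don_distance(u[0], self.pdon))
        last_don = self.pdon
        while sum(u[1] for u in self.units) >= self.n:
            last_don, _, nal_unit = self.units.pop(0)
            out.append(nal_unit)
        self.pdon = last_don  # DON of the last NAL unit passed to the decoder
        return out
```

For example, with n = 2 (sprop-interleaving-depth equal to 1), pushing DON 2 and then DON 1 releases only the NAL unit with DON 1 and sets PDON to 1.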
6.3 Additional De-Packetization Guidelines
The following additional de-packetization rules may be used to
implement an operational HEVC de-packetizer:
o Intelligent RTP receivers (e.g., in MANEs) may identify lost FUs.
If a lost FU is detected, a gateway MAY decide not to send the
following FUs of the same fragmented NAL unit, as their
information is meaningless for HEVC decoders. In this way, a MANE
can reduce network load by discarding useless packets without
parsing a complex bitstream.
7. Payload Format Parameters
This section specifies the parameters that MAY be used to select
optional features of the payload format and certain features of the
bitstream. The parameters are specified here as part of the media
type registration for the HEVC codec. A mapping of the parameters
into the Session Description Protocol (SDP) [RFC4566] is also
provided for applications that use SDP. Equivalent parameters could
be defined elsewhere for use with control protocols that do not use
SDP.
Some parameters provide a receiver with the properties of the stream
that will be sent. The names of all these parameters start with
"sprop" for stream properties. Some of these "sprop" parameters are
limited by other payload or codec configuration parameters. For
example, the sprop-parameter-sets parameter is constrained by the
profile-tier-level-id parameter. The media sender selects all
"sprop" parameters rather than the receiver. This uncommon
characteristic of the "sprop" parameters may be incompatible with
some signaling protocol concepts, in which case the use of these
parameters SHOULD be avoided.
7.1 Media Type Registration
The media subtype for the HEVC codec is allocated from the IETF
tree.
The receiver MUST ignore any unspecified parameter.
Media Type name: video
Media subtype name: H265
Required parameters: none
OPTIONAL parameters:
In the following definitions of parameters, "the stream" or "the
NAL unit stream" refers to all NAL units conveyed in the current
RTP session when SST is used. When MST is used, it refers to all
NAL units conveyed in the current RTP session together with all NAL
units conveyed in the other RTP sessions that the current RTP
session depends on.
profile-tier-level-id:
A base16 [RFC4648] (hexadecimal) representation of the following four
bytes in the sequence parameter set or video parameter set NAL
units, as specified in [HEVC]: 1) a byte herein referred to as
profile-tier-iop, composed of the values of the 2-bit
general_profile_space, the 1-bit general_tier_flag and the 5-bit
general_profile_idc, 2) the 8 MSB of general_reserved_zero_16bits,
3) the 8 LSB of general_reserved_zero_16bits and 4)
general_level_idc. Note that general_reserved_zero_16bits is
required to be equal to 0 in [HEVC], but other values for it may be
specified in the future by ITU-T or ISO/IEC.
The profile-tier-level-id parameter indicates the default profile
(i.e., the subset of coding tools that may have been used to
generate the stream or that the receiver supports) and the default
level of the stream or of the receiver.
If the profile-tier-level-id parameter is used to indicate
properties of a NAL unit stream, it indicates that, to decode the
stream, the minimum subset of coding tools a decoder has to
support is the default profile, and the lowest level the decoder
has to support is the default level.
If the profile-tier-level-id parameter is used for capability
exchange or session setup, it indicates the subset of coding
tools, which is equal to the default profile, that the codec
supports for both receiving and sending. If max-recv-level is not
present, the default level from profile-tier-level-id indicates
the highest level the codec wishes to support. If max-recv-level
is present, it indicates the highest level the codec supports for
receiving. For either receiving or sending, all levels that are
lower than the highest level supported MUST also be supported.
If no profile-tier-level-id is present, the Main profile, without
additional constraints at Level 1, MUST be inferred.
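The four-byte layout of profile-tier-level-id can be sketched as follows. This is an illustrative Python helper with hypothetical names, profile_tier_level_id and parse_profile_tier_level_id. It takes the syntax element values as plain integers and emits the upper-case base16 alphabet of [RFC4648].

```python
def profile_tier_level_id(profile_space: int, tier_flag: int, profile_idc: int,
                          reserved_zero_16bits: int, level_idc: int) -> str:
    """Build the base16 profile-tier-level-id string (hypothetical helper)."""
    if not (0 <= profile_space < 4 and tier_flag in (0, 1) and 0 <= profile_idc < 32):
        raise ValueError("profile-tier-iop field out of range")
    if not (0 <= reserved_zero_16bits < 1 << 16 and 0 <= level_idc < 256):
        raise ValueError("reserved or level field out of range")
    # Byte 1 (profile-tier-iop): general_profile_space (2 bits),
    # general_tier_flag (1 bit), general_profile_idc (5 bits).
    profile_tier_iop = (profile_space << 6) | (tier_flag << 5) | profile_idc
    data = bytes([profile_tier_iop,
                  reserved_zero_16bits >> 8,    # byte 2: 8 MSB
                  reserved_zero_16bits & 0xFF,  # byte 3: 8 LSB
                  level_idc])                   # byte 4: general_level_idc
    return data.hex().upper()  # RFC 4648 base16 uses upper-case digits


def parse_profile_tier_level_id(value: str) -> dict[str, int]:
    """Split a profile-tier-level-id value back into its fields."""
    data = bytes.fromhex(value)
    if len(data) != 4:
        raise ValueError("profile-tier-level-id must encode exactly four bytes")
    return {
        "profile_space": data[0] >> 6,
        "tier_flag": (data[0] >> 5) & 1,
        "profile_idc": data[0] & 0x1F,
        "reserved_zero_16bits": (data[1] << 8) | data[2],
        "level_idc": data[3],
    }
```

For instance, suppose that general_profile_idc is equal to 1 for the Main profile and that general_level_idc is equal to 93 (Level 3.1, i.e., 30 times 3.1). Under those assumptions, a Main-tier stream would be signaled as profile-tier-level-id=0100005D.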
profile-compatibility-indicator:
A base16 [RFC4648] representation of the four bytes carrying the 32
general_profile_compatibility_flag values in the sequence parameter
set or video parameter set NAL units. A decoder conforming to a
certain profile may be able to decode bitstreams conforming to
other profiles. The profile-compatibility-indicator provides
exact information on the ability of a decoder conforming to a
certain profile to decode bitstreams conforming to another
profile. More concretely, if the
general_profile_compatibility_flag corresponding to the profile to
which a decoder conforms is set, the decoder is able to decode the
bitstream irrespective of the profile to which the bitstream
conforms (provided that the decoder supports the highest level of
the bitstream).
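The compatibility test described above can be sketched as follows. The bit order is an assumption of this sketch: general_profile_compatibility_flag[0] is taken to be the most significant bit of the first byte, matching the order in which the flags appear in the bitstream. The function names are hypothetical, and the level proviso is modeled as a plain comparison of general_level_idc values.

```python
def compatibility_flag(indicator: str, j: int) -> bool:
    """Return general_profile_compatibility_flag[j] from the base16 value.

    Assumption: flag[0] is the most significant bit of the first byte.
    """
    data = bytes.fromhex(indicator)
    if len(data) != 4 or not 0 <= j < 32:
        raise ValueError("expected four bytes and 0 <= j < 32")
    flags = int.from_bytes(data, "big")
    return bool((flags >> (31 - j)) & 1)


def can_decode(indicator: str, decoder_profile_idc: int,
               stream_level_idc: int, decoder_level_idc: int) -> bool:
    """Profile check via the compatibility flag, plus the level proviso."""
    return (compatibility_flag(indicator, decoder_profile_idc)
            and stream_level_idc <= decoder_level_idc)
```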
max-recv-level:
This parameter MAY be used to indicate the highest level a
receiver supports when the highest level is higher than the
default level (the level indicated by profile-tier-level-id). The
value of max-recv-level is a base16 (hexadecimal) representation
of the syntax element general_level_idc in the sequence parameter
set or video parameter set NAL unit specified in [HEVC]. The
highest level the receiver supports is equal to the
general_level_idc value carried in max-recv-level divided by 30.
max-recv-level MUST NOT be present if the highest level the
receiver supports is not higher than the default level.
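Decoding a max-recv-level value into a level number is a division by 30. The hypothetical helpers below illustrate this. The sketch assumes that levels carry at most one decimal digit, so that general_level_idc is a multiple of 3.

```python
def highest_level(max_recv_level: str) -> str:
    """Return the level number signaled by a base16 max-recv-level value."""
    level_idc = int(max_recv_level, 16)
    major, remainder = divmod(level_idc, 30)  # integer part of level_idc / 30
    minor, leftover = divmod(remainder, 3)    # one decimal digit = 3 units
    if leftover:
        raise ValueError("general_level_idc is not 30 times a one-decimal level")
    return f"{major}.{minor}" if minor else str(major)


def max_recv_level_allowed(default_level_idc: int, max_recv_level: str) -> bool:
    """max-recv-level may only be present if it exceeds the default level."""
    return int(max_recv_level, 16) > default_level_idc
```

For example, max-recv-level=5D (93) corresponds to Level 3.1. A receiver whose default level is already 3.1 would omit max-recv-level unless its highest supported level were above that.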
sprop-parameter-sets:
This parameter MAY be used to convey any video parameter set,
sequence parameter set and picture parameter set NAL units
(herein referred to as the initial parameter set NAL units) that
can be placed in the NAL unit stream to precede any other NAL
units in decoding order. The parameter MUST NOT be used to
indicate codec capability in any capability exchange procedure.
The value of the parameter is a comma-separated (',') list of
base64 [RFC4648] representations of parameter set NAL units as
specified in Sections 7.3.2.1, 7.3.2.2 and 7.3.2.3 of [HEVC].
Note that the number of bytes in a parameter set NAL unit is
typically less than 10, but a picture parameter set NAL unit can
contain several hundred bytes.
Informative note: When several payload types are offered in
the SDP Offer/Answer model, each with its own sprop-
parameter-sets parameter, the receiver cannot assume that
those parameter sets do not use conflicting storage locations
(i.e., identical values of parameter set identifiers).
Therefore, a receiver should buffer all sprop-parameter-sets
and make them available to the decoder instance that decodes
a certain payload type.
The sprop-parameter-sets parameter MUST contain only parameter
sets that conform to the profile-tier-level-id, i.e., the
subset of coding tools indicated by any of the parameter sets
MUST be equal to the default profile, and the level indicated by
any of the parameter sets MUST be equal to the default level.
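A receiver might split and decode the sprop-parameter-sets value as in the following sketch. It relies only on base64 decoding from the Python standard library. As an assumption of this sketch, nal_unit_type is extracted from bits 1 to 6 of the first byte of the two-byte HEVC NAL unit header, i.e., the bits following forbidden_zero_bit. The function name is hypothetical, and mapping the type values to VPS, SPS and PPS is left to [HEVC].

```python
import base64


def parse_sprop_parameter_sets(value: str) -> list[tuple[int, bytes]]:
    """Decode a comma-separated list of base64 NAL units.

    Returns (nal_unit_type, nal_unit_bytes) pairs in the signaled order.
    """
    nal_units = []
    for item in value.split(","):
        # validate=True rejects characters outside the base64 alphabet.
        nal = base64.b64decode(item.strip(), validate=True)
        if len(nal) < 2:
            raise ValueError("NAL unit shorter than the 2-byte NAL unit header")
        nal_unit_type = (nal[0] >> 1) & 0x3F  # skip forbidden_zero_bit
        nal_units.append((nal_unit_type, nal))
    return nal_units
```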
max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br:
TBD
max-mbps:
TBD
max-smbps:
TBD
max-fs:
TBD
max-cpb:
TBD
max-dpb:
TBD
max-br:
TBD
sprop-level-parameter-sets:
TBD
use-level-src-parameter-sets:
TBD
packetization-mode:
This parameter signals the properties of an RTP payload type
or the capabilities of a receiver implementation. Only a
single configuration point can be indicated; thus, when
capabilities to support more than one packetization-mode are
declared, multiple configuration points (RTP payload types)
must be used.
When the value of packetization-mode is equal to 1, the non-
interleaved mode, as defined in section 5.2, MUST be used.
When the value of packetization-mode is equal to 2, the
interleaved mode, as defined in section 5.3, MUST be used.
The value of packetization-mode MUST be an integer in the
range of 1 to 2, inclusive.
sprop-interleaving-depth:
This parameter MUST NOT be present when packetization-mode is
not present or the value of packetization-mode is equal to 1.
This parameter MUST be present when the value of
packetization-mode is equal to 2.
This parameter signals the properties of an RTP packet stream.
It specifies the maximum number of VCL NAL units that precede
any VCL NAL unit in the RTP packet stream in transmission
order and follow the VCL NAL unit in decoding order.
Consequently, it is guaranteed that receivers can reconstruct
NAL unit decoding order when the buffer size for NAL unit
decoding order recovery is at least the value of sprop-
interleaving-depth + 1 in terms of VCL NAL units.
The value of sprop-interleaving-depth MUST be an integer in
the range of 0 to 32767, inclusive.
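The definition of sprop-interleaving-depth can be checked on a given interleaving pattern with the following sketch. For each VCL NAL unit, it counts the VCL NAL units that precede it in transmission order but follow it in decoding order. The input format (decoding-order indices listed in transmission order) and the function name are illustrative assumptions.

```python
def interleaving_depth(decoding_order_in_tx: list[int]) -> int:
    """Maximum number of VCL NAL units that precede a VCL NAL unit in
    transmission order and follow it in decoding order."""
    depth = 0
    for i, d in enumerate(decoding_order_in_tx):
        # Units sent earlier but decoded later than unit i.
        preceding_but_later = sum(1 for e in decoding_order_in_tx[:i] if e > d)
        depth = max(depth, preceding_but_later)
    return depth
```

For the transmission order 0 2 1 3 5 4 6 8 7 used in the sprop-init-buf-time example, the depth is 1. A de-interleaving buffer of 2 VCL NAL units (depth + 1) therefore suffices.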
sprop-deint-buf-req:
This parameter MUST NOT be present when packetization-mode is
not present or the value of packetization-mode is not equal to
2. It MUST be present when the value of packetization-mode is
equal to 2.
sprop-deint-buf-req signals the required size of the de-
interleaving buffer for the RTP packet stream. The value of
the parameter MUST be greater than or equal to the maximum
buffer occupancy (in units of bytes) required in such a de-
interleaving buffer that is specified in section 6.2. It is
guaranteed that receivers can perform the de-interleaving of
interleaved NAL units into NAL unit decoding order, when the
de-interleaving buffer size is at least the value of sprop-
deint-buf-req in terms of bytes.
The value of sprop-deint-buf-req MUST be an integer in the
range of 0 to 4294967295, inclusive.
Informative note: sprop-deint-buf-req indicates the
required size of the de-interleaving buffer only. When
network jitter can occur, an appropriately sized jitter
buffer has to be provisioned for as well.
deint-buf-cap:
This parameter signals the capabilities of a receiver
implementation and indicates the amount of de-interleaving
buffer space in units of bytes that the receiver has available
for reconstructing the NAL unit decoding order. A receiver is
able to handle any stream for which the value of the sprop-
deint-buf-req parameter is smaller than or equal to this
parameter.
If the parameter is not present, then a value of 0 MUST be
used for deint-buf-cap. The value of deint-buf-cap MUST be an
integer in the range of 0 to 4294967295, inclusive.
Informative note: deint-buf-cap indicates the maximum
possible size of the de-interleaving buffer of the receiver
only. When network jitter can occur, an appropriately
sized jitter buffer has to be provisioned for as well.
sprop-init-buf-time:
This parameter MAY be used to signal the properties of an RTP
packet stream. The parameter MUST NOT be present if the
value of packetization-mode is equal to 1.
The parameter signals the initial buffering time that a
receiver MUST wait before starting decoding to recover the NAL
unit decoding order from the transmission order. The
parameter is the maximum value of (decoding time of the NAL
unit - transmission time of the NAL unit), assuming reliable and
instantaneous transmission, the same timeline for transmission
and decoding, and that decoding starts when the first packet
arrives.
An example of specifying the value of sprop-init-buf-time
follows. A NAL unit stream is sent in the following
interleaved order, in which the value corresponds to the
decoding time and the transmission order is from left to
right:
0 2 1 3 5 4 6 8 7 ...
Assuming a steady transmission rate of NAL units, the
transmission times are:
0 1 2 3 4 5 6 7 8 ...
Subtracting the decoding time from the transmission time
column-wise results in the following series:
0 -1 1 0 -1 1 0 -1 1 ...
Thus, in terms of intervals of NAL unit transmission times,
the value of sprop-init-buf-time in this example is 1. The
parameter is coded as a non-negative base10 integer
representation in clock ticks of a 90-kHz clock. If the
parameter is not present, then no initial buffering time value
is defined. Otherwise the value of sprop-init-buf-time MUST
be an integer in the range of 0 to 4294967295, inclusive.
In addition to the signaled sprop-init-buf-time, receivers
SHOULD take into account the transmission delay jitter
buffering, including buffering for the delay jitter caused by
mixers, translators, gateways, proxies, traffic-shapers, and
other network elements.
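The worked sprop-init-buf-time example can be reproduced with a short sketch. Like the example, it assumes a steady transmission rate, so the k-th NAL unit in transmission order is sent at time k, measured in NAL unit transmission intervals. It takes the maximum of the column-wise difference (transmission time minus decoding time), floored at 0 because the parameter is non-negative. The conversion to 90-kHz clock ticks assumes a known, constant transmission interval. Both function names are hypothetical.

```python
def init_buf_time_intervals(decoding_times: list[int]) -> int:
    """Decoding times listed in transmission order; the k-th entry is
    assumed to be transmitted at time k (steady transmission rate)."""
    diffs = (tx - dec for tx, dec in enumerate(decoding_times))
    return max(0, max(diffs, default=0))


def to_90khz_ticks(intervals: int, interval_seconds: float) -> int:
    """Convert a number of transmission intervals to 90-kHz clock ticks."""
    return round(intervals * interval_seconds * 90000)
```

For the example above, this yields 1 interval. If NAL units were sent every 1/30 s (an assumed rate), the signaled value would be 3000 ticks.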
sprop-max-don-diff:
This parameter MAY be used to signal the properties of an RTP
packet stream. It MUST NOT be used to signal transmitter or
receiver or codec capabilities. The parameter MUST NOT be
present if the value of packetization-mode is equal to 1.
sprop-max-don-diff is an integer in the range of 0 to 32767,
inclusive. If sprop-max-don-diff is not present, the value of
the parameter is unspecified. sprop-max-don-diff is
calculated as follows:
sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)},
for any i and any j>i,
where i and j indicate the index of the NAL unit in the
transmission order and AbsDON denotes a decoding order number
of the NAL unit that does not wrap around to 0 after 65535.
In other words, AbsDON is calculated as follows: Let m and n
be consecutive NAL units in transmission order. For the very
first NAL unit in transmission order (whose index is 0),
AbsDON(0) = DON(0). For other NAL units, AbsDON is calculated
as follows:
If DON(m) == DON(n), AbsDON(n) = AbsDON(m)
If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
AbsDON(n) = AbsDON(m) + DON(n) - DON(m)
If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n)
If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n))
If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
AbsDON(n) = AbsDON(m) - (DON(m) - DON(n))
where DON(i) is the decoding order number of the NAL unit
having index i in the transmission order. The decoding order
number is specified in section 4.6.
Informative note: Receivers may use sprop-max-don-diff to
trigger which NAL units in the receiver buffer can be
passed to the decoder.
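The AbsDON rules and the sprop-max-don-diff formula translate directly into the following Python sketch. Its input is the list of DON values in transmission order. Clamping the result at 0 for streams sent in decoding order is an assumption of this sketch, motivated by the 0 to 32767 range of the parameter. The function names are hypothetical.

```python
def abs_dons(dons: list[int]) -> list[int]:
    """Unwrap 16-bit DON values (in transmission order) into AbsDON values."""
    result: list[int] = []
    for n, don_n in enumerate(dons):
        if n == 0:
            result.append(don_n)  # AbsDON(0) = DON(0)
            continue
        don_m, abs_m = dons[n - 1], result[-1]
        if don_m == don_n:
            result.append(abs_m)
        elif don_m < don_n and don_n - don_m < 32768:
            result.append(abs_m + don_n - don_m)
        elif don_m > don_n and don_m - don_n >= 32768:
            result.append(abs_m + 65536 - don_m + don_n)  # forward wrap
        elif don_m < don_n:  # DON(n) - DON(m) >= 32768
            result.append(abs_m - (don_m + 65536 - don_n))  # backward wrap
        else:  # DON(m) > DON(n) and DON(m) - DON(n) < 32768
            result.append(abs_m - (don_m - don_n))
    return result


def max_don_diff(dons: list[int]) -> int:
    """max{AbsDON(i) - AbsDON(j)} over i < j, clamped at 0 (assumption)."""
    absdon = abs_dons(dons)
    if not absdon:
        return 0
    best, prefix_max = 0, absdon[0]
    for value in absdon[1:]:
        best = max(best, prefix_max - value)
        prefix_max = max(prefix_max, value)
    return best
```

With DON values 65534, 0, 65535, 1 in transmission order, the AbsDON values are 65534, 65536, 65535, 65537 and sprop-max-don-diff is 1.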
max-rcmd-nalu-size:
TBD
sar-understood:
TBD
sar-supported:
TBD
Encoding considerations:
This type is only defined for transfer via RTP (RFC 3550).
Security considerations:
See Section 8 of RFC XXXX.
Public specification:
Please refer to Section 13 of RFC XXXX.
Additional information:
None
File extensions: none
Macintosh file type code: none
Object identifier or OID: none
Person & email address to contact for further information:
Thomas Schierl, ts@thomas-schierl.de
Intended usage: COMMON
Author:
Thomas Schierl, ts@thomas-schierl.de
Change controller:
IETF Audio/Video Transport Payloads working group delegated
from the IESG.
7.2 SDP Parameters
7.2.1 Mapping of Payload Type Parameters to SDP
TBD
7.2.2 Usage with the SDP Offer/Answer Model
The media type video/H265 string is mapped to fields in the Session
Description Protocol (SDP) [RFC4566] as follows:
o The media name in the "m=" line of SDP MUST be video.
o The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the
media subtype).
o The clock rate in the "a=rtpmap" line MUST be 90000.
o The OPTIONAL parameters "profile-tier-level-id" and
"packetization-mode", when present, MUST be included in the
"a=fmtp" line of SDP. These parameters are expressed as a media
type string, in the form of a semicolon-separated list of
parameter=value pairs.
o The OPTIONAL parameters "sprop-parameter-sets" and "sprop-level-
parameter-sets", when present, MUST be included in the "a=fmtp"
line of SDP or conveyed using the "fmtp" source attribute as
specified in section 6.3 of [RFC5576]. For a particular media
format (i.e., RTP payload type), a "sprop-parameter-sets" or
"sprop-level-parameter-sets" parameter MUST NOT both be included in
the "a=fmtp" line of SDP and conveyed using the "fmtp" source
attribute. When included in the "a=fmtp" line of SDP, these
parameters are expressed as a media type string, in the form of a
semicolon-separated list of parameter=value pairs. When conveyed
using the "fmtp" source attribute, these parameters are only
associated with the given source and payload type as parts of the
"fmtp" source attribute.
Informative note: Conveyance of "sprop-parameter-sets" and
"sprop-level-parameter-sets" using the "fmtp" source attribute
allows for out-of-band transport of parameter sets in
topologies like Topo-Video-switch-MCU [TBD].
An example of media representation in SDP is as follows:
m=video 49170 RTP/AVP 98
a=rtpmap:98 H265/90000
a=fmtp:98 profile-tier-level-id=UVWXYZ;
packetization-mode=1;
sprop-parameter-sets=<parameter sets data>
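A receiver or offer/answer implementation might parse the "a=fmtp" line of the example above and check the packetization-related constraints of Section 7.1 roughly as follows. The sketch assumes that the whole "a=fmtp" attribute is on one line. The names parse_fmtp, check_packetization_params and receiver_can_deinterleave are hypothetical. A missing packetization-mode is left unchecked because this section does not specify a default for it, while deint-buf-cap defaults to 0 as specified above.

```python
# Upper bounds of the integer parameters, from their definitions in 7.1.
UPPER_BOUNDS = {
    "sprop-interleaving-depth": 32767,
    "sprop-deint-buf-req": 4294967295,
    "sprop-init-buf-time": 4294967295,
    "sprop-max-don-diff": 32767,
}


def parse_fmtp(line: str) -> tuple[int, dict[str, str]]:
    """Split 'a=fmtp:<pt> name=value; ...' into (payload type, parameters)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    pt_text, _, param_text = line[len(prefix):].partition(" ")
    params: dict[str, str] = {}
    for item in param_text.split(";"):
        item = item.strip()
        if item:
            # Partition on the first '=' only: base64 values may end in '='.
            name, _, value = item.partition("=")
            params[name.strip().lower()] = value.strip()
    return int(pt_text), params


def check_packetization_params(params: dict[str, str]) -> list[str]:
    """Return a list of violated constraints (empty if none were found)."""
    problems: list[str] = []
    mode_text = params.get("packetization-mode")
    mode = int(mode_text) if mode_text is not None else None
    if mode is not None and mode not in (1, 2):
        problems.append("packetization-mode must be 1 or 2")
    # Required when packetization-mode is 2, forbidden otherwise.
    for name in ("sprop-interleaving-depth", "sprop-deint-buf-req"):
        if (mode == 2) != (name in params):
            problems.append(f"{name} present iff packetization-mode is 2")
    # Forbidden when packetization-mode is 1.
    for name in ("sprop-init-buf-time", "sprop-max-don-diff"):
        if mode == 1 and name in params:
            problems.append(f"{name} not allowed with packetization-mode 1")
    for name, upper in UPPER_BOUNDS.items():
        if name in params and not 0 <= int(params[name]) <= upper:
            problems.append(f"{name} out of range")
    return problems


def receiver_can_deinterleave(params: dict[str, str], deint_buf_cap: int = 0) -> bool:
    """True if sprop-deint-buf-req <= deint-buf-cap (which defaults to 0)."""
    return int(params.get("sprop-deint-buf-req", "0")) <= deint_buf_cap
```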
7.2.3 Usage with SDP Offer/Answer Model
TBD
7.2.4 Usage in Declarative Session Descriptions
TBD
7.2.5 Signaling of Parallel Processing
[Ed.Note(TS): Text on signaling of parallelization is needed. JCT-VC
will include signaling for multithreading support in the VUI as the
"min_spatial_segmentation_idc" parameter. A first approach is to
copy this parameter to SDP.]
7.3 Examples
TBD.
7.4 Parameter Set Considerations
TBD
8. Security Considerations
TBD
9. Congestion Control
TBD
10. IANA Considerations
A new media type, as specified in Section 7.1 of this memo, should
be registered with IANA.
11. Informative Appendix: Application Examples
11.1 Introduction
TBD
11.2 Streaming
TBD
11.3 Videoconferencing (Unicast to MANE, Unicast to Endpoints)
TBD
11.4 Mobile TV (Multicast to MANE, Unicast to Endpoint)
TBD
12. Acknowledgements
TBD
This document was prepared using 2-Word-v2.0.template.dot.
13. References
13.1 Normative References
[HEVC] JCT-VC, "High-Efficiency Video Coding (HEVC) text
specification Working Draft 9", JCTVC-K1003, October 2012.
[H.264] ITU-T Recommendation H.264, "Advanced video coding for
generic audiovisual services", March 2010.
[RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
Payload Format for H.264 Video", RFC 6184, May 2011.
[RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A.
Eleftheriadis, "RTP Payload Format for Scalable Video
Coding", RFC 6190, May 2011.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
With Session Description Protocol (SDP)", RFC 3264, June
2002.
[RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
Encodings", RFC 4648, October 2006.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
"RTP: A Transport Protocol for Real-Time Applications", STD 64,
RFC 3550, July 2003.
[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
Description Protocol", RFC 4566, July 2006.
[RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific
Media Attributes in the Session Description Protocol", RFC
5576, June 2009.
13.2 Informative References
[RFC5109] Li, A., "RTP Payload Format for Generic Forward Error
Correction", RFC 5109, December 2007.
14. Authors' Addresses
Thomas Schierl
Fraunhofer HHI
Einsteinufer 37
D-10587 Berlin
Germany
Phone: +49-30-31002-227
EMail: ts@thomas-schierl.de
Stephan Wenger
Vidyo, Inc.
433 Hackensack Ave., 7th floor
Hackensack, N.J. 07601
USA
Phone: +1-415-713-5473
EMail: stewe@stewe.org
Ye-Kui Wang
Qualcomm Incorporated
5775 Morehouse Drive
San Diego, CA 92121
USA
Phone: +1-858-651-8345
EMail: yekuiw@qti.qualcomm.com
Miska M. Hannuksela
Nokia Corporation
P.O. Box 1000
33721 Tampere
Finland
Phone: +358-7180-08000
EMail: miska.hannuksela@nokia.com