Internet Engineering Task Force                 Audio-Video Transport WG
INTERNET-DRAFT                                 C. Bormann / Univ. Bremen
                                                        L. Cline / Intel
                                                      G. Deisher / Intel
                                                       T. Gardos / Intel
                                                     C. Maciocco / Intel
                                                       D. Newell / Intel
                                                   J. Ott / Univ. Bremen
                                                   S. Wenger / TU Berlin
                                                          C. Zhu / Intel



               RTP Payload Format for the 1998 Version of
                    ITU-T Rec. H.263 Video (H.263+)



Status of This Memo

This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups.  Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or made obsolete by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.


1. Introduction

This document specifies an RTP payload header format for transporting
video streams encoded according to the 1998 version of ITU-T
Recommendation H.263.

The 1998 version of ITU-T Recommendation H.263 added numerous coding
options to improve codec performance over the 1996 version.  The 1998
version is referred to as H.263+ in this document.  Among the new
options, the ones with the biggest impact on the RTP payload are the
slice structured mode (SS), independent segment decoding mode (ISD), and
the scalability mode.  This section summarizes the impact of these new
coding options on packetization.  Refer to [4] for more information on
coding options.

Slice structure was added to H.263+ for three purposes: to provide
enhanced error resilience capability, to make the bitstream more
amenable to use with an underlying packet transport such as RTP, and to
minimize video delay.  The slice structured mode supports fragmentation
at macroblock boundaries.

When the independent segment decoding option is employed, a video
picture frame is broken into segments and encoded in such a way that
each segment is independently decodable.  Utilizing ISD in a lossy
network environment helps prevent the propagation of errors from one
segment of the picture to others.

H.263+ also includes bitstream scalability as an optional coding mode.
Three kinds of scalability are defined: temporal, signal-to-noise ratio
(SNR), and spatial scalability.  Temporal scalability is achieved via
the disposable nature of bi-directionally predicted frames, or B-frames.
SNR scalability permits refinement of encoded video frames, thereby
improving the quality (or SNR).  Spatial scalability is similar to SNR
scalability except that the refinement layer is twice the size of the
base layer in the horizontal dimension, the vertical dimension, or both.


2. Usage of RTP

When transmitting H.263+ video streams over the Internet, the output of
the encoder can be packetized directly.  All bits of the bitstream,
including both the fixed length codes and the variable length codes,
are included in the packet.

For H.263+ bitstreams coded with temporal, spatial, or SNR scalability,
each layer may be transported to a different network address.  More
specifically, each layer may use a unique IP address and port
combination.  In addition, temporal relations between layers shall be
expressed using the RTP timestamp so that the layers can be synchronized
at the receivers in multicast or unicast applications.

The H.263+ video streams are carried as payload data within RTP
packets.  A new payload header, the H.263+ payload header, is defined in
section 4.  This section defines the usage of the RTP fixed header and
the structure of the H.263+ video packet.


2.1 RTP Header Usage

Each RTP packet starts with a fixed RTP header.  The following fields of
the RTP fixed header are used for H.263+ video streams:

Marker bit (M bit): The Marker bit of the RTP header is set to 1 when
the current packet carries the end of the current frame, and is 0
otherwise.

Payload Type (PT): The Payload Type shall specify the H.263+ video
payload format.  A dynamic payload type can be used initially until a
static payload type is assigned.

Timestamp: The RTP timestamp encodes the sampling instant of the first
video frame contained in the RTP data packet.  The RTP timestamp may be
the same on successive packets if a video frame occupies more than one
packet.  In a multilayer scenario, all pictures corresponding to the
same temporal reference should carry the same timestamp.  If temporal
scalability is used and B-frames are present, the timestamp may not be
monotonically increasing in the video stream.  If B-frames are
transmitted on a separate layer and address, they must be synchronized
properly with the reference frames.  Refer to the 1998 version of ITU-T
Recommendation H.263 [4] for the required transmission order to a
decoder.  For an H.263+ video stream, the RTP timestamp is based on a
90 kHz clock, the same as that used by the RTP payload format for H.261
streams [5].
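As a rough illustration of the timestamp rules above, the Python sketch
below converts a frame's sampling instant into a 90 kHz RTP timestamp.
The helper name rtp_timestamp, the 30 frames/s rate, and the I/B/P
pattern are assumptions chosen only for this example.  It shows why
timestamps stop being monotonic when each B-frame is sent after the
following reference frame it is predicted from.

```python
RTP_CLOCK_HZ = 90_000  # RTP clock rate for H.263+ (same as the H.261 payload)


def rtp_timestamp(frame_index: int, fps: int, base: int = 0) -> int:
    """Return the 32-bit RTP timestamp of the frame sampled at frame_index / fps s.

    Hypothetical helper: assumes an integer frame rate and a random base offset.
    """
    return (base + frame_index * RTP_CLOCK_HZ // fps) & 0xFFFFFFFF


# Display order I0 B1 P2 B3 P4 is transmitted as I0 P2 B1 P4 B3, because a
# B-frame is predicted from both the preceding and the following P-frame.
transmission_order = [0, 2, 1, 4, 3]
timestamps = [rtp_timestamp(i, fps=30) for i in transmission_order]
print(timestamps)  # [0, 6000, 3000, 12000, 9000]
```

Masking with 0xFFFFFFFF keeps the value within the 32-bit RTP timestamp
field, so the timestamp wraps around rather than overflowing.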


2.2 Video Packet Structure

An H.263+ compressed bitstream is carried as a payload within each RTP
packet.  For each RTP packet, the RTP header is followed by an H.263+
payload header, which is followed by a standard H.263+ compressed
bitstream.  The size of the H.263+ payload header is variable, depending
on the payload involved, as detailed in section 4.  The layout of the
RTP H.263+ video packet is shown below:

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |    RTP Header                                               ...
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |    H.263+ Payload Header                                    ...
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |    H.263+ Compressed Data Stream                            ...
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
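The following sketch shows one way a sender might assemble the packet
layout above, including setting the Marker bit described in section
2.1.  It is not normative: the helper build_rtp_packet is hypothetical,
it assumes an RTP fixed header with no padding, extension, or CSRC list
(as laid out in RFC 1889 [1]), and payload type 96 is only an example
dynamic value.

```python
import struct


def build_rtp_packet(payload_type: int, seq: int, timestamp: int, ssrc: int,
                     marker: bool, payload_header: bytes, data: bytes) -> bytes:
    """Prepend a 12-byte RTP fixed header (V=2, no padding, no extension,
    no CSRCs) to the H.263+ payload header and compressed bitstream."""
    byte0 = 2 << 6                                             # V=2, P=0, X=0, CC=0
    byte1 = (0x80 if marker else 0) | (payload_type & 0x7F)    # M bit + 7-bit PT
    rtp_header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return rtp_header + payload_header + data


# Last (and only) packet of a frame, so M=1.  PT 96 and the SSRC are
# hypothetical values; the 4-byte payload header has PLEN=0.
pkt = build_rtp_packet(96, seq=1, timestamp=3000, ssrc=0x1234ABCD, marker=True,
                       payload_header=b"\x00\x00\x00\x00", data=b"\x00\x00\x80")
print(len(pkt))  # 19
```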


3. Design Considerations

The goal of this payload format is to specify an efficient way of
encapsulating an H.263+ standard-compliant bitstream and to enhance its
resiliency to packet losses.  Due to the large number of possible coding
schemes in H.263+, a copy of the picture header with configuration
information is inserted into the payload header when appropriate.

There are a few assumptions and constraints associated with this H.263+
payload header design.  The purpose of this section is to point out
various design issues and also discuss several coding options provided
by H.263+ that may impact the performance of network video.

. It is reasonable to assume that no single macroblock will be too large
  to fit in a packet.

. The optional slice structured mode described in annex K of H.263+ [4]
  enables more flexibility for packetization.  Furthermore, packets
  based on a slice structure are also inherently more loss resilient.
  Similar to a picture segment that begins with a GOB header, the
  motion vector predictors in a slice are restricted to reside within
  its boundaries.  For these reasons, the use of the slice structured
  mode is strongly recommended for network applications.

. In non-rectangular slice structured mode, only complete slices should
  be included in a packet.  In other words, slices should not be
  fragmented across packets.  Optimally, a packet will contain only one
  slice.

. When the slice structure is not applied, the insertion of a GOB header
  in every GOB is recommended to reduce the dependency on motion vector
  prediction across GOBs.  See section 3.3 of [6] for more information.

. The independent segment decoding mode described in annex R of [4] does
  not allow any data dependency across slice or GOB boundaries in the
  reference picture.  It can be utilized to further improve resiliency
  in high loss conditions.

. If ISD is used in conjunction with the slice structure, the
  rectangular slice submode shall be enabled and the dimensions and
  quantity of the slices present in a frame shall remain the same
  between two intra-coded frames (I-frames).  The ISD segments may be
  entirely intra coded from time to time to achieve quick error recovery
  without the latency associated with sending complete I-frames.

. For resiliency, sending a full picture header for every frame is
  recommended.  In other words, the sender should always set the
  subfield UFEP in PLUSPTYPE to '001' in the video bitstream.

. In a multi-layer scenario, each layer can be transmitted to a
  different network address.  The configuration of each layer such as
  the enhancement layer number (ELNUM), reference layer number (RLNUM),
  and scalability type should be determined at the start of the session
  and should not change during the course of the session.


4. H.263+ Payload Header

For H.263+ video streams, each RTP packet carries only one H.263+ video
packet.  The H.263+ payload header is always present for each H.263+
video packet and has variable length.  If a copy of the picture header
is included in the payload header, its length in bytes is specified by
PLEN.  The minimum length of the payload header is 32 bits, which
corresponds to PLEN equal to 0.

The H.263+ payload header is structured as follows:

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |V=0|SBIT |EBIT |  PLEN   |PEBIT| TID | Trun  |       RR        |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |1 0 0 0 0 0| picture header starting with TR, PTYPE, ...       .
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  V: 2 bits
  Version number.  Set to '00' for this payload format.
  [Ed. Note: The version control will not take effect until a draft has
  been formally submitted to the IETF.]

  SBIT: 3 bits
  Start bit position specifies the number of bits that should be
  ignored in the first data byte of the payload.

  EBIT: 3 bits
  End bit position indicates the number of bits that should be ignored
  in the last data byte of the payload.

  PLEN: 5 bits
  Picture header length in number of bytes.

  PEBIT: 3 bits
  End bit position indicates the number of bits that should be ignored
  in the last byte of the picture header.

  TID: 3 bits
  Thread ID.  Used only in the optional video redundancy coding (VRC) mode.
  See annex N of [4].  All three bits must be set to 0 unless VRC mode
  is applied.

  Trun: 4 bits
  Cyclic packet number.  Used only in optional VRC mode.  These bits
  must be set to 0 unless VRC mode is applied.

  RR: 9 bits
  Reserved bits.

Notice that the TID and Trun fields are associated only with the video
redundancy coding usage scenario derived from the reference picture
selection mode specified in annex N of [4].  The TID and Trun bits must
be set to 0 if VRC is not used.  The use of VRC shall be negotiated by
external means.
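A minimal Python sketch of packing and unpacking the first 32-bit word
of the payload header, with field widths taken from the header diagram
above (2, 3, 3, 5, 3, 3, 4 and 9 bits, most significant bit first).  The
function names are hypothetical, and range checking is the only
validation performed.

```python
import struct

# (field name, width in bits) in transmission order, per the header diagram.
FIELDS = [("V", 2), ("SBIT", 3), ("EBIT", 3), ("PLEN", 5),
          ("PEBIT", 3), ("TID", 3), ("Trun", 4), ("RR", 9)]


def pack_header_word(**values: int) -> bytes:
    """Pack the first payload header word in network byte order.
    Fields that are not given default to 0 (e.g. TID/Trun when VRC is unused)."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return struct.pack("!I", word)


def unpack_header_word(buf: bytes) -> dict:
    """Inverse of pack_header_word: decode the first four bytes of buf."""
    (word,) = struct.unpack("!I", buf[:4])
    fields, shift = {}, 32
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


hdr = pack_header_word(PLEN=9, PEBIT=2)  # first word of the section 4.2 example
print(hdr.hex())  # 004a0000
```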


4.1 Encapsulating Packet that Begins with PSC

Any packet that begins with a picture start code (PSC), i.e., the first
packet of a picture frame, shall be encapsulated using only the first
32-bit word of the payload header, since a picture header is already
included in the data bitstream.  In this case, PLEN shall be 0.

Here is an example of encapsulating the first packet in a frame:

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |0 0|SBIT |EBIT |0 0 0 0 0|0 0 0| TID | Trun  |       RR        |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | bitstream data starts with complete picture header ...        .
  +---------------------------------------------------------------+


4.2 Encapsulating Packet that Begins with GBSC or SSC

Any packet that begins with either a GOB start code (GBSC) or a slice
start code (SSC) shall include a copy of the picture header in the
payload header for resiliency.  PLEN shall be set to specify the length
of the included picture header in bytes.  Hence, PLEN > 0.  The end bit
position corresponding to the last byte of the picture header data is
indicated by PEBIT.  Actual bitstream data shall begin on an 8-bit byte
boundary following the payload header.

Notice that only the last six bits of the picture start code, '100000',
are included in the payload header.  A complete H.263+ picture header
with byte aligned picture start code can be conveniently assembled if
needed on the receiving end by prepending the sixteen leading '0' bits.
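The reassembly step described above can be sketched as follows, assuming
the receiver has already extracted the PLEN bytes that follow the first
word of the payload header.  The hypothetical rebuild_picture_header
prepends two zero bytes, which restores the byte-aligned 22-bit PSC, and
uses PEBIT to report how many bits of the result are valid.

```python
def rebuild_picture_header(copied: bytes, pebit: int) -> tuple[bytes, int]:
    """Restore a full picture header from the copy carried in the payload header.

    copied: the PLEN bytes after the first 32-bit word, starting with the
            six PSC bits '100000'.
    pebit:  number of bits to ignore in the last byte of the copy.
    Returns (header bytes with byte-aligned PSC, number of valid bits).
    """
    if not copied or copied[0] >> 2 != 0b100000:
        raise ValueError("copied picture header must start with '100000'")
    full = b"\x00\x00" + copied          # the 16 leading '0' bits of the PSC
    valid_bits = 8 * len(full) - pebit   # drop PEBIT bits at the end
    return full, valid_bits
```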

Assuming a PLEN of 9, below is an example of a packet that begins with a
GBSC or an SSC:

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |0 0|SBIT |EBIT |0 1 0 0 1|PEBIT| TID | Trun  |       RR        |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |1 0 0 0 0 0| picture header starting with TR, PTYPE, ...       |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | ...                                                           |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | ...           | bitstream data begins with GBSC/SSC ...       .
  +-+-+-+-+-+-+-+-+-----------------------------------------------+
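Since the copied picture header directly follows the first word, a
receiver can locate the bitstream from PLEN alone: it begins at byte
offset 4 + PLEN of the RTP payload, i.e., offset 13 in the PLEN = 9
example above.  The hypothetical split_payload below is a sketch of that
split; it does not interpret the copied picture header itself.

```python
def split_payload(payload: bytes) -> tuple[int, bytes, bytes]:
    """Split an RTP payload into (PLEN, copied picture header, bitstream data)."""
    if len(payload) < 4:
        raise ValueError("payload shorter than the 32-bit payload header")
    plen = (payload[1] >> 3) & 0x1F   # PLEN occupies bits 8-12 of the first word
    header_len = 4 + plen             # bitstream starts on the next byte boundary
    if len(payload) < header_len:
        raise ValueError("truncated picture header copy")
    return plen, payload[4:header_len], payload[header_len:]
```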


4.3 Encapsulating Follow-On Packet

When the slice structured coding option is not applied, some GOBs in the
bitstream may be larger than the size of one packet.  Similarly, when
the ISD option is applied, a picture segment may be larger than the
required packet size.  A packet carrying the remaining fragment of a
picture segment that exceeds the required packet size is termed a
"follow-on" packet in this document.

These follow-on packets with data fragmented at the macroblock
boundaries are not independently recoverable.  In this case, the payload
header includes only the first 32-bit word and PLEN shall be set to 0.
A receiver should discard any follow-on packet it receives if the
preceding packet containing the segment header information has been
lost.
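One way to implement the discard rule above is sketched below.  It
assumes packets are handed over in RTP sequence-number order (for
example, after a jitter buffer) and that the caller already knows
whether each packet is a follow-on packet; the class name
FollowOnFilter is hypothetical.  A follow-on packet is kept only if its
immediate predecessor (sequence number minus one, modulo 2^16) arrived
and was itself kept, so a single loss invalidates the rest of the
fragmented segment until the next packet that carries a header.

```python
class FollowOnFilter:
    """Keep/drop decisions for packets delivered in sequence-number order."""

    def __init__(self) -> None:
        self.prev_seq = None      # sequence number of the last packet seen
        self.prev_kept = False    # whether that packet was kept

    def accept(self, seq: int, is_follow_on: bool) -> bool:
        # The 16-bit RTP sequence number wraps from 65535 to 0.
        contiguous = self.prev_seq is not None and seq == (self.prev_seq + 1) & 0xFFFF
        keep = (contiguous and self.prev_kept) if is_follow_on else True
        self.prev_seq, self.prev_kept = seq, keep
        return keep


f = FollowOnFilter()
arrivals = [(10, False), (11, True), (13, True), (14, True), (15, False), (16, True)]
results = [f.accept(seq, fo) for seq, fo in arrivals]
print(results)  # [True, True, False, False, True, True]
```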

Here is an example of a follow-on packet:

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |0 0|SBIT |EBIT |0 0 0 0 0|0 0 0| TID | Trun  |       RR        |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | sub-segment bitstream data ...                                .
  +---------------------------------------------------------------+

Even though they may have identical payload headers, a follow-on packet
can be differentiated from the first packet in a frame since the data in
a follow-on packet does not begin with a PSC.
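The distinction drawn above, together with PLEN, lets a receiver
classify each packet.  The sketch below assumes SBIT = 0 for packets
that start with a PSC, since the picture start code is byte aligned;
the function names and the returned labels are hypothetical.

```python
def starts_with_psc(data: bytes) -> bool:
    """True if data begins with the byte-aligned 22-bit PSC
    0000 0000 0000 0000 1000 00."""
    return len(data) >= 3 and data[0] == 0 and data[1] == 0 and (data[2] & 0xFC) == 0x80


def classify(payload: bytes) -> str:
    """Label an RTP payload by the encapsulation case it follows."""
    plen = (payload[1] >> 3) & 0x1F
    if plen > 0:
        return "gob-or-slice"        # picture header copy present (section 4.2)
    if starts_with_psc(payload[4:]):
        return "picture-start"       # first packet of a frame (section 4.1)
    return "follow-on"               # fragment of a larger segment (section 4.3)
```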


5. Security Considerations

RTP packets using the payload format defined in this specification are
subject to the security considerations discussed in the RTP
specification [1], and any appropriate RTP profile (for example [3]).
This implies that confidentiality of the media streams is achieved by
encryption.  Because the data compression used with this payload format
is applied end-to-end, encryption may be performed after compression so
there is no conflict between the two operations.

A potential denial-of-service threat exists for data encodings using
compression techniques that have non-uniform receiver-end computational
load.  The attacker can inject pathological datagrams into the stream
which are complex to decode and cause the receiver to be overloaded.
However, this encoding does not exhibit any significant non-uniformity.

As with any IP-based protocol, in some circumstances a receiver may be
overloaded simply by the receipt of too many packets, either desired or
undesired.  Network-layer authentication may be used to discard packets
from undesired sources, but the processing cost of the authentication
itself may be too high.  In a multicast environment, pruning of specific
sources may be implemented in future versions of IGMP and in multicast
routing protocols to allow a receiver to select which sources are
allowed to reach it.

A security review of this payload format found no additional
considerations beyond those in the RTP specification.


6. References

[1] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
    Transport Protocol for Real-Time Applications", RFC 1889.

[2] "Video Codec for Audiovisual Services at px64 kbits/s", ITU-T
    Recommendation H.261, 1993.

[3] H. Schulzrinne, "RTP Profile for Audio and Video Conferences with
    Minimal Control", RFC 1890.

[4] "Video Coding for Low Bitrate Communication", Draft ITU-T
    Recommendation H.263, Draft 20, September 1997.

[5] T. Turletti, C. Huitema, "RTP Payload Format for H.261 Video
    Streams", RFC 2032.

[6] C. Zhu, "RTP Payload Format for H.263 Video Streams", RFC 2190.