Internet Draft        RTP Payload for H.263                  February 26, 1996

Internet Engineering Task Force                       Audio-Video Transport WG
INTERNET-DRAFT                                             Chunrong "Chad" Zhu
                                                                   Intel Corp.
                                                             February 26, 1996
                                                              Expires: 8/26/96


               RTP Payload Format for H.263 Video Stream


Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working documents of
the Internet Engineering Task Force (IETF), its areas, and its working groups.
Note that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and
may be updated, replaced, or obsoleted by other documents at any time. It is
inappropriate to use Internet-Drafts as reference material or to cite them
other than as "work in progress."

To learn the current status of any Internet-Draft, please check the
"lid-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.




Abstract

This document specifies the RTP payload format for encapsulating H.263
bitstreams in the Real-Time Transport Protocol (RTP). The H.263 payload header
is designed for flexibility and simplicity. RTP can use one of three
possible modes for H.263 video streams, depending on the desired performance
characteristics. The shortest header mode (Mode A) offers simplicity
and easy recovery from lost packets, brought about by fragmentation at
Group of Blocks (GOB) boundaries. The longer header modes (Modes B and C)
offer more flexible packetization, including support for GOBs larger than
the network packet size, brought about by fragmentation at Macroblock (MB)
boundaries.








Zhu                                                                   [Page 1]


1. Introduction

This document describes a scheme to packetize an H.263 video stream for
transport using RTP [1]. The H.263 video stream is defined by ITU-T
Recommendation H.263 (referred to as H.263 in this document) [4] for video
coding at very low bit rates. RTP is defined by the Internet Engineering
Task Force (IETF) to provide end-to-end network transport functions
suitable for applications transmitting real-time data over multicast or
unicast network services.

The complete specification of RTP for a particular application will require a
profile specification document [3], a payload format specification, and an
RTP protocol document [1]. This document is intended to serve as the payload
format specification for H.263 video streams.

2. Definitions

For the purpose of this document, the following definitions apply:

CIF: Common Intermediate Format. For H.263, a CIF picture has 352 x 288 pixels
for luminance, and 176 x 144 pixels for chrominance.

QCIF: Quarter CIF source format with 176 x 144 pixels for luminance and
88 x 72 pixels for chrominance.

sub-QCIF: picture source format with 128 x 96 pixels for luminance and
64 x 48 pixels for chrominance.

4CIF: picture source format with 704 x 576 pixels for luminance and
352 x 288 pixels for chrominance.

16CIF: picture source format with 1408 x 1152 pixels for luminance and
704 x 576 pixels for chrominance.

GOB: for H.263, a Group of Blocks (GOB) comprises k*16 lines, depending
on the picture format (k=1 for sub-QCIF, QCIF and CIF, k=2 for 4CIF and
k=4 for 16CIF).

MB: a macroblock (MB) consists of four luminance blocks and the two
spatially corresponding chrominance blocks. Each block consists of 8x8 pixels.

3. Design Issues for Packetizing H.263 Bitstream

H.263 is based on ITU-T Recommendation H.261 [2] (referred to as H.261 in this
document). Although it employs similar techniques to reduce both temporal and
spatial redundancy, there are several major differences between the two
algorithms that significantly affect the design of packetization schemes.
This section summarizes those differences.




3.1 Optional Features of H.263

In addition to the basic source coding algorithms, H.263 supports four
negotiable features to improve performance: Advanced Prediction, PB frames,
Syntax-based Arithmetic Coding, and Unrestricted Motion Vectors. They can be
used in any combination. These optional features make session management a
little harder to handle.

Advanced Prediction (AP): four motion vectors instead of one can be used for
some macroblocks in the frame. This feature makes recovery from packet loss
harder, because more redundant information has to be preserved at the
beginning of the packet when fragmenting at macroblock boundaries.

PB frames: two frames (a P frame and a B frame) are coded into one bitstream
with macroblocks from the two frames interleaved. From a packetization point
of view, an MB from the P frame and the corresponding MB from the B frame
must be treated together, because each MB of the B frame is coded based on
the corresponding MB of the P frame. Means must be provided to ensure that
the two frames are rendered in the right order. Also, if part of this
combined bitstream is lost, both frames are affected, and possibly more.

Syntax-based Arithmetic Coding (SAC): Huffman codes are not the only choice
for variable length coding. When the SAC option is on, the run-value pairs
resulting from quantization of the Discrete Cosine Transform (DCT)
coefficients are coded with arithmetic coding instead of Huffman codes, but
the macroblock hierarchy is preserved.

Unrestricted Motion Vectors (UMV): this feature does not affect packetization
directly, so no special treatment is needed.

3.2 GOB Numbering

In H.263, each picture is divided into groups of blocks (GOBs), each of which
spans the full picture width. GOBs are numbered by a vertical scan, starting
with the top GOB and ending with the bottom GOB. In contrast, a GOB in H.261
is composed of three rows of 16x16 MBs for QCIF, and of three half rows of
MBs for CIF. As in H.261, a GOB is divided into macroblocks, and the
definition of a macroblock is the same as in H.261.

Each GOB in H.263 can have a fixed GOB header, but unlike H.261 the use of
the header is optional. If the GOB header is present, it may or may not
start on a byte boundary. Byte alignment can be achieved by proper
bit stuffing by the encoder, but it is not required.


In summary, for the same source format, a GOB in H.263 is defined and coded
with finer granularity than in H.261, which gives more flexibility for
packetization.
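
As an illustration of the GOB structure, the following Python sketch derives,
for each source format, the number of GOBs per picture, the number of MBs per
GOB and the luminance lines covered by a given GOB. It uses the picture sizes
of Section 2 and the GOB height factor k of H.263 (1 for sub-QCIF, QCIF and
CIF, 2 for 4CIF, 4 for 16CIF), and numbers GOBs from 0 at the top of the
picture. The table and function names are illustrative, not part of H.263.

```python
# Luminance width, height and GOB height factor k for each source format.
FORMATS = {
    "sub-QCIF": (128, 96, 1),
    "QCIF": (176, 144, 1),
    "CIF": (352, 288, 1),
    "4CIF": (704, 576, 2),
    "16CIF": (1408, 1152, 4),
}

def gob_layout(fmt):
    """Return (GOBs per picture, MBs per GOB) for a source format."""
    width, height, k = FORMATS[fmt]
    mbs_per_row = width // 16        # a macroblock covers 16x16 luminance pixels
    lines_per_gob = 16 * k           # a GOB spans the full width and k MB rows
    return height // lines_per_gob, mbs_per_row * k

def gob_lines(fmt, gob_number):
    """Return the first and last luminance line of a GOB (numbered from 0)."""
    n_gobs, _ = gob_layout(fmt)
    if not 0 <= gob_number < n_gobs:
        raise ValueError(f"{fmt} has GOBs 0..{n_gobs - 1}")
    k = FORMATS[fmt][2]
    first = gob_number * 16 * k
    return first, first + 16 * k - 1

for fmt in FORMATS:
    n_gobs, mbs = gob_layout(fmt)
    print(f"{fmt}: {n_gobs} GOBs of {mbs} MBs")
```

For comparison, an H.261 GOB always holds 33 MBs (three rows of 11), while an
H.263 GOB holds 11 MBs in QCIF and 22 MBs in CIF.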

3.3 Motion Vectors Encoding

Differential coding is used to code motion vectors as variable length codes.
Unlike H.261, where each motion vector is predicted from the previous MB in
the GOB, H.263 employs a more flexible prediction scheme in which three
candidate predictors are used instead of one. How the candidates are chosen
depends on whether a GOB header is present.

If a GOB header is included for a GOB, motion vectors are predicted with
reference to MBs in this GOB only. But if the GOB header is not present for
the current GOB, three motion vectors must be available to decode one
macroblock, two of which come from the previous GOB. To decode the whole
inter-coded GOB, all the motion vectors of the previous GOB must be
available. This can be a major problem for a packetization scheme like the
one defined for H.261 when packetizing at MB boundaries. Suppose a packet
starts with an MB in a GOB whose header is not coded. If the previous packet
is lost, the motion vectors from the previous GOB that are needed to predict
the motion vectors of the MBs in this GOB are not available. To decode the
received MBs correctly, all the motion vectors of the previous GOB would have
to be repeated at the beginning of the packet, which would be very expensive
in terms of bandwidth.
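
The dependency on the previous GOB can be sketched in Python as follows. In
H.263 each motion vector component is predicted as the median of three
candidates: MV1 (the MB to the left), MV2 (above) and MV3 (above right). For
an MB in the top MB row of its GOB, the sketch assumes that when a GOB header
is present, MV2 and MV3 are replaced by MV1, so no vector from the previous
GOB is needed. The other substitution rules of H.263 (picture edges,
intra-coded or uncoded neighbours) are deliberately omitted, vectors are
integer half-pixel units, and the function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three integers."""
    return max(min(a, b), min(max(a, b), c))

def predict_mv(mv1, mv2, mv3, gob_header_present):
    """Predict an (x, y) motion vector for an MB in the top row of its GOB.

    mv1 is the left neighbour, mv2 the neighbour above and mv3 the neighbour
    above right; all are (x, y) tuples in half-pixel units.
    """
    if gob_header_present:
        # Assumed rule: with a GOB header, MV2 and MV3 (in the previous GOB)
        # fall back to MV1, so only MBs of the current GOB are needed.
        mv2 = mv3 = mv1
    return (median3(mv1[0], mv2[0], mv3[0]),
            median3(mv1[1], mv2[1], mv3[1]))

# Without a GOB header, two of the three candidates come from the previous GOB.
print(predict_mv((2, 0), (4, -2), (6, 2), gob_header_present=False))  # (4, 0)
print(predict_mv((2, 0), (4, -2), (6, 2), gob_header_present=True))   # (2, 0)
```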

We address these problems by recommending the use of a GOB header for
every GOB. In addition, treating GOB boundaries as picture boundaries during
encoding will further improve visual quality in the presence of packet loss.
Several simulations presented at ITU meetings on H.324 Mobile have also
demonstrated the effectiveness and desirability of this approach. The
encoding strategy of each H.263 codec implementation is beyond the scope of
this document, even though it has a very significant impact on visual
quality in the presence of packet loss.

3.4 Macroblock Address

As specified by H.261, the macroblock address (MBA) is encoded with a
variable length code to indicate the position of a macroblock within
a group of blocks in the H.261 bitstream. H.263 does not code the MBA
explicitly, but the macroblock address within a GOB is necessary to
recover the decoder state in the presence of packet loss. We propose
to include this information in the H.263 payload header for two of
the modes (Mode B and Mode C) that allow packetization at MB boundaries.
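
Because H.263 codes MB positions only implicitly, a packetizer that fragments
at MB boundaries has to compute the GOB number and MB address itself. A
minimal Python sketch follows; it assumes MBs are counted in raster order
from 0 across the picture and that MBA also counts from 0 within each GOB,
which the header field definitions do not state explicitly. The MBs-per-GOB
values are (picture width / 16) * k with the H.263 k values (1, 1, 1, 2, 4);
the names are illustrative.

```python
# MBs per GOB for each source format: (picture width / 16) * k.
MBS_PER_GOB = {"sub-QCIF": 8, "QCIF": 11, "CIF": 22, "4CIF": 88, "16CIF": 352}

def gobn_and_mba(fmt, mb_index):
    """Map a raster-order MB index (from 0) to (GOBN, MBA within that GOB)."""
    if mb_index < 0:
        raise ValueError("MB index must be non-negative")
    per_gob = MBS_PER_GOB[fmt]
    # A GOB covers whole MB rows, so raster order is also GOB-by-GOB order
    # and integer division yields both the GOB number and the MB address.
    return divmod(mb_index, per_gob)

print(gobn_and_mba("QCIF", 30))   # (2, 8)
```

With these values the address of the last MB in a GOB is 87 for 4CIF and 351
for 16CIF, so a packetizer should check that the value it writes fits the
MBA field.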

4. Usage of RTP

When transmitting H.263 video streams over the Internet, we will directly
packetize the output of the encoder. All bits of the bitstream, including
fixed-length codes and variable-length codes (Huffman codes, or arithmetic
codes if SAC is used), will be included in the packet. Also, we do not intend
to multiplex audio and video signals in the same packets, as UDP and RTP
provide a much more efficient way to achieve multiplexing.

RTP does not guarantee reliable or in-order data delivery, so a packet may
be lost in the network. To achieve best-effort recovery from packet loss,
the decoder needs assistance to continue decoding the packets that are
received. It is therefore desirable to be able to process each packet
independently of other packets. As in the H.261 RTP payload format [5], we
propose to include some frame-level information in each packet, such as the
source format and flags for optional features, to assist the decoder in
operating efficiently.

The H.263 video stream will be carried as payload data within RTP packets.
A new H.263 payload header is defined in Section 5, H.263 Payload Header.
This section defines the usage of the RTP fixed header and the H.263 video
packet structure.

4.1 RTP Header usage

Each RTP packet starts with a fixed RTP header. The following fields of
the RTP fixed header are used for H.263 video stream:

Marker bit (M bit): The Marker bit of the RTP header is set to 1 when the
current packet carries the end of the current frame, and to 0 otherwise.

Payload Type (PT): The Payload Type shall specify H.263 video payload
format.

Timestamp: The RTP timestamp encodes the sampling instant of the first
video frame contained in the RTP data packet. The RTP timestamp may be the
same on successive packets if a video frame occupies more than one packet.
For H.263 video streams, the RTP timestamp is based on a 90 kHz clock, the
same as that of the RTP payload format for H.261 streams.
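
As an illustration of the 90 kHz clock, the following Python sketch advances
the RTP timestamp from one picture to the next using the H.263 Temporal
Reference. It assumes the nominal H.263 picture clock of 30000/1001 Hz (about
29.97 Hz), so that one TR unit is 1001/30000 s, exactly 3003 ticks at 90 kHz,
and it assumes the 8-bit TR wraps modulo 256 and the 32-bit RTP timestamp
modulo 2^32. All packets of one picture would carry the same value. The
function name is hypothetical.

```python
# One TR unit is 1001/30000 s; at 90 kHz that is 90000 * 1001 / 30000 = 3003 ticks.
TICKS_PER_TR = 90000 * 1001 // 30000

def next_rtp_timestamp(prev_ts, prev_tr, tr):
    """Return the RTP timestamp of a picture, given the previous picture's
    timestamp and the 8-bit H.263 Temporal References of both pictures."""
    tr_delta = (tr - prev_tr) % 256            # TR is an 8-bit counter
    return (prev_ts + tr_delta * TICKS_PER_TR) % 2**32

print(next_rtp_timestamp(1000, 254, 1))   # 10009: TR wrapped, 3 units elapsed
```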

4.2 Video Packet Structure

The H.263 compressed bitstream is carried as payload within each RTP packet.
In each RTP packet, the RTP header is followed by the H.263 payload header,
which is followed by the standard H.263 compressed bitstream. The size of the
H.263 payload header varies with the mode used, as detailed in the next
section. The layout of the RTP H.263 video packet is shown below:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 RTP Header                                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 H.263 Payload Header                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 H.263 stream                                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
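
This layout, together with the marker bit rule of Section 4.1, can be
sketched in Python as below. The sketch builds the 12-byte RTP fixed header
of RFC 1889 (version 2, no padding, no extension, no CSRCs) in network byte
order, appends an already-encoded H.263 payload header and bitstream
fragment, and sets M only on the last packet of a frame. The payload type is
left as a parameter; the value 34 used in the test is an assumption for
illustration, and the function names are hypothetical.

```python
import struct

def rtp_header(marker, payload_type, seq, timestamp, ssrc):
    """Build a 12-byte RTP fixed header (V=2, no padding/extension/CSRC)."""
    byte0 = 2 << 6                                   # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def packetize_frame(fragments, payload_type, first_seq, timestamp, ssrc):
    """Turn one frame's (payload_header, bitstream_bytes) pairs into packets.

    All packets share the frame's timestamp; only the last one has M=1.
    """
    packets = []
    for i, (payload_header, data) in enumerate(fragments):
        last = i == len(fragments) - 1
        packets.append(rtp_header(last, payload_type, first_seq + i,
                                  timestamp, ssrc) + payload_header + data)
    return packets
```

A caller would encode the payload header of each fragment in one of the
modes of Section 5 and pass the frame's 90 kHz timestamp.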





5. H.263 Payload Header

One H.263 payload header is present for each H.263 video packet carried
within one RTP packet. Three formats (Mode A, Mode B and Mode C) are defined
for the RTP H.263 payload. In Mode A, a four-byte H.263 payload header
precedes the compressed H.263 video bitstream in the packet; it allows
fragmentation at GOB boundaries. In Mode B, an eight-byte H.263 payload
header is used, and each packet starts at an MB boundary with the PB frame
option off. Finally, Mode C, with a 12-byte header, supports fragmentation
at MB boundaries for frames that are coded with the PB frame option on.
The mode is indicated by the F and P fields in the first two bits of the
header. The three modes can be intermixed within one compressed frame. All
modes should be supported by the client application or its drivers.

In this section, the H.263 payload format is shown as rows of 32-bit
words. Within each word, the most significant byte shall be transmitted
first (network byte order), and bits within a byte shall be transmitted
most significant bit first.
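
Since F and P are the two most significant bits of the first byte, a
receiver can determine the mode, and hence the header length, from that byte
alone. A minimal Python sketch (the function name is hypothetical):

```python
HEADER_SIZES = {"A": 4, "B": 8, "C": 12}    # payload header length in bytes

def payload_mode(payload):
    """Return (mode, header_length) for an RTP payload carrying H.263."""
    if not payload:
        raise ValueError("empty payload")
    f = payload[0] >> 7                     # first bit transmitted
    p = (payload[0] >> 6) & 1               # second bit transmitted
    if f == 0:
        mode = "A"                          # F=0; P only signals PB frames
    elif p == 0:
        mode = "B"                          # F=1, P=0
    else:
        mode = "C"                          # F=1, P=1
    return mode, HEADER_SIZES[mode]
```

The H.263 data then starts at payload[header_length:].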

5.1 Mode A

In this mode, the H.263 bitstream is packetized at GOB boundaries. In other
words, each packet starts at the beginning of a GOB, and it can carry one or
more MBs or GOBs. Only four bytes are used for the header. Mode A can be used
with or without the PB frame option. This mode is recommended for GOBs that
are smaller than the network packet size. The H.263 payload header for Mode A
is defined as follows, with F=0.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|P|SBIT |EBIT | SRC | R       |I|A|S|DBQ| TRB |    TR         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

F: 1 bit
The flag bit indicates the format of the header. F=0 indicates Mode A; F=1
indicates Mode B or Mode C.

P: 1 bit
Optional PB-frame mode as defined by H.263 [4]. "0" implies a normal
I or P picture, "1" a PB-frame. When F=1, P also indicates the mode: Mode B
if P=0, Mode C if P=1.

SBIT: 3 bits
Start bit position: the number of most significant bits that should be
ignored in the first data byte.

EBIT: 3 bits
End bit position: the number of least significant bits that should be
ignored in the last data byte.
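
The effect of SBIT and EBIT can be illustrated by expanding a packet's data
bytes into the exact bit sequence they carry. A minimal Python sketch,
assuming SBIT bits are dropped from the most significant end of the first
byte and EBIT bits from the least significant end of the last byte,
consistent with the MSB-first bit order above; the function name is
hypothetical.

```python
def payload_bits(data, sbit, ebit):
    """Return the valid bits of an H.263 data fragment as a '0'/'1' string."""
    if not (0 <= sbit <= 7 and 0 <= ebit <= 7):
        raise ValueError("SBIT and EBIT are 3-bit fields")
    bits = "".join(f"{byte:08b}" for byte in data)   # MSB first
    end = len(bits) - ebit
    if sbit > end:
        raise ValueError("SBIT and EBIT overlap")
    return bits[sbit:end]
```

A depacketizer could append these bits to the decoder input in sequence
number order.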

SRC : 3 bits
Source format specifies the resolution of the frames contained, as defined
by H.263 [4].

R: 5 bits
Reserved, must be 0.

I:  1 bit.
Set to 1 if the current picture is intra-coded, 0 otherwise. Note that this
is the opposite of the picture coding type bit in PTYPE as defined within
the H.263 bitstream [4].

A: 1 bit
Optional Advanced Prediction mode as defined by H.263 [4]. "0" implies off,
"1" implies on.

S: 1 bit
Optional Syntax-based Arithmetic Coding mode as defined by H.263 [4].
"0" off, "1" on.

DBQ: 2 bits
Differential quantization parameter used to calculate the quantizer for the
B frame from the quantizer for the P frame when the PB frame option is on.
The value should be the same as DBQUANT defined by H.263 [4]. Set to 0 if
the PB frame option is off.

TRB: 3 bits
Temporal Reference for the B frame as defined by H.263 [4]. Set to zero if
the PB frame option is off.

TR: 8 bits
Temporal Reference for the P frame as defined by H.263 [4]. Set to zero if
the PB frame option is off.
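
The Mode A layout maps directly onto one 32-bit word. The following Python
sketch packs and unpacks the header from a table of field widths, with bit 0
of the diagram as the most significant bit and the word sent in network byte
order. The function names and dictionary keys are our own.

```python
import struct

# (field name, width in bits) in transmission order, from the Mode A diagram.
MODE_A_FIELDS = [("F", 1), ("P", 1), ("SBIT", 3), ("EBIT", 3), ("SRC", 3),
                 ("R", 5), ("I", 1), ("A", 1), ("S", 1), ("DBQ", 2),
                 ("TRB", 3), ("TR", 8)]

def pack_mode_a(**fields):
    """Pack a Mode A payload header into 4 bytes; missing fields default to 0."""
    fields.update(F=0, R=0)          # F=0 identifies Mode A; R is reserved
    word = 0
    for name, width in MODE_A_FIELDS:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return struct.pack("!I", word)

def unpack_mode_a(header):
    """Unpack the first 4 bytes of a payload into a dict of Mode A fields."""
    (word,) = struct.unpack("!I", header[:4])
    fields, shift = {}, 32
    for name, width in MODE_A_FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields
```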

5.2 Mode B

In this mode, the H.263 stream can be fragmented at MB boundaries. The
information needed to recover the decoder's internal state in the presence
of packet loss must therefore be carried at the start of the packet. This
mode is intended for GOBs whose sizes are larger than the maximum packet
size allowed by the underlying protocol (such as IP). It can only be used
with the PB frame option off; Mode C, defined in the next section, can be
used to fragment at MB boundaries with the PB frame option on. The H.263
payload header for Mode B is defined as follows, with F=1 and P=0:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|P|SBIT |EBIT | SRC | QUANT   |I|A|S|  GOBN   |   MBA         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| HMV1          |  VMV1         |  HMV2         |   VMV2        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



The following fields are defined the same as in Mode A: F, P, SBIT, EBIT,
SRC, I, A, S. The new fields are defined as follows:

QUANT: 5 bits
Quantization value for the first MB coded at the start of the packet.
Set to 0 if the packet begins with a GOB header. This is the equivalent
of GQUANT defined by H.263 [4].

GOBN: 5 bits
GOB number in effect at the start of the packet. GOB number is specified
differently for different resolutions. See H.263 [4] for details.

MBA: 8 bits
The absolute address of the first MB within its GOB. Unlike in H.261,
the MB address is not coded explicitly in the bitstream; instead, the MB
position is determined relative to the immediately preceding MB. To continue
decoding when the previous packet is lost, the address, within its GOB, of
the first MB in the packet is coded in the header.

HMV1, VMV1: 8 bits each
Horizontal and vertical motion vector predictors for the first MB coded in
this packet, taken from the MB on its left. These are the same as MV1
defined by H.263 [4]. Because predictors from the previous GOB are not
carried, we strongly recommend using a GOB header for every GOB.

HMV2, VMV2: 8 bits each
Horizontal and vertical motion vector predictors from the block or MB on the
left of block number 3 in the current MB, when the advanced prediction option
is on. They are the same as MV1 defined for block number 3 in H.263 [4]. They
are needed because block number 3 in the first MB, like block number 1, needs
the motion vector predictor from the block to its left. These two fields are
not used when advanced prediction is off and must then be set to 0. See
H.263 [4] for the block organization in a frame.
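
The same table-driven approach extends to the two words of Mode B. This
Python sketch packs and parses the 8-byte header. The draft does not say how
the 8-bit motion vector predictors are represented; the sketch assumes two's
complement integers (for example in half-pixel units), which should be
treated as an assumption. The function names are our own.

```python
import struct

# (field name, width in bits) in transmission order, from the Mode B diagram.
MODE_B_FIELDS = [("F", 1), ("P", 1), ("SBIT", 3), ("EBIT", 3), ("SRC", 3),
                 ("QUANT", 5), ("I", 1), ("A", 1), ("S", 1), ("GOBN", 5),
                 ("MBA", 8), ("HMV1", 8), ("VMV1", 8), ("HMV2", 8), ("VMV2", 8)]
SIGNED = {"HMV1", "VMV1", "HMV2", "VMV2"}    # assumed two's complement

def pack_mode_b(**fields):
    """Pack a Mode B payload header (F=1, P=0) into 8 bytes."""
    fields.update(F=1, P=0)
    if not fields.get("A", 0):
        fields["HMV2"] = fields["VMV2"] = 0  # must be 0 without advanced prediction
    value = 0
    for name, width in MODE_B_FIELDS:
        v = fields.get(name, 0)
        lo, hi = ((-(1 << (width - 1)), 1 << (width - 1)) if name in SIGNED
                  else (0, 1 << width))
        if not lo <= v < hi:
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        value = (value << width) | (v & ((1 << width) - 1))
    return struct.pack("!Q", value)

def unpack_mode_b(header):
    """Parse the first 8 bytes of a payload as a Mode B header."""
    (value,) = struct.unpack("!Q", header[:8])
    fields, shift = {}, 64
    for name, width in MODE_B_FIELDS:
        shift -= width
        v = (value >> shift) & ((1 << width) - 1)
        if name in SIGNED and v >= 1 << (width - 1):
            v -= 1 << width                  # undo two's complement
        fields[name] = v
    return fields
```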

5.3 Mode C

In this mode, the H.263 stream can be fragmented at MB boundaries of P frames
when the PB frame option is on. It is intended for GOBs whose sizes are
larger than the maximum packet size allowed by the underlying protocol (such
as IP). The H.263 payload header for Mode C is defined as follows, with F=1
and P=1:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|P|SBIT |EBIT | SRC | QUANT   |I|A|S|  GOBN   |   MBA         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| HMV1          |  VMV1         |  HMV2         |   VMV2        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| TR            |DBQ| TRB | R                                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


The following fields are defined the same as in Mode A: F, P, SBIT, EBIT, SRC,
I, A, S, TR, DBQ and TRB. The remaining fields are defined the same as in
Mode B, except that the 19-bit field R is reserved and must be set to zero.
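
Because Mode C is the Mode B layout followed by a third word, it can be
handled by the same kind of field table, here covering all 96 bits and
converted with Python's int.to_bytes and int.from_bytes. The motion vector
predictors are assumed to be two's complement values, which this draft does
not state; the function names are our own.

```python
# (field name, width in bits) in transmission order, from the Mode C diagram.
MODE_C_FIELDS = [("F", 1), ("P", 1), ("SBIT", 3), ("EBIT", 3), ("SRC", 3),
                 ("QUANT", 5), ("I", 1), ("A", 1), ("S", 1), ("GOBN", 5),
                 ("MBA", 8), ("HMV1", 8), ("VMV1", 8), ("HMV2", 8), ("VMV2", 8),
                 ("TR", 8), ("DBQ", 2), ("TRB", 3), ("R", 19)]
SIGNED = {"HMV1", "VMV1", "HMV2", "VMV2"}    # assumed two's complement
assert sum(width for _, width in MODE_C_FIELDS) == 96   # three 32-bit words

def pack_mode_c(**fields):
    """Pack a Mode C payload header (F=1, P=1) into 12 bytes."""
    fields.update(F=1, P=1, R=0)             # mode bits; R is reserved
    if not fields.get("A", 0):
        fields["HMV2"] = fields["VMV2"] = 0  # must be 0 without advanced prediction
    value = 0
    for name, width in MODE_C_FIELDS:
        v = fields.get(name, 0)
        lo, hi = ((-(1 << (width - 1)), 1 << (width - 1)) if name in SIGNED
                  else (0, 1 << width))
        if not lo <= v < hi:
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        value = (value << width) | (v & ((1 << width) - 1))
    return value.to_bytes(12, "big")

def unpack_mode_c(header):
    """Parse the first 12 bytes of a payload as a Mode C header."""
    value = int.from_bytes(header[:12], "big")
    fields, shift = {}, 96
    for name, width in MODE_C_FIELDS:
        shift -= width
        v = (value >> shift) & ((1 << width) - 1)
        if name in SIGNED and v >= 1 << (width - 1):
            v -= 1 << width                  # undo two's complement
        fields[name] = v
    return fields
```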

5.4 Selection of Modes for H.263 Payload Header

Modes A, B and C can be intermixed. The modes shall be selected carefully
based on performance characteristics, H.263 coding modes and underlying
network protocols.

The major advantage of Mode A over Modes B and C is its simplicity. Its header
is one half and one third of the size of the Mode B and Mode C headers,
respectively. Transmission overhead is reduced, and the savings may be very
significant at very low bit rates, especially when low latency is desired.
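
To make the overhead comparison concrete, the following Python sketch
computes the fraction of each packet taken up by headers in the three modes.
It assumes IPv4 without options (20 bytes), UDP (8 bytes) and an RTP fixed
header without CSRCs (12 bytes); the 200-byte payload size is an arbitrary
example.

```python
IP_UDP_RTP = 20 + 8 + 12                 # IPv4 (no options) + UDP + RTP fixed header
PAYLOAD_HEADER = {"A": 4, "B": 8, "C": 12}

def header_overhead(mode, h263_bytes):
    """Fraction of a packet occupied by IP/UDP/RTP and H.263 payload headers."""
    headers = IP_UDP_RTP + PAYLOAD_HEADER[mode]
    return headers / (headers + h263_bytes)

for mode in "ABC":
    print(f"Mode {mode}: {header_overhead(mode, 200):.1%}")
```

For 200 bytes of H.263 data per packet, the loop prints 18.0%, 19.4% and
20.6% for Modes A, B and C; the gap widens as packets get smaller.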

Another advantage of Mode A is that it simplifies error recovery in the
presence of packet loss. The internal state of the decoder can be recovered
at GOB boundaries instead of at MBs as in Modes B and C, because GOB headers
and picture start codes are easy to identify. This mode is recommended for
GOBs whose size is smaller than the network packet size. The major
disadvantages of Mode A are its lack of flexibility in packetization and the
fact that it does not work when a GOB is larger than the network packet size.
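
Recovery at GOB boundaries relies on finding start codes in the received
bits. As a sketch, assuming (as we read H.263 [4]) that a GOB start code is
the 17-bit pattern of sixteen zeros followed by a one, followed by a 5-bit
GOB number, and that a picture start code is the same pattern followed by
GOB number 0, a decoder could locate them at any bit position, since they
need not be byte aligned. The function name is hypothetical.

```python
START = "0" * 16 + "1"                   # 17-bit start code prefix

def find_start_codes(data):
    """Return (bit_offset, gob_number) for each start code found in data.

    gob_number 0 marks a picture start code; other values mark GOB headers.
    """
    bits = "".join(f"{b:08b}" for b in data)     # MSB first
    found = []
    pos = bits.find(START)
    while pos != -1 and pos + 22 <= len(bits):   # need the 5-bit GOB number too
        found.append((pos, int(bits[pos + 17:pos + 22], 2)))
        pos = bits.find(START, pos + 17)
    return found
```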

Mode B has the advantage of flexibility, with fragmentation at MB boundaries
and the PB frame option off. This mode is necessary when a GOB is larger
than the network packet size. Its disadvantage is the higher overhead of its
8-byte header, which may be undesirable for small packets. Mode C is the
same as Mode B, except that it allows fragmentation with the PB frame option
on, at the price of 4 additional bytes.
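
The trade-offs above suggest a simple per-packet decision rule. The
following Python sketch is only one possible policy, assuming the packetizer
knows the size of the GOB data to send and the room left after the RTP
header; the function name and parameters are hypothetical.

```python
HEADER_SIZE = {"A": 4, "B": 8, "C": 12}   # payload header bytes per mode

def choose_mode(gob_bytes, room, pb_frames):
    """Pick a payload header mode for a packet that starts at a GOB boundary.

    gob_bytes: size of the GOB (or run of GOBs) to send
    room: bytes available after the RTP header for payload header + data
    pb_frames: whether the picture is coded with the PB frame option
    """
    if gob_bytes + HEADER_SIZE["A"] <= room:
        return "A"                          # whole GOB fits: short header
    # Otherwise the GOB must be fragmented at MB boundaries.
    return "C" if pb_frames else "B"
```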

Finally, we emphasize that recovery from packet loss will depend on the
decoder's ability to use the information provided in the H.263 payload
header within the RTP packets.

6. References

[1] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A Transport
    Protocol for Real-Time Applications", RFC 1889, January 1996.

[2] "Video Codec for Audiovisual Services at p x 64 kbit/s", ITU-T
    Recommendation H.261, 1993.

[3] H. Schulzrinne, "RTP Profile for Audio and Video Conferences with
    Minimal Control", RFC 1890, January 1996.

[4] "Video Coding for Low Bitrate Communication", ITU-T Recommendation H.263
    (draft), 1995.

[5] T. Turletti, C. Huitema, "RTP Payload Format for H.261 Video Stream",
    1995.


7. Author's Address

Chunrong "Chad" Zhu
Mail Stop: JF2-78
Intel Architecture Lab
Intel Corporation
2111 N.E. 25th Avenue
Hillsboro, OR 97124
USA

Tel: (503) 264-8849
Fax: (503) 264-6067
Email: czhu@ibeam.intel.com



8. Table of Contents

1.  Introduction
2.  Definitions
3.  Design Issues for Packetizing H.263 Bitstream
3.1 Optional Features of H.263
3.2 GOB Numbering
3.3 Motion Vectors Encoding
3.4 Macroblock Address
4.  Usage of RTP
4.1 RTP Header usage
4.2 Video Packet Structure
5.  H.263 Payload Header
5.1 Mode A
5.2 Mode B
5.3 Mode C
5.4 Selection of Modes for H.263 Payload Header
6.  References
7.  Author's Address