RTP Payload Format for H.263 Video Streams
RFC 2190

Document Type RFC - Historic (September 1997; No errata)
Author Chunrong Zhu 
Last updated 2013-03-02
Stream Internet Engineering Task Force (IETF)
Network Working Group                                             C. Zhu
Request for Comments: 2190                                   Intel Corp.
Category: Standards Track                                 September 1997

               RTP Payload Format for H.263 Video Streams

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract


   This document specifies the payload format for encapsulating an H.263
   bitstream in the Real-Time Transport Protocol (RTP). Three modes are
   defined for the H.263 payload header. An RTP packet can use one of
   the three modes for H.263 video streams depending on the desired
   network packet size and H.263 encoding options employed. The shortest
   H.263 payload header (mode A) supports fragmentation at Group of
   Blocks (GOB) boundaries. The long H.263 payload headers (modes B and
   C) support fragmentation at Macroblock (MB) boundaries.

1. Introduction

   This document describes a scheme to packetize an H.263 video stream
   for transport using RTP [1]. The H.263 video stream is defined by
   ITU-T Recommendation H.263 (referred to as H.263 in this document)
   [4] for video coding at very low data rates. RTP is defined by the
   Internet Engineering Task Force (IETF) to provide end-to-end network
   transport functions suitable for applications transmitting real-time
   data over multicast or unicast network services.

2. Definitions

   The following definitions apply in this document:

   CIF: Common Intermediate Format. For H.263, a CIF picture has 352 x
   288 pixels for luminance, and 176 x 144 pixels for chrominance.

   QCIF: Quarter CIF source format with 176 x 144 pixels for luminance
   and 88 x 72 pixels for chrominance.

   Sub-QCIF: Picture source format with 128 x 96 pixels for luminance
   and 64 x 48 pixels for chrominance.

Zhu                         Standards Track                     [Page 1]
RFC 2190       RTP Payload Format for H.263 Video Streams September 1997

   4CIF: Picture source format with 704 x 576 pixels for luminance and
   352 x 288 pixels for chrominance.

   16CIF: Picture source format with 1408 x 1152 pixels for luminance
   and 704 x 576 pixels for chrominance.

   GOB: For H.263, a Group of Blocks (GOB) consists of k*16 lines,
   where k depends on the picture format (k=1 for sub-QCIF, QCIF and
   CIF; k=2 for 4CIF; k=4 for 16CIF).

   MB: A macroblock (MB) contains four blocks of luminance and the
   spatially corresponding two blocks of chrominance. Each block
   consists of 8x8 pixels. For example, there are eleven MBs in a GOB in
   QCIF format and twenty-two MBs in a GOB in CIF format.
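   The definitions above can be cross-checked with a short sketch. It
   is illustrative only and not part of the payload format: the names
   PictureFormat, FORMATS, gob_count and mbs_per_gob are hypothetical,
   and the sketch assumes that a macroblock covers 16x16 luminance
   pixels (four 8x8 blocks) and that a GOB spanning k*16 lines holds k
   rows of macroblocks.

```python
from dataclasses import dataclass

# Assumption: four 8x8 luminance blocks form one 16x16 macroblock.
MB_SIZE = 16


@dataclass(frozen=True)
class PictureFormat:
    """Hypothetical helper describing one H.263 source format."""
    name: str
    width: int   # luminance width in pixels
    height: int  # luminance height in pixels
    k: int       # a GOB consists of k*16 luminance lines

    @property
    def chroma_size(self):
        # Chrominance is half the luminance size in each dimension.
        return (self.width // 2, self.height // 2)

    @property
    def gob_count(self):
        # Number of GOBs needed to cover the picture height.
        return self.height // (self.k * MB_SIZE)

    @property
    def mbs_per_gob(self):
        # Assumption: a GOB holds k full rows of macroblocks.
        return (self.width // MB_SIZE) * self.k


# Luminance sizes and k values are taken from the definitions above.
FORMATS = {
    f.name: f
    for f in (
        PictureFormat("sub-QCIF", 128, 96, 1),
        PictureFormat("QCIF", 176, 144, 1),
        PictureFormat("CIF", 352, 288, 1),
        PictureFormat("4CIF", 704, 576, 2),
        PictureFormat("16CIF", 1408, 1152, 4),
    )
}

if __name__ == "__main__":
    for fmt in FORMATS.values():
        print(fmt.name, fmt.chroma_size, fmt.gob_count, fmt.mbs_per_gob)
```

   Under these assumptions the sketch reproduces the figures given in
   the MB definition: eleven MBs per GOB for QCIF and twenty-two for
   CIF.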

3. Design Issues for Packetizing H.263 Bitstreams

   H.263 is based on the ITU-T Recommendation H.261 [2] (referred to as
   H.261 in this document). Compared to H.261, H.263 employs similar
   techniques to reduce both temporal and spatial redundancy, but there
   are several major differences between the two algorithms that affect
   the design of packetization schemes significantly. This section
   summarizes those differences.

3.1 Optional Features of H.263

   In addition to the basic source coding algorithms, H.263 supports
   four negotiable coding options to improve performance: Advanced
   Prediction, PB-frames, Syntax-based Arithmetic Coding, and
   Unrestricted Motion Vectors. They can be used in any combination.
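
   Because the four options are independent, a packetizer may find it
   convenient to track them as a set of flags. The following is a
   minimal sketch under that assumption; the names H263Option and
   all_option_sets are hypothetical and do not reflect any bit layout
   defined by H.263 or by this document.

```python
from enum import Flag, auto


class H263Option(Flag):
    """Hypothetical flag set for the four negotiable coding options."""
    ADVANCED_PREDICTION = auto()
    PB_FRAMES = auto()
    SYNTAX_BASED_ARITHMETIC_CODING = auto()
    UNRESTRICTED_MOTION_VECTORS = auto()


def all_option_sets():
    """Return every combination of the four options, including none."""
    # The four flags take the values 1, 2, 4 and 8, so the integers
    # 0..15 enumerate each possible combination exactly once.
    return [H263Option(value) for value in range(16)]


if __name__ == "__main__":
    in_use = H263Option.PB_FRAMES | H263Option.ADVANCED_PREDICTION
    print(H263Option.PB_FRAMES in in_use)                    # membership test
    print(H263Option.UNRESTRICTED_MOTION_VECTORS in in_use)  # not selected
    print(len(all_option_sets()))
```

   Since any combination is allowed, there are sixteen possible option
   sets, counting the case in which no option is used.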

   Advanced Prediction (AP): One or four motion vectors can be used for
   some macroblocks in a frame. This feature makes recovery from packet
   loss difficult, because more redundant information has to be
   preserved at the beginning of a packet when fragmenting at a
   macroblock boundary.

   PB-frames: Two frames (a P frame and a B frame) are coded into one
   bitstream with macroblocks from the two frames interleaved. From a
   packetization point of view, an MB from the P frame and an MB from
   the B frame must be treated together because each MB for the B frame
   is coded based on the corresponding MB for the P frame. A means must
   be provided to ensure proper rendering of the two frames in the right
   order. Also, if part of this combined bitstream is lost, it will
   affect both frames, and possibly more.

   Syntax-based Arithmetic Coding (SAC): When the SAC option is used,
   the resultant run-value pair after quantization of Discrete Cosine
   Transform (DCT) coefficients will be coded differently from Huffman
   codes, but the macroblock hierarchy will be preserved. Since context
   variables are only synchronized after fixed length codes in the