Network Working Group
   Internet Draft                                            D. Hoffman
   Document: draft-ietf-avt-mpeg1and2-mod-00.txt            G. Fernando
   Expires: March 2004                           Sun Microsystems, Inc.
                                                               V. Goyal
                                                    Packet Design, Inc.
                                                         M. R. Civanlar
                                                         Koç University
                                                            August 2003



                    RTP Payload Format for MPEG1/MPEG2


STATUS OF THIS MEMO

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
        http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
        http://www.ietf.org/shadow.html.


Abstract

   This memo describes a packetization scheme for MPEG video and audio
   streams.  The scheme proposed can be used to transport such a video
   or audio flow over the transport protocols supported by RTP.  Two
   approaches are described. The first is designed to support maximum
   interoperability with MPEG System environments.  The second is
   designed to provide maximum compatibility with other RTP-encapsulated
   media streams and future conference control work of the IETF.

   Most of this memo is identical to RFC 2250, an Internet standards
   track RTP payload format definition. No changes have been made in the
   packet formats on the wire. The main reason for this revision is to
   allow the use of this payload format with dynamic payload types


Hoffman/Fernando/Goyal/Civanlar                               [Page 1]


Internet Draft    RTP Payload Format for MPEG1/MPEG2         July 2003


   that can specify the timestamp clock frequency by non-RTP means for
   improved jitter compensation. We used this opportunity to improve the
   description of the payload format specification by clarifying some
   wording that has been reported to be problematic.

Table of Contents

   1. Introduction
   2. Encapsulation of MPEG System and Transport Streams
      2.1 RTP header usage
   3. Encapsulation of MPEG Elementary Streams
      3.1 MPEG Video elementary streams
      3.2 MPEG Audio elementary streams
      3.3 RTP Fixed Header for MPEG ES encapsulation
      3.4 MPEG Video-specific header
      3.4.1 MPEG-2 Video-specific header extension
      3.5 MPEG Audio-specific header
   A. Error Recovery and Resynchronization Strategies
   B. Changes from RFC 2250
   C. Security Considerations
   D. References
   E. Author's Addresses

1. Introduction

   [Note to the RFC Editor: This paragraph is to be deleted when this
   draft is published as an RFC. Readers are directed to Appendix B,
   Changes from RFC 2250, for a listing of the changes that have been
   made in this draft.]

   ISO/IEC JTC1/SC29 WG11 (also referred to as the MPEG committee) has
   defined the MPEG1 standard (ISO/IEC 11172)[1] and the MPEG2 standard
   (ISO/IEC 13818)[2].  This memo describes a packetization scheme to
   transport MPEG video and audio streams using the Real-time Transport
   Protocol (RTP), version 2 [3, 4].

   The MPEG1 specification is defined in three parts: System, Video and
   Audio.  It is designed primarily for CD-ROM-based applications, and
   is optimized for approximately 1.5 Mbits/sec combined data rates. The
   video and audio portions of the specification describe the basic
   format of the video or audio stream.  These formats define the
   Elementary Streams (ES).  The MPEG1 System specification defines an
   encapsulation of the ES that contains Presentation Time Stamps (PTS),
   Decoding Time Stamps and System Clock references, and performs
   multiplexing of MPEG1 compressed video and audio ESs with user data.

   The MPEG2 specification is structured in a similar way. However, it
   is not restricted to CD-ROM applications. The MPEG2 System
   specification defines two system stream formats:  the MPEG2 Transport
   Stream (MTS) and the MPEG2 Program Stream (MPS).  The MTS is tailored
   for communicating or storing one or more programs of MPEG2 compressed
   data and also other data in relatively error-prone environments. The
   MPS is tailored for relatively error-free environments.

   We seek to achieve interoperability among 4 types of end-systems in
   the following specification. The 4 types are:

      1. Transmitting Interworking Unit (TIU)

         Receives MPEG information from a native MTS system for
         distribution over packet networks using a native RTP-based
         system layer (such as an IP-based internetwork). Examples:
         real-time encoder, MTS satellite link to Internet, video
         server with MTS-encoded source material.

      2. Receiving Interworking Unit (RIU)

         Receives MPEG information in real time from an RTP-based
         network for forwarding to a native MTS environment.
         Examples: Internet-based video server to MTS-based cable
         distribution plant.

      3. Transmitting Internet End-System (TAES)

         Transmits MPEG information generated or stored within the
         internet end-system itself, or received from internet-based
         computer networks.  Example: video server.

      4. Receiving Internet End-System (RAES)

         Receives MPEG information over an RTP-based internet for
         consumption at the internet end-system or forwarding to
         a traditional computer network.  Example: desktop PC or
         workstation viewing training video.

   Each of the 2 types of transmitters must work with each of the 2
   types of receivers.  Because it is probable that the TAES, and
   certain that the RAES, will be based on existing and planned
   internet-connected computers, it is highly desirable for the
   interoperable protocol to be based on RTP.

   Because of the range of applications that might employ MPEG streams,
   we propose to define two payload formats.

   Much interest in the MPEG community is in the use of one of the MPEG
   System encodings, and hence, in Section 2 we propose encapsulations
   of MPEG1 System streams and MPEG2 Transport and Program Streams with
   RTP.  This profile supports the full semantics of MPEG System and
   offers basic interoperability among all four end-system types.

   When operating only among internet-based end-systems (i.e., TAES and
   RAES) a payload format that provides greater compatibility with the
   Internet architecture is desired, deferring some of the system issues
   to other protocols being defined in the Internet community (such as
   the MMUSIC WG).  In Section 3 we propose an encapsulation of
   compressed video and audio data (referred to in MPEG documentation as
   "Elementary Streams" (ES)) complying with either MPEG1 or MPEG2.

   Here, neither the MPEG1 nor the MPEG2 System standard is used.
   The ESs are directly encapsulated with RTP.

   Throughout this specification, we make extensive use of MPEG
   terminology.  The reader should consult the primary MPEG references
   for definitive descriptions of this terminology.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119 [5]
   and indicate requirement levels for compliant RTP implementations.

2. Encapsulation of MPEG System and Transport Streams

   Each RTP packet will contain a timestamp derived from the sender's
   clock reference.  This clock is synchronized to the system stream
   Program Clock Reference (PCR) or System Clock Reference (SCR) and
   represents the target transmission time of the first byte of the
   packet payload.  The RTP timestamp will not be passed to the MPEG
   decoder.  This use of the timestamp differs somewhat from normal RTP
   practice, in that it is not considered to be the media display or
   presentation timestamp. The primary purposes of the
   RTP timestamp will be to estimate and reduce any network-induced
   jitter and to synchronize relative time drift between the transmitter
   and receiver.

   For MPEG2 Transport Streams the RTP payload will contain an integral
   number of MPEG transport packets.  To avoid end system
   inefficiencies, data from multiple small MTS packets (normally fixed
   in size at 188 bytes) are aggregated into a single RTP packet.  The
   number of transport packets contained is computed by dividing RTP
   payload length by the length of an MTS packet (188).
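
   As an illustration only, a receiver might recover the individual
   transport packets as sketched below.  The function name is
   hypothetical, and the check for the 0x47 sync byte is an assumption
   taken from the MPEG2 Systems specification rather than a
   requirement of this payload format.

```python
from typing import List

TS_PACKET_SIZE = 188  # fixed MTS packet length stated in Section 2
TS_SYNC_BYTE = 0x47   # sync byte per the MPEG2 Systems spec, not this memo


def split_transport_packets(rtp_payload: bytes) -> List[bytes]:
    """Split an RTP payload into the MPEG2 transport packets it carries."""
    if len(rtp_payload) % TS_PACKET_SIZE != 0:
        raise ValueError("payload is not an integral number of MTS packets")
    # Number of transport packets = payload length / 188.
    count = len(rtp_payload) // TS_PACKET_SIZE
    packets = [
        rtp_payload[i * TS_PACKET_SIZE:(i + 1) * TS_PACKET_SIZE]
        for i in range(count)
    ]
    # Optional sanity check (an assumption of this sketch, not a rule of
    # the payload format): every transport packet starts with 0x47.
    for packet in packets:
        if packet[0] != TS_SYNC_BYTE:
            raise ValueError("transport packet does not start with 0x47")
    return packets
```

   With a 1316-byte payload, for instance, the sketch yields seven
   transport packets.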

   For MPEG2 Program streams and MPEG1 system streams there are no
   packetization restrictions; these streams are treated as a packetized
   stream of bytes.

2.1 RTP header usage

   The RTP header fields are used as follows:

     Payload Type: Distinct payload types MUST be assigned for MPEG1
     System Streams, MPEG2 Program Streams and MPEG2 Transport Streams.
     See [4] for payload type assignments.

     M bit:  Set to 1 whenever the timestamp is discontinuous (such as
     might happen when a sender switches from one data source to
     another). This allows the receiver and any intervening RTP mixers
     or translators that are synchronizing to the flow to ignore the
     difference between this timestamp and any previous timestamp in
     their clock phase detectors.

     timestamp: 32 bit timestamp representing the target transmission
     time for the first byte of the packet. For the payload type MP2T
     defined in [4], the clock frequency used for the timestamp is 90
     kHz. However, this payload format MAY be used with a dynamic
     payload type where the clock frequency can be specified through
     non-RTP means, e.g., SDP [6].
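
   The header usage above can be sketched as follows.  This is a
   minimal illustration, not a complete RTP implementation: the
   function names are hypothetical, the 12-byte fixed header layout is
   that of the RTP specification [3] with no CSRC list or header
   extension, and the payload type value 33 is the static MP2T
   assignment of the RTP A/V profile [4] as we understand it.

```python
import struct

MP2T_PAYLOAD_TYPE = 33   # static MP2T type in the RTP A/V profile (assumed)
MP2T_CLOCK_HZ = 90_000   # clock frequency for the static MP2T type


def rtp_timestamp(transmit_time_s: float,
                  clock_hz: int = MP2T_CLOCK_HZ) -> int:
    """Convert a target transmission time (seconds) to a 32-bit timestamp."""
    return int(round(transmit_time_s * clock_hz)) & 0xFFFFFFFF


def build_rtp_header(seq: int, transmit_time_s: float, ssrc: int,
                     discontinuity: bool = False,
                     payload_type: int = MP2T_PAYLOAD_TYPE,
                     clock_hz: int = MP2T_CLOCK_HZ) -> bytes:
    """Build a 12-byte RTP fixed header for a system-stream packet."""
    first = 0x80  # version 2, no padding, no extension, no CSRCs
    # The M bit is set only when the timestamp is discontinuous.
    second = (0x80 if discontinuity else 0x00) | (payload_type & 0x7F)
    return struct.pack("!BBHII", first, second, seq & 0xFFFF,
                       rtp_timestamp(transmit_time_s, clock_hz),
                       ssrc & 0xFFFFFFFF)
```

   A dynamic payload type would pass its own payload_type and the
   clock_hz value negotiated by non-RTP means.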

3. Encapsulation of MPEG Elementary Streams

   The following ES types may be encapsulated directly in RTP:

   (a) MPEG1 Video (ISO/IEC 11172-2)
   (b) MPEG2 Video (ISO/IEC 13818-2)
   (c) MPEG1 Audio (ISO/IEC 11172-3)
   (d) MPEG2 Audio (ISO/IEC 13818-3)

   A distinct RTP payload type is assigned to MPEG1/MPEG2 Video and
   MPEG1/MPEG2 Audio, respectively. Further indication as to whether the
   data is MPEG1 or MPEG2 need not be provided in the RTP or MPEG
   specific headers of this encapsulation, as this information is
   available in the ES headers.

   Presentation Time Stamps (PTS) of 32 bits with an accuracy of 90 kHz
   for MPV and MPA payload types as defined in [4] shall be carried in
   the fixed RTP header. The accuracy of the timestamp MAY be defined by
   non-RTP means using dynamic payload types with the payload formats
   defined in this section. All packets that make up an audio or video
   frame shall have the same time stamp.

3.1 MPEG Video elementary streams

   MPEG1 Video can be distinguished from MPEG2 Video at the video
   sequence header, i.e. for MPEG2 Video a sequence_header() is followed
   by sequence_extension().  The particular profile and level of MPEG2
   Video (MAIN_Profile@MAIN_Level, HIGH_Profile@HIGH_Level, etc.) are
   determined by the profile_and_level_indicator field of the
   sequence_extension header of MPEG2 Video.

   The MPEG bit-stream semantics were designed for relatively error-free
   environments, and there is a significant amount of dependency (both
   temporal and spatial) within the stream, such that loss of some data
   makes other uncorrupted data useless.  The format as defined in this
   encapsulation uses application layer framing information plus
   additional information in the RTP stream-specific header to allow for
   certain recovery mechanisms.  Appendix A suggests several recovery
   strategies based on the properties of this encapsulation.

   Since MPEG pictures can be large, they will normally be fragmented
   into packets of size less than a typical LAN/WAN MTU.  The following
   fragmentation rules apply:

         1. The MPEG Video_Sequence_Header, when present, will always be
         at the beginning of an RTP payload.

         2. An MPEG GOP_header, when present, will always be at the
         beginning of the RTP payload, or will follow a
         Video_Sequence_Header.

         3. An MPEG Picture_Header, when present, will always be at the
         beginning of an RTP payload, or will follow a GOP_header.

   Each ES header must be completely contained within the packet.
   Consequently, a minimum RTP payload size of 261 bytes must be
   supported to contain the largest single header defined in the ES
   (that is, the extension_data() header containing the
   quant_matrix_extension()).  Otherwise, there are no restrictions on
   where headers may appear within packet payloads.
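
   The three fragmentation rules above can be checked mechanically.
   The sketch below is one possible way to do so and is not part of
   the payload format.  The start code values come from the MPEG video
   specifications [1, 2], not from this memo, and the function names
   are hypothetical.  The sketch assumes that a payload carries the
   bare ES bytes (the MPEG Video-specific header of Section 3.4 has
   already been removed), that no zero stuffing precedes a start code,
   and that extension and user data start codes do not count when
   deciding which header a GOP_header or Picture_Header "follows".

```python
SEQUENCE_HEADER = 0xB3  # sequence_header_code  (00 00 01 B3)
GOP_HEADER = 0xB8       # group_start_code      (00 00 01 B8)
PICTURE_HEADER = 0x00   # picture_start_code    (00 00 01 00)
IGNORED = {0xB2, 0xB5}  # user data and extension start codes


def find_start_codes(payload: bytes):
    """Return (offset, code) for every start code in the payload."""
    codes = []
    pos = payload.find(b"\x00\x00\x01")
    while pos != -1 and pos + 3 < len(payload):
        codes.append((pos, payload[pos + 3]))
        pos = payload.find(b"\x00\x00\x01", pos + 3)
    return codes


def obeys_fragmentation_rules(payload: bytes) -> bool:
    """Check rules 1-3 of Section 3.1 for one RTP payload."""
    prev = None  # last start code seen, ignoring extensions/user data
    for offset, code in find_start_codes(payload):
        if code in IGNORED:
            continue
        at_start = offset == 0
        if code == SEQUENCE_HEADER and not at_start:
            return False  # rule 1
        if code == GOP_HEADER and not (at_start or prev == SEQUENCE_HEADER):
            return False  # rule 2
        if code == PICTURE_HEADER and not (at_start or prev == GOP_HEADER):
            return False  # rule 3
        prev = code
    return True
```

   A payload that begins with the tail of a slice and then carries a
   Picture_Header fails the check, because that header should have
   started a new RTP payload.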

   In MPEG, each picture is made up of one or more "slices," and a slice
   is intended to be the unit of recovery from data loss or corruption.
   An MPEG-compliant decoder will normally advance to the beginning of
   the next slice whenever an error is encountered in the stream.  MPEG
   slice begin and end bits are provided in the encapsulation header to
   facilitate this.

   The beginning of a slice must either be the first data in a packet
   (after any MPEG ES headers) or must follow after some integral number
   of slices in a packet.  This requirement ensures that the beginning
   of the next slice after one with a missing packet can be found
   without requiring that the receiver scan the packet contents.  Slices
   may be fragmented across packets as long as all the above rules are
   met.

   An implementation based on this encapsulation assumes that the
   Video_Sequence_Header is repeated periodically in the MPEG bit
   stream.  In practice (though not required by the MPEG standard) this
   is used to allow channel switching and to receive and start decoding
   a continuously relayed MPEG bit-stream at arbitrary points in the
   media stream.  It is suggested that, when playing back an MPEG
   stream from a file format (where the Video_Sequence_Header may only
   be present at the beginning of the stream), the first
   Video_Sequence_Header (preceded by an end-of-stream indicator) be
   saved by the packetizer for periodic injection into the network
   stream.

3.2 MPEG Audio elementary streams

   MPEG1 Audio can be distinguished from MPEG2 Audio from the MPEG
   ancillary_data() header.  For either MPEG1 or MPEG2 Audio, distinct
   Presentation Time Stamps may be present for frames which correspond
   to either 384 samples for Layer-I, or 1152 samples for Layer-II or
   Layer-III.  The actual number of bytes required to represent this
   number of samples will vary depending on the encoder parameters.

   Multiple audio frames may be encapsulated within one RTP packet.  In
   this case, an integral number of audio frames must be contained
   within the packet and the Frag_offset field of the header defined in
   Section 3.5 shall be set to 0.

   If, however, an audio frame is too large to fit inside a single RTP
   packet, it is fragmented across multiple successive RTP packets.  For
   example, for Layer-II MPEG audio sampled at a rate of 44.1 kHz, each
   frame would represent a time slot of 26.1 msec. At this sampling
   rate, if the compressed bit-rate is 384 kbits/sec (i.e., 48
   kBytes/sec), then the average audio frame size would be 1.25 kBytes.
   If packets were to be 500 bytes long, then each audio frame would
   straddle 3 RTP packets.
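
   The arithmetic in this example can be reproduced with the short
   sketch below.  The function names are hypothetical; the samples per
   frame are those given in this section, and the result is an average
   because actual frame sizes depend on the encoder parameters.

```python
import math

# Samples per audio frame as stated in Section 3.2.
SAMPLES_PER_FRAME = {1: 384, 2: 1152, 3: 1152}  # Layer-I, -II, -III


def frame_duration_ms(layer: int, sample_rate_hz: int) -> float:
    """Time slot represented by one audio frame, in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME[layer] / sample_rate_hz


def average_frame_bytes(layer: int, sample_rate_hz: int,
                        bit_rate_bps: int) -> float:
    """Average compressed size of one audio frame, in bytes."""
    return bit_rate_bps / 8 * SAMPLES_PER_FRAME[layer] / sample_rate_hz


def packets_per_frame(frame_bytes: float, data_bytes_per_packet: int) -> int:
    """Number of RTP packets that one frame straddles."""
    return math.ceil(frame_bytes / data_bytes_per_packet)
```

   For Layer-II at 44.1 kHz and 384 kbits/sec this gives a frame
   duration of about 26.1 msec, an average frame of roughly 1254
   bytes, and three packets when each carries 500 bytes of audio data.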

   In this case, the "Frag_offset" field in the "MPEG Audio-specific
   header" (See Section 3.5) of each such RTP packet is set to the byte
   offset of the fragment within the entire frame.  (Thus, the
   "Frag_offset" of the first such packet is zero.)  If a frame is
   fragmented across multiple RTP packets, then these packets MUST each
   contain only one fragment (i.e., they MUST NOT be packed with data
   from any other frame).
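
   A sender-side sketch of this fragmentation rule is shown below.  It
   is illustrative only: the function name and the max_data parameter
   are hypothetical, and the 4-byte header (16 bits of MBZ followed by
   a 16-bit Frag_offset) is the MPEG Audio-specific header of Section
   3.5.

```python
import struct
from typing import List


def fragment_audio_frame(frame: bytes, max_data: int) -> List[bytes]:
    """Split one audio frame into RTP payloads, one fragment per packet."""
    if max_data <= 0:
        raise ValueError("max_data must be positive")
    if len(frame) > 0xFFFF:
        raise ValueError("frame too large for a 16-bit Frag_offset")
    payloads = []
    for offset in range(0, len(frame), max_data):
        # MBZ is zero; Frag_offset is the byte offset within the frame.
        header = struct.pack("!HH", 0, offset)
        payloads.append(header + frame[offset:offset + max_data])
    return payloads
```

   Each returned payload carries data from this frame only, so the
   first Frag_offset is zero and later ones increase by max_data.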

3.3 RTP Fixed Header for MPEG ES encapsulation

   The RTP header fields are used as follows:

      Payload Type: Distinct payload types should be assigned for video
      elementary streams and audio elementary streams. See [4] for
      payload type assignments.

      M bit:  For video, set to 1 if the packet contains the last slice
      of a picture (or, if the last slice of a picture is fragmented
      over multiple packets, the last fragment of that slice); set to 0
      otherwise.  For audio, set to 1 on first packet of a "talk-spurt,"
      0 otherwise.

      PT:  MPEG video or audio stream ID.

      timestamp: 32-bit timestamp representing presentation time of MPEG
      picture or audio frame.  It is the same for all packets that make
      up a picture or audio frame.  It may not be monotonically
      increasing in a video stream if B pictures are present.  For
      packets that contain only a video sequence and/or GOP header, the
      timestamp is that of the subsequent picture.
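
   To see why video timestamps need not increase monotonically,
   consider the sketch below.  It is a simplified illustration, not a
   prescribed method: it assumes a constant frame rate, a 90 kHz
   clock, and that the presentation time of a picture can be derived
   from its temporal reference within the GOP.  The function name and
   parameters are hypothetical.

```python
from fractions import Fraction


def picture_timestamp(gop_start_ts: int, temporal_reference: int,
                      frame_rate, clock_hz: int = 90000) -> int:
    """RTP timestamp of a picture from its display position in the GOP."""
    ticks_per_frame = Fraction(clock_hz) / Fraction(frame_rate)
    return (gop_start_ts + int(temporal_reference * ticks_per_frame)) \
        & 0xFFFFFFFF


# Stream order 2I 0B 1B 5P 3B 4B at 30000/1001 frames per second.
stream_order = [2, 0, 1, 5, 3, 4]
timestamps = [picture_timestamp(0, tr, Fraction(30000, 1001))
              for tr in stream_order]
```

   Under these assumptions the timestamps come out as 6006, 0, 3003,
   15015, 9009 and 12012: each B picture carries an earlier timestamp
   than the reference picture transmitted before it.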

3.4 MPEG Video-specific header

   This header shall be attached to each RTP packet after the RTP fixed
   header.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    MBZ  |T|         TR        | |N|S|B|E|  P  | | BFC | | FFC |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   AN              FBV     FFV
         MBZ: Unused. Must be set to zero in current specification. This
         space is reserved for future use.

         T: MPEG-2 (Two) specific header extension present (1 bit). Set
         to 1 when the MPEG-2 video-specific header extension (see
         Section 3.4.1) follows this header. This extension may be
         needed for improved error resilience; however, its inclusion in
         an RTP packet is optional. (See Appendix A.)

         TR: Temporal-Reference (10 bits). The temporal reference of the
         current picture within the current GOP. This value ranges from
         0-1023 and is constant for all RTP packets of a given picture.

         AN: Active N bit for error resilience (1 bit). Set to 1 when
         the following bit (N) is used to signal changes in the picture
         header information for MPEG-2 payloads. It must be set to 0 for
         MPEG-1 payloads or when N bit is not used.

         N: New picture header (1 bit). Used for MPEG-2 payloads when
         the previous bit (AN) is set to 1. Otherwise, it must be set to
         zero. Set to 1 when the information contained in the previously
         transmitted Picture Headers cannot be used to reconstruct a
         header for the current picture. This happens when the current
         picture is encoded using a different set of parameters than the
         previous pictures of the same type. The N bit must be constant
         for all RTP packets that belong to the same picture so that
         receipt of any packet from a picture allows detecting whether
         information necessary for reconstruction was contained in that
         picture (N = 1) or a previous one (N = 0).

         S: Sequence-header-present (1 bit). Normally 0 and set to 1 at
         the occurrence of each MPEG sequence header.  Used to detect
         presence of sequence header in RTP packet.

         B: Beginning-of-slice (BS) (1 bit). Set when the start of the
         packet payload is a slice start code, or when a slice start
         code is preceded only by one or more of a
         Video_Sequence_Header, GOP_header and/or Picture_Header.

         E: End-of-slice (ES) (1 bit). Set when the last byte of the
         payload is the end of an MPEG slice.

         P: Picture-Type (3 bits). I (1), P (2), B (3) or D (4). This
         value is constant for each RTP packet of a given picture. Value
         000B is forbidden and 101B - 111B are reserved to support
         future extensions to the MPEG ES specification.

         FBV: full_pel_backward_vector

         BFC: backward_f_code

         FFV: full_pel_forward_vector

         FFC: forward_f_code

         These four values are obtained from the most recent picture
         header and are constant for each RTP packet of a given picture.
         For I frames none of these values are present in the picture
         header and they must be set to zero in the RTP header.  For P
         frames only the last two values (FFV and FFC) are present, and
         FBV and BFC must be set to zero in the RTP header. For B frames
         all four values are present.

3.4.1 MPEG-2 Video-specific header extension

   This header may be attached to each RTP packet after the MPEG Video
   Specific Header where its presence is indicated by setting the T bit
   to one (Section 3.4).

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |X|E|f_[0,0]|f_[0,1]|f_[1,0]|f_[1,1]| DC| PS|T|P|C|Q|V|A|R|H|G|D|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         X: Unused (1 bit). Must be set to zero in current
         specification. This space is reserved for future use.

         E: Extensions present (1 bit). If set to 1, this header
         extension, including the composite display extension when D =
         1, will be followed by one or more of the following extensions:
         quant matrix extension, picture display extension, picture
         temporal scalable extension, picture spatial scalable extension
         and copyright extension.

         The first byte of the extension data gives the length of the
         extensions in 32 bit words including the length field itself.
         Zero padding bytes are used at the end if required to align the
         extensions to a 32 bit boundary.

         Since they may not be vital in decoding of a picture, the
         inclusion of any one of these extensions in an RTP packet is
         optional even when the MPEG-2 video-specific header extension
         is included in the packet (T = 1). (See Appendix A.) If
         present, they should be copied from the corresponding
         extensions following the most recent MPEG-2 picture coding
         extension and they remain constant for each RTP packet of a
         given picture.

         The extension start code (32 bits) and the extension start code
         ID (4 bits) are included. Therefore the extensions are
         self-identifying.

         f_[0,0]: forward horizontal f_code (4 bits)
         f_[0,1]: forward vertical f_code (4 bits)
         f_[1,0]: backward horizontal f_code (4 bits)
         f_[1,1]: backward vertical f_code (4 bits)
         DC: intra_DC_precision (2 bits)
         PS: picture_structure (2 bits)
         T: top_field_first (1 bit)
         P: frame_predicted_frame_dct (1 bit)
         C: concealment_motion_vectors (1 bit)
         Q: q_scale_type (1 bit)
         V: intra_vlc_format (1 bit)
         A: alternate_scan (1 bit)
         R: repeat_first_field (1 bit)
         H: chroma_420_type (1 bit)
         G: progressive_frame (1 bit)
         D: composite_display_flag (1 bit). If set to 1, the next 32
         bits following this header contain 12 zeros followed by 20 bits
         of composite display information.

         These values are copied from the most recent picture coding
         extension and are constant for each RTP packet of a given
         picture. Their meanings are as explained in the MPEG-2
         standard.
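
   As with the base header, the extension is a single 32-bit word in
   network byte order.  The sketch below unpacks it into a dictionary
   keyed by the field labels used in the diagram; the function name
   and the returned tuple are hypothetical, and the variable-length
   extension data signalled by E = 1 is not parsed here.

```python
import struct


def parse_mpeg2_extension(data: bytes):
    """Decode the MPEG-2 video-specific header extension.

    Returns (fields, composite, length): the field dictionary, the 20
    bits of composite display information (None when D = 0), and the
    number of bytes consumed (4, or 8 when D = 1).
    """
    if len(data) < 4:
        raise ValueError("data shorter than the header extension")
    (w,) = struct.unpack("!I", data[:4])
    fields = {
        "X": (w >> 31) & 1, "E": (w >> 30) & 1,
        "f_[0,0]": (w >> 26) & 0xF, "f_[0,1]": (w >> 22) & 0xF,
        "f_[1,0]": (w >> 18) & 0xF, "f_[1,1]": (w >> 14) & 0xF,
        "DC": (w >> 12) & 3, "PS": (w >> 10) & 3,
        "T": (w >> 9) & 1, "P": (w >> 8) & 1, "C": (w >> 7) & 1,
        "Q": (w >> 6) & 1, "V": (w >> 5) & 1, "A": (w >> 4) & 1,
        "R": (w >> 3) & 1, "H": (w >> 2) & 1, "G": (w >> 1) & 1,
        "D": w & 1,
    }
    composite, length = None, 4
    if fields["D"]:
        if len(data) < 8:
            raise ValueError("composite display word is missing")
        (c,) = struct.unpack("!I", data[4:8])
        composite = c & 0xFFFFF  # the upper 12 bits are zeros
        length = 8
    return fields, composite, length
```

   A caller would apply this to the bytes that follow the MPEG
   Video-specific header whenever the T bit of that header is 1.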

3.5 MPEG Audio-specific header

     This header shall be attached to each RTP packet at the start of
     the payload and after any RTP headers for an MPEG1/2 Audio payload
     type.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |             MBZ               |          Frag_offset          |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Frag_offset: Byte offset into the audio frame for the data in
          this packet.
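
   On the receiving side, Frag_offset allows a fragmented frame to be
   put back together.  The following sketch is illustrative only: the
   function names are hypothetical, and it assumes that the payloads
   passed in belong to one frame (same RTP timestamp) and are already
   in sequence-number order.

```python
import struct
from typing import Iterable, Tuple


def parse_audio_payload(payload: bytes) -> Tuple[int, bytes]:
    """Return (Frag_offset, audio data) for one RTP payload."""
    if len(payload) < 4:
        raise ValueError("payload shorter than the audio-specific header")
    _mbz, frag_offset = struct.unpack("!HH", payload[:4])
    return frag_offset, payload[4:]


def reassemble_frame(payloads: Iterable[bytes]) -> bytes:
    """Rebuild one audio frame from its fragments."""
    frame = bytearray()
    for payload in payloads:
        offset, data = parse_audio_payload(payload)
        # Each fragment must start where the previous one ended.
        if offset != len(frame):
            raise ValueError("missing or out-of-order fragment")
        frame += data
    return bytes(frame)
```

   A gap between the expected and received Frag_offset indicates a
   lost fragment, in which case the sketch rejects the frame.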


A. Error Recovery and Resynchronization Strategies

   The following error recovery and resynchronization strategies are
   intended to be guidelines only.  A compliant receiver is free to
   employ alternative (or no) strategies.

   When initially decoding an RTP-encapsulated MPEG Elementary Stream,
   the receiver may discard all packets until the Sequence-header-
   present bit is set to 1.  At this point, sufficient state information
   is contained in the stream to allow processing by an MPEG decoder.

   Loss of packets containing the GOP_header and/or Picture_Header is
   detected by an unexpected change in the Temporal-Reference and
   Picture-Type values.  Consider the following example GOP sequence:

           In display order: 0B 1B 2I 3B 4B 5P 6B 7B 8P GOP_HDR 0B ...
           In stream order:  2I 0B 1B 5P 3B 4B 8P 6B 7B GOP_HDR 2I ...

      Consider also two counters:

           ref_pic_temp (Reference Picture (I,P) Temporal Reference)
           dep_pic_temp (Dependent Picture (B) Temporal Reference)

   At each GOP beginning, set these counters to the temporal reference
   value of the corresponding picture type. For our example GOP
   sequence, ref_pic_temp = 2 and dep_pic_temp = 0. Keep incrementing
   BOTH counters by unity with each following picture. Ref_pic_temp
   should match the temporal references of the I and P frames, and
   dep_pic_temp should match the temporal references of the B frames.

          dep_pic_temp: -  0  1  2  3  4  5  6  7        8  9
      In stream order:  2I 0B 1B 5P 3B 4B 8P 6B 7B GOP_H 2I 0B 1B ...
          ref_pic_temp: 2  3  4  5  6  7  8  9  10  ^    11
                        --------------------------  |    ^
                                   Match            Drop |
                                                         Mismatch
                                                       in ref_pic_temp

   The loss of a GOP header can be detected by matching the appropriate
   counter (based on picture type) to the temporal reference value. A
   mismatch indicates a lost GOP header. If desired, a GOP header can be
   re-constructed using a "null" time_code, repeating the closed_gop
   flag from previous GOP headers, and setting the broken_link flag to
   1. However, if variable frame rate video is being used and the
   extent of successive packet losses is larger than a GOP, the loss of
   the GOP header may not be detected.
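
   The two-counter scheme can be sketched as below.  This is one
   possible reading of the guideline rather than a required
   algorithm.  The class and method names are hypothetical, picture
   types use the P field values of Section 3.4 (I = 1, P = 2, B = 3),
   counters wrap at 1024 to match the 10-bit TR field, and the sketch
   restarts its counters from the current picture once a mismatch has
   been reported.

```python
class GopLossDetector:
    """Track ref_pic_temp and dep_pic_temp across pictures."""

    def __init__(self):
        self.ref_pic_temp = None  # expected TR for I and P pictures
        self.dep_pic_temp = None  # expected TR for B pictures

    def start_gop(self):
        """Call when a GOP header is actually received."""
        self.ref_pic_temp = None
        self.dep_pic_temp = None

    def on_picture(self, picture_type: int, temporal_reference: int) -> bool:
        """Process one picture; return True if a GOP header seems lost."""
        # Increment BOTH running counters by one for each picture.
        if self.ref_pic_temp is not None:
            self.ref_pic_temp = (self.ref_pic_temp + 1) % 1024
        if self.dep_pic_temp is not None:
            self.dep_pic_temp = (self.dep_pic_temp + 1) % 1024
        if picture_type == 3:  # B picture
            if self.dep_pic_temp is None:
                self.dep_pic_temp = temporal_reference
            expected = self.dep_pic_temp
        else:  # I or P picture
            if self.ref_pic_temp is None:
                self.ref_pic_temp = temporal_reference
            expected = self.ref_pic_temp
        if expected == temporal_reference:
            return False
        # Mismatch: assume a new GOP began with this picture.
        self.ref_pic_temp = None if picture_type == 3 else temporal_reference
        self.dep_pic_temp = temporal_reference if picture_type == 3 else None
        return True
```

   Feeding the example sequence 2I 0B 1B 5P 3B 4B 8P 6B 7B after
   start_gop() reports no mismatch; a following 2I without an
   intervening start_gop() is compared against ref_pic_temp = 11 and
   is reported as a lost GOP header.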

   The loss of a Picture_Header can also be detected by a mismatch in
   the Temporal Reference contained in the RTP packet from the
   appropriate dep_pic_temp or ref_pic_temp counters at the receiver.

   For MPEG-1 payloads, after scanning to the next Beginning-of-slice
   the Picture_Header is reconstructed from the P, TR, FBV, BFC, FFV and
   FFC contained in that packet, and from stream-dependent default
   values.

   For MPEG-2, additional information is needed for the reconstruction.
   This information is provided by the MPEG-2 video specific header
   extension contained in that packet if the T bit is set to 1, or the
   Picture Header for the current picture may be available from previous
   packets belonging to the same picture. The transmitter's strategy for
   inclusion of the MPEG-2 video specific header extension may depend
   upon a number of factors. This header may not be needed when:

         1. the information has been transmitted a sufficient number of
         times in previous packets to assure reception with the desired
         probability, or

         2. the information is transmitted over a separate reliable
         channel, or

         3. expected loss rates are low enough that missed frames are
         not a concern, or

         4. conserving bandwidth is more important than error
         resilience, etc.

   If T=1 and E=0, there may be extensions present in the original video
   bitstream that are not included in the current packet. The
   transmitter may choose not to include extensions in a packet when
   they are not necessary for decoding or if one of the cases listed
   above for not including the MPEG-2 video specific header extension in
   a packet applies only to the extension data.

   If N=0, then the Picture Header from a previous picture of the same
   type (I,P or B) may be used so long as at least one packet has been
   received for every intervening picture of the same type and the
   N bit was 0 for each of those pictures. This may involve:

         1. Saving the relevant picture header information that can be
         obtained from the MPEG-2 video specific header extension or
         directly from the video bitstream for each picture type,

         2. Keeping validity indicators for this saved information based
         on the received N bits and lost packets, and,

         3. Updating the data whenever a packet with N=1 is received.

   If the necessary information is not available from any of these
   sources, discarding data until a new picture start code is advised.

   Any time an RTP packet is lost (as indicated by a gap in the RTP
   sequence number), the receiver may discard all packets until the
   Beginning-of-slice bit is set.  At this point, sufficient state
   information is contained in the stream to allow processing by an MPEG
   decoder starting at the next slice boundary (possibly after
   reconstruction of the GOP_header and/or Picture_Header as described
   above).
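
   A minimal Python sketch of this discard rule is shown below, under
   stated assumptions: packets are presented in arrival order as
   hypothetical (seq, b_bit, payload) tuples that the caller has already
   parsed, where seq is the 16-bit RTP sequence number and b_bit is the
   Beginning-of-slice bit. Reordered or duplicated packets are not
   handled, and the function name is illustrative only.

```python
def filter_after_loss(packets):
    """Yield only the packets that should be passed on to the decoder.

    After a gap in the sequence numbers, packets are dropped until one
    arrives with the Beginning-of-slice bit set.
    """
    expected = None      # next sequence number we expect to see
    discarding = False   # True while waiting for a slice boundary
    for seq, b_bit, payload in packets:
        if expected is not None and seq != expected:
            discarding = True  # gap detected: at least one packet lost
        expected = (seq + 1) & 0xFFFF  # sequence numbers wrap at 2^16
        if discarding:
            if not b_bit:
                continue       # still inside a damaged slice
            discarding = False  # slice boundary: resume decoding here
        yield seq, b_bit, payload
```

   Reconstruction of the GOP_header or Picture_Header, where needed, is
   a separate step and is not shown in this sketch.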

B. Changes from RFC 2250

     .   Use of dynamic payload types that can specify the clock
          frequency (accuracy) of the timestamps through non-RTP means
          is allowed.

          o  In accordance with this, the references to "90 kHz" in the
             "sender's clock reference" text in Section 2 and in the
             "timestamp" definition in Section 3.3 have been removed.

     .   The following items have been reworded:

          o Section 3.2: Audio frame fragmentation
          o Section 3.3: M bit definition

     .   A case for which the GOP header loss detection algorithm may
          not work has been added to Appendix A.

C. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [3], and any appropriate RTP profile (for example [4]).
   This implies that confidentiality of the media streams is achieved by
   encryption. Because the data compression used with this payload
   format is applied end-to-end, encryption may be performed after
   compression so there is no conflict between the two operations.

   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load. The attacker can inject pathological datagrams
   into the stream which are complex to decode and cause the receiver to
   be overloaded. However, this encoding does not exhibit any
   significant non-uniformity.

D. References

   1. ISO/IEC International Standard 11172; "Coding of moving pictures
      and associated audio for digital storage media up to about 1,5
      Mbits/s", November 1993.

   2. ISO/IEC International Standard 13818; "Generic coding of moving
      pictures and associated audio information", November 1994.

   3. Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
      "RTP: A Transport Protocol for Real-Time Applications", RFC 3550,
      July 2003.

   4. Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
      Conferences with Minimal Control", RFC 3551, July 2003.

   5. Bradner, S., "Key Words for Use in RFCs to Indicate Requirement
      Levels", BCP 14, RFC 2119, March 1997.

   6. Handley, M. and V. Jacobson, "SDP: Session Description Protocol",
      RFC 2327, April 1998.

E. Acknowledgements

   Humphrey Liu reported the need for improved time resolution. Ram
   Kordale noticed the problem with recovering GOP headers under
   large-scale data losses. Ross Finlayson helped with the rewordings.

F. Authors' Addresses

      M. Reha Civanlar
      Koç University
      Computer Engineering Department
      Sariyer, Istanbul 34450
      TURKEY

      Phone: +90 212-338-1719
      EMail: rcivanlar@ku.edu.tr

      Gerard Fernando
      Sun Microsystems, Inc.
      Mail-stop UMPK14-305
      2550 Garcia Avenue
      Mountain View, California 94043-1100
      USA

      Phone: +1 415-786-6373
      EMail: gerard.fernando@eng.sun.com

      Vivek Goyal
      Packet Design, Inc.
      3400 Hillview Ave, Bldg 3
      Palo Alto, CA 94304
      USA

      Phone: +1 650-739-1850
      EMail: vivek@packetdesign.com


      Don Hoffman
      Sun Microsystems, Inc.
      Mail-stop UMPK14-305
      2550 Garcia Avenue
      Mountain View, California 94043-1100
      USA

      Phone: +1 503-297-1580
      EMail: don.hoffman@eng.sun.com




