Internet Engineering Task Force             Audio Visual Transport WG
   Internet-Draft                                 C.Guillemot, P.Christ,
                                                    S.Wesner, A. Klemets
   draft-ietf-avt-mpeg4streams-00.txt     INRIA / Univ. Stuttgart - RUS /
                                                               Microsoft
   March 1, 2000
   Expires: September 1, 2000




                     RTP Payload Format for MPEG-4
                                  with
                        Flexible Error Resiliency


                            STATUS OF THIS MEMO

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.
   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."
   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


                                 Abstract

   This document describes a payload format that can be used for the
   transport, in RTP [1] packets, of MPEG-4 Elementary Streams (ES),
   i.e. audio, visual, BIFS and OD streams, as well as MPEG-4 Sync
   Layer and FlexMux packet streams.  The payload format allows for
   protection against loss in a generic way.  The proposed mechanisms
   can operate on full and partial MPEG-4 ES Access Units, on Sync
   Layer packets, or on FlexMux packets.  These mechanisms cover a
   broad range of protection schemes and avoid extra connection
   management complexity (e.g. for separate FEC channels) in MPEG-4
   applications with a potentially high number of streams.








Guillemot/Christ/Wesner/Klemets.                            [Page 1]


Internet-Draft    Payload Format for MPEG-4 Streams        March 2000

                             Table of Contents


    1     Introduction..............................................3
    2     MPEG-4 overview...........................................4
    2.1   Scene description framework...............................4
    2.2   MPEG-4 Systems............................................4
    2.3   MPEG-4 profiles...........................................6
    3     Design Considerations.....................................6
    4     Payload Format specification..............................9
    4.1   RTP Header Usage..........................................9
    4.2   Payload Header...........................................10
    4.3   Payload for the transport of ES..........................11
    4.4   Payload for the transport of SL-PDUs.....................11
    4.5   Payload for the transport of FlexMux-PDUs................12
    5     Extension data field for FEC data........................13
    5.1   Extension data field for Parity Codes....................13
    6     Multiplexing.............................................17
    7     Security Considerations..................................17
    8     Authors Addresses........................................17
    9     References...............................................18


                              List of Figures


    Figure 1: Structure of FlexMux packet in simple mode.............5
    Figure 2: Structure of FlexMux packet in MuxCode mode............6
    Figure 3: Architecture...........................................7
    Figure 4: Example of ESI.........................................8
    Figure 5: RTP payload format....................................10
    Figure 6: Portrait of the unified approach for transport of ES and
               SL packetized streams................................12
    Figure 7: Sample RTP payload for SL PDU transport...............12
    Figure 8: Sample RTP payload for FlexMux-PDU transport with
               protection support...................................13
    Figure 9: Sample RTP payload for FlexMux-PDU transport..........13
    Figure 10: FEC Header for Parity Codes..........................14
    Figure 11: Simplified FEC Header for Parity Codes  (with default
               masks)...............................................15
    Figure 12: FEC Header for Reed-Solomon Codes....................16
    Figure 13: Example of Interleaving (for P=7)....................16
















   1     Introduction


   The MPEG-4 standard targets a very large range of applications: from
   classical videotelephony and videoconferencing applications to ap-
   plications requiring a very high degree of interaction with audio-
   visual scenes.  In order to reach this latter goal very advanced
   tools have been specified in the different parts of the standard
   (Audio, Visual, Systems) which can be configured according to pro-
   files to meet various application requirements.

   This document is motivated by the large number of profiles, the
   large variety of MPEG-4 compressed streams (audio, visual, BIFS, OD,
   SL, FlexMux), and by the need for a flexible degree of protection to
   be applied to them.  In addition to having a single payload format
   for MPEG-4 Elementary Streams (ES), Synchronization Layer packet
   (SL-PDU) streams and FlexMux packet streams, another motivation is
   flexibility in associating error control mechanisms with the
   compressed media streams, in order to provide protection to various
   applications, not just those restricted to simple profiles.  The
   error control mechanisms can be dynamically adapted to different
   types of stream elements (e.g. Access Units, segments, packets)
   and/or network characteristics.

   The design of this payload format has been inspired by previous
   proposals for generic payload formats [2-3].  Additionally, it
   attempts to federate different error control approaches under a
   single protocol support mechanism.  The rationale for this payload
   format consists of:

     - A unified approach for MPEG-4 ES, MPEG-4 sync layer, and
        FlexMux packet streams, with simple grouping mechanisms.

     - A solution independent of the usage or the non-usage of the
        MPEG-4 OD framework, and not restricted to MPEG-4 simple pro-
        files.

     - Protection against packet loss with flexible support for a
        range of loss control mechanisms (redundant data, such as
        repeated important segments of the elementary streams, or FEC)
        adapted to typed segments of streams.  Typed segments are
        parts of Access Units (AUs) that are, in terms of the encoding
        syntax, syntactically and semantically meaningful (cf. [4],
        7.2.3: "Such partial AUs may have significance for improved
        error resilience").  Access Units are the smallest entities in
        the bitstream that can be attributed individual timestamps.
        The proposed in-band mechanism avoids the extra connection
        management complexity that separate FEC channels may bring.
        Indeed, in MPEG-4 applications, the number of streams can
        potentially be high.





     - Protection against packet loss with protocol support that is
        easily adaptable to varying network conditions, for both
        "live" and "pre-recorded" visual contents.

   The list of all supported protection schemes will be announced via
   out-of-band signaling at the beginning of the session, using for
   example SDP [7].  The protection scheme used at a specific instant
   during the session will be signaled via the extension type (XT)
   field in the payload header.


   2     MPEG-4 overview


   2.1  Scene description framework


   An MPEG-4 scene is composed of media objects. The MPEG-4 dynamic-
   scene description framework, which defines the spatio-temporal rela-
   tion of the media objects as well as their contents, is inspired by
   VRML.  The compressed binary representation of the scene description
   is called BIFS (Binary Format for Scenes), [4]. The compressed scene
   description is conveyed through one or more Elementary Streams (ES).

   A compression layer produces the compressed representations of the
   audio-visual objects that will be inserted into the scene.  These
   compressed representations are organized into Elementary Streams
   (ES).  Elementary Stream Descriptors provide information relative to
   the stream, such as the compression scheme used.  Elementary stream
   data is partitioned into Access Units.  The delineation of an Access
   Unit is completely determined by the entity - the compression layer
   - that generates the elementary stream.  An Access Unit is the
   smallest data entity to which timing information can be attributed.
   Two Access Units shall never refer to the same point in time.

   Natural and animated synthetic objects may refer to an Object De-
   scriptor (OD), which points to one or more Elementary Streams that
   carry the coded representation of the object or its animation data.
   An OD serves as a grouping of one or more Elementary Stream
   Descriptors that refer to a single media object.  The OD also
   defines the hierarchical relations and properties of the Elementary
   Stream Descriptors.

   A complete set of ODs can be seen as an MPEG-4 resource or session
   description.  The Object Descriptors are conveyed through one or
   more Elementary Streams.  By conveying the session (or resource) de-
   scription as well as the scene description through their own Elemen-
   tary Streams, it becomes possible to change portions of scenes
   and/or properties of media streams separately and dynamically at
   well-known instants of time.


   2.2  MPEG-4 Systems





   The MPEG-4 Systems specification [4] also defines a packetization of
   ES data into access units or parts thereof.  The packets are called
   SL packets, or SL-PDUs.  The resulting sequence of SL packets is
   called the SL-Packetized Stream (SPS).  Access Units are the only
   semantic entities at this layer and their content is opaque.  Pack-
   etization information has to be exchanged between the entity that
   generates an elementary stream and the sync layer.  This relation is
   best described by a conceptual interface between both layers, termed
   the Elementary Stream Interface (ESI).

   An SL packet (SL-PDU) consists of an SL packet header and an SL
   packet payload.  The SL packet header provides means for continuity
   checking in case of data loss and carries the coded representation
   of the time stamps and associated information.  The header syntax is
   configurable to adapt to the needs of different types of elementary
   streams and is defined in the SLConfigDescriptor.

   An SL-PDU does not contain an indication of its length.  Therefore,
   SL packets must be framed by a suitable lower layer protocol.
   Consequently, an SL-PDU stream is not a self-contained data stream
   that can be stored or decoded without such framing.

   SL-PDUs of varying instantaneous bit rate can then be interleaved by
   using the FlexMux tool.  The basic data entity of the FlexMux is a
   FlexMux packet, which has a variable length.  The FlexMux has two
   modes of operation: the Simple Mode and the MuxCode Mode.  In the
   Simple Mode, one SL packet is encapsulated in one FlexMux packet and
   tagged by an index which is equal to the FlexMux Channel number, as
   shown in Figure 1 below [4].


   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+... +-+-+-+-+-+-+-+
   |   index       | length        |        SL-PDU                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+... +-+-+-+-+-+-+-+
                                   | header    |   payload        |
                                   +-+-+-+-+-+-+... +-+-+-+-+-+-+-+


           Figure 1: Structure of FlexMux packet in simple mode
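
   As an illustration only, the simple-mode layout of Figure 1 could be
   read and written as sketched below.  The sketch assumes that the
   index and length fields are one byte each, as drawn in the figure,
   and that length counts the bytes of the encapsulated SL-PDU.  The
   function names are hypothetical and are not defined by [4].

```python
def build_flexmux_simple(index: int, sl_pdu: bytes) -> bytes:
    """Wrap one SL-PDU in a simple-mode FlexMux packet (hypothetical helper)."""
    if not 0 <= index <= 0xFF:
        raise ValueError("index must fit in one byte")
    if len(sl_pdu) > 0xFF:
        raise ValueError("SL-PDU too long for the 8-bit length field")
    # index (1 byte) | length (1 byte) | SL-PDU header and payload
    return bytes([index, len(sl_pdu)]) + sl_pdu


def parse_flexmux_simple(buf: bytes, offset: int = 0):
    """Parse one simple-mode FlexMux packet starting at `offset`.

    Returns (index, sl_pdu, next_offset).  The SL-PDU is returned as
    opaque bytes, since it carries no length indication of its own.
    """
    if len(buf) - offset < 2:
        raise ValueError("truncated FlexMux header")
    index = buf[offset]        # equals the FlexMux Channel number
    length = buf[offset + 1]   # assumed: SL-PDU length in bytes
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("truncated FlexMux payload")
    return index, buf[start:end], end
```

   Because each packet carries its own length, several FlexMux packets
   can be concatenated and walked by feeding next_offset back into the
   parser.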




   In the MuxCode mode one or more SL packets are encapsulated in one
   FlexMux packet. In this mode the index value is used to dereference
   configuration information that defines the allocation of the FlexMux
   packet payload to different FlexMux Channels.











   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+... +-+-+-+-+-+-+-+
   |   index       | length        |  version      |SL-PDU|...|SL-PDU |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+... +-+-+-+-+-+-+-+


           Figure 2: Structure of FlexMux packet in MuxCode mode


   The FlexMux tool is optional.


   2.3  MPEG-4 profiles


   In order to allow effective implementations of the standard, subsets
   of the MPEG-4 Systems, Visual, and Audio tool sets have been
   identified that can be used for specific applications.  Profiles
   exist for various types of media content (audio [8], visual [5], and
   graphics [5]) and for scene descriptions [4].  Depending on the
   visual profile, different sets of parameters will be present in the
   header of the VideoObjectPlane().

   A set of error resilience tools has been defined in the MPEG-4
   visual syntax in order to recover corrupted headers [5].  In
   particular, the VideoObjectPlane data is structured in video
   packets, the entry point being defined by the function
   video_packet_header(), and delimited by resync_markers.  Basic
   configuration parameters can be inserted in the packet header.
   However, this concerns only parameters used in the simple visual
   profile; many parameters essential in the simple scalable, main and
   core profiles are not covered by this mechanism [5].

   Also, no such mechanism has been defined for BIFS and OD streams.
   Although TCP could be envisaged for the transport of BIFS and ODs
   under mild time constraints, it may not be suited to tight timing
   constraints for scene animation and update, or to multicast
   scenarios.


   3     Design Considerations


   The design goals of this RTP payload format are to provide the fol-
   lowing:
    - a unified solution, with error protection easily adaptable to
       varying network conditions, for both "live" and "pre-recorded"
       contents.
    - a unified solution for the transport of SL packet streams - with
       a possible N-to-1 mapping -, of Flexmux packet streams, and for
       the transport of robust ES (audio, visual, BIFS, ODs, IPMP)
       data.
    - a solution supporting advanced profiles (i.e. not restricted to
       the simple audio/visual profile), and independent of the usage
       or non-usage of the OD framework.




   Figure 3 below shows the adopted model.  It relies on an optional
   network adaptation layer, which supports protection mechanisms.
   Ideally, this network adaptation layer is both media and network
   aware.

   The compression layer organizes the ESs in Access Units (AU).  The
   AUs are the smallest entities that can be attributed individual
   timestamps.  The timestamps may be obtained directly, through the
   ESI, with syntax as specified by the SLConfigDescriptor.  If the
   SLConfigDescriptor indicates that timestamps are absent, the time-
   stamps may be obtained indirectly, for example, by using the frame
   rate.

   The compression layer passes full or partial Access Units, together
   with indications of AU boundaries, random access points, and desired
   timing information as described by the SLConfigDescriptor, directly
   or indirectly (via the sync layer) to the network adaptation layer.
   It is however preferable, for implementation efficiency, to pass the
   ES data directly to the network adaptation layer, i.e. to avoid
   producing the full SL packets.  Partial AUs or typed segments are,
   in terms of the encoding syntax, syntactically and semantically
   meaningful parts of an AU (cf. [4], 7.2.3, "Such partial AUs may
   have significance for improved error resilience").

   ---         ----------------------------------
   |S|        |         Compression Layer        |   Media aware
   |L|        -----------------------------------
   | |                                         |
   |C|           ES Descriptor |               |
   |o|              |----------|---------|     |
   |n|            ES Type   RAP Flag    QoS    |
   |f|              |          |         |     |
   |.| -------------V----------V---------V-----|---- ESI
   |D|                                         |
   |e|        -------------------------------  |
   |s|        |                             |  |
   |c|        | Network Adaptation Layer    |<-O     Network aware
   |r|        | ->Redundancy, FEC |         |  |
   |.|        | |                 |         |  |
   -----------|-+- - - - - - - - -| - - -|  |  |
              --|-----------------|------|---  |
                |                 |      |     |
   -------------|--  -------------V------V-----V------
   | QoS          |  | RTP | | Ext.  | |"SL" | Media |
   | monitoring   |  | Hdr.| | Data= | |     |       |
   ----------------  |     | | e.g.  | |     |       |
                     |     | | FEC   | |     |       |
                     ---------------------------------



                          Figure 3: Architecture









   Figure 4 lists parameters that should be passed along with the ES
   data.  The SLConfigDescriptor indicates the presence or absence of
   each parameter.  When any of these parameters are present, then the
   adaptation layer will directly produce the "stripped down" SL header
   to be inserted in the payload of the RTP packet.

   Note that the normative behavior at the receiving side can be
   assured, when the OD framework is present, by using the
   SLConfigDescriptor, which is visible in the compression layer, or,
   outside the OD framework, by other means of signaling the ES syntax,
   e.g. through a "capability exchange".





                   DTS: Decoding Time Stamp
                   CTS: Composition Time Stamp
                   OCR: Object Clock Reference
                   IdleFlag
                   loop(randomAccess Flag
                        AUStartFlag
                        AUEndFlag
                        Esdata
                        dataLength
                        degradationPriority
                        segmentType )




                         Figure 4: Example of ESI
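
   The parameter list of Figure 4 could be modelled, for illustration,
   by the data structure sketched below.  The class and attribute names
   are hypothetical transliterations of the figure's entries, and the
   reading of "loop(...)" as a repeated per-segment group is an
   assumption.  Parameters whose absence is indicated by the
   SLConfigDescriptor are represented as None.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EsiSegment:
    """One iteration of the loop in Figure 4 (hypothetical naming)."""
    random_access_flag: bool
    au_start_flag: bool
    au_end_flag: bool
    es_data: bytes
    degradation_priority: Optional[int] = None
    segment_type: Optional[int] = None

    @property
    def data_length(self) -> int:
        # dataLength is derived from the data rather than stored twice.
        return len(self.es_data)


@dataclass
class EsiUnit:
    """Timing information plus the segments passed across the ESI."""
    dts: Optional[int] = None   # Decoding Time Stamp
    cts: Optional[int] = None   # Composition Time Stamp
    ocr: Optional[int] = None   # Object Clock Reference
    idle_flag: bool = False
    segments: List[EsiSegment] = field(default_factory=list)

    def is_complete_au(self) -> bool:
        """True if the segments span an AU from its start to its end."""
        return (bool(self.segments)
                and self.segments[0].au_start_flag
                and self.segments[-1].au_end_flag)
```

   A network adaptation layer built on such a structure would use the
   AU start and end flags to decide where Access Units begin and end.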





   The payload format also specifies a mechanism for grouping an AU or
   a partial AU, an SL-PDU or a FlexMux PDU together with protection
   data (FEC, redundant data).  This mechanism makes it possible to
   adapt the protection of the different typed segments, or SL-PDUs, to
   varying network conditions during the session, as well as to a
   degradation priority indicated by the SLConfigDescriptor.  The
   grouping mechanism can also be used for grouping SL-PDUs, or
   possibly FlexMux PDUs (whose length is currently limited to 255
   bytes by the 8-bit length field).

   The payload format also supports a fragmentation mechanism where the
   full AUs or the partial AUs passed by the compression layer are
   fragmented at arbitrary boundaries.  This may result in fragments
   that are not independently decodable.  This kind of fragmentation



   may be used in situations when the RTP packets are not allowed to
   exceed the path-MTU size.

   However, this media-unaware fragmentation is not recommended.  It is
   preferable that the compression layer provides partial AUs, in the
   form of typed segments, of a size small enough that the resulting
   RTP packet fits within the MTU size.  Media-unaware fragmentation
   may nevertheless be useful in the case of large audio frames, which
   would have to be fragmented to fit the MTU size.
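
   A minimal sketch of such media-unaware fragmentation is given below.
   It assumes that each fragment is tagged with its byte offset from
   the beginning of the AU, as carried by the 16-bit FOFFSET field
   described in Section 4.2.  The function names and the max_fragment
   parameter are hypothetical.  The sizes of the RTP and payload
   headers would have to be subtracted from the path MTU by the caller.

```python
def fragment_au(au: bytes, max_fragment: int):
    """Split an AU at arbitrary boundaries into (foffset, chunk) pairs."""
    if max_fragment <= 0:
        raise ValueError("max_fragment must be positive")
    fragments = []
    for foffset in range(0, len(au), max_fragment):
        if foffset > 0xFFFF:
            # The offset would not fit in a 16-bit FOFFSET field.
            raise ValueError("AU too large for a 16-bit offset")
        fragments.append((foffset, au[foffset:foffset + max_fragment]))
    return fragments


def reassemble_au(fragments):
    """Rebuild the AU from its fragments, or return None if one is missing."""
    out = bytearray()
    for foffset, chunk in sorted(fragments):
        if foffset != len(out):
            return None   # gap: a fragment was lost
        out += chunk
    return bytes(out)
```

   The offsets allow the receiver to detect a missing fragment and to
   position the remaining ones, but the fragments themselves are not
   independently decodable.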

   Consecutive segments (e.g. video packets [5]) of the same type will
   be packed consecutively in the same RTP payload. The compression
   layer should provide partial AUs, of a size small enough so that the
   resulting RTP packet can fit the MTU size.  Note that passing par-
   tial AUs of small size will also facilitate congestion and rate con-
   trol based on the real output buffer management.  RTP packets that
   transport fragments belonging to the same AU will have their RTP
   timestamp set to the same value.


   4     Payload Format specification


   The packet will consist of an RTP header followed by possibly multi-
   ple payloads. They should be sent in the decoding order.


   4.1  RTP Header Usage


   Each RTP packet starts with a fixed RTP header. The following fields
   of the fixed RTP header are used:

   - Marker bit (M bit): The marker bit of the RTP header is set to 1
     when the current packet carries the end of an access unit (AU).

   - Payload Type (PT): Different payload types should be assigned for
     MPEG-4 ES, MPEG-4 SL-PDU, and MPEG-4 FlexMux streams.  A payload
     type in the dynamic range should be chosen.

   - Timestamp: The RTP timestamp is set to the composition timestamp
     (CTS), if its presence is indicated by the SLConfigDescriptor and
     if its length is not more than 32 bits.  Otherwise, i.e. if the
     CTS is not present or when not using the OD framework, the RTP
     timestamp should be set to the sampling instant of the first AU
     contained in the packet.  In this case the RTP timestamp encodes
     the presentation time of the first AU contained in the packet.
     The RTP timestamp may be the same on successive packets if an AU
     occupies more than one packet.  If the packet contains only
     'extension' data objects (see below), then the RTP timestamp is
     set to the presentation time of the AU to which the first
     extension data object (e.g. FEC or redundant data) applies.






   - SSRC: A mapping between the ES identifiers and the SSRCs should be
     provided via out-of-band signaling (e.g. SDP).
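
   The header rules above can be sketched as follows.  The sketch
   builds the 12-byte fixed RTP header of [1] with no CSRC entries,
   padding or header extension.  The function names are hypothetical,
   and the payload type value 96 used in the test is only an example
   from the dynamic range.  The fallback to the sampling instant
   whenever the CTS is absent or longer than 32 bits is one reading of
   the timestamp rule.

```python
import struct


def choose_rtp_timestamp(cts, cts_length_bits, sampling_instant):
    """Use the CTS when present and at most 32 bits long, else the
    sampling instant of the first AU in the packet."""
    if cts is not None and cts_length_bits <= 32:
        return cts & 0xFFFFFFFF
    return sampling_instant & 0xFFFFFFFF


def build_rtp_header(pt, seq, timestamp, ssrc, marker):
    """Build the fixed RTP header: V=2, P=0, X=0, CC=0."""
    if not 0 <= pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    byte0 = 2 << 6                          # version 2, no P/X, CC=0
    byte1 = (0x80 if marker else 0) | pt    # M bit set at the end of an AU
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
```

   Packets carrying fragments of the same AU would reuse the same
   timestamp value, with the marker bit set only on the packet that
   carries the end of the AU.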


   4.2  Payload Header


   The payload header is always present, with a variable length, and is
   defined as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |G|E|    XT     |        LENGTH           |EBITS|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                                                               .
   +     Extension data                            +-+-+-+-+-+-+-+-+
   .                                               |G|E|F|  res    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             LENGTH            |          FOFFSET              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
   |                                                               .
   .                  Media Payload                                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+




                       Figure 5: RTP payload format





   G (Group) (1 bit): If this field is 1, it indicates that the object
   associated with the current header is followed by another object.

   E (Extension) (1 bit): If its value is 1 then the next object
   contains Extension data. If its value is 0, then the next object
   contains AU data (full AU or partial AU - typed segment -).

   res (Reserved) (5 bits): This field is only present if the E field
   is 0, so that {G,E=1,XT} and {G,E=0,F,res} each always occupy
   exactly 1 byte.

   XT (Extension type) (6 bits): This field is only present if E is
   set to 1.  It then specifies the type of extension data.  Examples
   of types are FEC data, with the specification of the FEC coding
   scheme (parity codes, block codes such as Reed-Solomon codes,
   etc.), and redundant data with duplicated high priority headers.

   LENGTH (13 bits): this field specifies the length in bytes of the
   next object. If the object is the last object of the payload (G=0)
   then this field is not present.




   EBITS (3 bits): Indicates the number of bits that shall be ignored
   in the last byte of the extension data. If the object is the last
   object of the payload (G=0) then this field is not present.

   F (Fragmentation) (1 bit):  This field is only present when the E-
   field is 0. If its value is 1, then the next object is a fragment of
   a typed segment.  If this field is 0, then the next object is a com-
   plete typed segment or complete AU.

   FOFFSET (16 bits): This field is present only when the F field is
   present and F=1.  It contains the byte offset, from the beginning of
   the AU, of the first byte of the fragment.  This field should rarely
   be present, but may be useful to position the fragment in the AU
   when large AUs (e.g. audio frames) have to be fragmented.
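
   The bit layout of the first header byte, and of the extension object
   header as drawn in Figure 5, could be handled as sketched below.
   The sketch assumes the usual convention that bit 0 is the most
   significant bit of the first byte, and that LENGTH and EBITS are
   absent when G=0, as stated above.  The function names are
   hypothetical.  The LENGTH and FOFFSET fields that follow
   {G,E=0,F,res} are deliberately left out of the sketch.

```python
import struct


def parse_first_byte(b: int) -> dict:
    """Decode {G,E,XT} or {G,E,F,res} from the first header byte."""
    g = bool(b & 0x80)
    e = bool(b & 0x40)
    if e:
        return {"G": g, "E": True, "XT": b & 0x3F}
    return {"G": g, "E": False, "F": bool(b & 0x20), "res": b & 0x1F}


def pack_extension_header(g: bool, xt: int, length: int = 0,
                          ebits: int = 0) -> bytes:
    """Encode G, E=1, XT and, when G=1, LENGTH (13 bits) and EBITS (3 bits)."""
    if not 0 <= xt < 64:
        raise ValueError("XT must fit in 6 bits")
    first = (0x80 if g else 0) | 0x40 | xt
    if not g:
        return bytes([first])   # last object: LENGTH and EBITS absent
    if not 0 <= length < (1 << 13) or not 0 <= ebits < 8:
        raise ValueError("LENGTH or EBITS out of range")
    return bytes([first]) + struct.pack("!H", (length << 3) | ebits)


def parse_extension_header(buf: bytes):
    """Return (fields, header_size_in_bytes) for an extension object."""
    fields = parse_first_byte(buf[0])
    if not fields["E"]:
        raise ValueError("not an extension object")
    if not fields["G"]:
        return fields, 1
    if len(buf) < 3:
        raise ValueError("truncated extension header")
    (word,) = struct.unpack("!H", buf[1:3])
    fields["LENGTH"] = word >> 3
    fields["EBITS"] = word & 0x7
    return fields, 3
```

   A receiver would first inspect the E bit to decide which of the two
   one-byte layouts applies, and only then read the remaining fields.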


   4.3  Payload for the transport of ES


   An AU may be fragmented across packets.  However, AU headers and
   independently decodable partial AUs (or segments, e.g. video packets
   in the case of video streams) shall not be split across RTP packets.

   All AU-level decoder configuration information can be considered as
   information of high priority, since, if lost, the whole AU is lost.
   The extension data field may then be used for repeating the corre-
   sponding headers.


   4.4  Payload for the transport of SL-PDUs


   First SL-PDU in the payload:

   If the presence of the DTS - Decoding Time Stamp - is indicated by
   the SLConfigDescriptor, then the DTS value is placed as the first
   data of the media payload, the length of the field being provided by
   the SLConfigDescriptor.

   If the presence of the OCR - Object Clock Reference - is indicated
   by the SLConfigDescriptor, then the OCR value is placed as the sec-
   ond field of the media payload, the length of the field being pro-
   vided by the SLConfigDescriptor.

   If the payload format is used to accommodate SL-packet streams, the
   sequence number (SN), if present, can be placed as the third field
   of the media payload.  Corresponding length values are provided by
   the SLConfigDescriptor.

   If the resulting optional parameters consume a non-integer number of
   bytes, zero padding bits must be inserted at the end of these
   parameters to byte-align the rest of the payload.
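
   The concatenation and zero padding of the optional parameters can be
   sketched as follows.  The field widths are assumed to have been read
   from the SLConfigDescriptor beforehand and are passed in by the
   caller, in the order DTS, OCR, SN.  The function names are
   hypothetical, and packing the most significant bit first is an
   assumption.

```python
def pack_sl_parameters(fields) -> bytes:
    """Concatenate (value, bit_length) pairs and zero-pad to a byte boundary.

    Absent parameters are simply left out of `fields`.
    """
    acc = 0
    nbits = 0
    for value, bits in fields:
        if bits <= 0 or not 0 <= value < (1 << bits):
            raise ValueError("value does not fit its declared bit length")
        acc = (acc << bits) | value
        nbits += bits
    pad = (-nbits) % 8      # zero padding bits appended after the parameters
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big")


def unpack_sl_parameters(data: bytes, bit_lengths: list):
    """Return (values, bytes_consumed); the media data starts after that."""
    total = sum(bit_lengths)
    nbytes = (total + 7) // 8
    acc = int.from_bytes(data[:nbytes], "big") >> (nbytes * 8 - total)
    values = []
    for bits in reversed(bit_lengths):
        values.append(acc & ((1 << bits) - 1))
        acc >>= bits
    return list(reversed(values)), nbytes
```

   For example, a 4-bit and a 6-bit parameter occupy 10 bits and are
   followed by 6 padding bits, so the rest of the payload starts at the
   third byte.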






   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Payload Header | Optional Extension| Opt. parameters | Media   |
   |               | data              | as indicated by |.........|
   |               |                   | SLConfigDesc    | payload |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure 6: Portrait of the unified approach for transport of ES and
                           SL packetized streams



   In scenarios where the sync layer is used without a need for further
   protection, the payload will be as illustrated in Figure 7.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |G|E|F|  res    | optional SL header parameters as indicated by .
   +-+-+-+-+-+-+-+-+    the SLConfigDescriptor                     .
   |                                                               .
   .                  Media payload                                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



   Figure 7: Sample RTP payload for SL PDU transport.



   N SL-PDUs in one RTP packet:

   The first SL-PDU in the packet will be treated as above.  Each of
   the subsequent SL-PDUs will be a media object delimited by the G, E,
   F, res and LENGTH fields.  The corresponding media object will start
   with the SL-PDU header, immediately followed by the SL-PDU payload.
   The LENGTH field will indicate the length of the corresponding
   SL-PDU.



   4.5  Payload for the transport of FlexMux-PDUs



   The RTP payload consists of one or more complete FlexMux-PDUs as
   shown in the figures below.  Each FlexMux-PDU consists of an index
   element that identifies the content of the FlexMux-PDU, the length
   of the payload, a version number (only in the MuxCode mode) and the
   payload itself, as specified in [4].  The index, the length and the
   version (only in the MuxCode mode) elements are placed as the first
   bytes in the media payload.  If preceded by an extension data field,
   the whole payload of the packet will be as shown in Figure 8 below.
   If the extension data field is not used, then the payload will be as
   shown in Figure 9.





    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |G|E|    XT     |        LENGTH           |EBITS|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                                                               .
   +     Extension data                            +-+-+-+-+-+-+-+-+
   .                                               |G|E|F|  res    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   index       | length        | version       |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                                                               .
   .                  FlexMux PDU Payload                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


        Figure 8: Sample RTP payload for FlexMux-PDU transport with
                            protection support.





   The extension data mechanism makes it possible to apply FEC directly
   to FlexMux PDUs and hence, especially in the simple FlexMux mode, to
   apply different levels of protection to FlexMux PDUs transporting
   data from streams of different types (e.g. BIFS, OD, IPMP, Audio,
   Video).


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |G|E|    res    | index         |  length       |  version      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   .                  FlexMux PDU payload                          .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



          Figure 9: Sample RTP payload for FlexMux-PDU transport
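

   As a non-normative illustration, the layout of Figure 9 could be
   parsed as sketched below in Python. The function name, the
   dictionary keys and the muxcode_mode flag are assumptions made for
   this example only. The sketch assumes network bit order (G is the
   most significant bit of the first byte) and reads the version byte
   only in the MuxCode mode; the meaning of the G and E bits is the
   one defined earlier in this document and is not interpreted here.

```python
def parse_flexmux_payload(payload: bytes, muxcode_mode: bool) -> dict:
    """Split an RTP payload laid out as in Figure 9 (no extension data)."""
    # One flags byte, then index and length, plus version in MuxCode mode.
    header_len = 4 if muxcode_mode else 3
    if len(payload) < header_len:
        raise ValueError("payload shorter than the FlexMux header")
    first = payload[0]
    return {
        "G": (first >> 7) & 0x1,        # first bit on the wire
        "E": (first >> 6) & 0x1,        # second bit on the wire
        "res": first & 0x3F,            # 6 reserved bits
        "index": payload[1],
        "length": payload[2],
        "version": payload[3] if muxcode_mode else None,
        "pdu_payload": payload[header_len:],
    }
```

   A caller would pass muxcode_mode=True only when the session has
   been configured for the MuxCode mode.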




   5     Extension data field for FEC data



   5.1  Extension data field for Parity Codes


   The Extension data field can be used for transporting FEC (parity
   codes) data in the spirit of [8]. The XT field is set to the type
   associated with the FEC mechanism (parity codes) used. The semantics
   of the XT field, together with all the FEC mechanisms supported, are
   announced via non-RTP out-of-band signaling, such as SDP [6], with
   appropriate extensions. During the session, the FEC mechanism can
   then be adapted with simple in-band signaling, depending on the
   segment type and on the network characteristics.

   The FEC operation, as defined in [8], acts on a stream of media
   packets without extension data and generates a stream of FEC
   packets. The media payload of these media packets is then
   encapsulated in the object containing the AU data. The FEC header
   and FEC data are encapsulated in the extension data field. The
   extension data length field is set to the length of the FEC header
   plus the FEC payload.

   The FEC header in the case of parity codes is given in Figure 10.
   It is based on the header specified in [8], with the following
   modifications: 1) the PT recovery field is not used, since the
   payload type of the packets transported in a given channel is
   assumed to be known, namely the type corresponding to this proposed
   payload; 2) an R bit has been added in order to protect the marker
   bit of the media packets; 3) in order for the FEC header to be
   byte-aligned, the mask length is reduced by 2 bits (22 bits instead
   of 24). This should be acceptable, since a 24-bit mask would induce
   a very high delay.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      SN Base                  |        length recovery        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E|R|                Mask                       |               .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                 TS Recovery                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+




                  Figure 10: FEC Header for Parity Codes
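

   The field widths of Figure 10 add up to 88 bits, i.e. 11 bytes
   (16 + 16 + 1 + 1 + 22 + 32). The following Python sketch packs and
   unpacks such a header; the function names and dictionary keys are
   assumptions made for this example, and big-endian (network) order
   is assumed.

```python
def pack_parity_fec_header(sn_base, length_recovery, e, r, mask,
                           ts_recovery):
    """Pack the 11-byte FEC header of Figure 10, most significant bit first."""
    fields = ((sn_base, 16), (length_recovery, 16), (e, 1), (r, 1),
              (mask, 22), (ts_recovery, 32))
    value = 0
    for field, width in fields:
        if not 0 <= field < (1 << width):
            raise ValueError("field does not fit in %d bits" % width)
        value = (value << width) | field
    return value.to_bytes(11, "big")


def unpack_parity_fec_header(data):
    """Inverse of pack_parity_fec_header; returns a dict of field values."""
    if len(data) < 11:
        raise ValueError("FEC header needs 11 bytes")
    value = int.from_bytes(data[:11], "big")
    out = {}
    # Peel the fields off from the least significant end.
    for name, width in (("ts_recovery", 32), ("mask", 22), ("R", 1),
                        ("E", 1), ("length_recovery", 16),
                        ("sn_base", 16)):
        out[name] = value & ((1 << width) - 1)
        value >>= width
    return out
```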




   On the receiver side, the FEC packets will be reconstructed as
   defined in [8], by copying the sequence number, SSRC, CC field, RTP
   version and extension bit from the RTP header of the packets
   received.


   The fields SN base, E, Mask and TS recovery of the FEC header are
   defined as in [8]. The R bit is the marker recovery bit. It is
   computed by applying the protection operation to the marker bits M
   of the RTP media packets.

   The Length Recovery field determines the length of the recovered
   packets. Here it is computed by applying the protection operation to
   the 16-bit natural binary representation of the lengths (in bytes)
   of the media payload, CSRC list, extension and padding of the media
   packets associated with this FEC data, plus the marker bit.


   The length recovery field makes it possible to apply the procedure
   to media packets that are not of the same length.
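
   The recovery procedure can be sketched as follows in Python. This
   is an illustration under stated assumptions, not a normative
   algorithm: the protection operation is assumed to be the bitwise
   exclusive-or of the generic parity FEC scheme referenced above,
   each media packet is modelled simply as a (marker bit, protected
   bytes) pair, and the function names are hypothetical.

```python
def xor_bytes(a, b):
    """XOR two byte strings, padding the shorter one with zero bytes."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\x00"), b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def protect(packets):
    """Compute (R, length_recovery, fec_payload) over (marker, data) pairs."""
    r, length_recovery, fec_payload = 0, 0, b""
    for marker, data in packets:
        r ^= marker                      # marker recovery bit
        length_recovery ^= len(data)     # lengths are XORed as integers
        fec_payload = xor_bytes(fec_payload, data)
    return r, length_recovery & 0xFFFF, fec_payload


def recover(received, r, length_recovery, fec_payload):
    """Rebuild the single missing packet from the received ones."""
    r_rx, len_rx, body_rx = protect(received)
    marker = r ^ r_rx
    length = length_recovery ^ len_rx
    # The XOR leaves the missing data followed by zero padding.
    data = xor_bytes(fec_payload, body_rx)[:length]
    return marker, data
```

   Because the length is recovered separately, the zero padding that
   the XOR introduces for shorter packets can be stripped again, which
   is what allows packets of unequal length to be protected together.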

   The protection also applies to sync layer parameters when present in
   the payload of the media packets. The advantage of the approach -
   with respect to having separate FEC packets - is a reduced overhead
   for sending the FEC data.

   It is also proposed to allocate 3 Extension Types to parity codes,
   with 3 different default masks, in order to reduce the overhead of
   the FEC header, which would then become as shown in Figure 11 below:


   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      SN Base                  |        length recovery        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E|R|                      TS Recovery                          .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .   |res        |
   +-+-+-+-+-+-+-+-+




             Figure 11: Simplified FEC Header for Parity Codes
                           (with default masks)
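

   With the mask removed and 6 reserved bits added for byte alignment,
   the header of Figure 11 occupies 72 bits (9 bytes) instead of the 11
   bytes of Figure 10. The Python sketch below illustrates this. The XT
   code points and mask values in EXAMPLE_DEFAULT_MASKS are purely
   hypothetical placeholders, since this document does not assign
   them; sending the reserved bits as zero is also an assumption.

```python
# Hypothetical XT code points and default masks, for illustration only.
EXAMPLE_DEFAULT_MASKS = {1: 0b1, 2: 0b11, 3: 0b111}


def pack_simplified_fec_header(sn_base, length_recovery, e, r,
                               ts_recovery):
    """Pack the 9-byte header of Figure 11 (reserved bits set to zero)."""
    fields = ((sn_base, 16), (length_recovery, 16), (e, 1), (r, 1),
              (ts_recovery, 32), (0, 6))
    value = 0
    for field, width in fields:
        if not 0 <= field < (1 << width):
            raise ValueError("field does not fit in %d bits" % width)
        value = (value << width) | field
    return value.to_bytes(9, "big")


def unpack_simplified_fec_header(data, xt,
                                 default_masks=EXAMPLE_DEFAULT_MASKS):
    """Unpack Figure 11 and look up the mask implied by the XT value."""
    if len(data) < 9:
        raise ValueError("simplified FEC header needs 9 bytes")
    value = int.from_bytes(data[:9], "big") >> 6   # drop reserved bits
    out = {"mask": default_masks[xt]}
    for name, width in (("ts_recovery", 32), ("R", 1), ("E", 1),
                        ("length_recovery", 16), ("sn_base", 16)):
        out[name] = value & ((1 << width) - 1)
        value >>= width
    return out
```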



   5.2  Extension data field for Reed-Solomon Codes


   The Extension data field can be used for transporting FEC
   (Reed-Solomon codes) data in the spirit of [9]. The XT field is set
   to the type associated with the FEC mechanism (Reed-Solomon codes)
   used. The semantics of the XT field, together with all the FEC
   mechanisms supported, are announced via non-RTP out-of-band
   signaling, such as SDP [6], with appropriate extensions.

   The FEC operation, as defined in [9], acts on a stream of media
   packets without extension data and generates a stream of FEC
   packets. The media payload of these media packets is then
   encapsulated in the object containing the AU data. The FEC header
   and FEC data are encapsulated in the extension data field. The
   extension data length field is set to the length of the FEC header
   plus the FEC payload.

   The FEC header for Reed-Solomon codes is given in Figure 12. It is
   based on the header specified in [9], with the following
   modifications: 1) the PT recovery field is not used, since the
   payload type of the packets transported in a given channel is
   assumed to be known, namely the type corresponding to this proposed
   payload; 2) an R bit has been added in order to protect the marker
   bit of the media packets; 3) in order for the FEC header to be
   byte-aligned, the length of the K field is reduced to 6 bits
   instead of 8 bits, since 8 bits would allow 256 media packets to be
   processed, inducing a very high delay. The length of the N field is
   also reduced to 7 bits (corresponding to the maximum code rate of
   1/2) instead of 8 bits, and the length of the i field is
   accordingly reduced from 8 to 6 bits, since the i field indicates
   the position of the packet within the N-K FEC packets; 4) a P field
   has been added, allowing for interleaving in order to create an FEC
   code capable of correcting longer bursts of packet losses. The P
   field defines the interleaving periodicity minus 1, as illustrated
   in Figure 13 below for the special case of P=7.


   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      SN Base                  |        length recovery        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E|R|       N     |        K  |       i   |  P  |TS Recovery    .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                  TS Recovery (cont'd)         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+




               Figure 12: FEC Header for Reed-Solomon Codes
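

   The field widths of Figure 12 (16 + 16 + 1 + 1 + 7 + 6 + 6 + 3 + 32)
   again total 88 bits, i.e. 11 bytes. A table-driven Python sketch of
   packing and unpacking this header is given below; the table name,
   function names and dictionary keys are assumptions made for this
   example, and big-endian (network) order is assumed.

```python
# (name, width in bits) in transmission order, read off Figure 12.
RS_FEC_FIELDS = (("sn_base", 16), ("length_recovery", 16), ("E", 1),
                 ("R", 1), ("N", 7), ("K", 6), ("i", 6), ("P", 3),
                 ("ts_recovery", 32))


def pack_rs_fec_header(values):
    """Pack a dict of field values into the 11-byte header of Figure 12."""
    word = 0
    for name, width in RS_FEC_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError("%s does not fit in %d bits" % (name, width))
        word = (word << width) | value
    return word.to_bytes(11, "big")


def unpack_rs_fec_header(data):
    """Inverse of pack_rs_fec_header."""
    if len(data) < 11:
        raise ValueError("FEC header needs 11 bytes")
    word = int.from_bytes(data[:11], "big")
    out = {}
    # Walk the table backwards, peeling fields off the low-order end.
    for name, width in reversed(RS_FEC_FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```

   The range check makes the reduced field sizes explicit: a K value
   of 64, for instance, no longer fits in the 6-bit K field.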



   The advantage of the approach - with respect to having separate FEC
   packets - is a reduced overhead for sending the FEC data.

                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                  |  1  |  8  |  15 | 22  | ...   |mn-6 |
                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                  |  2  |  9  |  16 | 23  | ...   |mn-5 |
                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                  .                                     .
                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                  |  7  |  14 |  21 | 28  | ...   |mn   |
                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 13: Example of Interleaving (for P=7)
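

   The grouping shown in Figure 13 can be reproduced with the short
   Python sketch below, given as an illustration only. The function
   names are hypothetical, packets are numbered from 1 as in the
   figure, and the reading that each row is protected as one FEC
   block is an interpretation of the figure rather than something
   stated explicitly in the text.

```python
def interleave(num_packets, period):
    """Group packet numbers 1..num_packets into `period` rows (Figure 13)."""
    if period < 1:
        raise ValueError("period must be at least 1")
    # Row r holds packets r, r + period, r + 2 * period, ...
    return [list(range(row, num_packets + 1, period))
            for row in range(1, period + 1)]


def p_field(period):
    """Value carried in the 3-bit P field: the periodicity minus 1."""
    if not 1 <= period <= 8:
        raise ValueError("a 3-bit P field covers periodicities 1 to 8")
    return period - 1
```

   For example, interleave(28, 7) yields seven rows, the first being
   [1, 8, 15, 22] and the last [7, 14, 21, 28]. A burst of up to 7
   consecutive lost packets then removes at most one packet from each
   row.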


   6     Multiplexing


   MPEG-4 applications can involve a large number of ESs, and thus also
   a large number of RTP sessions. A multiplexing scheme allowing
   selective bundling of ESs may therefore be necessary for some
   applications. The multiplexing problem is outside the scope of this
   payload format.

   7     Security Considerations


   RTP packets transporting information with the proposed payload for-
   mat are subject to the security considerations discussed in the RTP
   specification [1]. This implies that confidentiality of the media
   streams is achieved by encryption.

   If the entire stream (extension data and AU data) is to be secured
   and all the participants are expected to have the keys to decode the
   entire stream, then the encryption is performed in the usual manner,
   and there is no conflict between the two operations (encapsulation
   and encryption).

   If a portion of the stream (e.g. the extension data) needs to be
   encrypted with a different key, or not encrypted at all,
   application-level signaling protocols would have to be aware of the
   usage of the XT field, and to exchange keys and negotiate their
   usage on the media and extension data separately.


   8     Authors' Addresses


   Christine Guillemot
   INRIA
   Campus Universitaire de Beaulieu
   35042 RENNES Cedex, FRANCE
   email: Christine.Guillemot@irisa.fr

   Paul Christ
   Computer Center - RUS University of Stuttgart
   Allmandring 30
   D70550 Stuttgart, Germany.
   email: Paul.Christ@rus.uni-stuttgart.de

   Stefan Wesner
   Computer Center - RUS University of Stuttgart
   Allmandring 30
   D70550 Stuttgart, Germany.
   email: wesner@rus.uni-stuttgart.de

   Anders Klemets
   1 Microsoft Way
   Redmond, WA 98052-6399
   USA.
   E-mail: anderskl@microsoft.com



   9     References


  [1]   H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
        Transport Protocol for Real-Time Applications", RFC 1889,
        Internet Engineering Task Force, January 1996.
  [2]   A. Klemets, "Common Generic RTP Payload Format",
        draft-klemets-generic-rtp-00, March 13, 1998.
  [3]   A. Periyannan, D. Singer, M. Speer, "Delivering Media
        Generically over RTP", draft-periyannan-generic-rtp-00,
        March 13, 1998.
  [4]   ISO/IEC 14496-1 FDIS, MPEG-4 Systems, November 1998.
  [5]   ISO/IEC 14496-2 FDIS, MPEG-4 Visual, November 1998.
  [6]   M. Handley, V. Jacobson, "SDP: Session Description Protocol",
        draft-ietf-mmusic-sdp-07.txt, April 2, 1998.
  [7]   ISO/IEC 14496-3 FDIS, MPEG-4 Audio, November 1998.
  [8]   J. Rosenberg, H. Schulzrinne, "An RTP Payload Format for
        Generic Forward Error Correction", draft-ietf-avt-fec-05.txt,
        February 26, 1999.
  [9]   J. Rosenberg, H. Schulzrinne, "An RTP Payload Format for Reed
        Solomon Codes", draft-ietf-avt-reedsolomon-00.txt, November 3,
        1998.