Internet Draft                                               S. Wenger
 Document: draft-ietf-avt-rtp-h264-02.txt               M.M. Hannuksela
 Expires: December 2003                                  T. Stockhammer
                                                          M. Westerlund
                                                              D. Singer
                                                              June 2003
                                                  Expires December 2003
 
 
 
 
                    RTP Payload Format for H.264 Video
 
 
 
 Status of this Memo
 
 This document is an Internet-Draft and is in full conformance with
 all provisions of Section 10 of RFC2026.  Internet-Drafts are working
 documents of the Internet Engineering Task Force (IETF), its areas,
 and its working groups.  Note that other groups may also distribute
 working documents as Internet-Drafts.
 
 Internet-Drafts are draft documents valid for a maximum of six months
 and may be updated, replaced, or obsoleted by other documents at any
 time.  It is inappropriate to use Internet-Drafts as reference
 material or to cite them other than as "work in progress."
 
 The list of current Internet-Drafts can be accessed at
 http://www.ietf.org/1id-abstracts.txt
 
 The list of Internet-Draft Shadow Directories can be accessed at
 http://www.ietf.org/shadow.html
 
 
 Copyright Notice
 
    Copyright (C) The Internet Society (2003).  All Rights Reserved.
 
 Abstract
 
    This memo describes an RTP Payload format for the ITU-T
    Recommendation H.264 video codec.  This codec was designed as a
    joint project of the Video Coding Experts Group (VCEG) of ITU-T and
 Wenger et. al.      Expires August 2003            [Page 1]


 Internet Draft                                          26 June, 2003
    the Moving Picture Experts Group (MPEG) of ISO/IEC.  Recommendation
    H.264 was approved by ITU-T in May 2003, and the approved draft
    specification is available for public review.  ISO/IEC
    International Standard 14496-10 will be technically identical to
    ITU-T Recommendation H.264.
 
 
 Table of Contents
 
 1. Introduction
  1.1. The H.264 codec
  1.2. Parameter Set Concept
  1.3. Network Abstraction Layer Unit Types
 2. Conventions
 3. Scope
 4. Definitions and Abbreviations
 5. RTP Payload Format
  5.1. RTP Header Usage
  5.2. Common structure of the RTP payload format
  5.3. Decoding Order Number (DON)
  5.4. Single NAL Unit Packet
  5.5. Aggregation Packets
  5.6. Fragmentation Units (FUs)
 6. Packetization Rules
  6.1. Unrestricted Mode (Multiple Picture Model)
  6.2. Restricted Mode (Single Picture Model)
 7. De-Packetization Process (Informative)
 8. Payload Format Parameters
  8.1. MIME Registration
  8.2. SDP Parameters
 9. Security Considerations
 10. Informative Appendix: Application Examples
  10.1. Video Telephony, No Slice Data Partitioning, No Packet
        Aggregation
  10.2. Video Telephony, Interleaved Packetization Using Packet
        Aggregation
  10.3. Video Telephony, with Data Partitioning
  10.4. Low-Bit-Rate Streaming
  10.5. Robust Packet Scheduling in Video Streaming
 11. Informative Appendix: Rationale for Decoding Order Number
  11.1. Introduction
  11.2. Example of Multi-Picture Slice Interleaving
  11.3. Example of Robust Packet Scheduling
  11.4. Robust Transmission Scheduling of Redundant Coded Slices
  11.5. Remarks on Other Design Possibilities
 12. Open Issues
 13. Full Copyright Statement
 14. Intellectual Property Notice
 15. References
  15.1. Normative References
  15.2. Informative References
 Annex A: Changes relative to draft-ietf-avt-rtp-h264-01.txt
 
 1.    Introduction
 
 1.1.      The H.264 codec
 
    This memo specifies an RTP payload format for the video coding
    standard known as ITU-T Recommendation H.264 and ISO/IEC
    International Standard 14496 Part 10 (also known as MPEG-4 Advanced
    Video Coding).  Recommendation H.264 was approved by ITU-T in May
    2003, and the approved draft specification is available for public
    review [1].  In this memo the H.264 acronym is used for the codec
    and the standard, but the memo is equally applicable to the ISO/IEC
    counterpart of the coding standard.
 
    The H.264 video codec has a very broad application range that
    covers all forms of digital compressed video, from low bit rate
    Internet streaming applications to HDTV broadcast and Digital
    Cinema applications with nearly lossless coding.  Most, if not all,
    relevant companies in all of these fields (including video
    conferencing, streaming, TV broadcast, and Digital Cinema) have
    participated in the standardization, which gives hope that this
    wide application range will materialize, probably within a
    relatively short time frame.  The overall performance of H.264 is
    such that bit rate savings of 50% or more, compared to the current
    state of technology, are reported.  Digital satellite TV quality,
    for example, was reported to be achievable at 1.5 Mbit/s, compared
    to the current operation point of MPEG-2 video at around 3.5 Mbit/s
    [6].
 
    The codec specification [1] itself distinguishes conceptually
    between a video coding layer (VCL) and a network abstraction layer
    (NAL).  The VCL contains the signal processing functionality of the
    codec, such as transform, quantization, motion search/compensation,
    and the loop filter.  It follows the general concept of most of
    today's video codecs: a macroblock-based coder that utilizes inter
    picture prediction with motion compensation and transform coding
    of the residual signal.  The VCL encoder outputs slices.  A slice
    is a bit string that contains the macroblock data of an integer
    number of macroblocks and the information of the slice header
    (containing the spatial address of the first macroblock in the
    slice, the initial quantization parameter, and similar).
    Macroblocks in slices are ordered in scan order unless a different
    macroblock allocation is specified, using the so-called Flexible
    Macroblock Ordering syntax.  In-picture prediction is used only
    within a slice. More information is provided in [6].
 
    The NAL encoder encapsulates the slice output of the VCL encoder
    into Network Abstraction Layer Units (NAL units), which are
    suitable for transmission over packet networks or for use in
    packet-oriented multiplex environments.  Annex B of H.264 defines
    an encapsulation process to transmit such NAL units over
    byte-stream-oriented networks.  Annex B is not relevant within the
    scope of this memo.
 
    Internally, the NAL uses NAL Units.  A NAL unit consists of a one-
    byte header and the payload byte string.  The header co-serves as
    the RTP payload header and indicates the type of the NAL unit, the
    (potential) presence of bit errors or syntax violations in the NAL
    unit payload, and information regarding the relative importance of
    the NAL unit for the decoding process.  This RTP payload
    specification is designed to be unaware of the bit string in the
    NAL unit payload.
 
    One of the main properties of H.264 is the complete decoupling of
    the transmission time, the decoding time, and the sampling or
    presentation time of slices and pictures.  The decoding process
    specified in H.264 is unaware of time, and the H.264 syntax does
    not carry information such as the number of skipped frames (as is
    common in the form of the Temporal Reference in earlier video
    compression standards).  Also, there are NAL units that affect
    many pictures and are, hence, inherently timeless.  For this
    reason, the handling of the RTP timestamp requires some special
    considerations for those NAL units for which the sampling or
    presentation time is not defined or, at transmission time,
    unknown.
 
 
 1.2.      Parameter Set Concept
 
    One very fundamental design concept of H.264 is to generate
    self-contained packets, to make mechanisms such as the header
    duplication of RFC 2429 [8] or MPEG-4's HEC [9] unnecessary.  This
    was achieved by decoupling information that is relevant to more
    than one slice from the media stream.  This higher
    layer meta information should be sent reliably, asynchronously, and
    in advance of the RTP packet stream that contains the slice
    packets.  (Provisions for sending this information in-band are also
    available for applications that do not have an out-of-band
    transport channel appropriate for the purpose.)  The combination of
    the higher-level parameters is called a parameter set.  The H.264
    specification includes two types of parameter sets: the sequence
    parameter set and the picture parameter set.  An active sequence
    parameter set remains unchanged throughout a coded video sequence,
    and an active picture parameter set remains unchanged within a
    coded picture.  The sequence and picture parameter set structures
    contain information such as
 
      o picture size,
      o optional coding modes employed, and
      o macroblock to slice group map.
 
    In order to be able to change picture parameters (such as the
    picture size) without having to transmit parameter set updates
    synchronously to the slice packet stream, the encoder and decoder
    can maintain a list of more than one sequence and picture
    parameter set.  Each slice header contains a codeword that
    indicates the sequence and picture parameter set to be used.

    This mechanism allows the transmission of parameter sets to be
    decoupled from the packet stream, so that they can be transmitted
    by external means, e.g. as a side effect of the capability
    exchange, or through a (reliable or unreliable) control protocol.
    It may even be possible that they are never transmitted but are
    fixed by an application design specification.
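
    As an informal illustration only (not part of this specification),
    the following Python sketch shows how a decoder might keep several
    parameter sets in tables and select the active pair from an
    identifier of the kind carried in the slice header.  The class
    ParameterSetStore, its method names, and the dictionary contents
    are hypothetical.  The sketch assumes that the slice header
    identifies a picture parameter set, which in turn identifies a
    sequence parameter set, as in the H.264 syntax [1].

```python
class ParameterSetStore:
    """Hypothetical decoder-side tables of parameter sets, keyed by id."""

    def __init__(self):
        self._sps = {}  # sequence parameter set id -> parameters
        self._pps = {}  # picture parameter set id -> (sps id, parameters)

    def add_sps(self, sps_id, params):
        # Parameter sets may arrive out of band, well before any slice.
        self._sps[sps_id] = dict(params)

    def add_pps(self, pps_id, sps_id, params):
        self._pps[pps_id] = (sps_id, dict(params))

    def activate(self, pps_id):
        """Return the (sequence, picture) parameters a slice refers to."""
        if pps_id not in self._pps:
            raise KeyError("picture parameter set %d not received" % pps_id)
        sps_id, pps = self._pps[pps_id]
        if sps_id not in self._sps:
            raise KeyError("sequence parameter set %d not received" % sps_id)
        return self._sps[sps_id], pps


# Two picture sizes are announced in advance; a slice then selects one
# of them by identifier, with no in-band parameter update.
store = ParameterSetStore()
store.add_sps(0, {"picture_size": (176, 144)})
store.add_sps(1, {"picture_size": (352, 288)})
store.add_pps(0, 0, {"slice_groups": 1})
store.add_pps(1, 1, {"slice_groups": 1})
sps, pps = store.activate(1)
```

    In this sketch a change of picture size costs only a different
    identifier in the slice header, which is the decoupling described
    above.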
 
 
 1.3.      Network Abstraction Layer Unit Types
 
    Tutorial information on the NAL design can be found in [10],
    [11] and [12].
 
    All NAL units consist of a single NAL unit type octet, which also
    co-serves as the payload header, followed immediately by the
    payload of the NAL unit.
 
    The syntax and semantics of the NAL unit type octet are specified
    in [1], but the essential properties of the NAL unit type octet are
    summarized below.  The NAL unit type octet has the following
    format:
 
    +---------------+
    |0|1|2|3|4|5|6|7|
    +-+-+-+-+-+-+-+-+
    |F|NRI|  Type   |
    +---------------+
 
    F: 1 bit
       forbidden_zero_bit.  A value of 0 indicates a NAL unit free of
       bit errors.  The H.264 specification declares a value of 1 as a
       syntax violation.  Hence, when the bit is set, the decoder is
       advised that bit errors or other syntax violations may be
       present in the payload or in the NAL unit type octet.  A prudent
       reaction of decoders that are incapable of handling erroneous
       bit streams is to discard such NAL units.
 
    NRI: 2 bits
       nal_ref_idc.  A value of 00 indicates that the content of the
       NAL unit is not used to reconstruct reference pictures for inter
       picture prediction.  Such NAL units can be discarded without
       risking the integrity of the reference pictures.  Values above
       00 indicate that the decoding of the NAL unit is required to
       maintain the integrity of the reference pictures.  Furthermore,
       values above 00 indicate the relative transport priority, as
       determined by the encoder.  Intelligent network elements can use
       this information to protect more important NAL units better than
       less important NAL units.  11 is the highest transport priority,
       followed by 10, then by 01 and, finally, 00 is the lowest.
 
    Type: 5 bits
       nal_unit_type.  The NAL unit payload type as defined in table
       7-1 of [1], and later within this memo.  Note that the NAL unit
       types defined in this memo are marked as unspecified in [1].
 
    For a reference of all currently defined NAL unit types and their
    semantics please refer to section 7.4.1 in [1].
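
    The bit layout above can be sketched in Python as follows.  This
    is an illustrative example, not a normative parser; the names
    NalHeader, parse_nal_header, and build_nal_header are hypothetical.
    Bit 0 in the figure is taken to be the most significant bit of the
    octet, following the usual network bit order of such diagrams.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int         # forbidden_zero_bit, 1 bit
    nri: int       # nal_ref_idc, 2 bits
    nal_type: int  # nal_unit_type, 5 bits


def parse_nal_header(octet: int) -> NalHeader:
    """Split the NAL unit type octet into its F, NRI and Type fields."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("NAL unit type octet must fit in one byte")
    return NalHeader(
        f=(octet >> 7) & 0x01,    # bit 0 of the figure
        nri=(octet >> 5) & 0x03,  # bits 1-2
        nal_type=octet & 0x1F,    # bits 3-7
    )


def build_nal_header(f: int, nri: int, nal_type: int) -> int:
    """Inverse of parse_nal_header."""
    if f not in (0, 1) or not 0 <= nri <= 3 or not 0 <= nal_type <= 31:
        raise ValueError("field out of range")
    return (f << 7) | (nri << 5) | nal_type


# 0x65 is 0110 0101 in binary: F=0, NRI=3 (binary 11), Type=5.
header = parse_nal_header(0x65)
```

    A receiver that cannot handle erroneous bit streams could discard
    any NAL unit for which header.f is 1, and a network element could
    drop NAL units with header.nri equal to 0 first under congestion,
    in line with the field descriptions above.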
 
 
 2.    Conventions
 
    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
    this document are to be interpreted as described in RFC 2119 [2].
 
    This specification uses the notion of setting and clearing a bit
    when handling bit fields. Setting a bit is the same as assigning
    that bit the value of 1 (On). Clearing a bit is the same as
    assigning that bit the value of 0 (Off).
 
 
 3.    Scope
 
    This payload specification can only be used to carry the "naked"
    H.264 NAL unit stream over RTP.  The first applications of this
    specification will likely be in the conversational multimedia
    field, such as video telephony and video conferencing.  This memo
    is not intended for use in conjunction with the byte stream format
    of Annex B of H.264.
 
 
 4.    Definitions and Abbreviations
 
 4.1.      Definitions
 
    This document uses the definitions of [1]. The following terms
    defined in [1] are summed up below for convenience:
 
       access unit: A set of NAL units always containing a primary
       coded picture.  In addition to the primary coded picture, an
       access unit may also contain one or more redundant coded
       pictures or other NAL units not containing slices or slice data
       partitions of a coded picture.  The decoding of an access unit
       always results in a decoded picture.
 
       coded video sequence: A sequence of access units that consists,
       in decoding order, of an IDR access unit followed by zero or
       more non-IDR access units including all subsequent access units
       up to but not including any subsequent IDR access unit.
 
       instantaneous decoding refresh (IDR) access unit: An access unit
       in which the primary coded picture is an IDR picture.
 
       instantaneous decoding refresh (IDR) picture: A coded picture
       containing only slices with I or SI slice types that causes a
       "reset" in the decoding process.  After the decoding of an IDR
       picture all following coded pictures in decoding order can be
       decoded without inter prediction from any picture decoded prior
       to the IDR picture.
 
       primary coded picture: The coded representation of a picture to
       be used by the decoding process for a bitstream conforming to
       H.264.  The primary coded picture contains all macroblocks of
       the picture.
 
       redundant coded picture: A coded representation of a picture or
       a part of a picture.  The content of a redundant coded picture
       shall not be used by the decoding process for a bitstream
       conforming to H.264.  The content of a redundant coded picture
       may be used by the decoding process for a bitstream that
       contains errors or losses.
 
       VCL NAL unit: A collective term used to refer to coded slice and
       coded data partition NAL units.
 
    In addition, the following definitions apply:
 
       decoding order number (DON): A field in the payload structure or
       a derived variable indicating NAL unit decoding order.  Values
       of DON are in the range of 0 to 65535, inclusive.  After
       reaching the maximum value, the value of DON wraps around to 0.
 
       NAL unit decoding order: A NAL unit order that conforms to the
       constraints on NAL unit order given in section 7.4.1.2 in [1].
 
       transmission order: The order of packets in ascending RTP
       sequence number order (in modulo arithmetic).  Within an
       aggregation packet, the NAL unit transmission order is the same
       as the order of appearance of NAL units in the packet.
 
 
 4.2.      Abbreviations
 
    DON:        Decoding Order Number
    DONB:       Decoding Order Number Base
    DOND:       Decoding Order Number Difference
    FU:         Fragmentation Unit
    IDR:        Instantaneous Decoding Refresh
    IEC:        International Electrotechnical Commission
    ISO:        International Organization for Standardization
    ITU-T:      International Telecommunication Union,
                Telecommunication Standardization Sector
    MTAP:       Multi-Time Aggregation Packet
    MTAP16:     MTAP with 16-bit timestamp offset
    MTAP24:     MTAP with 24-bit timestamp offset
    NAL:        Network Abstraction Layer
    NALU:       NAL Unit
    STAP:       Single-Time Aggregation Packet
    STAP-A:     STAP type A
    STAP-B:     STAP type B
    TS:         Timestamp
    VCL:        Video Coding Layer
 
 
 5.    RTP Payload Format
 
 5.1.      RTP Header Usage
 
    The format of the RTP header is specified in RFC 1889 [3] and
    reprinted in Figure 1 for convenience.  This payload format uses
    the fields of the header in a manner consistent with that
    specification.
 
    When encapsulating one NAL unit per RTP packet, the RECOMMENDED RTP
    payload format is specified in section 5.4.  The RTP payload (and
    the settings for some RTP header bits) for aggregation packets and
    fragmentation units are specified in sections 5.5 and 5.6,
    respectively.
 
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |V=2|P|X|  CC   |M|     PT      |       sequence number         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           timestamp                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           synchronization source (SSRC) identifier            |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    |            contributing source (CSRC) identifiers             |
    |                             ....                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    Figure 1: RTP header according to RFC 1889.
 
    The RTP header information is set as follows:
 
    Version (V): 2 bits
       Set to 2 according to RFC 1889.
 
    Padding (P): 1 bit
       Used according to RFC 1889.
 
    Extension (X): 1 bit
       Used according to RFC 1889 and profile definitions.
 
    CSRC count (CC): 4 bits
       Used according to RFC 1889.
 
    Marker bit (M): 1 bit
       Set for the very last packet of the access unit indicated by the
       RTP timestamp, in line with the normal use of the M bit and to
       allow an efficient playout buffer handling.  Decoders MAY use
       this bit as an early indication of the last packet of an access
       unit, but MUST NOT rely on this property.
       Informative note: Only one M bit is associated with an
       aggregation packet carrying multiple NAL units, and thus if a
       gateway has re-packetized an aggregation packet into several
       packets, it cannot reliably set the M bit of those packets.
 
    Payload type (PT): 7 bits
       The assignment of an RTP payload type for this new packet format
       is outside the scope of this document, and will not be specified
       here.  The assignment of a payload type needs to be performed
       either through the profile used or in a dynamic way.
 
    Sequence number (SN): 16 bits
       Increased by one for each sent packet.  Set to a random value
       during startup as per RFC 1889.
 
    Timestamp: 32 bits
       The RTP timestamp is set to the sampling timestamp of the
       content.  A 90 kHz clock rate MUST be used.  If the content is a
       part of a coded frame that was sampled as two fields having
       distinct sampling times and that is supposed to be displayed as
       fields having distinct display times, the RTP timestamp is set
       to the sampling timestamp of the latest sampled field.  In
       addition, the picture timing SEI message (subclauses D.1.2 and
       D.2.2 of [1]) SHOULD be used to convey the timestamps for
       display, and the last clock timestamp in decoding order conveyed
       in a picture timing SEI message MUST correspond to the RTP
       timestamp of the primary coded picture of the same access unit.
       If the NAL unit has no timing properties of its own (e.g.
       parameter set and SEI NAL units), the RTP timestamp is set to
       the RTP timestamp of the primary coded picture of the access
       unit in which the NAL unit is included according to section
       7.4.1.2 of [1].  The setting of the RTP timestamp for MTAPs is
       defined in section 5.5.2.
 
    Synchronization source (SSRC) identifier: 32 bits
       Used according to RFC 1889.
 
    Contributing source (CSRC) identifiers: 0 to 15 items, 32 bits each
       Used according to RFC 1889.
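
    The field settings above can be illustrated with a short Python
    sketch that packs and parses the header of Figure 1 and derives a
    timestamp from the 90 kHz clock.  This is a sketch under stated
    assumptions rather than a complete RTP implementation: the function
    names are hypothetical, padding and header extensions are left
    cleared, and the payload type 96 used in the example is only an
    assumed dynamic value, since this memo does not assign one.

```python
import struct
from typing import NamedTuple

RTP_CLOCK_HZ = 90_000  # clock rate required by this payload format


class RtpHeader(NamedTuple):
    version: int
    padding: int
    extension: int
    csrc_count: int
    marker: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: tuple


def rtp_timestamp(sampling_time_s: float, offset: int = 0) -> int:
    """Map a sampling time in seconds to a 32-bit, 90 kHz RTP timestamp.

    `offset` stands for the random initial timestamp value of RFC 1889.
    """
    return (offset + round(sampling_time_s * RTP_CLOCK_HZ)) & 0xFFFFFFFF


def pack_rtp_header(marker, payload_type, sequence_number, timestamp,
                    ssrc, csrcs=()):
    """Build the header of Figure 1 with V=2, P=0, X=0."""
    if len(csrcs) > 15:
        raise ValueError("at most 15 CSRC identifiers fit the CC field")
    byte0 = (2 << 6) | len(csrcs)
    byte1 = ((marker & 0x01) << 7) | (payload_type & 0x7F)
    fixed = struct.pack("!BBHII", byte0, byte1,
                        sequence_number & 0xFFFF,
                        timestamp & 0xFFFFFFFF,
                        ssrc & 0xFFFFFFFF)
    return fixed + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed header and the CSRC list; extensions are ignored."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = byte0 & 0x0F
    if len(packet) < 12 + 4 * cc:
        raise ValueError("packet too short for its CSRC count")
    csrcs = struct.unpack("!%dI" % cc, packet[12:12 + 4 * cc])
    return RtpHeader(byte0 >> 6, (byte0 >> 5) & 1, (byte0 >> 4) & 1, cc,
                     byte1 >> 7, byte1 & 0x7F, seq, ts, ssrc, csrcs)


# Last packet of an access unit sampled one second into the stream.
raw = pack_rtp_header(marker=1, payload_type=96, sequence_number=65535,
                      timestamp=rtp_timestamp(1.0), ssrc=0x12345678)
parsed = parse_rtp_header(raw)
```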
 
 
 5.2.      Common structure of the RTP payload format
 
    The payload format defines a number of different payload
    structures, depending on need.  Which structure a received RTP
    packet contains is evident from the first byte of the
    payload.  This byte is always structured as a NAL unit header.
    The NAL unit type field indicates which structure is present.
    The possible structures are:
 
    Single NAL Unit Packet: Contains only a single NAL unit in the
    payload.  The NAL header type field will be equal to the original
    NAL type, i.e., in the range of 1 to 23, inclusive.  Specified in
    section 5.4.
 
    Aggregation packet: Packet type used to aggregate multiple NAL
    units into a single RTP payload.  This packet exists in four
    versions: the Single-Time Aggregation Packet type A (STAP-A), the
    Single-Time Aggregation Packet type B (STAP-B), the Multi-Time
    Aggregation Packet (MTAP) with 16-bit offset (MTAP16), and the
    Multi-Time Aggregation Packet (MTAP) with 24-bit offset (MTAP24).
    The NAL type numbers assigned for STAP-A, STAP-B, MTAP16, and
    MTAP24 are 24, 25, 26, and 27, respectively.  Specified in section
    5.5.
 
    Fragmentation unit (FU): Used to fragment a single NAL unit over
    multiple RTP packets.  Identified with the NAL type number 28.
    Specified in section 5.6.
 
    Table 1 - Summary of NAL types and their payload structures.
 
    Type   Packet    Type name                       Section
    --------------------------------------------------------
    1-23   NAL unit  A single NAL unit packet          5.4
    24     STAP-A    Single-time aggregation packet    5.5.1
    25     STAP-B    Single-time aggregation packet    5.5.1
    26     MTAP16    Multi-time aggregation packet     5.5.2
    27     MTAP24    Multi-time aggregation packet     5.5.2
    28     FU        Fragmentation unit                5.6
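
    As an illustration of Table 1, a receiver could select the payload
    structure from the first payload byte roughly as follows.  The
    function name payload_structure and the returned strings are
    hypothetical, and the sketch rejects type values that Table 1 does
    not list instead of guessing how they should be handled.

```python
# Type values 24 to 28 as listed in Table 1.
_STRUCTURES = {
    24: "STAP-A",
    25: "STAP-B",
    26: "MTAP16",
    27: "MTAP24",
    28: "FU",
}


def payload_structure(payload: bytes) -> str:
    """Name the payload structure indicated by the first payload byte."""
    if not payload:
        raise ValueError("empty RTP payload")
    nal_type = payload[0] & 0x1F  # low 5 bits of the NAL unit header
    if 1 <= nal_type <= 23:
        return "single NAL unit"
    if nal_type in _STRUCTURES:
        return _STRUCTURES[nal_type]
    raise ValueError("NAL type %d is not listed in Table 1" % nal_type)
```

    For example, a payload starting with the octet 0x78 (F=0, NRI=3,
    Type=24) is classified as an STAP-A, and one starting with 0x65
    (Type=5) as a single NAL unit packet.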
 
 
 5.3.      Decoding Order Number (DON)
 
    This memo allows transmission of NAL units out of their decoding
    order.  Decoding order number (DON) is a field in the payload
    structure or a derived variable that indicates the NAL unit
    decoding order.  Rationale and example use cases for transmission
    out of decoding order and for the use of DON are given in section
    11.
 
    The coupling of transmission and decoding order is controlled by
    the optional interleaving-depth MIME parameter as follows.  When
    the value of the optional interleaving-depth MIME parameter is
    equal to 0 and transmission of NAL units out of their decoding
    order is disallowed by external means, the transmission order of
    NAL units MUST conform to the NAL unit decoding order.  When the
    value of the optional interleaving-depth MIME parameter is greater
    than 0 or transmission of NAL units out of their decoding order is
    allowed by external means,
    o the order of NAL units in an STAP-B, an MTAP16, and an MTAP24 is
      NOT REQUIRED to be the NAL unit decoding order, and
    o the order of NAL units composed by decapsulating STAP-Bs, MTAPs,
      and FUs in two consecutive packets, when neither of these packets
      is a single NAL unit packet or an STAP-A, is NOT REQUIRED to be
      the NAL unit decoding order.
 
    The RTP payload structures for a single NAL unit packet and an
    STAP-A do not include DON.  STAP-B and FU structures include DON,
    and the structure of MTAPs enables derivation of DON as specified
    in section 5.5.2.
 
    Informative note: If a transmitter wants to encapsulate one NAL
    unit per packet and transmit packets out of their decoding order,
    the STAP-B packet type can be used.
 
    The transmission order of NAL units in single NAL unit packets and
    STAP-As MUST be the same as their NAL unit decoding order.  The NAL
    units within an STAP-A MUST appear in the NAL unit decoding order.
    The transmission order of a single NAL unit packet or an STAP-A
    MUST succeed the transmission order of any NAL unit that precedes
    the single NAL unit packet or the STAP-A, respectively, in NAL unit
    decoding order.  The transmission order of a single NAL unit packet
    or an STAP-A MUST precede the transmission order of any NAL unit
    that succeeds the single NAL unit packet or the STAP-A,
    respectively, in NAL unit decoding order.
 
    Informative note: Due to the restrictions in the transmission order
    of single NAL unit packets and STAP-As, the decoding order of
    single NAL unit packets and STAP-As is the same as their
    transmission order, and all NAL units preceding a single NAL unit
    packet or an STAP-A in transmission order are decoded before the
    single NAL unit packet or the STAP-A, respectively.
 
    The DON of the first NAL unit in transmission order MAY be set to
    any value.  Values of DON are in the range of 0 to 65535,
    inclusive.  After reaching the maximum value, the value of DON
    wraps around to 0.
 
    The decoding order of two NAL units is determined as follows.  Let
    the value of DON of one NAL unit be D1 and the value of DON of
    another NAL unit be D2.  If D1 < D2 and D2 - D1 < 32768, or if D1 >
    D2 and D1 - D2 >= 32768, then the NAL unit having a value of DON
    equal to D1 precedes the NAL unit having a value of DON equal to D2
    in NAL unit decoding order.  If D1 < D2 and D2 - D1 >= 32768, or if
    D1 > D2 and D1 - D2 < 32768, then the NAL unit having a value of
    DON equal to D2 precedes the NAL unit having a value of DON equal
    to D1 in NAL unit decoding order.
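
    The comparison rule above can be written down directly.  The
    following Python sketch is illustrative only, and the function
    names don_precedes and next_don are hypothetical.  It deliberately
    follows the four conditions literally rather than using a single
    modular subtraction, because the rule is not symmetric when the
    two values differ by exactly 32768.

```python
DON_MODULUS = 65536  # DON values range from 0 to 65535, inclusive


def next_don(don: int) -> int:
    """Increment a DON value, wrapping from 65535 back to 0."""
    return (don + 1) % DON_MODULUS


def don_precedes(d1: int, d2: int) -> bool:
    """Return True if the NAL unit with DON d1 precedes the one with d2."""
    for d in (d1, d2):
        if not 0 <= d < DON_MODULUS:
            raise ValueError("DON out of range")
    if d1 < d2:
        return d2 - d1 < 32768
    if d1 > d2:
        return d1 - d2 >= 32768
    # Equal DON values impose no order between the two NAL units.
    return False
```

    With this sketch, don_precedes(65530, 5) is True because the
    sequence has wrapped around, whereas don_precedes(5, 65530) is
    False.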
 
    Values of DON related fields (DON, DONB, and DOND, see section 5.5)
    MUST be such that the decoding order determined by the values of
    DON as specified above conforms to the NAL unit decoding order.  If
    the order of two consecutive NAL units in the NAL unit stream is
    switched and the new order still conforms to the NAL unit decoding
    order, the NAL units MAY have the same value of DON.  For example,
    when arbitrary slice order is allowed by the video coding profile
    in use, all the coded slice NAL units of a coded picture are
    allowed to have the same value of DON.  Consequently, NAL units
    having the same value of DON can be decoded in any order, and two
    NAL units having a different value of DON should be passed to the
    decoder in the order specified above.
 
    An example decapsulation process to recover the NAL unit decoding
    order is given in section 7.
 
 
 5.4.      Single NAL Unit Packet
 
    The single NAL unit packet defined here MUST contain one and only
    one NAL unit of the types defined in [1].  This means that neither
    an aggregation packet nor a fragmentation unit can be used within a
    single NAL unit packet.  A NAL unit stream composed by
    decapsulating single NAL unit packets in RTP sequence number order
    MUST conform to the NAL unit decoding order.  The structure of the
    single NAL unit packet is shown in Figure 2.
 
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                          RTP Header                           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     |                       Single NAL unit                         |
     |                                                               |
     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                               :...OPTIONAL RTP padding        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    Figure 2. RTP payload format for single NAL unit packet.
 
 
 5.5.      Aggregation Packets
 
    Aggregation packets are the NAL unit aggregation scheme of this
    payload specification.  The scheme is introduced to reflect the
    dramatically different MTU sizes of two key target networks --
    wireline IP networks (with an MTU size that is often limited by the
    Ethernet MTU size -- roughly 1500 bytes), and IP or non-IP (e.g.
    ITU-T H.324/M) based wireless communication systems with preferred
    transmission unit sizes of 254 bytes or less.  In order to prevent
    media transcoding between the two worlds, and to avoid undesirable
    packetization overhead, a NAL unit aggregation scheme is
    introduced.
 
    Two types of aggregation packets are defined by this specification:
 
    o Single-time aggregation packet (STAP) aggregates NAL units with
      identical NALU-time.  Two types of STAPs are defined, one without
      DON (STAP-A) and another one including DON (STAP-B).
    o Multi-time aggregation packet (MTAP) aggregates NAL units with
      potentially differing NALU-time.  Two different MTAPs are defined
      that differ in the length of the NAL unit timestamp offset.
 
    The term NALU-time is defined as the value that the RTP timestamp
    would have if that NAL unit were transported in its own RTP
    packet.
 
    Each NAL unit to be carried in an aggregation packet is
    encapsulated in an aggregation unit.  See sections 5.5.1 and 5.5.2
    for the three different aggregation units and their
    characteristics.
 
    The structure of the RTP payload format for aggregation packets is
    presented in Figure 3.
 
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F|NRI|  type   |                                               |
    +-+-+-+-+-+-+-+-+                                               |
    |                                                               |
    |             one or more aggregation units                     |
    |                                                               |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :...OPTIONAL RTP padding        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    Figure 3. RTP payload format for aggregation packets.
 
    MTAPs and STAPs share the following packetization rules:  The RTP
    timestamp MUST be set to the earliest of the NALU-times of all the
    NAL units to be aggregated.  The type field of the NAL unit type
    octet MUST be set to the appropriate value as indicated in Table 2.
    The F bit MUST be cleared if all F bits of the aggregated NAL units
    are zero; otherwise, it MUST be set.
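
    As a minimal sketch of the timestamp and F bit rules above, a
    sender might derive both values as follows.  The function name and
    the (nalu_time, nal_unit) tuple layout are assumptions made for
    illustration and are not defined by this memo.  NALU-times are
    treated as plain integers; RTP timestamp wrap-around is not handled.

```python
def aggregation_header_fields(units):
    """units: list of (nalu_time, nal_unit_bytes) tuples (assumed layout).

    Returns (rtp_timestamp, f_bit) for the aggregation packet.
    """
    if not units:
        raise ValueError("an aggregation packet needs at least one NAL unit")
    # RTP timestamp: the earliest NALU-time among the aggregated NAL units.
    rtp_timestamp = min(t for t, _ in units)
    # F bit: cleared only if every aggregated NAL unit has F == 0.
    # F is the most significant bit of the NAL unit type octet.
    f_bit = 1 if any(nal[0] & 0x80 for _, nal in units) else 0
    return rtp_timestamp, f_bit
```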
 
 
    Table 2: Type field for STAPs and MTAPs
 
    Type   Packet    Timestamp offset   DON related fields
                     field length       (DON, DONB, DOND)
                     (in bits)          present
    --------------------------------------------------------
    24     STAP-A       0                 no
    25     STAP-B       0                 yes
    26     MTAP16      16                 yes
    27     MTAP24      24                 yes
 
    The marker bit in the RTP header MUST be set to the value the
    marker bit of the last NAL unit of the aggregation packet would
    have if it were transported in its own RTP packet.
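
    Table 2 can be expressed as a simple lookup.  The sketch below is
    illustrative only: the names AGGREGATION_TYPES, parse_type_octet
    and aggregation_kind are assumptions, and the bit positions follow
    the F, NRI and type fields shown in Figure 3 (most significant bit
    first).

```python
# Table 2: type -> (packet name, timestamp offset length in bits,
#                   DON related fields present)
AGGREGATION_TYPES = {
    24: ("STAP-A", 0, False),
    25: ("STAP-B", 0, True),
    26: ("MTAP16", 16, True),
    27: ("MTAP24", 24, True),
}


def parse_type_octet(octet):
    """Split the NAL unit type octet into F (1 bit), NRI (2 bits) and
    type (5 bits)."""
    return (octet >> 7) & 0x01, (octet >> 5) & 0x03, octet & 0x1F


def aggregation_kind(payload):
    """Return the Table 2 row for an RTP payload, or None if the payload
    is not an aggregation packet."""
    if not payload:
        raise ValueError("empty RTP payload")
    _, _, nal_type = parse_type_octet(payload[0])
    return AGGREGATION_TYPES.get(nal_type)
```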
 
    The payload of an aggregation packet consists of one or more
    aggregation units.  See sections 5.5.1 and 5.5.2 for the three
    different types of aggregation units.  An aggregation packet can
    carry as many aggregation units as necessary.  However, the total
    amount of data in an aggregation packet MUST fit into an IP
    packet, and the size SHOULD be chosen such that the resulting IP
    packet is smaller than the MTU size.  An aggregation packet MUST
    NOT contain fragmentation units specified in section 5.6.
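
    One possible way to respect the size recommendation is to group NAL
    units greedily.  The sketch below assumes STAP-A packets, whose
    payload size is one type octet plus, per NAL unit, a two-octet size
    field and the NAL unit itself.  The name group_for_stap_a and the
    max_payload parameter (an assumed budget equal to the path MTU
    minus the IP, UDP and RTP header sizes) are hypothetical.

```python
def group_for_stap_a(nal_units, max_payload):
    """Greedily split NAL units (all with the same NALU-time, in decoding
    order) into groups whose STAP-A payload fits into max_payload bytes.

    A NAL unit that does not fit even on its own is returned as a
    one-element group; it would have to be fragmented instead.
    """
    groups, current, size = [], [], 1          # 1 = STAP-A type octet
    for nal in nal_units:
        unit = 2 + len(nal)                    # 16-bit size field + NAL unit
        if current and size + unit > max_payload:
            groups.append(current)
            current, size = [], 1
        current.append(nal)
        size += unit
    if current:
        groups.append(current)
    return groups
```

    A group that ends up with a single NAL unit can be sent as a single
    NAL unit packet instead, which saves the aggregation overhead.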
 
 
 5.5.1.        Single-Time Aggregation Packet
 
    A single-time aggregation packet (STAP) SHOULD be used whenever
    aggregating NAL units that share the same NALU-time.  The payload
    of an STAP-A does not include DON and consists of at least one
    single-time aggregation unit as presented in Figure 4.  The payload
    of an STAP-B consists of a 16-bit unsigned decoding order number
    (DON) followed by at least one single-time aggregation unit as
    presented in Figure 5.
 
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                    :                                               |
    +-+-+-+-+-+-+-+-+                                               |
    |                                                               |
    |                single-time aggregation units                  |
    |                                                               |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    Figure 4. Payload format for STAP-A.
 
 
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                    :  decoding order number (DON)  |               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
    |                                                               |
    |                single-time aggregation units                  |
    |                                                               |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    Figure 5. Payload format for STAP-B.
 
    A single-time aggregation unit consists of a 16-bit unsigned size
    field that indicates the size of the following NAL unit in bytes
    (excluding these two octets, but including the NAL unit type octet
    of the NAL unit), followed by the NAL unit itself, including its
    NAL unit type octet.  A single-time aggregation unit is byte-aligned
    within the RTP payload, but it is not necessarily aligned on a
    32-bit word boundary.  Figure 6 presents the structure of the
    single-time aggregation unit.
 
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                    :        NAL unit size          |               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
    |                                                               |
    |                           NAL unit                            |
    |                                                               |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    Figure 6. Structure for single-time aggregation unit.
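
    The STAP layouts of Figures 4 to 6 could be parsed as sketched
    below.  This is an illustration under stated assumptions, not a
    normative procedure: the name parse_stap is hypothetical, multi-
    octet fields are assumed to be in network byte order, and any RTP
    padding is assumed to have been removed from the payload already.

```python
import struct


def parse_stap(payload):
    """Parse an STAP-A (type 24) or STAP-B (type 25) RTP payload.

    Returns (don, nal_units); don is None for an STAP-A.
    """
    if not payload or (payload[0] & 0x1F) not in (24, 25):
        raise ValueError("not an STAP")
    pos, don = 1, None
    if payload[0] & 0x1F == 25:                  # STAP-B: 16-bit DON follows
        if len(payload) < 3:
            raise ValueError("truncated DON field")
        (don,) = struct.unpack_from("!H", payload, pos)
        pos += 2
    nal_units = []
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        # The size covers the NAL unit including its type octet.
        if size == 0 or pos + size > len(payload):
            raise ValueError("bad NAL unit size")
        nal_units.append(payload[pos:pos + size])
        pos += size
    if not nal_units:
        raise ValueError("an STAP carries at least one aggregation unit")
    return don, nal_units
```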
 
    The NAL units within an STAP-A MUST appear in the NAL unit decoding
    order.  If the value of the optional interleaving-depth MIME
    parameter is equal to 0 or transmission of NAL units out of their
    decoding order is disallowed by other means, the NAL units within
    an STAP-B MUST appear in the NAL unit decoding order.  Otherwise,
    the NAL units within an STAP-B are NOT REQUIRED to appear in the
    NAL unit decoding order.
 
    Figure 7 presents an example of an RTP packet that contains an
    STAP-B.  The STAP contains two single-time aggregation units,
    labeled as 1 and 2 in the figure.
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                          RTP Header                           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |STAP-B NAL HDR | DON                           | NALU 1 Size   |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | NALU 1 Size   | NALU 1 HDR    | NALU 1 Data                   |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
     :                                                               |
     +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |               | NALU 2 Size                   | NALU 2 HDR    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                       NALU 2 Data                             |
     |                                                               |
     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                               :...OPTIONAL RTP padding        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    Figure 7. An example of an RTP packet including an STAP-B and two
    single-time aggregation units.
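
    The payload portion of Figure 7 (everything after the RTP header)
    could be assembled as in the following sketch.  The name
    build_stap_b is hypothetical, network byte order is assumed for the
    16-bit fields, and the NRI value is simply supplied by the caller
    because its derivation is outside the scope of this example.

```python
import struct


def build_stap_b(don, nal_units, nri=0):
    """Build the RTP payload of an STAP-B: type octet, 16-bit DON, then
    one (16-bit size, NAL unit) pair per aggregated NAL unit."""
    if not nal_units:
        raise ValueError("an STAP carries at least one aggregation unit")
    if not 0 <= don <= 0xFFFF:
        raise ValueError("DON must fit into 16 bits")
    # F is set if any aggregated NAL unit has its F bit set.
    f_bit = 1 if any(nal[0] & 0x80 for nal in nal_units) else 0
    out = bytearray([(f_bit << 7) | ((nri & 0x03) << 5) | 25])
    out += struct.pack("!H", don)
    for nal in nal_units:
        if not 0 < len(nal) <= 0xFFFF:
            raise ValueError("NAL unit size must fit into 16 bits")
        out += struct.pack("!H", len(nal)) + nal
    return bytes(out)
```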
 
 
 5.5.2.        Multi-Time Aggregation Packets (MTAPs)
 
    The NAL unit payload of MTAPs consists of a 16-bit unsigned
    decoding order number base (DONB) and one or more multi-time
    aggregation units as presented in Figure 8.  DONB MUST contain the
    smallest value of DON among the NAL units of the MTAP.  The choice
    between the different MTAP types (MTAP16 and MTAP24) is application
    dependent: a longer timestamp offset field makes the MTAP more
    flexible, but also increases the overhead.
 
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                    :  decoding order number base   |               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
    |                                                               |
    |                 multi-time aggregation units                  |
    |                                                               |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    Figure 8. NAL unit payload format for MTAPs.
 
    Two different multi-time aggregation units are defined in this
    specification.  Both of them consist of a 16-bit unsigned size
    field for the following NAL unit, an 8-bit unsigned decoding order
    number delta (DOND), and n bits of timestamp offset (TS offset) for
    this NAL unit, where n is 16 or 24.  The structures of the
    multi-time aggregation units for MTAP16 and MTAP24 are presented in
    Figure 9 and Figure 10, respectively.  Note that the starting or
    ending position of an aggregation unit within a packet is NOT
    REQUIRED to be on a 32-bit word boundary.  The DON of the following
    NAL unit is equal to DONB + DOND.  The derived DON is REQUIRED to
    be in the range of 0 to 65535, inclusive.  This memo does not
    specify how the NAL units within an MTAP are ordered, but, in most
    cases, NAL unit decoding order SHOULD be used.  The timestamp
    offset field MUST be set to the NALU-time of the NAL unit minus the
    RTP timestamp of the packet.
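
    A receiver could recover the DON and the NALU-time of each NAL unit
    in an MTAP as sketched below.  The name parse_mtap is hypothetical.
    The sketch assumes network byte order, assumes that the size field
    counts only the NAL unit (including its type octet, as for single-
    time aggregation units), and assumes that the NALU-time wraps
    modulo 2^32 like the 32-bit RTP timestamp.

```python
import struct


def parse_mtap(payload, rtp_timestamp):
    """Parse an MTAP16 (type 26) or MTAP24 (type 27) RTP payload.

    Returns a list of (don, nalu_time, nal_unit) tuples.
    """
    if len(payload) < 3 or (payload[0] & 0x1F) not in (26, 27):
        raise ValueError("not an MTAP")
    ts_len = 2 if payload[0] & 0x1F == 26 else 3  # TS offset in octets
    (donb,) = struct.unpack_from("!H", payload, 1)
    pos, units = 3, []
    while pos < len(payload):
        if pos + 3 + ts_len > len(payload):
            raise ValueError("truncated aggregation unit header")
        size, dond = struct.unpack_from("!HB", payload, pos)
        pos += 3
        ts_offset = int.from_bytes(payload[pos:pos + ts_len], "big")
        pos += ts_len
        if size == 0 or pos + size > len(payload):
            raise ValueError("bad NAL unit size")
        don = donb + dond                         # DON = DONB + DOND
        if don > 0xFFFF:
            raise ValueError("derived DON out of range")
        # NALU-time = RTP timestamp + TS offset (assumed 32-bit wrap).
        nalu_time = (rtp_timestamp + ts_offset) & 0xFFFFFFFF
        units.append((don, nalu_time, payload[pos:pos + size]))
        pos += size
    return units
```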
 
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    :        NAL unit size          |      DOND     |  TS offset    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  TS offset    |                                               |
    +-+-+-+-+-+-+-+-+              NAL unit                         |
    |                                                               |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    Figure 9. Multi-time aggregation unit for MTAP16
 
 
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    :        NAL unit size          |      DOND     |  TS offset    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         TS offset             |                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
    |                              NAL unit                         |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    Figure 10. Multi-time aggregation unit for MTAP24
 
    For the "earliest" multi-time aggregation unit in an MTAP, the
    timestamp offset MUST be zero.  Hence, the RTP timestamp of the
    MTAP itself is identical to the earliest NALU-time.
 
    Figure 11 presents an example of an RTP packet that contains a
    multi-time aggregation packet of type MTAP16 that contains two
    multi-time aggregation units, labeled as 1 and 2 in the figure.
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                          RTP Header                           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |MTAP16 NAL HDR |  decoding order number base   | NALU 1 Size   |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offset        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |  NALU 1 HDR   |  NALU 1 DATA                                  |
     +-+-+-+-+-+-+-+-+                                               +
     :                                                               |
     +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |               | NALU 2 SIZE                   |  NALU 2 DOND  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |       NALU 2 TS offset        |  NALU 2 HDR   |  NALU 2 DATA  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
     |                                                               |
     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                               :...OPTIONAL RTP padding        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    Figure 11. An example of an RTP packet including a multi-time
    aggregation packet of type MTAP16 and two multi-time aggregation
    units.
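
    The payload portion of Figure 11 could be produced as in the sketch
    below.  The name build_mtap16 and the (don, nalu_time, nal_unit)
    tuple layout are assumptions.  The sketch further assumes network
    byte order, a size field that counts only the NAL unit, a caller-
    supplied NRI value, and input values that do not wrap around (no
    DON or timestamp wrap-around handling).

```python
import struct


def build_mtap16(units, nri=0):
    """units: list of (don, nalu_time, nal_unit) tuples.

    Returns (rtp_timestamp, payload) for an MTAP16.
    """
    if not units:
        raise ValueError("an MTAP carries at least one aggregation unit")
    donb = min(don for don, _, _ in units)        # smallest DON in the MTAP
    rtp_timestamp = min(t for _, t, _ in units)   # earliest NALU-time
    f_bit = 1 if any(nal[0] & 0x80 for _, _, nal in units) else 0
    out = bytearray([(f_bit << 7) | ((nri & 0x03) << 5) | 26])
    out += struct.pack("!H", donb)
    for don, nalu_time, nal in units:
        dond, ts_offset = don - donb, nalu_time - rtp_timestamp
        if dond > 0xFF or ts_offset > 0xFFFF or not 0 < len(nal) <= 0xFFFF:
            raise ValueError("unit does not fit into an MTAP16")
        # 16-bit size, 8-bit DOND, 16-bit TS offset, then the NAL unit.
        out += struct.pack("!HBH", len(nal), dond, ts_offset) + nal
    return rtp_timestamp, bytes(out)
```

    With this construction, the unit with the earliest NALU-time always
    receives a timestamp offset of zero.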
 
 
 5.6.      Fragmentation Units (FUs)
 
    This payload type allows fragmenting a NAL unit into several RTP
    packets.  Doing so on the application layer instead of relying on
    lower layer fragmentation (e.g. by IP) has the following
    advantages:
 
    o The payload format is capable of transporting NAL units bigger
      than 64 kbytes over an IPv4 network.  Such NAL units may be
      present in pre-recorded video, particularly in High Definition
      formats (there is a limit on the number of slices per picture,
      which results in a limit on the number of NAL units per picture,
      which may result in big NAL units).
    o The fragmentation mechanism allows fragmenting a single picture
      and applying generic forward error correction (see e.g. [15]) to
      fragmentation units with identical timestamps.
 
    Fragmentation is defined only for a single NAL unit, and not for
    any aggregation packets.  A fragment of a NAL unit consists of an
    integer number of consecutive octets of that NAL unit.  Each octet
    of the NAL unit MUST be part of exactly one fragment of that NAL
    unit. Fragments of the same NAL unit MUST be sent in consecutive
    order with ascending RTP sequence numbers (with no other RTP
    packets within the same RTP packet stream being sent between the
    first and last fragment).  Similarly, a NAL unit MUST be
    reassembled in RTP sequence number order.
 
    When a NAL unit is fragmented and conveyed within fragmentation
    units (FUs), it is referred to as a fragmented NAL unit.  STAPs
    and MTAPs MUST NOT be fragmented.
 
    The RTP timestamp of an RTP packet carrying an FU is set to the
    NALU-time of the fragmented NAL unit.
 
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | FU indicator  |   FU header   |        DON (optional)         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
    |                                                               |
    |                         FU payload                            |
    |                                                               |
    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                               :...OPTIONAL RTP padding        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    Figure 12. RTP payload format for fragmentation unit.
 
    The RTP payload of a fragmentation unit consists of a
    fragmentation unit indicator of one octet, a fragmentation unit
    header of one octet, a 16-bit unsigned decoding order number (DON)
    that is conditionally present, and a fragmentation unit payload.
 
    A value equal to 28 in the NAL unit type field of the FU indicator
    octet identifies an FU.  The FU indicator octet has the following
    format:
 
    +---------------+
    |0|1|2|3|4|5|6|7|
    +-+-+-+-+-+-+-+-+
    |F|NRI|    28   |
    +---------------+
 
    The use of the F bit is described in section 1.3.  The value of the
    NRI field MUST be set according to the value of the NRI field in
    the fragmented NAL unit.
 
    The FU header has the following format:
 
    +---------------+
    |0|1|2|3|4|5|6|7|
    +-+-+-+-+-+-+-+-+
    |S|E|R|  Type   |
    +---------------+
 
    S: 1 bit
       The Start bit, when one, indicates the start of a fragmented NAL
       unit. Otherwise, when the following FU payload is not the start
       of a fragmented NAL unit payload, the Start bit is set to zero.
 
    E: 1 bit
       The End bit, when one, indicates the end of a fragmented NAL
       unit, i.e. the last byte of the payload is also the last byte of
       the fragmented NAL unit. Otherwise, when the following FU
       payload is not the last fragment of a fragmented NAL unit, the
       End bit is set to zero.
 
    R: 1 bit
       The Reserved bit MUST be equal to 0 and MUST be ignored by the
       receiver.
 
    Type: 5 bits
       The NAL unit payload type as defined in Table 7-1 of [1].
 
    DON is present when the S bit in the FU header is equal to 1.  The
    value of DON is selected as described in section 5.3.
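
    The field layout described above could be decoded as in the
    following sketch.  The name parse_fu and the returned dictionary
    keys are illustrative assumptions; network byte order is assumed
    for the DON field, and RTP padding is assumed to have been removed.

```python
import struct


def parse_fu(payload):
    """Parse the RTP payload of a fragmentation unit (type 28).

    Returns a dict with the F and NRI fields of the FU indicator, the
    S and E bits and the type of the FU header, the DON (None unless
    S == 1), and the fragment bytes.
    """
    if len(payload) < 2 or payload[0] & 0x1F != 28:
        raise ValueError("not a fragmentation unit")
    indicator, header = payload[0], payload[1]
    start, end = (header >> 7) & 1, (header >> 6) & 1
    pos, don = 2, None
    if start:                                    # DON only when S == 1
        if len(payload) < 4:
            raise ValueError("truncated DON field")
        (don,) = struct.unpack_from("!H", payload, pos)
        pos += 2
    return {
        "f": (indicator >> 7) & 1,
        "nri": (indicator >> 5) & 3,
        "type": header & 0x1F,                   # R bit (0x20) is ignored
        "start": start,
        "end": end,
        "don": don,
        "fragment": payload[pos:],
    }
```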
 
    Informative note: The DON field in FUs allows gateways to fragment
    NAL units to FUs without organizing the incoming NAL units to the
    NAL unit decoding order.
 
    A fragmented NAL unit MUST NOT be transmitted in one FU, i.e.,
    Start bit and End bit MUST NOT both be set to one in the same FU
    header.  A fragmentation unit MUST NOT contain an aggregation
    packet.  Fragmentation units of a NAL unit MUST be sent in
    consecutive packets of the same RTP stream.
 
    The FU payload consists of fragments of the payload of the
    fragmented NAL unit such that if the fragmentation unit payloads of
    consecutive FUs are sequentially concatenated, the payload of the
    fragmented NAL unit is reconstructed.  Note that the NAL unit type
    octet of the fragmented NAL unit is not included as such in the
    fragmentation unit payload.  Instead, its information is conveyed
    in the F and NRI fields of the FU indicator octet of the
    fragmentation unit and in the type field of the FU header.  An FU
    payload MAY have any number of octets and MAY be empty.
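
    A sender-side sketch of these rules follows.  The name
    fragment_nal_unit is hypothetical, and max_fragment is an assumed
    limit on the number of NAL unit payload octets per FU (it does not
    count the FU indicator, FU header or DON octets).  Network byte
    order is assumed for the DON field.

```python
import struct


def fragment_nal_unit(nal, don, max_fragment):
    """Split one NAL unit into a list of FU RTP payloads."""
    if not nal or max_fragment < 1:
        raise ValueError("need a NAL unit and a positive fragment size")
    indicator = (nal[0] & 0xE0) | 28             # F and NRI copied, type 28
    nal_type = nal[0] & 0x1F
    body = nal[1:]                               # type octet is not repeated
    chunks = [body[i:i + max_fragment]
              for i in range(0, len(body), max_fragment)]
    if len(chunks) < 2:                          # S and E never share one FU
        mid = len(body) // 2
        chunks = [body[:mid], body[mid:]]
    payloads = []
    for i, chunk in enumerate(chunks):
        start = 1 if i == 0 else 0
        end = 1 if i == len(chunks) - 1 else 0
        header = (start << 7) | (end << 6) | nal_type   # R bit stays 0
        prefix = bytes([indicator, header])
        if start:                                # DON present when S == 1
            prefix += struct.pack("!H", don)
        payloads.append(prefix + chunk)
    return payloads
```

    The resulting payloads would then be sent in consecutive RTP
    packets with ascending sequence numbers and the same timestamp.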
 
    If a fragmentation unit is lost, the receiver SHOULD discard all
    following fragmentation units in transmission order corresponding
    to the same fragmented NAL unit.
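
    A receiver-side sketch of reassembly, including the discard rule
    above, is given below.  The name reassemble_fus and the
    (rtp_seq, payload) tuple layout are assumptions.  The sketch
    expects only FU payloads (type 28) in transmission order, relies on
    the 16-bit RTP sequence number to detect gaps, and drops the DON
    for brevity.

```python
def reassemble_fus(packets):
    """packets: iterable of (rtp_seq, payload) for FU packets in
    transmission order.  Yields each completely received NAL unit."""
    expected_seq, parts = None, None
    for seq, payload in packets:
        header = payload[1]
        start, end = header & 0x80, header & 0x40
        if start:
            # Rebuild the type octet: F and NRI from the FU indicator,
            # type from the FU header.  Skip the 16-bit DON.
            nal_octet = (payload[0] & 0xE0) | (header & 0x1F)
            parts = [bytes([nal_octet]), payload[4:]]
        elif parts is not None and seq == expected_seq:
            parts.append(payload[2:])
        else:
            parts = None                         # gap: discard the rest
        expected_seq = (seq + 1) & 0xFFFF
        if parts is not None and end:
            yield b"".join(parts)
            parts = None
```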
 
 
 6.    Packetization Rules
 
    Two sets of packetization rules are distinguished, depending on
    whether NAL units belonging to more than one picture may be placed
    into a single aggregation packet (using MTAPs).
 
 
 6.1.      Unrestricted Mode (Multiple Picture Model)
 
    This mode MAY be supported by some receivers.  Usually, the
    capability of a receiver to support this mode is implied by the
    application, or indicated by external control protocol means.  The
    use of this mode MUST be signaled with the optional mtap-allowed
    MIME or SDP parameter, if MIME or SDP signaling is in use.  The
    following packetization rules MUST be enforced by the sender:
 
    o All packet types specified in this memo MAY be used.
 
    o The transmission order of packets and NAL units is constrained as
      specified in section 5.3.  Some receivers might not support a
      transmission order that does not conform to the NAL unit decoding
      order.
 
    o Coded slice NAL units or coded slice data partition NAL units
      belonging to the same coded picture (and hence sharing the same
      RTP timestamp value) MAY be sent in any order permitted by the
      applicable profile defined in [1], although, for delay-critical
      systems, they SHOULD be sent in their original coding order to
      minimize the delay.  Note that the coding order is not
      necessarily the scan order, but the order in which the NAL units
      become available to the RTP stack.
 
    o Sequence and picture parameter set NAL units MUST NOT be sent in
      an RTP session whose parameter sets were already changed by
      control protocol messages during the lifetime of the RTP session.
 
    o Network elements such as gateways MAY convert single NAL unit
      packets into one aggregation packet, convert an aggregation
      packet into several single NAL unit packets, or mix both
      concepts.  However, when doing so they SHOULD take into account
      at least the following parameters: path MTU size, unequal
      protection mechanisms (e.g. through packet-based FEC according to
      RFC 2733, carried by RFC 2198, especially for sequence and picture
      parameter set NAL units and coded slice data partition A NAL
      units), bearable latency of the system, and buffering
      capabilities of the receiver.
 
    o Network elements such as gateways MUST NOT duplicate any NAL unit
      except for sequence or picture parameter set NAL units, because
      neither this memo nor the H.264 specification provides means to
      identify duplicated NAL units.  Sequence and picture parameter
      set NAL units MAY be duplicated to make their correct reception
      more probable, but any such duplication MUST NOT affect the
      contents of any active sequence or picture parameter set.
 
 
 6.2.      Restricted Mode (Single Picture Model)
 
    This mode MUST be supported by all receivers.  It is primarily
    intended for low-delay applications.  Its main difference from the
    Unrestricted Mode is to forbid the packetization of data belonging
    to more than one picture in a single RTP packet.  Hence, MTAPs MUST
    NOT be used.  The following packetization rules MUST be enforced by
    the sender:
 
    o All rules of the Unrestricted Mode above, but MTAPs MUST NOT be
      used.  This implies that aggregation packets MUST NOT include
      coded slices or coded slice data partitions belonging to
      different access units.
 
 
 7.    De-Packetization Process (Informative)
 
    The de-packetization process is implementation dependent.  Hence,
    the following description should be seen as an example of a
    suitable implementation.  Other schemes may be used as well.
    Optimizations relative to the described algorithms are likely
    possible.
 
    The general concept behind these de-packetization rules is to
    reorder NAL units from transmission order to the NAL unit decoding
    order.
 
    The receiver includes a receiver buffer, which is used to reorder
    packets from transmission order to the NAL unit decoding order.  The
    receiver may use the following guidelines when determining the size
    of the receiver buffer.  The optional interleaving-depth MIME
    parameter indicates the size of the receiver buffer as the number
    of VCL NAL units.  The size of the receiver buffer in bytes may be
    estimated by multiplying the optional initial-buffering-time MIME
    parameter with the bandwidth-value of the corresponding media level
    SDP description parameter, if available (and taking into account
    the necessary conversions between the units of initial-buffering-
    time and bandwidth-value).  The receiver should take buffering for
    transmission delay jitter into account too and either reserve a
    separate buffer for transmission delay jitter buffering or combine
    the buffer for transmission delay jitter with the receiver buffer.
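
    The byte-size estimate described above amounts to a single
    multiplication once the units are aligned.  In the sketch below,
    the name receiver_buffer_bytes is hypothetical, the initial
    buffering time is assumed to have been converted to seconds by the
    caller, and the bandwidth-value is assumed to be in kilobits per
    second with 1 kilobit equal to 1000 bits.  Jitter buffering is not
    included.

```python
import math


def receiver_buffer_bytes(initial_buffering_seconds, bandwidth_kbps):
    """Estimate the receiver buffer size in bytes (rounded up).

    Both parameters are assumptions of this sketch: a buffering time in
    seconds and a bandwidth in kilobits per second.
    """
    if initial_buffering_seconds < 0 or bandwidth_kbps < 0:
        raise ValueError("inputs must be non-negative")
    bits = initial_buffering_seconds * bandwidth_kbps * 1000
    return math.ceil(bits / 8)
```

    For example, half a second of buffering at 384 kbit/s corresponds
    to 24000 bytes under these assumptions.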
 
    The receiver stores incoming NAL units in reception order into the
    receiver buffer as follows.  NAL units contained in a single NAL
    unit packet are stored into the receiver buffer as such.  NAL units
    of aggregation packets are stored into the receiver buffer
    individually.  For each NAL unit of a single NAL unit packet and an
    aggregation packet, the RTP sequence number of the packet that
    contained the NAL unit is stored and associated with the stored NAL
    unit.  All fragments of a fragmented NAL unit are collected and
    stored into the receiver buffer.  For a fragmented NAL unit, the
    RTP sequence number of the RTP packet containing the first fragment
    in transmission order is stored and associated with the fragmented
    NAL unit.  Moreover, for each NAL unit, the packet type (single NAL
    unit packet, STAP-A, STAP-B, MTAP16, MTAP24, or fragmentation unit)
    that contained the NAL unit is stored and associated with each
    stored NAL unit.  Furthermore, for NAL units carried in STAP-Bs,
    MTAPs, and FUs, decoding order number (DON) is calculated and
    stored.
 
    Hereinafter, let N be the value of the optional interleaving-depth
    MIME type parameter (see section 8.1).  When the RTP session is
    initialized, the receiver buffers at least N VCL NAL units, or
    buffers for at least the duration given by the optional
    initial-buffering-time MIME parameter, if available, whichever
    results in the shorter initial buffering duration, before passing
    any packet to the decoder.
 
    If the receiver buffer contains at least N VCL NAL units, NAL units
    are removed from the receiver buffer and passed to the decoder in
    the order specified below until the buffer contains N-1 VCL NAL
    units.  Variable ts is set to the value of a system timer that was
    initialized to 0 when the first packet of the NAL unit stream was
    received.  If the receiver buffer contains a NAL unit whose
    reception time tr fulfills the condition that ts - tr > initial-
    buffering-time, NAL units are passed to the decoder (and removed
    from the receiver buffer) in the order specified below until the
    receiver buffer contains no NAL unit whose reception time tr
    fulfills the specified condition.  Note that transmission delay
    jitter should be taken into account in the calculations with
    timestamps.
 
    The order in which NAL units are passed to the decoder is specified as
    follows:
 
    o Let PDON be a variable that is initialized to 0 at the beginning
      of an RTP session.
 
    o If the oldest RTP sequence number in the buffer corresponds to a
      single NAL unit packet or an STAP-A, the NAL unit in the single
      NAL unit packet or the STAP-A, respectively, is the next NAL unit
      in the NAL unit decoding order.  The value of PDON is set to 0.
 
    o If the oldest RTP sequence number in the buffer corresponds to an
      STAP-B, an MTAP, or an FU, the NAL unit decoding order is
      recovered among the NAL units conveyed in STAP-Bs, MTAPs, and FUs
      in RTP sequence number order until the next single NAL unit
      packet or the next STAP-A (exclusive).  This set of NAL units is
      hereinafter referred to as the candidate NAL units.  If no NAL
      units conveyed in single NAL unit packets or STAP-As reside in
      the receiver buffer, all NAL units belong to candidate NAL units.
 
    o For each NAL unit among the candidate NAL units, a DON distance
      is calculated as follows.  If the value of DON of the NAL unit is
      larger than the value of PDON, the DON distance is equal to DON -
      PDON.  Otherwise, the DON distance is equal to 65535 - PDON + DON
      + 1.  NAL units are delivered to the decoder in ascending order
      of DON distance.  When a desired number of NAL units have been
      passed to the decoder, the value of PDON is set to the value of
      DON for the last NAL unit passed to the decoder.
 
    o If several NAL units share the same DON distance, they can be
      passed to the decoder in any order.
 
    o If the video decoder in use does not support Arbitrary Slice
      Order, coded slices and coded slice data partitions A are passed
      to the decoder in ascending order of the first_mb_in_slice syntax
      element in the slice header.  Moreover, coded slice data
      partitions B and C immediately follow the corresponding coded
      slice data partition A in decoding order.
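
    The DON distance rule above can be sketched as follows.  This is a
    minimal illustration, not a normative algorithm; the function names
    and the (DON, NAL unit) tuple representation are assumptions made
    for the example.  Note that a DON equal to PDON yields a distance
    of 65536 under the rule as written.

```python
def don_distance(don, pdon):
    """DON distance as specified above (DON and PDON are 16-bit values)."""
    if don > pdon:
        return don - pdon
    # Otherwise branch: also taken when don == pdon, giving 65536.
    return 65535 - pdon + don + 1


def order_candidates(candidates, pdon):
    """Sort candidate NAL units in ascending order of DON distance.

    `candidates` is a list of (don, nal_unit) tuples; this layout is a
    hypothetical choice for the sketch.  sorted() is stable, so units
    with the same DON distance keep their arrival order, which is one
    of the orders the text permits.
    """
    return sorted(candidates, key=lambda c: don_distance(c[0], pdon))


if __name__ == "__main__":
    pdon = 65534
    buffered = [(2, "c"), (65535, "a"), (0, "b")]
    for don, name in order_candidates(buffered, pdon):
        print(name, don, don_distance(don, pdon))
```

    After the chosen NAL units have been passed to the decoder, a
    receiver built this way would set PDON to the DON of the last unit
    passed.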
 
    The following additional de-packetization rules may be used to
    implement an operational H.264 de-packetizer:
 
    o Intelligent RTP receivers (e.g. in gateways) may identify lost
      coded slice data partitions A (DPAs). If a lost DPA is found, a
      gateway may decide not to send the corresponding coded slice data
      partitions B and C, as their information is meaningless for H.264
      decoders.  In this way a network element can reduce network load
      by discarding useless packets, without parsing a complex bit
      stream.
 
    o Intelligent receivers may discard all packets in which the value
      of the NRI field of the NAL unit type octet is equal to 0.
      However, they should process those packets if possible, because
      the user experience may suffer if the packets are discarded.
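
    The two rules above only require inspection of the NAL unit type
    octet.  The sketch below assumes the octet layout of one forbidden
    bit, two NRI bits, and five type bits, and the H.264 NAL unit type
    values 2, 3, and 4 for coded slice data partitions A, B, and C.
    The function names, and the dpa_received flag that stands in for
    the matching of partitions to their DPA, are hypothetical.

```python
# NAL unit types for coded slice data partitions A, B and C in H.264.
DPA, DPB, DPC = 2, 3, 4


def parse_nal_header(octet):
    """Split the NAL unit type octet into (F, NRI, Type)."""
    forbidden_zero_bit = (octet >> 7) & 0x01
    nri = (octet >> 5) & 0x03
    nal_type = octet & 0x1F
    return forbidden_zero_bit, nri, nal_type


def may_discard(octet):
    """True if NRI is 0, i.e. the packet may be dropped under load."""
    _, nri, _ = parse_nal_header(octet)
    return nri == 0


def should_forward(octet, dpa_received):
    """Gateway decision: drop partitions B and C whose DPA was lost.

    `dpa_received` must be supplied by the caller; finding the DPA that
    corresponds to a given partition is outside this sketch.
    """
    _, _, nal_type = parse_nal_header(octet)
    if nal_type in (DPB, DPC) and not dpa_received:
        return False
    return True


if __name__ == "__main__":
    print(parse_nal_header(0x65))
    print(may_discard(0x01), should_forward(0x43, dpa_received=False))
```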
 
 
 8.    Payload Format Parameters
 
    This section specifies the parameters that MAY be used to select
    optional features of the payload format.  The parameters are
    specified here as part of the MIME subtype registration for the
    ITU-T H.264 | ISO/IEC 14496-10 codec.  A mapping of the parameters
    into the Session Description Protocol (SDP) [4] is also provided
    for those applications that use SDP.  Equivalent parameters could
    be defined elsewhere for use with control protocols that do not use
    MIME or SDP.
 
 
 8.1.      MIME Registration
 
    The MIME subtype for the ITU-T H.264 | ISO/IEC 14496-10 codec is
    allocated from the IETF tree.
 
    Any unspecified parameter MUST be ignored by the receiver.
 
    Media Type name:     video
 
    Media subtype name:  H264
 
    Required parameters: none
 
    Optional parameters:
       profile-level-id: A profile-level element used in specifying the
                         value of this parameter is generated by
                         forming a string of hexadecimal
                         representations of the following three bytes
                         in the sequence parameter set NAL unit
                         specified in [1]: 1) profile_idc, 2) a byte
                         herein referred to as profile-iop, composed of
                         the values of constrained_set0_flag,
                         constrained_set1_flag, constrained_set2_flag,
                         and reserved_zero_5bits in bit-significance
                         order starting from the most significant bit,
                         and 3) level_idc.  Note that
                         reserved_zero_5bits is required to be equal to
                         0 in [1], but other values for it may be
                         specified in the future by ITU-T or ISO/IEC.
                         The value of profile-level-id is a sequence of
                         profile-level elements.  If this parameter is
                         used for indicating properties of a NAL unit
                         stream, it indicates the profiles that are in
                         use in the stream and the highest level that
                         is in use for each signaled profile.  The
                         profile-iop byte for each signaled profile
                         indicates whether the NAL unit stream also
                         obeys all constraints of the indicated
                         profiles as follows.  If bit 7 (the most
                         significant bit), bit 6, or bit 5 of profile-
                         iop is equal to 1, all constraints of the
                         Baseline profile, the Main profile, or the
                         Extended profile, respectively, are obeyed in
                         the NAL unit stream.  If the profile-level-id
                         parameter is used for capability exchange or
                         session setup procedure, it indicates the
                         profiles that the codec supports and the
                         highest level that is supported for each
                         signaled profile.  The profile-iop byte for
                         each signaled profile indicates whether the
                         codec has such additional limitations that
                         only the common subset of the algorithmic
                         features and limitations of the profiles
                         signaled with the profile-iop byte and the
                         profile indicated by profile_idc is supported
                         by the codec.  For example, if a codec
                         supports the Baseline Profile at level 3 and
                         below and the Main Profile at level 2.1 and
                         below without any additional limitations, the
                         profile-level-id becomes 42A01E4D4015.  If a
                         codec supports only the common subset of the
                         coding tools of the Baseline profile and the
                         Main profile at level 2.1 and below, the
                         profile-level-id becomes 42E015.  If no
                         profile-level-id is present, the Baseline
                         Profile without additional constraints at
                         Level 1 MUST be implied.
 
       parameter-sets:   This parameter MAY be used to convey such
                         sequence and picture parameter set NAL units,
                         herein referred to as the initial parameter
                         set NAL units, that MUST precede any other NAL
                         units in decoding order.  The parameter MUST
                         NOT be used to indicate codec capability in
                         any capability exchange procedure.  The value
                         of the parameter is the hexadecimal
                         representation of the initial parameter set
                         NAL units as specified in sections 7.3.2.1 and
                         7.3.2.2 of [1].  The parameter sets are
                         conveyed in decoding order and no framing of
                         the parameter set NAL units takes place.  A
                         comma is used to separate any pair of
                         parameter sets in the list.  Note that a
                         parameter set NAL unit is typically less than
                         10 bytes long, but a picture parameter set NAL
                         unit can contain several hundred bytes.
 
       interleaving-depth: This parameter MAY be used to signal the
                         properties of a NAL unit stream or the
                         capabilities of a transmitter or receiver
                         implementation.  The parameter specifies the
                         maximum number of VCL NAL units that precede
                         any VCL NAL unit in the NAL unit stream in
                         transmission order and follow the VCL NAL unit
                         in decoding order.  If the parameter is not
                         present, then a value of 0 MUST be used for
                         interleaving-depth.  The value of
                         interleaving-depth MUST be an integer in the
                         range from 0 to 32767, inclusive.
 
       initial-buffering-time: This parameter MAY be used to signal the
                         properties of a NAL unit stream or the
                         capabilities of a transmitter or receiver
                         implementation.  The parameter is a decimal
                         representation of the maximum time in clock
                         ticks of a 90-kHz clock between the
                         transmission time of any NAL unit and the
                         decoding time of the NAL unit assuming
                         reliable and instantaneous transmission and
                         the same timeline for transmission and
                         decoding.  If the parameter is not present,
                         then a value of 0 MUST be used for initial-
                         buffering-time.  The value of initial-
                         buffering-time MUST be an integer in the range
                         from 0 to 4 294 967 295, inclusive.  Receivers
                         SHOULD take transmission delay jitter
                         buffering, including buffering for the delay
                         jitter caused by mixers, translators,
                         gateways, proxies, traffic-shapers and other
                         network elements, into account in addition to
                         the signaled initial-buffering-time.
 
       mtap-allowed:    Permissible values are 0 and 1.  If 0 or not
                         present, MTAPs MUST NOT be present in the NAL
                         unit stream. If 1, MTAPs MAY be present in the
                         NAL unit stream.
 
    Encoding considerations:
                         This type is only defined for transfer via RTP
                         (RFC 1889).
 
    Security considerations:
                         See section 9 of RFC XXXX. [Ed.Note: to be
                         replaced with the RFC number of this
                         specification]
 
    Public specification:
                         Please refer to RFC XXXX [Ed.Note: to be
                         replaced with the RFC number of this
                         specification] and its section 15.
 
    Additional information:
                         None
 
    File extensions:     none
    Macintosh file type code: none
    Object identifier or OID: none
 
    Person & email address to contact for further information:
                         stewe@cs.tu-berlin.de
 
    Intended usage:      COMMON.
 
    Author/Change controller:
                         stewe@cs.tu-berlin.de
                         IETF Audio/Video transport working group
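
    As an illustration of the profile-level-id parameter defined above,
    the following sketch forms a profile-level element from
    profile_idc, the three constrained_set flags, and level_idc, and
    splits a parameter value back into its elements.  The function
    names are hypothetical, and the sketch assumes that level_idc is
    ten times the level number (30 for level 3, 21 for level 2.1).

```python
def profile_level_element(profile_idc, set0, set1, set2, level_idc,
                          reserved_zero_5bits=0):
    """Return one six-digit hexadecimal profile-level element."""
    # profile-iop: flags packed starting from the most significant bit.
    profile_iop = ((int(set0) << 7) | (int(set1) << 6) | (int(set2) << 5)
                   | (reserved_zero_5bits & 0x1F))
    return "%02X%02X%02X" % (profile_idc, profile_iop, level_idc)


def parse_profile_level_id(value):
    """Split a profile-level-id value into (profile_idc, iop, level_idc)."""
    if len(value) % 6 != 0:
        raise ValueError("expected a sequence of six-digit elements")
    elements = []
    for i in range(0, len(value), 6):
        chunk = bytes.fromhex(value[i:i + 6])
        elements.append((chunk[0], chunk[1], chunk[2]))
    return elements


if __name__ == "__main__":
    baseline = profile_level_element(66, True, False, True, 30)
    main = profile_level_element(77, False, True, False, 21)
    print(baseline + main)
    print(parse_profile_level_id(baseline + main))
```

    With these inputs the sketch reproduces the example values given in
    the parameter description (42A01E4D4015 and, with all three flags
    set at level 2.1, 42E015).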
 
 
 8.2.      SDP Parameters
 
    The MIME media type video/H264 string is mapped to fields in the
    Session Description Protocol (SDP) [4] as follows:
 
    o The media name in the "m=" line of SDP MUST be video.
 
    o The encoding name in the "a=rtpmap" line of SDP MUST be H264 (the
      MIME subtype).
 
    o The clock rate in the "a=rtpmap" line MUST be 90000.
 
    o The "a=fmtp" line of SDP MUST contain the optional parameters
      "profile-level-id", "parameter-sets", "interleaving-depth",
      "initial-buffering-time", and "mtap-allowed", if any, to indicate
      the coder capability and configuration, respectively.  These
      parameters are expressed as a MIME media type string, in the form
      of a semicolon-separated list of parameter=value pairs.
 
    An example of media representation in SDP is as follows (Baseline
    Profile, Level 3.0, more than one slice group, arbitrary slice
    ordering, and redundant slices are in use):
 
    m=video 49170/2 RTP/AVP 98
    a=rtpmap:98 H264/90000
    a=fmtp:98 profile-level-id=42A01E
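
    A receiver-side reading of such an "a=fmtp" line might look like
    the sketch below.  It is not a complete SDP parser; the function
    names and the returned dictionary layout are assumptions for this
    example.  Absent parameters take the defaults stated in section
    8.1, unknown parameters are ignored, and an absent profile-level-id
    is left as None to stand for the implied Baseline Profile at
    Level 1.

```python
INTEGER_PARAMETERS = ("interleaving-depth", "initial-buffering-time",
                      "mtap-allowed")


def parse_h264_fmtp(line):
    """Return (payload_type, parameters) for one a=fmtp line."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, parameter_string = line[len(prefix):].partition(" ")
    parameters = {
        "profile-level-id": None,   # None: Baseline, Level 1 implied
        "parameter-sets": [],
        "interleaving-depth": 0,
        "initial-buffering-time": 0,
        "mtap-allowed": 0,
    }
    for item in parameter_string.split(";"):
        name, _, value = item.strip().partition("=")
        name, value = name.strip(), value.strip()
        if name in INTEGER_PARAMETERS:
            parameters[name] = int(value)
        elif name == "parameter-sets":
            # Comma-separated hexadecimal parameter set NAL units.
            parameters[name] = [bytes.fromhex(v) for v in value.split(",")]
        elif name == "profile-level-id":
            parameters[name] = value
        # Any other parameter is ignored by the receiver.
    return int(payload_type), parameters


def buffering_seconds(initial_buffering_time):
    """Convert 90-kHz clock ticks to seconds."""
    return initial_buffering_time / 90000.0


if __name__ == "__main__":
    print(parse_h264_fmtp("a=fmtp:98 profile-level-id=42A01E"))
```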
 
 
 9.    Security Considerations
 
    RTP packets using the payload format defined in this specification
    are subject to the security considerations discussed in the RTP
    specification [1], and any appropriate RTP profile (for example
    [2]).
    This implies that confidentiality of the media streams is achieved
    by encryption.  Because the data compression used with this payload
    format is applied end-to-end, encryption may be performed after
    compression so there is no conflict between the two operations.
 
    A potential denial-of-service threat exists for data encodings
    using compression techniques that have non-uniform receiver-end
    computational load.  The attacker can inject pathological datagrams
    into the stream which are complex to decode and cause the receiver
    to be overloaded.  H.264 is particularly vulnerable to such attacks
    because it is extremely simple to generate datagrams containing NAL
    units that affect the decoding process of many future NAL units.
 
    As with any IP-based protocol, in some circumstances a receiver may
    be overloaded simply by the receipt of too many packets, either
    desired or undesired.  Network-layer authentication may be used to
    discard packets from undesired sources, but the processing cost of
    the authentication itself may be too high.  In a multicast
    environment, pruning of specific sources may be implemented in
    future versions of IGMP [5] and in multicast routing protocols to
    allow a receiver to select which sources are allowed to reach it.
 
 
 10.     Informative Appendix: Application Examples
 
    This payload specification is very flexible in its use, to cover
    the extremely wide application space that is anticipated for
    H.264.  However, such great flexibility also makes it difficult
    for an implementer to decide on a reasonable packetization scheme.
    Some information on how to apply this specification to real-world
    scenarios is likely to appear in the form of academic publications
    and test model software with an accompanying description in the
    near future.  However, some preliminary usage scenarios are
    described here as well.
 
 
 10.1.       Video Telephony according to ITU-T Recommendation H.241
      Annex A
 
    H.323-based video telephony systems that use H.264 as an optional
    video compression scheme are required to support H.241 Annex A as a
    packetization scheme.  The packetization mechanism defined in this
    Annex is technically identical to a small subset of this
    specification.
    When operating according to H.241 Annex A, parameter set NAL units
    are sent in-band.  Only single NAL unit packets are used.  A
    typical packet stream generated by such a system consists of all
    sequence and picture parameter sets used for the future video
    sequence, possibly sent in more than one copy to raise the
    likelihood of their arrival at the receiver, followed by the
    packets carrying the NAL units of the IDR picture, and followed by
    packets carrying the subsequent pictures.  Many such systems do not
    send IDR pictures regularly, but only when required by user
    interaction or by control protocol means, e.g. when switching
    between video channels in a Multipoint Control Unit.
 
 10.2.       Video Telephony, No Slice Data Partitioning, No NAL unit
      Aggregation
 
    The RTP part of this scheme is implemented and tested (though not
    the control-protocol part, see below).
 
    In most real-world video telephony applications, the picture
    parameters such as picture size or optional modes never change
    during the lifetime of a connection.  Hence, all necessary
    parameter sets (usually only one) are sent as a side effect of the
    capability exchange/announcement process e.g. according to the SDP
    syntax specified in section 8.2 of this document.  Since all
    necessary parameter set information is established before the RTP
    session starts, there is no need for sending any parameter set NAL
    units.  Slice data partitioning is not used either.  Hence, the RTP
    packet stream consists basically of NAL units that carry single
    coded slices.
 
    The size of those single-slice NAL units is chosen by the encoder
    such that they offer the best performance.  Often, this is done by
    adapting the coded slice size to the MTU size of the IP network.
    For small picture sizes this may result in a one-picture-per-one-
    packet strategy.  The loss of packets and the resulting drift-
    related artifacts are cleaned up by Intra refresh algorithms.
 
 
 10.3.       Video Telephony, Interleaved Packetization Using NAL unit
      Aggregation
 
    This scheme allows better error concealment and is widely used in
    H.263-based designs using RFC 2429 packetization.  It has also been
    implemented, and good results were reported [10].
 
    The VCL encoder codes the source picture such that all macroblocks
    (MBs) of one MB line are assigned to one slice.  All slices with
    even MB row addresses are combined into one STAP, and all slices
    with odd MB row addresses into another STAP.  Those STAPs are
    transmitted as RTP packets.  The establishment of the parameter
    sets is performed as discussed above.
 
    Note that the use of STAPs is essential here, because the high
    number of individual slices (18 for a CIF picture) would lead to
    unacceptably high IP/UDP/RTP header overhead (unless the source
    coding tool FMO is used, which is not assumed in this scenario).
    Furthermore, some wireless video transmission systems, such as
    H.324M and the IP-based video telephony specified in 3GPP, are
    likely to use relatively small transport packet size.  For example,
    a typical MTU size of H.223 AL3 SDU is around 100 bytes [13].
    Coding individual slices according to this packetization scheme
    provides a further advantage in communication between wired and
    wireless networks, as individual slices are likely to be smaller
    than the preferred maximum packet size of wireless systems.
    Consequently, a gateway can convert the STAPs used in a wired
    network to several RTP packets with only one NAL unit that are
    preferred in a wireless network and vice versa.
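
    The even/odd split and the header overhead argument of this section
    can be illustrated with a short sketch.  It assumes 16x16
    macroblocks, a CIF picture height of 288 lines, and 40 bytes of
    IPv4/UDP/RTP headers per packet (20 + 8 + 12, without IP options or
    CSRC entries).  The function names are hypothetical, and the bytes
    added inside an STAP are not counted.

```python
MACROBLOCK_SIZE = 16                    # luma samples per MB edge
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12   # assumed IPv4 + UDP + RTP


def macroblock_rows(picture_height):
    """Number of macroblock rows, i.e. slices with one slice per row."""
    return picture_height // MACROBLOCK_SIZE


def split_rows_into_staps(num_rows):
    """Return the MB row addresses carried by the two STAPs."""
    even_rows = list(range(0, num_rows, 2))
    odd_rows = list(range(1, num_rows, 2))
    return even_rows, odd_rows


def header_overhead(num_packets):
    """IP/UDP/RTP header bytes spent on a picture."""
    return num_packets * IP_UDP_RTP_HEADER_BYTES


if __name__ == "__main__":
    rows = macroblock_rows(288)          # CIF: 352x288
    even, odd = split_rows_into_staps(rows)
    print(rows, even, odd)
    print(header_overhead(rows), header_overhead(2))
```

    Under these assumptions, sending each of the 18 slices in its own
    packet costs 720 header bytes per picture, against 80 bytes for
    the two STAPs.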
 
 
 10.4.       Video Telephony, with Data Partitioning
 
    This scheme is implemented and was shown to offer good performance
    especially at higher packet loss rates [10].
 
    Data Partitioning is known to be useful only when some form of
    unequal error protection is available.  Normally, in single-session
    RTP environments, even error characteristics are assumed --
    statistically, the packet loss probability of all packets of the
    session is the same.  However, there are means to reduce the packet
    loss probability of individual packets in an RTP session.  RFC 2198
    [14], for example, allows carrying a redundant copy of an essential
    packet in the next RTP packet.  Packet-based Forward Error
    Correction [15] carried in RFC2198 is also an appropriate means to
    protect high priority information.
 
    In all cases, the incurred overhead is substantial, but in the same
    order of magnitude as the number of bits that would otherwise have
    to be spent for intra information.  However, this mechanism does
    not add any delay to the system.
 
    Again, the complete parameter set establishment is performed
    through control protocol means.
 
 
 10.5.       Low-Bit-Rate Streaming
 
    This scheme has been implemented with H.263 and gave good results
    [16].  There is no technical reason why similarly good results
    could not be achievable with H.264.
 
    In today's Internet streaming, some of the offered bit-rates are
    relatively low in order to allow terminals with dial-up modems to
    access the content.  In wired IP networks, relatively large
    packets, say 500 - 1500 bytes, are preferred to smaller and more
    frequently occurring packets in order to reduce network congestion.
    Moreover, use of large packets decreases the amount of RTP/UDP/IP
    header overhead.  For low-bit-rate video, the use of large packets
    means that sometimes up to a few pictures should be encapsulated in
    one packet.
 
    However, loss of a packet including many coded pictures would have
    drastic consequences in visual quality, as there is practically no
    other way to conceal a loss of an entire picture than to repeat the
    previous one.  One way to construct relatively large packets and
    maintain possibilities for successful loss concealment is to
    construct MTAPs that contain slices from several pictures in an
    interleaved manner.  An MTAP should not contain spatially adjacent
    slices from the same picture or spatially overlapping slices from
    any picture.  If a packet is lost, it is likely that a lost slice
    is surrounded by spatially adjacent slices of the same picture and
    spatially corresponding slices of the temporally previous and
    succeeding pictures.  Consequently, concealment of the lost slice
    is likely to succeed relatively well.
 
 
 10.6.       Robust Packet Scheduling in Video Streaming
 
    This scheme has been implemented with MPEG-4 Part 2 and simulated
    in a wireless streaming environment [17].  There is no technical
    reason why similar or better results could not be achievable with
    H.264.
 
    Streaming clients typically have a receiver buffer that is capable
    of storing a relatively large amount of data.  Initially, when a
    streaming session is established, a client does not start playing
    the stream back immediately, but rather it typically buffers the
    incoming data for a few seconds.  This buffering helps to maintain
    continuous playback, because, in case of occasional increased
    transmission delays or network throughput drops, the client can
    decode and play buffered data.  Otherwise, without initial
    buffering, the client has to freeze the display, stop decoding, and
    wait for incoming data.  The buffering is also necessary for either
    automatic or selective retransmission at any protocol level.  If
    any part of a picture is lost, a retransmission mechanism may be
    used to resend the lost data.  If the retransmitted data is
    received before its scheduled decoding or playback time, the loss
    is perfectly recovered.  Coded pictures can be ranked according to
    their importance in the subjective quality of the decoded sequence.
    For example, non-reference pictures, such as conventional B
    pictures, are subjectively least important, because their absence
    does not affect decoding of any other pictures.  In addition to
    non-reference pictures, the ITU-T H.264 | ISO/IEC 14496-10 standard
    includes a temporal scalability method called sub-sequences [18].
    Subjective ranking can also be made on coded slice data partition
    or slice group basis.  Coded slices and coded slice data partitions
    that are subjectively the most important can be sent earlier than
    their decoding order indicates, whereas coded slices and coded
    slice data partitions that are subjectively the least important can
    be sent later than their decoding order indicates.
    Consequently, any retransmitted parts of the most important slices
    and coded slice data partitions are more likely to be received
    before their scheduled decoding or playback time compared to the
    least important slices and slice data partitions.
 
 
 11.     Informative Appendix: Rationale for Decoding Order Number
 
 11.1.       Introduction
 
    The Decoding Order Number (DON) concept was introduced mainly to
    enable efficient multi-picture slice interleaving (see section
    10.5) and robust packet scheduling (see section 10.6).  In both of
    these applications NAL units are transmitted out of decoding order.
    DON indicates the decoding order of NAL units and should be used in
    the receiver to recover the decoding order.  Example use cases for
    efficient multi-picture slice interleaving and for robust packet
    scheduling are given in sections 11.2 and 11.3, respectively.
    Section 11.4 describes the benefits of the DON concept in error
    resiliency achieved by redundant coded pictures.  Section 11.5
    summarizes considered alternatives to DON and justifies why DON was
    chosen for this RTP payload specification.
 
 
 11.2.       Example of Multi-Picture Slice Interleaving
 
    An example of multi-picture slice interleaving follows.  A subset
    of a coded video sequence is depicted below in output order.  R
    denotes a reference picture, N denotes a non-reference picture, and
    the number indicates a relative output time.
 
    ... R1 N2 R3 N4 R5 ...
 
    The decoding order of these pictures is from left to right as
    follows:
    ... R1 R3 N2 R5 N4 ...
 
    The NAL units of pictures R1, R3, N2, R5, and N4 are marked with a
    DON equal to 1, 2, 3, 4, and 5, respectively.
 
    Each reference picture consists of three slice groups that are
    scattered as follows (a number denotes the slice group number for
    each macroblock in a QCIF frame):
 
    0 1 2 0 1 2 0 1 2 0 1
    2 0 1 2 0 1 2 0 1 2 0
    1 2 0 1 2 0 1 2 0 1 2
    0 1 2 0 1 2 0 1 2 0 1
    2 0 1 2 0 1 2 0 1 2 0
    1 2 0 1 2 0 1 2 0 1 2
    0 1 2 0 1 2 0 1 2 0 1
    2 0 1 2 0 1 2 0 1 2 0
    1 2 0 1 2 0 1 2 0 1 2
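
    The scattered pattern shown above can be generated with a simple
    rule: the slice group of the macroblock in column x and row y is
    (x - y) modulo 3.  This rule is inferred from the printed map and
    is only one way to describe it; the function name and defaults
    (11 x 9 macroblocks for QCIF) are assumptions of the sketch.

```python
def slice_group_map(width_mbs=11, height_mbs=9, num_groups=3):
    """Return the slice group number of each macroblock, row by row."""
    # Python's % always yields a non-negative result for a positive
    # modulus, so (x - y) % num_groups is safe when x < y.
    return [[(x - y) % num_groups for x in range(width_mbs)]
            for y in range(height_mbs)]


if __name__ == "__main__":
    for row in slice_group_map():
        print(" ".join(str(group) for group in row))
```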
 
    For the sake of simplicity, we assume that all the macroblocks of a
    slice group are included in one slice.  Three MTAPs are constructed
    from three consecutive reference pictures so that each MTAP
    contains three aggregation units, each of which contains all the
    macroblocks from one slice group.  The first MTAP contains slice
    group 0 of picture R1, slice group 1 of picture R3, and slice group
    2 of picture R5.  The second MTAP contains slice group 1 of picture
    R1, slice group 2 of picture R3, and slice group 0 of picture R5.
    The third MTAP contains slice group 2 of picture R1, slice group 0
    of picture R3, and slice group 1 of picture R5.  Each non-reference
    picture is encapsulated into an STAP-B.
 
    Consequently, the transmission order of NAL units is the following:
      R1, slice group 0, DON 1
      R3, slice group 1, DON 2
      R5, slice group 2, DON 4
      R1, slice group 1, DON 1
      R3, slice group 2, DON 2
      R5, slice group 0, DON 4
      R1, slice group 2, DON 1
      R3, slice group 0, DON 2
      R5, slice group 1, DON 4
      N2,                DON 3
      N4,                DON 5
 
    The receiver is able to organize the NAL units back in decoding
    order based on the value of DON associated with each NAL unit.
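
    The interleaving and its reversal can be sketched as follows.  The
    sketch takes R1, R3, and R5 as the three consecutive reference
    pictures (an assumption read from the DON assignment above) and
    rotates the slice groups so that every MTAP carries each slice
    group once and each picture once.  The tuple layout and function
    names are hypothetical, and a plain sort by DON is sufficient only
    because DON does not wrap around in this example.

```python
REFERENCE_PICTURES = [("R1", 1), ("R3", 2), ("R5", 4)]   # (name, DON)
NON_REFERENCE_PICTURES = [("N2", 3), ("N4", 5)]          # (name, DON)
NUM_SLICE_GROUPS = 3


def build_transmission_order():
    """Return (picture, slice_group, DON) tuples in transmission order."""
    order = []
    for mtap in range(NUM_SLICE_GROUPS):
        for index, (picture, don) in enumerate(REFERENCE_PICTURES):
            slice_group = (mtap + index) % NUM_SLICE_GROUPS
            order.append((picture, slice_group, don))
    # Each non-reference picture travels in its own STAP-B.
    for picture, don in NON_REFERENCE_PICTURES:
        order.append((picture, None, don))
    return order


def recover_decoding_order(units):
    """Reorder received units by DON (no wrap-around handling)."""
    return sorted(units, key=lambda unit: unit[2])


if __name__ == "__main__":
    sent = build_transmission_order()
    for unit in recover_decoding_order(sent):
        print(unit)
```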
 
    If one of the MTAPs is lost, the spatially adjacent and temporally
    co-located macroblocks are received and can be used to conceal the
    loss efficiently.  If one of the STAPs is lost, the effect of the
    loss does not propagate temporally.
 
 
 11.3.       Example of Robust Packet Scheduling
 
    An example of robust packet scheduling follows.  The communication
    system used in the example consists of the following components in
    the order that the video is processed from source to sink:
    o camera and capturing
    o pre-encoding buffer
    o encoder
    o encoded picture buffer
    o transmitter
    o transmission channel
    o receiver
    o receiver buffer
    o decoder
    o decoded picture buffer
    o display
 
    The video communication system used in the example operates as
    follows.  Note that processing of the video stream happens
    gradually and at the same time in all components of the system.
    The source video sequence is shot and captured to a pre-encoding
    buffer.  The pre-encoding buffer can be used to order pictures from
    sampling order to encoding order or to analyze multiple
    uncompressed frames for bit-rate control purposes, for example.
    In some cases the pre-encoding buffer may not exist, but
    rather the sampled pictures are encoded right away.  The encoder
    encodes pictures from the pre-encoding buffer and stores the
    output, i.e., coded pictures, to the encoded picture buffer.  The
    transmitter encapsulates the coded pictures from the encoded
    picture buffer to transmission packets and sends them to a receiver
    through a transmission channel.  The receiver stores the received
    packets to the receiver buffer.  The receiver buffering process
    typically includes buffering for transmission delay jitter.  The
    receiver buffer can also be used to recover correct decoding order
    of coded data.  The decoder reads coded data from the receiver
    buffer and produces decoded pictures as output into the decoded
    picture buffer.  The decoded picture buffer is used to recover the
    output (or display) order of pictures.  Finally, pictures are
    displayed.
 
    In the following example figures, I denotes an IDR picture, R
    denotes a reference picture, N denotes a non-reference picture, and
    the number after I, R, or N indicates a sampling time relative to
    the previous IDR picture in decoding order.  Values
    below the sequence of pictures indicate scaled system clock
    timestamps.  The system clock is initialized arbitrarily in this
    example, and time runs from left to right.  Each I, R, and N
    picture is mapped into the same timeline compared to the previous
    processing step, if any, assuming that encoding, transmission, and
    decoding take no time.  Thus, events happening at the same time are
    located in the same column throughout all example figures.
 
    A subset of a sequence of coded pictures is depicted below in
    sampling order.
 
    ...  N58 N59 I00 N01 N02 R03 N04 N05 R06 ... N58 N59 I00 N01 ...
    ... --|---|---|---|---|---|---|---|---|- ... -|---|---|---|- ...
    ...  58  59  60  61  62  63  64  65  66  ... 128 129 130 131 ...
 
    The sampled pictures are buffered in the pre-encoding buffer to
    arrange them in encoding order.  In this example, we assume that
    the non-reference pictures are predicted from both the previous and
    the next reference picture in output order.  Thus, the pre-encoding
    buffer has to contain at least two pictures and the buffering
    causes a delay of two picture intervals.  The output of the pre-
    encoding buffering process, and the encoding (and decoding) order
    of the pictures is as follows:
 
             ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
             ... -|---|---|---|---|---|---|---|---|-  ...
             ... 60  61  62  63  64  65  66  67  68  ...
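
    The reordering performed by the pre-encoding buffer in this example
    can be sketched as follows.  This is only an illustration under the
    assumptions of the example: pictures are given as labels such as
    "N01", each run of N pictures is encoded after the R picture that
    follows it in output order, and an IDR picture is not moved ahead
    of the N pictures that precede it (as in the figure above).  The
    function name to_encoding_order is hypothetical.

```python
def to_encoding_order(output_order):
    """Reorder picture labels from output order to encoding order."""
    encoding_order = []
    pending = []  # non-reference pictures waiting for their next reference
    for pic in output_order:
        kind = pic[0]
        if kind == "N":
            pending.append(pic)
        elif kind == "R":
            # The pending N pictures are predicted from this R picture,
            # so the R picture has to be encoded before them.
            encoding_order.append(pic)
            encoding_order.extend(pending)
            pending = []
        else:
            # "I" (IDR): assumption of this sketch, the IDR picture stays
            # behind the N pictures that precede it in output order.
            encoding_order.extend(pending)
            encoding_order.append(pic)
            pending = []
    encoding_order.extend(pending)
    return encoding_order


sampled = ["N58", "N59", "I00", "N01", "N02", "R03", "N04", "N05", "R06"]
encoded = to_encoding_order(sampled)
```

    For the sampled sequence above, the sketch yields the same order as
    the figure: N58 N59 I00 R03 N01 N02 R06 N04 N05.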
 
    The encoder or the transmitter can set the value of DON for each
    picture to the value of DON of the previous picture in decoding
    order plus one.
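
    A minimal sketch of this assignment rule is given below.  It
    assumes that DON is carried in a 16-bit field and therefore wraps
    around modulo 65536; the constant DON_MODULUS, the function name
    assign_don, and the starting value are illustrative assumptions,
    not part of this memo.

```python
DON_MODULUS = 1 << 16  # assumption: DON is a 16-bit value that wraps around


def assign_don(decoding_order, first_don=0):
    """Pair each picture with the DON of the previous picture plus one."""
    return [
        (pic, (first_don + i) % DON_MODULUS)
        for i, pic in enumerate(decoding_order)
    ]


decoding_order = ["N58", "N59", "I00", "R03", "N01", "N02"]
numbered = assign_don(decoding_order, first_don=65534)
```

    Starting close to the upper end of the value range, as in the usage
    line above, shows the wrap-around: the third picture receives DON
    0.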
 
    For the sake of simplicity, let us assume that:
    o the frame rate of the sequence is constant,
    o each picture consists of only one slice,
    o each slice is encapsulated in a single NAL unit packet,
    o pictures are transmitted in decoding order, and
    o pictures are transmitted at constant intervals (that are equal to
      1 / frame rate).
 
    Thus, pictures are received in decoding order:
 
             ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
             ... -|---|---|---|---|---|---|---|---|- ...
             ... 60  61  62  63  64  65  66  67  68  ...
 
    The optional interleaving-depth MIME type parameter is set to 0,
    because no buffering is needed to recover the correct decoding
    order from the transmission (or reception) order.
 
    The decoder has to buffer for one picture interval initially in its
    decoded picture buffer to organize pictures from decoding order to
    output order as depicted below:
 
                 ... N58 N59 I00 N01 N02 R03 N04 N05 R06 ...
                 ... -|---|---|---|---|---|---|---|---|- ...
                 ... 61  62  63  64  65  66  67  68  69  ...
 
    The amount of required initial buffering in the decoded picture
    buffer can be signaled in the buffering period SEI message or with
    the num_reorder_frames syntax element of H.264 video usability
    information.  num_reorder_frames indicates the maximum number of
    frames, complementary field pairs, or non-paired fields that
    precede any frame, complementary field pair, or non-paired field in
    the sequence in decoding order and follow it in output order.  For
    the sake of simplicity, we assume that num_reorder_frames is used
    to indicate the initial buffer in the decoded picture buffer.  In
    this example, num_reorder_frames is equal to 1.
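
    The value used in this example can be checked with a short sketch
    that applies the definition above directly.  The function name
    max_reorder is hypothetical, and pictures are represented by the
    labels of the example figures; an actual encoder derives the value
    from its own prediction structure.

```python
def max_reorder(decoding_order, output_order):
    """Largest number of pictures that precede some picture in decoding
    order and follow that same picture in output order."""
    out_pos = {pic: i for i, pic in enumerate(output_order)}
    worst = 0
    for i, pic in enumerate(decoding_order):
        count = sum(
            1 for earlier in decoding_order[:i]
            if out_pos[earlier] > out_pos[pic]
        )
        worst = max(worst, count)
    return worst


decoding = ["N58", "N59", "I00", "R03", "N01", "N02", "R06", "N04", "N05"]
output = ["N58", "N59", "I00", "N01", "N02", "R03", "N04", "N05", "R06"]
reorder = max_reorder(decoding, output)
```

    For the two orders shown in the figures, each N picture is preceded
    in decoding order by exactly one picture (R03 or R06) that follows
    it in output order, so the sketch returns 1.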
 
    It can be observed that if the IDR picture I00 is lost during
    transmission and a retransmission request is issued when the value
    of the system clock is 62, there is one picture interval of time
    (until the system clock reaches timestamp 63) to receive the
    retransmitted IDR picture I00.
 
    Let us then assume that IDR pictures are transmitted two frame
    intervals earlier than their decoding position, i.e., the pictures
    are transmitted as follows:
 
                    ...  I00 N58 N59 R03 N01 N02 R06 N04 N05 ...
                    ... --|---|---|---|---|---|---|---|---|- ...
                    ...  62  63  64  65  66  67  68  69  70  ...
 
    The optional interleaving-depth MIME type parameter is set equal to
    1 according to its definition.  (The value of interleaving-depth in
    this example can be derived as follows:  Picture I00 is the only
    picture preceding picture N58 or N59 in transmission order and
    following it in decoding order.  Except for pictures I00, N58, and
    N59, the transmission order is the same as the decoding order of
    pictures.  Since a coded picture is encapsulated into exactly one
    NAL unit, the value of interleaving-depth is equal to the maximum
    number of pictures preceding any picture in transmission order and
    following the picture in decoding order.)
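
    The derivation in parentheses above can be reproduced with a small
    sketch.  It assumes, as in this example, that each picture is one
    NAL unit, that DON increases by one per picture in decoding order,
    and that DON does not wrap around within the sequence examined.
    The function name interleaving_depth is hypothetical.

```python
def interleaving_depth(dons_in_transmission_order):
    """Maximum number of NAL units that precede a NAL unit in
    transmission order but follow it in decoding order (higher DON)."""
    depth = 0
    for i, don in enumerate(dons_in_transmission_order):
        ahead = sum(
            1 for earlier in dons_in_transmission_order[:i]
            if earlier > don
        )
        depth = max(depth, ahead)
    return depth


# Transmission order I00 N58 N59 R03 N01 N02 R06 N04 N05, with DON
# assigned in decoding order N58=0, N59=1, I00=2, R03=3, and so on.
early_idr = [2, 0, 1, 3, 4, 5, 6, 7, 8]
depth = interleaving_depth(early_idr)
```

    Only I00 (DON 2) is sent ahead of pictures with a lower DON, so the
    result for this transmission order is 1.  For pictures sent in
    decoding order the same function returns 0.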
 
    The receiver buffering process contains one picture at a time
    according to the value of the interleaving-depth parameter and
    orders pictures from the reception order to the correct decoding
    order based on the value of DON associated with each picture.  The
    output of the receiver buffering process is the following:
 
                         ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
                         ... -|---|---|---|---|---|---|---|---|- ...
                         ... 63  64  65  66  67  68  69  70  71  ...
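
    One possible way to realize the receiver buffering described above
    is sketched below.  It is not the normative de-packetization
    process: it ignores packet loss and DON wrap-around, and it simply
    releases the buffered picture with the smallest DON whenever the
    buffer holds more than interleaving-depth pictures.  The function
    name deinterleave is hypothetical.

```python
import heapq


def deinterleave(received, depth):
    """Reorder (don, picture) pairs from reception to decoding order
    while holding at most `depth` pictures between arrivals."""
    heap = []
    decoding_order = []
    for don, pic in received:
        heapq.heappush(heap, (don, pic))
        if len(heap) > depth:
            # Release the picture with the smallest DON.
            decoding_order.append(heapq.heappop(heap)[1])
    while heap:
        # End of the sequence: drain whatever is still buffered.
        decoding_order.append(heapq.heappop(heap)[1])
    return decoding_order


received = [(2, "I00"), (0, "N58"), (1, "N59"), (3, "R03"), (4, "N01"),
            (5, "N02"), (6, "R06"), (7, "N04"), (8, "N05")]
recovered = deinterleave(received, depth=1)
```

    With depth equal to 1, the sketch outputs the pictures in the order
    of the figure above: N58 N59 I00 R03 N01 N02 R06 N04 N05.  With
    depth equal to 0, every picture is passed on as soon as it arrives.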
 
    Again, an initial buffering delay of one picture interval is needed
    to organize pictures from decoding order to output order as
    depicted below:
 
                             ... N58 N59 I00 N01 N02 R03 N04 N05 ...
                             ... -|---|---|---|---|---|---|---|- ...
                             ... 64  65  66  67  68  69  70  71  ...
 
    It can be observed that the maximum delay that IDR pictures can
    undergo during transmission, including possible application,
    transport, or link layer retransmission, is equal to three picture
    intervals.  Thus, the loss resiliency of IDR pictures is improved
    in systems supporting retransmission compared to the case in which
    pictures were transmitted in their decoding order.
 
 
 11.4.       Robust Transmission Scheduling of Redundant Coded Slices
 
    A redundant coded picture is a coded representation of a picture or
    a part of a picture that is not used in the decoding process if the
    corresponding primary coded picture is correctly decoded.  There
    should be no noticeable difference between any area of the decoded
    primary picture and a corresponding area that would result from
    application of the H.264 decoding process for any redundant picture
    in the same access unit.  A redundant coded slice is a coded slice
    that is a part of a redundant coded picture.
 
    Redundant coded pictures can be used to provide unequal error
    protection in error-prone video transmission.  If a primary coded
    representation of a picture is decoded incorrectly, a corresponding
    redundant coded picture can be decoded.  Examples of applications
    and coding techniques utilizing the redundant coded picture feature
    include the video redundancy coding [19] and protection of "key
    pictures" in multicast streaming [20].
 
    One property of many error-prone video communications systems is
    that transmission errors are often bursty and may therefore affect
    several consecutive transmission packets in
    transmission order.  In low bitrate video communication it is
    relatively common that an entire coded picture can be encapsulated
    into one transmission packet.  Consequently, a primary coded
    picture and the corresponding redundant coded pictures may be
    transmitted in consecutive packets in transmission order.  In order
    to make the transmission scheme more tolerant of bursty
    transmission errors, it is beneficial to transmit a primary coded
    picture apart from the corresponding redundant coded pictures.  The
    DON concept enables this.
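
    As an illustration only, the following sketch separates each
    primary coded picture from its redundant coded picture in
    transmission order while keeping them adjacent in DON.  The labels
    P0, P1, ... and R0, R1, ... (here R denotes a redundant coded
    picture, not a reference picture), the parameter gap, and the
    choice of giving the redundant picture the DON following that of
    its primary picture are assumptions of this sketch rather than
    requirements of this memo.

```python
def schedule(num_pictures, gap):
    """Return (label, don) pairs in transmission order.  The redundant
    copy of picture i is sent only after primary picture i + gap."""
    sent = []
    for i in range(num_pictures + gap):
        if i < num_pictures:
            sent.append((f"P{i}", 2 * i))  # primary coded picture
        if i >= gap:
            j = i - gap
            sent.append((f"R{j}", 2 * j + 1))  # delayed redundant copy
    return sent


transmission = schedule(num_pictures=3, gap=1)
labels = [label for label, _ in transmission]
decoding = [label for label, _ in sorted(transmission, key=lambda p: p[1])]
```

    With gap equal to 1, the transmission order is P0 P1 R0 P2 R1 R2,
    so an error burst hitting two consecutive packets never removes
    both a primary picture and its redundant copy.  Sorting by DON
    restores the decoding order P0 R0 P1 R1 P2 R2 at the receiver.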
 
 
 11.5.       Remarks on Other Design Possibilities
 
    The slice header syntax structure of the H.264 coding standard
    contains the frame_num syntax element, which can indicate the
    decoding order of coded frames.  However, using the frame_num
    syntax element to recover the decoding order is not feasible or
    desirable for the following reasons:
    o The receiver is required to parse at least one slice header per
      coded picture (before passing the coded data to the decoder).
    o Coded slices from multiple coded video sequences cannot be
      interleaved, because the frame_num syntax element is reset to
      0 in each IDR picture.
    o The coded fields of a complementary field pair share the same
      value of the frame_num syntax element.  Thus, the decoding order
      of the coded fields of a complementary field pair cannot be
      recovered based on the frame_num syntax element or any other
      syntax element of the H.264 coding syntax.
 
    The RTP payload format for transport of MPEG-4 elementary streams
    [21] enables interleaving of access units and transmission of
    multiple access units in the same RTP packet.  An access unit is
    specified in the H.264 coding standard to consist of all NAL units
    that are associated with a primary coded picture according to
    subclause 7.4.1.2 of [1].  Consequently, slices of different
    pictures cannot be interleaved and the multi-picture slice
    interleaving technique (see section 10.5) for improved error
    resilience cannot be used.
 
 
 12.     Open Issues
 
    Since the value of the optional parameter-sets MIME parameter may
    be long, it is desirable to use base64 encoding instead of the
    current base16 (hexadecimal) encoding.  We will carry out the
    change to base64 encoding in the next release of this memo.
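
    The size difference between the two encodings can be illustrated
    with the sketch below.  The 30-byte input is an arbitrary
    placeholder and not a real parameter set, and the function name
    encoded_lengths is hypothetical.

```python
import base64


def encoded_lengths(parameter_set):
    """Return the (base16, base64) text lengths for the given bytes."""
    hex_text = base64.b16encode(parameter_set)
    b64_text = base64.b64encode(parameter_set)
    return len(hex_text), len(b64_text)


placeholder = bytes(30)  # 30 zero bytes standing in for a parameter set
hex_len, b64_len = encoded_lengths(placeholder)
```

    Base16 needs two characters per byte (60 characters here), whereas
    base64 needs four characters per three bytes (40 characters here),
    which is about one third shorter.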
 
    The security section needs review.
 
 13.     Full Copyright Statement
 
    Copyright (C) The Internet Society (2003). All Rights Reserved.
 
    This document and translations of it may be copied and furnished to
    others, and derivative works that comment on or otherwise explain
    it or assist in its implementation may be prepared, copied,
    published and distributed, in whole or in part, without restriction
    of any kind, provided that the above copyright notice and this
    paragraph are included on all such copies and derivative works.
 
    However, this document itself may not be modified in any way, such
    as by removing the copyright notice or references to the Internet
    Society or other Internet organizations, except as needed for the
    purpose of developing Internet standards in which case the
    procedures for copyrights defined in the Internet Standards process
    must be followed, or as required to translate it into languages
    other than English.
 
    The limited permissions granted above are perpetual and will not be
    revoked by the Internet Society or its successors or assigns.
 
    This document and the information contained herein is provided on
    an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
    ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
    THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
 
 
 14.     Intellectual Property Notice
 
    The IETF takes no position regarding the validity or scope of any
    intellectual property or other rights that might be claimed to
    pertain to the implementation or use of the technology described in
    this document or the extent to which any license under such rights
    might or might not be available; neither does it represent that it
    has made any effort to identify any such rights.  Information on
    the IETF's procedures with respect to rights in standards-track and
    standards-related documentation can be found in BCP-11.  Copies of
    claims of rights made available for publication and any assurances
    of licenses to be made available, or the result of an attempt made
    to obtain a general license or permission for the use of such
    proprietary rights by implementors or users of this specification
    can be obtained from the IETF Secretariat.
 
    The IETF invites any interested party to bring to its attention any
    copyrights, patents or patent applications, or other proprietary
    rights which may cover technology that may be required to practice
    this standard.  Please address the information to the IETF
    Executive Director.
 
    The IETF has been notified of intellectual property rights claimed
    in regard to some or all of the specification contained in this
    document.  For more information consult the online list of claimed
    rights at http://www.ietf.org/ipr.
 
 
 15.     References
 
 15.1.       Normative References
 
    [1]  "Draft ITU-T Recommendation and Final Draft International
          Standard of Joint Video Specification (ITU-T Rec. H.264 |
          ISO/IEC 14496-10 AVC)", available from  ftp://ftp.imtc-
          files.org/jvt-experts/2003_03_Pattaya/JVT-G50r1.zip, May
          2003.
    [2]  S. Bradner, "Key words for use in RFCs to Indicate
          Requirement Levels", BCP 14, RFC 2119, March 1997.
    [3]  H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson,
          "RTP: A Transport Protocol for Real-Time Applications", RFC
          1889, January 1996.
    [4]  M. Handley and V. Jacobson, "SDP: Session Description
          Protocol", RFC 2327, April 1998.
    [5]  ITU-T Recommendation T.35 (need to put in Rec name and date)
 
 
 15.2.       Informative References
 
    [6]  A. Luthra, G. Sullivan, and T. Wiegand (eds.), Special Issue
          on H.264/AVC, IEEE Transactions on Circuits and Systems for
          Video Technology, July 2003.
    [7]  P. Borgwardt, "Handling Interlaced Video in H.26L", VCEG-
          N57r2, available from ftp://standard.pictel.com/video-
          site/0109_San/VCEG-N57r2.doc, September 2001.
    [8]  C. Bormann et al., "RTP Payload Format for the 1998 Version
          of ITU-T Rec. H.263 Video (H.263+)", RFC 2429, October 1998.
    [9]  ISO/IEC IS 14496-2.
    [10] S. Wenger, "H.26L over IP", IEEE Transactions on Circuits and
          Systems for Video Technology, July 2003.
    [11] S. Wenger, "H.26L over IP: The IP Network Adaptation Layer",
          Proceedings Packet Video Workshop 02, April 2002.
    [12] T. Stockhammer, M.M. Hannuksela, and S. Wenger, "H.26L/JVT
          Coding Network Abstraction Layer and IP-based Transport" in
          Proc. ICIP 2002, Rochester, NY, September 2002.
    [13] ITU-T Recommendation H.223 (1999).
    [14] C. Perkins et al., "RTP Payload for Redundant Audio Data",
          RFC 2198, September 1997.
    [15] J. Rosenberg, H. Schulzrinne, "An RTP Payload Format for
          Generic Forward Error Correction", RFC 2733, December 1999.
    [16] V. Varsa, M. Karczewicz, "Slice interleaving in compressed
          video packetization", Packet Video Workshop 2000.
    [17] S.H. Kang and A. Zakhor, "Packet scheduling algorithm for
          wireless video streaming," International Packet Video
          Workshop 2002, available http://www.pv2002.org.
    [18] M.M. Hannuksela, "Enhanced concept of GOP", JVT-B042,
          available ftp://standard.pictel.com/video-site/0201_Gen/JVT-
          B042.doc, January 2002.
    [19] S. Wenger, "Video Redundancy Coding in H.263+", 1997
          International Workshop on Audio-Visual Services over Packet
          Networks, September 1997.
    [20] Y.-K. Wang, M.M. Hannuksela, and M. Gabbouj, "Error Resilient
          Video Coding Using Unequally Protected Key Pictures", in
          Proc. International Workshop VLBV03, September 2003.
    [21] J. van der Meer, D. Mackie, V. Swaminathan, D. Singer, and P.
          Gentric, "RTP Payload Format for Transport of MPEG-4
          Elementary Streams", draft-ietf-avt-mpeg4-simple-07.txt,
          February 2003.
 
 
    Authors' Addresses
 
    Stephan Wenger                    Phone: +49-172-300-0813
    TU Berlin / Teles AG              Email: stewe@cs.tu-berlin.de
    Franklinstr. 28-29
    D-10587 Berlin
    Germany
 
    Miska M. Hannuksela               Phone: +358-7180-73151
    Nokia Corporation                 Email: miska.hannuksela@nokia.com
    P.O. Box 100
    33721 Tampere
    Finland
 
    Thomas Stockhammer                Phone: +49-89-28923474
    Institute for Communications Eng. Email: stockhammer@ei.tum.de
    Munich University of Technology
    D-80290 Munich
    Germany
 
    Magnus Westerlund                 Phone: +46-8-4048287
    Multimedia Technologies           Email:
    Ericsson Research EAB/TVA/A       magnus.westerlund@ericsson.com
    Ericsson AB
    Torshamsgatan 23
    S-164 80 Stockholm
    Sweden
 
    David Singer                      Phone: +1 408 974-3162
    QuickTime Engineering             Email: singer@apple.com
    Apple
    1 Infinite Loop MS 302-3MT
    Cupertino
    CA 95014
    USA
 
 
 Annex A: Changes relative to draft-ietf-avt-rtp-h264-01.txt
 
    [This section will be removed in a future version of this draft.]
 
    This memo contains the following technical changes relative to the
    previous I-D:
    o The memo is aligned with the latest version of the H.264
      specification.  In particular, the concept of access unit defined
      in the H.264 specification has been incorporated into the memo.
    o The use of RTP timestamps for interlaced video has been
      clarified.
    o Two forms of single-time aggregation packets are available: one
      with decoding order number (DON) and another without DON.
    o The assignment rules for the values of DON have been simplified.
      NAL units having the same value of DON may be decoded in any
      order.  NAL units having a different value of DON must be passed
      to the decoder in ascending order of DON value.
    o Receiver buffer size considerations have been added.
    o The depacketization process has been updated to suit lossy
      transmission better.  The optional initial-buffering-time MIME
      parameter forms a part of the solution.
    o Example use cases of DON and rationale behind the DON concept
      have been added.
    o The application examples have been clarified and a new example
      has been added.
    o The security section has been reworked.
    o Numerous editorial fixes have been made.
 