INTERNET-DRAFT                                              Eric Edwards
draft-ietf-avt-rtp-jpeg2000-01.txt                       Satoshi Futemma
                                                        Eisaburo Itakura
                                                        Nobuyoshi Tomita
                                                            Andrew Leung
                                                       Takahiro Fukuhara
                                                        Sony Corporation
                                                           June 30, 2002
                                              Expires: December 30, 2002


              RTP Payload Format for JPEG 2000 Video Streams


Status of this Memo

    This document is an Internet-Draft and is subject to all
    provisions of Section 10 of RFC2026.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups. Note that
    other groups may also distribute working documents as
    Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time. It is inappropriate to use Internet-Drafts as reference
    materials or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt

    The list of Internet-Drafts Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.


Abstract

    This document describes a payload format for transporting JPEG 2000
    video streams using RTP (Real-time Transport Protocol).  JPEG 2000
    video streams are formed as a continuous series of JPEG 2000 still
    images.  The JPEG 2000 payload format described in this document has
    three features: (1) improved robustness to packet loss through
    intelligent fragmentation of JPEG 2000 packet units, (2) persistence
    of the main header to minimize the effect of loss and maximize
    recovery, and (3) a priority information field for scalable delivery
    from the same code stream.  These features allow the scalability and
    robustness of JPEG 2000 to be fully exploited in streaming
    applications.


1. Introduction

    This document specifies payload formats for JPEG 2000 video streams
    over the Real-time Transport Protocol (RTP). JPEG 2000 is an ISO/IEC
    International Standard developed for next-generation still image
    encoding.  Its basic encoding technology is described in [1].

Edwards, et al.                                                 [Page 1]


INTERNET-DRAFT     draft-ietf-avt-rtp-jpeg2000-01.txt          June 2002


    Part 3 of the JPEG 2000 standard defines Motion JPEG 2000 [2].
    However, it defines only the file format, not the transmission
    format for streaming on the Internet.  For this reason, it is
    necessary to define an RTP format for JPEG 2000 video streams.

    JPEG 2000 offers many features beyond the current JPEG standard
    [3][4][5]:

        o Higher compression efficiency than JPEG with less visual loss,
          especially at extreme compression ratios.

        o A single code stream that offers both lossy and superior
          lossless compression.

        o Transmission over noisy environments.

        o Progressive transmission by pixel accuracy and resolution.

        o Random code stream access and processing.

    First, the JPEG 2000 algorithm is briefly explained below.  Fig. 1
    shows a block diagram of the JPEG 2000 encoding method.

                                                   +-----+
                                                   | ROI |
                                                   +-----+
                                                      |
                                                      V
                     +----------+   +----------+   +------------+
                     |DC, comp. |   | Wavelet  |   |            |
     raw image  ==>  |transform-|==>|transform-|==>|Quantization|==+
                     |  ation   |   |  ation   |   |            |  |
                     +----------+   +----------+   +------------+  |
                                                                   |
                  +-------------+   +----------+   +------------+  |
                  |             |   |          |   |            |  |
     JPEG 2000 <==|Data ordering|<==|Arithmetic|<==|Coefficient |<=+
     code stream  |             |   |  coding  |   |bit modeling|
                  +-------------+   +----------+   +------------+

                  Fig. 1: Block diagram of the JPEG 2000 encoder


    Each color component or tile is transformed into wavelet
    coefficients.  The component or tile is sub-sampled into various
    levels, usually both vertically and horizontally, from the high
    frequencies (which contain all the sharp details) to the low
    frequencies (which contain all the flat areas).  Quantization is
    performed on the coefficients within each subband: each wavelet
    coefficient is divided by the quantization step size and the result
    is truncated.  After quantization, code blocks are formed within the
    precincts within the tiles.  Precincts are a finer separation than
    tiles, and code blocks are the smallest separation of the image
    data.  Entropy coding is performed within each code block, whose
    coefficients are arithmetically encoded bit plane by bit plane.
    After the coefficients of all code blocks have been coded into a
    short bit stream, a header is added, turning it into a packet.  The
    header has all the information needed to decompress the packet into
    code blocks.  A group of packets is called a layer.

    To support additional features in transmission, the formed packets
    must be re-ordered.  The standard defines four ways to order the
    transmission and decoding of a compressed image: by resolution,
    quality, position, or component.

    This is only to serve as an introduction to JPEG 2000 and to aid in
    understanding the rest of this document.  Further details of the
    encoder can be found in various texts on JPEG 2000 [1].

    To decompress a JPEG 2000 code stream, one would follow the reverse
    order of the encoding order, minus the quantization step.  A
    detailed description of this procedure is outside the scope of this
    document.  Please refer to various JPEG 2000 texts for details [1].


1.1 Terminology

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
    document are to be interpreted as described in RFC2119 [7].


1.2 Authors' comments, responses, and changes relative to -00

[This section will be removed in a future version of this document.]

1.2.1 Authors' comments on this draft

    Changes required by implementation of the last draft of the document

        Implementation of the last draft of this document revealed some
        potential problems.  Some markers would never be used, while
        some situations that can always occur could not be indicated by
        any combination of markers, resulting in inefficient usage of
        packets (i.e., packing multiple tiles into a single RTP payload
        packet).  Revisions have been added to handle these cases, and
        redundant markers have been removed.

    Removal of redundant text from this document

        A lot of text has been removed from the introduction of this
        document.  This document cannot possibly cover JPEG 2000 as
        comprehensively as the other resources available or cited.
        Implementors of this standard should have a more comprehensive
        understanding of JPEG 2000 than anything that was written in the
        introduction previously.  Please refer to the cited texts for
        further information on JPEG 2000.


1.2.2 Response to comments

    This section responds to comments from the IETF and WG01 on
    previous drafts of this document and explains the design
    methodology used here.

    H.263 picture header redundancy technique

        The picture header redundancy technique from RFC2429, an RTP
        payload format for H.263+, is quite intelligent and useful.  In
        JPEG 2000, however, the Main Header of the codestream can become
        very large, larger than the MTU size, if many encoding options
        are used.  In such a situation, sending the Main Header with
        each codestream packet would not be viable at all.  The
        codestream header is already quite compressed as a result of
        basic JPEG 2000 development.  This standard therefore uses a
        different technique to achieve something similar: through the
        optional payload header extension, using the optional Marker
        Segment Optional Header, the sender can include all the data
        that it considers most important inside this optional header.

    Scalable audio technique

        The scalable audio technique from RFC2198 is quite interesting
        and, in some ways, applicable to our standard.  This standard's
        target market is quite wide and unique.  JPEG 2000 was developed
        to be a highly flexible standard for digital imaging, targeting
        applications from ultra-thin clients to image archiving.  At the
        image archiving level, the technique would be useful.  As we
        move down to thinner clients, however, such a technique may not
        be optimal because memory resources are scarce.

    Optimal packet reordering

        JPEG 2000 packet reordering and transmission may give a much
        lower error rate when packets are lost or dropped, as the error
        would not be immediately apparent and can just "smear" over from
        frame to frame.  With packet reordering, the client must store
        all the packets and rearrange them in memory for decoding.  The
        authors feel this would be very taxing on some target devices
        and are not sure that such a scheme would be effective.  This
        area should be investigated, with testing, to determine whether
        there is a single best reordering scheme.

    JPIP Interoperability

        JPIP is new work taking place in the ISO/WG01 JPEG group to
        develop a new part of the JPEG 2000 Standard.  Although JPIP
        targets different application areas than this standard,
        interoperability is highly desired.  While this is an RTP
        standard and JPIP is an RTSP standard, we have made provisions
        for compatibility within the optional header.  This standard
        currently has only a reserved definition for the JPIP header
        within the optional header.  As JPIP is in the early stages of
        development and standardization, this standard shall
        incorporate JPIP as a peer standard and strive for
        interoperability as both become more mature.


1.2.3  Changes from the -00 version

    The changes from the -00 version of this Internet Draft are:

    Tiling bit removed and MTL has become MHF

        The tile bit has been removed from the MTL field and the MTL field
        has been renamed to MHF.

    Tiling flag introduced.

        The T flag field comes before the tile number field in the
        payload header.

    Fragment offset shortened from a 32-bit to a 24-bit field.

        The justification is that even for QHD images, the fragment
        offset value will not exceed 24 bits.  (Our target applications
        are at most QHD size, which is at most 4000 pixels wide and 3000
        pixels high.  Even if we encode that QHD size at 2 bpp, the
        encoded size is 4000 x 3000 x 2 / 8 = 3,000,000 bytes (3 MB),
        which is less than 2^24.)  Additionally, the bits saved have
        been reserved for future use.
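
        The bound above can be checked with a short calculation.  The
        following Python sketch is illustrative only; the function and
        constant names are assumptions and are not part of this format.

```python
# Largest byte offset that fits in the 24-bit fragment offset field.
MAX_FRAGMENT_OFFSET = 2**24 - 1


def encoded_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Encoded frame size in bytes for a given average bit rate per pixel."""
    return width * height * bits_per_pixel // 8


# QHD example from the text: 4000 x 3000 pixels at 2 bits per pixel.
qhd_bytes = encoded_size_bytes(4000, 3000, 2)
fits_in_24_bits = qhd_bytes <= MAX_FRAGMENT_OFFSET
```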

    Examples of packetization

        The packetization methods are much simpler than those in the
        first version of the document, which required examples to help
        illustrate the packetization method.

    Introduction to JPEG 2000 shortened

        As mentioned previously in the comments section.

    X and E bits swapped positions in the header

        The positions of these fields have been swapped, as
        implementations demonstrated that this is the optimal data
        layout for this information.

    Default priority table introduced

        When there is no user table defined, a default table will be
        used.  This table is based on the JPEG 2000 packet number in the
        codestream.  Most JPEG 2000 images have at most 90 (=6x5x3)
        jp2-packets, which are constructed from 5 decomposition levels
        (6 resolutions), 5 layers and 3 components (YCrCb).  Therefore,
        we suppose that 254 priority levels are enough in the worst
        case.



2. JPEG 2000 Video Features

    JPEG 2000 video streams are formed as a continuous series of JPEG
    2000 still images so the above features of JPEG 2000 can be used
    effectively.  A JPEG 2000 video stream has the following merits:

    SNR is improved at a low bit rate.  The format can be used as a
    video stream format at a low bit rate.

    This is a full intra-frame format in which each frame is compressed
    independently, so it has a low encoding and decoding delay.

    JPEG 2000 has flexible and accurate rate control.  This is suitable
    for traffic control and congestion control during network
    transmission.

    JPEG 2000 can provide its own code stream error resilience markers
    to aid in code stream recovery.


3. Design of RTP payload format for JPEG 2000 video streams

    To provide a payload format that exploits the features of the JPEG
    2000 video stream described in the previous section, the following
    must be taken into consideration:

    - Provisions for packet loss

        On the Internet, 5% packet loss is common and this percentage
        may sometimes reach 20% or more.  When JPEG 2000 video streams
        are split into RTP packets, efficient packetization of the code
        stream is required to minimize the extent to which missing
        code-blocks prevent decoding in error-prone environments.  If
        the main header is lost in transmission, the stream cannot be
        decoded.  Accordingly, a mechanism to compensate for the loss of
        the main header as much as possible is required.

    - A packetizing scheme that exploits JPEG 2000 functionality.

        The packetizing scheme should allow an image to be transmitted
        progressively and reconstructed progressively by the receiver
        using JPEG 2000 functionality, maximizing performance over
        various network conditions and the varying computing power of
        receiving platforms.


4. Proposal for an RTP payload format for JPEG 2000 video streams

4.1 RTP fixed header usage


    For each RTP packet, the RTP fixed header is followed by the JPEG
    2000 payload header, which is followed by JPEG 2000 code stream.
    The RTP header fields that have a meaning specific to the JPEG 2000
    video are described as follows:

    Payload type (PT): The payload type is dynamically assigned by means
    outside the scope of this document. A payload type in the dynamic
    range shall be chosen by means of an out of band signaling protocol
    (e.g., RTSP, SIP, etc.)

    Marker bit (M): The marker bit of the RTP fixed header MUST be set
    to 1 on the last RTP packet of a video frame; otherwise, it MUST be
    set to 0.  When transmission is performed over multiple RTP
    sessions, the bit is set in the last packet of the frame in each
    session.

    Timestamp: The RTP timestamp uses a 90 kHz clock.  The same
    timestamp must appear in each fragment of a given frame.  The
    initial value of the timestamp is random to make known-plaintext
    attacks on encryption more difficult, even if the source itself
    does not encrypt, as the packets may flow through a translator that
    does.
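
    As an illustration of the marker bit and timestamp rules above, the
    following Python sketch computes the values a sender might place in
    the RTP fixed header.  It is a minimal sketch under assumptions: the
    function names are hypothetical, an integer frame rate is assumed,
    and the 32-bit wrap-around reflects the width of the RTP timestamp
    field.

```python
RTP_CLOCK_HZ = 90_000   # timestamp units used by this payload format
TS_MODULUS = 2**32      # the RTP timestamp is a 32-bit field


def frame_timestamp(initial: int, frame_index: int, frames_per_second: int) -> int:
    """Timestamp shared by every RTP packet carrying the given frame."""
    ticks = frame_index * RTP_CLOCK_HZ // frames_per_second
    return (initial + ticks) % TS_MODULUS


def marker_bits(packets_in_frame: int) -> list[int]:
    """Marker bit per packet of one frame: 1 only on the last packet."""
    if packets_in_frame < 1:
        raise ValueError("a frame needs at least one packet")
    return [0] * (packets_in_frame - 1) + [1]
```

    A sender would typically draw the initial value once per session
    from a random source (for example, secrets.randbits(32)) and reuse
    it for every frame.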


4.2 RTP Payload header format

    The RTP payload header format for JPEG 2000 video stream is as
    follows:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |X|E|MHF|mh_id|T|   priority    |           tile number         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    reserved   |                fragment offset                |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Fig. 2:  RTP payload header format for JPEG 2000


    X : 1 bit

        Extension bit flag.  This bit MUST be set to 1 when a JPEG 2000
        optional payload header follows this JPEG 2000 payload header;
        otherwise it MUST be set to 0.  The details of optional payload
        headers are described in Section 8 of this document.

    E : 1 bit

        Enable bit flag.  If this bit is set to 1, it indicates the
        "intelligent packetization" described in Section 5.2.  If the E
        bit is 0, it indicates "non-intelligent packetization", and a
        receiver MUST ignore all payload header information other than
        the extension bit flag and the fragment offset.


    MHF (Main Header Flag) : 2 bits

        MHF shows whether the main header is packed into the RTP packet
        or not.  When the main header exists in the RTP packet, the
        sender MUST set the first bit to 1; otherwise this field MUST be
        set to 0.  If the first bit is 1, the second bit is valid: if
        the last part of the main header is included (either whole or
        fragmented), the sender MUST set the second bit to 1.  In other
        words, this field is either 3 (=0b11) or 2 (=0b10) if the main
        header exists in the RTP packet, and 0 otherwise.  The MHF usage
        values are listed in Table 1:

        +----+-------------------------------------------------------+
        |MHF | Description                                           |
        +----+-------------------------------------------------------+
        | 00 | no main header is packed at all                       |
        | 01 | reserved for future use.                              |
        | 10 | the fragmented main header (not last part) is packed. |
        | 11 | a whole main header or the last part of the           |
        |    | fragmented main header is packed.                     |
        +----+-------------------------------------------------------+

                           Table 1: MHF usage values


        The receiver checks MHF to determine the main header range and
        may perform main header compensation described in Section 7 if
        the main header is lost.

    mh_id : 3 bits

        Main header identification value.  This is used for JPEG 2000
        main header recovery.  The same mh_id is used as long as the
        coding parameters described in the main header remain unchanged.
        The mh_id starts at a value of 1 when the first main header is
        transmitted.  The mh_id value must increase by 1 every time a
        new main header is transmitted.  After the mh_id value reaches
        7, it must roll over and start at 1 again.  Usage of this field
        is described in Section 7 of this document.  This field is only
        valid when the E bit is 1.  If the E bit is 0, then this field
        SHOULD be zero.
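
        The roll-over rule for mh_id can be sketched as follows.  This
        is a minimal Python illustration; the function name is
        hypothetical and not part of this format.

```python
def next_mh_id(current: int) -> int:
    """Return the mh_id to use when a new main header is transmitted.

    Values cycle through 1..7; 0 is never produced, since 0 is what the
    field carries when the E bit is 0.
    """
    if not 1 <= current <= 7:
        raise ValueError("mh_id must be in the range 1..7")
    return current % 7 + 1
```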

    priority : 8 bits

        The priority field indicates the importance of the JPEG 2000
        packet included in the payload.  Typically, a higher priority is
        set in the packets containing the JPEG 2000 packets of the lower
        layers and the lower subbands.

    T (Tile flag) : 1 bit

        This field shows whether tile number field is valid or not:

        T=0 means that the tile number field is valid and shows the
        tile number of the tile-part.  The sender MUST set the T flag to
        0 when only one tile-part is packed into the RTP packet,
        regardless of whether it is a whole tile-part or a fragment of
        the tile-part.

        T=1 means that the tile number field is invalid.  The sender
        MUST set the T flag to 1 when multiple whole tile-parts are
        packed into the RTP packet or when there is no tile-part (in
        other words, only a main header) in the RTP packet.


    tile number : 16 bits

        The interpretation of this field depends on the value of the T
        flag.  When T=0, this field shows the tile number.  When T=1,
        the tile number field is invalid: the sender SHOULD set the tile
        number to 0, and the receiver MUST ignore this field.
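
        The relationship between the T flag and the tile number field
        can be sketched as below.  This is an illustrative Python
        fragment; the function name and its argument (the list of tile
        numbers whose tile-part data is carried in one RTP packet) are
        assumptions made for the example.

```python
def tile_fields(tile_numbers: list[int]) -> tuple[int, int]:
    """Return (T flag, tile number) for one RTP packet.

    tile_numbers lists the tiles whose tile-part data is in the packet:
    exactly one entry gives T=0 with that tile number; zero entries
    (main header only) or several whole tile-parts give T=1, with the
    tile number field set to 0.
    """
    if len(tile_numbers) == 1:
        return 0, tile_numbers[0]
    return 1, 0
```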


    fragment offset : 24 bits

        This value must be set to the byte offset in the JPEG 2000 data
        stream of this RTP packet's contents.

        JPEG 2000 frames are typically larger than the underlying
        network's maximum transmission unit (MTU), so frames might be
        fragmented into several packets.  The fragment offset is the
        data offset in bytes of the current packet from the start of the
        JPEG 2000 code stream.  This field helps the receiver to
        reassemble the JPEG 2000 code stream.

        When scalable video delivery is performed using multiple RTP
        sessions, the fragment offset is set to the offset from the
        first byte of the same frame.  Accordingly, in scalable video
        delivery using multiple RTP sessions, the fragment offset may
        not start with 0 in some RTP sessions even if the packet is the
        first one of the frame.
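
        The 8-byte layout of Fig. 2 can be exercised with a short
        Python sketch.  The class and method names below are
        hypothetical; the sketch assumes network byte order with bit 0
        of Fig. 2 as the most significant bit of the first byte, writes
        the reserved byte as zero, and checks only the range of the
        fragment offset.

```python
import struct
from dataclasses import dataclass


@dataclass
class PayloadHeader:
    x: int = 0                # extension flag, 1 bit
    e: int = 0                # enable flag, 1 bit
    mhf: int = 0              # main header flag, 2 bits
    mh_id: int = 0            # main header identification, 3 bits
    t: int = 0                # tile flag, 1 bit
    priority: int = 0         # 8 bits
    tile_number: int = 0      # 16 bits
    fragment_offset: int = 0  # 24 bits

    def pack(self) -> bytes:
        if not 0 <= self.fragment_offset < 2**24:
            raise ValueError("fragment offset must fit in 24 bits")
        flags = ((self.x << 7) | (self.e << 6) | (self.mhf << 4)
                 | (self.mh_id << 1) | self.t)
        # The top byte of the second 32-bit word is the reserved field (0).
        return struct.pack("!BBHI", flags, self.priority,
                           self.tile_number, self.fragment_offset)

    @classmethod
    def unpack(cls, data: bytes) -> "PayloadHeader":
        flags, priority, tile_number, word = struct.unpack("!BBHI", data[:8])
        return cls(x=flags >> 7, e=(flags >> 6) & 1, mhf=(flags >> 4) & 3,
                   mh_id=(flags >> 1) & 7, t=flags & 1, priority=priority,
                   tile_number=tile_number, fragment_offset=word & 0xFFFFFF)
```

        Packing and then unpacking a header returns an equal object,
        which gives a simple self-check for an implementation.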



5. Fragmentation of JPEG 2000 code stream and Type Field

    Fig. 3 shows the construction of the JPEG 2000 code stream.  The
    JPEG 2000 code stream consists of a main header beginning with the
    SOC marker, one or more tiles (only one tile if there is no tile
    division), and the EOC marker to indicate the end of the code
    stream.  Each tile consists of a tile-part header, which starts with
    the SOT marker and ends with the SOD marker, and a bit stream (a
    series of JPEG 2000 packets).

         +--  +------------+
   Main  |    |    SOC     |  Required as the first marker.
   header|    +------------+
         |    |    main    |  Main header marker segments
         +--  +------------+
         |    |    SOT     |  Required at the beginning of each tile-part
   Tile- |    +------------+  header.
   part  |    |   T0,TP0   |  Tile 0, tile-part 0 header marker segments
   header|    +------------+
         |    |    SOD     |  Required at the end of each tile-part header
         +--  +------------+
              | bit stream |  Tile-part bit stream.
         +--  +------------+  Might include SOP and EPH
         |    |    SOT     |
   Tile- |    +------------+
   part  |    |   T1,TP0   |
   header|    +------------+
         |    |    SOD     |
         +--  +------------+
              | bit stream |
              +------------+
              |    EOC     |  Required as the last marker in the code stream
              +------------+

              Fig. 3: Construction of the JPEG 2000 code stream


    The JPEG 2000 code stream consists of a main header, tile-part
    headers, and JPEG 2000 packets.  When we packetize the JPEG 2000
    code stream, these construction units from the code stream must be
    maintained.  Each RTP packet will consist of a main header,
    tile-part header, or JPEG 2000 packet.

    If the sender does not understand the JPEG 2000 code stream (i.e.,
    the sender is not intelligent), it should pack the JPEG 2000 code
    stream into RTP packets of the largest data size the MTU allows.
    The sender must segment the JPEG 2000 code stream at arbitrary
    lengths into RTP-sized packets for the receiver.  In this case, the
    E bit MUST be set to 0.  This type of packetization is called
    "non-intelligent packetization".

    If the sender understands JPEG 2000 code streams and can read the
    JPEG 2000 packets from the code stream (i.e., the sender is
    intelligent), the packetization is called "intelligent
    packetization".  JPEG 2000 packets should be packed into RTP payload
    packets in the following way:

    1. If the JPEG 2000 packets are smaller than the MTU size, the
       sender should put as many whole JPEG 2000 packets as possible
       into a single RTP packet.  That is, the JPEG 2000 payload data
       should begin with one of the SOC marker, SOT marker, or SOP
       marker (if it exists in the JPEG 2000 data stream).

    2. If the JPEG 2000 packets are larger than the MTU size, the sender
       should segment the JPEG 2000 packets at the largest possible MTU
       size but JPEG 2000 packets must not overlap.
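
    The two rules above can be sketched as a greedy grouping step.  The
    following Python fragment is illustrative only: the function name is
    hypothetical, unit_sizes is assumed to be the byte lengths of
    consecutive code stream units (main header, tile-part headers, JPEG
    2000 packets) as found by the sender, and max_payload is assumed to
    be the room left in an RTP packet after the RTP and payload headers.
    Tile boundaries and header fields are not handled here.

```python
def pack_units(unit_sizes, max_payload, start_offset=0):
    """Group code stream units into RTP payloads.

    Returns one (fragment_offset, length) pair per RTP packet.  Whole
    units are grouped while they fit; a unit larger than max_payload is
    cut into max_payload-sized fragments, each sent in its own packet.
    """
    if max_payload < 1:
        raise ValueError("max_payload must be positive")
    payloads = []
    offset = start_offset            # offset of the next unit in the stream
    cur_start, cur_len = offset, 0   # payload currently being filled
    for size in unit_sizes:
        if size > max_payload or cur_len + size > max_payload:
            if cur_len:              # close the payload being filled
                payloads.append((cur_start, cur_len))
            cur_start, cur_len = offset, 0
        if size > max_payload:
            for pos in range(0, size, max_payload):
                payloads.append((offset + pos, min(max_payload, size - pos)))
            offset += size
            cur_start = offset
        else:
            cur_len += size
            offset += size
    if cur_len:
        payloads.append((cur_start, cur_len))
    return payloads
```

    For example, with units of 300, 400, 500, 2500 and 100 bytes and
    1000 bytes of room per packet, the first two units share a packet,
    the third travels alone, the fourth is cut into three fragments,
    and the last unit starts a new packet.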


    Regardless of the sender's capabilities, the receiver MUST be able
    to handle RTP packets of any size.

    If the sender does not fragment, any packet larger than the MTU
    size might be fragmented by the IP layer into multiple IP packets
    smaller than the MTU size.  If one fragmented IP packet is lost
    during transmission, the whole RTP packet is treated as lost because
    the receiving host might not be able to reassemble it.

    The segmentation of the JPEG 2000 code stream into RTP packets must
    fit within the RTP payload size.

    For intelligent packetization, all packets SHOULD be 32 bit aligned.
    If padding bits are required, then the padding bits MUST come at the
    end of the payload.  Any required padding bits MUST NOT appear
    between the header and the payload or at the beginning.
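
    A minimal Python sketch of the alignment rule follows.  The function
    name is hypothetical, and filling with zero bytes is an assumption:
    this document only requires that any padding come at the end of the
    payload.

```python
def pad_to_32_bits(payload: bytes) -> bytes:
    """Append 0 to 3 zero bytes so the payload length is a multiple of 4."""
    pad_len = -len(payload) % 4
    return payload + bytes(pad_len)
```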

    In the following, all the possible packetization cases are described
    with diagrams.


5.1 Separation at arbitrary lengths

    In this case, a JPEG 2000 code stream is split into several
    fragments at arbitrary byte positions (Fig. 4).  The E bit MUST be
    set to 0 for this packetization type.

    +---+---+---+----------------------+
    |RTP|PL |SOC| jpeg 2000 codestream |
    |hdr|hdr|   | fragment (1)         |
    +---+---+---+----------------------+
    +---+---+--------------------------+
    |RTP|PL |   jpeg 2000 codestream   |
    |hdr|hdr|   fragment (2)           |
    +---+---+--------------------------+
                ...
    +---+---+----------------------+---+
    |RTP|PL | jpeg 2000 codestream |EOC|
    |hdr|hdr| fragment (N)         |   |
    +---+---+----------------------+---+
    *PL hdr = payload header

                Fig. 4: Arbitrary length fragmentation.


    The E (Enable) bit flag in the payload header MUST be 0 for this
    packetization type.  All other fields in the payload header, except
    for the X bit and the fragment offset field, SHOULD be 0, and the
    receiver MUST ignore any other values when the enable bit is 0.
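
    Arbitrary-length fragmentation and the matching reassembly by
    fragment offset can be sketched as follows.  This is an illustrative
    Python fragment with hypothetical function names; max_payload is
    assumed to be the room left after the RTP and payload headers, and
    loss handling on the receiver side is not shown.

```python
def fragment_codestream(codestream: bytes, max_payload: int):
    """Yield (fragment_offset, chunk, marker) for each RTP packet.

    The code stream is cut every max_payload bytes without regard to
    its structure; marker is True only for the last packet of the frame.
    """
    if max_payload < 1:
        raise ValueError("max_payload must be positive")
    total = len(codestream)
    for offset in range(0, total, max_payload):
        chunk = codestream[offset:offset + max_payload]
        yield offset, chunk, offset + len(chunk) == total


def reassemble(fragments) -> bytes:
    """Rebuild the code stream from fragments received in any order."""
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return b"".join(fragment[1] for fragment in ordered)
```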

    Such an RTP packetization scheme is not recommended from the
    standpoint of error resilience.  It is desirable to use it only in
    the limited environments listed below:

    - The sender finds it difficult to distinguish the main header, tile
      header, and JPEG 2000 packets from one another.  Such a situation
      is likely to occur when the sender has poor computational power
      and there is no SOP marker in the JPEG 2000 code stream.

    - The network environment is error free.

    - The JPEG 2000 error resilience markers (TLM, PLM, PLT, PPM, and
      PPT markers) are present in the code stream.  In this case, error
      resilience will be handled outside of RTP, and its description is
      not within the scope of this document.  Using these markers may
      improve error resilience and recovery.  Producing JPEG 2000 bit
      streams with these markers is highly recommended in all cases.


5.2  General JPEG 2000 RTP packet types

    For all of the following packetization types, the E bit MUST be set
    to 1.

    (1) The JPEG 2000 main header (SOC marker) must come first, just
        after the RTP payload header.  If a whole main header is packed
        into the RTP packet, the MHF value must be 3 (=0b11).  The
        tile-part header and jp2-packets MAY follow the main header in
        the same packet.  When only the main header is in the RTP
        packet, the T flag MUST be set to 1 and the tile number field is
        invalid: the sender SHOULD set the tile number to 0x00, and the
        receiver MUST ignore this field.

    (2) If two or more tile-parts are packed into a single RTP packet,
        they MUST all be whole tile-parts.  Segmented tile-parts MUST
        NOT be packed into such a packet or spread over several RTP
        packets.  When multiple tile-parts exist in a single RTP packet,
        the T flag MUST be set to 1, which shows that the tile number
        field is invalid.

    (3) If one tile-part is packed into the RTP packet, the tile-part
        header, if any, MUST come first.  Note that the tile-part header
        just after the main header MAY either be packed with the main
        header, or be separated to another RTP packet.  In this case, T
        flag MUST be set to 0 and the tile number of the tile-part is
        set in the "tile number" field.  Jp2-packets MAY follow the
        tile-part header and may be packed into the same RTP packet.

    (4) If no headers of any kind are in the RTP packet, the T flag MUST
        be set to 0 and the tile number field MUST be set to the number
        of the tile to which the jp2-packets belong.

    (5) If the main header, a tile-part header, or a jp2-packet is split
        into multiple RTP packets, only one fragment SHALL be packed
        into an RTP packet.  If the main header is split, only the last
        fragment's MHF is 3 (=0b11), and the rest are 2 (=0b10).  The
        MHF value of all other fragmented RTP packets shall be 0.
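
    The MHF values for a main header, following the values 2 (=0b10)
    and 3 (=0b11) given in rules (1) and (5), can be sketched as below.
    This Python fragment is illustrative only; the function name is
    hypothetical, and max_payload is assumed to be the room available
    for code stream data in one RTP packet.

```python
def main_header_mhf(main_header_len: int, max_payload: int) -> list[int]:
    """MHF value for each RTP packet that carries part of the main header.

    A header that fits in one packet gets [0b11]; a split header gets
    0b10 on every fragment except the last, which gets 0b11.
    """
    if main_header_len < 1 or max_payload < 1:
        raise ValueError("lengths must be positive")
    fragments = -(-main_header_len // max_payload)  # ceiling division
    return [0b10] * (fragments - 1) + [0b11]
```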



6. Scalable Delivery and Priority field

    The JPEG 2000 code stream has rich functionality built into it so
    that decoders can easily handle scalable delivery or progressive
    transmission.  Progressive transmission that allows images to be
    reconstructed with increasing pixel accuracy or spatial resolution
    is essential for many applications.  This feature allows the
    reconstruction of images with different resolutions and pixel
    accuracy, as needed or desired, for different target devices.  The
    largest image source devices can provide a code stream that is
    easily processed for the smallest image display device.

    Each JPEG 2000 packet contains all compressed image data from a
    specific layer, a specific component, a specific resolution level,
    and a specific precinct.  The order in which these packets are found
    in the code stream is called the "progression order".  The ordering
    of the packets can progress along four axes: layer, component,
    resolution level and precinct.

    A priority field that indicates the importance of the data contained
    in a given RTP packet makes it possible to exploit the progressive
    and scalable functions of JPEG 2000.

    The lower the priority value, the higher the priority.  In other
    words, the priority value 0 is the highest priority, and 255 is the
    lowest priority.  We define the priority values 0 and 255 as special
    priorities: 0 for the headers (the main header or tile-part header),
    and 255 for no priority.  When any headers (the main header or
    tile-part header) are packed into the RTP packet, the sender MUST
    set the priority value to 0.  When the sender does not use the
    priority field, the sender MUST set the priority value to 255 to
    inform the receiver that the sender does not use the priority field.


6.1 Priority mapping table

    For the progression order, the priority value to be given to each
    JPEG 2000 packet is defined by the priority mapping table.  In
    principle, the priority mapping table is negotiated between the
    sender and the receiver through external protocols (such as RTSP or
    SIP), which is not within the scope of this document.  However, in
    some environments, such as a multicast videoconference environment,
    it might be difficult to negotiate the priority-mapping table
    between senders and receivers.  We define the default priority
    mapping for such a situation.  The receiver interprets the priority
    as a user-defined priority value only when the priority-mapping
    table has been negotiated; otherwise, the receiver interprets it as
    the default priority.

6.1.1 default priority mapping

    The JPEG 2000 codestream is ordered in some progression order and,
    in most cases, the earlier jp2-packets are more important than the
    later ones.  In the default priority table, the jp2-packet number
    is used as the priority value.  The jp2-packet number is the "packet
    sequence number" defined for SOP marker segments in Annex A.8.1 of
    [1].  The default priority values range from 1 to 254.  If the
    number of packets is larger than 254, that is, if a sequence number
    exceeds 254, the sender MUST set the priority values of the
    following jp2-packets to 254.
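
    The clamping rule of the default mapping can be sketched as below.
    This is only an illustration: the function name is hypothetical, and
    the sketch assumes that the jp2-packet number passed in is counted
    from 1.  How a packet sequence number of 0 would map into the range
    1 to 254 is not stated in this section.

```python
DEFAULT_PRIORITY_MIN = 1
DEFAULT_PRIORITY_MAX = 254


def default_priority(jp2_packet_number: int) -> int:
    """Map a jp2-packet number to a default priority value (1..254).

    Assumption: jp2_packet_number is counted from 1.
    """
    if jp2_packet_number < DEFAULT_PRIORITY_MIN:
        raise ValueError("this sketch assumes packet numbers start at 1")
    # Numbers above 254 all share the lowest default priority, 254.
    return min(jp2_packet_number, DEFAULT_PRIORITY_MAX)
```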

6.1.2 user-defined priority table

    The user-defined priority table is freely defined by users, but
    priority values 0 and 255 MUST be used as special priorities: 0 for
    the headers and 255 for no priority.

    For example, for an LRCP-order codestream with 3 layers and 3
    resolutions, the user-defined priority table can be defined as below
    (the format is not significant).  It has 4 priority levels.

                Priority 1: L=0, R=0, C=any, P=any
                Priority 2: L=0, R=1-2, C=any, P=any
                Priority 3: L=1, R=any, C=any, P=any
                Priority 4: L=2, R=any, C=any, P=any

    As another example, the resolution-based priority table can be
    defined as below:

                Priority 1: R=0, L=0, C=any, P=any
                Priority 2: R=0, L=1-2, C=any, P=any
                Priority 3: R=1, L=any, C=any, P=any
                Priority 4: R=2, L=any, C=any, P=any


    As another example, the component-based priority table can be
    defined as below:

                Priority 1: C=0, L=0, R=0, P=any
                Priority 2: C=0, L=0, R=any, P=any
                            C=0, L=any, R=0, P=any
                Priority 3: C=1-2, L=any, R=any, P=any


    To change the priority-mapping table, a new priority-mapping table
    must be sent from the sender to the receiver as needed.
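
    A lookup against the first example table above (LRCP order, 3 layers
    and 3 resolutions) might be sketched as follows.  Since the format
    of the table is not significant, the tuple representation and the
    names used here are purely hypothetical.

```python
# Each rule: (priority, (min layer, max layer), (min res, max res)).
# Ranges are inclusive.  Component and precinct are "any" in this
# table, so they are not examined.
LRCP_EXAMPLE_TABLE = [
    (1, (0, 0), (0, 0)),
    (2, (0, 0), (1, 2)),
    (3, (1, 1), (0, 2)),
    (4, (2, 2), (0, 2)),
]


def lookup_priority(layer: int, resolution: int) -> int:
    """Return the user-defined priority of a jp2-packet."""
    for priority, (l_lo, l_hi), (r_lo, r_hi) in LRCP_EXAMPLE_TABLE:
        if l_lo <= layer <= l_hi and r_lo <= resolution <= r_hi:
            return priority
    raise ValueError("layer/resolution outside the example table")
```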


6.2 Sender's Actions

    Priority is given in accordance with the priority-mapping table.
    For RTP packets that only consist of a whole or fragmented main/tile
    header, the sender MUST set priority 0 when a priority-mapping table
    is used.  If a priority-mapping table is not used, the priority
    value must be 0xFF for the same RTP packets.


    When several jp2-packets are packed into the same RTP packet, the
    priority values of these jp2-packets may differ.  In such a case,
    the sender MUST set the packet priority to the highest priority of
    all the jp2-packets inside the RTP packet.  If the sender does not
    use any priority-mapping table, it MUST set 0xFF in the priority
    field.

    The sender may transmit the RTP packets of each priority in a
    separate RTP session, with one session per priority value.  For
    example, each priority may be assigned to a different multicast
    group.  The sender may also transmit the RTP packets of all
    priorities in a single RTP session.
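
    The sender rules of this section can be sketched as below.  The
    names are hypothetical and the sketch is not a definitive
    implementation.  Note that the highest priority corresponds to the
    numerically lowest priority value.

```python
from typing import Sequence

PRIORITY_HEADER = 0    # main header or tile-part header
PRIORITY_NONE = 0xFF   # priority field not used


def rtp_packet_priority(table_in_use: bool,
                        has_header: bool,
                        jp2_priorities: Sequence[int]) -> int:
    """Choose the priority value for one RTP packet."""
    if not table_in_use:
        return PRIORITY_NONE
    if has_header:
        return PRIORITY_HEADER
    if not jp2_priorities:
        raise ValueError("packet carries neither header nor jp2-packets")
    # Highest priority of all jp2-packets == smallest priority value.
    return min(jp2_priorities)
```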

6.3 Receiver's Action

    Progressive transmission that allows images to be reconstructed with
    increasing pixel accuracy or spatial resolution is essential for
    many applications.  This feature allows the reconstruction of images
    with different resolutions and pixel accuracy, as needed or desired,
    for different target devices.  The image architecture provides for
    the efficient delivery of image data in many applications such as
    client/server applications.  The receiver should decode packets
    above a certain priority to obtain maximum performance depending on
    the receiver's platform.

    The receiver can determine on its own (using or not using the
    mapping table or other variables) which priority levels of RTP
    packets it should decode.

    For example, when a less powerful CPU is used or the terminal has
    only a low-resolution display, decoding only the RTP packets above a
    certain priority (that is, with a priority value at or below some
    threshold) permits obtaining optimal performance.

    If any high-priority RTP packet is lost, the affected frame(s) can
    be skipped, because skipping them may cause no significant
    perceived visual loss even if the remaining data could be decoded
    successfully.

    When an uninterpretable or unexpected priority value is received,
    the receiver must treat the packet as having no priority (i.e.,
    priority = 0xFF).
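
    A receiver-side selection along these lines might look like the
    sketch below.  The names, the use of a set of expected values, and
    the (priority, payload) pair representation are assumptions made
    for illustration.  How packets marked 0xFF take part in the
    selection is not specified here; in this sketch they are simply
    compared against the threshold like any other value.

```python
from typing import Iterable, List, Set, Tuple

PRIORITY_NONE = 0xFF


def effective_priority(value: int, expected: Set[int]) -> int:
    """Treat a priority the receiver cannot interpret as 'no priority'."""
    return value if value in expected else PRIORITY_NONE


def select_for_decoding(packets: Iterable[Tuple[int, bytes]],
                        threshold: int) -> List[Tuple[int, bytes]]:
    """Keep the (priority, payload) pairs with priority value <= threshold."""
    return [(prio, data) for prio, data in packets if prio <= threshold]
```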


7. JPEG 2000 main header compensation

    The JPEG 2000 image main header describes various encode parameters,
    and the decoder decodes by using the parameters described in the
    main header.  If the RTP packet that contains the main header is
    lost, the corresponding JPEG 2000 code stream cannot and should not
    be decoded.  Even in the extremely rare case where the main header
    has been dropped and all the remaining JPEG 2000 packets have been
    received successfully, the receiver cannot decode the frame without
    the main header information.  However, when the main header is
    lost, it can be recovered to a certain level using the following
    method.


    Recovery of a lost main header is very simple with this procedure.
    In the case of JPEG 2000 video, it is common that encode parameters
    do not vary greatly between successive frames.  Even if the RTP
    packet including the main header of a frame has been dropped,
    decoding may be performed by using the main header of the previous
    frame if the previous frame was encoded with the same encode
    parameters.

    The mh_id field of the payload header is used to recognize whether
    the encoding parameters of the main header are the same as the
    encoding parameters of the previous frame.  The same mh_id value is
    set in all RTP packets of the same frame.  Mh_id values and encode
    parameters are not associated one-to-one; mh_id is used only to
    recognize whether the encode parameters are the same as those of
    the previous frame.

    The mh_id field value SHOULD be saved from previous frames to be
    used to recover the current frame's main header, if lost.  If the
    mh_id of the current frame has the same value as the mh_id value of
    the previous frame, the previous frame's main header SHOULD be used
    to decode the current frame, in case of a lost header.

    The sender MUST increment mh_id when parameters in the header change
    and send a new main header accordingly.

    The receiver MAY use the mh_id and MAY retain the header for such
    compensation.


7.1 Sender processing

    The sender must transmit RTP packets with the same mh_id value
    unless the encode parameters are different from those of the
    previous frame.  The encode parameters are the fixed information
    marker segment (SIZ marker) and functional marker segments (COD,
    COC, RGN, QCD, QCC, and POC) specified in JPEG 2000 Part 1 Annex A
    [1].  If the encode parameters have been changed, the sender
    transmitting RTP packets MUST increment the mh_id value by one.  The
    initial mh_id value should be 1.  When the mh_id value exceeds 7,
    the value MUST return to 1 again.

    If the mh_id field is set to 0, the receiver MUST NOT save the main
    header and MUST NOT compensate for lost headers using the above
    method.
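
    The mh_id update rule can be sketched as below.  The names are
    hypothetical, and the handling of a sender that has chosen mh_id 0
    (it stays at 0) is an assumption of this sketch rather than a rule
    stated above.

```python
MH_ID_DISABLED = 0  # receiver must not save or reuse the main header
MH_ID_FIRST = 1
MH_ID_LAST = 7


def next_mh_id(current: int, parameters_changed: bool) -> int:
    """Return the mh_id to use for the next frame."""
    if current == MH_ID_DISABLED or not parameters_changed:
        # Same encode parameters (or compensation disabled): keep value.
        return current
    # Increment by one; after 7 the value returns to 1, never to 0.
    return MH_ID_FIRST if current >= MH_ID_LAST else current + 1
```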


7.2 Receiver processing

    When the receiver has received the main header correctly, the RTP
    sequence number, the mh_id and main header should be saved except
    when the mh_id value is 0.  Only the last main header that was
    received correctly SHOULD be saved.  That is, if there has been a
    saved main header, the previous one is deleted and the new main
    header is saved.

    When the main header is not received, the receiver compares the
    current mh_id value (this mh_id can be known by receiving at least
    one RTP packet) with the saved mh_id value.  When the values are the
    same, decoding may be performed by using the saved main header.

    Knowing whether the main header is lost or not may be difficult,
    especially when the main header is fragmented.

    In all cases, the main header will start with fragment offset = 0.
    In the case of fragmented main header, only the first fragment will
    have the fragment offset = 0.
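
    The save-and-recover behaviour described in this section might be
    sketched as follows.  The class and method names are hypothetical,
    and for brevity the sketch does not store the RTP sequence number
    that is saved together with the header.

```python
from typing import Optional


class MainHeaderCache:
    """Keep only the last correctly received main header."""

    def __init__(self) -> None:
        self._mh_id: Optional[int] = None
        self._header: Optional[bytes] = None

    def store(self, mh_id: int, header: bytes) -> None:
        """Save a correctly received main header, unless mh_id is 0."""
        if mh_id == 0:
            return
        # The previously saved header, if any, is replaced.
        self._mh_id, self._header = mh_id, header

    def recover(self, current_mh_id: int) -> Optional[bytes]:
        """Return the saved header if its mh_id matches, else None."""
        if current_mh_id != 0 and current_mh_id == self._mh_id:
            return self._header
        return None
```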


8. Optional Payload Header

    When the extension bit of the JPEG 2000 payload header is 1, an
    optional payload header follows the payload header.  The JPEG 2000
    video stream payload comes after the optional payload header.
    Figure 5 shows the general format of the optional payload header.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    optype   |X|             length            |               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
    |                   option specific format .....                |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Fig. 5 : JPEG 2000 video stream optional payload header generic
                format

    optype : 7 bits

        optype describes the optional payload header type.  Optype value
        0 MUST NOT be used.  Optype values from 1 to 63 are defined in
        this specification.  In this draft, 1 and 2 are defined and the
        rest are reserved for future use.  Optype values from 64 to 127
        are application-defined and can be freely used for an
        application's own definitions.  Any optype value in the range
        0-63 that is not specified within this document MUST be ignored,
        and the accompanying header must be ignored as well.

        +--------------------+----------------------------------+
        | Optype value range | Defined in                       |
        +--------------------+----------------------------------+
        | 0                  | Not allowed.                     |
        | 1-63               | In this specification.           |
        | 64-127             | Free for application definition. |
        +--------------------+----------------------------------+

                Table 2: Optype value definition range

    X : 1 bit

        Further extension bit. This must be set to 1 if another optional
        payload header follows this optional payload header; otherwise
        it must be set to 0.

        When the extension bit of the optional header is 1, another
        optional payload header MUST come immediately after this
        optional payload header.

    length : 16 bits

        This value must be the length of the optional header in bytes.
        The receiver shall perform processing for the optional header
        when the extension bit of the JPEG 2000 payload header is 1.
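
    Reading the three fixed fields of the generic format can be
    sketched as below.  The names are hypothetical.  The sketch assumes
    the usual convention that bit 0 of the diagram is the most
    significant bit and that the length field is in network byte order;
    it does not interpret the option specific part.

```python
import struct
from typing import NamedTuple


class OptionalHeaderPrefix(NamedTuple):
    optype: int   # 7 bits
    x: int        # 1 bit, 1 if another optional header follows
    length: int   # 16 bits


def parse_optional_header_prefix(data: bytes) -> OptionalHeaderPrefix:
    """Parse optype, X and length from the start of an optional header."""
    if len(data) < 3:
        raise ValueError("need at least 3 bytes")
    first, length = struct.unpack_from("!BH", data, 0)
    optype = first >> 1          # upper 7 bits of the first byte
    if optype == 0:
        raise ValueError("optype 0 MUST NOT be used")
    return OptionalHeaderPrefix(optype, first & 0x01, length)
```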


8.1 Marker Segment Optional Header

    The marker segment optional header allows changes to almost any
    property of the JPEG 2000 main or tile header functional markers,
    such as SIZ, COD, COC, RGN, QCD, QCC, and POC.

    As an optional header, this can be used to duplicate critical data
    from the main or tile header redundantly with each packet.  At the
    same time, small changes to a larger header would be simple with
    this marker.

    The format of this optional header:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  optype=1   |X|             length            |F|   JP2code   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      marker segment data                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Fig. 6 : Marker segment optional header format

    optype: 7 bits

        This value MUST be 1 for this optional header.

    X : 1 bit

        Extension bit. This signifies whether another optional header
        follows this one.  If there is another, the X bit MUST be set to
        1; else, it must be 0. For multiple changes to the header,
        chaining these headers together is recommended.

    length : 16 bits

        Length value.  The length of this optional header should be the
        length of the corresponding JP2 functional marker minus 1 (i.e.,
        Lxxx - 1).  See Sections 8.1.1 and 8.1.2 for specific examples.

    F : 1 bit

        Functional bit.  Indicates whether the optional header makes a
        change in the main header or in a tile header.  F = 0 for the
        tile header and F = 1 for the main header.

    JP2code : 7 bits

        JP2 functional code value.  This value contains the lower 7 bits
        of the original JPEG 2000 functional code marker (e.g., COD
        marker = 0xFF52, lower 7 bits = 0x52 -> 0b1010010).

    marker segment data : length bits

        The data in this area MUST be the same as the corresponding JPEG
        2000 marker data specified in Annex A of [1] but not including
        the length of the marker segment.

        A limitation of this optional header is that the functional
        markers in the optional header MUST be present in the original
        main or tile header.  Markers other than the ones in main or
        tile headers MUST NOT be present in this header.
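
    Building the fixed fields of a marker segment optional header can
    be sketched as below.  The function and parameter names are
    hypothetical; lxxx stands for the length parameter (Lcod, Lqcd,
    ...) of the original marker segment, and data for the marker
    segment data that follows that length parameter.  Network byte
    order and most-significant-bit-first packing are assumed.

```python
import struct

OPTYPE_MARKER_SEGMENT = 1


def pack_marker_segment_header(marker: int, lxxx: int, data: bytes,
                               in_main_header: bool,
                               more_follow: bool = False) -> bytes:
    """Pack optype=1, X, length, F and JP2code, followed by the data."""
    first = (OPTYPE_MARKER_SEGMENT << 1) | int(more_follow)  # optype, X
    length = lxxx - 1                      # Lxxx - 1, as specified above
    jp2code = marker & 0x7F                # lower 7 bits of the marker
    f_and_code = (int(in_main_header) << 7) | jp2code        # F, JP2code
    return struct.pack("!BHB", first, length, f_and_code) + data
```

    For a COD marker (0xFF52) changed in the main header, the fourth
    byte produced by this sketch is 0xD2, that is, F = 1 followed by
    1010010.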


8.1.1 Specific example of marker segment header: COD

    Here is a specific marker segment header for a COD functional
    segment:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  optype=1   |0|        length=(Lcod-1)        |1|   1010010   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     Scod      |                     SGcod                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     SGcod     |                                               |
    +-+-+-+-+-+-+-+-+                                               |
    |                      SPcod (Lcod length)                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Fig. 7 : COD marker segment optional header example

    -Optype = 1.  As specified in this recommendation.
    -X = 0. For this instance.
    -Length = Lcod - 1.  The length of the original COD marker - 1.
    -F = 1.  The change is in the main header, so F = 1.
    -JP2code = 1010010 -> 0x52.  COD marker in JPEG 2000 value: 0xFF52.


8.1.2 Specific example of marker segment header: QCD

    Here is a specific marker segment header for a QCD functional
    segment:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  optype=1   |0|        length=(Lqcd-1)        |0|   1011100   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     Sqcd      |                                               |
    +-+-+-+-+-+-+-+-+                                               |
    |                      SPqcd (Lqcd length)                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Fig. 8 : QCD marker segment optional header example

    -Optype = 1.  As specified in this recommendation.
    -X = 0. For this instance.
    -Length = Lqcd - 1.  The length of the original QCD marker - 1.
    -F = 0.  The change is in the tile header, so F = 0.
    -JP2code = 1011100 -> 0x5C.  QCD marker in JPEG 2000 value: 0xFF5C.


8.2 JPIP Optional Header

    Interoperability with different standards is extremely useful.  The
    ISO WG1 group has also put forth a transmission protocol standard
    called JPIP.  This standard is a protocol for viewing JPEG 2000
    images interactively using RTSP.  To embrace this standard, an
    optional JPIP header to handle the RTP data for JPIP-compatible
    clients is defined here.

    At the time of this writing, the JPIP work is still in an early
    stage of standardization.  For now, the optype value 2 is reserved
    for JPIP and will be assigned to it when that work is complete.

    The option specific information in this optional header shall be the
    same as the server response data packet from a JPIP server or a
    description of the packet's JPEG 2000 packets.


    Optype : 7 bits

        The optype value for a compatible JPIP optional header must be
        2.

    Option specific format: X bits

        This shall be determined at a later date.


9. Security Consideration

    RTP packets using the payload format defined in this specification
    are subject to the security considerations discussed in the RTP
    specification [3].  This implies that confidentiality of the media
    streams is achieved by encryption.  Because the data compression
    used with this payload format is applied end-to-end, encryption may
    be performed on the compressed data, so there is no conflict between
    the two operations.


10. Recommended Practices

    As the JPEG 2000 coding standard is highly flexible, many different
    but compliant data streams can be produced and still be labeled as a
    JPEG 2000 data stream.

    The following is a set of recommendations set forth from our
    experience in developing JPEG 2000 and this payload specification.
    Implementations of this standard must handle all possibilities
    mentioned in this specification.  The following is a listing of
    items an implementation could optimize.

    Error Resilience Markers

        The use of error resilience markers in the JPEG 2000 data stream
        is highly recommended in all situations.  Error recovery with
        these markers helps the decoder and saves external resources.
        Examples of such markers are RESET, RESTART, and ERTERM.

    Packetization Ordering

        Packetization ordering is completely dependent on the client's
        capabilities.  Some orderings allow for less distortion in the
        event of loss, at the expense of memory storage and packet
        reordering.

    YCbCr Color space

        The YCbCr color space provides the greatest amount of
        compression in color with respect to the human visual system.
        When used with JPEG 2000, the usage of this color space can
        provide excellent visual results at extreme bit rates.

    Progression Ordering

        JPEG 2000 offers many different ways to order the final code
        stream to optimize the transfer with the presentation.  In our
        use cases, the most useful orderings have been layer progression
        and resolution progression.

    Tiling and Packets

        JPEG 2000 packets are formed regardless of the encoding method.
        The encoder has little control over the size of these JPEG 2000
        packets, which may be large or small.


        Tiling splits the image up into smaller areas, each of which is
        encoded separately.  With tiles, the JPEG 2000 packet sizes are
        also reduced.  When using tiling, almost all JPEG 2000 packets
        are of an acceptable size (i.e., smaller than the MTU size of
        most networks).

        It is highly recommended that tiling be used so that
        packetization of JPEG 2000 packets for transport is simpler.


11. Intellectual Property Right

    This document includes format and mechanisms that are covered by a
    pending patent application filed with the Japanese Patent Office.
    It must be stressed that, as of this document's submission, the
    application has only been filed and has not been granted.

    We wish to contribute to development of the Internet community and
    continue our good relationship with IETF.  Our primary concern is to
    promote technology so that others may feel it is useful and
    worthwhile in the spirit of the IETF.

    If the mechanisms are granted as patents, the patents will be
    licensed under reasonable and non-discriminatory conditions to any
    person(s) who wishes to implement such mechanisms.


12. References

    [1] ISO/IEC JTC1/SC29: ISO/IEC 15444-1 "Information technology -
        JPEG 2000 image coding system - Part 1: Core coding system",
        December 2000.

    [2] ISO/IEC JTC1/SC29/WG1: "Motion JPEG 2000 Committee Draft 1.0",
        http://www.jpeg.org/public/cd15444-3.pdf, December 2000.

    [3] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson "RTP: A
        Transport Protocol for Real Time Applications", RFC 1889, January
        1996.

    [4] ISO/IEC JTC1/SC29/WG1: "JPEG2000 requirements and profiles
        version 6.3", draft in progress,
        http://www.jpeg.org/public/wg1n1803.pdf

    [5] Diego Santa-Cruz, Touradj Ebrahimi, Joel Askelof, Mathias
        Larsson and Charilaos Christopoulos: "JPEG 2000 still image
        coding versus other standards", In Proc. of SPIE's 45th annual
        meeting, Application of Digital Image Processing XXIII,
        vol.4115, pp.446-454, July 2000.


13. Authors' Addresses


    Eric Edwards
    Sony Corporation
    Media Processing Division
    Network & Software Technology Center of America
    3300 Zanker Road, MD: SJ2C4
    San Jose, CA 95134
    Phone: +1 408 955 6462
    Fax: +1 408 955 5724
    Email: Eric.Edwards@am.sony.com

    Satoshi Futemma/Eisaburo Itakura/Nobuyoshi Tomita
    Sony Corporation
    6-7-35 Kitashinagawa Shinagawa-ku
    Tokyo 141-0001 JAPAN
    Phone: +81 3 5448 3096
    Fax: +81 3 5448 4622
    Email: {satosi-f|itakura|n-tomita}@sm.sony.co.jp

    Andrew Leung/Takahiro Fukuhara
    Sony Corporation
    1-11-1 Osaki Shinagawa-ku
    Tokyo 141-0032 JAPAN
    Phone: +81 3 5435 3665
    Fax: +81 3 5435 3891
    Email: {andrew|fukuhara}@av.crl.sony.co.jp