INTERNET-DRAFT                                              Eric Edwards
draft-edwards-avt-rtp-jpeg2000-00.txt                    Satoshi Futemma
                                                        Eisaburo Itakura
                                                       Takahiro Fukuhara
                                                        Sony Corporation
                                                       November 14, 2001
                                                    Expires: May 13 2002


             RTP Payload Format for JPEG 2000 Video Streams

Status of this memo

    This document is an Internet-Draft and is subject to all
    provisions of Section 10 of RFC2026.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups. Note that
    other groups may also distribute working documents as
    Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time. It is inappropriate to use Internet-Drafts as reference
    materials or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/1id-abstracts.html

    The list of Internet-Drafts Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.


Abstract

    This document describes a payload format for transporting JPEG 2000
    video streams using RTP (Real-time Transport Protocol).
    A JPEG 2000 video stream is formed as a continuous series of JPEG
    2000 still images, where JPEG 2000 is a next-generation still image
    coding standard.
    The JPEG 2000 payload format described in this document has three
    features: (1) improved robustness to packet loss through
    intelligent fragmentation of JPEG 2000 packet units, (2) persistence
    of the main header to minimize the effect of its loss, and (3) a
    priority information field for scalable delivery from a single
    codestream.
    Together, these features allow the scalability and robustness of
    JPEG 2000 to be maximized in streaming applications.


1. Introduction

    This document specifies payload formats for JPEG 2000 video streams
    over the Real-time Transport Protocol (RTP).
    JPEG 2000 is an international standard for next-generation still
    image coding; its basic coding technology is described in [1].

Edwards, et al.                                                 [Page 1]


INTERNET-DRAFT    draft-edwards-avt-rtp-jpeg2000-00.txt    November 2001


    Motion JPEG 2000 is defined in JPEG 2000 Part 3 [2]. However, Part 3
    defines only a file format, not a transmission format for
    streaming over the Internet. For this reason, it is necessary to
    define an RTP payload format for JPEG 2000 video streams.

    JPEG 2000 offers many features beyond the current JPEG
    standard [3][4][5]:

        o Higher compression efficiency than JPEG with less visual loss
          especially at bit rates less than 0.25bpp for grayscale images.

        o A single codestream that offers both lossy and superior
          lossless compression.

        o Transmission over noisy environments.  The JPEG 2000
          codestream can be built with markers to boost its error
          resilience and recovery.  The JPEG 2000 codestream is very
          robust to bit errors as it has been designed to avoid
          catastrophic decoding failure due to bit errors.

        o Progressive transmission by pixel accuracy and resolution:
          Progressive transmission that allows images to be
          reconstructed with increasing pixel accuracy or spatial
          resolution is essential for many applications.  This feature
          allows the reconstruction of images with different resolutions
          and pixel accuracy, as needed or desired, for different target
          devices. The image architecture provides for the efficient
          delivery of image data in many applications such as
          client/server applications.

        o Random codestream access and processing.  Some parts of an
          image may be more important than others.  Specific regions
          of the image can be defined to be less distorted than other
          areas.  Access to any specific area of an image is
          handled efficiently without the need to completely decompress
          the codestream.  Simple image transforms (rotation,
          translation, filtering) can be performed on the compressed
          codestream.


    First, the JPEG 2000 algorithm is briefly explained below.
    Fig. 1 shows a block diagram of the JPEG 2000 encoder.

                                                  +-----+
                                                  | ROI |
                                                  +-----+
                                                     |
                                                     V
                    +----------+   +----------+   +------------+
                    |DC, comp. |   | Wavelet  |   |            |
        raw image==>|transform-|==>|transform-|==>|Quantization|==+
                    |  ation   |   |  ation   |   |            |  |
                    +----------+   +----------+   +------------+  |
                                                                  |
                 +-------------+   +----------+   +------------+  |
                 |             |   |          |   |            |  |
    JPEG 2000 <==|Data ordering|<==|Arithmetic|<==|Coefficient |<=+
    codestream   |             |   |  coding  |   |bit modeling|
                 +-------------+   +----------+   +------------+

             Fig. 1: Block diagram of the JPEG 2000 encoder


    First, if the image is a color image, it goes through component
    separation, in which it is split into RGB, YUV, or another color
    space.  The image can also be further divided into tiles for
    processing.

    Each color component or tile is transformed into wavelet
    coefficients.  The component or tile is decomposed into several
    levels, usually subsampled vertically and horizontally, from the
    high frequencies (which contain the sharp details) to the low
    frequencies (which contain the flat areas).  These wavelet
    coefficients are grouped by frequency into subbands.  Subband HH
    holds the high-frequency information, HL and LH hold the middle
    frequencies, and the lowest frequencies and most important
    coefficients are in the LL subband.

    Quantization is performed on the coefficients within each subband.
    The wavelet coefficient is divided by the quantization step size and
    the result is truncated.  This can happen iteratively to produce an
    accurate target bitrate.
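
    The division-and-truncation step described above can be sketched
    as follows.  This is only an illustration under simple assumptions:
    the function name quantize is hypothetical, a single scalar step
    size is used, and the iterative rate control mentioned above is
    not modeled.

```python
import math


def quantize(coefficient: float, step: float) -> int:
    """Divide a wavelet coefficient by the step size and truncate.

    Truncation is toward zero, so the sign of the coefficient is kept
    and small coefficients of either sign map to 0.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return math.trunc(coefficient / step)


if __name__ == "__main__":
    for c in (7.9, -7.9, 1.0):
        print(c, "->", quantize(c, 2.0))
```

    A larger step size maps more coefficients to zero, which lowers
    the bitrate at the cost of accuracy.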

    After quantization, code-blocks are formed within the precincts
    of each tile.  Precincts are a finer partition than tiles, and
    code-blocks are the smallest partition of the image data.  Each
    code-block is entropy coded by arithmetic coding, one bitplane at
    a time.  Three coding passes are used for the code-block: the
    significance propagation pass, the magnitude refinement pass, and
    the cleanup pass.

    After the coefficients of the code-blocks have been coded into a
    short bitstream, a header is added, turning it into a packet.  The
    header has all the information needed to decompress the packet into
    code-blocks.  A group of packets is called a layer.

    To support additional features during transmission, the formed
    packets must be re-ordered.  The standard provides four ways to
    order a compressed image for transmission and decoding: by
    resolution, quality, location, or component.  Because many markers
    are built into the JPEG 2000 codestream, a parser can go through
    the bitstream and determine the proper order of packets to
    transmit and decode.

    This overview serves only as an introduction to JPEG 2000 to aid
    in understanding the rest of this document.  Further details of the
    encoder can be found in various texts on JPEG 2000.


    To decompress a JPEG 2000 codestream, one follows the encoding
    steps in reverse order, minus the quantization step.  A detailed
    description of this procedure is outside the scope of this
    document.  Please refer to various JPEG 2000 texts for details.


2. JPEG 2000 video features

    As described above, JPEG 2000 has the following features.

        o Higher compression efficiency than existing JPEG with less
          SNR deterioration
          (improved compression efficiency over JPEG, with dramatic
           improvements at low bitrates)

        o Random codestream access and processing

        o Both lossless compression and lossy compression can be
          performed by the same algorithm.

        o Spatial resolution progression and SNR progression can
          optionally be extracted from a single codestream.
          (NOTE) SNR means Signal to Noise Ratio, which is used as a
                 measure of quality.

        o Parts of an image can be given more bits for more detail
          (the Region of Interest (ROI) function).

        o Various levels of error resilience functionality.


    JPEG 2000 video streams are formed as a continuous series of JPEG
    2000 still images, so the above features of JPEG 2000 can be used
    effectively. A JPEG 2000 video stream has the following merits.

        o SNR is improved at low bit rates.  The format can be used
          as a video stream format over low-bandwidth links.

        o It is a full intra-frame format in which each frame is
          compressed independently, so encoding and decoding delay is
          low. This is suitable for interactive video communication.
          Even if packet loss occurs in any part of a frame, the error
          is not propagated to subsequent frames.  Moreover, each frame
          can be handled independently, which facilitates video editing.

        o JPEG 2000 has flexible and accurate rate control.  This is
          suitable for traffic control and congestion control in
          Internet transmission.

        o JPEG 2000 can provide error resilience markers within its own
          codestream to aid in codestream recovery.  An encoder
          can insert a resynchronization marker at the beginning of a
          JPEG 2000 packet and a segmentation symbol at the end of the
          bit plane to aid in recovery within a frame.


3. Requirements for RTP payload format of JPEG 2000 video streams

    To provide a payload format that makes the most of the merits of
    JPEG 2000 video streams described in the previous section, the
    following must be taken into consideration.

    - Provisions for packet loss

        On the Internet, 5% packet loss is common and this percentage
        may sometimes reach 20% or more.  When JPEG 2000 video
        streams are split into RTP packets, efficient packetization of
        the codestream is required to minimize the extent to which
        missing code-blocks prevent decoding in error-prone
        environments.  If the main header is lost in transmission, the
        decoding ability is lost.  Accordingly, a mechanism that
        compensates for the loss of the main header as much as possible
        is required.

    - A packetizing scheme that makes the most of JPEG 2000
      functionality.

        The packetizing scheme should allow an image to be transmitted
        progressively and reconstructed progressively by the receiver
        using JPEG 2000 functionality, maximizing performance across
        various network conditions and the varying computing power of
        receiving platforms.


4. Proposal for an RTP payload format for JPEG 2000 video streams

4.1 RTP fixed header usage

    For each RTP packet, the RTP fixed header is followed by the JPEG
    2000 payload header, which is followed by JPEG 2000 codestream.
    The RTP header fields that have a meaning specific to the JPEG 2000
    video are described as follows:

    Payload type (PT): The payload type is dynamically assigned by means
    outside the scope of this document. A payload type in the dynamic
    range SHALL be chosen by means of an out-of-band signaling
    protocol (e.g., RTSP or SIP).

    Marker bit (M): The marker bit of the RTP fixed header is set to 1
    on the last RTP packet of a video frame, and must otherwise be 0.
    When transmission is performed by multiple RTP sessions, the bit is
    set in the last packet of the frame in each session.

    Timestamp: The RTP timestamp uses a 90 kHz clock. The same
    timestamp must appear in each fragment of a given frame. The
    initial value of the timestamp is random (unpredictable) to make
    known-plaintext attacks on encryption more difficult, even if the
    source itself does not encrypt, because the packets may flow
    through a translator that does.
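
    The marker bit and timestamp rules above can be illustrated with
    a minimal sketch of the 12-byte RTP fixed header.  It assumes no
    CSRC entries, no padding, and no RTP header extension; the function
    names, the payload type 96, the SSRC value, and the 30 frames per
    second rate are illustrative assumptions, not values taken from
    this document.

```python
import secrets
import struct

RTP_VERSION = 2
CLOCK_RATE_HZ = 90_000  # timestamp clock stated in Section 4.1


def frame_timestamp(initial: int, frame_index: int, fps: int) -> int:
    """Timestamp shared by every fragment of frame `frame_index`."""
    return (initial + frame_index * CLOCK_RATE_HZ // fps) & 0xFFFFFFFF


def build_rtp_header(payload_type: int, seq: int, timestamp: int,
                     ssrc: int, last_of_frame: bool) -> bytes:
    """Pack a 12-byte RTP fixed header (no CSRC list, no extension)."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(last_of_frame) << 7) | (payload_type & 0x7F)  # M bit + PT
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


if __name__ == "__main__":
    initial_ts = secrets.randbits(32)  # random, unpredictable start value
    ts = frame_timestamp(initial_ts, 0, 30)
    # Two fragments of the same frame: same timestamp, M=1 only on the last.
    first = build_rtp_header(96, 0, ts, 0x1234, last_of_frame=False)
    last = build_rtp_header(96, 1, ts, 0x1234, last_of_frame=True)
    print(first.hex(), last.hex())
```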


4.2 RTP Payload header format

    The RTP payload header format for JPEG 2000 video stream is as
    follows:

     0               1               2               3
     0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     type      | type-specific |   priority    |X|rsvd | mh_id |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        fragment offset                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Fig. 2:  RTP payload header format for JPEG 2000

    type : 8 bits

        The type field shows which part of the JPEG 2000 codestream is
        included. The details of the type are described later.

    type-specific : 8 bits

        Interpretation depends on the value of the type field.
        This field is defined for future usage.
        This field must be set to 0 when not used.
        One possible use is a tile-specific priority number (general
        idea).

    priority : 8 bits

        The priority field shows the importance of the JPEG 2000
        packet included in the given RTP packet. Typically, higher
        priority is given to RTP packets that contain the JPEG 2000
        packets of the lower layers and the lower subbands.

    X : 1 bit

        Extension bit. This bit must be set to 1 when a
        JPEG 2000 optional payload header follows the JPEG 2000
        payload header, and otherwise set to 0.
        The details of the optional payload header are described later.

    rsvd : 3 bits

        These bits are reserved for future use and must be set to 0.

    mh_id : 4 bits

        Identification of the main header of JPEG 2000. The same mh_id
        is used as long as the coding parameters described in the main
        header remain unchanged.

    fragment offset : 32 bits

        Because JPEG 2000 frames are typically larger than the
        underlying network's maximum transmission unit (MTU), frames
        are often fragmented into several packets. The fragment offset
        is the offset in bytes of the current packet's data from the
        first byte of the JPEG 2000 codestream. This field helps the
        receiver reassemble the JPEG 2000 codestream.

        When scalable video delivery is performed using multiple RTP
        sessions, the fragment offset is still the offset from the
        first byte of the same frame.  Accordingly, in scalable video
        delivery using multiple RTP sessions, the fragment offset may
        not start at 0 in some RTP sessions, even for the first packet
        of the frame in that session.
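
    As an illustration of the layout in Fig. 2, the following sketch
    packs and parses the 8-byte payload header.  It assumes network
    byte order and that X occupies the most significant bit of the
    fourth byte, as the figure suggests; the names PayloadHeader,
    pack_payload_header, and unpack_payload_header are hypothetical.

```python
import struct
from typing import NamedTuple


class PayloadHeader(NamedTuple):
    ptype: int            # type, 8 bits
    type_specific: int    # 8 bits, 0 when not used
    priority: int         # 8 bits
    extension: bool       # X, 1 bit
    mh_id: int            # 4 bits
    fragment_offset: int  # 32 bits


def pack_payload_header(h: PayloadHeader) -> bytes:
    """Serialize the header; the 3 reserved bits are always written as 0."""
    if not 0 <= h.mh_id <= 0x0F:
        raise ValueError("mh_id is a 4-bit field")
    flags = (int(h.extension) << 7) | h.mh_id
    return struct.pack("!BBBBI", h.ptype, h.type_specific, h.priority,
                       flags, h.fragment_offset)


def unpack_payload_header(data: bytes) -> PayloadHeader:
    """Parse the first 8 bytes of an RTP payload; reserved bits are ignored."""
    ptype, tspec, prio, flags, offset = struct.unpack("!BBBBI", data[:8])
    return PayloadHeader(ptype, tspec, prio, bool(flags & 0x80),
                         flags & 0x0F, offset)


if __name__ == "__main__":
    hdr = PayloadHeader(12, 0, 1, False, 3, 4096)
    raw = pack_payload_header(hdr)
    print(raw.hex(), unpack_payload_header(raw))
```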


5. Fragmentation of JPEG 2000 codestream and Type Field

    Fig. 3 shows the construction of the JPEG 2000 codestream.  The JPEG
    2000 codestream consists of a main header beginning with the SOC
    marker, one or more tiles (only one tile if there is no tile
    division), and the EOC marker to indicate the end of the
    codestream.  Each tile consists of a tile-part header, starting
    with the SOT marker and ending with the SOD marker, and a bit
    stream (a series of JPEG 2000 packets).

         +--  +------------+
   Main  |    |    SOC     |  Required as the first marker.
   header|    +------------+
         |    |    main    |  Main header marker segments
         +--  +------------+
         |    |    SOT     |  Required at the beginning of each tile-part
   Tile- |    +------------+  header.
   part  |    |   T0,TP0   |  Tile 0, tile-part 0 header marker segments
   header|    +------------+
         |    |    SOD     |  Required at the end of each tile-part header
         +--  +------------+
              | bit stream |  Tile-part bit stream.
         +--  +------------+  Might include SOP and EPH
         |    |    SOT     |
   Tile- |    +------------+
   part  |    |   T1,TP0   |
   header|    +------------+
         |    |    SOD     |
         +--  +------------+
              | bit stream |
              +------------+
              |    EOC     |  Required as the last marker in the codestream
              +------------+

            Fig. 3: Construction of the JPEG 2000 codestream
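
    A sender that wants to separate the main header from the first
    tile-part header needs to find where the main header ends.  The
    sketch below walks the marker segments from SOC to the first SOT.
    The marker codes and the rule that a segment length counts its own
    two bytes but not the marker come from JPEG 2000 Part 1 [1], not
    from this document; the function name main_header_length and the
    toy codestream are illustrative assumptions.

```python
import struct

# Marker codes as defined in JPEG 2000 Part 1.
SOC, SOT, SOD, EOC = 0xFF4F, 0xFF90, 0xFF93, 0xFFD9


def main_header_length(codestream: bytes) -> int:
    """Return the number of bytes from SOC up to, not including, the first SOT."""
    if struct.unpack_from("!H", codestream, 0)[0] != SOC:
        raise ValueError("codestream does not start with SOC")
    pos = 2  # SOC has no length field
    while pos + 4 <= len(codestream):
        marker, seg_len = struct.unpack_from("!HH", codestream, pos)
        if marker == SOT:
            return pos
        # Skip the marker (2 bytes) plus the segment, whose length
        # field includes itself.
        pos += 2 + seg_len
    raise ValueError("no SOT marker found")


if __name__ == "__main__":
    toy = (struct.pack("!H", SOC)
           + struct.pack("!HH", 0xFF51, 4) + b"\x00\x00"  # fake 4-byte segment
           + struct.pack("!HH", SOT, 10) + bytes(8)        # SOT segment
           + struct.pack("!H", SOD) + b"data"
           + struct.pack("!H", EOC))
    print(main_header_length(toy))
```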


    Because JPEG 2000 video streams are typically larger than the
    underlying network's maximum transmission unit (MTU), a video
    sequence would often be fragmented into several IP packets at the
    network layer. The JPEG 2000 video streams are therefore fragmented
    into RTP packets according to the following basic rule.

    The JPEG 2000 construction consists of a main header, tile-part
    headers, and JPEG 2000 packets.  When we packetize the JPEG 2000
    codestream, these construction units from the codestream should be
    maintained.  Each RTP packet should consist of a main header,
    tile-part header, or JPEG 2000 packet.

    If the sender understands the JPEG 2000 codestream and can read the
    JPEG 2000 packets from it (i.e., the sender is intelligent),
    JPEG 2000 packets should be packed into RTP packets in the
    following way:

        1. If the JPEG 2000 packets are smaller than the MTU size, the
           sender should put as many whole JPEG 2000 packets as
           possible into a single RTP packet.  That is, the JPEG 2000
           payload data should begin with one of the SOC marker, SOT
           marker, or SOP marker (if it exists).

        2. If a JPEG 2000 packet is larger than the MTU size, the
           sender should segment the JPEG 2000 packet into pieces of
           the largest size the MTU allows, without letting JPEG 2000
           packets overlap within an RTP packet.
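
    The two rules above can be sketched as a greedy grouping of JPEG
    2000 packet sizes.  This is an illustration only: the function
    name pack_units is hypothetical, and max_payload stands for the
    MTU minus the IP, UDP, RTP, and payload header overhead, a value
    this document does not fix.

```python
from typing import List


def pack_units(unit_sizes: List[int], max_payload: int) -> List[List[int]]:
    """Group consecutive JPEG 2000 packet indices into RTP payloads.

    Whole packets are added to a payload while they fit.  A packet
    larger than max_payload is placed in a group of its own, so that
    it can be fragmented without sharing RTP packets with its
    neighbours.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, size in enumerate(unit_sizes):
        if current and used + size > max_payload:
            groups.append(current)  # close the payload before it overflows
            current, used = [], 0
        current.append(index)
        used += size
        if used >= max_payload:     # payload is full, or the unit is oversized
            groups.append(current)
            current, used = [], 0
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    print(pack_units([400, 500, 600, 3000, 100], 1400))
```

    In this example the fourth packet (3000 bytes) exceeds the payload
    limit, so it ends up alone and would then be split into fragments.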

    If the sender does not understand the JPEG 2000 codestream (i.e.,
    the sender is not intelligent), it should pack the JPEG 2000
    codestream into RTP packets of the largest data size the MTU
    allows.  The JPEG 2000 codestream will then be segmented at
    arbitrary lengths by the sender.

    Regardless of the sender's capabilities, the receiver must be able
    to handle RTP packets of any size.

    If the sender does not fragment the data, any packet larger than
    the MTU size will be fragmented by the IP layer into multiple IP
    packets no larger than the MTU.  If one of these IP fragments is
    lost during transmission, the whole RTP packet is lost, because
    the receiving host cannot reassemble it.  The JPEG 2000 codestream
    should therefore be segmented into RTP packets so that each
    segment fits within the RTP payload size.

    In the following, all the possible packetization cases are described
    with diagrams.  For each case, the type field value shown in Fig. 2
    is also indicated.


5.1 Separation at arbitrary lengths

    In this case, a JPEG 2000 codestream is split into several
    fragments at arbitrary byte positions.
    The type value of the RTP packet is set to 0.


    +---+---+---+----------------------+
    |RTP|PL |SOC| jpeg 2000 codestream |        type = 0
    |hdr|hdr|   | fragment (1)         |
    +---+---+---+----------------------+
    +---+---+--------------------------+
    |RTP|PL |   jpeg 2000 codestream   |        type = 0
    |hdr|hdr|   fragment (2)           |
    +---+---+--------------------------+
                ...
    +---+---+----------------------+---+
    |RTP|PL | jpeg 2000 codestream |EOC|        type = 0
    |hdr|hdr| fragment (N)         |   |
    +---+---+----------------------+---+
    *PL hdr = payload header
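
    The following sketch illustrates this arbitrary-length scheme
    together with reassembly by fragment offset.  It assumes a single
    RTP session, so that offsets start at 0, and the function names
    split_codestream and reassemble are hypothetical.

```python
from typing import List, Tuple


def split_codestream(codestream: bytes,
                     max_payload: int) -> List[Tuple[int, bytes]]:
    """Cut the codestream every max_payload bytes.

    Returns (fragment_offset, data) pairs; each pair would be sent in
    one RTP packet with type = 0.
    """
    return [(off, codestream[off:off + max_payload])
            for off in range(0, len(codestream), max_payload)]


def reassemble(fragments: List[Tuple[int, bytes]]) -> bytes:
    """Rebuild the codestream from fragments received in any order.

    Raises ValueError when the offsets leave a gap, which indicates
    that a fragment was lost.
    """
    out = bytearray()
    for offset, data in sorted(fragments):
        if offset != len(out):
            raise ValueError(f"missing data at offset {len(out)}")
        out += data
    return bytes(out)


if __name__ == "__main__":
    frame = bytes(range(256)) * 4
    frags = split_codestream(frame, 300)
    print([off for off, _ in frags])
    print(reassemble(list(reversed(frags))) == frame)
```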


    This RTP packetization scheme is not recommended from the
    standpoint of error resilience.  It is desirable to use it only in
    the limited environments listed below.

        - The sender finds it difficult to distinguish the main header,
          tile header, and JPEG 2000 packets from one another.  There is
          no SOP marker in the JPEG 2000 codestream.  The sender is not
          intelligent.

        - The network environment is error free.

        - The JPEG 2000 error resilience markers (TLM, PLM, PLT, PPM,
          and PPT markers) are present in the codestream.  In this
          case, error resilience is handled outside of RTP, and its
          description is not within the scope of this document.  Using
          these markers may improve the error resilience.



5.2  General JPEG 2000 RTP packet types

   (1) The JPEG 2000 main header (SOC marker) must come first in the
       RTP payload (just after the RTP payload header). The type value
       of RTP packets that contain the whole main header (not
       fragmented) is 4.

       (1-a) The RTP packet only contains the complete main header.
       +---+---+------+
       |RTP|PL |Main  |                         type = 4
       |hdr|hdr|header|
       +---+---+------+

       (1-b) The main header and the first tile-part header are packed
       into one RTP packet.
       +---+---+------+---------+
       |RTP|PL |Main  |Tile-part|               type = 4
       |hdr|hdr|header|header   |
       +---+---+------+---------+

       (1-c) The main header, the first tile-part header and JPEG 2000
       packet(s) are packed into one RTP packet.
       +---+---+------+---------+---------+-----+---------+
       |RTP|PL |Main  |Tile-part|jpeg 2000| ... |jpeg 2000|     type = 4
       |hdr|hdr|header|header   |packet   |     |packet   |
       +---+---+------+---------+---------+-----+---------+


       (1-d) The main header is split into several RTP packets.

       If the main header is larger than one RTP packet, it may be
       split into several RTP packets.
       In this case, each of these RTP packets must contain only a
       piece of the main header. The RTP packet that contains the
       first piece of the main header is type 5, the last piece is
       type 7, and the middle pieces are all type 6.

       +---+---+--------------+
       |RTP|PL |Main Header(1)|                 type = 5
       |hdr|hdr|              |
       +---+---+--------------+
       +---+---+--------------+
       |RTP|PL |Main Header(2)|                 type = 6
       |hdr|hdr|              |
       +---+---+--------------+
       +---+---+--------------+
       |RTP|PL |Main Header(3)|                 type = 6
       |hdr|hdr|              |
       +---+---+--------------+
                 ...                              ...
       +---+---+--------------+
       |RTP|PL |Main Header(N)|                 type = 7
       |hdr|hdr|              |
       +---+---+--------------+


       (Note) When the main header is split into multiple RTP packets,
       the first tile-part header must not be included in the RTP
       packet containing the last fragment.
       +---+---+--------------+---------+
       |RTP|PL |Main Header(N)|Tile-part|       This packetization is
       |hdr|hdr|              |header   |       not allowed.
       +---+---+--------------+---------+



   (2) Tile-part headers (SOT marker) must come first in the RTP payload
       (just after the RTP payload header), except for the first
       tile-part header just after the main header.
       The first tile-part header may either be packed with the main
       header or be sent in a separate RTP packet.
       The type value of an RTP packet that begins with a
       tile-part header is 8.

       (2-a) The RTP packet only contains the complete tile-part header.
       +---+---+----------+
       |RTP|PL |Tile-part |                     type = 8
       |hdr|hdr|Header    |
       +---+---+----------+

       (2-b) The tile-part header and JPEG 2000 packet(s) are packed
       into one RTP packet.
       +---+---+----------+---------+-----+---------+
       |RTP|PL |Tile-part |jpeg 2000| ... |jpeg 2000|   type = 8
       |hdr|hdr|Header    |packet   |     |packet   |
       +---+---+----------+---------+-----+---------+


       (2-c) The tile-part header is split into several RTP
       packets.

       If the tile-part header is larger than one RTP
       packet, it may be split into several RTP packets. In this
       case, each of these RTP packets contains only a piece of the
       tile-part header.
       The RTP packet that contains the first piece of the tile-part
       header is type 9, the last piece is type 11, and the middle
       pieces are all type 10.

       +---+---+-------------------+
       |RTP|PL |Tile-part header   |            type = 9
       |hdr|hdr|fragment(1)        |
       +---+---+-------------------+
       +---+---+-------------------+
       |RTP|PL |Tile-part header   |            type = 10
       |hdr|hdr|fragment(2)        |
       +---+---+-------------------+
       +---+---+-------------------+
       |RTP|PL |Tile-part header   |            type = 10
       |hdr|hdr|fragment(3)        |
       +---+---+-------------------+
                 ...
       +---+---+-------------------+
       |RTP|PL |Tile-part header   |            type = 11
       |hdr|hdr|fragment(N)        |
       +---+---+-------------------+

       (Note) When the tile-part header is split into multiple RTP
       packets, the JPEG 2000 packet must not be included in the RTP
       packet containing the last fragment.
       +---+---+-------------------+---------+
       |RTP|PL |Tile-part header   |jpeg 2000|  This packetization is
       |hdr|hdr|fragment(N)        |packet   |  not allowed.
       +---+---+-------------------+---------+




   (3) JPEG 2000 packets must be packed separately from the headers,
       except for JPEG 2000 packets that directly follow a tile-part
       header. Several JPEG 2000 packets may also be packed into one
       RTP packet.
       If the SOP (Start of Packet) marker is used for error
       resilience, the SOP marker shall be placed at the beginning of
       the RTP payload.
       The type value of an RTP packet that contains only JPEG 2000
       packet(s) is 12.

       (3-a) Several JPEG 2000 packets are packed into one RTP packet.
       +---+---+---------+-----+---------+
       |RTP|PL |jpeg 2000| ... |jpeg 2000|      type = 12
       |hdr|hdr|packet   |     |packet   |
       +---+---+---------+-----+---------+

       (3-b) The JPEG 2000 packet is split into several RTP
       packets.

       If the JPEG 2000 packet is larger than one RTP packet, it
       may be split into two or more RTP packets. In this case, each of
       these RTP packets contains only a piece of the JPEG 2000 packet.

       The RTP packet with the first piece of the JPEG 2000 packet is
       type 13, the last piece is type 15, and the middle pieces are
       all type 14.

       +---+---+-------------------+
       |RTP|PL |jpeg 2000 packet   |            type = 13
       |hdr|hdr|fragment(1)        |
       +---+---+-------------------+
       +---+---+-------------------+
       |RTP|PL |jpeg 2000 packet   |            type = 14
       |hdr|hdr|fragment(2)        |
       +---+---+-------------------+
       +---+---+-------------------+
       |RTP|PL |jpeg 2000 packet   |            type = 14
       |hdr|hdr|fragment(3)        |
       +---+---+-------------------+
                 ...                               ...
       +---+---+-------------------+
       |RTP|PL |jpeg 2000 packet   |            type = 15
       |hdr|hdr|fragment(N)        |
       +---+---+-------------------+

       (Note) When the JPEG 2000 packet is split into multiple RTP
       packets, another JPEG 2000 packet must not be included in the RTP
       packet containing the last fragment.
       +---+---+-------------------+---------+
       |RTP|PL |jpeg 2000 packet   |jpeg 2000|  This packetization is
       |hdr|hdr|fragment(N)        |packet   |  not allowed.
       +---+---+-------------------+---------+
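
    The type values used in cases (1) to (3) follow one pattern: a
    whole unit uses a base value (4, 8, or 12), and the first, middle,
    and last fragments use the base value plus 1, 2, and 3.  A minimal
    sketch of this numbering follows; the constant and function names
    are hypothetical.

```python
from typing import List

# Type values for unfragmented units, as given in this section.
MAIN_HEADER = 4
TILE_PART_HEADER = 8
JPEG2000_PACKET = 12


def fragment_types(base_type: int, fragment_count: int) -> List[int]:
    """Return the type value for each RTP packet carrying one unit.

    A unit that fits in one RTP packet keeps the base type.  A split
    unit uses base+1 for the first piece, base+2 for every middle
    piece, and base+3 for the last piece.
    """
    if fragment_count < 1:
        raise ValueError("fragment_count must be at least 1")
    if fragment_count == 1:
        return [base_type]
    middle = [base_type + 2] * (fragment_count - 2)
    return [base_type + 1] + middle + [base_type + 3]


if __name__ == "__main__":
    print(fragment_types(MAIN_HEADER, 4))
    print(fragment_types(TILE_PART_HEADER, 2))
    print(fragment_types(JPEG2000_PACKET, 1))
```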



6. Scalable Delivery and Priority field

    The JPEG 2000 codestream has rich functionality built into it so
    decoders can easily handle scalable delivery or progressive
    transmission.  Progressive transmission that allows images to be
    reconstructed with increasing pixel accuracy or spatial resolution
    is essential for many applications. This feature allows the
    reconstruction of images with different resolutions and pixel
    accuracy, as needed or desired, for different target devices.  The
    largest image source devices can provide a codestream that is easily
    processed for the smallest image display device.

    The JPEG 2000 packets contain all compressed image data from a
    specific layer, a specific component, a specific resolution level,
    and a specific precinct. The order in which these packets are found
    in the codestream is called the "progression order". The ordering
    of the packets can progress along four axes: layer, component,
    resolution level and precinct.

    Providing a priority field that shows the importance of the data
    contained in a given RTP packet makes the most of the JPEG 2000
    progressive/scalable functions.

    In resolution progression order, the higher decomposition level is
    more important.  The priority field of an RTP packet that contains
    the higher decomposition level is set to the higher priority.
    When the codestream is transmitted in spatial resolution order,
    the LL0 component data is given the highest priority.


6.1 Priority mapping table

    For a given progression order, the priority value assigned to each
    JPEG 2000 packet is defined by the priority mapping table.  The
    higher the importance, the smaller the priority value.  The priority
    mapping table can define the priority values by spatial resolution,
    layer, color component, or precinct.  The priority mapping table is
    sent from the sender to the receiver through another protocol (RTSP,
    SIP, etc.) outside of RTP.  To change the priority mapping table,
    the sender must send a new priority mapping table to the receiver.

    If there is no priority mapping table, the priority value of the RTP
    packet must be set to '0xff'.

    For example, the sender can send the priority table to the receiver,
    but the receiver determines for itself which priority levels of RTP
    packets to receive, using the priority table as a guideline.

    The priority value of 1 has the highest priority in the priority
    mapping table.  As the priority value increases, the priority
    becomes lower.  If transmission is performed without attaching any
    priority mapping table, 0xff (255) must be set in the priority
    field.

    For RTP packets that consist only of a whole or fragmented main or
    tile header and contain no JPEG 2000 packets, the sender must set
    priority 0 if a priority mapping table is used.  (If a priority
    mapping table is not used, the priority value of these RTP packets
    must be 0xff.)

    The sender may transmit each priority in a separate RTP session,
    with one session per priority value.  For example, different
    priorities may be allocated to different multicast groups.  The
    sender may also transmit the RTP packets of all priority values in
    a single RTP session.

    When multiple JPEG 2000 packets are included in a single RTP packet,
    the higher priority value of JPEG 2000 packets is set for the whole
    RTP packet by the sender.

    Examples of priority mapping tables are shown below.  The component
    based priority should be used when one component has a higher
    priority than the others, such as Y among YUV components.


6.1.1 Layer based priority

    This is an example of a priority mapping table for the progression
    order in which SNR is improved progressively.  The JPEG 2000 packet
    of layer 0 and resolution 0 has the highest priority.  The JPEG 2000
    packets with layer 0 and resolution 1 or more are next in priority.
    As the layer number increases, the priority becomes lower.

        L  R  C P | priority
      ------------+-------------
        0  0  - - |     1
        0 >0  - - |     2
        1  -  - - |     3
         ....     |   ....
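
    The table above can be read as an ordered list of rules, in which
    '-' matches any value and '>0' matches any value of 1 or more.  The
    following Python sketch illustrates one possible way to look up the
    priority of a JPEG 2000 packet.  The in-memory rule encoding, the
    names LAYER_TABLE and lookup_priority, the first-match policy, and
    the fallback to 0xff for packets that match no rule are assumptions
    made for illustration; this document does not define a format for
    the priority mapping table.

```python
# Hypothetical in-memory form of the layer based priority table.
# Each rule is ((L, R, C, P), priority).  None stands for '-' (any
# value), the string ">0" stands for "1 or more", and an int must
# match exactly.  Only the three rows shown in the table are encoded.
LAYER_TABLE = [
    ((0, 0, None, None), 1),
    ((0, ">0", None, None), 2),
    ((1, None, None, None), 3),
]

# Assumption: a packet that matches no rule is treated as "no priority".
NO_PRIORITY = 0xFF


def _matches(pattern, value):
    """Return True if one table cell matches one packet coordinate."""
    if pattern is None:
        return True
    if pattern == ">0":
        return value > 0
    return pattern == value


def lookup_priority(table, layer, resolution, component, precinct):
    """Return the priority of the first rule that matches the packet."""
    packet = (layer, resolution, component, precinct)
    for patterns, priority in table:
        if all(_matches(p, v) for p, v in zip(patterns, packet)):
            return priority
    return NO_PRIORITY
```

    With this sketch, a packet of layer 0 and resolution 2 maps to
    priority 2, and any packet of layer 1 maps to priority 3.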


6.1.2 Resolution level based priority

    This is an example of a priority mapping table for the progression
    order in which the spatial resolution is increased.  The JPEG 2000
    packet with layer 0 and resolution 0 has the highest priority, and
    the JPEG 2000 packets with layer 1 or more and resolution 0 are next
    in priority.  As the resolution level increases, the priority
    becomes lower.

        L  R C P  | priority
      ------------+-------------
        0  0 - -  |     1
       >0  0 - -  |     2
        -  1 - -  |     3
         ....     |   ....


6.1.3 Component based priority

    The priority mapping table for component progression is used only
    when there is a priority order among components.  This example is
    for YUV components.  The JPEG 2000 packet with layer 0, resolution
    0, and component 0 has the highest priority.  The JPEG 2000 packets
    with layer 1 or more, resolution 0, and component 0 are next in
    priority.  The JPEG 2000 packets with resolution 1 or more and
    component 0 are third in priority.  As the resolution increases, the
    priority becomes lower.

        L  R C P  | priority
      ------------+-------------
        0  0 0 -  |     1
       >0  0 0 -  |     2
        - >0 0 -  |     3
        -  - 1 -  |     4
         ....     |   ....


6.2 Sender's Actions

    Priority is assigned in accordance with the priority mapping table.
    The priority field is only a hint to the receiver; it never forces
    the receiver to use any specific processing method.  If the priority
    mapping table is not used, '0xff' must be set.
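
    The sender rules of Section 6.1 and this section can be summarized
    in a short Python sketch.  The function name priority_field and its
    parameters are hypothetical, and the accepted range of 1 to 0xfe
    for table-derived values assumes an 8-bit priority field, which is
    suggested by the value 0xff but is an assumption here.

```python
NO_TABLE_PRIORITY = 0xFF    # no priority mapping table is in use
HEADER_ONLY_PRIORITY = 0    # main/tile header only, table in use


def priority_field(table_in_use, header_only, table_priority=None):
    """Choose the value of the priority field for one RTP packet.

    table_in_use   -- True if a priority mapping table was sent
    header_only    -- True if the RTP packet carries only a whole or
                      fragmented main or tile header
    table_priority -- value taken from the mapping table (hypothetical
                      input, computed elsewhere by the sender)
    """
    if not table_in_use:
        return NO_TABLE_PRIORITY
    if header_only:
        return HEADER_ONLY_PRIORITY
    # Assumption: table values occupy 1..0xfe, because 0 and 0xff
    # have the reserved meanings handled above.
    if table_priority is None or not 1 <= table_priority <= 0xFE:
        raise ValueError("table priority must be in the range 1..0xfe")
    return table_priority
```

    For example, priority_field(True, False, 3) yields 3, while any
    packet sent without a mapping table yields 0xff.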


6.3 Receiver's Action

    Progressive transmission that allows images to be reconstructed with
    increasing pixel accuracy or spatial resolution is essential for
    many applications. This feature allows the reconstruction of images
    with different resolutions and pixel accuracy, as needed or desired,
    for different target devices.  The image architecture provides for
    the efficient delivery of image data in many applications such as
    client/server applications.  The receiver should decode packets
    above a certain priority to obtain maximum performance depending on
    the receiver's platform.

    The receiver can determine on its own (with or without the mapping
    table, and taking several other variables into account) which
    priority values of RTP packets it should decode.

    For example, when CPU power is insufficient or the terminal has
    only a low-resolution display, decoding only the RTP packets whose
    priority value is below a certain threshold (that is, the more
    important packets) permits obtaining optimal performance.

    If a high-priority RTP packet is lost, the affected frame(s) can be
    skipped, because the visual degradation may be noticeable even if
    decoding can be performed successfully.

    When an uninterpretable or unexpected priority value is received,
    the receiver must interpret the packet as having no priority (i.e.,
    priority = 0xff).
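
    The receiver behaviour described in this section might be sketched
    in Python as follows.  The names effective_priority and
    should_decode, the set of known values, and the threshold are
    hypothetical.  The choice to always decode packets that carry no
    priority is also an assumption; this document leaves that decision
    to the receiver.

```python
NO_PRIORITY = 0xFF


def effective_priority(value, known_values):
    """Map an uninterpretable or unexpected value to 'no priority'."""
    return value if value in known_values else NO_PRIORITY


def should_decode(value, known_values, threshold):
    """Decide whether the receiver decodes an RTP packet.

    A smaller priority value means a more important packet, so the
    packet is decoded when its value does not exceed the threshold.
    """
    priority = effective_priority(value, known_values)
    if priority == NO_PRIORITY:
        # Assumption: without usable priority information the
        # receiver cannot filter, so it decodes the packet.
        return True
    return priority <= threshold
```

    A terminal with a low-resolution display could, for example, call
    should_decode with a small threshold so that only the most
    important packets are decoded.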

7. JPEG 2000 main header compensation

    The JPEG 2000 main header describes various encoding parameters,
    and the decoder decodes the codestream by using the parameters
    described in the main header.  If the RTP packet that contains the
    main header is lost, the corresponding JPEG 2000 codestream cannot
    be decoded.  Even in the extremely rare case in which only the main
    header is dropped and all the remaining JPEG 2000 packets are
    received successfully, the receiver cannot decode the frame.
    However, a lost main header can be recovered to a certain extent by
    using the following method.

    Recovering a lost main header is very simple.  In the case of JPEG
    2000 video, it is common that the encoding parameters do not change
    greatly from frame to frame.  Even if the RTP packet that includes
    the main header of a frame is dropped, decoding can be performed by
    using the main header of the previous frame, provided that the
    previous frame was encoded with the same encoding parameters.

    The mh_id field of the payload header is used to recognize whether
    the encoding parameters of the main header are the same as the
    encoding parameters of the previous frame.  The same mh_id value is
    set in all RTP packets of the same frame.  An mh_id value is not
    associated one-to-one with a set of encoding parameters; it is used
    only to recognize whether the encoding parameters are the same as
    those of the previous frame.

    The mh_id value is saved from the previous frame so that it can be
    used to recover the current frame's main header if that header is
    lost.  If the mh_id of the current frame has the same value as the
    mh_id of the previous frame, the previous frame's main header can
    be used to decode the current frame in case the main header is
    lost.


7.1 Sender processing

    The sender transmits RTP packets with the same mh_id value unless
    the encoding parameters differ from those of the previous frame.
    The encoding parameters are the fixed information marker segment
    (SIZ marker) and the functional marker segments (COD, COC, RGN, QCD,
    QCC, and POC) specified in JPEG 2000 Part 1 Annex A.  If the
    encoding parameters have changed, the sender increments the mh_id
    value by one before transmitting RTP packets.  The initial mh_id
    value is 1.  When the mh_id value exceeds 15, it returns to 1.

    If the mh_id field is set to 0, the receiver must not save the main
    header and must not compensate for lost headers using the above
    method.
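
    The mh_id update rule for the sender can be illustrated with a
    minimal Python sketch.  The function name next_mh_id is
    hypothetical.  Keeping the value 0 unchanged once a sender has
    chosen it is an assumption, since this document does not say
    whether a sender may switch between 0 and the range 1 to 15.

```python
MH_ID_MIN = 1    # initial mh_id value
MH_ID_MAX = 15   # largest mh_id value before wrapping back to 1


def next_mh_id(current, params_changed):
    """Return the mh_id to use for the next frame.

    current        -- mh_id used for the previous frame
    params_changed -- True if the encoding parameters (SIZ, COD, COC,
                      RGN, QCD, QCC, POC) differ from the previous frame
    """
    if current == 0:
        # Assumption: a sender that disabled main header compensation
        # keeps mh_id at 0.
        return 0
    if not params_changed:
        return current
    return MH_ID_MIN if current >= MH_ID_MAX else current + 1
```

    Note that the value wraps from 15 to 1 and never to 0, because 0 is
    reserved to disable main header compensation.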



7.2 Receiver processing

    When the receiver has received the main header correctly, the RTP
    sequence number, the mh_id and main header are saved except when the
    mh_id value is 0.  Only the last main header that was received
    correctly is saved.  That is, if there has been a saved main header,
    the previous one is deleted and the new main header is saved.

    When the main header could not be received, the receiver compares
    the current mh_id value (this mh_id can be known by receiving at
    least one RTP packet) with the saved mh_id value.  When the values
    are the same, decoding is performed by using the saved main header.

    An mh_id value of 0 is an indication from the sender that the
    receiver must neither save main headers nor compensate for lost
    headers.
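
    The receiver processing in this section can be sketched as a small
    cache that holds only the last correctly received main header.  The
    class name MainHeaderCache and its method names are hypothetical,
    and the main header is treated as an opaque byte string.  Keeping a
    previously saved header when a frame with mh_id 0 arrives is an
    assumption.

```python
class MainHeaderCache:
    """Holds the last correctly received main header (sketch only)."""

    def __init__(self):
        self.seq = None      # RTP sequence number of the saved header
        self.mh_id = None    # mh_id of the saved header
        self.header = None   # saved main header bytes

    def on_main_header(self, seq, mh_id, header):
        """Save a correctly received main header unless mh_id is 0."""
        if mh_id == 0:
            return  # the sender asked the receiver not to save it
        # Any previously saved header is replaced by the new one.
        self.seq, self.mh_id, self.header = seq, mh_id, header

    def recover(self, current_mh_id):
        """Return the saved header if it may stand in for a lost one.

        current_mh_id is taken from any RTP packet of the current
        frame.  None is returned when no compensation is possible.
        """
        if current_mh_id == 0 or self.header is None:
            return None
        if current_mh_id == self.mh_id:
            return self.header
        return None
```

    A receiver would call on_main_header for each frame whose main
    header arrives intact and call recover only when the main header of
    the current frame is missing.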


8. Optional Payload Header

    When the extension bit of the JPEG 2000 payload header is 1, the
    payload header is followed by an optional payload header.  The JPEG
    2000 video stream payload comes after the optional payload header.
    The figure below shows the general format of the optional payload
    header.

     0               1               2               3
     0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    optype   |X|     length    |                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                              |
    |                   option specific format .....               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Fig. JPEG 2000 video stream optional payload header generic format

    optype : 7 bits

        optype shows the optional payload header type.

    X : 1 bit

        more extension bit. This must be set to 1 if another optional
        payload header follows this optional payload header; otherwise
        it must be set to 0.

    length : 8 bits

        length of optional header in bytes.


    The receiver processes the optional header when the extension bit
    of the JPEG 2000 payload header is 1.  When the receiver receives
    an optype that it cannot interpret, it skips the number of bytes
    specified in the length field and does not process that optional
    payload header.

    When the more extension bit of the optional header is 1, another
    optional payload header will come immediately after this optional
    payload header.
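
    Walking a chain of optional payload headers can be sketched in
    Python as shown below.  The function name parse_optional_headers is
    hypothetical.  The sketch assumes that the input starts at the
    first optional payload header, that bit 0 in the figure is the most
    significant bit of the first byte, and that the length field counts
    the whole optional header including the optype/X and length bytes.
    The last point is an assumption, because this document says only
    "length of optional header in bytes".

```python
def parse_optional_headers(data, known_optypes=frozenset({1})):
    """Split a chain of optional payload headers from the payload.

    Returns (headers, payload_offset).  'headers' lists
    (optype, body) for every optype in known_optypes; headers with an
    unknown optype are skipped using their length field.
    """
    headers = []
    offset = 0
    while True:
        if len(data) - offset < 2:
            raise ValueError("truncated optional payload header")
        first, length = data[offset], data[offset + 1]
        optype = first >> 1      # upper 7 bits of the first byte
        more = first & 0x01      # X: more extension bit
        # ASSUMPTION: 'length' covers the entire optional header,
        # including these first two bytes.
        if length < 2 or offset + length > len(data):
            raise ValueError("bad optional payload header length")
        if optype in known_optypes:
            body = bytes(data[offset + 2:offset + length])
            headers.append((optype, body))
        offset += length
        if not more:
            return headers, offset
```

    The returned offset marks where the JPEG 2000 video stream payload
    begins after the last optional payload header.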


8.1 Quantization Optional Header

    The quantization optional header is defined as one of the optional
    payload headers.  If only the QCD and/or QCC information has
    changed, this optional payload header conveys the information.  One
    optional payload header is used for the QCD information and another
    for the QCC information.  Both changes must not be conveyed in a
    single optional payload header.

    If the receiver has received the quantization optional header but
    the main header of the current frame is lost, the receiver can
    replace the QCD and QCC information in the saved main header using
    the current QCD or QCC optional header, but only if the mh_id
    values of the current frame and the previous frame differ by 1.
    The receiver should interpret this optional payload header only
    when the mh_id value changes.

    This header is intended to be used when the quantization step size
    is adjusted in order to keep the amount of compressed JPEG 2000
    image data at a constant level.

    The quantization optional header format is shown in the figure
    below.

     0               1               2               3
     0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |   optype=1  |X|     length    |Q|    cindex   |  decomp level |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     style     |                                               |
    +-+-+-+-+-+-+-+-+                                               +
    |               quantization step size value  .....             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Fig. Quantization Optional Header format

    Each field is explained below.

    optype : 7 bits

        The optype value of the quantization optional header is 1.

    Q : 1 bit

        This indicates whether the information is of QCD or of QCC.  If
        the information is of QCD, 0 is set.  If the information is of
        QCC, 1 is set.


    cindex : 7 bits

        When the information is of QCC, this represents a component
        number.

    decomp level : 8 bits

        This indicates the decomposition level of the corresponding frame.

    style : 8 bits

        This indicates the quantization style specified in the QCD and
        QCC marker segments.  (Refer to JPEG 2000 Part I: Annex A Table
        A-28.)

    quantization step size value : variable length

        This is followed by the quantization step size value specified
        by style.  (Refer to JPEG 2000 Part I: Annex A Table A-29 and
        A-30.)
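
    The field layout of the quantization optional header can be
    illustrated with the following Python sketch.  The names
    QuantizationOption and parse_quantization_option are hypothetical.
    The sketch assumes that bit 0 in the figure is the most significant
    bit of each byte and that the length field counts the whole
    optional header, which this document does not state explicitly.
    The quantization step size values are returned as raw bytes, and
    decoding them according to style is left out.

```python
from dataclasses import dataclass


@dataclass
class QuantizationOption:
    more: bool          # X: another optional payload header follows
    is_qcc: bool        # Q: False for QCD, True for QCC
    cindex: int         # component number (meaningful for QCC)
    decomp_level: int   # decomposition level
    style: int          # quantization style
    step_sizes: bytes   # raw quantization step size values


def parse_quantization_option(buf):
    """Parse one quantization optional header from the start of buf."""
    if len(buf) < 5:
        raise ValueError("truncated quantization optional header")
    if buf[0] >> 1 != 1:
        raise ValueError("optype is not 1")
    length = buf[1]
    # ASSUMPTION: 'length' covers the whole header, including the
    # five fixed bytes that precede the step size values.
    if length < 5 or length > len(buf):
        raise ValueError("bad length field")
    return QuantizationOption(
        more=bool(buf[0] & 0x01),
        is_qcc=bool(buf[2] & 0x80),   # Q is the top bit of byte 2
        cindex=buf[2] & 0x7F,         # lower 7 bits of byte 2
        decomp_level=buf[3],
        style=buf[4],
        step_sizes=bytes(buf[5:length]),
    )
```

    A receiver that implements Section 8.1 would apply the parsed
    values to its saved main header only under the mh_id condition
    given in that section.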


9. Security Considerations

    RTP packets using the payload format defined in this specification
    are subject to the security considerations discussed in the RTP
    specification [6].  This implies that confidentiality of the media
    streams is achieved by encryption. Because the data compression
    used with this payload format is applied end-to-end, encryption
    may be performed on the compressed data so there is no conflict
    between the two operations.


10. Authors' Addresses

    Eric Edwards
    Sony Corporation
    Media Processing Division
    Network & Software Technology Center of America
    3300 Zanker Road, MD: SJ2C4
    San Jose, CA 95134
    Phone: +1 408 955 6462
    Fax: +1 408 955 5724
    Email: Eric.Edwards@am.sony.com

    Satoshi Futemma
    Sony Corporation
    6-7-35 Kitashinagawa Shinagawa-ku
    Tokyo 141-0001 JAPAN
    Phone: +81 3 5448 4373
    Fax: +81 3 5448 4622
    Email: satosi-f@sm.sony.co.jp

    Eisaburo Itakura
    Sony Corporation
    6-7-35 Kitashinagawa Shinagawa-ku
    Tokyo 141-0001 JAPAN
    Phone: +81 3 5448 3096
    Fax: +81 3 5448 4622
    Email: itakura@sm.sony.co.jp

    Takahiro Fukuhara
    Sony Corporation
    1-11-1 Osaki Shinagawa-ku
    Tokyo 141-0032 JAPAN
    Phone: +81 3 5435 3665
    Fax: +81 3 5435 3891
    Email: fukuhara@av.crl.sony.co.jp


11. References

    [1] ISO/IEC JTC1/SC29/WG1: "JPEG 2000 Part I Final Draft
        International Standard", September 2000.

    [2] ISO/IEC JTC1/SC29/WG1: "Motion JPEG 2000 Committee Draft
        1.0", http://www.jpeg.org/public/cd15444-3.pdf,
        December 2000.

    [3] A. N. Skodras, C. A. Christopoulos and T. Ebrahimi: "JPEG2000:
        The Upcoming Still Image Compression Standard", In Proc. of the
        11th Portuguese Conference on Pattern Recognition, pp. 359-366,
        Porto, Portugal, May 2000.

    [4] ISO/IEC JTC1/SC29/WG1: "JPEG2000 requirements and profiles
        version 6.3", draft in progress,
        http://www.jpeg.org/public/wg1n1803.pdf, July 2000.

    [5] Diego Santa-Cruz, Touradj Ebrahimi, Joel Askelof, Mathias
        Larsson and Charilaos Christopoulos: "JPEG 2000 still image
        coding versus other standards", In Proc. of SPIE's 45th annual
        meeting, Applications of Digital Image Processing XXIII,
        vol. 4115, pp. 446-454, July 2000.

    [6] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson: "RTP:
        A Transport Protocol for Real-Time Applications", RFC 1889,
        January 1996.











