Internet Draft
   draft-rey-avt-3gpp-timed-text-02.txt                            J. Rey
                                                                Y. Matsui
                                                               Matsushita

   Expires: August 2004                                     February 2004


                  RTP Payload Format for 3GPP Timed Text

   Status of this document

   This document is an Internet-Draft and is in full conformance
   with all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
        http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
        http://www.ietf.org/shadow.html.

   Copyright Notice

      Copyright (C) The Internet Society (2003).  All Rights Reserved.


   Abstract

   This document specifies an RTP payload format for the transmission of
   3GPP (3rd Generation Partnership Project) timed text.  3GPP timed
   text is a time-lined, decorated text media format whose defined
   storage format is the ISO (International Organization for
   Standardization) Base Media File Format.  As of today, 3GP files
   containing timed text can be downloaded via HTTP and synchronised
   with audio/video contents.  There is, however, no mechanism available
   for streaming 3GPP timed text.  The following sections address the
   problems of streaming timed text and specify a payload format for
   streaming 3GPP timed text over RTP.







                   IETF draft - Expires August 2004           [Page 1]


   Internet Draft  RTP Payload Format for 3GPP Timed Text February 2004





Table of Contents

   1. Change Log......................................................2
   2. Introduction....................................................3
   3. Terminology.....................................................5
   4. RTP Payload Format for 3GPP Timed Text..........................5
   5. Resilient Transport............................................13
   6. Congestion control.............................................13
   7. SMIL usage.....................................................14
   8. MIME Type usage Registration...................................14
   9. SDP usage......................................................16
   10. Examples of RTP packet structure..............................17
   11. IANA Considerations...........................................17
   12. Security considerations.......................................17
   13. References....................................................18
   14. Annex A Basics of 3GP File Structure..........................19
   15. Author's Addresses............................................20
   16. IPR Notices...................................................21
   17. Full Copyright Statement......................................21

1. Change Log

1.1 Changes from draft-rey-rtp-avt-3gpp-tt-00.txt

   Major changes:
   - completed empty sections from -00 draft.
   - abstract and introduction re-arranged. Moved section "Basics of the
   3GP File Structure" to end of the document as Annex B.
   - SLEN, SIDX and SDUR lengths fixed to 16, 16 and 24 bits,
   respectively.
   - New OPTIONAL header, SPLDESC, added to transport sample description
   in-band.
   - Section 4 on payload format expanded: text header, fragment header
   and sample description header are fully specified.
    - SMIL usage section added.

1.2 Changes from draft-rey-rtp-avt-3gpp-tt-01.txt

   Major changes:

   - Terminology, some terms introduced to clarify text.
   - Section 4
    - rules and recommendations on fragmentation are given.
    - payload headers were classified into five types, with a common
   field section and specific fields for each type.
    - header structure similar to RFC 3640 for easy transformation.

2. Introduction

   The purpose of this draft is to provide a means to stream the 3GPP
   timed text using RTP.

   3GPP timed text is a 3GP file format for time-lined decorated text
   specified in [1].  The 3GP file format itself follows the ISO Base
   Media File Format recommendation [2].  Besides plain text, the 3GPP
   timed text format allows the display of decorated text (e.g. blinking
   text, scrolling, hyperlinks) synchronised or not with other media.

   The 3GPP timed text format was developed for 3GPP Transparent End-to-
   end Packet-switched Streaming Services (PSS) [1].  The scope of the
   3GPP PSS includes downloading and streaming of multimedia content
   over 3G packet-switched networks.  The PSS adopts multimedia codecs
   (such as MPEG-4 Visual, AMR, MPEG-4 AAC, and JPEG) and protocols like
   SMIL [9] for presentation layouts or RTP for streaming.  The current
   usage of the 3GPP timed text file format is limited to downloading
   via HTTP (with or without audio contents) due to the lack of an
   appropriate RTP payload format.

   In general, a multimedia presentation might consist of several
   audio/video/text streams or tracks (in 3GP file format jargon).
   Different tracks may have different contents and tracks of different
   media may be spatially synchronised using the information within the
   tracks or a scene description language like SMIL.  An example of this
   would be a media session with three different media tracks: 1 audio,
   1 video and 1 timed text that reproduces a music video with karaoke
   subtitles.  The information contained in each track defines the
   regions where each medium is displayed, what the media look like and
   how they are synchronised, e.g., the song lyrics are displayed below
   the video and the words are highlighted and synchronised with the
   soundtrack.

   Basically the 3GPP timed text format can be summarised as consisting
   of four differentiated functional components:

    - initial setup information for text tracks: these are the height
   and width of the text region where the text track (contents) are
   displayed, the translation offsets tx and ty relative to the video
   track region and the layer or proximity of the text to the user.  In
   the 3GPP timed text format, these pieces of information are extracted
   from Track Header Box, "tkhd".
    - general formatting information about the text track: default font,
   default background colour, default horizontal and vertical
   justification, default line width, default scrolling, etcetera.  In
   the 3GPP timed text format, these pieces of information are extracted
   from the Sample Description Box, "stsd".
    - the actual text, conveyed as plain text using either UTF-8 or UTF-
   16 encoding and,
    - the "decoration": whether it is highlighted text, blinking text,
   karaoke, hypertext, scroll delay, other text styles/formatting than
   the defaults, etcetera.  In the 3GPP timed text format, these pieces
   of information are extracted from the various Modifier Boxes: "hlit",
   "hclr", "blnk", "krok", "href", "dlay", "styl" or "tbox".

   For details refer to Annex A that summarises the basics of the 3GP
   file format and to [1], where a more detailed description of the
   setup information, format parameters and modifiers is contained.


2.1 Requirements

   In this section a set of requirements is listed.  A justification for
   each of them is also given.  An RTP Payload Format for 3GPP timed
   text SHALL:

        1.  Keep the 3GP text sample structure.  A text sample consists
   of the text length, the text string (either UTF-8 or UTF-16 encoded),
   and one or several Modifier Boxes containing the text "decoration",
   as defined in [1].  This is important to foster interoperability of
   3GP file and RTP payload formats.

        2.  Transmit the text sample size, sample duration and sample
   description index in-band.  In RTP it is important to transmit it in-
   band because this information might change from sample to sample.
   This is also important for buffering purposes as described in Section
   4.1.

        3.  Enable the transmission of the formatting information
   (contained in the Sample Description Box, "stsd") by out-of-band and
   in-band means.  In general, a single sample description may be used
   by different text samples.  Therefore, to save overhead it is
   sensible to transmit a default formatting once at the initialisation
   phase and update this on demand.  On the other hand, these pieces of
   information may become large so that out-of-band transmission might
   not be the most appropriate method.  Also, out-of-band channels might
   not always be available.  For these reasons, the payload format SHALL
   also enable the in-band transmission of sample description
   information.  This is especially useful for live streaming (where the
   contents are not known a priori) or to protect this information
   through other mechanisms like FEC [4] or retransmission [13].  RFC
   2354 [8] discusses available mechanisms for packet loss resiliency.

        4.  Enable the aggregation of text samples in one single RTP
   packet.  In a mobile communication environment a typical text sample
   size is around 100-200 bytes.  Thus, transporting several text
   samples in one RTP packet makes the transport over RTP more
   efficient.

        5.  Enable the fragmentation and reassembly of a text sample
   into several RTP packets in order to cover a wide range of
   applications and network environments.  In general, fragmentation is
   a rare event given the low bit rates and text sample sizes.  However,
   the 3GPP Timed Text media format allows for larger text samples, and
   so the payload format SHALL cover this possibility.

        6.  Enable the use of resilient transport mechanisms, such as
   repetition, retransmissions and FEC.


3. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [5].

   Furthermore, the following definitions are used in this document:

    - text sample or whole text sample: this refers to a unit of text
   data as contained in the source 3GP file.  Its equivalent in
   audio/video would be an audio/video frame (see Section 14 for
   details).  A text sample contains a text length indication, a text
   string, and zero or several modifier boxes (the decoration of the
   text).
    - fragment or text sample fragment: a fraction of a text sample as
   defined above.  A fragment may contain either text strings and
   modifiers or just one of these.
    - sample contents: general term to identify timed text data
   transported when using this payload format.  In addition to text
   samples and fragments, it may also be used to refer to the sample
   descriptions associated to the text samples.


4. RTP Payload Format for 3GPP Timed Text

   The format of an RTP packet containing 3GPP timed text is shown
   below:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X| CC    |M|    PT       |        sequence number        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                      RTP payload                              |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Marker bit (M): the marker bit MUST be set to 1 if the RTP packet
   includes the last fragment of a text sample; otherwise it is set to
   0.  For RTP packets containing several text samples, at least one of
   which is non-fragmented, the marker bit MUST be set to 1.

   Timestamp: the timestamp indicates the sampling instant of the timed
   text sample contained in the RTP packet.  The initial value is
   randomly determined.  If the RTP packet includes more than one text
   sample (i.e. text sample aggregation), the timestamp indicates the
   sampling instant of the oldest text sample in the RTP packet.  The
   samples MUST be placed in playout order, with the oldest sample
   placed first in the payload.  The timestamp of the subsequent samples
   is obtained by adding the timed text sample duration (if present) to
   the timestamp value.  For example, let sdur(0), sdur(1) and sdur(2)
   be the durations of three subsequent timed text samples included in
   an RTP packet.  Let rtpts be the timestamp as present in the RTP
   header.  The timestamp ts(i) for each sample (i=0,1,2) would be:

                ts(i) = rtpts + sum[sdur(k)], for k = 0 .. i-1;

                   ts(0)=rtpts,
                   ts(1)=rtpts +  sdur(0)
                   ts(2)=rtpts + (sdur(0)+ sdur(1))
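
   As a non-normative illustration, the calculation above can be
   sketched in Python.  The function name sample_timestamps and the
   wrap-around modulo 2^32 (RTP timestamps are 32 bits wide) are
   assumptions of this sketch and are not defined by this document.

```python
RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32-bit unsigned values


def sample_timestamps(rtpts, sdurs):
    """Return ts(i) for each aggregated sample, oldest sample first.

    rtpts -- timestamp taken from the RTP header
    sdurs -- SDUR values of the samples, in payload order
    """
    timestamps = []
    elapsed = 0  # sum of the durations of all preceding samples
    for sdur in sdurs:
        timestamps.append((rtpts + elapsed) % RTP_TS_MODULUS)
        elapsed += sdur
    return timestamps
```

   For rtpts=1000 and durations 200, 300 and 500, this yields the
   timestamps 1000, 1200 and 1500.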

   Some text samples may become large and have to be fragmented and so
   spread over several RTP packets.  In this case, the receiver needs to
   associate fragments of the same text sample.  This is done using the
   timestamp.  The order of the fragments is resolved using the fields
   available in the following headers.

   The timestamp clock rate is copied directly from the 3GP file: the
   value of "timescale" in the Media Header Box is to be used.

   Payload Type (PT): the payload type is set dynamically and sent by
   out-of-band means.

   The usage of the remaining RTP header fields follows the rules of RTP
   [3] and the profile in use.


4.1 Fragmentation of Timed Text Samples

   This section justifies why text samples MAY be fragmented and
   discusses some of the possible approaches to do it.  A solution is
   proposed together with rules and recommendations for fragmenting and
   transporting text samples using this payload format.

   3GPP Timed Text applications are expected to operate at low bit rates.
   This fact added to the small size of timed text samples (typically
   one or two hundred bytes) makes fragmentation of text samples a rare
   event.  Samples should usually fit into the MTU of the network path
   in use.

   Nevertheless, the text string (e.g. ending roll in a movie) and some
   modifier boxes, i.e. for hyperlinks ("href"), for karaoke ("krok") or
   for fonts ("styl") might become large and need fragmentation.  This
   may also apply for future modifier boxes.  While the text string is
   recommended as per [1] to take a maximum of 2048 bytes for maximum
   client interoperability, there is no recommendation on the amount of
   space occupied by modifier boxes.

   In order to transport these larger text samples using RTP, it could
   be argued that careful encoding should be used to transform the
   original large sample into smaller self-contained text samples that
   fit into the transport MTU.  This would comply with the ALF
   principle, as per RFC 2736 [14].  It would also require additional
   pre-processing prior to RTP encapsulation.  Given the low probability
   of fragmentation, it is believed that the overhead of this
   pre-processing (careful encoding) is not worthwhile.  It appears more
   appropriate to encode text samples without taking the path MTU into
   account.  In this manner, this payload format makes a trade-off by
   intentionally leaving out this pre-processing at the cost of making
   some text samples more sensitive to packet losses.

   However, a minimum set of fragmentation rules and recommendations
   SHALL be observed to guarantee a minimum resiliency and guide in the
   task of fragmentation.  Text samples and fragments thereof are
   aggregated in the RTP payload according to the rules and
   recommendations specified as follows:

    - it is RECOMMENDED that text samples be fragmented as seldom as
   possible.  For example, if a previous packet has some free space and
   a new text sample fits in one MTU, the sample SHOULD be sent whole in
   a new RTP packet instead of being split into two or more fragments.
   In order to fill up the remaining space, piggybacking of sample
   descriptions MAY be performed.

    - text strings MUST be split at character boundaries.  Otherwise, it
   is not possible to display the text of a fragment if the previous
   fragment is lost.

    - it is RECOMMENDED to include as few text sample fragments as
   possible in an RTP packet.  This reduces the effects of packet loss.
   RTP packets using this payload format MAY include zero or more whole
   text samples, zero or more text sample fragments and zero or more
   sample descriptions.

    - sample descriptions SHALL NOT be fragmented, since they contain
   important information that may affect several text samples.

    - for enhanced resiliency against packet loss it is RECOMMENDED that
   fragments containing decoration be given additional protection using
   FEC [4], retransmission [13], packet repetition or a similar
   technique.

    - when fragmenting text samples, the start of the decoration
   (modifiers) MUST be indicated.  Otherwise, if packets are lost, a
   client may be unable to identify where the modifiers start and the
   text ends.
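
   The character-boundary rule in the list above can be sketched as
   follows for UTF-8 text.  This is a non-normative illustration: the
   function name split_utf8 is hypothetical, the input is assumed to be
   valid UTF-8 with at most four bytes per character (the limit set by
   RFC 3629), and UTF-16 text would additionally need to keep surrogate
   pairs together, which is not shown.

```python
def split_utf8(text_bytes, max_bytes):
    """Split UTF-8 encoded bytes into chunks of at most max_bytes bytes,
    cutting only at character boundaries."""
    if max_bytes < 4:
        raise ValueError("max_bytes must hold at least one UTF-8 character")
    chunks = []
    start = 0
    while start < len(text_bytes):
        end = min(start + max_bytes, len(text_bytes))
        # Continuation bytes have the bit pattern 10xxxxxx.  Step back
        # until 'end' points at the first byte of a character.
        while end < len(text_bytes) and (text_bytes[end] & 0xC0) == 0x80:
            end -= 1
        if end == start:
            raise ValueError("input is not valid UTF-8")
        chunks.append(text_bytes[start:end])
        start = end
    return chunks
```

   Each returned chunk decodes on its own, so a receiver can display
   the text of one fragment even if the preceding fragment was lost.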


   Usually, RTP applications use the information on packet size from UDP
   or lower layers to find out the length of the RTP payload.  This
   means that if several text samples (or fragments) are contained in
   the payload, a length indication MUST be present for all of them
   except the last one.  Similarly, a sample or fragment transported as
   the only payload does not need a length indication.

   However, some transport schemes for RTP, e.g. RFC 3640 [15], require
   that the length of each fragment be indicated.  This payload format
   does not mandate compliance with such a requirement, but OPTIONALLY
   allows it.  Implementations subject to such a requirement MUST
   include an explicit length indication, i.e. the LEN field, by setting
   the L bit in all cases.

   Note also that the order of the text samples and fragments in the RTP
   payload is important.  As described above in the definition of the
   RTP timestamp usage, these MUST be placed chronologically in the RTP
   payload, so that the SDUR field allows calculating the timestamp of
   the samples following.  At the same time, other samples MAY follow if
   the current and following samples each contain a length indication.
   Otherwise, the sample is placed either at the end or as the only RTP
   payload.  Fragments carrying modifier box contents and sample
   descriptions MAY be placed in any order (no timing requirements) and
   MAY be present as often as needed.  Modifier box fragments SHOULD be
   placed as close as possible to the text strings to which they
   belong.


4.2 Payload Header Definitions

   An RTP packet using the payload headers defined in this document has
   the following format:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X| CC    |M|    PT       |        sequence number        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Res.|U|L|TYPE |                                               |
      +-+-+-+-+-+-+-+-+                                               |
      :        (variable payload header depending on TYPE value)      :
      :                                                               :
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      : SAMPLE CONTENTS = Text Sample(s), Fragment(s), Sample         :
      : Description(s)                                                :
      :                                                               :
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   The payload headers specified in this document consist of a set of
   common fields followed by specific fields for each header type.

   The structure of the payload headers resembles that of the 'access
   units' in RFC 3640.  This similarity is intentional, in order to ease
   the transport using MPEG4 elementary streams.  In this manner, the
   'AU header' of that document finds an equivalent in the common header
   fields for all TYPE values: R, U, L, TYPE and LEN.  The specific
   fields plus the sample contents would be similar to the 'AU data
   section'.

   The payload header has the following format:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | R   |U|L|TYPE |             LEN (if L=1)      |   specific    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   fields (variable)...|
      +-+-+-+-+-+-+-+-+-+-+-+-+

   where,

   - R (3 bits) "Reserved bits": this field MUST be set to zero.

   - U (1 bit) "UTF Transformation": indicates whether the text
   characters are encoded using UTF-8 (U=0) or UTF-16 (U=1).  It MUST
   also be present in text sample fragments if they contain text
   strings.  If the payload does not carry text strings, this bit has
   no meaning and SHALL be disregarded.

   - L (1 bit) "Length Field" flag: if set, indicates the presence of
   the OPTIONAL length field, LEN.  By default this flag is cleared (0).
   Where several 'access units' are included in an RTP packet, the
   length MUST be used to parse the payload.  In this case, it SHALL be
   present in all 'access units' but the last one.  In some cases, as in
   RFC 3640 [15], the length field is always needed, and so the L flag
   MUST always be set.

   - LEN (16 bits) "Length Field": indicates the size (in bytes) of this
   field and all the bits following, i.e. specific payload header fields
   for each TYPE (excluding initial byte) plus the sample contents.  If
   present, LEN has the following values:

    - TYPE = 1, LEN >= 6,
    - TYPE = 2, LEN > 9,
    - TYPE = 3, LEN > 3,
    - TYPE = 4, LEN > 3,
    - TYPE = 5, LEN > 3.

   For whole text samples (TYPE=1), the sample contents length
   corresponds to the entry value in the Sample Size Box, "stsz" (see
   also SLEN below).  Otherwise the sample contents length MUST be
   calculated when fragmenting the sample, in order to obtain the
   correct LEN value.

   Empty text samples do not have sample contents. For this case, TYPE 1
   with a LEN of 6 (0x0006) MUST be used.
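
   The common fields and the use of LEN to walk through several 'access
   units' can be sketched as follows.  This is a non-normative
   illustration: the function names are hypothetical, and the bit
   positions assume the usual convention that bit 0 in the diagrams is
   the most significant bit of the octet.

```python
import struct

# Smallest acceptable LEN per TYPE (">= 6" for TYPE 1, "> 9" for TYPE 2,
# "> 3" for TYPE 3, 4 and 5).  TYPE 0, 6 and 7 are reserved.
MIN_LEN = {1: 6, 2: 10, 3: 4, 4: 4, 5: 4}


def parse_common_fields(buf, offset=0):
    """Parse U, L, TYPE and the optional LEN starting at buf[offset].

    Returns (u, type_, length, next_offset); length is None when L=0.
    The reserved bits R are ignored by this sketch.
    """
    first = buf[offset]
    u = (first >> 4) & 0x1        # U: 0 = UTF-8, 1 = UTF-16
    l_flag = (first >> 3) & 0x1   # L: LEN is present when set
    type_ = first & 0x7           # TYPE: three least significant bits
    if type_ not in MIN_LEN:
        raise ValueError("reserved TYPE value %d" % type_)
    offset += 1
    length = None
    if l_flag:
        (length,) = struct.unpack_from("!H", buf, offset)
        if length < MIN_LEN[type_]:
            raise ValueError("LEN %d too small for TYPE %d" % (length, type_))
        offset += 2
    return u, type_, length, offset


def iter_access_units(payload):
    """Yield (u, type_, body) for each access unit in an RTP payload.

    body holds the TYPE-specific fields followed by the sample contents.
    """
    offset = 0
    while offset < len(payload):
        u, type_, length, body_start = parse_common_fields(payload, offset)
        if length is None:
            end = len(payload)         # no LEN: unit runs to the end
        else:
            end = offset + 1 + length  # LEN excludes the initial byte
            if end > len(payload):
                raise ValueError("LEN exceeds the payload size")
        yield u, type_, payload[body_start:end]
        offset = end
```

   A unit without LEN can only be the last one in the payload, because
   the parser has no other way to find where it ends.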

   The following payload header compositions (including the specific
   fields) result for the different values of TYPE:

    - TYPE = 0,6 and 7 are reserved.
    - TYPE = 1,

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | R   |U|L|TYPE |       LEN (if L=1, always >=6)|    SIDX       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                      SDUR                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This header type is used to transport whole text samples.  If several
   text samples are sent in an RTP packet, every sample has its own
   header.

   The LEN field MUST always be equal to (for empty text samples) or
   greater than 6 (0x0006).

   The fields above have the following meaning:

    - SIDX (8 bits) "Text Sample Entry Index": indicates the reference
   index for the text sample, which corresponds to the index field in
   the Sample to Chunk Box, "stsc", for the sample.  This field is used
   to map the corresponding sample description information.  The SIDX is
   unequivocally linked to one particular sample description.
   Therefore, sample descriptions SHALL NOT be modified during a
   streaming session.  A maximum of 126 SIDX values is allowed per text
   stream.  To allow SIDX wrap-around, clients SHALL keep as valid only
   those values of SIDX outside of the interval (X+128) modulo 255,
   where X is the last SIDX value received.  The SIDX values 0 (0x00)
   and 255 (0xff) are reserved for possible future extensions.

    - SDUR (24 bits) "Text Sample Duration": indicates the sample
   duration in timestamp units of the text sample, which corresponds to
   the entry value in the Decoding Time to Sample Box, "stts", for that
   sample.  At a clock rate of 1000 Hz, this field allows a maximum
   duration of approximately 279 minutes, i.e. about 4.66 hours (16 bits
   would allow for just 65 seconds, which might be too short for some
   streams).
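
   The figures for SDUR follow directly from the field width and the
   clock rate: 2^24 - 1 ticks at 1000 Hz are 16777.215 seconds, i.e.
   roughly 279.6 minutes, while 2^16 - 1 ticks are 65.535 seconds.  The
   helper below, whose name is hypothetical, is a minimal sketch for
   checking other clock rates.

```python
def max_duration_seconds(field_bits, clock_rate_hz):
    """Longest duration, in seconds, that an unsigned field of
    field_bits bits can express at the given clock rate."""
    return ((1 << field_bits) - 1) / clock_rate_hz


# Minutes available with the 24-bit SDUR at a 1000 Hz clock rate.
sdur_minutes = max_duration_seconds(24, 1000) / 60
# Seconds that a 16-bit field would offer at the same clock rate.
short_field_seconds = max_duration_seconds(16, 1000)
```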

   It is assumed that all text samples have a known duration at the time
   of transmission.  In some cases however, e.g. live streaming, the
   SDUR value might not be known.  To cover this exception, the value
   zero (0x000000) is reserved to signal unknown duration.  For all
   other cases SDUR MUST be different from 0x000000.
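
   A sender-side construction of the TYPE 1 header can be sketched as
   follows.  This is a non-normative illustration: the function name
   pack_type1 and its parameters are hypothetical, and the bit
   positions assume that bit 0 in the diagram is the most significant
   bit of the octet.

```python
import struct

TYPE_WHOLE_SAMPLE = 1
SDUR_UNKNOWN = 0x000000  # reserved value: duration not known


def pack_type1(sample, sidx, sdur, utf16=False, with_len=True):
    """Build a TYPE 1 access unit: header followed by one whole sample."""
    if not 1 <= sidx <= 254:
        raise ValueError("SIDX 0 and 255 are reserved")
    if not 0 <= sdur <= 0xFFFFFF:
        raise ValueError("SDUR must fit in 24 bits")
    # R = 0, then U, L and TYPE packed into the first octet.
    first = (int(utf16) << 4) | (int(with_len) << 3) | TYPE_WHOLE_SAMPLE
    header = bytes([first])
    if with_len:
        # LEN counts itself (2), SIDX (1), SDUR (3) and the contents,
        # so an empty text sample gives LEN = 6.
        length = 6 + len(sample)
        if length > 0xFFFF:
            raise ValueError("sample too large for a 16-bit LEN")
        header += struct.pack("!H", length)
    header += bytes([sidx]) + sdur.to_bytes(3, "big")
    return header + sample
```

   Passing SDUR_UNKNOWN as sdur marks a sample of unknown duration,
   e.g. in live streaming.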

   The ordering of 'access units' in the RTP payload is important.
   Logically, samples of unknown duration SHALL NOT precede any text
   samples or text sample fragments in the RTP payload.  Otherwise it
   would not be possible to find out the timestamp of these.  However,
   samples of unknown duration MAY precede sample descriptions, as they
   have no duration.

   In general, text sample contents expire when the next sample becomes
   valid.  As an exception, samples of unknown duration (SDUR=0x000000)
   are valid until new packets arrive.

   Note also, that samples of unknown duration SHALL NOT use features,
   such as scrolling or karaoke, which would need to know the duration
   of the sample up-front.

    - TYPE = 2,

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | R   |U|L|TYPE |        LEN(if L=1, always >9) |    SIDX       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               SLEN            |            SDUR               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | SDUR (cont.)  |TOTAL  |  THIS |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This header type is used to transport text sample fragments
   containing text strings.

   The LEN field (16 bits) has the same meaning as above.  If present,
   the value of LEN MUST be greater than nine (0x0009).

   The SLEN field (16 bits) indicates the size (in bytes) of the (whole)
   text sample to which this fragment belongs.  As seen above, the text
   sample length corresponds to the entry value in the Sample Size Box,
   "stsz".  Clients MAY use SLEN to reserve buffer space for the
   remaining fragments of the text sample.

   The fields TOTAL (4 bits) and THIS (4 bits) indicate the total number
   of fragments into which the original text sample has been split and
   the position of the current fragment in that sequence,
   respectively.  The usual 'byte offset' field is not used here for two
   reasons: a) it would take one more byte and b) it does not provide
   any useful information on the character offset.  UTF-8/16 text
   strings have, in general, a variable character length ranging from 1
   to 6 bytes.  Therefore, the TOTAL/THIS solution is preferred.

   The R, U, L, TYPE, SIDX, and SDUR fields have identical
   interpretation as above.  The U, SIDX and SDUR fields are useful
   since partial text strings MAY also be displayed with the
   corresponding decoration.
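
   The TYPE 2 header layout can be sketched as follows.  This is a
   non-normative illustration: the function names are hypothetical and
   the bit positions assume that bit 0 in the diagram is the most
   significant bit of the octet.  The text above does not state whether
   THIS counts from 0 or from 1, so the sketch only checks that both
   values fit in 4 bits and leaves the numbering to the caller.

```python
import struct

TYPE_TEXT_FRAGMENT = 2


def pack_type2(fragment, sidx, slen, sdur, total, this,
               utf16=False, with_len=True):
    """Build a TYPE 2 access unit carrying one text string fragment."""
    if not fragment:
        raise ValueError("a fragment must carry at least one byte")
    if not (0 <= total <= 15 and 0 <= this <= 15):
        raise ValueError("TOTAL and THIS are 4-bit fields")
    if not 0 <= sdur <= 0xFFFFFF:
        raise ValueError("SDUR must fit in 24 bits")
    first = (int(utf16) << 4) | (int(with_len) << 3) | TYPE_TEXT_FRAGMENT
    out = bytes([first])
    if with_len:
        # LEN covers itself (2), SIDX (1), SLEN (2), SDUR (3),
        # TOTAL/THIS (1) and the fragment, hence it is always > 9.
        out += struct.pack("!H", 9 + len(fragment))
    out += struct.pack("!BH", sidx, slen)   # SIDX, then SLEN
    out += sdur.to_bytes(3, "big")          # 24-bit SDUR
    out += bytes([(total << 4) | this])     # TOTAL in the upper half
    return out + fragment


def unpack_total_this(octet):
    """Split the TOTAL/THIS octet into its two 4-bit halves."""
    return octet >> 4, octet & 0x0F
```

   With 4 bits each, TOTAL and THIS cannot describe more than 15 or 16
   fragments per text sample, depending on the numbering chosen.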

   - TYPE = 3,

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | R   |U|L|TYPE |        LEN(if L=1, always >3) |TOTAL  |  THIS |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This header type is used to transport either whole modifier boxes or
   just the first fragment of these.  This depends on whether the
   modifier boxes fit into one RTP packet.

   In case fragmentation is needed, this header type identifies the
   first fragment.  As explained above, the rules for fragmentation
   require that the start of the modifier boxes be signaled.

   The R, U, L, TOTAL/THIS and LEN fields are used as above.  If present,
   the LEN field MUST be greater than three (0x0003).

   Note that the SLEN, SIDX and SDUR fields are not present.  This is
   because: a) these fragments do not contain text strings and b) these
   types of fragments are applied over text string fragments, which
   already contain this information.

   - TYPE = 4,

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | R   |U|L|TYPE |        LEN(if L=1, always >3) |TOTAL  |  THIS |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This header type is used to transport modifier fragments, other than
   the first one.

   The R, U, L, TOTAL/THIS and LEN fields are used as above.  If present,
   the LEN field MUST be greater than three (0x0003).

   Note that the SLEN, SIDX and SDUR fields are not present.  This is
   because: a) these fragments do not contain text strings and b) these
   types of fragments are applied over text string fragments, which
   already contain this information.
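
   As an illustration only, the TYPE 3 and TYPE 4 header layout drawn
   above can be unpacked as sketched below in Python.  The sketch
   assumes the L=1 layout shown in the diagrams (LEN present) and
   network byte order.  The names FragmentHeader and
   parse_fragment_header are hypothetical and are not defined by this
   payload format.

```python
import struct
from typing import NamedTuple


class FragmentHeader(NamedTuple):
    r: int       # 3-bit R field
    u: int       # 1-bit U flag
    l: int       # 1-bit L flag
    type: int    # 3-bit TYPE field (3 or 4 in this sketch)
    length: int  # 16-bit LEN field
    total: int   # 4-bit TOTAL field
    this: int    # 4-bit THIS field


def parse_fragment_header(data: bytes) -> FragmentHeader:
    """Unpack the 4-octet TYPE 3/4 header as drawn in the diagrams.

    Assumption: L=1, so the 16-bit LEN field is present.
    """
    if len(data) < 4:
        raise ValueError("need at least 4 octets")
    # One octet of flags, a 16-bit LEN, one octet of TOTAL/THIS.
    first, length, frag = struct.unpack("!BHB", data[:4])
    header = FragmentHeader(
        r=first >> 5,
        u=(first >> 4) & 0x1,
        l=(first >> 3) & 0x1,
        type=first & 0x7,
        length=length,
        total=frag >> 4,
        this=frag & 0xF,
    )
    if header.type not in (3, 4):
        raise ValueError("not a TYPE 3 or TYPE 4 header")
    if header.l != 1:
        raise ValueError("this sketch only covers the L=1 layout")
    if header.length <= 3:
        raise ValueError("LEN MUST be greater than three")
    return header
```

   Under these assumptions, the four octets 0x1B 0x00 0x20 0x21 decode
   to U=1, L=1, TYPE=3, LEN=32, TOTAL=2 and THIS=1.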


   - TYPE = 5

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | R   |U|L|TYPE |      LEN(if L=1, always >3)   |   SIDX        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   This header type is used to transport sample descriptions.  The L
   flag MUST always be set.  Consequently, the LEN field MUST be greater
   than three (0x0003).  Every sample description MUST have its own TYPE
   5 header.
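
   A minimal sketch of how a sender might build the TYPE 5 header
   follows.  It only mirrors the bit layout of the diagram above.  The
   function name build_type5_header and the default of zero for the R
   and U fields are assumptions made for the example.

```python
import struct


def build_type5_header(length: int, sidx: int, r: int = 0, u: int = 0) -> bytes:
    """Pack a TYPE 5 header: R(3) U(1) L(1) TYPE(3) LEN(16) SIDX(8).

    The L flag is always set, as required for sample descriptions.
    The defaults r=0 and u=0 are assumptions of this sketch.
    """
    if not 3 < length <= 0xFFFF:
        raise ValueError("LEN MUST be greater than three and fit in 16 bits")
    if not 0 <= sidx <= 0xFF:
        raise ValueError("SIDX must fit in 8 bits")
    if not 0 <= r <= 0x7 or u not in (0, 1):
        raise ValueError("R is 3 bits wide and U is a single bit")
    first = (r << 5) | (u << 4) | (1 << 3) | 5  # L=1, TYPE=5
    return struct.pack("!BHB", first, length, sidx)
```

   Because every sample description needs its own TYPE 5 header, a
   sender following this sketch would call the function once per
   sample description.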


5. Resilient Transport

   Apart from the basic fragmentation measures described in the section
   above, the simplest option for packet-loss-resilient transport is
   repetition: a server MAY send the same text samples (or fragments)
   again as a measure for error resilience.

   Repetition of text samples (or fragments) is only allowed if exactly
   the same RTP payloads are sent, so that a receiver can use original
   and repeated fragments together to reconstruct the text samples.  If
   a text sample was originally sent as a non-fragmented text sample, a
   repetition of that sample MUST also be sent as a single
   non-fragmented text sample.  Likewise, if the original text sample
   was fragmented and spread over several RTP packets, repeated
   fragments SHALL observe the same byte boundaries and use the same
   headers and bytes per fragment.  Finally, if several text samples
   resolve to the same timestamp, the receiver SHOULD use the one
   received in the RTP packet with the highest sequence number and
   discard the rest.

   In repeated text samples (or fragments), all RTP header fields MUST
   keep their original values except the sequence number, which MUST be
   incremented to comply with RTP.
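
   The repetition rule can be pictured with the following sketch, which
   copies an RTP packet and rewrites only its sequence number.  It
   relies on the fixed RTP header of RFC 3550 [3], where the sequence
   number occupies the third and fourth octets.  The function name
   make_repetition is hypothetical, and choosing the new sequence
   number is left to the caller.

```python
import struct


def make_repetition(packet: bytes, new_seq: int) -> bytes:
    """Return a copy of `packet` that differs only in its sequence number.

    All other RTP header fields and the payload are left untouched,
    so the receiver can combine original and repeated fragments.
    """
    if len(packet) < 12:
        raise ValueError("shorter than the fixed 12-octet RTP header")
    if not 0 <= new_seq <= 0xFFFF:
        raise ValueError("the RTP sequence number is 16 bits wide")
    # Octets 2-3 of the fixed RTP header carry the sequence number.
    return packet[:2] + struct.pack("!H", new_seq) + packet[4:]
```

   A receiver can then treat two packets whose payloads are identical
   as the same text sample (or fragment), regardless of which copy
   arrives.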


6. Congestion control

   The RTP profile under which this payload format is used defines an
   appropriate congestion control mechanism in different environments.
   Following the rules under the profile, an RTP application can
   determine its acceptable bitrate and packet rate in order to be fair
   to other TCP or RTP flows.

   If an RTP application using this payload format uses retransmission,
   the acceptable packet rate and bitrate include both the original and
   retransmitted data.  This guarantees that an application using
   retransmission achieves the same fairness as one that does not.  Such
   a rule may translate in practice into the following actions:

   If enhanced service is used, the application should ensure that the
   total bitrate and packet rate do not exceed those of the requested
   service.  It should further monitor that the requested service is
   actually delivered.  In a best-effort environment, the sender SHOULD
   NOT send retransmission packets without first ensuring that enough
   bandwidth for retransmission is available.  Other solutions, like
   reducing the packet rate and bitrate of the original stream (for
   example by encoding the data at a lower rate), MAY be used.

   Similar considerations apply if an RTP application using this
   payload format implements forward error correction (FEC) [7].  In
   this case, the sender should take care that the amount of FEC does
   not actually worsen the problem.

   Therefore, it is RECOMMENDED that applications implementing this
   payload format also implement congestion control.  The actual
   mechanism for congestion control is out of the scope of this document
   but should be suitable for real-time flows.  As an example, RFC 3448
   [11] specifies an equation-based congestion control that fulfils this
   requirement.
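
   The rule that original and retransmitted data share one rate budget
   can be pictured with the short sketch below.  It is not a congestion
   control mechanism.  It only shows the accounting, and it assumes
   that the acceptable bitrate has already been determined by the
   profile in use.  The function name and parameters are hypothetical.

```python
def fits_rate_budget(acceptable_bps: float,
                     original_bps: float,
                     retransmit_bps: float,
                     extra_bps: float) -> bool:
    """Check whether more retransmission traffic fits the budget.

    acceptable_bps: bitrate allowed by the profile's congestion control
    original_bps:   bitrate currently used by the original stream
    retransmit_bps: bitrate currently used by retransmissions
    extra_bps:      additional retransmission bitrate being considered
    """
    if min(acceptable_bps, original_bps, retransmit_bps, extra_bps) < 0:
        raise ValueError("bitrates must be non-negative")
    # Original and retransmitted data count against the same budget.
    return original_bps + retransmit_bps + extra_bps <= acceptable_bps
```

   The same accounting would apply to the packet rate.  When the check
   fails, the sender either skips the retransmission or lowers the rate
   of the original stream, as described above.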


7. SMIL usage

   The SMIL recommendation [9] specifies a means for synchronising
   different media streams.

   This payload format defines the spatial layout parameters for a timed
   text stream.  These specify the location of the text display area
   relative to the top left corner of the video display area, when a
   text stream is played with a single video stream without SMIL.  In
   cases where several media streams are to be synchronised, SMIL MAY be
   used to specify the spatial layout parameters.

   Note that even if a SMIL scene description is used, the track header
   information SHOULD still be sent, as it represents the intrinsic
   media properties.


8. MIME Type Registration

8.1 3GPP Timed Text MIME Registration

   MIME type: video

   MIME subtype: 3gpp-tt

   Required parameters:

        rate: the RTP timestamp clockrate is equal to the clockrate of
        the media.  The timestamp clockrate value is copied directly
        from the 3GP file: the value of "timescale" in the Media Header
        Box is to be used.

        brand=<brand-name>, where <brand-name> identifies a Release
        specification of 3GPP Timed Text being transmitted over RTP.  A
        brand indicates the "best use" of the contents: the brand value
        "3gp5" indicates Release 5 of 3GPP Technical Specification (TS)
        26.245 [1].

        spldesc=<flag>, where <flag> may take three different values:

        - spldesc=in, for in-band transmission of sample descriptions,
        - spldesc=out, for out-of-band transmission, and
        - spldesc=both, when both methods are allowed.

        tx3g=<base64-value-1>, <base64-value-2>,... is a list of sample
        description entries, where each <base64-value-i> is the base64
        encoding of one entry.  This parameter MAY be used to convey
        sample descriptions out-of-band.  The list of sample entries is
        not required to follow any particular order.  Each
        <base64-value-i> represents the concatenation of the SIDX and
        the sample description contents for that SIDX.  The LEN field
        is not needed.

        width=<integer-value> indicates the width in pixels of the text
        track or area where the text is actually displayed.

        height=<integer-value> indicates the height in pixels of the
        text track.

        tx=<integer-value>, indicates the horizontal translation offset
        in pixels of the text track with respect to the origin of the
        video track.

        ty=<integer-value>, indicates the vertical translation offset in
        pixels of the text track.

        layer=<integer-value>, indicates the proximity of the text track
        to the viewer.  Higher values mean closer to the viewer.  This
        parameter has no units.

   Optional parameters:

        mver=<version-value>, "Minor version" where <version-value> is a
        positive integer.  It identifies the oldest compatible version.
        How the version is defined can be found in 3GPP TS 26.245
        "Transparent end-to-end packet switched streaming service (PSS);
        Timed Text Format (Release 6)" [1].

        cbrand=<value1,value2,...>, "List of Compatible Brands" where
        each value is a brand.  This list MUST at least contain the
        <brand-name> given in "brand".
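
   As a hedged illustration of the "tx3g" parameter described above,
   the sketch below base64-encodes the concatenation of a SIDX and the
   contents of its sample description, and reverses the operation on
   the receiving side.  It assumes that the SIDX is carried as a single
   octet, as in the TYPE 5 header.  The function names and the example
   bytes are hypothetical.

```python
import base64
from typing import Dict, Iterable, Tuple


def encode_tx3g(entries: Iterable[Tuple[int, bytes]]) -> str:
    """Build a "tx3g" value from (SIDX, sample description) pairs.

    Assumption: the SIDX is prepended as one octet, no LEN field.
    """
    values = []
    for sidx, description in entries:
        if not 0 <= sidx <= 0xFF:
            raise ValueError("SIDX must fit in one octet")
        raw = bytes([sidx]) + description
        values.append(base64.b64encode(raw).decode("ascii"))
    return ",".join(values)


def decode_tx3g(value: str) -> Dict[int, bytes]:
    """Map each SIDX to its sample description contents."""
    result = {}
    for item in value.split(","):
        raw = base64.b64decode(item.strip())
        if not raw:
            raise ValueError("empty tx3g entry")
        result[raw[0]] = raw[1:]
    return result
```

   Since the list is not required to follow any particular order, the
   decoder keys the entries by SIDX rather than by position.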


   Encoding considerations: this type is only defined for transfer via
   RTP.

   Security considerations: please refer to Section 12 of RFCXXXX.

   Interoperability considerations: the 3GPP Timed Text media format is
   specified in the 3GPP Release 5 version of TS 26.245 "Transparent
   end-to-end packet switched streaming service (PSS); Timed Text
   Format (Release 6)".  In later releases, 3GPP may specify extensions
   or updates to the media format in a backwards-compatible way, e.g.
   new modifier boxes or extensions to the sample descriptions.  The
   payload format RFCXXXX allows for such extensions.  For future 3GPP
   Releases of the Timed Text Format, the parameters "brand", "mver"
   and "cbrand" are used to identify the current version of the media
   format, the oldest compatible version and a list of compatible
   versions, respectively.

   Published specification: RFC XXXX

   Applications which use this media type: multimedia streaming
   applications.

   Additional information: the 3GPP Timed Text media format is specified
   in 3GPP TS 26.245 "Transparent end-to-end packet switched streaming
   service (PSS); Timed Text Format (Release 6)".  This document and
   future extensions to the 3GPP Timed Text format are publicly
   available at http://www.3gpp.org.

   Person & email address to contact for further information:
   rey@panasonic.de
   matsui.yoshinori@jp.panasonic.com

   Intended usage: COMMON

   Author/Change controller:
   Jose Rey
   Yoshinori Matsui
   IETF AVT WG


9. SDP usage

   This document defines the MIME subtype name "3gpp-tt" and introduces
   several REQUIRED payload-format-specific parameters: "brand",
   "width", "height", "tx", "ty", "layer", "spldesc" and "tx3g" and two
   OPTIONAL parameters "mver" and "cbrand".

9.1 Mapping to SDP

   The information carried in the MIME media type specification has a
   specific mapping to fields in SDP [4], which is commonly used to
   describe RTP sessions.  When SDP is used to specify transmission
   using this payload format, the mapping is done as follows:

   -  The MIME type ("video") goes in the SDP "m=" as the media name.
   The "video" MIME Type is used as timed text is considered visual
   media.

   -  The MIME subtype ("3gpp-tt") goes in SDP "a=rtpmap" as the
   encoding name.  The timestamp clockrate value is copied directly
   from the 3GP file: the value of "timescale" in the Media Header Box
   is to be used.  Other values MAY be specified by out-of-band means.

   -  The REQUIRED payload-format-specific parameters "brand", "width",
   "height", "tx", "ty", "layer", "tx3g" and "spldesc" go in the SDP
   "a=fmtp" as a semicolon-separated list of parameter=<value> (or
   parameter=<value1,value2,value3> for "tx3g") pairs.

   -  The OPTIONAL payload-format-specific parameters "mver" and
   "cbrand" go in the SDP "a=fmtp" as a semicolon-separated list of
   parameter=<value> pairs.

   -  Any remaining parameters go in the SDP "a=fmtp" attribute by
   copying them directly from the MIME media type string as a
   semicolon-separated list of parameter=value pairs.
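
   The mapping rules above can be sketched as a small Python helper
   that emits the three SDP lines involved.  This is an illustration
   under stated assumptions, not a normative example: the function
   name, the "RTP/AVP" profile, the port, the payload type and all
   parameter values used with it are hypothetical.

```python
from typing import Dict, List, Optional

REQUIRED = ("brand", "width", "height", "tx", "ty", "layer",
            "spldesc", "tx3g")
OPTIONAL = ("mver", "cbrand")


def build_sdp_media(port: int, payload_type: int, rate: int,
                    required: Dict[str, object],
                    optional: Optional[Dict[str, object]] = None) -> List[str]:
    """Return the "m=", "a=rtpmap" and "a=fmtp" lines for a text stream."""
    missing = [name for name in REQUIRED if name not in required]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    pairs = [f"{name}={required[name]}" for name in REQUIRED]
    for name in OPTIONAL:
        if optional and name in optional:
            pairs.append(f"{name}={optional[name]}")
    return [
        # MIME type "video" is the media name (profile is an assumption).
        f"m=video {port} RTP/AVP {payload_type}",
        # MIME subtype is the encoding name, followed by the clockrate.
        f"a=rtpmap:{payload_type} 3gpp-tt/{rate}",
        # Payload-format-specific parameters, semicolon-separated.
        f"a=fmtp:{payload_type} " + ";".join(pairs),
    ]
```

   Called with port 49170, payload type 98 and a clockrate of 1000, the
   helper would produce an "a=rtpmap:98 3gpp-tt/1000" line followed by
   a single "a=fmtp:98" line carrying all parameters.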

   In the following sections some example SDP descriptions are
   presented.


10. Examples of RTP packet structure

   In this section, some examples of RTP packet structure are explained
   for better understanding of this payload format.  The wrap-around of
   the long lines is indicated by the backslash character "\".  The
   examples assume aggregate control of stream container files.  The
   session descriptions are not complete but limited to the example
   purposes.


10.1 An RTP packet containing multiple text samples
   <TODO>


11. IANA Considerations

   This document introduces the MIME subtype name "3gpp-tt" in Section
   8.


12. Security considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [3].  This implies that confidentiality of the media
   streams is achieved by encryption.

   Furthermore, the main security issues are confidentiality and
   authentication of the text itself.  The payload format itself does
   not have any support for security.  These issues have to be solved by
   a mechanism external to the payload, e.g. SRTP [10].



13. References

13.1 Normative References

   1 3GPP, "Transparent end-to-end packet switched streaming service
     (PSS); Timed Text Format (Release 6)", TS 26.245 v 0.1.6, Working
     Draft, July 2003.

   2 ISO/IEC 14496-1:2001/AMD5, "Information technology - Coding of
     audio-visual objects - Part 1: Systems, ISO Base Media File
     Format", 2003.

   3 H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson, "RTP: A
     Transport Protocol for Real-Time Applications", RFC 3550, July
     2003.

   4 M. Handley, V. Jacobson, "SDP: Session Description Protocol", RFC
     2327, April 1998.

   5 S. Bradner, "Key words for use in RFCs to indicate requirement
     levels," BCP 14, RFC 2119, IETF, March 1997.

13.2 Informative References

   6 C. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley, J.C.
     Bolot, A. Vega-Garcia, S. Fosse-Parisis, "RTP Payload for
     Redundant Audio Data", RFC 2198, September 1997.

   7 J. Rosenberg, H. Schulzrinne, "An RTP Payload Format for Generic
     Forward Error Correction", RFC 2733, December 1999.

   8 C. Perkins, O. Hodson, "Options for Repair of Streaming Media",
     RFC 2354, June 1998.

   9 W3C, "Synchronised Multimedia Integration Language (SMIL 2.0)",
     August, 2001.

   10 M. Baugher, D. A. McGrew, D. Oran, R. Blom, E. Carrara, M.
     Naslund, K. Norrman, "The Secure Real-Time Transport Protocol",
     draft-ietf-avt-srtp-05.txt, June 2002.

   11 Handley, et al., "TCP Friendly Rate Control (TFRC): Protocol
     Specification", RFC 3448, January 2003.

   12 R. Hovey, S. Bradner, "The Organizations involved in the IETF
     Standards Process", BCP 11, RFC 2028, October 1996.

   13 J. Rey et al., "RTP Retransmission Payload Format", draft-ietf-
     avt-rtp-retransmission-10.txt, work in progress, January 2004.

   14 M. Handley, C. Perkins, "Guidelines for Writers of RTP Payload
     Format Specifications", RFC 2736, December 1999.

   15 Van der Meer et al., "RTP Payload Format for Transport of MPEG-4
     Elementary Streams", RFC 3640, November 2003.


14. Annex A Basics of 3GP File Structure

   Each 3GP file consists of "Boxes".  Each box starts with a header
   which indicates both the size and the type of the box.  The 3GP file
   contains the File Type Box (ftyp), the Movie Box (moov), and the
   Media Data Box (mdat).  The Movie Box and the Media Data Box, serving
   as containers, include their own boxes for each media.  Similarly,
   each box type may include a number of boxes; see the ISO Base Media
   File Format [2] for a complete list of possibilities.
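
   The box structure can be walked with a few lines of Python, sketched
   below for illustration.  The sketch relies on the box header of the
   ISO Base Media File Format [2]: a 32-bit big-endian size followed by
   a four-character type, where a size of 1 announces a 64-bit size and
   a size of 0 means that the box extends to the end of the data.  It
   also unpacks a File Type Box payload into major brand, minor version
   and compatible brands.  The function names are hypothetical.

```python
import struct
from typing import Iterator, List, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, payload) for each box found back to back in `data`.

    Trailing data shorter than a box header is ignored in this sketch.
    """
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from("!I4s", data, offset)
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                raise ValueError("truncated 64-bit box size")
            (size,) = struct.unpack_from("!Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # The box extends to the end of the data.
            size = len(data) - offset
        if size < header_len or offset + size > len(data):
            raise ValueError("malformed box size")
        yield box_type.decode("latin-1"), data[offset + header_len:offset + size]
        offset += size


def parse_ftyp(payload: bytes) -> Tuple[str, int, List[str]]:
    """Split a File Type Box payload into its three components."""
    if len(payload) < 8 or (len(payload) - 8) % 4 != 0:
        raise ValueError("unexpected File Type Box length")
    major, minor = struct.unpack_from("!4sI", payload, 0)
    compatible = [payload[i:i + 4].decode("ascii")
                  for i in range(8, len(payload), 4)]
    return major.decode("ascii"), minor, compatible
```

   Container boxes such as the Movie Box can be descended into by
   calling iter_boxes again on the payload of the container.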

   In the following, only those boxes that are relevant to this payload
   format are described.

   The File Type Box identifies the type and properties of a 3GP file.
   The File Type Box contents comprise the major brand, the minor
   version and the compatible brands.  These are communicated via out-
   of-band means, such as SDP, when streamed with RTP.  For the 3GPP
   timed text file format, the set of compatible-brands MUST include
   "3gp5".

   The Movie Box contains one or more Track Boxes (trak) which include
   information about each track.  A Track Box contains, among others,
   the Track Header Box (tkhd), the Media Header Box (mdhd) and the
   Media Information Box (minf).  The Media Header Box contains the
   timescale or number of time units that pass in one second.  The Media
   Information Box includes the Sample Table Box (stbl) which itself
   contains the Sample Description Box (stsd), the Decoding Time to
   Sample Box (stts), the Sample Size Box (stsz) and the Sample to Chunk
   Box (stsc).  Sample descriptions for each text sample are encoded as
   "tx3g" sample entries in the Sample Description Box (stsd).

   The Track Header Box specifies the characteristics of a single track,
   where a track is, in this case, the streamed text during a session.
   Exactly one Track Header Box is needed for a track.  It contains
   information about the track, such as the spatial layout (width and
   height), the video transformation matrix and the layer number.  Since
   these pieces of information are essential and static, i.e. constant
   for the duration of the session, they MUST be sent prior to the
   transmission of any text samples.  See the ISO base media file format
   [2] for details about the definition of the conveyed information.

   When using scene description in SMIL [9], it is possible to specify
   the layer and the position of the text track.  However, in this case,
   the transmission of the Track Header Box (tkhd) is still RECOMMENDED,
   as the intrinsic track information is specified there.  When SMIL is
   not used, the Track Header Box information MUST be sent prior to the
   start of the text streaming.

   The Sample Table Box (stbl) contains all the time and data indexing
   of the media samples in a track.  Using the tables here, it is
   possible to locate samples in time, determine their type, and
   determine their size, container, and offset into that container.
   From the Sample Table Box (stbl) the following information is carried
   using this payload format: the Sample Description Box (stsd), the
   Decoding Time to Sample Box (stts), the Sample Size Box (stsz) and
   the Sample to Chunk Box (stsc).  The Decoding Time to Sample Box
   (stts) is mapped to the field SDUR (Text Sample Duration), the
   Sample Size Box (stsz) is mapped to the field SLEN (Text Sample
   Length) and the Sample to Chunk Box (stsc) is mapped to the field
   SIDX (Text Sample Entry Index).  The Sample to Chunk Box (stsc)
   associates each text sample with its corresponding sample
   description entry in the Sample Description Box (stsd, see below).
   Since the sample description may vary during the session, this
   association (SIDX) must be sent together with the text samples using
   this payload format.
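
   The table-to-field mapping can be sketched as follows.  The sketch
   assumes that the tables have already been read from the file into
   plain Python lists, using the entry layouts of the ISO Base Media
   File Format [2]: run-length (sample count, sample delta) pairs for
   stts, one size per sample for stsz, and (first chunk, samples per
   chunk, sample description index) triples for stsc.  The number of
   chunks is needed to expand the last stsc run and is taken as an
   input here.  All names are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class SampleFields(NamedTuple):
    slen: int  # from the Sample Size Box (stsz)
    sidx: int  # from the Sample to Chunk Box (stsc)
    sdur: int  # from the Decoding Time to Sample Box (stts)


def expand_stts(entries: Sequence[Tuple[int, int]]) -> List[int]:
    """Expand run-length (count, delta) pairs into per-sample durations."""
    durations: List[int] = []
    for count, delta in entries:
        durations.extend([delta] * count)
    return durations


def expand_stsc(entries: Sequence[Tuple[int, int, int]],
                chunk_count: int) -> List[int]:
    """Expand stsc runs into one description index per sample.

    `first_chunk` values are 1-based; each run lasts until the chunk
    before the next run starts, or until the last chunk.
    """
    indices: List[int] = []
    for i, (first_chunk, per_chunk, desc_index) in enumerate(entries):
        last_chunk = entries[i + 1][0] - 1 if i + 1 < len(entries) else chunk_count
        for _ in range(first_chunk, last_chunk + 1):
            indices.extend([desc_index] * per_chunk)
    return indices


def per_sample_fields(stts, stsz, stsc, chunk_count) -> List[SampleFields]:
    """Combine the three tables into (SLEN, SIDX, SDUR) per text sample."""
    sdur = expand_stts(stts)
    sidx = expand_stsc(stsc, chunk_count)
    if not (len(sdur) == len(stsz) == len(sidx)):
        raise ValueError("sample tables disagree on the number of samples")
    return [SampleFields(n, i, d) for n, i, d in zip(stsz, sidx, sdur)]
```

   Each resulting tuple corresponds to the SLEN, SIDX and SDUR values
   that accompany one text sample.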

   The Sample Description Box (stsd) provides information on the basic
   characteristics of text samples.  Each entry is a sample entry box of
   type "tx3g".  An example of the information contained in a sample
   entry could be the font size or the background colour.  Since these
   pieces of information are commonly used by many text samples during
   the session, they are sent by out-of-band means.  A complete list of
   text characteristics can be found in [1].

   Finally, the Media Data Box contains the media data itself.  In 3GPP
   timed text tracks this box contains text samples.  Its equivalents
   for audio and video are audio and video frames, respectively.  A
   text sample consists of the text length, the text string, and one or
   several Modifier Boxes.  The text length is the size of the text in
   bytes.  The text string is the plain text to render.  A Modifier Box
   carries information used to render the text, in addition to the text
   itself, such as colour, font, etc.
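
   The text sample structure described above can be split into its
   parts as sketched below.  Two details are assumptions of the sketch
   rather than statements of this annex: that the text length is a
   16-bit big-endian integer, and that the text string is UTF-8
   encoded.  The function name is hypothetical; see [1] for the
   authoritative definition.

```python
import struct
from typing import Tuple


def split_text_sample(sample: bytes) -> Tuple[str, bytes]:
    """Split a text sample into (text string, raw modifier boxes).

    Assumptions: 16-bit big-endian text length, UTF-8 text.
    """
    if len(sample) < 2:
        raise ValueError("sample too short for a text length field")
    (text_len,) = struct.unpack_from("!H", sample, 0)
    end = 2 + text_len
    if end > len(sample):
        raise ValueError("text length exceeds the sample size")
    text = sample[2:end].decode("utf-8")
    # Whatever follows the text string is treated as modifier boxes.
    return text, sample[end:]
```

   The bytes returned as modifier boxes are left unparsed here; they
   follow the same size-and-type box header as the other boxes in the
   file.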


15. Authors' Addresses

   Jose Rey                                     rey@panasonic.de
   Panasonic European Laboratories GmbH
   Monzastr. 4c
   D-63225 Langen, Germany
   Phone: +49-6103-766-134
   Fax:   +49-6103-766-166

   Yoshinori Matsui             matsui.yoshinori@jp.panasonic.com
   Matsushita Electric Industrial Co., LTD.
   1006 Kadoma
   Kadoma-shi, Osaka, Japan
   Phone: +81 6 6900 9689
   Fax: +81 6 6900 9699


16. IPR Notices

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights.  Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP 11 [12].  Copies
   of claims of rights made available for publication and any assurances
   of licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementers or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard.  Please address the information to the IETF Executive
   Director.


17. Full Copyright Statement

   "Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the  purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will
   not be revoked by the Internet Society or its successors or
   assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."

