Internet Draft
   draft-ietf-avt-rtp-3gpp-timed-text-                             J. Rey
   00.txt                                                       Y. Matsui
                                                               Matsushita

   Expires: October 5, 2004                                 April 5, 2004


                  RTP Payload Format for 3GPP Timed Text

   Status of this document

   This document is an Internet-Draft and is in full conformance
   with all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
        http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
        http://www.ietf.org/shadow.html.

   IPR Disclosure Agreement

   By submitting this Internet-Draft, I certify that any applicable
   patent or other IPR claims of which I am aware have been disclosed,
   and any of which I become aware will be disclosed, in accordance with
   RFC 3668.

   Copyright Notice

     Copyright (C) The Internet Society (2004).  All Rights Reserved.


   Abstract

   This document specifies an RTP payload format for the transmission of
   3GPP (3rd Generation Partnership Project) timed text.  3GPP timed
   text is a time-lined decorated text media format with defined storage
   in a 3GP file.  Timed Text can be synchronised with audio/video
   contents.  As of today, 3GP files containing timed text contents can
   only be downloaded via HTTP.  No mechanism is available for
   streaming 3GPP timed text contents, either out of 3GP files or
   directly from live content.  In the following sections the problems


                 IETF draft - Expires October 5, 2004         [Page 1]


   Internet Draft  RTP Payload Format for 3GPP Timed Text April 5, 2004


   of streaming timed text are addressed and a payload format for
   streaming 3GPP timed text over RTP is specified.


Table of Contents

   1. Terminology.....................................................3
   2. Introduction....................................................5
   3. RTP Payload Format for 3GPP Timed Text..........................8
   4. Resilient Transport............................................19
   5. Congestion control.............................................20
   6. Scene Description..............................................21
   7. MIME Type usage Registration...................................22
   8. SDP usage......................................................24
   9. Examples of RTP packet structure...............................26
   10. IANA Considerations...........................................27
   11. Security considerations.......................................27
   12. References....................................................27
   13. Annexes.......................................................29
   14. Acknowledgements..............................................32
   15. Author's Addresses............................................32
   16. IPR Notices...................................................32
   17. Full Copyright Statement......................................32
   18. Acknowledgement...............................................33


   [Note to the RFC Editor: please delete the Change Log section upon
   publication of this document as RFC]
   [Note to the RFC Editor: please replace "RFCXXXX" with the RFC
   designation of this document when published]

   Change Log

   Changes from draft-rey-avt-rtp-3gpp-timed-text-00

   Major changes:
   - completed empty sections from -00 draft.
   - abstract and introduction re-arranged. Moved section "Basics of the
   3GP File Structure" to end of the document as Annex B.
   - SLEN, SIDX and SDUR lengths fixed to 16, 16 and 24 bits,
   respectively.
   - New OPTIONAL header, SPLDESC, added to transport sample description
   in-band.
   - Section 4 on payload format expanded: text header, fragment header
   and sample description header are fully specified.
   - SMIL usage section added.



   Rey, et al.                                                [Page 2]


   Changes from draft-rey-avt-rtp-3gpp-timed-text-01

   Major changes:

   - Terminology, some terms introduced to clarify text.
   - Section 4
    - rules and recommendations on fragmentation are given.
    - payload headers were classified into five types, with a common
   field section and specific fields for each type.
    - header structure similar to RFC 3640 for easy transformation.


   Changes from draft-rey-avt-rtp-3gpp-timed-text-02

   Major changes:

   - IPR Disclosure Agreement added to boilerplate, IPR Notices and
   Copyright Statement modified as per BCP 78.

   - SIDX usage re-defined.

   - "spldesc" parameter semantics slightly changed.

   - LEN field made MANDATORY; TYPE 2 header rearranged accordingly to
   ease processing on 32-bit machines.

   - clarified that TYPE 5 SHOULD be implemented and that a receiver
   that does not implement it MUST at least be able to discard it.

   - some guidelines added on the clockrate for live streaming and
   within 3GP files.

   - Offer/Answer section

   - Extended glossary in the Terminology section

   - new fmtp parameter, "version", to indicate compliance to a
   particular version of 3GPP Timed Text specification.


1. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [5].

   Furthermore, the following terms are used and have specific meaning
   within the context of this document:

   text sample or whole text sample:

        this refers to a unit of timed text data as contained in the
        source 3GP file.  Its equivalent in audio/video would be a
        frame.  A text sample contains text strings followed by zero or
        more modifier boxes.


   fragment or text sample fragment:

        a fraction of a text sample.  A fragment may contain either text
        strings or modifier (decoration) contents, but not both at the
        same time.


   sample contents:

        general term to identify timed text data transported when using
        this payload format.


   text strings:

        text strings is the term used to denote the concatenation of a
        16 bit byte count value, followed by a 16 bit byte order mark
        (0xFEFF) if UTF-16 encoding is used, and the actual text
        characters encoded either as UTF-8 or UTF-16.


   decoration/modifiers:

        the terms "decoration" and "modifiers" are used interchangeably
        throughout the document to denote the contents of the text
        sample that modify the default text formatting, given by the
        corresponding sample description.  Modifiers may, for example,
        specify different font size for a particular sequence of
        characters or define karaoke timing for the sample.


   sample description:

        this term is used to denote information that applies to a text
        sample as a whole, i.e. scroll, text box position, delay,
        default font, background colour, etc.  This information may also
        apply to different text samples.


   units or access units:

        the payload headers specified in this document make it possible
        to encapsulate text samples, fragments thereof and sample
        descriptions by prepending a specific payload header, building
        what is called a unit.



   aggregation / aggregate packet:

        an aggregate RTP packet consists of several units.


   track / stream:

        3GP files contain audio/video and text tracks.  This document
        makes it possible to 'stream' these tracks using RTP.
        Therefore, both terms are used interchangeably in this document
        in the context of 3GP files.


   Media Header Box / Track Header Box:

        the 3GP file format makes use of structures such as these boxes
        defined in the ISO Base File Format [2].  When referring to
        these in this document, initials are capitalised for clarity.


2. Introduction

   3GPP timed text is a media format for time-lined decorated text
   specified in [1].  3GPP Timed text contents may be stored in 3GP
   files or may be generated in real time.  The 3GP file format itself
   is based on the ISO Base Media File Format recommendation [2].
   Section 13.2 gives some insight into the 3GP file structure.

   The purpose of this draft is to provide a means to stream 3GPP timed
   text contents using RTP.  This includes the streaming of timed text
   being read out of a 3GP file as well as the streaming of timed text
   generated in real time, a.k.a. live streaming.

2.1 General Overview of the 3GPP Timed Text format

   The 3GPP timed text format was developed for use in the services
   specified in the 3GPP Transparent End-to-end Packet-switched
   Streaming Services (3GPP PSS, for short) [18].  Besides plain text,
   the 3GPP timed text format allows the display of decorated text (e.g.
   karaoke, scrolling, hyperlinks) synchronised or not with other media,
   like audio or video.

   The scope of the 3GPP PSS includes both downloading and streaming of
   multimedia content over 3G packet-switched networks.  However, due to
   the lack of an appropriate RTP payload format, the current usage of
   the 3GPP timed text file format is limited to downloading via HTTP.

   The 3GPP PSS adopts multimedia codecs (such as MPEG-4 Visual, AMR,
   MPEG-4 AAC, and JPEG) and protocols like SMIL [9] for presentation
   layouts or RTP [3] for streaming.  In general, a multimedia
   presentation might consist of several audio/video/text streams (or
   tracks in ISO file format jargon).  Different streams may have
   different contents.  The media may be spatially synchronised either
   using the information within the streams or a scene description
   language like SMIL.  An example of this would be a media session with
   three different media streams (one audio, one video and one timed
   text) that reproduces a music video with karaoke subtitles.  For
   each stream some information is needed that defines the region
   where each medium is displayed, what the media looks like, how it is
   synchronised, etc.  In karaoke, for example, the song lyrics are
   displayed below the music video and the words are highlighted in
   synchronisation with the music track.

   For the purpose of streaming 3GPP timed text, four distinct
   functional components can be identified:

        - initial spatial layout information related to the text track:
        these are the height and width of the text region where text is
        displayed, the position of the text region in the display and
        the layer or proximity of the text to the user.  These pieces of
        information are contained in the Track Header Box.  Sections 6.1
        and 13 provide further details.

        - default settings for formatting and positioning the text:
        default style (font, size, colour,...), default background
        colour, default horizontal and vertical justification, default
        line width, default scrolling, etcetera.  Sample descriptions
        contain such default settings.

        - the actual text: characters encoded using either UTF-8 or
        UTF-16, and

        - the decoration inside the modifier boxes.  Any deviation from
        the default settings (e.g. a different style, a delay or
        blinking for some characters) is indicated by appending
        modifier boxes to the text strings.  Modifier boxes are only
        present in the text samples if they are needed.  Otherwise, the
        default settings in the corresponding sample description
        apply.  At the time of writing this payload format, the
        following decorations or modifications are specified in the
        3GPP timed text media format [1]:

          - text highlight,
          - highlight colour,
          - blinking text,
          - karaoke feature,
          - hyperlink,
          - text delay,
          - text style,
          - positioning of the text box and
          - text wrap indication.

   Section 13.3 specifies how certain values in the 3GP file are mapped
   to the corresponding fields of this payload format.  For live
   streaming appropriate values using the same formats and units shall
   be used, as specified in each case.

   For further details on the Timed Text media format, refer to [1],
   where a more detailed description of the above functional components
   is given.


2.2 Requirements for a timed text payload format

   In this section a set of requirements is listed.  A justification for
   each of them is also given.  An RTP Payload Format for 3GPP timed
   text SHALL:

        1.  Keep the 3GP text sample structure.  A text sample consists
   of text strings and zero or more modifier boxes.  This is important
   to foster interoperability of 3GP file and RTP payload formats.  This
   also maximizes the re-use of existing mechanisms.

        2.  Transmit the text sample size, sample duration and sample
   description index in-band.  In RTP it is important to transmit these
   values in-band because they might change from sample to sample.
   This is also important for buffering purposes as described in Section
   3.1.1.

        3.  Enable the transmission of the sample descriptions both by
   out-of-band and in-band means.  In general, a single sample
   description may be used by different text samples.  Therefore, to
   save overhead it is sensible to transmit a default formatting once at
   the initialisation phase and update this on demand.  These pieces of
   information may become large so that out-of-band transmission might
   not be the most appropriate method.  Also, out-of-band channels might
   not always be available.  For these reasons, the payload format SHALL
   also enable the in-band transmission of sample description
   information.  This is especially useful for live streaming (where
   contents are not known a priori).

        4.  Enable the aggregation of units into an RTP packet.  In a
   mobile communication environment a typical text sample size is around
   100-200 bytes.  Thus, transporting several units in one RTP packet
   makes the transport more efficient.

        5.  Enable the fragmentation and reassembly of a text sample
   into several RTP packets in order to cover a wide range of
   applications and network environments.  In general, fragmentation is
   a rare event given the low bit rates and text sample sizes.  However,
   the 3GPP Timed Text media format allows for larger text samples, so
   the payload format SHALL cover this possibility.

        6.  Enable the use of resilient transport mechanisms, such as
   repetition, retransmissions and FEC.  Additional mechanisms like FEC
   [7] or retransmission [13] can be used to protect the information.
   RFC 2354 [8] discusses available mechanisms for packet loss
   resiliency.


3. RTP Payload Format for 3GPP Timed Text

   The format of an RTP packet containing 3GPP timed text is shown
   below:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X| CC    |M|    PT       |        sequence number        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                      RTP payload                              |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Marker bit (M): the marker bit MUST be set to 1 if the RTP packet
   includes one or more whole text samples or the last fragment of a
   text sample; otherwise, it MUST be set to 0.

   Timestamp: the timestamp MUST indicate the sampling instant of the
   earliest (or unique) text sample contained in the RTP packet.  The
   initial value MUST be randomly determined.  Text samples MUST be
   placed in play-out order, i.e. earliest first in the payload.  The
   timestamp of each subsequent sample (or fragment thereof) MUST be
   obtained by adding the durations of all preceding samples in the
   packet to the RTP timestamp value.

        For example, let sdur(0), sdur(1) and sdur(2) be the durations
        of three consecutive timed text samples included in an RTP
        packet.  Let rtpts be the timestamp as present in the RTP
        header.  The timestamp ts(i) for each sample (i=0,1,2) would be:

                ts(i) = rtpts + sum[sdur(k)], k = 0..i-1;

                   ts(0) = rtpts
                   ts(1) = rtpts + sdur(0)
                   ts(2) = rtpts + (sdur(0) + sdur(1))
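
   The timestamp rule above can be sketched in a few lines of Python.
   This is only an illustration under stated assumptions: the function
   name and the list-based interface are hypothetical and not part of
   this payload format.  The modulo operation reflects the 32-bit width
   of the RTP timestamp field.

```python
def sample_timestamps(rtpts: int, durations: list[int]) -> list[int]:
    """Return ts(i) = rtpts + sdur(0) + ... + sdur(i-1) for every sample.

    rtpts     -- timestamp taken from the RTP header
    durations -- sdur(i) of the samples, in play-out order
    (Hypothetical helper for illustration only.)
    """
    timestamps = []
    ts = rtpts
    for sdur in durations:
        # RTP timestamps are 32-bit values, so wrap around at 2**32.
        timestamps.append(ts % 2**32)
        ts += sdur
    return timestamps


# Three samples with durations 500, 250 and 750 ticks.
print(sample_timestamps(1000, [500, 250, 750]))  # [1000, 1500, 1750]
```

   Note that the duration of the last sample does not influence any
   timestamp within the same packet.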

   Some text samples may become large and have to be fragmented and so
   spread over several RTP packets.  In this case, the receiver needs to
   associate fragments of the same text sample.  This is done using the
   timestamp.  The order of the fragments is resolved using the fields
   available in the payload headers defined in this document.

   Unlike in other media such as audio or video, the timestamp
   clockrate does not match the sampling rate.

   If timed text is streamed from a 3GP file, the timestamp clockrate
   MUST be copied directly from the value of "timescale" in the Media
   Header Box for that text track.  Note that each track in a 3GP file
   MAY have its own clockrate specified in the Media Header Box.

   For live streaming an appropriate timestamp clockrate shall be used.
   A default value of 1000 Hz is RECOMMENDED. This value should provide
   enough timing resolution for synchronizing text with other media and
   expressing the duration of text samples.  Other clockrates MAY be
   used.

   Timed text does not mandate any sampling rate, but the real-time
   encoder SHALL choose an appropriate sampling rate such that the
   text samples meet the scenario constraints.  E.g. samples may be
   tailored to match the packet MTU as closely as possible or to provide
   a given redundancy for the available bit rate.  The encoding
   application MUST also take into account the delay constraints of the
   real-time session and assess whether FEC, retransmission or other
   similar techniques are reasonable options for repair.

   The following example illustrates how a real-time encoder may
   choose its settings:

        Imagine a news speaker scenario where the news is transcribed
        in synchronisation with the image of the reporter and the
        headline images in the background.  Assume that a person can
        read an average of 4-6 words per second, at an average word
        length of 5 characters plus one space per word, that an IP MTU
        of 576 bytes is available, that characters are encoded using 2
        bytes each, that no modifiers are used and that a rate of
        ~576*8 bits per second = 4.6 kbps is available.  A text sample
        covering 60 seconds of text would then theoretically be
        optimum: IP/UDP/RTP+(text sample)=20+8+18 (12+6, TYPE 1 header)
        + ~250*2= ~546 bytes<576 bytes.  However, a delay of sixty
        seconds might be too much, and just one packet per sample too
        little redundancy.  For real-time communications the allowed
        delay is typically a few seconds (e.g. 3s).  Thus, the encoder
        could sample text every 1s (yielding RTP payloads of ~14-18
        bytes), encapsulate the current and last two samples in every
        RTP packet (amounting to an IP packet size of 98 bytes) and
        send the packet six times, thus exhausting the available bit
        rate and increasing packet loss resilience.  This example
        illustrates how the encoding application shall adapt to the
        scenario constraints.
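
   The packet-size and bit-rate arithmetic of this example can be
   re-traced with the short Python sketch below.  All figures are
   taken from the example itself; the constant and function names are
   hypothetical and chosen for illustration only.

```python
# Header sizes in bytes, as used in the example above.
IP_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12
TYPE1_HEADER = 6      # TYPE 1 payload header size quoted in the example
BYTES_PER_CHAR = 2    # characters are assumed to take 2 bytes each
IP_MTU = 576


def ip_packet_size(num_chars: int) -> int:
    """Size of an IP packet carrying one whole text sample (no modifiers)."""
    headers = IP_HEADER + UDP_HEADER + RTP_HEADER + TYPE1_HEADER
    return headers + num_chars * BYTES_PER_CHAR


# One sample of ~250 characters covering 60 seconds of text.
single_sample_packet = ip_packet_size(250)

# Available rate: one MTU-sized packet per second.
available_bps = IP_MTU * 8

# Alternative: a 98-byte IP packet sent six times per second.
repeated_bps = 98 * 8 * 6

print(single_sample_packet, single_sample_packet < IP_MTU)  # 546 True
print(available_bps, repeated_bps)                          # 4608 4704
```

   The repeated-packet variant ends up at roughly the available rate,
   which is what the example means by exhausting the bit rate.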

   Payload Type (PT): the payload type is set dynamically and sent by
   out-of-band means.

   The usage of the remaining RTP header fields follows the rules of RTP
   [3] and the profile in use.




3.1 General Remarks

   Before going into the details of the payload headers, some general
   observations are made in this section.  These should help the reader
   in understanding the design decisions.

3.1.1 Character Counting

   Timed text samples may have to be fragmented.  Although some of these
   fragments might be affected by packet losses, this does not prevent
   the remaining text fragments (or parts of them) from being displayed
   (this is explained further below).

   This payload format does not enable a receiver to find out the exact
   number of text characters lost.  The reason for this is that UTF-8/16
   encodings yield a variable number of bytes per character, and so the
   fragment size does not help in finding the number of lost characters.
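
   The variable number of bytes per character can be seen in the
   following Python sketch.  It is an illustration only; the helper
   name is hypothetical and the sample characters are arbitrary.

```python
def encoded_sizes(text: str) -> list[tuple[str, int, int]]:
    """Return (character, UTF-8 byte count, UTF-16 byte count) per character."""
    return [
        (ch, len(ch.encode("utf-8")), len(ch.encode("utf-16-be")))
        for ch in text
    ]


# 'a', e-acute, the euro sign and an emoji outside the Basic
# Multilingual Plane.
for ch, utf8_len, utf16_len in encoded_sizes("a\u00e9\u20ac\U0001F600"):
    print(f"U+{ord(ch):04X}: {utf8_len} bytes in UTF-8, "
          f"{utf16_len} bytes in UTF-16")
```

   A lost fragment of, say, 12 bytes could therefore have carried
   anything from 3 to 12 characters in UTF-8, which is why the
   fragment size alone does not reveal the number of lost characters.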

3.1.2 Fragmentation of Timed Text Samples

   This section justifies why text samples may have to be fragmented and
   discusses some of the possible approaches to do it.  A solution is
   proposed together with rules and recommendations for fragmenting and
   transporting text samples using this payload format.

   3GPP Timed Text applications are expected to operate at low bit
   rates.  This, together with the small size of timed text samples
   (typically one or two hundred bytes), makes fragmentation of text
   samples a rare event.  Samples should usually fit into the MTU of
   the network path in use.

   Nevertheless, some text strings (e.g. ending roll in a movie) and
   some modifier boxes, namely those for hyperlinks, karaoke or styles,
   might become large and need fragmentation.  This may also apply to
   future modifier boxes.  While [1] recommends a maximum of 2048 bytes
   for the text string for maximum client interoperability, there is no
   recommendation on the amount of space occupied by modifier boxes.

   In order to transport these larger text samples using RTP, it could
   be argued that a careful encoding be used to transform the original
   large sample into smaller self-contained text samples that fit into
   the given transport MTU.  This would comply with the ALF principle,
   as per RFC 2736 [14].  However, it would also require additional
   pre-processing prior to RTP encapsulation and servers that
   understand the modifier formats.  Given the low probability of
   fragmentation, the overhead of this pre-processing is not
   considered worthwhile, and it is more appropriate to encode text
   samples without taking the path MTU into account.  In this manner,
   this payload format makes a trade-off by intentionally leaving out
   this pre-processing and making the fragmented samples less robust to
   packet losses.


   Nonetheless, a minimum set of fragmentation rules and recommendations
   SHALL be observed to guarantee a minimum level of resiliency and to
   guide the task of fragmentation:

   . whenever possible, whole text samples SHOULD be aggregated into
     RTP packets, using the payload headers defined in this document.
     This increases transport efficiency.

   . since fragmentation cannot be avoided in all cases, it is
     RECOMMENDED that text samples are fragmented as seldom as possible.
     As an example, if a packet has some free space, which would fit
     only a small part of the next text sample, a new RTP packet SHOULD
     be sent, instead of sending two or more fragments out of the
     sample.  This reduces complexity by minimising the number of
     fragments.

   . in order to fill up the remaining space in a packet, piggybacking
     of sample descriptions MAY be performed.  Fragments of past
     samples MAY also be piggybacked.  For this purpose the server MAY
     reserve a certain amount of buffer to store sent units for
     piggybacking.  Details are given in Section 3.2.

   . text strings MUST be split at character boundaries.  Otherwise,
     it is not possible to display the text of a fragment if the
     previous fragment was lost.

   . sample descriptions SHALL NOT be fragmented, since they contain
     important information that may affect several text samples.

   . unlike text strings, the modifier boxes are NOT REQUIRED to be
     split at meaningful boundaries, nor is it possible to apply
     partial modifier contents to the text strings.  Note that enabling
     this would require that: a) senders understand the semantics of
     the modifier boxes and b) specific fragment headers for each of
     the modifier boxes are defined.  This is not considered
     worthwhile given the low probability of fragmentation.

   . as a consequence of the above, the modifier fragments are only
     useful if all of them are received.  Therefore, for enhanced
     resiliency against packet loss it is RECOMMENDED that fragments
     containing decoration be especially protected using FEC [7],
     retransmission [13], packet repetition or an equivalent technique.
     Similarly, these techniques MAY also be applied to text strings
     and sample descriptions.

   . furthermore, when fragmenting samples containing modifiers, the
     start of the modifiers MUST be indicated using the payload header
     defined for that purpose, i.e. a new TYPE 3 unit MUST be defined
     (see below).  Otherwise, if packets are lost, a client may be
     unable to identify where the modifiers start and the text ends.
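
   The rule that text strings are split at character boundaries can be
   sketched as follows for UTF-8 encoded text.  This is a minimal
   illustration, not a complete fragmenter: the function name and the
   byte budget parameter are hypothetical, and the byte count field,
   the payload headers and UTF-16 text are not handled.

```python
def split_at_char_boundaries(text: str, max_bytes: int) -> list[bytes]:
    """Split UTF-8 encoded text into fragments of at most max_bytes bytes
    without cutting through a multi-byte character."""
    if max_bytes < 4:
        # A UTF-8 character may take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    data = text.encode("utf-8")
    fragments = []
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # UTF-8 continuation bytes look like 10xxxxxx.  If the cut would
        # land on one, move it back to the start of that character.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        fragments.append(data[start:end])
        start = end
    return fragments


# The euro sign takes 3 bytes and does not fit behind "ab" in 4 bytes.
fragments = split_at_char_boundaries("ab\u20ac", 4)
print([len(f) for f in fragments])  # [2, 3]
```

   Because every fragment ends at a character boundary, each one can
   be decoded and displayed on its own even if a neighbouring fragment
   is lost.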




3.1.3 On the length indication in the units

   Usually, RTP applications use the information on packet size from UDP
   or lower layers to find out the length of the RTP payload.  This
   payload format does not use this information but includes an explicit
   length indication for each unit in the payload.

   While this is technically not needed for every unit (those placed
   last in the payload could leave it out), the added overhead is
   considered minimal and the overall complexity remains low.

   At the same time, this design choice allows easy interoperability
   with the RTP Payload Format for Transport of MPEG-4 Elementary
   Streams, RFC 3640 [15], which does require an explicit length
   indication for each unit (see AU-header in [15]).

3.1.4 On the ordering and interleaving of units in aggregate payloads

   As stated in the timestamp definition, the order of the units in an
   aggregate payload is important.  In general, older units MUST precede
   newer ones.

   However, not all units are provided with timing attributes: units
   containing sample descriptions (TYPE 5) or modifier fragments (TYPE
   3 and 4) lack these.  Therefore, the following relaxed ordering
   constraints apply:

        . units containing sample descriptions MAY be placed in any
           order (no timing requirements) and MAY be present as often as
           needed, e.g. piggybacked.

        . units containing modifier box(es) or fragments thereof SHOULD
           be transmitted in the same order as they appear in the sample
           and be placed as near as possible to the text to which they
           apply.  Logically, this does not apply for retransmitted or
           redundant packets or for units that are piggybacked into
           other packets.

   The latter recommendation aims at avoiding (or minimising) the
   spreading of fragments belonging to a text sample over several RTP
   packets, a.k.a. interleaving.  Interleaving of units SHOULD NOT be
   used with this payload format due to the variable packet size of the
   timed text samples, which would yield unpredictable latencies.
   Not using interleaving, however, decreases the robustness against
   packet losses.

   In this manner, both units with and without duration MAY be part of
   the aggregate payload, whereas units without timing attributes SHALL
   NOT be used to resolve the timestamp of subsequent units.  For this
   purpose, they SHALL be ignored, i.e. by jumping to the next unit with
   duration or until the end of the packet is reached.  The receiver
   SHALL use the newest timestamp value calculated in the current
   aggregate packet, as per the timestamp definition above.
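
   A receiver-side sketch of this rule is given below in Python.  It
   is an illustration under stated assumptions only: the Unit class
   and its field names are hypothetical, every timed unit is assumed
   to carry a known duration, and units with unknown duration are not
   handled.

```python
from dataclasses import dataclass
from typing import Optional

# TYPE values of units without timing attributes (see above).
UNTIMED_TYPES = {3, 4, 5}


@dataclass
class Unit:
    unit_type: int                  # TYPE value of the unit
    duration: Optional[int] = None  # sample duration; None for untimed units


def resolve_timestamps(rtpts: int, units: list[Unit]) -> list[Optional[int]]:
    """Return the timestamp of each timed unit in an aggregate payload.

    Untimed units get None and do not advance the running timestamp.
    """
    result: list[Optional[int]] = []
    ts = rtpts
    for unit in units:
        if unit.unit_type in UNTIMED_TYPES:
            result.append(None)  # ignored when resolving timestamps
            continue
        if unit.duration is None:
            raise ValueError("timed unit without duration is not supported")
        result.append(ts)
        ts += unit.duration
    return result


payload = [Unit(1, 500), Unit(5), Unit(1, 250), Unit(1, 100)]
print(resolve_timestamps(1000, payload))  # [1000, None, 1500, 1750]
```

   The TYPE 5 unit in the middle of the payload leaves the timestamps
   of the following units unaffected.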


   On the other hand, units with unknown duration do have some ordering
   constraints: they MAY only precede units that do not have a duration
   value (TYPE 3, 4 & 5 below).  Otherwise, it would not be clear when
   the following units should be displayed due to the unknown duration.

3.1.5 Live streaming vs. Streaming from a 3GP file

   This section clarifies the differences between streaming live
   content and streaming from a file.

   The term live streaming refers to those cases where the sender
   creates the media contents without necessarily storing them in a 3GP
   file.  Usually, the generated contents are stored for a limited
   amount of time in a buffer.  This buffer is used to cancel the
   network delay and delay jitter.

   In this document, the values for the header fields are exemplified
   with those stored in the 3GP file.  In particular, section 13.3
   clarifies how the 3GP file parameters are mapped to the fields of the
   payload header defined in this document.  For live streaming,
   appropriate values complying with the format and units described in
   [1] shall be used.  Where needed, clarifications on appropriate
   values are given in this document.

3.2 Payload Header Definitions

   An RTP packet using the payload headers defined in this document has
   the following format:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X| CC    |M|    PT       |        sequence number        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |U|   R   | TYPE|                                               :
      +-+-+-+-+-+-+-+-+                                               :
      :        (variable payload header depending on TYPE value)      :
      :                                                               :
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      :                     SAMPLE CONTENTS                           :
      :                                                               :
      :                                                               :
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                        Figure 1 RTP Packet Format.

   The payload headers specified in this document consist of a set of
   common fields followed by specific fields for each header type and
   the sample contents. See Figure 1 and Figure 3.

   In this manner, the structure of the payload headers resembles that
   of the 'access units' (AU) in RFC 3640.  This similarity is
   intentional to improve interoperability.  The 'AU header' of that
   document finds an equivalent in the common header fields for all TYPE
   values: U, R, TYPE and LEN.  Similarly, the specific fields plus the
   sample contents are equivalent to the 'AU data section' in [15].
   We refer to these as unit header and unit payload.

   An aggregate RTP packet containing two text samples and a text sample
   fragment would schematically look like this:

                                        +----------------------+
                                        |                      |
                                        |   RTP Header         |
                                        |                      |
                                       _.----------------------+
                                  ..-'' |                      |
                            _..-''      |    Payload Header 1  |
                                        ........................
                        UNIT 1          |                      |
                                        |    Text Sample 1     |
                            `-...._     |                      |
                                  ``-.  ........................
                                _,,..-- |                      |
                            --''        |    Payload Header 2  |
                                        ........................
                        UNIT 2          |                      |
                                        |    Text Sample 2     |
                            ._          |                      |
                              `--._     |                      |
                                   `--. ........................
                                   ,-'  |                      |
                               _.-'     |    Payload Header 3  |
                            ,-'         ........................
                        UNIT 3          |                      |
                                        | Text Sample Fragment |
                            `-.._       |                      |
                                 `-.._  |                      |
                                      `-+----------------------+
                        Figure 2 Example RTP packet.

3.2.1 Unit Header Format

   The unit header has the following format:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |U|   R   |TYPE |             LEN               |   specific    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   fields (variable)   |
      +-+-+-+-+-+-+-+-+-+-+-+-+
                        Figure 3 Unit Header Format.


   where:

   . U (1 bit) "UTF Transformation flag": indicates whether the text
     characters are encoded using UTF-8 (U=0) or UTF-16 (U=1).  This
     tells RTP receivers which encoding was used for the text string
     and so enables them to display text string fragments.  The U bit
     is only meaningful in a TYPE 2 header; in all other headers it
     MUST be set to zero by the sender and ignored by the receiver.
     This is because complete text samples already contain an implicit
     indication of the encoding (byte order mark) in the text string
     itself (unit payload), which is understood by the decoding
     application.

   . R (4 bits) "Reserved bits": for future extensions.  This field
     MUST be set to zero (0x0).

   . TYPE (3 bits) "Type Field": this field specifies which specific
     fields follow.  The following TYPE values are defined:

        - TYPE 1, for whole text samples.
        - TYPE 2, for text string fragments.
        - TYPE 3, for whole modifier boxes or first modifier fragments.
        - TYPE 4, for modifier fragments other than the first.
        - TYPE 5, for sample descriptions.  One header per sample
          description.
        - TYPE 0, 6 and 7 are reserved.

        Two TYPEs (1 & 2) are defined for units containing text strings,
        another two (3 & 4) for units not containing text strings and
        hence not carrying timing attributes, and a final TYPE 5 for
        sample descriptions, which also lack timing attributes.
        Detailed descriptions are given in the subsections below.

   . LEN (16 bits) "Length Field": indicates the size (in bytes) of
     this field and all the bits following, i.e. the specific payload
     header fields for each TYPE (excluding the initial byte) plus the
     sample contents.  For stored content and TYPE=1 units, the sample
     contents length is the same as the SLEN (see SLEN below).
     Otherwise the length value of the current fragment MUST be
     calculated during fragmentation or sampling.

     LEN has the following values:

      - TYPE = 1, LEN >= 6,
      - TYPE = 2, LEN > 9,
      - TYPE = 3, LEN > 3,
      - TYPE = 4, LEN > 3 and,
      - TYPE = 5, LEN > 3.

     In the next subsection the different payload headers for the
     values of TYPE are specified.
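
   As an illustration of the common fields described above, the
   following minimal sketch walks over the units of an RTP payload.  It
   assumes that U occupies the most significant bit of the initial
   byte, as drawn in Figure 3, and that a unit therefore occupies
   1 + LEN bytes.  The names parse_common_header, iter_units and
   MIN_LEN are hypothetical and not part of this specification.

```python
import struct

# Minimum LEN per TYPE, taken from the list above.  TYPE 1 may carry an
# empty sample (LEN == 6); every other TYPE needs at least one byte of
# contents after its specific header fields.
MIN_LEN = {1: 6, 2: 10, 3: 4, 4: 4, 5: 4}


def parse_common_header(buf, offset=0):
    """Return (u, r, unit_type, length) for the unit starting at offset."""
    first, length = struct.unpack_from("!BH", buf, offset)
    u = first >> 7             # bit 0: UTF transformation flag
    r = (first >> 3) & 0x0F    # bits 1-4: reserved
    unit_type = first & 0x07   # bits 5-7: TYPE
    return u, r, unit_type, length


def iter_units(payload):
    """Yield (unit_type, unit_bytes) for every unit in an RTP payload."""
    offset = 0
    while offset < len(payload):
        if len(payload) - offset < 3:
            raise ValueError("truncated unit header")
        _, _, unit_type, length = parse_common_header(payload, offset)
        if unit_type not in MIN_LEN:
            raise ValueError("reserved TYPE value %d" % unit_type)
        if length < MIN_LEN[unit_type]:
            raise ValueError("LEN too small for TYPE %d" % unit_type)
        # LEN excludes the initial byte, so a unit occupies 1 + LEN bytes.
        end = offset + 1 + length
        if end > len(payload):
            raise ValueError("unit runs past the end of the payload")
        yield unit_type, payload[offset:end]
        offset = end
```

   The sketch does not reject units whose R bits are non-zero, since
   those bits are reserved for future extensions; whether a receiver
   should do so is left open here.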


3.2.2 TYPE 1 Header

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |U|   R   |TYPE |       LEN  (always >=6)       |    SIDX       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                      SDUR                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This header type is used to transport whole text samples.  If several
   text samples are sent in an RTP packet, every sample has its own
   header.  See Figure 2.

   Empty text samples are considered whole text samples, although they
   do not contain sample contents.  In this case, TYPE 1 units MUST NOT
   have contents.  This means that the LEN field MUST have a value of 6
   (0x0006) to include just the common and specific header fields.
   Otherwise, the LEN field MUST always be greater than 6 (0x0006).

   The fields above have the following meaning:

   . SIDX (8 bits) "Text Sample Entry Index": this is an index used to
     identify the sample descriptions.  SIDX values 128 and 255 are
     reserved for future use.

     The SIDX field is used to find the sample description
     corresponding to the unit's payload.  There are two types of SIDX
     values: static and dynamic.

     Static SIDX values are used to identify sample descriptions that
     MUST be sent out-of-band and MUST remain active during the whole
     session.  The transport of sample descriptions out-of-band is a
     MANDATORY feature.  A static SIDX value is unequivocally linked to
     one particular sample description during the whole session.
     Carrying many sample descriptions out-of-band SHOULD be avoided,
     since these may become large and, ultimately, transport is not the
     goal of the out-of-band channel.  Thus, this feature MUST be
     limited to those sample descriptions that provide a set of minimum
     default format settings.  Static SIDX values MUST fall in the
     interval [129,254].  The first SIDX value assigned to any static
     sample description MUST be 129.

     Dynamic SIDX values are used for sample descriptions sent in-band.
     Sample descriptions MAY be sent in-band either because they are
     generated in real time or for transport resiliency.  A dynamic
     SIDX value is unequivocally linked to one particular sample
     description during the period in which this is active in the
     session and it SHALL NOT be modified during that period.  This
     period MAY be shorter than or equal to the session duration.  A
     maximum of 64 active dynamic SIDX values is allowed at any moment.
     Dynamic SIDX values MUST fall in the interval [0,127].  This should
     be enough for both recorded content and live streaming
     applications.  Nevertheless, a wraparound mechanism is provided in
     Section 13 to handle sessions where more than 64 SIDX values might
     be active during some period of time in the session.

   . SDUR (24 bits) "Text Sample Duration": indicates the sample
     duration in timestamp units of the text sample.  For this field, a
     length of 3 bytes is preferred to 2 bytes.  This is because, for a
     typical clockrate of 1000 Hz, 16 bits would allow for a maximum
     duration of just 65 seconds, which might be too short for some
     streams.

     Text samples generally have a known duration at the time of
     transmission.  However, in some cases, e.g. live streaming, the
     time for which a text piece shall be shown might not be known.  As
     an example, imagine you are in an airport watching the latest news
     report while you wait for your plane.  Airports are loud, so the
     news report is transcribed in the lower area of the screen.  This
     area displays two lines of text: the headlines and the words
     spoken by the news speaker.  As usual, the headlines are shown for
     a longer time than the rest.  This time is, in principle, unknown
     to the stream server.  A headline is replaced when the next
     headline arrives.

     As seen in the example, units of unknown duration MUST remain
     valid until the next unit arrives.

     Apart from determining the time period during which the text is
     displayed, the duration field is also used to find the timestamp
     of any subsequent units within the RTP packet.  Therefore,
     samples of unknown duration SHALL NOT use features, such as
     scrolling or karaoke, which would need to know the duration of the
     sample up front.

     For text stored in 3GP files, see Section 13.3 for details on how
     to extract the duration value.  Live encoders SHALL assign
     appropriate values and units according to [1] and later releases.
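
   The SIDX ranges and the SDUR field size discussed above can be
   sketched as follows.  This is only an illustration under stated
   assumptions: the function names are hypothetical, the sample
   contents are treated as opaque bytes, and the bit layout is taken
   from the TYPE 1 header figure.

```python
import struct


def classify_sidx(sidx):
    """Map an 8-bit SIDX value to 'dynamic', 'static' or 'reserved'."""
    if not 0 <= sidx <= 255:
        raise ValueError("SIDX is an 8-bit field")
    if sidx in (128, 255):
        return "reserved"
    return "dynamic" if sidx <= 127 else "static"


def max_duration_seconds(bits, clockrate):
    """Longest duration a field of the given width can express."""
    return (2 ** bits - 1) / clockrate


def build_type1_unit(sidx, sdur, sample=b""):
    """Serialise a TYPE 1 unit; an empty sample yields LEN == 6."""
    if classify_sidx(sidx) == "reserved":
        raise ValueError("reserved SIDX value")
    if not 0 <= sdur < (1 << 24):
        raise ValueError("SDUR is a 24-bit field")
    length = 6 + len(sample)        # LEN + SIDX + SDUR + contents
    if length > 0xFFFF:
        raise ValueError("unit too large for the 16-bit LEN field")
    first = 0x01                    # U=0, R=0, TYPE=1
    header = struct.pack("!BHB", first, length, sidx)
    return header + sdur.to_bytes(3, "big") + sample
```

   At a clockrate of 1000 Hz, max_duration_seconds(16, 1000) is about
   65.5 seconds, whereas max_duration_seconds(24, 1000) is about 16777
   seconds (roughly 4.6 hours), which is the motivation given above
   for the 3-byte SDUR field.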


3.2.3 TYPE 2 Header

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |U|   R   |TYPE |       LEN (always >9)         |    SIDX       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                    SDUR                       | TOTAL |  THIS |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               SLEN            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This header type is used to transport text sample fragments
   containing text strings.  In detail:

   . The LEN field (16 bits) has the same meaning as above.  The value
     of LEN MUST be greater than 9 (0x0009).

   . The SLEN field (16 bits) indicates the size (in bytes) of the
     original (whole) text sample to which this fragment belongs.
     Clients MAY use SLEN to reserve buffer space for the remaining
     fragments of the text sample.

     For stored content, see Section 13.3 for details on how to find
     the SLEN value in a 3GP file.  For live content, the SLEN is
     obtained during the sampling process.

   . The fields TOTAL (4 bits) and THIS (4 bits) indicate the total
     number of fragments into which the original text sample has been
     divided and the position of the current fragment in that
     sequence, respectively.  The usual "byte offset" field is not used
     here for two reasons: a) it would take one more byte and b) it
     does not provide any information on the character offset.
     UTF-8/16 text strings have, in general, a variable character
     length ranging from 1 to 6 bytes.  Therefore, the TOTAL/THIS
     solution is preferred.

   . The U, R, TYPE, SIDX, and SDUR fields are interpreted as above.
     The U, SIDX and SDUR fields are meaningful since partial text
     strings MAY also be displayed with the corresponding decoration.
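
   A receiver might decode the TYPE 2 header along the following
   lines.  This is a minimal sketch, assuming the field order shown in
   the figure above (so that the fragment contents start at byte 10 of
   the unit); parse_type2_header and the dictionary keys are
   hypothetical names chosen for illustration.

```python
import struct


def parse_type2_header(unit):
    """Decode a TYPE 2 unit into its header fields and fragment bytes."""
    if len(unit) < 10:
        raise ValueError("TYPE 2 header needs 10 bytes")
    first, length, sidx = struct.unpack_from("!BHB", unit, 0)
    if first & 0x07 != 2:
        raise ValueError("not a TYPE 2 unit")
    if length <= 9:
        raise ValueError("LEN must be greater than 9")
    if len(unit) < 1 + length:
        raise ValueError("unit shorter than LEN indicates")
    (slen,) = struct.unpack_from("!H", unit, 8)
    return {
        # The U bit selects the encoding of the text string fragment.
        "encoding": "utf-16" if first >> 7 else "utf-8",
        "sidx": sidx,
        "sdur": int.from_bytes(unit[4:7], "big"),
        "total": unit[7] >> 4,     # upper 4 bits
        "this": unit[7] & 0x0F,    # lower 4 bits
        "slen": slen,              # size of the whole text sample
        "data": unit[10:1 + length],
    }
```

   Since TOTAL and THIS are 4-bit fields, a text sample can be split
   into at most 15 fragments under this layout.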

3.2.4 TYPE 3 Header

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |U|   R   |TYPE |        LEN( always >3)        |TOTAL  |  THIS |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This header type is used to transport either the entire modifier
   contents in a sample or just the first fragment of these.  This
   depends on whether the modifier boxes fit in the current RTP packet.
   As explained above, the rules for fragmentation require that the
   start of the modifier boxes be signaled.

   . The TOTAL/THIS field indicates whether the unit contains a part of
     or the whole of the modifiers: if TOTAL=THIS, then all modifiers
     are included here.  Otherwise, this unit just contains the first
     fragment.


   . The U, R, TOTAL/THIS and LEN fields are used as above.  The LEN
     field MUST be greater than three (0x0003).

   Note that the SLEN, SIDX and SDUR fields are not present.  This is
   because: a) these fragments do not contain text strings and b) these
   types of fragments are applied over text string fragments, which
   already contain this information.

3.2.5 TYPE 4 Header

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |U|   R   |TYPE |        LEN( always >3)        |TOTAL  |  THIS |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This header type is used to transport modifier fragments, other than
   the first one.

   The U, R, TOTAL/THIS and LEN fields are used as above.  The LEN field
   MUST be greater than three (0x0003).
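
   The distinction between TYPE 3 and TYPE 4 units can be sketched as
   follows.  The sketch assumes the header layout in the two figures
   above; parse_modifier_unit and the returned labels are hypothetical
   names used only for illustration.

```python
def parse_modifier_unit(unit):
    """Return (kind, total, this, data) for a TYPE 3 or TYPE 4 unit."""
    if len(unit) < 4:
        raise ValueError("modifier unit header needs 4 bytes")
    unit_type = unit[0] & 0x07
    if unit_type not in (3, 4):
        raise ValueError("not a modifier unit")
    length = int.from_bytes(unit[1:3], "big")
    if length <= 3:
        raise ValueError("LEN must be greater than 3")
    if len(unit) < 1 + length:
        raise ValueError("unit shorter than LEN indicates")
    total, this = unit[3] >> 4, unit[3] & 0x0F
    if unit_type == 3:
        # TOTAL == THIS means all modifiers are contained in this unit.
        kind = "whole" if total == this else "first-fragment"
    else:
        kind = "continuation"
    return kind, total, this, unit[4:1 + length]
```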

3.2.6 TYPE 5 Header

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |U|   R   |TYPE |      LEN( always >3)          |   SIDX        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This header type is used to transport sample descriptions.  The LEN
   field MUST be greater than three (0x0003).  Every sample description
   MUST have its own TYPE 5 header.

   This header SHOULD be supported, since it adds minimum complexity and
   it may increase the robustness of the streaming session.  At the very
   least, every client implementation MUST be able to discard a TYPE 5
   unit, if the unit payload cannot be used.

   Note that the implementation of this header is only RECOMMENDED,
   since some text streaming applications might never use dynamic sample
   descriptions.
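
   A client could handle TYPE 5 units roughly as sketched below.  The
   sketch assumes that in-band sample descriptions carry dynamic SIDX
   values in [0,127], as stated in Section 3.2.2, and that a client not
   using in-band descriptions simply discards the unit.  The function
   name, the accept_inband flag and the dictionary used as a table are
   hypothetical.

```python
def handle_type5_unit(unit, descriptions, accept_inband=True):
    """Store the sample description of a TYPE 5 unit, or discard it.

    Returns True if the description was stored and False if the unit
    was discarded.
    """
    if len(unit) < 4 or unit[0] & 0x07 != 5:
        raise ValueError("not a TYPE 5 unit")
    length = int.from_bytes(unit[1:3], "big")
    if length <= 3 or len(unit) < 1 + length:
        raise ValueError("invalid LEN for TYPE 5")
    sidx = unit[3]
    # Discard when the client does not use in-band descriptions or the
    # SIDX is outside the dynamic range (an assumption of this sketch).
    if not accept_inband or sidx > 127:
        return False
    descriptions[sidx] = unit[4:1 + length]
    return True
```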


4. Resilient Transport

   Apart from the basic fragmentation measures described in the section
   above, the simplest option for packet loss resilient transport is to
   send the same RTP packet or the same text samples (or fragments)
   again.  A server MAY decide to use repetition as a measure for packet
   loss resilience.



   Repetition of text samples (or fragments) is only allowed if exactly
   the same units are sent, as in the first transmission. In this
   manner, a receiver can use the already received and the newly
   repeated fragments to reconstruct the original text samples.  This
   also reduces complexity as fragmentation of any given text sample is
   only done once.

   For example, if a text sample was originally sent as a single
   non-fragmented text sample, a repetition of that sample MUST also be
   sent as a single non-fragmented text sample in one unit.  Likewise,
   if the original text sample was fragmented and spread over several
   RTP packets, say a total of 3 units, then the repeated fragments
   SHALL also have the same byte boundaries and use the same headers
   and bytes per fragment.

   With repetition, repeated units resolve to the same timestamp as
   their originals.  Where redundant units are available, the receiver
   SHOULD use those units received in the RTP packet with the highest
   sequence number and discard the rest.
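
   The selection among redundant units could look like the following
   sketch.  It assumes that a repeated unit is byte-identical to its
   original and resolves to the same timestamp, so that the pair
   (timestamp, unit bytes) identifies it, and it compares 16-bit RTP
   sequence numbers modulo 65536 to allow for wraparound.  The names
   seq_newer and select_units are hypothetical.

```python
def seq_newer(a, b):
    """True if 16-bit sequence number a is later than b (mod 65536)."""
    return a != b and ((a - b) & 0xFFFF) < 0x8000


def select_units(received):
    """Pick, for each unit, the copy from the highest sequence number.

    received is an iterable of (seq, timestamp, unit_bytes) tuples.
    Returns a dict mapping (timestamp, unit_bytes) to the sequence
    number of the packet whose copy is used.
    """
    chosen = {}
    for seq, timestamp, unit in received:
        key = (timestamp, unit)
        if key not in chosen or seq_newer(seq, chosen[key]):
            chosen[key] = seq
    return chosen
```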

   If single units are repeated in packets different from their
   originals, care SHALL be taken to preserve their original timing.

   Regarding the RTP header fields:

   . in repeated packets, all RTP header fields MUST keep their
     original values except the sequence number, which MUST be
     increased to comply with RTP.

   . in packets containing repeated units, the general rules in Section
     3 for assigning values to the RTP header fields apply.

   Finally, if sample descriptions for a given SIDX value are not
   available at the receiver, it is a matter of implementation whether
   the text sample contents are displayed.  A possible solution MAY be
   that the encoder provides a static default sample description to be
   used for these cases.


5. Congestion control

   The RTP profile under which this payload format is used defines an
   appropriate congestion control mechanism in different environments.
   Following the rules under the profile, an RTP application can
   determine its acceptable bitrate and packet rate in order to be fair
   to other TCP or RTP flows.

   If an RTP application using this payload format uses retransmission,
   the acceptable packet rate and bitrate includes both the original and
   retransmitted data.  This guarantees that an application using
   retransmission achieves the same fairness as one that does not.  Such
   a rule may translate in practice into the following actions:


   If an enhanced service is used, the sender should ensure that the
   total bitrate and packet rate do not exceed those of the requested
   service.  It should further be monitored that the requested services
   are actually delivered.  In a best-effort environment, the sender
   SHOULD NOT send retransmission packets without first ensuring that
   enough bandwidth for retransmission is available.  Other solutions
   like reducing the packet rate and bitrate of the original stream
   (for example by encoding the data at a lower rate) MAY be used.

   Similar considerations apply if an RTP application using this
   payload format implements forward error correction (FEC) [7].  In
   this case, the sender should take care that the amount of FEC does
   not actually worsen the problem.

   Therefore, it is RECOMMENDED that applications implementing this
   payload format also implement congestion control.  The actual
   mechanism for congestion control is out of the scope of this document
   but should be suitable for real-time flows.  As an example, RFC 3448
   [11] specifies an equation-based congestion control that fulfils this
   requirement.


6. Scene Description

6.1 Text rendering position and composition

   In order to stream timed text, either stored in a 3GP file or
   streamed live, some initial layout information is needed by the
   client to correctly display the text.  These are the width, height
   and position of the text area and the layer or proximity of the text
   to the user.

   These pieces of information MUST be conveyed in a reliable form
   prior to the start of the session.  An example of a reliable
   transport may be the out-of-band channel used for SDP.  Any SDP
   description containing a 3GPP timed text stream MUST include the
   parameters listed above.  Section 7 provides details on the usage in
   SDP descriptions.

   For stored content, some values contained in the Track Header Box
   SHALL be used.  See Section 13.3 for details on finding these values
   in a 3GP file.  For live streaming appropriate values SHALL be used.



6.2 SMIL usage

   The attributes contained in the Track Header Boxes of a 3GP file only
   specify the spatial relationship of the tracks within the given 3GP
   file.  If several media streams are sent (either read out of
   different files or simultaneously streamed live), they require
   spatial synchronization.  For such purpose, SMIL SHOULD be used.


   SMIL assigns regions in the display to each of those files and places
   the tracks within those regions.  The original track header
   information is used for each track within its region.  Therefore,
   even if SMIL scene description is used, the track header information
   pieces SHOULD be sent anyway as they represent the intrinsic media
   properties.

   See [1] and the 3GPP SMIL Language Profile in [18] for details.


7. MIME Type usage Registration

7.1 3GPP Timed Text MIME Registration

   MIME type: video

   MIME subtype: 3gpp-tt

   Required parameters:

        rate: the RTP timestamp clockrate is equal to the clockrate of
        the media.  If RTP packets are generated out of a 3GP file, the
        clockrate of the text media MUST be copied from the 3GP file,
        i.e. the clockrate is the value of "timescale" parameter in the
        Media Header Box belonging to that text track.  Other tracks
        (audio/video/text) in the 3GP file may have their own clockrates
        as indicated in their corresponding Media Header Box.  For live
        encoding, a clockrate of 1000 Hz is RECOMMENDED but other
        values MAY be used.

        version=<Z(x*256+y)>, indicates the version of the 3GPP TS
        26.245 specification after which the timed text is encoded.  "Z"
        is the number of the Release, "x" and "y" are taken from the
        3GPP specification version, vZ.x.y.  E.g. for 3GPP TS 26.245
        v6.0.0, x*256+y=0, the version value is "60".

        spldesc=<value1,value2,...> indicates the way the server sends
        the sample descriptions.  There are two possibilities:

             . "out", where all sample descriptions are sent
                out-of-band, e.g. in the SDP.  This may be used when
                the total number of sample descriptions used is low.
             . "both", where both in-band and out-of-band mechanisms
                are used.

             All clients and servers MUST understand this parameter.
             The server MUST always include the "spldesc" parameter in
             the session description and it MUST include the supported
             mechanisms in order of preference.  The server MUST
             include, at least, the value "out".

        tx3g=<base64-value-1>,<base64-value-2>,...  This parameter MUST
        be used if conveying sample descriptions out-of-band is
        necessary.  The list of sample entries MAY be in any order and
        it MAY be empty.  The <base64-value-i> represents the base64
        encoding of the concatenation of the SIDX and the sample
        description for that SIDX.  The format of a sample description
        entry can be found in 3GPP TS 26.245 Release 6 and later
        releases.

        All servers and clients MUST understand this parameter and MUST
        be capable of using the sample description(s) contained in it.

        width=<integer-value> indicates the width in pixels of the text
        track or area where the text is actually displayed.

        height=<integer-value> indicates the height in pixels of the
        text track.

        tx=<integer-value>, indicates the horizontal translation offset
        in pixels of the text track with respect to the origin of the
        video track.

        ty=<integer-value>, indicates the vertical translation offset in
        pixels of the text track.

        layer=<integer-value>, indicates the proximity of the text track
        to the viewer.  Higher values mean closer to the viewer.  This
        parameter has no units.
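
        As an illustration, the "version" and "tx3g" values defined
        above might be computed as sketched below.  The sketch assumes
        that the version value is the decimal Release number followed
        by the decimal value of x*256+y, as in the "60" example, and
        that the SIDX is prepended to each sample description as a
        single octet before base64 encoding.  Both function names are
        hypothetical, and the single-octet SIDX is an assumption not
        stated in this document.

```python
import base64


def version_value(release, x, y):
    """Build the 'version' value for 3GPP TS 26.245 vZ.x.y."""
    return "%d%d" % (release, x * 256 + y)


def tx3g_value(entries):
    """Build the 'tx3g' value from (sidx, description_bytes) pairs."""
    parts = []
    for sidx, description in entries:
        # Out-of-band descriptions use static SIDX values.
        if not 129 <= sidx <= 254:
            raise ValueError("static SIDX must be in [129,254]")
        blob = bytes([sidx]) + description   # assumed 1-octet SIDX
        parts.append(base64.b64encode(blob).decode("ascii"))
    return ",".join(parts)
```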

   Optional parameters:

        brand=<brand-name>, where <brand-name> indicates the "best use"
        of the original 3GP file from which the timed text contents are
        read.

        cbrand=<brand-name-1>,<brand-name-2>,... indicates the list of
        compatible brands.  This parameter only provides information
        about the original 3GP file being read from.

        mver=<version-value>, "Minor version" where <version-value> is a
        positive integer.  It identifies the oldest compatible version
        of the 3GP file format specification in 3GPP TS 26.234 Release 5
        and corresponding specifications in later Releases.

        Note these parameters are merely informational.  Details on each
        of them can be found in the 3GP file format section of 3GPP TS
        26.234 Release 5 specification and corresponding specifications
        in later Releases.


   Encoding considerations: this type is only defined for transfer via
   RTP.

   Security considerations: please refer to Section 11 of RFCXXXX.


   Interoperability considerations: the 3GPP Timed Text media format for
   which this payload format is defined is specified in Release 6 of
   3GPP TS 26.245 "Transparent end-to-end packet switched streaming
   service (PSS); Timed Text Format (Release 6)".  The 3GPP file format
   (3GP) referred to in this document and the used SMIL language profile
   can be found in Release 5 of 3GPP TS 26.234 and in the corresponding
   specifications for later Releases.  Note also that 3GPP may in future
   Releases specify extensions or updates to the media format in a
   backwards-compatible way, e.g. new modifier boxes or extensions to
   the sample descriptions.  The payload format defined in RFCXXXX
   allows for such extensions.  For future 3GPP Releases of the Timed
   Text Format, the parameter "version" is used to identify the Release
   and exact specification used.

   Published specification: RFC XXXX

   Applications which use this media type: multimedia streaming
   applications.

   Additional information: the 3GPP Timed Text media format is specified
   in 3GPP TS 26.245 "Transparent end-to-end packet switched streaming
   service (PSS); Timed Text Format (Release 6)".  This document and
   future extensions to the 3GPP Timed Text format are publicly
   available at http://www.3gpp.org.

   Magic number(s): None.

   File extension(s): 3GPP Timed Text tracks are stored in files
   conforming to the 3GP file format.  The 3GPP file format (3GP)
   referred to in this document can be found in Release 5 of 3GPP TS
   26.234 and in the corresponding specifications for later Releases.

   Macintosh File Type Code(s): None.

   Person & email address to contact for further information:
   Jose Rey, rey@panasonic.de
   Yoshinori Matsui, matsui.yoshinori@jp.panasonic.com
   Audio/Video Transport Working Group.

   Intended usage: COMMON

   Author/Change controller:
   Jose Rey
   Yoshinori Matsui
   IETF AVT WG


8. SDP usage

8.1 Mapping to SDP

   The information carried in the MIME media type specification has a
   specific mapping to fields in SDP [4], which is commonly used to
   describe RTP sessions.  When SDP is used to specify transmission
   using this payload format, the mapping is done as follows:

   -  The MIME type ("video") goes in the SDP "m=" as the media name.
   The "video" MIME Type is used as timed text is considered visual
   media.

       m=video <portnumber> RTP/<RTP profile> <dynamic payload type>

   -  The MIME subtype ("3gpp-tt") and the timestamp rate go in SDP
   "a=rtpmap" line as the encoding name and (clock) rate, respectively:

       a=rtpmap:<payload type> 3gpp-tt/<rate>

   -  The REQUIRED payload-format-specific parameters "width", "height",
   "tx", "ty", "layer", "spldesc", "version" and "tx3g" go in the SDP
   "a=fmtp" as a semicolon-separated list of parameter=<value> pairs or
   parameter=<value1,value2,value3>, for "tx3g" and "spldesc".  The
   format is:

       a=fmtp:<dynamic payload type> <parameter name>=<value>[
       ; <parameter name>=<value>; parameter=<value1,value2,value3>]

   -  The OPTIONAL payload-format-specific parameters "brand", "cbrand",
   and "mver" go in the SDP "a=fmtp" as a semicolon-separated list of
   parameter=<value> pairs.  Details on the versioning are found in
   Release 5 of 3GPP TS 26.234 and in the corresponding specifications
   for later Releases.

   -  Any remaining parameters go in the SDP "a=fmtp" attribute by
   copying them directly from the MIME media type string as a semicolon
   separated list of parameter=value pairs.

   -  Any unknown parameters SHALL be ignored.

   In Section 9 some example SDP descriptions are presented.
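
   The mapping rules above can be sketched as follows.  This is only
   an illustration: the function name sdp_lines, the default profile
   "AVP" and the use of "; " as separator are assumptions, and the
   parameter values shown in any call are made up for the example.

```python
REQUIRED = ("width", "height", "tx", "ty", "layer",
            "spldesc", "version", "tx3g")


def sdp_lines(port, payload_type, rate, params, profile="AVP"):
    """Return the m=, a=rtpmap and a=fmtp lines for a timed text stream.

    params maps fmtp parameter names to values; it must contain every
    REQUIRED name and may contain optional ones such as 'brand'.
    """
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    fmtp = "; ".join("%s=%s" % (k, v) for k, v in params.items())
    return [
        "m=video %d RTP/%s %d" % (port, profile, payload_type),
        "a=rtpmap:%d 3gpp-tt/%d" % (payload_type, rate),
        "a=fmtp:%d %s" % (payload_type, fmtp),
    ]
```

   The "rate" parameter does not appear in the "a=fmtp" line because it
   is carried in "a=rtpmap" as the clock rate.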


8.2 Usage in Offer/Answer

   In the following, the interpretation of the SDP parameters defined in
   this document in the Offer/Answer (O/A) context [16] is explained.

   In unicast, sender and receiver typically negotiate the streams,
   codecs and values of the parameters used for the session.  This is
   also possible in multicast, to a lesser extent.

   As stated in the O/A model, some "fmtp" (payload-format-specific)
   parameters have a clear meaning and shall be processed by the
   answerer as they are contained in the offer.  Other parameters may
   need to be set among parties, because it is not clear that offerer
   and answerer SHALL use the same values.


   The only parameter whose value MAY be negotiated is "spldesc".  An
   offerer may offer to send sample descriptions in two modes:

   . "both": sample descriptions are sent in the current session both
     out-of-band and in-band.  It is the responsibility of the server
     to decide which are sent using which method.  The server SHALL
     ensure that the indispensable descriptions are sent out-of-band
     and, at the same time, that the out-of-band channel is not
     overloaded with large sample descriptions.  Additionally, the
     contents SHALL still be useful if some in-band descriptions are
     lost, i.e. some form of redundancy SHOULD be applied: FEC [7],
     retransmission [13], repetition or a similar technique.

   . "out": sample descriptions MUST be sent out-of-band only.  This is
     a way for a client to tell the server not to send in-band sample
     descriptions, because the client will not use them anyway.
     Servers offering this method SHALL ensure that it is possible to
     rely on a reduced number of sample descriptions sent out-of-band
     so that the text is still useful.

   Upon receiving the session description with this parameter containing
   a list of supported mechanisms, the answerer MAY decide to use one of
   these or none.  E.g., if a client only supports out-of-band and the
   server only offers "both", then the client MUST reject the offer by
   leaving the "spldesc" parameter empty.  Otherwise, the client MUST
   include the "spldesc" with the desired value (MUST be just one) in
   its answer.  The offerer MUST then use the preferred mechanism.
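
   The answerer's choice described above can be sketched as follows.
   This is only an illustration: the function name and the preference
   order between "out" and "both" are assumptions, since this document
   leaves the choice among commonly supported modes to the answerer.

```python
def answer_spldesc(offered, supported, preference=("out", "both")):
    """Pick the single "spldesc" value to place in the answer.

    offered    -- modes listed in the offer, e.g. ["both", "out"]
    supported  -- modes the answerer is able to handle
    preference -- assumed local ordering; not defined by this document

    Returns exactly one mode, or "" (empty parameter) to reject.
    """
    for mode in preference:
        if mode in offered and mode in supported:
            return mode
    return ""


# A client that only supports out-of-band delivery, offered only "both",
# rejects by leaving the parameter empty.
rejected = answer_spldesc(["both"], {"out"})

# The same client accepts when "out" is among the offered modes.
accepted = answer_spldesc(["both", "out"], {"out"})
```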


8.3 Usage outside of Offer/Answer

   SDP may also be employed outside of the Offer/Answer context, for
   instance for multimedia sessions that are announced through the
   Session Announcement Protocol (SAP) [17], or streamed through the
   Real Time Streaming Protocol (RTSP) [18].

   In this case, the only change with respect to the above is that the
   answerer cannot negotiate the "spldesc" value.  If the answerer
   accepts the session as announced, it MUST be prepared to receive
   sample descriptions using both methods.

   This is compliant with the requirement for clients and servers to
   understand the "spldesc" as well as static sample descriptions and,
   at the same time, be able to discard units with dynamic sample
   descriptions, if not supported.


9. Examples of RTP packet structure

   In this section, some examples of RTP packet structure are explained
   for better understanding of this payload format.  The wrap-around of
   long lines is indicated by the backslash character "\".  The
   examples assume aggregate control of stream container files.  The
   session descriptions are not complete; they are limited to what the
   examples require.


9.1 An RTP packet containing multiple text samples
   <TODO>


10. IANA Considerations

   IANA is requested to register the MIME subtype name "3gpp-tt" for the
   media type "video" as specified in Section 8 of this document.


11. Security considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [3].

   In particular, an attacker may invalidate the current set of valid
   sample descriptions at the client by replaying a packet with an old
   sample description.  As a result, the text would be displayed
   incorrectly, if at all.  Another form of attack consists of sending
   redundant fragments whose boundaries do not match the exact
   boundaries of the originals.  This may cause a decoder to crash.

   These types of attack may easily be avoided by using authentication.

   Additionally, peers in a timed text session may desire to retain
   privacy in their communication, i.e. confidentiality.  This payload
   format does not provide any mechanisms for achieving this.  Both
   confidentiality and authentication have to be solved by a mechanism
   external to this payload format, e.g. SRTP [10].


12. References

12.1 Normative References

   [1]  Transparent end-to-end packet switched streaming service (PSS);
     Timed Text Format (Release 6), TS 26.245 v 0.1.6, Working Draft,
     July 2003.

   [2]  ISO/IEC 14496-1:2001/AMD5, "Information technology - Coding of
     audio-visual objects - Part 1: Systems, ISO Base Media File
     Format", 2003.

   [3]  H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson, "RTP: A
     Transport Protocol for Real-Time Applications", RFC 3550, July
     2003.


   [4]  M. Handley, V. Jacobson, "SDP: Session Description Protocol",
     RFC 2327, April 1998.

   [5]  S. Bradner, "Key words for use in RFCs to indicate requirement
     levels," BCP 14, RFC 2119, IETF, March 1997.

12.2 Informative References

   [6]  C. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley, J.C.
     Bolot, A. Vega-Garcia, S. Fosse-Parisis, "RTP Payload for
     Redundant Audio Data", RFC 2198, September 1997.

   [7]  J. Rosenberg, H. Schulzrinne, "An RTP Payload Format for Generic
     Forward Error Correction", RFC 2733, December 1999.

   [8]  C. Perkins, O. Hodson, "Options for Repair of Streaming Media",
     RFC 2354, June 1998.

   [9]  W3C, "Synchronised Multimedia Integration Language (SMIL 2.0)",
     August, 2001.

   [10] M. Baugher, D. A. McGrew, D. Oran, R. Blom, E. Carrara, M.
     Naslund, K. Norrman, "The Secure Real-Time Transport Protocol",
     draft-ietf-avt-srtp-05.txt, June 2002.

   [11] Handley, et al., "TCP Friendly Rate Control (TFRC): Protocol
     Specification", RFC 3448, January 2003.

   [12] R. Hovey, S. Bradner, "The Organizations involved in the IETF
     Standards Process", BCP 11, RFC 2028, October 1996.

   [13] J. Rey et al., "RTP Retransmission Payload Format", draft-ietf-
     avt-rtp-retransmission-10.txt, work in progress, January 2004.

   [14] M. Handley, C. Perkins, "Guidelines for Writers of RTP Payload
     Format Specifications", RFC 2736, December 1999.

   [15] Van der Meer et al., "RTP Payload Format for Transport of MPEG-4
     Elementary Streams", RFC 3640, November 2003.

   [16] J. Rosenberg, H. Schulzrinne, "An Offer/Answer Model with the
     Session Description Protocol (SDP)", RFC 3264, June 2002.

   [17] Transparent end-to-end packet switched streaming service (PSS);
     Protocols and codecs (Release 6), TS 26.234 v 0.4.0, Working
     Draft, February 2004.

   [18] Transparent end-to-end packet switched streaming service (PSS);
     Protocols and codecs (Release 5), TS 26.234 v 5.6.0, Working
     Draft, September 2003.




13. Annexes

13.1 Dynamic SIDX wraparound mechanism

   This mechanism MUST be implemented if the implementation uses TYPE 5
   units.

   As mentioned in Section 3.2.2, dynamic SIDX values remain active
   either during the entire duration of the session (used just once) or
   in different intervals of it (used once or more).  Although 64 sample
   descriptions should cover the needs of most timed text applications,
   a wraparound mechanism to handle the exception is described here.  In
   the following, SIDX value means dynamic SIDX value.

   There is a sliding window of 64 active SIDX values.  Values within
   the window are active; all others are considered inactive.  An SIDX
   value becomes "active" if at least one sample description identified
   by that SIDX has been received.  Since sample descriptions MAY be
   sent redundantly, it is possible that a client receives a given SIDX
   several times.  However, the receiver SHALL ignore redundant sample
   descriptions and MUST use the already cached copy.  The guard range
   of inactive values ensures that the correct association between SIDX
   and sample description is always used.

   The following algorithm is used to maintain the dynamic SIDX values:

     Let X be the SIDX of the last received sample description.  Let Y
     be a value within the allowed range for dynamic SIDX: [0,127], and
     different from X.

        1. Initialize all dynamic SIDX values as inactive.  For stored
          content, read the sample description index in the Sample to
          Chunk box ("stsc") for that sample.  For live streaming, the
          first value MAY be zero or any other value in the interval
          above.  The initial value is SIDX=X.  Go to step 2.
        2. First in-band sample description with SIDX=X is received. Go
          to step 3.
        3. Set SIDX=Y inactive if Y is inside the interval [(X+1) modulo
          128, (X+64) modulo 128].  Otherwise, set SIDX=Y as active.  Go
          to step 4.
        4. Wait for next sample description.  Upon reception of a sample
          description with SIDX=X do:
             a. If X is currently active, then wait for next SIDX (do
               nothing).  Else go to step 3.

   Example:

        if X=4, any SIDX in the interval [5,68] is inactive.  Active
        SIDX values are in the complementary interval [69,127] plus
        [0,4].  Once the client is initialized, the interval of active
        SIDX values MUST change whenever a sample description with an
        inactive SIDX value is received.  E.g., if the client receives
        SIDX=6, then the active interval becomes [0,6] plus [71,127].
        However, if the received SIDX is in the current active
        interval, no change SHALL be applied.  This means that at any
        instant a maximum of 64 SIDX values are active, whereas the
        total number of values used may exceed 64.
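
   The window maintenance can be sketched as follows.  This is one
   possible reading of the algorithm, not a normative implementation:
   the class and method names are hypothetical, X is taken to be the
   SIDX that last moved the window, and caching of the sample
   descriptions themselves is left out.

```python
SIDX_RANGE = 128  # dynamic SIDX values lie in [0,127]
GUARD = 64        # size of the inactive guard range following X


class SidxWindow:
    """Tracks which dynamic SIDX values are currently active."""

    def __init__(self):
        # Step 1: every value starts inactive; no X is known yet.
        self.anchor = None

    def is_active(self, sidx):
        if self.anchor is None:
            return False
        # Distance from X going upwards, with wraparound at 128.
        distance = (sidx - self.anchor) % SIDX_RANGE
        # X itself is active; [X+1, X+64] modulo 128 is the guard range.
        return distance == 0 or distance > GUARD

    def on_sample_description(self, sidx):
        """Steps 2-4: return True if the window moved, False otherwise."""
        if self.is_active(sidx):
            return False  # step 4a: already active, do nothing
        self.anchor = sidx  # step 3: recompute the window around new X
        return True

    def active_values(self):
        return [v for v in range(SIDX_RANGE) if self.is_active(v)]


window = SidxWindow()
window.on_sample_description(4)  # active: [69,127] plus [0,4]
window.on_sample_description(6)  # 6 was inactive, so the window moves
```

   With these two calls the sketch reproduces the example above: after
   SIDX=4 the values [5,68] are inactive, and after SIDX=6 the active
   values are [0,6] plus [71,127].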

13.2 Basics of the 3GP File Structure

   This section provides a coarse overview of the 3GP file structure.

   Each 3GP file consists of "Boxes".  Each box starts with a header
   which indicates both the size and the type of the box.  In general,
   a 3GP file contains the File Type Box (ftyp), the Movie Box (moov),
   and the Media Data Box (mdat).  The Movie Box and the Media Data
   Box, serving as containers, include their own boxes for each medium.
   Similarly, each box type may include a number of boxes; see the ISO
   Base Media File Format [2] for a complete list of possibilities.
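
   As an illustration of the box layout, the following sketch walks the
   boxes in a byte buffer.  It assumes the header layout of the ISO
   base media file format (a 32-bit big-endian size followed by a
   four-character type, with size 1 announcing a 64-bit size and size 0
   meaning "until the end of the data"); the function name is
   hypothetical and the sample bytes are made up.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, box_end) for each box in data."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if offset + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing data.
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed or truncated box; stop parsing
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


# A made-up buffer: a 16-byte "ftyp" box followed by an empty "mdat" box.
sample = (
    struct.pack(">I4s", 16, b"ftyp") + b"3gp5" + b"\x00" * 4
    + struct.pack(">I4s", 8, b"mdat")
)
top_level = list(iter_boxes(sample))
```

   Container boxes such as "moov" can be descended into by calling the
   same function again with the payload start and box end as offset and
   end.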

   In the following, only those boxes that are useful for the purposes
   of this payload format are mentioned.

   The File Type Box identifies the type and properties of a 3GP file.
   The File Type Box contents comprise the major brand, the minor
   version and the compatible brands.  When streamed with RTP, these are
   communicated via out-of-band means, such as SDP.

   The Movie Box (moov) contains one or more Track Boxes (trak) which
   include information about each track.  A Track Box contains, among
   others, the Track Header Box (tkhd), the Media Header Box (mdhd) and
   the Media Information Box (minf).

   The Track Header Box specifies the characteristics of a single track,
   where a track is, in this case, the streamed text during a session.
   Exactly one Track Header Box is present for a track.  It contains
   information about the track, such as the spatial layout (width and
   height), the video transformation matrix and the layer number.  Since
   these pieces of information are essential and static, i.e. constant
   for the duration of the session, they MUST be sent prior to the
   transmission of any text samples.  See the ISO base media file format
   [2] for details about the definition of the conveyed information.

   The Media Header Box contains the timescale or number of time units
   that pass in one second, i.e. cycles per second or Hertz.  The Media
   Information Box includes the Sample Table Box (stbl) which itself
   contains the Sample Description Box (stsd), the Decoding Time to
   Sample Box (stts), the Sample Size Box (stsz) and the Sample to Chunk
   Box (stsc).  Sample descriptions for each text sample are encoded as
   "tx3g" sample entries in the Sample Description Box (stsd).

   The Sample Table Box (stbl) contains all the time and data indexing
   of the media samples in a track.  Using the tables here, it is
   possible to locate samples in time, determine their type, and
   determine their size, container, and offset into that container.


   Finally, the Media Data Box contains the media data itself.  In timed
   text tracks this box contains text samples.  In audio and video
   tracks, the equivalents are audio and video frames, respectively.  A
   text sample consists of the text length, the text string, and one or
   several Modifier Boxes.  The text length is the size of the text in
   bytes.  The text string is the plain text to render.  A Modifier Box
   carries additional rendering information for the text, such as
   colour or font.

13.3 Usage of 3GP file information for transport in RTP

   For the purpose of streaming timed text contents, some values in the
   boxes contained in a 3GP file are mapped to fields of this payload
   header.  This section explains where to find and how to use those
   values.

   From the Track Header Box (tkhd):

        . tx,ty: these values are the second but last and third but
           last values in the unity matrix.  All 32 bits are used.
        . width, height, layer: they also have the same name in the box
           and the payload header.  All 32 bits are used.

   From the Sample Table Box (stbl) the following information is carried
   in each RTP packet using this payload format:

        . the Sample Description Box (stsd): this stsd box provides
           information on the basic characteristics of text samples.
           Each entry is a sample entry box of type "tx3g".  An example
           of the information contained in a sample entry could be the
           font size or the background colour.  These pieces of
           information are commonly used by many text samples during the
           session.  Each sample entry "tx3g" is transported either in-
           band or out-of-band.
        . the Decoding Time to Sample Box (stts): the 24 least
           significant bits of the "sample_delta" are mapped to the
           field SDUR (Text Sample Duration).
        . the Sample Size Box (stsz): the 16 least significant bits of
           the "sample_size" or "entry_size" (depending on whether the
           sample size is fixed or variable) are mapped to the SLEN
           field for that sample.
        . the Sample to Chunk Box (stsc): the value of the
           "sample_description_index" for that sample in the Sample to
           Chunk Box is mapped to the field SIDX (Text Sample Entry
           Index).  The Sample to Chunk Box associates each text sample
           with its corresponding sample description entry in the
           Sample Description Box (stsd, see above).  Since the sample
           description may vary during the session, the SIDX is sent
           together with the text samples using this payload format.
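
   The per-sample mapping listed above can be sketched as follows.  The
   function name and the input values are illustrative assumptions;
   only the bit widths for SDUR and SLEN are taken from the text above,
   and the SIDX value is passed through unchanged.

```python
def map_sample_fields(sample_delta, sample_size, sample_description_index):
    """Map 3GP sample table values to payload header fields."""
    return {
        # 24 least significant bits of the stts "sample_delta"
        "SDUR": sample_delta & 0xFFFFFF,
        # 16 least significant bits of the stsz sample or entry size
        "SLEN": sample_size & 0xFFFF,
        # stsc "sample_description_index" for this sample
        "SIDX": sample_description_index,
    }


# Made-up values for a single text sample.
fields = map_sample_fields(
    sample_delta=3000, sample_size=42, sample_description_index=1
)
```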



14. Acknowledgements

   The authors would like to thank Dave Singer, Jan van der Meer, Magnus
   Westerlund and Colin Perkins for their comments and suggestions to
   this document.


15. Authors' Addresses

   Jose Rey                                     rey@panasonic.de
   Panasonic European Laboratories GmbH
   Monzastr. 4c
   D-63225 Langen, Germany
   Phone: +49-6103-766-134
   Fax:   +49-6103-766-166

   Yoshinori Matsui             matsui.yoshinori@jp.panasonic.com
   Matsushita Electric Industrial Co., LTD.
   1006 Kadoma
   Kadoma-shi, Osaka, Japan
   Phone: +81 6 6900 9689
   Fax:   +81 6 6900 9699


16. IPR Notices

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at ietf-
   ipr@ietf.org.


17. Full Copyright Statement




   Copyright (C) The Internet Society (2004).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


18. Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.




































