AVT Working Group                                                P. Kerr
Internet-Draft                                                  Xiph.Org
Expires: July 1, 2005                                  December 31, 2004

                      draft-kerr-avt-vorbis-rtp-04
              RTP Payload Format for Vorbis Encoded Audio

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of section 3 of RFC 3667.  By submitting this Internet-Draft, each
   author represents that any applicable patent or other IPR claims of
   which he or she is aware have been or will be disclosed, and any of
   which he or she become aware will be disclosed, in accordance with
   RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 1, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2004).

Abstract

   This document describes an RTP payload format for transporting Vorbis
   encoded audio.  It details the RTP encapsulation mechanism for raw
   Vorbis data and the delivery mechanisms for the decoder probability
   model (referred to as a codebook), metadata, and other setup
   information.

Editor's Note


Kerr                      Expires July 1, 2005                  [Page 1]


Internet-Draft        draft-kerr-avt-vorbis-rtp-04         December 2004

   All references to RFC XXXX are to be replaced by references to the
   RFC number of this memo, when published.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1   Terminology  . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Payload Format . . . . . . . . . . . . . . . . . . . . . . . .  4
     2.1   RTP Header . . . . . . . . . . . . . . . . . . . . . . . .  4
     2.2   Payload Header . . . . . . . . . . . . . . . . . . . . . .  5
     2.3   Payload Data . . . . . . . . . . . . . . . . . . . . . . .  6
     2.4   Example RTP Packet . . . . . . . . . . . . . . . . . . . .  7
   3.  Frame Packetizing  . . . . . . . . . . . . . . . . . . . . . .  8
     3.1   Example Fragmented Vorbis Packet . . . . . . . . . . . . .  8
     3.2   Packet Loss  . . . . . . . . . . . . . . . . . . . . . . . 10
   4.  Configuration Headers  . . . . . . . . . . . . . . . . . . . . 11
     4.1   In-band Header Transmission  . . . . . . . . . . . . . . . 11
     4.2   Session Description for Vorbis RTP Streams . . . . . . . . 14
     4.3   Codebook Caching . . . . . . . . . . . . . . . . . . . . . 15
   5.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 16
   6.  Congestion Control . . . . . . . . . . . . . . . . . . . . . . 17
   7.  Security Considerations  . . . . . . . . . . . . . . . . . . . 18
   8.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 19
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 20
     9.1   Normative References . . . . . . . . . . . . . . . . . . . 20
     9.2   Informative References . . . . . . . . . . . . . . . . . . 20
       Author's Address . . . . . . . . . . . . . . . . . . . . . . . 20
       Intellectual Property and Copyright Statements . . . . . . . . 21

1.  Introduction

   Vorbis is a general purpose perceptual audio codec intended to allow
   maximum encoder flexibility, thus allowing it to scale competitively
   over an exceptionally wide range of bitrates.  At the high
   quality/bitrate end of the scale (CD or DAT rate stereo, 16/24 bits),
   it is in the same league as MPEG-2 and MPC.  Similarly, the 1.0
   encoder can encode high-quality CD and DAT rate stereo at below 48k
   bits/sec without resampling to a lower rate.  Vorbis is also intended
   for lower and higher sample rates (from 8kHz telephony to 192kHz
   digital masters) and a range of channel representations (monaural,
   polyphonic, stereo, quadraphonic, 5.1, ambisonic, or up to 255
   discrete channels).  Vorbis encoded audio is generally encapsulated
   within an Ogg format bitstream [1], which provides framing and
   synchronization.  For the purposes of RTP transport, this layer is
   unnecessary, and so raw Vorbis packets are used in the payload.

1.1  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [2].

2.  Payload Format

   For RTP based transportation of Vorbis encoded audio the standard RTP
   header is followed by a 5 octet payload header, then the payload
   data.  The payload header associates the Vorbis data with its
   decoding codebook and indicates whether the payload data that
   follows contains a fragmented Vorbis packet or a number of whole
   Vorbis packets.  The payload data contains the raw Vorbis bitstream
   information.

2.1  RTP Header

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
      |            contributing source (CSRC) identifiers             |
      |                              ...                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The RTP header begins with an octet of fields (V, P, X, and CC) to
   support specialized RTP uses (see [3] and [4] for details).  For
   Vorbis RTP, the following values are used.

   Version (V): 2 bits

   This field identifies the version of RTP.  The version used by this
   specification is two (2).

   Padding (P): 1 bit

   Padding MAY be used with this payload format according to section 5.1
   of [3].

   Extension (X): 1 bit

   Always set to 0, as this payload format does not use an RTP header
   extension.

   CSRC count (CC): 4 bits

   The CSRC count is used in accordance with [3].


   Marker (M): 1 bit

   Set to zero, as audio silence suppression is not used.  This conforms
   to section 4.1 of [9].

   Payload Type (PT): 7 bits

   An RTP profile for a class of applications is expected to assign a
   payload type for this format, or a dynamically allocated payload type
   SHOULD be chosen which designates the payload as Vorbis.

   Sequence number: 16 bits

   The sequence number increments by one for each RTP data packet sent,
   and may be used by the receiver to detect packet loss and to restore
   packet sequence.  This field is detailed further in [3].

   Timestamp: 32 bits

   A timestamp representing the sampling time of the first sample of the
   first Vorbis packet in the RTP packet.  The clock frequency MUST be
   set to the sample rate of the encoded audio data and is conveyed
   out-of-band as an SDP attribute.

   SSRC/CSRC identifiers:

   These fields, 32 bits each with one SSRC field and a maximum of 15
   CSRC fields (the limit of the 4 bit CSRC count), are as defined in
   [3].
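
   As a minimal sketch, and not part of the payload format itself, the
   fixed 12 octet RTP header described in this section could be packed
   as shown here.  The function name build_rtp_header is hypothetical,
   the default payload type 98 is borrowed from the SDP outline in
   section 4.2, and the sketch assumes no padding and an empty CSRC
   list (P=0, CC=0).

```python
import struct


def build_rtp_header(seq, timestamp, ssrc, payload_type=98):
    """Pack the fixed RTP header with V=2, P=0, X=0, CC=0 and M=0."""
    version, padding, extension, csrc_count, marker = 2, 0, 0, 0, 0
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | csrc_count
    byte1 = (marker << 7) | (payload_type & 0x7F)
    # Network byte order: 16-bit sequence number, 32-bit timestamp, 32-bit SSRC.
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
```

   The timestamp argument is expected in units of the audio sample
   rate, as required for the Timestamp field.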

2.2  Payload Header

   After the RTP Header section the following five octets are the
   Payload Header.  This header is split into a number of bitfields
   detailing the format of the following Payload Data packets.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Codebook Ident                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |C|F| T |# pkts.|
      +-+-+-+-+-+-+-+-+

   Codebook Ident: 32 bits

   This 32 bit field is used to associate the Vorbis data to a decoding
   Codebook.  It is created by making a CRC32 checksum of the codebook
   required to decode the particular Vorbis audio stream.


   Continuation (C): 1 bit

   Set to one if this is a continuation of a fragmented packet.

   Fragmented (F): 1 bit

   Set to one if the payload contains complete packets or if it contains
   the last fragment of a fragmented packet.

   Payload Type (T): 2 bits

   This field sets the packet payload type.  There are currently four
   types of packet payload.

      0 = Raw Vorbis payload
      1 = Configuration payload
      2 = Codebook payload
      3 = Metadata payload

   The last 4 bits are the number of complete packets in this payload.
   This provides for a maximum number of 15 Vorbis packets in the
   payload.  If the packet contains fragmented data the number of
   packets MUST be set to 0.
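
   The bit layout above can be illustrated with a short sketch.  The
   helper names are hypothetical, and the sketch assumes that the
   Codebook Ident is carried in network byte order and that the bits of
   the fifth octet are numbered from the most significant bit, as in
   the diagram.

```python
import struct


def pack_payload_header(ident, continuation, fragmented, ptype, count):
    """Pack the 5-octet payload header: 32-bit ident, then C|F|T|#pkts."""
    if not 0 <= ptype <= 3 or not 0 <= count <= 15:
        raise ValueError("T is 2 bits wide and the packet count is 4 bits wide")
    flags = (((continuation & 1) << 7) | ((fragmented & 1) << 6)
             | (ptype << 4) | count)
    # Assumption: the ident is sent most significant octet first.
    return struct.pack("!IB", ident & 0xFFFFFFFF, flags)


def unpack_payload_header(data):
    """Return (ident, C, F, T, count) from the first five octets of data."""
    ident, flags = struct.unpack("!IB", data[:5])
    return ident, flags >> 7, (flags >> 6) & 1, (flags >> 4) & 3, flags & 0x0F
```

   For instance, a raw Vorbis payload (T=0) holding two complete
   packets would be packed with C=0, F=1 and a packet count of 2.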

2.3  Payload Data

   Raw Vorbis packets are unbounded in length currently, although at
   some future point there will likely be a practical limit placed on
   them.  Typical Vorbis packet sizes are from very small (2-3 bytes) to
   quite large (8-12 kilobytes).  The reference implementation [8]
   typically produces packets less than ~800 bytes, except for the
   codebook header packets which are ~4-12 kilobytes.  Within an RTP
   context the maximum Vorbis packet size, including the RTP and payload
   headers, SHOULD be kept below the path MTU to avoid packet
   fragmentation.

   Each Vorbis payload packet starts with a one octet length header,
   which is used to represent the size of the following data payload,
   followed by the raw Vorbis data.

   For payloads which consist of multiple Vorbis packets the payload
   data consists of the packet length followed by the packet data for
   each of the Vorbis packets in the payload.

   The Vorbis packet length header is the length of the Vorbis data
   block only and does not count the length octet.
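
   The length-prefixed layout described above could be produced and
   consumed along the following lines.  This is an illustrative sketch
   with hypothetical function names; it relies only on the fact that a
   one octet length can describe at most 255 octets of Vorbis data.

```python
def pack_vorbis_packets(packets):
    """Concatenate packets, each preceded by its one-octet length."""
    if not 1 <= len(packets) <= 15:
        raise ValueError("the 4-bit count field allows 1 to 15 packets")
    out = bytearray()
    for pkt in packets:  # packets are written in the order given
        if len(pkt) > 255:
            raise ValueError("length does not fit in one octet")
        out.append(len(pkt))  # the length excludes the length octet itself
        out += pkt
    return bytes(out)


def unpack_vorbis_packets(data, count):
    """Split payload data back into `count` Vorbis packets."""
    packets, pos = [], 0
    for _ in range(count):
        if pos >= len(data):
            raise ValueError("truncated payload")
        length = data[pos]
        pkt = bytes(data[pos + 1:pos + 1 + length])
        if len(pkt) != length:
            raise ValueError("truncated payload")
        packets.append(pkt)
        pos += 1 + length
    return packets
```

   The count passed to unpack_vorbis_packets is the value of the 4 bit
   packet count field in the payload header.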

   The payload packing of the Vorbis data packets SHOULD follow the
   guidelines set out in [4] where the oldest packet occurs immediately
   after the RTP packet header.

   Channel mapping of the audio is in accordance with ITU-R BS.775-1.

2.4  Example RTP Packet

   Here is an example RTP packet containing two Vorbis packets.

   RTP Packet Header:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | 2 |0|0|  0    |0|      PT     |       sequence number         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                 timestamp (in sample rate units)              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          synchronisation source (SSRC) identifier             |
      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
      |            contributing source (CSRC) identifiers             |
      |                              ...                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Payload Data:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Codebook Ident                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |0|1| 0 | 2 pks |      len      |         vorbis data ...       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ..                     ...vorbis data...                       ..
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ..    data      |      len      |   next vorbis packet data...  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+







Kerr                      Expires July 1, 2005                  [Page 7]


Internet-Draft        draft-kerr-avt-vorbis-rtp-04         December 2004

3.  Frame Packetizing

   Each RTP packet contains either one complete Vorbis packet, one
   Vorbis packet fragment, or an integer number of complete Vorbis
   packets (up to a max of 15 packets, since the number of packets is
   defined by a 4 bit value).

   Any Vorbis data packet that is 256 octets or less SHOULD be bundled
   in the RTP packet with as many Vorbis packets as will fit, up to a
   maximum of 15.

   If a Vorbis packet is larger than 256 octets it MUST be fragmented.
   A fragmented packet has a zero in the last four bits of the payload
   header.  Each fragment after the first will also set the Continuation
   (C) bit to one in the payload header.  The RTP packet containing the
   last fragment of the Vorbis packet will have the Fragmented (F) bit
   set to one.  To maintain the correct sequence for fragmented packet
   reception the timestamp field of every fragment MUST be the same as
   that of the first fragment sent, with the sequence number incremented
   as normal for the subsequent RTP packets.  Path MTU is detailed in
   [6] and [7].
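
   A sender-side sketch of these fragmentation rules follows.  The
   function name and the max_fragment parameter are hypothetical; the
   caller is assumed to choose max_fragment so that each fragment, plus
   the RTP and payload headers, fits within the path MTU, and to reuse
   one timestamp while incrementing the sequence number per fragment.

```python
def fragment_vorbis_packet(packet, max_fragment):
    """Yield (C, F, count, chunk) for each payload carrying a fragment."""
    chunks = [packet[i:i + max_fragment]
              for i in range(0, len(packet), max_fragment)]
    if len(chunks) < 2:
        raise ValueError("packet fits in a single payload; do not fragment")
    last = len(chunks) - 1
    for index, chunk in enumerate(chunks):
        c_bit = 0 if index == 0 else 1     # set on every fragment after the first
        f_bit = 1 if index == last else 0  # set only on the last fragment
        yield c_bit, f_bit, 0, chunk       # packet count is 0 for fragments
```

   Each yielded tuple supplies the C, F and packet count values for one
   payload header, together with the fragment data that follows it.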

3.1  Example Fragmented Vorbis Packet

   Here is an example fragmented Vorbis packet split over three RTP
   packets.

      Packet 1:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |           1000                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             xxxxx                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
      |            contributing source (CSRC) identifiers             |
      |                              ...                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Codebook Ident                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |0|0| 0 |      0|      len      |         vorbis data ..        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       ..vorbis data..                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   In this packet the initial sequence number is 1000 and the timestamp
   is xxxxx.  The number of packets field is set to 0.

      Packet 2:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |           1001                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             xxxxx                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
      |            contributing source (CSRC) identifiers             |
      |                              ...                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Codebook Ident                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |1|0| 0 |      0|      len      |         vorbis data ...       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       ..vorbis data..                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The C bit is set to 1 and the number of packets field is set to 0.
   For large Vorbis packets there can be several payload packets of this
   type.  The maximum packet size SHOULD be no greater than the path
   MTU, including all RTP and payload headers.  The sequence number has
   been incremented by one but the timestamp field remains the same as
   in the initial packet.

      Packet 3:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |           1002                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             xxxxx                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
      |            contributing source (CSRC) identifiers             |
      |                              ...                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Codebook Ident                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |1|1| 0 |      0|      len      |         vorbis data ..        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       ..vorbis data..                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This is the last Vorbis fragment packet.  The C and F bits are set
   and the packet count remains set to 0.  As in the previous packets
   the timestamp remains that of the first packet in the sequence and
   the sequence number has been incremented.
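
   On the receiving side, the three packets of this example could be
   reassembled with a sketch such as the one here.  The class name and
   the shape of its input (sequence number, C, F, packet count and
   fragment data, already separated from the headers) are assumptions
   made for illustration.  Continuation fragments that arrive without
   their first fragment, or after a gap in sequence numbers, are
   dropped.

```python
class FragmentReassembler:
    """Rebuild one fragmented Vorbis packet from its RTP payloads."""

    def __init__(self):
        self.parts = None     # chunks collected so far, or None when idle
        self.next_seq = None  # sequence number expected next

    def push(self, seq, c_bit, f_bit, count, chunk):
        """Return the complete packet on the last fragment, else None."""
        if count != 0:
            self.parts = None  # complete packets are not handled here
            return None
        if c_bit == 0:         # first fragment starts a new packet
            self.parts, self.next_seq = [chunk], (seq + 1) & 0xFFFF
            return None
        if self.parts is None or seq != self.next_seq:
            self.parts = None  # an earlier fragment was lost
            return None        # drop this continuation fragment
        self.parts.append(chunk)
        self.next_seq = (seq + 1) & 0xFFFF
        if f_bit:
            packet, self.parts = b"".join(self.parts), None
            return packet
        return None
```

   A real receiver would also reorder packets by sequence number before
   calling push; that step is omitted to keep the sketch short.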

3.2  Packet Loss

   As there is no error correction within the Vorbis stream, packet loss
   will result in a loss of signal.  Packet loss is more of an issue for
   fragmented Vorbis packets as the client will have to cope with the
   handling of the C and F flags.  Using the fragmented Vorbis packet
   example above, if the first packet is lost the client SHOULD detect
   that the next packet has the packet count field set to 0 and the C
   bit set, and MUST drop it.  The next packet, which is the final
   fragment, SHOULD be dropped in the same manner, or buffered.
   Feedback reports on lost and dropped packets MUST be sent back via
   RTCP.

4.  Configuration Headers

   Unlike other mainstream audio codecs Vorbis has no statically
   configured probability model; instead it packs all entropy decoding
   configuration, VQ and Huffman models into a self-contained codebook.
   This codebook block also requires additional identification
   information detailing the number of audio channels, bitrates and
   other information used to initialise the Vorbis stream.

   To decode a Vorbis stream three configuration header blocks are
   needed.  The first header indicates the sample rate and bitrates, the
   number of channels and the version of the Vorbis encoder used.  The
   second header contains the decoder's probability model, or codebook,
   and the third header details stream metadata.

   As the RTP stream may change certain configuration data mid-session
   there are two different methods for delivering this configuration
   data to a client: in-band and via SDP, both of which are detailed
   below.  SDP delivery is used to set up an initial state for the
   client application and in-band delivery is used to change state
   during the session.  The changes may be due to different metadata or
   codebooks as well as different bitrates of the stream.

   Of the two delivery methods, the use of an SDP attribute to indicate
   a URI where the configuration and codebook data can be obtained is
   preferred, as the data can then be fetched reliably using TCP.
   In-band codebook delivery SHOULD only be used in situations where the
   link to the client is unidirectional or if the SDP-based information
   is not available.

   Synchronizing the configuration and codebook headers to the RTP
   stream is critical.  The 32 bit Codebook Ident field is used to
   indicate when a change in the stream has taken place.  The client
   application MUST have in advance the correct configuration and
   codebook headers and if the client detects a change in the Ident
   value and does not have this information it MUST NOT decode the raw
   Vorbis data.
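
   One way a client might enforce this rule is to key the headers it
   holds by Codebook Ident and to refuse to decode any payload whose
   Ident is unknown.  The class and function names in this sketch are
   hypothetical and the stored header data is treated as opaque.

```python
class CodebookCache:
    """Map Codebook Ident values to previously obtained header data."""

    def __init__(self):
        self._headers = {}

    def store(self, ident, headers):
        """Remember the configuration and codebook headers for `ident`."""
        self._headers[ident] = headers

    def headers_for(self, ident):
        """Return the headers for `ident`, or None if they are unknown."""
        return self._headers.get(ident)


def can_decode(cache, ident):
    """A payload may be decoded only if its headers are already held."""
    return cache.headers_for(ident) is not None
```

   The same lookup serves both purposes described in the text: a
   changed Ident is detected because it maps to different headers, and
   a missing entry means the raw Vorbis data is not decoded.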

4.1  In-band Header Transmission

   The three header data blocks are sent in-band with the packet type
   bits set to match the payload type.  Normally the codebook and
   configuration headers are sent once per session if the stream is an
   encoding of live audio, as typically the encoder state will not
   change, but the encoder state can change at the boundary of chained
   Vorbis audio files.  Metadata can be sent at the start as well as any
   time during the life of the session.  Clients MUST be capable of
   dealing with periodic re-transmission of the configuration headers.


   A Vorbis configuration header is indicated with the payload type
   field set to 1.  The Vorbis version MUST be set to zero to comply
   with this document.  The fields Sample Rate, Bitrate Maximum/Nominal/
   Minimum and Num Audio Channels are set in accordance with [9] with
   the bsz fields below referring to the blocksize parameters.  The
   framing bit is not used for RTP transportation and so applications
   constructing Vorbis files MUST take care to set this if required.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |             xxxx              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             xxxxx                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
      |            contributing source (CSRC) identifiers             |
      |                              ...                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Codebook Ident                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |0|1| 1 |      1| bsz 0 | bsz 1 |       Num Audio Channels      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Vorbis Version                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       Audio Sample Rate                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Bitrate Maximum                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Bitrate Nominal                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Bitrate Minimum                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
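
   The fields that follow the payload header octet in this diagram
   could be serialised as in the sketch below.  This is illustrative
   only: the function name is hypothetical, and the sketch assumes
   network byte order and unsigned 32 bit bitrate fields, neither of
   which is stated by this document.

```python
import struct


def pack_config_fields(bsz0, bsz1, channels, sample_rate,
                       bitrate_max, bitrate_nominal, bitrate_min):
    """Pack bsz 0/1, channel count, version, sample rate and bitrates."""
    if not (0 <= bsz0 <= 15 and 0 <= bsz1 <= 15):
        raise ValueError("each bsz field is 4 bits wide")
    vorbis_version = 0  # MUST be zero to comply with this document
    # Assumption: big-endian fields; 1 + 2 + 5 * 4 = 23 octets in total.
    return struct.pack("!BHIIIII", (bsz0 << 4) | bsz1, channels,
                       vorbis_version, sample_rate,
                       bitrate_max, bitrate_nominal, bitrate_min)
```

   The result would be appended directly after the five octet payload
   header of a configuration payload.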

   If the payload type field is set to 2, this indicates the packet
   contains codebook data.

   The configuration information detailed below MUST be completely
   intact, as a client can not decode a stream with an incomplete or
   corrupted codebook set.

   A 16 bit codebook length field precedes the codebook datablock.  The
   length field allows for codebooks to be up to 64K in size.  Packet
   fragmentation, as per the Vorbis data, MUST be performed if the
   codebook's size exceeds the path MTU.  The Codebook Ident field MUST
   be set to match the associated codebook needed to decode the Vorbis
   stream.

   The Codebook Ident is the CRC32 checksum of the codebook and is used
   to detect a corrupted codebook as well as to associate it with its
   Vorbis data stream.  This Ident value MUST NOT be set to the value of
   the current stream if this header is being sent before the boundary
   of the chained file has been reached.  If a checksum mismatch is
   detected then this is considered to be a failure and MUST be reported
   to the client application.
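
   The check described above might look like the following sketch.
   This document names CRC32 without giving a polynomial, so the sketch
   assumes the common CRC-32 implemented by Python's zlib module; the
   function names are hypothetical.

```python
import zlib


def codebook_ident(codebook):
    """Compute a 32-bit ident as the CRC-32 of the codebook octets."""
    # Assumption: the CRC-32 variant provided by zlib.
    return zlib.crc32(codebook) & 0xFFFFFFFF


def verify_codebook(ident, codebook):
    """Raise ValueError so the failure can be reported to the application."""
    actual = codebook_ident(codebook)
    if actual != ident:
        raise ValueError("codebook checksum mismatch: expected %08x, got %08x"
                         % (ident, actual))
    return codebook
```

   A client would call verify_codebook with the Codebook Ident from the
   payload header once the whole codebook has been received.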

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |             xxxx              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             xxxxx                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
      |            contributing source (CSRC) identifiers             |
      |                              ...                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Codebook Ident                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |0|1| 2 |      1|           Codebook Length                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    length     |           Codebook                           ..
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ..                          Codebook                            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   A payload type field set to 3 indicates that the packet contains the
   comment metadata, such as artist name, track title and so on.  These
   metadata messages are not intended to be fully descriptive but to
   offer basic track/song information.  This message MUST be sent at the
   start of the stream, together with the setup and codebook headers,
   even if it contains no information.  During a session the metadata
   associated with the stream may change from that specified at the
   start, e.g. a live concert broadcast changing acts/scenes, so clients
   MUST have the ability to receive header blocks.  Details on the
   format of the comments can be found in the Vorbis documentation [10].

   The format for the data takes the form of a 32 bit length field for
   the codec vendor's name, followed by the name encoded in UTF-8.  The
   next 32 bit field denotes the number of user comments.  Each of the
   user comments is prefixed by a 32 bit length field followed by the
   comment text.
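
   The comment layout can be sketched as a pair of helper functions.
   The names are hypothetical.  The sketch assumes little-endian length
   fields, as used by the native Vorbis comment header, and UTF-8 for
   the comment text; this document does not state the byte order for
   the RTP form.

```python
import struct


def pack_comment_block(vendor, comments):
    """Serialise a vendor string and a list of user comments."""
    vendor_bytes = vendor.encode("utf-8")
    # Assumption: 32-bit lengths are little-endian, as in native Vorbis.
    out = bytearray(struct.pack("<I", len(vendor_bytes)) + vendor_bytes)
    out += struct.pack("<I", len(comments))  # number of user comments
    for comment in comments:
        text = comment.encode("utf-8")
        out += struct.pack("<I", len(text)) + text
    return bytes(out)


def unpack_comment_block(data):
    """Return (vendor, comments) parsed from a block built as above."""
    (vlen,) = struct.unpack_from("<I", data, 0)
    pos = 4
    vendor = data[pos:pos + vlen].decode("utf-8")
    pos += vlen
    (count,) = struct.unpack_from("<I", data, pos)
    pos += 4
    comments = []
    for _ in range(count):
        (clen,) = struct.unpack_from("<I", data, pos)
        pos += 4
        comments.append(data[pos:pos + clen].decode("utf-8"))
        pos += clen
    return vendor, comments
```

   An empty comment list is valid and corresponds to a metadata message
   that contains no information.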


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |             xxxx              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             xxxxx                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
      |            contributing source (CSRC) identifiers             |
      |                              ...                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Codebook Ident                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |0|1| 3 |      1|          Vendor string length                 |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    length     |          Vendor string                       ..
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                    User comments list length                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ..               User comment length / User comment             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.2  Session Description for Vorbis RTP Streams

   Session description information concerning the Vorbis stream SHOULD
   be provided if possible and MUST be in accordance with [5].

   If the stream comprises chained Vorbis files the configuration and
   codebook headers for each file SHOULD be packaged together and passed
   to the client using the header attribute.

   Below is an outline of the mandatory SDP attributes.

      c=IN IP4/6
      m=audio  RTP/AVP 98
      a=rtpmap:98 VORBIS/44100/2
      a=fmtp:98 header=<URI of configuration header>

   The Vorbis configuration specified in the header attribute MUST
   contain all of the configuration data and codebooks needed for the
   life of the session.

   The port value is specified by the server application bound to the
   address specified in the c attribute.  The sample rate and channel
   count specified in the rtpmap attribute MUST match those of the
   Vorbis stream.
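
   As an illustration only, a client might read the attributes outlined
   above along the following lines.  This is a minimal sketch, not a
   full SDP parser: the function names are hypothetical, and it assumes
   the rtpmap and fmtp lines appear exactly in the form shown in the
   outline.

```python
import re

def parse_vorbis_sdp(sdp_text):
    """Collect the Vorbis-related values from an SDP description.

    Hypothetical helper: only the rtpmap and fmtp forms shown in the
    outline above are recognised.
    """
    info = {}
    for raw in sdp_text.splitlines():
        line = raw.strip()
        m = re.match(r"a=rtpmap:(\d+) VORBIS/(\d+)(?:/(\d+))?$", line,
                     re.IGNORECASE)
        if m:
            info["payload_type"] = int(m.group(1))
            info["sample_rate"] = int(m.group(2))
            # A missing channel count is treated as mono.
            info["channels"] = int(m.group(3) or 1)
            continue
        m = re.match(r"a=fmtp:(\d+) header=(\S+)$", line)
        if m:
            info["fmtp_payload_type"] = int(m.group(1))
            info["header_uri"] = m.group(2)
    return info

def matches_stream(info, sample_rate, channels):
    """True if the SDP values agree with the Vorbis stream parameters."""
    return (info.get("payload_type") == info.get("fmtp_payload_type")
            and info.get("sample_rate") == sample_rate
            and info.get("channels") == channels)
```

   The check in matches_stream() mirrors the requirement that the
   values announced in the session description agree with the Vorbis
   stream itself.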


4.3  Codebook Caching

   Codebook caching allows clients that have previously connected to a
   stream to re-use the associated codebooks and configuration data.
   A client that receives a codebook may store it locally.  When it
   joins a new stream, it can compare the CRC32 key of the stored
   codebook with that of the new stream and, if they match, begin
   decoding before it has received any of the headers.
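
   A client-side cache of this kind could be sketched as below.  The
   class and method names are hypothetical, and the sketch assumes that
   the key is a plain CRC32 computed over the raw header bytes; the
   exact bytes covered by the key are not restated here.

```python
import zlib

class CodebookCache:
    """In-memory codebook store keyed by CRC32 (illustrative only)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key_for(header_bytes):
        # Assumption: the key is a CRC32 over the raw header bytes.
        return zlib.crc32(header_bytes) & 0xFFFFFFFF

    def put(self, header_bytes):
        """Store a received set of headers and return its key."""
        key = self.key_for(header_bytes)
        self._store[key] = bytes(header_bytes)
        return key

    def lookup(self, stream_key):
        """Return cached headers for the stream's key, or None."""
        return self._store.get(stream_key)
```

   On a cache hit the client can configure its decoder from the stored
   headers straight away; on a miss it waits for the headers as usual.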

5.  IANA Considerations

   MIME media type name: audio

   MIME subtype: vorbis

   Required Parameters:

   header:  indicates the URI of the decoding configuration headers.

   Optional Parameters:

   None.

   Encoding considerations:

   This type is only defined for transfer via RTP as specified in RFC
   XXXX.

   Security Considerations:

   See Section 6 of RFC 3047.

   Interoperability considerations: none

   Published specification:

   See the Vorbis documentation [9] for details.

   Applications which use this media type:

   Audio streaming and conferencing tools

   Additional information: none

   Person & email address to contact for further information:

   Phil Kerr: <phil@plus24.com>

   Intended usage: COMMON

   Author/Change controller:

   Author: Phil Kerr

   Change controller: IETF AVT Working Group

6.  Congestion Control

   Vorbis clients SHOULD send regular RTCP receiver reports detailing
   congestion.  A mechanism for dynamically downgrading the stream,
   known as bitrate peeling, will allow the stream bitrate to back off
   gracefully.  This feature is not available at present, so an
   alternative is to redirect the client to a lower bitrate stream
   if one is available.
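
   The redirect alternative might be sketched as follows.  The function
   name, the list of available bitrates and the 5% loss threshold are
   illustrative assumptions and not part of this specification.  Only
   the encoding of the fraction lost field, an 8-bit fixed-point value
   equal to the loss fraction times 256, comes from the RTP
   specification [3].

```python
def pick_stream_bitrate(available_bps, current_bps, fraction_lost_field,
                        loss_threshold=0.05):
    """Choose the bitrate to serve after a receiver report.

    fraction_lost_field is the raw 8-bit value from the report block.
    Returns current_bps when loss is acceptable or when no lower
    bitrate stream exists.
    """
    fraction_lost = fraction_lost_field / 256.0
    if fraction_lost <= loss_threshold:
        return current_bps
    lower = [b for b in available_bps if b < current_bps]
    # Step down to the nearest lower bitrate rather than the lowest.
    return max(lower) if lower else current_bps
```

   Stepping down one level at a time is a design choice made for this
   sketch; a server could equally jump directly to its lowest bitrate
   stream.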

7.  Security Considerations

   RTP packets using this payload format are subject to the security
   considerations discussed in the RTP specification [3].  This implies
   that the confidentiality of the media stream is achieved by using
   encryption.  Because the data compression used with this payload
   format is applied end-to-end, encryption may be performed on the
   compressed data.  Where a field sets the size of a data block, care
   MUST be taken to prevent buffer overflows in client applications.
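
   One way to handle a declared block size defensively is sketched
   below.  It assumes a 32-bit little-endian length prefix, as used in
   the Vorbis comment header [10]; the field widths of the RTP packets
   defined in this document may differ.  The function name, exception
   type and 4096-octet local limit are hypothetical.

```python
import struct

class TruncatedField(ValueError):
    """Raised when a declared length does not fit the received data."""

def read_length_prefixed(buf, offset, max_len=4096):
    """Read one length-prefixed block from buf starting at offset.

    Returns (data, next_offset).  The declared length is checked
    against both a local limit and the bytes actually received before
    any data is copied.
    """
    if offset < 0 or offset + 4 > len(buf):
        raise TruncatedField("length prefix runs past end of buffer")
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    if length > max_len:
        raise TruncatedField("declared length exceeds local limit")
    if start + length > len(buf):
        raise TruncatedField("declared length runs past end of buffer")
    return buf[start:start + length], start + length
```

   Rejecting the packet outright, rather than truncating the block, is
   the conservative choice made in this sketch.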

8.  Acknowledgments

   This document is a continuation of draft-moffitt-vorbis-rtp-00.txt.
   The MIME type section is a continuation of
   draft-short-avt-rtp-vorbis-mime-00.txt.

   Thanks to the AVT and Ogg Vorbis communities and to Xiph.org,
   including Steve Casner, Aaron Colwell, Ross Finlayson, Ramon Garcia,
   Pascal Hennequin, Ralph Giles, Tor-Einar Jarnbjo, Colin Law, John
   Lazzaro, Jack Moffitt, Christopher Montgomery, Colin Perkins, Barry
   Short, Mike Smith and Magnus Westerlund.

9.  References

9.1  Normative References

   [1]  Pfeiffer, S., "The Ogg Encapsulation Format Version 0", RFC
        3533.

   [2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", RFC 2119.

   [3]  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", RFC
        3550.

   [4]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
        Conferences with Minimal Control", RFC 3551.

   [5]  Handley, M. and V. Jacobson, "SDP: Session Description
        Protocol", RFC 2327.

   [6]  Mogul, J. and S. Deering, "Path MTU Discovery", RFC 1191.

   [7]  McCann, J., Deering, S. and J. Mogul, "Path MTU Discovery for
        IP version 6", RFC 1981.

9.2  Informative References

   [8]   "libvorbis", available from the Xiph website,
         http://www.xiph.org.

   [9]   "Ogg Vorbis I spec: Codec setup and packet decode",
         http://www.xiph.org/ogg/vorbis/doc/vorbis-spec-ref.html.

   [10]  "Ogg Vorbis I spec: Comment field and header specification",
         http://www.xiph.org/ogg/vorbis/doc/v-comment.html.

Author's Address

   Phil Kerr
   Xiph.Org

   EMail: phil@plus24.com
   URI:   http://www.xiph.org/

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2004).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.

