Network Working Group                                    Johan Sjoberg
INTERNET-DRAFT                                       Magnus Westerlund
Expires: March 2005                                           Ericsson
                                                         Ari Lakaniemi
                                                                 Nokia
                                                    September 30, 2004



    Real-Time Transport Protocol (RTP) Payload Format for Extended AMR
                      Wideband (AMR-WB+) Audio Codec
                   <draft-ietf-avt-rtp-amrwbplus-02.txt>



Status of this memo


   By submitting this Internet-Draft, I (we) certify that any
   applicable patent or other IPR claims of which I am (we are) aware
   have been disclosed, and any of which I (we) become aware will be
   disclosed, in accordance with RFC 3668 (BCP 79).


   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.


   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."


   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.txt


   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


   This document is a submission of the IETF AVT WG.  Comments should
   be directed to the AVT WG mailing list, avt@ietf.org.



Abstract


   This document specifies a real-time transport protocol (RTP) payload
   format to be used for Extended AMR Wideband (AMR-WB+) encoded audio
   signals. The AMR-WB+ codec is an audio extension of the AMR-WB codec
   providing additional frame types designed to give higher quality of
   music and speech than the original frame types.  A media type
   registration is included for AMR-WB+.





Sjoberg, et. al.                                              [Page 1]


INTERNET-DRAFT       RTP payload format for AMR-WB+ September 30, 2004




TABLE OF CONTENTS


1. Definitions
   1.1. Glossary
   1.2. Terminology
2. Introduction
3. Background on AMR-WB+ and Design Principles
   3.1. The AMR-WB+ Audio Codec
   3.2. Multi-rate Encoding and Rate Adaptation
   3.3. Voice Activity Detection and Discontinuous Transmission
   3.4. Support for Multi-Channel Session
   3.5. Unequal Bit-error Detection and Protection
   3.6. Robustness against Packet Loss
      3.6.1. Use of Forward Error Correction (FEC)
      3.6.2. Use of Frame Interleaving
   3.7. AMR-WB+ Audio over IP scenarios
4. RTP Payload Format for AMR-WB+
   4.1. RTP Header Usage
   4.2. Payload Structure
   4.3. Payload definitions
      4.3.1. The Payload Table of Contents
      4.3.2. Audio Data
      4.3.3. Methods for Forming the Payload
      4.3.4. Payload Examples
   4.4. Interleaving Considerations
   4.5. Implementation Considerations
      4.5.1. ISF recovery when frames are lost
5. Congestion Control
6. Security Considerations
   6.1. Confidentiality
   6.2. Authentication and Integrity
   6.3. Decoding Validation
7. Payload Format Parameters
   7.1. Media Type Registration
   7.2. Mapping Media Type Parameters into SDP
      7.2.1. Offer-Answer Model Considerations
      7.2.2. Examples
8. IANA Considerations
9. Contributors
10. Acknowledgements
11. References
   11.1. Normative references
   11.2. Informative References
12. Authors' Addresses
13. IPR Notice
14. Copyright Notice
15. Changes







Sjoberg, et. al.            Standards Track                  [Page 2]


INTERNET-DRAFT       RTP payload format for AMR-WB+ September 30, 2004



1. Definitions


1.1. Glossary


   3GPP    - the Third Generation Partnership Project
   AMR     - Adaptive Multi-Rate Codec
   AMR-WB  - Adaptive Multi-Rate Wideband Codec
   AMR-WB+ - Extended Adaptive Multi-Rate Wideband Codec
   CMR     - Codec Mode Request
   CN      - Comfort Noise
   DTX     - Discontinuous Transmission
   FEC     - Forward Error Correction
   FT      - Frame Type
   ISF     - Internal Sampling Frequency
   SCR     - Source Controlled Rate Operation
   SID     - Silence Indicator (the frames containing only CN
             parameters)
   TFI     - Transport Frame Index
   TS      - Timestamp
   UED     - Unequal Error Detection
   UEP     - Unequal Error Protection
   VAD     - Voice Activity Detection



1.2. Terminology


   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119 [2].


   n^r denotes exponentiation, i.e., the product of r factors of n; n
   and r are integers.  k%m denotes the modulo operation (k mod m),
   i.e., the remainder of the integer division k/m; k and m are
   integers.



2. Introduction


   This document specifies the payload format for packetization of
   Extended Adaptive Multi-Rate Wideband (AMR-WB+) [1] encoded audio
   signals into the Real-time Transport Protocol (RTP) [3].  The
   payload format supports transmission of mono or stereo audio,
   aggregating multiple frames per payload, and mechanisms enhancing
   robustness against packet loss.


   The AMR-WB+ codec is an extension of the Adaptive Multi-Rate
   Wideband (AMR-WB) codec and has a couple of features not available
   in AMR-WB.  From a transport point of view, the new features are
   native support for stereophonic audio and the possibility to use
   different internal sampling frequencies.  The primary usage scenario
   for AMR-WB+ is transport over IP; therefore, the interworking with
   other transport networks that AMR-WB requires is not needed.


   AMR-WB+ will mainly be used in streaming scenarios, where using an
   octet-aligned format to decrease server complexity is considered a
   substantial benefit.  Therefore, nothing similar to the
   bandwidth-efficient mode defined in [7] is specified for AMR-WB+.
   The bandwidth saved by such a mode would also be very small for all
   extension frame types, as they are octet aligned.


   The codec's built-in support for stereo encoding makes
   multi-channel support in the RTP payload format, as implemented for
   AMR and AMR-WB [7], both difficult and less needed.  Therefore, the
   multi-channel support specified in the AMR and AMR-WB payload format
   is not specified for AMR-WB+.  Due to all these changes and the
   different scope of the AMR-WB+ codec, this document defines a new
   RTP payload format that differs significantly from the ones for AMR
   and AMR-WB [7].


   No file format for AMR-WB+ is defined within this specification.
   Instead, the ISO-based 3GP file format [14] defined by 3GPP will
   support AMR-WB+ and provides all functionality needed from a file
   format.  The 3GP format also supports storage of AMR and AMR-WB, as
   well as other multimedia formats, allowing for synchronized
   playback.  As the 3GP format provides much greater capability than
   the previously defined file formats for AMR and AMR-WB, it is
   expected to be used and to be sufficient for all use cases.


   Background on AMR-WB+ and design principles can be found in Section
   3.  The payload format itself is specified in Section 4 and follows
   the principles used in [3], [9], and [7].  In Section 7, a media
   type registration is provided.



3. Background on AMR-WB+ and Design Principles


   The Extended Adaptive Multi-Rate Wideband (AMR-WB+) [1] audio codec
   is designed to compress speech and audio at low bit-rates with good
   quality.  The codec is specified by 3GPP, and the primary target
   applications within 3GPP are the packet-switched streaming service
   (PSS) [13] and the multimedia messaging service (MMS).  However, due
   to its flexibility and robustness, AMR-WB+ is also very well suited
   for streaming services in highly varying transport environments,
   e.g., the Internet.


   Because of the flexibility of this codec, the behavior in a
   particular application is controlled by several parameters that
   select options or specify the acceptable values for a variable.
   These options and variables are described in general terms at
   appropriate points in the text of this specification as parameters
   to be established through out-of-band means. In Section 7, all of
   the parameters are specified in the form of a media type registration
   for the AMR-WB+ encoding. The method used to signal these parameters
   at session setup or to arrange prior agreement of the participants
   is beyond the scope of this document; however, Section 7.2 provides
   a mapping of the parameters into the Session Description Protocol
   (SDP) [6] for those applications that use SDP.



3.1. The AMR-WB+ Audio Codec


   The AMR-WB+ audio codec was originally developed by 3GPP to be used
   for streaming and messaging services in GSM and 3G cellular systems.
   AMR-WB+ is designed as an audio extension to the AMR-WB speech
   codec. The new extension frame types add new functionality to the
   codec in order to provide high audio quality for a large range of
   signals including music. Stereophonic operation has also been added
   where a new high-efficiency hybrid stereo coding algorithm enables
   stereo operation at bit-rates as low as 6.2 kbit/s in total.


   The AMR-WB+ audio codec includes the nine frame types specified for
   AMR-WB, extended with additional new frame types with bit-rates
   ranging from 5.2 to 48 kbit/s.  Whereas the AMR-WB frame types employ
   a 16000 Hz sampling frequency and operate only on monophonic signals,
   the extension frame types can operate at a number of internal
   sampling frequencies (ISFs), both in mono and stereo; see Table 24 in
   [1].  However, the output sampling frequency of the decoder is
   limited to 8, 16, 24, 32, or 48 kHz.


   The audio processing is performed on equal-size superframes, each
   corresponding to 2048 samples per encoded channel.  For each
   superframe, the codec makes a number of encoding decisions, choosing
   between different encoding algorithms and block lengths so that the
   encoding adapts to the signal characteristics of the source and
   optimizes fidelity.  Each superframe is encoded into 4 equal-size
   transport frames, i.e., each corresponding to 512 samples per
   channel, and each transport frame is individually decodable.  For an
   individual transport frame to be decodable, its position within the
   superframe must be known.


   An AMR-WB+ frame type is constructed from two different parameters:
   the core bit-rate and the stereo bit-rate.  The core bit-rate denotes
   the bits available for the core codec, while the stereo bit-rate
   denotes the bit-rate added to the core bit-rate when stereo encoding
   is enabled.  To calculate the correct bit-rate, the ISF must also be
   taken into account.  The total bit-rate of the frame is the sum of
   the core bit-rate and the stereo bit-rate, multiplied by the ISF
   normalized so that 25600 Hz corresponds to 1, i.e., (core bit-rate +
   stereo bit-rate) * ISF / 25600 Hz.  The AMR-WB+ standard specifies
   eight core bit-rates, sixteen stereo bit-rates, and thirteen
   different ISF values.  These can be found in Tables 22, 23, and 24
   in [1].  In addition to the AMR-WB frame types 0-9, there are four
   pre-defined AMR-WB+ extension frame types, which have fixed core
   bit-rates, stereo bit-rates, and ISFs; see Table 21 in [1].


   These four pre-defined frame types also have a fixed input sampling
   frequency to the encoder, set to either 16 or 24 kHz.  Like the
   AMR-WB modes, each frame of these types always represents exactly
   20 ms of audio signal.


   Since there is a large number of possible parameter combinations, a
   limited normative set of combinations of core bit-rates and stereo
   bit-rates has been defined; see Table 25 in [1].  Note that the first
   16 entries in this table are the same as the entries in Table 21,
   which incorporates the original AMR-WB modes.  The total bit-rate
   specified by the frame type, in conjunction with the chosen ISF,
   defines the actual codec bit-rate.  A number of combinations produce
   the same codec bit-rate.  For example, one way of producing a
   32 kbit/s audio stream is to use frame type 41, i.e., 25.6 kbit/s,
   with an ISF of 32 kHz (5/4 * (19.2 + 6.4) = 32 kbit/s); another is
   to use frame type 47 with an ISF of 25.6 kHz (1 * (24 + 8) =
   32 kbit/s).
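   The bit-rate rule above can be checked numerically.  The following
   Python sketch is illustrative only: it assumes bit-rates are given
   in bit/s and takes the core and stereo bit-rates from the two
   32 kbit/s examples in the text rather than from Tables 22-25 of [1].
   The name total_bitrate is hypothetical.

```python
from fractions import Fraction

# ISF value that is normalized to 1 in the bit-rate formula.
REFERENCE_ISF_HZ = 25600


def total_bitrate(core_bps, stereo_bps, isf_hz):
    """Return (core + stereo) * ISF / 25600 as an exact value in bit/s.

    Fraction keeps the result exact even if an ISF is not a whole
    number of Hz.
    """
    return (core_bps + stereo_bps) * Fraction(isf_hz) / REFERENCE_ISF_HZ


# Frame type 41 (19.2 + 6.4 kbit/s) at an ISF of 32 kHz.
print(total_bitrate(19200, 6400, 32000))
# Frame type 47 (24 + 8 kbit/s) at an ISF of 25.6 kHz.
print(total_bitrate(24000, 8000, 25600))
```

   Both calls yield 32000 bit/s, matching the two combinations above.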


   The duration of one AMR-WB+ transport frame can vary and depends on
   the ISF.  Since a frame always corresponds to 512 samples at the ISF
   in use, its duration is limited to the range 13.33 to 40 ms.  With
   the RTP timestamp clock rate of 72000 Hz, this gives an AMR-WB+
   transport frame length of 960 to 2880 ticks.  If the ISF is set to
   25600 Hz, a transport frame is equal to 20 ms and the superframe is
   equal to 80 ms.
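   The relation between ISF, frame duration, and RTP timestamp ticks
   can be sketched as follows.  This Python sketch uses only the
   constants stated in this section (512 samples per transport frame,
   4 transport frames per superframe, 72000 Hz clock); the ISFs of
   12800, 25600, and 38400 Hz are the values implied by the 40 ms,
   20 ms, and 13.33 ms durations above, and the function names are
   hypothetical.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 72000              # RTP timestamp clock rate
SAMPLES_PER_TRANSPORT_FRAME = 512
TRANSPORT_FRAMES_PER_SUPERFRAME = 4


def transport_frame_ticks(isf_hz):
    """Duration of one transport frame in RTP timestamp ticks."""
    return Fraction(SAMPLES_PER_TRANSPORT_FRAME * RTP_CLOCK_HZ) / Fraction(isf_hz)


def transport_frame_ms(isf_hz):
    """Duration of one transport frame in milliseconds."""
    return Fraction(SAMPLES_PER_TRANSPORT_FRAME * 1000) / Fraction(isf_hz)


for isf in (12800, 25600, 38400):
    ticks = transport_frame_ticks(isf)
    superframe_ticks = ticks * TRANSPORT_FRAMES_PER_SUPERFRAME
    print(isf, ticks, float(transport_frame_ms(isf)), superframe_ticks)
```

   Exact fractions are used so that a non-integer tick count would be
   visible rather than silently rounded.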


   The encoder is able to change the ISF and the encoding frame type
   (both mono and stereo) during an encoding session.  For the extension
   frame types with index 16-47, ISF changes are constrained to occur at
   superframe boundaries, i.e., the ISF is constant within a superframe.
   No such limitation applies to frame types with index 0-9, i.e., the
   original AMR-WB frame types.


3.2. Multi-rate Encoding and Rate Adaptation


   The multi-rate encoding capability of AMR-WB+ is designed for
   preserving high audio quality under a wide range of bandwidth
   requirements and transmission conditions.


   AMR-WB+ enables seamless switching between frame types using the
   same number of audio channels and the same internal sampling
   frequency.  Every AMR-WB+ codec implementation is required to support
   all the audio coding frame types defined by the codec and must be
   able to handle switching between any two frame types.  Switching
   between frame types employing a different number of audio channels
   or a different internal sampling frequency is possible, but may not
   be seamless.  Therefore, it is recommended to perform such switches
   infrequently and, if possible, during periods where the input is
   silent.



3.3. Voice Activity Detection and Discontinuous Transmission


   AMR-WB+ supports the same algorithms for voice activity detection
   (VAD) and generation of comfort noise (CN) parameters during silence
   periods as the AMR-WB codec.  However, this functionality can only
   be used together with the AMR-WB frame types (FT=0-8).  As with the
   AMR-WB codec, this option allows the number of transmitted bits and
   packets during silence periods to be reduced to a minimum.  The
   operation of sending CN parameters at regular intervals during
   silence periods is usually called discontinuous transmission (DTX)
   or source controlled rate (SCR) operation.  The AMR-WB+ frames
   containing CN parameters are called Silence Indicator (SID) frames.
   See more details about VAD and DTX functionality in [4] and [5].



3.4. Support for Multi-Channel Session


   Some of the AMR-WB+ frame types support encoding of stereophonic
   audio.  Because of this native support for two-channel stereophonic
   signals, it does not seem necessary to support multi-channel
   transport with separate codecs, as done in the AMR-WB RTP payload
   format [7].  The codec is capable of stereo-to-mono downmixing.
   Thus, a receiver capable only of mono playout can still decode and
   play signals originally encoded in stereo.  However, to avoid
   spending bit-rate on stereo encoding that will not be utilized, a
   mechanism for signalling mono-only support is defined.



3.5. Unequal Bit-error Detection and Protection


   The audio bits encoded in each AMR-WB frame are sorted according to
   their different perceptual sensitivity to bit errors. This property
   can be exploited e.g. in cellular systems to achieve better voice
   quality by using unequal error protection and detection (UEP and
   UED) mechanisms. However, the bits of the extension frame types of
   the AMR-WB+ codec do not have a consistent sensitivity property and
   are not sorted in sensitivity order.  Thus, UEP or UED cannot be
   utilized with the extension frame types.  If UEP or UED is needed,
   the AMR-WB frame types should be used with the RTP payload format
   defined in RFC 3267 [7], which supports these mechanisms.



3.6. Robustness against Packet Loss


   The payload format supports several means, including forward error
   correction (FEC) and frame interleaving, to increase robustness
   against packet loss.



3.6.1. Use of Forward Error Correction (FEC)


   The simple scheme of repeating previously sent data is one way of
   achieving FEC.  Another possible scheme, which can be more bandwidth
   efficient, is to use payload-external FEC, e.g., RFC 2733 [11], which
   generates extra packets containing repair data.  For the AMR-WB+
   extension frame types, redundant copies can only be sent using the
   same frame type and internal sampling frequency as the original
   frame.  Such a scheme is described next.


   The scheme consists of retransmitting previously transmitted frames
   together with the current frame(s), using a sliding window to group
   the audio frames to be sent in each payload.  Figure 1 shows an
   example.


   --+--------+--------+--------+--------+--------+--------+--------+--
     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
   --+--------+--------+--------+--------+--------+--------+--------+--


     <---- p(n-1) ---->
              <----- p(n) ----->
                       <---- p(n+1) ---->
                                <---- p(n+2) ---->
                                         <---- p(n+3) ---->
                                                  <---- p(n+4) ---->


   Figure 1: An example of redundant transmission.


   In this example each frame is retransmitted once in the following
   RTP payload packet. Here, f(n-2)..f(n+4) denotes a sequence of audio
   frames and p(n-1)..p(n+4) a sequence of payload packets.
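   A minimal Python sketch of the sender side of this scheme follows.
   It treats frames as opaque placeholders and omits ToC construction;
   the function redundant_packets and its copies parameter are
   hypothetical.  As stated earlier in this section, every copy of a
   frame would in practice keep the same frame type and internal
   sampling frequency.

```python
def redundant_packets(frames, copies=1):
    """Group frames into payloads using a sliding window.

    Each payload carries the current frame plus up to `copies` frames
    that were already sent in earlier payloads.  With copies=1 this
    matches Figure 1: p(k) carries f(k-1) and f(k).
    """
    packets = []
    for k in range(len(frames)):
        start = max(0, k - copies)
        packets.append(frames[start:k + 1])
    return packets


for packet in redundant_packets(["f0", "f1", "f2", "f3"]):
    print(packet)
```

   Raising copies increases robustness at the cost of bandwidth, which
   is why the amount of redundancy is left to the sender.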


   The use of this approach does not require signaling at the session
   setup. In other words, the audio sender can choose to use this
   scheme without consulting the receiver. This is because a packet
   containing redundant frames will not look different from a packet
   with only new frames.  For a given timestamp, the receiver may
   receive multiple copies of a frame containing encoded audio data or
   frames indicated as NO_DATA.


   This redundancy scheme provides the same functionality as the one
   described in RFC 2198 "RTP Payload for Redundant Audio Data" [12].
   In most cases, the mechanism in this payload format is more
   efficient and simpler than requiring both endpoints to support
   RFC 2198 in addition.  There is one situation in which the use of
   RFC 2198 is indicated: if a codec other than AMR-WB+ is desired for
   the redundant encoding, since the AMR-WB+ payload format cannot
   carry it.


   The sender is responsible for selecting an appropriate amount of
   redundancy based on feedback about the channel, e.g., in RTCP
   receiver reports. The sender is also responsible for avoiding
   congestion, which may be exacerbated by redundancy (see Section 5
   for more details).



3.6.2. Use of Frame Interleaving


   To decrease protocol overhead, the payload design allows several
   audio frames to be encapsulated into a single RTP packet.  One
   drawback of this approach is that a single packet loss results in
   the loss of several consecutive audio frames, which usually causes
   clearly audible distortion in the reconstructed audio.  Interleaving
   of frames can improve the audio quality in such cases by
   distributing the consecutive losses into a series of single-frame
   losses.  However, interleaving and bundling several frames per
   payload also increases end-to-end delay and imposes higher buffering
   requirements, and it is therefore not appropriate for all usage
   scenarios.  Nevertheless, streaming applications will most likely be
   able to exploit interleaving to improve audio quality in lossy
   transmission conditions.


   This payload design supports the use of frame interleaving as an
   option.  Its use needs to be negotiated, or at least signalled.


   The interleaving supported by this format is rather flexible.  For
   example, a continuous pattern can be defined, as shown in Figure 2.


   --+--------+--------+--------+--------+--------+--------+--------+--
     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
   --+--------+--------+--------+--------+--------+--------+--------+--


              [ P(n)   ]
     [ P(n+1) ]                 [ P(n+1) ]
                       [ P(n+2) ]                 [ P(n+2) ]
                                         [ P(n+3) ]                 [P(
                                                           [ P(n+4) ]


   Figure 2: Example of interleaving pattern that has constant delay.


   In Figure 2, the consecutive frames, denoted f(n-2) to f(n+4), are
   aggregated two per packet with interleaving.  The packets, P(n) to
   P(n+4), follow a pattern that allows for constant delay in both the
   interleaving and the deinterleaving process.  In this example, the
   deinterleaving buffer needs room for at least 3 frames, including
   the one that is ready to be consumed.  This is needed, for example,
   when f(n) is the next frame to be played: the receiver has then
   consumed all previous frames and needs f(n), f(n+1), and f(n+3) in
   the buffer.  When it is time to consume f(n+1), no further RTP packet
   is needed.  When f(n+2) is to be consumed, P(n+3) is needed, and the
   deinterleaving buffer will then contain f(n+2), f(n+3), and f(n+5).
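   The buffer requirement of Figure 2 can be checked with a small
   simulation.  In this Python sketch, frame index 0 stands for f(n)
   and packet q carries frames 2q and 2q+3, so that q = 0 corresponds
   to P(n+2); the receiver is assumed to fetch packets just in time, in
   transmission order.  The indexing and the function names are
   illustrative assumptions, not part of the payload format.

```python
def figure2_packets(num_frames):
    """Yield the frames carried by each packet of the Figure 2 pattern.

    Frame index 0 stands for f(n).  Packet q carries frames 2q and 2q+3,
    so q = 0 is P(n+2) = {f(n), f(n+3)} and q = -1 is P(n+1).  Frames
    outside 0..num_frames-1 are dropped from the listing.
    """
    q = -2
    while 2 * q < num_frames:
        yield [f for f in (2 * q, 2 * q + 3) if 0 <= f < num_frames]
        q += 1


def max_deinterleave_occupancy(num_frames):
    """Play frames in order, fetching packets only when needed.

    Returns the largest number of frames held in the deinterleaving
    buffer, counting the frame that is about to be played.
    """
    packets = figure2_packets(num_frames)
    buffer = set()
    peak = 0
    for frame in range(num_frames):
        while frame not in buffer:
            buffer.update(next(packets))
        peak = max(peak, len(buffer))
        buffer.remove(frame)
    return peak


print(max_deinterleave_occupancy(20))  # prints 3
```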



3.7. AMR-WB+ Audio over IP scenarios


   Since the primary target application for the AMR-WB+ codec is
   packet-switched streaming, the most relevant usage scenario for this
   payload format is IP end-to-end between a server and a terminal, as
   shown in Figure 3.


             +----------+                          +----------+
             |          |    IP/UDP/RTP/AMR-WB+    |          |
             |  SERVER  |<------------------------>| TERMINAL |
             |          |                          |          |
             +----------+                          +----------+


              Figure 3: Server to terminal IP scenario



4. RTP Payload Format for AMR-WB+


   The AMR-WB+ payload format is different from the AMR and AMR-WB
   payload formats [7].  The structure is simpler and consists only of
   a table of contents and the audio data.  The payload format has two
   modes: the basic mode and the interleaved mode.  The main structural
   difference between the two modes is the extension of the table of
   contents with a timestamp offset field in the interleaved mode.


   The basic mode supports aggregation of multiple consecutive frames
   in a payload. The interleaved mode supports aggregation of multiple
   frames that are non-consecutive in time.  It is possible to have
   frames of different internal sampling frequency in the same payload.
   However, frequent switching of the internal sampling frequency is not
   expected.  For the extension frame types, the codec restricts ISF
   switching to superframe boundaries.  Nevertheless, to avoid any
   limitation on how many frames can be present in a payload, the
   payload format allows switching at any frame in the payload.


   The payload format is designed around the property that the AMR-WB+
   frames can be sorted and identified based on the RTP timestamp of
   each audio frame. For example, the timestamp of the audio frames is
   used to identify duplicates. The timestamp is also used in the
   deinterleaving buffer to regenerate the correct order of the frames
   before decoding.


   The interleaving scheme of this payload format is significantly more
   flexible than the one in RFC 3267 [7].  The AMR and AMR-WB payload
   format is only capable of using periodic patterns, with frames taken
   from an interleaving group at fixed intervals.  This interleaving
   scheme allows any pattern, as long as the time difference between
   any two adjacent frames in the payload is not more than 0.91
   seconds, i.e., the maximum Timestamp offset field value divided by
   the RTP timestamp rate (65535/72000).  Even this limit can be
   extended by using extra NO_DATA frames.
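   For illustration, a sender could compute the Timestamp offset values
   for an interleaved payload and enforce the 0.91-second limit as
   sketched below in Python.  The function name and error handling are
   hypothetical, and the NO_DATA extension mentioned above is not
   shown.  The usage example assumes an ISF of 25600 Hz (1440 ticks per
   frame), so that f(n) and f(n+3), as carried in P(n+2) of Figure 2,
   are 4320 ticks apart.

```python
MAX_OFFSET_TICKS = 0xFFFF   # largest value of the 16-bit Timestamp offset field
RTP_CLOCK_HZ = 72000


def timestamp_offsets(frame_timestamps):
    """Compute Timestamp offset values for one interleaved-mode payload.

    frame_timestamps lists the RTP timestamps of the frames in ToC order.
    The first offset is always 0; each later offset is the distance to
    the previous frame, computed modulo 2^32 to follow RTP wraparound.
    """
    offsets = [0] if frame_timestamps else []
    for prev, cur in zip(frame_timestamps, frame_timestamps[1:]):
        delta = (cur - prev) % 2**32
        if delta > MAX_OFFSET_TICKS:
            raise ValueError(
                f"gap of {delta} ticks exceeds {MAX_OFFSET_TICKS} "
                f"({MAX_OFFSET_TICKS / RTP_CLOCK_HZ:.2f} s)")
        offsets.append(delta)
    return offsets


print(timestamp_offsets([0, 4320]))
```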


   To allow for error resiliency through redundant transmission, the
   periods covered by multiple packets MAY overlap in time.  A receiver
   MUST be prepared to receive any audio frame multiple times.  All
   multiply sent frames MUST use the same frame type (or NO_DATA) and
   internal sampling frequency, and have the same RTP timestamp.
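   A receiver-side Python sketch of these rules follows.  It keeps a
   single copy of each frame, keyed by RTP timestamp, prefers audio
   data over a NO_DATA indication, and returns frames in timestamp
   order, as a deinterleaving buffer would.  The tuple layout, the use
   of None to mark NO_DATA, and the choice to raise an error on
   conflicting copies are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (rtp_timestamp, frame_type, isf_index, data); data is None for NO_DATA.
Frame = Tuple[int, int, int, Optional[bytes]]


def dedupe_and_order(received: Iterable[Frame]) -> List[Frame]:
    """Keep one copy per RTP timestamp and return frames in timestamp order."""
    kept: Dict[int, Frame] = {}
    for frame in received:
        ts, frame_type, isf, data = frame
        previous = kept.get(ts)
        if previous is None:
            kept[ts] = frame
        elif data is not None:
            if previous[3] is None:
                kept[ts] = frame  # real audio replaces a NO_DATA copy
            elif (previous[1], previous[2]) != (frame_type, isf):
                # Copies MUST share frame type and ISF; treat as an error.
                raise ValueError(f"conflicting copies at timestamp {ts}")
            # Otherwise this is an identical duplicate and is dropped.
    # Sorting assumes no RTP timestamp wraparound within the buffered set.
    return [kept[ts] for ts in sorted(kept)]
```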


   The payload is always made an integral number of octets long by
   padding with zero bits if necessary.  If additional padding is
   required to bring the payload length to a larger multiple of octets
   or for some other purpose, then the P bit in the RTP header MAY be
   set and padding appended as specified in [3].



4.1. RTP Header Usage


   The format of the RTP header is specified in [3].  This payload
   format uses the fields of the header in a manner consistent with
   that specification.


   The RTP timestamp corresponds to the sampling instant of the first
   sample encoded for the first frame in the packet.  The timestamp
   clock frequency SHALL be 72000 Hz.  This frequency allows the frame
   duration to be an integer number of RTP timestamp ticks for the
   internal sampling frequencies used, and also gives reasonable
   conversion factors to the audio sampling frequencies in use.  See
   Section 4.3.1 for how to derive the RTP timestamp for any audio
   frame beyond the first one.


   The RTP header marker bit (M) SHALL be set to 1 if the first frame
   carried in the packet is the first audio frame in a talkspurt.  For
   all other packets, the marker bit SHALL be set to zero (M=0).


   The assignment of an RTP payload type for this new packet format is
   outside the scope of this document, and will not be specified here.
   It is expected that the RTP profile under which this payload format
   is being used will assign a payload type for this encoding or
   specify that the payload type is to be bound dynamically.


   The media type parameter "channels" is used to indicate the maximum
   number of channels allowed to be used for a given payload type.  A
   payload type for which channels=1 (mono) has been declared SHALL only
   carry mono content, while a payload type for which channels=2 has
   been declared MAY carry both mono and stereo content.



4.2. Payload Structure


   The complete payload consists of a payload table of contents, and
   audio data representing one or more audio frames.  The following
   diagram shows the general payload format layout:


   +-------------------+----------------
   | table of contents | audio data ...
   +-------------------+----------------


   Payloads containing more than one audio frame are called compound
   payloads.


   The following sections describe the variations taken by the payload
   format depending on whether the AMR-WB+ session is set up to use the
   basic mode or interleaved mode.



4.3. Payload definitions


4.3.1. The Payload Table of Contents


   The table of contents (ToC) consists of a list of ToC entries where
   each entry corresponds to an audio frame carried in the payload,
   i.e.,


   +----------------+----------------+- ... -+----------------+
   |  ToC entry #1  |  ToC entry #2  |          ToC entry #N  |
   +----------------+----------------+- ... -+----------------+


   When multiple frames are present in a packet, the ToC entries SHALL
   be placed in the packet in order of their creation time.


   All fields in the RTP payload are in network byte order, i.e., with
   the leftmost bit being the most significant.


   A ToC entry takes the following format:


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F| Frame Type  |TFI|R|  ISF    | Timestamp offset (optional)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   F (1 bit): If set to 1, indicates that this frame is followed by
      another audio frame in this payload; if set to 0, indicates that
      this frame is the last frame in this payload.


   Frame Type (FT) (7 bits): Indicates the audio codec frame type used
      for the corresponding frame, i.e., the combination of AMR-WB+
      core and stereo rate, a special AMR-WB+ frame type, an AMR-WB
      rate, or comfort noise, as specified by Table 25 in [1].


   Transport Frame Index (TFI) (2 bits): An index from 0 (first) to 3
      (last) indicating this transport frame's position in the
      superframe. This field SHALL be set to 0 for Frame Type values
      0-9.


   ISF (5 bits): Indicates the internal sampling frequency employed for
      the corresponding frame. The index values correspond to internal
      sampling frequency as specified in Table 24 in [1]. This field
      SHALL be set to 0 for Frame Type values 0-13.


   Timestamp offset (16 bits): This field SHALL be present when the
      interleaved mode is used, and SHALL NOT be present otherwise. It
      indicates the number of RTP timestamp ticks by which this frame
      is offset relative to the RTP timestamp of the previous frame in
      the payload. The timestamp offset for the first audio frame in
      the payload SHALL be 0. The field is a 16-bit unsigned integer in
      network byte order.


   R: Reserved bit, SHALL be set to 0 and SHALL be ignored by
      receivers.
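   The following Python sketch illustrates how a receiver might walk
   the ToC using the F bit and the field layout above. It is a
   minimal, non-normative example: the names TocEntry and parse_toc are
   hypothetical, and the sketch assumes the receiver already knows from
   out-of-band signaling whether interleaved mode is in use, since that
   determines whether each entry is two or four octets long.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TocEntry:
    follow: bool               # F bit: another frame follows in this payload
    frame_type: int            # FT (7 bits), see Table 25 in [1]
    tfi: int                   # Transport Frame Index (2 bits)
    isf: int                   # ISF index (5 bits), see Table 24 in [1]
    ts_offset: Optional[int]   # 16-bit timestamp offset, interleaved mode only


def parse_toc(payload: bytes, interleaved: bool) -> Tuple[List[TocEntry], int]:
    """Parse the ToC at the start of an AMR-WB+ payload.

    Returns the ToC entries and the number of octets the ToC occupies.
    The reserved R bit is not returned, since receivers ignore it.
    """
    entry_len = 4 if interleaved else 2
    entries: List[TocEntry] = []
    pos = 0
    while True:
        if pos + entry_len > len(payload):
            raise ValueError("ToC runs past the end of the payload")
        b0, b1 = payload[pos], payload[pos + 1]
        ts_offset = None
        if interleaved:
            # 16-bit unsigned integer in network byte order
            ts_offset = int.from_bytes(payload[pos + 2:pos + 4], "big")
        entry = TocEntry(
            follow=bool(b0 & 0x80),    # leftmost bit of the first octet
            frame_type=b0 & 0x7F,      # remaining 7 bits of the first octet
            tfi=(b1 >> 6) & 0x03,      # top 2 bits of the second octet
            isf=b1 & 0x1F,             # low 5 bits (bit 5 is R, ignored)
            ts_offset=ts_offset,
        )
        entries.append(entry)
        pos += entry_len
        if not entry.follow:           # F=0 marks the last entry
            return entries, pos
```

   A receiver would still need to check each frame type against Table
   25 of [1] and discard packets with undefined values, as described
   below.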


   The RTP timestamp value for a frame is the timestamp value of the
   first sample encoded in the frame. The timestamp value for a frame
   is derived differently depending on whether basic or interleaved
   mode is used. In both cases, the first frame in a compound packet
   has an RTP timestamp equal to the one given in the RTP header. In
   basic mode, the RTP timestamp of any subsequent frame is derived by
   adding the durations of all preceding frames in the payload to the
   RTP header timestamp value. For example, if the RTP header timestamp
   value is 12345 and the frame duration is 16 ms (internal sampling
   frequency = 32 kHz, i.e., 1152 ticks of the 72000 Hz RTP clock),
   then the RTP timestamp of the fourth frame in the payload will be
   12345 + 3 * 1152 = 15801.
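   A minimal sketch of the basic-mode rule, assuming the caller already
   knows each frame's duration in RTP timestamp ticks (for example 1152
   ticks for a 16 ms frame at a 32 kHz internal sampling frequency).
   The function name is hypothetical; results are reduced modulo 2^32
   because RTP timestamps are 32-bit values.

```python
from itertools import accumulate
from typing import List


def basic_mode_timestamps(header_ts: int, durations: List[int]) -> List[int]:
    """RTP timestamps of the frames in a basic-mode payload.

    header_ts:  timestamp from the RTP header (timestamp of the first frame)
    durations:  duration of each frame, in 72000 Hz RTP ticks, in ToC order
    """
    if not durations:
        return []
    # Frame k starts once the durations of frames 0..k-1 have elapsed.
    elapsed = [0] + list(accumulate(durations[:-1]))
    # RTP timestamps are 32-bit values, so wrap around modulo 2^32.
    return [(header_ts + e) % 2**32 for e in elapsed]
```

   For the example above, basic_mode_timestamps(12345, [1152] * 4)
   returns [12345, 13497, 14649, 15801].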


   In interleaved mode, the RTP timestamp of a frame is derived from
   the RTP header timestamp field and the sum of the timestamp offset
   fields in the ToC entries up to and including the entry of that
   frame, using modulo 2^32 arithmetic. The following example derives
   the RTP timestamp of the third frame in a compound packet with the
   following header and ToC information:


   RTP header TS: 12345
   Frame 1 offset field: 0
   Frame 2 offset field: 13824
   Frame 3 offset field: 18432


   In this case, the offset values up to and including the current
   frame are simply added to the RTP header timestamp. For example,
   Frame 3's timestamp is (12345 + 0 + 13824 + 18432) % 2^32 = 44601,
   where % stands for the modulo operation.
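   The interleaved-mode rule can be sketched the same way. This
   hypothetical helper takes the timestamp offset fields in ToC order
   and returns each frame's RTP timestamp; it also rejects a payload
   whose first offset is not 0.

```python
from itertools import accumulate
from typing import List


def interleaved_timestamps(header_ts: int, offsets: List[int]) -> List[int]:
    """RTP timestamps of the frames in an interleaved-mode payload.

    offsets: the 16-bit timestamp offset field of each ToC entry, in ToC
    order. Frame k's timestamp is the header timestamp plus the sum of
    offsets[0..k], modulo 2^32.
    """
    if offsets and offsets[0] != 0:
        raise ValueError("the first frame's timestamp offset must be 0")
    return [(header_ts + total) % 2**32 for total in accumulate(offsets)]
```

   With the values from the example, interleaved_timestamps(12345,
   [0, 13824, 18432]) returns [12345, 26169, 44601].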


   The value of Frame Type is defined in Table 25 in [1]. FT=14
   (AUDIO_LOST) is used to indicate frames that are lost. A NO_DATA
   (FT=15) frame could mean either that there is no data produced by
   the audio encoder for that frame or that no data for that frame is
   transmitted in the current payload (i.e., valid data for that frame
   could be sent in either an earlier or a later packet). The duration
   of these non-included frames depends on the internal sampling
   frequency indicated by the ISF field.


   For frame types with index 0-13, the ISF field SHALL be set to 0 and
   has no meaning. The frame length of these frame types is fixed to
   20 ms, i.e., an RTP timestamp duration of 1440 ticks. For frame
   types with index 0-9, the TFI field SHALL be set to 0 and has no
   meaning.
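   The frame duration in RTP timestamp ticks follows from the
   512-sample transport frame and the 72000 Hz RTP clock: 512 * 72000 /
   f_isf, which gives 1152 ticks at 32 kHz and 1440 ticks at 25.6 kHz.
   The sketch below assumes the caller has already mapped the ISF index
   to a frequency in Hz using Table 24 of [1] (not reproduced here).
   Fraction is used so that a frequency that is not a whole number of
   Hz could be passed exactly, should one be needed. All names are
   hypothetical.

```python
from fractions import Fraction
from typing import Optional, Union

RTP_CLOCK_HZ = 72000               # AMR-WB+ RTP clock rate
SAMPLES_PER_TRANSPORT_FRAME = 512  # samples per transport frame at the ISF


def frame_duration_ticks(frame_type: int,
                         isf_hz: Optional[Union[int, Fraction]]) -> int:
    """Duration of one frame in RTP timestamp ticks.

    isf_hz is the internal sampling frequency in Hz (assumed to be mapped
    from the ISF index by the caller); it is ignored for FT 0-13.
    """
    if 0 <= frame_type <= 13:
        return 1440                 # fixed 20 ms = 20 * 72 ticks
    if isf_hz is None:
        raise ValueError("FT >= 14 needs the internal sampling frequency")
    ticks = (Fraction(SAMPLES_PER_TRANSPORT_FRAME * RTP_CLOCK_HZ)
             / Fraction(isf_hz))
    if ticks.denominator != 1:
        raise ValueError("duration is not a whole number of RTP ticks")
    return int(ticks)
```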


   If a ToC entry with an undefined FT value is received, the whole
   packet SHOULD be discarded.  This is to avoid the loss of data
   synchronization in the depacketization process, which can result in
   a severe degradation in audio quality.


   Note that packets containing only NO_DATA frames SHOULD NOT be
   transmitted.  Also, NO_DATA frames at the end of a frame sequence to
   be carried in a payload SHOULD NOT be included in the transmitted
   packet.  The AMR-WB+ SCR/DTX is identical to the AMR-WB SCR/DTX
   described in [5] and SHALL only be used in combination with the
   AMR-WB frame types (0-8).
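   A sender-side sketch of the two NO_DATA rules above, operating on a
   list of frame type values in ToC order; a real packetizer would
   carry the full ToC entries and frame data along. The helper name is
   hypothetical.

```python
from typing import List, Optional, Sequence

NO_DATA = 15  # FT value for NO_DATA


def trim_trailing_no_data(frame_types: Sequence[int]) -> Optional[List[int]]:
    """Drop NO_DATA entries at the end of a frame sequence.

    Returns None when nothing but NO_DATA frames would remain, i.e. when
    the packet should not be sent at all.
    """
    end = len(frame_types)
    while end > 0 and frame_types[end - 1] == NO_DATA:
        end -= 1
    if end == 0:
        return None
    return list(frame_types[:end])
```

   NO_DATA entries in the middle of the sequence are kept, since they
   still occupy a position in the frame sequence.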


   When multiple frames are present, their ToC entries will be placed
   in the ToC in order of their creation time, independent of the
   payload mode. In basic mode the frames will be consecutive in time,
   while in interleaved mode the frames may not only be non-consecutive
   in time but may even have varying inter-frame distances.


   The following figure shows an example of a ToC of three entries in
   basic mode.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1| Frame Type1 | 0 |0| ISF 1   |1| Frame Type2 | 1 |0| ISF 2   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| Frame Type3 | 2 |0| ISF 3   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   The following figure shows an example of a ToC of three entries in
   interleaved mode.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1| Frame Type1 | 2 |0| ISF 1   | Timestamp offset 1            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1| Frame Type2 | 0 |0| ISF 2   | Timestamp offset 2            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| Frame Type3 | 3 |0| ISF 3   | Timestamp offset 3            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



4.3.2. Audio Data


   Audio data of a payload contains one or more audio frames or comfort
   noise frames, as described in the ToC of the payload.


      Note, for ToC entries with FT=14 or 15, there will be no
      corresponding audio frame present in the audio data.


   Each audio frame of an extension frame type represents an AMR-WB+
   transport frame corresponding to the encoding of 512 samples of
   audio sampled with the internal sampling frequency specified by the
   ISF indicator. Frame types with index 10-13 are the exception: they
   can only use a single internal sampling frequency (25600 Hz). The
   encoding rates (core and stereo) are indicated in the frame type
   field of the corresponding ToC entry. The octet length of the audio
   frame is implicitly defined by the frame type field and is given in
   Tables 21 and 25 of [1]. The order and numbering notation of the
   bits are as specified in [1]. As specified there, the bits of the
   AMR-WB audio frames (frame type values in range 0...8) have been
   rearranged in order of decreasing sensitivity, while the bits of the
   AMR-WB+ extension frame types and comfort noise frames are in the
   order produced by the encoder. The last octet of each audio frame
   MUST be padded with zeroes at the end if not all bits in the octet
   are used. In other words, each audio frame MUST be octet-aligned.
   Note, however, that all frame types specified in [1] already lead to
   octet-aligned frames.



4.3.3. Methods for Forming the Payload


   The payload begins with the table of contents, consisting of a list
   of ToC entries of two octets each in basic mode, or four octets each
   in interleaved mode.


   The audio data follows the table of contents. All of the octets
   comprising an audio frame are appended to the payload as a unit. The
   audio frames are packed in the same order as their corresponding ToC
   entries are arranged in the ToC list, except that no data octets are
   present for a frame whose ToC entry has FT=14 or 15.
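   The packing rules above can be sketched as follows. Frame and
   build_payload are hypothetical names; the sketch assumes the frame
   data has already been produced by the encoder in the bit order
   required by [1] and padded to whole octets, and it does not check
   frame lengths against Tables 21 and 25 of [1].

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    frame_type: int                  # FT
    tfi: int                         # Transport Frame Index
    isf: int                         # ISF index
    data: bytes                      # octet-aligned frame; ignored for FT 14/15
    ts_offset: Optional[int] = None  # required in interleaved mode


def build_payload(frames: List[Frame], interleaved: bool) -> bytes:
    """Build an AMR-WB+ payload: the ToC followed by the audio data."""
    if not frames:
        raise ValueError("a payload must carry at least one frame")
    toc = bytearray()
    audio = bytearray()
    for i, f in enumerate(frames):
        if not (0 <= f.frame_type <= 0x7F and 0 <= f.tfi <= 3
                and 0 <= f.isf <= 0x1F):
            raise ValueError("ToC field out of range")
        follow = 1 if i < len(frames) - 1 else 0   # F=0 only on the last entry
        toc.append((follow << 7) | f.frame_type)   # F | FT
        toc.append((f.tfi << 6) | f.isf)           # TFI | R=0 | ISF
        if interleaved:
            if f.ts_offset is None or not 0 <= f.ts_offset <= 0xFFFF:
                raise ValueError("interleaved mode needs a 16-bit offset")
            toc += f.ts_offset.to_bytes(2, "big")  # network byte order
        if f.frame_type not in (14, 15):           # AUDIO_LOST/NO_DATA: no data
            audio += f.data                        # whole frame as a unit
    return bytes(toc + audio)
```

   The R bit is never set by this sketch, so it is always transmitted
   as 0.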



4.3.4. Payload Examples


4.3.4.1. Example 1, Basic Payload Carrying Multiple Frames


   The following diagram shows a payload from a session that carries
   three AMR-WB+ frames of the 14 kbps coding frame type (FT=26), each
   with a frame length of 280 bits. The internal sampling frequency in
   this example is 25.6 kHz (ISF = 8). The TFI of the first frame is 2,
   indicating that the first transport frame in this payload is the
   third in its superframe. The following two frames are consecutive to
   it, i.e., they are the fourth transport frame of that superframe and
   the first transport frame of the next one.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1| FT = 26     | 2 |0| ISF = 8 |1| FT = 26     | 3 |0| ISF = 8 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| FT = 26     | 0 |0| ISF = 8 |   f1(0..7)    |   f1(8..15)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | f1(272..279)  |   f2(0..7)    | ...                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...                                           | f2(272..279)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   f3(0..7)    | ...                                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...                           | f3(272..279)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
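   The octet positions in the diagram can be checked with simple
   arithmetic: the three basic-mode ToC entries take 6 octets and each
   280-bit frame takes 35 octets. The sketch below (hypothetical helper
   name) computes where each frame starts and ends, and where its last
   octet falls in the 32-bit-wide rows of the diagram.

```python
from typing import List, Tuple


def frame_spans(toc_octets: int, frame_octets: List[int]) -> List[Tuple[int, int]]:
    """(first, last) octet index of each frame, counting from the payload start."""
    spans = []
    pos = toc_octets
    for n in frame_octets:
        spans.append((pos, pos + n - 1))
        pos += n
    return spans


# Example 1: 3 basic-mode ToC entries of 2 octets, three 280-bit (35-octet) frames.
spans = frame_spans(3 * 2, [280 // 8] * 3)
for i, (first, last) in enumerate(spans, start=1):
    # 4 octets per 32-bit row: row = octet // 4, column = octet % 4.
    print(f"f{i}: octets {first}-{last}, last octet in row {last // 4}, column {last % 4}")
# f1: octets 6-40, last octet in row 10, column 0
# f2: octets 41-75, last octet in row 18, column 3
# f3: octets 76-110, last octet in row 27, column 2
```

   The columns match the diagram: f1(272..279) sits in the first
   column, f2(272..279) in the last, and f3(272..279) in the third, for
   a total payload of 111 octets. The row numbers count 32-bit words
   from the start of the payload rather than the drawn rows, some of
   which are elided with ":".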



4.3.4.2. Example 2, Payload in Interleaved mode


   This example shows a payload with three frames of the 24 kbps stereo
   coding frame type (FT=40). This payload uses the interleaved mode.
   The frames 1, 2 and 3 are not consecutive in time: in playout order
   they are frames 1, 8, and 15, and the TFI values match this. The
   internal sampling frequency in this example is 32 kHz (ISF = 10), so
   each frame has a duration of 1152 RTP timestamp ticks, and a
   distance of 7 frames corresponds to a timestamp offset of 7 * 1152 =
   8064.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1| FT = 40     | 1 |0| ISF = 10| Timestamp offset = 0          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1| FT = 40     | 0 |0| ISF = 10| Timestamp offset = 8064       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| FT = 40     | 3 |0| ISF = 10| Timestamp offset = 8064       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   f1(0..7)    |   f1(8..15)   |  f1(16..23)   |  f1(24..31)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | f1(448..455)  | f1(456..463)  | f1(464..471)  | f1(472..479)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   f2(0..7)    |   f2(8..15)   |  f2(16..23)   |  f2(24..31)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | f2(448..455)  | f2(456..463)  | f2(464..471)  | f2(472..479)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   f3(0..7)    |   f3(8..15)   |  f3(16..23)   |  f3(24..31)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | f3(448..455)  | f3(456..463)  | f3(464..471)  | f3(472..479)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
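   The relation between the timestamp offsets and the TFI values in
   this example can be checked mechanically. The sketch below
   (hypothetical name) assumes, as in this example, that all frames in
   the payload have the same duration, here 1152 ticks at the 32 kHz
   internal sampling frequency, and relies on consecutive transport
   frames cycling through TFI values 0 to 3.

```python
from typing import List, Optional


def playout_positions(offsets: List[int], tfis: List[int],
                      frame_ticks: int) -> Optional[List[int]]:
    """Playout positions (relative to the first frame) of the frames in an
    interleaved payload, or None if the TFIs contradict the offsets.

    Assumes every frame in the payload has the same duration frame_ticks.
    """
    if not offsets:
        return []
    positions = [0]
    for i in range(1, len(offsets)):
        step, rem = divmod(offsets[i], frame_ticks)
        # A frame `step` transport frames later must have a TFI that is
        # `step` positions later, modulo 4.
        if rem != 0 or (tfis[i - 1] + step) % 4 != tfis[i]:
            return None
        positions.append(positions[-1] + step)
    return positions
```

   For Example 2, playout_positions([0, 8064, 8064], [1, 0, 3], 1152)
   returns [0, 7, 14], i.e., frames 1, 8 and 15 in playout order.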



4.4. Interleaving Considerations


   The flexible interleaving scheme requires some further usage
   considerations. As presented in the example in Section 3.6.2, an
   interleaving pattern requires a certain size of the deinterleaving
   buffer. This required buffer space, expressed as a number of frame
   slots, is signaled using the "interleaving" media type parameter.
   The number of frame slots needed can be converted into an actual
   memory requirement by considering the largest (in octets)
   combination of AMR-WB+ core and stereo rates.


   However, the frame buffer size is not always sufficient to determine
   when it is appropriate to start consuming frames from the
   deinterleaving buffer. This occurs in two cases: when the internal
   sampling frequency is switched, and when the interleaving pattern
   changes. For this reason the "int-delay" media type parameter is
   defined. It allows a sender to indicate the minimal media time that
   needs to be present in the buffer before starting to consume media
   from the buffer.
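   A sender can derive the value of the "interleaving" parameter
   defined in Section 7.1 from its planned transmission order. The
   sketch below assumes each frame is identified by its position in
   playout (RTP timestamp) order, for example obtained from unwrapped
   timestamps; interleaving_value is a hypothetical helper name, and
   the quadratic loop favors clarity over speed.

```python
from typing import Sequence


def interleaving_value(playout_order: Sequence[int]) -> int:
    """Value of the "interleaving" parameter for a transmission schedule.

    playout_order[k] is the playout (RTP timestamp order) position of the
    k-th frame transmitted. The result is one plus the largest number of
    frames that are sent before some frame but are played out after it.
    """
    worst = 0
    for k, pos in enumerate(playout_order):
        # frames sent earlier than frame k that play out later than frame k
        ahead = sum(1 for earlier in playout_order[:k] if earlier > pos)
        worst = max(worst, ahead)
    return worst + 1
```

   Without interleaving, e.g. interleaving_value([0, 1, 2, 3]), the
   result is 1, i.e., only the frame being consumed needs a slot.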



4.5. Implementation Considerations


   An application implementing this payload format MUST understand all
   the payload parameters in the out-of-band signaling used.  For
   example, if an application uses SDP, all the SDP and MIME parameters
   in this document MUST be understood.  This requirement ensures that
   an implementation can always decide whether or not it is capable of
   communicating.


   Both basic and interleaved mode SHALL be implemented. The
   implementation burden of both is rather small, and requiring both
   ensures interoperability. It is also RECOMMENDED to implement the
   AMR-WB payload format of RFC 3267 [7] for applications or scenarios
   where interoperability with AMR-WB-only codecs is necessary.


   When performing error concealment, certain precautions are needed
   because the internal sampling frequency may be switched. The main
   difficulty arises from the fact that packet loss also causes loss of
   information such as timestamps, frame lengths, and the chosen ISF.
   As a result, concealment may be performed using incorrect frame
   lengths, which in the worst case can make some of the subsequent
   frames unusable. More information and an example algorithm solving
   this problem are available in Section 4.5.1 below.


   As the AMR-WB+ codec contains all the functionality of the AMR-WB
   codec, anyone supporting the AMR-WB+ codec and this payload format
   is RECOMMENDED to also implement the payload format in RFC 3267 [7]
   for the AMR-WB frame types. This will significantly help
   interoperability with other devices that only support AMR-WB, in
   applications and scenarios where this is possible. Otherwise an end-
   point that is in fact capable of everything except the RTP payload
   format for AMR-WB will not be able to communicate.



4.5.1. ISF recovery when frames are lost


   In case of packet loss, proper error concealment has to be initiated
   in the AMR-WB+ decoder for the lost frames associated with the lost
   packets. Proper frame loss concealment requires a codec framing that
   matches the timestamps of the correctly received frames. Hence, it
   is necessary to recover the timestamps of the lost frames. A
   difficulty may arise because the codec frame length, which depends
   on the ISF, may have changed during the frame loss.


   The task of recovering the timestamps of lost frames is illustrated
   by the following example. Two frames with timestamps t0 and t1 have
   been received properly, with ISF values isf0 and isf1, respectively.
   The associated frame lengths (in timestamp ticks) are L0 and L1,
   respectively. Three frames with timestamps x1 - x3 have been lost.
   The example further assumes that there is one ISF change during the
   frame loss, from isf0 to isf1, as shown in the figure below.


   What is generally not known in the decoder, but is required for
   recovering the timestamps, is:
   * the ISFs associated with the lost frames
   * how many frames have been lost



     |<---L0--->|<---L0--->|<-L1->|<-L1->|<-L1->|


     |   Rxd    |   lost   | lost | lost |  Rxd |
   --+----------+----------+------+------+------+--


     t0         x1         x2     x3     t1


   In the following, an example algorithm is given by which the
   timestamps and ISFs of lost frames can be recovered.


   As in the above example, it is assumed that two frames have been
   received properly with timestamps t0 and t1, ISF values isf0 and
   isf1, and associated frame lengths L0 and L1, respectively.
   Furthermore, the TFIs of the two received frames are denoted by tfi0
   and tfi1, respectively. The algorithm assumes that the ISF can only
   change at a superframe boundary, i.e., between a transport frame
   with TFI 3 and the following transport frame with TFI 0.


   Example Algorithm:


   Start:                              # check for frame loss
   If (t0 + L0) == t1 Then goto End    # no frame loss


   Step 1:                             # check case with no ISF change
   If (isf0 != isf1) Then goto Step 2  # at least one ISF change
   If isFractional((t1 - t0)/L0) Then goto Step 3
                                       # more than one ISF change


   Return recovered timestamps as
   x(n) = t0 + n*L0 and associated ISF equal to isf0, for
   0 < n < (t1 - t0)/L0
   goto End


   Step 2:
   Loop initialization: n := 4 - (tfi0 mod 4)
   While n <= (t1 - t0)/L0
     Evaluate m := (t1 - t0 - n*L0)/L1
     If (isInteger(m) AND ((tfi0 + n + m) mod 4 == tfi1)) Then goto found
     n := n + 4
   endloop
   goto Step 3                         # more than one ISF change


   found:
   Return recovered timestamps and ISFs as
   x(i) = t0 + i*L0 and associated ISF equal to isf0, for 0 < i < n
   x(i) = t0 + n*L0 + (i-n)*L1 and associated ISF equal to isf1, for
   n <= i < n+m
   goto End


   Step 3:
   More than one ISF change has occurred. Since ISF changes can be
   assumed to be infrequent, such a situation occurs only if long
   sequences of frames are lost. In that case it is not useful to
   recover the timestamps of the lost frames. Rather, the AMR-WB+
   decoder should be reset and decoding should be resumed starting with
   the frame with timestamp t1.


   End:
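   The example algorithm can be transcribed into Python roughly as
   follows. This is a non-normative sketch: the function name is
   hypothetical, L0 and L1 are assumed to be known from isf0 and isf1
   (512 samples at the respective internal sampling frequency, in 72000
   Hz ticks), and timestamp differences are taken modulo 2^32. The lost
   frames are those strictly between t0 and t1: the frame at t0 + n*L0
   is the first frame of the new superframe and therefore uses isf1,
   while t0 + n*L0 + m*L1 is the received frame at t1 itself.

```python
from typing import List, Optional, Tuple

MOD = 2**32  # RTP timestamps are 32-bit values


def recover_lost_frames(t0: int, L0: int, isf0: int, tfi0: int,
                        t1: int, L1: int, isf1: int, tfi1: int
                        ) -> Optional[List[Tuple[int, int]]]:
    """(timestamp, isf) of each frame lost between the received frames at
    t0 and t1, or None if more than one ISF change must be assumed (Step 3:
    reset the decoder and resume decoding at t1)."""
    d = (t1 - t0) % MOD                  # distance in ticks, tolerating wraparound
    if d == L0:                          # Start: no frame loss
        return []
    if isf0 == isf1:                     # Step 1: no ISF change, so L0 == L1
        count, rem = divmod(d, L0)
        if rem != 0:
            return None                  # more than one ISF change
        return [((t0 + n * L0) % MOD, isf0) for n in range(1, count)]
    # Step 2: one ISF change, at a superframe boundary (TFI wraps to 0).
    n = 4 - tfi0 % 4                     # frames of length L0 up to the boundary
    while n * L0 <= d:
        m, rem = divmod(d - n * L0, L1)  # frames of length L1 after it
        if rem == 0 and (tfi0 + n + m) % 4 == tfi1:
            lost = [((t0 + i * L0) % MOD, isf0) for i in range(1, n)]
            lost += [((t0 + n * L0 + (i - n) * L1) % MOD, isf1)
                     for i in range(n, n + m)]
            return lost
        n += 4                           # try the next superframe boundary
    return None                          # Step 3: more than one ISF change
```

   For the situation in the figure above, with t0 = 1000, L0 = 1440
   (25.6 kHz), L1 = 1152 (32 kHz), tfi0 = 2 and tfi1 = 2, the sketch
   finds n = 2 and m = 2 and returns the three lost frames at 2440
   (isf0), 3880 and 5032 (both isf1).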



5. Congestion Control


   The general congestion control considerations for transporting RTP
   data apply to AMR-WB+ audio over RTP as well, see RTP [3] and any
   applicable RTP profile like AVP [9].  However, the multi-rate
   capability of AMR-WB+ audio coding provides a mechanism for
   controlling congestion, since the bandwidth demand can be adjusted
   by selecting a different coding frame type or lower internal
   sampling rate.


   Another parameter that may impact the bandwidth demand for AMR-WB+
   is the number of frames that are encapsulated in each RTP payload.
   Packing more frames in each RTP payload can reduce the number of
   packets sent and hence the overhead from IP/UDP/RTP headers, at the
   expense of increased delay and reduced error robustness against
   packet losses.
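   To illustrate the trade-off, the following sketch computes the
   header overhead for 20 ms frames (for example the fixed-length frame
   types, or extension frame types at a 25.6 kHz internal sampling
   frequency). It assumes IPv4 without options, UDP, and a 12-octet RTP
   header with no CSRCs or header extensions, and it ignores the ToC
   and link-layer framing; with IPv6 the IP header alone is 40 octets.

```python
# IPv4 header without options (20) + UDP header (8) + fixed RTP header (12)
HEADER_OCTETS = 20 + 8 + 12


def header_overhead_bps(frame_ms: float, frames_per_packet: int) -> float:
    """Bit rate spent on IPv4/UDP/RTP headers, ignoring ToC and link layer."""
    packets_per_second = 1000.0 / (frame_ms * frames_per_packet)
    return packets_per_second * HEADER_OCTETS * 8


for k in (1, 2, 4):
    print(k, "frame(s) per packet:", header_overhead_bps(20.0, k), "bit/s")
# 1 frame(s) per packet: 16000.0 bit/s
# 2 frame(s) per packet: 8000.0 bit/s
# 4 frame(s) per packet: 4000.0 bit/s
```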


   If forward error correction (FEC) is used to combat packet loss, the
   amount of redundancy added by FEC will need to be regulated so that
   the use of FEC itself does not cause a congestion problem.



6. Security Considerations


   RTP packets using the payload format defined in this specification
   are subject to the general security considerations discussed in RTP
   [3]. As this format transports encoded audio, the main security
   issues include confidentiality, integrity protection, and
   authentication of the audio itself.  The payload format itself does
   not have any built-in security mechanisms. Any suitable external
   mechanisms, such as SRTP [10], MAY be used.


   Neither this payload format nor the AMR-WB+ decoder exhibits any
   significant non-uniformity in the receiver-side computational
   complexity for packet processing, and thus they are unlikely to pose
   a denial-of-service threat due to the receipt of pathological data.



6.1. Confidentiality


   To achieve confidentiality of the encoded AMR-WB+ audio, all audio
   data bits will need to be encrypted.  There is less need to encrypt
   the payload header or the table of contents, because 1) they only
   carry information describing the frames (frame type, ISF, TFI and,
   in interleaved mode, timestamp offsets), and 2) this information
   could be useful to some third party, e.g., for quality monitoring.


   As long as the AMR-WB+ payload is only packed and unpacked at either
   end, encryption may be performed after packet encapsulation so that
   there is no conflict between the two operations.



6.2. Authentication and Integrity


   To authenticate the sender of the audio and provide integrity
   protection, an external mechanism has to be used.  It is RECOMMENDED
   that such a mechanism protect all the audio data bits and the RTP
   header.


   Data tampering by a man-in-the-middle attacker could result in
   erroneous depacketization/decoding that could lower the audio
   quality.


   To prevent a man-in-the-middle attacker from tampering with the
   payload packets, some additional information besides the audio bits
   SHOULD be protected.  This may include the ToC, RTP timestamp, RTP
   sequence number, RTP payload type, and the RTP marker bit.



6.3. Decoding Validation


   When processing a received payload packet, if the receiver finds
   that the calculated payload length, based on the information of the
   session and the values found in the payload header fields, does not
   match the size of the received packet, the receiver SHOULD discard
   the packet.  This is because decoding a packet whose header fields
   are inconsistent with its actual length could severely degrade the
   audio quality.
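   A minimal sketch of this check, walking the ToC and summing the
   implied frame lengths. Here frame_octets is a hypothetical
   caller-supplied lookup standing in for Tables 21 and 25 of [1],
   which are not reproduced in this document; it is assumed to raise
   KeyError or ValueError for undefined frame types, in which case the
   packet is also rejected.

```python
from typing import Callable


def payload_length_ok(payload: bytes, interleaved: bool,
                      frame_octets: Callable[[int], int]) -> bool:
    """True if the length implied by the ToC matches the payload length."""
    entry_len = 4 if interleaved else 2
    pos = 0
    audio_octets = 0
    while True:
        if pos + entry_len > len(payload):
            return False                 # ToC runs past the end of the packet
        first = payload[pos]
        ft = first & 0x7F
        pos += entry_len
        if ft not in (14, 15):           # AUDIO_LOST and NO_DATA carry no data
            try:
                audio_octets += frame_octets(ft)
            except (KeyError, ValueError):
                return False             # undefined frame type: discard
        if not first & 0x80:             # F=0: this was the last ToC entry
            break
    return pos + audio_octets == len(payload)
```

   As an illustration only, a lookup containing the two frame sizes
   used in the examples of Section 4.3.4 (35 octets for FT=26 and 60
   octets for FT=40) accepts the 111-octet payload of Example 1 and
   rejects it if a single octet is missing or added.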



7. Payload Format Parameters


   This section defines the parameters that may be used to select
   features of the AMR-WB+ payload format.  The parameters are defined
   here as part of the media type registration for the AMR-WB+ audio
   codec.  A mapping of the parameters into the Session Description
   Protocol (SDP) [6] is also provided for those applications that use
   SDP.  Equivalent parameters could be defined elsewhere for use with
   control protocols that do not use MIME or SDP.


   The data format and parameters are only specified for real-time
   transport in RTP.



7.1. Media Type Registration


   The media type for the Extended Adaptive Multi-Rate Wideband (AMR-
   WB+) codec is allocated from the IETF tree since AMR-WB+ is expected
   to be a widely used audio codec in general streaming applications.


   Note, any unspecified parameter MUST be ignored by the receiver.


   Media Type name:     audio


   Media subtype name:  AMR-WB+


   Required parameters:


   None


   Optional parameters:


   These parameters apply to RTP transfer only.


   channels:       The maximum number of audio channels present in the
                   audio frames. Permissible values are 1 (mono) or 2
                   (stereo).  If no parameter is present, the maximum
                   number of channels is 2 (stereo).


   interleaving:   Indicates that frame level interleaving mode SHALL
                   be used for the payload.  The parameter specifies
                   the number of frame slots required in a
                   deinterleaving buffer (including the frame that is
                   ready to be consumed).  Its value is equal to one
                   plus the maximum number of frames that precede any
                   frame in transmission order and follow the frame in
                   RTP timestamp order.  If this parameter is not
                   present, interleaving SHALL NOT be used.


   int-delay:      The minimal media time delay in RTP timestamp ticks
                   that is needed in the deinterleaving buffer, i.e.,
                   the difference in RTP timestamp between the earliest
                   and latest audio frame present in the deinterleaving
                   buffer, to ensure correct decoding.


   ptime:          see RFC 2327 [6].


   maxptime:       see Section 8 in RFC 3267 [7].


   Restriction on Usage:
                This type is only defined for transfer via RTP (STD
                64).


   Encoding considerations:


   Security considerations:
                See Section 6 of RFC XXXX.


   Interoperability considerations:
                To maintain interoperability with AMR-WB capable end-
                points, in cases where negotiation is possible and the
                AMR-WB+ end-point supporting this format also supports
                RFC 3267 for AMR-WB transport, an AMR-WB+ end-point
                SHOULD declare itself also as AMR-WB capable (i.e.
                supporting also "audio/AMR-WB" as specified in RFC
                3267).


                As the AMR-WB+ decoder is capable of performing stereo
                to mono conversion, all receivers of AMR-WB+ should be
                able to receive both stereo and mono, even if the
                receiver is only capable of playing out mono signals.


   Public specification:
                Please refer to Section 11 of RFC XXXX.


   Additional information:
                File storage of the AMR-WB+ format is specified within
                the 3GPP defined ISO based multimedia file format
                defined in 3GPP TS 26.244, see reference [14] of RFC
                XXXX. The file format has the MIME types "audio/3GPP"
                or "video/3GPP" as defined by RFC 3839 [15].


   Person & email address to contact for further information:
                johan.sjoberg@ericsson.com
                ari.lakaniemi@nokia.com


   Intended usage: COMMON.
                It is expected that many IP based streaming
                applications will use this type.


   Change controller:
                IETF Audio/Video Transport working group delegated from
                the IESG.



7.2. Mapping Media Type Parameters into SDP


   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [6], which is commonly used to describe RTP sessions.  When SDP is
   used to specify sessions employing the AMR-WB+ codec, the mapping is
   as follows:


   -  The media type ("audio") goes in SDP "m=" as the media name.


   -  The media type (payload format name) goes in SDP "a=rtpmap" as
      the encoding name.  The RTP clock rate in "a=rtpmap" SHALL be
      72000 for AMR-WB+, and the encoding parameter number of channels
      MUST either be explicitly set to 1 or 2, or be omitted, implying
      the default value of 2.


   -  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
      "a=maxptime" attributes, respectively.


   -  Any remaining parameters go in the SDP "a=fmtp" attribute by
      copying them directly from the MIME media type string as a
      semicolon separated list of parameter=value pairs.
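   The mapping rules above can be sketched as a small generator of SDP
   lines. The helper name is hypothetical, the "m=" line uses the
   RTP/AVP profile as in the example of Section 7.2.2, and only the
   parameters defined in this document are handled.

```python
from typing import Dict, List, Optional


def amrwbplus_sdp(pt: int, port: int, channels: Optional[int] = None,
                  ptime: Optional[int] = None, maxptime: Optional[int] = None,
                  fmtp: Optional[Dict[str, object]] = None) -> List[str]:
    """SDP media description lines for an AMR-WB+ payload type."""
    lines = [f"m=audio {port} RTP/AVP {pt}"]   # media type -> m= media name
    rtpmap = f"a=rtpmap:{pt} AMR-WB+/72000"      # clock rate is always 72000
    if channels is not None:                     # omitted means the default, 2
        if channels not in (1, 2):
            raise ValueError("channels must be 1 or 2")
        rtpmap += f"/{channels}"
    lines.append(rtpmap)
    if fmtp:                                     # remaining parameters
        params = "; ".join(f"{k}={v}" for k, v in fmtp.items())
        lines.append(f"a=fmtp:{pt} {params}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```

   Called as amrwbplus_sdp(99, 49120, channels=2, maxptime=100,
   fmtp={"interleaving": 30, "int-delay": 86400}), the sketch produces
   the four lines of the example in Section 7.2.2.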



7.2.1. Offer-Answer Model Considerations


   To achieve good interoperability for the AMR-WB+ RTP payload format
   in negotiated usage of SDP according to the Offer/Answer model [8],
   the following considerations should be made:


   For negotiable offer/answer usage the following interpretations of
   the parameters SHALL be done:


   -  The "interleaving" parameter is declarative. For streams
      declared as sendrecv or recvonly, it indicates that the receiver
      accepts payloads using the interleaved mode of the payload
      format, and its value declares the amount of buffer space the
      receiver has available for the sender to utilize. For sendonly
      streams, the parameter indicates the desired configuration and
      amount of buffer space. An answerer is RECOMMENDED to accept the
      offered value if capable of using it.


   -  The "int-delay" parameter is declarative. For streams declared
      as sendrecv or recvonly, the value indicates the maximum initial
      delay the receiver will accept in the deinterleaving buffer. For
      sendonly streams, the value is the amount of media time the
      sender desires to use; the value SHOULD be copied into any
      response.


   -  The "channels" parameter is declarative. For "sendonly" streams
      it indicates the desired channel usage, stereo and mono, or mono
      only. For "recvonly" and "sendrecv" streams the parameter
      indicates what the receiver accepts to use. As any receiver will
      be capable of receiving stereo frame types and performing local
      mixing with the AMR-WB+ decoder, there is normally only one
      reason to restrict to mono only: to avoid spending bit rate on
      data that is not utilized if the front end is only capable of
      mono.


   -  The "ptime" parameter works as indicated by the offer/answer
      model [8], "maxptime" SHALL be used in the same way.


   -  To maintain interoperability with AMR-WB in cases where
      negotiation is possible, an AMR-WB+ capable end-point which also
      implements the AMR-WB payload format [7] is RECOMMENDED to also
      declare itself capable of AMR-WB as it is a subset of the AMR-WB+
      codec.


   In declarative usage, like SDP in RTSP [16] or SAP [17], the
   following interpretation of the parameters SHALL be done:


   -  The "interleaving" parameter, if present, configures the payload
      format in that mode, and the value indicates the number of frames
      that the deinterleaving buffer is required to support to be able
      to handle this session correctly.


   -  The "int-delay" parameter indicates the initial buffering delay
      required to receive this stream correctly.


   -  The "channels" parameter indicates whether the transmitted
      content may contain both stereo and mono rates, or only mono
      rates.


   -  All other parameters indicate the values that are being used by
      the sending entity.



7.2.2. Examples


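   An example SDP session description utilizing AMR-WB+ mono and stereo
   encoding follows. The int-delay value of 86400 RTP timestamp ticks
   corresponds to 1.2 seconds at the 72000 Hz clock rate.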


    m=audio 49120 RTP/AVP 99
    a=rtpmap:99 AMR-WB+/72000/2
    a=fmtp:99 interleaving=30; int-delay=86400
    a=maxptime:100


   Note that the payload format (encoding) names are commonly shown in
   upper case.  Media subtypes are commonly shown in lower case.  These
   names are case-insensitive in both places.  Similarly, parameter
   names are case-insensitive both in MIME types and in the default
   mapping to the SDP a=fmtp attribute.
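   Because encoding names are case-insensitive, a receiver should
   normalize case before comparing them. The following minimal sketch
   parses an a=rtpmap attribute using the generic SDP syntax
   "<payload type> <encoding name>/<clock rate>[/<encoding
   parameters>]" from [6] and matches the encoding name without regard
   to case. The names parse_rtpmap and is_amr_wb_plus are hypothetical.

```python
from typing import NamedTuple


class RtpMap(NamedTuple):
    payload_type: int
    encoding: str
    clock_rate: int
    channels: int


def parse_rtpmap(line: str) -> RtpMap:
    """Parse 'a=rtpmap:<pt> <encoding>/<clock rate>[/<channels>]'."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    pt, _, spec = line[len(prefix):].strip().partition(" ")
    parts = spec.strip().split("/")
    # Per SDP, an omitted audio encoding parameter means one channel.
    channels = int(parts[2]) if len(parts) > 2 else 1
    return RtpMap(int(pt), parts[0], int(parts[1]), channels)


def is_amr_wb_plus(rtpmap: RtpMap) -> bool:
    # Encoding names are case-insensitive, so compare case-folded forms.
    return rtpmap.encoding.casefold() == "amr-wb+"
```

   With the example above, parse_rtpmap("a=rtpmap:99 AMR-WB+/72000/2")
   yields payload type 99, clock rate 72000 and 2 channels, and
   is_amr_wb_plus() also accepts the lower-case form "amr-wb+".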



8. IANA Considerations


   It is requested that one new MIME subtype (audio/amr-wb+) be
   registered by IANA; see Section 7.



9. Contributors


   Daniel Enstrom contributed by writing the codec introduction
   section.


10. Acknowledgements


   The authors would like to thank Redwan Salami and Stefan Bruhn for
   their significant contributions made throughout the writing and
   reviewing of this document.  Anisse Taleb and Ingemar Johansson
   contributed by implementing the payload format, and thus helped
   locate some flaws.  We would also like to acknowledge Qiaobing Xie,
   co-author of RFC 3267, on which this document is based.



11. References


11.1. Normative references


   [1]  3GPP TS 26.290 "Audio codec processing functions; Extended AMR
        Wideband codec; Transcoding functions", version 6.0.0 (2004-
        09), 3rd Generation Partnership Project (3GPP).
   [2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, Internet Engineering Task Force,
        March 1997.
   [3]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", STD 64,
        RFC 3550, Internet Engineering Task Force, July 2003.
   [4]  3GPP TS 26.192 "AMR Wideband speech codec; Comfort Noise
        aspects", version 5.0.0 (2001-03), 3rd Generation Partnership
        Project (3GPP).
   [5]  3GPP TS 26.193 "AMR Wideband speech codec; Source Controlled
        Rate operation", version 5.0.0 (2001-03), 3rd Generation
        Partnership Project (3GPP).
   [6]  Handley, M. and V. Jacobson, "SDP: Session Description
        Protocol", RFC 2327, Internet Engineering Task Force, April
        1998.
   [7]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, "Real-
        Time Transport Protocol (RTP) Payload Format and File Storage
        Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-
        Rate Wideband (AMR-WB) Audio Codecs", RFC 3267, Internet
        Engineering Task Force, June 2002.
   [8]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
        the Session Description Protocol (SDP)", RFC 3264, Internet
        Engineering Task Force, June 2002.



11.2. Informative References


   [9]  Schulzrinne, H., "RTP Profile for Audio and Video Conferences
        with Minimal Control", STD 65, RFC 3551, Internet Engineering
        Task Force, July 2003.
   [10] Baugher, M., et al., "The Secure Real-time Transport Protocol
        (SRTP)", RFC 3711, Internet Engineering Task Force, March 2004.
   [11] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
        Generic Forward Error Correction", RFC 2733, Internet
        Engineering Task Force, December 1999.
   [12] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley,
        M., Bolot, J., Vega-Garcia, A. and S. Fosse-Parisis, "RTP
        Payload for Redundant Audio Data", RFC 2198, Internet
        Engineering Task Force, September 1997.
   [13] 3GPP TS 26.233 "Packet Switched Streaming service", version
        5.0.0 (2001-03), 3rd Generation Partnership Project (3GPP).
   [14] 3GPP TS 26.244 "Transparent end-to-end packet switched
        streaming service (PSS); 3GPP file format (3GP)", version 6.1.0
        (2004-09), 3rd Generation Partnership Project (3GPP).
   [15] Singer, D. and R. Castagno, "MIME Type Registrations for 3rd
        Generation Partnership Project (3GPP) Multimedia files", RFC
        3839, Internet Engineering Task Force, July 2004.
   [16] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
        Protocol (RTSP)", RFC 2326, Internet Engineering Task Force,
        April 1998.
   [17] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
        Protocol", RFC 2974, Internet Engineering Task Force, June
        2001.


   Any 3GPP document can be downloaded from the 3GPP web server,
   "http://www.3gpp.org/"; see "Specifications".



12. Authors' Addresses


   Johan Sjoberg
   Ericsson Research
   Ericsson AB
   SE-164 80 Stockholm, SWEDEN


   Phone:   +46 8 7190000
   EMail: Johan.Sjoberg@ericsson.com



   Magnus Westerlund
   Ericsson Research
   Ericsson AB
   SE-164 80 Stockholm, SWEDEN


   Phone:   +46 8 7190000
   EMail: Magnus.Westerlund@ericsson.com



   Ari Lakaniemi
   Nokia Research Center
   P.O.Box 407
   FIN-00045 Nokia Group, FINLAND


   Phone:   +358-71-8008000
   EMail: ari.lakaniemi@nokia.com



13. IPR Notice


   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.


   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.


   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement

   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.



14. Copyright Notice


   Copyright (C) The Internet Society (2004).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.


   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


   This Internet-Draft expires in March 2005.



RFC Editor Considerations


   The RFC editor is requested to replace all occurrences of XXXX with
   the RFC number this document receives.


   The RFC editor is also requested to remove the next section
   "Changes".



15. Changes


   Changes in draft-ietf-avt-rtp-amrwbplus-01.txt compared to
   draft-ietf-avt-rtp-amrwbplus-00.txt:


   - Extended description of the codec to explain the super and
      transport frame concept used.
   - Added the Transport Frame Index field.
   - Clarified what the "channels" parameter is useful for.
   - Fixed a number of editorial errors.


   Changes in draft-ietf-avt-rtp-amrwbplus-02.txt compared to
   draft-ietf-avt-rtp-amrwbplus-01.txt:


   - Fixed a number of editorial errors.
   - Changed the ToC field mode index to frame type.
   - MIME type is changed to media type.
   - The media type registration has some added and changed fields,
      according to the new media type rules.
   - A description of the need for ISF recovery, with an example
      algorithm, is added to the Implementation Considerations section.
   - Section 3.1 is significantly changed, with a more extensive
      description of the codec; the tables previously in this section
      are now in [1] and are referenced instead of copied.