Network Working Group                                    Johan Sjoberg
 INTERNET-DRAFT                                       Magnus Westerlund
 Expires: December 2004                                        Ericsson
                                                          Ari Lakaniemi
                                                                  Nokia
                                                          June 17, 2004
 
 
     Real-Time Transport Protocol (RTP) Payload Format for Extended AMR
                       Wideband (AMR-WB+) Audio Codec
                    <draft-ietf-avt-rtp-amrwbplus-00.txt>
 
 
 Status of this memo
 
    By submitting this Internet-Draft, I (we) certify that any
    applicable patent or other IPR claims of which I am (we are) aware
    have been disclosed, and any of which I (we) become aware will be
    disclosed, in accordance with RFC 3668 (BCP 79).
 
    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups.  Note that
    other groups may also distribute working documents as Internet-
    Drafts.
 
    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."
 
    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/1id-abstracts.txt
 
    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html
 
    This document is a submission of the IETF AVT WG.  Comments should
    be directed to the AVT WG mailing list, avt@ietf.org.
 
 Abstract
 
    This document specifies a real-time transport protocol (RTP) payload
    format to be used for Extended AMR Wideband (AMR-WB+) encoded audio
    signals. The AMR-WB+ codec is an audio extension of the AMR-WB codec
    providing additional modes designed to give higher quality of music
    and speech than the original modes.  A MIME type registration is
    included for AMR-WB+.
 
 
 
 
 
 Sjoberg, et. al.                                              [Page 1]


 INTERNET-DRAFT       RTP payload format for AMR-WB+     June 17, 2004
 
 
 
 TABLE OF CONTENTS
 
 1. Definitions.....................................................3
    1.1. Glossary...................................................3
    1.2. Terminology................................................3
 2. Introduction....................................................3
 3. Background on AMR-WB+ and Design Principles.....................4
    3.1. The AMR-WB+ Audio Codec....................................5
    3.2. Multi-rate Encoding and Mode Adaptation....................9
    3.3. Voice Activity Detection and Discontinuous Transmission....9
    3.4. Support for Multi-Channel Session..........................9
    3.5. Unequal Bit-error Detection and Protection.................9
    3.6. Robustness against Packet Loss............................10
       3.6.1. Use of Forward Error Correction (FEC)................10
       3.6.2. Use of Frame Interleaving............................11
    3.7. AMR-WB+ Audio over IP scenarios...........................12
 4. RTP Payload Format for AMR-WB+.................................13
    4.1. RTP Header Usage..........................................14
    4.2. Payload Structure.........................................14
    4.3. Payload definitions.......................................15
       4.3.1. The Payload Table of Contents........................15
       4.3.2. Audio Data...........................................17
       4.3.3. Methods for Forming the Payload......................18
       4.3.4. Payload Examples.....................................18
    4.4. Interleaving Considerations...............................20
    4.5. Implementation Considerations.............................20
 5. Congestion Control.............................................21
 6. Security Considerations........................................21
    6.1. Confidentiality...........................................21
    6.2. Authentication and Integrity..............................22
    6.3. Decoding Validation.......................................22
 7. Payload Format Parameters......................................22
    7.1. MIME Registration.........................................23
    7.2. Mapping MIME Parameters into SDP..........................24
       7.2.1. Offer-Answer Model Considerations....................25
       7.2.2. Examples.............................................26
 8. IANA Considerations............................................26
 9. Acknowledgements...............................................26
 10. References....................................................27
    10.1. Normative references.....................................27
    10.2. Informative References...................................27
 11. Authors' Addresses............................................28
 12. Copyright Notice..............................................29


 1. Definitions
 
 1.1. Glossary
 
    3GPP    - the Third Generation Partnership Project
    AMR     - Adaptive Multi-Rate Codec
    AMR-WB  - Adaptive Multi-Rate Wideband Codec
    AMR-WB+ - Extended Adaptive Multi-Rate Wideband Codec
    CMR     - Codec Mode Request
    CN      - Comfort Noise
    DTX     - Discontinuous Transmission
    FEC     - Forward Error Correction
    ISF     - Internal Sampling Frequency
    MI      - Mode Index
    SCR     - Source Controlled Rate Operation
    SID     - Silence Indicator (the frames containing only CN
              parameters)
    TS      - Timestamp
    VAD     - Voice Activity Detection
    UED     - Unequal Error Detection
    UEP     - Unequal Error Protection
 
 
 1.2. Terminology
 
    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
    this document are to be interpreted as described in RFC 2119 [2].
 
 
 2. Introduction
 
    This document specifies the payload format for packetization of
    Extended Adaptive Multi-Rate Wideband (AMR-WB+) encoded audio
    signals into the Real-time Transport Protocol (RTP) [3].  The
    payload format supports transmission of mono or stereo audio,
    aggregating multiple frames per payload, and mechanisms enhancing
    robustness against packet loss.
 
    The AMR-WB+ codec is an extension of the Adaptive Multi-Rate
    Wideband (AMR-WB) codec and therefore has some features that are
    not available in AMR-WB.  From a transport point of view, the new
    features are native support for stereophonic audio and the
    possibility to use different internal sampling frequencies.  The
    primary usage scenario for AMR-WB+ is transport over IP, so there
    is no AMR-WB-like need for interworking with other transport
    networks.
 
    AMR-WB+ will mainly be used in streaming scenarios, where using an
    octet-aligned format to decrease the complexity of the server is
    seen as a substantial benefit.  Therefore nothing similar to the
    bandwidth efficient mode defined in [7] is specified for AMR-WB+.
    The bandwidth saved by a bandwidth efficient mode would also be
    very small for all extension modes, as they are octet aligned.
 
    The built-in codec support for stereo encoding makes multi-channel
    support as in [7] difficult to implement, but also less needed.
    Therefore, the multi-channel support specified in the AMR and
    AMR-WB payload format is not specified for AMR-WB+.  Due to all
    these changes, and the different scope of the AMR-WB+ codec, this
    document defines a new RTP payload format that differs
    significantly from the one for AMR and AMR-WB [7].
 
    No file format for AMR-WB+ is defined within this specification.
    Instead, the 3GPP-defined, ISO-based 3GP file format [14] will
    support AMR-WB+ and provides all functionality needed from a file
    format.  This format also supports storage of AMR and AMR-WB, plus
    other multimedia formats, allowing for synchronized playback.  As
    the 3GP format provides much greater capability than the previously
    defined formats for AMR and AMR-WB, it is expected to be used and
    to be sufficient for all use cases.
 
    Background on AMR-WB+ and design principles can be found in Section
    3.  The payload format itself is specified in Section 4 and follows
    the principles used in [3], [9], and [7].  In Section 7, a MIME type
    registration is provided.
 
 
 3. Background on AMR-WB+ and Design Principles
 
    The Extended Adaptive Multi-Rate Wideband (AMR-WB+) [1] audio codec
    is designed for encoding and transport of speech and low bit-rate
    audio with good quality. The codec is being specified by 3GPP, and
    primary target applications within 3GPP are packet-switched
    streaming service (PSS) [13] and multimedia messaging service (MMS).
    However, due to its flexibility and robustness, AMR-WB+ is very well
    suited for streaming services in highly varying transport
    environments, e.g. the Internet.
 
    Because of the flexibility of this codec, the behavior in a
    particular application is controlled by several parameters that
    select options or specify the acceptable values for a variable.
    These options and variables are described in general terms at
    appropriate points in the text of this specification as parameters
    to be established through out-of-band means. In Section 7, all of
    the parameters are specified in the form of MIME subtype
    registration for the AMR-WB+ encoding. The method used to signal
    these parameters at session setup or to arrange prior agreement of
    the participants is beyond the scope of this document; however,
    Section 7 provides a mapping of the parameters into the Session
    Description Protocol (SDP) [6] for those applications that use SDP.

    Note that the AMR-WB+ design and specification work in 3GPP is
    still in progress.  The target is to finalize the codec
    specifications within the 3GPP Release 6 timeline; the release will
    be frozen in September 2004 at the earliest.  Because the codec
    work is not finished, some of the issues discussed in this
    Internet-Draft are still subject to change, but the draft presents
    the situation according to the authors' best knowledge at the time
    of writing.
 
 
 3.1. The AMR-WB+ Audio Codec
 
    The AMR-WB+ audio codec was originally developed by 3GPP to be used
    for streaming and messaging services in GSM and 3G cellular systems.
    AMR-WB+ is designed as an audio extension to the AMR-WB speech
    codec.  Thus, it includes the nine coding modes specified for
    AMR-WB, extended with additional new modes with bit rates ranging
    from 5.2 to 53.3 kbit/s.  Whereas the AMR-WB modes employ a 16000 Hz
    sampling frequency and operate on monophonic signals in all modes,
    the extension modes operate at a number of internal sampling
    frequencies, both in mono and stereo.  The audio processing is
    performed on equal-size frames; each transport frame corresponds to
    512 samples.  If the internal sampling rate is set at 25600 Hz, this
    is equal to 20 ms.
 
    The AMR-WB+ codec includes the AMR-WB modes, as shown in Table 1
    below.
 
                         Sampling    Mono/     Number of
     Index    Mode     rate [kHz]  stereo  bits per frame
     -----------------------------------------------------
       0   WB 6.60 kbps    16       mono       132
       1   WB 8.80 kbps    16       mono       177
       2   WB 12.65 kbps   16       mono       253
       3   WB 14.25 kbps   16       mono       285
       4   WB 15.85 kbps   16       mono       317
       5   WB 18.25 kbps   16       mono       365
       6   WB 19.85 kbps   16       mono       397
       7   WB 23.05 kbps   16       mono       461
       8   WB 23.85 kbps   16       mono       477
       9   WB SID          16       mono        40
      10   WB+ 13.6 kbps  16/24     mono       272
      11   WB+ 18 kbps    16/24     stereo     360
      12   WB+ 24 kbps    16/24     mono       480
      13   WB+ 24 kbps    16/24     stereo     480
      14   LOST_SPEECH     -          -          0
      15   NO_DATA         -          -          0
 
    Table 1: AMR-WB modes.
 
    There are four special extension modes (index 10-13 in Table 1)
    that have a fixed internal sampling frequency (25600 Hz) and audio
    input frequencies (16 or 24 kHz).  Like the AMR-WB modes, each
    frame in these modes always represents 20 ms of audio.
 
    The remaining extension modes are specified by three parameters:
    mono bit-rate, stereo bit-rate, and internal sampling frequency.
    There are eight mono bit-rates and 16 stereo bit-rates available;
    see Tables 2 and 3 below.  Note that the mode naming below assumes
    an internal sampling frequency of 25600 Hz.
 
                            Number of
     Index    Mode       bits per frame
     ----------------------------------
       0   WB+ 10.4 kbps      208
       1   WB+ 12.0 kbps      240
       2   WB+ 13.6 kbps      272
       3   WB+ 15.2 kbps      304
       4   WB+ 16.8 kbps      336
       5   WB+ 19.2 kbps      384
       6   WB+ 20.8 kbps      416
       7   WB+ 24.0 kbps      480
 
    Table 2: AMR-WB+ core mono modes.
 
 
                            Number of
     Index    Mode       bits per frame
     ----------------------------------
       0   WB+_s 2.0 kbps      40
       1   WB+_s 2.4 kbps      48
       2   WB+_s 2.8 kbps      56
       3   WB+_s 3.2 kbps      64
       4   WB+_s 3.6 kbps      72
       5   WB+_s 4.0 kbps      80
       6   WB+_s 4.4 kbps      88
       7   WB+_s 4.8 kbps      96
       8   WB+_s 5.2 kbps     104
       9   WB+_s 5.6 kbps     112
      10   WB+_s 6.0 kbps     120
      11   WB+_s 6.4 kbps     128
      12   WB+_s 6.8 kbps     136
      13   WB+_s 7.2 kbps     144
      14   WB+_s 7.6 kbps     152
      15   WB+_s 8.0 kbps     160
 
    Table 3: AMR-WB+ stereo modes.
 
    When the codec is used in an extension mode, each frame always
    corresponds to the same number of samples, but the duration of each
    frame varies with the internal sampling frequency.  There is no
    preferred sampling frequency for the codec to operate at, but in
    order to limit the possible settings for effective transmission,
    only the following sampling frequencies are supported in this
    payload format.
 
 
 
           Internal   Frame         Frame
     ISF   Sampling   duration     duration      Bit-rate
    Index  Rate [Hz]  [ms]       [RTP TS ticks]   factor
    ------------------------------------------------------
      0    N/A         20           1440          N/A
      1    12800       40           2880          1/2
      2    14400       35.55        2560          9/16
      3    16000       32           2304          5/8
      4    17067       30           2160          2/3
      5    19200       26.67        1920          3/4
      6    21333       24           1728          5/6
      7    24000       21.33        1536         15/16
      8    25600       20           1440           1
      9    28800       17.78        1280          9/8
     10    32000       16           1152          5/4
     11    34133       15           1080          4/3
     12    38400       13.33         960          3/2
     13    42667       12            864          5/3
 
 
    Table 4: The relation between internal sampling frequency and frame
             lengths in time and RTP timestamp ticks. Note also that the
             RTP TS ticks assume TS clock rate of 72000 Hz. Index 0 is
             used for AMR-WB and the 4 extension modes in table 1.
 
 
    The duration of one AMR-WB+ audio transport frame is variable and
    depends on the internal sampling frequency.  The frame durations
    are all between 12 and 40 ms per transport frame.  A transport
    frame always represents 512 samples at the internal sampling
    frequency in use.  As a result, the length of an AMR-WB+ transport
    frame in RTP timestamp ticks depends on the internal sampling
    frequency and varies between 864 and 2880.  The bit-rate also
    depends on the internal sampling frequency; the last column of
    Table 4 gives the factor by which any bit-rate value stated for a
    25600 Hz internal sampling frequency is to be multiplied.  The ISF
    index is carried in the payload format to indicate which internal
    sampling frequency is used for each AMR-WB+ encoded frame.
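
    As an informal illustration (not part of the payload format), the
    relations in Table 4 can be sketched in Python.  The sketch assumes
    that the exact internal sampling frequency is 25600 Hz multiplied
    by the bit-rate factor (Table 4 lists some of these rates rounded
    to whole Hz).  The names ISF_FACTOR, frame_ticks,
    frame_duration_ms and scaled_bit_rate_kbps are illustrative only
    and do not come from the codec specification.

```python
from fractions import Fraction

TS_CLOCK_HZ = 72000       # RTP timestamp clock rate assumed in Table 4
SAMPLES_PER_FRAME = 512   # samples per transport frame
REFERENCE_ISF_HZ = 25600  # ISF at which the bit-rate factor is 1

# Bit-rate factor per ISF index, copied from the last column of Table 4.
ISF_FACTOR = {
    1: Fraction(1, 2),   2: Fraction(9, 16),  3: Fraction(5, 8),
    4: Fraction(2, 3),   5: Fraction(3, 4),   6: Fraction(5, 6),
    7: Fraction(15, 16), 8: Fraction(1),      9: Fraction(9, 8),
    10: Fraction(5, 4),  11: Fraction(4, 3),  12: Fraction(3, 2),
    13: Fraction(5, 3),
}


def frame_ticks(isf_index):
    """Frame length in RTP timestamp ticks for a given ISF index."""
    if isf_index == 0:
        return 1440  # AMR-WB and the four special modes: fixed 20 ms
    # Assumption: exact ISF = 25600 Hz * bit-rate factor.
    isf_hz = REFERENCE_ISF_HZ * ISF_FACTOR[isf_index]
    return int(Fraction(SAMPLES_PER_FRAME * TS_CLOCK_HZ) / isf_hz)


def frame_duration_ms(isf_index):
    """Frame duration in milliseconds, derived from the tick count."""
    return float(Fraction(frame_ticks(isf_index) * 1000, TS_CLOCK_HZ))


def scaled_bit_rate_kbps(kbps_at_reference, isf_index):
    """Scale a bit-rate quoted for 25600 Hz to the given ISF index."""
    return Fraction(kbps_at_reference) * ISF_FACTOR[isf_index]
```

    For example, frame_ticks(1) gives 2880 and frame_ticks(13) gives
    864, the two extremes quoted in the text above.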
 
    The mode index is used to identify the content of an AMR-WB+
    encoded frame.  The mode index indicates whether the frame is an
    AMR-WB mode, comfort noise, NO_DATA, an AMR-WB+ core mode in mono
    usage, or a combination of a core mode and a stereo mode.  The mode
    indexes are presented in Table 5 below.  The core mode and stereo
    mode index values are according to Tables 2 and 3, respectively.
    The bit-rate value assumes an internal sampling frequency of
    25600 Hz.
 
               Core      Stereo         Total          Number of
    Index      mode       mode       bit-rate [kbps]  bits per frame
    -----------------------------------------------------------------
    0-15: As specified in table 1.
    16          0         None           10.4           208
    17          1         None           12.0           240
    18          2         None           13.6           272
    19          3         None           15.2           304
    20          4         None           16.8           336
    21          5         None           19.2           384
    22          6         None           20.8           416
    23          7         None           24.0           480
    24          0         0              12.4           248
    25          0         1              12.8           256
    26          0         4              14             280
    27          1         1              14.4           288
    28          1         3              15.2           304
    29          1         5              16             320
    30          2         2              16.4           328
    31          2         4              17.2           344
    32          2         6              18             360
    33          3         3              18.4           368
    34          3         5              19.2           384
    35          3         7              20             400
    36          4         4              20.4           408
    37          4         6              21.2           424
    38          4         9              22.4           448
    39          5         5              23.2           464
    40          5         7              24             480
    41          5         11             25.6           512
    42          6         8              26             520
    43          6         10             26.8           536
    44          6         15             28.8           576
    45          7         9              29.6           592
    46          7         10             30             600
    47          7         15             32             640
    48-127 : Reserved
 
    Table 5: The normative mode index table. Bit-rates assume 25600 Hz
    internal sampling frequency.
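
    The way Table 5 combines Tables 2 and 3 can be illustrated with a
    short Python sketch.  It is informal and covers only mode indexes
    16-47; the names CORE_BITS, STEREO_COMBOS and decode_mode_index are
    hypothetical.  The sketch relies on the observation that the stereo
    frame sizes in Table 3 start at 40 bits and grow by 8 bits per
    index, and that each frame covers 20 ms at 25600 Hz.

```python
# Bits per frame for core mono modes 0..7 (Table 2).
CORE_BITS = [208, 240, 272, 304, 336, 384, 416, 480]

# (core mode, stereo mode) for mode indexes 24..47, copied from Table 5.
STEREO_COMBOS = [
    (0, 0), (0, 1), (0, 4), (1, 1), (1, 3), (1, 5),
    (2, 2), (2, 4), (2, 6), (3, 3), (3, 5), (3, 7),
    (4, 4), (4, 6), (4, 9), (5, 5), (5, 7), (5, 11),
    (6, 8), (6, 10), (6, 15), (7, 9), (7, 10), (7, 15),
]


def stereo_bits(stereo_index):
    """Bits per frame for a stereo mode (Table 3): 40, 48, ..., 160."""
    return 40 + 8 * stereo_index


def decode_mode_index(mi):
    """Return (core mode, stereo mode or None, bits per frame)."""
    if 16 <= mi <= 23:
        core, stereo = mi - 16, None
        bits = CORE_BITS[core]
    elif 24 <= mi <= 47:
        core, stereo = STEREO_COMBOS[mi - 24]
        bits = CORE_BITS[core] + stereo_bits(stereo)
    else:
        raise ValueError("only mode indexes 16..47 are handled here")
    return core, stereo, bits


def bit_rate_kbps(bits_per_frame):
    """Bit-rate at 25600 Hz ISF, where one frame lasts 20 ms."""
    return bits_per_frame / 20
```

    For instance, decode_mode_index(41) yields core mode 5, stereo mode
    11 and 512 bits per frame, matching the 25.6 kbps row of Table 5.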
 
    The actual bit-rate of the audio encoding depends, as indicated, on
    the combination of core mode and stereo mode (mode index) and on
    the internal sampling frequency (ISF).  A number of combinations
    produce the same bit-rate.  For example, one possible way of
    producing a 32 kbps audio stream is to use MI=41, i.e. 25.6 kbps,
    together with an internal sampling frequency of 32 kHz
    (5/4 * 25.6 = 32 kbps).
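
    The observation that several combinations give the same bit-rate
    can be explored with the following informal Python sketch.  It
    multiplies the total bit-rates of Table 5 by the bit-rate factors
    of Table 4 using exact fractions.  The names BASE_KBPS, ISF_FACTOR
    and combinations_for are illustrative only.

```python
from fractions import Fraction

# Total bit-rate in kbps at 25600 Hz for mode indexes 16..47 (Table 5).
BASE_KBPS = [
    "10.4", "12.0", "13.6", "15.2", "16.8", "19.2", "20.8", "24.0",
    "12.4", "12.8", "14", "14.4", "15.2", "16", "16.4", "17.2",
    "18", "18.4", "19.2", "20", "20.4", "21.2", "22.4", "23.2",
    "24", "25.6", "26", "26.8", "28.8", "29.6", "30", "32",
]

# Bit-rate factor per ISF index (last column of Table 4).
ISF_FACTOR = {
    1: Fraction(1, 2),   2: Fraction(9, 16),  3: Fraction(5, 8),
    4: Fraction(2, 3),   5: Fraction(3, 4),   6: Fraction(5, 6),
    7: Fraction(15, 16), 8: Fraction(1),      9: Fraction(9, 8),
    10: Fraction(5, 4),  11: Fraction(4, 3),  12: Fraction(3, 2),
    13: Fraction(5, 3),
}


def combinations_for(target_kbps):
    """List (mode index, ISF index) pairs that give target_kbps exactly."""
    target = Fraction(target_kbps)
    return [
        (mi, isf)
        for mi, base in enumerate(BASE_KBPS, start=16)
        for isf, factor in ISF_FACTOR.items()
        if Fraction(base) * factor == target
    ]
```

    Calling combinations_for(32) includes the pair (41, 10) used in the
    example above, alongside other pairs such as (47, 8).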
 
 3.2. Multi-rate Encoding and Mode Adaptation
 
    The multi-rate encoding (i.e., multi-mode) capability of AMR-WB+ is
    designed for preserving high audio quality under a wide range of
    bandwidth requirements and transmission conditions.
 
    AMR-WB+ enables seamless switching between modes using the same
    number of audio channels and the same internal sampling frequency.
    Every AMR-WB+ codec implementation is required to support all the
    respective audio coding modes defined by the codec and must be able
    to handle mode switching between any two modes.  Switching between
    modes employing a different number of audio channels or a different
    internal sampling frequency is possible, but may not be seamless.
    Therefore it is recommended to perform any such switch during
    periods where the input is silent, or to take other precautions
    when switching to maintain good audio quality.
 
 
 3.3. Voice Activity Detection and Discontinuous Transmission
 
    AMR-WB+ supports the same algorithms for voice activity detection
    (VAD) and generation of comfort noise (CN) parameters during silence
    periods as used by the AMR-WB codec. Hence, the AMR-WB+ codec also
    has the option to reduce the number of transmitted bits and packets
    during silence periods to a minimum. The operation of sending CN
    parameters at regular intervals during silence periods is usually
    called discontinuous transmission (DTX) or source controlled rate
    (SCR) operation.  The AMR-WB+ frames containing CN parameters are
    called Silence Indicator (SID) frames. See more details about VAD
    and DTX functionality in [4] and [5].
 
 
 3.4. Support for Multi-Channel Session
 
    Some of the AMR-WB+ modes support encoding of stereophonic audio.
    Because of this native support for two-channel stereophonic
    signals, it does not seem necessary to support multi-channel
    transport with separate codecs as done in the AMR-WB RTP payload
    format [7].  However, to make the signalling of the number of
    supported channels explicit, it is possible to negotiate a
    restriction to mono-only usage.  A reason for having the number of
    channels present at the RTP level is that the requirements
    external to the codec differ, i.e. the playback facilities of a
    receiver need to handle stereo or mono signals.
 
 
 3.5. Unequal Bit-error Detection and Protection

    The audio bits encoded in each AMR-WB frame have different
    perceptual sensitivity to bit errors.  This property can be
    exploited, e.g. in cellular systems, to achieve better voice
    quality by using unequal error protection and detection (UEP and
    UED) mechanisms.  However, as the extension modes in the AMR-WB+
    codec do not have this property, UEP or UED cannot be utilized for
    them.  An application that wants to use UEP or UED and needs
    payload format support for this should use the RTP payload format
    for the AMR-WB modes defined in RFC 3267 [7].
 
 
 3.6. Robustness against Packet Loss
 
    The payload format supports several means, including forward error
    correction (FEC) and frame interleaving, to increase robustness
    against packet loss.
 
 3.6.1. Use of Forward Error Correction (FEC)
 
    The simple scheme of repetition of previously sent data is one way
    of achieving FEC. Another possible scheme which can be more
    bandwidth efficient is to use payload external FEC, e.g. RFC2733
    [11], which generates extra packets containing repair data.  For the
    AMR-WB+ extension modes, it is only possible to use the codec to
    send redundant copies using the same mode index and internal
    sampling frequency. We describe such a scheme next.
 
    This involves the simple retransmission of previously transmitted
    frames together with the current frame(s). This is done by using a
    sliding window to group the audio frames to be sent in each payload.
    Figure 1 below shows us an example.
 
    --+--------+--------+--------+--------+--------+--------+--------+--
      | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
    --+--------+--------+--------+--------+--------+--------+--------+--
 
      <---- p(n-1) ---->
               <----- p(n) ----->
                        <---- p(n+1) ---->
                                 <---- p(n+2) ---->
                                          <---- p(n+3) ---->
                                                   <---- p(n+4) ---->
 
    Figure 1: An example of redundant transmission.
 
    In this example each frame is retransmitted once in the following
    RTP payload packet. Here, f(n-2)..f(n+4) denotes a sequence of audio
    frames and p(n-1)..p(n+4) a sequence of payload packets.
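
    The sliding-window grouping of Figure 1 can be sketched informally
    in Python as follows.  Frames are modelled as (timestamp, data)
    tuples; this representation and the function names build_payloads
    and receive are assumptions made for illustration and are not
    defined by this payload format.

```python
def build_payloads(frames, copies=1):
    """Group frames so each payload repeats the previous `copies` frames.

    With copies=1, payload i carries frame i-1 followed by frame i,
    as in Figure 1.
    """
    payloads = []
    for i in range(len(frames)):
        start = max(0, i - copies)
        payloads.append(frames[start:i + 1])
    return payloads


def receive(payloads, lost=()):
    """Merge all payloads except those whose index is in `lost`.

    Duplicates are recognised by timestamp; only the first copy of
    each frame is kept.  Returns the frame data in timestamp order.
    """
    seen = {}
    for index, payload in enumerate(payloads):
        if index in lost:
            continue
        for timestamp, data in payload:
            seen.setdefault(timestamp, data)
    return [seen[ts] for ts in sorted(seen)]
```

    With one redundant copy per frame, the loss of any single payload
    other than the last one still leaves every frame recoverable from a
    neighbouring payload.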
 
    The use of this approach does not require signaling at the session
    setup.  In other words, the audio sender can choose to use this
    scheme without consulting the receiver.  This is because a packet
    containing redundant frames will not look different from a packet
    with only new frames.  If no packet is lost, the receiver may
    receive multiple copies of a frame for a certain timestamp, or both
    a NO_DATA indication and encoded audio for that timestamp.
 
    This redundancy scheme provides the same functionality as the one
    described in RFC 2198 "RTP Payload for Redundant Audio Data" [12].
    In most cases the mechanism in this payload format is more efficient
    and simpler than requiring both endpoints to support RFC 2198 in
    addition. There is one situation in which use of RFC 2198 is
    indicated: if some other codec than AMR-WB+ is desired for the
    redundant encoding, the AMR-WB+ payload format won't be able to
    carry it.
 
    The sender is responsible for selecting an appropriate amount of
    redundancy based on feedback about the channel, e.g., in RTCP
    receiver reports. The sender is also responsible for avoiding
    congestion, which may be exacerbated by redundancy (see Section 5
    for more details).
 
 3.6.2. Use of Frame Interleaving
 
    To decrease protocol overhead, the payload design allows several
    audio frames to be encapsulated into a single RTP packet.  One of
    the drawbacks of such an approach is that a packet loss means the
    loss of several consecutive audio frames, which usually causes
    clearly audible distortion in the reconstructed audio.
    Interleaving of frames can improve the audio quality in such cases
    by distributing the consecutive losses into a series of single
    frame losses.  However, interleaving and bundling several frames
    per payload also increases end-to-end delay and sets higher
    buffering requirements, and it is therefore not appropriate for all
    usage scenarios.  Nevertheless, streaming applications will most
    likely be able to exploit interleaving to improve audio quality in
    lossy transmission conditions.
 
    This payload design supports the use of frame interleaving as an
    option.  The usage of this feature needs to be negotiated or at
    least signalled.
 
    The interleaving supported by this format is rather flexible. For
    example, a continuous pattern can be defined, as the below example
    shows.
 
    --+--------+--------+--------+--------+--------+--------+--------+--
      | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
    --+--------+--------+--------+--------+--------+--------+--------+--
 
               [ P(n)   ]
      [ P(n+1) ]                 [ P(n+1) ]
                        [ P(n+2) ]                 [ P(n+2) ]
                                          [ P(n+3) ]                 [P(
                                                            [ P(n+4) ]
 
    In this example the consecutive frames, denoted f(n-2) to f(n+4),
    are aggregated two per packet with interleaving.  The packets, P(n)
    to P(n+4), follow a pattern that allows for constant delay in both
    the interleaving and the deinterleaving process.  The
    deinterleaving buffer in this example needs to have room for at
    least 3 frames, including the one that is ready to be consumed.
    This is needed, for example, when f(n) is the next frame to be
    played: the receiver has then consumed all previous frames and
    needs to have f(n), f(n+1) and f(n+3) in the buffer.  When it is
    time to consume f(n+1), no further RTP packet is needed.  When
    f(n+2) is to be consumed, P(n+3) is needed and the deinterleaving
    buffer will contain f(n+2), f(n+3) and f(n+5).
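
    The buffer requirement described above can be checked with a small
    simulation.  The Python sketch below is illustrative only.  It
    assumes, based on the figure and the text, that packet P(n+k)
    carries frames f(n+2k-4) and f(n+2k-1); frames are represented by
    their index relative to n, and the function names are hypothetical.

```python
def packet_frames(k):
    """Frame indexes (relative to n) carried by packet P(n+k).

    Assumed pattern read from the figure: f(n+2k-4) and f(n+2k-1).
    """
    return (2 * k - 4, 2 * k - 1)


def max_buffer_occupancy(num_frames):
    """Play frames 0..num_frames-1 in order and track the buffer peak.

    Packets are taken in sending order, and only as many as needed to
    obtain the next frame to play.  Frames older than the play point
    are discarded on arrival.  The peak includes the frame that is
    about to be consumed.
    """
    buffer = set()
    next_packet = 0
    peak = 0
    for play in range(num_frames):
        while play not in buffer:
            for frame in packet_frames(next_packet):
                if frame >= play:
                    buffer.add(frame)
            next_packet += 1
        peak = max(peak, len(buffer))
        buffer.remove(play)
    return peak
```

    Under these assumptions the simulation never holds more than three
    frames at once, in line with the buffer size stated in the text.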
 
 
 3.7. AMR-WB+ Audio over IP scenarios
 
    Since the primary target for the AMR-WB+ codec is packet switched
    streaming, the most relevant usage scenario for this payload format
    is IP end-to-end between a server and a terminal, as shown in Figure
    2.
 
              +----------+                          +----------+
              |          |    IP/UDP/RTP/AMR-WB+    |          |
              |  SERVER  |<------------------------>| TERMINAL |
              |          |                          |          |
              +----------+                          +----------+
 
               Figure 2: Server to terminal IP scenario


 4. RTP Payload Format for AMR-WB+
 
    The AMR-WB+ payload format is different from the AMR and AMR-WB
    payload formats [7].  The structure is simpler and consists only of
    a table of contents and the audio data.  The payload format has two
    modes, the basic mode and the interleaved mode.  The main
    structural difference between the two modes is the extension of the
    table of contents with a timestamp offset field in interleaved
    mode.
 
    As the AMR-WB+ codec contains all the functionality of the AMR-WB
    codec, anyone supporting the AMR-WB+ codec and this payload format
    is RECOMMENDED to also implement the payload format in RFC 3267 [7]
    for the AMR-WB modes.  This will significantly help
    interoperability with other devices that only support AMR-WB, in
    applications and scenarios where this is possible.  Otherwise, an
    end-point that is in fact capable of everything except the RTP
    payload format for AMR-WB will not be able to communicate with such
    devices.
 
    The basic mode supports aggregation of multiple consecutive frames
    in a payload.  The interleaved mode supports aggregation of
    multiple frames that are non-consecutive in time.  It is possible
    to have frames of different internal sampling frequency in the same
    payload.  Frequent switching of the internal sampling frequency is
    not expected.  Nevertheless, to avoid problems arising from the
    codec's restriction that the ISF can be switched only on super
    frame [1] boundaries, the payload format allows switching at any
    frame in the payload.
 
    The payload format is designed around the property that the AMR-WB+
    frames can be sorted and identified based on the RTP timestamp of
    each audio frame. For example, the timestamp of the audio frames is
    used to identify duplicates. The timestamp is also used in the
    deinterleaving buffer to regenerate the correct order of the frames
    before decoding.
 
    The interleaving scheme of this payload format is significantly more
    flexible than the one present in RFC 3267. The AMR and AMR-WB
    payload format is only capable of using periodic patterns with
    frames taken from an interleaving group at fixed intervals. The
    interleaving scheme defined here allows any pattern, as long as the
    time difference between any two frames that are adjacent in the
    payload is no more than 0.91 seconds, i.e. the maximum timestamp
    offset field value divided by the RTP timestamp rate (65535/72000).
    Even that limit can be extended by inserting extra NO_DATA frames.
 
    To allow for error resiliency through redundant transmission, the
    periods covered by multiple packets MAY overlap in time.  A receiver
    MUST be prepared to receive any audio frame multiple times.  All
    copies of a frame that is sent multiple times MUST use the same
    mode (or NO_DATA) and internal sampling frequency, and MUST have
    the same RTP timestamp.

    The payload is always made an integral number of octets long by
    padding with zero bits if necessary.  If additional padding is
    required to bring the payload length to a larger multiple of octets
    or for some other purpose, then the P bit in the RTP header MAY be
    set and padding appended as specified in [3].
 
 
 4.1. RTP Header Usage
 
    The format of the RTP header is specified in [3].  This payload
    format uses the fields of the header in a manner consistent with
    that specification.
 
    The RTP timestamp corresponds to the sampling instant of the first
    sample encoded for the first frame in the packet.  The timestamp
    clock frequency SHALL be 72000 Hz. This frequency makes the frame
    duration an integer number of RTP timestamp ticks for the internal
    sampling frequencies in use, and gives reasonable conversion factors
    to the audio sampling frequencies in use. See section 4.3.1 for how
    to derive the RTP timestamp for any audio frame beyond the first
    one.
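
    As an informal illustration of the 72000 Hz clock, the following
    Python sketch computes the frame duration in RTP timestamp ticks
    for a given internal sampling frequency. It assumes 512 samples
    per frame, as stated for AMR-WB+ transport frames in section 4.3.2.
    The function name is hypothetical and not part of this
    specification.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 72000      # timestamp clock rate required by section 4.1
SAMPLES_PER_FRAME = 512   # samples per AMR-WB+ transport frame (assumed)


def frame_duration_ticks(isf_hz: int) -> int:
    """Return the duration of one frame in RTP timestamp ticks."""
    ticks = Fraction(SAMPLES_PER_FRAME * RTP_CLOCK_HZ, isf_hz)
    if ticks.denominator != 1:
        # Reject any ISF that would not map to a whole number of ticks.
        raise ValueError(f"ISF {isf_hz} Hz gives a non-integer tick count")
    return int(ticks)


# The two internal sampling frequencies mentioned in this section:
ticks_32k = frame_duration_ticks(32000)   # 16 ms frames
ticks_25k6 = frame_duration_ticks(25600)  # 20 ms frames
```

    With these inputs the sketch yields 1152 and 1440 ticks, matching
    the values used in section 4.3.1.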
 
    The RTP header marker bit (M) SHALL be set to 1 if the first frame
    carried in the packet is the first audio frame in a talkspurt.  For
    all other packets the marker bit SHALL be set to zero (M=0).
 
    The assignment of an RTP payload type for this new packet format is
    outside the scope of this document, and will not be specified here.
    It is expected that the RTP profile under which this payload format
    is being used will assign a payload type for this encoding or
    specify that the payload type is to be bound dynamically.
 
    The MIME parameter "channels" is used to indicate the maximum number
    of channels allowed to be used for a given payload. A payload type
    for which channels=1 (mono) has been declared SHALL only carry mono
    content, while a payload type for which channels=2 has been
    declared MAY carry both mono and stereo content.
 
 
 4.2. Payload Structure
 
    The complete payload consists of a payload table of contents, and
    audio data representing one or more audio frames.  The following
    diagram shows the general payload format layout:
 
    +-------------------+----------------
    | table of contents | audio data ...
    +-------------------+----------------
 
    Payloads containing more than one audio frame are called compound
    payloads.

    The following sections describe the variations taken by the payload
    format depending on whether the AMR-WB+ session is set up to use the
    basic mode or interleaved mode.
 
 4.3. Payload definitions
 
 
 4.3.1. The Payload Table of Contents
 
    The table of contents (ToC) consists of a list of ToC entries where
    each entry corresponds to an audio frame carried in the payload,
    i.e.,
 
    +----------------+----------------+- ... -+----------------+
    |  ToC entry #1  |  ToC entry #2  |          ToC entry #N  |
    +----------------+----------------+- ... -+----------------+
 
    When multiple frames are present in a packet, the ToC entries SHALL
    be placed in the packet in order of their creation time.
 
    All fields in the RTP payload are in network byte order, i.e. with
    the leftmost bit being the most significant.
 
    A ToC entry takes the following format:
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F| Mode Index  |R|R|R|ISF mode | Timestamp offset (optional)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    F (1 bit): If set to 1, indicates that this frame is followed by
       another audio frame in this payload; if set to 0, indicates that
       this frame is the last frame in this payload.
 
    Mode Index (7 bits): Indicates the audio codec mode used for the
       corresponding frame, i.e. the combination of AMR-WB+ core and
       stereo mode, the AMR-WB mode, or comfort noise, as specified by
       Table 5 in section 3.1.
 
    ISF mode (5 bits): Indicates the internal sampling frequency
       employed for the corresponding frame. The index values correspond
       to internal sampling frequency as specified in Table 4 in section
       3.1. This field SHALL be set to 0 for Mode Index values 0-13.
 
    Timestamp offset (16 bits): This field SHALL be present when using
       interleaved mode, and SHALL NOT be present otherwise. The field
       indicates the number of RTP timestamp ticks by which this frame
       is offset from the previous frame's RTP timestamp value. The
       timestamp offset for the first audio frame SHALL be 0. The field
       is in network byte order.
 
    R: Reserved bit, SHALL be set to 0 and SHALL be ignored by
       receivers.
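
    The field layout above can be read with a few bit operations. The
    following Python sketch is one possible way to walk the ToC of a
    received payload. The names TocEntry and parse_toc are hypothetical,
    and the caller is assumed to know from session setup whether
    interleaved mode is in use.

```python
import struct
from typing import List, NamedTuple, Optional, Tuple


class TocEntry(NamedTuple):
    follow: bool              # F bit: another ToC entry follows
    mode_index: int           # 7-bit Mode Index
    isf_mode: int             # 5-bit ISF mode
    ts_offset: Optional[int]  # 16-bit offset, interleaved mode only


def parse_toc(payload: bytes, interleaved: bool) -> Tuple[List[TocEntry], int]:
    """Parse the ToC; return the entries and the number of octets used."""
    entry_len = 4 if interleaved else 2
    entries: List[TocEntry] = []
    pos = 0
    while True:
        if pos + entry_len > len(payload):
            raise ValueError("payload ends inside the table of contents")
        first, second = payload[pos], payload[pos + 1]
        follow = bool(first & 0x80)
        mode_index = first & 0x7F
        isf_mode = second & 0x1F  # the three reserved bits are ignored
        ts_offset = None
        if interleaved:
            # Network byte order, unsigned 16 bits.
            (ts_offset,) = struct.unpack_from("!H", payload, pos + 2)
        entries.append(TocEntry(follow, mode_index, isf_mode, ts_offset))
        pos += entry_len
        if not follow:
            return entries, pos
```

    The second return value is the offset at which the audio data
    starts.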
 
    The RTP timestamp value for a frame is the timestamp value of the
    first sample encoded in the frame. How the timestamp value for a
    frame is derived depends on whether basic or interleaved mode is
    used. In both cases the first frame in a compound packet has an RTP
    timestamp equal to the one given in the RTP header. In the basic
    mode, the RTP timestamp of any subsequent frame is derived by
    adding the frame durations of all the previous frames in the
    payload to the RTP header timestamp value. For example, if the RTP
    header timestamp value is 12345 and the frame duration is 16 ms
    (internal sampling frequency = 32 kHz, i.e. 1152 RTP timestamp
    ticks), then the RTP timestamp of a fourth frame present in the
    payload will be 12345 + 3 * 1152 = 15801.
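
    The basic mode derivation can be sketched in Python as follows. The
    function name is hypothetical, and the per-frame durations in ticks
    are assumed to have been determined already from each ToC entry.
    The reduction modulo 2^32 reflects the 32-bit RTP timestamp field.

```python
from typing import List, Sequence

TS_MODULO = 2 ** 32  # RTP timestamps are 32-bit values


def basic_mode_timestamps(header_ts: int,
                          durations: Sequence[int]) -> List[int]:
    """Return the RTP timestamp of every frame in a basic mode payload.

    durations[i] is the duration of frame i in RTP timestamp ticks.
    """
    timestamps = []
    ts = header_ts
    for duration in durations:
        timestamps.append(ts % TS_MODULO)
        ts += duration  # the next frame starts where this one ends
    return timestamps


# Four 16 ms frames (1152 ticks each) starting at timestamp 12345.
example = basic_mode_timestamps(12345, [1152, 1152, 1152, 1152])
```

    The last element of the example list is 15801, the value derived in
    the text above.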
 
    In interleaved mode the RTP timestamp of a frame is derived by
    adding the RTP header timestamp field and the timestamp offset
    fields of all ToC entries up to and including the one for that
    frame, using modulo 2^32 arithmetic. As an example, consider a
    compound packet with the following header and ToC information:

    RTP header TS: 12345
    Frame 1 offset field: 0
    Frame 2 offset field: 13824
    Frame 3 offset field: 18432

    The timestamp of frame 3 is then (12345 + 0 + 13824 + 18432) %
    2^32 = 44601, where % stands for the modulo operation.
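
    A minimal Python sketch of the interleaved mode derivation is given
    below. The function name is hypothetical; the offsets are the
    timestamp offset fields of the ToC entries in the order they appear
    in the payload.

```python
from typing import List, Sequence

TS_MODULO = 2 ** 32  # RTP timestamps are 32-bit values


def interleaved_mode_timestamps(header_ts: int,
                                offsets: Sequence[int]) -> List[int]:
    """Return the RTP timestamp of every frame in an interleaved payload."""
    timestamps = []
    ts = header_ts
    for offset in offsets:
        # Each offset is relative to the previous frame's timestamp.
        ts = (ts + offset) % TS_MODULO
        timestamps.append(ts)
    return timestamps


# The header and ToC values from the example above.
example = interleaved_mode_timestamps(12345, [0, 13824, 18432])
```

    For the example values this gives 12345, 26169 and 44601 for frames
    1, 2 and 3 respectively.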
 
    The values of the mode index are defined in Table 5 in Section 3.1.
    MI=14 (AUDIO_LOST) is used to indicate frames that are lost. A
    NO_DATA (MI=15) frame can mean either that no data was produced by
    the audio encoder for that frame or that no data for that frame is
    transmitted in the current payload (i.e., valid data for that frame
    could be sent in either an earlier or later packet). The duration
    of these non-included frames depends on the internal sampling
    frequency indicated by the ISF mode field.
 
 
    For modes with index 0-13 the ISF mode field SHALL be set to 0 and
    has no meaning. The frame length for these modes is fixed to 20 ms
    in time, which is an RTP timestamp duration of 1440 ticks.

    If a ToC entry with an undefined MI value is received, the whole
    packet SHOULD be discarded.  This is to avoid the loss of data
    synchronization in the depacketization process, which can result in
    a severe degradation in audio quality.
 
    Note that packets containing only NO_DATA frames SHOULD NOT be
    transmitted.  Also, NO_DATA frames at the end of a frame sequence to
    be carried in a payload SHOULD NOT be included in the transmitted
    packet.  The AMR-WB+ SCR/DTX is identical with AMR-WB SCR/DTX
    described in [5] and SHALL only be used in combination with the AMR-
    WB modes (0-8).
 
    When multiple frames are present, their ToC entries are placed in
    the ToC in order of their creation time, independent of the payload
    mode. In basic mode the frames will be consecutive in time, while in
    interleaved mode the frames may not only be non-consecutive in time
    but may even have varying inter-frame distances.
 
    The following figure shows an example of a ToC of three entries in
    basic mode.
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |1| Mode Index1 |0|0|0|ISF mode1|1| Mode Index2 |0|0|0|ISF mode2|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0| Mode Index3 |0|0|0|ISF mode3|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    The following figure shows an example of a ToC of three entries in
    interleaved mode.
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |1| Mode Index1 |0|0|0|ISF mode1| Timestamp offset 1            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |1| Mode Index2 |0|0|0|ISF mode2| Timestamp offset 2            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0| Mode Index3 |0|0|0|ISF mode3| Timestamp offset 3            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
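
    On the sending side, the same layouts can be produced as sketched
    below in Python. The function name and the tuple representation of
    an entry are hypothetical. The timestamp offset element is ignored
    in basic mode.

```python
import struct
from typing import Sequence, Tuple

# (mode_index, isf_mode, ts_offset); ts_offset is unused in basic mode.
Entry = Tuple[int, int, int]


def build_toc(entries: Sequence[Entry], interleaved: bool) -> bytes:
    """Serialize ToC entries in payload order."""
    if not entries:
        raise ValueError("a payload needs at least one ToC entry")
    out = bytearray()
    last = len(entries) - 1
    for i, (mode_index, isf_mode, ts_offset) in enumerate(entries):
        if not 0 <= mode_index <= 0x7F:
            raise ValueError("Mode Index must fit in 7 bits")
        if not 0 <= isf_mode <= 0x1F:
            raise ValueError("ISF mode must fit in 5 bits")
        f_bit = 0x80 if i < last else 0x00  # F=0 marks the last entry
        out.append(f_bit | mode_index)
        out.append(isf_mode)                # reserved bits stay 0
        if interleaved:
            if i == 0 and ts_offset != 0:
                raise ValueError("first timestamp offset SHALL be 0")
            if not 0 <= ts_offset <= 0xFFFF:
                raise ValueError("timestamp offset must fit in 16 bits")
            out += struct.pack("!H", ts_offset)
    return bytes(out)


basic_toc = build_toc([(26, 8, 0)] * 3, interleaved=False)
interleaved_toc = build_toc(
    [(40, 10, 0), (40, 10, 13824), (40, 10, 18432)], interleaved=True)
```

    The basic mode ToC built here is 6 octets long and the interleaved
    one 12 octets, in line with the two figures above.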
 
 4.3.2. Audio Data
 
    Audio data of a payload contains one or more audio frames or comfort
    noise frames, as described in the ToC of the payload.
 
       Note, for ToC entries with MI=14 or 15, there will be no
       corresponding audio frame present in the audio data.

    Each audio frame for an extension mode represents an AMR-WB+
    transport frame containing the encoding of 512 samples of audio
    sampled with the internal sampling frequency specified by the ISF
    mode indicator. The modes with index 10-13 are an exception: they
    are only capable of using a single internal sampling frequency
    (25600 Hz). The encoding modes (core and stereo) are indicated in
    the mode index field of the corresponding ToC entry.  The octet
    length of the audio frame is implicitly defined by the mode
    indicated in the mode index field.  The order and numbering
    notation of the bits are as specified in [1].  As specified there,
    the bits of the AMR-WB audio frames (mode indices in range 0...8)
    have been rearranged in order of decreasing sensitivity. For the
    AMR-WB+ modes and comfort noise frames, the bits are in the order
    produced by the encoder.  The resulting bit sequence for a frame of
    length K bits is denoted d(0), d(1), ..., d(K-1). The last octet of
    each audio frame MUST be padded with zeroes at the end if not all
    bits in the octet are used.  In other words, each audio frame MUST
    be octet-aligned.
 
 
 4.3.3. Methods for Forming the Payload
 
    The payload begins with the table of contents, consisting of a list
    of ToC entries. Each entry is two bytes long in basic mode and four
    bytes long in interleaved mode.

    The audio data follows the table of contents. All of the octets
    comprising an audio frame are appended to the payload as a unit. The
    audio frames are packed in the same order as their corresponding ToC
    entries are arranged in the ToC list, with the exception that if a
    given frame has a ToC entry with MI=14 or 15, there will be no data
    octets present for that frame.
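
    The packing rule can be illustrated by the following Python sketch,
    which splits the audio data of a payload into frames. The function
    name is hypothetical, and the frame_bits mapping from mode index to
    frame length in bits is an assumption supplied by the caller, since
    the frame sizes are defined by the codec specification and not in
    this section.

```python
from typing import List, Mapping, Optional, Sequence

NO_AUDIO_MODES = (14, 15)  # AUDIO_LOST and NO_DATA carry no audio octets


def split_audio_data(mode_indices: Sequence[int],
                     audio: bytes,
                     frame_bits: Mapping[int, int]) -> List[Optional[bytes]]:
    """Return one item per ToC entry; None where no audio data is present."""
    frames: List[Optional[bytes]] = []
    pos = 0
    for mi in mode_indices:
        if mi in NO_AUDIO_MODES:
            frames.append(None)
            continue
        octets = (frame_bits[mi] + 7) // 8  # frames are octet-aligned
        if pos + octets > len(audio):
            raise ValueError("audio data shorter than the ToC implies")
        frames.append(audio[pos:pos + octets])
        pos += octets
    if pos != len(audio):
        raise ValueError("unexpected octets after the last frame")
    return frames


# Frame length for MI=26 taken from Example 1 in section 4.3.4.1.
sizes = {26: 280}
audio = bytes(range(70))
frames = split_audio_data([26, 15, 26], audio, sizes)
```

    Here the second ToC entry is a NO_DATA frame, so the 70 octets of
    audio data are split into two 35-octet frames with an empty slot
    between them.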
 
 
 4.3.4. Payload Examples
 
 4.3.4.1. Example 1, Basic Payload Carrying Multiple Frames
 
    The following diagram shows a payload from a session that carries
    three AMR-WB+ frames of 14 kbps coding mode (MI=26) with a frame
    length of 280 bits. The internal sampling frequency in this example
    is 25.6 kHz (ISF mode = 8).
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |1| Mode Index1 |0|0|0|ISF mode1|1| Mode Index2 |0|0|0|ISF mode2|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0| Mode Index3 |0|0|0|ISF mode3|   f1(0..7)    |   f1(8..15)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | f1(272..279)  |   f2(0..7)    | ...                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                                           | f2(272..279)  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |   f3(0..7)    | ...                                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                           | f3(272..279)  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
 
 4.3.4.2. Example 2, Payload in Interleaved mode
 
    This example shows a payload carrying three frames of the 24 kbps
    stereo coding mode (MI=40).  This payload uses the interleaved
    mode.  The frames 1, 2 and 3 are not consecutive, and may for
    example be frames 1, 9, and 17 of a sequence in playout order. The
    internal sampling frequency in this example is 32 kHz (ISF mode =
    10).
 
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |1| Mode Index1 |0|0|0|ISF mode1| Timestamp offset 1            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |1| Mode Index2 |0|0|0|ISF mode2| Timestamp offset 2            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0| Mode Index3 |0|0|0|ISF mode3| Timestamp offset 3            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |   f1(0..7)    |   f1(8..15)   |  f1(16..23)   |  f1(24..31)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | f1(448..455)  | f1(456..463)  | f1(464..471)  | f1(472..479)  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |   f2(0..7)    |   f2(8..15)   |  f2(16..23)   |  f2(24..31)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | f2(448..455)  | f2(456..463)  | f2(464..471)  | f2(472..479)  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |   f3(0..7)    |   f3(8..15)   |  f3(16..23)   |  f3(24..31)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | f3(448..455)  | f3(456..463)  | f3(464..471)  | f3(472..479)  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
 
 
 
 Sjoberg, et. al.            Standards Track                 [Page 19]


 INTERNET-DRAFT       RTP payload format for AMR-WB+     June 17, 2004
 
 
 
 
 4.4. Interleaving Considerations
 
    The new, more flexible interleaving scheme requires some further
    usage considerations. As presented in the example in Section 3.6.2,
    an interleaving pattern requires a deinterleaving buffer of a
    certain size. The required buffer space, expressed as a number of
    frame slots, is indicated using the "interleaving" MIME parameter.
    The number of frame slots needed can be converted into an actual
    memory requirement by considering the largest (in bytes)
    combination of AMR-WB+ core and stereo mode.

    However, the frame buffer size is not always sufficient to
    determine when it is appropriate to start consuming frames from the
    interleaving buffer. Two cases exist: switching of the internal
    sampling frequency, and changes of the interleaving pattern. For
    this reason the "int-delay" MIME parameter is defined. It allows a
    sender to indicate the minimal media time that needs to be present
    in the buffer before starting to consume media from the buffer.
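
    The interaction of the two parameters can be sketched in Python as
    shown below. This is an illustrative receiver model only, not a
    required design. The class and method names are hypothetical, RTP
    timestamp wrap-around is ignored for brevity, and raising an error
    when the buffer is full is an assumption made for the sketch.

```python
from typing import Dict, Tuple


class DeinterleavingBuffer:
    """Holds frames keyed by RTP timestamp until playout may start."""

    def __init__(self, max_frames: int, int_delay_ticks: int) -> None:
        self.max_frames = max_frames            # "interleaving" value
        self.int_delay_ticks = int_delay_ticks  # "int-delay" value
        self._frames: Dict[int, bytes] = {}
        self._started = False

    def insert(self, timestamp: int, frame: bytes) -> bool:
        """Store a frame; return False for a duplicate timestamp."""
        if timestamp in self._frames:
            return False  # redundant copy of a frame already held
        if len(self._frames) >= self.max_frames:
            raise OverflowError("sender exceeded the signalled buffer size")
        self._frames[timestamp] = frame
        return True

    def ready(self) -> bool:
        """True once enough media time is buffered to start consuming."""
        if not self._frames:
            return False
        if self._started:
            return True
        span = max(self._frames) - min(self._frames)
        return span >= self.int_delay_ticks

    def pop_earliest(self) -> Tuple[int, bytes]:
        """Remove and return the frame with the lowest timestamp."""
        self._started = True
        ts = min(self._frames)
        return ts, self._frames.pop(ts)
```

    In this model, consumption starts only when the timestamp span
    between the earliest and latest buffered frame reaches the
    "int-delay" value, and frames are then handed to the decoder in
    timestamp order.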
 
 4.5. Implementation Considerations
 
    An application implementing this payload format MUST understand all
    the payload parameters in the out-of-band signaling used.  For
    example, if an application uses SDP, all the SDP and MIME parameters
    in this document MUST be understood.  This requirement ensures that
    an implementation can always decide whether or not it is capable of
    communicating.
 
    Both basic and interleaved mode SHALL be implemented. The
    implementation burden of both is rather small, and requiring both
    ensures interoperability. It is also RECOMMENDED to implement the
    AMR-WB format in RFC 3267 [7], for applications or scenarios where
    interoperability with AMR-WB-only codecs is necessary.
 
    When doing error concealment, certain precautions are needed due to
    the possibility of switching of the internal sampling frequency. The
    problem is that error concealment may use incorrect frame lengths
    unless at least one audio frame that is later than the frame to
    conceal, together with its timestamp value, is available when the
    concealment is performed. In the worst case this can make some of
    the subsequent frames unusable.
 
    Example:
    Frame nr      :  1   2   3   4
    Frame Len (ms):  20  15  15  15
 
    Assume that one has received frame 1, but none of the following
    frames. When it is time to decode the next frame, the decoder is
    going to conceal frame 2. However, as this frame was lost, one does
    not know that this frame represents 15 ms instead of the previous
    20 ms. When the receiver then gets frame nr 4, it can determine
    that it should have concealed 30 ms to cover the missing frames 2
    and 3, either as one 30 ms frame, or as several frames adding up to
    30 ms.
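
    The amount of time to conceal can be computed from the timestamps
    once a later frame has arrived. The following Python sketch shows
    the arithmetic for the example above. The function names are
    hypothetical, and frame 1 is assumed to start at timestamp 0.

```python
RTP_CLOCK_HZ = 72000
TS_MODULO = 2 ** 32  # RTP timestamps are 32-bit values


def missing_ticks(last_ts: int, last_duration: int, next_ts: int) -> int:
    """Ticks between the end of the last good frame and the next one."""
    return (next_ts - last_ts - last_duration) % TS_MODULO


def ticks_to_ms(ticks: int) -> float:
    return ticks * 1000 / RTP_CLOCK_HZ


# Frame 1: 20 ms = 1440 ticks, starting at timestamp 0.
# Frames 2 and 3 (15 ms = 1080 ticks each) are lost.
# Frame 4 therefore starts at 1440 + 2 * 1080 = 3600.
gap = missing_ticks(last_ts=0, last_duration=1440, next_ts=3600)
gap_ms = ticks_to_ms(gap)
```

    The gap evaluates to 2160 ticks, i.e. the 30 ms that should have
    been concealed for frames 2 and 3.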
 
    This is something a receiver implementation will need to consider
    and handle appropriately for the application. A rather basic
    solution is to be capable of removing the extra time generated by
    the wrongly concealed frame, thus allowing a receiver to at least
    maintain synchronization.  As the problem is due to the switching
    of internal sampling frequency, and such switches are expected to
    be scarce, the problem is a minor one.
 
 
 5. Congestion Control
 
    The general congestion control considerations for transporting RTP
    data apply to AMR-WB+ audio over RTP as well, see RTP [3] and any
    applicable RTP profile like AVP [9].  However, the multi-rate
    capability of AMR-WB+ audio coding provides a mechanism for
    controlling congestion, since the bandwidth demand can be adjusted
    by selecting a different coding mode or lower internal sampling
    rate.
 
    Another parameter that may impact the bandwidth demand for AMR-WB+
    is the number of frames that are encapsulated in each RTP payload.
    Packing more frames in each RTP payload can reduce the number of
    packets sent and hence the overhead from IP/UDP/RTP headers, at the
    expense of increased delay and reduced error robustness against
    packet losses.
 
    If forward error correction (FEC) is used to combat packet loss, the
    amount of redundancy added by FEC will need to be regulated so that
    the use of FEC itself does not cause a congestion problem.
 
 6. Security Considerations
 
    RTP packets using the payload format defined in this specification
    are subject to the general security considerations discussed in RTP
    [3]. As this format transports encoded audio, the main security
    issues include confidentiality, integrity protection, and
    authentication of the audio itself.  The payload format itself does
    not have any built-in security mechanisms. Any suitable external
    mechanisms, such as SRTP [10], MAY be used.
 
    Neither this payload format nor the AMR-WB+ decoder exhibits any
    significant non-uniformity in the receiver side computational
    complexity for packet processing, and thus they are unlikely to
    pose a denial-of-service threat due to the receipt of pathological
    data.
 
 6.1. Confidentiality

    To achieve confidentiality of the encoded AMR-WB+ audio, all audio
    data bits will need to be encrypted.  There is less need to encrypt
    the payload header or the table of contents, because 1) they only
    carry information about the frame type, and 2) this information
    could be useful to some third party, e.g., for quality monitoring.
 
    As long as the AMR-WB+ payload is only packed and unpacked at either
    end, encryption may be performed after packet encapsulation so that
    there is no conflict between the two operations.
 
 
 6.2. Authentication and Integrity
 
    To authenticate the sender of the audio and provide integrity
    protection, an external mechanism has to be used.  It is RECOMMENDED
    that such a mechanism protect all the audio data bits and the RTP
    header.
 
    Data tampering by a man-in-the-middle attacker could result in
    erroneous depacketization/decoding that could lower the audio
    quality.
 
    To prevent a man-in-the-middle attacker from tampering with the
    payload packets, some additional information besides the audio bits
    SHOULD be protected.  This may include the ToC, RTP timestamp, RTP
    sequence number, RTP payload type, and the RTP marker bit.
 
 6.3. Decoding Validation
 
    When processing a received payload packet, if the receiver finds
    that the calculated payload length, based on the information of the
    session and the values found in the payload header fields, does not
    match the size of the received packet, the receiver SHOULD discard
    the packet.  This is because decoding a packet that has errors in
    its length field could severely degrade the audio quality.
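
    One possible form of this check is sketched below in Python. The
    function names are hypothetical, and the frame_bits mapping from
    mode index to frame length in bits is an assumption supplied by
    the caller. The two entries shown are the frame lengths used in
    the examples of section 4.3.4. Any RTP padding signalled by the P
    bit is assumed to have been removed before the check.

```python
from typing import Mapping, Sequence

NO_AUDIO_MODES = (14, 15)  # AUDIO_LOST and NO_DATA carry no audio octets


def expected_payload_octets(mode_indices: Sequence[int],
                            interleaved: bool,
                            frame_bits: Mapping[int, int]) -> int:
    """Payload length implied by the session mode and the ToC entries."""
    toc_entry_octets = 4 if interleaved else 2
    total = toc_entry_octets * len(mode_indices)
    for mi in mode_indices:
        if mi not in NO_AUDIO_MODES:
            total += (frame_bits[mi] + 7) // 8  # octet-aligned frames
    return total


def payload_length_ok(payload_len: int,
                      mode_indices: Sequence[int],
                      interleaved: bool,
                      frame_bits: Mapping[int, int]) -> bool:
    """False means the receiver SHOULD discard the packet."""
    expected = expected_payload_octets(mode_indices, interleaved, frame_bits)
    return payload_len == expected


sizes = {26: 280, 40: 480}  # frame lengths in bits from the examples
example_1 = expected_payload_octets([26, 26, 26], False, sizes)
example_2 = expected_payload_octets([40, 40, 40], True, sizes)
```

    For the payloads of Example 1 and Example 2 this gives 111 and 192
    octets respectively.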
 
 7. Payload Format Parameters
 
    This section defines the parameters that may be used to select
    features of the AMR-WB+ payload format.  The parameters are defined
    here as part of the MIME subtype registration for the AMR-WB+ audio
    codec.  A mapping of the parameters into the Session Description
    Protocol (SDP) [6] is also provided for those applications that use
    SDP.  Equivalent parameters could be defined elsewhere for use with
    control protocols that do not use MIME or SDP.
 
    The data format and parameters are only specified for real-time
    transport in RTP.


 7.1. MIME Registration
 
    The MIME subtype for the Extended Adaptive Multi-Rate Wideband (AMR-
    WB+) codec is allocated from the IETF tree since AMR-WB+ is expected
    to be a widely used audio codec in general streaming applications.
 
    Note, any unspecified parameter MUST be ignored by the receiver.
 
    Media Type name:     audio
 
    Media subtype name:  AMR-WB+
 
    Required parameters:
 
    None
 
    Optional parameters:
 
    These parameters apply to RTP transfer only.
 
    channels:       The maximum number of audio channels present in the
                    audio frames. Permissible values are 1 (mono) or 2
                    (stereo).  If no parameter is present, the maximum
                    number of channels is 2 (stereo).
 
 
    interleaving:   Indicates that frame level interleaving mode SHALL
                    be used for the payload and its value defines the
                    maximum number of frames allowed in an interleaving
                    buffer (see Section 4.4).  If this parameter is not
                    present, interleaving SHALL NOT be used.
 
    int-delay:      The minimal media time delay in RTP timestamp ticks
                    that is needed in the deinterleaving buffer, i.e.
                    the difference in RTP timestamp between the earliest
                    and latest audio frame present in the deinterleaving
                    buffer, to ensure correct decoding.
 
    ptime:          see RFC 2327 [6].
 
    maxptime:       see Section 8 in RFC 3267 [7].
 
 
    Encoding considerations:
                 This type is only defined for transfer via RTP (STD 64)
                 and as described in Section 4 of RFC XXXX.
 
    Security considerations:
                 See Section 6 of RFC XXXX.
 
    Public specification:
                 Please refer to Section 10 of RFC XXXX.
 
    Additional information:
                 File storage of the AMR-WB+ format is to be specified
                 within the 3GPP defined ISO based multimedia file
                 format defined in 3GPP TS 26.244, see reference [14] of
                 RFC XXXX. The file format has the MIME types
                 "audio/3GPP" or "video/3GPP" as defined by RFC YYYY
                 [15].
 
                 To maintain interoperability with AMR-WB capable end-
                 points, in cases where negotiation is possible and the
                 AMR-WB+ end-point supporting this format also supports
                 RFC 3267 for AMR-WB transport, an AMR-WB+ end-point
                 SHOULD declare itself also as AMR-WB capable (i.e.
                 supporting also "audio/AMR-WB" as specified in RFC
                 3267).
 
                 As the AMR-WB+ decoder is capable of performing stereo
                 to mono conversions, all receivers of AMR-WB+ should be
                 able to receive both stereo and mono, even if the
                 receiver is only capable of playout of mono signals.
 
    Person & email address to contact for further information:
                 johan.sjoberg@ericsson.com
                 ari.lakaniemi@nokia.com
 
    Intended usage: COMMON.
                 It is expected that many IP based streaming
                 applications will use this type.
 
    Author/Change controller:
                 johan.sjoberg@ericsson.com
                 ari.lakaniemi@nokia.com
                 IETF Audio/Video transport working group
 
 
 7.2. Mapping MIME Parameters into SDP
 
    The information carried in the MIME media type specification has a
    specific mapping to fields in the Session Description Protocol (SDP)
    [6], which is commonly used to describe RTP sessions.  When SDP is
    used to specify sessions employing the AMR-WB+ codec, the mapping is
    as follows:
 
    -  The MIME type ("audio") goes in SDP "m=" as the media name.
 
    -  The MIME subtype (payload format name) goes in SDP "a=rtpmap" as
       the encoding name.  The RTP clock rate in "a=rtpmap" SHALL be
       72000 for AMR-WB+, and the encoding parameter giving the number
       of channels MUST either be explicitly set to 1 or 2, or be
       omitted, implying the default value of 2.
 
    -  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
       "a=maxptime" attributes, respectively.
 
    -  Any remaining parameters go in the SDP "a=fmtp" attribute by
       copying them directly from the MIME media type string as a
       semicolon separated list of parameter=value pairs.
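
    The mapping rules above can be illustrated with the Python sketch
    below, which assembles the corresponding SDP lines. The function
    name is hypothetical. The port number, the payload type 99, the
    "RTP/AVP" profile and the parameter values are assumptions chosen
    only for the illustration.

```python
from typing import List, Mapping, Optional


def sdp_media_lines(port: int,
                    payload_type: int,
                    channels: Optional[int] = None,
                    fmtp: Optional[Mapping[str, int]] = None,
                    ptime: Optional[int] = None,
                    maxptime: Optional[int] = None) -> List[str]:
    """Build SDP lines for one AMR-WB+ media stream."""
    lines = [f"m=audio {port} RTP/AVP {payload_type}"]
    rtpmap = f"a=rtpmap:{payload_type} AMR-WB+/72000"
    if channels is not None:
        if channels not in (1, 2):
            raise ValueError("channels must be 1 or 2")
        rtpmap += f"/{channels}"  # omitted means the default of 2
    lines.append(rtpmap)
    if fmtp:
        # Remaining MIME parameters as a semicolon separated list.
        params = "; ".join(f"{name}={value}" for name, value in fmtp.items())
        lines.append(f"a=fmtp:{payload_type} {params}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines


example = sdp_media_lines(
    49120, 99, channels=2,
    fmtp={"interleaving": 30, "int-delay": 86400},
    maxptime=100)
```

    The example produces an "m=" line, an "a=rtpmap" line with clock
    rate 72000 and two channels, an "a=fmtp" line carrying the
    "interleaving" and "int-delay" parameters, and an "a=maxptime"
    line.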
 
 
 7.2.1. Offer-Answer Model Considerations
 
    To achieve good interoperability for the AMR-WB+ RTP payload when
    SDP is used for Offer-Answer [8] negotiation, the following
    considerations should be made.

    In negotiated offer/answer usage the parameters SHALL be
    interpreted as follows:
 
   -   The "interleaving" parameter is declarative. For streams
       declared as sendrecv or recvonly, it indicates that the receiver
       will accept payloads using the interleaved mode of the payload
       format, and the value declares the amount of buffer space the
       receiver has available for the sender to utilize. For sendonly
       streams the parameter indicates the desired configuration and
       amount of buffer space. An answerer is RECOMMENDED to accept the
       offered value if capable of using it.

   -   The "int-delay" parameter is declarative. For streams declared
       as sendrecv or recvonly the value indicates the maximum initial
       delay the receiver will accept in the deinterleaving buffer. For
       sendonly streams the value is the amount of media time the sender
       desires to use; the value SHOULD be copied into any response.
 
    -  The "channels" parameter is declarative. For "sendonly" streams,
       it indicates the desired channel usage: stereo and mono, or mono
       only. For "recvonly" and "sendrecv" streams, the parameter
       indicates what the receiver is willing to accept. As any
       receiver is capable of receiving stereo mode and performing
       local mixing with the AMR-WB+ decoder, there should normally be
       no reason to restrict a session to mono only. However, certain
       applications may need this indication in cases where the actual
       front-end needs to be indicated.
 
    -  The "ptime" parameter works as indicated by the offer/answer
       model [8]; "maxptime" SHALL be used in the same way.
 
   -   To maintain interoperability with AMR-WB in cases where
       negotiation is possible, an AMR-WB+ capable end-point which also
       implements the AMR-WB payload format [7] is RECOMMENDED to also
       declare itself capable of AMR-WB as it is a subset of the AMR-WB+
       codec.
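
    The handling of a "sendonly" offer described above might look as
    follows in Python. This is only a sketch: the function name, the
    local_max_frames argument, and the fallback used when the offered
    "interleaving" value exceeds the local capacity are assumptions, as
    the text does not specify that case.

```python
def answer_sendonly_offer(offered, local_max_frames):
    """Sketch of the fmtp parameters an answerer returns for a sendonly offer.

    `offered` is a hypothetical dict of the offered fmtp parameters;
    `local_max_frames` is the answerer's own deinterleaving buffer
    capacity in frames. Neither name comes from this document.
    """
    answer = {}

    if "interleaving" in offered:
        wanted = int(offered["interleaving"])
        if wanted <= local_max_frames:
            # The answerer is RECOMMENDED to accept the offered value
            # when it is capable of using it.
            answer["interleaving"] = wanted
        else:
            # ASSUMPTION: declare what the answerer can actually buffer.
            # The text does not define the behaviour for this case.
            answer["interleaving"] = local_max_frames

    if "int-delay" in offered:
        # For sendonly streams the value SHOULD be copied into the response.
        answer["int-delay"] = int(offered["int-delay"])

    return answer
```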
 
    In declarative usage, such as SDP in RTSP [16] or SAP [17], the
    parameters SHALL be interpreted as follows:
 
    -  The "interleaving" parameter, if present, configures the payload
       format in interleaved mode, and the value indicates the number
       of frames that the deinterleaving buffer is required to support
       to be able to handle this session correctly.

    -  The "int-delay" parameter indicates the initial buffering delay
       required to receive this stream correctly.

    -  The "channels" parameter indicates whether the content being
       transmitted can contain both stereo and mono modes, or only
       mono.
 
    -  All other parameters indicate the values that are being used by
       the sending entity.
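
    A receiver of a declarative session description could check the
    declared requirements against its own limits, as sketched below in
    Python. The function and argument names are hypothetical, and
    max_initial_delay is assumed to be expressed in the same unit as
    "int-delay".

```python
def can_handle_session(declared, buffer_frames, max_initial_delay):
    """Return True if the local limits cover what the session declares.

    `declared` is a hypothetical dict of the fmtp parameters found in
    the session description. A missing "interleaving" parameter is
    treated as "no interleaving", so no buffer frames are required.
    """
    needed_frames = int(declared.get("interleaving", 0))
    needed_delay = int(declared.get("int-delay", 0))
    return needed_frames <= buffer_frames and needed_delay <= max_initial_delay
```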
 
 
 7.2.2. Examples
 
    An example SDP session description utilizing AMR-WB+ mono and
    stereo encoding follows.
 
     m=audio 49120 RTP/AVP 99
     a=rtpmap:99 AMR-WB+/72000/2
     a=fmtp:99 interleaving=30; int-delay=86400
     a=maxptime:100
 
    Note that the payload format (encoding) names are commonly shown in
    upper case.  MIME subtypes are commonly shown in lower case.  These
    names are case-insensitive in both places.  Similarly, parameter
    names are case-insensitive both in MIME types and in the default
    mapping to the SDP a=fmtp attribute.
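
    The following Python sketch parses the example above and applies
    the case-insensitivity rules by lower-casing the encoding name and
    the parameter names. The function name and the layout of the
    returned dictionary are assumptions for illustration; the parser
    handles only the attributes used in this example.

```python
EXAMPLE_SDP = """\
m=audio 49120 RTP/AVP 99
a=rtpmap:99 AMR-WB+/72000/2
a=fmtp:99 interleaving=30; int-delay=86400
a=maxptime:100
"""


def parse_amrwbplus(sdp_text):
    """Extract rtpmap, fmtp and maxptime values from SDP text (sketch only)."""
    info = {"fmtp": {}}
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            parts = encoding.split("/")
            info["payload_type"] = int(pt)
            info["encoding"] = parts[0].lower()  # names are case-insensitive
            info["clock_rate"] = int(parts[1])
            if len(parts) > 2:  # the number of channels may be omitted
                info["channels"] = int(parts[2])
        elif line.startswith("a=fmtp:"):
            _, rest = line[len("a=fmtp:"):].split(None, 1)
            for pair in rest.split(";"):
                if pair.strip():
                    name, _, value = pair.partition("=")
                    info["fmtp"][name.strip().lower()] = value.strip()
        elif line.startswith("a=maxptime:"):
            info["maxptime"] = int(line[len("a=maxptime:"):])
    return info
```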
 
 8. IANA Considerations
 
    It is requested that one new MIME subtype (audio/amr-wb+) be
    registered by IANA; see Section 7.
 
 9. Acknowledgements
 
    The authors would like to thank Redwan Salami and Stefan Bruhn for
    their significant contributions made throughout the writing and
    reviewing of this document. We would also like to acknowledge
    Qiaobing Xie, coauthor of RFC 3267, on which this document is
    based.
 
 
 
 
 10. References
 
 10.1. Normative references
 
    [1]  3GPP TS 26.290 "Audio codec processing functions; Extended AMR
         Wideband codec; Transcoding functions", version 1.0.0 (2004-
         05), 3rd Generation Partnership Project (3GPP).
    [2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, Internet Engineering Task Force,
         March 1997.
    [3]  H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
         Transport Protocol for Real-Time Applications", STD 64, RFC
         3550, Internet Engineering Task Force, July 2003.
    [4]  3GPP TS 26.192 "AMR Wideband speech codec; Comfort Noise
         aspects", version 5.0.0 (2001-03), 3rd Generation Partnership
         Project (3GPP).
    [5]  3GPP TS 26.193 "AMR Wideband speech codec; Source Controlled
         Rate operation", version 5.0.0 (2001-03), 3rd Generation
         Partnership Project (3GPP).
    [6]  Handley, M. and V. Jacobson, "SDP: Session Description
         Protocol", RFC 2327, Internet Engineering Task Force, April
         1998.
    [7]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, "Real-
         Time Transport Protocol (RTP) Payload Format and File Storage
         Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-
         Rate Wideband (AMR-WB) Audio Codecs", RFC 3267, Internet
         Engineering Task Force, June 2002.
    [8]  J. Rosenberg, and H. Schulzrinne, "An Offer/Answer Model with
         the Session Description Protocol (SDP)", RFC 3264, June 2002.
 
 10.2. Informative References
 
    [9]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
         Video Conferences with Minimal Control", STD 65, RFC 3551,
         Internet Engineering Task Force, July 2003.
    [10] Baugher, et al., "The Secure Real Time Transport Protocol",
         RFC 3711, Internet Engineering Task Force, March 2004.
    [11] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
         Generic Forward Error Correction", RFC 2733, December 1999.
    [12] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley,
         M., Bolot, J., Vega-Garcia, A. and S. Fosse-Parisis, "RTP
         Payload for Redundant Audio Data", RFC 2198, September 1997.
    [13] 3GPP TS 26.233 "Packet Switched Streaming service", version
         5.0.0 (2001-03), 3rd Generation Partnership Project (3GPP).
    [14] 3GPP TS 26.244 "Transparent end-to-end packet switched
         streaming service (PSS); 3GPP file format (3GP)", version 6.0.0
         (2004-03), 3rd Generation Partnership Project (3GPP).
    [15] D. Singer, and R. Castagno, "MIME Type Registrations for 3GPP
         Multimedia files," RFC YYYY (draft-singer-avt-3gpp-mime-
         02.txt), Internet Engineering Task Force, September 2003.
    [16] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming
         Protocol (RTSP)", RFC 2326, Internet Engineering Task Force,
         April 1998.
    [17] M. Handley, C. Perkins, E. Whelan, "Session Announcement
         Protocol", RFC 2974, Internet Engineering Task Force, June
         2001.
 
    ETSI documents can be downloaded from the ETSI web server,
    "http://www.etsi.org/".  Any 3GPP document can be downloaded from
    the 3GPP webserver, "http://www.3gpp.org/", see specifications.  TIA
    documents can be obtained from "www.tiaonline.org".
 
 
 11. Authors' Addresses
 
    Johan Sjoberg
    Ericsson Research
    Ericsson AB
    SE-164 80 Stockholm, SWEDEN
 
    Phone:   +46 8 50878230
    EMail: Johan.Sjoberg@ericsson.com
 
 
    Magnus Westerlund
    Ericsson Research
    Ericsson AB
    SE-164 80 Stockholm, SWEDEN
 
    Phone:   +46 8 4048287
    EMail: Magnus.Westerlund@ericsson.com
 
 
    Ari Lakaniemi
    Nokia Research Center
    P.O.Box 407
    FIN-00045 Nokia Group, FINLAND
 
    Phone:   +358-71-8008000
    EMail: ari.lakaniemi@nokia.com
 
 
 RFC Editor Considerations
 
    The RFC editor is requested to replace all occurrences of XXXX with
    the RFC number this document receives. It is also requested that all
    occurrences of YYYY be replaced with the RFC number that [15]
    receives when published. The reference [15] is also requested to be
    updated with the correct information upon publication of that
    document.
 
 
 
 
 IPR Notice
 
    The IETF takes no position regarding the validity or scope of any
    Intellectual Property Rights or other rights that might be claimed
    to pertain to the implementation or use of the technology described
    in this document or the extent to which any license under such
    rights might or might not be available; nor does it represent that
    it has made any independent effort to identify any such rights.
    Information on the procedures with respect to rights in RFC
    documents can be found in BCP 78 and BCP 79.
 
    Copies of IPR disclosures made to the IETF Secretariat and any
    assurances of licenses to be made available, or the result of an
    attempt made to obtain a general license or permission for the use
    of such proprietary rights by implementers or users of this
    specification can be obtained from the IETF on-line IPR repository
    at http://www.ietf.org/ipr.
 
    The IETF invites any interested party to bring to its attention any
    copyrights, patents or patent applications, or other proprietary
    rights that may cover technology that may be required to implement
    this standard.  Please address the information to the IETF at ietf-
    ipr@ietf.org.
 
 
 12. Copyright Notice
 
    Copyright (C) The Internet Society (2004).  This document is subject
    to the rights, licenses and restrictions contained in BCP 78, and
    except as set forth therein, the authors retain all their rights.
 
    This document and the information contained herein are provided on
    an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
    INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
    THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
 
    This Internet-Draft expires in December 2004.