Network Working Group                                    Johan Sjoberg
 INTERNET-DRAFT                                       Magnus Westerlund
 Expires: June 2005                                            Ericsson
                                                          Ari Lakaniemi
                                                                  Nokia
                                                       December 3, 2004
 
 
     Real-Time Transport Protocol (RTP) Payload Format for Extended AMR
                       Wideband (AMR-WB+) Audio Codec
                    <draft-ietf-avt-rtp-amrwbplus-03.txt>
 
 
 Status of this memo
 
    By submitting this Internet-Draft, each author represents that
    any applicable patent or other IPR claims of which he or she is
    aware have been or will be disclosed, and any of which he or she
    becomes aware will be disclosed, in accordance with Section 6 of
    RFC 3668.
 
    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups.  Note that
    other groups may also distribute working documents as Internet-
    Drafts.
 
    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."
 
    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/1id-abstracts.txt
 
    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html
 
    This document is a submission of the IETF AVT WG.  Comments should
    be directed to the AVT WG mailing list, avt@ietf.org.
 
 
 Abstract
 
    This document specifies a real-time transport protocol (RTP) payload
    format to be used for Extended AMR Wideband (AMR-WB+) encoded audio
    signals. The AMR-WB+ codec is an audio extension of the AMR-WB codec
    providing additional frame types designed to give higher quality of
    music and speech than the original frame types.  A media type
    registration is included for AMR-WB+.
 
 
 
 Sjoberg, et. al.                                              [Page 1]


 INTERNET-DRAFT       RTP payload format for AMR-WB+      Dec 30, 2004
 
 
 
 TABLE OF CONTENTS
 
 1. Definitions.....................................................3
    1.1. Glossary...................................................3
    1.2. Terminology................................................3
 2. Introduction....................................................3
 3. Background on AMR-WB+ and Design Principles.....................4
    3.1. The AMR-WB+ Audio Codec....................................5
    3.2. Multi-rate Encoding and Rate Adaptation....................7
    3.3. Voice Activity Detection and Discontinuous Transmission....8
    3.4. Support for Multi-Channel Session..........................8
    3.5. Unequal Bit-error Detection and Protection.................8
    3.6. Robustness against Packet Loss.............................8
       3.6.1. Use of Forward Error Correction (FEC).................9
       3.6.2. Use of Frame Interleaving............................10
    3.7. AMR-WB+ Audio over IP scenarios...........................11
 4. RTP Payload Format for AMR-WB+.................................12
    4.1. RTP Header Usage..........................................13
    4.2. Payload Structure.........................................13
    4.3. Payload definitions.......................................14
       4.3.1. The Payload Table of Contents........................14
       4.3.2. Audio Data...........................................20
       4.3.3. Methods for Forming the Payload......................20
       4.3.4. Payload Examples.....................................20
    4.4. Interleaving Considerations...............................23
    4.5. Implementation Considerations.............................23
       4.5.1. ISF recovery when frames are lost....................24
 5. Congestion Control.............................................26
 6. Security Considerations........................................26
    6.1. Confidentiality...........................................27
    6.2. Authentication and Integrity..............................27
    6.3. Decoding Validation.......................................27
 7. Payload Format Parameters......................................27
    7.1. Media Type Registration...................................28
    7.2. Mapping Media Type Parameters into SDP....................29
       7.2.1. Offer-Answer Model Considerations....................30
       7.2.2. Examples.............................................31
 8. IANA Considerations............................................32
 9. Contributors...................................................32
 10. Acknowledgements..............................................32
 11. References....................................................32
    11.1. Normative references.....................................32
    11.2. Informative References...................................33
 12. Authors' Addresses............................................34
 13. IPR Notice....................................................34
 14. Copyright Notice..............................................35
 15. Changes.......................................................35

 
 1. Definitions
 
 1.1. Glossary
 
    3GPP    - the Third Generation Partnership Project
    AMR     - Adaptive Multi-Rate Codec
    AMR-WB  - Adaptive Multi-Rate Wideband Codec
    AMR-WB+ - Extended Adaptive Multi-Rate Wideband Codec
    CMR     - Codec Mode Request
    CN      - Comfort Noise
    DTX     - Discontinuous Transmission
    FEC     - Forward Error Correction
    FT      - Frame Type
    ISF     - Internal Sampling Frequency
    SCR     - Source Controlled Rate Operation
    SID     - Silence Indicator (the frames containing only CN
              parameters)
    TFI     - Transport Frame Index
    TS      - Timestamp
    VAD     - Voice Activity Detection
    UED     - Unequal Error Detection
    UEP     - Unequal Error Protection
 
 
 1.2. Terminology
 
    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
    this document are to be interpreted as described in RFC 2119 [2].
 
    n^r is exponentiation where n is multiplied by itself r times; n and
    r are integers. k%m denotes the modulo operation (k mod m), i.e. the
    remainder part from the operation k/m; k and m are integers.
 
 
 2. Introduction
 
    This document specifies the payload format for packetization of
    Extended Adaptive Multi-Rate Wideband (AMR-WB+) [1] encoded audio
    signals into the Real-time Transport Protocol (RTP) [3].  The
    payload format supports transmission of mono or stereo audio,
    aggregating multiple frames per payload, and mechanisms enhancing
    robustness against packet loss.
 
    The AMR-WB+ codec is an extension of the Adaptive Multi-Rate
    Wideband (AMR-WB) codec and has features not available in AMR-WB.
    From a transport point of view, the new features are native support
    for stereophonic audio and the possibility to use different
    internal sampling frequencies.  The primary usage scenario for
    AMR-WB+ is transport over IP, so the interworking with other
    transport networks that AMR-WB needs is not required.

    AMR-WB+ will mainly be used in streaming scenarios, where an
    octet-aligned format substantially reduces server complexity.
    Therefore, nothing similar to the bandwidth efficient mode defined
    in [7] is specified for AMR-WB+.  The bandwidth saved by such a
    mode would also be very small for all extension frame types, as
    they are already octet aligned.
 
    The codec's built-in support for stereo encoding makes multi-channel
    support in the RTP payload format, as provided for AMR and AMR-WB
    [7], difficult to implement but also less needed.  Therefore, the
    multi-channel support specified in the AMR and AMR-WB payload
    format is not specified for AMR-WB+.  Due to all these changes and
    the different scope of the AMR-WB+ codec, this document defines a
    new RTP payload format that differs significantly from the ones
    for AMR and AMR-WB [7].
 
    There is no file format for AMR-WB+ defined within this
    specification.  Instead, the 3GPP-defined, ISO-based 3GP file
    format [14] will support AMR-WB+ and provides all functionality
    needed from a file format.  This format also supports storage of
    AMR and AMR-WB, plus other multimedia formats, allowing for
    synchronized playback.  As the 3GP format provides much greater
    capability than the previously defined formats for AMR and AMR-WB,
    it is expected to be used and to be sufficient for all use cases.
 
    Background on AMR-WB+ and design principles can be found in Section
    3.  The payload format itself is specified in Section 4 and follows
    the principles used in [3] and [9].  In Section 7, a media type
    registration is provided.
 
 
 3. Background on AMR-WB+ and Design Principles
 
    The Extended Adaptive Multi-Rate Wideband (AMR-WB+) [1] audio codec
    is designed for compression of speech and audio achieving low bit-
    rate with good quality. The codec is specified by 3GPP, and primary
    target applications within 3GPP are packet-switched streaming
    service (PSS) [13] and multimedia messaging service (MMS). However,
    due to its flexibility and robustness, AMR-WB+ is very well suited
    for streaming services in highly varying transport environments,
    e.g. the Internet.
 
    Because of the flexibility of this codec, the behavior in a
    particular application is controlled by several parameters that
    select options or specify the acceptable values for a variable.
    These options and variables are described in general terms at
    appropriate points in the text of this specification as parameters
    to be established through out-of-band means. In Section 7, all of
    the parameters are specified in the form of media type registration
    for the AMR-WB+ encoding. The method used to signal these parameters
    at session setup or to arrange prior agreement of the participants
    is beyond the scope of this document; however, Section 7.2 provides
    a mapping of the parameters into the Session Description Protocol
    (SDP) [6] for those applications that use SDP.
 
 
 3.1. The AMR-WB+ Audio Codec
 
    The AMR-WB+ audio codec was originally developed by 3GPP to be used
    for streaming and messaging services in GSM and 3G cellular systems.
    AMR-WB+ is designed as an audio extension to the AMR-WB speech
    codec. The extension adds new functionality to the codec in order to
    provide high audio quality for a large range of signals including
    music. Stereophonic operation has also been added where a new high-
    efficiency hybrid stereo coding algorithm enables stereo operation
    at bit-rates as low as 6.2 kbit/s in total.
 
    The AMR-WB+ audio codec includes the nine frame types specified for
    AMR-WB, extended with new bit-rates ranging from 5.2 to 48 kbit/s.
    Whereas the AMR-WB frame types employ a 16000 Hz sampling frequency
    and operate only on monophonic signals, the extension can operate
    at a number of internal sampling frequencies (ISFs), both in mono
    and stereo; see Table 24 in [1].  However, the output sampling
    frequency of the decoder is limited to 8, 16, 24, 32 or 48 kHz.
 
    An overview of the AMR-WB+ encoding operations is as follows. The
    encoder receives the audio sampled at, for example, 48 kHz.  The
    encoding process starts with pre-processing and resampling to the
    Internal Sampling Frequency (ISF) used.  The encoding is performed
    on equal sized super-frames, each corresponding to 2048 samples per
    channel at the ISF.  The codec performs a number of encoding
    decisions for each super-frame choosing between different encoding
    algorithms and block lengths giving fidelity-optimized encoding
    adapted to the signal characteristics of the source.  The stereo
    encoding (if used) is performed separately from the monophonic core
    encoding, thus enabling the selection of different combinations of
    core and stereo encoding rates.  The resulting encoded audio is
    produced in 4 equally long transport frames, each individually
    usable by the decoder and corresponding to 512 samples per channel.
 
    The codec supports 13 different ISFs, ranging from 12.8 up to 38.4
    kHz, as described by Table 24 in [1].  This allows a trade-off
    between audio bandwidth and the bit-rate required.  As encoding is
    performed on 2048 samples at the ISF, the duration of a super-frame
    and the effective bit-rate of the frame type in use vary with the
    ISF.  The ISF of 25600 Hz has a super-frame duration of 80 ms and
    is also the norm used to describe the encoding bit-rates.  Using
    this normalization, the ISF selection results in bit-rate
    variations from 1/2 up to 3/2 of the nominal bit-rate.

    The audio encoding is, as previously stated, performed on equally-
    sized super-frames, each corresponding to 2048 samples per encoded
    channel.  The super-frames are encoded in 4 equally-sized transport
    frames, i.e. corresponding to 512 samples per channel, each being
    individually useful in the decoding. For the individual transport
    frames to be decodable, the position within the super-frame must be
    known. The encoding for the extension modes is performed as one
    monophonic core encoding and one stereo encoding. The core encoding
    is performed by splitting the monophonic signal into a lower and a
    higher frequency band. The lower band is encoded using either
    algebraic code excited linear prediction (ACELP) or transform coded
    excitation (TCX), which is selected once per transport frame with
    certain allowed combinations within the super-frame. The higher band
    is encoded using a low-rate parametric bandwidth extension
    approach. The stereo signal is encoded using a similar frequency
    band decomposition as that for the mono signal, however here the
    signal is divided into three bands that are individually
    parameterized using different techniques.
 
    The total bit-rate produced by the extension is the result of the
    combination of the encoder's core rate, stereo rate and ISF. The
    extension supports 8 different core encoding modes producing bit-
    rates between 10.4 and 24.0 kbits/s, see table 22 of [1]. There are
    16 stereo encoding rates generating bit-rates between 2.0 and 8.0
    kbits/s, see table 23 of [1]. The frame-type encodes the AMR-WB
    modes, 4 fixed extension modes (see below), 24 combinations of core
    and stereo modes for stereo signals, and the 8 core modes for mono
    signals as listed in table 25 in [1]. As a result, AMR-WB+ supports
    encodings between 10.4 and 32 kbits/s using an ISF of 25600 Hz.
    Further freedom in produced bit-rates and quality is available by
    using different ISFs. The selection of an ISF will
    change the available audio bandwidth of the reconstructed signal,
    and at the same time change the total bit-rate. The bit-rate for a
    given combination of frame-type and ISF is determined by multiplying
    the frame-type's bit-rate with the used ISF's bit-rate factor (see
    table 24 of [1]).
 
    The extension also has 4 frame-types, which have fixed core bit-
    rates, stereo bit-rates and ISFs, see frame-types 10-13 in Table 21
    in [1]. These four pre-defined frame types have a fixed input
    sampling frequency to the encoder set at 16 or 24 kHz respectively.
    These frame types share the property with the AMR-WB modes that each
    transport frame is only representing 20 ms of audio signal, however
    they are also part of 80 ms super-frames. Thus frame-types 0-13
    (AMR-WB and fixed extension rates) as listed in table 21 of [1] do
    not require explicit ISF indication. The other frame types, 14-47,
    require the ISF employed to be indicated.
 
    The fact that the extension has 32 different frame-types that can be
    combined with 13 ISFs allows for a great flexibility in bit-rate and
    selection of desired quality. For example there exist a number of
    combinations that will produce the same codec bit-rate. One possible
    way of producing a 32 kbits/s audio stream is to utilize frame type
    41, i.e. 25.6 kbits/s, and the ISF of 32 kHz (5/4 * (19.2+6.4) = 32
    kbits/s), and another way is to use frame type 47 and the ISF of
    25.6 kHz (1 * (24 + 8) = 32 kbits/s). Which combination to use
    depends on the content being encoded. In the above example the first
    case has greater audio bandwidth, while the second spends more bits
    on a somewhat narrower audio bandwidth.
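
    The arithmetic above can be sketched as follows.  This is only an
    illustration: the function name and the use of decimal strings are
    assumptions made for this example, and the core and stereo rates
    are the ones quoted in the text for frame types 41 and 47, not a
    reproduction of the tables in [1].

```python
from fractions import Fraction

NOMINAL_ISF_HZ = 25600  # ISF at which the nominal bit-rates are stated


def total_bitrate_kbps(core_kbps: str, stereo_kbps: str,
                       isf_hz: int) -> Fraction:
    """Scale the nominal (core + stereo) rate by the ISF factor.

    Rates are passed as decimal strings so that Fraction stays exact.
    The function name and signature are hypothetical.
    """
    nominal = Fraction(core_kbps) + Fraction(stereo_kbps)
    return nominal * Fraction(isf_hz, NOMINAL_ISF_HZ)


# The two combinations named in the text.
wide = total_bitrate_kbps("19.2", "6.4", 32000)   # frame type 41
narrow = total_bitrate_kbps("24", "8", 25600)     # frame type 47
print(wide, narrow)
```

    Both calls evaluate to 32 kbit/s, matching the two worked
    expressions in the paragraph above.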
 
    The duration of one AMR-WB+ audio transport frame can vary and
    depends on the ISF. Since a transport frame always corresponds to
    512 samples at the ISF in use, its duration is limited to the range
    13.33 to 40 ms. With the RTP TS clock rate of 72000 Hz, an AMR-WB+
    transport frame is from 960 to 2880 ticks long. If the internal
    sampling rate is set to 25600 Hz, a transport frame is equal to 20
    ms and the super-frame is equal to 80 ms.
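
    A minimal sketch of this conversion, assuming only the figures
    given above (512 samples per transport frame and a 72000 Hz RTP
    clock).  The helper names are hypothetical, and only the three
    ISFs mentioned in the text are exercised.

```python
SAMPLES_PER_FRAME = 512   # samples per channel in one transport frame
RTP_CLOCK_HZ = 72000      # RTP timestamp clock rate


def frame_duration_ticks(isf_hz: int) -> int:
    """Transport frame duration in RTP timestamp ticks."""
    ticks, rest = divmod(SAMPLES_PER_FRAME * RTP_CLOCK_HZ, isf_hz)
    if rest:
        # The clock rate is meant to give whole ticks; flag anything else.
        raise ValueError(f"ISF {isf_hz} Hz gives a fractional tick count")
    return ticks


def frame_duration_ms(isf_hz: int) -> float:
    """Transport frame duration in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / isf_hz


for isf in (12800, 25600, 38400):
    print(isf, frame_duration_ticks(isf), round(frame_duration_ms(isf), 2))
```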
 
    The encoder is able to change the used ISF and encoding frame type
    (both mono and stereo) during an encoding session. For the extension
    frame types with index 10-13 and 16-47, ISF and frame-type changes
    are constrained to occur at super-frame boundaries, i.e. within a
    super-frame the ISF is constant. Such a limitation does not apply
    for frame types with index 0-9, i.e. the original AMR-WB frame
    types.
 
    In conclusion, some features need special consideration from a
    transport point of view. First, the dependence of the frame
    duration on the ISF puts requirements on the RTP timestamping.
    Second, each frame of encoded audio must maintain information about
    its frame-type, ISF and position in the super-frame (ISF is only
    required for frame-types 14-47).
 
 
 3.2. Multi-rate Encoding and Rate Adaptation
 
    The multi-rate encoding capability of AMR-WB+ is designed for
    preserving high audio quality under a wide range of bandwidth
    requirements and transmission conditions.
 
    AMR-WB+ enables seamless switching between frame types using the
    same number of audio channels and the same internal sampling
    frequency. Every AMR-WB+ codec implementation is required to support
    all the respective audio coding frame types defined by the codec and
    must be able to handle switching between any two frame types.
    Switching between frame types employing a different number of audio
    channels or a different internal sampling frequency is possible, but
    may not be seamless. Therefore it is recommended to perform such
    switching infrequently and if possible during periods where the
    input is silent.


 3.3. Voice Activity Detection and Discontinuous Transmission
 
    AMR-WB+ supports the same algorithms for voice activity detection
    (VAD) and generation of comfort noise (CN) parameters during silence
    periods as used by the AMR-WB codec. However, they can only be used
    together with the AMR-WB frame types (FT=0-8). As with the AMR-WB
    codec, this option allows the number of bits and packets
    transmitted during silence periods to be reduced to a minimum when
    operating in these frame types. The operation of sending CN
    parameters at regular intervals during silence periods is usually
    called discontinuous transmission (DTX) or source controlled rate
    (SCR) operation.  The AMR-WB+ frames containing CN parameters are
    called Silence Indicator (SID) frames. See more details about VAD
    and DTX functionality in [4] and [5].
 
 
 3.4. Support for Multi-Channel Session
 
    Some of the AMR-WB+ frame types support encoding of stereophonic
    audio. Because of this native support for two-channel stereophonic
    signal it does not seem necessary to support multi-channel transport
    with separate codecs as done in the AMR-WB RTP payload format [7].
    The codec has the capability of stereo to mono downmixing. Thus, a
    receiver capable only of mono playout can still decode and play
    signals originally encoded as stereo. However, to avoid spending
    bit-rate on stereo encoding that will not be utilized, a mechanism
    for signaling mono-only support is defined.
 
 
 3.5. Unequal Bit-error Detection and Protection
 
    The audio bits encoded in each AMR-WB frame are sorted according to
    their different perceptual sensitivity to bit errors. This property
    can be exploited e.g. in cellular systems to achieve better voice
    quality by using unequal error protection and detection (UEP and
    UED) mechanisms. However, the bits of the extension frame types of
    the AMR-WB+ codec do not have a consistent sensitivity property and
    are not sorted in sensitivity order. Thus, UEP or UED cannot be
    utilized with the extension frame types. If UEP or UED is needed,
    use the RTP payload format for the AMR-WB frame types defined in
    RFC 3267 [7], which supports both.
 
 
 3.6. Robustness against Packet Loss
 
    The payload format supports several means, including forward error
    correction (FEC) and frame interleaving, to increase robustness
    against packet loss.


 3.6.1. Use of Forward Error Correction (FEC)
 
    The simple scheme of repetition of previously sent data is one way
    of achieving FEC. Another possible scheme which can be more
    bandwidth efficient is to use payload external FEC, e.g. RFC2733
    [11], which generates extra packets containing repair data.  For the
    AMR-WB+ extension frame types, it is only possible to use the codec
    to send redundant copies using the same frame type and internal
    sampling frequency. We describe such a scheme next.
 
    This involves the simple retransmission of previously transmitted
    frames together with the current frame(s). This is done by using a
    sliding window to group the audio frames to be sent in each payload.
    Figure 1 below shows an example.
 
    --+--------+--------+--------+--------+--------+--------+--------+--
      | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
    --+--------+--------+--------+--------+--------+--------+--------+--
 
      <---- p(n-1) ---->
               <----- p(n) ----->
                        <---- p(n+1) ---->
                                 <---- p(n+2) ---->
                                          <---- p(n+3) ---->
                                                   <---- p(n+4) ---->
 
    Figure 1: An example of redundant transmission.
 
    In this example each frame is retransmitted once in the following
    RTP payload packet. Here, f(n-2)..f(n+4) denotes a sequence of audio
    frames and p(n-1)..p(n+4) a sequence of payload packets.
 
    The use of this approach does not require signaling at the session
    setup. In other words, the audio sender can choose to use this
    scheme without consulting the receiver. This is because a packet
    containing redundant frames will not look different from a packet
    with only new frames. For a certain timestamp, the receiver may
    receive multiple copies of a frame containing encoded audio data or
    frames indicated as NO_DATA.
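
    The sliding window of Figure 1 and the receiver's handling of
    duplicates can be sketched as below.  This is an illustration
    under simplifying assumptions, not part of the payload format:
    frames are modeled as (timestamp, data) tuples, NO_DATA frames and
    timestamp wrap-around are ignored, and all names are hypothetical.

```python
def build_payloads(frames, redundancy=1):
    """Group frames so that each payload repeats earlier ones.

    frames is a list of (timestamp, data) tuples in sending order.
    Payload i carries frames i-redundancy .. i, clipped at the start.
    """
    payloads = []
    for i in range(len(frames)):
        start = max(0, i - redundancy)
        payloads.append(frames[start:i + 1])
    return payloads


def receive(payloads):
    """Rebuild the frame sequence, keeping one copy per timestamp."""
    seen = {}
    for payload in payloads:
        for ts, data in payload:
            seen.setdefault(ts, data)   # later copies are duplicates
    return [seen[ts] for ts in sorted(seen)]


# Five frames of 1440 ticks each, every frame sent twice.
frames = [(k * 1440, f"f{k}") for k in range(5)]
payloads = build_payloads(frames)
# Drop the third packet; its frames are still covered by neighbours.
survivors = payloads[:2] + payloads[3:]
print(receive(survivors))
```

    In this run the loss of one packet is fully repaired, because each
    of its frames is also carried in an adjacent packet.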
 
    This redundancy scheme provides the same functionality as the one
    described in RFC 2198 "RTP Payload for Redundant Audio Data" [12].
    In most cases the mechanism in this payload format is more efficient
    and simpler than requiring both endpoints to support RFC 2198 in
    addition. There is one situation in which use of RFC 2198 is
    indicated: if some other codec than AMR-WB+ is desired for the
    redundant encoding, the AMR-WB+ payload format will not be able to
    carry it.
 
    The sender is responsible for selecting an appropriate amount of
    redundancy based on feedback about the channel, e.g., in RTCP
    receiver reports. The sender is also responsible for avoiding
    congestion, which may be exacerbated by redundancy (see Section 5
    for more details).
 
 
 3.6.2. Use of Frame Interleaving
 
    To decrease protocol overhead, the payload design allows several
    audio frames to be encapsulated into a single RTP packet. One of
    the drawbacks of such an approach is that a packet loss then means
    the loss of several consecutive audio frames, which usually causes
    clearly audible distortion in the reconstructed audio. Interleaving
    of frames can improve the audio quality in such cases by
    distributing the consecutive losses into a series of single frame
    losses.  However, interleaving and bundling several frames per
    payload also increase end-to-end delay and set higher buffering
    requirements, and are therefore not appropriate for all usage
    scenarios. Streaming applications, however, will most likely be
    able to exploit interleaving to improve audio quality in lossy
    transmission conditions.
 
    This payload design supports the use of frame interleaving as an
    option.  The usage of this feature needs to be negotiated or at
    least signaled.
 
    The interleaving supported by this format is rather flexible. For
    example, a continuous pattern can be defined, as the example below
    shows.
 
    --+--------+--------+--------+--------+--------+--------+--------+--
      | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
    --+--------+--------+--------+--------+--------+--------+--------+--
 
               [ P(n)   ]
      [ P(n+1) ]                 [ P(n+1) ]
                        [ P(n+2) ]                 [ P(n+2) ]
                                          [ P(n+3) ]                 [P(
                                                            [ P(n+4) ]
 
    Figure 2: Example of interleaving pattern that has constant delay.
 
    In Figure 2 the consecutive frames, denoted f(n-2) to f(n+4), are
    aggregated two in each packet with interleaving. The packets, P(n)
    to P(n+4), contain a pattern that allows for constant delay in both
    the interleaving and deinterleaving processes.  The deinterleaving
    buffer in this example needs to have room for at least 3 frames,
    including the one that is ready to be consumed. This is needed, for
    example, when f(n) is the next frame to be played: the receiver has
    then consumed all previous frames and needs to have f(n), f(n+1)
    and f(n+3) in the buffer.  When it is time to consume f(n+1), no
    further RTP packet is needed. When f(n+2) is to be consumed
    then P(n+3) is needed and the deinterleaving buffer will contain
    f(n+2), f(n+3) and f(n+5).
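
    The buffer occupancy described above can be checked with a small
    simulation.  The sketch below assumes the pattern read off Figure
    2, in which packet P(n+j) carries f(n+2j-4) and f(n+2j-1), and it
    assumes loss-free, in-order delivery with packets fetched only
    when the next frame to play is missing.  Frame and packet indices
    are relative to n, and the function names are hypothetical.

```python
def packet_frames(j):
    """Frame indices (relative to n) carried by packet P(n+j)."""
    return (2 * j - 4, 2 * j - 1)


def simulate(first_frame, last_frame):
    """Play frames in order, pulling packets only when needed.

    Returns a list of (frame, sorted buffer contents) recorded at the
    moment each frame is about to be consumed.
    """
    buffer = set()
    next_packet = (first_frame + 1) // 2   # at or before the first useful one
    trace = []
    for frame in range(first_frame, last_frame + 1):
        while frame not in buffer:
            # Ignore frames that lie before the playout point.
            buffer.update(f for f in packet_frames(next_packet)
                          if f >= frame)
            next_packet += 1
        trace.append((frame, sorted(buffer)))
        buffer.remove(frame)
    return trace


for frame, held in simulate(0, 4):
    print(frame, held)
```

    With these assumptions the buffer never holds more than 3 frames,
    and the contents at f(n) and f(n+2) agree with the description
    above.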
 
 
 3.7. AMR-WB+ Audio over IP scenarios
 
    Since the primary target application for the AMR-WB+ codec is packet
    switched streaming, the most relevant usage scenario for this
    payload format is IP end-to-end between a server and a terminal, as
    shown in Figure 3.
 
              +----------+                          +----------+
              |          |    IP/UDP/RTP/AMR-WB+    |          |
              |  SERVER  |<------------------------>| TERMINAL |
              |          |                          |          |
              +----------+                          +----------+
 
               Figure 3: Server to terminal IP scenario


 4. RTP Payload Format for AMR-WB+
 
    Despite belonging to the same family of codecs, the payload format
    for AMR-WB+ is different from the AMR and AMR-WB payload formats
    [7]. The main emphasis in the payload design has been to minimize
    the overhead in typical use cases, while still providing full
    flexibility with slightly higher overhead. This is made possible by
    defining some frame specific parameters to cover all frames in the
    payload instead of defining them for each frame separately. The
    payload format has two modes, the basic mode and the interleaved
    mode. The main structural difference between the two modes is the
    extension of the table of contents entries with a frame
    displacement field in the interleaved mode.
 
    The basic mode supports aggregation of multiple consecutive frames
    in a payload.  The interleaved mode supports aggregation of multiple
    frames that are non-consecutive in time.  In both modes it is
    possible to have frames encoded at different frame types in the same
    payload, but the internal sampling frequency must remain constant
    throughout the payload. However, frequent switching of the internal
    sampling frequency is not expected, and the codec is restricted to
    switching ISF only on super-frame boundaries.  Thus, the payload
    format allows ISF switching only between payloads.
 
    The payload format is designed around the property that AMR-WB+
    frames are consecutive and, between any two ISF changes, share the
    same frame duration.  This enables the receiver to derive the
    timestamp for an individual frame within a payload from either the
    order (basic mode) or the compact displacement fields (interleaved
    mode).  The timestamp is used to regenerate the correct order of
    frames after reception, identify duplicates, and detect lost frames
    that require concealment.
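
    For the basic mode, where the frames in a payload are consecutive,
    the derivation can be sketched as below.  This is an illustration
    only and does not replace the normative rules in Section 4.3.1:
    it assumes a single ISF per payload, uses the 32-bit RTP timestamp
    from [3], ignores wrap-around when looking for gaps, and all
    function names are hypothetical.

```python
SAMPLES_PER_FRAME = 512   # samples per channel in one transport frame
RTP_CLOCK_HZ = 72000      # RTP timestamp clock rate
TS_MODULUS = 2 ** 32      # RTP timestamps are 32-bit values


def frame_timestamps(rtp_timestamp, frame_count, isf_hz):
    """Timestamps of consecutive frames in one basic-mode payload."""
    ticks = SAMPLES_PER_FRAME * RTP_CLOCK_HZ // isf_hz
    return [(rtp_timestamp + i * ticks) % TS_MODULUS
            for i in range(frame_count)]


def lost_frames(received_ts, ticks):
    """Timestamps missing between the earliest and latest frame."""
    have = set(received_ts)
    return [ts for ts in range(min(have), max(have) + 1, ticks)
            if ts not in have]


# Four frames at an ISF of 25600 Hz (1440 ticks per frame).
print(frame_timestamps(1000, 4, 25600))
# One frame missing from an otherwise continuous sequence.
print(lost_frames([1000, 2440, 5320], 1440))
```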
 
    The interleaving scheme of this payload format is significantly more
    flexible than the one present in RFC 3267.  The AMR and AMR-WB
    payload format is only capable of using periodic patterns with
    frames taken from an interleaving group at fixed intervals, whereas
    this interleaving scheme allows for any pattern as long as the
    difference in decoding order between any two adjacent frames in the
    interleaved payload is not more than 256 frames.  Note that even at
    the highest ISF this allows an interleaving depth of up to 3.41
    seconds.
 
    To allow for error resiliency through redundant transmission, the
    periods covered by multiple packets MAY overlap in time.  A receiver
    MUST be prepared to receive any audio frame multiple times.  All
    copies of a frame that is sent multiple times MUST use the same
    frame type and ISF and have the same RTP timestamp, or be a NO_DATA
    frame.
 
    The payload consists of octet-aligned elements (header, ToC and
    audio frames), and only the audio frames for AMR-WB frame types
    (0-9) require any padding to make them an integral number of octets
    long.  If additional padding is required to bring the payload length
    to a larger multiple of octets or for some other purpose, then the P
    bit in the RTP header MAY be set and padding appended as specified
    in [3].
 
 
 4.1. RTP Header Usage
 
    The format of the RTP header is specified in [3].  This payload
    format uses the fields of the header in a manner consistent with
    that specification.
 
    The RTP timestamp corresponds to the sampling instant of the first
    sample encoded for the first frame in the packet.  The timestamp
    clock frequency SHALL be 72000 Hz.  This frequency makes the frame
    duration an integer number of RTP timestamp ticks for the internal
    sampling frequencies in use, and also gives reasonable conversion
    factors to the audio sampling frequencies in use.  See section
    4.3.2.3 for how to derive the RTP timestamp for any audio frame
    beyond the first one.
 
    The RTP header marker bit (M) SHALL be set to 1 if the first frame
    carried in the packet is the first audio frame in a talkspurt.  For
    all other packets the marker bit SHALL be set to zero (M=0).
 
    The assignment of an RTP payload type for this new packet format is
    outside the scope of this document, and will not be specified here.
    It is expected that the RTP profile under which this payload format
    is being used will assign a payload type for this encoding or
    specify that the payload type is to be bound dynamically.
 
    The media type parameter "channels" is used to indicate the maximum
    number of channels allowed to be used for a given payload type.  A
    payload type where channels=1 (mono) SHALL only carry mono content,
    while a payload type for which channels=2 has been declared MAY
    carry both mono and stereo content.
 
 
 4.2. Payload Structure
 
    The complete payload consists of a payload header, a payload table
    of contents, and the audio data representing one or more audio
    frames.  The following diagram shows the general payload format
    layout:
 
    +----------------+-------------------+----------------
    | payload header | table of contents | audio data ...
    +----------------+-------------------+----------------
 
    Payloads containing more than one audio frame are called compound
    payloads.
 
    The following sections describe the variations taken by the payload
    format depending on whether the AMR-WB+ session is set up to use the
    basic mode or interleaved mode.
 
 
 4.3. Payload Definitions
 
 4.3.1. The Payload Header
 
    The payload header carries data that is common for all frames in the
    payload. The structure of the payload header is described below.
 
     0 1 2 3 4 5 6 7
    +-+-+-+-+-+-+-+-+
    |   ISF   |TFI|L|
    +-+-+-+-+-+-+-+-+
 
    ISF (5 bits): Indicates the Internal Sampling Frequency employed for
       all frames in this payload.  The index value corresponds to
       internal sampling frequency as specified in Table 24 in [1].
       This field SHALL be set to 0 for Frame Type values 0-13.
 
    TFI (2 bits): Transport Frame Index from 0 (first) to 3 (last)
       indicating the position of the first transport frame of this
       payload in the AMR-WB+ super-frame structure.  This field SHALL
       be set to 0 for Frame Type values 0-9, and SHALL be ignored by
       the receiver.
 
    L (1 bit): Long displacement field flag for payloads in interleaved
       mode.  If set to 0, four-bit displacement fields are used to
       indicate interleaving offset; if set to 1, displacement fields of
       eight bits are used (see section 4.3.2.2).  For payloads in the
       basic mode this bit SHALL be set to 0 and SHALL be ignored by the
       receiver.
 
    Note that a change of ISF during a session always requires
    separate packets for frames employing different ISF values.
    Furthermore, in the interleaved mode ISF switching also requires
    terminating the previous interleaving pattern and starting a
    new one for the new ISF.
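
    As an illustration of the bit layout above, the following sketch
    packs and parses the one-octet payload header.  It assumes the usual
    convention that bit 0 in the diagram is the most significant bit of
    the octet.  The names PayloadHeader, pack_payload_header and
    parse_payload_header are hypothetical and not defined by this
    document.

```python
from typing import NamedTuple


class PayloadHeader(NamedTuple):
    isf: int        # 5-bit Internal Sampling Frequency index
    tfi: int        # 2-bit Transport Frame Index (0..3)
    long_dis: bool  # L bit: True means eight-bit displacement fields


def pack_payload_header(isf: int, tfi: int, long_dis: bool = False) -> int:
    """Return the payload header octet for the given field values."""
    if not 0 <= isf <= 31:
        raise ValueError("ISF index must fit in 5 bits")
    if not 0 <= tfi <= 3:
        raise ValueError("TFI must fit in 2 bits")
    return (isf << 3) | (tfi << 1) | int(long_dis)


def parse_payload_header(octet: int) -> PayloadHeader:
    """Split a payload header octet into its ISF, TFI and L fields."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("payload header is a single octet")
    return PayloadHeader(
        isf=octet >> 3,
        tfi=(octet >> 1) & 0x03,
        long_dis=bool(octet & 0x01),
    )
```

    For instance, the header of Example 1 in section 4.3.5.1 (ISF = 8,
    TFI = 2, L = 0) packs to the octet 0x44 under this convention.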
 
 
 4.3.2. The Payload Table of Contents
 
    The table of contents (ToC) consists of a list of ToC entries where
    each entry corresponds to a group of audio frames carried in the
    payload, i.e.,
 
    +----------------+----------------+- ... -+----------------+
    |  ToC entry #1  |  ToC entry #2  |          ToC entry #N  |
    +----------------+----------------+- ... -+----------------+
 
    When multiple groups of frames are present in a payload, the ToC
    entries SHALL be placed in the packet in order of their creation
    time.
 
 
 4.3.2.1. ToC Entry in the Basic Mode
 
    A ToC entry of a payload in the basic mode takes the following
    format:
 
     0                   1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F| Frame Type  |    #frames    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    F (1 bit): If set to 1, indicates that this ToC entry is followed by
       another ToC entry; if set to 0, indicates that this ToC entry is
       the last one in the ToC.
 
    Frame Type (FT) (7 bits): Indicates the audio codec frame type used
       for the group of frames corresponding to this ToC entry.  FT
       indicates the combination of AMR-WB+ core and stereo rate,
       special AMR-WB+ frame types, the AMR-WB rate, or comfort noise,
       as specified by Table 25 in [1].
 
    #frames (8 bits): This field indicates the number of frames in the
       group corresponding to this ToC entry.  The number of frames is
       the value of this field plus one, i.e. in the range 1-256.
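
    A receiver could walk the basic mode ToC along the following lines.
    This is a minimal sketch, assuming that bit 0 of each octet in the
    diagram is the most significant bit; the function name
    parse_basic_toc and its return shape are illustrative only.

```python
def parse_basic_toc(data: bytes, offset: int = 0):
    """Parse basic mode ToC entries starting at `offset`.

    Returns (entries, next_offset), where entries is a list of
    (frame_type, frame_count) tuples and next_offset points at the
    first octet after the ToC.
    """
    entries = []
    while True:
        if offset + 2 > len(data):
            raise ValueError("truncated ToC entry")
        first, count_field = data[offset], data[offset + 1]
        offset += 2
        # FT is the low 7 bits; the #frames field stores count - 1.
        entries.append((first & 0x7F, count_field + 1))
        if not first & 0x80:  # F = 0 marks the last ToC entry
            return entries, offset
```

    For example, the octets 0xA1 0x00 0x23 0x01 describe one frame of
    FT=33 followed by two frames of FT=35.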
 
 
 4.3.2.2. ToC Entry in the Interleaved Mode
 
    A ToC entry of a payload in the interleaved mode takes the following
    format if the L-bit in the payload header is set to 0:
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F| Frame Type  |    #frames    |  DIS1 |  ...  |  DISi |  ...  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ...  |  ...  |  DISn |  padd |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    F (1 bit): See definition in 4.3.2.1.
 
    Frame Type (FT) (7 bits): See definition in 4.3.2.1.
 
    #frames (8 bits): See definition in 4.3.2.1.
 
    DIS1...DISn (4 bits): A list of n (n=#frames) displacement fields
       indicating the displacement of the i:th (i=1..n) audio frame
       relative to the preceding audio frame in the payload as number of
       frames.  The four-bit displacement values may be between 0 and 15
       indicating the number of audio frames in decoding order between
       the (i-1):th and the i:th frame in the payload.  Note that for
       the first ToC entry of the payload the value of DIS1 has no
       meaning, since this frame's location in the decoding order is
       uniquely defined by the RTP timestamp and TFI in the payload
       header.  For the first ToC entry of a payload the DIS1 SHALL be
       set to zero, and the receiver SHALL ignore the value.  Note also
       that for subsequent ToC entries DIS1 indicates the number of
       frames between the last frame of the previous group and the first
       frame of this group.
 
    Padd (4 bits): Four padding bits SHALL be included at the end of the
       ToC entry if there is an odd number of frames in the group
       corresponding to this entry.  These bits SHALL be set to zero and
       SHALL be ignored by the receiver.  If a group containing an even
       number of frames is associated with this ToC entry, these padding
       bits SHALL NOT be included in the payload.
 
    A ToC entry of a payload in the interleaved mode takes the following
    format if the L-bit in the payload header is set to 1:
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F| Frame Type  |    #frames    |      DIS1     |      ...      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |      ...      |     DISn      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
    F (1 bit): See definition in 4.3.2.1.
 
    Frame Type (FT) (7 bits): See definition in 4.3.2.1.
 
    #frames (8 bits): See definition in 4.3.2.1.
 
    DIS1...DISn (8 bits): A list of n (n=#frames) displacement fields
       indicating the displacement of the i:th (i=1..n) audio frame
       relative to the preceding audio frame in the payload as number of
       frames.  The eight-bit displacement values may be between 0 and
       255, indicating the number of audio frames in decoding order
       between the (i-1):th and the i:th frame in the payload.  Note
       that for the first ToC entry of the payload the value of DIS1 has
       no meaning, since this frame's location in the decoding order is
       uniquely defined by the RTP timestamp and TFI in the payload
       header.  For the first ToC entry of a payload the DIS1 SHALL be
       set to zero, and the receiver SHALL ignore the value.  Note also
       that for subsequent ToC entries DIS1 indicates the displacement
       between the last frame of the previous group and the first frame
       of this group.
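
    The two interleaved mode ToC layouts differ only in the width of the
    displacement fields.  The following sketch parses a single ToC entry
    for either value of the L bit.  It assumes most-significant-bit-first
    packing within each octet, and the function name and return shape
    are illustrative rather than normative.

```python
def parse_interleaved_toc_entry(data: bytes, offset: int, long_dis: bool):
    """Parse one interleaved mode ToC entry starting at `offset`.

    Returns (more_entries, frame_type, displacements, next_offset).
    """
    if offset + 2 > len(data):
        raise ValueError("truncated ToC entry")
    more_entries = bool(data[offset] & 0x80)   # F bit
    frame_type = data[offset] & 0x7F
    n = data[offset + 1] + 1                   # #frames field stores n - 1
    offset += 2

    # L = 1: one octet per DIS field.  L = 0: two DIS fields per octet,
    # with a 4-bit pad after the last field when n is odd.
    dis_len = n if long_dis else (n + 1) // 2
    if offset + dis_len > len(data):
        raise ValueError("truncated displacement list")
    raw = data[offset:offset + dis_len]

    if long_dis:
        displacements = list(raw)
    else:
        nibbles = []
        for octet in raw:
            nibbles.extend((octet >> 4, octet & 0x0F))
        displacements = nibbles[:n]            # drop the pad nibble, if any
    return more_entries, frame_type, displacements, offset + dis_len
```

    A ToC entry with four-bit fields therefore occupies 2 + ceil(n/2)
    octets, and one with eight-bit fields occupies 2 + n octets.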
 
 
 4.3.2.3. RTP Timestamp Derivation
 
    The RTP timestamp value for a frame is the timestamp value of the
    first sample encoded in the frame.  The timestamp value for a frame
    is derived differently depending on whether the payload is in basic
    or interleaved mode.  In both cases the first frame in a compound
    packet has an RTP timestamp equal to the one given in the RTP
    header.  In the basic mode, the RTP timestamp for any subsequent
    frame is derived by adding the frame durations of all the previous
    frames to the RTP header timestamp value.  For example, if the RTP
    header timestamp value is 12345, the payload carries four frames,
    and the frame duration is 16 ms (ISF = 32 kHz), corresponding to
    1152 timestamp ticks, the RTP timestamp of the fourth frame in the
    payload is 12345 + 3 * 1152 = 15801.
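
    The basic mode derivation can be sketched as follows.  The sketch
    assumes AMR-WB+ extension frames of 512 samples each (section 4.3.3)
    and an ISF given in Hz that divides evenly into the 72000 Hz clock;
    the helper names are illustrative only.  It does not cover frame
    types 0-13, whose duration is fixed at 1440 ticks.

```python
RTP_CLOCK_HZ = 72000   # timestamp clock rate (section 4.1)
FRAME_SAMPLES = 512    # samples per AMR-WB+ transport frame


def frame_ticks(isf_hz: int) -> int:
    """Frame duration in RTP timestamp ticks for the given ISF in Hz."""
    ticks, remainder = divmod(FRAME_SAMPLES * RTP_CLOCK_HZ, isf_hz)
    if remainder:
        raise ValueError("frame duration is not a whole number of ticks")
    return ticks


def basic_mode_timestamps(rtp_ts: int, num_frames: int, isf_hz: int):
    """RTP timestamps of all frames in a basic mode payload."""
    step = frame_ticks(isf_hz)
    # RTP timestamps are 32-bit values and wrap around.
    return [(rtp_ts + i * step) & 0xFFFFFFFF for i in range(num_frames)]


print(basic_mode_timestamps(12345, 4, 32000))
# [12345, 13497, 14649, 15801]
```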
 
    In interleaved mode the RTP timestamp for each frame in the payload
    is derived by combining the RTP header timestamp and the sum of the
    time offsets of all preceding frames in this payload. The frame
    timestamps are computed based on displacement fields and the frame
    duration derived from the ISF value.  Note that the displacement in
    time between frame i-1 and frame i is (DISi + 1) * frame duration,
    because the duration of the (i-1):th frame must also be taken into
    account.  The following example derives the RTP timestamps for the
    frames in an interleaved mode payload having the following header
    and ToC information:
 
    RTP header timestamp: 12345
    ISF = 32 kHz
    Frame 1 displacement field: DIS1 = 0
    Frame 2 displacement field: DIS2 = 6
    Frame 3 displacement field: DIS3 = 4
    Frame 4 displacement field: DIS4 = 7
 
    The ISF value 32 kHz implies a frame duration of 16 ms, which means
    1152 ticks at the 72 kHz timestamp rate.  The timestamp of the first
    frame in the payload is the RTP timestamp, i.e. TS1 = RTP TS. Note
    that the displacement field value for this frame must be ignored.
    For the second frame in the payload the timestamp can be calculated
    as TS2 = TS1 + (DIS2 + 1) * 1152 = 20409. For the third frame the
    timestamp is TS3 = TS2 + (DIS3 + 1) * 1152 = 26169. Finally, for the
    fourth frame of the payload we have TS4 = TS3 + (DIS4 + 1) * 1152 =
    35385.
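
    The same derivation can be written as a short routine.  This is a
    sketch only: the function name is hypothetical, the displacement
    list is assumed to cover all frames of the payload in ToC order, and
    the frame duration in ticks is passed in by the caller.

```python
def interleaved_timestamps(rtp_ts, displacements, ticks_per_frame):
    """RTP timestamps of the frames in an interleaved mode payload.

    displacements[0] belongs to the first frame and is ignored, as
    required for DIS1 of the first ToC entry.
    """
    if not displacements:
        return []
    timestamps = [rtp_ts]
    for dis in displacements[1:]:
        # (dis + 1) accounts for the duration of the preceding frame.
        step = (dis + 1) * ticks_per_frame
        timestamps.append((timestamps[-1] + step) & 0xFFFFFFFF)
    return timestamps


print(interleaved_timestamps(12345, [0, 6, 4, 7], 1152))
# [12345, 20409, 26169, 35385]
```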
 
    The value of Frame Type is defined in Table 25 in [1]. FT=14
    (AUDIO_LOST) is used to indicate frames that are lost.  A NO_DATA
    (FT=15) frame could mean either that there is no data produced by
    the audio encoder for that frame or that no data for that frame is
    transmitted in the current payload (i.e., valid data for that frame
    could be sent either in an earlier or later packet).  The duration
    of these non-included frames depends on the internal sampling
    frequency indicated by the ISF field.
 
    For frame types with index 0-13 the ISF field SHALL be set to 0 and
    has no meaning.  The frame duration for these frame types is fixed
    at 20 ms, i.e. 1440 ticks at 72 kHz.  For payloads containing only
    frame types with index 0-9 the TFI field SHALL be set to 0 and has
    no meaning.
 
 
 4.3.2.4. Other ToC Considerations
 
    If a ToC entry with an undefined FT value is received, the whole
    packet SHOULD be discarded.  This is to avoid the loss of data
    synchronization in the depacketization process, which can result in
    a severe degradation in audio quality.
 
    Note that packets containing only NO_DATA frames SHOULD NOT be
    transmitted.  Also, NO_DATA frames at the end of a frame sequence to
    be carried in a payload SHOULD NOT be included in the transmitted
    packet.  The AMR-WB+ SCR/DTX is identical to the AMR-WB SCR/DTX
    described in [5] and SHALL only be used in combination with the
    AMR-WB frame types (0-8).
 
    When multiple groups of frames are present, their ToC entries will
    be placed in the ToC in order of their creation time, independent of
    the payload mode.  In basic mode the frames will be consecutive in
    time, while in interleaved mode the frames may not only be
    non-consecutive in time but may even have varying inter-frame
    distances.
 
 4.3.2.5. ToC Examples
 
    The following figure shows an example of a ToC for three audio
    frames in basic mode.  Note that in this case all audio frames are
    encoded using the same frame type, i.e. there is only one ToC entry.
 
     0                   1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0| Frame Type1 |  #frames = 3  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
 
    The following figure shows an example of a ToC of three entries in
    basic mode.  Note that also in this case the payload carries three
    frames, but three ToC entries are needed since all frames of the
    payload are encoded using different frame types.
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |1| Frame Type1 |  #frames = 1  |1| Frame Type2 |  #frames = 1  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0| Frame Type3 |  #frames = 1  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
 
    The following figure shows an example of a ToC of two entries in
    interleaved mode using four-bit displacement fields.  The payload
    includes two groups of frames, the first one including a single
    frame, and the other one consisting of two frames.
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |1| Frame Type1 |  #frames = 1  |  DIS1 |  padd |0| Frame Type2 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  #frames = 2  |  DIS1 |  DIS2 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
 
 4.3.3. Audio Data
 
    Audio data of a payload contains one or more audio frames or comfort
    noise frames, as described in the ToC of the payload.
 
       Note, for ToC entries with FT=14 or 15, there will be no
       corresponding audio frame present in the audio data.
 
    Each audio frame for an extension frame type represents an AMR-WB+
    transport frame corresponding to the encoding of 512 samples of
    audio sampled with the internal sampling frequency specified by the
    ISF indicator. Frame types with index 10-13, being the exception,
    are only capable of using a single internal sampling frequency
    (25600 Hz).  The encoding rates (combination of core bit-rate and
    stereo bit-rate) are indicated in the frame type field of the
    corresponding ToC entry. The octet length of the audio frame is
    implicitly defined by the frame type field and is given in tables 21
    and 25 of [1]. The order and numbering notation of the bits are as
    specified in [1]. As specified there, the bits of the AMR-WB audio
    frames (frame type values in range 0...8) have been rearranged in
    order of decreasing sensitivity. For the AMR-WB+ extension frame
    types and comfort noise frames, the bits are in the order produced
    by the encoder. The last octet of each audio frame MUST be padded
    with zeroes at the end if not all bits in the octet are used. In
    other words, each audio frame MUST be octet-aligned. However, all
    extension frame types (10-13, 16-47) specified in [1] lead to octet-
    aligned frames.
 
 
 4.3.4. Methods for Forming the Payload
 
    The payload begins with the payload header, followed by the table of
    contents consisting of a list of ToC entries.
 
    The audio data follows the table of contents.  All of the octets
    comprising an audio frame are appended to the payload as a unit. The
    audio frames are packed in timestamp order within each group of
    frames (per ToC entry).  Each group of frames is packed in the same
    order as the corresponding ToC entries are arranged in the ToC
    list, with the exception that for a ToC entry with FT=14 or 15
    there will be no data octets present for that group of frames.
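
    The packing order described above can be sketched for the basic mode
    as follows.  The function name and the shape of the "groups"
    argument are assumptions made for illustration; frame octets are
    supplied by the caller because the frame lengths are defined in [1]
    and are not reproduced here.  Note that the #frames field stores the
    frame count minus one (section 4.3.2.1).

```python
def build_basic_payload(isf: int, tfi: int, groups) -> bytes:
    """Assemble payload header, ToC and audio data for the basic mode.

    groups: list of (frame_type, frames) in creation order, where
    frames is a list of already octet-aligned bytes objects.  For
    FT=14 or 15 the list only conveys the frame count.
    """
    if not groups:
        raise ValueError("at least one group of frames is required")
    if not (0 <= isf <= 31 and 0 <= tfi <= 3):
        raise ValueError("ISF or TFI out of range")

    payload = bytearray([(isf << 3) | (tfi << 1)])   # L bit is 0 in basic mode
    for index, (frame_type, frames) in enumerate(groups):
        if not 0 <= frame_type <= 127:
            raise ValueError("frame type must fit in 7 bits")
        if not 1 <= len(frames) <= 256:
            raise ValueError("a group carries 1 to 256 frames")
        f_bit = 0x80 if index < len(groups) - 1 else 0x00
        payload += bytes([f_bit | frame_type, len(frames) - 1])

    for frame_type, frames in groups:
        if frame_type in (14, 15):   # AUDIO_LOST / NO_DATA: no data octets
            continue
        for frame in frames:
            payload += frame
    return bytes(payload)
```

    With three 35-octet frames of FT=26, ISF = 8 and TFI = 2, this
    yields a 108-octet payload whose first three octets are 0x44, 0x1A
    and 0x02.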
 
 
 4.3.5. Payload Examples
 
 4.3.5.1. Example 1, Basic Mode Payload Carrying Multiple Frames Encoded
    Using the Same Frame Type
 
    The following diagram shows a payload that carries three AMR-WB+
    frames encoded using 14 kbits/s frame type (FT=26) with a frame
    length of 280 bits (35 bytes). The internal sampling frequency in
    this example is 25.6 kHz (ISF = 8).  The TFI for the first frame is
    2, indicating that the first transport frame in this payload is the
    third in a super-frame.  Since this payload is in the basic mode the
    subsequent frames of the payload are consecutive frames in decoding
    order, i.e. the fourth transport frame of the current super-frame
    and the first transport frame of the next super-frame.  Note that
    because the frames are all encoded using the same frame type, only
    one ToC entry is required.
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ISF = 8 | 2 |0|0|  FT = 26    |  #frames = 3  |   f1(0...7)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...           | f1(272...279) |   f2(0...7)   |               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | f2(272...279) |   f3(0...7)   | ...                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                                           | f3(272...279) |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
 
 4.3.5.2. Example 2, Basic Mode Payload Carrying Multiple Frames Encoded
    Using Different Frame Types
 
    The following diagram shows a payload that carries three AMR-WB+
    frames; the first frame is encoded using 18.4 kbits/s frame type
    (FT=33) with a frame length of 368 bits (46 bytes), and the two
    subsequent frames are encoded using 20 kbits/s frame type (FT=35)
    having frame length of 400 bits (50 bytes).  The internal sampling
    frequency in this example is 32 kHz (ISF = 10), implying the overall
    bit-rates of 23 kbits/s for the first frame of the payload, and 25
    kbits/s for the subsequent frames.  The TFI for the first frame is
    1, indicating that the first transport frame in this payload is the
    second in a super-frame.  Since this is a payload in the basic mode
    the subsequent frames of the payload are consecutive frames in
    decoding order, i.e. the third and fourth transport frames of the
    current super-frame.  Note that since the payload carries two
    different frame types, there are two ToC entries.
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ISF=10 | 1 |0|1|  FT = 33    |  #frames = 1  |0|  FT = 35    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  #frames = 2  |   f1(0...7)   | ...                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                           | f1(360...367) |   f2(0...7)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | f2(392...399) |   f3(0...7)   | ...                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                           | f3(392...399) |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
 
 4.3.5.3. Example 3, Payload in Interleaved Mode
 
    This example shows a payload in interleaved mode carrying four
    frames encoded using 32 kbits/s frame type (FT=47) with frame length
    of 640 bits (80 bytes).  The internal sampling frequency is 38.4 kHz
    (ISF = 12), implying a bit-rate of 48 kbits/s for all frames in the
    payload.  The TFI for the first frame is 0, i.e. it is the first
    transport frame of a super-frame. The displacement fields for the
    subsequent frames are DIS2=18, DIS3=15, and DIS4=10, which implies
    that the subsequent frames have the TFIs of 3, 3, and 2,
    respectively.  The long displacement field flag L in the payload
    header is set to 1, which means that the displacement fields in the
    ToC entry use eight bits.  Note that since all frames of this
    payload are encoded using the same frame type, only a single ToC
    entry is needed.  Furthermore, the displacement field for the
    first frame corresponding to the first ToC entry (DIS1=0) must be
    ignored since its timestamp and TFI are defined by the RTP timestamp
    and the TFI found in the payload header.
 
    The RTP timestamp values of the frames in this example are:
    Frame1: TS1 = RTP Timestamp
    Frame2: TS2 = TS1 + 19 * 960
    Frame3: TS3 = TS2 + 16 * 960
    Frame4: TS4 = TS3 + 11 * 960
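
    The TFIs and timestamp offsets quoted in this example can be checked
    with the sketch below.  It assumes a super-frame of four transport
    frames (TFI values 0 to 3), so that each step advances the TFI by
    DISi + 1 modulo 4.  The function names are illustrative only.

```python
def derive_tfis(first_tfi, displacements):
    """TFI of every frame, given the header TFI and the DIS fields."""
    tfis = [first_tfi]
    for dis in displacements[1:]:      # DIS1 of the first entry is ignored
        tfis.append((tfis[-1] + dis + 1) % 4)
    return tfis


def tick_offsets(displacements, ticks_per_frame):
    """Offset of every frame from the RTP header timestamp, in ticks."""
    offsets = [0]
    for dis in displacements[1:]:
        offsets.append(offsets[-1] + (dis + 1) * ticks_per_frame)
    return offsets


dis_fields = [0, 18, 15, 10]
print(derive_tfis(0, dis_fields))        # [0, 3, 3, 2]
print(tick_offsets(dis_fields, 960))     # [0, 18240, 33600, 44160]
```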
 
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ISF=12 | 0 |1|0|  FT = 47    |  #frames = 4  |   DIS1 = 0    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |   DIS2 = 18   |   DIS3 = 15   |   DIS4 = 10   |   f1(0...7)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                           | f1(632...639) |   f2(0...7)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                           | f2(632...639) |   f3(0...7)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                           | f3(632...639) |   f4(0...7)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                           | f4(632...639) |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
 
 4.4. Interleaving Considerations
 
    The flexible interleaving scheme requires some further usage
    considerations.  As presented in the example in Section 3.6.2, an
    interleaving pattern requires a certain size of the deinterleaving
    buffer.  This required buffer space, expressed as a number of frame
    slots, is indicated using the "interleaving" media type parameter.
    The number of frame slots needed can be converted into an actual
    memory requirement by considering the largest (in bytes) combination
    of AMR-WB+'s core and stereo rates.
 
    However, the frame buffer size is not always sufficient to determine
    when it is appropriate to start consuming frames from the
    interleaving buffer.  This happens in two cases: when the internal
    sampling frequency is switched, and when the interleaving pattern
    changes.  For this reason the "int-delay" media type parameter is
    defined.  It allows a sender to indicate the minimal media time that
    needs to be present in the buffer before starting to consume media
    from the buffer.
 
 
 4.5. Implementation Considerations
 
    An application implementing this payload format MUST understand all
    the payload parameters in the out-of-band signaling used.  For
    example, if an application uses SDP, all the SDP and MIME parameters
    in this document MUST be understood.  This requirement ensures that
    an implementation can always decide whether or not it is capable of
    communicating.
 
    Both the basic and the interleaved mode SHALL be implemented.  The
    implementation burden of both is rather small, and requiring both
    ensures interoperability.  As the AMR-WB+ codec contains all the
    functionality of the AMR-WB codec, anyone supporting the AMR-WB+
    codec and this payload format is RECOMMENDED to also implement the
    payload format in RFC 3267 [7] for the AMR-WB frame types. This will
    significantly help interoperability with other devices that only
    support AMR-WB, in applications and scenarios where this is
    possible. Otherwise an end-point that is in fact capable of
    everything except the RTP payload format for AMR-WB will not be able
    to communicate.
 
    When doing error concealment, certain precautions are needed due to
    the possibility of switching of the internal sampling frequency. The
    main difficulty arises from the fact that with packet loss,
    information such as timestamps, frame lengths and the chosen ISF
    gets lost.  This may cause concealment to be done using incorrect
    frame lengths, which can in the worst case make some of the
    subsequent frames unusable.  More information and an example
    algorithm solving this problem are available in section 4.5.1 below.
 
 
 4.5.1. ISF Recovery When Frames Are Lost
 
    In case of packet loss proper error concealment has to be initiated
    in the AMR-WB+ decoder for the lost frames associated with the lost
    packets. Proper frame loss concealment requires a codec framing that
    matches the timestamps of the correctly received frames. Hence, it
    is necessary to recover the timestamps of the lost frames. A
    difficulty with this may arise due to the fact that the codec frame
    length that is associated with the ISF may have changed during the
    frame loss.
 
    The task of recovering the timestamps of lost frames is illustrated
    in an example in which a case is assumed where two frames at
    timestamps t0 and t1 have been received properly, the ISF values
    being isf0 and isf1, respectively. The associated frame lengths (in
    timestamp ticks) are given with L0 and L1, respectively. Three
    frames with timestamps x1 - x3 have been lost. The example further
    assumes that there is one ISF change during the frame loss from isf0
    to isf1, as shown in the figure below.
 
    What is generally not known in the decoder and what is required for
    recovery of the timestamps is:
    * the ISFs associated to the lost frames
    * how many frames have been lost
 
      |<---L0--->|<---L0--->|<-L1->|<-L1->|<-L1->|
 
      |   Rxd    |   lost   | lost | lost |  Rxd |
    --+----------+----------+------+------+------+--
 
      t0         x1         x2     x3     t1
 
    In the following an example algorithm is given, which may be used to
    recover timestamps and ISFs belonging to lost frames.
 
    As in the above example, it is assumed that two frames have been
    received properly with timestamps t0 and t1, and ISF values isf0 and
    isf1, and associated frame lengths L0 and L1, respectively.
    Furthermore, the TFIs of the two received frames are denoted by tfi0
    and tfi1, respectively.
 
    Example Algorithm:
 
    Start:                              # check for frame loss
    If (t0 + L0) == t1 Then goto End    # no frame loss
 
    Step 1:                             # check case with no ISF change
    If (isf0 != isf1) Then goto Step 2  # At least one ISF change
    If (isFractional((t1 - t0)/L0)) Then goto Step 3
                                        # More than 1 ISF change
 
    Return recovered timestamps as
    x(n) = t0 + n*L0 and associated ISF equal to isf0,
    for 0 < n < (t1 - t0)/L0
    goto End
 
    Step 2:
    Loop initialization: n := 4 - (tfi0 mod 4)
    While n <= (t1-t0)/L0
      Evaluate m := (t1 - t0 - n*L0)/L1
      If (isInteger(m) AND ((tfi0+n+m) mod 4 == tfi1)) Then goto found;
      n := n+4
    endloop
    goto Step 3                         # More than 1 ISF change
 
    found:
    Return recovered timestamps and ISFs as
    x(i) = t0 + i*L0 and associated ISF equal to isf0, for 0 < i < n
    x(i) = t0 + n*L0 + (i-n)*L1 and associated ISF equal to isf1,
    for n <= i < n+m
    goto End
 
    Step 3:
    More than 1 ISF change has occurred. Since ISF changes can be
    assumed to be infrequent, such a situation occurs only if long
    sequences of frames are lost. In that case it is not useful to
    recover the timestamps of the lost frames. Rather, the AMR-WB+
    decoder should be reset and decoding should be resumed starting with
    the frame with timestamp t1.
 
    End:
 
    The above algorithm still does not solve the problem when the buffer
    depth is shallower than the loss burst. In these cases, concealment
    must be done without any knowledge about future frames, and the
    concealment may result in loss of frame boundary alignment. If that
    occurs, it may be necessary to restart the audio codec and perform
    resynchronization.
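
    The example algorithm above can be sketched in Python as follows.
    This is only an illustration under stated assumptions: timestamps
    are taken to be unwrapped integers with t1 > t0, frame lengths are
    integer numbers of timestamp ticks, and the function name and its
    return convention (an empty list for "no loss", None for "reset
    the decoder" as in Step 3) are hypothetical. The sketch attributes
    the frame at index n to isf1, in line with the figure where x2
    already has length L1, and it does not return index n+m because
    that timestamp coincides with the received frame t1.

```python
from typing import List, Optional, Tuple


def recover_lost_frames(t0: int, t1: int, isf0: int, isf1: int,
                        L0: int, L1: int, tfi0: int, tfi1: int
                        ) -> Optional[List[Tuple[int, int]]]:
    """Return [(timestamp, isf), ...] for the lost frames between t0 and t1.

    Returns [] when no frame was lost and None when more than one ISF
    change must be assumed (Step 3: reset the decoder, resume at t1).
    """
    # Start: check for frame loss.
    if t0 + L0 == t1:
        return []

    gap = t1 - t0

    # Step 1: case with no ISF change.
    if isf0 == isf1:
        if gap % L0 != 0:
            return None                      # fractional frame count
        return [(t0 + n * L0, isf0) for n in range(1, gap // L0)]

    # Step 2: exactly one ISF change, which can only happen at a
    # super-frame boundary, i.e. n frames after t0 with n = 4 - tfi0 mod 4.
    n = 4 - (tfi0 % 4)
    while n * L0 <= gap:
        rest = gap - n * L0
        if rest % L1 == 0:
            m = rest // L1
            if (tfi0 + n + m) % 4 == tfi1:
                lost = [(t0 + i * L0, isf0) for i in range(1, n)]
                lost += [(t0 + n * L0 + (i - n) * L1, isf1)
                         for i in range(n, n + m)]
                return lost
        n += 4

    # Step 3: more than one ISF change.
    return None
```

    With the hypothetical values t0 = 0, L0 = 1440, L1 = 1152, tfi0 = 2,
    t1 = 5184 and tfi1 = 2, the sketch reproduces the situation in the
    figure: one lost frame of length L0 followed by two lost frames of
    length L1.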
 
 
 5. Congestion Control
 
    The general congestion control considerations for transporting RTP
    data apply to AMR-WB+ audio over RTP as well, see RTP [3] and any
    applicable RTP profile like AVP [9].  However, the multi-rate
    capability of AMR-WB+ audio coding provides a mechanism for
    controlling congestion, since the bandwidth demand can be adjusted
    by selecting a different coding frame type or lower internal
    sampling rate.
 
    Another parameter that may impact the bandwidth demand for AMR-WB+
    is the number of frames that are encapsulated in each RTP payload.
    Packing more frames in each RTP payload can reduce the number of
    packets sent and hence the overhead from IP/UDP/RTP headers, at the
    expense of increased delay and reduced error robustness against
    packet losses.
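
    The trade-off described above can be made concrete with a small
    calculation. The following sketch assumes uncompressed IPv4, UDP
    and RTP headers (20 + 8 + 12 = 40 bytes), ignores the payload
    header and table of contents of this format, and uses frame sizes
    and durations that are purely illustrative inputs rather than
    values taken from the codec specification. The function name is
    hypothetical.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no compression


def packet_stats(frames_per_packet: int, frame_bytes: int, frame_ms: float):
    """Return (header overhead fraction, packetization delay in ms,
    total bit rate in bit/s) for a given number of frames per packet."""
    payload_bytes = frames_per_packet * frame_bytes
    total_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES
    overhead_fraction = IP_UDP_RTP_HEADER_BYTES / total_bytes
    delay_ms = frames_per_packet * frame_ms
    bitrate_bps = total_bytes * 8 * 1000 / delay_ms
    return overhead_fraction, delay_ms, bitrate_bps


# Illustrative comparison: 60-byte frames of 20 ms each.
for k in (1, 2, 4):
    overhead, delay, rate = packet_stats(k, 60, 20.0)
    print(f"{k} frame(s)/packet: overhead {overhead:.1%}, "
          f"delay {delay:.0f} ms, rate {rate:.0f} bit/s")
```

    Packing more frames per packet lowers the header share and the total
    bit rate, while the packetization delay grows linearly and each lost
    packet removes more audio.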
 
    If forward error correction (FEC) is used to combat packet loss, the
    amount of redundancy added by FEC will need to be regulated so that
    the use of FEC itself does not cause a congestion problem.
 
 
 6. Security Considerations
 
    RTP packets using the payload format defined in this specification
    are subject to the general security considerations discussed in RTP
    [3] and any applicable profile such as AVP [9] or SAVP [10]. As this
    format transports encoded audio, the main security issues include
    confidentiality, integrity protection, and authentication of the
    audio itself.  The payload format itself does not have any built-in
    security mechanisms. Any suitable external mechanisms, such as SRTP
    [10], MAY be used.
 
    Neither this payload format nor the AMR-WB+ decoder exhibits any
    significant non-uniformity in the receiver-side computational
    complexity for packet processing; they are thus unlikely to pose a
    denial-of-service threat due to the receipt of pathological data.
 
 
 6.1. Confidentiality
 
    To achieve confidentiality of the encoded AMR-WB+ audio, all audio
    data bits will need to be encrypted.  There is less need to encrypt
    the payload header or the table of contents, because 1) they only
    carry information about the frame type, and 2) this information
    could be useful to some third party, e.g., for quality monitoring.
 
    As long as the AMR-WB+ payload is only packed and unpacked at either
    end, encryption may be performed after packet encapsulation so that
    there is no conflict between the two operations.
 
 
 6.2. Authentication and Integrity
 
    To authenticate the sender of the audio and provide integrity
    protection, an external mechanism has to be used.  It is RECOMMENDED
    that such a mechanism protect at least the complete RTP payload and
    header.
 
    Data tampering by a man-in-the-middle attacker could replace audio
    content and also result in erroneous depacketization/decoding that
    could lower the audio quality.
 
 
 6.3. Decoding Validation
 
    When processing a received payload packet, if the receiver finds
    that the calculated payload length, based on the information of the
    session and the values found in the payload header fields, does not
    match the size of the received packet, the receiver SHOULD discard
    the packet.  This is because decoding a packet with errors in the
    fields that indicate the number of frames or the frame type, which
    are used to determine data lengths, could severely degrade the
    audio quality.
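
    A minimal sketch of this check is given below. It deliberately
    does not parse the payload header: the header length and the list
    of per-frame sizes are hypothetical inputs that a real receiver
    would derive from the payload header fields and session
    information defined elsewhere in this document.

```python
from typing import Sequence


def payload_length_is_consistent(payload: bytes,
                                 header_len: int,
                                 frame_sizes: Sequence[int]) -> bool:
    """Return True if the received payload size equals the size calculated
    from the (already parsed) header length and frame sizes in bytes."""
    expected_len = header_len + sum(frame_sizes)
    return len(payload) == expected_len


def accept_or_discard(payload: bytes, header_len: int,
                      frame_sizes: Sequence[int]) -> str:
    # The receiver SHOULD discard a packet whose size does not match.
    if payload_length_is_consistent(payload, header_len, frame_sizes):
        return "decode"
    return "discard"
```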
 
 
 7. Payload Format Parameters
 
    This section defines the parameters that may be used to select
    features of the AMR-WB+ payload format.  The parameters are defined
    here as part of the media type registration for the AMR-WB+ audio
    codec.  A mapping of the parameters into the Session Description
    Protocol (SDP) [6] is also provided for those applications that use
    SDP.  Equivalent parameters could be defined elsewhere for use with
    control protocols that do not use MIME or SDP.

    The data format and parameters are only specified for real-time
    transport in RTP.
 
 
 7.1. Media Type Registration
 
    The media type for the Extended Adaptive Multi-Rate Wideband (AMR-
    WB+) codec is allocated from the IETF tree since AMR-WB+ is expected
    to be a widely used audio codec in general streaming applications.
 
    Note, any unspecified parameter MUST be ignored by the receiver.
 
    Media Type name:     audio
 
    Media subtype name:  AMR-WB+
 
    Required parameters:
 
    None
 
    Optional parameters:
 
    channels:       The maximum number of audio channels present in the
                    audio frames. Permissible values are 1 (mono) or 2
                    (stereo).  If no parameter is present, the maximum
                    number of channels is 2 (stereo).
 
    interleaving:   Indicates that frame level interleaving mode SHALL
                    be used for the payload.  The parameter specifies
                    the number of frame slots required in a
                    deinterleaving buffer (including the frame that is
                    ready to be consumed).  Its value is equal to one
                    plus the maximum number of frames that precede any
                    frame in transmission order and follow the frame in
                    RTP timestamp order.  If this parameter is not
                    present, interleaving SHALL NOT be used.
 
    int-delay:      The minimal media time delay in RTP timestamp ticks
                    that is needed in the deinterleaving buffer, i.e.
                    the difference in RTP timestamp between the earliest
                    and latest audio frame present in the deinterleaving
                    buffer, to ensure correct decoding.
 
    ptime:          see RFC 2327 [6].
 
    maxptime:       see Section 8 in RFC 3267 [7].
 
    Restriction on Usage:
                 This type is only defined for transfer via RTP (STD
                 64).

    Encoding considerations:
 
    Security considerations:
                 See Section 6 of RFC XXXX.
 
    Interoperability considerations:
                 To maintain interoperability with AMR-WB capable end-
                 points, in cases where negotiation is possible and the
                 AMR-WB+ end-point supporting this format also supports
                 RFC 3267 for AMR-WB transport, an AMR-WB+ end-point
                 SHOULD declare itself also as AMR-WB capable (i.e.
                 supporting also "audio/AMR-WB" as specified in RFC
                 3267).
 
                 As the AMR-WB+ decoder is capable of performing stereo
                 to mono conversions, all receivers of AMR-WB+ should be
                 able to receive both stereo and mono, even if the
                 receiver is only capable of playing out mono signals.
 
    Public specification:
                 RFC XXXX
                 3GPP TS 26.290, see reference [1] of RFC XXXX
 
    Additional information:
                 This MIME type is not applicable for file storage.
                 Instead file storage of AMR-WB+ encoded audio is
                 specified within the 3GPP defined ISO based multimedia
                 file format defined in 3GPP TS 26.244, see reference
                 [14] of RFC XXXX. This file format has the MIME types
                 "audio/3GPP" or "video/3GPP" as defined by RFC 3839
                 [15].
 
    Person & email address to contact for further information:
                 johan.sjoberg@ericsson.com
                 ari.lakaniemi@nokia.com
 
    Intended usage: COMMON.
                 It is expected that many IP based streaming
                 applications will use this type.
 
    Change controller:
                 IETF Audio/Video Transport working group delegated from
                 the IESG.
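
    As an illustration of the "interleaving" parameter definition
    above, the following Python sketch computes the value from the
    RTP timestamps of frames listed in transmission order. It is a
    sketch only: the function name is hypothetical, timestamps are
    assumed to be unwrapped and unique, and the quadratic search is
    chosen for clarity rather than efficiency.

```python
from typing import Sequence


def interleaving_value(tx_order_timestamps: Sequence[int]) -> int:
    """One plus the maximum number of frames that precede a frame in
    transmission order and follow it in RTP timestamp order."""
    worst = 0
    for i, ts in enumerate(tx_order_timestamps):
        # Frames sent earlier that must be played later than this frame.
        sent_before_played_after = sum(
            1 for earlier in tx_order_timestamps[:i] if earlier > ts)
        worst = max(worst, sent_before_played_after)
    return worst + 1
```

    For frames sent in timestamp order the result is 1, i.e. only the
    slot for the frame that is ready to be consumed is needed.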
 
 
 7.2. Mapping Media Type Parameters into SDP
 
    The information carried in the media type specification has a
    specific mapping to fields in the Session Description Protocol (SDP)
    [6], which is commonly used to describe RTP sessions.  When SDP is
    used to specify sessions employing the AMR-WB+ codec, the mapping is
    as follows:
 
    -  The media type ("audio") goes in SDP "m=" as the media name.
 
    -  The media subtype (payload format name) goes in SDP "a=rtpmap" as
       the encoding name.  The RTP clock rate in "a=rtpmap" SHALL be
       72000 for AMR-WB+, and the encoding parameter number of channels
       MUST either be explicitly set to 1 or 2, or be omitted, implying
       the default value of 2.
 
    -  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
       "a=maxptime" attributes, respectively.
 
    -  Any remaining parameters go in the SDP "a=fmtp" attribute by
       copying them directly from the MIME media type string as a
       semicolon separated list of parameter=value pairs.
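
    The mapping rules above can be sketched as a small helper that
    turns media type parameters into SDP lines. The function name and
    its arguments are hypothetical; the "; " separator between fmtp
    parameters is chosen to match the example in Section 7.2.2.

```python
from typing import Dict, List, Optional


def amrwbplus_sdp_lines(port: int, payload_type: int,
                        channels: Optional[int] = None,
                        ptime: Optional[int] = None,
                        maxptime: Optional[int] = None,
                        fmtp: Optional[Dict[str, object]] = None
                        ) -> List[str]:
    """Build the media-level SDP lines for one AMR-WB+ payload type."""
    if channels not in (None, 1, 2):
        raise ValueError("channels must be 1, 2 or omitted")

    lines = [f"m=audio {port} RTP/AVP {payload_type}"]

    # Encoding name and the fixed 72000 Hz RTP clock rate; an omitted
    # channel count implies the default of 2.
    rtpmap = f"a=rtpmap:{payload_type} AMR-WB+/72000"
    if channels is not None:
        rtpmap += f"/{channels}"
    lines.append(rtpmap)

    # Remaining parameters are copied into a=fmtp as name=value pairs.
    if fmtp:
        pairs = "; ".join(f"{name}={value}" for name, value in fmtp.items())
        lines.append(f"a=fmtp:{payload_type} {pairs}")

    # ptime and maxptime have their own SDP attributes.
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```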
 
 
 7.2.1. Offer-Answer Model Considerations
 
    To achieve good interoperability for the AMR-WB+ RTP payload format
    when SDP is used in Offer/Answer [8] negotiation, the following
    considerations apply.

    In offer/answer usage, the parameters SHALL be interpreted as
    follows:
 
    -  The "interleaving" parameter is symmetric, thus requiring that
       the answerer also include it in the answer to an offered payload
       type containing the parameter. However, the buffer space value is
       declarative in unicast usage. For multicast usage, the same value
       is required in the response in order to accept the payload type.
       For streams declared as sendrecv or recvonly, the parameter
       indicates that the receiver will accept payloads using the
       interleaved mode of the payload format, and the value declares
       the amount of buffer space the receiver has available for the
       sender to utilize. For sendonly streams, the parameter indicates
       the desired configuration and amount of buffer space. An answerer
       is RECOMMENDED to respond using the offered value, if capable of
       using it.
 
    -  The "int-delay" parameter is declarative. For streams declared
       as sendrecv or recvonly, the value indicates the maximum initial
       delay the receiver will accept in the deinterleaving buffer. For
       sendonly streams, the value is the amount of media time the
       sender desires to use; the value SHOULD be copied into any
       response.
 
    -  The "channels" parameter is declarative. For "sendonly" streams
       it indicates the desired channel usage: stereo and mono, or mono
       only. For "recvonly" and "sendrecv" streams the parameter
       indicates what the receiver will accept. As any receiver will
       be capable of receiving stereo frame types and performing local
       mixing with the AMR-WB+ decoder, there is normally only one
       reason to restrict to mono only: to avoid spending bit rate on
       data that is not utilized when the front end is only capable of
       mono.
 
    -  The "ptime" parameter works as indicated by the offer/answer
       model [8]; "maxptime" SHALL be used in the same way.
 
    -  To maintain interoperability with AMR-WB in cases where
       negotiation is possible, an AMR-WB+ capable end-point which also
       implements the AMR-WB payload format [7] is RECOMMENDED to also
       declare itself capable of AMR-WB as it is a subset of the AMR-WB+
       codec.
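
    One possible reading of the "interleaving" rules above, from the
    answerer's point of view, is sketched below. The function, its
    arguments and its return convention are hypothetical, and the
    choice to answer with the locally available buffer space in
    unicast when the offered value cannot be met is an interpretation
    of the declarative unicast usage, not a requirement stated in
    this document.

```python
from typing import Optional, Tuple


def answer_interleaving(offered: Optional[int],
                        local_slots: int,
                        multicast: bool = False
                        ) -> Tuple[bool, Optional[int]]:
    """Return (accept_payload_type, interleaving value for the answer).

    offered     -- value in the offer, or None if the parameter is absent
    local_slots -- deinterleaving frame slots the answerer can provide
                   (0 is used here to mean "interleaving not supported")
    """
    if offered is None:
        return True, None        # not offered: interleaving is not used
    if local_slots < 1:
        return False, None       # symmetric parameter cannot be echoed
    if local_slots >= offered:
        return True, offered     # RECOMMENDED: respond with offered value
    if multicast:
        return False, None       # multicast requires the same value
    return True, local_slots     # unicast: declare own buffer space
```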
 
    In declarative usage, like SDP in RTSP [16] or SAP [17], the
    parameters SHALL be interpreted as follows:
 
    -  The "interleaving" parameter, if present, configures the payload
       format in that mode, and the value indicates the number of frames
       that the deinterleaving buffer is required to support to be able
       to handle this session correctly.
 
    -  The "int-delay" parameter indicates the initial buffering delay
       required to receive this stream correctly.
 
    -  The "channels" parameter indicates if the content being
       transmitted can contain either both stereo and mono rates, or
       only mono.
 
    -  All other parameters indicate the values that are being used by
       the sending entity.
 
 
 7.2.2. Examples
 
    An example SDP session description utilizing AMR-WB+ mono and
    stereo encoding follows.
 
     m=audio 49120 RTP/AVP 99
     a=rtpmap:99 AMR-WB+/72000/2
     a=fmtp:99 interleaving=30; int-delay=86400
     a=maxptime:100
 
    Note that the payload format (encoding) names are commonly shown in
    upper case.  Media subtypes are commonly shown in lower case.  These
    names are case-insensitive in both places.  Similarly, parameter
    names are case-insensitive both in MIME types and in the default
    mapping to the SDP a=fmtp attribute.
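
    A receiver-side sketch of reading the "a=fmtp" line from the
    example above is shown below. It lower-cases parameter names to
    reflect their case-insensitivity and converts "int-delay" from RTP
    timestamp ticks to seconds using the 72000 Hz clock rate. The
    function name is hypothetical and error handling is minimal.

```python
from typing import Dict

RTP_CLOCK_HZ = 72000  # RTP clock rate for AMR-WB+ in a=rtpmap


def parse_fmtp(line: str) -> Dict[str, str]:
    """Parse 'a=fmtp:<pt> name=value; name=value' into a dict whose
    keys are lower-cased parameter names."""
    prefix, _, rest = line.strip().partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        params[name.strip().lower()] = value.strip()
    return params


params = parse_fmtp("a=fmtp:99 interleaving=30; int-delay=86400")
int_delay_seconds = int(params["int-delay"]) / RTP_CLOCK_HZ
print(params["interleaving"], int_delay_seconds)
```

    For the example above, 86400 ticks at 72000 Hz correspond to an
    initial buffering delay of 1.2 seconds.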
 
 
 
 
 
 8. IANA Considerations
 
    It is requested that one new MIME subtype (audio/amr-wb+) be
    registered by IANA; see Section 7.
 
 
 9. Contributors
 
    Daniel Enstrom contributed by writing the codec introduction
    section. Stefan Bruhn contributed by writing the ISF recovery
    algorithm.


 10. Acknowledgements
 
    The authors would like to thank Redwan Salami and Stefan Bruhn for
    their significant contributions made throughout the writing and
    reviewing of this document.  Anisse Taleb and Ingemar Johansson
    contributed by implementing the payload format, and thus helped
    locate some flaws.  We would also like to acknowledge Qiaobing
    Xie, coauthor of RFC 3267, on which this document is based.
 
 
 11. References
 
 11.1. Normative references
 
    [1]  3GPP TS 26.290 "Audio codec processing functions; Extended AMR
         Wideband codec; Transcoding functions", version 6.0.0
         (2004-09), 3rd Generation Partnership Project (3GPP).
    [2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, Internet Engineering Task Force,
         March 1997.
    [3]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
         "RTP: A Transport Protocol for Real-Time Applications", STD 64,
         RFC 3550, Internet Engineering Task Force, July 2003.
    [4]  3GPP TS 26.192 "AMR Wideband speech codec; Comfort Noise
         aspects", version 5.0.0 (2001-03), 3rd Generation Partnership
         Project (3GPP).
    [5]  3GPP TS 26.193 "AMR Wideband speech codec; Source Controlled
         Rate operation", version 5.0.0 (2001-03), 3rd Generation
         Partnership Project (3GPP).
    [6]  Handley, M. and V. Jacobson, "SDP: Session Description
         Protocol", RFC 2327, Internet Engineering Task Force, April
         1998.
    [7]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, "Real-
         Time Transport Protocol (RTP) Payload Format and File Storage
         Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-
         Rate Wideband (AMR-WB) Audio Codecs", RFC 3267, Internet
         Engineering Task Force, June 2002.
    [8]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
         the Session Description Protocol (SDP)", RFC 3264, Internet
         Engineering Task Force, June 2002.
 
 
 11.2. Informative References
 
    [9]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
         Conferences with Minimal Control", STD 65, RFC 3551, Internet
         Engineering Task Force, July 2003.
    [10] Baugher, M., et al., "The Secure Real-time Transport Protocol
         (SRTP)", RFC 3711, Internet Engineering Task Force, March 2004.
    [11] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
         Generic Forward Error Correction", RFC 2733, Internet
         Engineering Task Force, December 1999.
    [12] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley,
         M., Bolot, J., Vega-Garcia, A. and S. Fosse-Parisis, "RTP
         Payload for Redundant Audio Data", RFC 2198, Internet
         Engineering Task Force, September 1997.
    [13] 3GPP TS 26.233 "Packet Switched Streaming service", version
         5.0.0 (2001-03), 3rd Generation Partnership Project (3GPP).
    [14] 3GPP TS 26.244 "Transparent end-to-end packet switched
         streaming service (PSS); 3GPP file format (3GP)", version 6.1.0
         (2004-09), 3rd Generation Partnership Project (3GPP).
    [15] Singer, D. and R. Castagno, "MIME Type Registrations for 3rd
         Generation Partnership Project (3GPP) Multimedia files", RFC
         3839, Internet Engineering Task Force, July 2004.
    [16] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
         Protocol (RTSP)", RFC 2326, Internet Engineering Task Force,
         April 1998.
    [17] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
         Protocol", RFC 2974, Internet Engineering Task Force, October
         2000.
 
    Any 3GPP document can be downloaded from the 3GPP webserver,
    "http://www.3gpp.org/", see specifications.


 12. Authors' Addresses
 
    Johan Sjoberg
    Ericsson Research
    Ericsson AB
    SE-164 80 Stockholm, SWEDEN
 
    Phone:   +46 8 7190000
    EMail: Johan.Sjoberg@ericsson.com
 
 
    Magnus Westerlund
    Ericsson Research
    Ericsson AB
    SE-164 80 Stockholm, SWEDEN
 
    Phone:   +46 8 7190000
    EMail: Magnus.Westerlund@ericsson.com
 
 
    Ari Lakaniemi
    Nokia Research Center
    P.O.Box 407
    FIN-00045 Nokia Group, FINLAND
 
    Phone:   +358-71-8008000
    EMail: ari.lakaniemi@nokia.com
 
 
 13. IPR Notice
 
    The IETF takes no position regarding the validity or scope of any
    Intellectual Property Rights or other rights that might be claimed
    to pertain to the implementation or use of the technology described
    in this document or the extent to which any license under such
    rights might or might not be available; nor does it represent that
    it has made any independent effort to identify any such rights.
    Information on the procedures with respect to rights in RFC
    documents can be found in BCP 78 and BCP 79.
 
    Copies of IPR disclosures made to the IETF Secretariat and any
    assurances of licenses to be made available, or the result of an
    attempt made to obtain a general license or permission for the use
    of such proprietary rights by implementers or users of this
    specification can be obtained from the IETF on-line IPR repository
    at http://www.ietf.org/ipr.
 
    The IETF invites any interested party to bring to its attention any
    copyrights, patents or patent applications, or other proprietary
    rights that may cover technology that may be required to implement
    this standard.  Please address the information to the IETF at
    ietf-ipr@ietf.org.
 
 
 14. Copyright Notice
 
    Copyright (C) The Internet Society (2004).  This document is subject
    to the rights, licenses and restrictions contained in BCP 78, and
    except as set forth therein, the authors retain all their rights.
 
    This document and the information contained herein are provided on
    an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
    INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
    THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
 
    This Internet-Draft expires in June 2005.
 
 
 RFC Editor Considerations
 
    The RFC editor is requested to replace all occurrences of XXXX with
    the RFC number this document receives.
 
    The RFC editor is also requested to remove the next section
    "Changes".
 
 
 Changes
 
    Changes in draft-ietf-avt-rtp-amrwbplus-03.txt compared to draft-
    ietf-avt-rtp-amrwbplus-02.txt:
 
    - Totally changed the payload format layout to reduce overhead
       (Section 4).
    - Updated the Offer/Answer definition for the interleaving
       parameter.
    - Rewrote the codec introduction to better explain the codec
       (Section 3.1).
    - Updated the security consideration on authentication.
    - Numerous editorial changes.