Internet Engineering Task Force                  Johan Sjoberg, Ericsson
Audio Video Transport WG                     Magnus Westerlund, Ericsson
INTERNET-DRAFT                                      Ari Lakaniemi, Nokia
February 19, 2001                               Petri Koskelainen, Nokia
Expires: August 19, 2001                        Bernhard Wimmer, Siemens
                                                Tim Fingscheidt, Siemens
                                                  Qiaobing Xie, Motorola
                                                  Sanjay Gupta, Motorola


                       RTP payload format for AMR
                    <draft-ietf-avt-rtp-amr-04.txt>


Status of this Memo


   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This document is an individual submission to the IETF. Comments
   should be directed to the authors.


Abstract

   This document describes a proposed real-time transport protocol (RTP)
   payload format for AMR speech encoded signals. The AMR payload format
   is designed to be able to interoperate with existing AMR transport
   formats. This document also includes a MIME type registration for
   AMR. The MIME type is specified for both real-time transport and
   storage.






Sjoberg et al.                                                  [Page 1]


INTERNET-DRAFT         RTP Payload Format for AMR      February 19, 2001


1. Introduction

   The adaptive multi-rate (AMR) speech codec [1] was developed by the
   European Telecommunications Standards Institute (ETSI). The AMR codec
   is standardized for GSM, and is also chosen by 3GPP as the mandatory
   codec for third generation systems. It is currently under
   standardization for TDMA. Thus, the AMR codec will be widely used in
   cellular systems. The AMR codec was developed to preserve high speech
   quality under a wide range of transmission conditions.

   The AMR codec is a multi-mode codec with 8 narrow band speech modes
   with bit rates between 4.75 and 12.2 kbps. The sampling frequency is
   8000 Hz and processing is done on 20 ms frames, i.e. 160 samples per
   frame. The AMR modes are closely related to each other and use the
   same coding framework. Three of the AMR modes are already adopted
   standards of their own, the 6.7 kbps mode as PDC-EFR [7], the 7.4
   kbps mode as IS-641 codec in TDMA [6], and the 12.2 kbps mode as GSM-
   EFR [5].

   The AMR codec is designed with a voice activity detector (VAD) and
   generation of comfort noise (CN) parameters during silence periods.
   Hence, the AMR codec can reduce the number of transmitted bits and
   packets during silence periods to a minimum. The operation to send CN
   parameters at regular intervals during silence periods is usually
   called discontinuous transmission (DTX) or source controlled rate
   (SCR) operation.

   AMR implementations must support all 8 speech coding modes, and mode
   switching can occur to any mode at any time. The mode information
   must therefore be transmitted together with the speech encoded bits,
   to indicate the mode. The AMR speech codec is designed with modes
   producing different bit rates to be able to adapt the source bit rate
   according to the radio link quality in mobile phone systems. The
   objective was to give highest possible speech quality under a variety
   of radio channel conditions. To realize rate adaptation the decoder
   needs to signal the mode it prefers to receive to the encoder.

   Due to the flexibility and robustness of AMR, it is suitable also for
   other purposes than circuit switched cellular systems. Other suitable
   applications are real-time services over packet switched networks.
   The payload format should be designed for robustness against both bit
   errors and packet loss. The speech encoded bits have different
   perceptual sensitivity to bit errors and cellular systems exploit
   this by using unequal error protection and detection (UEP and UED).

   The UEP/UED mechanisms focus the correction and detection of
   corrupted bits on the perceptually most sensitive bits. A speech
   frame is only
   declared damaged if there are bit errors in the most sensitive bits,
   i.e. class A bits. It is acceptable to have some bit errors in the
   other bits, i.e. class B and C. Also a damaged frame is still useful
   for error concealment in the decoding, which uses some of the less
   sensitive bits. This improves the speech quality compared to
   discarding the data.

   Today there exist some link layers that do not discard packets with
   bit errors, e.g. SLIP and some wireless links (with the Internet
   traffic pattern shifting towards a more media-centric one, more link
   layers of such nature may emerge in the future). With transport layer
   support for partial checksums, for example those supported by UDP-
   Lite [10] (work in progress), bit error tolerant AMR traffic could
   achieve better performance over these types of links.

   There are at least two basic approaches for carrying AMR traffic over
   bit error tolerant networks:

    1) Utilizing a partial checksum to cover headers and the most
       important AMR speech bits of the payload. It is recommended that
       at least all class A bits are covered by the checksum.

    2) Utilizing a partial checksum to only cover headers, but a frame
      CRC to cover the class A bits of each AMR frame in the payload.

   In either approach, at least part of the class B/C bits are left
   without error-check and thus bit error tolerance is achieved.

   It is still important that the network designer pays attention to the
   class B and C residual bit error rate. Though less sensitive to error
   than class A bits, class B bits are not insignificant and undetected
   errors in these bits cause degradation in speech quality. An example
   of residual error rates considered acceptable for AMR in UMTS can be
   found in [17].

   Approach 1 is bit efficient, flexible and simple, but comes with two
   disadvantages: a) bit errors in protected speech bits cause the
   whole payload to be discarded, and b) when multiple frames are
   transported in a payload, a single bit error in the protected bits
   can cause all the frames to be discarded.

   These disadvantages can be avoided if needed, at the cost of some
   overhead in the form of a frame-wise CRC (Approach 2). For problem
   a), the CRC makes it possible to detect bit errors in the class A
   bits and still use the damaged frame for error concealment, which
   gives a small improvement in speech quality. For problem b), the
   per-frame CRCs remove the possibility that a single bit error in a
   class A bit causes all the frames in the payload to be discarded,
   which improves speech quality when multiple frames are transported
   over links subject to bit errors.

   The choice between the two approaches must be made based on the
   available bandwidth, and desired tolerance to bit errors. Neither
   solution is appropriate to all cases.

   To achieve better robustness against packet loss the payload supports
   FEC. The simple scheme of repetition of previously sent data is one
   possibility. Another possible scheme which is more bandwidth
   efficient is to use payload external FEC, e.g. RFC2733 [16], which
   generates extra packets containing repair data. The whole payload can
   also be sorted in sensitivity order to support external FEC schemes
   using UEP. There is work in progress on a generic version of such a
   scheme [15].


2.  Requirements

   The AMR payload format for RTP was designed to meet the following
   requirements:

     o Different levels of robustness must be supported, from no
      redundant data to extreme robustness capable of handling very
      high packet loss rates with no or small speech quality
      degradation.

     o Fast, bandwidth efficient, frame-wise AMR mode adaptation must
      be supported. This means that it must be possible to send Codec
      Mode Requests back from the receiving side to the transmitting
      side with information on the preferred mode.

     o Source controlled rate operation (SCR) (also called DTX) and
      comfort noise parameter (CN) transmission defined in AMR must be
      supported.


3. Payload format

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119 [3].

   The AMR payload format is designed to be flexible, ranging from very
   low overhead to an extended format with the possibility to increase
   bit error robustness and pack several speech frames in one packet.

   The payload format consists of one payload header, a table of
   content, optionally one CRC per payload frame and zero or more
   payload frames. The payload format is bandwidth efficient. This is
   achieved by not using octet alignment for the payload header, table
   of content or the payload frames, but the full payload is octet
   aligned. If the option to transmit a robust sorted payload is enabled
   and employed, the full payload SHALL finally be ordered in descending
   bit error sensitivity order to be prepared for unequal error
   protection or unequal error detection schemes. The AMR encoded bit
   streams are defined in sensitivity order in Annex B of [2], the
   original order as delivered from the speech encoder is defined in
   [1].

   The last octet of an AMR payload packet MUST be padded with zeroes at
   the end if not all bits are used.

   The AMR frame types, or modes, are defined in [2]. Frame type 15, no
   transmission, is needed to indicate frames that are not transmitted
   or are lost. Not transmitted can mean either that the speech encoder
   produced no data for this frame or that no data for this frame is
   transmitted in this payload, i.e. valid data for this frame could be
   sent in another payload. This occurs, for example, when multiple
   frames are sent in each payload and comfort noise starts. In a
   payload with 8 frames, where speech frames in AMR mode 7 are
   interrupted by CN in the fifth frame, the frame type sequence could
   look like: {7,7,7,7,8,15,15,8}. The AMR SCR/DTX is described in [4].

   The AMR payload format supports robust transmission, multiple frames
   in one payload packet, and the use of fast codec mode adaptation.

   Robustness against packet loss can be accomplished by using the
   possibility to retransmit previously transmitted frames together with
   the current frame or frames.
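
   The retransmission idea above can be sketched as follows. This is
   only an illustration under stated assumptions: the type
   amr_packet_plan, the function amr_plan_redundant_packet and the
   parameter names are hypothetical, frames are numbered from 0, ts0 is
   assumed to be the RTP timestamp of frame 0, and each frame spans 160
   timestamp units (see section 4).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of what one packet carries. */
typedef struct {
    uint32_t first_frame;  /* index of the oldest frame in the packet */
    unsigned count;        /* number of frames carried                */
    uint32_t timestamp;    /* RTP timestamp of the oldest frame       */
} amr_packet_plan;

/*
 * Plan a packet that carries the newest frame together with up to
 * `redundancy` previously transmitted frames. All frames between the
 * oldest and the newest are included, so the packet stays contiguous.
 */
amr_packet_plan amr_plan_redundant_packet(uint32_t newest,
                                          unsigned redundancy,
                                          uint32_t ts0)
{
    amr_packet_plan p;
    unsigned back = redundancy;

    if (back > newest)
        back = (unsigned)newest;      /* not enough history yet */

    p.first_frame = newest - back;
    p.count = back + 1;
    p.timestamp = ts0 + 160u * p.first_frame;
    assert(p.count >= 1);
    return p;
}
```

   With redundancy set to 2, the packet for frame 5 would carry frames
   3, 4 and 5, and its timestamp would be that of frame 3.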

   The AMR performance over error tolerant links can be improved by
   also delivering speech frames with bit errors. Unequal error
   detection is needed since bit errors SHOULD only be allowed in the
   least error sensitive bits. This payload format provides two
   alternative methods to implement unequal error detection:

   A. CRC calculation over the class A speech bits

      If several consecutive speech frames are packed into each
      payload, the optional CRC may be used to protect the class A
      speech bits, see table 1. The number of class A bits is specified
      as informative in [2] and therefore copied into table 1 as
      normative for this payload format. Speech frames with errors in
      class A bits MUST be marked with SPEECH_BAD for corrupted speech
      frames (FT=0..7) or SID_BAD for corrupted SID frames (FT=8) and
      be sent to the speech decoder, see [4]. In this case the RTP
      header, payload header and table of content should be covered by
      a transport layer checksum, e.g. UDP-lite [10]. Packets should be
      discarded if the transport layer CRC detects errors.

   B. Robust sorting of payload bits

      Robust behavior can also be accomplished by robust sorting of the
      payload. This enables the use of UED (e.g. UDP-lite) and UEP
      (e.g. ULP [15]). The UED and/or UEP is recommended to cover at
      least the RTP header, payload header, table of content and class
      A bits.

   Support for unequal error detection is OPTIONAL. If either scheme is
   to be used, it MUST be signalled out of band (see section 8).

                     Class A   total speech
   Index   Mode       bits       bits
   ----------------------------------------
     0     AMR 4.75   42         95
     1     AMR 5.15   49        103
     2     AMR 5.9    55        118
     3     AMR 6.7    58        134
     4     AMR 7.4    61        148
     5     AMR 7.95   75        159
     6     AMR 10.2   65        204
     7     AMR 12.2   81        244
     8     AMR CNG    39         39

   Table 1. Specification of the number of class A bits.
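
   As an illustration only, Table 1 can be held as two lookup arrays
   indexed by frame type. The helper names below are hypothetical and
   not part of this specification. Frame types that are not listed in
   Table 1 are reported as unknown by this sketch.

```c
#include <assert.h>

/* Values copied from Table 1; index 8 is the comfort noise frame. */
static const int amr_class_a_bits[9] = {42, 49, 55, 58, 61, 75, 65, 81, 39};
static const int amr_total_bits[9]   = {95, 103, 118, 134, 148, 159, 204,
                                        244, 39};

/*
 * Number of bits in a payload frame of type ft.
 * FT=15 (no transmission) carries no frame data.
 * Returns -1 for frame types that Table 1 does not list.
 */
int amr_frame_bits(int ft)
{
    if (ft == 15)
        return 0;
    if (ft < 0 || ft > 8)
        return -1;
    return amr_total_bits[ft];
}

/* Number of class A bits, or -1 if Table 1 does not list ft. */
int amr_class_a(int ft)
{
    if (ft < 0 || ft > 8)
        return -1;
    return amr_class_a_bits[ft];
}

/* Bits left outside class A (class B and C) for a listed frame type. */
int amr_unprotected_bits(int ft)
{
    assert(ft >= 0 && ft <= 8);
    return amr_total_bits[ft] - amr_class_a_bits[ft];
}
```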

   A frame quality indicator is included for interoperability with the
   ATM payload format described in ITU-T I.366.2, the UMTS Iu interface
   [13] and other transport formats. The speech quality is increased if
   damaged frames are forwarded to the speech decoder error concealment
   unit and not dropped. In many communication scenarios the AMR encoded
   bits will be transmitted from one IP/UDP/RTP terminal to a terminal
   in a system with another transport format and/or vice versa. The
   transport format transcoding will be done in a gateway. A second
   likely scenario is that IP/UDP/RTP is used as transport between other
   systems, i.e. IP is originated and terminated in gateways on both
   sides of the IP transport.

    AMR over
    I.366.{2,3} or +------+                        +----------+
    3G Iu or       |      |     IP/UDP/RTP/AMR     |          |
    -------------->|  GW  |----------------------->| TERMINAL |
    GSM Abis       |      |                        |          |
    etc.           +------+                        +----------+

   Figure 1: GW to VoIP terminal scenario


    AMR over                                             AMR over
    I.366.{2,3} or +------+                     +------+ I.366.{2,3} or
    3G Iu or       |      |   IP/UDP/RTP/AMR    |      | 3G Iu or
    -------------->|  GW  |-------------------->|  GW  |--------------->
    GSM Abis       |      |                     |      | GSM Abis
    etc.           +------+                     +------+ etc.

   Figure 2: GW to GW scenario


3.1. The payload header

   The length of the payload header is 6 bits. The bits in the header
   are specified as follows:

   S (1 bit): Indicates, if set, that the payload is robust sorted;
   otherwise simple payload sorting is employed. Note that this bit can
   be set only if the receiver has signaled support for the OPTIONAL
   robust payload sorting.

   C (1 bit): Indicates the existence of optional CRC fields in the
   payload table of content. Note that this bit can be set only if the
   receiver has signaled support for the OPTIONAL CRC.

   R (1 bit): Indicates, if set, that the Codec Mode Request (CMR) is
   valid.

   CMR (3 bits): Codec Mode Request for the other communication
   direction. This field is only valid if the R bit is set (R=1). Only
   the speech modes, frame type index 0-7, may be requested; see Table
   1a in [2]. If R=0 the CMR bits SHALL be set to zero; other values
   are reserved for future use.

    0
    0 1 2 3 4 5
   +-+-+-+-+-+-+
   |S|C|R| CMR |
   +-+-+-+-+-+-+

   Figure 3: AMR payload header
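
   A minimal sketch of packing and unpacking the 6-bit payload header,
   assuming that the leftmost bit in Figure 3 (S) is the most
   significant bit of the 6-bit value. The struct and function names
   are hypothetical and only serve this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory form of the payload header fields. */
typedef struct {
    unsigned s;    /* 1: robust sorting, 0: simple sorting */
    unsigned c;    /* 1: CRC fields present                */
    unsigned r;    /* 1: CMR field is valid                */
    unsigned cmr;  /* requested mode, 0..7                 */
} amr_payload_header;

/* Pack the header into the low 6 bits of the result, S first. */
uint8_t amr_pack_header(amr_payload_header h)
{
    assert(h.s <= 1 && h.c <= 1 && h.r <= 1 && h.cmr <= 7);
    if (!h.r)
        h.cmr = 0;  /* the CMR bits SHALL be zero when R=0 */
    return (uint8_t)((h.s << 5) | (h.c << 4) | (h.r << 3) | h.cmr);
}

/* Split a 6-bit header value back into its fields. */
amr_payload_header amr_unpack_header(uint8_t bits)
{
    amr_payload_header h;
    h.s   = (bits >> 5) & 1u;
    h.c   = (bits >> 4) & 1u;
    h.r   = (bits >> 3) & 1u;
    h.cmr = bits & 0x07u;
    return h;
}
```

   For S=0, C=1, R=1 and CMR=6 the packed value is binary 011110.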


3.2. The payload table of content and CRCs

   The table of content (ToC) consists of one table of content entry for
   each speech frame in the payload. A table of content entry includes
   several specified fields as follows:

   F (1 bit): Indicates whether this frame is followed by further
   frames. F=1: further frames follow; F=0: last frame.

   Q (1 bit): The payload quality bit indicates, if not set, that the
   payload is severely damaged and the receiver should set the RX_TYPE,
   see [4], to SPEECH_BAD or SID_BAD depending on the frame type (FT).

   FT (4 bits): Frame type indicator, indicating the AMR speech coding
   mode or comfort noise (CN) mode. The mapping of existing AMR modes to
   FT is given in Table 1a in [2]. If FT=15 (No transmission) no CRC or
   payload frame is present.

    0
    0 1 2 3 4 5
   +-+-+-+-+-+-+
   |F|Q|  FT   |
   +-+-+-+-+-+-+

   Figure 4: Table of content entry field
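
   The table of content entry can be handled in the same way as the
   payload header. The sketch below assumes that F is the most
   significant bit of the 6-bit entry; all names are hypothetical. The
   last helper only reports whether the Q bit asks for SPEECH_BAD or
   SID_BAD to be signalled, it does not model the full RX_TYPE
   handling of [4].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory form of one table of content entry. */
typedef struct {
    unsigned f;   /* 1: further frames follow, 0: last frame */
    unsigned q;   /* 0: frame is severely damaged            */
    unsigned ft;  /* frame type, 0..15                       */
} amr_toc_entry;

/* Pack an entry into the low 6 bits of the result, F first. */
uint8_t amr_pack_toc(amr_toc_entry e)
{
    assert(e.f <= 1 && e.q <= 1 && e.ft <= 15);
    return (uint8_t)((e.f << 5) | (e.q << 4) | e.ft);
}

/* Split a 6-bit entry value back into its fields. */
amr_toc_entry amr_unpack_toc(uint8_t bits)
{
    amr_toc_entry e;
    e.f  = (bits >> 5) & 1u;
    e.q  = (bits >> 4) & 1u;
    e.ft = bits & 0x0Fu;
    return e;
}

/*
 * Returns 1 when Q=0 on a speech frame (FT=0..7) or SID frame (FT=8),
 * i.e. when the decoder should see SPEECH_BAD or SID_BAD.
 */
int amr_toc_marks_bad(amr_toc_entry e)
{
    return e.q == 0 && e.ft <= 8;
}
```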

   CRC (8 bits): OPTIONAL field, exists if the payload header bit C is
   set (C=1). The 8 bit CRC is used for error detection. These 8 parity
   bits are generated according to section 4.1.4 in [2].

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |      CRC      |
   +-+-+-+-+-+-+-+-+

   Figure 5: CRC field

   The ToC and CRCs are arranged with all table of content entry fields
   first, followed by all CRC fields. The ToC starts with the entry
   belonging to the oldest speech frame.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|Q|  FT   |F|Q|  FT   |F|Q|  FT   |      CRC      |      CRC  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |      CRC      |
   +-+-+-+-+-+-+-+-+-+-+

   Figure 6: The ToC and CRCs for a payload with three speech frames


3.3. AMR speech frame

   An AMR speech frame represents one encoded speech frame, encoded
   with the mode given by the ToC field FT. The length of this field is
   implicitly defined by the AMR mode in the FT field. The bits SHALL be
   sorted according to Annex B of [2].


3.4. Compound AMR payload

   The compound AMR payload consists of one AMR payload header, the
   table of content and one or more AMR payload frames, see section 3.1,
   3.2 and 3.3. These can be put together with robust or simple payload
   sorting. The payload header bit S indicates the method used.

   Definitions for describing the compound AMR payload:

   b(m)    - bit m of the compound AMR payload
   t(n,m)  - bit m in the table of content entry for speech frame n
   p(n,m)  - bit m in the CRC for speech frame n
   f(n,m)  - bit m in speech frame n
   F(n)    - number of bits in speech frame n, defined by FT
   h(m)    - bit m of payload header
   C       - number of CRC bits per frame, 0 or 8
   N       - number of payload frames in the payload
   S       - number of unused bits

   Payload frames f(n,m) are placed in consecutive order, so that frame
   n precedes frame n+1. Within one payload all frames between the
   oldest and the most recent MUST be present. If speech data is
   missing for a frame, e.g. due to DTX, the NO_TRANSMISSION frame type
   is sent for that frame.


3.4.1. Robust payload sorting

   A bit error in a more sensitive bit is subjectively more annoying
   than in a less sensitive bit. Therefore, to be able to protect only
   the most sensitive bits in a payload packet with a forward error
   detection code, e.g. a CRC outside RTP, the bits inside a frame are
   ordered into sensitivity order. The protection SHOULD cover an
   appropriate number of octets from the beginning of the payload,
   covering at least the AMR payload header, ToC and class A bits (see
   [2]). Exactly how many octets need protection depends on the
   network and application. To maintain sensitivity ordering inside the
   AMR payload, when more than one speech frame is transmitted in one
   payload, reordering of the data is needed.

   The reordering to maintain the sensitivity ordered AMR payload SHALL
   be performed on bit level. The AMR payload header, ToC and CRCs SHALL
   still be placed unchanged in the beginning of the payload.
   Thereafter, the payload frames are sorted with one bit alternating
   from each payload frame.

   The robust payload sorting algorithm is defined in C-style as:

   /* payload header */
   k=0;
   for (i = 0; i < 6; i++){
     b(k++) = h(i);
   }
   /* table of content */
   for (j = 0; j < N; j++){
     for (i = 0; i < 6; i++){
       b(k++) = t(j,i);
     }
   }
   /* CRCs */
   for (j = 0; j < N; j++){
     for (i = 0; i < C; i++){
       b(k++) = p(j,i);
     }
   }
   /* payload frames */
   max = max(F(0),..,F(N-1));
   for (i = 0; i < max; i++){
     for (j = 0; j < N; j++){
       if (i < F(j)){
         b(k++) = f(j,i);
       }
     }
   }
   /* padding */
   S = 8 - k%8;
   if (S < 8){
     for (i = 0; i < S; i++){
       b(k++) = 0;
     }
   }
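
   The pseudocode above can be turned into compilable C roughly as
   follows. This is a sketch under stated assumptions, not a reference
   implementation: the function names are hypothetical, each frame is
   passed with one bit per array element, bits are written most
   significant bit first within each octet, and the CRC fields are
   left out (C=0) to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write one bit at bit position *k of out, MSB first, and advance. */
static void put_bit(uint8_t *out, size_t *k, unsigned bit)
{
    if (bit)
        out[*k / 8] |= (uint8_t)(0x80u >> (*k % 8));
    (*k)++;
}

/* Write the low `width` bits of value, most significant bit first. */
static void put_bits(uint8_t *out, size_t *k, unsigned value, int width)
{
    for (int i = width - 1; i >= 0; i--)
        put_bit(out, k, (value >> i) & 1u);
}

/*
 * header : 6-bit payload header value
 * toc[j] : 6-bit table of content entry for frame j
 * frame  : frame[j][i] is bit i of frame j, in sensitivity order
 * nbits  : nbits[j] is F(j), the number of bits in frame j
 * Returns the payload length in octets, padding included.
 */
size_t amr_robust_sort(uint8_t *out, size_t out_size, unsigned header,
                       const unsigned *toc,
                       const uint8_t *const *frame,
                       const size_t *nbits, size_t n)
{
    size_t k = 0, max = 0, total = 6 + 6 * n;

    for (size_t j = 0; j < n; j++) {
        total += nbits[j];
        if (nbits[j] > max)
            max = nbits[j];
    }
    size_t octets = (total + 7) / 8;
    assert(octets <= out_size);
    memset(out, 0, octets);   /* unused trailing bits stay zero */

    put_bits(out, &k, header, 6);
    for (size_t j = 0; j < n; j++)
        put_bits(out, &k, toc[j], 6);

    /* one bit from each frame in turn, most sensitive bits first */
    for (size_t i = 0; i < max; i++)
        for (size_t j = 0; j < n; j++)
            if (i < nbits[j])
                put_bit(out, &k, frame[j][i]);

    return octets;
}
```

   Clearing the output buffer first makes the explicit padding loop
   unnecessary, since any bits left after the last frame bit are
   already zero.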


3.4.2. Simple payload sorting

   If multiple new frames are encapsulated into the payload and robust
   payload sorting is not used, the payload is formed by concatenating
   the payload header, the ToC, the optional CRC fields and the speech
   frames. However, the bits inside each frame are still ordered in
   sensitivity order as defined in [2].

   The simple payload sorting algorithm is defined in C-style as:

   /* payload header */
   k=0;
   for (i = 0; i < 6; i++){
     b(k++) = h(i);
   }
   /* table of content */
   for (j = 0; j < N; j++){
     for (i = 0; i < 6; i++){
       b(k++) = t(j,i);
     }
   }
   /* CRCs */
   for (j = 0; j < N; j++){
     for (i = 0; i < C; i++){
       b(k++) = p(j,i);
     }
   }
   /* payload frames */
   for (j = 0; j < N; j++){
     for (i = 0; i < F(j); i++){
       b(k++) = f(j,i);
     }
   }
   /* padding */
   S = 8 - k%8;
   if (S < 8){
     for (i = 0; i < S; i++){
       b(k++) = 0;
     }
   }


3.5. Decoding security consideration

   If the payload length calculated from the C, F and FT fields does
   not match the size of the payload actually received, the payload
   should be dropped. Decoding a packet that has errors in the length
   indicator bits could severely degrade the speech quality.
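
   One way to perform this check is sketched below. The function name
   is hypothetical, and the sketch only knows the frame sizes listed in
   Table 1; it treats any other frame type except 15 as a reason to
   drop the payload, which is an assumption of this example rather
   than a rule of this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total speech bits per frame type, copied from Table 1. */
static const int amr_total_bits[9] = {95, 103, 118, 134, 148, 159, 204,
                                      244, 39};

/* Read `width` bits starting at bit position pos, MSB first. */
static unsigned get_bits(const uint8_t *p, size_t pos, int width)
{
    unsigned v = 0;
    for (int i = 0; i < width; i++, pos++)
        v = (v << 1) | ((p[pos / 8] >> (7 - pos % 8)) & 1u);
    return v;
}

/*
 * Returns 1 if the length implied by the C, F and FT fields equals
 * len (in octets), and 0 if the payload should be dropped.
 */
int amr_payload_length_ok(const uint8_t *p, size_t len)
{
    assert(p != NULL);
    if (len == 0)
        return 0;

    unsigned crc_bits = get_bits(p, 1, 1) ? 8u : 0u;  /* header bit C */
    size_t pos = 6;     /* first ToC entry follows the 6-bit header */
    size_t total = 6;
    unsigned f;

    do {
        if (pos + 6 > len * 8)
            return 0;                 /* ToC runs past the packet */
        unsigned entry = get_bits(p, pos, 6);
        unsigned ft = entry & 0x0Fu;
        f = entry >> 5;
        pos += 6;
        total += 6;
        if (ft <= 8)
            total += crc_bits + (size_t)amr_total_bits[ft];
        else if (ft != 15)
            return 0;                 /* size unknown to this sketch */
    } while (f);

    return (total + 7) / 8 == len;
}
```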


4. RTP header usage

   The RTP header marker bit (M) is used to mark (M=1) the packets
   containing the first speech frame after CN. For all other packets
   the marker bit is set to 0 (M=0).

   The timestamp corresponds to the sampling instant of the first sample
   encoded for the first frame in the packet. A frame can be either
   encoded speech, comfort noise parameters, or NO_TRANSMISSION. The
   timestamp unit is in samples. The duration of one AMR speech frame is
   20 ms and the sampling frequency is 8 kHz, corresponding to 160
   encoded speech samples per frame. Thus, the timestamp is increased by
   160 for each consecutive frame. All frames in a packet MUST be
   successive 20 ms frames.
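
   The timestamp arithmetic described above amounts to the following.
   The function names are hypothetical; the constant 160 is the number
   of samples per 20 ms frame at 8 kHz, and RTP timestamps wrap modulo
   2^32.

```c
#include <assert.h>
#include <stdint.h>

#define AMR_SAMPLES_PER_FRAME 160u

/* RTP timestamp of frame `index` (0-based) inside a packet. */
uint32_t amr_frame_timestamp(uint32_t packet_ts, unsigned index)
{
    return (uint32_t)(packet_ts + AMR_SAMPLES_PER_FRAME * index);
}

/*
 * Timestamp of the frame that directly follows the last of the
 * n_frames frames carried in the packet with timestamp packet_ts.
 */
uint32_t amr_following_timestamp(uint32_t packet_ts, unsigned n_frames)
{
    assert(n_frames > 0);
    return (uint32_t)(packet_ts + AMR_SAMPLES_PER_FRAME * n_frames);
}
```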


5. Congestion Control

   The need of congestion control for data transported with RTP has to
   be considered. AMR speech data have some elastic properties due to
   the different bandwidth demand for each mode. Another parameter that
   can reduce the bandwidth demand for AMR is the number of frames of
   speech data encapsulated in each payload. Encapsulating more frames
   reduces the number of packets and the overhead from IP/UDP/RTP
   headers. If using forward error correction (FEC) there is also the
   need to regulate the amount, so the FEC itself does not worsen the
   problem. Therefore, it is RECOMMENDED that applications using this
   payload implement
   congestion control. The actual mechanism for congestion control is
   not specified but should be suitable for real-time flows, e.g.
   "Equation-Based Congestion Control for Unicast Applications" [14].


6. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [8]. This implies that confidentiality of the media
   streams is achieved by encryption.  Because the payload format is
   arranged end-to-end, encryption MAY be performed after encapsulation
   so there is no conflict between the two operations.

   This payload type does not exhibit any significant non-uniformity in
   the receiver side computational complexity for packet processing to
   cause a potential denial-of-service threat.

   As this format transports encoded speech, the main security issues
   are decoding security (see section 3.5), confidentiality and
   authentication of the speech itself. Some other smaller issues also
   exist. The payload format itself does not have any support for
   security. These issues have to be solved by a payload external
   mechanism.

6.1. Confidentiality

   To achieve confidentiality of the encoded speech all speech data bits
   must be encrypted. There is less need to encrypt the payload header
   or the frame header as they only carry information about the
   requested AMR mode, AMR frame type and frame quality. This
   information could be useful to some third party, e.g. quality
   monitoring. The type of encryption used can have an impact not only
   on confidentiality but also on error robustness. Robustness against
   bit errors is lost unless an encryption method without error
   propagation is used, e.g. a stream cipher. This is only an issue
   when using UEP/UED, when bit errors can be accepted in some part of
   the payload.

6.2. Authentication

   To authenticate the sender of the speech, an external mechanism has
   to be added. It is recommended that such a mechanism protects all the
   speech data bits. To prevent a man in the middle from tampering with
   the packetization of the speech data, some extra data could be
   protected: the RTP timestamp, the RTP sequence number and the RTP
   marker bit. Tampering with these could result in erroneous
   depacketization/decoding that could lower speech quality. Tampering
   with the AMR mode request field can result in the party that sent
   the request receiving speech at a different quality than desired.


7. Examples

7.1. Simple example

   In the simple example we just send one frame in each RTP packet, no
   valid Codec Mode Request CMR is sent (R=0), the payload was not
   damaged at IP origin (Q=1) and no CRC is used. The AMR mode is the
   5.9 kbps mode (FT=2). The speech encoded bits are put into f(0) to
   f(117) in descending sensitivity order according to [2]. Simple
   payload sorting is used, S=0.


      |                            Bit no.                            |
   Oct|   0       1       2       3       4       5       6       7   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    0 |  S=0  |  C=0  |  R=0  |   0   |   0   |   0   |  F=0  |  Q=1  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    1 |   0   |   0   |   1   |   0   | f(0)  | f(1)  | f(2)  |  ...  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   16 | f(116)| f(117)|   0   |   0   |   0   |   0   |   0   |   0   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 8: One frame per packet example.


7.2. Example with CRCs

   In this example two frames in the 6.7 kbps mode (FT=3) are sent in
   the payload. A mode request is sent (R=1), requesting the 10.2 kbps
   mode for the other link (CMR=6). CRC is used (C=1). Frame 1 (134
   bits) is f1(0..133) and frame 2 is f2(0..133). For each payload
   frame a CRC is calculated: p1(0..7) for frame 1 and p2(0..7) for
   frame 2. Simple payload sorting is used, S=0.


      |                            Bit no.                            |
   Oct|   0       1       2       3       4       5       6       7   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    0 |  S=0  |  C=1  |  R=1  |   1   |   1   |   0   |  F=1  |  Q=1  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    1 |   0   |   0   |   1   |   1   |  F=0  |  Q=1  |   0   |   0   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    2 |   1   |   1   | p1(0) | p1(1) | p1(2) | p1(3) | p1(4) | p1(5) |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    3 | p1(6) | p1(7) | p2(0) | p2(1) | p2(2) | p2(3) | p2(4) | p2(5) |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    4 | p2(6) | p2(7) | f1(0) | f1(1) |  ...  |  ...  |  ...  |  ...  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   20 |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |f1(132)|f1(133)|
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   21 | f2(0) | f2(1) |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   37 |  ...  |  ...  |  ...  |f2(131)|f2(132)|f2(133)|   0   |   0   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 9: Example with CRCs.


7.3. Example with multiple frames per payload and robust sorting

   In this example two 5.9 kbps mode (FT=2) frames are sent in one
   payload. No CRC is used (C=0). A mode request is sent (R=1),
   requesting the 7.95 kbps mode for the other link (CMR=5). The first
   frame is represented by the 118 bits f(0) to f(117) and the
   subsequent frame by g(0) to g(117). Robust sorting is used.

      |                            Bit no.                            |
   Oct|   0       1       2       3       4       5       6       7   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    0 |  S=1  |  C=0  |  R=1  |   1   |   0   |   1   |  F=1  |  Q=1  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    1 |   0   |   0   |   1   |   0   |  F=0  |  Q=1  |   0   |   0   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    2 |   1   |   0   | f(0)  | g(0)  | f(1)  | g(1)  |  ...  |  ...  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   31 |  ...  |  ...  | f(116)| g(116)| f(117)| g(117)|   0   |   0   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 10: Example two frames per payload and robust sorting.
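
   The payload sizes shown in the three examples can be checked with a
   small calculation. The sketch below uses hypothetical function names
   and the frame sizes from Table 1. It gives 17 octets for the example
   in section 7.1, 38 octets for section 7.2 and 32 octets for section
   7.3, matching the last octet numbers 16, 37 and 31 in the figures.

```c
#include <assert.h>
#include <stddef.h>

/* Total speech bits per frame type, copied from Table 1. */
static const int amr_total_bits[9] = {95, 103, 118, 134, 148, 159, 204,
                                      244, 39};

/* Number of payload bits before padding. */
size_t amr_payload_bits(const int *ft, size_t n, int use_crc)
{
    size_t bits = 6;                          /* payload header */

    for (size_t j = 0; j < n; j++) {
        assert(ft[j] == 15 || (ft[j] >= 0 && ft[j] <= 8));
        bits += 6;                            /* one ToC entry */
        if (ft[j] != 15)                      /* FT=15: no CRC, no frame */
            bits += (use_crc ? 8u : 0u) + (size_t)amr_total_bits[ft[j]];
    }
    return bits;
}

/* Payload length in octets, padding included. */
size_t amr_payload_octets(const int *ft, size_t n, int use_crc)
{
    return (amr_payload_bits(ft, n, use_crc) + 7) / 8;
}

/* Number of zero bits appended to fill the last octet. */
size_t amr_padding_bits(const int *ft, size_t n, int use_crc)
{
    return amr_payload_octets(ft, n, use_crc) * 8
           - amr_payload_bits(ft, n, use_crc);
}
```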


8. The AMR MIME type registration

   This section defines the MIME type for the Adaptive Multi-Rate (AMR)
   speech codec [1]. The data format and parameters are specified for
   both real-time transport and for storage type applications (e.g.
   e-mail attachment, multimedia messaging). The former is referred to
   as RTP mode and the latter as storage mode.

   AMR implementations according to [1] MUST support all eight coding
   modes. A mode change can occur at any time during operation. The
   mode information is therefore transmitted in-band together with the
   speech bits, which allows mode changes without any additional
   signaling.

   In addition to the speech codec, the AMR specifications also include
   Discontinuous Transmission / comfort noise (DTX/CN) functionality
   [11]. DTX/CN switches the transmission off during silent parts of
   the speech, so that only CN parameter updates are sent at regular
   intervals.


8.1. RTP mode

   A decoder may want to receive only a certain AMR mode or a subset of
   the AMR modes because of link limitations in some cellular systems;
   for example, the GSM radio link can only use a subset of at most
   four modes. It is therefore possible to request a specific set of
   AMR modes in the capability description, and the encoder MUST abide
   by this request. If no mode set is requested, any mode may be used
   or requested.

   The AMR codec can in principle perform a mode change at any time
   between any two modes. To support interoperability with GSM through
   a gateway, it is possible to set limitations on mode changes. The
   decoder can define the minimum number of frames between mode
   changes and can restrict mode changes to neighboring modes only.
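
   As an illustration of these restrictions, the following Python
   sketch checks a sequence of per-frame modes against a mode set, a
   mode change period and the neighboring-modes restriction. It is
   only a sketch under stated assumptions: the function name
   check_mode_sequence and its arguments are hypothetical,
   "neighboring" is taken to mean adjacent within the sorted active
   mode set, and a period N with arbitrary phase is taken to mean that
   all mode changes fall on frame indices with the same remainder
   modulo N.

```python
def check_mode_sequence(modes, mode_set=None, period=1,
                        neighbor_only=False):
    """Return True if the per-frame mode sequence respects the limits.

    modes         -- list of AMR modes (0..7), one per frame
    mode_set      -- allowed modes, or None for all eight modes
    period        -- N: changes only on multiples of N (assumed phase rule)
    neighbor_only -- only allow changes to adjacent modes in the set
    """
    allowed = sorted(mode_set) if mode_set is not None else list(range(8))
    if any(mode not in allowed for mode in modes):
        return False

    # Frame indices at which the mode differs from the previous frame.
    changes = [i for i in range(1, len(modes)) if modes[i] != modes[i - 1]]

    # Arbitrary initial phase: all changes must share one residue mod N.
    if len({i % period for i in changes}) > 1:
        return False

    if neighbor_only:
        for i in changes:
            step = allowed.index(modes[i]) - allowed.index(modes[i - 1])
            if abs(step) != 1:
                return False
    return True


print(check_mode_sequence([0, 0, 2, 2, 5, 5], {0, 2, 5, 7},
                          period=2, neighbor_only=True))
```

   In the call above the changes 0 to 2 and 2 to 5 are both steps
   between adjacent members of the set {0, 2, 5, 7} and both occur on
   even frame indices, so the sequence is accepted.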

   It is also possible to limit the number of AMR frames encapsulated
   into one RTP packet. This is an optional feature; if no such
   parameter is given in the capability description, the transmitter
   can encapsulate any number of AMR speech frames into one RTP packet.

   The payload CRC UED MUST only be used if the receiver has signaled
   support for this functionality in the capability description.

   To support unequal error protection and/or detection, the payload
   format supports robust payload sorting. Robust payload sorting is
   an OPTIONAL feature and MUST only be used if the receiver has
   signaled support for this functionality in the capability
   description.


8.2. Storage mode

   The AMR storage mode is used for storing AMR frames, e.g. as a file
   or e-mail attachment. Frames are stored consecutively in an
   octet-aligned manner. This implies that the first octet after the
   last octet of frame n must be the first octet of frame n+1. Each
   stored AMR frame consists of a Q bit and the 4-bit FT field (see
   definition in section 3.2), followed by the AMR encoded speech bits
   (see section 3.3). The last octet of each frame is padded with
   zeroes, if needed, to achieve octet alignment. An example is given
   in figure 11.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Q|  FT   |                                                     |
   +-+-+-+-+-+                                                     +
   |                                                               |
   +                AMR speech bits for frame n                    +
   |                                                               |
   +                                                     +-+-+-+-+-+
   |                                                     | Padding |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Q|  FT   |                                                     |
   +-+-+-+-+-+                                                     +
   |                                                               |
   +                AMR speech bits for frame n+1                  +
   |                                                               |
   +                                                     +-+-+-+-+-+
   |                                                     | Padding |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 11: An example of storage format with two AMR 5.9 kbit/s
   frames (118 speech bits). Note that bits marked as 'padding' must be
   set to zero.
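
   A minimal Python sketch of this frame layout is given below for
   illustration. It assumes that the first bit of a frame (Q) is the
   most significant bit of the first octet. The function name
   pack_storage_frame is hypothetical and not part of this
   specification.

```python
def pack_storage_frame(q, ft, speech_bits):
    """Pack one stored AMR frame: Q bit, 4-bit FT, speech bits, padding.

    q           -- Q bit (0 or 1)
    ft          -- frame type, 0..15
    speech_bits -- list of 0/1 values, e.g. 118 bits for 5.9 kbit/s
    """
    bits = [q] + [(ft >> shift) & 1 for shift in (3, 2, 1, 0)]
    bits += list(speech_bits)
    bits += [0] * (-len(bits) % 8)      # zero padding to an octet boundary

    out = bytearray()
    for i in range(0, len(bits), 8):
        octet = 0
        for bit in bits[i:i + 8]:       # MSB first (assumption)
            octet = (octet << 1) | bit
        out.append(octet)
    return bytes(out)


# Two consecutive 5.9 kbit/s frames (FT=2), as in Figure 11.
frame_n = pack_storage_frame(1, 2, [0] * 118)
frame_n1 = pack_storage_frame(1, 2, [0] * 118)
stored = frame_n + frame_n1
print(len(frame_n), len(stored))
```

   A 5.9 kbit/s frame occupies 1 + 4 + 118 = 123 bits, so each stored
   frame is 16 octets long and its last 5 bits are padding. Frame n+1
   starts in the octet immediately after the last octet of frame n.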

   Frames lost in transmission and frames not received between SID
   updates during non-speech periods must be stored as NO_TRANSMISSION
   frames (frame type 15, see definition in [2]) to keep
   synchronization with the original media.

   The receiving entity (AMR decoder) MUST be able to decode all eight
   coding modes as well as the AMR DTX/CN [11]. Since coding options
   cannot be signaled before stored AMR data is downloaded or
   received, the optional features (robust sorting, CRC) specified for
   RTP mode MUST NOT be used with storage mode.


8.3. MIME Registration

   The MIME name for the AMR codec is allocated from the IETF tree,
   since AMR is expected to be a widely used speech codec in VoIP
   applications. Some parts of this chapter distinguish between RTP
   and storage modes.

   Media Type name:     audio

   Media subtype name:  AMR

   Required parameters: none

   Optional parameters for RTP mode:
    mode-set:  Requested AMR mode set. Restricts the active codec mode
               set to a subset of all modes. Possible values are a
               comma-separated list of modes: 0,...,7 (see Table 1a in
               [2]; an example is given in section 8.4). If not
               present, all speech modes are available.
    mode-change-period: Defines a number N which restricts mode changes
               in such a way that they are only allowed on multiples of
               N. The initial state of the phase is arbitrary. If this
               parameter is not present, a mode change can happen at
               any time.
    mode-change-neighbor: If present, mode changes SHALL only be made to
               neighboring modes in the active codec mode set. If not
               present, change between any two modes in the active codec
               mode set is allowed.
    maxframes: Maximum number of AMR speech frames in one RTP packet.
               The receiver may set this parameter in order to limit
               the buffering requirements or delay.
    crc:       If present, transmission of CRCs in the payload is
               supported, otherwise not supported.
    robust-sorting: If present, robust payload sorting is supported,
               otherwise not supported and simple payload sorting SHALL
               be used.

   Optional parameters for storage mode:     none

   Encoding considerations for RTP mode: See section 3 in this document.

   Encoding considerations for storage mode: See section 8.2 in this
   document.

   Security considerations: see chapter 6 "Security".

   Public specification: please refer to chapter 9 "References".

   Additional information for storage mode:
     Magic number: none
     File extensions: amr, AMR
     Macintosh file type code: none
     Object identifier or OID: none

   Person & email address to contact for further information:
     johan.sjoberg@ericsson.com
     ari.lakaniemi@nokia.com
     Bernhard.Wimmer@mch.siemens.de

   Intended usage: COMMON. It is expected that many VoIP applications
   (as well as mobile applications) will use this type.

   Author/Change controller:
     johan.sjoberg@ericsson.com
     ari.lakaniemi@nokia.com

8.4. Mapping to SDP Parameters

   Please note that this chapter applies to the RTP mode only.

   The parameters are mapped to SDP [12] as usual, with the optional
   parameters of section 8.3 carried in the "a=fmtp" attribute.

   Example usage in SDP:
    m=audio 49120 RTP/AVP 97
    a=rtpmap:97 AMR/8000
    a=fmtp:97 mode-set=0,2,5,7; maxframes=1
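
   A receiver of such a description has to split the "a=fmtp" line
   into the individual parameters. The following Python sketch shows
   one way to do this; it is illustrative only. The function name
   parse_amr_fmtp is hypothetical. The example above only shows
   parameters that carry a value, so the sketch assumes that the
   presence-only parameters (crc, robust-sorting,
   mode-change-neighbor) appear as a bare name without "=".

```python
def parse_amr_fmtp(line):
    """Parse an 'a=fmtp:<pt> name=value; name; ...' line into a dict."""
    head, _, rest = line.strip().partition(" ")
    if not head.startswith("a=fmtp:"):
        raise ValueError("not an fmtp attribute line")
    result = {"payload_type": int(head[len("a=fmtp:"):])}

    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        name = name.strip()
        if not sep:
            # Assumption: presence-only parameter, e.g. "crc".
            result[name] = True
        elif name == "mode-set":
            result[name] = [int(mode) for mode in value.split(",")]
        elif name in ("maxframes", "mode-change-period"):
            result[name] = int(value)
        else:
            result[name] = value.strip()
    return result


params = parse_amr_fmtp("a=fmtp:97 mode-set=0,2,5,7; maxframes=1")
print(params)
```

   For the example line this yields payload type 97, the mode set
   [0, 2, 5, 7] and maxframes 1. Parameters that are absent from the
   line are simply absent from the result, which matches the "if not
   present" defaults given in section 8.3.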


9. References

   [1]  3G TS 26.090, "Adaptive Multi-Rate (AMR) speech transcoding".

   [2]  3G TS 26.101, "AMR Speech Codec Frame Structure".

   [3]  IETF RFC 2119, "Key words for use in RFCs to Indicate
        Requirement Levels".

   [4]  3G TS 26.093, "AMR Speech Codec; Source Controlled Rate
        operation".

   [5]  GSM 06.60, "Enhanced Full Rate (EFR) speech transcoding".

   [6]  TIA/EIA-136-Rev.A, part 410, "TDMA Cellular/PCS - Radio
        Interface, Enhanced Full Rate Voice Codec (ACELP)", formerly
        IS-641, TIA published standard, 1998.

   [7]  ARIB, RCR STD-27H, "Personal Digital Cellular Telecommunication
        System RCR Standard".

   [8]  IETF RFC 1889, "RTP: A Transport Protocol for Real-Time
        Applications".

   [9]  IETF draft-westberg-realtime-cellular-01.txt, "Realtime Traffic
        over Cellular Access Networks".

   [10] IETF draft-larzon-udplite-03.txt, "The UDP Lite Protocol".

   [11] GSM 06.92, "Comfort noise aspects for Adaptive Multi-Rate (AMR)
        speech traffic channels".

   [12] M. Handley and V. Jacobson, "SDP: Session Description
        Protocol", RFC 2327, April 1998.

   [13] 3G TS 25.415, "UTRAN Iu Interface User Plane Protocols".

   [14] S. Floyd, M. Handley, J. Padhye, J. Widmer, "Equation-Based
        Congestion Control for Unicast Applications", ACM SIGCOMM 2000,
        Stockholm, Sweden.

   [15] IETF draft-ietf-avt-ulp-00.txt, "An RTP Payload Format for
        Generic FEC with Uneven Level Protection".

   [16] IETF RFC 2733, "An RTP Payload Format for Generic Forward Error
        Correction".

   [17] 3G TS 26.102, "AMR speech codec interface to Iu and Uu".


10. Authors' addresses

   Johan Sjoberg                  Tel:   +46 8 50878230
   Ericsson Research              EMail: Johan.Sjoberg@ericsson.com
   Ericsson Radio Systems AB
   Torshamnsgatan 23
   SE-164 80 Stockholm
   SWEDEN

   Magnus Westerlund              Tel:   +46 8 4048287
   Ericsson Research              EMail: Magnus.Westerlund@ericsson.com
   Ericsson Radio Systems AB
   Torshamnsgatan 23
   SE-164 80 Stockholm
   SWEDEN

   Ari Lakaniemi                  Tel:   +358 40 5276440
   Nokia Research Center          EMail: ari.lakaniemi@nokia.com
   P.O.Box 407
   FIN-00045 Nokia Group
   Finland

   Petri Koskelainen
   Nokia Research Center          EMail: petri.koskelainen@nokia.com
   P.O.Box 100
   FIN-33721 Tampere
   Finland


   Tim Fingscheidt                Tel:   +49 89 722 57658
   Siemens AG, ICP CD             Fax:   +49 89 722 46489
   Grillparzerstrasse 10-18       EMail: Tim.Fingscheidt@mch.siemens.de
   D - 81675 Munich
   Germany

   Bernhard Wimmer                Tel:   +49 89 722 23247
   Siemens AG, ICP CD             Fax:   +49 89 722 46489
   Grillparzerstrasse 10-18       EMail: Bernhard.Wimmer@mch.siemens.de
   D - 81675 Munich
   Germany

   Qiaobing Xie                   Tel:   +1-847-632-3028
   Motorola, Inc.                 EMail: qxie1@email.mot.com
   1501 W. Shure Drive, #2309
   Arlington Heights, IL 60004
   USA

   Sanjay Gupta                   Tel:   +1-847-435-0306
   Motorola, Inc.                 EMail: QA4496@email.mot.com
   1501 W. Shure Drive, #3205
   Arlington Heights, IL 60004
   USA



   This Internet-Draft expires August 19, 2001.