Internet Engineering Task Force                  Johan Sjoberg, Ericsson
Audio Video Transport WG                          Erik Ekudden, Ericsson
INTERNET-DRAFT                                Morgan Lindqvist, Ericsson
March 10, 2000                               Magnus Westerlund, Ericsson
Expires: September 10, 2000                                       Sweden




                       RTP payload format for AMR
                   <draft-sjoberg-avt-rtp-amr-00.txt>


Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/lid-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This document is an individual submission to the IETF. Comments
   should be directed to the authors.


Abstract

   This document describes a proposed payload format for the Real-time
   Transport Protocol (RTP) [8] for speech encoded with the Adaptive
   Multi-Rate (AMR) codec [1]. The payload format can be used either in
   a minimal form, with low overhead and one speech frame per RTP
   packet, or in an extended form. The extended form can carry redundant
   copies of speech frames sent in earlier RTP packets, and it can carry
   multiple speech frames in one RTP packet. The payload format handles
   the current AMR mode set of 8 narrow-band modes and is prepared for
   future AMR extensions (e.g. wide-band modes). Mode adaptation and
   source controlled rate (SCR) operation are supported.



Sjoberg                                                         [Page 1]


INTERNET-DRAFT         RTP Payload Format for AMR         March 10, 2000




1.  Introduction

   The adaptive multi-rate (AMR) speech codec was developed by the
   European Telecommunications Standards Institute (ETSI). The AMR codec
   is standardized for GSM and has also been chosen by 3GPP as the
   mandatory codec for third generation systems. It is currently being
   standardized for TDMA as well, so the AMR codec is expected to be
   widely used in cellular systems. The AMR codec was designed to
   preserve high speech quality under a wide range of transmission
   conditions.

   The AMR codec is a multi-mode codec with 8 narrow-band modes with bit
   rates between 4.75 and 12.2 kbps. The sampling frequency is 8000 Hz
   and processing is done on 20 ms frames, i.e. 160 samples per frame.
   The AMR modes are closely related to each other and use the same
   coding framework. Three of the AMR modes correspond to codecs already
   adopted as standards in their own right: the 6.7 kbps mode to PDC-EFR
   [7], the 7.4 kbps mode to the IS-641 codec in TDMA [6], and the 12.2
   kbps mode to GSM-EFR [5].
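   The frame arithmetic above can be checked with a short C sketch: a
   20 ms frame at 8000 Hz holds 160 samples, and the number of speech
   bits per frame is the mode's bit rate multiplied by 0.020 s. The
   function and array names are illustrative assumptions, not taken
   from any AMR specification; the results match the speech bit counts
   in Table 1.

```c
#include <assert.h>

/* Illustrative constants; these names are not defined by AMR itself. */
enum { AMR_SAMPLE_RATE_HZ = 8000, AMR_FRAME_MS = 20 };

/* Narrow-band AMR bit rates in bit/s, indexed by mode 0 (4.75) .. 7 (12.2). */
static const int amr_rate_bps[8] = {
    4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200
};

/* Samples per frame: 8000 samples/s * 0.020 s = 160. */
int amr_samples_per_frame(void)
{
    return AMR_SAMPLE_RATE_HZ * AMR_FRAME_MS / 1000;
}

/* Speech bits per 20 ms frame, e.g. 12200 bit/s * 0.020 s = 244 bits. */
int amr_bits_per_frame(int mode)
{
    assert(mode >= 0 && mode < 8);
    return amr_rate_bps[mode] * AMR_FRAME_MS / 1000;
}
```

   Every narrow-band rate is a multiple of 50 bit/s, so the integer
   division is exact for all eight modes.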

   AMR implementations must support all 8 speech coding modes, and the
   codec can switch to any mode at any time. Information identifying the
   mode must therefore be transmitted together with the encoded speech
   bits.

   The decoder can signal to the encoder which mode it prefers to
   receive, e.g. because of the available transmission bandwidth or the
   desired quality.

   The AMR codec includes a voice activity detector (VAD) and generates
   comfort noise (CN) parameters during silence periods. Hence, the AMR
   codec can reduce the number of transmitted bits and packets during
   silence periods to a minimum. Sending CN parameters at regular
   intervals during silence periods is usually called discontinuous
   transmission (DTX) or source controlled rate (SCR) operation. The
   three codec standards that are part of AMR [5][6][7] also specify
   SCR/CN functionality. To enable interoperability with terminals
   supporting these standards, AMR can optionally be extended to support
   their CN schemes as well.

   AMR wide-band modes with 16000 Hz sampling frequency are under
   standardization.

   Due to its flexibility and robustness, AMR is also suitable for
   applications other than circuit-switched cellular systems, e.g.
   real-time services over packet-switched networks using RTP. To
   perform well over networks with high packet loss rates, the RTP
   payload format for AMR allows extra redundancy to be transmitted.
   The encoded speech bits differ in their perceptual sensitivity to bit
   errors. Cellular systems exploit this by using
   unequal error protection and detection (UEP and UED). These
   mechanisms concentrate error correction and detection on the
   perceptually most sensitive bits, and a frame is regarded as lost or
   damaged only if errors are detected in the most sensitive bits. UED
   can also be employed with RTP if UDP Lite [10] (work in progress) is
   used as the transport-layer protocol. The payload must then be
   arranged in sensitivity order. The AMR encoded bits are defined in
   sensitivity order in [2]. The differing sensitivity can also be
   exploited by omitting the least sensitive bits when redundant frames
   are sent. The special problems of real-time IP traffic over cellular
   access networks are further discussed in [9].

   Other AMR scenarios are possible, e.g. a circuit-switched GSM
   terminal at one end, a gateway to IP, and an IP terminal at the other
   end. To improve quality, frames damaged on the GSM radio link should
   also be forwarded to the decoder in the IP network. This requires
   frame quality information to be transmitted over the IP network. The
   payload quality bit (Q, Section 3.1) is also needed for the AMR RTP
   payload format to interwork with, for example, the ATM AAL2 AMR
   profile.


2.  Requirements

   The AMR payload format for RTP was designed to meet the following
   requirements:

    o Different levels of robustness must be supported, from no
      redundant data to extreme robustness capable of handling very
      high packet loss rates with little or no degradation of speech
      quality.

    o Fast, frame-wise AMR mode adaptation must be supported. This
      means that it must be possible to send Codec Mode Requests back
      from the receiving side to the transmitting side with information
      on the preferred mode. Slower AMR mode adaptation may also be
      accomplished with external signaling.

    o Source controlled rate operation (SCR) and comfort noise
      parameter (CN) transmission defined in AMR must be supported.


3.  Payload Format Specification

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119 [3].

   The AMR payload format is designed to be flexible, ranging from a
   minimal format with very low overhead to an extended format that
   leaves room for future AMR extensions (e.g. wide-band modes) and
   allows extra redundancy and several speech frames to be sent in one
   packet.

   The payload consists of a payload header and one or more payload
   frames. Neither the payload header nor the individual payload frames
   are octet aligned, but the complete payload is. Finally, the complete
   payload SHALL be sorted in descending order of bit error sensitivity,
   to support unequal error protection or unequal error detection
   schemes such as UDP Lite. The AMR encoded bits are defined in
   sensitivity order in Annex B of [2]; the original order, as delivered
   by the speech encoder, is defined in [1].

   If the payload does not end on an octet boundary, the last octet is
   padded with zero bits at the end.

                             speech   octets in
   Index     Mode             bits    minimal form
   --------------------------------------------
     0       AMR 4.75           95       13
     1       AMR 5.15          103       14
     2       AMR 5.9           118       16
     3       AMR 6.7           134       18
     4       AMR 7.4           148       20
     5       AMR 7.95          159       21
     6       AMR 10.2          204       27
     7       AMR 12.2          244       32
     8       AMR CN             39        6
     9       GSM EFR CN         43        7
    10       IS-641 CN          38        6
    11       PDC-EFR CN         37        6
    12 - 14  For future use      -        -
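    15       No transmission     0        2
    16 - 31  For future use      -        -

   Table 1: AMR frame types. The minimal form carries one frame per
   payload and no Codec Mode Request, i.e. 3 header bits, the F and FT
   fields (6 bits) and the encoded bits, padded to a whole octet.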
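   The "octets in minimal form" column can be derived as in the
   following sketch (function names are hypothetical): the 3 header
   bits, the F bit and the 5 FT bits are added to the encoded bits of
   the frame type, and the sum is rounded up to whole octets. For frame
   type 15 this yields 2 octets, since the 9 header, F and FT bits alone
   exceed one octet.

```c
#include <assert.h>

/* Encoded bits per frame type (Table 1); -1 marks types reserved for
 * future use. */
int amr_ft_bits(int ft)
{
    static const int bits[16] = {
        95, 103, 118, 134, 148, 159, 204, 244, /* AMR speech modes 0-7 */
        39, 43, 38, 37,                        /* CN types 8-11        */
        -1, -1, -1,                            /* 12-14: future use    */
        0                                      /* 15: no transmission  */
    };
    assert(ft >= 0 && ft < 32);
    return ft < 16 ? bits[ft] : -1;
}

/* Octets for one frame in minimal form: 3 header bits (Q, L, R with R=0),
 * 1 F bit, 5 FT bits and the encoded bits, rounded up to whole octets. */
int amr_minimal_octets(int ft)
{
    int bits = amr_ft_bits(ft);
    assert(bits >= 0);
    return (3 + 1 + 5 + bits + 7) / 8;
}
```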

   The bit order of frame types 0-11 is given in [2]. Frame type 15 (no
   transmission) indicates frames that are not transmitted or have been
   lost, e.g. when multiple frames are sent in each payload and comfort
   noise starts. For example, a payload carrying 8 frames, in which AMR
   mode 7 is used for speech and CN starts in the fifth frame, could
   have the frame type sequence {7,7,7,7,8,15,15,8}. AMR SCR operation
   is described in [4]. The no transmission frame type is also needed
   when an urgent Codec Mode Request must be sent during a silence
   period with comfort noise.

   The AMR payload format supports robust transmission, multiple frames
   in one payload packet, and the use of fast codec mode adaptation.

   Robustness is achieved by retransmitting previously transmitted
   frames together with the current frame or frames. The redundant
   frames may be transmitted in their entirety or only partially. If
   only part of a redundant frame is transmitted, its least sensitive
   bits are omitted. A partially transmitted redundant
   frame SHALL completely fill the octets indicated by its LEN field,
   i.e. it carries exactly 8*LEN encoded bits. The bits in the payload
   are sorted in descending sensitivity order to support UED schemes
   such as UDP Lite.

   When bits of a redundant frame are not transmitted, the receiver MUST
   reconstruct the missing bits. It is RECOMMENDED to generate them
   randomly or with another quality-preserving method. For speech
   quality reasons, a fixed bit pattern SHOULD NOT be used.
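   A minimal sketch of the recommended random reconstruction, assuming
   the frame is held as one bit per array element in sensitivity order,
   with the received 8*LEN bits first. The xorshift32 generator and the
   function name are hypothetical choices for illustration; any other
   generator or quality-preserving method could be substituted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical xorshift32 pseudo-random generator; state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* frame:    one bit per element, in descending sensitivity order.
 * received: number of bits actually received (8 * LEN).
 * total:    number of bits for the mode given by FT (Table 1).
 * Fills frame[received .. total-1] with pseudo-random bits and leaves
 * the received bits untouched. */
void amr_fill_missing_bits(unsigned char *frame, int received, int total,
                           uint32_t *seed)
{
    assert(received >= 0 && received <= total);
    assert(*seed != 0);
    for (int m = received; m < total; m++)
        frame[m] = (unsigned char)(xorshift32(seed) >> 31); /* top bit */
}
```

   For the partial redundant frame of the example in Section 5.2, the
   receiver would call this with received = 96 and total = 134.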


3.1.  The payload header

   The payload header has a variable length of 3 or 8 bits. Its fields
   are specified as follows:

   Q (1 bit): Payload quality bit. If not set (Q=0), the payload is
   severely damaged and the receiver should set RX_TYPE (see [4]) to
   SPEECH_BAD or SID_BAD, depending on the frame type (FT).

   L (1 bit): Indicates whether the payload frames contain LEN fields
   (L=1) or not (L=0).

   R (1 bit): Indicates whether the Codec Mode Request (CMR) field is
   present (R=1) or not (R=0).

   CMR (5 bits): OPTIONAL field, present only if R=1. Requested codec
   mode for the other direction of communication, interpreted in the
   same way as the FT field (see Table 1).

    0
    0 1 2
   +-+-+-+
   |Q|L|R|
   +-+-+-+

   Figure 1: AMR payload header, R=0

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |Q|L|R|   CMR   |
   +-+-+-+-+-+-+-+-+

   Figure 2: AMR payload header, R=1
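   The header layout can be illustrated with a short C sketch that
   writes Q, L, R and, when R=1, the 5-bit CMR into an array holding one
   bit per element. The function name and the one-bit-per-element
   representation are assumptions for illustration. The most
   significant bit first order of CMR follows Figure 7, where CMR=6
   appears as 0,0,1,1,0.

```c
#include <assert.h>

/* Writes the payload header into hdr[] (one bit per element, in
 * transmission order) and returns H, the number of header bits (3 or 8).
 * cmr is only written when r is non-zero. */
int amr_pack_header(unsigned char hdr[8], int q, int l, int r, int cmr)
{
    assert(cmr >= 0 && cmr < 32);
    hdr[0] = (unsigned char)(q != 0);      /* Q */
    hdr[1] = (unsigned char)(l != 0);      /* L */
    hdr[2] = (unsigned char)(r != 0);      /* R */
    if (!r)
        return 3;                          /* Figure 1: no CMR */
    for (int i = 0; i < 5; i++)            /* CMR, most significant bit first */
        hdr[3 + i] = (unsigned char)((cmr >> (4 - i)) & 1);
    return 8;                              /* Figure 2 */
}
```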


3.2.  AMR payload frame

   An AMR payload frame represents one encoded speech frame. Each
   payload frame contains the following fields:

   F (1 bit): Indicates whether this frame is followed by further
   frames: F=1 means further frames follow, F=0 means this is the last
   frame.

   FT (5 bits): Frame type indicator, giving the AMR speech coding mode
   or comfort noise (CN) mode. The mapping of existing AMR modes is
   given in Table 1. If FT=15 (no transmission), neither a LEN field nor
   AMR encoded bits follow.

   LEN (7 bits): OPTIONAL field, present if the payload header bit L is
   set (L=1), except in the last payload frame (F=0) and in frames with
   FT=15. LEN specifies the number of octets in the AMR encoded bits
   field of this frame. If LEN indicates more bits than the mode given
   by FT, the number of bits implied by FT is the valid number of AMR
   encoded bits. If LEN indicates fewer bits than the mode given by FT,
   LEN gives the number of encoded bits: the frame is then transmitted
   only partially, with the least sensitive bits at the end of the
   frame omitted. This use is intended for partial redundant data.

   AMR encoded bits: The encoded speech data. The length of this field
   is defined either implicitly by the AMR mode in the FT field or by
   the LEN field. The last payload frame (F=0) SHALL always contain a
   full AMR frame, so no LEN field is needed for it.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   FT    |     LEN     |                                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+                                     +
   |                                                               |
   +                                                               +
   /                    AMR encoded bits                           /
   +                                                 +-+-+-+-+-+-+-+
   |                                                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3: Payload frame format, F=1 and L=1

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   FT    |                                                   |
   +-+-+-+-+-+-+                                                   +
   |                                                               |
   +                                                               +
   /                    AMR encoded bits                           /
   +                                             +-+-+-+-+-+-+-+-+-+
   |                                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4: Payload frame format, F=0 or L=0
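   The rules for the length of a payload frame can be summarized in C.
   This is a sketch with hypothetical function names. Following Figures
   3 and 4, it assumes a LEN field is present only when L=1, F=1 and
   FT is not 15. With the values of the example in Section 5.2, it
   gives 109 bits for the partial redundant frame (1+5+7+96) and 140
   bits for the current frame (1+5+134). These are the F(n) values used
   by the sorting algorithm in Section 3.3.

```c
#include <assert.h>

/* Encoded bits implied by FT, from Table 1 (FT 0-11 and 15 only). */
static int ft_bits(int ft)
{
    static const int bits[12] = {95, 103, 118, 134, 148, 159, 204, 244,
                                 39, 43, 38, 37};
    assert((ft >= 0 && ft < 12) || ft == 15);
    return ft == 15 ? 0 : bits[ft];
}

/* Number of AMR encoded bits carried in a payload frame. If 8*LEN
 * exceeds the bits implied by FT, FT wins; otherwise the frame is a
 * partial frame of 8*LEN bits. */
int amr_encoded_bits(int ft, int f, int l, int len)
{
    int full = ft_bits(ft);
    if (!(l && f) || ft == 15)
        return full;                    /* no LEN field in this frame */
    return 8 * len < full ? 8 * len : full;
}

/* F(n): total bits in the payload frame, i.e. F + FT + optional LEN +
 * encoded bits. */
int amr_payload_frame_bits(int ft, int f, int l, int len)
{
    int has_len = l && f && ft != 15;
    return 1 + 5 + (has_len ? 7 : 0) + amr_encoded_bits(ft, f, l, len);
}
```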


3.3.  Payload block sorting

   The bits in each frame are ordered in sensitivity order, i.e. a bit
   error in a more sensitive bit is subjectively more annoying than one
   in a less sensitive bit. To make it possible to protect the most
   sensitive bits of a payload packet with an error detection code
   outside RTP, e.g. a CRC, the full RTP payload MUST be sorted in
   sensitivity order. The protection MAY then cover an appropriate
   number of octets from the beginning of the payload; how many depends
   on the channel and application. This can, for example, be
   accomplished with UDP Lite [10] (work in progress). When more than
   one speech frame is transmitted in one packet, the data must be
   reordered to keep the whole payload in sensitivity order.

   The reordering SHALL be performed at bit level. The AMR payload
   header SHALL be placed unchanged at the beginning of the payload. It
   is followed by the bits of the payload frames, including their F, FT
   and LEN fields, taken one bit at a time from each payload frame in
   turn: bit 0 of every frame, then bit 1 of every frame, and so on,
   skipping frames that have no bits left.

   +-----------------+
   | h(0) ... h(H-1) |
   +----------------------------+
   | f(0,0) ... f(0,F(0)-1)     |
   +--------------------------------+
   | f(1,0) ... f(1,F(1)-1)         |
   +--------------------------------+
   | f(2,0) ... f(2,F(2)-1)   |
   +--------------------------+
   \                              \
   +-----------------------------------+
   | f(N-1,0) ... f(N-1,F(N-1)-1)      |
   +-----------------------------------+

   Figure 5: The payload header and N payload frames before sorting.

   The following notation is used:

   b(m)     - bit m of the final RTP payload
   f(n,m)   - bit m in payload frame n
   F(n)     - number of bits in payload frame n, including the F, FT
              and (if present) LEN fields; the number of encoded bits
              is given by FT or by LEN
   h(m)     - bit m of the payload header
   H        - number of payload header bits, 3 or 8
   N        - number of payload frames in the payload
   S        - number of zero padding bits at the end of the payload

   Payload frames are numbered consecutively starting from n=0, where
   frame n precedes frame n+1 (see Figure 5).

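   The sorting algorithm is defined below as a C function. Each bit is
   stored in its own array element, so b[m], h[m], f[n][m] and F[n]
   correspond to b(m), h(m), f(n,m) and F(n) above. The function name
   is illustrative; it returns the number of bits written, k, including
   the S padding bits.

   int sort_payload(unsigned char *b, const unsigned char *h, int H,
                    unsigned char *const *f, const int *F, int N)
   {
     int i, j, k, max, S;

     for (i = 0; i < H; i++){         /* header first, unchanged      */
       b[i] = h[i];
     }
     max = 0;                         /* max = max(F(0),..,F(N-1))    */
     for (j = 0; j < N; j++){
       if (F[j] > max) max = F[j];
     }
     k = H;
     for (i = 0; i < max; i++){       /* bit i of each frame in turn, */
       for (j = 0; j < N; j++){       /* skipping frames that have    */
         if (i < F[j]){               /* fewer than i+1 bits          */
           b[k++] = f[j][i];
         }
       }
     }
     S = 8 - k%8;                     /* zero-pad to a whole octet    */
     if (S < 8){
       for (i = 0; i < S; i++){
         b[k++] = 0;
       }
     }
     return k;
   }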
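   A receiver has to undo this interleaving. The sketch below is the
   inverse of the sorting algorithm, assuming the receiver already knows
   H, N and every F(n). Because every payload frame has at least its F
   and FT fields (6 bits), the F bits of all frames directly follow the
   header. A receiver can therefore find N by reading F bits until F=0
   and then obtain each F(n) from the interleaved FT (and LEN) fields.
   That parsing step is not shown. The function name and the
   one-bit-per-element representation are assumptions for
   illustration.

```c
#include <assert.h>

/* Inverse of the payload sorting. b[] holds the received payload with one
 * bit per element; H is the header length and F[n] the number of bits in
 * payload frame n (F, FT, LEN and encoded bits), both assumed known.
 * Writes the header to h[] and frame n to f[n][0 .. F[n]-1]. */
void amr_unsort_payload(const unsigned char *b, int H, const int *F, int N,
                        unsigned char *h, unsigned char **f)
{
    int i, j, k, max = 0;

    assert(H == 3 || H == 8);
    for (i = 0; i < H; i++)          /* the header is not reordered */
        h[i] = b[i];
    for (j = 0; j < N; j++)          /* length of the longest frame */
        if (F[j] > max)
            max = F[j];
    k = H;
    for (i = 0; i < max; i++)        /* same visiting order as the sender */
        for (j = 0; j < N; j++)
            if (i < F[j])
                f[j][i] = b[k++];
}
```

   Padding bits after the last frame bit are simply never read.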


4.  RTP header usage

   The RTP header marker bit (M) is set (M=1) in packets containing the
   first speech frame after a comfort noise period. In all other
   packets, the marker bit is cleared (M=0).

   The RTP timestamp corresponds to the sampling instant of the first
   sample of the first speech frame in the packet. The timestamp unit is
   one sample: an AMR speech frame covers 20 ms at a sampling frequency
   of 8 kHz, i.e. 160 samples, so the timestamp increases by 160 for
   each consecutive frame. All frames in a packet MUST be successive
   20 ms frames.
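   As an illustration of the timestamp rule, the sketch below computes
   the timestamp of frame j within a packet. It also computes the
   timestamp of a following packet, assuming that packet's first frame
   directly follows the last frame of the current one. After a DTX
   period in which no packets are sent, the next timestamp would be
   larger. RTP timestamps are 32-bit values that wrap around, which
   unsigned arithmetic in C provides. The function names are
   hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { AMR_SAMPLES_PER_FRAME = 160 };   /* 20 ms at 8000 Hz */

/* Sampling instant, in RTP timestamp units, of frame j (j = 0 is the
 * first frame) in a packet whose RTP timestamp is ts. Wraps modulo 2^32. */
uint32_t amr_frame_timestamp(uint32_t ts, unsigned j)
{
    return ts + (uint32_t)j * AMR_SAMPLES_PER_FRAME;
}

/* Timestamp of the next packet when the current packet carries n
 * successive 20 ms frames and no frame slots are skipped in between. */
uint32_t amr_next_packet_timestamp(uint32_t ts, unsigned n)
{
    assert(n > 0);
    return amr_frame_timestamp(ts, n);
}
```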


5.  Examples

5.1.  Simple example

   In this simple example, one full frame is sent in each RTP packet
   (L=0), no Codec Mode Request is sent (R=0), and the payload was not
   damaged at its IP origin (Q=1). The frame is encoded with the
   5.9 kbps mode (FT=2). The encoded speech bits are placed in f(0) to
   f(117) in descending sensitivity order according to [2].

       |                            Bit no.                            |
   Oct.|   0       1       2       3       4       5       6       7   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     0 |  Q=1  |  L=0  |  R=0  |  F=0  |   0   |   0   |   0   |   1   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     1 |   0   | f(0)  | f(1)  | f(2)  |  ...  |  ...  |  ...  |  ...  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
    15 |  ...  |  ...  |  ...  |  ...  | f(115)| f(116)| f(117)|   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 6: One frame per packet example.
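   The layout of Figure 6 can be reproduced with a small C sketch that
   packs a minimal-form payload into octets, with bit 0 of each octet as
   the most significant bit. The function names are hypothetical. The
   input is assumed to be one bit per array element, already in
   descending sensitivity order. For the 5.9 kbps mode, the 9 header, F
   and FT bits plus 118 speech bits give 127 bits, i.e. 16 octets with
   one padding bit.

```c
#include <assert.h>
#include <string.h>

/* Sets bit *k of out[] (bit 0 = most significant bit of octet 0) if v is
 * non-zero, then advances *k. out[] must already be zeroed. */
static void put_bit(unsigned char *out, int *k, int v)
{
    if (v)
        out[*k / 8] |= (unsigned char)(0x80u >> (*k % 8));
    (*k)++;
}

/* Builds a minimal-form payload: header Q, L=0, R=0 and one payload frame
 * with F=0, FT and nbits encoded bits from speech[]. Returns the length in
 * octets; out[] must hold at least that many octets. */
int amr_pack_minimal(unsigned char *out, int q, int ft,
                     const unsigned char *speech, int nbits)
{
    int k = 0;
    int nbytes = (3 + 1 + 5 + nbits + 7) / 8;

    assert(ft >= 0 && ft < 32 && nbits >= 0);
    memset(out, 0, (size_t)nbytes);      /* padding bits remain zero */
    put_bit(out, &k, q);                 /* Q */
    put_bit(out, &k, 0);                 /* L = 0: no LEN fields */
    put_bit(out, &k, 0);                 /* R = 0: no CMR */
    put_bit(out, &k, 0);                 /* F = 0: last (only) frame */
    for (int i = 4; i >= 0; i--)         /* FT, most significant bit first */
        put_bit(out, &k, (ft >> i) & 1);
    for (int m = 0; m < nbits; m++)      /* f(0) .. f(nbits-1) */
        put_bit(out, &k, speech[m]);
    return nbytes;
}
```

   With all speech bits set to 1, octet 0 is 0x81 (Q=1 and the first
   four FT bits 0001), octet 1 is 0x7F (the last FT bit 0 followed by
   f(0) to f(6)), and octet 15 is 0xFE (f(111) to f(117) and one
   padding bit).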


5.2.  Example with partial redundancy

   In this example, the current frame is encoded with the 6.7 kbps mode
   (FT=3) and is sent together with one redundant frame, also with FT=3.
   Only part of the redundant frame is sent, here 12 octets (L=1,
   LEN=12). A Codec Mode Request is sent (R=1), requesting the 10.2 kbps
   mode for the other direction (CMR=6). The redundant frame is payload
   frame 0 (F=1), and its 12 octets are r(0) to r(95); the current frame
   is payload frame 1 (F=0), and its 134 bits are f(0) to f(133).

       |                            Bit no.                            |
   Oct.|   0       1       2       3       4       5       6       7   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     0 |  Q=1  |  L=1  |  R=1  |   0   |   0   |   1   |   1   |   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     1 |  F=1  |  F=0  |   0   |   0   |   0   |   0   |   0   |   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     2 |   1   |   1   |   1   |   1   |   0   | f(0)  |   0   | f(1)  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     3 |   0   | f(2)  |   1   | f(3)  |   1   | f(4)  |   0   | f(5)  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     4 |   0   | f(6)  | r(0)  | f(7)  | r(1)  | f(8)  | r(2)  |  ...  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
    27 |  ...  |  ...  |  ...  |  ...  | r(93) | f(100)| r(94) | f(101)|
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
    28 | r(95) | f(102)| f(103)| f(104)|  ...  |  ...  |  ...  |  ...  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
    31 |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  | f(131)| f(132)|
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
    32 | f(133)|   0   |   0   |   0   |   0   |   0   |   0   |   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 7: Example with partial redundancy.
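   Positions in the sorted payload can also be computed directly. Bit i
   of payload frame j is preceded by the header, by all bits with a
   smaller index in every frame, and by bit i of the frames before j.
   The C sketch below (function name hypothetical) applies this rule.
   In this example H=8, F(0)=109 (1+5+7+96) and F(1)=140 (1+5+134). The
   octet is the position divided by 8, and the bit is the remainder.

```c
#include <assert.h>

/* Position (0-based bit index in the sorted payload) of bit i of payload
 * frame j, following the sorting loop of Section 3.3. */
int amr_sorted_position(const int *F, int N, int H, int j, int i)
{
    int pos = H;

    assert(j >= 0 && j < N && i >= 0 && i < F[j]);
    for (int n = 0; n < N; n++) {
        pos += F[n] < i ? F[n] : i;     /* bits 0 .. i-1 of frame n */
        if (n < j && F[n] > i)
            pos++;                      /* bit i of an earlier frame */
    }
    return pos;
}
```

   For example, r(95) is bit 108 of frame 0 and lands at position 224
   (octet 28, bit 0). f(133) is bit 139 of frame 1 and lands at position
   256 (octet 32, bit 0). The 257 payload bits therefore occupy 33
   octets, with 7 padding bits.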


6.  References

   [1] GSM 06.90, "Adaptive Multi-Rate (AMR) speech transcoding".

   [2] 3G TS 26.101, "AMR Speech Codec Frame Structure".

   [3] RFC 2119, "Key words for use in RFCs to Indicate Requirement
   Levels".

   [4] 3G TS 26.093, "AMR Speech Codec; Source Controlled Rate
   operation".

   [5] GSM 06.60, "Enhanced Full Rate (EFR) speech transcoding".

   [6] TIA/EIA IS-641-A, "TDMA Cellular/PCS - Radio Interface - Enhanced
   Full-Rate Voice Codec".

   [7] ARIB, RCR STD-27H, Section 5.4, "ACELP Speech CODEC".

   [8] IETF RFC 1889, "RTP: A Transport Protocol for Real-Time
   Applications".

   [9] IETF draft-westberg-realtime-cellular-01.txt, "Realtime Traffic
   over Cellular Access Networks".

   [10] IETF draft-larzon-udplite-02.txt, "The UDP Lite Protocol".


7.  Authors' addresses

   Johan Sjoberg
   Ericsson Research
   E-mail: Johan.Sjoberg@ericsson.com

   Erik Ekudden
   Ericsson Research
   E-mail: Erik.Ekudden@ericsson.com

   Morgan Lindqvist
   Ericsson Research
   E-mail: Morgan.Lindqvist@ericsson.com

   Magnus Westerlund
   Ericsson Research
   E-mail: Magnus.Westerlund@era.ericsson.se


This Internet-Draft expires September 10, 2000.