Internet Engineering Task Force Johan Sjoberg, Ericsson
Audio Video Transport WG Magnus Westerlund, Ericsson
INTERNET-DRAFT Ari Lakaniemi, Nokia
February 19, 2001 Petri Koskelainen, Nokia
Expires: August 19, 2001 Bernhard Wimmer, Siemens
Tim Fingscheidt, Siemens
Qiaobing Xie, Motorola
Sanjay Gupta, Motorola
RTP payload format for AMR
<draft-ietf-avt-rtp-amr-04.txt>
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/lid-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This document is an individual submission to the IETF. Comments
should be directed to the authors.
Abstract
This document describes a proposed real-time transport protocol (RTP)
payload format for AMR speech encoded signals. The AMR payload format
is designed to be able to interoperate with existing AMR transport
formats. This document also includes a MIME type registration for
AMR. The MIME type is specified for both real-time transport and
storage.
Sjoberg et al. [Page 1]
INTERNET-DRAFT RTP Payload Format for AMR February 19, 2001
1. Introduction
The adaptive multi-rate (AMR) speech codec [1] was developed by the
European Telecommunications Standards Institute (ETSI). The AMR codec
is standardized for GSM and has also been chosen by 3GPP as the
mandatory codec for third generation systems. It is currently under
standardization for TDMA. Consequently, the AMR codec is expected to
be widely used in cellular systems. The AMR codec was developed to
preserve high speech quality under a wide range of transmission
conditions.
The AMR codec is a multi-mode codec with 8 narrow band speech modes
with bit rates between 4.75 and 12.2 kbps. The sampling frequency is
8000 Hz and processing is done on 20 ms frames, i.e. 160 samples per
frame. The AMR modes are closely related to each other and use the
same coding framework. Three of the AMR modes are already adopted
standards of their own, the 6.7 kbps mode as PDC-EFR [7], the 7.4
kbps mode as IS-641 codec in TDMA [6], and the 12.2 kbps mode as GSM-
EFR [5].
The AMR codec is designed with a voice activity detector (VAD) and
generation of comfort noise (CN) parameters during silence periods.
Hence, the AMR codec can reduce the number of transmitted bits and
packets during silence periods to a minimum. The operation of sending
CN parameters at regular intervals during silence periods is usually
called discontinuous transmission (DTX) or source controlled rate
(SCR) operation.
AMR implementations must support all 8 speech coding modes, and mode
switching can occur to any mode at any time. The mode information
must therefore be transmitted together with the speech encoded bits.
The AMR speech codec is designed with modes
producing different bit rates to be able to adapt the source bit rate
according to the radio link quality in mobile phone systems. The
objective was to give highest possible speech quality under a variety
of radio channel conditions. To realize rate adaptation the decoder
needs to signal the mode it prefers to receive to the encoder.
Due to the flexibility and robustness of AMR, it is also suitable for
purposes other than circuit switched cellular systems, for example
real-time services over packet switched networks.
The payload format should be designed for robustness against both bit
errors and packet loss. The speech encoded bits have different
perceptual sensitivity to bit errors and cellular systems exploit
this by using unequal error protection and detection (UEP and UED).
The UED/UEP mechanisms focus the correction and detection of
corrupted bits on the perceptually most sensitive bits. A speech
frame is only declared damaged if there are bit errors in the most
sensitive bits, i.e. the class A bits. Some bit errors are acceptable
in the other bits, i.e. classes B and C. Moreover, a damaged frame is
still useful for error concealment in the decoder, which uses some of
the less
sensitive bits. This improves the speech quality compared to
discarding the data.
Today there exist some link layers that do not discard packets with
bit errors, e.g. SLIP and some wireless links (with the Internet
traffic pattern shifting towards a more media-centric one, more link
layers of such nature may emerge in the future). With transport layer
support for partial checksums, for example those supported by UDP-
Lite [10] (work in progress), bit error tolerant AMR traffic could
achieve better performance over these types of links.
There are at least two basic approaches for carrying AMR traffic over
bit error tolerant networks:
1) Utilizing a partial checksum to cover headers and the most
important AMR speech bits of the payload. It is recommended that
at least all class A bits are covered by the checksum.
2) Utilizing a partial checksum to only cover headers, but a frame
CRC to cover the class A bits of each AMR frame in the payload.
In either approach, at least part of the class B/C bits are left
without error-check and thus bit error tolerance is achieved.
It is still important that the network designer pays attention to the
class B and C residual bit error rate. Though less sensitive to error
than class A bits, class B bits are not insignificant and undetected
errors in these bits cause degradation in speech quality. An example
of residual error rates considered acceptable for AMR in UMTS can be
found in [17].
Approach 1 is bit-efficient, flexible and simple, but has two
disadvantages, namely, a) bit errors in protected speech
bits will cause the payload to be discarded, and b) when transporting
multiple frames in a payload there is the possibility that a single
bit error in protected bits gets all the frames discarded.
These disadvantages can be avoided, if needed, at the cost of some
overhead in the form of a frame-wise CRC (Approach 2). For problem
a), the CRC makes it possible to detect bit errors in class A bits
and still use the frame for error concealment, which gives a small
improvement in speech quality. For problem b), when transporting
multiple frames in a payload, the CRCs remove the possibility that a
single bit error in a class A bit causes all the frames to be
discarded. This improves speech quality when multiple frames are
transported and subject to bit errors.
The choice between the two approaches must be made based on the
available bandwidth, and desired tolerance to bit errors. Neither
solution is appropriate to all cases.
To achieve better robustness against packet loss the payload supports
FEC. The simple scheme of repetition of previously sent data is one
possibility. Another possible scheme which is more bandwidth
efficient is to use payload external FEC, e.g. RFC2733 [16], which
generates extra packets containing repair data. The whole payload can
also be sorted in sensitivity order to support external FEC schemes
using UEP. There is work in progress on a generic version of such a
scheme [15].
2. Requirements
The AMR payload format for RTP was designed to meet the following
requirements:
o Different levels of robustness must be supported, from no
redundant data to extreme robustness capable of handling very
high packet loss rates with no or small speech quality
degradation.
o Fast, bandwidth efficient, frame-wise AMR mode adaptation must
be supported. This means that it must be possible to send Codec
Mode Requests back from the receiving side to the transmitting
side with information on the preferred mode.
o Source controlled rate operation (SCR) (also called DTX) and
comfort noise parameter (CN) transmission defined in AMR must be
supported.
3. Payload format
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC2119 [3].
The AMR payload format is designed to be flexible, ranging from very
low overhead to an extended format with the possibility to increase
bit error robustness and pack several speech frames in one packet.
The payload format consists of one payload header, a table of
contents, optionally one CRC per payload frame, and zero or more
payload frames. The payload format is bandwidth efficient: octet
alignment is not used for the payload header, the table of contents
or the payload frames; only the payload as a whole is octet aligned.
If the option to transmit a robust sorted payload is enabled
and employed, the full payload SHALL finally be ordered in descending
bit error sensitivity order to be prepared for unequal error
protection or unequal error detection schemes. The AMR encoded bit
streams are defined in sensitivity order in Annex B of [2], the
original order as delivered from the speech encoder is defined in
[1].
The last octet of an AMR payload packet MUST be padded with zeroes at
the end if not all bits are used.
The AMR frame types, or modes, are defined in [2]. Frame type 15, no
transmission, is needed to indicate frames that are not transmitted
or are lost. Not transmitted can mean either that the speech encoder
produced no data for this frame, or that no data for this frame is
transmitted in this payload, i.e. valid data for this frame may be
sent in another payload. This happens, for example, when multiple
frames are sent in each payload and comfort noise starts. In a
payload with 8 frames, where speech frames in AMR mode 7 are
interrupted by CN in the fifth frame, the frame type sequence could
look like: {7,7,7,7,8,15,15,8}. The AMR SCR/DTX is described in [4].
The AMR payload format supports robust transmission, multiple frames
in one payload packet, and the use of fast codec mode adaptation.
Robustness against packet loss can be accomplished by using the
possibility to retransmit previously transmitted frames together with
the current frame or frames.
The AMR performance over error tolerant links can be improved by
also delivering speech frames with bit errors. Unequal error
detection is needed since bit errors SHOULD only be allowed in the
least error sensitive bits. This payload format provides two
alternative methods to implement unequal error detection:
A. CRC calculation over the class A speech bits
If several consecutive speech frames are packed into each
payload, the optional CRC may be used to protect the class A
speech bits, see table 1. The number of class A bits is specified
as informative in [2] and therefore copied into table 1 as
normative for this payload format. Speech frames with errors in
class A bits MUST be marked with SPEECH_BAD for corrupted speech
frames (FT=0..7) or SID_BAD for corrupted SID frames (FT=8) and
be sent to the speech decoder, see [4]. In this case the RTP
header, payload header and table of content should be covered by
a transport layer checksum, e.g. UDP-lite [10]. Packets should be
discarded if the transport layer CRC detects errors.
B. Robust sorting of payload bits
Robust behavior can also be accomplished by robust sorting of the
payload. This enables the use of UED (e.g. UDP-lite) and UEP
(e.g. ULP [15]). The UED and/or UEP is recommended to cover at
least the RTP header, payload header, table of content and class
A bits.
Support for unequal error detection is OPTIONAL. If either scheme is
to be used, it MUST be signalled out of band (see section 8).
                      Class A   Total speech
Index   Mode          bits      bits
------------------------------------------
  0     AMR 4.75        42         95
  1     AMR 5.15        49        103
  2     AMR 5.9         55        118
  3     AMR 6.7         58        134
  4     AMR 7.4         61        148
  5     AMR 7.95        75        159
  6     AMR 10.2        65        204
  7     AMR 12.2        81        244
  8     AMR CNG         39         39
Table 1. Specification of the number of class A bits.
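As an illustration only, an implementation might carry Table 1 as two
lookup arrays indexed by FT. The function names below are
hypothetical; frame types 9 to 14, which Table 1 does not cover, are
simply rejected in this sketch.

```c
#include <stdint.h>

/* Class A and total speech bits per frame type, copied from Table 1.
 * Index = FT (0..7 speech modes, 8 = comfort noise). */
static const uint16_t amr_class_a_bits[9] = { 42, 49, 55, 58, 61, 75, 65, 81, 39 };
static const uint16_t amr_total_bits[9]   = { 95, 103, 118, 134, 148, 159, 204, 244, 39 };

/* Number of speech bits carried for frame type ft: Table 1 value for
 * FT 0..8, 0 for FT=15 (NO_TRANSMISSION), -1 for anything else. */
int amr_speech_bits(unsigned ft)
{
    if (ft <= 8)
        return amr_total_bits[ft];
    if (ft == 15)
        return 0;
    return -1;
}

/* Number of class A bits for FT 0..8, or -1 otherwise. */
int amr_class_a(unsigned ft)
{
    return ft <= 8 ? amr_class_a_bits[ft] : -1;
}
```

A quick sanity check on the table: each total equals the mode bit
rate times 20 ms, e.g. 12.2 kbps x 20 ms = 244 bits.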
A frame quality indicator is included for interoperability with the
ATM payload format described in ITU-T I.366.2, the UMTS Iu interface
[13] and other transport formats. The speech quality is increased if
damaged frames are forwarded to the speech decoder error concealment
unit and not dropped. In many communication scenarios the AMR encoded
bits will be transmitted from one IP/UDP/RTP terminal to a terminal
in a system with another transport format and/or vice versa. The
transport format transcoding is performed in a gateway. A second
likely scenario is that IP/UDP/RTP is used as transport between other
systems, i.e. IP is originated and terminated in gateways on both
sides of the IP transport.
AMR over
I.366.{2,3} or +------+ +----------+
3G Iu or | | IP/UDP/RTP/AMR | |
-------------->| GW |----------------------->| TERMINAL |
GSM Abis | | | |
etc. +------+ +----------+
Figure 1: GW to VoIP terminal scenario
AMR over AMR over
I.366.{2,3} or +------+ +------+ I.366.{2,3} or
3G Iu or | | IP/UDP/RTP/AMR | | 3G Iu or
-------------->| GW |-------------------->| GW |--------------->
GSM Abis | | | | GSM Abis
etc. +------+ +------+ etc.
Figure 2: GW to GW scenario
3.1. The payload header
The length of the payload header is 6 bits. The bits in the header
are specified as follows:
S (1 bit): If set, indicates that the payload is robust sorted;
otherwise simple payload sorting is employed. Note that this bit can
be set only if the receiver has signaled support for the OPTIONAL
robust payload sorting.
C (1 bit): Indicates the existence of optional CRC fields in the
payload table of contents. Note that this bit can be set only if the
receiver has signaled support for the OPTIONAL CRC.
R (1 bit): If set, indicates that the Codec Mode Request (CMR) is
valid.
CMR (3 bits): Codec Mode Request for the other communication
direction. This field is only valid if the R bit is set (R=1). Only
speech modes may be requested, i.e. frame type indices 0-7 (see
Table 1a in [2]). If R=0 the CMR bits SHALL be set to zero; other
values are reserved for future use.
0
0 1 2 3 4 5
+-+-+-+-+-+-+
|S|C|R| CMR |
+-+-+-+-+-+-+
Figure 3: AMR payload header
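A minimal C sketch of building and reading the 6-bit payload header,
assuming it is held in the low 6 bits of an octet with S as the most
significant (first transmitted) bit. The function names are
hypothetical.

```c
#include <stdint.h>

/* Packs S, C, R and CMR into the low 6 bits (S first).
 * Per section 3.1, CMR is forced to zero when R = 0. */
uint8_t amr_pack_header(unsigned s, unsigned c, unsigned r, unsigned cmr)
{
    if (!r)
        cmr = 0;
    return (uint8_t)(((s & 1u) << 5) | ((c & 1u) << 4) |
                     ((r & 1u) << 3) | (cmr & 7u));
}

/* Inverse operation; fields are returned through the pointers. */
void amr_unpack_header(uint8_t h, unsigned *s, unsigned *c,
                       unsigned *r, unsigned *cmr)
{
    *s   = (h >> 5) & 1u;
    *c   = (h >> 4) & 1u;
    *r   = (h >> 3) & 1u;
    *cmr = h & 7u;
}
```

For instance, the header of the example with CRCs in section 7.2
(S=0, C=1, R=1, CMR=6) packs to binary 011110, i.e. 0x1E.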
3.2. The payload table of content and CRCs
The table of contents (ToC) consists of one ToC entry for each speech
frame in the payload. A ToC entry includes the following fields:
F (1 bit): Indicates if this frame is followed by further frames. F=1
further frames follow, F=0 last frame.
Q (1 bit): The payload quality bit indicates, if not set, that the
payload is severely damaged and the receiver should set the RX_TYPE,
see [4], to SPEECH_BAD or SID_BAD depending on the frame type (FT).
FT (4 bits): Frame type indicator, indicating the AMR speech coding
mode or comfort noise (CN) mode. The mapping of existing AMR modes to
FT is given in Table 1a in [2]. If FT=15 (No transmission) no CRC or
payload frame is present.
0
0 1 2 3 4 5
+-+-+-+-+-+-+
|F|Q| FT |
+-+-+-+-+-+-+
Figure 4: Table of contents entry
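In the same spirit, a ToC entry can be held in the low 6 bits of an
octet (F, Q, FT from most to least significant). The sketch below,
with hypothetical function names, also applies the F-bit rule: F=1 on
every entry except the last one.

```c
#include <stdint.h>

/* Packs one ToC entry: F, Q, then the 4-bit FT. */
uint8_t amr_pack_toc(unsigned f, unsigned q, unsigned ft)
{
    return (uint8_t)(((f & 1u) << 5) | ((q & 1u) << 4) | (ft & 0xFu));
}

/* Fills n ToC entries, oldest frame first; F=0 only on the last one. */
void amr_fill_toc(uint8_t *toc, const unsigned *ft, const unsigned *q, int n)
{
    for (int i = 0; i < n; i++)
        toc[i] = amr_pack_toc(i < n - 1, q[i], ft[i]);
}
```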
CRC (8 bits): OPTIONAL field, exists if the payload header bit C is
set (C=1). The 8 bit CRC is used for error detection. These 8 parity
bits are generated according to section 4.1.4 in [2].
0
0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
| CRC |
+-+-+-+-+-+-+-+-+
Figure 5: CRC field
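The CRC generator is specified in section 4.1.4 of [2] and is not
reproduced here. The following is only a generic bit-serial CRC-8
sketch, assuming an MSB-first shift register with a zero initial
value and the generator polynomial (without its x^8 term) passed as
a parameter; any additional initialisation or inversion used by [2]
must be checked there. Because the speech bits of a frame are in
descending sensitivity order, a caller would pass the first class A
bits of the frame (count from Table 1). The function name is
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-serial CRC-8 over an array holding one bit (0 or 1) per element.
 * ASSUMPTION: zero initial register, no final XOR; 'poly' is the
 * generator polynomial without the x^8 term, MSB-first convention. */
uint8_t crc8_bits(const uint8_t *bits, size_t nbits, uint8_t poly)
{
    uint8_t reg = 0;
    for (size_t i = 0; i < nbits; i++) {
        unsigned feedback = ((reg >> 7) & 1u) ^ (bits[i] & 1u);
        reg = (uint8_t)(reg << 1);
        if (feedback)
            reg ^= poly;
    }
    return reg;
}
```

The test below uses polynomial 0x07 only because its check value over
the ASCII string "123456789" is well known (0xF4); it is not claimed
to be the AMR polynomial.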
All ToC entries are placed first, followed by all CRC fields. The ToC
starts with the entry belonging to the oldest speech frame.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|Q| FT |F|Q| FT |F|Q| FT | CRC | CRC |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | CRC |
+-+-+-+-+-+-+-+-+-+-+
Figure 6: The ToC and CRCs for a payload with three speech frames
3.3. AMR speech frame
An AMR speech frame represents one speech frame encoded in the mode
given by the ToC field FT. The length of this field is implicitly
defined by the AMR mode in the FT field. The bits SHALL be sorted
according to Annex B of [2].
3.4. Compound AMR payload
The compound AMR payload consists of one AMR payload header, the
table of contents, the optional CRCs and one or more AMR payload
frames, see sections 3.1, 3.2 and 3.3. These can be put together with
robust or simple payload sorting. The payload header bit S indicates
the method used.
Definitions for describing the compound AMR payload:
b(m) - bit m of the compound AMR payload
t(n,m) - bit m in the table of content entry for speech frame n
p(n,m) - bit m in the CRC for speech frame n
f(n,m) - bit m in speech frame n
F(n) - number of bits in speech frame n, defined by FT
h(m) - bit m of payload header
C - number of CRC bits per speech frame, 0 or 8
N - number of payload frames in the payload
S - number of unused (padding) bits at the end of the payload
Payload frames f(n,m) are ordered consecutively, where frame n
precedes frame n+1, starting with n=0 for the oldest frame as in the
algorithms below. Within one payload all frames between the oldest
and the most recent must be present. If speech data is missing for
one frame, e.g. due to DTX, the NO_TRANSMISSION frame type is sent.
3.4.1. Robust payload sorting
A bit error in a more sensitive bit is subjectively more annoying
than in a less sensitive bit. Therefore, to be able to protect only
the most sensitive bits in a payload packet with a forward error
detection code, e.g. a CRC outside RTP, the bits inside a frame are
ordered into sensitivity order. The protection SHOULD cover an
appropriate number of octets from the beginning of the payload,
covering at least the AMR payload header, ToC and class A bits (see
[2]). Exactly how many octets need protection depends on the network
and application. To maintain sensitivity ordering inside the
AMR payload, when more than one speech frame is transmitted in one
payload, reordering of the data is needed.
The reordering to maintain the sensitivity ordered AMR payload SHALL
be performed on bit level. The AMR payload header, ToC and CRCs SHALL
still be placed unchanged in the beginning of the payload.
Thereafter, the bits of the payload frames are interleaved, taking
one bit from each payload frame in turn.
The robust payload sorting algorithm is defined in C-style as:
/* payload header */
k=0;
for (i = 0; i < 6; i++){
b(k++) = h(i);
}
/* table of content */
for (j = 0; j < N; j++){
for (i = 0; i < 6; i++){
b(k++) = t(j,i);
}
}
/* CRCs */
for (j = 0; j < N; j++){
for (i = 0; i < C; i++){
b(k++) = p(j,i);
}
}
/* payload frames */
max = max(F(0),..,F(N-1));
for (i = 0; i < max; i++){
for (j = 0; j < N; j++){
if (i < F(j)){
b(k++) = f(j,i);
}
}
}
/* padding */
S = 8 - k%8;
if (S < 8){
for (i = 0; i < S; i++){
b(k++) = 0;
}
}
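The following C function is one possible runnable rendering of the
"payload frames" loop above. It assumes, purely for illustration,
that bits are held one per array element; the function name is
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Robust interleaving of payload frames (section 3.4.1).
 * frames[j] points to the nbits[j] bits of frame j (j = 0 is the
 * oldest), already in sensitivity order. Bit i of every frame is
 * emitted before bit i+1 of any frame; frames with fewer than i+1
 * bits are skipped. Returns the number of bits written to out. */
size_t amr_robust_interleave(const uint8_t *const *frames,
                             const int *nbits, int n, uint8_t *out)
{
    int max = 0;
    size_t k = 0;

    for (int j = 0; j < n; j++)
        if (nbits[j] > max)
            max = nbits[j];

    for (int i = 0; i < max; i++)
        for (int j = 0; j < n; j++)
            if (i < nbits[j])
                out[k++] = frames[j][i];
    return k;
}
```

Note that frames of different lengths (e.g. speech mixed with a
39-bit SID frame) are handled by the `i < nbits[j]` test, exactly as
in the C-style definition.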
3.4.2. Simple payload sorting
If robust payload sorting is not used, the payload is formed by
concatenating the payload header, the ToC, the optional CRC fields
and the speech frames in the payload. The bits inside each frame are
nevertheless ordered in sensitivity order as defined in [2].
The simple payload sorting algorithm is defined in C-style as:
/* payload header */
k=0;
for (i = 0; i < 6; i++){
b(k++) = h(i);
}
/* table of content */
for (j = 0; j < N; j++){
for (i = 0; i < 6; i++){
b(k++) = t(j,i);
}
}
/* CRCs */
for (j = 0; j < N; j++){
for (i = 0; i < C; i++){
b(k++) = p(j,i);
}
}
/* payload frames */
for (j = 0; j < N; j++){
for (i = 0; i < F(j); i++){
b(k++) = f(j,i);
}
}
/* padding */
S = 8 - k%8;
if (S < 8){
for (i = 0; i < S; i++){
b(k++) = 0;
}
}
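The simple sorting definition can be turned into octets with a small
bit writer, as in the sketch below. The helper names `put_bits` and
`amr_simple_pack` are hypothetical. Following section 3.2, the sketch
writes a CRC only for frames that carry data (FT other than 15),
which the C-style definition above does not spell out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Appends the 'len' low-order bits of 'value', MSB first, at bit
 * position *pos of buf (bit 0 = MSB of octet 0). */
static void put_bits(uint8_t *buf, size_t *pos, uint32_t value, int len)
{
    for (int i = len - 1; i >= 0; i--, (*pos)++)
        if ((value >> i) & 1u)
            buf[*pos / 8] |= (uint8_t)(0x80u >> (*pos % 8));
}

/* Simple payload sorting (section 3.4.2). hdr and toc[] hold the
 * 6-bit header and ToC entries, crc[] one CRC per frame (read only
 * when c_bits == 8), frames[j] the nbits[j] speech bits of frame j,
 * one bit per element. buf (bufsize octets) must be large enough for
 * the whole payload; it is cleared first so the padding is zero.
 * Returns the payload length in octets. */
size_t amr_simple_pack(uint8_t hdr, const uint8_t *toc, const uint8_t *crc,
                       int c_bits, const uint8_t *const *frames,
                       const int *nbits, int n, uint8_t *buf, size_t bufsize)
{
    size_t pos = 0;

    memset(buf, 0, bufsize);
    put_bits(buf, &pos, hdr, 6);                   /* payload header */
    for (int j = 0; j < n; j++)                    /* table of contents */
        put_bits(buf, &pos, toc[j], 6);
    for (int j = 0; j < n && c_bits == 8; j++)     /* CRCs */
        if (nbits[j] > 0)                          /* none for FT=15 */
            put_bits(buf, &pos, crc[j], 8);
    for (int j = 0; j < n; j++)                    /* payload frames */
        for (int i = 0; i < nbits[j]; i++)
            put_bits(buf, &pos, frames[j][i], 1);
    return (pos + 7) / 8;                          /* octets incl. padding */
}
```

With the parameters of the simple example in section 7.1 (header 0,
one ToC entry F=0, Q=1, FT=2, 118 speech bits) the result is 17
octets, and f(116), f(117) land in the two most significant bits of
the last octet, as shown there.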
3.5. Decoding security consideration
If the payload length calculated from the C, F and FT fields does not
match the actually received payload size, the payload should be
dropped. Decoding a packet that has errors in the length-indicating
bits could severely degrade the speech quality.
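A sketch of this check: compute the expected length from the C bit
and the FT fields of the ToC (whose number of entries follows from
the F bits) and compare it with the received size. Frame sizes come
from Table 1; per section 3.2, frames with FT=15 carry neither a CRC
nor speech bits. The function names are hypothetical, and frame
types outside Table 1 are treated as unsupported.

```c
#include <stddef.h>

/* Speech bits per frame type (Table 1); 0 for FT=15, -1 otherwise. */
static int ft_bits(unsigned ft)
{
    static const int bits[9] = { 95, 103, 118, 134, 148, 159, 204, 244, 39 };
    if (ft <= 8)
        return bits[ft];
    return ft == 15 ? 0 : -1;
}

/* Expected payload length in octets for header bit c and the n FT
 * values of the ToC, or -1 if an FT value is unsupported. */
long amr_expected_octets(int c, const unsigned *ft, int n)
{
    long bits = 6 + 6L * n;                 /* header + ToC */
    for (int j = 0; j < n; j++) {
        int fb = ft_bits(ft[j]);
        if (fb < 0)
            return -1;
        bits += fb;
        if (c && ft[j] != 15)
            bits += 8;                      /* per-frame CRC */
    }
    return (bits + 7) / 8;                  /* padded to an octet */
}

/* Non-zero if the received size matches; otherwise drop the packet. */
int amr_length_ok(int c, const unsigned *ft, int n, size_t received)
{
    long expected = amr_expected_octets(c, ft, n);
    return expected >= 0 && (size_t)expected == received;
}
```

For the two 6.7 kbps frames with CRCs of section 7.2 this gives
6 + 12 + 2 x (134 + 8) = 302 bits, i.e. 38 octets.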
4. RTP header usage
The RTP header marker bit (M) is used to mark (M=1) the packets
containing the first speech frame after CN. For all other packets
the marker bit is set to 0 (M=0).
The timestamp corresponds to the sampling instant of the first sample
encoded for the first frame in the packet. A frame can be either
encoded speech, comfort noise parameters, or NO_TRANSMISSION. The
timestamp unit is one sample. The duration of one AMR speech frame is
20 ms and the sampling frequency is 8 kHz, corresponding to 160
encoded speech samples per frame. Thus, the timestamp is increased by
160 for each consecutive frame. All frames in a packet MUST be
successive 20 ms frames.
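The timestamp and marker bit rules of this section can be sketched as
follows. The names are hypothetical, and the CN-tracking state, with
FT=15 treated as neither speech nor CN, is an assumption of this
sketch rather than a requirement of this document; how the first
packet of a session is marked is left to the caller through the
initial value of that state.

```c
#include <stdint.h>

#define AMR_SAMPLES_PER_FRAME 160u   /* 20 ms at 8000 Hz */

/* RTP timestamp of frame 'index' (0 = first frame) of a packet whose
 * timestamp is 'ts'. Arithmetic wraps modulo 2^32 like RTP
 * timestamps. Passing the number of frames in the packet (including
 * NO_TRANSMISSION frames) yields the next packet's timestamp. */
uint32_t amr_frame_timestamp(uint32_t ts, unsigned index)
{
    return (uint32_t)(ts + AMR_SAMPLES_PER_FRAME * index);
}

/* Returns M for a packet with frame types ft[0..n-1] and updates
 * *in_cn. ASSUMPTION: FT=8 means CN, FT 0..7 speech, FT=15 leaves
 * the state unchanged. */
int amr_marker_bit(int *in_cn, const unsigned *ft, int n)
{
    int m = 0;
    for (int j = 0; j < n; j++) {
        if (ft[j] == 8) {
            *in_cn = 1;
        } else if (ft[j] <= 7) {
            if (*in_cn)
                m = 1;              /* first speech frame after CN */
            *in_cn = 0;
        }
    }
    return m;
}
```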
5. Congestion Control
The need for congestion control for data transported with RTP has to
be considered. AMR speech data has some elastic properties due to
the different bandwidth demand of each mode. Another parameter that
can reduce the bandwidth demand for AMR is the number of speech
frames encapsulated in each payload, since this reduces the number
of packets and thus the overhead from IP/UDP/RTP headers. If forward
error correction (FEC) is used, its amount also needs to be
regulated, so that the FEC itself does not worsen the congestion.
Therefore, it is RECOMMENDED that applications using this payload
format implement
congestion control. The actual mechanism for congestion control is
not specified but should be suitable for real-time flows, e.g.
"Equation-Based Congestion Control for Unicast Applications" [14].
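To illustrate how the number of frames per packet affects bandwidth,
the following sketch estimates the IP-level bit rate of a continuous
speech stream. It assumes 40 octets of IPv4 (20), UDP (8) and RTP
(12) headers per packet, with no IP options, CSRCs, header
compression, CRCs or DTX; the function name is hypothetical.

```c
/* Approximate IP-level bit rate (bit/s) for 'frames' speech frames of
 * mode ft (0..7) per packet; -1 for invalid input.
 * ASSUMPTION: 40 octets of IPv4 + UDP + RTP headers per packet. */
long amr_ip_bitrate(unsigned ft, unsigned frames)
{
    static const int bits[8] = { 95, 103, 118, 134, 148, 159, 204, 244 };
    if (ft > 7 || frames == 0)
        return -1;
    long payload_bits = 6 + 6L * frames + (long)bits[ft] * frames;
    long octets = (payload_bits + 7) / 8 + 40;
    /* one packet every frames x 20 ms, i.e. 50 / frames packets/s */
    return octets * 8 * 50 / frames;
}
```

Under these assumptions the 12.2 kbps mode needs about 28.8 kbit/s
with one frame per packet, but about 20.8 kbit/s with two frames per
packet, at the cost of 20 ms of extra packetization delay.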
6. Security Considerations
RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification [8]. This implies that confidentiality of the media
streams is achieved by encryption. Because the payload format is
arranged end-to-end, encryption MAY be performed after encapsulation
so there is no conflict between the two operations.
This payload type does not exhibit any significant non-uniformity in
the receiver side computational complexity for packet processing to
cause a potential denial-of-service threat.
As this format transports encoded speech, the main security issues
are decoding security (see section 3.5), confidentiality and
authentication of the speech itself. Some other smaller issues also
exist. The payload format itself does not have any support for
security. These issues have to be solved by a payload external
mechanism.
6.1. Confidentiality
To achieve confidentiality of the encoded speech all speech data bits
must be encrypted. There is less need to encrypt the payload header
or the frame header as they only carry information about the
requested AMR mode, AMR frame type and frame quality. This
information could be useful to some third party, e.g. quality
monitoring. The type of encryption used affects not only
confidentiality but also error robustness: robustness against bit
errors is lost unless an encryption method without error
propagation, e.g. a stream cipher, is used. This is only an issue
when UEP/UED is used, i.e. when bit errors are accepted in some part
of the payload.
6.2. Authentication
To authenticate the sender of the speech, an external mechanism has
to be added. It is recommended that such a mechanism protects all
the speech data bits. To prevent a man in the middle from tampering
with the packetization of the speech data, some additional data
could be protected: the RTP timestamp, RTP sequence number and RTP
marker bit. Tampering could result in erroneous
depacketization/decoding that could lower speech quality. Tampering
with the codec mode request field can result in the sender receiving
speech at a different quality than desired.
7. Examples
7.1. Simple example
In the simple example we just send one frame in each RTP packet, no
valid Codec Mode Request CMR is sent (R=0), the payload was not
damaged at IP origin (Q=1) and no CRC is used. The AMR mode is the
5.9 kbps mode (FT=2). The speech encoded bits are put into f(0) to
f(117) in descending sensitivity order according to [2]. Simple
payload sorting is used, S=0.
| Bit no. |
Oct| 0 1 2 3 4 5 6 7 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
0 | S=0 | C=0 | R=0 | 0 | 0 | 0 | F=0 | Q=1 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
1 | 0 | 0 | 1 | 0 | f(0) | f(1) | f(2) | ... |
---+-------+-------+-------+-------+-------+-------+-------+-------+
16 | f(116)| f(117)| 0 | 0 | 0 | 0 | 0 | 0 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
Figure 8: One frame per packet example.
7.2. Example with CRCs
In this example two frames in the 6.7 kbps mode (FT=3) are sent in
the payload. A mode request is sent (R=1), requesting the 10.2 kbps
mode for the other link (CMR=6). CRCs are used (C=1). Frame 1 (134
bits) is f1(0..133) and frame 2 is f2(0..133). For each payload frame
a CRC is calculated: p1(0..7) for frame 1 and p2(0..7) for frame 2.
Simple payload sorting is used, S=0.
| Bit no. |
Oct| 0 1 2 3 4 5 6 7 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
0 | S=0 | C=1 | R=1 | 1 | 1 | 0 | F=1 | Q=1 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
1 | 0 | 0 | 1 | 1 | F=0 | Q=1 | 0 | 0 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
2 | 1 | 1 | p1(0) | p1(1) | p1(2) | p1(3) | p1(4) | p1(5) |
---+-------+-------+-------+-------+-------+-------+-------+-------+
3 | p1(6) | p1(7) | p2(0) | p2(1) | p2(2) | p2(3) | p2(4) | p2(5) |
---+-------+-------+-------+-------+-------+-------+-------+-------+
4 | p2(6) | p2(7) | f1(0) | f1(1) | ... | ... | ... | ... |
---+-------+-------+-------+-------+-------+-------+-------+-------+
20 | ... | ... | ... | ... | ... | ... |f1(132)|f1(133)|
---+-------+-------+-------+-------+-------+-------+-------+-------+
21 | f2(0) | f2(1) | ... | ... | ... | ... | ... | ... |
---+-------+-------+-------+-------+-------+-------+-------+-------+
37 | ... | ... | ... |f2(131)|f2(132)|f2(133)| 0 | 0 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
Figure 9: Example with CRCs.
7.3. Example with multiple frames per payload and robust sorting
In this example two 5.9 kbps mode (FT=2) frames are sent in one
payload. No CRC is used (C=0). A mode request is sent (R=1),
requesting the 7.95 kbps mode for the other link (CMR=5). The first
frame is represented by the 118 bits f(0) to f(117) and the
subsequent frame by g(0) to g(117). Robust sorting is used (S=1).
| Bit no. |
Oct| 0 1 2 3 4 5 6 7 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
0 | S=1 | C=0 | R=1 | 1 | 0 | 1 | F=1 | Q=1 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
1 | 0 | 0 | 1 | 0 | F=0 | Q=1 | 0 | 0 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
2 | 1 | 0 | f(0) | g(0) | f(1) | g(1) | ... | ... |
---+-------+-------+-------+-------+-------+-------+-------+-------+
31 | ... | ... | f(116)| g(116)| f(117)| g(117)| 0 | 0 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
Figure 10: Example two frames per payload and robust sorting.
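A receiver starts by reading the payload header and the ToC. The
sketch below, with hypothetical names, reads bits MSB first and
stops at the first entry with F=0. Applied to the first three octets
of the example above, which are 0xB7, 0x24 and 0x80 when f(0) and
g(0) are zero, it recovers S=1, C=0, R=1, CMR=5 and two ToC entries
with FT=2.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads 'len' bits MSB first from buf, starting at bit *pos. */
static unsigned get_bits(const uint8_t *buf, size_t *pos, int len)
{
    unsigned v = 0;
    for (int i = 0; i < len; i++, (*pos)++)
        v = (v << 1) | ((buf[*pos / 8] >> (7 - *pos % 8)) & 1u);
    return v;
}

struct amr_toc_entry { unsigned f, q, ft; };

/* Parses the payload header and ToC of a 'size'-octet payload.
 * Returns the number of ToC entries, or -1 if the ToC runs past the
 * payload or has more than max_n entries. */
int amr_parse_toc(const uint8_t *buf, size_t size, unsigned *s,
                  unsigned *c, unsigned *r, unsigned *cmr,
                  struct amr_toc_entry *toc, int max_n)
{
    size_t pos = 0;
    int n = 0;

    if (size * 8 < 6)
        return -1;
    *s = get_bits(buf, &pos, 1);
    *c = get_bits(buf, &pos, 1);
    *r = get_bits(buf, &pos, 1);
    *cmr = get_bits(buf, &pos, 3);
    do {
        if (n == max_n || pos + 6 > size * 8)
            return -1;
        toc[n].f = get_bits(buf, &pos, 1);
        toc[n].q = get_bits(buf, &pos, 1);
        toc[n].ft = get_bits(buf, &pos, 4);
    } while (toc[n++].f);           /* F=1: another entry follows */
    return n;
}
```

After this step, the length check of section 3.5 would normally be
applied before any speech bits are extracted.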
8. The AMR MIME type registration
This chapter defines the MIME type for the Adaptive Multi-Rate (AMR)
speech codec [1]. The data format and parameters are specified for
both real-time transport and storage type applications (e.g. e-mail
attachment, multimedia messaging). The former is referred to as RTP
mode and the latter as storage mode.
AMR implementations according to [1] MUST support all eight coding
modes. The mode change can occur at any time during operation and
therefore the mode information is transmitted in-band together with
speech bits to allow mode change without any additional signaling.
In addition to the speech codec, AMR specifications also include
Discontinuous Transmission / comfort noise (DTX/CN) functionality
[11]. The DTX/CN switches the transmission off during silent parts of
the speech and only CN parameter updates are sent at regular
intervals.
8.1. RTP mode
The decoder may want to receive only a certain AMR mode or a subset
of AMR modes, due to link limitations in some cellular systems; e.g.
the GSM radio link can only use a subset of at most four modes.
Therefore, it is possible to request a specific set of AMR modes in
the capability description, and the encoder MUST abide by this
request. If no mode set is requested, any mode may be used or
requested.
The AMR codec can in principle perform a mode change at any time
between any two modes. To support interoperability with GSM through
a gateway, it is possible to set limitations for mode changes. The
decoder can define the minimum number of frames between mode changes
and can restrict mode changes to neighboring modes only.
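An encoder honouring these restrictions might check each intended
mode change as in the sketch below. The function and parameter names
are hypothetical, the active mode set is assumed to be a bitmask
(bit k set when mode k is allowed), and "neighboring" is read, as an
assumption, as adjacent within the active mode set.

```c
/* Checks whether switching from mode 'cur' to 'next' at frame number
 * 'frame' is allowed under the restrictions of section 8.1.
 *   mode_set : active modes as a bitmask; 0xFF if none requested
 *   period   : mode-change-period N; 0 or 1 if not signalled
 *   phase    : frame number chosen as phase reference (arbitrary)
 *   neighbor : non-zero if mode-change-neighbor was signalled */
int amr_mode_change_allowed(unsigned cur, unsigned next,
                            unsigned long frame, unsigned long phase,
                            unsigned long period, unsigned mode_set,
                            int neighbor)
{
    if (next > 7 || !((mode_set >> next) & 1u))
        return 0;                    /* not in the active mode set */
    if (next == cur)
        return 1;                    /* no mode change at all */
    if (period > 1 && frame % period != phase % period)
        return 0;                    /* not on a multiple of N */
    if (neighbor) {
        unsigned lo = cur < next ? cur : next;
        unsigned hi = cur < next ? next : cur;
        for (unsigned m = lo + 1; m < hi; m++)
            if ((mode_set >> m) & 1u)
                return 0;            /* an active mode lies in between */
    }
    return 1;
}
```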
It is also possible to limit the number of AMR frames encapsulated
into one RTP packet. This is an optional feature and if no parameter
is given in capability description, the transmitter can encapsulate
any number of AMR speech frames into one RTP packet.
The payload CRC UED MUST only be used if the receiver has signaled
support for this functionality in the capability description.
To support unequal error protection and/or detection the payload
format supports robust payload sorting. The robust payload sorting is
an OPTIONAL feature and MUST only be used if the receiver has
signaled support for this functionality in the capability
description.
8.2. Storage mode
The AMR storage mode is used for storing AMR frames, e.g. as a file
or e-mail attachment. Frames are stored consecutively in an
octet-aligned manner. This implies that the first octet after the last
octet of frame n must be the first octet of frame n+1. Each stored
AMR frame consists of a Q bit and the 4-bit FT field (see definition
in section 3.2), followed by the AMR encoded speech bits (see section
3.3). The last octet of each frame is padded with zeroes, if needed,
to achieve octet alignment. An example is given in figure 11.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Q| FT | |
+-+-+-+-+-+ +
| |
+ AMR speech bits for frame n +
| |
+ +-+-+-+-+-+
| | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Q| FT | |
+-+-+-+-+-+ +
| |
+ AMR speech bits for frame n+1 +
| |
+ +-+-+-+-+-+
| | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 11: An example of storage format with two AMR 5.9 kbit/s
frames (118 speech bits). Note that bits marked as 'padding' must be
set to zero.
Frames lost in transmission and non-received frames between SID
updates during non-speech period must be stored as NO_TRANSMISSION
frames (frame type 15, see definition in [2]) to keep synchronization
with the original media.
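Writing one storage-mode frame can be sketched as follows, using the
layout of Figure 11: the Q bit and FT in the five most significant
bits of the first octet, the speech bits immediately after, and zero
padding to the next octet boundary. The function name and the
one-bit-per-element input are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes one storage-mode frame and returns its size in octets.
 * nbits must match FT (0 for NO_TRANSMISSION, FT=15); out must have
 * room for (5 + nbits + 7) / 8 octets. */
size_t amr_store_frame(unsigned q, unsigned ft, const uint8_t *bits,
                       int nbits, uint8_t *out)
{
    size_t octets = (size_t)(5 + nbits + 7) / 8;

    memset(out, 0, octets);                        /* padding = 0 */
    out[0] = (uint8_t)(((q & 1u) << 7) | ((ft & 0xFu) << 3));
    for (int i = 0; i < nbits; i++) {
        size_t pos = 5 + (size_t)i;                /* speech starts at bit 5 */
        if (bits[i] & 1u)
            out[pos / 8] |= (uint8_t)(0x80u >> (pos % 8));
    }
    return octets;
}
```

A 5.9 kbit/s frame thus occupies 16 octets (5 + 118 bits, padded),
and a NO_TRANSMISSION frame a single octet.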
The receiving entity (AMR decoder) MUST be able to decode all eight
coding modes as well as the AMR DTX/CN [6]. Since no capabilities
can be exchanged before stored AMR data is downloaded or received,
the optional features (robust sorting, CRC) specified for RTP mode
MUST NOT be used with storage mode.
8.3. MIME Registration
The MIME name for the AMR codec is allocated from the IETF tree, since
AMR is expected to be a widely used speech codec in VoIP applications.
Some parts of this section distinguish between the RTP and storage
modes.
Media Type name: audio
Media subtype name: AMR
Required parameters: none
Optional parameters for RTP mode:
mode-set: Requested AMR mode set. Restricts the active codec mode
set to a subset of all modes. Possible values are a
comma-separated list of modes from 0 to 7 (see Table 1a
in [2]; an example is given in section 8.4). If not
present, all speech modes are available.
mode-change-period: Defines a number N that restricts mode changes
so that they are only allowed at intervals that are
multiples of N speech frames. The initial phase of this
interval is arbitrary. If this parameter is not present,
mode changes can happen at any time.
mode-change-neighbor: If present, mode changes SHALL only be made to
neighboring modes in the active codec mode set. If not
present, change between any two modes in the active codec
mode set is allowed.
maxframes: Maximum number of AMR speech frames in one RTP packet.
The receiver may set this parameter in order to limit
the buffering requirements or delay.
crc: If present, transmission of CRCs in the payload is
supported; otherwise, it is not supported.
robust-sorting: If present, robust payload sorting is supported;
otherwise, it is not supported and simple payload sorting
SHALL be used.
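The three mode-change parameters act together as restrictions on the sender. The Python sketch below is a hypothetical checker and is not defined by this document. It assumes that the sender counts frames with an integer index. It also represents the arbitrary initial phase of mode-change-period as an explicit phase argument. Absent parameters are passed as None or False.

```python
from typing import Optional


def mode_change_allowed(current: int, new: int, frame_index: int,
                        mode_set: Optional[list[int]] = None,
                        period: Optional[int] = None,
                        neighbor: bool = False,
                        phase: int = 0) -> bool:
    """Hypothetical check: may the sender switch from `current` to `new`
    at `frame_index` under mode-set, mode-change-period and
    mode-change-neighbor? None/False means the parameter is absent."""
    # mode-set absent: all eight speech modes (0..7) are available.
    active = sorted(mode_set) if mode_set is not None else list(range(8))
    if new not in active:
        return False                  # outside the active codec mode set
    if new == current:
        return True                   # no mode change requested
    if period is not None and (frame_index - phase) % period != 0:
        return False                  # not on a multiple of N frames
    if neighbor:
        if current not in active:
            return False
        # Only one step up or down within the active codec mode set.
        return abs(active.index(new) - active.index(current)) == 1
    return True
```

Consider mode-set=0,2,5,7 with mode-change-period=2 and mode-change-neighbor present. A change from mode 2 to mode 5 at frame 4 passes this check. A change from mode 2 to mode 7 does not, because mode 7 is not adjacent to mode 2 in the active set.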
Optional parameters for storage mode: none
Encoding considerations for RTP mode: See section 3 in this document.
Encoding considerations for storage mode: See section 8.2 in this
document.
Security considerations: see chapter 6 "Security".
Public specification: please refer to chapter 9 "References".
Additional information for storage mode:
Magic number: none
File extensions: amr, AMR
Macintosh file type code: none
Object identifier or OID: none
Person & email address to contact for further information:
johan.sjoberg@ericsson.com
ari.lakaniemi@nokia.com
Bernhard.Wimmer@mch.siemens.de
Intended usage: COMMON. It is expected that many VoIP applications
(as well as mobile applications) will use this type.
Author/Change controller:
johan.sjoberg@ericsson.com
ari.lakaniemi@nokia.com
8.4. Mapping to SDP Parameters
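Please note that this section applies to the RTP mode only.
The parameters are mapped to SDP [12] in the usual way. The media type
name (audio) goes in the "m=" line. The media subtype name (AMR) goes
in the "a=rtpmap" line as the encoding name, together with the RTP
clock rate. The optional parameters go in the "a=fmtp" line as a
semicolon-separated list.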
Example usage in SDP:
m=audio 49120 RTP/AVP 97
a=rtpmap:97 AMR/8000
a=fmtp:97 mode-set=0,2,5,7; maxframes=1
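A receiver of the SDP above has to recover the AMR parameters from the a=fmtp line. The Python sketch below shows one possible parser. parse_amr_fmtp is a hypothetical name. This document only states that the flag parameters (crc, robust-sorting, mode-change-neighbor) are "present", so the sketch assumes they appear as bare names without a value.

```python
def parse_amr_fmtp(line: str) -> tuple[int, dict[str, object]]:
    """Hypothetical parser: split an AMR 'a=fmtp:' line into
    (payload type, parameters). mode-set becomes a list of ints,
    mode-change-period and maxframes become ints, bare flag names
    become True and any other value is kept as a string."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute")
    pt_text, _, param_text = line[len(prefix):].partition(" ")
    params: dict[str, object] = {}
    for item in param_text.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        name, value = name.strip(), value.strip()
        if not sep:
            params[name] = True   # assumed flag syntax: present = supported
        elif name == "mode-set":
            params[name] = [int(m) for m in value.split(",")]
        elif name in ("mode-change-period", "maxframes"):
            params[name] = int(value)
        else:
            params[name] = value
    return int(pt_text), params


pt, params = parse_amr_fmtp("a=fmtp:97 mode-set=0,2,5,7; maxframes=1")
print(pt, params)  # 97 {'mode-set': [0, 2, 5, 7], 'maxframes': 1}
```

A sender could then, for example, include CRCs only if "crc" is among the returned parameters, and use robust payload sorting only if "robust-sorting" is.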
9. References
[1] 3G TS 26.090, "Adaptive Multi-Rate (AMR) speech transcoding".
[2] 3G TS 26.101, "AMR Speech Codec Frame Structure".
[3] IETF RFC 2119, "Key words for use in RFCs to Indicate
Requirement Levels".
[4] 3G TS 26.093, "AMR Speech Codec; Source Controlled Rate
operation".
[5] GSM 06.60, "Enhanced Full Rate (EFR) speech transcoding".
[6] TIA/EIA-136-Rev.A, part 410, "TDMA Cellular/PCS - Radio
Interface, Enhanced Full Rate Voice Codec (ACELP)". Formerly
IS-641. TIA published standard, 1998.
[7] ARIB, RCR STD-27H, "Personal Digital Cellular Telecommunication
System RCR Standard".
[8] IETF RFC 1889, "RTP: A Transport Protocol for Real-Time
Applications".
[9] IETF draft-westberg-realtime-cellular-01.txt, "Realtime Traffic
over Cellular Access Networks".
[10] IETF draft-larzon-udplite-03.txt, "The UDP Lite Protocol".
[11] GSM 06.92, "Comfort noise aspects for Adaptive Multi-Rate (AMR)
speech traffic channels".
[12] IETF RFC 2327, "SDP: Session Description Protocol", M. Handley
and V. Jacobson, April 1998.
[13] 3G TS 25.415, "UTRAN Iu Interface User Plane Protocols".
[14] S. Floyd, M. Handley, J. Padhye, J. Widmer, "Equation-Based
Congestion Control for Unicast Applications", ACM SIGCOMM 2000,
Stockholm, Sweden.
[15] IETF draft-ietf-avt-ulp-00.txt, "An RTP Payload Format for
Generic FEC with Uneven Level Protection".
[16] IETF RFC 2733, "An RTP Payload Format for Generic Forward Error
Correction".
[17] 3G TS 26.102, "AMR speech codec interface to Iu and Uu".
10. Authors' addresses
Johan Sjoberg Tel: +46 8 50878230
Ericsson Research EMail: Johan.Sjoberg@ericsson.com
Ericsson Radio Systems AB
Torshamnsgatan 23
SE-164 80 Stockholm
SWEDEN
Magnus Westerlund Tel: +46 8 4048287
Ericsson Research EMail: Magnus.Westerlund@ericsson.com
Ericsson Radio Systems AB
Torshamnsgatan 23
SE-164 80 Stockholm
SWEDEN
Ari Lakaniemi Tel: +358 40 5276440
Nokia Research Center EMail: ari.lakaniemi@nokia.com
P.O.Box 407
FIN-00045 Nokia Group
Finland
Petri Koskelainen
Nokia Research Center EMail: petri.koskelainen@nokia.com
P.O.Box 100
FIN-33721 Tampere
Finland
Tim Fingscheidt Tel: +49 89 722 57658
Siemens AG, ICP CD Fax: +49 89 722 46489
Grillparzerstrasse 10-18 EMail: Tim.Fingscheidt@mch.siemens.de
D - 81675 Munich
Germany
Bernhard Wimmer Tel: +49 89 722 23247
Siemens AG, ICP CD Fax: +49 89 722 46489
Grillparzerstrasse 10-18 EMail: Bernhard.Wimmer@mch.siemens.de
D - 81675 Munich
Germany
Qiaobing Xie Tel: +1-847-632-3028
Motorola, Inc. EMail: qxie1@email.mot.com
1501 W. Shure Drive, #2309
Arlington Heights, IL 60004
USA
Sanjay Gupta Tel: +1-847-435-0306
Motorola, Inc. EMail: QA4496@email.mot.com
1501 W. Shure Drive, #3205
Arlington Heights, IL 60004
USA
This Internet-Draft expires August 19, 2001.