Alan Duric
Soren Vang Andersen
Internet Draft
draft-ietf-avt-rtp-ilbc-00.txt Global IP Sound
October 28th, 2002
Expires: April, 28th, 2003
RTP Payload Format for iLBC Speech
Status of this Memo
This document is an Internet-Draft and is in full conformance
with all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This document describes the RTP payload format for the internet Low
Bit Rate Coder (iLBC) Speech [1] developed by Global IP Sound
(GIPS). Also, within the document there are included necessary
details for the use of iLBC with MIME and SDP.
Table of Contents
Status of this Memo................................................1
Abstract...........................................................1
Table of Contents..................................................1
1. INTRODUCTION....................................................2
2. BACKGROUND......................................................2
3. RTP PAYLOAD FORMAT..............................................3
3.1 Bitstream definition...........................................3
3.2 Multiple iLBC frames in a RTP packet...........................5
4. IANA CONSIDERATIONS.............................................6
4.1 Storage Mode...................................................6
4.2 MIME registration of iLBC......................................6
5. MAPPING TO SDP PARAMETERS.......................................7
INTERNET DRAFT RTP Payload format for iLBC Speech October 2002
6. SECURITY CONSIDERATIONS.........................................7
7. REFERENCES......................................................8
8. ACKNOWLEDGEMENTS................................................9
9. AUTHOR'S ADDRESSES..............................................9
1. INTRODUCTION
This document describes how compressed iLBC speech as produced by
the iLBC codec [1] may be formatted for use as an RTP payload type.
Methods are provided to packetize the codec data frames into RTP
packets. The sender may send one or more codec data frames per
packet, depending on the application scenario or based on the
transport network condition, bandwidth restriction, delay
requirements and packet-loss tolerance.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC 2119 [2].
2. BACKGROUND
Global IP Sound (GIPS) has developed and defines a freeware speech
compression algorithm for use in IP based communications [1]. The
iLBC codec enables graceful speech quality degradation in the case
of lost frames, which occurs in connection with lost or delayed IP
packets.
Some of the applications for which this coder is suitable are: real
time communications such as telephony and videoconferencing,
streaming audio, archival and messaging.
The iLBC codec [1] is an algorithm that compresses each 30 ms of
8000 Hz, 16-bit sampled input speech into size output frames with
rate of 399 bits.
The codec has a bit rate of 13.33 kbits/s using a block independent
linear-predictive coding (LPC) algorithm. The codec operates at
block lengths of 30 ms and produces 399 bits per block, which can be
packetized in 50 bytes. The described algorithm results in a speech
coding system with a controlled response to packet losses similar to
what is known from pulse code modulation (PCM) with a packet loss
concealment (PLC), such as ITU-T G711 standard [10], which operates
at a fixed bit rate of 64 kbit/s. At the same time, the described
algorithm enables fixed bit rate coding with a quality-versus-bit
rate tradeoff close to what is known from code-excited linear
prediction (CELP).
Duric, Andersen [Page 2]
INTERNET DRAFT RTP Payload format for iLBC Speech October 2002
3. RTP PAYLOAD FORMAT
The iLBC codec uses 30 ms frames and a sampling rate clock of 8 kHz,
so the RTP timestamp MUST be in units of 1/8000 of a second. The RTP
payload for iLBC has the format shown in the figure bellow. No
addition header specific to this payload format is required.
This format is intended for the situations where the sender and the
receiver send one or more codec data frames per packet. The RTP
packet looks as follows:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP Header [4] |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| |
+ one or more frames of iLBC [1] |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The RTP header of the packetized encoded iLBC speech has the
expected values as described in [4]. The usage of M bit should be as
specified in the applicable RTP profile, for example, RFC 1890 [5],
where [5] specifies that if the sender does not suppress silence
(i.e., sends a frame on every 30 millisecond interval), the M bit
will always be zero. When more then one codec data frame is present
in a single RTP packet, the timestamp is, as always, that of the
oldest data frame represented in the RTP packet.
The assignment of an RTP payload type for this new packet format is
outside the scope of this document, and will not be specified here.
It is expected that the RTP profile for a particular class of
applications will assign a payload type for this encoding, or if
that is not done, then a payload type in the dynamic range shall be
chosen by the sender.
3.1 Bitstream definition
The total number of bits used to describe one block of 30 ms speech
is 399, which fits in 50 bytes and results in a bit rate of 13.33
kbit/s. In the bitstream definition the bits are distributed into
three classes according to their bit error or loss sensitivity. The
most sensitive bits (class 1) are placed first in the bitstream for
each frame. The less sensitive bits (class 2) are placed after the
class 1 bits. The least sensitive bits (class 3) are placed at the
end of the bitstream for each frame.
The class 1 bits occupy a total of 8 bytes (64 bits), the class 2
bits occupy 12 bytes (96 bits), and the class 3 bits occupy 30 bytes
(239 bits). This distribution of the bits enables the use of uneven
level protection (ULP). The detailed bit allocation is shown in the
table below. When a quantization index is distributed between more
classes the more significant bits belong to the lowest class.
Duric, Andersen [Page 3]
INTERNET DRAFT RTP Payload format for iLBC Speech October 2002
Bitstream structure:
Parameter Bits Class 1,2,3
-------------------------------------------------------------------
Split 1 6 6,0,0
LSF 1 Split 2 7 7,0,0
LSF Split 3 7 7,0,0
----------------------------------------------------
Split 1 6 6,0,0
LSF 2 Split 2 7 7,0,0
Split 3 7 7,0,0
----------------------------------------------------
Sum 40 20,0,0
-------------------------------------------------------------------
Block Class. 3 3,0,0
-------------------------------------------------------------------
Position 22 sample segment 1 1,0,0
-------------------------------------------------------------------
Scale Factor State Coder 6 6,0,0
-------------------------------------------------------------------
Sample 0 3 0,1,2
Quantized Sample 1 3 0,1,2
Residual : : :
State : : :
Samples : : :
Sample 56 3 0,1,2
Sample 57 3 0,1,2
----------------------------------------------------
Sum 174 0,58,116
-------------------------------------------------------------------
Stage 1 7 4,2,1
CB for 22 samples in start state Stage 2 7 0,0,7
Stage 3 7 0,0,7
----------------------------------------------------
Sum 21 4,2,15
-------------------------------------------------------------------
Stage 1 5 1,1,3
Gain for 22 samples in start state Stage 2 4 1,1,2
Stage 3 3 0,0,3
----------------------------------------------------
Sum 12 2,2,8
-------------------------------------------------------------------
Stage 1 8 6,1,1
Indices sub-block 1 Stage 2 7 0,0,7
Stage 3 7 0,0,7
----------------------------------------------------
Stage 1 8 0,7,1
Indices sub-block 2 Stage 2 8 0,0,8
Stage 3 8 0,0,8
CB sub-blocks ----------------------------------------------------
Stage 1 8 0,7,1
Indices sub-block 3 Stage 2 8 0,0,8
Stage 3 8 0,0,8
----------------------------------------------------
Duric, Andersen [Page 4]
INTERNET DRAFT RTP Payload format for iLBC Speech October 2002
Stage 1 8 0,7,1
Indices sub-block 4 Stage 2 8 0,0,8
Stage 3 8 0,0,8
----------------------------------------------------
Sum 94 6,22,66
-------------------------------------------------------------------
Stage 1 5 1,2,2
Gains sub-block 1 Stage 2 4 1,2,1
Stage 3 3 0,0,3
----------------------------------------------------
Stage 1 5 0,2,3
Gains sub-block 2 Stage 2 4 0,2,2
Stage 3 3 0,0,3
Gain sub-blocks ---------------------------------------------------
Stage 1 5 0,1,4
Gains sub-block 3 Stage 2 4 0,1,3
Stage 3 3 0,0,3
----------------------------------------------------
Stage 1 5 0,1,4
Gains sub-block 4 Stage 2 4 0,1,3
Stage 3 3 0,0,3
----------------------------------------------------
Sum 48 2,12,34
-------------------------------------------------------------------
SUM 399 64,96,239
Table 3.1 The bitstream definition for iLBC.
When packetized into the payload the bits MUST be sorted as: All the
class 1 bits in the order (from top and down) as they were specified
in the table, all the class 2 bits (from top and down) and finally
all the class 3 bits in the same sequential order. The last unused
bit of the payload SHOULD be set to zero.
3.2 Multiple iLBC frames in a RTP packet
More than one iLBC frame may be included in a single RTP packet by a
sender.
It is important to observe that senders have the following
additional restrictions:
o SHOULD NOT include more iLBC frames in a single RTP packet than
will fit in the MTU of the RTP transport protocol.
o Frames MUST NOT be split between RTP packets.
It is RECOMMENDED that the number of frames contained within an RTP
packet is consistent with the application. For example, in a
telephony and other real time applications where delay is important,
then the fewer frames per packet the lower the delay, whereas for a
bandwidth constrained links or delay insensitive streaming messaging
application, more then one or many frames per packet would be
acceptable.
Duric, Andersen [Page 5]
INTERNET DRAFT RTP Payload format for iLBC Speech October 2002
Information describing the number of frames contained in an RTP
packet is not transmitted as part of the RTP payload. The way to
determine the number of iLBC frames is to count the total number of
octets within the RTP packet, and divide the octet count by the
number of expected octets per frame (50 per frame).
4. IANA CONSIDERATIONS
One new MIME sub-type as described in this section is to be
registered.
4.1 Storage Mode
The storage mode is used for storing speech frames (e.g. as a file
or e-mail attachment).
+------------------+
| Header |
+------------------+
| Speech frame 1 |
+------------------+
: :
+------------------+
| Speech frame n |
+------------------+
The file begins with a header that includes only a magic number to
identify that it is an iLBC file. The magic number for iLBC file
MUST correspond to the ASCII character string "#!iLBC\n", or "0x23
0x21 0x69 0x4C 0x42 0x43 0x0A" in hexadecimal form. After the
header, follow the speech frames in consecutive order.
4.2 MIME registration of iLBC
MIME media type name: audio
MIME subtype: iLBC
Optional parameters:
This parameter applies to RTP transfer only.
maxptime:The maximum amount of media which can be
encapsulated in a payload packet, expressed
as time in milliseconds. The time is
calculated as the sum of the time the media
present in the packet represents. The time SHOULD be
a multiple of the frame size. If this parameter is
not present, the sender MAY encapsulate any number of
speech frames into one RTP packet.
Encoding considerations:
Duric, Andersen [Page 6]
INTERNET DRAFT RTP Payload format for iLBC Speech October 2002
This type is defined for transfer via both RTP (RFC
1889) and stored-file methods as described in Section
4.1, of RFC XXXX. Audio data is binary data, and must
be encoded for non-binary transport; the Base64
encoding is suitable for Email.
Security considerations:
See Section 6 of RFC XXXX.
Public specification:
Please refer to RFC XXXX [1].
Additional information:
The following applies to stored-file transfer
methods:
Magic number:
ASCII character string "#!iLBC\n"
(or 0x23 0x21 0x69 0x4C 0x42 0x43 0x0A in
hexadecimal)
File extensions: lbc, LBC
Macintosh file type code: none
Object identifier or OID: none
Person & email address to contact for further information:
alan.duric@globalipsound.com
Intended usage: COMMON.
It is expected that many VoIP applications will use
this type.
Author/Change controller:
alan.duric@globalipsound.com
IETF Audio/Video transport working group
5. MAPPING TO SDP PARAMETERS
Parameters are mapped to SDP [7] in a standard way. When conveying
information by SDP, the encoding name SHALL be "iLBC" (the same as
the MIME subtype). An example of the media representation in SDP for
describing iLBC might be:
m=audio 49120 RTP/AVP 97
a=rtpmap:97 iLBC/8000
6. SECURITY CONSIDERATIONS
RTP packets using the payload format defined in this specification
are subject to the general security considerations discussed in [4]
and any appropriate profile (e.g. [5]).
As this format transports encoded speech, the main security issues
include confidentiality and authentication of the speech itself. The
Duric, Andersen [Page 7]
INTERNET DRAFT RTP Payload format for iLBC Speech October 2002
payload format itself does not have any built-in security
mechanisms. Confidentiality of the media streams is achieved by
encryption, therefore external mechanisms, such as SRTP [9], MAY be
used for that purpose. The data compression used with this payload
format is applied end-to-end; hence encryption may be performed
after compression with no conflict between the two operations.
A potential denial-of-service threat exists for data encoding using
compression techniques that have non-uniform receiver-end
computational load. The attacker can inject pathological datagrams
into the stream which are complex to decode and cause the receiver
to become overloaded. However, the encodings covered in this
document do not exhibit any significant non-uniformity.
7. REFERENCES
[1] Andersen, et al., Internet Low Bit Rate Codec (iLBC)", draft-
ietf-avt-rtp-ilbc-00.txt, September 2002.
[2] S. Bradner, "Key words for use in RFCs to Indicate requirement
Levels", BCP 14, RFC 2119, March 1997.
[3] S. Bradner, "The Internet Standards Process -- Revision 3", BCP
9, RFC 2026, October 1996
[4] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP:
A Transport Protocol for Real-Time Applications", IETF RFC 1889,
January 1996.
[5] H. Schulzrinne, "RTP Profile for Audio and Video Conferences
with Minimal Control" IETF RFC 1890, January 1996.
[6] Handley & Perkins, "Guidelines for Writers of RTP Payload
Formats", BCP 36, RFC 2736, December 1999.
[7] M. Handley and V. Jacobson, "SDP: Session Description Protocol",
IETF RFC 2327, April 1998
[8] N. Freed and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Bodies",
IETF RFC 2045, November 1996.
[9] Baugher, et al., "The Secure Real Time Transport Protocol", IETF
Draft, June 2002.
[10] ITU-T Recommendation G.711, available online from the ITU
bookstore at http://www.itu.int.
[11] J. Sjoberg, M. Westerlund, A. Lakaniemi, Q. Xie, ôRTP payload
format and file storage format for the Adaptive Multi-Rate (AMR)
and Adaptive Multi-Rate Wideband (AMR-WB) audio codecsö, IETF RFC
3267, June 2002.
Duric, Andersen [Page 8]
INTERNET DRAFT RTP Payload format for iLBC Speech October 2002
8. ACKNOWLEDGEMENTS
The authors wish to thank Henry Sinnreich and Patrik Faltstrom for
great support of the iLBC initiative and for their valuable feedback
and comments.
9. AUTHOR'S ADDRESSES
Alan Duric
Global IP Sound AB
Rosenlundsgatan 54
Stockholm, S-11863
Sweden
Phone: +46 8 54553040
Email: alan.duric@globalipsound.com
Soren Vang Andersen
Department of Communication Technology
Aalborg University
Fredrik Bajers Vej 7A
9200 Aalborg
Denmark
Phone: ++45 9 6358627
Email: sva@kom.auc.dk
Duric, Andersen [Page 9]