Alan Duric
                                                    Soren Vang Andersen
   Internet Draft
   draft-ietf-avt-rtp-ilbc-01.txt                       Global IP Sound
   March 3rd, 2003
   Expires: September 3rd, 2003


                    RTP Payload Format for iLBC Speech


Status of this Memo

   This document is an Internet-Draft and is in full conformance
   with all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract

   This document describes the RTP payload format for the internet Low
   Bit Rate Coder (iLBC) Speech [1] developed by Global IP Sound
   (GIPS). Also, within the document there are included necessary
   details for the use of iLBC with MIME and SDP.


Table of Contents

   Status of this Memo................................................1
   Abstract...........................................................1
   Table of Contents..................................................1
   1. INTRODUCTION....................................................2
   2. BACKGROUND......................................................2
   3. RTP PAYLOAD FORMAT..............................................3
   3.1 Bitstream definition...........................................3
   3.2 Multiple iLBC frames in a RTP packet...........................5
   4. IANA CONSIDERATIONS.............................................6
   4.1 Storage Mode...................................................6
   4.2 MIME registration of iLBC......................................6
   5. MAPPING TO SDP PARAMETERS.......................................8
   INTERNET DRAFT RTP Payload format for iLBC Speech        March 2003

   6. SECURITY CONSIDERATIONS.........................................8
   7. REFERENCES......................................................8
   8. ACKNOWLEDGEMENTS................................................9
   9. AUTHOR'S ADDRESSES..............................................9


1. INTRODUCTION

   This document describes how compressed iLBC speech as produced by
   the iLBC codec [1] may be formatted for use as an RTP payload type.
   Methods are provided to packetize the codec data frames into RTP
   packets. The sender may send one or more codec data frames per
   packet, depending on the application scenario or based on the
   transport network condition, bandwidth restriction, delay
   requirements and packet-loss tolerance.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119 [2].


2. BACKGROUND

   Global IP Sound (GIPS) has developed and defines a freeware speech
   compression algorithm for use in IP based communications [1]. The
   iLBC codec enables graceful speech quality degradation in the case
   of lost frames, which occurs in connection with lost or delayed IP
   packets.

   Some of the applications for which this coder is suitable are: real
   time communications such as telephony and videoconferencing,
   streaming audio, archival and messaging.

   The iLBC codec [1] is an algorithm that compresses each basic frame
   (20 ms or 30 ms) of 8000 Hz, 16-bit sampled input speech into size
   output frames with rate of 399 bits for 30 ms basic frame size and
   303 bits for 20 ms basic frame size.

   The codec has support for two basic frame lengths û 30 ms at 13.33
   kbit/s and 20 ms at 15.2 kbit/s, using a block independent linear-
   predictive coding (LPC) algorithm. When the codec operates at block
   lengths of 20 ms, it produces 303 bits per block which SHOULD be
   packetized in 38 bytes. Similarly, for block lengths of 30 ms it
   produces 399 bits per block which SHOULD be packetized in 50 bytes.
   The described algorithm results in a speech coding system with a
   controlled response to packet losses similar to what is known from
   pulse code modulation (PCM) with a packet loss concealment (PLC),
   such as ITU-T G711 standard [10], which operates at a fixed bit rate
   of 64 kbit/s. At the same time, the described algorithm enables
   fixed bit rate coding with a quality-versus-bit rate tradeoff close
   to what is known from code-excited linear prediction (CELP).



   Duric, Andersen                                            [Page 2]


   INTERNET DRAFT RTP Payload format for iLBC Speech        March 2003

3. RTP PAYLOAD FORMAT

   The iLBC codec uses 20 or 30 ms frames and a sampling rate clock of
   8 kHz, so the RTP timestamp MUST be in units of 1/8000 of a second.
   The RTP payload for iLBC has the format shown in the figure bellow.
   No addition header specific to this payload format is required.

   This format is intended for the situations where the sender and the
   receiver send one or more codec data frames per packet. The RTP
   packet looks as follows:

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      RTP Header [4]                           |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   +                 one or more frames of iLBC [1]                |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The RTP header of the packetized encoded iLBC speech has the
   expected values as described in [4]. The usage of M bit should be as
   specified in the applicable RTP profile, for example, RFC 1890 [5],
   where [5] specifies that if the sender does not suppress silence
   (i.e., sends a frame on every frame interval), the M bit will always
   be zero. When more then one codec data frame is present in a single
   RTP packet, the timestamp is, as always, that of the oldest data
   frame represented in the RTP packet.

   The assignment of an RTP payload type for this new packet format is
   outside the scope of this document, and will not be specified here.
   It is expected that the RTP profile for a particular class of
   applications will assign a payload type for this encoding, or if
   that is not done, then a payload type in the dynamic range shall be
   chosen by the sender.

3.1 Bitstream definition

   The total number of bits used to describe one frame of 20 ms speech
   is 303, which fits in 38 bytes and results in a bit rate of 15.20
   kbit/s. For the case with a frame length of 30 ms speech the total
   number of bits used is 399, which fits in 50 bytes and results in a
   bit rate of 13.33 kbit/s. In the bitstream definition the bits are
   distributed into three classes according to their bit error or loss
   sensitivity. The most sensitive bits (class 1) is placed first in
   the bitstream for each frame. The less sensitive bits (class 2) is
   placed after the class 1 bits. The least sensitive bits (class 3)
   are placed at the end of the bitstream for each frame.

   Looking at the 20/30 ms frame length casees for each class: The
   class 1 bits occupy a total of 6/8 bytes (48/64 bits), the class 2
   bits occupy 8/12 bytes (64/96 bits), and the class 3 bits occupy
   24/30 bytes (191/239 bits). This distribution of the bits enable the
   use of uneven level protection (ULP). The detailed bit allocation is
   Duric, Andersen                                            [Page 3]


   INTERNET DRAFT RTP Payload format for iLBC Speech        March 2003

   shown in the table below. When a quantization index is distributed
   between more classes the more significant bits belong to the lowest
   class.

   Bitstream structure:

   ------------------------------------------------------------------+
   Parameter                         |       Bits Class <1,2,3>      |
                                     |  20 ms frame  |  30 ms frame  |
   ----------------------------------+---------------+---------------+
                            Split 1  |   6 <6,0,0>   |   6 <6,0,0>   |
                   LSF 1    Split 2  |   7 <7,0,0>   |   7 <7,0,0>   |
   LSF                      Split 3  |   7 <7,0,0>   |   7 <7,0,0>   |
                   ------------------+---------------+---------------+
                            Split 1  | NA (Not Appl.)|   6 <6,0,0>   |
                   LSF 2    Split 2  |      NA       |   7 <7,0,0>   |
                            Split 3  |      NA       |   7 <7,0,0>   |
                   ------------------+---------------+---------------+
                   Sum               |  20 <20,0,0>  |  40 <40,0,0>  |
   ----------------------------------+---------------+---------------+
   Block Class.                      |   2 <2,0,0>   |   3 <3,0,0>   |
   ----------------------------------+---------------+---------------+
   Position 22 sample segment        |   1 <1,0,0>   |   1 <1,0,0>   |
   ----------------------------------+---------------+---------------+
   Scale Factor State Coder          |   6 <6,0,0>   |   6 <6,0,0>   |
   ----------------------------------+---------------+---------------+
                   Sample 0          |   3 <0,1,2>   |   3 <0,1,2>   |
   Quantized       Sample 1          |   3 <0,1,2>   |   3 <0,1,2>   |
   Residual           :              |   :    :      |   :    :      |
   State              :              |   :    :      |   :    :      |
   Samples            :              |   :    :      |   :    :      |
                   Sample 56         |   3 <0,1,2>   |   3 <0,1,2>   |
                   Sample 57         |      NA       |   3 <0,1,2>   |
                   ------------------+---------------+---------------+
                   Sum               | 171 <0,57,114>| 174 <0,58,116>|
   ----------------------------------+---------------+---------------+
                            Stage 1  |   7 <6,0,1>   |   7 <4,2,1>   |
   CB for 22/23             Stage 2  |   7 <0,0,7>   |   7 <0,0,7>   |
   sample block             Stage 3  |   7 <0,0,7>   |   7 <0,0,7>   |
                   ------------------+---------------+---------------+
                   Sum               |  21 <6,0,15>  |  21 <4,2,15>  |
   ----------------------------------+---------------+---------------+
                            Stage 1  |   5 <2,0,3>   |   5 <1,1,3>   |
   Gain for 22/23           Stage 2  |   4 <1,1,2>   |   4 <1,1,2>   |
   sample block             Stage 3  |   3 <0,0,3>   |   3 <0,0,3>   |
                   ------------------+---------------+---------------+
                   Sum               |  12 <3,1,8>   |  12 <2,2,8>   |
   Duric, Andersen                                            [Page 4]


   INTERNET DRAFT RTP Payload format for iLBC Speech        March 2003

   ----------------------------------+---------------+---------------+
                            Stage 1  |   8 <7,0,1>   |   8 <6,1,1>   |
               sub-block 1  Stage 2  |   7 <0,0,7>   |   7 <0,0,7>   |
                            Stage 3  |   7 <0,0,7>   |   7 <0,0,7>   |
                   ------------------+---------------+---------------+
                            Stage 1  |   8 <0,0,8>   |   8 <0,7,1>   |
               sub-block 2  Stage 2  |   8 <0,0,8>   |   8 <0,0,8>   |
   Indices                  Stage 3  |   8 <0,0,8>   |   8 <0,0,8>   |
   for CB          ------------------+---------------+---------------+
   sub-blocks               Stage 1  |      NA       |   8 <0,7,1>   |
               sub-block 3  Stage 2  |      NA       |   8 <0,0,8>   |
                            Stage 3  |      NA       |   8 <0,0,8>   |
                   ------------------+---------------+---------------+
                            Stage 1  |      NA       |   8 <0,7,1>   |
               sub-block 4  Stage 2  |      NA       |   8 <0,0,8>   |
                            Stage 3  |      NA       |   8 <0,0,8>   |
                   ------------------+---------------+---------------+
                   Sum               |  46 <7,0,39>  |  94 <6,22,66> |
   ----------------------------------+---------------+---------------+
                            Stage 1  |   5 <1,2,2>   |   5 <1,2,2>   |
               sub-block 1  Stage 2  |   4 <1,1,2>   |   4 <1,2,1>   |
                            Stage 3  |   3 <0,0,3>   |   3 <0,0,3>   |
                   ------------------+---------------+---------------+
                            Stage 1  |   5 <1,1,3>   |   5 <0,2,3>   |
               sub-block 2  Stage 2  |   4 <0,2,2>   |   4 <0,2,2>   |
                            Stage 3  |   3 <0,0,3>   |   3 <0,0,3>   |
   Gains for       ------------------+---------------+---------------+
   sub-blocks               Stage 1  |      NA       |   5 <0,1,4>   |
               sub-block 3  Stage 2  |      NA       |   4 <0,1,3>   |
                            Stage 3  |      NA       |   3 <0,0,3>   |
                   ------------------+---------------+---------------+
                            Stage 1  |      NA       |   5 <0,1,4>   |
               sub-block 4  Stage 2  |      NA       |   4 <0,1,3>   |
                            Stage 3  |      NA       |   3 <0,0,3>   |
                   ------------------+---------------+---------------+
                   Sum               |  24 <3,6,15>  |  48 <2,12,34> |
   -------------------------------------------------------------------
   SUM                                 303 <48,64,191> 399 <64,96,239>

   Table 3.1 The bitstream definition for iLBC.

   When packetized into the payload the bits MUST be sorted as: All the
   class 1 bits in the order (from top and down) as they were specified
   in the table, all the class 2 bits (from top and down) and finally
   all the class 3 bits in the same sequential order.

   The last unused bit of the payload (for both 20 ms and 30 ms frame
   size) SHOULD be set to zero.



3.2 Multiple iLBC frames in a RTP packet

   More than one iLBC frame may be included in a single RTP packet by a
   sender.
   Duric, Andersen                                            [Page 5]


   INTERNET DRAFT RTP Payload format for iLBC Speech        March 2003


   It is important to observe that senders have the following
   additional restrictions:

   o SHOULD NOT include more iLBC frames in a single RTP packet than
   will fit in the MTU of the RTP transport protocol.

   o Frames MUST NOT be split between RTP packets.

   It is RECOMMENDED that the number of frames contained within an RTP
   packet is consistent with the application.  For example, in a
   telephony and other real time applications where delay is important,
   then the fewer frames per packet the lower the delay, whereas for a
   bandwidth constrained links or delay insensitive streaming messaging
   application, more then one or many frames per packet would be
   acceptable.

   Information describing the number of frames contained in an RTP
   packet is not transmitted as part of the RTP payload.  The way to
   determine the number of iLBC frames is to count the total number of
   octets within the RTP packet, and divide the octet count by the
   number of expected octets per frame (32/50 per frame).


4. IANA CONSIDERATIONS

   One new MIME sub-type as described in this section is to be
   registered.

4.1 Storage Mode

   The storage mode is used for storing speech frames (e.g. as a file
   or e-mail attachment).

   +------------------+
   | Header           |
   +------------------+
   | Speech frame 1   |
   +------------------+
   :                  :
   +------------------+
   | Speech frame n   |
   +------------------+

   The file begins with a header that includes only a magic number to
   identify that it is an iLBC file. The magic number for iLBC file
   MUST correspond to the ASCII character string "#!iLBC\n", or "0x23
   0x21 0x69 0x4C 0x42 0x43 0x0A" in hexadecimal form. After the
   header, follow the speech frames in consecutive order.


4.2 MIME registration of iLBC

   MIME media type name: audio

   Duric, Andersen                                            [Page 6]


   INTERNET DRAFT RTP Payload format for iLBC Speech        March 2003

   MIME subtype: iLBC

   Optional parameters:

   This parameter applies to RTP transfer only.

        maxptime:The maximum amount of media which can be
                 encapsulated in a payload packet, expressed
                 as time in milliseconds. The time is
                 calculated as the sum of the time the media
                 present in the packet represents. The time SHOULD be
                 a multiple of the frame size. If this parameter is
                 not present, the sender MAY encapsulate any number of
                 speech frames into one RTP packet.

   Encoding considerations:
                  This type is defined for transfer via both RTP (RFC
                  1889) and stored-file methods as described in Section
                  4.1, of RFC XXXX. Audio data is binary data, and must
                  be encoded for non-binary transport; the Base64
                  encoding is suitable for Email.

   Security considerations:
                  See Section 6 of RFC XXXX.

   Public specification:
                  Please refer to RFC XXXX [1].

   Additional information:
                  The following applies to stored-file transfer
                  methods:

                  Magic number:
                  ASCII character string "#!iLBC\n"
                  (or 0x23 0x21 0x69 0x4C 0x42 0x43 0x0A in
                  hexadecimal)

                  File extensions: lbc, LBC
                  Macintosh file type code: none
                  Object identifier or OID: none

   Person & email address to contact for further information:
                  alan.duric@globalipsound.com

   Intended usage: COMMON.
                  It is expected that many VoIP applications will use
                  this type.

   Author/Change controller:
                  alan.duric@globalipsound.com
                  IETF Audio/Video transport working group
   Duric, Andersen                                            [Page 7]


   INTERNET DRAFT RTP Payload format for iLBC Speech        March 2003


5. MAPPING TO SDP PARAMETERS

   Parameters are mapped to SDP [7] in a standard way. When conveying
   information by SDP, the encoding name SHALL be "iLBC" (the same as
   the MIME subtype). An example of the media representation in SDP for
   describing iLBC might be:

     m=audio 49120 RTP/AVP 97
     a=rtpmap:97 iLBC/8000

   If 20 ms frame size mode is used, remote iLBC encoder SHALL receive
   ômodeö parameter in the SDP "a=fmtp" attribute by copying them
   directly from the MIME media type string as a semicolon separated
   with parameter=value, where parameter is ômodeö, and values can be
   0, 20 or 30 (where 0 stands for support of both frame size modes; 20
   stands for preffered 20 ms frame size, etc.). An example of the
   media representation in SDP for describing iLBC when 20 ms frame
   size mode is used might be:

     m=audio 49120 RTP/AVP 97
     a=rtpmap:97 iLBC/8000
     a=fmtp:97 mode=20


6. SECURITY CONSIDERATIONS

   RTP packets using the payload format defined in this specification
   are subject to the general security considerations discussed in [4]
   and any appropriate profile (e.g. [5]).

   As this format transports encoded speech, the main security issues
   include confidentiality and authentication of the speech itself. The
   payload format itself does not have any built-in security
   mechanisms. Confidentiality of the media streams is achieved by
   encryption, therefore external mechanisms, such as SRTP [9], MAY be
   used for that purpose. The data compression used with this payload
   format is applied end-to-end; hence encryption may be performed
   after compression with no conflict between the two operations.

   A potential denial-of-service threat exists for data encoding using
   compression techniques that have non-uniform receiver-end
   computational load. The attacker can inject pathological datagrams
   into the stream which are complex to decode and cause the receiver
   to become overloaded. However, the encodings covered in this
   document do not exhibit any significant non-uniformity.


7. REFERENCES

   [1] Andersen, et al., Internet Low Bit Rate Codec (iLBC)", draft-
      ietf-avt-lbc-codec-01.txt, March 2003.

   [2] S. Bradner, "Key words for use in RFCs to Indicate requirement
      Levels", BCP 14, RFC 2119, March 1997.
   Duric, Andersen                                            [Page 8]


   INTERNET DRAFT RTP Payload format for iLBC Speech        March 2003


   [3] S. Bradner, "The Internet Standards Process -- Revision 3", BCP
      9, RFC 2026, October 1996

   [4] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP:
      A Transport Protocol for Real-Time Applications", IETF RFC 1889,
      January 1996.

   [5] H. Schulzrinne, "RTP Profile for Audio and Video Conferences
      with Minimal Control" IETF RFC 1890, January 1996.

   [6] Handley & Perkins, "Guidelines for Writers of RTP Payload
      Formats", BCP 36, RFC 2736, December 1999.

   [7] M. Handley and V. Jacobson, "SDP: Session Description Protocol",
      IETF RFC 2327, April 1998

   [8] N. Freed and N. Borenstein, "Multipurpose Internet Mail
      Extensions (MIME) Part One: Format of Internet Message Bodies",
      IETF RFC 2045, November 1996.

   [9] Baugher, et al., "The Secure Real Time Transport Protocol", IETF
      Draft, June 2002.

   [10] ITU-T Recommendation G.711, available online from the ITU
      bookstore at http://www.itu.int.

   [11] J. Sjoberg, M. Westerlund, A. Lakaniemi, Q. Xie, ôRTP payload
      format and file storage format for the Adaptive Multi-Rate (AMR)
      and Adaptive Multi-Rate Wideband (AMR-WB) audio codecsö, IETF RFC
      3267, June 2002.

8. ACKNOWLEDGEMENTS

   The authors wish to thank Henry Sinnreich and Patrik Faltstrom for
   great support of the iLBC initiative and for their valuable feedback
   and comments.


9. AUTHOR'S ADDRESSES

   Alan Duric
   Global IP Sound AB
   Rosenlundsgatan 54
   Stockholm, S-11863
   Sweden
   Phone:  +46 8 54553040
   Email:  alan.duric@globalipsound.com

   Soren Vang Andersen
   Department of Communication Technology
   Aalborg University
   Fredrik Bajers Vej 7A
   9200 Aalborg
   Denmark
   Duric, Andersen                                            [Page 9]


   INTERNET DRAFT RTP Payload format for iLBC Speech        March 2003

   Phone:  ++45 9 6358627
   Email:  sva@kom.auc.dk

   Duric, Andersen                                           [Page 10]