Internet Engineering Task Force            Peter Barany, Nortel Networks
Audio Video Transport WG                William Navarro, Nortel Networks
INTERNET-DRAFT
November 14, 2001
Expires: May 14, 2002




                 RTP payload format for EFR speech codec
                      <draft-barany-avt-efr-00.txt>

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This document is an individual submission to the IETF AVT WG.
   Comments should be directed to the authors.

Abstract

   This document specifies a Real-Time Transport Protocol (RTP) payload
   format for the Global System for Mobile communications (GSM) Enhanced
   Full Rate (EFR) speech codec. The EFR speech codec RTP payload format
   specified in this document closely resembles the EFR speech codec RTP
   payload format defined in TS 101 318 "Using GSM Speech Codecs Within
   ITU-T Recommendation H.323". It is designed specifically to optimally
   interoperate with existing (i.e., legacy) GSM circuit-switched
   transceiver equipment in the sense that it supports the following
   EFR speech codec circuit-switched domain functionality in the packet-
   switched domain: error concealment of lost speech frames and SIlence
   Descriptor (SID) frames. The EFR speech codec RTP payload format
   defined in TS 101 318 does not support this functionality. A MIME
   type registration for the EFR speech codec is also included.




Barany et al.                                                   [PAGE 1]


INTERNET-DRAFT        RTP Payload Format for EFR       November 14, 2001


Revision history

   -00: Document created for specification of an RTP payload format for
        the EFR speech codec.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119
   [ref-RFC-2119].

Table of contents

   Status of this memo.................................................1
   Abstract............................................................1
   Revision history (remove before publishing).........................2
   Conventions used in this document...................................2
   Table of contents...................................................2
   1.   Introduction...................................................2
   1.1. EFR speech codec...............................................2
   1.2. Existing RTP payload format for EFR speech codec...............3
   1.3. Legacy transceiver interoperability............................3
   1.4. EFR speech codec and AMR speech codec comparison...............4
   2.   Payload format.................................................5
   3.   IANA considerations............................................6
   4.   Security considerations........................................6
   5.   MIME type registration.........................................6
   5.1. Mapping to SDP parameters......................................7
   6.   References.....................................................7
   7.   Authors' addresses.............................................8

1.  Introduction

   This document specifies a Real-Time Transport Protocol (RTP) payload
   format for the Global System for Mobile communications (GSM) Enhanced
   Full Rate (EFR) speech codec. The EFR speech codec RTP payload format
   specified in this document closely resembles the EFR speech codec RTP
   payload format defined in [ref-EFR-RTP]. It is designed specifically
   to optimally interoperate with existing (i.e., legacy) GSM circuit-
   switched transceiver equipment in the sense that it supports the
   following EFR speech codec circuit-switched domain functionality in
   the packet-switched domain: error concealment of lost speech frames
   and SIlence Descriptor (SID) frames [ref-EFR-ERR].

1.1.  EFR speech codec

   The Enhanced Full Rate (EFR) speech codec [ref-EFR-COD] was developed
   by the European Telecommunications Standards Institute (ETSI). The
   EFR speech codec is standardized for the Global System for Mobile
   communications (GSM).

   The EFR speech codec is a single-mode speech codec with a bit rate of
   12.2 kbps (i.e., 244 speech bits per 20 ms speech frame). The
   sampling frequency is 8,000 Hz; consequently, there are 160 samples
   per 20 ms speech frame.
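
   The figures above follow directly from the frame duration. The
   following is a minimal sketch of that arithmetic; the constant names
   are illustrative and are not taken from any specification.

```python
# Frame arithmetic for the EFR speech codec, using the values quoted above.
SPEECH_BITS_PER_FRAME = 244   # speech bits per speech frame
FRAME_DURATION_MS = 20        # speech frame duration in milliseconds
SAMPLING_RATE_HZ = 8000       # sampling frequency in Hz

# Bits per millisecond is numerically equal to kbit/s.
bit_rate_kbps = SPEECH_BITS_PER_FRAME / FRAME_DURATION_MS

# Samples per frame = samples per second * frame duration in seconds.
samples_per_frame = SAMPLING_RATE_HZ * FRAME_DURATION_MS // 1000

print(bit_rate_kbps, samples_per_frame)
```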

   In the circuit-switched domain, the EFR speech codec supports the
   following functionality:

   (1) Discontinuous Transmission (DTX) operation [ref-EFR-DTX]; and

   (2) error concealment of lost speech frames and SID frames
   [ref-EFR-ERR].

   This functionality is important because it makes it possible to
   achieve optimum Mean Opinion Scores (MOS) for GSM circuit-switched
   voice service using the EFR speech codec.

1.2.  Existing RTP payload format for EFR speech codec

   An existing RTP payload format for the EFR speech codec is defined
   in [ref-EFR-RTP], which is referenced in [ref-RTP-PROF]. A MIME
   registration for this RTP payload format is defined in
   [ref-RTP-MIME].

   While this EFR speech codec RTP payload format can be used to
   interoperate with existing (i.e., legacy) GSM circuit-switched
   transceiver equipment, the functionality will be suboptimal in the
   sense that it does not support the following EFR speech codec
   circuit-switched domain functionality in the packet-switched domain:
   error concealment of lost speech frames and SID frames [ref-EFR-ERR].

   Error concealment of lost speech frames and SID frames is not
   possible because the RTP payload format does not incorporate a
   payload quality indicator.

1.3.  Legacy transceiver interoperability

   The GSM/EDGE Radio Access Network (GERAN) (where EDGE stands for
   Enhanced Data Rates for Global Evolution) is described in
   [ref-GERAN]. GERAN is an evolution of:

   (1) GSM circuit-switched voice and data radio access networks; and

   (2) General Packet Radio Service (GPRS) and Enhanced GPRS (EGPRS)
   packet-switched radio access networks.

   GERAN provides an interface between these radio access networks and
   the Universal Mobile Telecommunications System (UMTS) core network.

   Currently, service providers have a large number of legacy GSM
   circuit-switched transceivers deployed in the field. These
   transceivers implement a standardized scheme for channel
   coding/decoding, interleaving/deinterleaving, CRC,
   modulation/demodulation, etc. [ref-EFR-CH] for EFR speech codec
   based GSM circuit-switched voice service.

   GERAN defines a service known as the "optimized speech bearer"
   [ref-GERAN] that makes it possible for a service provider to reuse
   these legacy GSM circuit-switched transceivers for EFR speech codec
   based GERAN packet-switched voice service. For the optimized speech
   bearer service, network level and transport level headers (i.e.,
   IP/UDP/RTP) are not transmitted over the air interface (i.e., Uu
   interface). The receiving entity (i.e., terminal or radio network
   controller) can regenerate the headers based upon (1) information
   submitted during call setup and (2) information derived from lower
   layers (i.e., link and physical layers). Note that the regenerated
   headers may not always be semantically identical to the original
   headers.

   Figure 1 illustrates a likely EFR speech codec based GERAN optimized
   speech bearer scenario where the EFR speech codec is used as a
   packet-switched application in a GERAN system with existing (i.e.,
   legacy) GSM circuit-switched transceiver equipment.


           Uu interface                          Iu-ps interface
   +----------+     +-------------+     +------------+     +-----------+
   |          |---->|   LEGACY    |---->|   RADIO    |---->|           |
   | TERMINAL |     |    BASE     |     |  NETWORK   |     |  GATEWAY  |
   |          |<----|   STATION   |<----| CONTROLLER |<----|           |
   |          |     | TRANSCEIVER |     |            |     |           |
   +----------+     +-------------+     +------------+     +-----------+

   Figure 1. Terminal to gateway scenario.


1.4.  EFR speech codec and AMR speech codec comparison

   As mentioned in Section 1.1 of this document, the EFR speech codec is
   a single-mode speech codec with a bit rate of 12.2 kbps (i.e., 244
   speech bits per 20 ms speech frame). The sampling frequency is 8,000
   Hz; consequently, there are 160 samples per 20 ms speech frame. The
   original order of the 244 speech bits for the EFR speech codec as
   delivered from the speech encoder is defined in Table 5 in
   [ref-EFR-COD]. The 244 speech bits pass through a preliminary channel
   encoder which produces 260 bits corresponding to 244 input speech
   bits and 16 redundancy bits [ref-EFR-CH]. The 260 bits are then
   reordered in descending bit error sensitivity order according to
   Table 6 in [ref-EFR-CH]. This enables the use of Unequal Error
   Detection (UED) and Unequal Error Protection (UEP). There are a total
   of 182 Class 1 bits (protected) and 78 Class 2 bits (unprotected).
   The Class 1 bits are further divided into Class 1a bits (the 50 most
   important bits) and Class 1b bits (the 132 next most important bits).
   The Class 1a bits are protected by a cyclic code and a convolutional
   code, whereas the Class 1b bits are protected by the convolutional
   code only.

   The 12.2 kbps speech mode is one of the eight Adaptive Multi-Rate
   (AMR) speech codec speech modes [ref-AMR-COD]. The original order of
   the 244 speech bits for the 12.2 kbps speech mode of the AMR speech
   codec as delivered from the speech encoder is defined in Table 9a in
   [ref-AMR-COD]. This is the same as that defined for the EFR speech
   codec. However, for the AMR speech codec, the 244 speech bits do not
   pass through a preliminary channel encoder and 16 redundancy bits
   are not added. Also, the 244 bits are reordered in descending bit
   error sensitivity order differently than for the EFR speech codec
   (see Table 7 in [ref-EFR-CH]), with the bits being classified as
   Class A bits (the 81 most important bits), Class B bits (the 103
   next most important bits), and Class C bits (the 60 least important
   bits). See Table 2 in [ref-AMR-FRM].
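
   The bit budgets quoted in this section for the two codecs can be
   cross-checked with a short sketch. The dictionary and constant names
   below are illustrative only; the numbers are the ones given in the
   text above.

```python
# Bit budgets as quoted in Section 1.4; all names here are illustrative.
EFR_SPEECH_BITS = 244
EFR_REDUNDANCY_BITS = 16
EFR_CLASSES = {"1a": 50, "1b": 132, "2": 78}         # EFR, after reordering
AMR_122_CLASSES = {"A": 81, "B": 103, "C": 60}       # AMR 12.2 kbps mode

# EFR: the preliminary channel encoder adds 16 redundancy bits.
efr_reordered_bits = EFR_SPEECH_BITS + EFR_REDUNDANCY_BITS
efr_class_1_bits = EFR_CLASSES["1a"] + EFR_CLASSES["1b"]

# AMR 12.2: no redundancy bits are added before reordering.
amr_reordered_bits = sum(AMR_122_CLASSES.values())

print(efr_reordered_bits, efr_class_1_bits, amr_reordered_bits)
```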

   Another significant difference between the two speech codecs is with
   regard to DTX operation. The SID frames are different. The SID frame
   for the EFR speech codec is defined in [ref-EFR-CN]. The SID frame
   for the AMR speech codec is defined in [ref-AMR-CN, ref-AMR-FRM].
   Also, the AMR speech codec has a SID_FIRST and SID_UPDATE frame (in
   addition to the SID frame) while the EFR speech codec does not.

   In light of these differences, the EFR speech codec RTP payload
   format specified in this document is not based upon the AMR speech
   codec RTP payload format defined in [ref-AMR-RTP]. Instead, the EFR
   speech codec RTP payload format specified in this document closely
   resembles the EFR speech codec RTP payload format defined in
   [ref-EFR-RTP].

2.  Payload format

   As mentioned throughout this document, the EFR speech codec RTP
   payload format specified in this document closely resembles the EFR
   speech codec RTP payload format defined in [ref-EFR-RTP].

   The only difference is that the 4 bit signature (0xC, binary 1100)
   at the beginning of every buffer for the EFR speech codec RTP payload
   format defined in [ref-EFR-RTP] MUST be replaced by a 1 bit payload
   quality indicator Q followed by 3 reserved bits R. The payload
   quality indicator, if not set (Q=0), indicates that the payload is
   severely damaged and the receiver should set the Bad Frame Indicator
   (BFI) [ref-EFR-DTX] to either "Unusable frame" (for speech frames)
   or "Invalid SID frame" (for SID frames). The sender MUST set the 3
   reserved bits to zero. The receiver MUST ignore all R bits.
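
   The receiver behaviour described above can be sketched as follows.
   This is a minimal illustration, not a normative procedure. It assumes
   that Q occupies the most significant bit of the first octet, and the
   function names and the is_sid argument (how the receiver learns that
   a frame is a SID frame) are hypothetical.

```python
def parse_first_octet(octet):
    """Split the first payload octet into (Q, R bits, low 4 bits).

    Assumes Q is the most significant bit, followed by 3 R bits.
    """
    q = (octet >> 7) & 0x1        # payload quality indicator
    r = (octet >> 4) & 0x7        # reserved bits; receiver ignores them
    low_bits = octet & 0xF        # first four bits after QRRR
    return q, r, low_bits


def bad_frame_indication(q, is_sid):
    """Return the BFI value for a damaged payload, or None if Q is set.

    is_sid is a hypothetical flag telling whether the frame is a SID frame.
    """
    if q:
        return None               # payload not flagged as severely damaged
    return "Invalid SID frame" if is_sid else "Unusable frame"


q, r, low_bits = parse_first_octet(0x05)   # Q=0, RRR=000
print(bad_frame_indication(q, is_sid=False))
```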

   As is the case for the EFR speech codec RTP payload format defined in
   [ref-EFR-RTP], the bits in the buffer are numbered in the big-endian
   manner, starting from r1 (the most significant bit of the first
   octet) and finishing at r248 (the least significant bit of the last
   octet). Therefore, for the EFR speech codec RTP payload format
   specified in this document, the first octet in the buffer contains
   QRRR in its 4 MSBs as opposed to 1100 for the EFR speech codec RTP
   payload format defined in [ref-EFR-RTP].
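
   The buffer layout can be illustrated with a minimal packing sketch.
   It assumes that the caller supplies the 244 speech bits already in
   the order required by [ref-EFR-RTP] (that ordering is not reproduced
   here) and that they occupy r5 through r248. The function names are
   hypothetical.

```python
def pack_efr_payload(q, speech_bits):
    """Pack Q, three zero R bits and 244 speech bits into 31 octets."""
    if len(speech_bits) != 244:
        raise ValueError("expected 244 speech bits")
    # r1..r4 carry QRRR (R bits set to zero); r5..r248 carry speech bits.
    bits = [1 if q else 0, 0, 0, 0] + [b & 1 for b in speech_bits]
    payload = bytearray(31)
    for i, bit in enumerate(bits):
        # Big-endian numbering: r1 is the MSB of the first octet.
        payload[i // 8] |= bit << (7 - i % 8)
    return bytes(payload)


def unpack_efr_payload(payload):
    """Return (Q, speech bits) from a 31-octet buffer; R bits are ignored."""
    if len(payload) != 31:
        raise ValueError("expected a 31-octet buffer")
    bits = [(payload[i // 8] >> (7 - i % 8)) & 1 for i in range(248)]
    return bits[0], bits[4:]


speech = [i % 2 for i in range(244)]       # arbitrary test pattern
payload = pack_efr_payload(1, speech)
print(len(payload), hex(payload[0]))
```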

3.  IANA considerations

   One new MIME sub-type, as described in Section 5 of this document,
   is to be registered.

   The MIME-name for the EFR speech codec is allocated from the IETF
   tree since the EFR speech codec may be a widely used speech codec
   for GERAN packet-switched voice service using existing (i.e.,
   legacy) GSM circuit-switched transceiver equipment.

4.  Security considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [ref-RTP], and any appropriate profile. This implies
   that confidentiality of the media streams is achieved by encryption.
   Because the data encoding used with this payload format is applied
   end-to-end, encryption may be performed after encoding so there is no
   conflict between the two operations.

   A potential denial-of-service threat exists for data encodings using
   receiver side decoding.  The attacker can inject pathological
   datagrams into the stream, which are complex to decode and cause the
   receiver to be overloaded.  The decoder software should consider this
   possibility and take the necessary precautions.

   As with any IP-based protocol, in some circumstances, a receiver may
   be overloaded simply by the receipt of too many packets, either
   desired or undesired.  Network-layer authentication may be used to
   discard packets from undesired sources, but the processing cost of
   the authentication itself may be too high.

5.  MIME type registration

   Media Type name:      audio

   Media subtype name:   GERAN-EFR

   Required parameters:  none

   Optional parameters:  none

   Encoding considerations: See Section 2 of this document.

   Security considerations: See Section 4 of this document.

   Intended usage: COMMON

5.1.  Mapping to SDP parameters

   The following is an example of the usage of the EFR speech codec in
   SDP [ref-SDP] for a possible GERAN "optimized speech bearer" service
   that utilizes existing (i.e., legacy) GSM circuit-switched
   transceiver equipment:

      m=audio 49120 RTP/AVP 97
      a=rtpmap:97 GERAN-EFR/8000
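
   A receiver of such a session description needs to recover the
   payload type, encoding name and clock rate from the rtpmap
   attribute. The following is a minimal sketch of that step; the
   function name parse_rtpmap is hypothetical, and a real SDP parser
   handles many more cases than this.

```python
def parse_rtpmap(line):
    """Parse an 'a=rtpmap:' line into (payload type, name, clock rate)."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    # Form: <payload type> <encoding name>/<clock rate>[/<parameters>]
    payload_type, encoding = line[len(prefix):].split(None, 1)
    parts = encoding.strip().split("/")
    if len(parts) < 2:
        raise ValueError("missing clock rate")
    return int(payload_type), parts[0], int(parts[1])


pt, name, rate = parse_rtpmap("a=rtpmap:97 GERAN-EFR/8000")
print(pt, name, rate)
```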

6.  References

   [ref-RFC-2119] RFC 2119 "Key Words for Use in RFCs to Indicate
                  Requirement Levels".

   [ref-EFR-RTP]  TS 101 318 "Using GSM Speech Codecs Within ITU-T
                  Recommendation H.323".

   [ref-EFR-ERR]  3GPP TS 46.061 "Substitution and muting of lost frames
                  for Enhanced Full Rate (EFR) Speech Traffic Channels".

   [ref-EFR-COD]  3GPP TS 46.060 "Enhanced Full Rate (EFR) Speech
                  Transcoding".

   [ref-EFR-DTX]  3GPP TS 46.081 "Discontinuous Transmission (DTX) for
                  Enhanced Full Rate (EFR) Speech Traffic Channels".

   [ref-RTP-PROF] draft-ietf-avt-profile-new-11.txt "RTP Profile for
                  Audio and Video Conferences with Minimal Control".

   [ref-RTP-MIME] draft-ietf-avt-rtp-mime-05.txt "MIME Type Registration
                  of RTP Payload Formats".

   [ref-GERAN]    3GPP TS 43.051 "GSM/EDGE Radio Access Network (GERAN);
                  Overall Description-Stage 2".

   [ref-EFR-CH]   3GPP TS 45.003 "Channel Coding".

   [ref-AMR-COD]  3GPP TS 26.090 "AMR Speech Codec; Transcoding
                  Functions".

   [ref-AMR-FRM]  3GPP TS 26.101 "AMR Speech Codec Frame Structure".

   [ref-EFR-CN]   3GPP TS 46.062 "Comfort noise aspects for Enhanced
                  Full Rate (EFR) Speech Traffic Channels".

   [ref-AMR-CN]   3GPP TS 26.092 "AMR Speech Codec; Comfort Noise
                  Aspects".

   [ref-AMR-RTP]  draft-ietf-avt-rtp-amr-10.txt "RTP Payload Format and
                  File Storage Format for AMR and AMR-WB Audio".

   [ref-RTP]      draft-ietf-avt-rtp-new-10.txt "RTP: A Transport
                  Protocol for Real-Time Applications".

   [ref-SDP]      draft-ietf-mmusic-sdp-new-03.txt "SDP: Session
                  Description Protocol".

7.  Authors' addresses

   Peter Barany                   Tel: +1 972 685 2471
   Nortel Networks                EMail: pbarany@nortelnetworks.com
   2201 Lakeside Boulevard
   Richardson, Texas 75083
   United States of America

   William Navarro                Tel: +33 1 39 44 57 56
   Nortel Networks                EMail: navarro@nortelnetworks.com
   19, Avenue du Centre
   Montigny-le-Bretonneux - PC CT111
   78928 Yvelines Cedex 9
   France