RTP Payload Format for European Telecommunications Standards Institute (ETSI) European Standard ES 201 108 Distributed Speech Recognition Encoding
RFC 3557

Document Type RFC - Proposed Standard (July 2003; No errata)
Last updated 2013-03-02
Stream IETF
Stream WG state (None)
Document shepherd No shepherd assigned
IESG IESG state RFC 3557 (Proposed Standard)
Consensus Boilerplate Unknown
Responsible AD Allison Mankin
IESG note A well-constructed payload with just the need of a few grammar edits.  SOB has a Discuss to check if there are subtle possible interactions with speechsc. Dave Oran promises any comments quickly.  Otherwise the payload was ok with the IESG.
Send notices to <csp@csperkins.org>, <magnus.westerlund@ericsson.com>
Network Working Group                                        Q. Xie, Ed.
Request for Comments: 3557                                Motorola, Inc.
Category: Standards Track                                      July 2003

                         RTP Payload Format for
European Telecommunications Standards Institute (ETSI) European Standard
           ES 201 108 Distributed Speech Recognition Encoding

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

Abstract

   This document specifies an RTP payload format for encapsulating
   European Telecommunications Standards Institute (ETSI) European
   Standard (ES) 201 108 front-end signal processing feature streams for
   distributed speech recognition (DSR) systems.

Xie                         Standards Track                     [Page 1]
RFC 3557         RTP Payload Format for DSR ES 201 108         July 2003

Table of Contents

   1.  Conventions and Acronyms . . . . . . . . . . . . . . . . . . .  2
   2.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  2
       2.1.  ETSI ES 201 108 DSR Front-end Codec. . . . . . . . . . .  3
       2.2.  Typical Scenarios for Using DSR Payload Format . . . . .  4
   3.  ES 201 108 DSR RTP Payload Format. . . . . . . . . . . . . . .  5
       3.1.  Consideration on Number of FPs in Each RTP Packet. . . .  6
       3.2.  Support for Discontinuous Transmission . . . . . . . . .  6
   4.  Frame Pair Formats . . . . . . . . . . . . . . . . . . . . . .  7
       4.1.  Format of Speech and Non-speech FPs. . . . . . . . . . .  7
       4.2.  Format of Null FP. . . . . . . . . . . . . . . . . . . .  8
       4.3.  RTP header usage . . . . . . . . . . . . . . . . . . . .  8
   5.  IANA Considerations. . . . . . . . . . . . . . . . . . . . . .  9
       5.1.  Mapping MIME Parameters into SDP . . . . . . . . . . . . 10
   6.  Security Considerations. . . . . . . . . . . . . . . . . . . . 11
   7.  Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 11
   8.  Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . 11
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
       9.1.  Normative References . . . . . . . . . . . . . . . . . . 11
       9.2.  Informative References . . . . . . . . . . . . . . . . . 12
   10. IPR Notices. . . . . . . . . . . . . . . . . . . . . . . . . . 12
   11. Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 13
   12. Editor's Address . . . . . . . . . . . . . . . . . . . . . . . 14
   13. Full Copyright Statement . . . . . . . . . . . . . . . . . . . 15

1.  Conventions and Acronyms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The following acronyms are used in this document:

   DSR  - Distributed Speech Recognition

   ETSI - the European Telecommunications Standards Institute

   FP   - Frame Pair

   DTX  - Discontinuous Transmission

2.  Introduction

   Driven by advances in speech recognition technology, voice
   interfaces to services (such as airline information systems and
   unified messaging) are becoming more prevalent.  In parallel, the
   popularity of mobile devices has also increased dramatically.

   However, the voice codecs typically employed in mobile devices were
   designed to optimize audible voice quality and not speech recognition
   accuracy, and using these codecs with speech recognizers can result
   in poor recognition performance.  For systems that can be accessed
   from heterogeneous networks using multiple speech codecs, recognition
   system designers face the further challenge of accommodating the
   differing characteristics of these codecs robustly.  Channel
   errors and lost data packets in these networks result in further
   degradation of the speech signal.

   In traditional systems as described above, the entire speech
   recognizer lies on the server.  It is forced to use incoming speech
   in whatever condition it arrives after the network decodes the
   vocoded speech.  To address this problem, we use a distributed speech
   recognition (DSR) architecture.  In such a system, the remote device
   acts as a thin client, also known as the front-end, in communication
   with a speech recognition server, also called a speech engine.  The
   remote device processes the speech, compresses the data, and adds