RTP Payload Format for European Telecommunications Standards Institute (ETSI) European Standard ES 201 108 Distributed Speech Recognition Encoding
RFC 3557
Document | Type | RFC - Proposed Standard (July 2003; No errata). Was draft-ietf-avt-dsr (avt WG) | |
---|---|---|---|
Author | Qiaobing Xie | ||
Last updated | 2013-03-02 | ||
Stream | Internet Engineering Task Force (IETF) | ||
Stream | WG state | (None) | |
Document shepherd | No shepherd assigned | ||
IESG | IESG state | RFC 3557 (Proposed Standard) | |
Action Holders | (None) | ||
Consensus Boilerplate | Unknown | ||
Telechat date | |||
Responsible AD | Allison Mankin | ||
IESG note | A well-constructed payload with just the need of a few grammar edits. SOB has a Discuss to check if there are subtle possible interactions with speechsc. Dave Oran promises any comments quickly. Otherwise the payload was ok with the IESG. | ||
Send notices to | <csp@csperkins.org>, <magnus.westerlund@ericsson.com> |
Network Working Group                                        Q. Xie, Ed.
Request for Comments: 3557                                Motorola, Inc.
Category: Standards Track                                      July 2003


   RTP Payload Format for European Telecommunications Standards
   Institute (ETSI) European Standard ES 201 108 Distributed Speech
   Recognition Encoding

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

Abstract

   This document specifies an RTP payload format for encapsulating
   European Telecommunications Standards Institute (ETSI) European
   Standard (ES) 201 108 front-end signal processing feature streams
   for distributed speech recognition (DSR) systems.

Table of Contents

   1.  Conventions and Acronyms
   2.  Introduction
       2.1.  ETSI ES 201 108 DSR Front-end Codec
       2.2.  Typical Scenarios for Using DSR Payload Format
   3.  ES 201 108 DSR RTP Payload Format
       3.1.  Consideration on Number of FPs in Each RTP Packet
       3.2.  Support for Discontinuous Transmission
   4.  Frame Pair Formats
       4.1.  Format of Speech and Non-speech FPs
       4.2.  Format of Null FP
       4.3.  RTP header usage
   5.  IANA Considerations
       5.1.  Mapping MIME Parameters into SDP
   6.  Security Considerations
   7.  Contributors
   8.  Acknowledgments
   9.  References
       9.1.  Normative References
       9.2.  Informative References
   10. IPR Notices
   11. Authors' Addresses
   12. Editor's Address
   13. Full Copyright Statement

1.  Conventions and Acronyms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The following acronyms are used in this document:

   DSR  - Distributed Speech Recognition
   ETSI - the European Telecommunications Standards Institute
   FP   - Frame Pair
   DTX  - Discontinuous Transmission

2.  Introduction

   Motivated by technology advances in the field of speech recognition,
   voice interfaces to services (such as airline information systems,
   unified messaging) are becoming more prevalent.  In parallel, the
   popularity of mobile devices has also increased dramatically.

   However, the voice codecs typically employed in mobile devices were
   designed to optimize audible voice quality and not speech
   recognition accuracy, and using these codecs with speech recognizers
   can result in poor recognition performance.  For systems that can be
   accessed from heterogeneous networks using multiple speech codecs,
   recognition system designers are further challenged to accommodate
   the characteristics of these differences in a robust manner.
   Channel errors and lost data packets in these networks result in
   further degradation of the speech signal.

   In traditional systems as described above, the entire speech
   recognizer lies on the server.  It is forced to use incoming speech
   in whatever condition it arrives after the network decodes the
   vocoded speech.

   To address this problem, we use a distributed speech recognition
   (DSR) architecture.  In such a system, the remote device acts as a
   thin client, also known as the front-end, in communication with a
   speech recognition server, also called a speech engine.  The remote
   device processes the speech, compresses the data, and adds