Requirements for Distributed Control of Automatic Speech Recognition (ASR), Speaker Identification/Speaker Verification (SI/SV), and Text-to-Speech (TTS) Resources
RFC 4313

Document Type RFC - Informational (December 2005; No errata)
Last updated 2013-03-02
Stream IETF
Formats plain text pdf html
Stream WG state (None)
Consensus Unknown
Document shepherd No shepherd assigned
IESG IESG state RFC 4313 (Informational)
Telechat date
Responsible AD Jon Peterson
Send notices to <oran@cisco.com>, <eburger@snowshore.com>
Network Working Group                                            D. Oran
Request for Comments: 4313                           Cisco Systems, Inc.
Category: Informational                                    December 2005

                Requirements for Distributed Control of
                  Automatic Speech Recognition (ASR),
       Speaker Identification/Speaker Verification (SI/SV), and
                     Text-to-Speech (TTS) Resources

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This document outlines the needs and requirements for a protocol to
   control distributed speech processing of audio streams.  By speech
   processing, this document specifically means automatic speech
   recognition (ASR), speaker recognition -- which includes both speaker
   identification (SI) and speaker verification (SV) -- and
   text-to-speech (TTS).  Other IETF protocols, such as SIP and Real
   Time Streaming Protocol (RTSP), address rendezvous and control for
   generalized media streams.  However, speech processing presents
   additional requirements that none of the extant IETF protocols
   address.

Table of Contents

   1. Introduction ....................................................3
      1.1. Document Conventions .......................................3
   2. SPEECHSC Framework ..............................................4
      2.1. TTS Example ................................................5
      2.2. Automatic Speech Recognition Example .......................6
      2.3. Speaker Identification example .............................6
   3. General Requirements ............................................7
      3.1. Reuse Existing Protocols ...................................7
      3.2. Maintain Existing Protocol Integrity .......................7
      3.3. Avoid Duplicating Existing Protocols .......................7
      3.4. Efficiency .................................................8
      3.5. Invocation of Services .....................................8
      3.6. Location and Load Balancing ................................8

Oran                         Informational                      [Page 1]
RFC 4313          Speech Services Control Requirements     December 2005

      3.7. Multiple Services ..........................................8
      3.8. Multiple Media Sessions ....................................8
      3.9. Users with Disabilities ....................................9
      3.10. Identification of Process That Produced Media or
            Control Output ............................................9
   4. TTS Requirements ................................................9
      4.1. Requesting Text Playback ...................................9
      4.2. Text Formats ...............................................9
           4.2.1. Plain Text ..........................................9
           4.2.2. SSML ................................................9
           4.2.3. Text in Control Channel ............................10
           4.2.4. Document Type Indication ...........................10
      4.3. Control Channel ...........................................10
      4.4. Media Origination/Termination by Control Elements .........10
      4.5. Playback Controls .........................................10
      4.6. Session Parameters ........................................11
      4.7. Speech Markers ............................................11
   5. ASR Requirements ...............................................11
      5.1. Requesting Automatic Speech Recognition ...................11
      5.2. XML .......................................................11
      5.3. Grammar Requirements ......................................12
           5.3.1. Grammar Specification ..............................12
           5.3.2. Explicit Indication of Grammar Format ..............12
           5.3.3. Grammar Sharing ....................................12
      5.4. Session Parameters ........................................12
      5.5. Input Capture .............................................12
   6. Speaker Identification and Verification Requirements ...........13
      6.1. Requesting SI/SV ..........................................13
      6.2. Identifiers for SI/SV .....................................13
      6.3. State for Multiple Utterances .............................13
      6.4. Input Capture .............................................13
      6.5. SI/SV Functional Extensibility ............................13
   7. Duplexing and Parallel Operation Requirements ..................13
      7.1. Full Duplex Operation .....................................14
Show full document text