datatracker.ietf.org
Sign in
Version 5.6.2.p3, 2014-07-31
Report a bug

Requirements for Distributed Control of Automatic Speech Recognition (ASR), Speaker Identification/Speaker Verification (SI/SV), and Text-to-Speech (TTS) Resources
RFC 4313

Document type: RFC - Informational (December 2005; No errata)
Document stream: IETF
Last updated: 2013-03-02
Other versions: plain text, pdf, html

IETF State: (None)
Consensus: Unknown
Document shepherd: No shepherd assigned

IESG State: RFC 4313 (Informational)
Responsible AD: Jon Peterson
Send notices to: <oran@cisco.com>, <eburger@snowshore.com>

Network Working Group                                            D. Oran
Request for Comments: 4313                           Cisco Systems, Inc.
Category: Informational                                    December 2005

                Requirements for Distributed Control of
                  Automatic Speech Recognition (ASR),
       Speaker Identification/Speaker Verification (SI/SV), and
                     Text-to-Speech (TTS) Resources

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This document outlines the needs and requirements for a protocol to
   control distributed speech processing of audio streams.  By speech
   processing, this document specifically means automatic speech
   recognition (ASR), speaker recognition -- which includes both speaker
   identification (SI) and speaker verification (SV) -- and
   text-to-speech (TTS).  Other IETF protocols, such as SIP and Real
   Time Streaming Protocol (RTSP), address rendezvous and control for
   generalized media streams.  However, speech processing presents
   additional requirements that none of the extant IETF protocols
   address.

Table of Contents

   1. Introduction ....................................................3
      1.1. Document Conventions .......................................3
   2. SPEECHSC Framework ..............................................4
      2.1. TTS Example ................................................5
      2.2. Automatic Speech Recognition Example .......................6
      2.3. Speaker Identification example .............................6
   3. General Requirements ............................................7
      3.1. Reuse Existing Protocols ...................................7
      3.2. Maintain Existing Protocol Integrity .......................7
      3.3. Avoid Duplicating Existing Protocols .......................7
      3.4. Efficiency .................................................8
      3.5. Invocation of Services .....................................8
      3.6. Location and Load Balancing ................................8

Oran                         Informational                      [Page 1]
RFC 4313          Speech Services Control Requirements     December 2005

      3.7. Multiple Services ..........................................8
      3.8. Multiple Media Sessions ....................................8
      3.9. Users with Disabilities ....................................9
      3.10. Identification of Process That Produced Media or
            Control Output ............................................9
   4. TTS Requirements ................................................9
      4.1. Requesting Text Playback ...................................9
      4.2. Text Formats ...............................................9
           4.2.1. Plain Text ..........................................9
           4.2.2. SSML ................................................9
           4.2.3. Text in Control Channel ............................10
           4.2.4. Document Type Indication ...........................10
      4.3. Control Channel ...........................................10
      4.4. Media Origination/Termination by Control Elements .........10
      4.5. Playback Controls .........................................10
      4.6. Session Parameters ........................................11
      4.7. Speech Markers ............................................11
   5. ASR Requirements ...............................................11
      5.1. Requesting Automatic Speech Recognition ...................11
      5.2. XML .......................................................11
      5.3. Grammar Requirements ......................................12
           5.3.1. Grammar Specification ..............................12
           5.3.2. Explicit Indication of Grammar Format ..............12
           5.3.3. Grammar Sharing ....................................12
      5.4. Session Parameters ........................................12
      5.5. Input Capture .............................................12
   6. Speaker Identification and Verification Requirements ...........13
      6.1. Requesting SI/SV ..........................................13
      6.2. Identifiers for SI/SV .....................................13
      6.3. State for Multiple Utterances .............................13
      6.4. Input Capture .............................................13
      6.5. SI/SV Functional Extensibility ............................13
   7. Duplexing and Parallel Operation Requirements ..................13
      7.1. Full Duplex Operation .....................................14

[include full document text]