Internet Engineering Task Force                                B. Foster
Internet Draft                                                  R. Kumar
Document: <draft-foster-mmusic-vbdformat-01.txt>            F. Andreasen
Category: Informational                                    Cisco Systems
Expires: September 1, 2002                                 March 1, 2002


                      Voice-Band Data Media Format

Status of this Document

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

1. Abstract

Voice-band data (fax and modem) traffic often requires different processing
from voice, so the ability to specify a distinct payload type when passing this
type of traffic is important. This document defines a MIME type, audio/vbd, for
voice-band data media and a specific "fmtp" parameter for specifying the
underlying encoding.


2. Conventions used in this document


The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this document are to be
interpreted as described in RFC-2119.


3. Introduction

There are a number of ways of passing modem and fax traffic over an
IP network. One approach is to simply pass it in-band. Other
approaches involve terminating the fax/modem at each end and relaying
the data in some fashion. Either approach may be valid depending on

Foster, et al                Informational                          1
                      Voice-Band Data Media Format             March 2002


the processing capability of the gateway, characteristics of the
network etc.

This document is specifically concerned with the approach of passing
modem and fax traffic in-band. Because voice-band data has
distinctly different characteristics from voice, it is often
important to be able to distinguish this difference by indicating an
associated media format. This allows the receiver of the media to
process the packets differently.


4. Rationale for distinct Voiceband Data payload types

The rationale for distinguishing between a payload type associated with voice
and a payload type associated with voiceband data is twofold:

* At the receiver, voiceband data traffic is found to work best with fixed-size
jitter buffers, while adaptive jitter buffers are optimal for voice.

* Packet loss concealment algorithms at the receiver are suitable for voice,
but not for voiceband data.

To discriminate between voice and voiceband data and to allow different
processing at a receiver, separate payload types must be used even if the
underlying encoding is the same, e.g. PCMU for both voice and voiceband data.
To this end, a new RTP audio encoding name, to be registered as the MIME type
audio/vbd, is defined. For a session, this encoding name can be dynamically
mapped into one or more payload types, as is true for any encoding. Each
payload type associated with the encoding "vbd" can have a separate format,
specified through an 'fmtp' attribute, indicating a different underlying base
encoding (e.g. PCMU, PCMA, G726-32, G726-40).

This document proposes the use of dynamic payload types for voiceband data that
are distinct from the payload types, static or dynamic, for voice even if the
underlying encoding algorithms are the same. This is to enable different,
voiceband data-specific receiver processing. For a given encoding algorithm, a
receiver may include both the voice and the voiceband data payload types in the
media (m=) line in SDP. If it intends to support the encoding algorithm for
voiceband data but not for voice, it should not include the applicable voice
payload type in the 'm=' line.

5. Proposed representation in SDP

The encoding name "vbd" may be dynamically associated with one or more RTP
payload types. Using the "fmtp" SDP attribute, each "vbd" payload type is
associated with an underlying encoding. Thus,

     a=rtpmap:<vbd dynamic payload type> vbd/<clock rate>
     a=fmtp:<vbd dynamic payload type> <non-vbd audio payload type>

indicates a dynamic payload type to be associated with the codec "vbd". The fmtp
attribute indicates the underlying audio encoding associated with the "vbd"
codec. The audio encoding used by the "vbd" codec may be represented by either a
static or dynamic payload type. Note that it is possible to specify multiple
"vbd" payload types, each with a different "fmtp" value and, therefore, a
different audio encoding.
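
As an illustration only, the attribute lines above could be generated as in the
following Python sketch. The function name vbd_attribute_lines and its arguments
are hypothetical and not part of this document; the 96-127 check assumes the
dynamic payload type range of the RTP A/V profile (RFC 1890).

```python
def vbd_attribute_lines(mappings, clock_rate=8000):
    """Return a=rtpmap / a=fmtp lines for (vbd_pt, underlying_pt) pairs.

    Hypothetical helper: each pair binds one dynamic "vbd" payload type to
    the payload type (static or dynamic) of its underlying audio encoding.
    """
    lines = []
    for vbd_pt, underlying_pt in mappings:
        # "vbd" has no static assignment, so it must use a dynamic type.
        if not 96 <= vbd_pt <= 127:
            raise ValueError(f"{vbd_pt} is not a dynamic payload type (96-127)")
        lines.append(f"a=rtpmap:{vbd_pt} vbd/{clock_rate}")
        lines.append(f"a=fmtp:{vbd_pt} {underlying_pt}")
    return lines


if __name__ == "__main__":
    # Two "vbd" payload types: 98 over PCMU (0) and 99 over PCMA (8).
    print("\n".join(vbd_attribute_lines([(98, 0), (99, 8)])))
```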

An example media description in SDP might be:

     m=audio 3456 RTP/AVP 15 98 99
     a=rtpmap:98 vbd/8000
     a=fmtp:98 0
     a=rtpmap:99 vbd/8000
     a=fmtp:99 8

This specifies dynamic RTP payload types 98 and 99 as being "vbd" codecs.
Further, it specifies that the vbd codec associated with payload type 98 uses an
underlying PCMU codec format (indicated by the static payload type 0). It also
specifies that payload type 99 has an underlying format of PCMA (indicated by
the static payload type 8).

Note that the payload types 0 (PCMU) and 8 (PCMA) do not appear in the media
line in this case. The only permitted voice encoding is G728 (payload type 15).

The audio encoding underlying the voiceband data might also be represented by a
dynamic payload type, as in the following segment:

     m=audio 3456 RTP/AVP 15 98
     a=rtpmap:96 G726-40/8000
     a=rtpmap:98 vbd/8000
     a=fmtp:98 96

Again, the dynamic payload type of 96 does not appear in the media line in this
case. However, it is used to bind G726-40 as the underlying encoding algorithm
for the payload type of 98, used in voiceband data packets.
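
A receiver has to resolve each "vbd" payload type to its underlying encoding,
whether that encoding is named by a static or a dynamic payload type. The
following Python sketch shows one way to do so. It is not a full SDP parser:
the function name resolve_vbd is hypothetical, only the static payload types
used in this document are listed, and a single media description is assumed.

```python
# Static payload types from the RTP A/V profile (RFC 1890). Only the ones
# used in the examples of this document are listed (an assumption made to
# keep the sketch short).
STATIC_ENCODINGS = {0: "PCMU", 8: "PCMA", 15: "G728"}


def resolve_vbd(sdp_text):
    """Map each "vbd" payload type to the name of its underlying encoding."""
    rtpmap = {}  # payload type -> encoding name
    fmtp = {}    # payload type -> raw format parameter string
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            rtpmap[int(pt)] = encoding.split("/")[0]
        elif line.startswith("a=fmtp:"):
            pt, params = line[len("a=fmtp:"):].split(None, 1)
            fmtp[int(pt)] = params.strip()

    resolved = {}
    for pt, name in rtpmap.items():
        if name.lower() != "vbd":
            continue
        if pt not in fmtp:
            raise ValueError(f"vbd payload type {pt} has no fmtp line")
        base_pt = int(fmtp[pt])
        # A dynamic binding (rtpmap) takes precedence over the static table.
        resolved[pt] = rtpmap.get(base_pt) or STATIC_ENCODINGS.get(
            base_pt, f"unknown({base_pt})")
    return resolved


EXAMPLE = """\
m=audio 3456 RTP/AVP 15 98
a=rtpmap:96 G726-40/8000
a=rtpmap:98 vbd/8000
a=fmtp:98 96
"""

if __name__ == "__main__":
    print(resolve_vbd(EXAMPLE))  # {98: 'G726-40'}
```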

When both voice and voiceband data payload types are distinctly earmarked for a
session at session establishment, a transmitter may switch from a voice payload
type (15 in the example above) to a voiceband data payload type (98 in the
example above) when it detects an appropriate event such as an ANS or ANSAM as
defined in V.25 [1] and V.8 [2] respectively. When the receiving gateway or
endpoint sees a voiceband data payload type (98 in the example above), it
recognizes this as a voiceband data codec (with G726-40 encoding) and adjusts
the jitter buffer accordingly.
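
The receiver-side decision described above can be sketched in Python as
follows. The names ReceiverProfile, profile_for and rtp_payload_type are
hypothetical, and the two profiles only encode the jitter buffer and packet
loss concealment distinctions made in Section 4; an actual gateway would have
many more settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceiverProfile:
    adaptive_jitter_buffer: bool
    packet_loss_concealment: bool


# Voice: adaptive jitter buffer, concealment enabled.
VOICE = ReceiverProfile(adaptive_jitter_buffer=True,
                        packet_loss_concealment=True)
# Voiceband data: fixed-size jitter buffer, concealment disabled.
VOICEBAND_DATA = ReceiverProfile(adaptive_jitter_buffer=False,
                                 packet_loss_concealment=False)


def rtp_payload_type(packet):
    """Return the payload type: the low 7 bits of the second RTP octet."""
    if len(packet) < 12:  # the fixed RTP header is 12 octets long
        raise ValueError("packet shorter than the fixed RTP header")
    return packet[1] & 0x7F


def profile_for(payload_type, vbd_payload_types):
    """Select receiver processing from the payload type of a packet."""
    if payload_type in vbd_payload_types:
        return VOICEBAND_DATA
    return VOICE


if __name__ == "__main__":
    # Minimal RTP header with version 2 and payload type 98, no payload.
    packet = bytes([0x80, 98]) + bytes(10)
    print(profile_for(rtp_payload_type(packet), {98}))
```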

The packet format defined in RFC 2198 can be used with a voiceband data codec
for greater reliability by virtue of redundant transmission. A dynamic payload
type is defined for the encoding name "red". The encapsulated voiceband data
packets are, in this case, staggered in time (earlier and later packets combined
in an RFC 2198 composite packet). In the following example media description:

     m=audio 3456 RTP/AVP 15 98 100
     a=rtpmap:98 vbd/8000
     a=fmtp:98 0
     a=rtpmap:100 red/8000
     a=fmtp:100 98/98

a dynamic payload type of 100 is associated with RFC 2198 packets. A 'fmtp' line
indicates that these RFC 2198 packets encapsulate two voiceband data payloads,
each with payload type 98. The encapsulated packets are staggered in time
(i.e. earlier and later packets combined in an RFC 2198 composite packet).
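
For illustration, the following Python sketch builds the payload of such a
composite packet with one redundant block, using the block header layout of
RFC 2198. The function names and the 20 ms PCMU packetization in the usage
example are assumptions, not requirements of this document.

```python
import struct


def parse_red_fmtp(value):
    """Split a "red" fmtp value such as "98/98" into payload types."""
    return [int(pt) for pt in value.split("/")]


def red_payload(primary, redundant, payload_type, timestamp_offset):
    """Build an RFC 2198 payload carrying one redundant and one primary block.

    Both blocks use the same payload type, as in "a=fmtp:100 98/98".
    """
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    if not 0 <= timestamp_offset < (1 << 14):
        raise ValueError("timestamp offset must fit in 14 bits")
    if len(redundant) >= (1 << 10):
        raise ValueError("redundant block length must fit in 10 bits")
    # Redundant block header (4 octets):
    #   F=1 | block PT (7) | timestamp offset (14) | block length (10)
    header = ((1 << 31) | (payload_type << 24)
              | (timestamp_offset << 10) | len(redundant))
    # Primary block header (1 octet): F=0 | block PT (7).
    # Data blocks follow the headers in the same order: redundant, primary.
    return struct.pack("!IB", header, payload_type) + redundant + primary


if __name__ == "__main__":
    # Assumed packetization: 20 ms of PCMU at 8000 Hz, i.e. 160 octets and
    # a timestamp offset of 160 between consecutive packets.
    earlier = b"\x02" * 160
    current = b"\x01" * 160
    payload = red_payload(current, earlier, 98, 160)
    print(parse_red_fmtp("98/98"), len(payload))
```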

A "vbd" payload type is negotiated like any other codec type. For symmetric
connections that can be transitioned to a specific voiceband data payload type,
both ends must declare support for that payload type. For backward
compatibility, if this codec type ("vbd") is not bound to a connection, then
suitable voice payload types may be used for voiceband data.
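
The fallback rule can be sketched as follows in Python. This document does not
spell out how the declarations of the two ends are compared; the sketch
assumes they are matched by underlying encoding name, and the function name
select_vbd_payload and its arguments are hypothetical.

```python
def select_vbd_payload(local_vbd, remote_vbd, voice_fallback):
    """Choose the payload type for sending voiceband data to the remote end.

    local_vbd / remote_vbd: {payload type: underlying encoding name} as
    declared by each end for the "vbd" encoding (empty if none declared).
    voice_fallback: a voice payload type suitable for voiceband data.

    Returns (payload type to send, underlying encoding or None, is_vbd).
    """
    local_encodings = set(local_vbd.values())
    for pt, encoding in remote_vbd.items():
        # Assumption: a match means both ends declared "vbd" with the
        # same underlying encoding.
        if encoding in local_encodings:
            return pt, encoding, True
    # Backward compatibility: no common "vbd" payload type is bound to the
    # connection, so a suitable voice payload type is used instead.
    return voice_fallback, None, False


if __name__ == "__main__":
    print(select_vbd_payload({98: "PCMU"}, {101: "PCMU"}, 0))
    print(select_vbd_payload({98: "PCMU"}, {}, 0))
```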

6. Other Characteristics of Voiceband Data Sessions

This section is informational and is intended to elaborate on other differences
between voice and voiceband data traffic.

*    Silence suppression can be used with voice, but not with voiceband data,
     which requires a continuous carrier signal.

*    Since voiceband data has a much lower distortion tolerance, it requires an
     audio encoding algorithm in which DC removal filters are absent. Examples
     of suitable schemes are PCM (ITU G.711) and 32 kbps/40 kbps ADPCM (ITU
     G.726). By contrast, many more encoding algorithms are available for voice
     traffic. Note: this document does not intend to list all encoding
     algorithms suitable for voiceband data.

7. Proposed Registration of MIME media type audio/vbd

MIME media type name: audio

MIME subtype name: vbd

Required parameters:

rate: The RTP timestamp clock rate, which is equal to the sampling rate.  The
typical rate is 8000, but other rates may be specified.

baseAlgorithm: The underlying encoding scheme used, such as PCMU, PCMA, G726-32
or G726-40. No MIME parameters are inherited.

Optional parameters: channels, ptime, maxptime (refer to [7]).

Encoding considerations:
This type is only defined for transfer via RTP.

Security considerations: See Section 5 of [7].

Interoperability considerations: none


Published specification: The RFC that will evolve out of this document.

Applications which use this media type:
Audio and video streaming and conferencing tools.

Additional information: none

Intended usage: Modulated facsimile and modem signals that benefit from special
handling, e.g. jitter buffer adjustment at a receiver.

Additional information:

1. Magic number(s): N/A

2. File extension(s): N/A

3. Macintosh file type code: N/A

Author/Change controller:
Bill Foster, Rajesh Kumar and Flemming Andreasen
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134-1706
bfoster@cisco.com, rkumar@cisco.com, fandreas@cisco.com


8. References

  [1]     ITU-T, V.25 specification.

  [2]     ITU-T, V.8 specification.

  [3]     M. Handley, V. Jacobson, SDP: Session Description Protocol, RFC
          2327.

  [4]     H. Schulzrinne, RTP Profile for Audio and Video Conferences with
          Minimal Control, RFC 1890.

  [5]     http://www.iana.org/assignments/rtp-parameters.

  [6]     C. Perkins et al, RTP payload for redundant audio data, RFC 2198.

  [7]     The RFC that will come out of draft-ietf-avt-rtp-mime-06.txt, Casner,
          S. and Hoschka, P.




9. Authors' Addresses

  Flemming Andreasen
  Cisco Systems
  499 Thornall Street, 8th Floor
  Edison, NJ 08837
  Phone: +1 732 452 1667
  Email: fandreas@cisco.com


  Bill Foster
  Cisco Systems
  Phone: +1 250 758-9418
  Email: bfoster@cisco.com


  Rajesh Kumar
  Cisco Systems
  170 West Tasman Dr
  San Jose, CA
  Phone: +1 408 527 0811
  Email: rkumar@cisco.com


10. Full Copyright Statement

  Copyright (C) The Internet Society (2001).  All Rights Reserved.

  This document and translations of it may be copied and furnished to
  others, and derivative works that comment on or otherwise explain it
  or assist in its implementation may be prepared, copied, published
  and distributed, in whole or in part, without restriction of any
  kind, provided that the above copyright notice and this paragraph are
  included on all such copies and derivative works.  However, this
  document itself may not be modified in any way, such as by removing
  the copyright notice or references to the Internet Society or other
  Internet organizations, except as needed for the purpose of
  developing Internet standards in which case the procedures for
  copyrights defined in the Internet Standards process must be
  followed, or as required to translate it into languages other than
  English.

  The limited permissions granted above are perpetual and will not be
  revoked by the Internet Society or its successors or assigns.

  This document and the information contained herein is provided on an
  "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
  TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
  BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
  HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
  MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

  Acknowledgement

  Funding for the RFC Editor function is currently provided by the
  Internet Society.
