AVT                                                            J. Lennox
Internet-Draft                                                     Vidyo
Intended status: Standards Track                           June 17, 2009
Expires: December 19, 2009

 A Real-Time Transport Protocol (RTP) Extension Header for Audio Level

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.  This document may contain material
   from IETF Documents or IETF Contributions published or made publicly
   available before November 10, 2008.  The person(s) controlling the
   copyright in some of this material may not have granted the IETF
   Trust the right to allow modifications of such material outside the
   IETF Standards Process.  Without obtaining an adequate license from
   the person(s) controlling the copyright in such materials, this
   document may not be modified outside the IETF Standards Process, and
   derivative works of it may not be created outside the IETF Standards
   Process, except to format it for publication as an RFC or to
   translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at

   The list of Internet-Draft Shadow Directories can be accessed at

   This Internet-Draft will expire on December 19, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Lennox                  Expires December 19, 2009               [Page 1]

Internet-Draft    RTP Extension Header for Audio Level         June 2009

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document defines a mechanism by which packets of Real-Time
   Transport Protocol (RTP) audio streams can indicate, in an RTP
   extension header, the audio level of the audio data carried in the
   RTP packet.  In large conferences, this can reduce the load on an
   audio mixer or other middlebox that wants to forward only a few of
   the loudest audio streams, without requiring it to decode and measure
   every stream that is received.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Audio Levels
   4.  Signaling (Setup) Information
   5.  Security Considerations
   6.  IANA Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  Open issues
   Author's Address

1.  Introduction

   In a centralized Real-Time Transport Protocol (RTP) [RFC3550] audio
   conference, an audio mixer or forwarder receives audio streams from
   many or all of the conference participants.  It then selectively
   forwards some of them to other participants in the conference.  In
   large conferences, it is possible that such a server might be
   receiving a large number of streams, of which only a few should be
   forwarded to the other conference participants.

   In such a scenario, in order to pick the audio streams to forward, a
   centralized server needs to decode, measure audio levels, and
   possibly perform voice activity detection on audio data from a large
   number of streams.  The need for such processing limits the size or
   number of conferences such a server can support.

   As an alternative, this document defines an RTP header extension
   [RFC5285] through which senders of audio packets can indicate the
   audio level of the packets' payload, reducing the processing load for
   a server.
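   As an illustration of the load reduction, the following Python
   sketch shows how a forwarder might choose which streams to forward
   using only the level and voice-activity (V) values defined in
   Section 3, without decoding any payload.  The StreamState structure,
   the pick_loudest() function, and the ranking policy (voice first,
   then loudest) are hypothetical; this document does not mandate any
   selection algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamState:
    """Hypothetical per-stream record of the latest audio-level data."""
    ssrc: int
    level: int   # 0..127, meaning 0 to -127 dBov (smaller value = louder)
    voice: bool  # V bit from the most recent packet


def pick_loudest(streams: List[StreamState], n: int) -> List[int]:
    """Return the SSRCs of up to n streams to forward.

    Assumed policy: streams whose V bit is set rank ahead of streams
    without voice; within each group a smaller -dBov value (a louder
    signal) ranks first.  No audio decoding is performed.
    """
    ranked = sorted(streams, key=lambda s: (not s.voice, s.level))
    return [s.ssrc for s in ranked[:n]]
```

   For example, with streams (SSRC 1, level 40, voice), (SSRC 2, level
   10, no voice) and (SSRC 3, level 25, voice), selecting two streams
   yields SSRCs 3 and 1: the louder non-voice stream is passed over.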

   The header extension in this draft is different from, but
   complementary to, the one defined in [I-D.ivov-avt-slic], which
   specifies a mechanism by which audio mixers can indicate the relative
   levels of the contributing sources that made up the mixed audio.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119] and
   indicate requirement levels for compliant implementations.

3.  Audio Levels

   The audio level extension header carries both the level of the audio
   in the RTP payload of the packet it is associated with and an
   indication of whether voice activity has been detected in that
   packet.

   The form of the audio level extension block is as follows:

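          0                   1                   2
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |  ID   | len=1 |0| level       |V|  reserved   |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+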

                                 Figure 1

   The length field takes the value 1 to indicate that 2 bytes follow.
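   To show how the element in Figure 1 sits inside a complete RTP
   header extension block, the following Python sketch frames a single
   element using the one-byte-header form of [RFC5285].  The function
   name is hypothetical, and the local ID used in the example (1) is an
   assumption; in practice it is whatever value was negotiated with the
   extmap attribute.

```python
import struct


def one_byte_ext_block(ext_id: int, data: bytes) -> bytes:
    """Hypothetical helper: build an RFC 5285 one-byte-header extension
    block holding one element with the given local ID and data bytes."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte-header IDs must be in 1..14")
    if not 1 <= len(data) <= 16:
        raise ValueError("element data must be 1..16 bytes")
    # The 4-bit len field holds (number of data bytes - 1), so the two
    # audio-level data bytes produce len=1, as in Figure 1.
    element = bytes([(ext_id << 4) | (len(data) - 1)]) + data
    # Zero-pad the element list up to a multiple of 32 bits.
    body = element + b"\x00" * ((-len(element)) % 4)
    # 0xBEDE marks the one-byte-header form; the 16-bit length counts the
    # 32-bit words of extension data that follow this 4-byte header.
    return struct.pack("!HH", 0xBEDE, len(body) // 4) + body
```

   With ID 1, a level byte of 0x1E (-30 dBov) and a flag byte of 0x80
   (V set), the block is BE DE 00 01 11 1E 80 00; the final zero byte
   is padding to the 32-bit boundary.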

   The audio level is defined in the same manner as the noise level in
   the RTP Comfort Noise payload [RFC3389].  In that specification, the
   overall magnitude of the noise level is encoded into the first byte
   of the payload, with spectral information about the noise in
   subsequent bytes.  This specification's audio level parameter is
   defined to be identical to the comfort noise payload's noise-level
   byte.

   The magnitude of the audio level is packed into the seven least
   significant bits of the first data byte of the extension element,
   with the most significant bit unused and set to 0, as shown in
   Figure 1.  The least significant bit of the audio level magnitude is
   packed into the least significant bit of the byte.
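   The bit layout of the level byte can be sketched in Python as
   follows.  The function names are hypothetical; masking off the most
   significant bit on receipt is an assumption about tolerant receiver
   behavior, since the text only requires senders to set that bit to 0.

```python
def encode_level_byte(level: int) -> int:
    """Hypothetical helper: pack a level (0..127, i.e. 0 to -127 dBov)
    into the first data byte.  Because level < 128, the MSB is 0."""
    if not 0 <= level <= 127:
        raise ValueError("audio level must be in 0..127")
    return level


def decode_level_byte(b: int) -> int:
    """Hypothetical helper: take the 7-bit level from the low bits,
    ignoring the most significant bit (assumed receiver behavior)."""
    return b & 0x7F


def level_to_dbov(level: int) -> int:
    """A level value of L represents -L dBov."""
    return -level
```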

   The audio level is expressed in -dBov, with values from 0 to 127
   representing 0 to -127 dBov. dBov is the level, in decibels, relative
   to the overload point of the system, i.e. the maximum-amplitude
   signal that can be handled by the system without clipping.  (Note:
   Representation relative to the overload point of a system is
   particularly useful for digital implementations, since one does not
   need to know the relative calibration of the analog circuitry.)  For
   example, in the case of u-law (audio/pcmu) audio [ITU.G711.1988], the
   0 dBov reference would be a square wave with values +/- 8031.  (This
   translates to 6.18 dBm0, relative to u-law's dBm0 definition in Table
   6 of G.711.)
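   The following Python sketch shows one way a sender might derive the
   level value for a packet.  It is not normative: the text above does
   not say how the level is measured, so computing it from the RMS of
   the packet's linear samples is an assumption, as is the function
   name.  Since a square wave's RMS equals its amplitude, the default
   overload value of 8031 matches the u-law 0 dBov reference square
   wave of +/- 8031 described above, for samples decoded from u-law.

```python
import math
from typing import Sequence


def audio_level(samples: Sequence[int], overload: float = 8031.0) -> int:
    """Hypothetical helper: return a packet's level in -dBov (0..127).

    Assumes the level is the RMS of the linear samples relative to the
    RMS of the 0 dBov reference signal, given by `overload`.
    """
    if not samples:
        return 127  # no audio: report the quietest representable level
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return 127  # digital silence
    dbov = 20.0 * math.log10(rms / overload)  # <= 0 below the overload point
    # Round to whole dB and clamp to the 7-bit range of the level field.
    return min(127, max(0, round(-dbov)))
```

   For instance, a 160-sample packet holding a square wave of +/- 8031
   yields 0 (0 dBov), one of +/- 803 yields 20 (about -20 dBov), and an
   all-zero packet yields 127.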

   The second data byte is a flag byte, whose bits carry additional
   information about the audio payload of the media packet.  At this
   time only a single bit is defined.  The V bit indicates whether the
   encoder believes the audio packet contains voice activity (1) or
   does not (0).  The voice activity detection algorithm is unspecified
   and left implementation-specific.

   The other bits of the flag byte are reserved.  They SHOULD be set to
   zero by senders and ignored by receivers.
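   The flag-byte rules (V in the most significant bit, reserved bits
   zeroed by senders and ignored by receivers) can be sketched in
   Python as follows; the names are hypothetical.

```python
VOICE_BIT = 0x80  # V is the most significant bit of the flag byte (Figure 1)


def encode_flag_byte(voice: bool) -> int:
    """Hypothetical helper: set V if voice activity was detected and
    leave all seven reserved bits zero, as senders should."""
    return VOICE_BIT if voice else 0x00


def decode_voice(flag_byte: int) -> bool:
    """Hypothetical helper: read V while ignoring the reserved bits."""
    return bool(flag_byte & VOICE_BIT)
```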

   When this extension header is used with RTP data sent using the RTP
   Payload for Redundant Audio Data [RFC2198], the header's data
   describes the contents of the primary encoding.
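   Because the level describes the primary encoding, a sender or
   auditing middlebox working from a redundant payload has to locate
   the primary block.  The following Python sketch does so by walking
   the [RFC2198] block headers: 4-byte headers (F=1) for redundant
   blocks, whose low 10 bits give the block length, followed by a
   1-byte header (F=0) for the primary block, whose data comes last.
   The function name is hypothetical and error handling is minimal.

```python
from typing import Tuple


def red_primary(payload: bytes) -> Tuple[int, bytes]:
    """Hypothetical helper: return (payload type, data) of the primary
    encoding in an RFC 2198 redundant audio payload."""
    pos = 0
    redundant_len = 0
    while True:
        if pos >= len(payload):
            raise ValueError("truncated RED header")
        first = payload[pos]
        if (first & 0x80) == 0:
            # F=0: one-byte header of the primary (final) block.
            primary_pt = first & 0x7F
            pos += 1
            break
        if pos + 4 > len(payload):
            raise ValueError("truncated RED header")
        # F=1: 4-byte header; PT(7) | timestamp offset(14) | length(10).
        word = int.from_bytes(payload[pos:pos + 4], "big")
        redundant_len += word & 0x3FF
        pos += 4
    # Redundant block data precedes the primary block's data.
    start = pos + redundant_len
    if start > len(payload):
        raise ValueError("block lengths exceed payload")
    return primary_pt, payload[start:]
```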

4.  Signaling (Setup) Information

   The URI for declaring this header extension in an extmap attribute is
   "urn:ietf:params:rtp-hdrext:audio-level".  No additional setup
   information is needed for this extension; its extmap attribute
   carries no extensionattributes [RFC5285].
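   As a sketch of how a receiver might learn the local ID to look for,
   the following Python function scans SDP lines for an extmap
   attribute of the form a=extmap:<value>["/"<direction>] <URI> from
   [RFC5285] carrying this document's URI.  The function name is
   hypothetical, and the parsing is deliberately simplified rather than
   a complete SDP parser.

```python
from typing import Iterable, Optional

AUDIO_LEVEL_URI = "urn:ietf:params:rtp-hdrext:audio-level"


def audio_level_ext_id(sdp_lines: Iterable[str]) -> Optional[int]:
    """Hypothetical helper: return the extension ID bound to the
    audio-level URI, or None if the extension was not negotiated."""
    for line in sdp_lines:
        line = line.strip()
        if not line.startswith("a=extmap:"):
            continue
        fields = line[len("a=extmap:"):].split()
        if len(fields) < 2:
            continue
        value = fields[0].split("/", 1)[0]  # drop an optional direction
        if fields[1] == AUDIO_LEVEL_URI and value.isdigit():
            return int(value)
    return None
```

   For example, given the lines "m=audio 49170 RTP/AVP 0" and
   "a=extmap:1 urn:ietf:params:rtp-hdrext:audio-level", the function
   returns 1.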

5.  Security Considerations

   A malicious endpoint could set the values in this extension header
   falsely, so as to claim that audio or voice is present when it is
   not, or absent when it is present.  It is not clear what could be
   gained by falsely claiming that audio is not present, but an
   endpoint falsely claiming that audio is present could mount a
   denial-of-service attack on an audio conference, causing its own
   silence to be forwarded in place of other conference members' audio.
   Thus, a device relying on audio level data from untrusted endpoints
   SHOULD periodically audit the level information transmitted, taking
   appropriate corrective action if endpoints appear to be sending
   incorrect data.
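   One possible shape for such an audit is sketched below in Python.
   The LevelAuditor class, its parameters, and the policy of decoding
   every Nth packet and counting large mismatches are all hypothetical;
   the caller supplies the measure() function that decodes a payload
   and returns its measured level in -dBov.

```python
from typing import Callable


class LevelAuditor:
    """Hypothetical spot-checker for the levels one endpoint claims."""

    def __init__(self, measure: Callable[[bytes], int], interval: int = 50,
                 tolerance: int = 10, max_bad: int = 3):
        self.measure = measure      # decodes a payload -> level in -dBov
        self.interval = interval    # audit one packet out of every `interval`
        self.tolerance = tolerance  # allowed |claimed - measured|, in dB
        self.max_bad = max_bad      # mismatches tolerated before flagging
        self.count = 0
        self.bad = 0

    def observe(self, claimed_level: int, payload: bytes) -> bool:
        """Record one packet; return True once the endpoint has exceeded
        the tolerated number of mismatches."""
        self.count += 1
        if self.count % self.interval == 0:
            measured = self.measure(payload)
            if abs(claimed_level - measured) > self.tolerance:
                self.bad += 1
        return self.bad > self.max_bad
```

   Once an endpoint is flagged, one plausible corrective action is to
   stop trusting its extension header and measure its audio directly.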

   In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP
   extension headers are authenticated but not encrypted.  When this
   extension header is used, audio levels are therefore visible on a
   packet-by-packet basis to an attacker passively observing the audio
   stream.  As discussed in [I-D.perkins-avt-srtp-vbr-audio], such an
   attacker can infer a great deal of information about the
   conversation, often with phoneme-level resolution.  In scenarios
   where this is a concern, additional mechanisms SHOULD be used to
   protect the confidentiality of the extension header.

6.  IANA Considerations

   This document registers a new extension URI in the RTP Compact
   Header Extensions subregistry of the Real-Time Transport Protocol
   (RTP) Parameters registry, according to the following data:

   Extension URI:  urn:ietf:params:rtp-hdrext:audio-level
   Description:  Audio Level
   Contact:  jonathan@vidyo.com
   Reference:  RFC XXXX

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
              September 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
              Header Extensions", RFC 5285, July 2008.

7.2.  Informative References

   [I-D.ivov-avt-slic]
              Ivov, E. and E. Marocco, "Delivering Conference
              Participant Sound Level Indicators in RTP Streams",
              draft-ivov-avt-slic-00 (work in progress), June 2009.

   [I-D.perkins-avt-srtp-vbr-audio]
              Perkins, C., "Guidelines for the use of Variable Bit Rate
              Audio with Secure RTP",
              draft-perkins-avt-srtp-vbr-audio-00 (work in progress),
              March 2009.

   [ITU.G711.1988]
              International Telecommunication Union, "Pulse code
              modulation (PCM) of voice frequencies", ITU-T
              Recommendation G.711, November 1988.

   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
              Comfort Noise (CN)", RFC 3389, September 2002.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

Appendix A.  Open issues

   o  Should this draft be merged with [I-D.ivov-avt-slic]?
   o  Would it be useful to add a fractional part to the audio level,
      e.g., to describe the audio level in an 8+8 fixed-point format?
      Because RTP header extension blocks are padded to a multiple of
      32 bits and this element occupies only 3 bytes (one ID/length
      byte and two data bytes), a third data byte is essentially "free"
      if no other RTP extension headers are in use.
   o  Are any other bits useful in the flag byte?
   o  Is there any compelling use case for providing the audio level
      without voice detection information, and if so, should the two
      pieces of information be separated?

Author's Address

   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Sixth Floor
   Hackensack, NJ  07601

   Email: jonathan@vidyo.com
