Audio/Video Transport (avt)                               H. Schulzrinne
Internet-Draft                                               Columbia U.
Expires: July 27, 2005                                        S. Petrack
                                                                   eDial
                                                               T. Taylor
                                                                  Nortel
                                                        January 23, 2005

   RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals
                      draft-ietf-avt-rfc2833bis-07

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 3 of RFC 3667.  By submitting this Internet-Draft, each
   author represents that any applicable patent or other IPR claims of
   which he or she is aware have been or will be disclosed, and any of
   which he or she become aware will be disclosed, in accordance with
   RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 27, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This memo describes how to carry dual-tone multifrequency (DTMF)
   signaling, other tone signals and telephony events in RTP packets.
   This memo captures and expands upon the basic framework defined in


Schulzrinne, et al.       Expires July 27, 2005                 [Page 1]


Internet-Draft         Telephony Events and Tones           January 2005

   RFC 2833, but retains only the most basic event codes.  It is
   intended that other codes will be documented separately.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
     1.1   Terminology  . . . . . . . . . . . . . . . . . . . . . . .  4
     1.2   Overview . . . . . . . . . . . . . . . . . . . . . . . . .  4
     1.3   Potential Applications . . . . . . . . . . . . . . . . . .  5
     1.4   Events, States, Tone Patterns, and Voice Encoded Tones . .  6
   2.  RTP Payload Format for Named Telephone Events  . . . . . . . .  8
     2.1   Introduction . . . . . . . . . . . . . . . . . . . . . . .  8
     2.2   Use of RTP Header Fields . . . . . . . . . . . . . . . . .  8
       2.2.1   Timestamp  . . . . . . . . . . . . . . . . . . . . . .  8
       2.2.2   Marker Bit . . . . . . . . . . . . . . . . . . . . . .  8
     2.3   Payload Format . . . . . . . . . . . . . . . . . . . . . .  8
       2.3.1   Event Field  . . . . . . . . . . . . . . . . . . . . .  9
       2.3.2   E ("End") Bit  . . . . . . . . . . . . . . . . . . . .  9
       2.3.3   R Bit  . . . . . . . . . . . . . . . . . . . . . . . .  9
       2.3.4   Volume Field . . . . . . . . . . . . . . . . . . . . .  9
       2.3.5   Duration Field . . . . . . . . . . . . . . . . . . . .  9
     2.4   Optional MIME Parameters . . . . . . . . . . . . . . . . . 10
       2.4.1   Relationship to SDP  . . . . . . . . . . . . . . . . . 10
     2.5   Procedures . . . . . . . . . . . . . . . . . . . . . . . . 11
       2.5.1   Sending Procedures . . . . . . . . . . . . . . . . . . 11
       2.5.2   Receiving Procedures . . . . . . . . . . . . . . . . . 15
     2.6   Reliability  . . . . . . . . . . . . . . . . . . . . . . . 18
   3.  Specification of Event Codes For DTMF Events . . . . . . . . . 20
     3.1   DTMF Events  . . . . . . . . . . . . . . . . . . . . . . . 20
   4.  RTP Payload Format for Telephony Tones . . . . . . . . . . . . 22
     4.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . 22
     4.2   Examples of Common Telephone Tone Signals  . . . . . . . . 22
     4.3   Use of RTP Header Fields . . . . . . . . . . . . . . . . . 24
       4.3.1   Timestamp  . . . . . . . . . . . . . . . . . . . . . . 24
       4.3.2   Marker Bit . . . . . . . . . . . . . . . . . . . . . . 24
       4.3.3   Payload Format . . . . . . . . . . . . . . . . . . . . 24
       4.3.4   Optional MIME Parameters . . . . . . . . . . . . . . . 26
     4.4   Procedures . . . . . . . . . . . . . . . . . . . . . . . . 26
       4.4.1   Sending Procedures . . . . . . . . . . . . . . . . . . 26
       4.4.2   Receiving Procedures . . . . . . . . . . . . . . . . . 27
   5.  Application Considerations . . . . . . . . . . . . . . . . . . 28
     5.1   Considerations On Selection Of Packetization Period
           For Events . . . . . . . . . . . . . . . . . . . . . . . . 28
       5.1.1   Interactions To Be Considered  . . . . . . . . . . . . 28
     5.2   Examples . . . . . . . . . . . . . . . . . . . . . . . . . 30
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 39
   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 40
     7.1   MIME Registration  . . . . . . . . . . . . . . . . . . . . 41
       7.1.1   audio/telephone-event  . . . . . . . . . . . . . . . . 41
       7.1.2   audio/tone . . . . . . . . . . . . . . . . . . . . . . 42
   8.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 44
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 45
     9.1   Normative References . . . . . . . . . . . . . . . . . . . 45
     9.2   Informative References . . . . . . . . . . . . . . . . . . 45
       Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 46
       Intellectual Property and Copyright Statements . . . . . . . . 48

1.  Introduction

1.1  Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
   and "OPTIONAL" are to be interpreted as described in RFC 2119 [1] and
   indicate requirement levels for compliant implementations.

   This document uses the following abbreviations:

   ANSam  Answer tone (amplitude modulated) [18]
   DTMF   Dual Tone Multifrequency
   IVR    Interactive Voice Response unit
   PSTN   Public Switched (circuit) Telephone Network
   RTP    Real-time Transport Protocol [5]
   SDP    Session Description Protocol [3]

1.2  Overview

   This memo defines two RTP [5] payload formats, one for carrying
   dual-tone multifrequency (DTMF) digits and other line and trunk
   signals as events (Section 2), and a second one to describe general
   multi-frequency tones in terms only of their frequency and cadence
   (Section 4).  Separate RTP payload formats for telephony tone signals
   are desirable since low-rate voice codecs cannot be guaranteed to
   reproduce these tone signals accurately enough for automatic
   recognition.  In addition, tone properties such as the phase
   reversals in the ANSam tone will not survive speech coding.  Defining
   separate payload formats also permits higher redundancy while
   maintaining a low bit rate.  Finally, some telephony events such as
   "on-hook" occur out-of-band and cannot be transmitted as tones.

   The remainder of this section provides the motivation for defining
   the payload types described in this document.  Section 2 defines the
   payload format and associated procedures for use of named events.
   Section 3 describes the events for which event codes are defined in
   this document.  Section 4 describes the payload format and associated
   procedures for tone representations.  Section 5 discusses some points
   that implementations might take into account and provides examples.
   Section 6 deals with security considerations.  Section 7 defines the
   IANA requirements for registration of event codes for named telephone
   events, establishes the initial content of that registry, and
   provides the MIME media type registrations for the two payload
   formats.

1.3  Potential Applications

   The payload formats described here may be useful in a number of
   different scenarios.

   On the sending side, there are two basic possibilities: either the
   sending side is an end system which originates the signals itself, or
   it is a gateway with the task of propagating incoming telephone
   signals into the Internet.

   On the receiving side there are more possibilities.  The first is
   that the receiver must propagate tone signalling accurately into the
   PSTN for machine consumption.  One example of this is a gateway
   passing DTMF tones to an IVR.  In this scenario, frequencies,
   amplitudes, tone durations, and the durations of pauses between tones
   are all significant, and individual tone signals must be delivered
   reliably and in order.

   In the second scenario, the receiver must play out tones for human
   consumption.  Typically, rather than a series of tone signals each
   with its own meaning, the content will consist of a single sequence
   of tones and possibly silence, played out continuously or repeated
   cyclically for some period of time.  Often the end of the tone
   playout will be triggered by an event fed back in the other
   direction, using either in- or out-of-band means.  Examples of this
   are dial tone or busy tone.

   The relationship between position in the network and the tones to be
   played out is a complicating factor in this scenario.  In the phone
   network, tones are generated at different places, depending on the
   switching technology and the nature of the tone.  This determines,
   for example, whether a person making a call to a foreign country
   hears the local tones she is familiar with or the tones used in
   the country called.

   For analog lines, dial tone is always generated by the local switch.
   ISDN terminals may generate dial tone locally and then send a Q.931
   [16] SETUP message containing the dialed digits.  If the terminal
   just sends a SETUP message without any Called Party digits, then the
   switch collects the digits, which the terminal provides as KEYPAD
   messages, and provides dial tone over the B-channel.  The terminal
   can either use the audio signal on the B-channel or can use the Q.931
   messages to trigger locally generated dial tone.

   Ringing tone (also called ringback tone) is generated by the local
   switch at the callee, with a one-way voice path opened up as soon as
   the callee's phone rings.  (This reduces the chance of clipping the
   called party's response just after answer.  It also permits
   pre-answer announcements or in-band call-progress indications to
   reach the caller before or in lieu of a ringing tone.) Congestion
   tone and special information tones can be generated by any of the
   switches along the way, and may be generated by the caller's switch
   based on ISUP messages received.  For analog instruments, busy tone
   is generated by the caller's switch, triggered by the appropriate
   ISUP message; for ISDN, it is generated by the terminal.

   In the third scenario, an end system is directly connected to the
   Internet and does not need to generate tone signals again, so that
   time alignment and power levels are not relevant.  These systems rely
   on PSTN gateways or Internet end systems to generate DTMF events and
   do not perform their own audio waveform analysis.  An example of such
   a system is an Internet interactive voice-response (IVR) system.

   In circumstances where exact timing alignment between the audio
   stream and the DTMF digits or other events is not important and data
   is sent unicast, such as the IVR example mentioned earlier, it may be
   preferable to use a reliable control protocol rather than RTP
   packets.  In those circumstances, this payload format would not be
   used.

   Note that in a number of these cases it is possible that the gateway
   or end system will be both a sender and receiver of telephone
   signals.  Sometimes the same class of signals will be sent as
   received -- in the case of "RTP trunking" or voiceband data, for
   instance.  In other cases, such as that of an end system serving
   analogue lines, the signals sent will be in a different class from
   those received.

1.4  Events, States, Tone Patterns, and Voice Encoded Tones

   This document provides the means for in-band transport over the
   Internet of two broad classes of signalling information: in-band
   tones or tone sequences, and signals sent out-of-band in the PSTN.
   Three methods, two of which are defined by this document, are
   available for carrying tone signals; only one of the three can be
   used to carry out-of-band PSTN signals.  Depending on the
   application, it may be desirable to carry the signalling information
   in more than one form at once.  Section 5 discusses when and how this
   should be done.

   1.  The gateway or end system can upspeed to a higher-bandwidth codec
       such as G.711 [13] when tone signals are to be conveyed.  See new
       ITU-T Recommendation V.152 [20] for a formal treatment of this
       approach.  Alternatively, for FAX, text, or modem signals
       respectively, a specialized transport such as T.38 [17], RFC 2793
       [9], or V.150.1 modem relay [19] may be used.

   2.  The sending gateway can simply measure the frequency components
       of the voice band signals and transmit this information to the
       RTP receiver using the tone representation defined in this
       document (Section 4).  In this mode, the gateway makes no attempt
       to discern the meaning of the tones, but simply distinguishes
       tones from speech signals.  An end system may use the same
       approach using configured rather than measured frequencies.

       All tone signals in use in the PSTN and meant for human
       consumption are sequences of simple combinations of sine waves,
       either added or modulated.  (There is at least one tone, however,
       the ANSam tone [18] used for indicating data transmission over
       voice lines, that makes use of periodic phase reversals.)

   3.  As a third option, a gateway can recognize the tones and
       translate them into a name, such as ringing or busy tone or DTMF
       digit '0' (Section 2).  The receiver then produces a tone signal
       or other indication appropriate to the signal.  Generally, since
       the recognition of signals at the sender often depends on their
       on/off pattern or the sequence of several tones, this recognition
       can take several seconds.  On the other hand, the gateway may
       have access to the actual signaling information that generates
       the tones and thus can generate the RTP packet immediately,
       without the detour through acoustic signals.

       The use of named events is the only feasible method for
       transmitting out-of-band PSTN signals as content within RTP
       sessions.

2.  RTP Payload Format for Named Telephone Events

2.1  Introduction

   The RTP payload format for named telephone events is designated as
   "telephone-event", the MIME type as "audio/telephone-event".  In
   accordance with current practice, this payload format does not have a
   static payload type number, but uses an RTP payload type number
   established dynamically and out-of-band.  The default clock frequency
   is 8000 Hz, but the clock frequency can be redefined when assigning
   the dynamic payload type.

   Named telephone events are carried as part of the audio stream, and
   MUST use the same sequence number and time-stamp base as the regular
   audio channel to simplify the generation of audio waveforms at a
   gateway.  The named telephone events payload type can be considered
   to be a very highly-compressed audio codec, and is treated the same
   as other codecs.

2.2  Use of RTP Header Fields

2.2.1  Timestamp

   The RTP timestamp reflects the measurement point for the current
   packet.  The event duration described in Section 2.3.5 extends
   forwards from that time.  For events that span multiple RTP packets,
   the RTP timestamp identifies the beginning of the event, i.e.,
   several RTP packets may carry the same timestamp.  For long-lasting
   events that have to be split into subevents (see below,
   Section 2.5.1.3), the timestamp indicates the beginning of the
   subevent.

2.2.2  Marker Bit

   The RTP marker bit indicates the beginning of a new event.  For long-
   lasting events that have to be split into subevents (see below,
   Section 2.5.1.3), only the first subevent will have the marker bit
   set.

2.3  Payload Format

   The payload format for named telephone events is shown in Figure 1.

           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |     event     |E|R| volume    |          duration             |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 1: Payload Format for Named Events

2.3.1  Event Field

   The event field is a number between 0 and 255 identifying a specific
   telephony event.  An IANA registry of event codes for this field has
   been established (see IANA Considerations, Section 7).  The initial
   content of this registry consists of the events defined in Section 3.

2.3.2  E ("End") Bit

   If set to a value of one, the "end" bit indicates that this packet
   contains the end of the event.  For long-lasting events that have to
   be split into subevents (see below, Section 2.5.1.3), only the final
   packet for the final subevent will have the "E" bit set.

2.3.3  R Bit

   This field is reserved for future use.  The sender MUST set it to
   zero, and the receiver MUST ignore it.

2.3.4  Volume Field

   For DTMF digits and other events representable as tones, this field
   describes the power level of the tone, expressed in dBm0 after
   dropping the sign.  Power levels range from 0 to -63 dBm0.  Thus,
   larger values denote lower volume.  This value is defined only for
   events for which the documentation indicates that volume is
   applicable.  For other events, the sender MUST set volume to zero and
   the receiver MUST ignore the value.

2.3.5  Duration Field

   The duration field indicates the duration of the event or subevent
   being reported, in timestamp units, expressed as an unsigned integer.
   For a non-zero value, the event or subevent began at the instant
   identified by the RTP timestamp and has so far lasted as long as
   indicated by this parameter.  The event may or may not have ended.
   If the event duration exceeds the maximum representable by the
   duration field, the event is split into several contiguous subevents
   as described below (Section 2.5.1.3).

   The special duration value of zero is reserved to indicate that the
   event lasts "forever", i.e., is a state and is considered to be
   effective until updated.  A sender MUST NOT transmit a zero duration
   for events other than those defined as states.  The receiver SHOULD
   ignore an event report with zero duration if the event is not a
   state.

   Events defined as states MAY contain a non-zero duration, indicating
   that the sender intends to refresh the state before the time duration
   has elapsed ("soft state").

      For a sampling rate of 8000 Hz, the duration field is sufficient
      to express event durations of up to approximately 8 seconds.
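
   As an illustration of the layout in Figure 1, the following Python
   sketch packs and unpacks the four-octet named-event payload.  The
   function names pack_event and unpack_event are hypothetical and are
   not defined by this memo; the sketch assumes one event per payload
   and does not check whether volume is applicable to the event.

```python
import struct

# Largest value of the 16-bit duration field, in timestamp units.
MAX_DURATION = 0xFFFF


def pack_event(event, end, volume, duration):
    """Pack one named-event payload (Figure 1) into 4 octets, network order."""
    if not 0 <= event <= 255:
        raise ValueError("event must fit in 8 bits")
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits (0 to -63 dBm0, sign dropped)")
    if not 0 <= duration <= MAX_DURATION:
        raise ValueError("duration must fit in 16 bits")
    # Second octet: E bit (0x80), R bit (0x40, always sent as zero), volume.
    flags_volume = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", event, flags_volume, duration)


def unpack_event(payload):
    """Unpack the first 4 octets of a named-event payload into a dict."""
    event, flags_volume, duration = struct.unpack("!BBH", payload[:4])
    return {
        "event": event,
        "end": bool(flags_volume & 0x80),
        "volume": flags_volume & 0x3F,  # the R bit (0x40) is ignored
        "duration": duration,
    }


def max_duration_seconds(rate=8000):
    """Longest duration expressible in one report at the given clock rate."""
    return MAX_DURATION / rate
```

   At the default clock rate of 8000 Hz, max_duration_seconds() gives
   65535/8000, or about 8.19 seconds, which matches the approximate
   figure quoted above.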

2.4  Optional MIME Parameters

   As indicated in the MIME registration for named events in
   Section 7.1.1, the telephone-event MIME type supports two optional
   parameters: the "events" parameter, and the "rate" parameter.

   The "events" parameter lists the events supported by the
   implementation.  Events are listed as one or more comma-separated
   elements.  Each element can either be a single integer or an integer
   followed by a hyphen and a larger integer, representing a range of
   consecutive event codes.  No white space is allowed in the argument.
   The integers designate the event numbers supported by the
   implementation.

   The "rate" parameter describes the sampling rate, in Hertz, and hence
   the units for the RTP timestamp and event duration fields.  The
   number is written as a floating point number or as an integer.  If
   omitted, the default value is 8000 Hz.

2.4.1  Relationship to SDP

   The recommended mapping of MIME optional parameters to SDP is given
   in section 3 of RFC 3555 [6].  The "rate" MIME parameter for the
   named event payload type follows this convention: it is expressed as
   usual as the <clock rate> component of the a=rtpmap: attribute line.

   The "events" MIME parameter deviates from the convention suggested in
   RFC 3555 because it omits the string "events=" before the list of
   supported events.

      a=fmtp:<format> <list of values>

   The list of values has the format described above for the MIME
   parameter.  The list does not have to be sorted.

   For example, if the payload format uses the payload type number 100,
   and the implementation can handle the DTMF tones (events 0 through
   15) and the dial and ringing tones (assuming as an example that these
   were defined as events with codes 66 and 70 respectively), it would
   include the following description in its SDP message:

      m=audio 12346 RTP/AVP 100
      a=rtpmap:100 telephone-event/8000
      a=fmtp:100 0-15,66,70

   The following sample media type definition corresponds to the SDP
   example above:

      audio/telephone-event;events="0-15,66,70";rate="8000"
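
   A receiver of such a description has to expand the list of values
   into individual event codes.  The following Python sketch shows one
   way this might be done; the function name parse_events is
   hypothetical, and the strictness of the checks (rejecting white
   space and descending ranges) is an assumption based on the syntax
   described in Section 2.4.

```python
def parse_events(value):
    """Expand an events list such as "0-15,66,70" into a set of event codes."""
    if any(ch.isspace() for ch in value):
        raise ValueError("no white space is allowed in the events list")
    codes = set()
    for element in value.split(","):
        if "-" in element:
            # A range: an integer, a hyphen, and a larger integer.
            low_text, high_text = element.split("-")
            low, high = int(low_text), int(high_text)
            if high <= low:
                raise ValueError("a range must end in a larger integer")
            codes.update(range(low, high + 1))
        else:
            codes.add(int(element))
    if any(not 0 <= code <= 255 for code in codes):
        raise ValueError("event codes must lie between 0 and 255")
    return codes
```

   Because the result is a set, the order of the elements in the list
   does not matter, consistent with the statement that the list does
   not have to be sorted.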

2.5  Procedures

   This section defines the procedures associated with the named event
   payload type.  Additional procedures may be specified in the
   documentation associated with specific event codes.

2.5.1  Sending Procedures

2.5.1.1  Negotiation of Payloads

   Negotiation of payloads between sender and receiver is achieved by
   out-of-band means, using SDP, for example.

   The sender SHOULD indicate what events it supports, using the
   optional "events" parameter associated with the telephone-event MIME
   type.  If the sender receives an "events" parameter from the
   receiver, it MUST restrict the set of events it sends to those listed
   in the received "events" parameter.  For backward compatibility, if
   no "events" parameter is received, the sender SHOULD assume support
   for the DTMF events 0-15 but for no other events.

   Events may be sent in combination with older events using RFC 2198
   [2] redundancy.  Section 2.5.1.4 describes how this can be used to
   avoid packet and RTP header overheads when retransmitting final event
   reports.  Section 2.6 discusses the use of additional levels of RFC
   2198 redundancy to increase the probability that at least one copy of
   the report of the end of an event reaches the receiver.  The
   following SDP shows an example of such usage, where G.711 audio
   appears in a separate stream, and the primary component of the
   redundant payload is events.

      m=audio 12344 RTP/AVP 99
      a=rtpmap:99 pcmu/8000
      m=audio 12345 RTP/AVP 100 101
      a=rtpmap:100 red/8000/1
      a=fmtp:100 101/101/101
      a=rtpmap:101 telephone-event/8000
      a=fmtp:101 0-15

2.5.1.2  Transmission of Event Packets

   DTMF digits and other named telephone events are carried as part of
   the audio stream, and MUST use the same sequence number and
   time-stamp base as the regular audio channel to simplify the
   generation of audio waveforms at a gateway.

   An audio source SHOULD start transmitting event packets as soon as it
   recognizes an event, and continue to send updates until the event has
   ended.  The update packet MUST have the same RTP timestamp value as
   the initial packet for the event, but the duration MUST be increased
   to reflect the total cumulative duration since the beginning of the
   event.

   The first packet for an event MUST have the "M" bit set.  The final
   packet for an event MUST have the "E" bit set, but setting of the "E"
   bit MAY be deferred until the final packet is retransmitted (see
   Section 2.5.1.4).  Intermediate packets for an event MUST NOT have
   either the "M" bit or the "E" bit set.

   Sending of a packet with the "E" bit set is OPTIONAL if the packet
   reports two events which are defined as mutually exclusive states, or
   if the final packet for one state is immediately followed by a packet
   reporting a mutually exclusive state.  (For events defined as states,
   the appearance of a mutually exclusive state implies the end of the
   previous state.)

   A source has wide latitude as to how often it sends event updates.  A
   natural interval is the spacing between non-event audio packets.
   (Recall that a single RTP packet can contain multiple audio frames
   for frame-based codecs and that the packet interval can vary during a
   session.) Alternatively, a source MAY decide to use a different
   spacing for event updates, with a value of 50 ms RECOMMENDED.

   Timing information is contained in the RTP timestamp, allowing
   precise recovery of inter-event times.  Thus, the sender does not in
   theory need to maintain precise or consistent time intervals between
   event packets.  However, the sender SHOULD minimize the need for
   buffering at the receiving end by sending event reports at constant
   intervals.

      DTMF digits and other tone events are sent incrementally to avoid
      having the receiver wait for the completion of the event.  In some
      cases (for example, data session startup protocols), waiting until
      the end of a tone before reporting it will cause the session to
      fail.  In other cases, it will simply cause undesirable delays in
      playout at the receiving end.

   For robustness, the sender SHOULD retransmit "state" events
   periodically.
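
   The rules above for the timestamp, duration, "M" bit, and "E" bit can
   be sketched in Python as follows.  The function name event_reports
   is hypothetical.  For simplicity the sketch assumes that the total
   duration of the event is known in advance and does not exceed
   0xFFFF; a real sender learns the duration only as the event
   proceeds.  The default interval of 400 timestamp units corresponds
   to the RECOMMENDED 50 ms at a clock rate of 8000 Hz.

```python
def event_reports(start_ts, total_duration, interval=400):
    """Return a list of (marker, timestamp, duration, end) for one event.

    All values are in RTP timestamp units.  Every report carries the
    timestamp of the start of the event; the duration is cumulative.
    """
    reports = []
    elapsed = interval
    first = True
    while elapsed < total_duration:
        # Initial or intermediate report: only the first has the M bit.
        reports.append((first, start_ts, elapsed, False))
        first = False
        elapsed += interval
    # Final report: full duration, E bit set.  For an event shorter than
    # one interval this is also the first report, so M is set as well.
    reports.append((first, start_ts, total_duration, True))
    return reports
```

   For example, event_reports(16000, 1000) describes a 125 ms event as
   three reports with durations 400, 800, and 1000, all carrying the
   timestamp 16000.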

2.5.1.3  Long Duration Events

   If an event persists beyond the maximum duration expressible in the
   duration field (0xFFFF), the sender MUST send a packet reporting this
   maximum duration but MUST NOT set the "E" bit in this packet.  The
   sender MUST then begin reporting a new "subevent" with the RTP
   timestamp set to the time at which the previous subevent ended and
   the duration set to the cumulative duration of the new subevent.  The
   "M" bit of the first packet reporting the new subevent MUST NOT be
   set.  The sender MUST repeat this procedure as required until the end
   of the complete event has been reached.  The final packet for the
   complete event MUST have the "E" bit set (either on initial
   transmission or on retransmission as described below).
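
   The segmentation procedure can be illustrated with the following
   Python sketch.  The function name split_subevents and the tuple
   layout are hypothetical; as a simplification, the total duration is
   assumed to be known in advance, and only the start and final
   duration of each subevent are computed, not the intermediate
   reports.

```python
MAX_DURATION = 0xFFFF  # largest value of the 16-bit duration field


def split_subevents(start_ts, total_duration, max_duration=MAX_DURATION):
    """Split a long event into contiguous subevents.

    Returns a list of (timestamp, final_duration, marker, end) tuples:
    marker says whether the first packet of the subevent carries the
    "M" bit, end whether its final packet carries the "E" bit.
    """
    segments = []
    ts = start_ts
    remaining = total_duration
    first = True
    while remaining > max_duration:
        # A full-length subevent: report the maximum, but never set E.
        segments.append((ts & 0xFFFFFFFF, max_duration, first, False))
        ts += max_duration  # the next subevent starts where this one ended
        remaining -= max_duration
        first = False
    segments.append((ts & 0xFFFFFFFF, remaining, first, True))
    return segments
```

   The timestamps are masked to 32 bits because the RTP timestamp wraps
   around.  Passing a smaller max_duration allows the same routine to
   be reused when a lower limit applies.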

2.5.1.3.1  Exceptional Procedure For Combined Payloads

   If events are combined as a redundant payload with another payload
   type using RFC 2198 [2] redundancy, the above procedure SHALL be
   applied, but using a maximum duration which ensures that the
   timestamp offset of the oldest generation of events in an RFC 2198
   packet never exceeds 0x3FFF.  If the sender is using a constant
   packetization period, the maximum sub-event duration can be
   calculated from the following formula:

      maximum duration = 0x3FFF - (R-1)*(packetization period in
      timestamp units)

   where R is the highest redundant layer number consisting of event
   payload.
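
   As a worked instance of this formula, the following Python sketch
   computes the limit for a given R and packetization period.  The
   function name max_subevent_duration is hypothetical, and a constant
   packetization period is assumed, as in the formula.

```python
MAX_OFFSET = 0x3FFF  # largest 14-bit timestamp offset


def max_subevent_duration(redundant_layers, packetization_period):
    """maximum duration = 0x3FFF - (R-1) * packetization period.

    redundant_layers is R; both the period and the result are in
    timestamp units.
    """
    if redundant_layers < 1:
        raise ValueError("R must be at least 1")
    limit = MAX_OFFSET - (redundant_layers - 1) * packetization_period
    if limit <= 0:
        raise ValueError("packetization period too long for this R")
    return limit
```

   With R = 2 and a packetization period of 400 timestamp units (50 ms
   at 8000 Hz), the limit is 16383 - 400 = 15983 timestamp units, or
   just under two seconds.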

      The RFC 2198 redundancy header timestamp offset value is only 14
      bits, compared with the 16 bits in the event payload duration
      field.  Since with other payloads the RTP timestamp typically
      increments for each new sample, the timestamp offset value becomes
      limiting on reported event duration.  The limit becomes more
      constraining when older generations of events are also included in
      the combined payload.

2.5.1.4  Retransmission of Final Packet

   The final packet for each event and for each subevent SHOULD be sent
   a total of three times at the interval used by the source for
   updates.  This ensures that the duration of the event or subevent can
   be recognized correctly even if an instance of the last packet is
   lost.

   A sender MAY use RFC 2198 [2] with two levels of redundancy to
   combine retransmissions with reports of new events, thus saving on
   header overheads.  In this usage, the primary payload is new event
   reports, while the first and second levels of redundancy report first
   and second retransmissions of final event reports.  Within a session
   negotiated to allow such usage, packets containing the RFC 2198
   payload SHOULD NOT be sent except when both primary and retransmitted
   reports are to be included.  All other packets of the session SHOULD
   contain only the simple, non-redundant telephone-event payload.  Note
   that the expected proportion of simple versus redundant packets
   affects the order in which they should be specified on an SDP m=
   line.

      There is little point in sending initial or interim event reports
      redundantly because each succeeding packet describes the event
      fully (except for typically irrelevant variations in volume).

   A sender MAY delay setting the "E" bit until retransmitting the last
   packet for a tone, rather than setting the bit on its first
   transmission.  This avoids having to wait to detect whether the tone
   has indeed ended.  Once the sender has set the "E" bit for a packet,
   it MUST continue to set the "E" bit for any further retransmissions
   of that packet.

2.5.1.5  Packing Multiple Events Into One Packet

   Multiple named events can be packed into a single RTP packet if and
   only if the events are consecutive and contiguous, i.e., occur
   without overlap and without pause between them, and if the last event
   packed into a packet occurs quickly enough to avoid excessive delays
   at the receiver.

   This approach is similar to having multiple frames of frame-based
   audio in one RTP packet.

   The constraint that packed events not overlap implies that events
   designated as states can be followed in a packet only by other state
   events which are mutually exclusive to them.  The constraint itself
   is needed so that the beginning time of each event can be calculated
   at the receiver.

   In a packet containing events packed in this way, the RTP timestamp
   MUST identify the beginning of the first event or subevent in the
   packet.  The "M" bit MUST be set if the packet records the beginning
   of at least one event.  (The exception will be when the packet
   carries the end of one segment of a long-lasting event, and the
   beginning of the next segment.)  The "E" bit and duration for each
   event in the packet MUST be set using the same rules as if that event
   were the only event contained in the packet.

2.5.1.6  RTP Sequence Number

   The RTP sequence number MUST be incremented by one in each successive
   RTP packet sent.  Incrementing applies to retransmitted as well as
   initial instances of event reports, to permit the receiver to detect
   lost packets for RTCP receiver reports.

2.5.2  Receiving Procedures

2.5.2.1  Indication of Receiver Capabilities using SDP

   Receivers can indicate which named events they can handle, for
   example, by using the Session Description Protocol (RFC 2327 [3]).
   SDP descriptions using the event payload MUST contain an fmtp format
   attribute that lists the event values that the receiver can process.

2.5.2.2  Playout of Tone Events

   In the gateway scenario, an Internet telephony gateway connecting a
   packet voice network to the PSTN recreates the DTMF or other tones
   and injects them into the PSTN.  Since, for example, DTMF digit
   recognition takes several tens of milliseconds, the first few
   milliseconds of a digit will arrive as regular audio packets.  Thus,
   careful time and power (volume) alignment between the audio samples
   and the events is needed to avoid generating spurious digits at the
   receiver.  Playout when audio packets continue to arrive as the event
   proceeds is discussed further in Section 5.2 below.

   Receiver implementations MAY use different algorithms to create
   tones, including the two described here.  (Note that not all
   implementations have the need to recreate a tone; some may only care
   about recognizing the events.)  With either algorithm, a receiver may
   impose a playout delay to provide robustness against packet loss or
   delay.  The tradeoff between playout delay and other factors is
   discussed further in Section 5.1.1.

   In the first algorithm, the receiver simply places a tone of the
   given duration in the audio playout buffer at the location indicated
   by the timestamp.  As additional packets are received that extend the
   same tone, the waveform in the playout buffer is extended
   accordingly.  (Care has to be taken if audio is mixed, i.e., summed,
   in the playout buffer rather than simply copied.)  Thus, if a packet
   in a tone lasting longer than the packet interarrival time gets lost
   and the playout delay is short, a gap in the tone may occur.

   Alternatively, the receiver can start a tone and play it until one of
   the following occurs:

   o  it receives a packet with the "E" bit set;
   o  it receives the next tone, distinguished by a different timestamp
      value (noting that new segments of long-duration events also
      appear with a new timestamp value);
   o  it receives an alternative non-event media stream (assuming none
      was being received while the event stream was active); or
   o  a given time period elapses.

   This is more robust against packet loss, but may extend the tone
   beyond its original duration if all retransmissions of the last
   packet in an event are lost.  The period over which the tone is
   extended has to be limited so that a tone cannot "get stuck".  This
   algorithm is not a license for senders to set the duration field to
   zero; it MUST be set to the current duration as described, since this
   is needed to create accurate events if the first event packet is
   lost, among other reasons.

   Regardless of the algorithm used, the tone SHOULD NOT be extended by
   more than three packet interarrival times.  A slight extension of
   tone durations and shortening of pauses is generally harmless.
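
   As a non-normative illustration, the stop conditions listed for the
   second algorithm can be sketched in Python.  The function and
   parameter names are hypothetical, and the timeout is simply assumed
   to equal the three-interarrival-time limit stated above; an
   implementation may well choose a shorter period.

```python
def should_stop_extending(e_bit_seen, new_timestamp_seen,
                          other_media_seen, ms_since_last_report,
                          packet_interval_ms, max_intervals=3):
    """Decide whether to stop extending a tone that is being played out.

    The three flags correspond to the first three stop conditions of
    the second algorithm; the elapsed-time check corresponds to the
    last one.  When the "E" bit has been seen, the tone is still
    played to the duration given in that report.
    """
    if e_bit_seen or new_timestamp_seen or other_media_seen:
        return True
    # Assumed timeout: do not extend the tone by more than
    # max_intervals packet interarrival times.
    return ms_since_last_report >= max_intervals * packet_interval_ms
```

   With a 50 ms packet interval, for example, this sketch stops
   extending the tone once 150 ms have passed without a new report.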

   A receiver SHOULD NOT restart a tone once playout has stopped.  It
   MAY do so if the tone is of a type meant for human consumption or is
   one for which interruptions will not cause confusion at the receiving
   device.

   If a receiver receives an event packet for an event which it is not
   currently playing out and the packet does not have the "M" bit set,
   earlier packets for that event have evidently been lost.  This can be
   confirmed by gaps in the RTP sequence number.  The receiver MAY
   determine on the basis of retained history and the timestamp and
   event code of the current packet that it corresponds to an event
   already played out and lapsed.  In that case further reports for the
   event MUST be ignored, as indicated in the previous paragraph.

   If, on the other hand, the event has not been played out at all, the
   receiver MAY attempt to play the event out to the complete duration
   indicated in the event report.  The appropriate behaviour will depend
   on the event type concerned, and requires consideration of the
   relationship of the event to audio media flows and whether correct
   event duration is essential to the correct operation of the media
   session.

   A receiver SHOULD NOT rely on a particular event packet spacing, but
   instead MUST use the event timestamps and durations to determine
   timing and duration of playout.

   The receiver MUST calculate jitter for RTCP receiver reports based on
   all packets with a given timestamp.  Note: The jitter value should
   primarily be used as a means for comparing the reception quality
   between two users or two time-periods, not as an absolute measure.

   If a zero volume is indicated for an event for which the volume field
   is defined, then the receiver MAY reconstruct the volume from the
   volume of non-event audio or MAY use the nominal value specified by
   the ITU Recommendation or other document defining the tone.  This
   ensures backwards compatibility with RFC 2833 [10], where the volume
   field was defined only for DTMF events.

2.5.2.3  Long Duration Events

   If an event report is received with duration equal to the maximum
   duration expressible in the duration field (0xFFFF) and the "E" bit
   for the report is not set, the event report may mark the end of a
   subevent generated according to the procedures of Section 2.5.1.3.
   If another report for the same event type is received, the receiver
   MUST compare the RTP timestamp for the new event with the sum of the
   RTP timestamp of the previous report plus the duration (0xFFFF).  The
   receiver uses the absence of a gap between the events to detect that
   it is receiving a single long-duration event.

   The total duration of a long-duration event is the sum of
   the durations of the subevents used to report it.  This is equal to
   the duration of the final subevent (as indicated in the final packet
   for that subevent), plus 0xFFFF multiplied by the number of subevents
   preceding the final subevent.
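
   The contiguity test and the duration sum described in this section
   might be written as follows.  This is a minimal sketch only: the
   function names are hypothetical, and the modulo arithmetic assumes
   the usual 32-bit wraparound of the RTP timestamp.

```python
MAX_DURATION = 0xFFFF      # largest value of the 16-bit duration field
TS_MODULUS = 1 << 32       # RTP timestamps are 32 bits wide and wrap


def is_next_segment(prev_timestamp, new_timestamp):
    """True if a new report begins exactly where a full-length
    (0xFFFF) segment of the same event ended, i.e. there is no gap."""
    return new_timestamp == (prev_timestamp + MAX_DURATION) % TS_MODULUS


def total_duration(final_duration, preceding_subevents):
    """Total duration, in timestamp units, of a long-duration event
    reported as several subevents."""
    return final_duration + MAX_DURATION * preceding_subevents
```

   For example, an event reported as two full-length subevents
   followed by a final subevent of duration 8000 lasts
   8000 + 2 * 65535 = 139070 timestamp units.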

2.5.2.3.1  Exceptional Procedure For Combined Payloads

   If events are combined as a redundant payload with another payload
   type using RFC 2198 [2] redundancy, sub-events are generated at
   intervals of 0x3FFF or less, rather than 0xFFFF, as required by the
   procedures of Section 2.5.1.3.1 in this case.  If a receiver is using
   the events component of the payload, event duration may be only an
   approximate indicator of division into sub-events, but the lack of an
   E-bit and the adjacency of two reports with the same event code are
   strong indicators in themselves.

2.5.2.4  Multiple Events In a Packet

   The procedures of Section 2.5.1.5 require that if multiple events are
   reported in the same packet, they are contiguous and non-overlapping.
   As a result, it is not strictly necessary for the receiver to know
   the start times of the events following the first one in order to
   play them out -- it needs only to respect the duration reported for
   each event.  Nevertheless, if knowledge of the start time for a given
   event after the first one is required, it is equal to the sum of the
   start time of the preceding event plus the duration of the preceding
   event.
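
   The start-time rule above amounts to a running sum.  The following
   sketch is illustrative only; the function name is hypothetical and
   the 32-bit wraparound of the RTP timestamp is assumed.

```python
def packed_event_start_times(rtp_timestamp, durations):
    """Return the start time of each event packed into one packet.

    rtp_timestamp is the start of the first event; durations lists the
    duration of each packed event in order, in timestamp units.
    """
    starts = []
    start = rtp_timestamp
    for duration in durations:
        starts.append(start)
        # The next event begins where this one ends (no gap, no overlap).
        start = (start + duration) % (1 << 32)
    return starts
```

   For a packet with timestamp 1000 carrying events of duration 400,
   800 and 400, the start times are 1000, 1400 and 2200.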

2.5.2.5  Soft States

   If the duration of a soft state event expires, the receiver SHOULD
   consider the value of the state to be "unknown" unless otherwise
   indicated in the event documentation.

2.6  Reliability

   A reliability objective for event transmission may be expressed as
   the target probability that the event is played out with the correct
   duration and with the correct starting time relative to other events
   or other media operating on the same timestamp base.  Reliability is
   an issue because of the possibility that packets are lost or delayed
   within the network.

   The named event mechanism uses two complementary redundancy
   mechanisms to deal with lost packets:

   Intra-event updates:

      Events that last longer than one event period (e.g., 50 ms) are
      updated periodically, so that the receiver can reconstruct the
      event and its duration if it receives any of the update packets,
      albeit with delay.

      During an event, the RTP event payload format provides incremental
      updates on the event.  The error resiliency afforded by this
      mechanism depends on whether the first or second algorithm in
      Section 2.5.2.2 is used and on the playout delay at the receiver.
      For example, if the receiver uses the first algorithm and only
      places the current duration of tone signal in the playout buffer,
      for a playout delay of 120 ms and a packet gap of 50 ms, two
      packets in a row can get lost without causing a premature end of
      the tone generated.

   Repeat last event packet:

      As described in Section 2.5.1.4, the last report for an event is
      transmitted a total of three times.  This mechanism adds
      robustness to the reporting of the end of an event.

      Where Section 2.5.1.4 indicates that it is appropriate to use the
      RFC 2198 [2] audio redundancy mechanism to carry retransmissions
      of final event reports, this mechanism MAY also be used to extend
      the number of final report retransmissions.  This is done by using
      more than two levels of redundancy.  The use of RFC 2198 helps to
      mitigate the extra bandwidth demands that would be imposed simply
      by retransmitting final event packets more than three times.  If a
      lack of following events makes use of RFC 2198 inappropriate, the
      sender SHOULD NOT exceed the three-time transmission limit unless
      an exponential backoff algorithm like that used for TCP is used to
      derive the times at which retransmitted packets are sent.

   See Section 5.1.1 for further discussion of application issues
   associated with reliability objectives.
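
   The figures in the intra-event update example above can be
   reproduced with a deliberately simple model.  The model is an
   assumption made here for illustration and is not part of the
   payload format: updates are evenly spaced, network jitter is
   ignored, and the playout delay is measured from the arrival of the
   most recent update.

```python
def tolerated_consecutive_losses(playout_delay_ms, packet_gap_ms):
    """Number of consecutive update packets that can be lost before
    the buffered tone runs out, under the simplified model described
    in the accompanying text."""
    return playout_delay_ms // packet_gap_ms
```

   With a playout delay of 120 ms and a packet gap of 50 ms the
   result is 2, matching the example.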

3.  Specification of Event Codes For DTMF Events

   This document defines one class of named events: DTMF tones.

3.1  DTMF Events

   DTMF signalling [7] is typically generated by a telephone set or
   possibly by a PBX.  DTMF digits may be consumed by entities such as
   gateways or application servers in the IP network, or by entities
   such as telephone switches or IVRs in the circuit switched network.

   The DTMF events support two possible applications at the sending end,
   and two at the receiving end.  In the first sending application, the
   Internet telephony gateway detects DTMF on the incoming circuits and
   sends the RTP payload described here instead of regular audio
   packets.  The gateway likely has the necessary digital signal
   processors and algorithms, as it often needs to detect DTMF, e.g.,
   for two-stage dialing.  Having the gateway detect tones relieves the
   receiving Internet end system from having to do this work and also
   avoids having low bit-rate codecs like G.723.1 [14] render DTMF tones
   unintelligible.  In the second sending application, an Internet end
   system such as an "Internet phone" can emulate DTMF functionality
   without concerning itself with generating precise tone pairs and
   without imposing the burden of tone recognition on the receiver.

   A similar distinction occurs at the receiving end.  In the gateway
   scenario, an Internet telephony gateway connecting a packet voice
   network to the PSTN recreates the DTMF tones or other telephony
   events and injects them into the PSTN.  In the end system scenario,
   the DTMF events are consumed by the receiving entity itself.

   Table 1 shows the DTMF-related event codes within the telephone-event
   payload format.  The DTMF digits 0-9 and * and # are commonly
   supported.  DTMF digits A through D are less frequently encountered,
   typically in special applications such as military networks.

   ITU-T Recommendation Q.24 [8], Table A-1, indicates that the legacy
   switching equipment in the countries surveyed expects a minimum
   recognizable signal duration of 40 ms, a minimum pause between
   signals of 40 ms, and a maximum signalling rate of 8 to 10 digits per
   second depending on the country.  Human-generated DTMF signals, of
   course, are generally longer with larger pauses between them.

                  +-------+--------+------+---------+
                  | Event | Code   | Type | Volume? |
                  +-------+--------+------+---------+
                  | 0--9  | 0--9   | tone | yes     |
                  |       |        |      |         |
                  | *     | 10     | tone | yes     |
                  |       |        |      |         |
                  | #     | 11     | tone | yes     |
                  |       |        |      |         |
                  | A--D  | 12--15 | tone | yes     |
                  +-------+--------+------+---------+

                       Table 1: DTMF named events
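
   As an aid to the implementor, Table 1 can be expressed as a lookup
   table.  The names below are hypothetical; only the digit-to-code
   mapping comes from the table.

```python
# Event codes from Table 1: digits 0-9 map to codes 0-9.
DTMF_EVENT_CODES = {str(digit): digit for digit in range(10)}
DTMF_EVENT_CODES.update(
    {"*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15})


def dtmf_event_codes(digits):
    """Translate a string of DTMF digits into a list of event codes.

    Raises KeyError for characters that are not DTMF digits.
    """
    return [DTMF_EVENT_CODES[ch] for ch in digits.upper()]
```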

4.  RTP Payload Format for Telephony Tones

4.1  Introduction

   As an alternative to describing tones and events by name, as
   described in Section 2, it is sometimes preferable to describe them
   by their waveform properties.  In particular, recognition is faster
   than for named signals, since it does not depend on recognizing
   durations or pauses.

   There is no single international standard for telephone tones such as
   dial tone, ringing (ringback), busy, congestion ("fast-busy"),
   special announcement tones or some of the other special tones, such
   as payphone recognition, call waiting or record tone.  However, ITU-T
   Recommendation E.180 [12] notes that across all countries, these
   tones share a number of characteristics:

   o  Telephony tones consist of either a single tone, the addition of
      two or three tones or the modulation of two tones.  (Almost all
      tones use two frequencies; only the Hungarian "special dial tone"
      has three.)  Tones that are mixed have the same amplitude and do
      not decay.

   o  In-band tones for telephony events are in the range of 25 Hz
      (ringing tone in Angola) to 2600 Hz (the tone used for line
      signalling in SS No. 5 and R1).  The in-band telephone frequency
      range is limited to 3400 Hz.  R2 defines a 3825 Hz out-of-band
      tone for line signalling on analogue trunks.  (The piano has a
      range from 27.5 to 4186 Hz.)

   o  Modulation frequencies range from 15 Hz (ANSam tone) to 480 Hz
      (Jamaica).  Non-integer frequencies are used only for frequencies
      of 16 2/3 and 33 1/3 Hz.  (These fractional frequencies appear to
      be derived from AC power grid frequencies.)

   o  Tones that are not continuous have durations of less than four
      seconds.

   o  ITU-T Recommendation E.180 [12] notes that different telephone
      companies require a tone accuracy of between 0.5 and 1.5%.  The
      Recommendation suggests a frequency tolerance of 1%.

4.2  Examples of Common Telephone Tone Signals

   As an aid to the implementor, Table 2 summarizes some common tones.
   The rows labeled "ITU ..." refer to ITU-T Recommendation E.180 [12].
   In the table, the symbol "+" indicates addition of the tones, without
   modulation, while "*" indicates amplitude modulation.

   +------------------------+--------------+-------------+-------------+
   | Tone Name              | Frequency    | On Period   | Off Period  |
   |                        |              | (s)         | (s)         |
   +------------------------+--------------+-------------+-------------+
   | CNG                    | 1100         | 0.5         | 3.0         |
   |                        |              |             |             |
   | V.25 CT                | 1300         | 0.5         | 2.0         |
   |                        |              |             |             |
   | CED                    | 2100         | 3.3         | --          |
   |                        |              |             |             |
   | ANS                    | 2100         | 3.3         | --          |
   |                        |              |             |             |
   | ANSam                  | 2100*15      | 3.3         | --          |
   |                        |              |             |             |
   | V.21 "0" bit, channel  | 1180         | 0.00333     | --          |
   | 1                      |              |             |             |
   |                        |              |             |             |
   | V.21 "1" bit, channel  | 980          | 0.00333     | --          |
   | 1                      |              |             |             |
   |                        |              |             |             |
   | V.21 "0" bit, channel  | 1850         | 0.00333     | --          |
   | 2                      |              |             |             |
   |                        |              |             |             |
   | V.21 "1" bit, channel  | 1650         | 0.00333     | --          |
   | 2                      |              |             |             |
   |                        |              |             |             |
   | -------------          | ----------   | ---------   | ----------  |
   |                        |              |             |             |
   | ITU dial tone          | 425          | --          | --          |
   |                        |              |             |             |
   | U.S. dial tone         | 350+440      | --          | --          |
   |                        |              |             |             |
   | ITU ringing tone       | 425          | 0.67-1.5    | 3-5         |
   |                        |              |             |             |
   | U.S. ringing tone      | 440+480      | 2.0         | 4.0         |
   |                        |              |             |             |
   | ITU busy tone          | 425          |             |             |
   |                        |              |             |             |
   | U.S. busy tone         | 480+620      | 0.5         | 0.5         |
   |                        |              |             |             |
   | ITU congestion tone    | 425          |             |             |
   |                        |              |             |             |
   | U.S. congestion tone   | 480+620      | 0.25        | 0.25        |
   +------------------------+--------------+-------------+-------------+

                  Table 2: Examples of telephony tones
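
   The "+" entries in Table 2 (addition without modulation) can be
   illustrated by summing sine waves of equal amplitude.  The sketch
   below is not a reference tone generator: the function name, the
   scaling to the range -1 to 1 and the 8000 Hz default sample rate
   are assumptions, and the amplitude modulation denoted by "*" is
   not covered.

```python
import math


def additive_tone(frequencies_hz, duration_s, rate_hz=8000):
    """Return samples of equal-amplitude sine waves added together,
    scaled so that the sum stays within -1.0 .. 1.0."""
    count = int(round(duration_s * rate_hz))
    scale = 1.0 / len(frequencies_hz)
    return [
        scale * sum(math.sin(2.0 * math.pi * f * n / rate_hz)
                    for f in frequencies_hz)
        for n in range(count)
    ]


# U.S. dial tone from Table 2 (350+440), 10 ms worth of samples.
dial_tone = additive_tone([350, 440], 0.01)
```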

4.3  Use of RTP Header Fields

4.3.1  Timestamp

   The RTP timestamp reflects the measurement point for the current
   packet.  The event duration described in Section 4.3.3 extends
   forwards from that time.

4.3.2  Marker Bit

   The tones payload type uses the marker bit to distinguish the first
   RTP packet reporting a given instance of a tone from succeeding
   packets for that tone.  The marker bit SHOULD be set to 1 for the
   first packet, and to 0 for all succeeding packets relating to the
   same tone.

4.3.3  Payload Format

   Based on the characteristics described above, this document defines
   an RTP payload format called "tone" that can represent tones
   consisting of one or more frequencies.  (The corresponding MIME type
   is "audio/tone".)  The default timestamp rate is 8000 Hz, but other
   rates may be defined.  Note that the timestamp rate does not affect
   the interpretation of the frequency, just the durations.

   In accordance with current practice, this payload format does not
   have a static payload type number, but uses an RTP payload type
   number established dynamically and out-of-band.

   The payload format is shown in Figure 2.

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |    modulation   |T|  volume   |          duration             |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |R R R R|       frequency       |R R R R|       frequency       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |R R R R|       frequency       |R R R R|       frequency       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           ......

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |R R R R|       frequency       |R R R R|       frequency       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 2: Payload Format for Tones

   The payload contains the following fields:

   modulation:

      The modulation frequency, in Hz.  The field is a 9-bit unsigned
      integer, allowing modulation frequencies up to 511 Hz.  If there
      is no modulation, this field has a value of zero.

   T:

      If the "T" bit is set (one), the modulation frequency is to be
      divided by three.  Otherwise, the modulation frequency is taken as
      is.

      This bit allows frequencies accurate to 1/3 Hz, since modulation
      frequencies such as 16 2/3 Hz are in practical use.

   volume:

      The power level of the tone, expressed in dBm0 after dropping the
      sign, with range from 0 to -63 dBm0.  (Note: A preferred level
      range for digital tone generators is -8 dBm0 to -3 dBm0.)

   duration:

      The duration of the tone, measured in timestamp units.  The tone
      begins at the instant identified by the RTP timestamp and lasts
      for the duration value.  The value of zero is not permitted and
      tones with such a duration SHOULD be ignored.

      The definition of duration corresponds to that for sample-based
      codecs, where the timestamp represents the sampling point for the
      first sample.

   frequency:

      The frequencies of the tones to be added, measured in Hz and
      represented as a 12-bit unsigned integer.  The field size is
      sufficient to represent frequencies up to 4095 Hz, which exceeds
      the range of telephone systems.  A value of zero indicates
      silence.  A single tone can contain any number of frequencies.  If
      the number of frequencies it contains is odd, padding SHALL be
      added to bring the packet to a 32-bit boundary.  (RFC 3550 [5]
      requires that padding be set to all zeroes.)

   R:

      This field is reserved for future use.  The sender MUST set it to
      zero; the receiver MUST ignore it.
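
   The field layout of Figure 2 can be illustrated with a short
   encoder and decoder.  This is a sketch under stated assumptions
   rather than a reference implementation: the function names and the
   returned dictionary are hypothetical, fields are packed in network
   byte order, and RTP padding is assumed to be added and removed by
   the RTP layer, so it does not appear here.

```python
import struct


def encode_tone(modulation, divide_by_three, volume, duration,
                frequencies):
    """Pack the fields of Figure 2 into payload bytes."""
    if not 0 <= modulation <= 0x1FF:
        raise ValueError("modulation must fit in 9 bits")
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 1 <= duration <= 0xFFFF:
        raise ValueError("duration must be 1..65535")
    # First 16 bits: modulation (9 bits), T (1 bit), volume (6 bits).
    first = (modulation << 7) | (int(bool(divide_by_three)) << 6) | volume
    payload = struct.pack("!HH", first, duration)
    for freq in frequencies:
        if not 0 <= freq <= 0xFFF:
            raise ValueError("frequency must fit in 12 bits")
        payload += struct.pack("!H", freq)  # four R bits sent as zero
    return payload


def decode_tone(payload):
    """Unpack payload bytes (padding already removed) into a dict."""
    first, duration = struct.unpack_from("!HH", payload, 0)
    modulation = first >> 7
    if (first >> 6) & 1:            # T bit set: divide by three
        modulation_hz = modulation / 3
    else:
        modulation_hz = float(modulation)
    count = (len(payload) - 4) // 2
    words = struct.unpack_from("!%dH" % count, payload, 4)
    return {
        "modulation_hz": modulation_hz,
        "volume": first & 0x3F,      # magnitude; level is -volume dBm0
        "duration": duration,
        "frequencies": [w & 0x0FFF for w in words],  # R bits ignored
    }
```

   For instance, a 350+440 Hz tone at -13 dBm0 lasting 400 timestamp
   units encodes to the eight octets 00 0d 01 90 01 5e 01 b8.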

4.3.4  Optional MIME Parameters

   The "rate" parameter describes the sampling rate, in Hertz.  The
   number is written as a floating point number or as an integer.  If
   omitted, the default value is 8000 Hz.

4.4  Procedures

   This section defines the procedures associated with the tones payload
   type.

4.4.1  Sending Procedures

   The sender MAY send an initial tones packet as soon as a tone is
   recognized, or MAY wait until a pre-negotiated packetization period
   has elapsed.  The first RTP packet for a tone SHOULD have the marker
   bit set to 1.

   In the case of longer-duration tones, the sender SHOULD generate
   multiple RTP packets for the same tone instance.  The RTP timestamp
   MUST be updated for each packet generated (in contrast, for instance,
   to the timestamp for packets carrying telephone-events).  Subsequent
   packets for the same tone SHOULD have the marker bit set to 0, and
   the RTP timestamp in each subsequent packet MUST equal the sum of the
   timestamp and the duration in the preceding packet.

   A final RTP packet MAY be generated as soon as the end of the tone is
   detected, without waiting for the latest packetization period to
   elapse.
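
   The timestamp and marker rules above can be illustrated by
   computing the header values for each packet of one tone.  The
   sketch assumes a fixed packetization period and a tone whose total
   length is already known, which a real sender usually does not have;
   the function name is hypothetical.

```python
def tone_packet_schedule(start_timestamp, total_duration, period):
    """Return (timestamp, duration, marker) for each packet of a tone.

    All values are in timestamp units.  The last packet may be
    shorter than the packetization period.
    """
    if not 0 < period <= 0xFFFF:
        raise ValueError("period must fit the 16-bit duration field")
    packets = []
    timestamp = start_timestamp
    remaining = total_duration
    marker = 1                      # set only on the first packet
    while remaining > 0:
        duration = min(period, remaining)
        packets.append((timestamp, duration, marker))
        # Next timestamp = this timestamp + this duration.
        timestamp = (timestamp + duration) % (1 << 32)
        remaining -= duration
        marker = 0
    return packets
```

   A tone of 1000 timestamp units starting at 16000 with a period of
   400 yields (16000, 400, 1), (16400, 400, 0) and (16800, 200, 0).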

   For increased reliability, the sender MAY combine new and old tone
   reports in the same RTP packet using RFC 2198 [2] audio redundancy.

4.4.2  Receiving Procedures

   Receiving implementations play out the tones as received.  When
   playing out successive tone reports for the same tone (marker bit is
   zero, the RTP timestamp is contiguous with that of the previous RTP
   packet, and payload content is identical), the receiving
   implementation SHOULD continue the tone without change or a break.
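
   The three conditions in parentheses above can be checked as shown
   below.  The record type and its field names are hypothetical.  In
   particular, "tone" stands for whatever the implementation compares
   as payload content (for example modulation, volume and the list of
   frequencies); treating the duration separately is an assumption of
   this sketch.

```python
from collections import namedtuple

# Hypothetical record of one received tone report.
ToneReport = namedtuple("ToneReport", "marker timestamp duration tone")


def continues_same_tone(previous, current):
    """True if current continues the tone reported by previous."""
    expected = (previous.timestamp + previous.duration) % (1 << 32)
    return (current.marker == 0
            and current.timestamp == expected
            and current.tone == previous.tone)
```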

5.  Application Considerations

5.1  Considerations On Selection Of Packetization Period For Events

   Note that according to RFC 3264 [4], the SDP a=ptime: attribute
   indicates the packetization period that the author of the session
   description expects when receiving media, and that this value does
   not have to be the same in both directions.  The appropriate period
   may vary with the application, since increased packetization periods
   imply increased playout delay and thereby increased end-to-end
   response times in instances where one end responds to events reported
   from the other.  The negotiations MAY specify such differences by
   separating events corresponding to different applications into
   different streams.  In the example below, events 0-15 are DTMF
   events, which have a fairly wide tolerance on timing.  Events 32-49
   and 52-60 are events related to data transmission and are subject to
   end-to-end response time considerations.  As a result, they are
   assigned a smaller packetization period than the DTMF events.

      m=audio 12344 RTP/AVP 99
      a=rtpmap:99 telephone-event/8000
      a=fmtp:99 0-15
      a=ptime:50
      m=audio 12346 RTP/AVP 100
      a=rtpmap:100 telephone-event/8000
      a=fmtp:100 32-49,52-60
      a=ptime:30
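
   A receiver of such a description needs to expand the event lists
   carried in the fmtp attributes.  The following sketch handles only
   the comma-separated values and ranges seen in the example; the
   function name is hypothetical and no other SDP parsing is shown.

```python
def parse_event_list(value):
    """Expand an event list such as "32-49,52-60" into a set of ints."""
    events = set()
    for item in value.split(","):
        item = item.strip()
        if "-" in item:
            low, high = (int(part) for part in item.split("-", 1))
            events.update(range(low, high + 1))   # ranges are inclusive
        else:
            events.add(int(item))
    return events
```

   Applied to the two lists in the example, this gives the sixteen
   DTMF events 0 through 15 for the first stream and 27 data-related
   events for the second.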

5.1.1  Interactions To Be Considered

   As a preliminary remark: to avoid gaps in playout (for any payload
   type), the receiver has to impose a playout delay equal to the
   largest expected time lapse between successive packets that it
   receives (leaving aside silence).  It is generally desirable to
   minimize playout delay.  The sender can help by maintaining a
   constant packetization period and packet dispatch interval.

   There is an interaction between the packetization period used by a
   sender, the playout delay used by the receiver, and the vulnerability
   of an event flow to packet losses.  Assuming packet losses are
   independent, a shorter packetization interval means that the receiver
   can use a smaller playout delay to recover from a given number of
   consecutive packet losses, at any stage of event playout.  This
   improves end-to-end response times in situations where that matters.
   Of course, this comes at the expense of more bandwidth for the
   session, which in itself increases the probability of packet loss.

   In fact, losses tend to come in bursts.  If these bursts have a
   significant probability of lasting more than one packetization
   period, reducing the packetization period simply means that more
   packets will be lost.  The storm must be weathered, and playout delay
   at the receiver is the primary mechanism available for that purpose.

   Assuming a playout delay and packetization period properly matched to
   the loss characteristics of the network, there is still one point of
   vulnerability: loss of the final event report and its
   retransmissions.  For events lasting less than one packetization
   period, such a loss would mean that the lost events never get played
   out.  For longer events, the loss means that the event playout
   duration will be incorrect.  If the use of RFC 2198 redundancy is
   appropriate, then as indicated in Section 2.6, it can be used to
   raise the number of final event retransmissions and period spanned by
   them to the values required to meet reliability objectives.

   All else being equal, it is preferable to minimize aggregate data
   rates by reporting more events per packet and reducing the level of
   redundancy used.  To give an idea of the bandwidth tradeoffs between
   packetization period and level of redundancy, consider a situation
   where, to achieve reliability objectives, it is necessary that final
   event reports and their retransmissions span a period of 100 ms
   (because the probability that no burst of losses will last longer
   than that is at the target level).  Suppose the average event
   duration is 3.33 ms (V.21 bits, for instance).  Table 3 shows
   combinations of packetization interval and level of redundancy that
   will meet the reliability requirement, and their impact on packet
   size and total IP bandwidth required.

   +---------------+------------+-----------+-------------+------------+
   | Packetization |  Levels of | Packets/s |   IP Packet |   Total IP |
   | Interval (ms) | Redundancy |           | Size (bits) |   Bit Rate |
   |               |            |           |             |   (bits/s) |
   +---------------+------------+-----------+-------------+------------+
   | 50            |      2     |        20 |        1928 |      38560 |
   |               |            |           |             |            |
   | 33.3          |      3     |        30 |        1800 |      54000 |
   |               |            |           |             |            |
   | 25            |      4     |        40 |        1752 |      70080 |
   |               |            |           |             |            |
   | 20            |      5     |        50 |        1736 |      86800 |
   +---------------+------------+-----------+-------------+------------+

       Table 3: Data Rate At the IP Level vs. Packetization Delay

   In this example, packet size is nearly constant even though the
   smaller packetization periods mean fewer events per generation.  (In
   fact, beyond five levels of redundancy it starts to increase.)  As a
   result, total bandwidth consumed at the IP level increases almost in
   direct proportion to the decrease in packetization period.  Under the
   assumed loss model, a packetization period smaller than 50 ms is
   unjustified.

   The alternative loss model mentioned above is one where loss bursts
   in the network are short enough that packet loss probabilities for
   successive packets appear to be independent.  In this case, the
   reliability problem boils down to having enough final event report
   transmissions to meet the probability objective.  Suppose it takes
   four packets to do so.  This calls for three levels of redundancy for
   final reports, one more than would be used otherwise.  As mentioned
   above, it could be seen as beneficial in this case to use a shorter
   packetization period.  Table 4 shows the data rates resulting from
   different packetization periods with the same level of redundancy.

   +---------------+------------+-----------+-------------+------------+
   | Packetization | Redundancy | Packets/s |   IP Packet |   Total IP |
   | Interval (ms) |            |           | Size (bits) |   Bit Rate |
   |               |            |           |             |   (bits/s) |
   +---------------+------------+-----------+-------------+------------+
   | 50            |      3     |        20 |        2440 |      48800 |
   |               |            |           |             |            |
   | 33.3          |      3     |        30 |        1800 |      54000 |
   |               |            |           |             |            |
   | 25            |      3     |        40 |        1480 |      59200 |
   |               |            |           |             |            |
   | 20            |      3     |        50 |        1288 |      64400 |
   +---------------+------------+-----------+-------------+------------+

    Table 4: Data Rate vs. Packetization Interval at Fixed Redundancy

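   The increase in bandwidth at shorter packetization periods is
   mitigated by the reduction in the number of events per packet: going
   from 50 ms to 20 ms multiplies the packet rate by 2.5 but raises the
   total bit rate by only about 32% (from 48800 to 64400 bits/s).  Thus
   improved end-to-end response times are achieved at reasonable cost in
   this example.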
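
   The packet sizes in Tables 3 and 4 can be reproduced with a short
   calculation.  The Python sketch below is illustrative only.  It
   assumes 4 octets per event report, an RFC 2198 header of 4 octets per
   redundant generation plus 1 octet for the primary block, and a fixed
   per-packet overhead of 52 octets.  The 52-octet figure is not stated
   in the text; it is an assumption, chosen because it reproduces the
   table entries.  The function names are hypothetical.

```python
EVENT_RATE = 300            # events/s for 3.33 ms events (e.g. V.21 bits)
EVENT_BITS = 32             # one 4-octet event report
FIXED_OVERHEAD_BITS = 416   # ASSUMPTION: 52 octets, inferred from the tables


def packet_bits(packets_per_s, redundancy):
    """IP packet size in bits for a packet rate and redundancy level."""
    generations = redundancy + 1            # primary plus redundant blocks
    # Event bits per generation; the division is exact for the rates
    # used in the tables (20, 30, 40 and 50 packets/s).
    per_generation = EVENT_BITS * EVENT_RATE // packets_per_s
    red_header_bits = 32 * redundancy + 8   # RFC 2198 block headers
    return per_generation * generations + red_header_bits + FIXED_OVERHEAD_BITS


def bit_rate(packets_per_s, redundancy):
    """Total IP bit rate in bits/s."""
    return packets_per_s * packet_bits(packets_per_s, redundancy)


if __name__ == "__main__":
    print("Table 3 (redundancy grows as the interval shrinks):")
    for pps, red in [(20, 2), (30, 3), (40, 4), (50, 5)]:
        print(pps, red, packet_bits(pps, red), bit_rate(pps, red))
    print("Table 4 (redundancy fixed at 3):")
    for pps in (20, 30, 40, 50):
        print(pps, 3, packet_bits(pps, 3), bit_rate(pps, 3))
```

   Under these assumptions the event data dominates the packet size at
   50 ms, while at 20 ms the fixed overhead and the RFC 2198 headers
   account for a larger share of each packet.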

5.2  Examples

   Events are usually sent in combination with or alternating with other
   payload types.  Payload negotiation may specify separate event and
   other payload streams, or may specify a combined stream that mixes
   other payload types with events using RFC 2198 [2] redundancy
   headers.  The purpose of using a combined stream may be for debugging
   or to ease the transition between general audio and events.

   Consider a DTMF dialling sequence, where the user dials the digits
   "911" and a sending gateway detects them.  The first digit is 200 ms
   long (1600 timestamp units) and starts at time 0; the second digit
   lasts 250 ms (2000 timestamp units) and starts at time 880 ms (7040
   timestamp units); the third digit is pressed at time 1.4 s (11,200
   timestamp units) and lasts 220 ms (1760 timestamp units).  The frame
   duration is 50 ms.
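
   The conversions between milliseconds and timestamp units in this
   example can be checked with the following Python sketch.  It assumes
   the 8000 Hz RTP clock rate implied by the figures above (200 ms
   corresponds to 1600 timestamp units); the names used are
   hypothetical.

```python
CLOCK_RATE_HZ = 8000  # assumed RTP clock rate for this example


def ms_to_timestamp_units(ms):
    """Convert a time in milliseconds to RTP timestamp units."""
    return ms * CLOCK_RATE_HZ // 1000


# (digit, start in ms, duration in ms) for the dialled sequence "911"
DIGITS = [("9", 0, 200), ("1", 880, 250), ("1", 1400, 220)]

if __name__ == "__main__":
    for digit, start_ms, duration_ms in DIGITS:
        print(digit,
              ms_to_timestamp_units(start_ms),
              ms_to_timestamp_units(duration_ms))
    # One 50 ms frame expressed in timestamp units.
    print("frame:", ms_to_timestamp_units(50))
```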

   Table 5 shows the complete sequence of events assuming that only the
   telephone-event payload type is being reported.  For simplicity, the
   timestamp is assumed to begin at 0 and the RTP sequence number at 1,
   and volume settings are omitted.

   +--------+----------+-------+-------+-------+-------+-------+-------+
   |   Time | Event    | M bit |  Time |   Seq | Event |  Dura | E bit |
   |   (ms) |          |       | stamp |    No |  Code |  tion |       |
   +--------+----------+-------+-------+-------+-------+-------+-------+
   |      0 | "9"      |       |       |       |       |       |       |
   |        | starts   |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |     50 | RTP      |  "1"  |     0 |     1 |   9   |   400 |  "0"  |
   |        | packet 1 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    100 | RTP      |  "0"  |     0 |     2 |   9   |   800 |  "0"  |
   |        | packet 2 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    150 | RTP      |  "0"  |     0 |     3 |   9   |  1200 |  "0"  |
   |        | packet 3 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    200 | RTP      |  "0"  |     0 |     4 |   9   |  1600 |  "0"  |
   |        | packet 4 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    200 | "9" ends |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    250 | RTP      |  "0"  |     0 |     5 |   9   |  1600 |  "1"  |
   |        | packet 4 |       |       |       |       |       |       |
   |        | first    |       |       |       |       |       |       |
   |        | retrans  |       |       |       |       |       |       |
   |        | mission  |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    300 | RTP      |  "0"  |     0 |     6 |   9   |  1600 |  "1"  |
   |        | packet 4 |       |       |       |       |       |       |
   |        | second   |       |       |       |       |       |       |
   |        | retrans  |       |       |       |       |       |       |
   |        | mission  |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    880 | First    |       |       |       |       |       |       |
   |        | "1"      |       |       |       |       |       |       |
   |        | starts   |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    930 | RTP      |  "1"  |  7040 |     7 |   1   |   400 |  "0"  |
   |        | packet 5 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    980 | RTP      |  "0"  |  7040 |     8 |   1   |   800 |  "0"  |
   |        | packet 6 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1030 | RTP      |  "0"  |  7040 |     9 |   1   |  1200 |  "0"  |
   |        | packet 7 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1080 | RTP      |  "0"  |  7040 |    10 |   1   |  1600 |  "0"  |
   |        | packet 8 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1130 | RTP      |  "0"  |  7040 |    11 |   1   |  2000 |  "0"  |
   |        | packet 9 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1130 | First    |       |       |       |       |       |       |
   |        | "1" ends |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1180 | RTP      |  "0"  |  7040 |    12 |   1   |  2000 |  "1"  |
   |        | packet 9 |       |       |       |       |       |       |
   |        | first    |       |       |       |       |       |       |
   |        | retrans  |       |       |       |       |       |       |
   |        | mission  |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1230 | RTP      |  "0"  |  7040 |    13 |   1   |  2000 |  "1"  |
   |        | packet 9 |       |       |       |       |       |       |
   |        | second   |       |       |       |       |       |       |
   |        | retrans  |       |       |       |       |       |       |
   |        | mission  |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1400 | Second   |       |       |       |       |       |       |
   |        | "1"      |       |       |       |       |       |       |
   |        | starts   |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1450 | RTP      |  "1"  | 11200 |    14 |   1   |   400 |  "0"  |
   |        | packet   |       |       |       |       |       |       |
   |        | 10 sent  |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1500 | RTP      |  "0"  | 11200 |    15 |   1   |   800 |  "0"  |
   |        | packet   |       |       |       |       |       |       |
   |        | 11 sent  |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1550 | RTP      |  "0"  | 11200 |    16 |   1   |  1200 |  "0"  |
   |        | packet   |       |       |       |       |       |       |
   |        | 12 sent  |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1600 | RTP      |  "0"  | 11200 |    17 |   1   |  1600 |  "0"  |
   |        | packet   |       |       |       |       |       |       |
   |        | 13 sent  |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1620 | Second   |       |       |       |       |       |       |
   |        | "1" ends |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1650 | RTP      |  "0"  | 11200 |    18 |   1   |  1760 |  "1"  |
   |        | packet   |       |       |       |       |       |       |
   |        | 14 sent  |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1700 | RTP      |  "0"  | 11200 |    19 |   1   |  1760 |  "1"  |
   |        | packet   |       |       |       |       |       |       |
   |        | 14 first |       |       |       |       |       |       |
   |        | retrans  |       |       |       |       |       |       |
   |        | mission  |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1750 | RTP      |  "0"  | 11200 |    20 |   1   |  1760 |  "1"  |
   |        | packet   |       |       |       |       |       |       |
   |        | 14       |       |       |       |       |       |       |
   |        | second   |       |       |       |       |       |       |
   |        | retrans  |       |       |       |       |       |       |
   |        | mission  |       |       |       |       |       |       |
   +--------+----------+-------+-------+-------+-------+-------+-------+

                  Table 5: Example of Event Reporting
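
   In Table 5, every report belonging to one event carries the same RTP
   timestamp, and the duration field grows until the event ends.  The
   Python sketch below shows one way a receiver might collapse these
   reports back into events, and how the retransmitted final reports
   protect against loss.  It is a simplification: the tuple layout and
   the function name are hypothetical, and reordering, playout timing
   and timeouts are not modeled.

```python
# (seq, m_bit, timestamp, event_code, duration, e_bit), copied from Table 5.
TABLE_5 = [
    (1, 1, 0, 9, 400, 0), (2, 0, 0, 9, 800, 0), (3, 0, 0, 9, 1200, 0),
    (4, 0, 0, 9, 1600, 0), (5, 0, 0, 9, 1600, 1), (6, 0, 0, 9, 1600, 1),
    (7, 1, 7040, 1, 400, 0), (8, 0, 7040, 1, 800, 0),
    (9, 0, 7040, 1, 1200, 0), (10, 0, 7040, 1, 1600, 0),
    (11, 0, 7040, 1, 2000, 0), (12, 0, 7040, 1, 2000, 1),
    (13, 0, 7040, 1, 2000, 1),
    (14, 1, 11200, 1, 400, 0), (15, 0, 11200, 1, 800, 0),
    (16, 0, 11200, 1, 1200, 0), (17, 0, 11200, 1, 1600, 0),
    (18, 0, 11200, 1, 1760, 1), (19, 0, 11200, 1, 1760, 1),
    (20, 0, 11200, 1, 1760, 1),
]


def reconstruct_events(packets, lost=frozenset()):
    """Return [(event_code, timestamp, duration, ended)] in arrival order."""
    events = {}  # keyed by (timestamp, event_code); dicts keep insertion order
    for seq, _m_bit, timestamp, code, duration, e_bit in packets:
        if seq in lost:
            continue  # simulate a packet dropped by the network
        longest, ended = events.get((timestamp, code), (0, False))
        events[(timestamp, code)] = (max(longest, duration),
                                     ended or bool(e_bit))
    return [(code, ts, dur, ended)
            for (ts, code), (dur, ended) in events.items()]


if __name__ == "__main__":
    print(reconstruct_events(TABLE_5))
    # Losing the final report and its first retransmission is harmless...
    print(reconstruct_events(TABLE_5, lost={18, 19}))
    # ...but losing all three leaves the last event short and unterminated.
    print(reconstruct_events(TABLE_5, lost={18, 19, 20}))
```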

   Table 6 shows the same sequence assuming that only the tone payload
   type is being reported.  Unlike the event reports of Table 5, each
   tone packet describes only the segment since the previous packet, so
   the RTP timestamp advances with every packet and no retransmissions
   are shown.  For simplicity, the timestamp is assumed to begin at 0
   and the sequence number at 1.  Volume, the T bit, and the modulation
   frequency are omitted; the T bit and the modulation frequency are
   always 0.

   +--------+----------+-------+-------+-------+-------+-------+-------+
   |   Time | Event    | M bit |  Time |   Seq | Dura  | Freq  | Freq  |
   |   (ms) |          |       | stamp |    No | tion  | 1     | 2     |
   |        |          |       |       |       |       | (Hz)  | (Hz)  |
   +--------+----------+-------+-------+-------+-------+-------+-------+
   |      0 | "9"      |       |       |       |       |       |       |
   |        | starts   |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |     50 | RTP      |  "1"  |     0 |     1 | 400   | 852   | 1477  |
   |        | packet 1 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    100 | RTP      |  "0"  |   400 |     2 | 400   | 852   | 1477  |
   |        | packet 2 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    150 | RTP      |  "0"  |   800 |     3 | 400   | 852   | 1477  |
   |        | packet 3 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    200 | RTP      |  "0"  |  1200 |     4 | 400   | 852   | 1477  |
   |        | packet 4 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    200 | "9" ends |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    880 | First    |       |       |       |       |       |       |
   |        | "1"      |       |       |       |       |       |       |
   |        | starts   |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    930 | RTP      |  "1"  |  7040 |     5 | 400   | 697   | 1209  |
   |        | packet 5 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |    980 | RTP      |  "0"  |  7440 |     6 | 400   | 697   | 1209  |
   |        | packet 6 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1030 | RTP      |  "0"  |  7840 |     7 | 400   | 697   | 1209  |
   |        | packet 7 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1080 | RTP      |  "0"  |  8240 |     8 | 400   | 697   | 1209  |
   |        | packet 8 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1130 | RTP      |  "0"  |  8640 |     9 | 400   | 697   | 1209  |
   |        | packet 9 |       |       |       |       |       |       |
   |        | sent     |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1130 | First    |       |       |       |       |       |       |
   |        | "1" ends |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1400 | Second   |       |       |       |       |       |       |
   |        | "1"      |       |       |       |       |       |       |
   |        | starts   |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1450 | RTP      |  "1"  | 11200 |    10 | 400   | 697   | 1209  |
   |        | packet   |       |       |       |       |       |       |
   |        | 10 sent  |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1500 | RTP      |  "0"  | 11600 |    11 | 400   | 697   | 1209  |
   |        | packet   |       |       |       |       |       |       |
   |        | 11 sent  |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1550 | RTP      |  "0"  | 12000 |    12 | 400   | 697   | 1209  |
   |        | packet   |       |       |       |       |       |       |
   |        | 12 sent  |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1600 | RTP      |  "0"  | 12400 |    13 | 400   | 697   | 1209  |
   |        | packet   |       |       |       |       |       |       |
   |        | 13 sent  |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1620 | Second   |       |       |       |       |       |       |
   |        | "1" ends |       |       |       |       |       |       |
   |        |          |       |       |       |       |       |       |
   |   1650 | RTP      |  "0"  | 12800 |    14 | 160   | 697   | 1209  |
   |        | packet   |       |       |       |       |       |       |
   |        | 14 sent  |       |       |       |       |       |       |
   +--------+----------+-------+-------+-------+-------+-------+-------+

                   Table 6: Example of Tone Reporting
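
   The rows of Table 6 follow from cutting each tone into 50 ms
   segments, with a shorter final segment where the tone does not end on
   a frame boundary.  The Python sketch below generates the packet
   fields on that basis.  The function and variable names are
   hypothetical, and only the two digits used in this example are
   mapped to frequencies.

```python
FRAME_MS = 50
UNITS_PER_MS = 8  # 8000 Hz RTP clock

# DTMF frequency pairs (Hz) for the digits used in the example.
DTMF_FREQS = {"9": (852, 1477), "1": (697, 1209)}

# (digit, start in ms, duration in ms) for the dialled sequence "911"
DIGITS = [("9", 0, 200), ("1", 880, 250), ("1", 1400, 220)]


def tone_packets(digits):
    """Return [(seq, m_bit, timestamp, duration, freq1, freq2)]."""
    packets = []
    seq = 1
    for digit, start_ms, duration_ms in digits:
        freq1, freq2 = DTMF_FREQS[digit]
        offset_ms = 0
        while offset_ms < duration_ms:
            # Full frame, or whatever remains of the tone.
            segment_ms = min(FRAME_MS, duration_ms - offset_ms)
            m_bit = 1 if offset_ms == 0 else 0  # set on the first segment
            packets.append((seq, m_bit,
                            (start_ms + offset_ms) * UNITS_PER_MS,
                            segment_ms * UNITS_PER_MS,
                            freq1, freq2))
            seq += 1
            offset_ms += segment_ms
    return packets


if __name__ == "__main__":
    for packet in tone_packets(DIGITS):
        print(packet)
```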

   Now consider a combined payload, where the tone payload is the
   primary payload type and the event payload is treated as a redundant
   encoding (one level of redundancy).  Because the primary payload is
   tones, the tone payload rules determine the setting of the RTP header
   fields.  This means that the RTP timestamp always advances.  As a
   corollary, the timestamp offset for the events payload in the RFC
   2198 header increases by the same amount.

   One issue that has to be considered in a combined payload is how to
   handle retransmissions of final event reports.  The tones payload
   specification does not recommend retransmissions of final packets, so
   it is unclear what to put in the primary payload fields of the
   combined packet.  In the interests of simplicity it is suggested that
   the retransmitted packets copy the fields relating to the primary
   payload (including the RTP timestamp) from the original packet.  The
   same principle can be applied if the packet includes multiple levels
   of event payload redundancy.

   The figures below all illustrate "RTP packet 14" in the above tables.
   Figure 3 shows an event-only payload, corresponding to Table 5.
   Figure 4 shows a tone-only payload, corresponding to Table 6.
   Finally, Figure 5 shows a combined payload, with tones primary and
   events as a single redundant layer.  Note that the combined payload
   has the RTP sequence numbers shown in Table 5, because the
   transmitted sequence includes the retransmitted packets.

   Figure 3 assumes that the following SDP specification was used.  This
   session description provides for separate streams of G.729 audio and
   events.  Packets reported within the G.729 stream are not considered
   here.

      m=audio 12344 RTP/AVP 99
      a=rtpmap:99 G729/8000
      a=ptime:20
      m=audio 12346 RTP/AVP 100
      a=rtpmap:100 telephone-event/8000
      a=fmtp:100 0-15
      a=ptime:50

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
      | 2 |0|0|   0   |0|    100      |            18                 |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      |                             11200                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      |                            0x5234a8                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     event     |E R| volume    |          duration             |
      |       1       |1 0|    20     |             1760              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 3: Example RTP Packet For Event Payload
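
   As an illustration, the packet of Figure 3 can be serialized with the
   Python standard library as sketched below.  The helper names
   rtp_header and event_payload are hypothetical; the field values are
   those shown in the figure.

```python
import struct


def rtp_header(payload_type, seq, timestamp, ssrc, marker=0):
    """Build a 12-octet RTP header with V=2, P=0, X=0, CC=0."""
    byte0 = 2 << 6
    byte1 = ((marker & 1) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)


def event_payload(event, end, volume, duration):
    """Build the 4-octet event payload (R bit left at 0)."""
    flags = ((end & 1) << 7) | (volume & 0x3F)
    return struct.pack("!BBH", event, flags, duration)


packet = (rtp_header(100, 18, 11200, 0x5234A8)
          + event_payload(event=1, end=1, volume=20, duration=1760))

if __name__ == "__main__":
    print(len(packet), packet.hex())
```

   All multi-octet fields are packed in network byte order, which is
   what the "!" prefix in the format strings selects.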

   Figure 4 assumes that an SDP specification similar to that of the
   previous case was used.

      m=audio 12344 RTP/AVP 99
      a=rtpmap:99 G729/8000
      a=ptime:20
      m=audio 12346 RTP/AVP 101
      a=rtpmap:101 tone/8000
      a=ptime:50

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
      | 2 |0|0|   0   |0|    101      |             14                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      |                             12800                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      |                            0x5234a8                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    modulation   |T|  volume   |          duration             |
      |        0        |0|    20     |             160               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |R R R R|       frequency       |R R R R|       frequency       |
      |0 0 0 0|          697          |0 0 0 0|         1209          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 4: Example RTP Packet For Tone Payload
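
   The tone payload of Figure 4 can be packed in a similar way.  The
   following Python sketch is illustrative only; the function name
   tone_payload is hypothetical, and the field widths are taken from the
   figure (9-bit modulation, T bit, 6-bit volume, 16-bit duration, then
   one 16-bit word per frequency with the top four bits reserved).

```python
import struct


def tone_payload(modulation, t_bit, volume, duration, frequencies):
    """Build the tone payload for the given field values."""
    first_word = (((modulation & 0x1FF) << 23)
                  | ((t_bit & 1) << 22)
                  | ((volume & 0x3F) << 16)
                  | (duration & 0xFFFF))
    payload = struct.pack("!I", first_word)
    for freq in frequencies:
        # The four reserved (R) bits above each frequency are sent as 0.
        payload += struct.pack("!H", freq & 0x0FFF)
    return payload


payload = tone_payload(modulation=0, t_bit=0, volume=20, duration=160,
                       frequencies=(697, 1209))

if __name__ == "__main__":
    print(len(payload), payload.hex())
```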

   Figure 5, for the combined payload, assumes the following SDP session
   description:

      m=audio 12344 RTP/AVP 99
      a=rtpmap:99 G729/8000
      a=ptime:20
      m=audio 12346 RTP/AVP 102 101 100
      a=rtpmap:102 red/8000/1
      a=fmtp:102 101/100
      a=rtpmap:101 tone/8000
      a=rtpmap:100 telephone-event/8000
      a=fmtp:100 0-15
      a=ptime:50

   For ease of presentation, Figure 5 presents the actual payloads as if
   they began on 32-bit boundaries.  In the actual packet, they follow
   immediately after the end of the 5-octet RFC 2198 header, and thus
   are displaced one octet into successive words.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
      | 2 |0|0|   0   |0|    102      |             18                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      |                             12800                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      |                            0x5234a8                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |F|   block PT  |  timestamp offset         |   block length    |
      |1|      100    |       1600                |        4          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |F|   block PT  |   event payload begins ...                    /
      |0|      101    |                                               \
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Event payload

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     event     |E R| volume    |          duration             |
      |       1       |1 0|    20     |             1760              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Tone payload

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    modulation   |T|  volume   |          duration             |
      |        0        |0|    20     |             160               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |R R R R|       frequency       |R R R R|       frequency       |
      |0 0 0 0|          697          |0 0 0 0|         1209          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 5: Example RTP Packet For Combined Tone and Event Payloads
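
   The RFC 2198 portion of Figure 5 can be assembled as in the Python
   sketch below, which places the redundant event block ahead of the
   primary tone block, as the figure does.  The helper names are
   hypothetical, and the two payload byte strings are simply the
   encodings of the event and tone field values shown in the figure.

```python
import struct


def red_block_header(block_pt, ts_offset, block_len):
    """4-octet RFC 2198 header for a redundant block (F bit set)."""
    word = ((1 << 31) | ((block_pt & 0x7F) << 24)
            | ((ts_offset & 0x3FFF) << 10) | (block_len & 0x3FF))
    return struct.pack("!I", word)


def red_primary_header(block_pt):
    """1-octet RFC 2198 header for the primary block (F bit clear)."""
    return struct.pack("!B", block_pt & 0x7F)


# Payload octets for the field values shown in Figure 5.
EVENT = bytes.fromhex("019406e0")         # event 1, E=1, vol 20, dur 1760
TONE = bytes.fromhex("001400a002b904b9")  # vol 20, dur 160, 697/1209 Hz

rtp_timestamp = 12800    # start of the tone segment (primary payload)
event_timestamp = 11200  # start of the event (redundant payload)

headers = (red_block_header(100, rtp_timestamp - event_timestamp, len(EVENT))
           + red_primary_header(101))
payload = headers + EVENT + TONE

if __name__ == "__main__":
    print(len(headers), len(payload), payload.hex())
```

   The headers occupy five octets, which is why the payloads that follow
   do not fall on 32-bit boundaries in the actual packet.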

6.  Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification (RFC 3550 [5]), and any appropriate RTP profile (for
   example RFC 3551 [11]).  This implies that confidentiality of the
   media streams is achieved by encryption.  Because the data
   compression used with this payload format is applied end-to-end,
   encryption may be performed after compression so there is no conflict
   between the two operations.

   This payload type does not exhibit any significant non-uniformity in
   the receiver-side computational complexity of packet processing that
   could give rise to a denial-of-service threat.

   Additional security considerations are described in RFC 2198 [2].

   A security review of this payload format found no additional
   considerations.

7.  IANA Considerations

   This document defines two new RTP payload formats, named
   telephone-event and tone, and associated Internet media (MIME) types,
   audio/telephone-event and audio/tone.  It also defines the event
   codes for DTMF tone events.

   Within the audio/telephone-event type, events MUST be registered with
   IANA.  Registrations are subject to approval by the current chair of
   the IETF audio/video transport working group, or by an expert
   designated by the transport area director if the AVT group has
   closed.  The initial registry content is shown in Table 7, and
   consists of the events defined in Section 3 of this document.

   The meaning of new events MUST be documented either as an RFC or an
   equivalent standards document produced by another standardization
   body, such as ITU-T.  The documentation for each event MUST indicate
   whether the event is a state, tone, or other type of event (e.g., an
   out-of-band electrical event such as on-hook or an indication that
   will not itself be played out as tones at the receiving end).  For
   tone events, the documentation MUST indicate whether the volume field
   is applicable or must be set to 0.

   Legal event codes range from 0 to 255.

   +-----------------+-------------------------------+-----------------+
   |      Event Code | Event Name                    |       Reference |
   +-----------------+-------------------------------+-----------------+
   |               0 | DTMF digit "0"                |      <This RFC> |
   |                 |                               |                 |
   |               1 | DTMF digit "1"                |      <This RFC> |
   |                 |                               |                 |
   |               2 | DTMF digit "2"                |      <This RFC> |
   |                 |                               |                 |
   |               3 | DTMF digit "3"                |      <This RFC> |
   |                 |                               |                 |
   |               4 | DTMF digit "4"                |      <This RFC> |
   |                 |                               |                 |
   |               5 | DTMF digit "5"                |      <This RFC> |
   |                 |                               |                 |
   |               6 | DTMF digit "6"                |      <This RFC> |
   |                 |                               |                 |
   |               7 | DTMF digit "7"                |      <This RFC> |
   |                 |                               |                 |
   |               8 | DTMF digit "8"                |      <This RFC> |
   |                 |                               |                 |
   |               9 | DTMF digit "9"                |      <This RFC> |
   |                 |                               |                 |
   |              10 | DTMF digit "*"                |      <This RFC> |
   |                 |                               |                 |
   |              11 | DTMF digit "#"                |      <This RFC> |
   |                 |                               |                 |
   |              12 | DTMF digit "A"                |      <This RFC> |
   |                 |                               |                 |
   |              13 | DTMF digit "B"                |      <This RFC> |
   |                 |                               |                 |
   |              14 | DTMF digit "C"                |      <This RFC> |
   |                 |                               |                 |
   |              15 | DTMF digit "D"                |      <This RFC> |
   +-----------------+-------------------------------+-----------------+

           Table 7: audio/telephone-event Event Code Registry
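
   As an illustration only, the initial registry content in Table 7 can
   be held in a simple lookup table.  The Python sketch below is not
   part of the registry; the names DTMF_EVENT_CODES and
   event_code_for_digit are hypothetical and chosen for this example.

```python
# Illustrative mapping of DTMF digits to the event codes of Table 7.
# The names used here are hypothetical, not defined by this document.
DTMF_EVENT_CODES = {
    **{str(d): d for d in range(10)},  # digits "0".."9" map to 0..9
    "*": 10,
    "#": 11,
    "A": 12,
    "B": 13,
    "C": 14,
    "D": 15,
}


def event_code_for_digit(digit):
    """Return the Table 7 event code for one DTMF digit character."""
    try:
        return DTMF_EVENT_CODES[digit]
    except KeyError:
        raise ValueError(f"not a DTMF digit: {digit!r}") from None
```

   The sketch accepts only the exact characters shown in the table;
   how an application treats other input (for example, lower-case
   letters) is left open here.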

7.1  MIME Registration

7.1.1  audio/telephone-event

   MIME media type name: audio

   MIME subtype name: telephone-event

   Required parameters: none.

   Optional parameters:

   The "events" parameter lists the events supported by the
   implementation.  Events are listed as one or more comma-separated
   elements.  Each element can be either a single integer or two
   integers separated by a hyphen, denoting an inclusive range of
   event numbers.  No white space is allowed in the argument.  The
   integers designate the event numbers supported by the
   implementation.
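
   The following Python sketch shows one way a receiver might expand
   an "events" value into a set of event numbers.  It is an
   illustration under stated assumptions, not a normative parser: the
   function name parse_events is hypothetical, a hyphenated pair is
   read as an inclusive range, descending ranges are rejected, and
   values are checked against the legal event code range of 0 to 255.

```python
import re

# One element: a single integer, or two integers joined by a hyphen.
_ELEMENT = re.compile(r"([0-9]+)(?:-([0-9]+))?")


def parse_events(value):
    """Expand an "events" value such as "0-15,66,70" into a set of ints.

    Hypothetical helper for illustration; raises ValueError on input
    that does not fit the format described above.
    """
    events = set()
    for element in value.split(","):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            # Covers empty elements, white space and stray characters.
            raise ValueError(f"malformed element: {element!r}")
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) is not None else low
        # Assumption: ranges are ascending and stay within 0..255.
        if low > high or high > 255:
            raise ValueError(f"event range out of bounds: {element!r}")
        events.update(range(low, high + 1))
    return events
```

   For example, parse_events("0-15") yields the sixteen DTMF events of
   Table 7, while a value containing white space, such as "0, 1", is
   rejected.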

   The "rate" parameter describes the sampling rate, in Hertz.  The
   number is written as a floating point number or as an integer.  If
   omitted, the default value is 8000 Hz.

   Encoding considerations:

   This type is only defined for transfer via RTP [5].

   Security considerations:

   See the "Security Considerations" section (Section 6) in this
   document.

   Interoperability considerations: none

   Published specification: This document.

   Applications which use this media:

   The telephone-event audio subtype supports the transport of events
   occurring in telephone systems over the Internet.

   Additional information:
   1.  Magic number(s): N/A
   2.  File extension(s): N/A
   3.  Macintosh file type code: N/A

7.1.2  audio/tone

   MIME media type name: audio

   MIME subtype name: tone

   Required parameters: none

   Optional parameters:

   The "rate" parameter describes the sampling rate, in Hertz.  The
   number is written as a floating point number or as an integer.  If
   omitted, the default value is 8000 Hz.

   Encoding considerations:

   This type is only defined for transfer via RTP [5].

   audio/tone MIME body parts contain binary data.  A
   content-transfer-encoding of "binary" is strongly encouraged for
   messaging environments which support binary transport.  A
   content-transfer-encoding of "base64" (and the associated
   transformation) is strongly encouraged for messaging environments
   which do not support binary transfer.

   Security considerations:

   See the "Security Considerations" section (Section 6) in this
   document.

   Interoperability considerations: none

   Published specification: This document.

   Applications which use this media: The tone audio subtype supports
   the transport of pure composite tones, for example those commonly
   used in the current telephone system to signal call progress.

   Additional information:
   1.  Magic number(s): N/A
   2.  File extension(s): N/A
   3.  Macintosh file type code: N/A

8.  Acknowledgements

   The suggestions of the Megaco working group are gratefully
   acknowledged.  Detailed advice and comments were provided by Hisham
   Abdelhamid, Flemming Andreasen, Fred Burg, Steve Casner, Dan
   Deliberato, Fatih Erdin, Bill Foster, Mike Fox, Mehryar Garakani,
   Gunnar Hellstrom, Rajesh Kumar, Terry Lyons, Steve Magnell, Zarko
   Markov, Kai Miao, Satish Mundra, Kevin Noll, Vern Paxson, Oren Peleg,
   Colin Perkins, Raghavendra Prabhu, Moshe Samoha, Todd Sherer, Adrian
   Soncodi, Yaakov Stein, Mira Stevanovic, Alex Urquizo and Herb
   Wildfeur.

9.  References

9.1  Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [2]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M.,
        Bolot, J., Vega-Garcia, A. and S. Fosse-Parisis, "RTP payload
        for redundant audio data", RFC 2198, September 1997.

   [3]  Handley, M. and V. Jacobson, "SDP: Session Description
        Protocol", RFC 2327, April 1998.

   [4]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
        the Session Description Protocol (SDP)", RFC 3264, June 2002.

   [5]  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications",
        STD 64, RFC 3550, July 2003.

   [6]  Casner, S. and P. Hoschka, "MIME Type Registration of RTP
        Payload Formats", RFC 3555, July 2003.

   [7]  International Telecommunication Union, "Technical features of
        push-button telephone sets", ITU-T Recommendation Q.23, November
        1988.

   [8]  International Telecommunication Union, "Multifrequency
        push-button signal reception", ITU-T Recommendation Q.24,
        November 1988.

9.2  Informative References

   [9]   Hellstrom, G., "RTP Payload for Text Conversation", RFC 2793,
         May 2000.

   [10]  Schulzrinne, H. and S. Petrack, "RTP Payload for DTMF Digits,
         Telephony Tones and Telephony Signals", RFC 2833, May 2000.

   [11]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
         Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

   [12]  International Telecommunication Union, "Technical
         characteristics of tones for the telephone service", ITU-T
         Recommendation E.180/Q.35, March 1998.

   [13]  International Telecommunication Union, "Pulse code modulation
         (PCM) of voice frequencies", ITU-T Recommendation G.711,
         November 1988.

   [14]  International Telecommunication Union, "Speech coders : Dual
         rate speech coder for multimedia communications transmitting at
         5.3 and 6.3 kbit/s", ITU-T Recommendation G.723.1, March 1996.

   [15]  International Telecommunication Union, "Coding of speech at 8
         kbit/s using conjugate-structure algebraic-code-excited
         linear-prediction (CS-ACELP)", ITU-T Recommendation G.729,
         March 1996.

   [16]  International Telecommunication Union, "ISDN user-network
         interface layer 3 specification for basic call control", ITU-T
         Recommendation Q.931, May 1998.

   [17]  International Telecommunication Union, "Procedures for
         real-time Group 3 facsimile communication over IP networks",
         ITU-T Recommendation T.38, July 2003.

   [18]  International Telecommunication Union, "Procedures for starting
         sessions of data transmission over the public switched
         telephone network", ITU-T Recommendation V.8, November 2000.

   [19]  International Telecommunication Union, "Modem-over-IP networks:
         Procedures for the end-to-end connection of V-series DCEs",
         ITU-T Recommendation V.150.1, January 2003.

   [20]  International Telecommunication Union, "Procedures for
         supporting Voice-Band Data over IP Networks", ITU-T
         Recommendation V.152, January 2005.

Authors' Addresses

   Henning Schulzrinne
   Columbia U.
   Dept. of Computer Science
   Columbia University
   1214 Amsterdam Avenue
   New York, NY  10027
   US

   Email: schulzrinne@cs.columbia.edu

   Scott Petrack
   eDial
   266 Second Ave
   Waltham, MA  02451
   US

   Email: scott.petrack@edial.com

   Tom Taylor
   Nortel
   1852 Lorraine Ave
   Ottawa, Ontario  K1H 6Z8
   CA

   Email: taylor@nortel.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2005).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.