Audio/Video Transport (avt)
   Internet Draft                                        H. Schulzrinne
   Document: draft-ietf-avt-rfc2833bis-06.txt               Columbia U.
                                                             S. Petrack
                                                                  eDial
                                                              T. Taylor
                                                        Nortel Networks
   Expires: April 2005                                    November 2004


    RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals


Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of section 3 of RFC 3667.  By submitting this Internet-Draft, each
   author represents that any applicable patent or other IPR claims of
   which he or she is aware have been or will be disclosed, and any of
   which he or she become aware will be disclosed, in accordance with
   RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
        http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
        http://www.ietf.org/shadow.html.

Abstract

   This memo describes how to carry dual-tone multifrequency (DTMF)
   signaling, other tone signals and telephony events in RTP packets.
   This memo captures and expands upon the basic framework defined in
   RFC 2833, but retains only the most basic event codepoints.  Other
   codepoints are documented separately.








Schulzrinne, Petrack     Expires - April 2005                 [Page 1]


                    RTP Events and Tones Payloads       November 2004


Table of Contents

   1.    Introduction................................................4
      1.1   Terminology..............................................4
      1.2   Overview.................................................4
      1.3   Potential Applications...................................5
      1.4   Events, States, Tone Patterns, and Voice Encoded Tones...6

   2.    RTP Payload Format for Named Telephone Events...............7
      2.1   Introduction.............................................7
      2.2   Use of RTP Header Fields.................................8
         2.2.1 Timestamp.............................................8
         2.2.2 Marker Bit............................................8
      2.3   Payload Format...........................................8
         2.3.1 Event Field...........................................8
         2.3.2 E ("End") Bit.........................................8
         2.3.3 R Bit.................................................8
         2.3.4 Volume Field..........................................9
         2.3.5 Duration Field........................................9
      2.4   Optional MIME Parameters.................................9
         2.4.1 Relationship to SDP..................................10
      2.5   Procedures..............................................10
         2.5.1 Sending Procedures...................................10
         2.5.1.1  Negotiation of Payloads...........................10
         2.5.1.2  Transmission of Event Packets.....................11
         2.5.1.3  Long Duration Events..............................12
         2.5.1.4  Retransmission of Final Packet....................12
         2.5.1.5  Packing Multiple Events Into One Packet...........12
         2.5.1.6  RTP Sequence Number...............................13
         2.5.2 Receiving Procedures.................................13
         2.5.2.1  Indication of Receiver Capabilities using SDP.....13
         2.5.2.2  Playout of Tone Events............................13
         2.5.2.3  Long Duration Events..............................15
         2.5.2.4  Multiple Events In a Packet.......................15
         2.5.2.5  Soft States.......................................16
      2.6   Reliability.............................................16
         2.6.1 Intra-Event Updates..................................16
         2.6.2 Multi-Event Redundancy...............................16

   3. Specification of Codepoints For Telephone Events..............17
      3.1   DTMF Events.............................................18
      3.2   Data Modem and Fax Events...............................19
         3.2.1 V.21 Events..........................................20
         3.2.2 V.8 Events...........................................22
         3.2.3 V.25 Events..........................................23
         3.2.4 T.30 Events..........................................25


   4. RTP Payload Format for Telephony Tones........................28
      4.1   Introduction............................................28
      4.2   Examples of Common Telephone Tone Signals...............29
      4.3   Use of RTP Header Fields................................30
         4.3.1 Timestamp............................................30
         4.3.2 Marker Bit...........................................30
         4.3.3 Payload Format.......................................30
         4.3.4 Optional MIME Parameters.............................32
      4.4   Procedures..............................................32
         4.4.1 Sending Procedures...................................32
         4.4.2 Receiving Procedures.................................33

   5. Application Considerations....................................34
      5.1   Combining Tones and Named Events........................34
      5.2   Simultaneous Generation of Audio and Events.............34
      5.3   Strategies For Handling FAX and Modem Signals...........35
      5.4   Examples................................................36
         5.4.1 Use of RFC 2198 Redundancy With Named Events.........36
         5.4.2 Combined Tone and Telephone-event Payloads...........38

   6. MIME Registration.............................................40
      6.1   audio/telephone-event...................................40
      6.2   audio/tone..............................................41

   7. Security Considerations.......................................42

   8. IANA Considerations...........................................42

   9. Acknowledgements..............................................44

   10.   Authors  ..................................................44

   12.   References.................................................45
      12.1  Normative References....................................45
      12.2  Informative References..................................46

1. Introduction

1.1   Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
   and "OPTIONAL" are to be interpreted as described in RFC 2119 [N-1]
   and indicate requirement levels for compliant implementations.

   Normative references appear as [N-n], while informative references
   appear as [I-n].  All references are at the end of this memo.

   This document uses the following abbreviations:

   DTMF  Dual Tone Multifrequency

   IVR   Interactive Voice Response unit

   PSTN  Public Switched (circuit) Telephone Network

1.2   Overview

   This memo defines two RTP [N-4] payload formats, one for carrying
   dual-tone multifrequency (DTMF) digits and other line and trunk
   signals as events (section 2), and a second one to describe general
   multi-frequency tones in terms only of their frequency and cadence
   (section 4). Separate RTP payload formats for telephony tone signals
   are desirable since low-rate voice codecs cannot be guaranteed to
   reproduce these tone signals accurately enough for automatic
   recognition. In addition, tone properties such as the phase reversals
   in the ANSam tone will not survive speech coding.  Defining separate
   payload formats also permits higher redundancy while maintaining a
   low bit rate.  Finally, some telephony events such as "on-hook" occur
   out-of-band and cannot be transmitted as tones.

   The remainder of this section provides the motivation for defining
   the payload types described in this document.  Section 2 defines the
   payload format and associated procedures for use of named events.
   Section 3 describes the events for which codepoints are defined in
   this document. Section 4 describes the payload format and associated
   procedures for tone representations.  Section 5 deals with
   achievement of reliable delivery through redundancy and the use of
   combined payloads. Section 6 provides the MIME media type
   registrations for the two payload formats.  Section 7 deals with
   security considerations, and section 8 defines the IANA requirements
   for registration of codepoints for named telephone events.







1.3   Potential Applications

   The payload formats described here may be useful in a number of
   different scenarios.

   On the sending side, there are two basic possibilities: either the
   sending side is an end system which originates the signals itself, or
   it is a gateway with the task of propagating incoming telephone
   signals into the Internet.

   On the receiving side there are more possibilities.  The first is
   that the receiver must propagate tone signalling accurately into the
   PSTN for machine consumption.  One example of this is a gateway
   passing DTMF tones to an IVR.  In this scenario, frequencies,
   amplitudes, tone durations, and the durations of pauses between tones
   are all significant, and individual tone signals must be delivered
   reliably and in order.

   In the second scenario, the receiver must play out tones for human
   consumption.  Typically, rather than a series of tone signals each
   with its own meaning, the content will consist of a single sequence
   of tones and possibly silence, played out continuously or repeated
   cyclically for some period of time.  Often the end of the tone
   playout will be triggered by an event fed back in the other
   direction, using either in- or out-of-band means.  Examples of this
   are dial tone or busy tone.

   The relationship between locality and the tones to be played out is a
   complicating factor in this scenario.  In the phone network, tones
   are generated at different places, depending on the switching
   technology and the nature of the tone. This determines, for example,
   whether a person making a call to a foreign country hears the local
   tones she is familiar with or the tones used in the country called.

   For analog lines, dial tone is always generated by the local switch.
   ISDN terminals may generate dial tone locally and then send a Q.931
   [I-7] SETUP message containing the dialed digits. If the terminal
   just sends a SETUP message without any Called Party digits, then the
   switch collects the digits, which the terminal provides as KEYPAD
   messages, and provides dial tone over the B-channel. The terminal can
   either use the audio signal on the B-channel or can use the Q.931
   messages to trigger locally generated dial tone.

   Ringing tone (also called ringback tone) is generated by the local
   switch at the callee, with a one-way voice path opened up as soon as
   the callee's phone rings. (This reduces the chance of clipping the
   called party's response just after answer. It also permits pre-answer
   announcements or in-band call-progress indications to reach the
   caller before or in lieu of a ringing tone.) Congestion tone and
   special information tones can be generated by any of the switches
   along the way, and may be generated by the caller's switch based on
   ISUP messages received. Busy tone is generated either by the caller's
   switch (for analog instruments), triggered by the appropriate ISUP
   message, or by the ISDN terminal.

   In the third scenario, an end system is directly connected to the
   Internet and does not need to generate tone signals again, so that
   time alignment and power levels are not relevant. These systems rely
   on PSTN gateways or Internet end systems to generate DTMF events and
   do not perform their own audio waveform analysis. An example of such
   a system is an Internet interactive voice-response (IVR) system.

   In circumstances where exact timing alignment between the audio
   stream and the DTMF digits or other events is not important and data
   is sent unicast, such as the IVR example mentioned earlier, it may be
   preferable to use a reliable control protocol rather than RTP
   packets. In those circumstances, this payload format would not be
   used.

   Note that in a number of these cases it is possible that the gateway
   or end system will be both a sender and receiver of telephone
   signals.  Sometimes the same class of signals will be sent as
   received -- in the case of "RTP trunking" or voiceband data, for
   instance.  In other cases, such as that of an end system serving
   analogue lines, the signals sent will be in a different class from
   those received.

1.4   Events, States, Tone Patterns, and Voice Encoded Tones

   This document provides the means for in-band transport over the
   Internet of two broad classes of signalling information: in-band
   tones or tone sequences, and signals sent out-of-band in the PSTN.
   Three methods, two of which are defined by this document, are
   available for carrying tone signals; only one of the three can be
   used to carry out-of-band PSTN signals.  Depending on the
   application, it may be desirable to carry the signalling information
   in more than one form at once.  Section 5 discusses when and how this
   should be done.

   1) The gateway or end system can upspeed to a higher-bandwidth codec
      such as G.711 [I-3] when tone signals are to be conveyed.
      Alternatively, for FAX or modem signals respectively, a
      specialized transport such as T.38 [I-8], RFC 2793 [I-1], or
      V.150.1 modem relay [I-17] may be used.

   2) The sending gateway can simply measure the frequency components of
      the voice band signals and transmit this information to the RTP
      receiver using the tone representation defined in this document
      (section 4). In this mode, the gateway makes no attempt to discern
      the meaning of the tones, but simply distinguishes tones from
      speech signals. An end system may use the same approach using
      configured rather than measured frequencies.

      All tone signals in use in the PSTN and meant for human
      consumption are sequences of simple combinations of sine waves,
      either added or modulated. (There is at least one tone, however,
      the ANSam tone [N-11] used for indicating data transmission over
      voice lines, that makes use of periodic phase reversals.)

   3) As a third option, a gateway can recognize the tones and translate
      them into a name, such as ringing or busy tone or DTMF digit '0'
      (section 2). The receiver then produces a tone signal or other
      indication appropriate to the signal. Generally, since the
      recognition of signals at the sender often depends on their on/off
      pattern or the sequence of several tones, this recognition can
      take several seconds. On the other hand, the gateway may have
      access to the actual signaling information that generates the
      tones and thus can generate the RTP packet immediately, without
      the detour through acoustic signals.

      The use of named events is the only feasible method for
      transmitting out-of-band PSTN signals as content within RTP
      sessions.



2. RTP Payload Format for Named Telephone Events

2.1   Introduction

   The RTP payload format for named telephone events is designated as
   "telephone-event", the MIME type as "audio/telephone-event". In
   accordance with current practice, this payload format does not have a
   static payload type number, but uses an RTP payload type number
   established dynamically and out-of-band. The default clock frequency
   is 8000 Hz, but the clock frequency can be redefined when assigning
   the dynamic payload type.

   Named telephone events are carried as part of the audio stream, and
   MUST use the same sequence number and time-stamp base as the regular
   audio channel to simplify the generation of audio waveforms at a
   gateway. The named telephone events payload type can be considered to
   be a very highly-compressed audio codec, and is treated the same as
   other codecs.






2.2   Use of RTP Header Fields

2.2.1 Timestamp

   The RTP timestamp reflects the measurement point for the current
   packet. The event duration described in section 2.3.5 extends
   forwards from that time. For events that span multiple RTP packets,
   the RTP timestamp identifies the beginning of the event, i.e.,
   several RTP packets may carry the same timestamp. For long-lasting
   events that have to be split into subevents (see below, section
   2.5.1.3), the timestamp indicates the beginning of the subevent.

2.2.2 Marker Bit

   The RTP marker bit indicates the beginning of a new event. For long-
   lasting events that have to be split into subevents (see below,
   section 2.5.1.3), only the first subevent will have the marker bit
   set.

2.3   Payload Format

   The payload format for named telephone events is shown in Figure 1.

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |     event     |E|R| volume    |          duration             |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 1: Payload Format for Named Events
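
   As an illustration only, the layout in Figure 1 can be packed and
   parsed as sketched below.  The names NamedEvent, pack_event and
   unpack_event are hypothetical and not part of this specification;
   the sketch assumes one event per payload and network byte order.

```python
import struct
from typing import NamedTuple


class NamedEvent(NamedTuple):
    event: int      # 8-bit event codepoint (0-255)
    end: bool       # "E" bit
    volume: int     # 6-bit volume field (0-63)
    duration: int   # 16-bit duration, in timestamp units


def pack_event(ev: NamedEvent) -> bytes:
    """Serialize one named event into its 4-octet payload."""
    if not (0 <= ev.event <= 255 and 0 <= ev.volume <= 63
            and 0 <= ev.duration <= 0xFFFF):
        raise ValueError("field out of range")
    # Second octet: E bit (0x80), R bit (0x40, sent as zero), 6-bit volume.
    flags_volume = (0x80 if ev.end else 0x00) | ev.volume
    return struct.pack("!BBH", ev.event, flags_volume, ev.duration)


def unpack_event(payload: bytes) -> NamedEvent:
    """Parse a 4-octet payload; the reserved R bit is ignored."""
    event, flags_volume, duration = struct.unpack("!BBH", payload[:4])
    return NamedEvent(event, bool(flags_volume & 0x80),
                      flags_volume & 0x3F, duration)
```

   For instance, event 1 with the "E" bit set, a volume field of 10 and
   a duration of 800 timestamp units serializes to the octets 01 8A 03
   20.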

2.3.1 Event Field

   The event field is a number between 0 and 255 identifying a specific
   telephony event.  An IANA registry of codepoints for this field has
   been established (see IANA Considerations, section 8).  The initial
   content of this registry consists of the events defined in section 3.

2.3.2 E ("End") Bit

   If set to a value of one, the "end" bit indicates that this packet
   contains the end of the event.  For long-lasting events that have to
   be split into subevents (see below, section 2.5.1.3), only the final
   packet for the final subevent will have the "E" bit set.

2.3.3 R Bit

   This field is reserved for future use. The sender MUST set it to
   zero, the receiver MUST ignore it.




2.3.4 Volume Field

   For DTMF digits and other events representable as tones, this field
   describes the power level of the tone, expressed in dBm0 after
   dropping the sign. Power levels range from 0 to -63 dBm0. Thus,
   larger values denote lower volume. This value is defined only for
   events for which the documentation indicates that volume is
   applicable.  For other events, the sender MUST set volume to zero and
   the receiver MUST ignore the value.
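
   The sign convention can be illustrated with a small sketch.  The
   function names are hypothetical, and rounding a fractional level to
   the nearest integer is an assumption of this example rather than
   something the text above specifies.

```python
def volume_field_from_dbm0(level_dbm0: float) -> int:
    """Convert a power level (0 down to -63 dBm0) to the volume field."""
    magnitude = round(-level_dbm0)   # drop the sign; rounding is assumed
    if not 0 <= magnitude <= 63:
        raise ValueError("level must lie between 0 and -63 dBm0")
    return magnitude


def dbm0_from_volume_field(volume: int) -> int:
    """Recover the power level in dBm0 from the volume field."""
    return -volume
```

   A level of -20 dBm0 is thus carried as the value 20, and larger
   field values correspond to quieter tones.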

2.3.5 Duration Field

   The duration field indicates the duration of the event or subevent
   being reported, in timestamp units, expressed as an unsigned integer.
   For a non-zero value, the event or subevent began at the instant
   identified by the RTP timestamp and has so far lasted as long as
   indicated by this parameter. The event may or may not have ended.  If
   the event duration exceeds the maximum representable by the duration
   field, the event is split into several contiguous subevents as
   described below (section 2.5.1.3).

   The special duration value of zero is reserved to indicate that the
   event lasts "forever", i.e., is a state and is considered to be
   effective until updated.  A sender MUST NOT transmit a zero duration
   for events other than those defined as states.  The receiver SHOULD
   ignore an event report with zero duration if the event is not a
   state.

   Events defined as states MAY contain a non-zero duration, indicating
   that the sender intends to refresh the state before the time duration
   has elapsed ("soft state").

      For a sampling rate of 8000 Hz, the duration field is sufficient
      to express event durations of up to approximately 8 seconds.
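
   The figure of approximately 8 seconds follows from dividing the
   largest 16-bit value by the clock rate.  A minimal sketch, with a
   hypothetical function name:

```python
MAX_DURATION_UNITS = 0xFFFF  # largest value of the 16-bit duration field


def max_event_seconds(clock_rate_hz: int = 8000) -> float:
    """Longest event (or subevent) one duration field can express."""
    return MAX_DURATION_UNITS / clock_rate_hz
```

   At 8000 Hz this gives 65535 / 8000 = 8.191875 seconds; doubling the
   clock rate halves the expressible duration.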

2.4   Optional MIME Parameters

   As indicated in the MIME registration for named events in section
   6.1, the telephone-event MIME type supports two optional parameters:
   the "events" parameter, and the "rate" parameter.

   The "events" parameter lists the events supported by the
   implementation. Events are listed as one or more comma-separated
   elements. Each element can either be a single integer or an integer
   followed by a hyphen and a larger integer, representing a range of
   consecutive event codepoints. No white space is allowed in the
   argument. The integers designate the event numbers supported by the
   implementation.
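
   The syntax of the "events" parameter can be parsed as sketched
   below.  The function name parse_events is hypothetical; rejecting
   codepoints above 255 is an assumption drawn from the width of the
   event field.

```python
from typing import Set


def parse_events(value: str) -> Set[int]:
    """Parse a list such as "0-15,66,70" into a set of codepoints."""
    events: Set[int] = set()
    for element in value.split(","):
        first, sep, last = element.partition("-")
        # No white space is allowed, so every piece must be pure digits.
        if not first.isdigit() or (sep and not last.isdigit()):
            raise ValueError(f"bad element: {element!r}")
        low = int(first)
        high = int(last) if sep else low
        # A range needs a larger second integer; codepoints fit in 8 bits.
        if (sep and high <= low) or high > 255:
            raise ValueError(f"bad range: {element!r}")
        events.update(range(low, high + 1))
    return events
```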





   The "rate" parameter describes the sampling rate, in Hertz, and hence
   the units for the RTP timestamp and event duration fields. The number
   is written as a floating point number or as an integer. If omitted,
   the default value is 8000 Hz.

2.4.1 Relationship to SDP

   The recommended mapping of MIME optional parameters to SDP is given
   in section 3 of RFC 3555 [N-5].  The "rate" MIME parameter for the
   named event payload type follows this convention: it is expressed as
   usual as the <clock rate> component of the a=rtpmap: attribute line.

   The "events" MIME parameter deviates from the convention suggested in
   RFC 3555 because it omits the string "events=" before the list of
   supported events.

      a=fmtp:<format> <list of values>

   The list of values has the format described above for the MIME
   parameter.  The list does not have to be sorted.

   For example, if the payload format uses the payload type number 100,
   and the implementation can handle the DTMF tones (events 0 through
   15) and the dial and ringing tones, it would include the following
   description in its SDP message:

      m=audio 12345 RTP/AVP 100
      a=rtpmap:100 telephone-event/8000
      a=fmtp:100 0-15,66,70

   The following sample media type definition corresponds to the SDP
   example above:

      audio/telephone-event;events="0-15,66,70";rate="8000"
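
   As a rough sketch, an implementation might generate the SDP lines
   of the example from a set of supported codepoints.  The helper
   names format_events and sdp_lines are hypothetical, and collapsing
   consecutive codepoints into ranges is a choice of this example (the
   list does not have to be sorted or compacted).

```python
from typing import List, Set


def format_events(events: Set[int]) -> str:
    """Collapse codepoints into the comma-separated list/range syntax."""
    ordered = sorted(events)
    parts: List[str] = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to the end of the run of consecutive codepoints.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        parts.append(str(ordered[i]) if i == j
                     else f"{ordered[i]}-{ordered[j]}")
        i = j + 1
    return ",".join(parts)


def sdp_lines(payload_type: int, events: Set[int], port: int,
              rate: int = 8000) -> List[str]:
    """Build the m=, a=rtpmap and a=fmtp lines for telephone-event."""
    return [
        f"m=audio {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} telephone-event/{rate}",
        f"a=fmtp:{payload_type} {format_events(events)}",
    ]
```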

2.5   Procedures

   This section defines the procedures associated with the named event
   payload type.  Additional procedures may be specified in the
   documentation associated with specific event codepoints.

2.5.1 Sending Procedures

2.5.1.1  Negotiation of Payloads

   Negotiation of payloads between sender and receiver is achieved by
   out-of-band means, using SDP, for example.






   The sender SHOULD indicate what events it supports, using the
   optional "events" parameter associated with the telephone-event MIME
   type.  If the sender receives an "events" parameter from the
   receiver, it MUST restrict the set of events it sends to those listed
   in the received "events" parameter.  For backward compatibility, if
   no "events" parameter is received, the sender SHOULD assume support
   for the DTMF events 0-15 but for no other events.
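
   The selection rule above can be sketched as a set intersection.
   The names are hypothetical; received_events stands for the already
   parsed "events" parameter from the receiver, or None if none was
   received.

```python
from typing import Optional, Set

DTMF_EVENTS = frozenset(range(16))  # events 0-15, the fallback set


def sendable_events(supported: Set[int],
                    received_events: Optional[Set[int]]) -> Set[int]:
    """Events the sender may transmit toward this receiver."""
    allowed = DTMF_EVENTS if received_events is None else received_events
    return supported & allowed
```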

2.5.1.2  Transmission of Event Packets

   DTMF digits and named telephone events are carried as part of the
   audio stream, and MUST use the same sequence number and time-stamp
   base as the regular audio channel to simplify the generation of audio
   waveforms at a gateway.

   An audio source SHOULD start transmitting event packets as soon as it
   recognizes an event, and continue to send updates until the event has
   ended. The update packet MUST have the same RTP timestamp value as
   the initial packet for the event, but the duration MUST be increased
   to reflect the total cumulative duration since the beginning of the
   event.

   The first packet for an event MUST have the "M" bit set.  The final
   packet for an event MUST have the "E" bit set, but setting of the "E"
   bit MAY be deferred until the final packet is retransmitted (see
   section 2.5.1.4).  Intermediate packets for an event MUST NOT have
   either the "M" bit or the "E" bit set.

   Sending of a packet with the "E" bit set is OPTIONAL if the packet
   reports two events which are defined as mutually exclusive states, or
   if the final packet for one state is immediately followed by a packet
   reporting a mutually exclusive state.  (For events defined as states,
   the appearance of a mutually exclusive state implies the end of the
   previous state.)

   A source has wide latitude as to how often it sends event updates. A
   natural interval is the spacing between non-event audio packets.
   (Recall that a single RTP packet can contain multiple audio frames
   for frame-based codecs and that the packet interval can vary during a
   session.) Alternatively, a source MAY decide to use a different
   spacing for event updates, called an event period, with a value of
   50 ms RECOMMENDED.

   DTMF digits and events are sent incrementally to avoid having the
   receiver wait for the completion of the event. Since some tones are
   two seconds long, waiting for completion would incur a substantial
   delay. The transmitter does not know if event length is important
   and thus needs to transmit immediately and incrementally. If the
   receiver application does not care about event length, the
   incremental transmission mechanism avoids delay. Some applications,
   such as gateways into the PSTN, care about both delays and event
   duration.

   For robustness, the sender SHOULD retransmit "state" events
   periodically.

   Timing information is contained in the RTP timestamp, allowing
   precise recovery of inter-event times.  Thus, the sender does not
   need to maintain precise or consistent time intervals between event
   packets.
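
   The update rules of this section can be sketched for a single
   event as follows.  This is an offline illustration with
   hypothetical names: it assumes the total duration is already known
   and fits in the duration field, whereas a live sender would emit
   each update as the event progresses.  Retransmission of the final
   packet is not shown here.

```python
from typing import Iterator, NamedTuple


class EventPacket(NamedTuple):
    seq: int         # RTP sequence number
    timestamp: int   # RTP timestamp (start of the event)
    marker: bool     # RTP "M" bit
    end: bool        # payload "E" bit
    duration: int    # cumulative duration, in timestamp units


def event_updates(start_ts: int, total_duration: int, first_seq: int,
                  clock_rate: int = 8000,
                  period_ms: int = 50) -> Iterator[EventPacket]:
    """Yield the update packets for one event, one per event period."""
    step = clock_rate * period_ms // 1000
    seq, elapsed, first = first_seq, 0, True
    while elapsed < total_duration:
        elapsed = min(elapsed + step, total_duration)
        # Same timestamp in every packet; only the duration grows.
        yield EventPacket(seq & 0xFFFF, start_ts, first,
                          elapsed == total_duration, elapsed)
        seq += 1
        first = False
```

   With the 50 ms event period and an 8000 Hz clock, a 125 ms event
   (1000 timestamp units) produces three packets with durations 400,
   800 and 1000, all carrying the same timestamp.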

2.5.1.3  Long Duration Events

   If an event persists beyond the maximum duration expressible in the
   duration field (0xFFFF), the sender MUST send a packet reporting this
   maximum duration but MUST NOT set the "E" bit in this packet.  The
   sender MUST then begin reporting a new "subevent" with the RTP
   timestamp set to the time at which the previous subevent ended and
   the duration set to the cumulative duration of the new subevent.  The
   "M" bit of the first packet reporting the new subevent MUST NOT be
   set.  The sender MUST repeat this procedure as required until the end
   of the complete event has been reached.  The final packet for the
   complete event MUST have the "E" bit set (either on initial
   transmission or on retransmission as described below).
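
   The splitting procedure can be illustrated as below.  The function
   name and the tuple layout are hypothetical; each tuple summarizes
   one subevent by its timestamp, its final reported duration, whether
   its first packet carries the "M" bit, and whether its final packet
   carries the "E" bit.

```python
from typing import List, Tuple

MAX_DURATION = 0xFFFF  # largest value of the duration field


def split_into_subevents(
        start_ts: int, total_duration: int
) -> List[Tuple[int, int, bool, bool]]:
    """Split a long event into contiguous subevents."""
    subevents = []
    offset = 0
    while offset < total_duration:
        duration = min(MAX_DURATION, total_duration - offset)
        is_first = offset == 0                           # "M" bit here only
        is_last = offset + duration == total_duration    # "E" bit here only
        # Each subevent starts where the previous one ended (32-bit wrap).
        subevents.append(((start_ts + offset) & 0xFFFFFFFF, duration,
                          is_first, is_last))
        offset += duration
    return subevents
```

   An event of 140000 timestamp units starting at timestamp 0 is thus
   reported as subevents of 65535, 65535 and 8930 units beginning at
   timestamps 0, 65535 and 131070.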

2.5.1.4  Retransmission of Final Packet

   The final packet for each event and for each subevent SHOULD be sent
   a total of three times at the interval used by the source for
   updates. (If a new event is recognized during the retransmissions and
   RFC 2198  [N-2] is in use, the old event will be part of the
   redundancy in the RFC 2198 payloads.) This ensures that the duration
   of the event or subevent can be recognized correctly even if an
   instance of the last packet is lost.

   A sender MAY delay setting the "E" bit until retransmitting the last
   packet for a tone, rather than setting the bit on its first
   transmission. This avoids having to wait to detect whether the tone
   has indeed ended.  Once the sender has set the "E" bit for a packet,
   it MUST continue to set the "E" bit for any further retransmissions
   of that packet.
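
   A minimal sketch of the retransmission rule for the final packet
   of an event follows.  The function name and tuple layout are
   hypothetical; each tuple is (sequence number, timestamp, "E" bit,
   duration).  Each copy takes a fresh sequence number, in line with
   section 2.5.1.6.

```python
from typing import List, Tuple


def final_packet_copies(
        first_seq: int, timestamp: int, duration: int,
        defer_end_bit: bool = False, copies: int = 3
) -> List[Tuple[int, int, bool, int]]:
    """Final packet of an event plus its retransmissions."""
    out = []
    for i in range(copies):
        # With deferral, only the first transmission lacks the "E" bit;
        # once set, it stays set on every later copy.
        end = (i > 0) if defer_end_bit else True
        out.append(((first_seq + i) & 0xFFFF, timestamp, end, duration))
    return out
```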

2.5.1.5  Packing Multiple Events Into One Packet

   Multiple named events can be packed into a single RTP packet if and
   only if the events are consecutive and contiguous, i.e., occur
   without overlap and without pause between them, and if the last event
   packed into a packet occurs quickly enough to avoid excessive delays
   at the receiver.




      This approach is similar to having multiple frames of frame-based
      audio in one RTP packet.

   The constraint that packed events not overlap implies that events
   designated as states can be followed in a packet only by other state
   events which are mutually exclusive to them.  The constraint itself
   is needed so that the beginning time of each event can be calculated
   at the receiver.

   In a packet containing events packed in this way, the RTP timestamp
   MUST identify the beginning of the first event or subevent in the
   packet.  The "M" bit MUST be set (since the packet records the
   beginning of at least one event).  The "E" bit and duration for each
   event in the packet MUST be set using the same rules as if that event
   were the only event contained in the packet.
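
   Because packed events are contiguous, a receiver can recover the
   start of each one by accumulating durations from the RTP timestamp.
   A sketch with a hypothetical function name:

```python
from typing import List


def event_start_times(rtp_timestamp: int,
                      durations: List[int]) -> List[int]:
    """Start timestamp of each event packed into one packet."""
    starts = []
    ts = rtp_timestamp
    for d in durations:
        starts.append(ts & 0xFFFFFFFF)  # RTP timestamps are 32 bits wide
        ts += d
    return starts
```

   For a packet with timestamp 1000 carrying three events of 400, 400
   and 160 units, the events begin at 1000, 1400 and 1800.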

   For events with a duration shorter than a typical packet interval,
   for example, V.21 bits, it is RECOMMENDED that multiple events be
   represented by a single RFC 2198 [N-2] packet, as described in
   section 5.

2.5.1.6  RTP Sequence Number

   The RTP sequence number MUST be incremented by one in each successive
   RTP packet sent.  Incrementing applies to retransmitted as well as
   initial instances of event reports, to permit the receiver to detect
   lost packets for RTCP receiver reports.

2.5.2 Receiving Procedures



2.5.2.1  Indication of Receiver Capabilities using SDP

   Receivers can indicate which named events they can handle, for
   example, by using the Session Description Protocol (RFC 2327 [N-3]).
   SDP descriptions using the event payload MUST contain an fmtp format
   attribute that lists the event values that the receiver can process.

2.5.2.2  Playout of Tone Events

   In the gateway scenario, an Internet telephony gateway connecting a
   packet voice network to the PSTN recreates the DTMF or other tones
   and injects them into the PSTN. Since, for example, DTMF digit
   recognition takes several tens of milliseconds, the first few
   milliseconds of a digit will arrive as regular audio packets. Thus,
   careful time and power (volume) alignment between the audio samples
   and the events is needed to avoid generating spurious digits at the
   receiver.  Playout when audio packets continue to arrive as the event
   proceeds is discussed further in section 5.2 below.



Schulzrinne, Petrack     Expires - April 2005                [Page 13]


                    RTP Events and Tones Payloads       November 2004




   Receiver implementations MAY use different algorithms to create
   tones, including the two described here. Note that not all
   implementations have the need to recreate a tone; some may only care
   about recognizing the events.

   In the first algorithm, the receiver simply places a tone of the
   given duration in the audio playout buffer at the location indicated
   by the timestamp. As additional packets are received that extend the
   same tone, the waveform in the playout buffer is extended
   accordingly. (Care has to be taken if audio is mixed, i.e., summed,
   in the playout buffer rather than simply copied.) Thus, if a packet
   in a tone lasting longer than the packet interarrival time gets lost
   and the playout delay is short, a gap in the tone may occur.

   Alternatively, the receiver can start a tone and play it until it
   receives a packet with the "E" bit set, until the next tone arrives
   (distinguished by a different timestamp value), or until a given
   time period elapses. This is more robust against packet loss, but may
   extend the tone beyond its original duration if all retransmissions
   of the last packet in an event are lost. Limiting the period over
   which the tone is extended is necessary to prevent a tone from
   "getting stuck". This algorithm is not a license for senders to set
   the duration field to zero; it MUST be set to the current duration as
   described, since this is needed to create accurate events if the
   first event packet is lost, among other reasons.

   Regardless of the algorithm used, the tone SHOULD NOT be extended by
   more than three packet interarrival times. A slight extension of tone
   durations and shortening of pauses is generally harmless.

   If a receiver has extended a tone by the maximum extension duration
   and started playing silence, it MUST NOT resume playing the tone when
   later packets for that event arrive, as this would cause spurious
   events to be detected downstream.
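
   One possible reading of the second algorithm, together with the
   extension limit and the rule against resuming a lapsed tone, is
   sketched below.  It is not a definitive implementation: the class
   and method names are hypothetical, all times are expressed in RTP
   timestamp units on the playout clock, and timestamp wraparound is
   ignored for brevity.

```python
MAX_EXTENSION_PACKETS = 3  # limit stated above: three interarrival times


class TonePlayout:
    """Tracks one tone at a time (sketch; no timestamp wraparound)."""

    def __init__(self, packet_interval):
        self.packet_interval = packet_interval
        self.timestamp = None   # RTP timestamp of the event being tracked
        self.reported_end = 0   # timestamp + most recently reported duration
        self.ended = False      # True once a packet with the "E" bit arrived
        self.lapsed = False     # True once the extension limit was reached

    def _expired(self, now):
        if self.lapsed:
            return True
        limit = self.reported_end
        if not self.ended:
            limit += MAX_EXTENSION_PACKETS * self.packet_interval
        if now >= limit and not self.ended:
            self.lapsed = True  # silence has started; never resume
        return now >= limit

    def on_packet(self, now, timestamp, duration, end_bit):
        if timestamp != self.timestamp:
            # A different timestamp identifies the next tone.
            self.timestamp = timestamp
            self.reported_end = timestamp + duration
            self.ended = end_bit
            self.lapsed = False
        elif not self._expired(now):
            # Update for the tone in progress: extend it.
            self.reported_end = max(self.reported_end, timestamp + duration)
            self.ended = self.ended or end_bit
        # Otherwise the event has lapsed or finished: ignore the report.

    def is_playing(self, now):
        if self.timestamp is None or self._expired(now):
            return False
        return now >= self.timestamp
```

   With a 50 ms packet interval at 8000 Hz (400 timestamp units), a
   tone whose last report ends at timestamp 1400 is played until 2600
   at the latest if no further packet arrives.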

   If a receiver receives an event packet for an event which it is not
   currently playing out and the packet does not have the "M" bit set,
   earlier packets for that event have evidently been lost.  This can
   also be determined by gaps in the RTP sequence number.  The receiver
   MAY determine on the basis of retained history and the timestamp and
   event code of the current packet that it corresponds to an event
   already played out and lapsed.  In that case further reports for the
   event MUST be ignored, as indicated in the previous paragraph.  If
   this is not so, the receiver MAY attempt to play the event out to the
   complete duration indicated in the event report.  The appropriate
   behaviour will depend on the event type concerned, and requires
   consideration of the relationship of the event to audio media flows



Schulzrinne, Petrack     Expires - April 2005                [Page 14]


                    RTP Events and Tones Payloads       November 2004


   and whether correct event duration is essential to the correct
   operation of the media session.

   A receiver SHOULD NOT rely on a particular event packet spacing, but
   instead MUST use the event timestamps and durations to determine
   timing and duration of playout.

   The receiver MUST calculate jitter for RTCP receiver reports based on
   all packets with a given timestamp. Note: The jitter value should
   primarily be used as a means for comparing the reception quality
   between two users or two time-periods, not as an absolute measure.

   If a zero volume is indicated for an event for which the volume field
   is defined, then the receiver MAY reconstruct the volume from the
   volume of non-event audio or MAY use the nominal value specified by
   the ITU Recommendation or other document defining the tone.  This
   ensures backwards compatibility with RFC 2833, where the volume field
   was defined only for DTMF events.

2.5.2.3  Long Duration Events

   If an event report is received with duration equal to the maximum
   duration expressible in the duration field (0xFFFF) and the "E" bit
   for the report is not set, the event report may mark the end of a
   subevent generated according to the procedures of section 2.5.1.3.
   If another report for the same event type is received, the receiver
   MUST compare the RTP timestamp for the new event with the sum of the
   RTP timestamp of the previous report plus the duration (0xFFFF).  The
   receiver uses the absence of a gap between the events to detect that
   it is receiving a single long-duration event.

   The total duration of a long duration event is (obviously) the sum of
   the durations of the subevents used to report it.  This is equal to
   the duration of the final subevent (as indicated in the final packet
   for that subevent), plus 0xFFFF multiplied by the number of subevents
   preceding the final subevent.
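
   The contiguity check and the duration sum described above can be
   sketched as follows.  The function names are hypothetical, and the
   32-bit wraparound of the RTP timestamp is an assumption taken from
   RTP itself rather than from this section.

```python
MAX_DURATION = 0xFFFF
TS_MODULUS = 1 << 32  # RTP timestamps are 32 bits wide


def continues_previous(prev_timestamp, prev_duration, prev_end_bit,
                       new_timestamp):
    """True if a new report of the same event type starts exactly where a
    maximum-length subevent without the "E" bit left off."""
    return (prev_duration == MAX_DURATION
            and not prev_end_bit
            and new_timestamp == (prev_timestamp + MAX_DURATION) % TS_MODULUS)


def total_duration(preceding_subevents, final_duration):
    """Total duration of a long event, in timestamp units."""
    return preceding_subevents * MAX_DURATION + final_duration
```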

2.5.2.4  Multiple Events In a Packet

   The procedures of section 2.5.1.5 require that if multiple events are
   reported in the same packet, they are contiguous and non-overlapping.
   As a result, it is not strictly necessary for the receiver to know
   the start times of the events following the first one in order to
   play them out -- it needs only to respect the duration reported for
   each event.  Nevertheless, if knowledge of the start time for a given
   event after the first one is required, it is equal to the sum of the
   start time of the preceding event plus the duration of the preceding
   event.
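
   A minimal sketch of that calculation, assuming the durations are
   taken from the packed events in order and that the RTP timestamp
   wraps at 32 bits (the function name is hypothetical):

```python
def event_start_times(rtp_timestamp, durations):
    """Start time of each packed event, in RTP timestamp units.

    The first event starts at the RTP timestamp of the packet; each
    later event starts where the preceding one ended.
    """
    starts = []
    start = rtp_timestamp
    for duration in durations:
        starts.append(start)
        start = (start + duration) % (1 << 32)
    return starts
```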



2.5.2.5  Soft States

   If the duration of a soft state event expires, the receiver SHOULD
   consider the value of the state to be "unknown" unless otherwise
   indicated in the event documentation (e.g., in section 3).

2.6   Reliability

   The named event mechanism uses three complementary redundancy
   mechanisms to deal with lost packets:

   Intra-event updates:

      Events that last longer than one event period (e.g., 50 ms) are
      updated periodically, so that the receiver can reconstruct the
      event and its duration if it receives any of the update packets,
      albeit with delay. This mechanism is described in section 2.6.1
      and is most helpful for longer events.

   Repeat last event packet:

      As described in section 2.5.1.4, the last event packet is
      transmitted a total of three times if there is no subsequent
      event. This mechanism is applicable for widely-spaced events.

   Multi-event redundancy:

      Section 2.6.2 describes how a summary of earlier events MAY be
      carried in RFC 2198 redundancy payloads.  This is particularly
      useful for sequences of short events, e.g., digits dialed by a
      modem or autodialer or in-band tone signaling sequences (section
      3.2 or 3.5).

2.6.1 Intra-Event Updates

   During an event, the RTP event payload format provides incremental
   updates on the event. The error resiliency afforded by this mechanism
   depends on whether the first or second algorithm in section 2.5.2.2
   is used and on the playout delay at the receiver. For example, if the
   receiver uses the first algorithm and only places the current
   duration of tone signal in the playout buffer, for a playout delay of
   120 ms and a packet gap of 50 ms, two packets in a row can get lost
   without causing a premature end of the tone generated.
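
   The figure in the example can be reproduced with a simple
   back-of-the-envelope model.  The model is an assumption, not part of
   this specification: each update adds one packet gap of tone to the
   playout buffer, and that newly added portion begins playing one
   playout delay after the packet arrives.  The buffer then runs dry
   one packet gap plus one playout delay after the last update
   received, so n consecutive losses are survivable as long as
   (n + 1) * gap <= gap + delay.

```python
def tolerable_consecutive_losses(playout_delay_ms, packet_gap_ms):
    """Largest n satisfying (n + 1) * gap <= gap + playout delay.

    Simplified model only; real behaviour also depends on network
    jitter and on how the receiver measures its playout delay.
    """
    return playout_delay_ms // packet_gap_ms
```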

2.6.2 Multi-Event Redundancy

   The audio redundancy mechanism described in RFC 2198 [N-2] MAY be
   used to recover from packet loss across events. For the suggested
   packet gap of 50 ms, the effective data rate is r times 64 bits (32
   bits for the redundancy header and 32 bits for the telephone-event
   payload) plus 8 bits for the primary encoding every 50 ms or (r times
   1280 + 160) bits/second, where r is the number of redundant events
   carried in each packet. The value of r is an implementation trade-
   off, with a value of 5 suggested.

   The timestamp offset in this redundancy scheme has 14 bits, so that
   it allows a single packet to "cover" 2.048 seconds of telephone
   events at a sampling rate of 8000 Hz. Including the starting time of
   previous events allows precise reconstruction of the tone sequence at
   a gateway. The scheme is resilient to consecutive packet losses
   spanning this interval of 2.048 seconds or r digits, whichever is
   less. Note that for previous digits, only an average loudness can be
   represented.
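
   The arithmetic behind the data rate and the 2.048 second figure can
   be checked with a few lines of code.  The function names and default
   arguments below are illustrative only.

```python
def redundancy_bit_rate(r, packet_gap_ms=50):
    """Payload bit rate in bits/second for r redundant events per packet.

    Each redundant event costs 64 bits (32-bit redundancy header plus
    32-bit telephone-event payload); the primary encoding adds 8 bits.
    """
    bits_per_packet = r * 64 + 8
    return bits_per_packet * 1000 / packet_gap_ms


def offset_coverage_seconds(offset_bits=14, clock_rate_hz=8000):
    """Time span addressable by the redundancy timestamp offset field."""
    return (1 << offset_bits) / clock_rate_hz
```

   For the suggested value r = 5 this gives 6560 bits/second.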

   An encoder MAY treat the event payload as a highly-compressed version
   of the current audio frame. In that mode, each RTP packet during an
   event would contain the current audio codec rendition (say, G.723.1
   [I-4] or G.729 [I-5]) of this digit as well as the representation
   described in section 2, plus any previous events seen earlier.

      This approach allows dumb gateways that do not understand this
      format to function. See also the discussion in section 1.

   The payload format described here achieves a higher redundancy even
   in the case of sustained packet loss than the method proposed for the
   Voice over Frame Relay Implementation Agreement [I-18].  In short,
   senders generate updates at regular intervals, thus ensuring that
   each event is transmitted multiple times. RFC 2198 [N-2] is used to
   recover events where all packets sent during the event have been
   lost.



3. Specification of Codepoints For Telephone Events

   This document defines two classes of named events:

   1) DTMF tones (section 3.1);

   2) data and fax-related tones (section 3.2).

   It is intended that other RFCs define additional events, and in
   particular define and update the events present in RFC 2833 but not
   documented here.

   The tables listing the event codepoints for each class indicate
   whether the respective events are states, tones, or other.  For tone
   events, the tables indicate whether the volume field is applicable or
   must be set to 0.

3.1   DTMF Events

   DTMF signalling [N-8] is typically generated by a telephone set or
   possibly by a PBX.  DTMF digits may be consumed by entities such as
   gateways or application servers in the IP network, or by entities
   such as telephone switches or IVRs in the circuit switched network.

   The DTMF events support two possible applications at the sending end,
   and two at the receiving end.  In the first application at the
   sending end, the Internet telephony gateway detects DTMF on the
   incoming circuits and sends the RTP payload described here instead of
   regular audio packets. The gateway likely has the necessary digital
   signal processors and algorithms, as it often needs to detect DTMF,
   e.g., for two-stage dialing. Having the gateway detect tones relieves
   the receiving Internet end system from having to do this work and
   also avoids having low bit-rate codecs like G.723.1 [I-4] render DTMF
   tones unintelligible. In the second application, an Internet end
   system such as an "Internet phone" can emulate DTMF functionality
   without concerning itself with generating precise tone pairs and
   without imposing the burden of tone recognition on the receiver.

   A similar distinction occurs at the receiving end.  In the gateway
   scenario, an Internet telephony gateway connecting a packet voice
   network to the PSTN recreates the DTMF tones or other telephony
   events and injects them into the PSTN. In the end system scenario,
   the DTMF events are consumed by the receiving entity itself.

   Table 1 shows the DTMF-related named event codepoints within the
   telephone-event payload format. The DTMF digits 0-9 and * and # are
   commonly supported.  DTMF digits A through D are less frequently
   encountered, typically in special applications such as military
   networks.

   ITU-T Recommendation Q.24 [N-9], Table A-1, indicates that the legacy
   switching equipment in the countries surveyed expects a minimum
   recognizable signal duration of 40 ms, a minimum pause between
   signals of 40 ms, and a maximum signalling rate of 8 to 10 digits per
   second depending on the country.

       Event        Encoding        Type         Volume?
                    (decimal)

       0--9         0--9            tone         yes

       *            10              tone         yes

       #            11              tone         yes

       A--D         12--15          tone         yes


                        Table 1: DTMF named events
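
   Table 1 translates directly into a lookup table.  The sketch below
   is illustrative; the names are hypothetical, and accepting
   lower-case letters a through d is a convenience not required by
   this document.

```python
# Event codes from Table 1: digits 0-9 -> 0-9, "*" -> 10, "#" -> 11,
# A-D -> 12-15.
DTMF_EVENT_CODES = {ch: code for code, ch in enumerate("0123456789*#ABCD")}


def dtmf_events(digits):
    """Map a dialed string to the corresponding list of event codes."""
    return [DTMF_EVENT_CODES[d] for d in digits.upper()]
```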



3.2   Data Modem and Fax Events

   This section defines a few of the control events and tones that can
   appear on a subscriber line serving a fax machine or modem.  Their
   purpose is to support negotiation, start-up and takedown of FAX and
   modem sessions and transitions between operating modes.  The actual
   FAX and modem content are carried by other payload types (e.g., G.711
   [I-3], T.38 [I-8], or, in specific circumstances, V.150.1 [I-17]
   modem relay, RFC 2793 [I-1], or CLEARMODE [I-2]).  The events are
   organized into several groups, corresponding to the ITU-T
   Recommendation in which they are defined.

   NOTE: implementors SHOULD NOT rely on the descriptions of the various
   modem protocols described below without consulting the original
   references (generally ITU-T Recommendations).  The descriptions are
   provided in this document to give a context for the use of the events
   defined here.  They frequently omit important details needed for
   implementation.

   The typical application of these events is to allow the Internet to
   serve as a bridge between terminals operating on the PSTN.  This
   application is characterized as follows:

    - each gateway will act both as sender and as receiver;

    - time constraints apply to the exchange of signals, making the
      early identification and reporting of events desirable so that
      receiver playout can proceed in timely fashion;

    - transfer of the events must be reliable.

   In some cases, an implementation may simply ignore certain events,
   such as fax tones, that do not make sense in a particular
   environment.  Section 2.4.1 specifies how an implementation can use
   the SDP "fmtp" parameter within an SDP description to indicate its
   inability to understand a particular event or range of events.

   Regardless of which events they support, implementations MUST be
   prepared to send and receive data signals using payload types other
   than telephone-event, simultaneously with the use of the latter.
   This is discussed further in section 5.3.

   A further word on time constraints is in order.  Time constraints
   governing the duration of tones do not pose a problem when using the
   telephone-events payload type: the payload specifies the duration and
   the receiving gateway can play out the tones accordingly.  Problems
   come when time constraints are specified for the duration of silence
   between tones.  A silent period of "at least x ms" is not a problem
   -- event notifications can be received late, but they can still be
   played out at their specified durations.

   The problem arises with requirements of silence for "exactly" some
   period or for "at most" some period.  The most general constraint of
   the latter type has to do with the operation of echo suppressors
   (ITU-T Rec. G.164 [N-6]) and echo cancellers (ITU-T Rec. G.165
   [N-7]).  These devices may re-activate after as little as 100 ms of
   no signal on the line.  As a result, in any situation where echo
   suppressors or cancellers must be disabled for signalling to work,
   tone events must be reported quickly enough to ensure that these
   devices do not become re-enabled.  This principle is reflected in the
   succeeding sections.

3.2.1 V.21 Events

   V.21 [N-12] is a modem protocol offering data transmission at a
   maximum rate of 300 bits/s.  Two channels are defined, supporting
   full duplex data transmission if required.  One channel uses
   frequencies 980 Hz for "1" and 1180 Hz for "0"; the other channel
   uses frequencies 1650 Hz for "1" and 1850 Hz for "0".  The modem can
   operate synchronously or asynchronously.

   V.21 is used by other protocols (e.g., V.8bis, V.18, T.30) for
   transmission of control data, and is also used in its own right
   between text terminals.  The telephone-events payload type SHOULD NOT
   be used to carry user data as opposed to control data -- other
   payload types such as G.711 [I-3], RFC 2793 [I-1], or V.150.1 [I-17]
   modem relay are more suitable for that purpose.  The V.21 events are
   summarized in Table 2.

   Sending implementations MUST report a completed event for every bit
   transmitted (i.e., rather than at transitions between "0" and "1").


   Implementations SHOULD pack multiple events into one packet, using
   the procedures of section 2.5.1.5.  Eight to ten bits is a reasonable
   packetization interval.

   Reliable transmission of V.21 events is important, to prevent data
   corruption.  Reporting an event per bit rather than per transition
   increases reporting redundancy and thus reporting reliability, since
   each event completion is retransmitted three times as described in
   section 2.5.1.4.  To reduce the number of packets required for
   reporting, implementations SHOULD carry the retransmitted events
   using RFC 2198 [N-2] redundancy encoding.

     Event            Frequency  Encoding      Type    Volume?
                       Hz        (decimal)

     V.21 channel 1,   1180          37       tone    yes
     "0" bit

     V.21 channel 1,    980          38       tone    yes
     "1" bit

     V.21 channel 2,   1850          39       tone    yes
     "0" bit

     V.21 channel 2,   1650          40       tone    yes
     "1" bit


                     Table 2: Events for V.21 signals
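
   As an informal sketch, the codepoints of Table 2 and the grouping of
   bit events into packets might be expressed as follows.  The helper
   names are hypothetical, and the default of ten events per packet is
   simply one value within the eight-to-ten range mentioned above.

```python
# (channel, bit) -> event code, taken from Table 2.
V21_EVENT_CODES = {
    (1, 0): 37,
    (1, 1): 38,
    (2, 0): 39,
    (2, 1): 40,
}


def v21_events(channel, bits):
    """One event code per transmitted bit (not per transition)."""
    return [V21_EVENT_CODES[(channel, bit)] for bit in bits]


def group_for_packets(events, per_packet=10):
    """Split a sequence of events into per-packet groups."""
    return [events[i:i + per_packet]
            for i in range(0, len(events), per_packet)]
```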

3.2.2 V.8 Events

   V.8 [N-11] is an older general negotiation and control protocol,
   supporting startup for the following terminals: H.324 [I-6]
   multimedia, V.18 [I-21] text, T.101 [I-9] videotext, T.30 [N-10] send
   or receive FAX, and a long list of V-series modems including V.34
   [I-13], V.90 [I-14], V.91 [I-15], and V.92 [I-16].  In contrast to
   V.8bis [I-19], in V.8 only the calling terminal can determine the
   operating mode.

   V.8 defines four signals which consist of bits transferred by V.21
   [N-12] at 300 bits/s: the call indicator signal (CI), the call menu
   signal (CM), the CM terminator (CJ), and the joint menu signal (JM).
   In addition, it uses tones defined in V.25 [N-13] and T.30 [N-10]
   (described below), and one tone (ANSam) defined in V.8 itself.  The
   calling terminal sends using the V.21 low channel; the answering
   terminal uses the high channel.



   The basic protocol sequence is subject to a number of variations to
   accommodate different terminal types.  A pure V.8 sequence is as
   follows:

   1) After an initial period of silence, the calling terminal transmits
      the V.8 CI signal.  It repeats CI at least three times, continuing
      with occasional pauses until it detects ANSam tone.  The CI
      indicates whether the calling terminal wants to function as H.324,
      V.18, T.30 send, T.30 receive, or a V-series modem.

   2) The answering terminal transmits ANSam after detecting CI.  ANSam
      will disable any G.164 [N-6] echo suppressors on the circuit after
      400 ms and any G.165 [N-7] echo cancellers after one second of
      ANSam playout.

   3) On detecting ANSam, the calling terminal pauses at least half a
      second, then begins transmitting CM to indicate detailed
      capabilities within the chosen mode.

   4) After detecting at least two identical sequences of CM, the
      answering terminal begins to transmit JM, indicating its own
      capabilities (or offering an alternative terminal type if it
      cannot support the one requested).

   5) After detecting at least two identical sequences of JM, the
      calling terminal completes the current octet of CM, then transmits
      CJ to acknowledge the JM signal.  It pauses exactly 75 ms, then
      starts operating in the selected mode.

   6) The answering terminal transmits JM until it has detected CJ.  At
      that point it stops transmitting JM immediately, pauses exactly 75
      ms, then starts operating in the selected mode.

   The CI, CM, and JM signals all consist of a fixed sequence of ten "1"
   bits followed by a signal-dependent pattern of ten synchronization
   bits, followed by one or more octets of variable information.  Each
   octet is preceded by a "0" start bit and followed by a "1" stop bit.
   The combination of the synchronization pattern and V.21 channel
   uniquely identifies the message type.  The CJ signal consists of
   three successive octets of all zeros with stop and start bits but
   without the preceding "1"s and synchronizing pattern of the other
   signals.
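
   The framing just described can be sketched as below.  This is an
   illustration under stated assumptions, not a substitute for V.8:
   the bit order within an octet (least significant bit first) is
   assumed, the synchronization patterns are not given in this
   document and are therefore passed in by the caller, and all
   function names are hypothetical.

```python
def frame_octet(octet):
    """Start bit "0", eight data bits, stop bit "1".

    Assumption: data bits are sent least significant bit first.
    """
    data = [(octet >> i) & 1 for i in range(8)]
    return [0] + data + [1]


def v8_signal(sync_bits, octets):
    """Bits of a CI, CM or JM signal: ten "1" bits, ten synchronization
    bits supplied by the caller, then the framed information octets."""
    if len(sync_bits) != 10:
        raise ValueError("synchronization pattern must be ten bits")
    bits = [1] * 10 + list(sync_bits)
    for octet in octets:
        bits.extend(frame_octet(octet))
    return bits


def cj_signal():
    """Bits of CJ: three all-zero octets with start and stop bits."""
    bits = []
    for _ in range(3):
        bits.extend(frame_octet(0))
    return bits
```

   Each framed octet occupies ten bits, which is consistent with the
   suggestion of ten V.21 bit events per packet for V.8 signals.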

   If both gateways support V.21 bit events (section 3.2.1), the sending
   gateway for a given message MUST report each instance of a CM, JM,
   CI, and CJ signal respectively as a series of V.21 bit events.  A
   packetization interval of 10 events per packet is suggested, since
   V.8 signals are organized in this way.



   The overlapping nature of V.8 signalling means that there is no risk
   of silence exceeding 100 ms once ANSam has disabled any echo control
   circuitry.  However, the 75 ms pause before entering operation in the
   selected data mode will require both the calling and the answering
   gateways to recognize the completion of CJ, so they can change from
   playout of telephone-events to playout of the data-bearing payload
   after the 75 ms period.

     Event          Frequency     Encoding     Type      Volume?
                        Hz       (decimal)

     ANSam          2100 x 15         34       tone      yes

     /ANSam         2100 x 15         35       tone      yes
                    phase rev.



                      Table 3: Events for V.8 signals

   Modified answer tone ANSam consists of a sinewave signal at 2100 Hz
   with phase reversals at an interval of 450 ms, amplitude-modulated by
   a sine wave at 15 Hz. The modulated envelope ranges in amplitude
   between 0.8 and 1.2 times its average amplitude. The average
   transmitted power is governed by national regulations.  Thus it makes
   sense to indicate the volume of the signal.  The ANSam phase
   reversals are allowed only if echo canceller disabling is required.

   The sender MUST report ANSam as soon as it is recognized, providing
   updates at reasonable intervals as it continues.  However, an ANSam
   event packet SHOULD NOT be sent until it is possible to discriminate
   between an ANSam event and an ANS event (see V.25 events, below).  If
   a phase reversal is detected, the sender MUST report completion of
   the ANSam event and beginning of the /ANSam event at the time that
   the reversal was detected.  If another phase reversal is detected,
   the sender MUST report the end of the /ANSam event and the beginning
   of an ANSam event, continuing in this way until the tone is removed.
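
   The alternation between the two events can be sketched as follows.
   The function name and the representation of detected phase
   reversals as a list of timestamps are assumptions made for the
   example; codepoints 34 and 35 are those of Table 3.

```python
ANSAM = 34           # ANSam, from Table 3
ANSAM_REVERSED = 35  # /ANSam, from Table 3


def answer_tone_events(start, end, reversal_times,
                       codes=(ANSAM, ANSAM_REVERSED)):
    """Split a tone lasting from start to end (timestamp units) into
    alternating events, switching at each detected phase reversal.

    Returns a list of (event_code, start_time, duration) tuples.
    """
    events = []
    current = 0
    segment_start = start
    for t in sorted(reversal_times):
        if not start < t < end:
            continue  # ignore reversals outside the tone
        events.append((codes[current], segment_start, t - segment_start))
        current = 1 - current
        segment_start = t
    events.append((codes[current], segment_start, end - segment_start))
    return events
```

   At 8000 Hz a 450 ms reversal interval corresponds to 3600 timestamp
   units.  Passing codes=(32, 33) applies the same logic to the ANS and
   /ANS events of Table 4.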

3.2.3 V.25 Events

   V.25 [N-13] is a start-up protocol antedating V.8 [N-11] and V.8bis
   [I-19].  It specifies the exchange of two tone signals:

   CT:

      "The calling tone consists of a series of interrupted bursts of
      1300 hz tone, on for a duration of not less than 0.5 s and not
      more than 0.7 s and off for a duration of not less than 1.5 s and
      not more than 2.0 s." [N-13]. Modems not starting with the V.8
      call initiation signal often use this tone.

   ANS:

      Answering tone.  This 2100 Hz tone is used to disable echo
      suppression for data transmission [N-13], [N-10].  For fax
      machines, Recommendation T.30 [N-10] refers to this tone as called
      terminal identification (CED) answer tone.  ANS differs from V.8
      ANSam in that ANSam varies in amplitude due to modulation by a 15
      Hz signal.

   V.25 specifically includes procedures for disabling echo suppressors
   as defined by ITU-T Rec. G.164 [N-6].  However, G.164 echo
   suppressors have now for the most part been replaced by G.165 [N-7]
   echo cancellers, which require phase reversals in the disabling tone
   (see ANSam above).  As a result, V.25 was modified in July, 2001 to
   say that phase reversal in the ANS tone is required if echo
   cancellers are to be disabled.

   One possible V.25 sequence is as follows:

   1) The calling terminal starts generating CT as soon as the call is
      connected.

   2) The called terminal waits in silence for 1.8 to 2.5 s after
      answer, then begins to transmit ANS continuously.  If echo
      cancellers are on the line the phase of the ANS signal is reversed
      every 450 ms.  ANS will not reach the calling terminal until the
      echo control equipment has been disabled.  Since this takes about
      a second it can only happen in the gap between one burst of CT and
      the next.

   3) Following detection of ANS, the calling terminal may stop
      generating CT immediately or wait until the end of the current
      burst to stop.  In any event, it must wait at least 400 ms (at
      least 1 s if phase reversal of ANS is being used to disable echo
      cancellers) after stopping CT before it can generate the calling
      station response tone.  This tone is modem-specific, not specified
      in V.25.

   4) The called terminal plays out ANS for 2.6 to 4.0 seconds or until
      it has detected calling station response for 100 ms.  It waits 55-
      95 ms (nominal 75 ms) in silence.  (Note that the upper limit of
      95 ms is rather close to the point at which echo control may
      reestablish itself.)  If the reason for ANS termination was
      timeout rather than detection of calling station response, the
      called terminal begins to play out ANS again to maintain disabling
      of echo control until the calling station responds.


   The events defined for V.25 signalling are shown in Table 4.  The
   gateway at the calling end SHOULD use a packetization interval
   smaller than the nominal duration of a CT burst, to ensure that CT
   playout at the called end precedes the sending of ANS from that end.

   The gateway at the called end MUST report ANS as soon as it is
   recognized, providing updates at reasonable intervals as it
   continues. However, an ANS event packet SHOULD NOT be sent until it
   is possible to discriminate between an ANS event and an ANSam event
   (see V.8 events, above).  If a phase reversal is detected, the sender
   MUST report completion of the ANS event and beginning of the /ANS
   event at the time that the reversal was detected.  If another phase
   reversal is detected, the sender MUST report the end of the /ANS
   event and the beginning of an ANS event, continuing in this way until
   the tone is removed.

     Event          Frequency     Encoding      Type    Volume?
                        Hz       (decimal)

     Answer tone    2100              32       tone    yes
     (ANS)

     /ANS           2100 rev          33       tone    yes

     CT             1300              49       tone    yes


                     Table 4: Events for V.25 signals

3.2.4 T.30 Events

   ITU-T Recommendation T.30 [N-10] defines the procedures used by Group
   III FAX terminals.  The pre-message procedures for which the events
   of this section are defined are used to identify terminal
   capabilities at each end and negotiate operating mode. Post-message
   procedures are also included, to handle cases such as multiple
   document transmission.  FAX terminals support a wide variety of
   protocol stacks, so T.30 has a number of options for control
   protocols and sequences.

   T.30 defines two tone signals used at the beginning of a call.  The
   CNG signal is sent by the calling terminal.  It is a pure 1100 Hz
   tone played in bursts: 0.5 s on, 3 s off.  It continues until timeout
   or until the calling terminal detects a response.

   The called terminal waits in silence for at least 200 ms.  It then
   may return CED tone, which is identical to V.25 ANS, or else V.8
   ANSam if it has V.8 capability.  If ANSam is returned and the calling
   terminal has V.8 capability, it transmits CI to begin a V.8
   negotiation.  Otherwise, the calling and called terminals enter the
   T.30 negotiation phase.

   In the negotiation phase the terminals exchange binary messages using
   V.21 signals, high channel frequencies only.  Each message is
   preceded by a one-second (nominal) preamble consisting entirely of
   HDLC flag octets (0x7E).  This flag has the function of preparing
   echo control equipment for the message which follows.

   The pre-transfer messages exchanged using the V.21 coding are:

   Digital Identification Signal (DIS):

      Characterizes the standard ITU-T capabilities of the called
      terminal.

   Digital Transmit Command (DTC):

      The digital command response to the standard capabilities
      identified by the DIS signal.

   Digital Command Signal (DCS):

      The digital set-up command responding to the standard capabilities
      identified by the DIS signal.

   Confirmation To Receive (CFR):

      A digital response confirming that the entire pre-message
      procedure has been completed and the message transmissions may
      commence.

   If the calling terminal wishes to transmit a document, the three
   messages exchanged are DIS (from the called terminal), DCS, and CFR.
   If it wishes to receive, the sequence changes to DIS, DTC, DCS, and
   CFR.  Each message may consist of multiple frames, each bounded by
   HDLC flags.  The messages are organized as a series of octets, but
   T.30 calls for the insertion of extra "0" bits to prevent spurious
   recognition of HDLC flags.
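
   The insertion of extra "0" bits can be illustrated with the
   customary HDLC rule, under which a "0" is inserted after every run
   of five consecutive "1" bits so that the flag pattern 01111110
   cannot occur inside a frame.  The sketch assumes T.30 follows this
   rule unchanged; T.30 itself remains the normative reference, and
   the function name is hypothetical.

```python
def hdlc_stuff(bits):
    """Insert a "0" after each run of five consecutive "1" bits."""
    out = []
    ones = 0
    for bit in bits:
        out.append(bit)
        if bit == 1:
            ones += 1
            if ones == 5:
                out.append(0)  # stuffed bit, removed again by the receiver
                ones = 0
        else:
            ones = 0
    return out
```

   For instance, an all-ones octet grows from eight to nine bits, which
   is why a stuffed message need not be a whole number of octets long.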

   T.30 also provides for the transmission of control messages after
   document transmission has completed (e.g., to support transmission of
   multiple documents).  The transition back from the modem used for
   document transmission (V.17 [I-10], V.27ter [I-11], V.29 [I-12], V.34
   [I-13]) to V.21 signalling is preceded by 75 ms (nominal) of
   silence.  Control message transmission is preceded by the preamble
   described above.

   Before CFR the transmitting terminal sends a training signal
   consisting of a steady string of V.21 high channel zeros (1850 Hz
   tones) for 1.5 s.  The sender MAY report this training signal either
   as a single extended V.21 upper channel "0" event, or as a series of
   "0" events of normal duration.  The event(s) MUST be reported as soon
   as the training signal is recognized, with updates at reasonable
   intervals thereafter.

   Applications supporting T.30 signalling using the telephone-events
   payload MUST transfer T.30 messages in the form of sequences of bits,
   using the V.21 bit events defined in section 3.2.2.  The transmitted
   information MUST include the complete contents of the message: the
   initial HDLC flags, the information field, the checksum, and the
   terminating HDLC flags.

   Transmission MUST also include the extra "0" bits added to prevent
   false recognition of HDLC flags at the receiver.  Implementors should
   note that these extra "0" bits mean that in general T.30 messages as
   transmitted on the wire will not come out to an even multiple of
   octets.  Sending implementations MAY choose to vary the packetization
   interval to include exactly one octet of information plus any extra
   "0" bits inserted into that octet.

   The events defined for T.30 signalling are shown in Table 5.  The CED
   and /CED events represent exactly the same tone signals as V.8 ANS
   and /ANS, and are given the same codepoints; they are reproduced here
   only for convenience.  For reporting of CNG, the gateway at the
   calling end SHOULD use a packetization interval smaller than the
   nominal duration of a CNG burst, to ensure that CED has time to
   disable echo control before it times out.

   The gateway at the called end MUST report CED as soon as it is
   recognized, providing updates at reasonable intervals as it
   continues. However, a CED event packet SHOULD NOT be sent until it is
   possible to discriminate between a CED event and an ANSam event (see
   V.8 events, above).  If a phase reversal is detected, the sender MUST
   report completion of the CED event and beginning of the /CED event at
   the time that the reversal was detected.  If another phase reversal
   is detected, the sender MUST report the end of the /CED event and the
   beginning of a CED event, continuing in this way until the tone is
   removed.
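
   The alternation between CED and /CED reports can be sketched in
   Python as follows.  The function name ced_segments and the example
   times are hypothetical; the sketch assumes the detector has already
   produced the times of the phase reversals, expressed in timestamp
   units.

```python
CED, CED_REVERSED = 32, 33  # event codes from the T.30 events table


def ced_segments(start, reversals, end):
    """Split a 2100 Hz answer tone into alternating CED and /CED events.

    start, end and each entry of reversals are times in timestamp
    units; reversals must be sorted and lie between start and end.
    Returns (event code, start time, duration) tuples.
    """
    segments = []
    event = CED
    seg_start = start
    for t in list(reversals) + [end]:
        segments.append((event, seg_start, t - seg_start))
        event = CED_REVERSED if event == CED else CED
        seg_start = t
    return segments


# Assumed example: tone starts at 0, phase reversals at 3600 and 7200
# units, tone removed at 9600 units (8000 Hz timestamp clock).
print(ced_segments(0, [3600, 7200], 9600))
```

   Each tuple corresponds to one event whose end is reported at the
   moment the following reversal (or the removal of the tone) is
   detected.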

   Event               Frequency    Encoding   Type   Volume?
                          (Hz)      (decimal)

   CNG (Calling tone)     1100         36      tone    yes

   CED (Called tone)      2100         32      tone    yes

   /CED                   2100         33      tone    yes
                        ph. rev.



                     Table 5: Events for T.30 signals



4. RTP Payload Format for Telephony Tones

4.1   Introduction

   As an alternative to describing tones and events by name, as
   described in section 2, it is sometimes preferable to describe them
   by their waveform properties. In particular, recognition is faster
   than for naming signals since it does not depend on recognizing
   durations or pauses.

   There is no single international standard for telephone tones such as
   dial tone, ringing (ringback), busy, congestion ("fast-busy"),
   special announcement tones or some of the other special tones, such
   as payphone recognition, call waiting or record tone. However, ITU-T
   Recommendation E.180 [I-20] notes that across all countries, these
   tones share a number of characteristics:

    - Telephony tones consist of either a single tone, the addition of
      two or three tones or the modulation of two tones. (Almost all
      tones use two frequencies; only the Hungarian "special dial tone"
      has three.) Tones that are mixed have the same amplitude and do
      not decay.

    - In-band tones for telephony events are in the range of 25 Hz
      (ringing tone in Angola) to 2600 Hz (the tone used for line
      signalling in SS No. 5 and R1). The in-band telephone frequency
      range is limited to 3400 Hz.  R2 defines a 3825 Hz out-of-band
      tone for line signalling on analogue trunks.  (The piano has a
      range from 27.5 to 4186 Hz.)

    - Modulation frequencies range from 15 Hz (ANSam tone) to 480 Hz
      (Jamaica). Non-integer frequencies are used only for frequencies
      of 16 2/3 and 33 1/3 Hz. (These fractional frequencies appear to
      be derived from AC power grid frequencies.)

    - Tones that are not continuous have durations of less than four
      seconds.

    - ITU Recommendation E.180 [I-20] notes that different telephone
      companies require a tone accuracy of between 0.5 and 1.5%.  The
      Recommendation suggests a frequency tolerance of 1%.

4.2   Examples of Common Telephone Tone Signals

   As an aid to the implementor, Table 6 summarizes some common tones.
   The rows labeled "ITU ..." refer to ITU-T Recommendation E.180 [I-
   20]. Note that there are no specific guidelines for these tones. In
   the table, the symbol "+" indicates addition of the tones, without
   modulation, while "*" indicates amplitude modulation. The meaning of
   these tones is described in section 3.3.

           Tone Name                 Frequency  On Period  Off Period
                                        (Hz)       (s)        (s)

           CNG                       1100       0.5        3.0
           V.25 CT                   1300       0.5        2.0
           CED                       2100       3.3        --
           ANS                       2100       3.3        --
           ANSam                     2100*15    3.3        --
           V.21 "0" bit, channel 1   1180       0.00333    --
           V.21 "1" bit, channel 1   980        0.00333    --
           V.21 "0" bit, channel 2   1850       0.00333    --
           V.21 "1" bit, channel 2   1650       0.00333    --
           -----------------------   ---------  ---------  ----------
           ITU dial tone             425        --         --
           U.S. dial tone            350+440    --         --
           ITU ringing tone          425        0.67-1.5   3-5
           U.S. ringing tone         440+480    2.0        4.0
           ITU busy tone             425
           U.S. busy tone            480+620    0.5        0.5
           ITU congestion tone       425
           U.S. congestion tone      480+620    0.25       0.25



                   Table 6: Examples of telephony tones
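
   To make the "+" and "*" notation concrete, the following Python
   sketch generates samples for an added tone and for a modulated tone.
   It is an illustration only: the function name tone_samples, the
   normalization, and in particular the modulation depth of 0.2 are
   assumptions and are not specified by the table.

```python
import math


def tone_samples(frequencies, modulation=0.0, n=80, rate=8000, depth=0.2):
    """Return n samples of a tone, each in the range -1.0 .. 1.0.

    frequencies are added with equal amplitude ("+" in the table).
    A non-zero modulation frequency varies the envelope ("*" in the
    table); depth is an assumed modulation depth.
    """
    samples = []
    for i in range(n):
        t = i / rate
        carrier = sum(math.sin(2 * math.pi * f * t) for f in frequencies)
        carrier /= len(frequencies)
        envelope = 1.0
        if modulation:
            swing = depth * math.sin(2 * math.pi * modulation * t)
            envelope = (1.0 + swing) / (1.0 + depth)
        samples.append(envelope * carrier)
    return samples


us_dial = tone_samples([350, 440])            # "350+440"
ansam = tone_samples([2100], modulation=15)   # "2100*15"
```

   The cadence columns (on and off periods) are not modelled here; a
   generator would gate the samples on and off accordingly.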

4.3   Use of RTP Header Fields

4.3.1 Timestamp

   The RTP timestamp reflects the measurement point for the current
   packet. The event duration described in section 4.3.3 extends
   forwards from that time.

4.3.2 Marker Bit

   The tones payload type uses the marker bit to distinguish the first
   RTP packet reporting a given instance of a tone from succeeding
   packets for that tone.  The marker bit SHOULD be set to 1 for the
   first packet, and to 0 for all succeeding packets relating to the
   same tone.

4.3.3 Payload Format

   Based on the characteristics described above, this document defines
   an RTP payload format called "tone" that can represent tones
   consisting of one or more frequencies. (The corresponding MIME type
   is "audio/tone".) The default timestamp rate is 8000 Hz, but other
   rates may be defined. Note that the timestamp rate does not affect
   the interpretation of the frequency, just the durations.

   In accordance with current practice, this payload format does not
   have a static payload type number, but uses an RTP payload type number
   established dynamically and out-of-band.

   The payload format is shown in Figure 2.



     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    modulation   |T|  volume   |          duration             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |R R R R|       frequency       |R R R R|       frequency       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |R R R R|       frequency       |R R R R|       frequency       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        ......

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |R R R R|       frequency       |R R R R|      frequency        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


                   Figure 2:   Payload Format for Tones

   The payload contains the following fields:

   modulation:

      The modulation frequency, in Hz. The field is a 9-bit unsigned
      integer, allowing modulation frequencies up to 511 Hz. If there is
      no modulation, this field has a value of zero.

   T:

      If the "T" bit is set (one), the modulation frequency is to be
      divided by three. Otherwise, the modulation frequency is taken as
      is.

      This bit allows frequencies accurate to 1/3 Hz, since modulation
      frequencies such as 16 2/3 Hz are in practical use.

   volume:

      The power level of the tone, expressed in dBm0 after dropping the
      sign, with range from 0 to -63 dBm0. (Note: A preferred level
      range for digital tone generators is -8 dBm0 to -3 dBm0.)

   duration:

      The duration of the tone, measured in timestamp units.  The tone
      begins at the instant identified by the RTP timestamp and lasts
      for the duration value. The value of zero is not permitted and
      tones with such a duration SHOULD be ignored.

      The definition of duration corresponds to that for sample-based
      codecs, where the timestamp represents the sampling point for the
      first sample.

   frequency:

      The frequencies of the tones to be added, measured in Hz and
      represented as a 12-bit unsigned integer. The field size is
      sufficient to represent frequencies up to 4095 Hz, which exceeds
      the range of telephone systems. A value of zero indicates silence.
      A single tone can contain any number of frequencies.  If the
      number of frequencies it contains is odd, padding SHALL be added
      to bring the packet to a 32-bit boundary.  (RFC 3550 [N-4]
      requires that padding be set to all zeroes.)

   R:

      This field is reserved for future use. The sender MUST set it to
      zero, the receiver MUST ignore it.
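
   The field layout above can be exercised with a short Python sketch.
   The function names encode_tone and decode_tone and the dictionary
   keys are hypothetical; the sketch assumes network byte order, as
   used for the RTP header, and performs only basic range checks.

```python
import struct
from fractions import Fraction


def encode_tone(modulation, volume, duration, frequencies, t_bit=0):
    """Pack one tone payload.

    modulation: 0..511, t_bit: 0 or 1, volume: 0..63 (dBm0 with the
    sign dropped), duration: 1..65535 timestamp units,
    frequencies: sequence of values 0..4095 Hz.
    """
    if not 0 <= modulation <= 511 or t_bit not in (0, 1):
        raise ValueError("modulation or T out of range")
    if not 0 <= volume <= 63 or not 1 <= duration <= 0xFFFF:
        raise ValueError("volume or duration out of range")
    if any(not 0 <= f <= 4095 for f in frequencies):
        raise ValueError("frequency out of range")
    first = (modulation << 23) | (t_bit << 22) | (volume << 16) | duration
    body = struct.pack("!I", first)
    for f in frequencies:
        body += struct.pack("!H", f)     # R bits (upper 4) stay zero
    if len(frequencies) % 2:
        body += b"\x00\x00"              # pad to a 32-bit boundary
    return body


def decode_tone(payload):
    """Unpack a tone payload into a dict; R bits are ignored."""
    (first,) = struct.unpack("!I", payload[:4])
    modulation = first >> 23
    scale = 3 if (first >> 22) & 1 else 1   # T bit: divide by three
    count = (len(payload) - 4) // 2
    words = struct.unpack("!%dH" % count, payload[4:4 + 2 * count])
    return {
        "modulation_hz": Fraction(modulation, scale),
        "volume_dbm0": -((first >> 16) & 0x3F),
        "duration": first & 0xFFFF,
        "frequencies": [w & 0x0FFF for w in words],
    }


payload = encode_tone(0, 5, 12000, [440, 480])
print(payload.hex())   # 00052ee001b801e0
```

   A modulation frequency of 16 2/3 Hz would be encoded as
   modulation=50 with t_bit=1.  Note that this decoder cannot
   distinguish padding from an explicit zero frequency and returns the
   padding as a frequency of 0.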

4.3.4 Optional MIME Parameters

   The "rate" parameter describes the sampling rate, in Hertz. The
   number is written as a floating point number or as an integer. If
   omitted, the default value is 8000 Hz.

4.4   Procedures

   This section defines the procedures associated with the tones payload
   type.

4.4.1 Sending Procedures

   As indicated by the examples in Table 6, the duration of an
   individual tone may range from a few milliseconds to a number of
   seconds.  Timing considerations dictate some general guidelines for
   how these two extremes should be handled by the sender.  For tones
   directed to human listeners, timing is not critical, within a
   tolerance of 100 ms or so at either beginning or end.  For tones
   directed to remote equipment, the most critical aspect of timing is
   intra-stream time relationships -- that is, the individual tone
   durations and the interval between tones for a related sequence of
   them.  The timing of the start of playout of a related sequence is
   less critical within limits.

   In the case of longer-duration tones, implementations SHOULD expect
   to generate multiple RTP packets for the same tone instance.  The
   considerations just enumerated suggest that a packetization interval
   in the order of 50 ms may be acceptable, in terms of the initial
   delay it imposes on remote playback.  Implementations MAY adjust the
   packetization interval to suit the nature of the tones being played
   out.  The packetization interval SHOULD remain constant until the
   tone ends in order not to distort playout times through buffer under-
   runs.

   The RTP timestamp MUST be updated for each packet generated (in
   contrast, for instance, to the timestamp for packets carrying
   telephone-events).  The first RTP packet for a tone SHOULD have the
   marker bit set to 1.  Subsequent packets for the same tone SHOULD
   have the marker bit set to 0, and the RTP timestamp in each
   subsequent packet MUST equal the sum of the timestamp and the
   duration in the preceding packet.  A final RTP packet SHOULD be
   generated as soon as the end of the tone is detected, without waiting
   for the latest packetization interval to elapse.
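
   The timestamp and marker bit rules above can be sketched in Python
   as follows.  The function name tone_reports is hypothetical, and for
   simplicity the sketch assumes the total tone duration is already
   known, which is not the case for a gateway reporting a tone while it
   is still being received.

```python
def tone_reports(start_ts, total_duration, interval=400):
    """Yield (marker, timestamp, duration) for one tone instance.

    All values are in timestamp units; interval=400 corresponds to
    50 ms at 8000 Hz.  The last report is shortened so that it ends
    exactly where the tone ends.
    """
    ts = start_ts
    remaining = total_duration
    marker = 1                      # first packet of the tone
    while remaining > 0:
        duration = min(interval, remaining)
        yield marker, ts & 0xFFFFFFFF, duration
        ts += duration              # next timestamp = timestamp + duration
        remaining -= duration
        marker = 0


# Assumed example: a 130 ms tone (1040 units) starting at 48000.
reports = list(tone_reports(48000, 1040))
print(reports)
```

   The final tuple carries the remaining 240 units, reflecting the
   advice to send the last packet as soon as the end of the tone is
   detected.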

   If the tones are meant for machine consumption, the intervals between
   them are potentially critical.  Implementations may be aware of this
   situation, or may infer it from a heuristic such as that the tones
   are less than a second in duration.  In this situation, it is
   RECOMMENDED that if a tone follows another tone within a period of
   100 ms or less, the new tone should be reported as soon as it has
   been identified.  The suggested 50 ms packetization interval should
   be applied to subsequent reports for the same tone.

   The above advice applies to tones lasting in the order of 25 ms or
   more.  Shorter tones, which are likely to be from modems, SHOULD be
   reported in batches.  The tones payload format requires that each
   tone be reported in a separate RTP packet, but it is RECOMMENDED that
   multiple RTP packets be reported in the same UDP packet.  Individual
   tones should be given their actual durations (i.e., from transition
   point to transition point) rather than reporting a new tone at each
   bit boundary.
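
   As a sketch of reporting from transition point to transition point,
   the following Python fragment merges runs of identical V.21 bits
   into single tones.  The function name bits_to_tones is hypothetical;
   the 300 bit/s rate is inferred from the 0.00333 s bit period listed
   for V.21 in the table of example tones, and the rounding strategy is
   an assumption.

```python
from itertools import groupby

V21_HIGH = {0: 1850, 1: 1650}   # channel 2 frequencies in Hz


def bits_to_tones(bits, bit_rate=300, clock=8000, freqs=V21_HIGH):
    """Merge runs of equal bits into (frequency, duration) tones.

    Durations are in timestamp units.  Transition points are computed
    from the start of the sequence so rounding errors do not add up.
    """
    tones = []
    position = 0                      # bits consumed so far
    for bit, run in groupby(bits):
        length = len(list(run))
        start = round(position * clock / bit_rate)
        end = round((position + length) * clock / bit_rate)
        tones.append((freqs[bit], end - start))
        position += length
    return tones


# The HDLC flag 0x7E as bits: one "0", six "1"s, one "0".
print(bits_to_tones([0, 1, 1, 1, 1, 1, 1, 0]))
```

   Eight bits thus become three tone reports instead of eight.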

4.4.2 Receiving Procedures

   Receiving implementations play out the tones as received.  When
   playing out successive tone reports for the same tone (marker bit is
   zero, the RTP timestamp is contiguous with that of the previous RTP
   packet, and payload content is identical), the receiving
   implementation SHOULD continue the tone without change or a break.
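
   The continuation test described above might be expressed as in the
   following Python sketch.  The ToneReport tuple and the function name
   are hypothetical.  The sketch assumes that "payload" holds the tone
   description without the duration field (modulation, T, volume and
   frequencies), since the duration can differ between reports of the
   same tone.

```python
from collections import namedtuple

ToneReport = namedtuple("ToneReport", "marker timestamp duration payload")


def continues_previous(prev, cur):
    """True if cur extends the tone reported by prev without a break."""
    if prev is None or cur.marker:
        return False
    expected = (prev.timestamp + prev.duration) & 0xFFFFFFFF
    return cur.timestamp == expected and cur.payload == prev.payload


tone = (0, 0, 5, (440, 480))          # modulation, T, volume, frequencies
first = ToneReport(1, 48000, 400, tone)
second = ToneReport(0, 48400, 240, tone)
print(continues_previous(first, second))
```

   When the function returns False, the receiver would treat the report
   as the start of a new tone.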



5. Application Considerations

5.1   Combining Tones and Named Events

   Gateways which send signalling events via RTP MAY send both named
   signals (section 2) and the tone representation (section 4)
   as a single RTP session, using the redundancy mechanism defined in
   RFC 2198 [N-2] to interleave the two representations. It is generally
   a good idea to send both, since it allows the receiver to choose the
   appropriate rendering.

   If a gateway cannot present a tone representation, it SHOULD also
   send the audio tones as regular RTP audio packets using either the
   codec used for regular speech signals or a codec that is known to
   carry such signals successfully (e.g., PCMU).

      Some low-rate codecs cannot accurately represent certain
      tones, such as DTMF.

5.2   Simultaneous Generation of Audio and Events

   A source can choose between four approaches:

   Events and audio:

      The source sends events and encoded audio packets (e.g., PCMU or
      the codec used for speech signals) for the same time instant. In
      that mode, events are treated as redundant encodings for the
      encoded audio stream.

   Events only:

      The source does not send encoded audio while event tones are
      active and only sends named events, without any redundancy beyond
      the periodic updates of longer-lasting events.

   Events only, with redundancy:

      The source does not send encoded audio while event tones are
      active. It only sends named events, but uses RFC 2198 [N-2]
      redundancy, with named events as both primary and redundant
      encodings.

   Events and audio, with redundancy:

      During an event, the source sends both named events and audio,
      using RFC 2198 [N-2] to interleave audio data, current and
      redundant named events.

   The choices above do not affect the event redundancy mechanism
   described in section 2.6.

   Note that a period covered by a named event may overlap in time with
   a period of audio encoded by other means. This is likely to occur at
   the onset of a tone and is necessary to avoid possible errors in the
   interpretation of the reproduced tone at the remote end.
   Implementations supporting this payload format MUST be prepared to
   handle the overlap. It is RECOMMENDED that gateways only render the
   encoded tone since the audio may contain spurious tones introduced by
   the audio compression algorithm. However, it is anticipated that
   these extra tones in general should not interfere with recognition at
   the far end.

5.3   Strategies For Handling FAX and Modem Signals

   As described in section 3.2, the typical data application involves a
   pair of gateways interposed between two terminals, where the
   terminals are in the PSTN.  The gateways are likely to be serving a
   mixture of voice and data traffic, and need to adopt payload types
   appropriate to the media flows as they occur.  If voice compression
   is in use for voice calls, this means that the gateways need the
   flexibility to switch to other payload types when data streams are
   recognized.

   Within the established IETF framework, this implies that the gateways
   must negotiate the potential payloads (voice, telephone-events,
   tones, voice-band data, T.38 FAX [I-8], and possibly RFC 2793 [I-1]
   text and CLEARMODE [I-2] octet streams) as separate payload types.
   From a timing point of view, this is most easily done at the
   beginning of a call, but results in an over-allocation of resources
   at the gateways and in the intervening network.

   One alternative is to use named events to buy time while out-of-band
   signals are exchanged to update to the new payload type applicable to
   the session. Thanks to the events defined in section 3.2, this is a
   viable approach for sessions beginning with V.8, V.8bis, T.30, or
   V.25 control sequences.

   Named data-related events also allow gateways to optimize their
   operation when data signals are received in a relatively general
   form.  One example is the use of V.8-related events to deduce that
   the voice-band data being sent in a G.711 payload comes from a
   higher-speed modem and therefore requires disabling of echo
   cancellers.

   All of the control procedures described in section 3.2 eventually
   give way to data content.  As mentioned above, this content will be
   carried by other payload types.  Receiving gateways MUST be prepared
   to switch to the other payload type within the time constraints
   associated with the respective applications.  (For several of the
   procedures documented in section 3.2, the sender provides 75 ms of
   silence between the initial control signalling and the sending of
   data content.)  In some cases (V.8bis [I-19], T.30 [N-10]), further
   control signalling may happen after the call has been established.

   A possible strategy is to send both telephone-events and the data
   payload in an RFC 2198 redundancy arrangement.  The receiving gateway
   then propagates the data payload whenever no event is in progress.
   For this to work, the data payload and events (when present) MUST
   cover exactly the same time period; otherwise spurious events will be
   detected downstream.

   Note that there are a number of cases where no control sequence will
   precede the data content.  This is true, for example, for a number of
   legacy text terminal types.  In such instances, the events defined in
   section 3.2.6 in particular MAY be sent to help the remote gateway
   optimize its handling of the alternative payload.

5.4   Examples

5.4.1 Use of RFC 2198 Redundancy With Named Events

   A typical RTP packet, where the user is just dialing the last digit
   of the DTMF sequence "911", is shown in Figure 3. The first digit was
   200 ms long (1600 timestamp units) and started at time 0, the second
   digit lasted 250 ms (2000 timestamp units) and started at time 800 ms
   (6400 timestamp units), the third digit was pressed at time 1.4 s
   (11,200 timestamp units) and the packet shown was sent at 1.45 s
   (11,600 timestamp units). The frame duration is 50 ms. To make the
   parts recognizable, Figure 3 ignores byte alignment. Timestamp and
   sequence number are assumed to have been zero at the beginning of the
   first digit. In this example, the dynamic payload types 96 and 97
   have been assigned for the redundancy mechanism and the telephone
   event payload, respectively.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   | 2 |0|0|   0   |0|     96      |              13               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   |                             11200                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   |                            0x5234a8                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   block PT  |     timestamp offset      |   block length    |
   |1|     97      |            11200          |         4         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   block PT  |     timestamp offset      |   block length    |
   |1|     97      |   11200 - 6400 = 4800     |         4         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   Block PT  |
   |0|     97      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     digit     |E R| volume    |          duration             |
   |       9       |1 0|     7     |             1600              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     digit     |E R| volume    |          duration             |
   |       1       |1 0|    10     |             2000              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     digit     |E R| volume    |          duration             |
   |       1       |0 0|    20     |              400              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


            Figure 3:   Example RTP packet after dialing "911"
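
   The packet in Figure 3 can be assembled byte for byte with the
   following Python sketch.  The helper names are hypothetical; the
   RFC 2198 block header layout (F bit, 7-bit payload type, 14-bit
   timestamp offset, 10-bit block length) and the field values are
   taken from the figure.  Unlike the figure, the result respects byte
   alignment.

```python
import struct


def rtp_header(pt, seq, timestamp, ssrc, marker=0):
    """Fixed 12-byte RTP header with V=2, P=0, X=0, CC=0."""
    return struct.pack("!BBHII", 2 << 6, (marker << 7) | pt,
                       seq, timestamp, ssrc)


def red_header(block_pt, ts_offset, length):
    """RFC 2198 header for a redundant block (F=1)."""
    if not 0 <= ts_offset < (1 << 14) or not 0 <= length < (1 << 10):
        raise ValueError("offset or length does not fit")
    word = (1 << 31) | (block_pt << 24) | (ts_offset << 10) | length
    return struct.pack("!I", word)


def event_payload(event, end, volume, duration):
    """4-byte telephone-event payload (R bit left at zero)."""
    return struct.pack("!BBH", event, (end << 7) | volume, duration)


RED_PT, EVENT_PT = 96, 97
packet = (
    rtp_header(RED_PT, seq=13, timestamp=11200, ssrc=0x5234A8)
    + red_header(EVENT_PT, 11200 - 0, 4)        # first digit, "9"
    + red_header(EVENT_PT, 11200 - 6400, 4)     # second digit, "1"
    + bytes([EVENT_PT])                         # primary block, F=0
    + event_payload(9, 1, 7, 1600)
    + event_payload(1, 1, 10, 2000)
    + event_payload(1, 0, 20, 400)
)
print(len(packet), packet.hex())
```

   The resulting packet is 33 octets long: 12 for the RTP header, 9 for
   the redundancy headers and 12 for the three event payloads.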

   Table 7 shows all packets up to and including the packet shown in
   the figure. The last three columns describe the duration fields in
   the event payloads. The timestamp offset is not shown. We assume here
   that the digits happen to start on a 50 ms multiple, which is
   somewhat unlikely.

   Time (s)   Event      RTP    Timestamp           Duration
                        Seq               "9"      "1"      "1"

   0.00      "9"         -             -        -        -         -
            starts
   0.05                  0             0      400        -         -
   0.10                  1             0      800        -         -
   0.15                  2             0    1,200        -         -
   0.20      "9" ends    3             0    1,600        -         -
   0.25                  4             0    1,600        -         -
   0.30                  5             0    1,600        -         -
   0.80      "1"         -             -        -        -         -
            starts
   0.85                  6         6,400    1,600      400         -
   0.90                  7         6,400    1,600      800         -
   0.95                  8         6,400    1,600    1,200         -
   1.00                  9         6,400    1,600    1,600         -
   1.05      "1" ends   10         6,400    1,600    2,000         -
   1.10                 11         6,400    1,600    2,000         -
   1.15                 12         6,400    1,600    2,000         -
   1.40      "1"         -             -        -        -         -
            starts
   1.45                 13        11,200    1,600    2,000       400


                     Table 7: RTP packets for example
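
   The duration columns of the table can be reproduced with a small
   Python sketch.  The function name event_updates is hypothetical, and
   the two additional transmissions of the final report are read off
   the table rather than stated as a rule here.

```python
def event_updates(start, length, frame=400, end_repeats=2):
    """List (send time, duration) pairs for one event.

    All values are in timestamp units (frame=400 is 50 ms at 8000 Hz).
    An update is sent every frame while the event lasts; the final
    report is then repeated end_repeats more times, as in the table.
    Assumes length is a multiple of frame.
    """
    updates = []
    elapsed = frame
    while elapsed <= length:
        updates.append((start + elapsed, elapsed))
        elapsed += frame
    for i in range(1, end_repeats + 1):
        updates.append((start + length + i * frame, length))
    return updates


# Second digit of the example: starts at 6400, lasts 2000 units.
for send_time, duration in event_updates(6400, 2000):
    print("%.2f s  duration %d" % (send_time / 8000, duration))
```

   For the second digit this yields send times from 0.85 s to 1.15 s
   with durations rising to 2000 and then repeating, matching the rows
   for sequence numbers 6 through 12.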

5.4.2 Combined Tone and Telephone-event Payloads

   The payload formats in sections 2 and 4 can be combined into a single
   payload using the method specified in RFC 2198 [N-2]. Figure 4 shows
   an example. In that example, the RTP packet combines two "tone" and
   one "telephone-event" payloads.  The payload types are chosen
   arbitrarily as 97 and 98, respectively, with a sample rate of 8000
   Hz. Here, the redundancy format has the dynamic payload type 96.

   The packet represents a snapshot of U.S. ringing tone, 1.5 seconds
   (12,000 timestamp units) into the second "on" part of the 2.0/4.0
   second cadence, i.e., a total of 7.5 seconds (60,000 timestamp units)
   into the ring cycle. The 440 + 480 Hz tone of this second cadence
   started at RTP timestamp 48,000. Four seconds of silence preceded it,
   but since RFC 2198 only has a fourteen-bit offset, only 2.05 seconds
   (16383 timestamp units) can be represented. Even though the tone
   sequence is not complete, the sender was able to determine that this
   is indeed ringback, and thus includes the corresponding named event.
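
   The arithmetic behind these values can be checked with a few lines
   of Python.  The variable names are illustrative only; the limit of
   16383 follows from the fourteen-bit timestamp offset of RFC 2198.

```python
CLOCK = 8000                     # timestamp units per second
MAX_RED_OFFSET = (1 << 14) - 1   # largest 14-bit timestamp offset

on_s, off_s = 2.0, 4.0           # U.S. ringing cadence
into_second_on = 1.5             # seconds into the second "on" period

second_on_start = int((on_s + off_s) * CLOCK)
tone_duration = int(into_second_on * CLOCK)
silence_units = int(off_s * CLOCK)

# The preceding silence would need an offset of 32000 units, which
# does not fit in 14 bits, so the offset is limited to the maximum.
offset = min(silence_units, MAX_RED_OFFSET)
event_duration = offset + tone_duration
print(second_on_start, tone_duration, offset, event_duration)
```

   The event duration of 28383 units is the representable part of the
   silence (16383) plus the 12000 units of tone reported so far.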





     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | V |P|X|  CC   |M|     PT      |       sequence number         |
    | 2 |0|0|   0   |0|     96      |              31               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           timestamp                           |
    |                             48000                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           synchronization source (SSRC) identifier            |
    |                            0x5234a8                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F|   block PT  |     timestamp offset      |   block length    |
    |1|     98      |            16383          |         4         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F|   block PT  |     timestamp offset      |   block length    |
    |1|     97      |            16383          |         8         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F|   Block PT  |
    |0|     97      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  event=ring   |0|0| volume=0  |     duration=28383            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | modulation=0    |0| volume=63 |     duration=16383            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 0|     frequency=0       |0 0 0 0|    frequency=0        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | modulation=0    |0| volume=5  |     duration=12000            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 0|     frequency=440     |0 0 0 0|    frequency=480      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


       Figure 4:   Combining tones and events in a single RTP packet

6. MIME Registration

6.1   audio/telephone-event

   MIME media type name: audio

   MIME subtype name: telephone-event

   Required parameters: none.

   Optional parameters:

      The "events" parameter lists the events supported by the
      implementation.  Events are listed as one or more comma-separated
      elements.  Each element can either be a single integer or two
      integers separated by a hyphen.  No white space is allowed in the
      argument.  The integers designate the event numbers supported by
      the implementation.

      The "rate" parameter describes the sampling rate, in Hertz.  The
      number is written as a floating point number or as an integer.  If
      omitted, the default value is 8000 Hz.
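
      As an illustration of the "events" syntax, the following Python
      sketch expands a parameter value into a set of event numbers.
      The function name parse_events is hypothetical, and treating
      two integers separated by a hyphen as an inclusive range is an
      assumption.

```python
def parse_events(value):
    """Expand an "events" parameter such as "0-15,66,70" into a set."""
    events = set()
    for element in value.split(","):
        first, sep, last = element.partition("-")
        # White space or empty elements fail the digit check.
        if not first.isdigit() or (sep and not last.isdigit()):
            raise ValueError("bad element: %r" % element)
        low = int(first)
        high = int(last) if sep else low
        if high < low:
            raise ValueError("bad range: %r" % element)
        events.update(range(low, high + 1))
    return events


print(sorted(parse_events("0-15,66,70")))
```

      Values containing white space, such as "0-15, 66", are rejected
      because no white space is allowed in the argument.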

   Encoding considerations:

      This type is only defined for transfer via RTP [N-4].

   Security considerations:

      See the "Security Considerations" section (section 7) in this
      document.

   Interoperability considerations: none

   Published specification: This document.

   Applications which use this media:

      The telephone-event audio subtype supports the transport of events
      occurring in telephone systems over the Internet.

   Additional information:

      1. Magic number(s): N/A

      2. File extension(s): N/A

      3. Macintosh file type code: N/A

6.2   audio/tone

   MIME media type name: audio

   MIME subtype name: tone

   Required parameters: none

   Optional parameters:

      The "rate" parameter describes the sampling rate, in Hertz.  The
      number is written as a floating point number or as an integer.  If
      omitted, the default value is 8000 Hz.

   Encoding considerations:

      This type is only defined for transfer via RTP [N-4].

      audio/tone MIME body parts contain binary data.  A
      content-transfer-encoding of "binary" is strongly encouraged for
      messaging environments which support binary transport.  A
      content-transfer-encoding of "base64" (and the associated
      transformation) is strongly encouraged for messaging environments
      which do not support binary transfer.

   Security considerations:

      See the "Security Considerations" section (section 7) in this
      document.

   Interoperability considerations: none

   Published specification: This document.

   Applications which use this media:

      The tone audio subtype supports the transport of pure composite
      tones, for example those commonly used in the current telephone
      system to signal call progress.

   Additional information:

      1. Magic number(s): N/A

      2. File extension(s): N/A

      3. Macintosh file type code: N/A
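
   Both registrations define the optional "rate" parameter in the same
   way: a sampling rate in Hertz, written as an integer or a floating
   point number, defaulting to 8000 Hz when omitted.  A minimal,
   non-normative Python sketch of that rule follows.  The name
   parse_rate and the dictionary of already-separated parameters are
   hypothetical, and the requirement that the rate be positive and
   finite is an assumption of this sketch.  Python's float() is also
   more lenient about number syntax than a strict parser might be.

```python
import math

DEFAULT_RATE_HZ = 8000.0  # default stated in the registrations


def parse_rate(params):
    """Return the sampling rate in Hz from a dict of parameter strings."""
    raw = params.get("rate")
    if raw is None:
        return DEFAULT_RATE_HZ
    rate = float(raw)  # accepts "8000" as well as "11025.5"
    # Assumption: a usable sampling rate is finite and greater than zero.
    if not math.isfinite(rate) or rate <= 0:
        raise ValueError("invalid sampling rate: %r" % raw)
    return rate
```

   With no "rate" entry, parse_rate({}) returns the 8000 Hz default.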


7. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification (RFC 3550 [N-4]), and any appropriate RTP profile (for
   example, RFC 3551 [I-22]).  This implies that confidentiality of the
   media streams is achieved by encryption.  Because the data
   compression used with this payload format is applied end-to-end,
   encryption may be performed after compression so there is no conflict
   between the two operations.

   This payload format does not exhibit any significant non-uniformity
   in receiver-side computational complexity for packet processing that
   would create a potential denial-of-service threat.

   Additional security considerations are described in RFC 2198 [N-2].

   A security review of this payload format found no additional
   considerations.

8. IANA Considerations

   This document defines two new RTP payload formats, named telephone-
   event and tone, and associated Internet media (MIME) types,
   audio/telephone-event and audio/tone.  It also defines a number of
   codepoints for events.

   Within the audio/telephone-event type, events MUST be registered with
   IANA.  Registrations are subject to approval by the current chair of
   the IETF Audio/Video Transport (AVT) working group, or by an expert
   designated by the Transport Area Director if the AVT group has
   closed.  The initial registry content is shown in the following
   table, and consists of the events defined in section 3 of this
   document.

   The meaning of new events MUST be documented either as an RFC or an
   equivalent standards document produced by another standardization
   body, such as ITU-T.



                  audio/telephone-event Event Code Registry

    Event    Event Name                                  Reference
    Code

    0        DTMF digit "0"                              <This RFC>

    1        DTMF digit "1"                              <This RFC>

    2        DTMF digit "2"                              <This RFC>

    3        DTMF digit "3"                              <This RFC>

    4        DTMF digit "4"                              <This RFC>

    5        DTMF digit "5"                              <This RFC>

    6        DTMF digit "6"                              <This RFC>

    7        DTMF digit "7"                              <This RFC>

    8        DTMF digit "8"                              <This RFC>

    9        DTMF digit "9"                              <This RFC>

    10       DTMF digit "*"                              <This RFC>

    11       DTMF digit "#"                              <This RFC>

    12       DTMF digit "A"                              <This RFC>

    13       DTMF digit "B"                              <This RFC>

    14       DTMF digit "C"                              <This RFC>

    15       DTMF digit "D"                              <This RFC>

    32       ANS (V.25 Answer tone)                      <This RFC>

             Also known as CED (T.30 Called tone)

    33       /ANS (V.25 Answer tone with phase shift)    <This RFC>

             Also known as /CED (T.30 Called tone with
             phase shift)

    34       ANSam (V.8 amplitude modified Answer tone)  <This RFC>

    35       /ANSam (V.8 amplitude modified Answer tone  <This RFC>
             with phase shift)

    36       CNG (T.30 Calling tone)                     <This RFC>

    37       V.21 channel 1, "0" bit                     <This RFC>

    38       V.21 channel 1, "1" bit                     <This RFC>

    39       V.21 channel 2, "0" bit                     <This RFC>

    40       V.21 channel 2, "1" bit                     <This RFC>

    49       CT (V.25 Calling Tone)                      <This RFC>

    Legal event codes range from 0 to 255.  All codepoints other than
    those listed here are reserved.



9. Acknowledgements

   The suggestions of the Megaco working group are gratefully
   acknowledged.  Detailed advice and comments were provided by Hisham
   Abdelhamid, Flemming Andreasen, Fred Burg, Steve Casner, Dan
   Deliberato, Fatih Erdin, Bill Foster, Mike Fox, Mehryar Garakani,
   Gunnar Hellstrom, Rajesh Kumar, Terry Lyons, Steve Magnell, Zarko
   Markov, Kai Miao, Satish Mundra, Kevin Noll, Vern Paxson, Oren Peleg,
   Colin Perkins, Raghavendra Prabhu, Moshe Samoha, Todd Sherer, Adrian
   Soncodi, Yaakov Stein, Mira Stevanovic, Alex Urquizo and Herb
   Wildfeur.

10.   Authors

   Henning Schulzrinne
   Dept. of Computer Science
   Columbia University
   1214 Amsterdam Avenue
   New York, NY 10027
   USA

   electronic mail: schulzrinne@cs.columbia.edu

   Scott Petrack
   eDial
   USA

   electronic mail: scott.petrack@edial.com

   Tom Taylor
   Nortel Networks
   1852 Lorraine Ave.
   Ottawa, Ontario
   Canada K1H 6Z8

   Phone: +1 613 763-1496
   E-mail: taylor@nortelnetworks.com

12.   References

12.1  Normative References

   [N-1]   S. Bradner, "Key words for use in RFCs to indicate
           requirement levels", RFC 2119, Internet Engineering Task
           Force, Mar. 1997.

   [N-2]   C. E. Perkins, I. Kouvelas, O. Hodson, V. J. Hardman, M.
           Handley, J. C. Bolot, A. Vega-Garcia, and S. Fosse-Parisis,
           "RTP payload for redundant audio data", RFC 2198, Internet
           Engineering Task Force, Sept. 1997.

   [N-3]   M. Handley and V. Jacobson, "SDP: session description
           protocol", RFC 2327, Internet Engineering Task Force, Apr.
           1998.

   [N-4]   H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson,
           "RTP: a transport protocol for real-time applications", RFC
           3550, Internet Engineering Task Force, Jul. 2003.

   [N-5]   S. Casner and P. Hoschka, "MIME Type Registration of RTP
           Payload Formats", RFC 3555, Internet Engineering Task Force,
           Jul. 2003.

   [N-6]   International Telecommunication Union, "Echo suppressors",
           Recommendation G.164, ITU-T, Geneva, Switzerland, Nov. 1988.

   [N-7]   International Telecommunication Union, "Echo cancellers",
           Recommendation G.165, ITU-T, Geneva, Switzerland, Mar. 1993.

   [N-8]   International Telecommunication Union, "Technical features of
           push-button telephone sets", Recommendation Q.23, ITU-T,
           Geneva, Switzerland, Nov. 1988.

   [N-9]   International Telecommunication Union, "Multifrequency
           push-button signal reception", Recommendation Q.24, ITU-T,
           Geneva, Switzerland, Nov. 1988.

   [N-10]  International Telecommunication Union, "Procedures for
           document facsimile transmission in the general switched
           telephone network", Recommendation T.30, ITU-T, Geneva,
           Switzerland, July 2003.

   [N-11]  International Telecommunication Union, "Procedures for
           starting sessions of data transmission over the public
           switched telephone network", Recommendation V.8, ITU-T,
           Geneva, Switzerland, Nov. 2000.

   [N-12]  International Telecommunication Union, "300 bits per second
           duplex modem standardized for use in the general switched
           telephone network", Recommendation V.21, ITU-T, Geneva,
           Switzerland, Nov. 1988.

   [N-13]  International Telecommunication Union, "Automatic answering
           equipment and general procedures for automatic calling
           equipment on the general switched telephone network including
           procedures for disabling of echo control devices for both
           manually and automatically established calls", Recommendation
           V.25, ITU-T, Geneva, Switzerland, Oct. 1996.  See also
           Corrigendum 1 to Recommendation V.25, Jul. 2001.



12.2  Informative References

   [I-1]   G. Hellstrom, "RTP Payload for Text Conversation", RFC 2793,
           Internet Engineering Task Force, May 2000.

   [I-2]   R. Kreuter, "RTP Payload for a 64 kbit/s transparent call",
           Work in progress, Internet Engineering Task Force, December
           2003.

   [I-3]   International Telecommunication Union, "Pulse code modulation
           (PCM) of voice frequencies", Recommendation G.711, ITU-T,
           Geneva, Switzerland, Nov. 1988.

   [I-4]   International Telecommunication Union, "Speech coders: Dual
           rate speech coder for multimedia communications transmitting
           at 5.3 and 6.3 kbit/s", Recommendation G.723.1, ITU-T,
           Geneva, Switzerland, Mar. 1996.

   [I-5]   International Telecommunication Union, "Coding of speech at 8
           kbit/s using conjugate-structure algebraic-code-excited
           linear-prediction (CS-ACELP)", Recommendation G.729, ITU-T,
           Geneva, Switzerland, Mar. 1996.

   [I-6]   International Telecommunication Union, "Terminal for low bit-
           rate multimedia communication", Recommendation H.324, ITU-T,
           Geneva, Switzerland, Mar. 2002.

   [I-7]   International Telecommunication Union, "ISDN user-network
           interface layer 3 specification for basic call control",
           Recommendation Q.931, ITU-T, Geneva, Switzerland, May 1998.

   [I-8]   International Telecommunication Union, "Procedures for real-
           time Group 3 facsimile communication over IP networks",
           Recommendation T.38, ITU-T, Geneva, Switzerland, Jul. 2003.

   [I-9]   International Telecommunication Union, "International
           interworking for videotex services", Recommendation T.101,
           ITU-T, Geneva, Switzerland, Nov. 1994.

   [I-10]  International Telecommunication Union, "A 2-wire modem for
           facsimile applications with rates up to 14 400 bit/s",
           Recommendation V.17, ITU-T, Geneva, Switzerland, Feb. 1991.

   [I-11]  International Telecommunication Union, "4800/2400 bits per
           second modem standardized for use in the general switched
           telephone network", Recommendation V.27ter, ITU-T, Geneva,
           Switzerland, Nov. 1988.

   [I-12]  International Telecommunication Union, "9600 bits per second
           modem standardized for use on point-to-point 4-wire leased
           telephone-type circuits", Recommendation V.29, ITU-T, Geneva,
           Switzerland, Nov. 1988.

   [I-13]  International Telecommunication Union, "A modem operating at
           data signalling rates of up to 33 600 bit/s for use on the
           general switched telephone network and on leased point-to-
           point 2-wire telephone-type circuits", Recommendation V.34,
           ITU-T, Geneva, Switzerland, Feb. 1998.

   [I-14]  International Telecommunication Union, "A digital modem and
           analogue modem pair for use on the Public Switched Telephone
           Network (PSTN) at data signalling rates of up to 56 000 bit/s
           downstream and up to 33 600 bit/s upstream", Recommendation
           V.90, ITU-T, Geneva, Switzerland, Sep. 1998.

   [I-15]  International Telecommunication Union, "A digital modem
           operating at data signalling rates of up to 64 000 bit/s for
           use on a 4-wire circuit switched connection and on leased
           point-to-point 4-wire digital circuits", Recommendation V.91,
           ITU-T, Geneva, Switzerland, May 1999.

   [I-16]  International Telecommunication Union, "Enhancements to
           Recommendation V.90", Recommendation V.92, ITU-T, Geneva,
           Switzerland, Nov. 2000.

   [I-17]  International Telecommunication Union, "Modem-over-IP
           networks: Procedures for the end-to-end connection of V-
           series DCEs", Recommendation V.150.1, ITU-T, Geneva,
           Switzerland, Jan. 2003.

   [I-18]  R. Kocen and T. Hatala, "Voice over frame relay
           implementation agreement", Implementation Agreement FRF.11,
           Frame Relay Forum, Foster City, California, Jan. 1997.

   [I-19]  International Telecommunication Union, "Procedures for the
           identification and selection of common modes of operation
           between data circuit-terminating equipments (DCEs) and
           between data terminal equipments (DTEs) over the public
           switched telephone network and on leased point-to-point
           telephone-type circuits", Recommendation V.8bis, ITU-T,
           Geneva, Switzerland, Nov. 2000.

   [I-20]  International Telecommunication Union, "Technical
           characteristics of tones for the telephone service",
           Recommendation E.180/Q.35, ITU-T, Geneva, Switzerland, Mar.
           1998.

   [I-21]  International Telecommunication Union, "Operational and
           interworking requirements for DCEs operating in the text
           telephone mode", Recommendation V.18, ITU-T, Geneva,
           Switzerland, Nov. 2000.  See also Recommendation V.18
           Amendment 1, Nov. 2002.

   [I-22]  H. Schulzrinne and S. Casner, "RTP profile for audio and
           video conferences with minimal control", RFC 3551, Internet
           Engineering Task Force, Jul. 2003.


Disclaimer of validity:

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Copyright Notice

   Copyright (C) The Internet Society (2004).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Disclaimer

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.