Internet Engineering Task Force                                   SIP WG
Internet Draft                                              G. Camarillo
                                                                Ericsson
                                                          H. Schulzrinne
                                                     Columbia University
draft-camarillo-sipping-early-media-00.txt
November 29, 2002
Expires: May, 2003


Early Media and Ringback Tone Generation in the Session Initiation Protocol

STATUS OF THIS MEMO

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   To view the list of Internet-Draft Shadow Directories, see
   http://www.ietf.org/shadow.html.


Abstract

   This document describes how to manage early media in SIP. It also
   describes which inputs need to be taken into consideration to define
   local policies for ringback tone generation.












G. Camarillo et. al.                                          [Page 1]


Internet Draft                    SIP                  November 29, 2002





                           Table of Contents



   1          Introduction ........................................    3
   2          Early Media in SIP ..................................    3
   2.1        Status Codes ........................................    4
   2.2        Direction Attributes and Media Clipping .............    4
   2.3        Intention to Send Media .............................    5
   2.4        The SDP Intention Parameter .........................    6
   2.5        Applicability of the SDP Intention Parameter ........    6
   3          Forking .............................................    6
   4          Ringback Tone Generation ............................    7
   5          Interactions with Preconditions .....................    8
   6          Examples ............................................    8
   6.1        Remotely Generated Ringback Tone ....................    8
   6.2        Locally Generated Ringback Tone .....................    9
   7          Acknowledgments .....................................    9
   8          Authors' Addresses ..................................    9
   9          Bibliography ........................................   10

1 Introduction

   Early media refers to media (e.g., audio and/or video) that is
   exchanged before a particular session is accepted by the called user.
   Early media within a particular SIP dialog takes place from the
   moment the initial INVITE is sent until the UAS generates a final
   response. Early media can be unidirectional or bi-directional and can
   be generated by the caller and/or the callee. Typical examples of
   early media generated by the callee are ringback tone and
   announcements (e.g., queuing status). Early media generated by the
   caller typically consists of voice commands or DTMF tones to drive
   IVRs.

   The basic SIP spec [1] supports very simple early media, but UAs that
   implement fully-featured early media need to support the PRACK [2]
   and the UPDATE [3] methods.

   The remainder of this document is organized as follows. Section 2
   describes early media establishment in SIP, Section 3 discusses
   forking, and Section 4 describes ringback tone generation. Section 5
   analyzes interactions between early media, ringback tone generation
   and preconditions, and Section 6 provides examples of common
   scenarios that use the mechanisms described in Sections 2 and 4.

2 Early Media in SIP

   SIP [1] uses the offer/answer model [4] to negotiate session
   parameters. One of the user agents - the offerer - prepares a session
   description that is called the offer. The other user agent - the
   answerer - responds with another session description called the
   answer. This two-way handshake allows both user agents to agree upon
   the session parameters to be used to exchange media.

   The idea behind the offer/answer model is to decouple the
   offer/answer exchange from the mechanism used to transport the
   session descriptions. For example, the offer can be sent in an INVITE
   request and the answer can arrive in the 200 (OK) response for that
   INVITE. Alternatively, the offer can be sent in the 200 (OK) for an
   empty INVITE and the answer can be sent in the ACK. When reliable
   provisional responses [2] and/or UPDATE requests [3] are used, there
   are many more possible ways to exchange offers and answers.

   The offer/answer model is not even coupled to SIP. Other transport
   mechanisms such as email attachments or instant messages can be used
   to perform an offer/answer exchange.

   This decoupling between the offer/answer model and the particular
   messages used for a particular offer/answer exchange implies that the
   negotiation of media parameters is not affected by the status of the
   session. If an INVITE contains an offer, it does not matter whether
   the answer is received in a 183 (Session Progress), a 180 (Ringing)
   or a 200 (OK) response. The resulting media session will be the same
   in all three scenarios.

        Note that in the past, some people wrongly believed that a
        UAC receiving a particular answer had to set up different
        early media sessions if the answer was received in a 180
        response (all the media streams were "magically" considered
        inactive) or in a 183 response (the media streams were
        established following the normal offer/answer model).

2.1 Status Codes

   As a consequence of the previously mentioned decoupling, the status
   code of a particular 1xx or 2xx SIP response is independent of the
   offer/answer model. For example, if a UAS is alerting the user, it
   will send a 180 (Ringing) response, regardless of the presence (or
   absence) of early media. Early media is driven by the offer/answer
   model, NOT by the status codes.
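
   As an informal illustration of this independence, the following
   Python sketch derives the UAC's view of a stream from the answer
   alone. The function names and the dictionary layout are hypothetical
   and are not defined by this document; the reversal of the direction
   attribute follows the offer/answer model [4].

```python
# Hypothetical sketch: the media state depends on the answer, not on
# the status code of the response that carried it.

# Direction written by one party, as seen by its peer.
REVERSE = {
    "sendrecv": "sendrecv",
    "sendonly": "recvonly",
    "recvonly": "sendonly",
    "inactive": "inactive",
}


def uac_stream_direction(answer_direction):
    """Return the stream direction from the UAC's perspective, given
    the direction attribute the UAS placed in its answer."""
    return REVERSE[answer_direction]


def handle_response(status_code, answer_direction):
    """Process a 1xx or 2xx response that carries an answer.

    The status code is kept for the progress display only; it plays
    no part in computing the media state."""
    return {
        "status_code": status_code,
        "media": uac_stream_direction(answer_direction),
    }


if __name__ == "__main__":
    # The same answer yields the same media state in all three cases.
    for code in (180, 183, 200):
        print(code, handle_response(code, "sendonly")["media"])
```

   The status code is carried through untouched so that it can still
   drive the progress indication discussed in Section 4.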

2.2 Direction Attributes and Media Clipping

   The direction attribute (i.e., sendrecv, sendonly, recvonly or
   inactive) for a particular stream contains the status of the media
   tools handling that stream at the end-points. Therefore, the
   direction attribute indicates whether or not the media tools at the
   end-point are ready to receive/send media over a particular media
   stream.

   The problem is that the offer/answer model does not distinguish
   between a sender that does not intend to send media and a receiver
   that does not accept incoming media. This distinction is useful to
   avoid media clipping in certain situations. We have the following
   alternatives for a particular direction of a stream:

        1.   Sender intends to send; receiver accepts media

        2.   Sender intends to send; receiver does NOT accept media

        3.   Sender does NOT intend to send; receiver accepts media

        4.   Sender does NOT intend to send; receiver does NOT accept
             media

   We have to analyze in which of these 4 scenarios there is a chance of
   having media clipping when the media resumes being sent over the
   stream. It is obvious that in scenario 1 there is already media
   flowing from sender to receiver, so we do not need to analyze it.

   If in scenario 2 the receiver decides to start accepting media, it
   will configure its media tool so that it is ready to receive media,
   and it will send an offer to the sender indicating so. Since the
   receiver configures its media tool before sending the offer, there is
   no media clipping.

   In scenario 3, if the sender decides to start sending media, it will
   have to send an offer to the receiver indicating so. However, since
   SIP signalling typically traverses a different path than the media
   packets, the first media packets may arrive at the receiver before
   the offer. This is not a problem, since the receiver was accepting
   media anyway. There is no media clipping.

   In scenario 4, if the sender decides to start sending media, it will
   have to send an offer to the receiver indicating so. However, the
   sender cannot start sending media to the receiver until it gets the
   answer back. Otherwise, all the media would be discarded by the
   receiver, since it was not accepting any media at that point in time.
   This leads to media clipping. The sender will not typically be able
   to send the first "hello" pronounced by the user.

   The problem with the offer/answer model is that it can establish
   scenario 4, but it cannot establish scenario 3. Therefore, when a
   sender that was quiet resumes sending media, there can be media
   clipping. The solution to this problem consists of using the SDP
   direction attribute to indicate media acceptance by the receiver and
   a new SDP parameter to indicate intention to send media by the
   sender. Such a parameter is defined in Section 2.4.
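
   The analysis of the four scenarios can be summarized with the
   following Python sketch. It is only an illustration; the function
   names are hypothetical, and the two boolean inputs stand for the
   sender's intention to send and the receiver's acceptance of media
   for one direction of a stream.

```python
# Hypothetical sketch of the four scenarios of this section for one
# direction of a stream, and of the clipping risk when media starts.


def scenario(sender_intends, receiver_accepts):
    """Map the two independent facts to the scenario number (1-4)."""
    if sender_intends:
        return 1 if receiver_accepts else 2
    return 3 if receiver_accepts else 4


def clipping_on_resume(sender_intends, receiver_accepts):
    """Return True if media can be clipped when the flow starts.

    Only scenario 4 is at risk: the sender has to wait for the answer
    before the receiver accepts anything it sends."""
    return scenario(sender_intends, receiver_accepts) == 4


if __name__ == "__main__":
    for intends in (True, False):
        for accepts in (True, False):
            print(scenario(intends, accepts),
                  "clipping risk:", clipping_on_resume(intends, accepts))
```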

2.3 Intention to Send Media

   To resolve the problem above, some proposed keeping the sender from
   signalling that it did not intend to send media. That would transform
   scenario 3 into scenario 1, eliminating media clipping. However,
   knowing whether or not the sender intends to send media may be
   important to drive GUIs in certain situations, as shown in the
   following example.

   Two users, A and B, are involved in a videoconference using a
   sendrecv video stream. B wants to have a moment of privacy, so he
   switches off his camera for a minute. B issues an offer indicating
   that it does not intend to send video. However, the offer indicates
   that A and B should still keep their video tools configured as
   sendrecv, so that when B switches on his camera again, they can
   perform a "soft" media resume (i.e., without media clipping).

   B's intention of not sending video is now used to drive A's GUI
   (e.g., minimizing the window where A was watching B's face). If B's
   intention had not been signalled, A's GUI would have probably
   continued showing the last video frame that was received over the
   stream. A would not have been able to distinguish this situation from
   a massive packet loss in the network (RTCP timers are usually too
   long for this purpose). Therefore, signalling the intention of
   sending or not sending media is important to drive GUIs.

2.4 The SDP Intention Parameter

   OPEN ISSUE: SHOULD THIS ATTRIBUTE BE DEFINED IN AN MMUSIC DRAFT OR IS
   IT OK TO DEFINE IT IN THIS SECTION? IT PROBABLY BELONGS TO MMUSIC,
   BECAUSE IT IS NOT EARLY MEDIA SPECIFIC.

   A new "intention" SDP media level attribute is defined. It is used to
   indicate whether or not the entity generating the session description
   intends to send media at a particular point in time over a particular
   stream. Its formatting in SDP is described by the following BNF:



            intention-attribute    = "a=intention:" intention-value
            intention-value        = "send" | "nosend"
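
   A minimal Python sketch of a parser for this attribute is shown
   below. It assumes the syntax given by the BNF above; the function
   names and the sample media section (port and payload type included)
   are hypothetical. Since this document does not state a default for
   a media section without the attribute, the sketch reports its
   absence instead of guessing.

```python
# Hypothetical parser for the "intention" attribute defined above.

VALID_INTENTIONS = ("send", "nosend")
PREFIX = "a=intention:"


def parse_intention(sdp_line):
    """Return "send" or "nosend" for an a=intention line, None for any
    other SDP line, and raise ValueError for an unknown value."""
    line = sdp_line.strip()
    if not line.startswith(PREFIX):
        return None
    value = line[len(PREFIX):]
    if value not in VALID_INTENTIONS:
        raise ValueError("invalid intention value: %r" % value)
    return value


def media_intention(media_lines):
    """Return the intention found in one media section, or None if
    the attribute is absent."""
    for line in media_lines:
        value = parse_intention(line)
        if value is not None:
            return value
    return None


if __name__ == "__main__":
    # Illustrative media section: B keeps the stream in sendrecv mode
    # but announces that it will not send video for now.
    video = ["m=video 51372 RTP/AVP 31", "a=sendrecv",
             "a=intention:nosend"]
    print(media_intention(video))
```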



2.5 Applicability of the SDP Intention Parameter

   The SDP intention parameter should be used by systems that want to
   provide information to drive GUIs and that want to avoid media
   clipping. Systems whose requirements regarding media clipping are not
   strict can signal scenario 4 instead. Systems that do not wish to
   provide information to drive GUIs can signal scenario 1 instead.
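
   One possible reading of these rules is sketched below in Python.
   The function name and its two boolean parameters are hypothetical;
   the returned numbers refer to the scenarios of Section 2.2.

```python
# Hypothetical decision helper for the applicability rules above.


def scenario_to_signal(avoid_clipping, drive_gui):
    """Return the Section 2.2 scenario that a sender which stops
    sending media would signal, given the system's requirements."""
    if not drive_gui:
        # No information is provided to drive GUIs: keep signalling
        # the intention to send (scenario 1).
        return 1
    if avoid_clipping:
        # Receiver keeps accepting media; the sender announces that it
        # will not send, using the intention parameter (scenario 3).
        return 3
    # Clipping is tolerated: the receiver stops accepting media
    # (scenario 4).
    return 4


if __name__ == "__main__":
    for avoid in (True, False):
        for gui in (True, False):
            print(avoid, gui, scenario_to_signal(avoid, gui))
```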

3 Forking

   If an INVITE forks, the UAC can receive multiple provisional
   responses that establish different early media streams. It is up to
   the UAC's local policy how to render the media received over those
   streams. When a UAC has to deal with several video streams, it seems
   natural, if the GUI supports it, to use a different window to show
   each individual stream. However, a UAC receiving several audio
   streams will probably have to choose one to be played, because mixing
   them all may not be useful.

   Note that if the INVITE that forked contained an offer, all the UASs
   will send their early media to the same transport address of the UAC.
   The UAC should be ready to temporarily demultiplex them based on the
   RTP SSRCs and send a new offer within the early dialog as soon as the
   offer/answer rules allow it.
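
   The temporary demultiplexing step could look like the following
   Python sketch. It relies only on the layout of the RTP fixed header
   (version in the two most significant bits of the first octet, SSRC
   in octets 8 to 11, network byte order). The function names and the
   packet-building helper are hypothetical and exist only to make the
   example self-contained.

```python
import struct
from collections import defaultdict

RTP_FIXED_HEADER_LEN = 12  # octets, without any CSRC entries


def rtp_ssrc(packet):
    """Return the SSRC of an RTP packet given as a bytes object."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the RTP fixed header")
    if packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return struct.unpack_from("!I", packet, 8)[0]


def demultiplex(packets):
    """Group packets received on one transport address by SSRC."""
    streams = defaultdict(list)
    for packet in packets:
        streams[rtp_ssrc(packet)].append(packet)
    return dict(streams)


def example_packet(ssrc, seq):
    """Build a header-only RTP packet (illustration only)."""
    return struct.pack("!BBHII", 0x80, 0, seq, 160 * seq, ssrc)


if __name__ == "__main__":
    received = [example_packet(0x1111, 1), example_packet(0x2222, 1),
                example_packet(0x1111, 2)]
    for ssrc, group in demultiplex(received).items():
        print(hex(ssrc), len(group), "packet(s)")
```

   Which of the resulting streams is rendered remains a matter of
   local policy, as described above.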

4 Ringback Tone Generation

   In the PSTN, telephone switches typically play ringback tones to the
   caller to indicate that the called user is being alerted. When, where
   and how these ringback tones are generated has been standardized
   (i.e., the local exchange of the callee generates a standardized
   ringback tone while the callee is being alerted). A standardized
   approach to provide this type of feedback for the user makes sense in
   a homogeneous environment such as the PSTN, where all the terminals
   have a similar user interface.

   This homogeneity is not found among SIP user agents. SIP user agents
   have different capabilities, different user interfaces and may be
   used to establish sessions that do not involve audio at all. Because
   of this, the way a SIP UA provides the user with information about
   the progress of session establishment is a matter of local policy.
   This local policy in a given SIP UA has two main inputs: the status
   of the INVITE transaction and the availability of incoming early
   media.

   The status of the INVITE transaction is given by the status code of
   the latest response (e.g., 180 Ringing). The availability of incoming
   early media is given by the offer/answer model and its direction
   attributes and the intention attribute.

   For example, a POTS-like SIP UA could implement the following local
   policy:

        1.   If there is at least one audio stream in sendrecv or
             recvonly mode, play out the audio received over that
             stream.

        2.   If the callee is being alerted and there are no audio
             streams in sendrecv or recvonly mode, play a locally-
             generated ringback tone to the user.
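
   The two rules above can be written as the following Python sketch.
   The function name, the returned strings and the behavior when
   neither rule applies are assumptions made for illustration.

```python
# Hypothetical sketch of the POTS-like local policy above.

RECEIVING_MODES = ("sendrecv", "recvonly")


def pots_feedback(audio_directions, callee_alerted):
    """Decide what a POTS-like UA plays to the caller.

    audio_directions lists the direction of every audio stream from
    the UAC's perspective; callee_alerted is True when the latest
    response is a 180 (Ringing)."""
    if any(d in RECEIVING_MODES for d in audio_directions):
        return "play received audio"
    if callee_alerted:
        return "play local ringback"
    return "silence"  # assumption: neither rule applies


if __name__ == "__main__":
    print(pots_feedback(["recvonly"], True))
    print(pots_feedback(["inactive"], True))
    print(pots_feedback([], False))
```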

   And a SIP UA with a graphical user interface could follow the local
   policy below:

        1.   If there are audio and/or video streams in sendrecv or
             recvonly mode, play out whatever is received over those
             streams.

        2.   If the callee is being alerted, display the message "The
             callee is being alerted" for the user.

        3.   If a provisional response other than alerting is received,
             display its reason phrase to the user (e.g., Trying, Call
             is Being Forwarded, Queued).
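
   A sketch of this second policy in Python follows. The function
   name, the stream labels and the action strings are hypothetical;
   the three rules are applied independently, so several actions can
   result from one response.

```python
# Hypothetical sketch of the local policy of a UA with a GUI.

RECEIVING_MODES = ("sendrecv", "recvonly")


def gui_feedback(stream_directions, status_code, reason_phrase):
    """Return the list of actions for the latest provisional response.

    stream_directions maps a stream label (e.g., "audio") to its
    direction from the UAC's perspective."""
    actions = []
    if any(d in RECEIVING_MODES for d in stream_directions.values()):
        actions.append("render incoming media")
    if status_code == 180:
        actions.append("display: The callee is being alerted")
    elif 100 <= status_code < 200:
        actions.append("display: " + reason_phrase)
    return actions


if __name__ == "__main__":
    print(gui_feedback({"audio": "recvonly"}, 180, "Ringing"))
    print(gui_feedback({"audio": "inactive"}, 182, "Queued"))
```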

   Note that while it is not desirable to standardize a common local
   policy to be followed by every SIP UA, a particular subset of more or
   less homogeneous SIP UAs could use the same local policy by
   convention. Examples of such subsets of SIP UAs may be "all the
   PSTN/SIP gateways" or "every 3G IMS terminal". However, defining the
   particular common policy that such groups of SIP devices may use is
   outside the scope of this document.

5 Interactions with Preconditions

   RFC 3312 [5] defines a framework for preconditions for SIP. The
   negotiation of preconditions does not interact with the negotiation
   of early media. Every precondition has a direction attribute (e.g.,
   QoS in the sendonly direction) that may differ from the direction
   attribute of the media stream. Since the presence of early media is
   signalled with the latter attribute, there are no interactions
   between preconditions and early media.

   For example, a UA can request sendrecv QoS for a media stream that
   will be in recvonly mode for early media and will be set to sendrecv
   when the session is accepted.
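
   The following Python sketch illustrates this example by reading the
   two direction values separately from one media section. It assumes
   the "a=des:qos" attribute syntax of RFC 3312 [5] (strength, status
   type, direction); the function names and the sample media section
   (port and payload type included) are hypothetical.

```python
# Hypothetical sketch: the direction of a precondition and the
# direction of the media stream are read from different attributes.

DIRECTION_LINES = ("a=sendrecv", "a=sendonly", "a=recvonly",
                   "a=inactive")


def stream_direction(media_lines):
    """Return the direction attribute of the media stream."""
    for line in media_lines:
        if line in DIRECTION_LINES:
            return line[2:]
    return "sendrecv"  # SDP default when no attribute is present


def desired_qos_directions(media_lines):
    """Return the direction tags of the a=des:qos lines."""
    directions = []
    for line in media_lines:
        if line.startswith("a=des:qos "):
            # a=des:qos <strength> <status-type> <direction>
            directions.append(line.split()[3])
    return directions


if __name__ == "__main__":
    media = [
        "m=audio 20000 RTP/AVP 0",
        "a=recvonly",
        "a=curr:qos e2e none",
        "a=des:qos mandatory e2e sendrecv",
    ]
    print("stream:", stream_direction(media))
    print("desired QoS:", desired_qos_directions(media))
```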

6 Examples

   The following examples assume SIP UAs following the local policy
   below:

        1.   If there is at least one audio stream in sendrecv or
             recvonly mode, play out the audio received over that
             stream.

        2.   If the callee is being alerted and there are no audio
             streams in sendrecv or recvonly mode, play a locally-
             generated ringback tone to the user.

6.1 Remotely Generated Ringback Tone

   The UAS of Figure 1 receives an initial INVITE (1) with an offer that
   contains an audio stream in sendrecv mode. The UAS will play an
   announcement, but it will not accept incoming (early) media until
   user B accepts the session. The UAS sends a 183 (Session Progress)
   response with an answer that sets the audio stream to sendonly (2).
   After playing the announcement, the UAS starts alerting user B (5).
   The UAS will be generating a special ringback tone on the media
   stream, but since the audio stream was already in sendonly mode,
   there is no need for a new offer/answer exchange.

   When user B accepts the session the UAS sends a 200 (OK) response (8)
   for the INVITE and an UPDATE (9) to set the audio stream to sendrecv
   in parallel. The UAC sends the ACK (10) and the 200 (OK) response
   (11) for the UPDATE in parallel.


   Since the audio stream is in sendrecv or recvonly mode (from the
   UAC's perspective) all the time, the UAC applies the first bullet of
   its local policy. It plays out whatever is received over the audio
   stream (i.e., first the announcement and then the remotely generated
   ringback tone).

6.2 Locally Generated Ringback Tone

   The UAS of Figure 2 receives an initial INVITE (1) with an offer that
   contains an audio stream in sendrecv mode. The UAS will play an
   announcement, but it will not accept incoming (early) media until
   user B accepts the session. The UAS sends a 183 (Session Progress)
   response with an answer that sets the audio stream to sendonly (2).
   After playing the announcement, the UAS starts alerting user B, but
   it will not be generating any ringback tone on the media stream.
   Therefore, it sends a 180 (Ringing) response (5) and sets the audio
   stream to inactive with an UPDATE (6). At this point in time, the UAC
   uses the second bullet of its local policy and generates ringback
   tone locally.

   When user B accepts the session the UAS sends a 200 (OK) response
   (10) for the INVITE and an UPDATE (11) to set the audio stream to
   sendrecv in parallel. The UAC sends the ACK (12) and the 200 (OK)
   response (13) for the UPDATE in parallel.
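
   The early-dialog part of both examples can be replayed against the
   local policy of this section with the Python sketch below. The
   message lists, names and action strings are hypothetical; the
   direction changes are those described in the text of Sections 6.1
   and 6.2, and the sketch assumes that no audio stream exists before
   the first answer arrives.

```python
# Hypothetical replay of the messages the UAS sends in the early
# dialog of Figures 1 and 2, as seen by the UAC.

# A direction attribute written by the UAS, seen from the UAC.
REVERSE = {
    "sendrecv": "sendrecv",
    "sendonly": "recvonly",
    "recvonly": "sendonly",
    "inactive": "inactive",
}


def local_policy(direction, alerted):
    """Apply the two bullets of the local policy of Section 6."""
    if direction in ("sendrecv", "recvonly"):
        return "play received audio"
    if alerted:
        return "play local ringback"
    return "nothing to play"


def replay(messages):
    """Return (name, action) pairs for messages sent by the UAS.

    Each message is (name, status_code, uas_direction); a field is
    None when the message does not carry it."""
    direction = "inactive"  # assumption: no stream before the answer
    alerted = False
    result = []
    for name, status_code, uas_direction in messages:
        if uas_direction is not None:
            direction = REVERSE[uas_direction]
        if status_code == 180:
            alerted = True
        result.append((name, local_policy(direction, alerted)))
    return result


FIGURE_1 = [
    ("183 Session Progress", 183, "sendonly"),
    ("180 Ringing", 180, None),
]
FIGURE_2 = [
    ("183 Session Progress", 183, "sendonly"),
    ("180 Ringing", 180, None),
    ("UPDATE", None, "inactive"),
]

if __name__ == "__main__":
    for figure in (FIGURE_1, FIGURE_2):
        for name, action in replay(figure):
            print(name, "->", action)
        print()
```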


7 Acknowledgments

   Paul Kyzivat, Christer Holmberg, Jon Peterson and William Marshall
   provided useful comments and suggestions.

8 Authors' Addresses

   Gonzalo Camarillo
   Ericsson
   Advanced Signalling Research Lab.
   FIN-02420 Jorvas
   Finland
   electronic mail:  Gonzalo.Camarillo@ericsson.com

   Henning Schulzrinne
   Dept. of Computer Science
   Columbia University
   1214 Amsterdam Avenue, MC 0401
   New York, NY 10027
   USA
   electronic mail:  schulzrinne@cs.columbia.edu

9 Bibliography

   [1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J.
   Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: session
   initiation protocol," RFC 3261, Internet Engineering Task Force, June
   2002.

   [2] J. Rosenberg and H. Schulzrinne, "Reliability of provisional
   responses in session initiation protocol (SIP)," RFC 3262, Internet
   Engineering Task Force, June 2002.

   [3] J. Rosenberg, "The session initiation protocol (SIP) UPDATE
   method," RFC 3311, Internet Engineering Task Force, Oct. 2002.

   [4] J. Rosenberg and H. Schulzrinne, "An offer/answer model with
   session description protocol (SDP)," RFC 3264, Internet Engineering
   Task Force, June 2002.

   [5] G. Camarillo, W. Marshall, and J. Rosenberg, "Integration of
   resource management and session initiation protocol (SIP)," RFC 3312,
   Internet Engineering Task Force, Oct. 2002.


          A                                            B

          |                                            |
          |---------------(1) INVITE    -------------->|
          |                   a=sendrecv               |
          |<------(2) 183 Session Progress-------------|
          |           a=sendonly                       |
          |-----------------(3) PRACK----------------->|
          |                                            |
          |<-----------(4) 200 OK (PRACK)--------------|
          |  *                                         |
          | ****************************************** |
          |*     User B will be with you shortly     * |
          | ****************************************** |
          |  *                                         |
          |<------------(5) 180 Ringing----------------|
          |                                            |
          |-----------------(6) PRACK----------------->|
          |                                            |
          |<-----------(7) 200 OK (PRACK)--------------|
          |  *                                         |
          | ****************************************** |
          |*             Ringback tone               * |
          | ****************************************** |
          |  *                                         |
          |<-----------(8) 200 OK (INVITE)-------------|
          |                                            |
          |<--------------(9) UPDATE    ---------------|
          |                   a=sendrecv               |
          |                                            |
          |-----------------(10) ACK------------------>|
          |                                            |
          |------------(11) 200 OK (UPDATE)----------->|
          |                 a=sendrecv                 |
          |  *                                      *  |
          | ****************************************** |
          |*        Bi-directional conversation       *|
          | ****************************************** |
          |  *                                      *  |
          |                                            |

   Figure 1: Remotely generated ringback tone


          A                                            B

          |                                            |
          |---------------(1) INVITE    -------------->|
          |                   a=sendrecv               |
          |<------(2) 183 Session Progress-------------|
          |           a=sendonly                       |
          |-----------------(3) PRACK----------------->|
          |                                            |
          |<-----------(4) 200 OK (PRACK)--------------|
          |  *                                         |
          | ****************************************** |
          |*     User B will be with you shortly     * |
          | ****************************************** |
          |  *                                         |
          |<------------(5) 180 Ringing----------------|
          |                 a=inactive                 |
          |<--------------(6) UPDATE    ---------------|
          |                   a=inactive               |
          |                                            |
          |-------------(7) PRACK--------------------->|
          |-------------(8) 200 OK (UPDATE)----------->|
          |                 a=inactive                 |
          |<-----------(9) 200 OK (PRACK)--------------|
          |                                            |
          |                                            |
          |                                            |
          |<----------(10) 200 OK (INVITE)-------------|
          |                                            |
          |<-------------(11) UPDATE    ---------------|
          |                   a=sendrecv               |
          |                                            |
          |-----------------(12) ACK------------------>|
          |                                            |
          |------------(13) 200 OK (UPDATE)----------->|
          |                 a=sendrecv                 |
          |  *                                      *  |
          | ****************************************** |
          |*        Bi-directional conversation       *|
          | ****************************************** |
          |  *                                      *  |
          |                                            |

   Figure 2: Locally generated ringback tone


   Full Copyright Statement

   Copyright (c) The Internet Society (2002). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.