Internet Draft                           Bert Culpepper
   draft-culpepper-sipping-app-interact-
   reqs-02.txt
   November 3, 2002                         Robert Fairlie-Cuninghame
   Expires: May 2003                        Nuera Communications, Inc.


                     Session Initiation Protocol Based
                   Application Interaction Requirements


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This particular draft is intended to be discussed in the SIPPING
   Working Group.  Discussion of it therefore belongs on that working
   group's mailing list.  The charter for the SIPPING working group may
   be found at http://www.ietf.org/html.charters/sipping-charter.html

Abstract

   This document defines the high level requirements for a framework
   and/or one or more mechanisms that support user interaction, via
   SIP-based user agents, with applications residing on remote network
   servers.  The requirements in this document address the overall
   features of such a system, without regard to its architecture.

   SIP currently supports media-based application interactions using
   methods such as speech, video and end-to-end telephony-related
   tones; however, it is desirable that more general application
   interaction models be defined, especially those that are not
   restricted to the media plane.  In addition, it is desirable that an
   application be able to present the user with application-specific


Culpepper/Fairlie-Cuninghame                                   [Page 1]


Internet Draft       SIP-Based App Interaction Reqs         Nov 3, 2002

   user interfaces and information.  The user agent should also be able
   to generate activity indications back to an application to
   communicate actions on physical or logical user interfaces.  The
   document also defines a number of topic-related terms to assist in
   disambiguating discussions of the issues.

1. Motivation

   Telecommunications services in circuit-switched networks have
   utilized end-user indications as the means for users to interact
   with the services while users are engaged in a call.  These end-user
   indications, such as those produced by a user pressing keys, are
   sent end-to-end through each of the network entities participating
   in the call.  As communications services move to IP networks, the
   ability for users to interact with their communications services in
   near real time must follow.  Unlike in legacy circuit-switched
   networks, the nodes hosting many services in IP networks rarely
   reside along the path taken by the media.

   Users of communications services have become accustomed to
   controlling their services by interacting through the communications
   terminal.  The traditional means by which users interact with their
   communications services in legacy networks is DTMF, generated when
   the user presses a key on the terminal's keypad.  Because of this,
   there is a significant desire to duplicate the use of DTMF to support
   user interaction with services tightly associated with IP
   communications sessions.  The Internet communications model separates
   session control from session media: the devices involved in session
   control are not necessarily tightly coupled to the devices that
   process media.  Because IP networks transport DTMF as a media stream,
   access to these user indications by the network entities involved in
   session control is awkward.  In addition, limiting user interaction
   with communications services to input devices that emulate the
   traditional telephone keypad constrains user devices unnecessarily.

   In addition to legacy application interaction methods such as DTMF,
   there is a desire for new interaction methods that support web
   pages, keyboards and other user devices commonly used to access the
   Internet.  From a user's perspective, these new interaction methods
   should operate consistently and seamlessly alongside legacy methods
   such as DTMF.

   For these reasons, a mechanism different from that used in legacy
   networks is needed to transport user indications for application
   interaction in IP networks.

   The Session Initiation Protocol (SIP) [2] has been chosen as the
   session control protocol for multimedia session establishment within
   the general Internet and in many other IP-based networks.  Because
   of this choice, it is desirable to have a mechanism supporting user
   application interaction that works with SIP.  As SIP deals with
   session control and not media transport, the mechanism should not be
   limited to the media plane.

2. Terminology

   The following acronyms and terms are used in this document.

   Requestor: The agent responsible for requesting user indications or
   application presentations from the Reporter.  The Requestor is
   normally associated with the Application Entity.

   Reporter: The agent responsible for detecting and reporting user
   activity indications or presenting a user application component to
   the user.  This framework restricts the Reporter to being a SIP UA;
   the Reporter is normally associated with the User or User Device.

   UA: SIP User Agent [2].

   User Activity Indication (UAI): The message(s) containing the data
   associated with the reporting of discrete user indications, for
   instance, a mouse click or button press.  It refers to indications
   relating to discrete stimulus-based interactions rather than media
   stream-based interactions such as voice or video.

   Physical User Interface: The collection of physical input and
   presentation devices possessed by a device, for instance, a display,
   speaker, microphone or dialpad.

   Logical User Interface (LUI): The logical collection of user
   interface components (see definition below) used by a user to
   interact with a group of (explicitly) cooperating applications.  A
   logical user interface is independent of all other application
   interactions occurring on the device.

   User Interface Component (UIC): A component (physical or otherwise)
   used for application interaction.  Examples of UICs include: a web-
   page window, a media-based video window, a speaker, microphone or a
   key-based input device.  A UIC may only generate user activity
   indications when the user is interacting with the associated logical
   user interface.

   Presentation-based Interaction: A presentation-based UIC will
   present an application-supplied user interface (or simply
   application-supplied information) to the user.  A presentation-based
   component will also commonly allow a user to interact directly with
   the supplied interface through stimulus-based methods (UAIs).
   Examples are a web-page window with a pointing device, or simply a
   display screen with no associated input device.

   Media-based Interaction: Media-based interaction refers to user
   input supplied via UICs that process media (e.g., audio).  Media-
   based UI components allow bi-directional or unidirectional
   interaction through the media plane, for instance, a speaker or a
   microphone (unidirectional) or a speaker and microphone combination
   (bi-directional).  Media-based UICs may present application-supplied
   user interfaces or information to the user; however, these
   components do not generate discrete user activity indications and
   merely relay uninterpreted media streams to/from the application.
   The resulting framework does not alter the normal SIP session
   semantics but simply allows the media-based SIP session to be
   associated with a UIC within a logical user interface.

   Input-based Interaction: Input-based interaction refers to user
   input supplied via UICs that do not present an application-supplied
   interface to the user but rather correspond to a (usually physical)
   interface possessed by the device, for instance, a dialpad or
   keyboard.  Input-based UI components generate User Activity
   Indications in response to user actions.

3. End-to-end Versus Asynchronous User Activity Indications

   The end-to-end user activity indications currently supported in IP
   networks require "workarounds" in SIP networks so that applications
   along the session signaling path have access to the indications.
   The current solution requires either that the endpoint support
   "DTMF forking", or that the receiving entity, when it is not the
   final destination for the session's media, re-generate the
   indication towards the destination.  In many scenarios, the
   indications meant for the application are not used at the
   destination.

   User activity indications needed for application interaction, on
   the other hand, are only needed between an endpoint/user and the
   application within the network.  Using end-to-end mechanisms for
   application interaction, when the application is not itself an
   endpoint in the session, is problematic as described above.

4. General Requirements

   R1:  The framework must support the collection of device/user input
        generated in the context of a SIP dialog or conversation-space.

   R2:  The framework must transport user activity indications to
        network elements independently of the media plane.

   R3:  The transport mechanism must be sensitive to the limited
        bandwidth of some signaling planes; for instance, reliability
        through blind retransmission is not acceptable.

   R4:  The framework must support multiple network entities or
        applications requesting and receiving user activity indications
        from a SIP UA independently of each other.

   R5:  The framework must provide a means for a network
        application/entity to indicate its desire to receive user
        activity indications and/or to present an application interface
        on the User's UA.

   R6:  The framework must provide a means for a Requestor to
        determine the user interface components that are available at
        the UA for application use.

   R7:  The framework must provide a means for a SIP UA to indicate its
        capability/intent to fulfill a request for user activity
        indications.

   R8:  The framework must provide a means whereby the Requestor can
        indicate its desire to only receive a subset of the supported
        user activity indications for any non-trivial UI component.

   R9:  The framework must provide a means to prevent the transport of
        UAIs unless implicitly or explicitly requested by an entity.

   R10: The framework should support devices with a wide range of user
        interfaces for both presentation-based and input-based
        interaction modes; for instance, it must support devices that
        possess a display UI component as well as those that do not,
        ranging from devices that only have physical buttons to those
        that only have display-based pointing devices.

   R11: The framework must be extensible so that a variety of
        non-key-based user activity indications can be supported now or
        in the future, for instance, sliders, dials, switches, local
        voice commands, hyperlinks, biometrics, etc.

   R12: The framework must support delivery of UAIs with reliability at
        least as good as that of the session control protocol.

   R13: The framework must ensure that the receiver of user activity
        indications (i.e., the Requestor) can determine their original
        order of occurrence and detect any missing indications.

   R14: The framework must allow the user to know which application is
        associated with each UIC.

   R15: The framework must provide a mechanism that allows users to
        have assurance that the user input they provide is seen only by
        the application that created the user interface component.

   R16: The framework must support the ability for each user interface
        component to be associated with a separate logical user
        interface.  Each logical user interface may be associated with
        the same or different applications.  For example, a user may
        want to interact with a voice-recording application and a
        prepaid calling application within the same call but allow each
        application to use a different logical user interface.

   R17: The framework must allow user interface components created
        through this mechanism to be updated or removed as desired by
        the creating application entity.

   R18: Unless authorized by the user, application interaction
        resources established through this framework should be
        terminated by the User Agent when they are no longer associated
        with a SIP dialog.
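
   This document does not prescribe how R13 is met.  As one
   illustration only, assume that the Reporter stamps each UAI with a
   hypothetical per-UIC sequence number starting at zero.  The Python
   sketch below (the Uai type and the reorder_and_find_gaps function
   are hypothetical, not part of any defined protocol) shows how a
   Requestor could restore the original order of indications and
   detect missing ones.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Uai:
    """Hypothetical User Activity Indication as seen by a Requestor."""
    seq: int    # assumed Reporter-assigned sequence number, starting at 0
    event: str  # e.g. "key:5" or "click:ok"


def reorder_and_find_gaps(received: List[Uai],
                          next_expected: int = 0) -> Tuple[List[Uai], List[int]]:
    """Return UAIs in original order (R13) plus any missing sequence numbers."""
    # Keep one UAI per sequence number, which also discards duplicates
    # (e.g. retransmissions); ignore anything already processed.
    by_seq = {u.seq: u for u in received if u.seq >= next_expected}
    ordered = [by_seq[s] for s in sorted(by_seq)]
    if not ordered:
        return [], []
    highest = ordered[-1].seq
    # Any number between next_expected and the highest seen that did not
    # arrive indicates a missing indication.
    missing = [s for s in range(next_expected, highest + 1) if s not in by_seq]
    return ordered, missing


arrivals = [Uai(2, "key:3"), Uai(0, "key:1"), Uai(3, "key:#"), Uai(0, "key:1")]
ordered, missing = reorder_and_find_gaps(arrivals)
print([u.event for u in ordered], missing)
# ['key:1', 'key:3', 'key:#'] [1]
```

   A sequence gap only reveals losses that precede the highest number
   received; detecting the loss of the final indication(s) would need
   an additional means, such as a timer.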

5. Key-Based Input Specific Requirements

   K1:  The framework must address the collection of DTMF-based user
        activity indications.

   K2:  The framework must address the collection of user activity
        indications for device-specific and/or user-specific buttons.

   K3:  For key-based indications, the framework must provide some form
        of indication of key press duration.

   K4:  For key-based indications, the framework must provide some form
        of indication of when each key press occurred relative to other
        key presses.
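
   As an illustration of K3 and K4 only, the following sketch assumes
   that each key-based indication carries hypothetical press and
   release timestamps, in milliseconds, taken from a single clock on
   the Reporter.  The KeyIndication type and the inter_press_gaps
   function are hypothetical.  With such timestamps a Requestor can
   derive both the press duration and the spacing between presses.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KeyIndication:
    """Hypothetical key-based UAI carrying timing information."""
    key: str
    down_ms: int  # assumed: time of key press, ms on the Reporter's clock
    up_ms: int    # assumed: time of key release, same clock

    @property
    def duration_ms(self) -> int:
        # K3: how long the key was held down.
        return self.up_ms - self.down_ms


def inter_press_gaps(keys: List[KeyIndication]) -> List[int]:
    """K4: time from the start of each press to the start of the next."""
    ordered = sorted(keys, key=lambda k: k.down_ms)
    return [b.down_ms - a.down_ms for a, b in zip(ordered, ordered[1:])]


presses = [KeyIndication("1", 0, 120),
           KeyIndication("#", 900, 2900),
           KeyIndication("2", 400, 480)]
print([p.duration_ms for p in presses])  # [120, 2000, 80]
print(inter_press_gaps(presses))         # [400, 500]
```

   Timestamps from one clock let an application distinguish, for
   example, a two-second hold of "#" from a brief press, which a bare
   sequence of digits cannot express.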

6. Desirables

   D1:  The framework should allow a device to indicate relative
        preferences amongst its various supported user interface
        components.

   D2:  To help manage feature interaction, the framework should also
        provide a means of prioritizing user interface component
        requests from multiple network entities within a single SIP
        dialog.
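
   D1 and D2 could interact as in the following sketch, which shows one
   possible policy rather than a defined mechanism.  It assumes,
   hypothetically, that each request names the UIC types an application
   can use and carries a numeric priority (a lower value meaning a
   higher priority), and that each UIC is granted to at most one
   requesting entity.  Requests are served in priority order (D2), and
   each receives the device's most preferred free UIC that it accepts
   (D1).  The UicRequest type and the assign_uics function are
   hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class UicRequest:
    """Hypothetical request from a network entity to use a UIC."""
    requestor: str
    acceptable: List[str]  # UIC types the application can use
    priority: int          # assumed: lower value = higher priority (D2)


def assign_uics(requests: List[UicRequest],
                device_prefs: List[str]) -> Dict[str, Optional[str]]:
    """Map each requestor to a granted UIC type, or None if none is free."""
    free = list(device_prefs)  # device preference order (D1), best first
    result: Dict[str, Optional[str]] = {}
    for req in sorted(requests, key=lambda r: r.priority):
        # Most preferred free UIC that this request can use.
        choice = next((u for u in free if u in req.acceptable), None)
        if choice is not None:
            free.remove(choice)  # assumed: one requestor per UIC
        result[req.requestor] = choice
    return result


reqs = [UicRequest("prepaid", ["dialpad", "display"], priority=2),
        UicRequest("recorder", ["dialpad"], priority=1)]
print(assign_uics(reqs, ["display", "dialpad", "speaker"]))
# {'recorder': 'dialpad', 'prepaid': 'display'}
```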

7. Acknowledgements

   The authors would like to acknowledge the detailed comments and
   additions to this document by Jonathan Rosenberg of Dynamicsoft,
   Inc. and Eric Chueng of AT&T Labs.

8. Authors

   Robert Fairlie-Cuninghame
   Nuera Communications, Inc.
   50 Victoria Rd
   Farnborough, Hants GU14-7PG
   United Kingdom
   Phone: +44-1252-548200
   Email: rfairlie@nuera.com

   Bert Culpepper
   Email: bertculpepper@netscape.net


9. References

   1  S. Bradner, "The Internet Standards Process -- Revision 3", BCP
      9, RFC 2026, October 1996.

   2  J. Rosenberg, H. Schulzrinne, et al., "SIP: Session Initiation
      Protocol", RFC 3261, June 2002.
