SIPPING J. Van Dyke
Internet-Draft E. Burger (Ed.)
Expires: January 8, 2004 A. Spitzer
SnowShore Networks, Inc.
July 8, 2003
Media Server Control Markup Language (MSCML) and Protocol
draft-vandyke-mscml-03
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on January 8, 2004.
Copyright Notice
Copyright (C) The Internet Society (2003). All Rights Reserved.
Abstract
Media Server Control Markup Language (MSCML) is a markup language
used in conjunction with SIP to provide advanced conferencing and IVR
functions. This protocol is for communications between a conference
focus and mixer in the IETF SIP Conferencing Framework.
Conventions used in this document
RFC2119 [1] provides the interpretations for the key words "MUST",
"MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "MAY", and "OPTIONAL" found in this document.
Van Dyke, et al. Expires January 8, 2004 [Page 1]
Internet-Draft MSCML July 2003
Table of Contents
1. Introduction
2. MSCML Approach
3. Use of SIP Request Methods
4. MSCML Usage and Design
5. Advanced Conferencing
6. Interactive Voice Response (IVR)
6.1 Play Audio <play>
6.2 Collect Digits <playcollect>
6.3 Recording Audio <playrecord>
6.4 Stop Request <stop>
6.5 Prompt Block <prompt>
6.6 Recording Fax <faxrecord>
6.7 Sending Fax <faxplay>
7. Response Attributes and Return Codes
7.1 SIP
7.2 HTTP
7.3 <response> Attributes
8. Formal Syntax
8.1 MSCML DTD
8.2 Schema
9. IANA Considerations
9.1 IANA Registration of MIME media type application/mediaservercontrol+xml
10. Security Considerations
Normative References
Informative References
Authors' Addresses
A. Contributors
B. Acknowledgements
Intellectual Property and Copyright Statements
1. Introduction
This document describes the Media Server Control Markup Language
(MSCML), which defines payloads that one can send with a standard SIP
INVITE to a media server. Basic Network Media Services with SIP [4]
describes media server SIP URI formats.
Prior to MSCML, there was no standard way to deliver SIP-based
enhanced conferencing. Basic SIP constructs, such as those described
in Basic Network Media Services with SIP [4], serve simple n-way
conferencing well. The SIP URI provides a natural mechanism for
identifying a specific SIP conference, while the INVITE and BYE
methods elegantly implement conference join and leave semantics.
However, enhanced conferencing applications also require features
such as sizing and resizing, in-conference IVR operations (e.g.,
recording and playing participant names to the full conference), and
conference event reporting. MSCML payloads within standard SIP
methods realize these features.
The structure and approach of MSCML satisfy the requirements set out
in conferencing-framework [5] and cc-framework [6]. In particular,
MSCML serves as the interface between the conference factory and a
centralized conference mixer. In this case, a media server has the
role of the conference mixer.
There are two broad classes of MSCML functionality. The first class
includes primitives for advanced conferencing such as conference
configuration, participant leg manipulation and conference event
reporting. The second class comprises primitives for interactive
voice response (IVR). These include playing audio, collecting
digits, and recording audio.
The IVR features of MSCML originally evolved simply as an adjunct for
conferencing. In many scenarios it was impractical or inconvenient
to establish a dialog with a distinct IVR resource and then re-join
the conference. Nevertheless, MSCML also works well for simple IVR,
such as prompt-and-collect for SIP Proxy Servers or Media Gateway
Controllers. For complex IVR, on the other hand, it may be more
appropriate to employ a full IVR markup language such as VoiceXML
[7].
In general, a media server offers services to SIP UACs such as
application servers, feature servers, and media gateway controllers.
See the ISC Reference Architecture [8] for definitions of these
terms. It is unlikely, but not prohibited, for end-user SIP UACs to
have a direct signaling relationship with a media server.
This document describes a working framework and protocol with which
there is considerable implementation experience. Application
developers and service providers have created several MSCML-based
services since the initial version was made available more than a
year ago. This experience is highly relevant to the ongoing work of
the IETF, particularly the SIP, SIPPING, and MMUSIC working groups.
2. MSCML Approach
It is critically important to emphasize that the goal of MSCML is to
provide a development environment that follows the SIP, HTTP, and XML
development paradigm. That is, the mixing resource is a server that
operates on application-level constructs such as call participants.
Some developers may desire low-level control over DSP resources.
Examples of such control include path establishment between DSP
blocks such as tone detectors, tone generators, or other speech
resources. For such users, we STRONGLY suggest using a protocol such
as H.248.1 [9]. Such control does not fit the SIP model. It may be
possible to transport such low-level instructions in SIP; however,
the programming model would then move from the client-server peer
paradigm of SIP to the master-slave controller model of H.248.1.
The MSCML paradigm is important to the developer community, in that
developers and operators conceptually write applications about calls,
conferences, and call legs. The H.248.1 paradigm is conceptually
about resources and plumbing: a whole level of implementation detail
that, for the majority of developers, adds no value.
3. Use of SIP Request Methods
As mentioned above, MSCML payloads may be carried in either SIP
INVITE or INFO requests. The initial INVITE that creates an enhanced
conference MUST include an MSCML payload. The initial INVITE that
joins a participant leg to an enhanced conference MAY include an
MSCML payload. All mid-call MSCML payloads are sent via SIP INFO
requests.
MSCML responses are transported in the final response to the SIP
INVITE containing the matching MSCML request or in a SIP INFO
message. The only allowable final response to a SIP INFO containing
a message body is a 200 OK, per RFC2976 [10]. Therefore, when the
MSCML request is sent via SIP INFO, the MSCML response is carried in
a separate INFO request. In general, these responses are
asynchronous in nature and require a separate transaction due to
timing considerations.
There has been considerable debate on the use of the SIP INFO method
for any purpose. Our experience is that MSCML would not have been
possible without it. When MSCML was implemented, the first SIP Event
Notification draft had just been published. At that time, use of
SUBSCRIBE/NOTIFY within an existing dialog was undefined. This
prevented its use in MSCML, since all events occurred in an
INVITE-established dialog. Moreover, while SUBSCRIBE/NOTIFY was well
suited for reporting conference events, its semantics seemed
inappropriate for modifying a participant leg or conference setting,
where the only "event" was the success or failure of the request.
Lastly, since SIP INFO was an established RFC, it was well supported
in all the SIP stack implementations available at that time. As a
result, we had few, if any, interoperability issues.
As it turns out, using NOTIFY is not appropriate, as the NOTIFY would
be in response to an implicit subscription. The issues of implicit
subscription have been discussed on the SIP and SIPPING lists.
Using SUBSCRIBE is not appropriate for two reasons. The first is
semantic. The purpose of SUBSCRIBE is to register interest in User
Agent state. However, using SUBSCRIBE for MSCML results in the
SUBSCRIBE modifying the User Agent state. The second reason
SUBSCRIBE is not appropriate is that MSCML is inherently call-based.
The association of a SIP dialog with a call leg means MSCML can be
incredibly straightforward. For example, if one used SUBSCRIBE or
another SIP method to send commands about some context, one would
have to identify that context somehow. Relating commands to the SIP
dialog they arrive on defines the context for free. Moreover, it is
conceptually easy for the developer.
We have considered the MESSAGE method, as used in, for example, KPML
[11]. MESSAGE is appropriate for KPML as there is usually only a
single response to a given KPML document. However, for MSCML, there
can be multiple responses to a given request. Also, mid-call
requests can go in both directions, which is not the case for KPML.
Because of the multiple response and peer mid-call request nature of
MSCML, we also considered MSRP [12]. MSRP may be the appropriate
technology. The main benefit of MSRP is that only proxies interested
in seeing MSCML signaling see the MSCML messages. This is in
contrast to the current scheme, where the interested proxies, as well
as any other proxies that happen to record-route, see the MSCML
messages. The trade-off here is that many of the interested proxies
are border proxies. In the interest of interoperability, we chose to
continue using INFO.
SIP continues to progress incredibly quickly, and we will continually
reevaluate some of the decisions that resulted in the original design
of MSCML. However, we can confidently say that the availability of a
widely supported, flexible request method was very important to the
development and adoption of MSCML.
4. MSCML Usage and Design
To avoid undue complexity, two rules were established regarding MSCML
usage. The first is that only one MSCML body may be present in a SIP
request. The second is that each MSCML body may contain only one
request or response. These rules greatly simplify transaction
management. MSCML syntax does provide for the unique identification
of multiple requests in a single body part, but this is not currently
allowed.
Per the guidelines of RFC3470 [13], MSCML bodies MUST be well-formed
and valid.
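The one-request-or-response rule above can be checked mechanically by
a receiver. The following Python sketch is illustrative only:
check_single_mscml_message is a hypothetical helper, and the
standard-library parser it uses verifies well-formedness but not
validity against the MSCML DTD or schema.

```python
import xml.etree.ElementTree as ET

def check_single_mscml_message(body: str) -> str:
    """Return the tag of the single message in an MSCML body.

    Raises ValueError if the body is not well-formed or carries more than
    one request or response. This checks well-formedness only; it does
    not validate against the MSCML DTD or schema.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed: {exc}") from exc
    if root.tag != "MediaServerControl":
        raise ValueError(f"unexpected root element <{root.tag}>")
    messages = list(root)
    if len(messages) != 1:
        raise ValueError(f"expected exactly one message, found {len(messages)}")
    message = messages[0]
    # A <request> wraps the actual command; only one command is allowed.
    if message.tag == "request" and len(list(message)) != 1:
        raise ValueError("a <request> must contain exactly one command")
    return message.tag
```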
5. Advanced Conferencing
The advanced conferencing model is a star controller model, with both
signaling and media directed to a central location. Figure 1 depicts
a typical signaling relationship between end users' UACs, a
conference application server, and a media server.
+-------+
| UAC 1 |----+
+-------+    |
             |     Public URI     +-------------+
+-------+    +------------------->| Application |
| UAC 2 |----+                    |   Server    |
+-------+    |                    +-------------+
    .        |                           |
    :        |                           | Private URI
+-------+    |                           |
| UAC n |----+                    +--------------+
+-------+                         | Media Server |
                                  +--------------+
Not shown: RTP flows directly between UACs and Media Server
Figure 1: Conference Model
Each UAC sends an INVITE to a Public Conference URI. Presumably the
Application Server publishes this URI, or it is an ad hoc URI. In
any event, the Application Server generates a Private URI, following
the rules specified by Basic Network Media Services with SIP [4].
That is, the URI is of the form:
sip:conf=UniqueID@ms.example.net
Where UniqueID is a unique conference identifier, and ms.example.net
is the host name or IP address of the media server. There is nothing
to prevent the UACs from contacting the media server directly.
However, one would expect the owner of the media server to restrict
who can use media server resources.
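As an illustration of the URI convention above, the following Python
sketch builds a Private Conference URI. The helper name, the use of a
random hexadecimal UniqueID, and the restriction to alphanumeric
identifiers are assumptions; an application server is free to choose
identifiers any way it likes, provided they are unique.

```python
import secrets
from typing import Optional

def private_conference_uri(ms_host: str, unique_id: Optional[str] = None) -> str:
    """Build a Private Conference URI of the form sip:conf=UniqueID@host.

    Hypothetical helper: if no identifier is supplied, a random hex string
    is generated.
    """
    if unique_id is None:
        unique_id = secrets.token_hex(8)  # 16 hex characters
    if not unique_id.isalnum():
        # Simplification: avoid characters that would need escaping in a SIP URI.
        raise ValueError("UniqueID must be alphanumeric in this sketch")
    return f"sip:conf={unique_id}@{ms_host}"
```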
As with basic conferencing, described by Basic Network Media Services
with SIP [4], the first INVITE to the media server with a given
UniqueID creates a conference. However, in advanced conferencing, the
first INVITE includes an MSCML configure_conference payload. The
MSCML payload conveys extended session parameters (e.g., number of
participants) that are not readily expressed in SDP but must be known
to allocate the appropriate resources.
The first dialog established for an enhanced conference has several
useful properties and is referred to as the "Conference Control Leg."
The control leg is used for play or record audio operations to or
from the entire conference, and no RTP is expected on the Conference
Control Leg. Therefore, the application must send either no SDP or
hold SDP (c=0.0.0.0) in the initial INVITE request. In addition, the
lifetime of the conference is the same as that of its control leg.
This ensures that the conference remains in existence even if one or
more participant legs unintentionally leave the conference.
The <configure_conference> tag has two attributes that control the
resources the media server sets aside for the conference:
reservedtalkers and reserveconfmedia. The reservedtalkers attribute
sets the maximum number of talker legs. The reserveconfmedia
attribute, if set to "yes", allocates resources for playing or
recording audio to or from the entire conference. The default for
reserveconfmedia is "yes".
The application server can include any MSCML command in the initial
INVITE, with the exception of asynchronous commands, such as <play>
or <playrecord>. The application server must issue asynchronous
commands separately (e.g., in INFO messages) to avoid ambiguous
responses.
For example, to create a conference with up to 120 active talkers and
the ability to play audio into the conference or record part or all
of the conference, the application server sets reservedtalkers to 120
and relies on the default value of reserveconfmedia, as shown in
Figure 3.
<?xml version="1.0" encoding="utf-8"?>
<MediaServerControl version="1.0">
<request>
<configure_conference reservedtalkers="120"/>
</request>
</MediaServerControl>
Figure 3: 120 Speaker MSCML Example
Figure 4 shows a conference with up to five active speakers without
the capability to play or record audio into the conference.
<?xml version="1.0" encoding="utf-8"?>
<MediaServerControl version="1.0">
<request>
<configure_conference reservedtalkers="5"
reserveconfmedia="no"/>
</request>
</MediaServerControl>
Figure 4: 5 Speaker MSCML Example
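The payloads in Figures 3 and 4 can also be generated
programmatically. This Python sketch uses the standard
xml.etree.ElementTree module; the function name
configure_conference_body is hypothetical, and omitting
reserveconfmedia relies on the default described above.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def configure_conference_body(reservedtalkers: int,
                              reserveconfmedia: Optional[bool] = None) -> str:
    """Build an MSCML <configure_conference> request body (sketch)."""
    root = ET.Element("MediaServerControl", version="1.0")
    request = ET.SubElement(root, "request")
    attrs = {"reservedtalkers": str(reservedtalkers)}
    # Leave reserveconfmedia out entirely to get the default ("yes").
    if reserveconfmedia is not None:
        attrs["reserveconfmedia"] = "yes" if reserveconfmedia else "no"
    ET.SubElement(request, "configure_conference", attrs)
    return ('<?xml version="1.0" encoding="utf-8"?>\n'
            + ET.tostring(root, encoding="unicode"))
```

Calling configure_conference_body(120) yields the request of Figure 3,
and configure_conference_body(5, reserveconfmedia=False) that of
Figure 4, apart from whitespace.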
Once the application server has created the Conference Control Leg,
it can join participants to the conference. The application server
directs the INVITE to the Private Conference URI described above. In
the example given, this would be
sip:conf=UniqueID@ms.example.net
Conference legs have a number of parameters the application server
can modify. The defaults are in Figure 5. The following sections
will discuss the meaning of the parameters in detail.
Parameter   Default  Description
----------  -------  ---------------------------------------------
inputgain   auto     Use AGC to determine input gain for leg
outputgain  auto     Use AGC to determine output gain for leg
type        talker   Consider this leg's audio for mixing in the
                     output mix
dtmfclamp   yes      Remove detected DTMF digit from audio
toneclamp   yes      Remove loud single-frequency tone from audio
Figure 5: Conference Leg Parameters
If the default parameters are acceptable for the leg the application
server wishes to enter into the conference, then a normal SIP INVITE
is sufficient. However, if the application server wishes to modify
one or more of the parameters, the application server can include an
MSCML body in addition to the SDP body.
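When an INVITE carries both SDP and MSCML, the two bodies are combined
into one message body. The Python sketch below assumes a
multipart/mixed body with one application/sdp part and one
application/mediaservercontrol+xml part; the helper name and the
fixed default boundary string are illustrative assumptions, and a
real SIP stack would normally generate the boundary itself.

```python
from typing import Tuple

MSCML_TYPE = "application/mediaservercontrol+xml"

def multipart_sdp_and_mscml(sdp: str, mscml: str,
                            boundary: str = "mscml-boundary-1") -> Tuple[str, str]:
    """Return (Content-Type header value, body) for an INVITE with SDP and MSCML."""
    for part in (sdp, mscml):
        if "--" + boundary in part:
            raise ValueError("boundary string appears inside a body part")
    lines = []
    for content_type, part in (("application/sdp", sdp), (MSCML_TYPE, mscml)):
        lines.append("--" + boundary)              # delimiter line
        lines.append("Content-Type: " + content_type)
        lines.append("")                           # blank line ends part headers
        lines.append(part.rstrip("\r\n"))
    lines.append("--" + boundary + "--")           # close delimiter
    body = "\r\n".join(lines) + "\r\n"
    return "multipart/mixed;boundary=" + boundary, body
```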
The application server can modify the conference leg parameters by
issuing a SIP INFO on the selected dialog representing the conference
leg. Of course, the application server cannot modify SDP in an INFO
message.
To remove a leg from the conference, the application server issues a
SIP BYE request on the selected dialog representing the conference
leg.
The application server can terminate all legs in a conference by
issuing a SIP BYE request on the Conference Control Leg. If one or
more participants are still in the conference when the media server
receives a SIP BYE request on the Conference Control Leg, the media
server issues SIP BYE requests on all of the remaining conference
legs to ensure cleanup of the legs.
The media server returns a 200 OK to the SIP BYE request as it sends
BYE requests to the other legs. This is because we cannot issue a
provisional response to a non-INVITE request, yet the teardown of the
other legs may "take a while".
Once the conference has begun, the application server can manipulate
the conference as a whole by issuing commands on the Conference
Control Leg. For example, the application server can request the
media server to record the conference, play a prompt to the
conference, change the
input or output gain for the conference as a whole, and report on
events. The elements for these commands are <playrecord>, <play>,
<inputgain>, <outputgain>, and <subscribe>, respectively.
Figure 6 shows two sample commands. The first plays a prompt into
the conference. The second records the entire conference to the URI
specified by recurl over NFS.
<?xml version="1.0" encoding="utf-8"?>
<MediaServerControl version="1.0">
<request>
<play
prompturl="http://prompts.example.net/us_EN/welcome.au"/>
</request>
</MediaServerControl>
<?xml version="1.0" encoding="utf-8"?>
<MediaServerControl version="1.0">
<request>
<playrecord
recurl="file://archive.example.net/conferences/archives/011208.au"
beep="no"
initsilence="-1" endsilence="-1" />
</request>
</MediaServerControl>
Figure 6: Sample Full Conference Audio Commands
The response to this last request will be similar to Figure 7.
<?xml version="1.0" encoding="utf-8"?>
<MediaServerControl version="1.0">
<response request="playrecord" code="200" text="OK"/>
</MediaServerControl>
Figure 7: Sample Change Command Response
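An application server typically inspects the code attribute of such a
response. The following Python sketch extracts the <response>
attributes; the helper names are hypothetical, and treating any 2xx
code as success is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict

def parse_mscml_response(body: str) -> Dict[str, str]:
    """Return the attributes of the <response> element in an MSCML body."""
    response = ET.fromstring(body).find("response")
    if response is None:
        raise ValueError("body does not contain a <response>")
    return dict(response.attrib)

def is_success(attrs: Dict[str, str]) -> bool:
    # Assumption for this sketch: any 2xx code counts as success.
    return attrs.get("code", "").startswith("2")
```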
Subsequent event reports arrive in SIP INFO messages. Figure 8 shows
an example report.
<?xml version="1.0" encoding="utf-8"?>
<MediaServerControl version="1.0">
<notification>
<conference uniqueID="ab34h76z" numtalkers="16"
numlisteners="1382">
<activetalkers>
<talker callID="myhost4sn123"/>
<talker callID="myhost2sn456"/>
<talker callID="myhost12sn78"/>
</activetalkers>
</conference>
</notification>
</MediaServerControl>
Figure 8: Active Talker Event Example
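An application server that tracks active talkers could extract the
fields of Figure 8 as in this Python sketch. The ActiveTalkers record
and the parse function are hypothetical; only the element and
attribute names come from the example above.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional

class ActiveTalkers(NamedTuple):
    unique_id: str
    numtalkers: int
    numlisteners: int
    talker_call_ids: List[Optional[str]]

def parse_active_talkers(body: str) -> ActiveTalkers:
    """Extract conference counters and active talker call IDs (sketch)."""
    conf = ET.fromstring(body).find("notification/conference")
    if conf is None:
        raise ValueError("not a conference notification")
    call_ids = [t.get("callID") for t in conf.findall("activetalkers/talker")]
    return ActiveTalkers(conf.get("uniqueID", ""),
                         int(conf.get("numtalkers", "0")),
                         int(conf.get("numlisteners", "0")),
                         call_ids)
```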
An application server can modify a leg by issuing an INFO on the
dialog associated with the participant leg. For example, Figure 9
mutes a conference leg.
<?xml version="1.0" encoding="utf-8"?>
<MediaServerControl version="1.0">
<request>
<configure_leg mixmode="mute"/>
</request>
</MediaServerControl>
Figure 9: Sample Change Leg Command
In Figure 6 we saw a request to play a prompt to the entire
conference. We can also request to play a prompt to an individual
call leg. If we want to play a prompt or collect digits only on a
single leg, we issue the commands within the dialog of the desired
conference participant.
6. Interactive Voice Response (IVR)
In the IVR model, the Media Server acts as a media processing proxy
for the UAC. This is particularly useful when the UAC is a media
gateway or other device with limited media processing capability.
                    SIP            +-------------+
                Service URI        | Application |
        /--------------------------|   Server    |
       /  (e.g., RFC3087)          +-------------+
      /                                   |  SIP
     /                                    |  MSCML
    /                                     |  Session
   /                               +--------------+
+-----+             RTP            |              |
| UAC |============================| Media Server |
+-----+                            |              |
                                   +--------------+
Figure 10: IVR Model
The IVR service supports basic Interactive Voice Response functions,
playing announcements, collecting DTMF digits, and recording audio,
based on Media Server Control Markup Language (MSCML) directives
added to the message body of a SIP request. Figure 10 shows the
signaling relationship between a client UAC, an Application Server,
and a Media Server.
Multifunction media servers SHOULD use the URI conventions described
in Basic Network Media Services with SIP [4]. For review, the IVR
service indicator is "ivr":
sip:ivr@ms.example.net
One may carry the request payload for IVR in either the initial SIP
INVITE or INFO requests.
Mid-call requests must use the INFO method. The INFO method reduces
certain timing issues that occur with re-INVITEs and also requires
less processing on both the application server and the Media Server.
The Media Server notifies the application that the command has
completed through a <response> message containing final status
information and data such as collected DTMF digits.
The media server does not queue IVR requests. If the media server
receives a request while another is in progress, the media server
stops the first operation and carries out the new request. The Media
Server generates a <response> message for the first request and
returns any data collected up to that point. If an application wishes
to stop a request in progress but does not wish to initiate another
operation, it issues a <stop> request. This also causes the Media
Server to generate a <response> message.
The Media Server treats a SIP re-INVITE with hold media (c=0.0.0.0)
as an implicit <stop> request. The media server immediately
terminates the running <play>, <playcollect> or <playrecord> request,
and sends a <response>, indicating "reason=stopped".
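The no-queueing rule and the implicit <stop> on hold can be
summarized with a small state model. The following Python sketch is
hypothetical: the class and method names are illustrative, and the
reason reported for a preempted request is assumed to be "stopped",
which the text above states only for the re-INVITE case.

```python
from typing import Dict, List, Optional

class IvrLegSketch:
    """Hypothetical model of one IVR leg: requests are never queued."""

    def __init__(self) -> None:
        self.current: Optional[str] = None          # id of the running request
        self.digits = ""                            # digits collected for it so far
        self.responses: List[Dict[str, str]] = []   # <response> messages "sent"

    def _finish(self, reason: str) -> None:
        """Emit a response for the running request, if any, and clear it."""
        if self.current is not None:
            self.responses.append(
                {"id": self.current, "reason": reason, "digits": self.digits})
        self.current, self.digits = None, ""

    def start(self, request_id: str) -> None:
        """A new <play>, <playcollect>, or <playrecord> preempts the running one."""
        self._finish("stopped")  # assumption: preempted request reports "stopped"
        self.current = request_id

    def dtmf(self, digit: str) -> None:
        if self.current is not None:
            self.digits += digit

    def hold_reinvite(self) -> None:
        """A re-INVITE with hold SDP (c=0.0.0.0) acts as an implicit <stop>."""
        self._finish("stopped")
```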
6.1 Play Audio <play>
The application issues a <play> request to play an announcement
without interruption and with no digit collection. One use, for
example, is to announce the name of a new participant to the entire
conference.
The application specifies the announcement to play in the prompt
block in the body of the request.
Attributes include promptencoding (optional), which explicitly
specifies the encoding (mu-law or A-law), and id (also optional). The
id attribute is an application-defined request identifier that
correlates the asynchronous response with its original request; the
Media Server echoes it back to the application in its response.
When the announcement has finished playing, the Media Server sends a
<response> payload to the application in a SIP INFO message.
The response may carry the id, the status code (e.g., 200), the
status text (e.g., OK), and the reason (EOF or stopped).
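Because responses are asynchronous, an application usually keeps a
table of outstanding ids. This Python sketch shows one way to do
that; the RequestTracker class, the id format, and the placement of
id and reason as attributes of <response> are assumptions for
illustration.

```python
import itertools
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

class RequestTracker:
    """Correlates asynchronous MSCML responses with requests via their id."""

    def __init__(self, prefix: str = "req") -> None:
        self._counter = itertools.count(1)
        self._prefix = prefix
        self.pending: Dict[str, str] = {}   # id -> request kind, e.g. "play"

    def new_id(self, kind: str) -> str:
        request_id = f"{self._prefix}{next(self._counter)}"
        self.pending[request_id] = kind
        return request_id

    def on_response(self, body: str) -> Tuple[str, Dict[str, str]]:
        """Return (request kind, response attributes) and forget the id."""
        response = ET.fromstring(body).find("response")
        if response is None:
            raise ValueError("no <response> element")
        request_id = response.get("id")
        if request_id not in self.pending:
            raise KeyError(f"unknown or missing id {request_id!r}")
        return self.pending.pop(request_id), dict(response.attrib)
```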
6.2 Collect Digits <playcollect>
The application issues a <playcollect> request to optionally play an
announcement and then collect digits.
This request has multiple attributes, all of which are optional.
The presence or absence of the prompt block determines whether the
request plays an announcement before collecting digits or performs
digit collection only.
Whenever the media server receives a <playcollect> request, it will
continuously buffer and examine collected digits. The media server
compares previously buffered digits to the returnkey, escapekey, and
maxdigits attributes to determine if any immediate action is
required. This provides the type-ahead behavior for menu traversal
and other types of IVR interactions.
The application may override type-ahead behavior by setting the
cleardigits parameter to "yes", which removes all previously buffered
digits such that the only user input considered is what occurs after
the request.
If cleardigits is set to "no", previously buffered digits result in
the prompt being barged immediately: prompt play never begins, and
digit collection starts immediately.
The default for barge is "yes". If the barge attribute is set to
"no", the cleardigits attribute implicitly has a value of "yes".
This ensures that DTMF input occurring before the current collection
is not left in the buffer after the request completes.
The application can set two special digits to invoke special
processing when detected:
o The escapekey, which defaults to *, indicates that the user
intends to terminate the current operation without saving any
input collected to that point. Detection terminates the request
immediately and generates a response.
o The returnkey, which defaults to #, indicates the user has
completed input and wants to return all collected digits to the
application. When the media server detects the returnkey, it
immediately terminates collection and returns the collected digits
to the application in the <response> message.
Several timer attributes control how long the Media Server waits for
digits in the input sequence. All timer settings are in
milliseconds.
firstdigittimer controls how long the Media Server waits for the
initial DTMF input before terminating collection.
interdigittimer controls how long the Media Server waits between DTMF
inputs.
extradigittimer controls how long the Media Server waits for
additional user input after the specified number of digits has
been collected.
The extradigittimer setting enables the "returnkey" input to be
associated with the current collection. For example, if maxdigits is
set to 3 and returnkey is set to #, the user may enter either "x#",
"xx#" or "xxx#", where x represents a DTMF digit.
If the "returnkey" pattern is detected during the "extradigit"
interval, the collected digits are returned to the application and
the "returnkey" is removed from the digit buffer.
If this were not the case, the example would return "xxx" to the
application and leave the terminating "#" in the digit buffer to be
processed by the next <playcollect> request. This might result in
the termination of the following prompt; clearly not what the user
intended.
The extradigittimer has no effect unless returnkey has been set.
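The interaction of maxdigits, returnkey, escapekey, and the
extradigittimer can be made concrete with a timer-free simulation. In
this Python sketch, an input string stands for digits that arrive
before any timer expires, and the end of the string stands for a
timer expiring. The function name is hypothetical, cleardigits and
barge are not modeled, and both the reason "match" after the
extradigit interval and the empty digit string on escapekey are
assumptions.

```python
from typing import Optional, Tuple

def simulate_playcollect(keys: str, maxdigits: int,
                         returnkey: Optional[str] = "#",
                         escapekey: Optional[str] = "*") -> Tuple[str, str, str]:
    """Return (reason, collected digits, digits left in the buffer)."""
    collected = ""
    i = 0
    while i < len(keys):
        key = keys[i]
        i += 1
        if escapekey and key == escapekey:
            # Assumption: input collected so far is discarded.
            return "escapekey", "", keys[i:]
        if returnkey and key == returnkey:
            return "returnkey", collected, keys[i:]
        collected += key
        if len(collected) == maxdigits:
            # Extradigit interval: only meaningful when returnkey is set.
            if returnkey and i < len(keys) and keys[i] == returnkey:
                i += 1  # consume it so it cannot barge the next prompt
            return "match", collected, keys[i:]
    return "timeout", collected, ""
```

For example, with maxdigits set to 3, the input "123#9" returns "123"
and leaves only "9" buffered, whereas with returnkey unset the input
"123#" leaves "#" behind for the next <playcollect> request.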
When the <playcollect> request has completed, the Media Server sends
a <response> payload to the application in a SIP INFO message.
The response may carry the id, the code (e.g., 200), the text (e.g.,
OK), the reason (match, timeout, returnkey, escapekey, or stopped),
and the collected digits.
6.3 Recording Audio <playrecord>
The <playrecord> request directs the Media Server to capture the RTP
it receives and deliver it to a URL specified by the controlling
application.
This tag has multiple attributes. The required recurl attribute
identifies the URL target for the recorded audio. All other
attributes are optional.
The presence or absence of the prompt block controls whether or not a
prompt plays before recording begins.
When the application requests the media server to prompt the caller
before recording audio, <playrecord> has two stages. The first is
equivalent to a <playcollect> operation. The application may set the
prompt phase to be interruptible by DTMF input (barge) and may also
specify an escape key that will terminate the <playrecord> request
before the recording phase begins.
Detection of the escape key generates a response message, and the
operation returns immediately. If any other keys are pressed and if
the prompt has been set as interruptible (barge="yes"), then the play
stops immediately and the recording phase begins.
Any digits collected in the prompt phase, with the exception of the
recstopmask, are buffered and returned in the response.
If the request proceeds to the recording phase, any digits from the
collect phase are discarded from the buffer to eliminate unintended
termination of the recording.
The media server compares digits detected during the recording phase
to the digits specified in the recstopmask to determine if they
indicate a recording termination request.
The media server ignores digits not present in the recstopmask and
passes them into the recording. If the recording is terminated
because of a DTMF input, the collected digits are returned to the
application in the <response>.
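The recstopmask comparison can be sketched as a simple filter. This
Python example assumes, for illustration only, that the mask is a
plain string of DTMF characters; the real attribute syntax is defined
by the MSCML DTD and schema, and the function name is hypothetical.

```python
from typing import Optional, Tuple

def apply_recstopmask(dtmf: str, recstopmask: str) -> Tuple[Optional[str], str]:
    """Return (terminating digit or None, digits passed into the recording)."""
    passed = ""
    for digit in dtmf:
        if digit in recstopmask:
            return digit, passed   # recording stops; digit is reported in <response>
        passed += digit            # not in the mask: recorded as ordinary audio
    return None, passed
```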
Once recording has begun, the media server writes the audio to the
specified recurl URL no matter what DTMF events are detected. It is
the responsibility of the application to examine the DTMF input
returned in the <response> message to determine whether the audio
file should be saved or if it should be deleted and potentially
re-recorded.
Two attributes control how long the Media Server waits for the start
of speech to begin the recording and the absence of speech to end the
recording:
initsilence determines how long to wait for initial speech input
before terminating (canceling) the recording. This parameter may
take an integer value in milliseconds, or may be set to -1, which
directs the Media Server to wait indefinitely. The default is 3000
ms (3 seconds).
endsilence determines how long the Media Server waits after speech
has ended to stop the recording. This parameter may take an
integer value in milliseconds, or may be set to -1. With a value
of -1, the recording will continue indefinitely after speech has
ended and may terminate due to a DTMF keypress or because the
maximum desired duration has been reached. The default value is
4000 ms (4 seconds).
If the endsilence timer expires, the Media Server trims the end of
the recorded audio by an amount equal to the endsilence parameter.
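To illustrate how initsilence, endsilence, and the trimming rule
combine, the following Python sketch predicts why a recording ends
and how much audio is kept, for a caller who speaks in a single
burst. The function is hypothetical, and it assumes that duration
acts as a maximum recording length, consistent with the max_duration
reason code.

```python
from typing import Optional, Tuple

def recording_outcome(speech_start: Optional[int], speech_end: Optional[int],
                      initsilence: int = 3000, endsilence: int = 4000,
                      duration: Optional[int] = None) -> Optional[Tuple[str, int]]:
    """Predict (reason, kept_ms) for one burst of speech (times in ms).

    speech_start=None means the caller never speaks. Returns None when no
    timer ends the recording (it would then end on DTMF or <stop>).
    """
    candidates = []
    if initsilence != -1 and (speech_start is None or speech_start > initsilence):
        candidates.append((initsilence, "init_silence"))
    if speech_end is not None and endsilence != -1:
        candidates.append((speech_end + endsilence, "end_silence"))
    if duration is not None:
        candidates.append((duration, "max_duration"))
    if not candidates:
        return None
    stop_ms, reason = min(candidates)          # earliest timer wins
    if reason == "init_silence":
        return reason, 0                       # recording is cancelled
    if reason == "end_silence":
        return reason, stop_ms - endsilence    # trailing silence is trimmed
    return reason, stop_ms
```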
Additional attributes are:
mode         whether the recording will overwrite or append.
recencoding  whether encoding is mu-law or A-law.
duration     time in ms for the entire recording.
beep         whether a beep will signify the start of recording.
When the recording is finished, the media server generates a
<response> message and sends it to the application in a SIP INFO
message. The response contains the id, the code (e.g., 200, 400,
501), the reason (e.g., digit, end_silence, init_silence,
max_duration, escapekey, error, or stopped), collected digits, and
the reclength (size of the recorded file in bytes).
6.4 Stop Request <stop>
The application issues a <stop> request when the objective is to stop
a request in progress and not initiate another operation. This
request generates a <response> message from the Media Server.
The only attribute is id, which is optional.
The application-defined request id correlates the asynchronous
response with its original request and echoes back to the application
in the Media Server's response.
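Because responses arrive asynchronously, an application that issues several requests on the same leg needs to map each echoed id back to its request. One way to do that is sketched below in Python. The class, the "req-N" id format and the choice to reject unknown or repeated ids are hypothetical design decisions, not MSCML requirements.

```python
import itertools
import xml.etree.ElementTree as ET
from typing import Dict


class PendingRequests:
    """Correlate asynchronous <response> messages with their requests."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._pending: Dict[str, str] = {}  # id -> request element name

    def new_id(self, request_name: str) -> str:
        """Allocate an id to place in the outgoing request's id attribute."""
        req_id = f"req-{next(self._counter)}"
        self._pending[req_id] = request_name
        return req_id

    def on_response(self, body: str) -> str:
        """Return the name of the request a <response> answers."""
        resp = ET.fromstring(body).find("response")
        if resp is None:
            raise ValueError("body carries no <response>")
        req_id = resp.get("id")
        if req_id not in self._pending:
            raise KeyError(f"unknown or missing id: {req_id!r}")
        return self._pending.pop(req_id)


tracker = PendingRequests()
play_id = tracker.new_id("play")
body = ('<MediaServerControl version="1.0"><response request="play" '
        f'id="{play_id}" code="200" text="OK" reason="stopped"/>'
        '</MediaServerControl>')
print(tracker.on_response(body))  # play
```

Removing the id once its response arrives means a duplicated response is reported instead of being silently matched twice.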
The response may carry the id, the code (e.g., 200), and the text
(e.g., OK).
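The next paragraph describes a re-INVITE with hold media (c=0.0.0.0) acting as an implicit <stop>. Recognizing that form of hold in an SDP body can be sketched as follows. This Python sketch deliberately checks only the c=0.0.0.0 convention named in this document. It does not cover hold signalled with a=sendonly or a=inactive as described in RFC 3264. The function name is hypothetical.

```python
def is_hold_sdp(sdp: str) -> bool:
    """True if the SDP has connection lines and all use 0.0.0.0."""
    addresses = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            fields = line[2:].split()
            if len(fields) == 3:
                # Drop any /ttl or /count suffix (multicast forms).
                addresses.append(fields[2].split("/")[0])
    return bool(addresses) and all(a == "0.0.0.0" for a in addresses)


hold = ("v=0\r\no=- 1 2 IN IP4 192.0.2.10\r\ns=-\r\n"
        "c=IN IP4 0.0.0.0\r\nt=0 0\r\nm=audio 49170 RTP/AVP 0\r\n")
print(is_hold_sdp(hold))                                   # True
print(is_hold_sdp(hold.replace("0.0.0.0", "192.0.2.10")))  # False
```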
Note that the Media Server treats a SIP re-INVITE with hold media
(c=0.0.0.0) as an implicit <stop> request. The media server
immediately terminates the running <play>, <playcollect> or
<playrecord> request, and sends a <response>, indicating
"reason=stopped".
6.5 Prompt Block <prompt>
This block in the body of the <play>, <playcollect>, or <playrecord>
request contains one or more references to physical audio files,
provisioned sequences, or variables that are played in the order in
which they appear.
Figure 12 shows a sample prompt block.
<prompt baseurl="file:///opt/snowshore/prompts/conf/">
<audio url="please_enter.wav"/>
<variable type="silence" value="1"/>
<audio url="your.raw" encoding="alaw"/>
<variable type="silence" value="1"/>
<audio url="http://prompts.example.net/pin_number.wav"/>
</prompt>
Figure 12: Prompt Block Example
The baseurl attribute is the base URL prepended to the URL attributes
within the <prompt> block.
Each audio element in a <prompt> block refers to an audio file or
provisioned sequence for the media server to play. The media server
plays audio files in the order in which they are listed in the block.
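The baseurl handling can be sketched in Python as a resolver that lists the URLs the media server would fetch, in play order. The function name and sample are hypothetical. Figure 12 mixes relative names with an absolute http URL, so the sketch assumes that a URL which already carries a scheme is used unchanged and that only relative URLs get baseurl prepended. The sketch only collects <audio> URLs. <variable> elements are rendered by the media server and are skipped.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlsplit


def resolve_prompt_urls(prompt_xml: str) -> List[str]:
    """List the audio URLs of a <prompt> block in play order."""
    prompt = ET.fromstring(prompt_xml)
    base = prompt.get("baseurl", "")
    urls = []
    for audio in prompt.iter("audio"):
        url = audio.get("url", "")
        # Assumption: a URL that already has a scheme is not prefixed.
        urls.append(url if urlsplit(url).scheme else base + url)
    return urls


block = ('<prompt baseurl="file:///opt/snowshore/prompts/conf/">'
         '<audio url="please_enter.wav"/>'
         '<variable type="silence" value="1"/>'
         '<audio url="http://prompts.example.net/pin_number.wav"/>'
         '</prompt>')
for url in resolve_prompt_urls(block):
    print(url)
```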
6.6 Recording Fax <faxrecord>
The <faxrecord> request directs the Media Server to process a fax in
answer mode. A tag separate from <playrecord> is needed because the
Media Server must know to process the T.30 [14] or T.38 [15] fax
protocols.
This tag has multiple attributes. The lclid attribute is a string
that identifies the called station. The lclid attribute is optional;
its default is the empty string.
The <faxrecord> request operates in one of three modes: receive,
poll, and turnaround poll.
In receive mode, the Media Server receives the fax and writes the fax
data to the URI specified by the recurl attribute.
In poll mode, the Media Server sends a fax, but as a polled (called)
device.
In turnaround poll mode, the Media Server will record a fax that the
remote machine sends. If the remote machine requests a transmission,
then the Media Server will send the fax.
The recurl attribute is the URI to record the fax to, if specified.
The prompturl attribute is the URI to fetch the fax to transmit, if
specified.
The rmtid attribute specifies the calling station identifier of the
remote terminal. If specified, the media server MUST reject
transactions with the remote terminal if the remote terminal's
identifier does not match rmtid.
The combination of prompturl and recurl defines the mode, as shown in
Table 1.
+-----------+--------+------------+-----------------------------------+
| prompturl | recurl | Mode       | Operation                         |
+-----------+--------+------------+-----------------------------------+
| no        | no     | Invalid    | Request fails.                    |
|           |        |            |                                   |
| no        | yes    | Receive    | Record fax into recurl.           |
|           |        |            |                                   |
| yes       | no     | Poll       | Send fax from prompturl. If rmtid |
|           |        |            | is specified, it must match the   |
|           |        |            | remote terminal's identifier, or  |
|           |        |            | the request will fail.            |
|           |        |            |                                   |
| yes       | yes    | Turnaround | If the remote terminal wishes to  |
|           |        | Poll       | transmit, the Media Server        |
|           |        |            | records the fax into recurl. If   |
|           |        |            | the remote terminal wishes to     |
|           |        |            | receive, the Media Server sends   |
|           |        |            | the fax from prompturl. If rmtid  |
|           |        |            | is specified, it must match the   |
|           |        |            | remote terminal's identifier, or  |
|           |        |            | the send request will fail. A     |
|           |        |            | receive operation will still      |
|           |        |            | succeed, however.                 |
+-----------+--------+------------+-----------------------------------+
Table 1: Fax Receive Modes
The Media Server MUST flush any quarantined digits when it receives a
<faxrecord> request.
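Table 1 reduces to a small decision function, which the Python below sketches together with the rmtid check. The function names are hypothetical. The sketch treats an empty attribute value the same as an absent one, and it assumes leading or trailing spaces in station identifiers are not significant. Both are assumptions, not statements of MSCML behavior.

```python
from typing import Optional


def faxrecord_mode(prompturl: Optional[str], recurl: Optional[str]) -> str:
    """Map <faxrecord> attributes to a mode following Table 1."""
    if prompturl and recurl:
        return "turnaround poll"
    if recurl:
        return "receive"       # record the fax into recurl
    if prompturl:
        return "poll"          # send from prompturl as the polled device
    raise ValueError("<faxrecord> needs prompturl, recurl, or both")


def rmtid_matches(rmtid: Optional[str], remote_id: str) -> bool:
    """An absent rmtid accepts any peer; otherwise the ids must match.

    Assumption: surrounding spaces in the identifiers are not significant.
    """
    return rmtid is None or rmtid.strip() == remote_id.strip()


print(faxrecord_mode(None, "file:///fax/in.tif"))  # receive
print(rmtid_matches("+1 555 0100", "+1 555 0199"))  # False
```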
6.7 Sending Fax <faxplay>
The <faxplay> request directs the Media Server to process a fax in
originate mode. A tag separate from <play> is needed because the
Media Server must know to process the T.30 [14] or T.38 [15] fax
protocols.
This tag has multiple attributes. The lclid attribute is a string
that identifies the Media Server as the calling station in the DIS
message. The lclid attribute is optional; its default is the empty
string.
The <faxplay> request operates in one of three modes: send, remote
poll, and turnaround poll.
In send mode, the Media Server sends the fax.
In remote poll mode, the Application Server places a call on behalf
of the Media Server. The Media Server requests a fax transmission
from the remote fax terminal.
In turnaround poll mode, the Media Server will record a fax that the
remote machine sends. If the remote machine requests a transmission,
then the Media Server will send the fax.
The recurl attribute is the URI to record the fax to, if specified.
When recurl is specified, the Media Server advertises in the DIS
message that it can receive fax transmissions.
The prompturl attribute is the URI to fetch the fax to transmit, if
specified. When prompturl is specified, the Media Server advertises
in the DIS message that it can send fax transmissions.
The rmtid attribute specifies the calling station identifier of the
remote terminal. If specified, the media server MUST reject
transactions with the remote terminal if the remote terminal's
identifier does not match rmtid.
The combination of prompturl and recurl defines the mode, as shown in
Table 2.
+-----------+--------+------------+-----------------------------------+
| prompturl | recurl | Mode       | Operation                         |
+-----------+--------+------------+-----------------------------------+
| no        | no     | Invalid    | Request fails.                    |
|           |        |            |                                   |
| yes       | no     | Send       | Send fax from prompturl. If rmtid |
|           |        |            | is specified, it must match the   |
|           |        |            | remote terminal's identifier, or  |
|           |        |            | the request will fail.            |
|           |        |            |                                   |
| no        | yes    | Remote     | Record fax into recurl, assuming  |
|           |        | Poll       | the remote terminal indicates in  |
|           |        |            | its DIS message that it can send  |
|           |        |            | a fax. If the remote terminal     |
|           |        |            | does not support reverse polling, |
|           |        |            | the request will fail. If rmtid   |
|           |        |            | is specified, it must match the   |
|           |        |            | remote terminal's identifier, or  |
|           |        |            | the request will fail.            |
|           |        |            |                                   |
| yes       | yes    | Turnaround | If the remote terminal wishes to  |
|           |        | Poll       | transmit, the Media Server        |
|           |        |            | records the fax into recurl. If   |
|           |        |            | the remote terminal wishes to     |
|           |        |            | receive, the Media Server sends   |
|           |        |            | the fax from prompturl. If rmtid  |
|           |        |            | is specified, it must match the   |
|           |        |            | remote terminal's identifier, or  |
|           |        |            | the send request will fail. A     |
|           |        |            | receive operation will still      |
|           |        |            | succeed, however.                 |
+-----------+--------+------------+-----------------------------------+
Table 2: Fax Send Modes
The Media Server MUST flush any quarantined digits when it receives a
<faxplay> request.
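The Python below sketches a <faxplay> request builder that also reports which Table 2 mode the attribute combination selects. The attribute names follow the <faxplay> ATTLIST in Section 8.1, and the <request> wrapping follows the requestType in the informative schema of Section 8.2. The function name, the URL and the rmtid value are hypothetical. Empty or absent attributes are simply omitted, which leaves lclid at its DTD default.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# (prompturl given, recurl given) -> <faxplay> mode, per Table 2.
FAXPLAY_MODES: Dict[Tuple[bool, bool], str] = {
    (True, False): "send",
    (False, True): "remote poll",
    (True, True): "turnaround poll",
}


def build_faxplay(prompturl: Optional[str] = None,
                  recurl: Optional[str] = None,
                  rmtid: Optional[str] = None,
                  lclid: str = "") -> Tuple[str, str]:
    """Return (mode, MSCML text) for a <faxplay> request."""
    mode = FAXPLAY_MODES.get((bool(prompturl), bool(recurl)))
    if mode is None:
        raise ValueError("<faxplay> needs prompturl, recurl, or both")
    attrs: Dict[str, str] = {}
    for name, value in (("lclid", lclid), ("prompturl", prompturl),
                        ("recurl", recurl), ("rmtid", rmtid)):
        if value:                 # omit absent or empty attributes
            attrs[name] = value
    root = ET.Element("MediaServerControl", version="1.0")
    ET.SubElement(ET.SubElement(root, "request"), "faxplay", attrs)
    return mode, ET.tostring(root, encoding="unicode")


mode, text = build_faxplay(recurl="file:///fax/in.tif", rmtid="5551234")
print(mode)  # remote poll
```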
7. Response Attributes and Return Codes
7.1 SIP
The Media Server acknowledges receipt of an application request by
sending a response of either 200 OK or 415 Unsupported Media Type.
The latter is sent when the SIP request contains a content type other
than "application/sdp" or "application/mediaservercontrol+xml".
The <response> message is transported in a SIP INFO request.
If there is an error in the request or the request cannot be
completed, the <response> message is sent very shortly after
receiving the request. If the request is able to proceed, the
<response> contains final status information as listed below.
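The 200/415 decision described above can be sketched with Python's standard email.message module, which parses MIME Content-Type values. The function name and the accepted-type set are illustrative. Comparing types case-insensitively and ignoring parameters such as charset is an assumption based on general MIME practice. The sketch also omits the Accept header that RFC 3261 expects a 415 response to carry, and it does not handle requests without a body.

```python
from email.message import Message
from typing import Tuple

ACCEPTED_TYPES = {"application/sdp", "application/mediaservercontrol+xml"}


def ack_status(content_type: str) -> Tuple[int, str]:
    """Pick the SIP final response for a request body's Content-Type."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() lower-cases the value and drops parameters such
    # as charset, so "Application/SDP" and "...; charset=UTF-8" match.
    if msg.get_content_type() in ACCEPTED_TYPES:
        return 200, "OK"
    return 415, "Unsupported Media Type"


print(ack_status("application/mediaservercontrol+xml; charset=UTF-8"))
# (200, 'OK')
```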
7.2 HTTP
The Media Server processes the request and returns a <response>
message in the body of the HTTP POST. The Media Server treats the
results of the POST as if a new MSCML document were sent in a new
INFO message.
7.3 <response> Attributes
If the request specified an ID, the response echoes the ID.
The "code" is the result code for the request. It can take the
following values.
o 200 indicates command completed.
o 400 for <playrecord>, <faxrecord>, and <faxplay> indicates command
not accepted due to an error. The text attribute describes the
cause of the error.
o 501 for <playrecord>, <faxrecord>, and <faxplay> indicates an
error because the media server does not support the URL type
specified.
The "digits" attribute carries the digits returned for <playcollect>
and <playrecord>: the collected digits, if any.
The "reason" is why the command terminated. For all requests, the
reason "stopped" indicates that a <stop> request, another command, or
a re-INVITE with hold media stopped the request.
For the <play> request, the "EOF" reason means the media server
played out to the end of the file.
For the <playcollect> request, a reason of "match" means a match was
found; "timeout" means no digit was received before the time-out
timer expired; "returnkey" and "escapekey" mean the return key or
escape key terminated the operation, respectively; and "interrupted"
means another request interrupted the <playcollect> request.
For the <playrecord> request, a reason of "digit" means a digit was
detected; "end_silence" means the recording terminated because the
trailing silence timer expired; "init_silence" means that no voice
was detected; "max_duration" means the recording terminated because
the maximum time for recording completed; "escapekey" means the user
entered the escape key in either play or record mode, thus
terminating the recording; or "error", for a general operation
failure.
For the <faxplay> and <faxrecord> requests, a reason of "complete"
means successful completion, even if there were bad lines or minor
negotiation problems, i.e., a DCN was received; "disconnect" means
the session was disconnected; "notfax" means no DIS or DCS was
received on the connection.
The "reclength" is the length of the recording in bytes for a
<playrecord>.
The "text" is the descriptive text associated with the response code.
For the <faxplay> and <faxrecord> requests, the faxcode attribute is
the bitwise OR of the following bit patterns.
+------+--------------------------------------+
| Mask | description |
+------+--------------------------------------+
| 0 | Operation Failed |
| | |
| 1 | Operation Succeeded |
| | |
| 2 | Partial Success |
| | |
| 4 | Image received and placed in recurl |
| | |
| 8 | Image sent from prompturl |
| | |
| 16 | rmtid did not match |
| | |
| 32 | Error reading prompturl |
| | |
| 64 | Error writing recurl |
| | |
| 128 | Negotiation failure on send phase |
| | |
| 256 | Negotiation failure on receive phase |
| | |
| 512 | Reserved |
| | |
| 1024 | Irrecoverable IP packet loss |
| | |
| 2048 | Line errors in received image |
+------+--------------------------------------+
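Because faxcode is a bitwise OR of the masks above, an application can expand it into the individual conditions. The Python below sketches this. It assumes the attribute carries a decimal integer string (the DTD types it only as CDATA), and it ignores the reserved 512 bit. The function name is hypothetical.

```python
from typing import List

# Bit masks from the faxcode table above; 512 is reserved.
FAXCODE_BITS = [
    (1, "Operation Succeeded"),
    (2, "Partial Success"),
    (4, "Image received and placed in recurl"),
    (8, "Image sent from prompturl"),
    (16, "rmtid did not match"),
    (32, "Error reading prompturl"),
    (64, "Error writing recurl"),
    (128, "Negotiation failure on send phase"),
    (256, "Negotiation failure on receive phase"),
    (1024, "Irrecoverable IP packet loss"),
    (2048, "Line errors in received image"),
]


def decode_faxcode(faxcode: str) -> List[str]:
    """Expand a faxcode attribute into the descriptions of its set bits."""
    value = int(faxcode)
    if value == 0:
        return ["Operation Failed"]   # mask 0: no bits set
    return [text for mask, text in FAXCODE_BITS if value & mask]


print(decode_faxcode("2066"))
# ['Partial Success', 'rmtid did not match', 'Line errors in received image']
```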
8. Formal Syntax
The following syntax specification uses the augmented Document Type
Definition (DTD) as described in XML [2].
8.1 MSCML DTD
<?xml version="1.0" encoding="UTF-8"?>
<!-- =========================================================== -->
<!-- MediaServerControl Document Type Description -->
<!-- Copyright (c) 2001-2003 SnowShore Networks, Inc. -->
<!-- All Rights Reserved. -->
<!-- =========================================================== -->
<!ELEMENT MediaServerControl (request | response | notification)>
<!ATTLIST MediaServerControl
version (1.0) #REQUIRED
>
<!ELEMENT request (configure_conference | configure_leg |
play | playcollect | playrecord |
faxrecord | faxplay | stop)>
<!ELEMENT configure_conference (inputgain?, outputgain?, subscribe?)>
<!ATTLIST configure_conference
id CDATA #IMPLIED
reservedtalkers CDATA #IMPLIED
reserveconfmedia (yes | no) #IMPLIED
>
<!-- Tags for gain control -->
<!ELEMENT outputgain (auto | fixed)>
<!ELEMENT inputgain (auto | fixed)>
<!ELEMENT auto EMPTY>
<!ATTLIST auto
startlevel CDATA #IMPLIED
targetlevel CDATA #IMPLIED
silencethreshold CDATA #IMPLIED
>
<!ELEMENT fixed EMPTY>
<!ATTLIST fixed
level CDATA #IMPLIED
>
<!ELEMENT subscribe (events)>
<!ELEMENT events (activetalkers)>
<!ELEMENT activetalkers (talker+)?>
<!ATTLIST activetalkers
report (yes | no) "no"
interval CDATA #IMPLIED
>
<!-- Acceptable values for interval range from 1-60 seconds -->
<!ELEMENT talker EMPTY>
<!ATTLIST talker
callid CDATA #REQUIRED
>
<!-- The list of current talkers is used only when sending -->
<!-- notifications to the calling application. It should never -->
<!-- be set when subscribing. -->
<!ELEMENT configure_leg (inputgain?, outputgain?)>
<!ATTLIST configure_leg
id CDATA #IMPLIED
type (talker | listener) #IMPLIED
mixmode (full | mute | preferred | parked) #IMPLIED
dtmfclamp (yes | no) #IMPLIED
>
<!-- Stops a play or record operation in progress -->
<!ELEMENT stop EMPTY>
<!ATTLIST stop
id CDATA #IMPLIED
>
<!-- Plays an audio prompt, no barge-in or digit collection. -->
<!-- <play/> generates a <response/> message when the specified -->
<!-- prompt has finished playing or if an error occurs. -->
<!ELEMENT play (prompt)?>
<!ATTLIST play
id CDATA #IMPLIED
prompturl CDATA #IMPLIED
promptencoding (ulaw | alaw) #IMPLIED
>
<!-- Plays an audio prompt, collects DTMF digits and returns the -->
<!-- digits to the application. May also be used simply to -->
<!-- collect digits if no sequence is specified. <playcollect/> -->
<!-- sends an asynchronous <response/> message which is normally -->
<!-- generated when the desired digits have been collected or a -->
<!-- timeout has expired. -->
<!ELEMENT playcollect (prompt?, pattern?)>
<!ATTLIST playcollect
id CDATA #IMPLIED
prompturl CDATA #IMPLIED
barge (yes | no) "yes"
promptencoding (ulaw | alaw) #IMPLIED
cleardigits CDATA "yes"
maxdigits CDATA #IMPLIED
firstdigittimer CDATA #IMPLIED
interdigittimer CDATA #IMPLIED
intdigcrittimer CDATA #IMPLIED
extradigittimer CDATA #IMPLIED
returnkey CDATA "#"
escapekey CDATA "*"
>
<!-- <playrecord/> takes the audio from the associated session -->
<!-- and records it to the location and format specified. It -->
<!-- generates a <response/> message if the request is in error, -->
<!-- when the recording session has been interrupted by DTMF, -->
<!-- the specified duration has been exceeded or a timeout has -->
<!-- expired. The request has an optional prompt to be played -->
<!-- prior to the start of recording. -->
<!ELEMENT playrecord (prompt)?>
<!ATTLIST playrecord
id CDATA #IMPLIED
prompturl CDATA #IMPLIED
barge (yes | no) #IMPLIED
cleardigits (yes | no) #IMPLIED
escapekey CDATA "*"
recurl CDATA #REQUIRED
mode (append | overwrite) "overwrite"
recencoding (ulaw | alaw) #IMPLIED
initsilence CDATA #IMPLIED
endsilence CDATA #IMPLIED
duration CDATA #IMPLIED
beep (yes | no) "yes"
recstopmask CDATA "01234567890*#"
>
<!ELEMENT prompt (audio | variable)+>
<!ATTLIST prompt
locale CDATA #IMPLIED
baseurl CDATA #IMPLIED
>
<!ELEMENT audio EMPTY>
<!ATTLIST audio
url CDATA #REQUIRED
encoding (ulaw | alaw) #IMPLIED
>
<!-- The encoding attribute is required for files that are not in-->
<!-- self-describing .au or .wav format and do not have a well -->
<!-- known extension (.ulaw). -->
<!ELEMENT pattern (regex | digitmap)+>
<!ELEMENT regex EMPTY>
<!ATTLIST regex
value CDATA #REQUIRED
name CDATA #IMPLIED
>
<!ELEMENT digitmap EMPTY>
<!ATTLIST digitmap
value CDATA #REQUIRED
name CDATA #IMPLIED
>
<!ELEMENT variable EMPTY>
<!ATTLIST variable
type (date | digit | duration | month | money |
number | silence | string | time | weekday) #REQUIRED
subtype (mdy | dmy | ymd | ndn | t12 | t24 | USD |
gen | crd | ord) #IMPLIED
value CDATA #REQUIRED
>
<!-- Play fax file -->
<!ELEMENT faxplay EMPTY>
<!ATTLIST faxplay
lclid CDATA ""
prompturl CDATA #IMPLIED
recurl CDATA #IMPLIED
rmtid CDATA #IMPLIED
>
<!-- Record fax file -->
<!ELEMENT faxrecord EMPTY>
<!ATTLIST faxrecord
lclid CDATA ""
prompturl CDATA #IMPLIED
recurl CDATA #IMPLIED
rmtid CDATA #IMPLIED
>
<!ELEMENT response EMPTY>
<!ATTLIST response
request (configure_conference | configure_leg |
play | playcollect | playrecord |
faxrecord | faxplay | stop) #REQUIRED
id CDATA #IMPLIED
code CDATA #REQUIRED
text CDATA #REQUIRED
reason CDATA #IMPLIED
reclength CDATA #IMPLIED
patternname CDATA #IMPLIED
digits CDATA #IMPLIED
faxcode CDATA #IMPLIED
pages_sent CDATA #IMPLIED
pages_recv CDATA #IMPLIED
>
<!ELEMENT notification (conference)>
<!ELEMENT conference (activetalkers)>
<!ATTLIST conference
uniqueid CDATA #REQUIRED
numtalkers CDATA #REQUIRED
>
8.2 Schema
This section is informative. The normative definition of the schema
is the DTD described in the previous section, MSCML DTD (Section
8.1).
<?xml version="1.0" encoding="UTF-8"?>
<!-- =========================================================== -->
<!-- MediaServerControl XML Schema -->
<!-- Copyright (c) 2001-2003 SnowShore Networks, Inc. -->
<!-- All Rights Reserved. -->
<!-- =========================================================== -->
<!--W3C Schema generated by XMLSPY v5 rel. 2 U -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xs:element name="MediaServerControl">
<xs:annotation>
<xs:documentation>
Media Server Control Markup Language (MSCML)
</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:choice>
<xs:element name="request" type="requestType"/>
<xs:element name="response" type="responseType"/>
<xs:element name="notification" type="notificationType"/>
</xs:choice>
<xs:attribute name="version" use="required">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="1.0"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
</xs:element>
<xs:complexType name="activetalkersType">
<xs:sequence minOccurs="0">
<xs:element name="talker" type="talkerType"
maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="report" default="no">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="yes"/>
<xs:enumeration value="no"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="interval" type="xs:string"/>
</xs:complexType>
<xs:complexType name="audioType">
<xs:attribute name="url" type="xs:string" use="required"/>
<xs:attribute name="encoding">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="ulaw"/>
<xs:enumeration value="alaw"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
<xs:complexType name="autoType">
<xs:attribute name="startlevel" type="xs:string"/>
<xs:attribute name="targetlevel" type="xs:string"/>
<xs:attribute name="silencethreshold" type="xs:string"/>
</xs:complexType>
<xs:complexType name="conferenceType">
<xs:sequence>
<xs:element name="activetalkers" type="activetalkersType"/>
</xs:sequence>
<xs:attribute name="uniqueid" type="xs:string" use="required"/>
<xs:attribute name="numtalkers" type="xs:string" use="required"/>
</xs:complexType>
<xs:complexType name="configure_conferenceType">
<xs:sequence>
<xs:element name="inputgain" type="inputgainType"
minOccurs="0"/>
<xs:element name="outputgain" type="outputgainType"
minOccurs="0"/>
<xs:element name="subscribe" type="subscribeType"
minOccurs="0"/>
</xs:sequence>
<xs:attribute name="id" type="xs:string"/>
<xs:attribute name="reservedtalkers" type="xs:string"/>
<xs:attribute name="reserveconfmedia">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="yes"/>
<xs:enumeration value="no"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
<xs:complexType name="configure_legType">
<xs:sequence>
<xs:element name="inputgain" type="inputgainType"
minOccurs="0"/>
<xs:element name="outputgain" type="outputgainType"
minOccurs="0"/>
</xs:sequence>
<xs:attribute name="id" type="xs:string"/>
<xs:attribute name="type">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="talker"/>
<xs:enumeration value="listener"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="mixmode">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="full"/>
<xs:enumeration value="mute"/>
<xs:enumeration value="preferred"/>
<xs:enumeration value="parked"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="dtmfclamp">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="yes"/>
<xs:enumeration value="no"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
<xs:complexType name="digitmapType">
<xs:attribute name="value" type="xs:string" use="required"/>
<xs:attribute name="name" type="xs:string"/>
</xs:complexType>
<xs:complexType name="eventsType">
<xs:sequence>
<xs:element name="activetalkers" type="activetalkersType"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="fixedType">
<xs:attribute name="level" type="xs:string"/>
</xs:complexType>
<xs:complexType name="inputgainType">
<xs:choice>
<xs:element name="auto" type="autoType"/>
<xs:element name="fixed" type="fixedType"/>
</xs:choice>
</xs:complexType>
<xs:complexType name="notificationType">
<xs:sequence>
<xs:element name="conference" type="conferenceType"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="outputgainType">
<xs:choice>
<xs:element name="auto" type="autoType"/>
<xs:element name="fixed" type="fixedType"/>
</xs:choice>
</xs:complexType>
<xs:complexType name="patternType">
<xs:choice maxOccurs="unbounded">
<xs:element name="regex" type="regexType"/>
<xs:element name="digitmap" type="digitmapType"/>
</xs:choice>
</xs:complexType>
<xs:complexType name="playType">
<xs:sequence minOccurs="0">
<xs:element name="prompt" type="promptType"/>
</xs:sequence>
<xs:attribute name="id" type="xs:string"/>
<xs:attribute name="prompturl" type="xs:string"/>
<xs:attribute name="promptencoding">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="ulaw"/>
<xs:enumeration value="alaw"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
<xs:complexType name="playcollectType">
<xs:sequence>
<xs:element name="prompt" type="promptType" minOccurs="0"/>
<xs:element name="pattern" type="patternType" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="id" type="xs:string"/>
<xs:attribute name="prompturl" type="xs:string"/>
<xs:attribute name="barge" default="yes">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="yes"/>
<xs:enumeration value="no"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="promptencoding">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="ulaw"/>
<xs:enumeration value="alaw"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="cleardigits" type="xs:string" default="yes"/>
<xs:attribute name="maxdigits" type="xs:string"/>
<xs:attribute name="firstdigittimer" type="xs:string"/>
<xs:attribute name="interdigittimer" type="xs:string"/>
<xs:attribute name="intdigcrittimer" type="xs:string"/>
<xs:attribute name="extradigittimer" type="xs:string"/>
<xs:attribute name="returnkey" type="xs:string" default="#"/>
<xs:attribute name="escapekey" type="xs:string" default="*"/>
</xs:complexType>
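<xs:complexType name="faxplayType">
<xs:attribute name="lclid" type="xs:string" default=""/>
<xs:attribute name="prompturl" type="xs:string"/>
<xs:attribute name="recurl" type="xs:string"/>
<xs:attribute name="rmtid" type="xs:string"/>
</xs:complexType>
<xs:complexType name="faxrecordType">
<xs:attribute name="lclid" type="xs:string" default=""/>
<xs:attribute name="prompturl" type="xs:string"/>
<xs:attribute name="recurl" type="xs:string"/>
<xs:attribute name="rmtid" type="xs:string"/>
</xs:complexType>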
<xs:complexType name="playrecordType">
<xs:sequence minOccurs="0">
<xs:element name="prompt" type="promptType"/>
</xs:sequence>
<xs:attribute name="id" type="xs:string"/>
<xs:attribute name="prompturl" type="xs:string"/>
<xs:attribute name="barge">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="yes"/>
<xs:enumeration value="no"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="cleardigits">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="yes"/>
<xs:enumeration value="no"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="escapekey" type="xs:string" default="*"/>
<xs:attribute name="recurl" type="xs:string" use="required"/>
<xs:attribute name="mode" default="overwrite">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="append"/>
<xs:enumeration value="overwrite"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="recencoding">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="ulaw"/>
<xs:enumeration value="alaw"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="initsilence" type="xs:string"/>
<xs:attribute name="endsilence" type="xs:string"/>
<xs:attribute name="duration" type="xs:string"/>
<xs:attribute name="beep" default="yes">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="yes"/>
<xs:enumeration value="no"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="recstopmask" type="xs:string"
default="01234567890*#"/>
</xs:complexType>
<xs:complexType name="promptType">
<xs:choice maxOccurs="unbounded">
<xs:element name="audio" type="audioType"/>
<xs:element name="variable" type="variableType"/>
</xs:choice>
<xs:attribute name="locale" type="xs:string"/>
<xs:attribute name="baseurl" type="xs:string"/>
</xs:complexType>
<xs:complexType name="regexType">
<xs:attribute name="value" type="xs:string" use="required"/>
<xs:attribute name="name" type="xs:string"/>
</xs:complexType>
<xs:complexType name="requestType">
<xs:choice>
<xs:element name="configure_conference"
type="configure_conferenceType"/>
<xs:element name="configure_leg" type="configure_legType"/>
<xs:element name="play" type="playType"/>
<xs:element name="playcollect" type="playcollectType"/>
<xs:element name="playrecord" type="playrecordType"/>
<xs:element name="faxrecord" type="faxrecordType"/>
<xs:element name="faxplay" type="faxplayType"/>
<xs:element ref="stop"/>
</xs:choice>
</xs:complexType>
<xs:complexType name="responseType">
<xs:attribute name="request" use="required">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="configure_conference"/>
<xs:enumeration value="configure_leg"/>
<xs:enumeration value="play"/>
<xs:enumeration value="playcollect"/>
<xs:enumeration value="playrecord"/>
<xs:enumeration value="faxrecord"/>
<xs:enumeration value="faxplay"/>
<xs:enumeration value="stop"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="id" type="xs:string"/>
<xs:attribute name="code" type="xs:string" use="required"/>
<xs:attribute name="reason" type="xs:string"/>
<xs:attribute name="text" type="xs:string" use="required"/>
<xs:attribute name="patternname" type="xs:string"/>
<xs:attribute name="digits" type="xs:string"/>
<xs:attribute name="reclength" type="xs:string"/>
<xs:attribute name="faxcode" type="xs:string"/>
<xs:attribute name="pages_sent" type="xs:string"/>
<xs:attribute name="pages_recv" type="xs:string"/>
</xs:complexType>
<xs:element name="stop">
<xs:complexType>
<xs:attribute name="id" type="xs:string"/>
</xs:complexType>
</xs:element>
<xs:complexType name="subscribeType">
<xs:sequence>
<xs:element name="events" type="eventsType"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="talkerType">
<xs:attribute name="callid" type="xs:string" use="required"/>
</xs:complexType>
<xs:complexType name="variableType">
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="date"/>
<xs:enumeration value="digit"/>
<xs:enumeration value="duration"/>
<xs:enumeration value="month"/>
<xs:enumeration value="money"/>
<xs:enumeration value="number"/>
<xs:enumeration value="silence"/>
<xs:enumeration value="string"/>
<xs:enumeration value="time"/>
<xs:enumeration value="weekday"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="subtype">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="mdy"/>
<xs:enumeration value="dmy"/>
<xs:enumeration value="ymd"/>
<xs:enumeration value="ndn"/>
<xs:enumeration value="t12"/>
<xs:enumeration value="t24"/>
<xs:enumeration value="USD"/>
<xs:enumeration value="gen"/>
<xs:enumeration value="crd"/>
<xs:enumeration value="ord"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="value" type="xs:string" use="required"/>
</xs:complexType>
</xs:schema>
9. IANA Considerations
9.1 IANA Registration of MIME media type application/mediaservercontrol+xml
MIME media type name: application
MIME subtype name: mediaservercontrol+xml
Required parameters: none
Optional parameters: charset
charset This parameter has identical semantics to the charset
parameter of the "application/xml" media type as specified in
XML Media Types [3].
Encoding considerations: See RFC3023 [3].
Interoperability considerations: See RFC3023 [3] and this document.
Published specification: This document.
Applications which use this media type: Multimedia, enhanced
conferencing and interactive applications.
Intended usage: COMMON
10. Security Considerations
Because media flows through a media server in a conference, the media
server itself MUST protect the integrity, confidentiality, and
security of the sessions. It should not be possible for a conference
participant, acting on her own behalf, to "tap in" to another
conference without proper authorization.
Because conferencing is a high-value application, the media server
SHOULD implement appropriate security measures. These include, but
are not limited to, access lists for application servers. That is,
only application or proxy servers on a select list are allowed to
create conferences, invite participants to sessions, etc. Note that
the mechanisms for such security, like private networks, shared
certificates, and MAC white/black lists, are beyond the scope of this
document.
Because MSCML is an XML markup language, all of the security
considerations of RFC3023 [3] apply.
Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[2] Thompson, H., Beech, D., Maloney, M. and N. Mendelsohn, "XML
Schema Part 1: Structures", W3C REC REC-xmlschema-1-20010502,
May 2001.
[3] Murata, M., St. Laurent, S. and D. Kohn, "XML Media Types", RFC
3023, January 2001.
Informative References
[4] Van Dyke, J., Burger (Ed.), E. and A. Spitzer, "Basic Network
Media Services with SIP", draft-burger-sipping-netann-06 (work
in progress), January 2003.
[5] Johnston, A. and O. Levin, "Session Initiation Protocol Call
Control - Conferencing for User Agents",
draft-ietf-sipping-cc-conferencing-00 (work in progress), April
2003.
[6] Mahy, R., "A Call Control and Multi-party usage framework for
the Session Initiation Protocol (SIP)",
draft-ietf-sipping-cc-framework-02 (work in progress), March
2003.
[7] McGlashan, S., Burnett, D., Danielsen, P., Ferrans, J., Hunt,
A., Karam, G., Ladd, D., Lucas, B., Porter, B., Rehor, K. and
S. Tryphonas, "Voice Extensible Markup Language (VoiceXML)
Version 2.0", W3C Last Call WD-voicexml20-20020424, April 2002.
[8] ISC, "ISC Reference Architecture V1.2", June 2002.
[9] Groves, C., Pantaleo, M., Anderson, T. and T. Taylor, "Gateway
Control Protocol Version 1", RFC 3525, July 2003.
[10] Donovan, S., "The SIP INFO Method", RFC 2976, October 2000.
[11] Burger, E., "Keypad Markup Language (KPML)",
draft-burger-sipping-kpml-02 (work in progress), July 2003.
[12] Campbell, B., "Instant Message Sessions in SIMPLE",
draft-ietf-simple-message-sessions-00 (work in progress), May
2003.
[13] Hollenbeck, S., Rose, M. and L. Masinter, "Guidelines for the
Use of Extensible Markup Language (XML) within IETF Protocols",
BCP 70, RFC 3470, January 2003.
[14] ITU-T, "Procedures for document facsimile transmission in the
general switched telephone network", Recommendation T.30, April
1999.
[15] ITU-T, "Procedures for real-time Group 3 facsimile
communication over IP networks", Recommendation T.38, March 2002.
Authors' Addresses
Jeff Van Dyke
SnowShore Networks, Inc.
285 Billerica Rd.
Chelmsford, MA 01824-4120
USA
EMail: jvandyke@snowshore.com
Eric Burger
SnowShore Networks, Inc.
285 Billerica Rd.
Chelmsford, MA 01824-4120
USA
EMail: e.burger@ieee.org
Andy Spitzer
SnowShore Networks, Inc.
285 Billerica Rd.
Chelmsford, MA 01824-4120
USA
EMail: woof@snowshore.com
Appendix A. Contributors
Jeff Van Dyke, Andy Spitzer, and Terence Lobo at SnowShore Networks,
Inc. were responsible for the concept, development, documentation,
and execution of MSCML. The IVR implementation was influenced by
original work by Andy Spitzer while he was at The Telephone
Connection, Inc.
Cliff Schornak of Commetrex and Jeff Van Dyke developed the facsimile
service.
Terence Lobo, Srinivas Motamarri, Haj Elfadil, and Edwina Nowicki
contributed by being the first to eat what got cooked up.
Appendix B. Acknowledgements
The following individuals significantly assisted in the development,
direction, or, most importantly, debugging of MSCML:
o Gaurav Srivastva and Subhash Verma from BayPackets
o Jon Hinckley from SkyWave/Sestro
o Wesley Hicks, Ravindra Kabre, and Kevin Summers from Sonus Networks
o Diana Rawlins and Sharadha Vijay from WorldCom
o Tim Wong from Z-Tel
o Kevin Flemming for his feedback on the semantics of creation
versus configuration for conferencing.
The authors would like to thank Scotty Farber, technical writer
extraordinaire, who turned our techno-geek into English.
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; neither does it represent that it
has made any effort to identify any such rights. Information on the
IETF's procedures with respect to rights in standards-track and
standards-related documentation can be found in BCP-11. Copies of
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to
obtain a general license or permission for the use of such
proprietary rights by implementors or users of this specification can
be obtained from the IETF Secretariat.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice
this standard. Please address the information to the IETF Executive
Director.
The IETF has been notified of intellectual property rights claimed in
regard to some or all of the specification contained in this
document. For more information consult the online list of claimed
rights.
Note that SnowShore Networks, Inc. is making their intellectual
property right interest in MSCML available on a royalty-free basis.
The full text of the disclosure is at
http://www.ietf.org/ietf/IPR/SNOWSHORE-draft-vandyke-mscml.txt .
Full Copyright Statement
Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assignees.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.