Internet Engineering Task Force                               MMusic WG
INTERNET-DRAFT                                       Scott Petrack, IBM
draft-petrack-sisp-00.txt
13 June 1996                                     Expires: December 1996

              SISP - Simple Internet Signalling Protocol

Status of this Memo

This document is a first draft of an Internet-Draft. Internet-Drafts
are working documents of the Internet Engineering Task Force (IETF),
its areas, and its working groups.  Note that other groups may also
distribute working documents as Internet-Drafts.

This document is truly rough, but it was felt that the timeliness
of the ideas justified dissemination in this preliminary form.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or made obsolete by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.


Abstract

Simple Internet Signalling Protocol (SISP) performs one function:
signalling of realtime data streams over IP networks. SISP has
several distinguishing features: it is a tiny extension of RTCP,
running over UDP or TCP, it can integrate very well with PSTN
signalling, and it can run in very low bandwidth situations without
disturbing the real time stream. It is completely scalable with
respect to number of participants and also with respect to "tightness"
of control, and can work with an extremely
wide variety of conference models, policies, and standards.

SISP differs from other conferencing protocols in that it performs
a single essential task completely. It is argued that other protocols
solve only parts of several overlapping problems.  SISP can serve as the
lowest common denominator for signalling of real-time streams.

The requirements that SISP fulfills, the features it offers,
the fact that it uses RTCP as an encapsulation scheme, and its
generally minimalist approach of solving one problem only are
more important than the actual state machine it implements or
particular format of its messages.


draft-petrack-sisp-00.txt        Simple Internet Signalling Protocol
13 June 1996                                                  Page 2
1. Introduction

This paper discusses a solution to one particular aspect of
the very large "session/conference control problem": the
problem of signalling. It is also an attempt to help resolve a
crisis.

There are at present two separate groups of applications which
perform conferencing over the Internet. One is the suite of MBONE
tools. These tools have little or no conference control built in.
Separate protocols and tools are used to supply control if it
is desired. This is quite a reasonable approach: after all,
streaming is streaming and control is control. There is a long
list of protocols emerging at present (SIP, SAP, SDP, SCIP, SCCP)
which solve an equally long list of problems (location,
announcement, setup, session description, scheduling, negotiation
of capabilities). The problem is that these protocols have a
large overlap, and in many cases they solve overlapping and
ill-defined problems.

The second group is the plethora of commercial Internet
telephones, videophones, and other real time communication
applications. These applications often have control built in
directly into the real time stream. Of course, these applications
are really very immature, and certainly have not done their
IETF homework in almost any subject: none use multicast, few
use RTP, and none are interoperable in any way, neither for
control nor for streaming. Many people consider this a crisis,
although oddly enough there are wildly differing views on what
the crisis is.

Now of course it is very distasteful to have to deal with this
second group of applications at all. One has the impression
that there are no "real problems" here, certainly none
worthy of real research time or thought. It seems clear that
if one does some serious work on the first group of applications,
then the commercial applications will fall into line as they
realize the advantages of standardization.

This note argues otherwise: it begins with
the question: "what is the absolute minimum infrastructure
that must be in place to allow different multimedia conferencing
applications to become interoperable?" I claim that there
is a very tiny thing which stands out: signalling of realtime
streams. This is the mechanism by which
one sets up, maintains, and tears down a realtime stream.

The whole object of conference control is to allow human users to
pass real time streams amongst themselves (although of course
there will be cases where some or all users are not human).
Signalling is what happens at the very last stage, when all
decisions about location, announcement, policy, and scheduling
have taken place, and you want to set up the real time stream NOW.
It also happens when the real time stream is already streaming,
and you need to change some shared ephemeral state of it NOW.


SISP has no mechanism to perform location queries, or scheduling
of future conferences. In fact, it doesn't really
know at all about conferences. It knows about a single RTP
stream. SISP simply adds some new RTCP SDES items and a packet type
to add some control to a single RTP stream. That's all. If you
are satisfied with the loose control that RTCP gives over
a real time stream, then you do not need to use the new packet
type.  These additions allow one to scale from the currently available
loose control up to a very tightly controlled model. SISP reuses a
number of mechanisms that already exist within RTCP, such as SDES
and CNAME. In fact, it is better to say that SISP simply uses these
mechanisms, not reuses them.

One very important feature of RTP is that each real time stream
is a separate entity with its own control; in the same way,
SISP treats each real time stream as a separate entity.
For example, this allows you to transfer the audio stream in an
AudioVideo Call, without transferring the Video stream. These sorts
of services are very important. Rather than reinventing them, we get
them from RTCP. In general, all issues relating to "shared ephemeral
state" are implemented on a per-stream basis.

Of course, it is very desirable to have standard tools and
protocols for location, etc., and of course there is overlap
between the need to "announce" and "describe" and "invite."
Unfortunately, these problems have not been well enough
defined and separated yet, and this is the reason that there
are many overlapping protocols which are solving many overlapping
problems. We avoid this morass by simply not addressing it in any way.
We solve the smaller and perfectly well defined problem of
signalling. We certainly hope that solving this will help clarify
the other higher level issues.

This paper is written from a double perspective.

On the one hand, it is indeed a "letter from the front."
The author is writing to the generals and strategists back home,
describing a particular crisis. He has already done something
about it, and he thinks that it is important the generals know.
He is a bit upset that the guidelines he has from headquarters
are a bit confusing and frankly confused.

From another point of view, the author believes that the knot of
overlapping requirements and protocols is making for bad strategy.
He has an alternative solution, and he thinks that at least some
of its features are truly superior to what now exists. He hopes
that the following will contribute to untangling the problem.

The author's goal is thus a contribution both to the "problem"
of overlapping protocols and also to the "crisis" of
non-interoperable Internet Telephones and VideoPhones.


With all this out of the way, the rest of the paper can be
more straightforward. We will describe the motivation
for SISP, the motivation for using RTCP as transport,
basic features of SISP, and a few signal flows.

For many reasons (including the
"shared ephemeral state" of people submitting internet drafts in
June 1996), a great amount of important detail is missing
from this paper. Apart from simply regretting this, the author
wishes to state that the two main ideas of this paper, to use
RTCP and to separate out signalling from all other multimedia
conferencing problems, can be explained without reference to the
bit patterns of the new RTCP packets proposed.

In a final section, I broadly compare and contrast SISP with
various features of SIP, SAP, SDP, SCIP, SCCP. It is important
to understand the claim that non-SISP protocols try to do too much
on the one hand, and on the other are not quite rich enough.
At first we thought to call the new protocol "YACC" -
"yet another conference control." We have convinced ourselves
that this would not be accurate: SISP separates out one
particular, specific, essential problem, and solves it.

2. RTCP - a model of what is needed

The basic features of SISP stem directly from the decision to use
RTCP as a basis. So it makes sense to begin with a discussion of
the principles that impelled us to such a decision:

As explained above, our motivation was to perform signalling,
in the dictionary sense of "an act, event, or watchword that
has been agreed upon as the occasion of concerted action."
That is, signalling consists of those messages involved in call setup,
tear down, and maintenance which cause an action to happen
NOW. In particular, the thing of interest which is acted upon
is a real time media stream, which in our world is an RTP [1] stream.
So we wish to send messages to control real time streams in real
time.

Now of course it might be interesting to discuss if we
really want to control real time streams, or some higher level
thing like a "session" or a "conference." But it should be clear
that whatever higher level constructs one makes, at some point this
turns into control of some real time stream. When a user joins
or leaves a conference, for example, then a real time stream
starts or stops flowing over a portion of the Internet, whatever
particular meaning you like to assign to the words "user", "join",
"leave," or "conference". Since we have to control these RTP
streams, it is natural to see what they are made of, and what
already exists to signal them.


Looking at the definition of RTP, one discovers that it contains
RTCP, a "control protocol," by definition. In fact, the manual
states explicitly that "RTCP may be
sufficient for `loosely controlled' sessions" [1,p.2]. Hmmm.
It is certainly natural to try to make precise just
which signalling and control functions are already in RTCP,
before going on to invent something new. In any case,
we would certainly like to have a signalling mechanism which
can scale from very loose to very tight control.
If RTCP covers one end of the spectrum, it is interesting to
see how far it can be pushed, at what point it breaks, and if
a continuum can be built with it as one end.

One discovers that RTCP is pretty rich already. For example, "by
having each participant send its control packets to all the others,
each can independently observe the number of participants." [1,p.15]
This is certainly some sort of session awareness, of "shared
ephemeral state" in the sense of [2].
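
The observation above can be sketched in a few lines of Python. This
is only an illustration, assuming that each incoming RTCP packet has
already been parsed far enough to yield its 32-bit SSRC; the class
name, method names, and the 30 second timeout are hypothetical and
are not taken from RTP or from this draft.

```python
import time


class ParticipantTable:
    """Tracks which SSRCs have been heard from recently (illustrative)."""

    def __init__(self, timeout_s=30.0):
        # timeout_s is an assumed value, not one mandated by RTP.
        self.timeout_s = timeout_s
        self.last_seen = {}

    def on_rtcp(self, ssrc, now=None):
        # Record the arrival time of any RTCP packet from this source.
        self.last_seen[ssrc] = time.monotonic() if now is None else now

    def on_bye(self, ssrc):
        # A BYE packet removes the source immediately.
        self.last_seen.pop(ssrc, None)

    def count(self, now=None):
        # Count the sources heard from within the timeout window.
        now = time.monotonic() if now is None else now
        return sum(1 for seen in self.last_seen.values()
                   if now - seen <= self.timeout_s)
```

Each participant can run such a table independently, which is the
sense in which the state is shared without any central statekeeper.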

There is also a great deal of information sent about the sender in
the RTCP SDES packet. Although it is not clear at first why one
needs to know the email of the person to whom you are talking
(I don't know the email address of many of the people I talk to
over the telephone), the fact is that *all* the current suggestions
within MMUSIC seem to think that this is very important.
Luckily for us, then, every RTP stream
is already required to have this information transmitted within
an SDES packet. So applications already have code to transmit this
information. It seems a shame to code it again.

In fact, RTCP already solves some other difficult problems in multimedia
signalling. Consider the problem of how to define a "session" or
"conference." In RTCP, one has the notion of a "Canonical Name"
(CNAME). This is used in the RTCP packets so that different RTP streams
can be associated. For example, this is how one can know that a
particular RTP video stream and a different RTP audio stream are in
fact meant to be a synchronized VideoPhone call.
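
As a rough sketch of this association, the Python below groups
streams by the CNAME seen in their SDES packets. The input tuples
(SSRC, CNAME, media label) and the function name are assumptions made
for illustration; in particular the media label is not an RTCP field
and would come from whatever the application knows about each stream.

```python
from collections import defaultdict


def group_streams_by_cname(sdes_reports):
    """Group (ssrc, cname, media) tuples into per-CNAME sessions."""
    sessions = defaultdict(list)
    for ssrc, cname, media in sdes_reports:
        # Streams that share a CNAME come from the same participant.
        sessions[cname].append((media, ssrc))
    return dict(sessions)
```

An audio stream and a video stream reported under the same CNAME end
up in the same list, and can then be treated as one VideoPhone call.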

What more natural thing to do than to use an RTCP stream to convey all
this information which is vital to call signalling? For example, in
a very tightly controlled conference (like an ordinary phone call),
one might start by sending an RTCP stream with a CNAME and SDES and
other necessary information, and when the necessary shared state
has been obtained, the RTP stream itself can begin. If there are several
RTP streams that make up a session, one could actually keep one RTCP
stream exclusively for signalling, or just add new RTCP and RTP streams
as needed.


In fact, just by using RTP one can get a range of loose to tight
control models.
As one example among many others, if one wishes to have a
multicast session, where some parameters can not be announced
in advance, then one can send out the required parameters in
an SDES packet, and any RTCP monitor looking at the traffic can
join the conference. If one wants tighter control, then security
and encryption are both a part of RTCP already.

Before we come to the bad news, let us continue and see how RTCP
solves some problems of signalling simply and naturally.

The BYE packet of RTCP is actually a true
signal, in that it does indeed constitute a "watchword which has
been agreed upon as the occasion of concerted action." (Of course,
in an extremely loosely controlled conference this
may not be true, but in such a case the BYE is not very important.)

Here is a more sophisticated reason to use RTCP as a
signalling mechanism: signalling often involves precise timing
considerations. The need for precise timestamps to deal with
some aspects of "shared ephemeral state" is carefully discussed in [2].
Indeed, in the public telephone network (PSTN), passing these strict
timing requirements is one aspect of the process of homologation.
RTCP packets come with timestamps as well.

Another advantage of RTCP is that it allows for separate
signalling for each real time stream. For example,
if I wish to transfer a VideoPhone conversation to someone
who is connected only by telephone, I might wish to transfer
the audio stream of the call, but not the video stream.
It would be unfortunate if the transfer had to fail, just
because the third party had no video support.
Just as one doesn't want to *force* someone to associate or
interleave two separate streams, logically or physically, one
shouldn't try to force association of signalling either.
Problems that arise because of bandwidth considerations are
best dealt with by RTCP compression [3,4], not by forcing
users to have reduced functionality.

Finally, for those applications which run on very low bandwidth links,
using RTCP has two advantages, one of which is perhaps a bit subtle:

First, we have seen that many things one needs to send for signalling
are required in any case by RTCP. So apart from saving tired fingers
the trouble of writing new code, using RTCP can also save bandwidth.


Second, on a very low bandwidth link, merely sending a signal can
overload the bandwidth, when an application is sending true RTP.
Now there is a small amount of tolerance for when one can send an RTCP
packet. In particular, a clever application can arrange to send the RTCP
packet during a "silence" period, which for the present purpose just
means a period when the RTP stream is not itself transmitting. This is
sometimes difficult, but a good application will know how to exploit
this. Of course, one can perform similar juggling with the signals,
but if one is transmitting signals for an RTP stream along with the
RTCP stream, it is obviously easier to coordinate things.

I hope that the reader is convinced that RTCP is already well
on the way to enabling signalling for real time streams that is
robust, flexible on the scale of loose/tight control, and
very efficient in bandwidth and implementation.

3. Extensions to RTCP

Unfortunately, there are indeed some needed messages that
are missing from RTCP. Not surprisingly, these are needed
precisely to fill out the "tightly controlled" end of the
scale. What is amazing is that so little is needed.

I am sorry to be very informal here, and beg the extreme
indulgence of the reader. A commitment is made to provide
details at a later date.

3.1 RDES - receiver description packet type

In a tightly coupled conference you clearly need to identify
the person you wish to speak to. Now exactly what "identify"
means is of course an interesting subject. We can say what
we mean quite precisely: It is assumed that the remote
machine receiving the RTCP packets has some means of identifying
the person you wish to contact. It is the duty of a decent
"location service" to provide both the address (IP, port) of the
machine to receive the real time stream, and hence the RTCP
signalling, as well as the tag/value information needed to
identify the actual remote party. How this location service
works is beyond the scope of the signalling-for-RTP-streams
considered here. In any case, the RDES message should have
exactly the description needed to identify the remote party.

Of course, there is no *requirement* that one use the RDES
field to tightly control a conference. One could imagine a
private multicast to thousands of members of a cult, where
the standard methods of RTCP security could be used to control
conference membership very tightly. But it is equally obvious
that one mechanism for tight control is that an RDES message
should be sent at the very beginning of a call to identify the
called party.


It should come as no surprise that the suggested format for
an RDES message is identical to that of an SDES message.
We shall give an example of the use of the RDES message in
the appendix.
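
As a minimal sketch of what "identical to SDES" could look like on
the wire, the Python below builds a one-chunk packet in the SDES
layout of RFC 1889: a 4-octet header, the SSRC, a list of
type/length/text items, and null octets padding the chunk to a 32-bit
boundary. The packet type value 205 for RDES and the function name
are assumptions made purely for illustration; this draft assigns no
number to RDES.

```python
import struct

RTCP_VERSION = 2
PT_SDES = 202   # SDES packet type from RFC 1889
PT_RDES = 205   # HYPOTHETICAL value; the draft does not assign one
SDES_CNAME, SDES_NAME, SDES_EMAIL = 1, 2, 3   # item types from RFC 1889


def build_description_packet(packet_type, ssrc, items):
    """Build a one-chunk packet in SDES layout for SDES or RDES."""
    chunk = struct.pack("!I", ssrc)
    for item_type, text in items:
        data = text.encode("utf-8")
        if len(data) > 255:
            raise ValueError("item text is limited to 255 octets")
        chunk += struct.pack("!BB", item_type, len(data)) + data
    chunk += b"\x00"                        # item type 0 ends the list
    chunk += b"\x00" * (-len(chunk) % 4)    # pad to a 32-bit boundary
    source_count = 1
    first_octet = (RTCP_VERSION << 6) | source_count
    # Length is counted in 32-bit words, minus one, header included.
    length_words = (4 + len(chunk)) // 4 - 1
    header = struct.pack("!BBH", first_octet, packet_type, length_words)
    return header + chunk
```

With this sketch, the same function produces an SDES packet when
called with PT_SDES and an RDES packet when called with PT_RDES; only
the packet type octet differs.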

3.2 RCAP item - receiver capabilities, new item in SDES

Until now, I have not yet given a single hint of any signal flow.
I have given no model of any kind for control. The next item
type seems indeed related to the particular model for signal
flows. But in fact it does not really forbid any particular
model. The "receiver capabilities" items list the RTP payload
types that the particular Receiver is willing to accept.
Note once again that I make no assumption whatsoever about
how this list is obtained. It may be a list of the coders that
the receiver's machine can actually decode, or it may be a
subset of that list based on such things as available machine
resources, hierarchy within an organization, or the phase of the
moon. As far as the needs of signalling go, a potential receiver
must be able to send out a list of those RTP payload types
that it is willing to receive right now. This list can contain
any of the accepted standard RTP payload types, or
elements of some other list of payload types agreed upon
by non-RTP means.

An example of setting up a simple call will be given in the
appendix, but it may help to state here that the basic mode
of call setup is inspired by the H.245 capabilities exchange
of ITU-T standard H.323. Namely, the receiver merely lists
the payload types that it is willing to accept, and then
the transmitter chooses one of those types for transmission.

Note that we can agree that the order of payload types within
a list describes the order of preference which the receiver has.
Note also that we need no special new item in SDES to describe
what is actually being sent. This is done in the payload type
of RTP.

It might be rather confusing that the RCAP item type is found in
the SDES packet, and not in the RDES packet. This is pure logic:
an RDES packet is sent in order to identify the *remote*
party. But one sends a SDES packet to describe "oneself," and
part of this description is what one is willing to tolerate
receiving.
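
The exchange described in this section might be sketched as follows
in Python. Everything about the encoding is an assumption: the draft
does not define the RCAP item layout, so the item type value 9 and
the choice of one octet per payload type, listed in order of
preference, are hypothetical. Only the selection rule (the
transmitter picks from the receiver's list) comes from the text.

```python
SDES_RCAP = 9   # HYPOTHETICAL item type; not assigned by the draft


def encode_rcap_item(payload_types):
    """Encode an ordered preference list, one octet per payload type."""
    if len(payload_types) > 255:
        raise ValueError("an SDES item holds at most 255 octets")
    for pt in payload_types:
        if not 0 <= pt <= 127:
            raise ValueError("RTP payload types are 7-bit values")
    return bytes([SDES_RCAP, len(payload_types)]) + bytes(payload_types)


def choose_payload_type(receiver_prefs, sender_supported):
    """Return the receiver's most preferred type the sender supports."""
    for pt in receiver_prefs:
        if pt in sender_supported:
            return pt
    return None   # no common payload type; the stream cannot start
```

For example, a receiver that lists payload types 3, 0, 8 in that
order, talking to a transmitter that supports only 0 and 8, ends up
receiving payload type 0.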

3.3 CP item - call progress, new item in SDES

The final item that we need to add is one that allows call
progress to occur. Call Progress is the feedback that one
obtains during the life of a call from the network system.
For example, you hear a particular sound after you dial
a remote user, and you know that his or her equipment is ringing.
The call progress words currently supported by SISP are
the following: Ringing, Accept, Busy, NoAnswer, Reject,
Pause, Error, Release, Info.


When the remote party's application is "ringing," it sends an SDES
packet with the CP item set to "ringing." The local application
receives this and can send the appropriate message ("Drrrrrrrring!")
to the local user.
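
A sketch of how the call progress words could be carried follows, in
Python. The nine words are those listed above; the item type value 10
and the choice to carry the word as ASCII text in an ordinary SDES
item are assumptions for illustration, since the draft does not yet
specify the bit layout.

```python
from enum import Enum

SDES_CP = 10   # HYPOTHETICAL item type; not assigned by the draft


class CallProgress(Enum):
    RINGING = "Ringing"
    ACCEPT = "Accept"
    BUSY = "Busy"
    NO_ANSWER = "NoAnswer"
    REJECT = "Reject"
    PAUSE = "Pause"
    ERROR = "Error"
    RELEASE = "Release"
    INFO = "Info"


def encode_cp_item(word):
    """Encode a call progress word as a type/length/text SDES item."""
    text = word.value.encode("ascii")
    return bytes([SDES_CP, len(text)]) + text


def decode_cp_item(item):
    """Decode an item produced by encode_cp_item."""
    item_type, length = item[0], item[1]
    if item_type != SDES_CP:
        raise ValueError("not a CP item")
    return CallProgress(item[2:2 + length].decode("ascii"))
```

The remote application would place encode_cp_item(CallProgress.RINGING)
in its SDES packet, and the local application would decode it and
start its own ringing indication.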

These are new items for an SDES packet because we think of the
user who is "ringing", or "accepts" a call as describing
"himself" in an SDES packet. Unfortunately, this means
that even if I am a receiver only, I might send an SDES packet,
for example to say that my app is "ringing."
This has caused some confusion, even with the author. It is
really the "receiver" who is ringing, or busy, or pausing.
But it seems to be RTCP convention that an SDES packet is describing
"himself". Originally, these call progress items were part of
the receiver report. There was no release item, and the BYE
packet was used instead. (This whole point needs to be clarified.)

Although the general meaning of each word is clear, there
are a few comments to make about some of the CP items:

Accept: The application sends an "accept" CP item when it is
ready for the other side to start streaming its RTP data. Of
course, there are many conference models where this makes no
sense. For example, in a loosely defined model, I certainly
don't want to wait for an accept message to begin streaming.

This is entirely correct. SISP does not *require* that an
application send an "accept" message before the remote
party begins to stream. Whether or not this is necessary
is decided by means totally outside of SISP, and is definitely
a part of the conference model being used. This will be
decided by a session "announcement" or "description", or
some other means. SISP is merely a signalling protocol. SISP
claims that RTCP, supplemented with the very few additions
here, is rich enough for all Internet signalling needs.

One can make a similar comment about every item of type
CP (Call Progress). Indeed, we have seen that in the loosest
conference model, RTCP suffices (the RTP standard says it does,
so this statement is by IETF consensus true). But if one
wants to distinguish, for example, between a call that is
rejected because there was no answer or because the user
made an active choice to reject the call, one can use SISP
to do this.

We shall see another example of the fact that SISP does not
mandate conference policy, but merely allows one to express
it, in the appendix.

Pause: This SDES item just says that the receiver is stopping
receiving "for a while". It is an indication that the receiver has
"put you on hold." Note that I did not put a "Resume" item type.
When I put you on hold, you really have no way of knowing this in
an ordinary call. But one might wish to add the signal.


Release: On the face of it, this is a totally superfluous
item, because there is a BYE packet. It was added so that
a receiver can also request or announce that s/he will
disconnect. It was also added to allow for some more
complicated supplementary services. The idea is that
many supplementary services end with a simultaneous
release, and an instruction for one party to do something
else. For example, in a blind call transfer, where
A has called B, and after some time, A would like B to
speak to C instead, but doesn't need to inform C about it
first (this is the "blind" part of the transfer), then
A would send an SDES with a release CP item to B, along
with an INFO CP item which said "call C."
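
The blind transfer example might be handled at B roughly as in the
Python sketch below. It assumes the CP items of one SDES packet have
already been decoded into (word, text) pairs, and it assumes that the
Info text has the literal form "call <target>", which is only the
informal wording used above and not a defined syntax. The function
and callback names are hypothetical.

```python
def handle_cp_items(items, place_call, hang_up):
    """React to the CP items carried in one received SDES packet.

    items      -- list of (word, text) pairs, e.g. ("Info", "call C")
    place_call -- callback taking the transfer target
    hang_up    -- callback taking no arguments
    """
    transfer_target = None
    release = False
    for word, text in items:
        if word == "Info" and text.startswith("call "):
            transfer_target = text[len("call "):]
        elif word == "Release":
            release = True
    if release:
        # Release the current stream first, then follow the instruction.
        hang_up()
        if transfer_target is not None:
            place_call(transfer_target)
```

Note that B is free to ignore the instruction; the sketch only shows
that Release and Info together are enough to express the service.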

This last example is one of many many supplementary
services. The author has checked that the very
simple list here is enough to implement the gamut of supplementary
services. Signal flows will be given in the next version
of this document.

The conclusion of this is that by adding only three things to
RTCP (one packet type and two new SDES items), it is possible
to use RTCP to implement the full range of signalling needed
for Internet Conferencing Applications.

4. Complaints, complaints.

With such a scant description of SISP, it would be highly
inappropriate to criticize other attempts to provide for
internet signalling in detail. We shall try to
list general objections to previous solutions.

First, signalling should be totally separate from the location
service. Of course, a location server may indeed use SISP if that
is appropriate. But that would be signalling for the location
server call, not for the actual call one wishes to make.
SISP begins its function after the location of the remote
party has already been decided.

Second, the signalling protocol should allow for any
conference model. For example, a protocol which *forces*
an application to distinguish "reject" and "no answer" is
flawed: the user may not wish to convey the information that
s/he rejected an invitation to confer. Certainly there
cannot be a requirement for any centralized statekeeper if
one wishes to include loosely controlled multicast
conferences.

Third, there must be the possibility of dynamic negotiation
of capabilities in real-time, via signalling at the
time of connection. This is because one may need to reserve
machine resources, and one can only do this "at the last
minute."


Fourth, it is important to allow for independent signalling
on independent RTP streams. This in itself is a strong
motivation for starting with RTCP.

Fifth, it is important to be truly scalable, in terms of
available bandwidth, number of users in the session, and
along the tight-loose control axis. This is not
an easy problem, and much work has gone into RTCP to this end.

Sixth, it is important that the signalling allow for timestamps
on signals.

Seventh, it is important in some applications that the signalling
itself be secure. At the other extreme, for some loosely
controlled conferences, it is useful to have "signal monitors"
that can "pick up" enough of the required information to
join the conference.

Eighth, by definition RTCP is present wherever RTP is. It is
far from clear that SMTP, HTTP, etc. will be there. (Imagine
very small cheap special purpose communication devices).

Ninth, the "global id" problem is quite complicated, and
tying down multimedia conferencing to any particular solution
of this problem is difficult. In any case, the part of the
problem that is location should be treated by a location
server, and the part of the global id problem that relates
to shared ephemeral state is best treated by the simple
CNAME mechanism of RTP. The part of the problem relating
to things like dynamic IP or "Integration into Email," for
example, is not really a problem that is related to signalling.

5. Reliability of SISP messages

The reader may have the impression that the author has somehow
forgotten that RTCP is not reliable. Indeed, in trials
he has simply used TCP for the RTCP flow. Since the RTCP
traffic is really very slight, this has not caused problems,
even on slow serial links. (In fact, because of TCP/IP header
compression, TCP is usually a more efficient choice over
a dial-up link!) Of course in situations where it is
not possible to use TCP, some other means must be used
to ensure the reliability of the SISP signalling.
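
One practical detail of running RTCP over TCP is recovering packet
boundaries from the byte stream. The Python sketch below relies only
on the length field that every RTCP packet header already carries
(counted in 32-bit words, minus one). Whether the trials mentioned
above framed packets this way is not stated, so this is an assumed
approach, and the function name is hypothetical.

```python
def split_rtcp_stream(buffer):
    """Split a TCP byte buffer into complete RTCP packets.

    Returns (packets, remainder), where remainder holds the bytes of
    a packet that has not fully arrived yet.
    """
    packets = []
    offset = 0
    while len(buffer) - offset >= 4:
        # Octets 2-3 of the header hold the length in words, minus one.
        length_words = int.from_bytes(buffer[offset + 2:offset + 4], "big")
        size = (length_words + 1) * 4
        if len(buffer) - offset < size:
            break   # wait for the rest of this packet
        packets.append(buffer[offset:offset + size])
        offset += size
    return packets, buffer[offset:]
```

The caller would append newly received bytes to the remainder and
call the function again, so no extra framing layer is needed.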


6. Signal Flows

(These must be written up, but I shall give at least one!)

6.1. Of course, the simple open multicast conference is an
example of SISP signalling, as is any other conference which
relies on some non-RTP means to determine location, and then
only RTCP for conference control. But we shall give an
example of a simple Internet Telephone Call, using SISP.
In the following example A wishes to call B.
The precise timings are not given for simplicity, but
the packets sent are written in time order.

a. A sends B the following RTCP packets, in this order:
   (RTP is not yet being sent)

        SDES: identifies the caller. (This is optional)

        RDES: identifies the callee. The information used in this
          packet is obtained from some location server or other
          means.

        RCAP: identifies the reception capabilities of A
          (remember that if there is more than one RTP stream,
           then there will be more than one SISP stream as well).



b. B receives the three packets, and perhaps it consults with the OS
   and with some databases. It starts a ringing signal to the user,
   and sends the following packet to A:

        SDES: identifies B, and sends the "ringing" CP item


c. Perhaps after some consultation
   with the user, with some databases, and with the operating system,
   B sends the following RTCP packets, in this order:

        SDES: identifies B and sends the "accept" CP item


d. Upon getting the "accept" message, A knows that it can start
   streaming. It sends the following packet to B:

        SDES: identifies A and sends the "accept" CP item

And now B knows that it can start streaming as well.
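
The caller's side of steps a through d can be sketched as a small
state machine in Python. The state names, the class, and the string
labels for packets are hypothetical, and the handling of Busy,
NoAnswer, and Reject (returning to idle) is an assumption, since the
flow above covers only the successful case.

```python
class Caller:
    """Caller side of the simple call flow (illustrative only)."""

    def __init__(self):
        self.state = "idle"
        self.sent = []   # labels of the packets sent so far, in order

    def start_call(self):
        # Step a: identify caller, identify callee, list capabilities.
        self.sent += ["SDES", "RDES", "RCAP"]
        self.state = "calling"

    def on_cp(self, word):
        in_setup = self.state in ("calling", "ringing")
        if in_setup and word == "Ringing":
            # Step b: the callee's application is ringing.
            self.state = "ringing"
        elif in_setup and word == "Accept":
            # Steps c and d: answer with our own Accept, then stream.
            self.sent.append("SDES(CP=Accept)")
            self.state = "streaming"
        elif in_setup and word in ("Busy", "NoAnswer", "Reject"):
            # ASSUMPTION: a failed call simply returns to idle.
            self.state = "idle"
```

Since each RTP stream has its own signalling, an audio plus video
call would run two such machines, one per stream.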


Acknowledgements:

The author wishes to thank Ed Ellesson of IBM for helpful ideas
and advice, encouragement, and tolerance.

References

[1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP:
    A Transport Protocol for Real-Time Applications." RFC 1889

[2] S. Shenker, A. Weinrib, E. Schooler, "Managing Shared Ephemeral
    Teleconferencing State: Policy and Mechanism." draft-mmusic-
    ietf-agree-00.ps

[3] S. Casner and V. Jacobson, "Compressing IP/UDP/RTP Headers for
    Low-Speed Serial Links." draft-casner-jacobson-crtp-00.txt

[4] S. Petrack, "Compression of Headers in RTP Streams."
    draft-petrack-crtp-00.txt

Author's Location Information

Name=Scott Petrack
Address=IBM Haifa Research Lab, Haifa 31905, Israel
Email=petrack@vnet.ibm.com
Telephone=+972 4 829 6290
Fax=+972 4 829 6112