INTERNET-DRAFT                 M. Handley/J. Crowcroft/C. Bormann/J. Ott
Expires: January 1998    ISI/UCL/Universitaet Bremen/Universitaet Bremen
                                                               July 1997

           The Internet Multimedia Conferencing Architecture

Status of this memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on (Africa), (Europe), (Pacific Rim), (US East Coast), or (US West Coast).

   Distribution of this document is unlimited.


   This document provides an overview of multimedia conferencing on the
   Internet.  The protocols mentioned are specified elsewhere as RFCs,
   Internet-Drafts, or ITU recommendations.  Each of these
   specifications gives details of the protocol itself, how it works and
   what it does.  This document attempts to provide the reader with an
   overview of how the components fit together and of some of the
   assumptions made, as well as some statement of direction for those
   components still in a nascent stage.

   This document is a product of the Multiparty Multimedia Session
   Control (MMUSIC) working group of the Internet Engineering Task
   Force.  Comments are solicited and should be addressed to the working
   group's mailing list at and/or the authors.

   (To do for final version: fix references.)

Handley/Crowcroft/Bormann/Ott                                   [Page 1]

INTERNET-DRAFT  The Internet Multimedia Conferencing Architecture  July 1997

1.  Introduction

   In conjunction with computers, the term ``conferencing'' is often
   used in two different ways: firstly, to refer to bulletin boards and
   mail list style asynchronous exchanges of messages between multiple
   users; secondly, to refer to synchronous or so-called ``real-time''
   conferencing, including audio/video communication and shared tools
   such as whiteboards and other applications.  This document is about
   the architecture for this latter application, multimedia conferencing
   in the Internet.

   There are other infrastructures for teleconferencing in the world:
   POTS (Plain Old Telephone System) networks often provide voice
   conferencing and phone-bridges, while with ISDN, H.320 [1] can be
   used for small, strictly organised video-telephony conferencing.  The
   architecture that has evolved in the Internet is far more general as
   well as being scalable to very large groups, and permits the open
   introduction of new media and new applications as they are devised.
   As the simplest case, it also allows two persons to communicate via
   audio only, so it encompasses IP telephony.

   The determining factors of a conferencing architecture are
   communication in (possibly large) groups of humans and real-time
   delivery of information.  In the Internet, this is supported at a
   number of levels.  The remainder of this section provides an overview
   of this support, and the rest of the document describes each aspect
   in more detail.

   In a conference, information must be distributed to all the
   conference participants.  Early conferencing systems used a fan-out
   of data streams, e.g., one connection between each pair of
   participants, which means that the same information must cross some
   networks more than once.  The Internet architecture uses the more
   efficient approach of multicasting the information to all
   participants (section 2).

   Multimedia conferences require real-time delivery of at least the
   audio and video information streams used in the conference.  In an
   ISDN context, fixed rate circuits are allocated for this purpose --
   whether their bandwidth is required at any particular instance or
   not.  On the other hand, the traditional Internet service model
   (``best effort'') cannot make the necessary quality of service
   available in congested networks.  New service models are being
   defined in the Internet together with protocols to reserve capacity
   in a more flexible way than that available with circuit switching
   (section 3).

   In a datagram network, multimedia information must be transmitted in
   packets, some of which may be delayed more than others.  In order
   that audio and video streams be played out at the recipient in the
   correct timing, information must be transmitted that allows the
   recipient to reconstitute the timing.  A transport protocol with the
   specific functions needed for this has been defined (section 4).

   Conference tools such as virtual whiteboards or shared editors are
   not concerned with real-time delivery of audio or video but maintain
   and update shared state between the participants.  Work on support of
   such applications in a multicast environment is in progress (section
   5).

   The humans participating in a conference generally need to have a
   specific idea of the context in which the conference is happening,
   which can be formalized as a conference policy.  Some conferences are
   essentially crowds gathered around an attraction, while others have
   very formal guidelines on who may take part (listen in) and who may
   speak at which point.  In any case, initially the participants must
   find each other, i.e. establish communication relationships
   (conference setup, section 6).  During the conference, some
   conference control information is exchanged to implement a conference
   policy or at least to inform the crowd of who is present (section 7).

   In addition, security measures may be required to actually enforce
   the conference policy, e.g. to control who is listening and to
   authenticate contributions as actually originating from a specific
   person.  In the Internet, there is little tendency to rely on the
   traditional ``security'' of distribution offered e.g. by the phone
   system.  Instead, cryptographic methods are used for encryption and
   authentication, which need to be supported by additional conference
   setup and control mechanisms (section 8).

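        Figure 1: Internet Multimedia Conferencing protocol stacks

   +-------------------------------------------+------------------------+
   |<---       Conference Management       --->|<--- Media Agents   --->|
   |                                           |                        |
   |         Conference      |    Conference   | Audio/ |    Shared     |
   |     Setup & Discovery   |  Course Control | Video  |  Applications |
   +-------------------------+------+--------+-+--------+------------+  +
   |         S D P           |      | Distr. |  RTP /   |  Reliable  |  |
   | SAP | SIP | HTTP | SMTP | RSVP | Ctrl(1)|  RTCP    |Multicast(2)|  |
   +-----+--+--+------+------+   +--+--------+----------+------------+--+
   |   UDP  |      T C P     |   |                U D P                 |
   +--------+----------------+---+--------------------------------------+
   |                        IP + IP Multicast                           |
   +--------------------------------------------------------------------+
   |                 Integrated Services Forwarding                     |
   +--------------------------------------------------------------------+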


   (1)  The work on distributed control for tightly coupled conferences
        is in progress (see section 6).

   (2)  See section 5.

   The protocol stacks for Internet Multimedia Conferencing are shown in
   Figure 1.  Unlike many protocol stacks, most of these protocols are
   not deeply layered; rather, they are used alongside each other to
   produce a complete conference.

2.  Multicast Traffic Distribution

   IP multicast enables efficient many-to-many datagram distribution.
   It is one of the basic building blocks of the Internet Multimedia
   Conferencing architecture.  For most conferencing purposes, unicast
   is viewed as being a special case of multicast traffic.

2.1.  Multicast Service Model

   The IP multicast service model is as follows:

   -    Senders send datagrams to the address of a multicast group.

   -    Receivers express an interest in (join) certain multicast
        groups.

   -    Multicast routers conspire to deliver multicast group addressed
        datagrams from the senders to the receivers.

   The important factor here is that senders do not have to know who the
   receivers are in order to be able to send to them.  In fact, in most
   situations, no single point in the network needs to know who all the
   receivers are, and it is this that makes IP multicast scalable to
   very large groups.  In addition, receivers do not need to know who
   the senders are in order to be able to receive traffic from them, and
   this solves many conference setup and resource location problems
   without needing explicit machinery.
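
   The service model above can be sketched with the ordinary sockets
   interface.  This is a minimal illustration, not part of any
   specification: the group address, port and helper names are
   assumptions chosen for the example.  Note that the sender never
   joins the group, and neither side learns who the other is.

```python
import socket

GROUP = "224.2.127.254"   # illustrative group address (assumption)
PORT = 9875               # illustrative UDP port (assumption)


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq value used with IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Join a group; the host's IGMP implementation reports the join."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def send_to_group(payload: bytes, group: str = GROUP, port: int = PORT,
                  ttl: int = 1) -> None:
    """Send one datagram to the group address without joining it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                       socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))
```

   A receiver would call open_receiver() and then recvfrom() on the
   returned socket; a sender only needs send_to_group().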

   There are many multicast routing protocols [2-5] but all of them
   satisfy the above service model.  They differ in their mechanisms and
   in how they scale with the number of senders and groups.

   Within a single LAN, group membership is expressed by IGMP [6, 7].
   IGMP version 3 allows receivers to express an interest in only
   receiving some of the senders to a particular multicast group.
   Earlier versions of IGMP only allow a receiver to request to receive
   all the sources sending to a multicast group.

2.2.  Address Allocation

   How does an application choose a multicast address to use?

   In the absence of any other information, we can bootstrap a multicast
   application by using well-known multicast addresses.  Routing
   (unicast and multicast) and the group membership protocol IGMP can do
   just that.  However, this is not the best way of managing
   applications of which there is more than one instance at any one
   time.

   For these, we need a mechanism for allocating group addresses
   dynamically, and a directory service which can hold these allocations
   together with some key (session information for example -- see
   later), so that users can look up the address associated with the
   application.  The address allocation and directory functions should
   be distributed to scale well.

   Address allocation schemes should avoid clashes, hence some kind of
   hash function suggests itself for forming initial ``random'' values
   for the address.  Furthermore, both the address allocation system and
   the directory service can take advantage of the baseline multicast
   mechanism by advertising conferences through multicast messages on a
   well-known address, and using this to inform other directory servers
   to remove clashes and inform applications of the allocation; see also
   section 7.
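
   One way to picture the hash-based choice of an initial ``random''
   address, followed by clash avoidance against the allocations a
   directory has already heard, is sketched below.  The address range,
   the use of SHA-256 and the function names are assumptions made for
   illustration only; no allocation protocol is implied.

```python
import hashlib
import ipaddress


def candidate_address(session_key: str, attempt: int,
                      base: str, size: int) -> str:
    """Hash the session key and a retry counter into [base, base + size)."""
    digest = hashlib.sha256(f"{session_key}/{attempt}".encode()).digest()
    offset = int.from_bytes(digest[:8], "big") % size
    return str(ipaddress.IPv4Address(base) + offset)


def allocate(session_key: str, in_use: set,
             base: str = "224.2.128.0", size: int = 1024,
             max_attempts: int = 32) -> str:
    """Return the first hashed candidate not already known to be in use."""
    for attempt in range(max_attempts):
        addr = candidate_address(session_key, attempt, base, size)
        if addr not in in_use:
            return addr
    raise RuntimeError("no free address found in the configured range")
```

   Here in_use stands for the set of addresses learned from other
   advertisements; a clash simply moves the allocator on to the next
   hashed candidate.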

   Such advertisements, as well as the multicast traffic itself, can be
   restricted to a defined region in the network (such as a corporate
   network) by using multicast addresses out of a range reserved for
   administrative scoping [***REF***].  In the future, address
   allocation may further be influenced by the desire to allocate
   addresses such that the corresponding landmarks used in emerging
   inter-domain multicast routing protocols are close to a significant
   subset of the participants [***REF***].

3.  Internet Service Models

   Traditionally the Internet has provided best-effort delivery of
   datagram traffic from senders to receivers.  No guarantees are made
   regarding when or if a datagram will be delivered to a receiver;
   however, datagrams are normally only dropped when a router exceeds a
   queue size limit due to congestion.  The best-effort Internet service
   model does not assume FIFO queuing, although many routers have
   implemented this.

   With best-effort service, if a link is not congested, queues will not
   build at routers, datagrams will not be discarded in routers, and
   delays will consist of serialisation delays at each hop plus
   propagation delays.  With sufficiently fast link speeds,
   serialisation delays are insignificant compared to propagation
   delays[1].

   [1] For slow links, a set of mechanisms has been defined that helps
       minimize serialisation and link access delays [8].

   If a link is congested, with best-effort service queuing delays will
   start to influence end-to-end delays, and packets will start to be
   lost as queue size limits are exceeded.

3.1.  Non-best effort service

   Real-time Internet traffic is defined as datagrams that are delay
   sensitive.  It could be argued that all datagrams are delay sensitive
   to some extent, but for these purposes we refer only to datagrams
   where exceeding an end-to-end delay bound of a few hundred
   milliseconds renders the datagrams useless for the purpose they were
   intended.  For the purposes of this definition, TCP traffic is
   normally not considered to be real-time traffic, although there may
   be exceptions to this rule.

   On congested links, best-effort service queuing delays will adversely
   affect real-time traffic.  This does not mean that best-effort
   service cannot support real-time traffic -- merely that congested
   best-effort links seriously degrade the service provided.  For such
   congested links, a better-than-best-effort service is desirable.

   To achieve this, the service model of the routers can be modified.
   At a minimum, FIFO queuing can be replaced by packet forwarding
   strategies that discriminate different ``flows'' of traffic.  The
   idea of a flow is very general.  A flow might consist of ``all
   marketing site web traffic'', or ``all fileserver traffic to and from
   teller machines'' or ``all traffic from the CEO's laptop wherever it
   is''.  On the other hand, a flow might consist of a particular
   sequence of packets from an application in a particular machine to a
   peer application in another particular machine between specific times
   of a specific day.

   Flows are typically identifiable in the Internet by the tuple:
   {source machine, destination machine, source port, destination port,
   protocol} any of which could be ``ANY'' (wildcarded).

   In the multicast case, the destination is the group, and can be used
   to provide efficient aggregation.

   Flow identification is called classification and a class (which can
   contain one or more flows) has an associated service model applied.
   This can default to best effort.
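
   Classification against wildcarded flow tuples might look as follows.
   This is a sketch under assumptions: the FlowSpec type, the rule list
   and the class names are hypothetical, and a real router would use a
   far more efficient lookup structure than a linear scan.

```python
from typing import NamedTuple, Optional

ANY = None  # wildcard: matches any value in that position


class FlowSpec(NamedTuple):
    """{source, destination, source port, destination port, protocol}."""
    src: Optional[str] = ANY
    dst: Optional[str] = ANY
    sport: Optional[int] = ANY
    dport: Optional[int] = ANY
    proto: Optional[str] = ANY

    def matches(self, packet: "FlowSpec") -> bool:
        """True if every non-wildcard field equals the packet's field."""
        return all(want is ANY or want == got
                   for want, got in zip(self, packet))


def classify(packet: FlowSpec, rules, default: str = "best-effort") -> str:
    """Return the service class of the first matching rule."""
    for spec, service_class in rules:
        if spec.matches(packet):
            return service_class
    return default
```

   In the multicast case a single rule keyed on the group address, with
   everything else wildcarded, aggregates all senders to that group
   into one class.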

   Through network management, we can imagine establishing classes of
   long lived flows -- enterprise networks (``Intranets'') often enforce
   traffic policies that distinguish priorities which can be used to
   discriminate in favor of more important traffic in the event of
   overload (though in an underloaded network, the effect of such
   policies will be invisible, and may incur no load/work in routers).

   The router service model to provide such classes with different
   treatment can be as simple as a priority queuing system, or it can be
   more elaborate.

   Although best-effort services can support real-time traffic,
   classifying real-time traffic separately from non-real-time traffic
   and giving real-time traffic priority treatment ensures that real-
   time traffic sees minimum delays.  Non-real-time TCP traffic tends to
   be elastic in its bandwidth requirements, and will then tend to fill
   any remaining bandwidth.

   We could imagine a future Internet with sufficient capacity to carry
   all of the world's telephony traffic.  Since this is a relatively
   modest capacity requirement, it might be simpler to establish
   ``POTS'' as a static class which is given some fraction of the
   capacity overall, and then within the backbone of the network no
   individual call need be given an allocation (i.e. we would no longer
   need the call setup/tear down that was needed in the legacy POTS
   which was only present due to under-provisioning of trunks, and to
   allow the trunk exchanges the option of call blocking).  The vision
   is of a network that is engineered with capacity for all of the
   average load sources to send all the time.

3.2.  Reservations

   For flows that may take a significant fraction of the network (i.e.
   are ``special'' and can't just be lumped under a static class), we
   need a more dynamic way of establishing these classifications.  In
   the short term, this applies to any multimedia calls since the
   Internet is largely under-provisioned at the time of writing.

   RSVP is being standardised for just this purpose.  It provides flow
   identification and classification.  Hosts and applications are
   modified to speak RSVP client language, and routers speak RSVP.

   Since most traffic requiring reservations is delivered to groups
   (e.g. TV), it is natural for the receiver to make the request for a
   reservation for a flow.  This has the added advantage that different
   receivers can make heterogeneous requests for capacity from the same
   source.  Thus RSVP can accommodate monochrome, color and HDTV
   receivers from a single source.
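
   The effect of heterogeneous receiver requests on a distribution tree
   can be illustrated with a deliberately simplified model in which a
   request is a single bandwidth figure and each node asks its upstream
   neighbour for the largest amount requested below it.  The tree, the
   figures and the function name are assumptions; real RSVP flow
   specifications are richer than one number.

```python
def merged_request(node, children, requests):
    """Bandwidth (kbit/s) that `node` must request from upstream.

    children: maps a node to the list of nodes directly downstream.
    requests: maps a receiver to the bandwidth it asked for.
    """
    own = requests.get(node, 0)
    downstream = [merged_request(child, children, requests)
                  for child in children.get(node, [])]
    return max([own] + downstream)


# Hypothetical tree: source -> R1 -> {R2 -> {B, C}, A}
children = {"R1": ["R2", "A"], "R2": ["B", "C"]}
requests = {"A": 64, "B": 1500, "C": 384}
```

   With these figures the link towards A carries a 64 kbit/s
   reservation while the links above R2 carry 1500 kbit/s, so each
   receiver gets what it asked for and no link reserves more than the
   largest request below it.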

   Again the routers conspire to deliver the right flows to the right
   locations.

   RSVP accommodates the wildcarding noted above.

3.3.  Admission Control

   If a network is provisioned such that it has excess capacity for all
   the real-time flows using it, a simple priority classification
   ensures that real-time traffic is minimally delayed.  However, if a
   network is insufficiently provisioned for the traffic in a real-time
   traffic class, then real-time traffic will be queued, and delays and
   packet loss will result.  Thus in an under-provisioned network,
   either all real-time flows will suffer, or some of them must be given
   priority.

   RSVP provides a mechanism by which an admission control request can
   be made, and if sufficient capacity remains in the requested traffic
   class, then a reservation for that capacity can be put in place.

   If insufficient capacity remains, the admission request will be
   refused, but the traffic will still be forwarded with the default
   service for that traffic's traffic class.  In many cases even an
   admission request that failed at one or more routers can still supply
   acceptable quality as it may have succeeded in installing a
   reservation in all the routers that were suffering congestion.  This
   is because other reservations may not be fully utilising their
   reserved capacity in those routers where the reservation failed.
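
   A per-hop admission test of this kind can be sketched as below.  The
   TrafficClass type and the capacity figures are hypothetical; the
   point is only that a request may be admitted at some routers on the
   path and refused at others, with refused hops falling back to the
   default service for the class.

```python
class TrafficClass:
    """Reservable capacity of one traffic class at one router."""

    def __init__(self, capacity_kbps: int):
        self.capacity_kbps = capacity_kbps
        self.reserved_kbps = 0

    def admit(self, request_kbps: int) -> bool:
        """Install the reservation only if enough capacity remains."""
        if self.reserved_kbps + request_kbps > self.capacity_kbps:
            return False
        self.reserved_kbps += request_kbps
        return True


def reserve_along_path(path, request_kbps: int):
    """Return, per hop, whether the reservation was installed there."""
    return [hop.admit(request_kbps) for hop in path]
```

   For a path of three routers with 1000, 100 and 1000 kbit/s available
   in the class, a 256 kbit/s request is installed at the first and
   third hop only.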

3.4.  Billing

   If a reservation involves setting aside resources for a flow, this
   will tie up resources so that other reservations may not succeed, and
   depending on whether the flow fills the reservation, other traffic is
   prevented from using the network.  Clearly some negative feedback is
   required in order to prevent pointless reservations from denying
   service to other users.  This feedback is typically in the form of
   billing.  For real-time non-best effort multicast traffic that is not
   reserved, this negative feedback is provided in the form of loss due
   to congestion of a traffic class, and it is not clear that usage
   based billing is required.

   Billing requires that the user making the reservation is properly
   authenticated so that the correct user can be charged.  Billing for
   reservations introduces a level of complexity to the Internet that
   has not typically been experienced with non-reserved traffic, and
   requires network providers to have reciprocal usage-based billing
   arrangements for traffic carried between them.  It also suggests the
   use of mechanisms whereby some fraction of the bill for a link
   reservation can be charged to each of the downstream multicast
   receivers.

4.  Audio/Video Transport Protocols

   So-called real-time delivery of traffic requires little in the way of
   transport protocol.  In particular, real-time traffic that is sent
   over more than trivial distances is not retransmittable.

4.1.  Separate Flows for each Media Stream

   With packet multimedia data there is no need for the different media
   comprising a conference to be carried in the same packets.  In fact
   it simplifies receivers if different media streams are carried in
   separate flows (i.e., separate transport ports and/or separate
   multicast groups).  This also allows the different media to be given
   different quality of service.  For example, under congestion, a
   router might preferentially drop video packets over audio packets.
   In addition, some sites may not wish to receive all the media flows.
   For example, a site with a slow access link may be able to
   participate in a conference using only audio and a whiteboard whereas
   other sites in the same conference may also send and receive video.

4.2.  Receiver Adaptation

   Best-effort traffic is delayed by queues in routers between the
   sender and the receivers.  Even reserved priority traffic may see
   small transient queues in routers, and so packets comprising a flow
   will be delayed for different times.  Such delay variance is known as
   jitter.

   Real-time applications such as audio and video need to be able to
   buffer real-time data at the receiver for sufficient time to remove
   the jitter added by the network and recover the original timing
   relationships between the media data.  In order to know how long to
   buffer for, each packet must carry a timestamp which gives the time
   at the sender when the data was captured.  Note that for audio and
   video data timing recovery, it is not necessary to know the absolute
   time that the data was captured at the sender, only the time relative
   to the other data packets.
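
   The use of relative timestamps for timing recovery can be sketched
   as follows.  The fixed playout delay and the function name are
   assumptions; practical receivers adapt the delay to the jitter they
   observe.

```python
def playout_times(packets, clock_rate: int, playout_delay: float):
    """Schedule packets for playout from their media timestamps.

    packets: list of (media_timestamp, arrival_time_seconds), in
             sending order; the first packet anchors the schedule.
    Returns a list of (media_timestamp, play_at_seconds, arrived_late).
    """
    first_ts, first_arrival = packets[0]
    schedule = []
    for ts, arrival in packets:
        # Spacing is taken from the sender's timestamps, not arrivals.
        play_at = first_arrival + playout_delay + (ts - first_ts) / clock_rate
        schedule.append((ts, play_at, arrival > play_at))
    return schedule
```

   With an 8 kHz clock and 160-sample packets the play-out instants
   are spaced 20 ms apart regardless of arrival jitter; a packet that
   arrives after its play-out instant is flagged as late.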

4.3.  Synchronisation

   As audio and video flows will receive differing jitter and possibly
   differing quality of service, audio and video that were grabbed at
   the same time at the sender may not arrive at the receiver at the
   same time.  At the receiver, each flow will need a playout buffer to
   remove network jitter.  Inter-flow synchronisation can be performed
   by adapting these playout buffers so that samples/frames that
   originated at the same time are played out at the same time.  This
   requires that the time base of different flows from the same sender
   can be related at the receivers, e.g. by making available the
   absolute times at which each of them was captured.

4.4.  RTP

   The transport protocol for real-time flows is RTP [9].  This provides
   a standard format packet header which gives media specific timestamp
   data, as well as payload format information and sequence numbering
   amongst other things.  RTP is normally carried using UDP.  It does
   not provide or require any connection setup, nor does it provide any
   enhanced reliability over UDP.  For RTP to provide a useful media
   flow, there must be sufficient capacity in the relevant traffic class
   to accommodate the traffic.  How this capacity is ensured is
   independent of RTP.

   Every original RTP source is identified by a source identifier, and
   this source id is carried in every packet.  RTP allows flows from
   several sources to be mixed in gateways to provide a single resulting
   flow.  When this happens, each mixed packet contains the source ids
   of all the contributing sources.
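
   As an illustration, the fixed RTP header defined in [9] (version,
   marker, payload type, sequence number, timestamp, source identifier
   and the optional list of contributing sources) can be packed and
   parsed as below.  Header extensions and padding are left out, and
   the function names are not taken from any library.

```python
import struct


def pack_rtp_header(payload_type, seq, timestamp, ssrc,
                    csrcs=(), marker=False) -> bytes:
    """Pack a version-2 RTP header without padding or extension."""
    if len(csrcs) > 15:
        raise ValueError("at most 15 contributing sources fit in a header")
    byte0 = (2 << 6) | len(csrcs)                 # V=2, P=0, X=0, CC
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    fixed = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                        timestamp & 0xFFFFFFFF, ssrc)
    return fixed + b"".join(struct.pack("!I", c) for c in csrcs)


def unpack_rtp_header(data: bytes) -> dict:
    """Parse the fields written by pack_rtp_header."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    count = byte0 & 0x0F
    csrcs = struct.unpack(f"!{count}I", data[12:12 + 4 * count])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc, "csrcs": list(csrcs)}
```

   A packet produced by a mixer would carry the mixer's own source
   identifier in ssrc and the identifiers of the mixed sources in
   csrcs.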

   RTP media timestamp units are flow specific -- they are in units that
   are appropriate to the media flow.  For example, 8kHz sampled PCM
   encoded audio has a timestamp clock rate of 8kHz.  This means that
   inter-flow synchronisation is not possible from the RTP timestamps
   alone.

   Each RTP flow is supplemented by Real-Time Control Protocol (RTCP)
   packets.  There are a number of different RTCP packet types.  RTCP
   packets provide the relationship between the real-time clock at a
   sender and the RTP media timestamps, and provide textual information
   to identify a sender in a conference from the source id.
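
   How the RTCP clock relationship allows flows with different
   timestamp units to be related can be sketched as follows.  The
   sketch assumes that the receiver has, for each flow, one (wall
   clock time, RTP timestamp) pair taken from an RTCP sender report;
   the function name and the figures in the note are illustrative.

```python
def capture_time(rtp_ts: int, sr_rtp_ts: int, sr_wallclock: float,
                 clock_rate: int) -> float:
    """Sender wall-clock time at which a packet's media was captured.

    sr_rtp_ts and sr_wallclock are the corresponding pair from the
    most recent sender report for this flow.
    """
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 1 << 31:          # timestamp precedes the report
        delta -= 1 << 32
    return sr_wallclock + delta / clock_rate
```

   For example, an audio packet stamped 168000 on an 8 kHz clock, with
   a report pairing timestamp 160000 with time 1000.0, and a video
   frame stamped 945000 on a 90 kHz clock, with a report pairing
   900000 with time 1000.5, both map to 1001.0 and should therefore be
   played out together.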

4.5.  Conference Membership and Reception Feedback

   IP multicast allows sources to send to a multicast group without
   being a receiver of that group.  However, for many conferencing
   purposes it is useful to know who is listening to the conference, and
   whether the media flows are reaching receivers properly.  Accurately
   performing both these tasks restricts the scaling of the conference.
   IP multicast means that no-one knows the precise membership of a
   multicast group at a specific time, and this information cannot be
   discovered, as to try to do so would cause an implosion of messages,
   many of which would be lost[2].  Instead, RTCP provides approximate
   membership information through periodic multicast of session messages
   which, in addition to information about the recipient, also give
   information about the reception quality at that receiver.  RTCP
   session messages are restricted in rate, so that as a conference
   grows, the rate of session messages remains constant, and each
   receiver reports less often.  A member of the conference can never
   know exactly who is present at a particular time from RTCP reports,
   but does have a good approximation to the conference membership.

   [2] Note that a conference policy that restricts conference
       membership can be implemented using encryption and restricted
       distribution of encryption keys, of which more later.
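
   The scaling rule for session messages can be made concrete with a
   simplified calculation.  The 5% control share and the five second
   minimum follow the RTP specification [9]; the separate treatment of
   senders and receivers and the randomisation of the interval are
   omitted here, and the function name is an assumption.

```python
def report_interval(members: int, avg_packet_bytes: float,
                    session_kbps: float, control_fraction: float = 0.05,
                    minimum_s: float = 5.0) -> float:
    """Seconds between session messages sent by each member."""
    control_bytes_per_s = session_kbps * 1000 / 8 * control_fraction
    return max(minimum_s, members * avg_packet_bytes / control_bytes_per_s)
```

   For a 128 kbit/s session and 100-byte reports, ten members each
   report every 5 seconds (the minimum), while 1000 members each report
   roughly every 125 seconds, which keeps the aggregate control
   traffic near 800 bytes per second however large the group becomes.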

   Reception quality information is primarily intended for debugging
   purposes, as debugging of IP multicast problems is a difficult task.
   However, it is possible to use reception quality information for rate
   adaptive senders, although it is not clear whether this information
   is sufficiently timely to be able to adapt fast enough to transient
   congestion.  However, it is certainly sufficient for Van Jacobson
   congestion control [10] style adaptation to a ``share'' of the
   current capacity.

4.6.  Control of Stream Playback and Recording

   A control protocol for initiating and controlling playing and
   recording audio, video, and other RTP-based information is the Real
   Time Streaming Protocol (RTSP) [11].  While primarily intended
   for web-based media-on-demand services, RTSP may also be used in the
   context of teleconferences to make recorded audio/video information
   available to the participants, or to control recording the course of
   the conference.

5.  Protocols for Non-A/V Applications

   Applications other than audio and video have evolved in Internet
   conferencing, e.g. Imm, Wb [12], Nt.  Such applications can be used
   to substitute for meeting aids in physical conferences (whiteboards,
   projectors) or replace visual and auditory cues that are lost in
   teleconferences (e.g., a speaker list application); they also can
   enable new styles of joint work.

   Most non-A/V applications have in common that the application
   protocol is about establishing and updating a shared state.  Loss of
   information is often not acceptable, so some form of multicast
   reliability is required.  The applications' requirements differ: Some
   applications make per-participant additions to the shared state that
   are orthogonal to each other (e.g., whiteboards), some evolve a more
   closely interrelated common state (e.g., additions to a speaker list
   must be properly sequenced).  Some applications can make use of added
   bandwidth/react to congestion in an elastic way, others transport
   data that, although not strictly real-time, is time-critical.

   In the IRTF research group on Reliable Multicast, work is in progress
   on common protocol elements that can be used in such applications.
   At the time of writing, some aspects of reliable multicast are not
   well-understood, such as the proper way to provide congestion control
   in a multicast environment.  As congestion control is considered an
   essential element, standards track protocols are not expected before
   this can be solved.  Refer to for further

6.  Conference Control

   Conferences come in many shapes and sizes, but there are only really
   two models for conference control: light-weight sessions and tightly
   coupled conferencing.  For both models, rendezvous mechanisms are
   needed.  Note that the conference control model is orthogonal to
   issues of quality of service and network resource reservation.  Note
   also that the issue of conference control is orthogonal to the
   mechanism for discovering the conference.

6.1.  Light-weight Sessions

   Light-weight sessions are multicast based multimedia conferences that
   lack explicit session membership and explicit conference control
   mechanisms.  Typically a lightweight session consists of a number of
   many-to-many media streams supported using RTP and RTCP using IP
   multicast[3].  The only conference control information available
   during the course of light-weight sessions is that distributed in the
   RTCP session information, i.e. an approximate membership list with
   some attributes per member.

6.2.  Tightly coupled Conferences

   Tightly coupled conferences may also be multicast based and use RTP
   and RTCP, but in addition they have an explicit conference membership
   mechanism and may have an explicit conference control mechanism that
   provides facilities such as floor control.

   At the time of writing, no standard mechanism for performing tightly
   coupled conference control exists in the Internet community.
   Another standards body, the ITU, has defined two standards that can
   be used in the Internet:

   -    The T.120 series of recommendations includes a centralized
        conference control protocol currently used for data application
        only, T.124 [13].

   -    Recommendation H.323 for Multi-Media Conferences for
        Packet-based Network Environments [14] specifies a
        point-to-point channel setup protocol [15] that also covers a
        few multipoint conferencing aspects.

   [3] There is some confusion on the term session, which is sometimes
       used for a conference and sometimes for a related set of media
       streams transported by RTP and perceived as a unit, e.g., the
       audio channel in a conference.  In this document, we prefer to
       use the less ambiguous term conference except where existing
       protocols use the term session.

   As T.124 is not accepted by the industry as a basis for audiovisual
   conference control, and H.245 (the channel setup protocol used by
   H.323) does not provide distributed control for tightly coupled
   conferences, there is no obvious choice.  The Simple Conference
   Control Protocol (SCCP) [16] is being developed as a prototype
   towards providing this kind of control (being a shared state
   application, SCCP could also benefit from developments in the area
   of reliable multicast).  A future distributed conference control
   protocol could be used as the distributed control mode envisioned by
   H.323 (which has not yet been addressed by the ITU).

7.  Conference Discovery

   There are two basic forms of conference discovery mechanism:
   session advertisement and session invitation.  Session
   advertisements are provided using a session directory, and inviting a
   user to join a session is provided using a session invitation
   mechanism.

7.1.  Session Directories

   The rendezvous mechanism for light-weight sessions is a
   multicast-based session directory.  This distributes session
   descriptions [17] to all the potential session participants.  These
   session descriptions advertise that the session will exist, and also
   provide sufficient information, including multicast addresses,
   ports, media formats and session times, for a receiver of the
   session description to join the session.  The Session Description
   Protocol (SDP) defines the content and format of the session
   descriptions.
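
   As an illustration of the information listed above, the following
   sketch extracts the multicast address, ports, media formats and
   session times from a textual session description.  The line-oriented
   "type=value" layout follows SDP as later published and may differ in
   detail from the draft cited as [17]; the function name parse_sdp and
   all values in EXAMPLE are hypothetical.

```python
def parse_sdp(text):
    """Extract the information a receiver needs to join a session."""
    info = {"name": None, "connection": None, "times": [], "media": []}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition("=")
        if key == "s":
            info["name"] = value
        elif key == "c":
            # c=<network type> <address type> <address>[/<ttl>]
            _nettype, _addrtype, address = value.split()
            addr, _, ttl = address.partition("/")
            info["connection"] = {"address": addr,
                                  "ttl": int(ttl) if ttl else None}
        elif key == "t":
            # t=<start time> <stop time>
            start, stop = value.split()
            info["times"].append((int(start), int(stop)))
        elif key == "m":
            # m=<media> <port> <transport> <format list>
            media, port, transport, *formats = value.split()
            info["media"].append({"media": media, "port": int(port),
                                  "transport": transport,
                                  "formats": formats})
    return info


# Hypothetical description; every value is made up for illustration.
EXAMPLE = """\
v=0
o=alice 2890844526 2890842807 IN IP4 192.0.2.10
s=Example seminar
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 31
"""

session = parse_sdp(EXAMPLE)
```

   The sketch handles only a session-wide connection line and ignores
   every field it does not recognise, which is sufficient to show what
   a receiver must learn before it can join.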

   As dynamic multicast address allocation can be optimised by knowing
   which addresses are in use at which times, the session directory is
   an appropriate agent to perform multicast address allocation.  SAP
   (session announcement protocol) is the protocol used by the session
   directory agents [18].
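
   The reasoning above, that knowing which addresses are in use at
   which times helps allocation, can be sketched as follows.  This is
   an assumption-laden illustration rather than the allocation scheme
   of any session directory: the function names, the address pool and
   the first-fit policy are hypothetical, and scoping is ignored.

```python
def overlaps(a_start, a_stop, b_start, b_stop):
    """True if the two half-open time intervals share any instant."""
    return a_start < b_stop and b_start < a_stop


def allocate(pool, announced, start, stop):
    """Pick the first pool address not announced for an overlapping time.

    pool      -- candidate multicast addresses (hypothetical range)
    announced -- (address, start, stop) tuples learned from announcements
    Returns an address, or None if every candidate is busy.
    """
    for candidate in pool:
        busy = any(addr == candidate and overlaps(start, stop, s, e)
                   for addr, s, e in announced)
        if not busy:
            return candidate
    return None
```

   An address whose only announced session ends before the new session
   starts can be reused, which is the optimisation that timing
   information makes possible.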

   This mechanism can also be applied to advertise tightly coupled
   sessions; this only requires that the announcement carry additional
   information about the mechanism to use to join the session.

7.2.  Session Invitation

   Not all sessions are advertised, and even those that are advertised
   may require a mechanism to explicitly invite a user to join a
   session.  Such a mechanism is required regardless of whether the
   session is a lightweight session or a more tightly coupled session,
   although the invitation system must specify the mechanism to be used
   to join the session.

   As users are mobile, it is important that such a mechanism is capable
   of locating and inviting a user in a location independent manner.
   This requires an extra level of indirection (addressing).  The
   invitation mechanism should also provide for alternative responses,
   such as leaving a message or being referred to another user, should
   the invited user be unavailable.
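
   The extra level of indirection and the alternative responses can be
   sketched as a lookup from a location independent user name to the
   user's current host, with fallbacks when no host is known.  This is
   not the behaviour of any particular invitation protocol; the names
   Response and invite and the three response kinds are assumptions
   made for this example.

```python
from dataclasses import dataclass


@dataclass
class Response:
    kind: str    # "ringing", "referred" or "message" (hypothetical kinds)
    target: str  # host being alerted, user referred to, or mailbox owner


def invite(user, locations, referrals):
    """Resolve a location independent name and choose a response.

    locations -- user name -> host where the user is currently reachable
    referrals -- user name -> another user to contact instead
    """
    host = locations.get(user)
    if host is not None:
        # The indirection step: the name was mapped to a current host.
        return Response("ringing", host)
    if user in referrals:
        return Response("referred", referrals[user])
    # Nobody to alert and nobody to refer to: leave a message.
    return Response("message", user)
```

   The caller addresses the invitation to the user rather than to a
   host, so the user can move between hosts without callers needing to
   know.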

   Based on a protocol with many of the properties required [19], a
   session initiation protocol (SIP) is being developed [20].

8.  Security

   There is a temptation to believe that multicast is inherently less
   private than unicast communication since the traffic visits so many
   more places in the network.  In fact, this is not the case except
   with broadcast and prune type multicast routing protocols.  However,
   IP multicast does make it simple for a host to anonymously join a
   multicast group and receive traffic destined to that group without
   the other senders' and receivers' knowledge.  If the application
   requirement (conference policy) is to communicate between some
   defined set of users, then strict privacy can only be enforced in any
   case through adequate end-to-end encryption.

   RTP specifies a standard way to encrypt RTP and RTCP packets using
   private key encryption schemes such as DES [21].  It also specifies a
   standard mechanism to manipulate plain text keys using MD5 [22] so
   that the resulting bit string can be used as a DES key.  This allows
   simple out-of-band mechanisms such as privacy-enhanced mail to be
   used for encryption key exchange.
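
   A minimal sketch of deriving a fixed-length key from a plain text
   key using MD5 is shown below.  The exact canonicalisation and bit
   selection are defined by the RTP specification and are not
   reproduced here: the UTF-8 encoding, the choice of the first eight
   octets of the digest and the name derive_des_key are assumptions
   made for illustration.

```python
import hashlib


def derive_des_key(plain_text_key: str) -> bytes:
    """Map a human-readable key to 8 octets usable as DES key material.

    Assumption: the key is taken from the first 8 octets of the MD5
    digest.  Consult the RTP specification for the normative procedure.
    """
    digest = hashlib.md5(plain_text_key.encode("utf-8")).digest()  # 16 octets
    return digest[:8]


key = derive_des_key("a hypothetical pass phrase")
```

   Because the derivation is deterministic, every participant who
   receives the same plain text key out of band arrives at the same
   bit string without any further message exchange.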

8.1.  Authentication and Key Distribution

   Key distribution is closely tied to authentication.  Conference or
   session directory keys can be securely distributed using public-key
   cryptography on a one-to-one basis (by email, a directory service, or
   by an explicit conference setup mechanism), but this is only as good
   as the certification mechanism used to certify that a key given by a
   user is the correct public key for that user.  Such certification
   mechanisms [23] are not specific to conferencing, and no standard
   mechanisms are currently in use for conferencing purposes other than
   PEM [24].

   At the time of writing, no standard mechanisms for key distribution
   are defined for the conference setup and control protocols in use.

   Even without privacy requirements in the conference policy, strong
   authentication of a user is required if making a network reservation
   results in usage-based billing.

8.2.  Encrypted Session Announcements

   Session Directories can make encrypted session announcements using
   private key encryption, and carry the encryption keys to be used for
   each of the conference media streams in the session.  Whilst this
   does not solve the key distribution problem, it does allow a single
   conference to be announced more than once to more than one key-group,
   where each group holds a different session directory key, so that the
   two groups can be brought together into a single conference without
   having to know each other's keys.
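
   The idea of announcing one conference to several key-groups can be
   sketched as follows.  The cipher used here is a toy XOR keystream
   that stands in for a real private key cipher such as DES and must
   not be used for actual protection; the names toy_cipher and
   announce, the group names and the description contents are all
   hypothetical.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a hash-derived keystream (NOT secure, demo only).

    Applying the function twice with the same key restores the input.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def announce(description: bytes, directory_keys: dict) -> dict:
    """Produce one encrypted announcement per key-group."""
    return {group: toy_cipher(key, description)
            for group, key in directory_keys.items()}


# Hypothetical description carrying the media encryption key.
description = b"conference=seminar; media-key=hypothetical-media-key"
directory_keys = {"group-a": b"directory key A", "group-b": b"directory key B"}
announcements = announce(description, directory_keys)
```

   Each group decrypts only its own copy of the announcement, yet both
   recover the same media key and can therefore take part in the same
   conference.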

9.  Summary

   This document is an attempt to gather together in one place the set
   of assumptions behind the design of the Internet Multimedia
   Conferencing architecture, and the services that are provided to
   support it.

10.  Acknowledgments and Authors' Addresses

   Acknowledgments are due to the End-to-End Research Group, the Int-
   serv, RSVP, MMUSIC and AVT working groups of the IETF, and discussion
   with colleagues at UCL.  The earliest clear exposition of the ideas
   here can be found at and
   was presented at ACM SIGCOMM 1994 in London by Van Jacobson.

           Mark Handley
           (fix me)

           Jon Crowcroft
           Department of Computer Science
           University College London
           Gower Street,
           London WC1E 6BT
           fax +44 171 387 1397

           Carsten Bormann
           Universitaet Bremen
           Postfach 330440
           D-28334 Bremen
           fax +49 421 218-7000


   1.  ITU, Recommendation H.320.

   2.  S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C.-G. Liu, and
       L. Wei, An Architecture for Wide Area Multicast Routing, 24, pp.
       126-135, ACM SIGCOMM, October 1994.

   3.  S. Deering, C. Partridge, and D. Waitzman, "Distance Vector
       Multicast Routing Protocol," RFC 1075, November 1988.

   4.  A. Ballardie, P. Francis, and J. Crowcroft, An Architecture for
       Scalable Inter-Domain Multicast Routing, pp. 85-95, ACM SIGCOMM,

   5.  J. Moy, "Multicast Extensions to OSPF," RFC 1584, March 1994.

   6.  S. Deering, Multicast Routing in Internetworks and Extended LANs,
       pp. 55-64, ACM SIGCOMM, August 1988.

   7.  S. Deering, "Host Extensions for IP Multicasting," RFC 1112,
       August 1989.

   8.  C. Bormann, "Providing integrated services over low-bitrate
       links," Internet-Draft draft-ietf-issll-isslow-02.txt, Work in
       Progress, May 1997.

   9.  H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A
       Transport Protocol for Real-Time Applications," RFC 1889.

   10. V. Jacobson, Congestion Avoidance and Control, ACM SIGCOMM,
       August 1988.

   11. H. Schulzrinne, A. Rao, and R. Lanphier, "Real-Time Stream
       Control Protocol (RTSP)," Internet-Draft draft-ietf-mmusic-
       rtsp-0x.txt, Work in Progress, (fix me).

   12. S. Floyd, V. Jacobson, S. McCanne, C.-G. Liu, and L. Zhang, A
       Reliable Multicast Framework for Light-weight Sessions and
       Application Level Framing, pp. 342-356, ACM SIGCOMM, 1995.

   13. ITU-T, Recommendation T.124 -- Generic Conference Control.

   14. ITU-T, Recommendation H.323 -- Multi-Media Conferences for
       Packet-based Network Environments.

   15. ITU-T, Recommendation H.245.

   16. C. Bormann, J. Ott, and C. Reichert, "Simple Conference Control
       Protocol," Internet-Draft draft-ietf-mmusic-sccp-0x.txt, Work in
       Progress, (fix me).

   17. M. Handley and V. Jacobson, "SDP: Session Description Protocol,"
       Internet-Draft draft-ietf-mmusic-sdp-0x.txt, Work in Progress,
       (fix me).

   18. M. Handley and V. Jacobson, "SAP: Session Announcement Protocol,"
       Internet-Draft draft-ietf-mmusic-sap-0x.txt, Work in Progress,
       (fix me).

   19. H. Schulzrinne, "Personal Mobility for Multimedia Services in the
       Internet," IMDS'96, March 1996.

   20. M. Handley, H. Schulzrinne, and E. Schooler, "SIP: Session
       Initiation Protocol," Internet-Draft draft-ietf-mmusic-
       sip-0x.txt, Work in Progress, (fix me).

   21. National Institute of Standards and Technology (NIST), FIPS
       Publication 46-1: Data Encryption Standard, January 1988.

   22. R. Rivest, "The MD5 Message-Digest Algorithm," RFC 1321, April

   23. CCITT, Recommendation X.509: The Directory -- Authentication
       Framework, 1988.

   24. J. Linn, "Privacy Enhancement for Internet Electronic Mail: Part
       I: Message Encryption and Authentication Procedures," RFC 1421,
       Feb 1993.
