RTCWEB                                                         J. Lennox
Internet-Draft                                                     Vidyo
Intended status: Standards Track                              A. Romanow
Expires: May 3, 2012                                            P. Witty
                                                           Cisco Systems
                                                        October 31, 2011
Real-Time Transport Protocol (RTP) Usage for Telepresence Sessions
draft-lennox-clue-rtp-usage-01
Abstract
This document describes mechanisms and recommended practice for
transmitting the media streams of telepresence sessions using the
Real-Time Transport Protocol (RTP).
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on May 3, 2012.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Lennox, et al. Expires May 3, 2012 [Page 1]
Internet-Draft RTP Usage for Telepresence October 2011
Table of Contents
1. Introduction
2. Terminology
3. Source multiplexing - overview
4. Use Cases
5. Demultiplexing
5.1. Using the SSRC for demultiplexing
5.2. Multiplex ID
5.3. Combined approach
6. Transmission of presentation sources
7. Other considerations
8. Security Considerations
9. IANA Considerations
10. References
10.1. Normative References
10.2. Informative References
Authors' Addresses
1. Introduction
Telepresence systems, of the architecture described by
[I-D.ietf-clue-telepresence-use-cases] and
[I-D.ietf-clue-telepresence-requirements], will send and receive
multiple media streams, where the number of streams in use is
potentially large and asymmetric between endpoints, and streams can
come and go dynamically. These characteristics lead to a number of
architectural design choices which, while still in the scope of
potential architectures envisioned by the Real-Time Transport
Protocol [RFC3550], must be fairly different than those typically
implemented by the current generation of voice or video conferencing
systems. This document makes recommendations about how streams
should be encoded and transmitted in RTP for this telepresence
architecture.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119] and
indicate requirement levels for compliant implementations.
3. Source multiplexing - overview
Telepresence sessions carry many media streams: easily dozens at a
time (given, e.g., a continuous presence screen in a multi-point
conference), potentially out of a possible pool of hundreds.
Furthermore, endpoints will have an asymmetric number of media
streams.
In such an environment the usual model of existing SIP endpoints --
sending zero or one source (in each direction) per RTP session --
doesn't scale, and mapping asymmetric numbers of sources to sessions
is needlessly complex.
Therefore, telepresence systems SHOULD use a single RTP session per
media type, except where there's a need to give sessions different
transport treatment. All sources of the same media type are sent
over this single RTP session. This architecture (known as "source
multiplexing") was defined by [RFC3550], but was rarely used until
its recent adoption by some telepresence systems.
Multiplexing multiple media streams in this way has additional
advantages. It makes going through middle boxes considerably easier,
as it allows Telepresence devices to work through SIP B2BUAs that do
not support multiple media lines of the same media type. It also
simplifies NAT and firewall traversal by allowing an endpoint to deal
with only a single address/port mapping per media type rather than
multiple mappings.
During call setup, a single RTP session is negotiated for each media
type. In SDP, only one media line is negotiated per media type, and
multiple media streams are sent over the same UDP channel negotiated
using that SDP media line.
A number of protocol issues involved in multiplexing RTP streams into
a single session are discussed in
[I-D.westerlund-avtcore-multiplex-architecture] and
[I-D.lennox-rtcweb-rtp-media-type-mux]. In this draft we concentrate
on examining the demultiplexing of RTP streams, in the specific
context of telepresence systems.
A key issue to work out is how a receiver interprets the multiple
streams it receives, and correlates them with the captures it has
requested. In some cases, the CLUE Framework
[I-D.ietf-clue-framework]'s concept of the "capture" maps cleanly to
the RTP concept of an SSRC, but in some cases it does not.
First we will describe the use cases that need to be considered. We
will then examine the two most obvious approaches to demultiplexing,
showing their pros and cons. Finally, we describe a third, combined
alternative.
4. Use Cases
There are three distinct use cases relevant for telepresence systems:
Static stream choice:
In this case, the streams sent over the multiplex are constant over
the complete session. An example is a triple-camera system to MCU in
which left, center and right streams are sent for the duration of the
session.
This case covers an endpoint sending to an endpoint, an endpoint
sending to a multipoint device, and, equivalently, a transcoding
multipoint device sending to an endpoint.
This is illustrated in Figure 1.
,'''''''''''| +-----------Y
| | | |
| +--------+|"""""""""""""""""""""""""""|+--------+ |
| |EndPoint||---------------------------||EndPoint| |
| +--------+|"""""""""""""""""""""""""""|+--------+ |
| | | |
"-----------' "------------
Figure 1: Point to Point Static Streams
Dynamic streams from a finite set:
In this case, the receiver has requested a smaller number of streams
than the number of media sources that are available, and expects the
sender to switch the sources being sent based on criteria chosen by
the sender. (This is called auto-switched in the CLUE Framework
[I-D.ietf-clue-framework].)
An example is a triple-camera system sending to a two-screen system,
in which the sender needs to switch between sending LC -> LR and
sending CR -> LR.
This case covers an endpoint sending to an endpoint, an endpoint
sending to a multipoint device, and a transcoding device sending to
an endpoint.
This is illustrated in Figure 2.
,'''''''''''| +-----------Y
| | |+--------+ |
| +--------+|"""""""""""""""""""""""""""||EndPoint| |
| |EndPoint|| |+--------+_|
| +--------+'''''''''' '''''''''''
| |........
"-----------'
Figure 2: Point to Point Finite Source Streams
Dynamic streams from an infinite set:
This case describes a switched multipoint device to endpoint, in
which the multipoint device can choose to send any streams received
from any other endpoints within the conference to the endpoint.
For example, in an MCU to triple-screen system, the MCU could send
LCR of a triple-camera system -> LCR, or CCC of three single-camera
endpoints -> LCR.
This is illustrated in Figure 3.
+-+--+--+
| |EP| `-.
| +--+ |`.`-.
+-------`. `. `.
`-.`. `-.
`.`-. `-.
`-.`. `-.-------+ +------+
+--+--+---+ `.`.| +---+ ---------------| +--+ |
| |EP| +----.....:=. |MCU| ...............| |EP| |
| +--+ |"""""""""--| +---+ |______________| +--+ |
+---------+"""""""""";'.'.'.'---+ +------+
.'.'.'.'
.'.'.'.'
/ /.'.'
.'.::-'
+--+--+--+ .'.::'
| |EP| .'.::'
| +--+ .::'
+--------.'
Figure 3: Multipoint Infinite Streams
Within any of these cases, every stream within the multiplexed
session MUST have a unique SSRC. The SSRC is chosen at random
[RFC3550], which makes collisions within the conference unlikely (RTP
also detects and resolves any that occur), and contains no meaningful
information.
Any source may choose to restart a stream at any time, resulting in a
new SSRC. For example, a transcoding MCU might, for reasons of load
balancing, transfer an encoder onto a different DSP and throw away
all context of the encoding at that point, sending an RTCP BYE
message for the old SSRC, and picking a new SSRC for the stream when
started on the new DSP.
Because of this possibility of changing the SSRC at any time, all our
use cases can be considered to be the third and most difficult case,
that of dynamic streams from an infinite set. Thus, this is the only
case we will consider.
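The SSRC handling described above can be sketched as follows. This is a minimal illustration only, not part of any specification: the names pick_ssrc and restart_stream are hypothetical, and the set of SSRCs already in use is assumed to be tracked by the application (for instance from received RTP and RTCP).

```python
import secrets


def pick_ssrc(in_use):
    """Return a random 32-bit SSRC that is not already in `in_use`."""
    while True:
        candidate = secrets.randbits(32)
        if candidate not in in_use:
            return candidate


def restart_stream(in_use, old_ssrc):
    """Retire `old_ssrc` and return a fresh SSRC for the restarted stream.

    A real sender would also emit an RTCP BYE for `old_ssrc` here.
    """
    # Pick while the old value is still in the set, so the new SSRC differs.
    new_ssrc = pick_ssrc(in_use)
    in_use.discard(old_ssrc)
    in_use.add(new_ssrc)
    return new_ssrc
```

From the receiver's point of view, the value returned by restart_stream is indistinguishable from the SSRC of a brand new source, which is why every case reduces to the third one.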
5. Demultiplexing
There are two obvious choices of demultiplexing key: the SSRC,
which is guaranteed to be unique for a stream, but conveys no
intrinsic useful information, or an additional multiplex ID tagged on
to media packets. There may be other choices, e.g., payload type
number, which might be appropriate for multiplexing one audio with
one video stream on the same RTP session, but this is not relevant for
the cases discussed here.
For receivers with limited decoding resources, it is particularly
important to ensure that the number of streams which the receiver is
expecting to receive never exceeds the maximum number it has
requested. On a change of stream, the receiver can be expected to
have a one-out, one-in policy, so that the decoder of the stream
currently being decoded is stopped before starting the decoder for
the stream replacing it. The sender should therefore indicate to the
receiver which stream will be replaced upon a stream change.
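As an illustration of the one-out, one-in policy, the sketch below models a receiver with a fixed number of decoders. The class name DecoderPool, the notion of a "slot", and the max_streams parameter are assumptions made for this example; they do not come from the CLUE framework.

```python
class DecoderPool:
    """Tracks which SSRC each decoder slot is decoding (illustrative only)."""

    def __init__(self, max_streams):
        self.max_streams = max_streams
        self.active = {}  # slot identifier -> SSRC currently decoded

    def switch(self, slot, new_ssrc):
        """Stop whatever `slot` was decoding, then start `new_ssrc` on it.

        Returns the SSRC that was replaced, or None if the slot was free.
        """
        if slot not in self.active and len(self.active) >= self.max_streams:
            raise RuntimeError("no free decoder for a new slot")
        old_ssrc = self.active.pop(slot, None)  # one out ...
        self.active[slot] = new_ssrc            # ... one in
        return old_ssrc
```

The switch method can only be called once the receiver knows which slot the new stream replaces, which is the information the sender is asked to indicate.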
5.1. Using the SSRC for demultiplexing
Using the SSRC has the advantage of being included already in each
RTP packet. However, there are some disadvantages to consider.
First, the SSRC needs to be linked to some metadata to associate it
with the capture stream. This is because although it uniquely
identifies a media stream, it does not indicate which of the
requested streams each SSRC is tied to. If more than one media
stream is expected, it is therefore required to send some additional
metadata to indicate the link between the SSRC and the CLUE stream
ID. This is simply a mapping from transmitted SSRC to stream ID,
updated as new SSRCs replace old ones.
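A minimal sketch of such a mapping, assuming the stream IDs are opaque values supplied by the signalling layer, is shown below. The class and method names are hypothetical.

```python
class SsrcMap:
    """Bidirectional SSRC <-> stream ID table (illustrative only)."""

    def __init__(self):
        self.by_ssrc = {}    # SSRC -> stream ID
        self.by_stream = {}  # stream ID -> SSRC

    def bind(self, ssrc, stream_id):
        """Bind `ssrc` to `stream_id`, replacing any previous binding.

        Returns the SSRC previously bound to `stream_id`, or None.
        """
        old_ssrc = self.by_stream.pop(stream_id, None)
        if old_ssrc is not None:
            self.by_ssrc.pop(old_ssrc, None)
        # If this SSRC was feeding another stream ID, drop that binding too.
        old_stream = self.by_ssrc.pop(ssrc, None)
        if old_stream is not None:
            self.by_stream.pop(old_stream, None)
        self.by_ssrc[ssrc] = stream_id
        self.by_stream[stream_id] = ssrc
        return old_ssrc
```

Each call to bind corresponds to one metadata update received from the sender.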
Because of the one-out, one-in codec policy, the receiver must know
in advance of receiving the media stream how to allocate its decoding
resources. Although it could cache incoming media received before it
knows what multiplex stream it applies to, this will require an
unknown amount of storage space (particularly if the metadata is
lost), and could lead to significant latency, after which the
receiver may not find it possible to catch up because of resource
constraints, or else it would require an expensive state refresh,
such as a Full Intra Request (FIR) [RFC5104].
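The caching problem can be made concrete with a bounded buffer. The sketch below is one possible approach under stated assumptions, not a recommendation: the per-SSRC limit of 64 packets is arbitrary, and the names are hypothetical.

```python
from collections import defaultdict, deque


class PendingMedia:
    """Holds packets for SSRCs whose metadata has not yet arrived."""

    def __init__(self, max_packets_per_ssrc=64):
        # A deque with maxlen silently drops the oldest packet when full.
        self.pending = defaultdict(
            lambda: deque(maxlen=max_packets_per_ssrc))

    def hold(self, ssrc, packet):
        self.pending[ssrc].append(packet)

    def release(self, ssrc):
        """Return the held packets for `ssrc`, oldest first, and forget them."""
        return list(self.pending.pop(ssrc, ()))
```

Note that once the bound is reached the oldest packets are discarded, and these are likely to belong to the initial intra frame, so the receiver may still need to request a state refresh. The number of distinct unknown SSRCs is also left unbounded in this sketch.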
In addition, a receiver will have to store lookup tables of SSRCs to
stream IDs/decoders etc. Because of the large SSRC space (32 bits),
this will have to be in the form of something like a hash map, and a
lookup will have to be performed for every incoming packet, which may
prove costly on the receiver side.
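The per-packet work amounts to reading the SSRC field from the fixed RTP header (bytes 8 to 11 in [RFC3550]) and performing one hash lookup. A minimal sketch follows; the function name rtp_ssrc and the table contents are illustrative assumptions.

```python
import struct


def rtp_ssrc(packet: bytes) -> int:
    """Extract the SSRC from the 12-byte fixed RTP header."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    if packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    # SSRC is a 32-bit big-endian integer at byte offset 8.
    return struct.unpack_from("!I", packet, 8)[0]


# Hypothetical lookup table: SSRC -> stream ID (or decoder handle).
streams = {0xDEADBEEF: "left"}
header = struct.pack("!BBHII", 0x80, 96, 1, 0, 0xDEADBEEF)
stream_id = streams.get(rtp_ssrc(header))  # None means "unknown SSRC"
```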
Consider the choices for where to put the metadata. The metadata
could be sent in the CLUE messaging. The use of a reliable transport
means that it can be sure that the metadata will not be lost, but if
this reliability is achieved through retransmission, the time taken
for the metadata to reach all receivers (particularly in a very large
scale conference, e.g., with thousands of users) could result in very
poor switching times, providing a bad user experience.
A second option for sending the metadata is in RTCP, for instance as
a new SDES item. This is likely to follow the same path as media,
and therefore if the metadata is sent slightly in advance of the
media, it can be expected to be received in advance of the media.
However, because RTCP is lossy, the metadata may not be received for
some time, resulting in the receiver of the media not knowing how to
route the received media. A system of acks and retransmissions could
mitigate this, but this results in the same high switching latency
behaviour as discussed for using CLUE as a transport for the
metadata.
5.2. Multiplex ID
The second option is to tag each media packet with an RTP header
extension [RFC5285] carrying a multiplex ID. This means that a
receiver immediately knows how to interpret received media, even when
an unknown SSRC is seen. As long as the media carries a known
multiplex ID, it can be assumed that this media stream will replace
the stream currently being received with that multiplex ID.
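To illustrate, the sketch below reads a multiplex ID from a one-byte-header extension block as defined in [RFC5285]. Two things are assumptions of this example and not defined by this document: the extension identifier (MUX_EXT_ID, which in practice would be negotiated in signalling) and the encoding of the multiplex ID as an unsigned big-endian integer.

```python
import struct

MUX_EXT_ID = 1  # hypothetical; the real value would be negotiated


def find_mux_id(packet: bytes, ext_id: int = MUX_EXT_ID):
    """Return the multiplex ID carried in `packet`, or None if absent."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        return None
    if not packet[0] & 0x10:               # X bit clear: no extension
        return None
    offset = 12 + 4 * (packet[0] & 0x0F)   # skip the CSRC list
    if len(packet) < offset + 4:
        return None
    profile, words = struct.unpack_from("!HH", packet, offset)
    if profile != 0xBEDE:                  # not the one-byte-header form
        return None
    pos, end = offset + 4, offset + 4 + 4 * words
    if end > len(packet):
        return None
    while pos < end:
        first = packet[pos]
        if first == 0:                     # padding byte
            pos += 1
            continue
        elem_id, length = first >> 4, (first & 0x0F) + 1
        if elem_id == 15 or pos + 1 + length > end:
            return None                    # reserved ID or truncated element
        if elem_id == ext_id:
            return int.from_bytes(packet[pos + 1:pos + 1 + length], "big")
        pos += 1 + length
    return None
```

A receiver would call find_mux_id on each incoming packet and hand the packet to the decoder currently associated with the returned ID.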
This gives significant advantages to switching latency, as a switch
between sources can be achieved without any form of negotiation with
the receiver. There is no chance of receiving media without knowing
to which switched capture it belongs.
Although multiplex IDs may be chosen by either the sender or
receiver, the multiplex ID can, if chosen by the receiver, contain
semantic information relevant to the receiver. For example, on a
large multipoint device with many DSPs, the receiver chosen multiplex
ID could identify the DSP to which the media should be sent, and
possibly contain routing information to the DSP.
However, there are also significant disadvantages in using a
multiplex ID. It introduces additional processing costs.
Multiplex IDs are scoped only within one hop (i.e., within a cascaded
conference a multiplex ID that is used from the source to the first
MCU is not meaningful between two MCUs, or between an MCU and a
receiver), and so they may need to be modified at every stage.
To add or modify the multiplex ID is an expensive operation,
particularly if SRTP is used to authenticate the packet.
Modification to the contents of the RTP header requires a
reauthentication of the complete packet, and this could prove to be a
limiting factor in the throughput of a multipoint device. However,
it may be that reauthentication is required in any case due to the
nature of SDP. SDP permits the receiver to choose payload types,
meaning that a similar option to modify the payload type in the
packet header will cause the need to reauthenticate.
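The reason any header rewrite forces reauthentication can be seen in a simplified model of packet authentication. The sketch below is not SRTP: it omits key derivation and encryption, and simply computes an HMAC-SHA1 tag truncated to 80 bits over the whole packet plus a rollover counter, in the style of the default SRTP authentication transform. The function names are hypothetical.

```python
import hashlib
import hmac


def srtp_style_tag(auth_key: bytes, packet: bytes, roc: int = 0) -> bytes:
    """HMAC-SHA1 over header, payload and rollover counter, 80-bit tag."""
    mac = hmac.new(auth_key, bytes(packet) + roc.to_bytes(4, "big"),
                   hashlib.sha1)
    return mac.digest()[:10]


def rewrite_payload_type(packet: bytes, new_pt: int) -> bytes:
    """Return a copy of `packet` with the 7-bit payload type replaced."""
    out = bytearray(packet)
    out[1] = (out[1] & 0x80) | (new_pt & 0x7F)  # keep the marker bit
    return bytes(out)
```

Because the tag covers the header, a forwarding device that calls rewrite_payload_type (or that adds or changes a multiplex ID) has to recompute the tag over the entire packet before sending it on.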
5.3. Combined approach
The two major flaws of the above methods (poor switching performance
of SSRC demultiplexing, high computational cost of multiplex IDs on
switching nodes) can be mitigated with a combined method. In this
method, the multiplex ID can
be included in packets belonging to the first frame of media
(typically an IDR/GDR), but following this only the SSRC is used to
demultiplex.
Because the IDR is already required to be received before any further
frames can be decoded, this does not create any further restrictions
on the media stream -- existing mechanisms to ensure the reliability
of an IDR frame can be used. It does introduce extra complexity on
the demultiplex side, requiring a two stage process of inspecting the
packet for a multiplex ID, and, if it is not present, looking for the
SSRC in a table of known streams.
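The two-stage process can be sketched as follows. This is an illustration under assumptions: the SSRC and the optional multiplex ID are taken to have been parsed from the packet already, and the class name CombinedDemux is hypothetical.

```python
class CombinedDemux:
    """Multiplex ID first, SSRC table second (illustrative only)."""

    def __init__(self, known_mux_ids):
        self.known = set(known_mux_ids)
        self.ssrc_to_mux = {}  # learned from tagged packets

    def route(self, ssrc, mux_id=None):
        """Return the multiplex ID this packet belongs to, or None."""
        if mux_id is not None:
            # Stage 1: the packet is tagged, so learn or refresh the binding.
            if mux_id not in self.known:
                return None
            stale = [s for s, m in self.ssrc_to_mux.items()
                     if m == mux_id and s != ssrc]
            for s in stale:
                del self.ssrc_to_mux[s]    # the old source is replaced
            self.ssrc_to_mux[ssrc] = mux_id
            return mux_id
        # Stage 2: untagged packet, fall back to the learned SSRC table.
        return self.ssrc_to_mux.get(ssrc)
```

An untagged packet from an SSRC that has never been seen with a multiplex ID returns None; what the receiver does in that case (drop, buffer, or request a refresh) is left open here.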
The solution is somewhat more complex if it is possible for a source
to change which switched capture is carrying it: for instance, in the
second example in Section 4, when the sender switches from sending LC
-> LR to sending CR -> LR, the sender's "C" source moves from the
receiver's "R" multiplex ID to the receiver's "L" multiplex ID. For
reasons of coding efficiency, it is desirable in this case to avoid
sending a new IDR frame for the "C" stream, if the receiver's
architecture allows the same decoding state to be used for its
various multiplex IDs. In this case, the multiplex ID could be sent
for a small number of frames after the source's multiplex ID has
changed.
6. Transmission of presentation sources
Most existing videoconferencing systems use separate RTP sessions for
main and presentation video sources, distinguished by the SDP content
attribute [RFC4796]. The use of the CLUE telepresence framework
[I-D.ietf-clue-framework] to describe multiplexed streams can remove
this need. However, it could still be useful in some cases to make
the distinction between presentation and main video sources at the
transport layer. In particular, if different treatment is desired at
the transport layer or below (e.g. different VLANs, different QoS
characteristics, etc.) for main video vs. presentation, the use of
multiple RTP sessions (m lines with different transport addresses)
would be necessary.
7. Other considerations
As currently defined, H.281 Far-End Camera Control
[ITU.H281.1994][RFC4573] does not, in SIP-based videoconferences,
support selecting among multiple remote sources (though it does in
H.323 conferences controlled by an MCU, which can assign terminal IDs
to sources). When RTP sessions contain multiple sources, this
limitation becomes pressing. (However, this problem does not appear
to be in scope of the CLUE working group.)
8. Security Considerations
The security considerations for multiplexed RTP do not seem to be
different than for non-multiplexed RTP.
9. IANA Considerations
This document makes no requests of IANA.
Note to RFC Editor: please remove this section before publication as
an RFC.
10. References
10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003.
10.2. Informative References
[I-D.ietf-clue-framework]
Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino,
"Framework for Telepresence Multi-Streams",
draft-ietf-clue-framework-00 (work in progress),
October 2011.
[I-D.ietf-clue-telepresence-requirements]
Romanow, A. and S. Botzko, "Requirements for Telepresence
Multi-Streams",
draft-ietf-clue-telepresence-requirements-01 (work in
progress), October 2011.
[I-D.ietf-clue-telepresence-use-cases]
Romanow, A., Botzko, S., Duckworth, M., Even, R., and I.
Communications, "Use Cases for Telepresence Multi-
streams", draft-ietf-clue-telepresence-use-cases-01 (work
in progress), July 2011.
[I-D.lennox-rtcweb-rtp-media-type-mux]
Lennox, J. and J. Rosenberg, "Multiplexing Multiple Media
Types In a Single Real-Time Transport Protocol (RTP)
Session", draft-lennox-rtcweb-rtp-media-type-mux-00 (work
in progress), October 2011.
[I-D.westerlund-avtcore-multiplex-architecture]
Westerlund, M., Burman, B., and C. Perkins, "RTP
Multiplexing Architecture",
draft-westerlund-avtcore-multiplex-architecture-00 (work
in progress), October 2011.
[ITU.H281.1994]
International Telecommunications Union, "A far end camera
control protocol for videoconferences using H.224",
ITU-T Recommendation H.281, November 1994.
[RFC4573] Even, R. and A. Lochbaum, "MIME Type Registration for RTP
Payload Format for H.224", RFC 4573, July 2006.
[RFC4796] Hautakorpi, J. and G. Camarillo, "The Session Description
Protocol (SDP) Content Attribute", RFC 4796,
February 2007.
[RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
"Codec Control Messages in the RTP Audio-Visual Profile
with Feedback (AVPF)", RFC 5104, February 2008.
[RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP
Header Extensions", RFC 5285, July 2008.
Authors' Addresses
Jonathan Lennox
Vidyo, Inc.
433 Hackensack Avenue
Seventh Floor
Hackensack, NJ 07601
US
Email: jonathan@vidyo.com
Allyn Romanow
Cisco Systems
San Jose, CA 95134
USA
Email: allyn@cisco.com
Paul Witty
Cisco Systems
Langley, England
UK
Email: pauwitty@cisco.com