Network Working Group Magnus Westerlund
INTERNET-DRAFT Ericsson
Expires: March 2007 Stephan Wenger
Nokia
September 17, 2006
RTP Topologies
<draft-ietf-avt-topologies-01.txt>
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Copyright Notice
Copyright (C) The Internet Society (2006).
Abstract
This document discusses multi-endpoint topologies commonly used in
RTP-based environments. In particular, centralized topologies
commonly employed in the video conferencing industry are mapped to
the RTP terminology.
Wenger, et al. [Page 1]
INTERNET-DRAFT RTP Topologies September 17, 2006
TABLE OF CONTENTS
Status of this Memo
Copyright Notice
Abstract
TABLE OF CONTENTS
1. Introduction
2. Definitions
2.1. Glossary
2.2. Terminology
2.3. Topologies
2.3.1. Point to Point
2.3.2. Point to Multi-point using Multicast
2.3.3. Point to Multipoint using the RFC 3550 translator
2.3.4. Point to Multipoint using the RFC 3550 mixer model
2.3.5. Point to Multipoint using video switching MCU
2.3.6. Point to Multipoint using RTCP-terminating MCU
2.3.7. Combining Topologies
3. Security Considerations
4. IANA Considerations
5. Acknowledgements
6. References
6.1. Normative references
6.2. Informative references
7. Authors' Addresses
8. List of Changes relative to previous drafts
RFC Editor Considerations
Wenger, et al. Informational [Page 2]
1. Introduction
When working on the Codec Control Messages [CCM], we noticed
considerable confusion in the community with respect to terms such
as MCU, mixer, and translator. In the process of writing, we
became increasingly unsure of our own understanding, and therefore
added what became the core of this draft to the CCM draft. Later,
we found that this information has value of its own, and moved it
from the CCM draft into the present memo.
It could be argued that this document clarifies and explains
sections of the RTP spec [RFC3550], and is therefore of
informational nature. In this case, the present memo may end up
as an informational RFC.
When the Audio-Visual Profile with Feedback (AVPF) [AVPF] was
developed, the main emphasis lay in the efficient support of
point-to-point and small multipoint scenarios without centralized
multipoint control. However, in practice, many small multipoint
conferences operate utilizing devices known as Multipoint Control
Units (MCUs). MCUs comprise mixers and translators (in RTP
[RFC3550] terminology), but also signalling support.
2. Definitions
2.1. Glossary
ASM - Any Source Multicast
AVPF - The Extended RTP Profile for RTCP-based Feedback
MCU - Multipoint Control Unit
PtM - Point to Multipoint
PtP - Point to Point
2.2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119 [RFC2119].
2.3. Topologies
This subsection defines several basic topologies that are relevant
for codec control. The first four relate to the RTP system model
utilizing multicast and/or unicast, as envisioned in RFC 3550.
The last two topologies, in contrast, describe the widely deployed
system model as used in most H.323 video conferences, where both
the media streams and the RTCP control traffic terminate at the
MCU. More topologies can be constructed by combining any of the
models; see Section 2.3.7.
The topologies may be referenced by a shortcut name, indicated by
the prefix "Topo-".
2.3.1. Point to Point
Shortcut name: Topo-Point-to-Point
The Point to Point (PtP) topology (Figure 1) consists of two end-
points with unicast capabilities between them. Both RTP and RTCP
traffic are conveyed endpoint to endpoint using unicast traffic
only (even if this unicast traffic happens to be conveyed over an
IP-multicast address).
+---+ +---+
| A |<------->| B |
+---+ +---+
Figure 1 - Point to Point
The main property of this topology is that A sends to B and only
B, while B sends to A and only A. This avoids all complexities of
handling multiple endpoints and combining the requirements from
them. Do note that an endpoint may still use multiple RTP
Synchronization Sources (SSRCs) in an RTP session.
2.3.2. Point to Multi-point using Multicast
Shortcut name: Topo-Multicast
               +-----+
    +---+     /       \    +---+
    | A |----/         \---| B |
    +---+   /   Multi-  \  +---+
           +    Cast     +
    +---+   \  Network  /  +---+
    | C |----\         /---| D |
    +---+     \       /    +---+
               +-----+
Figure 2 - Point to Multipoint using Multicast
We define Point to Multipoint (PtM) using multicast topology as a
transmission model in which traffic from any participant reaches
all the other participants, except in cases such as the following:
o packet loss occurs, or
o a participant does not wish to receive the traffic
for a specific media stream, and therefore has not
subscribed to the IP multicast group in question.
In this sense, "traffic" encompasses both RTP and RTCP traffic.
The number of participants can be between one and many -- as RTP
and RTCP scale to very large multicast groups (the theoretical
limit of RTP is approximately two billion participants).
This draft is primarily interested in the subset of multicast
sessions where the number of participants in the multicast group
allows the participants to use early or immediate feedback as
defined in AVPF. This document refers to those groups as
"small multicast groups".
2.3.3. Point to Multipoint using the RFC 3550 translator
Shortcut name: Topo-Translator
Two main categories of Translators can be distinguished.
Transport Translators do not modify the media stream itself, but
are concerned with transport parameters. Transport parameters, in
the sense of this section, comprise the transport addresses to
bridge different domains, and the media packetization to allow
other transport protocols to be interconnected to a session
(gateways).
Media Translators, in contrast, modify the media stream itself.
This process is commonly known as transcoding. The modification
of the media stream can be as small as removing parts of the
stream, and can go all the way to a full transcoding utilizing a
different media codec. Media translators are commonly used to
connect entities without a common interoperability point.
Stand-alone Media Translators are rare. Most commonly, a
combination of Transport and Media Translators is used to
translate both the media stream and the transport aspects of a
stream between two transport domains (or clouds).
Both Translator types share common attributes that separate them
from mixers. For each media stream that the translator receives,
it generates an individual stream in the other domain. However, a
translator maintains a complete view of all existing participants
between both domains. Therefore, the SSRC space is shared across
the two domains.
The RTCP translation process can be trivial, for example when
Transport translators just need to adjust IP addresses, and can be
quite complex in the case of media translators. See Section 7.2
of [RFC3550].
               +-----+
    +---+     /       \     +------------+      +---+
    | A |<---/         \    |            |<---->| B |
    +---+   /   Multi-  \   |            |      +---+
           +    Cast     +->| Translator |
    +---+   \  Network  /   |            |      +---+
    | C |<---\         /    |            |<---->| D |
    +---+     \       /     +------------+      +---+
               +-----+
Figure 3 - Point to Multipoint using a Translator
Figure 3 depicts an example of a Transport Translator performing
at least IP address translation. It allows participants B and D,
which are not multicast capable, to take part in a multicast
session by having the translator forward their unicast traffic to
the multicast addresses in use, and vice versa. The translator
must also forward B's traffic to D and vice versa, to provide each
of B and D with a complete view of the session.
If B were behind a limited link, the translator may perform media
transcoding to allow the traffic received from the other
participants to reach B without overloading the link.
When in the example depicted in Figure 3 the translator acts only
as a Transport Translator, then the RTCP traffic can simply be
forwarded, similar to the media traffic. However, when media
translation occurs, the translator's task becomes substantially
more complex even with respect to the RTCP traffic. In this case,
the translator needs to rewrite B's RTCP receiver reports before
forwarding them to D and the multicast network. The rewriting is
needed as the stream received by B is not the same stream as the
other participants receive. For example, the number of packets
transmitted to B may be lower than what D receives, due to the
different media format. Therefore, if the receiver reports were
forwarded without changes, the extended highest sequence number
would indicate that B was substantially behind in reception --
while most likely it would not be. Therefore, the translator
must translate that number to a corresponding sequence number for
the stream the translator received. Similar arguments can be made
for most other fields in the RTCP receiver reports.
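As a rough illustration of the sequence number translation
described above, the following Python sketch maps the extended
highest sequence number reported by B back into the sequence
number space of the stream the translator received. It assumes a
hypothetical media translator that records, for every packet it
sends to B, the extended sequence number of the last received
packet whose media that packet contains. The class and method
names are illustrative only and are not defined by RFC 3550.

```python
import bisect


class SeqMap:
    """Hypothetical per-stream bookkeeping kept by a media translator."""

    def __init__(self):
        self._out = []  # extended sequence numbers sent towards B (ascending)
        self._src = []  # last received (source) sequence number covered by each

    def record(self, out_seq, src_seq):
        # Assumed to be called once, in order, for every packet sent to B.
        self._out.append(out_seq)
        self._src.append(src_seq)

    def to_source(self, out_seq):
        # Locate the newest recorded outgoing packet that is <= out_seq.
        i = bisect.bisect_right(self._out, out_seq) - 1
        if i < 0:
            return None  # nothing recorded at or before this sequence number
        return self._src[i]


# Example: the translator merges two received packets into each packet for B.
seq_map = SeqMap()
seq_map.record(1, 2)
seq_map.record(2, 4)
seq_map.record(3, 6)
translated = seq_map.to_source(2)  # value to place in the rewritten report
```

The remaining receiver report fields (for example the loss
counters and jitter) would need comparable handling, which this
sketch does not attempt.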
As specified in Section 7.1 of [RFC3550], the SSRC space is common
for all participants in the session, independent of which side
of the translator they are on. Thus, it is the responsibility of
the participants to run SSRC collision detection, and the SSRC is
a field that the translator should not change.
    +---+      +------------+      +---+
    | A |<---->| Multipoint |<---->| B |
    +---+      |  Control   |      +---+
               |    Unit    |
    +---+      |   (MCU)    |      +---+
    | C |<---->|            |<---->| D |
    +---+      +------------+      +---+
Figure 4 - MCU with RTP Translator (relay) with only unicast links
A common MCU scenario is the one depicted in Figure 4. Herein,
the MCU connects multiple users of a conference through unicast.
This can be implemented using a very simple transport translator,
which could be called a relay. The relay forwards all traffic it
receives, both RTP and RTCP, to all other participants. In doing
so, a multicast network is emulated without relying on a multicast
capable network structure.
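The forwarding rule of such a relay can be sketched in a few lines
of Python. This is a minimal sketch under stated assumptions: the
function name, the use of opaque participant identifiers, and the
absence of any socket handling are choices made for illustration,
not a description of any deployed MCU.

```python
def relay(packet, sender, participants):
    """Return (destination, packet) pairs for everyone except the sender.

    The packet is treated as opaque bytes and is not modified, so the
    SSRC chosen by the original sender is preserved. RTP and RTCP
    packets are handled identically.
    """
    return [(dest, packet) for dest in participants if dest != sender]


# Example: a packet from A is copied, unchanged, to B, C, and D.
copies = relay(b"rtp-or-rtcp-bytes", "A", ["A", "B", "C", "D"])
```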
A translator normally does not use an SSRC of its own, and is not
visible as an active participant in the session. However, it may
act as a media receiver, and thus have an SSRC and use RTCP to
report reception statistics. This behavior should only be used
when it is really desirable to have this feedback, i.e., when the
translator acts as a special type of quality monitor.
It also needs to be noted that the translator, in some cases, may
act on behalf of the "real" source and respond to codec control
messages in its capacity as media translator. This, for example,
may occur if a receiver requests a bandwidth reduction, and the
media translator has not detected any congestion or other reasons
for bandwidth reduction between the media source and itself. In
that case, a translator should be able to react to codec control
messages, as it is capable of fulfilling the request on behalf of
the media sender. If it did not react to codec control messages,
and therefore could not fulfill the request, the media quality in
the media sender's domain would suffer.
2.3.4. Point to Multipoint using the RFC 3550 mixer model
Shortcut name: Topo-Mixer
A mixer is a middlebox that aggregates multiple RTP streams that
are part of a session, by mixing the media data and generating a
new RTP stream. One common application for a mixer is to allow a
participant to receive a session with a reduced amount of
resources.
               +-----+
    +---+     /       \     +-----------+      +---+
    | A |<---/         \    |           |<---->| B |
    +---+   /   Multi-  \   |           |      +---+
           +    Cast     +->|   Mixer   |
    +---+   \  Network  /   |           |      +---+
    | C |<---\         /    |           |<---->| D |
    +---+     \       /     +-----------+      +---+
               +-----+
Figure 5 - Point to Multipoint using RFC 3550 mixer model
A mixer can be viewed as a device terminating the media streams
received from other session participants. Using the media data
from the received media streams, a mixer generates a media stream
that is sent to the session participant.
The content that the mixer provides is the mixed aggregate of what
the mixer receives from the PtP or PtM links, which are part of
the same conference session.
The mixer is the content source, as it mixes the content (often in
the uncompressed domain) and then encodes it for transmission to a
participant. The CC and CSRC fields in the RTP header are used to
indicate the contributors to the newly generated stream. The
SSRCs of the to-be-mixed streams on the mixer input appear as the
CSRCs at the mixer output. That output stream uses a new SSRC
that identifies the mixer. The CSRCs are forwarded between the two
domains to allow for loop detection and identification of sources
that are part of the global session. Note that Section 7.1 of RFC
3550 requires the SSRC space to be shared between domains for
these reasons.
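To make the SSRC-to-CSRC relationship concrete, the following
Python sketch builds the RTP header a mixer could place on an
outgoing packet, using the fixed header layout of RFC 3550
(version, CC, payload type, sequence number, timestamp, SSRC,
followed by the CSRC list). The function name and the example
identifier values are assumptions made for illustration; padding,
header extensions, and the marker bit are left at zero.

```python
import struct


def mixer_header(mixer_ssrc, contributing_ssrcs, seq, timestamp, payload_type):
    """Build an RTP header whose CSRC list names the mixed input streams."""
    csrcs = list(contributing_ssrcs)[:15]  # CC is a 4-bit field: at most 15
    byte0 = (2 << 6) | len(csrcs)          # V=2, P=0, X=0, CC
    byte1 = payload_type & 0x7F            # M=0, PT
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, mixer_ssrc)
    # The SSRCs of the mixer's inputs become the CSRCs of its output.
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


# Example: a mixer with SSRC 0x11111111 mixes two input streams.
header = mixer_header(0x11111111, [0xAAAAAAAA, 0xBBBBBBBB],
                      seq=1, timestamp=160, payload_type=0)
```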
The mixer is responsible for generating RTCP packets in accordance
with its role. It is a receiver and should therefore send
reception reports for the media streams it receives. As a media
sender itself, it should also generate Sender Reports (SRs) for
the media streams it sends. The content of the SRs created by the
mixer may or may not take into account the situation on its
receiving side. Similarly, the content of the Receiver Reports
(RRs) created by the mixer may or may not be based on the
situation on the mixer's sending side.
This is left open to the implementation. As specified in Section
7.3 of RFC 3550, a mixer must not forward RTCP unaltered between
the two domains.
The mixer depicted in Figure 5 has three domains that need to be
separated: the multicast network, participant B, and participant
D. The mixer produces different mixed streams to B and D, as the
one to B may contain D's media and vice versa. However, the mixer
needs only one SSRC in each domain, which it uses both as the
receiving entity and as the transmitter of mixed content.
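The reason the streams to B and D differ can be illustrated with a
small Python sketch of audio mixing in the uncompressed domain. It
assumes, purely for illustration, that every source delivers a
frame of 16-bit PCM samples of equal length and that mixing is a
clipped sample-wise sum; real mixers may weight, select, or
normalize inputs differently.

```python
def mix_for(receiver, frames):
    """Mix all frames except the receiver's own.

    frames maps a source name to a list of 16-bit PCM samples
    (hypothetical representation chosen for this sketch).
    """
    others = [samples for src, samples in frames.items() if src != receiver]
    if not others:
        return []
    # Sum sample-wise and clip to the 16-bit signed range.
    return [max(-32768, min(32767, sum(column))) for column in zip(*others)]


# Example: B hears A and D, while D hears A and B.
frames = {"A": [100, 200], "B": [1, 2], "D": [10, 20]}
to_b = mix_for("B", frames)
to_d = mix_for("D", frames)
```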
In the multicast domain, the mixer does not need to provide a
mixed view of the other domains and will commonly only forward the
media from B and D into the multicast network using B's and D's
SSRC.
The mixer is responsible for receiving the codec control messages
and handling them appropriately. The definition of "appropriate"
depends on the message itself and the context. In some cases, the
reception of a codec control message may result in the generation
and transmission of codec control messages by the mixer to the
participants in the other domain. In other cases, a message is
handled by the mixer itself and therefore not forwarded to any
other domains.
It should be noted that this form of mixing technology is not
widely deployed. Most multipoint video conferences used today
employ one of the models discussed in the next sections.
When the multicast network in Figure 5 (to the left of the mixer)
is replaced with individual unicast links as depicted in Figure 6,
the mixer model is very similar to the one discussed in Section
2.3.6 below.
    +---+      +------------+      +---+
    | A |<---->| Multipoint |<---->| B |
    +---+      |  Control   |      +---+
               |    Unit    |
    +---+      |   (MCU)    |      +---+
    | C |<---->|            |<---->| D |
    +---+      +------------+      +---+
Figure 6 - RTP Mixer with only unicast links
2.3.5. Point to Multipoint using video switching MCU
Shortcut name: Topo-Video-switch-MCU
    +---+      +------------+      +---+
    | A |------| Multipoint |------| B |
    +---+      |  Control   |      +---+
               |    Unit    |
    +---+      |   (MCU)    |      +---+
    | C |------|            |------| D |
    +---+      +------------+      +---+
Figure 7 - Point to Multipoint using relaying MCU
This PtM topology is still deployed today, although the
RTCP-terminating MCUs discussed in the next section are perhaps
more common. This topology, as well as the following one,
reflects today's lack of wide availability of IP multicast
technologies, as well as the simplicity of content switching when
compared to content mixing. The technology is commonly
implemented in what is known as "Video Switching MCUs".
A video switching MCU forwards to a participant a single media
stream, selected from the available streams. The criteria for
selection are often based on voice activity in the audio-visual
conference, but other conference management mechanisms (like
presentation mode or explicit floor control) are known to exist as
well.
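A voice-activity-based selection rule can be sketched as follows.
This is a minimal Python sketch under stated assumptions: the
audio level values, their units, and the hysteresis margin that
keeps the MCU from switching on small level differences are all
hypothetical and are not taken from any specification.

```python
def select_stream(audio_levels, current=None, hysteresis=3.0):
    """Pick the stream to forward, given recent audio levels per SSRC.

    audio_levels maps an SSRC to a level where a higher value means
    louder (an assumed convention). The currently forwarded stream is
    kept unless another one is louder by at least `hysteresis`.
    """
    if not audio_levels:
        return current
    loudest = max(audio_levels, key=audio_levels.get)
    if (current in audio_levels
            and audio_levels[loudest] - audio_levels[current] < hysteresis):
        return current
    return loudest


# Example: SSRC 1 stays selected until SSRC 2 is clearly louder.
kept = select_stream({1: 10.0, 2: 12.0}, current=1)
switched = select_stream({1: 10.0, 2: 20.0}, current=1)
```

Presentation mode or explicit floor control would replace this
rule with a selection driven by conference management rather than
by audio levels.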
The video switching MCU may also perform media translation to
modify the content in bit-rate, encoding, or resolution. However,
it may still indicate the original sender of the content through
the SSRC. In this case, the values of the CC and CSRC fields are
retained.
If the MCU does not terminate RTP, the RTCP Sender Reports are
forwarded for the currently selected sender. All RTCP receiver
reports are forwarded freely between the participants. In
addition, the MCU may
also originate RTCP control traffic in order to control the
session and/or report on status from its viewpoint.
The video switching MCU has mostly the attributes of a translator.
However, its stream selection is a mixing behaviour. This
behaviour has some RTP and RTCP issues associated with it. The
suppression of all but one media stream means that most
participants see only a subset of the sent media streams at any
given time, often a single stream per conference. Therefore, RTCP
receiver reports only report on these streams. In consequence, the
media senders that are not currently forwarded receive a view of
the session that indicates their media streams disappearing
somewhere en route. This makes the use of RTCP for congestion
control very problematic. To avoid these issues, the MCU needs to
modify the RTCP RRs.
2.3.6. Point to Multipoint using RTCP-terminating MCU
Shortcut name: Topo-RTCP-terminating-MCU
    +---+      +------------+      +---+
    | A |<---->| Multipoint |<---->| B |
    +---+      |  Control   |      +---+
               |    Unit    |
    +---+      |   (MCU)    |      +---+
    | C |<---->|            |<---->| D |
    +---+      +------------+      +---+
Figure 8 - Point to Multipoint using content modifying MCU
In this PtM scenario, each participant runs an RTP point-to-point
session between itself and the MCU. This is the most widely
deployed topology. The content that the MCU provides to each
participant is either:
a) A selection of the content received from the other
participants, or
b) The mixed aggregate of what the MCU receives from the other
PtP links, which are part of the same conference session.
In case a) the MCU may modify the content in bit-rate, encoding,
or resolution. No explicit RTP mechanism is used to establish the
relationship between the original media sender and the version the
MCU sends. In other words, the outgoing session typically uses a
different SSRC, and may well use a different PT, even if this
different PT happens to be mapped to the same media type. (This
is the definition of this topology and distinguishes it from the
topologies previously discussed.)
In case b) the MCU is the content source as it mixes the content
and then encodes it for transmission to a participant. The
participants whose content is included in the aggregated content
are not indicated through any explicit RTP mechanism. For example,
regardless of the number of streams that are aggregated, in the
MCU-generated streams CC is zero and therefore no CSRC fields are
present (this is true for most shipping MCUs). The participants
contributing to the mix are reported using a signalling mechanism
such as the conference event package in SIP.
The MCU is responsible for receiving the codec control messages
and handling them appropriately. In some cases, the reception of a
codec control message may result in the generation and
transmission of codec control messages by the MCU to some or all
of the other participants.
An MCU may transparently relay some codec control messages and
intercept, modify, and (when appropriate) generate codec control
messages of its own and transmit them to the media senders.
The main feature that sets this topology apart from what RFC 3550
describes is the lack of an explicit RTP level indication of all
participants. If one were using the mechanisms available in RTP
and RTCP to signal this explicitly, the topology would follow the
approach of an RTP mixer. The lack of explicit indication has at
least the following potential problems:
1) Loop detection cannot be performed on the RTP level. When
carelessly connecting two misconfigured MCUs, a loop could be
generated.
2) There is no information about active media senders available
in the RTP packet. As this information is missing, receivers
cannot use it. This also deprives the participants' clients of
machine-usable information about who is actively sending,
which prevents clients from, for example, indicating the
currently active speakers in user interfaces. This
information is, however, known in the signaling layer.
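The first problem can be illustrated with a simplified Python
sketch of the check that an RTP mixer could apply to incoming
packets: it looks for its own SSRC in the SSRC field or in the
CSRC list. The function names are hypothetical, and the check is a
deliberate simplification of the collision and loop detection
rules in Section 8.2 of RFC 3550, which also take source transport
addresses into account.

```python
import struct


def identifiers(packet):
    """Return (ssrc, [csrc, ...]) parsed from a raw RTP packet."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    cc = packet[0] & 0x0F  # CSRC count: low four bits of the first octet
    if len(packet) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    ssrc = struct.unpack_from("!I", packet, 8)[0]
    csrcs = list(struct.unpack_from("!%dI" % cc, packet, 12))
    return ssrc, csrcs


def looks_like_loop(packet, own_ssrc):
    """True if our own identifier comes back in the SSRC or CSRC fields."""
    ssrc, csrcs = identifiers(packet)
    return ssrc == own_ssrc or own_ssrc in csrcs


# Example: a packet from SSRC 0x22222222 that lists us as a contributor.
mixed = bytes.fromhex("8100000100000000" "22222222" "11111111")
loop = looks_like_loop(mixed, own_ssrc=0x11111111)
```

When the MCU in the other domain always sends CC equal to zero, as
described above, the CSRC part of this check never matches, which
is why a loop through two such MCUs can go unnoticed at the RTP
level.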
2.3.7. Combining Topologies
Topologies can be combined and linked to each other using mixers
or translators. Care must, however, be taken in how the SSRC
space is handled: mixers separate the SSRC space into two parts,
while
translators maintain the space across themselves. Any hybrid, like
the video switching MCU (Section 2.3.5), requires considerable
thought on how RTCP is dealt with. Note, however, that SSRC
uniqueness always needs to be global across the different domains.
3. Security Considerations
The usage of mixers and translators has an impact on security and
the security functions used. The primary issue is that both mixers
and translators modify packets, thus preventing the usage of
integrity protection and source authentication unless they are
trusted devices that take part in the security context. If
encryption is employed, media translators and mixers need to be
able to decrypt the media to perform their function. A transport
translator may be used without access to the security association
in cases where it touches only parts that are not included in the
integrity protection, for example IP addresses and UDP port
numbers in a media stream using SRTP [RFC3711]. However, in
general, the translator or mixer needs to be part of the
signalling context and get the necessary security associations
established with its RTP session participants.
Including the mixer and translator in the security context allows
the entity, if subverted or misbehaving, to perform a number of
very serious attacks, as it has full access. It can perform all
the attacks possible (see RFC 3550 and any applicable profiles) as
if the media session were not protected at all, while giving the
impression to the session participants that they are protected
against them.
4. IANA Considerations
This document specifies no actions for IANA.
5. Acknowledgements
The authors would like to thank N.N.
6. References
6.1. Normative references
[AVPF] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J.
Rey, "Extended RTP Profile for Real-time Transport
Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC
4585, July 2006.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003.
[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
Video Conferences with Minimal Control", STD 65, RFC
3551, July 2003.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
K. Norrman, "The Secure Real-time Transport Protocol
(SRTP)", RFC 3711, March 2004.
6.2. Informative references
Any 3GPP document can be downloaded from the 3GPP web server,
"http://www.3gpp.org/", see specifications.
7. Authors' Addresses
Magnus Westerlund
Ericsson Research
Ericsson AB
SE-164 80 Stockholm, SWEDEN
Phone: +46 8 7190000
EMail: magnus.westerlund@ericsson.com
Stephan Wenger
Nokia Corporation
P.O. Box 100
FIN-33721 Tampere
FINLAND
Phone: +358-50-486-0637
EMail: stewe@stewe.org
8. List of Changes relative to previous drafts
Full Copyright Statement
Copyright (C) The Internet Society (2006).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on
an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR
ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE.
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed
to pertain to the implementation or use of the technology
described in this document or the extent to which any license
under such rights might or might not be available; nor does it
represent that it has made any independent effort to identify any
such rights. Information on the procedures with respect to rights
in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use
of such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository
at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention
any copyrights, patents or patent applications, or other
proprietary rights that may cover technology that may be required
to implement this standard. Please address the information to the
IETF at ietf-ipr@ietf.org.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.
RFC Editor Considerations
The RFC editor is requested to replace all occurrences of XXXX
with the RFC number this document receives.