AVTCORE                                                        J. Lennox
Internet-Draft                                                     Vidyo
Updates: 3550 (if approved)                                M. Westerlund
Intended status: Standards Track                                Ericsson
Expires: January 10, 2013                                   July 9, 2012

Real-Time Transport Protocol (RTP) Considerations for Endpoints Sending
                         Multiple Media Streams


   This document expands and clarifies the behavior of the Real-Time
   Transport Protocol (RTP) endpoints when they are sending multiple
   media streams in a single RTP session.  In particular, issues
   involving Real-Time Transport Control Protocol (RTCP) messages are

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 10, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of

Lennox & Westerlund     Expires January 10, 2013                [Page 1]

Internet-Draft       RTP Multi-Stream Considerations           July 2012

   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  3
   3.  Use Cases For Multi-Stream Endpoints . . . . . . . . . . . . .  3
     3.1.  Multiple-Capturer Endpoints  . . . . . . . . . . . . . . .  3
     3.2.  Multi-Media Sessions . . . . . . . . . . . . . . . . . . .  4
     3.3.  Multi-Stream Mixers  . . . . . . . . . . . . . . . . . . .  4
   4.  Multi-Stream Endpoint RTP Media Recommendations  . . . . . . .  4
   5.  Multi-Stream Endpoint RTCP Recommendations . . . . . . . . . .  5
     5.1.  Transmission of RTCP Reception Statistics  . . . . . . . .  6
     5.2.  Consequences of Restricted RTCP Reception Statistics . . .  7
     5.3.  Alternate Restriction Proposal . . . . . . . . . . . . . .  7
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . .  8
   7.  Open Issues  . . . . . . . . . . . . . . . . . . . . . . . . .  8
   8.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . .  8
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . .  9
     9.1.  Normative References . . . . . . . . . . . . . . . . . . .  9
     9.2.  Informative References . . . . . . . . . . . . . . . . . .  9
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10

Lennox & Westerlund     Expires January 10, 2013                [Page 2]

Internet-Draft       RTP Multi-Stream Considerations           July 2012

1.  Introduction

   At the time The Real-Time Tranport Protocol (RTP) [RFC3550] was
   originally written, and for quite some time after, endpoints in RTP
   sessions typically only transmitted a single media stream per RTP
   session, where separate RTP sessions were typically used for each
   distinct media type.

   Recently, however, a number of scenarios have emerged (discussed
   further in Section 3) in which endpoints wish to send multiple RTP
   media streams, distinguished by distinct RTP synchronization source
   (SSRC) identifiers, in a single RTP session.  Although RTP's initial
   design did consider such scenarios, the specification was not
   consistently written with such use cases in mind.  The specifications
   are thus somewhat unclear.

   The purpose of this document is to expand and clarify [RFC3550]'s
   language for these use cases.  The authors believe this does not
   result in any major normative changes to the RTP specification,
   however this document defines how the RTP specification shall be
   interpreted.  In these cases, this document updates RFC3550.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119 [RFC2119] and indicate requirement levels for compliant

3.  Use Cases For Multi-Stream Endpoints

   This section discusses several use cases that have motivated the
   development of endpoints that send multiple streams in a single RTP

3.1.  Multiple-Capturer Endpoints

   The most straightforward motivation for an endpoint to send multiple
   media streams in a session is the scenario where an endpoint has
   multiple capture devices of the same media type and characteristics.
   For example, telepresence endpoints, of the type described by the
   CLUE Telepresence Framework [I-D.ietf-clue-framework] is designed,
   often have multiple cameras or microphones covering various areas of
   a room.

Lennox & Westerlund     Expires January 10, 2013                [Page 3]

Internet-Draft       RTP Multi-Stream Considerations           July 2012

3.2.  Multi-Media Sessions

   Recent work has been done in RTP
   [I-D.westerlund-avtcore-multi-media-rtp-session] and SDP
   [I-D.ietf-mmusic-sdp-bundle-negotiation] to update RTP's historical
   assumption that media streams of different media types would always
   be sent on different RTP sessions.  In this work, a single endpoint's
   audio and video media streams (for example) are instead sent in a
   single RTP session.

3.3.  Multi-Stream Mixers

   There are several RTP topologies which can involve a central box
   which itself generates multiple media streams in a session.

   One example is a mixer providing centralized compositing for a multi-
   capturer scenario like the one described in Section 3.1.  In this
   case, the centralized node is behaving much like a multi-capturer
   endpoint, generating several similar and related sources.

   More complicated is the Source Projecting Mixer, which is a central
   box that receives media streams from several endpoints, and then
   selectively forwards modified versions of some of the streams toward
   the other endpoints it is connected to.  Toward one destination, a
   separate media source appears in the session for every other source
   connected to the mixer, "projected" from the original streams, but at
   any given time many of them may appear to be inactive (and thus
   receivers, not senders, in RTP).  This box is an RTP mixer, not an
   RTP translator, in that it terminates RTCP reporting about the mixed
   streams, and it can re-write SSRCs, timestamps, and sequence numbers,
   as well as the contents of the RTP payloads, and can turn sources on
   and off at will without appearing to be generating packet loss.  Each
   projected stream will typically preserve its original RTCP source
   description (SDES) information.

4.  Multi-Stream Endpoint RTP Media Recommendations

   While an endpoint MUST (of course) stay within its share of the
   available session bandwidth, as determined by signalling and
   congestion control, this need not be applied independently or
   uniformly to each media stream.  In particular, session bandwidth MAY
   be reallocated among an endpoint's media streams, for example by
   varying the bandwidth use of a variable-rate codec, or changing the
   codec used by the media stream, up to the constraints of the
   session's negotiated (or declared) codecs.  This includes enabling or
   disabling media streams as more or less bandwidth becomes available.

Lennox & Westerlund     Expires January 10, 2013                [Page 4]

Internet-Draft       RTP Multi-Stream Considerations           July 2012

5.  Multi-Stream Endpoint RTCP Recommendations

   The Real-Time Transport Control Protocol (RTCP) is defined in Section
   6 of [RFC3550], but it is largely documented in terms of
   "participants".  For multi-media-stream endpoints, it is generally
   most useful to interpret the specification such that each media
   stream is a separate "participant".

   For each of an endpoint's media media streams, whether or not it is
   currently being sent, SR/RR and SDES packets MUST be sent at least
   once per RTCP report interval.  (For discussion of the content of SR
   or RR packets' reception statistic reports, see Section 5.1.)

   When a new media stream is added to a unicast session, the sentence
   in [RFC3550]'s Section 6.2 applies: "For unicast sessions ... the
   delay before sending the initial compound RTCP packet MAY be zero."
   Thus, endpoints MAY send an initial RTCP packet for the media stream
   immediately upon adding to the session.

   Similarly, [RFC3550] Section 6.1 gives the following advice to RTP
   translators and mixers:

      It is RECOMMENDED that translators and mixers combine individual
      RTCP packets from the multiple sources they are forwarding into
      one compound packet whenever feasible in order to amortize the
      packet overhead (see Section 7).  An example RTCP compound packet
      as might be produced by a mixer is shown in Fig. 1.  If the
      overall length of a compound packet would exceed the MTU of the
      network path, it SHOULD be segmented into multiple shorter
      compound packets to be transmitted in separate packets of the
      underlying protocol.  This does not impair the RTCP bandwidth
      estimation because each compound packet represents at least one
      distinct participant.  Note that each of the compound packets MUST
      begin with an SR or RR packet.
   Note: To avoid confusion, an RTCP packet is an individual item, such
   as an Sender Report (SR), Receiver Report (RR), Source Description
   (SDES), Goodbye (BYE), Application Defined (APP), Feedback [RFC4585]
   or Extended Report (XR) [RFC3611] packet.  A compound packet is the
   combination of two or more such RTCP packets where the first packet
   must be an SR or an RR packet, and which contains a SDES packet
   containing an CNAME item.  Thus the above results in compound RTCP
   packets that contain multiple SR or RR packets from different sources
   as well as any of the other packet types.  There are no restrictions
   on the order the packets may occur within the compound packet, except
   the regular compound rule, i.e. starting with an SR or RR.

   This advice applies to multi-media-stream endpoints as well, with the
   same restrictions and considerations.  (Note, however, that the last

Lennox & Westerlund     Expires January 10, 2013                [Page 5]

Internet-Draft       RTP Multi-Stream Considerations           July 2012

   sentence does not apply to AVPF [RFC4585] or SAVPF [RFC5124] feedback
   packets if Reduced-Size RTCP [RFC5506] is in use.)

   Open Issue: Any clarifications on how one handle the scheduling of
   RTCP transmissions when having multiple sources?  Alternatives
   include delaying one source to the next source's transmission, or to
   group multiple sources to use only one scheduling.

5.1.  Transmission of RTCP Reception Statistics

   As required by [RFC3550], an endpoint MUST send reception reports
   about every active media stream it is receiving, from at least one
   local source.

   However, a naive application of the RTP specification's rules could
   be quite inefficient.  In particular, if a session has N media
   sources (active and inactive), and had S senders in each reporting
   interval, there would either be N*S report blocks per reporting
   interval, or (per the round-robinning recommendations of [RFC3550]
   Section 6.1) reception sources would be unnecessarily round-robinned.
   In a session where most media sources become senders reasonably
   frequently, this results in quadratically many reception report
   blocks in the conference, or reporting delays proportional to the
   number of session members.

   Since traffic is received by endpoints, however, rather than by media
   sources, there is not actually any need for this quadratic expansion.
   All that is needed is for each endpoint to report all the remote
   sources it is receiving.

   Thus, an endpoint SHOULD NOT send reception reports from one of its
   own media sources about another one of its own ("self-reports").
   Similarly, an endpoint with multiple media sources SHOULD NOT send
   reception reports about a remote media source from more than one of
   its local sources ("cross-reports").  Instead, it SHOULD pick one of
   its local media sources as the "reporting" source for each remote
   media source, and use it to send reception reports for that remote
   source; all its other media sources SHOULD NOT send any reception
   reports for that remote media source.

   An endpoint MAY choose different local media sources as the reporting
   source for different remote media sources (for example, it could
   choose to send reports about remote audio sources from its local
   audio source, and reports about remote video sources from its local
   video source), or it MAY choose a single local source for all its
   reports.  If the reporting source leaves the session (sends BYE),
   another reporting source MUST be chosen.  This "reporting" source
   SHOULD also be the source for any AVPF feedback messages about its

Lennox & Westerlund     Expires January 10, 2013                [Page 6]

Internet-Draft       RTP Multi-Stream Considerations           July 2012

   remote sources, as well.

5.2.  Consequences of Restricted RTCP Reception Statistics

   The RTCP traffic generated by receivers following the rules in
   Section 5.1 might appear, to observers unaware of the recommendations
   of this specification or knowledge about which end-points are
   associated with which SSRCs, to be generated by receivers who are
   experiencing a network disconnection.

   This could be a potentially critical problem when one uses RTCP for
   congestion control, as a sender might think that it is sending so
   much traffic that it is causing complete congestion collapse.  At the
   same time, however, a congestion control solution is likely not
   interested in performing unecessary processing based on multiple
   reporting sources having identical statistics.  A congestion control
   algorithm is likely more interested in frequent reporting from one
   specific source than multiple sources at the same end-point based on
   common statistics.  That would reduce the uncertainty that sources
   are from the same end-point, and likely improve the interarrival time
   of the reporting, compared to multiple SSRCs which, by the RTCP
   algorithm, are deliberately desynchronized.  However, this would
   clearly require clarifications on how the RTCP timer rules are to be

   However, such an interpretation of the session statistics would
   require a fairly sophisticated RTCP analysis.  Any receiver of RTCP
   statistics which is just interested in information about itself needs
   to be prepared that any given reception report might not contain
   information about a specific media source, because reception reports
   in large conferences can be round-robined.

   Thus, it is unclear to what extent this restriction would actually
   cause trouble in practice.

5.3.  Alternate Restriction Proposal

   If there are indeed scenarios in which the rules of Section 5.1 do
   cause troubles, an alternative solution would be to explicitly
   signal, in RTCP, which groups of media sources originate from a
   single endpoint.  Thus, within a group of sources, receivers could
   know that there would not be self-reports, and only a single SSRC
   would be providing cross-reports.  In such a mode, the signaling
   protocol would need to negotiate, or declare, that the mode was in

   The next question would be to determine how to indicate the groups of
   sources for this purpose.  The sources' CNAMEs would probably not be

Lennox & Westerlund     Expires January 10, 2013                [Page 7]

Internet-Draft       RTP Multi-Stream Considerations           July 2012

   sufficient, as some of the use cases described in Section 3, notably
   the source-projecting mixer, result in a single endpoint generating
   sources with multiple CNAME values.  Thus, a new SDES item would be
   needed for these purposes.

   TBD: If this solution is indeed taken, define the specifics of this
   SDES item, and the signaling needed to indicate its use.

6.  Security Considerations

   In the secure RTP protocol (SRTP) [RFC3711], the cryptographic
   context of a compound SRTCP packet is the SSRC of the sender of the
   first RTCP (sub-)packet.  This could matter in some cases, especially
   for keying mechanisms such as Mikey [RFC3830] which use per-SSRC

   Other than that, the standard security considerations of RTP apply;
   sending multiple media streams from a single endpoint does not appear
   to have different security consequences than sending the same number
   of streams.

7.  Open Issues

   At this stage this document contains a number of open issues.  The
   below list tries to summarize the issues:
   1.  Any clarifications on how to handle the RTCP scheduler when
       sending multiple sources in one compound packet.
   2.  Shall suppression of self-reporting, i.e. reporting one's other
       SSRCs in any SR/RR, be applied?
   3.  Shall suppression of cross-reporting be used, i.e. each end-point
       uses only one SSRC to report on any non-local SSRCs being
       received?  If so what method should be applied:
       1.  Implicit, by just not report using any other SSRC
       2.  Explicit binding of SSRCs that are being commonly reported,
           either using SDES or another packet type, to explicitly
           indicate the SSRCs on whose behalf the report applies.
       3.  Add any specific RTCP scheduler considerations.

8.  IANA Considerations

   This document makes no requests of IANA.

   Note to the RFC Editor: please remove this section before

Lennox & Westerlund     Expires January 10, 2013                [Page 8]

Internet-Draft       RTP Multi-Stream Considerations           July 2012

   (Note: This section may change if the alternative proposal of
   Section 5.3 is adopted.)

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
              July 2006.

   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
              Real-time Transport Control Protocol (RTCP)-Based Feedback
              (RTP/SAVPF)", RFC 5124, February 2008.

   [RFC5506]  Johansson, I. and M. Westerlund, "Support for Reduced-Size
              Real-Time Transport Control Protocol (RTCP): Opportunities
              and Consequences", RFC 5506, April 2009.

9.2.  Informative References

              Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino,
              "Framework for Telepresence Multi-Streams",
              draft-ietf-clue-framework-06 (work in progress),
              July 2012.

              Holmberg, C. and H. Alvestrand, "Multiplexing Negotiation
              Using Session Description Protocol (SDP) Port Numbers",
              draft-ietf-mmusic-sdp-bundle-negotiation-00 (work in
              progress), February 2012.

              Westerlund, M., Perkins, C., and J. Lennox, "Multiple

Lennox & Westerlund     Expires January 10, 2013                [Page 9]

Internet-Draft       RTP Multi-Stream Considerations           July 2012

              Media Types in an RTP Session",
              draft-westerlund-avtcore-multi-media-rtp-session-00 (work
              in progress), July 2012.

   [RFC3611]  Friedman, T., Caceres, R., and A. Clark, "RTP Control
              Protocol Extended Reports (RTCP XR)", RFC 3611,
              November 2003.

   [RFC3830]  Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K.
              Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830,
              August 2004.

Authors' Addresses

   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ  07601

   Email: jonathan@vidyo.com

   Magnus Westerlund
   Farogatan 6
   SE-164 80 Kista

   Phone: +46 10 714 82 87
   Email: magnus.westerlund@ericsson.com

Lennox & Westerlund     Expires January 10, 2013               [Page 10]