Network Working Group                                      H. Alvestrand
Internet-Draft                                                    Google
Intended status: Informational                             June 17, 2012
Expires: December 19, 2012

               Why RTP Sessions Should Be Content Neutral


   This document is not intended for publication as an RFC.

   It gives the underpinning arguments for why the idea that RTP
   sessions and MIME top level types are related is a deeply broken
   paradigm, and that we need to get away from it.

   These arguments are solely the opinion of the listed author.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 19, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal

Alvestrand              Expires December 19, 2012               [Page 1]

Internet-Draft           RTP session neutrality                June 2012

   Provisions Relating to IETF Documents
   ( in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Stuff that can be carried over RTP . . . . . . . . . . . . . .  3
   3.  What the network can do to "help" a flow . . . . . . . . . . .  4
   4.  The definition of an RTP "session" . . . . . . . . . . . . . .  5
   5.  Proper and improper use of RTP sessions  . . . . . . . . . . .  6
   6.  The Pernicious Effect of SDP on the Media Type System  . . . .  8
   7.  The Mixer Fallacy  . . . . . . . . . . . . . . . . . . . . . .  8
   8.  Corrective Actions . . . . . . . . . . . . . . . . . . . . . .  9
   9.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . .  9
   10. Security Considerations  . . . . . . . . . . . . . . . . . . .  9
   11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10
   12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10
     12.1.  Normative References  . . . . . . . . . . . . . . . . . . 10
     12.2.  Informative References  . . . . . . . . . . . . . . . . . 10
   Appendix A.  Change log  . . . . . . . . . . . . . . . . . . . . . 10
     A.1.   Version -00 to -01  . . . . . . . . . . . . . . . . . . . 11
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 11

Alvestrand              Expires December 19, 2012               [Page 2]

Internet-Draft           RTP session neutrality                June 2012

1.  Introduction

   The RTP universe of functionality can, for the purposes of this
   argument, be reduced to two components: The RTP wire protocol
   [RFC3550] (consisting of the RTP packet format, the RTCP reporting
   format, and the handling rules for RTP sessions), and the SDP session
   description language.  For the purposes of this argument, the SDP
   functionality for describing non-RTP sessions is ignored, as is the
   ability to negotiate RTP sessions by other means than SDP.

   This document argues that the RTP mechanisms of multiple RTP sessions
   make sense for a lot of purposes, but does NOT make sense for a
   mandated separation between different top-level MIME media types.

2.  Stuff that can be carried over RTP

   RTP, according to its own description an application layer framework
   component, is a suitable protocol for framing data that needs to
   travel across the network in a time-sensitive fashion, with the idea
   that it is going to be presented at the receiving end in a time
   sequence.  Normally, the data (usually called "media") is streamed
   across the network at a rate approximately equal to the speed at
   which it is intended to be presented ("real time data").

   Examples of data carried over RTP include:

   o  G.711 Audio - 64 Kbits/second, completely fixed bitrate

   o  GSM AMR Audio - 4.75 to 12.2 Kbits/second, variable bitrate

   o  OPUS audio compressed into near-incomprehensibility - 6 kbits/
      second, variable bitrate

   o  OPUS audio carrying high fidelity music - 500 kbits/second,
      variable bitrate

   o  QQVGA (160x120) video at 15 FPS in H.264 compression - 50 Kbits/
      second, variable bitrate, lots of schemes for error concealment
      and correction

   o  HD video at 1920x1080@60 in H.264 compression - 1.4 Mbits/second

   o  Real-time text (T.140) - very few bits/second

   o  [RFC4733] DTMF tone signalling - very few bits/second

   Schemes designed to increase the reliability of data carried across

Alvestrand              Expires December 19, 2012               [Page 3]

Internet-Draft           RTP session neutrality                June 2012

   RTP include:

   o  Forward error correction (FEC)

   o  Duplicated streams, codec-independent

   o  Duplicate sending of important information within the codec

   o  NAK-based resends signalled over RTCP

   o  Stream reset requests signalled over RTCP

   Some of these are only applicable to media types (in particular,
   "send me a new I-frame" doesn't make sense if you don't have
   I-frames).  Others can be used with any type of data.

3.  What the network can do to "help" a flow

   The network can apply various things to help the session data arrive
   according to policy:

   o  Capacity reservation for specific flows

   o  Priority queueing, sending certain types of data faster than

   o  Filtering or blocking certain types of communication that the
      managers deem inappropriate

   The network can do these things in multiple ways, including so-called
   "deep packet inspection", but the most common techniques require
   being able to identify either the requested handling of the packets
   (DiffServ using DSCP codepoints) or recognizing the flow based on its
   5-tuple (source and destination address and port + protocol),
   possibly correlating the 5-tuple with information carried to the
   router through some kind of management interface (either connected to
   the session setup protocol or managed via some other interface such
   as RSVP/IntServ), and behaving accordingly.

   All techniques have limitations; DSCP requires a certain trust in the
   endpoints using the codepoints for "deserving traffic"; deep packet
   inspection requires that packets be unencrypted, and stream control
   requires that 5-tuples be related back to their putative purpose
   either by heuristics or by being connected to management protocols.

Alvestrand              Expires December 19, 2012               [Page 4]

Internet-Draft           RTP session neutrality                June 2012

4.  The definition of an RTP "session"

   An RTP session is defined in RFC 3550 section 3:

   "RTP Session: An association among a set of participants
   communicating with RTP.  A participant may be involved in multiple
   RTP sessions at the same time.  In a multimedia session, each medium
   is typically carried in a separate RTP session with its own RTCP
   packets unless the encoding itself multiplexes multiple media into a
   single data stream.  A participant distinguishes multiple RTP
   sessions by reception of different sessions using different pairs of
   destination transport addresses, where a pair of transport addresses
   comprises one network address plus a pair of ports for RTP and RTCP.
   All participants in an RTP session may share a common destination
   transport address pair, as in the case of IP multicast, or the pairs
   may be different for each participant, as in the case of individual
   unicast network addresses and port pairs.  In the unicast case, a
   participant may receive from all other participants in the session
   using the same pair of ports, or may use a distinct pair of ports for

   The distinguishing feature of an RTP session is that each maintains a
   full, separate space of SSRC identifiers (defined next).  The set of
   participants included in one RTP session consists of those that can
   receive an SSRC identifier transmitted by any one of the participants
   either in RTP as the SSRC or a CSRC (also defined below) or in RTCP.
   For example, consider a three- party conference implemented using
   unicast UDP with each participant receiving from the other two on
   separate port pairs.  If each participant sends RTCP feedback about
   data received from one other participant only back to that
   participant, then the conference is composed of three separate point-
   to-point RTP sessions.  If each participant provides RTCP feedback
   about its reception of one other participant to both of the other
   participants, then the conference is composed of one multi-party RTP
   session.  The latter case simulates the behavior that would occur
   with IP multicast communication among the three participants.

   The RTP framework allows the variations defined here, but a
   particular control protocol or application design will usually impose
   constraints on these variations."

   An RTP session is thus characterized by:

   o  A single SSRC space

   o  A single reporting space - all participants see all RTCP messages

Alvestrand              Expires December 19, 2012               [Page 5]

Internet-Draft           RTP session neutrality                June 2012

   o  Non overlapping transport addresses

   As we can see here, it is not possible to tell from a single packet
   whether it belongs to the same session as another packet or not; if
   we observe two packets with the same source and destination
   addresses, it seems safe to assume that they belong to the same
   session, but for all other cases, deciding whether or not two packets
   or packet streams are in the same session requires knowledge of the
   configuration of the session.

5.  Proper and improper use of RTP sessions

   Section 5.2 of RFC 3550 gives the canonical statement of RTP session

   "In RTP, multiplexing is provided by the destination transport
   address (network address and port number) which is different for each
   RTP session.  For example, in a teleconference composed of audio and
   video media encoded separately, each medium SHOULD be carried in a
   separate RTP session with its own destination transport address."

   This sentence makes two very important leaps of faith:

   o  That distinguishing sessions by destination transport address is
      necessary and sufficient

   o  That it is appropriate to give strong guidance about the
      distribution of media streams across RTP sessions

   Both of these are shaky.

   As the cost of connecting ports has increased due to NATs, firewalls
   and IPv4 exhaustion, there has been a strong push towards using fewer
   ports, and indeed fewer 5-tuples, so that it is not uncommon to see
   flows that can be distinguished only by source address; there have
   also been proposals floated for putting multiple RTP sessions across
   one 5-tuple [draft-westerlund-avtcore-transport-multiplexing].

   The cost of ports is also one factor pushing towards multiple media
   types in one RTP session; however, the more important underlying
   challenge is that this distinction is neither necessary nor
   sufficient to distinguish the cases in which RTP media streams want
   to have differential treatment from the network, and thus need to
   assign streams either to the same session (to guarantee the same
   treatment) or to different sessions (to allow for differential

Alvestrand              Expires December 19, 2012               [Page 6]

Internet-Draft           RTP session neutrality                June 2012

   Consider the list of scenarios above, and imagine RTP being used for:

   o  A videoconference between 3 people who know each other well, using
      low end equipment and barely-sufficient bandwidth pipes

   o  A Berlin Philharmonic concert broadcast featuring Brahms' "Tragic

   o  A point-to-point transmission of a Manchester United vs Liverpool
      football match

   o  A professor's lecture, with "talking head" presentation
      simultaneous with slides, and opportunity for students to ask

   In each of these contexts, the tradeoff between audio and video is
   different; in the Brahms case, the audio (which is the point of the
   transmission) is likely to be transmitted at higher bandwidth than
   the video, and if one of them has to have his bandwidth reduced, the
   video should be reduced in quality before the audio is.  In contrast,
   in the football match, spectators care about seeing the action; as
   long as they can understand the commentator's voice, the audio
   quality is "good enough".

   In the lecture case, quality of the lecturer's slides and voice is
   critical; video from students is almost irrelevant to the larger

   A logical arrangement of media streams in RTP sessions would be to
   group them by importance, and send them with appropriate traffic
   engineering structuring; in the lecturer case, the slides and the
   professor's voice would be carried in a high priority media stream,
   while the professor's picture would have second priority, and voice
   and video from students would be made available on an "if it works,
   it works" basis.  Someone may easily decide that the student feedback
   track is not worth listening to, or remove the talking head of the
   professor; it would be strange indeed to try to listen to the lecture
   without viewing the supporting material.

   This illustrates two points:

   o  The RTP session mechanism, using the 5-tuple as the unit of
      differentiation, is a simple, effective and readily deployed
      mechanism for separating streams that require different treatment
      from the network in easily distinguished partitions.

   o  The assignment of media to such partitions is application
      dependent, and the decision on how to group and how to prioritize

Alvestrand              Expires December 19, 2012               [Page 7]

Internet-Draft           RTP session neutrality                June 2012

      needs to be taken by the application developer.

6.  The Pernicious Effect of SDP on the Media Type System

   In the list of reasons to argue against the inappropriate advice
   quoted above from RFC 3550, its pernicious influence on the MIME type
   system bears mentioning.

   The MIME type system, as described in [RFC4288], consists of a two-
   level hierarchy: A top level media type (text, audio, video,
   application and so on), and a media subtype that identifies (to some
   level of precision) the format of the data being carried.

   The system has mostly been respected, with some types (for instance
   PDF) forever being borderline between the various categories, but
   over the years, a few types have been entered into the system with
   their top level types being decided, not by the nature of their
   content, but by *the type with which their proponents wished to have
   them multiplexed in an RTP session*.

   This includes the types that designate repair mechanism (audio/
   parityfec, audio/red), timed data transfer (audio/clearmode) and that
   ultimate triumph of expediency over cleanliness: audio/t140c, audio/
   3gpp-tt and video/3gpp-tt: Text types registered as audio and video.

   For each of these, there is a fairly natural fit in the normal MIME
   hierarchy (application/ for the mechanism types and text/ for the
   text types); the assignment of them to the "media" top level types
   has been done as an expediency in order to get around the stultifying
   results of the advice given in RFC 3550.

7.  The Mixer Fallacy

   One of the arguments in favour of the RFC 3550 separation has been
   that a mixer can be deployed that knows nothing of the semantics of
   the media streams; it can "just mix them".

   This applies partly to exactly one type of application: The audio
   conference bridge.

   For a video mixer, it does not apply; external logic (such as
   listening to the audio voume of the corresponding audio channel, or
   explicit flow control) is needed to select the right video stream to
   send out.  And even for larger audio bridges, it is common to have
   functions like floor control, remote mute and other participant
   management tools in order to control the bridge - as soon as such

Alvestrand              Expires December 19, 2012               [Page 8]

Internet-Draft           RTP session neutrality                June 2012

   tools are introduced, they are as relevant for a multi-media-type RTP
   session as they are to a single-media-type RTP session.

8.  Corrective Actions

   There are not many protocol changes that really need to be taken to
   solve this problem.

   The basic mechanism of RTP is media type independent.  There are some
   RTCP issues with dealing with RTP flows of wildly varying bandwidth,
   but as can be seen from the table of media types in the introduction,
   this issue isn't solved by separating them; the bandwidth ranges of
   the types overlap.

   The thing that binds most in the current protocol suite is the
   conservation of the inappropriate binding in the SDP media
   description/negotiation format, where the MIME type is represented in
   two pieces, one of which is tied to the RTP session rather than to
   the payload type it is associated with, and there are fairly well-
   understood ways to get around that, such as the BUNDLE grouping
   extension [I-D.ietf-mmusic-sdp-bundle-negotiation].  Better designed
   negotiation protocols would not have this problem at all.

   In order to get out of the bind that SDP places us in, a change such
   as BUNDLE should be adopted, and the IETF should record that the
   advice from RFC 3550 is to be considered *advice*, not command: It is
   sometimes appropriate to separate media streams according to top
   level type, and sometimes not appropriate to do so.  The application
   is the one that needs to make this decision.

9.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as an

10.  Security Considerations

   This note does not discuss any change that the author thinks would
   have any significant influence on the security of RTP traffic.

Alvestrand              Expires December 19, 2012               [Page 9]

Internet-Draft           RTP session neutrality                June 2012

11.  Acknowledgements

   This note has benefited greatly from exchanges with Colin Perkins,
   whose unwavering support of a sharply differing viewpoint has served
   to inform the arguments presented in this document.  Magnus
   Westerlund and Christer Holmberg also deserve special mention for
   engaging constructively in the discussion.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

12.2.  Informative References

              Holmberg, C. and H. Alvestrand, "Multiplexing Negotiation
              Using Session Description Protocol (SDP) Port Numbers",
              draft-ietf-mmusic-sdp-bundle-negotiation-00 (work in
              progress), February 2012.

              Westerlund, M. and C. Perkins, "Multiple RTP Session on a
              Single Lower-Layer Transport",
              draft-westerlund-avtcore-transport-multiplexing-01 (work
              in progress), October 2011.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
              Registration Procedures", BCP 13, RFC 4288, December 2005.

   [RFC4733]  Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF
              Digits, Telephony Tones, and Telephony Signals", RFC 4733,
              December 2006.

Appendix A.  Change log

Alvestrand              Expires December 19, 2012              [Page 10]

Internet-Draft           RTP session neutrality                June 2012

A.1.  Version -00 to -01

   Version number bump, since the debate is ongoing.  A few nits fixed.
   Added the "Mixer Fallacy" section.  Updated reference to "bundle" to
   new draft name.

   This should be the last version, since the author is in the process
   of working with the authors of
   [I-D.westerlund-avtcore-transport-multiplexing] to achieve a jointly
   agreeable text.  Hopefully this will take lesss than 6 months.

Author's Address

   Harald T. Alvestrand
   Kungsbron 2
   Stockholm,   11122


Alvestrand              Expires December 19, 2012              [Page 11]