MMUSIC                                                       A. B. Roach
Internet-Draft                                                   Mozilla
Intended status: Informational                          January 31, 2013
Expires: August 4, 2013


       Thoughts on syntax for representing multiple media streams
                      draft-roach-mmusic-mlines-00

Abstract

   This document briefly explores the ramifications of combining
   multiple media streams into one SDP m= section versus expressing each
   in its own m= section.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 4, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.





Roach                    Expires August 4, 2013                 [Page 1]


Internet-Draft             Media Stream Syntax              January 2013


1.  Introduction

   As part of the ongoing RTCWEB and CLUE work, it has become clear that
   the current mechanisms in SDP are insufficient for describing complex
   sessions with multiple streams.  Two competing schools of thought
   have emerged.  One holds that the m= lines should apply to RTP
   sessions, regardless of how many media streams they contain.  Another
   holds that m= lines should apply to media streams exclusively, and
   that an additional mechanism should be applied to combine multiple
   streams into a single RTP session, if necessary.


2.  Alternatives

2.1.  Alternative 1: Multiple streams per m= section

   One approach to specifying multiple streams in a single RTP session
   is to put information for several streams into a single m= section;
   and, by doing do, implicitly combine them into a single session.

   To maintain some level of backwards compataibility with SDP, this
   approach might choose to have one m= section for audio and a second
   for video (with additional m= sections for other media types if they
   are used in the future), combining those sections with a=group:BUNDLE
   [I-D.ietf-mmusic-sdp-bundle-negotiation]; we will call this
   "Alternative 1a".  An alternate approach would be the definition of a
   new media type which effectively allows transmission of any kind of
   media, thereby avoiding the need to bundle multiple sections together
   at all.  A syntax for such an approach is proposed by
   [I-D.holmberg-mmusic-sdp-mmt-negotiation].  We will call this
   "Alternative 1b".

   In both of the cases described above, certain SDP attributes might be
   targeted at only one of the streams in an RTP session.  These
   attributes can be matched up with individual streams using the
   "a=ssrc" extension defined in [RFC5576].

   For "Alternative 1a", we have the additional challenge of specifying
   attributes that apply to the entire RTP session, such as a=rtcp-fb
   and ICE candidate parameters.  One approach would be inclusion of
   such parameters only in the first m= section within a bundle, with
   the implication that they apply to the entire session.









Roach                    Expires August 4, 2013                 [Page 2]


Internet-Draft             Media Stream Syntax              January 2013


2.1.1.  Alternative 1a: One section per RTP session per type

   v=0
   o=- 2890844526 2890844526 IN IP4 host.example.com
   s=
   c=IN IP4 host.example.com
   t=0 0
   a=group:BUNDLE c1 c2
   m=audio 10000 RTP/AVP 0 8 97
   a=mid:c1
   a=candidate:0 1 UDP 2113601791 192.0.2.240 51091 typ host
   a=candidate:1 1 UDP 1694194431 198.51.100.32 51091 typ srflx raddr
      192.0.2.240 rport 51091
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000
   a=rtpmap:97 iLBC/8000
   a=ssrc:11111 label:speaker-audio
   a=ssrc:22222 label:floor-mic
   m=video 10000 RTP/AVP 31 32
   a=mid:c2
   a=rtpmap:31 H261/90000
   a=rtpmap:32 MPV/90000
   a=ssrc:33333 label:speaker-video
   a=ssrc:44444 label:slides



























Roach                    Expires August 4, 2013                 [Page 3]


Internet-Draft             Media Stream Syntax              January 2013


2.1.2.  Alternative 1b: One section per RTP session

   v=0
   o=- 2890844526 2890844526 IN IP4 host.example.com
   s=
   c=IN IP4 host.example.com
   t=0 0
   a=group:MMT foo bar zoe
   m=anymedia 10000 RTP/AVP 0 8 97 31 32
   a=candidate:0 1 UDP 2113601791 192.0.2.240 51091 typ host
   a=candidate:1 1 UDP 1694194431 198.51.100.32 51091 typ srflx raddr
      192.0.2.240 rport 51091
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000
   a=rtpmap:97 iLBC/8000
   a=rtpmap:31 H261/90000
   a=rtpmap:32 MPV/90000
   a=mmtype:0 audio
   a=mmtype:8 audio
   a=mmtype:97 audio
   a=mmtype:31 video
   a=mmtype:32 video
   a=ssrc:11111 label:speaker-audio
   a=ssrc:22222 label:floor-mic
   a=ssrc:33333 label:speaker-video
   a=ssrc:44444 label:slides

2.2.  Alternative 2: Single stream per m= section

   An alternate proposal is constraining one m= section to talk about a
   single media stream.  Like alternative 1a, above, the BUNDLE
   extension is used to combine several m= sections into a single RTP
   session.  Any attributes that are applicable to a single media stream
   can be correlated by putting them in the corresponding m= section.
   Any attributes that apply to the transport paramters (e.g., rtcp-fb,
   ICE parameters) are conveyed in the first m= section within the
   bundle (alternate schemes are possible, but this seems the simplest
   and most straightforward).













Roach                    Expires August 4, 2013                 [Page 4]


Internet-Draft             Media Stream Syntax              January 2013


   v=0
   o=- 2890844526 2890844526 IN IP4 host.example.com
   s=
   c=IN IP4 host.example.com
   t=0 0
   a=group:BUNDLE c1 c2 c3 c4
   m=audio 10000 RTP/AVP 0 8 97
   a=mid:c1
   a=label:speaker-audio
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000
   a=rtpmap:97 iLBC/8000
   a=candidate:0 1 UDP 2113601791 192.0.2.240 51091 typ host
   a=candidate:1 1 UDP 1694194431 198.51.100.32 51091 typ srflx raddr
      192.0.2.240 rport 51091
   m=audio 10000 RTP/AVP 0 8 97
   a=mid:c2
   a=label:floor-mic
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000
   a=rtpmap:97 iLBC/8000
   m=video 10000 RTP/AVP 31 32
   a=mid:c3
   a=label:speaker-video
   a=rtpmap:31 H261/90000
   a=rtpmap:32 MPV/90000
   m=video 10000 RTP/AVP 31 32
   a=mid:c4
   a=label:slides
   a=rtpmap:31 H261/90000
   a=rtpmap:32 MPV/90000

2.3.  Pros and Cons

2.3.1.  Codec Selection

   Currently, in SDP and the various documents that rely on it (such as
   [RFC3264]), there are certain assumptions made about the ordinality
   of streams to m= sections.  Consider, for example, wanting to convey
   two audio streams with a low-bandwidth voice codec preferred for one,
   but a high-quailty codec preferred for the other.  RFC 3264 has rules
   indicating that codecs are conveyed in the order of their preference.
   With alternative 2, it is trivial to provide different ordering (or
   even a different set) of codecs to acheive such a goal.  Alternatives
   1a and 1b lack the ability to do so without additional extensions.

   This set of facts supports alternative 2 in preference to
   alternatives 1a and 1b.



Roach                    Expires August 4, 2013                 [Page 5]


Internet-Draft             Media Stream Syntax              January 2013


2.3.2.  Port Number Handling

   When multiple sections are used to represent a single session, we
   need to make a decision regarding the port number conveyed in the m=
   line itself.  One option is to use the same port number in all
   related m= sections.  According to Cullen Jennings, this interacts
   very poorly with existing implementations that use SDP.  The other
   alternative is to indicate bogus port numbers in all (or all but one)
   of the m= lines.  According to Hadriel Kaplan, this usage will lead
   to certain media intermediaries destroying the session when it
   determines that a signaled port is going unused.

   Alternative 1b avoids this problem altogether by having only one m=
   per IP/port combination, thereby completely sidestepping the question
   of what to put in subsequent m= lines.

   This set of facts supports alternative 1b in preference to
   alternatives 1a and 2.

2.3.3.  Attribute handling

   Attributes that appear inside m= sections can be generally broken
   down into three categories: those intended to apply to a single media
   stream (e.g., framerate); those intended to apply to an RTP session
   (e.g., rtcp-fb), and those that are explicitly bound to the m= line
   itself (e.g., rtpmap).  By and large, these attributes have been
   defined with an assumption that each RTP session had one stream and
   vice-versa.

   By specifying a model that breaks this one-to-one correspondence, we
   have created the need to be able designate a specific media stream
   within an RTP session (for alternatives 1a and 1b), or the need to be
   able to talk about session-level attributes (for alternatives 1a and
   2).

   Alternatives 1a and 1b can perform stream-level designation through
   the use of the ssid attribute specified in [RFC5576].  Alternatives
   1a and 2 can apply a convention that any RTP-session-level attributes
   are placed in the first m= section in a bundle (although other, more
   complicated approaches may also be possible).

   Note, in particular, that alternative 1a inherits both problems of
   being able to designate attributes as applying to a single stream, as
   well as being able to talk about session-level attributes when
   multiple m=lines are bundled together.

   This set of facts supports alternatives 1b and 2 in preference to
   alternative 1a.



Roach                    Expires August 4, 2013                 [Page 6]


Internet-Draft             Media Stream Syntax              January 2013


2.3.4.  What We're Unaware of Not Knowing

   It is worth noting that the problem described in Section 2.3.1 was
   not discovered for quite a long time after the discussion of multiple
   media streams had begun.  In the characterization of "known knowns,"
   "known unknowns," and "unknown unknowns," this issue remained an
   unknown unknown for more than a little time.

   Generally, addressing these unknown unknowns is likely to be easiest
   if we have the highest granularity of control.  Alternative 2, by
   breaking each stream apart into its own instance of the control
   structure that has historically been used to work with media (the m=
   section), provides this high granularity where alternatives 1a and 1b
   do not.

   It is the author's opinion that the probable existance of such
   unknown unknowns favors alternative 2 over 1a or 1b.

2.4.  Red Herrings

   During the course of discussing this topic, several points have been
   raised that, while relevant, do not bias the selection of one
   solution over another.

   One issue that has been brought up is that SDP offer/answer requires
   signaling of the number of m= sections in the offer, to allow clear
   semantics for negotiation.  Some proponents of solutions 1a and 1b
   have indicated a belief that allowing multiple streams per m= section
   avoides this restriction.  This assertion has a number of problems.
   First, it assumes that implementations can perform reasonable
   operations on dynamically created media streams that begin and end
   without any signaling.  It further assumes that the problems that the
   offer/answer model imposed the m-line restrictions for are no longer
   applicable (at least, not on a stream level).  Finally, this
   assertion assumes that no control surfaces are necessary to talk
   about and/or manipulate the individual streams (alternately, if such
   control surfaces are introduced, then additional SDP round-trips to
   exchange information about those controls is necessary, making them
   semantically equivalent to a new offer/answer exchange -- which
   eliminates any purported advantage).

   It has also been observed that, in addition to being sometimes
   applicable to streams and sometimes applicable to sessions, attribute
   are also sometimes unidirectional, and sometimes bidirectional.
   While an astute observation, this does not appear to have any bearing
   on the ultimate solution selected, as all three alternatives face
   exactly the same challenges in dealing with issues of directionality.




Roach                    Expires August 4, 2013                 [Page 7]


Internet-Draft             Media Stream Syntax              January 2013


   Finally, it should be noted that any decision to include multiple
   sections within a single m= section does little to simplify
   implementation.  Even if native RTCWEB implementations generate the
   fewest m= sections necessary to convey their desired session state,
   the selection of alternatives 1a and 1b does not obviate the
   requirement that implementations must be able to receive SDP with
   several m=audio sections (for example).  Interoperation with legacy
   implementations, even through a gateway, will require that proper
   handling of such session descriptions is present in every RTCWEB
   implementation.

2.5.  Summary

   The following table summarizes the pros and cons conveyed in the
   preceding sections on a per-solution basis.

                      +---------------+----+----+---+
                      | Issue         | 1a | 1b | 2 |
                      +---------------+----+----+---+
                      | Section 2.3.1 | -  | -  | + |
                      | Section 2.3.2 | -  | +  | - |
                      | Section 2.3.3 | -  | +  | + |
                      | Section 2.3.4 | -  | -  | + |
                      +---------------+----+----+---+

   Based on these criteria, it is the author's belief that Alternative 2
   provides the most benefit, with Alternative 1b providing a close
   second place.

   Alternative 1a has the remarkable property of combining all of the
   drawbacks of solutions 1b and 2, forming a kind of "sweet-spot" of
   ill-advisement, and thereby maximizing the amount of work required of
   the MMUSIC, RTCWEB,and CLUE working groups.


3.  IANA Considerations

   This document makes no requests of IANA.


4.  Security Considerations

   The author does not beleive that the syntax under discussion has an
   impact on the security properties of those protocols that make use of
   SDP.






Roach                    Expires August 4, 2013                 [Page 8]


Internet-Draft             Media Stream Syntax              January 2013


5.  Normative References

   [I-D.holmberg-mmusic-sdp-mmt-negotiation]
              Holmberg, C., Alvestrand, H., and J. Lennox, "Multiplexed
              Media Types (MMT) Using Session Description Protocol (SDP)
              Port Numbers",
              draft-holmberg-mmusic-sdp-mmt-negotiation-00 (work in
              progress), October 2012.

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C. and H. Alvestrand, "Multiplexing Negotiation
              Using Session Description Protocol (SDP) Port Numbers",
              draft-ietf-mmusic-sdp-bundle-negotiation-01 (work in
              progress), August 2012.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.


Author's Address

   Adam Roach
   Mozilla
   Dallas, TX
   US

   Email: adam@nostrum.com



















Roach                    Expires August 4, 2013                 [Page 9]