A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources
draft-lennox-raiarea-rtp-grouping-taxonomy-00

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft whose latest revision state is "Replaced".
Authors Jonathan Lennox , Kevin Gross
Last updated 2013-02-18
Replaced by draft-ietf-avtext-rtp-grouping-taxonomy, RFC 7656
Network Working Group                                          J. Lennox
Internet-Draft                                                     Vidyo
Intended status: Informational                                  K. Gross
Expires: August 22, 2013                                             AVA
                                                       February 18, 2013

A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport
                         Protocol (RTP) Sources
             draft-lennox-raiarea-rtp-grouping-taxonomy-00

Abstract

   The terminology about, and associations among, Real-Time Transport
   Protocol (RTP) sources can be complex and somewhat opaque.  This
   document describes a number of existing and proposed relationships
   among RTP sources, and attempts to define common terminology for
   discussing protocol entities and their relationships.

   This document is still very rough, but is submitted in the hopes of
   making future discussion productive.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 22, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents

Lennox & Gross           Expires August 22, 2013                [Page 1]
Internet-Draft            RTP Grouping Taxonomy            February 2013

   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
   2.  RTP Terminology  . . . . . . . . . . . . . . . . . . . . . . .  4
     2.1.  Synchronization Source . . . . . . . . . . . . . . . . . .  4
     2.2.  RTP Session  . . . . . . . . . . . . . . . . . . . . . . .  4
     2.3.  Contributing Source  . . . . . . . . . . . . . . . . . . .  5
     2.4.  Multimedia Session . . . . . . . . . . . . . . . . . . . .  5
     2.5.  Media Source . . . . . . . . . . . . . . . . . . . . . . .  5
   3.  Terminology used by other protocols and groups . . . . . . . .  5
     3.1.  SDP  . . . . . . . . . . . . . . . . . . . . . . . . . . .  5
       3.1.1.  SDP Multimedia Session . . . . . . . . . . . . . . . .  6
       3.1.2.  CLUE . . . . . . . . . . . . . . . . . . . . . . . . .  6
       3.1.3.  WebRTC . . . . . . . . . . . . . . . . . . . . . . . .  7
   4.  Grouping of RTP sources  . . . . . . . . . . . . . . . . . . .  8
     4.1.  Overview of hierarchical relationships among groups  . . .  8
     4.2.  Existing and proposed source group semantics . . . . . . .  8
       4.2.1.  Relationships among media sources  . . . . . . . . . .  8
       4.2.2.  Alternative representations of media sources . . . . .  9
       4.2.3.  Robustness and Repair  . . . . . . . . . . . . . . . .  9
       4.2.4.  Reporting Associations . . . . . . . . . . . . . . . .  9
   5.  Existing and proposed mechanisms for describing groups . . . .  9
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 10
   7.  Open Issues  . . . . . . . . . . . . . . . . . . . . . . . . . 10
   8.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 10
   9.  Informative References . . . . . . . . . . . . . . . . . . . . 10
   Appendix A.  RTP terminology . . . . . . . . . . . . . . . . . . . 11
     A.1.  Capture  . . . . . . . . . . . . . . . . . . . . . . . . . 11
     A.2.  CNAME  . . . . . . . . . . . . . . . . . . . . . . . . . . 11
     A.3.  End system . . . . . . . . . . . . . . . . . . . . . . . . 11
     A.4.  Group  . . . . . . . . . . . . . . . . . . . . . . . . . . 11
     A.5.  Multimedia session . . . . . . . . . . . . . . . . . . . . 12
     A.6.  Multi-stream transmission  . . . . . . . . . . . . . . . . 12
     A.7.  Receiver . . . . . . . . . . . . . . . . . . . . . . . . . 12
     A.8.  Repair stream  . . . . . . . . . . . . . . . . . . . . . . 12
     A.9.  RTP endpoint . . . . . . . . . . . . . . . . . . . . . . . 12
     A.10. RTP session  . . . . . . . . . . . . . . . . . . . . . . . 12
     A.11. Scene  . . . . . . . . . . . . . . . . . . . . . . . . . . 12
     A.12. Sender . . . . . . . . . . . . . . . . . . . . . . . . . . 12
     A.13. Simulcast  . . . . . . . . . . . . . . . . . . . . . . . . 12
     A.14. Source . . . . . . . . . . . . . . . . . . . . . . . . . . 12
     A.15. Media stream . . . . . . . . . . . . . . . . . . . . . . . 13
     A.16. Switched capture . . . . . . . . . . . . . . . . . . . . . 13
     A.17. Track  . . . . . . . . . . . . . . . . . . . . . . . . . . 13
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13

1.  Introduction

   The terminology for sources in RTP, and the understanding of how
   they relate, are very confused.  Different protocols use the same
   terms differently, and complexities addressed at one layer are often
   glossed over or ignored at another.

   This document attempts to provide some clarity by reviewing the
   semantics of various aspects of sources in RTP.  As an organizing
   mechanism, it approaches this by describing various ways that RTP
   sources can be grouped and associated together.

2.  RTP Terminology

2.1.  Synchronization Source

   In RTP, the smallest unit of media transport is the "synchronization
   source" (SSRC).  Defined in [RFC3550], a synchronization source is
   the source of a stream of RTP packets, identified by a 32-bit numeric
   SSRC identifier carried in the RTP header of every RTP packet.  All
   packets from a synchronization source form part of the same timing
   and sequence number space, so a receiver groups packets by
   synchronization source for playback.  In the most common cases, a
   synchronization source is a single flow of encoded media that can be
   fed to a single decoding process.
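
   As an illustration only, the following Python sketch shows how a
   receiver might read the SSRC identifier and group packets by
   synchronization source.  The field positions (version in the top two
   bits of the first octet, SSRC in octets 8 through 11) follow the
   fixed header of [RFC3550]; the function names and the make_packet
   helper are hypothetical and exist only for this example.

```python
import struct
from collections import defaultdict


def parse_ssrc(packet: bytes) -> int:
    """Return the SSRC carried in the 12-octet fixed RTP header."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    if packet[0] >> 6 != 2:
        raise ValueError("not RTP version 2")
    # Octets 8..11 hold the SSRC in network byte order.
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    return ssrc


def group_by_ssrc(packets):
    """Group packets by SSRC, as a receiver does before playback."""
    groups = defaultdict(list)
    for pkt in packets:
        groups[parse_ssrc(pkt)].append(pkt)
    return dict(groups)


def make_packet(seq, timestamp, ssrc, payload=b""):
    """Hypothetical helper: minimal RTP packet (V=2, PT=96, no CSRCs)."""
    header = struct.pack("!BBHII", 0x80, 96, seq, timestamp, ssrc)
    return header + payload
```

   Each key of the returned dictionary corresponds to one
   synchronization source, with its own timing and sequence number
   space.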

   In some more complex cases, a synchronization source is also used to
   encode repair flows and sub-flows of a layered encoding.  See
   Section 4.2.2 and Section 4.2.3 below.

   (Note: as the definition above mentions, "synchronization source" in
   [RFC3550] technically refers to the *source* of the stream of RTP
   packets, not the packets themselves.  A better term for the stream
   could be helpful, and the authors are open to one; however, finding a
   good term that doesn't collide with terms used in other standards is
   very difficult.  In particular, both "stream" and "flow" are
   overloaded enough that their use would probably confuse more than it
   clarifies.)

2.2.  RTP Session

   RTP transmits sources in an "RTP session".  An RTP session is an
   association among a group of participants communicating with RTP.  It
   is a group communications channel that carries a number of
   synchronization sources.  Within an RTP session, every participant
   learns meta-data and control information (over RTCP) about all the
   synchronization sources in the session, all synchronization source
   identifiers are unique, and the bandwidth of the RTCP control channel
   is shared by all the synchronization sources in the session.

2.3.  Contributing Source

   In an RTP session, in some cases, the media flows of some
   synchronization sources are not forwarded by RTP middleboxes (known
   as mixers and translators).  In the case of mixers, each of the
   synchronization sources that are used to create a mixed stream of
   media becomes visible instead as a "contributing source" (CSRC),
   whose media is no longer directly visible in some parts of the RTP
   session, but whose meta-data is still known by all participants
   through RTCP.
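
   As a sketch of how contributing sources appear on the wire, the
   Python fragment below reads the CSRC list that a mixer places after
   the fixed RTP header.  The CSRC count in the low four bits of the
   first octet, and the 32-bit CSRC identifiers starting at octet 12,
   follow [RFC3550]; the function names and the make_mixed_packet
   helper are hypothetical.

```python
import struct


def parse_csrcs(packet: bytes) -> list:
    """Return the list of CSRC identifiers carried in an RTP packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    csrc_count = packet[0] & 0x0F  # CC field: 0 to 15 identifiers
    if len(packet) < 12 + 4 * csrc_count:
        raise ValueError("packet truncated inside the CSRC list")
    # CSRC identifiers are 32-bit values in network byte order.
    return list(struct.unpack_from("!%dI" % csrc_count, packet, 12))


def make_mixed_packet(mixer_ssrc, csrcs, seq=0, timestamp=0):
    """Hypothetical helper: header of a mixed packet (V=2, PT=96)."""
    if len(csrcs) > 15:
        raise ValueError("at most 15 CSRCs fit in the CC field")
    header = struct.pack("!BBHII", 0x80 | len(csrcs), 96,
                         seq, timestamp, mixer_ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)
```

   The SSRC of such a packet identifies the mixer, while the CSRC list
   names the synchronization sources whose media was combined.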

2.4.  Multimedia Session

   A set of RTP sessions can be combined into a "multimedia session": a
   group of concurrent, associated RTP sessions among a common group of
   participants.  A multimedia session defines logical relationships
   among sources that appear in multiple RTP sessions.

2.5.  Media Source

   At a higher level than this is the "media source", a single piece of
   media that can be independently rendered.  While in the simplest
   cases a synchronization source uniquely encodes a media source, many
   more complicated cases, discussed in Section 4.2.2 and Section 4.2.3
   below, associate multiple synchronization sources together to provide
   various methods and options for encoding a media source.  (In some of
   these cases, these multiple synchronization sources are even sent in
   separate RTP sessions, though always in the same multimedia session.)
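
   The containment relationships described in this section can be
   sketched as a small data model.  This is only one possible reading,
   offered as an aid to discussion: the class names, the string labels
   for media sources, and the attach method are all hypothetical.  The
   sketch assumes only what the text above states, namely that SSRC
   identifiers are unique within an RTP session, and that one media
   source may be carried by several synchronization sources, possibly
   in different RTP sessions of the same multimedia session.

```python
from dataclasses import dataclass, field


@dataclass
class RtpSession:
    name: str
    ssrcs: set = field(default_factory=set)

    def add_source(self, ssrc: int) -> None:
        # SSRC identifiers must be unique within one RTP session.
        if ssrc in self.ssrcs:
            raise ValueError("SSRC collision within the RTP session")
        self.ssrcs.add(ssrc)


@dataclass
class MultimediaSession:
    # RTP session name -> RtpSession
    rtp_sessions: dict = field(default_factory=dict)
    # media source label -> set of (RTP session name, SSRC)
    media_sources: dict = field(default_factory=dict)

    def add_rtp_session(self, name: str) -> RtpSession:
        session = RtpSession(name)
        self.rtp_sessions[name] = session
        return session

    def attach(self, label: str, session_name: str, ssrc: int) -> None:
        """Record that an SSRC carries (part of) a media source."""
        self.rtp_sessions[session_name].add_source(ssrc)
        self.media_sources.setdefault(label, set()).add(
            (session_name, ssrc))
```

   For example, a camera sent as a layered encoding over two RTP
   sessions would be one media source label attached to one SSRC in
   each session.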

   Unlike the other terminology defined in this section, "media source"
   is not a well-established term; existing protocols use a number of
   different terms to define similar concepts, described below.  ("Media
   source" may not be the best term for this concept; the authors of
   this document are open to better ones.)

3.  Terminology used by other protocols and groups

3.1.  SDP

   The Session Description Protocol [RFC4566] provides a mechanism for
   describing -- and, with SDP Offer/Answer [RFC3264], negotiating --
   multimedia sessions.

3.1.1.  SDP Multimedia Session

   SDP uses the term "multimedia session" in essentially the same manner
   as RTP.

3.1.1.1.  Media Stream

   "Media stream" is perhaps one of the most confusing terms in SDP.  It
   is never explicitly defined in [RFC4566]; its meaning can only be
   inferred from descriptions of its properties, the SDP syntax used to
   describe it, and its usage in other protocols.

   In SDP, a media stream is described by a "media description", a block
   of SDP starting with an "m=" line and followed by other SDP fields
   which are scoped to that media description.
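
   To make the scoping rule concrete, the following Python sketch
   splits an SDP document into its session-level lines and its media
   descriptions, starting a new block at each "m=" line.  It is a
   simplified illustration rather than a conforming SDP parser; the
   function name and the example addresses and ports are assumptions
   made for this example.

```python
def split_media_descriptions(sdp: str):
    """Return (session_level_lines, list_of_media_descriptions)."""
    session_lines = []
    media = []
    for line in sdp.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("m="):
            # An "m=" line opens a new media description.
            media.append([line])
        elif media:
            # Later lines are scoped to the most recent "m=" line.
            media[-1].append(line)
        else:
            session_lines.append(line)
    return session_lines, media


EXAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0
a=sendrecv
m=video 51372 RTP/AVP 96
a=rtpmap:96 H264/90000
"""
```

   Applied to EXAMPLE_SDP, the function yields five session-level lines
   and two media descriptions, one for audio and one for video.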

   When SDP is used to describe a media stream that uses RTP, the
   attributes used (in the core SDP protocol) all define attributes of
   an RTP session.  Furthermore, [RFC4566] does note that "Media streams
   can be many-to-many."  Thus, when SDP describes a media stream that
   uses RTP, the RTP-level protocol element that is described is an RTP
   session, which can carry any number of synchronization sources (and
   thus media sources).

   However, many implementations of SDP, notably SIP devices, instead
   interpret SDP media streams as containing only a single media source
   in each direction (assuming unicast flows).  There are a number of
   reasons for this implementation decision, including the lack of a
   definition of "media stream" in the SDP specification; the poorly
   chosen name (the term "media stream" seems like something that should
   refer to what this document calls a "media source"); the fact that
   core SDP, and offer/answer, provide no description or negotiation
   mechanisms for anything more fine-grained than a media source; and
   the fact that most existing implementations have no need for more
   than one media source sent and received of each media type.

   Other than in discussions of other existing terminology, this
   document attempts to avoid the use of the word "stream".

3.1.2.  CLUE

   CLUE is ongoing work in the IETF to define protocols and mechanisms
   for interoperable telepresence systems.  Its
   architecture and terms are defined in [I-D.ietf-clue-framework].

3.1.2.1.  CLUE Capture Scene

   In CLUE, a "Capture Scene" is a structure representing a spatial
   region containing one or more Capture Devices, each capturing media
   representing a portion of the region.  The spatial region represented
   by a scene may or may not correspond to a real region in physical
   space, such as a room.  A capture scene includes attributes and one
   or more capture scene entries, with each entry including one or more
   media captures.

3.1.2.2.  CLUE Capture Scene Entry

   In CLUE, a "Capture Scene Entry" is a list of Media Captures of the
   same media type that together form one way to represent the entire
   Capture Scene.

3.1.2.3.  CLUE Capture

   In CLUE, a "Media Capture" is a source of Media, such as from one or
   more Capture Devices or constructed from other Media streams.

   The "Capture" is CLUE's term that is most closely equivalent to what
   this document calls a "media source"; however, some usages (such as
   "dynamic sources" or "switched captures") may be more complex, and
   are still actively being defined.

3.1.2.4.  CLUE Capture Encoding

   In CLUE, a "Capture Encoding" is a specific encoding of a Media
   Capture, to be sent by a Media Provider to a Media Consumer via RTP.

   In simulcast scenarios (see Section 4.2.2), multiple capture
   encodings can be used simultaneously to transmit a single capture.

3.1.3.  WebRTC

   WebRTC is ongoing work in the IETF and W3C to define mechanisms and
   APIs to allow JavaScript applications in web browsers to participate
   in multimedia sessions.  An overview can be found in
   [I-D.ietf-rtcweb-overview].

3.1.3.1.  WebRTC RtcMediaStreamTrack

   An "RtcMediaStreamTrack" is an object in WebRTC that maps most
   closely to what this document calls a "media source", though some of
   the details remain to be defined.  However, as WebRTC is defining an
   API, the RtcMediaStreamTrack object also includes a good deal of
   meta-data about the media source, such as an ID, enabled/disabled
   state, and so forth.

3.1.3.2.  WebRTC RtcMediaStream

   An "RtcMediaStream" in WebRTC is an object that groups a set of
   RtcMediaStreamTracks that are synchronized with each other.

4.  Grouping of RTP sources

   Synchronization sources can have a large variety of relationships
   among them.  These relationships can apply both between sources
   within a single session, and between sources that occur in multiple
   sessions.

   Ways of relating synchronization sources typically involve groups: a
   set of synchronization sources has some relationship that applies to
   all those in the group, and no others.  (Relationships that involve
   arbitrary non-grouping associations among synchronization sources,
   such that e.g., A relates to B and B to C, but A and C are unrelated,
   are uncommon if not nonexistent.)

   In many cases, the semantics of groups are not simply that the
   sources form an undifferentiated group, but rather that members of
   the group have certain roles.

4.1.  Overview of hierarchical relationships among groups

   Many (though not all) associations among groups can be described
   hierarchically.

   These hierarchical groups can be divided into three large
   categories: associations among independent media sources; alternative
   representations of media sources; and mechanisms for robustness and
   repair of a particular representation of a media source.

4.2.  Existing and proposed source group semantics

   We can divide group semantics into several broad categories.

   (TODO: each of the items below needs to be described in detail.)

4.2.1.  Relationships among media sources

   Synchronization Contexts -- things identified by a CNAME.

   CLUE Scenes (and their substructures, down to Captures).

   WebRTC MediaStreams.

   Clock source signaling.

4.2.2.  Alternative representations of media sources

   Simulcast.

   Multi-Stream Transmission of Layered Codecs.

   The original FID description in RFC 5888 (grouping) Section 8.

4.2.3.  Robustness and Repair

   Forward Error Correction (FEC).  (But in some cases this can also be
   across multiple media sources!)

   Retransmission.

4.2.4.  Reporting Associations

   RTP also uses SSRC values to identify receivers of media, the source
   of feedback about synchronization sources (and thus, also media
   sources).  In some cases these receivers are also grouped.  However,
   these associations are not associations of senders of media, and thus
   are not the same type of relationship as the ones described above.

   Inter-Destination Media Synchronization

   Reporting groups.

5.  Existing and proposed mechanisms for describing groups

   SDP groups

   SDP source groups

   SRCNAME

   MSID

   CLUE

   Application-specific SDP attributes

   SDP "number of ports" field to create multiple sessions with SSRC
   alignment.

6.  Security Considerations

   Different for each way of grouping.  Is there anything we can
   generalize?

   Hopefully having a well-defined common terminology and understanding
   of the complexities of the RTP architecture will help lead us to
   better standards, avoiding security problems.

7.  Open Issues

   Much of the terminology is still a matter of dispute.

   It might be useful to distinguish between a single endpoint's view of
   a source, or RTP session, or multimedia session, versus the full set
   of sessions and every endpoint that's communicating in them, with the
   signaling that established them.

   (Sure to be many more...)

8.  IANA Considerations

   This document makes no request of IANA.

9.  Informative References

   [I-D.ietf-clue-framework]
              Duckworth, M., Pepperell, A., and S. Wenger, "Framework
              for Telepresence Multi-Streams",
              draft-ietf-clue-framework-08 (work in progress),
              December 2012.

   [I-D.ietf-rtcweb-overview]
              Alvestrand, H., "Overview: Real Time Protocols for Brower-
              based Applications", draft-ietf-rtcweb-overview-05 (work
              in progress), December 2012.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC6222]  Begen, A., Perkins, C., and D. Wing, "Guidelines for
              Choosing RTP Control Protocol (RTCP) Canonical Names
              (CNAMEs)", RFC 6222, April 2011.

Appendix A.  RTP terminology

   (TODO: This list of definitions needs to be cleaned up and expanded,
   if it's useful.  Possibly its content should simply be merged into
   the appropriate sections of the main document, instead.)

A.1.  Capture

   "Capture" is the term used in the IETF CLUE Telepresence Framework to
   refer to what this document calls a "media source".

A.2.  CNAME

   An RTP canonical name (RTP CNAME or CNAME) is a persistent transport-
   level identifier for an RTP endpoint (Appendix A.9).  CNAMEs must be
   unique within an RTP session (Section 2.2).  The RTCP CNAME can be
   either persistent across different RTP sessions for an RTP endpoint
   or unique per session, meaning that an RTP endpoint chooses a
   different CNAME for each session [RFC6222].  Receivers (Appendix A.7)
   require the CNAME to keep track of each participant.  Receivers may
   also require the CNAME to associate multiple data streams from a
   given participant in a set of related RTP sessions, for example to
   synchronize audio and video.
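
   As a rough sketch of that association step, the Python fragment
   below groups synchronization sources by CNAME.  The input format, a
   sequence of (SSRC, CNAME) pairs such as a receiver might collect
   from RTCP SDES packets, and the function name are assumptions made
   for this example.

```python
from collections import defaultdict


def sources_by_cname(sdes_items):
    """Map each CNAME to the set of SSRCs that announced it."""
    contexts = defaultdict(set)
    for ssrc, cname in sdes_items:
        contexts[cname].add(ssrc)
    return dict(contexts)


# Hypothetical observations: one participant sends audio and video,
# a second participant sends audio only.
observed = [
    (0x1001, "alice@192.0.2.10"),
    (0x2002, "alice@192.0.2.10"),
    (0x3003, "bob@192.0.2.20"),
]
```

   Sources that share a CNAME, such as the first two entries above,
   are candidates for synchronized playback.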

A.3.  End system

   An application that generates the content to be sent in RTP packets
   and/or consumes the content of received RTP packets.  An end system
   can produce one or more synchronization sources (Section 2.1) in a
   particular RTP session (Section 2.2) [RFC3550].

A.4.  Group

A.5.  Multimedia session

   A set of concurrent, associated RTP sessions (Section 2.2) among a
   common group of participants [RFC3550].

A.6.  Multi-stream transmission

   Multi-stream transmission (MST) is a mechanism by which different
   portions of a layered encoding of a media stream are sent using
   separate synchronization sources (sometimes in separate RTP sessions).
   MSTs are useful for receiver control of layered media.

A.7.  Receiver

A.8.  Repair stream

   A repair stream is a secondary stream used for error recovery.  The
   repair mechanisms available include
   o  duplication of the original stream,
   o  duplication of the original stream with a time offset,
   o  forward error correction (FEC) techniques, and
   o  retransmission of lost packets (either globally or selectively).

A.9.  RTP endpoint

A.10.  RTP session

   An RTP session, or simply a session, is an association among a set
   of participants communicating with RTP [RFC3550].

A.11.  Scene

   A scene is associated with IETF CLUE and describes a collection of
   captures (Appendix A.1) and the metadata required to make
   associations (e.g., relative spatial orientation of cameras) between
   the captures.

A.12.  Sender

A.13.  Simulcast

A.14.  Source

   A synchronization source (SSRC) is the source of a stream of RTP
   packets, identified by a 32-bit numeric SSRC identifier carried in
   the RTP header so as not to be dependent upon the network address.
   All packets from a synchronization source form part of the same
   timing and sequence number space, so a receiver groups packets by
   synchronization source for playback [RFC3550].

A.15.  Media stream

A.16.  Switched capture

   A switched capture is used in teleconference applications and is a
   stream (Appendix A.15) whose content is selected from one of a set
   of sources.  The source is selected based on an algorithm in the
   equipment producing the stream; for example, a closeup of the person
   currently speaking in a conference room.

A.17.  Track

   A track is associated with W3C WebRTC and IETF RTCWeb projects.  A
   track is defined by SSRC and CNAME.  A track is the smallest media
   entity that may be independently rendered.  A single audio channel is
   a track.

Authors' Addresses

   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ  07601
   US

   Email: jonathan@vidyo.com

   Kevin Gross
   AVA Networks, LLC
   Boulder, CO
   US

   Email: kevin.gross@avanw.com
