AVT T. Schierl
Internet-Draft Fraunhofer HHI
Intended status: Informational J. Lennox
Expires: April 30, 2009 Vidyo
October 27, 2008
Multi-Session and Multi-Source Transmission in the Real-Time Transport
Protocol (RTP)
draft-schierl-avt-rtp-multi-session-transmission-00
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 30, 2009.
Abstract
In this draft, we discuss problems related to multi-session and
multi-source transmission using the Real-Time Transport Protocol
(RTP). Most of the input to this draft is taken from email
discussion. Multi-session and multi-source transmission are motivated
by media data that allows for different transport layer treatment of
parts of the media. This is typically the case for layered media.
Multi-session transmission is when media data from a single media
source is split over multiple RTP sessions. Single-session multi-
source transmission (from now on just called "multi-source
transmission") is when data from a single media source is sent as
Schierl & Lennox Expires April 30, 2009 [Page 1]
Internet-Draft RTP Multi-Session Transmission October 2008
several RTP streams in the same RTP session. The main problems
discussed are the mechanisms used for data alignment and source
correlation. This draft further gives an overview of payload formats
using multi-session/multi-source transmission and highlights other
transport-related issues. The draft concludes with recommendations
for the discussed problems.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5
4. Existing Users of Multi-Session and Multi-Source
Transmission . . . . . . . . . . . . . . . . . . . . . . . . . 5
4.1. Progressive Video with Hybrid (PVH) . . . . . . . . . . . 5
4.2. H.264 Scalable Video Coding (SVC) . . . . . . . . . . . . 6
4.3. H.264 Multi-View Coding (MVC) . . . . . . . . . . . . . . 6
4.4. G.718: Embedded Variable Bit-Rate (EV-VBR)
Speech/Audio Codec . . . . . . . . . . . . . . . . . . . . 6
4.5. MPEG Surround . . . . . . . . . . . . . . . . . . . . . . 7
4.6. RTP Forward Error Correction . . . . . . . . . . . . . . . 7
4.7. RTP Retransmission . . . . . . . . . . . . . . . . . . . . 7
5. Topology Overview . . . . . . . . . . . . . . . . . . . . . . 8
6. Requirements for multi-session transmission . . . . . . . . . 8
6.1. Requirements on Data Alignment . . . . . . . . . . . . . . 8
6.2. Requirements on Source Correlation . . . . . . . . . . . . 9
7. Review of techniques for Data Alignment . . . . . . . . . . . 9
7.1. NTP Timestamp Alignment using RTCP Sender Report (SR)
Packets . . . . . . . . . . . . . . . . . . . . . . . . . 9
7.1.1. Identified problems . . . . . . . . . . . . . . . . . 10
7.2. Review of other potential techniques for Data Alignment . 12
7.2.1. RTP Timestamp Alignment . . . . . . . . . . . . . . . 12
7.2.2. Initial RTP Timestamp or RTP Timestamp Offset
Signaling . . . . . . . . . . . . . . . . . . . . . . 12
7.2.3. CCM message - need NTP update . . . . . . . . . . . . 13
7.2.4. Multiple early RTCP SRs . . . . . . . . . . . . . . . 13
7.2.5. Codec-Specific Mechanisms . . . . . . . . . . . . . . 13
7.2.6. RTP header extension . . . . . . . . . . . . . . . . . 14
8. Review of techniques for Source Correlation . . . . . . . . . 14
8.1. Source Correlation using CNAME in SDES . . . . . . . . . . 14
8.2. Review of other potential techniques for Source
Correlation . . . . . . . . . . . . . . . . . . . . . . . 15
8.2.1. Single SSRC Space . . . . . . . . . . . . . . . . . . 15
8.2.2. SSRC Groups . . . . . . . . . . . . . . . . . . . . . 15
8.2.3. CNAME in Source Attributes . . . . . . . . . . . . . . 16
8.2.4. Application-specific Inference of Association . . . . 16
9. Summary of RTP solution for Data Alignment and Source
Correlation . . . . . . . . . . . . . . . . . . . . . . . . . 16
9.1. Data Alignment in RTP . . . . . . . . . . . . . . . . . . 16
9.2. Source Correlation in RTP . . . . . . . . . . . . . . . . 16
9.3. Dependency signaling . . . . . . . . . . . . . . . . . . . 17
10. Recommendations . . . . . . . . . . . . . . . . . . . . . . . 17
11. Other transport related issues for multi-session
transmission . . . . . . . . . . . . . . . . . . . . . . . . . 18
11.1. Inter-session Jitter . . . . . . . . . . . . . . . . . . . 18
11.2. Inter-session Interleaving . . . . . . . . . . . . . . . . 18
12. Security Considerations . . . . . . . . . . . . . . . . . . . 18
13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18
14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18
14.1. Normative References . . . . . . . . . . . . . . . . . . . 18
14.2. Informative References . . . . . . . . . . . . . . . . . . 19
Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 20
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 21
Intellectual Property and Copyright Statements . . . . . . . . . . 22
1. Introduction
Multi-session transmission is when media data from a single media
source is split over multiple Real-Time Transport Protocol (RTP)
[RFC3550] sessions. This is usually done because different transport
layer treatment is desired for different aspects of the media source,
e.g., different multicast groups or different traffic classes. If
the traffic is being sent using multicast routing, this is often
known as "layered multicast."
Single-session multi-source transmission (from now on just called
"multi-source transmission") is when data from a single media source
is sent as several RTP streams in the same RTP session. In this
case, the streams need to be treated differently by RTP (e.g. with
separate RTCP statistics, or selective forwarding by RTP translators)
but do not need different transport characteristics. This is often
referred to as "SSRC multiplexing", after the synchronization source
identifier (SSRC) which distinguishes sources in an RTP session.
Such techniques are often used for "layered" or "embedded" codecs
(the former term is typically used for video, the latter for audio).
A lower-bitrate, and often lower-complexity, stream (known as the
"base"), often backward-compatible with older codecs, provides basic
media quality, while one or more additional streams (known as
"enhancements") provide richer media or otherwise provide an enhanced
user experience. Various layered and embedded codecs are discussed
in Section 4.
Multi-session and multi-source transmission are also used for stream
robustness. Both RTP Forward Error Correction [RFC5109] and RTP
Retransmission [RFC4588] use multi-session transmission, and the
latter can optionally use multi-source transmission as well.
For both multi-session and multi-source transmission, two issues
arise: how streams are correlated, i.e. how receivers determine which
base and enhancement streams carry data for the same media source;
and how streams are aligned, i.e. how receivers determine which
packets of the base stream are associated with which packets of the
enhancement stream.
2. Definitions
multi-session transmission: In multi-session transmission, media
data from a single media source is split over multiple RTP
sessions. The term "layered multicast" is equivalent to multi-
session transmission for sessions using multicast addresses.
multi-source transmission: In multi-source transmission, data from a
single media source is sent as several RTP streams in the same RTP
session. The sources contained in an RTP session are identified
by their synchronization source identifiers (SSRCs) or, if
combined by an RTP mixer, by their contributing source identifiers
(CSRCs), as defined in RTP [RFC3550].
associated multimedia streams: Associated multimedia streams are
independent media sources from the same session participant, e.g.
audio and video sources, or multiple cameras from a single
participant. Each source can have an independent media clock,
reflecting the device that captured the media. For live media,
these clocks will often drift relative to each other, over and
above their often inherently-different clock rates. In RTP, each
stream has separate initial RTP timestamps and sequence numbers.
Related sources are associated using the RTCP Canonical Name
(CNAME) Source Description (SDES) field. A common time base may
be computed using NTP timestamps, based on information carried in
RTCP Sender Report (SR) packets. The sources are typically
synchronized ("lip-synced") by receivers when rendered, based on
the computed NTP timestamps.
Data Alignment: Assembling data of the same media frame that is
transferred in different sessions, or as different sources in the
same session, as part of layered media. The assembly of the
media frame must be achieved before decoding; otherwise the
decoding process typically fails or may only be possible at a
reduced quality.
Source Correlation: The logical association of RTP streams,
transferred as multiple separate sessions or as multiple sources
in the same session, with one layered media source.
3. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
4. Existing Users of Multi-Session and Multi-Source Transmission
4.1. Progressive Video with Hybrid (PVH)
Progressive Video with Hybrid transform (PVH) [McCa96] was used in
the initial demonstration of multi-session transmission. PVH was the
initial driver for adding text on layered multicast to the Real-Time
Transport Protocol (RTP) [RFC3550]. Data Alignment was done using
packets' RTP timestamps.
4.2. H.264 Scalable Video Coding (SVC)
H.264 Scalable Video Coding (SVC) [I-D.ietf-avt-rtp-svc] extends the
H.264 [RFC3984] video standard to provide spatial, temporal, and
quality (signal-to-noise) enhancements. The base layer of SVC is
backward-compatible with existing H.264 decoders. A base layer sent
separately using the H.264 [RFC3984] payload format can be received
and processed by existing devices. The Payload Format for SVC uses
the multi-session transmission approach. Currently two basic modes
are defined in the SVC Payload Format for decoding order recovery of
media data received from multiple sessions:
Data Alignment based on NTP timestamps: This method is used in the
NI-T and NI-TC mode defined in [I-D.ietf-avt-rtp-svc]. These
modes currently rely on exact NTP timestamp alignment in order to
recover the decoding order.
Cross-Session Decoding Order Number (CS-DON): This method is used in
the NI-C, NI-TC and I-C modes defined in [I-D.ietf-avt-rtp-svc].
These modes rely on a number (CS-DON), associated with each
packet, that indicates the decoding order across sessions.
4.3. H.264 Multi-View Coding (MVC)
H.264 Multi-View Coding (MVC) [I-D.wang-avt-rtp-mvc] extends the
H.264 [RFC3984] video standard to provide multiple views of a video
stream, for multi-view and 3D applications. Like SVC, MVC is an
extension of H.264 and has a backward-compatible base view, which
can also be decoded by existing H.264 receivers. Thus it is possible
to provide the base view of a multi-session transmission in a
compatible way, using the H.264 payload format [RFC3984]. Since the
new coding approach is mainly based on exploiting references to
other frames of the same view or of different views, it is not
always necessary to receive the base view in order to decode a
desired view. The payload format will rely on the same approaches as
defined in the RTP Payload Format for SVC video
[I-D.ietf-avt-rtp-svc] for decoding order recovery when receiving
data from multiple sessions.
4.4. G.718: Embedded Variable Bit-Rate (EV-VBR) Speech/Audio Codec
G.718, the Embedded Variable Bit-Rate (EV-VBR) speech/audio codec
[I-D.lakaniemi-avt-rtp-evbr] provides an embedded speech-rate
encoder. This codec also allows for multi-session transmission. The
current payload format draft mandates RTCP SRs for Data Alignment in
multi-session transmission.
4.5. MPEG Surround
MPEG Surround (Spatial Audio Coding, SAC) [I-D.ietf-avt-rtp-mps]
enhances MPEG two-channel audio with multi-channel surround sound
while maintaining backward compatibility with two-channel receivers.
The payload format relies on NTP timestamp alignment for
multi-session transmission. The audio codec typically uses different
sampling rates for base and enhancements.
4.6. RTP Forward Error Correction
RTP Generic Forward Error Correction [RFC5109] allows a supplemental
stream to carry additional data for recovery from packet loss. The
repair (FEC) stream is typically sent as a separate RTP session. A
special case
is when the FEC stream is being sent as a secondary codec in the
redundant encoding format. In this case the FEC stream is sent as a
separate source in the same session as the redundant codec. Data
Alignment is achieved using sequence numbers of the FEC protected
packets.
FEC Grouping Issues in Session Description Protocol
[I-D.begen-mmusic-fec-grouping-issues] describes a grouping framework
for FEC and media streams based on the Grouping of Media Lines in the
Session Description Protocol (SDP) [RFC3388] framework. The
framework relies on transmitting the FEC streams in separate
sessions. Data Alignment is achieved by the FEC Framework and depends
on the FEC scheme in use, i.e. each scheme has its own solution for
associating data of the protected and the protecting packet streams.
4.7. RTP Retransmission
RTP Retransmission [RFC4588] allows senders to retransmit RTP packets
indicated by the receiver as lost. The re-sent packets are
transported in a separate stream and may be transmitted within a
separate RTP session or may be transmitted as a separate source in
the same session as the media stream.
If multi-source (i.e., single-session) transmission is being used,
retransmitted packets are sent with a different SSRC. Source
association in this case is done by the sources' CNAMEs, with the
further requirement that a receiver MUST NOT have two outstanding
requests for the same packet sequence number in two different
original streams before the association is resolved.
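The association rule above can be sketched as follows. This is a
simplified, hypothetical receiver (class and method names are not
from [RFC4588]); it assumes candidate streams have already been
narrowed down by CNAME and relies on the 16-bit original sequence
number (OSN) that [RFC4588] places at the start of each
retransmission payload.

```python
import struct


def original_seq_from_rtx(rtx_payload: bytes) -> int:
    """Read the 16-bit original sequence number (OSN) at the start of
    an RFC 4588 retransmission payload (network byte order)."""
    if len(rtx_payload) < 2:
        raise ValueError("retransmission payload too short")
    (osn,) = struct.unpack("!H", rtx_payload[:2])
    return osn


class RtxAssociator:
    """Hypothetical, simplified association of retransmission SSRCs
    (SSRC-multiplexing) with original SSRCs sharing one CNAME."""

    def __init__(self):
        self.pending = {}   # NACKed sequence number -> original SSRC
        self.rtx_map = {}   # retransmission SSRC -> original SSRC

    def is_associated(self, original_ssrc: int) -> bool:
        return original_ssrc in self.rtx_map.values()

    def may_request(self, original_ssrc: int, seq: int) -> bool:
        # Never have two outstanding requests for the same sequence
        # number in two different original streams while either
        # stream's association is still unresolved.
        other = self.pending.get(seq)
        if other is None or other == original_ssrc:
            return True
        return self.is_associated(original_ssrc) and self.is_associated(other)

    def request(self, original_ssrc: int, seq: int) -> bool:
        """Record a NACK if allowed; return whether it may be sent."""
        if not self.may_request(original_ssrc, seq):
            return False
        self.pending[seq] = original_ssrc
        return True

    def on_rtx_packet(self, rtx_ssrc: int, rtx_payload: bytes):
        """Return the original SSRC for a retransmission, or None."""
        osn = original_seq_from_rtx(rtx_payload)
        requested_for = self.pending.pop(osn, None)
        if rtx_ssrc not in self.rtx_map and requested_for is not None:
            # The OSN of the first retransmission resolves the association.
            self.rtx_map[rtx_ssrc] = requested_for
        return self.rtx_map.get(rtx_ssrc)
```

Keying the pending requests by sequence number alone is what makes
the restriction necessary: without it, a retransmission answering an
ambiguous OSN could not be attributed to the right original stream.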
5. Topology Overview
A number of different RTP Topologies [RFC5117] are relevant for
consideration for multi-source and multi-session transmission.
[Ed. TBD: more text on the relation between the approaches presented
in the memo and the mentioned topologies.]
o Point-to-point - Two endpoints communicating using unicast.
o Point-to-multipoint via multicast - Using a multicast transport
mechanism to send packets of one participant to all the other
participants in the multicast group.
o Point-to-multipoint via RTP translator - Using [RFC3550]
translators to send packets of one participant to other
participants of a group. Packets of one or more participants may
be forwarded to the group.
o Point-to-multipoint via RTP mixer - Using [RFC3550] mixers to send
packets of one participant to other participants of a group.
Packets of one or more participants may be forwarded to the group.
o Point-to-multipoint via Video Switching MCUs - Allows for sending
packets from one participant to the other participants in a group.
But typically only one participant's video data is forwarded at a
time to the other participants.
o Point-to-multipoint via RTCP-terminating MCUs - Each participant
is running a point-to-point session with the MCU. Typically, only
one participant's video data is forwarded at a time to the other
participants.
o Point-to-multipoint without a feedback channel - These channels
typically provide IP multicast over a broadcast transmission
medium, which naturally does not provide a bi-directional channel.
This is the case, e.g., for DVB channels using IP over MPE over
MPEG-2 Transport Stream, as for DVB-H or the emerging DVB-SH.
6. Requirements for multi-session transmission
6.1. Requirements on Data Alignment
Synchronization of media streams received from multiple sessions is
typically used for lip-synchronization of audio and video data. For
this purpose, RTP provides a strong tool: each media frame carries an
(RTP) timestamp, generated from an individual clock for each session.
Additionally, RTCP Sender Report packets are sent periodically in
each session, containing (NTP) timestamps from a wallclock common
across all of the sessions, plus the corresponding (RTP) timestamp
that would be generated for a media frame with the signaled wallclock
time. The interval between transmission of RTCP SRs is typically in
the range of multiple
seconds. For a more detailed review of RTP synchronization
techniques, see Section 7.1.
For the reception of layered media, either on multiple sessions or as
multiple sources, it is absolutely essential to allow for immediate
Data Alignment. That is, the Data Alignment must be applied before
the decoding process of the layered media. If Data Alignment is not
applied before decoding, the decoder may not be able to decode the
media at all, or may only be able to produce a media representation
at reduced quality.
6.2. Requirements on Source Correlation
For the reception of layered media, whether on multiple sessions or
as multiple sources, it is absolutely essential to find out prior to
decoding which sessions and sources are correlated. That is, the
receiver needs to know, prior to Data Alignment and decoding, the
inter-session and the inter-source dependency. Notably, for cases in
which multiple independent media sources are transmitted as layered
media in the same session or set of sessions, miscorrelation of
sources could lead to a decoder attempting to use one source's base
layer with another source's enhancement layer.
7. Review of techniques for Data Alignment
7.1. NTP Timestamp Alignment using RTCP Sender Report (SR) Packets
The inter-media synchronization mechanism defined in [RFC3550] uses
RTP timestamps in the RTP packets and a combination of RTP timestamp
and NTP wallclock carried in the RTCP Sender Report (SR) packets.
The RTCP SR packet contains an RTP timestamp in the media timescale
and, as a reference to absolute wallclock time, an NTP timestamp.
The definitions for timestamp generation and synchronization in
section 5.1 and 6.4.1 of [RFC3550] are summarized in the following
list:
o The timestamp reflects the sampling instant of the first octet in
the RTP data packet.
o The sampling instant MUST be derived from a clock that increments
monotonically and linearly in time to allow synchronization and
jitter calculations (see Section 6.4.1 of [RFC3550]).
o The resolution of the clock MUST be sufficient for the desired
synchronization accuracy and for measuring packet arrival jitter
(one tick per video frame is typically not sufficient).
o If RTP packets are generated periodically, the nominal sampling
instant as determined from the sampling clock is to be used, not a
reading of the system clock.
o RTP timestamps from different media streams may advance at
different rates and usually have independent, random offsets.
Therefore, although these timestamps are sufficient to reconstruct
the timing of a single stream, directly comparing RTP timestamps
from different media is not effective for synchronization.
Instead, for each medium the RTP timestamp is related to the
sampling instant by pairing it with a timestamp from a reference
clock (wallclock) that represents the time when the data
corresponding to the RTP timestamp was sampled.
o Receivers should expect that the measurement accuracy of the
timestamp may be limited to far less than the resolution of the
NTP timestamp.
o On a system that has no notion of wallclock time but does have
some system-specific clock such as "system uptime", a sender MAY
use that clock as a reference to calculate relative NTP
timestamps.
o It is important to choose a commonly used clock so that if
separate implementations are used to produce the individual
streams of a multimedia session, all implementations will use the
same clock.
o [Ed. : The RTP timestamp in the SR] corresponds to the same time
as the NTP timestamp (above), but in the same units and with the
same random offset as the RTP timestamps in data packets.
o This correspondence may be used for intra- and inter-media
synchronization for sources whose NTP timestamps are synchronized,
and may be used by media-independent receivers to estimate the
nominal RTP clock frequency.
o In most cases, the RTP timestamp in the SR will not be equal to
the RTP timestamp in any adjacent data packet. Rather, it MUST be
calculated from the corresponding NTP timestamp using the
relationship between the RTP timestamp counter and real time as
maintained by periodically checking the wallclock time at a
sampling instant.
To summarize the definitions in [RFC3550]: the RTCP SR is used to
derive the wallclock (NTP) time of media data from its RTP timestamp,
using the RTP/NTP timestamp pair carried in the SR. If this
synchronization mechanism is correctly implemented, and there is no
jitter in either the media clock or the wallclock, so that an RTP
timestamp and its NTP wallclock timestamp are always guaranteed to be
perfectly aligned, the RTP approach should work fine for Data
Alignment. [Ed. : need more text for summary / review of text above ]
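As a minimal sketch of this mapping (the helper names are
hypothetical), a receiver converts the 64-bit NTP timestamp of the
most recent SR to seconds and adds the signed RTP timestamp
difference divided by the nominal clock rate. Under the ideal
conditions above, packets of different sessions would then belong to
the same media frame exactly when their computed wallclock times
coincide.

```python
RTP_TS_MOD = 2 ** 32


def ntp_to_seconds(ntp64: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to seconds."""
    return (ntp64 >> 32) + (ntp64 & 0xFFFFFFFF) / 2 ** 32


def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int, sr_ntp64: int,
                     clock_rate: int) -> float:
    """Map an RTP timestamp to sender wallclock seconds using one SR.

    The signed 32-bit difference handles timestamp wraparound, assuming
    the packet lies within half the timestamp range of the SR.
    """
    diff = (rtp_ts - sr_rtp_ts) % RTP_TS_MOD
    if diff >= RTP_TS_MOD // 2:
        diff -= RTP_TS_MOD  # packet was sampled before the SR instant
    return ntp_to_seconds(sr_ntp64) + diff / clock_rate
```

For example, a narrow-band base layer at 8000 Hz and a wideband
enhancement at 16000 Hz, each with its own SR, map a 20 ms frame to
the same wallclock time even though their RTP timestamps differ.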
7.1.1. Identified problems
7.1.1.1. Synchronization Delay
Since the RTCP timing rules of [RFC3550] result in RTCP SRs typically
being sent at intervals of several seconds, Data Alignment based on
this information may delay this process, which may lead to delayed
tune-in for the
decoding process. This is typically not the case for decoding media
transferred in exactly one session and source, since synchronization
is not required for decoding, but only for playout. A delay for
playout or lip synchronization does not usually pose a fundamental
problem.
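To illustrate why SRs are typically several seconds apart, the
following sketch transcribes the interval computation of Appendix
A.7 of [RFC3550] into Python, omitting timer reconsideration and
reverse reconsideration. With the recommended 5-second minimum, even
a two-party session schedules reports roughly 2 to 6 seconds apart.

```python
import random

RTCP_MIN_TIME = 5.0            # seconds, recommended minimum interval
RTCP_SENDER_BW_FRACTION = 0.25
COMPENSATION = 2.71828 - 1.5   # e - 3/2, offsets timer reconsideration


def deterministic_rtcp_interval(members: int, senders: int,
                                rtcp_bw: float, we_sent: bool,
                                avg_rtcp_size: float,
                                initial: bool) -> float:
    """Deterministic interval Td in seconds.

    rtcp_bw is the session's RTCP bandwidth in octets per second and
    avg_rtcp_size the average compound RTCP packet size in octets.
    """
    t_min = RTCP_MIN_TIME / 2 if initial else RTCP_MIN_TIME
    n = members
    if senders <= members * RTCP_SENDER_BW_FRACTION:
        if we_sent:
            rtcp_bw *= RTCP_SENDER_BW_FRACTION
            n = senders
        else:
            rtcp_bw *= 1 - RTCP_SENDER_BW_FRACTION
            n = members - senders
    return max(t_min, avg_rtcp_size * n / rtcp_bw)


def rtcp_interval(members: int, senders: int, rtcp_bw: float,
                  we_sent: bool, avg_rtcp_size: float,
                  initial: bool) -> float:
    """Randomized interval used to schedule the next RTCP packet."""
    td = deterministic_rtcp_interval(members, senders, rtcp_bw,
                                     we_sent, avg_rtcp_size, initial)
    return td * random.uniform(0.5, 1.5) / COMPENSATION
```

With 400 octets/s of RTCP bandwidth (5% of 64 kbit/s) and 100-octet
reports, the computed value is far below the minimum, so the 5-second
floor alone determines the delay a receiver faces before the next SR.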
7.1.1.2. Losing synchronization information
The loss of RTCP SR packets may introduce additional delay to the
Data Alignment process, thus a more robust mechanism would be
desirable.
7.1.1.3. Clock Skew
Clock skew between the NTP/system clock and the media clock will
affect the NTP media timestamp generation derived from RTCP SRs and
RTP timestamps. That typically results in different NTP timestamps
for packets of the same media frame transmitted in the different
sessions or transferred as different sources, and leads to
misalignment for the Data Alignment. As far as we know, there is no
way to always guarantee the presence of perfect clocks for media and
NTP/system clock. From the standardization point of view this may
seem to be an implementation issue. However, if this implementation
issue places a burden on senders, such as requiring perfect clocks
for generating timestamps, it needs to be solved in an easy and
general way.
Following the RTP philosophy, clock skew can be estimated by
observing several RTCP SRs. The receiver may use the observation to
compensate for the clock skew. However, this is only possible if
there is no requirement for immediate synchronization of the sort
which is essential for Data Alignment of layered codecs.
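A minimal sketch of such an estimate (function names hypothetical)
fits a least-squares line through the (NTP time, RTP timestamp)
pairs of successive SRs from one source; the slope is the observed
media clock rate. It assumes RTP timestamps have already been
unwrapped. Because several SRs, each several seconds apart, are
needed before the estimate is meaningful, it cannot provide the
immediate alignment that layered decoding requires.

```python
def estimate_clock_rate(sr_pairs):
    """Least-squares slope of RTP timestamp versus NTP time.

    sr_pairs: (ntp_seconds, rtp_ts) tuples from successive SRs of one
    source, with rtp_ts already unwrapped. Returns ticks per second.
    """
    n = len(sr_pairs)
    if n < 2:
        raise ValueError("need at least two SRs")
    mean_t = sum(t for t, _ in sr_pairs) / n
    mean_r = sum(r for _, r in sr_pairs) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in sr_pairs)
    den = sum((t - mean_t) ** 2 for t, _ in sr_pairs)
    if den == 0:
        raise ValueError("SR wallclock times must differ")
    return num / den


def skew_ppm(sr_pairs, nominal_rate):
    """Media clock skew relative to the wallclock, in parts per million."""
    return (estimate_clock_rate(sr_pairs) / nominal_rate - 1.0) * 1e6
```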
Clock skew between the media and NTP/system clocks may be
overcome by using the same clock instance, e.g. the system clock, for
RTP as well as NTP timestamp generation. However, this is not
compliant with RTP, since [RFC3550] mandates the use of a media clock
which is different from the system clock (see definitions in RTP as
cited above in Section 7.1). Indeed, for many codecs, notably audio,
correct decoding requires that the timestamp difference between
subsequent frames exactly correspond to the amount of data sent in
each frame.
7.1.1.4. Accuracy of clocks
Assuming that we have clocks without skew, there is still the
question of accuracy of the clock used for generating the timestamps.
Notably, the Windows system clock is only updated on each system
clock tick, typically every 10 or 15 milliseconds on Windows XP and
Vista. RTP notes that receivers should expect the measurement
accuracy of timestamps to be limited, but an implementation that has
to cope with rounding in the low-order bits cannot simply test two
NTP timestamps for equality. An application may instead have to
compare "ranges" of timestamps to absorb rounding errors. However,
in some cases the required range of NTP timestamps may be greater
than the time interval between consecutive media frames.
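The following hypothetical helper shows the range comparison and why
it breaks down: once the tolerance approaches the frame interval
(about 33 ms at an assumed 30 frames per second), more than one
base-layer frame qualifies as a match.

```python
def match_within_tolerance(base_times, enh_time, tolerance):
    """Return all base-layer frame times within +/- tolerance of
    enh_time (wallclock seconds derived from SRs). Only a single
    candidate constitutes an unambiguous Data Alignment."""
    return [t for t in base_times if abs(t - enh_time) <= tolerance]
```

With base frames at i/30 seconds and an enhancement packet 4 ms after
the fourth frame, a 10 ms tolerance yields one candidate, whereas a
40 ms tolerance yields three.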
7.1.1.5. Existing RTCP SR implementations
As far as we know, existing RTCP SR implementations show a wide range
of alignment problems for generating exact NTP media timestamps for
Data Alignment. NTP alignment issues can be modeled for existing
RTCP senders by capturing the NTP and RTP timestamps in consecutive
SR packets and projecting the NTP timestamp of one SR packet from
the RTP timestamp in that SR packet, the NTP and RTP timestamps in
the previous SR packet, and the codec's nominal clock rate. Initial
experiments have shown NTP timestamp alignment problems on the order
of 40-50 milliseconds for several implementations.
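A sketch of this measurement, following the description above (the
function name is hypothetical): a sender whose SRs are perfectly
consistent with the nominal clock rate yields an error of zero.

```python
def projection_error(prev_sr, cur_sr, clock_rate):
    """NTP alignment error of cur_sr as projected from prev_sr, in seconds.

    Each SR is a tuple (ntp_seconds, rtp_ts) with rtp_ts a 32-bit value;
    the signed modular difference tolerates one wraparound.
    """
    prev_ntp, prev_rtp = prev_sr
    cur_ntp, cur_rtp = cur_sr
    diff = (cur_rtp - prev_rtp) % 2 ** 32
    if diff >= 2 ** 31:
        diff -= 2 ** 32
    projected_ntp = prev_ntp + diff / clock_rate
    return cur_ntp - projected_ntp  # actual minus projected wallclock
```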
7.2. Review of other potential techniques for Data Alignment
7.2.1. RTP Timestamp Alignment
The idea here is to signal the same RTP timestamp for packets
containing data of the same media time instance in the different
sessions. That is, the same clock and the same RTP random offset
would have to be used for the multiple sessions. This method is
backward compatible with using NTP timestamps for inter-media
synchronization as well as for jitter calculation. Furthermore, to
our knowledge, this is the only alternative that has been used for
layered transmission of media (see Section 4.1).
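With a shared clock and offset, alignment reduces to an exact-match
lookup, as in this hypothetical sketch; no RTCP information is
needed, so alignment is possible from the first received packet.

```python
from collections import defaultdict


def group_by_rtp_timestamp(packets):
    """Group packets from several sessions by their shared RTP timestamp.

    packets: iterable of (session_id, rtp_ts, payload). Assuming a common
    clock and random offset across sessions, equal timestamps identify
    data of the same media time instance.
    """
    frames = defaultdict(dict)
    for session_id, rtp_ts, payload in packets:
        frames[rtp_ts].setdefault(session_id, []).append(payload)
    return dict(frames)
```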
7.2.1.1. Identified problems
Using the same RTP timestamp random offset may lead to getting weak
initialization vectors for the encryption method defined in [RFC3550]
if keys are shared across the sessions or streams. Additionally, it
may be unnatural for some codecs to use the same clock rate for the
multiple sessions, for example an audio wideband enhancement layer
enhancing a narrow-band base layer.
7.2.2. Initial RTP Timestamp or RTP Timestamp Offset Signaling
The initial RTP timestamp, or the initial timestamp offsets, could be
signaled as a media-level or source-level attribute in SDP associated
with each stream. This could be done, e.g., using
[I-D.ietf-mmusic-sdp-source-attributes].
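Assuming the signaled initial timestamps of all streams refer to the
same sampling instant (no attribute syntax is defined here, so none
is shown), a receiver could align streams with different clock rates,
such as the narrow-band/wideband example in Section 7.2.1.1, by
comparing exact media times. The helper name is hypothetical.

```python
from fractions import Fraction


def media_time(rtp_ts: int, initial_ts: int, clock_rate: int) -> Fraction:
    """Seconds since stream start, from a signaled initial timestamp.

    Exact rational arithmetic allows equality tests across clock rates.
    Handles 32-bit wraparound, assuming less than one full wrap has
    elapsed since the initial timestamp.
    """
    return Fraction((rtp_ts - initial_ts) % 2 ** 32, clock_rate)
```

For instance, a base frame at timestamp 1394 (8000 Hz, initial 1234)
and an enhancement frame at timestamp 64 (16000 Hz, initial
0xFFFFFF00) both map to 1/50 s.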
7.2.2.1. Identified problems
This may have implications for implementations, since one needs to
know packet-stream-related information, such as the initial RTP
timestamp or the offset between RTP timestamps, when offering a
session. This may be a problem for sessions where multiple senders
are present: it may not always be possible for an SDP creator to
include the initial offsets / timestamps of all participants in
sessions with multiple sending parties.
7.2.3. CCM message - need NTP update
In this case, a receiver would request immediate synchronization
information. This method may reduce the initial delay, but only
works for topologies with bi-directional channels.
7.2.3.1. Identified problems
This method is only feasible for topologies with bidirectional and
reasonably rapid communication channels, i.e. unicast or small-group
multicast. This method also assumes that the NTP timestamp alignment
always works.
7.2.4. Multiple early RTCP SRs
In this case, the sender would generate more RTCP SRs than typically
required and send them at an early point in the session. This method
also works for topologies with uni-directional communication
channels.
7.2.4.1. Identified problems
This method may overflow the RTCP bandwidth. Increasing the RTCP
sender bandwidth may be achieved using SDP bandwidth parameters.
This method may require an adjustment of the RTCP bandwidth of the
session depending on the number of participants and senders.
Further, this approach does not solve the problem for receivers
tuning in to the session after it begins ("random entry"). This
method also assumes that the NTP timestamp alignment always works.
7.2.5. Codec-Specific Mechanisms
This mechanism exploits signaling contained within the payload's data
sections in order to allow Data Alignment. An example is the Cross
Session Decoding Order Number (CS-DON) as defined in
[I-D.ietf-avt-rtp-svc] or as proposed in
[I-D.hannuksela-avt-rtp-svc], where a timestamp or a timestamp delta
of the RTP packet to be aligned is carried by payload specific means.
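Once each packet carries a cross-session decoding order number,
restoring decoding order amounts to merging already-ordered
per-session sequences. The sketch below is generic and hypothetical:
it does not reproduce the CS-DON field definitions of
[I-D.ietf-avt-rtp-svc], and it assumes any number wraparound has
already been resolved into plain integers.

```python
import heapq


def merge_by_decoding_order(*session_streams):
    """Merge per-session packet lists into cross-session decoding order.

    Each stream is a list of (don, payload) sorted by don within its
    session; don is a CS-DON-like integer carried by payload-specific
    means.
    """
    merged = heapq.merge(*session_streams, key=lambda pkt: pkt[0])
    return [payload for _, payload in merged]
```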
7.2.5.1. Identified problems
A payload independent solution for the basic functionality of Data
Alignment is desirable.
7.2.6. RTP header extension
The RTP header extension may be used to add generic signaling about
Data Alignment to RTP packets.
7.2.6.1. Identified problems
RTP header extensions are required to be ancillary information which
can safely be discarded by receivers which do not understand them.
Data alignment mechanisms do not satisfy this requirement.
8. Review of techniques for Source Correlation
8.1. Source Correlation using CNAME in SDES
In RTP, associated multimedia streams (e.g., audio and video sources
from a single participant) have different SSRCs, and are associated
using SDES CNAME fields. While in principle the same technique can
be used to associate streams for multi-session or multi-source
transmission, several issues arise.
Startup latency: while slow lipsync convergence of multimedia streams
is often tolerable, layered sources have to be associated from the
start in order to be decodable, particularly for codec types such as
video with inter-frame decoding dependencies.
If multiple sources are sent from the same participant in the same
session or family of sessions (e.g., from multiple video cameras),
they will have the same CNAME, because they are synchronized with each
other and with any other sources the participant sends. This makes it
impossible to definitively associate base and enhancement sources, as
there may be more than one of each with the same CNAME. This potential
for confusion is the reason for RTP retransmission's restriction on
multiple outstanding RTCP NACKs before stream association has
completed, as described in Section 4.7.
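To illustrate both issues, the following Python sketch looks up the
enhancement-layer sources whose SDES CNAME matches that of a
base-layer source. It is an illustration under stated assumptions, not
a normative procedure: the function name `enhancement_candidates`, the
dictionary representation of received CNAMEs, and the SSRC and CNAME
values are all hypothetical.

```python
# Hypothetical sketch: associating enhancement-layer sources with a
# base-layer source purely by SDES CNAME.

def enhancement_candidates(base_cname, enh_session_cnames):
    """Return enhancement-session SSRCs whose SDES CNAME equals base_cname.

    enh_session_cnames maps SSRC -> CNAME for every source whose SDES
    CNAME has been received so far in the enhancement session.
    Zero results: no RTCP received yet (startup latency).
    More than one result: the association is ambiguous.
    """
    return sorted(ssrc for ssrc, cname in enh_session_cnames.items()
                  if cname == base_cname)


# One participant with two cameras: both enhancement sources share a CNAME.
enh_session = {0x3333: "alice@host.example", 0x4444: "alice@host.example"}

print(enhancement_candidates("alice@host.example", {}))           # []
print(enhancement_candidates("alice@host.example", enh_session))  # [13107, 17476]
```

The empty result models the startup period before any RTCP has
arrived; the two-entry result models the multi-camera case, in which
CNAME matching alone cannot decide the association.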
8.2. Review of other potential techniques for Source Correlation
8.2.1. Single SSRC Space
Motivated by the problems with CNAME association, RTP [RFC3550]
instead specifies a single SSRC space for layered multicast
(multi-session transmission). Furthermore, as described in
Section 9.2, it specifies that SSRC collision detection is performed
only in the base layer.
Performing SSRC collision detection only in the base layer appears to
work for current codecs that use multi-session transmission.
In MVC (Section 4.3), one of the multiple views is by definition the
base view, and this view is backward compatible with H.264. Decoding
a view other than the base view may not require the presence of the
base view. Thus, although MVC is by its nature a layered codec, it
may not always be reasonable to require reception of the base view
solely for collision detection when it is not required for decoding.
Currently, we do not see major relevance for the MVC codec format,
due to its limited coding efficiency, and we therefore do not
consider MVC a compelling application for new Source Correlation
functionality. Leaving MVC aside, the current solution of using the
base layer for SSRC collision detection still seems appropriate.
If needed, collision detection could instead be performed across all,
or a subset of, the sessions used for multi-session transmission.
However, it is not entirely clear how this would work for senders or
receivers that are only participating in a subset of the sessions,
and this would require further study.
8.2.2. SSRC Groups
The Internet-Draft [I-D.ietf-mmusic-sdp-source-attributes] specifies
a mechanism by which related sources can be described as grouped in
SDP. For multi-source (single-session) transmission, this can
provide an alternative way to provide source association.
Clearly, this will only be effective in topologies and signaling
architectures in which the SDP author knows about every source in
the session that will be used for multi-source transmission, and in
which the SDP can be updated when new sources are added or SSRC
collisions occur.
8.2.3. CNAME in Source Attributes
The draft [I-D.ietf-mmusic-sdp-source-attributes] also provides a
mechanism for associating sources' SSRCs with their CNAMEs in SDP.
This can eliminate the startup latency of stream association for the
mechanism described in Section 8.1, though it does not solve the
problem of multiple sources sharing the same CNAME. It also has the
same architectural limitations as the mechanism in Section 8.2.2,
since it relies on SDP.
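As a rough illustration of how a receiver might consume such SDP, the
Python sketch below collects SSRC-to-CNAME mappings and SSRC groups. It
assumes the attribute syntax of the later published version of the
source-attributes specification (RFC 5576), "a=ssrc:<ssrc-id>
cname:<cname>" and "a=ssrc-group:<semantics> <ssrc-id> ...", which may
differ from the draft cited here. The function name, the use of the
FID (flow identification) semantics, and the example values are
illustrative assumptions.

```python
def parse_source_attributes(sdp_lines):
    """Collect SSRC->CNAME mappings and SSRC groups from SDP lines.

    Assumed syntax (RFC 5576 style): "a=ssrc:<ssrc-id> cname:<cname>"
    and "a=ssrc-group:<semantics> <ssrc-id> ...". Other lines are ignored.
    """
    cnames, groups = {}, []
    for line in sdp_lines:
        line = line.strip()
        if line.startswith("a=ssrc-group:"):
            semantics, *ssrcs = line[len("a=ssrc-group:"):].split()
            groups.append((semantics, [int(s) for s in ssrcs]))
        elif line.startswith("a=ssrc:"):
            ssrc, _, attr = line[len("a=ssrc:"):].partition(" ")
            name, _, value = attr.partition(":")
            if name == "cname":
                cnames[int(ssrc)] = value
    return cnames, groups


sdp = [
    "a=ssrc-group:FID 11111 22222",
    "a=ssrc:11111 cname:user1@host.example",
    "a=ssrc:22222 cname:user1@host.example",
]
cnames, groups = parse_source_attributes(sdp)
print(groups)  # [('FID', [11111, 22222])]
```

Because the mappings are available before any RTCP arrives, this
removes the startup latency, but both SSRCs still report the same
CNAME, so the grouping line, not the CNAME, carries the association.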
8.2.4. Application-specific Inference of Association
As described in Section 4.7, it is in some cases possible to use
mechanisms specific to a particular codec or mechanism to determine
stream associations. For retransmission, for instance, a NACK of the
packet with sequence number N on SSRC A, followed by a retransmission
of a packet with original sequence number N on SSRC B, indicates that
SSRC B is the retransmission stream for SSRC A. Such techniques are
mechanism-specific and cannot easily be generalized.
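The retransmission case can be sketched in Python as follows. The
sketch assumes session-multiplexed retransmission as in [RFC4588],
where the first two octets of the retransmission payload carry the
original sequence number (OSN) in network byte order. The class and
method names are hypothetical, and, as a simplification, the rule
against two outstanding NACKs for the same sequence number on
different original streams is applied even after association.

```python
class RtxAssociator:
    """Infer retransmission-stream associations from NACK/RTX pairs."""

    def __init__(self):
        self.outstanding = {}   # original sequence number -> original SSRC
        self.rtx_to_orig = {}   # retransmission SSRC -> original SSRC

    def send_nack(self, orig_ssrc, seq):
        """Record a NACK; refuse one that would make the match ambiguous."""
        owner = self.outstanding.get(seq)
        if owner is not None and owner != orig_ssrc:
            return False  # same seq already outstanding for another SSRC
        self.outstanding[seq] = orig_ssrc
        return True

    def on_rtx_packet(self, rtx_ssrc, payload):
        """Return the original SSRC for a retransmission packet, if known."""
        if rtx_ssrc in self.rtx_to_orig:
            return self.rtx_to_orig[rtx_ssrc]
        osn = int.from_bytes(payload[:2], "big")  # original sequence number
        orig = self.outstanding.pop(osn, None)
        if orig is not None:
            self.rtx_to_orig[rtx_ssrc] = orig
        return orig


rtx = RtxAssociator()
rtx.send_nack(0xAAAA, 100)
print(rtx.send_nack(0xCCCC, 100))  # False: would be ambiguous
print(rtx.on_rtx_packet(0xBBBB, (100).to_bytes(2, "big") + b"data") == 0xAAAA)  # True
```

The inference depends entirely on RTX-specific state (NACKs and the
OSN field), which is why it does not carry over to layered media.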
9. Summary of RTP solution for Data Alignment and Source Correlation
9.1. Data Alignment in RTP
The text on layered multicast in [RFC3550] does not discuss Data
Alignment among the media data carried in the different RTP sessions.
We assume that the intention of the RTP specification was to use NTP
timestamp alignment. However, Vic, the demonstration code for
layered multicast using PVH, used RTP timestamp alignment for this
purpose.
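For reference, NTP timestamp alignment relies on the RTP-to-wallclock
mapping of [RFC3550]: each RTCP Sender Report pairs an NTP timestamp
with the RTP timestamp of the same instant, and the receiver
extrapolates using the RTP clock rate. The Python sketch below assumes
the SR's NTP timestamp has already been converted to seconds and uses
exact `Fraction` arithmetic so that equal instants compare equal; the
function name and example values are hypothetical.

```python
from fractions import Fraction

def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP timestamp to sender wallclock time (seconds) using the
    (NTP, RTP) timestamp pair of the latest RTCP Sender Report.

    The 32-bit difference is interpreted as signed, so timestamps just
    before the SR and across a wraparound are handled.
    """
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:          # timestamp precedes the SR
        delta -= 0x100000000
    return sr_ntp_seconds + Fraction(delta, clock_rate)


# Two sessions of one layered source with different random RTP offsets.
base = rtp_to_wallclock(8000, 5000, Fraction(1000), 90000)
enh = rtp_to_wallclock(703000, 700000, Fraction(1000), 90000)
print(base == enh)  # True: both map to 1000 + 1/30 s
```

In practice the SR values are fixed-point samples taken by the sender
at different instants per session, so mapped times of corresponding
packets can differ slightly; RTP timestamp alignment avoids this step.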
9.2. Source Correlation in RTP
Section 8.3 of [RFC3550] recommends a single SSRC identifier space
across the multiple sessions containing data of the same layered media
source, so that the source uses the same SSRC in every layer. Further,
it recommends that SSRC allocation and collision resolution be
performed in the base layer:
For layered encodings transmitted on separate RTP sessions (see
Section 2.4), a single SSRC identifier space SHOULD be used across
the sessions of all layers and the core (base) layer SHOULD be
used for SSRC identifier allocation and collision resolution.
When a source discovers that it has collided, it transmits an RTCP
BYE packet on only the base layer but changes the SSRC identifier
to the new value in all layers. ...
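The resolution step quoted above can be sketched as follows. This is a
simplified Python model, not an RTP implementation: it omits the
actual collision detection of [RFC3550] Section 8.2 and only shows
that the BYE is sent in the base layer while the new SSRC applies to
every layer. The class, method, and attribute names are hypothetical.

```python
import random

class LayeredSender:
    """One layered source sent over several sessions sharing one SSRC
    space; layer 0 is the base layer."""

    def __init__(self, num_layers, rng=None):
        self.rng = rng or random.Random()
        self.num_layers = num_layers
        self.ssrc = self.rng.getrandbits(32)
        self.sent_rtcp = []  # (layer index, RTCP packet description)

    def on_base_layer_collision(self, known_ssrcs):
        """Send BYE on the base layer only, then pick a fresh SSRC that
        every layer switches to."""
        self.sent_rtcp.append((0, ("BYE", self.ssrc)))
        new = self.rng.getrandbits(32)
        while new == self.ssrc or new in known_ssrcs:
            new = self.rng.getrandbits(32)
        self.ssrc = new
        return new

    def ssrc_for_layer(self, layer):
        if not 0 <= layer < self.num_layers:
            raise ValueError("no such layer")
        return self.ssrc  # shared by all layers


sender = LayeredSender(3, random.Random(1))
old = sender.ssrc
new = sender.on_base_layer_collision({old})
print([sender.ssrc_for_layer(i) == new for i in range(3)])  # [True, True, True]
```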
9.3. Dependency signaling
For signaling the dependency of data transmitted using layered
multicast, SDP [RFC4566] contains rudimentary support, in that it
allows a range of transport addresses to be signaled in a single
media description. By definition, a higher transport address
identifies a higher layer in the one-dimensional hierarchy. To decode
the Operation Point associated with a given transport address, a
receiver needs only the data conveyed over that address and all lower
transport addresses.
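As an illustration, SDP [RFC4566] lets an IPv4 multicast connection
line carry an address count, as in "c=IN IP4 224.2.1.1/127/3" (group
address, TTL, number of addresses). The Python sketch below expands
such a line and selects the groups a receiver would join for a given
layer. It handles only the IN IP4 case, and the function names and
example addresses are hypothetical.

```python
import ipaddress

def layered_addresses(c_line):
    """Expand e.g. "c=IN IP4 224.2.1.1/127/3" into group addresses,
    ordered from the base layer to the highest enhancement layer."""
    nettype, addrtype, conn = c_line[len("c="):].split()
    if nettype != "IN" or addrtype != "IP4":
        raise ValueError("this sketch only handles IN IP4")
    parts = conn.split("/")          # address / TTL [/ number of addresses]
    first = ipaddress.IPv4Address(parts[0])
    count = int(parts[2]) if len(parts) == 3 else 1
    return [str(first + i) for i in range(count)]

def addresses_for_operation_point(c_line, layer):
    """Groups needed to decode the Operation Point of `layer` (0 = base):
    that layer's address and every lower one."""
    addrs = layered_addresses(c_line)
    if not 0 <= layer < len(addrs):
        raise ValueError("no such layer")
    return addrs[: layer + 1]


print(addresses_for_operation_point("c=IN IP4 224.2.1.1/127/3", 1))
# ['224.2.1.1', '224.2.1.2']
```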
When the media data of one source is transmitted in multiple RTP
sessions, the mechanism defined in Signaling media decoding
dependency in Session Description Protocol (SDP)
[I-D.ietf-mmusic-decoding-dependency] can also be used to indicate
the relationship between the multiple sessions of the same media
type. This mechanism is currently adopted by the new payload formats
that allow multi-session transmission: [I-D.ietf-avt-rtp-svc],
[I-D.wang-avt-rtp-mvc], [I-D.ietf-avt-rtp-mps], and
[I-D.lakaniemi-avt-rtp-evbr]. By definition, the base layer is
signaled as the RTP session that does not depend on any other
session.
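The dependency structure signaled this way can be viewed as a graph
over sessions. As an illustration only, the following Python sketch
computes the set of sessions a receiver needs for a target session and
identifies base-layer sessions as those without dependencies. Sessions
are identified by hypothetical labels; the sketch does not parse the
SDP attributes defined in [I-D.ietf-mmusic-decoding-dependency].

```python
def sessions_to_receive(depends_on, target):
    """Return `target` plus every session it depends on, directly or
    transitively. depends_on maps a session label to the labels of the
    sessions it depends on."""
    needed, stack = set(), [target]
    while stack:
        session = stack.pop()
        if session not in needed:
            needed.add(session)
            stack.extend(depends_on.get(session, ()))
    return needed

def base_sessions(depends_on):
    """Sessions that depend on no other session, i.e. base layers."""
    return sorted(s for s, deps in depends_on.items() if not deps)


# Hypothetical non-linear structure: L4 depends on both L2 and L3.
graph = {"L1": [], "L2": ["L1"], "L3": ["L1"], "L4": ["L2", "L3"]}
print(sorted(sessions_to_receive(graph, "L3")))  # ['L1', 'L3']
print(base_sessions(graph))                      # ['L1']
```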
Since [RFC3550] associates all layers of a layered media source with
a single source (SSRC), it provides no mechanism to indicate
dependencies among multiple sources.
10. Recommendations
For Data Alignment of media data from the same source, we recommend
that packets of the same time instance carry the same RTP timestamp,
as defined in [I-D.lennox-avt-rtp-layered-encoding-timestamps]. This
method adds no overhead and can be implemented in a backward-compatible
way, since NTP timing for synchronizing different types of media is
not affected. It further requires that all sessions of a
multi-session or multi-source transmission use the same timescale,
which is the case anyway if the layered media is identified as a
single source. Whether to mandate the same timescale for each of the
sessions in a multi-session transmission may need further discussion
with respect to the audio codec described in Section 4.5.
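With identical RTP timestamps, a receiver can align data across
sessions by simple equality, without any RTCP-based mapping. The
Python sketch below groups packets from several sessions into time
instances. It assumes all sessions use the same clock rate and the same
timestamp offset, ignores 32-bit wraparound for brevity, and uses a
hypothetical function name and packet representation.

```python
from collections import defaultdict

def align_by_rtp_timestamp(sessions):
    """Group packets of one layered source into time instances.

    sessions maps a session name to a list of (rtp_timestamp, payload).
    Returns [(rtp_timestamp, {session: [payloads]})] in timestamp order.
    """
    instances = defaultdict(lambda: defaultdict(list))
    for name, packets in sessions.items():
        for ts, payload in packets:
            instances[ts][name].append(payload)
    return [(ts, dict(instances[ts])) for ts in sorted(instances)]


sessions = {
    "base": [(3000, b"B0"), (6000, b"B1")],
    "enh": [(3000, b"E0"), (6000, b"E1a"), (6000, b"E1b")],
}
for ts, layers in align_by_rtp_timestamp(sessions):
    print(ts, sorted(layers))  # each instance lists 'base' and 'enh'
```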
For Source Correlation, we suggest keeping the mechanism defined in
[RFC3550], i.e., all layers of a layered media source have the same
SSRC and the base layer is used for SSRC collision detection.
Further, it may be useful to have a signaling mechanism, which
indicates the RTP session to be used for SSRC collision detection.
11. Other transport related issues for multi-session transmission
11.1. Inter-session Jitter
The transport of media of the same source in different sessions may
introduce different jitter behaviors in the different sessions. We
call this issue inter-session jitter. Inter-session jitter may be
caused by sessions taking different network paths or by any other
packet reordering within the network outside the control of the user.
RTP implementations typically use a separate de-jitter buffer for
each session. In a simple audio/video transmission scenario,
de-jittering the audio and video input queues separately is not
problematic, since synchronization is achieved after decoding, at
playout time. With multi-session transmission, however, de-jittering
and synchronization (Data Alignment) are required before decoding.
Moreover, Data Alignment via NTP timestamps must be exact to the
microsecond; otherwise, the synchronization fails. This is a much
stricter requirement than synchronizing audio and video for
lip-synchronized playout, which tolerates small timing offsets.
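A joint de-jitter buffer that aligns data before decoding might look
like the following Python sketch. It is a simplification under stated
assumptions: time instances are keyed by identical RTP timestamps, at
most one packet per session and instance is kept, late packets for
already released instances are not handled, and an instance is
released when all sessions have delivered data or a hypothetical
`max_delay` has elapsed since its first packet arrived. Instances are
released strictly in timestamp order.

```python
class CrossSessionBuffer:
    """Joint de-jitter buffer for a multi-session transmission."""

    def __init__(self, sessions, max_delay):
        self.sessions = set(sessions)
        self.max_delay = max_delay  # seconds to wait for missing layers
        self.pending = {}           # rtp_ts -> (first_arrival, {session: data})

    def push(self, session, rtp_ts, data, now):
        entry = self.pending.setdefault(rtp_ts, (now, {}))
        entry[1][session] = data

    def pop_ready(self, now):
        """Release complete or expired instances, oldest first."""
        ready = []
        for ts in sorted(self.pending):
            first_arrival, layers = self.pending[ts]
            complete = set(layers) == self.sessions
            expired = now - first_arrival >= self.max_delay
            if not (complete or expired):
                break  # keep order: never skip past a pending instance
            ready.append((ts, layers))
        for ts, _ in ready:
            del self.pending[ts]
        return ready


buf = CrossSessionBuffer(["base", "enh"], max_delay=0.1)
buf.push("base", 3000, b"B0", now=0.0)
buf.push("base", 6000, b"B1", now=0.03)
buf.push("enh", 6000, b"E1", now=0.04)
print(buf.pop_ready(now=0.05))  # []: instance 3000 still waits for 'enh'
```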
11.2. Inter-session Interleaving
Multi-session transmission allows data to be interleaved across
sessions, while the data transmitted within each session can still be
sent in decoding order. Inter-session interleaving may also be
realizable using Data Alignment via timestamps.
12. Security Considerations
[Ed. TBD]
13. IANA Considerations
No action by IANA is required.
14. References
14.1. Normative References
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003.
14.2. Informative References
[I-D.begen-mmusic-fec-grouping-issues]
Begen, A., "FEC Grouping Issues in Session Description
Protocol", draft-begen-mmusic-fec-grouping-issues-00 (work
in progress), February 2008.
[I-D.hannuksela-avt-rtp-svc]
Hannuksela, M. and Y. Wang, "Session Multiplexing for SVC
Video", draft-hannuksela-avt-rtp-svc-01 (work in
progress), July 2008.
[I-D.ietf-avt-rtp-mps]
Bont, F., Doehla, S., Schmidt, M., and R. Sperschneider,
"RTP Payload Format for Elementary Streams with MPEG
Surround multi-channel audio", draft-ietf-avt-rtp-mps-01
(work in progress), October 2008.
[I-D.ietf-avt-rtp-svc]
Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
"RTP Payload Format for SVC Video",
draft-ietf-avt-rtp-svc-14 (work in progress),
September 2008.
[I-D.ietf-mmusic-decoding-dependency]
Schierl, T. and S. Wenger, "Signaling media decoding
dependency in Session Description Protocol (SDP)",
draft-ietf-mmusic-decoding-dependency-04 (work in
progress), October 2008.
[I-D.ietf-mmusic-sdp-source-attributes]
Lennox, J., Ott, J., and T. Schierl, "Source-Specific
Media Attributes in the Session Description Protocol
(SDP)", draft-ietf-mmusic-sdp-source-attributes-01 (work
in progress), February 2008.
[I-D.lakaniemi-avt-rtp-evbr]
Lakaniemi, A. and Y. Wang, "RTP payload format for G.718
speech/audio", draft-lakaniemi-avt-rtp-evbr-04 (work in
progress), October 2008.
[I-D.lennox-avt-rtp-layered-encoding-timestamps]
Lennox, J., Schierl, T., and S. Ganesan, "Real-Time
Transport Protocol (RTP) Timestamps for Layered
Encodings",
draft-lennox-avt-rtp-layered-encoding-timestamps-00 (work
in progress), June 2008.
[I-D.wang-avt-rtp-mvc]
Wang, Y. and T. Schierl, "RTP Payload Format for MVC
Video", draft-wang-avt-rtp-mvc-02 (work in progress),
August 2008.
[McCa96] McCanne, S., "Scalable Compression and Transmission of
Internet Multicast Video", Ph.D. Dissertation, University of
California Berkeley, Report No. UCB/CSD-96-928,
December 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3388] Camarillo, G., Eriksson, G., Holler, J., and H.
Schulzrinne, "Grouping of Media Lines in the Session
Description Protocol (SDP)", RFC 3388, December 2002.
[RFC3984] Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund,
M., and D. Singer, "RTP Payload Format for H.264 Video",
RFC 3984, February 2005.
[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
Description Protocol", RFC 4566, July 2006.
[RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
July 2006.
[RFC5109] Li, A., "RTP Payload Format for Generic Forward Error
Correction", RFC 5109, December 2007.
[RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
January 2008.
Appendix A. Acknowledgements
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA). Further, the author Thomas
Schierl of Fraunhofer HHI is sponsored by the European Commission
under the contract number FP7-ICT-214063, project SEA. The authors
want to thank Colin Perkins, Ye-Kui Wang, Randell Jesup, Ingemar
Johansson, Gerard Babonneau, Alex Eleftheriadis, Stefan Doehla, and
Roni Even for their valuable comments on the mailing list.
Authors' Addresses
Thomas Schierl
Fraunhofer HHI
Einsteinufer 37
D-10587 Berlin
Germany
Phone: +49-30-31002-227
Email: mail@thomas-schierl.de
Jonathan Lennox
Vidyo, Inc.
433 Hackensack Avenue
Sixth Floor
Hackensack, NJ 07601
US
Email: jonathan@vidyo.com
Full Copyright Statement
Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.