INTERNET-DRAFT Stephan Wenger
draft-wenger-avt-rtcp-feedback-00.txt TU Berlin
Joerg Ott
Universitaet Bremen TZI
July 14, 2000
Expires December 2000
RTCP-based Feedback for Predictive Video Coding
Status of this Memo
This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC 2026. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
0. Open Issues
1) Should the draft limit itself to supporting feedback for video only
or should it target a more general solution for feedback? At the
moment, the draft covers only video.
2) Should the feedback be restricted to point-to-point scenarios or
should we support (small group) multicast? At the moment, the draft
is designed to scale to (small) groups.
3) Feedback traffic explosion is prevented by a) dithering and b)
damping. a) somewhat constrains the timely transmission of
feedback. b) prevents the encoder from learning about the
_severity_ of a loss problem (e.g. how many receivers now have a
bad picture). This prevents adaptive encoder reaction based on the
perceived quality of the whole group. At the moment, a) and b) are
both to be used in order to be network friendly. Which mechanisms
(besides flooding the network, which we want to avoid) are
conceivable to support an approach that is able to achieve a better
perceived picture quality?
Wenger/Ott Expires December 2000 [Page 1]
Internet Draft July 14, 2000
4) Is the maximum number of MBs 8191 for SLI sufficient? Yes for MPEG-
1, MPEG-2 and ITU-T H.261, H.263. What about MPEG-4?
5) Should there be a special mode (possibly optimized for point-to-point
communication) that allows UM packets without an RR (see section 3)?
6) RPS/NEWPRED also make use of positive acknowledgements. Obviously,
this inherently does not scale to multicast. Should there be a
point-to-point mode that allows positive ACKs?
7) We have not yet considered the use of layered codecs. When
transporting each layer in its own RTP stream, everything should be
ok. If not, then we can foresee problems.
8) Section 7 on NEWPRED needs more work (probably based on the Fukunaga
et al. draft).
9) Further work is needed on maximum group size estimation for using
feedback and on more detailed guidelines on calculating the maximum
dithering delay for Early RRs (T_dither_max) per UM type.
10) Further investigations are desirable for the Early RR/UM scheduling
and damping and the relationship of Early RR/UM scheduling to regular
RTCP report scheduling.
1. Abstract
Predictive video coding is not loss resilient. Any loss of coded
data leads to annoying artifacts not only in the reproduced picture
in which the loss occurred, but also in subsequent pictures. Error
resilience can be achieved by spending bits to convey redundant
information using source coding based mechanisms or transport based
mechanisms. This can be done without the use of any feedback between
the decoder(s) and the encoder.
Alternatively, where applicable, decoders can inform the encoder
through a feedback channel about a loss situation, and the encoder
can react accordingly. This approach provides better picture quality
and is more efficient with respect to the bandwidth used by the
encoder to achieve a given quality. However, using feedback
mechanisms is limited to certain application scenarios identified by
encoder characteristics, delay constraints, and/or the number of
recipients. This document discusses various types of feedback
information (called _upstream messages_, UMs) for predictive video
coding and defines an RTCP packet format to transmit UMs in an RTP
environment. It can be used in conjunction with all payload
specifications for predictive video coding schemes currently
available for RTP. To reflect the need for very low delay for the
transmission of the UMs, which is necessary to make them efficient,
the rules for sending receiver reports are enhanced to support Early
Receiver Reports (Early RRs), and an algorithm is specified that allows
for low delay in small multicast groups, but prevents network
flooding.
2. Introduction
2.1. Video Encoder-decoder synchronicity
Most current video coding schemes for compressed video, such as
ITU-T H.261 and H.263 and ISO/IEC MPEG-1, MPEG-2, and MPEG-4, employ a
mechanism known as Inter Picture Prediction. Each picture is divided into
macroblocks of uniform size. For each macroblock, one or more
motion vectors may be identified and transmitted. The residual
signal after motion compensation is DCT-transformed, quantized,
entropy coded, and transmitted as well. The encoder reconstructs,
based on this information, a so-called reference picture, which is
used to perform the motion compensation and residual signal coding
steps for the subsequent picture. Since the reference picture is
generated using only such information that is also available at the
decoder, the reference picture is identical to the reconstructed
picture at the decoder. Having identical reference pictures at the
encoder and decoder is referred to as encoder-decoder-synchronicity.
Whenever data is damaged or lost on the way between the encoder and
the decoder, the reconstructed picture at the decoder is no longer
identical to the encoder's reference picture -- the encoder-decoder
synchronicity is lost.
Any loss of the encoder-decoder synchronicity results in annoying
artifacts at the decoder. Because the prediction of subsequent
pictures in the decoder is based on a damaged reference picture, the
annoying artifacts are present not only in the picture in which the
loss occurred; they propagate to all subsequent pictures, until,
through source coding based mechanisms, the encoder-decoder
synchronicity is restored. Therefore, the goal of systems employing
predictive video coding in a lossy environment must be to keep the
encoder-decoder synchronicity, or, if this is not possible, to regain
that synchronicity as quickly as possible.
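The propagation of a single loss can be illustrated with a toy model.
The following Python sketch is not a video codec: it assumes scalar
"pictures", lossless difference coding, and a decoder that conceals a
lost residual as "no change". The names encode and decode are
hypothetical and exist only for this illustration.

```python
def encode(samples):
    """Toy predictive coder: first sample sent as-is (Intra), the rest as
    differences to the previous sample (Inter)."""
    residuals = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        residuals.append(cur - prev)
    return residuals


def decode(residuals, lost=frozenset()):
    """Rebuild the samples; a lost residual is concealed as 'no change'."""
    out, ref = [], 0
    for i, r in enumerate(residuals):
        ref = ref + (0 if i in lost else r)   # ref plays the reference picture
        out.append(ref)
    return out


pictures = [10, 12, 15, 15, 18, 20]
sent = encode(pictures)
clean = decode(sent)                 # identical to the encoder's input
damaged = decode(sent, lost={2})     # residual of picture 2 never arrives
errors = [c - d for c, d in zip(clean, damaged)]
print(errors)                        # error appears at index 2 and persists
```

In this model the error introduced at the lost picture stays in every
later picture, because each one is predicted from a damaged reference.
Only a non-predictively coded sample would remove it.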
2.2. Non-feedback based mechanisms
Avoiding the loss of the encoder-decoder synchronicity corresponds to
avoiding the loss of coded picture data. Such a task can be
performed on the transport layer. In RTP environments, the use of
packet-based FEC is a good example of such a technique. (The use of
TCP or reliable multicast as the transport for media streams would be
an even better one but is inappropriate for low-delay (interactive)
real-time systems.) FEC schemes, interleaving, and other means for
repairing real-time media streams may also add delay and
significant bit rate overhead without being able to guarantee
compensation of virtually all packet losses.
Once the encoder-decoder synchronicity is lost, only source coding
oriented mechanisms can help to regain it. One common way is to send
a non-predictively coded picture (known as an Intra picture). Intra
pictures have the disadvantage of being several times bigger than
predictively coded pictures (Inter pictures). Therefore, sending
Intra pictures has negative implications both on the bandwidth and
(in bandwidth-limited environments) delay. Another way is to use
Intra macroblock refresh. Here, certain parts of the picture (those
affected by a packet loss) are coded non-predictively in order to
resynchronize the encoder and decoder over time. Intra macroblock
refresh has better delay characteristics than full Intra pictures
because the picture size can be kept constant, but is less efficient
in terms of bit rate/distortion than full Intra pictures. More
sophisticated means such as Reference Picture Selection (RPS) are
also available in modern video coding standards.
Systems not employing feedback channels may use any combination of
the mechanisms described above to add error resilience -- at the cost
of added bit rate and, sometimes, added delay. The number of
additional bits spent for error resilience can be adapted using the
long-term packet loss rate information in the RTCP receiver reports.
But, even when using such adaptive means, it is still likely that
systems spend many more bits than theoretically necessary to achieve
error resilience in order to be on the safe side. In addition, as
regular RTCP feedback is aimed at longer terms, reactivity to sudden
losses is limited. In all practical applications today this means that
fewer bits are available for non-redundant picture data, and hence
the overall picture quality suffers.
2.3 Feedback based systems
Feedback-based systems try to avoid spending too many bits for
redundant information by informing the encoder about a loss situation
at the decoder(s). The encoder can then react accordingly and spend
redundant bits only when needed, possibly only for the part of the
picture that was affected by the loss -- thereby reducing the number
of redundant bits and leaving more bits for useful information. As a
result, a higher reproduced picture quality can generally be expected
when feedback channels are available.
Similar to the observations of section 2.2, transport and source
coding based mechanisms can be distinguished that react to loss
situations reported by feedback.
Transport based systems employing feedback react in a media-unaware
manner, by re-transmitting lost packets. TCP is a good example of a
protocol following such a scheme. Transport-based feedback in real-time
and/or multicast environments is a complex matter and the subject of a
lot of engineering and research in and outside of the IETF. This
specification is not concerned with pure transport-based feedback.
Source coding based mechanisms may react upon the arrival of an
upstream message indicating a loss situation by adding bits that
restore, or at least make an effort to restore, the encoder-decoder
synchronicity. This process has to be performed by a real-time
encoder. However, schemes have been reported that allow the use of
feedback also for non-real-time encoders by storing multiple
representations of the same data (e.g. Inter and Intra coded), and
dynamically switching between those representations.
Several types of feedback messages, called Upstream Messages or UMs,
are defined in this specification. A UM can be as simple as a
Boolean condition, indicating for example the loss of a full picture
(and, therefore, the need of a full Intra picture transmission).
Other feedback messages may contain more complex information such as
information about the damage of a spatial region of the picture. A
special form consists of a message the format and semantics of which
are not known at the transport level, because they are defined in the
video codec standards.
Most UMs contain negative acknowledgement information, indicating an
erroneous situation at the decoder. In others, the nature of the
acknowledgement (positive, negative, or both) is part of the feedback
message itself. When used in multicast environments, positive
acknowledgements MUST NOT be used.
This document assumes that feedback messages are transmitted using
RTCP packets. To prevent traffic explosion in large multicast groups,
RTCP messages from the receivers to the sender cannot be sent at
arbitrary times. Instead, the bit rate for all
RTCP messages of all receivers together has to obey a maximum
fraction of the total RTP session bit rate, yielding a very limited
bit rate budget for a single receiver in a large multicast
group. This, in turn, leads to an increased average delay when the
size of the receiving multicast group grows. (See section 6 of
draft-ietf-avt-rtp-new-06.txt for details.)
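To illustrate how the per-receiver reporting delay grows with the group
size, the following Python sketch computes a simplified, deterministic
RR interval. It is only a rough model under stated assumptions: an RTCP
share of 5% of the session bit rate with 75% of that share available to
receivers (the defaults suggested by the RTP specification), and a
hypothetical average compound packet size of 800 bits. Interval
randomization is ignored, and the minimum interval that RTP applies is
modeled only if t_min is set. The function name is hypothetical.

```python
def rr_interval(session_bw_bps, receivers, avg_rtcp_bits=800,
                rtcp_fraction=0.05, receiver_share=0.75, t_min=0.0):
    """Approximate seconds between two RRs of one receiver."""
    # Bit rate that all receivers together may spend on RTCP.
    receiver_bw = session_bw_bps * rtcp_fraction * receiver_share
    # Each receiver gets an equal share, so the interval grows linearly
    # with the number of receivers.
    return max(t_min, receivers * avg_rtcp_bits / receiver_bw)


for n in (2, 6, 20, 60):
    print(n, "receivers:", round(rr_interval(128_000, n), 2), "s")
```

With these assumed numbers, a 128 kbit/s session leaves 4800 bit/s for
receiver reports, so six receivers can each report about once per
second and sixty receivers only about once every ten seconds.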
This specification defines an algorithm that adheres to the bit rate
limitations for the feedback channel on the long term, but allows
short-term overdrafting for any receiver (but not all of them
simultaneously). Thus, the algorithm allows for better real-time
performance than the one specified in draft-ietf-avt-rtp-new-06.txt.
Traffic explosion in cases in which many receivers identify
picture damage simultaneously is prevented by dithering.
As this specification assumes a real-time encoder that has full
control over its transmission bit rate, there is no scaling problem
on the forward channel. Any reaction to negative feedback generates
additional bits, which have to be conveyed, but these are taken from the
sender's total bit rate budget. The encoder can take this into
account by, for example, sending fewer pictures per second, lowering
the quality and bit rate by changing quantization parameters, and so
forth. The sender is also free to simply ignore feedback messages.
Adjusting the tradeoff between the reproduced picture quality of all
receivers of a multicast group and the amount of bits spent for
encoder-decoder re-synchronization is a very complex task and is not
covered in this specification.
This document currently covers feedback messages for a Picture Loss
Indication (PLI), Slice Loss Indication (SLI), and Reference Picture
Selection Indication (RPSI). PLI indicates the loss of a full
picture and roughly corresponds to the Fast Intra Request known from
H.320 systems and from RFC 2032 (H.261 packetization). Algorithms
using SLI can be found under the acronym Automatic Repeat Request
(ARQ) in the signal processing literature. Reference Picture
Selection, aka NEWPRED, is available in certain profiles of MPEG-4
(version 2 and later) and as an optional mode in H.263 (version 2 and
later). The packet format specified in this document is open to
extensions so that future feedback mechanisms can easily be
integrated.
2.4. Applications and Relationships to other Standards
This specification is based on RTCP, which implies its use in an RTP
environment. RTP itself is used in a variety of systems such as in
SIP- or H.323-based multimedia conferencing/telephony.
As for the video codecs, there is currently a small set of standards
that are, for the purpose of this discussion, roughly comparable.
Many mechanisms for regaining encoder-decoder synchronicity are
applicable to all video codecs. Others require certain tools (such
as Reference Picture Selection, aka NEWPRED) that are available only
in certain versions of the standards, and/or optional tools whose use
must be negotiated prior to being used.
A few RTP payload specifications such as RFC 2032 already define a
feedback mechanism for some of the coding algorithms considered in
this specification. An application capable of performing both
schemes MUST use the feedback mechanism defined in this
specification, although, for backward compatibility reasons, it MUST
also be capable of conforming to the feedback scheme defined in the
respective RTP payload format, if this is required by that payload
format.
2.5 Remarks on the size of the multicast group
This specification makes an attempt to prevent traffic explosion on
the feedback channel in a very similar way as RTP does, with the
exception of allowing individual receivers to overdraft their bit
rate budget from time to time. This is necessary in order to allow
for low delay, which is needed by the algorithms reacting to UMs.
This scaling, however, limits the usefulness of this mechanism in
multicast groups from a certain size upwards (where the size
threshold depends on a number of parameters including loss rate and
frame rate). The maximum size of the multicast group is not
specified here, as it is a soft limit that also depends on application
requirements. The authors have done some rough calculations (which
it is too early to present here in detail) that suggest
that feedback is not expected to yield acceptable results for group
sizes larger than 10 receivers (often fewer than five), assuming
today's network conditions (RTT, loss rate) and common bit rates.
2.6 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [xxx]
3. Low delay RTCP Feedback
UMs are part of the RTCP control streams and are thus subject to the
same bandwidth constraints as other RTCP traffic. This means in
particular, that it may not be possible to report a packet loss at a
receiver immediately back to the sender. However, the value of
feedback given to a sender typically decreases over time -- in terms
of the media quality as perceived by the user at the receiving end
and/or the cost required to achieve media stream repair.
RFC1889bis (i.e. draft revision draft-ietf-avt-rtp-new-06.txt)
specifies rules when RTCP receiver reports (RRs) should be sent.
This specification modifies those rules in order to allow
applications to report damaged pictures in a timely manner, since most
algorithms that use UMs are very sensitive to the UM timing. See section 5 and
following for a discussion of the impact of delay on the performance
of each UM type.
The modified algorithm can be outlined as follows: Normally, when no
UMs have to be conveyed, RRs are sent following the rules of
RFC1889bis. If a receiver detects the need for a UM, it
first checks whether it has already seen a corresponding UM
from any other receiver (which it can do as UMs are transmitted via
multicast). If this is the case then the receiver refrains from
sending the UM, and continues to follow the regular RR sending
schedule. If the receiver has not yet seen a similar UM from any
other receiver, it checks whether it has already overdrafted its RTCP
bit rate budget before (i.e. sent a UM without waiting for its
regularly scheduled RR time). Only if this is not the case, it sends
the UM, after waiting for a short, random dithering interval. Note that
a complete RR is always sent in addition to the UM, in order to a) follow
the rules for compound packets, and b) make sure that a sufficiently
large number of RRs from each receiver is transmitted. Considering
the overhead for IP and UDP packets, it is believed that these
advantages outweigh the disadvantage of preventing RTCP packets that
contain only UMs.
3.1. Definitions
[Note: not all are used in this first revision of the draft.]
a) Let the video stream be transmitted at a (roughly) constant frame
rate f (in frames per second). This results in an inter-frame
time period of tau=1/f if frames are sent at regular intervals.
b) For timing considerations, we assume that a single frame is always
carried in a single packet. If a frame does not fit into the MTU,
then the frame is split across several packets. Gaps are then
always measured between the first packets, or always between the
last packets, of consecutive frames. For later considerations on
feedback delay, if a frame is split and its packets are paced for
transmission (rather than sent as a burst) over some time period
T_split, this can be modeled as a _constant_ added to the overall
transmission delay from the sender to the receiver.
c) Let T_rtt be the maximum round trip time as measured by RTCP.
Note that this may be asymmetric.
d) Let T_jitter be the maximum jitter measured from a sender to a
receiver.
e) Let t_rr and t_(rr-1) be the times of the next and the last
scheduled RTCP RR transmission, respectively, calculated prior to
reconsideration. Let T_rr be the interval between them, i.e.
t_rr = t_(rr-1) + T_rr. (In the RFC1889bis draft, t_(rr-1) and t_rr
are termed tp and tn, respectively.)
f) Let t_e be the time for which a feedback packet is scheduled.
g) Let T_dither_max be the maximum interval for which an RTCP
feedback packet may be additionally delayed (to prevent
implosions).
h) Let T_fd be the delay after which the feedback message for a certain
packet is returned to the sender.
i) Let S be the number of active senders in the RTP session.
j) Let N be the current estimate of the number of receivers in the
RTP session.
3.2. RTCP Feedback
The feedback situation for a packet loss at a receiver is depicted in
figure 1 below. At time t0, a packet loss is detected at the
receiver. The receiver decides -- based upon current T_rtt, group
size, and other (application-specific) parameters -- that a certain
type of feedback information shall be sent back to the sender.
To avoid an implosion of immediate feedback packets, the receiver
delays transmission of the feedback packet (an Early RTCP RR/UM
packet) by a random amount T_fd (with the random number evenly
distributed in the interval [0, T_dither_max]). Transmission of
the RTCP RR/UM is then scheduled for t_e = t0 + T_fd.
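The effect of the random delay can be illustrated with a small
simulation. The following Python sketch is only a rough model: it
assumes that all receivers detect the loss at the same instant, that
feedback reaches every other receiver after one fixed propagation
delay, and that a receiver stays silent once it has seen feedback from
someone else. The function name and parameters are hypothetical.

```python
import random


def early_senders(num_receivers, t_dither_max, propagation_delay, seed=0):
    """Count receivers that still send feedback despite dithering."""
    rng = random.Random(seed)
    # Each receiver draws T_fd uniformly from [0, T_dither_max].
    delays = sorted(rng.uniform(0.0, t_dither_max)
                    for _ in range(num_receivers))
    first = delays[0]
    # A receiver is silenced only if the earliest feedback arrives
    # before its own timer fires.
    return sum(1 for d in delays if d <= first + propagation_delay)


for t_max in (0.01, 0.1, 1.0):
    print(t_max, early_senders(10, t_max, propagation_delay=0.05))
```

In this model a larger T_dither_max lets fewer receivers send, at the
price of a longer expected feedback delay. This is the trade-off the
choice of T_dither_max has to balance.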
The T_dither_max parameter depends on the feedback algorithm used
(PLI, SLI, RPSI) and needs to take into account a number of other
parameters (such as the estimated round-trip time) to limit the upper
bound for the feedback in a way that ensures that the feedback
information still makes sense when it reaches the sender.
If an RTCP feedback packet is scheduled, the time slot for the next
scheduled RTCP RR is updated accordingly to a new t_rr taken from
the interval [t_(rr-1) + 2*T_rr, t_e + 2*T_rr] (with T_rr being the
newly calculated deterministic RTCP interval).
            pkt loss
            detected
                |
                |   RTCP feedback
                vXXXXXXXXXXXXXXXXXXXX            ) )
   |---+--------+-------------+-----+------------| |--------+--------->
       |        |             |     |            ( (        |
       |        t0            te                            |
    t_(rr-1)                                               t_rr
                  \_______ ________/
                          \/
                     T_dither_max
Figure 1: Packet loss and parameters for Early RR scheduling
3.3. Early RR/UM Algorithm
Assume an active sender S0 (out of S senders) and a number N of
receivers with R being one of these receivers.
Assume further that R has verified that using feedback mechanisms is
reasonable in the current constellation (which is highly application
specific and hence not specified in this document at the moment; a
future revision may contain more detailed guidelines to this end).
Then, the following rules apply to transmitting an Upstream Message
(UM) as a compound packet with an RTCP RR and possibly other
information. This compound RTCP packet is referred to as _RTCP RR/UM_.
Initially, R sets allow_early=TRUE.
At a point in time t0, R has transmitted the last RTCP RR packet at
t_(rr-1) and has scheduled the next transmission (prior to
reconsideration) for t_rr.
If R detects a packet loss at time t0 then R should check first
whether its next regularly scheduled RTCP RR is within the time
bounds for the RTCP UM (t0 + T_dither_max > t_rr). If so, no Early
RR is scheduled; instead the UM is appended to the regular RTCP RR.
Otherwise, R should check whether it is allowed to transmit an Early
RR/UM packet (allow_early==TRUE).
If so, R creates a UM unit, calculates T_dither_max and then
schedules an Early RR/UM packet for t_e = t0 + RND * T_dither_max
with the RND function evenly distributed between 0 and 1.
If R receives an RR/UM packet (indicating the same or a superset
of the feedback information R wanted to transmit) before t_e is
reached, the UM information is discarded and the transmission
schedule for the next RR packet is reset to t_rr as calculated
before.
(Note: if the UM is piggybacked onto a regularly scheduled RTCP RR
message, this should not affect transmission of the RR; but
should the UM then be removed from the compound RR/UM?)
Otherwise, when t_e is reached, R creates an RR, appends the UM
information, and transmits the RR/UM packet. R then sets
allow_early=FALSE and recalculates t_rr += T_rr (possibly
t_rr = t_e + 2*T_rr or some value in between; this needs further
work). As soon as R sends its next regularly scheduled RTCP RR
(at the new t_rr), it sets allow_early=TRUE again.
Option: R also starts a timer T_allow (e.g. T_allow=T_rr).
If T_allow expires before an Early RR/UM is received from
another participant in the RTP session, R sets
allow_early=TRUE. If an Early RR/UM is received from
another participant before T_allow expires, T_allow
is cancelled.
If allow_early==FALSE then R calculates T_dither_max and checks the
time for the next scheduled RR: if t_rr - t0 < T_dither_max then R
creates a UM unit for transmission along with the RR packet at t_rr
(see above). Otherwise, R does not send an RTCP RR/UM.
Note: A bit in the UM unit is required to indicate whether the
transmission occurs as an Early RR/UM or as a regularly
scheduled RR/UM packet. This E-bit is to be set accordingly.
See section 4 for details.
Note: Numerous variations spring to mind on RTCP RR/UM scheduling,
dithering, damping, etc. Right now, this is deliberately kept
simple for an easy starting point and to provoke further
discussions.
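The rules of this section can be sketched as a small state machine.
The following Python code is one possible reading under stated
assumptions, not a normative implementation: the class and method
names are hypothetical, the optional T_allow timer is omitted, t_rr is
advanced by exactly T_rr after an Early RR/UM (one of the options the
text leaves open), and the randomization of regular RTCP intervals is
ignored.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Receiver:
    """State of receiver R for Early RR/UM scheduling (times in seconds)."""
    t_rr: float                   # time of the next regularly scheduled RR
    T_rr: float                   # regular RTCP interval
    allow_early: bool = True      # initially TRUE
    t_e: Optional[float] = None   # time of a pending Early RR/UM, if any

    def on_loss(self, t0: float, t_dither_max: float,
                rnd: Callable[[], float] = random.random) -> str:
        """React to a loss detected at t0 and return the decision taken."""
        if t0 + t_dither_max > self.t_rr:
            return "piggyback"    # regular RR is close enough: append the UM
        if not self.allow_early:
            return "none"         # budget already overdrafted: send nothing
        self.t_e = t0 + rnd() * t_dither_max
        return "early"

    def on_foreign_um(self, now: float) -> None:
        """An RR/UM covering our feedback arrived from another receiver."""
        if self.t_e is not None and now < self.t_e:
            self.t_e = None       # discard our UM; t_rr stays as calculated

    def on_early_timer(self) -> bool:
        """t_e reached: send the Early RR/UM unless it was suppressed."""
        if self.t_e is None:
            return False
        self.t_e = None
        self.allow_early = False
        self.t_rr += self.T_rr    # assumption: simplest option from the text
        return True

    def on_regular_rr(self) -> None:
        """A regularly scheduled RR was sent: early feedback allowed again."""
        self.allow_early = True
        self.t_rr += self.T_rr    # simplification: no interval randomization


r = Receiver(t_rr=10.0, T_rr=5.0)
print(r.on_loss(t0=1.0, t_dither_max=0.1))   # schedules an Early RR/UM
print(r.on_early_timer(), r.allow_early)     # sent; further early UMs blocked
```

A second loss before the next regular RR then yields "none" unless the
regular RR is due within the dithering window, in which case the UM is
piggybacked.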
3.4. Summary of decision steps
Before even considering whether or not to send RTCP UM information an
application has to determine whether this mechanism is applicable:
1) An application has to decide whether -- for the current ratio of
frame rate with the associated (application-specific) maximum
feedback delay and the currently observed round-trip time --
feedback mechanisms can be applied at all.
2) The application has to decide whether -- for a certain observed
error rate, assigned bandwidth, frame rate, and group size -- (and
which) feedback mechanisms can be applied.
3) If these tests pass, the application has to follow the rules for
transmitting early RTCP RRs or regularly scheduled RTCP RRs with
piggybacked UMs.
4. Format of RTCP Feedback messages
The general format of a UM is outlined below. Compound packets
including UMs are possible. All UMs concerning any given picture of
any given receiver MUST be conveyed in a single compound packet, in
order to prevent the loss of parts of such a combined message.
Combining different types of UMs for any given picture of any given
receiver SHOULD be avoided.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|   UMT   |E| PT=RTCP-Feedb |            length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             SSRC                              |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |              Upstream Control Information (UCI)
   |
   |                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                     :         padding         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
version (V): 2 bits
Identifies the version of RTP, which is the same in RTCP packets as
in RTP data packets.
Upstream Message Type (UMT): 5 bits
Identifies the type of the upstream message.
0: forbidden
1: Picture Loss Indication
2: Slice Lost Indication
3: Reference Picture Selection Indication
4-31: reserved
Packet Type (PT): 8 bits
Constant value (TBD) identifying RTCP Upstream messages.
Early Upstream Message (E): 1 bit
A bit that, when set, indicates that the UM is sent early, i.e. did
not follow the regular schedule for sending RTCP Receiver Reports.
Length: 16 bits
Number of bits valid in the UCI field. A zero value
indicates that the UCI field is not present (e.g. in case of a
Picture Loss Indication).
SSRC: 32 bits
SSRC is the synchronization source identifier for the sender of this
packet.
Upstream Control Information (UCI): variable
Format and semantics of the UCI differ for the various upstream
message types. Fragmentation of an upstream message into several UCI
fields is prohibited. See the following sections for their
definition.
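The header layout described above can be sketched in Python as follows.
This is a minimal illustration under stated assumptions: the packet
type is still TBD in this draft, so the value 250 used below is an
arbitrary placeholder; padding the UCI with zero octets to a 32-bit
boundary is an assumption based on the diagram; and the function name
pack_um is hypothetical.

```python
import struct

PLACEHOLDER_PT = 250   # arbitrary placeholder; the real value is TBD


def pack_um(umt, early, pt, ssrc, uci=b"", uci_bits=None):
    """Build one UM: V=2, UMT, E, PT, length (valid UCI bits), SSRC, UCI."""
    if not 1 <= umt <= 31:
        raise ValueError("UMT is a 5-bit field and 0 is forbidden")
    if uci_bits is None:
        uci_bits = 8 * len(uci)
    if not 0 <= uci_bits <= min(8 * len(uci), 0xFFFF):
        raise ValueError("length must fit 16 bits and the UCI provided")
    # First octet: version in the top 2 bits, then 5 bits UMT, then E.
    first = (2 << 6) | (umt << 1) | (1 if early else 0)
    header = struct.pack("!BBHI", first, pt, uci_bits, ssrc)
    padding = -len(uci) % 4            # assumed zero padding to 32 bits
    return header + uci + b"\x00" * padding


# A Picture Loss Indication sent early: no UCI, length field is zero.
pli = pack_um(umt=1, early=True, pt=PLACEHOLDER_PT, ssrc=0x12345678)
print(pli.hex())
```

The sketch covers only the UM itself. In a real RTCP compound packet it
would follow the RR, as required in section 3.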
5. Message Type 1: Picture Loss Indication (PLI)
5.1 Semantics
With the Picture Loss Indication message a decoder informs the
encoder about the loss of one or more full pictures.
5.2 Format
PLI does not require parameters. Therefore, the length field MUST be
0, and there MUST NOT be Upstream Control Information.
5.3 Timing Rules
The timing follows the rules outlined in section 3. In systems that
employ both PLI and other UM types it may be advisable to follow the
regular RTCP RR timing rules, since PLI is not as delay critical as
other UM types.
5.4 Remarks
PLI messages typically trigger the sending of full Intra pictures.
Intra pictures are several times larger than predicted (Inter)
pictures. Their size is independent of the time they are generated.
In most environments, especially when employing bandwidth-limited
links, the use of an Intra picture implies an allowed delay that is a
significant multiple of the typical frame duration. An example: If
the sending frame rate is 10 fps, and an Intra picture is assumed to
be 10 times as big as an Inter picture (not an unrealistic
assumption, see [] for details), then a full second of latency has to
be accepted. In such an environment there is no need for a
particularly short delay in sending the upstream message. Hence
waiting for the next possible time slot allowed by RFC1889bis RTCP
timing rules does not negatively influence system performance.
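The arithmetic behind this example can be written down as a short
Python sketch. It assumes a constant bit rate channel dimensioned so
that one Inter picture is transmitted per frame interval; the function
name is hypothetical.

```python
def intra_transmission_time(frame_rate_fps, intra_to_inter_ratio):
    """Seconds needed to send one Intra picture over the assumed channel."""
    inter_time = 1.0 / frame_rate_fps          # one Inter picture per interval
    return intra_to_inter_ratio * inter_time   # Intra takes 'ratio' intervals


latency = intra_transmission_time(frame_rate_fps=10, intra_to_inter_ratio=10)
print(latency)   # one second for the numbers used in the text
```

Compared with a latency of this order, a moderate additional delay for
the PLI itself has little effect.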
6. Message Type 2: Slice Lost Indication
6.1 Semantics
With the Slice Lost Indication a decoder can inform an encoder that
it was unable to decode one, or several consecutive, macroblocks.
The encoder can take appropriate action in order to re-synchronize
encoder and decoder by means of its choice, typically by sending the
lost macroblocks in Intra mode. This upstream message SHALL NOT be
used for video codecs with non-uniform, dynamically changeable
macroblock sizes such as H.263 with enabled Annex Q. In such a case,
an encoder cannot always identify the corrupted spatial region.
6.2 Format
When UMT indicates a Slice Lost Indication, then there is one
additional UCI field the content of which is in the following format:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          First          |         Number          |    TR     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
First: 13 bits
The macroblock (MB) address of the first lost macroblock. The MB
numbering is done such that the macroblock in the upper left corner
of the picture is considered macroblock number 1 and the number for
each macroblock increases from left to right and then from top to
bottom in raster-scan order (such that if there is a total of N
macroblocks in a picture, the bottom right macroblock is considered
macroblock number N).
Number: 13 bits
The number of lost macroblocks, in scan order as discussed above.
TR: 6 bits
The six least significant bits of the Temporal Reference of the
picture.
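The field layout and the macroblock numbering can be sketched in Python
as follows. This is a minimal illustration: the function names are
hypothetical, and the example assumes a CIF picture with 22 macroblocks
per row.

```python
import struct


def mb_address(row, col, mbs_per_row):
    """1-based raster-scan address of the macroblock at (row, col)."""
    return row * mbs_per_row + col + 1


def pack_sli(first, number, temporal_reference):
    """Pack First (13 bits), Number (13 bits) and TR (6 bits) into 32 bits."""
    if not 1 <= first <= 0x1FFF:
        raise ValueError("First must be in 1..8191")
    if not 1 <= number <= 0x1FFF:
        raise ValueError("Number must be in 1..8191")
    tr = temporal_reference & 0x3F     # six least significant bits only
    return struct.pack("!I", (first << 19) | (number << 6) | tr)


def unpack_sli(data):
    """Inverse of pack_sli: returns (first, number, tr)."""
    (word,) = struct.unpack("!I", data)
    return word >> 19, (word >> 6) & 0x1FFF, word & 0x3F


# Second macroblock row of a CIF picture lost (22 macroblocks per row).
first = mb_address(row=1, col=0, mbs_per_row=22)
uci = pack_sli(first, number=22, temporal_reference=0x45)
print(first, uci.hex(), unpack_sli(uci))
```

Since only six bits of the Temporal Reference are carried, the value
0x45 in the example is transmitted as 5.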
6.3 Timing Rules
The efficiency of algorithms using the Slice Lost Indication is
reduced greatly when the Indication is not transmitted in a timely
fashion. Motion compensation propagates corrupted pixels that are
not reported as being corrupted. Therefore, the use of the algorithm
discussed in section 3 is highly recommended.
Constraints on T_dither_max to be discussed.
6.4 Remarks
The First field of the UCI defines the first macroblock of a picture
as 1 and not, as one could suspect, as 0. This was done to align
this specification with the comparable mechanism available in H.245.
The maximum number of macroblocks in a picture (2**13 - 1 or 8191)
corresponds to the maximum picture sizes of the ITU-T and ISO/IEC
video codecs. If future video codecs offer larger picture sizes
and/or smaller macroblock sizes, then an additional upstream message
has to be defined. The six least significant bits of the Temporal
Reference field are deemed to be sufficient to indicate the picture
in which the loss occurred.
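The 1-based raster-scan numbering can be sketched as follows. This is
a minimal illustration, not normative text: the function names are
hypothetical, and a macroblock size of 16x16 luma samples is assumed.

```python
def picture_mb_count(width: int, height: int, mb_size: int = 16) -> int:
    """Total number of macroblocks N in a picture (mb_size is an assumption)."""
    mbs_per_row = -(-width // mb_size)   # ceiling division
    mb_rows = -(-height // mb_size)
    return mbs_per_row * mb_rows


def mb_address(row: int, col: int, mbs_per_row: int) -> int:
    """1-based raster-scan macroblock number for a 0-based (row, col) position."""
    if row < 0 or not 0 <= col < mbs_per_row:
        raise ValueError("macroblock position outside the picture")
    return row * mbs_per_row + col + 1   # upper left macroblock is number 1


# Example: a 352x288 picture has 22 macroblocks per row and 18 rows.
n = picture_mb_count(352, 288)           # 396
top_left = mb_address(0, 0, 22)          # 1
bottom_right = mb_address(17, 21, 22)    # 396, i.e. N
```

An implementation would also need to check that the resulting value
fits into the 13-bit First field, which can represent 0 through 8191.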
Algorithms have been reported that keep track of the regions affected
by motion compensation in order to allow the transmission of Intra
macroblocks to all those areas, regardless of the timing of the UM
[TBP.]. When those algorithms are used, the timing of the UM is less
critical than without them. It has to be observed, however, that those
algorithms correct large parts of the picture and, therefore, have to
transmit many bits in case of delayed UMs.
7. Message Type 3: Reference Picture Selection Indication
7.1 Semantics
Modern video coding standards such as MPEG-4 visual version 2 or
H.263 version 2 allow the use of reference pictures older than the
most recent one. Typically, a first-in-first-out queue of reference
pictures is maintained. If an encoder has learned about a loss of
encoder-decoder synchronicity, a known-as-correct reference picture
can be used. As this reference picture is temporally further away
than usual, the resulting predictively coded picture will use more
bits.
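The encoder-side behaviour described above might be sketched as
follows. This is an illustration under stated assumptions, not a
description of any particular codec: the class and method names are
hypothetical, picture identifiers are assumed to increase monotonically
without wrap-around, and every stored picture older than the damaged
one is assumed to be known-as-correct.

```python
from collections import deque
from typing import Deque, Optional


class ReferenceQueue:
    """First-in-first-out queue of reference picture identifiers (hypothetical)."""

    def __init__(self, capacity: int = 4) -> None:
        # A bounded deque drops the oldest picture when a new one is appended.
        self._pictures: Deque[int] = deque(maxlen=capacity)

    def push(self, picture_id: int) -> None:
        self._pictures.append(picture_id)

    def select_reference(self, damaged_id: Optional[int] = None) -> Optional[int]:
        """Return the newest usable reference picture, or None if there is none."""
        if damaged_id is None:
            # No loss reported: predict from the most recent picture.
            return self._pictures[-1] if self._pictures else None
        # Loss reported: use the newest stored picture older than the damaged one.
        for picture_id in reversed(self._pictures):
            if picture_id < damaged_id:
                return picture_id
        return None  # assumption: the caller then falls back to Intra coding
```

With a capacity of three and pictures 10 to 14 pushed in order, the
queue holds 12, 13 and 14; a report that picture 13 is damaged selects
picture 12 as the reference.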
Both MPEG-4 and H.263 define a binary format for the _payload_ of an
RPSI message that includes information such as the temporal ID of the
damaged picture and the size of the damaged region. This bit string
is typically small (a couple of dozen bits), of variable length, and
self-contained, i.e., it contains all information that is necessary
to perform reference picture selection.
Note that both MPEG-4 and H.263 allow the use of RPSI with positive
feedback information as well. That is, all correctly decoded pictures are
reported. Positive feedback of any form MUST NOT be used in a
multicast environment (reporting positive feedback about individual
reference pictures at RTCP intervals is not expected to be of much
use anyway). For point-to-point communication, positive feedback MAY
be used but, again, the bit rate budget of RTCP feedback will prevent
its use in most scenarios.
7.2 Format
When UMT indicates an RPSI, the length field is set to the number
of bits of the bit string that contains the RPS information. This
bit string follows, byte-aligned, in the UCI field. Bit padding is
used to achieve 32-bit word alignment of the UCI message (and of the
whole packet).
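The length and padding rule can be illustrated with the following
sketch. The function name is hypothetical, the RPS bit string is
represented as a string of "0" and "1" characters for readability, and
the padding bits are assumed to be zero, which this draft does not
specify.

```python
from typing import Tuple


def build_rpsi_uci(rps_bits: str) -> Tuple[int, bytes]:
    """Return (length field value, padded UCI bytes) for an RPS bit string."""
    if any(bit not in "01" for bit in rps_bits):
        raise ValueError("rps_bits must contain only '0' and '1'")
    length = len(rps_bits)                    # length field: number of RPS bits
    padded_len = -(-length // 32) * 32        # round up to a 32-bit boundary
    padded = rps_bits.ljust(padded_len, "0")  # zero padding is an assumption
    if padded_len == 0:
        return 0, b""
    return length, int(padded, 2).to_bytes(padded_len // 8, "big")


# Example: a 20-bit RPS string occupies one 32-bit word with 12 padding bits.
length, uci = build_rpsi_uci("1" * 20)        # length == 20, len(uci) == 4
```

The length field keeps the unpadded bit count, so a receiver can tell
the RPS information apart from the padding bits.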
7.3 Timing Rules
RPS is even more sensitive to delay than algorithms using SLI. This
is because the older the RPS message is, the more bits the encoder
has to spend to achieve encoder-decoder synchronicity. See [TBP.]
for some information about the overhead of RPS for certain
bit rate/frame rate/loss rate scenarios.
Therefore, RPS messages should typically be sent as soon as possible,
employing the algorithm of section 3.
Constraints on T_dither_max to be discussed.
7.4 Remarks
[To Do]
8. Security considerations
RTP packets transporting information with the proposed payload
format are subject to the security considerations discussed in the
RTP specification [1]. This implies that confidentiality of the media
streams is achieved by encryption.
If the entire stream (extension data and AU data) is to be secured
and all the participants are expected to have the keys to decode the
entire stream, then the encryption is performed in the usual manner,
and there is no conflict between the two operations (encapsulation
and encryption).
The need for a portion of the stream (e.g. extension data) to be
encrypted with a different key, or not to be encrypted, would require
application level signaling protocols to be aware of the usage of
the XT field, and to exchange keys and negotiate their usage on the
media and extension data separately.
9. Acknowledgements
Large parts of the syntax and the text concerned with RPS and NEWPRED
were borrowed from an early I-D by Fukunaga et al. that was
concerned with MPEG-4 ES packetization.
10. Full Copyright Statement
Copyright (C) The Internet Society (1999). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works.
However, this document itself may not be modified in any way, such as
by removing the copyright notice or references to the Internet
Society or other Internet organizations, except as needed for the
purpose of developing Internet standards in which case the procedures
for copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
11. Authors' Addresses
Stephan Wenger (stewe@cs.tu-berlin.de)
TU Berlin
Sekr. FR 6-3
Franklinstr. 28-29
D-10587 Berlin
Germany
Joerg Ott (jo@tzi.uni-bremen.de)
Universitaet Bremen TZI
MZH 5180
Bibliothekstr. 1
D-28359 Bremen
Germany
12. Bibliography: TODO