Audio-Video Transport Working Group                            D. Budge
INTERNET-DRAFT                                              R. McKenzie
                                                               W. Mills
                                                                W. Diss
                                                                P. Long
                                             Smith Micro Software, Inc.
                                                               May 1997
                                               Expires: December 4 1997


             Media-independent Error Correction using RTP
               draft-budge-media-error-correction-00.txt

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Abstract

This document specifies a media-independent error-correction scheme
using the Real-Time Transport Protocol (RTP), along with the payload
format for encapsulating both error-correction signaling and media
bitstreams in RTP. It enables the reconstruction of lost packets across
a connectionless transport such as RTP over UDP. The goal of this scheme
is to maximize isochrony, the regular and timely delivery of data, with
minimal bandwidth, latency, and computational costs.

Table of Contents

1. Background..........................................................2
2. Internet Behavior...................................................3
3. Effects of Packet Loss..............................................4
4. Alternative Solutions...............................................4
5. This Solution.......................................................5
6. Usage of RTP........................................................6
 6.1 RTP Header Usage .................................................6
 6.2 RTP Packet Structure .............................................7


Budge, et al.                                                 [Page 1]


INTERNET-DRAFT   Media-independent Error Correction using RTP   May 1997


7. Error-correction Payload Header.....................................7
 7.1 Error-Correction Schemes .........................................9
  7.1.1 Error-Correction Scheme 0 (1:1:0) .............................9
  7.1.2 Error-Correction Scheme 1 (2:1:1) ............................10
  7.1.3 Error-Correction Scheme 2 (3:2:2) ............................10
  7.1.4 Error-Correction Scheme 3 (2:1:4) ............................11
 7.2 Changing Error-Correction Scheme During an RTP Session ..........12
  7.2.1 How Changing Scheme is Possible ..............................13
  7.2.2 Scenarios for Choosing a Scheme ..............................13
 7.3 Stream Interruptions ............................................13
  7.3.1 The Problem with Stream Interruptions ........................14
  7.3.2 Handling Stream Interruptions by Changing Schemes ............14
  7.3.3 Handling Stream Interruptions by Inserting Null Payloads .....15
  7.3.4 Comparison of Handling Stream Interruptions ..................15
 7.4 Tutorial ........................................................15
8. Security Considerations............................................16
9. Conclusions........................................................16
10. References........................................................17
11. Authors' Address..................................................17


1. Background

Data communication over the Internet is markedly different from point-
to-point communication via modems. Modems can communicate data with
relatively low latency (in milliseconds) and low data protocol overhead
(the number of non-payload bytes transmitted relative to the total
number of bytes transmitted). Modem data errors take the form of
corrupted or missing bytes. Because there is such low latency on modems,
when an error occurs, we can simply ask the modem to retransmit the
corrupted data.

The Internet has variable latency that is an order of magnitude greater
than that of modem communications (several seconds versus milliseconds)
and relatively high protocol overhead (for audio communications, up to
25 percent versus 5 percent over a modem). Communications bandwidths on
the Internet are constantly varying from bytes per second to Kbytes per
second. Internet data errors take the form of whole packets of data
being lost. Large latencies prohibit the retransmission of corrupted
data, which would cause pauses of, for example, 5 to 20 seconds in a
video conference. Because of these limitations, video conferencing using
the Internet is currently limited to those users who are more fascinated
with the technology than with actually communicating with another
person. The challenge then is to deliver the best video conferencing
technology possible on today's Internet despite its failings.








Finally, despite popular opinion, H.323 [1] is not a reasonable Internet
video conferencing standard--it is an intranet video conferencing
standard. The needs of a user connected to his or her Internet Service
Provider (ISP) via a modem are very different from the needs of a user
connected to a corporate LAN, with its large bandwidth capacity and
relatively low latency.

2. Internet Behavior

Communication across the Internet consists of the transmission of IP
data packets [2]. Several different packet types/protocols are
available, the two principal protocols being TCP [3] and UDP [4]. TCP is
the most familiar protocol because it provides guaranteed delivery of
data between two points. In order to guarantee delivery, TCP has a
handshaking mechanism where a buffer is transmitted to the receiver, the
receiver checks the buffer for errors and then acknowledges correct
delivery before the next buffer is transmitted. If the buffer is not
acknowledged, it is retransmitted. Although this is a reasonable scheme
for sending data, it is problematic for sending real-time data such as
audio and video across the Internet. One problem is latency. It takes so
long for the acknowledgment to get back to the sender that the data
channel is frequently idle. For example, if the round trip delay on a
20,000-bit-per-second TCP connection is 2 seconds, sending 1000-byte TCP
packets results in only 20 percent usage of available bandwidth.
Furthermore, real-time data is inherently time-critical. After a few
seconds (as in the case of a retransmission caused by a failed
acknowledgment), some of the data has "spoiled," i.e., it is no longer
useful to the receiver. Finally, due to retransmission, TCP is not an
acceptable protocol for multicast signals, since each receiver may or
may not require retransmission.
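
The 20 percent figure in the example above can be reproduced with a
simple stop-and-wait model. The following Python sketch assumes that
exactly one packet is sent per round trip and that the round-trip delay
already includes the time to transmit the packet; the function name and
the model itself are illustrative assumptions, not a description of
actual TCP windowing.

```python
def stop_and_wait_utilization(link_bps, packet_bytes, rtt_seconds):
    """Fraction of link capacity used when one packet is sent per round trip.

    Assumption: rtt_seconds already includes the packet's transmission time.
    """
    bits_per_round_trip = packet_bytes * 8
    capacity_per_round_trip = link_bps * rtt_seconds
    return bits_per_round_trip / capacity_per_round_trip


# 1000-byte packets on a 20,000-bit-per-second link with a 2-second round trip
utilization = stop_and_wait_utilization(20_000, 1000, 2.0)
print(f"{utilization:.0%}")  # 20%
```

If the 2 seconds were instead counted in addition to the 0.4-second
transmission time, the same model would give a somewhat lower figure.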

The packet type most often used to transmit real-time data like audio
and video on the Internet is UDP, or User Datagram Protocol. UDP affords
higher throughput and lower latency than TCP at the expense of data
integrity. UDP is an unreliable protocol in that there is no guarantee
of packet delivery. The errors seen with UDP include lost packets,
packets arriving out of sequence, and duplicate packets. Duplicate
packets and out-of-order packets are easily handled. Lost packets are
another story.

We have observed that the amount of packet loss varies between 5 and 20
percent across an Internet connection. Further, we have observed that
packet loss is unpredictable--random over a uniform distribution--and
the packets are usually not dropped in temporally close groups. For
example, in a scenario where there is 10 percent packet loss, the
dropped packets occur on average 1 out of every 10, with 2 sequential
packets being dropped 1 out of every 100 packets and so on.

(It has been pointed out that many studies have focused on aggregate
packet loss at a node, not packet loss across a connection [5]. Each
model exhibits a different pattern of packet loss. For example, a single
connection may experience random, mostly single-packet and adjacent two-
packet losses while a network node may experience grouped packet losses
that occur in overlaid waves of different frequencies and relative
phases.)


3. Effects of Packet Loss

Audio and video performance are affected differently by packet loss.
Converting the audio data into sound for a given audio packet does not
depend on any previous or future audio data packets. That is, it is
temporally "self contained." Therefore when audio data is lost, silence
may be heard during the interval that the lost speech packet occupied.
The effect is somewhat like a microphone with a bad cable--speech with
holes in it. When packet loss approaches 10 percent, the effect is very
annoying, such that it would be unacceptable for users who are
accustomed to phone-quality speech. At 20-percent packet loss, the
signal approaches the unintelligible.

Unlike audio codecs, video codecs such as H.263 [6] have facilities for
error correction in the form of simple redundancy. The effects of packet
loss can be mitigated further by minimizing inter-frame dependency.
Temporary picture aberrations still occur, though, such as brief freezes
in the video signal.


4. Alternative Solutions

Because the H.323 video conferencing standard is--again--directed
towards LAN-based systems, it does not attempt to address the problem of
dropped audio packets since packet loss on corporate LANs is usually
well below 10 percent.

For video, H.323 attempts to align data packets with discrete segments
of a video frame and wrap that video segment with a header that gives
context to the video data in the current frame. The effect of this is
that, even when a video packet is dropped, packets that describe other
pieces of the current picture can update their respective piece. For
example, if a video data packet contains a complete set of data for 16
lines of pixels, and that packet is lost, other lines in that frame
could still be updated. A sender could anticipate or detect the
percentage of missing macroblocks at the receiver and send redundant
intra-coded macroblocks. They could be sent on a schedule so that they are
received often enough that freezes are brief, bad visual effects are
minimized, yet they do not consume excessive bandwidth. However,
drawbacks of sending redundant macroblocks in H.323 include the amount
of overhead per packet and increased codec inefficiency caused by the
packetization schemes. Between the IP, UDP, and RTP [7] packet headers
and the need to encode Groups of Blocks (GOBs) with headers, the
efficiency of data transfer is reduced by at least 25 percent--bandwidth
capacity that is present on the corporate LAN but hardly over a 28.8-kbps
modem connection to the Internet.

In addition, although completely contrary to the spirit of H.26x
compression, inter-frame dependency can be reduced by, for example,
turning motion vectors off. This prevents the errors resulting in
missing macroblocks from propagating themselves, at the expense of
increasing the size of the bitstream by 25 to 50 percent. Although the
means by which H.26x codecs cope with packet loss have their
disadvantages, they do exist. Therefore, there is a greater need for the
error-correction scheme described in this document for audio streams
than video streams, although it may be appropriate for video if
temporary picture aberrations are unacceptable to the user.

One promising solution that has been offered to overcome the Internet's
unreliability is that of layering video. The concept of layering is
simple. In essence, send two independent interleaved frame sequences.
Then if one sequence "takes a hit," one can at least continue to view
the other stream (realizing that half the frame rate has been lost)
while signaling the transmitter to send another key frame which refreshes
both streams. This approach has merit and needs to be studied further.
However, according to the authors of this method, there is a 20 percent
degradation of effective bandwidth due to the fact that frame-to-frame
differences are computed over a two-frame interval rather than over a
single-frame interval. The fact that one of the streams will be knocked
out--and soon--can be empirically demonstrated. What one is left with is
a picture that sometimes is fast (but not for long) then degrades to
less than half of the frame rate of which the channel is capable (40
percent of maximum). In addition, audio is not protected by this
mechanism.
Finally, this would represent a major change to most existing H.263
codecs. For software codec manufacturers, the additional buffering would
be reflected in some amount of performance degradation.

A simple solution is to transmit each UDP packet twice,
regardless of whether it is carrying audio or video data. This fits
within the existing H.323 standard in that there is a requirement for
duplicate packets to be ignored. Sending each packet twice guarantees
correction of all single-packet losses and half of all two-packet
losses. Assuming a 1:10 chance of any packet being dropped, analysis
yields a freeze chance of approximately 1:200. At first glance, it
appears that the loss of effective bandwidth (only 50 percent of
maximum) is prohibitive. But when one analyzes the effects of dropped
audio packets and the 10x cost of transmitting key frames (not to
mention the time that the screen is frozen), it starts to look like a
pretty good way to go.
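
One reading of the 1:200 estimate above combines two statements from
this document: two sequential packets are dropped about 1 time in 100
at 10 percent loss (section 2), and duplication corrects half of all
two-packet losses. The Python sketch below only restates that
arithmetic; the function name is hypothetical, and the independent-loss
model is an assumption carried over from section 2.

```python
from fractions import Fraction


def duplicate_freeze_chance(p_loss):
    """Chance of an uncorrected loss when every packet is sent twice."""
    # Assumption (section 2): losses are independent, so two sequential
    # packets are both dropped with probability p_loss squared.
    two_packet_loss = p_loss ** 2
    # Per the text, duplication corrects half of all two-packet losses.
    uncorrected_share = Fraction(1, 2)
    return two_packet_loss * uncorrected_share


print(duplicate_freeze_chance(Fraction(1, 10)))  # 1/200
```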


5. This Solution

Rather than layering the signal or sending each packet twice, this
solution sends the exclusive-or, or XOR, of combinations of packets (the
Mills-and-McKenzie, or M&M, algorithm); the remainder of this document
describes its packetization and scheme. This scheme substantially
increases the reliability of packet delivery, because the original
packets can be reconstructed in several different ways, not just
recovered from the surviving duplicate.


6. Usage of RTP

Along with a header for error-correction information, defined in section
7, the media stream is carried as payload data within RTP packets.


6.1 RTP Header Usage

marker bit (M bit):

The RTP marker-bit field has the following interpretation unless a
profile supersedes it.

The RTP marker bit for a given packet shall be what would have been the
RTP marker bit for the original media payload of the most-recently-
transmitted non-null payload of those represented in the packet (the
timestamps in consecutive RTP packets might not be monotonic [7]). See
section 7.3 for a definition of "null payload." If a packet contains
only null payloads, the marker bit shall have no meaning.

For example, here is a list of RTP packets transmitted from left to
right, where uppercase letters represent original media payloads and xy
represents some arbitrary function, f(x,y) (f() is always the XOR
operation in this document).

A*, B*, ABC*, C*, ACD*, ABD*, D*, BCD*

What would have been the RTP marker bit for the original media payload
that is followed by an asterisk, *, is used for the value of this
packet's RTP marker-bit field. Knowing to which original media payload
the field belongs, one refers to the original media's RTP profile to
determine the final use of this field.

As an example of what happens in the presence of a null payload, here is
the same list where C is a null payload:

A*, B*, AB*C, C, ACD*, ABD*, D*, BCD*
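
The marker-bit rule and both lists above can be sketched as follows.
This Python fragment is illustrative only: the function names are
hypothetical, and each packet is modeled simply as the string of
original payloads it represents, oldest first.

```python
def marker_source(packet, null_payloads=frozenset()):
    """Return the payload whose marker bit the packet carries, or None.

    `packet` lists the original payloads it represents, oldest first.
    """
    for payload in reversed(packet):
        if payload not in null_payloads:
            return payload
    return None  # only null payloads: the marker bit has no meaning


def annotate(packets, null_payloads=frozenset()):
    """Render packets in this document's notation, with * after the source."""
    rendered = []
    for packet in packets:
        source = marker_source(packet, null_payloads)
        rendered.append("".join(p + "*" if p == source else p for p in packet))
    return ", ".join(rendered)


GROUP = ["A", "B", "ABC", "C", "ACD", "ABD", "D", "BCD"]
print(annotate(GROUP))
# A*, B*, ABC*, C*, ACD*, ABD*, D*, BCD*
print(annotate(GROUP, null_payloads={"C"}))
# A*, B*, AB*C, C, ACD*, ABD*, D*, BCD*
```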

payload type (PT):

This is the type of error-correction-encapsulated media, not the
original media. The payload type of the original media cannot be used,
because an error-correction-encapsulated media stream is different from
the original media stream. Therefore, a new static payload type may be
defined for an error-correction-encapsulated version of a media stream
or, more likely, a dynamic payload type may be defined out-of-band.
Regardless, payload-type assignment and the mapping between original
payload type and error-correction-encapsulated payload type are outside
the scope of this document.

sequence number:

The RTP sequence number can be used to restore the original packet
sequence and determine whether and how many packets are lost, exactly as
without this error-correction scheme. It can also be used to determine
media context such as the spatial position of a video payload or the
timing of the source. However, since the value of this field increases
monotonically with respect to the sequence of generated RTP packets but
not necessarily with respect to the sequence of the original media
payloads (e.g., an error-correction RTP packet may contain the XOR of
three original media payloads or conversely an original media payload
may be represented in more than one error-correction RTP packet), the
receiver may also need to take into consideration the values of the
scheme and mode fields to determine the same media context.

timestamp:

The RTP timestamp field has the same interpretation as the marker-bit
field described above unless a profile supersedes this interpretation of
the timestamp field.


6.2 RTP Packet Structure

The error-correction payload header starts at the first octet in the RTP
payload. The media payload immediately follows the error-correction
payload header. The media payload is the data that would have otherwise
exclusively occupied the RTP payload if this error-correction scheme
were not used. The RTP profile defined for the media shall be used for
the packetization of the media payload field. The layout of the RTP
error-correction packet is shown as:

+---------------------------------------------------------------+
|                 RTP Header                                    |
|---------------------------------------------------------------|
|                 Error-correction Payload Header               |
|---------------------------------------------------------------|
|                 Media Payload                                 |
+---------------------------------------------------------------+


7. Error-correction Payload Header

Each RTP error-correction packet carries as many media packets as would
have been carried without error correction. The error-correction payload
header is always present in each RTP packet even if scheme=0, which
indicates "do not apply error correction to this packet" and not
necessarily "do not apply error correction during this RTP session."

Four error-correction schemes, i.e., 0, 1, 2, and 3, are defined for the
RTP error-correction payload header. The ability to receive packets of a
particular scheme is signaled out-of-band. Only one scheme applies to an
RTP packet at a time, but the scheme can change from one RTP packet to
another. The ability of the receiver to switch error-correction scheme
during an RTP session (not the actual switching) is also signaled out-
of-band.

(It would have been convenient to use the payload-type field in the RTP
header to express the information represented by the scheme and mode
fields, thus saving one byte per RTP packet. However, this could consume
several dynamic payload types in the rather small number space from
96 to 127 and require a relatively complicated out-of-band method to
assign dynamic payload types to the corresponding scheme and mode
combinations.)

The error-correction payload header is a single 24-bit word, which is
transmitted in network byte and bit order (decreasing significance) with
the most significant bit shown at the left in the following diagram.

 0                   1                   2
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|scheme | mode  |           length              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

scheme: 4 bits

The error-correction scheme that is being used on this and presumably
subsequent packets. (Note: This document describes a single error-
correction scheme in the larger sense, but this field identifies which
[sub]scheme is currently in use. Therefore, the word, "scheme," is
overloaded in this document.)

mode: 4 bits

Media-payload mode. This indicates the position of this packet in the
cycle of packets for this error-correction scheme. The sequence number
in the RTP header is used to establish further context in the packet
stream.

length: 16 bits

The XOR of the lengths of the original media payloads of which this
payload contains an XOR. For example, if an error-correction payload
contains (the scheme and mode indicate what it contains):


A XOR B XOR C

where the letters, A, B, and C, represent original media payloads, this
field contains:

(length of A) XOR (length of B) XOR (length of C)

The purpose is, by XORing the length in the same way as the media
payload, to recover the original media payload's length at the same time
as the payload is recovered.

If an error-correction payload contains an original media payload that
is not XORed with another one, this field simply contains that payload's
length.
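
As an illustration of the header layout above, the following Python
sketch packs and unpacks the three octets in network byte order. The
function names are hypothetical, and the example payload lengths (160,
160, and 120 octets) are invented for the demonstration.

```python
import struct


def pack_ec_header(scheme, mode, length):
    """Build the 24-bit header: 4-bit scheme, 4-bit mode, 16-bit length."""
    if not (0 <= scheme < 16 and 0 <= mode < 16 and 0 <= length < 65536):
        raise ValueError("field out of range")
    # "!" selects network byte order with no alignment padding (3 octets).
    return struct.pack("!BH", (scheme << 4) | mode, length)


def unpack_ec_header(data):
    """Return (scheme, mode, length) from the first three payload octets."""
    first, length = struct.unpack("!BH", data[:3])
    return first >> 4, first & 0x0F, length


# Scheme 3, mode 2 (an ABC packet); length is the XOR of the three lengths.
header = pack_ec_header(3, 2, 160 ^ 160 ^ 120)
print(header.hex())              # 320078
print(unpack_ec_header(header))  # (3, 2, 120)
```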


7.1 Error-Correction Schemes

These are the error-correction schemes currently defined. In the future,
others may be defined with corresponding values for the scheme field of
the error-correction payload header.

If the media payloads being XORed are of different lengths, for example,
A XOR B, where A or B is shorter, or (A XOR B) XOR C, where (A XOR B) or
C is shorter, the shorter payload is effectively padded with zeros up to
the length of the longer one before the XOR operation is performed. This
padding is just for the XOR operation and is not part of the payload,
proper. When the original media payload is reconstructed along with its
length, the padding is ignored.
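
The padding rule, together with the XORed length field, can be sketched
as below. This is a minimal Python illustration under the assumption
that payloads are plain byte strings; the function names are
hypothetical.

```python
from itertools import zip_longest


def xor_payloads(a, b):
    """XOR two byte strings, treating the shorter one as zero-padded."""
    return bytes(x ^ y for x, y in zip_longest(a, b, fillvalue=0))


def recover(xor_payload, xor_length, known):
    """Recover the one missing payload from an XOR packet and the others."""
    data, length = xor_payload, xor_length
    for payload in known:
        data = xor_payloads(data, payload)
        length ^= len(payload)  # lengths are XORed the same way as the data
    return data[:length]        # drop the padding


a, b, c = b"first payload", b"second", b"third one"
abc = xor_payloads(xor_payloads(a, b), c)
abc_length = len(a) ^ len(b) ^ len(c)

print(recover(abc, abc_length, [a, c]))  # b'second'
```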

The notation, x:y:z, at the end of the following section headings is a
descriptive short-hand for the scheme. The first two numbers are the
ratio of the number of media payloads transmitted to the number of
original media payloads. The third is the number of additional packet
delays incurred by using the scheme relative to not using error
correction at all. If a scheme is added whose short-hand would be the
same as a scheme already defined, it should be qualified in some way
such as by appending a lower-case letter as in 2:1:1a.


7.1.1 Error-Correction Scheme 0 (1:1:0)

This indicates that error correction is not being applied to the packet
stream. This allows an RTP session over a connection that is currently
experiencing extremely low packet loss to immediately provide error
correction in case the line degrades.

mode is unused but shall be set to 0.




7.1.2 Error-Correction Scheme 1 (2:1:1)

With this simplest error-correction scheme, the packet stream is
translated to the original sequence with a packet of the XOR of each
adjacent pair inserted between them as in

A, B, C, D, E, F, . . . => A, AB; B, BC; C, CD; D, DE; E, EF; F, . . .

where AB stands for A XOR B, etc. Noting that XOR is associative,
commutative, and self-inverse, where the last means that XXY=Y, there
are two ways to reconstruct the first packet and three ways to
reconstruct all subsequent packets:

A = A = (AB)(B)
B = B = (A)(AB) = (BC)(C)
C = C = (B)(BC) = (CD)(D)

This strategy has the same 2:1 transmission overhead and single
additional packet delay as simply sending each packet twice; however, it
has much better error-correction capability. It protects against all
single-packet and two-packet losses, and 75 percent of three-packet
losses within the group of 4 sent packets for each pair of original
packets, while sending each packet twice protects against all single-
packet losses but only 50 percent of two-packet losses and no three-
packet losses.

mode shall be set to 1 for an XOR packet and 0 otherwise.
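
The scheme=1 translation and its reconstructions can be sketched
symbolically. In this Python illustration each packet is modeled as the
set of original payloads XORed into it, so that XOR becomes symmetric
difference; the function names are hypothetical, and sending the final
payload alone at the end of a finite list is an assumption, since this
document describes an open-ended stream.

```python
def scheme1_encode(payloads):
    """Yield (mode, packet): each payload, then its XOR with the next one."""
    for current, following in zip(payloads, payloads[1:]):
        yield 0, frozenset({current})
        yield 1, frozenset({current}) ^ frozenset({following})
    if payloads:
        # Assumption: at the end of a finite list there is no pair to XOR.
        yield 0, frozenset({payloads[-1]})


def label(packet):
    """Render a packet in this document's notation, e.g. AB."""
    return "".join(sorted(packet))


packets = list(scheme1_encode(list("ABCD")))
print(", ".join(label(p) for _, p in packets))  # A, AB, B, BC, C, CD, D

# B can be rebuilt in either of the two ways listed above.
A, AB, B, BC, C = (p for _, p in packets[:5])
assert A ^ AB == B == BC ^ C
```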


7.1.3 Error-Correction Scheme 2 (3:2:2)

We can increase our effective bandwidth relative to scheme=1 at the
expense of another packet delay and some error-correction capability.
This is done by sending, for each group of two packets, three
combinations of XOR packets. If the packet stream would have been

A, B, C, D, E, F, G

we first partition it into groups of two packets:

. . . B, C; D, E; F, G; . . .

Then each group is translated as follows into groups of three packets,
remembering that, for example, BC stands for B XOR C:

. . . B, C, BC; D, E, DE; F, G, FG; . . .

Finally, every second packet in a group is carried over into the next
group by XORing it with each packet in the next group, resulting in

AB, AC, ABC; CD, CE, CDE; EF, EG, EFG; . . .


(Since this is a streaming operation, these translations are not, of
course, done in separate passes as has been shown here.)

Because of the rolling nature of the scheme, analysis is not
straightforward. When the carry-over is known, there is protection against
any single-packet loss in a group. When the carry-over is not known,
the carry-over can be reconstructed if all three packets of a group are
received. Reconstruction also depends on the carry-over situation. In
the presence of the carry-over, there are two ways to reconstruct each
packet:

B = (A)(AB) = (AC)(ABC)
C = (A)(AC) = (AB)(ABC)

Absent the carry-over, there is only one way to reconstruct each packet:

A = (AB)(AC)(ABC)    -- reconstructing the carry-over
B = (AC)(ABC)
C = (AB)(ABC)

This strategy has a 3:2 transmission overhead instead of 2:1 for
scheme=1 or sending each packet twice--a significant improvement. It
protects against all single-packet losses and against 73 to 83 percent
of two-packet losses within the group. It protects against 11 of the 15
two-packet losses; 1 of the 15 leaves only the last carry-over open,
which may be reconstructed by the next group; and 3 of the 15 yield an
unrecoverable error.

Note that the above method achieves more error-correction capability
than duplicating each packet, with greater effective bandwidth at the
expense of 2 packet delays rather than 1. Where ideal transmission is
100 percent transfer rate, duplicating packets represents a 50 percent
transfer rate, and the above mechanism represents a 67 percent transfer
rate.

mode shall be set to 0 for the XOR of two adjacent original media
payloads, as in AB, CD, and EF, above, 1 for the XOR of two payloads
separated by another intervening payload, as in AC, CE, and EG, and 2
for the XOR of three adjacent payloads, as in ABC, CDE, and EFG.
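
The scheme=2 translation, including the carry-over and the mode values,
can be sketched symbolically in Python. As an illustration only, each
packet is modeled as the set of original payloads XORed into it; the
function names are hypothetical, the initial carry-over is passed in by
the caller, and a trailing unpaired payload is simply left unencoded.

```python
def scheme2_encode(carry, payloads):
    """Yield (mode, packet) for each pair of payloads.

    `carry` is the second payload of the previous group; it is XORed into
    every packet of the current group.
    """
    for i in range(0, len(payloads) - 1, 2):
        c = frozenset({carry})
        first = frozenset({payloads[i]})
        second = frozenset({payloads[i + 1]})
        yield 0, c ^ first            # e.g. AB
        yield 1, c ^ second           # e.g. AC
        yield 2, c ^ first ^ second   # e.g. ABC
        carry = payloads[i + 1]


def label(packet):
    """Render a packet in this document's notation, e.g. ABC."""
    return "".join(sorted(packet))


packets = list(scheme2_encode("A", list("BCDEFG")))
print(", ".join(label(p) for _, p in packets))
# AB, AC, ABC, CD, CE, CDE, EF, EG, EFG

# Absent the carry-over, A is rebuilt from all three packets of the group.
AB, AC, ABC = (p for _, p in packets[:3])
assert AB ^ AC ^ ABC == frozenset("A")
```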


7.1.4 Error-Correction Scheme 3 (2:1:4)

To increase error correction further with the effective bandwidth of
scheme=1, but at the expense of 4 packet delays instead of 2 for
scheme=2 and 1 for scheme=1, a better strategy is to send, for each
group of four packets, eight combinations of XORs:

A, B, C, D; . . . => A, B, ABC, C, ACD, ABD, D, BCD; . . .



This protects against one-, two-, and three-packet losses, plus
correction of 80 percent (56/70) of four-packet losses per group of
eight sent. For this strategy, there are eight ways to reconstruct each
packet:

A = A = (B)(C)(ABC) = (B)(D)(ABD) = (C)(D)(ACD) = (D)(ABC)(BCD) =
(C)(ABD)(BCD) = (B)(ACD)(BCD) = (ABC)(ABD)(ACD)

B = B = (A)(C)(ABC) = (A)(D)(ABD) = (C)(D)(BCD) = (D)(ABC)(ACD) =
(C)(ABD)(ACD) = (A)(BCD)(ACD) = (ABC)(ABD)(BCD)

C = C = (A)(B)(ABC) = (A)(D)(ACD) = (B)(D)(BCD) = (D)(ABC)(ABD) =
(B)(ACD)(ABD) = (A)(BCD)(ABD) = (ABC)(ACD)(BCD)

D = D = (A)(B)(ABD) = (A)(C)(ACD) = (B)(C)(BCD) = (C)(ABD)(ABC) =
(B)(ACD)(ABC) = (A)(BCD)(ABC) = (ABD)(ACD)(BCD)

The above sequence, while having the same bandwidth efficiency as the
duplicate-packet mechanism, 2:1, is far superior in error-correction
ability. With this transmission scheme (assuming a 10 percent packet
loss), one could anticipate a video freeze or audio discontinuity about
once every 20 minutes (the odds of an anomaly are 1:5922).

mode shall be set to 0 for an A packet, 1 for B, 2 for ABC, 3 for C, 4
for ACD, 5 for ABD, 6 for D, and 7 for BCD.
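
The loss coverage stated for scheme=3 can be checked by enumeration. The
Python sketch below is an illustration, not part of the scheme: it
models each packet as a 4-bit mask of the payloads XORed into it and
treats a loss pattern as recoverable when the surviving packets still
determine all of A, B, C, and D (rank 4 over GF(2)). The function names
are hypothetical.

```python
from itertools import combinations

A, B, C, D = 1, 2, 4, 8
# Packets in transmission order (modes 0 through 7).
GROUP = [A, B, A ^ B ^ C, C, A ^ C ^ D, A ^ B ^ D, D, B ^ C ^ D]


def gf2_rank(vectors):
    """Rank over GF(2) of bitmask vectors, by incremental elimination."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit in v if it is set
        if v:
            basis.append(v)
    return len(basis)


def recoverable_patterns(lost_count):
    """Return (recoverable, total) over all ways to lose lost_count packets."""
    total = recovered = 0
    for lost in combinations(range(len(GROUP)), lost_count):
        survivors = [p for i, p in enumerate(GROUP) if i not in lost]
        total += 1
        recovered += gf2_rank(survivors) == 4
    return recovered, total


for k in range(1, 5):
    print(k, recoverable_patterns(k))
# 1 (8, 8)
# 2 (28, 28)
# 3 (56, 56)
# 4 (56, 70)
```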


7.2 Changing Error-Correction Scheme During an RTP Session

Since the scheme field is present in every error-correction RTP packet,
the RTP sender may change its value at any time, signaling to the RTP
receiver that a different error-correction scheme is being used starting
with the packet containing the changed scheme value. The capability of
the receiver to perform a given error-correction scheme must be signaled
out-of-band, such as through the use of the capability-exchange and
open-logical-channel procedures of H.245 [8]. The sender shall not use
an error-correction scheme which the receiver has no capability to
process.

When the scheme changes, the transmitter shall not send encodings for
any original packets that the receiver could have reconstructed from the
previous scheme's packet stream, assuming no packet loss. This avoids
duplicate packets, because the receiver has no way to correlate the same
original packets sent using the different schemes--it has no way of
knowing that the same packet has been sent twice. For example, for
scheme=3, if A, B, and ABC have been transmitted and the sender wishes
to switch to scheme=0 for the next RTP packet, it shall not transmit an
encoding of C because C could have been reconstructed from the previous
packets, i.e., (A)(B)(ABC).
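
One way a sender might apply this rule is to test whether an original
payload is already a XOR combination of the packets sent so far. The
following Python sketch is an assumption about how such a check could be
written, not a required procedure; packets are modeled as bitmasks of
the payloads XORed into them, and the function name is hypothetical.

```python
def already_recoverable(sent, candidate):
    """True if `candidate` is a XOR combination of packets already sent."""
    basis = []
    for v in sent:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit in v if it is set
        if v:
            basis.append(v)
    for b in basis:
        candidate = min(candidate, candidate ^ b)
    return candidate == 0


A, B, C, D = 1, 2, 4, 8
sent = [A, B, A ^ B ^ C]  # scheme=3 packets A, B, and ABC

print(already_recoverable(sent, C))  # True: do not send an encoding of C
print(already_recoverable(sent, D))  # False: D must still be sent
```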




7.2.1 How Changing Scheme is Possible

It is possible to change the error-correction scheme on a packet-by-
packet basis, because the first packet of the new scheme can be thought
of as if it were the first packet of a new RTP session using only that
scheme. Once the receiver has released any system resources associated
with the previous scheme, it uses the same start-of-RTP-session logic it
would have used for the indicated scheme to start processing packets for
the new scheme during the current RTP session. This logic includes
determining the mode of the packet within the indicated error-correction
scheme and taking into consideration the possibility that the first few
packets of this scheme may have been lost.


7.2.2 Scenarios for Choosing a Scheme

As long as the receiver has the capability to process it, the sender may
choose a scheme in one of three ways. It may use the same
error-correction scheme for all RTP sessions. It may choose a scheme
before each session starts, based on the known performance
characteristics and apparent condition of one or more links in the
connection, and use it throughout that session. Or it may choose a
scheme based on the current condition of the connection, switching
schemes as the condition changes. Feedback for the last scenario could
be provided by the receiver to the sender through a reverse RTCP [7]
channel associated with the forward RTP channel.


7.3 Stream Interruptions

The error-correction scheme described in this document assumes that
there is usually a steady stream of original media payloads present at
the input of the error-correction encoder. When there is not--when the
stream of payloads into the encoder is interrupted for some reason such
as at the onset of silence suppression in an audio stream--there are at
least two ways to make sure that enough information is transmitted so
that all packets can be reconstructed up to and including the last
packet before the interruption. Only one way is required; however, since
they have different performance characteristics, two ways are described,
and the choice of which one to use is left up to the sender, although
the receiver shall be capable of handling both.

So that the sender may use either way of handling stream interruptions,
even though a given sender needs only one, the following requirements
apply: A receiver shall ignore all original media payloads with
length=0 (this assumes that an original media payload can never
otherwise have length=0). These payloads are called "null payloads." Any
RTP header fields that have been associated with a payload by an RTP
profile shall be ignored as they pertain to a null payload. If a sender
inserts a null payload into the input of its encoder, it shall
consistently apply the null payload across all modes of the scheme as if
it were a real payload whose place it has taken. For example, with
scheme=1, a null payload cannot simply replace B in a single mode, as in

A, AB; 0, BC; C, . . .

It must replace all encodings of B, as in

A, A0; 0, 0C; C, . . .


7.3.1 The Problem with Stream Interruptions

If the last original media payload before an interruption must be
encoded with one or more subsequent payloads (e.g., the D in DE or the A
in ACD) according to the current mode of the scheme in use, the last
payload must be delayed until the subsequent payloads arrive at the
input of the error-correction encoder. For some media or if the stream
resumes after a relatively short delay, this is not a problem; however,
it is a problem for other media such as audio with silence suppression.

This particular medium uses silence frames to indicate to the receiver
that the audio stream is being interrupted due to silence at the source,
but the silence frame cannot be sent until more frames arrive at the
encoder! If previously transmitted packets do not contain sufficient
information for the receiver to reconstruct the last payload, the
receiver encounters the silence frame at the end of the period of
silence that the frame was intended to announce. For example, with
scheme=2, if AB, AC, and ABC have been sent and C is a silence frame,
the receiver can reconstruct the original payloads of A, B, and C
(assuming no packets have been lost) without CD. However, if only AB has
been sent and B is a silence frame, the receiver cannot reconstruct the
original payloads of A or B without AC and ABC. The solution lies in
somehow transmitting A and B without waiting for C to arrive at the
encoder. The following solutions do just that, the first by changing
schemes, the second by inserting null payloads.
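
The first case above (AB, AC, and ABC received, CD not yet sent) can be
checked with a short Python sketch. The helper xor_bytes and the
payload byte values are hypothetical, and the sketch assumes that all
payloads have the same length; it does not model how this document
encodes payloads of differing lengths.

```python
def xor_bytes(*payloads):
    """XOR equal-length byte strings together (assumption: same length)."""
    length = len(payloads[0])
    assert all(len(p) == length for p in payloads), "equal lengths assumed"
    out = bytearray(length)
    for payload in payloads:
        for i, byte in enumerate(payload):
            out[i] ^= byte
    return bytes(out)

# Sender side: three original payloads; C stands in for the silence frame.
A = bytes([0x12, 0x34])
B = bytes([0xAB, 0xCD])
C = bytes([0x00, 0xFF])
AB, AC, ABC = xor_bytes(A, B), xor_bytes(A, C), xor_bytes(A, B, C)

# Receiver side: only AB, AC, and ABC are available.
# (A^B) ^ (A^C) ^ (A^B^C) leaves A, because B and C each cancel out.
A_rx = xor_bytes(AB, AC, ABC)
B_rx = xor_bytes(AB, A_rx)
C_rx = xor_bytes(AC, A_rx)
```

If only AB had arrived, none of these steps would be possible, which is
the situation that the two solutions address.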


7.3.2 Handling Stream Interruptions by Changing Schemes

Perhaps the simplest way of dealing with stream interruption is to
immediately switch to scheme=0--no error correction--for the last few
packets before the interruption. If, assuming no packet loss, previously
transmitted packets do not contain sufficient information for the
receiver to reconstruct original payloads up to and including the last
original payload before the interruption, scheme=0 shall be used to
transmit all of the original payloads that the receiver cannot
reconstruct, in the same order that they were received by the encoder.
When original payloads start arriving at the input of the encoder
again--the interruption is over--the sender shall resume with the first
mode of any scheme, although it will typically be the previous, non-0
scheme. A stream may be interrupted at any mode, even before a group is
completed. When the stream resumes, the sender shall start with mode=0,
not necessarily where it left off.

For example, with scheme=2, if only AB has been sent and B is the last
payload before an interruption, the sender transmits packets containing
A and then B using scheme=0. When the interruption is over, the sender
transmits using scheme=2 again as in the following, where the scheme is
enclosed in brackets:

AB[2], A[0], B[0] . . . interruption . . . A'B'[2], A'C'[2], . . .


7.3.3 Handling Stream Interruptions by Inserting Null Payloads

An alternative to changing schemes is for the sender to insert null
payloads into its encoder for any original media payloads for which it
would have otherwise had to wait. This allows the packet stream to
continue until packets have been sent that contain encodings necessary
for the receiver to reconstruct all packets up to and including the last
original payload before the interruption. The receiver simply has to
ignore null payloads.

For example, with scheme=2, if only AB has been sent and B is the last
payload before an interruption, the sender shall transmit AC and ABC
where a null payload is inserted in place of C:

AB, A0, AB0; . . . interruption . . . 0D, 0E, 0DE; EF, EG, EFG; . . .

This allows the receiver to reconstruct A and B without waiting for a
real C to arrive at the sender's encoder. Note that C shall continue to
be replaced with a null payload in subsequent modes that include
encodings of C.
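
The consistent substitution of a null payload can be sketched in Python
as follows. This is an illustration under stated assumptions: the
grouping used for scheme=2 (three packets XY, XZ, XYZ per group, with
the next group starting at Z) is inferred only from the example strings
above, and the names scheme2_packets, nulled, and NULL are hypothetical.

```python
NULL = "0"  # label used in the examples for a null payload

def scheme2_packets(payloads, nulled=()):
    """Label the scheme=2 packets for an ordered sequence of payloads.

    Every payload slot named in `nulled` is replaced by a null payload
    before any packet is formed, so the substitution appears in every
    mode that would have encoded that payload.
    """
    stream = [NULL if p in nulled else p for p in payloads]
    packets = []
    # Assumed grouping: (X, Y, Z) -> XY, XZ, XYZ; next group starts at Z.
    for i in range(0, len(stream) - 2, 2):
        x, y, z = stream[i], stream[i + 1], stream[i + 2]
        packets += [x + y, x + z, x + y + z]
    return packets
```

Calling scheme2_packets("ABCDEFG", nulled={"C"}) yields the packet
labels of the example above: the null payload appears in A0 and AB0 and
again in 0D, 0E, and 0DE.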


7.3.4 Comparison of Handling Stream Interruptions

Handling stream interruptions with null payloads continues to generate
error-correction packets while the sender is "spinning down," before the
interruption, whereas the alternative, changing to scheme=0, simply
transmits single copies of the last few payloads as it is spinning down.
Scheme=0 does not provide any encodings for secondary reconstructions,
making it less resilient to packet loss. On the other hand, handling
interruptions by changing to scheme=0 requires slightly less bandwidth
and may be less complex.


7.4 Tutorial

For those not familiar with the basic error-correcting ability provided
by XOR, here is a simplified example of how it is done. When z, the
result of A XOR B, is XORed with A, B is recovered as the result; when
XORed with B, A is recovered.

A = 1010
B = 1001
z = A XOR B = 0011
B = z XOR A = 1001
A = z XOR B = 1010
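
The same arithmetic can be reproduced with Python's bitwise XOR
operator (^). This is only a restatement of the values above; the
variable names recovered_A and recovered_B are chosen for illustration.

```python
A = 0b1010
B = 0b1001

z = A ^ B            # the redundant value, A XOR B
recovered_B = z ^ A  # XORing z with A gives back B
recovered_A = z ^ B  # XORing z with B gives back A

# prints: 0011 1001 1010
print(format(z, "04b"), format(recovered_B, "04b"), format(recovered_A, "04b"))
```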


8. Security Considerations

Using this error-correction scheme neither exposes the data in the RTP
header and payload to any additional security risk nor provides any
additional security protection.


9. Conclusions

The data transport must fit the expected parameters of the target
transmission medium, e.g., data transfer rate, data error rate, data
error mode, and round trip delay. In the case of the Internet via phone
modems, we have empirically determined that a moderate-to-good Internet
transmission transfers about 20,000 bits per second with a 10 percent to
20 percent packet loss.

Given these parameters and the need to provide mechanisms that support
multicasting of audio and video, the XOR error-correction, packet-
redundancy scheme described in this document works well between
endpoints on a lossy packet-switched network, providing the user with
higher-quality, intelligible audio and video that are more fluid. There
may be strategies for other transmission media, such as intranets, that
reduce overhead from the levels stated above, but in many cases,
intranets have enough capacity so that the bandwidth requirements of
this scheme are not an issue.

A final note: Although this error-correction scheme is intended to be
implemented within endpoints on the network, it would be more efficient
to implement the core technology solely on network points-of-presence
such as, in the case of the Internet, at the dial-up user's service
provider. To illustrate, assume we have two users, each connected via a
modem to their service provider, engaged in a conference. Their modem
connections are low-bandwidth, low-latency, high-reliability
connections. The Internet connection between the two service providers
is a high-bandwidth, high-latency, low-reliability connection. The users
could send their packets without redundancy, hence using the narrow
bandwidth efficiently, to an active agent residing at the service
provider. That agent could then create the XORed redundant-packet stream
for transmission to a peer agent at the other service provider, which
would then recover the original packets and send them down the modem
pipe to the second user. Of course, the drawback of this scheme is the
need for active agents resident at the Internet service providers.


10. References

[1] "Visual Telephone System and Equipment for Local Area Networks Which
Provide a Non-Guaranteed Quality of Service," ITU-T Draft Recommendation
H.323, 1996.

[2] Postel, J., "Internet Protocol," RFC 791, 1981.

[3] Postel, J., ed., "Transmission Control Protocol - DARPA Internet
Program Protocol Specification," RFC 793, 1981.

[4] Postel, J., "User Datagram Protocol," RFC 768, 1980.

[5] Bolot, J.-C. and Vega-Garcia, A., "The case for FEC-based error
control for packet audio in the Internet," Multimedia Systems, 1997.

[6] "Video Coding for Low Bitrate Communication," ITU-T Recommendation
H.263, 1996.

[7] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson, V., "RTP:
A Transport Protocol for Real-Time Applications," RFC 1889, 1996.

[8] "Control Protocol for Multimedia Communication," ITU-T
Recommendation H.245, 1996.


11. Authors' Address

Dan Budge (dbudge@smithmicro.com, telephone extension 22)
Robert McKenzie (bmckenzie@smithmicro.com)
Willie Mills (wmills@smithmicro.com)
William Diss (bdiss@smithmicro.com)
Paul Long (plong@smithmicro.com, telephone extension 12)

Smith Micro Software, Inc.
15050 SW Koll Parkway
Suite 2B
Beaverton, OR 97006
USA

Phone: +1.503.641.1221
Fax:   +1.503.641.3344

Expires: December 4 1997




