INTERNET-DRAFT                                              Ladan Gharai
<draft-ietf-gharai-ac3-01.txt>                                   USC/ISI

                     RTP Payload Format for AC-3 Audio

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at

   The list of Internet-Draft Shadow Directories can be accessed at


This document specifies a packetization scheme for encapsulating AC-3
audio streams into a payload format for the Real-Time Transport Protocol

1.  Introduction

AC-3, also known as Dolby Digital or Dolby AC-3, is a flexible audio
data compression technology. It has been in use in feature films since
1992 and has also been selected as the audio format of HDTV. The AC-3
digital compression algorithm can encode 1 to 5.1 audio channels in PCM
representation into a single serial bit stream.  Encoding multiple
channels as a single entity is more efficient than individually encoding
each channel, resulting in an overall lower bit rate.

draft-ietf-gharai-ac3-00.txt                                    [Page 1]

INTERNET-DRAFT                                             July 13, 2000

The syntax for AC-3 is fully described in [1] by the Advanced
Television Systems Committee (ATSC). The audio compression system used
by HDTV is a restricted subset of this specification, where the
restrictions are specified in Annex B of the Digital Television
Standard [2].

2.  AC-3 Digital Audio

An AC-3 audio stream is constructed as a sequence of synchronization
frames, also called sync frames. Each frame is completely self-contained
and is made up of:

 o a synchronization information (SI) header, which includes:
   - a sync word, used for acquiring and maintaining synchronization
   - an indication of the sampling rate, 48kHz, 44.1kHz or 32kHz
   - and the size of the sync frame
 o a bit stream information (BSI) header, which includes the sync
   frame's timestamp,
 o 6 audio blocks (AB), each representing 256 new audio samples,
 o an auxiliary data field (Aux),
 o and finally, an error check field (CRC).

   | SI | BSI |  AB0 | AB1 | AB2 | AB3 | AB4 | AB5 | Aux | CRC |
     Figure 1. An AC-3 synchronization frame (not to scale).
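
As an illustration of how a receiver might read the SI header, the
following Python sketch extracts the sampling rate code and frame size
code from the start of a sync frame. The field layout used here (a
16-bit sync word 0x0B77, a 16-bit CRC, then a 2-bit sampling rate code
and a 6-bit frame size code) is taken from A/52 [1], not from this
document, and should be checked against that standard. The function
name is hypothetical.

```python
AC3_SYNC_WORD = b"\x0b\x77"   # sync word value, per A/52 [1]

# Sampling rate code to sampling rate in Hz; code 3 is reserved.
SAMPLE_RATES = {0: 48000, 1: 44100, 2: 32000}

def parse_sync_info(frame):
    """Return (sample_rate_hz, frame_size_code) from the SI header.

    Assumed layout: bytes 0-1 sync word, bytes 2-3 CRC, byte 4 holds
    the 2-bit sampling rate code followed by the 6-bit frame size code.
    """
    if len(frame) < 5:
        raise ValueError("too short to hold an SI header")
    if frame[0:2] != AC3_SYNC_WORD:
        raise ValueError("sync word not found")
    rate_code = frame[4] >> 6
    size_code = frame[4] & 0x3F
    if rate_code not in SAMPLE_RATES:
        raise ValueError("reserved sampling rate code")
    return SAMPLE_RATES[rate_code], size_code
```

The frame size code is returned as-is; mapping it to a size in bytes
requires Table 5.13 in [1], which is not reproduced here.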

All sync frames within a sequence are the same size. Frame sizes range
from 128bytes to 3840bytes. Table 5.13 in [1] lists all possible frame
sizes per bit rate and sampling frequency. At 48kHz each sync frame
represents 32ms of audio data (each audio block is 5.33ms).
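
The timing figures above follow from the frame structure of six audio
blocks of 256 samples each. A minimal sketch of the arithmetic (the
function names are illustrative only):

```python
SAMPLES_PER_BLOCK = 256   # new audio samples per audio block
BLOCKS_PER_FRAME = 6      # audio blocks per sync frame

def block_duration_ms(sample_rate_hz):
    """Duration of one audio block in milliseconds."""
    return 1000.0 * SAMPLES_PER_BLOCK / sample_rate_hz

def frame_duration_ms(sample_rate_hz):
    """Duration of one sync frame (1536 samples) in milliseconds."""
    samples = SAMPLES_PER_BLOCK * BLOCKS_PER_FRAME
    return 1000.0 * samples / sample_rate_hz
```

At 48kHz this gives 32ms per sync frame and about 5.33ms per audio
block; at 32kHz a sync frame lasts 48ms.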

Each sync frame is a complete, independent data unit; it does not
require any other data to be decoded.  A complete sync frame MUST be
presented to the decoder for decompression. An incomplete sync frame
will not pass the decoder's error detection test, causing the decoder to
mute. At 48kHz this can cause a maximum of 64ms of muted audio (if the
decoder is unable to synchronize with the immediately following sync
word).

3.  RTP Packetization

When feasible, an RTP packet will contain an integral number of sync
frames.  However, depending on the path-MTU, a sync frame may require
multiple RTP packets, in which case the sync frame will be fragmented
across multiple RTP packets. Multiple RTP packets transferring a
fragmented sync frame must have the same timestamp, which reflects the
sampling instant of the sync frame. Fragmented sync frames are
reassembled via the RTP timestamp and sequence number.
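
A minimal sketch of sender-side fragmentation, assuming a hypothetical
max_payload value (the path MTU less the IP, UDP and RTP header sizes)
and representing each packet as a plain dictionary. Every fragment
carries the same timestamp and consecutive sequence numbers; the marker
value follows the Marker bit rule given later in this section.

```python
def fragment_sync_frame(frame, timestamp, first_seq, max_payload):
    """Split one sync frame into RTP packet descriptions.

    max_payload is an assumed per-packet payload budget in bytes.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    packets = []
    for i, offset in enumerate(range(0, len(frame), max_payload)):
        is_last = offset + max_payload >= len(frame)
        packets.append({
            "seq": (first_seq + i) & 0xFFFF,       # 16-bit, wraps around
            "timestamp": timestamp & 0xFFFFFFFF,   # same for all fragments
            "marker": 1 if is_last else 0,         # set on last fragment
            "payload": frame[offset:offset + max_payload],
        })
    return packets
```

For example, a 3840byte sync frame with a 1400byte payload budget
yields three packets of 1400, 1400 and 1040 bytes.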

An RTP packet should not carry fragments of different sync frames, or a
fragment of one sync frame and another complete sync frame. Once
received, fragmented sync frames MUST be reassembled before being
presented to the decoder.
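
One way a receiver might reassemble a fragmented sync frame is
sketched below. It assumes the packets have already been grouped by RTP
timestamp, relies on the Marker bit (described below) to find the last
fragment, and checks for the AC-3 sync word 0x0B77 (a value taken from
A/52 [1]) to detect a missing first fragment. The function name and the
tuple layout are illustrative, not part of this specification.

```python
AC3_SYNC_WORD = b"\x0b\x77"   # sync word value, per A/52 [1]

def reassemble(packets):
    """Rebuild one sync frame from its fragments.

    packets: list of (seq, marker, payload) tuples sharing one RTP
    timestamp, in any order. Returns the frame bytes, or None if a
    fragment is missing or duplicated.
    """
    last = [p for p in packets if p[1] == 1]
    if len(last) != 1:
        return None                    # final fragment lost
    last_seq = last[0][0]

    def distance(p):
        # Distance behind the final fragment, modulo 2^16, so that
        # sequence number wraparound is handled.
        return (last_seq - p[0]) & 0xFFFF

    ordered = sorted(packets, key=distance, reverse=True)
    count = len(ordered)
    for i, p in enumerate(ordered):
        if distance(p) != count - 1 - i:
            return None                # gap or duplicate in sequence
    frame = b"".join(p[2] for p in ordered)
    if not frame.startswith(AC3_SYNC_WORD):
        return None                    # first fragment lost
    return frame
```

A frame for which this returns None is not passed to the decoder, in
keeping with the requirement that only complete sync frames be
presented.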

The fields of the RTP fixed header are used as follows:

Marker bit (M): The Marker bit of the RTP header is set to 1 for the
last packet of a sync frame and set to 0 on all other packets.

Payload Type (PT): The Payload Type indicates the use of the payload
format defined in this document.  A profile may assign a payload type
value for this format either statically or dynamically as described in
RFC 1890 [4].

Timestamp:  A 32-bit timestamp with a clock rate of 48kHz, 44.1kHz or
32kHz (matching the sampling frequency of the audio), which encodes the
sampling instant of the first sync frame in the RTP packet. All packets
transferring a fragmented sync frame MUST have the same timestamp.
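
The following sketch shows how these fields might be packed into the
12-byte RTP fixed header of RFC 1889 [3], assuming no padding, no
header extension and no CSRC entries. Because the timestamp clock runs
at the audio sampling frequency, each sync frame advances the timestamp
by 6 x 256 = 1536 ticks regardless of the sampling rate. The function
names are illustrative.

```python
import struct

TICKS_PER_SYNC_FRAME = 6 * 256   # samples represented by one sync frame

def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker):
    """Pack an RTP fixed header (version 2, no padding/extension/CSRC)."""
    byte0 = 2 << 6                                   # V=2, P=0, X=0, CC=0
    byte1 = ((marker & 1) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def next_timestamp(timestamp, sync_frames_sent):
    """Timestamp of the sync frame that follows sync_frames_sent frames."""
    return (timestamp + TICKS_PER_SYNC_FRAME * sync_frames_sent) & 0xFFFFFFFF
```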

4.  SDP Payload Format Description

With a dynamic payload type (say 96) and using the encoding name AC-3,
the rtpmap for an AC-3 audio stream sampled at 48kHz is as follows:

a=rtpmap:96 AC-3/48000
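
As a hedged illustration, the rtpmap line and an accompanying media
line could be generated as follows. The port number is supplied by the
caller and is purely an example value; the check on the payload type
reflects the dynamic range (96 to 127) of RFC 1890 [4]. The function
name is hypothetical.

```python
AC3_SAMPLE_RATES = (32000, 44100, 48000)

def ac3_sdp_lines(port, payload_type, sample_rate_hz):
    """Return the SDP media and rtpmap lines for an AC-3 stream."""
    if sample_rate_hz not in AC3_SAMPLE_RATES:
        raise ValueError("AC-3 is sampled at 32kHz, 44.1kHz or 48kHz")
    if not 96 <= payload_type <= 127:
        raise ValueError("expected a dynamic payload type")
    return [
        "m=audio %d RTP/AVP %d" % (port, payload_type),
        "a=rtpmap:%d AC-3/%d" % (payload_type, sample_rate_hz),
    ]
```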

5.  Data Resiliency

With a transfer rate of 32kbps (the lowest transfer rate suggested in
Table 5.13 of the ATSC standard [1]), the sizes of sync frames for audio
sampled at 32kHz, 44.1kHz and 48kHz are 192bytes, 138bytes and 128bytes,
respectively.
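
These sizes can be reproduced from the bit rate and the frame duration.
The sketch below assumes that frame sizes are whole 16-bit words and
rounds down accordingly; the authoritative values remain those of
Table 5.13 in [1]. The function name is illustrative.

```python
SAMPLES_PER_FRAME = 6 * 256   # samples represented by one sync frame

def sync_frame_bytes(bit_rate_bps, sample_rate_hz):
    """Approximate sync frame size in bytes, rounded down to 16-bit words."""
    bits = bit_rate_bps * SAMPLES_PER_FRAME // sample_rate_hz
    return (bits // 16) * 2

for rate in (32000, 44100, 48000):
    print(rate, sync_frame_bytes(32000, rate))
```

At 44.1kHz the nominal size is not a whole number of words (about 139.3
bytes), which is why rounding is needed in that case.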

Given the "all or nothing" nature of an AC-3 sync frame, fragmented
sync frames are highly susceptible to network loss, i.e., the loss of
one RTP packet carrying part of a sync frame renders the other packets
of that frame useless.
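
To give a rough sense of the effect, the sketch below estimates the
chance of losing a whole sync frame when it is split across several
packets. It assumes that packet losses are independent with a fixed
probability, which is a simplification and not a claim about any
particular network.

```python
def frame_loss_probability(packet_loss, fragments):
    """Probability that at least one of the fragments is lost,
    assuming independent losses with probability packet_loss."""
    return 1.0 - (1.0 - packet_loss) ** fragments
```

Under this model, a 1% packet loss rate becomes roughly a 3% frame loss
rate for a sync frame carried in three packets.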

Augmenting the RTP stream with AC-3 sync frames compressed at 32kbps
increases the resiliency of the data stream, particularly for large
fragmented sync frames. The two audio streams can be interleaved into an
RTP stream. The application will first attempt to de-packetize and (if
necessary) reassemble the higher quality AC-3 sync frames. However for
missing or incomplete sync frames the lower quality sync frames shall be
presented to the decoder.
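
The receiver-side selection can be sketched as follows. This document
does not define how the two streams are interleaved or told apart, so
the sketch simply assumes that the application has already collected
the complete sync frames of each stream into dictionaries keyed by RTP
timestamp. All names are hypothetical.

```python
def choose_frames(timestamps, high_quality, low_quality):
    """Pick the sync frame to decode for each timestamp.

    high_quality and low_quality map an RTP timestamp to a complete
    sync frame; a missing key means the frame was lost or incomplete.
    Returns (timestamp, frame) pairs; frame is None when neither
    stream delivered a complete sync frame.
    """
    chosen = []
    for ts in timestamps:
        frame = high_quality.get(ts)
        if frame is None:
            frame = low_quality.get(ts)   # fall back to the 32kbps frame
        chosen.append((ts, frame))
    return chosen
```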

6.  Security Considerations

RTP packets using the payload format defined in this specification are
subject to the security considerations discussed in the RTP
specification [3], and any appropriate RTP profile.  This implies that
confidentiality of the media streams is achieved by encryption.  Because
the data compression used with this payload format is applied end-to-
end, encryption may be performed after compression so there is no
conflict between the two operations.

7.  IANA Considerations


8.  To Do

The AC-3 stream is likely to be well-served by a repair vector similar
to that proposed for AAC audio.

9.  Full Copyright Statement

Copyright (C) The Internet Society (1999). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it or
assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are included
on all such copies and derivative works.

However, this document itself may not be modified in any way, such as by
removing the copyright notice or references to the Internet Society or
other Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be followed,
or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS

10.  Author's Address

Ladan Gharai

11.  Bibliography

[1] ATSC Digital Audio Compression Standard (AC-3) Document A/52,
    Sep. 1995,

[2] ATSC Digital Television Standard Document A/53, September 1995,

[3] Schulzrinne, Casner, Frederick, Jacobson, "RTP: A Transport
    Protocol for Real-Time Applications", RFC 1889, IETF, January 1996.

[4] Schulzrinne, "RTP Profile for Audio and Video Conferences with
    Minimal Control", RFC 1890, IETF, January 1996.
