RTP Payload Format for 12-bit DAT Audio and 20- and 24-bit Linear Sampled Audio
RFC 3190

Document Type RFC - Proposed Standard (January 2002)
Authors Carsten Bormann, Stephen Casner, Katsushi Kobayashi, Akimichi Ogawa
Last updated 2013-03-02
Stream Internet Engineering Task Force (IETF)
Network Working Group                                       K. Kobayashi
Request for Comments: 3190             Communication Research Laboratory
Category: Standards Track                                       A. Ogawa
                                                         Keio University
                                                               S. Casner
                                                           Packet Design
                                                              C. Bormann
                                                 Universitaet Bremen TZI
                                                            January 2002

                         RTP Payload Format for
        12-bit DAT Audio and 20- and 24-bit Linear Sampled Audio

   This document specifies a packetization scheme for encapsulating
   12-bit nonlinear, 20-bit linear, and 24-bit linear audio data streams
   using the Real-time Transport Protocol (RTP).  This document also
   specifies the format of a Session Description Protocol (SDP)
   parameter to indicate when audio data is preemphasized before
   sampling.  The parameter may be used with other audio payload
   formats, in particular L16 (16-bit linear).

1. Introduction

   This document describes the sampling of audio data in 12-bit
   nonlinear, 20-bit linear, and 24-bit linear encodings, and specifies
   the encapsulation of the audio data into the Real-time Transport
   Protocol (RTP), version 2 [1,2].  DAT (digital audio tape) and DV
   (digital video) devices [3,4] use these audio encodings in addition
   to 16-bit linear encoding.  The packetization scheme for 16-bit
   linear audio (L16) is already specified [2,5].  This document
   specifies the packetization scheme for the other encodings following
   that for L16; in particular, when used with the RTP profile [2],
   these payload formats follow the encoding-independent rules for
   sample ordering and channel interleaving specified in [2] plus
   extensions specified here.  This document also specifies out-of-band
   negotiation methods for the extended channel interleaving rules and
   for use when an analog preemphasis technique is applied to the audio
   data.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [6].

2. The need for RTP encapsulation of 12-, 20- and 24-bit audio

   Many high-quality digital audio and visual systems, such as DAT and
   DV, adopt sample-based audio encodings.  Different audio formats are
   used in various situations.  To transport the audio data using RTP,
   an encapsulation needs to be defined for each specific format.  Only
   16-bit linear audio encapsulation (L16) has thus far been defined.
   Other encoding formats have already appeared, such as the 12-bit
   nonlinear, 20-bit linear and 24-bit linear encodings used in DAT and
   DV equipment.  This specification defines the RTP payload
   encapsulation format in order to use the new encodings in the RTP
   environment.

3. 12-bit nonlinear audio encapsulation

   IEC 61119 [3] specifies the 12-bit nonlinear audio format in DAT and
   DV, called LP (Long Play) audio.  It would be easy to convert 12-bit
   nonlinear audio into 16-bit linear form at the RTP sender and
   transmit it using the L16 audio format already defined.  However,
   this would consume 33% more network bandwidth than necessary.  This
   payload format is specified as a more efficient alternative.
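
   The 33% figure follows directly from the ratio of sample widths
   (16/12).  As a rough illustration only, the sketch below computes
   the payload bit rate for both encodings.  The stream parameters
   (32 kHz, 2 channels) and the helper name payload_bitrate are
   assumptions chosen for the example, not values taken from this
   document, and RTP/UDP/IP header overhead is ignored.

```python
def payload_bitrate(sample_rate_hz: int, channels: int, bits_per_sample: int) -> int:
    """Audio payload bit rate in bit/s for sample-based encodings."""
    return sample_rate_hz * channels * bits_per_sample


# Assumed example stream (hypothetical): 32 kHz sampling, 2 channels.
rate_12 = payload_bitrate(32000, 2, 12)  # 12-bit nonlinear samples
rate_16 = payload_bitrate(32000, 2, 16)  # same audio expanded to L16

# Extra bandwidth used by L16 relative to the 12-bit format: (16 - 12) / 12.
overhead = (rate_16 - rate_12) / rate_12

print(rate_12, rate_16, f"{overhead:.1%}")
```

   The ratio does not depend on the sampling frequency or channel
   count, since both cancel out; only the sample width matters.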

   The 12-bit nonlinear encoding is the same as for 16-bit linear audio
   except for the packing of each sampled data element.  Each sample of
   12-bit nonlinear audio is derived from a single sample of 16-bit
   linear audio by a nonlinear compression.  Table 1 shows the details
   of the conversion from 16 to 12 bits.  The result is a 12-bit signed
   value in the range -2048 to 2047, represented in two's complement
   notation.  The 12-bit samples are packed contiguously into payload
   octets starting with the most significant bit.  When the
   payload contains an odd number of samples, the four LSBs of the last
   octet are unused.  Parameters other than quantization, e.g., sampling
   frequency and audio channel assignment, are the same as in the L16
   payload format.
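
   The packing rule described above can be sketched as follows.  This
   is a minimal illustration, not a reference implementation: the
   function names pack_12bit and unpack_12bit are hypothetical, the
   input is assumed to be already-compressed 12-bit values in channel-
   interleaved order, and the unused four LSBs of a final odd octet are
   set to zero here as an assumption.  The 16-to-12-bit nonlinear
   conversion itself (Table 1) is not shown.

```python
def pack_12bit(samples):
    """Pack signed 12-bit samples (-2048..2047) into octets, MSB first."""
    out = bytearray()
    acc = 0    # bit accumulator holding not-yet-emitted bits
    nbits = 0  # number of valid bits currently in acc
    for s in samples:
        if not -2048 <= s <= 2047:
            raise ValueError(f"sample {s} does not fit in 12 bits")
        acc = (acc << 12) | (s & 0xFFF)  # two's complement, 12 bits
        nbits += 12
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the leftover bits
    if nbits:
        # Odd number of samples: 4 bits remain; they fill the high
        # nibble of the last octet and the low nibble is unused.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_12bit(payload, count):
    """Recover `count` signed 12-bit samples from packed octets."""
    samples = []
    for i in range(count):
        bit = i * 12
        byte = bit // 8
        if bit % 8 == 0:
            # Sample starts on an octet boundary.
            v = (payload[byte] << 4) | (payload[byte + 1] >> 4)
        else:
            # Sample starts in the low nibble of an octet.
            v = ((payload[byte] & 0x0F) << 8) | payload[byte + 1]
        samples.append(v - 4096 if v & 0x800 else v)  # sign-extend
    return samples


packed = pack_12bit([0x123, -1, 0x7FF])
print(packed.hex())
print(unpack_12bit(packed, 3))
```

   Two samples occupy exactly three octets, so a receiver that knows
   the payload length cannot distinguish 2n-1 samples from 2n samples
   by length alone when the last nibble is zero; in this sketch the
   sample count is therefore passed explicitly to unpack_12bit.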
