Audio/Video Transport                                         M. Romaine
Internet-Draft                                               M. Hatanaka
Expires: April 20, 2005                                     J. Matsumoto
                                                                    SONY
                                                        October 20, 2004



                  RTP Payload Format for ATRAC Family
                   draft-ietf-avt-rtp-atrac-family-01


Status of this Memo


   This document is an Internet-Draft and is subject to all provisions
   of section 3 of RFC 3667.  By submitting this Internet-Draft, each
   author represents that any applicable patent or other IPR claims of
   which he or she is aware have been or will be disclosed, and any of
   which he or she become aware will be disclosed, in accordance with
   RFC 3668.


   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.


   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."


   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.


   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


   This Internet-Draft will expire on April 20, 2005.


Copyright Notice


   Copyright (C) The Internet Society (2004).


Abstract


   This document describes an RTP payload format for efficient and
   flexible transporting of audio data encoded with the Adaptive
   TRansform Audio Codec (ATRAC) family of codecs.  Recent enhancements
   to the ATRAC family of codecs support high quality audio coding with
   multiple channels.  The RTP payload format as presented in this
   document includes support for data fragmentation and elementary




Romaine, et al.          Expires April 20, 2005                 [Page 1]


Internet-Draft    RTP Payload Format for ATRAC Family       October 2004



   redundancy measures.


Table of Contents


   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1   ATRAC Details  . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Conventions  . . . . . . . . . . . . . . . . . . . . . . . . .  4
   3.  Payload Format . . . . . . . . . . . . . . . . . . . . . . . .  5
     3.1   RTP Header . . . . . . . . . . . . . . . . . . . . . . . .  5
     3.2   Payload Header . . . . . . . . . . . . . . . . . . . . . .  5
     3.3   Payload Data . . . . . . . . . . . . . . . . . . . . . . .  6
   4.  Frame Fragmentation and Packetization  . . . . . . . . . . . .  8
     4.1   Example Multi-frame Packet . . . . . . . . . . . . . . . .  8
     4.2   Example Fragmented ATRAC Frame . . . . . . . . . . . . . .  9
   5.  Payload Format Parameters  . . . . . . . . . . . . . . . . . . 11
     5.1   ATRAC3 MIME Registration . . . . . . . . . . . . . . . . . 11
     5.2   ATRAC-X MIME Registration  . . . . . . . . . . . . . . . . 12
     5.3   Channel Mapping Configuration Table  . . . . . . . . . . . 14
     5.4   Mapping MIME Parameters into SDP . . . . . . . . . . . . . 14
       5.4.1   For MIME subtype ATRAC3  . . . . . . . . . . . . . . . 15
       5.4.2   For MIME subtype ATRAC-X . . . . . . . . . . . . . . . 15
     5.5   Offer-Answer Model Considerations  . . . . . . . . . . . . 15
       5.5.1   For MIME subtype ATRAC3  . . . . . . . . . . . . . . . 15
       5.5.2   For MIME subtype ATRAC-X . . . . . . . . . . . . . . . 15
     5.6   Example SDP Session Descriptions . . . . . . . . . . . . . 16
     5.7   Example Offer-Answer Exchange  . . . . . . . . . . . . . . 16
   6.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 18
   7.  Security Considerations  . . . . . . . . . . . . . . . . . . . 19
     7.1   Confidentiality  . . . . . . . . . . . . . . . . . . . . . 19
     7.2   Authentication . . . . . . . . . . . . . . . . . . . . . . 19
     7.3   Decoding Validation  . . . . . . . . . . . . . . . . . . . 19
   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 20
     8.1   Normative References . . . . . . . . . . . . . . . . . . . 20
     8.2   Informative References . . . . . . . . . . . . . . . . . . 20
       Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 20
       Intellectual Property and Copyright Statements . . . . . . . . 22


1.  Introduction


   The ATRAC family of perceptual audio codecs is designed to address
   numerous needs for high-quality, low bit-rate audio transfer.  ATRAC
   technology can be found in many consumer and professional products
   and applications, including MD players, CD players, voice recorders,
   and mobile phones.  The need for real-time streaming of audio data
   has grown, and this document details our efforts to expand the
   product and application space for the ATRAC family of codecs.


   Recent advances in ATRAC technology allow for multiple channels of
   audio to be encoded in customizable groupings.  This should allow for
   future expansions in scaled streaming.  However, to provide the
   greatest flexibility in streaming any of the ATRAC family codecs,
   this payload format does not distinguish between the codecs at the
   packet level.


   This simplified payload format contains only the basic information
   needed to disassemble a packet of ATRAC audio in order to decode it.
   Timestamps are in sample units, with audio data currently encoded
   into frames of 1024 or 2048 samples depending on the ATRAC version.
   There is also basic support for fragmentation and redundancy, as
   ATRAC frames may exceed an MTU size of 1500 octets.


   Although streaming of multi-channel audio is supported depending on
   the ATRAC version used, all encoded audio for a given time period is
   contained within a single frame.  Therefore, there is no interleaving
   nor splitting of audio data on a per-channel basis to be concerned
   with.


1.1  ATRAC Details


   Early versions of the ATRAC codec handled only two channels of audio
   at 44.1kHz sampling frequency, with typical bit-rates between 66kbps
   and 132kbps.  The latest version allows for up to 8 channels of audio
   at 96kHz sampling frequency.  The feasible bit-rate range has also
   expanded, allowing from 8kbps to 1400kbps.


   Depending on the version of ATRAC used, the sample-frame size is
   either 1024 or 2048 samples.  Actual bit-rates are determined by
   specifying a fixed encoded frame-size.  In other words, instead of
   requesting a stereo 44.1kHz stream at, say, 64kbps, one would tell
   the encoder to create encoded frames of 364 bytes.
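

   The relationship between a fixed encoded frame size and the resulting
   bit-rate can be illustrated with a short Python sketch.  The function
   names are hypothetical, and the sketch assumes the bit-rate is simply
   the encoded frame size multiplied by the number of frames per second.
   With 1024 samples per frame at 44.1kHz, this reproduces the ATRAC3
   frameLength and bit-rate pairs listed in Section 5.1.

```python
def atrac_bitrate_bps(frame_bytes: int, sample_rate_hz: int,
                      samples_per_frame: int) -> float:
    """Bit rate (bits per second) implied by a fixed encoded frame size."""
    frames_per_second = sample_rate_hz / samples_per_frame
    return frame_bytes * 8 * frames_per_second


def frame_bytes_for_bitrate(target_bps: int, sample_rate_hz: int,
                            samples_per_frame: int) -> int:
    """Largest whole frame size (in bytes) that does not exceed target_bps."""
    return (target_bps * samples_per_frame) // (8 * sample_rate_hz)
```

   For example, atrac_bitrate_bps(192, 44100, 1024) evaluates to
   66150.0, which matches the 66kbps figure given for a 192-byte ATRAC3
   frame.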









2.  Conventions


   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [4].


3.  Payload Format


3.1  RTP Header


      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |V=2|P|X|  CC   |M|     PT      |       sequence number         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                          timestamp                            |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |            synchronization source (SSRC) identifier           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |           contributing source (CSRC) identifiers              |
     |                             .....                             |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   Marker (M): 1 bit
   Set to zero as silence suppression is currently not used.


   Payload Type (PT): 7 bits
   The payload type can either be dynamically allocated at the
   application level, or an RTP profile for a class of applications is
   expected to assign the payload type for this format.  A dynamic
   allocation SHOULD designate this format as ATRAC-Family.


   Timestamp: 32 bits
   A timestamp representing the sampling time of the first sample of the
   first ATRAC frame in the RTP packet.  The clock frequency MUST be set
   to the sample rate of the encoded audio data, and is conveyed
   out-of-band.  For ATRAC3 the RTP timestamp rate SHALL be 44100Hz.
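

   As a minimal sketch of how a receiver might read the fields shown
   above, the following Python code parses the RTP fixed header defined
   in RFC 3550 and locates the first payload octet.  The names
   RtpFixedHeader and parse_rtp_header are hypothetical, and handling of
   the padding (P) bit is omitted for brevity.

```python
import struct
from typing import NamedTuple


class RtpFixedHeader(NamedTuple):
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    payload_offset: int  # index of the first payload octet


def parse_rtp_header(packet: bytes) -> RtpFixedHeader:
    """Read the RTP fixed header (RFC 3550) and locate the payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-octet RTP fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("unsupported RTP version")
    csrc_count = first & 0x0F
    offset = 12 + 4 * csrc_count  # skip the CSRC list
    if first & 0x10:  # X bit: a header extension follows the CSRC list
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words
    if len(packet) < offset:
        raise ValueError("truncated RTP header")
    return RtpFixedHeader(
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
        payload_offset=offset,
    )
```

   The ATRAC payload header described in Section 3.2 begins at
   payload_offset.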


3.2  Payload Header


   The ATRAC family payload header is only two octets long, which keeps
   processing simple.


    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |C|FrgNo| Rsrvd |NFrames| FrOff |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   Continuous flag (C): 1 bit
   Set to 1 if this packet carries part of a fragmented frame and more
   fragments follow.  The last packet in a series of fragments has this
   bit set to 0.


   Fragment Number (FrgNo): 3 bits
   In the event of data fragmentation, this value is 1 for the first
   packet, and increases sequentially for the remaining fragmented data
   packets.


   Number of Frames (NFrames): 4 bits
   The number of frames in this packet, minus one.  This allows for a
   maximum of 16 ATRAC-encoded audio frames per packet, with 0
   indicating one frame.  Keep in mind that only the first frame is
   allowed to be fragmented.  Additionally, this field MUST be 0 in
   subsequent packets containing the fragmented frame.


   Frame Offset (FrOff): 4 bits
   The purpose of frame offsets is to provide a basic mechanism for the
   transmission of redundant data.  Redundant frames are sent
   sequentially before any new frames in the same packet.  The timestamp
   reflects the playback time of the first frame in a packet, even if
   the first frame is a redundant frame.  Frame lengths (either 1024 or
   2048 samples) are determined during SDP negotiations and are fixed
   for a given session.  A "maxRedundantFrames" parameter is also sent
   during SDP negotiations; this allows for the necessary buffer size to
   be calculated in advance.
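

   The two-octet payload header can be packed and unpacked as in the
   following Python sketch.  It assumes that bit 0 in the diagram is the
   most significant bit of the first octet (the usual convention for
   such diagrams) and that the reserved bits are sent as zero, which
   this document does not state explicitly.  The names used are
   hypothetical.

```python
import struct
from typing import NamedTuple


class AtracPayloadHeader(NamedTuple):
    continuous: bool      # C bit
    fragment_number: int  # FrgNo, 0..7
    frame_count: int      # actual number of frames (NFrames + 1), 1..16
    frame_offset: int     # FrOff, 0..15


def pack_payload_header(header: AtracPayloadHeader) -> bytes:
    """Encode the header as two octets in network byte order."""
    if not 0 <= header.fragment_number <= 7:
        raise ValueError("FrgNo must fit in 3 bits")
    if not 1 <= header.frame_count <= 16:
        raise ValueError("a packet carries between 1 and 16 frames")
    if not 0 <= header.frame_offset <= 15:
        raise ValueError("FrOff must fit in 4 bits")
    word = (
        (int(header.continuous) << 15)
        | (header.fragment_number << 12)
        # bits 11..8 are reserved and left as zero (assumption)
        | ((header.frame_count - 1) << 4)
        | header.frame_offset
    )
    return struct.pack("!H", word)


def unpack_payload_header(data: bytes) -> AtracPayloadHeader:
    """Decode the first two octets of an ATRAC family payload."""
    if len(data) < 2:
        raise ValueError("payload shorter than the 2-octet payload header")
    (word,) = struct.unpack("!H", data[:2])
    return AtracPayloadHeader(
        continuous=bool(word >> 15),
        fragment_number=(word >> 12) & 0x7,
        frame_count=((word >> 4) & 0xF) + 1,
        frame_offset=word & 0xF,
    )
```

   Exposing the actual frame count rather than the raw NFrames value
   keeps the "0 indicates one frame" rule in one place.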


   As an example of using Frame Offsets, refer to Figure 1, which
   considers a situation where FrOff is 2.  If a packet has 4 frames of
   audio, with each frame representing 1024 samples of audio, then we
   can calculate that playback begins with 2 frames (2048 samples) of
   redundant data, and can allocate buffer space as necessary.  (The
   only other necessary variable is the sampling frequency, which should
   have been established during out-of-band negotiations.)  This field
   SHOULD NOT be used in packets containing fragmented data.


    |-Fr1-|-Fr2-|-Fr3-|-Fr4-|                         Nth Packet,   TS=1
                |-Fr3-|-Fr4-|-Fr5-|-Fr6-|             N+1th Packet, TS=3
                            |-Fr5-|-Fr6-|-Fr7-|-Fr8-| N+2th Packet, TS=5

                 Figure 1: Redundant frames with FrOff = 2
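

   The arithmetic in the example above can be sketched in Python as
   follows.  The function name is hypothetical; the sketch assumes that
   the RTP timestamp counts samples and wraps at 32 bits, and that all
   frames in a session have the same length in samples.

```python
from typing import Tuple


def redundant_span(frame_count: int, frame_offset: int,
                   samples_per_frame: int,
                   first_timestamp: int) -> Tuple[int, int]:
    """Return (redundant_samples, timestamp_of_first_new_frame)."""
    if not 0 <= frame_offset <= frame_count:
        raise ValueError("FrOff cannot exceed the number of frames present")
    redundant_samples = frame_offset * samples_per_frame
    # RTP timestamps are 32-bit values, so the sum wraps modulo 2**32.
    first_new = (first_timestamp + redundant_samples) % 2**32
    return redundant_samples, first_new
```

   With 4 frames of 1024 samples and FrOff set to 2, the first 2048
   samples of the packet are redundant.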



3.3  Payload Data


   ATRAC payload data consists of 2 octets, which represent the
   byte-length of encoded audio data, followed by the actual audio data.


      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |         Block Length          |  ATRAC data...                |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   Block length: 16 bits
   The byte length of encoded audio data for the following frame.  This
   is so that in the case of fragmentation, if only a subsequent packet
   is received, decoding can still occur.  16 bits allows for a maximum
   block length of 65535 bytes.  In the event a data block is larger
   than 65535 bytes but would still fit within MTU limits, fragmentation
   MUST occur.  If there are multiple frames in a packet, a block-length
   field exists before each frame data.


4.  Frame Fragmentation and Packetization


   Each RTP packet SHALL contain either an integer number of ATRAC
   encoded audio frames (with a maximum of 16), or one ATRAC frame
   fragment.  In the former case, as many complete ATRAC frames as can
   fit in a single path-MTU SHOULD be placed in an RTP packet.  However,
   if even a single ATRAC frame will not fit into a complete RTP packet,
   the ATRAC frame SHOULD be fragmented.


   The start of a fragmented frame gets placed in its own RTP packet,
   its Continuous bit (C) set to one, and its Fragment Number (FrgNo)
   set to one.  As the frame must be the only one in the packet, the
   Number of Frames field is zero.  Subsequent packets are to contain
   the remaining fragmented frame data, with the Fragment Number
   increasing sequentially and the Continuous bit (C) consistently set
   to one.  As subsequent packets do not contain any new frames, the
   Number of Frames field SHOULD be ignored.  The last packet of
   fragmented data MUST have the Continuous bit (C) set to zero.
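

   The sender-side rules above can be sketched in Python as follows.
   The names are hypothetical, and the Block Length field is left out
   because its exact value in each fragment is not modeled here.  The
   limit of 7 fragments is an inference from the 3-bit Fragment Number
   starting at 1, not a figure stated in this document.

```python
from typing import List, NamedTuple

MAX_FRAGMENTS = 7  # inferred: FrgNo is 3 bits wide and starts at 1


class Fragment(NamedTuple):
    continuous: bool      # C bit: True on every fragment except the last
    fragment_number: int  # FrgNo: 1 for the first fragment
    data: bytes


def fragment_frame(frame: bytes, max_chunk: int) -> List[Fragment]:
    """Split one encoded frame into sequentially numbered fragments."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    chunks = [frame[i:i + max_chunk] for i in range(0, len(frame), max_chunk)]
    if len(chunks) < 2:
        raise ValueError("frame fits in one packet; no fragmentation needed")
    if len(chunks) > MAX_FRAGMENTS:
        raise ValueError("frame needs more fragments than FrgNo can number")
    last = len(chunks)
    return [Fragment(number != last, number, chunk)
            for number, chunk in enumerate(chunks, start=1)]
```

   Each resulting fragment would be sent in its own RTP packet with the
   Number of Frames field set to zero.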


   In addition to the Continuous bit and Fragment Number fields
   indicating fragmentation and a means to reorder the packets, the
   timestamp can be used to determine which packets go together.  Thus,
   packets containing related fragmented frames SHOULD have identical
   timestamps.
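

   A receiver could use these three fields together as in the following
   Python sketch, which groups fragments by timestamp and orders them by
   Fragment Number.  The tuple layout and function name are
   hypothetical, and the completeness check (fragments 1 through N
   present, with only the last having C set to zero) is one possible
   policy rather than a requirement of this document.

```python
from typing import Dict, Iterable, Optional, Tuple

# (timestamp, continuous, fragment_number, data)
FragmentPacket = Tuple[int, bool, int, bytes]


def reassemble(packets: Iterable[FragmentPacket],
               timestamp: int) -> Optional[bytes]:
    """Rebuild the frame whose fragments share `timestamp`.

    Returns None if the fragments received so far are incomplete.
    """
    by_number: Dict[int, Tuple[bool, bytes]] = {}
    for ts, continuous, number, data in packets:
        if ts == timestamp and number >= 1:
            by_number[number] = (continuous, data)
    if not by_number:
        return None
    last = max(by_number)
    for number in range(1, last + 1):
        if number not in by_number:
            return None  # a middle fragment is missing
        continuous, _ = by_number[number]
        if continuous != (number < last):
            return None  # final fragment not yet seen, or C bit inconsistent
    return b"".join(by_number[n][1] for n in range(1, last + 1))
```

   Because the lookup is keyed on Fragment Number, packets that arrive
   out of order are still joined correctly.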


   In the event of fragmentation, the basic redundancy measures SHOULD
   NOT be used.  This means the Frame Offset field SHOULD be ignored.


   Two packetization examples are presented below.  The first shows
   multiple frames in one packet; the second shows a fragmented ATRAC
   frame.


4.1  Example Multi-frame Packet


   Multiple encoded audio frames are combined into one packet.  For
   brevity, the RTP packet header details have been omitted.


      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |0|  0  | Rsrvd |   4   |   2   |          Block Length         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                   (Redundant)  Frame 1 data...                |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |         Block Length          | (Redundant)  Frame 2 data...  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |         Block Length          |      Frame 3 data...          |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |         Block Length          |      Frame 4 data...          |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |         Block Length          |      Frame 5 data...          |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
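

   A receiver could walk the layout above as in the following Python
   sketch.  It assumes that each Block Length is a 16-bit unsigned
   integer in network byte order and that frames follow one another
   without padding; neither detail is stated explicitly in this
   document.  The function name is hypothetical.

```python
import struct
from typing import List


def split_frames(payload: bytes, frame_count: int) -> List[bytes]:
    """Split the octets that follow the 2-octet payload header into frames.

    `frame_count` is the actual number of frames (NFrames + 1).
    """
    frames: List[bytes] = []
    pos = 0
    for _ in range(frame_count):
        if pos + 2 > len(payload):
            raise ValueError("truncated Block Length field")
        (length,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if pos + length > len(payload):
            raise ValueError("truncated frame data")
        frames.append(payload[pos:pos + length])
        pos += length
    return frames
```

   The first FrOff entries of the returned list are the redundant
   frames; the remainder are new.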




4.2  Example Fragmented ATRAC Frame


   The encoded audio data frame is split over three RTP packets.  For
   brevity, the RTP packet header details have been omitted.



     Packet 1:
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |1|  1  | Rsrvd |   0   |   0   |          Block Length         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                          ATRAC data...                        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


     Packet 2:
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |1|  2  | Rsrvd |   0   |   0   |          Block Length         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                     ...more ATRAC data...                     |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


     Packet 3:
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |0|  3  | Rsrvd |   0   |   0   |          Block Length         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                ...the last of the ATRAC data                  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   The following points highlight important characteristics of the
   example above:
   o  the transition from one to zero of the Continuous bit (C)
   o  a sequential increase in the Fragment Number


5.  Payload Format Parameters


   Certain parameters will need to be defined before ATRAC family
   encoded content can be streamed.  Other optional parameters may also
   be defined to take advantage of specific features relevant to certain
   ATRAC versions.  Parameters for ATRAC3 and ATRAC-X are defined here
   as part of the MIME subtype registration process.  A mapping of these
   parameters into the Session Description Protocol (SDP) (RFC 2327) [2]
   is also provided for applications that utilize SDP.


   The data format and parameters are specified for real-time transport
   in RTP.


5.1  ATRAC3 MIME Registration


   The MIME subtype for the Adaptive TRansform Audio Codec version 3
   (ATRAC3) is allocated from the Vendor tree since this codec is
   intended to be used with commercial products, and use of any ATRAC
   family codec requires a license from Sony Corporation, the vendor.


   Note, any unspecified parameter MUST be ignored by the receiver.


   Media Type name:  audio


   Media subtype name:  vnd.sony.atrac3


   Required parameters:


   frameLength:  Indicates the size in bytes of an encoded audio frame.
   In essence, this value determines the bit-rate of the encoded audio.
   Permissible values are 192 (66kbps), 304 (105kbps), and 384
   (132kbps).


   Optional parameters:


   maxRedundantFrames:  The maximum number of redundant frames that may
   be sent during a session in any given packet under the redundant
   framing mechanism detailed in the draft.  Allowed values are integers
   in the range of 0 to 15, inclusive.  If this parameter is not used, a
   default of 15 SHOULD be assumed.


   maxptime: The maximum amount of media which can be encapsulated in a
   payload packet, expressed as time in milliseconds.  The time is
   calculated as the sum of the time the media present in the packet
   represents.  The time SHOULD be a multiple of the frame size.  If
   this parameter is not present, the sender MAY encapsulate a maximum
   of 16 encoded frames into one RTP packet.


   ptime:    see RFC 2327 [2]


   Encoding considerations: This type is defined for transfer via RTP
   (RFC 3550) [1].


   Security considerations: Audio data is believed to offer no security
   risks.


   Public specifications: Please refer to section 8 of this draft.


   Macintosh file type code: none
   Object identifier or OID: none


   Person & email address to contact for further information:
   Mitsuyuki Hatanaka
   hatanaka@av.crl.sony.co.jp


   Intended usage: LIMITED USE
   Only licensees of ATRAC technology may use this type.


   Author/Change controller:
   hatanaka@av.crl.sony.co.jp


5.2  ATRAC-X MIME Registration


   The MIME subtype for the Adaptive TRansform Audio Codec version X
   (ATRAC-X) is allocated from the Vendor tree since this codec is
   intended to be used with commercial products, and use of any ATRAC
   family codec requires a license from Sony Corporation, the vendor.


   Note, any unspecified parameter MUST be ignored by the receiver.


   Media Type name:  audio


   Media subtype name:  vnd.sony.atrac-x


   Required parameters:


   sampleRate:  Represents the sampling frequency in Hz of the original
   audio data.  Permissible values are 32000, 44100, 48000, 88200,
   96000.


   frameLength:  Indicates the size in bytes of an encoded audio frame.
   In essence, this value determines the bitrate of the encoded audio.
   Permissible values range from 8 to 8192.


   channelID:  Indicates the number of channels and channel layout
   according to the table in Section 5.3.  Note that this layout is
   different from that proposed in RFC 3551 [3].  However, as channelID
   = 0 defines an ambiguous channel layout, the channel mapping defined
   in Section 4.1 of [3] could be used.  Permissible values are 0, 1, 2,
   3, 4, 5, 6, 7.


   Optional parameters:


   maxRedundantFrames:  The maximum number of redundant frames that may
   be sent during a session in any given packet under the redundant
   framing mechanism detailed in the draft.  Allowed values are integers
   in the range 0 to 15, inclusive.  If this parameter is not used, a
   default of 15 SHOULD be assumed.


   delayMode:  Indicates a desire to use low-delay features, in which
   case the decoder will process received data accordingly based on this
   value.  Permissible values are 2 and 4.


   encryptionMode:  Indicates whether the audio frames have been
   encrypted using OpenMG ("OpenMG") or a third party method ("Other").
   If "Other", the specific mode MUST be determined at the application
   level.  Permissible values are "OpenMG" and "Other".


   maxptime: The maximum amount of media which can be encapsulated in a
   payload packet, expressed as time in milliseconds.  The time is
   calculated as the sum of the time the media present in the packet
   represents.  The time SHOULD be a multiple of the frame size.  If
   this parameter is not present, the sender MAY encapsulate a maximum
   of 16 encoded frames into one RTP packet.


   ptime:    see RFC 2327 [2]


   Encoding considerations: This type is defined for transfer via RTP
   (RFC 3550) [1].


   Security considerations:
   Audio data is believed to offer no security risks.


   Public specifications:
   Please refer to section 8 of this draft.


   Macintosh file type code: none
   Object identifier or OID: none


   Person & email address to contact for further information:
   Mitsuyuki Hatanaka
   hatanaka@av.crl.sony.co.jp


   Intended usage: LIMITED USE
   Only licensees of ATRAC technology may use this type.


   Author/Change controller:
   hatanaka@av.crl.sony.co.jp


5.3  Channel Mapping Configuration Table


               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               | channelID | Number of |  Default Speaker    |
               |           | Channels  |      Mapping        |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     0     |  max 64   |     undefined       |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     1     |     1     | front: center       |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     2     |     2     | front: left, right  |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     3     |     3     | front: left, right  |
               |           |           | front: center       |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     4     |     4     | front: left, right  |
               |           |           | front: center       |
               |           |           | rear: surround      |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     5     |    5+1    | front: left, right  |
               |           |           | front: center       |
               |           |           | rear: left, right   |
               |           |           | LFE                 |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     6     |    6+1    | front: left, right  |
               |           |           | front: center       |
               |           |           | rear: left, right   |
               |           |           | rear: center        |
               |           |           | LFE                 |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     7     |    7+1    | front: left, right  |
               |           |           | front: center       |
               |           |           | rear: left, right   |
               |           |           | side: left, right   |
               |           |           | LFE                 |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
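

   The table above can be expressed as a simple lookup, for instance to
   check the channel count placed in "a=rtpmap" against the "channelID"
   parameter.  The following Python sketch uses hypothetical names and
   counts the LFE channel as one channel, so that "5+1" becomes 6.

```python
from typing import Optional

# channelID -> total number of channels (LFE counted as one channel)
CHANNELS_BY_ID = {1: 1, 2: 2, 3: 3, 4: 4, 5: 6, 6: 7, 7: 8}


def channel_count(channel_id: int) -> Optional[int]:
    """Return the channel count for a channelID, or None when undefined.

    channelID 0 has no fixed layout (up to 64 channels), so the count
    must come from elsewhere, such as the "a=rtpmap" attribute.
    """
    if channel_id == 0:
        return None
    if channel_id not in CHANNELS_BY_ID:
        raise ValueError("channelID must be in the range 0 to 7")
    return CHANNELS_BY_ID[channel_id]
```

   For channelID 5 this yields 6, which agrees with the "/6" channel
   count used in the 5.1 example of Section 5.6.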



5.4  Mapping MIME Parameters into SDP


   The information carried in the MIME media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [2], which is commonly used to describe RTP sessions.  When SDP is
   used to specify sessions employing the ATRAC family of codecs, the
   following mapping rules according to the ATRAC codec apply:


5.4.1  For MIME subtype ATRAC3
   o  The MIME type ("audio") goes in SDP "m=" as the media name
   o  The MIME subtype (payload format name) goes in SDP "a=rtpmap" as
      the encoding name.  ATRAC3 supports only mono or stereo signals,
      so a corresponding number of channels SHALL also be included in
      this attribute.
   o  The "frameLength" parameter goes in SDP "a=fmtp".  This parameter
      MUST be present.  "maxRedundantFrames" may follow, but if no value
      is transmitted, the receiver SHOULD assume a default value of
      "15".
   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
      "a=maxptime" attributes, respectively.


5.4.2  For MIME subtype ATRAC-X
   o  The MIME type ("audio") goes in SDP "m=" as the media name
   o  The MIME subtype (payload format name) goes in SDP "a=rtpmap" as
      the encoding name.  This should be followed by the "sampleRate"
      (as the RTP clock rate), and then the actual number of channels
      regardless of the channelID parameter.
   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
      "a=maxptime" attributes, respectively.
   o  Any remaining parameters go in the SDP "a=fmtp" attribute by
      copying them directly from the MIME media type string as a
      semicolon separated list of parameter=value pairs.  The
      "frameLength" parameter must be the first entry on this line.  It
      is recommended that the "channelID" parameter be the next entry.
      If no "maxRedundantFrames" value is transmitted, the receiver
      MUST assume a default value of "15".


5.5  Offer-Answer Model Considerations


   Some options for encoding and decoding ATRAC audio data will require
   either or both the sender and receiver to comply with certain
   specifications.  In order to establish an interoperable transmission
   framework, an Offer-Answer negotiation in SDP should observe the
   following considerations:


5.5.1  For MIME subtype ATRAC3
   o  Downgraded subsets of "frameLength" are possible.  However for
      best performance, we suggest the Answerer respond with the highest
      possible values offered.


5.5.2  For MIME subtype ATRAC-X
   o  When creating an offer with considerably high requirements (such
      as 8 channels at 96kHz), it is RECOMMENDED that the offerer also
      propose a configuration with lower requirements, such as a stereo
      only option.  Although multiple alternative configurations may be
      offered, care should be taken not to offer too many payload types.
   o  Downgraded subsets of "sampleRate", "frameLength", and "channelID"
      are possible.  An answer SHALL NOT contain any values requiring
      further capabilities than the offer contains, but for best
      performance it is RECOMMENDED that the answer provide values as
      close as possible to those in the offer.
   o  The "maxRedundantFrames" value is a suggested minimum.  It MAY be
      increased in an answer (with a maximum of 15), but SHALL NOT be
      reduced.
   o  The optional parameters "delayMode" and "encryptionMode" are
      non-negotiable.  Thus, if the Answerer cannot comply with the
      offered value, the session must be deemed inoperable.
   o  The parameters "maxptime" and "ptime" should not, in most cases,
      affect the interoperability.  However, the parameter settings can
      affect application performance.
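

   The ATRAC-X considerations above could be checked mechanically, as in
   the following Python sketch.  The function name and the dictionary
   representation of the parameters are hypothetical.  The sketch also
   assumes that "requiring further capabilities" means a numerically
   larger value of "sampleRate", "frameLength", or "channelID", which is
   an interpretation rather than something this document defines.

```python
from typing import Any, Dict, List

DEFAULT_MAX_REDUNDANT_FRAMES = 15


def check_answer(offer: Dict[str, Any], answer: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an ATRAC-X answer (empty if none)."""
    problems: List[str] = []
    for name in ("sampleRate", "frameLength", "channelID"):
        if name in offer and name in answer and answer[name] > offer[name]:
            problems.append(f"{name} exceeds the offered value")
    offered = offer.get("maxRedundantFrames", DEFAULT_MAX_REDUNDANT_FRAMES)
    answered = answer.get("maxRedundantFrames", DEFAULT_MAX_REDUNDANT_FRAMES)
    if answered < offered:
        problems.append("maxRedundantFrames was reduced")
    if answered > 15:
        problems.append("maxRedundantFrames exceeds 15")
    for name in ("delayMode", "encryptionMode"):
        # Non-negotiable: the answer has to match the offer exactly.
        if offer.get(name) != answer.get(name):
            problems.append(f"{name} does not match the offer")
    return problems
```

   An empty result only indicates that none of the rules listed above
   was violated; it says nothing about "ptime" or "maxptime".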


5.6  Example SDP Session Descriptions


   Example usage of ATRAC-X with stereo at 44100Hz:


   m=audio 49120 RTP/AVP 99
   a=rtpmap:99 ATRAC-X/44100/2
   a=fmtp:99 frameLength=312; channelID=2; delayMode=2
   a=maxptime:20


   Example usage of ATRAC-X with 5.1 setup at 48000Hz:


   m=audio 49120 RTP/AVP 99
   a=rtpmap:99 ATRAC-X/48000/6
   a=fmtp:99 frameLength=1156; channelID=5
   a=maxptime:30
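

   Session descriptions like those above could be generated following
   the mapping rules of Section 5.4.2, as in this Python sketch.  The
   function name and its arguments are hypothetical.  The encoding name
   "ATRAC-X" is taken from the examples in this section, and only a
   subset of the optional parameters is covered.

```python
from typing import List, Optional


def atrac_x_sdp(payload_type: int, port: int, sample_rate: int,
                channels: int, frame_length: int, channel_id: int,
                max_redundant_frames: Optional[int] = None,
                delay_mode: Optional[int] = None,
                maxptime: Optional[int] = None) -> List[str]:
    """Build the SDP media lines for one ATRAC-X payload type."""
    # frameLength goes first and channelID second, per Section 5.4.2.
    fmtp = [f"frameLength={frame_length}", f"channelID={channel_id}"]
    if max_redundant_frames is not None:
        fmtp.append(f"maxRedundantFrames={max_redundant_frames}")
    if delay_mode is not None:
        fmtp.append(f"delayMode={delay_mode}")
    lines = [
        f"m=audio {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} ATRAC-X/{sample_rate}/{channels}",
        f"a=fmtp:{payload_type} " + "; ".join(fmtp),
    ]
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```

   Calling atrac_x_sdp(99, 49120, 44100, 2, 312, 2, delay_mode=2,
   maxptime=20) reproduces the four lines of the stereo example.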


5.7  Example Offer-Answer Exchange


   An example Offer/Answer exchange (assuming payload types 98 and 99
   are dynamically assigned to the ATRAC family).


   Alice's Offer:


   m=audio 49170 RTP/AVP 98 99
   a=rtpmap:98 ATRAC-X/44100/6
   a=fmtp:98 frameLength=1156; channelID=5
   a=rtpmap:99 ATRAC-X/44100/6
   a=fmtp:99 frameLength=386; channelID=5


   Bob's Answer:


   m=audio 49170 RTP/AVP 99
   a=rtpmap:99 ATRAC-X/44100/2
   a=fmtp:99 frameLength=386; channelID=2


6.  IANA Considerations


   Two new MIME subtypes, for ATRAC3 and ATRAC-X, are requested to be
   registered (see Section 5).


7.  Security Considerations


   Certain security precautions may be desired to protect copyrighted
   material.  The payload format described in this document is subject
   to the security considerations defined in RFC 3550 [1] and in any
   applicable profile, for example RFC 3551 [3].  This payload format,
   however, does not implement any security mechanisms of its own.
   External means, such as SRTP [7], MAY be used since the audio
   compression scheme follows an end-to-end model.


   Since the data transported is audio that is already encoded, the main
   security issues are confidentiality, integrity, and authentication of
   the actual audio.


7.1  Confidentiality


   To ensure confidentiality of ATRAC encoded audio, the audio frames
   will have to be encrypted.  Encryption of the payload header,
   however, is less necessary, and in fact may not be preferable if
   the information could be useful to some third-party application.


   Because the audio compression scheme follows an end-to-end model,
   encryption may be performed after packet encapsulation.  As
   multi-channel transmissions are contained in single encoded audio
   frames, there is no concern that encryption will affect interleaved
   data.


7.2  Authentication


   Transmitted data may be tampered with or altered by malicious
   parties, for example through man-in-the-middle attacks.  Such
   attacks may result in depacketization and/or decoding errors that
   could severely degrade audio quality.


   As this payload format does not include its own means for sender
   authentication and integrity protection, an external mechanism must
   be used.  It is RECOMMENDED, however, that the chosen mechanism
   protect more than just the audio data bits.  For example, to protect
   against a man-in-the-middle attack, the payload header and RTP header
   SHOULD be protected.
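

   The recommendation to cover the RTP header and payload header, and
   not only the audio data, can be illustrated with the following
   non-normative sketch.  It is not SRTP and is not a mechanism defined
   by this document: the function names, the tag length, the use of
   HMAC-SHA-256 and the idea of appending the tag to the packet are all
   assumptions chosen for illustration.  Key management and replay
   protection are out of scope of the sketch.

```python
import hashlib
import hmac

TAG_LEN = 10  # truncated tag length in bytes; an illustrative choice


def protect(key: bytes, rtp_header: bytes, payload_header: bytes,
            audio: bytes) -> bytes:
    """Append a tag computed over the RTP header, payload header and audio."""
    packet = rtp_header + payload_header + audio
    tag = hmac.new(key, packet, hashlib.sha256).digest()[:TAG_LEN]
    return packet + tag


def verify(key: bytes, protected: bytes) -> bytes:
    """Return the packet without its tag, or raise ValueError on mismatch."""
    if len(protected) < TAG_LEN:
        raise ValueError("packet too short to carry a tag")
    packet, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    expected = hmac.new(key, packet, hashlib.sha256).digest()[:TAG_LEN]
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return packet
```

   Because the tag covers the headers as well as the audio frames, a
   change to any header field by a third party causes verification to
   fail, which would not be the case if only the audio data bits were
   protected.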


7.3  Decoding Validation


   Received encoded audio packets should be verified so as to ensure a
   minimal level of audio quality.  At a minimum, if the packet size
   that the receiver calculates from the payload header fields differs
   from the actual payload length, the receiver SHOULD discard the
   packet.
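

   The size check can be sketched as follows.  This is a non-normative
   illustration.  The function names are hypothetical, and the sketch
   assumes that the receiver has already derived the payload header
   length, the number of frames and the frame length in bytes from the
   payload header fields and the session parameters.  It also assumes a
   payload consisting of one header followed by equally sized frames,
   which may not hold for every packet layout in this format.

```python
def payload_length_ok(payload: bytes, header_len: int,
                      frame_count: int, frame_length: int) -> bool:
    """Compare the size implied by the header with the actual payload size."""
    if header_len < 0 or frame_count < 0 or frame_length < 0:
        return False
    expected = header_len + frame_count * frame_length
    return expected == len(payload)


def accept_packets(packets: list) -> list:
    """Keep only payloads whose calculated and actual sizes agree.

    Each entry is a tuple (payload, header_len, frame_count, frame_length).
    """
    return [p[0] for p in packets if payload_length_ok(*p)]
```

   A packet that fails the check is simply dropped before it reaches
   the decoder.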




8.  References


8.1  Normative References


   [1]  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", STD 64,
        RFC 3550, July 2003.


   [2]  Handley, M. and V. Jacobson, "SDP: Session Description
        Protocol", RFC 2327, April 1998.


   [3]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
        Conferences with Minimal Control", STD 65, RFC 3551, July 2003.


   [4]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.


8.2  Informative References


   [5]  Kerr, P., "RTP Payload Format for Vorbis Encoded Audio", October
        2003.


   [6]  Sjoberg, J., "Real-Time Transport Protocol (RTP) Payload Format
        and File Storage Format for the Adaptive Multi-Rate (AMR) and
        Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", RFC 3267,
        June 2002.


   [7]  Baugher, M., Carrara, E., McGrew, D., Naslund, M. and K.
        Norrman, "The Secure Real Time Transport Protocol", July 2003.


   [8]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
        the Session Description Protocol (SDP)", RFC 3264, June 2002.



Authors' Addresses


   Matthew Romaine
   Sony Corporation, Japan
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo  141-0001
   Japan


   EMail: Matthew.Romaine@jp.sony.com



   Mitsuyuki Hatanaka
   Sony Corporation, Japan
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo  141-0001
   Japan


   EMail: hatanaka@av.crl.sony.co.jp



   Jun Matsumoto
   Sony Corporation, Japan
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo  141-0001
   Japan


   EMail: jun@av.crl.sony.co.jp



Intellectual Property Statement


   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.


   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.


   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.



Disclaimer of Validity


   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.



Copyright Statement


   Copyright (C) The Internet Society (2004).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.



Acknowledgment


   Funding for the RFC Editor function is currently provided by the
   Internet Society.



