Audio/Video Transport M. Romaine
Internet-Draft M. Hatanaka
Expires: September 20, 2004 J. Matsumoto
SONY
March 22, 2004
RTP Payload Format for ATRAC Family
draft-hatanaka-avt-rtp-atrac-family-01
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 20, 2004.
Copyright Notice
Copyright (C) The Internet Society (2004). All Rights Reserved.
Abstract
This document describes an RTP payload format for efficient and
flexible transport of audio data encoded with the Adaptive
TRansform Audio Codec (ATRAC) family of codecs. Recent enhancements
to the ATRAC family of codecs support high quality audio coding with
multiple channels. The RTP payload format as presented in this
document includes support for data fragmentation and elementary
redundancy measures.
Romaine, et al. Expires September 20, 2004 [Page 1]
Internet-Draft RTP Payload Format for ATRAC Family March 2004
1. Introduction
The ATRAC family of perceptual audio codecs is designed to address
numerous needs for high-quality, low bit-rate audio transfer. ATRAC
technology can be found in many consumer and professional products
and applications, including MD players, voice recorders, mobile
phones, and CD players. The need for real-time streaming of audio
data has grown, and this document details our efforts in increasing
the product and application space for the ATRAC family of codecs.
Recent advances in ATRAC technology allow for multiple channels of
audio to be encoded in customizable groupings. This should allow for
future expansions in scaled streaming. However, to provide the greatest
flexibility in streaming any one of the ATRAC family member codecs,
this payload format does not distinguish between the codecs
at the packet level.
This simplified payload format contains only the basic information
needed to disassemble a packet of ATRAC audio in order to decode it.
Timestamps are in sample units, with audio data currently encoded
into frames of 1024 or 2048 samples depending on the ATRAC version.
There is also basic support for fragmentation and redundancy, as
ATRAC frames MAY exceed an MTU size of 1500 octets.
Although streaming of multi-channel audio is supported depending on
the ATRAC version used, all encoded audio for a given time period is
contained within a single frame. Therefore, there is no interleaving
or splitting of audio data on a per-channel basis to be concerned with.
1.1 ATRAC Details
Early versions of the ATRAC codec handled only two channels of audio
at 44.1kHz sampling frequency, with typical bit-rates between 66kbps
and 132kbps. The latest version allows for up to 8 channels of audio
at 96kHz sampling frequency. The feasible bit-rate range has also
expanded, allowing from 8kbps to 1400kbps.
Depending on the version of ATRAC used, the sample-frame size is
either 1024 or 2048. Actual bit-rates are determined by specifying a
fixed encoded frame-size. In other words, instead of requesting a
stereo 44.1kHz stream at, say, 64kbps, one would tell the encoder to
create encoded frames of 364 bytes.
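The relationship between a fixed encoded frame size and the resulting
bit-rate can be illustrated with a short calculation. The following
Python sketch is informal; the function name is hypothetical, and the
pairing of a given frame size with 1024 or 2048 samples per frame is
an assumption made for illustration only.

```python
def bitrate_bps(frame_bytes, samples_per_frame, sample_rate_hz):
    """Approximate bit-rate, in bits per second, of fixed-size encoded frames."""
    frames_per_second = sample_rate_hz / samples_per_frame
    return frame_bytes * 8 * frames_per_second


# A 364-byte frame covering 2048 samples at 44.1kHz (assumed frame size).
print(round(bitrate_bps(364, 2048, 44100)))
# A 192-byte frame covering 1024 samples at 44.1kHz (assumed frame size).
print(round(bitrate_bps(192, 1024, 44100)))
```

Under these assumptions a 364-byte frame works out to roughly
62.7kbps, close to the nominal 64kbps mentioned above, and a 192-byte
frame of 1024 samples works out to about 66kbps.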
2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [4].
3. Payload Format
3.1 RTP Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            contributing source (CSRC) identifiers             |
|                             .....                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Marker (M): 1 bit
Set to zero as silence suppression is currently not used.
Payload Type (PT): 7 bits
The payload type can either be dynamically allocated at the
application level, or an RTP profile for a class of applications is
expected to assign the payload type for this format. A dynamic
allocation SHOULD designate this format as ATRAC-Family.
Sequence number: 16 bits
This field is as defined in RFC 3550 [1].
Timestamp: 32 bits
A timestamp representing the sampling time of the first sample of the
first ATRAC frame in the RTP packet. The clock frequency MUST be set
to the sample rate of the encoded audio data, and is conveyed
out-of-band.
3.2 Payload Header
The ATRAC family payload header is only two octets long, which keeps
processing simple.
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|C|FrgNo| Rsrvd |NFrames| FrOff |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
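Continuous flag (C): 1 bit
Set to one if the fragmented frame carried in this packet continues in
a following packet; set to zero otherwise, including in the last
packet of a fragmented frame (see Section 4).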
Fragment Number (FrgNo): 3 bits
In the event of data fragmentation, this value is 1 for the first
packet, and increases sequentially for the remaining fragmented data
packets.
Number of Frames (NFrames): 4 bits
The number of frames in this packet. This allows for a maximum of 16
ATRAC-encoded audio encapsulations per packet, with 0 indicating one
frame. Keep in mind that only the first frame is allowed to be
fragmented. Additionally, this field MUST NOT be anything other than 0
in subsequent packets containing the fragmented frame.
Frame Offset (FrOff): 4 bits
The purpose of frame offsets is to provide a basic mechanism for the
transmission of redundant data. Redundant frames are sent
sequentially before any new frames in the same packet. The timestamp
also reflects the playback time of the first frame in a packet, even
if the first frame is a redundant frame. Frame-size lengths are
determined during SDP negotiations (one of either 1024 or 2048
samples), and are fixed for a given session. A "maxRedundantFrames"
parameter is also sent during SDP negotiations; this allows for the
necessary buffer size to be calculated in advance.
As an example of using Frame Offsets, refer to Figure 1, which
considers a situation when FrOff is 2. If a packet has 4 frames of
audio, with each frame representing 1024 samples of audio, then we
can calculate that playback begins with 2 frames (2048 samples) of
redundant data, and can allocate buffer space as necessary. (The
only other necessary variable is sampling frequency, which MUST have
been established during SDP negotiations). This field SHOULD NOT be
used in packets containing fragmented data.
|-Fr1-|-Fr2-|-Fr3-|-Fr4-| Nth Packet, TS=1
|-Fr3-|-Fr4-|-Fr5-|-Fr6-| N+1th Packet, TS=3
|-Fr5-|-Fr6-|-Fr7-|-Fr8-| N+2th Packet, TS=5
Figure 1: Redundant frames with FrOff = 2 (TS shown in frame units)
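The two-octet payload header described in this section can be packed
and parsed with a few shifts and masks. The Python sketch below is a
non-normative illustration: the function names are hypothetical, bit 0
of the diagram is assumed to be the most significant bit (network byte
order), the reserved bits are assumed to be sent as zero, and FrgNo is
assumed to be zero in unfragmented packets.

```python
import struct


def pack_payload_header(c, frg_no, frame_count, fr_off):
    """Build the 2-octet payload header (reserved bits set to zero)."""
    if not 1 <= frame_count <= 16:
        raise ValueError("frame_count must be between 1 and 16")
    if not 0 <= frg_no <= 7 or not 0 <= fr_off <= 15:
        raise ValueError("FrgNo or FrOff out of range")
    # NFrames carries frame_count - 1, since a field value of 0 means one frame.
    word = ((1 if c else 0) << 15) | (frg_no << 12) | ((frame_count - 1) << 4) | fr_off
    return struct.pack("!H", word)


def parse_payload_header(data):
    """Split the first two octets of the payload into the header fields."""
    (word,) = struct.unpack("!H", data[:2])
    return {
        "C": word >> 15,
        "FrgNo": (word >> 12) & 0x7,
        "Rsrvd": (word >> 8) & 0xF,
        "frame_count": ((word >> 4) & 0xF) + 1,
        "FrOff": word & 0xF,
    }


# A packet like those in Figure 1: four frames, the first two redundant.
header = pack_payload_header(c=False, frg_no=0, frame_count=4, fr_off=2)
print(header.hex(), parse_payload_header(header))
```

Storing the frame count minus one in NFrames is what allows a 4-bit
field to express the full range of 1 to 16 frames.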
3.3 Payload Data
ATRAC payload data begins with 2 octets: a 12-bit block length giving
the byte-length of the encoded audio data, followed by 4 reserved bits.
After that, the actual audio data follows.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Block Length      | Rsrvd |         ATRAC data...         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Block length: 12 bits
The byte length of encoded audio data until the end of the current
packet. This is so that in the case of fragmentation, if only a
subsequent packet is received, decoding can still occur. 12 bits
allows for a maximum block length of 4095 bytes. In the event a data
block is larger than 4095 bytes but would still fit within MTU
limits, fragmentation MUST occur.
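As an informal illustration, the two octets that precede the audio
data could be encoded and decoded as shown below. The function names
are hypothetical, and the sketch assumes that the Block Length
occupies the upper 12 bits of a 16-bit word in network byte order,
with the reserved bits sent as zero.

```python
import struct

MAX_BLOCK_LENGTH = (1 << 12) - 1  # largest value a 12-bit field can carry


def pack_block_length(block_length):
    """Encode the 12-bit Block Length followed by four zeroed reserved bits."""
    if not 0 <= block_length <= MAX_BLOCK_LENGTH:
        raise ValueError("block length does not fit in 12 bits; fragment the frame")
    return struct.pack("!H", block_length << 4)


def parse_block_length(data):
    """Return (block_length, reserved) from the two octets before the audio data."""
    (word,) = struct.unpack("!H", data[:2])
    return word >> 4, word & 0xF


encoded = pack_block_length(364)
print(encoded.hex(), parse_block_length(encoded))
```

A sender that hits the ValueError above is in exactly the situation
where this section requires fragmentation.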
4. Frame Packetization
Each RTP packet contains either an integer number of ATRAC encoded
audio frames, with a maximum of 16, or one ATRAC frame fragment.
As many complete ATRAC frames as can fit in a single path-MTU SHOULD
be placed in an RTP packet, with the aforementioned maximum of 16.
However, if an ATRAC frame will not fit into an RTP packet, it MUST
be fragmented.
The start of a fragmented frame gets placed in its own RTP packet,
its Continuous bit (C) set to one, and its Fragment Number (FrgNo)
set to one. As the frame must be the only one in the packet, the
Number of Frames field is zero. Subsequent packets are to contain
the remaining fragmented frame data, with the Fragment Number
increasing sequentially and the Continuous bit (C) consistently set
to one. As subsequent packets do not contain any new frames, the
Number of Frames field SHOULD be ignored. The last packet of
fragmented data MUST have the Continuous bit (C) set to zero.
In the event of fragmentation, the basic redundancy measures SHOULD
NOT be used.
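The fragmentation rules above can be sketched from the sender's point
of view as follows. This is a minimal, non-normative Python example:
the function name and the "max_chunk" parameter (the number of audio
bytes the sender decides to place in each packet) are assumptions, and
the function is meant to be called only for a frame that does not fit
into a single packet.

```python
def fragment_frame(frame, max_chunk):
    """Split one encoded frame into (C, FrgNo, chunk) tuples of at most max_chunk bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    chunks = [frame[i:i + max_chunk] for i in range(0, len(frame), max_chunk)]
    if len(chunks) > 7:
        raise ValueError("FrgNo is a 3-bit field; more than 7 fragments cannot be numbered")
    fragments = []
    for index, chunk in enumerate(chunks, start=1):
        is_last = index == len(chunks)
        # C stays at one until the final fragment, which clears it.
        fragments.append((0 if is_last else 1, index, chunk))
    return fragments


for c, frg_no, chunk in fragment_frame(bytes(3000), 1400):
    print(c, frg_no, len(chunk))
```

For a 3000-byte frame and 1400-byte chunks this yields three
fragments numbered 1, 2 and 3, with C set to one on the first two and
zero on the last.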
4.1 Example Fragmented ATRAC Frame
An example of a fragmented ATRAC frame is presented below. The
encoded audio data frame is split over three RTP packets. For
brevity, the RTP packet header details have been excluded.
Packet 1:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|  1  | Rsrvd |   0   |   0   |     block length      | Rsrvd |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         ATRAC data...                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Packet 2:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|  2  | Rsrvd |   0   |   0   |     block length      | Rsrvd |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     ...more ATRAC data...                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Packet 3:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|  3  | Rsrvd |   0   |   0   |     block length      | Rsrvd |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 ...the last of the ATRAC data                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The following points highlight important characteristics of the
example above:
o the transition from one to zero of the Continuous bit (C)
o a sequential increase in the Fragment Number
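A receiver can use the two characteristics listed above to decide
whether a fragmented frame has arrived completely. The following
Python sketch is non-normative; the function name and the
representation of each fragment as a (C, FrgNo, data) tuple are
assumptions. It takes a conservative approach and gives up on the
frame as soon as a fragment is missing.

```python
def reassemble(fragments):
    """Join (C, FrgNo, data) tuples, given in sequence-number order, into one frame.

    Returns None when a fragment is missing or the flags are inconsistent.
    """
    if not fragments:
        return None
    for expected_no, (c, frg_no, _) in enumerate(fragments, start=1):
        is_last = expected_no == len(fragments)
        if frg_no != expected_no:
            return None  # gap or reordering in the Fragment Number sequence
        if c != (0 if is_last else 1):
            return None  # C must be one until the final fragment
    return b"".join(data for _, _, data in fragments)


packets = [(1, 1, b"AAAA"), (1, 2, b"BBBB"), (0, 3, b"CC")]
print(reassemble(packets))      # complete frame
print(reassemble(packets[:2]))  # final fragment missing
```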
5. Payload Format Parameters
Certain parameters will need to be defined before ATRAC family
encoded content can be streamed. Other optional parameters may also
be defined to take advantage of specific features relevant to certain
ATRAC versions. Parameters for ATRAC3 and ATRAC-X are defined here
as part of the MIME subtype registration process. A mapping of these
parameters into the Session Description Protocol (SDP) (RFC 2327) [2]
is also provided for applications that utilize SDP.
The data format and parameters are specified for real-time transport
in RTP.
5.1 ATRAC3 MIME Registration
The MIME subtype for the Adaptive TRansform Audio Codec version 3 (ATRAC3)
is allocated from the Vendor tree since this codec is intended to be
used with commercial products, and use of any ATRAC family codec
requires a license from Sony Corporation, the vendor.
Note, any unspecified parameter MUST be ignored by the receiver.
Media Type name: audio
Media subtype name: ATRAC3
Required parameters:
frameLength: Indicates the size in bytes of an encoded audio frame.
In essence, this value determines the bit-rate of the encoded audio.
Permissible values are 192 (66kbps), 304 (105kbps), and 384
(132kbps).
Optional parameters:
maxRedundantFrames: The maximum number of redundant frames that may
be sent during a session in any given packet under the redundant
framing mechanism detailed in the draft.
maxptime: The maximum amount of media which can be encapsulated in a
payload packet, expressed as time in milliseconds. The time is
calculated as the sum of the time the media present in the packet
represents. The time SHOULD be a multiple of the frame size. If
this parameter is not present, the sender MAY encapsulate a maximum
of 16 encoded frames into one RTP packet.
ptime: see RFC 2327 [2]
Encoding considerations: This type is defined for transfer via RTP
RFC 3550 [1].
Security considerations: Audio data is believed to offer no security
risks.
Public specifications: Please refer to section 7 of this draft.
Macintosh file type code: none
Object identifier or OID: none
Person & email address to contact for further information:
Mitsuyuki Hatanaka
hatanaka@av.crl.sony.co.jp
Intended usage: LIMITED USE
Only licensees of ATRAC technology may use this type.
Author/Change controller:
hatanaka@av.crl.sony.co.jp
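As an informal illustration of the "maxptime" parameter defined above,
the number of whole frames a sender may place in one packet can be
derived as follows. The function name is hypothetical, and the use of
1024 samples per frame at 44.1kHz in the example is an assumption.

```python
MAX_FRAMES_PER_PACKET = 16  # limit imposed by the 4-bit NFrames field


def frames_per_packet(maxptime_ms, sample_rate_hz, samples_per_frame):
    """Number of whole frames a sender may place in one packet."""
    if maxptime_ms is None:
        return MAX_FRAMES_PER_PACKET
    # Integer arithmetic avoids rounding a partial frame up to a whole one.
    whole_frames = (maxptime_ms * sample_rate_hz) // (1000 * samples_per_frame)
    return min(MAX_FRAMES_PER_PACKET, whole_frames)


print(frames_per_packet(100, 44100, 1024))
print(frames_per_packet(None, 44100, 1024))
```

With a maxptime of 100 milliseconds, four frames of about 23.2
milliseconds each fit into a packet; without the parameter the limit
of 16 frames applies.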
5.2 ATRAC-X MIME Registration
The MIME subtype for the Adaptive TRansform Audio Codec version X (ATRAC-X)
is allocated from the Vendor tree since this codec is intended to be
used with commercial products, and use of any ATRAC family codec
requires a license from Sony Corporation, the vendor.
Note, any unspecified parameter MUST be ignored by the receiver.
Media Type name: audio
Media subtype name: ATRAC-X
Required parameters:
sampleRate: Represents the sampling frequency in Hz of the original
audio data. Permissible values are 32000, 44100, 48000, 88200,
96000.
frameLength: Indicates the size in bytes of an encoded audio frame.
In essence, this value determines the bitrate of the encoded audio.
Permissible values range from 8 to 8192.
channelID: Indicates the number of channels and channel layout
according to the table in Section 5.3. Note that this layout is
different from that proposed in RFC 3551 [3]. However, as channelID
= 0 defines an ambiguous channel layout, the channel mapping defined
in Section 4.1 of [3] could be used. Permissible values are 0, 1, 2,
3, 4, 5, 6, 7.
Optional parameters:
maxRedundantFrames: The maximum number of redundant frames that may
be sent during a session in any given packet under the redundant
framing mechanism detailed in the draft. If this parameter is not
used, a default of "16" SHOULD be assumed.
delayMode: Indicates a desire to use low-delay features, in which
case the decoder will process received data accordingly based on this
value. Permissible values are 2 and 4.
encryptionMode: Indicates whether the audio frames have been
encrypted using OpenMG ("OpenMG") or a third party method ("Other").
If "Other", the specific mode MUST be determined at the application
level. Permissible values are "OpenMG" and "Other".
maxptime: The maximum amount of media which can be encapsulated in a
payload packet, expressed as time in milliseconds. The time is
calculated as the sum of the time the media present in the packet
represents. The time SHOULD be a multiple of the frame size. If
this parameter is not present, the sender MAY encapsulate a maximum
of 16 encoded frames into one RTP packet.
ptime: see RFC 2327 [2]
Encoding considerations: This type is defined for transfer via RTP
(RFC 3550) [1].
Security considerations:
Audio data is believed to offer no security risks.
Public specifications:
Please refer to section 7 of this draft.
Macintosh file type code: none
Object identifier or OID: none
Person & email address to contact for further information:
Mitsuyuki Hatanaka
hatanaka@av.crl.sony.co.jp
Intended usage: LIMITED USE
Only licensees of ATRAC technology may use this type.
Author/Change controller:
hatanaka@av.crl.sony.co.jp
5.3 Channel Mapping Configuration Table
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| channelID | Number of | Default Speaker     |
|           | Channels  | Mapping             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     0     |  max 64   | undefined           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     1     |     1     | front: center       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     2     |     2     | front: left, right  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     3     |     3     | front: left, right  |
|           |           | front: center       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     4     |     4     | front: left, right  |
|           |           | front: center       |
|           |           | rear: surround      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     5     |    5+1    | front: left, right  |
|           |           | front: center       |
|           |           | rear: left, right   |
|           |           | LFE                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     6     |    6+1    | front: left, right  |
|           |           | front: center       |
|           |           | rear: left, right   |
|           |           | rear: center        |
|           |           | LFE                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     7     |    7+1    | front: left, right  |
|           |           | front: center       |
|           |           | rear: left, right   |
|           |           | side: left, right   |
|           |           | LFE                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
5.4 Mapping MIME Parameters into SDP
The information carried in the MIME media type specification has a
specific mapping to fields in the Session Description Protocol (SDP)
[2], which is commonly used to describe RTP sessions. When SDP is
used to specify sessions employing the ATRAC family of codecs, the
following mapping rules according to the ATRAC codec apply:
5.4.1 For MIME subtype ATRAC3
o The MIME type ("audio") goes in SDP "m=" as the media name
o The MIME subtype (payload format name) goes in SDP "a=rtpmap" as
the encoding name.
o The "frameLength" parameter goes in SDP "a=fmtp". This parameter
MUST be present. "maxRedundantFrames" may follow, but if no value
is transmitted, the receiver SHOULD assume a default value of
"16".
o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
"a=maxptime" attributes, respectively.
5.4.2 For MIME subtype ATRAC-X
o The MIME type ("audio") goes in SDP "m=" as the media name
o The MIME subtype (payload format name) goes in SDP "a=rtpmap" as
the encoding name. This should be followed by the "sampleRate"
(as the RTP clock rate), and then the "channelID" parameter.
o Any remaining parameters go in the SDP "a=fmtp" attribute by
copying them directly from the MIME media type string as a
semicolon separated list of parameter=value pairs. The
"frameLength" parameter must be the first entry on this line. The
receiver MUST assume a default value of "16" for
"maxRedundantFrames".
o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
"a=maxptime" attributes, respectively.
5.5 Offer-Answer Model Considerations
Some options for encoding and decoding ATRAC audio data will require
either or both the sender and receiver to comply with certain
specifications. In order to establish an interoperable transmission
framework, an Offer-Answer negotiation in SDP should observe the
following considerations:
5.5.1 For MIME subtype ATRAC3
o Downgraded subsets of "frameLength" are possible. However, for best
performance, we suggest the Answerer respond with the highest
possible values offered.
5.5.2 For MIME subtype ATRAC-X
o When creating an offer with considerably high requirements (such
as 8 channels at 96kHz), it is RECOMMENDED that the offerer also
propose a configuration with lower requirements, such as a stereo
only option. Although multiple alternative configurations may be
offered, care should be taken to not offer too many payload types.
o Downgraded subsets of "sampleRate", "frameLength", and "channelID"
are possible. However, for best performance, we suggest the
Answerer respond with the highest possible values offered.
o The "maxRedundantFrames" is a suggested minimum. The Answerer MAY
use a higher value, but MUST NOT use a lower value.
o The optional parameters "delayMode" and "encryptionMode" are
non-negotiable. Thus, if the Answerer cannot comply with the
offered value, the session must be deemed inoperable.
o The parameters "maxptime" and "ptime" should not, in most cases,
affect the interoperability. However, the parameter settings can
affect application performance.
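The ATRAC-X considerations above can be expressed as a simple check of
an answer against an offer. The Python sketch below is non-normative:
the function name and the representation of the "a=fmtp" parameters as
dictionaries are assumptions, as is the choice to treat an omitted
"delayMode" or "encryptionMode" in the answer as a mismatch.

```python
def answer_problems(offer, answer):
    """List the reasons an answer is incompatible with an offer (empty if none)."""
    problems = []
    # delayMode and encryptionMode are non-negotiable: they must match exactly.
    for name in ("delayMode", "encryptionMode"):
        if offer.get(name) != answer.get(name):
            problems.append(f"{name} differs from the offer")
    # maxRedundantFrames in the offer is a minimum; 16 is the documented default.
    offered = offer.get("maxRedundantFrames", 16)
    answered = answer.get("maxRedundantFrames", 16)
    if answered < offered:
        problems.append("maxRedundantFrames is lower than offered")
    return problems


offer = {"frameLength": 386, "delayMode": 2, "maxRedundantFrames": 4}
print(answer_problems(offer, {"frameLength": 386, "delayMode": 2, "maxRedundantFrames": 8}))
print(answer_problems(offer, {"frameLength": 386, "maxRedundantFrames": 2}))
```

An empty list indicates that none of the rules checked here is
violated; any entry means the session should be treated as inoperable
or renegotiated.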
5.6 Example SDP Session Descriptions
Example usage of ATRAC-X with stereo at 44100Hz:
m=audio 49120 RTP/AVP 99
a=rtpmap:99 ATRAC-X/44100/2
a=fmtp:99 frameLength=312; delayMode=2
a=maxptime:20
Example usage of ATRAC-X with 5.1 setup at 48000Hz:
m=audio 49120 RTP/AVP 99
a=rtpmap:99 ATRAC-X/48000/5
a=fmtp:99 frameLength=1156
a=maxptime:30
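The session descriptions above follow directly from the mapping rules
in Section 5.4.2. As a non-normative sketch, they could be generated
as shown below; the function name and its arguments are hypothetical,
and the port and payload type are simply the values used in the
examples.

```python
def atracx_sdp_lines(port, payload_type, sample_rate, channel_id, frame_length,
                     optional=None, maxptime=None):
    """Return SDP media-level lines describing one ATRAC-X payload type."""
    lines = [
        f"m=audio {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} ATRAC-X/{sample_rate}/{channel_id}",
    ]
    # frameLength comes first, then any optional parameters, separated by semicolons.
    params = [f"frameLength={frame_length}"]
    params += [f"{name}={value}" for name, value in (optional or {}).items()]
    lines.append(f"a=fmtp:{payload_type} " + "; ".join(params))
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines


print("\n".join(atracx_sdp_lines(49120, 99, 44100, 2, 312,
                                 optional={"delayMode": 2}, maxptime=20)))
```

Called with these arguments, the function reproduces the stereo
44100Hz example line for line.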
5.7 Example Offer-Answer Exchange
An example Offer/Answer exchange (assuming dynamic payload types 98
and 99 are used for the ATRAC family).
Alice's Offer:
m=audio 49170 RTP/AVP 98 99
a=rtpmap:98 ATRAC-X/44100/5
a=fmtp:98 frameLength=1156
a=rtpmap:99 ATRAC-X/44100/2
a=fmtp:99 frameLength=386
Bob's Answer:
m=audio 49170 RTP/AVP 99
a=rtpmap:99 ATRAC-X/44100/2
a=fmtp:99 frameLength=386
6. IANA Considerations
New MIME subtypes for ATRAC3 and ATRAC-X are currently being
registered (see Section 5).
7. Security Considerations
Certain security precautions may be desired to protect copyrighted
material. The payload format as described in this document is
subject to the security considerations defined in RFC3550 [1]. This
payload format however does not implement any security mechanisms of
its own. External means, such as SRTP [7], MAY be used since the
audio compression scheme follows an end-to-end model.
Since the data transported is audio that is already encoded, the main
security issues are confidentiality, integrity, and authentication of
the actual audio.
7.1 Confidentiality
To ensure confidentiality of ATRAC encoded audio, the audio frames
will have to be encrypted. Encryption of the payload header,
however, is not as necessary, and in fact may not be preferable if
the information could be useful to some third party application.
Because the audio compression scheme follows an end-to-end model,
encryption may be performed after packet encapsulation. As
multi-channel transmissions are contained in single encoded audio
frames, there is no concern for encryption affecting interleaving
data.
7.2 Authentication
Transmitted data may be tampered with or altered by malicious parties,
for example through man-in-the-middle attacks. Such attacks may result
in depacketization and/or decoding errors that could severely degrade
audio quality.
As this payload format does not include its own means for sender
authentication and integrity protection, an external mechanism must
be used. It is RECOMMENDED, however, that the chosen mechanism
protect more than just the audio data bits. For example, to protect
against a man-in-the-middle attack, the payload header and RTP header
SHOULD be protected.
7.3 Decoding Validation
Verification of the received encoded audio packets should be
performed so as to ensure a minimal level of audio quality. As a
most primitive implementation, if the receiver calculates a packet
size differing from the payload length based on data in the payload
header fields, the receiver SHOULD discard the packet.
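A primitive validation of this kind might look like the following
Python sketch. It is non-normative and rests on several assumptions:
the packet is unfragmented, the payload consists of the 2-octet
payload header, the 2-octet Block Length word and then the audio data,
a single Block Length covers all frames in the packet, and
"frame_length" is the "frameLength" value negotiated in SDP. The
function name is hypothetical.

```python
import struct


def payload_is_consistent(payload, frame_length):
    """Check an unfragmented payload against its header fields.

    Assumed layout: 2-octet payload header, 2-octet Block Length word, audio data.
    """
    if len(payload) < 4:
        return False
    header, length_word = struct.unpack("!HH", payload[:4])
    frame_count = ((header >> 4) & 0xF) + 1  # NFrames of 0 means one frame
    block_length = length_word >> 4          # upper 12 bits of the second word
    if block_length != len(payload) - 4:
        return False                         # declared length disagrees with the packet
    return block_length == frame_count * frame_length


good = struct.pack("!HH", 0x0010, (2 * 192) << 4) + bytes(2 * 192)
print(payload_is_consistent(good, 192), payload_is_consistent(good[:-1], 192))
```

A receiver following the recommendation above would discard any
packet for which the check returns False.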
Normative References
[1] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
"RTP: A Transport Protocol for Real-Time Applications", RFC
3550, July 2003.
[2] Handley, M. and V. Jacobson, "SDP: Session Description
Protocol", RFC 2327, April 1998.
[3] Schulzrinne, H., "RTP Profile for Audio and Video Conferences
with Minimal Control", RFC 3551, July 2003.
[4] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels, BCP 14", RFC 2119, March 1997.
Informative References
[5] Kerr, P., "RTP Payload Format for Vorbis Encoded Audio", October
2003.
[6] Sjoberg, J., "Real-Time Transport Protocol (RTP) Payload Format
and File Storage Format for the Adaptive Multi-Rate (AMR) and
Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", RFC 3267,
June 2002.
[7] Baugher, M., Carrara, E., McGrew, D., Naslund, M. and K. Norrman,
"The Secure Real Time Transport Protocol", July 2003.
[8] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with the
Session Description Protocol (SDP)", RFC 3264, June 2002.
Authors' Addresses
Matthew Romaine
Sony Corporation, Japan
6-7-35 Kitashinagawa
Shinagawa-ku
Tokyo 141-0001
Japan
EMail: Matthew.Romaine@jp.sony.com
Mitsuyuki Hatanaka
Sony Corporation, Japan
6-7-35 Kitashinagawa
Shinagawa-ku
Tokyo 141-0001
Japan
EMail: hatanaka@av.crl.sony.co.jp
Jun Matsumoto
Sony Corporation, Japan
6-7-35 Kitashinagawa
Shinagawa-ku
Tokyo 141-0001
Japan
EMail: jun@av.crl.sony.co.jp
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; neither does it represent that it
has made any effort to identify any such rights. Information on the
IETF's procedures with respect to rights in standards-track and
standards-related documentation can be found in BCP-11. Copies of
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to
obtain a general license or permission for the use of such
proprietary rights by implementors or users of this specification can
be obtained from the IETF Secretariat.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice
this standard. Please address the information to the IETF Executive
Director.
Full Copyright Statement
Copyright (C) The Internet Society (2004). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assignees.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.