Audio Video Transport WG                                  Sassan Ahmadi
INTERNET-DRAFT                                               Nokia Inc.
Expires: April 18, 2005                                October 18, 2004



 Storage File Format for the Variable-Rate Multimode Wideband (VMR-WB)
                                Audio Codec
                  <draft-ietf-avt-vmr-wb-file-format-00.txt>



Status of this Memo


   By submitting this Internet-Draft, I certify that any applicable
   patent or other IPR claims of which I am aware have been disclosed,
   and any of which I become aware will be disclosed, in accordance
   with RFC 3668.


   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.


   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."


   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.txt


   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


   This document is a submission of the IETF AVT WG. Comments should
   be directed to the AVT WG mailing list, avt@ietf.org.



Abstract

   This document specifies a file format for the storage of
   variable-rate multimode wideband (VMR-WB) encoded speech data. A
   MIME type registration is included for VMR-WB files.


   VMR-WB is a variable-rate multimode wideband speech codec that has a
   number of operating modes, one of which is interoperable with the
   AMR-WB audio codec (see RFC 3267) at certain rates. Therefore,
   provisions have been made in this draft to facilitate retrieval of
   VMR-WB stored data (generated in the interoperable mode) by an
   AMR-WB decoder.



Table of Contents


1. Introduction................................................2
2. Conventions and Acronyms....................................2


Sassan Ahmadi                                                [page 1]


INTERNET-DRAFT              VMR-WB File Format              Oct. 2004


3. Overview of VMR-WB..........................................3
4. VMR-WB File Format..........................................4
   4.1. Single Channel Header..................................4
   4.2. Multi-Channel Header...................................4
   4.3. Speech Frames..........................................5
5. Security Considerations.....................................6
6. VMR-WB File Format MIME Registration........................7
7. IANA Considerations.........................................9
8. Acknowledgements............................................9
References.....................................................9
   Normative References........................................9
   Informative References......................................9
Author's Address...............................................9
IPR Notice....................................................10
Copyright Notice..............................................10



1. Introduction


   This document specifies a file format for storage of VMR-WB encoded
   speech/audio data. The VMR-WB file format supports single and
   multi-channel storage. It further facilitates decoding of VMR-WB
   generated files by an AMR-WB decoder [4].


   The file format is specified in Section 4. A MIME type registration
   for the VMR-WB file format is provided in Section 6.


   The VMR-WB RTP payload formats have been specified in a separate
   document [2].


   To ensure coherence with RFC YYYY [2], common tables and parameters
   are not redefined in this document; instead, the corresponding
   tables and parameters of [2] are referenced.



2. Conventions and Acronyms


   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
   in this document are to be interpreted as described in RFC 2119 [3].


   The following acronyms are used in this document:


    3GPP2  - The Third Generation Partnership Project 2
    CDMA   - Code Division Multiple Access
    AMR-WB - Adaptive Multi-Rate Wideband Codec
    VMR-WB - Variable-Rate Multimode Wideband Codec
    MIME   - Multipurpose Internet Mail Extensions

   The term "interoperable mode" in this document refers to VMR-WB
   mode 3, which is interoperable with AMR-WB codec modes 0, 1, and 2.


   The term "non-interoperable modes" in this document refers to
   VMR-WB modes 0, 1, and 2.


   The term "frame-block" is used in this document to describe the
   time-synchronized set of speech frames in an N-channel storage
   scenario. A frame-block will contain N speech frames, one from each
   of the channels, and all N speech frames represent exactly the same
   time period.



3. Overview of VMR-WB


   VMR-WB is the wideband speech-coding standard developed by the
   Third Generation Partnership Project 2 (3GPP2) for encoding/decoding
   wideband/narrowband speech content in multimedia services in 3G CDMA
   cellular systems [1,2]. It has a number of operating modes, where
   each mode is a tradeoff between voice quality and average data rate.


   While VMR-WB is a native CDMA codec complying with all CDMA system
   requirements, it is further interoperable with AMR-WB [4] at 12.65,
   8.85, and 6.60 kbps.


   VMR-WB by default is a wideband codec operating with 16000 Hz
   sampled media (i.e., speech or audio); however, it is further
   capable of processing 8000 Hz sampled media in all modes of
   operation [1]. The VMR-WB decoder does not require a priori
   knowledge about the sampling rate of the original media (i.e.,
   speech/audio signals sampled at 8 or 16 kHz) at the input of the
   encoder.


   By default, the VMR-WB decoder generates 16000 Hz wideband output
   regardless of the encoder input sampling frequency, unless
   instructed otherwise.



4. VMR-WB File Format


   The storage format is used for storing VMR-WB encoded speech
   frames in a file or as an e-mail attachment. Multiple channel
   content is also supported. The storage format described in this
   section is fully consistent with the one described in Section 8.5
   of [1].


   Note: The storage format described in this document uses several
   magic numbers to differentiate between the interoperable and
   non-interoperable modes of VMR-WB as well as between single and
   multi-channel files. Simpler and more straightforward ways of
   making this distinction exist and should be considered in the
   design of future storage formats. The use of different magic
   numbers and file extensions for the files generated by the
   interoperable and non-interoperable modes of VMR-WB enables a file
   reader to decide whether it is capable of decoding the content
   without opening the file or attempting to decode the content.


   In general, a VMR-WB file has the following structure:




   +------------------+
   | Header           |
   +------------------+
   | Speech frame 1   |
   +------------------+
   : ...              :
   +------------------+
   | Speech frame n   |
   +------------------+



4.1. Single Channel Header


   A single channel VMR-WB file header contains only a magic number.


   The magic number for single channel VMR-WB files containing
   speech data generated in the non-interoperable modes (i.e.,
   VMR-WB modes 0, 1, or 2) MUST consist of the ASCII character
   string


     "#!VMR-WB\n"
     (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x0a in hexadecimal).


   Note, the "\n" is an important part of the magic numbers and
   MUST be included in the comparison; otherwise, the single
   channel magic number above will become indistinguishable from
   that of the multi-channel file defined in the next section.
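

   As an illustration of the note above, the following Python sketch
   compares a file prefix against the four magic numbers defined in
   Sections 4.1 and 4.2, once without and once with the trailing "\n".
   The function names are hypothetical and are not part of this
   specification.

```python
# Magic numbers exactly as given in Sections 4.1 and 4.2 of this document.
MAGIC_SINGLE = b"#!VMR-WB\n"
MAGIC_SINGLE_INTEROP = b"#!VMR-WB_I\n"
MAGIC_MULTI = b"#!VMR-WB_MC1.0\n"
MAGIC_MULTI_INTEROP = b"#!VMR-WB_MCI1.0\n"

ALL_MAGICS = (MAGIC_SINGLE, MAGIC_SINGLE_INTEROP,
              MAGIC_MULTI, MAGIC_MULTI_INTEROP)


def matches_without_newline(data: bytes) -> list:
    """Return each magic number whose text, minus the newline, prefixes data."""
    return [m for m in ALL_MAGICS if data.startswith(m[:-1])]


def matches_with_newline(data: bytes) -> list:
    """Return each magic number that prefixes data, newline included."""
    return [m for m in ALL_MAGICS if data.startswith(m)]
```

   For a file that starts with the multi-channel magic number, the
   comparison without the newline reports both the single channel and
   the multi-channel magic numbers, whereas the comparison with the
   newline reports only the multi-channel one.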


   The magic number for single channel VMR-WB files containing
   speech data generated in the interoperable mode (i.e., VMR-WB
   mode 3) MUST consist of the ASCII character string


     "#!VMR-WB_I\n"
     (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x49 0x0a in
      hexadecimal).


   In the interoperable mode, a file generated by VMR-WB is decodable
   by an AMR-WB decoder (apart from the different magic numbers).
   However, a VMR-WB decoder can only decode AMR-WB codec modes 0, 1,
   and 2.


   The AMR-WB single channel magic number and AMR-WB file extension
   [4] can also be used to store speech data generated by a VMR-WB
   encoder operating in the interoperable mode to facilitate decoding
   of the file by an AMR-WB decoder. Since the VMR-WB decoder is only
   capable of decoding certain AMR-WB codec modes, it MUST be ensured
   that only supported codec modes of AMR-WB are presented to the
   VMR-WB decoder.
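

   One possible way to enforce the last requirement is to check the
   set of AMR-WB codec modes found in a file before handing it to a
   VMR-WB decoder. The Python sketch below assumes that the caller has
   already extracted the codec mode indices from the stored frames;
   the helper name is hypothetical.

```python
# AMR-WB codec modes that a VMR-WB decoder can decode, per Section 2
# of this document (the interoperable mode covers modes 0, 1, and 2).
VMR_WB_DECODABLE_AMR_WB_MODES = frozenset({0, 1, 2})


def unsupported_amr_wb_modes(modes) -> list:
    """Return the sorted AMR-WB codec modes that VMR-WB cannot decode.

    `modes` is any iterable of AMR-WB codec mode indices. An empty
    result means that every listed mode is decodable by VMR-WB.
    """
    return sorted(set(modes) - VMR_WB_DECODABLE_AMR_WB_MODES)
```

   A non-empty result indicates that the content must not be presented
   to the VMR-WB decoder as is.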



4.2. Multi-Channel Header


   The multi-channel header consists of a magic number followed
   by a 32-bit channel description field, giving the multi-channel
   header the following structure:


   +----------------------------+
   |        Magic Number        |
   +----------------------------+
   | Channel Description Field  |
   +----------------------------+


   The magic number for multi-channel VMR-WB files containing
   speech data generated in the non-interoperable modes (i.e.,
   VMR-WB modes 0, 1, or 2) MUST consist of the ASCII character string


     "#!VMR-WB_MC1.0\n"
     (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x4D 0x43 0x31
      0x2E 0x30 0x0a in hexadecimal).


   The version number in the magic numbers refers to the version
   of the file format.


   The magic number for multi-channel VMR-WB files containing
   speech data generated in the interoperable mode (i.e., VMR-WB
   mode 3) MUST consist of the ASCII character string


     "#!VMR-WB_MCI1.0\n"
     (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x4D 0x43 0x49
      0x31 0x2E 0x30 0x0a in hexadecimal).
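

   With all four magic numbers defined, a reader can classify a file
   from its first octets. The following Python sketch is one way to do
   so; the type and function names are hypothetical, and files that
   use the AMR-WB magic numbers of [4] are deliberately reported as
   unknown.

```python
from typing import NamedTuple, Optional


class HeaderKind(NamedTuple):
    multi_channel: bool   # True for the "_MC" and "_MCI" magic numbers
    interoperable: bool   # True for VMR-WB mode 3 content
    magic_length: int     # number of octets occupied by the magic number


# Magic numbers from Sections 4.1 and 4.2, trailing newline included.
_MAGICS = {
    b"#!VMR-WB\n": (False, False),
    b"#!VMR-WB_I\n": (False, True),
    b"#!VMR-WB_MC1.0\n": (True, False),
    b"#!VMR-WB_MCI1.0\n": (True, True),
}


def classify_magic(data: bytes) -> Optional[HeaderKind]:
    """Classify the start of a file, or return None if no magic matches."""
    for magic, (multi, interop) in _MAGICS.items():
        # The comparison includes the trailing newline, as required.
        if data.startswith(magic):
            return HeaderKind(multi, interop, len(magic))
    return None
```

   The returned magic_length tells the reader where the channel
   description field (multi-channel files) or the first speech frame
   (single channel files) begins.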


   The 32-bit channel description field is defined as


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Reserved bits                                    | CHAN  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   Reserved bits: MUST be set to 0 when written, and a reader
   MUST ignore them.


   CHAN (4-bit unsigned integer): Indicates the number of audio
   channels contained in this storage file. The valid values and
   the order of the channels within a frame-block are specified
   in Section 4.1 of [5].
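

   The channel description field can be written and read as sketched
   below in Python. The sketch assumes that the 32-bit field is stored
   in network byte order (most significant octet first), as suggested
   by the bit numbering of the diagram above; this document does not
   state the byte order explicitly. The function names are
   hypothetical, and the check against the valid channel counts of [5]
   is left to the caller.

```python
import struct


def pack_channel_description(channels: int) -> bytes:
    """Build the 4-octet field: 28 reserved zero bits, then CHAN."""
    if not 0 <= channels <= 0x0F:
        raise ValueError("CHAN is a 4-bit unsigned integer")
    # Assumption: network byte order, so CHAN lands in the last octet.
    return struct.pack(">I", channels)


def unpack_channel_description(field: bytes) -> int:
    """Return CHAN, ignoring the reserved bits as a reader MUST."""
    if len(field) != 4:
        raise ValueError("channel description field is 4 octets long")
    (value,) = struct.unpack(">I", field)
    return value & 0x0F
```

   Masking with 0x0F on the read side means that a file with non-zero
   reserved bits is still parsed, in line with the rule that a reader
   ignores them.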


   The AMR-WB multi-channel magic number and AMR-WB file extension [4]
   can also be used to store speech data generated by a VMR-WB encoder
   operating in the interoperable mode to facilitate decoding of the
   file by an AMR-WB decoder. Since the VMR-WB decoder is only capable
   of decoding certain AMR-WB codec modes, it MUST be ensured that only
   supported codec modes of AMR-WB are presented to the VMR-WB decoder.



4.3. Speech Frames





   After the file header, speech frame-blocks consecutive in time are
   stored in the file. Each frame-block contains as many octet-aligned
   speech frames as there are channels, stored in increasing channel
   order, starting with channel 1. Each stored speech frame starts
   with a one-octet frame header with the following format:


    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |P|  FT   |Q|P|P|
   +-+-+-+-+-+-+-+-+


   The FT field (4 bits) is defined in Table 3 of [2]. The padding
   bits (P) MUST be set to zero and MUST be ignored by a receiver.


   Q (1 bit): Frame quality indicator. If set to 0, indicates that
   the corresponding frame is corrupted. The VMR-WB encoder
   always sets the Q bit to 1. The VMR-WB decoder may ignore the Q bit.
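

   The frame header layout above maps onto simple shift and mask
   operations. The following Python sketch is illustrative only; the
   names are hypothetical. Bit 0 of the diagram is the most
   significant bit of the octet, so FT occupies bits 1 to 4 and Q
   occupies bit 5.

```python
from typing import NamedTuple


class FrameHeader(NamedTuple):
    ft: int  # 4-bit frame type, see Table 3 of [2]
    q: int   # 1-bit frame quality indicator


def parse_frame_header(octet: int) -> FrameHeader:
    """Extract FT and Q from a header octet, ignoring the padding bits."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("frame header is a single octet")
    return FrameHeader(ft=(octet >> 3) & 0x0F, q=(octet >> 2) & 0x01)


def build_frame_header(ft: int, q: int = 1) -> int:
    """Build a header octet with all padding bits set to zero."""
    if not 0 <= ft <= 0x0F:
        raise ValueError("FT is a 4-bit field")
    if q not in (0, 1):
        raise ValueError("Q is a 1-bit field")
    return (ft << 3) | (q << 2)
```

   For example, FT=4 with Q=1 yields the octet 0x24, and parsing 0x24
   returns FT=4 and Q=1.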


   Following this one octet header, the speech bits are placed
   as defined in Section 6.3.4 of [2]. The last octet of each frame is
   padded with zeroes, if needed, to achieve octet alignment.


   The following example shows a VMR-WB speech frame encoded at
   Half-Rate (with 124 speech bits) in the storage format.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| FT=4  |1|0|0|                                               |
   +-+-+-+-+-+-+-+-+                                               +
   |                                                               |
   +          Speech bits for frame-block n, channel k             +
   |                                                               |
   +                                                               +
   |                                                               |
   +       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       |0|0|0|0|
   +-+-+-+-+-+-+-+-+
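

   The size of the stored frame in the example above follows from the
   number of speech bits: one header octet plus the speech bits
   rounded up to a whole number of octets. A minimal Python sketch of
   this arithmetic, with hypothetical function names, is:

```python
def stored_frame_octets(speech_bits: int) -> int:
    """Octets used by one stored frame: header octet plus padded speech bits."""
    if speech_bits < 0:
        raise ValueError("speech_bits must be non-negative")
    return 1 + (speech_bits + 7) // 8


def padding_bits(speech_bits: int) -> int:
    """Zero bits appended to the last octet to reach octet alignment."""
    if speech_bits < 0:
        raise ValueError("speech_bits must be non-negative")
    return (-speech_bits) % 8
```

   For the Half-Rate frame shown above, 124 speech bits give 17 stored
   octets with 4 padding bits, which matches the four trailing zero
   bits in the diagram.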


   Frame-blocks or speech frames lost in transmission MUST be stored as
   Erasure/SPEECH_LOST (FT=14) and non-received frame-blocks between
   SID updates during non-speech periods (when using DTX) MUST be
   stored as Blank/NO_DATA frames (FT=15) in complete frame-blocks to
   maintain synchronization with the original media.
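

   The rule above can be sketched in Python as follows. The sketch
   rests on two assumptions that are not stated in this document:
   that frames with FT=14 and FT=15 carry no speech bits, so that the
   stored frame consists of the header octet alone (see Table 3 of [2]
   for the authoritative frame sizes), and that the Q bit of such
   filler frames is set to 1. All names are hypothetical.

```python
from typing import Optional, Sequence

FT_ERASURE = 14  # Erasure/SPEECH_LOST
FT_BLANK = 15    # Blank/NO_DATA


def header_only_frame(ft: int, q: int = 1) -> bytes:
    """Build a stored frame that consists of the header octet only.

    Assumption: the given frame type carries zero speech bits.
    """
    return bytes([(ft << 3) | (q << 2)])


def store_frame_block(frames: Sequence[Optional[bytes]],
                      in_dtx: bool = False) -> bytes:
    """Serialize one frame-block, one entry per channel in channel order.

    A missing frame (None) is replaced by a Blank/NO_DATA frame during
    a non-speech period (in_dtx=True) and by an Erasure/SPEECH_LOST
    frame otherwise, so that every frame-block stays complete.
    """
    filler = header_only_frame(FT_BLANK if in_dtx else FT_ERASURE)
    return b"".join(f if f is not None else filler for f in frames)
```

   Because every channel contributes exactly one stored frame to each
   frame-block, the file keeps its time alignment with the original
   media even when frames were lost.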



5. Security Considerations


   This document specifies a file format only, not a streaming protocol
   payload format, nor a transfer method.  As such, it introduces no
   security risks in addition to those associated with any audio codec
   or media file format (e.g., denial of service by transmitting a
   file larger than the receiver can handle).  Note that those security
   concerns should be understood before using the file format specified
   here. Clearly it is possible to author malicious files in order to
   attack a receiver. However, clients can and usually do protect
   themselves against this kind of attack.


   There is currently no provision in the standards for encryption,
   signing, or authentication of this file format. However, depending
   on the application, external mechanisms can be used to provide
   privacy, authentication, and protection against unauthorized
   use or distribution of the media.



6. VMR-WB File Format MIME Registration


   This section defines the parameters that may be used to select
   optional features in the VMR-WB storage format.


   The parameters are defined here as part of the MIME subtype
   registration for the VMR-WB file format.


   The MIME subtype for the Variable-Rate Multimode Wideband
   (VMR-WB) audio codec is allocated from the IETF tree. This MIME
   registration covers non-real-time transfers via stored files.


   Note, the receiver MUST ignore any unspecified parameter and
   use the default values instead.


     Media Type name:      audio


     Media subtype name:   VMR-WB-FILE


     Required parameters:  none


   Note that if no input parameters are defined, the default values
   will be used.


   OPTIONAL file format parameters:


     mode-set:       see RFC YYYY [2]


     channels:       see RFC YYYY [2]


   Encoding considerations:


          This type is defined for transfer of VMR-WB data using the
          file format specified in Section 4 of RFC XXXX. The stored
          file format is binary data and must be encoded for non-binary
          transport; the Base64 encoding is suitable in many cases.


   Security considerations:


          See Section 5 of this document.




   Public specification:


          The VMR-WB speech codec is specified in 3GPP2
          specification C.S0052-0 version 1.0 [1].
          The file format is specified in RFC XXXX.
          Transfer methods are specified in RFC YYYY.


   Additional information:


   Magic numbers:


          Single channel (for the non-interoperable modes)
          ASCII character string "#!VMR-WB\n"
          (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x0a in
           hexadecimal)


          Single channel (for the interoperable mode)
          ASCII character string "#!VMR-WB_I\n"
          (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x49 0x0a
           in hexadecimal)


          Multi-channel (for the non-interoperable modes)
          ASCII character string "#!VMR-WB_MC1.0\n"
          (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x4D 0x43
           0x31 0x2E 0x30 0x0a in hexadecimal)


          Multi-channel (for the interoperable mode)
          ASCII character string "#!VMR-WB_MCI1.0\n"
          (or 0x23 0x21 0x56 0x4d 0x52 0x2d 0x57 0x42 0x5F 0x4D 0x43
           0x49 0x31 0x2E 0x30 0x0a in hexadecimal)


   File extensions for the non-interoperable modes: vmr, VMR
                     Macintosh file type code: none
                     Object identifier or OID: none


   File extensions for the interoperable mode: vmi, VMI
                     Macintosh file type code: none
                     Object identifier or OID: none


   Person & email address to contact for further information:


                 Sassan Ahmadi, Ph.D.   Nokia Inc. USA
                 sassan.ahmadi@nokia.com


   Intended usage: COMMON.


     This file format is expected to be widely used in Internet email
     user agents, multimedia authoring and playing software, and
     CDMA2000 mobile terminals.


   Author/Change controller:


   IETF Audio/Video Transport working group delegated from the IESG
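

   The encoding considerations in the registration above note that the
   binary file must be encoded for non-binary transport and that
   Base64 is suitable in many cases. A minimal Python sketch using the
   standard base64 module is given below; the function names are
   hypothetical.

```python
import base64


def encode_for_text_transport(file_bytes: bytes) -> str:
    """Base64-encode the binary contents of a VMR-WB file."""
    return base64.b64encode(file_bytes).decode("ascii")


def decode_from_text_transport(text: str) -> bytes:
    """Recover the binary file; reject characters outside the alphabet."""
    return base64.b64decode(text, validate=True)
```

   For instance, the single channel magic number "#!VMR-WB\n" alone
   encodes to "IyFWTVItV0IK".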




7. IANA Considerations

   It is requested that one new MIME subtype (audio/VMR-WB-FILE) be
   registered by IANA; see Section 6.



8. Acknowledgements


   The author would like to thank Redwan Salami of VoiceAge
   Corporation, Ari Lakaniemi of Nokia Inc., and IETF/AVT chairs Colin
   Perkins and Magnus Westerlund for their technical comments
   to improve this document.


   Also, the author would like to acknowledge that some parts of
   RFC 3267 [4] have been used in this document.



References


Normative References


   [1]  3GPP2 C.S0052-0 v1.0 "Source-Controlled Variable-Rate
        Multimode Wideband Speech Codec (VMR-WB) Service Option
        62 for Spread Spectrum Systems", 3GPP2 Technical Specification,
        July 2004.


   [2]  S. Ahmadi, "Real-Time Transport Protocol (RTP) Payload Formats
        for the Variable-Rate Multimode Wideband (VMR-WB) Audio Codec",
        RFC YYYY, Internet Engineering Task Force, Dec. 2004.


   [3]  S. Bradner, "Key words for use in RFCs to Indicate
        Requirement Levels", BCP 14, RFC 2119, Internet Engineering
        Task Force, March 1997.


   [4]  J. Sjoberg, et al., "Real-Time Transport Protocol (RTP)
        Payload Format and File Storage Format for the Adaptive
        Multi-Rate (AMR) and Adaptive Multi-Rate Wideband
        (AMR-WB) Audio Codecs", RFC 3267, Internet
        Engineering Task Force, June 2002.



Informative References


   [5]  H. Schulzrinne and S. Casner, "RTP Profile for Audio and Video
        Conferences with Minimal Control", STD 65, RFC 3551, Internet
        Engineering Task Force, July 2003.


   Any 3GPP2 document can be downloaded from the 3GPP2 web
   server, "http://www.3gpp2.org/", see specifications.



Author's Address





   The editor will serve as the point of contact for all
   technical matters related to this document.


    Dr. Sassan Ahmadi             Phone: 1 (858) 831-5916
                                  Fax:   1 (858) 831-4174
    Nokia Inc.                    Email: sassan.ahmadi@nokia.com
    12278 Scripps Summit Dr.
    San Diego, CA 92131 USA


    This Internet-Draft expires in six months from October 18, 2004.



RFC Editor Considerations


    The RFC editor is requested to replace all occurrences of XXXX with
    the RFC number that this document will receive.


    The RFC editor is also requested to replace all occurrences of YYYY
    with the RFC number that [2] will receive.


IPR Notice


    The IETF takes no position regarding the validity or scope of any
    Intellectual Property Rights or other rights that might be claimed
    to pertain to the implementation or use of the technology described
    in this document or the extent to which any license under such
    rights might or might not be available; nor does it represent that
    it has made any independent effort to identify any such rights.
    Information on the procedures with respect to rights in RFC
    documents can be found in BCP 78 and BCP 79.


    Copies of IPR disclosures made to the IETF Secretariat and any
    assurances of licenses to be made available, or the result of an
    attempt made to obtain a general license or permission for the use
    of such proprietary rights by implementers or users of this
    specification can be obtained from the IETF on-line IPR repository
    at http://www.ietf.org/ipr.


    The IETF invites any interested party to bring to its attention any
    copyrights, patents or patent applications, or other proprietary
    rights that may cover technology that may be required to implement
    this standard. Please address the information to the IETF at
    ietf-ipr@ietf.org.



Copyright Notice


    Copyright (C) The Internet Society (2004).  This document is
    subject to the rights, licenses and restrictions contained in BCP
    78, and except as set forth therein, the authors retain all their
    rights.


    This document and the information contained herein are provided on
    an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND
    THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
    THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR
    ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
    PARTICULAR PURPOSE.