INTERNET-DRAFT                                             Eric A. Hall
  Document: draft-hall-mime-app-mbox-00.txt                      May 2004
  Expires: December, 2004
  Category: Standards Track
  
  
                       The APPLICATION/MBOX Media-Type
  
  
     Status of this Memo
  
     This document is an Internet-Draft and is in full conformance with
     all provisions of Section 10 of RFC 2026.
  
     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups. Note that
     other groups may also distribute working documents as Internet-
     Drafts.
  
     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time. It is inappropriate to use Internet-Drafts
     as reference material or to cite them other than as "work in
     progress."
  
     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt
  
     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.
  
  
     Copyright Notice
  
     Copyright (C) The Internet Society (2004).  All Rights Reserved.
  
  
     Abstract
  
     This document requests that the application/MBOX media-type be
     authorized for allocation by IANA, according to the terms
     specified in RFC 2048 [RFC2048].
  
  
  
  Internet Draft     draft-hall-mime-app-mbox-00.txt          May 2004
  
  
  
  1.      Background and Overview
  
     UNIX and look-alike operating systems have historically made
     extensive use of "MBOX" mailbox files for a variety of messaging
     purposes. In the common case, these files are used to hold
     collections of electronic mail messages which users manipulate as
     "folders" of a private mail-store. These files are also frequently
     used by a variety of back-end email services, including delivery
     servers, filtering systems, and mailing-list programs. Over the
     last few years, the use of these files has also spread to other
     operating systems, with a variety of messaging tools on numerous
     platforms now providing direct access to MBOX files.
  
     The increased pervasiveness of these files has led to an increased
     demand for improvements in cross-system, network-wide interchange
     of these files. In turn, this requirement also dictates a need for
     a media-type definition for MBOX files in general.
  
     For example, some applications allow users to open MBOX files as
     discrete data-objects, but use platform- or product-specific
     mapping techniques to identify these files. Similarly, many
     mailing list archive programs provide access to MBOX files for
     historical messages, but publish these files as text/plain or
     some other generic media-type. Generic types of this kind can
     cause problematic end-of-line conversions when the files are
     transferred across a network, and do not provide for local
     actions that should be performed against the data (such as
     prompting the user to import the mailbox data into a local
     mail-store). The definition of a standard media-type for these
     files would facilitate more consistent behavior for these types
     of actions, and would further the cause of interoperability.
  
     Note that this specification does not define the MBOX data file as
     an authoritative Internet data-type or structure. Instead, it
     merely seeks to define a standard media-type definition for these
     files, so that their transfer may be more consistent.
  
  2.      Prerequisites and Terminology
  
     Readers of this document are expected to be familiar with the
     specification for MIME registrations (RFC 2048).
  
     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
     NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
     in this document are to be interpreted as described in RFC 2119
     [RFC2119].
  
  3.      The APPLICATION/MBOX Media-Type Registration Request
  
     This section provides the registration request, as per RFC 2048,
     which will be submitted to IANA after IESG approval.
  
     MIME media type name: application
  
     MIME subtype name: MBOX
  
     Required parameters: none
  
     Optional parameters: none
  
     Encoding considerations: MBOX data typically consists of seven-bit
     ASCII characters in an eight-bit file stream. The data often
     contains subordinate data which was previously encoded in order to
     fit within the seven-bit character space. If the content must be
     further encoded in order to satisfy transfer restrictions, quoted-
     printable is generally encouraged, since it is likely to introduce
     the least amount of overhead.
  
     Security considerations: MBOX data is passive, and does not
     generally represent a unique or new security threat. However,
     there is some risk in sharing any kind of data, in that
     unintentional information may be exposed, and that risk applies to
     MBOX data as well.
  
     Interoperability considerations: The MBOX file format has a long
     and rich history on UNIX and UNIX-like platforms. It is also used
     with many messaging products on non-UNIX platforms, and is
     commonly used for intermediary purposes, such as mailing list
     archives, conversion formats for private mail-stores, and other
     messaging-related purposes.
  
     The canonical MBOX file format depends on the use of ASCII Line
     Feed as the end-of-line character, and this usage is typically
     followed on non-UNIX platforms. Text-based transfer protocols will
     sometimes make the mistake of converting the Line Feed end-of-line
     marker into some other sequence that is assumed to be more
     appropriate for the destination system, although this is often
     harmful. In all cases, the application/MBOX data MUST be
     transferred as an opaque eight-bit file stream, with no end-of-
     line conversion being performed by the transfer protocol.
  
     Published specification: see Appendix A.
  
     Applications which use this media type: scores of messaging
     products make use of the MBOX file format.
  
     Magic number(s): no standard
  
     File extension(s): MBOX files sometimes have a ".mbox" extension,
     but this is neither required nor reasonably expected.
  
     Macintosh File Type Code(s): no standard
  
     Person & email address to contact for further information: Eric A.
     Hall (ehall@ntrg.com)
  
     Intended usage: COMMON
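
     As a non-normative illustration of the transfer requirement in
     the interoperability considerations, the following Python sketch
     wraps MBOX data in a MIME entity labeled application/mbox and
     checks that the bytes survive a round trip. It assumes Python's
     standard "email" package. The function names, the sample data,
     and the "archive.mbox" filename are hypothetical. The sketch uses
     base64 (the default for this class) rather than the encoding
     encouraged above, only because base64 keeps every Line Feed
     intact even when the encoded text has its own line endings
     rewritten in transit.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication


def wrap_mbox(raw: bytes, filename: str = "archive.mbox") -> bytes:
    """Wrap raw MBOX bytes in a MIME entity of type application/mbox."""
    # MIMEApplication applies base64 by default, so the payload is opaque.
    part = MIMEApplication(raw, _subtype="mbox")
    part.add_header("Content-Disposition", "attachment", filename=filename)
    return part.as_bytes()


def unwrap_mbox(wire: bytes) -> bytes:
    """Recover the original MBOX bytes from the MIME entity."""
    part = message_from_bytes(wire)
    return part.get_payload(decode=True)


# Hypothetical sample data: one message using Line Feed line endings.
sample = (
    b"From alice@example.com Thu May  6 10:00:00 2004\n"
    b"Subject: hello\n"
    b"\n"
    b"body text\n"
)

wire = wrap_mbox(sample)
assert unwrap_mbox(wire) == sample

# Simulate a transport that rewrites the entity's own line endings.
assert unwrap_mbox(wire.replace(b"\n", b"\r\n")) == sample
```

     Reading and writing the underlying file in binary mode (for
     example, open(path, "rb") in Python) avoids a second source of
     end-of-line conversion on the local system.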
  
  4.      Security Considerations
  
     See the discussion in section 3.
  
  5.      IANA Considerations
  
     After any IESG approval which may be forthcoming, IANA would be
     expected to register the application/MBOX media-type, using the
     application provided in section 3 above.
  
  6.      Normative References
  
          [RFC2048]     Freed, N., Klensin, J., Postel, J.,
                         "Multipurpose Internet Mail Extensions (MIME)
                         Part Four: Registration Procedures", BCP 13,
                         RFC 2048, November 1996.
  
          [RFC2119]     Bradner, S., "Key words for use in RFCs to
                         Indicate Requirement Levels", BCP 14, RFC
                         2119, March 1997.

          [RFC2822]     Resnick, P., "Internet Message Format", RFC
                         2822, April 2001.
  
  Appendix A.    The Common MBOX Format
  
     The MBOX file format is not documented by any authoritative
     source, but instead only exists as commonly-understood output from
     historical messaging tools. Partly due to the lack of
     authoritative documentation, the MBOX file format has been adapted
     and mutated by various utilities over the years, and does not
     exist in a form which is syntactically precise.
  
     MBOX files almost always use the Line Feed character (0x0A) as the
     end-of-line marker. MBOX files usually contain seven-bit character
     data, but eight-bit data is not uncommon.
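
     A receiver might check these byte-level properties before
     processing a file. The following Python sketch is illustrative
     only; the function name and the returned keys are hypothetical.

```python
def inspect_mbox_bytes(raw: bytes) -> dict:
    """Report basic byte-level properties of MBOX data."""
    return {
        # CR LF pairs may indicate that a text-mode transfer rewrote
        # the line endings, although some files use them natively.
        "has_crlf": b"\r\n" in raw,
        # True when every byte is below 0x80 (seven-bit data).
        "seven_bit": raw.isascii(),
    }


print(inspect_mbox_bytes(b"From a@example.com Thu May  6 10:00:00 2004\n\nhi\n"))
```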
  
     MBOX files typically contain a sequence of messages, each of which
     begins with a "From_" line, and each of which is further separated
     from its neighboring messages by an empty line that precedes the
     next "From_" line. This means that the first message in an MBOX
     file will immediately begin with a "From_" line, while every other
     message will begin with a "From_" line that is immediately
     preceded by a Line Feed character.
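
     The layout described above can be sketched as a simple splitter.
     This Python example is a minimal illustration under stated
     assumptions, not a complete parser: the function name is
     hypothetical, a separator is recognized only when a line starting
     with "From " follows an empty line (or begins the file), and an
     unescaped body line of that shape would be mistaken for a
     separator.

```python
from typing import List


def split_mbox(raw: bytes) -> List[bytes]:
    """Split raw MBOX bytes into messages at "From " separator lines."""
    messages: List[bytes] = []
    current: List[bytes] = []
    prev_blank = True  # the start of the file counts as a boundary
    for line in raw.splitlines(keepends=True):
        if line.startswith(b"From ") and prev_blank and current:
            messages.append(b"".join(current))
            current = []
        current.append(line)
        prev_blank = line in (b"\n", b"\r\n")
    if current:
        messages.append(b"".join(current))
    return messages


# Hypothetical sample data containing two messages.
sample = (
    b"From a@example.com Thu May  6 10:00:00 2004\n"
    b"Subject: one\n\nfirst\n"
    b"\n"
    b"From b@example.com Thu May  6 11:00:00 2004\n"
    b"Subject: two\n\nsecond\n"
)
parts = split_mbox(sample)
print(len(parts))
```

     The separating empty line is left attached to the preceding
     message, so joining the returned pieces reproduces the input
     bytes exactly.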
  
     The structure of the "From_" line varies somewhat, but it almost
     always contains the exact character sequence of "From", followed by
     whitespace, followed by an email address of some kind, followed by
     more whitespace, and terminated by a timestamp sequence of some
     kind. Note that the email address may use any of the forms which
     have been used throughout history, and the timestamp sequences can
     also vary according to system preferences. In most cases, the
     timestamp is followed by an end-of-line signal, but some messaging
     systems have also been known to append additional information
     after the timestamp.
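
     A lenient reader can therefore treat the line as three fields and
     avoid assumptions about the address or timestamp syntax. The
     following Python sketch shows one such approach; the FromLine
     type and the function name are hypothetical, and anything after
     the address (the timestamp plus any trailing information) is
     returned uninterpreted.

```python
from typing import NamedTuple, Optional


class FromLine(NamedTuple):
    address: str  # whatever token follows "From"; not validated
    rest: str     # timestamp and any trailing data, left as-is


def parse_from_line(line: bytes) -> Optional[FromLine]:
    """Split a "From " separator line into address and remainder."""
    text = line.decode("ascii", errors="replace").rstrip()
    # Split on runs of whitespace, at most twice.
    parts = text.split(None, 2)
    if len(parts) < 3 or parts[0] != "From":
        return None
    return FromLine(address=parts[1], rest=parts[2])


print(parse_from_line(b"From alice@example.com Thu May  6 10:00:00 2004\n"))
```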
  
     The exact format of the "From_" line in use with a particular MBOX
     file can often be determined by examining the first line of the
     file itself, which will be a "From_" line, and which is easy to
     locate, although implementers are cautioned that multiple MBOX
     files may have been joined together, or a single file may have
     been accessed by multiple clients, resulting in different "From_"
     line formats being used within a single file.
  
     As a result of these variations, implementers are strongly
     encouraged to fully apply the robustness principle to any MBOX
     files which are transferred across system lines. In particular,
     the email address and timestamp sequences are strongly encouraged
     to conform with the ABNF syntax rules for the Address and Date-
     Time sequences described in RFC 2822 [RFC2822], although
     recipients MUST be prepared to receive less-structured sequences.
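
     As a hedged illustration of this advice, the following Python
     sketch writes a "From_" line whose timestamp follows the RFC 2822
     date-time syntax, and reads timestamps without failing on other
     forms. It relies on the standard email.utils module; the function
     names and the sample address are hypothetical, and the address is
     written as given without validation.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def make_from_line(address: str, when: datetime) -> bytes:
    """Build a "From " separator line with an RFC 2822 style date."""
    return f"From {address} {format_datetime(when)}\n".encode("ascii")


def parse_timestamp(text: str) -> Optional[datetime]:
    """Return a datetime, or None when the text cannot be parsed."""
    try:
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Be liberal in what is accepted: keep going without a date.
        return None


when = datetime(2004, 5, 6, 10, 0, 0, tzinfo=timezone.utc)
line = make_from_line("alice@example.com", when)
print(line)
```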
  
     Many implementations are also known to escape body lines beginning
     with "From ", using a leading Greater Than symbol (0x3E) to break
     the pattern matching. This is so that excessively-liberal parsers
     do not misinterpret these lines as new "From_" lines. However,
     other implementations are known not to escape such lines unless
     they also appear to contain an email address and a timestamp,
     while other implementations are known to perform secondary escapes
     against text which is already escaped or quoted. This issue does
     not generally affect the transport of MBOX files and is therefore
     beyond the scope of this document, but implementations should be
     aware of these considerations.
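
     The simplest of the escaping behaviors described above can be
     sketched as follows. This Python example is illustrative only and
     does not represent any particular implementation; the function
     names are hypothetical. It also shows why the variations matter:
     a body line that originally began with ">From " cannot be told
     apart from an escaped line when it is read back.

```python
def escape_body_line(line: bytes) -> bytes:
    """Prefix ">" to a body line that begins with "From "."""
    if line.startswith(b"From "):
        return b">" + line
    return line


def unescape_body_line(line: bytes) -> bytes:
    """Remove one leading ">" from a line that begins with ">From "."""
    if line.startswith(b">From "):
        return line[1:]
    return line


original = b"From here on, things get harder.\n"
stored = escape_body_line(original)
print(stored)
assert unescape_body_line(stored) == original

# Ambiguity: this line was never escaped, yet it is altered on read.
quoted = b">From a quoted reply\n"
assert unescape_body_line(escape_body_line(quoted)) != quoted
```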
  
  
  Acknowledgments
  
     Funding for the RFC editor function is currently provided by the
     Internet Society.
  
  
  Authors' Addresses
  
     Eric A. Hall
     ehall@ehsco.com
  
  
  Full Copyright Statement
  
     Copyright (C) The Internet Society (2004). All Rights Reserved.
  
     This document and translations of it may be copied and furnished
     to others, and derivative works that comment on or otherwise
     explain it or assist in its implementation may be prepared,
     copied, published and distributed, in whole or in part, without
     restriction of any kind, provided that the above copyright notice
     and this paragraph are included on all such copies and derivative
     works. However, this document itself may not be modified in any
     way, such as by removing the copyright notice or references to the
     Internet Society or other Internet organizations, except as needed
     for the purpose of developing Internet standards in which case the
     procedures for copyrights defined in the Internet Standards
     process must be followed, or as required to translate it into
     languages other than English.
  
     The limited permissions granted above are perpetual and will not
     be revoked by the Internet Society or its successors or assigns.
  
     This document and the information contained herein is provided on
     an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
     ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
     IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
  
  Hall                  I-D Expires: December 2004             [page 6]