INTERNET-DRAFT                                             Eric A. Hall
  Document: draft-hall-mime-app-mbox-03.doc                  January 2005
  Expires: July, 2005
  Category: Standards-Track
  
  
                       The APPLICATION/MBOX Media-Type
  
     Status of this Memo
  
     By submitting this Internet-Draft, I certify that any applicable
     patent or other IPR claims of which I am aware have been
     disclosed, and any of which I become aware will be disclosed, in
     accordance with RFC 3668.
  
     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups. Note that
     other groups may also distribute working documents as Internet-
     Drafts.
  
     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time. It is inappropriate to use Internet-Drafts
     as reference material or to cite them other than as "work in
     progress."
  
     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt.
  
     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.
  
  
     Copyright Notice
  
     Copyright (C) The Internet Society (2004). All Rights Reserved.
  
  
     Abstract
  
     This memo requests that the application/mbox media-type be
     authorized for allocation by the IESG, according to the terms
     specified in RFC 2048 [RFC2048]. This memo also defines a default
     format for the mbox database, which must be supported by all
     conformant implementations.
  
  
  
  Internet Draft     draft-hall-mime-app-mbox-03.doc      January 2005
  
  
  
  1.      Background and Overview
  
     UNIX-like operating systems have historically made widespread use
     of "mbox" database files for a variety of local email purposes. In
     the common case, mbox files store linear sequences of one or more
     electronic mail messages, with local email clients treating the
     database as a kind of "folder" of email messages. mbox databases
     are also commonly used by a variety of other messaging tools, such
     as mailing list programs, archival and filtering utilities, email
     servers, and other applications. In recent years, mbox databases
     have also become common on a large number of non-UNIX computing
     platforms, for similar kinds of purposes.
  
     The increased pervasiveness of these files has led to an increased
     demand for a standardized, network-wide interchange of these files
     as discrete database objects. In turn, this dictates a need for a
     media-type definition for mbox files in general, which is the
     subject and purpose of this memo.
  
  2.      About the mbox Database
  
     The mbox database format is not documented in an authoritative
     specification, but instead exists as a well-known output format,
     or is only anecdotally documented, or is only authoritatively
     documented for a specific platform or tool. As a result, the mbox
     database has been adapted and mutated by various systems and
     utilities over the years, and does not exist in a singular form
     across all messaging platforms.
  
     In general, mbox files typically contain a linear sequence of
     electronic mail messages. Each message begins with a separator
     line that identifies the message sender, and also identifies the
     date and time at which the message was received by the final
     recipient (either the last-hop system in the transfer path, or the
     system which serves as the recipient's mailstore). The end of the
     database is usually recognizable by the absence of any additional
     data, or by the presence of an end-of-file marker.
  
     The structure of the separator line varies somewhat across
     implementations, but it almost always contains the exact character
     sequence of "From", followed by a single space character, an email
     address of some kind, another space character, a timestamp
     sequence of some kind, and an end-of-line marker. The exact
     structure of the email address, the timestamp, the end-of-line
     marker, and even the whole separator line, are all known to differ
  
  Hall                    I-D Expires: July 2005               [page 2]


     across implementations. For example, the email address can reflect
     any addressing syntax which has ever been used on any system in
     all of history (specifically including addressing forms which are
     not compatible with Internet mail), and the timestamp sequences
     can also vary according to system output, while the end-of-line
     sequences will often reflect platform-specific requirements. Some
     messaging systems have also been known to append additional
     information after the timestamp, while the presence of such data
     is known to break other messaging systems.
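
     As a non-normative illustration, the following Python sketch
     shows one liberal way to recognize a separator line of the kind
     described above. The function name and the pattern are
     assumptions of this example and are not taken from any existing
     implementation. The pattern only requires the "From" sequence, a
     space, an address-like token, another space, and some remaining
     text that is treated as the timestamp plus any trailing data.

```python
import re

# Hypothetical liberal pattern: "From", one space, a non-space token taken
# as the address, one space, then the rest of the line (the timestamp and
# any additional data that some systems append).
SEPARATOR = re.compile(rb"^From (\S+) (.+?)\r?\n?$")


def parse_separator(line: bytes):
    """Return (address, remainder) if the line looks like a separator."""
    match = SEPARATOR.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

     A parser this liberal also accepts an ordinary body line such as
     "From the start of time", which illustrates why implementations
     cannot rely on the leading "From" sequence alone.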
  
     The exact format of the separator line in use with a particular
     mbox database can often be determined by examining the first line
     of the file itself, although implementers are cautioned that
     multiple mbox files may have been concatenated together, or a
     single file may have been accessed by multiple messaging clients
     (each of which uses its own syntax), resulting in different
     separator line formats being used within a single file.
  
     Many implementations are also known to escape message body lines
     that begin with the character sequence of "From", so as to prevent
     confusion with overly-liberal parsers that do not search for full
     separator lines. In the common case, a leading Greater-Than symbol
     (0x3E) is used for this purpose, with "From" becoming ">From".
     However, other implementations are known to escape such lines
     only when they are immediately preceded by a blank line or when
     they also appear to contain an email address and a timestamp,
     while others are known to perform secondary escapes against such
     lines which are already escaped or quoted.
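
     The secondary-escape behavior can be sketched as follows. This is
     a non-normative Python example; the function names are
     hypothetical, and the sketch assumes that lines are matched on
     "From" followed by a space, which is only one of the behaviors
     found in the wild.

```python
import re

# Assumption: a line qualifies when it starts with "From " (with the
# trailing space), optionally preceded by ">" characters left by
# earlier escapes.
_QUOTED_FROM = re.compile(r"^>*From ")


def escape_line(line: str) -> str:
    """Prefix ">" to a body line that is, or already looks like, an
    escaped "From " line."""
    return ">" + line if _QUOTED_FROM.match(line) else line


def unescape_line(line: str) -> str:
    """Remove exactly one level of escaping added by escape_line."""
    if line.startswith(">") and _QUOTED_FROM.match(line):
        return line[1:]
    return line
```

     Because lines that are already escaped receive a further ">",
     this variant can be reversed without loss, whereas escaping only
     bare "From" lines cannot distinguish an escaped line from one
     that originally began with ">From".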
  
     Message data within mbox databases is often undefined, or will
     reflect site-specific peculiarities. For example, it is
     entirely possible for the message body or headers in an mbox
     database to contain untagged eight-bit character data that
     implicitly reflects a site-specific default language or locale.
     Similarly, message data can also contain unencoded eight-bit
     binary data, or use encoding formats which represent a specific
     platform (e.g., BINHEX or UUENCODE sequences). Along these same
     lines, email addresses can also reflect site-specific messaging
     platforms or make default assumptions about domain names, header
     fields can reflect a particular platform, and so forth.
  
     A comprehensive description of mbox database files on UNIX-like
     systems can be found at http://qmail.org./man/man5/mbox.html,
     which should be treated as mostly authoritative for those
     variations which are otherwise only documented in anecdotal form.
     However, readers are advised that many other platforms and tools
     make use of mbox databases, and that there are many more potential
     variations that can be encountered in the wild.
  
     In order to mitigate errors that may arise from such vagaries,
     this specification defines a "format" parameter to the
     APPLICATION/MBOX media-type declaration, which can be used to
     identify the specific kind of mbox database that is being
     transferred. Furthermore, this specification defines a "default"
     database format which MUST be supported by implementations that
     claim to be compliant with this specification, and which is to be
     used as the implicit format for undeclared APPLICATION/MBOX data
     objects. Additional format types are to be defined in subsequent
     specifications. Messaging systems which receive an unknown format
     SHOULD treat the data as an opaque binary object, as if the data
     had been declared as APPLICATION/OCTET-STREAM.
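
     The handling of the "format" parameter described above might be
     sketched as follows. This is a non-normative Python example that
     uses the standard library "email" package; the function name and
     the KNOWN_FORMATS set are assumptions of the example.

```python
from email.message import Message
from typing import Optional

# Formats this hypothetical receiver understands; this memo defines
# only "default".
KNOWN_FORMATS = {"default"}


def mbox_format(content_type: str) -> Optional[str]:
    """Return the mbox format to use, or None to treat the data as an
    opaque object (as if it were APPLICATION/OCTET-STREAM)."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "application/mbox":
        raise ValueError("not an application/mbox object")
    # A missing "format" parameter implies the "default" format.
    fmt = msg.get_param("format", failobj="default")
    if isinstance(fmt, tuple):
        # RFC 2231 encoded values arrive as (charset, language, value).
        fmt = fmt[2]
    return fmt if fmt in KNOWN_FORMATS else None
```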
  
     Refer to Appendix A for a description of the default mbox format.
  
     Note that RFC 2046 [RFC2046] defines the multipart/digest media-
     type for transferring platform-independent message files. Since
     that specification defines a set of neutral and strict formatting
     rules, the multipart/digest media-type already facilitates highly-
     predictable transfer and conversion operations, and as such
     implementers are strongly encouraged to support and use that
     media-type where possible.
  
  3.      Prerequisites and Terminology
  
     Readers of this document are expected to be familiar with the
     specification for MIME [RFC2045] and MIME-type registrations
     [RFC2048].
  
     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
     NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
     in this document are to be interpreted as described in RFC 2119
     [RFC2119].
  
  4.      The APPLICATION/MBOX Media-Type Registration
  
     This section provides the media-type registration application (as
     per [RFC2048]), which will be submitted to IANA after IESG
     approval of this specification.
  
     MIME media type name: application
  
     MIME subtype name: mbox
  
     Required parameters: none
  
     Optional parameters: The "format" parameter identifies the format
     of the mbox database and the messages contained therein. The
     default value for the "format" parameter is "default", and refers
     to the formatting rules defined in Appendix A. mbox databases that
     do not have a "format" parameter SHOULD be interpreted as having
     the implicit "format" value of "default". mbox databases that have
     an unknown value for the "format" parameter SHOULD be treated as
     opaque data objects, as if the media-type had been specified as
     APPLICATION/OCTET-STREAM.
  
     Encoding considerations: Due to a lack of tagging mechanisms in
     the traditional mbox format, it will often be necessary to encode
     the contents of an mbox database prior to transmission. In
     particular, if an email client receives an mbox database as a
     message attachment and writes that message (and its attachment) to
     a local mbox database, the contents of the two database files may
     become irreversibly intermingled, such that neither database is
     independently recognizable. In order to avoid these collisions,
     messaging clients which support this specification MUST encode an
     mbox database (or at a minimum, the message-separator lines within
     the database) with a non-transparent encoding (such as BASE64 or
     Quoted-Printable) prior to transfer. mbox databases transferred
     over other media SHOULD also be similarly encoded to allow for any
     subsequent retransmission which may occur. Implementers should
     also be prepared to encode mbox data locally if non-compliant data
     is received.
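
     As a non-normative illustration of the encoding requirement, the
     following Python sketch attaches an mbox database to a message
     with a BASE64 transfer encoding, so that no separator line
     appears in the clear. It uses EmailMessage.add_attachment from
     the Python standard library; the function name attach_mbox and
     the default filename are hypothetical.

```python
from email.message import EmailMessage


def attach_mbox(msg: EmailMessage, mbox_bytes: bytes,
                filename: str = "archive.mbox") -> None:
    """Attach an mbox database as application/mbox, BASE64 encoded."""
    msg.add_attachment(
        mbox_bytes,
        maintype="application",
        subtype="mbox",
        cte="base64",                  # non-transparent encoding
        filename=filename,
        params={"format": "default"},  # declare the database format
    )
```

     A receiving client that stores the enclosing message in its own
     mbox database then sees only encoded text in the attachment, so
     the two databases cannot become intermingled.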
  
     Security considerations: mbox data is passive, and does not
     generally represent a unique or new security threat. However,
     there is some risk in sharing any kind of data, in that
     unintentional information may be exposed, and this risk certainly
     applies to mbox data as well.
  
     Interoperability considerations: Due to the lack of any single
     formal specification for mbox databases, there are a large number
     of variations between database formats, and it is expected that
     non-conformant data will be exchanged. As such, it will be
     necessary for implementations to examine the mbox database in
     order to successfully import or otherwise utilize the data.
     Furthermore, different organizations will often have locally-
     relevant assumptions about message data (such as the character set
     in use, or the default domain name for unqualified addresses,
     among others), with no way to carry these local assumptions across
     the broader Internet. Although the "default" format specified in
     this memo does not allow for these kinds of vagaries, prior
     negotiation or agreement between humans will often be needed when
     non-compliant database formats are used.
  
     Published specification: see Appendix A.
  
     Applications which use this media type: hundreds of messaging
     products make use of the mbox database format, in one form or
     another.
  
     Magic number(s): no standard
  
     File extension(s): mbox database files sometimes have an ".mbox"
     extension, but this is neither required nor expected.
  
     Macintosh File Type Code(s): no standard
  
     Person & email address to contact for further information: Eric A.
     Hall (ehall@ntrg.com)
  
     Intended usage: COMMON
  
  5.      Security Considerations
  
     See the discussion in section 4.
  
  6.      IANA Considerations
  
     After any IESG approval which may be forthcoming, IANA would be
     expected to register the application/mbox media-type, using the
     application provided in section 4 above.
  
  7.      Normative References
  
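          [RFC2045]     Freed, N., Borenstein, N., "Multipurpose
                         Internet Mail Extensions (MIME) Part One:
                         Format of Internet Message Bodies", RFC 2045,
                         November 1996.

          [RFC2046]     Freed, N., Borenstein, N., "Multipurpose
                         Internet Mail Extensions (MIME) Part Two:
                         Media Types", RFC 2046, November 1996.

          [RFC2048]     Freed, N., Klensin, J., Postel, J.,
                         "Multipurpose Internet Mail Extensions (MIME)
                         Part Four: Registration Procedures", BCP 13,
                         RFC 2048, November 1996.

          [RFC2119]     Bradner, S., "Key words for use in RFCs to
                         Indicate Requirement Levels", BCP 14, RFC
                         2119, March 1997.

          [RFC2822]     Resnick, P., "Internet Message Format", RFC
                         2822, April 2001.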
  
  Appendix A.    The "default" mbox Database Format
  
     In order to improve interoperability among messaging systems, this
     memo defines a "default" mbox database format, which MUST be
     supported by all implementations claiming to be compliant with
     this specification.
  
     The "default" mbox database format uses a linear sequence of
     Internet messages, with each message being immediately prefaced by
     a separator line, and being terminated by an empty line. More
     specifically:
  
        o Each message within the database MUST follow the syntax and
          formatting rules defined in RFC 2822 [RFC2822] and its
          related specifications, with the exception that the canonical
          mbox database MUST use a single Line-Feed character (0x0A) as
          the end-of-line sequence, and MUST NOT use a Carriage-
          Return/Line-Feed pair (NB: this requirement only applies to
          the canonical mbox database, and is not to be interpreted to
          override any other specifications). This usage represents the
          most common historical representation of the mbox database
          format, and allows for the least amount of conversion.
  
        o Messages within the default mbox database MUST consist of
          seven-bit characters within an eight-bit stream. Eight-bit
          data within the stream MUST be converted to a seven-bit form
          (using an appropriate, standardized encoding) and
          appropriately tagged (with the correct header fields) before
          the database is transferred.
  
        o Message headers and data in the default mbox database MUST be
          fully-qualified, as per the relevant specification[s]. For
          example, email addresses in the various header fields MUST
          have legitimate domain names (as per RFC 2822), while
          extended characters and encodings MUST be specified in the
          appropriate location (as per the appropriate MIME
          specifications), and so forth.
  
        o Each message in the mbox database MUST be immediately
          preceded by a single separator line, which MUST conform to
          the following syntax:
  
             The exact character sequence of "From";
  
  
             a single Space character (0x20);
  
             the email address of the message sender (as obtained from
             the message envelope or other authoritative source),
             conformant with the "addr-spec" syntax from RFC 2822;
  
             a single Space character;
  
             a timestamp indicating the UTC date and time when the
             message was originally received, conformant with the
             syntax of the traditional UNIX 'ctime' output sans
             timezone (note that the use of UTC precludes the need for
             a timezone indicator);
  
             an end-of-line marker.
  
        o Each message in the database MUST be terminated by an empty
          line, containing a single end-of-line marker.
  
        o Message body lines that begin with the character sequence of
          "From" MUST be escaped with a leading Greater-Than symbol
          (0x3E) prior to transfer. Recipient systems MUST remove these
          escape sequences upon receipt (NB: this rule does not
          prohibit such systems from escaping the lines again, as may
          be needed for storage or subsequent processing).
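
     The rules above can be sketched, non-normatively, as a small
     Python writer. The function names are hypothetical, and the
     sketch assumes that the caller supplies a complete RFC 2822
     message as bytes, the sender's addr-spec, and the receipt time as
     seconds since the UNIX epoch. time.asctime is used to produce the
     'ctime' style timestamp without a timezone.

```python
import time


def format_separator(sender: str, received_utc: float) -> bytes:
    """Build "From", space, address, space, ctime-style UTC time, LF."""
    stamp = time.asctime(time.gmtime(received_utc))
    return f"From {sender} {stamp}\n".encode("ascii")


def serialize_message(sender: str, received_utc: float,
                      message: bytes) -> bytes:
    """Render one message as an entry in a "default" mbox database."""
    if any(byte > 0x7F for byte in message):
        raise ValueError("eight-bit data must be encoded and tagged first")
    # Use a single Line-Feed as the end-of-line sequence.
    lines = message.replace(b"\r\n", b"\n").split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # discard the empty piece after the final line ending
    # The header block ends at the first empty line; only body lines
    # are escaped, so a "From:" header field is left untouched.
    try:
        body_start = lines.index(b"") + 1
    except ValueError:
        body_start = len(lines)  # headers only, no body
    escaped = lines[:body_start] + [
        b">" + line if line.startswith(b"From") else line
        for line in lines[body_start:]
    ]
    # Terminate the last line, then add the empty terminating line.
    return (format_separator(sender, received_utc)
            + b"\n".join(escaped) + b"\n\n")
```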
  
     Note that the first message in an mbox database will only be
     prefaced by a separator line, while every other message will begin
     with two end-of-line sequences (one at the end of the message
     itself, and another to mark the end of the message within the mbox
     database file stream) and a separator line (marking the new
     message). The end of the database is reached when no more
     separator lines exist.
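
     A reader for this layout might look like the following
     non-normative Python sketch. The function names are hypothetical.
     The sketch assumes that the input already conforms to the
     "default" format, so that any line beginning with "From" and a
     space is a separator line, and it leaves the Line-Feed line
     endings unchanged.

```python
def _join_message(lines):
    """Drop the terminating empty line and rebuild the message text."""
    if lines and lines[-1] == b"":
        lines = lines[:-1]
    return b"\n".join(lines) + b"\n" if lines else b""


def split_default_mbox(data: bytes):
    """Return a list of (separator_line, message) pairs."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # discard the empty piece after the final LF
    entries = []
    separator, current, in_body = None, [], False
    for line in lines:
        if line.startswith(b"From "):
            # A new separator line closes the previous message, if any.
            if separator is not None:
                entries.append((separator, _join_message(current)))
            separator, current, in_body = line, [], False
        elif separator is None:
            continue  # data before the first separator is ignored here
        else:
            if in_body and line.startswith(b">From"):
                line = line[1:]  # remove one level of escaping
            elif not in_body and line == b"":
                in_body = True  # first empty line ends the header block
            current.append(line)
    if separator is not None:
        entries.append((separator, _join_message(current)))
    return entries
```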
  
  Acknowledgments
  
     Funding for the RFC editor function is currently provided by the
     Internet Society.
  
  
  Authors' Addresses
  
     Eric A. Hall
     ehall@ntrg.com
  
  
  
  Full Copyright Statement
  
     Copyright (C) The Internet Society 2004. This document is subject
     to the rights, licenses and restrictions contained in BCP 78, and
     except as set forth therein, the authors retain all their rights.
  
     This document and the information contained herein are provided on
     an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
     REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND
     THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
     THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR
     ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
     PARTICULAR PURPOSE.
  
  