Network Working Group E. Hall
Request for Comments: 4155 September 2005
Category: Informational
The application/mbox Media Type
Status of This Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
This memo requests that the application/mbox media type be authorized
for allocation by the IESG, according to the terms specified in RFC
2048. This memo also defines a default format for the mbox database,
which must be supported by all conformant implementations.
1. Background and Overview
UNIX-like operating systems have historically made widespread use of
"mbox" database files for a variety of local email purposes. In the
common case, mbox files store linear sequences of one or more
electronic mail messages, with local email clients treating the
database as a logical folder of email messages. mbox databases are
also used by a variety of other messaging tools, such as mailing list
management programs, archiving and filtering utilities, messaging
servers, and other related applications. In recent years, mbox
databases have also become common on a large number of non-UNIX
computing platforms, for similar kinds of purposes.
The increased pervasiveness of these files has led to an increased
demand for a standardized, network-wide interchange of these files as
discrete database objects. In turn, this dictates a need for a
general media type definition for mbox files, which is the subject
and purpose of this memo.
Hall Informational [Page 1]
RFC 4155 The application/mbox Media Type September 2005
2. About the mbox Database
The mbox database format is not documented in an authoritative
specification. Instead, it exists as a well-known output format that
is either documented anecdotally or documented authoritatively only
for a specific platform or tool.
mbox databases typically contain a linear sequence of electronic mail
messages. Each message begins with a separator line that identifies
the message sender, and also identifies the date and time at which
the message was received by the final recipient (either the last-hop
system in the transfer path, or the system which serves as the
recipient's mailstore). Each message is typically terminated by an
empty line. The end of the database is usually recognized by either
the absence of any additional data, or by the presence of an explicit
end-of-file marker.
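
As an illustration of the layout described above, the following is a
minimal sketch in Python, not a conformant parser. It assumes that
every separator line begins with the characters "From " and that no
message body line does; a body line that begins the same way would be
misread as a separator. The function name split_mbox and the sample
addresses are hypothetical.

```python
def split_mbox(data: bytes) -> list[bytes]:
    """Split raw mbox bytes into messages, one per "From " separator line."""
    messages: list[list[bytes]] = []
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From "):
            # A separator line opens a new message.
            messages.append([line])
        elif messages:
            # Any other line belongs to the message currently open.
            messages[-1].append(line)
        # Data that precedes the first separator is ignored in this sketch.
    return [b"".join(parts) for parts in messages]


sample = (
    b"From alice@example.com Thu Sep  1 10:00:00 2005\n"
    b"Subject: first\n"
    b"\n"
    b"Hello.\n"
    b"\n"
    b"From bob@example.com Thu Sep  1 10:05:00 2005\n"
    b"Subject: second\n"
    b"\n"
    b"Goodbye.\n"
)
msgs = split_mbox(sample)
```

Each returned element keeps its separator line and its trailing empty
line, so concatenating the elements reproduces the input whenever the
input starts with a separator line.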
The structure of the separator lines varies across implementations, but
usually contains the exact character sequence "From", followed by a
single Space character (0x20), an email address of some kind, another
Space character, a timestamp sequence of some kind, and an end-of-line
marker. However, due to the lack of any authoritative
specification, each of these attributes is known to vary widely
across implementations. For example, the email address can reflect
any addressing syntax that has ever been used on any messaging system
in all of history (specifically including address forms that are not
compatible with Internet messages, as defined by RFC 2822 [RFC2822]).
Similarly, the timestamp sequences can also vary according to system
output, while the end-of-line sequences will often reflect platform-
specific requirements. Different data formats can even appear within
a single database as a result of multiple mbox files being
concatenated together, or because a single file was accessed by
multiple messaging clients, each of which has used its own syntax for
the separator line.
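
The usual separator line shape described above can be sketched as
follows. This is an illustrative example only: it assumes the address
contains no Space characters and treats the timestamp as opaque text,
so it will not handle every variant found in practice. The names
Separator and parse_separator are hypothetical.

```python
from typing import NamedTuple, Optional


class Separator(NamedTuple):
    address: str
    timestamp: str


def parse_separator(line: str) -> Optional[Separator]:
    """Parse 'From <address> <timestamp><EOL>'; return None if it does not fit."""
    # Drop the end-of-line marker, which varies by platform.
    line = line.rstrip("\r\n")
    # Split on the first two Space characters only; the timestamp keeps its spaces.
    parts = line.split(" ", 2)
    if len(parts) != 3 or parts[0] != "From" or not parts[1]:
        return None
    return Separator(address=parts[1], timestamp=parts[2].strip())


parsed = parse_separator("From alice@example.com Thu Sep  1 10:00:00 2005\r\n")
```

Leaving the timestamp unparsed is deliberate, since its format differs
across the systems that write these files.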
Message data within mbox databases often reflects site-specific
peculiarities. For example, it is entirely possible for the message
body or headers in an mbox database to contain untagged eight-bit
character data that implicitly reflects a site-specific default
language or locale, or that reflects local defaults for timestamps
and email addresses; none of this data is widely portable beyond the
local scope. Similarly, message data can also contain unencoded
eight-bit binary data, or can use encoding formats that represent a
specific platform (e.g., BINHEX or UUENCODE sequences).
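
A recipient that wants to know whether a message carries eight-bit
data of this kind could scan for octets above 0x7F. The following is a
minimal sketch under that assumption; it only locates such octets and
makes no attempt to guess the character set or encoding. The function
name eight_bit_lines is hypothetical.

```python
def eight_bit_lines(message: bytes) -> list[int]:
    """Return the 1-based numbers of lines containing any octet above 0x7F."""
    return [
        number
        for number, line in enumerate(message.splitlines(), start=1)
        if any(octet > 0x7F for octet in line)
    ]


sample_message = b"Subject: caf\xe9\n\nplain ascii\nna\xefve\n"
flagged = eight_bit_lines(sample_message)
```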
Many implementations are also known to escape message body lines that