Network working group                             G. Klyne, Clearswift
Internet draft                                            9 April 2002
                                                 Expires: October 2002


              An XML format for mail and other messages
               <draft-klyne-message-rfc822-xml-03.txt>

Status of this memo

  This document is an Internet-Draft and is in full conformance with
  all provisions of Section 10 of RFC 2026.

  Internet-Drafts are working documents of the Internet Engineering
  Task Force (IETF), its areas, and its working groups.  Note that
  other groups may also distribute working documents as Internet-
  Drafts.

  Internet-Drafts are draft documents valid for a maximum of six
  months and may be updated, replaced, or obsoleted by other
  documents at any time.  It is inappropriate to use Internet-Drafts
  as reference material or to cite them other than as "work in
  progress".

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/1id-abstracts.html

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html


Copyright Notice

  Copyright (C) The Internet Society 2001.  All Rights Reserved.

Abstract

  This document describes a coding of email and other messages in
  XML.  This coding is intended for use by XML applications that
  exchange information about such messages.

Discussion of this document

  Send comments to <ietf-message-xml@research.mimesweeper.com>.  To
  subscribe to this list, send a message with the body 'subscribe' to
  <ietf-message-xml-request@research.mimesweeper.com>.









Klyne                       Internet draft                    [Page 1]


XML coding of RFC822 messages                             9 April 2002
<draft-klyne-message-rfc822-xml-03.txt>


  Table of contents

1. Introduction
  1.1 Structure of this document
  1.2 Document terminology and conventions
  1.3 About MIME and XML
2. Message structures
  2.1 Message header overview
  2.2 Multipart/related message structure
  2.3 Inline XML message structure
  2.4 Content type Message/Email+XML
3. Message header
  3.1 The <Message> element
  3.2 Content of <Message> element
  3.3 Use of XML namespaces
  3.4 The <content> element
  3.5 General form of header field elements
  3.6 RFC822-derived header elements
  3.7 Header fields containing addresses
     3.7.1 Header fields containing address groups
  3.8 Header elements containing human readable text
  3.9 MIME header fields
  3.10 Other header fields
     3.10.1 Mandatory extensions
4. Summary of RFC822-derived header elements
5. IANA considerations
6. Internationalization considerations
  6.1 International URIs in XML
7. Security considerations
8. Acknowledgements
9. References
10. Author's address
Appendix A: Message/Email+XML content-type registration
Appendix B: DTD for Email+XML message format
Appendix C: XML schema for Email+XML message format
Appendix D: RDF representation of Email+XML message
Appendix E: RDF schema for Email+XML message format
Appendix F: Amendment history
Appendix G: Outstanding issues
Full copyright statement


1. Introduction

  This document describes a coding of email and similar messages
  (such as RFC822 [1]) using XML [2], described here as the Email+XML
  message format.

  This document presents a design that can be used by XML
  applications that deal with email and similar messages.

  The XML coding is designed to address the following goals:

  o  to fully capture the semantics of Internet email messages, per
     RFC822 [1].  However, it is not intended to provide a loss-less
     coding of RFC822 syntax.

  o  to extend the scope of address information that can be conveyed
     to arbitrary URIs [3].

  o  to take account of 8-bit clean transfer environments.

  o  to fully support, where applicable, international character sets
     and languages within the message header and content [4,5].

  o  to be usable in MIME [6] and pure XML [2] transfer environments.

  o  to be fully compliant with the XML [2] and XML namespace [9]
     specifications.

  o  to allow header information to be compatible with RDF format
     [10], for use by generalized metadata processing applications.

1.1 Structure of this document

  Section 2 describes the overall message structure, showing how the
  message header and message content can be conveyed in MIME and XML
  transfer environments.

  Section 3 describes the message header in greater detail, with
  particular reference to differences in the value of individual
  fields compared with their RFC822 counterparts.

  Section 4 summarizes the RFC822-derived header elements used in the
  Email+XML message format described here.

  Appendix A contains a MIME content-type registration for
  Message/Email+XML.

  Appendix B contains a DTD for the Email+XML message format.

  Appendix C contains an XML schema for the Email+XML message format.
  (XML schemas are set to replace DTDs as the preferred way to
  describe XML document content.)

  Appendix D briefly discusses the RDF representation [10] and its
  applicability to the Email+XML message format.

  Appendix E contains an RDF schema [23] description for the
  Email+XML message format.

1.2 Document terminology and conventions

  Message   an assemblage of information that constitutes a
            communication of information from a sender to one or more
            recipients.  Consists of a message header and message
            content.

  Message header
            contains information about the message that is conveyed
            between message user agents, and not used by the message
            transfer mechanisms.  This may include who the message is
            from, who it is addressed to, other parties to whom it
            has been copied, subject of the message, date the message
            was composed, etc.

  Message content
            some arbitrary data carried in a message.

  Email+XML
            is the message format defined by this document.  (This
            name uses the XML content type labelling convention
            [11].)

  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
  this document are to be interpreted as described in RFC 2119 [19].

       NOTE:  Comments like this provide additional nonessential
       information about the rationale behind this document.
       Such information is not needed for building a conformant
       implementation, but may help those who wish to understand
       the design in greater depth.

  [[[Editorial comments and questions about outstanding issues are
  provided in triple brackets like this.  These working comments
  should be resolved and removed prior to final publication.]]]

1.3 About MIME and XML

  There has been much discussion about the relative merits of MIME
  and XML.  The position of this document is that they serve
  different purposes, and are complementary rather than alternatives.

  MIME is a framework primarily for encapsulating and composing
  arbitrary data entities, and offers the following capabilities:

  o  Content type labelling.

  o  Transfer encoding for handling arbitrary data on restricted
     channels.

  o  Assembly of different kinds of data into composite entities.

  o  End of data detection without need to parse or understand the
     data content.

  XML is a framework primarily for describing data structures,
  including semi-structured document data, and offers the following
  capabilities:

  o  Construction of arbitrary data structures based on an annotated
     tree model.

  o  Fine-grained labelling of structure components and data
     attributes.

  o  Cross-linking between data structure components.

  o  A standard format for interchange of structured information
     between diverse systems.

  There is, of course, some overlap in capabilities, and reasonable
  people may disagree about the appropriateness of using MIME and/or
  XML in particular circumstances.

  This document is predicated on the idea that XML is a useful
  mechanism (in addition to existing facilities) for structuring
  message header information.  It aims to be agnostic with regard to
  using MIME or some other framework for composing and encapsulating
  messages.


2. Message structures

  A message consists of a message header and message content:

  o  The message header contains information about the message:  who
     it was sent by, who it is addressed to, its subject, date it was
     sent, and many other related pieces of information.

  o  The message content is any data that is carried by the message:
     e.g. a text message, fax image, voice message or arbitrary
     application data.  In principle, any data that can be transferred
     as a MIME object can be message content, though specific
     applications may limit the kinds of data that can be transferred.

  The Email+XML message format uses a URI-reference [3] in the
  message header to reference the message content.  Thus, the message
  content may be completely separate from the message header;  the
  message header is the root information of a message, from which
  message content may be discovered.

  Two specific message structure scenarios are contemplated here:

  o  Multipart/related, and

  o  An XML element within the message header.

  These are described below.  Other message structures are possible
  (e.g. multiple resources on a web server, multiple channels in a
  multiplexed protocol), but are not described here.

2.1 Message header overview

  The message header is an XML document whose root element is
  <Message>.  This contains a number of elements;  an initial set of
  such elements is defined based on RFC822 message headers [1].

  The message content is indicated by an attribute of the <Message>
  element whose value is a URI-reference for the content.

  The message header is discussed in greater detail in section 3
  below.

2.2 Multipart/related message structure

  A message whose content is formatted as a MIME object [6] may be
  sent as a Multipart/related object [15]:

     Content-type: multipart/related; boundary="boundary";
                   start="<1@100Aker.org>";
                   type="message/Email+XML"

     --boundary
     Content-type: Message/Email+XML
     Content-ID: <1@100Aker.org>

     <emx:Message
         xmlns:emx='URN:ietf:params:email-xml:'
         xmlns:rfc822='URN:ietf:params:rfc822:'
         emx:content='cid:2@100Aker.org'>
       <rfc822:from>
         <emx:Address>
           <emx:adrs>mailto:Pooh@PoohCorner.100Aker.org</emx:adrs>
           <emx:name>Winnie the Pooh</emx:name>
         </emx:Address>
       </rfc822:from>
       <rfc822:to>
         <emx:Address>
           <emx:adrs>mailto:Piglet@BeechTree.100Aker.org</emx:adrs>
           <emx:name>MR SANDERS</emx:name>
         </emx:Address>
       </rfc822:to>
       <rfc822:subject>Woozle Hunting</rfc822:subject>
     </emx:Message>
     --boundary
     Content-Type: text/plain;charset=UTF-8
     Content-ID: <2@100Aker.org>

     I have Been Foolish and Deluded
     I am a Bear of No Brain at All
     --boundary--

  In this case, the Multipart/related contains two MIME parts:

  o  the message header, and

  o  the message content.

  The Multipart/related content-type header indicates the root of the
  message by its Content-ID value [6].  In turn, the message header
  refers to the message content with a <Message> element 'content='
  attribute whose value is a 'cid:' URI [16].
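
  The lookup described above can be sketched in Python.  This is an
  illustration only, not part of this specification:  the function
  names and the 'parts' dictionary (already-decoded MIME parts keyed
  by their Content-ID header value) are assumptions of the sketch, and
  the namespace name is the one given in section 3.3.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Namespace name as given in section 3.3 (an assumption of this sketch).
EMX = "URN:ietf:params:email-xml:"

HEADER = """\
<emx:Message
    xmlns:emx='URN:ietf:params:email-xml:'
    xmlns:rfc822='URN:ietf:params:rfc822:'
    emx:content='cid:2@100Aker.org'>
  <rfc822:subject>Woozle Hunting</rfc822:subject>
</emx:Message>
"""


def content_reference(header_xml):
    """Return the URI-reference for the message content, or None."""
    root = ET.fromstring(header_xml)
    # Accept both the prefixed form used in the example and a bare attribute.
    return root.get("{%s}content" % EMX) or root.get("content")


def cid_to_content_id(uri):
    """Map a 'cid:' URI to the Content-ID header value it refers to."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "cid":
        raise ValueError("not a cid: URI: %r" % uri)
    return "<" + unquote(rest) + ">"


def resolve_content(header_xml, parts):
    """Look up the message content among MIME parts keyed by Content-ID."""
    return parts[cid_to_content_id(content_reference(header_xml))]


# Hypothetical, already-decoded parts of a Multipart/related object.
parts = {"<2@100Aker.org>": "I am a Bear of No Brain at All\n"}
print(resolve_content(HEADER, parts))
```

  The sketch deliberately leaves MIME parsing to the caller, so that
  it shows only the header-to-content reference step.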

2.3 Inline XML message structure

  When the message content can be expressed as simple text or XML, it
  may be included within the message header using a <content> element
  containing the message content instead of a 'content=' attribute.

     Content-type: Message/Email+XML

     <emx:Message
         xmlns:emx='URN:ietf:params:email-xml:'
         xmlns:rfc822='URN:ietf:params:rfc822:'>
       <rfc822:from>
         <emx:Address>
           <emx:adrs>mailto:Christopher.Robin@GreenDoor.org</emx:adrs>
           <emx:name>Christopher Robin</emx:name>
         </emx:Address>
       </rfc822:from>
       <rfc822:to>
         <emx:Address>
           <emx:adrs>mailto:Pooh@PoohCorner.100Aker.org</emx:adrs>
           <emx:name>Winnie the Pooh</emx:name>
         </emx:Address>
       </rfc822:to>
       <rfc822:subject>Re: Woozle hunting</rfc822:subject>
       <emx:content type='text/plain'>
         You're the Best Bear in All the World
       </emx:content>
     </emx:Message>

  This example shows the message contained within a single
  Message/Email+XML MIME object.

  The <content> element indicates the message content.  When present,
  this element MUST be the last element contained in a <Message>
  element.
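
  As an illustration only, an inline message of this shape might be
  assembled with Python's standard ElementTree module.  The function
  name and its parameters are assumptions of this sketch;  the
  namespace names are those used in the example above.

```python
import xml.etree.ElementTree as ET

# Namespace names as used in the inline example (assumed by this sketch).
EMX = "URN:ietf:params:email-xml:"
RFC822 = "URN:ietf:params:rfc822:"


def build_inline_message(from_uri, to_uri, subject, text):
    """Return an Email+XML message with inline text/plain content."""
    # Prefixes are arbitrary; these match the examples in this document.
    ET.register_namespace("emx", EMX)
    ET.register_namespace("rfc822", RFC822)
    message = ET.Element("{%s}Message" % EMX)
    for field, uri in (("from", from_uri), ("to", to_uri)):
        header = ET.SubElement(message, "{%s}%s" % (RFC822, field))
        address = ET.SubElement(header, "{%s}Address" % EMX)
        ET.SubElement(address, "{%s}adrs" % EMX).text = uri
    ET.SubElement(message, "{%s}subject" % RFC822).text = subject
    # The <content> element is appended last, as required above.
    content = ET.SubElement(message, "{%s}content" % EMX,
                            {"type": "text/plain"})
    content.text = text
    return ET.tostring(message, encoding="unicode")


xml_text = build_inline_message(
    "mailto:Christopher.Robin@GreenDoor.org",
    "mailto:Pooh@PoohCorner.100Aker.org",
    "Re: Woozle hunting",
    "You're the Best Bear in All the World",
)
print(xml_text)
```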

2.4 Content type Message/Email+XML

  This specification defines a new MIME content-type called
  Message/Email+XML.

  A Message/Email+XML entity contains an XML document conforming to
  the DTD known by the SYSTEM identifier
  'urn:ietf:params:xml:dtd:email-xml:', per [24].  The document may
  contain <?xml?> and <!DOCTYPE> declarations, but these are not
  required.

  The body of the document is a <Message> element, as described
  below.

  The character set encoding used in a Message/Email+XML entity is
  UTF-8.

  A Content-type registration template for Message/Email+XML is
  contained in Appendix A of this document.


3. Message header

  The Email+XML message header contains header fields based on
  RFC822, and coded in XML.

  The message header contains information about the message that is
  conveyed between message user agents, and not used by the message
  transfer mechanisms.  This may include who the message is from, who
  it is addressed to, other parties to whom it has been copied,
  subject of the message, date the message was composed, etc.

  The message header also contains a reference to the message
  content, as described in the previous section.

3.1 The <Message> element

  The <Message> element contains the message header, and references
  the message content.

  Possible attributes are:

  o  'xmlns=' or 'xmlns:tag=' is used to indicate a default XML
     namespace or XML namespace tag [9] that applies to the entire
     <Message> element.

  o  'content=' specifies a URI-reference [3] that references the
     message content, if such content is not contained inline in a
     '<content>' element.  Typically, the value is a 'cid:' URI as
     described in the previous section.  Other message content URI
     values are possible, but such use is beyond the scope of this
     specification.

  o  'xml:lang=' [2] may be used, in which case it specifies the
     language of any text in the message header, except where
     overridden by an 'xml:lang=' attribute of an enclosed element.

3.2 Content of <Message> element

  The content of a <Message> element is:

  o  a sequence of zero or more header field elements, and

  o  an optional <content> element.

  Header field elements may appear in any order.  When present, the
  <content> element MUST be the last one in the <Message>.

  The <Message> element MUST contain either a 'content=' attribute or
  a single <content> element.  It MUST NOT contain both.
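
  These structural rules lend themselves to a mechanical check.  The
  following Python sketch is illustrative only:  the function name
  and the wording of the reported problems are assumptions, and the
  namespace name is the one given in section 3.3.

```python
import xml.etree.ElementTree as ET

# Namespace name as given in section 3.3 (an assumption of this sketch).
EMX = "URN:ietf:params:email-xml:"
CONTENT_TAG = "{%s}content" % EMX


def message_problems(message):
    """Return a list of violations of the <Message> content rules."""
    problems = []
    children = list(message)
    inline = [child for child in children if child.tag == CONTENT_TAG]
    has_attribute = (message.get("content") is not None
                     or message.get(CONTENT_TAG) is not None)
    if has_attribute and inline:
        problems.append("both 'content=' attribute and <content> element")
    if not has_attribute and not inline:
        problems.append("neither 'content=' attribute nor <content> element")
    if len(inline) > 1:
        problems.append("more than one <content> element")
    if inline and children[-1].tag != CONTENT_TAG:
        problems.append("<content> element is not last")
    return problems


good = ET.fromstring(
    "<Message xmlns='URN:ietf:params:email-xml:'>"
    "<content>Wherefore?</content></Message>")
both = ET.fromstring(
    "<Message xmlns='URN:ietf:params:email-xml:' content='cid:2@100Aker.org'>"
    "<content>Wherefore?</content></Message>")
print(message_problems(good))
print(message_problems(both))
```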

3.3 Use of XML namespaces

  The <Message> element name, <Address> and related element names,
  and the <content> and <Message-content> element names are all
  associated with a namespace called 'URN:ietf:params:email-xml:'.
  RFC822 header element names are associated with a namespace called
  'URN:ietf:params:rfc822:'.  (These namespace identifiers are
  based on "A URN Sub-namespace for Registered Protocol Parameters"
  [20].)

  The namespaces must be declared, either as a default namespace or
  using a namespace prefix (which is an arbitrary local name).  The
  namespace declaration may appear as an attribute of the <Message>
  element, or in the surrounding XML context.

  The message examples in section 2 use namespace prefixes 'emx:' and
  'rfc822:', but any prefix could be used here.  Here is a different
  message example using a default namespace rather than a namespace
  prefix for the non-RFC822-derived names:

     Content-type: Message/Email+XML

     <Message
         xmlns='URN:ietf:params:email-xml:'
         xmlns:rfc822='URN:ietf:params:rfc822:'>
       <rfc822:from>
         <Address>
           <adrs>im:Eeyore@ThistlyCorner.100Aker.org</adrs>
           <name>Eeyore</name>
         </Address>
       </rfc822:from>
       <rfc822:to>
         <Group>
           <name>Anyone</name>
         </Group>
       </rfc822:to>
       <rfc822:subject>Why?</rfc822:subject>
       <content type='text/plain'>
         Wherefore?
         Inasmuch as which?
       </content>
     </Message>
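
  That the choice of prefix is immaterial can be checked with any
  namespace-aware XML parser.  The Python sketch below is an
  illustration only;  the two abbreviated documents and the prefix
  'h:' are assumptions made for the purpose of the comparison.

```python
import xml.etree.ElementTree as ET

# The same abbreviated message, once with prefixes and once with a
# default namespace plus a different (arbitrary) prefix 'h:'.
PREFIXED = (
    "<emx:Message xmlns:emx='URN:ietf:params:email-xml:'"
    " xmlns:rfc822='URN:ietf:params:rfc822:'>"
    "<rfc822:subject>Why?</rfc822:subject>"
    "<emx:content type='text/plain'>Wherefore?</emx:content>"
    "</emx:Message>"
)
DEFAULTED = (
    "<Message xmlns='URN:ietf:params:email-xml:'"
    " xmlns:h='URN:ietf:params:rfc822:'>"
    "<h:subject>Why?</h:subject>"
    "<content type='text/plain'>Wherefore?</content>"
    "</Message>"
)


def expanded_names(xml_text):
    """List element names in '{namespace}local' form, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


print(expanded_names(PREFIXED) == expanded_names(DEFAULTED))
```

  Both documents yield the same expanded names, which is what a
  receiving application should compare, rather than the prefixes.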

3.4 The <content> element

  The <content> element is used to include the message content as
  text or XML data in the message header.  It is present when the
  <Message> element does not have a 'content=' attribute.

  Possible <content> attributes are:

  o  'type=' is optional, and indicates the MIME content-type of the
     message content.  If not specified, a content type of "text/xml"
     is assumed.

     (Whatever MIME content-type may be declared, the message content
     must be well-formed XML or character data.  In practice, this
     means the content must be some character-based data
     representation.)

  o  'xml:lang=' [2] may be used, in which case it specifies the
     language of the message content.

  The character encoding for the message content is the same as that
  used for the surrounding XML.  (This is typically UTF-8, from the
  character set encoding of the MIME content-type Message/Email+XML.)

  The message content may be any well-formed XML, which includes
  simple character data.  Characters '<' and '&' that are not part of
  XML markup MUST be represented as '&lt;' and '&amp;' respectively.
  The character '>' appearing in the sequence ']]>', other than at
  the end of a CDATA section, MUST be represented as '&gt;'.
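
  A sketch of this escaping, using the Python standard library, is
  given here for illustration.  The wrapper function name is an
  assumption.  Note that escape() replaces every '>' character, which
  is stricter than the rule above requires but still yields
  well-formed character data.

```python
from xml.sax.saxutils import escape, unescape


def to_character_data(text):
    """Escape plain text for use as XML character data."""
    # escape() replaces '&', '<' and '>' with '&amp;', '&lt;' and '&gt;'.
    return escape(text)


sample = "if a < b && c then x ]]> y"
encoded = to_character_data(sample)
print(encoded)
```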

3.5 General form of header field elements

  Each header field is represented by an XML element that identifies
  the field.

  The element content is the header field value.  For RFC822 and MIME
  header fields, the field value is character data in which the
  characters '<', '&' and '>' are represented as for character data
  in <Message-content> (see above).

3.6 RFC822-derived header elements

  For representing information about email messages, this
  specification introduces message header elements with names and
  semantics based on RFC822 header fields [1].  The intent is that
  the semantics of any RFC822 header field is easily represented in
  an Email+XML header element;  it is not a goal to capture the
  detailed syntax of any particular RFC822 message, or to construct a
  corresponding RFC822 message from any Email+XML message.

  RFC822-derived header elements have names based on RFC822 header
  names, using all lower-case characters (noting that XML element
  names are case sensitive).

  RFC822-derived header elements are associated with an XML
  namespace, as noted above at section 3.3, and may need to be
  combined with a namespace prefix if it is not the default
  namespace.  (See examples in sections 2.2 and 2.3.)

  RFC822-derived header element contents have the same syntax and
  meaning as corresponding RFC822 header field values, except that:

  o  Characters are not limited to US-ASCII.  UTF-8 character set
     encoding is typically used.

  o  Encoded words ('=?...?=') are not needed, and no special
     processing is defined for sequences of this form.

  o  Special considerations apply to fields containing address values
     (from, to, etc.) -- see section 3.7 below.

  o  Special considerations apply to fields containing human-readable
     text values (subject, comments, etc.) -- see section 3.8 below.

3.7 Header fields containing addresses

  Parts of an RFC822 address value are separated out into separate
  elements, all contained within an <Address> element.  The element
  types defined here are <adrs> and <name>.

  A major change from RFC822 is that all addresses are presented as
  URIs, rather than as RFC822 'addr-spec' values.  Email addresses
  (the only kind that appear in RFC822 headers) are expressed as
  'mailto:' URLs [21].  Address URIs are enclosed in an <adrs>
  element.

  This change anticipates that XML-based message headers may be used
  with a variety of different protocols with different addressing
  schemes.

  Finally, only one address per message header element is allowed (or
  an address group:  see below).  Where permitted, multiple values
  are represented by repeating the header element for each value.

  Note that characters in URIs are drawn from a limited repertoire;
  the URI '%' escape sequence may be used to represent other
  characters that are legal for the URI scheme used [14].

  The RFC822 address structures using 'phrase' are supported.  The
  'phrase' is a "formal name", and is enclosed in a <name> element.

  The RFC822 structures using source-route values (i.e. 'route' in
  'route-addr') are not supported.  RFC822 'comment' values within
  addresses are not supported.  Thus, RFC822 e-mail addresses that
  might be expressed as:

     Piglet@TrespassersW.100Aker.org (MR SANDERS)

  which is generally equivalent to:

     MR SANDERS <Piglet@TrespassersW.100Aker.org>

  must be presented in the form:

     <emx:Address>
       <emx:adrs>mailto:Piglet@TrespassersW.100Aker.org</emx:adrs>
       <emx:name>MR SANDERS</emx:name>
     </emx:Address>

  Any '<', '&' and certain '>' characters appearing in a formal name
  (<name> element) MUST be represented using '&lt;', '&amp;' or
  '&gt;' as noted previously in section 3.4.
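
  The conversion from an RFC822 address to an <Address> element can
  be sketched in Python as follows.  This is an illustration only:
  the function name is an assumption, email.utils.parseaddr is used
  as a convenient (not normative) RFC822 address parser, and
  percent-escaping everything in the address other than '@' is a
  conservative choice made by this sketch.

```python
import xml.etree.ElementTree as ET
from email.utils import parseaddr
from urllib.parse import quote

# Namespace name as given in section 3.3 (an assumption of this sketch).
EMX = "URN:ietf:params:email-xml:"


def address_element(rfc822_address):
    """Build an <Address> element from a single RFC822 address."""
    name, addr_spec = parseaddr(rfc822_address)
    if not addr_spec:
        raise ValueError("no address found in %r" % rfc822_address)
    address = ET.Element("{%s}Address" % EMX)
    adrs = ET.SubElement(address, "{%s}adrs" % EMX)
    adrs.text = "mailto:" + quote(addr_spec, safe="@")
    if name:
        # The serializer escapes '<', '&' and '>' in the name as needed.
        ET.SubElement(address, "{%s}name" % EMX).text = name
    return address


ET.register_namespace("emx", EMX)
element = address_element("MR SANDERS <Piglet@TrespassersW.100Aker.org>")
print(ET.tostring(element, encoding="unicode"))
```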

3.7.1 Header fields containing address groups

  Some RFC822 headers can have address group values as well as just
  address values.  The RFC822 'group' structure associates a
  collection of addresses with a name for that collection.  The
  individual addresses in a group may be omitted.

  An address group is expressed using a <Group> element containing
  the name of the group and zero, one or more <member> elements each
  containing an <Address>:

     <emx:Group>
       <emx:name>Christopher-Robins-friends</emx:name>
       <emx:member>
         <emx:Address>
           <emx:adrs>mailto:Pooh@PoohCorner.100Aker.org</emx:adrs>
           <emx:name>Winnie the Pooh</emx:name>
         </emx:Address>
       </emx:member>
       <emx:member>
         <emx:Address>
           <emx:adrs>mailto:Piglet@TrespassersW.100Aker.org</emx:adrs>
           <emx:name>MR SANDERS</emx:name>
         </emx:Address>
       </emx:member>
       <emx:member>
         <emx:Address>
           <emx:adrs>im:Eeyore@ThistlyCorner.100Aker.org</emx:adrs>
           <emx:name>Eeyore</emx:name>
         </emx:Address>
       </emx:member>
     </emx:Group>

  Omitting the individual member addresses, this would be:

     <emx:Group>
       <emx:name>Christopher-Robins-friends</emx:name>
     </emx:Group>
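
  A receiving application might read a <Group> element along the
  following lines.  This Python sketch is illustrative only;  the
  function name and the shape of its return value are assumptions.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace mapping used only for the path lookups below.
NS = {"emx": "URN:ietf:params:email-xml:"}

GROUP = """\
<emx:Group xmlns:emx='URN:ietf:params:email-xml:'>
  <emx:name>Christopher-Robins-friends</emx:name>
  <emx:member>
    <emx:Address>
      <emx:adrs>mailto:Pooh@PoohCorner.100Aker.org</emx:adrs>
      <emx:name>Winnie the Pooh</emx:name>
    </emx:Address>
  </emx:member>
  <emx:member>
    <emx:Address>
      <emx:adrs>im:Eeyore@ThistlyCorner.100Aker.org</emx:adrs>
      <emx:name>Eeyore</emx:name>
    </emx:Address>
  </emx:member>
</emx:Group>
"""


def read_group(group):
    """Return (group name, list of member address URIs)."""
    name = group.findtext("emx:name", namespaces=NS)
    members = [
        member.findtext("emx:Address/emx:adrs", namespaces=NS)
        for member in group.findall("emx:member", NS)
    ]
    return name, members


print(read_group(ET.fromstring(GROUP)))
```

  A group whose member addresses are omitted yields an empty list.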

3.8 Header elements containing human readable text

  Header fields that contain human readable text MAY have an
  'xml:lang=' attribute of the header element to indicate a language
  for the contained text.

  In the absence of such an attribute, any language applicable to the
  surrounding XML is to be assumed.
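
  The language applicable to each header element can be determined as
  in the following Python sketch, which is illustrative only.  The
  function name, the 'surrounding' parameter (the language inherited
  from outside the <Message> element, if any) and the sample subject
  text are assumptions.

```python
import xml.etree.ElementTree as ET

# Expanded name of the 'xml:lang' attribute as reported by ElementTree.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

MESSAGE = (
    "<Message xmlns='URN:ietf:params:email-xml:'"
    " xmlns:rfc822='URN:ietf:params:rfc822:' xml:lang='en'>"
    "<rfc822:subject xml:lang='fr'>Chasse au Woozle</rfc822:subject>"
    "<rfc822:comments>Woozle Hunting</rfc822:comments>"
    "</Message>"
)


def header_languages(message, surrounding=None):
    """Return (element name, language) for each child of <Message>."""
    inherited = message.get(XML_LANG, surrounding)
    # An element's own 'xml:lang=' overrides the inherited value.
    return [(child.tag, child.get(XML_LANG, inherited)) for child in message]


for tag, language in header_languages(ET.fromstring(MESSAGE)):
    print(tag, language)
```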

3.9 MIME header fields

  MIME content header fields MAY be part of the message header, using
  the same general format and XML namespace as RFC822-derived header
  fields (i.e. element name based on the MIME header field name, and
  associated with the same XML namespace).

  But note that most MIME header fields are not appropriate for use
  with the Email+XML message format.  When the message content is
  supplied as a separate MIME entity then MIME content header fields
  SHOULD be applied to that entity.

  It is expected that MIME header fields may be useful in the
  following circumstances:

  o  When the message content is included as inline XML, to convey
     information about it that cannot be conveyed using native XML
     mechanisms;  e.g. the Content-features header [22].

  o  MIME headers, not having an obvious XML counterpart, that express
     information that might be taken as metadata applying to the
     message as a whole, in isolation from the specific message
     content;  e.g. the Content-description header field.

3.10 Other header fields

  A message header MAY contain header fields that are not derived
  from RFC822 or MIME.  Any such header field names used MUST be
  associated with a different namespace.

  This specification does not define any such additional header
  fields.

3.10.1 Mandatory extensions

  In general, a message handler should ignore any header fields that
  it does not understand.

  But sometimes it is desirable to introduce new header fields that
  must be understood for proper processing of the message to take
  place.  This specification defines an XML attribute
  'mustUnderstand=', which indicates whether or not the element to
  which it applies must be understood by a message processor:

  mustUnderstand='false'   is the default case, and indicates that
                           the corresponding element MAY safely be
                           ignored.

  mustUnderstand='true'    indicates that the element to which it
                           applies MUST be processed, OR processing
                           of the entire message (or message header)
                           MUST be abandoned.

  In XML namespace terms [9], the 'mustUnderstand=' attribute belongs
  to a "per-element-type namespace partition".  Interpretation of the
  attribute is a property of the element to which it applies.  In any
  case, the DTD or XML schema must declare that the attribute is
  allowed on any particular XML element type.  It is strongly
  recommended that any header elements used within an Email+XML
  message header allow this attribute with the interpretation
  described here.

  Non-validating XML processors used to handle Email+XML message
  headers MAY interpret the 'mustUnderstand=' attribute appearing on
  any header field element as described here.

  Notwithstanding the presence or absence of a 'mustUnderstand='
  attribute, individual applications may require that certain header
  elements are present or absent from any header that they interpret.
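
  The processing rule for 'mustUnderstand=' can be sketched in Python
  as follows.  This is an illustration only:  the exception class,
  the function name, the set of understood header elements and the
  extension namespace 'urn:example:ext' are all assumptions, and the
  sketch examines header field elements only.

```python
import xml.etree.ElementTree as ET

RFC822 = "URN:ietf:params:rfc822:"
# Header elements this hypothetical processor understands.
UNDERSTOOD = {"{%s}%s" % (RFC822, name) for name in ("from", "to", "subject")}


class MandatoryExtensionError(Exception):
    """Raised when a mandatory header element is not understood."""


def understood_headers(message, understood=UNDERSTOOD):
    """Return the understood header elements, or abandon processing."""
    accepted = []
    for element in message:
        if element.tag in understood:
            accepted.append(element)
        elif element.get("mustUnderstand", "false") == "true":
            # Not understood and mandatory: processing MUST be abandoned.
            raise MandatoryExtensionError(element.tag)
        # Otherwise the element MAY safely be ignored.
    return accepted


OPTIONAL = ET.fromstring(
    "<Message xmlns='URN:ietf:params:email-xml:'"
    " xmlns:rfc822='URN:ietf:params:rfc822:' xmlns:x='urn:example:ext'"
    " content='cid:2@100Aker.org'>"
    "<rfc822:subject>Why?</rfc822:subject>"
    "<x:priority>low</x:priority>"
    "</Message>")
print([element.tag for element in understood_headers(OPTIONAL)])
```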


4. Summary of RFC822-derived header elements

  RFC822 fields containing a simple address:

     return-path
     from
     sender
     resent-from
     resent-sender

  RFC822 fields containing an address or group:

     to
     cc
     bcc
     reply-to
     resent-to
     resent-cc
     resent-bcc
     resent-reply-to

  RFC822 fields containing human-readable text:

     keywords
     subject
     comments

  Other RFC822 fields:

     received
     date
     resent-date
     message-id
     resent-message-id
     in-reply-to
     references
     encrypted
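
  An implementation might hold this summary as a lookup table.  The
  Python sketch below is illustrative only;  the table name, the
  category labels and the function name are assumptions.  Lookups are
  made on the lower-case element name, per section 3.6.

```python
# Category labels are local to this sketch; the field lists are those above.
_FIELDS_BY_KIND = {
    "address": "return-path from sender resent-from resent-sender",
    "address-or-group": ("to cc bcc reply-to resent-to resent-cc "
                         "resent-bcc resent-reply-to"),
    "text": "keywords subject comments",
    "other": ("received date resent-date message-id resent-message-id "
              "in-reply-to references encrypted"),
}

HEADER_KINDS = {
    name: kind
    for kind, names in _FIELDS_BY_KIND.items()
    for name in names.split()
}


def header_kind(field_name):
    """Return the category of an RFC822 field name, or None if unknown."""
    return HEADER_KINDS.get(field_name.lower())


print(header_kind("Resent-Reply-To"))
```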


5. IANA considerations

  This specification calls for the registration of the new MIME
  content-type Message/Email+XML.  The registration template is at
  appendix A.

  [[[XML document identifier -- URN from IANA space?]]]

  [[[XML namespace identifier -- URN from IANA space?]]]

  [[[Waiting on [20]...]]]


6. Internationalization considerations

  This specification attempts to relax the restrictions on
  international data imposed by RFC822.

  RFC822 limits characters in address local parts to US-ASCII.  This
  specification uses URIs and an XML-based address format, relaxing
  that constraint so that foreign language personal names can be
  represented.  Character restrictions apply to URIs, and the
  %-escape mechanism defined by RFC2396 must be followed for
  representing non-URI characters.  The character encoding used is
  dependent on the URI scheme, but UTF-8 is the strongly recommended
  choice.  [[[todo: cite IRI work, and charmod?]]]

  Similarly, the characters that can be used in domain names are
  currently severely constrained.  Work is under way to define
  international forms for domain names.

  Message content is tagged using standard MIME capabilities (charset
  parameter for text data [7], and Content-language header for
  language tagging [5]).  Mandating handling of international data
  formats is a matter for particular applications;  it is recommended
  that applications using the Email+XML message format be required to
  process UTF-8 coded character data.  That does not necessarily mean
  that all characters received can be displayed.

  For content included in an XML element, language tagging can be
  achieved by including an 'xml:lang=' attribute [2] in the
  <Message-content> element (subject to appropriate DTD or XML schema
  permission to use that attribute).
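
  A minimal sketch of such tagging, using Python's standard
  xml.etree.ElementTree module, is shown below.  The helper name
  tag_language is hypothetical, and for brevity the <Message-content>
  element is created without the namespace that a real Email+XML
  message would use.

```python
import xml.etree.ElementTree as ET

# Namespace name reserved for the 'xml' prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def tag_language(element, lang):
    """Set xml:lang on `element` and return the same element."""
    element.set("{%s}lang" % XML_NS, lang)
    return element


# Illustrative element only: the namespace is omitted (assumption).
content = ET.Element("Message-content")
content.text = "Wherefore?"
tag_language(content, "en")

serialized = ET.tostring(content, encoding="unicode")
```

  ElementTree writes the attribute with the reserved 'xml' prefix,
  so no extra namespace declaration is needed in the output.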


6.1 International URIs in XML

  This sub-section is commentary, not part of this specification:

  In a message to the W3C URI mailing list
  (http://lists.w3.org/Archives/Public/uri/2000Oct/0008.html), Martin
  Duerst wrote:

     The original XML spec says
     (http://www.w3.org/TR/1998/REC-xml-19980210#sec-external-ent):

       An XML processor should handle a non-ASCII character in a URI
       by representing the character in UTF-8 as one or more bytes,
       and then escaping these bytes with the URI escaping mechanism
       (i.e., by converting each byte to %HH, where HH is the
       hexadecimal notation of the byte value).

     This says that the XML processor should do this for you, and
     therefore it should be okay for you to put in the original
     characters. But there are three problems here:

     o It says 'should', not must.

     o It's not clear whether it applies to all URIs, or just to the
       URIs used in System Identifiers, and in the former case, it's
       not clear how an XML processor would find all URIs in a
       document (without e.g. Schema information).

     o The text in the second edition of XML
       (http://www.w3.org/TR/REC-xml#sec-external-ent) is much
       clearer about how the conversion has to take place;
       unfortunately, it doesn't make clear who should do this
       conversion (the original document producer or the XML
       processor). The idea was not to change this for the second
       edition, but somehow it got lost. I'm following up on this.
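
  The conversion described in the quoted XML text (represent each
  non-ASCII character in UTF-8, then escape each byte as %HH) can be
  sketched in Python as follows.  The function name escape_non_ascii
  and the example address are hypothetical;  the sketch leaves all
  ASCII characters untouched and performs no other URI validation.

```python
def escape_non_ascii(uri):
    """Percent-escape every non-ASCII character of `uri` via UTF-8.

    ASCII characters, including any existing %HH escapes, are copied
    through unchanged.
    """
    pieces = []
    for ch in uri:
        if ord(ch) < 128:
            pieces.append(ch)
        else:
            # One %HH triplet per byte of the character's UTF-8 form.
            pieces.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(pieces)


# Made-up address containing U+00E9 (e with acute accent) twice.
escaped = escape_non_ascii("mailto:r\u00e9sum\u00e9@example.org")
```

  Here U+00E9 is coded in UTF-8 as the bytes C3 A9, so each
  occurrence becomes "%C3%A9" in the escaped form.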


7. Security considerations

  This document for the most part describes an alternative coding of
  an existing message structure, and is not believed to introduce any
  security exposure that is not already inherent in existing systems.

  MIME-based messages may be protected using existing MIME security
  frameworks, such as S/MIME [12], OpenPGP [13], etc.

  Using a non-MIME, pure XML message format means that alternative
  security frameworks may be applicable, such as XML digital
  signatures [14].

  Note that this framework is not designed to allow the conversion of
  message formats (e.g. between RFC822 and XML) while preserving
  signatures or other security information.  However, if a signature
  is applied within a MIME body part, and that body part is moved
  unchanged to a message with a different header format, then the
  signature may be expected to remain intact.


8. Acknowledgements

  The author thanks the following for their comments and/or
  contributions:  Harald Alvestrand, Dave Crocker, Simon Josefsson,
  [[[...]]].


9. References

[1]  Crocker, D.,
     "Standard for the format of ARPA Internet text messages",
     RFC 822, STD 11,
     August 1982.

[2]  Tim Bray, Jean Paoli, and C. M. Sperberg-McQueen,
     "Extensible Markup Language (XML) 1.0",
     W3C recommendation: <http://www.w3.org/TR/REC-xml>,
     10 February 1998.

[3]  Berners-Lee, T., Fielding, R.T. and L. Masinter,
     "Uniform Resource Identifiers (URI): Generic Syntax",
     RFC 2396,
     August 1998.

[4]  Weider, C., Preston, C., Simonsen, K., Alvestrand, H., Atkinson,
     R., Crispin, M., Svanberg, P.,
     "Report from the IAB Character Set Workshop",
     RFC 2130,
     April 1997.

     Alvestrand, H.,
     "IETF Policy on Character Sets and Languages",
     RFC 2277, BCP 18,
     January 1998.

     Freed, N., and J. Postel,
     "IANA Charset Registration Procedures",
     BCP 19, RFC 2278,
     January 1998.

[[[Is there a more definitive reference?]]]

[5]  Alvestrand, H.,
     "Tags for the Identification of Languages",
     RFC 1766,
     March 1995.
     (Defines Content-language header.)

[6]  Freed, N. and N. Borenstein,
     "Multipurpose Internet Mail Extensions (MIME) Part One: Format of
     Internet Message Bodies",
     RFC 2045,
     November 1996.

[7]  Freed, N. and N. Borenstein,
     "Multipurpose Internet Mail Extensions (MIME) Part Two: Media
     Types",
     RFC 2046,
     November 1996.

[8]  Freed, N., Klensin, J., and J. Postel,
     "Multipurpose Internet Mail Extensions (MIME) Part Four:
     Registration Procedures",
     RFC 2048, BCP 13,
     November 1996.

[9]  Tim Bray, Dave Hollander, and Andrew Layman,
     "Namespaces in XML",
     W3C recommendation: <http://www.w3.org/TR/REC-xml-names>,
     14 January 1999.

[10] Lassila, O. and R. Swick,
     "Resource Description Framework (RDF) Model and Syntax
     Specification",
     W3C recommendation: <http://www.w3.org/TR/REC-rdf-syntax>,
     22 February 1999.

[11] Kohn, D., Murata, M. and S. St.Laurent,
     "XML Media Types",
     draft-murata-xml-09.txt,
     September 2000.
     (Introduces '+XML' content-type naming convention.)

[12] Ramsdell, B.,
     "S/MIME Version 3 Message Specification",
     RFC 2633,
     June 1999.

[13] Callas, J., Donnerhacke, L., Finney, H. and R. Thayer,
     "OpenPGP Message Format",
     RFC 2440,
     November 1998.

[14] Eastlake, D., Reagle, J. and D. Solo,
     "XML-Signature Syntax and Processing",
     Work in progress: <draft-ietf-xmldsig-core-09.txt>,
     August 2000.

[15] Levinson, E.,
     "The MIME Multipart/Related Content-type",
     RFC 2387,
     August 1998.

[16] Levinson, E.,
     "Content-ID and Message-ID Uniform Resource Locators",
     RFC 2392,
     August 1998.

[17] Daniel, R., DeRose, S. and E. Maler,
     "XML Pointer Language (XPointer) Version 1.0",
     W3C Candidate Recommendation: <http://www.w3.org/TR/xptr>,
     7 June 2000.

[18] Fallside, D.,
     "XML Schema Part 0: Primer",
     W3C Working Draft: <http://www.w3.org/TR/xmlschema-0/>,
     22 September 2000.

     Thompson, H., Beech, D., Maloney, M., and N. Mendelsohn,
     "XML Schema Part 1: Structures",
     W3C Working Draft: <http://www.w3.org/TR/xmlschema-1/>,
     22 September 2000.

     Biron, P. and A. Malhotra,
     "XML Schema Part 2: Datatypes",
     W3C Working Draft: <http://www.w3.org/TR/xmlschema-2/>,
     22 September 2000.

[19] Bradner, S.,
     "Key words for use in RFCs to Indicate Requirement Levels",
     RFC 2119,
     March 1997.

[20] Mealling, M., Masinter, L., Hardie, T., and G. Klyne,
     "A URN Sub-namespace for Registered Protocol Parameters",
     draft-mealling-iana-urn-01.txt (work in progress),
     August 2001.

[21] Hoffman, P., Masinter, L., and J. Zawinski,
     "The mailto URL scheme",
     RFC 2368,
     July 1998.

[22] Klyne, G.,
     "Indicating Media Features for MIME Content",
     RFC 2912,
     September 2000.

[23] Brickley, D. and R. V. Guha,
     "Resource Description Framework (RDF) Schema Specification",
     W3C Candidate Recommendation: <http://www.w3.org/TR/PR-rdf-schema>,
     27 March 2000.

[24] Mealling, M.,
     "The IETF XML Registry",
     draft-mealling-iana-xmlns-registry-02.txt (work in progress),
     June 2001.


10. Author's address

  Graham Klyne
  MIMEsweeper Group
  Clearswift Corporation
  1310 Waterside
  Arlington Business Park
  Theale
  Reading, RG7 4SA
  United Kingdom

  Telephone: +44 11 8903 8903
  E-mail:    Graham.Klyne@MIMEsweeper.com


Appendix A: Message/Email+XML content-type registration

  [[[TBD]]]


Appendix B: DTD for Email+XML message format

  [[[TBD]]]


Appendix C: XML schema for Email+XML message format

  [[[TBD]]]


Appendix D: RDF representation of Email+XML message

  The message header format described here is designed to be
  compatible with RDF [10].  To prepare a message header for
  presentation to an RDF processor, it should be enclosed in an
  <rdf:RDF> element having an appropriate RDF namespace declaration.
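
  A minimal sketch of this wrapping step, using Python's standard
  xml.etree.ElementTree module, is given below.  The function name
  wrap_in_rdf is hypothetical, and the un-namespaced <Message>
  element merely stands in for a real Email+XML message header.

```python
import xml.etree.ElementTree as ET

# Namespace name of the RDF syntax vocabulary.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def wrap_in_rdf(header_element):
    """Return a new <rdf:RDF> element that encloses `header_element`."""
    # Make sure the conventional 'rdf' prefix is used when serializing.
    ET.register_namespace("rdf", RDF_NS)
    root = ET.Element("{%s}RDF" % RDF_NS)
    root.append(header_element)
    return root


# Placeholder header (assumption): real headers carry namespaced
# elements as described in the body of this specification.
header = ET.Element("Message")
wrapped = wrap_in_rdf(header)
serialized = ET.tostring(wrapped, encoding="unicode")
```

  The serialized form carries the RDF namespace declaration on the
  enclosing element, which is what an RDF processor needs in order to
  recognize the wrapper.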

  In RDF terms, the message header is a resource, having a property
  arc for each header element and also one for the message content.

  Here is an informal representation of the RDF graph corresponding
  to the message example from section 2.3:

     [<Message>]
       |
       +--rfc822:from--> [<Address>]
       |                   |
       |        -----------
       |       |
       |       +--adrs-->"im:Eeyore@ThistlyCorner.100Aker.org"
       |       +--name-->"Eeyore"
       |
       +--rfc822:to-------> [<Group>]
       |                      |
       |                      +--name--> "Anyone"
       |
       +--rfc822:subject--> "Why?"
       |
       +--content--> "Wherefore?
                      Inasmuch as which?"
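
  The same graph can be written down as a list of (subject,
  property, value) triples.  The Python sketch below is illustrative
  only:  the blank-node labels, the TRIPLES list and the objects
  helper are hypothetical, property names are copied from the diagram
  without expanding any namespace prefix, and the exact whitespace of
  the content literal is an assumption.

```python
# Hypothetical labels for the three unnamed nodes in the diagram.
MESSAGE, ADDRESS, GROUP = "_:message", "_:address", "_:group"

TRIPLES = [
    (MESSAGE, "rfc822:from", ADDRESS),
    (ADDRESS, "adrs", "im:Eeyore@ThistlyCorner.100Aker.org"),
    (ADDRESS, "name", "Eeyore"),
    (MESSAGE, "rfc822:to", GROUP),
    (GROUP, "name", "Anyone"),
    (MESSAGE, "rfc822:subject", "Why?"),
    (MESSAGE, "content", "Wherefore?\nInasmuch as which?"),
]


def objects(subject, prop):
    """Return every value reachable from `subject` along `prop`."""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]


# Follow the 'from' arc, then read the name of the address node.
sender_node = objects(MESSAGE, "rfc822:from")[0]
sender_name = objects(sender_node, "name")
```

  Each arc in the diagram corresponds to exactly one triple, so the
  message node has four outgoing properties.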

  There is a subtle difference in the RDF form of a message with
  inline content and one that references a separate content object.
  Both have a 'content' property, but the kind of value differs:  if
  the content is defined externally, the value of the 'content'
  property is an RDF resource containing the content;  when the
  content is inline, the property value is an RDF literal.

  If inline message content contains XML markup, to ensure complete
  RDF compatibility the 'content' element should carry the attribute
  'parseType="Literal"', to prevent the RDF processor from trying to
  interpret the content as RDF.


Appendix E: RDF schema for Email+XML message format

  [[[TBD]]]


Appendix F: Amendment history

  00a  13-Oct-2000  Memo initially created.

  00b  16-Oct-2000  Add reference to XML spec note about non-ASCII
                    text in a URI.

  00c  18-Oct-2000  Change RFC822|XML to RFC822+XML (per later XML-
                    MIME spec).

  01a  04-Jan-2001  Change draft title and message format name.
                    Indicate that this is not an exact coding of
                    RFC822 messages, but an attempt to capture their
                    essential semantics.  Change syntax of address
                    elements to be RDF compliant.

  01b  10-Jan-2001  Add RFC822 group structure to address format.
                    Distinguish between headers that allow group
                    values and those that allow simple addresses.  Use
                    separate namespaces for message structure and
                    headers derived from RFC822.  Add brief discussion
                    of RDF compatibility.

  01c  12-Jan-2001  Add discussion list details.

  02a  19-Jan-2001  Add clarification to security considerations that
                    message signatures are not generally expected to
                    survive any message format conversion.

  02b  10-Sep-2001  Update contact details.  Update proposed namespace
                    names in line with [20].  Update proposed DTD
                    name, per [24] (new reference).

  03a  09-Apr-2002  Update contact details.  Change name of
                    'seeNoEvil' attribute to 'mustUnderstand'.


Appendix G: Outstanding issues

  o  Review namespace URIs.

  o  Review MIME type name.  (Message/XML?  Application/Message+XML?)

  o  Allow more flexible use of RDF syntax to reduce verbosity (but
     increase number of different ways of expressing some constructs
     in XML;  e.g. adrs and name attributes for <Address>)?

  o  Clarify effect of namespaces (or not) on element attribute names.
     XML attributes do not follow the same default namespace rules as
     elements.

  o  Define DTD, XML schema and RDF schema.

  o  Finalize IANA considerations.


Full copyright statement

  Copyright (C) The Internet Society 2001.  All Rights Reserved.

  This document and translations of it may be copied and furnished to
  others, and derivative works that comment on or otherwise explain
  it or assist in its implementation may be prepared, copied,
  published and distributed, in whole or in part, without restriction
  of any kind, provided that the above copyright notice and this
  paragraph are included on all such copies and derivative works.
  However, this document itself may not be modified in any way, such
  as by removing the copyright notice or references to the Internet
  Society or other Internet organizations, except as needed for the
  purpose of developing Internet standards in which case the
  procedures for copyrights defined in the Internet Standards process
  must be followed, or as required to translate it into languages
  other than English.

  The limited permissions granted above are perpetual and will not be
  revoked by the Internet Society or its successors or assigns.

  This document and the information contained herein is provided on
  an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
  ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
  IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
  THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
  WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.