Network Working Group Abel, Ed.
Internet-Draft TWNIC
Expires: October 28, 2007 April 26, 2007
Internationalized Email Headers
draft-ietf-eai-utf8headers-05.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on October 28, 2007.
Copyright Notice
Copyright (C) The IETF Trust (2007).
Abstract
Full internationalization of electronic mail requires not only the
capability to transmit non-ASCII content, to encode selected
information in specific header fields, and to use non-ASCII
characters in envelope addresses. It also requires being able to
express those addresses and information based on them in mail header
fields. This document specifies an experimental variant of Internet
mail that permits the use of Unicode encoded in UTF-8, rather than
ASCII, as the base form for Internet email header field bodies. This
form is permitted in transmission only if authorized by an SMTP
Abel Expires October 28, 2007 [Page 1]
Internet-Draft I18N Email Headers April 2007
extension, as specified in an associated specification.
Table of Contents
1. Introduction
1.1. Role of this specification
2. Background and History
3. Terminology
4. Changes on Message Header Fields
4.1. UTF8 Syntax
4.2. Changes on MIME headers
4.3. Syntax extensions to RFC 2822
4.4. Change on addr-spec syntax
4.5. Trace field syntax
4.6. UTF8SMTP message
5. Additional issues
5.1. Mailing list header fields
6. Security Considerations
7. IANA considerations
8. Acknowledgements
9. Edit history
9.1. draft-ietf-eai-utf8header-05
9.2. draft-ietf-eai-utf8header-04
9.3. draft-ietf-eai-utf8header-03
9.4. draft-ietf-eai-utf8header-02
9.5. draft-ietf-eai-utf8header-01
9.6. draft-ietf-eai-utf8header-00
9.7. draft-yeh-ima-utf8header-01
10. References
10.1. Normative References
10.2. Informative References
Author's Address
Intellectual Property and Copyright Statements
1. Introduction
1.1. Role of this specification
Full internationalization of electronic mail requires several
capabilities:
o The capability to transmit non-ASCII content, provided for as part
of the basic MIME specification [RFC2045], [RFC2046].
o The capability to express non-ASCII addresses, and information
related to and based on them, in mail header fields, defined in this
document. And, finally,
o The capability to use international characters in envelope
addresses, discussed in [EAI-overview] and specified in
[EAI-SMTP-extension].
This document specifies an experimental variant of Internet mail that
permits the use of Unicode encoded in UTF-8 [RFC3629], rather than
ASCII, as the base form for Internet email header fields. This form
is permitted in transmission, if authorized by the SMTP extension
specified in [EAI-SMTP-extension] or by other transport mechanisms
capable of processing it.
2. Background and History
Mailbox names often represent the names of human users. Many of
these users throughout the world have names that are not normally
expressed with just the ASCII repertoire of characters, and would
like to use more or less their real names in their mailbox names.
These users are also likely to use non-ASCII text in their common
names and subjects of email messages, both in what they send and what
they receive. This protocol specifies UTF-8 as the encoding to
represent email header field bodies.
The traditional format of email messages [RFC2822] allows only ASCII
characters in the header fields of messages. This prevents users
from having email addresses that contain non-ASCII characters. It
further forces non-ASCII text in common names, comments, and in free
text (such as in the Subject: field) to be encoded (as required by
MIME format [RFC2047]). This specification describes a change to the
email message format that is related to the SMTP message transport
change described in the associated documents [EAI-overview] and
[EAI-SMTP-extension], and that allows non-ASCII characters in most
email header fields. These changes affect SMTP clients, SMTP servers,
mail user agents (MUAs), list expanders, gateways to other media, and
all other processes that parse or handle email messages.
As specified in [EAI-SMTP-extension], an SMTP protocol extension
"UTF8SMTP" is used to prevent the transmission of messages with UTF-8
header fields to systems that cannot handle such messages.
Use of this SMTP extension helps prevent the introduction of such
messages into message stores that might misinterpret, improperly
display, or mangle such messages. It should be noted that using an
ESMTP extension does not prevent the transfer of email messages with
UTF-8 header fields to other systems that use the email format for
messages and that may not have been upgraded, such as POP and IMAP
servers. Those protocols also need to be changed in order to handle
stored messages that have UTF-8 header fields.
The objective for this protocol is to allow UTF-8 in email header
fields. Issues about how to handle messages that contain UTF-8
header fields but are proposed to be delivered to systems that have
not been upgraded to support this capability are discussed elsewhere,
particularly in [EAI-downgrading].
3. Terminology
In this document, even ordinary ASCII characters are considered UTF-8
characters if they appear in header field bodies that contain
<UTF8-xtra-char>s.
Unless otherwise noted, all terms used here are defined in [RFC2821]
or [RFC2822] or in [EAI-overview].
The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
and "MAY" in this document are to be interpreted as described in
[RFC2119].
This document is being discussed on the ima mailing list. See
https://www1.ietf.org/mailman/listinfo/ima for information about
subscribing. The list's archive is at
http://www1.ietf.org/mail-archive/web/ima/index.html.
4. Changes on Message Header Fields
SMTP clients can send header fields in UTF-8 format, if the UTF8SMTP
extension is advertised by the SMTP server or as permitted by other
transport mechanisms.
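As a rough illustration of the condition above, a client could inspect the server's EHLO reply for the UTF8SMTP keyword before deciding to send UTF-8 header fields. The Python sketch below is only an assumption about how such a check might look; the function name server_offers_utf8smtp and the sample reply are hypothetical, and the authoritative negotiation rules are those in [EAI-SMTP-extension].

```python
def server_offers_utf8smtp(ehlo_reply: str) -> bool:
    """Return True if a multi-line EHLO reply advertises UTF8SMTP.

    Each reply line looks like "250-KEYWORD [params]" or "250 KEYWORD [params]".
    The first line is the server greeting, not an extension keyword.
    """
    lines = ehlo_reply.splitlines()
    for line in lines[1:]:
        # Skip the three-digit reply code and the "-" or " " separator.
        rest = line[4:].strip()
        if not rest:
            continue
        keyword = rest.split()[0]
        # EHLO keywords are compared case-insensitively.
        if keyword.upper() == "UTF8SMTP":
            return True
    return False


# Hypothetical reply used only for illustration.
sample = "250-mx.example.net greets client\r\n250-8BITMIME\r\n250 UTF8SMTP\r\n"
print(server_offers_utf8smtp(sample))
```

A client that gets False here would fall back to ASCII-only header fields or to the downgrading procedure, depending on local policy.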
This protocol does NOT change the definition of header field names.
That is, only the bodies of header fields are allowed to have UTF-8
characters; the rules in [RFC2822] for header names are not changed.
To permit UTF-8 characters in field values, the header definition in
[RFC2822] must be extended to support the new format. The following
ABNF is defined to replace the corresponding definitions in [RFC2822].
Syntax rules not mentioned in this section retain their original
definitions in [RFC2822].
4.1. UTF8 Syntax
UTF-8 characters can be defined in terms of octets using the
following ABNF, taken from [RFC3629]:
   UTF8-xtra-char = UTF8-2 / UTF8-3 / UTF8-4
   UTF8-2         = %xC2-DF UTF8-tail
   UTF8-3         = %xE0 %xA0-BF UTF8-tail /
                    %xE1-EC 2(UTF8-tail) /
                    %xED %x80-9F UTF8-tail /
                    %xEE-EF 2(UTF8-tail)
   UTF8-4         = %xF0 %x90-BF 2(UTF8-tail) /
                    %xF1-F3 3(UTF8-tail) /
                    %xF4 %x80-8F 2(UTF8-tail)
   UTF8-tail      = %x80-BF
These are taken from [RFC3629], but kept in this document for reasons
of convenience.
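The octet ranges can be checked mechanically. The following Python sketch builds a byte-level regular expression for a single non-ASCII UTF-8 character, following the UTF-8 ABNF in [RFC3629]. The names TAIL, UTF8_XTRA_CHAR, and is_utf8_xtra_char are illustrative assumptions, not part of this specification.

```python
import re

# Byte-level building blocks, following the UTF-8 ABNF in RFC 3629.
TAIL = rb"[\x80-\xBF]"
UTF8_2 = rb"[\xC2-\xDF]" + TAIL
UTF8_3 = (
    rb"\xE0[\xA0-\xBF]" + TAIL
    + rb"|[\xE1-\xEC]" + TAIL + TAIL
    + rb"|\xED[\x80-\x9F]" + TAIL
    + rb"|[\xEE-\xEF]" + TAIL + TAIL
)
UTF8_4 = (
    rb"\xF0[\x90-\xBF]" + TAIL + TAIL
    + rb"|[\xF1-\xF3]" + TAIL + TAIL + TAIL
    + rb"|\xF4[\x80-\x8F]" + TAIL + TAIL
)
UTF8_XTRA_CHAR = re.compile(rb"(?:" + UTF8_2 + rb"|" + UTF8_3 + rb"|" + UTF8_4 + rb")")


def is_utf8_xtra_char(octets: bytes) -> bool:
    """True if `octets` is exactly one well-formed non-ASCII UTF-8 character."""
    return UTF8_XTRA_CHAR.fullmatch(octets) is not None


print(is_utf8_xtra_char("é".encode("utf-8")))
print(is_utf8_xtra_char(b"\xC0\x80"))  # overlong encoding of NUL
```

Working on octets rather than decoded strings keeps overlong forms and encoded surrogates out, which a naive "any byte above 0x7F" test would accept.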
[Note in draft: A statement on whether normalization is needed will be
placed here.]
4.2. Changes on MIME headers
The syntax of <value>, defined in [RFC2045], is extended as follows:
   value = token / utf8-quoted-string
To be able to use UTF-8 characters in MIME headers, the <quoted-string>
syntax is extended as follows:
   qcontent = utf8-qtext / utf8-quoted-pair
Observe that Content-Type and other such header fields may be found
both among the top-level fields of a message and within multiparts,
and that a complete message conforming to this document may now appear
as a message/rfc822 part (in both cases, subject to downgrading when
that is necessary).
4.3. Syntax extensions to RFC 2822
The following rules are intended to extend the corresponding rules in
[RFC2822] to allow UTF-8 characters.
   ctext   = NO-WS-CTL /      ; all of <text> except
             %d33-39 /        ; SP, HTAB, "(", ")"
             %d42-91 /        ; and "\"
             %d93-126 /
             UTF8-xtra-char
   utext   = NO-WS-CTL /      ; Non white space controls
             %d33-126 /       ; The rest of US-ASCII
             UTF8-xtra-char
   comment = "(" *([FWS] utf8-ccontent) [FWS] ")"
   word    = utf8-atom / utf8-quoted-string
This means that all the [RFC2822] constructs that build upon these
will permit UTF-8 characters, including comments and quoted strings.
In addition, allowing UTF-8 characters in <addr-spec> would normally
require changing the syntax of <atext>. However, that change would also
allow UTF-8 characters in <msg-id>, which is not permitted due to the
limitation described in Section 4.5. A separate rule, <utf8-atext>, is
therefore added to meet this requirement.
   utf8-text          = %d1-9 /         ; all UTF-8 characters except
                        %d11-12 /       ; US-ASCII NUL, CR and LF
                        %d14-127 /
                        UTF8-xtra-char
   utf8-quoted-pair   = ("\" utf8-text) / obs-qp
   utf8-qcontent      = utf8-qtext / utf8-quoted-pair
   utf8-quoted-string = [CFWS]
                        DQUOTE *([FWS] utf8-qcontent) [FWS] DQUOTE
                        [CFWS]
   utf8-ccontent      = ctext / utf8-quoted-pair / comment
   utf8-qtext         = qtext / UTF8-xtra-char
   qtext              = NO-WS-CTL /     ; all of <text> except
                        %d33 /          ; The rest of the US-ASCII
                        %d35-91 /       ; characters not including "\"
                        %d93-126        ; or the quote character
   utf8-atext         = ALPHA / DIGIT /
                        "!" / "#" /     ; Any character except
                        "$" / "%" /     ; controls, SP, and specials.
                        "&" / "'" /     ; Used for atoms
                        "*" / "+" /
                        "-" / "/" /
                        "=" / "?" /
                        "^" / "_" /
                        "`" / "{" /
                        "|" / "}" /
                        "~" /
                        UTF8-xtra-char
   utf8-atom          = [CFWS] 1*utf8-atext [CFWS]
   utf8-dot-atom      = [CFWS] utf8-dot-atom-text [CFWS]
   utf8-dot-atom-text = 1*utf8-atext *("." 1*utf8-atext)
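To make the <utf8-atext> and <utf8-dot-atom-text> rules concrete, the Python sketch below checks a string against them. It is a simplified illustration under stated assumptions: the input is assumed to have been decoded from valid UTF-8 already, so any code point above U+007F stands for a <UTF8-xtra-char>, and surrounding CFWS is not handled. The function names are hypothetical.

```python
import string

# ASCII characters allowed by <utf8-atext>: ALPHA / DIGIT / the listed symbols.
ASCII_ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_utf8_atext(ch: str) -> bool:
    # Assumption: the text was decoded from valid UTF-8, so every
    # non-ASCII code point corresponds to one UTF8-xtra-char.
    return ch in ASCII_ATEXT or ord(ch) > 0x7F


def is_utf8_dot_atom_text(text: str) -> bool:
    """utf8-dot-atom-text = 1*utf8-atext *("." 1*utf8-atext)"""
    labels = text.split(".")
    # Every dot-separated run must be non-empty and consist of utf8-atext only.
    return all(label and all(is_utf8_atext(c) for c in label) for label in labels)


print(is_utf8_dot_atom_text("用户.名"))
print(is_utf8_dot_atom_text("a..b"))
```

Splitting on "." and rejecting empty runs enforces the "1*" repetition on both sides of each dot, so leading, trailing, and doubled dots all fail.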
[NOTE IN DRAFT: If any header needs to be restricted to disallow
this, please raise the issue on the mailing list.]
Note, however, that this does not remove any constraint on the
character set of protocol elements; for instance, all the allowed
values for timezone in Date: header fields are still expressed in
ASCII. Likewise, none of this revised syntax affects what is allowed in
a <msg-id>, which remains pure ASCII.
4.4. Change on addr-spec syntax
Internationalized email addresses are represented in UTF-8. Thus,
all header fields containing <mailbox>es are updated to permit UTF-8
as well as an additional, optional all-ASCII alternate address. Note
that MSAs and MTAs may need to downgrade internationalized messages.
The procedure for doing so is described in [EAI-downgrading].
   mailbox         = name-addr / addr-spec / utf8-addr-spec
   angle-addr      = [CFWS] "<" utf8-addr-spec <alt-address> ">" [CFWS] /
                     [CFWS] "<" utf8-addr-spec ">" [CFWS] /
                     [CFWS] utf8-addr-spec [CFWS]
   utf8-addr-spec  = utf8-local-part "@" utf8-domain
   utf8-local-part = utf8-dot-atom / utf8-quoted-string / obs-local-part
   utf8-domain     = utf8-dot-atom / domain-literal / obs-domain
   alt-address     = [CFWS] "<" addr-spec ">" [CFWS]
A few possible <mailbox> representations are listed below as examples.
   "DISPLAY_NAME" <ASCII@ASCII>
      ; traditional mailbox format
   "DISPLAY_NAME" <non-ASCII@non-ASCII>
      ; UTF8SMTP but no ALT-ADDRESS parameter provided,
      ; message will bounce if UTF8SMTP extension is not supported
   non-ASCII@non-ASCII
      ; without DISPLAY_NAME and quoted string
      ; UTF8SMTP but no ALT-ADDRESS parameter provided,
      ; message will bounce if UTF8SMTP extension is not supported
   "DISPLAY_NAME" <non-ASCII@non-ASCII<ASCII@ASCII>>
      ; UTF8SMTP with ALT-ADDRESS parameter provided,
      ; ALT-ADDRESS can be used if downgrade is necessary
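The mailbox forms listed above can be pulled apart with a small parser. The Python sketch below is deliberately simplified and is not a full implementation of the grammar: it ignores CFWS, comments, quoted local parts, and obsolete forms, and the Mailbox type, the parse_mailbox function, and the sample addresses are all hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Mailbox(NamedTuple):
    display_name: Optional[str]
    address: str                 # primary address, possibly UTF-8
    alt_address: Optional[str]   # optional all-ASCII alternate


_PATTERN = re.compile(
    r'^\s*(?:"(?P<name>[^"]*)"\s*)?'   # optional quoted display name
    r'<(?P<addr>[^<>\s]+)'             # primary address
    r'(?:<(?P<alt>[^<>\s]+)>)?'        # optional nested alt-address
    r'>\s*$'
)


def parse_mailbox(text: str) -> Mailbox:
    m = _PATTERN.match(text)
    if m:
        alt = m.group("alt")
        if alt is not None and not alt.isascii():
            raise ValueError("alt-address must be all-ASCII")
        return Mailbox(m.group("name"), m.group("addr"), alt)
    # Bare form: an address with no display name and no angle brackets.
    bare = text.strip()
    if "@" in bare and not any(c in bare for c in '<>" '):
        return Mailbox(None, bare, None)
    raise ValueError(f"unrecognized mailbox: {text!r}")


print(parse_mailbox('"DISPLAY_NAME" <用户@例子.example<user@example.com>>'))
```

Keeping the alternate address as a separate field lets a downgrading step pick it up without re-parsing the header field.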
4.5. Trace field syntax
"For" fields containing internationalized addresses are allowed, by
use of the new uFor syntax. UTF-8 information is needed in Received
fields and is therefore allowed, to preserve the integrity of those
fields. The uFor syntax retains the original UTF-8 email address
between EAI-aware MTAs. Note that, should downgrading be required, the
uFor parameter is dropped per the procedure specified in
[EAI-downgrading].
The "Return-Path" header provides the return address for mail
delivery. Thus, it MUST be able to carry UTF-8 addresses (see the
revised syntax of <angle-addr> in Section 4.4 of this document).
This will not break the rule of trace field integrity, because it is
added at the last MTA.
4.6. UTF8SMTP message
Certain messages must be transmitted only if the SMTP extension
specified in [EAI-SMTP-extension] is supported, or if the environment
otherwise supports them. These messages are called UTF8SMTP messages.
A message is a "UTF8SMTP message" if
o it uses UTF-8 header fields, as specified in this document, in the
message header, or
o it uses UTF-8 header fields in MIME header blocks in the message
body, or
o it contains a message/rfc822 part that is itself a UTF8SMTP message,
or
o it includes MIME parts with new MIME subtypes that are, by their
definitions, only permitted in UTF8SMTP messages.
The object defined in [EAI-dsn] is intended to be transportable as
part of an ordinary [RFC2822] message. This means that if a message
includes UTF8SMTP messages, which are carried in MIME subtypes of
"message", the message itself is a UTF8SMTP message. The media type
for UTF8SMTP messages is defined in [EAI-dsn].
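The classification described in this section is recursive, which a short sketch may make clearer. The Python code below assumes a hypothetical, much simplified Part structure (raw header lines plus nested parts); it is not tied to any real MIME library. The set UTF8SMTP_ONLY_TYPES is left empty on purpose, because this document does not enumerate the media types that are restricted to UTF8SMTP messages.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Part:
    """Hypothetical, simplified view of a message or MIME body part."""
    headers: List[bytes]                                   # raw header lines
    content_type: str = "text/plain"
    children: List["Part"] = field(default_factory=list)  # nested parts or embedded message

# Placeholder: media types only permitted in UTF8SMTP messages (not listed here).
UTF8SMTP_ONLY_TYPES: frozenset = frozenset()


def has_utf8_header(part: Part) -> bool:
    # Any octet above 0x7F in a header line means the header is not pure ASCII.
    return any(any(b > 0x7F for b in line) for line in part.headers)


def is_utf8smtp_message(part: Part) -> bool:
    if has_utf8_header(part):
        return True
    if part.content_type.lower() in UTF8SMTP_ONLY_TYPES:
        return True
    # MIME header blocks in the body and embedded messages are checked recursively.
    return any(is_utf8smtp_message(child) for child in part.children)


inner = Part(headers=["Subject: 你好".encode("utf-8")])
outer = Part([b"Content-Type: message/rfc822"], "message/rfc822", [inner])
print(is_utf8smtp_message(outer))
```

The recursion means a single UTF-8 header field anywhere in the tree marks the whole enclosing message, which matches the containment rule stated above.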
5. Additional issues
This section identifies issues that are not covered as part of this
set of specifications, but that will need to be considered as part of
UTF8SMTP deployment.
This document does not specify any requirement for normalization.
Prudent use of UTF-8 in identifiers will involve sharply restricted
forms, for instance case-folded NFKC, but this document does not
require such a form anywhere in the protocol. [Note in draft:
Whether this non-requirement is adequate is a subject for debate].
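For readers unfamiliar with the term, the Python sketch below shows one possible reading of "case-folded NFKC", using the standard unicodedata module. It is an illustration only: this document does not require any such form, and the helper name nfkc_casefold is an assumption.

```python
import unicodedata


def nfkc_casefold(identifier: str) -> str:
    """Apply NFKC, case-fold, then re-apply NFKC to the folded result."""
    folded = unicodedata.normalize("NFKC", identifier).casefold()
    return unicodedata.normalize("NFKC", folded)


# Full-width letters and mixed case collapse to one restricted form.
print(nfkc_casefold("ＵＳＥＲ") == nfkc_casefold("user"))
```

The second normalization pass is there because case folding can itself produce sequences that are not in NFKC.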
5.1. Mailing list header fields
All header fields related to mailing lists and mail redistribution are
discussed in [EAI-mailing-list].
6. Security Considerations
If a user has a non-ASCII mailbox address and an ASCII mailbox
address, a digital certificate that identifies that user may have
both addresses in the identity. Having multiple email addresses as
identities in a single certificate is already supported in PKIX and
OpenPGP.
Because UTF-8 often requires several octets to encode a single
character, internationalized local parts may cause mail addresses to
become longer. As specified in [RFC2822], each line of characters
MUST be no more than 998 octets, excluding the CRLF.
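Because the limit is counted in octets rather than characters, a line that looks short can still exceed it. The Python sketch below illustrates the difference; the constant and function names are assumptions made for this example.

```python
MAX_LINE_OCTETS = 998  # limit from [RFC2822], excluding the CRLF


def line_fits(line: str) -> bool:
    """Check a header line (without its CRLF) against the octet limit."""
    return len(line.encode("utf-8")) <= MAX_LINE_OCTETS


ascii_line = "To: " + "a" * 400
cjk_line = "To: " + "漢" * 400  # same character count, three octets per character

print(len(ascii_line), line_fits(ascii_line))
print(len(cjk_line), line_fits(cjk_line))
```

Both lines contain 404 characters, but the second encodes to 1204 octets and would need to be folded.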
In this specification, a user could provide an ASCII alternative
address for a non-ASCII address. However, it is possible that these two
addresses go to different mailboxes, or even to different persons. This
might not be a protocol problem; it could reflect the user's personal
choice or an administrative policy, or it could be a deliberate attempt
to deceive or cause confusion.
7. IANA considerations
There are no IANA considerations in this document.
8. Acknowledgements
This document was created by incorporating a good deal of material
from an old Internet Draft by Paul Hoffman [Hoffman-utf8-headers].
While many of the concepts and details have changed, the
contributions from that draft are greatly appreciated.
Most of the content of this document was provided by John C Klensin.
In addition, significant comments and suggestions were received from
Charles H. Lindsey, Kari Hurtta, Chris Newman, Yangwoo KO, Yoshiro
YONEYA, and other members of the JET team, and were incorporated into
the document. The editor sincerely thanks them for their
contributions.
9. Edit history
This section is used for tracking updates to this document. It will
be removed when the document is finalized.
9.1. draft-ietf-eai-utf8header-05
1. ABNF revised.
2. Remove the original Section 4 (Pre-requirement).
3. Add Section 4.6.
9.2. draft-ietf-eai-utf8header-04
1. ABNF revised.
2. Modify uFor description in Section 4.5.
9.3. draft-ietf-eai-utf8header-03
1. Editorial changes on terms and English.
2. ABNF revised.
3. addr-spec change: put ALT-ADDRESS inside "<" and ">", quoted with
"<" and ">".
4. Remove the "Header-Type" header.
5. Add uFor description in Section 4.5.
6. Remove the content in IANA considerations since "Header-Type" is
removed.
9.4. draft-ietf-eai-utf8header-02
1. Editorial changes on terms and English.
2. Change the header name "UTF8SMTP" to "Header-Type", and ABNF
revised.
3. addr-spec change: put ALT-ADDRESS inside "<" and ">", quoted with
"[" and "]".
4. IANA considerations section rewritten into the registration
templates specified in [RFC3864].
9.5. draft-ietf-eai-utf8header-01
1. ABNF revised.
2. Terminology synchronized with the overview document.
3. addr-spec change: put ALT-ADDRESS inside "<" and ">", quoted with
"{" and "}".
4. Add IANA considerations to register the new 2822 header
"UTF8SMTP".
5. Add Security considerations about the relation of a UTF8SMTP
address to ALT-ADDRESS.
9.6. draft-ietf-eai-utf8header-00
1. ABNF added.
2. Editorial changes.
3. Submitted as a WG document.
9.7. draft-yeh-ima-utf8header-01
1. Section re-arranged.
2. Remove content that does not belong in this document.
10. References
10.1. Normative References
[ASCII] American National Standards Institute (formerly United
States of America Standards Institute), "USA Code for
Information Interchange", ANSI X3.4-1968, 1968.
ANSI X3.4-1968 has been replaced by newer versions with
slight modifications, but the 1968 version remains
definitive for the Internet.
[EAI-SMTP-extension]
Yao, J., Ed. and W. Mao, "SMTP extension for
internationalized email address",
draft-ietf-eai-smtpext-04.txt (work in progress),
April 2007.
[EAI-mailing-list]
Gellens, R., "Mailing Lists and Internationalized
Email Addresses", draft-ietf-eai-mailinglist-01.txt (work
in progress), January 2007.
[EAI-overview]
Klensin, J. and Y. Ko, "Overview and Framework of
Internationalized Email Address Delivery",
draft-ietf-eai-framework-05.txt (work in progress),
February 2007.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded
Word Extensions:
Character Sets, Languages, and Continuations", RFC 2231,
November 1997.
[RFC2821] Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
April 2001.
[RFC2822] Resnick, P., "Internet Message Format", RFC 2822,
April 2001.
[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
10646", STD 63, RFC 3629, November 2003.
10.2. Informative References
[EAI-downgrading]
Yoneya, Y., Ed. and K. Fujiwara, Ed.,
"Downgrading mechanism for Internationalized eMail Address
(IMA)", draft-ietf-eai-downgrade-01.txt (work in
progress), March 2007.
[EAI-dsn] Newman, C., "International Delivery and Disposition
Notifications", draft-ietf-eai-dsn-00.txt (work in
progress), January 2007.
[Hoffman-utf8-headers]
Hoffman, P., "SMTP Service Extensions for Transmission of
Headers in UTF-8 Encoding",
draft-hoffman-utf8headers-00.txt (work in progress),
December 2003.
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message
Bodies", RFC 2045, November 1996.
[RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046,
November 1996.
[RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
Part Three: Message Header Extensions for Non-ASCII Text",
RFC 2047, November 1996.
[RFC3864] Klyne, G., Nottingham, M., and J. Mogul, "Registration
Procedures for Message Header Fields", BCP 90, RFC 3864,
September 2004.
[RFC4646] Phillips, A. and M. Davis, "Tags for Identifying
Languages", BCP 47, RFC 4646, September 2006.
Author's Address
Abel Yang (editor)
TWNIC
4F-2, No. 9, Sec 2, Roosvelt Rd.
Taipei, 100
Taiwan
Phone: +886 2 23411313 ext 505
Email: abelyang@twnic.net.tw
Full Copyright Statement
Copyright (C) The IETF Trust (2007).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgment
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).