Email Address Internationalization A. Yang
(EAI) TWNIC
Internet-Draft S. Steele
Obsoletes: 5335 (if approved) Microsoft
Updates: 2045,5322 (if approved) N. Freed
Intended status: Standards Track Oracle
Expires: January 11, 2012 July 10, 2011
Internationalized Email Headers
draft-ietf-eai-rfc5335bis-11
Abstract
Internet mail was originally limited to 7-bit ASCII. MIME added
support for the use of 8-bit character sets in body parts, and also
defined an encoded-word construct so other character sets could be
used in certain header field values. But full internationalization
of electronic mail requires additional enhancements to allow the use
of Unicode, including characters outside the ASCII repertoire, in
mail addresses as well as direct use of Unicode in header fields like
From:, To:, and Subject:, without requiring the use of complex
encoded-word constructs. This document specifies an enhancement to
the Internet Message Format that allows use of Unicode in mail
addresses and most header field content.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 11, 2012.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
Yang, et al. Expires January 11, 2012 [Page 1]
Internet-Draft I18N Email Headers July 2011
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Terminology Used In This Specification . . . . . . . . . . . . 3
3. Changes to Message Header Fields . . . . . . . . . . . . . . . 4
3.1. UTF-8 Syntax and Normalization . . . . . . . . . . . . . . 4
3.2. Syntax Extensions to RFC 5322 . . . . . . . . . . . . . . 5
3.3. Changes to MIME Message Type Encoding Restrictions . . . . 6
3.4. The Message/global Media Type . . . . . . . . . . . . . . 6
4. Security Considerations . . . . . . . . . . . . . . . . . . . 8
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9
7. Edit history . . . . . . . . . . . . . . . . . . . . . . . . . 9
7.1. draft-ietf-eai-rfc5335bis-00 . . . . . . . . . . . . . . . 9
7.2. draft-ietf-eai-rfc5335bis-01 . . . . . . . . . . . . . . . 10
7.3. draft-ietf-eai-rfc5335bis-02 . . . . . . . . . . . . . . . 10
7.4. draft-ietf-eai-rfc5335bis-03 . . . . . . . . . . . . . . . 10
7.5. draft-ietf-eai-rfc5335bis-04 . . . . . . . . . . . . . . . 10
7.6. draft-ietf-eai-rfc5335bis-05 . . . . . . . . . . . . . . . 10
7.7. draft-ietf-eai-rfc5335bis-06 . . . . . . . . . . . . . . . 10
7.8. draft-ietf-eai-rfc5335bis-07 . . . . . . . . . . . . . . . 10
7.9. draft-ietf-eai-rfc5335bis-09 . . . . . . . . . . . . . . . 10
7.10. draft-ietf-eai-rfc5335bis-10 . . . . . . . . . . . . . . . 10
7.11. draft-ietf-eai-rfc5335bis-11 . . . . . . . . . . . . . . . 11
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
8.1. Normative References . . . . . . . . . . . . . . . . . . . 11
8.2. Informative References . . . . . . . . . . . . . . . . . . 12
Yang, et al. Expires January 11, 2012 [Page 2]
Internet-Draft I18N Email Headers July 2011
1. Introduction
Internet mail distinguishes a message from its transport and further
divides a message between a header and a body [RFC5598]. Internet
mail header field values contain a variety of strings that are
intended to be user-visible. The range of supported characters for
these strings was originally limited to 7-bit [ASCII]. MIME
[RFC2045] [RFC2046] [RFC2047] provides the ability to use additional
character sets, but this support is limited to body part data and to
special encoded-word constructs that were only allowed in a limited
number of places in header field values.
Globalization of the Internet requires support of the much larger set
of characters provided by Unicode [RFC5198] in both mail addresses
and most header field values. Additionally, complex encoding schemes
like encoded-words introduce inefficiencies as well as significant
opportunities for processing errors. And finally, native support for
the UTF-8 charset is now available on most systems. Hence it is
strongly desirable for Internet mail to support UTF-8 [RFC3629]
directly.
This document specifies an enhancement to the Internet Message Format
[RFC5322] and to MIME that permits the direct use of UTF-8, rather
than only ASCII, in header field values, including mail addresses. A
new media type, message/global, is defined for messages that use this
extended format. This specification also lifts the MIME restriction
on having non-identity content-transfer-encodings on any subtype of
the message top-level type so that message/global parts can be safely
transmitted across existing mail infrastructure.
This specification is based on a model of native, end-to-end support
for UTF-8, which depends on having an "8-bit clean" environment
assured by the transport system. Support for carriage across legacy,
7-bit infrastructure and for processing by 7-bit receivers requires
additional mechanisms that are not provided by these specifications.
2. Terminology Used In This Specification
A plain ASCII string is fully compatible with [RFC5321] and
[RFC5322]. In this document, non-ASCII strings are UTF-8 strings if
they are in header field values which contain at least one <UTF8-non-
ascii> (see Section 3.1).
Unless otherwise noted, all terms used here are defined in [RFC5321],
[RFC5322], [I-D.ietf-eai-frmwrk-4952bis], or
[I-D.ietf-eai-rfc5336bis].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
Yang, et al. Expires January 11, 2012 [Page 3]
Internet-Draft I18N Email Headers July 2011
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
The term "8-bit" means octets are present in the data with values
above 0x7F.
3. Changes to Message Header Fields
To permit Unicode characters in field values, the header definition
in [RFC5322] is extended to support the new format. The following
sections specify the necessary changes to RFC 5322's ABNF.
The syntax rules not mentioned below remain defined as in [RFC5322].
Note that this protocol does not change RFC 5322 rules for defining
header field names. The bodies of header fields are allowed to
contain Unicode characters, but the header field names themselves
must contain only ASCII characters.
Also note that messages in this format require the use of the
&UTF8SMTPbis; extension [I-D.ietf-eai-rfc5336bis] to be transferred
via SMTP.
3.1. UTF-8 Syntax and Normalization
UTF-8 characters can be defined in terms of octets using the
following ABNF [RFC5234], taken from [RFC3629]:
UTF8-non-ascii = UTF8-2 / UTF8-3 / UTF8-4
UTF8-2 = <Defined in Section 4 of RFC3629>
UTF8-3 = <Defined in Section 4 of RFC3629>
UTF8-4 = <Defined in Section 4 of RFC3629>
See [RFC5198] for a discussion of Unicode normalization;
normalization form [NFC] SHOULD be used. Actually, if one is going
to do internationalization properly, one of the most often-cited
goals is to permit people to spell their names correctly. Since many
mailbox local parts reflect personal names, that principle applies to
mailboxes as well. The NFKC normalization form SHOULD NOT be used
because it may lose information that is needed to correctly spell
some names in some unusual circumstances.
Yang, et al. Expires January 11, 2012 [Page 4]
Internet-Draft I18N Email Headers July 2011
3.2. Syntax Extensions to RFC 5322
The following rules extend the ABNF syntax defined in [RFC5322] and
[RFC5234] in order to allow UTF-8 content.
VCHAR =/ UTF8-non-ascii
ctext =/ UTF8-non-ascii
atext =/ UTF8-non-ascii
qtext =/ UTF8-non-ascii
text =/ UTF8-non-ascii
; note that this upgrades the body to UTF-8
dtext =/ UTF8-non-ascii
A consequence of the change to the dtext rule is that UTF-8 would
then be allowed in the domain parts of message-ids as well as
addresses. This is unnecessary and undesirable, so three additional
RFC 5322 rules are redefined and a new itext rule is added:
id-left = dot-id-text
id-right = dot-id-text / no-fold-literal
dot-id-text = 1*itext *("." 1*itext)
itext = ALPHA / DIGIT / ; Printable US-ASCII
"!" / "#" / ; characters not including
"$" / "%" / ; specials. Used for msg-ids.
"&" / "'" /
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"
This change also specifically disallows obsolete forms of message-ids
that RFC 5322 allows.
The preceding changes mean that the following constructs now allow
Yang, et al. Expires January 11, 2012 [Page 5]
Internet-Draft I18N Email Headers July 2011
UTF-8:
1. Unstructured text, used in header fields like Subject: or
Content-description:.
2. Any construct that uses atoms, including but not limited to the
local parts of addresses. This includes addresses in the "for"
clauses of Received: header fields.
3. Quoted strings.
4. Domains. (But not in message-ids.)
Note that header field names are not on this list; these are still
restricted to ASCII.
3.3. Changes to MIME Message Type Encoding Restrictions
This specification updates Section 6.4 of [RFC2045]. [RFC2045]
prohibits applying a content-transfer-encoding to any subtypes of
"message/". This specification relaxes that rule -- it allows newly
defined MIME types to permit content-transfer-encoding, and it allows
content-transfer-encoding for message/global (see Section 3.4).
Background: Normally, transfer of message/global will be done in
8-bit-clean channels, and body parts will have "identity" encodings,
that is, no decoding is necessary.
But in the case where a message containing a message/global is
downgraded from 8-bit to 7-bit as described in [RFC6152], an encoding
might have to be applied to the message; if the message travels
multiple times between a 7-bit environment and an environment
implementing these extensions, multiple levels of encoding may occur.
This is expected to be rarely seen in practice, and the potential
complexity of other ways of dealing with the issue are thought to be
larger than the complexity of allowing nested encodings where
necessary.
3.4. The Message/global Media Type
Internationalized messages in this format MUST only be transmitted as
authorized by [I-D.ietf-eai-rfc5336bis] or within a non-SMTP
environment that supports these messages. A message is a "message/
global message" if:
o it contains 8-bit UTF-8 header values as specified in this
document, or
Yang, et al. Expires January 11, 2012 [Page 6]
Internet-Draft I18N Email Headers July 2011
o it contains 8-bit UTF-8 values in the header fields of body parts.
The content of a message/global part is otherwise identical to that
of a message/rfc822 part.
If this type is sent to a 7-bit-only system, it has to have an
appropriate content-transfer-encoding applied. (Note that a system
compliant with MIME that doesn't recognize message/global is supposed
to treat it as "application/octet-stream" as described in Section
5.2.4 of [RFC2046].)
Type name: message
Subtype name: global
Required parameters: none
Optional parameters: none
Encoding considerations: Any content-transfer-encoding is permitted.
The 8-bit or binary content-transfer-encodings are recommended
where permitted.
Security considerations: See Section 4.
Interoperability considerations: This media type provides
functionality similar to the message/rfc822 content type for email
messages with international email headers. When there is a need
to embed or return such content in another message, there is
generally an option to use this media type and leave the content
unchanged or down-convert the content to message/rfc822. Both of
these choices will interoperate with the installed base, but with
different properties. Systems unaware of internationalized
headers will typically treat a message/global body part as an
unknown attachment, while they will understand the structure of a
message/rfc822. However, systems that understand message/global
will provide functionality superior to the result of a down-
conversion to message/rfc822. The most interoperable choice
depends on the deployed software.
Published specification: RFC XXXX
Applications that use this media type: SMTP servers and email
clients that support multipart/report generation or parsing.
Email clients that forward messages with international headers as
attachments.
Yang, et al. Expires January 11, 2012 [Page 7]
Internet-Draft I18N Email Headers July 2011
Additional information:
Magic number(s): none
File extension(s): The extension ".u8msg" is suggested.
Macintosh file type code(s): A uniform type identifier (UTI) of
"public.utf8-email-message" is suggested. This conforms to
"public.message" and "public.composite-content", but does not
necessarily conform to "public.utf8-plain-text".
Person & email address to contact for further information: See the
Author's Address section of this document.
Intended usage: COMMON
Restrictions on usage: This is a structured media type that embeds
other MIME media types. The 8-bit or binary content-transfer-
encoding SHOULD be used unless this media type is sent over a
7-bit-only transport.
Author: See the Author's Address section of this document.
Change controller: IETF Standards Process
4. Security Considerations
Because UTF-8 often requires several octets to encode a single
character, internationalization may cause header field values in
general and mail addresses in particular to become longer. As
specified in [RFC5322], each line of characters MUST be no more than
998 octets, excluding the CRLF. On the other hand, MDA (Mail
Delivery Agent) processes that parse, store, or handle email
addresses or local parts must take extra care not to overflow
buffers, truncate addresses, or exceed storage allotments. Also,
they must take care, when comparing, to use the entire lengths of the
addresses.
There are lots of ways of using UTF-8 to represent something
equivalent or similar to a particular displayed character or group of
characters. This may allow filtering systems to be bypassed by using
a slightly different character to avoid detection while still
reaching the end user with largely the same intended deleterious
effect. The normalization process is described in Section 3.1 is
recommended to minimize this problem.
The security impact of UTF-8 headers on email signature systems such
as Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is
Yang, et al. Expires January 11, 2012 [Page 8]
Internet-Draft I18N Email Headers July 2011
discussed in [I-D.ietf-eai-frmwrk-4952bis], Section 14.
If a user has a non-ASCII mailbox address and an ASCII mailbox
address, a digital certificate that identifies that user might have
both addresses in the identity. Having multiple email addresses as
identities in a single certificate is already supported in PKIX
(Public Key Infrastructure for X.509 Certificates) [RFC5280] and
OpenPGP [RFC3156], but there may be user interface issues associated
with the introduction of UTF-8 into addresses in this context.
5. IANA Considerations
IANA is requested to update the registration of the message/global
MIME type using the registration form contained in Section 3.4.
6. Acknowledgements
This document incorporates many ideas first described in Internet-
Draft form by Paul Hoffman, although many details have changed from
that earlier work.
The author especially thanks Jeff Yeh for his efforts and
contributions on editing previous versions.
Most of the content of this document was provided by John C Klensin
and Dave Crocker. Significant comments and suggestions were received
from Charles H. Lindsey, Kari Hurtta, Pete Resnick, Alexey Melnikov,
Chris Newman, Kristin Hubner, Yangwoo Ko, Yoshiro Yoneya, and other
members of the JET team (Joint Engineering Team) and were
incorporated into the document. The editors wish to sincerely thank
them all for their contributions.
7. Edit history
[[RFC Editor: please remove this section before publishing.]]
7.1. draft-ietf-eai-rfc5335bis-00
1. Applied Errata suggested by Alfred Hoenes.
2. Adjust [RFC2821] and [RFC2822] to [RFC5321] and [RFC5322].
3. Abrogate <alt-address> in ABNF of <angle-addr>.
4. Revoke [RFC5504] from this document.
5. Upgrade some references from I-Ds to RFC.
Yang, et al. Expires January 11, 2012 [Page 9]
Internet-Draft I18N Email Headers July 2011
7.2. draft-ietf-eai-rfc5335bis-01
1. Author name revised.
7.3. draft-ietf-eai-rfc5335bis-02
1. ABNF revised.
7.4. draft-ietf-eai-rfc5335bis-03
1. Fix typos
2. ABNF revised
3. Improve sentence
7.5. draft-ietf-eai-rfc5335bis-04
1. improve sentences and ABNF revised based on AD and Co-chairs
7.6. draft-ietf-eai-rfc5335bis-05
1. ABNF revised based on AD comments
7.7. draft-ietf-eai-rfc5335bis-06
1. ABNF revised
2. improve Section 5
7.8. draft-ietf-eai-rfc5335bis-07
1. Minor ABNF revised in Section 3.2
2. improve Section 5
7.9. draft-ietf-eai-rfc5335bis-09
Version -08 was posted in error and withdrawn. Version 09 is is
identical to version 07 except for a date change, addition of this
note, and some vertical spacing compression on this page.
7.10. draft-ietf-eai-rfc5335bis-10
1. Add appendix and overview of changes
2. Replace polls result in Abstract and Section 1
Yang, et al. Expires January 11, 2012 [Page 10]
Internet-Draft I18N Email Headers July 2011
3. Minor Sentence modification
7.11. draft-ietf-eai-rfc5335bis-11
1. Major rewrite of entire document to incorporate Dave Crocker's
simplified ABNF.
2. The document has intentionally been refocused on implementors
wishing to adapt their software to support EAI, so much of the
explanatory and historical text has been removed. (Some of it
may be reintroduced later as an appendix.
8. References
8.1. Normative References
[ASCII] "Coded Character Set -- 7-bit American
Standard Code for Information
Interchange", ANSI X3.4, 1986.
[I-D.ietf-eai-frmwrk-4952bis] Klensin, J. and Y. Ko, "Overview and
Framework for Internationalized
Email",
draft-ietf-eai-frmwrk-4952bis-10 (work
in progress), September 2010.
[I-D.ietf-eai-rfc5336bis] Yao, J. and W. MAO, "SMTP Extension
for Internationalized Email Address",
draft-ietf-eai-rfc5336bis-07 (work in
progress), December 2010.
[NFC] Davis, M. and K. Whistler, "Unicode
Standard Annex #15: Unicode
Normalization Forms", September 2010,
<http://www.unicode.org/reports/
tr15/>.
[RFC2119] Bradner, S., "Key words for use in
RFCs to Indicate Requirement Levels",
BCP 14, RFC 2119, March 1997.
[RFC3629] Yergeau, F., "UTF-8, a transformation
format of ISO 10646", STD 63,
RFC 3629, November 2003.
[RFC5198] Klensin, J. and M. Padlipsky, "Unicode
Format for Network Interchange",
RFC 5198, March 2008.
Yang, et al. Expires January 11, 2012 [Page 11]
Internet-Draft I18N Email Headers July 2011
[RFC5234] Crocker, D. and P. Overell, "Augmented
BNF for Syntax Specifications: ABNF",
STD 68, RFC 5234, January 2008.
[RFC5321] Klensin, J., "Simple Mail Transfer
Protocol", RFC 5321, October 2008.
[RFC5322] Resnick, P., Ed., "Internet Message
Format", RFC 5322, October 2008.
[RFC5598] Crocker, D., "Internet Mail
Architecture", RFC 5598, July 2009.
8.2. Informative References
[RFC2045] Freed, N. and N. Borenstein,
"Multipurpose Internet Mail Extensions
(MIME) Part One: Format of Internet
Message Bodies", RFC 2045,
November 1996.
[RFC2046] Freed, N. and N. Borenstein,
"Multipurpose Internet Mail Extensions
(MIME) Part Two: Media Types",
RFC 2046, November 1996.
[RFC2047] Moore, K., "MIME (Multipurpose
Internet Mail Extensions) Part Three:
Message Header Extensions for Non-
ASCII Text", RFC 2047, November 1996.
[RFC3156] Elkins, M., Del Torto, D., Levien, R.,
and T. Roessler, "MIME Security with
OpenPGP", RFC 3156, August 2001.
[RFC5280] Cooper, D., Santesson, S., Farrell,
S., Boeyen, S., Housley, R., and W.
Polk, "Internet X.509 Public Key
Infrastructure Certificate and
Certificate Revocation List (CRL)
Profile", RFC 5280, May 2008.
[RFC6152] Klensin, J., Freed, N., Rose, M., and
D. Crocker, "SMTP Service Extension
for 8-bit MIME Transport", STD 71,
RFC 6152, March 2011.
Yang, et al. Expires January 11, 2012 [Page 12]
Internet-Draft I18N Email Headers July 2011
Authors' Addresses
Abel Yang
TWNIC
4F-2, No. 9, Sec 2, Roosevelt Rd.
Taipei, 100
Taiwan
Phone: +886 2 23411313 ext 505
EMail: abelyang@twnic.net.tw
Shawn Steele
Microsoft
EMail: Shawn.Steele@microsoft.com
Ned Freed
Oracle
800 Royal Oaks
Monrovia, CA 91016-6347
USA
EMail: ned+ietf@mrochek.com
Yang, et al. Expires January 11, 2012 [Page 13]