Internet Draft                                             Maynard Kang
draft-ietf-idn-mua-00.txt                                   i-EMAIL.net
February 5, 2001
Expires on August 5, 2001

          Internationalizing Domain Names in Mail User Agents

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."


     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.



Abstract

This document describes a way in which domain names used in Internet
e-mail can be internationalized by making changes only to end-user Mail
User Agents, thereby avoiding damage to other applications which handle
Internet e-mail, such as Message Transfer Agents and Delivery Agents.

1. Introduction

One of the proposed solutions for internationalized domain names (IDN)
involves updating only the user applications, with no changes required
to the DNS protocol, servers and resolvers [IDNA]. Other solutions
require changes to be made to the protocol, servers, resolvers and
applications.

The underlying principle of [IDNA] may be similarly applied to the
Internet e-mail system today - by effecting changes to only the Mail
User Agent (MUA) component of the e-mail system. Thus, existing
Message Transfer Agents, Delivery Agents and other applications which
handle e-mail do not have to be changed at all.

1.1 Definitions and Conventions

Terms related to the character encoding model are used as defined in
Unicode Technical Report 17 [UTR17].

The terms "international character", "non-ASCII character" and
"multilingual character", which are used interchangeably, are taken
to mean any abstract character which is not included in the range
specified by [US-ASCII].

1.2 Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "SHOULD NOT",
"RECOMMENDED", and "MAY" in this document are to be interpreted as
described in RFC 2119 [RFC2119].

1.3 Design Philosophy

As the Internet e-mail system is a diverse, distributed and
heterogeneous system with many vendors deploying a vast number of
applications, it is of utmost importance that interoperability amongst
these various components is maintained. Thus, the ideal solution would
be one which does not compromise or damage the operation of any of these
existing components once internationalized domain names are encountered.

Also, solutions which call for changes to be made to many or even all
components of the Internet e-mail system would require far too much
time and effort to deploy, given that Internet e-mail has such a huge
installed base.

This solution adheres to both of the above principles: interoperability
is preserved, and the cost of implementation is low while deployment is
fast. All that the user has to do to use IDNs in e-mail is update his or
her MUA.

1.4 IDN Summary

This solution specifies an IDN architecture of arch-3 (just send ACE)
and a transition strategy of trans-1 (always do current plus new
architecture) as described in [IDNCOMP]. The choice of ACE format is not
defined in this document, but MUST be the same as that specified in
[IDNA] in order to maintain uniqueness and consistency.

1.5 E-mail Internationalization Summary

As many Internet e-mail standards such as the SMTP protocol [RFC821]
and the e-mail message format [RFC822] only specify usage of the 7-bit
ASCII character set [US-ASCII], international characters which use octet-
based character encoding schemes (CES) cannot be used in e-mail
transmission, headers and bodies.

Although this issue has been addressed in [RFC2045] for message bodies
and [RFC2047] for message headers through the use of a Transfer Encoding
Syntax (TES) such as Quoted-Printable or Base64, there is no similar
solution which extends the functionality of [RFC821] to include usage of
international characters, except for [RFC1652] which allows transmission
of 8-bit data passed by the DATA command in an SMTP session.

[RFC1652] however, does not fully address the problem of using IDNs in
an SMTP session - the IDN may be used in areas within the SMTP session
other than the DATA command, such as the MAIL FROM and RCPT TO commands,
where an IDN may be part of the e-mail address(es) specified there.

Hence, this would be a major stumbling block to deploying "just-send-
8bit" IDNs for use in Internet e-mail, as these IDNs could not be used
in SMTP e-mail transmissions due to [RFC821] restrictions.
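
As a minimal illustration of this restriction (a sketch only; Python is
used purely for illustration and the function name is hypothetical), the
check below tests whether an SMTP command line stays within the 7-bit
range assumed by [RFC821]. A domain name carried as raw international
characters fails the check:

```python
def is_seven_bit(line: str) -> bool:
    """Return True if every character of the line is 7-bit US-ASCII."""
    return all(ord(ch) < 128 for ch in line)


# An all-ASCII envelope command passes the check.
ascii_ok = is_seven_bit("RCPT TO:<user@example.com>")

# A "just-send-8bit" IDN in the envelope falls outside the 7-bit range.
idn_ok = is_seven_bit("RCPT TO:<user@b\u00fccher.example>")

print(ascii_ok, idn_ok)
```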

2. Architectural Overview

The end-user MUA may encounter IDNs in the scenarios below:

(i)   When specifying the transmission server (i.e. SMTP server)
(ii)  When specifying the retrieval server (i.e. POP3/IMAP4/any other
      retrieval mechanism)
(iii) When specifying e-mail addresses during composition of a message
(iv)  When reading messages with e-mail addresses in them

As in [IDNA], the MUA is updated to process IDNs which are input by
users and IDNs which are displayed to users, in all of the scenarios
above.

For (i) and (ii), the IDN MUST be handled in the same manner as
specified in [IDNA]. The method of handling an IDN for (iii) and (iv) is
described below in 2.1.
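
For scenarios (i) and (ii), the conversion of a configured server name
might be sketched as follows. This is an illustration under stated
assumptions, not part of the specification: this memo does not define
the ACE format, so Python's built-in "idna" codec (which implements the
later RFC 3490 conversion with nameprep and Punycode) stands in for it
here. The function name and the account settings are hypothetical.

```python
def host_to_ace(host: str) -> str:
    """Convert a server host name to its ASCII-compatible form.

    Assumption: the built-in "idna" codec stands in for whichever ACE
    [IDNA] selects; this memo does not define the ACE format itself.
    """
    return host.encode("idna").decode("ascii")


# Hypothetical account settings, as a user might type them into the MUA.
settings = {"smtp": "mail.b\u00fccher.example", "pop3": "pop.example.com"}

# Names are converted once, before any connection is attempted.
resolved = {name: host_to_ace(host) for name, host in settings.items()}
print(resolved)
```

Host names which are already all-ASCII pass through unchanged, so the
same code path can serve both legacy names and IDNs.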

2.1 Interfaces between E-mail components when composing/reading a mail

The interfaces between e-mail components can be pictorially represented
as shown below.

The example assumes the setup of a POP3/IMAP4 retrieval client and
server, but the exact nature of end-to-end e-mail transmission may vary
accordingly (e.g. elm or pine would read directly from the mail store).
However, these variations do not materially affect the description of
this solution, as no changes are required at these levels.

        +------+                                       +------+
        | User |                                       | User |
        +------+                                       +---^--+
          | User Input:          User Display: Characters/ |
          | Keyboard/Pen/etc        Glyphs on CRT or other |
    +-----v---------------+    Representation (e.g. sound) |
    | Input Method Editor |                   +------------|-----+
    +---------------------+                   | Rendering Engine |
        | Input: Any localized/               +---------^--------+
        | internationalized      Output: Any localized/ |
        | charset                     internationalized |
   +----v-----------------+                     charset |
   | +------------------+ |                  +----------|-------------+
   | | Mail Composition | |                  | +--------------+       |
   | | Interface        | | Sender's         | | Mail Reading |       |
   | +------------------+ | MUA              | | Interface    |       |
   |    |                 |                  | +--------^-----+       |
   |    | Nameprepped ACE |       Receiver's |          | Nameprepped |
   |    v                 |              MUA |          | ACE         |
   | +-------------+      |                  | +-------------------+  |
   | | SMTP Client |      |                  | | POP3/IMAP4 Client |  |
   | +-------------+      |                  | +-------------------+  |
   +----|-----------------+                  +----------^-------------+
        | Nameprepped                                   | Nameprepped
        v ACE         Nameprepped       Nameprepped     | ACE
     +-------------+  ACE   +------------+  ACE   +-------------------+
     | SMTP Server | -----> | Mail Store | -----> | POP3/IMAP4 Server |
     +-------------+        +------------+        +-------------------+

2.1.1 Interface between User and Input Method Editor

For ASCII characters, input is straightforward: the user types on the
keyboard and the character for whichever key is pressed is sent to the
application.

However, for international characters, the end-user has to use a script-
specific Input Method Editor (IME), which may or may not be built into
the OS, to interpret what the user communicates to the system and
thereafter send the respective international characters to the
application.

For example, for input of Chinese characters, some users use IMEs
which support the "Pinyin" input method. When a user types "zhongguo"
(in ASCII characters) on the keyboard and selects the characters which
represent "China" (in Chinese) from a list, the IME sends the
international characters to the application in a user-determined
charset (e.g. GB2312).

2.1.2 Interface between Input Method Editor and MUA Composition
      Interface

The MUA mail composition interface (i.e. the "Compose Message"
function of the MUA) SHOULD be able to accept IDNs using 8-bit character
encoding schemes, including those represented in any localized (e.g.
GB2312) or internationalized (e.g. UTF-8) charsets.
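
As an illustration only, the sketch below shows how the same abstract
characters may reach the composition interface as different octet
sequences depending on the charset in use, and how decoding them yields
one common representation for later processing. The function name is
hypothetical, and the use of Python's codec names for GB2312 and UTF-8
is an assumption of this example.

```python
def idn_input_to_unicode(raw: bytes, charset: str) -> str:
    """Decode IDN input octets from the IME's charset into Unicode."""
    return raw.decode(charset)


china = "\u4e2d\u56fd"  # "China" in Chinese, as in the Pinyin example

# The same two characters arrive as different octets in each charset.
gb_octets = china.encode("gb2312")
utf8_octets = china.encode("utf-8")

# After decoding, both inputs denote the same abstract characters.
from_gb2312 = idn_input_to_unicode(gb_octets, "gb2312")
from_utf8 = idn_input_to_unicode(utf8_octets, "utf-8")
print(gb_octets != utf8_octets, from_gb2312 == from_utf8)
```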

This input typically takes place where e-mail addresses are entered,
such as the "From", "To", "Cc" and "Bcc" fields, amongst others, as IDNs
may be used on the right-hand side of the "@" sign in an e-mail address
(the domain part).

The mail composition interface MAY allow ACE input for the same
reasons as specified in [IDNA], but this is not recommended as ACE is
opaque and ugly.

2.1.3 Interface between MUA Composition Interface and SMTP Client

The MUA composition interface communicates with the SMTP client in the
MUA typically through internal function calls within the software itself
or through an API. It is at this level where ACE conversion of any IDN
encountered by the MUA composition interface takes place.

Before converting the name parts of the IDN into ACE, the MUA MUST
prepare each name part as specified in [NAMEPREP]. Thereafter, the MUA
MUST convert the name parts into ACE before passing any data to the SMTP
client.
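
The two steps above (name preparation, then ACE conversion of each name
part) might be sketched as follows. This is a non-authoritative
illustration: Python's built-in "idna" codec, which applies nameprep and
then Punycode as later specified in RFC 3490, is assumed here in place
of the [NAMEPREP] and ACE formats, which this memo does not define. The
function name is hypothetical.

```python
def address_to_ace(address: str) -> str:
    """Return the address with its domain part converted to ACE.

    Assumption: the "idna" codec (nameprep, then Punycode) stands in
    for the [NAMEPREP] and ACE steps; the local part is left untouched.
    """
    local_part, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("address has no domain part: %r" % address)
    # The codec prepares and converts each name part separately.
    return local_part + "@" + domain.encode("idna").decode("ascii")


print(address_to_ace("hans@b\u00fccher.example"))
```

Only the domain part is converted; internationalization of the local
part is outside the scope of this memo.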

The SMTP client then prepares the e-mail for transmission using the
SMTP protocol [RFC821], and thereafter establishes an SMTP connection
with the user-specified SMTP server to transmit the e-mail.

It is important to note that an IDN specified in the parameters of any
SMTP command MUST be represented in nameprepped ACE at this point. This
includes SMTP commands which require domain parameters (such as the HELO
and EHLO commands) and commands where e-mail addresses are specified
(such as the MAIL FROM, RCPT TO, VRFY, EXPN, SEND, SOML and SAML
commands).
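
A minimal sketch of how an SMTP client might build its envelope commands
under this rule is given below. It is an illustration only: the function
names are hypothetical, no network connection is made, and Python's
"idna" codec is assumed as a stand-in for the ACE, which this memo does
not define.

```python
from typing import List


def to_ace(address: str) -> str:
    """Convert the domain part of an address to ACE (assumed codec)."""
    local_part, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("address has no domain part: %r" % address)
    return local_part + "@" + domain.encode("idna").decode("ascii")


def envelope_commands(client_host: str, sender: str,
                      recipients: List[str]) -> List[str]:
    """Build HELO, MAIL FROM and RCPT TO lines with ACE domains only."""
    lines = ["HELO " + client_host.encode("idna").decode("ascii")]
    lines.append("MAIL FROM:<%s>" % to_ace(sender))
    lines.extend("RCPT TO:<%s>" % to_ace(rcpt) for rcpt in recipients)
    for line in lines:
        # Raises UnicodeEncodeError if anything non-ASCII remains,
        # for example an international character in a local part.
        line.encode("ascii")
    return lines


for command in envelope_commands("pc.b\u00fccher.example",
                                 "hans@b\u00fccher.example",
                                 ["li@\u4e2d\u56fd.example"]):
    print(command)
```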

As for data passed by the DATA command, ACE conversion MUST be
performed when the "domain" portion of an "addr-spec" or when a "domain"
itself, within the context of [RFC822], is encountered. This is
necessary as an updated MUA may originate a message which is read by a
non-updated MUA. If this happens, the non-updated MUA may face
operational problems dealing with IDNs that appear in the "addr-spec"
which are not in ACE.
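
For address headers carried in the message data, the conversion might be
sketched as follows. This is a hedged illustration rather than a
definitive implementation: it relies on the address parser in Python's
standard "email.utils" module, assumes the "idna" codec as a stand-in
for the ACE, and uses a hypothetical function name.

```python
from email.utils import formataddr, getaddresses


def ace_address_header(value: str) -> str:
    """Rewrite an address header value so every domain is in ACE."""
    rewritten = []
    for name, addr in getaddresses([value]):
        local_part, _, domain = addr.rpartition("@")
        # Assumption: the "idna" codec stands in for nameprepped ACE.
        ace = local_part + "@" + domain.encode("idna").decode("ascii")
        rewritten.append(formataddr((name, ace)))
    return ", ".join(rewritten)


header_value = "Hans <hans@b\u00fccher.example>, li@example.com"
print(ace_address_header(header_value))
```

A non-updated MUA which later reads this header sees only ASCII in the
"addr-spec", which is the interoperability property described above.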

Any transfer encoding syntax to be applied to the mail headers as
specified in [RFC2047] SHOULD be performed before nameprepped ACE
conversion. This is to reduce confusion between IDNs within "addr-spec"
and "domain" portions, in the context of [RFC822], and IDNs which appear
as arbitrary data in mail headers and bodies.

2.1.4 Interface between POP3/IMAP4 client (or local mail store) and
      Mail Reading Interface

The MUA mail reading interface (i.e. "Read mail" function of an MUA)
typically displays e-mail data retrieved from either a POP3/IMAP4
client or from a local mail store through internal function calls within
the MUA software or through an API.

When e-mail containing an ACE-represented IDN is to be displayed, the
MUA SHOULD convert the ACE-represented IDN contained within the
"addr-spec" or "domain" portion specified in [RFC822] back into a
localized or internationalized charset of the user's choice, whenever
possible. If conversion into the selected localized charset is
impossible (for example, RACE-represented Hangeul characters cannot be
converted into ISO-8859-1), the MUA should present the user with an
error message.
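
The display-side conversion, including the failure case, might be
sketched as follows. This is an illustration under stated assumptions:
Python's "idna" codec stands in for the ACE (the RACE format mentioned
above is not available in the Python standard library), the function
name is hypothetical, and the error message is left to the caller.

```python
def domain_for_display(ace_domain: str, charset: str) -> bytes:
    """Decode an ACE domain and re-encode it in the user's charset.

    Raises UnicodeEncodeError when the characters cannot be
    represented in that charset, so the caller can report an error.
    """
    text = ace_domain.encode("ascii").decode("idna")
    return text.encode(charset)


# A Hangeul domain name, first converted to ACE as a sender would.
korea = "\ud55c\uad6d.example".encode("idna").decode("ascii")

# A Korean charset can represent the name.
euc_kr_octets = domain_for_display(korea, "euc-kr")

# ISO-8859-1 cannot, so the MUA would present an error instead.
try:
    domain_for_display(korea, "iso-8859-1")
except UnicodeEncodeError:
    print("cannot display this domain in ISO-8859-1")
```

The octets returned are in the user's charset and would be handed,
together with the charset name, to the rendering engine.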

It may be possible to save and retrieve information about the original
charset of the ACE-converted IDN through the use of additional
[RFC822] mail headers, but that is not (yet) addressed by this memo.

Although it is possible to render ACE into properly decoded glyphs and
display the actual abstract characters without any conversion to other
charsets, the MUA SHOULD NOT do this as it is not the primary function
of an MUA to render characters. This should be left to a rendering
engine which is separate from the MUA and typically embedded into the
OS. It is sufficient for the MUA to pass the appropriate charset to the
rendering engine for proper display.

3. ACE Length Considerations

As [RFC821] in Section 4.5.3 restricts the maximum total length of a
domain name to 64 characters, representation of IDNs using ACE may
pose a potential problem. Most ACEs typically require 3-4 ASCII
characters to represent one international character (especially in the
case of CJK characters, where compression is less effective).

That would leave only about 16-21 characters for the whole IDN,
including all name parts and dots. This is highly undesirable as names
in some languages, such as Arabic, cannot readily be abbreviated and
may require a greater length than that which is allowed by [RFC821].
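
The arithmetic behind this estimate can be sketched as follows. It is a
rough upper bound only, assuming the cost of 3-4 ASCII characters per
international character stated above and ignoring dots and any ACE
prefix, both of which reduce the figure further. The names used are
hypothetical.

```python
RFC821_MAX_DOMAIN = 64  # characters, per [RFC821] Section 4.5.3


def max_international_chars(ascii_per_char: int) -> int:
    """Upper bound on characters if each costs ascii_per_char."""
    return RFC821_MAX_DOMAIN // ascii_per_char


def fits_rfc821(ace_domain: str) -> bool:
    """Check a converted domain name against the 64-character limit."""
    return len(ace_domain) <= RFC821_MAX_DOMAIN


# Bounds for the two per-character costs assumed in the text.
for cost in (3, 4):
    print(cost, max_international_chars(cost))
```

An MUA could apply a check such as fits_rfc821() after conversion and
warn the user before any SMTP transmission is attempted.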

To further complicate matters, several mailing list software packages
such as ezmlm embed domain names into the local-part of an e-mail
address during management of subscriptions, together with randomly-
generated subscription information. This would leave an even smaller
maximum ACE length, if interoperability with these packages were to be
maintained, given that there is also a 64 character restriction on
local parts.
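
The effect on the remaining length can be sketched as follows. The
address layout used here is hypothetical and chosen only to illustrate
the arithmetic; it is not taken from the documentation of ezmlm or of
any other package.

```python
MAX_LOCAL_PART = 64  # characters, the local part limit noted above


def domain_budget(list_name: str, token: str, user: str) -> int:
    """Characters left for an embedded domain in a local part.

    Hypothetical layout: <list_name>-<token>-<user>=<domain>
    """
    # Each of the three fixed fields is followed by one separator.
    fixed = len(list_name) + 1 + len(token) + 1 + len(user) + 1
    return MAX_LOCAL_PART - fixed


# Hypothetical subscription address components.
remaining = domain_budget("mylist-sc", "981234567.abcdefghijklmnopqrst",
                          "hans")
ace_domain = "b\u00fccher.example".encode("idna").decode("ascii")
print(remaining, len(ace_domain), len(ace_domain) <= remaining)
```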

4. Security Considerations

As this memo is based on [IDNA], its security considerations are
similar to those of [IDNA]. These include the security considerations
of [NAMEPREP] as well.

5. Other Considerations

Although this document addresses end-user MUAs (e.g. elm, mutt, pine,
Eudora, Outlook Express, etc.) to a large extent, the definition of an
MUA could be extended to include web-based e-mail server software and
automated programs such as mailing list management software.

End-user MUAs may also include additional functionality where IDNs may
be encountered, such as calendaring/scheduling, directory services and
digital certificate storage. This is not (yet) addressed in this memo.

6. Future Extensions

It is possible to achieve internationalization of the entire e-mail
address by representing international characters in the local-part of
an "addr-spec" using nameprepped ACE conversion, in a fashion similar
to that described in this memo.

However, this is a different problem altogether and is currently beyond
the scope of this memo.

7. References

[IDNA] Paul Hoffman & Patrik Faltstrom, "Internationalizing Host Names
in Applications (IDNA)", draft-ietf-idn-idna.

[UTR17] K. Whistler & M. Davis, Unicode Consortium, "Character Encoding
Model", Unicode Technical Report #17,
http://www.unicode.org/unicode/reports/tr17/

[US-ASCII] United States of America Standards Institute, "USA Code for
Information Interchange", X3.4, 1968.

[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119.

[IDNCOMP] Paul Hoffman, "Comparison of Internationalized Domain Name
Proposals", draft-ietf-idn-compare.

[RFC821] Jonathan B. Postel, "Simple Mail Transfer Protocol", August
1982, RFC 821.

[RFC822] David H. Crocker, "Standard for the Format of ARPA Internet
Text Messages", August 1982, RFC 822.

[RFC2045] N. Freed & N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Bodies",
November 1996, RFC 2045.

[RFC2047] K. Moore, "MIME (Multipurpose Internet Mail Extensions)
Part Three: Message Header Extensions for Non-ASCII Text", November
1996, RFC 2047.

[RFC1652] J. Klensin et al., "SMTP Service Extension for 8bit-
MIMEtransport", July 1994, RFC 1652.


[NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of
Internationalized Host Names", draft-ietf-idn-nameprep.

A. Author's Address

Maynard Kang
i-EMAIL.net Pte Ltd
1 Kim Seng Promenade #12-07
Great World City West Tower
Singapore 237994
E-mail: maynard@i-email.net