INTERNET-DRAFT                                             Stuart Kwan
                                                          James Gilroy
                                                          Levon Esibov
                                                       Microsoft Corp.
                                                              May 2001
<draft-skwan-utf8-dns-06.txt>                    Expires November 2001


     Using the UTF-8 Character Set in the Domain Name System

Status of this Memo

This document is an Internet-Draft and is in full conformance
with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.  Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time.  It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.


Abstract

The Domain Names standard specifies that hostnames are represented
using the ASCII character encoding.  This document expands that
specification to allow the use of the UTF-8 character encoding, a
superset of ASCII and a translation of the UCS-2 character encoding.


1. Introduction

The Domain Names standard [RFC1123] specifies that hostnames are
represented using the ASCII character encoding.  This document expands
that specification to allow the use of the UTF-8 character encoding
[RFC2044], a superset of ASCII and a translation of the UCS-2
character encoding.

Interpreting names as ASCII-only limits the utility of DNS in an
international setting.  The UTF-8 character set includes characters
from most of the world's written languages, allowing a far greater
range of possible names and allowing names to use characters that are
relevant to a particular locality.  UTF-8 is the recommended character
set for protocols that are evolving beyond ASCII [RFC2130].

Expires November 2001                                         [Page 1]


INTERNET-DRAFT                  UTF-8 DNS                     May 2001


This document defines the technology for a richer character set in
DNS.  This document specifically does not define policy for the
characters allowed in a name when used in a particular application.
For example, some protocols place restrictions on the characters
allowed in a name


2. Protocol Description

2.1 Components and roles

Before the description of the protocol itself authors feel a need to
clarify which components are involved in processing the hostnames and
describe the usage of the hostnames by these components. The following
list contains such information.

User.
User could be a human or application. Its role is to specify (also
known as "write") and retrieve (also known as "read") the hostname to
and from an application. The examples of such operations include
typing the hostname, writing it on a touch sensitive screen, reading
the name from the monitor, listening to a voicemail, etc...

Application.
Application's role is to
- process the hostname specified by user or other local or remote
  application.
- return to the user (for example display on a monitor screen) the
  hostname returned by DNS resolver.
- call DNS name resolution APIs to request resolver to perform the
  name resolution

Resolver.
Resolver's role is to
- process the name resolution requests from an application and submit
  appropriate DNS query to the DNS servers
- process the response from a DNS server and pass the response to the
  Application.

DNS server.
The role of the DNS server is to store and maintain the DNS data,
process the updates to its database, update the replica copies of the
databases and perform the DNS name resolution through responding to
the DNS queries.


2.2 Protocol details

This section describes the modifications (if any) to each of these
components and interfaces between the communicating components.



Expires November 2001                                         [Page 2]


INTERNET-DRAFT                  UTF-8 DNS                     May 2001


2.2.1 Users

No modifications to the users are proposed in this document. At the
same time support of this protocol by other components specified later
in this section may enable users to start using in hostnames
characters from wider set than one specified in [RFC1123].


2.2.2 Interface between users and applications

User may use any character set or multiple character sets supported by
the particular application. Specification of the allowed character
sets supported by an application is outside of the scope of this
document. The decision on which characters sets can be used to allow
user to input and retrieve the hostnames is left to the implementers
of the particular applications unless a protocol underlying specific
application specifies the supported characters set. Thus this protocol
does not affect the interface between users and applications.


2.2.3 Applications

Storage format of the hostnames by the applications is outside of the
scope of this protocol.


2.2.4 Interface between applications and resolvers

This protocol does not specify the APIs that applications should use
to request the resolver to perform the DNS name resolution of the
internationalized hostnames. Instead it only specifies the format of
the hostnames specified in the input and output of such APIs.

The applications supporting non-ASCII characters in hostnames MUST
pass to the resolvers a hostname in ISO/IEC 10646 encoding. If the
response returned by the resolver to the application contains the
hostname, then the application should expect the hostname to be
encoded using ISO/IEC 10646.


2.2.5 Resolvers

Before sending the hostname in the query packet, the resolver MUST
prepare each name part as specified in [NAMEPREP]. After the name
preparation the resolver MUST convert the hostname to be encoded using
UTF-8 as specified in [RFC2044].
Names encoded in UTF-8 must not exceed the size limits clarified in
[RFC2181]. Character count is insufficient to determine size, since
some UTF-8 characters exceed one octet in length.




Expires November 2001                                         [Page 3]


INTERNET-DRAFT                  UTF-8 DNS                     May 2001


When resolver receives a response to the query from a DNS server, it
MUST convert all of the hostnames from UTF-8 encoded format to the
ISO/IEC 10646 encoding before passing these hostnames back to the
application.


2.2.6 DNS servers

DNS servers authoritative for the records containing the hostnames
containing the characters not allowed by [RFC1123] MUST allow use of
the namepreped UTF-8 format to store and transmit those parts of the
hostnames.

According to existing standards, any binary string can be used in a
DNS name [RFC2181], but names must be compared with case-insensitivity
[RFC1035]. At the same time DNS protocol standard states that original
case SHOULD be preserved when possible as data is entered into the DNS
database. This requirement is modified as follows: a DNS server
authoritative for the internationalized hostnames MUST nameprep and
perform UTF-8 conversion on all names containing internationalized
characters in both record names and record data before storing these
hostnames and transmitting those names in any message. This new
requirement guarantees case-insensitive comparison of the
internationalized hostnames even by those DNS servers that do not
support this protocol.

DNS servers must compare names that contain UTF-8 characters
byte-for-byte, as opposed to using Unicode equivalency rules.


3. Interoperability Considerations

If user continues using ASCII-only characters in the hostnames, then
there is no need to upgrade any applications and/or resolvers.

As pointed in the previous section, there is no need to upgrade DNS
servers, except possibly those that are authoritative for the zones
containing internationalized hostnames.

The following interoperability issues should be taken into account

- A legacy application may not be able to process the hostnames
containing non-ASCII characters returned by DNS resolvers. Effect of
failure to process a name containing 7-bit needs to be separately
investigated.
- If other protocols decide to use the nameprep-UTF-8-encoding to
represent internationalized hostnames in their wire packets, then a
legacy application supporting such protocol that receives UTF-8
encoded hostname from another application (for example, such as mail
server or client) may fail to process such hostname. Effect of failure
to process a name containing 7-bit needs to be separately investigate.


Expires November 2001                                         [Page 4]


INTERNET-DRAFT                  UTF-8 DNS                     May 2001


Thus hostnames that are intended to be globally usable [RFC1958] on
legacy applications should still contain ASCII-only characters per
[RFC1123].

- If an updated application runs on legacy resolver that rejects name
resolution of the names containing any character not allowed by
[RFC1123], then such resolvers will require an upgrade to enable name
resolution of the internationalized hostnames.

- As specified above, DNS servers authoritative for the DNS records
containing the internationalized hostnames must be able to save and
load the hostnames containing napepreped-UTF-8-converted characters.
If the DNS server doesn't satisfy this requirement, but needs to host
such resource records, then it needs to be upgraded.

- Any DNS server involved in a name resolution process of the DNS
records containing an internationalized hostname must not reject name
resolution only because the hostname contains characters not allowed
by [RFC1123]. This requirement does not mean that every DNS server in
the name resolution path between the client and authoritative server
must be able to store and load the DNS records containing the
internationalized hostnames, but only means that the DNS server
performing recursive resolution needs to be able to query for and
cache such records, and that the DNS servers authoritative for the DNS
names higher in the DNS name hierarchy than the internationalized
names in query, need to be able to respond to such queries.
Overwhelming majority of the DNS servers currently deployed on the
Internet already satisfy this requirement. Authors are not aware of
any implementation of the DNS server widely deployed on the Internet
that doesn't satisfy this requirement.

Although most of the DNS servers may be capable of accepting a zone
transfer of a zone containing UTF-8 encoded hostnames, some of them
may not be able to store those names in a zone file or load those
names from a zone file. Administrators should exercise caution when
transferring a zone containing UTF-8 encoded hostnames to such DNS
servers.



4. Security Considerations

Support for internationalized hostnames introduces a possibility of a
new type of spoofing attacks that could be based on attacker's
knowledge of misbehaving applications or resolvers that modifies the
internationalized hostname that needs to be resolved. For example, if
there is an application that modifies any character containing 7-bit
in some predictable manner (for example by simply dropping the 7-bit),





Expires November 2001                                         [Page 5]


INTERNET-DRAFT                  UTF-8 DNS                     May 2001


then an attacker may register a DNS record mapping the derivative
(i.e. modified by the misbehaving application or resolver) name to the
data desired by attacker. In this scenario any user using such
misbehaving application may receive as a result of name resolution the
data (for example an IP address in A resource record) specified by the
attacker without noticing that they are subjected to an attack even if
the DNSSEC is used to verify the authenticity of the response.

Because this protocol depends on the procedures described in
[NAMEPREP] and [RFC2044], the security issues identified in these
document are also applicable to this protocol.


5. Acknowledgements

The authors of this document would like to thank the following people
for their contribution to this specification:  John McConnell,
Cliff Van Dyke and Bjorn Rettig.


6. References

[RFC1035]     P.V. Mockapetris, "Domain Names - Implementation and
              Specification," RFC 1035, ISI, Nov 1987.

[RFC2044]     F. Yergeau, "UTF-8, a transformation format of Unicode
              and ISO 10646," RFC 2044, Alis Technologies, Oct 1996.

[RFC1958]     B. Carpenter, "Architectural Principles of the
              Internet," RFC 1958, IAB, June 1996.

[RFC1123]     R. Braden, "Requirements for Internet Hosts -
              Application and Support," STD 3, RFC 1123, January 1989.

[RFC2130]     C. Weider et. al., "The Report of the IAB Character
              Set Workshop held 29 July - 1 March 1996",
              RFC 2130, Apr 1997.

[RFC2181]     R. Elz and R. Bush, "Clarifications to the DNS
              Specification," RFC 2181, University of Melbourne and
              RGnet Inc, July 1997.

[UNICODE 2.0] The Unicode Consortium, "The Unicode Standard, Version
              2.0," Addison-Wesley, 1996. ISBN 0-201-48345-9.

[NAMEPREP]    Paul Hoffman and Marc Blanchet, "Preparation of
              Internationalized Host Names",
              draft-ietf-idn-nameprep-*.txt.





Expires November 2001                                         [Page 6]


INTERNET-DRAFT                  UTF-8 DNS                     May 2001


7. Author's Addresses

Stuart Kwan                         James Gilroy
Microsoft Corporation               Microsoft Corporation
One Microsoft Way                   One Microsoft Way
Redmond, WA  98052                  Redmond, WA  98052
USA                                 USA
skwan@microsoft.com                 jamesg@microsoft.com

Levon Esibov
Microsoft Corporation
One Microsoft Way
Redmond, WA  98052
USA
levone@microsoft.com


11.  Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to  pertain
to the implementation or use of the technology described in this
document or the extent to which any license under such rights might or
might not be available; neither does it represent that it has made any
effort to identify any such rights.  Information on the IETF's
procedures with respect to rights in standards-track and standards-
related documentation can be found in BCP-11.  Copies of claims of
rights made available for publication and any assurances of licenses to
be made available, or the result of an attempt made to obtain a general
license or permission for the use of such proprietary rights by
implementors or users of this specification can be obtained from the
IETF Secretariat.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary rights
which may cover technology that may be required to practice this
standard.  Please address the information to the IETF Executive
Director.


12.  Full Copyright Statement

Copyright (C) The Internet Society (2001).  All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it or
assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are included
on all such copies and derivative works.  However, this document itself
may not be modified in any way, such as by removing the copyright notice
or references to the Internet Society or other Internet organizations,
except as needed for the purpose of developing Internet standards in

Expires November 2001                                         [Page 7]


INTERNET-DRAFT                  UTF-8 DNS                     May 2001


which case the procedures for copyrights defined in the Internet
Standards process must be followed, or as required to translate it into
languages other than English.  The limited permissions granted above are
perpetual and will not be revoked by the Internet Society or its
successors or assigns.  This document and the information contained
herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE
INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."

Expires November 2001                                         [Page 8]