Network Working Group                                          M. Duerst
Internet-Draft                                       W3C/Keio University
Expires: December 23, 2002                                 June 24, 2002

                 Internationalized Domain Names in URIs

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://

   The list of Internet-Draft Shadow Directories can be accessed at

   This Internet-Draft will expire on December 23, 2002.

Copyright Notice

   Copyright (C) The Internet Society (2002).  All Rights Reserved.


Abstract

   This document proposes to upgrade the definition of URIs (RFC 2396)
   [RFC2396] to work consistently with internationalized domain names.

Duerst                  Expires December 23, 2002               [Page 1]

Internet-Draft                IDNs in URIs                     June 2002

Table of Contents

   1.  Change Log
   1.1 Changes from draft-ietf-idn-uri-01 to draft-duerst-idn-uri-00
   1.2 Changes from draft-ietf-idn-uri-00 to draft-ietf-idn-uri-01
   2.  Introduction
   3.  URI syntax changes
   4.  Security considerations
       References
       Author's Address
       Full Copyright Statement

1. Change Log

1.1 Changes from draft-ietf-idn-uri-01 to draft-duerst-idn-uri-00

   Changed to only change URIs; IRI syntax updated directly in IRI

   Removed syntax restriction on %hh in the US-ASCII part, but made
   clear that restrictions to domain names apply.

   Made clear that escaped domain names in URIs should only be an
   intermediate representation.

1.2 Changes from draft-ietf-idn-uri-00 to draft-ietf-idn-uri-01

   Changed requirement for URI/IRI resolvers from MUST to SHOULD

   Changed IRI syntax slightly (ichar -> idchar, based on changes in

   Various wording changes

2. Introduction

   Internet domain names serve to identify hosts and services on the
   Internet in a convenient way.  The IETF IDN working group [IDNWG] has
   been working on extending the character repertoire usable in domain
   names beyond a subset of US-ASCII.

   One of the most important places where domain names appear is in
   Uniform Resource Identifiers (URIs, [RFC2396], as modified by
   [RFC2732]).  However, in the current definition of the generic URI
   syntax, the restrictions on domain names are 'hard-coded'.  In
   Section 3, this document relaxes these restrictions by updating the
   syntax, and defines how internationalized domain names are encoded in
   URIs.

   The syntax in this document is defined for consistency.  Uniformity
   of syntax is a very important principle of URIs.  In practice,
   escaped domain names should be used as rarely as possible.  Wherever
   possible, the actual characters in Internationalized Domain Names
   should be preserved for as long as possible by using IRIs [IRI]
   rather than URIs, converting to URIs and then to ACE-encoded domain
   names (or directly to ACE encoding without even using URIs) only
   when resolving the IRI.  This document in no way excludes the use of
   ACE encoding directly in a URI domain name part; ACE encoding may be
   used there if it is considered necessary for interoperability.

3. URI syntax changes

   The syntax of URIs [RFC2396] currently contains the following rules
   relevant to domain names:

          hostname      = *( domainlabel "." ) toplabel [ "." ]
          domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
          toplabel      = alpha | alpha *( alphanum | "-" ) alphanum

   The latter two rules are changed as follows:

          domainlabel   = anchar | anchar *( anchar | "-" ) anchar
          toplabel      = achar | achar *( anchar | "-" ) anchar

   and the following rules are added:

          anchar        = alphanum | escaped
          achar         = alpha | escaped
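
   As an illustration only, the modified rules can be transcribed into
   a regular expression.  The sketch below assumes the 'escaped' rule
   of [RFC2396] ("%" followed by two hexadecimal digits); the constant
   names, the function name, and the sample host names are made up for
   this example and are not part of the proposal.

```python
import re

# 'escaped' as defined in RFC 2396: "%" hex hex
ESCAPED = r"%[0-9A-Fa-f]{2}"
# anchar = alphanum | escaped
ANCHAR = rf"(?:[A-Za-z0-9]|{ESCAPED})"
# achar = alpha | escaped
ACHAR = rf"(?:[A-Za-z]|{ESCAPED})"
# domainlabel = anchar | anchar *( anchar | "-" ) anchar
DOMAINLABEL = rf"{ANCHAR}(?:(?:{ANCHAR}|-)*{ANCHAR})?"
# toplabel = achar | achar *( anchar | "-" ) anchar
TOPLABEL = rf"{ACHAR}(?:(?:{ANCHAR}|-)*{ANCHAR})?"
# hostname = *( domainlabel "." ) toplabel [ "." ]
HOSTNAME = re.compile(rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?")


def matches_hostname(host: str) -> bool:
    """Return True if 'host' matches the modified 'hostname' rule."""
    return HOSTNAME.fullmatch(host) is not None
```

   Such a check is purely syntactic.  A host name like "www.%2A.org"
   matches the grammar even though the unescaped label is not a legal
   domain label, so a match does not imply that the name is legal.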

   Characters outside the US-ASCII repertoire are encoded by first
   encoding the characters in UTF-8 [RFC2279], resulting in a sequence
   of octets,
   and then escaping these octets according to the rules defined in
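
   The two steps described above (UTF-8 encoding, then escaping each
   octet as "%" plus two hexadecimal digits) might be sketched as
   follows.  The function name and the sample labels are hypothetical,
   and the sketch leaves all US-ASCII characters untouched without
   checking whether they are legal in a domain label.

```python
def escape_label(label: str) -> str:
    """Escape the non-ASCII characters of one domain label.

    Each non-ASCII character is encoded in UTF-8 and every resulting
    octet is written as "%HH".  US-ASCII characters pass through as is.
    """
    out = []
    for ch in label:
        if ch.isascii():
            out.append(ch)
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)
```

   For example, escape_label("bücher") yields "b%C3%BCcher", because
   U+00FC is encoded in UTF-8 as the two octets C3 BC.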

   Using UTF-8 assures that this encoding interoperates with IRIs [IRI].
   It is also aligned with the recommendations in [RFC2277] and
   [RFC2718], and is consistent with the URN syntax [RFC2141] as well as
   recent URL scheme definitions that define encodings of non-ASCII
   characters based on UTF-8 (e.g., IMAP URLs [RFC2192] and POP URLs
   [RFC2384]).

   The above syntax rules permit domain names that are permitted
   neither as US-ASCII only domain names nor as internationalized
   domain names.  However, such names should never be used, and must
   always be rejected by resolvers.  For US-ASCII only domain names, the
   syntax rules in [RFC2396] are relevant.  For example, http:// is
   legal, because the corresponding 'w3' is a legal 'domainlabel'
   according to [RFC2396].  However, http:// is illegal because the
   corresponding '*' is not a legal 'domainlabel' according to
   [RFC2396].  For domain names containing non-ASCII characters, the
   legal domain names are those for which the ToASCII operation
   ([IDNA], [Nameprep]; using the unescaped UTF-8 values as input) is
   successful.
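
   A resolver-side legality check along these lines could look like the
   sketch below.  It is only an approximation under stated assumptions:
   it uses Python's built-in "idna" codec, which implements the ToASCII
   operation of RFC 3490 (the published successor of the [IDNA] draft),
   and the function name and the escaped sample labels are made up for
   illustration.  Because that codec does not itself enforce the
   [RFC2396] label syntax, US-ASCII only labels are checked separately.

```python
import re
from urllib.parse import unquote_to_bytes

# RFC 2396 'domainlabel': alphanumerics, hyphens only in the interior.
DOMAINLABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?\Z")


def label_is_legal(escaped_label: str) -> bool:
    """Check one escaped label after undoing the %HH escaping."""
    try:
        label = unquote_to_bytes(escaped_label).decode("utf-8")
    except UnicodeDecodeError:
        return False  # the escaped octets are not valid UTF-8
    if label.isascii():
        # US-ASCII only: the RFC 2396 rule decides.
        return DOMAINLABEL.match(label) is not None
    try:
        label.encode("idna")  # ToASCII, including Nameprep
    except UnicodeError:
        return False
    return True
```

   With this sketch, "w%33" is accepted (it unescapes to 'w3'), while
   "%2A" is rejected (it unescapes to '*').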

   For consistency in comparison operations and for interoperability
   with older software, the following should be noted: 1) US-ASCII
   characters in domain names should never be escaped when a URI is
   produced.  2) Because of the principle of syntax uniformity for
   URIs, it is nevertheless always more prudent for software that
   processes URIs to take into account the possibility that US-ASCII
   characters are escaped.
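
   One way to allow for escaped US-ASCII characters before comparing
   host names is to undo such escapes first.  The sketch below is an
   assumption about how this could be done, not a procedure defined by
   this document: it unescapes only letters, digits, and the hyphen,
   and leaves every other escape (including non-ASCII octets and "%2E")
   untouched.  The function name is hypothetical.

```python
import re

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_ascii(host: str) -> str:
    """Undo %HH escapes that stand for US-ASCII letters, digits or '-'."""

    def replace(match: re.Match) -> str:
        octet = int(match.group(1), 16)
        ch = chr(octet)
        if octet < 0x80 and (ch.isalnum() or ch == "-"):
            return ch
        return match.group(0)  # keep all other escapes as they are

    return _ESCAPE.sub(replace, host)
```

   After this step, "www.w%33.org" and "www.w3.org" compare as equal
   strings.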

   The work of the IDN WG includes some procedures for name preparation
   [Nameprep].  Before encoding an internationalized domain name in a
   URI, this preparation step SHOULD be applied.  However, the URI
   resolver MUST also apply any steps required by [IDNA] as part of
   domain name resolution.
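
   As a rough sketch of the resolver side, the host part of a URI can
   be unescaped, decoded as UTF-8, and passed through ToASCII to obtain
   the ACE form used for the actual lookup.  The example assumes
   Python's built-in "idna" codec (RFC 3490 ToASCII with Nameprep) as a
   stand-in for the steps required by [IDNA]; the function name and the
   host names are hypothetical.

```python
from urllib.parse import unquote_to_bytes


def host_to_ace(escaped_host: str) -> str:
    """Convert an escaped URI host name to its ACE (ASCII) form."""
    # Undo the %HH escaping and interpret the octets as UTF-8.
    host = unquote_to_bytes(escaped_host).decode("utf-8")
    # The "idna" codec applies Nameprep and ToASCII to each label.
    return host.encode("idna").decode("ascii")
```

   Because Nameprep is applied again inside ToASCII, a name that was
   not prepared before escaping, such as "B%C3%9Ccher.example", still
   resolves to the same ACE form as "b%C3%BCcher.example".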

4. Security considerations

   The security considerations of [RFC2396] and those applying to
   internationalized domain names apply.  There may be an increased
   potential to smuggle escaped US-ASCII-based domain names across
   firewalls, although because of the uniform syntax principle for URIs,
   this potential already exists.


References

   [IDNWG]     "IETF Internationalized Domain Name (idn) Working Group".

   [IRI]       Duerst, M. and M. Suignard, "Internationalized Resource
               Identifiers (IRI)", draft-duerst-iri-00 (work in
               progress), April 2002.

   [ISO10646]  International Organization for Standardization,
               "Information Technology - Universal Multiple-Octet Coded
               Character Set (UCS) - Part 1: Architecture and Basic
               Multilingual Plane", ISO Standard 10646-1, October 2000.

   [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2141]   Moats, R., "URN Syntax", RFC 2141, May 1997.

   [RFC2192]   Newman, C., "IMAP URL Scheme", RFC 2192, September 1997.

   [RFC2277]   Alvestrand, H., "IETF Policy on Character Sets and
               Languages", BCP 18, RFC 2277, January 1998.

   [RFC2279]   Yergeau, F., "UTF-8, a transformation format of ISO
               10646", RFC 2279, January 1998.

   [RFC2384]   Gellens, R., "POP URL Scheme", RFC 2384, August 1998.

   [RFC2396]   Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
               Resource Identifiers (URI): Generic Syntax", RFC 2396,
               August 1998.

   [RFC2640]   Curtin, B., "Internationalization of the File Transfer
               Protocol", RFC 2640, July 1999.

   [RFC2718]   Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke,
               "Guidelines for new URL Schemes", RFC 2718, November
               1999.

   [RFC2732]   Hinden, R., Carpenter, B. and L. Masinter, "Format for
               Literal IPv6 Addresses in URL's", RFC 2732, December
               1999.

   [IDNA]      Faltstrom, P., Hoffman, P. and A. Costello,
               "Internationalizing Domain Names in Applications (IDNA)",
               draft-ietf-idn-idna-09.txt (work in progress), May 2002,

   [Nameprep]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
               Profile for Internationalized Domain Names", draft-ietf-
               idn-nameprep-10.txt (work in progress), May 2002, <http:/

Author's Address

   Martin Duerst
   W3C/Keio University
   5322 Endo
   Fujisawa  252-8520

   Phone: +81 466 49 1170
   Fax:   +81 466 49 1171

Full Copyright Statement

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an


   Funding for the RFC Editor function is currently provided by the
   Internet Society.
