Network Working Group                                         J. Klensin
Internet-Draft                                             June 17, 2003
Expires: December 16, 2003

   Registration Restrictions on Internationalized Domain Names -- An

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://

   The list of Internet-Draft Shadow Directories can be accessed at

   This Internet-Draft will expire on December 16, 2003.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.


Abstract

   The IETF has introduced standards-track mechanisms to enable the use
   of "internationalized", i.e., non-ASCII, names in the DNS and in
   applications that use it.  This has led, in turn, to concerns that
   characters with similar meanings or appearance could cause user
   confusion and create opportunities for deliberate deception and
   fraud.  Part of this problem can be addressed by restricting, on a
   per-zone (or per-registry) basis, the characters that can be used to
   a subset of those allowed by the standard, and by creating
   "reservations" of labels that might be confused with those that are
   permitted.  The model for doing this for languages that use
   characters that originated with Chinese has been extensively
   developed in another document.  This document discusses some of the

Klensin                Expires December 16, 2003                [Page 1]

Internet-Draft     DNS IDN Registration Restrictions           June 2003

   issues in that design and relates them to considerations and
   mechanisms that might be appropriate for other languages and scripts,
   especially those involving alphabetic characters.

   This document is intended to supply a basis for adapting methods
   developed for Chinese, Japanese, and Korean to other languages and
   scripts.  If these adaptations are made carefully and with due
   consideration for local issues, the likelihood of problematic DNS
   registrations will be significantly reduced.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  The JET Guidelines Model . . . . . . . . . . . . . . . . . . .  3
   3.  Reserved Names and Label Packages  . . . . . . . . . . . . . .  5
   4.  Languages and Scripts  . . . . . . . . . . . . . . . . . . . .  5
   5.  Reservations and Exclusions  . . . . . . . . . . . . . . . . .  7
   5.1 Sequence Exclusions for Valid Characters . . . . . . . . . . .  7
   5.2 Character Pairing Issues . . . . . . . . . . . . . . . . . . .  7
   6.  Some Implications of this Approach . . . . . . . . . . . . . .  7
   7.  Conclusions and Recommendations  . . . . . . . . . . . . . . .  8
   8.  Security Considerations  . . . . . . . . . . . . . . . . . . .  9
   9.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . .  9
       References . . . . . . . . . . . . . . . . . . . . . . . . . .  9
       Author's Address . . . . . . . . . . . . . . . . . . . . . . . 10
       Intellectual Property and Copyright Statements . . . . . . . . 11

1. Introduction

   Once work on the basic model for encoding non-ASCII strings in the
   DNS with IDNA ([1], [2], [3]) was nearing completion, it became clear
   that it would be desirable for registries to impose additional
   restrictions on the names that could actually be registered (e.g.,
   see [6]) as a means of reducing potential confusion among characters
   that were similar in some way.  These restrictions were, in many
   respects, part of a long tradition.  For example, while the original
   DNS specifications [4] permitted any string of octets to be used in a
   DNS label, they also recommended the use of a much more restricted
   subset, one that was derived from the much older "hostname" rules [7]
   and defined by the "LDH" (for "letter digit hyphen", the three
   permitted types of characters) convention. Enforcement of those
   restricted rules in registrations was the responsibility of the
   registry or domain administrator.  They were not embedded in the DNS
   protocol itself, although some application protocols, notably those
   concerned with electronic mail, imposed and enforced similar rules.
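
   As an illustration of the "LDH" convention described above, the
   following Python sketch tests a single label.  The 63-octet length
   limit and the acceptance of a leading digit are assumptions drawn
   from common hostname practice rather than from this document, and
   the name is_ldh_label is hypothetical.

```python
import re

# Assumption: 1 to 63 characters, letters/digits/hyphens only, and no
# hyphen in the first or last position.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label):
    """Return True if `label` follows the letter-digit-hyphen convention."""
    return LDH_LABEL.fullmatch(label) is not None
```

   A registry or domain administrator, not the DNS protocol, would
   apply a test of this kind at registration time.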

   For non-ASCII names (so-called "internationalized domain names" or
   "IDNs"), the problem was more complicated than that which led to the
   "LDH" (hostname) rules.  In the earlier situation, all protocols,
   hosts, and DNS zones used ASCII exclusively in practice, so the LDH
   restriction could reasonably be applied uniformly across the
   Internet. With the introduction of a very large character repertoire,
   and with different locations and languages considering different
   characters important, the optimal registration restrictions ceased
   to be a global matter and became ones that differ from one area to
   another and, hence, from one DNS zone to another.

   To date, the best-developed system for handling registration
   restrictions for IDNs is the JET Guidelines for Chinese, Japanese,
   and Korean [5], the so-called "CJK" languages.  That system is
   limited to those languages and, in particular, to their common script
   base.   This document explores the principles behind those guidelines
   and some of the issues that might arise in trying to adapt them to
   alphabetic languages.

   A terminology note: The term "confusion" is used very generically in
   this document to cover the entire range from accidental user
   misperception of the relationship between characters with some
   characteristic in common (typically appearance, sound, or meaning) to
   cybersquatting and other deliberately fraudulent attempts to exploit
   those relationships.

2. The JET Guidelines Model

   The JET Guidelines establish several new ideas for DNS registry

   o  "Reserved" names that do not appear in zone files

   o  "Packages" of names that are controlled, as a block, by a single

   o  Tables of permitted characters on a per-zone and per-language
      basis, potentially with supplemental processing to impose
      additional syntactic, semantic, or linguistic rules.

   o  Potential lists of "variant" characters, which are treated as
      more-or-less equivalent to other characters.

   In the JET Guidelines model, a prospective registrant approaches the
   registry for a zone (perhaps through an intermediate registrar) with
   a label string -- a proposed name to be registered -- and a list of
   languages in which that name is to be interpreted.  The languages are
   defined according to the fairly high-resolution coding of [8] --
   Chinese as used on the mainland of the People's Republic of China
   ("zh-cn") can, at registry option, be coded differently and
   represented by a separate table compared to Chinese as used in Taiwan

   The design of the JET Guidelines took one important constraint as a
   basis: IDNA was treated as a firm standard.  A procedure that
   modified some portion of the IDNA functions, or was a variant of
   them, was considered a violation of those standards and should not be
   encouraged (or, probably, even permitted).

   Each registry is expected to construct (or obtain) a table for each
   language it considers relevant and appropriate.  These tables list,
   for the particular zone, the characters permitted for that language.
   If a character does not appear as a "valid code point" in that table,
   then a name containing it cannot be registered.  If multiple
   languages are listed for the registration, then the character must
   appear in the tables for each of those languages.
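
   The table check just described can be sketched in Python as follows.
   The table contents, the language tags used as keys, and the function
   names are hypothetical and are not taken from any registry; a real
   table would be constructed by the zone.

```python
# Hypothetical per-language tables of valid code points for one zone.
ASCII_LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")
LANGUAGE_TABLES = {
    "fi": ASCII_LDH | {"\u00e4", "\u00f6", "\u00e5"},  # adds a-diaeresis, o-diaeresis, a-ring
    "en": ASCII_LDH,
}


def invalid_characters(label, languages, tables=LANGUAGE_TABLES):
    """Return the characters of `label` missing from any requested table."""
    missing = set()
    for lang in languages:
        table = tables[lang]  # KeyError: the zone has no table for `lang`
        missing.update(ch for ch in label if ch not in table)
    return missing


def is_registrable(label, languages, tables=LANGUAGE_TABLES):
    """True only if every character is valid in every listed language."""
    return bool(languages) and not invalid_characters(label, languages, tables)
```

   With these sample tables, a label containing U+00E4 is accepted when
   only "fi" is listed and rejected when both "fi" and "en" are listed,
   because the character must appear in every table named.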

   The tables may also contain columns that specify alternate or variant
   forms of the valid character.  If these variants appear, they are
   used to synthesize labels that are alternatives to the original one.
   These labels are all reserved and can be registered or "activated"
   (placed into the DNS) only by the action or request of the original
   registrant; some (the "preferred variant labels") are typically
   registered automatically.  The zone is expected to establish
   appropriate policies for situations in which the variant forms of one
   label conflict with already-reserved or already-registered labels.
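
   A simplified sketch of how variant labels might be synthesized from
   such a table follows.  It is not the procedure specified in the JET
   Guidelines; the two-column row layout, the sample rows, and the
   function name are assumptions made for illustration.

```python
from itertools import product

# Hypothetical table rows for one language: each valid code point maps to
# (preferred variants, other variants).  Illustrative data only.
VARIANT_TABLE = {
    "\u56fd": (["\u570b"], []),  # U+56FD with U+570B as preferred variant
    "\u5b66": (["\u5b78"], []),  # U+5B66 with U+5B78 as preferred variant
}


def variant_labels(label, table=VARIANT_TABLE):
    """Return (preferred, reserved) label sets synthesized from `label`."""
    preferred_choices, all_choices = [], []
    for ch in label:
        preferred, others = table[ch]  # KeyError: not a valid code point
        preferred_choices.append(preferred or [ch])
        all_choices.append([ch] + preferred + others)
    preferred_labels = {"".join(p) for p in product(*preferred_choices)}
    every_label = {"".join(p) for p in product(*all_choices)}
    preferred_labels.discard(label)
    # Everything that is neither the original nor preferred is only reserved.
    reserved = every_label - preferred_labels - {label}
    return preferred_labels, reserved
```

   The number of synthesized labels grows multiplicatively with label
   length, which is one reason a zone may choose to reserve most of
   them rather than place them in the DNS.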

   Most of these concepts were introduced because of concerns about
   specific issues with CJK characters, beginning from the requirement
   that the use of Simplified Chinese by some registrants and
   Traditional Chinese by others not be permitted to create confusion or
   opportunities for fraud.  While they may be applicable to registry
   tables constructed for alphabetic scripts, the transfer should be done
   with care, since many analogies are not exact.

   Some of the important issues are discussed in the sections that
   follow.

3. Reserved Names and Label Packages

   A basic assumption of the JET model is that, if the properties of
   Unicode [9], [10] and IDNA cause two strings to appear similar enough
   to cause confusion, then either both should be registered by the same
   party or one of them should become unregisterable.  The definition of
   "appear similar enough" will differ for different cultures and
   circumstances -- and hence DNS zones -- but the principle is fairly
   general.  In the JET model, all of the "variant" strings are
   identified, some are placed into the DNS automatically, and others
   are simply reserved and can be activated, if at all, only by the
   original registrant.  Other zones might find other policies
   original registrant.  Other zones might find other policies
   appropriate.  For example, a zone might conclude that having similar
   strings registered in the DNS was undesirable.  If so, the list of
   variant labels would be used only to build a list of names that would
   not be registerable.
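
   The two policies contrasted above (reserve variants for the original
   registrant, or make them unregisterable) might be modeled as in the
   following sketch.  The class, its method names, and the
   first-come conflict rule are all assumptions; each zone is expected
   to set its own conflict policy.

```python
class Zone:
    """Toy registry that binds a label and its variants into one package."""

    def __init__(self, block_variants=False):
        self.block_variants = block_variants  # True: variants can never be activated
        self.owner = {}      # label -> registrant, for every packaged label
        self.active = set()  # labels actually placed in the zone file

    def register(self, registrant, label, variants):
        package = {label} | set(variants)
        # Assumed policy: refuse if any label belongs to someone else's package.
        if any(self.owner.get(name, registrant) != registrant for name in package):
            return False
        for name in package:
            self.owner[name] = registrant
        self.active.add(label)
        return True

    def activate(self, registrant, label):
        """Place a reserved label in the zone, if policy and ownership allow."""
        if self.block_variants or self.owner.get(label) != registrant:
            return False
        self.active.add(label)
        return True
```

   In both modes a second party cannot obtain a label that falls inside
   an existing package; the modes differ only in whether the original
   registrant may later activate the reserved labels.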

4. Languages and Scripts

   Conversations about scripts -- collections of characters associated
   with particular languages -- are common when discussing character
   sets and codes.   But the boundaries between one script and another
   are not well-defined.  The Unicode Standard [9][10], for example,
   does not define them at all, even though it is written in terms of
   usually-related blocks of characters.  The issue is complicated by
   the common origin of most alphabetic scripts (Cf. [11]), with certain
   character-symbols appearing in the scripts associated with multiple
   languages, sometimes with very different sounds or meanings.  This
   differs from the CJK situation in which, if a character appears in
   more than one of the relevant languages, it will almost always have
   the same interpretation in each one and, at least for the subset of
   characters that actually are ideographs, pronunciation is expected to
   vary widely while meaning is preserved. At least in part because of
   that similarity of meaning, it made sense in the JET case to permit a
   registration to specify multiple languages, to verify that the
   characters in the label string were valid for each, and then to
   generate variant labels using each language in turn.  For many
   alphabetic languages, it may make sense to prohibit the label string
   submitted for registration from being associated with more than one
   language.  Indeed, "one label, one language" has been suggested as an
   important barrier against common sources of "look-alike" confusion.
   For example, the imposition of that rule in a zone would prevent the
   insertion of a few Greek or Cyrillic characters with shapes identical
   to the Latin ones into what was otherwise a Latin-based string.  For
   a particular table, the list of valid characters may be thought of as
   the script associated with the relevant language, with the
   understanding that the table design does not prevent the same
   character from appearing in the tables for multiple languages.
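
   The effect of a "one label, one language" rule on a mixed-script
   string can be sketched as follows.  The two tables are deliberately
   tiny and hypothetical, as are the language tags and function names.

```python
DIGITS_AND_HYPHEN = set("0123456789-")

# Hypothetical tables; real ones would be built by the registry.
TABLES = {
    "en": set("abcdefghijklmnopqrstuvwxyz") | DIGITS_AND_HYPHEN,
    # U+0430..U+044F: the basic Cyrillic small letters
    "ru": {chr(cp) for cp in range(0x0430, 0x0450)} | DIGITS_AND_HYPHEN,
}


def languages_accepting(label, tables=TABLES):
    """List the languages whose table contains every character of `label`."""
    return [lang for lang, table in tables.items()
            if all(ch in table for ch in label)]


def one_label_one_language(label, language, tables=TABLES):
    """Accept `label` only if it is valid in the single declared language."""
    return language in languages_accepting(label, tables)
```

   A string such as "ex\u0430mple", in which U+0430 (Cyrillic small
   letter a) replaces the Latin "a", fits neither table and so cannot
   be registered under either language.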

   Indeed, this notion of a locally, and specifically, identified script
   can be turned around: while the tables are referred to as "language
   tables", they are associated with languages only insofar as thinking
   about the character structure and word forms associated with a given
   language helps to inform the construction of a table.  A country like
   Finland, for example, might use

   o  One table each for Finnish, Swedish, and English characters and
      conventions, permitting a string to be registered in one, two, or
      all three languages (although a three-language registration would
      presumably prohibit any characters that did not appear in all
      three languages).

   o  One table each, but with a "one label, one language" rule for the

   o  A combined table based on the observation that all three writing
      systems were based on Roman characters and that the possibilities
      for confusion that were of interest to the registry would not be
      reduced by "language" differentiation.

   Regardless of what decisions were made about those languages and
   scripts, if they also decided to permit registrations of labels
   containing Cyrillic characters, they might have a separate table for
   them.  That table might contain some Roman-derived characters (either
   as "valid" or as variants) just as some CJK tables do.  See also
   Section 6, below.

   It is also worth stressing, as the JET Guidelines do, that no tables
   or systems of this type -- even if identified with languages as a
   means of defining or describing those tables -- can assure linguistic
   or even syntactic correctness of labels with regard to that language.
   That level of assurance may not be possible without human
   intervention or at least dictionary lookups of complete proposed
   labels.  It may not even be desirable to attempt that level of
   correctness (see Section 6).

   Of course, if any language-based tests or constraints, including "one
   label, one language", are to be applied to limit those sources of
   confusion, each zone must have a table for each language in which it
   expects to accept registrations; the notion of a single combined
   table for the zone is simply unworkable.    One could use a single
   table for the zone if the intent were to impose only minimal
   restrictions, e.g., to force alphabetic and numeric characters only
   and exclude symbols and punctuation.  That type of restriction might
   be useful in eliminating some problems, such as those of unreadable
   labels, but would be unlikely to be very helpful with, e.g.,
   confusion caused by similar-looking characters.
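
   The limits of such a minimal, single-table restriction can be seen
   in a short sketch.  Permitting the hyphen alongside letters and
   digits is an assumption here, and the function name is hypothetical.

```python
def passes_minimal_table(label):
    """Accept labels made only of letters, digits, and (by assumption) hyphens.

    str.isalnum() is true for letters and digits of any script, so this
    test excludes symbols and punctuation but nothing else.
    """
    return bool(label) and all(ch.isalnum() or ch == "-" for ch in label)
```

   This test rejects "ex!ample" but accepts "ex\u0430mple", where
   U+0430 is a Cyrillic letter that resembles the Latin "a", which
   shows why it offers little protection against look-alike characters.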

5. Reservations and Exclusions

5.1 Sequence Exclusions for Valid Characters

   The JET Guidelines are based on processing only single characters.
   Any processing of pairs or longer sequences of characters is left to
   what that document describes as "additional processing" -- procedures
   specifically permitted by the Guidelines but defined by a registry in
   addition to the variant table processing specified in the Guidelines
   themselves.  A different zone, with different needs, could use a
   modified version of the table structure, or different types of
   additional processing, to prohibit, as well as accept, particular
   sequences of characters by marking them as invalid.  Other
   modifications or extensions might be designed to prevent certain
   letters from appearing at the beginning or end of labels.  The use of
   regular expressions  in the "valid characters" column might be one
   way to implement these types of restrictions.
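
   One possible shape for regular-expression-based sequence exclusions
   is sketched below.  The patterns are examples invented for this
   illustration (including the "q not followed by u" rule) and do not
   come from the JET Guidelines or from any registry.

```python
import re

# Hypothetical per-zone exclusion rules; a match means the label is refused.
EXCLUDED_PATTERNS = [
    re.compile(r"^-|-$"),   # no hyphen at the beginning or end of a label
    re.compile(r"q(?!u)"),  # example linguistic rule: "q" must be followed by "u"
    re.compile(r"^[0-9]"),  # no digit at the beginning of a label
]


def violated_rules(label, patterns=EXCLUDED_PATTERNS):
    """Return the source text of every exclusion pattern that `label` matches."""
    return [p.pattern for p in patterns if p.search(label)]
```

   A label is acceptable under these rules only when the returned list
   is empty; returning the matching patterns lets the registry report
   which rule caused a refusal.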

5.2 Character Pairing Issues

   Some character pairings -- the use of a character form (glyph) in one
   language and a different form with the same properties in a related
   one -- closely approximate the issues with mapping between
   Traditional and Simplified Chinese although the history is different.
   For example, it might be useful to have "o" with a stroke (U+00F8) as
   a variant for "o" with diaeresis above it (U+00F6) (and the
   equivalent upper-case pair) in a Swedish table, and vice versa in a
   Norwegian one, or to prohibit one of these characters entirely in
   each table. Obviously, if the relevant language of registration is
   unknown, this type of variant matching cannot be applied in any
   sensible way.
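
   The language dependence of this kind of pairing can be made concrete
   with a small sketch.  The mapping below encodes only the example
   pair from the text; the language tags and the function name are
   assumptions, and validity checking is deliberately left out.

```python
# Hypothetical variant columns, keyed by language of registration.
VARIANTS_BY_LANGUAGE = {
    "sv": {"\u00f6": "\u00f8"},  # Swedish table: U+00F8 is a variant of U+00F6
    "no": {"\u00f8": "\u00f6"},  # Norwegian table: the reverse pairing
}


def variant_label(label, language, tables=VARIANTS_BY_LANGUAGE):
    """Map each character of `label` to its variant for `language`."""
    if language is None:
        # Without a language there is no way to choose a direction.
        raise ValueError("language of registration is required")
    mapping = tables[language]
    return "".join(mapping.get(ch, ch) for ch in label)
```

   The same character pair yields opposite mappings in the two tables,
   so the function has no sensible default when the language is
   unknown.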

6. Some Implications of this Approach

   Historically, DNS labels were considered to be arbitrary identifier
   strings, without any inherent meaning.  Even in ASCII, there was no
   requirement that labels form words.  Labels that could not possibly
   represent words in any Romance or Germanic language have actually
   been quite common.  In general, in those languages, words contain at
   least one vowel and do not have embedded numbers. The more one moves
   toward "language"-based registry restrictions, the less it is going
   to be possible to construct labels out of fanciful strings. Such
   strings may make very good identifiers, while being terrible
   candidates for "words".  To take a trivial example using only ASCII
   characters, "rtr32w", "rtr32x", and "rtr32z" might be very good DNS
   labels for a particular zone and application, but would fail even the
   most superficial of tests for valid English word forms given the
   embedded digits and lack of vowels.
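
   A deliberately superficial word-form test of the kind alluded to
   above might look like the following.  The rule (lower-case ASCII
   letters only, at least one vowel, with "y" counted as a vowel) is an
   assumption chosen for illustration, not a proposal.

```python
import re


def looks_like_word(label):
    """Crude ASCII-only test: letters only, with at least one vowel."""
    letters_only = re.fullmatch(r"[a-z]+", label) is not None
    has_vowel = re.search(r"[aeiouy]", label) is not None
    return letters_only and has_vowel
```

   Under this test "rtr32w" is refused even though it may be a
   perfectly good identifier, which is the cost the text warns about.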

   Consequently, registries applying the principles outlined in this
   document should be careful not to apply more severe restrictions than
   are reasonable and appropriate.

7. Conclusions and Recommendations

   Thinking about the implications of the use in DNS labels of the full
   range of characters permitted by IDNA has led multiple groups to the
   conclusion that some restrictions, on a per-registry or per-zone
   basis, are needed to prevent many forms of user confusion about the
   actual structure of a name or the word, phrase, or term that it
   appears to spell out.  It appears that the best way to approach such
   restrictions involves drawing from the language and culture of the
   community of registrants and users in the relevant zone: if a
   particular character is likely to be unintelligible to both of
   those groups, it is probably wise not to permit it to be used in
   registrations. Registration restrictions can be carried much further
   than restricting permitted characters to a selected Unicode subset.
   The idea of a reserved "package" of related labels permits
   probably-confusing combinations or sets of characters to be bound
   together, under the control of a single registrant.  While that
   registrant might use the package in a way that confused his or her
   own users, the possibility of turning potential confusion into a
   hostile attack would be considerably reduced.

   At the same time, excessive restrictions may make DNS identifiers
   less useful for their original, intended, purpose: identifying
   particular hosts and similar resources on the network in an orderly
   way. Registries creating rules and policies about what can be
   registered in particular zones -- whether those are based on the JET
   Guidelines or the suggestions in this document -- should balance the
   need for restrictions against the need for flexibility in
   constructing identifiers.

8. Security Considerations

   Registration of labels in the DNS that contain essentially
   unrestricted sequences of arbitrary Unicode characters may introduce
   several opportunities for either attacks or simple confusion.  Some
   of these risks, such as confusion about which character (of several
   that look alike) is actually intended, may be associated with the
   presentation form of DNS names.  Others may be linked to databases
   associated with the DNS, e.g., with the difficulty of finding an
   entry in a Whois file when it is not clear how to enter, or search
   for, the characters that make up a name.  This document discusses a
   family of restrictions on the names that can be registered that can
   be imposed on a DNS zone ("registry") and some possible tools for
   implementing restrictions of that sort.  No plausible set of
   restrictions will eliminate all problems and sources of confusion:
   for example, it has often been pointed out that the characters
   digit-one ("1") and lower case L ("l") can easily be confused in some
   fonts used to display ASCII.   But, to the degree to which security
   may be aided by sensible risk reduction, these techniques may be

9. Acknowledgements

   Discussions in the process of developing the JET Guidelines were
   vital in developing this document and all of the JET participants are
   consequently acknowledged.  Attempts to explain some of the issues
   there to, and feedback from, Vint Cerf, Wendy Rickard, and members of
   the ICANN IDN Committee were also helpful in the thinking leading up
   to this document.

   An effort by Paul Hoffman to create a generic specification for
   registration restrictions of this type helped to inspire this
   document, which takes a somewhat different, more language-oriented,

   The opinions expressed here are, of course, the sole responsibility
   of the author. Some of those whose ideas are reflected in this
   document may disagree with the conclusions the author has drawn from


References

   [1]   Faltstrom, P., Hoffman, P. and A. Costello, "Internationalizing
         Domain Names in Applications (IDNA)", RFC 3490, March 2003.

   [2]   Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile
         for Internationalized Domain Names (IDN)", RFC 3491, March

   [3]   Costello, A., "Punycode: A Bootstring encoding of Unicode for
         Internationalized Domain Names in Applications (IDNA)", RFC
         3492, March 2003.

   [4]   Mockapetris, P., "Domain names - implementation and
         specification", RFC 1035, STD 13, November 1987.

   [5]   Seng, J., Ed. and J. Klensin, Ed., "International Domain Names
         Registration and Administration Guidelines for Chinese,
         Japanese, and Korean", draft-jseng-idn-admin-03.txt (work in
         progress), June 2003.

   [6]   Internet Engineering Steering Group, IETF, "IESG Statement on
         IDN", IESG Statement IDNstatement.txt, February 2003.

   [7]   Harrenstien, K., Stahl, M. and E. Feinler, "DoD Internet host
         table specification", RFC 952, October 1985.

   [8]   Alvestrand, H., "Tags for the Identification of Languages", BCP
         47, RFC 3066, January 2001.

   [9]   The Unicode Consortium, "The Unicode Standard--Version 3.0",
         January 2000.

   [10]  The Unicode Consortium, "Unicode Standard Annex #28", March

   [11]  Drucker, J., "The Alphabetic Labyrinth: The Letters in History
         and Imagination", 1995.

Author's Address

   John C Klensin
   1770 Massachusetts Ave, #322
   Cambridge, MA  02140

   Phone: +1 617 491 5735

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive

Full Copyright Statement

   Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an

   Funding for the RFC Editor function is currently provided by the
   Internet Society.
