Network Working Group J. Klensin
Internet-Draft October 18, 2003
Expires: April 17, 2004
National and Local Characters in DNS TLD Names
draft-klensin-idn-tld-01.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026 except that the right to
produce derivative works is not granted.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 17, 2004.
Copyright Notice
Copyright (C) The Internet Society (2003). All Rights Reserved.
Abstract
In the context of work on internationalizing the Domain Name System
(DNS), there have been extensive discussions about "multilingual" or
"internationalized" top level domain names (TLDs), especially for
countries whose predominant language is not written in a Roman-based
script. This document reviews some of the motivations for such
domains and the constraints that the DNS imposes. It then suggests
an alternative, local translation, that may solve a superset of the
problem while avoiding protocol changes, serious deployment delays,
and other difficulties. The suggestion uses a localization
technique to permit any TLD to be accessed using the characters of
any language, not merely language- or country-specific "multilingual"
TLDs in the language(s) and script(s) of that country.
Klensin Expires April 17, 2004 [Page 1]
Internet-Draft Characters in DNS TLD Names October 2003
Table of Contents
1. Introduction
1.1 Background on the "Multilingual Name" Problem
1.1.1 Approaches to the requirement
1.1.2 Writing the name of one's country in its own characters
1.1.3 Countries with multiple languages and countries with
multiple names
1.2 Domain Name System Constraints
1.2.1 Administrative Hierarchy
1.2.2 Aliases
1.3 Internationalization and Localization
2. Client-side solutions
2.1 IDNA and the client
2.2 Local translation tables for TLD names
3. Advantages and disadvantages of local translation
3.1 Every TLD in the local language and character set
3.2 Unification of country code domains
3.3 User understanding of local and global references
3.4 Limits on TLD Propagation
4. Security Considerations
5. Acknowledgments
References
Author's Address
Intellectual Property and Copyright Statements
1. Introduction
1.1 Background on the "Multilingual Name" Problem
People who share a language usually prefer to communicate in it,
using whatever characters are normally used to write that language,
rather than in some "foreign" one. There have been standards for
using mutually-agreed characters and languages in electronic mail
message bodies and selected headers since the introduction of MIME in
1992 [MIME] and the Web has permitted multilingual text since its
inception. However, since domain names are exposed to users in email
addresses and URLs (and corresponding arrangements for other
protocols), demand rapidly arose to permit domain names in
applications that used characters other than those of the very
restrictive, ASCII-subset, "hostname" or "letter-digit-hyphen"
("LDH") conventions recommended in the DNS specifications [RFC1035].
The effort to do this rapidly became known as "multilingual domain
names", although that is a misnomer, since the DNS deals only with
characters and identifier strings, and not, except by accident, what
people usually think of as "names". There has also been little
interest in what would actually be a "multilingual name", i.e., a
name that contains components from more than one language. Instead,
interest has focused on the use, in the context of the DNS, of
strings that conform to specific languages.
1.1.1 Approaches to the requirement
If the requirement is seen, not as "modifying the DNS", but as
"providing users with access to the DNS from a variety of languages
and character sets", three sets of proposals have emerged in the IETF
and elsewhere. They are:
1. Perform processing in client software that recodes a user-visible
string into an ASCII-compatible form that can safely be passed
through the DNS protocols and stored in the DNS. This is the
approach used, for example, in the IETF's "IDNA" protocol
[RFC3490].
2. Modify the DNS to be more hospitable to non-ASCII names and
strings. There have been a variety of proposals to do this in
almost as many ways, some of which have been implemented on a
proprietary basis by various vendors. None of them has gained
acceptance in the IETF community, primarily because they would
take a long time to deploy, would leave many problems unsolved,
and might cause problems with deployed systems that had not yet
been upgraded.
3. Move the problem out of the DNS entirely, relying instead on a
"directory" or "presentation" layer to handle
internationalization. The rationale for this approach is
discussed in [RFC3467].
This document proposes a fourth approach, applicable to the top level
domains (TLDs) only (see Section 1.2.1 for a discussion of the
special issues that make TLDs problematic). That approach could be
used as an alternative or supplement to the strategies summarized
above.
1.1.2 Writing the name of one's country in its own characters
An early focus of the "multilingual domain name" efforts was
expressed in statements such as "users in my country, in which ASCII
is rarely used, should be able to write an entire domain name in
their own character set". In particular, since all top-level domain
names, at present, follow the LDH rules, the somewhat more
restrictive naming rules discussed in [RFC1123], and the coding
conventions specified in [RFC1591], all fully-qualified DNS names
were effectively required to contain at least one ASCII label (the
TLD name), and that was considered inappropriate. One should,
instead, be able to write the name of the ccTLD for China in Chinese,
the name of the ccTLD for Saudi Arabia in Arabic, and so on. That
much could be accomplished, given updated applications, by using a
new TLD name with IDNA encoding. But, if one examines (or even
thinks about) user behavior and preferences, it is almost as
important that one be able to write the name of the ccTLD for China
in Arabic and that of Saudi Arabia in Chinese: true
internationalization implies that, at least to the extent to which
ambiguity and conflicts can be avoided, people should be able to use
the languages and character sets they prefer. For the same reasons
that one would like to have all-Chinese domain names available in
China, it is important to be able to present an apparently
Chinese TLD for domains whose second-level and lower labels are in
Chinese characters, even when the TLD itself serves predominantly
non-Chinese-speaking registrants and users.
1.1.3 Countries with multiple languages and countries with multiple
names
From a user interface standpoint, writing ccTLD names in local
characters is a problem. As discussed below in Section 1.2.2, the
DNS itself does not easily permit a domain to be referred to by more
than one name (or spelling or translation of a name). Countries with
more than one official language would require that the country name
be represented in each of those languages. And, just as it is
important that a user in China be able to represent the name of the
Chinese ccTLD in Chinese characters, she should be able to access a
Chinese-language site in France using Chinese characters. That would
require that she be able to write the name of the French ccTLD in
those characters rather than in a form based on a Roman character
set.
1.2 Domain Name System Constraints
1.2.1 Administrative Hierarchy
The domain name system is designed around the idea of an
"administrative hierarchy", with the entity responsible for a given
node of the hierarchy responsible for policies applicable to its
subhierarchies (Cf. [RFC1034] and [RFC1035]). The model works quite
well for the domain and subdomains of a particular enterprise. In an
enterprise situation, the hierarchy can be organized to match the
organizational structure; there are established ways to set policies;
and there are, at least presumably, shared assumptions about overall
goals and objectives among all registrants in the domain. It is more
problematic when a domain is shared by unrelated entities which lack
common policy assumptions. It is difficult to reach agreement on
rules that should apply to all of them. That situation always
prevails for the labels registered in a TLD (second-level names)
except in those TLDs for which the second level is structural (e.g.,
the .CO, .AC, .GOV conventions in many ccTLDs or in the historical
geographical organization of .US [RFC1480]), in which case it exists
for the labels within that structural level.
TLDs may, but need not, have consistent registration policies for
those second (or third) level names. Countries (or ccTLD
administrators) have often adopted rules about what entities may
register in their ccTLDs, and what forms the names may take. RFC
1591 outlined registration norms for most of the gTLDs, even though
those norms have been largely ignored in recent years. And some
recent "sponsored" domains are based on quite specific rules about
appropriate registrations. Homogeneous registration rules for the
root are, by contrast, impossible: almost by definition, the
subdomains (TLDs) registered in the root are diverse and no single
policy applying to all root subdomains is feasible.
1.2.2 Aliases
In an environment different from the DNS, a rational way to permit
assigning local-language names to a country code (or other) domain
would be to set up an alias for the name, or to use some sort of "see
instead" reference. But the DNS does not have quite the right
facilities for either. Instead, it supports a "CNAME" record, whose
label can refer only to a particular label and not to a subtree. For
example, if A.B.C is a fully-qualified name, then a CNAME reference
from X to A would make X.B.C appear to have the same values as A.B.C.
However, a CNAME reference from Y to C would not make A.B.Y
referenceable (or even defined) at all. A second record type, DNAME
[RFC2672], can provide an alias for a portion of the tree. But it is
problematic technically, and its use is strongly discouraged except
as a means of enabling a transition from one domain to another.
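The difference between the two record types can be sketched with a
toy model. The Python fragment below is an illustration only, not a
DNS implementation: the resolve() function, the lowercase names, and
the documentation address 192.0.2.1 are assumptions introduced here,
and a DNAME is modeled as simple suffix rewriting rather than the
CNAME synthesis that [RFC2672] specifies.

```python
# Toy model of alias handling; real resolution happens in DNS servers.
RECORDS = {"a.b.c": "192.0.2.1"}              # data stored at A.B.C
CNAMES = {"x.b.c": "a.b.c", "y": "c"}         # exact-name aliases only


def resolve(name, cnames, dnames, records):
    """Follow aliases for `name`; return its data or None if undefined."""
    for _ in range(8):                        # bound the alias chain
        if name in records:
            return records[name]
        if name in cnames:                    # CNAME: matches one name exactly
            name = cnames[name]
            continue
        for owner, target in dnames.items():  # DNAME: rewrites names below owner
            if name.endswith("." + owner):
                name = name[: -len(owner)] + target
                break
        else:
            return None                       # no alias applies
    return None


# X.B.C appears to have the same values as A.B.C.
print(resolve("x.b.c", CNAMES, {}, RECORDS))
# A CNAME from Y to C does not define A.B.Y.
print(resolve("a.b.y", CNAMES, {}, RECORDS))
# A DNAME from Y to C would cover the subtree.
print(resolve("a.b.y", {}, {"y": "c"}, RECORDS))
```

In this model the CNAME at Y affects only the name Y itself, which is
why a TLD cannot be given a second name with a CNAME alone.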
1.3 Internationalization and Localization
It has often been observed that, while many people talk about
"internationalization", they often really mean, and want,
"localization". "Internationalization", in this context, suggests
making something globally accessible while incorporating a
broad-range "universal" character set and conventions appropriate to
all languages and cultures. "Localization", by contrast, involves
having things work well in a particular locality or for a broad range
of localities, although aspects of the style of operation might
differ for each locality. Anything that actually involves the DNS
must be global, and hence internationalized, since the DNS cannot
meaningfully support different responses based, e.g., on the location
of the user making a query. While the DNS cannot support
localization internally, many of the features discussed earlier in
this section are much more easily thought about in local terms
-- whether localized to a geographical area, to users of a language,
or by some other criterion -- than in global ones.
2. Client-side solutions
Traditionally, the IETF has avoided becoming involved in
standardization for actions that take place strictly on individual
hosts on the network, assuming that it should confine itself to
behavior that is observable "on the wire", i.e., in protocols between
network hosts. Exceptions to this general principle have been made
when different clients were required to utilize data or interpret
values in compatible ways to preserve interoperability: the standards
for email and web body formats, and IDNA itself, are examples of
these exceptions. Regardless of what is required to be standardized,
it is almost never required, and often unwise, that a user interface
present "on the wire" formats to the user, at least by default
(debugging options that show the wire formats are common and often
quite useful). However, in most cases when the presentation format
and the wire format differ, the client program must either ensure
that the wire format can be reconstructed from user input or keep
the wire format, while hidden, bound to the presented form so
that it can be recovered. While it is rarely a goal in itself,
it is often necessary that the user be at least vaguely aware that
the wire ("real") format is different from the presentation one and
that the wire format be available for debugging.
2.1 IDNA and the client
As mentioned above, IDNA itself is entirely a client-side protocol.
It works by providing labels to the DNS in a special format called
"punycode" [RFC3492]. When labels in that format are encountered,
they are transformed, by the client, back into internationalized
(normally Unicode) characters. In the context of this document, the
important observation about IDNA is that any application program
that supports it is already doing considerable transformation work on
the client; it is not simply presenting the "on the wire" formats to
the user.
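The client-side transformation can be sketched with Python's
built-in "idna" codec, which implements the [RFC3490] conversions.
The function names to_wire() and to_display() are hypothetical
helpers introduced for this illustration, and the sample name is an
arbitrary example written with an escape so that the source remains
ASCII.

```python
def to_wire(name: str) -> str:
    """Convert a user-visible name to the ASCII form sent to the DNS."""
    return name.encode("idna").decode("ascii")


def to_display(wire_name: str) -> str:
    """Convert an ASCII wire-form name back to Unicode for presentation."""
    return wire_name.encode("ascii").decode("idna")


wire = to_wire("b\u00fccher.example")
print(wire)                  # xn--bcher-kva.example
print(to_display(wire) == "b\u00fccher.example")
```

Labels that are already plain LDH pass through both functions
unchanged, so only the internationalized labels differ between the
wire and presentation forms.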
2.2 Local translation tables for TLD names
We suggest that, in addition to maintaining the code and tables
required to support IDNA, clients may want to maintain a table that
contains a list of TLDs and locally-desirable names for each one. For
ccTLDs, these might be the names (or locally-standard abbreviations)
by which the relevant countries are known locally (whether in ASCII
characters or others). With some care on the part of the application
designer (e.g., to ensure that local forms do not conflict with the
actual TLD names), a particular TLD name input from the user could be
either in local or standard form without special tagging or problems.
When DNS names are received by these client programs, the TLD labels
would be mapped to local form before IDNA is applied to the rest of
the name; when names are received from users, local TLD names would
be mapped to the global ones before being passed into IDNA or used in
other DNS processing.
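A minimal sketch of such a table and the two mappings follows, in
Python. Everything named here is an assumption made for
illustration: the table entries (Spanish country names chosen as
plausible local forms), the abbreviated KNOWN_TLDS set standing in
for the real list of TLDs, and the function names. The built-in
"idna" codec stands in for the client's IDNA support.

```python
# Hypothetical table for one locale: local TLD name -> actual TLD.
LOCAL_TO_GLOBAL = {
    "francia": "fr",
    "alemania": "de",
    "jap\u00f3n": "jp",
}
KNOWN_TLDS = {"fr", "de", "jp", "com", "org"}   # abbreviated stand-in


def check_table(table, known_tlds):
    """Reject local forms that conflict with actual TLD names."""
    for local, tld in table.items():
        if local.lower() in known_tlds:
            raise ValueError(f"local name {local!r} collides with a TLD")
        if tld not in known_tlds:
            raise ValueError(f"{tld!r} is not a known TLD")


check_table(LOCAL_TO_GLOBAL, KNOWN_TLDS)
GLOBAL_TO_LOCAL = {tld: local for local, tld in LOCAL_TO_GLOBAL.items()}


def user_to_wire(name: str) -> str:
    """Map a local TLD name to the global one, then apply IDNA."""
    labels = name.rstrip(".").split(".")
    tld = labels[-1].lower()
    labels[-1] = LOCAL_TO_GLOBAL.get(tld, tld)   # standard forms pass through
    return ".".join(labels).encode("idna").decode("ascii")


def wire_to_user(name: str) -> str:
    """Map the TLD to local form, then apply IDNA to the rest of the name."""
    labels = name.rstrip(".").split(".")
    local_tld = GLOBAL_TO_LOCAL.get(labels[-1].lower(), labels[-1])
    rest = ".".join(labels[:-1])
    if not rest:
        return local_tld
    return rest.encode("ascii").decode("idna") + "." + local_tld


print(user_to_wire("b\u00fccher.alemania"))   # xn--bcher-kva.de
print(user_to_wire("ejemplo.com"))            # ejemplo.com
```

Because the lookup falls back to the label as typed, a user can
enter either the local or the standard form of a TLD without any
special tagging, provided the conflict check has been applied to the
table.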
3. Advantages and disadvantages of local translation
3.1 Every TLD in the local language and character set
The notion of a top-level domain whose name matches, e.g., the name
that is used for a country in that country or the name of a language
in that language is, as mentioned above, immediately appealing. But
most of the reasons for it argue equally strongly for other TLDs
being accessible from that language. A user in Korea who can access
the national ccTLD in the Korean language and character set has every
reason to expect that both generic top level domains and domains
associated with other countries would be similarly accessible,
especially if the second-level domains bear Korean names. A user in
Spain or Portugal, or in Latin America, would presumably have similar
expectations, but would expect to use Spanish or Portuguese names, not
Korean ones. That level of local optimization is not realistic -- some
would argue not possible -- with the DNS, since it would ultimately
require that
every top level domain be replicated for each of the world's
languages. That replication process would involve not just the top
level domain itself: in principle, all of its subtrees would need to
be completely replicated as well. Perhaps, in practice, not all
subtrees would require replication, but only those for which a
language variation or translation was significant. But, while that
restriction would change the scale of the problem, it would not alter
its basic nature. The administrative hierarchy characteristics of the
DNS (see Section 1.2.1) turn the replication process into an
administrative nightmare: every administrator of a second-level
domain in the world would be forced to maintain dozens, probably
hundreds, of similar zone files for the replicas of the domain.
Even if only the zones relevant to a particular country or language
were replicated, the administrative and tracking problems to bind
these to the appropriate top-level domain and keep all of the
replicas synchronized would be extremely difficult at best. And
many administrators of third- and fourth-level domains, and beyond,
would be faced with similar problems.
By contrast, dealing with the names of TLDs as a localization
problem, using local translation, is fairly simple. Each function
represented by a TLD -- a country, generic registrations, or
purpose-specific registrations -- could be represented in the local
language and character set as needed. And, for countries with many
languages, or for users living in, working in, or visiting countries
where their language was not dominant, "local" could be defined in
terms of
the needs or wishes of each particular user.
3.2 Unification of country code domains
It follows from some of the comments above that, while there appears
to be some immediate appeal in having (at least) two domains for
each country, one using the ISO 3166-1 code and another one using a
name based on the national name in the national language, such a
situation would create considerable problems for registrants in the
multiple domains. For registrants maintaining enterprise or
organizational subdomains, ease of administration in a single family
of zone files will usually make a registration in a single top-level
domain preferable to replicated sets of them, at least as long as
their functional requirements (such as local-language access) are met
by the unified structure.
For countries with multiple national languages that are considered
equal and legally equivalent, the advantages of a translation-based
approach, rather than multiple registrations and replicated trees,
would be even more significant.
Of course, having replicated domains might be popular with some
registries and registrars, since replication would almost inevitably
increase the total number of domains to be registered.
3.3 User understanding of local and global references
While the IDNA tables (actually Nameprep [RFC3491] and Stringprep
[RFC3454]) must be identical globally for IDNA to work reliably, the
tables for mapping between local names and TLD names could be locally
determined, and differ from one locale to another, as long as users
understood that international interchange of names required using the
standard forms. That understanding could be assisted by software.
It is likely that, at least for the foreseeable future, DNS names
being passed among users in different countries, or using different
languages, will be forced to be in punycode form to guarantee
compatibility in any event, so the marginal knowledge or effort
needed to put TLD names into standard form and transmit them that way
would be very small.
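As a sketch of how software might assist, the Python fragment below
shows the same standard-form name presented differently in two
locales and converted back to the standard form for interchange.
The locale tags, table entries, and function names are hypothetical,
and IDNA processing of the other labels is omitted to keep the focus
on the TLD.

```python
# Hypothetical per-locale tables: actual TLD -> locally preferred name.
LOCALE_TABLES = {
    "es": {"de": "alemania", "fr": "francia"},
    "fr": {"de": "allemagne", "fr": "france"},
}


def display(standard_name: str, locale: str) -> str:
    """Show a standard-form name with the TLD in the locale's form."""
    head, _, tld = standard_name.rpartition(".")
    local = LOCALE_TABLES[locale].get(tld.lower(), tld)
    return f"{head}.{local}" if head else local


def for_interchange(local_name: str, locale: str) -> str:
    """Put the TLD back into standard form before passing a name on."""
    reverse = {local: tld for tld, local in LOCALE_TABLES[locale].items()}
    head, _, tld = local_name.rpartition(".")
    standard = reverse.get(tld.lower(), tld)
    return f"{head}.{standard}" if head else standard


print(display("example.de", "es"))                  # example.alemania
print(display("example.de", "fr"))                  # example.allemagne
print(for_interchange("example.allemagne", "fr"))   # example.de
```

The two locales never need to agree on their tables; they only need
to exchange names in the standard form.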
3.4 Limits on TLD Propagation
The concept of using local translation does have one side effect,
which some portions of the Internet community might consider
undesirable. The size and complexity of translation tables, and
maintaining those tables, will be, to a considerable extent, a
function of the number of top-level domains, the frequency with which
new domains are added, and the number of domains that are added at a
time. A country or other locale that wished to maintain a complete
set of translations (i.e., so that every TLD had a representation in
the local language) would presumably find setting up a table for the
current collection of a few hundred domains to be a task that would
take some days. If the number of TLDs were relatively stable, with a
relatively small number being added at infrequent intervals, the
updates could probably be dealt with on an ad hoc basis. But, if
large numbers of domains were added frequently, or if the total
number of TLDs became very large, maintaining the table might require
dedicated staff. Worse, updating the tables stored on client machines
might require update and synchronization protocols and all of the
complexities that tend to go with such protocols.
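The maintenance burden can be made concrete with a small check that
reports which TLDs still lack a local name after the set of TLDs
changes. This is a sketch under assumed names: untranslated() and
the sample data are hypothetical, and the table maps local names to
actual TLDs.

```python
def untranslated(tlds, table):
    """Return, in sorted order, the TLDs with no local name in `table`."""
    translated = set(table.values())
    return sorted(t for t in tlds if t not in translated)


# Illustrative data only: a table with one entry and three TLDs.
print(untranslated({"fr", "de", "com"}, {"francia": "fr"}))  # ['com', 'de']
```

Each addition to the set of TLDs adds an entry to this list for
every locale that wishes to keep its table complete.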
4. Security Considerations
IDNA provides a client-based mechanism for presenting Unicode names
in applications while passing only ASCII-based names on the wire. As
such, it constitutes a major step along the path of introducing a
client-based presentation layer into the Internet. Client-based
presentation layer transformations introduce risks from
non-conforming tables that can change meaning without external
protection. For example, if a mapping table normally maps A onto C
and that table is altered by an attacker so that A maps onto D
instead, much mischief can be committed. On the other hand, these
are not the usual sort of network attacks: they may be thought of as
falling into the "users can always cause harm to themselves"
category. The local translation model outlined here does not
significantly increase the risks over those associated with IDNA, but
may provide some new avenues for exploiting them.
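As an illustration only, and assuming that a trusted reference copy
of a mapping table is available (something this document does not
specify), an altered entry such as the A-to-D example above could be
detected by comparing the installed table with the reference. The
function name and sample tables are hypothetical.

```python
def altered_entries(trusted, installed):
    """Map each differing key to its (trusted, installed) pair of values."""
    changes = {}
    for key in trusted.keys() | installed.keys():
        before, after = trusted.get(key), installed.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


trusted = {"A": "C"}       # the table normally maps A onto C
installed = {"A": "D"}     # an attacker has changed the target to D
print(altered_entries(trusted, installed))   # {'A': ('C', 'D')}
```

A missing or added key is reported with None on the corresponding
side of the pair.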
Both this approach and IDNA rely on having updated programs present
information to the user in a very different form than the one in
which it is transmitted on the wire. Unless the internal (wire) form
is always used in interchange, there are possibilities for ambiguity
and confusion about references.
5. Acknowledgments
This document was inspired by a number of conversations in ICANN,
IETF, MINC, and private contexts about the future evolution and
internationalization of top level domains. Unknown to the author,
but unsurprisingly (the general concept should be obvious to anyone
even slightly skilled in the relevant technologies), the concept has
apparently been developed independently in other groups, including
JET, but, as far as this author knows, not written up for general
comment. Discussions within, and about, the ICANN IDN Committee have
been particularly helpful, although several of the members of that
committee may be surprised about where those discussions led. Email
correspondence with several people after the first version of this
document was posted, notably Richard Hill, Paul Hoffman, S L Lee, and
Soobok Lee, led to considerable clarification in the subsequent
versions.
References
[ISO10646]
International Organization for Standardization,
"Information Technology - Universal Multiple-octet coded
Character Set (UCS) - Part 1: Architecture and Basic
Multilingual Plane", ISO Standard 10646-1, May 1993.
[MIME] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet
Mail Extensions): Mechanisms for Specifying and Describing
the Format of Internet Message Bodies", RFC 1341, June
1992.
Updated and replaced by Freed, N. and N. Borenstein,
"Multipurpose Internet Mail Extensions (MIME) Part One:
Format of Internet Message Bodies", RFC 2045, November
1996. Also, Moore, K., "Representation of Non-ASCII Text
in Internet Message Headers", RFC 1342, June 1992. Updated
and replaced by Moore, K., "MIME (Multipurpose Internet
Mail Extensions) Part Three: Message Header Extensions for
Non-ASCII Text", RFC 2047, November 1996.
[RFC1034] Mockapetris, P., "Domain names - concepts and facilities",
STD 13, RFC 1034, November 1987.
[RFC1035] Mockapetris, P., "Domain names - implementation and
specification", STD 13, RFC 1035, November 1987.
[RFC1123] Braden, R., "Requirements for Internet Hosts - Application
and Support", STD 3, RFC 1123, October 1989.
[RFC1480] Cooper, A. and J. Postel, "The US Domain", RFC 1480, June
1993.
[RFC1591] Postel, J., "Domain Name System Structure and Delegation",
RFC 1591, March 1994.
[RFC2672] Crawford, M., "Non-Terminal DNS Name Redirection", RFC
2672, August 1999.
[RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
Internationalized Strings ("stringprep")", RFC 3454,
December 2002.
[RFC3467] Klensin, J., "Role of the Domain Name System (DNS)", RFC
3467, February 2003.
[RFC3490] Faltstrom, P., Hoffman, P. and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)",
RFC 3490, March 2003.
[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
Profile for Internationalized Domain Names (IDN)", RFC
3491, March 2003.
[RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode
for Internationalized Domain Names in Applications
(IDNA)", RFC 3492, March 2003.
Author's Address
John C Klensin
1770 Massachusetts Ave, #322
Cambridge, MA 02140
USA
Phone: +1 617 491 5735
EMail: john-ietf@jck.com
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; neither does it represent that it
has made any effort to identify any such rights. Information on the
IETF's procedures with respect to rights in standards-track and
standards-related documentation can be found in BCP-11. Copies of
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to
obtain a general license or permission for the use of such
proprietary rights by implementors or users of this specification can
be obtained from the IETF Secretariat.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice
this standard. Please address the information to the IETF Executive
Director.
Full Copyright Statement
Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assignees.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.