Internet Draft Dan Oscarsson
draft-ietf-idn-udns-01.txt Telia ProSoft
Updates: RFC 2181, 1035, 1034, 2535 27 August 2000
Expires: 27 February 2001
Using the Universal Character Set in the Domain Name System (UDNS)
Status of this memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
Since the Domain Name System (DNS) [RFC1035] was created there has
been a desire to use characters other than ASCII in domain names.
Lately this desire has grown very strong and several groups have
started to experiment with non-ASCII names.
This document defines how the Universal Character Set (UCS)
[ISO10646] can be used in DNS without extending the current [RFC1035]
protocol and how DNS is extended to overcome length limits in the
future.
1. Introduction
While the need for non-ASCII domain names has existed since the
creation of the DNS, the need has increased considerably during the
last few years. Currently there are at least two implementations
using UTF-8 in use, and others using other methods.
Dan Oscarsson Expires: 27 February 2001 [Page 1]
Internet Draft Universal DNS 27 August 2000
To avoid several different implementations of non-ASCII names in DNS
that do not work together, and to avoid breaking the current ASCII
only DNS, there is an immediate need to standardise how DNS shall
handle non-ASCII names.
While the DNS protocol allows any octet in character data, so far the
octets are only defined for the ASCII code points. Octets outside the
ASCII range have no defined interpretation. This document defines how
all octets are to be used in character data, allowing a standardised
way to use non-ASCII characters in DNS.
To support the software where only ASCII host and domain names are
allowed, this document defines how resource records are to be
returned in a response to avoid breaking that software.
The specification here conforms to the IDN requirements [IDNREQ].
1.1 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
IDN: Internationalised Domain Name, here used to mean a domain name
containing non-ASCII characters.
ACE: ASCII Compatible Encoding. Used to encode IDNs in a way
compatible with the ASCII host name syntax.
1.2 Previous versions of this document
The second version of this document was available as draft-ietf-idn-
udns-00.txt. It included a lot of possibilities as well as a flag bit
that is now removed.
The first version of this document was available as draft-oscarsson-
i18ndns-00.txt.
2. The DNS Protocol
The DNS protocol is used when communicating between DNS servers and
other DNS servers or DNS clients. User interface issues like the
format of zone files or how to enter or display domain names are not
part of the protocol.
The update of the protocol defined here can be used immediately as it
is fully compatible with the DNS of today.
2.1 Character data
Character data needs to represent as many of the world's characters
as possible while remaining compatible with ASCII. It must also be
well defined so that it can easily be handled, and it should be
compact, as only 63 octets are available per label without an
extension of the protocol. Character data is used in labels and in
text fields in the RDATA part of an RR.
Character data used in the DNS protocol MUST:
- Use ISO 10646 (UCS) [ISO10646] as coded character set.
- Be normalised using form C as defined in Unicode technical report
#15 [UTR15]. See also [CHNORM].
- Be encoded using the UTF-8 [RFC2279] character encoding scheme.
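The three requirements above can be sketched as follows. This is only an illustration in Python: it assumes that the NFC implementation in the standard unicodedata module is an acceptable stand-in for form C of [UTR15], and the function name to_wire_form is hypothetical and not part of this specification.

```python
import unicodedata


def to_wire_form(text: str) -> bytes:
    """Return character data as NFC-normalised UTF-8 octets."""
    # A Python str is already a sequence of UCS code points.
    normalised = unicodedata.normalize("NFC", text)
    # UTF-8 leaves ASCII unchanged, so ASCII-only names keep their octets.
    return normalised.encode("utf-8")


# "e" followed by COMBINING ACUTE ACCENT (U+0301) composes to U+00E9.
composed = to_wire_form("e\u0301")
```

An ASCII-only name passes through unchanged, which is what keeps this encoding compatible with existing DNS data.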
2.2 Name matching
RFC 1035 states that the labels of a name are matched case-
insensitively. When using UCS this is no longer enough as there are
other forms than case that need to match as equivalent.
The original definition is now extended to be: labels must be
compared using form-insensitivity.
For the UCS character code range 0-255 (ASCII and ISO 8859-1) the
case folding MUST be done by case-insensitive matching following the
one to one mapping as defined in the Unicode 3.0 Character Database
[UDATA].
How to do form-insensitive matching for the rest of UCS will be
defined in a separate document.
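A possible reading of the rule for the 0-255 range is sketched below in Python. It relies on Python's built-in lower-case mapping, which is assumed here to agree with the Unicode 3.0 one to one mapping for these code points. The names fold_latin1 and labels_match are hypothetical. Code points above 255 are deliberately left unchanged, since their matching is not defined in this document.

```python
def fold_latin1(label: str) -> str:
    """Lower-case code points 0-255 one to one; leave all others unchanged."""
    out = []
    for ch in label:
        lower = ch.lower()
        # Accept only a one to one mapping that stays inside the 0-255 range.
        if ord(ch) < 256 and len(lower) == 1 and ord(lower) < 256:
            out.append(lower)
        else:
            out.append(ch)
    return "".join(out)


def labels_match(a: str, b: str) -> bool:
    """Compare two labels after folding the 0-255 range."""
    return fold_latin1(a) == fold_latin1(b)
```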
2.2.1 Rules for matching of domain names in DNS servers
To handle domain name matching correctly in lookups, DNS servers MUST
follow these rules:
- Do matching on authoritative data using form-insensitive matching
for the characters used in the data (for example a zone using only
ASCII need only handle matching of ASCII characters).
- On non-authoritative data, either do binary matching or case-
insensitive matching on ASCII letters and binary matching on all
others.
The effect of the above is:
- Only servers handling authoritative data must implement form-
insensitive matching of names, and they need only implement the
subset needed for the subset of characters of UCS they support in
their authoritative zones.
- It normally gives fast lookup because data is usually sent like:
resolver <-> server <-> authoritative server.
While form-insensitive matching can be complex and CPU consuming,
the server in the middle will do caching with only simple and fast
binary matching. So complex matching rules should not slow down DNS
very much.
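The two options allowed for non-authoritative data could look like the sketch below, operating on the raw UTF-8 octets of a label. The function names are hypothetical. It relies on the fact that Python's bytes.lower() changes only the ASCII letters A-Z and leaves octets of 0x80 and above untouched.

```python
def match_binary(a: bytes, b: bytes) -> bool:
    """Option 1: plain octet-by-octet comparison."""
    return a == b


def match_ascii_insensitive(a: bytes, b: bytes) -> bool:
    """Option 2: case-insensitive on ASCII letters, binary on everything else."""
    # bytes.lower() maps only A-Z to a-z, so multi-octet UTF-8 sequences
    # are compared exactly as they were received.
    return a.lower() == b.lower()
```

Neither function needs any Unicode tables, which is why a caching server in the middle can stay simple and fast.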
2.3 Supporting older software and allowing for ASCII aliases.
As a lot of software expects host and domain names to use only a
subset of ASCII, such software may work incorrectly if it receives a
response with non-ASCII characters. And when communicating between
nations it is sometimes good to also have a version of a name that
can be used by most people.
To support this the following MUST be followed:
- Queries for PTR records must return two records if the name
pointed to includes non-ASCII. They may also return two records if
an alternative name exists for the object pointed to.
The two records MUST be ordered with the ASCII version of the name
first and the non-ASCII or true name second. The second record
defines the true name of the object, the first record an ASCII
version of the name.
Note: older software will normally stop analysing a response when
finding the first PTR record so they will get the ASCII name.
Newer software can select the name best suited for its needs.
- Queries for other records with non-ASCII in the RDATA section MUST
return an ASCII version also, unless the client is known to handle
non-ASCII.
At a future date IETF can decide that it is no longer necessary to
support the software only handling ASCII names, and the servers can
stop including ASCII versions in the responses.
NOTE: a cache server shall return data in the same way as an
authoritative server. If some do not and change the order of the PTR
records, some old software will not get the ASCII version of the
name.
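The ordering rule for PTR responses can be sketched as follows. This is an illustration only: the function names are hypothetical, names are handled as plain strings rather than as resource records, and how the ASCII version is obtained (alias or ACE) is left to the caller.

```python
from typing import List, Optional


def is_ascii(name: str) -> bool:
    """True if every code point of the name is in the ASCII range."""
    return all(ord(ch) < 128 for ch in name)


def ptr_answers(true_name: str, ascii_name: Optional[str] = None) -> List[str]:
    """Return PTR target names in the order required for old software."""
    if ascii_name is None:
        if is_ascii(true_name):
            return [true_name]  # a single record is enough
        raise ValueError("a non-ASCII name needs an ASCII version")
    if not is_ascii(ascii_name):
        raise ValueError("the first record must be ASCII only")
    # ASCII version first, true name second.
    return [ascii_name, true_name]
```

A cache that stores and replays the list without reordering it preserves the behaviour expected by older clients.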
2.3.1 ASCII versions of a name
When returning an ASCII version of a name, there are two
possibilities: returning a user defined ASCII alias or an ASCII
compatible encoding (ACE) of the name.
The ASCII Compatible Encoding (ACE) is used to support older software
expecting only ASCII and to support downgrading from 8-bit to 7-bit
ASCII in other protocols (like SMTP). It is a transition mechanism
and will no longer be supported at some future time when it is so
decided.
All software following this specification MUST recognise ACE names
and decode them into their true names when doing matching and
handling. A DNS server must recognise ACE in a query.
The ACE to be used is defined in a separate document. Typical
definitions that are suitable are [SACE] and [RACE].
NOTE: To support the transition to UTF-8 in resolver code, it is
recommended that a server recognise local encodings for the zones it
is authoritative for. This will allow clients to use the local
character set in many cases even before the resolver code is upgraded.
2.4 Handling long names
The current DNS protocol limits a label to 63 octets. As UTF-8 takes
more than one octet for some characters, a UTF-8 name cannot have 63
characters in a label like an ASCII name can. For example, a name
using Hangul, at three octets per syllable, would have a maximum of
21 characters per label.
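The Hangul figure can be checked with a few lines of Python. The variable names are illustrative only. The syllable used is U+D55C, and like every precomposed Hangul syllable it takes three octets in UTF-8.

```python
MAX_LABEL_OCTETS = 63  # label limit from RFC 1035

# HANGUL SYLLABLE HAN (U+D55C) encoded as UTF-8.
octets_per_syllable = len("\ud55c".encode("utf-8"))

# Whole syllables that fit in one classic label.
max_syllables = MAX_LABEL_OCTETS // octets_per_syllable
```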
The limits imposed by RFC 1035 are 63 octets per label and 255 octets
for the full name. The 255 limit is not a protocol limit but one to
simplify implementations.
To support longer names a long label type is defined using [RFC2671]
as extended label 0b000011 (the label type will be assigned by IANA
and may not be the number used here).
                     1 1 1 1 1 1 1 1 1 1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
|0 1 0 0 0 0 1 1|     length    | label data ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
length: length of label in octets
label data: the label
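A sketch of how such a label might be serialised is given below. It assumes that the length field is a single octet (so at most 255 octets of label data) and uses the provisional type octet from the diagram, which IANA may assign differently. The names LONG_LABEL_TYPE and encode_long_label are hypothetical.

```python
import unicodedata

# Extended label prefix 0b01 followed by the provisional type 0b000011.
LONG_LABEL_TYPE = 0b01000011


def encode_long_label(label: str) -> bytes:
    """Serialise one label as: type octet, length octet, UTF-8 label data."""
    data = unicodedata.normalize("NFC", label).encode("utf-8")
    if len(data) > 255:
        # Assumption: a one-octet length field cannot describe more.
        raise ValueError("label data does not fit a one-octet length")
    return bytes([LONG_LABEL_TYPE, len(data)]) + data
```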
The long label MUST be handled by all software following this
specification. Such software MUST also support a UDP packet size of
up to 1280 bytes.
The limits for labels are updated since RFC 1035 as follows:
A label is limited to a maximum of 63 character code points in UCS
normalised using Unicode form C. The full name is limited to a
maximum of 255 character code points normalised as for a label.
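The updated limits count code points rather than octets. A minimal check might look like the sketch below. It assumes that the separating dots are not counted towards the 255 limit, which this document does not state either way. The function name within_limits is hypothetical.

```python
import unicodedata

MAX_LABEL_CODE_POINTS = 63
MAX_NAME_CODE_POINTS = 255


def within_limits(name: str) -> bool:
    """Check a dotted name against the code point limits after NFC."""
    labels = unicodedata.normalize("NFC", name).split(".")
    if any(len(label) > MAX_LABEL_CODE_POINTS for label in labels):
        return False
    # Assumption: only the code points inside labels are counted.
    return sum(len(label) for label in labels) <= MAX_NAME_CODE_POINTS
```

Under these limits a label of 63 Hangul syllables is valid even though it needs 189 octets, which is why the long label type is required to carry it.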
As long labels are not understood by older software, a response MUST
NOT include a long label unless the query did. At a later date, IETF
may change this.
2.5 Handling too large responses and identifying non-ASCII clients
If a client sends the QNAME in the query using the long label type,
the client shows that it implements this specification and does not
need ASCII compatibility.
If the client is not identified as following this specification, the
UDP packet size is limited to 512 bytes. If a response then does not
fit, the response MUST set the TC bit (truncated) to indicate that. A
client may then resend the query using a long label in the query to
show that it can handle larger responses.
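The server-side decision described above can be sketched as follows. The constant and function names are hypothetical. Treating 1280 bytes as the upper bound for clients that used a long label is an assumption drawn from the packet size requirement in section 2.4, since this section only states the 512 byte case explicitly.

```python
LEGACY_UDP_LIMIT = 512     # client not known to follow this specification
EXTENDED_UDP_LIMIT = 1280  # assumed bound for clients that sent a long label


def udp_limit(query_used_long_label: bool) -> int:
    """Largest UDP response the server may send to this client."""
    return EXTENDED_UDP_LIMIT if query_used_long_label else LEGACY_UDP_LIMIT


def must_set_tc(response_size: int, query_used_long_label: bool) -> bool:
    """True if the response does not fit and the TC bit has to be set."""
    return response_size > udp_limit(query_used_long_label)
```

For example, a 600 byte response is truncated for a legacy client but sent whole to a client whose QNAME used the long label type.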
2.6 DNSSEC
As labels can now contain non-ASCII characters, DNSSEC [RFC2535]
needs to be revised so that it can handle them.
3. User interface issues
Locally on a system or in a user interface a different character set
than the one defined to be used in the DNS protocol may be used.
Therefore software must map between the local character set and the
character set of the protocol, so that human beings can understand
it.
This means that a zone file that is edited in a text editor by a
person before being loaded into a DNS server must be allowed to be in
the local character set. Software may not assume that the user can
edit text encoded in UTF-8. A zone file transmitted between DNS
software that is not handled by a human, can be transmitted using any
format.
When character data is presented to a human or entered by a human,
software must, as far as possible, present it using the local
character set and allow it to be entered using the local character
set. It is the responsibility of the software, not the human, to
convert between the local character set and the one used in the
protocol.
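The conversion between a local character set and the protocol form might be sketched as below. ISO 8859-1 is used only as an example of a local character set, and the function names are hypothetical. A name that cannot be represented locally raises UnicodeEncodeError here; a caller could then fall back to an ASCII version of the name.

```python
import unicodedata


def local_to_wire(raw: bytes, local_charset: str) -> bytes:
    """Convert user input in the local character set to NFC UTF-8."""
    text = raw.decode(local_charset)
    return unicodedata.normalize("NFC", text).encode("utf-8")


def wire_to_local(wire: bytes, local_charset: str) -> bytes:
    """Convert protocol data to the local character set for display."""
    # Raises UnicodeEncodeError if the local set lacks a character.
    return wire.decode("utf-8").encode(local_charset)
```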
The down coding defined above allows all names to be entered and
displayed for all users, as long as at least the ASCII characters are
supported.
4.1 Applications using DNS software
If an application makes a call to DNS, it must present the data to
the users in the local character set used by the user, down coding if
necessary. Software used to access DNS should give the application
programmer the possibility of doing queries and getting responses
both in the local character set and in UTF-8.
APIs like getipnodebyname should be updated with an IDN flag that
results in the name being returned using the current locale, instead
of native UTF-8 or ASCII format.
5. Effect on other protocols
As a domain name may now include non-ASCII characters, many other
protocols that carry domain names need to be updated, for example
SMTP, HTTP and URIs. The ACE format can be used when interfacing with
ASCII only software or protocols. Protocols like SMTP could be
extended using ESMTP and a UTF8 option that defines that all headers
are in UTF-8.
It is recommended that protocols updated to handle i18n do this by
encoding character data in the same standard format as defined for
DNS in this document (UCS normalised form C). Encoding it in ASCII or
using tagged character sets should be avoided.
DNS data contains more than domain names; for example, e-mail
addresses are also included. So an e-mail address would be expected
to be changed to include non-ASCII both before and after the @-sign.
Software needs to be updated to follow the user interface
recommendations given above, so that a human will see the characters
in their local character set, if possible.
5.1 An example: SMTP
SMTP may be extended to allow UTF-8 in headers and addresses. When
transferring an e-mail to an SMTP system that has not been extended,
it will then have to encode e-mail addresses and IDNs into an ACE.
In this case an e-mail address could look like:
ra--XXXXX.surname@ra--YYYYY.com
where ra--XXXXX is the ACE of the given name and ra--YYYYY is the ACE
of one part of the domain name.
6. Security Considerations
As always with data, if software does not check for data that can be
a problem, security may be affected. As more characters than ASCII
are allowed, software that expects only ASCII and performs no checks
may now have security problems.
7. References
[RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities",
STD 13, RFC 1034, November 1987.
[RFC1035] P. Mockapetris, "Domain Names - Implementation and
Specification", STD 13, RFC 1035, November 1987.
[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119.
[RFC2181] R. Elz and R. Bush, "Clarifications to the DNS
Specification", RFC 2181, July 1997.
[RFC2279] F. Yergeau, "UTF-8, a transformation format of ISO 10646",
RFC 2279, January 1998.
[RFC2535] D. Eastlake, "Domain Name System Security Extensions",
RFC 2535, March 1999.
[RFC2671] P. Vixie, "Extension Mechanisms for DNS (EDNS0)", RFC
2671, August 1999.
[ISO10646] ISO/IEC 10646-1:2000. International Standard --
Information technology -- Universal Multiple-Octet Coded
Character Set (UCS)
[Unicode] The Unicode Consortium, "The Unicode Standard -- Version
3.0", ISBN 0-201-61633-5. Described at
http://www.unicode.org/unicode/standard/versions/
Unicode3.0.html
[UTR15] M. Davis and M. Duerst, "Unicode Normalization Forms",
Unicode Technical Report #15, Nov 1999,
http://www.unicode.org/unicode/reports/tr15/.
[UTR21] M. Davis, "Case Mappings", Unicode Technical Report #21,
Dec 1999, http://www.unicode.org/unicode/reports/tr21/.
[UDATA] The Unicode Character Database,
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt.
The database is described in
ftp://ftp.unicode.org/Public/UNIDATA/
UnicodeCharacterDatabase.html.
[IDNREQ] James Seng, "Requirements of Internationalized Domain
Names", draft-ietf-idn-requirement.
[IANADNS] Donald Eastlake, Eric Brunner, Bill Manning, "Domain Name
System (DNS) IANA Considerations", draft-ietf-dnsext-iana-dns.
[IDNE] Marc Blanchet, Paul Hoffman, "Internationalized domain
names using EDNS (IDNE)", draft-ietf-idn-idne.
[CHNORM] M. Duerst, M. Davis, "Character Normalization in IETF
Protocols", draft-duerst-i18n-norm.
[IDNCOMP] Paul Hoffman, "Comparison of Internationalized Domain Name
Proposals", draft-ietf-idn-compare.
[NAMEPREP] Paul Hoffman, "Comparison of Internationalized Domain Name
Proposals", draft-ietf-idn-compare.
[SACE] Dan Oscarsson, "Simple ASCII Compatible Encoding", draft-
ietf-idn-sace.
[RACE] Paul Hoffman, "RACE: Row-based ASCII Compatible Encoding
for IDN", draft-ietf-idn-race.
8. Acknowledgements
Paul Hoffman for giving many comments in our e-mail discussions.
Ideas from drafts by Paul Hoffman, Stuart Kwan, James Gilroy and Kent
Karlsson.
Magnus Gustavsson, Mark Davis, Kent Karlsson and Andrew Draper for
comments on my draft.
Discussions and comments by the members of the IDN working group.
Author's Address
Dan Oscarsson
Telia ProSoft AB
Box 85
201 20 Malmo
Sweden
E-mail: Dan.Oscarsson@trab.se