Internet Draft                                         Authors: Li Ming TSENG
<draft-ietf-idn-uname-01.txt>                                     Jan Ming HO
13 Jul 2001                                                      Hua Lin QIAN
Expires 13 Jan 2002                                               Kenny HUANG
                                                           Editor: James SENG

       Internationalized Domain Names and Unique Identifiers/Names

Status of this Memo

    This document is an Internet-Draft and is in full conformance
    with all provisions of Section 10 of RFC2026.

    Internet-Drafts are working documents of the Internet
    Engineering Task Force (IETF), its areas, and its working
    groups. Note that other groups may also distribute working
    documents as Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of
    six months and may be updated, replaced, or obsoleted by other
    documents at any time. It is inappropriate to use Internet-
    Drafts as reference material or to cite them other than as
    "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html


Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119
[RFC2119].

Abstract

One of the biggest technical challenges of Internationalized Domain
Names (IDN) is how to determine whether two given domain names match.
The current approach to this problem is a process known as
[NAMEPREP].

This document describes an alternative view of, and solution to,
the IDN matching problem. It can be treated as a further processing
step after NAMEPREP, and it is compatible with the IDNA approach.

A practical case from the TWNIC testbed indicates that using CNAME to
implement UNAME is a workable way for Internet applications to fetch a
unique name.

1. Introduction

The Chinese Domain Name Consortium (CDNC) has taken a very keen
interest in IDN, in particular the use of Chinese script in domain
names. CDNC was formed by the regional registries (CNNIC, TWNIC,
HKNIC and MONIC) and has been experimenting with a Chinese Domain Name
System for many months.

The primary motivation for this proposal is the lack of support for
Traditional and Simplified Chinese in NAMEPREP. See [HAN] for a
discussion of Traditional/Simplified Han Ideograph problems.

In addition, based on the operational experience of the registries and
on the experiments and development work done in CDNC, this proposal
will reduce operational and deployment costs from a TLD manager's
perspective.

Backward compatibility, interoperability, scalability, security,
operation and deployment are all elements that must be considered as
criteria when designing an internationalized domain name system.

2. Background on Legacy Encoding

The most popular Chinese character set used in Taiwan is the
industry standard "BIG5" and the corresponding one in China is
"GBK". BIG5 contains primarily Traditional Chinese characters and GBK
is used primarily for Simplified Chinese.
In addition, the Chinese government has mandated that all Chinese
software in China must support a new standard, known as GB18030, that
supersedes GBK.

Both BIG5 and GBK are widely used in China, Taiwan, Hong Kong and
Macao and are supported by many operating systems, including Windows.
Thus, supporting these encodings in IDN is essential from a
geographical perspective.

3. An overview of current proposals and their problems

3.1. ASCII Compatible Encoding (ACE)

The need to support ACE in IDN has been extensively discussed in
the IDN Working Group. Backward compatibility is the strongest
advantage of ACE. The deployment of ACE neither affects the existing
naming infrastructure nor risks damaging current Internet
applications. To move the current Internet to a multilingual
infrastructure, ACE is clearly the most appropriate bridging
solution.

Although ACE has the advantages mentioned above, most users' systems
support a local encoding. Users do not want to download any special
software or upgrade their software in order to handle a multilingual
domain name system. Supporting native encodings without altering
users' software has become an important issue for TLD managers.

3.2. NAMEPREP

The design goal of [NAMEPREP] is to allow users to enter host names
in applications and have the highest chance of getting the name
correct. The NAMEPREP process comprises three basic steps, namely
"MAP", "NORMALIZATION" and "PROHIB".

The MAP and NORMALIZATION steps aim to reduce the number of possible
representations of domain names that should be equivalent. These steps
are based upon Unicode Technical Reports [UTR15] and [UTR21]. However,
NAMEPREP fails when there are multiple representations of the same
domain name and matching depends on language and context. Of
particular interest to us, Traditional and Simplified Chinese
ideographs cannot be handled by NAMEPREP.

4. Alternative view to the problem space

While the IDN WG has been working very hard on ACE and NAMEPREP for
IDN, it is apparent that there is another view of these problems that
may give us a different approach and solution.

First, NAMEPREP assumes that an IDN is an ISO10646/Unicode string. In
reality, an IDN is often encoded in a legacy encoding, and an
additional step has to be taken to convert it to ISO10646/Unicode.

Second, besides providing backward compatibility, an ACE is also an
identifier string for an IDN, and the NAMEPREP process unifies the
various possible representations of an IDN into a single "unique name"
for matching purposes.

In other words, we have the following conceptual model.

  +-------+     +---------+   (ISO10646)
  |XYZ.COM|-->--|Transcode|-->------------+
  +-------+     +---------+      +----------------+     +---------------+
       :  (Legacy)         ...---|NAMEPREP/Unified|-->--|ACE/unique name|
  +-------+     +---------+      +----------------+     +---------------+
  |xyz.com|-->--|Transcode|-->------------+
  +-------+     +---------+   (ISO10646)
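
As a rough illustration of the model above, the following Python
sketch transcodes legacy byte strings to ISO10646/Unicode and then
applies a stand-in for the unification step. The function names are
hypothetical, and NFKC normalization plus lowercasing is only an
assumed approximation of NAMEPREP's MAP and NORMALIZATION steps, not
the actual NAMEPREP profile.

```python
import unicodedata


def transcode(raw: bytes, legacy_encoding: str) -> str:
    """Convert a legacy-encoded name to an ISO10646/Unicode string."""
    return raw.decode(legacy_encoding)


def approx_nameprep(name: str) -> str:
    """Stand-in for NAMEPREP (assumption): NFKC, then lowercasing."""
    return unicodedata.normalize("NFKC", name).lower()


def unify(raw: bytes, legacy_encoding: str) -> str:
    """Run both stages of the conceptual model on one input."""
    return approx_nameprep(transcode(raw, legacy_encoding))


# The two spellings from the figure, arriving in different legacy encodings.
upper = unify("XYZ.COM".encode("big5"), "big5")
lower = unify("xyz.com".encode("gbk"), "gbk")
print(upper == lower)              # both unify to the same string

# A Traditional/Simplified pair (U+570B and U+56FD) is not unified.
traditional = unify("\u570b".encode("big5"), "big5")
simplified = unify("\u56fd".encode("gbk"), "gbk")
print(traditional == simplified)   # the two forms remain distinct
```

The second comparison shows the gap this document is concerned with:
normalization alone leaves the Traditional and Simplified forms as two
different names.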

5. Proposal

Given this alternative view of IDN, we can derive another set of
solutions using a directory concept.

  +-------+       +---------+
  |XYZ.COM|-->----|         |
  +-------+       |         |     +---------------+
       :  (Legacy)|Directory|-->--|ACE/unique name|
  +-------+       |         |     +---------------+
  |xyz.com|-->----|         |
  +-------+       +---------+

As mentioned in section 3.2, some ideographs cannot be handled by
NAMEPREP's "MAP", "NORMALIZATION" and "PROHIB" steps alone. Building a
directory system acts as a further NAMEPREP process that solves the
remaining matching problems, for example one-to-many and many-to-one
mappings.

The purpose of this directory system is to list all the possible
representations of IDNs and unify them to a unique name. This unique
name could be an ACE of the most common representation or NAMEPREPPED
ACE.

The content of the directory is built up at registration time:
registrants have to provide a list of equivalent representations
of the domain names they register.
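
The registration step can be sketched as follows. This is a minimal
in-memory illustration, not a proposed implementation: the Directory
class, its method names and the unique name "ACE1" are all
hypothetical.

```python
from typing import Dict, List, Optional


class Directory:
    """Maps every registered representation of a name to one unique name."""

    def __init__(self) -> None:
        self._entries: Dict[bytes, str] = {}

    def register(self, unique_name: str, representations: List[bytes]) -> None:
        """Record the registrant's list of equivalent representations."""
        for rep in representations:
            existing = self._entries.get(rep)
            if existing is not None and existing != unique_name:
                raise ValueError("%r is already registered to %s" % (rep, existing))
            self._entries[rep] = unique_name

    def lookup(self, representation: bytes) -> Optional[str]:
        """Return the unique name, or None if nothing was registered."""
        return self._entries.get(representation)


# Traditional (U+570B) and Simplified (U+56FD) forms in legacy and
# UTF-8 encodings, all unified to one hypothetical unique name.
directory = Directory()
directory.register("ACE1", [
    "\u570b".encode("big5"),
    "\u56fd".encode("gbk"),
    "\u570b".encode("utf-8"),
    "\u56fd".encode("utf-8"),
])
print(directory.lookup("\u56fd".encode("gbk")))
```

Refusing a representation that already belongs to a different unique
name is one possible registration policy, chosen here only to keep the
mapping unambiguous.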

However, there is still the question of which directory to use.
In this document, we examine a few different solutions.

5.1. LDAP as Directory

Lightweight Directory Access Protocol [LDAP] is one of the most
widely used directory protocols. In LDAP, there is a concept of
hierarchy similar to the DNS hierarchy. Hence, it is possible to
distribute the content of the directory across various LDAP servers
for scalability and authority control. For example, each registry
that wishes to deploy IDN may set up an LDAP server and register it
with a "root" LDAP server.

The IDN query process would then look something like this:
   a. The user inputs an IDN into an application
   b. The application does an LDAP query to look up the unique name
   c. The application uses the unique name to do a DNS lookup
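
The three steps above can be sketched as a single function. The
directory and DNS services are passed in as plain callables so that no
particular client library is assumed; the stub tables, the names under
"example" and the address 192.0.2.1 are placeholders for illustration
only.

```python
from typing import Callable, Optional


def resolve_idn(
    user_input: bytes,
    directory_lookup: Callable[[bytes], Optional[str]],
    dns_lookup: Callable[[str], str],
) -> str:
    """Follow steps a to c: input, directory query, then DNS lookup."""
    unique_name = directory_lookup(user_input)        # step b
    if unique_name is None:
        raise LookupError("no unique name registered for this input")
    return dns_lookup(unique_name)                    # step c


# Stub services for illustration only (hypothetical names and address).
directory = {b"\xb0\xea.example": "ace1.example"}
dns = {"ace1.example": "192.0.2.1"}

address = resolve_idn(b"\xb0\xea.example", directory.get, dns.__getitem__)
print(address)
```

Because the directory is a parameter, the same shape applies whichever
directory service answers step b.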

Advantages:
   - encapsulates the problem in the representation layer and at
     registration time
   - able to handle unification problems

Disadvantages:
   - requires all applications to upgrade
   - additional LDAP lookup overhead
   - policy issues with "root" LDAP server
   - requires access to LDAP servers to function, i.e. can't work
     offline

5.2. CNRP as Directory

Common Name Resolution Protocol [CNRP] is a newly developed IETF
protocol for common name resolution. In CNRP, there is no concept of
hierarchy, but there is a referral scheme. Hence, it is possible to
build a distributed directory system in which servers refer to one
another.

The IDN query process would then look something like this:
  a. The user inputs an IDN into an application
  b. The application does a CNRP query to look up the unique name
  c. The application uses the unique name to do a DNS lookup

Advantages:
   - encapsulates the problem in the representation layer and at
     registration time
   - able to handle unification problems
   - no policy issues with "root" CNRP server

Disadvantages:
   - requires all applications to upgrade
   - additional CNRP lookup overhead and no assurance that the unique
     name can be located
   - requires access to CNRP servers to function, i.e. can't work
     offline

5.3. DNS as Directory

Domain Name System [DNS] is a widely established distributed lookup
directory. It has an existing hierarchical structure, and resource
records are distributed. In theory, the DNS is able to handle 8-bit
binary strings.

The IDN query process would then look something like this:
   a. The user inputs an IDN into an application
   b. The application does a DNS query for the name, which returns
      the unique name together with the Resource Records of the
      unique name

Advantages:
   - encapsulates the problem in the representation layer and at
     registration time
   - able to handle unification problems
   - existing "root" DNS server with existing hierarchy
   - does not require all applications to upgrade

Disadvantages:
   - unknown behavior of applications which cannot handle 8-bit
   - unknown behavior of servers/caching software which cannot handle
     8-bit

6. Solution

Given CDNC's operational experience that it is difficult to get
application developers to upgrade, difficult to get users to
download new applications, and so on, using DNS as a directory
would be the fastest way to deploy IDN for our users.

6.1. Zone file

Because there are multiple encodings, and multiple representations of
the same name even within one encoding, a single domain name
corresponds to multiple binary strings (e.g. ML1, ML2, ML3, ML4).

Hence, we would create Resource Records like the following within the
name server:

ML1             UNAME           ACE1
ML2             UNAME           ACE1
ML3             UNAME           ACE1
ML4             UNAME           ACE1

ACE1            IN      A       1.2.3.4
                IN      A       1.2.3.4

A "UNAME" Resource Record is shown here. In practice, a CNAME could be
used instead, except that CNAME is unable to handle MX (a name that
owns a CNAME record cannot also own other records such as MX).
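
To make the ML1, ML2, ... strings concrete, the sketch below encodes
one ideograph in several encodings and prints one record line per
resulting binary string. It is illustrative only: UNAME is the record
type proposed in this document and is not assumed to be understood by
any existing name server, the function names are hypothetical, and
non-ASCII octets are written with the \DDD decimal escape of the
master file format in [DNS].

```python
from typing import List


def escape_label(raw: bytes) -> str:
    """Write a binary label in master-file form, escaping octets as \\DDD."""
    out = []
    for octet in raw:
        ch = chr(octet)
        if octet < 128 and (ch.isalnum() or ch == "-"):
            out.append(ch)
        else:
            out.append("\\%03d" % octet)
    return "".join(out)


def alias_records(unique_name: str, representations: List[bytes],
                  rr_type: str = "UNAME") -> List[str]:
    """Return one record line per representation, all naming unique_name."""
    return [
        "%-16s%-16s%s" % (escape_label(rep), rr_type, unique_name)
        for rep in representations
    ]


# One ideograph (U+570B) in three encodings gives three binary strings.
ml = ["\u570b".encode(enc) for enc in ("big5", "gbk", "utf-8")]
for line in alias_records("ACE1", ml):
    print(line)
```

Passing rr_type="CNAME" produces the CNAME form mentioned above.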

6.2. The practical case of implementing UNAME with CNAME

Until a UNAME protocol is defined, the TWNIC IDN testbed implements
IDN unique names with CNAME. A registrant of a Traditional Chinese
domain name (TCDN) also gets the corresponding Simplified Chinese
domain name (SCDN). The conversion between Traditional and Simplified
Chinese is defined in [TSCONV].

The Resource Records look like this:

TCDN1           CNAME           EDN1
SCDN1           CNAME           EDN1
EDN1            IN      A       IP-of-EDN1

If EDN1 is not in the same domain as TCDN1 and SCDN1, the Resource
Record of EDN1 will be in a different zone file. The left side of the
CNAME Resource Records holds all of the equivalent names ML1, ML2,
...; here TCDN1 and SCDN1 are equivalent. The right side of the CNAME
Resource Records is an ACE-compatible unique name. EDN1 (English
Domain Name 1) is one kind of ACE-compatible unique name; it could be
substituted with any other kind, such as an xACE encoding or a random
number. Once the xACE is decided by the IETF IDN WG, the
implementation will adopt the standard. The unique name thus remains
compatible with the [IDNA] approach.

To obtain the unique name EDN1 rather than the destination IP-of-EDN1,
an intermediate server has to be set up. In the TWNIC testbed, a Web
DNS or a DNS proxy serves as the intermediate server. Any application
can pass TCDN1 or SCDN1 to the intermediate server. The intermediate
server asks the DNS for the corresponding right side, which is the
unique name, and passes the unique name EDN1 back to the application.
Resolution then proceeds through the current DNS infrastructure. Once
the UNAME protocol is defined, the intermediate server will no longer
be needed.
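
The role of the intermediate server can be illustrated with a small
simulation. The table below stands in for the DNS; it is not the TWNIC
Web DNS or DNS proxy, whose implementation is not described here. The
names under "example", the address and the function names are
hypothetical.

```python
# In-memory stand-in for the zone data: owner name -> (type, value).
ZONE = {
    "tcdn1.example": ("CNAME", "edn1.example"),
    "scdn1.example": ("CNAME", "edn1.example"),
    "edn1.example": ("A", "192.0.2.1"),
}


def unique_name_for(name: str, zone: dict = ZONE) -> str:
    """Intermediate server: return the right side of the CNAME record,
    without following it to the address."""
    record = zone.get(name)
    if record is None:
        raise LookupError("no record for %s" % name)
    rr_type, value = record
    return value if rr_type == "CNAME" else name


def address_for(unique_name: str, zone: dict = ZONE) -> str:
    """Ordinary resolution of the unique name to its address."""
    rr_type, value = zone[unique_name]
    if rr_type != "A":
        raise LookupError("%s has no address record" % unique_name)
    return value


# Both representations yield the same unique name, which the
# application then resolves through the existing DNS path.
name = unique_name_for("tcdn1.example")
print(name, address_for(name))
```

The point of the split is that the application receives the unique
name itself, which an ordinary address lookup on TCDN1 or SCDN1 would
hide behind the final IP address.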

The process can be represented as follows:

                                +------+
                                | User |
                                +------+
                                 |    ^
                   Request to AP |    | Response from AP
                   with MDN      |    |                   End system
    +----------------------------|----|----------------------------+
    |                            v                                 |
    |  +--------------------------------------------------------+  |
    |  |                  Application Client                    |  |
    |  +--------------------------------------------------------+  |
    |      |  ^ Nameprepped               |  ^         |  ^        |
    |  MDN |  | ACE compatible            |  |         |  |        |
    |      |  | unique name               |  | IP of   |  |        |
    |      v  |            Nameprepped    |  | unique  |  |        |
    |  +--------------+    ACE compatible |  | name    |  |        |
    |  | intermediate |    unique name    v  |         |  |        |
    |  |  server      |             +----------+       |  |        |
    |  +--------------+             | Resolver |       |  |        |
    |      |  ^ Nameprepped         +----------+       |  |        |
    |  MDN |  | ACE compatible        |  ^             |  |        |
    |      |  | unique name           |  |             |  |        |
    |  +--------------+               |  |     Request |  |Response|
    |  | Directory of |               |  |     for     |  |from    |
    |  | DNS          |               |  |     service |  |server  |
    |  +--------------+               |  |             |  |        |
    |                                 |  |             |  |        |
    +---------------------------------|--|-------------|--|--------+
                       Nameprepped ACE|  | IP of       |  |
                       compatible     |  | unique      |  |
                       unique name    v  | name        v  |
                            +-------------+  +---------------------+
                            | DNS servers |  | Application servers |
                            +-------------+  +---------------------+

6.3. Advantages

The strongest advantages of this solution are:
a. It does not require our users to download any special software
or upgrade their software, since it handles the native encoding of
the user directly.

b. It will work immediately for ccTLDs that wish to offer ML.ccTLD
services without any changes at the user client.

c. It remains compatible with the IDNA approach so long as we keep
the unique name equivalent to the NAMEPREPPED ACE.

d. It uses the existing DNS hierarchy.

6.4. Potential Loopholes

There are several potential loopholes in this solution that we need
to note:

a. Some "smart" localized browsers will send out the "wrong" binary
string because of differences between localized versions. For example,
the English version of Internet Explorer is not able to handle Chinese
double-byte legacy encodings properly. If double-byte encodings must
be used, an appropriate application environment is necessary.

b. While Chinese has only a handful (usually 2 to 3) of
representation forms for a single IDN, other languages may have much
more complicated representations for which this approach is not
suitable. For example, if case folding for Latin characters were done
using this solution, a string of 32 characters would require 2^32
entries in the DNS. This, however, could be solved by other means.

c. It might be possible to construct a binary string in some legacy
encoding that has the same binary representation as another domain
name (a.k.a. binary collision). Binary collisions within the same
zone can be avoided by the registration system and policy. If the
left sides of UNAME records (like ML1, ML2, ML3, ML4 or TCDN1, SCDN1)
are not in the same zone, no binary collision occurs between them. The
intermediate server would have the ability to decide which zone of
the DNS directory it should access.
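
Items b and c can both be illustrated with a short Python sketch. The
function names are hypothetical. The first part counts the case
spellings of a Latin label; the second shows one byte string that is
valid in both BIG5 and GBK yet decodes to different characters, which
is the kind of ambiguity behind a binary collision.

```python
from itertools import product


def case_variant_count(label: str) -> int:
    """Number of upper/lower-case spellings: 2 per cased character."""
    return 2 ** sum(1 for ch in label if ch.lower() != ch.upper())


def case_variants(label: str) -> list:
    """Enumerate the spellings; practical only for short labels."""
    choices = [sorted({ch.lower(), ch.upper()}) for ch in label]
    return ["".join(combo) for combo in product(*choices)]


def decodings(raw: bytes, encodings=("big5", "gbk")) -> dict:
    """Decode the same octets under each encoding that accepts them."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            pass
    return result


print(case_variants("ab"))            # the four spellings of a 2-letter label
print(case_variant_count("a" * 32))   # 2^32 entries for 32 letters

# The same two octets are a different character in each encoding.
seen = decodings(b"\xa4\xa4")
print(len(set(seen.values())) > 1)
```

A registration system could run a check like decodings() over the
encodings it accepts before admitting a new left-side name into a
zone.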

Acknowledgement

Author(s)

Li Ming Tseng, Prof
National Central University, TWNIC
Email: tsenglm@cc.ncu.edu.tw
Tel: +886-3-490-4421

Jan Ming Ho, Prof
Academia Sinica, TWNIC
Email: hoho@iis.sinica.edu.tw
Tel: +886-2-2788-3799 x 1803

Hua Lin Qian, Prof
Chinese Academy of Science, CNNIC
Email: hlqian@ns.cnc.ac.cn
Tel: +86-10-6256-9960

Kenny Huang
Asia Infra International Ltd, TWNIC
Email: huangk@alum.sinica.edu
Tel: +886-2-2658-6510

Editor: James SENG
i-DNS.net International
8 Temasek Boulevard
Suntec Tower Three #24-02
Singapore 038988
Email: jseng@i-dns.net
Tel: +65-2486-188

Editor: Erin Chen
Taiwan Network Information Center (TWNIC)
4F-2, No. 9, Sec. 2, Roosevelt Rd., Taipei, 100 Taiwan.
Email: erin@twnic.net.tw
Tel: +886-2-23411313#502

References

[IDNREQ]        Requirements of Internationalized Domain Names, Zita Wenzel,
                James Seng, draft-ietf-idn-requirements

[NAMEPREP]      Preparation of Internationalized Host Names, P. Hoffman,
                M. Blanchet, draft-ietf-idn-nameprep

[HAN]           Han Ideograph (CJK) for Internationalized Domain Names,
                J. Seng, Y. Yoneya, K. Huang, K. Kim, draft-ietf-idn-cjk

[LDAP]          Lightweight Directory Access Protocol (v3), M. Wahl,
                T. Howes, S. Kille, RFC2251

[CNRP]          Common Name Resolution Protocol, N. Popp, M. Mealing,
                M. Moseley, draft-ietf-cnrp

[DNS]           Domain Names - Implementation and Specification,
                P. Mockapetris, RFC1035

[CJKV]          CJKV Information Processing ISBN 1-56592-224-7

[UTR15]         Unicode Normalization Forms, Mark Davis and Martin Duerst,
                Unicode Technical Report 15.

[UTR21]         Case Mappings, Mark Davis, Unicode Technical Report 21.

[TSCONV]        Traditional and Simplified Chinese Conversion, XiaoDong LEE,
                HSU NAI-WEN, Erin Chen, GuoNian SUN, CNNIC, TWNIC, CDNC,
                draft-ietf-idn-tsconv

[IDNA]          Internationalizing Host Names In Applications (IDNA),
                Patrik Faltstrom, Paul Hoffman, draft-ietf-idn-idna