Internet Draft                                 Deuk-kul Jang
draft-dkjang-idn-01.txt                        So-myung Ind
August 8, 2000                                 Expires in six months


       Internationalized domain names divided by characters key


Status of this memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.


Abstract

This document describes a method for using internationalized
(multilingual) domain names under the current DNS. It also describes
how an internationalized domain name expressed in a native language is
converted into a traditional US-ASCII domain name that is compatible
with the current DNS.

This method uses a 'sequence of 36 characters' to convert all
characters in an IDN into US-ASCII.

Finally, the way to use this method with lc2LDs is presented.



Contents

1. Introduction
  1.1. Definitions and Conventions
  1.2. Summary
2. Multilingual key
3. Language key
4. Sequence of 36 characters
5. Composition of Character substitute
  5.1. In case the number of characters is below 36;
    5.1.1. When the number of characters is 9 or fewer;
    5.1.2. When the number of characters is from 10 to 26;

Expires 9th of Feb 2001                                  [Page  1]

Internet Draft       IDN divided by characters key       August 8, 2000


    5.1.3. When the number of characters is from 27 to 35;
  5.2. In case the number of characters is 36 or more;
    5.2.1. When the number of characters is from 36 to 1260;
    5.2.2. When the number of characters is from 1261 to 1296;
    5.2.3. When the number of characters is from 1297 to 45,360;
    5.2.4. When the number of characters is from 45,361 to 47,952;
  5.3. Characters in the plane besides BMP
6. TLD (Top level domain)
7. Conversion and display
  7.1. Converting IDN into the traditional name
  7.2. Display of IDN
8. Foreign language
9. Creating 'lc2LD's for IDN
10. References
11. Patent information
12. Author's address



1. Introduction

Under the current DNS (Domain Name System), hosts are identified by IP
addresses (combinations of numbers) and by domain names. The purpose of
a domain name is to provide a name that is more familiar and memorable
than the IP address. However, because domain names are restricted to
US-ASCII characters, people who do not speak English have to use
unfamiliar English domain names. For them, such a name may be little
different from an IP address.

As a result, it is difficult to find the home pages of even famous
companies without knowing their English domain names in advance. The
top level domains are designated in English for international
recognition. The second level domains under ccTLDs also use abbreviated
English words that almost seem to be secret codes (for example, ac, co,
go, or). The 2LD for Seoul, Korea, has to be written as '.seoul.kr'
instead of in Korean.

Furthermore, internationalized domain names are indispensable for
controlling computers and using the Internet by voice command in the
future.

However, to be truly international, an IDN must also be expressible in
English (US-ASCII) characters for foreigners who do not know the
language used.


  1.1. Definitions and Conventions

  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  document are to be interpreted as described in [RFC2119].

  An 'internationalized domain name' (IDN) means a domain name expressed
  in the native language (non-ASCII) in the user's interface.

  A 'traditional domain name' means a domain name compatible with the
  current DNS.

  A 'converted domain name' is a traditional domain name that has been
  converted from an IDN in order to be compatible with the current DNS.

  A 'character substitute' is a string of ASCII characters that
  replaces a native character in an IDN when the IDN is converted into
  a traditional name.

  A 'multilingual key' is a group of characters located at a specific
  position in the converted domain name; it indicates that the name
  was converted from an IDN.

  A 'language key' indicates the original language from which a
  converted domain name has been converted.

  "DNS" ; Domain Name System

  "IDN" ; Internationalized Domain Name (written in native language)

  "gTLD" ; generic Top Level Domain

  "lc2LD" ; language code 2nd Level Domain

  "BMP" ; Basic Multilingual Plane


  1.2. Summary

  According to [RFC1034], domain names must start with a letter (a
  through z), end with a letter or digit (0 through 9), and have only
  letters, digits, and hyphen as interior characters in the current DNS.

  Therefore native characters in an IDN must be converted into US-ASCII
  characters in order to be compatible with the current DNS.

  Hence, 'character substitutes' are made to express native characters
  (in IDN) with US-ASCII characters. To distinguish converted domain
  names from traditional ones, a 'multilingual key' is defined and
  added to every converted domain name. And to represent the original
  language from which the name has been converted, the 'language key'
  is assigned for every language.

  In this system, when an IDN is converted to the traditional domain
  name;

  1) The 'multilingual key' is included automatically in every
     converted domain name.
  2) The 'language key' is included automatically in the converted
     domain name.
  3) All characters are replaced by 'character substitutes'.

  The conversion from an IDN to a traditional name, and the reverse
  conversion from a converted name to the original IDN, are performed
  by a conversion program installed on the user's computer. The
  program runs when the user enters a domain name containing characters
  of his/her own language and begins to use an Internet service. For
  display on the monitor, the program runs when it finds the
  multilingual key in a converted domain name. Users can therefore use
  all Internet services in their own language.



2. Multilingual key

A 'multilingual key' is made by placing specific ASCII characters at a
specific location in the domain name. When a domain name includes these
characters at that location, the multilingual key indicates that the
domain name was converted from an IDN.

When a user enters an IDN that includes characters of his/her native
language in order to connect, the system first adds the multilingual
key and a language key, and then converts the name into a traditional
domain name.

For display, the system checks whether the name contains the
multilingual key. If so, the system converts that domain name back to
an IDN according to the language key.

Note. To avoid confusion, the multilingual key should consist of
characters that are not commonly used. Further, it is recommended that
domain names containing the same character(s) in the same location not
be registered as traditional domain names.

In section 9. below, this multilingual key is replaced by a new gTLD.
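
The check for the multilingual key can be sketched as follows. This is
only an illustration: the key value 'z-' and its position at the start
of the name part are the assumptions used in the examples of section 5,
not values fixed by this document, and the function name is
hypothetical. The name is assumed to be given as 'name-part.tld'.

```python
# Assumed value: the examples in section 5 use 'z-' at the start of the
# name part; this document does not mandate a particular key.
MULTILINGUAL_KEY = "z-"


def has_multilingual_key(domain: str) -> bool:
    """Return True if the name part starts with the multilingual key."""
    name_part = domain.lower().split(".", 1)[0]
    return name_part.startswith(MULTILINGUAL_KEY)
```

A name such as 'z-degr0un.com' is then treated as converted from an
IDN, while 'example.com' is treated as a traditional domain name.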



3. Language key

One or two US-ASCII characters are assigned as a 'language key' for
every language currently defined (and being added) by [ISO10646].
This language key follows the 'multilingual key' in the converted
domain name and indicates the language from which the name has been
converted.


Because domains are separated according to language, multilingual
characters can be expressed in the minimum number of ASCII characters.

If the language key is not for the language that the user has selected
as his/her main or subsidiary language, the system shall not convert
the name, and displays it on the monitor in US-ASCII characters.

If two ASCII characters are given as the language key for every
language (like 'ko' for Korean, 'ja' for Japanese), 1,296 (=36x36)
languages can be managed theoretically.

In section 9. below, this language key is replaced by lc2LD.
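
As a rough illustration of the language key, the sketch below counts
the two-character keys and separates the key from the rest of a name
part. It assumes a multilingual key of 'z-' and a key of exactly two
characters, as in the examples of section 5; the function names are
hypothetical.

```python
import string

# The 36 characters available for a key: digits 0-9 and letters a-z.
KEY_CHARS = string.digits + string.ascii_lowercase


def two_char_key_count() -> int:
    """Number of distinct two-character language keys (36 x 36)."""
    return len(KEY_CHARS) ** 2


def split_language_key(name_part: str, multilingual_key: str = "z-"):
    """Split 'z-degr0un' into ('de', 'gr0un').

    Assumes a two-character language key directly after the
    multilingual key; returns None when the multilingual key is absent
    or nothing follows it.
    """
    if not name_part.startswith(multilingual_key):
        return None
    rest = name_part[len(multilingual_key):]
    if len(rest) < 2:
        return None
    return rest[:2], rest[2:]
```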



4. Sequence of 36 characters

In order to substitute all characters used in IDNs with the smallest
number of US-ASCII characters when the IDNs are converted into
traditional names, the 'sequence of 36 characters' is made with letters
(a-z) and digits (0-9). These 36 characters may be used in the middle
or at the end of a domain name.
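
One way to read the 'sequence of 36 characters' is as base-36 notation
with the digits ordered before the letters, which is the order implied
by ranges such as 00-0z, 10-zz and 100-zzz used later in this document.
The helper names below are hypothetical, and this is a sketch under
that assumption rather than a required algorithm.

```python
import string

# Digits first, then letters, matching ranges such as 00-09, 0a-0z,
# 10-zz and 100-zzz.
SEQUENCE = string.digits + string.ascii_lowercase


def to_sequence(value: int, width: int) -> str:
    """Write a non-negative integer as `width` characters of SEQUENCE."""
    if value < 0 or value >= 36 ** width:
        raise ValueError("value does not fit in the requested width")
    chars = []
    for _ in range(width):
        value, remainder = divmod(value, 36)
        chars.append(SEQUENCE[remainder])
    return "".join(reversed(chars))


def from_sequence(text: str) -> int:
    """Inverse of to_sequence; int() accepts base 36 with 0-9 and a-z."""
    return int(text, 36)
```

With this reading, '10' is the value 36, 'zz' is 1,295 and '100' is
1,296.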



5. Composition of Character substitute

A character substitute is composed as follows:
a. Separate the characters currently defined (and being added) by
   [ISO10646] according to language.
b. For each language, arrange all native characters, as well as the
   alphabets and digits, in the 'sequence of 36 characters'.
c. Set the position assigned in the 'sequence of 36 characters' as the
   'character substitute' for that character.


According to the total number of characters of a language, assign all
characters to the 'sequence of 36 characters' as follows:


  5.1. In case the number of characters is below 36;


    5.1.1. When the number of characters is 9 or fewer;

      5.1.1.1. Assign multilingual characters to 1-9, and digits (0-9)
      to 00-09 of the 'sequence of 36 characters'. The alphabets
      (US-ASCII, a-z) and hyphen are used as they are.

      5.1.1.2. As an alternative to 5.1.1.1, this arrangement may be
               used:
      Assign multilingual characters to 01-09 of the 'sequence of 36
      characters' and the digit '0 (zero)' to '00'. Digits (1-9), the
      alphabets (US-ASCII) and hyphen are used as they are.


      (Example)

      German has 4 characters that are distinguished from the English
      alphabets (a-z), ignoring case. Because those 4 can be managed as
      multilingual characters, German corresponds to case 5.1.1, and
      the 4 characters could be assigned to 01-04 of the 'sequence of
      36 characters'.

      In this case, however, it seems better to assign them to '0a',
      '0o', '0u' and '0b' (as in section 5.1.2.2) instead of 01-04,
      and to assign the digit '0 (zero)' to '00'. The rest of the
      digits (1-9), the alphabets (a-z) and hyphen are used as they
      are.


      The German domain name under the gTLD '.com', given in
      hexadecimal as "0x0067/0x0072/0x00fc/0x006e/.com", is converted
      as follows.


        <Assumption>

        The multilingual key is 'z-' at the start of the name part,
        and the language key for German is 'de'.

        <Character substitute>

        German character (u-umlaut)
              0x00fc ===> 0u
        3 alphabets are used as they are.
              0x0067 ===> g
              0x0072 ===> r
              0x006e ===> n

        <Conversion>

        The IDN in German in the user interface,
        "0x0067/0x0072/0x00fc/0x006e/.com", is converted to the
        traditional name as follows.

        (1) The multilingual key 'z-' is added to the name part first.
        (2) The language key 'de' follows the multilingual key.
        (3) The character substitutes and '.com' follow them.

        <Final result>

        converted domain name : z-degr0un.com
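
The German example can be reproduced with the short sketch below. Only
the substitute '0u' for u-umlaut is stated in the example; pairing the
other three German characters with '0a', '0o' and '0b' is an assumption
made here for illustration, as are the function and table names.

```python
# Substitutes for the German example. Only u-umlaut -> '0u' is stated
# in the text; the other three pairings are assumptions of this sketch.
GERMAN_SUBSTITUTES = {
    "\u00e4": "0a",  # a-umlaut (assumed)
    "\u00f6": "0o",  # o-umlaut (assumed)
    "\u00fc": "0u",  # u-umlaut (from the example)
    "\u00df": "0b",  # sharp s (assumed)
    "0": "00",       # digit zero
}


def convert_german(idn: str, multilingual_key: str = "z-",
                   language_key: str = "de") -> str:
    """Convert 'name.tld' into the converted domain name."""
    name, _, tld = idn.lower().partition(".")
    # Letters a-z, digits 1-9 and hyphen are used as they are.
    body = "".join(GERMAN_SUBSTITUTES.get(ch, ch) for ch in name)
    return f"{multilingual_key}{language_key}{body}.{tld}"


print(convert_german("gr\u00fcn.com"))  # z-degr0un.com
```

Because every two-character substitute starts with '0' and a plain
digit zero is always written '00', a reader of the converted name can
tell substitutes from unchanged characters without a separator.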



    5.1.2. When the number of characters is from 10 to 26;

      5.1.2.1. Assign multilingual characters to a-z, the alphabets
      (US-ASCII) to 0a-0z, and the digit '0' to 00. The rest of the
      digits (1-9) and hyphen are used as they are.

      5.1.2.2. As an alternative to 5.1.2.1, this arrangement may be
               used;
      Assign multilingual characters to 0a-0z and the digit '0' to 00.
      Digits 1-9, the alphabets (US-ASCII) and hyphen are used as they
      are.

    5.1.3. When the number of characters is from 27 to 35;

      5.1.3.1. Assign multilingual characters to 1-z in order, and
      digits (0-9) and alphabets (a-z) to 00-0z. Hyphen is used as it
      is.

      5.1.3.2. As an alternative to 5.1.3.1, this arrangement may be
               used;
      Assign multilingual characters to 01-0z and the digit '0' to 00.
      Digits 1-9, alphabets (a-z) and hyphen are used as they are.


  5.2. In case the number of characters is 36 or more;


    5.2.1. When the number of characters is from 36 to 1260;

      Assign multilingual characters to 10-zz (35x36=1260) of the
      'sequence of 36 characters' in order.

      Assign digits (0-9) and alphabets (a-z) to 00-0z of the 'sequence
      of 36 characters'. Hyphen is used as it is.

    5.2.2. When the number of characters is from 1261 to 1296;

      By appending the strings that use a hyphen, from '-0' to '-z',
      after the end of the sequence ('zz'), the representation range of
      two ASCII characters is extended to 1296 multilingual characters.
      In this case hyphen is assigned to '0-'.

    5.2.3. When the number of characters is from 1297 to 45,360;

      Assign digits (0-9) and alphabets (a-z) to 00-0z of the 'sequence
      of 36 characters'. Hyphen is used as it is.

      Assign multilingual characters to the three-character strings of
      the 'sequence of 36 characters', 100-zzz (35x36x36=45,360), in
      order.


      (Example)

      In Korea, Korean characters are used together with Chinese
      characters in writing. In other words, for Korean, the
      multilingual characters are composed of Korean characters
      (11,172) and Chinese characters (about 21,000).

      A language of around 32,000 characters corresponds to this
      section 5.2.3.

      So digits (0-9) and alphabets (a-z) are assigned to 00-09 and
      0a-0z of the 'sequence of 36 characters'. The 11,172 Korean
      characters at positions 0xac00-0xd7a3 (hexadecimal) in
      [ISO10646] are arranged in 100-9mb of the 'sequence of 36
      characters' in order, and the Chinese characters follow them.

      A multilingual domain name composed of 4 Korean characters,
      2 alphabets (ks), a hyphen (-) and 2 digits (23) under the gTLD
      '.com', "0xb300/0xd55c/0xbbfc/0xad6d/ks-23.0xd68c/0xc0ac", is
      converted as follows.

          "0xb300/0xd55c/0xbbfc/0xad6d" means Korea.
          And "0xd68c/0xc0ac" means company.

      <Assumption>

      The multilingual key is 'z-' at the start of the name part,
      and the language key for Korean is 'ko'.

      <Character substitute>

      digits
             2 ===> 02    3 ===> 03

      alphabets and hyphen
             k ===> 0k    s ===> 0s    hyphen - ===> -

      Korean

        0xb300 (the 1,793rd from 0xac00) ===>
                  2ds (the 1,793rd from '100' of the sequence)
        0xd55c (the 10,589th)  ===> 964
        0xbbfc (the 4,093rd)   ===> 45o
        0xad6d (the 366th)     ===> 1a5

      direct translation (see section 6.)
        0xd68c/0xc0ac ===> .com

      <Conversion>

      The IDN in the user interface,
      "0xb300/0xd55c/0xbbfc/0xad6d/ks-23.0xd68c/0xc0ac",
      is converted to the traditional name as follows.

      (1) The multilingual key 'z-' is added to the name part first.
      (2) The language key 'ko' follows the multilingual key.
      (3) The character substitutes and '.com' follow them.

      <Final result>

      converted domain name : z-ko2ds96445o1a50k0s-0203.com
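
The arithmetic of the Korean example can be checked with the sketch
below. It assumes the keys 'z-' and 'ko' from the <Assumption> above,
reads the 'sequence of 36 characters' as base 36 with digits before
letters, and takes the four Korean characters as 0xb300, 0xd55c, 0xbbfc
and 0xad6d (the 1,793rd, 10,589th, 4,093rd and 366th code points
counted from 0xac00). Computed this way, 0xbbfc (offset 4,092) comes
out as '45o'. Chinese characters are not handled, and all names are
hypothetical.

```python
import string

SEQUENCE = string.digits + string.ascii_lowercase
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3
HANGUL_BASE = 36 * 36  # numeric value of '100' in the sequence


def to_sequence(value: int, width: int) -> str:
    """Write an integer as `width` characters of SEQUENCE (base 36)."""
    chars = []
    for _ in range(width):
        value, remainder = divmod(value, 36)
        chars.append(SEQUENCE[remainder])
    return "".join(reversed(chars))


def substitute(ch: str) -> str:
    """Character substitute for one character under section 5.2.3."""
    if ch == "-":
        return ch                       # hyphen is used as it is
    if ch in SEQUENCE:
        return "0" + ch                 # 00-09 and 0a-0z
    code = ord(ch)
    if HANGUL_FIRST <= code <= HANGUL_LAST:
        return to_sequence(HANGUL_BASE + code - HANGUL_FIRST, 3)
    raise ValueError(f"no substitute defined in this sketch for {ch!r}")


def convert_korean(name: str) -> str:
    """Add the assumed keys 'z-' and 'ko', then substitute each character."""
    return "z-ko" + "".join(substitute(ch) for ch in name.lower())


name = "\ub300\ud55c\ubbfc\uad6d" + "ks-23"
print(convert_korean(name) + ".com")  # z-ko2ds96445o1a50k0s-0203.com
```

The last Korean character, 0xd7a3, maps to '9mb', which agrees with the
range 100-9mb given in the example.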


    5.2.4. When the number of characters is from 45,361 to 47,952;

      As stated above in 5.2.2, three ASCII characters using a hyphen
      can extend the representation range to 47,952 (36x37x36)
      multilingual characters. In this case hyphen is assigned to '0-'.


  5.3. Characters in the plane besides BMP

  As with characters in the BMP, characters in the other planes (in
  canonical form) are divided according to language and arranged
  independently in the 'sequence of 36 characters' together with the
  alphabets and digits. Therefore, it makes no difference in which
  plane those characters are located.


  Table. Number of US-ASCII characters needed to represent one
         character, by the number of multilingual characters in the
         language (alt = alternative arrangement)
-----------------------------------------------------------------------
 Kind of             Number of multilingual characters
 character   0-9 (alt)  10-26 (alt)  27-35 (alt)  36-1296  1297-47952
-----------------------------------------------------------------------
 Native        1 (2)       1 (2)        1 (2)        2         3

 Alphabets     1 (1)       2 (1)        2 (1)        2         2

 Digits        2 (1)       1 (1)        2 (1)        2         2
-----------------------------------------------------------------------
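
The 'Native' row of the table (primary arrangement) follows directly
from the capacities given in section 5: 35 one-character substitutes,
36x36 = 1,296 two-character substitutes including the hyphen extension,
and 36x37x36 = 47,952 three-character substitutes including the hyphen
extension. The function below is a hypothetical sketch of that lookup.

```python
def native_substitute_width(native_count: int) -> int:
    """ASCII characters per native character, primary arrangement."""
    if native_count < 1:
        raise ValueError("a language needs at least one native character")
    if native_count <= 35:              # sections 5.1.1 to 5.1.3
        return 1
    if native_count <= 36 * 36:         # sections 5.2.1 and 5.2.2
        return 2
    if native_count <= 36 * 37 * 36:    # sections 5.2.3 and 5.2.4
        return 3
    raise ValueError("more characters than this scheme describes")
```

For example, German (4 characters) needs one ASCII character per native
character in the primary arrangement, and Korean (around 32,000
characters) needs three.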



6. TLD (Top level domain)

The top level domains are limited in number (.com, .org, .net, etc.).
Thus, instead of being given character substitutes, they are directly
translated.
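
Direct translation can be pictured as a small lookup table per
language. In the sketch below only one entry, the Korean word for
"company" (0xd68c 0xc0ac) translated to 'com', is taken from the
example in section 5.2.3; the table layout and function name are
assumptions.

```python
# Hypothetical table: only the Korean word for "company"
# (0xd68c 0xc0ac) -> 'com' is taken from the example in section 5.2.3.
TLD_TRANSLATIONS = {
    "ko": {"\ud68c\uc0ac": "com"},
}


def translate_tld(tld: str, language_key: str) -> str:
    """Return the US-ASCII TLD for a TLD typed in the native language."""
    if tld.isascii():
        return tld.lower()              # already a traditional TLD
    try:
        return TLD_TRANSLATIONS[language_key][tld]
    except KeyError:
        raise ValueError(f"no translation known for {tld!r}") from None
```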



7. Conversion and display


  7.1. Converting IDN into the traditional name

  When a user enters an IDN into an application to use an Internet
  service, the conversion program is triggered by the multilingual
  character(s) included in the name. The program then converts the IDN
  into the traditional name by adding the 'multilingual key' and the
  'language key' mentioned above and by replacing each character with
  its 'character substitute', and hands the converted domain name over
  to the application handling the Internet service.


  7.2. Display of IDN

  When a domain name includes the multilingual key, and the language
  key in that name matches the language selected as the main or
  subsidiary language, the program performs the reverse of 7.1: it
  deletes the multilingual key and language key and replaces the rest
  of the ASCII characters with native characters. The IDN is then
  displayed on the user's monitor in the native language.

  But if the domain name does not contain a multilingual key, or the
  language key does not match the language selected, the domain name
  is displayed on the monitor as it is, in US-ASCII, without any
  conversion. In other words, traditional US-ASCII names and foreign
  IDNs are displayed in English (US-ASCII), and the IDNs that belong to
  the user's native language are displayed in the user's language.
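
The display rule can be sketched for Korean as follows. The sketch
assumes the keys 'z-' and 'ko', the layout of section 5.2.3 (a leading
'0' marks a two-character digit or letter, any other leading character
starts a three-character Korean substitute), and base-36 ordering with
digits before letters. It leaves the TLD untranslated and does not
handle Chinese characters; all names are hypothetical.

```python
import string

SEQUENCE = string.digits + string.ascii_lowercase
HANGUL_FIRST = 0xAC00      # first Korean character in [ISO10646]
HANGUL_COUNT = 11172
HANGUL_BASE = 36 * 36      # numeric value of '100' in the sequence


def restore_korean(body: str) -> str:
    """Replace character substitutes (section 5.2.3 layout) by characters."""
    out, i = [], 0
    while i < len(body):
        if body[i] == "-":                  # hyphen stands for itself
            out.append("-")
            i += 1
        elif body[i] == "0":                # '0' + one digit or letter
            if i + 1 >= len(body) or body[i + 1] not in SEQUENCE:
                raise ValueError("truncated or invalid substitute")
            out.append(body[i + 1])
            i += 2
        else:                               # three characters, 100-9mb
            chunk = body[i:i + 3]
            if len(chunk) < 3:
                raise ValueError("truncated substitute")
            offset = int(chunk, 36) - HANGUL_BASE
            if not 0 <= offset < HANGUL_COUNT:
                raise ValueError("outside the range handled by this sketch")
            out.append(chr(HANGUL_FIRST + offset))
            i += 3
    return "".join(out)


# Languages this sketch can restore; a real program would hold one entry
# per language the user selected as main or subsidiary.
RESTORERS = {"ko": restore_korean}


def display_name(domain: str, user_languages=("ko",)) -> str:
    """Return the text to display for a domain name (section 7.2)."""
    name, dot, rest = domain.lower().partition(".")
    language_key = name[2:4]
    if (not name.startswith("z-") or language_key not in user_languages
            or language_key not in RESTORERS):
        return domain                       # shown unchanged, in US-ASCII
    return RESTORERS[language_key](name[4:]) + dot + rest
```

A name without the multilingual key, or with a language key the user
has not selected, is returned unchanged, which corresponds to the
second paragraph of this section.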



8. Foreign language

When a user accesses an IDN in a different language zone (e.g., a
Japanese user tries to access a Korean domain) and does not have a
text editor for that language, he/she types the domain name as it is,
in US-ASCII.


9. Creating 'lc2LD's for IDN

1) Creating lc2LDs under current gTLDs

The two keys, that is, the multilingual key and the language key, can
be replaced by one lc2LD (language code 2nd Level Domain), like
'.ko.com' for Korean under '.com' or '.ja.net' for Japanese under
'.net'.

Then, the examples above will be encoded to;

'0x0067/0x0072/0x00fc/0x006e/.com' in section 5.1.1.2.
     ===>'gr0un.de.com' instead of 'z-degr0un.com'

'0xb300/0xd55c/0xbbfc/0xad6d/.0xd68c/0xc0ac/' in section 5.2.3.
     ===>'2ds96445o1a5.ko.com' instead of 'z-ko2ds96445o1a5.com'.

2) When it is impossible to find suitable strings of characters for
every language under current gTLDs, and if new gTLDs only for IDN, such
as '.icom' (or '.ico') and '.inet' (or '.ine'), can be created, then
the new gTLDs will replace the 'multilingual key'.

In this case the examples above will be encoded to;

'0x0067/0x0072/0x00fc/0x006e/.com' in section 5.1.1.2.
     ===>'degr0un.icom' instead of 'z-degr0un.com'

'0xb300/0xd55c/0xbbfc/0xad6d/.0xd68c/0xc0ac/' in section 5.2.3.
     ===>'ko2ds96445o1a5.icom' instead of 'z-ko2ds96445o1a5.com'.

3) lc2LD under new gTLD

The lc2LDs under new gTLDs can replace the language key. Then the
examples will be encoded respectively to 'gr0un.de.icom'
and '2ds96445o1a5.ko.icom'.
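
The first two rearrangements in this section amount to moving the keys
out of the name part. The sketch below assumes a multilingual key of
'z-', a two-character language key, and a hypothetical mapping from
current gTLDs to the suggested new ones; none of these are fixed by
this document.

```python
def to_lc2ld(converted: str, multilingual_key: str = "z-") -> str:
    """Rewrite 'z-degr0un.com' as 'gr0un.de.com' (case 1).

    Assumes a two-character language key right after the multilingual
    key.
    """
    name, _, tld = converted.partition(".")
    if not name.startswith(multilingual_key):
        raise ValueError("not a converted domain name")
    rest = name[len(multilingual_key):]
    language_key, body = rest[:2], rest[2:]
    return f"{body}.{language_key}.{tld}"


def to_new_gtld(converted: str, new_tlds=None,
                multilingual_key: str = "z-") -> str:
    """Rewrite 'z-degr0un.com' as 'degr0un.icom' (case 2)."""
    # Hypothetical mapping; '.icom' and '.inet' are only suggestions.
    new_tlds = new_tlds or {"com": "icom", "net": "inet"}
    name, _, tld = converted.partition(".")
    if not name.startswith(multilingual_key):
        raise ValueError("not a converted domain name")
    return f"{name[len(multilingual_key):]}.{new_tlds[tld]}"
```

Case 3 combines the two: applying to_lc2ld first and then replacing the
gTLD gives names of the form 'gr0un.de.icom'.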



10. References

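[RFC1034] Mockapetris, P., "Domain Names - Concepts and Facilities",
          STD 13, RFC 1034, November 1987.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
          Requirement Levels", BCP 14, RFC 2119, March 1997.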



11. Patent information

A patent application covering most of this method has been filed in
Korea.

Application Date: February 12, 2000
Application No.: 10-2000-0006723
Applicant: Deuk-kul Jang (4-1995-085521-2)



12. Author's address

Deuk-kul Jang
So-myung Ind.
Postal address: Kyunggido namyangjushi jingunmyun songnungri 178-6
                Republic of Korea
Telephone number; 502-3030-308, 17-266-3030
Fax. Number ; 31-573-6849

E-mail ; dkjang@smind.co.kr




