Internet Draft                                        Yoshiro Yoneya
draft-ietf-idn-jpchar-00.txt                      Yasuhiro Morishita
November 17, 2000                                              JPNIC
Expires May 17, 2001

        Japanese characters in multilingual domain name label

This document describes the Japanese characters that may be used in
multilingual domain name labels and the rules for canonicalizing them.
It is based on discussions and examinations within JPNIC.

Although the IDN WG has reached rough consensus that the character set
for multilingual domain names is UCS [UCS], the most popular Japanese
character set in Japan is Japanese Industrial Standard X 0208,
hereafter abbreviated as "JIS" [JISX0208].  Many PCs and most PDAs in
Japan, including mobile phones, can display only JIS and ASCII
characters.  Therefore, it is strongly recommended that the Japanese
characters used in multilingual domain names be limited to characters
that JIS and ASCII have in common with UCS.

Furthermore, for historical reasons, JIS has many compatibility code
points for Kana and alphanumerics.  Such compatibility code points are
still widely used, so these characters SHOULD be accepted, especially
in user interfaces, and MUST be canonicalized before transmission on
the wire.  The former requirement should be implemented for
localization, and the latter must be implemented for
internationalization.

1. Japanese characters in multilingual domain name labels

In principle, a domain name is a symbolic name for a resource on the
Internet that is easy for Internet users to understand and remember.
Internationalization or multilingualization of domain names MUST obey
this principle.  That is, characters in multilingual domain name labels
SHOULD be unambiguous.

JIS contains many characters, including graphic and compatibility
characters.  For domain names, however, the characters that are
significant for representing names are Kanji, Hiragana and Katakana
[CJK].  Therefore, following this principle, Japanese characters in
multilingual domain names MUST be Kanji, Hiragana or Katakana
characters defined in JIS.
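
As a rough illustration of this rule, the Python sketch below tests
whether a single character is a Kanji, Hiragana or Katakana character
that is also present in JIS X 0208.  It is an approximation under
stated assumptions, not the normative list (that is "idntabjp10"):
JIS X 0208 membership is inferred from Python's "euc_jp" codec, and
the script test relies on Unicode character-name prefixes.  Because of
the latter, it may disagree with the table on marks such as KATAKANA
MIDDLE DOT (U+30FB), which it admits, and IDEOGRAPHIC ITERATION MARK
(U+3005), which it rejects.  The helper names are hypothetical.

```python
import unicodedata

# Assumption: Unicode name prefixes stand in for "Kanji, Hiragana, Katakana".
_SCRIPT_PREFIXES = ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA")


def in_jis_x0208(ch: str) -> bool:
    """Return True if Python's euc_jp codec maps ch to a JIS X 0208 code."""
    try:
        encoded = ch.encode("euc_jp")
    except UnicodeEncodeError:
        return False
    # EUC-JP stores JIS X 0208 as two bytes in 0xA1..0xFE; HALF-WIDTH kana
    # (0x8E prefix) and JIS X 0212 (0x8F prefix) fail this test.
    return len(encoded) == 2 and all(0xA1 <= b <= 0xFE for b in encoded)


def is_japanese_label_char(ch: str) -> bool:
    """Kanji, Hiragana or Katakana (by name) that is also in JIS X 0208."""
    return in_jis_x0208(ch) and unicodedata.name(ch, "").startswith(_SCRIPT_PREFIXES)
```

For example, "漢", "あ", "ア" and PROLONGED SOUND MARK (U+30FC) pass,
while HALF-WIDTH "ｱ", FULL-WIDTH "Ａ" and ASCII "a" do not.  FULL-WIDTH
"Ａ" is in JIS X 0208, so it is the name test that rejects it.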

The file "idntabjp10.txt" defines the Japanese characters that can be
used in multilingual domain name labels.  It uses the format of
[VERSION], with the corresponding JIS code point added as a third
field.  Some of these characters, such as PROLONGED SOUND MARK
(U+30FC), are classified as graphic characters in JIS, but they are
used as part of Kanji, Hiragana or Katakana text.  All characters in
this file are in canonical form.
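
The exact layout of [VERSION] is not reproduced here, so the following
Python sketch assumes a hypothetical line format: whitespace-separated
fields, the first a hexadecimal UCS code point (optionally prefixed
"U+"), the third the JIS code point in hexadecimal, with "#" starting
a comment.  The second field is ignored.  The sample entries are
illustrative only and are not taken from "idntabjp10.txt".

```python
# Hypothetical line format (the real [VERSION] layout may differ):
#   <UCS code point, hex, optional "U+"> <field 2, ignored> <JIS code, hex>
SAMPLE_LINES = [
    "# hypothetical excerpt, not the real idntabjp10.txt",
    "U+3042  -  2422   # HIRAGANA LETTER A",
    "U+30FC  -  213C   # KATAKANA-HIRAGANA PROLONGED SOUND MARK",
    "U+65E5  -  467C   # CJK UNIFIED IDEOGRAPH-65E5",
]


def load_table(lines):
    """Build a {character: JIS code point} dict from table lines."""
    table = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"malformed line: {raw!r}")
        ucs = fields[0]
        if ucs[:2].upper() == "U+":
            ucs = ucs[2:]
        table[chr(int(ucs, 16))] = int(fields[2], 16)
    return table


def jis_codes(label, table):
    """Return each character's JIS code; KeyError means 'not permitted'."""
    return [table[ch] for ch in label]
```

A label whose characters are all in the table yields their JIS codes;
any other character raises KeyError, which a caller could report as a
character not permitted in the label.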

2. Canonicalization rules of Japanese characters in multilingual
   domain name labels

This section describes the canonicalization rules in two parts.  The
first covers "localization", and the second covers
"internationalization".  In other words, the first applies at the
input/display level, and the second at the API level [IDNA].

2.1 Localization: Characters to be canonicalized before NAMEPREP

As mentioned above, JIS has many compatibility characters that are
regarded as alphanumerics or Katakana.  The former are the so-called
FULL-WIDTH alphanumerics, and the latter are the so-called HALF-WIDTH
kana.  These characters are prohibited in [NAMEPREP], but are still
widely used on many PCs and most PDAs in Japan.  Hence, application
software that handles Japanese characters in multilingual domain name
labels SHOULD accept these compatibility characters as input and
canonicalize them before applying [NAMEPREP].

The file "idntabjpcanon10.txt" defines the compatibility characters,
with the corresponding canonicalized character code added as a third
field; that is, it is a mapping table from FULL-WIDTH alphanumerics to
ASCII and from HALF-WIDTH kana to (full-width) Katakana.
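
The contents of "idntabjpcanon10.txt" are not reproduced here, so the
following Python sketch only approximates the kind of mapping it
describes.  It assumes that FULL-WIDTH alphanumerics (U+FF10..U+FF19,
U+FF21..U+FF3A, U+FF41..U+FF5A) map to ASCII by subtracting 0xFEE0,
and that HALF-WIDTH kana (U+FF66..U+FF9F) map to the Unicode NFKC form
of the single character.  The combining sound marks that NFKC yields
are replaced with the spacing marks U+309B and U+309C; that
substitution is an assumption about the real table, chosen so that the
output stays within JIS.  The function name is hypothetical.

```python
import unicodedata

# Assumption: combining marks produced by NFKC are swapped for the
# spacing forms that exist in JIS.
_COMBINING_TO_SPACING = {"\u3099": "\u309b", "\u309a": "\u309c"}


def _is_fullwidth_alnum(cp: int) -> bool:
    return (0xFF10 <= cp <= 0xFF19      # FULLWIDTH DIGIT ZERO..NINE
            or 0xFF21 <= cp <= 0xFF3A   # FULLWIDTH LATIN CAPITAL LETTER A..Z
            or 0xFF41 <= cp <= 0xFF5A)  # FULLWIDTH LATIN SMALL LETTER A..Z


def canonicalize_compat(label: str) -> str:
    """Map FULL-WIDTH alphanumerics to ASCII and HALF-WIDTH kana to Katakana."""
    out = []
    for ch in label:
        cp = ord(ch)
        if _is_fullwidth_alnum(cp):
            # FULLWIDTH forms sit at a fixed offset of 0xFEE0 above ASCII.
            out.append(chr(cp - 0xFEE0))
        elif 0xFF66 <= cp <= 0xFF9F:
            # HALFWIDTH KATAKANA LETTER WO .. SEMI-VOICED SOUND MARK:
            # use the compatibility (NFKC) form of this one character.
            mapped = unicodedata.normalize("NFKC", ch)
            out.append(_COMBINING_TO_SPACING.get(mapped, mapped))
        else:
            out.append(ch)
    return "".join(out)
```

For instance, "ＪＰ１２３" becomes "JP123", and "ｶﾞ" becomes KATAKANA
LETTER KA followed by the spacing VOICED SOUND MARK (U+309B), which
step (2) of the order below can then compose.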

The file "idntabjpcomp10.txt" defines the compatibility character
sequences to be composed, with the composed (canonicalized) character
code added as a third field; that is, it is a composition table for a
Kana followed by a voiced sound mark.

The recommended order for applying the canonicalization rules is:

        (1) "idntabjpcanon10"
        (2) "idntabjpcomp10"

These steps form the local part of canonicalization.
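
Step (2) can be sketched in the same hedged way.  The Python function
below folds a Kana followed by a spacing voiced sound mark (U+309B) or,
as an extra assumption, a semi-voiced sound mark (U+309C) into the
precomposed character.  It uses Unicode canonical composition (NFC) in
place of the real "idntabjpcomp10.txt" table, and keeps a composition
only if the result is a JIS X 0208 character (judged with Python's
"euc_jp" codec), since NFC alone would also produce characters such as
U+30F7 that JIS X 0208 lacks.  The order matters: HALF-WIDTH input
such as "ｶﾞ" only becomes the composable pair "カ゛" after step (1).

```python
import unicodedata

# Spacing voiced / semi-voiced marks (as in JIS) -> combining equivalents.
_SPACING_TO_COMBINING = {"\u309b": "\u3099", "\u309c": "\u309a"}


def _in_jis_x0208(ch: str) -> bool:
    """Two-byte EUC-JP code in 0xA1..0xFE means a JIS X 0208 character."""
    try:
        encoded = ch.encode("euc_jp")
    except UnicodeEncodeError:
        return False
    return len(encoded) == 2 and all(0xA1 <= b <= 0xFE for b in encoded)


def compose_kana(label: str) -> str:
    """Fold 'Kana + spacing sound mark' pairs into precomposed JIS Kana."""
    out = []
    for ch in label:
        combining = _SPACING_TO_COMBINING.get(ch)
        if combining is not None and out:
            composed = unicodedata.normalize("NFC", out[-1] + combining)
            # Accept only a single precomposed character that JIS X 0208 has.
            if len(composed) == 1 and _in_jis_x0208(composed):
                out[-1] = composed
                continue
        out.append(ch)
    return "".join(out)
```

Thus "カ゛" becomes "ガ" and "ハ゜ート" becomes "パート", while "ア゛",
which has no precomposed form, is left unchanged.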

2.2 Internationalization: Characters to be canonicalized in NAMEPREP

Japanese characters in multilingual domain name labels MUST be
characters defined in "idntabjp10".  Other characters, i.e. those not
defined in "idntabjp10", SHOULD be canonicalized by [NAMEPREP].

[NAMEPREP] is the common canonicalization rule recommended for IDN.

This is the international part of canonicalization.

3. Security considerations

None in particular.

5. Acknowledgements

JPNIC IDN-TF members.

