Internet Draft                                        Yoshiro Yoneya
draft-ietf-idn-jpchar-00.txt                      Yasuhiro Morishita
November 17, 2000                                              JPNIC
Expires May 17, 2001

        Japanese characters in multilingual domain name labels

Status of this memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.

Abstract

This document describes the Japanese characters that can be used in
multilingual domain name labels and their canonicalization rules.  It
is based on discussions and examinations within JPNIC.

Although the IDN WG has reached rough consensus that the character set
for multilingual domain names is UCS [UCS], the character set most
widely used in Japan is Japanese Industrial Standards X 0208,
hereafter abbreviated as "JIS" [JISX0208].  As a result, many PCs and
most PDAs in Japan, including mobile phones, can display only JIS and
ASCII.  Therefore, it is strongly recommended that the Japanese
characters used in multilingual domain names be limited to the
repertoire that UCS shares with JIS and ASCII.

Furthermore, for historical reasons, JIS has many compatibility code
points for Kana and alphanumerics.  These compatibility code points
are still widely used, so these characters SHOULD be accepted,
especially in user interfaces, and MUST be canonicalized before
transmission on the wire.  The former requirement should be
implemented as part of localization, and the latter must be
implemented as part of internationalization.


1. Japanese characters in multilingual domain name labels

In principle, a domain name is a symbolic name for a resource on the
Internet, intended to be easy for Internet users to understand and
remember.  Internationalization or multilingualization of domain
names MUST obey this principle.  That is, characters in multilingual
domain name labels SHOULD be unambiguous.

JIS contains many characters, including graphical and compatibility
characters.  For domain names, however, the characters that are
significant for representing names are Kanji, Hiragana and Katakana
[CJK].  Therefore, according to the principle above, Japanese
characters in multilingual domain names MUST be the Kanji, Hiragana
and Katakana in JIS.

The file "idntabjp10.txt" defines the Japanese characters that can be
used in multilingual domain name labels.  It uses the format of
[VERSION], with the corresponding JIS code point added as a third
field.  Some of these characters, such as PROLONGED SOUND MARK
(U+30FC), are categorized as graphical characters in JIS, but they
are used as part of Kanji, Hiragana or Katakana text.  All characters
in this file are in canonicalized form.


2. Canonicalization rules of Japanese characters in multilingual
   domain name labels

This section describes the canonicalization rules in two parts.  The
first covers "localization", and the second covers
"internationalization".  In other words, the first applies at the
Input/Display level, and the second at the API level [IDNA].

2.1 Localization: Characters to be canonicalized before NAMEPREP

As mentioned above, JIS has many compatibility characters that are
regarded as alphanumerics or Katakana.  The former are the so-called
FULL-WIDTH alphanumerics, and the latter are the so-called HALF-WIDTH
kana.  These characters are prohibited in [NAMEPREP], but are still
widely used on many PCs and most PDAs in Japan.  Hence, application
software that handles Japanese characters in multilingual domain name
labels SHOULD accept these compatibility characters as input and
canonicalize them before applying [NAMEPREP].

The file "idntabjpcanon10.txt" defines the compatibility characters,
with the canonicalized character code added as a third field.  That
is, it is a mapping table from FULL-WIDTH alphanumerics to ASCII and
from HALF-WIDTH kana to Katakana.
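
As a minimal sketch of such a mapping step, the following Python
fragment folds FULL-WIDTH alphanumerics to ASCII and maps a few
HALF-WIDTH kana to Katakana.  It does not reproduce the contents of
"idntabjpcanon10.txt".  The function name fold_width and the
three-entry kana table are hypothetical and chosen only for
illustration.

```python
# Illustrative stand-in for a width-folding table; NOT the real
# "idntabjpcanon10.txt" data.

# FULL-WIDTH digits and letters sit at a fixed offset from ASCII.
FULLWIDTH_OFFSET = 0xFEE0
FULLWIDTH_ALNUM = (
    (0xFF10, 0xFF19),  # FULLWIDTH DIGIT ZERO..NINE
    (0xFF21, 0xFF3A),  # FULLWIDTH LATIN CAPITAL LETTER A..Z
    (0xFF41, 0xFF5A),  # FULLWIDTH LATIN SMALL LETTER A..Z
)

# A few HALF-WIDTH kana, for illustration only (assumed entries).
HALFWIDTH_KANA = {
    "\uFF76": "\u30AB",  # HALFWIDTH KATAKANA LETTER KA -> KATAKANA LETTER KA
    "\uFF85": "\u30CA",  # HALFWIDTH KATAKANA LETTER NA -> KATAKANA LETTER NA
    "\uFF70": "\u30FC",  # HALFWIDTH prolonged sound mark -> U+30FC
}


def fold_width(label: str) -> str:
    """Map compatibility characters to their canonical counterparts."""
    out = []
    for ch in label:
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in FULLWIDTH_ALNUM):
            out.append(chr(cp - FULLWIDTH_OFFSET))
        else:
            # Unknown characters pass through unchanged.
            out.append(HALFWIDTH_KANA.get(ch, ch))
    return "".join(out)
```

A real implementation would load every pair from the table file
rather than rely on the offset rule or a hand-written dictionary.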

The file "idntabjpcomp10.txt" defines compatibility character
sequences that are to be composed, with the canonicalized character
code added as a third field.  That is, it is a composition table for
Kana followed by a voiced sound mark.
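
The composition step can be sketched as a lookup on pairs of
characters.  The fragment below is an illustration under stated
assumptions: it assumes the marks are the spacing characters U+309B
and U+309C, the four table entries are examples rather than the
contents of "idntabjpcomp10.txt", and the name compose_kana is
hypothetical.

```python
# Illustrative composition table; NOT the real "idntabjpcomp10.txt" data.
VOICED = "\u309B"       # KATAKANA-HIRAGANA VOICED SOUND MARK (assumed)
SEMI_VOICED = "\u309C"  # KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK (assumed)

COMPOSE = {
    ("\u30AB", VOICED): "\u30AC",       # Katakana KA + voiced -> GA
    ("\u30CF", VOICED): "\u30D0",       # Katakana HA + voiced -> BA
    ("\u30CF", SEMI_VOICED): "\u30D1",  # Katakana HA + semi-voiced -> PA
    ("\u304B", VOICED): "\u304C",       # Hiragana KA + voiced -> GA
}


def compose_kana(text: str) -> str:
    """Replace each (kana, sound mark) pair found in COMPOSE."""
    out = []
    i = 0
    while i < len(text):
        pair = (text[i], text[i + 1]) if i + 1 < len(text) else None
        if pair in COMPOSE:
            out.append(COMPOSE[pair])
            i += 2  # consume both the kana and the mark
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

A pair that is not in the table is left as two separate characters.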

The recommended order for applying the canonicalization rules is as
follows:

        (1) "idntabjpcanon10"
        (2) "idntabjpcomp10"

This is the local part of canonicalization.
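
The following sketch suggests one reason the order may matter.  It
assumes that the composition table is keyed on characters that have
already been mapped by the first table, which this document does not
state.  The two tiny tables and the names apply_canon, apply_comp
and localize are hypothetical.

```python
# One-entry stand-ins for the two tables (assumed contents).
CANON = {
    "\uFF76": "\u30AB",  # HALFWIDTH KATAKANA LETTER KA -> KATAKANA LETTER KA
    "\uFF9E": "\u309B",  # HALFWIDTH voiced sound mark -> U+309B
}
COMP = {
    "\u30AB\u309B": "\u30AC",  # KA + voiced sound mark -> GA
}


def apply_canon(text: str) -> str:
    """Step (1): map each compatibility character on its own."""
    return "".join(CANON.get(ch, ch) for ch in text)


def apply_comp(text: str) -> str:
    """Step (2): compose kana + sound mark sequences."""
    for sequence, composed in COMP.items():
        text = text.replace(sequence, composed)
    return text


def localize(text: str) -> str:
    """Apply the two steps in the recommended order."""
    return apply_comp(apply_canon(text))
```

With these tables, localize("\uFF76\uFF9E") yields the single
character U+30AC.  Applying the steps in the reverse order leaves
the two-character sequence U+30AB U+309B, because the composition
step sees only HALF-WIDTH input that its table does not list.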

2.2 Internationalization: Characters to be canonicalized in NAMEPREP

Japanese characters in multilingual domain name labels MUST be
characters defined in "idntabjp10".  Characters other than those in
"idntabjp10" SHOULD be canonicalized by [NAMEPREP].

[NAMEPREP] is the common, recommended rule for IDN.

This is the international part of canonicalization.
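
A repertoire check of the kind described in this section might be
sketched as follows.  The code point ranges are an assumption that
only approximates "idntabjp10": in particular, the CJK Unified
Ideographs range is a superset of the Kanji in JIS.  The names
in_repertoire and disallowed are hypothetical, and the handling of
ASCII characters is outside the scope of the sketch.

```python
# Approximate ranges only; the authoritative list is "idntabjp10".
ALLOWED_RANGES = (
    (0x3041, 0x3093),  # Hiragana letters (assumed range)
    (0x30A1, 0x30F6),  # Katakana letters (assumed range)
    (0x30FC, 0x30FC),  # PROLONGED SOUND MARK
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (superset of JIS Kanji)
)


def in_repertoire(ch: str) -> bool:
    """True if ch falls inside one of the approximate ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ALLOWED_RANGES)


def disallowed(label: str) -> list:
    """Return the characters of label that fall outside the ranges."""
    return [ch for ch in label if not in_repertoire(ch)]
```

An empty result means every character passed the approximate check.
Any character that is returned would be left for [NAMEPREP] to
canonicalize.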


3. Security considerations

None in particular.


4. References

[UCS]       "Universal Multiple-Octet Coded Character Set",
            ISO/IEC 10646-1:1993, ISBN 0-201-61633-5
[JISX0208]  "Japanese Industrial Standards",
            Information Technology (Terms/Code/Date elements)-99,
            ISBN4-542-12976-4
[IDNREQ]    "Requirements of Internationalized Domain Names",
            draft-ietf-idn-requirements-03.txt, Jun 2000, Z Wenzel, J Seng
[NAMEPREP]  "Preparation of Internationalized Host Names",
            draft-ietf-idn-nameprep-00.txt, Jul 2000, P Hoffman, M Blanchet
[CJK]       "Han Ideograph (CJK) for Internationalized Domain Names",
            draft-ietf-idn-cjk-00.txt, Sep 2000, J Seng, Y Yoneya,
            K Huang, K Kyongsok
[VERSION]   "Handling versions of internationalized domain names protocols",
            draft-ietf-idn-version-00.txt, Nov 2000, M Blanchet


5. Acknowledgements

The authors thank the members of the JPNIC IDN-TF.


6. Authors' Addresses

Yoshiro Yoneya
Japan Network Information Center
Fuundo Bldg 1F, 1-2 Kanda-ogawamachi
Chiyoda-ku Tokyo 101-0052, Japan
yone@nic.ad.jp

Yasuhiro Morishita
Japan Network Information Center
Fuundo Bldg 1F, 1-2 Kanda-ogawamachi
Chiyoda-ku Tokyo 101-0052, Japan
yasuhiro@nic.ad.jp