draft-jseng-idn-admin-02

INTERNET DRAFT                                     Editors: James SENG
draft-jseng-idn-admin-02.txt                              John KLENSIN
18th Oct 20th Nov 2002                             Authors: K. KONISHI
Expires 2018th April May 2003                 K. HUANG, H. QIAN, Y. KO

   Internationalized Domain Names Registration and Administration
             Guideline for Chinese, Japanese and Korean

Status of this Memo

   This document is an Internet-Draft and is in full conformance
   with all provisions of Section 10 of RFC2026 except that the
   right to produce derivative works is not granted.

   Internet-Drafts are working documents of the Internet
   Engineering Task Force (IETF), its areas, and its working
   groups. Note that other groups may also distribute working
   documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of
   six months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-
   Drafts as reference material or to cite them other than as
   "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

Achieving internationalized access to domain names raises
many complex issues.  These include are not only associated
with basic protocol design (i.e., how the names are
represented on the network, compared, and converted to
appropriate forms) but also issues and options for
deployment, transition, registration and administration.

The IETF IDN working group focused on the development of a
standards track specification for access to domain names in
a broader range of scripts than the original ASCII.  It
became clear during its efforts that there was great
potential for confusion, and difficulties in deployment and
transition, due to characters with similar appearances or
interpretations and that those issues could best be
addressed administratively, rather than through restrictions
embedded in the protocols.

This document provides guidelines for zone administrators
(including but not limited to registry operators and
registrars), and information for all domain names holders,
on the administration of those domain names which contain
characters drawn from Chinese, Japanese and Korean scripts
(CJK).  Other language groups are encouraged to develop
their own guidelines as needed, based on these guideline if
that is helpful.

Comments on this document can be sent to the authors at
idn-admin@jdna.jp.

Table of Contents

0. Pre-Note for ASCII-version of this document                       2

1. Introduction                                                      3

2. Definitions                                                       5

3. Administrative Framework                                          6
3.1. Principles underlying these Guidelines                          7
3.2. Registration of IDL                                             8
3.2.1. Language character variant table                              9
3.2.2  2. Formal syntax                                             10
3.2.3. Registration Algorithm                                       10
3.3. Deletion and Transfer of IDL and IDL Package                   12
3.4. Activation and De-activation of IDN variants                   13
3.5. Adding/Deleting language(s) association                        13
3.6. Versioning of the language character variant tables            13

4. Example of Guideline Adoption                                    14

i. Notes                                                            17

ii. Other Issues                                                    18

iii. Acknowledgements                                               18

iv. Authors                                                         18

v. Normative References                                             19

vi. Non-normative References                                        20




0. Pre-Note for ASCII-version of this document

In order to make meanings clear, especially in examples, Han
ideographs are used in several places in this document.  Of
course, these ideographs do not appear in its ASCII form of
this document.  So, for the convenience of readers of the
ASCII format and some readers not familiar with recognizing
and distinguishing Chinese characters, each
use of a particular character will be associated with both
its Unicode code point and an "asterisk tag" with its
corresponding Chinese Romanization [ISO7098] with the tone
mark represented by a number 1 to 4.  Those tags have no
meaning outside this document; they are intended simply to
provide a quick visual and reading reference to facilitate
the combinations and transformations of characters in the
guideline and table excerpts.  Appendix A would provide the
Romanization of the ideographs in Japanese (ISO 3602) and
Korean (ISO 11941).






1. Introduction

Defining and specifying protocols for Internationalized
Domain Names has been one of the most controversial tasks
initiated by the IETF in recent years.  Domain names are the
fundamental naming architecture of the Internet; many
Internet protocols and applications rely on the stability,
continuity, and absence of ambiguity of the DNS.

The introduction of internationalized domain names (IDN)
amplifies the difficulty of putting names into identifiers
and the confusion between scripts and languages.  It impacts
many internet protocols and applications and creates more
complexity in technical administration and services.

While the IETF IDN working group [IDN-WG] focused on the
technical problems of IDN, administrative guidelines are
also important in order to reduce unnecessary user confusion
and domain name disputes among domain name holders.

The IDN working group has completed working group last call
for the following internet-draftsThere are four documents
from the IDN working group that have been approved by IESG:

1. Preparation of Internationalized Strings [STRINGPREP]
2. Internationalizing Host Names In Applications [IDNA]
3. Punycode version 0.3.3 [PUNYCODE]
4. A Stringprep Profile for Internationalized Domain Names
   [NAMEPREP]

These drafts documents specify that the intersystem
protocols that make up the domain name system infrastructure
remain unchanged.  Instead, they introduce
internationalization (I18N) [Note1] in client software
(particularly via the IDNA protocol) using an ASCII
Compatible Encoding (ACE) known as Punycode.

The domain name protocols [STD13] also specify that
characters are to be interpreted so that upper and lower
case Latin-based characters are considered equivalent.  But
with the introduction of Unicode characters beyond US-ASCII,
and the possibility to represent a single character in
multiple ways in ISO10646/Unicode [UNICODE], a normalization
process, known as Nameprep, has been proposed to handle the
more complex problems of character-matching for those
additional characters.  Nameprep is also executed by client
software as described in IDNA.

While Nameprep normalizes domain names so that the users
have an improved chance of getting the right domain name
from information provided in other forms, as required for
I18N, Nameprep does not handle any localization (L10N).

This becomes significant when a domain name holder attempts
to use a Unicode string forming a "name", "word", or
"phrase" that may have certain meaning in a certain language
or when used as a domain name.  Such Unicode string may have
different variants in the context of the language or
culture.

Generally, these localized variants in CJK can be classified
into four categories, as described by Halpern et al. [C2C]:
[Note2]

a. Character (or Code) variants

Character (or Code) variants refer to variants that are
generated by character-by-character (or code-by-code)
substitution.

An example in English would be "A" or "a" (U+0041 or
U+0061). Two examples in Chinese would be U+98DB *fei1* or
U+98DE *fei1* and U+6A5F *ji1* or U+673A *ji1*.

Note that this does not mean the choice between U+6A5F and
U+673A is always symmetric like the one between "A" and "a"
it is a choice only for Chinese but not for Japanese.

The variants for particular characters may be just to drop
them. For example, points and vowels characters in Hebrew
(U+05B0 to U+05C4) and Arabic (U+064B to U+0652) are
optional; the variants for strings containing them are
constructed by simply dropping those points and vowels.

Code variants may also occur when different code points are
assigned to what visually or abstractly are the "same"
character, possibility due to compatibility issues, type
face differences or script range. For example, LATIN CAPITAL
LETTER A (U+0041) normally has an appearance identical to
GREEK CAPTIAL LETTER A (U+0391). CJK scripts have font
variants for compatibility (either U+4E0D or U+F967 may be
used) and "zVariant" (e.g. U+5154 and U+514E).

The difficulty lies in defining which characters are the
"same" and which are not.

b. Orthographic variants

Orthographic variants refer to variants that are generated
by word-by-word substitution.

An example in English would be "color" and "colour".

It is possible for some of these orthographic variants to be
generated by character variants. For example "airplane" in
Chinese may be either U+98DB U+6A5F *fei1 ji1* or U+98DE
U+673A *fei1 ji1*.

Other orthographic variants may not be generated by
character variants.  For example, in Chinese, both U+767C
*fa1* and U+9AEE *fa4*  are related to U+53D1 *fa1 or fa4*
depending on the word. For hair, U+5934 U+53D1 *tou2 fa4*,
the variant should be U+982D U+9AEE *tou2 fa4* but not
U+982D U+767C *tou2 fa1*.

c. Lexemic variants

Lexemic variants refer to variants that can be generated
when language is considered, by word-by-word substitution.

An example in English would be cab, taxi, or taxicab.

An example in Chinese would be U+8CC7 U+8A0A *zi1 xun4*
or U+4FE1 U+606F *xin4 xi1*.

Note that there is no relationship between U+8CC7 and U+4FE1
or U+8A0A and U+606F, i.e., the sequence U+8CC7 U+606F
*zi1 xi1* does not exist in Chinese.

d. Contextual variants

Contextual variants refer to variants that are generated by
word-by-word substitutions with context considered.

In English, the word "plane" has different meanings and
could be replaced by with different equivalent words
(synonyms) such as "airplane" or "plane" (as in a flat-
surface or device for smoothing wood) depending on context.
And, of course, "plain", which is pronounced the same way,
and indistinguishable in speech-to-text contexts such as
computer input systems for the visually impaired, is a
different word entirely.

Similarly, the word U+6587 U+4EF6 *wen2 jian4* could be
either document U+6587 U+4EF6 *wen2 jian4*or data file
U+6A94 U+6848 *dang3 an4* depending on context.

Although domain names were designed to be identifiers
without any language context, users have not been prevented
from using strings in domain names and interpreting them as
"words" or "names". It is likely that users will do this
with IDN as well. Therefore, given the added complications
of using a much broader range of characters, precautions
will be required when deploying IDN to minimize confusion
and fraud.

The intention of these guidelines is to provide advice about
the deployment of IDNs, with language consideration, but
focusing only on the category of character variants to
increase the possibility of successful resolution and
reduced confusion while accepting inherent DNS limitations.

2. Definitions

Unless otherwise stated, the definitions of the terms used
in this document are consistent with "Terminology Used in
Internationalization in the IETF" [I18NTERMS].

"FQDN" refers to a fully-qualified domain name and "domain
name label" refers to a label of a FQDN.

RFC3066 [RFC3066] defines a system for coding and
representing languages.

ISO/IEC 10646 is a universal multiple-octet coded character
set that is a product of ISO/IEC JTC1/SC2/WG2, Work Item
JTC1.02.18 (ISO/IEC 10646). It is a multi-part standard:
Part 1, published as ISO/IEC 10646-1:2000(E) covering the
Architecture and Basic Multilingual Plane; Part 2, published
as ISO/IEC 10646-2:2001(E) covers the supplementary
(additional) planes.

The Unicode Consortium publishes "The Unicode Standard -
Version 3.0", ISBN 0-201-61633-5. In March 2002, Unicode
Consortium published Unicode Standard Annex #28.  That annex
defines Version 3.2 of The Unicode Standard, which is fully
synchronized with ISO/IEC 10646-1:2000 (with Amendment 1).

The term "Unicode character" is used here to refer to
characters chosen from The Unicode Standard Version 3.2 (and
hence from ISO/IEC 10646). In this document, the characters
are identified by their positions (or "code points"). The
notation U+12AB, for example, indicates the character at the
position 12AB (hexadecimal) in the Unicode 3.2 table.

Similarly, "Unicode string" refers to a string of Unicode
characters. The Unicode string is identify by the sequence
of the Unicode characters regardless of the encoding scheme.

The term "IDN" is often used to refer to many different
things: (a) an abbreviation for "Internationalized Domain
Name" (b) a fully-qualified domain name that contains at
least one label that contains characters not appearing in
ASCII (c) a label of a domain name that contains at least
one character beyond ASCII (d) a Unicode string to be
processed by Nameprep (e) an IDN Package (in this document
context) (f) a Nameprep processed string (g) a Nameprep and
Punycode processed string (h) the IETF IDN Working Group (g)
ICANN IDN Committee (h) other IDN activities in other
companies/organizations etc.

Because of the potential confusion, this document shall use
the term "IDN" as an abbreviation for "Internationalized
Domain Name" only.

And also, this document provides a guideline to be applied
on a per zone basis, one label at a time, the term
"Internationalized Domain Name Label" or "IDL" will be used
instead.

In this document, the term "registration" refers to the
process by which a potential domain name holder requests
that a label be placed in the DNS, either as an individual
name within a domain or as a sub-domain delegation from
another domain name holder. A successful registration would
then lead to the label or delegation records being placed in
the relevant zone file.  The guidelines presented here are
recommended for all zones, at any hierarchy level, in which
CJK characters are to appear, not just domains at the first
or second level.

CJK characters are characters commonly used in Chinese,
Japanese or Korean language including but not limited to
ASCII (U+0020 to U+007F, Han Ideograph (U+3400 to U+9FAF and
U+20000 to U+2A6DF), Bopomofo (U+3100 to U+312F and U+31A0
to U+31BF), Kana (U+3040 to U+30FF), Jamo (U+1100 to 11FF
and U+3130 to U+318F), Hangul (U+AC00 to U+D7AF and U+3130
to U+318F) and its respective compatibility forms.

3. Administrative Framework

Zone administrators are responsible for the administration
of the domain name labels under their control. A zone
administrator might be responsible for a large zone such as
a Top Level Domain (TLD), generic or country code, or a
smaller one such as a typical second or third level domain.
A large zone would often be more complex then a smaller one
(sometimes it is just larger).  However, normally, actual
technical administrative tasks -- such as addition,
deletion, delegation and transfer of zones between domain
name holders -- are similar for all zones.

At the same time, different zones may have different
policies and processes.  For example, a pay-per-domain
policy and registry/registrar model for .COM may not be
applicable to such domains as .SG or .IBM.COM. The latter,
for example, has very restricted policies about who is
permitted to have a domain name label under IBM.COM, the
types of string that are permitted, and different procedures
for obtaining those string.

This document only provides guidelines for how CJK
characters should be handled within a zone, how language
issues should be considered and incorporated, and how domain
name labels containing CJK characters should be administered
(including registration, deletion and transfer of labels).
It does not provide any guidance for handling of non-CKJ
characters or languages in zones.

Other IDN policies, as the creation of new TLDs, or the cost
structure for registrations, are outside the scope of this
document.  Such discussions should be conducted in forums
outside the IETF as well.

Technical implementation issues are not discussed here
either.  For example, the decision as to whether various of
the guidelines should be implemented as registry or
registrar actions is left to zone administrators, possibly
differing from zone to zone.

3.1. Principles underlying these Guidelines

In many places, this document would assumes "First-Come-
First-Serve" (FCFS) as a conflict policy in the event of a
dispute although FCFS is not listed as one of the
principles. If other policies dominate priorities and
"rights", one can use these guidelines by replacing uses of
FCFS in this document by appropriate other policy rules
specific to the zone.  In other cases, some of these
guidelines may not be applicable although, some alternatives
for determining rights to labels -- such as use of UDRP or
mutual exclusion -- might have little impact on other
aspects of these guidelines.

(a) Each IDL to be registered should be associated with one
or more languages.

Although some Unicode strings may be pure identifiers made
up of an assortment of characters from many languages and
scripts, IDLs are likely to be names or phrases that have
certain meaning in some language.  While a zone
administration might or might not require "meaning" as a
registration criterion, the possibility of meaning provides
a useful tool when trying to avoid user confusion.

Zone administrators should administratively associate one or
more language with each IDL.  These associations should
either be pre-determined by the zone administrator and
applied to the entire zone or chosen by the registrants on a
per-IDL basis.  The latter may be necessary for some zones,
but will make administration more difficult and will
increase the likelihood of conflicts in variant forms.

A given zone might have multiple languages associated with
it, or have no language specified at all, but doing so may
provide additional opportunities for user confusion, and is
therefore NOT recommended.

The zone administrator must also verify the validity of the
IDL requested by using information associated with the
chosen language and possibly other rules as appropriate.

(b) When an IDL is registered, all of the character variants
for the associated language(s) should be reserved for the
registrant.  Each language associated with the IDL will lead
to different character variants.

IDL reservations of the type described here normally do not
appear in the distributed DNS zone file.  In other words,
these reserved IDLs do not resolve. Domain name holders
could request these reserved IDLs to be placed in the zone
file and made active and resolvable as, e.g., aliases or
synonyms.

Since different languages may imply different sets of
variants, the IDLs reserved for one IDL may overlap those
reserved for another.  In this case, the reserved IDLs
should be bound to one registration or the other, or
excluded from both, according to the applicable registration
or dispute resolution policy for the zone.

(c) For a given base language, the IDL may have one or more
recommended variants. Some language rules may prefer certain
variants over others. To increase the likelihood of correct
and predictable resolution of the IDL by end-users, the
recommended variants should be active.

(d) A zone administrator may impose additional rules and
other processing to further limit the set of reserved
variants, based on its policy and/or procedure.

For example, the zone administrator may have policy that
requires users to select the reserved variants. Or some
combinations of the characters are invalid.

Such additional rules and other processing are imposed on a
per zone basis and therefore not within the scope of this
document.

(e) The IDL and its reserved variants with the language(s)
association should be atomic.

The IDL and its reserved variants for the associated
language(s) are to be considered as a single unit - an "IDL
Package". For a given IDL, that IDL package is defined by
these guidelines and created upon registration.

The IDL Package is atomic: Transfer and deletion of IDL are
performed on the IDL Package as a whole. IDL, either active
or reserved, within the IDL Package should not be
transferred or deleted individually.  I.e., any re-
registration, transfers, or other actions that impact the
IDL should also impact the reserved variants.  Separate
registration or other actions for the variants are not
possible if these guidelines are to accomplish their
purpose.

Conflict policy of the zone may result in violation of the
IDL Package atomicity. In such case, the conflict policy
would take precedence.

3.2. Registration of IDL

Conforming to the principles described in 3.1, the
registration of an IDL would require at least two
components, i.e., the character variant tables for the
language and the registration algorithm.

3.2.1. Language character variant table

Any lines starting with, or portions of lines after, the
hash symbol("#") are treated as comments. Comments have no
significance in the processing of the tables, nor are there
any syntax requirements between the hash symbol and the end
of the line. Blank lines in the tables are ignored
completely.

Every language should have a character variant table
provided by a relevant group (or organization or other body)
and based on established standards. The group that defines a
particular character variant table should document
references to the appropriate standards in beginning of
table, tagged with the word "Reference" followed by an
integer (the reference number) followed by the description
of the reference.  For example,

Reference 1 CP936 (commonly known as GBK)
Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
Reference 3 List of Simplified character Table (Simplified column)
Reference 4 zSimpVariant in Unihan.txt
Reference 5 variant that exists in GB2312, common simplified hanzi

Each language character variant table must have a version
number. This is tagged with the word "Version" followed by
an integer then followed by the date in the format YYYYMMDD,
where YYYY is the 4 digit Year, MM is the 2 digit Month and
DD is the 2 digit Day of the publication date of the table

Version 1 20020701  # July 2002 Version 1

The table has three fields, separated by semicolons.  The
fields are: "valid code point"; "recommended variant(s)";
and "character variant(s)".

Only code points listed in the "valid code point" field are
allowed to be registered as part of a IDL associated with
that language.

There can be one or more "recommended variant(s)" (i.e.,
entries in the "recommended variant(s)" column). If the
"recommended variant(s)" column is empty, then there is no
corresponding variant.

The "character variant(s)" column contains all variants of
the code point.  Since the code point is always a variant of
itself, therefore, to avoid redundancy, the code point
itself need not be repeated in the "character variant(s)"
column.

If the variant is composed of a sequence of code points,
then sequence of code points is listed separated by a space
in the "recommended variant(s)" or "character variant(s)".

If there are multiple variants, each variant must be
separated by a comma in the "recommended variant(s)" or
"character variant(s)".

Any code point listed in the "recommended variant(s)" column
must be allowed, by the rules for the relevant language, to
be registered. However, this is not a requirement for the
entries in the "character variant(s)" column; it is possible
that some of those entries may not be allowed to be
registered.

Every code point in the table should have a corresponding
reference number (associated with the references) specified
to justify the entry. The reference number is placed in
parentheses after the code point. If there is more than one
reference, then the numbers are placed within a single set
of parentheses and separated by commas.

3.2.2. Formal syntax

This section uses the IETF "ABNF" metalanguage [ABNF]

LanguageCharacterVariantTable = 1*ReferenceLine VersionLine
                                1*EntryLine
ReferenceLine = "Reference" SP RefNo SP RefDesciption
                [ Comment ] CRLF
RefNo = 1*DIGIT
RefDesciption = *[VCHAR]
VersionLine = "Version" SP VersionNo SP VersionDate
              [ Comment ] CRLF
VersionNo = 1*DIGIT
VersionDate = YYYYMMDD
EntryLine = VariantEntry/Comment CRLF
VariantEntry = ValidCodePoint [ "(" RefList ")" ] ";"
               RecommendedVariant ";" CharacterVariant
               [ Comment ]
ValidCodePoint = CodePoint
RefList = RefNo  0*( "," RefNo )
RecommendedVariant = CodePointSet 0*( "," CodePointSet )
CharacterVariant = CodePointSet 0*( "," CodePointSet )
CodePointSet = CodePoint 0*( SP CodePoint )
CodePoint = 4DIGIT [DIGIT] [DIGIT]
Comment = "#" *VCHAR

YYYYMMDD is an integer representing a date where YYYY is the
4 digit year, MM is the 2 digit month and DD is the 2 digit
day.

3.2.3. Registration Algorithm

(An explanation of these steps follows them)

1.   IN <= IDL to be registered and
     {L} <= Set of languages associated with IN
2.   {V} <= Set of version numbers of the language character
             variant tables derived from {L}
3.   NP(IN) <= Nameprep processed IN  and
     check availability of NP(IN).
     If not available, route to conflict policy.
4.   For each AL in {L}
4.1.   Check validity of NP(IN) in AL. If failed, stop processing.
4.2.   PV(IN,AL) <= Set of available Nameprep processed recommended
                    variants of NP(IN) in AL
4.3.   RV(IN,AL) <= Set of available Nameprep processed character
                    variants of NP(IN) in AL
4.4. End of Loop
5.   {PV} <= Set of all PV(IN,AL)
6.   {ZV} <= {PV} set-union NP(IN)
7.   {RV} <= Set of all RV(IN,AL) set-minus {ZV}
8.   Create IDL Package for IN using IN, {L}, {V}, {ZV} and {RV}
9.   Put {ZV} into zone file

Explanation

Step 1 takes the IDL to be registered and the associated
language(s) as input to the process.

Step 2 extract the set of version numbers of the associated
language(s) tables.

Step 3 Nameprep processed the IDL. If the Nameprep
processing fails, then the IDL is invalid and the
registration process would stop.

If the Nameprep processed IDL is already registered or
reserved, then the conflict policy is applied here.  For
example, if FCFS is used, the registration process would
stop here.

Step 4 goes through all languages associated with the
proposed IDL, checks for validity in each language, and
generates the recommended variants and the reserved
variants.

In step 4.1, IDL validation is done by checking that every
code point in the Nameprep processed IDL is a code point
allowed by the "valid code point" column of the character
variant table for the language. If one or more code points
are invalid, the registration process must stop here.

Step 4.2 generates the list of recommended variants of the
IDL by doing a combination of all possible variants listed
in "recommend variant(s)" column for each code point in the
Nameprep processed IDL.  Generated variants must be
processed with Nameprep.  If any of the Nameprep processing
fails for an variant, then that variant will be remove from
the list.  If any of the recommended variants of the IDL is
registered or reserved, then the conflict policy will be
applied although this does not prevent the IDL from being
registered (unless the policy prevents such registration).
For example, if FCFS is used, then the conflicting
variant(s) will be removed from the list.

Step 4.3 generates the list of reserved variants by doing a
combination of all the possible variants listed in
"character variant(s)" column for each code point in the
Nameprep processed IDL.  If any of the Nameprep processing
fails for an variant, then that variant will be remove from
the list.  Generated variants must be Nameprep processed.
If any of the variants are registered or reserved, then the
conflict policy will apply here although this does not
prevent the IDL from being registered (unless the policy
prevents such registration).  For example, if FCFS is used,
then the conflict variants will be removed from the list.

The "combination" in Step 4.2 and Step 4.3 could achieve by
a recursive function similar to the following pseudo code:

Function Combination(Str)
  F <= first codepoint of Str
  SStr <= Substring of Str, without the first code point
  NSC <= {}

  If SStr is empty Then
    For each V in (Variants of code point F)
      NSC = NSC set-union (the string with the code point V)
    End of Loop
  Else
    SubCom = Combination(SStr)
    For each V in (Variants of code point F)
      For each SC in SubCom
        NSC = NSC set-union (the string with the
              first code point V followed by the string SC)
      End of Loop
    End of Loop
  Endif

  Return NSC

Step 5 generates the list of all recommended variants for
all language.

Step 6 generates the list of variants including the Nameprep
processed IDL which to be activated and Step 7 generates the
list of reserved variants.

For each of the step 5 and 7, the zone administrator may
impose additional rules and processing to limit the numbers
of activated and reserved variants.  These additional rules
and processings are zone specific and therefore not
specified in this document.

Then an "IDL Package" for IDL is created in Step 8 with the
original IDL, the associated language(s), all the list of
activated IDLs and the list of variants.  The version
numbers of the language character variants tables are also
stored in the IDL Package.

Lastly, the activated IDLs are converted using ToASCII
[IDNA] with UseSTD13ASCIIRules on and then put into the zone
file.  If ToASCII fails for any of the activated IDL, that
IDL must not be place into the zone file.  If the IDL is a
subdomain name, it will be delegated. The activated IDLs may
be delegated to a different domain name server so long it is
owned by the same domain name holder.

3.3. Deletion and Transfer of IDL and IDL Package

In normal domain administration, every domain name label is
independent of all other domain name labels.  Registration,
deletion and transfer of domain name labels is done on a per
domain name label basis.  Depending on the zone's
administrative policies, aliases (e.g., "CNAME" entries) may
be bound to particular labels with rules about whether one
can be changed without the other.  Current policies in gTLDs
generally prohibit registration of such aliases, in part to
avoid needing to form and enforce policies about these
change (or binding) rules.

However, with internationalization, each IDL is bound to a
list of variant IDLs (with the list depending on the
associated language), bound together in an IDL Package.

Because all variants of the IDL should belong to a single
domain name holder, the IDL Package should be treated as a
single entity. Individual IDL, either active or reserved,
within the IDL Package should not be deleted or transferred
independently of the other IDLs.  Specifically, if an IDL is
to be deleted or transferred, that action must be taken only
as part of an action that affects the entire IDL Package.

If the local conflict policy requires IDL to be transferred
and deleted independently of the IDL Package, the conflict
policy would take precedence. In such event, the conflict
policy should be associated with a transfer or delete
procedure taking IDL Package into consideration.

When an IDL Package is deleted, all the active and reserved
variants would be available again.  IDL Package deletion
does not change any other IDL Packages, including IDL
Packages that have variants that conflict with the variants
in the deleted IDL Package. This is to be consistent with
the atomicity and predictability of the IDL Package.

3.4. Activation and De-activation of IDL variants

As there are active IDLs and inactive IDLs within an IDL
Package, processes are required to activate or de-activate
IDL variants in an IDL Package.

The activation algorithm is described below:

1.   IN <= IDL to be activated & PA <= IDL Package
2.   NP(IN) <= Nameprep processed IN
3.   If NP(IN) not in {RV} then stop
4.   {RV} <= {RV} set-minus NP(IN) and
             {ZV} <= {ZV} set-union NP(IN)
5.   Put {ZV} into the zone file

Similarly, the deactivation algorithm:
1.   IN <= IDL to be deactivated & PA <= IDL Package
2.   NP(IN) <= Nameprep processed IN
3.   If NP(IN) not in {ZV} then stop
4.   {RV} <= {RV} set-union NP(IN) and
             {ZV} <= {ZV} set-minus NP(IN)
5.   Put {ZV} into the zone file

3.5. Adding/Deleting language(s) association

The list of variants is generated from the IDL and tables
for the associated languages.  If the language associations
are changed, then the lists of variants have to be updated.
On the other hand, the IDL Package is atomic and the list of
variants must not be changed after creation.

Therefore, this document recommends deleting the IDL Package
followed by a registration with the new set of languages
rather than attempting to add or delete language(s)
association within the IDL Package.  Zone administrators may
find it desirable to devise procedures to prevent other
parties from capturing the labels in the IDL Package during
these operations.

3.6. Versioning of the language character variant tables

Language character variants tables are subjected to changes
over time and the changes may or may not be backward
compatible.  It is possible that different version of the
language character variants tables may produce a different
set of recommended variants and reserved variants.

New IDL Packages should use the latest version of the
language character variants tables.

Existing IDL Packages created using previous version of
language character variants tables are not affected when
there a new version of the character variants table is
released.

4. Example of Guideline Adoption

To provide a meaningful example, some language character
variant tables have to be defined.  Assume, then, that the
following four language character variants tables are
defined (note that these tables are not a representation of
the actual table and they do not contain sufficient entries
to be used in any actual implementation):

a) language character variants tables for zh-cn and zh-sg

Reference 1 CP936 (commonly known as GBK)
Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
Reference 3 List of Simplified character Table (Simplified column)
Reference 4 zSimpVariant in Unihan.txt
Reference 5 variant that exists in GB2312, common simplified hanzi

Version 1 20020701 # July 2002

56E2(1);56E2(5);5718(2)            # sphere, ball, circle; mass, lump
5718(1);56E2(4);56E2(2),56E3(2)    # sphere, ball, circle; mass, lump
60F3(1);60F3(5);                   # think, speculate, plan, consider
654E(1);6559(5);6559(2)            # teach
6559(1);6559(5);654E(2)            # teach, class
6DF8(1);6E05(5);6E05(2)            # clear
6E05(1);6E05(5);6DF8(2)            # clear, pure, clean; peaceful
771E(1);771F(5);771F(2)            # real, actual, true, genuine
771F(1);771F(5);771E(2)            # real, actual, true, genuine
8054(1);8054(3);806F(2)            # connect, join; associate, ally
806F(1);8054(3);8054(2),8068(2)    # connect, join; associate, ally
96C6(1);96C6(5);                   # assemble, collect together


b) language variants table for zh-tw

Reference 1 CP950 (commonly known as BIG5)
Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
Reference 3 List of Simplified Character Table (Traditional column)
Reference 4 zTradVariant in Unihan.txt

Version 1 20020701 # July 2002

5718(1);5718(4);56E2(2),56E3(2)    # sphere, ball, circle; mass, lump
60F3(1);60F3(1);                   # think, speculate, plan, consider
6559(1);6559(1);654E(2)            # teach, class
6E05(1);6E05(1);6DF8(2)            # clear, pure, clean; peaceful
771F(1);771F(1);771E(2)            # real, actual, true, genuine
806F(1);806F(3);8054(2),8068(2)    # connect, join; associate, ally
96C6(1);96C6(1);                   # assemble, collect together

c) language variants table for ja

Reference 1 CP932 (commonly known as Shift-JIS)
Reference 2 zVariant in Unihan.txt
Reference 3 variant that exists in JIS X0208, commonly used Kanji

Version 1 20020701 # July 2002

5718(1);5718(3);56E3(2)            # sphere, ball, circle; mass, lump
60F3(1);60F3(3);                   # think, speculate, plan, consider
654E(1);6559(3);6559(2)            # teach
6559(1);6559(3);654E(2)            # teach, class
6DF8(1);6E05(3);6E05(2)            # clear
6E05(1);6E05(3);6DF8(2)            # clear, pure, clean; peaceful
771E(1);771E(1);771F(2)            # real, actual, true, genuine
771F(1);771F(1);771E(2)            # real, actual, true, genuine
806F(1);806F(1);8068(2)            # connect, join; associate, ally
96C6(1);96C6(3);                   # assemble, collect
together

d) language variants table for ko

Reference 1 CP949 (commonly known as EUC-KR)
Reference 2 zVariant in Unihan.txt

Version 1 20020701 # July 2002

5718(1);56E2(1);56E3(2)            # sphere, ball, circle; mass, lump
60F3(1);60F3(1);                   # think, speculate, plan, consider
654E(1);6559(1);6559(2)            # teach
6DF8(1);6E05(1);6E05(2)            # clear
771E(1);771F(1);771F(2)            # real, actual, true, genuine
806F(1);8054(1);8068(2)            # connect, join; associate, ally
96C6(1);96C6(1);                   # assemble, collect together

Example 1: IDL = U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
          {L} = {zh-cn, zh-sg, zh-tw}

NP(IN) = (U+6E05 U+771F U+6559)
PV(IN,zh-cn) = (U+6E05 U+771F U+6559)
PV(IN,zh-sg) = (U+6E05 U+771F U+6559)
PV(IN,zh-tw) = (U+6E05 U+771F U+6559)
{ZV} = (U+6E05 U+771F U+6559)}
{RV} = (U+6E05 U+771E U+6559),
       (U+6E05 U+771E U+654E),
       (U+6E05 U+771F U+654E),
       (U+6DF8 U+771E U+6559),
       (U+6DF8 U+771E U+654E),
       (U+6DF8 U+771F U+6559),
       (U+6DF8 U+771F U+654E)}

Example 2: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
          {L} = {ja}

NP(IN) = (U+6E05 U+771F U+6559)
PV(IN,ja) = (U+6E05 U+771F U+6559)
{ZV} = (U+6E05 U+771F U+6559)}
{RV} = (U+6E05 U+771E U+6559),
       (U+6E05 U+771E U+654E),
       (U+6E05 U+771F U+654E),
       (U+6DF8 U+771E U+6559),
       (U+6DF8 U+771E U+654E),
       (U+6DF8 U+771F U+6559),
       (U+6DF8 U+771F U+654E)}

Example 3: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
          {L} = {zh-cn, zh-sg, zh-tw, ja, ko}

NP(IN) = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
Invalid registration because U+6E05 is invalid in L = ko

Example 4: IDL = (U+806F U+60F3 U+96C6 U+5718)
                 *lian2 xiang3 ji2 tuan2*
          {L} = {zh-cn, zh-sg, zh-tw}

NP(IN) = (U+806F U+60F3 U+96C6 U+5718)
PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2)
PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2)
PV(IN,zh-tw) = (U+806F U+60F3 U+96C6 U+5718)
{ZV} = (U+8054 U+60F3 U+96C6 U+56E2),
       (U+806F U+60F3 U+96C6 U+5718)}
{RV} = (U+8054 U+60F3 U+96C6 U+56E3),
       (U+8054 U+60F3 U+96C6 U+5718),
       (U+806F U+60F3 U+96C6 U+56E2),
       (U+806f U+60F3 U+96C6 U+56E3),
       (U+8068 U+60F3 U+96C6 U+56E2),
       (U+8068 U+60F3 U+96C6 U+56E3),
       (U+8068 U+60F3 U+96C6 U+5718)

Example 5: IDL = (U+8054 U+60F3 U+96C6 U+56E2)
               *lian2 xiang3 ji2 tuan2*
          {L} = {zh-cn, zh-sg}

NP(IN) = (U+8054 U+60F3 U+96C6 U+56E2)
PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2)
PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2)
{ZV} = (U+8054 U+60F3 U+96C6 U+56E2)}
{RV} = (U+8054 U+60F3 U+96C6 U+56E3),
       (U+8054 U+60F3 U+96C6 U+5718),
       (U+806F U+60F3 U+96C6 U+56E2),
       (U+806f U+60F3 U+96C6 U+56E3),
       (U+806F U+60F3 U+96C6 U+5718),
       (U+8068 U+60F3 U+96C6 U+56E2),
       (U+8068 U+60F3 U+96C6 U+56E3),
       (U+8068 U+60F3 U+96C6 U+5718)}

Example 6: IDL = (U+8054 U+60F3 U+96C6 U+56E2)
               *lian2 xiang3 ji2 tuan2*
          {L} = {zh-cn, zh-sg, zh-tw}

NP(IN) = (U+8054 U+60F3 U+96C6 U+56E2)
Invalid registration because U+8054 is invalid in L = zh-tw

Example 7: IDL = (U+806F U+60F3 U+96C6 U+5718)
               *lian2 xiang3 ji2 tuan2*
          {L} = {ja,ko}

NP(IN) = (U+806F U+60F3 U+96C6 U+5718)
PV(IN,ja) = (U+806F U+60F3 U+96C6 U+5718)
PV(IN,ko) = (U+806F U+60F3 U+96C6 U+5718)
{ZV} = (U+806F U+60F3 U+96C6 U+5718)}
{RV} = (U+806F U+60F3 U+96C6 U+56E3),
       (U+8068 U+60F3 U+96C6 U+5718),
       (U+8068 U+60F3 U+96C6 U+56E3)}

i. Notes

1. The terms "i18n" and "l10n", sometimes used in upper-case
form (i.e., "I18N" and "L10N"), have become popular in
international standards usage as abbreviations for
"internationalization" and "localization", respectively.
The abbreviations were derived by using the first and last
letters of the words, with the number of characters that
appear between them.  I.e., in "internationalization", there
are 18 characters between the initial "i" and the terminal
"n".

2. Every human language is unique and therefore, every
linguistic and localization issue is also unique. It is
difficult or impossible to make comparisons across multiple
languages or to classify them into categories.  And any
cross-language analogies are, by their very nature,
imperfect at best.

For example, to classify Traditional Chinese/Simplified
Chinese as upper/lower case makes as much sense as to
classify TC/SC as "spelling variant" like "color" and
"colour". Both comparisons are potentially useful but
neither is completely correct.

3. The variants in CJK are very complex and require many
different layers of solution. This guideline is a one of the
solution components, but not sufficient, by itself, to solve
the whole problem.

ii. Other Issues

It is possible that many variants generated may have no
meaning in the associated language or languages.  The
intention is not to generate meaningful "words" but to
generate similar variants to be reserved.

The language Character Variants tables are critical to the
success of the guideline.  A badly designed table may either
generate too many meaningless variants or may not generate
enough meaningful variants. The principles to be used to
generate the tables are not within the scope of this
document, nor are the tables themselves.

This document recommends against registration of IDL in a
particular language until the language character variants
table for that language is available.

iii. Acknowledgements

The authors gratefully acknowledge the contributions of:

V.CHEN, N.HSU, H.HOTTA, S.TASHIRO, Y.YONEYA and other Joint
Engineering Team members at the JET meeting in
Bangkok/Thailand.

Yves Arrouye, an observer at the JET meeting in
Bangkok/Thailand, for his contribution on the IDL Package.

Soobok LEE
L.M TSENG
Patrik FALTSTROM
Paul HOFFMAN
Erin CHEN
MAO Wei
LEE Xiaodong
Harald ALVESTRAND
Erik NORDMARK

iv. Author(s)

James SENG
180 Lompang Road
#22-07 Singapore 670180
Phone: +65 9638-7085
Email: jseng@pobox.org.sg

Kazunori KONISHI
JPNIC
Kokusai-Kougyou-Kanda Bldg 6F
2-3-4 Uchi-Kanda, Chiyoda-ku
Tokyo 101-0047
JAPAN
Phone: +81 49-278-7313
Email: konishi@jp.apan.net

Kenny HUANG
TWNIC
3F, 16, Kang Hwa Street, Taipei
Taiwan
TEL : 886-2-2658-6510
Email: huangk@alum.sinica.edu

QIAN Hualin
CNNIC
No.6 Branch-box of No.349 Mailbox, Beijing 100080
Peoples Republic of China
Email: Hlqian@cnnic.net.cn

KO YangWoo
PeaceNet
Yangchun P.O. Box 81 Seoul 158-600
Korea
Email: newcat@peacenet.or.kr

John C KLENSIN
1770 Massachusetts Ave, No. 322
Cambridge, MA 02140
USA
Email: Klensin+ietf@jck.com

v. Normative References

[ABNF]      Augmented BNF for Syntax Specifications: ABNF, RFC
            2234, D. Crocker and P. Overell, Eds., November 1997.

[I18NTERMS] Terminology Used in Internationalization in the
            IETF, draft-hoffman-i18n-terms-07.txt, September 2002,
            Paul Hoffman, work in progress

[RFC3066]   Tags for the Identification of Languages,
            RFC3066, Jan 2001, H. Alvestrand

[IDNA]      Internationalizing Domain Names in Applications,
            draft-ietf-idn-idna, Feb 2002, Patrik Faltstrom,
            Paul Hoffman, Adam M. Costella, work in progress

[PUNYCODE]  Punycode: An encoding of Unicode for use with
            IDNA, draft-ietf-idn-punycode, Feb 2002, Adam
            M. Costello, work in progress

[STRINGPREP]Preparation of Internationalized Strings,
            draft-hoffman-stringprep, Feb 2002, Paul
            Hoffman, Marc Blanchet, work in progress

[NAMEPREP]  Nameprep: A Stringprep Profile for
            Internationalized Domain Names, work in progress,
            draft-ietf-idn-nameprep, Feb 2002, Paul Hoffman,
            Marc Blanchet, work in progress

[UNIHAN]    Unicode Han Database, Unicode Consortium
            ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt

[UNICODE]   The Unicode Consortium, "The Unicode Standard -
            Version 3.0", ISBN 0-201-61633-5. Unicode Standard Annex
            #28, (http://www.unicode.org/unicode/reports/tr28/)
            defines Version 3.2 of The Unicode Standard.

[ISO7098]   ISO 7098;1991 Information and documentation - Romanization
            of Chinese, ISO/TC46/SC2.

vi. Non-normative References

[IDN-WG]    IETF Internationalized Domain Names Working Group,
            idn@ops.ietf.org, James Seng, Marc Blanchet.
            http://www.i-d-n.net/

[STD13]     Paul Mockapetris, "Domain names - concepts and facilities"
            (RFC 1034) and "Domain names - implementation and
            specification" (RFC 1035), STD 13, November 1987.

[C2C]       Pitfalls and Complexities of Chinese to Chinese Conversion,
            http://www.cjk.org/cjk/c2c/c2c.pdf, Jack Halpern, Jouni
Document	Document type	This is an older version of an Internet-Draft that was ultimately published as RFC 3743. Expired & archived
	Select version	00 01 02 03 04 05 RFC 3743
	Compare versions
	Author
	RFC stream	Independent Submission
	Other formats	txt pdf bibtex bibxml