Internet Draft                                            Yoshiro Yoneya
draft-ietf-idn-ace-eval-jp-00.txt                                  JPNIC
Jun 26, 2001
Expires Dec 26, 2001

     Evaluation of various ACEs with existing Japanese Domain Names

Status of this memo

    This document is an Internet-Draft and is in full conformance with
    all provisions of Section 10 of RFC2026.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups.  Note that
    other groups may also distribute working documents as
    Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as reference
    material or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.

Abstract

    The ACE (ASCII Compatible Encoding) Design Team in the IDN Working
    Group is working on choosing a good ACE proposal.  To support the
    Design Team's work, this document reports the results of applying
    various ACEs to existing Japanese Domain Names, proposes evaluation
    criteria, and considers which ACE is appropriate for Japanese Domain
    Names.

1. Evaluation data outline

    The sample data for this evaluation consisted of over 50000 existing
    Japanese Domain Names under the JP ccTLD that were registered between
    22 Feb 2001 and 23 Mar 2001.

    The Japanese Domain Names include many names of the following kinds,
    which are familiar to Japanese people.

        - Company names
        - Registered names
        - University names
        - Personal names

    A Japanese Domain Name consists of a combination of one or more
    Japanese characters defined in [JPCHAR] and zero or more LDH
    characters (Letters, Digits, Hyphens) defined in [RFC1035].  A
    Japanese Domain Name is 1 through 15 characters long, due to a
    restriction in the JP ccTLD registration rules.  Japanese Domain
    Names are normalized according to [NAMEPREP] and [JPCHAR] at
    registration time.  The following graph shows the distribution of
    the number of characters in the sample data.

     N  15|**
     u  14|***
     m  13|****
     b  12|******
     e  11|*******
     r  10|************
         9|**************
     o   8|**********************
     f   7|*************************
         6|********************************
     c   5|********************************
     h   4|************************************************
     a   3|***************************
     r   2|**********************
     s   1|**
          +----+----+----+----+----+----+----+----+----+----
               1    2    3    4    5    6    7    8    9    (x1000)
                      Number of Japanese Domain Names

    The ACEs applied are RACE-03 [RACE], BRACE-00 [BRACE], LACE-01 [LACE],
    UTF-6-00 [UTF6], DUDE-02 [DUDE], AMC-ACE-M-00 [AMCACEM], AltDUDE-01
    [AltDUDE], AMC-ACE-O-00 [AMCACEO], AMC-ACE-R-01 [AMCACER],
    AMC-ACE-V-00 [AMCACEV], AMC-ACE-W-00 [AMCACEW] and MACE-00 [MACE].
    JPNIC's mDNkit-2.2 [MDNKIT] will provide implementations of all of
    them.  The current version, mDNkit-2.1, already provides most of
    them.  This evaluation was done with an mDNkit-2.2 snapshot.

    The evaluation data was created by the following process.

        1) Apply each ACE to the Japanese Domain Names and measure the
           conversion time.
        2) Remove the ACE signature from each resulting string.
        3) Compare the length of each resulting string with the length
           of the original Japanese Domain Name.
        4) Calculate statistical values.
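
    The four steps above can be sketched as follows.  This is only an
    illustration, not the tooling used for this evaluation: Python's
    built-in "punycode" codec stands in for the ACEs under test (none
    of which ship with Python), the names to_ace and evaluate and the
    four sample labels are hypothetical, and population variance is
    assumed because the variance formula is not stated here.

```python
import statistics
import time
from collections import defaultdict


def to_ace(label: str) -> str:
    """Stand-in encoder (assumption): Python's built-in punycode codec."""
    return label.encode("punycode").decode("ascii")


def evaluate(labels):
    """Return (seconds, {chars: (max, min, mean, var)}) for the labels."""
    # Step 1: convert every label and time the whole batch.
    start = time.perf_counter()
    encoded = [(label, to_ace(label)) for label in labels]
    elapsed = time.perf_counter() - start

    # Step 2 is a no-op here: the stand-in codec emits no ACE signature,
    # so there is nothing to strip from the result.

    # Step 3: group the resulting lengths by the original length.
    lengths = defaultdict(list)
    for label, ace in encoded:
        lengths[len(label)].append(len(ace))

    # Step 4: per-length statistics (population variance is an assumption).
    stats = {}
    for chars, values in sorted(lengths.items()):
        stats[chars] = (max(values), min(values),
                        statistics.mean(values),
                        statistics.pvariance(values))
    return elapsed, stats


# Hypothetical sample; the real evaluation used registered names.
sample = ["日本", "東京", "日本語", "ドメイン"]
elapsed, stats = evaluate(sample)
for chars, (mx, mn, mean, var) in stats.items():
    print(chars, mx, mn, f"{mean:.2f}", f"{var:.2f}")
```

    Grouping by the original length before computing statistics is what
    produces one column per CHARS value in the tables of section 2.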


2. Result of each ACE

    In this section, CHARS is the length of the original Japanese Domain
    Name in characters.  MAX and MIN are the worst-case and best-case
    lengths of the resulting string in octets, respectively.  MEAN is
    the mean and VAR the variance of the resulting string length.  SLOPE
    is the slope of the regression line fitted to each of MAX, MIN and
    MEAN as a function of CHARS.
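
    As an illustration of SLOPE, the following sketch fits an ordinary
    least-squares line through (CHARS, value) pairs.  The fitting method
    is an assumption, since it is not stated here, and the function name
    regression_slope is hypothetical.  The input is the RACE MAX row
    from section 2.1.

```python
def regression_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance sums; the 1/n factors cancel in the ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


chars = list(range(1, 16))
# MAX row of the RACE table in section 2.1.
race_max = [5, 8, 12, 15, 18, 21, 24, 28, 31, 34, 37, 40, 44, 47, 50]
slope = regression_slope(chars, race_max)
print(f"{slope:.2f}")
```

    Under this assumption the RACE MAX row gives a slope of about 3.21,
    which agrees with the SLOPE column of that table.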

2.1 RACE

    CHARS     1      2      3      4      5      6      7      8
    MAX       5      8     12     15     18     21     24     28
    MIN       4      5      7      8     10     12     13     15
    MEAN   4.01   7.63   9.80  12.06  13.73  16.23  17.53  21.81
    VAR    0.01   0.96   6.14  11.86  15.58  19.73  28.47  41.14

    CHARS     9     10     11     12     13     14     15  SLOPE
    MAX      31     34     37     40     44     47     50   3.21
    MIN      16     18     20     21     23     24     26   1.59
    MEAN  23.71  27.14  29.10  32.27  34.72  37.36  39.71   2.53
    VAR   54.12  60.27  69.14  82.95 103.23 122.69 138.72

2.2 BRACE

    CHARS     1      2      3      4      5      6      7      8
    MAX       4      7     10     14     17     20     23     26
    MIN       4      5      7      8      9     11     12     14
    MEAN   4.00   6.78   8.62  11.32  12.66  14.34  15.62  18.86
    VAR    0.00   0.36   2.11   7.55   9.06  13.75  19.04  24.83

    CHARS     9     10     11     12     13     14     15  SLOPE
    MAX      30     33     36     39     42     46     49   3.21
    MIN      15     16     17     19     21     22     24   1.40
    MEAN  20.27  23.31  24.16  26.72  27.62  29.72  30.84   1.93
    VAR   32.43  40.05  42.54  54.13  48.78  57.94  47.31

2.3 LACE

    CHARS     1      2      3      4      5      6      7      8
    MAX       5      8     12     15     18     21     24     28
    MIN       5      7      8     10     12     13     15     16
    MEAN   5.00   7.88  10.25  12.85  14.72  16.52  18.27  21.74
    VAR    0.00   0.11   3.94   5.90   8.39  13.91  15.86  30.83

    CHARS     9     10     11     12     13     14     15  SLOPE
    MAX      31     34     37     40     44     47     50   3.21
    MIN      18     20     21     23     24     26     28   1.61
    MEAN  23.58  26.55  27.99  31.01  32.29  34.83  35.74   2.23
    VAR   33.45  36.77  46.60  51.81  63.82  69.28  60.90

2.4 UTF6

    CHARS     1      2      3      4      5      6      7      8
    MAX       4      8     12     16     20     24     28     32
    MIN       4      6      6     10     13     14     16     18
    MEAN   4.00   7.86  10.64  13.86  16.18  19.20  21.45  25.79
    VAR    0.00   0.14   2.23   6.05  11.77  19.51  27.94  40.51

    CHARS     9     10     11     12     13     14     15  SLOPE
    MAX      36     40     44     48     52     56     60   4.00
    MIN      21     22     25     27     29     31     33   2.12
    MEAN  28.66  32.73  35.25  39.46  42.24  45.75  48.20   3.17
    VAR   53.18  66.86  84.41  98.58 120.62 141.40 170.42

2.5 DUDE

    CHARS     1      2      3      4      5      6      7      8
    MAX       4      8     12     16     20     24     28     32
    MIN       4      5      6      7      8      9     12     14
    MEAN   4.00   7.57   9.75  12.69  14.01  16.45  18.22  21.94
    VAR    0.00   0.57   3.82   7.93  10.31  15.55  18.58  29.66

    CHARS     9     10     11     12     13     14     15  SLOPE
    MAX      36     40     44     48     52     56     58   3.95
    MIN      15     17     19     20     23     25     28   1.70
    MEAN  23.55  26.67  28.15  30.96  32.38  34.71  35.91   2.29
    VAR   30.62  41.07  41.05  47.99  45.43  49.33  36.39

2.6 AMC-ACE-M

    CHARS     1      2      3      4      5      6      7      8
    MAX       4      8     12     15     18     21     24     27
    MIN       4      5      6      7      8     10     12     14
    MEAN   4.00   7.57   9.30  11.71  13.04  15.17  16.78  19.83
    VAR    0.00   0.58   3.05   5.47   6.69   9.36  11.06  16.02

    CHARS     9     10     11     12     13     14     15  SLOPE
    MAX      30     33     36     38     42     44     47   3.01
    MIN      15     17     18     20     22     23     26   1.58
    MEAN  21.40  24.06  25.64  28.08  29.49  31.68  33.08   2.05
    VAR   17.30  21.74  23.11  26.29  26.49  29.50  24.81

2.7 AltDUDE

    CHARS     1      2      3      4      5      6      7      8
    MAX       4      8     12     16     20     24     28     32
    MIN       4      5      6      7      8      9     12     14
    MEAN   4.00   7.57   9.75  12.69  14.01  16.45  18.22  21.94
    VAR    0.00   0.57   3.82   7.93  10.31  15.55  18.58  29.66

    CHARS     9     10     11     12     13     14     15  SLOPE
    MAX      36     40     44     48     52     56     58   3.95
    MIN      15     17     19     20     23     25     28   1.70
    MEAN  23.55  26.67  28.15  30.96  32.38  34.71  35.91   2.29
    VAR   30.62  41.07  41.05  47.99  45.43  49.33  36.39

2.8 AMC-ACE-O

    CHARS     1      2      3      4      5      6      7      8
    MAX       4      8     12     16     20     24     27     31
    MIN       4      5      6      7      8     10     12     14
    MEAN   4.00   7.58   9.62  12.33  13.59  15.81  17.46  20.94
    VAR    0.00   0.55   3.76   7.75   9.82  14.57  17.65  27.78

    CHARS     9     10     11     12     13     14     15  SLOPE
    MAX      35     39     42     46     50     53     57   3.77
    MIN      15     17     18     20     22     23     26   1.58
    MEAN  22.37  25.32  26.73  29.29  30.49  32.74  33.74   2.12
    VAR   28.73  37.75  37.60  43.39  40.41  44.92  36.04

2.9 AMC-ACE-R

    CHARS     1      2      3      4      5      6      7      8
    MAX       4      8     12     16     20     24     28     32
    MIN       4      5      6      7      8     10     12     14
    MEAN   4.00   7.58   9.79  12.74  14.28  16.69  18.49  22.28
    VAR    0.00   0.55   3.92   8.38  11.49  16.94  21.65  33.06

    CHARS     9     10     11     12     13     14     15  SLOPE
    MAX      36     40     44     48     52     56     59   3.98
    MIN      15     17     18     20     22     23     27   1.60
    MEAN  23.93  27.08  28.61  31.47  32.68  34.98  36.14   2.31
    VAR   36.33  45.99  48.35  57.62  54.90  60.44  48.34

2.10 AMC-ACE-V

    CHARS     1      2      3      4      5      6      7      8
    MAX       4      8     11     15     18     20     24     26
    MIN       4      6      7      8      9     12     13     15
    MEAN   4.00   6.89   9.10  11.63  13.41  15.64  17.52  20.47
    VAR    0.00   0.12   1.10   2.24   3.25   4.69   5.99   9.20

    CHARS     9     10     11     12     13     14     15  SLOPE
    MAX      29     32     36     38     41     44     46   2.98
    MIN      15     17     18     21     23     25     28   1.62
    MEAN  22.23  24.88  26.62  29.02  30.66  32.77  34.33   2.17
    VAR   10.35  13.30  14.15  17.08  16.36  17.98  14.11

2.11 AMC-ACE-W

    CHARS     1      2      3      4      5      6      7      8
    MAX       4      8     11     15     18     21     25     28
    MIN       4      6      7      8      9     11     13     14
    MEAN   4.00   6.89   9.07  11.60  13.32  15.52  17.37  20.37
    VAR    0.00   0.12   1.27   2.62   4.06   5.85   7.36  10.89

    CHARS     9     10     11     12     13     14     15  SLOPE
    MAX      32     33     36     38     42     44     46   3.01
    MIN      15     17     18     20     23     25     29   1.64
    MEAN  22.14  24.77  26.46  28.92  30.60  32.77  34.29   2.17
    VAR   12.42  15.74  16.64  19.59  19.37  21.07  16.14

2.12 MACE

    CHARS     1      2      3      4      5      6      7      8
    MAX       4      7     10     13     16     19     22     25
    MIN       4      6      7      8      9     11     13     15
    MEAN   4.00   6.98   9.43  11.97  13.86  16.08  17.96  20.85
    VAR    0.00   0.02   0.51   1.48   2.75   4.35   5.80   8.66

    CHARS     9     10     11     12     13     14     15  SLOPE
    MAX      28     31     34     37     40     43     46   3.00
    MIN      15     17     19     21     24     25     29   1.68
    MEAN  22.67  25.26  26.98  29.36  31.16  33.29  34.78   2.19
    VAR   10.16  13.20  13.97  16.65  17.02  18.12  13.97

2.13 Conversion Time

    Conversion time (in seconds) was measured by converting all of the
    sample data at once, on the same machine and under the same
    conditions (with no other load).
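
    A batch measurement of this kind might look like the sketch below.
    It is not the harness used for the table that follows: Python's
    built-in "punycode" codec again stands in for the ACEs, the names
    time_batch, to_ace and from_ace are hypothetical, and the sample
    labels are made up, so the absolute times are not comparable with
    the figures in the table.

```python
import time


def time_batch(convert, items):
    """Convert all items in one batch; return (seconds, results)."""
    start = time.perf_counter()
    results = [convert(item) for item in items]
    return time.perf_counter() - start, results


def to_ace(label: str) -> str:
    # Stand-in for an ACE encoder (assumption).
    return label.encode("punycode").decode("ascii")


def from_ace(ace: str) -> str:
    # Stand-in for the matching ACE decoder (assumption).
    return ace.encode("ascii").decode("punycode")


# Hypothetical sample, repeated to give the timer something to measure.
labels = ["日本", "東京", "日本語", "ドメイン"] * 1000
to_seconds, aces = time_batch(to_ace, labels)
from_seconds, decoded = time_batch(from_ace, aces)
print(f"To {to_seconds:.4f}  From {from_seconds:.4f}")
```

    Timing the whole batch rather than individual conversions keeps the
    timer overhead out of the measurement, and checking that the decoded
    labels equal the originals guards against timing a broken codec.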

               To    From
    AltDUDE    1.98  2.41
    DUDE       1.99  2.43
    MACE       2.03  2.58
    RACE       2.08  2.57
    AMC-ACE-W  2.08  2.60
    UTF-6      2.11  2.71
    LACE       2.12  2.89
    AMC-ACE-R  2.22  2.85
    BRACE      2.23  2.87
    AMC-ACE-M  2.89  3.47
    AMC-ACE-V  3.49  5.36
    AMC-ACE-O  4.86  5.42

3. Consideration about results

3.1 MAX

    The following graph compiles each ACE's MAX value for each CHARS.

  AMC-ACE-V|   1   2  3   4  5 6   7 8  9  A   B C  D  E F                2.98
       MACE|   1  2  3  4  5  6  7  8  9  A  B  C  D  E  F                3.00
  AMC-ACE-W|   1   2  3   4  5  6   7  8   9A  B C   D E F                3.01
  AMC-ACE-M|   1   2   3  4  5  6  7  8  9  A  B C   D E  F               3.01
      BRACE|   1  2  3   4  5  6  7  8   9  A  B  C  D   E  F             3.21
       LACE|    1  2   3  4  5  6  7   8  9  A  B  C   D  E  F            3.21
       RACE|    1  2   3  4  5  6  7   8  9  A  B  C   D  E  F            3.21
  AMC-ACE-O|   1   2   3   4   5   6  7   8   9   A  B   C   D  E   F     3.77
    AltDUDE|   1   2   3   4   5   6   7   8   9   A   B   C   D   E F    3.95
       DUDE|   1   2   3   4   5   6   7   8   9   A   B   C   D   E F    3.95
  AMC-ACE-R|   1   2   3   4   5   6   7   8   9   A   B   C   D   E  F   3.98
      UTF-6|   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F  4.00
           +----+----+----+----+----+----+----+----+----+----+----+----+
                5   10   15   20   25   30   35   40   45   50   55   60

    The X-axis is the resulting string length, the Y-axis is the ACE
    name, and each letter in the graph is the number of characters in
    the domain name.  A-F represent 10-15 respectively.  The rightmost
    decimal is the slope of the regression line.

3.2 MIN

    The following graph compiles each ACE's MIN value for each CHARS.

      BRACE|   12 345 67 89AB C DE F                                      1.40
  AMC-ACE-M|   12345 6 7 89 AB C DE  F                                    1.58
  AMC-ACE-O|   12345 6 7 89 AB C DE  F                                    1.58
       RACE|   12 34 5 67 89 A BC DE F                                    1.59
  AMC-ACE-R|   12345 6 7 89 AB C DE   F                                   1.60
       LACE|    1 23 4 56 78 9 AB CD E F                                  1.61
  AMC-ACE-V|   1 2345  67 9 AB  C D E  F                                  1.62
  AMC-ACE-W|   1 2345 6 789 AB C  D E   F                                 1.64
       MACE|   1 2345 6 7 9 A B C  DE   F                                 1.68
    AltDUDE|   123456  7 89 A BC  D E  F                                  1.70
       DUDE|   123456  7 89 A BC  D E  F                                  1.70
      UTF-6|   1 3   4  56 7 8  9A  B C D E F                             2.12
           +----+----+----+----+----+----+----+----+----+----+----+----+
                5   10   15   20   25   30   35   40   45   50   55   60

    The X-axis is the resulting string length, the Y-axis is the ACE
    name, and each letter in the graph is the number of characters in
    the domain name.  A-F represent 10-15 respectively.  The rightmost
    decimal is the slope of the regression line.

3.3 MEAN

    The following graph compiles each ACE's MEAN value for each CHARS.

      BRACE|   1 2 3  45 67  8 9  AB CD EF                                1.93
  AMC-ACE-M|   1  2 3 4 5 67  8 9  AB  CD E F                             2.05
  AMC-ACE-O|   1  2 3  45 6 7  8 9  AB  CD EF                             2.12
  AMC-ACE-W|   1 2  3 4 5 6 7  8 9 A B C D E F                            2.17
  AMC-ACE-V|   1 2  3 4 5 6 7  8 9 A B  CD E F                            2.17
       MACE|   1 2  3 4 5  67  8 9  AB  C D EF                            2.19
       LACE|    1 2  3 4 5 6 7  8 9  AB   CD EF                           2.23
    AltDUDE|   1  2 3  4 5 6 7  8 9  A B C D EF                           2.29
       DUDE|   1  2 3  4 5 6 7  8 9  A B C D EF                           2.29
  AMC-ACE-R|   1  2 3  4 5 6 7   89   AB  CD E F                          2.31
       RACE|   1  2 3  45  67   8 9   A B  C D  E F                       2.53
      UTF-6|   1  2  3  4  5  6 7   8  9   A  B   C  D  E  F              3.17
           +----+----+----+----+----+----+----+----+----+----+----+----+
                5   10   15   20   25   30   35   40   45   50   55   60

    The X-axis is the resulting string length, the Y-axis is the ACE
    name, and each letter in the graph is the number of characters in
    the domain name.  A-F represent 10-15 respectively.  The rightmost
    decimal is the slope of the regression line.

3.4 VAR

    The following graph compiles each ACE's VAR value for each CHARS.

       MACE|24 56 7  89  AF  DE
  AMC-ACE-V|2345 67  89  AF DCE
  AMC-ACE-W|23 45 67   89   FB DCE
  AMC-ACE-M|12 3 4 5 6 7    89    AB FD  E
  AMC-ACE-O|12  3   4 5    6  7         89      F B D  C E
    AltDUDE|12  3   4 5     6  7          89    F    B   D  CE
       DUDE|12  3   4 5     6  7          89    F    B   D  CE
      BRACE|2 3     45    6    7     8      9       A  B   F DCE
       LACE|2   3 4 5     6 7              8 9   A         B  CFE
  AMC-ACE-R|12  3   4  5     6    7          8  9         A F  E
       RACE|12    3     4   5   6       7            8        9ABC D E F
      UTF-6|2 3   4     5       6       7            8        9 AB C D E  F
           ++----+----+----+----+----+----+----+----+----+----+----+----+----
            0    5   10   15   20   25   30   35   40   45   50  100  150

    The X-axis is the variance, the Y-axis is the ACE name, and each
    letter in the graph is the number of characters in the domain name.
    A-F represent 10-15 respectively.  Note that above 50 the X-axis
    scale is 10 times coarser.

3.5 Consideration

    From the results for each ACE, the following can be said:

        - MIN (best case) is not significant.  It is small enough for
          every ACE.
        - Conversion time is significant.  The slowest ACE takes more
          than twice as long as the fastest.
        - MAX (worst case) is significant.  It determines the maximum
          number of characters that fit in a domain name label.  Many
          Japanese company and organization names exceed 15 characters.
          From the Japanese point of view, an ACE should allow more
          than 15 characters.
        - MEAN (average) is significant.  It reflects the efficiency of
          the ACE.
        - VAR (variance) is significant.  It also reflects the
          efficiency of the ACE.  When the variance is small, end users
          can easily estimate the length of the converted string.

    In other words, an effective ACE has a small regression-line slope
    for MAX and MEAN, converts quickly, and has a small VAR.

    Therefore, the following could serve as additional criteria for
    evaluating ACEs for a given language.

        1) The length of the resulting string (in octets) in the MAX
           (worst) case is shorter than with other ACEs.
        2) The length of the resulting string (in octets) in the MEAN
           (average) case is shorter than with other ACEs.
        3) Conversion takes fewer seconds than with other ACEs.
        4) The variance of the resulting string length for each number
           of characters is smaller than with other ACEs.
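
    How the four criteria are to be combined is not specified here.
    One hypothetical way, shown only as a sketch, is to keep the ACEs
    that fall in the better half of the field on every criterion.  The
    figures are copied from sections 2.1 to 2.13 and 3.1 to 3.3.  The
    names best_half and good_on_all, the better-half threshold, the use
    of the "To" time only, and the use of VAR at 15 characters to stand
    for criterion 4 are all assumptions made for this illustration.

```python
# Per ACE: (MAX slope, MEAN slope, "To" seconds, VAR at 15 characters).
RESULTS = {
    "RACE":      (3.21, 2.53, 2.08, 138.72),
    "BRACE":     (3.21, 1.93, 2.23, 47.31),
    "LACE":      (3.21, 2.23, 2.12, 60.90),
    "UTF-6":     (4.00, 3.17, 2.11, 170.42),
    "DUDE":      (3.95, 2.29, 1.99, 36.39),
    "AMC-ACE-M": (3.01, 2.05, 2.89, 24.81),
    "AltDUDE":   (3.95, 2.29, 1.98, 36.39),
    "AMC-ACE-O": (3.77, 2.12, 4.86, 36.04),
    "AMC-ACE-R": (3.98, 2.31, 2.22, 48.34),
    "AMC-ACE-V": (2.98, 2.17, 3.49, 14.11),
    "AMC-ACE-W": (3.01, 2.17, 2.08, 16.14),
    "MACE":      (3.00, 2.19, 2.03, 13.97),
}


def best_half(results, column):
    """ACEs whose value in this column is within the better half.

    Smaller is better for every column; ties at the cutoff are kept.
    """
    values = sorted(row[column] for row in results.values())
    cutoff = values[len(values) // 2 - 1]
    return {name for name, row in results.items() if row[column] <= cutoff}


def good_on_all(results):
    """ACEs that are in the better half on every criterion."""
    columns = range(len(next(iter(results.values()))))
    return set.intersection(*(best_half(results, c) for c in columns))


print(sorted(good_on_all(RESULTS)))
```

    With these particular assumptions the selection is AMC-ACE-W and
    MACE.  A different threshold or a different summary of VAR could
    give a different selection.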


4. Conclusion

    For Japanese Domain Names, by the criteria described above, MACE or
    AMC-ACE-W is preferable.


5. References

    [JPCHAR]   "Japanese characters in multilingual domain name labels",
               draft-ietf-idn-jpchar-01.txt, Mar 2001, Y Yoneya, Y Morishita
    [RFC1035]  "DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION",
               RFC1035, Nov 1987, P. Mockapetris
    [NAMEPREP] "Preparation of Internationalized Host Names",
               draft-ietf-idn-nameprep-03.txt, Feb 2001, P Hoffman, M Blanchet
    [RACE]     "RACE: Row-based ASCII Compatible Encoding for IDN",
               draft-ietf-idn-race-03.txt, Nov 2000, P Hoffman
    [BRACE]    "BRACE: Bi-mode Row-based ASCII-Compatible Encoding for IDN
               version 0.1.2",
               draft-ietf-idn-brace-00.txt, Sep 2000, A Costello
    [LACE]     "LACE: Length-based ASCII Compatible Encoding for IDN",
               draft-ietf-idn-lace-01.txt, Jan 2001, M Davis, P Hoffman
    [UTF6]     "UTF-6 - Yet Another ASCII-Compatible Encoding for IDN",
               draft-ietf-idn-utf6-00.txt, Nov 2000, M Welter, B Spolarich
    [DUDE]     "Differential Unicode Domain Encoding (DUDE)",
               draft-ietf-idn-dude-02.txt, Jun 2001, M Welter, B Spolarich,
               A Costello
    [AMCACEM]  "AMC-ACE-M version 0.1.0",
               draft-ietf-idn-amc-ace-m-00.txt, Feb 2001, A Costello
    [AltDUDE]  "AltDUDE version 0.0.2",
               draft-ietf-idn-altdude-00.txt, Mar 2001, A Costello
    [AMCACEO]  "AMC-ACE-O version 0.0.3",
               draft-ietf-idn-amc-ace-o-00.txt, Mar 2001, A Costello
    [AMCACER]  "AMC-ACE-R version 0.2.1",
               draft-ietf-idn-amc-ace-r-01.txt, May 2001, A Costello
    [AMCACEV]  "AMC-ACE-V version 0.1.0",
               draft-ietf-idn-amc-ace-v-00.txt, May 2001, A Costello
    [AMCACEW]  "AMC-ACE-W version 0.1.0",
               draft-ietf-idn-amc-ace-w-00.txt, May 2001, A Costello
    [MACE]     "MACE: Modal ASCII Compatible Encoding for IDN",
               draft-ietf-idn-mace-00.txt, Jun 2001, M Ishisone, Y Yoneya
    [MDNKIT]   "Multilingual Domain Name tool Kit",
               http://www.nic.ad.jp/jp/research/idn/mdnkit/download/

6. Acknowledgements

    Japan Registry Service Co., Ltd. provided actual registered Japanese
    Domain Names.
    The JPNIC mDNkit development team swiftly provided me with an
    implementation of each new ACE when it was published as an
    Internet-Draft.
    Tomoyuki Hasei created the fundamental tools and data for the
    evaluation.
    JPNIC IDN-TF members gave me a lot of advice.

7. Author's Address

    Yoshiro Yoneya
    Japan Network Information Center
    Fuundo Bldg 1F, 1-2 Kanda-ogawamachi
    Chiyoda-ku Tokyo 101-0052, Japan
    yone@nic.ad.jp