Internet Draft                                   Paul Hoffman, Editor
draft-ietf-idn-ace-report-00.txt
June 14, 2001
Expires in six months

                 Report of the IDN ACE Design Team

Status of this memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.


Abstract

This document is a summary of the work of the ACE design team of the
Internationalized Domain Name (IDN) Working Group. If the IDN WG selects
a single ACE, the design team suggests DUDE. There are many factors that
the IDN WG might consider that may lead it to choose a different ACE;
those factors and proposals that some of the design team members favored
are also described in this document.


1. Introduction

The chairs of the IDN WG appointed an ACE design team to study the many
ACE proposals that had come to the working group and to make a
recommendation based on that study. The design team consisted of Adam
Costello, Paul Hoffman, Makoto Ishisone, David Lawrence, Brian
Spolarich, and Rick Wesson. There were three advisors: Marc Blanchet,
Patrik Faltstrom, and Erik Nordmark.

The design team evaluated the large number of ACEs that have been
proposed in the IDN WG. In comparing them, we looked primarily at two
factors:

- how easy they are to understand and implement

- whether they would restrict long names that are likely to be used

Our discussions showed that neither factor was particularly easy to
measure. Because both were hard to measure, it was also difficult to
decide how to weigh them against each other.


2. Recommendation

Based on the two factors, the design team recommends to the IDN WG that
it pick the DUDE algorithm as the ACE to be used in its protocol. There
was general agreement that DUDE was fairly easy to implement
(particularly with the design changes starting with the -02 draft) and
did not restrict long names that were likely to be used in domain names.

There was not complete agreement in the design team on recommending
DUDE. Members disagreed about how much less complex DUDE was
compared to the other proposed ACEs. In addition, some members felt that
the price of higher complexity in other proposed ACEs was worth the
greater compression that they give.

The design team chose the DUDE algorithm after the release of the -02
draft, which has some significant design changes from earlier drafts.
Even among the DUDE supporters, there was not universal acclaim. Some
felt that the discussion of "mixed-case annotation" should be removed,
but were willing to recommend the protocol anyway and ask the IDN WG to
remove that optional part of the protocol later.

It is important to note that the ACEs we considered most strongly do not
provide for special treatment of any particular script or language. The
design team members felt that there was no way to provide for such
handling that would not dramatically increase the complexity of the
protocol, and the apparent benefits in efficiency were relatively
modest. All the algorithms provide for relatively efficient treatment of
all scripts, and do not impose unreasonable limitations on label size
for users of particular scripts; the variation for particular scripts is
small in the proposed ACEs.


3. Weighing the Design Goals

3.1 Complexity

It is very difficult to analyze how complex an algorithm is. The
proposed ACE algorithms had different types of complexity and were
therefore difficult to compare accurately. For example, it is not clear
how to compare the complexity of a two-pass algorithm such as RACE or
LACE with that of a one-pass algorithm that uses binary arithmetic, such
as DUDE. It was pointed out that other algorithms that are quite complex
have been implemented well on the Internet.

It is not clear how important complexity is in the long run. One
argument says that most applications that use ACE will use an ACE
conversion toolkit supplied by an outside source, and there is likely to
be only a small number of such toolkits. An opposing argument is that,
even if that is true in most cases, there still have to be dozens if not
hundreds of toolkits for the various platforms on which IDN will be
supported. Further, many companies insist on writing all their own
software, even when it is complex (the IPsec market is a good example of
this).

3.2 Restrictions on long names that are likely to be used

The IDN WG had earlier agreed with the statement that the purpose of
compression is not to reduce the number of octets on the wire, but to
allow longer sensible name parts within the 63-octet limit.
Unfortunately, it is impossible to determine how long "sensible name
parts" would be in various scripts and languages. Some of what makes a
name part sensible is its usefulness in non-computer environments such
as on billboards, business cards, and radio commercials. Stringing
together many words is common in most languages, but it reduces the
reproducibility of a name.

The other side of this argument is that the domain name system requires
every name at a particular level of the name hierarchy to be unique. It
is quite common to see English names in the .com zone that clearly are
not the first choice of the companies or people who got them, most
likely because the desired (shorter) name was already taken. Because of
name exhaustion and the currently tightly restricted choice of TLDs,
sensible names are longer than they might be with more TLDs available.

After asking many language experts, some of the people on the design
team came to the conclusion that 15 characters for Han-based languages
and 30 characters for alphabetic-based languages would put very few
restrictions on names that would reasonably be expected to be used. Of
course, any limit can be viewed as too restrictive, even the
63-character limit for current names. For example, the name:

computerengineeringdepartmentatuniversityofcaliforniasantabarbara

makes linguistic sense, but is unlikely to be used because it runs
together too many words, and would be unwieldy to type.
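
As a quick check of the example above, the following Python sketch
compares the length of that name with the 63-octet limit on a name
part. The constant and function names are illustrative assumptions and
do not come from any IDN draft.

```python
# Illustrative only: the names below are hypothetical, not from any IDN draft.
MAX_LABEL_OCTETS = 63  # limit on a single name part, as cited in the text


def fits_in_label(name: str) -> bool:
    """Return True if an all-ASCII name part fits within the label limit."""
    return len(name.encode("ascii")) <= MAX_LABEL_OCTETS


example = "computerengineeringdepartmentatuniversityofcaliforniasantabarbara"
print(len(example), fits_in_label(example))
```

For ASCII names, one character is one octet, so the character count
and the octet count are the same.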


4. Analysis of the ACEs

The ACE drafts considered by the design team are listed here.
Note that these are not long-term documents and are therefore
not listed in the references section of this document.

4.1 All ACE proposals

draft-ietf-idn-altdude-00.txt -- AltDUDE. Withdrawn by author.

draft-ietf-idn-amc-ace-*-00.txt -- A series of one-step encodings with
varying degrees of complexity and compression.

draft-ietf-idn-brace-00.txt -- BRACE: Bi-mode Row-based ASCII-Compatible
Encoding for IDN. Withdrawn by author.

draft-ietf-idn-dude-02.txt -- Differential Unicode Domain Encoding
(DUDE). Uses a one-step encoding that uses the binary XOR of successive
characters, encoded with Base32.

draft-ietf-idn-dunce-00.txt -- DUNCE: A proposal for a Definitely
Unencumbered New Compatible [ACE] Encoding. Specifies multiple different
ways to encode strings directly but does not say how to make the
encoding unique. Also, does not specify a compression mechanism.

draft-ietf-idn-lace-01.txt -- LACE: Length-based ASCII Compatible
Encoding for IDN. Uses a two-step encoding: first compress (using a
simple run-length encoding algorithm), then use Base32 on the compressed
string.

draft-ietf-idn-race-03.txt -- RACE: Row-based ASCII Compatible Encoding
for IDN. This document expired.

draft-ietf-idn-sace-01.txt -- Simple ASCII Compatible Encoding (SACE).
This document expired.

draft-ietf-idn-step-00.txt -- StepCode - A User Access Oriented IDN
Encoding. Denotes Chinese characters with their phonetic elements. It
does not apply to other languages or scripts and is not based on the
ISO/IEC 10646 character repertoire.

draft-ietf-idn-utf6-00.txt -- UTF-6 - Yet Another ASCII-Compatible
Encoding for IDN. This document expired.

draft-ietf-idn-vidn-01.txt -- Virtually Internationalized Domain Names
(VIDN). Uses phonetic transliteration to create ACEs. Many problems
affecting many languages were pointed out on the WG mailing list. The
proposal is at least partially covered by a patent.

A draft on MACE, Modal ASCII-Compatible Encoding, is expected to be
published soon. The design team considered a preliminary version of the
encoding described by Makoto Ishisone.
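
The differential idea in the DUDE entry above (the XOR of successive
characters) can be illustrated with a short Python sketch. This is not
the DUDE algorithm itself: the function names are hypothetical, the
starting value of 0 is an assumption, and the Base32 step is omitted.
It only shows why characters from the same row tend to produce small
values that can be encoded compactly.

```python
# A sketch of XOR differencing only. NOT the DUDE specification:
# the start value and function names are assumptions for illustration.

def xor_deltas(label: str, start: int = 0) -> list:
    """XOR each code point with the previous one (first one with `start`)."""
    deltas = []
    prev = start
    for ch in label:
        cp = ord(ch)
        deltas.append(prev ^ cp)
        prev = cp
    return deltas


def undo_xor_deltas(deltas: list, start: int = 0) -> str:
    """Invert xor_deltas by accumulating the XOR values."""
    chars = []
    prev = start
    for d in deltas:
        prev ^= d
        chars.append(chr(prev))
    return "".join(chars)


# Three Greek letters from one row: after the first, the deltas are small.
print([hex(d) for d in xor_deltas("\u03b1\u03b2\u03b3")])
# Two Han characters from different rows: the delta stays large.
print([hex(d) for d in xor_deltas("\u65e5\u672c")])
```

Because XOR is its own inverse, the decoder can recover the original
string from the deltas without any extra state beyond the previous
character.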

4.2 Primary choices

The design team focused on three classes of ACE: LACE, DUDE, and the AMC
series. The ACEs had different levels of complexity and different
amounts of compression for mixes of one-row and multi-row input.

The following table summarizes the maximum length for an input string
for two cases: the entire string is a typical mix from one row (such as
a single-row script), and the entire string is in Han, which usually is
a mix of widely-divergent rows. Other comparisons are possible, of
course; you might compare how well each ACE does for primarily Latin
names (which use a mix from two rows), or names that are mostly
non-ASCII characters but use an occasional ASCII character such as a
dash.

                All one row, typical      Han, typical
             Equation         Max       Equation     Max
DUDE         1.5n             39        3.8n         15
AMC-W        1.5n             39        1+3n         19
AMC-V        1.5n             39        1+3n         19
LACE         3.2+1.6n         34        1.6+3.2n     17
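
The maximums in the table above can be reproduced from the equations
with a short Python sketch. It assumes that n is the number of input
characters, that each equation gives the encoded length in ASCII
characters, and that 4 of the 63 octets are taken by an ACE prefix.
The prefix length is not stated in this report; it is an assumption
chosen because it yields the figures in the table. All names in the
code are illustrative.

```python
import math

MAX_LABEL = 63          # octet limit on one name part
ASSUMED_PREFIX_LEN = 4  # assumption: ACE prefix length, not given in this report

# (fixed overhead, cost per input character), copied from the table.
EQUATIONS = {
    "DUDE":  {"one_row": (0.0, 1.5), "han": (0.0, 3.8)},
    "AMC-W": {"one_row": (0.0, 1.5), "han": (1.0, 3.0)},
    "AMC-V": {"one_row": (0.0, 1.5), "han": (1.0, 3.0)},
    "LACE":  {"one_row": (3.2, 1.6), "han": (1.6, 3.2)},
}


def max_input_chars(overhead, per_char, budget=MAX_LABEL - ASSUMED_PREFIX_LEN):
    """Largest n such that overhead + per_char * n fits in the budget."""
    return math.floor((budget - overhead) / per_char)


for name, eq in EQUATIONS.items():
    print(name, max_input_chars(*eq["one_row"]), max_input_chars(*eq["han"]))
```

Since the equations describe typical input, the results are estimates
and not hard limits for any particular name.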

Two observations come out of this:

- All of the proposals give 34 or more characters for one row typical.
Except for strung-together names and some very long German or Thai
nouns, that is probably sufficient for most typical names.

- All of the proposals give 15 or more characters for Han typical.
Again, that is probably fine for the vast majority of names, even
those with a few sub-names strung together.

Although LACE allows names with two more Han characters than DUDE, the
authors of LACE feel that the two-step process is indeed more
complicated and that the gain does not warrant its use over DUDE.

When compared to DUDE, AMC-W and AMC-V get four more Han characters
with no loss of one-row characters. However, they are both more
complicated than DUDE. The members of the group disagreed as to how much
more complicated they were, with one group saying that they were "much
more complicated" and another group saying "only a little more
complicated".


5. Security Considerations

The design team did not perform security reviews on the ACE candidates.
A cursory review was done to see whether every Unicode string could
result in only one ACE string, and every ACE string could result in zero
or one Unicode strings. It is assumed that the authors of each ACE
proposal did more intense testing for the one-to-one correspondence.
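
The kind of one-to-one check described above can be sketched in Python
as follows. The encoding used here is a toy stand-in (Base32 of
UTF-16BE, lowercase, no padding) and is not any of the proposed ACEs;
the function names are hypothetical. The decoder enforces the "zero or
one Unicode strings" property by re-encoding its result and rejecting
any input that is not the canonical form.

```python
import base64

# Toy stand-in for an ACE, for illustration only. None of the proposed
# ACEs works this way; the names below are hypothetical.


def toy_encode(label: str) -> str:
    """Map a Unicode string to exactly one ASCII string."""
    raw = label.encode("utf-16-be")
    return base64.b32encode(raw).decode("ascii").rstrip("=").lower()


def toy_decode(ace: str):
    """Return the single Unicode string for `ace`, or None if there is none."""
    padded = ace.upper() + "=" * (-len(ace) % 8)
    try:
        label = base64.b32decode(padded).decode("utf-16-be")
    except ValueError:  # covers bad Base32 input and bad UTF-16 data
        return None
    # Reject non-canonical spellings so that no two ASCII strings
    # decode to the same Unicode string.
    return label if toy_encode(label) == ace else None


def round_trips(label: str) -> bool:
    """Check that encoding and then decoding gives back the same string."""
    return toy_decode(toy_encode(label)) == label


print(all(round_trips(s) for s in ["example", "b\u00fccher", "\u65e5\u672c\u8a9e"]))
```

A fuller review would run such a check over a much larger set of
inputs, including malformed ACE strings, for each candidate encoding.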


6. References

References to particular ACE implementations are not given here because
none are currently RFCs and it is assumed that only one (or a small
number) will eventually reach RFC status.


7. Editor Contact Information

Paul Hoffman
Internet Mail Consortium and VPN Consortium
127 Segre Place
Santa Cruz, CA  95060 USA
paul.hoffman@imc.org and paul.hoffman@vpnc.org