
Mapping Characters in IDNA
draft-ietf-idnabis-mappings-05

Approval announcement
Draft of message to be sent after approval:

Announcement

From: The IESG <iesg-secretary@ietf.org>
To: IETF-Announce <ietf-announce@ietf.org>
Cc: Internet Architecture Board <iab@iab.org>,
    RFC Editor <rfc-editor@rfc-editor.org>, 
    idnabis mailing list <idna-update@alvestrand.no>, 
    idnabis chair <idnabis-chairs@tools.ietf.org>
Subject: Document Action: 'Mapping Characters in IDNA' to Informational RFC

The IESG has approved the following document:

- 'Mapping Characters in IDNA'
   <draft-ietf-idnabis-mappings-04.txt> as an Informational RFC


This document is the product of the Internationalized Domain Names in
Applications, Revised Working Group. 

The IESG contact persons are Lisa Dusseault and Alexey Melnikov.

A URL of this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-ietf-idnabis-mappings-04.txt

Ballot Text

Technical Summary

   Relevant content can frequently be found in the abstract
   and/or introduction of the document.  If not, this may be 
   an indication that there are deficiencies in the abstract
   or introduction.

Working Group Summary

   Was there anything in the WG process that is worth noting?
   For example, was there controversy about particular points 
   or were there decisions where the consensus was
   particularly rough? 

Document Quality

   Are there existing implementations of the protocol?  Have a 
   significant number of vendors indicated their plan to
   implement the specification?  Are there any reviewers that
   merit special mention as having done a thorough review,
   e.g., one that resulted in important changes or a
   conclusion that the document had no substantive issues?  If
   there was a MIB Doctor, Media Type, or other Expert Review,
   what was its course (briefly)?  In the case of a Media Type
   Review, on what date was the request posted?

Personnel

   Who is the Document Shepherd for this document?  Who is the 
   Responsible Area Director?  If the document requires IANA
   expert(s), insert 'The IANA Expert(s) for the registries
   in this document are <TO BE ADDED BY THE AD>.'

RFC Editor Note

  (Insert RFC Editor Note here or remove section)

IRTF Note

  (Insert IRTF Note here or remove section)

IESG Note

The IESG is currently considering an IESG note.  An early draft note:

> 1.1 Notes on the Consensus on this document

> The IDNABIS Working Group (WG) was responsible for this document
> throughout most of its development.  A full and complete consensus
> was not achievable due to some conflicting desires.  While this
> document represents the majority opinion of the WG, some additional
> notes on the disagreements might be useful to implementers and
> readers of other specifications such as Unicode UTS46.

> When IDNA2003 was developed, it introduced the idea of mapping some
> Unicode characters into others.  The argument was made then that
> having a commonly adopted mapping scheme was a good thing.  A
> different view is that mapping is not a good idea.  This view arises
> in part from a strong sense that the notion of a “canonical” domain
> label form is an important contributor to stability for
> internationalized domain names.

> Some generally shared principles:

> 1.  Mapping user input should not be recommended during
> registration processes or employed in all lookup use cases.  It
> should be optional.

> 2.  Mapping of user input is required in a few lookup use cases, to
> simulate the case-insensitivity of ASCII input to DNS servers.

> 3.  If client software maps user input to a valid IDNA label, there
> is significant benefit to having a consistent set of mappings
> implemented.  In other words, it's confusing or worse if the same
> input resolves to one domain in one client or in one locale, and
> might resolve to a different domain, or not resolve at all, in a
> different client or different locale.

> Despite these shared principles, there are still two differing
> approaches to choosing how extensive the set of recommended mappings
> should be.

> Some of the WG preferred that mappings be designed for the long-term,
> choosing only those that were the most necessary,
> and with the principle of least-surprise for the user providing input. 
> Here is some elaboration on how "least surprise" was modeled:

>   - A user would be less surprised to see an accented capital letter
>     mapped to the corresponding accented lower case letter, than to
>     see it fail, due to the ASCII case-insensitivity of DNS.
>   - A user would be less surprised to see a fullwidth or halfwidth
>     character mapped to the equivalent normal width character, than
>     to see it fail, because these are really seen as the same
>     character.
>   - A user that typed a character using combining accents would be
>     less surprised to see that mapped to the canonical form, than to
>     see the lookup fail because the combining marks were not in
>     canonical order.
>   - A user would be less surprised to see a superscript or circled
>     character fail, than to see it mapped, because these would only
>     be typed in if for some reason the user really expected the
>     special character to be different than the mapped
>     non-superscript, non-circled character.
>   - Some thought was given to avoiding mappings that might not be
>     appropriate for all time and for all users and locales.

> In contrast to the "least-surprise" model, some of the WG preferred
> that mappings be designed for maximum backwards-compatibility with
> documents produced while IDNA2003 was the Proposed Standard for
> IDNs.  This principle led to a list of mappings derived from those
> done in IDNA2003.  This principle may be most important for
> browsers, which must deal with a vast corpus of documents with no
> oversight over how labels are used in hyperlinks, and for search
> engines, which must know how browsers resolve links in order to
> create search indexes over the same corpus.

> The preference for a single set of consistent mappings could not be
> reconciled with the difference between the long-term "least
> surprise" mappings, and the transitional, backwards-compatible
> mappings.  The difference between the two approaches lies in several
> thousand characters that were required to be mapped in IDNA2003 but
> were deemed to risk user surprise or confusion or simply not be
> necessary.

> This specification recommends the long-term least-surprise
> mappings.  The algorithm in this specification results in about
> 15,000 characters mapped to simulate case insensitivity, 200
> characters mapped for width variability, and a further 1100 mappings
> that transform non-canonical character compositions to canonical NFC
> character compositions.  These mappings remain operational in
> expected future versions of Unicode.

>    NOTE: Implementers concerned with backwards compatibility will
>    undoubtedly look at UTS46, particularly if their application must
>    deal with content created with IDNA2003 and not updated during the
>    time period in which registries and content creators transition to
>    IDNA2008.  UTS46 prioritizes backwards-compatibility and dealing
>    with labels in existing documents, rather than minimizing surprise
>    for user input.  Finally, implementers might also consider how
>    registry transition strategies affect backwards-compatibility
>    strategies -- those transition strategies are ultimately in the
>    hands of registries and zone operators, but general principles are
>    being discussed in venues such as ICANN.
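
The three classes of "least-surprise" mapping named in the draft note
(case, width, and canonical composition) can be illustrated with a
minimal sketch.  This is not the algorithm text of the specification:
the function names map_width and map_label are hypothetical, and using
Python's str.lower() and the <wide>/<narrow> decomposition tags from
the unicodedata module as stand-ins for the case and width steps is an
assumption made for illustration only.

```python
import unicodedata


def map_width(ch: str) -> str:
    """Map a fullwidth or halfwidth character to its normal-width form.

    Hypothetical helper: only <wide> and <narrow> compatibility
    decompositions are applied.  Other tags such as <super> or
    <circle> are deliberately left alone, so those characters are
    not mapped.
    """
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith(("<wide>", "<narrow>")):
        code_points = decomp.split()[1:]  # drop the leading tag
        return "".join(chr(int(cp, 16)) for cp in code_points)
    return ch


def map_label(label: str) -> str:
    """Apply case, width, and NFC mapping to one label (a sketch)."""
    # Step 1 (assumed): simulate case insensitivity by lowercasing.
    lowered = label.lower()
    # Step 2 (assumed): replace fullwidth/halfwidth characters.
    normal_width = "".join(map_width(ch) for ch in lowered)
    # Step 3: canonical composition; combining marks are reordered
    # and composed, but compatibility characters are not replaced.
    return unicodedata.normalize("NFC", normal_width)


if __name__ == "__main__":
    for sample in ["\u00c9COLE", "\uff21\uff22\uff23", "cafe\u0301", "x\u00b2"]:
        print(repr(sample), "->", repr(map_label(sample)))
```

Under these assumptions an accented capital, a fullwidth letter, and a
decomposed accent are all mapped, while the superscript two in the last
sample is returned unchanged, so a later validity check (not shown)
would be free to reject it rather than silently map it.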


IANA Note

  (Insert IANA Note here or remove section)
