Mapping Characters in IDNA

Approval announcement
Draft of message to be sent after approval:


From: The IESG <>
To: IETF-Announce <>
Cc: Internet Architecture Board <>,
    RFC Editor <>, 
    idnabis mailing list <>, 
    idnabis chair <>
Subject: Document Action: 'Mapping Characters in IDNA' to Informational

The IESG has approved the following document:

- 'Mapping Characters in IDNA'
   <draft-ietf-idnabis-mappings-04.txt> as an Informational RFC

This document is the product of the Internationalized Domain Names in
Applications, Revised Working Group. 

The IESG contact persons are Lisa Dusseault and Alexey Melnikov.

A URL of this Internet-Draft is:

Ballot Text

Technical Summary

   Relevant content can frequently be found in the abstract
   and/or introduction of the document.  If not, this may be 
   an indication that there are deficiencies in the abstract
   or introduction.

Working Group Summary

   Was there anything in the WG process that is worth noting?
   For example, was there controversy about particular points 
   or were there decisions where the consensus was
   particularly rough? 

Document Quality

   Are there existing implementations of the protocol?  Have a 
   significant number of vendors indicated their plan to
   implement the specification?  Are there any reviewers that
   merit special mention as having done a thorough review,
   e.g., one that resulted in important changes or a
   conclusion that the document had no substantive issues?  If
   there was a MIB Doctor, Media Type, or other Expert Review,
   what was its course (briefly)?  In the case of a Media Type
   Review, on what date was the request posted?


   Who is the Document Shepherd for this document?  Who is the 
   Responsible Area Director?  If the document requires IANA
   expert(s), insert 'The IANA Expert(s) for the registries
   in this document are <TO BE ADDED BY THE AD>.'

RFC Editor Note

  (Insert RFC Editor Note here or remove section)


  (Insert IRTF Note here or remove section)


The IESG is currently considering an IESG note.  An early draft note:

> 1.1 Notes on the Consensus on this document

> The IDNABIS Working Group (WG) was responsible for this document for
> most of its development.  A full and complete consensus was not reached
> due to some conflicting desires.  While this document represents the
> majority opinion of the WG, some additional notes on the disagreements
> might be useful to implementers and readers of other specifications such
> as Unicode UTS46.

> When IDNA2003 was developed, it introduced the idea of mapping some
> Unicode characters into others. The argument was made then that having a
> commonly adopted mapping scheme was a good thing.
> A different view is that mapping is not a good idea. This view arises in
> part from a strong sense that the notion of a “canonical” domain label
> form is an important contributor to stability for internationalized
> domain names.

> Some generally shared principles:

> 1.  Mapping user input should not be recommended during
> registration processes or employed in all lookup use cases. It should be

> 2.  Mapping of user input is required in a few lookup use cases, to
> simulate the case-insensitivity of ASCII input to DNS servers.

> 3.  If client software maps user input to a valid IDNA label, there
> is significant benefit to having a consistent set of mappings
> implemented.  In other words, it's confusing or worse if the same input
> resolves to one domain in one client or in one locale, and might resolve
> to a different domain, or not resolve at all, in a different client or
> different locale.

> Despite these shared principles, there are still two differing
> approaches to choosing how extensive the set of recommended mappings
> should be.

> Some of the WG preferred that mappings be designed for the long-term,
> choosing only those that were the most necessary,
> and with the principle of least-surprise for the user providing input. 
> Here is some elaboration on how "least surprise" was modeled:

>   - A user would be less surprised to see an accented capital letter
>     mapped to the corresponding accented lower case letter, than to see
>     it fail, due to the ASCII case-insensitivity of DNS.
>   - A user would be less surprised to see a character mapped to the
>     normal width character, than to see it fail, because these are
>     really seen as the same character.
>   - A user that typed a character using combining accents would be less
>     surprised to see that mapped to the canonical form, than to see the
>     lookup fail because the combining marks were not in canonical order.
>   - A user would be less surprised to see a superscript or circled
>     character fail, than to see it mapped, because these would only be
>     typed in if for some reason the user really expected the special
>     character to be different than the mapped non-superscript,
>     non-circled character.
>   - Some thought was given to avoiding mappings that might not be
>     appropriate for all time and for all users and locales.
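
The "least surprise" cases above can be illustrated with a minimal sketch. It assumes the mapping is approximated by three steps from Python's standard library: lowercasing, replacing fullwidth and halfwidth characters with their normal-width forms, and NFC normalization. The function name `map_label` is hypothetical, and the sketch is not the normative algorithm of the document under ballot.

```python
import unicodedata

def map_label(label: str) -> str:
    """Illustrative only: lowercase, fold width variants, then apply NFC."""
    out = []
    for ch in label.lower():
        decomp = unicodedata.decomposition(ch)
        # Only <wide> and <narrow> compatibility decompositions are folded;
        # other tags (<super>, <circle>, ...) are deliberately left alone.
        if decomp.startswith("<wide>") or decomp.startswith("<narrow>"):
            ch = "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
        out.append(ch)
    # Canonical composition also puts combining marks into canonical order.
    return unicodedata.normalize("NFC", "".join(out))

if __name__ == "__main__":
    samples = [
        "\u00c9",            # accented capital letter
        "\uff21\uff22\uff23",  # fullwidth Latin capitals
        "e\u0301",           # base letter plus combining acute accent
        "x\u00b2",           # superscript two
    ]
    for s in samples:
        print(repr(s), "->", repr(map_label(s)))
```

Under these assumptions a superscript or circled digit passes through unchanged, so a later validity check (not shown here) would be the step that rejects it rather than silently mapping it.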

> In contrast to the "least-surprise" model, some of the WG preferred
> that mappings be designed for maximum backwards-compatibility with
> documents produced while IDNA2003 was the Proposed Standard for IDNs.
> This principle led to a list of mappings derived from those done in
> IDNA2003.  This principle may be important for browsers, which must deal
> with a vast corpus of documents with no oversight over how labels are
> used in hyperlinks, and for search engines, which must know how browsers
> resolve links in order to create search indexes over the same corpus.

> The preference for a single set of consistent mappings could not be
> reconciled with the difference between the long-term "least surprise"
> mappings, and the transitional, backwards-compatible mappings.  The
> difference between the two approaches lies in several thousand
> characters that were required to be mapped in IDNA2003 but were deemed
> to risk user surprise or confusion, or simply not be necessary.
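
To make the gap between the two approaches concrete, the following sketch classifies characters by the tag on their Unicode compatibility decomposition. IDNA2003's Nameprep profile normalized with NFKC, which folds every tagged compatibility decomposition; the narrower approach described here keeps only the width-related ones. The helper names `compat_tag` and `is_width_variant` are hypothetical and used for illustration only.

```python
import unicodedata

def compat_tag(ch: str):
    """Return the compatibility-decomposition tag of ch (e.g. 'wide'), or None."""
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith("<"):
        return decomp[1:decomp.index(">")]
    return None  # no decomposition, or a canonical (untagged) one

def is_width_variant(ch: str) -> bool:
    """True if ch differs from its decomposition only by width."""
    return compat_tag(ch) in ("wide", "narrow")

if __name__ == "__main__":
    # Fullwidth A, superscript two, circled A, fi ligature.
    for ch in ["\uff21", "\u00b2", "\u24b6", "\ufb01"]:
        print(
            "U+%04X" % ord(ch),
            "tag:", compat_tag(ch),
            "NFKC:", repr(unicodedata.normalize("NFKC", ch)),
            "width variant:", is_width_variant(ch),
        )
```

Characters for which NFKC produces a different string but `is_width_variant` is false are examples of the kind of mapping that only the backwards-compatible approach would retain.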

> This specification recommends the long-term least-surprise
> mappings.  The algorithm in this specification results in
> about 15,000 characters mapped to simulate case insensitivity, 200
> characters mapped for width variability, and a further 1100 mappings
> that transform character compositions to canonical NFC character
> compositions.  These mappings remain operational in expected future
> versions of Unicode.

>    NOTE: Implementers concerned with backwards compatibility will
>    look at UTS46, particularly if their application must deal with
>    content created with IDNA2003 and not updated during the time period
>    in which registries and content creators transition to IDNA2008.
>    UTS46 prioritizes backwards compatibility and dealing with labels in
>    existing documents, rather than minimizing surprise for user input.
>    Finally, implementers might also consider how registry transition
>    strategies affect backwards-compatibility strategies; those
>    transition strategies are ultimately in the hands of registries and
>    zone operators, but general principles are being discussed in venues
>    such as ICANN.


  (Insert IANA Note here or remove section)
