IAB Thoughts on Encodings for Internationalized Domain Names
RFC 6055

Document Type RFC - Informational (February 2011; Errata)
Updates RFC 2130
Last updated 2015-10-14
Stream IAB
Internet Architecture Board (IAB)                              D. Thaler
Request for Comments: 6055                                     Microsoft
Updates: 2130                                                 J. Klensin
Category: Informational
ISSN: 2070-1721                                              S. Cheshire
                                                                   Apple
                                                           February 2011

      IAB Thoughts on Encodings for Internationalized Domain Names

Abstract

   This document explores issues with Internationalized Domain Names
   (IDNs) that result from the use of various encoding schemes such as
   UTF-8 and the ASCII-Compatible Encoding produced by the Punycode
   algorithm.  It focuses on the importance of agreeing on a single
   encoding and on the complications that arise from the different
   encodings in use today.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for informational purposes.

   This document is a product of the Internet Architecture Board (IAB)
   and represents information that the IAB has deemed valuable to
   provide for permanent record.  Documents approved for publication by
   the IAB are not a candidate for any level of Internet Standard; see
   Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc6055.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Thaler, et al.                Informational                     [Page 1]
RFC 6055                      IDN Encodings                February 2011

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  2
     1.1.  APIs . . . . . . . . . . . . . . . . . . . . . . . . . . .  8
   2.  Use of Non-DNS Protocols . . . . . . . . . . . . . . . . . . .  9
   3.  Use of Non-ASCII in DNS  . . . . . . . . . . . . . . . . . . . 10
     3.1.  Examples . . . . . . . . . . . . . . . . . . . . . . . . . 14
   4.  Recommendations  . . . . . . . . . . . . . . . . . . . . . . . 16
   5.  Security Considerations  . . . . . . . . . . . . . . . . . . . 18
   6.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 19
   7.  IAB Members at the Time of Approval  . . . . . . . . . . . . . 19
   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 20
     8.1.  Normative References . . . . . . . . . . . . . . . . . . . 20
     8.2.  Informative References . . . . . . . . . . . . . . . . . . 20

1.  Introduction

   The goal of this document is to explore what can be learned from some
   current difficulties in implementing Internationalized Domain Names
   (IDNs).

   A domain name consists of a sequence of labels, conventionally
   written separated by dots.  An IDN is a domain name that contains one
   or more labels that, in turn, contain one or more non-ASCII
   characters.  Just as with plain ASCII domain names, each IDN label
   must be encoded using some mechanism before it can be transmitted in
   network packets, stored in memory, stored on disk, etc.  These
   encodings need to be reversible, but they need not store domain names
   the same way humans conventionally write them on paper.  For example,
   when transmitted over the network in DNS packets, domain name labels
   are *not* separated with dots.
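   To make the last point concrete, the following is a minimal Python
   sketch of the uncompressed DNS wire format for a name, as defined in
   RFC 1035.  Each label is sent as a one-octet length followed by the
   label's octets, and the name ends with a zero-length root label, so
   no dot characters appear on the wire.  The function names
   to_wire_format and from_wire_format are hypothetical.  The sketch
   omits name compression and the 255-octet limit on a whole name, and
   it accepts only ASCII labels.  It deliberately does not decide how a
   non-ASCII label should be turned into octets, which is the question
   the rest of this document addresses.

```python
def to_wire_format(name: str) -> bytes:
    """Encode a dotted domain name as an uncompressed DNS wire-format name.

    Each label becomes a one-octet length followed by the label's octets,
    and the name is terminated by the zero-length root label.  This sketch
    accepts only ASCII labels and does not enforce the 255-octet name limit.
    """
    out = bytearray()
    for label in name.rstrip(".").split("."):
        data = label.encode("ascii")  # ASCII-only simplification
        if not 1 <= len(data) <= 63:  # DNS labels are 1..63 octets
            raise ValueError(f"invalid label length {len(data)}: {label!r}")
        out.append(len(data))         # length prefix instead of a dot
        out += data
    out.append(0)                     # zero-length root label ends the name
    return bytes(out)


def from_wire_format(wire: bytes) -> str:
    """Reverse to_wire_format, restoring the conventional dotted form."""
    labels = []
    i = 0
    while wire[i] != 0:               # stop at the root label
        length = wire[i]
        labels.append(wire[i + 1 : i + 1 + length].decode("ascii"))
        i += 1 + length
    return ".".join(labels)


wire = to_wire_format("www.example.com")
print(wire)                    # b'\x03www\x07example\x03com\x00'
print(from_wire_format(wire))  # www.example.com
```

   The round trip shows that the encoding is reversible, even though the
   octets on the wire do not resemble the name as humans write it.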

   Internationalized Domain Names for Applications (IDNA), discussed
   later in this document, is the standard that defines the use and
   coding of internationalized domain names for use on the public
   Internet [RFC5890].  An earlier version of IDNA [RFC3490] is now
   being phased out.  Except where noted, the two versions are
   approximately the same with regard to the issues discussed in this
   document.  However, some explanations that appeared in the earlier
   documents were no longer considered useful when the later revision
   was created; those explanations are quoted here from the documents
   in which they appeared.  In addition, the terminology of the two
   versions differs somewhat; this document reflects the terminology of
   the current version.
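   As an illustration of why the choice of encoding matters, the
   following minimal Python sketch encodes one label in two ways: as raw
   UTF-8 octets, and as the ASCII-Compatible Encoding (ACE) form.  The
   ACE form is the "xn--" prefix followed by the Punycode output.  It
   assumes Python's built-in "idna" codec, which implements the earlier
   IDNA version [RFC3490] rather than the current one [RFC5890].  For
   the simple lowercase label used here, both versions produce the same
   ACE form.  The "punycode" codec applies the bare Punycode algorithm
   without the prefix.

```python
# One label, encoded two different ways.
label = "bücher"

utf8_form = label.encode("utf-8")  # raw UTF-8 octets
ace_form = label.encode("idna")    # ACE form via Python's IDNA2003 codec

print(utf8_form)  # b'b\xc3\xbccher'
print(ace_form)   # b'xn--bcher-kva'

# The ACE form is the "xn--" prefix plus the bare Punycode output.
assert ace_form == b"xn--" + label.encode("punycode")

# Both encodings are reversible...
assert utf8_form.decode("utf-8") == label
assert ace_form.decode("idna") == label

# ...but their octets differ.  Two systems that store or look up the
# same name using different encodings will therefore fail to match.
print(utf8_form == ace_form)  # False
```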

   Unicode [Unicode] is a list of characters (including non-spacing
   marks that are used to form some other characters), where each
   character is assigned an integer value, called a code point.  In