IAB Thoughts on Encodings for Internationalized Domain Names
RFC 6055
Document type: RFC - Informational (February 2011; Errata)
Updates: RFC 2130
Was: draft-iab-idn-encoding (iab)
Authors: Stuart Cheshire, John Klensin, Dave Thaler
Last updated: 2015-10-14
Stream: IAB
Internet Architecture Board (IAB)                              D. Thaler
Request for Comments: 6055                                     Microsoft
Updates: 2130                                                 J. Klensin
Category: Informational
ISSN: 2070-1721                                              S. Cheshire
                                                                   Apple
                                                           February 2011


      IAB Thoughts on Encodings for Internationalized Domain Names

Abstract

This document explores issues with Internationalized Domain Names (IDNs) that result from the use of various encoding schemes such as UTF-8 and the ASCII-Compatible Encoding produced by the Punycode algorithm. It focuses on the importance of agreeing on a single encoding and how complicated the state of affairs ends up being as a result of using different encodings today.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Architecture Board (IAB) and represents information that the IAB has deemed valuable to provide for permanent record. Documents approved for publication by the IAB are not a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6055.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Thaler, et al.               Informational                      [Page 1]

RFC 6055                     IDN Encodings                 February 2011

Table of Contents

   1. Introduction
      1.1. APIs
   2. Use of Non-DNS Protocols
   3. Use of Non-ASCII in DNS
      3.1. Examples
   4. Recommendations
   5. Security Considerations
   6. Acknowledgements
   7. IAB Members at the Time of Approval
   8. References
      8.1. Normative References
      8.2. Informative References

1. Introduction

The goal of this document is to explore what can be learned from some current difficulties in implementing Internationalized Domain Names (IDNs).

A domain name consists of a sequence of labels, conventionally written separated by dots. An IDN is a domain name that contains one or more labels that, in turn, contain one or more non-ASCII characters. Just as with plain ASCII domain names, each IDN label must be encoded using some mechanism before it can be transmitted in network packets, stored in memory, stored on disk, etc. These encodings need to be reversible, but they need not store domain names the same way humans conventionally write them on paper. For example, when transmitted over the network in DNS packets, domain name labels are *not* separated with dots.

Internationalized Domain Names for Applications (IDNA), discussed later in this document, is the standard that defines the use and coding of internationalized domain names for use on the public Internet [RFC5890]. An earlier version of IDNA [RFC3490] is now being phased out. Except where noted, the two versions are approximately the same with regard to the issues discussed in this document. However, some explanations appeared in the earlier documents that were no longer considered useful when the later revision was created; they are quoted here from the documents in which they appear.
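
As an informal illustration of the encoding points above (this sketch is not part of RFC 6055), the following Python 3.9+ code encodes one name, the hypothetical "bücher.example", in two ways: as raw UTF-8 labels, and as ASCII-Compatible Encoding labels produced by Python's built-in `idna` codec. Note that this standard-library codec implements the earlier IDNA version [RFC3490], not the current one [RFC5890]. Each label list is then serialized into DNS wire format, where every label is preceded by a length octet and the name ends with a zero-length root label, so no dots are transmitted. The helper names (`utf8_labels`, `ace_labels`, `to_wire`, `from_wire`) are hypothetical, and `from_wire` assumes well-formed input without compression pointers.

```python
# Illustrative sketch only: one domain name under two label encodings.
# Assumes Python 3.9+ and the standard library. The built-in "idna" codec
# implements the earlier IDNA version (RFC 3490), not RFC 5890.
# All function names here are hypothetical helpers, not a standard API.


def utf8_labels(name: str) -> list[bytes]:
    """Encode each dot-separated label as raw UTF-8 octets."""
    return [label.encode("utf-8") for label in name.split(".")]


def ace_labels(name: str) -> list[bytes]:
    """Encode each label with the stdlib "idna" codec (ACE, "xn--" prefix)."""
    return [label.encode("idna") for label in name.split(".")]


def to_wire(labels: list[bytes]) -> bytes:
    """Serialize labels in DNS wire format: a length octet before each
    label and a zero octet (the root label) at the end; no dots are sent.
    The 255-octet limit on the whole name is not checked here."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError(f"label length {len(label)} is outside 1..63")
        out.append(len(label))
        out += label
    out.append(0)
    return bytes(out)


def from_wire(wire: bytes) -> list[bytes]:
    """Split wire format back into labels. Assumes well-formed input
    with no compression pointers."""
    labels, i = [], 0
    while wire[i] != 0:
        length = wire[i]
        labels.append(wire[i + 1 : i + 1 + length])
        i += 1 + length
    return labels


if __name__ == "__main__":
    name = "bücher.example"  # hypothetical example name
    utf8_wire = to_wire(utf8_labels(name))
    ace_wire = to_wire(ace_labels(name))
    print("UTF-8:", utf8_wire)
    print("ACE:  ", ace_wire)
    print("Same octets on the wire?", utf8_wire == ace_wire)  # False
    # Both forms are reversible; decode the ACE labels back to text.
    print(".".join(label.decode("idna") for label in from_wire(ace_wire)))
```

The two wire forms differ even though they stand for the same name, which is one way to see why parties that have not agreed on a single encoding can fail to match each other's names. Decoding the ACE labels with the same codec recovers the original text, consistent with the requirement that these encodings be reversible.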
In addition, the terminology of the two versions differs somewhat; this document reflects the terminology of the current version.

Unicode [Unicode] is a list of characters (including non-spacing marks that are used to form some other characters), where each character is assigned an integer value, called a code point. In