Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)
RFC 3492

Document Type RFC - Proposed Standard (March 2003; Errata)
Updated by RFC 5891
Last updated 2013-03-02
Stream IETF
IESG state RFC 3492 (Proposed Standard)
Responsible AD Erik Nordmark
IESG note Updated drafts are available. Need to be reviewed to see
if the IESG comments have been addressed.
Send notices to <jseng@pobox.org.sg>, <Marc.Blanchet@viagenie.qc.ca>
Network Working Group                                        A. Costello
Request for Comments: 3492                 Univ. of California, Berkeley
Category: Standards Track                                     March 2003

              Punycode: A Bootstring encoding of Unicode
       for Internationalized Domain Names in Applications (IDNA)

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

Abstract

   Punycode is a simple and efficient transfer encoding syntax designed
   for use with Internationalized Domain Names in Applications (IDNA).
   It uniquely and reversibly transforms a Unicode string into an ASCII
   string.  ASCII characters in the Unicode string are represented
   literally, and non-ASCII characters are represented by ASCII
   characters that are allowed in host name labels (letters, digits, and
   hyphens).  This document defines a general algorithm called
   Bootstring that allows a string of basic code points to uniquely
   represent any string of code points drawn from a larger set.
   Punycode is an instance of Bootstring that uses particular parameter
   values specified by this document, appropriate for IDNA.

Table of Contents

   1. Introduction...............................................2
       1.1 Features..............................................2
       1.2 Interaction of protocol parts.........................3
   2. Terminology................................................3
   3. Bootstring description.....................................4
       3.1 Basic code point segregation..........................4
       3.2 Insertion unsort coding...............................4
       3.3 Generalized variable-length integers..................5
       3.4 Bias adaptation.......................................7
   4. Bootstring parameters......................................8
   5. Parameter values for Punycode..............................8
   6. Bootstring algorithms......................................9

Costello                    Standards Track                     [Page 1]
RFC 3492                     IDNA Punycode                    March 2003

       6.1 Bias adaptation function.............................10
       6.2 Decoding procedure...................................11
       6.3 Encoding procedure...................................12
       6.4 Overflow handling....................................13
   7. Punycode examples.........................................14
       7.1 Sample strings.......................................14
       7.2 Decoding traces......................................17
       7.3 Encoding traces......................................19
   8. Security Considerations...................................20
   9. References................................................21
       9.1 Normative References.................................21
       9.2 Informative References...............................21
   A. Mixed-case annotation.....................................22
   B. Disclaimer and license....................................22
   C. Punycode sample implementation............................23
   Author's Address.............................................34
   Full Copyright Statement.....................................35

1. Introduction

   [IDNA] describes an architecture for supporting internationalized
   domain names.  Labels containing non-ASCII characters can be
   represented by ACE labels, which begin with a special ACE prefix and
   contain only ASCII characters.  The remainder of the label after the
   prefix is a Punycode encoding of a Unicode string satisfying certain
   constraints.  For the details of the prefix and constraints, see
   [IDNA] and [NAMEPREP].

   Punycode is an instance of a more general algorithm called
   Bootstring, which allows strings composed from a small set of "basic"
   code points to uniquely represent any string of code points drawn
   from a larger set.  Punycode is Bootstring with particular parameter
   values appropriate for IDNA.
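
   As an informal illustration of the relationship described above, the
   following sketch encodes one label with Punycode and then forms the
   corresponding ACE label.  It is not part of this specification.  It
   assumes Python 3 and its built-in "punycode" and "idna" codecs; the
   helper names to_punycode and from_punycode are hypothetical, and the
   ACE prefix "xn--" shown in the output is defined by [IDNA], not by
   this document.

```python
# Illustrative sketch only. It relies on Python's built-in "punycode" and
# "idna" codecs (an assumption of this example, not something the
# specification defines). Helper names are hypothetical.


def to_punycode(text: str) -> str:
    """Encode a Unicode string as Punycode, without any ACE prefix."""
    return text.encode("punycode").decode("ascii")


def from_punycode(encoded: str) -> str:
    """Decode a Punycode string (without ACE prefix) back to Unicode."""
    return encoded.encode("ascii").decode("punycode")


label = "b\u00fccher"            # "bücher": one non-ASCII code point (U+00FC)

encoded = to_punycode(label)
print(encoded)                   # bcher-kva

# The original string is recovered from the encoded form.
print(from_punycode(encoded) == label)   # True

# The "idna" codec applies the preparation steps of [IDNA] and prepends
# the ACE prefix to the Punycode output.
ace = label.encode("idna").decode("ascii")
print(ace)                       # xn--bcher-kva
```

   In the encoded form, the ASCII characters of the input ("bcher")
   appear literally before the last hyphen, and the characters after it
   ("kva") carry the information needed to reinsert the non-ASCII code
   point.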

1.1 Features

   Bootstring has been designed to have the following features:

   *  Completeness:  Every extended string (sequence of arbitrary code
      points) can be represented by a basic string (sequence of basic
      code points).  Restrictions on what strings are allowed, and on
      length, can be imposed by higher layers.

   *  Uniqueness:  There is at most one basic string that represents a
      given extended string.

   *  Reversibility:  Any extended string mapped to a basic string can
      be recovered from that basic string.