Network Working Group                                        A. Sullivan
Internet-Draft                                                 Dyn, Inc.
Intended status: Informational                                 D. Thaler
Expires: December 14, 2012                                     Microsoft
                                                              J. Klensin

                                                              O. Kolkman
                                                              NLnet Labs
                                                           June 12, 2012

    Principles for Unicode Code Point Inclusion in Labels in the DNS


   IDNA makes available to DNS zone administrators a very wide range of
   Unicode code points.  Most operators of zones should probably not
   permit registration of U-labels using the entire range.  This is
   especially true of zones that accept registrations from outside
   parties, including (and most importantly) the root.  It is
   unfortunately not possible to generate algorithms to determine
   whether a code point is intrinsically safe to permit.  This memo
   presents a set of principles that can be used to guide the decision
   of whether a Unicode code point may be wisely included in the
   repertoire of permissible code points in a U-label in a zone.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 14, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Sullivan, et al.        Expires December 14, 2012               [Page 1]

Internet-Draft       DNS Zone Code Point Principles            June 2012

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   ( in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1.  Terminology  . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Background . . . . . . . . . . . . . . . . . . . . . . . . . .  4
     2.1.  Less-restrictive Rules Going Down the DNS Tree . . . . . .  5
   3.  Principles applicable to all zones . . . . . . . . . . . . . .  6
     3.1.  Longevity Principle  . . . . . . . . . . . . . . . . . . .  6
     3.2.  Usability Principle  . . . . . . . . . . . . . . . . . . .  6
   4.  Principles applicable to all public zones  . . . . . . . . . .  6
     4.1.  Conservatism Principle . . . . . . . . . . . . . . . . . .  6
     4.2.  Inclusion Principle  . . . . . . . . . . . . . . . . . . .  7
     4.3.  Simplicity Principle . . . . . . . . . . . . . . . . . . .  7
     4.4.  Predictability Principle . . . . . . . . . . . . . . . . .  7
     4.5.  Stability Principle  . . . . . . . . . . . . . . . . . . .  8
   5.  Principle specific to the root zone  . . . . . . . . . . . . .  8
     5.1.  Letter Principle . . . . . . . . . . . . . . . . . . . . .  8
   6.  Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . .  8
   7.  Security Considerations  . . . . . . . . . . . . . . . . . . .  9
   8.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . .  9
   9.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . .  9
   10. Informative References . . . . . . . . . . . . . . . . . . . .  9
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10

1.  Introduction

   Operators of a DNS zone need to set policies around what Unicode code
   points are allowed in labels in that zone.  Typically there are a
   number of important goals to consider when constructing such
   policies.  These include, for instance, avoiding possible visual
   confusability between two labels, avoiding possible confusion between
   Fully-Qualified Domain Names (FQDNs) and IP address literals, and
   addressing other usability and accessibility issues.

   This document provides a set of principles that zone operators can
   use to construct their code point policies in order to improve
   usability and clarity and thereby reduce confusion.

1.1.  Terminology

   This document uses the following terms.

      LDH Label: a string consisting of ASCII letters, digits, and the
      hyphen, with additional restrictions as explained in Section 2.3.1
      of [RFC5890].

      A-label: an LDH label that starts with "xn--" and meets all the
      IDNA requirements, with additional restrictions as explained in
      Section of [RFC5890].

      U-label: a string of Unicode characters that meets all the IDNA
      requirements and includes at least one non-ASCII character, with
      additional restrictions as explained in Section of

      Public zone: in this document, a DNS zone that accepts
      registration requests from organizations outside the zone
      administrator's own organization.  (Whether the zone performs
      delegation is a separate question.  What is important is the
      diversity of the registration-requesting community.)  Note that
      under this definition, the root zone is a public zone, though one
      that has a unique function in the DNS.

      Character: a member of a set of elements used for the
      organization, control, or representation of data.  See Section 2
      of [RFC6365] for more details.

      Language: a way that humans communicate.  The use of language
      occurs in many forms, the most common of which are speech,
      writing, and signing.  See Section 2 of [RFC6365] for more
      details.

      Rendering: the display of a string of text.  See Section 5 of
      [RFC6365] for more details.

      Script: a set of graphic characters used for the written form of
      one or more languages.  See Section 2 of [RFC6365] for more
      details.

      Writing system: a set of rules for using one or more scripts to
      write a particular language.  See Section 2 of [RFC6365] for more
      details.
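
   As a rough illustration of how the A-label and U-label forms of one
   label relate, the sketch below converts a Unicode label to its "xn--"
   form and back using the Punycode codec in the Python standard
   library.  The helper names are hypothetical, and the code performs
   none of the IDNA2008 validity checks, so the result is an A-label
   only if the input was already a valid U-label.

```python
ACE_PREFIX = "xn--"


def to_ace_form(label):
    """Punycode-encode one label and prepend the ACE prefix.

    Assumption: the caller has already checked that `label` is a valid
    U-label.  No IDNA validation is performed here.
    """
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace_form(label):
    """Strip the ACE prefix and Punycode-decode the remainder."""
    if not label.lower().startswith(ACE_PREFIX):
        raise ValueError("label does not start with the ACE prefix")
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


if __name__ == "__main__":
    u_form = "b\u00fccher"          # "bucher" with U+00FC in place of "u"
    a_form = to_ace_form(u_form)
    print(a_form)                   # xn--bcher-kva
    print(from_ace_form(a_form) == u_form)
```

   A production registry would rely on a full IDNA2008 implementation
   rather than on the bare codec used here.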

   This memo does not propose a protocol standard, and the use of words
   such as "should" follows the ordinary English meaning, and not that
   laid out in [RFC2119].

2.  Background

   In recent communications ([IABCOMM1] and [IABCOMM2]), the IAB has
   emphasized the importance of conservatism in allocating labels
   conforming to IDNA2008 ([RFC5890], [RFC5891], [RFC5892], [RFC5893],
   [RFC5894], [RFC5895]) in DNS zones, and especially in the root zone.
   Traditional LDH-labels in the root zone used only alphabetic
   characters (i.e., ASCII a-z or A-Z).  Matters are more complicated
   with U-labels, however.  The IAB communications recommended that
   U-labels permit only code points with a General_Category (gc) of Ll
   (Lowercase_Letter), Lo (Other_Letter), or Lm (Modifier_Letter), but
   noted that for practical considerations other code points might be
   permitted on a case-by-case basis.

   The IAB recommendation does, however, leave some issues open that
   need to be addressed.  First, it is by no means clear that all of the
   code points with General_Category Lo or Lm and which are permitted
   under IDNA2008 are appropriate for a zone such as the root zone.  To
   take but one example, the code point U+02BC MODIFIER LETTER
   APOSTROPHE has a General_Category of Lm.  In practically every
   rendering (and we are unaware of an exception), U+02BC is
   indistinguishable from U+2019 RIGHT SINGLE QUOTATION MARK, which has
   a General_Category of Pf (Final_Punctuation).  U+02BC will also be
   read by large numbers of people as being the same character as U+0027
   APOSTROPHE, which has a General_Category of Po (Other_Punctuation).
   U+02BC is PROTOCOL VALID (PVALID) under IDNA2008 (see [RFC5892]),
   whereas both other code points are DISALLOWED.  So, to begin with, it
   is plain that not every code point with a General_Category of Ll, Lo,
   or Lm is consistent with the type of conservatism principle discussed
   in Section 4.1 or the IAB recommendation.
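
   The category values cited above can be checked against whatever
   Unicode database is at hand.  The following is a minimal sketch using
   the "unicodedata" module from the Python standard library; the helper
   name is hypothetical, and the output reflects the Unicode version
   bundled with the interpreter rather than any particular IDNA table.

```python
import unicodedata

# The three look-alike code points discussed in the text.
CANDIDATES = ["\u02bc", "\u2019", "\u0027"]


def describe(ch):
    """Return (U+XXXX, Unicode name, General_Category) for one character."""
    return ("U+%04X" % ord(ch), unicodedata.name(ch), unicodedata.category(ch))


if __name__ == "__main__":
    for ch in CANDIDATES:
        print(describe(ch))
```

   Only the first of the three reports a letter category (Lm), even
   though all three are typically rendered alike.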

   To make matters worse, some languages are dependent on code points
   with General_Category Mc (Spacing_Mark) or General_Category Mn
   (Nonspacing_Mark).  This dependency is particularly common in Indic
   languages, though not exclusive to them.  (At the risk of vastly
   oversimplifying, the overarching issue is mostly the interaction of
   complex writing systems and the way Unicode works.)  To restrict
   users of those languages only to code points with General_Category of
   Ll, Lo, or Lm would be extremely limiting.  While DNS labels are not
   words, or sentences, or phrases (as noted in [RFC4690]), they are
   intended to support useful mnemonics.  Mnemonics that diverge wildly
   from the usual conventions in a language are likely to attract strong
   objections, particularly in the root.  The objections might drag the
   discussion away from sound management of the DNS zone in question and
   towards discussions of cultural hegemony.  That sort of discussion
   itself might present risks for the operation of the zone.
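
   To make the dependency on combining marks concrete, the sketch below
   takes the word "Hindi" written in Devanagari and shows what remains
   when only code points of General_Category Ll, Lo, or Lm are kept.
   The word is chosen purely as an illustration, and the function names
   are hypothetical.

```python
import unicodedata

LETTER_CATEGORIES = {"Ll", "Lo", "Lm"}

# "Hindi" in Devanagari, written as escapes so each code point is explicit:
# HA, VOWEL SIGN I, NA, VIRAMA, DA, VOWEL SIGN II.
WORD = "\u0939\u093f\u0928\u094d\u0926\u0940"


def categories(text):
    """List the General_Category of every code point in `text`."""
    return [unicodedata.category(ch) for ch in text]


def letters_only(text):
    """Drop every code point whose category is not Ll, Lo, or Lm."""
    return "".join(
        ch for ch in text if unicodedata.category(ch) in LETTER_CATEGORIES
    )


if __name__ == "__main__":
    print(categories(WORD))
    print(len(WORD), "code points before,", len(letters_only(WORD)), "after")
```

   Half of the code points in this short word are marks (Mc or Mn), so a
   letters-only rule would leave a string that no longer spells the
   word.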

   Many of the issues above turn out to be relevant to all public zones.
   Moreover, the overall issue of developing a policy for code point
   permission is common to all zones that accept A-labels or U-labels
   for registration.  As section 4.2.4 of [RFC5891] says, every registry
   at every level of the DNS is "expected to establish policies about
   label registrations."

   For reasons of sound management, it is not desirable to decide
   whether to permit a given code point only when an application
   containing that code point is pending.  That approach reduces
   predictability and is bound to appear subject to special pleas.  It
   is better instead to come up with the rules governing acceptance of
   code points in advance.

   As is evident from the foregoing discussion about the Letter and Mark
   categories, it is simply not possible to make code point decisions
   algorithmically.  If it were possible to develop such an algorithm,
   it would already exist: the DNS is hardly unique in needing to impose
   restrictions on code points while accommodating many different
   linguistic communities.  Rules can nevertheless be made sensibly by
   using overarching principles to guide the rule-making.  These
   principles function as meta-rules, guiding the establishment of rules
   for inclusion of any code point (from those permitted by IDNA) in
   labels in a given zone.

2.1.  Less-restrictive Rules Going Down the DNS Tree

   A set of principles derived from the above ideas follows in Section 3
   through Section 5 below.  Such principles fall into three categories.
   Some principles apply to every DNS zone.  Some additional principles
   apply to all public zones, including the root zone.  Finally, other
   principles apply only to the root zone.  This means that zones higher
   in the DNS tree tend to have more restrictive rules (since additional
   principles apply), and zones lower in the DNS tree tend to have less
   restrictive rules, since they are used within a more narrow context.
   In general, the relevant context for a principle is that of the zone,
   not that of a given subset of the user community; for the root zone,
   for example, the context is "the entire Internet population".

3.  Principles applicable to all zones

3.1.  Longevity Principle

   Unicode properties of a code point ought to be stable across the
   versions of Unicode that users of the zone are likely to have
   installed.  Because it is possible for the properties of a code point
   to change between Unicode versions, a good way to predict such
   stability is to ensure that a code point has in fact been stable for
   multiple successive versions of Unicode.  This principle is related
   to the Stability Principle in Section 4.5.

   The more diverse the community using the zone, the greater the
   importance of following this principle.  The policy for a leaf zone
   in the DNS might only require stability across two Unicode versions,
   whereas a more public zone might require stability across four or more
   releases before the code point's properties are considered long-lived
   and stable.
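
   One way a zone operator might apply this test is sketched below.  It
   assumes the operator keeps, for each candidate code point, a record
   of the properties it had in successive Unicode versions; the record
   format, the function name, and the sample history are all
   hypothetical and are not taken from any real Unicode data.

```python
def is_long_lived(history, required_versions):
    """Report whether the newest `required_versions` records all agree.

    `history` is a list of (unicode_version, properties) pairs, ordered
    oldest first.  A code point with too short a history is treated as
    not yet long-lived.
    """
    if required_versions < 1 or len(history) < required_versions:
        return False
    recent = [props for _version, props in history[-required_versions:]]
    return all(props == recent[0] for props in recent)


# Hypothetical history for an imaginary code point whose General_Category
# changed once; the version labels "A".."D" are placeholders.
EXAMPLE_HISTORY = [
    ("A", {"gc": "So"}),
    ("B", {"gc": "Lo"}),
    ("C", {"gc": "Lo"}),
    ("D", {"gc": "Lo"}),
]

if __name__ == "__main__":
    print(is_long_lived(EXAMPLE_HISTORY, 2))   # leaf-zone threshold
    print(is_long_lived(EXAMPLE_HISTORY, 4))   # more public zone threshold
```

   With the sample history, the code point passes a two-version
   threshold but fails a four-version one, which mirrors the difference
   between a leaf zone and a more public zone described above.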

3.2.  Usability Principle

   Every zone administrator should be sensitive to the potential use of
   a code point to be permitted, and should be as cautious as practical
   in permitting code points in the zone.  The Conservatism Principle
   outlined in Section 4.1 is in general a good idea, though less
   conservatism may be appropriate in the case of well-delineated and
   less-diverse user communities.  Zone administrators should especially
   consider whether a candidate code point could be used maliciously or
   could present difficulty if the code point is encountered outside the
   usual linguistic circumstances.

4.  Principles applicable to all public zones

4.1.  Conservatism Principle

   Public zones are, by definition, zones that are shared by different
   groups of people.  Therefore, any decision to permit a code point in
   a public zone (including the root) should be as conservative as
   practicable.  Doubts should always be resolved in favor of rejecting
   a code point for inclusion rather than in favor of including it, in
   order to minimize risk.

4.2.  Inclusion Principle

   Just as IDNA2008 starts from the principle that the Unicode range is
   excluded, and then adds code points according to derived properties
   of the code points, so a public zone should only permit inclusion of
   a code point if it is known to be "safe" in terms of usability and
   confusability within the context of that zone.  The default treatment
   of a code point should be that it is excluded.
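
   In implementation terms, default exclusion amounts to an explicit
   allow-list: a label is acceptable only if every one of its code
   points has been affirmatively added to the zone's repertoire.  The
   sketch below illustrates that shape; the function names and the
   sample repertoire are hypothetical, and the check is only one of
   several a registry would apply.

```python
def rejected_code_points(label, repertoire):
    """Return the code points in `label` that the repertoire lacks."""
    return ["U+%04X" % ord(ch) for ch in label if ord(ch) not in repertoire]


def label_permitted(label, repertoire):
    """Default-deny: permit only non-empty labels drawn from the repertoire."""
    return len(label) > 0 and not rejected_code_points(label, repertoire)


# Hypothetical repertoire for an imaginary zone: ASCII a-z plus
# U+00E4, U+00F6, and U+00FC.  Anything not listed is excluded.
EXAMPLE_REPERTOIRE = frozenset(range(ord("a"), ord("z") + 1)) | {
    0x00E4,
    0x00F6,
    0x00FC,
}

if __name__ == "__main__":
    for candidate in ["b\u00fccher", "caf\u00e9"]:
        print(
            ascii(candidate),
            label_permitted(candidate, EXAMPLE_REPERTOIRE),
            rejected_code_points(candidate, EXAMPLE_REPERTOIRE),
        )
```

   The important property is that a code point nobody has reviewed is
   rejected automatically, rather than being accepted until someone
   objects.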

4.3.  Simplicity Principle

   The rules for determining whether a code point is to be included
   should be simple enough that they are readily understood by someone
   with a moderate background in the DNS and Unicode issues.  This
   principle does not mean that a completely naive person needs to be
   able to understand the rationale for why a code point is included,
   but it does mean that the reason for inclusion of very peculiar code
   points, even if the code points are safe in themselves, will be too
   difficult to understand and such code points will therefore be
   rejected.

   The meaning of "simple" or "readily understood" is context-dependent.
   For instance, the root zone has to serve everyone in the world; for
   practical purposes, this means that the reasons for including a code
   point need to be comprehensible even to people who cannot use the
   script where the code point is found.  In a zone that permits a
   constrained subset of Unicode characters (for instance, only those
   needed to write a single alphabetic language) and that supports a
   clearly-delineated linguistic community (for instance, the speakers
   of a single language with well-understood written conventions), more
   complicated rules might be acceptable.  Compare this principle with
   the Usability Principle in Section 3.2.

4.4.  Predictability Principle

   The rules for determining whether a code point is to be included
   should be predictable enough that those with the requisite
   understanding of DNS, IDNA, and Unicode will usually reach the same
   conclusion.  This is not a requirement for algorithmic treatment of
   code points; as previously noted, that is not possible.  It is rather
   to say that the consistent application of professional judgment is
   likely to yield the same results; combined with the principle in
   Section 4.1, when results are not predictable the anomalous code
   point would not be permitted.

   Just as in Section 4.3, this principle tends to cause more
   restriction the more diverse the community using the zone; it is most
   restrictive for the root zone.  This is because what is predictable
   within a given language community is possibly very surprising across
   languages.

4.5.  Stability Principle

   Once a code point is permitted, it is at least very hard to stop
   permitting that code point.  In public zones (including the root),
   the list of code points to be permitted should change very slowly, if
   at all, and usually only in the direction of permitting an addition
   as time and experience indicate that inclusion of such a code point
   is both safe and consistent with these principles.

5.  Principle specific to the root zone

5.1.  Letter Principle

   In keeping with the spirit of the note in [RFC1123] that top-level
   labels "will be alphabetic", U-labels in the root zone should exclude
   code points that are not normally used to write words, or that are in
   some cases normally used for purposes other than writing words.  This
   is not the same as using Unicode's General_Category to include only
   letters.  It is a restriction that expands the possible class of
   included code points beyond the Unicode letters, but only expands so
   far as to include the things that are normally used the way letters
   are.  Under this principle, code points with (for example)
   General_Category Mn (Nonspacing_Mark) might be included -- but only
   those that are used to write words and not (for instance) musical
   symbols.  This principle should be applied as narrowly as possible;
   as [RFC4690] says, "While DNS labels may conveniently be used to
   express words in many circumstances, the goal is not to express words
   (or sentences or phrases), but to permit the creation of unambiguous
   labels with good mnemonic value."
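
   The reason General_Category alone cannot implement this principle
   can be seen by comparing two nonspacing marks.  The sketch below
   uses the Python standard library to look up one mark used in writing
   Devanagari words and one used in musical notation; the choice of
   these two code points is ours, made only for illustration.

```python
import unicodedata

# Two code points that share General_Category Mn but serve different uses.
WORD_MARK = "\u094d"        # a Devanagari sign
MUSIC_MARK = "\U0001d167"   # a combining musical symbol


def category_and_name(ch):
    """Return (General_Category, Unicode name) for one character."""
    return (unicodedata.category(ch), unicodedata.name(ch))


if __name__ == "__main__":
    for mark in (WORD_MARK, MUSIC_MARK):
        print("U+%04X" % ord(mark), category_and_name(mark))
```

   Both lookups report Mn, so deciding that the first is used the way
   letters are and the second is not requires judgment about how each
   code point is actually used.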

6.  Conclusion

   The principles outlined in this document can be applied when
   considering any range of Unicode code points for possible inclusion
   in a DNS zone.  It is worth observing that doing anything (especially
   in light of Section 4.5) implicitly disadvantages communities with a
   writing system not yet well understood and not represented in the
   technical and policy communities involved in the discussion.  That
   disadvantage is to be guarded against as much as practical, but is
   effectively impossible to prevent (while still taking action) in
   light of imperfect human knowledge.

7.  Security Considerations

   The principles outlined in this memo are intended to improve
   usability and clarity and thereby reduce confusion among different
   labels.  While these principles may contribute to reduction of risk,
   they are not sufficient to provide a comprehensive
   internationalization policy for zone management.

8.  IANA Considerations

   None.  RFC Editor: this section may be removed on publication.

9.  Acknowledgements

   The authors thank the participants in the IAB Internationalization
   programme for the discussion of the ideas in this memo.  Marc
   Blanchet made specific comments.

10.  Informative References

   [IABCOMM1]
              Internet Architecture Board, "IAB Statement: 'The
              interpretation of rules in the ICANN gTLD Applicant
              Guidebook.'", February 2012.

   [IABCOMM2]
              Internet Architecture Board, "Response to ICANN questions
              concerning 'The interpretation of rules in the ICANN gTLD
              Applicant Guidebook'", March 2012.

   [RFC1123]  Braden, R., "Requirements for Internet Hosts - Application
              and Support", STD 3, RFC 1123, October 1989.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
              Recommendations for Internationalized Domain Names
              (IDNs)", RFC 4690, September 2006.

   [RFC5890]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Definitions and Document Framework",
              RFC 5890, August 2010.

   [RFC5891]  Klensin, J., "Internationalized Domain Names in
              Applications (IDNA): Protocol", RFC 5891, August 2010.

   [RFC5892]  Faltstrom, P., "The Unicode Code Points and
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5892, August 2010.

   [RFC5893]  Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5893, August 2010.

   [RFC5894]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Background, Explanation, and
              Rationale", RFC 5894, August 2010.

   [RFC5895]  Resnick, P. and P. Hoffman, "Mapping Characters for
              Internationalized Domain Names in Applications (IDNA)
              2008", RFC 5895, September 2010.

   [RFC6365]  Hoffman, P. and J. Klensin, "Terminology Used in
              Internationalization in the IETF", BCP 166, RFC 6365,
              September 2011.

Authors' Addresses

   Andrew Sullivan
   Dyn, Inc.
   150 Dow St
   Manchester, NH  03101


   Dave Thaler
   One Microsoft Way
   Redmond, WA  98052


   John C Klensin
   1770 Massachusetts Ave, Ste 322
   Cambridge, MA  02140

   Phone: +1 617 491 5735

   Olaf Kolkman
   NLnet Labs
   Science Park 400
   Amsterdam  1098 XH
   The Netherlands

