Representing Label Generation Rulesets using XML
draft-davies-idntables-06

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft whose latest revision state is "Replaced".
Authors Kim Davies , Asmus Freytag
Last updated 2014-03-05
Replaced by draft-ietf-lager-specification, RFC 7940
Network Working Group                                          K. Davies
Internet-Draft                                                     ICANN
Intended status: Informational                                A. Freytag
Expires: September 6, 2014                                    ASMUS Inc.
                                                           March 5, 2014

            Representing Label Generation Rulesets using XML
                       draft-davies-idntables-06

Abstract

   This document describes a method of representing the domain name
   registration policy for a zone administrator using Extensible Markup
   Language (XML).  These policies, known as "Label Generation Rulesets"
   (LGRs), are particularly used for the implementation of
   Internationalized Domain Names (IDNs).  The rulesets are used to
   implement and share policy defining which labels and specific Unicode
   code points are permitted for registrations, which alternative code
   points are considered variants, and what actions may be performed on
   labels containing those variants.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 6, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents

Davies & Freytag        Expires September 6, 2014               [Page 1]
Internet-Draft      Label Generation Rulesets in XML          March 2014

   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Design Goals
   3.  Requirements
   4.  LGR Format
     4.1.  Namespace
     4.2.  Basic Structure
     4.3.  Metadata
       4.3.1.  The version Element
       4.3.2.  The date Element
       4.3.3.  The language Element
       4.3.4.  The domain Element
       4.3.5.  The description Element
       4.3.6.  The validity-start and validity-end Elements
       4.3.7.  The unicode-version Element
       4.3.8.  The references Element
   5.  Code Points and Variants
     5.1.  Sequences
     5.2.  Variants
       5.2.1.  Basic Variants
       5.2.2.  Null Variants
       5.2.3.  Dispositions
       5.2.4.  The ref Attribute
       5.2.5.  Variants with Reflexive Mapping
       5.2.6.  Conditional Variants
       5.2.7.  The comment Attribute
     5.3.  Code Point Tagging
   6.  Whole Label and Context Evaluation
     6.1.  Basic Concepts
     6.2.  Character Classes
       6.2.1.  Tag-based Classes
       6.2.2.  Unicode Property-based Classes
       6.2.3.  Explicitly Declared Classes
       6.2.4.  Combined Classes
     6.3.  Whole Label and Context Rules
       6.3.1.  The rule Element
       6.3.2.  The Match Operators
       6.3.3.  The count Attribute
       6.3.4.  The name and byref Attributes
       6.3.5.  The choice Element
       6.3.6.  Literal Code Point Sequences
       6.3.7.  The any Element
       6.3.8.  The start and end Elements
       6.3.9.  Example rule from IDNA2008
     6.4.  Parameterized Context or When Rules
       6.4.1.  The anchor Element
       6.4.2.  The look-behind and look-ahead Elements
       6.4.3.  Omitting the anchor Element
   7.  The action Element
     7.1.  The match and not-match Attributes
     7.2.  Actions matching Variant Dispositions
       7.2.1.  Variant Disposition triggers
       7.2.2.  Example for RFC3743-style Tables
     7.3.  Recommended Disposition Values
     7.4.  Precedence
     7.5.  Implied Actions
     7.6.  Default Actions
   8.  Processing a Label Against an LGR
     8.1.  Determining Eligibility for a Label
     8.2.  Determining Variants for a Label
     8.3.  Determining a Disposition for a Label or variant Label
   9.  Conversion to and from Other Formats
   10. IANA Considerations
   11. Security Considerations
   12. References
   Appendix A.  Example Table
   Appendix B.  How to Translate RFC 3743 based Tables into the XML Format
   Appendix C.  Indic Syllable Structure Example
   Appendix D.  RelaxNG Schema
   Appendix E.  Acknowledgements
   Appendix F.  Editorial Notes
     F.1.  Known Issues and Future Work
     F.2.  Change History
   Authors' Addresses

1.  Introduction

   This memo describes a method of using Extensible Markup Language
   (XML) to describe the algorithm used to determine whether a given
   domain label is permitted, and under which conditions, based on the
   code points it contains and their context.  These algorithms consist
   of a list of permissible code points, variant code point
   mappings, and a set of rules acting on them.  These algorithms form
   part of a zone administrator's policies, and can be referred to as
   Label Generation Rulesets (LGRs), or IDN tables.

   Administrators of the zones for top-level domain registries have
   historically published their LGRs using ASCII text or HTML.  The
   formatting of these documents has been loosely based on the format
   used for the Language Variant Table in [RFC3743].  [RFC4290] also
   provides a "model table format" that describes a similar set of
   functionality.  Common to these formats is that the algorithms used
   to evaluate the data therein are implicit or specified elsewhere.

   Through the first decade of IDN deployment, experience has shown that
   LGRs derived from these formats are difficult to consistently
   implement and compare due to their differing formats.  A universal
   format, such as one using a structured XML format, will assist by
   improving machine-readability, consistency, reusability and
   maintainability of LGRs.  It also provides for more complex
   conditional implementation of variants that reflects the known
   requirements of current zone administrator policies.

   Another feature of this format is that it allows many of the
   algorithms to be made explicit and machine implementable.  A
   remaining small set of implicit algorithms is described in this
   document to allow commonality in implementation.

   While the predominant usage of this specification is to represent IDN
   label policy, the format is not limited to IDN usage and may also be
   used for describing ASCII domain name label rulesets.

2.  Design Goals

   The following items are explicit design goals of this format:

   o  LGRs MUST be expressed in a format that can be implemented in a
      reasonably straightforward manner in software;

   o  The format SHOULD be able to be checked for formatting errors,
      such that common mistakes can be caught;

   o  An LGR MUST be able to express the set of valid code points that
      are allowed for registration under a specific zone administrator's
      policies;

   o  An LGR MUST be able to express computed alternatives to a given
      domain name based on mapping relationships between code points,
      whether one-to-one or many-to-many.  These computed alternatives
      are commonly known as "variants";

   o  Variant code points SHOULD be able to be tagged with specific
      dispositions or categories that can be used to support registry
      policy (such as whether to allocate the computed variant in the
      zone, or to merely block it from registration);

   o  Variants and code points MUST be able to be stipulated based on
      contextual information.  For example, specific variants may only
      be applicable when they follow another specific code point, or
      when the code point is displayed in a specific presentation form;

   o  The data contained within an LGR MUST be able to be interpreted
      unambiguously, such that independent implementations that utilize
      the contents will arrive at the same results;

   o  To the largest extent possible, policy rules SHOULD be able to be
      specified in the XML format without relying on hidden or built-in
      algorithms in implementations.

   o  LGRs SHOULD be suitable for comparison and re-use, such that one
      could easily compare the contents of two or more to see the
      differences, to merge them, and so on.

   o  LGRs SHOULD be able to be merged automatically, at the minimum
      where code points and variant information are concerned.

   o  As many existing IDN tables as practicable SHOULD be able to be
      migrated to the LGR format with all applicable logic retained.

   It is explicitly NOT the goal of this format to stipulate what code
   points should be listed in an LGR by a zone administrator.  Which
   registration policies are used for a particular zone is outside the
   scope of this memo.

3.  Requirements

   To address the known uses of LGRs, the existing corpus of published
   IDN tables was reviewed in preparing this specification.

   In addition, the requirements of ICANN's work to implement an LGR for
   the DNS Root Zone [LGR-PROCEDURE] were also considered.  In
   particular, Section B of that document identifies five specific
   requirements for an LGR methodology.

   Finally, the syntax and rules in [RFC5892] and [RFC3743] were
   reviewed.

   Altogether these reviews resulted in the following requirements:

   o  The ability to identify a set of code points that are permitted.

   o  The ability to include code points that are permitted only in
      given contexts.

   o  The ability to represent a list of variants, if any, for each code
      point.

   o  The ability to include variants that are defined only in given
      contexts.

   o  The ability to assign a single disposition or categorization for
      each variant.

   o  The ability to assign variants with reflexive mappings.

   o  The ability to assign variants that have a code point sequence as
      target.

   o  The ability to express variant mappings symmetrically.

   o  A method of identifying code points that are related, using one
      or several tags per code point.

   o  The ability to describe rules regarding the possible actions that
      may be performed on the resulting label (such as block,
      allocatable, etc.)

   o  The ability to describe rules that check for ill-formed
      combinations across the whole label.

   o  The ability to describe rules that define contexts in which code
      points are permissible or variants defined.

   o  The ability to preserve normative reference information as well as
      informative comments.

4.  LGR Format

   An LGR is expressed as a well-formed XML document [XML].

4.1.  Namespace

   The XML Namespace URI is [TBD].

4.2.  Basic Structure

   The basic XML framework of the document is as follows:

       <?xml version="1.0"?>
       <lgr xmlns="http://www.iana.org/lgr/0.1">
           ...
       </lgr>

   The "lgr" element contains several sub-elements.  First is a
   "meta" element that contains all meta-data associated with the IDN
   table, such as its authorship, what it is used for, implementation
   notes and references.  This is followed by a "data" element that
   contains the substantive code point data.  Finally, an optional
   "rules" element contains information on contextual and whole-label
   evaluation rules, if any, along with any specific action elements
   providing for the disposition of labels and computed variant labels.

       <?xml version="1.0"?>
       <lgr xmlns="http://www.iana.org/lgr/0.1">
           <meta>
               ...
           </meta>
           <data>
               ...
           </data>
           <rules>
               ...
           </rules>
       </lgr>

   A document MUST contain exactly one "lgr" element.  Each "lgr"
   element MUST contain exactly one "data" element, optionally preceded
   by one "meta" element and optionally followed by one "rules" element.
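
   The structural constraint above can be checked mechanically.  The
   following Python sketch is illustrative only and is not part of this
   specification.  The function name check_basic_structure is
   hypothetical, and the namespace URI is the one used in the examples
   of this section; Section 4.1 still lists the final URI as TBD.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI copied from the examples in Section 4.2.
LGR_NS = "http://www.iana.org/lgr/0.1"
_ORDER = ("meta", "data", "rules")
_KNOWN = {"{%s}%s" % (LGR_NS, name): name for name in _ORDER}


def check_basic_structure(xml_text):
    """Return a list of structural problems (empty if none were found)."""
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}lgr" % LGR_NS:
        return ["root element is not 'lgr' in the expected namespace"]

    # Map each direct child to its local name; "?" marks anything else.
    children = [_KNOWN.get(child.tag, "?") for child in root]
    problems = []
    if "?" in children:
        problems.append("unexpected child element")
    if children.count("data") != 1:
        problems.append("expected exactly one 'data' element")
    for optional in ("meta", "rules"):
        if children.count(optional) > 1:
            problems.append("more than one '%s' element" % optional)

    # Required order: optional meta, then data, then optional rules.
    expected = [name for name in _ORDER if name in children]
    if not problems and children != expected:
        problems.append("children must appear in the order meta, data, rules")
    return problems
```

   An empty list means that the "meta", "data" and "rules" children are
   present in the permitted number and order; it says nothing about
   their contents.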

4.3.  Metadata

   The "meta" element is used to express meta-data associated with the
   LGR.  It can be used to identify the author or relevant contact
   person, explain the intended usage of the LGR, and provide
   implementation notes as well as references.  The data contained
   within is not required by software consuming the LGR in order to
   calculate valid labels, or to calculate variants.  However, the
   "unicode-version" element MUST be used by a consumer of the table to
   identify that it has the right Unicode data to perform operations on
   the table.

4.3.1.  The version Element

   The "version" element is used to uniquely identify each version of
   the LGR being represented.  No specific format is required, but it is
   RECOMMENDED that it be a positive integer that is incremented with
   each revision of the file.

   An example of a typical first edition of a document:

       <version>1</version>

   The version element may have an optional "comment" attribute.

       <version comment="draft">1</version>

4.3.2.  The date Element

   The "date" element is used to identify the date the LGR was posted.
   The contents of this element MUST be a valid ISO 8601 date string as
   described in [RFC3339].

   Example of a date:

       <date>2009-11-01</date>
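
   A consumer may wish to reject malformed dates early.  The Python
   sketch below is illustrative only.  It assumes that the intended
   form is the YYYY-MM-DD "full-date" shown in the example above; the
   function name parse_lgr_date is hypothetical.

```python
import datetime
import re

# Assumption: only the YYYY-MM-DD form shown in the example is accepted.
_FULL_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_lgr_date(text):
    """Parse a YYYY-MM-DD string; raise ValueError if it is not a date."""
    match = _FULL_DATE.fullmatch(text.strip())
    if match is None:
        raise ValueError("not in YYYY-MM-DD form: %r" % text)
    year, month, day = (int(part) for part in match.groups())
    # datetime.date raises ValueError for impossible dates such as
    # February 30.
    return datetime.date(year, month, day)
```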

4.3.3.  The language Element

   The "language" element signals that the LGR is associated with a
   specific language or script.  The value of the language element must
   be a valid language tag as described in [RFC5646].  The tag may
   simply refer to a script if the LGR is not referring to a specific
   language.

   Example of an English language LGR:

      <language>en</language>

   If the LGR applies to a specific script, rather than a language, the
   "und" language tag should be used followed by the relevant [RFC5646]
   script subtag.  For example, for a Cyrillic script LGR:

      <language>und-Cyrl</language>

   If the LGR covers a specific set of multiple languages or scripts,
   the language element can be repeated.  However, for cases of a
   script-specific LGR exhibiting insignificant admixture of code points
   from other scripts, it is RECOMMENDED to use a single "language"
   element identifying the predominant script.  In the exceptional case
   of a multi-script LGR where no script is predominant, use Zyyy
   (Common):

      <language>und-Zyyy</language>

   Note that for the particular case of Japanese, a script tag "Jpan"
   exists that matches the mixture of scripts used in writing that
   language.  The preferred language element would be:

      <language>und-Jpan</language>

4.3.4.  The domain Element

   This optional element refers to a domain to which this policy is
   applied.  The value must be a valid domain name that represents the
   apex of the zone to which the policy is applied; in the case of the
   root zone, it should be represented as ".".

       <domain>example.com</domain>

   There may be multiple <domain> tags used to reflect a list of
   domains.

4.3.5.  The description Element

   The "description" element is a free-form element that contains any
   additional relevant description that is useful for the user in its
   interpretation.  Typically, this field contains authorship
   information, as well as additional context on how the LGR was
   formulated (such as citations and references), and how it has been
   applied.

   The element has an optional "type" attribute, which refers to the
   Internet media type of the enclosed data.  Typical types would be
   "text/plain" or "text/html".  The attribute SHOULD be a valid MIME
   type.  If supplied, the contents will be assumed to be of that media
   type.  If the description lacks a "type" attribute, its contents
   will be assumed to be plain text ("text/plain").

4.3.6.  The validity-start and validity-end Elements

   The "validity-start" and "validity-end" elements are optional
   elements that describe the time period during which the contents of
   the LGR are valid: the point from which they are used in registry
   policy, and the point at which they cease to be used.

   The times should conform to the format described in Section 5.6 of
   [RFC3339].  A value may consist of a date, or of a date and time
   stamp.

4.3.7.  The unicode-version Element

   Whenever an IDN table depends on character properties from a given
   version of the Unicode standard, the version number used in creating
   the LGR MUST be listed.  If any software processing the table does
   not have access to character property data of the requisite version,
   it MUST NOT perform any operations relating to whole-label
   evaluation.  While some Unicode code points may not have been
   assigned in an earlier version, leaving properties for these code
   points undefined, in other cases their properties may have been
   updated in the Unicode standard between versions.  It is RECOMMENDED
   to only reference stable or immutable properties.  For a given LGR,
   the property values for the code points in the actual repertoire may
   be unchanged in a later version of Unicode, even though other changes
   were made in that standard.  If that fact can be established, it MAY
   be acceptable to use tools based on a later version of Unicode.

   [[TODO: A method of indicating a range of permissible Unicode
   versions should be described.]]

       <unicode-version>6.2</unicode-version>

   It is not necessary to include a "unicode-version" element for files
   that do not make use of Unicode properties.  Because Unicode has been
   strictly additive from Version 1.1, the required minimum version for
   the repertoire can be uniquely determined by checking the code point
   values in any "cp" attributes against the "age" property in [UAX42].
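
   The version check described in this section might be sketched as
   follows in Python.  This is illustrative only.  The function name
   same_unicode_version is hypothetical, and the sketch takes the
   strict reading in which only the exact version named by the LGR is
   accepted; it does not attempt to establish whether a later version
   leaves the relevant property values unchanged.

```python
import unicodedata


def same_unicode_version(required, available=unicodedata.unidata_version):
    """True if 'available' is the Unicode version named by 'required'.

    Only the components present in 'required' are compared, so that a
    required "6.2" matches an available "6.2.0".
    """
    want = [int(part) for part in required.strip().split(".")]
    have = [int(part) for part in available.strip().split(".")]
    return have[:len(want)] == want
```

   By default the comparison is made against the Unicode data bundled
   with the running Python interpreter.  A tool that ships its own
   property data would pass that data's version as the second argument.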

4.3.8.  The references Element

   A Label Generation Ruleset may define a list of references which are
   used to associate various elements in the LGR to one or more
   normative references.  In contrast, global references for the entire
   LGR can simply be part of the "description" element.

   References are specified in an optional "references" element that
   contains any number of "reference" elements, each with a unique "id"
   attribute.  It is RECOMMENDED that the "id" attribute be a zero-based
   integer.  The value of each "reference" element SHOULD be the
   citation of a standard, dictionary or other specification in any
   suitable format.  In addition to an "id" attribute, a "reference"
   element may have a "comment" attribute for an optional free-form
   annotation.

       <references>
         <reference id="0">The Unicode Standard, Version 7.0</reference>
         <reference id="1">Big-5: Computer Chinese Glyph and Character
            Code Mapping Table, Technical Report C-26, 1984</reference>
         <reference id="2" comment="synchronized with Unicode 6.1">
            ISO/IEC 10646:2012 3rd edition</reference>
         ...
       </references>
       ...
       <data>
         <char cp="0620" ref="0 2" />
         ...
       </data>

   A reference can be associated with many types of elements in the
   "data" or "rules" sections of the LGR by using an optional "ref"
   attribute (see Section 5.2.4).  A "ref" attribute may not occur on
   elements that are named references to character classes and rules nor
   on certain specific other element types.  See the descriptions of
   these elements below.

5.  Code Points and Variants

   The bulk of a label generation ruleset is a description of which set
   of code points are eligible for a given label.  For rulesets that
   perform operations that result in potential variants, the code point-
   level relationships between variants need to also be described.

   The code point data is collected within a "data" element.  Within
   this element, a series of "char" and "range" elements describe
   eligible code points, or ranges of code points, respectively.

   Discrete permissible code points or code point sequences are declared
   with a "char" element, e.g.

       <char cp="002D"/>

   Ranges of permissible code points may be stipulated with a "range"
   element, e.g.

       <range first-cp="0030" last-cp="0039"/>

   The range is inclusive of the first and last code points.  Whether
   code points are specified individually or as part of a range makes no
   difference in processing the data, and tools reading or writing the
   XML format are not required to retain a distinction.  All attributes
   defined for a range element are as if applied to each code point
   within.

   Code points must be expressed in uppercase, hexadecimal, and zero
   padded to a minimum of 4 digits - in other words according to the
   standard Unicode convention but without the prefix "U+".  The
   rationale for not allowing other encoding formats, including native
   Unicode encoding in XML, is explored in [UAX42].  The XML conventions
   used in this format, including the element and attribute names,
   mirror those of [UAX42] where practical and reasonable to do so.  It is
   RECOMMENDED to list all "char" elements in ascending order of cp
   attribute.
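
   The lexical rules for code point values and ranges given in this
   section can be sketched in Python as follows.  This is illustrative
   only.  The names parse_cp and expand_range are hypothetical, and the
   sketch assumes that the members of a sequence are separated by
   exactly one space and that no value has more than six digits.  It
   does not cover the empty "cp" value used for null variants
   (Section 5.2.2).

```python
import re

# One code point: 4 to 6 uppercase hexadecimal digits, no "U+" prefix.
_CP = r"[0-9A-F]{4,6}"
_CP_SEQUENCE = re.compile(r"%s(?: %s)*" % (_CP, _CP))


def parse_cp(value):
    """Convert a "cp" attribute value into a tuple of integers."""
    if not _CP_SEQUENCE.fullmatch(value):
        raise ValueError("malformed cp value: %r" % value)
    code_points = tuple(int(part, 16) for part in value.split(" "))
    if any(cp > 0x10FFFF for cp in code_points):
        raise ValueError("beyond the Unicode code space: %r" % value)
    return code_points


def expand_range(first_cp, last_cp):
    """List every code point of a "range" element, both ends included."""
    # Unpacking fails with ValueError if either bound is a sequence.
    (first,), (last,) = parse_cp(first_cp), parse_cp(last_cp)
    return list(range(first, last + 1))
```

   For the range example above, expand_range("0030", "0039") yields the
   ten code points of the ASCII digits, which a tool may then treat
   exactly as if each had been listed in its own "char" element.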

5.1.  Sequences

   A sequence of two or more code points may be specified in an LGR, for
   example, when defining the source for n:m variant mappings.  Another
   use of sequences would be in cases when the exact sequence of code
   points is required to occur in order for the constituent elements to
   be eligible, such as when a specific code point is only eligible when
   preceded or followed by another code point.  The following would
   define the eligibility of the MIDDLE DOT (U+00B7) only when both
   preceded and followed by the LATIN SMALL LETTER L (U+006C):

       <char cp="006C 00B7 006C" comment="Catalan middle dot"/>

   As an alternative to using sequences to define a required context, a
   "char" or "range" element may specify conditional context in a "when"
   attribute as described below in Section 5.2.6.  The latter method is
   more flexible: the conditional context is not limited to a specific
   code point, and both prohibited and required contexts can be
   specified.

5.2.  Variants

   While most LGRs typically only determine code point eligibility,
   others additionally specify a mapping of code points to other code
   points, known as "variants".  What constitutes a variant code point
   is a matter of policy, and varies for each implementation.  The
   following examples are intended to demonstrate the syntax; they are
   not necessarily typical.

5.2.1.  Basic Variants

   Variant code points are specified using one or more "var" elements as
   children of a "char" element.

   For example, to map LATIN SMALL LETTER V (U+0076) as a variant of
   LATIN SMALL LETTER U (U+0075):

       <char cp="0075">
           <var cp="0076"/>
       </char>

   A sequence of multiple code points can be specified as a variant of a
   single code point.  For example, the sequence of LATIN SMALL LETTER O
   (U+006F) then LATIN SMALL LETTER E (U+0065) might hypothetically be
   specified as a variant for LATIN SMALL LETTER O WITH DIAERESIS
   (U+00F6) as follows:

       <char cp="00F6">
           <var cp="006F 0065"/>
       </char>

   The "var" element specifies variant mappings in only one direction,
   even though the variant relation is usually considered symmetric,
   that is, if A is a variant of B then B should also be a variant of A.
   The format requires that the inverse of the variant be given
   explicitly to fully specify symmetric variant relations in the IDN
   table.  This has the beneficial side effect of making the symmetry
   explicit:

       <char cp="006F 0065">
           <var cp="00F6"/>
       </char>

   Both the source and target of a variant mapping may be sequences.  As
   it is not possible to specify variants for ranges, ranges cannot be
   used for characters for which variant relations need to be defined.
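
   Because the inverse of each mapping has to be listed explicitly, a
   table author may want to find mappings whose inverse is missing.
   The Python sketch below is illustrative only.  The function name
   missing_inverse_mappings is hypothetical.  The sketch compares "cp"
   values as literal strings, ignores any "when" and "not-when"
   conditions, and makes no judgement on whether a given table is in
   fact intended to be symmetric.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip a "{namespace}" prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def missing_inverse_mappings(data_xml):
    """Return the (source, target) pairs whose inverse mapping is absent."""
    data = ET.fromstring(data_xml)
    pairs = set()
    for char in data.iter():
        if _local(char.tag) != "char":
            continue
        for var in char:
            if _local(var.tag) == "var":
                pairs.add((char.get("cp"), var.get("cp")))
    # A pair (s, t) is unmatched when (t, s) was never declared.
    return sorted((s, t) for (s, t) in pairs if (t, s) not in pairs)
```

   Applied to a "data" element holding only the first of the two
   o-diaeresis examples above, the function reports that mapping as
   lacking its inverse; with both examples present it reports nothing.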

   All variants MUST be unique.  For a given "char" element all variants
   must have a unique combination of "cp", "when" and "not-when"
   attributes.  It is RECOMMENDED to list the "var" elements in
   ascending order of their target code point sequence.
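
   The uniqueness requirement can be checked per "char" element.  The
   Python sketch below is illustrative only.  The function name
   duplicate_variants is hypothetical, and attribute values are
   compared as literal strings.

```python
import xml.etree.ElementTree as ET


def duplicate_variants(char_xml):
    """Return the (cp, when, not-when) keys that occur more than once."""
    char = ET.fromstring(char_xml)
    seen, duplicates = set(), []
    for var in char:
        # Ignore any child that is not a "var" element.
        if var.tag.rsplit("}", 1)[-1] != "var":
            continue
        key = (var.get("cp"), var.get("when"), var.get("not-when"))
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

   An absent "when" or "not-when" attribute is represented as None in
   the key, so two otherwise identical variants are distinct only if
   they differ in at least one of the three attributes.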

5.2.2.  Null Variants

   To specify a null variant, which is a variant string that maps to no
   code point, use an empty "cp" attribute.  For example, to map a
   string with a ZERO WIDTH NON-JOINER (U+200C) to the same string
   without the ZERO WIDTH NON-JOINER:

       <char cp="200C">
           <var cp=""/>
       </char>

   This is useful in expressing the intent that some code points in a
   label are to be mapped away when generating a canonical variant of
   the label.  However, in tables that are designed to have symmetric
   variant mappings, this could lead to combinatorial explosion, if not
   handled carefully.

   The symmetric form of a null variant is expressed as follows:

       <char cp="">
           <var cp="200C" disp="invalid" />
       </char>

   A "char" element with an empty "cp" attribute MUST specify at least
   one variant mapping, or the results are undefined.  It is strongly
   RECOMMENDED to use a disposition of "invalid" or equivalent when
   defining variant mappings from null sequences, so that variant
   mappings from null sequences are removed in variant label generation.

5.2.3.  Dispositions

   Variants may be given dispositions.  These describe the policy state
   for a variant label that was generated using a particular variant.
   The dispositions are the same as described below in Section 7.

   A disposition may be of any non-empty value not starting with an
   underscore and not containing spaces.  Within these restrictions a
   disposition may have any value, but several conventional dispositions
   are predefined below in Section 7 to encourage common conventions in
   their application.  If these values can represent registry policy,
   they SHOULD be used.  (See also Section 7.6.)

       <char cp="767C">
           <var cp="53D1" disp="allocate"/>
           <var cp="5F42" disp="block"/>
           <var cp="9AEA" disp="block"/>
           <var cp="9AEE" disp="block"/>
       </char>
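
   The lexical restrictions on disposition values stated above can be
   sketched as a simple check.  This is an illustration in Python only;
   the function name is hypothetical, and "spaces" is interpreted here
   as the character U+0020, which is an assumption.

```python
def is_valid_disposition(value):
    """True if value is non-empty, does not start with an underscore,
    and contains no space (U+0020) characters."""
    return bool(value) and not value.startswith("_") and " " not in value

for candidate in ["allocate", "block", "", "_private", "two words"]:
    print(repr(candidate), is_valid_disposition(candidate))
```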

   Usually, if a variant label contains any instance of a variant that
   is to be blocked, the label would be blocked, but if it contains only
   instances of variants to be allocated, it could be allocated.  See
   the discussion about implied actions in Section 7.6.

   Because variants MUST be unique, it is not possible to define the
   same variant for the same "char" element with different dispositions
   (see however Section 5.2.6).

5.2.4.  The ref Attribute

   Reference information may optionally be specified by a "ref"
   attribute, consisting of a space delimited sequence of reference
   identifiers.

       <char cp="522A" ref="0">
           <var cp="5220" ref="2 3"/>
       </char>

   This facility is typically used to give source information for code
   points or variant relations.  This information is ignored when
   machine-processing an LGR.  Specifying a "ref" attribute on a range
   element is equivalent to specifying the same ref attribute on every
   single code point of the range.  All reference identifiers MUST be
   from the set declared in the "references" element (see
   Section 4.3.8).  It is RECOMMENDED that they be listed in ascending
   order.

   In addition to "char", "range" and "var" elements in the data
   section, a ref attribute may be present for literals ("char" inside a
   rule) as well as rules and class definitions, but not for named
   references to them.

5.2.5.  Variants with Reflexive Mapping

   At first sight there seems to be no call for adding variant mappings
   for which source and target code points are the same, that is for
   which the mapping is reflexive, or, in other words, an identity
   mapping.  Yet such reflexive mappings occur frequently in LGRs that
   follow [RFC3743].

   Adding a "var" element allows both a disposition and a reference id
   to be specified for it.  While the reference id is not used in
   processing, the disposition value can be used to trigger actions.  In
   permuting the label to generate all possible variants, the
   disposition value associated with a reflexive variant mapping is
   applied to any of the permuted labels containing the original code
   point.

   In the following example, the code point U+3473 exists both as a
   variant of U+3447 and as a variant of itself (reflexive mapping).
   Assuming an original label of "3473 3447", the permuted variant "3473
   3473" would consist of the reflexive variant of 3473 followed by a
   variant of 3447.  Accordingly, the dispositions for both of the
   variant mappings used to generate that particular permutation would
   have the value "preferred" given the following definitions of variant
   mappings:

       <char cp="3447" ref="0">
           <var cp="3473" disp="preferred" ref="1 3" />
       </char>
       <char cp="3473" ref="0">
           <var cp="3447" disp="block" ref="1 3" />
           <var cp="3473" disp="preferred" ref="0" />
       </char>

   Having established the disposition values in this way, a set of
   actions could be defined that return a disposition of "allocate" or
   "activate" for a label consisting exclusively of variants with
   disposition "preferred" for example.  (For details on how to define
   actions based on variant dispositions see Section 7.)

   In general, using reflexive variant mappings in this manner makes it
   possible to calculate disposition values using a uniform approach for
   all labels, whether they consist of mapped variant code points,
   original code points, or a mixture of both.  In particular, the
   disposition values for two otherwise identical labels may differ
   based on which variant mappings were executed in order to generate
   each of them.  (For details on how to generate variants and evaluate
   dispositions, see Section 8.)
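
   The effect of the reflexive mapping in the example above can be
   traced with a small sketch in Python.  It is not the procedure of
   Section 8; the names VARIANTS and permute are hypothetical, and the
   handling of a code point without a reflexive mapping (kept unchanged,
   with no disposition recorded) is an assumption of the sketch.

```python
from itertools import product

# Variant mappings from the example: source -> [(target, disposition)].
VARIANTS = {
    0x3447: [(0x3473, "preferred")],
    0x3473: [(0x3447, "block"), (0x3473, "preferred")],
}

def permute(label):
    """Yield (variant_label, dispositions) for every permutation.

    Assumption: a code point without a reflexive mapping is also kept
    unchanged, with None recorded in place of a disposition.
    """
    choices = []
    for cp in label:
        options = list(VARIANTS.get(cp, []))
        if all(target != cp for target, _ in options):
            options.append((cp, None))
        choices.append(options)
    for combo in product(*choices):
        yield tuple(t for t, _ in combo), tuple(d for _, d in combo)

results = dict(permute([0x3473, 0x3447]))
for variant_label, dispositions in results.items():
    print(" ".join("%04X" % cp for cp in variant_label), dispositions)
```

   For the original label "3473 3447", the permutation "3473 3473"
   carries the disposition "preferred" in both positions, as described
   in the text.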

5.2.6.  Conditional Variants

   Fundamentally, variants are mappings between two sequences of code
   points.  However, in some instances for a variant relationship to
   exist, some context external to the code point sequence must be
   considered.  For example, a positional context may determine whether
   two code point sequences are variants of each other.

   An example is Arabic, where code points can have different forms
   based on position and some code points share forms, making them
   variants in the positions corresponding to those forms.  Such
   positional context cannot be derived from the code point alone, as
   the code point is the same for the various forms.

   To specify a conditional variant relationship the optional "when"
   attribute is used.  The variant relationship exists when the
   condition in the "when" attribute is satisfied.  A "not-when"
   attribute may be used for conditions that must not be satisfied.  The
   value of each "when" or "not-when" attribute is a parameterized
   context rule as described below in Section 6.4.

   Assuming the "rules" element contains suitably defined rules for
   "arabic-isolated" and "arabic-final", the following example shows how
   to mark ARABIC LETTER ALEF WITH WAVY HAMZA BELOW (U+0673) as a
   variant of ARABIC LETTER ALEF WITH HAMZA BELOW (U+0625), but only
   when it appears in its isolated or final forms:

       <char cp="0625">
           <var cp="0673" when="arabic-isolated"/>
           <var cp="0673" when="arabic-final"/>
       </char>

   Only a single "when" or "not-when" attribute can be applied to any
   "var" element; however, multiple "var" elements using the same
   mapping but different "when" or "not-when" attributes may be
   specified.

   While Arabic is currently the only script known for which such
   conditional variants are defined, there are other scripts, such as
   Mongolian, which share the concept of positional forms.  By requiring
   explicit definitions for these rules, this mechanism can easily
   handle any additional types of conditional variants that are
   required.

   As described in Section 5.1 a "when" or "not-when" attribute may also
   be specified to any "char" element in the data section to define
   required or prohibited contextual conditions under which a code point
   is valid.

5.2.7.  The comment Attribute

   Any "char", "range" or "var" element may contain a "comment"
   attribute.  The contents of a comment attribute are free-form plain
   text.  Comments are ignored in machine processing of the table.
   Comment attributes may also be placed on certain elements in the
   "rules" section of the document, such as actions and literals
   ("char"), as well as definitions of classes and rules, but not named
   references to them.  Finally, in the metadata the "version" and
   "reference" elements may have comment attributes to match the syntax
   in [RFC3743].

5.3.  Code Point Tagging

   Typically, LGRs are used to explicitly designate allowable code
   points, where any label that contains a code point not explicitly
   listed in the LGR is considered an ineligible label according to the
   ruleset.

   For more complex registry rules, there may be a need to discern one
   or more subsets of code points.  This can be accomplished by applying
   a "tag" attribute to "char" or "range" elements, thereby defining
   character classes (see Section 6.2.1) which can then be used in whole
   label evaluation rules (see Section 6.3.2).  Tag attributes may be of
   any value, and multiple values are separated by spaces.  Because code
   point sequences are not proper members of a set of code points, a
   "tag" attribute MUST NOT be present in a "char" element defining a
   code point sequence.

   A simple example of tag use would be to label preferred code points
   (as in [RFC3743]) by adding "preferred" to the tag, and then using a
   rule such as shown in Section 6.3.1 to single out labels for
   allocation that consist entirely of such preferred code points.  For
   a variety of reasons, actual tables use a different approach.

6.  Whole Label and Context Evaluation

6.1.  Basic Concepts

   The code points in a label sometimes need to satisfy context-based
   rules, for example for the label to be considered valid, or to
   satisfy the context for a variant mapping (see the description of the
   "when" attribute in Section 6.4).

   A Whole Label Evaluation rule (WLE) is applied to the whole label.
   It is used to validate both original labels and variant labels
   computed from them using a permutation over all applicable variant
   mappings.  A conditional context rule is a specialized form of WLE
   specific to the context around a single code point or code point
   sequence.  For example, if a rule is referenced in the "when"
   attribute of a variant mapping it is used to describe the conditional
   context under which the particular variant mapping is defined to
   exist.

   Each rule is defined in a "rule" element.  A rule may contain the
   following as child elements:

   o  literal code points or code point sequences;

   o  character classes, which define sets of code points to be used for
      context comparisons;

   o  nested rules; and

   o  context operators, which define when character classes and
      literals may appear.

   Collectively, these are called match operators and are listed in
   Section 6.3.2.

6.2.  Character Classes

   Character classes are sets of characters that often share a
   particular property.  While they function like sets in every way,
   even supporting the usual set operators, they are called character
   classes here in a nod to the use of that term in regular expression
   syntax.  (This also avoids confusion with the term "character set" in
   the sense of character encoding.)

   Character classes (or sets) can be specified in several ways:

   1.  by defining the set via matching a tag in the code point data
       (all characters with the same tag attribute are part of the same
       class);

   2.  by referencing one of the Unicode character properties defined in
       the Unicode Character Database [UAX42];

   3.  by explicitly listing all the code points in the class; or

   4.  by defining the class as a set combination of any number of other
       classes.

   A character class has an optional "name" attribute, consisting of a
   single identifier not containing spaces.  If it is omitted, the class
   is anonymous and exists only inside the rule or combined class where
   it is defined.  A named character class is defined independently and
   can be referenced by name from within any rules or as part of other
   character class definitions.

       <class name="example" comment="an example class definition">
           <char cp="0061" />
           <char cp="4E00" />
       </class>
       ...
       <rule>
           <class byref="example" />
       </rule>

   An empty "class" element with a "byref" attribute is a reference to
   an existing named class.  Such an element MUST NOT have either
   "comment" or "ref" attributes as those may only be placed on a class
   definition.  A "byref" and a "name" attribute MUST NOT occur in the
   same element.

6.2.1.  Tag-based Classes

   The char element may contain a tag attribute that consists of one or
   more space separated identifiers, for example:

       <char cp="0061" tag="letter lower"/>
       <char cp="4E00" tag="letter"/>

   This defines two tags for use with code point U+0061, the tag
   "letter" and the tag "lower".  Implicitly, this defines two named
   character classes, the class "letter" and the class "lower", the
   first with 0061 and 4E00 as elements and the latter with 0061, but
   not 4E00 as an element.  The document MUST NOT contain an explicitly
   named class definition of the same name as an implicitly named
   tag-derived class.
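
   How tags implicitly define classes can be illustrated with a short
   sketch in Python.  The function name tag_classes and the enclosing
   "data" wrapper used for parsing are assumptions of the sketch, which
   only looks at "char" elements with a single code point and ignores
   tagged "range" elements.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def tag_classes(data_xml):
    """Map each tag to the set of code points (as integers) carrying it.

    Only single-code-point "char" elements are handled; tagged "range"
    elements are not expanded in this sketch.
    """
    classes = defaultdict(set)
    for char in ET.fromstring(data_xml).iter("char"):
        for tag in char.get("tag", "").split():
            classes[tag].add(int(char.get("cp"), 16))
    return dict(classes)

classes = tag_classes(
    '<data>'
    '<char cp="0061" tag="letter lower"/>'
    '<char cp="4E00" tag="letter"/>'
    '</data>'
)
for name, members in classes.items():
    print(name, sorted("%04X" % cp for cp in members))
```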

6.2.2.  Unicode Property-based Classes

   A class is defined in terms of Unicode properties by giving the
   Unicode property alias and the property value or property value
   alias, separated by a colon.

       <class name="virama" property="ccc:9" />

   The example above selects all code points for which the Unicode
   canonical combining class (ccc) value is 9.  This value of the ccc is
   assigned to all code points that encode viramas.  The string "ccc" is
   the short alias for the canonical combining class, as defined in the
   Unicode Character Database [UAX42].

   Unicode properties may, in principle, change between versions of the
   Unicode Standard.  However, the values assigned for a given version
   are fixed.  If Unicode properties are used, a minimum Unicode version
   MUST be declared in the header.  (Note, some Unicode properties are
   by definition stable across versions and do not change once
   assigned.)
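
   As an illustration of a property-based class, the following Python
   sketch selects code points by canonical combining class using the
   Unicode data bundled with the interpreter.  The function name
   ccc_class and the candidate list are hypothetical; which Unicode
   version is consulted depends on the Python build, which is why the
   sketch prints it.

```python
import unicodedata

def ccc_class(value, candidates):
    """Return the candidates whose canonical combining class equals
    value, according to the interpreter's Unicode data."""
    return {cp for cp in candidates
            if unicodedata.combining(chr(cp)) == value}

# U+094D DEVANAGARI SIGN VIRAMA, U+0915 DEVANAGARI LETTER KA,
# U+0301 COMBINING ACUTE ACCENT
virama = ccc_class(9, [0x094D, 0x0915, 0x0301])

print("Unicode version:", unicodedata.unidata_version)
print(sorted("%04X" % cp for cp in virama))
```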

6.2.3.  Explicitly Declared Classes

   A class of code points may also be declared by listing the code
   points that are members of the class.  This is useful when tagging
   cannot be used because code points are not listed individually as
   part of the eligible set of code points for the given LGR, for
   example because they only occur in code point sequences.

   To define a class in terms of an explicit list of code points:

       <class name="abc">
           <char cp="0061"/>
           <char cp="0062"/>
           <char cp="0063"/>
       </class>

   This defines a class named "abc" containing the code points for
   characters "a", "b" and "c".  The ordering of the code points is not
   material, but it is RECOMMENDED to list them in ascending order.

   Range operators may also be used to represent any series of
   consecutive code points.  The same declaration can be made as
   follows:

       <class name="abc">
           <range first-cp="0061" last-cp="0063"/>
       </class>

   Range and code point declarations can be freely intermixed.  A
   shorthand notation exists where code points are directly represented
   by space separated hexadecimal values, and ranges are represented by
   a start and end value separated by a hyphen.  The element:

       <class name="abc">0061 0062-0063</class>

   would be a more streamlined expression of the same class using the
   shorthand notation.

   A class element either contains any combination of char and range
   elements and no other elements, or a text node with the shorthand
   notation.
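
   The shorthand notation is simple to expand.  The following Python
   sketch, with the hypothetical function name parse_shorthand, turns
   the text content of a "class" element into a set of code points; it
   assumes well-formed input and performs no error handling.

```python
def parse_shorthand(text):
    """Expand shorthand such as "0061 0062-0063" into a set of integer
    code points.  Items are separated by white space; a hyphen joins
    the first and last code point of a range."""
    members = set()
    for item in text.split():
        first, _, last = item.partition("-")
        start = int(first, 16)
        end = int(last, 16) if last else start
        members.update(range(start, end + 1))
    return members

abc = parse_shorthand("0061 0062-0063")
print(sorted("%04X" % cp for cp in abc))
```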

6.2.4.  Combined Classes

   Classes may be combined using operators for set complement, union,
   intersection, difference and symmetric difference (exclusive-or).
   Because classes fundamentally function like sets, the union of
   several character classes is itself a class, for example.

   +-------------------+---------------------------------------------+
   | Logical Operation | Example                                     |
   +-------------------+---------------------------------------------+
   | Complement        | <complement>                                |
   |                   |    <class byref="xxx"/>                     |
   |                   | </complement>                               |
   +-------------------+---------------------------------------------+
   | Union             | <union>                                     |
   |                   |    <class byref="class-1"/>                 |
   |                   |    <class byref="class-2"/>                 |
   |                   |    <class byref="class-3"/>                 |
   |                   | </union>                                    |
   +-------------------+---------------------------------------------+
   | Intersection      | <intersection>                              |
   |                   |    <class byref="class-1"/>                 |
   |                   |    <class byref="class-2"/>                 |
   |                   | </intersection>                             |
   +-------------------+---------------------------------------------+
   | Difference        | <difference>                                |
   |                   |    <class byref="class-1"/>                 |
   |                   |    <class byref="class-2"/>                 |
   |                   | </difference>                               |
   +-------------------+---------------------------------------------+
   | Symmetric         | <symmetric-difference>                      |
   | Difference        |    <class byref="class-1"/>                 |
   |                   |    <class byref="class-2"/>                 |
   |                   | </symmetric-difference>                     |
   +-------------------+---------------------------------------------+

   The elements from this table may be arbitrarily nested inside each
   other, subject to the following restriction: a "complement" element
   MUST contain precisely one "class" or one of the operator elements,
   while an "intersection", "symmetric-difference" or "difference"
   element MUST contain precisely two, and a "union" element MUST
   contain two or more of these elements.
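
   Because classes behave like sets, the operators in the table
   correspond directly to ordinary set operations.  The Python sketch
   below uses made-up class contents; the universe against which the
   complement is taken is a small stand-in chosen for the example and
   is an assumption of the sketch.

```python
class_1 = {0x61, 0x62, 0x63}
class_2 = {0x62, 0x63, 0x64}
class_3 = {0x65}

# Assumption: the complement is taken relative to some universe of code
# points; a tiny stand-in universe (U+0061..U+0066) is used here.
universe = set(range(0x61, 0x67))

union = class_1 | class_2 | class_3           # two or more operands
intersection = class_1 & class_2              # exactly two operands
difference = class_1 - class_2                # exactly two operands
symmetric_difference = class_1 ^ class_2      # exactly two operands
complement = universe - class_1               # exactly one operand

# Nesting works as for the XML elements, for example the complement of
# a union:
nested = universe - (class_1 | class_2)

print(sorted(union), sorted(nested))
```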

   An anonymous combined class can be defined directly inside a rule or
   any of the match operator elements that allow child elements (see
   Section 6.3.2) by using the set combination as the outer element.

       <rule>
           <union>
               <class byref="xxx"/>
               <class byref="yyy"/>
           </union>
       </rule>

   The example shows the definition of an anonymous combined class that
   represents the union of classes "xxx" and "yyy".  There is no need to
   wrap this union inside another class element, and, in fact, set
   combination elements MUST NOT be nested inside a "class" element.

   Lastly, to create a named combined class that can be referenced in
   other classes or in rules as <class byref="xxxyyy"/>, add a "name"
   attribute to the set combination element, for example <union
   name="xxxyyy" /> and place it at the top level below the "rules"
   element.

       <rules>
           <union name="xxxyyy">
               <class byref="xxx"/>
               <class byref="yyy"/>
           </union>
           . . .
       </rules>

   Because (as for sets) a combination of classes is itself a class, no
   matter how a class is created, a reference to it always uses the
   "class" element.  That is, a named class is always referenced via an
   empty "class" element using the "byref" attribute containing the name
   of the class to be referenced.

6.3.  Whole Label and Context Rules

   Each rule consists of a series of match operators that must be
   satisfied in order to determine whether a label meets a given
   condition.  Rules may reference other rules or character classes
   defined elsewhere in the table.

6.3.1.  The rule Element

   A matching rule is defined by a "rule" element, each child element of
   which is one of the match operators listed in Section 6.3.2.  In
   evaluating a rule, each child element is matched in order.  Rule
   elements may be nested.

   Rules may optionally be named using a "name" attribute containing a
   single identifier string with no spaces.  A named rule may be
   incorporated into another rule by reference.  If the name attribute
   is omitted, the rule is anonymous and may not be incorporated by
   reference into another rule or referenced by an action or "when"
   attribute.

   A simple rule to match a label where all characters are members of
   the class "preferred":

       <rule name="preferred">
           <start />
           <class byref="preferred" count="1+"/>
           <end />
       </rule>

   Rules are paired with explicit and implied actions, triggering these
   actions when a rule matches a label.  For example, a simple explicit
   action for the rule shown above would be:

       <action disp="allocate" match="preferred" />

   which has the effect of setting the policy disposition for a label
   made up entirely of "preferred" code points to "allocate".  Explicit
   actions are further discussed in Section 7 and the use of rules in
   conditional contexts for implied actions is discussed in
   Section 5.2.6 and Section 7.5.

6.3.2.  The Match Operators

   The child elements of a rule are a series of match operators, which
   are listed here by type and name and with a basic example or two.

   +------------+-------------+------------------------------------+
   | Type       | Operator    | Examples                           |
   +------------+-------------+------------------------------------+
   | logical    | any         | <any />                            |
   |            +-------------+------------------------------------+
   |            | choice      | <choice>                           |
   |            |             |  <rule byref="alternative1"/>      |
   |            |             |  <rule byref="alternative2"/>      |
   |            |             | </choice>                          |
   +--------------------------+------------------------------------+
   | location   | start       | <start />                          |
   |            +-------------+------------------------------------+
   |            | end         | <end />                            |
   +--------------------------+------------------------------------+
   | literal    | char        | <char cp="0061 0062 0063" />       |
   +--------------------------+------------------------------------+
   | set        | class       | <class byref="class1" />           |
   |            |             | <class>0061 0064-0065</class>      |
   +--------------------------+------------------------------------+
   | group      | rule        | <rule byref="rule1" />             |
   |            |             | <rule><any /></rule>               |
   +--------------------------+------------------------------------+
   | contextual | anchor      | <anchor />                         |
   |            +-------------+------------------------------------+
   |            | look-ahead  | <look-ahead><any /></look-ahead>   |
   |            +-------------+------------------------------------+
   |            | look-behind | <look-behind><any /></look-behind> |
   +--------------------------+------------------------------------+

   Any expression defining an anonymous class can be used as a match
   operator of type "set", including any of the set combination
   operators (see Section 6.2.4), in addition to references to named
   classes.

   All match operators shown as empty elements in the Examples column of
   the table above do not support child elements of their own; otherwise
   match operators may be nested.  In particular, anonymous rule
   elements can be used for grouping.

6.3.3.  The count Attribute

   The count attribute specifies the minimum required or maximum
   permitted number of times a match operator is used to match input.
   If the count attribute is

   n or n:n  the match operator matches the input exactly n times, where
        n is 1 or greater.

   n+   the match operator matches the input at least n times, where n
        is 0 or greater.

   n:m  the match operator matches the input at least n times where n is
        0 or greater, but matches the input up to m times in total,
        where m > n.

   missing  the match operator matches the input exactly once.

   In matching, greedy evaluation is used in the sense defined for
   regular expressions: beyond the required number of times, the input
   is matched as many times as possible, but not so often as to prevent
   a match of the remainder of the rule.
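
   The forms of the count attribute listed above map naturally onto
   regular-expression quantifiers.  The following Python sketch shows
   one possible reading; the function names parse_count and quantifier
   are hypothetical, and the use of Python's "re" module is only a
   stand-in for an LGR rule matcher.

```python
import re

def parse_count(count=None):
    """Return (minimum, maximum) for a count attribute value; a maximum
    of None means unbounded.  A missing attribute means exactly once."""
    if count is None:
        return 1, 1
    if count.endswith("+"):
        return int(count[:-1]), None
    if ":" in count:
        low, high = count.split(":")
        return int(low), int(high)
    return int(count), int(count)

def quantifier(count=None):
    """Render the bounds as a greedy regular-expression quantifier."""
    low, high = parse_count(count)
    return "{%d,%s}" % (low, "" if high is None else high)

# Greedy matching gives back input where needed so that the remainder
# of the pattern can still match.
pattern = "[ab]" + quantifier("0+") + "b"
print(pattern, bool(re.fullmatch(pattern, "aab")))
```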

   The count attribute MUST NOT be applied to match operators of type
   "start", "end", "anchor", "look-ahead" and "look-behind".  It may be
   applied to "class" and "rule" elements only if they do not have a
   "name" attribute, that is to anonymous rules and classes or any
   invocation of predefined rules or classes by reference.

6.3.4.  The name and byref Attributes

   Rules (and classes) may be named using a "name" attribute and can
   then be nested inside other match operators only by reference.  To
   reference a named rule (or class) use a rule or class element with
   the "byref" attribute containing the name of the referenced element.
   It is an error to reference a rule or class for which the definition
   has not been seen, or that is not an implicitly defined tag-based
   class.  A rule or class element with a "byref" attribute does not
   have child elements, nor any "ref" or "comment" attributes.

   Here's an example of a rule requiring that all labels be letters
   (optionally followed by combining marks) and possibly digits.  The
   example shows rules and classes referenced by name.

       <class name="letter" property="gc:L"/>
       <class name="combining-mark" property="gc:M"/>
       <class name="digit" property="gc:Nd"/>
       <rule name="letter-grapheme">
          <class byref="letter" count="1+"/>
          <class byref="combining-mark" count="0+"/>
       </rule>
       <rule name="leading-letter" >
          <start />
          <rule byref="letter-grapheme" count="1"/>
          <choice count="0+">
              <rule byref="letter-grapheme" count="0+"/>
              <class byref="digit" count="0+"/>
          </choice>
          <end />
       </rule>

6.3.5.  The choice Element

   For cases where several alternatives could be chosen, the "choice"
   element can encode a list of choices:

       <rule name="ldh">
          <choice count="1+">
              <class byref="letters"/>
              <class byref="digits"/>
              <char cp="002D"/>
          </choice>
       </rule>

   Each child element of a "choice" represents one alternative.  The
   first matching alternative determines the match for the choice
   element.  To express a choice where one alternative consists of a
   sequence of elements, they can be wrapped in an anonymous rule.

6.3.6.  Literal Code Point Sequences

   A literal code point sequence matches a single code point or a
   sequence.  It is defined by a "char" element, with the code point or
   sequence to be matched given by the "cp" attribute.  When used as a
   literal, a "char" element may contain a "count" in addition to the
   "cp" attribute, comments or references, but no conditional contexts
   or child elements.

6.3.7.  The any Element

   The "any" element matches any single code point.  It may have a
   "count" attribute.  For an example see Section 6.3.9.

   The "any" element may have neither a "comment" nor a "ref"
   attribute.

6.3.8.  The start and end Elements

   To match the beginning or end of a label, use the "start" or "end"
   element.

       <rule name="empty-label">
           <start/>
           <end/>
       </rule>

   Whole Label Evaluation Rules in principle always apply to the entire
   label, but in practice, many rules do not need to cover the entire
   label.  For example, to express a requirement of not starting a label
   with a digit, the rule needs to describe only the initial part of a
   label.

   Start and end elements do not have a "count" or any other attribute.

6.3.9.  Example rule from IDNA2008

   This section shows an example of the whole label evaluation rule
   from [RFC5892] forbidding the mixture of the Arabic-Indic and
   extended Arabic-Indic digits in the same label.

       <data>
          <range first-cp="0660" last-cp="0669" not-when="mixed-digits"
                 tag="arabic-indic-digits" />
          <range first-cp="06F0" last-cp="06F9" not-when="mixed-digits"
                 tag="extended-arabic-indic-digits" />
       </data>
       <rules>
          <rule name="mixed-digits">
             <choice>
                <rule>
                   <class byref="arabic-indic-digits"/>
                   <any count="0+"/>
                   <class byref="extended-arabic-indic-digits"/>
                </rule>
                <rule>
                   <class byref="extended-arabic-indic-digits"/>
                   <any count="0+"/>
                   <class byref="arabic-indic-digits"/>
                </rule>
             </choice>
          </rule>
       </rules>

   The preceding example also demonstrates several instances of the use
   of anonymous rules for grouping.
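
   The logic of the "mixed-digits" rule can be approximated with a
   regular expression.  The Python sketch below is an illustration
   only: the names MIXED_DIGITS and has_mixed_digits are hypothetical,
   and it assumes that a rule without "start" and "end" elements may
   match anywhere in the label.

```python
import re

ARABIC_INDIC = "[\u0660-\u0669]"
EXTENDED_ARABIC_INDIC = "[\u06F0-\u06F9]"

# Two alternatives, as in the "choice" element; ".*" stands in for
# <any count="0+"/>.
MIXED_DIGITS = re.compile(
    f"{ARABIC_INDIC}.*{EXTENDED_ARABIC_INDIC}"
    f"|{EXTENDED_ARABIC_INDIC}.*{ARABIC_INDIC}",
    re.DOTALL,
)

def has_mixed_digits(label):
    """True if the label contains digits from both ranges."""
    return MIXED_DIGITS.search(label) is not None

print(has_mixed_digits("\u0661\u06F2"))   # one digit from each range
print(has_mixed_digits("\u0661\u0662"))   # Arabic-Indic digits only
```

   Since both ranges carry not-when="mixed-digits" in the data section,
   a label for which this check succeeds would contain code points that
   are not valid in that context.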

6.4.  Parameterized Context or When Rules

   A special type of rule provides a context for evaluating the validity
   of a code point or variant mapping.  This rule is invoked by the
   "when" attribute described in Section 5.2.6.  An action implied by a
   context rule always has a disposition of "invalid" whenever the rule
   is not matched (see Section 7.5).  Conversely, a "not-when" attribute
   results in a disposition of "invalid" whenever the rule is matched.

6.4.1.  The anchor Element

   Such parameterized context or "When Rules" may contain a special
   placeholder represented by an "anchor" element.  As each When Rule
   is evaluated, the "anchor" element is replaced by a literal
   corresponding to the "cp" attribute of the element containing the
   "when" (or "not-when") attribute.  The match to the "anchor" element
   must be at the same position in the label as the code point or
   variant mapping triggering the When Rule.

   For example, the Greek lower numeral sign is invalid if not
   immediately preceding a character in the Greek script.  This is most
   naturally addressed with a When Rule using look-ahead:

       <char cp="0375" when="preceding-greek"/>
       ...
       <class name="greek-script" property="sc:Grek"/>
       <rule name="preceding-greek">
           <anchor/>
           <look-ahead>
               <class byref="greek-script"/>
           </look-ahead>
       </rule>

   In evaluating this rule, the "anchor" element is treated as if it was
   replaced by a literal

       <char cp="0375"/>

   but only the instance of U+0375 at the given position is evaluated.
   If a label had two instances of U+0375 with the first one matching
   the rule and the second not, then evaluating the When Rule MUST
   succeed for the first and fail for the second instance.

   Unlike other rules, When Rules containing an "anchor" element MUST
   only be invoked via the "when" or "not-when" attributes on code
   points or variants; otherwise their "anchor" elements cannot be
   evaluated.  However, it is possible to invoke rules not containing an
   "anchor" element from a "when" or "not-when" attribute.  (See
   Section 6.4.3.)

6.4.2.  The look-behind and look-ahead Elements

   Context rules use the "look-behind" and "look-ahead" elements to
   define context before and after the code point sequence matched by
   the "anchor" element.  If the "anchor" element is omitted, neither
   the "look-behind" nor the "look-ahead" element may be present.

   Here is an example of a rule that defines an "initial" context for an
   Arabic code point:

       <class name="transparent" property="jt:T"/>
       <class name="right-joining" property="jt:R"/>
       <class name="left-joining" property="jt:L"/>
       <class name="dual-joining" property="jt:D"/>
       <class name="non-joining" property="jt:U"/>
       <rule name="Arabic-initial">
         <look-behind>
           <choice>
             <start/>
             <rule>
               <class byref="transparent" count="0+"/>
               <class byref="non-joining"/>
             </rule>
           </choice>
         </look-behind>
         <anchor/>
         <look-ahead>
           <class byref="transparent" count="0+" />
           <choice>
             <class byref="right-joining" />
             <class byref="dual-joining" />
           </choice>
         </look-ahead>
       </rule>

   A When Rule contains any combination of "look-behind", "anchor" and
   "look-ahead" elements, in that order.  Each of these elements occurs
   at most once, except when nested inside a "choice" element in such a
   way that, in matching each alternative, at most one occurrence of
   each is encountered.  Otherwise, the result is undefined.  None of
   these elements takes a "count" attribute.  If a context rule contains
   a "look-ahead" or "look-behind" element, it MUST contain an "anchor"
   element.
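
   The exception for "choice" can be sketched as follows.  The rule
   name "after-l-or-initial" and the use of U+006C are hypothetical; the
   fragment only illustrates that each alternative, taken on its own,
   encounters exactly one "look-behind" and one "anchor" element.

```xml
<rule name="after-l-or-initial">
  <choice>
    <!-- alternative 1: anchor directly preceded by U+006C -->
    <rule>
      <look-behind>
        <char cp="006C"/>
      </look-behind>
      <anchor/>
    </rule>
    <!-- alternative 2: anchor at the start of the label -->
    <rule>
      <look-behind>
        <start/>
      </look-behind>
      <anchor/>
    </rule>
  </choice>
</rule>
```

   The same context could also be written with a single "look-behind"
   element that contains the "choice", as in the "Arabic-initial"
   example above.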

6.4.3.  Omitting the anchor Element

   If the "anchor" element is omitted, the evaluation of the context
   rule is not tied to the position of the code point or sequence
   associated with the "when" attribute.

   For example, the Katakana middle dot is invalid in any label that
   does not contain at least one Japanese character.  Because this
   requirement is independent of the position of the middle dot, the
   rule does not require an "anchor" element.

       <char cp="30FB" when="japanese-in-label"/>
       <rule name="japanese-in-label">
           <union>
               <class property="sc:Hani"/>
               <class property="sc:Kana"/>
               <class property="sc:Hira"/>
           </union>
       </rule>

   The Katakana middle dot is used only with Han, Katakana or Hiragana.
   The corresponding When Rule requires that at least one code point in
   the label is in one of these scripts.  (Note that the Katakana middle
   dot itself is of script Common.)

7.  The action Element

   The purpose of a rule is to trigger a specific action.  Often, the
   action simply results in blocking or invalidating a label that does
   not match a rule.  An example of an action invalidating a label
   because it does not match a rule named "leading-letter" is as
   follows:

      <action disp="invalid" not-match="leading-letter"/>

   If an action is to be triggered on matching a rule, a "match"
   attribute is used instead.  Actions are evaluated in the order that
   they appear in the XML file.  Once an action is triggered by a label,
   the disposition defined in the "disp" attribute is assigned to the
   label and no other actions are evaluated for that label.
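
   A minimal sketch of this first-match behaviour, reusing the
   "leading-letter" rule named above (its definition is assumed to exist
   elsewhere in the LGR):

```xml
<!-- evaluated first: labels that fail the rule become "invalid" -->
<action disp="invalid" not-match="leading-letter"/>
<!-- reached only by labels that did not trigger the action above -->
<action disp="allocate"/>
```

   If the two lines were swapped, the attribute-less action would
   trigger for every label and the "invalid" action would never be
   reached.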

7.1.  The match and not-match Attributes

   A "match" or "not-match" attribute specifies a rule that must be
   matched or not matched as a condition for triggering an action.  Only
   a single rule may be named as the value of a "match" or "not-match"
   attribute.  Because rules may be composed of other rules, this
   restriction to a single attribute value does not impose any
   limitation on the contexts that can trigger an action.

   An action may contain a "match" or a "not-match" attribute, but not
   both.  An action without any attributes is triggered by all labels
   unconditionally.  For a very simple LGR, the following action would
   allocate all labels that match the repertoire:

       <action disp="allocate" />

   Since rules are evaluated for all labels, whether they are the
   original label or computed by permuting the defined and valid variant
   mappings for the label's code points, actions based on matching or
   not matching a rule may be triggered for both original and variant
   labels, but the rules are not affected by the disposition attributes
   of the variant mappings.  Triggering any action based on these
   dispositions requires the use of the additional optional attributes
   for actions described next.

7.2.  Actions matching Variant Dispositions

7.2.1.  Variant Disposition triggers

   An action may contain one of the optional attributes "any-variant",
   "all-variants" or "only-variants" defining triggers based on variant
   dispositions.  The permitted value for these attributes consists of
   one or more variant disposition values, separated by spaces.  When a
   variant label is generated, these disposition values are compared to
   the disposition values on the variant mappings used to generate the
   particular variant label.

   Any single match may trigger an action that contains an "any-variant"
   attribute, while for an "all-variants" or "only-variants" attribute,
   the dispositions for all variant code points must match one or
   several of the dispositions specified in the attribute value to
   trigger the action.  An "only-variants" attribute will trigger the
   action only if the variant label contains no original code points
   other than those with a reflexive mapping (see Section 5.2.5).
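
   To make the three triggers concrete, suppose a two-code-point variant
   label was generated with the recorded dispositions "simplified" and
   "both", the second coming from a reflexive mapping.  The disposition
   names follow Section 7.2.2; the label itself is hypothetical.  Each
   action below is considered in isolation:

```xml
<!-- triggers: at least one recorded value is "simplified" -->
<action disp="block" any-variant="simplified"/>

<!-- does not trigger: "both" is not among the listed values -->
<action disp="block" all-variants="simplified"/>

<!-- triggers: every recorded value is listed -->
<action disp="allocate" all-variants="simplified both"/>

<!-- triggers: every value is listed and no code point is unmapped -->
<action disp="allocate" only-variants="simplified both"/>
```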

   One of these variant disposition triggers may be used by itself or in
   conjunction with an attribute matching or not-matching a rule.  If
   variant triggers and rule-matching triggers are used together, the
   label MUST "match" or respectively "not-match" the specified rule,
   AND satisfy the conditions on the disposition values given by the
   "any-variant", "all-variants", or "only-variants" attribute.

7.2.2.  Example for RFC3743-style Tables

   This section gives an example of using variant disposition triggers,
   combined with variants with reflexive mappings (Section 5.2.5), to
   achieve LGRs that implement tables like those defined according to
   [RFC3743], where the goal is to allow only variants that consist
   entirely of simplified or traditional variants, in addition to the
   original label.

   Assume an LGR where all variants have been given suitable "disp"
   attributes of "block", "simplified", "traditional", or "both",
   similar to the one in Appendix B.  Given such an LGR, the following
   example actions evaluate the disposition for the variant label:

       <action disp="block" any-variant="block" />
       <action disp="allocate" only-variants="simplified both" />
       <action disp="allocate" only-variants="traditional both" />
       <action disp="block" any-variant="simplified traditional" />
       <action disp="allocate" />

   The first action matches any variant label for which at least one of
   the code point variants carries the disposition "block".  The second
   matches any variant label for which all of the code point variants
   carry the disposition "simplified" or "both", in other words an
   all-simplified label.  The third matches any label for which all
   variants carry the disposition "traditional" or "both", in other
   words an all-traditional label.  The second and third actions are not
   triggered by any variant label containing original code points,
   unless each such code point has a variant defined with a reflexive
   mapping (Section 5.2.5).

   The final two actions rely on the fact that actions are evaluated in
   sequence, and that the first action triggered also defines the final
   disposition for a variant label (see Section 7.4).  They further rely
   on the assumption that the only variants with disposition "both" are
   also identity variants.

   Given these assumptions, any remaining simplified or traditional
   variants must then be part of a mixed label, and so are blocked; all
   labels surviving to the last action are original code points only
   (that is, the original label).

   The assumption on identity mapping made above does not necessarily
   hold, so this scheme needs some refinements to cover tables where it
   is violated.  For a more complete example, see Appendix B.

7.3.  Recommended Disposition Values

   The precise nature of the policy action taken in response to a
   disposition and the name of the corresponding "disp" attributes are
   only partially defined here.  It is strongly RECOMMENDED to use the
   following dispositions only with their conventional sense.

   invalid  The resulting string is not a valid label.  This disposition
        may be assigned implicitly, see Section 7.5.  No variant labels
        should be generated from a variant mapping with this
        disposition.

   block  The resulting string is a valid label, but should be blocked
        from registration.  This would typically apply to a derived
        variant that is undesirable because it has no practical use or
        is confusingly similar to some other label.

   allocate  The resulting string should be reserved for use by the same
        operator of the origin string, but not automatically allocated
        for use.

   activate  The resulting string should be activated for use.  (This is
        the typical default action if no dispositions are defined and is
        known as a "preferred" variant in [RFC3743].)

7.4.  Precedence

   Actions are applied in the order of their appearance in the file.
   This defines their relative precedence.  The first action triggered
   by a label defines the disposition for that label.  To define a
   specific order of precedence, list the actions in the desired order.

   The conventional order of precedence for the actions defined in
   Section 7.3 is "invalid", "block", "allocate", "activate".  This
   default precedence is used for the default actions defined in
   Section 7.6.

7.5.  Implied Actions

   The context rules on code points ("not-when" or "when" rules) carry
   an implied action with a disposition of "invalid" (not eligible).
   These rules are evaluated at the time the code points for a label or
   its variant labels are checked for validity (see Section 8).  In
   other words, they are evaluated before any of the whole-label
   evaluation rules and with higher precedence.  The context rules for
   variant mappings are evaluated when variants are generated and/or
   when variant tables are made symmetric and transitive.  They have an
   implied action with a disposition of "invalid" (undefined) which
   means a putative variant mapping does not exist whenever the given
   context matches a "not-when" rule or fails to match a "when" rule
   specified for that mapping.

   Note that such a non-existing variant mapping is different from a
   blocked variant, which is a variant code point mapping that exists
   but results in a label that may not be allocated.

7.6.  Default Actions

   As described in Section 7, any variant mapping may be given a "disp"
   attribute defining a disposition.  An action containing an
   "any-variant" or "all-variants" attribute relates these disposition
   values to a resulting disposition for the entire variant label.

   If no actions are defined for the standard disposition values of
   "invalid", "block", "allocate" and "activate", then the default
   actions shown below apply, in their default order of precedence (see
   Section 7.4).  This default order for evaluating dispositions applies
   only to labels that triggered no explicitly defined actions and which
   are therefore handled by default actions.  Default actions have a
   lower order of precedence than explicit actions (see Section 8.3).

   The default actions for variant labels are defined as follows:

       <action disp="invalid" any-variant="invalid"/>
       <action disp="block" any-variant="block"/>
       <action disp="allocate" any-variant="allocate"/>
       <action disp="activate" all-variants="activate"/>

   A final default action sets the disposition to "allocate" for any
   label matching the repertoire for which no other action has been
   triggered (catch-all).

       <action disp="allocate" />
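
   As an illustration, consider the following fragment.  It borrows the
   code points and "disp" values from Appendix A, but is assumed here to
   belong to an LGR that defines no explicit actions at all:

```xml
<char cp="4E16">
  <var cp="4E17" disp="block"/>
  <var cp="534B" disp="allocate"/>
</char>
<!-- no <action> elements: only the default actions apply -->
```

   For the original label U+4E16, the variant label U+4E17 records the
   disposition "block" and triggers the second default action, while
   the variant label U+534B records "allocate" and triggers the third.
   The original label is not a variant label and receives "allocate"
   from the catch-all (see Section 8.3).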

8.  Processing a Label Against an LGR

8.1.  Determining Eligibility for a Label

   In order to use a table to test a specific domain label for
   membership in the LGR, a consumer of the LGR must iterate through
   each code point within a given U-label, and test that each code point
   is a member of the LGR.  If any code point is not a member of the
   LGR, it shall be deemed as not eligible in accordance with the table.

   A code point is deemed a member of the table when it is listed with
   the "char" element, and all necessary conditions listed in "when" or
   "not-when" attributes are correctly satisfied.

   A label must also not trigger any action that results in a
   disposition of "invalid" or equivalent, otherwise it is deemed not
   eligible.  (This step may be deferred until dispositions are
   determined.)

   For LGRs that contain reflexive variant mappings (defined in
   Section 5.2.5) the evaluation of dispositions must be deferred until
   variants are generated.  In essence, tables that use this feature
   treat the original as the (identity) variant of itself.  For such
   tables, the ordinary iteration over code points can at best be used
   to exclude a subset of invalid labels, effectively a pre-screening.

8.2.  Determining Variants for a Label

   For a given eligible label, the set of variant labels is deemed to
   consist of each possible permutation of original code points and
   "var" elements, whereby all "when" and "not-when" attributes are
   correctly satisfied for each code point or var element in the given
   permutation and all applicable whole label evaluation rules are
   satisfied as follows:

   o  Create each possible permutation of a label, by substituting each
      code point or code point sequence in turn by any defined variant
      mapping (including any reflexive mappings).

   o  Apply variant mappings with "when" or "not-when" attributes only
      if the conditions are satisfied.

   o  Record each of the "disp" values on the variant mappings used in
      creating a given variant label; for any unmapped code point record
      the "disp" value of any variant with a reflexive mapping (see
      Section 5.2.5).

   o  Determine the disposition for each variant label per Section 8.3.

   o  If the disposition is "invalid", remove the label from the set.

   o  If final evaluation of the disposition for the original label per
      Section 8.3 results in a disposition of "invalid" or equivalent,
      remove all associated variant labels from the set.
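
   The steps above can be traced on a small, hypothetical fragment.  The
   code points come from Appendix A; the reflexive mapping on U+4E16 and
   its disposition "both" are assumptions added for illustration.

```xml
<char cp="4E16">
  <var cp="4E16" disp="both" comment="reflexive"/>
  <var cp="4E17" disp="block"/>
</char>
<char cp="4E17">
  <var cp="4E16" disp="allocate"/>
</char>
<!--
  Permutations of the label U+4E16 U+4E17 and the "disp" values
  recorded for each:

    U+4E16 U+4E17   both             (the original label)
    U+4E16 U+4E16   both, allocate
    U+4E17 U+4E17   block
    U+4E17 U+4E16   block, allocate
-->
```

   The second code point of the original label has no reflexive mapping,
   so nothing is recorded for it when it is left unmapped.  Each list of
   recorded values is then evaluated against the actions as described in
   Section 8.3; for instance, an action with any-variant="block" would
   trigger for the last two permutations.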

8.3.  Determining a Disposition for a Label or Variant Label

   For a given label (variant or original), its disposition is
   determined by evaluating in order of their appearance all actions for
   which the label or variant label satisfies the conditions.

   o  For any label, the disposition is given by the value of the "disp"
      attribute for the first action triggered by the label.  An action
      is triggered, if

      *  the label matches or doesn't match the whole label evaluation
         rule, given in the "match" or "not-match" attribute
         respectively for that action;

      *  any or all of the recorded variant dispositions for a variant
         label match the dispositions specified in an "any-variant",
         "all-variants", or "only-variants" attribute, respectively, for
         that action, and in case of "only-variants" the label contains
         only code points that are the target of applied variant
         mappings;

      *  the label matches or doesn't match the whole label evaluation
         rule, given in the "match" or "not-match" attribute
         respectively for that action and any or all of the recorded
         variant dispositions for a variant label match the dispositions
         specified in an "any-variant", "all-variants", or
         "only-variants" attribute, respectively, for that action, and
         in case of "only-variants" the label contains only code points
         that are the target of applied variant mappings; or

      *  the action does not contain any "match", "not-match",
         "any-variant", "all-variants" or "only-variants" attributes
         (catch-all).

   o  For any remaining variant label, assign the variant label the
      disposition using the default actions defined in Section 7.6.  For
      this step, variant dispositions outside the predefined recommended
      set (see Section 7.3) are ignored.

   o  For any remaining label, set the disposition to "allocate".
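
   The four ways of triggering an action can be lined up in a single
   hypothetical action list.  The rule names "leading-letter" and
   "preferred" are taken from Section 7 and Appendix A and are assumed
   to be defined in the same LGR:

```xml
<!-- rule trigger only -->
<action disp="invalid" not-match="leading-letter"/>
<!-- variant disposition trigger only -->
<action disp="block" any-variant="block"/>
<!-- both conditions must hold -->
<action disp="activate" match="preferred" all-variants="allocate"/>
<!-- catch-all: no trigger attributes -->
<action disp="allocate"/>
```

   Because the first triggered action defines the disposition, a label
   that fails "leading-letter" is "invalid" even if it would also
   satisfy one of the later actions.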

9.  Conversion to and from Other Formats

   Both [RFC3743] and [RFC4290] provide different grammars for IDN
   tables.  These formats are unable to fully cater for the increased
   requirements of contemporary IDN variant policies.

   This specification is a superset of functionality provided by these
   IDN table formats, thus any table expressed in those formats can be
   expressed in this format.  Automated conversion can be conducted
   between tables conformant with the grammar specified in each
   document.

   For notes on how to translate an RFC 3743-style table, see
   Appendix B.

10.  IANA Considerations

   This document does not specify any IANA actions.

11.  Security Considerations

   There are no security considerations for this memo.

12.  References

   [ASIA-TABLE]
              DotAsia Organisation, ".ASIA ZH IDN Language Table".

   [LGR-PROCEDURE]
              Internet Corporation for Assigned Names and Numbers,
              "Procedure to Develop and Maintain the Label Generation
              Rules for the Root Zone in Respect of IDNA Labels".

   [RFC3339]  Klyne, G., Ed. and C. Newman, "Date and Time on the
              Internet: Timestamps", RFC 3339, July 2002.

   [RFC3743]  Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint
              Engineering Team (JET) Guidelines for Internationalized
              Domain Names (IDN) Registration and Administration for
              Chinese, Japanese, and Korean", RFC 3743, April 2004.

   [RFC4290]  Klensin, J., "Suggested Practices for Registration of
              Internationalized Domain Names (IDN)", RFC 4290,
              December 2005.

   [RFC5564]  El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman,
              "Linguistic Guidelines for the Use of the Arabic Language
              in Internet Domains", RFC 5564, February 2010.

   [RFC5646]  Phillips, A. and M. Davis, "Tags for Identifying
              Languages", BCP 47, RFC 5646, September 2009.

   [RFC5892]  Faltstrom, P., "The Unicode Code Points and
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5892, August 2010.

   [TDIL-HINDI]
              Technology Development for Indian Languages (TDIL)
              Programme, "Devanagari Script Behaviour for Hindi".

   [UAX42]    Unicode Consortium, "Unicode Character Database in XML".

   [XML]      World Wide Web Consortium, "Extensible Markup Language
              (XML) 1.0".

Appendix A.  Example Table

   The following presents a sample XML LGR showing a nearly complete
   collection of the elements and attributes defined in this
   specification in a somewhat typical context.

   <?xml version="1.0" encoding="utf-8"?>
   <lgr xmlns="http://www.iana.org/lgr/0.1">

     <meta>
       <version>1</version>
       <date>2010-01-01</date>
       <language>sv</language>
       <domain>example</domain>
       <description type="text/html">
           <![CDATA[
           This language table was developed with the
           <a href="http://swedish.example/">Swedish
           examples institute</a>.
           ]]>
       </description>
       <references>
         <reference id="0" >The Unicode Standard 6.3</reference>
         <reference id="1" >RFC 5892</reference>
         <reference id="2" >Big-5: Computer Chinese Glyph and Character
            Code Mapping Table, Technical Report C-26, 1984</reference>
       </references>
     </meta>
     <data>
       <char cp="002D" ref="1" comment="HYPHEN" />
       <range first-cp="0030" last-cp="0039" ref="1" tag="digit" />
       <range first-cp="0061" last-cp="007A" ref="1" tag="letter" />
       <range first-cp="0370" last-cp="0380" />
       <char cp="00B7" when="catalan-middle-dot" />
       <char cp="200D" when="joiner" />
       <char cp="4E16" tag="preferred" ref="0">
         <var cp="4E17" disp="block" ref="2" />
         <var cp="534B" disp="allocate" ref="2" />
       </char>
       <char cp="4E17" ref="0">
         <var cp="4E16" disp="allocate" ref="2" />
         <var cp="534B" disp="allocate" ref="2" />
       </char>
       <char cp="534B" ref="0">
         <var cp="4E16" disp="allocate" ref="2" />
         <var cp="4E17" disp="block" ref="2" />
       </char>
     </data>

     <rules>
       <class name="virama" property="ccc:9" />
       <rule name="catalan-middle-dot" ref="0">
           <look-behind>
               <char cp="006C" />
           </look-behind>
           <anchor />
           <look-ahead>
               <char cp="006C" />
           </look-ahead>
       </rule>
       <rule name="joiner" ref="1">
           <look-behind>
               <class byref="virama" />
           </look-behind>
           <anchor />
       </rule>
       <rule name="example" >
           <difference>
               <complement>
                   <class comment="use shorthand class notation">
                       006E 0070-0078
                   </class>
               </complement>
               <class comment="use standard notation">
                   <range first-cp="0000" last-cp="001F" />
                   <char cp="007F" />
               </class>
           </difference>
       </rule>
       <rule name="preferred"
             comment="non-empty label of preferred code points">
           <class byref="preferred" count="1+" />
       </rule>
       <action disp="example" match="example" />
       <action disp="block" any-variant="block" />
       <action disp="activate" all-variants="allocate"
             match="preferred" />
       <action disp="activate"  match="preferred" />
     </rules>
   </lgr>

Appendix B.  How to Translate RFC 3743 based Tables into the XML Format

   As background, the [RFC3743] rules work as follows:

   1.  The Original (requested) label is checked to make sure that all
       the code points are a subset of the repertoire.

   2.  If it passes the check, the Original label is allocatable.

   3.  Generate the all-simplified and all-traditional variant labels
       (union of all the labels generated using all the simplified
       variants of the code points) for allocation.

   To illustrate by example, here is one of the more complicated sets of
   variants:

       U+4E7E
       U+4E81
       U+5E72
       U+5E79
       U+69A6
       U+6F27

   The following shows the relevant section of the Chinese language
   table published by the .ASIA registry [ASIA-TABLE].  Its entries
   read:

    <codepoint>;<simpl-variant(s)>;<trad-variant(s)>;<other-variant(s)>

   These are the lines corresponding to the set of variants listed above:

   U+4E7E;U+4E7E,U+5E72;U+4E7E;U+4E81,U+5E72,U+6F27,U+5E79,U+69A6
   U+4E81;U+5E72;U+4E7E;U+5E72,U+6F27,U+5E79,U+69A6
   U+5E72;U+5E72;U+5E72,U+4E7E,U+5E79;U+4E7E,U+4E81,U+69A6,U+6F27
   U+5E79;U+5E72;U+5E79;U+69A6,U+4E7E,U+4E81,U+6F27
   U+69A6;U+5E72;U+69A6;U+5E79,U+4E7E,U+4E81,U+6F27
   U+6F27;U+4E7E;U+6F27;U+4E81,U+5E72,U+5E79,U+69A6

   The corresponding data section in the XML format would look like this:

       <data>
       <char cp="4E7E">
       <var cp="4E7E" disp="both" comment="identity" />
       <var cp="4E81" disp="block" />
       <var cp="5E72" disp="simp" />
       <var cp="5E79" disp="block" />
       <var cp="69A6" disp="block" />
       <var cp="6F27" disp="block" />
       </char>
       <char cp="4E81">
       <var cp="4E7E" disp="trad" />
       <var cp="5E72" disp="simp" />
       <var cp="5E79" disp="block" />
       <var cp="69A6" disp="block" />
       <var cp="6F27" disp="block" />
       </char>
       <char cp="5E72">
       <var cp="4E7E" disp="trad"/>
       <var cp="4E81" disp="block"/>
       <var cp="5E72" disp="both" comment="identity"/>
       <var cp="5E79" disp="trad"/>
       <var cp="69A6" disp="block"/>
       <var cp="6F27" disp="block"/>
       </char>
       <char cp="5E79">
       <var cp="4E7E" disp="block"/>
       <var cp="4E81" disp="block"/>
       <var cp="5E72" disp="simp"/>
       <var cp="5E79" disp="trad" comment="identity"/>
       <var cp="69A6" disp="block"/>
       <var cp="6F27" disp="block"/>
       </char>
       <char cp="69A6">
       <var cp="4E7E" disp="block"/>
       <var cp="4E81" disp="block"/>
       <var cp="5E72" disp="simp"/>
       <var cp="5E79" disp="block"/>
       <var cp="69A6" disp="trad" comment="identity"/>
       <var cp="6F27" disp="block"/>
       </char>
       <char cp="6F27">
       <var cp="4E7E" disp="simp"/>
       <var cp="4E81" disp="block"/>
       <var cp="5E72" disp="block"/>
       <var cp="5E79" disp="block"/>
       <var cp="69A6" disp="block"/>
       <var cp="6F27" disp="trad" comment="identity"/>
       </char>
       </data>

   Here the simplified variants have been given a disposition of "simp",
   the traditional variants one of "trad" and all other ones are given
   "block".

   Note that some variant mappings map to themselves (identity); that
   is, the mapping is reflexive (see Section 5.2.5).  In creating the
   permutation of all variant labels, these mappings have no effect,
   other than adding a value to the variant disposition list for the
   variant label containing them.

   Because some variant mappings appear in more than one column, while
   the XML format allows only a single disposition value, they have been
   given the disposition of "both".

   In the example so far, all of these are also mappings where source
   and target are identical, that is, reflexive mappings as defined in
   Section 5.2.5.

   Given a label "U+4E7E U+4E81", the following labels would be ruled
   allocatable under [RFC3743] based on how it is commonly implemented
   in domain registries:

       Original label:     U+4E7E U+4E81
       Simplified label 1: U+4E7E U+5E72
       Simplified label 2: U+5E72 U+5E72
       Traditional label:  U+4E7E U+4E7E

   However, if we generated allocatable labels without regard to the
   simplified-to-traditional variants, we would end up with an extra
   allocatable label: "U+5E72 U+4E7E".  That label consists of a
   Simplified Chinese (SC) character and a Traditional Chinese (TC)
   character and should not be allocatable, but it would be the result
   of a straight permutation of all variants with a disposition other
   than "block".

   To more fully resolve the dispositions requires several actions to be
   defined as described in Section 7.2.2.  After blocking all labels
   that contain a variant with disposition "block", these actions will
   first allocate all labels that consist entirely of variants
   (including variants with reflexive mappings) that are "simp" or
   "both", then do likewise for labels that are entirely "trad" or
   "both".  All surviving labels containing any one of the dispositions
   "simp" or "trad" are now known to be part of an undesirable mixed
   simplified/traditional label and are blocked.  Finally, the remaining
   labels must be code points without variants or reflexive variants of
   type "both", in other words, the original label.

     <rules>
       <!--Action elements - order defines precedence-->
       <action disp="block" any-variant="block"
           comment="filter out by blocked code point" />
       <action disp="allocate"
           only-variants="simp both"
           comment="only allocate if simplified variant,
           including reflexive (identity) mapping" />
       <action disp="allocate"
           only-variants="trad both"
           comment="only allocate if traditional variant,
           including reflexive (identity) mapping" />
       <action disp="block"
           any-variant="simp trad"
           comment="filter out any remaining variant code point" />
       <action disp="activate" comment="surviving labels must be
           original labels" />
     </rules>
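
   The first-match evaluation of these actions could be modelled as in
   the following Python sketch.  It is illustrative only: the function
   name "resolve", the representation of a label as a list of
   dispositions, and the dispositions given to the sample labels are
   assumptions made for this example.  None stands for an original
   code point to which no variant mapping was applied.  "any-variant"
   is read as "at least one mapping used has a listed disposition" and
   "only-variants" as "every code point results from a mapping with a
   listed disposition".

```python
# Actions in document order; the first action whose condition is met wins.
ACTIONS = [
    ("block",    "any-variant",   {"block"}),
    ("allocate", "only-variants", {"simp", "both"}),
    ("allocate", "only-variants", {"trad", "both"}),
    ("block",    "any-variant",   {"simp", "trad"}),
    ("activate", None,            None),   # unconditional catch-all
]

def resolve(dispositions):
    """Return the disposition assigned to a label.

    `dispositions` has one entry per code point: the disposition of the
    variant mapping that produced it, or None for an original code point.
    """
    used = [d for d in dispositions if d is not None]
    for result, condition, wanted in ACTIONS:
        if condition is None:
            return result
        if condition == "any-variant" and any(d in wanted for d in used):
            return result
        if (condition == "only-variants" and dispositions
                and all(d in wanted for d in dispositions)):
            return result
    return None

# Assumed dispositions for labels derived from "U+4E7E U+4E81".
labels = {
    "U+4E7E U+4E81": ["both", None],     # original label
    "U+4E7E U+5E72": ["both", "simp"],   # simplified label 1
    "U+5E72 U+5E72": ["simp", "simp"],   # simplified label 2
    "U+4E7E U+4E7E": ["both", "trad"],   # traditional label
    "U+5E72 U+4E7E": ["simp", "trad"],   # mixed simplified/traditional
    "U+4E81 U+4E7E": ["block", "trad"],  # contains a blocked variant
}
for label, dispositions in labels.items():
    print(label, "->", resolve(dispositions))
```

   With these inputs the mixed label and the label containing a
   blocked variant are blocked, the three single-style labels are
   allocated, and only the original label reaches the final action.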

   In the example above, variants with the disposition "both" occur only
   as part of identity mappings (as pointed out in the comments).  The
   scheme described so far relies on the assumption that this is always
   the case.  However, consider the following set of variants:

       U+62E0;U+636E;U+636E;U+64DA
       U+636E;U+636E;U+64DA;U+62E0
       U+64DA;U+636E;U+64DA;U+62E0

   for which the corresponding XML would be:

       <char cp="62E0">
       <var cp="636E" disp="both" comment="BOTH, but NOT identity" />
       <var cp="64DA" disp="block" />
       </char>
       <char cp="636E">
       <var cp="636E" disp="simp" comment="identity, but not BOTH" />
       <var cp="64DA" disp="trad" />
       <var cp="62E0" disp="block" />
       </char>
       <char cp="64DA">
       <var cp="636E" disp="simp" />
       <var cp="64DA" disp="trad" comment="identity" />
       <var cp="62E0" disp="block" />
       </char>
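
   The correspondence between the rows and the XML above can be
   sketched in Python as follows.  The column order (code point;
   simplified variants; traditional variants; other variants) and the
   rule that a variant listed only in the last column becomes "block"
   are read off this example and should be treated as assumptions.
   "parse_field" and "row_to_xml" are hypothetical helper names.

```python
ROWS = [
    "U+62E0;U+636E;U+636E;U+64DA",
    "U+636E;U+636E;U+64DA;U+62E0",
    "U+64DA;U+636E;U+64DA;U+62E0",
]

def parse_field(field):
    """Split a comma-separated field into bare hexadecimal code points."""
    return [v.strip().replace("U+", "")
            for v in field.split(",") if v.strip()]

def row_to_xml(row):
    """Render one table row as a <char> element with <var> children."""
    cp_field, simp_field, trad_field, other_field = row.split(";")
    cp = parse_field(cp_field)[0]
    simp = parse_field(simp_field)
    trad = parse_field(trad_field)
    other = parse_field(other_field)

    lines = ['<char cp="%s">' % cp]
    seen = set()
    for var in simp + trad + other:
        if var in seen:          # emit each variant only once
            continue
        seen.add(var)
        if var in simp and var in trad:
            disp = "both"
        elif var in simp:
            disp = "simp"
        elif var in trad:
            disp = "trad"
        else:
            disp = "block"
        lines.append('<var cp="%s" disp="%s" />' % (var, disp))
    lines.append("</char>")
    return "\n".join(lines)

print("\n".join(row_to_xml(row) for row in ROWS))
```

   The sketch omits the comment attributes, which carry no meaning for
   processing.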

   What is needed to make such variant sets work is a way to capture
   when a disposition is associated with an identity or reflexive
   mapping, and when it is associated with an ordinary variant mapping.

   This can be done by prefixing the disposition with "i-" whenever
   the mapping is an identity mapping.  For example, the last "trad" in
   the preceding example would become "i-trad".

   With all the dispositions prepared in this way, only a slight
   modification to the actions is needed to yield the correct set of
   allocatable labels:

      <action disp="block" any-variant="block" />
      <action disp="allocate" only-variants="simp i-simp both i-both" />
      <action disp="allocate" only-variants="trad i-trad both i-both" />
      <action disp="block" all-variants="simp trad both" />
      <action disp="allocate" />

   The first three actions get triggered by the same labels as before.

   The fourth action blocks any label that combines an original code
   point with any of the variant mappings, yet lets through all labels
   that are a combination of only original code points (everything
   having either no variant mapping or one of the identity mappings).
   These are the original labels and they are allocated in the last
   action.

   With this modification, all RFC 3743-style tables can be converted
   to XML.  By using the above set of actions, all variant labels that
   consist entirely of variants preferred for simplified or for
   traditional, respectively, will be allocated, as will the original
   label.  All other variant labels will be blocked.
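
   The preparation step described above, marking identity mappings
   with an "i-" prefix, is mechanical.  The following Python sketch
   shows one possible way to apply it with the standard library XML
   parser.  The wrapping "data" element without a namespace and the
   function name "mark_identity_mappings" are assumptions made for
   this example.

```python
import xml.etree.ElementTree as ET

# The second variant set from this appendix, wrapped in a <data>
# element so that it parses as a single document (comments omitted).
SOURCE = """
<data>
  <char cp="62E0">
    <var cp="636E" disp="both" />
    <var cp="64DA" disp="block" />
  </char>
  <char cp="636E">
    <var cp="636E" disp="simp" />
    <var cp="64DA" disp="trad" />
    <var cp="62E0" disp="block" />
  </char>
  <char cp="64DA">
    <var cp="636E" disp="simp" />
    <var cp="64DA" disp="trad" />
    <var cp="62E0" disp="block" />
  </char>
</data>
"""

def mark_identity_mappings(root):
    """Prefix the disposition of every identity mapping with "i-"."""
    for char in root.iter("char"):
        for var in char.findall("var"):
            disp = var.get("disp")
            is_identity = var.get("cp") == char.get("cp")
            if is_identity and disp and not disp.startswith("i-"):
                var.set("disp", "i-" + disp)
    return root

root = mark_identity_mappings(ET.fromstring(SOURCE))
for char in root.iter("char"):
    for var in char.findall("var"):
        print(char.get("cp"), "->", var.get("cp"), var.get("disp"))
```

   Only the two identity mappings change: the mapping of U+636E to
   itself becomes "i-simp" and that of U+64DA to itself becomes
   "i-trad".  The "both" mapping from U+62E0 to U+636E is left as is
   because it is not an identity mapping.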

Appendix C.  Indic Syllable Structure Example

   In LGRs for Indic scripts it may be desirable to restrict valid
   labels to sequences of valid Indic syllables, or aksharas.  This
   appendix gives a sample set of rules designed to enforce this
   restriction.

   We start with the following BNF for an akshara, which has been
   published in "Devanagari Script Behavior for Hindi" [TDIL-HINDI].
   Even where it is not directly valid for other languages and scripts
   used in India, it is at least similar to the equivalent definitions
   used for them.

       V[m]|{C[N]H}C[N](H|[v][m])

   Where:

   V    (upper case) is any independent vowel

   m    is any vowel modifier (Devanagari Anusvara, Visarga, and
        Candrabindu)

   C    is any consonant (with inherent vowel)

   N    is Nukta

   H    is a Halant (or Virama)

   v    (lower case) is any dependent vowel sign (matra)

   {}   encloses items which may be repeated one or more times

   [ ]  encloses items which may or may not be present

   |    separates items, out of which only one can be present

   The Unicode property "Indic_Syllabic_Category" (short name "InSC")
   corresponds rather directly to the classification of characters in
   the BNF above.  By using it, we can translate the BNF directly into
   a set of WLE rules matching the definition of an akshara.

 <rules>
    <!--Character Class Definitions go here-->
    <class name="halant" property="InSC:Virama" />
    <union name="vowel-modifier">
      <class property="InSC:Visarga" />
      <class property="InSC:Bindu" comment="includes anusvara" />
    </union>
    <!--Whole label evaluation and Context rules go here-->

    <rule name="consonant-with-optional-nukta">
      <class property="InSC:Consonant" />
      <class property="InSC:Nukta" count="0:1" />
    </rule>
    <rule name="independent-vowel-with-optional-modifier">
      <class property="InSC:Vowel_Independent" />
      <class byref="vowel-modifier" count="0:1" />
    </rule>
    <rule name="optional-dependent-vowel-with-optional-modifier">
      <class property="InSC:Vowel_Dependent" count="0:1" />
      <class byref="vowel-modifier" count="0:1" />
    </rule>
    <rule name="consonant-cluster">
      <rule count="0+">
        <rule byref="consonant-with-optional-nukta" />
        <class byref="halant" />
      </rule>
      <rule byref="consonant-with-optional-nukta" />
      <choice>
        <class byref="halant" />
        <rule byref="optional-dependent-vowel-with-optional-modifier" />
      </choice>
    </rule>
    <rule name="akshara">
      <choice>
        <rule byref="independent-vowel-with-optional-modifier" />
        <rule byref="consonant-cluster" />
      </choice>
    </rule>
    <rule name="WLE-akshara-or-other" comment="series of one or
        more aksharas, possibly alternating with other types of
        code points such as digits">
      <start />
      <choice count="1+">
        <class property="InSC:Other" />
        <rule byref="akshara"  />
      </choice>
      <end />
    </rule>
    <!--Action elements go here - order defines precedence-->
    <action disp="invalid" not-match="WLE-akshara-or-other" />
  </rules>

   With the rules and classes as defined above, the final action assigns
   a disposition of "invalid" to all labels that are not composed of a
   sequence of well-formed aksharas, optionally interspersed with other
   characters such as digits.
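
   The same structure can be tried out with an ordinary regular
   expression, as in the following Python sketch.  It is an
   illustration, not an implementation of the LGR rules.  The table
   "CLASS_OF" is a small hand-picked subset of Devanagari code points,
   and treating ASCII digits as "other" code points is an assumption.
   The repetition of the consonant-plus-halant group follows the
   count="0+" used in the XML rules above.

```python
import re

# Hypothetical subset of code points, mapped to the letters of the BNF.
# "O" stands for code points matched by the "other" class.
CLASS_OF = {
    "\u0905": "V", "\u0906": "V",                  # letters A, AA
    "\u0915": "C", "\u0937": "C", "\u091C": "C",   # KA, SSA, JA
    "\u093C": "N",                                 # nukta
    "\u094D": "H",                                 # virama (halant)
    "\u093E": "v", "\u093F": "v",                  # vowel signs AA, I
    "\u0901": "m", "\u0902": "m", "\u0903": "m",   # candrabindu,
                                                   # anusvara, visarga
}
CLASS_OF.update({digit: "O" for digit in "0123456789"})

# V[m] | {C[N]H} C[N] (H | [v][m]), with the group repeated 0+ times.
AKSHARA = r"(?:Vm?|(?:CN?H)*CN?(?:H|v?m?))"
# One or more aksharas, possibly alternating with "other" code points.
WLE = re.compile(r"(?:O|" + AKSHARA + r")+")

def is_valid(label):
    """Check a label against the akshara-or-other structure."""
    # Unknown code points map to "?", which the expression never matches.
    classes = "".join(CLASS_OF.get(ch, "?") for ch in label)
    return WLE.fullmatch(classes) is not None

print(is_valid("\u0915\u094D\u0937\u093E"))   # C H C v
print(is_valid("\u093E\u0915"))               # v C
```

   The first label is a consonant cluster followed by a vowel sign and
   is accepted.  The second starts with a dependent vowel sign, which
   cannot begin an akshara, and is rejected.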

   As of this writing, the relevant Unicode property is still
   considered provisional.  However, it could be replicated by tagging
   repertoire values directly in the LGR, which would remove the
   dependency on the Unicode Standard altogether.

Appendix D.  RelaxNG Schema

<grammar xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
   xmlns="http://relaxng.org/ns/structure/1.0" ns="http://www.iana.org/lgr/0.1"
   datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <define name="language-tag">
    <data type="token"/>
  </define>
  <define name="domain-name">
    <text/>
  </define>
  <define name="code-point">
    <data type="token">
      <param name="pattern">[0-9A-F]{4,6}</param>
    </data>
  </define>
  <define name="code-point-sequence">
    <data type="token">
      <param name="pattern">[0-9A-F]{4,6}( [0-9A-F]{4,6})+</param>
    </data>
  </define>
  <define name="code-point-literal">
    <choice>
      <ref name="code-point"/>
      <ref name="code-point-sequence"/>
    </choice>
  </define>
  <define name="date">
    <data type="token">
      <param name="pattern">\d{4}-\d\d-\d\d</param>
    </data>
  </define>
  <define name="rule-ref">
    <data type="IDREF"/>
  </define>
  <define name="tag">
    <text/>
  </define>
  <define name="identifier">
    <data type="ID"/>
  </define>
  <define name="class-ref">
    <text/>
  </define>
  <define name="count-pattern">
    <data type="token">
      <param name="pattern">\d+(\+|:\d+)?</param>
    </data>

  </define>
  <define name="char">
    <element name="char">
      <attribute name="cp">
        <ref name="code-point-literal"/>
      </attribute>
      <optional>
        <attribute name="comment"/>
      </optional>
      <optional>
        <attribute name="when">
          <ref name="rule-ref"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="not-when">
          <ref name="rule-ref"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="tag">
          <ref name="tag"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="ref"/>
      </optional>
      <zeroOrMore>
        <ref name="variant"/>
      </zeroOrMore>
    </element>
  </define>
  <define name="char-single">
    <element name="char">
      <attribute name="cp">
        <ref name="code-point"/>
      </attribute>
      <optional>
        <attribute name="comment"/>
      </optional>
      <optional>
        <attribute name="ref"/>
      </optional>
    </element>
  </define>
  <define name="range">
    <element name="range">
      <attribute name="first-cp">

        <ref name="code-point-literal"/>
      </attribute>
      <attribute name="last-cp">
        <ref name="code-point-literal"/>
      </attribute>
      <optional>
        <attribute name="comment"/>
      </optional>
      <optional>
        <attribute name="tag">
          <ref name="tag"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="ref"/>
      </optional>
      <text/>
    </element>
  </define>
  <define name="variant">
    <element name="var">
      <attribute name="cp">
        <ref name="code-point-literal"/>
      </attribute>
      <optional>
        <attribute name="type"/>
      </optional>
      <optional>
        <attribute name="when">
          <ref name="rule-ref"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="not-when">
          <ref name="rule-ref"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="comment"/>
      </optional>
      <optional>
        <attribute name="disp"/>
      </optional>
      <optional>
        <attribute name="ref"/>
      </optional>
    </element>
  </define>

  <define name="class-invocation">
    <element name="class">
      <attribute name="byref">
        <ref name="class-ref"/>
      </attribute>
      <optional>
        <attribute name="count">
          <ref name="count-pattern"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="comment"/>
      </optional>
    </element>
  </define>
  <define name="class-declaration">
    <element name="class">
      <optional>
        <attribute name="name">
          <ref name="identifier"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="count">
          <ref name="count-pattern"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="comment"/>
      </optional>
      <optional>
        <attribute name="ref"/>
      </optional>
      <choice>
        <attribute name="property"/>
        <oneOrMore>
          <choice>
            <ref name="char-single"/>
            <ref name="range"/>
          </choice>
        </oneOrMore>
        <text/>
      </choice>
    </element>
  </define>
  <define name="class-or-set-operator-nested">
    <choice>
      <ref name="class-invocation"/>

      <ref name="class-declaration"/>
      <ref name="set-operator"/>
    </choice>
  </define>
  <define name="class-or-set-operator-declaration">
    <choice>
      <ref name="class-declaration"/>
      <ref name="set-operator"/>
    </choice>
  </define>
  <define name="complement-operator">
    <element name="complement">
      <optional>
        <attribute name="name">
          <ref name="identifier"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="comment"/>
      </optional>
      <optional>
        <attribute name="count">
          <ref name="count-pattern"/>
        </attribute>
      </optional>
      <ref name="class-or-set-operator-nested"/>
    </element>
  </define>
  <define name="union-operator">
    <element name="union">
      <optional>
        <attribute name="name">
          <ref name="identifier"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="comment"/>
      </optional>
      <optional>
        <attribute name="count">
          <ref name="count-pattern"/>
        </attribute>
      </optional>
      <ref name="class-or-set-operator-nested"/>
      <oneOrMore>
        <ref name="class-or-set-operator-nested"/>
      </oneOrMore>
    </element>

  </define>
  <define name="intersection-operator">
    <element name="intersection">
      <optional>
        <attribute name="name">
          <ref name="identifier"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="comment"/>
      </optional>
      <optional>
        <attribute name="count">
          <ref name="count-pattern"/>
        </attribute>
      </optional>
      <ref name="class-or-set-operator-nested"/>
      <ref name="class-or-set-operator-nested"/>
    </element>
  </define>
  <define name="difference-operator">
    <element name="difference">
      <optional>
        <attribute name="name">
          <ref name="identifier"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="comment"/>
      </optional>
      <optional>
        <attribute name="count">
          <ref name="count-pattern"/>
        </attribute>
      </optional>
      <ref name="class-or-set-operator-nested"/>
      <ref name="class-or-set-operator-nested"/>
    </element>
  </define>
  <define name="symmetric-difference-operator">
    <element name="symmetric-difference">
      <optional>
        <attribute name="name">
          <ref name="identifier"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="comment"/>

      </optional>
      <optional>
        <attribute name="count">
          <ref name="count-pattern"/>
        </attribute>
      </optional>
      <ref name="class-or-set-operator-nested"/>
      <ref name="class-or-set-operator-nested"/>
    </element>
  </define>
  <define name="set-operator">
    <choice>
      <ref name="complement-operator"/>
      <ref name="union-operator"/>
      <ref name="intersection-operator"/>
      <ref name="difference-operator"/>
      <ref name="symmetric-difference-operator"/>
    </choice>
  </define>
  <define name="any-matcher">
    <element name="any">
      <optional>
        <attribute name="count">
          <ref name="count-pattern"/>
        </attribute>
      </optional>
    </element>
  </define>
  <define name="choice-matcher">
    <element name="choice">
      <optional>
        <attribute name="count">
          <ref name="count-pattern"/>
        </attribute>
      </optional>
      <ref name="match-operator"/>
      <oneOrMore>
        <ref name="match-operator"/>
      </oneOrMore>
    </element>
  </define>
  <define name="char-matcher">
    <element name="char">
      <attribute name="cp">
        <ref name="code-point-literal"/>
      </attribute>
      <optional>
        <attribute name="count">

          <ref name="count-pattern"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="comment"/>
      </optional>
      <optional>
        <attribute name="ref"/>
      </optional>
    </element>
  </define>
  <define name="start-matcher">
    <element name="start">
      <empty/>
    </element>
  </define>
  <define name="end-matcher">
    <element name="end">
      <empty/>
    </element>
  </define>
  <define name="anchor-matcher">
    <element name="anchor">
      <empty/>
    </element>
  </define>
  <define name="look-ahead-matcher">
    <element name="look-ahead">
      <optional>
        <attribute name="comment"/>
      </optional>
      <oneOrMore>
        <ref name="match-operator"/>
      </oneOrMore>
    </element>
  </define>
  <define name="look-behind-matcher">
    <element name="look-behind">
      <optional>
        <attribute name="comment"/>
      </optional>
      <oneOrMore>
        <ref name="match-operator"/>
      </oneOrMore>
    </element>
  </define>
  <define name="match-operator">
    <choice>

      <ref name="any-matcher"/>
      <ref name="choice-matcher"/>
      <ref name="start-matcher"/>
      <ref name="end-matcher"/>
      <ref name="char-matcher"/>
      <ref name="class-or-set-operator-nested"/>
      <ref name="rule-matcher"/>
      <ref name="anchor-matcher"/>
      <ref name="look-ahead-matcher"/>
      <ref name="look-behind-matcher"/>
    </choice>
  </define>
  <define name="rule-declaration-top">
    <element name="rule">
      <attribute name="name">
        <ref name="identifier"/>
      </attribute>
      <optional>
        <attribute name="comment"/>
      </optional>
      <optional>
        <attribute name="ref"/>
      </optional>
      <oneOrMore>
        <ref name="match-operator"/>
      </oneOrMore>
    </element>
  </define>
  <define name="rule-matcher">
    <element name="rule">
      <optional>
        <attribute name="count">
          <ref name="count-pattern"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="comment"/>
      </optional>
      <choice>
        <attribute name="byref">
          <ref name="rule-ref"/>
        </attribute>
        <oneOrMore>
          <ref name="match-operator"/>
        </oneOrMore>
      </choice>
    </element>
  </define>

  <define name="action-declaration">
    <element name="action">
      <optional>
        <attribute name="comment"/>
      </optional>
      <attribute name="disp"/>
      <optional>
        <choice>
          <attribute name="match"/>
          <attribute name="not-match"/>
        </choice>
      </optional>
      <optional>
        <choice>
          <attribute name="any-variant"/>
          <attribute name="all-variants"/>
          <attribute name="only-variants"/>
        </choice>
      </optional>
    </element>
  </define>
  <start>
    <ref name="lgr"/>
  </start>
  <define name="lgr">
    <element name="lgr">
      <optional>
        <attribute name="id"/>
      </optional>
      <optional>
        <ref name="meta-section"/>
      </optional>
      <ref name="data-section"/>
      <optional>
        <ref name="rules-section"/>
      </optional>
    </element>
  </define>
  <define name="meta-section">
    <element name="meta">
      <zeroOrMore>
        <choice>
          <element name="version">
            <optional>
              <attribute name="comment"/>
            </optional>
            <text/>
          </element>

          <optional>
            <element name="date">
              <ref name="date"/>
            </element>
          </optional>
          <zeroOrMore>
            <element name="language">
              <ref name="language-tag"/>
            </element>
          </zeroOrMore>
          <zeroOrMore>
            <element name="domain">
              <ref name="domain-name"/>
            </element>
          </zeroOrMore>
          <optional>
            <element name="validity-start">
              <text/>
            </element>
          </optional>
          <optional>
            <element name="validity-end">
              <text/>
            </element>
          </optional>
          <zeroOrMore>
            <element name="unicode-version">
              <text/>
            </element>
          </zeroOrMore>
          <zeroOrMore>
            <element name="description">
              <optional>
                <attribute name="type"/>
              </optional>
              <text/>
            </element>
          </zeroOrMore>
          <optional>
            <element name="references">
              <zeroOrMore>
                <element name="reference">
                  <attribute name="id"/>
                  <optional>
                    <attribute name="comment"/>
                  </optional>
                  <text/>
                </element>

              </zeroOrMore>
            </element>
          </optional>
        </choice>
      </zeroOrMore>
    </element>
  </define>
  <define name="data-section">
    <element name="data">
      <oneOrMore>
        <choice>
          <ref name="char"/>
          <ref name="range"/>
        </choice>
      </oneOrMore>
    </element>
  </define>
  <define name="rules-section">
    <element name="rules">
      <zeroOrMore>
        <ref name="class-or-set-operator-declaration"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="rule-declaration-top"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="action-declaration"/>
      </zeroOrMore>
    </element>
  </define>
</grammar>
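
   The lexical patterns in the schema can be exercised outside a
   RELAX NG validator.  The following Python sketch copies the four
   patterns from the schema.  The helper names are hypothetical.  The
   reading of a bare number in a count as "exactly n", of "n+" as "n
   or more", and of "n:m" as "between n and m" is an assumption based
   on the examples in this document.

```python
import re

# Patterns copied from the schema.  XML Schema patterns are implicitly
# anchored at both ends, hence the use of fullmatch() below.
CODE_POINT = re.compile(r"[0-9A-F]{4,6}")
CODE_POINT_SEQUENCE = re.compile(r"[0-9A-F]{4,6}( [0-9A-F]{4,6})+")
COUNT = re.compile(r"\d+(\+|:\d+)?")
DATE = re.compile(r"\d{4}-\d\d-\d\d")

def is_code_point_literal(value):
    """True for a single code point or a space-separated sequence."""
    return bool(CODE_POINT.fullmatch(value)
                or CODE_POINT_SEQUENCE.fullmatch(value))

def parse_count(value):
    """Return (minimum, maximum); maximum is None when unbounded."""
    if not COUNT.fullmatch(value):
        raise ValueError("not a valid count: %r" % value)
    if value.endswith("+"):
        return int(value[:-1]), None
    if ":" in value:
        low, high = value.split(":")
        return int(low), int(high)
    return int(value), int(value)

print(is_code_point_literal("4E7E"))
print(is_code_point_literal("4e7e"))
print(parse_count("0:1"))
print(parse_count("1+"))
```

   Note that the code point pattern accepts only uppercase hexadecimal
   digits, so "4e7e" is rejected.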

Appendix E.  Acknowledgements

   This format builds upon the work on documenting IDN tables by many
   different registry operators.  Notably, a comprehensive language
   table for Chinese, Japanese, and Korean, developed by the "Joint
   Engineering Team" [RFC3743], is the basis of many registry
   policies, and a set of guidelines for Arabic script registrations
   [RFC5564] was published by the Arabic-language community.

   Contributions that have shaped this document have been provided by
   Francisco Arias, Mark Davis, Nicholas Ostler, Thomas Roessler, Steve
   Sheng, Michel Suignard, Andrew Sullivan, Wil Tan and John Yunker.

Appendix F.  Editorial Notes

   This appendix is to be removed prior to final publication.

F.1.  Known Issues and Future Work

   o  A method of specifying the origin URI for a table, and an
      expiration or refresh policy, as meta-data may be a useful way to
      declare how the table will be updated.

   o  The "domain" element should be specified as absolute, so that the
      Root can be identified as needed for the Root Zone LGR.

   o  The recommended names for dispositions ("block" and "allocate")
      deviate from the names in the Root Zone LGR Procedure ("blocked"
      and "allocatable").  The latter were chosen to highlight that
      machine processing of the LGR table is just the first step;
      actual allocation requires additional actions, hence
      "allocatable".  This should be resolved.

F.2.  Change History

   -00  Initial draft.

   -01  Add an XML Namespace, and fix other XML nits.  Add support for
        sequences of code points.  Improve on consistently using Unicode
        nomenclature.

   -02  Add support for validity periods.

   -03  Incorporate requirements from the Label Generation Ruleset
        Procedure for the DNS Root Zone.  These requirements include a
        detailed grammar for specifying whole-label variants, and the
        ability to explicitly declare the actions associated with a
        specific variant.  The document also consistently applies the
        term "Label Generation Ruleset", rather than "IDN table", to
        reflect the policy term now being used to describe these.

   -04  Support reference information per [RFC3743].  Update description
        in response to feedback.  Extend the context rules to "char"
        elements and allow for inverse matching ("not-when").  Extend
        the description of label processing and implied actions, and
        allow for actions that reference disposition attributes on any
        or all variant mappings used in the generation of a variant
        label.

   -05  Change the name of the "disposition" attribute to "disp".  Add
        comment attribute on version and reference elements.  Allow
        empty "cp" attributes in char elements to support expressing
        symmetric mapping of null variants.  Describe use of variants
        that map identically.  Clarify how actions are triggered, in
        particular based on variant dispositions, as well as the
        description of default actions.  Revise description of
        processing a label and its variants.  Move example table to the
        head of the appendices.  Add "only-variants" attribute.  Change
        "name" attribute to "byref" attribute for referencing named
        classes and rules.  Change "not" to "complement".  Remove
        "match" attribute on rules as redundant if "start" and "end"
        are supported.  Rename "match" element to "anchor" as better
        fitting its function and removing confusion with both the
        "match" attribute on actions and the generic term Match
        Operator.  Augment the examples relevant to [RFC3743].

   -06  Extend the discussion of reflexive variants and their use;
        includes update of the appendix on converting tables in the
        style of [RFC3743].  Improve description of tagging and clarify
        that it doesn't apply to sequences.  Specify that root zone uses
        ".".  Add an appendix with an Indic Syllable Structure example.
        Extend count attribute to allow maximal counts.

Authors' Addresses

   Kim Davies
   Internet Corporation for Assigned Names and Numbers
   12025 Waterfront Drive
   Los Angeles, CA  90094
   US

   Phone: +1 310 301 5800
   Email: kim.davies@icann.org
   URI:   http://www.icann.org/

   Asmus Freytag
   ASMUS Inc.

   Email: asmus@unicode.org