draft-newman-i18n-comparator-03

Network Working Group                                          C. Newman
Internet-Draft                                          Sun Microsystems
Expires: April 25, 2005                                        M. Duerst
                                                     W3C/Keio University
                                                        October 25, 2004



            Internet Application Protocol Collation Registry
                  draft-newman-i18n-comparator-03.txt


Status of this Memo


   This document is an Internet-Draft and is subject to all provisions
   of section 3 of RFC 3667.  By submitting this Internet-Draft, each
   author represents that any applicable patent or other IPR claims of
   which he or she is aware have been or will be disclosed, and any of
   which he or she become aware will be disclosed, in accordance with
   RFC 3668.


   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.


   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."


   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.


   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


   This Internet-Draft will expire on April 25, 2005.


Copyright Notice


   Copyright (C) The Internet Society (2004).


Abstract


   Many Internet application protocols include string-based lookup,
   searching, or sorting operations.  However, the problem space for
   searching and sorting international strings is large, not fully
   explored, and is outside the area of expertise for the Internet
   Engineering Task Force (IETF).  Rather than attempt to solve such a
   large problem, this specification creates an abstraction framework so




Newman & Duerst          Expires April 25, 2005                 [Page 1]


Internet-Draft             Collation Registry               October 2004



   that application protocols can precisely identify a comparison
   function and the repertoire of comparison functions can be extended
   in the future.


Table of Contents


   1.   Introduction . . . . . . . . . . . . . . . . . . . . . . . .   4
     1.1  Structure of this Document . . . . . . . . . . . . . . . .   4
     1.2  Conventions Used in this Document  . . . . . . . . . . . .   4
   2.   Collation Definition and Purpose . . . . . . . . . . . . . .   4
     2.1  Definition . . . . . . . . . . . . . . . . . . . . . . . .   4
     2.2  Purpose  . . . . . . . . . . . . . . . . . . . . . . . . .   4
     2.3  Sort Keys  . . . . . . . . . . . . . . . . . . . . . . . .   5
   3.   Collation Name Syntax  . . . . . . . . . . . . . . . . . . .   5
     3.1  Basic Syntax . . . . . . . . . . . . . . . . . . . . . . .   5
     3.2  Wildcards  . . . . . . . . . . . . . . . . . . . . . . . .   6
     3.3  Ordering Direction . . . . . . . . . . . . . . . . . . . .   6
     3.4  URIs . . . . . . . . . . . . . . . . . . . . . . . . . . .   6
     3.5  Naming Guidelines  . . . . . . . . . . . . . . . . . . . .   7
   4.   Collation Specification Requirements . . . . . . . . . . . .   7
     4.1  Operations Supported . . . . . . . . . . . . . . . . . . .   7
       4.1.1  Equality . . . . . . . . . . . . . . . . . . . . . . .   8
     4.2  Substring  . . . . . . . . . . . . . . . . . . . . . . . .   8
     4.3  Ordering . . . . . . . . . . . . . . . . . . . . . . . . .   8
     4.4  Internal Canonicalization Algorithm  . . . . . . . . . . .   9
     4.5  Use of Lookup Tables . . . . . . . . . . . . . . . . . . .   9
     4.6  Treatment of NULL Strings  . . . . . . . . . . . . . . . .   9
     4.7  Multi-Value Attributes . . . . . . . . . . . . . . . . . .   9
   5.   Application Protocol Requirements  . . . . . . . . . . . . .   9
     5.1  Character Encoding . . . . . . . . . . . . . . . . . . . .  10
     5.2  Operations . . . . . . . . . . . . . . . . . . . . . . . .  10
     5.3  Wildcards  . . . . . . . . . . . . . . . . . . . . . . . .  10
     5.4  Canonicalization Function  . . . . . . . . . . . . . . . .  11
     5.5  Disconnected Clients . . . . . . . . . . . . . . . . . . .  11
     5.6  Error Codes  . . . . . . . . . . . . . . . . . . . . . . .  11
     5.7  Octet Collation  . . . . . . . . . . . . . . . . . . . . .  11
   6.   Use by ACAP and Sieve  . . . . . . . . . . . . . . . . . . .  11
   7.   Collation Registration . . . . . . . . . . . . . . . . . . .  12
     7.1  Collation Registration Procedure . . . . . . . . . . . . .  12
     7.2  Collation Registration Format  . . . . . . . . . . . . . .  12
       7.2.1  Registration Template  . . . . . . . . . . . . . . . .  13
       7.2.2  The <collation> Element  . . . . . . . . . . . . . . .  13
       7.2.3  The <name> Element . . . . . . . . . . . . . . . . . .  14
       7.2.4  The <title> Element  . . . . . . . . . . . . . . . . .  14
       7.2.5  The <functions> Element  . . . . . . . . . . . . . . .  14
       7.2.6  The <specification> Element  . . . . . . . . . . . . .  14
       7.2.7  The <submitter> Element  . . . . . . . . . . . . . . .  14
       7.2.8  The <owner> Element  . . . . . . . . . . . . . . . . .  14







       7.2.9  The <version> Element  . . . . . . . . . . . . . . . .  14
       7.2.10   The <UnicodeVersion> Element . . . . . . . . . . . .  15
       7.2.11   The <UCAVersion> Element . . . . . . . . . . . . . .  15
       7.2.12   The <UCAMatchLevel> Element  . . . . . . . . . . . .  15
     7.3  DTD for Collation Registration . . . . . . . . . . . . . .  15
     7.4  Structure of Collation Registry  . . . . . . . . . . . . .  16
     7.5  Example Initial Registry Summary . . . . . . . . . . . . .  17
   8.   Guidelines for Expert Reviewer . . . . . . . . . . . . . . .  17
   9.   Initial Collations . . . . . . . . . . . . . . . . . . . . .  18
     9.1  ASCII Numeric Collation  . . . . . . . . . . . . . . . . .  18
       9.1.1  ASCII Numeric Collation Description  . . . . . . . . .  18
       9.1.2  ASCII Numeric Collation Registration . . . . . . . . .  19
     9.2  ASCII Casemap Collation  . . . . . . . . . . . . . . . . .  19
       9.2.1  ASCII Casemap Collation Description  . . . . . . . . .  19
       9.2.2  Legacy English Casemap Collation Registration  . . . .  20
       9.2.3  English Casemap Collation Registration . . . . . . . .  20
     9.3  Nameprep Collation . . . . . . . . . . . . . . . . . . . .  20
       9.3.1  Nameprep Collation Description . . . . . . . . . . . .  20
       9.3.2  Nameprep Collation Registration  . . . . . . . . . . .  21
     9.4  Basic Collation  . . . . . . . . . . . . . . . . . . . . .  21
       9.4.1  Basic Collation Description  . . . . . . . . . . . . .  21
       9.4.2  Basic Collation Registration . . . . . . . . . . . . .  24
       9.4.3  Basic Accent Sensitive Match Collation Registration  .  24
       9.4.4  Basic Case Sensitive Match Collation Registration  . .  25
     9.5  Octet Collation  . . . . . . . . . . . . . . . . . . . . .  25
       9.5.1  Octet Collation Description  . . . . . . . . . . . . .  25
       9.5.2  Octet Collation Registration . . . . . . . . . . . . .  26
   10.  IANA Considerations  . . . . . . . . . . . . . . . . . . . .  26
   11.  Security Considerations  . . . . . . . . . . . . . . . . . .  26
   12.  Open Issues  . . . . . . . . . . . . . . . . . . . . . . . .  26
   13.  Change Log . . . . . . . . . . . . . . . . . . . . . . . . .  26
     13.1   Changes From -02 . . . . . . . . . . . . . . . . . . . .  26
     13.2   Changes From -01 . . . . . . . . . . . . . . . . . . . .  27
     13.3   Changes From -00 . . . . . . . . . . . . . . . . . . . .  27
   14.  References . . . . . . . . . . . . . . . . . . . . . . . . .  27
   14.1   Normative References . . . . . . . . . . . . . . . . . . .  27
   14.2   Informative References . . . . . . . . . . . . . . . . . .  28
        Authors' Addresses . . . . . . . . . . . . . . . . . . . . .  29
        Intellectual Property and Copyright Statements . . . . . . .  30
















1.  Introduction


   The ACAP [11] specification introduced the concept of a comparator
   (which we call collation in this document), but failed to create an
   IANA registry.  With the introduction of stringprep [6] and the
   Unicode Collation Algorithm [8], it is now time to create that
   registry and populate it with some initial values appropriate for an
   international community.  This specification replaces and generalizes
   the definition of a comparator in ACAP and creates a collation
   registry.


1.1  Structure of this Document


   @@@@ to be completed


1.2  Conventions Used in this Document


   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
   in this document are to be interpreted as defined in "Key words for
   use in RFCs to Indicate Requirement Levels" [1].


   The attribute syntax specifications use the Augmented Backus-Naur
   Form (ABNF) [2] notation, including the core rules defined in
   Appendix A of [2].  This specification also inherits ABNF rules from
   Language Tags [5].


   The term 'protocol' is used in this memo in a very generic sense, and
   includes things such as query languages.


2.  Collation Definition and Purpose


2.1  Definition


   A collation is a named function which takes two arbitrary length
   character strings (with the exception of the i;octet (Section 9.5)
   collation) as input and can be used to perform one or more of three
   basic comparison operations: equality test, substring match, and
   ordering test.


2.2  Purpose


   Collations provide a multi-protocol abstraction layer for comparison
   functions so the details of a particular comparison operation can be
   specified by someone with appropriate expertise independent of the
   application protocol that consumes that collation.  This is similar
   to the way a charset [14] separates the details of octet to character
   mapping from a protocol specification such as MIME [9] or the way
   SASL [10] separates the details of an authentication mechanism from a
   protocol specification such as ACAP [11].







   Here is a small diagram to help illustrate the value of this abstraction
   layer:


   +-------------------+                         +-----------------+
   | IMAP i18n SEARCH  |--+                      | Basic           |
   +-------------------+  |                   +--| Collation Spec  |
                          |                   |  +-----------------+
   +-------------------+  |  +-------------+  |  +-----------------+
   | ACAP i18n SEARCH  |--+--| Collation   |--+--| A stringprep    |
   +-------------------+  |  | Registry    |  |  | Collation Spec  |
                          |  +-------------+  |  +-----------------+
   +-------------------+  |                   |  +-----------------+
   | ...other protocol |--+                   |  | locale-specific |
   +-------------------+                      +--| Collation Spec  |
                                                 +-----------------+


   Thus IMAP, ACAP and future application protocols with international
   search capability simply specify how to interface to the collation
   registry instead of each protocol specification having to specify all
   the collations it supports.


2.3  Sort Keys


   One component of a collation is a canonicalization function which can
   be pre-applied to single strings and may enhance the performance of
   subsequent comparison operations.  Normally, this is an
   implementation detail of collations, but at times it may be useful
   for an application protocol to expose collation canonicalization over
   the protocol.  Collation canonicalization can range from an identity
   mapping (e.g., the i;octet collation, Section 9.5) to a mapping which
   makes the string unreadable to a human (e.g., the basic collation).


3.  Collation Name Syntax


3.1  Basic Syntax


   The collation name itself is a single US-ASCII string beginning with
   a letter and made up of letters, digits, or one of the following 4
   symbols: "-", ";", "=" or ".".  The name MUST NOT be longer than 254
   characters.


     collation-char  =  ALPHA / DIGIT / "-" / ";" / "=" / "."


     collation-name  =  ALPHA *253collation-char
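
   As an illustration only, the two ABNF rules above can be checked with
   a single regular expression.  The function name is_collation_name is
   hypothetical and not part of this specification; the sketch assumes
   the name has already been decoded to a character string.

```python
import re

# collation-char  =  ALPHA / DIGIT / "-" / ";" / "=" / "."
# collation-name  =  ALPHA *253collation-char   (at most 254 characters)
_COLLATION_NAME = re.compile(r"[A-Za-z][A-Za-z0-9\-;=.]{0,253}")


def is_collation_name(name: str) -> bool:
    """Return True if name matches the collation-name rule (hypothetical helper)."""
    return _COLLATION_NAME.fullmatch(name) is not None
```

   For example, is_collation_name("i;ascii-casemap") is true, while a
   name beginning with ";" or a digit is rejected.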











3.2  Wildcards


   The string a client uses to select a collation MAY contain a wildcard
   ("*") character which matches zero or more collation-chars.  Wildcard
   characters MUST NOT be adjacent.  Clients which support disconnected
   operation SHOULD NOT use wildcards to select a collation, but clients
   which provide collation operations only when connected to the server
   MAY use wildcards.  If the wildcard string matches multiple
   collations, the server SHOULD select the collation with the broadest
   scope (preferably international scope), the most recent table
   versions and the greatest number of supported operations.  A single
   wildcard character ("*") refers to the application protocol collation
   behavior that would occur if no explicit negotiation were used.


     collation-wild  =  ("*" / (ALPHA ["*"])) *(collation-char ["*"])
                         ; MUST NOT exceed 255 characters total
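
   The following is a minimal sketch of how a server might expand a
   collation-wild pattern against the names it implements.  The function
   names are hypothetical.  The sketch does not model the preference
   rules above (broadest scope, most recent tables, most operations),
   and it leaves the special meaning of a lone "*" to the caller.

```python
import re

_CHAR = r"[A-Za-z0-9\-;=.]"
# collation-wild  =  ("*" / (ALPHA ["*"])) *(collation-char ["*"])
_WILD = re.compile(r"(?:\*|[A-Za-z]\*?)(?:" + _CHAR + r"\*?)*")


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a collation-wild pattern into a compiled regular expression."""
    if len(pattern) > 255 or _WILD.fullmatch(pattern) is None:
        raise ValueError("not a valid collation-wild pattern: %r" % pattern)
    # Each "*" matches zero or more collation-chars; the rest is literal.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile((_CHAR + "*").join(literal_parts))


def matching_collations(pattern: str, available: list) -> list:
    """Return the implemented collation names that match the pattern."""
    regex = wildcard_to_regex(pattern)
    return [name for name in available if regex.fullmatch(name)]
```

   A pattern with adjacent wildcards, such as "i;**", is rejected by the
   syntax check before any matching is attempted.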



3.3  Ordering Direction


   When used as a protocol element for ordering, the collation name MAY
   be prefixed by either "+" or "-" to explicitly specify an ordering
   direction.  As described in Section 4.3, "+" has no effect on the
   ordering function, while "-" negates the result of the ordering
   function.  In general, collation-order is used when a client requests
   a collation, and collation-sel is used when the server informs the
   client of the selected collation.


     collation-sel   =  ["+" / "-"] collation-name


     collation-order =  ["+" / "-"] collation-wild
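
   A receiver might separate the optional direction prefix from the rest
   of the token as sketched below.  The helper name split_direction is
   hypothetical, and validation of the remaining name or pattern is
   assumed to happen elsewhere.

```python
def split_direction(token: str):
    """Split a collation-sel or collation-order token into (reverse, rest).

    reverse is True only for a "-" prefix; "+" and no prefix are equivalent.
    """
    if token[:1] in ("+", "-"):
        return token[0] == "-", token[1:]
    return False, token
```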



3.4  URIs


   Some protocols are designed to use URIs to refer to collations rather
   than simple tokens.  A special section of the IANA web page is
   reserved for such usage.  The "collation-uri" form is used to refer
   to a specific IANA registry entry for a specific named collation (the
   collation registration may not actually be present if it is
   experimental).  The "collation-auri" form is an abstract name for an
   ordering, a comparator pattern or a vendor private comparator.













     collation-uri   =  "http://www.iana.org/assignments/collation/"
                        collation-name ".xml"


     collation-auri  =  ( "http://www.iana.org/assignments/collation/"
                        collation-order [".xml"]) / other-uri


     other-uri       =  absoluteURI
                     ;  excluding the IANA collation namespace.
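
   As a sketch, the "collation-uri" form can be produced from, and
   mapped back to, a collation name by simple string operations.  Both
   function names are hypothetical, and the input name is assumed to be
   a valid collation-name already.

```python
IANA_COLLATION_BASE = "http://www.iana.org/assignments/collation/"


def collation_uri(name: str) -> str:
    """Build the collation-uri for a registered collation name."""
    return IANA_COLLATION_BASE + name + ".xml"


def name_from_collation_uri(uri: str):
    """Return the collation name for a collation-uri, or None for other URIs."""
    if uri.startswith(IANA_COLLATION_BASE) and uri.endswith(".xml"):
        name = uri[len(IANA_COLLATION_BASE):-len(".xml")]
        return name or None
    return None
```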



3.5  Naming Guidelines


   While this specification makes no absolute requirements on the
   structure of collation names, naming consistency is important, so the
   following initial guidelines are provided.


   Collation names with an international audience typically begin with
   "i;".  Collation names intended for a particular language or locale
   typically begin with a language tag [5] followed by a ";".  After the
   first ";" is normally the name of the general collation algorithm
   followed by a series of algorithm modifications separated by the ";"
   delimiter.  Parameterized modifications will use "=" to delimit the
   parameter from the value.  The version numbers of any lookup tables
   used by the algorithm SHOULD be present as parameterized
   modifications.


   Collation names of the form *;vnd-domain.com;* are reserved for
   vendor-specific collations created by the owner of the domain name
   following the "vnd-" prefix.  Registration of such collations (or the
   name space as a whole) with intended use of "Vendor" is encouraged
   when a public specification or open-source implementation is
   available, but is not required.


4.  Collation Specification Requirements


4.1  Operations Supported


   A collation specification MUST state which of the three basic
   functions are supported (equality, substring, ordering) and how to
   perform each of the supported functions on any two input character
   strings including empty strings (with the exception of the i;octet
   (Section 9.5) collation).  Collations must be deterministic,
   i.e., given a collation with a specific name and any two fixed input
   strings, the result MUST be the same for the same operation.
   Collations MUST be transitive.










4.1.1  Equality


   The equality function always returns "match" or "no-match" when
   supplied valid input and MAY return "error" if the input strings are
   not valid character strings or violate other collation constraints.


4.2  Substring


   The substring matching function determines if the first string is a
   substring of the second string.  A collation which supports substring
   matching will automatically support the two special cases of
   substring matching: prefix and suffix matching if those special cases
   are supported by the application protocol.  It returns "match" or
   "no-match" when supplied valid input and returns "error" when
   supplied invalid input.


   Application protocols MAY return position information for substring
   matches.  If this is done, the position information MUST include both
   the starting offset and the ending offset in the string.  This is
   important because more sophisticated collations can match strings of
   unequal length (for example, a pre-composed accented character will
   match a decomposed accented character).


4.3  Ordering


   The ordering function determines how two character strings are
   ordered.  It returns "-1" if the first string is listed before the
   second string according to the collation, "+1" if the second string
   is listed before the first string, and "0" if the two strings are
   equal.  If the order of the two strings is reversed, the result of
   the ordering function of the collation MUST be reversed, i.e.,
   results which would be "+1" are instead "-1" and results which would
   be "-1" are instead "+1", while results which would be "0" stay "0".
   In general, collations SHOULD NOT return "0" unless the two character
   sequences are identical.


   Since ordering is normally used to sort a list of items, "error" is
   not a useful return value from the ordering function.  Strings with
   errors that prevent the sorting algorithm from functioning correctly
   should sort to the end of the list.  Thus if the first string is
   invalid while the second string is valid, the result will be "+1".
   If the second string is invalid while the first string is valid, the
   result will be "-1".  If both strings are invalid, the result SHOULD
   match the result from the "i;octet" collation.


   When the collation is used with a "+" prefix, the behavior is the
   same as when used with no prefix.  When the collation is used with a
   "-" prefix, the result of the ordering function of the collation MUST







   be reversed.
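
   The rules of this section can be sketched as a wrapper around a
   collation's own comparison.  All names below are hypothetical.  The
   inputs are taken as octet strings so that invalid input can be
   represented, and the fallback for two invalid strings assumes that
   "i;octet" ordering is a plain comparison of octet values.  Note that
   with a "-" prefix the sketch simply negates the result, so invalid
   strings then sort to the front.

```python
def octet_compare(a: bytes, b: bytes) -> int:
    """Assumed i;octet ordering: compare by unsigned octet values."""
    return (a > b) - (a < b)


def order(a: bytes, b: bytes, compare, is_valid, reverse: bool = False) -> int:
    """Return -1, 0 or +1; compare() is only called on valid input."""
    valid_a, valid_b = is_valid(a), is_valid(b)
    if valid_a and valid_b:
        result = compare(a, b)
    elif valid_a:
        result = -1  # only the second string is invalid: it sorts last
    elif valid_b:
        result = +1  # only the first string is invalid: it sorts last
    else:
        result = octet_compare(a, b)  # both invalid
    return -result if reverse else result


def is_utf8(data: bytes) -> bool:
    """Example validity test: the octets decode as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def codepoint_compare(a: bytes, b: bytes) -> int:
    """Example comparison: order decoded strings by code point."""
    x, y = a.decode("utf-8"), b.decode("utf-8")
    return (x > y) - (x < y)
```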


4.4  Internal Canonicalization Algorithm


   A collation specification MUST describe the internal canonicalization
   algorithm.  This algorithm can be applied to individual strings and
   the result strings can be stored to potentially optimize future
   comparison operations.  A collation MAY specify that the
   canonicalization algorithm is the identity function.  The output of
   the canonicalization algorithm MAY have no meaning to a human.
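
   As an illustration, a canonicalization function can be applied once
   per string and the results reused as sort keys.  The sketch below
   assumes a casemap-style canonicalization that maps only the US-ASCII
   letters "a" through "z" to upper case; that mapping and the function
   name are assumptions made for the example, not a restatement of any
   registered collation.

```python
# Translation table mapping only US-ASCII a-z to A-Z.
_ASCII_UPPER = {code: code - 32 for code in range(ord("a"), ord("z") + 1)}


def ascii_casemap_key(value: str) -> str:
    """Hypothetical canonicalization: upper-case ASCII letters, leave the rest."""
    return value.translate(_ASCII_UPPER)


# The canonical form is computed once per string and reused while sorting.
names = ["banana", "Apple", "cherry"]
ordered = sorted(names, key=ascii_casemap_key)
```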


4.5  Use of Lookup Tables


   Collations which use more than one customizable lookup table in a
   documented format MUST assign numbers to the tables they use.  This
   permits an application protocol command to access the tables used by
   a server collation.


4.6  Treatment of NULL Strings


   Unless otherwise specified by the collation or application protocol,
   a NULL string (as opposed to an empty string) is equal only to
   another NULL string, a NULL string is not a substring of any other
   string, and a NULL string sorts to a position after all non-NULL
   strings, but before strings which generate errors.
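
   A minimal sketch of these defaults follows, using None to stand for a
   NULL string.  The function names are hypothetical, and the equality
   and validity tests are supplied by the caller.

```python
def null_aware_equal(a, b, equal) -> bool:
    """A NULL string (None) is equal only to another NULL string."""
    if a is None or b is None:
        return a is None and b is None
    return equal(a, b)


def sort_rank(value, is_valid) -> int:
    """Group values for sorting: valid strings, then NULL, then error strings."""
    if value is None:
        return 1
    return 0 if is_valid(value) else 2
```

   Sorting first by sort_rank and then by the collation's own ordering
   places NULL strings after all valid strings but before strings that
   generate errors.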


4.7  Multi-Value Attributes


   Some application protocols will permit the use of multi-value
   attributes with a collation.  This paragraph describes the rules that
   apply unless otherwise specified by the collation or application
   protocol.  In the case of the equality and substring operation, the
   operations are applied over each pair of single values from the two
   inputs.  If any combination produces an error, the result is an
   error.  Otherwise, if any combination produces a "match", the result
   is a match.  Otherwise the result is "no-match".  For the ordering
   function, the smallest ordinal character string from the first set of
   values is compared to the smallest ordinal character string from the
   second set of values.
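
   The default rules for multi-value attributes can be sketched as
   follows.  The function names are hypothetical; "op" is any equality
   or substring function returning "match", "no-match" or "error", and
   "order" is any ordering function returning -1, 0 or +1.  Behavior for
   an empty set of values is not covered by the text above and is not
   handled here.

```python
from functools import cmp_to_key
from itertools import product


def multi_value_match(left, right, op) -> str:
    """Apply op to every pair of values; an error wins, then a match."""
    results = [op(a, b) for a, b in product(left, right)]
    if "error" in results:
        return "error"
    if "match" in results:
        return "match"
    return "no-match"


def multi_value_order(left, right, order) -> int:
    """Compare the smallest value of each set under the collation's ordering."""
    key = cmp_to_key(order)
    return order(min(left, key=key), min(right, key=key))
```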


5.  Application Protocol Requirements


   This section describes the requirements and issues that an
   application protocol which offers searching, substring matching and/
   or sorting and permits the use of characters outside the US-ASCII
   charset needs to consider.









5.1  Character Encoding


   The protocol specification has to make sure that it is clear on which
   characters (rather than just octets) the collations are used.  This
   can be done by specifying the protocol itself in terms of characters
   (e.g., in the case of a query language), by specifying a single
   character encoding for the protocol (e.g., UTF-8 [3]), or by
   carefully describing the relevant issues of character encoding
   labeling and conversion.  In the latter case, details to consider
   include how to handle unknown charsets, any charsets which are
   mandatory-to-implement, any issues with byte-order that might apply,
   and any transfer encodings which need to be supported.


5.2  Operations


   The protocol must specify which of the operations defined in this
   specification (equality matching, substring matching and ordering)
   can be invoked in the protocol, and how they are invoked.  There may
   be more than one way to invoke an operation.


   The protocol MUST provide a mechanism for the client to select the
   collation to use with equality matching, substring matching and
   ordering.


   If the protocol provides positional information for the results of a
   substring match, that positional information MUST fully specify the
   substring in the result that matches independent of the length of the
   search string.  For example, returning both the starting and ending
   offset of the match would suffice, as would the starting offset and a
   length.  Returning just the starting offset is not acceptable.  This
   rule is necessary because advanced collations can treat strings of
   different lengths as equal (for example, pre-composed and decomposed
   accented characters).
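
   The following brute-force sketch illustrates why a starting offset
   alone is not enough.  It treats two strings as equal when their
   Unicode NFC forms are identical, which is an assumption made for the
   example and not the definition of any collation in this document.
   The function name is hypothetical, offsets are counted in code
   points, and no attempt is made to respect combining-sequence
   boundaries.

```python
import unicodedata


def find_span(needle: str, haystack: str):
    """Return (start, end) of the first span of haystack equal to needle
    under NFC normalization, or None if there is no such span."""
    target = unicodedata.normalize("NFC", needle)
    for start in range(len(haystack) + 1):
        for end in range(start, len(haystack) + 1):
            candidate = unicodedata.normalize("NFC", haystack[start:end])
            if candidate == target:
                return start, end
    return None


# The search string uses a pre-composed e-acute (4 code points); the
# stored text uses "e" followed by a combining acute accent.
span = find_span("caf\u00e9", "un cafe\u0301 noir")
```

   Here span is (3, 8): the matched text is five code points long even
   though the search string has only four, so the end of the match
   cannot be derived from the start and the search string length.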


5.3  Wildcards


   The protocol MUST specify whether it allows the use of wildcards in
   collation identifiers or not.  If the protocol allows wildcards,
   then:
      The protocol MUST specify how comparisons behave in the absence of
      explicit collation negotiation or when a collation of "*" is
      requested.  The protocol MAY specify that the default collation
      used in such circumstances is sensitive to server configuration.
      The protocol SHOULD provide a way to list available collations
      matching a given wildcard pattern or patterns.










5.4  Canonicalization Function


   If the protocol provides a canonicalization function for strings,
   then use of collations MAY be appropriate for that function.  [Need
   to describe how that would be done.]


5.5  Disconnected Clients


   If the protocol supports disconnected clients, then a mechanism for
   the client to precisely replicate the server's collation algorithm is
   likely desirable.  Thus the protocol MAY wish to provide a command to
   fetch lookup tables used by charset conversions and collations.


5.6  Error Codes


   The protocol specification should consider assigning protocol error
   codes for the following circumstances:
   o  The client requests the use of a collation by name or pattern, but
      no implemented collation matches that pattern.
   o  The client attempts to use a collation for a function that is not
      supported by that collation.  For example, attempting to use the
      "i;ascii-numeric" collation for a substring matching function.
   o  The client uses an equality or substring matching collation and
      the result is an error.  It may be appropriate to distinguish
      between the two input strings, particularly when one is supplied
      by the client and one is stored by the server.  It might also be
      appropriate to distinguish the specific case of an invalid UTF-8
      string.


5.7  Octet Collation


   If the protocol permits the use of the i;octet (Section 9.5)
   collation, it has to say so.  The octet collation SHOULD NOT be used
   unless the protocol uses UTF-8 as its single character encoding.


   If the protocol permits the use of collations with data structures
   beyond those described in this specification ([is the following a
   list of  described data structures, or of undescribed data
   structures???] octet strings, NULL string, array of octet strings),
   the protocol MUST describe the default behavior for a collation with
   that data structure.


6.  Use by ACAP and Sieve


   Both ACAP [11] and Sieve [15] are standards track specifications
   which used collations prior to the creation of this specification and
   registry.  Those standards do not meet all the application protocol
   requirements described in Section 5.  For backwards compatibility,







   those protocols use the "i;ascii-casemap" collation name instead of
   "en;ascii-casemap".  [have to check whether the following is true:]
   These protocols allow the use of the i;octet (Section 9.5) collation
   working directly on UTF-8 data as used in these protocols.


7.  Collation Registration


7.1  Collation Registration Procedure


   IANA will create a mailing list collation@iana.org which can be used
   for public discussion of collation proposals prior to registration.
   Use of the mailing list is encouraged but not required.  The actual
   registration procedure will not begin until the completed
   registration template is sent to iana@iana.org.  The IESG will
   appoint a designated expert who will monitor the collation@iana.org
   mailing list and review registrations forwarded from IANA.  The
   designated expert is expected to tell IANA and the submitter of the
   registration within two weeks whether the registration is approved,
   approved with minor changes, or rejected with cause.  When a
   registration is rejected with cause, it can be re-submitted if the
   concerns listed in the cause are addressed.  Decisions made by the
   designated expert can be appealed to the IESG and subsequently follow
   the normal appeals procedure for IESG decisions.


   Collation registrations in a standards track, BCP or IESG-approved
   experimental RFC are owned by the IETF, and changes to the
   registration follow normal procedures for updating such documents.
   Collation registrations in other RFCs are owned by the RFC author(s).
   Other collation registrations are owned by the individual(s) listed
   in the contact field of the registration and IANA will preserve this
   information.  Changes to a registration MUST be approved by the
   owner.  In the event the owner cannot be contacted for a period of
   one month and a change is deemed necessary, the IESG MAY re-assign
   ownership to an appropriate party.


7.2  Collation Registration Format


   Registration of a collation is done by sending a well-formed XML
   document that validates with collationreg.dtd (Section 7.3).
















7.2.1  Registration Template


   Here is a template for the registration:


   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="YYYY" scope="i18n" intendedUse="common">
     <name>collation name</name>
     <title>technical title for collation</title>
     <functions>equality order substring</functions>
     <specification>specification reference</specification>
     <owner>email address of owner or IETF</owner>
     <submitter>email address of submitter</submitter>
     <version>1</version>
     <UnicodeVersion>3.2</UnicodeVersion>
     <UCAVersion>3.1.1</UCAVersion>
   </collation>



7.2.2  The <collation> Element


   The root of the registration document MUST be a <collation> element.
   The collation element contains the other elements in the
   registration, which are described in the following sub-subsections,
   in the order given here.


   The <collation> element MAY include an "rfc=" attribute if the
   specification is in an RFC.  The "rfc=" attribute gives only the
   number of the RFC, without any prefix, such as "RFC", or suffix, such
   as ".txt".


   The <collation> element MUST include a "scope=" attribute, which
   MUST have one of the values "i18n", "local" or "other".


   The <collation> element MUST include an "intendedUse=" attribute,
   which MUST have one of the values "common", "limited", "vendor", or
   "deprecated".  Collation specifications intended for "common" use are
   expected to reference standards from standards bodies with
   significant experience dealing with the details of international
   character sets.


   Be aware that future revisions of this specification may add
   additional function types, as well as additional XML attributes and
   values.  Any system which automatically parses these XML documents
   MUST take this into account to preserve future compatibility.  A DTD
   for the current definition of the collation registration template is
   given in Section 7.3.
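
   A minimal sketch of a tolerant reader for a registration document is
   shown below.  It is not a validating parser and does not consult the
   DTD.  The function name, the sample collation name "x;example", and
   the "futureAttr" and "futureElement" items are hypothetical; the
   latter two only demonstrate that unknown attributes and elements are
   ignored, as the compatibility note above requires.

```python
import xml.etree.ElementTree as ET


def read_registration(xml_text: str) -> dict:
    """Extract the basic fields of a collation registration."""
    root = ET.fromstring(xml_text)
    if root.tag != "collation":
        raise ValueError("root element must be <collation>")
    for required in ("scope", "intendedUse"):
        if root.get(required) is None:
            raise ValueError("missing required attribute: " + required)
    # Unknown attribute values, attributes and elements are accepted
    # rather than rejected, so later revisions do not break the reader.
    return {
        "name": root.findtext("name"),
        "title": root.findtext("title"),
        "functions": (root.findtext("functions") or "").split(),
        "rfc": root.get("rfc"),
        "scope": root.get("scope"),
        "intendedUse": root.get("intendedUse"),
    }


SAMPLE = """<?xml version='1.0'?>
<collation scope="other" intendedUse="limited" futureAttr="x">
  <name>x;example</name>
  <title>Hypothetical example collation</title>
  <functions>equality order</functions>
  <specification>specification reference</specification>
  <owner>owner@example.com</owner>
  <futureElement>ignored</futureElement>
</collation>"""

registration = read_registration(SAMPLE)
```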








7.2.3  The <name> Element


   The <name> element gives the precise name of the collation.  The
   <name> element is mandatory.


7.2.4  The <title> Element


   The <title> element gives the title of the collation.  The <title>
   element is mandatory.


7.2.5  The <functions> Element


   The <functions> element lists which of the three functions the
   collation provides.  The <functions> element is mandatory.


7.2.6  The <specification> Element


   The <specification> element describes where to find the
   specification.  The <specification> element is mandatory.  It MAY
   have a "uri=" attribute.  [check that the following is really true;
   it reflects what is currently in the DTD; also, say what it means/in
   what cases it should be used] There may be more than one
   <specification> element.


7.2.7  The <submitter> Element


   The <submitter> element provides an RFC 2822 email address for the
   person who submitted the registration.  It is optional if the <owner>
   element contains an email address.  [check that the following is
   really true; it reflects what is currently in the DTD; also, say what
   it means/in what cases it should be used] There may be more than one
   <submitter> element.


7.2.8  The <owner> Element


   The <owner> element contains either the four letters "IETF" or an
   email address of the owner of the registration.  The <owner> element
   is mandatory.  [check that the following is really true; it reflects
   what is currently in the DTD; also, say what it means/in what cases
   it should be used] There may be more than one <owner> element.


7.2.9  The <version> Element


   The <version> element is included when the registration is likely to
   be revised or has been revised in such a way that the results change
   for certain input strings.  The <version> element is optional.



7.2.10  The <UnicodeVersion> Element


   The <UnicodeVersion> element indicates the version number of the
   UnicodeData file on which the collation is based.  The
   <UnicodeVersion> element is optional.


7.2.11  The <UCAVersion> Element


   The <UCAVersion> element specifies the version of the Unicode
   Collation Algorithm on which the collation is based.  The
   <UCAVersion> element is optional.


7.2.12  The <UCAMatchLevel> Element


   The <UCAMatchLevel> element specifies the number of Unicode Collation
   Algorithm sort key levels used for the equality and substring
   operations.  The <UCAMatchLevel> element is optional.


7.3  DTD for Collation Registration



   <!--
     DTD for Collation Registration Document


     Data types:


     entity      description
     ======      ===========
     NUMBER      [0-9]+
     URI         As defined in RFC YYYY
     CTEXT       printable ASCII text (no line-terminators)
     TEXT        character data
     -->
   <!ENTITY % NUMBER        "CDATA">
   <!ENTITY % URI           "CDATA">
   <!ENTITY % CTEXT         "#PCDATA">
   <!ENTITY % TEXT          "#PCDATA">
   <!ELEMENT collation      (name,title,functions,specification+,owner+,
                             submitter*,version?,UnicodeVersion?,
                             UCAVersion?,UCAMatchLevel?)>
   <!ATTLIST collation
             rfc            %NUMBER;                           "0"
             scope          (i18n|local|other)                 #IMPLIED
             intendedUse    (common|limited|vendor|deprecated) #IMPLIED>
   <!ELEMENT name           (%CTEXT;)>
   <!ELEMENT title          (%CTEXT;)>
   <!ELEMENT functions      (%CTEXT;)>
   <!ELEMENT specification  (%TEXT;)>
   <!ATTLIST specification
             uri            %URI;                              "">
   <!ELEMENT owner          (%CTEXT;)>
   <!ELEMENT submitter      (%CTEXT;)>
   <!ELEMENT version        (%CTEXT;)>
   <!ELEMENT UnicodeVersion (%CTEXT;)>
   <!ELEMENT UCAVersion     (%CTEXT;)>
   <!ELEMENT UCAMatchLevel  (%CTEXT;)>
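

   As an illustration only, the structural rules of Section 7.2 and the
   content model above can be checked with a short Python sketch.  The
   function name check_registration and the choice of Python's
   xml.etree.ElementTree module are assumptions of this example and are
   not part of the registry definition.  The sketch does not perform
   DTD validation.  It follows the prose of Section 7.2.2 in requiring
   the "scope=" and "intendedUse=" attributes, and it ignores unknown
   child elements so that future additions do not break it.

```python
import xml.etree.ElementTree as ET

SCOPES = {"i18n", "local", "other"}
INTENDED_USES = {"common", "limited", "vendor", "deprecated"}

# (element name, minimum count, maximum count or None for unbounded),
# listed in the order given by the <collation> content model.
CHILD_RULES = [
    ("name", 1, 1),
    ("title", 1, 1),
    ("functions", 1, 1),
    ("specification", 1, None),
    ("owner", 1, None),
    ("submitter", 0, None),
    ("version", 0, 1),
    ("UnicodeVersion", 0, 1),
    ("UCAVersion", 0, 1),
    ("UCAMatchLevel", 0, 1),
]


def check_registration(xml_text):
    """Return a list of problems found; an empty list means none."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    if root.tag != "collation":
        return ["root element is <%s>, not <collation>" % root.tag]
    problems = []
    if root.get("scope") not in SCOPES:
        problems.append("missing or invalid scope attribute")
    if root.get("intendedUse") not in INTENDED_USES:
        problems.append("missing or invalid intendedUse attribute")
    for name, low, high in CHILD_RULES:
        count = len(root.findall(name))
        if count < low or (high is not None and count > high):
            problems.append("unexpected number of <%s> elements" % name)
    # Known children must appear in content-model order; unknown
    # children are skipped rather than rejected.
    order = [name for name, _, _ in CHILD_RULES]
    tags = [child.tag for child in root if child.tag in order]
    if tags != sorted(tags, key=order.index):
        problems.append("child elements are out of order")
    return problems
```

   A registration that passes this check can still fail DTD validation
   or expert review; the sketch only covers the rules stated in this
   section.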



7.4  Structure of Collation Registry


   Once the registration is approved, IANA will store each XML
   registration document in a URL of the form
   http://www.iana.org/assignments/collation/collation-name.xml where
   collation-name is the contents of the name element in the
   registration.  Both the submitter and the designated expert are
   responsible for verifying that the XML is well-formed and complies
   with the DTD.  In the future, it is hoped IANA will take over XML
   verification responsibility from the designated expert.



   IANA will also maintain a text summary of the registry under the name
   http://www.iana.org/assignments/collation/summary.txt.  This summary
   is divided into four sections.  The first section is for collations
   intended for common use.  This section is intended for collation
   registrations published in IESG approved RFCs or for locally scoped
   collations from the primary standards body for that locale.  The
   designated expert is encouraged to reject collation registrations
   with an intended use of "common" if the expert believes it should be
   "limited", as it is desirable to keep the number of "common"
   registrations small and high quality.  The second section is reserved
   for limited use collations.  The third section is reserved for
   registered vendor specific collations.  The final section is reserved
   for deprecated collations.


7.5  Example Initial Registry Summary


   The following is an example of how IANA might structure the initial
   registry summary.txt file:


     Collation                              Functions Scope Reference
     ---------                              --------- ----- ---------
   Common Use Collations:
     i;nameprep;v=1;uv=3.2                  e, o, s   i18n  [RFC XXXX]
     i;basic;uca=3.1.1;uv=3.2               e, o, s   i18n  [RFC XXXX]
     i;basic;uca=3.1.1;uv=3.2;match=accent  e, o, s   i18n  [RFC XXXX]
     i;basic;uca=3.1.1;uv=3.2;match=case    e, o, s   i18n  [RFC XXXX]
     en;ascii-casemap                       e, o, s   Local [RFC XXXX]


   Limited Use Collations:
     i;octet                                e, o, s   Other [RFC XXXX]
     i;ascii-numeric                        e, o      Other [RFC XXXX]


   Vendor Collations:


   Deprecated Collations:
     i;ascii-casemap                        e, o, s   Local [RFC XXXX]



   References
   ----------
   [RFC XXXX]  Newman, C., "Internet Application Protocol Collation
               Registry", RFC XXXX, Sun Microsystems, October 2003.



8.  Guidelines for Expert Reviewer


   The expert reviewer appointed by the IESG has fairly broad latitude
   for this registry.  While a number of collations are expected
   (particularly customizations of the basic collation for localized
   use), an explosion of collations (particularly common use collations)
   is not desirable for widespread interoperability.  However, it is
   important for the expert reviewer to provide cause when rejecting a
   registration, and when possible to describe corrective action to
   permit the registration to proceed.  The following list includes
   some example reasons to reject a registration with cause:
   o  The registration is not a well-formed XML document that follows
      the DTD.
   o  The registration has intended use of "common", but there is no
      evidence the collation will be widely deployed so it should be
      listed as "limited".
   o  The registration has intended use of "common", but is redundant
      with the functionality of a previously registered "common"
      collation.
   o  The collation name fails to precisely identify the version numbers
      of relevant tables to use.
   o  The registration fails to meet one of the "MUST" requirements in
      Section 4.
   o  The collation name fails to meet the syntax in Section 3.
   o  The collation specification referenced in the registration is
      vague or has optional features without a clear behavior specified.
   o  The referenced specification does not adequately address security
      considerations specific to that collation.


9.  Initial Collations


   This section describes an initial set of collations for the collation
   registry.


9.1  ASCII Numeric Collation


9.1.1  ASCII Numeric Collation Description


   The "i;ascii-numeric" collation is a simple collation intended for
   use with arbitrary sized decimal numbers stored as octet strings of
   US-ASCII digits (0x30 to 0x39).  It supports equality and ordering,
   but does not support the substring function.  The algorithm is as
   follows:
   1.  If neither string begins with a digit, return "error" if
       matching, or the result of the "i;octet" collation for ordering.
   2.  If the first string begins with a digit and the second string
       does not, return "error" if matching and "-1" for ordering.
   3.  If the second string begins with a digit and the first string
       does not, return "error" if matching and "+1" for ordering.
   4.  Let "n" be the number of digits at the beginning of the first
       string, and "m" be the number of digits at the beginning of the
       second string.
   5.  If n is equal to m, return the result of the "i;octet" collation.
   6.  If n is greater than m, prepend a string of "n - m" zeros to the
       second string and return the result of the "i;octet" collation.
   7.  If m is greater than n, prepend a string of "m - n" zeros to the
       first string and return the result of the "i;octet" collation.


   The associated canonicalization algorithm is to truncate the input
   string at the first non-digit character.
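

   The steps above can be sketched in Python as follows.  This is a
   non-normative illustration and the function names are hypothetical.
   It assumes that, once both strings are known to begin with a digit,
   only the leading digits take part in the comparison, in line with
   the canonicalization algorithm.  Steps 5 to 7 do not say whether
   trailing non-digit octets are compared, so that choice is an
   assumption of the sketch.

```python
def _digit_prefix(data):
    """Return the leading run of US-ASCII digits (0x30 to 0x39)."""
    i = 0
    while i < len(data) and 0x30 <= data[i] <= 0x39:
        i += 1
    return data[:i]


def _octet_order(a, b):
    """Ordering by unsigned octet value; returns -1, 0 or +1."""
    return (a > b) - (a < b)


def ascii_numeric_order(a, b):
    da, db = _digit_prefix(a), _digit_prefix(b)
    if not da and not db:
        return _octet_order(a, b)      # step 1
    if da and not db:
        return -1                      # step 2
    if db and not da:
        return 1                       # step 3
    # Steps 4 to 7: left-pad the shorter digit run with zeros, then
    # compare octet by octet.
    width = max(len(da), len(db))
    return _octet_order(da.rjust(width, b"0"), db.rjust(width, b"0"))


def ascii_numeric_equal(a, b):
    da, db = _digit_prefix(a), _digit_prefix(b)
    if not da or not db:
        return "error"                 # steps 1 to 3, matching case
    return "match" if ascii_numeric_order(a, b) == 0 else "no-match"
```

   With this sketch, "100" orders after "99", and "007" matches "7".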


9.1.2  ASCII Numeric Collation Registration


   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="other" intendedUse="limited">
     <name>i;ascii-numeric</name>
     <title>ASCII Numeric</title>
     <functions>equality order</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
   </collation>



9.2  ASCII Casemap Collation


9.2.1  ASCII Casemap Collation Description


   The "en;ascii-casemap" collation is a simple collation intended for
   use with English language text in pure US-ASCII.  It provides
   equality, substring and ordering functions.  The algorithm first
   applies a canonicalization algorithm to both input strings which
   subtracts 32 (0x20) from all octet values between 97 (0x61) and 122
   (0x7A) inclusive.  The result of the collation is then the same as
   the result of the "i;octet" collation for the canonicalized strings.
   Care should be taken when using OS-supplied functions to implement
   this collation as this is not locale sensitive, but functions such as
   strcasecmp and toupper can be locale sensitive.
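

   A locale-independent way to implement this is to operate on octet
   values directly, as in the following Python sketch.  The function
   names are hypothetical and the code is an illustration rather than a
   reference implementation.

```python
def ascii_casemap_canonicalize(data):
    """Subtract 0x20 from every octet in the range 0x61 to 0x7A."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)


def ascii_casemap_order(a, b):
    """Octet ordering of the canonicalized strings: -1, 0 or +1."""
    ca, cb = ascii_casemap_canonicalize(a), ascii_casemap_canonicalize(b)
    return (ca > cb) - (ca < cb)


def ascii_casemap_equal(a, b):
    return ascii_casemap_order(a, b) == 0


def ascii_casemap_substring(needle, haystack):
    """True if the canonicalized needle occurs in the haystack."""
    return (ascii_casemap_canonicalize(needle)
            in ascii_casemap_canonicalize(haystack))
```

   Octets outside the range 0x61 to 0x7A, including all octets above
   0x7F, pass through unchanged.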


   For historical reasons, in the context of ACAP and Sieve, the name
   "i;ascii-casemap" is a synonym for this collation.



9.2.2  Legacy English Casemap Collation Registration


   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="local" intendedUse="deprecated">
     <name>i;ascii-casemap</name>
     <title>Legacy English Casemap</title>
     <functions>equality order substring</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
   </collation>



9.2.3  English Casemap Collation Registration


   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="local" intendedUse="common">
     <name>en;ascii-casemap</name>
     <title>English Casemap</title>
     <functions>equality order substring</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
   </collation>



9.3  Nameprep Collation


9.3.1  Nameprep Collation Description


   The "i;nameprep;v=1;uv=3.2" collation is an implementation of the
   nameprep [7] specification based on normalization tables from Unicode
   version 3.2.  This collation applies the nameprep canonicalization
   function to both input strings and then returns the result of the
   i;octet collation on the canonicalized strings.  While this collation
   offers all three functions, the ordering function it provides is
   inadequate for use by the majority of the world.
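

   As a rough illustration, Python's standard library exposes a
   nameprep function in the encodings.idna module, built on Unicode 3.2
   data, which can stand in for the canonicalization step.  The wrapper
   names below are hypothetical.  Two choices are assumptions of this
   sketch rather than statements of this specification: the
   canonicalized strings are compared as UTF-8 octets, and input that
   nameprep prohibits yields "error".  Python's documentation notes that
   its nameprep treats input as a query, so unassigned code points are
   allowed.

```python
from encodings.idna import nameprep


def nameprep_canonicalize(text):
    """Apply nameprep, then encode as UTF-8 (encoding is an assumption)."""
    return nameprep(text).encode("utf-8")


def nameprep_order(a, b):
    """Octet ordering of the canonicalized strings: -1, 0 or +1."""
    ca, cb = nameprep_canonicalize(a), nameprep_canonicalize(b)
    return (ca > cb) - (ca < cb)


def nameprep_equal(a, b):
    try:
        return "match" if nameprep_order(a, b) == 0 else "no-match"
    except UnicodeError:
        # Assumption: input prohibited by nameprep is reported as "error".
        return "error"
```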


   Version number 1 is applied to nameprep as specified in RFC 3491.  If
   the nameprep specification is revised without any changes that would
   produce different results when given the same pair of input octet
   strings, then the version number will remain unchanged.



   The table numbers for tables used by nameprep are as follows:


                +--------------+-----------------------+
                | Table Number | Table Name            |
                +--------------+-----------------------+
                |            1 | UnicodeData-3.2.0.txt |
                |            2 | Table B.1             |
                |            3 | Table B.2             |
                |            4 | Table C.1.2           |
                |            5 | Table C.2.2           |
                |            6 | Table C.3             |
                |            7 | Table C.4             |
                |            8 | Table C.5             |
                |            9 | Table C.6             |
                |           10 | Table C.7             |
                |           11 | Table C.8             |
                |           12 | Table C.9             |
                +--------------+-----------------------+



9.3.2  Nameprep Collation Registration


   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="i18n" intendedUse="common">
     <name>i;nameprep;v=1;uv=3.2</name>
     <title>Nameprep</title>
     <functions>equality order substring</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
     <version>1</version>
     <UnicodeVersion>3.2</UnicodeVersion>
   </collation>



9.4  Basic Collation


9.4.1  Basic Collation Description


   The basic collation is intended to provide tolerable results for a
   number of languages for all three functions (equality, substring and
   ordering) so it is suitable as a mandatory-to-implement collation for
   protocols which include ordering support.  The ordering function of
   the basic collation is the Unicode Collation Algorithm [8] version 9
   (UCAv9).


   The equality and substring functions are created as described in
   UCAv9 section 8.  While that section is informative to UCAv9, it is
   normative to this collation specification.


   This collation is based on Unicode version 3.2, with the following
   tables relevant:
   1.  For the normalization step,
       <http://www.unicode.org/Public/3.2-Update/UnicodeData-3.2.0.txt>
       is used.  Column 5 is used to determine the canonical
       decomposition, while column 3 contains the canonical combining
       classes necessary to attain canonical order.
   2.  The table of characters which require a logical order exception
       is a subset of the table in
       <http://www.unicode.org/Public/3.2-Update/PropList-3.2.0.txt> and
       is included here:


   0E40..0E44    ; Logical_Order_Exception
   # Lo   [5] THAI CHARACTER SARA E..THAI CHARACTER SARA AI MAIMALAI
   0EC0..0EC4    ; Logical_Order_Exception
   # Lo   [5] LAO VOWEL SIGN E..LAO VOWEL SIGN AI


   # Total code points: 10


   3.  The table used to translate normalized code points to a sort key
       is <http://www.unicode.org/reports/tr10/allkeys-3.1.1.txt>.


   UCAv9 includes a number of configurable parameters and steps labelled
   as potentially optional.  The following list summarizes the defaults
   used by this collation:
   o  The logical order exception step is mandatory by default to
      support the largest number of languages.
   o  Steps 2.1.1 to 2.1.3 are mandatory as the repertoire of the basic
      collation is intended to be large.
   o  The second level in the sort key is evaluated forwards by default.
   o  The variable weighting uses the "non-ignorable" option by default.
   o  The semi-stable option is not used by default.
   o  Support for exactly three levels of collation is the default
      behavior.
   o  No preprocessing step is used by the basic collation prior to
      applying the UCAv9 algorithm.  Note that an application protocol
      specification MAY require pre-processing prior to the use of any
      collations.
   o  The equality and substring algorithms exclude differences at
      levels 2 and 3 by default (thus they are case-insensitive and
      ignore accentual distinctions).
   o  The equality and substring algorithms use the "Whole Characters
      Only" feature described in UCAv9 section 8 by default.


   The exact collation name with these defaults is
   "i;basic;uca=3.1.1;uv=3.2".  When a specification states that the
   basic collation is mandatory-to-implement, only this specific name is
   mandatory-to-implement.


   In order to allow modification of the optional behaviors, the
   following ABNF is used for variations of the basic collation:


     basic-collation  =  ("i" / Language-Tag) ";basic;uca=3.1.1;uv=3.2"
                         [";match=accent" / ";match=case"]
                         [";tailor=" 1*collation-char ]


   If multiple modifiers appear, they MUST appear in the order described
   above.  The modifiers have the following meanings:
   match=accent   Both the first and second levels of the sort keys are
                  considered relevant to the equality and substring
                  operations (rather than the default of first level
                  only).  This makes the matching functions sensitive to
                  accentual distinctions.
   match=case     The first three levels of sort keys are considered
                  relevant to the equality and substring operations.
                  This makes the matching functions sensitive to both
                  case and accentual distinctions.
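

   The relationship between the name modifiers and the number of sort
   key levels used for matching can be sketched in Python as follows.
   This is an illustration only.  The function name is hypothetical,
   and the patterns used for Language-Tag and collation-char are loose
   approximations, since those rules are defined outside this section.

```python
import re

_BASIC_NAME = re.compile(
    r"^(?P<lang>[A-Za-z0-9-]+)"            # "i" or a language tag (approximate)
    r";basic;uca=3\.1\.1;uv=3\.2"
    r"(?:;match=(?P<match>accent|case))?"  # optional match modifier
    r"(?:;tailor=(?P<tailor>[^;]+))?$"     # optional tailoring (approximate)
)

# Number of sort key levels relevant to equality and substring.
_LEVELS = {None: 1, "accent": 2, "case": 3}


def basic_match_level(name):
    """Return the match level implied by a basic collation name."""
    m = _BASIC_NAME.match(name)
    if m is None:
        raise ValueError("not a basic collation name: %r" % name)
    return _LEVELS[m.group("match")]
```

   Because the pattern fixes the order of the modifiers, a name that
   places ";tailor=" before ";match=" is rejected.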


   The default weighting option is "non-ignorable".  The "semi-stable"
   sort key option is not used by default.


   The canonicalization algorithm associated with this collation is the
   output of step 3 of the UCAv9 algorithm (described in section 4.3 of
   the UCA specification).  This canonicalization is not suitable for
   human consumption.


   Finally, the UCAv9 algorithm permits the "allkeys" table to be
   tailored to a language.  People who make quality tailorings are
   encouraged to register those tailorings using the collation registry.
   Tailoring names beginning with "x" are reserved for experimental use,
   are treated as "Limited use" and MUST NOT match wildcards if any
   registered collation is available that does match.



9.4.2  Basic Collation Registration


   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="i18n" intendedUse="common">
     <name>i;basic;uca=3.1.1;uv=3.2</name>
     <title>Basic</title>
     <functions>equality order substring</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
     <UnicodeVersion>3.2</UnicodeVersion>
     <UCAVersion>3.1.1</UCAVersion>
     <UCAMatchLevel>1</UCAMatchLevel>
   </collation>



9.4.3  Basic Accent Sensitive Match Collation Registration


   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="i18n" intendedUse="common">
     <name>i;basic;uca=3.1.1;uv=3.2;match=accent</name>
     <title>Basic Accent Sensitive Match</title>
     <functions>equality order substring</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
     <UnicodeVersion>3.2</UnicodeVersion>
     <UCAVersion>3.1.1</UCAVersion>
     <UCAMatchLevel>2</UCAMatchLevel>
   </collation>



9.4.4  Basic Case Sensitive Match Collation Registration


   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="i18n" intendedUse="common">
     <name>i;basic;uca=3.1.1;uv=3.2;match=case</name>
     <title>Basic Case Sensitive Match</title>
     <functions>equality order substring</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
     <UnicodeVersion>3.2</UnicodeVersion>
     <UCAVersion>3.1.1</UCAVersion>
     <UCAMatchLevel>3</UCAMatchLevel>
   </collation>



9.5  Octet Collation


9.5.1  Octet Collation Description


   The "i;octet" collation is a simple and fast collation intended for
   use on binary octet strings rather than on character data.  It is the
   only such collation; it is not possible to register additional
   collations with this property.  Protocols that want to make this
   collation available have to do so by explicitly allowing it.  If not
   explicitly allowed, it MUST NOT be used.  It never returns an "error"
   result.  It provides equality, substring and ordering functions.


   The ordering algorithm is as follows:
   1.  If both strings are the empty string, return the result "0".
   2.  If the first string is empty and the second is not, return the
       result "-1".
   3.  If the second string is empty and the first is not, return the
       result "+1".
   4.  If both strings begin with the same octet value, remove the first
       octet from both strings and repeat this algorithm from step 1.
   5.  If the unsigned value (0 to 255) of the first octet of the first
       string is less than the unsigned value of the first octet of the
       second string, then return "-1".
   6.  If this step is reached, return "+1".


   This algorithm is roughly equivalent to the C library function memcmp
   with appropriate length checks added.


   The matching function returns "match" if the ordering algorithm would
   return "0".  Otherwise the matching function returns "no-match".



   The substring function returns "match" if the first string is the
   empty string, or if there exists a substring of the second string of
   length equal to the length of the first string which would result in
   a "match" result from the equality function.  Otherwise the substring
   function returns "no-match".


   The associated canonicalization algorithm is the identity function.
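

   The three functions described above can be sketched in Python as
   follows.  The function names are hypothetical and the code is
   illustrative only.  The ordering function walks both strings with an
   index instead of removing octets, which gives the same result as
   steps 1 to 6.

```python
def octet_order(a, b):
    """Return -1, 0 or +1 following steps 1 to 6 of the ordering algorithm."""
    i = 0
    while True:
        if i == len(a) and i == len(b):
            return 0                   # step 1: both exhausted
        if i == len(a):
            return -1                  # step 2: first string exhausted
        if i == len(b):
            return 1                   # step 3: second string exhausted
        if a[i] != b[i]:
            # steps 5 and 6: compare unsigned octet values (0 to 255)
            return -1 if a[i] < b[i] else 1
        i += 1                         # step 4: skip the equal octet


def octet_equal(a, b):
    return "match" if octet_order(a, b) == 0 else "no-match"


def octet_substring(first, second):
    """Return "match" if first occurs in second; empty first always matches."""
    return "match" if first in second else "no-match"
```

   Python's "in" operator on bytes objects treats the empty string as
   contained in every string, which agrees with the substring rule
   above.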


9.5.2  Octet Collation Registration


   This collation is defined with intendedUse="limited" because it can
   only be used by protocols that explicitly allow it.


   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="i18n" intendedUse="limited">
     <name>i;octet</name>
     <title>Octet</title>
     <functions>equality order substring</functions>
     <specification>RFC XXXX</specification>
     <owner>IETF</owner>
     <submitter>chris.newman@sun.com</submitter>
   </collation>



10.  IANA Considerations


   Section 7 defines how to register collations with IANA.  That section
   should be carefully studied, and commented upon if necessary, by IANA
   before approval of this document for publication as an RFC.
   Section 9 defines a list of predefined collations, which should be
   registered when this document is approved and published as an RFC.


11.  Security Considerations


   Collations will normally be used with UTF-8 strings.  Thus the
   security considerations for UTF-8 [3] and stringprep [6] also apply
   and are normative to this specification.


12.  Open Issues


   See http://www.w3.org/2004/08/ietf-collation.


13.  Change Log


13.1  Changes From -02



   1.  Changed from data being octet sequences (in UTF-8) to data being
       character sequences (with octet collation as an exception).
   2.  Made XML format description much more structured.
   3.  Changed <submittor> to <submitter>, because this spelling is much
       more common.
   4.  Defined 'protocol' to include query languages.
   5.  Reorganized document, in particular IANA considerations section
       (which newly is just a list of pointers).
   6.  Added subsections, and a 'Structure of this Document' section.
   7.  Updated references.
   8.  Created a 'Change Log' chapter, with sections for each draft.
   9.  Reduced 'Open issues' section; open issues are now maintained at
       http://www.w3.org/2004/08/ietf-collation.


13.2  Changes From -01


   Add IANA comment to open issues.  Otherwise this is just a re-publish
   to keep the document alive.


13.3  Changes From -00


   1.  Replaced the term comparator with collation.  While comparator is
       somewhat more precise because these abstract functions are used
       for matching as well as ordering, collation is the term used by
       other parts of the industry.  Thus I have changed the name to
       collation for consistency.
   2.  Remove all modifiers to the basic collation except for the
       customization and the match rules.  The other behavior
       modifications can be specified in a customization of the
       collation.
   3.  Use ";" instead of "-" as delimiter between parameters to make
       names more URL-ish.
   4.  Add URL form for comparator reference.
   5.  Switched registration template to use XML document.
   6.  Added a number of useful registration template elements related
       to the Unicode Collation Algorithm.
   7.  Switched language from "custom" to "tailor" to match UCA language
       for tailoring of the collation algorithm.


14.  References


14.1  Normative References


   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.


   [2]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
        Specifications: ABNF", RFC 2234, November 1997.



   [3]  Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD
        63, RFC 3629, November 2003.


   [4]  Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource
        Identifier (URI): Generic Syntax",
        draft-fielding-uri-rfc2396bis-07.txt (work in progress), April
        2004.


   [5]  Alvestrand, H., "Tags for the Identification of Languages", BCP
        47, RFC 3066, January 2001.


   [6]  Hoffman, P. and M. Blanchet, "Preparation of Internationalized
        Strings ("stringprep")", RFC 3454, December 2002.


   [7]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for
        Internationalized Domain Names (IDN)", RFC 3491, March 2003.


   [8]  Davis, M. and K. Whistler, "Unicode Collation Algorithm version
        9", July 2002,
        <http://www.unicode.org/reports/tr10/tr10-9.html>.


14.2  Informative References


   [9]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
         Extensions (MIME) Part One: Format of Internet Message Bodies",
         RFC 2045, November 1996.


   [10]  Myers, J., "Simple Authentication and Security Layer (SASL)",
         RFC 2222, October 1997.


   [11]  Newman, C. and J. Myers, "ACAP -- Application Configuration
         Access Protocol", RFC 2244, November 1997.


   [12]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
         Considerations Section in RFCs", BCP 26, RFC 2434, October
         1998.


   [13]  Resnick, P., "Internet Message Format", RFC 2822, April 2001.


   [14]  Freed, N. and J. Postel, "IANA Charset Registration
         Procedures", BCP 19, RFC 2978, October 2000.


   [15]  Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028,
         January 2001.



Authors' Addresses


   Chris Newman
   Sun Microsystems
   1050 Lakes Drive
   West Covina, CA  91790
   US


   EMail: chris.newman@sun.com



   Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever
                 possible, for example as "D&#252;rst" in XML and HTML.)
   W3C/Keio University
   5322 Endo
   Fujisawa, Kanagawa  252-8520
   Japan


   Phone: +81 466 49 1170
   Fax:   +81 466 49 1171
   EMail: duerst@w3.org
   URI:   http://www.w3.org/People/D%C3%BCrst/



Intellectual Property Statement


   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.


   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.


   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.



Disclaimer of Validity


   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.



Copyright Statement


   Copyright (C) The Internet Society (2004).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.



Acknowledgment


   Funding for the RFC Editor function is currently provided by the
   Internet Society.

This is an older version of an Internet-Draft that was ultimately
published as RFC 4790.