i;unicode-casemap - Simple Unicode Collation Algorithm
draft-crispin-collation-unicasemap-07

Document history

Date Rev. By Action
2012-08-22
07 (System) post-migration administrative database adjustment to the Yes position for Sam Hartman
2007-09-07
07 (System) IANA Action state changed to RFC-Ed-Ack from Waiting on RFC Editor
2007-09-07
07 (System) IANA Action state changed to Waiting on RFC Editor from In Progress
2007-09-07
07 (System) IANA Action state changed to In Progress from Waiting on Authors
2007-09-07
07 (System) IANA Action state changed to Waiting on Authors from In Progress
2007-09-07
07 (System) IANA Action state changed to In Progress from Waiting on Authors
2007-09-07
07 (System) IANA Action state changed to Waiting on Authors from In Progress
2007-09-06
07 Amy Vezza State Changes to RFC Ed Queue from Approved-announcement sent by Amy Vezza
2007-09-06
07 Amy Vezza IESG state changed to Approved-announcement sent
2007-09-06
07 Amy Vezza IESG has approved the document
2007-09-06
07 Amy Vezza Closed "Approve" ballot
2007-09-06
07 (System) IANA Action state changed to In Progress
2007-09-05
07 Lisa Dusseault State Changes to Approved-announcement to be sent from IESG Evaluation::AD Followup by Lisa Dusseault
2007-08-31
07 Sam Hartman [Ballot Position Update] Position for Sam Hartman has been changed to Yes from Discuss by Sam Hartman
2007-08-31
07 (System) New version available: draft-crispin-collation-unicasemap-07.txt
2007-08-24
07 (System) Removed from agenda for telechat - 2007-08-23
2007-08-23
07 Amy Vezza State Changes to IESG Evaluation::AD Followup from IESG Evaluation by Amy Vezza
2007-08-23
07 Jon Peterson [Ballot Position Update] New position, No Objection, has been recorded by Jon Peterson
2007-08-22
07 Ron Bonica [Ballot Position Update] New position, No Objection, has been recorded by Ron Bonica
2007-08-22
07 Dan Romascanu [Ballot Position Update] New position, No Objection, has been recorded by Dan Romascanu
2007-08-22
07 Lars Eggert [Ballot Position Update] New position, No Objection, has been recorded by Lars Eggert
2007-08-21
07 Russ Housley
[Ballot comment]
Based on Gen-ART Review from Christian Vogt.

  Section 1 ends with an applicability statement for the algorithm.
  The section says that, while the algorithm is well-suited for
  technical languages, it does not work correctly in certain cases
  when applied to natural language.  My suggestion is to move the
  applicability statement to a more prominent place, perhaps into a
  new section preceding current section 1.

  3rd paragraph of section 1:  s/using using/using/
2007-08-21
07 Russ Housley [Ballot Position Update] New position, No Objection, has been recorded by Russ Housley
2007-08-20
07 David Ward [Ballot Position Update] New position, No Objection, has been recorded by David Ward
2007-08-20
07 Jari Arkko [Ballot Position Update] New position, Yes, has been recorded by Jari Arkko
2007-08-17
07 Chris Newman [Ballot Position Update] New position, Yes, has been recorded by Chris Newman
2007-08-17
07 Chris Newman
[Ballot comment]
In response to Sam's discuss, the normalization form described is
Normalization form KD excluding the "Canonical Ordering Behavior" of
non-spacing marks.

Unicode TR 15 (http://www.unicode.org/reports/tr15/) defines form KD
as "compatibility decomposition" and the version of the Unicode
specification I have handy (v 2.0) defines compatibility decomposition
as recursively applying both compatibility and canonical mappings and
then re-ordering non-spacing marks (I assume this hasn't changed).  As
the UnicodeData.txt file contains both compatibility and canonical decompositions
in the "decomposition property" (the Unicode Character Database overview
document calls this the Decomposition_Mapping field) the core of this
algorithm is the same as NFKD.

If this variant of NFKD was a form visible on the wire, I would
consider that problematic, but as it is a form used internal to the
algorithm only, I do not consider it problematic as long as it is
intentional.  The one place where this difference from normalization
form KD might have surprising results is if one of the input strings
contains a non-canonical decomposition of a letter with multiple
diacritical marks.  In that case the character would not match under
the equality function.  As the intention of this collation is a 'cheap'
collation rather than a linguistically correct collation (just as
i;ascii-casemap is not a proper English collation), I consider this
simplification justifiable.  The "canonical ordering behavior" section
of the Unicode specification is non-trivial and provides little
incremental benefit (indeed little visible change) to this collation
for that additional complexity.

The specification could be made more helpful to implementors with a
Unicode library by giving advice about whether or not it is acceptable
to substitute NFKD for the partial-NFKD described here.

On the issue of tracking the Unicode standard, I'll mention that many
runtime environments have built-in Unicode support now and it can be
difficult to determine which version of Unicode is active and most such
environments provide no simple way to bind to a previous version of
UnicodeData.txt.  So I consider tracking the current version of Unicode to
be pragmatic and necessary to make implementations feasible.  I would
look to the "running code" principle of the IETF to support this
position.
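
The mismatch case described in the comment above can be illustrated with a minimal Python sketch. It is not the draft's reference algorithm: the helper name `decompose_no_reorder` is hypothetical, and it assumes that Python's standard `unicodedata.decomposition()` exposes the same Decomposition_Mapping field the comment refers to. It applies the mappings recursively but skips canonical reordering, then compares the result with full NFKD.

```python
import unicodedata


def decompose_no_reorder(text):
    """Recursively apply decomposition mappings WITHOUT canonical reordering.

    Hypothetical helper for illustration only; Hangul syllables, which are
    decomposed algorithmically rather than through this field, are left as-is.
    """
    out = []

    def expand(ch):
        mapping = unicodedata.decomposition(ch)
        if not mapping:
            out.append(ch)
            return
        for field in mapping.split():
            # Skip compatibility tags such as "<compat>" or "<font>".
            if field.startswith("<"):
                continue
            expand(chr(int(field, 16)))

    for ch in text:
        expand(ch)
    return "".join(out)


# U+1E69: s with dot below and dot above, as one precomposed character.
precomposed = "\u1e69"
# The same letter spelled with its marks in non-canonical order.
non_canonical = "s\u0307\u0323"

partial_a = decompose_no_reorder(precomposed)
partial_b = decompose_no_reorder(non_canonical)
nfkd_a = unicodedata.normalize("NFKD", precomposed)
nfkd_b = unicodedata.normalize("NFKD", non_canonical)

print(partial_a == partial_b)  # the two spellings differ without reordering
print(nfkd_a == nfkd_b)        # full NFKD reorders the marks, so they match
```

Under these assumptions the two spellings compare unequal after the partial decomposition and equal after full NFKD, which is the one surprising result the comment anticipates.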
2007-08-17
07 Sam Hartman
[Ballot discuss]
I don't expect to hold this discuss significantly past the telechat
and would not be surprised if no change is required.

I want to ask how much review has been done for two issues:

1) How well does the decomposition normalization in this spec align
with NFKD?  The text says it is effectively the same, but where does
it produce different results?  Do we care?

2) Are we comfortable with the unstable reference to Unicode data?  Do
we at least need to discuss the security considerations of these changes?
2007-08-17
07 Sam Hartman [Ballot Position Update] New position, Discuss, has been recorded by Sam Hartman
2007-08-16
07 Cullen Jennings Placed on agenda for telechat - 2007-08-23 by Cullen Jennings
2007-08-16
07 Cullen Jennings [Ballot Position Update] New position, No Objection, has been recorded by Cullen Jennings
2007-08-16
07 Ron Bonica Removed from agenda for telechat - 2007-08-23 by Ron Bonica
2007-08-16
07 Tim Polk [Ballot Position Update] New position, No Objection, has been recorded by Tim Polk
2007-08-14
07 Ross Callon [Ballot Position Update] New position, No Objection, has been recorded by Ross Callon
2007-08-09
07 Lisa Dusseault Ballot has been issued by Lisa Dusseault
2007-08-09
06 (System) New version available: draft-crispin-collation-unicasemap-06.txt
2007-08-07
07 Lisa Dusseault [Ballot Position Update] New position, Yes, has been recorded for Lisa Dusseault
2007-08-07
07 Lisa Dusseault Ballot has been issued by Lisa Dusseault
2007-08-07
07 Lisa Dusseault Created "Approve" ballot
2007-08-07
07 Lisa Dusseault Placed on agenda for telechat - 2007-08-23 by Lisa Dusseault
2007-08-07
07 Lisa Dusseault State Changes to IESG Evaluation from Waiting for AD Go-Ahead by Lisa Dusseault
2007-08-07
07 Lisa Dusseault State Changes to Waiting for AD Go-Ahead from Waiting for Writeup by Lisa Dusseault
2007-08-07
05 (System) New version available: draft-crispin-collation-unicasemap-05.txt
2007-07-30
07 Lisa Dusseault
APPs-REVIEW

There are a couple of clarifications I would suggest:

Section 1, Para 1:

"All input is valid." - it is not clear that this refers to the validity test in 4790 as opposed to input to the tests described in the previous sentence. Suggested alternative: "The validity test operation always returns a valid result."

Even with that change, later the spec states that "strings in other character sets and/or encodings can not be used with this collation" so wouldn't those return an invalid response if used in the validity test? Same for invalid UTF-8 sequences?

Section 1, Para 5:
I would like to see an informative reference pointing to the current UnicodeData.txt file rather than just having the generic [UNICODE] reference.

Section 5, Para 6:
"The resulting two titlecased canonicalized UTF-8 strings are then treated as in i;octet for equality and ordering."

Shouldn't that also mention substring? Suggested alternative:

"The resulting two titlecased canonicalized UTF-8 strings are then treated as in i;octet for equality, substring and ordering operations."

Other than that looks good. This should proceed to publication asap as it is needed by a lot of apps.

--Cyrus Daboo
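
The three operations named in the Section 5 suggestion above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the draft's normative algorithm: the function names are hypothetical, titlecasing is assumed to precede decomposition, and `str.title()` on a single character is used as an approximation of the simple titlecase property in UnicodeData.txt (a result longer than one character is discarded and the original character kept).

```python
import unicodedata


def canonicalize(text):
    """Return an approximate 'titlecased canonicalized' UTF-8 byte string.

    Hypothetical helper: titlecase each character, then recursively apply
    decomposition mappings, with no canonical reordering.
    """
    out = []

    def expand(ch):
        mapping = unicodedata.decomposition(ch)
        if not mapping:
            out.append(ch)
            return
        for field in mapping.split():
            # Skip compatibility tags such as "<compat>".
            if not field.startswith("<"):
                expand(chr(int(field, 16)))

    for ch in text:
        titled = ch.title()
        # Assumption: a multi-character result means there is no simple
        # one-to-one titlecase mapping, so keep the original character.
        expand(titled if len(titled) == 1 else ch)
    return "".join(out).encode("utf-8")


def equal(a, b):
    """Equality: byte-for-byte comparison, as in i;octet."""
    return canonicalize(a) == canonicalize(b)


def contains(haystack, needle):
    """Substring: byte-sequence containment, as in i;octet."""
    return canonicalize(needle) in canonicalize(haystack)


def ordering(a, b):
    """Ordering: -1, 0 or 1 from octet-by-octet comparison, as in i;octet."""
    ka, kb = canonicalize(a), canonicalize(b)
    return (ka > kb) - (ka < kb)


print(equal("Hello", "hELLO"))
print(contains("Subject: R\u00e9sum\u00e9", "r\u00e9sum\u00e9"))
print(ordering("apple", "Banana"))
```

All three operations share one canonicalization step and differ only in the final octet comparison, which is why the suggested wording lists equality, substring and ordering together.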
2007-06-20
07 (System) State has been changed to Waiting for Writeup from In Last Call by system
2007-06-07
07 Samuel Weiler Request for Last Call review by SECDIR Completed. Reviewer: Sean Turner.
2007-06-07
07 Yoshiko Fong
IANA Last Call Comments:

Upon approval of this document, the IANA will make the
following assignments in the "Collation Registry" registry
located at

http://www.iana.org/assignments/collation/collation-index.html

Collation: See Section 2 of [RFC-crispin-collation-unicasemap-02]

Description:
The i;unicode-casemap collation is well suited to
use with many Internet protocols and computer languages.
Use with natural language is often inappropriate; even
though the collation apparently supports languages such
as Swahili and English, in real-world use it tends to
mis-sort a number of types of string.

Reference: [RFC-crispin-collation-unicasemap-02]


We understand the above to be the only IANA Action for
this document.
2007-05-25
07 Lisa Dusseault
Issues raised to deal with in next version or RFC Ed notes:
- conversion to titlecased canonicalized UTF8 must be applied recursively
- deal with apparent conflict between paragraphs on using other Unicode mappings
2007-05-25
07 Samuel Weiler Request for Last Call review by SECDIR is assigned to Sean Turner
2007-05-23
07 Amy Vezza Last call sent
2007-05-23
07 Amy Vezza State Changes to In Last Call from Last Call Requested by Amy Vezza
2007-05-22
07 Lisa Dusseault State Changes to Last Call Requested from AD Evaluation::AD Followup by Lisa Dusseault
2007-05-22
07 Lisa Dusseault Last Call was requested by Lisa Dusseault
2007-05-22
07 (System) Ballot writeup text was added
2007-05-22
07 (System) Last call text was added
2007-05-22
07 (System) Ballot approval text was added
2007-05-15
07 Lisa Dusseault State Changes to AD Evaluation::AD Followup from Publication Requested by Lisa Dusseault
2007-05-03
04 (System) New version available: draft-crispin-collation-unicasemap-04.txt
2007-05-02
07 Lisa Dusseault
PROTO Writeup

  (1.a)  Alexey Melnikov is the document
shepherd for this document.  The document is ready for publication.

  (1.b)  This document was reviewed by several active and experienced
IMAPEXT and Sieve WG members.  There are no concerns about the depth of
the reviews.

Also note that this document is a dependency for the IMAP I18N document
and an indirect dependency for the Lemonade Profile Bis document
(draft-ietf-lemonade-profile-bis-XX.txt).

  (1.c)  No concerns requiring additional review.  In particular,
this document was reviewed by Arnt Gulbrandsen, who is one of the editors
of RFC 4790.

  (1.d) No specific concerns. No IPR disclosure was filed for this
document.

  (1.e)  This document is an individual submission.

  (1.f)  No appeals threatened.

  (1.g)  IDnits 2.04.07 was used to verify the document. Excessively
long lines were found, but this is purely editorial.  It also reports
2 missing references, but they are defined. I think this is a bug in
IDnits.  Also there are some reports on possible DOWNREFs. 2 of them
are informative, the other 2 point to Unicode documents.

  (1.h)  References are properly split. There are no downward normative
references.

  (1.i)  An IANA considerations section exists and is clearly defined. It
contains a registration of a new collation algorithm.

  (1.j)  The document doesn't have any ABNF, MIB, etc.
The XML registration template opens fine with Mozilla.

  (1.k)  Document Announcement Write-Up

        Technical Summary

This document describes "i;unicode-casemap", a simple
case-insensitive collation for Unicode strings. It provides
equality, substring and ordering operations.

        Working Group Summary

This document is an individual submission. It was informally last
called in the IMAPEXT WG and consensus was reached that this collation
would be easier to implement than i;basic, thus it has a better chance
of being deployed.

        Document Quality

There is at least one server implementation of this document. At least
2 other server vendors are interested in implementing it.

        Personnel

Alexey Melnikov is the document shepherd
for this document.
Lisa Dusseault is the responsible AD.
2007-05-02
07 Lisa Dusseault Draft Added by Lisa Dusseault in state Publication Requested
2007-04-17
03 (System) New version available: draft-crispin-collation-unicasemap-03.txt
2007-04-11
02 (System) New version available: draft-crispin-collation-unicasemap-02.txt
2007-03-22
01 (System) New version available: draft-crispin-collation-unicasemap-01.txt
2006-12-06
00 (System) New version available: draft-crispin-collation-unicasemap-00.txt