
i;unicode-casemap - Simple Unicode Collation Algorithm
draft-crispin-collation-unicasemap-07

Yes

(Jari Arkko)
(Lisa Dusseault)
(Sam Hartman)

No Objection

Lars Eggert
(Cullen Jennings)
(Dan Romascanu)
(David Ward)
(Jon Peterson)
(Ron Bonica)
(Ross Callon)
(Tim Polk)

Note: This ballot was opened for revision 07 and is now closed.

Lars Eggert No Objection

(Chris Newman; former steering group member) Yes

Yes (2007-08-17)
In response to Sam's discuss, the normalization form described is
Normalization form KD excluding the "Canonical Ordering Behavior" of
non-spacing marks.

Unicode TR 15 (http://www.unicode.org/reports/tr15/) defines form KD
as "compatibility decomposition" and the version of the Unicode
specification I have handy (v 2.0) defines compatibility decomposition
as recursively applying both compatibility and canonical mappings and
then re-ordering non-spacing marks (I assume this hasn't changed).  As
the UniData file contains both compatibility and canonical decompositions
in the "decomposition property" (the Unicode Character Database overview
document calls this the Decomposition_Mapping field) the core of this
algorithm is the same as NFKD.

If this variant of NFKD were a form visible on the wire, I would
consider that problematic, but as it is a form used only internally by
the algorithm, I do not consider it problematic as long as it is
intentional.  The one place where this difference from normalization
form KD might have surprising results is if one of the input strings
contains a non-canonical decomposition of a letter with multiple
diacritical marks.  In that case the character would not match under
the equality function.  As the intention of this collation is a 'cheap'
collation rather than a linguistically correct collation (just as
i;ascii-casemap is not a proper English collation), I consider this
simplification justifiable.  The "canonical ordering behavior" section
of the Unicode specification is non-trivial and provides little
incremental benefit (indeed little visible change) to this collation
for that additional complexity.
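
The mismatch described above can be illustrated with a short sketch.
It assumes Python's standard unicodedata module and covers only the
decomposition step, not the full collation.  The helper name
decompose_no_reorder is hypothetical and is not taken from the draft.

```python
import unicodedata


def decompose_no_reorder(s: str) -> str:
    """Recursively apply the Decomposition_Mapping field (canonical and
    compatibility mappings alike) WITHOUT canonical reordering of marks.

    Illustrative sketch only: algorithmic decompositions that are not
    listed in the mapping field are not handled here.
    """
    out = []
    for ch in s:
        mapping = unicodedata.decomposition(ch)
        if not mapping:
            out.append(ch)  # no decomposition: keep the character as-is
            continue
        parts = mapping.split()
        if parts[0].startswith("<"):
            parts = parts[1:]  # drop a compatibility tag such as <compat>
        expanded = "".join(chr(int(p, 16)) for p in parts)
        out.append(decompose_no_reorder(expanded))  # mappings may nest
    return "".join(out)


# U+1E69 is "s with dot below and dot above" as a single code point.
precomposed = "\u1e69"
# The same letter typed with the marks in non-canonical order
# (dot above, ccc 230, before dot below, ccc 220).
non_canonical = "s\u0307\u0323"

if __name__ == "__main__":
    # Full NFKD reorders the marks, so the two inputs compare equal.
    print(unicodedata.normalize("NFKD", precomposed)
          == unicodedata.normalize("NFKD", non_canonical))
    # Without reordering, the two inputs stay different.
    print(decompose_no_reorder(precomposed)
          == decompose_no_reorder(non_canonical))
```

Under these assumptions the first comparison holds and the second does
not, which is the one surprising case noted above: two spellings of the
same letter fail the equality function only when the marks arrive in
non-canonical order.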

The specification could be made more helpful to implementors with a
Unicode library by giving advice about whether or not it is acceptable
to substitute NFKD for the partial-NFKD described here.

On the issue of tracking the Unicode standard, I'll mention that many
runtime environments now have built-in Unicode support, and it can be
difficult to determine which version of Unicode is active.  Most such
environments provide no simple way to bind to a previous version of
UniData.txt.  So I consider tracking the current version of Unicode to
be pragmatic and necessary to make implementations feasible.  I would
look to the "running code" principle of the IETF to support this
position.
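
As one concrete data point, and only as an illustration, Python's
standard unicodedata module reports the version of the character
database it was built with, and ships exactly one pinned older database
(3.2.0).  The function name unicode_versions is hypothetical.

```python
import unicodedata


def unicode_versions():
    """Return (active database version, pinned legacy database version)."""
    active = unicodedata.unidata_version
    legacy = unicodedata.ucd_3_2_0.unidata_version
    return active, legacy


if __name__ == "__main__":
    active, legacy = unicode_versions()
    print("active Unicode database:", active)
    print("only older database available:", legacy)
```

The active version depends on the interpreter build, and no other
earlier version can be selected through this module, which is
consistent with the difficulty described above.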

(Jari Arkko; former steering group member) Yes

(Lisa Dusseault; former steering group member) Yes

(Sam Hartman; former steering group member) (was Discuss) Yes

(Cullen Jennings; former steering group member) No Objection

(Dan Romascanu; former steering group member) No Objection

(David Ward; former steering group member) No Objection

(Jon Peterson; former steering group member) No Objection

(Ron Bonica; former steering group member) No Objection

(Ross Callon; former steering group member) No Objection

(Russ Housley; former steering group member) No Objection

No Objection (2007-08-21)
  Based on Gen-ART Review from Christian Vogt.

  Section 1 ends with an applicability statement for the algorithm.
  The section says that, while the algorithm is well-suited for
  technical languages, it does not work correctly in certain cases
  when applied to natural language.  My suggestion is to move the
  applicability statement to a more prominent place, perhaps into a
  new section preceding current section 1.

  3rd paragraph of section 1:  s/using using/using/

(Tim Polk; former steering group member) No Objection
