i;unicode-casemap - Simple Unicode Collation Algorithm
draft-crispin-collation-unicasemap-07
Yes
No Objection
Note: This ballot was opened for revision 07 and is now closed.
Lars Eggert No Objection
(Chris Newman; former steering group member) Yes
In response to Sam's discuss, the normalization form described is Normalization Form KD excluding the "Canonical Ordering Behavior" of non-spacing marks. Unicode TR 15 (http://www.unicode.org/reports/tr15/) defines form KD as "compatibility decomposition", and the version of the Unicode specification I have handy (v 2.0) defines compatibility decomposition as recursively applying both compatibility and canonical mappings and then re-ordering non-spacing marks (I assume this hasn't changed). As the UnicodeData file contains both compatibility and canonical decompositions in the "decomposition property" (the Unicode Character Database overview document calls this the Decomposition_Mapping field), the core of this algorithm is the same as NFKD.

If this variant of NFKD were a form visible on the wire, I would consider that problematic, but as it is a form used only internally to the algorithm, I do not consider it problematic as long as it is intentional. The one place where this difference from Normalization Form KD might have surprising results is if one of the input strings contains a non-canonical decomposition of a letter with multiple diacritical marks. In that case the character would not match under the equality function. As the intention of this collation is a 'cheap' collation rather than a linguistically correct one (just as i;ascii-casemap is not a proper English collation), I consider this simplification justifiable.

The "Canonical Ordering Behavior" section of the Unicode specification is non-trivial, and for that additional complexity it provides little incremental benefit (indeed little visible change) to this collation. The specification could be made more helpful to implementors with a Unicode library by giving advice about whether or not it is acceptable to substitute NFKD for the partial NFKD described here.
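The partial NFKD described above can be sketched in a few lines. This is an illustrative sketch only, not the draft's normative algorithm: the function name `partial_nfkd` is hypothetical, and it approximates "NFKD minus Canonical Ordering" by recursively expanding each character's Decomposition_Mapping (both canonical and compatibility) while never re-ordering non-spacing marks.

```python
import unicodedata

def partial_nfkd(s):
    """Hypothetical sketch: recursively apply Unicode decomposition
    mappings (canonical and compatibility alike), but skip the
    Canonical Ordering step that full NFKD would apply to runs of
    non-spacing marks."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)
        if not decomp:
            out.append(ch)
            continue
        # Drop a formatting tag such as <compat> or <fraction>, then
        # decompose the resulting characters recursively.
        parts = [p for p in decomp.split() if not p.startswith('<')]
        out.append(partial_nfkd(''.join(chr(int(p, 16)) for p in parts)))
    return ''.join(out)

# Agrees with NFKD when no mark re-ordering is needed:
assert partial_nfkd('\ufb01\u00bd\u00c5') == \
    unicodedata.normalize('NFKD', '\ufb01\u00bd\u00c5')
# Diverges on a non-canonical mark ordering (dot above before dot
# below): NFKD re-orders the marks, the partial form does not.
assert partial_nfkd('q\u0307\u0323') != \
    unicodedata.normalize('NFKD', 'q\u0307\u0323')
```

The second assertion illustrates the "surprising results" case mentioned above: two inputs differing only in combining-mark order compare unequal under the partial form but equal under full NFKD.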
On the issue of tracking the Unicode standard, I'll mention that many runtime environments have built-in Unicode support now, that it can be difficult to determine which version of Unicode is active, and that most such environments provide no simple way to bind to a previous version of UnicodeData.txt. So I consider tracking the current version of Unicode to be pragmatic and necessary to make implementations feasible. I would look to the "running code" principle of the IETF to support this position.
(Jari Arkko; former steering group member) Yes
(Lisa Dusseault; former steering group member) Yes
(Sam Hartman; former steering group member) (was Discuss) Yes
(Cullen Jennings; former steering group member) No Objection
(Dan Romascanu; former steering group member) No Objection
(David Ward; former steering group member) No Objection
(Jon Peterson; former steering group member) No Objection
(Ron Bonica; former steering group member) No Objection
(Ross Callon; former steering group member) No Objection
(Russ Housley; former steering group member) No Objection
Based on Gen-ART Review from Christian Vogt.

Section 1 ends with an applicability statement for the algorithm. The section says that, while the algorithm is well-suited for technical languages, it does not work correctly in certain cases when applied to natural language. My suggestion is to move the applicability statement to a more prominent place, perhaps into a new section preceding the current Section 1.

3rd paragraph of Section 1: s/using using/using/
(Tim Polk; former steering group member) No Objection