Internet Application Protocol Collation Registry
RFC 4790

Note: This ballot was opened for revision 14 and is now closed.

(Lisa Dusseault) Yes

(Sam Hartman) (was Discuss) Yes

(Jari Arkko) No Objection

(Ross Callon) No Objection

(Brian Carpenter) No Objection

Comment (2006-08-16 for -)
For the extensive followup to these comments, see the Gen-ART archive
starting at
http://www1.ietf.org/mail-archive/web/gen-art/current/msg01152.html

From Gen-ART review by Spencer Dawkins:

2.2.  Purpose

  Collations abstraction layer for comparison functions so that these
  comparison functions can be used in multiple protocols.

I am just barely able to parse this sentence so that it's not a sentence fragment. I think the problem is that "functions" is being used as a verb and as a noun in the same sentence. I saw later in the document that you had changed "function"-the-noun to "operation", so this should be easy to fix. But this isn't an editorial comment, because I'm not sure what the sentence is saying.

4.2.2.  Equality
   ...
   In this specification, the return values of the equality test are
   called "match", "no-match" and "undefined".  This is not a
   specification, merely a choice of phrasing.

What does the last sentence mean? (Brian Carpenter asked me, so he doesn't know, either).

5.2.  Operations

...

  Although the collation's substring function provides a list of
  matches, a protocol need not provide all that to the client.  It may
  provide only the first matching substring, or even just the
  information that the substring search matched.

Hmmm. I am trying to remember that you're not defining a protocol, only describing what protocols do and don't do, but I'm trying to read this from the application's perspective, and having a hard time understanding how (for example) an application that is trying to display what is matching responds when the protocol only provides an indication that something matched. You may say this is what the protocol developers are supposed to worry about ("if you think applications will want to display what matches, you'd better define the protocol so that this information is returned"), and that's OK. I'm just struggling a bit here.
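To make the distinction concrete, here is a minimal Python sketch. It assumes an ASCII case-insensitive comparison in the style of i;ascii-casemap, and `substring_matches` and `protocol_search` are hypothetical names, not anything the draft defines. The collation-level function returns every match offset; the protocol layer reduces that list to a single yes/no answer, which is all a client would then see.

```python
def casemap(s: bytes) -> bytes:
    # Fold only the ASCII letters a-z to A-Z; other octets are untouched.
    return s.upper()


def substring_matches(needle: bytes, haystack: bytes) -> list:
    """Collation-level operation: offsets of every (possibly overlapping) match."""
    n, h = casemap(needle), casemap(haystack)
    return [i for i in range(len(h) - len(n) + 1) if h[i:i + len(n)] == n]


def protocol_search(needle: bytes, haystack: bytes) -> bool:
    # Hypothetical protocol that only tells the client whether anything matched.
    return bool(substring_matches(needle, haystack))


print(substring_matches(b"na", b"Banana"))  # [2, 4]
print(protocol_search(b"na", b"Banana"))    # True
```

An application that wants to highlight the matches only gets them if the protocol chooses to carry the offsets back; a protocol built like `protocol_search` discards them.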

6.  Use by Existing Protocols

...

  IMAP [16] also collates, although that is explicit only when the
  COMPARATOR [18] extension is used.  The built-in IMAP substring
  operation and the ordering provided by the SORT [17] extension may
  not meet the requirements made in this document.

  Other protocols may be in a similar position.

  In IMAP, the default collation is i;ascii-casemap, because its
  operations most closely resembles IMAP's built-in operations.

EDITORIAL: I'm guessing that the previous paragraph should be moved up one? At the very least, I'm confused because I'm not sure if the top paragraph in this extract describes the differences between i;ascii-casemap and IMAP's built-in operations or is talking about something else.

9.1.1.  ASCII Numeric Collation Description

  The "i;ascii-numeric" collation is a simple collation intended for
  use with arbitrary sized unsigned decimal integer numbers stored as
  octet strings.  US-ASCII digits (0x30 to 0x39) represent digits of
  the numbers.  Before converting from string to integer, the input
  string is truncated at the first non-digit character.  All input is
  valid; strings which do not start with a digit represent positive
  infinity.

Is it obvious to everyone except me that leading zeros are ignored? The examples given a little further down say so; is making this point only in examples normative enough?
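Read literally, the quoted rules (truncate at the first non-digit; a string with no leading digit is positive infinity) make leading zeros drop out once the digits are converted to an integer. A minimal Python sketch under that reading; `ascii_numeric_value` and `ascii_numeric_equal` are hypothetical helpers, not names from the draft.

```python
import math


def ascii_numeric_value(s: bytes):
    """Return the number an octet string represents under i;ascii-numeric.

    Digits are US-ASCII 0x30-0x39; the string is truncated at the first
    non-digit, and a string with no leading digit is positive infinity.
    """
    value = None
    for octet in s:
        if 0x30 <= octet <= 0x39:
            # Accumulating digit by digit discards leading zeros and also
            # sidesteps the int(str) digit limit added in Python 3.11.
            value = (value or 0) * 10 + (octet - 0x30)
        else:
            break  # truncate at the first non-digit
    return math.inf if value is None else value


def ascii_numeric_equal(a: bytes, b: bytes) -> str:
    same = ascii_numeric_value(a) == ascii_numeric_value(b)
    return "match" if same else "no-match"


print(ascii_numeric_equal(b"007", b"7abc"))  # match
```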

9.2.1.  ASCII Casemap Collation Description

...

  The i;ascii-casemap collation is well suited to to use with many
  internet protocols and computer languages.  Use with natural language
  is often inappropriate: even though the collation apparently supports
  languages such as Italian and English, in real-world use it tends to
  stumble over words such as "naive", names such as "Llwyd", people and
  place names containing non-ASCII, euro and pound sterling symbols,
  quotation marks, dashes/hyphens, etc.

OK, this may be inadvertently funny: are "naive" and "Llwyd" supposed to include a non-ASCII character, or is that sentence saying something else? (Welcome to the world of the RFC Editor.)
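A small Python sketch shows why the spelling matters for the equality test, assuming UTF-8 input and a casemap that folds only the octets a-z. The ASCII spellings "Llwyd" and "naive" fold cleanly, while "naïve" written with a diaeresis does not match its uppercase form. The sketch takes no position on which spelling the draft intended; `ascii_casemap_equal` is a hypothetical name.

```python
def ascii_casemap_equal(a: bytes, b: bytes) -> str:
    # bytes.upper() maps only the octets a-z to A-Z; every other octet,
    # including each byte of a UTF-8 encoded non-ASCII character, is
    # compared unchanged.
    return "match" if a.upper() == b.upper() else "no-match"


print(ascii_casemap_equal(b"Llwyd", b"LLWYD"))                  # match
print(ascii_casemap_equal(b"naive", b"NAIVE"))                  # match
print(ascii_casemap_equal("naïve".encode(), "NAÏVE".encode()))  # no-match
```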

13.  Open Issues

   ... adding a
   note to the RFC editor to possibly replace the 3066 reference

> From Brian: Surely this needs to be done?


> From Spencer: I'm thinking that the "checking the SP SP "1" SP SP string for correctness" also needs to be done pretty soon :-0

(Lars Eggert) No Objection

Comment (2006-08-15 for -)
Section 2.2., paragraph 1:

>    Collations abstraction layer for comparison functions so that these
>    comparison functions can be used in multiple protocols.

  Nit: Verb missing.


Section 2.2., paragraph 3:

>    Here is a small diagram to help illustrate the value of this
>    abstraction layer:

  Maybe I'm dense; I see how collations form an abstraction, but I don't
  see how they are an abstraction _layer_, especially when looking at
  the diagram. s/abstraction layer/abstraction/ throughout the document?


Section 2.3., paragraph 3:

>    A server needs to use the operations provided by collations
>    in order to fulfil the client's requests.

  Nit: s/fulfil/fulfill/


Section 2.4., paragraph 3:

>    This is an implementation detail of collations or servers.  A
>    protocol SHOULD NOT expose it, since some collations leave the sort
>    key's format up to the implementation, and current conformant
>    implementations are known to use different formats.

  I don't understand what "expose" means in this statement. Expose the
  transformation as cleartext on the wire? Expose it to clients or
  servers? Something else? Please clarify.
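One plausible reading of the concern, sketched in Python: two hypothetical implementations of the same casemap ordering can produce sort keys in different formats yet sort identically, so a protocol that carried raw keys to clients would tie those clients to one server's format. Both key functions are illustrative and not taken from the draft.

```python
def sort_key_impl_a(s: bytes) -> bytes:
    # Hypothetical implementation A: the key is the casemapped octet string.
    return s.upper()


def sort_key_impl_b(s: bytes) -> str:
    # Hypothetical implementation B: the same octets rendered as hex text.
    # Every octet becomes exactly two hex digits, so byte-wise order is kept.
    return s.upper().hex()


words = [b"banana", b"Apple", b"cherry", b"apple_pie"]
print(sorted(words, key=sort_key_impl_a))  # same order with either key
print(sort_key_impl_a(b"Apple"), sort_key_impl_b(b"Apple"))  # b'APPLE' 4150504c45
```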


Section 3.1., paragraph 1:

>    The collation identifier itself is a single US-ASCII string beginning
>    with a letter and made up of letters, digits, and one of the
>    following 4 symbols: "-", ";", "=" and ".".

  Nit: s/one of/some of/
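The quoted syntax rule translates directly into a regular expression. A minimal Python sketch; `is_collation_identifier` is a hypothetical helper, and any length limit or further structure the draft may impose is not modeled.

```python
import re

# A letter, then any mix of letters, digits, "-", ";", "=" and ".".
COLLATION_ID = re.compile(r"[A-Za-z][A-Za-z0-9;=.-]*")


def is_collation_identifier(name: str) -> bool:
    # fullmatch() requires the whole string to fit the pattern.
    return COLLATION_ID.fullmatch(name) is not None


for candidate in ["i;ascii-casemap", "i;ascii-numeric", "1abc", "i;ascii-*"]:
    print(candidate, is_collation_identifier(candidate))
```

The wildcard "*" is rejected here because it belongs to the selection strings described in Section 3.2, not to identifiers themselves.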


Section 3.1., paragraph 4:

>    The identifier "default" is reserved.  For protocol which have a
>    default collation, "default" refers to that collation.  For other
>    protocols, the identifier "default" matches no collations, and
>    servers SHOULD treat it in the same way as they treat nonexistent
>    collations.

  s/matches no collations/MUST match no collations/
  Nit: s/For protocol/For protocols/
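The "default" rule can be sketched as a lookup that falls back to the protocol's default collation, if it has one. This assumes an exact, case-sensitive comparison with "default"; `AVAILABLE` and `select_collation` are hypothetical. The IMAP example value comes from the Section 6 text quoted earlier on this page.

```python
from typing import Optional

# Example set of collations a server might support (illustrative only).
AVAILABLE = {"i;ascii-casemap", "i;ascii-numeric"}


def select_collation(name: str, protocol_default: Optional[str] = None) -> Optional[str]:
    """Return the collation a request names, or None if it matches nothing."""
    if name == "default":
        # Protocols without a default collation get None here, i.e. the
        # same outcome as asking for a nonexistent collation.
        return protocol_default
    return name if name in AVAILABLE else None


print(select_collation("default", "i;ascii-casemap"))  # IMAP-style default
print(select_collation("default"))                     # None: no default
```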


Section 3.2., paragraph 1:

>    The string a client uses to select a collation MAY contain one or
>    more wildcard ("*") character which matches zero or more collation-

  Nit: s/character which matches/characters which match/


Section 3.2., paragraph 2:

>    chars.  Wildcard characters MUST NOT be adjacent.  If the wildcard
>    string matches multiple collations, the server SHOULD select the
>    collation with the broadest scope (preferably international scope),
>    the most recent table versions and the greatest number of supported
>    operations.

  DISCUSS: "Broadest scope" is underspecified. I'm letting Ted hold it.
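The matching half of the quoted rule is mechanical: each "*" stands for zero or more collation-chars and adjacent wildcards are an error. A minimal Python sketch with a hypothetical `REGISTRY`; it returns every match and deliberately leaves out the choice among several matches, since that "broadest scope" rule is exactly what is flagged above as underspecified.

```python
import re

# Characters allowed in a collation identifier, per the Section 3.1 quote.
COLLATION_CHARS = "[A-Za-z0-9;=.-]*"

REGISTRY = ["i;octet", "i;ascii-casemap", "i;ascii-numeric"]


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    if "**" in pattern:
        raise ValueError("wildcard characters must not be adjacent")
    # Each "*" becomes "zero or more collation-chars"; the rest is literal.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(COLLATION_CHARS.join(literal_parts))


def matching_collations(pattern: str, registry: list) -> list:
    rx = wildcard_to_regex(pattern)
    return [name for name in registry if rx.fullmatch(name)]


print(matching_collations("i;ascii-*", REGISTRY))
# ['i;ascii-casemap', 'i;ascii-numeric']
```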


Section 4.2., paragraph 3:

>    A nonobvious consequence of the rules for each collation operation is

  Nit: s/nonobvious/non-obvious/


Section 4.2.3., paragraph 1:

>    The substring matching operation determines if the first string is a
>    substring of the second string, ie. if one or more substrings of the

  Nit: s/ie./i.e.,/ here and elsewhere in the draft; likewise for e.g.


Section 9.1.1., paragraph 3:

>    The equality operation returns "match" if the two strings represent
>    the same number (ie. leading zeroes and trailing nondigits are

  Nit: s/ie./i.e.,/
  Nit: s/nondigits/non-digits/


Section 9.2.1., paragraph 4:

>    The i;ascii-casemap collation is well suited to to use with many
>    internet protocols and computer languages.  Use with natural language

  Nit: s/internet/Internet/


Section 9.2.1., paragraph 5:

>    place names containing non-ASCII, euro and pound sterling symbols,

  Nit: s/euro and pound sterling/Euro and Pound Sterling/


Section 2., paragraph 0:

>    2.  Get rid of all the typoes I could find.

  Nit: s/typoes/typos/

(Ted Hardie) (was Discuss) No Objection

Comment (2006-08-14)
This sentence at the start of 2.2 did not make sense to me:

 Collations abstraction layer for comparison functions so that these
   comparison functions can be used in multiple protocols.

In 3.4, was there a reason that the authors chose not to use the IETF
URN parameter namespace for this?  For the http URIs, if the IANA must change
its assignment of that web space, I believe allowing "other-uri" to
refer to the IANA collation namespace (with whatever new URIs are
assigned) might be useful.  

Does this statement:

   In IMAP, the default collation is i;ascii-casemap, because its
   operations most closely resembles IMAP's built-in operations.

serve to mark that as equivalent to "default" for IMAP or is more needed?

(Russ Housley) (was Discuss, No Objection, Discuss) No Objection

Comment (2006-08-17)
  Please remove sections 13 and 14 prior to publication as an RFC.

(Cullen Jennings) No Objection

Comment (2006-08-17 for -)
I think we need a better way of knowing that i18n documents have been adequately reviewed before they get to the IESG.

(David Kessens) No Objection

(Jon Peterson) No Objection

(Mark Townsley) No Objection