
Last Call Review of draft-bormann-cbor-04
review-bormann-cbor-04-genart-lc-thomson-2013-07-30-00

Request Review of draft-bormann-cbor
Requested revision No specific revision (document currently at 09)
Type Last Call Review
Team General Area Review Team (Gen-ART) (genart)
Deadline 2013-08-13
Requested 2013-07-18
Authors Carsten Bormann, Paul E. Hoffman
I-D last updated 2013-07-30
Completed reviews Genart Last Call review of -04 by Martin Thomson (diff)
Genart Telechat review of -04 by Martin Thomson (diff)
Secdir Last Call review of -04 by Matt Lepinski (diff)
Assignment Reviewer Martin Thomson
State Completed
Request Last Call review on draft-bormann-cbor by General Area Review Team (Gen-ART) Assigned
Reviewed revision 04 (document currently at 09)
Result Ready
Completed 2013-07-30
I am the assigned Gen-ART reviewer for this draft. For background on
Gen-ART, please see the FAQ at

<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Please resolve these comments along with any other Last Call comments
you may receive.

Document: draft-bormann-cbor-04
Reviewer: Martin Thomson
Review Date: 2013-07-29
IETF LC End Date: ?
IESG Telechat date: 2013-08-15

Summary:
This document is not ready for publication as proposed standard.

I'm glad that I held this review until Paul's appsarea presentation.
This made it very clear to me that the types of concerns I have are
considered basically irrelevant by the authors because they aren't
interested in changing the design goals.  I don't find the specific
design goals compelling, and I am of the opinion that the issues I
raise are significant as a matter of general application.  I hope that
is clear from my review.

Independent of any conclusions regarding design goals, there are
issues that need addressing.

(This is an atypical Gen-ART review.  I make no apologies for that.  I
didn't intend to write a review like this when I started, but feel
that it's important to commit these thoughts to the record.  It's also
somewhat long, sorry.  I tried to edit it down.)

I have reviewed the mailing list feedback, and it's not clear to me
that there is consensus to publish this.  It might be that the dissent
that I have observed is not significant in Barry's learned judgment,
or that this is merely dissent on design goals and therefore
irrelevant.  The fact that this work isn't a product of a working
group still concerns me.  I'm actually interested in why this is
AD-sponsored rather than a working group product.

Major issues:
My major concerns with this document might be viewed as disagreements
with particular design choices.  And, I consider it likely that the
authors will conclude that the document is still worth publishing as
is, or perhaps with some minor changes.  In the end, I have no issue
with that, but expect that the end result will be that the resulting
RFC is ignored.

What would make this tragic is if publication of this were
used to prevent other work in this area from subsequently being
published.  (For those drawing less-than-charitable inferences from
this, I have no desire to throw my hat into this particular ring,
except perhaps in jest [1].)

This design is far too complex and large.  Regardless of how
well-considered it might be, or how well this meets the stated design
goals, I can't see anything but failure in this document's future.
JSON succeeds largely because it doesn't attempt to address so many
needs at once, but I could even make a case for why JSON contains too
many features.

In comparison with JSON, this document does one major thing wrong: it
has more options in several dimensions.  There are more types, and
there are several more dimensions for extensibility: type extensions,
value extensions (values of 28-30 in the lower bits of the type
byte), plus the ability to apply arbitrary tags to any value.  I
believe all of these to be major problems that will cause them to be
ignored, poorly implemented, and therefore useless.

In part, this complexity produces implementations that are far more
complex than they might need to be, unless additional standardization
is undertaken.  That idea is something I'm uncomfortable with.

Design issue: extensibility:
This document avoids discussion of issues regarding schema-less
document formats that I believe to be fundamental.  These issues are
critical when considering the creation of a new interchange format.
By choosing this specific design, the document makes a number of
trade-offs that are, in my opinion, ill-chosen.  This may be in part because the
document is unclear about how applications intend to use the documents
it describes.

You may conclude after reading this review that this is simply because
the document does not explain the rationale for selecting the approach
it takes.  I hope that isn't the conclusion you reach, but appreciate
the reasons why you might do so.

I believe the fundamental problem to be one that arises from a
misunderstanding about what it means to have no schema.  Aside from
formats that require detailed contextual knowledge to interpret, there
are several steps toward the impossible, platonic ideal of a perfectly
self-describing format.  It's impossible because ultimately the entity
that consumes the data is required at some level to understand the
semantics that are being conveyed.  In practice, no generic format can
effectively self-describe to the level of semantics.

This draft describes a format that is more capable at self-description
than JSON.  I believe that to not just be unnecessary, but
counterproductive.  At best, it might provide implementations with a
way to avoid an occasional extra line of code for type conversion.

Extensibility as it relates to types:
The use of extensive typing in CBOR implies an assumption of a major
role for generic processing.  XML schema and XQuery demonstrate that
this desire is not new, but they also demonstrate the folly of
pursuing those goals.

JSON relies on a single mechanism for extensibility. JSON maps that
contain unknown or unsupported keys are (usually) ignored.  This
allows new values to be added to documents without destroying the
ability of an old processor to extract the values that it supports.
The limited type information JSON carries leaks out, but it's unclear
what value this has to a generic processor.  All of the generic uses
I've seen merely carry that type information, no specific use is made
of the knowledge it provides.
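
The single JSON mechanism described above can be sketched in a few
lines (the document and the `old_consumer` function are hypothetical
illustrations, not taken from any specification):

```python
import json

# A document extended with a hypothetical "color" field that an
# older consumer knows nothing about.
doc = json.loads('{"id": 7, "name": "widget", "color": "red"}')

def old_consumer(obj):
    # Written before "color" existed; unknown keys are simply
    # ignored, so the extension does not break it.
    return (obj["id"], obj["name"])

print(old_consumer(doc))  # (7, 'widget')
```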

ASN.1 extensibility, as encoded in PER, leads to no type information
leaking.  Unsupported extensions are skipped based on a length field.

(As an aside, PER is omitted from the analysis in the appendix, which
I note from the mailing lists is due to its dependency on schema.
Interestingly, I believe it to be possible - though not trivial - to
create an ASN.1 description with all the properties described in CBOR
that would have roughly equivalent, if not fully equivalent,
properties to CBOR when serialized.)

By defining an extensibility scheme for types, CBOR effectively
acknowledges that a generic processor doesn't need type information
(just delineation information), but it then creates an extensive type
system.  That seems wasteful.

Design issue: types:
The addition of the ability to carry uninterpreted binary data is a
valuable and important feature.  If that was all this document did,
then that might have been enough.  But instead it adds numerous
different types.

I can understand why multiple integer encoding sizes are desirable,
and maybe even floating point representations, but this describes
bignums in both base 2 and 10, embedded CBOR documents in three forms,
URIs, base64 encoded strings, regexes, MIME bodies, date and times in
two different forms, and potentially more.

I also challenge the assertion that the code required for parsing a
data type produces larger code sizes if implemented outside of a
common shared library.  That's arguably provably true, but last time
I checked a few extra procedure calls (or equivalent) weren't the
issue for code size.  Sheer number of options on the other hand might
be.

Half-precision floating point numbers are a good example of excessive
exuberance.  They are not available in many languages for good reason:
they aren't good for much.  They actually tend to cause errors in
software in the same way that threading libraries do: it's not that
it's hard to use them, it's that it's harder than people think.  And
requiring that implementations parse these creates unnecessary
complexity.  I do not believe that, for the very small subset of cases
where half precision is actually useful, the cost of transmitting the
extra 2 bytes of a single-precision number is going to be a
them is not as trivial as this makes out.  The fact that this requires
an appendix would seem to indicate that this is special enough that
inclusion should have been very carefully considered.  To be honest,
if it were my choice, I would have excluded single-precision floating
point numbers as well, they too create more trouble than they are
worth.
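
For reference, the decoding work that the draft relegates to an
appendix amounts to roughly the following sketch (a straightforward
Python transcription of the standard half-precision bit layout, not
the draft's own code):

```python
def decode_half(b: bytes) -> float:
    """Decode a big-endian IEEE 754 half-precision float from 2 bytes:
    1 sign bit, 5 exponent bits, 10 mantissa bits."""
    half = (b[0] << 8) | b[1]
    exp = (half >> 10) & 0x1F
    mant = half & 0x03FF
    if exp == 0:                               # zero / subnormal
        val = mant * 2.0 ** -24
    elif exp != 31:                            # normal numbers
        val = (mant + 1024) * 2.0 ** (exp - 25)
    else:                                      # infinity / NaN
        val = float("inf") if mant == 0 else float("nan")
    return -val if half & 0x8000 else val

print(decode_half(bytes([0x3C, 0x00])))  # 1.0
```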

Design issue: optionality
CBOR embraces the idea that support for types is optional.  Given the
extensive nature of the type system, it's almost certain that
implementations will choose to avoid implementation of some subset of
the types.  The document makes no statements about what types are
mandatory for implementations, so I'm not sure how it is possible to
provide interoperable implementations.

If published in its current form, I predict that only a small subset
of types will be implemented and become interoperable.

Design issue: tagging
The tagging feature has a wonderful property: the ability to create
emergent complexity.  Given that a tag itself can be arbitrarily
complex, I'm almost certain that this is a feature you do not want.
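
The nesting concern can be demonstrated directly: nothing in the
format bounds tag depth.  A hypothetical sketch, restricted to tag
numbers below 24 for brevity (the tag number here is chosen
arbitrarily for illustration):

```python
def tag(n: int, payload: bytes) -> bytes:
    """Prefix payload with CBOR tag n (assumes n < 24, so the tag
    number fits in the low 5 bits of a major-type-6 initial byte)."""
    return bytes([0xC0 | n]) + payload

item = bytes([0x01])          # the unsigned integer 1
for _ in range(1000):         # nothing stops a sender nesting tags
    item = tag(2, item)
print(len(item))  # 1001 bytes, still a single "value"
```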

Minor issues:
Design issue: negative numbers
Obviously, the authors will be well-prepared for arguments that
describe as silly the separation of integer types into distinct
positive and negative types.  But it's true: this is a strange
choice, and a very strange design.

The fact that this format is capable of describing 64-bit negative
numbers creates a problem for implementations that I'm surprised
hasn't been raised already.  In most languages I use, there is no
native type that is capable of carrying the most negative value that
can be expressed in this format: -2^64 has twice the magnitude of the
most negative value (-2^63) that a 64-bit two's-complement integer
can store.

It almost looks as though CBOR is defining a 65-bit, 33-bit, or
17-bit two's-complement integer format, with the most significant bit
isolated
from the others, except that the negative expression doesn't even have
the good sense to be properly sortable.  Given that and the fact that
bignums are also defined, I find this choice to be baffling.
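
To make the concern concrete, here is a sketch of the major-type-1
decoding rule (the encoded unsigned argument n stands for the value
-1 - n), showing the value that no native 64-bit signed type can
hold:

```python
INT64_MIN = -2**63

def decode_negative(n: int) -> int:
    # CBOR major type 1: unsigned argument n encodes the value -1 - n.
    return -1 - n

# With a 64-bit argument, the most negative representable value is
# -2**64, well below the minimum of a 64-bit signed integer.
most_negative = decode_negative(2**64 - 1)
print(most_negative)              # -18446744073709551616
print(most_negative < INT64_MIN)  # True
```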

Document issue: Canonicalization
Please remove Section 3.6.  Canonicalization (c14n) is hard, and the
fact that this format makes it impossible to standardize a c14n
scheme says a lot about it.  In comparison, JSON is almost trivial to
canonicalize.
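
As an illustration of how little is needed for JSON, a minimal
canonical form (sorted keys, no insignificant whitespace; a complete
scheme would also have to pin down number and string formatting) is a
one-liner:

```python
import json

def c14n(obj) -> str:
    # Minimal canonical JSON: sorted keys, no extra whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

print(c14n({"b": 1, "a": [2, 3]}))  # {"a":[2,3],"b":1}
```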

If the intent of this section is to describe some of the possible
gotchas, such as those described in the last paragraph, then that
would be good.  Changing the focus to "Canonicalization
Considerations" might help.

I believe that there are several issues that this section would still
need to consider.  For instance, the use of the types that contain
additional JSON encoding hints carry additional semantics that might
not be significant to the application protocol.

Extension based on minor values 28-30 (the "additional information" space):
...is impossible as defined.  Section 5.1 seems to imply otherwise.
I'm not sure how that would ever happen without breaking existing
parsers.  Section 5.2 actually makes this worse by making a
wishy-washy commitment to size for 28 and 29, but no commitment at all
for 30.
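
A sketch of initial-byte parsing shows why: the major type and
"additional information" share the first byte, and a parser that
encounters a reserved additional-information value cannot even
determine the item's length in order to skip it (hypothetical code,
not from the draft):

```python
def parse_initial_byte(ib: int):
    """Split a CBOR initial byte into its major type (high 3 bits)
    and additional information (low 5 bits).  Values 28-30 of the
    additional information are reserved; without a defined argument
    length for them, the item cannot even be skipped."""
    major, ai = ib >> 5, ib & 0x1F
    if 28 <= ai <= 30:
        raise ValueError("reserved additional information: %d" % ai)
    return major, ai

print(parse_initial_byte(0x18))  # (0, 24): unsigned int, 1-byte argument
```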

Nits:
Section 3.7 uses the terms "well-formed" and "valid" in a sense that I
believe to be consistent with their use in XML and XML Schema.  I
found the definition of "valid" to be a little difficult to parse;
specifically, it's not clear whether invalid is the logical inverse of
valid.

Appendix B/Table 4 has a TBD on it.  Can this be checked?

Table 4 keeps getting forward references, but it's hidden in an
appendix.  I found that frustrating as a reader because the forward
references imply that there is something important there.  And that
implication was completely right, this needs promotion.  I know why
it's hidden, but that reason just supports my earlier theses.

Section 5.1 says "An IANA registry is appropriate here.".  Why not
reference Section 7.1?

[1] https://github.com/martinthomson/aweson