Last Call Review of draft-ietf-cellar-ebml-09
review-ietf-cellar-ebml-09-genart-lc-sparks-2019-02-20-00

Request Review of draft-ietf-cellar-ebml
Requested rev. no specific revision (document currently at 14)
Type Last Call Review
Team General Area Review Team (Gen-ART) (genart)
Deadline 2019-02-28
Requested 2019-01-29
Requested by Michael Richardson
Authors Steve Lhomme, Dave Rice, Moritz Bunkus
Draft last updated 2019-02-20
Completed reviews Secdir Last Call review of -09 by Valery Smyslov (diff)
Genart Last Call review of -09 by Robert Sparks (diff)
Genart Last Call review of -13 by Robert Sparks (diff)
Secdir Last Call review of -13 by Valery Smyslov (diff)
Opsdir Last Call review of -13 by Shwetha Bhandari (diff)
Genart Telechat review of -14 by Robert Sparks
Assignment Reviewer Robert Sparks
State Completed
Review review-ietf-cellar-ebml-09-genart-lc-sparks-2019-02-20
Reviewed rev. 09 (document currently at 14)
Review result Not Ready
Review completed: 2019-02-20

Review
review-ietf-cellar-ebml-09-genart-lc-sparks-2019-02-20

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at

<https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.

Document: draft-ietf-cellar-ebml-09
Reviewer: Robert Sparks
Review Date: 2019-02-20
IETF LC End Date: None
IESG Telechat date: Not scheduled for a telechat

Summary: Not ready for publication as a Proposed Standard

Note that this is really an early review - IETF LC has not yet been issued for this draft.

Issues:

It's not clear that this group is chartered to produce a general purpose
binary equivalent to XML. Instead, it appears to be chartered to document FFV1
and Matroska. EBML as it is currently used for those things needs to be
documented, but rather than try to make it into a format that other things
besides the work of this group appears out of scope. If I'm correct, then this
document shouldn't need to create an IANA registry - it need only document
what the group needs (and if the group needs more later, it can update this
document). The abstract and introduction would need to be adjusted to scope
the purpose of the format to supporting the work of this group. My review assumes
a scope of "documenting these existing formats" rather than providing a general
purpose markup language. If I'm wrong and this group is chartered to produce an
alternative for other protocols to use, this needs review from people who are
more expert in that kind of representational design than me.

The document is very hard to absorb given its current structure. It requires
multi-pass reading to figure out. While I like the idea of leaping directly
into security considerations, the draft really needs an introduction that lays
out the high-level structural concepts more linearly. A diagram of the
inherent tree structure would help reinforce the definitions. Showing the
defined types (13.1.5.9) in that diagram would be useful. A shorter
introduction to VINT (not the details of its structure, but why you have such
a construct in the first place) would also help. Also consider examples
showing the end of Unknown-Sized elements is determined (The description in
the first partial paragraph at the top of page 12 is complicated). It would
also help to introduce the idea of null terminated strings here to smooth the
forward reference from section 8.4 to section 9. CRC-32 elements aren't
defined until towards the end of the document, but non-trivial things are done
with them throughout. Having those introduced, at least conceptually, much
earlier would be good.

The third paragraph of the Security Considerations section (including the 
bullets) appears to say its ok to treat the following invalid things as valid.
If that's the case, why are they invalid in the first place? I think for some
of them, perhaps text has drifted and they are no longer invalid, but just
to be avoided when it is reasonable to do so? For those that really are invalid,
some text should be added describing _why_ they are ok to accept.

In 5.4 you claim the Size column in the first table refers to the size of the 
"VINT_DATA" in bits. The values there are not how many bits of VINT_DATA you have.
If that's what you really want to show, the values should be 7, 14, etc. rather
that 2^7, 2^14. If you mean for the column to show how large an integer can be 
represented by a VINT with this octet length, the values need to be 2^7-1,
2^14-1, etc. It would be helpful to reduce the width of the first columns so that
the last value in the Representation column does not wrap.

The last sentence of the 2nd paragraph of section 7 does not parse. Please
simplify it.

There is odd notation at the end of page 12. What does 2^(2_7) mean?

At section 8, the second sentence scans very badly. Consider breaking it into
two or more sentences so that there is no ambiguity around the concept being
defined. As written, its too easy to misunderstand that the element type
defines a concept that describes definition.

At section 11, you say that EBML Documents MUST NOT contain any data that is
not part of an EBML Element, but I think other sections allow that to happen
in data rewrite situations?

Why is the unique and persistent requirement SHOULD and not MUST at the top
of page 20?

At the definition of name (13.1.5.1), please talk about why this particular set
of characters were chosen. 

Please expand on "the risk of false positives" in 13.1.5.3. The risk here is not
obvious from the current text.

At 13.1.5.6.1 (a section nesting 5 deep is a document structure warning sign 
by itself), in the first bullet I suggest you say "are of the type of the element"
instead of "are integers or floats". What you have now lets me put an integer on
one side and a float on the other.

The range representations in 13.1.5.6.1 don't allow representing 0<=x<1, 0<x<1,
or any other range with one or more open ends. Why isn't that problematic?

The element descriptions in section 13 have range values that don't conform to
the range represnations in 13.1.5.6.1. See 13.2.5 for example (where the range
is "not 0").

The last sentence of section 13.1.12 looks like a security problem. Why would
you ever use _any_ elements containing a CRC-32 element that's bad? Some discussion
near the top of the document about what these CRC-32 elements are supposed to
accomplish seems necessary. More needs to be said about the similar security issues
at section 14.

At 13.2.9, what does "read upper values" mean?

Nits:

The definition of Master Element is not a definition - it describes one
propery Master Elements have. AFAICT, the concept isn't concisely defined
anywhere in the document.

The definition of Void Element doesn't need to say "damaged data". It
coule be perfectly good data that you are overwriting.

I suggest removing the word "official" from the Element Name definition.

The concept of an Identically Recurring Element is not clear throughout
the document. Some section should give a complete definition, and the
places that call out whether exceptions (such as where the security 
considerations discusses Identically Recurring Elements with invalid 
CRC-32 elements) are still Identically Recurring Elements or not.
What happens if the values in the invalid CRC-32 elements are different
from each other?

At section 8.7, second paragraph, first sentence. The sentence is long, and "equals to" looks like a problem spot for non-native-English readers. Perhaps "set to the same value as"?

Micronits:

Page 9: s/four octet./four octets./

Expanding the numbers in section 8.1 and 8.2 isn't really helpful. It would
be better to just say 2^64-1.