Minutes IETF101: cbor
minutes-101-cbor-00

Meeting Minutes Concise Binary Object Representation Maintenance and Extensions (cbor) WG
Date and time 2018-03-20 13:30
Title Minutes IETF101: cbor
State Active
Last updated 2018-04-05

CBOR
IETF 101 - London
Tuesday, Mar 20, 2018, 13:30-15:30
Chairs: Joe Hildebrand, Francesca Palombini
Recordings:
https://play.conf.meetecho.com/Playout/?session=IETF101-CBOR-20180320-1330 or
https://youtu.be/FrVinVcs-P0
  Minutes by Paul Hoffman, Christian Amsüss
  Nothing from the slides reproduced here

* Introduction [15'] : Chairs
  https://youtu.be/FrVinVcs-P0?t=5m55s
  Agenda bashing and status update
  CDDL had a WGLC that generated review comments to be discussed today.
  7049bis: implementation matrix updated (see wiki). Array tags got reviews;
  ready for adoption once the reviews are addressed.

* CDDL, draft-ietf-cbor-cddl-02 Presented by Carsten Bormann
   https://youtu.be/FrVinVcs-P0?t=8m32s
  - Changes since IETF 100: cuts in maps introduced for one particular
  application. From the regexp discussion there: using XSD regexps, which are
  still weird but not too much, and [something with unicode].
  - Created a "freezer" document for things that will not go into the main
  document.
  - On map matching: CDDL has a single concept of "groups" (for maps and
  arrays) that are grammars of types and describe linear languages. Linear
  properties are only used for arrays, and as maps are unordered (thus match
  anything), implementations need to be driven from the grammar and not from
  the text. That also creates some nondeterminism -- that hasn't hurt
  anywhere, but edge cases might come up.
  - Map validation (though "validation" is not a term of CDDL): a wildcard in
  matches could consume explicit matches [if the scribe understood correctly].
  For that (and only that), cuts are introduced: once a fork is matched, the
  path is committed, and if the rest won't match, the whole thing won't. This
  is only meaningful on a match. That comes with limitations [...].
  - Are there only editorial issues left, or is anything technical still open?
  Jeffrey Yasskin: Semantics of group and cuts are not specified well
    Wants to know precisely what causes a map to match. We need a precise
    specification of the matching algorithm. Carsten: This is precise as it's a
    parse expression grammar, which is greedy. It becomes a problem when
    expressed in a nondeterministic[?] finite automaton.
  Jim Schaad: Wants to change the ordering.
    Match the most specific keys first, specific before "any"
    Jeffrey: Can't say one type is more specific than other types.
    Choice types can just overlap and not be more specific than each other.
    Jim: Anything is a value > constructed type > groups (Value is more
    specific than type is more specific than any.)
  Sean Leonard: I want to express that I am opposed to introducing "cuts" into
  CDDL v1.0. Cuts transform a context-free grammar into a context-sensitive
  one. An alternative, "subsets" or "constraints", was sketched out in
  Singapore. From an editorial point, this is introducing new matter at the
  last minute when it needs to be fleshed out more over the coming months,
  i.e., after we get this version of CDDL out. (So basically it is similar to
  Jeffrey Yasskin's point: not well specified and also trying to do too much.)
    Carsten: If we want colon to be a shortcut for "^ =>", we need to do this
    now; it can't be added later. And that is the meaning most spec writers
    will probably expect of ":". The alternative is to smear the previous
    cases into the later, more generic ones. This won't make concise
    specifications.
      Jeffrey: Useful to have the ":" syntax act as a cut; not sure we need the
      cut syntax. Just the colon is maybe easier to specify.
    Henk Birkholz: Exception is the behavior of cuts
      If we don't want a simple notation, we need to decide this soon
      Likes the last example; intuitive
    Carsten: The interesting proposal here is to only have ":" and not "^".
      It's weird to have a shortcut that you can't have in long form
    Sean: (refers to slide 15) So the point is that you want to say "4 is text
    only, all other uints are anything else", right? So the slide says "* uint"
    and that means "* all uints except 4", right? What happens when later you
    want to say 5 is only a byte string? You have to put ? 5 : at the top, but
    you can't put it at the top, you are supposed to be putting it at the
    bottom...
    Jim: If you append a colon-thing to the bottom, you're going to expect
    enforcement, but it is not checked/enforced.
    Carsten: By creating a sorting mechanism, this consideration could be
    handled. I don't like the sorting mechanism b/c the spec writer can intend
    a sequence.
    Sean/Jim: Spec writer can intend a sequence, but extension writer can only
    append.
    Carsten: [...] You can give the first thing a name, and have that as an
    extension point, and have the wildcard after the named one.
    Sean: The way that "constraints" work is that you say, with appropriate
    syntax: * uint -> any FIRST. Then in the subsequent parts of the spec, you
    identify specific instances of *uint => any. For example: "when uint is 4:
    text". "When uint is 5: byte string". (This is discussed in
    draft-seantek-constrained-abnf, for ABNF.)
    Carsten: [...] There are several proposals. Some propose to have the
    shortcut but not the long form, others not to do that at all (refer to
    slide 15). Take it to the list from there.
    Francesca: Yes.
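
The cut behaviour discussed above (slide 15: "4 is text only, all other uints are anything else") can be illustrated with a small Python sketch. This is not the matching algorithm of the CDDL draft; it is a simplified model that assumes entries are tried in order, ignores occurrence indicators, and uses hypothetical names (`Entry`, `matches`) that do not come from the draft or the slides.

```python
from dataclasses import dataclass
from typing import Any, Callable

Pred = Callable[[Any], bool]


@dataclass
class Entry:
    """One map entry of a group: key matcher, value matcher, optional cut."""
    key: Pred
    value: Pred
    cut: bool = False  # True models "^ =>" (or the ":" shortcut)


def matches(entries: list[Entry], data: dict) -> bool:
    """Simplified model: every key/value pair must be accepted by some entry."""
    for k, v in data.items():
        for e in entries:
            if not e.key(k):
                continue          # this entry does not talk about this key
            if e.value(v):
                break             # pair accepted, go to the next pair
            if e.cut:
                return False      # key matched a cut entry: path is committed
        else:
            return False          # no entry accepted this pair
    return True


def is_uint(x: Any) -> bool:
    return isinstance(x, int) and not isinstance(x, bool) and x >= 0


def is_text(x: Any) -> bool:
    return isinstance(x, str)


def is_any(x: Any) -> bool:
    return True


# Models "? 4 => text, * uint => any": the wildcard can pick up key 4.
without_cut = [Entry(lambda k: k == 4, is_text), Entry(is_uint, is_any)]
# Models "? 4 ^ => text, * uint => any": key 4 is committed to text.
with_cut = [Entry(lambda k: k == 4, is_text, cut=True), Entry(is_uint, is_any)]

print(matches(without_cut, {4: 5}))  # True: wildcard consumes the pair
print(matches(with_cut, {4: 5}))     # False: the cut forbids the fallback
```

In this model the cut only has an effect once the key of a cut entry has matched, which is the sense in which it is "only meaningful on a match".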
  - Carsten: Next topic: operator precedence. Operator precedence is logical
  when it comes to groups and types. The same syntax in a map context is
  unfamiliar in a type choice after a [quantifier]. General changes in operator
  precedence would create annoyance in the form of needing more parens and
  raising syntax errors if they are missing. We can add text to explain and
  encourage a style that doesn't contain surprising cases. Comments? Room:
  silent.
    Henk: I see the necessity, but it violates the rule of not being noisy, and
    specs will be paren-laden after that, and move away from being easy to
    read and write, but I see the point.
    Paul: It's never too hard to read too many parentheses.
    Carsten: It doesn't need to be names, more common is to name the choice
    and then use that name. We can still make the recommendation w/o littering
    up specs -- but yes, we should check that.
  Carsten: Addressing Jim's review.
    @Dead code: should not lead to hard errors. A tool might still give
    warnings on that. It's generally undecidable, but often possible in
    realistic cases.
    @generics: grammar says it, text doesn't, but should say it too.
    @precedence: there were errors.
    @unwrap grammar: found copy/paste error.
    @terminology: we should make it visible that there are CBOR and CDDL
    terms, and they never mix.
  Jeffrey: CDDL spec is written as a tutorial, not a spec
    Appendix C is a good start, but it should move there before becoming an RFC.
    Carsten: A sensible proposition -- which would need half a year.
      Jeffrey: can we speed that up with a pull-request style?
    Alexey: Is this just reordering?
      Jeffrey: More. For some cases, I don't even know the algorithm Carsten
      has in mind. For other, it's clear enough that I'd be capable of writing
      the spec, but it takes time. I could sketch something in a month, but
      getting the exact words would take longer.
    Francesca: The WG said it wants this out as soon as possible
    Sean: Let's get version 1 out now and run a more formal spec on next
    version if we feel it's necessary.
      Jim: Agrees
    Joe Hildebrand: Sees a ton of value for this, but wants something sooner
    Carsten: Wants a list of the items Jeffrey does not understand
    Jeffrey: Can't currently use this in web specs
    Joe: We know that there is still work to do, we expect a -bis
    Alexey: Is comfortable with this approach
  Alexey: When can you be done?
    Carsten: Before late May. Would like to get input from Jeffrey.
  Francesca: To the WG: keep checking the doc (see github for most recent
  update). Bring leftover points/issues to the mailing list. After the update,
  we'll see if we need another WGLC.

* CBOR specification, draft-ietf-cbor-7049bis-02 Presented by Carsten Bormann
  https://youtu.be/FrVinVcs-P0?t=51m52s
  - This is about taking this to standard level: learn from the first 5 years
  but don't futz around. This is way beyond errata, but follows the definition
  of standard level (look it up).
  - Since -00: experience says that making readers infer the data model from
  the spec is a mistake. We now define a "generic data model" with extension
  points.
  - Separation of integer and floating point types (as it has been used). That
  played back into key equivalence. Now there are environments that don't
  allow that separation easily -- and we can't fix that. Needs to be considered
  when writing a model, will need a bit of general guidance.
  Joe: In JSON RFC, if you use something like an integer that is >2^53,
  you'll have problems
    Carsten: We won't have that problem
    Joe: Nevermind, not a useful idea
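
The two points above (integers beyond 2^53, and environments that cannot easily keep integers and floats apart) can be made concrete with a short Python sketch. It only illustrates general IEEE 754 double and Python dict behaviour; the helper name `survives_double` is made up for this example and nothing here is taken from the slides.

```python
def survives_double(n: int) -> bool:
    """True if n round-trips through an IEEE 754 double unchanged."""
    return int(float(n)) == n


print(survives_double(2**53))      # True
print(survives_double(2**53 + 1))  # False: not representable as a double

# Python treats 1 and 1.0 as the same dict key, so a map that uses both an
# integer key and a float key cannot be held in a plain dict without loss.
keys = {1: "integer key"}
keys[1.0] = "float key"
print(len(keys))                   # 1
```

A generic decoder in such an environment would need its own key type to keep the integer/float separation visible to the application.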
  - On canonicalization (c14n): this was problematic between the authors in
  the original spec, but there are uses for it -- let's help those people.
  Careful: There is key equivalence that can come from the application level.
  Floats are problematic too. We want to encourage generic encoder writers to
  not ignore it when users ask for c14n. To help them: provide recommendations
  (and that's all that is in the RFC). Those recommendation rules were leaky,
  and the key order is often complained about by implementors. The key order
  will change to byte-wise lexical -- but we also keep the old one in (though
  not as a recommendation) so existing specs can still reference it. Too bad,
  sorry. We will want to be more specific in float normalization; we have 3
  models, should we express a preference? Own preference: prefer the shortest
  encoding in all cases (for int, length info, strings, tag number, floating,
  bignum etc.).
    Jim: Worries about things like bignums into ints
      Worries about loss of tagging
    Carsten agrees, but has questions about whether it matters
    Jim: Yes in the crypto world completely different things
    Paul: For example a counter that is supposed to be 128bits
    Carsten: You can represent that in 64 bits if you can
    Jim: TSA signature, 2 int.
    Jeffrey: 2 examples. From FIDO: AGL found software bugs based on processors
    getting the length wrong. Geolocation extension to web authen that uses
    floating point, did not want short encoding for floating.
    Matt Miller: More in the cryptographic context: if used as a counter, it
    has the semantics of a counter but the syntax of a set of bits of
    determinate length. Catastrophic to decrypt. Proposal: if we propose
    shortest encoding, there need to be very clear considerations that you
    have to be careful
      Carsten: Example of things that need to be constant size
    Joe: We have the option of saying "don't do canonicalization of any
    protocol, use bytes"
      Carsten: We can do both. In the crypto space, we have learned it is bad,
      but in other use spaces, it could be OK
    Thiago Macieira: What happens if you decode and re-encode in a compliant
    program
      You may be making bignums in all implementations
      Does this mean that bignums become mandatory?
      Carsten: It might be an option for your decoder
        Another thing a decoder can do is *check* canonicalization.
        Does this answer the question?
      Thiago: It's an answer to the question, but I am not satisfied.
    Matt: An alternative nuclear option: point to a different document
      Sean: A separate document might be good
    Jeffrey: All the specs should specify canonical output for testing
      Also constrains encoders to what they can put out
      We don't have to state a preference, but for basic generic data model we
      can have one but not extended generic, and giving such a canonicalization
      a name is sufficient for other specs.
    Sean: To what extent are type and tag assumed to be saved across encoding
      Jim: +1. Are you canonicalizing the data model or the CBOR structure
      Carsten: Yes
        It's important for parser speed that it can discard information that is
        immaterial to the data model. Whether that includes int/bignum is up to
        question
    Joe: If something is in canonical form, and I re-generate canonical form,
    these are equal.
      Carsten: That's almost the definition of canonical.
    Paul: Suggests that we only list ideas, but no preferences
      We should give some information, "but you make your own rules, and you're
      gonna cry."
    Jeffrey: Generic canonical is a bad thing, protocol-specific
    canonicalization is important
    Matt: CBOR has a strong idea of what types are, unlike JSON. 1 more vote
    to not deal with canon in this doc
    Joe: A world with 20 CBOR canonicalizations would be much worse than where
    we are today
      ASN.1 are an anti-pattern for writing a parser
      1) Never canonicalize, it's evil
      2) Here is a canonicalization form in this doc
      3) There is a canonicalization form in another doc
    Sean: I guess the point is, to what extent is CBOR type & tag information
    supposed to be preserved when serializing between different
    implementations?
    Carsten: We can split the technical issues from the procedural issues.
    Jeffrey: The rest of the set is close to ready, but this is not, suggests
    different doc
    Alexey: Group needs to decide between "saying very little" and "strongly
    against it"
    Paul: For this doc we can say "it got took out for a reason". There might
    be a doc in the future
    Carsten: Splitting out might be the best way.
    Thiago: Also yank of equivalence of keys
      Joe: Good point, we need to do analysis first
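
To make the canonicalization discussion above more concrete, here is a minimal Python sketch of two of the rules mentioned: shortest-form encoding of unsigned integers, and the two map key orderings (byte-wise lexical versus the older rule from RFC 7049, where shorter encoded keys sort first). The function names are made up for this illustration; the sketch covers only major type 0 and assumes the keys are already encoded.

```python
import struct


def encode_uint(n: int) -> bytes:
    """Shortest-form CBOR encoding of an unsigned integer (major type 0)."""
    if not 0 <= n < 2**64:
        raise ValueError("outside the 64-bit unsigned range")
    if n < 24:
        return bytes([n])                      # value fits in the initial byte
    if n < 2**8:
        return b"\x18" + struct.pack(">B", n)  # 1 following byte
    if n < 2**16:
        return b"\x19" + struct.pack(">H", n)  # 2 following bytes
    if n < 2**32:
        return b"\x1a" + struct.pack(">I", n)  # 4 following bytes
    return b"\x1b" + struct.pack(">Q", n)      # 8 following bytes


def bytewise_order(encoded_keys: list[bytes]) -> list[bytes]:
    """Plain byte-wise lexical order of the encoded keys."""
    return sorted(encoded_keys)


def length_first_order(encoded_keys: list[bytes]) -> list[bytes]:
    """Older rule: shorter encodings first, then byte-wise lexical."""
    return sorted(encoded_keys, key=lambda k: (len(k), k))


# Encoded keys: false, "aa", 100, -1, "z", 10
keys = [bytes.fromhex(h) for h in ("f4", "626161", "1864", "20", "617a", "0a")]

print([k.hex() for k in bytewise_order(keys)])
# ['0a', '1864', '20', '617a', '626161', 'f4']
print([k.hex() for k in length_first_order(keys)])
# ['0a', '20', 'f4', '1864', '617a', '626161']
```

The sketch also shows the concern raised about fixed-width values: `encode_uint(1)` is one byte while `encode_uint(2**32)` is nine, so a counter encoded this way changes its size on the wire as it grows.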
  - implementation matrix
  Jeffrey: There should be a better way to determine consensus to accept PR
  submitted to github
    Pull requests should be discussed on the list sooner
    Francesca: Reminder: important/big PR should go to the list
    Joe: We can be more aggressive on getting them included when we think the
    consensus has been reached
    Paul: Maybe wait three weeks after the end of discussion in the mailing
    list to include them
  Jeffrey: Chrome has two implementations that will get added

* Array Tag, draft-jroatch-cbor-tags-07, Presented by Carsten Bormann
  https://youtu.be/FrVinVcs-P0?t=1h35m56s
  2-byte space or 3-byte space
    Paul: 2-byte is fine
    Jim: 3-byte is fine and we might end up regretting
    Sean: Weak +1 for 3-byte
  Other registration in IANA overlaps
    Alexey: We cannot stop them but maybe we can convince them
    Zach Shelby: Did something in CORE
      Also has a fast track
      Maybe have a separation of rules
  Will be adopted in the WG
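
Assuming that "2-byte space" and "3-byte space" above refer to the encoded size of the tag head (initial byte plus the tag number), the ranges can be sketched in Python as follows. The function name `tag_head_length` is hypothetical, and the reading of the two terms is the editor's assumption rather than something stated in the minutes.

```python
def tag_head_length(tag: int) -> int:
    """Number of bytes CBOR needs to encode the head of a tag (major type 6)."""
    if not 0 <= tag < 2**64:
        raise ValueError("tag number outside the 64-bit unsigned range")
    if tag < 24:
        return 1   # tag number fits in the initial byte
    if tag < 2**8:
        return 2   # initial byte + 1 byte
    if tag < 2**16:
        return 3   # initial byte + 2 bytes
    if tag < 2**32:
        return 5   # initial byte + 4 bytes
    return 9       # initial byte + 8 bytes


print(tag_head_length(255))  # 2
print(tag_head_length(256))  # 3
```

Under this reading, a block of tags registered in 24..255 costs one byte less per tagged array than a block in 256..65535, but draws on a much smaller number range.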

* Wrap-up: Chairs
  https://youtu.be/FrVinVcs-P0?t=1h53m38s
  Joe: will be stepping down as CBOR chair