CBOR WG Meeting
IETF 105 - Montreal
Tuesday, July 23, 2019, 10:00 - 11:30
Chairs: Francesca Palombini, Jim Schaad

Recordings: https://youtu.be/UxwaM20zNa4
Slides: https://datatracker.ietf.org/meeting/105/session/cbor
Note takers: Christian Amsüss

* Introduction [10'] : Chairs
Agenda bashing and WG status update
Recording: https://youtu.be/UxwaM20zNa4?t=110
Slides: https://datatracker.ietf.org/meeting/105/materials/slides-105-cbor-chairs-03

Francesca pointing out Note Well and going through the agenda.
Carsten Bormann: may need >10' for CDDL2, more like 20.
Status update: interims were had and recorded. CDDL is RFC8610 now. Charter updated. CBOR-bis has been progressed and will be discussed today. CBOR array in shepherd review.

Non WG documents:

* CDDL 2 - Collect ideas [20'] : Carsten
Recording: https://youtu.be/UxwaM20zNa4?t=294
Slides: https://datatracker.ietf.org/meeting/105/materials/slides-105-cbor-cddl-cbor-tags-02 (slides 1-16)

CDDL has been published. Done, but what next? Post-1.0 was a topic in IETF103, and items were collected in cbor-cddl-freezer. Can take things out of it, but probably don't want everything at the same time. Prioritize. Probably other things around as well, discuss on list.

Henk: Everything from freezer will be added onto the existing CDDL, not a fundamental change to syntax. Anything written in past 5 years will still keep functioning when freezer items are added.

Carsten: mental note: Pointing out that forward compatibility is important and work done using CDDL is precious. There are extension points that can be exercised w/o "new CDDL". WG could exercise that, eg. with other regex schemes (eg. YANG ppl have struggle with Open... ppl where YANG prescribes W3C syntax like CDDL, but Open... want to use POSIX (really weird!); occasionally run into that but not lots of pressure here). The .bits is little-endian, big could be defined as well (but requires knowing how big it is, so a bit more work but could be done).
Could also go to bitfields. (Bitfield example is at slide 10 as per printed numbers.) Bitfield could look like this -- say in the field that there is a particular number of bits for items in there. As ppl increasingly use CDDL to combine new stuff with old stuff in binary form, that's a good thing to have. But also see T2TRG work by Ivaylo on binary bitstream stuff in YANG -- look at this before committing to anything. All those things can be done w/o changing the language, just using extension points.

Another thing doable w/o changing the language is having alternative representations. About JSON representation: description of a possible serialization fits on a slide in CDDL notation. Could go ahead and write an (informational) document to describe expressing the AST of CDDL in JSON for interoperability. Example with three rules on slide 9 (as per printed numbers). There are many ways it could be done, that's one.

Can also put new things in the language. Cuts (for reducing set of choices) currently only work for map keys, could extend to whole map members or even further. Could have computed literals, useful for specs where there is structure in constants, or auto-advancement. Could have better literal expressions for specific tags. Regexp literals have been suggested but are not urgent. Could embed ABNF so not stuck with regexp, but use full power of ABNF.

Larger projects: Could have co-occurrence constraints, eg. if two integer items are somewhere, then one could be required to be less than the other. Currently not doable as there is no pointer/selector construct. YANG uses something like XPath for that. Depends on how far along you want to go in the validation chain in CDDL. Ppl came far w/o, but maybe want to do this.

Another large project: Modules. CDDL specs work together to create larger things. Could have modules, namespaces, import/export with URIs, and versioning of modules. Want to talk w/ YANG ppl on how they did it, trying to get it right.
"Variants": Small details where CBOR serialization is different from JSON (eg. int/string). Would be nice to have single document to describe. But there can still be variants from user, and then again there are two variants of the CDDL. Can put things atop of language (like C preprocessor, so a CDDL 1 comes out of the processor), or inside. Both possible, will need to decide. Many projects have post validation mechanism: Validation not only decides whether input is fine, but also annotates. Could go beyond that by having real default values, or adding units and other values. Could go right into the relationship to semantics and RDF. All of this needs to be priorized. Proposal is to install a WG document that serves as road map, maybe starting from a restructured freezer document. No intent to make the RFC as a document (roadmap RFCs are sth diffeent), this is only a running document but it's a WG document and the WG agrees on it. My intention is to take today's output and update freezer to get there. If there's anything you'd like to have, here's the mike. (Nobody). Oh, we're done -- new version is 1.0, nobody needs anything (new ;-) Henk breaking awkward silence: Supporting all those items b/c composed them together. If you think any of them is vital to current work, say "This is the very least I need". If have roadmap, we'll have sequence and work them off. If you find sth useless, we may strike it off, or make a sequence of it. If there's still awkward silence, we'd go to the ppl who put in the requirements. Eg. constants w/ base and addition is convenient thing for me and others ... and now queue is filling. Laurence Lundblade: Ability to express CBOR in JSON is valuable, trying to do already, already trying in EAT[?] in RATS to express claims. Carsten: That's a different thing. You talk about one spec for CBOR and JSON, that's different. What I said was about representing CDDL in JSON. LL: Variant slide? 
CB: About single spec for JSON and CBOR.
LL: Yes, that.

Jeffrey Yasskin: Ability to refer to CBOR definitions from other specs is important. Sean Leonard, before CDDL 1, discussed clearer-for-author ways to do cuts; interested if that can be accomplished in CDDL2. It's interest, not own energy to do it.
CB: Will need ppl with language experience, possibly from outside.
[...] Environment eggs? [...]
CB: Ah, import.
Yasskin: High priority.
CB: Also send important things to mailing list.

Francesca Palombini not-as-chair: About keeping this CDDL freezer updated in WG as item -- think we need to discuss that b/c there's danger this will delay things. Helpful but may delay. Option: Have a wiki instead to keep track of priorities etc. Just b/c writing and updating documents takes time.
CB: But wikis too. Advantage is that we win 6 weeks per year to make changes.
FP: Yes, but one more active document on table.
CB: Yes, and singling it out as WG document gives it status.
FP: Sure, but it just would take time to do these edits. Just think wiki is more dynamic.
LL: Just suggesting github issues with labels.
CB: Good idea and roadmap can point to them, but there needs to be information on how things fit together. Issues stand side-by-side and don't tell you relation.
FP: Good to have it in one place, and can refer to issues. But yes, discuss that as well.

Alexey: Discussing format of how to preserve this. Don't have to spend time discussing this here. Happy with chairs making unilateral decisions on this w/ consultation. Good thing about WG document is better control. Ppl find this and are more likely to come to WG. That's positive about having a WG doc. Other side was implying that CB has lots on his plate...
FP: Yes. Still relevant but doc expired. Wiki can still be official.
A: Fine with chairs making decision. If it stays a document, maybe find another editor to help out.

Jeffrey: about features.
Some more complex features make me nervous, lots of speculative "maybe we can use this here" ... but could get it wrong. Make sure there are solid use cases. Arrays came from somewhere, but actually didn't pan out that way.

McDonald: prefer roadmap doc to github issues which are annoying @@@ FP plz copy/paste

Hank: Prefer to look to documents. Can also cross-ref to github, chain. But manual process. About time, it's a mixed argument. About contributors, accessibility is important. Can track documents. Cf. side meetings: hard time tracking them b/c they live on wikis.

WG documents:

* CBOR specification status [50'] : Carsten
https://tools.ietf.org/html/draft-ietf-cbor-7049bis
Recording: https://youtu.be/UxwaM20zNa4?t=1941
Slides: https://datatracker.ietf.org/meeting/105/materials/slides-105-cbor-cddl-cbor-tags-02 (slides 17-44)

Items from last face-to-face that are not done yet; going through.

Error levels: 80% done but needs more work. More editorial changes to come up.

"strict" is a confusing concept; if any concepts are worth preserving, better give them different and more specific names. There was sth about decoders that check whether preferred encoding has been used; there was text about security merits, but they don't exist. There were others: What part of CBOR validity checking do we factor here? Need better terminology. "require valid" mode will always be hard to do for all tags, as new tags can be registered. Expectation will always be that generic decoder does some work, but some will be done by application, and application validity can only be done by application.

On tag validity, discussed structural vs semantic. Last meeting decided to move tags out to separate documents, but this sends a signal of demoting tags / undermining stability of tag part of ecosystem. In hindsight, probably don't want to do that. We don't have to. Wanted to stick w/ structural validity but say it's ultimately an explicit concept.
Make explicit that generic decoder could present tags it considers structurally invalid to the applications as such. App could then implement semantic validity checking if so desired.

Jeffrey: How's that gonna show up in the spec for the higher-level application?
CB: "Tag validity for this tag works this way".
J: So "even though it's invalid, it's valid for *this* protocol"?
CB: Yes.

Jim Schaad from floor: Can you give examples of structurally invalid tag that you can make work?
CB: Not talking about not-wellformed. For instance, some tags require array as contained element. In CBOR that's type 4. Now we have array tags. An app w/ array tags could say that "you can use this tag w/ array tags as structural component, even though original definition only said CBOR arrays". Structurally, expect meta-type 4 but what you get is tag-4-plus-byte-string. That's structurally invalid but semantically can be OK.
JS: Problem w/ applications that do this. Decoder will only do this for things it has learned about. [...]
CB: In specific interface, there already is a way to present unknown tag. Could use same interface "unknown tag" to present known tag with structural unexpectedness. Alert application to "this is not your normal", and app that may have code for new tag can also have code for old-but-unconventional tag. All that has to be written up at some point...

Another thing about tag validity from Peter [?]: Some early tags don't work properly b/c decoding is based on serialization order. Unless generic decoder already knows it and keeps serialization order available, there's no chance to decode it. Some impls always preserve order in maps (often by accident); if the generic decoder does that, it can be done in the application, but if not (and it may not), then you can't process. That's a weird thing and we normally don't want to have it, but it was expeditious there at that time. Should have text "it's not entirely forbidden, but don't do that".
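The idea of presenting a known tag with unexpected structure through the same interface as an unknown tag could be sketched roughly as below. This is only an illustration, not text from the draft: the `Tagged` wrapper and `present_tag` function are hypothetical names, tag 4 (decimal fraction, an array of exponent and mantissa) is used as the known tag, and the inner tag number 64 merely stands in for "some array tag".

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any


@dataclass(frozen=True)
class Tagged:
    """Hypothetical generic wrapper handed to the application as-is."""
    tag: int
    content: Any


def is_int(value: Any) -> bool:
    # bool is a subclass of int in Python; exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def present_tag(tag: int, content: Any) -> Any:
    """Convert tags the decoder knows; wrap everything else.

    Assumption for this sketch: the decoder only knows tag 4
    (decimal fraction), whose content is an array [exponent, mantissa].
    """
    if tag == 4:
        if isinstance(content, list) and len(content) == 2 and all(map(is_int, content)):
            exponent, mantissa = content
            return Decimal(mantissa).scaleb(exponent)
        # Known tag, structurally unexpected content: use the same path
        # as an unknown tag so the application can decide what to do.
        return Tagged(tag, content)
    # Unknown tag: present it unchanged.
    return Tagged(tag, content)


conventional = present_tag(4, [-2, 27315])
unconventional = present_tag(4, Tagged(64, b"\xfe\x03"))
```

An application that understands the unconventional form can inspect `unconventional.content` itself; one that does not can reject it, just as it would reject an unknown tag.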
Had discussion about tag validity: embedded CBOR item doesn't require anything from byte string for validity, while embedded MIME requires "valid MIME", which is complicated. But hard part is required for the easy part. Missing guidance for defining tags. Should also look into validity of tag 24. Good generic decoder validity check is well-formedness of the embedded thing as validity criterion for thing outside. Can easily be checked for being unambiguous. That's my suggestion, discussed at interim already.

Other validity checking: good idea to check, but not all will be able to. Mandatory checking might be a problem. Other meaning of "strictness" could be applied: a decoder could say it's "map-validity-checking" or not, and app developer would know that of the decoder.

New issue (previous from -04): JSON-to-CBOR conversion not normative, but normatively referenced by other specs. Fish stick / aquarium situation. (JSON lacks CBOR-level information.) Main issue is number system: JSON doesn't distinguish int/float, CBOR separates them. JSON-to-CBOR needs to make a decision on how to represent integers expressed in JSON b/c floats can exceed CBOR 64/65 bit integer range. But ppl usually do I-JSON, where the number is stuffed into a binary64. In binary64, everything beyond 53 bits is inexact, so can't know. Recommendation: two pieces of guidance. Users of pure JSON can detect integers and store them as integer (possibly bignum). Users of I-JSON put everything into a binary64, see if the absolute value is <2**53 and make it an int, otherwise stay with float. (Old text had several numbers.)

JS: Alexey, has IESG made statement on I-JSON vs JSON?
Alexey: Think not. Proposal seems to make sense.
CB: Most people are in I-JSON space, but there are others, and they can benefit from this.

Major editorial ToDos that need fixing. "follows" terminology to be removed (in favor of "encloses"? unsure).
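The I-JSON half of the number guidance above could be sketched roughly as follows. This is a minimal illustration, not text from the draft: `ijson_number_to_cbor_value` is a hypothetical helper name, and the explicit check that the value is integral is an assumption the minutes only imply by "make it an int".

```python
import json


def ijson_number_to_cbor_value(x: float):
    """Pick the CBOR representation for a number held in a binary64.

    Integral values with |x| < 2**53 become integers; everything else
    (fractions, large magnitudes, inf, NaN) stays a float.
    """
    if x.is_integer() and abs(x) < 2**53:
        return int(x)
    return x


# parse_int=float pushes every JSON number through binary64 first,
# the way an I-JSON consumer typically would.
parsed = json.loads("[1, 1.5, 9007199254740992]", parse_int=float)
values = [ijson_number_to_cbor_value(n) for n in parsed]
```

Here 9007199254740992 is 2**53, so it stays a float, while 1 becomes an integer. A pure-JSON converter would instead look at the number's text and could keep arbitrarily large integers, possibly as bignums.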
Current text says maps with an uneven number of items are invalid, but it's deeply hidden; let's make it explicit/redundant. Security considerations need finishing. Separate terms for abstract data item from encoded data item. Minor editorial issues [did not go into detail, see slide]. [skipping backup slides]

FP: It's hard to see from slides, but are all github issues covered?
CB: That was the intention.
FP: Now time for conversation. Anyone not happy w/ proposals?
JS: Back to JSON numbers. Said "need to decide" -- is that "app needs to decide" or "wg needs to decide"?
CB: App needs to. Or generic decoder implementer needs to.
CB: Plan should be to use time until Singapore to complete this, and to go through WGLC so completion can happen in Singapore.
FP: Next interim 31st of July is cancelled, next is 14th of August (?). Would be good to have timeline for update covering the remaining issues.
CB: Almost all of them can be covered so ppl can read before the interim. From that there can be questions in the interim.

CB: cbor-sequence. Would like to go ahead with this. CBOR tag definitions: array is done, and nothing came up during write-up, to go to IESG. For OID we are chartered. Time, template etc for when we are rechartered. When is that?
Alexey: Latest charter update based on chair text was done yesterday. Last person blocking rechartering cleared. Couple of tweaks, but next week or so.
JS: May say it's adopted [?]?
Alexey: Have my permission.

* Flextime [5'] + Wrap up
Recording: https://youtu.be/UxwaM20zNa4?t=3404

With 30 more minutes ... anyone to the mike?

CB: Relaying one fun bit of data from last weeks: Driving licenses. Will be on your phone in the future. Might have CBOR in it.

LL: Not sure about procedures ... discussing strict mode.
FP: Go ahead.
LL: Related to CBOR-bis issues slide. Made comments on github issue, reiterating: Trying to understand strict mode, that was confusing to get head around and figure out what as an implementer of decoder I should do.
Finally feel like having wrapped head around. Conclusion: Strict mode is portrayed as counterpart to canonical encoding. If no canonical is there, you can do strict. Strict mode addresses ambiguity in decoding. My thinking now is about variability in encoding in terms of serialization, and particularly about map ordering and duplicates in the map. Variation in serialization is not ambiguous b/c clear rules are there for what the integer value is (even though encoded differently). So that difference doesn't need to be covered by strict mode b/c there is no ambiguity. [...] With map ordering and map duplicates, that's a characteristic of valid or invalid. Don't see reason for strict mode to exist. If we can have good text for valid/invalid, no need to have strict mode any more.

CB: (slide on that? page 21). 723 is confused about this. There's a number of issues under it, and agree term should go away b/c it has no clear meaning. Two things of interest:

Entities looking at encoding CBOR items actually decode them, sometimes just compare items in total, eg in hashing. For that it's useful to have deterministic encoding / "canonical". Decoder will normally not have code to check whether deterministic was used. Could have mode that demands deterministic encoding. That's useful of a generic decoder -- of course can also fake it by re-encoding and comparing, so it doesn't need to be a necessary feature of an encoder but an efficiency thing.

Other thing: Performing some validity checks is expensive, so there may be parameters to control them. Expect code implementing MIME check to have a switch for actually validating that. There may be a flag to validate UTF-8 wellformedness. May be a flag for map validity. Map validity is interesting and different from others b/c naïve implementation might lose input values (and undefined which ones) on duplicate map keys, and that can be used in certain kinds of attacks.
That's where application can't do validation b/c it already lost information on the way. So that's one place where it's important to tell the decoder to "not lose information for me", ie. "err out rather than losing items". That's also about strict mode, but different [?]. Turns into an enumeration of flags an app might set in a decoder, which would include canonical checking and map validity and maybe others. May be more to ease app's life. Eg. my decoder has a flag that uses a different data type for strings in map keys and other strings, b/c that's useful on that platform but has no place in standard. That a good way forward?

LL: Yes. One thing about nonpreferred serialization. If you receive, you can unambiguously create preferred.
JS shaking head: If you deserialize all the way to the data model, may not be able to encode back deterministically, b/c data-model-to-CBOR-types pops up.
CB unsure: Well, first if language like Lua that doesn't distinguish btwn array and map, yes. Many environments do preserve enough.
JS: If encode date and go back, may not end up with the same string.
LL: My comment was only on serialization, which is different, right? If decoder could tell whether serialization is preferred, that is interesting. Probably some cost to it.

CB gathering more feedback.
CB: Jeffrey, what hurts you most?
Jeffrey: Changes so far have been good. Get back to whole thing to check whether rigorous enough for web standard process. Good you're fixing outstanding issues. Hope to have time to go through whole thing and double-check in next month or two.
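The map validity flag discussed above ("err out rather than losing items") could look roughly like the sketch below. It assumes a decoder that has already produced the key/value pairs of one map in serialization order; `build_map`, `DuplicateMapKey` and the `reject_duplicates` flag are hypothetical names, not part of any CBOR library or of the draft.

```python
class DuplicateMapKey(ValueError):
    """Raised instead of silently dropping one of the duplicate entries."""


def build_map(pairs, reject_duplicates=True):
    """Turn decoded (key, value) pairs into a dict.

    With reject_duplicates=False this behaves like the naive
    implementation: a later duplicate overwrites the earlier entry and
    the application never learns that information was lost.
    """
    result = {}
    for key, value in pairs:
        if reject_duplicates and key in result:
            raise DuplicateMapKey(f"duplicate map key: {key!r}")
        result[key] = value
    return result


pairs = [("role", "user"), ("role", "admin")]
naive = build_map(pairs, reject_duplicates=False)

try:
    build_map(pairs)
    rejected = False
except DuplicateMapKey:
    rejected = True
```

In the naive case only one `role` entry survives, which is exactly the situation where the application can no longer validate anything; with the flag set, the decoder reports the problem before information is lost.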