===========================================================
CBOR WG Meeting
IETF 104 - Prague
Thursday, Mar 28, 2019, 09:00 - 10:30
Chairs: Francesca Palombini, Jim Schaad
===========================================================

Note Takers: Jaime Jimenez, Christian Amsuess
Jabber: Francesca

Slides:
* https://datatracker.ietf.org/meeting/104/materials/slides-104-cbor-chairs-02.pdf
* https://datatracker.ietf.org/meeting/104/materials/slides-104-cbor-consolidated-01.pdf

Recording: https://www.youtube.com/watch?v=L8lk8hEEax8

===========================================================
Detailed minutes
===========================================================

* Introduction [10'] : Chairs
  Agenda bashing and WG status update

Francesca Palombini [FP] updating on the working group status, welcoming
new co-chair Jim Schaad. Update on documents:
* CBOR will recharter.
* 7049bis is being updated on GitHub.
* CBOR Array Tag WGLC has ended.
* CDDL is in the RFC Editor queue -- congrats!

WG documents:

* CDDL [5'] : Chairs
  https://tools.ietf.org/html/draft-ietf-cbor-cddl

Carsten Bormann [CB] presenting "What's up next?": a peek at post-1.0.
Please message the list if you want something that's not on the post-1.0
list, or if you are bothered by something that is.

Pronunciation? "C D D L" or "cuddle" -- the WG should have an opinion.
Compare to "SCSI": do we want it pronounceable? Do we want "cuddle"?
(SCSI used to be pronounced "sexy".) What do you say? Few hands for
"cuddle", many for "C D D L". We don't have to have a single one --
compare "Object ID" vs. "oid". Maybe leave it open, maybe keep it in
mind, form an opinion and send it to the list.

"CDDL doctors"? ("cuddle doctors"? "cuddle fairies"?) CDDL should be
sufficiently accessible that we shouldn't need them, and we should make a
point of them not being needed. But tutoring and coaching would be good.
https://cddl.space is out there, looking for tutors.

Jeffrey Yasskin: It's pretty close to sufficiently accessible, but some
of the matching is not completely obvious, so it would be good to have
tutors.
CB: Even self-explanatory things don't always arrive at the recipient.
E.g. yesterday: how do I write multimaps? Putting that somewhere would
help. If you want to be on that page with advice or your name, please
contact me.

That's all on CDDL, now for cbor-bis.

* CBOR specification status [40'] : Carsten
  https://tools.ietf.org/html/draft-ietf-cbor-7049bis

Advance over the last month: the distinction between error levels (CBOR
syntax, semantics, and application level) is now pretty well understood.
"Not well-formed" usually leads to not continuing; code for the check is
in the appendix. "Not valid" data can still be passed to the application,
but might look strange to it; what the application can/should do could be
discussed. "Unexpected" errors are about application expectations;
CDDL/YANG could come in here (but it is also possible to define the
expectations in text). It's useful to talk about this here, e.g. for the
integer-vs-float discussion: the expectation was to integrate the number
spaces as in JSON, which didn't work well for applications. [...] Some
applications still throw them together, and might then produce validity
problems with tags that expect a float. These levels were not understood
in the original CBOR spec; now they are better understood.

#17 (well-formed/valid) is now mostly integrated. What is still missing
is redundancy reduction (an editorial ToDo; the technical part is
resolved).

ToDo: "strict". What's there comes from before preferred encoding (where
editorial changes are still open). There is application content to this;
one can't be strict on the generic level.
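(From notetaker: a rough Python sketch of the three error levels above,
for illustration only; the exception names and the tiny subset of CBOR
handled are mine, not from the draft or the slides. A truncated or
reserved head is a well-formedness error, invalid UTF-8 in a text string
is a validity error, and a type the application did not ask for is an
"unexpected" error.)

    class NotWellFormed(Exception): pass  # CBOR syntax error: stop decoding
    class NotValid(Exception): pass       # well-formed, but breaks a validity rule
    class Unexpected(Exception): pass     # valid CBOR the application cannot use

    def decode_head(buf, offset=0):
        # Decode one initial byte plus its argument (definite lengths only).
        # Returns (major_type, argument, offset_after_head).
        if offset >= len(buf):
            raise NotWellFormed("truncated input")
        ib = buf[offset]
        mt, ai = ib >> 5, ib & 0x1f
        if ai < 24:
            return mt, ai, offset + 1
        if ai in (28, 29, 30):
            raise NotWellFormed("reserved additional information %d" % ai)
        if ai == 31:
            raise NotImplementedError("indefinite lengths left out of this sketch")
        size = 1 << (ai - 24)             # 1, 2, 4, or 8 argument bytes
        if offset + 1 + size > len(buf):
            raise NotWellFormed("truncated argument")
        arg = int.from_bytes(buf[offset + 1:offset + 1 + size], "big")
        return mt, arg, offset + 1 + size

    def application_expects_text_string(buf):
        # An application-level expectation layered on the generic head decoder.
        mt, length, off = decode_head(buf)
        if mt != 3:
            raise Unexpected("expected a text string, got major type %d" % mt)
        if off + length > len(buf):
            raise NotWellFormed("truncated string payload")
        try:
            return buf[off:off + length].decode("utf-8")
        except UnicodeDecodeError:
            raise NotValid("text string is not valid UTF-8")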
A decoder that checks deterministic encoding has the same issue: a
generic one can't check the application components of this. (Is there a
better word for "deterministic-checking decoder"?)

Jeffrey Yasskin (JY): It was suggested by Laurence to remove the strict
decoder. The use for it is that an application can demand a strict
decoder, but that does not say enough. Rejecting invalid input can be
sufficient for applications.
CB: I don't have the text in front of me, but I think it was the
preferred decoder.
JY: The strict decoder currently does not require preferred encoding, but
an application can say that it rejects non-preferred encoding. We could
even define strict mode just as that.
CB: Maybe we should have this as a single paragraph somewhere, saying
"decoders might be deterministic-checking, etc."
Laurence Lundblade (LL): Strict mode talks about deterministic encoding,
but talks of rejecting invalid CBOR too. I looked into bis and found 18
instances of things that could go wrong, like a string in an epoch date
(easy) and UTF-8 checks (hard). Checking for invalid MIME is expensive to
compute, and could be highly variable. That notion of "valid" / "invalid"
seems problematic; strict mode is oriented around validity. It also talks
about firewall/security stuff.
CB: We have done that already, that is fixed (firewall/security).
LL: Still advocating that strict mode go away. Checking some things for
validity and not others is not a mode. Don't expect the decoder to check
for all of them, because some are so expensive, and it makes so much more
sense to do it higher up. Not sure about deterministic.
CB: One thing a strict mode decoder could do -- and I think this is the
only validity that happens outside of tags -- is the duplication of keys
in maps. That might depend on the application, and it is something that a
generic decoder might do up to a certain extent.
LL: UTF-8 checking is also outside of tags and about validity.
JY: "Generic decoders" and "protocol decoders". Tag validity checks are
not suitable for generic decoders, for LL's reasons. Protocol decoders
are more limited in their scope, and they can use "must not be invalid"
terminology.
CB: That is a great conclusion for the next 5 slides I have. Most
difficult pull request: #18, tag validity, from JY. The clarifications
are good, but before we continue, think about 7049's validity
requirements. Validity requirements in tag definitions depend on the
"state of the universe" when the tag was defined, and that changes.
Tag 1, for example, takes numbers; at the time we defined it we didn't
allow a tagged decimal in it, and came up with 1001 to fix that. That
sounds pretty onerous. (Or tag 36 for binary messages.) Recurring theme:
we don't get tag validity right the first time, between the intended
meaning and the written text. Two models:
* "reactionary tag validity": write things down, lay out the
  substructures, interpret what the founders meant a long time later. A
  new substructure can never go into an old one. Plus side: never any
  ambiguity, but as JY said, tag validity is an application topic.
* "progressive tag validity": abstract semantics in the definition. That
  rules out some things (a byte string is not in the real numbers), but
  later tags can come in. Later tags can claim to be usable inside
  earlier tags. We can still miss things (forget to say that a new tag
  can go in an old one), but we can do more.
JY: I really like the progressive option. To decide whether it works we
have to think about what we use tags for. In any particular protocol we
can define the structure of the data (a byte string as a byte array) and
not tag it, and it can still be parsed. I think the benefit of the tag is
for human-readable content.
Things might not be able to use data with new tags until they're updated.
As long as we are always defining the structures in the specs, those
debuggers will keep up. You have to say what it is able to contain for
the purposes of the protocol.
CB: Scary aspect here: when an application gets composed, a generic
decoder is used. If I have to wait for the generic decoder to accept a
new tag combination, that's bad. If it's the job of the generic decoder
to deliver an unknown combination to the application as-is, we won't have
a deployability problem.
Sean Leonard (SL): I also agree with the progressive tag validity option
for similar reasons, also treating cases where the generic decoder is
capable of sending some of the information up to the application. For me,
I would rather use one tag for my message than two, or than eventually
having to support a large quantity of tags.
Brendan Moran (BM): Progressive tag validity is begging for many
compatibility failures. Devices will not be updated, especially in IoT.
Take tag 1 accepting any number in R while a firmware update takes a
timestamp for validity: the data structure that delivers the updates can
run into errors. Great idea semantically, but practically problematic.
CB: Right now, when you get the timestamps in SUIT, do they say anything
about what they are?
BM: They are untagged for that very reason.
CB: You did the right thing; the application protocol made the right
decision.
BM: If I have to make the decisions, then the tags are not helping (as
opposed to the spec).
CB: Two reasons for tags: one is diagnostic debugging (enabled by
self-describing structures), and the other is if you want to use them in
a protocol as an extension point.
BM: I understand the goal and the approach, but I also see that in a
world where devices are not updated frequently, this will cause a
critical failure.
CB: Only if the application developer didn't do their job.
JY: You can expect app devs to not do their job. Tags are not for
protocol designers; they don't help with defining what data your app must
accept. I don't fully understand the use case about the extensibility
point, but we don't need to go into that in this meeting. Regarding "use
for diagnostic tools": send data over a protocol that expects integers,
and the diagnostic tool renders any number as a time stamp. The
diagnostic tool might hide the application breakage here. Not sure
whether the downsides convince me against progressive typing, but we
should discuss this further.
CB: I think it is a sharp instrument, it needs to be handled with care.
LL: One thing I saw tagging as useful for was knowing how a generic codec
may map to and from a native representation. A generic decoder can
express a date as a platform date, and that makes generic decoders easier
to use.
CB: Let me give an example. One case is when I have a place in my
protocol where I can accept any integer. If I need more than 64 bits I
use tags 2 and 3, otherwise major types 0 and 1. In this case the tag is
part of the set of CBOR types that creates the space of integers. Then
there are places in YANG-CBOR where we use tags to identify different
pieces of information, where you tag them as a union of enumerations.
That is an application function that is all about tagging things, but it
is the application, not the decoder. In the first case a generic decoder
could handle it; in the second (YANG-CBOR) it is the application.
Jim Schaad (JS): The opposite side of this is that a generic encoder
becomes a pain to think about. The application needs to tell it that it
is allowed to use this set of things but not that set of things.
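(From notetaker: a minimal Python sketch of CB's "any integer" example
above, for illustration only; the helper names are mine. Values that fit
in 64 bits use major type 0 or 1 with the preferred (shortest) head;
anything larger is a byte string wrapped in tag 2 (unsigned bignum) or
tag 3 (negative bignum), so the tag becomes part of the integer space the
protocol accepts.)

    def encode_head(major_type, argument):
        # Initial byte plus argument, in preferred (shortest) form.
        if argument < 24:
            return bytes([(major_type << 5) | argument])
        for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
            if argument < (1 << (8 * size)):
                return bytes([(major_type << 5) | ai]) + argument.to_bytes(size, "big")
        raise ValueError("argument does not fit in 64 bits")

    def encode_any_integer(n):
        value = n if n >= 0 else -1 - n           # major type 1 carries -1 - n
        if value < (1 << 64):
            return encode_head(0 if n >= 0 else 1, value)
        # Beyond 64 bits: tag 2/3 around a byte string holding the magnitude.
        payload = value.to_bytes((value.bit_length() + 7) // 8, "big")
        return (encode_head(6, 2 if n >= 0 else 3)  # tag head
                + encode_head(2, len(payload))      # byte string head
                + payload)

For example, encode_any_integer(2**64) yields 0xC2 0x49 followed by the
nine magnitude bytes, i.e. the tag 2 bignum form.)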
CB: We have that problem with floating point; it is painful.
AK: A couple of questions. I suppose the idea is that you can use one or
the other, right? (reactionary vs. progressive)
CB: (going to slide 15) Two sides: what do we do about the tags we have?
We can enable progressive validity for new ones, but we have old ones
that are limited. Do we want to open them up? In the end the application
makes the call, but as JS said, the interface to the generic encoder now
becomes more complicated.
AK: OK, another question on the semantic interoperability: who gets to
choose whether the abstract semantics match?
CB: A string is not a number.
AK: So are there existing rules as to which values are compatible and
which are not?
CB: We don't have rules; when you define a tag you say what can be in
there.
AK: That'd be the primitive types.
CB: By defining it at the semantic level.
AK: The question was, if I define a new tag, how does it fulfil the
actual semantics? May need to have a closer look.
CB: The weird part is that the progressive approach creates something
like a type system. A new tag definition should document what it expects
to have inside and what it exposes to the outside ("say it's in the real
numbers"). We don't have a formal system for that; not sure whether we
have to invent it right on the spot.
AK: Yes. Maybe defining that is the long-term goal.
CB: We may want to do that at some point, maybe even have CDDL learn
semantics at some point.
JY: The data model section anticipates the tags. I am fine with the idea
of doing it ad hoc for a while, even if we get it wrong for a while.
CB: If we go all-progressive, we'd allow the application to define its
expectations within tags (slide 15). Reclaiming 36 would be the extreme
case; that creates some instability. Ways forward: I don't really want to
do the upper half of slide 16 (although we want to apply PR18 as the
"default"), and I'd like to explore progressive tag validity. Saying "we
do ad hoc for a while" is kind of weird in an Internet Standard, but
applications have been doing this already; it would rather make explicit
what people have been doing. I'd be comfortable with that; the IESG might
not.
JY: I think progressive tag validity is pretty experimental and it should
be in a Proposed Standard RFC instead of the Internet Standard.
CB: OK, certainly one way of handling this. It goes against "batteries
included"; some tags are really needed.
JY: Yes, for tags 2 and 3 perhaps we can say they use the reactionary
style, but tag 1 definitely needs to move to the other document. I get
your point but maybe we should do it anyway.
CB: Sounds like a way forward.
Sean Leonard (SL): I just have a question about the bignum type: can
someone explain why someone would want to extend that?
CB: I don't think there's controversy here.
SL: That means no one sees a need to extend that in the future, then. On
my read of 7049, the reactionary approach didn't seem so apparent.
CB: Some tags are really locked down, some are so experimental we got
them wrong. I think it's a good idea too; with COSE, that was split up
between standard and proposed. Only the security area can do normative
things in informational docs. Not the outcome I expected, but we should
try that.
Alexey (AD): Not entirely true.
FP: Continue the discussion on the list?
CB: Yes.
FP: With a proposal to start with?
CB: I think the next step would be to get a team together, do this bit,
and see if we are happy with what we have. It is much better to do this
in the concrete, when it is obvious what we want to do.
CB: Other ToDos (slide 18): already talked about "strict"; the IANA
considerations are to be updated.
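(From notetaker: continuing the sketches above -- reusing decode_head and
the exception classes -- this is one way a generic decoder could hand a
tag it does not know to the application as-is, which was the
deployability point in the progressive tag validity discussion. The
subset of CBOR and all names are mine, for illustration only.)

    def decode_item(buf, offset=0):
        # Generic decoder for a small subset of CBOR (unsigned integers,
        # byte strings, text strings, tags). An unknown tag number is not
        # an error: the tag is handed up as a ("tag", number, content)
        # triple and the application decides what to do with it.
        mt, arg, off = decode_head(buf, offset)
        if mt == 0:                               # unsigned integer
            return arg, off
        if mt in (2, 3):                          # byte / text string
            if off + arg > len(buf):
                raise NotWellFormed("truncated string payload")
            chunk = buf[off:off + arg]
            if mt == 2:
                return chunk, off + arg
            try:
                return chunk.decode("utf-8"), off + arg
            except UnicodeDecodeError:
                raise NotValid("text string is not valid UTF-8")
        if mt == 6:                               # tag: decode content, pass through
            content, off = decode_item(buf, off)
            return ("tag", arg, content), off
        raise NotImplementedError("major type %d left out of this sketch" % mt)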
(Btw: from related specs there is one tag already in the spec-required
space.)
Reviews, WGLC? No WGLC yet: first split the document, then a round of
reviews, then WGLC.
FP: Can you give us an indicative timeline that you estimate?
CB: Should have the split within April.
FP: In three days, then?
CB: More like April 10th.
FP: OK.
LL: About -bis. On syntax errors and well-formed...
CB: That's deleted by #17.
LL: I was looking at the GitHub HEAD.
CB: It is not up to date yet; I want to check text that's not yet
perfect, still going through the large deletion commit. This is WIP.
LL: So one thing is secure coding / secure implementation / security
considerations for strict mode. How do we want to approach this? I'd like
to see that defensive decoding is in the security considerations and
nowhere else in the document, and be pretty sharp, to the point where
CBOR decoders are not to be expected to rely on any outside security; it
should say that.
CB: Yes.
LL: Strict mode talks about firewalls...
CB: No longer. That should be in the GitHub HEAD. The point is that
strict is cut down anyway; the promises in the security considerations
should be very small.
LL: Right, probably zero. That was the main thing I wanted to make sure
of. That's it.
CB: Someone with some hours free could look through appendix C and review
whether it's defensive enough. It's pseudocode, but still.
LL: I wasn't worried about that. I do have one more thing, which is in
section 4.7: MUST is sometimes used to refer to compliance with the whole
protocol and other times to define stricter support.
CB: The problem with those MUSTs (I think there are 2 left in section 4)
is that one is a MUST for the application, which is slightly weird,
because we do that nowhere else. We have some implicit MUSTs, but that's
called out explicitly again in section 4.
LL: That's the qualified one (if you do it for serialization then you
must do this).
(chairs bringing up section 4)
LL: In section 4.10 there are several MUSTs in deterministic encoding.
I'd call them qualified / conditional MUSTs. But there's a "no protocol
MUST rely on tag ordering", which should not be qualified but is absolute
for CBOR. There are two kinds of MUSTs, and that is confusing.
CB: Maybe putting deterministic encoding in the section that is about
integrating CBOR into a protocol wasn't so smart. We are getting rid of
4.11; we want to structure the text slightly differently.
JY: Section 4.10 gives the protocol a piece of terminology to define
itself. It is in a reasonable place, I think, but if it's confusing
people it's not. Maybe move it to its own section, "what authors can use
to define their protocols", with terms for other specs.
CB: 4.9 and 4.10 could go into their own section, even in front of
section 4.
FP: Would that answer your question, LL?
LL: Yes. That means all MUSTs in section 4 would be not
qualified/conditional. There would be text in the intro that says some
points here are MUSTs and some are not.
CB: Other CBOR housekeeping: bormann-cbor-sequence, patterned after
RFC 7464. JSON has that type of sequence; this document mainly tries to
register the missing media type and content format. Comparing the
documents shows that CBOR sequences is shorter; that's because CBOR is
easier. The JSON sequence spec tries to leverage the record separator to
provide error recovery, and I neither know how to do that nor think it's
needed: channels already have integrity protection, and if it breaks it
breaks. I published that and people gave positive feedback; they want to
have it as a normative reference.
Would like to accelerate this; it's not in the charter, so plan A would
be to find an AD sponsor, and plan B would be the rechartering session
later.
Alexey (AD): Which priority would this have compared to the work in the
group?
CB: It's a small number of use cases and a trivial amount of work, so
it's balanced.
Alexey (AD): ... ok ...
SL: I read the draft, think it's fine, no problem with the WG adopting
it. Still curious about use cases. The reasons for JSON sequences were
streaming; curious what the industry demand for CBOR sequences is.
CB: The one document that was defined before was SenML; it has a weird
definition for streaming. With CBOR sequences it could have been done
right, and other documents could also benefit. John Mattsson had one
example for using that. It is just a natural thing: when you have an
environment where you have to send a record of data and you have the
length, it is not necessary to bind the records together with an array.
CB: Who has read it? JS, AK (+ Sean Leonard).
SL: There is something you might want to think about. In JSON sequences
they show the record delimiter, without having to show where one sequence
ends. For PDUs that have a known length you could skip some of the code.
You cannot do that with CBOR.
CB: CBOR does not use delimiters, period.
SL: Neither does JSON.
CB: But JSON does not use the entire character set. The answer to your
requirement is there: the elements of the CBOR records go in byte
strings. You can get ASN.1 beauty by packaging a data item in a byte
string.
SL: I can think of a few ways to do it, but it would be interesting to
have a mechanism to skip large quantities of CBOR data.
CB: If you need something skippable, that's the design pattern: put a
byte string around it.
SL: Possibly tagged?
CB: Possibly, but in streaming it's not necessary, because that's just
streaming.
SL: Would it be the tag for CBOR-encapsulated data?
JY: Should this document suggest or require that pattern?
CB: It could suggest it -- great idea -- not require it.
FP: We'll continue to talk about this document and its place in the
working group in the charter discussion later.

* CBOR Array Tag [10'] : Carsten
  https://tools.ietf.org/html/draft-ietf-cbor-array-tags

The array draft completed WGLC; good editorial comments were received.
(Recapping what it does / is.) Work needed: editorials in the tracker,
more use cases for homogeneous and clamped (the current text for clamped
is easy to misunderstand: it does not say anything about the history of
the data but about how it should be handled; that's very specific to the
JavaScript environment this work came from). Discussion around
deterministic encoding of BE vs. LE: the conclusion is that precision and
byte order are an application thing, and there is no deterministic
encoding at the generic CBOR level.
LL: On BE vs. LE, you could imagine a generic encoder would have to
support both. The point of choosing one is to avoid the need to implement
both.
CB: Yes. This is out of character with the rest of CBOR, and the nice
thing about it is that CBOR is stable enough to survive it. We can do
that because the work we're interfacing with has different principles,
and typed arrays come in that form.
LL: Let me ask again: why not LE, and say that's it? Why not mandate one
to prevent ambiguity?
CB: Because the referenced work allows both. If you build something
informed by two tracks of colliding standards, how do you handle that?
You can make one group force the other, or find something that breaks
neither, and this will not break Khronos and JavaScript.
LL: OK.
CB: Jonathan convinced me that this is the right thing to do.
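(From notetaker: a small Python sketch of what a typed-array tag carries,
reusing encode_head from the earlier sketch; the tag content is just a
byte string of fixed-width elements in the byte order the tag announces,
typically the encoder's native order. The uint32 tag numbers below (66
big-endian, 70 little-endian) are as requested in the draft; check the
draft / IANA registry before relying on them.)

    import struct

    TAG_UINT32_BE = 66   # uint32 typed array, big endian (per the draft's request)
    TAG_UINT32_LE = 70   # uint32 typed array, little endian (per the draft's request)

    def encode_uint32_typed_array(values, little_endian=True):
        # The tag content is one byte string of fixed-width elements; the
        # byte order is announced by the tag number, not negotiated.
        fmt = ("<" if little_endian else ">") + "%dI" % len(values)
        payload = struct.pack(fmt, *values)
        tag = TAG_UINT32_LE if little_endian else TAG_UINT32_BE
        return encode_head(6, tag) + encode_head(2, len(payload)) + payload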
Other:

FP: We are going to schedule interim meetings and will continue with the
calls every two weeks, coordinating with ACE, CORE and COSE.
CB: In the interim period between 103 and 104 we had regular CoRE and
CBOR calls on Wednesday but at different times; that was confusing.
FP: Make sure that you enter your preference.
AK: Can we coordinate with OMA and OneDM? Not at the IETF level, but to
avoid their slots. They'd also be happy to adjust if they get sufficient
heads-up.
FP: Yes. It will still be Wednesday, but we can fix the time slot.
AK: I started collecting data on the IoT Directorate wiki about who meets
when.
Jaime Jimenez (JJ): Is now the right place to ask for Jitsi instead of
WebEx for those meetings?
FP: With WebEx we can record, but we will look into it.
(From notetaker:
https://jitsi.org/live-streaming-and-recording-a-jitsi-conference/ )

* Next step for the WG (charter discussion) [15'] : Chairs / all

FP: Some documents are now on the back burner (missing on the slides:
coordinate reference systems, and Jim's CMS content types for CBOR);
rechartering.
CB: The last one is interesting because it's not something we do to
interface with CMS, but something CMS does to interface with CBOR.
JS: I put it in to notify people here, not to run it in this group.
FP: So now we're aware.
FP: We talked about the sequence draft. Unchartered and not adopted:
OID/OIDs+ (may be taken out of that).
CB: We need to sort this into buckets. CDDL 2.0 is about the evolution of
one of our core specs: not maintenance or housekeeping, but evolution
based on user requirements that are now understood. That's one bucket
with a single item (maybe also the time tag). Another bucket:
housekeeping (like sequence), and tag definitions of different qualities
(some housekeeping, some motivated by applications within the IETF, e.g.
template). Template came out of LPWAN; they are now thinking about data
models. That bucket of tags comes from inside the IETF. The time tag is
in a different bucket that's about things that need to be done as part of
the ecosystem. With most tags being split out, we can think about
restructuring. The third area of tags is registrations from outside, like
the error tag or geographic coordinates. That is mainly the domain of the
designated experts talking with applicants, but if a tag with wider
applicability comes out, that should be documented in an RFC; I don't
know whether as a WG doc or independently. We should think about that
bucket; those will come in at increasing speed.
FP: Alexey, AD opinions?
AM (AD): Trying to come up with one; I was checking the charter.
FP: CBOR-bis, CDDL, the array and OID tags, and then recharter.
AM (AD): General advice: prioritize them and don't take too many. A
generic charter point for extensions is fine, but limit your milestones
and active documents.
AK: The sequence stuff is actually very interesting; based on the
discussions in the RG it seems that having this type of tool is very
useful. Regarding the error tag, what is the interplay between CBOR and
the protocol carrying it (e.g., CoAP)? Should there be a more detailed
Bad Request variant to help the application stack with this?
CB: When we defined CoAP we thought about this problem for a while and
came up with errors that are supposed to be handled by the application;
these need to be defined, which is why we have 404. There are other
errors that are hard for the application to handle and can only be
described by the application. Those hard-to-handle "something went wrong"
errors are often only describable application-specifically; there's the
diagnostic payload for that. HTTP now has something similar ("problem
document").
Maybe it is worth looking at that space to see whether we should do
something on the CBOR side. Will you write a draft?
JS: I have to see very tight problem statements before adding that.
AK: Just something for us to explore later on; some exploration can be
done in T2TRG.
FP: I hear that there are 4 WG areas.
CB: There's evolution of the core documents (CDDL 2.0). There's
housekeeping stuff. And there are three buckets of tags (we want, the
IETF wants, others want).
FP: So we should prioritize those to ensure we don't take too many.
AM (AD): No opinion on which ones you want to work on; just prioritize
and pick.
CB: The importance goes across these four categories. The ones that the
"IETF wants" we can push to the using side: keep us in the loop but have
them do the work. The "we want" ones we still have to do ourselves; they
don't come with tight timelines, they come as we slowly learn (little
urgency, and not that much work, because by the time something reaches
this WG the problem is well understood). They can be done with high
priority because they're little work. The "outside wants" ones can be
handled by just registering and later fixing up the spec, which is
unclean but works for people with shipping deadlines, and accommodates
the people actually working on them.
AM (AD): Would the WG like to check whether it is competent enough to
work on some of the drafts, or whether they have to be delegated to
another one? This is just input to your decision process.
(chairs nod)
FP: That was a good discussion.
CB: Timeline?
FP: We still have 3 docs that need to be published, but I don't think we
will have anything very soon anyway.
CB: Of the two main documents, one is shipped and the other is in the
process of baking. This opens up new slots from my point of view. I don't
want tags to be in the way of rechartering; rechartering can be based on
whichever model we pick for handling tag documents. No strong position
about whether we should have shipped cbor-bis before rechartering, but we
should have a timeline and be ready by the time that is shipped.
AM (AD): Asking for rechartering once you ship the first revision will
help the IESG. This WG started slow but it is keeping up the pace.
Demonstrating that the work gets done would be good for the IESG.
CB: So ship the array tags?
JS: My expectation was that the chairs would, over the next month, write
a candidate charter by May; when there's agreement, push it to the IESG.
AM (AD): Sounds good.
FP: Thanks!
===========================================================
Summary of APs:
===========================================================

CDDL:
* AP Carsten to call for CDDL "doctors" on the mailing list

CBORBis:
* AP Jeffrey to rebase PR17
* AP Carsten to check that the text deleted by PR17 is redundant
  (editorial)
* AP Carsten to add a paragraph saying "encoder/decoder may use preferred
  encoding/decoding" and get rid of "strict mode"
* AP Jeffrey to continue discussing tag validity on the mailing list
* AP Carsten to implement the proposal (split progressive tags into a
  different doc) + notify the mailing list
* AP Carsten to update the IANA considerations
* AP Laurence to check that the text about the "defensive decoder" makes
  sense after Carsten has implemented the "strict mode" changes
* AP anybody to check the code in appendix C and see whether it is
  defensive enough
* AP Carsten to move 4.9 and 4.10 to their own sections (in front of the
  existing section 4) + change the text in section 4 to reflect that

Tags doc:
* AP Carsten to submit a new version

bormann-cbor-sequence:
* AP Carsten to implement the proposal: the document should suggest (but
  not require) making the stream of items be a stream of byte strings
  containing items, so that decoders can quickly skip items or distribute
  their decoding to other threads, and so that decoders can recover from
  brokenly encoded items.
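(From notetaker: a minimal Python sketch of the byte-string wrapping in
the last AP, for illustration only; the names are mine, not text from the
draft, and it reuses the encode_head/decode_head helpers and the
NotWellFormed exception from the earlier sketches. Each record of the
sequence is an already-encoded CBOR item wrapped in a definite-length
byte string, so a consumer can skip or dispatch records by length alone,
without decoding their content.)

    def wrap_record(encoded_item):
        # One sequence record: a byte string holding an already-encoded
        # CBOR data item (which could optionally be tagged as well).
        return encode_head(2, len(encoded_item)) + encoded_item

    def split_records(sequence):
        # Walk the sequence and return the embedded items still encoded,
        # without looking inside them.
        records, offset = [], 0
        while offset < len(sequence):
            mt, length, off = decode_head(sequence, offset)
            if mt != 2:
                raise NotWellFormed("expected a byte-string-wrapped record")
            if off + length > len(sequence):
                raise NotWellFormed("truncated record")
            records.append(sequence[off:off + length])
            offset = off + length
        return records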