Minutes IETF104: cbor
minutes-104-cbor-01

Meeting Minutes Concise Binary Object Representation Maintenance and Extensions (cbor) WG
Date and time 2019-03-28 08:00
Title Minutes IETF104: cbor
State Active
Last updated 2019-04-02

===========================================================
CBOR WG Meeting
IETF 104 - Prague
Thursday, Mar 28, 2019, 09:00 - 10:30
Chairs: Francesca Palombini, Jim Schaad
===========================================================

Note Takers: Jaime Jimenez, Christian Amsuess
Jabber: Francesca

Slides:
* https://datatracker.ietf.org/meeting/104/materials/slides-104-cbor-chairs-02.pdf
* https://datatracker.ietf.org/meeting/104/materials/slides-104-cbor-consolidated-01.pdf

Recording: https://www.youtube.com/watch?v=L8lk8hEEax8

===========================================================
Detailed minutes
===========================================================

* Introduction [10'] : Chairs
  Agenda bashing and WG status update

Francesca Palombini [FP] updating on the working group status.

Welcoming new co-chair Jim Schaad.
Update on documents.
CBOR will recharter.
Updates on Github with 7049bis
CBOR Array Tag WGLC ended

CDDL is on the Editor Queue! Congrats!!

WG documents:

* CDDL [5'] : Chairs
  https://tools.ietf.org/html/draft-ietf-cbor-cddl

Carsten Bormann [CB] Presenting

"What's up next?"
Peek at the post-1.0 list; please message the list if you want something that's
not there or are bothered by something. Pronunciation? "C D D L" and "cuddle";
the WG should have an opinion. Compare to SCSI. Do we want pronounceable? Do we
want "cuddle"? (SCSI used to be pronounced "sexy".) What do you say? Few hands
for "cuddle", many for "C D D L". We don't have to have a single one -- compare
"Object ID" vs "oid". Maybe leave it open; keep it in mind, form an opinion and
send it to the list.

"CDDL doctors"? ("cuddle doctors"? "cuddle fairies"?) CDDL should be
sufficiently accessible so we shouldn't need, and make a point of it not being
needed. But tutoring and coaching would be good. https://cddl.space is out
there, looking for tutors there.

Jeffrey Yasskin (JY): It's pretty close to sufficiently accessible, but some of
the matching is not completely obvious, so it would be good to have tutors. CB:
Even self-explanatory things don't always arrive at the recipient. E.g.
yesterday: how do I write multimaps? Putting that somewhere would help. If you
want to be on that page with advice or your name, please contact me.

That's all on CDDL, now for cbor-bis.

* CBOR specification status [40'] : Carsten
  https://tools.ietf.org/html/draft-ietf-cbor-7049bis

Advance over last month: the error levels (CBOR syntax, semantics and
application level) are now pretty well understood. "Not well-formed" usually
leads to not continuing; code for the check is in the appendix. "Not valid" can
still send data to the application, but it might look strange to the
application. What the application can/should do could be discussed.
"Unexpected" errors are application expectations; CDDL/YANG could come in here
(but it is also possible to define them in text). It's useful to talk about
this here as well, e.g. for the integer-vs-float discussion. The expectation
for number spaces was to integrate them as in JSON; that didn't work well for
applications. [...] Some applications still throw them together, and might then
produce validity problems with tags that expect a float. These levels were not
understood in the original CBOR spec; now they are better.
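The well-formedness level can be illustrated at the byte level: every CBOR data
item starts with an initial byte whose top three bits give the major type and
whose low five bits give the additional info. A minimal sketch (the function
name is mine, not from the draft):

```python
def initial_byte_info(b: int) -> tuple[int, int]:
    """Split a CBOR initial byte into (major type, additional info)."""
    return b >> 5, b & 0x1F

# 0xC1 is the head of tag 1 (epoch time): major type 6, argument 1.
# A tag 1 head followed by a text string is well-formed CBOR but not
# *valid*, since tag 1 expects a number -- the next error level up.
```

Truncated input (e.g. a text-string head promising more bytes than follow) is
"not well-formed" and stops decoding; validity problems need not.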

#17: (well-formed/valid) is now mostly integrated. What is still missing is
redundancy reduction. (editorial ToDo; technical is resolved).

ToDo: "strict". What's there comes from before preferred encoding (where
editorial changes are still open). There is application content to this, can't
be strict on generic level. A decoder that checks deterministic encoding has
the same issue, a generic one can't check application components of this.
(Better word for "deterministic-checking decoder"?) Jeffrey Yasskin (JY): It
was suggested by Laurence to remove the strict decoder. The use for it is that
an application can demand a strict decoder, but that does not mean enough.
Rejecting invalid input can be sufficient for applications. CB: I don't have
the text, but I think it was the preferred decoder. JY: The strict decoder
currently does not require preferred encoding, but an application can say that
it rejects non-preferred encoding. We could even define strict mode as just
that. CB: Maybe we should have this single paragraph somewhere where it says
"encoders might be deterministic-checking, etc." Laurence Lundblade (LL):
Strict mode talks about deterministic, but talks of rejecting invalid CBOR too.
Looked into -bis, found 18 instances of things that could go wrong, like a
string in an epoch date (easy) and UTF-8 checks (hard). A check for invalid
MIME is expensive to compute, and could be highly variable. That notion of
"valid" / "invalid" seems problematic. Strict mode is oriented around validity.
It also talks about firewall/security stuff. CB: We have done that already,
that is fixed (firewall/security). LL: Still advocating strict mode going away.
Checking some things for validity and not others is not a mode. Don't expect
the decoder to check for all, because some are so expensive, and it makes so
much more sense to check higher up. Not sure about deterministic. CB: One thing
a strict mode decoder could do -- and I think this is the only validity check
that happens outside of tags -- is duplicate map keys. That might depend on the
application, and it is something that a generic decoder might do up to a
certain extent. LL: UTF-8 checking is also outside of tags and about validity.
JY: "Generic decoders" and "protocol decoders". Tag validity checks are not
suitable for generic decoders, for LL's reasons. Protocol decoders are more
limited in their scope, and they can use "must not be invalid" terminology. CB:
That is a great conclusion for the next 5 slides I have.
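For concreteness, "preferred encoding" means using the shortest head that can
carry the argument; a deterministic-checking decoder would reject anything
longer. A minimal sketch for unsigned integers (major type 0); the function
name is mine:

```python
def encode_uint_preferred(n: int) -> bytes:
    """Shortest-form (preferred) encoding of an unsigned integer."""
    if n < 24:
        return bytes([n])                      # argument fits the initial byte
    if n < 0x100:
        return b"\x18" + n.to_bytes(1, "big")  # 1-byte argument
    if n < 0x10000:
        return b"\x19" + n.to_bytes(2, "big")  # 2-byte argument
    if n < 0x1_0000_0000:
        return b"\x1a" + n.to_bytes(4, "big")  # 4-byte argument
    return b"\x1b" + n.to_bytes(8, "big")      # 8-byte argument
```

E.g. 500 encodes as `19 01f4`; the five-byte `1a 000001f4` would be well-formed
but not preferred.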

Most difficult pull request: #18 tag validity from JY. Clarifications good, but
before we continue, think about 7049 validity requirements. Validity
requirements in tag definitions depend on "State of universe" when tag was
defined, and that changes. Tag 1, for example, takes numbers, and by the time
we defined that we didn't define to tag decimal, and came up with 1001 to fix
it. That sounds pretty onerous. (Or tag 36 for binary messages). Recurring
theme -- don't get tag validity right the first time between meaning and
written text. Two models:
    * "Reactionary tag validity": write things down, lay out substructures,
    interpret what the founders meant a long time later. A new substructure can
    never go into an old one. Plus side: never any ambiguity, but as JY said,
    tag validity is an application topic.
    * "Progressive tag validity": abstract semantics in the definition. That
    rules out some things (a byte string is not in the real numbers), but later
    tags can come in. Later tags can claim to be useful inside earlier tags.
    Can still miss things (forgot to say that a new tag can go in an old one),
    but can do more.
JY: I really like the progressive option. To decide whether it works we have to
think why do we use tags for. In any particular protocol we can define the
structure of the data (byte string as byte array) and not tag it, and still be
parsed. I think the benefit of the tag is for human readable content. Things
might not be able to use data with new tags until they're updated . As long as
we are always defining the structures on the specs those debuggers will keep
up. You have to say what it is able to contain for the purpose of the protocol.
CB: Scary aspect here: When application gets composed, a generic decoder is
used. If I have to wait for generic decoder to accept a new tag combination,
it's bad. If it's the job of the generic decoder to deliver unknown combination
to the application as-is, we won't have a deployability problem. Sean Leonard
(SL): I also agree with the progressive tag validity option for similar
reasons, also treating cases where the generic decoder is capable of sending
some of the information up to the application. I would rather use one tag for
my message than two, or than eventually having to support a large quantity of
tags.
Brendan Moran (BM): Progressive tag validity is begging for many compatibility
failures. Devices will not be updated, especially in IoT. Take tag 1 accepting
any number in R while a firmware update takes a timestamp for validity; the
data structure that delivers the updates can run into errors. Great idea
semantically, but practically problematic. CB: Right now when you get the
timestamps in SUIT, do they say anything about what they are? BM: They are
untagged for that very reason. CB: You did the right thing; the application
protocol made the right decision. BM: If I have to make the decisions, then the
tags are not helping (as opposed to the spec). CB: Two reasons for tags: one
reason for diagnostic debugging (those are enabled by self describing
structures) and the other use are if you want to use them in protocol as
extension point. BM: I understand the goal and the approach, but I also see
that in a world where devices are not updated frequently, this will cause a
critical failure. CB: Only if the application dev didn't do their job. JY: Can
expect app devs to not do their job. Tags are not for protocol designers, they
don't help with defining what data your app must accept. Don't fully understand
the use case about extensibility point, but don't need to go into that in this
meeting. Ad "use for diagnostic tools": Send data over protocol that expects
integers, and diagnostic tool renders any number as time stamp. Diagnostic tool
might hide the application breakage here. Not sure whether downsides convince
me against progressive typing, but should discuss this further. CB: I think it
is a sharp instrument, it needs to be handled with care. LL: One thing I saw
tagging useful for was knowing how a generic encoder may change in native
representation. A generic decoder can express a date as a platform date, which
makes generic decoders easier to use. CB: Let me give an example. One is when I have a
place in my protocol where I can accept any integer. If I need more than 64
bits I use tags 2 and 3, otherwise major types 0 and 1. In this case the tag is part
of the sets of CBOR types that creates the space of integer. Then there are
places in YANG-CBOR where we use tags to identify different pieces of
information, where you tag them as a union of enumerations. That is an
application function that is all about tagging things, but it is the
application, not the decoder. In the first case a generic decoder could handle
it; in the second (YANG-CBOR) it is the application. Jim Schaad (JS): The opposite
side of this is that a generic encoder becomes a pain to think about.
Application needs to tell it that is allowed to use this set of things but not
that set of things. CB: We have that problem with floating point, it is
painful. AK: A couple of questions. I suppose the idea is that you can use one
or the other, right? (Reactionary vs. progressive.) CB: (going to slide 15) Two sides: What
do we do about the tags we have? Can enable progressive for new ones, but have
old ones that are limited. Do we want to open them up? In the end the app makes
the call, but as JS said, now the interface to the generic encoder becomes more
complicated. AK: Ok, another question on the semantic interoperability. Who
gets to choose if abstract semantics match? CB: A string is not a number. AK:
So are there existing rules as to which values are compatible and which are
not? CB: We don't have rules; when you define a tag you say what can be in
there. AK: That'd be the primitive types. CB: By defining it at the semantic
level. AK: The question was, if I define a new tag, how does it fulfil the
actual semantics? May need to have a closer look. CB: The weird part is that the progressive
approach creates something like a type system. A new tag definition should
document what it expects to have inside and what it exposes to the outside.
("Say it's in real numbers"). We don't have a formal system for that, not sure
whether we have to invent it right on the spot. AK: yes. Maybe defining that is
the long term goal. CB: May want to do that at some point, maybe even have CDDL
learn semantics at some point. JY: The data model section anticipates the tags.
I am fine with the idea of doing it ad hoc for a while, even if we get it wrong
for a while.
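CB's first example can be made concrete: an integer above 64 bits is carried as
tag 2 (unsigned bignum) over a byte string, extending the integer space of
major types 0 and 1. A minimal sketch covering short byte strings only (the
helper name and the length limit are my assumptions):

```python
def encode_bignum(n: int) -> bytes:
    """Encode a non-negative integer as a CBOR tag-2 unsigned bignum."""
    payload = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    assert len(payload) < 24, "sketch covers short byte strings only"
    head = bytes([0x40 + len(payload)])  # byte string, length in initial byte
    return b"\xc2" + head + payload      # 0xC2 = head of tag 2

# 2**64 does not fit major type 0, so it becomes
# C2 49 01 00 00 00 00 00 00 00 00
```

A protocol slot defined as "any integer" can thus be served by a generic
decoder, whereas the YANG-CBOR union-of-enumerations use of tags is an
application matter.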

CB: If we go all-progressive, we'd allow the app to define its expectedness
within tags (slide 15). Reclaiming 36 would be the extreme case. Creates some
instability. Ways forward: We don't really want to do the upper half of slide
16 (although we want to apply PR 18 as the "default"), and explore progressive
tag validity. Saying "we do ad hoc for a while" is kind of weird in an Internet
Standard, but applications have been doing it already; we would rather make
explicit what people have been doing. I'd be comfortable with that; the IESG
might not.

JY: I think progressive tag validity is pretty experimental and it should be on
a "proposed standard" category RFC instead of "Internet standard". CB: OK,
certainly one way of handling this. Goes against "batteries included". Some
tags are really needed. JY: Yes, tags 2 and 3 perhaps we can say they use the
reactionary type but tag 1 definitely needs to move to the other document. I
get your point but maybe we should do it anyway. CB: Sounds like a way forward.
Sean Leonard (SL): I just have a question about the bignum type: can someone
explain why someone would want to extend that? CB: I don't think there's
controversy here. SL: That means no one sees a need to extend that in the
future, then. On my read of 7049, the reactionary approach didn't seem so
apparent.

CB: Some tags are really locked down, some are so experimental we got them
wrong. I think it's a good idea too; COSE was split up between Internet
Standard and Proposed Standard. Only the security area can do normative
references in informational docs. Not the outcome I expected, but we should try
that. Alexey (AD): Not entirely true.
FP: Continue discussion on list? CB: Yes. FP: Proposal to start with? CB: I
think the next step would be to integrate a team, do this bit and see if we are
happy with what we have. It is much better to do this in the concrete, when it
is obvious what we want to do.

CB: Other ToDos (slide 18)
Already talked about strict.
IANA considerations to be updated. (btw: From related specs there's one tag
already in spec-required space). Reviews, WGLC? No WGLC but split doc, and then
round of reviews, then WGLC. FP: Can you give us an indicative timeline that
you estimate? CB: Should have split within April. FP: in three days then? CB:
More like April 10th. FP: ok

LL: About -bis. In syntax errors and well-formed...
CB: That's deleted by #17
LL: Was looking at github HEAD.
CB: Not up to date yet, want to check text that's not yet perfect, still going
through the large deletion commit. This is WIP. LL: So one thing is
secure-coding/secure-imp/sec-considerations for strict mode. How do we want to
approach this? I'd like to see that defensive decoding should be in security
considerations and nowhere else in the document, and be pretty sharp to the
point where CBOR decoders are not to be expected to rely on any outside
security, and should say that. CB: Yes. LL: Strict mode talks about firewalls...
CB: No longer. Should be on github HEAD. Point is that strict is cut down
anyway, the promises in security considerations should be very small. LL:
Right, probably zero. That was the main thing I wanted to make sure of. That's it.
CB: Someone with some hours free could look through appendix C and review for
whether it's defensive enough. It's pseudocode but still. LL: I wasn't worried
about that. I do have one more thing, which is in section 4.7 and the use of
MUST sometimes is used to refer to compliance to the whole protocol and other
times is to define stricter support. CB: Problem with those MUSTs (I think
there's 2 left in section 4), one is MUST for application which is slightly
weird, because we do that nowhere else. We have some implicit MUSTs, but that's
called out explicitly again in section 4. LL: That's the qualified one (if you
do it for serialization then you must do this) (chairs bringing up section 4)
LL: In Section 4.10, there are several MUSTs in deterministic encoding. I'd
call them qualified / conditional MUSTs. But there's a "No protocol MUST rely
on tag ordering", that should not be qualified but is absolute for CBOR.
There's two kinds of MUSTs, but that is confusing. CB: maybe putting the
deterministic encoding in the section that is about integrating cbor protocol
maybe it wasn't so smart. We are getting rid of 4.11 we want to structure the
text slightly different. JY: The 4.10 gives the protocol a piece of terminology
to define itself. Is in reasonable place I think, but if it's confusing people
it's not. Maybe move to its own section, "What authors can use to define their
protocols", with terms for other specs. CB: 4.9 and 4.10 could go into their own
section, even in front of Section 4. FP: Would that answer your question LL?
LL: Yes. That means all MUSTs in 4 would be not qualified/conditional. There
would be text in intro that says some points here are MUSTs and some are not.

CB: Other CBOR housekeeping.

bormann-cbor-sequence
Patterned after RFC 7464 (JSON text sequences). This document mainly tries to
register the missing media type and content format.

Comparing the documents shows the CBOR sequences one is shorter; that's because
CBOR is easier. The JSON sequence spec tries to leverage the record separator
to provide error recovery, and I neither know how to do that nor think it's
needed. Channels already have integrity protection; if it breaks, it breaks.

I published that thing and people gave positive feedback; some would like to
use it as a normative reference. I would like to accelerate this; it's not in
the charter, so plan A would be to find an AD sponsor and plan B would be the
rechartering session later.
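Unlike RFC 7464, which prefixes each JSON text with a record-separator byte, a
CBOR sequence is nothing but the concatenation of the encoded items, since each
CBOR item is self-delimiting. A small illustration (encodings written out by
hand):

```python
# Three CBOR items: 1, 100, and the text string "hi"
items = [b"\x01",        # unsigned 1, argument in the initial byte
         b"\x18\x64",    # unsigned 100, 1-byte argument
         b"\x62hi"]      # text string of length 2

# The CBOR sequence is simply their concatenation -- no separators.
sequence = b"".join(items)
```

This is why no delimiter-based error recovery is available: a decoder resumes
only at item boundaries it can compute.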

Alexey AD: Which priority would this be compared to the work in the group?
CB: It's a small number of use cases, a trivial amount of work so it's balanced.
Alexey AD: ... ok ....
SL: Read the draft, think it's fine, no problem with the WG adopting it. Still
curious about use cases. The reason for JSON sequences was streaming; curious
what the industry demand for CBOR sequences is. CB: The one document that was
defined before was SenML; it has a weird definition for streaming. With CBOR
sequences it could have been done right, and other documents could also
benefit. John Mattsson had one example for using that. It is just a natural
thing: when you have an environment where you have to send a record of data and
you have the length, it is not necessary to bind the records with an array.

CB: Who read? JS, AK (+ Sean Leonard).
SL: There is something you might want to think about. JSON sequences show the
record delimiter, without having to show where one record ends. For PDUs that
have a known length you could skip some of the code. You cannot do that with
CBOR. CB: CBOR does not use delimiters, period. SL: Neither does JSON. CB: But
JSON is not using the entire character set. The answer to your requirement is
there: put the CBOR records in byte strings. You can get ASN.1 beauty by
packaging a data item in a byte string. SL: I can think of a few ways of doing
it, but it would be interesting to have a mechanism to skip large quantities of
CBOR data. CB: If you need something skippable, that's the design pattern: put
a byte string around it. SL: Possibly tagged. CB: Possibly, but in streaming
it's not necessary because that's just streaming. SL: Would it be tagged as
CBOR-encapsulated data? JY: Should this document suggest or require that
pattern? CB: It could suggest; great idea, not require.
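The design pattern CB describes -- wrap each record in a byte string so a
consumer can skip it by length alone -- can be sketched like this (helper names
are mine; short items only):

```python
def wrap(item: bytes) -> bytes:
    """Embed an encoded CBOR item in a byte string (skippable envelope)."""
    assert len(item) < 24, "sketch covers short items only"
    return bytes([0x40 + len(item)]) + item

def skip_split(seq: bytes) -> list[bytes]:
    """Walk a sequence of byte-string envelopes without decoding contents."""
    out, i = [], 0
    while i < len(seq):
        n = seq[i] & 0x1F          # length from the byte-string head
        out.append(seq[i + 1 : i + 1 + n])
        i += 1 + n                 # jump over the whole record
    return out
```

A consumer that only wants one record can hop from head to head without parsing
the embedded items, or hand whole records to other threads.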

FP: Continue to talk about this doc, its place in the working group in the
charter discussion later.

* CBOR Array Tag [10'] : Carsten
  https://tools.ietf.org/html/draft-ietf-cbor-array-tags

Array draft completed WGLC. Good editorial comments received.
(recapping what it does / is)

Work needed: editorials in the tracker, more use cases for homogeneous and
clamped. (The current text for clamped is easy to misunderstand: it says
nothing about the history of the data, only about how they should be handled;
that's very specific to the JavaScript environment work this came from.)

Discussion around deterministic encoding of BE vs. LE: the conclusion is that
precision and byte order are application matters, and there is no deterministic
encoding at the generic CBOR level. LL: On BE vs. LE, you could imagine a
generic encoder would have to support both. The point of choosing one is to
avoid the need to implement both. CB: Yes. This is out of character with the
rest of CBOR, and the nice thing about it is that CBOR is stable enough to
survive it. We can do that because the work we're interfacing with has
different principles, and typed arrays come in that form. LL: Let me ask again,
why not LE? And say that's it. Why not mandate one to prevent ambiguity? CB:
Because the referenced work allows both. If you build something informed by two
colliding tracks of standards, how do you handle that? You can make one group
force the other, or find something that breaks neither, and this will not break
Khronos and JavaScript. LL: Ok. CB: Jonathan convinced me that this is the
right thing to do.
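Since both byte orders stay, the draft encodes the endianness in the tag number
itself. A sketch for uint16 arrays using the tag numbers from the array-tags
draft (65 = uint16 big-endian, 69 = uint16 little-endian); the helper name and
the short-array limit are my assumptions:

```python
import struct

def encode_u16_array(values, little_endian=False):
    """Typed array of uint16: tag head + byte string of packed values."""
    tag = 69 if little_endian else 65      # uint16 tags from the draft
    fmt = "<H" if little_endian else ">H"
    payload = b"".join(struct.pack(fmt, v) for v in values)
    assert len(payload) < 24, "sketch covers short arrays only"
    # 0xD8 = tag with 1-byte tag number, then a byte string head
    return bytes([0xD8, tag, 0x40 + len(payload)]) + payload
```

The same values yield different bytes per byte order, which is exactly why
deterministic encoding here is left to the application.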

Other:

FP: We are going to schedule interim meetings and will continue with the calls
every two weeks. Coordinating with ACE, CORE and COSE. CB: In interim between
103 and 104 we had regular CoRE and CBOR on Wednesday but at different times,
that was confusing. FP: Make sure that you enter your preference. AK: Can we
coordinate with OMA and OneDM? Not at IETF level, but avoid their slots. They'd also
be happy to adjust if they get sufficient heads-up. FP: Yes. Will still be
Wednesday but we can fix the time slot. AK: Started collecting data on the IoT
directorate wiki on when who meets. Jaime Jimenez (JJ): Is now the right place
to ask for Jitsi instead of WebEx for those meetings? FP: With WebEx we can
record, but
will look into it.  (From notetaker:
https://jitsi.org/live-streaming-and-recording-a-jitsi-conference/ )

* Next step for the WG (charter discussion) [15'] : Chairs / all

FP: Some documents are now on the back burner (missing on slides: coordinate
reference systems, and Jim's CMS content types for CBOR); rechartering. CB: The
last one is
interesting b/c it's not something we do to interface with CMS, but it's
something CMS does to interface w/ CBOR JS: Put in to notify people here, not
to run it in this group. FP: So now we're aware. FP: Talked about sequence tag.
Unchartered and not adopted: OID/OIDs+ (may be taken out of that). CB: We need to
sort this into buckets. CDDL2.0 is about evolution of one of our core specs.
Not maintenance or housekeeping but evolution based on user requirements now
understood. That's one bucket with a single item (maybe also time tag). Other
bucket: Housekeeping (like sequence), and tag definitions of different
qualities (some housekeeping, some motivated by applications within IETF, eg.
template). Template came out of LPWAN, they are now thinking about data models.
That bucket of tags comes from inside IETF. Time tag is in different bucket
that's about things that need to be done as a part of the ecosystem. With most
tags being split out, we can think about restructuring. Third area of tags are
registrations from outside, like error tag or geographic coordinates. Mainly
domain of designated experts to talk with applicants, but if a tag with wider
applicability comes out that should be doc'd in RFC, don't know whether WG doc
or independent. Should think about that bucket, those will come in at
increasing speed. FP: Alexey, AD opinions? AM: Trying to come up with one. Was
checking charter. FP: CBOR-bis, CDDL, array and oid tag, and then recharter.
AM: general advice: prioritize them and don't take too many. Generic charter
point for extensions is fine, but limit your milestones and active documents.

AK: The sequence stuff is actually very interesting; based on the discussions
in the RG it seems that this type of tool is very useful. Regarding the error
tag, what is the interplay between CBOR and the protocol carrying it (e.g.,
CoAP); should there be a more detailed Bad Request version to help the app
stack with this? CB: When we defined CoAP we thought about this problem for a
stack with this? CB: When we defined CoAP we thought about this problem for a
while and came up with errors that are supposed to be handled by the
application, these need to be defined that is why we have 404. There are other
errors that are hard to handle by the application, and can only be described by
the application. Those hard to handle errors "something went wrong" is often
only application-specifically describable, there's diagnostic payload for that.
HTTP now has something similar ("problem document"). Maybe worth looking at
that space to see whether we should do something at CBOR side. Will you write
draft? JS: I'd have to see very concrete problem statements before adding that. AK:
Just something for us to explore later-on, some exploration can be done in
T2TRG.

FP: I hear that there are 4 WG areas.
CB: There's evolution of core documents (CDDL2.0). There's housekeeping stuff.
And there are three buckets of tags (we want, IETF wants, others want). FP: So
we should prioritise on those to ensure we don’t take too many. AM (AD): No
opinion on which one you want to work on, just prioritize and pick. CB: The
importance goes across these four categories. Regarding the ones that the "ietf
wants" we can push to using side, keep us in the loop but have them do them.
The "we want": we still have to do them, they don't come with tight time lines,
they come as we slowly learn (little urgency, not that much work, because by
the time it reaches this WG the problem is well understood). Can be done with
high prio because
they're little work. "outside wants" can be handled by just registering and
later fixing up spec, which is unclean but works for people with shipping
deadlines. Accommodates the people actually working on them.

AM (AD): Would the WG like to check whether they are competent enough to work
on some of the drafts or whether they have to be delegated to another one? This is
just input to your decision process. (chairs nod)

FP: was good discussion.
CB: timeline?

FP: We still have 3 docs that need to be published, but I don’t think we will
have anything very soon anyway. CB: Of the two main documents, one is shipped
and the other in the process of baking. This opens up new slots from my POV.
Don't want tags to be in the way of rechartering. Rechartering can be based on
whichever model we pick for handling tag documents. No strong position about
whether we should have shipped cbor bis for rechartering, but we should have a
timeline and be ready by the time that is shipped. AM (AD): Asking for
rechartering once you ship the first revision will help the IESG. This WG
started slow but it is keeping up pace. Demonstrating that the work gets done
would be good for IESG. CB: So ship array tag? JS: My expectation was that
chairs would, over next month, write candidate charter until May; when there's
agreement push to IESG. AM (AD): Sounds good. FP: Thanks!

===========================================================
Summary of APs:
===========================================================

CDDL:
* AP Carsten to call for CDDL "Doctors" on the mailing list

CBORBis:
* AP Jeffrey to rebase PR17
* AP Carsten (PR17) to check the deleted text is redundant (editorial)
* AP Carsten to add a paragraph to say "encoder/decoder may use preferred
encoding/decoding" and get rid of "strict mode"
* AP Jeffrey to continue discussing tag validity on the mailing list
* AP Carsten to implement the proposal (split progressive tags into a different
doc) + notify the mailing list
* AP Carsten to update IANA considerations
* AP Laurence to check that the text about "defensive decoder" makes sense
after Carsten has implemented the "strict mode" changes
* AP anybody to check the code in appendix C and see if it's defensive enough
* AP Carsten to move 4.9 and 4.10 to their own sections (in front of existing
section 4) + change text in section 4 to reflect that

Tags doc:
* AP Carsten to submit new version

bormann-cbor-sequence:
* AP Carsten to implement the proposal: should suggest (but not require) making
the stream-of-items be a stream-of-byte-sequences-containing items so that
decoders can quickly skip items or distribute their decoding to other threads,
and so that decoders can recover from brokenly-encoded items.