Physical: Boromphimarn 4
Online: https://meetecho-sin.ietf.org/client/?session=33820
Chairs: Barry Leiba, Christian Amsüss (remote)
Notes: MT, CA
BL doing introductions
CA: Any minute taker?
BL: MT will help.
CB: CDE has 10 days of WGLC left. Still hard for me to make proposals about
that; hopefully I will before the WGLC ends.
CA: Noting published RFCs since last meeting
CB presenting
https://datatracker.ietf.org/meeting/122/materials/slides-122-cbor-packed-cbor-00
CB (p1): Packed CBOR is not data compression; it is more like "succinct
data structures", useful when there is redundancy. It relies on tags and
simple values. There was a first WGLC 3 years ago, then there was
implementation work.
CB (p1): Remaining question on consumption of tags. Other documents want
to use packed CBOR too.
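A minimal sketch of the mechanism in Python with the cbor2 library; the
tag and simple-value numbers follow the draft at the time of the meeting
and are exactly what this discussion may change:

    import cbor2

    # Shared items live in a table; simple values 0..15 reference them.
    table = ["https://example.org/"]   # shared item, stored once
    rump = {"home": cbor2.CBORSimpleValue(0),
            "about": cbor2.CBORSimpleValue(0)}
    packed = cbor2.CBORTag(113, [table, rump])  # 113: basic packed setup
    print(len(cbor2.dumps(packed)))    # far smaller than repeating the URL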
CB (p2): Current status of tag numbers allocated. The considered
timeframe is decades.
CB: Using 1+0 is uncontroversial so far.
CB: How many 1+1 tags? We have used 74 of 232. That budget has to last
for decades (30 years; RM said 50), which gives us about 3 per year (158
remaining over 50 years). We didn't manage that rate for the initial tags
(to be expected; "sofas for the new apartment"; the DEs are aware).
CB: This draft would take some of that ground. How do we curate this
resource? No right answer -- it's all about estimation.
CB (p3): Proposal (using 1/4 of remaining 1+1), additional proposals
tuning that number.
CB (p4): Details on the radical idea: not allocating dedicated tags for
references at all. Instead, use only tag 6 for everything. The cost is
the additional array encoding, i.e., slightly more than 1 byte per
reference.
CB on "16+8": 16 straight references, 8 inverted; they differ in order
of component that go into operation.
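A rough byte-count comparison in Python with cbor2; the dedicated tag
number 225 here is a made-up stand-in for a 1+1 allocation, not anything
the draft assigns:

    import cbor2

    rump = "example"                                   # referenced operand
    dedicated = cbor2.dumps(cbor2.CBORTag(225, rump))  # 1+1 tag head: 2 bytes
    radical = cbor2.dumps(cbor2.CBORTag(6, [1, rump])) # tag 6 + array + index: 3 bytes
    print(len(dedicated), len(radical))  # 10 11 -- about 1 extra byte per reference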
RM: I like the radical version. If we do radical by default, can we
introduce 8 or 16 or 42 tags later when we see lots of deployments?
CB: Yes, the radical reduction makes references work.
CB: We won't remove the reduced way when adding the new ones, which
creates some redundancy and thus inefficiency, but yes, we can add them
later.
MM: The reduced version might work well for us, but looking at the
consumption we'd see, radical and reduced make no difference for us --
either way, we would use many more backreferences; we will be using
thousands. Could we have a tag that does things a bit more efficiently,
say 6(nnn(x)), using nnn for an inverted reference?
CB: We could design a lot of additional mechanism; I would prefer to keep
this simple, which is why the radical proposal does not do that.
CB: There is a difference in perspective. You might have tables with
1000s of entries; some applications work that way. The other is the
constrained-node perspective, which mainly uses 1+0/1+1, as well as other
cases (compressing SBOMs, with large numbers of tags but many very likely
ones that would go for 1+1).
CB: This could take much time to research.
CB, concluding: People like the radical approach, but we don't have
numbers. We need good benchmarks to decide.
ML presenting
Presented slides:
https://datatracker.ietf.org/meeting/122/materials/slides-122-cbor-a-concise-binary-object-representation-cbor-of-dns-messages-01.pdf
ML (p2): Recap.
ML (p3): Example, showing a message size reduction of 50%.
ML (p4): Overall objective of size reduction for queries and responses.
ML (p5): IANA reviews addressed; expert review not done (but some
discussion has already happened).
ML (p7): Example on RRsets.
ML: Useful for DNSSEC signing, but also to reduce size.
ML (p8-18): Example on name compression by reference, which can be done
for prefixes and suffixes. Which tag to use -- from the 1+0 or the 1+1
space? Possible resolution: use shared values instead. That was tried
out, resulting in a "virtual" packing table implicitly added to the
message. The setup tag can be made implicit when using the new dns+cbor
Content-Format.
ML: With that, we can use pre-shared values.
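A sketch of the pre-shared-values idea in Python with cbor2; the table
contents and the use of simple values as references are illustrative,
not what the draft pins down:

    import cbor2

    shared_table = ["example", "org"]   # pre-shared via the Content-Format,
                                        # never transmitted in the message
    # "www.example.org" as labels, with the shared tail referenced:
    name = ["www", cbor2.CBORSimpleValue(0), cbor2.CBORSimpleValue(1)]
    print(cbor2.dumps(name).hex())      # each reference costs 1 byte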
ML (p20-21): With that story, ready for adoption?
MCR: Yes, please adopt. I thought it was already.
MCR: Question: What are the effects of packed on your efforts?
ML: For name compression, not much effect, as it uses shared references
(?). For address compression (optional), the impact of either radical or
reduced is not big. (Not gaining too much there anyway.)
MCR: So Chairs please adopt.
CB: Packed references go from 2-3 bytes to 1 byte. This already improves things.
CL (on chat): +1 on adoption, yes. I'm in the same category, I thought
this was a WG doc already.
RM: Weird to do this here. What did the DNS folks say about doing it
there, with experts from CBOR commenting?
ML: Last time I asked, it passed by DNSDIR, and I asked here; all were
happy.
AN: Where does this fit in charter?
BL: → The Chairs will check that.
CB: On which WG: either way, the groups need to stay mutually informed.
CB: On the similar CoRE document, we got early DNS review, and the LC
work was shared with two DNS WGs (DNSOP and DPRIVE). But this is about
using CBOR optimally, so it should rather be here.
CL (chat): +1 to adopt.
CB: Background slides, then switching to RM.
CB presenting
https://datatracker.ietf.org/meeting/122/materials/slides-122-cbor-edn-background-00
CB (p1): History/context -- extension points. EDN had nothing to reflect
the CBOR extension points; that gap is being fixed with app-literals.
WGLC brought feature creep, now done.
CB: One specific point is open; e-ref (in active use) and others depend on it.
RM: Why go over history again then?
CB: Requirements: humane representation; a superset of JSON.
CB (p3): Third-party extensions are part of the CBOR ecosystem without WG
work. (Similar with CDDL.)
CB: From that, requirements: not destabilizing EDN when adding new
extensions; not destabilizing extended implementations; tolerance of
incomplete deployment of new extensions.
CB (p5): Usability (addressed by EDN). People use Unicode; ASCII as
baseline. (A requirement fulfilled by JSON and previously not stated
explicitly.)
CB: When implementing, something CBOR-related but living outside CBOR is
created. We don't want to require new extensions to neatly integrate into
the specifications they are based on, or to necessarily adhere to
particular implementation approaches.
CB: I have a URI library I can use; I don't have ABNF I trust (3986 has
??).
CA (chat): That data type may not be typically described in ABNF.
RM presenting
https://datatracker.ietf.org/meeting/122/materials/slides-122-cbor-sessa-edn-escaping-discussion-01
RM (p1): More explicit on details now.
RM (p2): On EDN app-strings: what's agreed and what is not, as of now.
RM summarizing what we have consensus on.
CB (chat): +1
RM: Consensus that comments may occur in app-strings.
MCR: I discovered that there are base64 decoders intolerant of whitespace
between base64 quadruples. We should clarify whether it is allowed.
RM: Some don't tolerate any whitespace at all.
CB (on chat): The current document allows whitespace and comments
everywhere in base64. There is no reason to placate specific b64
implementations. Extracting the b64 is a function of the extension.
MCR: Don't surprise people by allowing it only every 4 characters.
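For illustration, Python's standard base64 module exhibits both behaviors
described here, lenient by default and strict when validating:

    import base64, binascii

    b64 = "SGVs bG8h"                       # whitespace inside the quadruples
    print(base64.b64decode(b64))            # b'Hello!' -- lenient decoder discards it
    try:
        base64.b64decode(b64, validate=True)
    except binascii.Error as err:
        print("strict decoder rejects it:", err)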
RM: Controversial items (one-level/two-level is least important, because
it doesn't affect the implementor).
RM (p3): What escapes do we need and want? I think we have consensus on
JSON compatibility. Do we admit EDN double-quoted strings that are not
allowed in JSON?
CA (no hat): It's important. JSON's way of doing modern Unicode is
unintuitive and already incompatible. I wouldn't dance around it.
CL (on chat): If it looks JSONey, it should be the same as JSON. I don't
feel this super strongly, though. An optional flag is just cowardice on
the part of spec writers. :)
RM: OK with lots of those, just want them explicit.
RM (p4): Don't want to support both mechanisms in single-quoted strings.
CB (on chat): It needs to be allowed for ", but we can outlaw it for '
CA (chat): +1
CB: We don't have strong compatibility requirements in EDN. We can outlaw
things now. Still, I'm not sure I want people to have to edit their
Unicode characters.
RM: If we support the curly brace notation for double-quoted strings, and
one has a JSON document, do you think it can be turned into a
single-quoted string without re-encoding?
CB: Yes.
CA (chat): We'd be compatible in the recommended set of ways (as opposed
to the legacy \uxxxx\uxxxx UTF-16ish escapes in JSON).
CA (chat): No. I don't see the conversion from " in a JSON-compatible way
as that important.
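A small demonstration of the JSON behavior under discussion, in Python;
the \u{...} form in the comment is the EDN-style alternative being
weighed:

    import json

    # JSON escapes non-BMP characters as UTF-16 surrogate pairs:
    emoji = json.loads('"\\ud83d\\ude00"')
    print(emoji, hex(ord(emoji)))    # 😀 0x1f600
    # An EDN \u{1F600} escape would name the code point directly,
    # without surrogates -- accepting both is the question here.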
RM: On escaping and comments: doesn't matter too much, because they are
discarded anyway.
RM: Can we have a backslash plus a single quote that ends up as just a
single quote? The thing that receives the output of an h'' app-string
doesn't get the \' any more; it just gets the '. Does that make sense?
CA: To me yes.
RM: If you have a date, and the date has a ' in the time zone, the \' of
EDN does not appear in whatever processes the date.
CB: Correct.
MCR: Yes.
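A toy illustration of the point just confirmed: string-level unescaping
happens before the app-extension sees the content (the replace() below
stands in for an EDN parser, and the date content is made up):

    # Content as written between the quotes of a single-quoted app-string:
    body = r"12:00\'ICT\'"
    # EDN string-level processing consumes the backslashes:
    unescaped = body.replace(r"\'", "'")
    print(unescaped)    # 12:00'ICT' -- what the extension actually receives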
RM (p6): I think we're close to consensus that some systems want
ASCII-only EDN. Don't know. Both JH and CA gave examples where it's
useful to convert.
RM: CA brought up machine processing to convert non-ASCII to ASCII.
Sounds reasonable; heard no objections.
CB (on chat): ASCII-only can be useful for keyboarding and for ogling
(careful human scrutiny)
RM: No need for a globally defined \u encoding in most extensions --
either they use a restricted character set (IP addresses, dates, e-ref,
using ASCII), or, if they don't, they have native escaping mechanisms.
Hypothetically, if we had some extension that didn't have a native
encoding, one could, as the author of that extension, add \u oneself. But
that means that machine processing of extensions (...). If you don't
understand an extension, you can't process it.
CB (on chat): We have two different objectives in escape processing
here: finding the end of the string; producing the input to the
app-extension
CL (on chat): If escaping isn't consistent, a syntax highlighter won't
be able to highlight it generically without understanding the type as
well.
RM asked JH (p7): providing a new format would be harmful?
CA: Great example, because two thirds of them are already wrong (xn-- is
a distinct CRI; & is not even defined that way).
CB (on chat): I think Joe's "ignore char after \ " rule is great for
the first
RM: No personal position. Going back to an early slide from Carsten: we
don't want to constrain what other libraries do. Libraries can already do
that. We cannot use a new mechanism, but I sympathize with Joe's argument.
CB: Important to keep in mind that our examples are about formats from
the IETF. Still, it violates compatibility.
CA: Fine to use application-specific escaping when some library does it;
I just don't want to require it of every single format.
BL: Out of time, need to come back on list.
BL: Check dates on interim calls, to be requested soon. Will ask for
1h30 at IETF 123.