Tommy:
Greasing codepoints is only one dimension.
Could we write down this advice and insight?
David:
MLS is almost out the door. We noticed in our last EDM call that
they didn’t have grease. It took a lot of back and forth to explain when
and how you grease. Some people know how to do it and what it means, and
the TLS grease document doesn’t explain all the details. For example,
you don’t treat grease codepoints differently on the receiver — don’t
try to special case greased values. You can hold it wrong (JA3).
Martin Thomson: I’ve reviewed a number of grease changes, and I see
people writing code to handle grease on receipt. That’s OK for tests,
but very much not OK for production code.
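A minimal Python sketch of the advice above, under stated assumptions: the sender inserts a reserved GREASE value (the 0x0A0A through 0xFAFA pattern from RFC 8701) at a random position, and the receiver has no GREASE-specific branch at all. The names `KNOWN_EXTENSIONS`, `add_grease`, and `process_extensions` are hypothetical and not taken from any real TLS or MLS stack.

```python
import random

# Reserved GREASE values from RFC 8701: 0x0A0A, 0x1A1A, ..., 0xFAFA.
GREASE_VALUES = [(n << 12) | 0x0A00 | (n << 4) | 0x0A for n in range(16)]

# Hypothetical table of the extension codepoints this toy receiver implements.
KNOWN_EXTENSIONS = {0x0000: "server_name", 0x002B: "supported_versions"}


def add_grease(extensions, rng=random):
    """Sender side: insert one random GREASE codepoint at a random position."""
    greased = list(extensions)
    position = rng.randrange(len(greased) + 1)
    greased.insert(position, rng.choice(GREASE_VALUES))
    return greased


def process_extensions(extensions):
    """Receiver side: handle what we know and ignore everything else.

    There is deliberately no check against GREASE_VALUES here. A GREASE
    value takes the same code path as any other unknown codepoint.
    """
    return [KNOWN_EXTENSIONS[e] for e in extensions if e in KNOWN_EXTENSIONS]
```

Checking membership in `GREASE_VALUES` is reasonable in a test that asserts the sender really greases, but it does not belong in `process_extensions`.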
David:
In QUIC, I had done chaos protection. Might be grease, or not. The first
flight in QUIC is not meaningfully encrypted. The way people implemented
the first packet was to have one CRYPTO frame, since that’s reasonable.
Then we found a bug with YouTube where the QUIC first packet had been
ossified to look for the TLS SNI at a specific fixed offset in the
packet. Fix was to split up the CRYPTO frame and add padding in between, etc.
No problems since.
(It’s not GREASE — it’s LARD!)
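The fix described above can be sketched roughly as follows, in Python. This is an illustration only, not the code of any real QUIC stack: frames are modeled as plain tuples, and `split_crypto` and `reassemble` are hypothetical names. The CRYPTO data is cut at random points, the chunks are shuffled, and PADDING is placed between them, so nothing sits at a fixed offset. A receiver that reassembles by offset is unaffected.

```python
import random


def split_crypto(data, n_chunks, rng=random):
    """Cut data into n_chunks CRYPTO frames, shuffled, with PADDING between.

    Assumes 1 <= n_chunks <= len(data). Frames are modeled as tuples:
    ("CRYPTO", offset, payload) or ("PADDING", length).
    """
    cuts = sorted(rng.sample(range(1, len(data)), n_chunks - 1))
    bounds = [0] + cuts + [len(data)]
    chunks = [
        ("CRYPTO", bounds[i], data[bounds[i]:bounds[i + 1]])
        for i in range(n_chunks)
    ]
    rng.shuffle(chunks)  # change the ordering of the chunks within the packet
    frames = []
    for chunk in chunks:
        frames.append(chunk)
        frames.append(("PADDING", rng.randrange(1, 8)))
    return frames


def reassemble(frames):
    """Receiver side: order CRYPTO payloads by offset and skip PADDING.

    Simplification: assumes the chunks cover the data with no gaps or overlaps.
    """
    by_offset = {f[1]: f[2] for f in frames if f[0] == "CRYPTO"}
    return b"".join(by_offset[offset] for offset in sorted(by_offset))
```

A middlebox that looks for the SNI at one fixed offset fails against this layout, while an endpoint that follows the offsets recovers the original bytes.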
Matt Joras:
We have similar problems. We don’t use the standard QUICv1 version to
make sure not everything is v1. We switched version number without doing
anything else different, and the metrics showed that things got way
better for performance…!! People had been applying policy based on SNI,
and the version change broke those policies. This skews measurements of
functional changes. Ossification can apply to perf and policy, not just
functionality working.
Dave Thaler:
Good to mention this change of ordering.
Mention the relationship to fuzzing; you can fuzz protocols, not just
APIs. People think about fuzzing to avoid crashing.
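As a rough illustration of fuzzing a protocol message rather than an API, here is a small Python sketch. The type-length-value format, `parse_tlv`, and `fuzz` are all made up for this example. The fuzzer mutates a valid message and lets any exception other than the expected `ValueError` escape, so a surprising failure mode shows up as a crash of the fuzzer itself.

```python
import random


def parse_tlv(buf):
    """Toy parser for records of (type: 1 byte, length: 1 byte, value)."""
    records, i = [], 0
    while i < len(buf):
        if i + 2 > len(buf):
            raise ValueError("truncated header")
        rec_type, length = buf[i], buf[i + 1]
        if i + 2 + length > len(buf):
            raise ValueError("truncated value")
        records.append((rec_type, buf[i + 2:i + 2 + length]))
        i += 2 + length
    return records


def fuzz(seed_msg, rounds=1000, rng=random):
    """Mutate a valid, non-empty message; return how many mutants were rejected."""
    rejected = 0
    for _ in range(rounds):
        msg = bytearray(seed_msg)
        for _ in range(rng.randrange(1, 4)):  # overwrite 1 to 3 random bytes
            msg[rng.randrange(len(msg))] = rng.randrange(256)
        if rng.random() < 0.3:  # sometimes truncate the message as well
            del msg[rng.randrange(len(msg)):]
        try:
            parse_tlv(bytes(msg))
        except ValueError:
            rejected += 1  # the only failure mode the parser is allowed
    return rejected
```

Beyond avoiding crashes, the same loop can check that the parser accepts mutated type bytes it does not recognize, which ties back to the greasing discussion.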
Jared Mauch:
Akamai clients do a lot of their own telemetry. Do we want to
standardize any of these telemetry methods for how things are working
when you extend the protocol? Generally the effect is a performance
degradation or improvement, so you need to look at perf metrics to
detect the differences.
As people are using more memory safe languages, they’re less used to
fuzzing, etc.
Tommy:
Jared: Shows the need for metrics. Look at retransmits and reordering,
etc. Data sharing during interop about what you actually tested.
Dave:
Hackathons are about interop testing. Can we have hackathon networks
that help people test in these environments? Induce reordering, packet
loss, etc.
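A minimal sketch of the kind of impairment such a network could induce, written in Python as an in-process stand-in rather than a real network tool. The function `impair` and its parameters are hypothetical. Packets are dropped with probability `loss`, and each adjacent pair that survives is swapped with probability `reorder`.

```python
import random


def impair(packets, loss=0.1, reorder=0.2, rng=random):
    """Return the packets after random drops and adjacent-pair swaps."""
    kept = [p for p in packets if rng.random() >= loss]
    i = 0
    while i + 1 < len(kept):
        if rng.random() < reorder:
            kept[i], kept[i + 1] = kept[i + 1], kept[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return kept
```

Passing a seeded `random.Random` instance as `rng` makes a failing run reproducible, which helps when sharing interop results.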
Charles Eckel:
At the hackathon we do let people set up special networks or
participate remotely.
Lucas:
Bugs are going to happen — stream handling is hard. Is there a curve
where we are very focused on correctness early on in the document and
protocol implementation development, and then you switch to focus on
perf and deployability, and then it’s “good enough”… and then you
interact with real applications that push the use cases and you hit
these kinds of issues. Progressive MP4 as an example that pushes the
limits of behavior.
Martin: Rare bugs happen all the time on the internet. Always the case.
Anything less than perfect will affect things. I don’t think we can do
anything at the hackathon or with metrics. I think they just needed a better
unit test. I found a similar bug that needed a crazy combination that I
only found by fuzzing. Agreed with the cycles of development that Lucas
described. The “good enough” point lets you get quite far and you can
miss things.
David:
Another anecdote. Google’s QUIC stack is pretty old now. We have a ton
of tests and users. We had a site that complained that stuff started
breaking a ton, once we switched to IETF QUIC. All our telemetry was
green. But if you hit a particular flow control block, twice in quick
succession, you’d lose the second unblock event and the connection would
get stuck. We then wrote a specific unit test for this.
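The shape of such a regression test might look like the Python sketch below. `ToySender` is a toy model invented for this example and is not based on Google's QUIC stack or the actual bug. It only shows the scenario: the sender blocks on flow control, is unblocked, blocks again at once, and the second unblock must still release the remaining data.

```python
class ToySender:
    """Toy flow-controlled sender; hypothetical, not any real QUIC stack."""

    def __init__(self, window):
        self.limit = window      # highest offset the peer currently allows
        self.sent = 0            # bytes sent so far
        self.pending = b""       # bytes waiting for flow control credit
        self.blocked_events = 0  # how many times we ended up blocked

    def write(self, data):
        self.pending += data
        self._flush()

    def on_max_data(self, new_limit):
        # Unblock event: always re-check pending data, even if the
        # previous unblock was processed only a moment ago.
        self.limit = max(self.limit, new_limit)
        self._flush()

    def _flush(self):
        n = min(len(self.pending), self.limit - self.sent)
        self.sent += n
        self.pending = self.pending[n:]
        if self.pending:
            self.blocked_events += 1


def test_two_blocks_in_quick_succession():
    sender = ToySender(window=10)
    sender.write(b"x" * 30)   # sends 10 bytes, then blocks
    sender.on_max_data(20)    # first unblock; blocks again immediately
    sender.on_max_data(30)    # second unblock must not be lost
    assert sender.sent == 30
    assert sender.pending == b""
    assert sender.blocked_events == 2
```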
For EDM, I think we have two values: one, write the documents for how to
make things better; two, what we do outside documents. A lot of these
bugs are found after publication. What if we had a list of fun stories
of ways this blew up and tests you should have.
Chris:
We’re all making good faith efforts to implement the protocols
correctly. But because of time pressure, at some point we get to “good
enough” and we ship something too early. If we had more time, would we
get it perfect?
Matt: We need to deploy to find out real world issues.
Chris: We can’t test every branch, and protocol complexity makes that
worse. It is harder to fully test every corner case.
Charles:
Make it easier to find code that has been implemented for drafts, and
published RFCs.
Could be reference implementation, tool, etc.
Speeds up the standards and deployment process.
Still useful without being open source.
We have “implementation status” in drafts. We have GitHub for
collaboration. We have the hackathon.
Dave:
Using the related-implementation tag in our WGs!
Give guidance in the doc to chairs on how to add references that aren’t
open source or public, etc.
Tommy: Should we publish this?
Dave: Somewhere yes! But seems like a BCP and maybe that’s not IAB
stream, and maybe AD sponsored.
As a WG chair, I would prefer to have a BCP to point to about the right
way to handle this.