# EDM at IETF 116 {#edm-at-ietf-116}

## Greasing {#greasing}

Tommy:

* Examples like MLS; it doesn't have greasing, but we could add it
* Greasing codepoints is only one dimension
* Ordering, changing other behaviors
* Some "greasing" is harmful for perf and likely won't be done
* Could we write down this advice and insight?
* "Guidelines for greasing effectively"
* Something to refer to other than TLS drafts
* Could also point people to use-it-or-lose-it? But not as direct on useful advice

David: MLS is almost out the door. We noticed on our last EDM call that it didn't have grease. It took a lot of back and forth to explain when and how you grease. Some people know how to do it and what it means, but the TLS grease document doesn't explain all the details. For example, you don't treat grease codepoints differently on the receiver — don't try to special-case greased values. You can hold it wrong (JA3).

Martin Thomson: I've reviewed a number of grease changes, and I see people writing code to handle grease on receipt. That's OK for tests, but very much not OK for production code.

David: In QUIC, I had done chaos protection. Might be grease, or not. The first flight in QUIC is not meaningfully encrypted. The way people implemented the first packet was to have one CRYPTO frame, since that's reasonable. Then we found a bug with YouTube where the QUIC first packet had been ossified on: something was looking for the TLS SNI at a specific fixed offset in the packet. The fix was to split up the CRYPTO frame and add padding in between, etc. No problems since. (It's not GREASE — it's LARD!)

Matt Joras: We have similar problems. We don't use the standard QUICv1 version number, to make sure not everything is v1. We switched the version number without doing anything else different, and the metrics showed that things got way better for performance…!! People had been applying policy based on SNI, and the version switch broke those policies. This means that ossification skews the measurement of functional changes.
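David's point that grease is a sender-side behavior — receivers must treat greased codepoints like any other unknown value — can be sketched roughly as below. This is a minimal illustration, assuming the two-byte TLS GREASE values reserved in RFC 8701 (0x0A0A, 0x1A1A, …, 0xFAFA); the function names and extension-list shape are hypothetical, not from any real stack.

```python
import random

# RFC 8701 reserves sixteen two-byte GREASE values: 0x0A0A, 0x1A1A, ..., 0xFAFA.
GREASE_VALUES = [0x0A0A + 0x1010 * n for n in range(16)]

def sender_extension_list(real_extensions):
    """Sender side: inject one randomly chosen GREASE codepoint at a random
    position, so peers cannot ossify on the exact set or order sent."""
    exts = list(real_extensions)
    exts.insert(random.randrange(len(exts) + 1), random.choice(GREASE_VALUES))
    return exts

def receiver_handle(extensions, known_handlers):
    """Receiver side: unknown codepoints (GREASE or future real ones) are
    skipped uniformly -- there is deliberately no "is this GREASE?" branch."""
    for ext in extensions:
        handler = known_handlers.get(ext)
        if handler is not None:
            handler()
        # else: ignore silently; do NOT special-case greased values
```

The point of the sketch is the asymmetry: all the grease logic lives on the sender, while the receiver's unknown-value path is the same one a genuinely new extension would take.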
Ossification can apply to perf and policy, not just to whether functionality works.

Dave Thaler: Good to mention this change of ordering. Mention the relationship to fuzzing; you can fuzz protocols, not just APIs. People think about fuzzing as a way to avoid crashing.

Jared Mauch: Akamai clients do a lot of their own telemetry. Do we want to standardize any of these telemetry methods for how things are working when you extend the protocol? Generally you see a performance degradation or improvement, etc., so you need to look at perf metrics to detect the differences. As people use more memory-safe languages, they're less used to fuzzing, etc.

## Testing deployments in different environments {#testing-deployments-in-different-environments}

Tommy:

* Mentioned testing in different network environments
* Loss/reordering seen in the FINAL\_SIZE bug
* Aggregated metrics can hide these issues if not specifically called out

Jared: Shows the need for metrics. Look at retransmits and reordering, etc. Share data during interop about what you actually tested.

Dave: Hackathons are about interop testing. Can we have hackathon networks that help people test in these environments? Induce reordering, packet loss, etc.

Charles Eckel: At the hackathon we do let people set up special networks, and we let people participate remotely.

Lucas: Bugs are going to happen — stream handling is hard. Is there a curve where we are very focused on correctness early in document and protocol implementation development, then switch to focus on perf and deployability, and then it's "good enough"… and then you interact with real applications that push the use cases and you hit these kinds of issues? Progressive MP4 is an example that pushes the limits of behavior.

Martin: Rare bugs happen all the time on the internet. Always the case. Anything less than perfect will affect things. I don't think we can do anything about this at the hackathon or with metrics. I think they just needed a better unit test.
I found a similar bug that needed a crazy combination of conditions that I only found by fuzzing. Agreed with the cycles of development that Lucas described. The "good enough" point lets you get quite far, and you can miss things.

David: Another anecdote. Google's QUIC stack is pretty old now. We have a ton of tests and users. We had a site that complained that stuff started breaking a ton once we switched to IETF QUIC. All our telemetry was green. But if you hit a particular flow control block twice in quick succession, you'd lose the second unblock event and the connection would get stuck. We then wrote a specific unit test for this. For EDM, I think we have two values: one, write the documents for how to make things better; two, what we do outside documents. A lot of these bugs are found after publication. What if we had a list of fun stories of ways this blew up and tests you should have?

Chris: We're all making good-faith efforts to implement the protocols correctly. But because of time pressure, at some point we get to "good enough" and we ship something too early. If we had more time, would we get it perfect?

Matt: We need to deploy to find out real-world issues.

Chris: We can't test every branch, and protocol complexity makes this more of a problem. It's harder to fully test every corner case.

## Finding code {#finding-code}

Charles: Make it easier to find code that has been implemented for drafts and published RFCs. Could be a reference implementation, a tool, etc. Speeds up the standards and deployment process. Still useful without being open source. We have "implementation status" in drafts. We have GitHub for collaboration. We have the hackathon.

Dave: Use the related-implementation tag in our WGs! Give guidance in the doc to chairs on how to add references that aren't open source or public, etc.

Tommy: Should we publish this?

Dave: Somewhere, yes! But it seems like a BCP, and maybe that's not IAB stream; maybe AD sponsored.
As a WG chair, I would prefer to have a BCP to point to about the right way to handle this.

## TODO items {#todo-items}

* Work on a greasing advice document — Lucas volunteered to help write text
* Adopt Charles's doc — but it seems like not an IAB stream document, so we'd want to find an AD sponsor
* Have an IAB tools team liaison help drive fixes to the datatracker to let you, as an author, add "related-implementation" tags on adopted documents, etc. File a ticket first, then start the conversation.