MAPRG at IETF 105
Note takers: Rich Salz / Brian Trammell

Overview & Status - Dave Plonka & Mirja Kühlewind

Understanding Evolution and Adoption of Top Level Domains and DNSSEC - Yo-Der Song (remote)
--------------------------------------------------------------------
https://datatracker.ietf.org/meeting/105/materials/slides-105-maprg-understanding-evolution-and-adoption-of-top-level-domains-and-dnssec-01

George Michaelson: A study like this needs to explore the concept of generative (hash) domain names for botnet C&C. New gTLDs are attractive targets for abuse due to the ease of getting names. Many new TLDs have no intention of delegation -- e.g. .ibm is not an open domain, and its growth is driven by its business mission. You typify ccTLDs as if there is ICANN-prescribed normative behavior. A ccTLD represents a government; it's only different in the quality that they have tanks. Your paper was really interesting and thank you for presenting it!
Tim Wattenberg: Where did you get the data, and what kind is it? Which kind of analysis did you do with that data?
Yo-Der Song: Three datasets from NZ. The presentation includes an overview graph and second-level domains created over time for .kiwi and .nz.
Tim: And you got numbers from them but no zone files?
Yo-Der Song: Yes, just numbers.
Tim: RIPE Atlas may be useful for future research.
Ulrich Wisser (The Swedish Internet Foundation): To George: we are not a government agency. For the record, we do not own tanks. ccTLDs are all over the map. More domains are signing with DNSSEC than publishing DS records. Do these newly signed domains support CDNSKEY?
Yo-Der Song: From the metadata, that information is not available.
Ben Schwartz: 10% of records are invalid; can you clarify what you mean by invalid? E.g. do they fail with SERVFAIL?
Yo-Der Song: I believe it is a hash failure with the DNSKEY record, but I am not entirely sure.
Hiromu Shiozawa (JPNIC): On the top-ten TLD graph: I have a similar analysis for our data. Which data do you use?
Yo-Der Song: The top-ten TLD graph was based on the local campus network at the University of Auckland.
Dave: You mention the problem of overlap, e.g. between .kiwi and .nz. I'm curious, how do you effectively determine "clones"/aliases within your dataset? Do you have to do active measurement, or can you get this from the registrations?
Yo-Der Song: Didn't look specifically at this, but it is something to look at in the future.

TLS 1.3 Client Measurements - Tommy Pauly
---------------------------
https://datatracker.ietf.org/meeting/105/materials/slides-105-maprg-tls-13-client-measurements-0-00

George Michaelson: Is this sample from Apple only from your clients, or from other measuring points?
Tommy: A population of beta users, or opt-in data collection on normal devices across the world, e.g. on mobile or wireless.
George Michaelson: Is there a skew in the measurement because of the data set (e.g., no Android measurements or older phones)? There might be a dependency on update cycles because of negotiation. It comes down to the question of whether you only measure newer devices. In a larger population, v6 would have a higher uptake because it's more present on newer releases. 80% of v6 in mobile.
Tommy: Yes, good points. Also, we are only getting data from devices that have been upgraded to enable 1.3. So these are no older devices and no devices we don't own. It is early adopters on the client side.
Ian Swett: Similar question about selection bias: I was wondering if you have attempted a holdback experiment where half of the users did not have 1.3 enabled but recorded when 1.3 was available, to see if the other half actually did better. Because that would remove server selection bias, and we found that for QUIC that was kind of important.
Tommy: Yes, early on, before we turned it on by default, we would do probing, essentially detecting when TLS 1.3 was available even if we didn't use it. But because there is the performance benefit, we didn't want to hold people back. But yes, that's a very good point.
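The holdback comparison Ian describes can be sketched in a few lines: restrict the analysis to connections where the server offered TLS 1.3, then compare the clients that used it against the held-back group. This is a minimal illustrative sketch; the record fields (`tls13_available`, `tls13_used`, `handshake_ms`) are invented names, not Apple's actual telemetry schema.

```python
from statistics import median

def holdback_comparison(records):
    """Compare handshake times between clients that used TLS 1.3 and a
    held-back group that only recorded availability. Restricting both
    groups to servers that offered 1.3 removes server selection bias."""
    enabled = [r["handshake_ms"] for r in records
               if r["tls13_available"] and r["tls13_used"]]
    held_back = [r["handshake_ms"] for r in records
                 if r["tls13_available"] and not r["tls13_used"]]
    return median(enabled), median(held_back)

# Toy records for illustration only.
records = [
    {"tls13_available": True,  "tls13_used": True,  "handshake_ms": 90},
    {"tls13_available": True,  "tls13_used": False, "handshake_ms": 130},
    {"tls13_available": True,  "tls13_used": True,  "handshake_ms": 110},
    {"tls13_available": True,  "tls13_used": False, "handshake_ms": 150},
    {"tls13_available": False, "tls13_used": False, "handshake_ms": 140},
]
print(holdback_comparison(records))  # (100.0, 140.0)
```

The last record (server without 1.3) is excluded from both groups, which is exactly the bias removal Ian is after.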
Jan Rüth: The PDF seems to show that TLS 1.3 servers are better connected and more up to date. So what is the real reason for the lower PDF? Are they just better connected, or is it the protocol?
Tommy: Yes, very good point. Lots of bias. Also, as we show, these are servers that are clearly adopting the "leading edge" (such as 1.3 and IPv6); they are probably doing other optimizations too. In isolated measurements we definitely do see the benefits. RTTs are a major factor. I don't think they dominate this, but you are correct.
Roland van Rijswijk: A remark about them being leaders in adopting this. I have a request: can you check whether these people have their domains signed and whether the signature was valid, by looking at the AD bit in the DNS responses that your clients get, to determine DNSSEC deployment?
Tommy: We don't have that, but that would be great. Noted.
Brian Trammell: There are a few things you can look at in the responses that might identify the server software, so you can correlate early adopters with the software they use. Because one of the things we saw, for example with ECN, is that nobody cares about actually turning it on; it's just that the defaults are better. It would be good for this community to know: to what extent is the work we are doing driving the defaults behind some of this deployment, and to what extent is it people driving these decisions?
Tommy: Good point; we will need to figure out how to do that in a privacy-preserving way.
Jana: Very similar point: have you tried fingerprinting SSL libraries at the server, to check whether there are libraries that have bad defaults turned on so that performance actually sucks, and that's why people have turned it off? That's another correlation I would be interested in, but I don't know how easy/feasible it is for you to fingerprint SSL libraries? Or whether you would even do it?
Tommy: There would be the concern about ossifying on the fingerprints of a given implementation; we've seen firewalls expecting certain patterns of our own TLS implementation from our clients, so it's a tricky area.
Jana: I did not see any breakages mentioned.
Tommy: Yes, we won't share breakage numbers, but it's very low and pretty much the same as 1.2; nothing specific to 1.3.
Jana: That's fantastic. Thanks!

Measuring QUIC Dynamics over a High Delay Path - Gorry Fairhurst
----------------------------------------------
https://datatracker.ietf.org/meeting/105/materials/slides-105-maprg-measuring-quic-dynamics-over-a-high-delay-path-01

Christian Huitema: You are mentioning QUIC, but you're testing one of 19 implementations. We'll get 19 different results.
Gorry: When they are done, we expect less diversity. Doubtful they will vary all that much with these simple tests.
Christian: Disagree. There are differences when using v4/v6, or in how they check window sizes, etc.
Gorry: Let's get together to talk about the results and see why we get them.
Jana: Super impressed how close QUIC is to PEP TCP and how far it is from plain TCP, which also makes me skeptical. You're using quicly, which is the right implementation because I'm partially responsible for the performance of this implementation, but I'm trying to understand what the TCP PEP is doing. Is it a full terminator?
Gorry: It's split TCP; you can see it in the traces.
Jana: Part of what you see is the difference between Reno and Cubic, because quicly currently implements Reno. For the file sizes you have, I'd expect Cubic to ramp up much faster because this is a high-BDP link, so I expected a difference there.
Gorry: There might also be some timer issues that could be fixed. This is not a complete piece of work; it's a start.
Jana: Super happy to continue working with this and to fix the quicly implementation for any bugs. Marten Seemann and I have a simulation environment for testing.
It would be good to validate this stuff with the simulator, to see if you can replicate the same effects there.
Ian: Thanks for not using Chromium QUIC, since there are many papers and benchmarks based on that, and that could overfit one implementation. And maybe there is a bug in there that turns out to be a performance enhancer; that's always my fear. Thanks for picking an implementation that is the closest to the most current version of the recovery and congestion control spec, so it validates that the spec largely works pretty well. There are some approaches that could improve the performance of QUIC here, and maybe of TCP as well. BBR is one of them. There might be other things that could mitigate this.
Aaron Falk: Do you know whether AQM is in use in those networks?
Gorry: They were not in use in those experiments.
Aaron: I'd be interested in seeing how that compares. I would also be interested in a comparison of fairness in a terrestrial network: how QUIC behaves compared to TCP.
Gorry: Would probably put this in a simulator, because there are also some wireless effects here.
Colin Perkins: Would like to echo Christian's point. Lots of people have read papers about a certain QUIC implementation and widely extrapolate those results. This group or the QUIC group might consider putting out a statement about how to benchmark QUIC, how to report results, and how to describe what you are evaluating. It's harming the reception of QUIC when people experiment with a very early version of QUIC, compare it with well-developed versions of TCP, and write very general conclusions.
Gorry: That's what I wanted to present here. We use QUIC and we will keep it up to date, and that's the only way to benchmark it now. Please don't use gQUIC benchmarks as your baseline. It's a different thing. We really need to track the latest version from the QUIC WG.
Colin: Need to make sure publications are more specific about what they test.
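Jana's Reno-vs-Cubic remark comes down to how long window growth takes on a high-BDP path. A back-of-the-envelope sketch, with assumed numbers (a GEO-like path of ~600 ms RTT at 10 Mbit/s with 1200-byte packets; none of these figures are from the talk): it counts only the initial slow-start doubling, which Reno and Cubic share, while their real difference appears after the first loss, where Cubic regrows the window much faster.

```python
def rtts_to_fill_pipe(rtt_s, link_bps, mss=1200, initial_cwnd=10):
    """Round trips of slow start (window doubling each RTT) needed before
    the congestion window covers the bandwidth-delay product of the path."""
    bdp_packets = link_bps / 8 * rtt_s / mss   # BDP expressed in packets
    cwnd, rounds = initial_cwnd, 0
    while cwnd < bdp_packets:
        cwnd *= 2
        rounds += 1
    return rounds

# Assumed GEO-like path: 600 ms RTT, 10 Mbit/s => BDP of 625 packets.
print(rtts_to_fill_pipe(0.6, 10e6))  # 6
```

Six doublings at 600 ms each is about 3.6 seconds before the pipe is even full, which is why, for small-to-medium transfers on such a path, round-trip count and ramp-up behaviour dominate over raw bandwidth.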
Trials and tribulations of migrating to IETF QUIC - Ian Swett
-------------------------------------------------
https://datatracker.ietf.org/meeting/105/materials/slides-105-maprg-trials-and-tribulations-of-migrating-to-ietf-quic-00

Jana: Thanks for bringing this. I was watching what happens when the gQUIC format changes and, as expected, it has already been ossified. This is exactly what we were worried about, and it seems like some of them are actually updating, which is promising. But you mention the breakage a couple of RTTs in, which suggests a certain behaviour in middleboxes: they allow a few packets to go through and then die. We've seen this before, and given we are seeing it again, I wonder if this is the sort of thing you want to protect against. Because if we assume that either all packets go through or all packets are always blocked, that is not compatible with how middleboxes are built: they might pass a packet on to a different node that actually does the detection of what type of packet it is, and by the time that comes back, you have already forwarded a few packets. Not sure exactly what's causing this to happen, but we've seen it. May be worthwhile to document some of these detections for others to use, because it would be super useful also for non-Chromium clients to build these detections. And second, I wonder if this is something we can specifically detect as a behavior and use for failover in Chrome?
Ian: We thought about specifically detecting it, but it turns out that so far we have only observed one vendor actually doing this. So we can email the vendor and ask them to fix it. We've done that. Otherwise we ossify the protocol on this weird thing, and then we have this workaround feature and are stuck with it forever, and everybody else would need to add it too. I'd be okay if Chrome had to add it, because we are early adopters and pushing it, but if every other client on earth had to add a bunch of extra heuristics, that would be kind of sad.
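The failover heuristic Jana asks about could look roughly like this: if a QUIC handshake completes but the connection stalls within the first few round trips, remember the path as black-holed and fall back to TCP. This is a hypothetical sketch, not Chrome's actual logic; the class, field, and threshold names are invented for illustration.

```python
class BlackholeDetector:
    """Sketch of client-side detection of a middlebox that lets the
    handshake through and then drops everything (hypothetical logic)."""

    def __init__(self, stall_rtt_threshold=3):
        self.stall_rtt_threshold = stall_rtt_threshold
        self.quic_blocked = False  # remembered per network path

    def on_connection_result(self, handshake_ok, rtts_before_stall):
        """Record one QUIC attempt; return the transport to try next.
        rtts_before_stall is None if the connection did not stall."""
        if (handshake_ok and rtts_before_stall is not None
                and rtts_before_stall <= self.stall_rtt_threshold):
            # Handshake passed but data died shortly after: the signature
            # of a middlebox that only lets the first packets through.
            self.quic_blocked = True
        return "tcp" if self.quic_blocked else "quic"

d = BlackholeDetector()
print(d.on_connection_result(True, None))  # healthy connection: "quic"
print(d.on_connection_result(True, 2))     # stalled after 2 RTTs: "tcp"
```

Note Ian's point in the discussion that follows: since only one vendor was observed doing this, shipping such a heuristic in every client risks ossifying the workaround itself.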
Mirja: Insights about why this behavior, or the vendor's intention?
Ian: Enterprise policy to drop QUIC. The way they drop unknown protocols is to allow a few packets in each direction; if after 2-3 packets they haven't figured out what the protocol is, they start dropping. QUIC got classified as an unknown protocol and fell into this weird bin. The other thing I would like to comment on: this isn't good for any transport, whether SCTP over DTLS or anything else. If you let the handshake through and then black-hole it, it's going to be a really bad time for the user experience. So to support UDP innovation on the Internet, I would like people to try to avoid doing this, not just for QUIC. That is not just for us; that is for anything.
Jana: What I've seen on the console of a middlebox is that it says "packet received; cannot classify" and then drops it. It's possible that it is trying to do classification on a different thread, and then the thread times out or gives up, and then it decides to black-hole. Agree this is a huge issue. Unfortunately, the people who can change this aren't necessarily in this room.
Tommy Pauly: Is the blackholing only on a per-connection basis, or for the whole device, i.e. per-IP?
Ian: Typically it is the whole IP. Basically every QUIC connection on that host starts getting black-holed, because it's a middlebox that all QUIC connections go through. It's sort of host-based.
Tommy: Because there will also be background data; of course something might always drop off after a few packets. Would it be possible to look at all parallel connections, and if another connection is further along, conclude it was just a network impairment? Would that not work in this case?
Ian: We don't really trace that. Can't say if that is true.
Brian Trammell: As co-editor of the manageability draft, we will file you a couple of issues to take this presentation into the draft. You said that dropping all packets for the connection is better than dropping some, from a user-visibility standpoint.
That's kind of obvious, and we should actually write it down and yell it from the treetops of the IETF. You said that for one of these you get the first 3 packets and then the drop. Do you see other cases with intermediate drops after you have spun up some actual data? Can you characterize the drop patterns you see?
Ian: We see some other post-handshake dropouts. Anecdotally it seems to be NAT-related: instead of a usage-based timeout, it's a fixed timeout, e.g. "after 60s I time out this UDP binding" instead of "after 60s of idle", and then you have to ping to reopen it.
Brian: We see this also on non-QUIC traffic.
Ian: Not super common, but it does happen.

Packet Loss Signaling for Encrypted Protocols - Alexandre Ferrieux
---------------------------------------------
https://datatracker.ietf.org/meeting/105/materials/slides-105-maprg-packet-loss-signaling-for-encrypted-protocols-01

Ian: Interesting data. For the spin bit there is a fair amount of analysis of what happens if one of the two endpoints tries to "subvert" the signal. It seems like it would be very easy to convince a middlebox to give it any signal you wanted. Do you have thoughts on how to reject implementations that are either buggy or not complying with the scheme?
Alexandre: It's unilateral; only the server sets the bits, and that's what we are most interested in.
Ian: Right now, using these bits would make header protection fail and the packet be dropped at the receiver.
Igor Lubashev: No worse than TCP's plaintext.

The RPKI Wayback Machine - Roland van Rijswijk
------------------------
https://datatracker.ietf.org/meeting/105/materials/slides-105-maprg-the-rpki-wayback-machine-01

Robert Kisteleki: Not sure if it's worthwhile, but an option to disable the manifest check might recover the earlier data.
Roland: Yes, and we'll make the validated-only data available.
Brian: Former RPKI skeptic. Thanks. The volume, coverage, and visualization of the data is useful for seeing trends. There is a risk to turning data validation on.
The risk for your dataset is 1/accuracy, right?
Roland: Yes.
Brian: And is it prefixes or announcements?
Roland: Announcements have prefixes in them.

Recycling Large-Scale Internet Measurements to Study the Internet's Control Plane - Jan Rüth
-------------------------------------------------------------------------------
https://datatracker.ietf.org/meeting/105/materials/slides-105-maprg-recycling-large-scale-internet-measurements-to-study-the-internets-control-plane-00

Hackathon Report - Dave Plonka
----------------
-> did not present, see the slides
https://datatracker.ietf.org/meeting/105/materials/slides-105-maprg-maprg-hackathon-report-01