Measurement and Analysis for Protocols Research Group (maprg) - IETF-99 (Prague)

Intro & Overview, Project "Advertisements" - Mirja Kuhlewind & Dave Plonka
---
Agenda is a combination of:
- Reporting on nascent measurement results
- Calls to arms for new projects spinning up
Chairs are keen to hear feedback on how the agenda has been structured.

Lars Eggert: We held the ANRW workshop on Saturday for the second time; next summer it will be in Montreal. There will be a CFP etc. This year we didn't realise the conflict with the IMC deadline - we will try to avoid that next year. Please keep it in mind if you have something relevant to publish. If you are a North American academic and would like to be considered for TPC chair, let us know.
Dave Plonka: Where should comments and feedback on ANRW be sent?
Lars: The irtf-discuss list.

IPv6 Reluctant Devices and Applications - Mikael Abrahamsson
---
Abstract: There are numerous reports of consumer products and devices that either (a) don't support IPv6 yet or (b) ostensibly support IPv6 in that they configure an IPv6 address, but subsequently seem not to use it, perhaps due to their Happy Eyeballs implementation. We request that a measurement effort be undertaken to identify and report on this behavior, so that we can mitigate this impediment to IPv6[-only] operation.

Dave: This is something we're interested in - can we proactively do something with measurement requests from the IETF?
Fred Baker: A root operator looking at IPv6 deployment suggested that we go for an interop test where we can prove that devices work IPv6-only, so people would get the message in a positive manner. Is this useful?
Mikael: Would this be for apps? Devices?
Fred: Any kind of device. Cisco is working with customers on IPv6-only deployment. Several networks too. So you'd be extending the reach of this to other places.
Mikael: Apple did NAT64 testing - IPv6-only WiFi - that's one way of testing. At least then you know that the device and application can speak IPv6 - spreading that approach into more places would be good.
Tim Winters (UNH-IOL): I'm involved in the program that monitors device IPv6 stacks. No smart TVs though - coverage is not complete. We can configure a device to turn on IPv6, but it's not necessarily on by default - we could make that a requirement in future. The testing we do shows more application testing in IPv6-only environments in the last year - starting to see movement in that direction from vendors.
Mikael: Would love to see IPv6-default-on be a requirement.
Tim Winters (UNH-IOL): People don't always want it on all the time - I'll take an action to press on this.
Mikael: When people say 'the market doesn't require IPv6' I say the market requires Internet access - today that means both stacks, with IPv6 on by default.
Fred: I have a presentation later today on the challenges of enabling IPv6.
Dave: Interested in ideas from the group about how to conduct this measurement study. More discussion on the list, and if we have something to share by the next meeting that would be good.

Fingerprint-based detection of DNS hijacks using RIPE Atlas - Pawel Foremski
---
Abstract: The DNS protocol does not protect from hijacking of resolver traffic. In fact, some network operators and government agencies transparently redirect DNS queries made from end-user devices to popular recursive resolvers to their own servers. This effectively allows the hijacking party to easily monitor, block, and manipulate the content published on the World Wide Web, and in general to control its Internet users. In this paper, we present a novel fingerprinting algorithm that is able to detect hijacking of recursive DNS resolver traffic. By sending specific DNS queries to the resolver and analyzing the replies using machine learning algorithms, we are able to classify the resolver as legitimate or rogue. We evaluate our technique through the RIPE Atlas platform, collecting DNS measurements on Google Public DNS and OpenDNS servers from 9,500 probes distributed around the world.
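The paper's classifier is built with machine learning over RIPE Atlas measurements; as a rough illustration of the fingerprinting idea only, here is a toy sketch in which a measured resolver fingerprint is compared against a reference fingerprint for the genuine service. All feature names, reference values, and the distance threshold below are hypothetical, not taken from the paper.

```python
# Toy illustration of fingerprint-based resolver classification.
# NOT the paper's algorithm: the ML model is replaced by a simple
# distance-to-reference threshold, and all values are invented.
from math import sqrt

# Hypothetical fingerprint vector: (median RTT in ms, TTL of a known
# record, fraction of replies carrying the expected EDNS options).
REFERENCE = {"google-dns": (30.0, 300.0, 1.0)}

def distance(a, b):
    """Euclidean distance between two fingerprint vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(resolver_fp, service="google-dns", threshold=50.0):
    """Label a measured fingerprint as legitimate or rogue for a service."""
    return "legitimate" if distance(resolver_fp, REFERENCE[service]) < threshold else "rogue"
```

A hijacking middlebox typically answers with an anomalously short RTT and different TTLs, which is what pushes its fingerprint far from the reference: `classify((3.0, 60.0, 0.0))` comes out as "rogue" in this toy model.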
Aaron Falk: I encourage you to publish this so I can share a link to your work. Surprised to see RTT on there given the use of anycast for DNS - was anycast a factor in your measurements? Did you look at the contents of the response - just interception, or data manipulation too?
Pawel: Regarding anycast, as far as I know both Google and OpenDNS are anycast operators, so we couldn't compare. Regarding replies, we haven't had time to analyse them yet - that is future work.
Colin Perkins: Did you do anything to infer the reason for the hijacking?
Pawel: Yes, interesting - that is future work. Hard to think of a source of information to answer this question - motivation is often hard to infer.
Jana Iyengar: This is great work, please publish. Have you considered calling some of these ISPs and just talking to them?
Pawel: Yes, we are considering that - but it would be future work.
Jana: We've done some of that and found that in some cases they are not clueful about the implications of what they are doing and are willing to learn.
Giovane Moura: Regarding the root DDoS attacks in 2015 - you can download the data, and I noticed that hijacked probes had very short RTTs. If you just use public measurements with standardised queries - the difference between your method and that would be interesting. Does your method show more hijacks?
Pawel: That's interesting. We would need to compare the measurements. The data is open. It would be easy to run more measurements.
??: How did you split your data set into training and testing? Did you perform any cross-validation?
Pawel: 30 times randomly, split equally. No cross-validation yet, but it's on our to-do list.
Aaron Falk channeling jabber (Simon Ferlin): Would be interesting to see the user experience for hijacked probes?
Pawel: Definitely, but how would you collect this data? Maybe crowd-sourcing?

Rate-limiting of IPv6 traceroutes is widespread: measurements and mitigations - Pablo Alvarez
---
Abstract: With IPv6, high-frequency traceroutes show many more missing hops than IPv4.
This appears to be due to the fact that RFC 4443 states IPv6 routers MUST rate-limit the ICMPv6 error packets that traceroute and similar programs depend on to determine the hops on a route. We measured the characteristics of this rate-limiting from many vantage points to thousands of targets across continents. We find that about 2/3 of all routers exhibit rate-limiting at frequencies < 100 Hz. The distribution of the data suggests most of these rate limits might be factory defaults. We discuss strategies to overcome this limitation, including altering the order of packets across single or multiple traceroutes, and merging information from traceroutes to different targets.

Mirja: Are there any defaults in the RFC?
Pablo: No.
Jen Linkova: The problem here is that there is a default value that you could change, but hardware cannot be changed. We don't know how many ICMP messages a router is sending to other hosts. It could also be affected by neighbour discovery - I'm not surprised to see IPv6 is worse here. Changing defaults may not be possible due to hardware.
Mikael Abrahamsson: Did you try changing source addresses?
Pablo: We did a quick and dirty test - it looks like the limits are per router, not per source.
Mikael: I don't think you can make that leap - it could be per line card, and there might be several of these. Did you differentiate between getting 0, 1, 2 or 3 responses from a router?
Pablo: We only tested each hop once.
Mikael: Would be interesting to see what happened if you did send more.
Andrew McGregor: Having worked on a few IPv6 router implementations, it's likely there are higher limits out there. You're likely to find that despite the CPU being fast, there isn't enough PPS available between the control and forwarding planes to get the limits higher - you find limits in exotic places inside a router chassis. Yes, it's usually the linecard CPU doing this.
Pablo: There are additional rate limits beyond the token bucket?
Andrew: Possibly, or the limit is not a token bucket but some PPS limit inside the router implementation - e.g. for all control-plane packets.
Pablo: The data I got are consistent in most cases with a token bucket.
John Brzozowski: It would be interesting to run this internally within a large provider network. Addressing methodology could prevent getting a proper response - maybe links aren't using globally routable addresses.
Pablo: IPv4 traces often include RFC 1918 addresses. We didn't filter out non-globally-routable addresses. I can look at the data more for this.
Mirja: Did you just say you want to conduct some measurements and make the results public?
John: I could do that.
Andrew ??: The limit may be a hardware limit, e.g. if slides are used. You will hit limits; ICMP limits are usually software-imposed. Talking to vendors could help find a way to conduct this testing without difficulties like this. It's really only old hardware that you would have a concern with.
Olivier Bonaventure: Is the data that you collected publicly available?
Pablo: Not sure - talk to me offline.
Olivier: Public data projects (CAIDA etc.) help to incentivise future researchers to make data public, and to use existing traceroute tools instead of writing their own tools to collect private data.

kIP: a Measured Approach to IPv6 Address Anonymization - David Plonka
---
Abstract: Related pre-print, "kIP: a Measured Approach to IPv6 Address Anonymization" (Plonka & Berger). Privacy-minded Internet service operators anonymize IPv6 addresses by truncating them to a fixed length, perhaps due to long-standing use of this technique with IPv4 and a belief that it's "good enough." We claim that simple anonymization by truncation is suspect since it does not entail privacy guarantees nor does it take into account some common address assignment practices observed today.
To investigate, with standard activity logs as input, we develop a counting method to determine a lower bound on the number of active IPv6 addresses that are simultaneously assigned, such as those of clients that access World-Wide Web services. In many instances, we find that these empirical measurements offer no evidence that truncating IPv6 addresses to a fixed number of bits, e.g., 48 in common practice, protects individuals' privacy. To remedy this problem, we propose kIP anonymization, an aggregation method that ensures a certain level of address privacy. Our method adaptively determines variable truncation lengths using parameter k, the desired number of active (rather than merely potential) addresses, e.g., 32 or 256, that can not be distinguished from each other once anonymized. We describe our implementation and present first results of its application to millions of real IPv6 client addresses active over a week's time, demonstrating both feasibility at large scale and the ability to automatically adapt to each network's address assignment practice and synthesize a set of anonymous aggregates (prefixes), each of which is guaranteed to cover (contain) at least k of the active addresses. Each address is anonymized by truncating it to the length of its longest matching prefix in that set.

Tim Chown: I don't think you can tell identifiers apart based on randomisation. Sometimes you get a new IID, sometimes you don't, depending on the address class.
Dave: The 3rd column is the stability of the address. We can use a sliding time window and test whether we've ever seen that value before. I'm saying this works for the temporary/SLAAC/privacy address mechanisms - we can reject persistent pseudorandoms.
Alex ???: We can apply this to other areas, e.g. rate-limiting for services. It would be great if in some way we could take this work into the IETF to make a protocol where I could identify suitable prefix lengths for anonymization. European data legislation is also relevant now.
Dave: Agree.
Are results portable - what if the place you're porting them from has better visibility than where you are?
Alex: Can we publish results and aggregate them? To know where to cut the prefix.
Dave: Integrate observations in some private way. There are methods that allow you to share addresses in an opaque way.
Mirja: A guidance document that we should take on as a group?
Dave: Don't know where in the IETF community to talk about such a thing. Could be a candidate for a draft in this group initially.
Pablo Alvarez: In terms of re-use, do you have some idea of the temporal stability? Do ISP properties change over time? How often would you need to renew and share?
Dave: Don't know, but you're right to be concerned. It works well for offline analysis. We need to do more work here.
Mikael Abrahamsson: Anti-abuse handling in the IETF would be interested if you can identify the prefix lengths allocated by ISPs - where is the household level of aggregation? RIRs have a way of letting ISPs publish this, but nobody does. Measurement would help.
Dave: We can discover it, or they can tell us - I would love to compare the two.

Measuring Latency Variation in the Internet - Toke Hoiland-Jorgensen
---
Abstract: Related paper, "Measuring Latency Variation in the Internet" (Toke Hoiland-Jorgensen et al., CoNEXT '16). We analyse two complementary datasets to quantify the latency variation experienced by internet end-users: (i) a large-scale active measurement dataset (from the Measurement Lab Network Diagnostic Tool) which sheds light on long-term trends and regional differences; and (ii) passive measurement data from an access aggregation link which is used to analyse the edge links closest to the user. The analysis shows that variation in latency is both common and of significant magnitude, with two thirds of samples exceeding 100 ms of variation. The variation is seen within single connections as well as between connections to the same client.
The distribution of experienced latency variation is heavy-tailed, with the most affected clients seeing an order of magnitude larger variation than the least affected. In addition, there are large differences between regions, both within and between continents. Despite consistent improvements in throughput, most regions show no reduction in latency variation over time, and in one region it even increases. We examine load-induced queueing latency as a possible cause for the variation in latency and find that both datasets readily exhibit symptoms of queueing latency correlated with network load. Additionally, when this queueing latency does occur, it is of significant magnitude, more than 200 ms in the median. This indicates that load-induced queueing contributes significantly to the overall latency variation.

Mikael Abrahamsson: Is the same TCP congestion avoidance algorithm used throughout the measurement period?
Toke: Good question. Don't know.
Andrew McGregor: The M-Lab server-side TCP hasn't changed over this period. The client side may have.
Mikael: TCP window scaling might also have changed over this time span.
Mirja: Did you try to detect the presence of AQM?
Toke: No, not directly.
Mirja: The data is from 2015? The most interesting data would be 2015-2017.
Toke: I think it would be straightforward to re-run this on a new dataset.
Mirja: Please share the dataset and code with the list.
Vaibhav Bajpai: Caveat - these are measurements towards M-Lab - they may not be 'normal'.
Toke: It's not speedtest.net.
Vaibhav: That's not normal traffic either - traffic to Facebook, Google, etc. is 'normal'.
Qiabong Xie (Netflix): How big was the total data set?
Toke: 200M - we erred on the side of minimising false positives, hence ended up with 5M flows.
Qiabong Xie (Netflix): If you see 80% of flows that see congestion, it might not be sender-induced.
Toke: Some flows do not get a single congestion event at all over 10 seconds, but latency increases quite a lot.
Qiabong Xie (Netflix): The percentage means that the rest of the flows don't have this pattern - likely cross-traffic induced, not sender induced.
Toke: Could be - I wouldn't say that none of the other flows had sender-induced congestion events, but certainly for a lot of these flows that is true.
Garret Tyson: How did M-Lab evolve during your measurement period? For example, Africa has more measurement servers now than previously.
Toke: That's true.

Measuring YouTube Content Delivery over IPv6 - Vaibhav Bajpai
---
Abstract: To appear in the SIGCOMM Computer Communication Review (CCR), July issue. We measure the performance of YouTube over IPv6 using ~100 SamKnows probes connected to dual-stacked networks representing 66 different origin ASes. Using a 29-month-long (Aug 2014 - Jan 2017) dataset, we show that success rates of streaming a stall-free version of a video over IPv6 have improved over time. We show that a Happy Eyeballs (HE) race during initial TCP connection establishment leads to a strong (more than 97%) preference for IPv6. However, even though clients prefer streaming videos over IPv6, we observe worse performance over IPv6 than over IPv4. We witness consistently higher TCP connection establishment times and startup delays (~100 ms or more) over IPv6. We also observe consistently lower achieved throughput both for audio and video over IPv6. We observe less than 1% stall rates over both address families. Due to the low stall rates, bitrates that can be reliably streamed over both address families are comparable. However, in situations where a stall does occur, 80% of the samples experience stall durations that are at least 1 s longer over IPv6, and this has not reduced over time. The worse performance over IPv6 is due to the disparity in the availability of Google Global Caches (GGC) over IPv6. The measurements performed in this work using the youtube test and the entire dataset are made available to the measurement community.
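The strong (>97%) IPv6 preference despite higher connect times follows from how a Happy Eyeballs race is structured: IPv6 gets a head start, so IPv4 only wins when IPv6 is slower than the head start plus the IPv4 connect time. A minimal toy model of this (the 250 ms head start is an illustrative implementation choice, not a value from the paper):

```python
# Toy Happy Eyeballs race: IPv6 is tried first; IPv4 is only started
# after a head-start delay, and whichever attempt completes first wins.
def happy_eyeballs_winner(v6_connect_ms, v4_connect_ms, head_start_ms=250):
    """Return which address family a HE client would end up using."""
    if v6_connect_ms <= head_start_ms:
        return "IPv6"  # IPv6 finished before IPv4 was even attempted
    # IPv4 attempt starts after the head start; first to finish wins.
    v4_finish_ms = head_start_ms + v4_connect_ms
    return "IPv6" if v6_connect_ms <= v4_finish_ms else "IPv4"
```

This is why IPv6 can win the race even when its connection establishment is ~100 ms (or more) slower, as the abstract reports: in this model, `happy_eyeballs_winner(260, 30)` still returns "IPv6".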
Mo Boucadair: There is no recommendation about 300 ms in the RFC. It's up to the implementors to choose timers. Formally there's no update to the RFC required.
Vaibhav: Correct.
Dave Wilson: Interesting that stall rates and bit rates are the same - would the user really see a difference? Not sure that latency is that important for video.
Vaibhav: Startup delay is also worth looking at.
Dave Wilson: A lot of people think measuring latency is a guide to customer experience - but your results suggest otherwise.
Mikael Abrahamsson: Would be interesting to get the diagrams for the last 12 months of data to see what the current state is vs. what was happening in 2013. Don't want historical problems to impact conclusions about what to do now.
Jana Iyengar: Regarding the dip on slide 8 - is the client running TCP?
Vaibhav: Yes, that is TCP.
Jana: One of the graphs in my presentation coming up next is going to coincide exactly with that dip, and I don't know why.
Ian Swett: Did you compare IPv4-only vs. Happy Eyeballs?
Vaibhav: We run a test over IPv6 and then over IPv4, almost at the same time.
Ian: GGC nodes that don't speak IPv6 are just excluded from the sample?
Vaibhav: Yes, we have to.
Ian: The results are surprising to me, but I will make some enquiries.
Mikael Abrahamsson: If the closest GGC node is IPv4-only, then the IPv6 tests will be over a different path.
Vaibhav: I would be interested in the expansion of dual-stacked GGC nodes.
Giovane Moura: Are your earlier measurements comparable with this?
Vaibhav: The 2015 work was latency towards websites in general; this work is specific to YouTube.
Jen Linkova: Would be interested to see the difference between when the nearest GGC node is dual-stack vs. when the nearest node is IPv4-only.
Vaibhav: Yes, we are working on this now.
Jen: The onus is on operators to provide IPv6 connectivity - it's not up to Google to provision IPv6 on GGC nodes - they'll be dual-stack when on dual-stack networks.
Vaibhav: Understood, my mistake.
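The "consistently higher TCP connection establishment times over IPv6" finding rests on paired per-probe comparisons (the test runs over IPv6 and IPv4 almost at the same time). A minimal sketch of that kind of analysis - the sample pairs below are invented for illustration, not drawn from the dataset:

```python
# Sketch of a paired v4/v6 connect-time comparison. Each sample is a
# (v4_ms, v6_ms) pair measured back-to-back from the same probe.
from statistics import median

def median_v6_penalty(samples):
    """Median extra connect time (ms) of IPv6 over IPv4 across paired
    samples; a negative result would mean IPv6 was faster."""
    return median(v6 - v4 for v4, v6 in samples)

# Made-up example pairs, roughly in line with the ~100 ms gap the
# abstract reports.
pairs = [(20, 115), (35, 140), (18, 125), (40, 150), (25, 130)]
```

On this made-up data, `median_v6_penalty(pairs)` is 105 ms - the same order as the ~100 ms gap reported in the abstract. Pairing the samples per probe controls for differences between probes, so the statistic isolates the address-family effect.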
The QUIC Transport Protocol: Design and Internet-Scale Deployment - Jana Iyengar
---
Abstract: QUIC is an encrypted, multiplexed, and low-latency transport protocol designed from the ground up to improve transport performance for HTTPS traffic and to enable rapid deployment and continued evolution of transport mechanisms. QUIC has been globally deployed at Google on thousands of servers and is used to serve traffic to a range of clients including a widely-used web browser (Chrome) and a popular mobile video streaming app (YouTube). We estimate that 7% of Internet traffic is now QUIC. This talk will cover the Internet-scale process that we used to perform iterative experiments on QUIC, performance improvements seen by our various services, and our experience deploying QUIC globally.

Mirja: Which congestion control does this data use?
Jana: Cubic with two-connection emulation, because TCP uses two connections.
Andrew Doganow: QUIC is now enabled by default in Chrome. I live in Singapore, where you're not helping; you're making it slower, because a slow connection there is e.g. 1 Gig or 200 MB/s. What we're seeing is that some of the infrastructure is blocking QUIC for various 'security' reasons. Are you looking at making it more geo-aware?
Jana: The first thing is that it shouldn't hurt. If operators are blocking it, that's fine - Chrome will use TCP.
Andrew D.: The fallback delay is what I'm talking about - it still works, but the additional delay is noticeable.
Jana: Expect that QUIC will be more feature-rich in future, e.g. with encryption integrated. I would like to better understand what the delays are that you are seeing. We are working on issues where middleboxes drop QUIC traffic after the handshake.
Colin Perkins: Are you able to break down the performance results based on the TCP version running at the receiver? Do you win out against all versions of TCP, or do you not have the data to tell?
Jana: Don't have that data. There's some other micro-benchmarking data.
The Chrome user base is mostly Windows, so you are asking which version of Windows. Chrome distribution data would help - we might be able to get that.
Mikael Abrahamsson: Are you doing packet-level path MTU discovery?
Jana: We did a bunch of experiments and fixed the packet size at 1350 bytes for all of our clients.
Martin Gunnarsson: Were mobile clients connected over WiFi or cellular?
Jana: Don't have that data right now.
Jana: Please read the paper. There's a fun story about middlebox ossification; believe it or not, QUIC is already ossifying.

Mirja: Thanks to all our speakers. If you have data, send it to the mailing list. If you would like to present at our next meeting, the earlier you can let the chairs know the better.