Minutes IETF105: pearg
minutes-105-pearg-00
| Meeting Minutes | Privacy Enhancements and Assessments Research Group (pearg) RG |
|---|---|
| Date and time | 2019-07-24 17:30 |
| Title | Minutes IETF105: pearg |
| State | Active |
| Other versions | plain text |
| Last updated | 2019-07-28 |
1. Privacy Standards and Anti-Standards (Pete Snyder presenting)

Pete Snyder, chair of W3C PING, speaking as a privacy-focused implementor (Brave).

With respect to the W3C DOM APIs and similar, Brave violates standards in order to improve user privacy. Example: AudioContext. To reduce fingerprinting, Brave "nulls out" a lot of things that the browser can provide, like hardware capabilities (only in third-party contexts). Many other cases get similar treatment.

Common problems:

1. Standards are extremely specific about what has to be implemented, but not at all specific about how to mitigate the resulting privacy concerns. W3C standards often have a privacy section, but it is not normative and does not provide mitigations, just a list of concerns. Websites assume the standard behavior, so mitigations break things. Example: Referrer. There are privacy problems, but the Privacy Considerations in the spec just say "user agents can do whatever they want". Websites rely on the standard behavior, so a user agent that actually deviates breaks them.

EKR: What would you like this text to say?
Pete: Don't expect this in third-party contexts; Referrer should only be sent on user gesture; etc.
EKR: I think you have an optimistic view of whether web authors read specifications. They just use whatever Chrome does.
Pete: They use derived outputs from the specification.
EKR: (disagrees)
Pete: Chrome is following the specification, so we need to work on the specification.
EKR: We can take this offline.

2. Uncommon use cases. Example: Canvas lets you read things out of the canvas, not just write to it. This is very uncommon for ordinary canvas users, but widely used for fingerprinting canvas behaviors.

EKR: How would you restrict this?
Pete: User gesture, user permission.
EKR: Permission to read back from canvas?
Pete: Couple this to other permissions, like writing to disk.
EKR: I think we'll be constantly bothering the user with permission prompts.
Pete: Could also be on user gesture.
EKR: Users can be tricked into making a gesture.
Pete: Yes, but at least it would take this out of the common path.

3. "No worse than the status quo." Example: communication with third-party servers is "no worse than an image tag". Example: Client Hints. These values uniquely identify a nontrivial number of users. "What's the harm? Cookies allow this anyway." When you're in a hole, stop digging!

4. Other issues. "We know this introduces a privacy harm, but we'll make a new standard later to fix the problem." This formalizes bad practices, and then bad actors just use both the new and the old version. Consider the harm to the user, not just the harm to site owners or the average user. Adding anything new to the platform adds a privacy harm! Just because you can't see the harm doesn't mean it isn't there.

Chris Wood: You mention cases where standards could be improved to help user privacy. What tangible steps could the IRTF or the IETF take to move in that direction?
Pete: At the W3C we have "horizontal review groups", like PING and Accessibility, that at least have the option to give input on other standards. I think that kind of formal process would help concerned actors get involved.
Stephane Bortzmeyer: Most of your examples were from the W3C. Besides Client Hints, do you have examples of IETF standards that have this problem? In theory the Security Considerations should include any privacy problems.
Pete: I don't have any specific examples, but W3C standards also have these sections. I think this is just step 1 of a 10-step road.
Ted Lemon: One of the patterns I've seen is making a list of things that a specific site is allowed to do, i.e. a set of entitlements. The problem is that slowing down progress is very difficult, because somebody always wants the new feature, and there's no way to restrict access to the new feature when the user doesn't want it. Do we need a framework for restricting this access?
Pete: Could be user gestures, permission prompts, a hardcoded list of safe sites, ...
Ted: It sounds like you agree with me that we need a way to make that happen.
Gary (?), Qualcomm: When I was chair of the Geolocation group at W3C, we discussed whether a web service could declare its intentions to the user, but we couldn't see a way to do it without it being abused by rogue parties. But now, with Certificate Transparency and similar tools, I wonder if we could apply some kind of site-specific restrictions.
Pete: Geolocation is an example of something that actually doesn't cause a lot of harm, because users understand what it means and they say no (or yes). I'm concerned about the 100 other things.

---------------------

Traffic Analysis of Encrypted DNS (Sandra Siby, PhD student at EPFL)

Summary of DNS-over-TLS (DoT) and DNS-over-HTTPS (DoH).

Scenario: the adversary observes stub-to-recursive DoT or DoH traffic. A webpage may load static content and ads from a variety of domains, producing a set of queries whose sizes and timing are observable by the adversary. The set of queries and responses related to loading a webpage could act as a fingerprint for that webpage.

Monitoring: from a closed world of 1500 webpages, a trained classifier achieved ~90% precision and recall. From an open world of 5000 webpages (1% "monitored"), the classifier got ~70% precision and recall.

Censorship: the adversary has to look at partial DoH traces, because they must reach a decision and act before the page finishes loading. Generally the 4th TLS record corresponds to the first DoH query, which leaks that query's length. Collateral damage would be high if this were used for blocking. If the censor waits until the 15th record they can generally identify the page with higher confidence, but by then the page has mostly loaded.

Robustness: how often does the adversary have to retrain? What is the effect of client location, of Google vs. Cloudflare DoH, or of the Firefox vs. Cloudflare stub resolver?
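[For illustration, the fingerprinting setup described above amounts to matching an observed sequence of TLS record sizes against known per-page fingerprints. Everything in the sketch below, the page names, the sizes, and the distance measure, is hypothetical and not taken from the paper, which uses trained classifiers over richer features:]

```python
# Toy illustration of website fingerprinting over encrypted DNS:
# each page load is reduced to its sequence of TLS record sizes
# (sign encodes direction: positive = client->resolver query,
# negative = resolver->client response), and an unknown trace is
# matched to the closest known fingerprint.

def distance(a, b):
    # Element-wise size difference; shorter traces are zero-padded.
    n = max(len(a), len(b))
    a = tuple(a) + (0,) * (n - len(a))
    b = tuple(b) + (0,) * (n - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical fingerprints collected in advance by the adversary.
fingerprints = {
    "example-news-site": (87, -310, 92, -415, 88, -120),
    "example-shop-site": (80, -150, 95, -980, 91, -640, 84, -200),
}

def classify(trace):
    # Nearest-fingerprint classification of an observed trace.
    return min(fingerprints, key=lambda page: distance(fingerprints[page], trace))

observed = (86, -312, 92, -410, 88, -118)  # a noisy capture of some page load
print(classify(observed))
```

Even this crude nearest-match scheme shows why size variability matters: if all records had the same (padded) size, the distance between any two fingerprints would collapse toward zero.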
These changes reduce the classifier's effectiveness but do not eliminate it.

Countermeasures: we implemented padding to 128-byte blocks in the Cloudflare client and contacted Cloudflare; Cloudflare implemented padding to multiples of 128 (not 468) on their responses. The anonymity set for a given page is not the entire set of webpages, only the other pages within a cluster.

Wes Hardaker: Is your Tor protection DoH over Tor?
Sandra: No, it is regular DNS over Tor.
Stephen Farrell: How does constant padding work?
Sandra: We simulate constant size by increasing the sizes in our trace to all be the maximum size.
Stephen: Why do you think that makes such a difference compared to 468?
Sandra: There is still variability in the sizes with 468, and this is a major feature for the classifier.
Stephen: Mostly response sizes?
Sandra: Both query and response sizes. But the responses are more variable, so they do have a higher impact.
Stephen: Do you know what causes the variability? Is it DNSSEC?
Sandra: No, we haven't investigated the content.
Stephen: Presumably someone is using a really long query name. Don't do that?
DKG: I am the guilty party for proposing 468, specifically to provoke this research, so thank you. We only investigated individual queries and responses, so I'm glad that you investigated multiple queries and responses.
Sandra: In a trace, we noted whether each record went from client to resolver or resolver to client, so even without sizes we also had the directionality.
DKG: Also number of queries and cadence.
Sandra: Yes.
DKG: Why do you think DNS over TLS was markedly better than DoH?
Sandra: We saw that there was much less variability in sizes. I wonder if it was because of configuration messages?
DKG: Like an OPTIONS frame?
Sandra: Yes, something like that.
??? (NLnet Labs): There's a hypothesis about DoH that if you mix DoH with standard web traffic, that would obfuscate the traffic a lot more. Did you think about that?
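[For reference: the block padding discussed above, in the style of RFC 8467, rounds each DNS message length up to the next multiple of a fixed block size, so lengths leak only which size bucket a message falls into. A minimal sketch, ignoring the overhead of the EDNS(0) padding option itself:]

```python
# RFC 8467-style block padding: round a DNS message length up to the
# next multiple of `block` (128 is the recommended query block size).
# This sketch ignores the size of the EDNS(0) padding option itself.

BLOCK = 128

def padded_len(msg_len: int, block: int = BLOCK) -> int:
    if msg_len <= 0:
        raise ValueError("message length must be positive")
    return ((msg_len + block - 1) // block) * block

print(padded_len(45))   # -> 128
print(padded_len(128))  # -> 128
print(padded_len(300))  # -> 384
```

With this policy, all queries shorter than 128 bytes become indistinguishable by length, which is why moving from a single 468-byte target to multiples of a small block changes what the classifier can see.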
Sandra: Yes; right now we consider them as separate, but if you mix them that would affect the results.
Wes Hardaker: Is your dataset available?
Sandra: Not at the moment, but we are planning to make it available.
Christian Huitema: Exactly how do you measure the length of the queries and responses?
Sandra: We take the sizes of APPLICATION_DATA TLS records.
Christian: So you assume that there is a direct mapping between the TLS record sizes and query sizes.
Sandra: We did decrypt them and verify that there was a close correlation, which is plotted in the paper.
Christian: So what if the client bundled queries into a single TLS record? Would that change your results?
Sandra: I think it might, especially if the classifier was trained on traces where individual queries are in separate TLS records.
Jeffrey Yasskin: Do you have any sense of how precision and recall scale as you expand the universe of sites to the size of the internet?
Sandra: I assume it would go down, but I don't know by how much.
Sara Dickinson: Have you considered Oblivious DNS?
Sandra: Oblivious DNS has a slightly different adversary model, where the recursive resolver cannot map the client to the query. We are not looking at on-path traffic.
???: Do you have any thoughts on whether an adversary could use this for censorship?
Sandra: Yes, we did analyze this.
???: Are there extra features that could help, like inter-packet arrival times?
Sandra: We did consider timing in our initial analysis, and it improved precision and recall, but only by a little bit. Using timing as a feature is also hard because it depends on the location of the adversary. That's why we decided not to use timing-based features.

--------------------------

Chris Wood

The IETF is plugging holes that reveal what the client is actually up to on the internet. In this work, we assume that things that should obviously be encrypted (like DNS and SNI) are encrypted.
Results: many (especially older) servers have a unique IP address, and many websites have a very small anonymity set of 1-2 domains. What if you also consider a stronger adversary who observes all the ciphertext traffic patterns? For simplicity, we just look at the total traffic. The anonymity set size distribution shifts to the left, i.e. uniqueness goes up, as expected. DoH, DoT, and ESNI are great, but perhaps there is more we can do to reach the goal.

Other notes: Happy Eyeballs is great for performance but might make things easier for the adversary. DNS-based load balancing makes individual IP addresses less indicative of a given service, although it's possible that they could be reduced to an ASN to give a more useful classification signal. Website fingerprinting is becoming harder thanks to the work being done at the IETF. Finding the right tradeoff is difficult.

Dave Plonka: I've been working on anonymity sets on the client side. It seems very different and complementary to the work in MAPRG.
Riad Wahby: It looks like an anonymity set of 2 is 100x smaller than an anonymity set of 1. So 95% of websites can be identified?
Chris: Yes, it was a large number.
???: Are you saying that load balancers are an adversary?
Chris: No, but load balancers mean that the server IP will change.
Nick Sullivan: Have you thought about active adversaries?
Chris: No, not yet.

----------------------

Roland van Rijswijk-Deij

Focus is mostly on privacy of traffic in flight. What about the DNS resolver? Resolvers have access to all your traffic, perhaps legitimately. It is too easy to say "they shouldn't do this". Can we find a better way that still provides some protection? [Slides contain all information]

Planning to offer this feature in Unbound. More privacy-preserving than just recording all queries.

Dave Plonka: Are you comfortable giving out your full Bloom filter to the NCSC? I can't believe they'd give you their Bloom filter back.
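[For context: the idea is to record queried names in a Bloom filter rather than logging them, so membership can later be tested (with some false-positive probability, and no false negatives) without the names themselves ever being stored. A minimal sketch; the filter parameters and domain names below are purely illustrative:]

```python
# Minimal Bloom filter: set k bit positions per added name, derived
# from salted hashes. Lookups can yield false positives but never
# false negatives, and the stored bits do not reveal the names.
import hashlib

M = 4096   # filter size in bits (illustrative)
K = 4      # number of hash functions (illustrative)

def _positions(name: str):
    for i in range(K):
        h = hashlib.sha256(f"{i}:{name}".encode()).digest()
        yield int.from_bytes(h[:8], "big") % M

class Bloom:
    def __init__(self):
        self.bits = bytearray(M // 8)

    def add(self, name: str):
        for p in _positions(name):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, name: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in _positions(name))

f = Bloom()
f.add("evil-c2.example")       # resolver records a queried name
print("evil-c2.example" in f)  # True: recorded names always match
print("benign.example" in f)   # almost certainly False
```

The sharing question below is about exactly this structure: handing over the raw bit array lets a third party probe it with arbitrary candidate names.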
Roland: What they are sharing with the NCSC is that they saw some threat.
Dave: Don't you have to take the intersection of the Bloom filters to show evidence of the attacks? Who's finding the intersection?
Roland: The network operator participates in the national detection network and does detection on their own. You could have a distributed model with a query API, but we don't have that. I wouldn't advocate giving the Bloom filter to the NCSC, but maybe to a university researcher under some conditions.
??? (UK NCSC): I would be concerned about giving a leg up to a threat actor if you shared the Bloom filter widely.
Roland: When I say "sharing with third parties" it should have an asterisk: "under certain conditions". You do have a point; you want to have certain safeguards in place. It also depends on the network where you collect the information. SURFnet is a research network, and one of its goals is to do research on the network. It has a data sharing policy explaining under which conditions the data can be shared within the SURFnet constituency or around the world. Their privacy officer suggested that this would not be subject to the GDPR.

-----------------------

Fernando Gont, Numeric Identifiers

draft-gont-numeric-ids-history reviews why we keep running into the same problems, sometimes even with the same identifier in different protocols.

Chair: We are considering adopting -ids-history and -ids-generation in PEARG.
Chair: How many people have read these documents? (Judged "a few".)
EKR: What is the research content?
Fernando: Identifying the reasons for ... [see jabber]

------------------------

Pluggable Transports

David Oliver, Guardian Project. [Slides contain all information]

------------------------

Methods of Censorship

Joe Hall, CDT. [Most info in slides]

Joe: The research content is a "review article".
EKR: How often do things change? Every two years? Every two months? I'm ambivalent on the non-technical content question. I think you should call it "censorship".
Mallory: It seems like the PEARG charter is interested in threats, and censorship and privacy are two sides of the same coin, so I think it should be in charter.
Wendy Seltzer: I think the non-technical forms are a valuable addition. Thinking about that broad piece of the threat model is helpful to understanding how effective anti-censorship mechanisms are.
Stephane Bortzmeyer: It's difficult to keep this up to date. I think we should not include mitigations, both because they can get out of date and because they can have bad consequences, such as revealing that you are trying to get around the censorship. And I think we should call it "censorship", even if the censors don't want to be called censors.
???: I'm not sure how much beyond RFC 6970 we'll be able to contribute. I don't think there is a need to change the wording. However, the title could be updated.
Chairs: Hum for adoption. Sounds like positive support, but we'll confirm on the list. For now we'll treat it as a normal document while we consider the "living document" question.