
Minutes IETF105: pearg
minutes-105-pearg-00

Meeting Minutes Privacy Enhancements and Assessments Research Group (pearg) RG
Date and time 2019-07-24 17:30
Title Minutes IETF105: pearg
State Active
Last updated 2019-07-28

1. Privacy Standards and Anti-Standards
Pete Snyder presenting
Chair of W3C PING

Standards as a privacy-focused implementor (Brave)
With respect to the W3C DOM APIs, etc., Brave violates standards in order to
improve user privacy. Example: AudioContext.  To reduce fingerprinting, Brave
"nulls out" a lot of things that the browser can provide, like hardware
capabilities (only in third-party contexts).  Many other cases show similar
behavior.
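
A minimal sketch (illustrative only, not Brave's implementation) of what such
API "nulling" could look like for the Web Audio API, assuming a simplistic
third-party check:

```typescript
// Hypothetical sketch of "nulling out" a fingerprintable API in
// third-party contexts; not Brave's actual code.
function isThirdPartyContext(): boolean {
  // Treat any framed document as third-party here; real logic would
  // compare registrable domains against the top-level site.
  return window.self !== window.top;
}

if (isThirdPartyContext()) {
  // getFloatFrequencyData normally exposes hardware-dependent audio
  // processing results; replace it with a constant-output stub.
  AnalyserNode.prototype.getFloatFrequencyData = function (
    array: Float32Array
  ): void {
    array.fill(0);
  };
}
```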

Common problems:
1. Standards are extremely specific about what has to be implemented, but not
at all specific about how to mitigate the resulting privacy concerns. W3C
standards often have a privacy section, but it's not normative and doesn't
provide mitigations, just a list of concerns. Websites assume the standard
behavior, so mitigations break things. Example: Referrer.  There are privacy
problems, but the Privacy Considerations in the spec just say "user agents can
do whatever they want".  But websites rely on the standard behavior, so if the
user agent actually deviates to protect privacy, sites break.

EKR: What would you like this text to say?
Pete: Don't expect this in third-party contexts; Referrer should only be sent
on a user gesture, etc.
EKR: I think you have an optimistic view of whether web authors read
specifications.  They just use whatever Chrome does.
Pete: They use derived outputs from the specification.
EKR: (disagrees)
Pete: Chrome is following the specification, so we need to work on the
specification.
EKR: We can take this offline.

2. Uncommon use cases
Example: Canvas lets you read things out of the canvas, not just write to it.
This is very uncommon for ordinary canvas users, but widely used for canvas
fingerprinting (see the sketch after this exchange).
EKR: How would you restrict this?
Pete: User gesture, user permission.
EKR: Permission to read back from the canvas?
Pete: Couple this to other permissions, like writing to disk.
EKR: I think we'll constantly be bothering the user with permission prompts.
Pete: Could also be on user gesture.
EKR: Users can be tricked into making a gesture.
Pete: Yes, but at least it would take this out of the common path.
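
A minimal sketch of the canvas read-back under discussion (illustrative; the
drawn content and the use of toDataURL are assumptions): a script draws text,
reads the rendered output back, and uses it as a fingerprint, since rendering
differs subtly across hardware, drivers, and fonts.

```typescript
// Illustrative sketch of canvas fingerprinting via read-back.
// Rendering differences make the serialized pixels a per-device identifier.
function canvasFingerprint(): string {
  const canvas = document.createElement("canvas");
  canvas.width = 200;
  canvas.height = 50;
  const ctx = canvas.getContext("2d")!;
  ctx.font = "16px Arial";
  ctx.fillText("fingerprint \u{1F50D}", 4, 20); // emoji rendering varies widely
  // The read-back step that a user gesture or permission could gate:
  return canvas.toDataURL();
}
```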

3. No worse than status quo
Example: Communication with third party servers: "no worse than an image tag".
Example: Client Hints.  These values uniquely identify a nontrivial number of
users.  "What's the harm?  Cookies allow this anyway."

When you're in a hole, stop digging!

4. Other issues
"We know this introduces a privacy harm, but we'll make a new standard later to
fix the problem." Formalizing bad practices.  Then bad actors just use both the
new and the old version. Consider the harm to the user, not just the harm to
site owners or the average user.

Adding anything new to the platform adds a privacy harm!  Just because you
can't see the harm doesn't mean it isn't there.

Chris Wood: You mention cases where standards could be improved to help user
privacy.  What tangible steps could the IRTF or the IETF take to move in that
direction?
Pete: At the W3C we have "horizontal review groups", like PING and
Accessibility, that at least have the option to give input on other standards.
I think that kind of formal process would help concerned actors get involved.

Stephane Bortzmeyer: Most of your examples were from the W3C.  Besides Client
Hints, do you have examples of IETF standards that have this problem?  In
theory the Security Considerations should include any privacy problems.
Pete: I don't have any specific examples, but W3C standards also have these
sections.  I think this is just step 1 of a 10-step road.

Ted Lemon: One of the patterns I've seen is making a list of things that a
specific site is allowed to do, i.e. a set of entitlements.  The problem with
slowing down progress is that it's very difficult: somebody always wants the
new feature, and there's no way to restrict access to it when the user doesn't
want it.  Do we need a framework for restricting this access?
Pete: Could be user gestures, permission prompts, a hardcoded list of safe
sites.
Ted: It sounds like you agree with me that we need a way to make that happen.

Gary (??) Qualcomm: When I was chair of the geolocation group at W3C we
discussed whether a web service could declare its intentions to the user, but
we couldn't see a way to do it without it being abused by rogue parties.  But
now with Certificate Transparency and similar tools I wonder if we could apply
some kind of site-specific restrictions.
Pete: Geolocation is an example of something that actually doesn't cause a lot
of harm, because users understand what it means and they say no (or yes).  I'm
concerned about the 100 other things.

---------------------
Sandra Siby, PhD student at EPFL
Traffic analysis of encrypted DNS

Summary of DNS-over-TLS and DNS-over-HTTPS
Scenario: Adversary observes stub-recursive DoT or DoH traffic.
A webpage may load static content and ads from a variety of domains, producing
a set of queries whose size and timing are observable by the adversary.  The
set of queries and responses related to loading a webpage could act as a
fingerprint for that webpage.
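
A minimal sketch of what such a fingerprint could look like (the types and
feature choice are assumptions, not the paper's actual pipeline): a passive
adversary sees only the direction and ciphertext size of each TLS record, so a
simple per-page feature is the signed size sequence.

```typescript
// Illustrative traffic-analysis feature: sign encodes direction
// (positive = client->resolver, negative = resolver->client), magnitude
// is the TLS record length in bytes.
interface TlsRecord {
  clientToResolver: boolean;
  length: number; // ciphertext length in bytes
}

function fingerprint(trace: TlsRecord[]): number[] {
  return trace.map((r) => (r.clientToResolver ? r.length : -r.length));
}

// Example trace for one page load: two queries and their responses.
const trace: TlsRecord[] = [
  { clientToResolver: true, length: 62 },
  { clientToResolver: false, length: 140 },
  { clientToResolver: true, length: 71 },
  { clientToResolver: false, length: 251 },
];
console.log(fingerprint(trace)); // [62, -140, 71, -251]
```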

Monitoring:
From a closed world of 1500 webpages, a trained classifier achieved ~90%
precision and recall.
From an open world of 5000 webpages (1% "monitored"), the classifier got ~70%
precision and recall.

Censorship:
The adversary has to look at partial DoH traces, because they must reach a
decision and act before the page finishes loading.  The 4th TLS record usually
corresponds to the first DoH query; this leaks its length.  Collateral damage
would be high if this were used to block.  If the censor waits until the 15th
record they can generally identify the page with higher confidence, but by
then the page has mostly loaded.

Robustness:
How often does the adversary have to retrain?  What is the effect of client
location, Google vs Cloudflare DoH, Firefox vs. Cloudflare stub resolver? These
changes reduce the effectiveness but do not eliminate it.

Countermeasures:
We implemented padding to 128 in the Cloudflare client, and contacted
Cloudflare.  Cloudflare implemented padding to multiples of 128 (not 468) on
their responses.
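
For illustration, block padding rounds every message length up to the next
multiple of the block size, collapsing many distinct lengths into a few
buckets (a sketch; the function name is an assumption):

```typescript
// Padded wire length under block padding: the smallest multiple of
// `block` that is >= messageLength.
function paddedLength(messageLength: number, block = 128): number {
  return Math.ceil(messageLength / block) * block;
}

console.log(paddedLength(54));  // 128
console.log(paddedLength(130)); // 256
```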

The anonymity set for a given page is not the entire set of webpages, only the
other pages within a cluster.

Wes Hardaker: Is your Tor protection DoH over Tor?
Sandra: No, it is regular DNS over Tor.

Stephen Farrell: How does constant padding work?
Sandra: We simulate constant size by increasing the sizes in our trace to all
be the maximum size.
Stephen: Why do you think that makes such a difference compared to 468?
Sandra: There is still variability in the sizes with 468, and this is a major
feature for the classifier.
Stephen: Mostly response sizes?
Sandra: Both query and response sizes.  But the responses are more variable,
so they do have a higher impact.
Stephen: Do you know what causes the variability?  Is it DNSSEC?
Sandra: No, we haven't investigated the content.
Stephen: Presumably someone is using a really long query name.  Don't do that?

DKG: I am the guilty party for proposing 468, specifically to provoke this
research, so thank you.  We only investigated individual queries and
responses, so I'm glad that you investigated multiple queries and responses.
Sandra: In a trace, we noted whether each message went from client to resolver
or resolver to client, so even without sizes we also had the directionality.
DKG: Also the number of queries and their cadence.
Sandra: Yes.
DKG: Why do you think DNS over TLS was markedly better than DoH?
Sandra: We saw that there was much less variability in sizes.  I wonder if it
was because of configuration messages?
DKG: Like an OPTIONS frame?
Sandra: Yes, something like that.

??? (NLnet Labs): There's a hypothesis about DoH that if you mix it with
standard web traffic, that would obfuscate the traffic a lot more.  Did you
think about that?
Sandra: Yes.  Right now we consider them separate, but if you mixed them that
would affect the results.

Wes Hardaker: Is your dataset available?
Sandra: Not at the moment, but we are planning to make it available.

Christian Huitema: Exactly how do you measure the length of the queries and
responses?
Sandra: We take the sizes of APPLICATION_DATA TLS records.
Christian: So you assume that there is a direct mapping between the TLS record
sizes and query sizes.
Sandra: We did decrypt them and verify that there was a close correlation,
which is plotted in the paper.
Christian: So what if the client bundled queries into a single TLS record?
Would that change your results?
Sandra: I think it might, especially if the classifier was trained on traces
where individual queries are in separate TLS records.

Jeffrey Yasskin: Do you have any sense of how precision and recall scale as
you expand the universe of sites to the size of the internet?
Sandra: I assume it would go down, but I don't know by how much.

Sara Dickinson: Have you considered Oblivious DNS?
Sandra: Oblivious DNS has a slightly different adversary model, where the
recursive resolver cannot map the client and the query.  We are not looking at
on-path traffic.

???: Do you have any thoughts on whether an adversary could use this for
censorship? Sandra: Yes, we did analyze this.

???: Are there extra features that could help, like inter-packet arrival
times?
Sandra: We did consider this in our initial analysis; it improved
precision/recall, but only by a little bit.  Using timing as a feature is also
hard because it depends on the location of the adversary.  That's why we
decided not to use timing-based features.

--------------------------
Chris Wood

The IETF is plugging holes that reveal what the client is actually up to on the
internet.

In this work, we're assuming that things that should obviously be encrypted
(like DNS and SNI) are encrypted.

Results:
Many (especially older) servers have a unique IP address, and many websites
have a very small anonymity set of 1-2 domains.

What if you also consider a stronger adversary who observes all the ciphertext
traffic patterns?  For simplicity, we just look at the total traffic.  The
anonymity set size distribution shifts to the left, i.e. uniqueness goes up,
as expected.  DoH, DoT, and ESNI are great, but perhaps there is more we can
do to reach the goal.
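
A minimal sketch of the anonymity-set computation (the data and names are
assumptions): invert a site-to-IP mapping, then take, for each site, the union
of all sites sharing any of its IPs.  A site served from a unique IP has an
anonymity set of size 1, i.e. the IP alone identifies it.

```typescript
// Toy site -> IP mapping; real data would come from DNS measurements.
const siteToIps = new Map<string, string[]>([
  ["a.example", ["192.0.2.1"]],
  ["b.example", ["192.0.2.2", "192.0.2.3"]],
  ["c.example", ["192.0.2.2"]],
]);

// Invert to IP -> sites.
const ipToSites = new Map<string, Set<string>>();
for (const [site, ips] of siteToIps) {
  for (const ip of ips) {
    if (!ipToSites.has(ip)) ipToSites.set(ip, new Set());
    ipToSites.get(ip)!.add(site);
  }
}

// Anonymity set of a site: every site reachable via any of its IPs.
function anonymitySet(site: string): Set<string> {
  const result = new Set<string>();
  for (const ip of siteToIps.get(site) ?? []) {
    for (const s of ipToSites.get(ip)!) result.add(s);
  }
  return result;
}

console.log(anonymitySet("a.example").size); // 1: unique IP identifies it
console.log(anonymitySet("c.example").size); // 2: shares an IP with b.example
```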

Other notes:
Happy Eyeballs is great for performance but might make things easier for the
adversary. DNS-based load balancing makes individual IP addresses less
indicative of a given service, although it's possible that they could be
reduced to an ASN to have a more useful classification signal.

Website fingerprinting is becoming harder thanks to the work being done at the
IETF.

Finding the right tradeoff is difficult.

Dave Plonka: I've been working on anonymity sets on the client side.  It seems
very different, and complementary to the work in MAPRG.

Riad Wahby: It looks like the anonymity set of 2 is 100x smaller than the
anonymity set of 1.  So 95% of websites can be identified?
Chris: Yes, it was a large number.

???: Are you saying that load balancers are an adversary?
Chris: No, but load balancers mean that the server IP will change.

Nick Sullivan: Have you thought about active adversaries?
Chris: No, not yet.

----------------------
Roland van Rijswijk-Deij

Focus is mostly on privacy of traffic in flight.  What about the DNS resolver?
It has access to all your traffic, perhaps legitimately.  It's too easy to say
"they shouldn't do this".  Can we find a better way that still provides some
protection?

[Slides contain all information]

Planning to offer this feature in Unbound.
More privacy-preserving than just recording all queries.
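
A minimal Bloom filter sketch of the approach discussed in the Q&A below
(parameters and hashing are illustrative, not Unbound's implementation):
queried names set bits in a fixed-size array, so the operator can later test
whether a threat-indicator domain was probably queried without retaining the
raw query log.

```typescript
// Toy Bloom filter: k seeded FNV-1a hashes over a fixed bit array.
// False positives are possible; false negatives are not.
class BloomFilter {
  private bits: Uint8Array;
  constructor(private size = 1 << 20, private hashes = 4) {
    this.bits = new Uint8Array(size);
  }
  private fnv1a(s: string, seed: number): number {
    let h = 0x811c9dc5 ^ seed;
    for (let i = 0; i < s.length; i++) {
      h ^= s.charCodeAt(i);
      h = Math.imul(h, 0x01000193);
    }
    return (h >>> 0) % this.size;
  }
  add(name: string): void {
    for (let k = 0; k < this.hashes; k++) this.bits[this.fnv1a(name, k)] = 1;
  }
  mightContain(name: string): boolean {
    for (let k = 0; k < this.hashes; k++) {
      if (!this.bits[this.fnv1a(name, k)]) return false;
    }
    return true; // "probably queried"
  }
}

// Record queries as they arrive, then check indicators later.
const seen = new BloomFilter();
seen.add("benign.example");
console.log(seen.mightContain("malware-c2.example")); // false: never queried
```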

Dave Plonka: Are you comfortable giving out your full Bloom filter to the
NCSC?  I can't believe they'd give you their Bloom filter back.
Roland: What they are sharing with the NCSC is that they saw some threat.
Dave: Don't you have to take the coincidence of the Bloom filters to show
evidence of the attacks?  Who's finding the intersection?
Roland: The network operator participates in the national detection network
and does detection on their own.  You could have a distributed model with a
query API, but we don't have that.  I wouldn't advocate giving the Bloom
filter to the NCSC, but maybe to a university researcher under some
conditions.

??? (UK NCSC): I would be concerned about giving a leg up to a threat actor if
you shared the Bloom filter widely.
Roland: When I say "sharing with third parties" it should have an asterisk:
"under certain conditions".  You do have a point; you want to have certain
safeguards in place.  It also depends on the network where you collect the
information.  SURFnet is a research network, and one of its goals is to enable
research on the network.  It has a data sharing policy explaining under which
conditions the data can be shared within the SURFnet constituency or around
the world.  Their privacy officer suggested that this would not be subject to
GDPR.

-----------------------
Fernando Gont, Numeric Identifiers

draft-gont-numeric-ids-history:
Reviews why we keep running into the same problems, sometimes even with the
same identifier in different protocols.

Chair: We are considering adopting -ids-history and -ids-generation in PEARG.
Chair: How many people have read these documents? (Judged "A few")

EKR: What is the research content?
Fernando: Identifying the reasons for ... [see jabber]

------------------------
Pluggable Transports
David Oliver, Guardian Project.

[Slides contain all information]

------------------------
Methods of Censorship
Joe Hall, CDT

[Most info in slides]

Joe: The research content is a "review article".

EKR: How often do things change?  Every two years?  Every two months?  I'm
ambivalent on the non-technical content question.  I think you should call it
"censorship".

Mallory: It seems like the PEARG charter is interested in threats, and
censorship and privacy are two sides of the same coin, so I think it should be
in the charter.

Wendy Seltzer: I think the nontechnical forms are a valuable addition. 
Thinking about that broad piece of the threat model is helpful to understanding
how effective anti-censorship mechanisms are.

Stephane Bortzmeyer: It's difficult to keep this up to date.  I think we should
not have mitigations, both because it can get out of date and because it can
have bad consequences, such as revealing that you are trying to get around the
censorship.  And I think we should call it "censorship", even if the censors
don't want to be called censors.

???: I'm not sure how much beyond RFC 6970 we'll be able to contribute.  I
don't think there is a need to change the wording.  However, the title could be
updated.

Chairs: Hum for adoption.  Sounds like positive support, but we'll confirm on
the list.  For now we'll treat it as a normal document, while we consider the
"living document" question.