
Minutes IETF104: pearg
minutes-104-pearg-00

Meeting Minutes Privacy Enhancements and Assessments Research Group (pearg) RG
Date and time 2019-03-25 08:00
Title Minutes IETF104: pearg
State Active
Last updated 2019-03-26

PEARG 104 NOTES

---

Shivan:

Does anyone have any comments on the agenda?

First talk is by Iain.
https://datatracker.ietf.org/doc/slides-104-pearg-iain-safe-measurementpdf/01/

Iain: from the Tor Project; first IETF event.

[talking on use cases of Tor]

[Tor metrics philosophy: public, non-sensitive data, guided by the Tor
Research Safety Board]. Key safety principles: data minimisation, source
aggregation and transparency.

“Everything is open source, design documents are published”. [use of
simulations and test-beds] “Before a Tor client connects to the network, it
needs to have a view of the network”.

[speaking on different systems of measurement, including PrivCount, RAPPOR
and Prochlo]

“I’m hoping to generalise all of this in the draft […] so compliance becomes
easier” [..] “Happy to take any questions, feedback or suggestions”

[questions]

Question 1: Are the metrics reasonably accurate?

[…]

Question 2: I think it’s a good piece of work to continue. This is just
positive feedback basically. To what extent are you trying to reach out to get
input from others on large scale measurements?

Iain: All feedback is welcome. There are different classes of measurement, such
as active and passive. There is an issue about keeping a “do not scan list” as
well.

Question 2: I think that is a good topic to cover.

Question 3: Can you comment on differences between Prio and Privcount?

Iain: I don’t know much about the Prio system so I cannot comment.

[=]

Remote presentation by Ryan Guest:
https://datatracker.ietf.org/doc/slides-104-pearg-ryan-log-data-privacy/00/

Techniques for identifying personal data: “To identify personal data, we have
two techniques: dictionary-based or location-identifier-based (such as US
states)”

We have some unique formats for our domain: user IDs, AWS keys, etc., which can
be searched for when monitoring our logs. [speaking on method for data
deletion] There are certain classes of data which we don’t want to see at all:
SSNs, credit card numbers, etc. We drop them at the start.
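
As a rough illustration of the pattern-based scrubbing described above, a
minimal sketch in Python (the patterns and redaction format here are
hypothetical, not Salesforce’s actual rules):

    import re

    # Hypothetical patterns for data classes that should never reach the logs.
    DROP_PATTERNS = {
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    }

    def scrub_line(line: str) -> str:
        """Replace any matching sensitive value with a redaction marker
        before the line is written to the log."""
        for name, pattern in DROP_PATTERNS.items():
            line = pattern.sub(f"[REDACTED:{name}]", line)
        return line

    print(scrub_line("payment failed for card 4111 1111 1111 1111"))
    # -> payment failed for card [REDACTED:credit_card]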

[speaking on masking, aggregation, generalisation, categorisation,
tokenisation, differential privacy and encryption]
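
As one concrete example of the tokenisation technique just listed, identifiers
can be replaced with keyed, deterministic tokens so that log entries stay
correlatable without exposing the raw value. A minimal sketch, assuming a
secret key held only by the logging pipeline (hypothetical, not the system
described in the talk):

    import hashlib
    import hmac

    # Hypothetical secret known to the logging pipeline, not to log consumers.
    TOKEN_KEY = b"rotate-me-regularly"

    def tokenize(value: str) -> str:
        """Deterministically map an identifier (e.g. a user ID) to an
        opaque token; the same input always yields the same token, so
        events can still be joined across log lines."""
        digest = hmac.new(TOKEN_KEY, value.encode(), hashlib.sha256).hexdigest()
        return "tok_" + digest[:16]

    print(tokenize("user-00512"))  # stable, opaque token for this user ID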

These techniques have some really interesting properties for enforcing user
privacy.

[questions]

Shivan: Does Salesforce plan to publish any of this work in an academic paper
or research report? Answer: We’d like to. We’re thinking of the best way to do
that, either to open source tools, or more of an academic white paper setting.

Q1: Ben: You mentioned some bits about data encryption and key management; do
you have anything to say on having keys available to different services, or
different levels of services?

Answer: A log may have service-specific encrypted values, or ciphertext which
is encrypted with user-specific values; access control is applied to those,
and we use this for a variety of things. We have a system which dictates who
can access your data.

Q2: Pallavi from Salesforce: I had a question about identifying customer data.
Are you using regex or GNOME-type searches? I think that would be of interest,
since those need to be tweaked on a regular basis; if identification doesn’t
happen then something might slip through the cracks. Is there any
standardisation for that?

Answer: We haven’t standardised it per se. We make assumptions about things
that are specific to our organisation. Our teams are trying to figure out the
right balance, including through ML; there have been problems with false
positives. Sometimes browser headers look like IP addresses.

Q3: How do you maximise the utility versus privacy tradeoff?

Answer: We parameterise this, so each specific use case can dictate what level
of anonymity and privacy should be there.

Q3: to what extent do general techniques overlap with the previously presented
draft?

Answer: There is a lot of overlap. There are a lot of people doing different
things. Feedback is always appreciated.

[=]

Nick:
https://datatracker.ietf.org/doc/slides-104-pearg-slides-irtf104-pearg-privacy-pass/00/

Talking about Privacy Pass. High-level overview: it is a lightweight
zero-knowledge protocol.

[lowdown on the Cloudflare reverse proxy] In order to reduce malicious
activity, like spam or malicious payloads, there are several different
techniques, such as user challenges (CAPTCHAs). These disproportionately
affect those that are privacy conscious, including VPN/Tor users. It is
difficult to distinguish bad from good when IP reputation is the only thing
taken into account. “There are around 11 million websites which use
Cloudflare, so the problem is quite big. We want to reduce this problem.” The
idea is to solve a challenge once and get back some currency, proof or token
which can be used later. [speaking on Chaum’s Ecash, 1983] “The flow is: first
issuance and the second is redemption (or spending it)”

[speaking on OPRF, VRF, fundamental components/terminology and scenarios]
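
Privacy Pass itself builds on a (V)OPRF (the proposal mentioned below,
submitted to CFRG). As a rough, simplified illustration of the blind
issue-then-redeem flow descended from Chaum’s e-cash, here is a minimal RSA
blind-signature sketch in Python; this is not the actual Privacy Pass
construction, and the names and parameters are illustrative:

    import hashlib
    import secrets
    from cryptography.hazmat.primitives.asymmetric import rsa

    # Issuer key; clients only ever use the public part (n, e).
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    priv = key.private_numbers()
    n, e, d = priv.public_numbers.n, priv.public_numbers.e, priv.d

    def h(token: bytes) -> int:
        return int.from_bytes(hashlib.sha256(token).digest(), "big") % n

    # Issuance: the client blinds a random token, the server signs it
    # without learning which token it is.
    token = secrets.token_bytes(32)
    r = secrets.randbelow(n - 2) + 2            # blinding factor
    blinded = (h(token) * pow(r, e, n)) % n     # client -> server
    blind_sig = pow(blinded, d, n)              # server -> client
    sig = (blind_sig * pow(r, -1, n)) % n       # client unblinds

    # Redemption: the client presents (token, sig); the server verifies it
    # and records the token to prevent double spending.
    spent = set()
    assert pow(sig, e, n) == h(token) and token not in spent
    spent.add(token)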

“Privacy Pass has been released as a Firefox and Chrome extension, with about
50,000 daily active users and trillions of requests every week. This work is
in the public domain. As a way forward, we are working to integrate Privacy
Pass with more CAPTCHA providers.” (V)OPRF proposal submitted to CFRG.

[questions]

Question 1: Wes from ISI: Fascinating work, I really like the intent and goals
behind it. Couple of questions: What is the percentage of CPU increase to do
the level of math associated with this?

Answer: It’s cheaper than the TLS cache. With RSA it was slightly more
expensive.

Question 1: It sounds like there is a limited number of tokens that can be
handed to clients

Answer: Correct. Not an infinite number of tokens. Decisions for parameters
are dependent on the use case. You can modify the code to do up to 100, but it
depends on the use case.

Question 1: There is no reason why the client couldn’t share their cookies with
someone else, right?

Answer: Correct. There has to be some type of double-spend protection. Someone
could do farming and use it to bypass captchas on a larger scale. Key rotation
may be a way to reduce that.

Question 1: You talked about double spending on the server side; is there any
double-spend state that has to be retained?

Answer: Essentially, you have to keep the double-spend strike register for as
long as the lifetime of the server’s private key. Key rotation is the way to
reduce that, but you may get into a bit of a chicken-and-egg problem.
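
A small sketch of the bookkeeping being described, under the assumption (taken
from the answer above) that the strike register is scoped to one signing key
and can be discarded when the key rotates; this is illustrative, not
Cloudflare’s implementation:

    class StrikeRegister:
        """Tracks redeemed tokens for the lifetime of one signing key;
        rotating the key starts a fresh, empty register."""

        def __init__(self, key_epoch: int):
            self.key_epoch = key_epoch
            self.spent = set()

        def redeem(self, token: bytes) -> bool:
            if token in self.spent:
                return False          # double spend: reject
            self.spent.add(token)
            return True

    register = StrikeRegister(key_epoch=1)
    assert register.redeem(b"token-a") is True
    assert register.redeem(b"token-a") is False  # second spend rejected

    # On key rotation the old register can be dropped entirely.
    register = StrikeRegister(key_epoch=2)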

Question 2: Are there plans for a specification of Privacy Pass for later
consumption or standardisation?

Answer: Yes, potentially. In terms of HTTP, there is nothing that would prevent
standardisation. We are first exploring federation, experimenting with other
captcha providers. This is potentially generalisable to a lot of things.

[=]

Amelia Andersdotter (in-person) and Christoffer Langstrom (Remote):
https://datatracker.ietf.org/doc/slides-104-pearg-amelia-christoffer-differential-privacy/00/

[presentation on Differential Privacy] This will be a very high level view of
differential privacy. Presentation will be in approximately three parts: what
is the aim, what are some methods for achieving the aims, how could this be
applied in IETF standardisation?

“Differential privacy is a way of remedying specific threats to privacy, like
identifiability (which is mentioned in RFC 6973). The overall aim of
differential privacy is to provide users with plausible deniability. A user
should be able to deny inclusion in any database”

Christoffer: Epsilon-delta description of differential privacy. The idea is to
make sure that any one person in a dataset is not overly exposed to risks.
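
For reference, the standard (ε, δ) definition being described: a randomised
mechanism M is (ε, δ)-differentially private if, for every pair of datasets D
and D′ differing in one record and every set S of possible outputs,

    \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta

Smaller ε (and δ) means that any single person’s presence or absence changes
the output distribution less; this is the plausible deniability mentioned
above.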

[describing the math behind Differential Privacy] “Differential privacy allows
for quantification of the degree to which privacy is preserved.” [speaking on
methods to apply Differential Privacy] “Perturb the answer to a query: we have
a dataset, and someone queries it, so we compute the true answer and then add
a bit of noise to it, then we return the noisy answer for the query. In this
case, the answer is known to be roughly correct, but it is not known whether
it is above or below the true value. Drawbacks to adding noise: statistical
estimator quality is worsened; sample sizes will be larger; if we allow
indefinite queries, they may be able to infer the original answer. We will
then require a privacy budget, i.e., how many times someone can query for a
particular answer. Need to trust the database maintainer.”
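
A minimal sketch of the “perturb the answer” approach just described, using
the standard Laplace mechanism (the numbers here are only illustrative):

    import numpy as np

    def noisy_count(true_count: float, sensitivity: float, epsilon: float) -> float:
        """Return the true answer plus Laplace noise with scale
        sensitivity/epsilon (classic output perturbation)."""
        return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

    # A counting query changes by at most 1 when one person is added or
    # removed, so sensitivity = 1; epsilon is drawn from the privacy budget.
    print(noisy_count(true_count=1234, sensitivity=1.0, epsilon=0.5))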

Amelia: There is a second method you can use, which is to perturb the
measurement: make sure that everything which goes into the database is already
perturbed when it gets there. There are a few common methods for this:
removing or obfuscating identifiers, swapping data and randomising responses.
The same drawbacks as in the previous method also apply. Need to trust the
entity which makes the measurements. “Very specific case of protecting the
identity of an originating individual for a particular piece of data in a data
set. Security is still important with this.”
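
A sketch of the randomised-response idea mentioned above (classic Warner-style
randomised response; the coin biases are illustrative):

    import random

    def randomized_response(true_answer: bool) -> bool:
        """With probability 1/2 report the truth, otherwise report a fair
        coin flip: each individual's recorded answer is plausibly deniable,
        yet the aggregate can still be estimated."""
        if random.random() < 0.5:
            return true_answer
        return random.random() < 0.5

    # E[reported "yes"] = 0.5 * p + 0.25, so p can be estimated from the
    # perturbed reports: p_hat = 2 * (mean - 0.25).
    reports = [randomized_response(True) for _ in range(10000)]
    p_hat = 2 * (sum(reports) / len(reports) - 0.25)
    print(round(p_hat, 2))  # close to 1.0, since every true answer was True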

“The challenge for the IETF is that differential privacy mostly applies to
APIs. Nevertheless, there are some ideas, e.g., protocols which provide
predictably false data: have a client or server provide a false answer at a
predictable rate.”

[questions]

Question 1: Ben Kaduk: WRT introducing random or false data, the key insight
required seems to be: what random distribution to use?

Answer: Amelia: In the QUIC spin bit case, either you spin the bit or you don’t
[…] taking a true stream of data and masking it sometimes to not provide the
true data.

Question 2, Dave Wheeler: I am glad you had this talk. I had some ideas on
differential privacy if you could help clear them up. From some cryptographic
reading, I read that anonymisation could be undone with large enough datasets.

Answer: Amelia: Correct. Differential privacy is a statistical method, so what
you’re talking about is similar to what Christoffer mentioned earlier (privacy
budget). Differential privacy can provide repudiative properties for a single
individual’s inclusion in a dataset. It is not a catch-all for privacy
problems.

Question 3: This may be naive, I apologise. When you are introducing noise into
a system, how do you ensure that your noise will affect each item differently?

Answer: Amelia: Privacy budget limitation applies here as well.

Question 4: I wanted to remark on your point about APIs. I think we have been
a bit narrow in this community about what an API is. There is no good reason
why one end of a protocol connecting to the other end cannot be thought of as
exactly that as well.

Answer: Christoffer: I agree.

[=]

Martin Schanzenbach (talk on re:claimID)
https://datatracker.ietf.org/doc/slides-104-pearg-gnunet-reclaimid-self-sovereign-decentralized-identity-management-using-secure-name-systems-martin-schanzenbach-fraunhofer-aisec/00/

We took a look at the identity provider market. There are several issues, such
as: privacy concerns (companies provide free services & want to make money),
liability risks (data loss from breaches = excessive legal implications),
oligopoly (lack of federation). “Primary objective is to enable users to
exercise the right to digital self-determination”. Our approach includes 1)
avoiding third-party services for ID management/data sharing, 2) open, free
services that are not under centralised control, and 3) free software.

“What does an IdP do?” 1) ID provisioning and access control (including
management of IDs and data, sharing of such data, enforcing auth decisions). 2)
Verification of ID (e.g., this is indeed Alice’s email address, or this is
indeed Bob’s country of residence). re:claim is a decentralised directory
service: secure name system with open registration, idea borrowed from NameID,
implementation uses the GNU Name System. We added a cryptographic access
control layer, using attribute-based encryption.

[demonstration of example]

“In summary, we have implemented this idea as part of GNUnet, there is a proof
of concept and demo on gitlab. It is currently a bit rough around the edges”

[questions]

Question 1 (Alex): [..]

Answer: You always know which identity it is, because you are looking it up in
the name space. However, one could argue that using privacy-preserving
credentials does not make much sense, as you are always identifiable as a
single identity.

Question 2: I’ve worked on a similar project, except we used DNS. The problem
is that there are a lot of people trying to do this thing. However, in the
real world there are only two companies which have a primary stake. How do we
get adoption? Do you have any reflections on strategies for adoption?

Answer: If you offer the software to users and show the benefits in terms of
the privacy offered, I think adoption would increase.

Question 3 ([..] from Cloudflare): Question on SSO, re: the economics of
getting websites to integrate.

Answer: I don’t have a solution for this. I think the first part is to get
users to want this.

Question 4: In cases where federation actually works, it is because parties
want info about users and users want to share that info. In some cases, in
order to make progress on privacy, UX is the area to focus on, not technology.
I suspect that all of these projects should stop; they should instead focus on
getting users to want to use them, through UX.

[…]

[=]

Presentation by Brook on Next Generation Internet
https://datatracker.ietf.org/doc/slides-104-pearg-brook-ngi/00/

The internet is not serving the purposes we hoped it would. It doesn’t always
do what we want it to do. How do we create a next-generation, user-centric
internet? [speaking on open calls, grants for Next Generation Internet]
Distributed data, privacy and trust enhancing technologies, service
portability, data decoupling and strengthening of internetwork trust. […]
NLnet is looking to offer grants which range from 5,000 pounds to 50,000
pounds. The barrier to entry is high. NGI also has several open calls. Most
deadlines are in April (1st/30th).

Question 1: Is this an EU program? Are there any restrictions on who can
apply?

Answer: You do not have to be an EU resident, but that would make it easier.
The proposal should be beneficial for the EU.

Shivan: That concludes our meeting. Thank you.