PEARG 104 NOTES
---
Shivan: Does anyone have any comments on the agenda? First talk is by Iain. https://datatracker.ietf.org/doc/slides-104-pearg-iain-safe-measurementpdf/01/
Iain: from the Tor Project; first IETF event. [talking on use cases of Tor]
[Tor metrics philosophy: public, non-sensitive data, guided by the Tor Research Safety Board]
Key safety principles: data minimisation, source aggregation and transparency. “Everything is open source, design documents are published.” [use of simulations and test-beds]
“Before a Tor client connects to the network, it needs to have a view of the network.” [speaking on different systems of measurement, including PrivCount, RAPPOR, Prochlo]
“I’m hoping to generalise all of this in the draft […] so compliance becomes easier” [..] “Happy to take any questions, feedback or suggestions”
[questions]
Question 1: Are the metrics reasonably accurate? […]
Question 2: I think it’s a good piece of work to continue. This is just positive feedback, basically. To what extent are you trying to reach out to get input from others on large-scale measurements?
Iain: All feedback is welcome. There are different classes of measurement, such as active and passive. There is also the issue of keeping a “do not scan” list.
Question 2: I think that is a good topic to cover.
Question 3: Can you comment on differences between Prio and PrivCount?
Iain: I don’t know much about the Prio system, so I cannot comment.
[=]
Remote presentation by Ryan Guest: https://datatracker.ietf.org/doc/slides-104-pearg-ryan-log-data-privacy/00/
Techniques for identifying personal data: “To identify personal data, we have two techniques: dictionary-based or location-identifier-based (such as US states).”
We have some unique formats for our domain: user IDs, AWS keys, etc., which can be searched for when monitoring our logs. [speaking on method for data deletion]
There are certain classes of data which we don’t want to see at all: SSNs, credit card numbers, etc. We drop them at the start.
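The drop-at-ingest step described above might be sketched like this (a minimal, hypothetical example: the patterns and the `scrub` helper are illustrative, not Salesforce’s actual tooling):

```python
import re

# Hypothetical patterns for data we never want to retain. Real deployments
# tune these per organisation and log format.
DROP_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # US SSN, e.g. 123-45-6789
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),  # candidate credit card number
]

def scrub(line: str, placeholder: str = "[REDACTED]") -> str:
    """Drop sensitive values at ingest, before the line is ever stored."""
    for pattern in DROP_PATTERNS:
        line = pattern.sub(placeholder, line)
    return line

print(scrub("user=alice ssn=123-45-6789 action=login"))
# -> user=alice ssn=[REDACTED] action=login
```

As the Q&A below notes, purely pattern-based identification needs regular tweaking and can produce false positives (e.g. browser headers that look like IP addresses).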
[speaking on masking, aggregation, generalisation, categorisation, tokenisation, differential privacy and encryption]
There are some really interesting properties for enforcing user privacy.
[questions]
Shivan: Does Salesforce plan to publish any of this work in an academic paper or research report?
Answer: We’d like to. We’re thinking of the best way to do that: either to open-source tools, or more of an academic white-paper setting.
Q1: Ben: You mentioned some bits about data encryption and key management. Do you have anything to say on having keys available to different services, or different levels of services?
Answer: A log may have service-specific encrypted values, or ciphertext which is encrypted with user-specific values; access control is on those, and we use it for a variety of things. We have a system which dictates who can access your data.
Q2: Pallavi from Salesforce: I had a question about identifying customer data. Are you using regex or GNOME-type searches? I think that would be of interest, since those need to be tweaked on a regular basis. If identification doesn’t happen, then something might slip through the cracks. Is there any standardisation for that?
Answer: We haven’t standardised it per se. We make assumptions about things that are specific to our organisation. Our teams are trying to figure out the right balance, including through ML; there have been problems with false positives. Sometimes browser headers look like IP addresses.
Q3: How do you maximise the utility-versus-privacy tradeoff?
Answer: We parameterise this, so each specific use case can dictate what level of anonymity and privacy should be there.
Q4: To what extent do these general techniques overlap with the previously presented draft?
Answer: There is a lot of overlap. There are a lot of people doing different things. Feedback is always appreciated.
[=]
Nick: https://datatracker.ietf.org/doc/slides-104-pearg-slides-irtf104-pearg-privacy-pass/00/
Talking about Privacy Pass. High-level overview: it is a lightweight zero-knowledge protocol. [lowdown on Cloudflare reverse proxy]
In order to reduce malicious activity, like spam or malicious payloads, there are several different techniques, such as user challenges (CAPTCHAs).
This disproportionately affects those that are privacy-conscious, including VPN/Tor users. It is difficult to distinguish bad from good when IP reputation is the only thing taken into account.
“There are around 11 million websites which use Cloudflare, so the problem is quite big. We want to reduce this problem.”
The idea is to solve a challenge once and get back some currency, proof or token which can be used later. [speaking on Chaum’s Ecash, 1983]
“The flow is: first issuance, and second redemption (or spending it).” [speaking on OPRF, VRF, fundamental components/terminology and scenarios]
“Privacy Pass has been released as a Firefox and Chrome extension, with about 50,000 daily active users and trillions of requests every week. This work is in the public domain. As a way forward, we are working to integrate Privacy Pass with more CAPTCHA providers.” (V)OPRF proposal submitted to CFRG.
[questions]
Question 1: Wes from ISI: Fascinating work, I really like the intent and goals behind it. A couple of questions: what is the percentage of CPU increase to do the level of math associated with this?
Answer: It’s cheaper than TLS cache. With RSA it was slightly more expensive.
Question 1: It sounds like there is a limited number of tokens that can be handed to clients.
Answer: Correct. Not an infinite number of tokens. Decisions for parameters are dependent on use case. You can modify the code to do up to 100, but it depends on the use case.
Question 1: There is no reason why the client couldn’t share their cookies with someone else, right?
Answer: Correct. There has to be some type of double-spend protection. Someone could do farming and use it to bypass CAPTCHAs on a larger scale. Key rotation may be a way to reduce that.
Question 1: You talked about double spending on the server side; is there any double-spend […]
Answer: Essentially, you have to keep the double-spend strike register for as long as the lifetime of the server’s private key.
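The double-spend strike register mentioned in that answer could be sketched as follows (an assumed design for illustration, not the actual Privacy Pass implementation; `StrikeRegister` and its methods are hypothetical names):

```python
# Server-side double-spend protection: remember every redeemed token for as
# long as the current signing key is live; reset on key rotation, since
# tokens issued under the old key are no longer valid anyway.

class StrikeRegister:
    def __init__(self, key_epoch: int):
        self.key_epoch = key_epoch
        self._spent: set[bytes] = set()

    def redeem(self, token: bytes) -> bool:
        """Accept a token only the first time it is presented."""
        if token in self._spent:
            return False          # double spend: reject
        self._spent.add(token)
        return True

    def rotate_key(self, new_epoch: int) -> None:
        """After rotation, the register no longer needs old tokens."""
        self.key_epoch = new_epoch
        self._spent.clear()

reg = StrikeRegister(key_epoch=1)
assert reg.redeem(b"token-a") is True    # first spend accepted
assert reg.redeem(b"token-a") is False   # replay rejected
```

In this sketch the register only has to grow for the lifetime of one key epoch, which is what ties double-spend state to key rotation.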
Key rotation is the way to reduce that, but you may get into a bit of a chicken-and-egg problem.
Question 2: Are there plans for a specification of Privacy Pass, for later consumption or standardisation?
Answer: Yes, potentially. In terms of HTTP, there is nothing that would prevent standardisation. We are first exploring federation, experimenting with other CAPTCHA providers. This is potentially generalisable to a lot of things.
[=]
Amelia Andersdotter (in person) and Christoffer Langstrom (remote): https://datatracker.ietf.org/doc/slides-104-pearg-amelia-christoffer-differential-privacy/00/
[presentation on differential privacy]
This will be a very high-level view of differential privacy. The presentation will be in approximately three parts: what is the aim, what are some methods for achieving the aims, and how could this be applied in IETF standardisation?
“Differential privacy is a way of remedying specific threats to privacy, like identifiability (which is mentioned in RFC 6973). The overall aim of differential privacy is to provide users with plausible deniability. A user should be able to deny inclusion in any database.”
Christoffer: Epsilon-delta description of differential privacy. The idea is to make sure that any one person in a dataset is not overly exposed to risks. [describing the math behind differential privacy]
“Differential privacy allows for quantification of the degree to which privacy is preserved.”
[speaking on methods to apply differential privacy]
“Perturb the answer to a query: we have a dataset, and someone queries it, so we compute the true answer and then add a bit of noise to it, then we return the noisy answer for the query. In this case, the answer is known to be roughly correct, but it is not known whether it is above or below the true value.”
Drawbacks to adding noise: statistical estimator quality is worsened; sample sizes will be larger; and if we allow indefinite queries, the querier may be able to infer the original answer.
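The noisy-query method just described is commonly realised with the Laplace mechanism; here is a minimal sketch, assuming a counting query with sensitivity 1 (`noisy_count` is an illustrative name, not something from the talk):

```python
import math
import random

def noisy_count(true_count: float, epsilon: float) -> float:
    """Laplace mechanism for a counting query (sensitivity 1):
    return the true answer plus Laplace(0, 1/epsilon) noise."""
    scale = 1.0 / epsilon
    u = random.random() - 0.5                             # uniform on (-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    noise = -scale * sign * math.log(1.0 - 2.0 * abs(u))  # inverse-CDF sampling
    return true_count + noise

random.seed(7)
print(noisy_count(1000, epsilon=0.5))  # roughly 1000, perturbed with noise of scale 2
```

Smaller epsilon means more noise (stronger privacy, worse estimator quality), which is exactly the drawback listed above; and repeating the query many times lets the querier average the noise away, which is why a privacy budget is needed.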
We will then require a privacy budget regarding how many times someone can query for a particular answer. We need to trust the database maintainer.
Amelia: There is a second method which you can use: perturb the measurement. Make sure that everything which goes into the database is already perturbed when it gets there. There are a few common methods for this: removing or obfuscating identifiers, swapping data, and randomising responses. The same drawbacks as the previous method also apply. We need to trust the entity which makes the measurements.
“This is a very specific case of protecting the identity of an originating individual for a particular piece of data in a dataset. Security is still important with this.”
“The challenge for the IETF is that differential privacy mostly applies to APIs. Nevertheless, there are some ideas, e.g., protocols which provide predictably false data. Have a client or server provide a false answer at a predictable rate.”
[questions]
Question 1: Ben Kaduk: With respect to introducing random or false data, the key insight required seems to be: what random distribution to use?
Answer: Amelia: In the QUIC spin bit case, either you spin the bit or you don’t […] taking a true stream of data and masking it sometimes to not provide the true data.
Question 2: Dave Wheeler: I am glad you had this talk. I had some ideas on differential privacy, if you could help clear them up. From some cryptographic reading, I read that anonymisation could be undone with large enough datasets.
Answer: Amelia: Correct. Differential privacy is a statistical method, so what you’re talking about is similar to what Christoffer mentioned earlier (the privacy budget). Differential privacy can provide repudiative properties for a single individual’s inclusion in a dataset. It is not a catch-all for privacy problems.
Question 3: This may be naive, I apologise. When you are introducing noise into a system, how do you ensure that your noise will affect each item differently?
Answer: Amelia: The privacy budget limitation applies here as well.
Question 4: I wanted to remark on your point about APIs. I think we have been a bit narrow in this community about what counts as an API. There is no good reason that one end of a protocol connecting to the other cannot be thought of as exactly that as well.
Answer: Christoffer: I agree.
[=]
Martin Schanzenbach (talk on re:claimID): https://datatracker.ietf.org/doc/slides-104-pearg-gnunet-reclaimid-self-sovereign-decentralized-identity-management-using-secure-name-systems-martin-schanzenbach-fraunhofer-aisec/00/
We took a look at the identity provider market. There are several issues, such as: privacy concerns (companies provide free services and want to make money), liability risks (data loss from breaches means excessive legal implications), and oligopoly (lack of federation).
“The primary objective is to enable users to exercise the right to digital self-determination.”
Our approach includes 1) avoiding third-party services for ID management/data sharing, 2) open, free services that are not under centralised control, and 3) free software.
“What does an IdP do?” 1) ID provisioning and access control (including management of IDs and data, sharing of such data, and enforcing authorisation decisions). 2) Verification of ID (e.g., this is indeed Alice’s email address, or this is indeed Bob’s country of residence).
re:claim is a decentralised directory service: a secure name system with open registration, an idea borrowed from NameID; the implementation uses the GNU Name System. We added a cryptographic access control layer using attribute-based encryption. [demonstration of example]
“In summary, we have implemented this idea as part of GNUnet; there is a proof of concept and demo on GitLab. It is currently a bit rough around the edges.”
[questions]
Question 1 (Alex): [..]
Answer: You always know which identity it is, because you are looking it up in the name space.
However, one could argue that using privacy-preserving credentials does not make much sense, as you are always identifiable as a single identity.
Question 2: I’ve worked on a similar project, except we used DNS. The problem is that there are a lot of people trying to do this thing; however, in the real world there are only two companies which have a primary stake. How do we get adoption? Do you have any reflections on strategies for adoption?
Answer: If you offer the software to users and then show the benefits in terms of the privacy offered, I think adoption would increase.
Question 3 ([..] from Cloudflare): Question on SSO, regarding the economics of getting websites to integrate.
Answer: I don’t have a solution for this. I think the first part is to get users to want this.
Question 4: In cases where federation actually works, it is because parties want info about users and users want to share that info. In some cases, in order to make progress on privacy, UX is the area to focus on, and not technology. I suspect that all of these projects should stop and instead focus on getting users to want this through UX. […]
[=]
Presentation by Brook on the Next Generation Internet: https://datatracker.ietf.org/doc/slides-104-pearg-brook-ngi/00/
The internet is not serving what we hoped it would serve. It doesn’t always do what we want it to do. How do we create a next-generation, user-centric internet? [speaking on open calls, grants for Next Generation Internet]
Distributed data, privacy- and trust-enhancing technologies, service portability, data decoupling and strengthening of internetwork trust. […]
NLnet is looking to offer grants which range from 5,000 to 50,000 pounds. The barrier of entry is high. NGI also has several open calls. Most deadlines are in April (1st/30th).
Question 1: Is this an EU programme? Are there any restrictions on who can apply?
Answer: You do not have to be an EU resident, but that would make it easier. The proposal should be beneficial for the EU.
Shivan: That concludes our meeting.
Thank you.