IETF-118-PPM

[Note-taker Michael B]

Administrivia - Chairs

DAP Open Issues Review

Chris Patton.

Five issues all corresponding to the same underlying feature -
collecting a report multiple times with different aggregation
parameters. Deemed safe as long as it meets the requirements of the VDAF
draft. No one has implemented this.

Simon Friedberger: Still want to support heavy hitters. I will implement
it some time next year unless someone does this first.
Tim Geoghegan: I will also implement this. Would like Heavy Hitters and
this feature to be left in.

Issue #519: How should batching strategies be handled? Supporting multiple query types adds complexity.

Tim Geoghegan: I proposed moving the time-interval query type. Don't
want to restate those arguments poorly, but you might need it to realign
data you get from DAP and other methods. It's not about removing
time-interval, or having something equivalent. I think it's about making
the helper agnostic to query types.

Chris Patton: that's right - it's about removing it from the protocol
altogether. TG: I think you need to think about when the leader should
reject uploads based on time intervals. CP: I think we could do that.

Shan Wang: If we are removing the interval type, could we relax the
fixed-size mechanism? CP: We could make it optional, or remove it; any
preference? SW: I think we could make it optional, if we don't care.

Simon Friedberger: Happy with performance improvements, but also happy
to do nothing at the moment. CP: Mozilla is using this? Daphne is using
this.

Tim Geoghegan: Simon's comment jogged my memory; if time interval gets
downgraded, then implementations still have to maintain the code that
does it. So taking this out might not simplify implementations if they
still need it for particular purposes.

Ben Schwartz: Is there a simplification if a task commits to one query
type or another, rather than both? CP: It already does.

CP: I think people want to do nothing for now, and acknowledge that some
implementations may not implement some query types.

Issue #489: Supporting drill-down

Breaking down the aggregate result by arbitrary labels, something akin
to a SQL GROUP BY. If clients label reports, then it might allow
fingerprinting. Don't want to deanonymise clients with a specific label.
But also want to enforce a minimum batch size, to ensure a certain size
of anonymity set. This makes things a bit more complicated. We could add
labels to report metadata, at the risk of fingerprinting. Or, at CFRG we
will talk about 'Mastic', which might allow this, but may not be
sufficiently flexible.
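
As a rough illustration of the minimum-batch-size concern, the sketch below groups (label, measurement) pairs and only releases per-label sums whose batch reaches a threshold. It is a hypothetical example: it works on plaintext, whereas DAP aggregators would only ever see secret shares, and the names `aggregate_by_label` and `min_batch_size` are assumptions for illustration, not part of the draft.

```python
from collections import defaultdict


def aggregate_by_label(reports, min_batch_size):
    """Sum measurements per label, dropping any label whose batch is too small.

    Hypothetical sketch: operates on plaintext pairs, whereas DAP aggregators
    would work on secret-shared measurements.
    """
    groups = defaultdict(list)
    for label, measurement in reports:
        groups[label].append(measurement)

    released = {}
    for label, values in groups.items():
        # Only release labels whose anonymity set meets the minimum batch size.
        if len(values) >= min_batch_size:
            released[label] = sum(values)
    return released


reports = [("en", 1), ("en", 0), ("en", 1), ("fr", 1)]
print(aggregate_by_label(reports, min_batch_size=3))  # {'en': 2}
```

The "fr" report is withheld because a batch of one would effectively identify its sender, which is the fingerprinting risk that labels introduce.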

Tim Geoghegan: if doing mastic, is that reflected in the aggregation
parameter? CP: We need to get the aggregation parameter right. TG:
That's true for a lot of properties. CP: It's not true for every type of
query pattern we might not

Shan Wang: Creating one task per label? CP: As Tim points out, it
doesn't scale well. This might be easy with task-prov, but it isn't
practical to spin up a new task for every label you might have.

Chris Wood: We've faced the same problem in Privacy Pass. I don't see
any problem with the first option (adding labels to metadata). It's up
to deployments to figure out the overall privacy story, and whether
labels affect privacy. We should do both options.

Kunal Talwar: Without something like this, the client is assured that
their data isn't going to be subject to a differencing attack.

Ben Case: We have a use case where we would like some of these labels.
In most cases, it's not the fingerprinting that's the issue. Shifting
towards DP and DAP should solve these issues around reidentification
attacks. We would like to support one report having many labels, while
respecting DP. So +1 to this being useful functionality, and happy to
help think about the details.

TG: This could be very valuable functionality, but there are a lot of
sharp edges to this cryptography, and you need to consult with good
cryptographers to work this out. Not sure what the answer is.

Simon Friedberger: If the worry is fingerprinting, then labels aren't
that important, because the leader can already see IPs.

CP: If we go with labels on metadata, then there's still a lot of
thinking to do. We could do what Ben says, and relax that a bit, but we
have to be careful.

Shan Wang: There are other avenues for fingerprinting, but other
technologies can be used to mitigate them. Labelling is intrinsic to the
whole aggregation process, making it harder to reduce fingerprinting.

Chris Wood: Can still do labelling through tasks. This is purely
aesthetic. CP: So we should implement the wire-change now, and use it if
we need to. People are out there who want to ship this very soon. CW:
People can ship this without the RFC, if they need to.

Going to a poll: "Should we add labels to the report metadata?" (Yes:
8; No: 9; No opinion: 34)

Kunal Talwar: It's not possible to simulate this through multiple tasks;
the number of label combinations grows combinatorially.

TG: Might be moot, seeing the poll results. "Should we do plumbing in
order to do labels?"

Sam Weiler: do any of the 'No' voters want to come forward and say more?

Shan Wang: I think this opens more possibilities for an attack. I don't
think we should put it in the standard until we are sure of the impact.

Chris Wood: Can we put in a separate draft, as an extension, so it
doesn't block the main draft?

CP: Yes. Would anyone who wants this be against putting this in a
report extension?

Jonathan Hoyland: Does combining labels create more information than
just having multiple tasks? Labels can be bound together. CP: A report
is bound to a particular task. JH: So CW's suggestion that this is the
same as combining from multiple tasks is not true.

Issue #500: Agreement on task parameters. A desirable property of DAP is that honest parties executing a task agree on the task parameters. Three suggestions: derive task ID from task parameters; add specific parameters to AAD data; or specify that the application must implement some mechanism to accomplish this.
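
The first suggestion can be sketched as hashing a canonical encoding of the task parameters, so that any party holding the same parameters computes the same task ID. This is a minimal sketch under assumptions: the `TaskParams` fields, the JSON encoding and the use of SHA-256 are hypothetical choices for illustration, not what the draft or the task-prov extension specifies.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TaskParams:
    # Hypothetical subset of task parameters, for illustration only.
    leader_url: str
    helper_url: str
    vdaf: str
    min_batch_size: int
    time_precision: int


def derive_task_id(params: TaskParams) -> bytes:
    # Canonical encoding (sorted keys, no whitespace) so that every party
    # serializes identical parameters to identical bytes.
    encoded = json.dumps(asdict(params), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).digest()


params = TaskParams("https://leader.example", "https://helper.example",
                    "Prio3Count", 100, 3600)
task_id = derive_task_id(params)
print(task_id.hex())
```

Changing any parameter (for example `min_batch_size`) yields a different task ID, which is what makes the parameters auditable. On its own it does not stop an aggregator from behaving differently from what the parameters say.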

Nick Doty: is this really the same as issue 500?

CP: This is an attempt to concisely restate that problem. Want clients
to be aware of the task parameters too.

JH: is it required for the client to understand what the parameters are?
CP: Yes. Must be transparent to the client

CW: I like proposal 2 (AAD). In particular, not sure if this is
necessary for every deployment. You could build task provisioning on top
of this.

CP: The task-prov extension already accomplishes this. Don't think
there's anything to do there. Might be redundant work from that PoV.

TG: I think we should do nothing; as CP laid out, none of the proposals
here prevent an aggregator lying about what it's doing, as long as they
say the right thing. I don't know why we should put that into the spec.

Shan Wang: I support #1: (derive task ID). To respond to TG, it's not
about something being bullet-proof, it's about being auditable.

SF: I agree with SW. It's free on-the-wire.

CW, CP and SF discuss whether this is simpler on-the-wire, and what it
adds.

SW: transparency is the most important impact of this issue.

TG: CP, you can check my thinking here. Worried that if you ...

SF: that's what we get from option 2. TG: adding stuff to AAD is nice,
and doesn't add to message size.

CP: I think majority of people want to do nothing? TG: and underline
that task-prov allows you to do this if you want.

Issue #141: Recovery after batch mismatch.

Fatal state where leader and helper don't agree. We've taken steps to
mitigate this. Two options: do nothing further, or add error
correction.

SF: Does this prevent a DoS attack where you send a decryptable report
to the leader, but not the helper? CP: Yes. Does anyone object to doing
nothing?

Mike Bishop: if we can detect it, can we just drop the mismatched
reports? CP: No, we can't tell which reports don't match. MB: Can we
just restart the batch? CP: that would be messy.

TG: I think we need to know more about the mismatch scenarios, and what
could be done about them. What if the helper is mismanaging things to
try to induce a differencing attack? Think we need more concrete
information. CP: I agree, this is not fleshed out.

Issue #446: Cheaper checksum?

Proposal: make checksum 1) cheaper; 2) optional; 3) do nothing
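
For context, a sketch of the checksum under discussion, assuming the definition in the DAP drafts of this period: the bitwise XOR of the SHA-256 hash of each report ID in the aggregation. It lets the leader and helper detect a batch mismatch (issue #141) without exchanging full report ID lists. The function name is hypothetical.

```python
import hashlib


def batch_checksum(report_ids):
    """XOR together SHA-256(report_id) for every report ID in a batch.

    Sketch assuming the checksum definition described above; the function
    name is hypothetical.
    """
    acc = bytes(32)  # all-zero 32-byte accumulator (SHA-256 digest size)
    for report_id in report_ids:
        digest = hashlib.sha256(report_id).digest()
        acc = bytes(a ^ b for a, b in zip(acc, digest))
    return acc


leader_ids = [b"report-1", b"report-2", b"report-3"]
helper_ids = [b"report-3", b"report-1", b"report-2"]  # same set, different order
print(batch_checksum(leader_ids) == batch_checksum(helper_ids))  # True
```

XOR is order-independent, so both parties agree regardless of processing order, while a missing or extra report changes the result. Computing it costs one hash per report.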

TG: I have an implementation that does this, but I support the proposal
to take it out of the spec. Could put it in the header or metadata?
Don't mind leaving it in, as my spec already does it. Would lean
towards leaving it out. Don't think we have a super-strong argument for
keeping it.

SF: do nothing.

TG: then let's do nothing - no one is saying 'let's do something'.

SF: Pretty irrelevant in the grand scheme of DAP.

Outcome: do nothing.

Issue #472: Deviation from TLS syntax. Proposals: extend TLS syntax, fully comply as-is, or explain deviations.

TG: it's just a couple of places where we instantiate short literals. I
think it wouldn't take much space. Let's do it in DAP.

SF: Would be nice if we had it. Otherwise, let's just do it in DAP. (?).
CP: I don't want to add another dependency to this draft.

SW: I think we can probably resolve it by adding something in the text.

CW: TLS is very rigid about how wide things are on the wire. QUIC has
moved to variable-length encoding. Could we pivot the wire format of DAP
to something that supports something more flexible? This is a wider
trend. CP: This would be quite a bit of work. I don't object in
principle. Does anyone want to go back to square one? CW, can you take
it as an action?

SW: ???

CW: QUIC syntax means that a report error may not always be a constant
size. More space efficient. CP: we do a lot of 32-bit prefixing. We
didn't want to do 24-bit prefixing. Would benefit quite a lot in terms
of bandwidth. CW: Bandwidth is not the primary concern here. But let's
not bikeshed. CP: If we can resolve things like this, it's an
improvement. But CW can take a stab at it.
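
To make the bandwidth point concrete, the sketch below compares a fixed 32-bit length prefix with QUIC's variable-length integer encoding (RFC 9000, Section 16), in which the two most significant bits of the first byte select a 1-, 2-, 4- or 8-byte encoding. The helper names are hypothetical, and this is not a proposed DAP encoding.

```python
def encode_varint(v: int) -> bytes:
    # RFC 9000 varint: 2-bit length tag in the top bits of the first byte.
    if v < 0 or v >= 1 << 62:
        raise ValueError("value out of range for a QUIC varint")
    if v < 1 << 6:
        return v.to_bytes(1, "big")                          # tag 00
    if v < 1 << 14:
        return (v | 0x4000).to_bytes(2, "big")               # tag 01
    if v < 1 << 30:
        return (v | 0x8000_0000).to_bytes(4, "big")          # tag 10
    return (v | 0xC000_0000_0000_0000).to_bytes(8, "big")    # tag 11


def decode_varint(buf: bytes) -> tuple[int, int]:
    # Returns (value, number of bytes consumed).
    length = 1 << (buf[0] >> 6)
    if len(buf) < length:
        raise ValueError("truncated varint")
    v = buf[0] & 0x3F
    for b in buf[1:length]:
        v = (v << 8) | b
    return v, length


def encode_u32_prefix(payload: bytes) -> bytes:
    # Fixed-width 32-bit length prefix, TLS-syntax style.
    return len(payload).to_bytes(4, "big") + payload


def encode_varint_prefix(payload: bytes) -> bytes:
    return encode_varint(len(payload)) + payload


payload = bytes(50)
print(len(encode_u32_prefix(payload)), len(encode_varint_prefix(payload)))  # 54 51
```

The saving is at most a few bytes per length field, which adds up only when a message carries many prefixed fields.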

Issue #459

Do nothing, because the conversation would be too complicated.

Issue #450

Make this a PUT instead of a POST. No EKR here to defend this.

TG: PUT seems appropriate to me. CP: proposal is PUT, and add report ID
to the path.
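
Putting the report ID in the path makes the upload a PUT to a resource named by that ID, so repeating the request leaves the server in the same state. That is what makes the client retries mentioned in the discussion safe. The sketch is hypothetical: the path layout, the `ReportStore` class and the choice of 201/200/409 status codes are assumptions for illustration, not anything agreed in the session.

```python
from dataclasses import dataclass, field


def report_path(task_id: str, report_id: str) -> str:
    # Hypothetical resource layout with the report ID in the path.
    return f"/tasks/{task_id}/reports/{report_id}"


@dataclass
class ReportStore:
    # Maps a report path to the uploaded report body.
    reports: dict = field(default_factory=dict)

    def put(self, task_id: str, report_id: str, body: bytes) -> int:
        path = report_path(task_id, report_id)
        existing = self.reports.get(path)
        if existing is None:
            self.reports[path] = body
            return 201  # created
        if existing == body:
            return 200  # identical retry: no new state, same outcome
        return 409  # conflicting upload under the same report ID


store = ReportStore()
print(store.put("task-a", "r1", b"share"))  # 201
print(store.put("task-a", "r1", b"share"))  # 200 (client retry is harmless)
print(len(store.reports))  # 1
```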

Mark Nottingham: [question about GET]. CP: GET isn't defined for this.
MN: All HTTP resources should support GET. But PUT doesn't quite replace
this. CP: A second PUT isn't quite respected. TG: Not true, or clients
couldn't do a retry.

CP: Running out of time for other talks, so going to take this offline.

Differential Privacy in DAP

(Junye Chen's presentation)

Ben Case: Interested in the scope of the threat models here. Are you
interested in VDAFs which are three-party? Wanted to ask if that's
something that'd be useful to put in as a longer-term document. Happy to
volunteer to get something like that done.

Junye Chen: Happy to receive those contributions.

BC: E.g. where you have three aggregators, and all pairs of aggregators
are honest.

CP: I'd like to keep this valid to DAP; other proposals don't fit the
DAP architecture.

Haiguang Wang: [clarification on diagram]. How do you provide an upper
bound on accuracy, when one client is dishonest? JC: harder when
multiple dishonest clients. HW: If clients give you false data, how do
you know if the data is accurate or not? JC: It depends on deployment.
Here we just describe a number of honest/corrupted clients, and how that
impacts the privacy of the honest client(s).

Sam Weiler: yes, this is a different problem to accuracy.

HW: I will ask offline.

Ben Case, on Slide 13: I understand that privacy amplifications help
here, but what if you vary N. What would happen if you had a batch-size
of 10 million? JC: obviously you would have better privacy properties,
but the privacy budget would be wasted. BC: that's basically my point.
Shouldn't we keep the size in mind when doing those calculations?

Shan Wang: I think the batch size of 500,000 is the usual size we are
going to use. I think pure client randomization has better properties up
to ~1 million. KT: I don't think we have to waste privacy budget. On
aggregate we can get the same privacy.

TG: do we expect that the DP document published by the PPM WG will only
be usable by organisations that employ talented cryptographers? We
have to be careful here.

CP: yes, this is just advice on how one chooses epsilon. I'm new to DP.
We have to make it as accessible as possible, but we don't want to tell
people exactly how to calculate epsilon. We want big organisations that
employ experts to be able to do their best work. And then that can feed
back to IETF to make the internet better.

TG: I agree with CP. It's already challenging to do something
implementable that works. Guidance is even harder. Maybe IETF can do
that. But that's a hard responsibility to take on. Also, Nick Doty
raised that the draft is informational, and I'm not sure if I want it to
be experimental or normative. It is going to spell out nuts and bolts of
DP, so is informational appropriate? Multiple people would have to
depend on this.

Shan Wang: different sizes of N make this complicated.

JC: I agree - more considerations of effect of batch sizes on local DP
would be useful.

Martin Thomson: I don't think we should adopt this yet. Want more
clarity on this first.

KT: I think we should think of DP as 'I want to use cryptography' rather
than 'I want to use RSA'. It depends on the mechanisms and requirements.

CP: The primary value of this document is around establishing the shape
of the solution, and putting tools in our toolbag. It's not going to
solve all our problems. Could be an extension to DAP. We don't want this
to be 'the only thing we can do'.

Task Provisioning Extension to DAP (TaskProv)

(Shan Wang)

Simon Friedberger: I understand why you want to do it automatically
with a standardised API, but not why you want to do it in-band.

SW: doing it in-band gives benefits. Task parameter authentication
property. I agree we could achieve all the properties, except where we
need flexibility of the client. One appealing reason is that we can do
it all in-band, as opposed to separate extensions or an out-of-band
interface.

SF: I like the fact that clients could define a proposal. Is that
something you're planning on using?

SW: yes

Sam Weiler: to what extent does this solve CP's Issue #500? CP: it
solves it completely and elegantly. You might also want to solve it a
different way. It's sufficient for that issue.

Ben Schwartz: is this safe? It seems like I can give a unique task ID
for each user in the population.

SW: to do that, you need to be able to target a user with a particular
task config.

BS: It raises a bunch of issues around linking multiple reports from a
single user, because each user could be given a distinct task config.
This is especially problematic if the user is connecting directly (i.e.
not through a MASQUE or OHTTP proxy).

SW: How do you target a user?

BS: Even if you don't have an explicit link to another part of a user's
identity, you still have the ability to watch users across time as they
submit reports.

SW: Does this make that any worse? Or is that already a problem now,
using source IP, or some other identifier?

BS: certainly if you don't have any additional identity protections for
the user.

TG: a lot of talk about task transparency in a system that's all about
privacy. Aggregators can do the derivation with a certain set of
parameters, and then use a different one. My point is that this isn't
like certificate transparency - it doesn't provide you total
auditability. I have other points, but will take those to the list.

CP: Very interested in TG's ideas about how to improve this. BS, I don't
understand the tracking concern. Core DAP does not hide that someone is
participating in the protocol. It's just about protecting the data.

Sam Weiler: This isn't in charter. Do I need to talk to the ADs?

[no response.]

CP: seems like a pretty simple change.

SW: going to use a quick poll for adoption. [yes 8; no 0; no opinion
38]