# IETF-118-PPM {#ietf-118-ppm}

\[Note-taker Michael B\]

## Administrivia - Chairs {#administrivia---chairs}

* Note-well
* Milestone (Dec 2023) may be missed, but WG is making good progress.

## DAP Open Issues Review {#dap-open-issues-review}

Chris Patton. Five issues all corresponding to the same underlying feature - collecting a report multiple times with different aggregation parameters. Deemed safe as long as it meets the requirements of the VDAF draft. No one has implemented this.

Simon Friedberger: Still want to support heavy hitters. I will implement it some time next year unless someone does this first.

Tim Geoghegan: I will also implement this. Would like Heavy Hitters and this feature to be left in.

### Issue #519: How should batching strategies be handled? Supporting multiple query types adds complexity. {#issue-519-how-should-batching-strategies-be-handled-supporting-multiple-query-types-adds-complexity}

Tim Geoghegan: I proposed moving the time-interval query type. Don't want to restate those arguments poorly, but you might need it to realign data you get from DAP and other methods. It's not about removing time-interval, or having something equivalent. I think it's about making the helper agnostic to query types.

Chris Patton: that's right - it's about removing it from the protocol altogether.

TG: I think you need to think about when the leader should reject uploads based on time intervals.

CP: I think we could do that.

Shan Wang: If we are removing the interval type, could we relax the fixed size mechanism?

CP: we could make it optional, or remove it, any preference?

SW: I think we could make it optional, if we don't care.

Simon Friedberger: Happy with performance improvements, but also happy to do nothing at the moment.

CP: Mozilla is using this? Daphne is using this.

Tim Geoghegan: Simon's comment jogged my memory; if time interval gets downgraded, then implementations still have to deal with maintaining the code that does it.
So taking this out might not simplify implementations if they still need it for particular purposes.

Ben Schwartz: Is there a simplification if a task commits to one query type or another, rather than both?

CP: It already does.

CP: I think people want to do nothing for now, and acknowledge that some implementations may not implement some query types.

## Issue #489: Supporting drill-down {#issue-489-supporting-drill-down}

Breaking down the aggregate result by arbitrary labels. Something akin to SQL. If clients label reports, then it might allow fingerprinting. Don't want to deanonymise clients with a specific label. But also want to enforce minimum batch size, to ensure a certain size of anonymity set. This makes things a bit more complicated. We could add labels to report metadata, at the risk of fingerprinting. Or, at CFRG we will talk about 'mastic', which might allow this, but not be sufficiently flexible.

Tim Geoghegan: if doing mastic, is that reflected in the aggregation parameter?

CP: We need to get the aggregation parameter right.

TG: That's true for a lot of properties.

CP: It's not true for every type of query pattern we might not

Shan Wang: Creating one task per label?

CP: As Tim points out, it doesn't scale well. This might be easy in task-prov, but not to spin up a new task for every label you might have.

Chris Wood: we've faced the same problem in Privacy Pass. Don't see any problem with the first option (adding labels to metadata). It's up to deployments to figure out the overall privacy story, and whether it affects privacy. We should do both options.

Kunal Talwar: Without something like this, the client is assured that their data isn't going to be subject to a differencing attack.

Ben Case: We have a use case where we would like some of these labels. In most cases, it's not the fingerprinting that's the issue. Shifting towards DP and DAP should solve these issues around reidentification attacks.
We would like to support one report having many labels, while respecting DP. So +1 to this being useful functionality, and happy to help think about the details.

TG: this could be very valuable functionality. But there are a lot of sharp edges to this cryptography. You need to consult with good cryptographers to work this out. Not sure what the answer is.

Simon Friedberger: if you want to do fingerprinting, then it's not that important, because the leader can see IPs.

CP: If we go with labels on metadata, then there's still a lot of thinking to do. We could do what Ben says, and relax that a bit. But have to be careful.

Shan Wang: There are other ways to fingerprint, but other technologies can be used to mitigate them. Labelling is intrinsic to the whole aggregation process, making it harder to reduce fingerprinting.

Chris Wood: Can still do labelling through tasks. This is purely aesthetic.

CP: So we should implement the wire-change now, and use it if we need to. People are out there who want to ship this very soon.

CW: People can ship this without the RFC, if they need to.

**Going to a poll: "Should we add labels to the report metadata?" (Yes: 8; No: 9; No opinion: 34)**

Kunal Talwar: It's not possible to simulate this through multiple tasks, combinatorially.

TG: Might be moot, seeing the poll results. "Should we do plumbing in order to do labels".

Sam Weiler: do any of the 'No' voters want to come forward and say more?

Shan Wang: I think this opens more possibilities for an attack. I don't think we should put it in the standard until we are sure of the impact.

Chris Wood: Can we put it in a separate draft, as an extension, so it doesn't block the main draft?

CP: Yes. Would anyone who wants this be against putting this as a report-extension?

Jonathan Hoyland: Does combining labels create more information than just having multiple tasks? Labels can be bound together.

CP: A report is bound to a particular task.
JH: So CW's suggestion that this is the same as combining from multiple tasks is not true.

### Issue #500 Agreement on task parameters. A desirable property of DAP is that honest parties executing a task agree on the task parameters. Three suggestions: derive task ID from task parameters; add specific parameters to AAD data; or specify that the application must implement some mechanism to accomplish this. {#issue-500-agreement-on-task-parameters-a-desirable-property-of-dap-is-that-honest-parties-executing-a-task-agree-on-the-task-parameters-three-suggestions-derive-task-id-from-task-parameters-add-specific-parameters-to-aad-data-or-specify-that-the-application-must-implement-some-mechanism-to-accomplish-this}

Nick Doty: is this really the same as issue 500?

CP: This is an attempt to concisely restate that problem. Want clients to be aware of the task parameters too.

JH: is it required for the client to understand what the parameters are?

CP: Yes. Must be transparent to the client.

CW: I like proposal 2 (AAD). In particular, not sure if this is necessary for every deployment. You could build task provisioning on top of this.

CP: task-prov extension already accomplishes this. Don't think there's anything to do there. Might be redundant work from that PoV.

TG: I think we should do nothing; as CP laid out, none of the proposals here prevent an aggregator lying about what it's doing, as long as they say the right thing. I don't know why we should put that into the spec.

Shan Wang: I support #1 (derive task ID). To respond to TG, it's not about something being bullet-proof, it's about being auditable.

SF: I agree with SW. It's free on-the-wire.

CW, CP and SF discuss whether this is simpler on-the-wire, and what it adds.

SW: transparency is the most important impact of this issue.

TG: CP, you can check my thinking here. Worried that if you ...

SF: that's what we get from option 2.

TG: adding stuff to AAD is nice, and doesn't add to message size.
CP: I think the majority of people want to do nothing?

TG: and underline that task-prov allows you to do this if you want.

### Issue #141. Recovery after batch mismatch. {#issue-141-recovery-after-batch-mismatch}

Fatal state where leader and helper don't agree. We've taken steps to mitigate this. Two options: Can do nothing else, but could add error correction.

SF: does this forbid a DoS attack where you send a decryptable report to the leader, but not the helper?

CP: yes. Does anyone object to doing nothing?

Mike Bishop: if we can detect it, can we just drop the mismatched reports?

CP: No, we can't tell which reports don't match.

MB: Can we just restart the batch?

CP: that would be messy.

TG: I think we need to know more about the mismatch scenarios, and what could be done about them. What if the helper is mismanaging things to try to induce a differencing attack? Think we need more concrete information.

CP: I agree, this is not fleshed out.

### Issue #446: Cheaper checksum? {#issue-446-cheaper-checksum}

Proposal: make checksum 1) cheaper; 2) optional; 3) do nothing

TG: I have an implementation that does this, but I support the proposal to take it out of the spec. Could put it in the header or metadata? Don't mind leaving it in, as my spec already does it. Would lean towards leaving it out. Don't think we have a super-strong argument for keeping it.

SF: do nothing.

TG: then let's do nothing - no one is saying 'let's do something'.

SF: pretty irrelevant in the grand scheme of DAP.

Outcome: do nothing.

### Issue #472. Deviation from TLS-syntax. Proposals: extend TLS syntax, fully comply as-is, or explain deviations. {#issue-472-deviation-from-tls-syntax-proposals-extend-tls-syntax-fully-comply-as-is-or-explain-deviations}

TG: it's just a couple of places where we instantiate short literals. I think it wouldn't take much space. Let's do it in DAP.

SF: Would be nice if we had it. Otherwise, let's just do it in DAP. (?).
CP: I don't want to add another dependency to this draft.

SW: I think we can probably resolve it by adding something in the text.

CW: TLS is very rigid about how wide things are on the wire. QUIC has moved to variable length encoding. Could we pivot the wire format of DAP to something that supports something more flexible? This is a wider trend.

CP: this would be quite a bit of work.

CP: I don't object in principle. Does anyone want to go back to square one? CW, can you take it as an action?

SW: ???

CW: QUIC syntax means that a report error may not always be a constant size. More space efficient.

CP: we do a lot of 32-bit prefixing. We didn't want to do 24-bit prefixing. Would benefit quite a lot in terms of bandwidth.

CW: Bandwidth is not the primary concern here. But let's not bikeshed.

CP: If we can resolve things like this, it's an improvement. But CW can take a stab at it.

### Issue #459 {#issue-459}

Do nothing, because the conversation would be too complicated.

### Issue #450 {#issue-450}

Make this a PUT instead of a POST. No EKR here to defend this.

TG: PUT seems appropriate to me.

CP: proposal is PUT, and add report ID to the path.

Mark Nottingham: \[question about GET\].

CP: GET isn't defined for this.

MN: All HTTP resources should support GET. But PUT doesn't quite replace this.

CP: A second PUT isn't quite respected.

TG: Not true, or clients couldn't do a retry.

CP: Running out of time for other talks, so going to take this offline.

## Differential Privacy in DAP {#differential-privacy-in-dap}

(Junye Chen's presentation)

Ben Case: Interested in the scope of the threat models here. Are you interested in VDAFs which are three-party? Wanted to ask if that's something that'd be useful to put in as a longer-term document. Happy to volunteer to get something like that done.

Junye Chen: Happy to receive those contributions.

BC: E.g. where you have three aggregators, and all pairs of aggregators are honest.
CP: I'd like to keep this valid to DAP; other proposals don't fit the DAP architecture.

Haiguang Wang: \[clarification on diagram\]. How do you provide an upper bound on accuracy, when one client is dishonest?

JC: harder when multiple dishonest clients.

HW: If clients give you false data, how do you know if the data is accurate or not?

JC: It depends on deployment. Here we just describe a number of honest/corrupted clients, and how that impacts the privacy of the honest client(s).

Sam Weiler: yes, this is a different problem to accuracy.

HW: I will ask offline.

Ben Case, on Slide 13: I understand that privacy amplifications help here, but what if you vary N? What would happen if you had a batch-size of 10 million?

JC: obviously you would have better privacy properties, but the privacy budget would be wasted.

BC: that's basically my point. Shouldn't we keep the size in mind when doing those calculations?

Shan Wang: I think the batch size of 500,000 is the usual size we are going to use. I think pure client randomization has better properties up to ~1 million.

KT: I don't think we have to waste privacy budget. On aggregate we can get the same privacy.

TG: do we expect that the DP document published by the PPM WG will only be usable by organisations that employ talented cryptographers? We have to be careful here.

CP: yes, this is just advice on how one chooses epsilon. I'm new to DP. We have to make it as accessible as possible, but we don't want to tell people exactly how to calculate epsilon. We want big organisations that employ experts to be able to do their best work. And then that can feed back to IETF to make the internet better.

TG: I agree with CP. It's already challenging to do something implementable that works. Guidance is even harder. Maybe IETF can do that. But that's a hard responsibility to take on. Also, Nick Doty raised that the draft is informational, and I'm not sure if I want it to be experimental or normative.
It is going to spell out the nuts and bolts of DP, so is informational appropriate? Multiple people would have to depend on this.

Shan Wang: different sizes of N make this complicated.

JC: I agree - more considerations of effect of batch sizes on local DP would be useful.

Martin Thomson: I don't think we should adopt this yet. Want more clarity on this first.

KT: I think we should think of DP as 'I want to use cryptography' rather than 'I want to use RSA'. It depends on the mechanisms and requirements.

CP: primary value of this document is around establishing the shape of the solution, and putting tools in our toolbag. It's not going to solve all our problems. Could be an extension to DAP. We don't want this to be 'the only thing we can do'.

## Task Provisioning Extension to DAP (TaskProv) {#task-provisioning-extension-to-dap-taskprov}

(Shan Wang)

Simon Friedberger - I understand why you want to do it automatically with a standardised API, but not why you want to do it in-band?

SW: doing it in-band gives benefits. Task parameter authentication property. I agree we could achieve all the properties, except where we need flexibility of the client. One appealing reason is that we can do it all in-band, as opposed to separate extensions or an out-of-band interface.

SF: I like the fact that clients could define a proposal. Is that something you're planning on using?

SW: yes.

Sam Weiler: to what extent does this solve CP's Issue #500?

CP: it solves it completely and elegantly. You might also want to solve it a different way. It's sufficient for that issue.

Ben Schwartz: is this safe? It seems like I can give a unique task ID for each user in the population.

SW: to do that, you need to be able to target a user with a particular task config.

BS: It raises a bunch of issues around linking multiple reports from a single user, because each user could be given a distinct task config. This is especially problematic if the user is connecting directly (i.e.
not through a MASQUE or OHTTP proxy).

SW: How do you target a user?

BS: Even if you don't have an explicit link to another part of a user's identity, you still have the ability to watch users across time as they submit reports.

SW: Does this make that any worse? Or is that already a problem now, using source IP, or some other identifier?

BS: certainly if you don't have any additional identity protections for the user.

TG: a lot of talk about task transparency in a system that's all about privacy. Aggregators can do the derivation with a certain set of parameters, and then use a different one. My point is that this isn't like certificate transparency - it doesn't provide you total auditability. I have other points, but will take those to the list.

CP: Very interested in TG's ideas about how to improve this. BS, I don't understand the tracking concern. Core DAP does not hide that someone is participating in the protocol. It's just about protecting the data.

Sam Weiler: This isn't in charter. Do I need to talk to the ADs? \[no response.\]

CP: seems like a pretty simple change.

SW: going to use a quick poll for adoption. \[yes 8; no 0; no opinion 38\]
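
Editor's illustration (not part of the meeting record): the idea behind option 1 of Issue #500, deriving the task ID from the task parameters, can be sketched as below. This is a minimal sketch under stated assumptions, not the TaskProv wire format: the parameter names, the sorted-key JSON canonicalisation, and the `derive_task_id` helper are all hypothetical and chosen only for the example. It shows that parties hashing the same parameters arrive at the same ID, and that changing any parameter changes the ID.

```python
import hashlib
import json


def derive_task_id(task_params: dict) -> bytes:
    """Return a 32-byte ID computed from a canonical encoding of the parameters.

    Hypothetical helper: sorted-key JSON stands in for whatever canonical
    serialisation a real deployment would agree on.
    """
    canonical = json.dumps(task_params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).digest()


# Illustrative parameter names only; not taken from the DAP or TaskProv drafts.
leader_view = {"vdaf": "Prio3Count", "min_batch_size": 500, "query_type": "fixed_size"}
client_view = {"query_type": "fixed_size", "vdaf": "Prio3Count", "min_batch_size": 500}
tampered_view = dict(leader_view, min_batch_size=1)

# Same parameters in a different key order hash to the same ID.
same_id = derive_task_id(leader_view) == derive_task_id(client_view)
# Altering a single parameter yields a different ID.
changed_id = derive_task_id(leader_view) != derive_task_id(tampered_view)
```

As TG noted in the discussion, agreeing on an ID derived this way does not stop an aggregator from advertising one set of parameters and then using another; it only makes a mismatch in the advertised parameters detectable.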