
Minutes interim-2025-ppm-01: Tue 20:00
minutes-interim-2025-ppm-01-202503112000-00

Meeting Minutes Privacy Preserving Measurement (ppm) WG
Date and time 2025-03-11 20:00
State Active
Last updated 2025-03-12

PPM interim 2025-03-11

Notes by Chris P./Tim G.

DAP feature set (Chris Patton)

https://datatracker.ietf.org/meeting/interim-2025-ppm-01/materials/slides-interim-2025-ppm-01-sessa-dap-feature-set-00

Will cover the desired feature set of DAP and how it can be extended in the
future.
Initial use cases (from the BoF) were all about client-side telemetry

  • use cases like ENPA (2020) or Firefox Prio experiments back in ~2018

Post-BoF, new use cases emerged:

  • federated machine learning
  • private ad conversion measurement at W3C (PATCG/PATWG)

DAP's job is to privately compute some aggregation function over a set
of measurements

  • sometimes "refined" using an "aggregation parameter"
  • agg_result = F(agg_param, meas_1, meas_2, ..., meas_n)
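To make F concrete, here is a minimal sketch of one possible aggregation function computed in the clear. It is loosely modeled on the prefix counting used for heavy hitters (the kind of job Poplar1/Mastic target). The name `count_prefixes` and the use of bit-string measurements are illustrative assumptions. The aggregation parameter is the list of prefixes to count. DAP's job is to produce the same output without any single party seeing the individual measurements.

```python
from collections import Counter
from typing import Iterable, Sequence


def count_prefixes(agg_param: Sequence[str], measurements: Iterable[str]) -> dict[str, int]:
    """Hypothetical F: count how many measurements start with each prefix.

    The aggregation parameter (the prefix list) "refines" what is computed
    over the same set of measurements. This runs in the clear, with no privacy.
    """
    counts: Counter = Counter()
    for meas in measurements:
        for prefix in agg_param:
            if meas.startswith(prefix):
                counts[prefix] += 1
    return {prefix: counts[prefix] for prefix in agg_param}


print(count_prefixes(["00", "01", "1"], ["001", "010", "011", "110", "000"]))
# {'00': 2, '01': 2, '1': 1}
```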

DAP solves this using a 2-party MPC

  • privacy holds if at least one aggregator is honest

DAP consists of three interactions:

  • clients upload reports to the leader
  • the leader aggregates reports with the helper
  • the collector collects aggregate results from the aggregators (via
    the leader)
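The following minimal sketch shows why two aggregators suffice for privacy. It assumes plain additive secret sharing over a toy prime field. The modulus and all function names are hypothetical and are not Prio3's actual parameters. Each client splits its measurement into two shares. Each aggregator sums the shares it holds, and the collector adds the two aggregate shares. The sketch omits the validity proofs that make a VDAF robust against bogus measurements.

```python
import secrets

# Toy prime modulus (2^61 - 1 is a Mersenne prime); Prio3 uses its own fields.
MODULUS = 2**61 - 1


def shard(measurement: int) -> tuple[int, int]:
    """Client: split a measurement into two additive shares modulo MODULUS."""
    leader_share = secrets.randbelow(MODULUS)
    helper_share = (measurement - leader_share) % MODULUS
    return leader_share, helper_share


def aggregate(shares: list[int]) -> int:
    """Aggregator: sum the shares it holds. Each share alone is uniformly random."""
    return sum(shares) % MODULUS


def unshard(leader_agg: int, helper_agg: int) -> int:
    """Collector: add the two aggregate shares to recover the sum."""
    return (leader_agg + helper_agg) % MODULUS


reports = [shard(m) for m in [3, 1, 4, 1, 5]]
leader_agg = aggregate([leader for leader, _ in reports])
helper_agg = aggregate([helper for _, helper in reports])
print(unshard(leader_agg, helper_agg))  # 14
```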

A task captures the configuration of an instance of DAP:

  • what VDAF is being run
  • identities of some participants
  • various secrets

Security goals:

  • securely execute the VDAF

    • privacy if at least one aggregator is honest
    • leader and collector are frequently the same entity
    • robustness against bogus measurements
  • task agreement

    • agreement on task ID, but negotiation + distribution of task
      parameters is out of scope of DAP
  • replay protection

Non-goals:

  • differential privacy
  • Sybil attack resistance
    • injection of lots of fake data to undermine privacy goals
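Replay protection can be sketched as an aggregator refusing any (task ID, report ID) pair it has already processed. This assumes each report carries a unique report ID. `ReplayGuard` is a hypothetical name. A real deployment would also need to bound this state (for example, by expiring old IDs), which the sketch omits.

```python
class ReplayGuard:
    """Hypothetical aggregator-side state for rejecting replayed reports."""

    def __init__(self) -> None:
        self._seen: set[tuple[bytes, bytes]] = set()

    def accept(self, task_id: bytes, report_id: bytes) -> bool:
        """Return True the first time a (task ID, report ID) pair is seen, False on replays."""
        key = (task_id, report_id)
        if key in self._seen:
            return False  # replayed report: must not be aggregated twice
        self._seen.add(key)
        return True


guard = ReplayGuard()
print(guard.accept(b"task-1", b"report-a"))  # True
print(guard.accept(b"task-1", b"report-a"))  # False (replay)
print(guard.accept(b"task-2", b"report-a"))  # True (different task)
```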

Extension of DAP:

  • DAP is compatible with a class of VDAFs

    • Right now we have the Prio3 family
    • PINE, adapted for federated learning
    • Mastic, for attribute-based aggregation and approximate heavy
      hitters
    • New VDAFs can be defined in the future to solve new problems
  • New batch modes

    • DAP comes with time-interval and leader-selected batch modes
    • DAP has extension points for new batch modes
  • Report extensions

    • Clients can add more info to uploaded reports
    • taskprov/bind to gain assurance that everyone agrees on task params
    • reportauth for attaching a Privacy Pass token to a report
    • differential privacy budget parameters
  • Extensions to HTTP API
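The following sketch shows how report timestamps could be bucketed in the time-interval batch mode. It assumes, as in the DAP drafts, that timestamps are rounded down to a multiple of the task's time precision. It also assumes that a batch interval is half-open, [start, start + duration). Both helper names are illustrative.

```python
def truncate_timestamp(timestamp: int, time_precision: int) -> int:
    """Round a Unix timestamp (seconds) down to a multiple of time_precision."""
    return timestamp - timestamp % time_precision


def in_batch_interval(timestamp: int, start: int, duration: int) -> bool:
    """True if the timestamp falls in the half-open interval [start, start + duration)."""
    return start <= timestamp < start + duration


t = truncate_timestamp(1741723261, time_precision=3600)
print(t)  # 1741723200
print(in_batch_interval(t, start=1741723200, duration=3600))  # True
```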

Features we aren't using yet:

  • Maybe stuff we could take out
  • Multi-step/round VDAFs

    • Prio3, Mastic, PINE are all single-round
    • Chris says we should keep it because arithmetic sketching might
      have really neat applications
    • Tim also thinks we should keep this, and reckons implementations
      can just not implement multi-round VDAFs if they don't want them
  • Aggregation parameter

    • Not used by Prio3 or PINE; Mastic does use it
    • Chris says we should keep it
    • Tim also thinks we should keep this
  • Batch modes

    • Technically, leader-selected covers all possible use cases, since
      it lets the leader construct arbitrary batches
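The claim that leader-selected batching covers the other cases can be illustrated with a sketch. The leader may group reports by any key it likes and cut each group into batches of `min_batch_size`. Here the key is the hour of the timestamp, which mimics time-interval batching. The function name, the report shape, and the simplification to fixed-size batches are assumptions for illustration.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, TypeVar

R = TypeVar("R")


def leader_selected_batches(
    reports: Iterable[R],
    key: Callable[[R], Hashable],
    min_batch_size: int,
) -> list[list[R]]:
    """Group reports by an arbitrary key, then cut each group into full batches."""
    groups: dict[Hashable, list[R]] = defaultdict(list)
    for report in reports:
        groups[key(report)].append(report)
    batches: list[list[R]] = []
    for group in groups.values():
        usable = len(group) - len(group) % min_batch_size
        for start in range(0, usable, min_batch_size):
            batches.append(group[start:start + min_batch_size])
        # Reports beyond `usable` would wait for a later batch; this sketch drops them.
    return batches


# Reports are (timestamp, payload) pairs; keying by hour mimics time-interval batching.
reports = [(0, "a"), (10, "b"), (3600, "c"), (3601, "d"), (3700, "e"), (20, "f")]
print(leader_selected_batches(reports, key=lambda r: r[0] // 3600, min_batch_size=2))
# [[(0, 'a'), (10, 'b')], [(3600, 'c'), (3601, 'd')]]
```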

Rejected features:

  • secure aggregation beyond VDAFs

    • general purpose MPC (too slow)
  • multiple helpers (aka >2 aggregators)

    • weaker trust model, but too hard to implement or specify
  • multiple collections of the same batch

    • varying aggregation parameter
    • needed for heavy hitters via Poplar1/Mastic
    • too much protocol complexity, privacy leaks we can't fix
  • batched preparation

Aligning DAP with RFC 9205 / BCP 56 (Tim)

  • RFC 9205 (BCP 56): Building Protocols with HTTP
  • Recent changes to DAP

    • Catch up to latest VDAF draft
    • Async aggregation

      • Why? To reduce the burden on the Helper observed in production:
        HTTP requests shouldn't take too long
        • High computation (big histograms)
        • Resource contention (waiting to read/write to a row in a
          DB)
    • Align collection job handling logic with aggregation job handling

  • Goals:

    • There are some guidelines from 9205 that we should follow

      • Tim: I plan to do PRs to do just that
        • Editorial changes to the draft; HTTP error codes (no
          other changes)
    • Long-running operations need polishing

      • E.g., agg share request is synchronous, but a reader might
        expect it to be asynchronous given that every other
        interaction is asynchronous
    • Potentially change wire formats

  • Observations from aggregation job step

    • the specified status code is 200

      • Ben Schwartz: RFC 9205 says the server is free to choose
        between 200 and 201
        • Tim G.: Yes. Let's also look at the ACME RFC for
          inspiration
    • response body contains an enum discriminant, which you use to
      tell whether the job is processing or ready

      • empty response body is sufficient
  • Collection jobs

    • Same observations as for aggregation jobs
    • Sync is forbidden, whereas aggregation jobs may be sync
  • Aggregate share request

    • Async is forbidden (the opposite problem from collection jobs!)
  • In summary, we need:

    • to change response codes
    • to distinguish "ready" from "processing" with an empty response
      body
    • Consistent semantics for long-running operations
      • Follow CRUD (Create, Read, Update, Delete)
      • We don't need asynchronous upload
  • Proposed changes

    • aggregation: tweak HTTP status codes and message formats
    • agg share:

      • Add async mode
      • Require an identifier (an "agg share ID")
    • collection: add async mode

  • When responding to a request, the server gets to choose whether to
    handle it synchronously or asynchronously.

  • Discussion

    • Ben S.: Let's repeat the HTTPDIR review after this process

      • Tim G.: Agreed, but there are a couple of design decisions
        we've made that we think are right and that they might
        disagree with.
        • Ben S.: Let's not assume that we're going to be told to
          make those changes.
    • Chris P.: I'm on board, but we should check with implementers to
      see if they're willing to implement these changes.

    • Tim G.: My next step is to catch up on ACME and learn from their
      experience
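The following minimal client-side sketch illustrates the proposed long-running-operation semantics. It assumes that any 2xx status means success. It also assumes that an empty response body means "still processing", while a non-empty body carries the result. `Response`, `run_job`, and the `create`/`poll` callables are hypothetical stand-ins for HTTP requests, not DAP's API. The same loop handles both kinds of server: one that answers synchronously (body in the first response) and one that answers asynchronously (empty body, then polling).

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response."""
    status: int
    body: bytes


def run_job(
    create: Callable[[], Response],
    poll: Callable[[], Response],
    max_polls: int = 10,
    interval: float = 0.0,
) -> bytes:
    """Create a job, then poll until the server returns a non-empty body."""
    resp = create()  # e.g. a PUT that creates the job
    for attempt in range(max_polls + 1):
        if not 200 <= resp.status < 300:
            raise RuntimeError(f"job failed with HTTP status {resp.status}")
        if resp.body:
            return resp.body  # non-empty body: the result is ready
        if attempt == max_polls:
            break
        time.sleep(interval)  # empty body: still processing
        resp = poll()  # e.g. a GET on the job's URL
    raise TimeoutError(f"job still processing after {max_polls} polls")


# Synchronous server: the result arrives with the creation response.
print(run_job(lambda: Response(201, b"agg share"), lambda: Response(200, b"")))

# Asynchronous server: empty bodies until the job finishes.
polls = iter([Response(200, b""), Response(200, b"agg share")])
print(run_job(lambda: Response(201, b""), lambda: next(polls)))
```

A real client would likely honor a `Retry-After` header rather than a fixed interval.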

DAP bulk uploads (Alex K.)

  • PPA (W3C): Proposal to use DAP for measuring ad campaigns
  • Involves an intermediary between clients and leader
  • Intermediary has millions of reports it wants to upload
  • two design questions:

    • should this be its own draft or in the core protocol
    • how to deal with "partial failures"? Request may fail while the
      Leader is ingesting the data (could be gigabytes!)
  • Discussion

    • Tim G.:

    • Chris P.:

      • Let's consider adopting async semantics, like collection or
        aggregation flows
      • Not sure we should wait on resumable upload getting
        implemented

        • Alex K.: It might be nice to allow the client to upload
          one stream
      • If we take this in DAP, then I agree let's merge the APIs

    • Ben S.:

      • resumable upload is higher complexity than we need
      • question about atomicity: it's going to be important for
        each uploaded batch to be acknowledged (or rejected)
        • this probably means you can't process in a streaming
          fashion
          • this probably means you need to agree on a maximum
            size
    • Tim: Let's look at AWS S3 for inspiration

      • Alex K.: This API is too heavyweight for this use case
        • Ben S.: Agree.
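The following sketch shows the bounded, per-chunk-acknowledged upload Ben S. describes. It assumes an agreed maximum chunk size. It also assumes a hypothetical `send` callable that returns True when the leader acknowledges a chunk and False when it rejects it. Each chunk succeeds or fails atomically, so a partial failure only requires retrying the affected chunks rather than re-sending the whole upload.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunks(reports: Sequence[T], max_size: int) -> Iterator[Sequence[T]]:
    """Split reports into consecutive chunks of at most max_size reports."""
    for start in range(0, len(reports), max_size):
        yield reports[start:start + max_size]


def bulk_upload(
    reports: Sequence[T],
    send: Callable[[Sequence[T]], bool],
    max_size: int,
    max_attempts: int = 3,
) -> list[int]:
    """Upload chunk by chunk; return the indices of chunks never acknowledged."""
    failed = []
    for index, chunk in enumerate(chunks(reports, max_size)):
        for _ in range(max_attempts):
            if send(chunk):  # the leader acknowledged the whole chunk
                break
        else:
            failed.append(index)  # rejected on every attempt
    return failed


# Example: the chunk starting at report 4 is always rejected.
print(bulk_upload(list(range(10)), lambda chunk: chunk[0] != 4, max_size=4))  # [1]
```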

ECDH-PSI (Wang Yu Chen)

  • Implemented by industry

    • PrivateID - Meta
    • PETAce - TikTok
    • Private Join / Compute - Google
    • secretflow-PSI - Ant Group
    • Matching between different internal offices (US DoE)
    • Detecting leaked passwords (Microsoft)
    • CSAM (Apple)
    • Match tax code (Bank of Italy, ISTAT)
    • Official UN statistics (UN)
  • Use cases

    • Determine cardinality / intersection for common customers w/o
      sharing data directly

      • join marketing, finding dormant credit card users, blocklist
        matching
    • Adopted in China: Taobao, Alipay, Bytedance, China Unionpay

  • Derived from Meadows86, instantiated with elliptic curves

    • Records are converted to points on the curve, then masked with
      both parties' secret keys via scalar multiplication
    • If r_a = r_b, both records map to the same doubly masked point
  • Security model

    • Semi-honest model, often considered secure
    • Looked at the malicious model
    • Reusing a masked message from one session in another session

      • TLS channel binding mitigates this
    • Using a KDF to truncate the binary representation of EC points

    • Quantum:

      • Ben Schwartz: Is it subject to post-collection decryption?
      • Wang Yu Chen: Yes
    • Deb (AD): Given wide industry use, why does this need a new
      standard?

      • Wang Yu Chen: Existing implementations are not
        interoperable.
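The following toy sketch shows the Meadows-style double-masking protocol, with both parties run in a single process for illustration. It substitutes modular exponentiation in Z_p^* (with p = 2^127 - 1) for elliptic-curve scalar multiplication. It also uses a naive hash-to-group, so it is insecure and only shows the structure. Each record is mapped to a group element, then masked by party A's key and by party B's key. Because (h^a)^b = (h^b)^a, equal records end up equal, and a truncated SHA-256 of the result stands in for the KDF. All names and parameters are assumptions.

```python
import hashlib
import math
import secrets

# Toy modulus: the Mersenne prime 2^127 - 1. NOT a secure choice; real ECDH-PSI
# uses an elliptic curve group instead of Z_P^*.
P = 2**127 - 1


def hash_to_group(record: str) -> int:
    """Naive hash into Z_P^* (a stand-in for hashing to a curve point)."""
    h = int.from_bytes(hashlib.sha256(record.encode()).digest(), "big") % P
    return h or 1  # 0 is not in the multiplicative group


def keygen() -> int:
    """Secret exponent coprime to P - 1, so x -> x^k is a bijection on Z_P^*."""
    while True:
        k = secrets.randbelow(P - 2) + 1
        if math.gcd(k, P - 1) == 1:
            return k


def mask(element: int, key: int) -> int:
    return pow(element, key, P)


def kdf(element: int) -> bytes:
    """Truncate a hash of the masked element (16 bytes is an assumption)."""
    return hashlib.sha256(element.to_bytes(16, "big")).digest()[:16]


def psi(records_a: list[str], records_b: list[str]) -> list[str]:
    """Both parties simulated in one process; party A learns the intersection."""
    key_a, key_b = keygen(), keygen()
    # A -> B: A's records masked once with key_a.
    a_once = [mask(hash_to_group(r), key_a) for r in records_a]
    # B -> A: the same values masked again with key_b, order preserved.
    a_twice = [mask(x, key_b) for x in a_once]
    # B -> A: B's records masked once with key_b; A then applies key_a.
    b_tags = {kdf(mask(mask(hash_to_group(r), key_b), key_a)) for r in records_b}
    # (h^a)^b == (h^b)^a, so equal records produce equal tags.
    return [r for r, x in zip(records_a, a_twice) if kdf(x) in b_tags]


print(psi(["alice", "bob", "carol"], ["bob", "dave", "carol"]))  # ['bob', 'carol']
```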