Chair welcome ("PearG"): Note well / Wear masks, in person. ## Draft updates (5 mins) {#draft-updates-5-mins} * RG draft statuses * IP Address Privacy Considerations: * No recent updates since the last meeting, but updates coming soon * Censorship: * Recent update * Numeric IDs * Sent to RFC editor * Safe Internet measurements: * Review * Maybe interesting for PPM, as well ## Presentations (100 mins) {#presentations--100-mins} * Interoperable Private Attribution (Martin Thomson) - 30 mins * Attribtion: important piece of the ad industry * Trains! * Let's talk about the Tokyo subway system * Actually, let's talk about identifiers, like access cards (e.g., PASMO) * Using passenger tracking for the purpose of capacity planning, performance, etc. * Specifically, for systems that track when a person enters the system and when the person exits * But logs are a privacy risk and can be used for other purposes, even if they are inherently pseudonymous - identities could be linked. * Can we create a design that aggregates the data that's interesting, and provides individual privacy? * One design is using tokens with buckets * Tokens need to be: * anonymous * authenticated * time-delayed "opening"/redemption * ephemeral * Moving on to advertising * Attribution: information from one context and linking it in a different context * Answer a question: "How many people saw the ad, then came to the show?" * Understanding whether certain advertising is working: * good placment * creatives * how much to spend * how long to run campaigns * Current, cross-context attribution allows linking people across contexts * With advertising, the context is everything: * Whether an ad was shown, and if that ad was clicked * Was a product puchased, or not * where was the ad shown * Interoperable Private Attribtion (IPA) * People have an identifier (significant protections against revealing the identifier) * Sites can request an encrypted and secret-share of that identifier * Sites have a view of the identifier, but it's not linkable cross-site * Attribution in MPC (multi-party computation) * sites gather events * MPC decrypts identifiers and performs attribution * aggregated results are the output (histogram) * MPC does not, itself, see the original query * MPC: * Any computation if you only need addition and multiplication * It can be expensive * IPA uses a three-party, honest-majority threat model * Differential Privacy * (epsilon, delta)-DP for hiding individual contributions * Every site gets a query budget that renews each epoch (e.g., week) * This does provide leakage across time (epochs), more research needed in this area * Parameters are not fixed yet * Client's encrypted identifiers are bound to a site, they are bound to: * the site that requested them * the epoch/week they are requested * the type of event: source (ad), trigger (purchase) * IPA: advances and challenges * IPA's flexibility provides somewhat of a drop-in replacment for current anti-fraud systems * IPA's flexibility hurts accountability * Existing challenge in making the system auditable * MPC performance is a challenge, especially at the scale of 10s of billions * Status: Good progress, overall, but still requires research in some areas * Currently running some synthetic trials * Ongoing work in W3C working groups, protocol may come to PPM in the future * Brian Trammel: MPC performance is a challenge. Computation or communication complexity? 
    * MT: A lot is algorithmic (linear), and some of that will likely improve, but much of it is communication cost. Originally, records were on the order of ~40GB; it's still multiple gigabytes in size
    * Chris Wood: 1) What was the MPC functionality you needed (as defined by the existing adtech industry)? 2) Now that the functionality is defined, how do you implement it? How did you reach this design?
    * MT: Need more time. Lots of people took the steps to get here. Apple's PCM took an initial approach. This is mostly about understanding how the advertising industry uses measurement as a core part of its processes. There is a "need" vs. "want" difference of perspective between the parties, and those discussions are ongoing. If you add cross-device attribution, it gets more complicated.
    * CW: There is an academic research community that has spent a lot of time designing MPC protocols. There seems to be some overlap and collaboration opportunity here.
    * Shivan: Who would run the servers in the MPC protocol?
    * MT: We need to trust them not to collude - to be determined
    * Jonathan Hoyland: If it's run by a third party that is running an auction, what are the guarantees that they're actually running the MPC protocol?
    * MT: Currently leaning on oversight/auditing.
    * JH: Can the response include a proof?
    * MT: Recently asked whether verifiable MPC was considered - but VMPC is not ready yet. So "trust and verify" is the current approach
* Secure Partitioning Protocols (Phillipp Schoppmann) - 20 mins
    * Let's go into more detail on scaling aggregation computations
        * Billions of impressions from billions of clients
        * All clients submit their reports to the MPC cluster
        * The MPC outputs the aggregate results
    * Goals
        * When sharding the MPC cluster, every client must use the same shard
        * We need a private mechanism that maps a given client to the same shard every time
        * This should have low communication cost
        * Correctness must not be affected
    * Assumptions:
        * Bound on the number of contributions
        * Many clients, fewer shards
    * Blueprint: partitioning from distributed OPRFs
        * The client has an index (i) and a payload (v)
        * One server (server 1) holds an OPRF key
        * The other server (server 2) learns the result of the OPRF computation
        * Server 1 must add some padding queries
        * Server 2's OPRF output is used to map the client to a target partition
    * Dense partitioning: OPRF output = shard ID
        * If there is only a small set of shards, this is reasonable
    * Sparse partitioning: OPRF output = random client ID
        * Can the client's reports be aggregated before the MPC computation?
        * This doesn't create a client identifier, because server 1 pads the set of known client identifiers with dummy values, so server 2 can't distinguish real users from fake users
    * How can the sparse histogram be private without seeing the actual histogram?
        * View the output of the OPRF as a histogram
        * Make sure frequency can't be linked to specific users
        * Choose a threshold; below the threshold, add dummy values; above the threshold, \[..\] (?)
    * Conclusion: efficient for these use cases
    * Next steps: Is there general interest? Are there other protocols where this might be useful? Are there other properties that are needed?
    * Chris Patton: Definitely interesting, but maybe not as an independent draft
    * PS: So, add this into individual drafts instead of making a general-purpose protocol?
    * CP: Yes
    * Martin Thomson: The bounds seem to be fundamental. How confident are you that these are required costs?
    * PS: The numbers are not an absolute lower bound; they are based on the current design described in this presentation
    * MT: IPA may not be able to set an upper bound on the number of contributions, for example due to a Sybil attack
    * PS: Any party can create reports, but fraudulent reports may be able to be filtered out downstream
* DP3T: Deploying decentralized, privacy-preserving proximity tracing (Wouter Lueks) - 25 mins
    * DP-3T started in March 2020; first draft in May 2020; from September 2020 to summer 2021, work on presence tracing
    * Non-traditional academic environment - scaling to millions of users on a short timescale
    * Relying on existing infrastructure had a large impact
    * The systems were designed to be purpose-built and not re-usable for other purposes
    * Risks associated with digital contact tracing:
        * embeds the social contact graph
        * location tracing
        * medical information
        * social interactions
        * social control risk
    * Time has shown what can go wrong with designs/deployments like this:
        * police departments using the data in crime solving
        * data leaks
        * harassment of specific subgroups
    * It is very important that systems be designed with purpose limitations in mind, so they can't be easily abused in other ways
    * Relying on existing infrastructure: phones send beacons over Bluetooth Low Energy (BLE)
    * Proximity can be derived from the beacons a phone saw
    * Exposure notification works via a set intersection between the identifiers the person who tested positive broadcast and the beacons another person observed (see the sketch after this section)
    * The design of these beacon broadcasts required that the OS vendor be involved
    * While the design was relatively simple, relying on existing hardware made the situation more difficult/complicated
    * The result of the collaboration with Google/Apple was the Google/Apple Exposure Notification (GAEN) Framework/API
    * For full effect, you need privacy at all layers of the stack, including the Bluetooth protocol stack
        * The MAC address must rotate at the same time as the beacons
        * Similarly, at the network layer, a network adversary can detect the upload of the report of seen beacon identifiers (when reporting a positive test) - Switzerland (CH) used dummy uploads to hide this
    * Lessons learned:
        * purpose limitations
        * context matters (how/where systems are deployed)
        * privacy at all layers
    * Tommy Pauly: More a comment than a question: for privacy at all layers, Apple is routing the upload report through iCloud Private Relay
    * WL: While this is great, there might be other side channels we need to look at
    * XXX: How do you authenticate IDs?
    * WL: There isn't any binding, but the upload requires knowing the underlying seed from which the beacon was derived
    * Chris Wood: What would an ideal interface have looked like, and how would you have designed it differently?
    * WL: The strictness provided protections, but it introduced challenges as well. There isn't an easy answer.
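A minimal sketch of the seed-derived beacons and local set-intersection matching described above, not the actual DP-3T or GAEN key schedule; the `derive_beacons`/`exposure_check` helpers, the SHA-256-based derivation, and the 15-minute epoch count are illustrative assumptions:

```python
import hashlib
import secrets

EPOCHS_PER_DAY = 96  # assumed: one ephemeral beacon per 15-minute epoch

def derive_beacons(seed: bytes, day: int) -> set[bytes]:
    """Derive a day's ephemeral beacon identifiers from a secret seed.

    Illustrative PRF construction (SHA-256 over seed || day || epoch);
    the real DP-3T/GAEN schemes define their own key schedules.
    """
    return {
        hashlib.sha256(
            seed + day.to_bytes(4, "big") + epoch.to_bytes(2, "big")
        ).digest()[:16]
        for epoch in range(EPOCHS_PER_DAY)
    }

def exposure_check(observed: set[bytes],
                   uploaded_seeds: list[tuple[bytes, int]]) -> bool:
    """Check locally whether any beacon this phone observed matches a beacon
    derived from the (seed, day) pairs uploaded after a positive test."""
    return any(derive_beacons(seed, day) & observed
               for seed, day in uploaded_seeds)

# Usage: Alice broadcasts beacons derived from her daily seed; Bob's phone
# records a few of them. After Alice tests positive and uploads her seed,
# Bob's phone detects the exposure without any party linking the two directly.
alice_seed, day = secrets.token_bytes(16), 712
bob_observed = set(list(derive_beacons(alice_seed, day))[:3])
assert exposure_check(bob_observed, [(alice_seed, day)])
```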
* LogPicker: Strengthening Certificate Transparency Against Covert Adversaries (Alexandra Dirksen) - 25 mins
    * HTTPS is mostly a default now (90%+ of all page loads in Chrome are HTTPS)
    * CAs are the trust anchors of the Web PKI
    * There have been recent illicit certificate issuances, and they seem to be increasing:
        * WoSign
        * DigiCert
        * DigiNotar
        * Comodo
        * TurkTrust
    * A rogue certificate is a certificate for a domain that you don't own (e.g., for HTTPS interception)
    * In the attack scenario, a covert attacker obtains a rogue certificate
    * Certificate Transparency (CT) overview
        * CT is still vulnerable to this attack
        * All logs belong to a CA vendor
        * The first compromise was in 2020
        * vulnerable to collaboration attacks
        * vulnerable to split-view attacks
        * Gossip is proposed as a mitigation for split-view attacks
    * LogPicker: a decentralized approach
        * The CA contacts one log (the leader) from a large set of logs (the log pool)
        * The leader then contacts the other logs in the pool
        * The pool then selects one log at random (see the sketch after these notes)
        * The selected log includes the certificate in its Merkle tree
        * The logs that participated in choosing the log create a proof, which is aggregated and sent back to the CA for inclusion in the certificate
        * This design meets the goals
    * Chris Wood: Does the log pool use an election protocol?
    * AD: Yes, two protocols
    * CW: Have you looked at alternative solutions that use threshold signing?
    * AD: The aggregated signature uses BLS, but which signature scheme is used is not strictly defined
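A minimal sketch of one way a pool of logs could jointly pick a single log at random (commit-then-reveal over per-log randomness); this is not the actual LogPicker election or its BLS-based proof aggregation, and the pool member names and the `select_log` helper are hypothetical:

```python
import hashlib
import secrets

def commit(value: bytes) -> bytes:
    # Hash commitment so a log cannot change its contribution after
    # seeing the other logs' values.
    return hashlib.sha256(value).digest()

def select_log(pool: list[str]) -> str:
    # 1) Each log in the pool draws randomness and publishes a commitment.
    contributions = {log: secrets.token_bytes(32) for log in pool}
    commitments = {log: commit(r) for log, r in contributions.items()}

    # 2) Each log reveals; everyone checks the reveal against the commitment.
    for log, r in contributions.items():
        assert commit(r) == commitments[log], f"{log} equivocated"

    # 3) Combine all contributions into a shared seed and map it to an index,
    #    so no single log controls which log is selected.
    members = sorted(pool)
    seed = hashlib.sha256(b"".join(contributions[m] for m in members)).digest()
    return members[int.from_bytes(seed, "big") % len(members)]

pool = ["log-a.example", "log-b.example", "log-c.example"]
print("selected log:", select_log(pool))
```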