Notes by Christopher Patton
[cpatton@cloudflare.com](mailto:cpatton@cloudflare.com)

# Privacy-Preserving Measurement Techniques: Informal Comparison   {#privacy-preaserving-measurement-techniques-informal-comparison}

by Sofi Celi (Brave)

*   Preface
    *   Initial investigation only -- not a complete overview
    *   Question: If I want to "measure" something securely, which
        scheme should I use?
        *   Not easy to answer right now.
        *   Unclear expectations on efficiency/\$$$

*   Main notion
    *   Want to know something about users, e.g., for improving
        usability, without leaking privacy
    *   PPM WG formed for solving this problem

*   Wishful thinking
    *   Desire for "semantic security": Database should not reveal
        anything about a user that can't be learnt without access to
        the database

*   Techniques/schemes
    *   Differential privacy: Add random noise to either each input
        measurement or to output aggregate
        *   RAPPOR \[EPK14\]:
            *   Add local random noise => costly
        
        *   PROCHLO \[BEM+17\]
            *   More efficient than RAPPOR, different architecture
            *   Requires "trusted architecture"
                *   Have to trust a third-party for "shuffling"
    
    *   Prio \[CGB17\]
        *   PPM WG is working on standard for Prio-based schemes
        *   Secure aggregation with small number of non-colluding
            aggregation servers
        *   Inherent leakage (so-called "f-privacy"):
            *   E.g., when computing the mean, leak the number of inputs
        
        *   "Robustness" against malicious clients via "SNIPs"
            (secret-shared, non-interactive proofs)
        *   Only suitable for "numeric values"
    
    *   STAR \[DSQ+21\]
        *   Solves the "heavy hitters" problem
        *   Server only learns the value of a measurement if at least k
            clients submit that measurement
            *   Client constructs ciphertext for their data, sends
                k-of-N secret share of the randomness used to encrypt
        
        *   Three entities:
            *   Client
            *   Randomness server - helps client compute secret shares
            *   Aggregation server - does the actual aggregation
        
        *   Goal: low monetary cost
    
    *   Poplar \[BBCG+21\]
        *   Similar security properties to Prio
        *   Requires two non-colluding servers (Prio needs only 1-of-N
            honest servers)
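
The two secret-sharing ideas noted above (Prio's aggregation across non-colluding servers and STAR's k-of-N threshold) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the field modulus `P`, the function names, and the use of plain additive and Shamir sharing. It omits Prio's SNIPs and STAR's encryption and randomness server, and it is not the construction from either paper.

```python
import secrets

# Toy field modulus (a Mersenne prime), chosen only for illustration.
P = 2**127 - 1


def additive_shares(value, num_servers):
    """Split `value` into shares that sum to `value` mod P (Prio-style)."""
    shares = [secrets.randbelow(P) for _ in range(num_servers - 1)]
    shares.append((value - sum(shares)) % P)
    return shares


def aggregate(per_client_shares):
    """Each server sums the shares it holds; combining the per-server
    sums yields only the total of all client values."""
    server_sums = [sum(column) % P for column in zip(*per_client_shares)]
    return sum(server_sums) % P


def shamir_shares(secret, k, n):
    """k-of-n sharing: evaluate a random degree-(k-1) polynomial with
    constant term `secret` at x = 1..n."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(k - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % P
        return acc

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def shamir_recover(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret
```

For example, `aggregate([additive_shares(v, 2) for v in [3, 5, 10]])` gives the sum of the three inputs, while each individual share is a uniformly random field element. Any `k` of the pairs returned by `shamir_shares(secret, k, n)` recover `secret`; interpolating fewer than `k` yields an unrelated value except with negligible probability.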

*   Other considerations
    *   \[There are big tables here in the slides that might be useful\]

*   User needs: Voice of the user is absent right now
    *   Do users understand that or how their data is collected in a
        privacy-preserving manner?
    *   Even when some PPM mechanism is in-use, user consent still needs
        to be considered.

# The Decoupling Principle   {#the-decoupling-principle}

by Barath Raghavan (USC / INVISV)

Lots of cool new privacy techniques: What do they have in common? This
talk is about one design principle core to many of them.

*   Nutshell:
    *   Decouple who you are from what you do.
        *   Old idea that gets rediscovered periodically
            *   Work of Chaum introduced this idea in the 80s
    
    *   Easiest when splitting entity and mechanism
    *   Applying the principle is always protocol/context specific

*   Context
    *   "Ordinary" data confidentiality is "nearly solved"
        *   TLS, encryption of data at rest, etc.
    
    *   What's left: Layered metadata privacy problem
        *   Many solutions needed, and they overlap
    
    *   Privacy challenges are fundamental to the Internet
        *   We rely on others to handle traffic

*   Terminology
    *   (Non-)sensitivity of information (context-specific)
    *   Want to define: "There is some party that, in some context, has
        information about some user."

*   Caveats
    *   Inherently difficult to categorize information as (non-)sensitive
    *   Identity and data often conflated

*   Example: Mix-nets / onion routing (Tor)
    *   Sender sends message to some receiver, wants metadata privacy
        for ID and message
        *   Sender: knows all sensitive information (sender message, ID)
        *   Mix 1: knows sender ID
        *   Mix 2, ..., N: knows a message was sent by a user
        *   Receiver: knows message
    
    *   Decoupling principle: Third parties should know at most one
        sensitive piece of information (in this case, either the ID or
        message but not both)

*   Many other examples!
    *   Chaum's designs
    *   Privacy Pass
    *   Oblivious DNS (a la ODoH)
    *   PGPP ("pretty good phone privacy")
        *   Mobile ID, user ID: No entity knows both
    
    *   Private Relay
        *   First relay knows the IP, second knows the origin being
            requested (neither knows both)
    
    *   Some private aggregate systems

*   Why does this work?
    *   Users care about hiding their identity and (meta)data of their
        requests
    *   Users don't care about
        *   whether they reveal they're a user
        *   whether they can hide that a request/response was handled by
            service

*   Cautionary Tale: Security Gateways/VPNs
    *   Sender: All sensitive data
    *   Gateway: All sensitive info
    *   Receiver: non-sensitive user ID + sensitive data
        *   NO DECOUPLING!!
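
The contrast between the mix-net example and the gateway/VPN case can be written down as a small check. This is only a sketch of the bookkeeping, assuming each party's knowledge is modeled as a set of labels; the names `SENSITIVE`, `violations`, `mix_net`, and `vpn` are hypothetical and not from the talk.

```python
# Labels treated as sensitive in this toy model (an assumption).
SENSITIVE = {"sender_id", "message"}


def violations(knowledge, exempt=("sender",)):
    """Return the third parties that hold more than one sensitive item.

    The sender is exempt because it necessarily knows everything
    about its own request.
    """
    return sorted(
        party
        for party, known in knowledge.items()
        if party not in exempt and len(known & SENSITIVE) > 1
    )


# Mix-net / onion routing, following the breakdown in the notes.
mix_net = {
    "sender": {"sender_id", "message"},
    "mix_1": {"sender_id"},
    "mix_2": set(),  # knows only that some message was sent
    "receiver": {"message"},
}

# Security gateway / VPN: the gateway sees both identity and data.
vpn = {
    "sender": {"sender_id", "message"},
    "gateway": {"sender_id", "message"},
    "receiver": {"nonsensitive_user_id", "message"},
}

print(violations(mix_net))  # no party violates the principle
print(violations(vpn))      # the gateway does
```

The check passes for the mix-net because every third party holds at most one sensitive item, and it flags the gateway because it holds both.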

*   Other considerations:
    *   Non-collusion
    *   Trusted execution environments
    *   Side-channels

# Practical Privacy-preserving Authentication for SSH (to appear at USENIX '22)   {#practical-privacy-preserving-authentication-for-ssh-to-appear-at-usenix-22}

ia.cr/2022/740

by Mike Rosulek

*   SSH authentication via public keys
    *   Client asks if it can authenticate with a given public key,
        server responds with "yes" / "no" answer

*   Problem: Clients are easily fingerprintable.
    *   Client tries all public keys it has in some order, even those
        that aren't intended to be used by the server
        *   Ben Cox / Filippo Valsorda demonstrated that public keys
            can be scraped from SSH and used to fingerprint GitHub
            users.
            *   Filippo developed a tool for this.
    
    *   Mitigation: Client only sends the "right" public keys to the
        server

*   Problem: malicious client can still "probe" the server to enumerate
    other users
*   Problem: Server sees which key was used
*   Problem \[Note taker missed it\]

*   New authentication method for SSH
    *   Big picture: Use a mixture of keys in a single authentication
        attempt
        *   Server doesn't see which one was used
        *   Client does not learn if a public key can be used for
            authentication unless it holds the corresponding secret key.
    
    *   Client won't connect unless server knows and "explicitly
        includes" one of the client keys

*   Overview of the protocol
    *   New primitive: Anonymous multi-KEM
        *   Many decapsulators, scheme hides the identity of each
    
    *   Use private set-intersection (PSI)
        *   Client learns intersection of keys that can be used, server
            learns only if the client can authorize

*   Technical contributions:
    *   Anonymous multi-KEM instantiation compatible with SSH
    *   \[Note taker missed something having to do with PSI\]
    *   UC security

*   Concrete performance (total round-trip time of authentication):
    *   Worst case: all RSA keys
        *   client has 20 keys, server has 100: 320 ms
    
    *   Best case: all ECDSA/EdDSA keys
        *   client has 20 keys, server has 100: 28 ms

*   Q+A
    *   How does this apply to current issues for GitHub?
        *   Mike: See paper.

# draft-irtf-pearg-safe-internet-measurement   {#draft-irtf-pearg-safe-internet-measurement}

by Mallory Knodel (CDT)

*   Adopted by WG, need help!
*   New document structure:
    *   Intro
    *   Consent
    *   Safety considerations
    *   Risk analysis

*   Open issues (github.com/IRTF-PEARG/draft-safe-internet-measurement)
    *   All are about identifying additional resources

*   Next steps
    *   Is the Table of Contents "complete"? Any missing sections?
    *   IAB workshop M-TEN on measurement techniques in encrypted
        networks
        *   Call for papers is out, we should submit a draft.
        *   Wants to make network measurement easier/safer even when
            traffic is encrypted

*   Chair (Chris W.): PPM WG might benefit from this document (and vice
    versa)
    *   Mallory: Folks who are involved in PPM: Let's chat!

*   Chair: Censorship draft \[did notetaker get the name right\] cleared
    last call, should see RFC soon