Notes by Christopher Patton [cpatton@cloudflare.com](mailto:cpatton@cloudflare.com)

# Privacy-Preserving Measurement Techniques: Informal Comparison {#privacy-preaserving-measurement-techniques-informal-comparison}

by Sofi Celi (Brave)

* Preface
  * Initial investigation only -- not a complete overview
  * Question: If I want to "measure" something securely, which scheme should I use?
    * Not easy to answer right now.
    * Unclear expectations on efficiency/\$\$\$
* Main notion
  * Want to know something about users, e.g., for improving usability, without leaking private information
  * PPM WG formed for solving this problem
* Wishful thinking
  * Desire for "semantic security": Database should not reveal anything about a user that can't be learnt without access to the database
* Techniques/schemes
  * Differential privacy: Add random noise to either each input measurement or the output aggregate
    * RAPPOR \[EPK14\]:
      * Add local random noise => costly
    * PROCHLO \[BEM+17\]
      * More efficient than RAPPOR, different architecture
      * Requires "trusted architecture"
      * Have to trust a third party for "shuffling"
  * Prio \[CGB17\]
    * PPM WG is working on a standard for Prio-based schemes
    * Secure aggregation with a small number of non-colluding aggregation servers
    * Inherent leakage (so-called "f-privacy"):
      * E.g., when computing the mean, leak the number of inputs
    * "Robustness" against malicious clients via "SNIPs" (secret-shared, non-interactive proofs)
    * Only suitable for "numeric values"
  * STAR \[DSQ+21\]
    * Solves the "heavy hitters" problem
    * Server only learns the value of a measurement if at least k clients submit that measurement
    * Client encrypts their data and sends a k-of-N secret share of the randomness used to encrypt
    * Three entities:
      * Client
      * Randomness server - helps client compute secret shares
      * Aggregation server - does the actual aggregation
    * Goal: low monetary cost
  * Poplar \[BBCG+21\]
    * Similar security properties as Prio
    * Requires two non-colluding servers (Prio needs only 1-of-N honest servers)
* Other considerations
  * \[There are big tables here in the slides that might be useful\]
  * User needs: Voice of the user is absent right now
    * Do users understand that or how their data is collected in a privacy-preserving manner?
    * Even when some PPM mechanism is in use, user consent still needs to be considered.

# The Decoupling Principle {#the-decoupling-principle}

by Barath Raghavan (USC / INVISV)

Lots of cool new privacy techniques: What do they have in common? This talk is about one design principle core to many of them

* Nutshell:
  * Decouple who you are from what you do.
  * Old idea that gets rediscovered periodically
    * Chaum's work introduced this idea in the 80s
  * Easiest when splitting entity and mechanism
  * Applying the principle is always protocol/context specific
* Context
  * "Ordinary" data confidentiality is "nearly solved"
    * TLS, encryption of data at rest, etc.
  * What's left: Layered metadata privacy problem
    * Many solutions needed, and they overlap
  * Privacy challenges are fundamental to the Internet
    * We rely on others to handle traffic
* Terminology
  * (Non-)sensitivity of information (context-specific)
  * Want to define: "There is some party that, in some context, has information about some user."
  * Caveats
    * Inherently difficult to categorize information as (non-)sensitive
    * Identity and data often conflated
* Example: Mix-nets / onion routing (Tor)
  * Sender sends message to some receiver, wants metadata privacy for ID and message
  * Sender: knows all sensitive information (message, sender ID)
  * Mix 1: knows sender ID
  * Mix 2, ..., N: knows a message was sent by a user
  * Receiver: knows message
  * Decoupling principle: Third parties should know at most one sensitive piece of information (in this case, either the ID or the message but not both)
* Many other examples!
  * Chaum's designs
  * Privacy Pass
  * Oblivious DNS (a la ODoH)
  * PGPP ("pretty good phone privacy")
    * Mobile ID, user ID: No entity knows both
  * Private Relay
    * First relay knows the IP, second knows the origin being requested (neither knows both)
  * Some private aggregate systems
* Why does this work?
  * Users care about hiding their identity and (meta)data of their requests
  * Users don't care about
    * whether they reveal they're a user
    * whether they can hide that a request/response was handled by the service
* Cautionary Tale: Security Gateways/VPNs
  * Sender: all sensitive data
  * Gateway: all sensitive info
  * Receiver: non-sensitive user ID + sensitive data
  * NO DECOUPLING!!
* Other considerations:
  * Non-collusion
  * Trusted execution environments
  * Side-channels

# Practical Privacy-preserving Authentication for SSH (to appear at USENIX '22) {#practical-privacy-preserving-authentication-for-ssh-to-appear-at-usenix-22}

ia.cr/2022/740

by Mike Rosulek

* SSH authentication via public keys
  * Client asks if it can authenticate with a given public key, server responds with "yes" / "no" answer
  * Problem: Clients are easily fingerprintable.
    * Client tries all public keys it has in some order, even those that aren't intended for use with that server
    * Ben Cox / Filippo Valsorda demonstrated that public keys can be scraped from SSH and used to fingerprint GitHub users.
      * Filippo developed a tool for this.
  * Mitigation: Client only sends the "right" public keys to the server
    * Problem: malicious client can still "probe" the server to enumerate other users
    * Problem: Server sees which key was used
    * Problem \[Note taker missed it\]
* New authentication method for SSH
  * Big picture: Use a mixture of keys in a single authentication attempt
    * Server doesn't see which one was used
    * Client does not learn if a public key can be used for authentication unless it holds the corresponding secret key.
    * Client won't connect unless server knows and "explicitly includes" one of the client keys
  * Overview of the protocol
    * New primitive: Anonymous multi-KEM
      * Many decapsulators, scheme hides the identity of each
    * Use private set-intersection (PSI)
      * Client learns intersection of keys that can be used, server learns only if the client can authorize
  * Technical contributions:
    * Anonymous multi-KEM instantiation compatible with SSH
    * \[Note taker missed something having to do with PSI\]
    * UC security
  * Concrete performance (total round-trip time of authentication):
    * Worst case: all RSA keys
      * client has 20 keys, server has 100: 320 ms
    * Best case: all ECDSA/EdDSA keys
      * client has 20 keys, server has 100: 28 ms
* Q+A
  * How does this apply to current issues for GitHub?
    * Mike: See paper.

# draft-irtf-pearg-safe-internet-measurement {#draft-irtf-pearg-safe-internet-measurement}

by Mallory Knodel (CDT)

* Adopted by WG, need help!
* New document structure:
  * Intro
  * Consent
  * Safety considerations
  * Risk analysis
* Open issues (github.com/IRTF-PEARG/draft-safe-internet-measurement)
  * All are about identifying additional resources
* Next steps
  * Is the Table of Contents "complete"? Any missing sections?
  * IAB workshop M-TEN on measurement techniques in encrypted networks
    * Call for papers is out, we should submit a draft.
    * Wants to make network measurement easier/safer even when traffic is encrypted
* Chair (Chris W.): PPM WG might benefit from this document (and vice versa)
  * Mallory: Folks who are involved in PPM: Let's chat!
* Chair: Censorship draft \[did notetaker get the name right\] cleared last call, should see RFC soon
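
As a rough illustration of the secure-aggregation idea noted under Prio in the first talk (a small number of non-colluding aggregation servers, with the number of inputs leaking when a mean is computed), here is a minimal Python sketch of additive secret sharing. It is not Prio itself: there are no SNIPs, so nothing here gives robustness against malicious clients. The modulus, the function names `share` and `aggregate`, the two-server setup, and the sample measurements are all assumptions made for illustration.

```python
import secrets

# Hypothetical modulus for this sketch (2**61 - 1 is prime); a real scheme fixes its own field.
MODULUS = 2**61 - 1

def share(value, num_servers=2):
    """Split `value` into additive shares that sum to `value` mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_servers - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def aggregate(per_server_shares):
    """Each server sums the shares it received; adding the per-server sums yields the total."""
    server_sums = [sum(received) % MODULUS for received in per_server_shares]
    return sum(server_sums) % MODULUS

# Illustrative inputs: one small non-negative measurement per client.
measurements = [3, 5, 10]
client_shares = [share(m) for m in measurements]

# Transpose so that server i holds the i-th share from every client.
per_server = list(zip(*client_shares))

total = aggregate(per_server)
# Computing the mean requires the number of inputs, which is the leakage noted above.
mean = total / len(measurements)
```

Any single server sees only uniformly random values, so the individual measurements stay hidden as long as the servers do not collude; only the combined sum is revealed.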