Minutes IETF111: pearg
|Meeting Minutes||Privacy Enhancements and Assessments Research Group (pearg) RG|
|Title||Minutes IETF111: pearg|
Administrivia (5 minutes)
Blue sheets / scribe selection / NOTE WELL
- Agenda Bash
Draft updates (20 minutes)
- IP address privacy draft missed the cutoff; it came from the interim and is pretty skeletal
- More content coming, throw some in. Particularly use cases. Lots discussed at the interim. If interested, let the chairs and authors know.
Safe measurement update
* A number of prior contributions, a few small changes
* Fills major need
* Goal of draft: describe for academia and industry guidelines on measurements that don't violate user privacy
* Interesting things around scope. Important. Strengthened in last version
* Not substitute for ethics review. Complements, not replaces
* Tries to define better what Internet measurement scope means. Interested in definition.
* Identifies user and who it is safe for.
* In three parts
* Isolate risk with dedicated testbed
* Respect other infrastructure
* Data minimization
* Masking
* Risk analysis
* Alas, TOC not coming out of XML yet!
* Changes since -04: disclosure issues
* Recent research, IP addresses
* Safety != Ethics?
* Since -05: nits
* Things still open in GitHub
* Responsible disclosure
* Ip addresses
* Future computing capability
* Look at CADIA
* Want to bring in learnings, add to table of contents.
* Please open issues, better yet PRs
* Very hard to explain what consent would mean in lower layers
- Stephen (from chat): two things on consent 1) be good to include examples of when that was handled well and when badly (or controversially) and 2) I think this document might end up being a model for other IETF docs that mention consent so it should be done carefully
- Mallory: Good idea on adding examples; depends on "carefully". User consent shouldn't give free rein. Not clear consent extends beyond its original context. Internet traffic is different.
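The data-minimization and masking guidance above could look like the following sketch (hypothetical key handling; HMAC-SHA256 and the 16-hex-digit truncation are illustrative choices, not from the draft): pseudonymize client IPs with a keyed hash so raw addresses are never stored, while flows within one measurement run can still be grouped.

```python
import hmac, hashlib, ipaddress

def mask_ip(ip: str, key: bytes) -> str:
    """Pseudonymize an IP address with a keyed hash (HMAC-SHA256).

    The raw address is never stored; the same IP maps to the same
    token within one run, so flows can still be grouped.
    """
    # Canonicalize first so e.g. "::1" and "0:0:0:0:0:0:0:1" collide.
    canonical = ipaddress.ip_address(ip).compressed.encode()
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()[:16]

# Per-run secret, discarded after the study (illustrative).
key = b"per-run secret"
a = mask_ip("192.0.2.1", key)
b = mask_ip("192.0.2.1", key)
c = mask_ip("192.0.2.2", key)
```

Discarding the key at the end of the run makes the mapping irreversible going forward, which is the property the draft's masking discussion is after.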
IP address privacy update
New Work / Presentations (1hr 35 mins)
Website Fingerprinting in the Age of QUIC - Jean-Pierre Smith (20 mins)
- From PETS two weeks ago
- In early days if adversary wanted to see content could use Wireshark
- Now things are encrypted. But certs, SNI reveal what user is viewing.
- Users use PETs like Tor and VPNs
- Most information no longer there
- In ideal world end of story
- But... packet sizes, timings, directions.
- Creates fingerprinting
- Attacker sees features from websites, constructs model
- Usually interested in particular set of pages vs. not
- Quite a lot of work on Tor and websites
- Mostly TCP focused
- QUIC has multiple streams, vs. one flow per connection
- Traces can mix TCP and QUIC for the same navigation
- Collect large dataset by scanning webpages
- 100 target pages, 16,000 other pages
- Number of classifiers trained, many deep learning. Identify webpage!
- Identify recall and precision
- TCP trained identifying QUIC?
- Doesn't work: TCP trained worked well on TCP, not QUIC
- Now train on QUIC and test on QUIC
- QUIC-trained on QUIC works just as well as TCP-trained on TCP
- Some classifiers a bit better on QUIC than on TCP
- Could be due to QUIC server variability being limited, so variation across web pages different. Reduced middlebox interference?
- Mixed classification / split ensemble
- Mixed: do both at once
- Split: detect QUIC or TCP, send to dedicated classifier for case
- Mixed: slight decrease in performance
- Split: Very simple trace distinguishing, 99% accuracy
- Due to handshake differences
- QUIC Initial carrying the Client Hello is quite large vs. the small TCP SYN/SYN-ACK handshake
- Real easy!
- Ensemble: a QUIC-vs-TCP distinguisher weights the predictions of the TCP and QUIC classifiers
- Not as good as Mixed for same sample budget
- Conclusions: QUIC not more difficult than TCP, some problems when TCP only classifier
- Joint possible, cost to adversary
- Antoine: How large is training vs test? Is there drift over time with pages, or is all the data gathered at once?
- A: Split was 90% training to 10% holdout.
- A: Didn't evaluate drift. Has been evaluated before by a number of people. Realistic Website Fingerprinting. Running collection of the webpages could maintain high precision and recall. Quite a number of recent works. Triplet fingerprinting: use 10 new samples to get it back up to high precision despite very old samples.
- Q: any other research in area?
- A: Four or five other groups doing it, expect more stuff coming out over the months and year
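The "split" design above could be sketched as follows (hypothetical: the 600-byte threshold and the toy per-protocol classifiers are illustrative; QUIC Initial packets are padded to at least 1200 bytes while a TCP SYN is tens of bytes, which is why a first-packet-size test distinguishes traces so easily):

```python
# Sketch of the "split" design: a trivial first-packet-size test
# routes a trace to a protocol-specific classifier.

def looks_like_quic(trace: list) -> bool:
    """trace = client->server packet sizes in bytes, first packet first."""
    return trace[0] >= 600  # illustrative threshold

def classify(trace, quic_model, tcp_model):
    # Route the trace to the classifier trained on the matching protocol.
    model = quic_model if looks_like_quic(trace) else tcp_model
    return model(trace)

# Toy stand-ins for the trained per-protocol webpage classifiers.
quic_model = lambda t: ("quic", "page-A" if sum(t) > 5000 else "page-B")
tcp_model = lambda t: ("tcp", "page-A" if sum(t) > 5000 else "page-B")

proto, page = classify([1252, 1350, 800], quic_model, tcp_model)
```

The real distinguisher in the talk reached 99% accuracy with an equally simple rule, so the split design costs the adversary little over a single mixed classifier.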
ShorTor: Improving Tor Network Latency through Multi-Hop Overlay Routing - Kyle Hogan (20 mins)
- Presented last meeting, excited to be back
- Work in progress
- Questions can be answered as going
- Short Tor is an overlay that reduces latency between relays by making better routing
- Design, Evaluation, Integration, Security
- We don't change the client
- Interest in feedback
- Overlay routing: sometimes fastest path between A to B goes through a server C.
- Tor already has lots of forwarding
- Can we take advantage of it?
- Go via another relay if faster
- To evaluate need latencies between things in Tor
- Ethics interlude: we've worked with Tor, let operators opt out, our relays are pretty restrictive, and we don't record any connection we didn't make ourselves.
- Q: Since the whole path is unknown, is routing per hop?
- A: right now additional control plane to indicate that this route should go Via.
- Only thing skipped is Onion encryption. Queuing not skipped.
- Currently focus on top consensus weights
- Top 125K relays
- Big, enduring, most circuits, stay up
- Challenging to get small relays in all-pairs dataset
- Future: all*
- Churn will complicate: relay stops existing midway through
- Graph gets presented!
- Simulated circuits through relay selection (limited to measured relays) to see what the latency would be one way or the other
- Some ridiculous round trip delays
- High round-trip times seem to come from low-bandwidth relays, but those are rarely selected
- Circuit selection is unchanged. Via looks at latency. No via should ever accept traffic in excess of BW
- Using scheduling to make via traffic lower priority than direct traffic.
- If via slows down, stop using. Doesn't impact circuit.
- Integration: as Tor relays adopt can be used.
- Get good speedups even with small number supporting.
- Can get 1500ms speedups sometimes, with just a few hundred big relays supporting
- MATors framework and network traffic share based security analysis
- All circuit selections supported, including AS diversity
- Next steps: finish analysis: only have 1 M, not 50 M pairs
- Finish security analysis with representative dataset
- Dataset touchy subject
- Tor latency find exit relay that was chosen
- Client will never see it, relays decide
- Don't want clients to change behavior
- Q: Watson: Sounds pretty intrusive
- A: Yeah, need two new fields so intermediate will be able to learn how to forward, next relay will use previous relay instead of connection to disambiguate circuit ID
- Q: Antoine: does latency deanonymize?
- A: Yes, possibly: fast=short, so selecting relays on speed means close relays. We have less correlation because circuit is the same. But reducing latency closer to geodistance
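The via-selection idea above can be sketched as: for a hop A -> B, forward through relay C whenever the measured latencies satisfy lat(A,C) + lat(C,B) < lat(A,B), and fall back to the direct path otherwise. The relay names and millisecond values below are illustrative, not from the ShorTor dataset.

```python
def best_via(a, b, lat, candidates):
    """Return (via_relay, latency_ms) for the fastest A->B path;
    via_relay is None when the direct path wins."""
    best = (None, lat[(a, b)])          # start with the direct path
    for c in candidates:
        through = lat[(a, c)] + lat[(c, b)]
        if through < best[1]:           # via C is strictly faster
            best = (c, through)
    return best

# Illustrative all-pairs latency measurements (ms).
lat = {("A", "B"): 120, ("A", "C"): 30, ("C", "B"): 40,
       ("A", "D"): 90, ("D", "B"): 80}
via, ms = best_via("A", "B", lat, ["C", "D"])
```

This matches the design point in the talk that circuit selection is unchanged: only the forwarding between two already-chosen relays consults the latency data, and the client never sees it.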
Private Relay - Tommy Pauly (20 mins)
- Given a lot of talk about IP privacy, let's look at a deployment.
- Love to hear feedback on how this could evolve
- Private relay several pieces of IETF
- Separate IP addresses from origin servers accessed
- Not the full Tor threat model, but IP-to-origin linkage seemed a very common signal used by many parties for tracking, hurting privacy
- MASQUE, Oblivious DoH
- QUIC TLS 1.3 to connect to proxies
- Access is authorized using RSA blind signatures
- Scope: iOS 15 and macOS Monterey (both beta)
- All Safari browsing
- All DNS traffic
- All unencrypted HTTP traffic
- Covers the most vulnerable traffic without covering all of it
- Underlying tech to protect against pixel trackers in mail
- Privacy goals
- No entity can connect who you are and what you are looking at
- Performance good enough for generic web browsing
- Left on, not flipped on and off
- Two hop minimum!
- Ingress and egress proxies sitting between the access network and the origin
- Ingress forwards the encrypted connection to the egress
- Operated by different entities
- Clients control which; nested encryption
- Gets manifest with hops and how to combine
- In order to track would need collusion. Policy enforced contractually
- Q: Jonathan: (more of a comment) Global passive adversary can identify through both hops. Impossible to prevent
- A: Yup
- Privacy not slow
- Aggressively use QUIC and MASQUE features to accelerate. Lots has to do with deployment and routing and global coverage
- Lots of fast open. Proxying at stream level
- If talking to normal TLS/TCP origin forwarding QUIC dgram through ingress that is request to egress+TLS client hello. Egress does the rest, without waiting
- Fast open; QUIC on the last mile regardless of server; IPv6 everywhere; Web on par, sometimes faster
- No IPv4 re-encapsulation in the middle
- Break as little as possible
- No impact on local routes
- Failover for private hostnames and address
- Off if over VPN or proxy is being used
- Rough GeoIP preserved
- Hint to egress to use particular geolocation data
- Long term need to move away from this use of IP addresses.
- More standards on geolocation and how that's shared and fraud prevention with IP privacy
- Expand MASQUE
- Open interoperable network
- Ingress into carrier networks
- Egress within content providers
- Client selection of policies, route selection
- Victor: Authentication of origin still end to end?
- A: yes. TCP proxied, TLS not, QUIC origins fully.
- Q: Stephen:- how would you characterise the longer term complexity trade-offs between this approach and trying to eventually move to something simpler and more generic but harder to get deployed like "all over Tor" or an equivalent?
- A: Would like to see consensus around deployment. MASQUE proxies good start. Tor compat interesting
- Q: Matthew: DNS for key management? Hard coded relays?
- A: Right now public keys are all coming from iCloud control plane. Short term decision for this feature. Long term more open and discoverable and extensible.
- Q: Andrew: Using the CFRG draft for RSA?
- A: yes, why we want that draft
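The blind-signature access flow mentioned above rests on simple RSA math: the client blinds its token, the signer signs without seeing it, and the client unblinds to get a valid signature. A toy sketch (tiny, insecure parameters chosen for illustration only; the CFRG draft adds proper hashing, encoding, and key sizes):

```python
# Toy RSA blind signing -- illustration only, parameters are insecure.

# Small RSA key: n = p*q, e public, d private.
p, q = 61, 53
n = p * q                              # 3233
e = 17
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent

m = 42          # token to be signed (hashed/encoded in practice)
r = 7           # client's random blinding factor, coprime to n

blinded = (m * pow(r, e, n)) % n       # client blinds m
blind_sig = pow(blinded, d, n)         # signer signs; never sees m
sig = (blind_sig * pow(r, -1, n)) % n  # client removes the blinding

valid = pow(sig, e, n) == m            # anyone can verify sig on m
```

The point for Private Relay: the proxy operator can rate-limit access by counting signatures issued, without being able to link a presented token back to the signing request.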
FLoC - Josh Karlin (remaining time)
- Tech Team lead on Privacy Sandbox
- Trial, chewing on feedback, thinking on next
- Web partitioning by top-level site
- Lots of companies observe browsing. Would like to stop it
- First build walls: break third party cookies. Same origin policy
- Partition everything...
- Important use cases: SSO
- Personalized advertising, fraud prevention, logout with federated login
- Lots of work here to provide them
- FLoC focused on interest-based advertising
- Target array of interests not just context on page
- Goals to support interest adds, hard to track individuals
- Today, scripts run on browsers
- Backend ties contextual cues to the user, sends a profile back
- With FLoC, ad tech is given a cohort of similar users; predictive models find ads
- History as group, adtech backend few changes
- API rejects for reasons: sensitive cohort, incognito, history cleared
- Cohort: client side
- No new data.
- Only domains are used; no paths or page contents
- Thousands of users, no sensitive info, no fingerprinting service
- Encode user history by taking domains, hash into 64 bits. Sparse vector
- Random projection onto 50 dimensional space
- Apply grouping from Chrome server to 16 bits capturing thousands in each group
- Pages eligible: only pages using the API, only without private IPs, and not opted out
- Origin trial concerned not representative for early adopters
- Now used sites with ads on them
- k-anonymous: 2,000 Chrome Sync users per cohort
- Prevent transmission of cohorts with sensitive sites in them via revoking if correlated
- Dropped 4% of cohorts
- Origin trial: Page and user opt outs, bunch of other things in slides
- got lots
- Especially Mozilla and privacy analysis
- No auto-opt in: Done
- Cohorts hard to understand for users: use topics to make clear what is revealed
- Topics would be curated, have curated lists
- Users can understand what they are indicating
- Users opt in or out
- Ad Topics Hint also related
- New fingerprinting surface
- Reduce it? Can we use 8 bits? Privacy sandbox tackles all the tracking
- Random topics with some probability?
- Give sites different topics?
- Taken together can drop cross-site fingerprinting issues
- Human curation and t-topic analysis
- Right now global browsing history
- Per third party topic based on where third party is
- 100% subsetting
- Disadvantages if multiple parties can work as one
- Q: What happens when off?
- A: Nothing vs random we're thinking about. Training easier if we understand cohorts have meaning. 5% random right now. Cohort fairly high
- Q: Watson: User justification for participation?
- A: Sites get money, particularly tail sites. Big money from personalization
- Q followup: Need to see data, open and free doesn't mean making money from slimy people
- Q: Matthew: Recomputation of map?
- A: Yes. Has to be done regularly as web changes. Sensitivities change. Full intention of sharing
- Q: Wes: Do not track vs. this?
- A: Not sure question follows. What we offer is privacy sandbox. Still offering third party blocking and customization. Users still get to set up the barriers.
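The cohort computation described in the FLoC talk (hash each visited domain into a sparse vector, random-project, keep a short bit-string so similar histories share a bucket) could be sketched as below. Dimensions are shrunk for readability (8 output bits instead of 50 projections / 16-bit cohorts), and the hashing and hyperplane choices are illustrative, not Chrome's actual parameters.

```python
import hashlib
import random

def simhash_cohort(domains, n_bits=8, seed=0):
    """SimHash-style cohort: similar domain sets land in the same bucket."""
    rng = random.Random(seed)  # fixed hyperplanes, shared by all clients
    planes = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(n_bits)]

    # Accumulate one 64-dim +/-1 pattern per visited domain.
    vec = [0.0] * 64
    for d in domains:
        h = int.from_bytes(hashlib.sha256(d.encode()).digest()[:8], "big")
        for i in range(64):
            vec[i] += 1.0 if (h >> i) & 1 else -1.0

    # One output bit per random hyperplane: the side the vector falls on.
    bits = 0
    for j, plane in enumerate(planes):
        if sum(p * v for p, v in zip(plane, vec)) >= 0:
            bits |= 1 << j
    return bits

a = simhash_cohort(["news.example", "sports.example"])
b = simhash_cohort(["news.example", "sports.example"])
```

Because the computation is client-side and deterministic given the shared hyperplanes, no new per-user data leaves the browser; only the short cohort ID does, which is what the k-anonymity and sensitive-cohort filtering then operate on.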