Privacy Enhancements and Assessments Research Group (PEARG) Agenda - IETF 115

Time and Date

Administrivia (5 mins)

Drafts (10 mins)

RG draft statuses

In IESG review:

Sent to IRTF chair:

Active RG documents

draft-irtf-pearg-safe-measurement

Mallory Knodel: can send messages to folks in ppm or other groups
Mallory: want more feedback, especially from those working on
measurement
Mallory: conversation in IAB M-TEN workshop/meeting as well, how to do
harm reduction holistically in measurement

draft-irtf-pearg-ip-address-privacy-considerations

Bradford Lassey: pushed new version last week, needs a solid editorial
pass
Bradford: want comments on how the document can be made more useful

Presentations (75 mins)

David Oliver - Clean Insights Overview (20 mins)

Presentation

Slides:
https://datatracker.ietf.org/meeting/115/materials/slides-115-pearg-clean-insights

privacy-preserving measurement to empower users
started in 2017 at BK-CIS at Harvard and MIT Media Lab
2020-2022: symposium, and cleaninsights.org is born!

goal: preserving privacy while driving product development
in 2020, symposium held where we had the following observations: (1)
devs want patterns and insights, but not in a way that
alienates/endangers users. (2) even small projects want some sort of
measurement. (3) many invasive options exist, but building a safe
platform still needs to be done and it will attract devs

looked at (1) aggregating data at the source. (2) server discarding PII.
(3) making measurement transparent to the user. (4) generalising data
collected, using deresolution to reduce identifiability
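
A minimal sketch of points (1) and (4), assuming "deresolution" means coarsening values before they leave the device. The function names, the weekly granularity and the one-degree grid are illustrative assumptions, not the Clean Insights SDK API.

```python
import math
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def deresolve_date(d: date) -> date:
    """Reduce a date to the Monday of its week (hypothetical granularity)."""
    return d - timedelta(days=d.weekday())


def deresolve_location(lat: float, lon: float, step: float = 1.0) -> Tuple[float, float]:
    """Snap coordinates to the corner of a coarse grid cell."""
    return (math.floor(lat / step) * step, math.floor(lon / step) * step)


def aggregate_at_source(events: Iterable[Tuple[str, date]]) -> Dict[Tuple[str, date], int]:
    """Count events per (name, week) on the device; raw events are not kept."""
    counts: Counter = Counter()
    for name, day in events:
        counts[(name, deresolve_date(day))] += 1
    return dict(counts)
```

Only the per-week counts would be reported, so two app opens on different days of the same week become indistinguishable.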

developed: client SDKs, anonymizing proxy, and a best practices guide

Clean Insights Proxy sits b/w analytics engine and users' devices.
Another possibility is domain/other fronting.

Supporting basic functionalities that people expect: counting users,
installs, locations, crash reports, surveys, etc. If these didn't exist,
devs would go back to invasive/dirty solutions.

Improving anonymity: send reports in batches (say every week), and hide
timestamps of measured activities. err on the side of consent.

Borrowing from law: time-bound "campaigns" when measurements are made.
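
The batching, timestamp hiding, consent default and time-bound campaign ideas can be sketched together as follows. The `Campaign` class and its methods are hypothetical names for illustration and are not taken from the Clean Insights SDKs.

```python
from collections import Counter
from datetime import date
from typing import Dict


class Campaign:
    """A time-bound measurement that keeps counts only, never timestamps."""

    def __init__(self, start: date, end: date) -> None:
        self.start = start
        self.end = end
        self.consented = False  # err on the side of consent: off by default
        self._counts: Counter = Counter()

    def measure(self, event: str, today: date) -> bool:
        """Record one event if consent was given and the campaign is running."""
        if not self.consented or not (self.start <= today <= self.end):
            return False
        self._counts[event] += 1  # 'today' is checked but never stored
        return True

    def flush(self) -> Dict[str, int]:
        """Return the accumulated batch (say, once a week) and forget it."""
        batch = dict(self._counts)
        self._counts.clear()
        return batch
```

Calling `flush()` on a schedule rather than per event is what hides when each measured activity happened.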

Consent: consider what you're collecting, ask specific questions. get
rid of the source data once the insight is arrived at.

Consent principles: Digestible, Transparent, Affords Variance (users
have options), People-centric

Examples of campaigns. Umbrella (time limited), Circulo (designed like a
focus group, state exactly what will be measured for how long)

Implementations and impact reports available: Mailvelope, Save by Open
Archive, Tella, WeClock

what we're promoting is that orgs become "conscientious collectors"

Other PPM approaches: affinity is that proxying is probably necessary;
principle that question/decisions should be determined before data
collection; sanitising data so that leaks are not toxic. CleanInsights
can fail on one-time visits, relies on implementers to know what is
toxic/non-toxic data.

More info on cleaninsights.org. check out consent guide, gitlab repo and
impact reports.

Questions

Stephen F.: how do you implement something like the right to be
forgotten? how do you trace what needs to be deleted?
David: Haven't thought about that in detail yet. We do give the option
to people to opt out of campaigns (and still giving the app
functionality if they do opt out).

Jonathan Hoyland: meaninglessness of consent questions, and giving
consent means going through multiple hurdles.
David: understand "consent" has been degraded in the current
circumstances. small projects, this is not internet scale yet. but we do
want consent to be meaningful.

Konrad K: are you working on reducing trust on the collector? using
cryptographic techniques, etc.
David: Haven't implemented it yet, but other presentations and PPM have
given us ideas on how to do that. for now making sure that our
architecture can incorporate those techniques at a later stage.

Sofia Celi / Dan Jones - Practically-exploitable Cryptographic Vulnerabilities in Matrix (25 mins)

Presentation

Slides:
https://datatracker.ietf.org/meeting/115/materials/slides-115-pearg-matrix

(Sofia Celi starting the presentation)

Matrix is a standard for secure, decentralized real-time messaging. It
provides interoperable messaging and calls. Long-term goal is also to
achieve underlying messaging and data sync for apps.

Matrix optionally supports e2ee, untrusted servers.

Element is the flagship client, has over 60 million users. Matrix is
being used by German and French govt departments. Other orgs are also
planning to adopt it.

Security properties in Matrix: confidentiality, integrity and
authentication. there is also partial forward secrecy.

Parties in Matrix: user, device, homeserver. Homeserver stores
communication history and account information. A user can have several
devices.

Each user has a cryptographic identity. It achieves trust b/w the user's
devices. Each device has a cryptographic identity, this is generally
used to establish communication channel keys and exchanges.

User cross-signing keys verify device keys.

Olm: pairwise secure channels (between devices). Modification of Signal
double ratchet and 3DH.

Megolm: group messaging through unidirectional channels. similar to
Signal "sender keys". has not been formally analysed.
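
To illustrate what a unidirectional channel means here, the following is a simplified single-chain hash ratchet. It is not the Megolm construction itself; the labels `b"ratchet"` and `b"message"` and all function names are assumptions made for the sketch.

```python
import hashlib
import hmac


def advance(chain_key: bytes) -> bytes:
    """Step the sender chain forward by one message (one-way)."""
    return hmac.new(chain_key, b"ratchet", hashlib.sha256).digest()


def message_key(chain_key: bytes) -> bytes:
    """Derive the key for the message at the current chain position."""
    return hmac.new(chain_key, b"message", hashlib.sha256).digest()


def key_at(chain_key: bytes, start_index: int, index: int) -> bytes:
    """Derive the message key at `index` from a chain key held at `start_index`."""
    if index < start_index:
        raise ValueError("a hash ratchet cannot be wound backwards")
    for _ in range(index - start_index):
        chain_key = advance(chain_key)
    return message_key(chain_key)
```

A receiver who is handed the chain key at position 2 can derive every later message key but none of the earlier ones, which is one way to read the "partial forward secrecy" noted above.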

Attack 1a: membership events are unsigned. Group membership is managed
through events (when someone wants to join/modify info). But not
authenticated. Faulty assumption that only 'user' messages need to be
encrypted. Group membership events were not, and a malicious homeserver
could send fake ones.
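
A minimal sketch of the missing check in Attack 1a: clients reject membership events that do not carry a valid authentication tag. An HMAC under a key the homeserver does not know stands in for a real signature scheme here, and the event fields and function names are hypothetical rather than taken from the Matrix specification.

```python
import hashlib
import hmac
import json
from typing import Optional


def canonical(event: dict) -> bytes:
    """Serialise an event deterministically so both sides tag the same bytes."""
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode()


def tag_event(key: bytes, event: dict) -> str:
    """Compute the authentication tag a legitimate member would attach."""
    return hmac.new(key, canonical(event), hashlib.sha256).hexdigest()


def accept_event(key: bytes, event: dict, tag: Optional[str]) -> bool:
    """Accept a membership event only if it carries a valid tag."""
    if tag is None:
        return False  # unauthenticated events are dropped, not trusted
    return hmac.compare_digest(tag_event(key, event), tag)
```

With a check of this kind, an event injected by the server without the members' key material would be rejected instead of silently changing the group.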

Attack 1b: Server controls the list of user's devices. Homeserver
provides this list. Probably too onerous on users to

(speaker is now Dan Jones)

Attack 2: out-of-band verification. How can users ensure their
connection is not being MITMed? Short Authentication String: a shared
secret is shared out-of-band. If they match, true cryptographic
identities are sent to each other using a secure channel (constructed
using said shared secret).
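
As a rough illustration of a short authentication string, the sketch below turns a secret into a six-digit code that two people could read out and compare. The derivation (HMAC-SHA-256, the `b"SAS|"` prefix, six decimal digits) is an assumption for illustration and is not the derivation Matrix specifies.

```python
import hashlib
import hmac


def short_auth_string(shared_secret: bytes, context: bytes) -> str:
    """Derive a short, human-comparable code from a secret and a context."""
    digest = hmac.new(shared_secret, b"SAS|" + context, hashlib.sha256).digest()
    number = int.from_bytes(digest[:4], "big") % 1_000_000
    return f"{number:06d}"
```

Both devices compute the code independently and the users compare it over another channel. A mismatch suggests that someone in the middle has substituted key material.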

Attack 2 (contd.): Homeserver can trick device into sending a
homeserver-controlled identity (rather than their own). This is caused
by lack of domain separation b/w cross-signing key identifiers and .
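
The general failure mode behind a lack of domain separation can be sketched as below. The notes do not record which second kind of identifier was involved, so the "device" table is purely a hypothetical stand-in, as are the function names.

```python
def resolve_shared(identifier: str, device_keys: dict, signing_keys: dict) -> str:
    """One namespace: an entry from the other table can shadow a signing key."""
    merged = {**signing_keys, **device_keys}
    return merged[identifier]


def resolve_separated(kind: str, identifier: str, device_keys: dict, signing_keys: dict) -> str:
    """Separate namespaces: the caller states which kind of key it expects."""
    table = {"device": device_keys, "signing": signing_keys}[kind]
    return table[identifier]
```

If a server can choose an identifier in one table that collides with an identifier in the other, the shared lookup returns the server's entry. The tagged lookup does not.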

Attack 3: Semi-trusted impersonation. When a user adds a device, they'd
like the new device to see old messages. In this process though, there
are missing checks. result is that attacker can create their own device,
room session and impersonate. implementation mistake, but the
specification was not detailed enough as well.

Attack 4: Trusted impersonation. First do a semi-trusted impersonation,
then set up a room session (over Megolm). this passes checks.

Attack 5: Confidentiality break. Two new subprotocols: (1) Megolm key
backups on the server, and a recovery key is shared b/w devices to
decrypt them. (2) secure storage and secret sharing.

Attack 5 (contd.): use the impersonation attacks. +trick a target
device. decrypt all the messages. this is an implementation mistake.

(speaker is now Sofia again)

Matrix is trying to solve really difficult problems (secure group
messaging, backups, history sharing, federation, multiple clients across
platforms). We need more formal proofs though. Identify gaps in specs.

Questions

Stephen F.: did you look at federation b/w homeservers?
Dan Jones: for our work, no. but there is work on state-sync, can check
and send.

Dan: next work is on modeling. matrix fixed these issues really fast.
Sofia: understand what kind of security properties they are providing.
eg. deniability was a goal, but not formally designed.

Vittorio Bertola: are these the same vulns discovered in September? very
pertinent to work of the mimi wg.
Sofia: same. disclosed and fixed now. attended mimi, and it's quite
important that they have analysis going hand in hand with the dev.

Jonathan H.: side meeting tomorrow on formal methods!

Simone Basso - OONI Measurement of Internet censorship (30 mins)

Presentation

Slides:
https://datatracker.ietf.org/meeting/115/materials/slides-115-pearg-ooni-measurement-of-internet-censorship

presentation will provide overview of how OONI works. but will also
focus on measurement of encrypted protocols and some experimental tests.

OONI is a free software project. Since 2012, more than a billion network
measurements have been collected.

There are apps (mobile apps, desktop apps, CLI). There are multiple
tests in clients. The flagship experiment is web connectivity, which
checks whether a URL is blocked on a user's network.

not focusing on instant messaging and circumvention today. psiphon+ tor+
snowflake tests are under circumvention. there are also other
experimental tests, not run as frequently though.

Measurement principles: (not all tests follow these principles, recent
ones do though.) the test takes a URL as input. first, we need to know
all ways a website could be censored. second, we need to know whether
the website is actually up and available at the time the test is
running.

Principles (contd.): first step is DNS lookup. we have used getaddrinfo.
historically, most censorship is implemented by the DNS resolver of the
internet service provider (ISP). more recently, we have noted that
unencrypted (udp) DNS queries may also be intercepted. we do both types
of queries. we get set of IP addresses or errors.
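
A minimal sketch of this lookup step, assuming each resolver is modelled as a callable returning `(addresses, error)`. Only the system resolver (via `socket.getaddrinfo`) is implemented. A direct UDP resolver would be plugged in as another callable, since the Python standard library has no DNS client. This is not OONI's implementation.

```python
import socket
from typing import Callable, Dict, Optional, Set, Tuple

LookupResult = Tuple[Set[str], Optional[str]]


def system_lookup(hostname: str) -> LookupResult:
    """Resolve through the system resolver, as getaddrinfo does."""
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return set(), str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}, None


def run_lookups(hostname: str, resolvers: Dict[str, Callable[[str], LookupResult]]) -> Dict[str, LookupResult]:
    """Run every configured resolver and keep each result, success or error."""
    return {name: resolve(hostname) for name, resolve in resolvers.items()}
```

For example, `run_lookups("example.org", {"system": system_lookup})` yields one entry per resolver, each holding either a set of IP addresses or an error string.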

Principles (contd.): classified as censorship/anomaly (unexpected, may
be indicative of censorship)/error based on response. when we have a set
of IP addresses, we construct the endpoint. (even if DNS response was
censored, we send true IPs to users to continue with other tests).
continue with TCP, TLS connections.
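
The classification and endpoint construction might look roughly like this. The decision rules, the `known_block_ips` parameter and the "control" set of true IP addresses are assumptions for the sketch. The notes do not spell out OONI's actual rules.

```python
from typing import Iterable, List, Optional


def classify_dns(probe_ips: Iterable[str], control_ips: Iterable[str],
                 error: Optional[str] = None,
                 known_block_ips: Iterable[str] = ()) -> str:
    """Label a DNS result as error, censorship, anomaly or ok."""
    if error is not None:
        return "error"
    probe = set(probe_ips)
    if probe & set(known_block_ips):
        return "censorship"  # answer points at a known blockpage address
    if not probe & set(control_ips):
        return "anomaly"  # unexpected answer, may indicate censorship
    return "ok"


def build_endpoints(probe_ips: Iterable[str], control_ips: Iterable[str],
                    port: int = 443) -> List[str]:
    """Combine local answers with the true IPs so later steps still run."""
    ips = sorted(set(probe_ips) | set(control_ips))
    return [f"[{ip}]:{port}" if ":" in ip else f"{ip}:{port}" for ip in ips]
```

Including the control addresses is what allows the TCP and TLS steps to proceed even when the local DNS answer was tampered with.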

Censorship in Iran: there is already censorship in Iran, but something
changed in September during the protests. first chart shows censorship
of DNS over HTTPS (DoH) servers. On September 24, many measurements
indicated that DNS of open DoH servers was being tampered with. TCP and
TLS connections were timing out.

Censorship in Iran (contd.): bit more complicated than just detecting
SNI. also depends on IP address. eg. when we provided true IP addresses,
some connections did succeed.

Censorship in Russia: study published in March. first, like others, we
noticed that twitter was not accessible like before. suspected
throttling. we measured speed during TLS handshake. something happened
around 26 Feb, handshakes were timing out/very slow for many
measurements. now we are working on detecting throttling using a body
fetch. scope is that such throttling is extreme, and usually disrupts
access.
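
One simple way to flag throttling from timing data is to compare the observed throughput against a baseline. The sample format, the baseline and the 10% ratio below are assumptions for illustration, not the thresholds used in the study.

```python
from typing import List, Tuple


def throughput_bps(samples: List[Tuple[float, int]]) -> float:
    """samples: ordered (timestamp_seconds, cumulative_bytes) pairs."""
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples spanning a positive interval")
    return 8 * (b1 - b0) / (t1 - t0)


def looks_throttled(samples: List[Tuple[float, int]], baseline_bps: float,
                    ratio: float = 0.1) -> bool:
    """Flag a transfer whose speed falls below a fraction of the baseline."""
    return throughput_bps(samples) < ratio * baseline_bps
```

The same calculation applies whether the samples come from a TLS handshake or from a body fetch. Only the number of bytes observed differs.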

QUIC censorship: our main website censorship test does not test for QUIC
yet. but constructed an experimental test for QUIC. Compared failure
rates of HTTPS vs. HTTP/3. In China, it is quite common to block IP
addresses. So some blocking is directed at that, and UDP connections
were blocked like TCP connections. But censorship using resets, etc. we
were able to evade in certain circumstances using HTTP/3.

Quicping: we send an initial QUIC packet, minimum size. we send an
invalid version. we expect to receive something in response. QUIC is a
new protocol, so circumstances may be changing every day.
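
A sketch of such a probe, based on the QUIC long-header layout: a padded datagram with an unsupported version, plus a parser for a Version Negotiation reply. The version value `0xBABABABA` and the function names are choices made for this example and are not taken from the quicping source. Nothing is sent on the network here.

```python
import struct
from typing import List, Optional

MIN_DATAGRAM = 1200         # minimum size for a client's first datagram
PROBE_VERSION = 0xBABABABA  # matches the reserved 0x?a?a?a?a pattern


def build_probe(dcid: bytes, scid: bytes) -> bytes:
    """Build a long-header packet with an unsupported version, zero-padded."""
    header = bytes([0xC0]) + struct.pack("!I", PROBE_VERSION)
    header += bytes([len(dcid)]) + dcid + bytes([len(scid)]) + scid
    return header + bytes(MIN_DATAGRAM - len(header))


def parse_version_negotiation(datagram: bytes) -> Optional[List[int]]:
    """Return the versions a server offers, or None if this is another packet."""
    if len(datagram) < 7 or not datagram[0] & 0x80:
        return None
    if struct.unpack("!I", datagram[1:5])[0] != 0:
        return None  # Version Negotiation packets carry version 0
    pos = 5
    pos += 1 + datagram[pos]          # skip destination connection ID
    if pos >= len(datagram):
        return None
    pos += 1 + datagram[pos]          # skip source connection ID
    rest = datagram[pos:]
    if pos > len(datagram) or len(rest) % 4:
        return None
    return [struct.unpack("!I", rest[i:i + 4])[0] for i in range(0, len(rest), 4)]
```

Receiving any parsable reply indicates that UDP traffic to the endpoint got through. Silence is ambiguous and needs comparison with other measurements.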

DNSCheck: measures DoT/DoH blocking. follows the principles discussed in
the beginning of the presentation. even if DNS is censored, true IP
addresses are connected to. the test is now in OONI probe clients.

DNSCheck (contd.): we now have data, not fully analysed yet. initial
results show: eg. Cloudflare DoH is being censored (at TLS stage) in
China, Saudi Arabia.

Conclusion: support DoH3 and DoQ in DNSCheck. also want to mimic
browsers' fingerprints (may be associated with certain censorship
measures). need to integrate QUIC into our main web connectivity
measurements.

Questions

Marco Davids: do you look at the root cause of blocking? our website
(???) is blocked in Iran. it's blocked by Google's end, not by Iran
networks probably.
Simone: it is not something we conclude directly with our tool. but
indirectly, our measurements can help us identify it. Google probably
gives 403, so we can see that in the data (whereas if Iran networks were
blocking it, it would be connection reset or timing out).
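
The distinction drawn in this answer can be written as a small rule of thumb. The outcome labels and the function name are hypothetical, and a real analysis would weigh more signals than these.

```python
from typing import Union


def likely_blocking_side(outcome: Union[int, str]) -> str:
    """Guess where blocking happens from a single measurement outcome."""
    if outcome == 403:
        return "server-side"  # the service answered and refused the request
    if outcome in ("connection_reset", "timeout"):
        return "network"      # the request never got a proper answer
    return "unknown"
```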

Alex C.: are you planning on any extensions on quicping? you are only
doing negotiation. but there are SNI elements in the final RFC. quicping
would underreport censorship in its current design.
Simone: did not know that.

DKG: any insight into how you decide how much information you want to
take/gather? just used OONI probe and it detected blockpages, which may
not be true. also relates to responsible information gathering (first
presentation).
Simone: we inform users completely, they have to give consent and we
provide a summary of risks. not an anon tool, quite the opposite. the
client submits measurements to our server. we ask a question so that
users demonstrate they know what OONI is doing. eg. ISP will know that
OONI is being used.
Simone: part of why we have experimental tests, users need to know what
they're doing to run those (not available normally).