IRTF DINRG – IETF 124 Meeting Notes (Extended Edition)

Date: November 4, 2025
Time: 14:30–16:30 UTC
Location: Meetecho (recorded session)
Chairs: Dirk Kutscher (HKUST(GZ)), Lixia Zhang (UCLA)
Notetaker: Saidu Sokoto


1. Opening and Administrivia


2. Cloud Outages, Fast and Slow – Indranil Gupta (UIUC)

Summary: An in-depth examination of cloud failure patterns, focusing on cascading outages in complex distributed systems and the AWS October 2025 incident.

Main Points
- Human error (70–85%) remains the dominant cause of outages, but automation now introduces “machine error.”
- The AWS Oct 19–20, 2025 outage began with an empty DNS entry (dynamodb.us-east-1.amazonaws.com) that caused cascading failures in DynamoDB, EC2, and Network Load Balancers.
- Race condition between asynchronous “enactors” led to data loss.
- Key lesson: “Asynchrony is a way of life” in distributed systems; lack of concurrency control in automated agents led to the failure.
- Proposed safety checks and “guardrails” for future automation and AI-driven infrastructure management.
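
A minimal sketch of the concurrency-control point above, not a description of AWS's actual system. It assumes plans carry a monotonically increasing version and that enactors share one record store; `DnsPlan`, `RecordStore`, and `apply` are hypothetical names. The store serializes enactors with a lock, rejects a plan older than the one already applied, and refuses to publish an empty record set.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class DnsPlan:
    """Hypothetical plan: a version number plus the records it would install."""
    version: int
    records: dict  # name -> list of IP address strings


@dataclass
class RecordStore:
    """Hypothetical stand-in for the shared DNS state that enactors update."""
    applied_version: int = 0
    records: dict = field(default_factory=dict)
    _lock: threading.Lock = field(
        default_factory=threading.Lock, init=False, repr=False
    )

    def apply(self, plan: DnsPlan) -> bool:
        """Apply a plan only if it is newer than the current one and non-empty."""
        with self._lock:  # mutual exclusion between concurrent enactors
            if plan.version <= self.applied_version:
                return False  # stale plan: a newer one was already applied
            if not plan.records or any(
                len(addrs) == 0 for addrs in plan.records.values()
            ):
                return False  # guardrail: never publish an empty record set
            self.records = {name: list(a) for name, a in plan.records.items()}
            self.applied_version = plan.version
            return True


store = RecordStore()
newer = DnsPlan(version=2, records={"db.example": ["192.0.2.10"]})
older = DnsPlan(version=1, records={"db.example": ["192.0.2.1"]})
store.apply(newer)  # accepted
store.apply(older)  # rejected: a delayed enactor must not overwrite newer state
```

The version check is what prevents a slow enactor from undoing a faster one; the lock only makes the check-then-write step atomic.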

Discussion Highlights (from chat and live Q&A)
- Brett Carr (AWS) clarified that “droplets” are physical hosts, not EC2 instances, confirming the presentation’s accuracy.
- Christian Huitema: Compared this to a “thundering herd” problem — multiple agents overwhelming shared resources.
- Lixia Zhang: “The root cause really is complexity.” Suggested the need for complexity audits akin to security audits.
- Vicky Risk: Observed that automation enabled dangerous parallelism — when humans did tasks sequentially, these errors were less likely.
- Andrew Campling: Urged involvement of ops engineers early in system design to mitigate such risks.
- Lixia Zhang: Distinguished between root cause of local failure (AWS) and root cause of global disruption, noting universities and education systems worldwide were affected.
- Indranil Gupta: Emphasized fixing bad programming “habits” that propagate through infrastructure layers (DNS often being the visible symptom). Suggested built-in configuration checks and mutual exclusion mechanisms for DNS enactors.
- Vicky Risk: Proposed “velocity controls” for health-check systems to prevent cascade amplification.
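
One way to read the "velocity controls" suggestion is as a cap on how many backends a health checker may take out of service within a time window. The sketch below assumes that interpretation; `VelocityLimiter`, `max_removals`, and `window_seconds` are hypothetical names, and the caller supplies the clock value so the behaviour is easy to test.

```python
from collections import deque


class VelocityLimiter:
    """Allow at most `max_removals` removals per sliding `window_seconds`."""

    def __init__(self, max_removals: int, window_seconds: float):
        self.max_removals = max_removals
        self.window_seconds = window_seconds
        self._removals = deque()  # timestamps of recently allowed removals

    def allow_removal(self, now: float) -> bool:
        # Forget removals that have aged out of the window.
        while self._removals and now - self._removals[0] >= self.window_seconds:
            self._removals.popleft()
        if len(self._removals) >= self.max_removals:
            return False  # too many removals recently: hold and escalate
        self._removals.append(now)
        return True


limiter = VelocityLimiter(max_removals=2, window_seconds=60.0)
limiter.allow_removal(now=0.0)  # allowed
limiter.allow_removal(now=1.0)  # allowed
limiter.allow_removal(now=2.0)  # refused: a third removal within 60 seconds
```

A refused removal would typically be queued for human review rather than dropped, so that a burst of failing health checks cannot empty a fleet on its own.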

Action Points
- Explore research on complexity auditing and automated safety mechanisms for large-scale systems.
- Discussion to continue on DINRG mailing list regarding lessons from automation failures.


3. Measuring the Consolidation of DNS and Web Hosting Providers – Nick Feamster (University of Chicago)

Research Question: How reliant is the Internet on a small number of organizations for DNS and web hosting?

Key Findings
- DNS and web hosting are highly concentrated: Cloudflare and Amazon each account for more than 30% of domains, and five companies (Cloudflare, Amazon, Akamai, Fastly, Google) host ~60% of top sites.
- Over 70% of domains use a single organization for authoritative DNS.
- Consolidation is global and consistent across vantage points.
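
The kind of metrics reported above can be sketched as follows. This is an illustration under assumptions, not the presenter's methodology: it takes a mapping from domain to the organizations serving it (the sample data and the names `provider_shares`, `top_k_share`, and `single_provider_fraction` are hypothetical) and computes per-provider share, the share covered by the k largest providers, and the fraction of domains relying on a single organization.

```python
from collections import Counter


def provider_shares(domain_to_providers: dict) -> dict:
    """Fraction of domains that use each provider (a domain may use several)."""
    total = len(domain_to_providers)
    if total == 0:
        return {}
    counts = Counter(
        p for providers in domain_to_providers.values() for p in set(providers)
    )
    return {p: n / total for p, n in counts.items()}


def top_k_share(domain_to_providers: dict, k: int) -> float:
    """Fraction of domains served by at least one of the k most-used providers."""
    total = len(domain_to_providers)
    if total == 0:
        return 0.0
    shares = provider_shares(domain_to_providers)
    # Sort by descending share, breaking ties by name for determinism.
    ranked = sorted(shares.items(), key=lambda kv: (-kv[1], kv[0]))
    top = {p for p, _ in ranked[:k]}
    hits = sum(1 for ps in domain_to_providers.values() if top & set(ps))
    return hits / total


def single_provider_fraction(domain_to_providers: dict) -> float:
    """Fraction of domains that depend on exactly one organization."""
    total = len(domain_to_providers)
    if total == 0:
        return 0.0
    single = sum(1 for ps in domain_to_providers.values() if len(set(ps)) == 1)
    return single / total


# Hypothetical sample data: domain -> organizations hosting its DNS.
sample = {
    "a.example": ["P1"],
    "b.example": ["P1", "P2"],
    "c.example": ["P2"],
    "d.example": ["P3"],
}
print(provider_shares(sample))
print(top_k_share(sample, k=2))
print(single_provider_fraction(sample))
```

Running the same functions over data collected from different vantage points would be one way to check the consistency claim.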

Discussion Highlights
- Brian Trammell: Proposed analyzing consolidation within cloud providers by region (e.g., AWS us-east-1) via IP geo-location.
- Pete Resnick: Asked if the distinction between front-end (CDN) and back-end hosting was measured; Nick clarified that front-end dependencies were the focus, suggesting future work on backend metrics.
- Gianpaolo Scalone: Suggested extending analysis to include ECH-enabled domains, since they hide back-end structure.
- Christian Huitema: Shared that ICANN maintains monthly DNS concentration metrics (ICANN ITHI M9 graph) — potential collaboration.
- Andrew Campling: Warned of “digital colonialism”: global power concentrated in a few hosts that may ignore takedown requests, shaping global information flow.
- Dan Sexton: Expanded this into content control concerns—centralized hosts deciding what content remains online.
- Pete Resnick: Pointed out that decentralization complicates content moderation, creating a “whack-a-mole” scenario, which might not be entirely negative.
- Tom Newton: Suggested real-time tracking of website dependencies (“cloudiuse.com”) to expose such concentration dynamically.

Action Points
- Nick to release the measurement code and dataset publicly.
- DINRG to consider periodic re-measurement as a standing activity, potentially in collaboration with ICANN.


4. Beyond ECH: Architectural Directions for Source Privacy and Decentralization – Gianpaolo Angelo Scalone (Vodafone Group)

Motivation
- While Encrypted Client Hello (ECH) hides destinations, client-facing servers (CDNs) still see both client IP and target domain — allowing correlation and surveillance.
- This centralization introduces jurisdictional mismatch and privacy risks.

Proposal: Customer-Facing Relay (CFR)
- Lightweight relay operated at the ISP or enterprise edge.
- Randomizes or rotates source IPs, decoupling source identity from destination.
- Preserves TLS/ECH semantics without requiring DNS changes.
- Builds on customer–ISP trust relationship for accountability.
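
The notes do not record how CFR selects source addresses, so the following is only an illustrative sketch of the "randomizes or rotates source IPs" idea. It assumes the relay owns an address pool and keeps a private per-flow mapping (which is also what would let the ISP retain accountability); `SourceRotator`, `source_for`, and `rotate` are hypothetical names, and the pool is a documentation prefix.

```python
import ipaddress
import secrets


class SourceRotator:
    """Assign each (customer, destination) flow a random source from a pool."""

    def __init__(self, pool_cidr: str):
        self.pool = ipaddress.ip_network(pool_cidr)
        # Private mapping kept by the relay operator; never exposed upstream.
        self._flows = {}

    def source_for(self, customer_id: str, destination: str):
        """Return the source address for a flow, picking one on first use."""
        key = (customer_id, destination)
        if key not in self._flows:
            offset = secrets.randbelow(self.pool.num_addresses)
            self._flows[key] = self.pool[offset]
        return self._flows[key]

    def rotate(self, customer_id: str, destination: str):
        """Drop the current assignment and pick a fresh source address."""
        self._flows.pop((customer_id, destination), None)
        return self.source_for(customer_id, destination)


rotator = SourceRotator("2001:db8::/64")
addr = rotator.source_for("customer-1", "site.example")
same = rotator.source_for("customer-1", "site.example")  # stable within a flow
```

With a mapping like this, the client-facing server sees a pool address rather than the customer's own, while the relay operator can still resolve the mapping when required.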

Discussion Highlights
- Christian Huitema: Compared CFR to Oblivious HTTP (OHTTP); Gianpaolo replied CFR is simpler and more deployable at the network edge.
- Andrew Campling: Praised CFR for balancing privacy with accountability through contractual ISP relationships.
- Fig: Argued that jurisdictional mismatch can be desirable (e.g., bypassing censorship); Gianpaolo clarified CFR complements ECH by adding source privacy, not removing freedom.
- Aldo: Linked CFR’s goals to the EU’s Digital Operational Resilience Act (DORA) on infrastructure independence.
- Christian Huitema: Noted privacy must protect against both big tech surveillance and state control by ISPs; CFR must address both adversaries.
- Andrew Campling: Quipped, “I can’t vote out big tech.”
- Dan Sexton: Highlighted vertical consolidation — browsers, devices, CDNs, and DNS often controlled by a single vendor, compounding privacy issues.

Action Points
- Continue technical analysis of CFR deployment risks on the mailing list.
- Coordinate with PEARG and HRPC for broader privacy and decentralization implications.


5. A Generic Framework for Building Dynamic Decentralized Systems – Diogo Jesus (NOVA University Lisbon)

Overview
- Presented GFDS, a modular system for building decentralized applications with dynamic behavior and adaptive resource use.
- Architecture includes protocol, event, discovery, resource, timer, communication, configuration, and security managers.
- Reference implementation: Babel, supporting decentralized storage, ML, and messaging.
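
To make the manager-based architecture more concrete, here is a small sketch of how an event manager might decouple a protocol from the components that feed it. It does not reproduce the GFDS or Babel APIs; `EventManager`, `MembershipProtocol`, and the event names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class EventManager:
    """Route named events to whichever handlers have subscribed to them."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type: str, handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload) -> int:
        """Deliver the payload to every subscriber; return how many ran."""
        handlers = list(self._handlers.get(event_type, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


class MembershipProtocol:
    """Toy protocol that tracks peers reported by a discovery component."""

    def __init__(self, events: EventManager):
        self.peers = set()
        events.subscribe("peer_discovered", self.peers.add)
        events.subscribe("peer_lost", self.peers.discard)


events = EventManager()
membership = MembershipProtocol(events)
events.publish("peer_discovered", "node-b")  # membership.peers now holds node-b
events.publish("peer_lost", "node-b")        # and is empty again
```

The protocol never calls the discovery component directly, which is the property that lets components be swapped or reconfigured at run time.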

Discussion Highlights
- Roland Bless (KIT): Mentioned “Vailet,” a Rust framework with similar goals. Diogo expressed interest.
- Dirk Kutscher: Requested elaboration on how GFDS supports decentralization properties and encouraged follow-up via the mailing list.

Action Points
- Further discussion of GFDS capabilities and evaluation use cases (swarm robotics, IoT, Web3) on the mailing list.


6. Panel Discussion – Architectural Trade-offs in Decentralized Social Systems

Panelists: Christine Lemmer-Webber (ActivityPub), Brian Truong (AT Protocol/Bluesky), Ted Hardie (moderator), with active participation from the audience.

Key Themes

Closing Reflections
- Dirk Kutscher: Quoted Tocqueville — “Decentralization is really, really hard.” Called DINRG a natural home for this ongoing exploration.
- Lixia Zhang: Emphasized starting with clear problem definitions before standardization.

Action Points
- Draft a DINRG informational document summarizing architectural trade-offs (identity, portability, governance, moderation).
- Coordinate with AT Protocol BoF for follow-up.


7. Open Discussion and Future Plans


8. Summary of Decisions and Actions

Topic                         Action                                                      Responsible
----------------------------  ----------------------------------------------------------  ---------------------
Cloud Outages                 Explore complexity auditing and automation safety           Community
Infrastructure Consolidation  Publish dataset, continue periodic measurement              Nick Feamster & DINRG
Source Privacy (CFR)          Analyze deployment risks, coordinate with privacy RGs       Gianpaolo Scalone
GFDS Framework                Share implementation details, collect feedback              Diogo Jesus
Social Systems Panel          Prepare informational draft on decentralization trade-offs  Chairs & Panelists

Compiled from manual notes, auto-generated minutes, and full chat log of the DINRG session at IETF-124.