IRTF MAPRG agenda for IETF-110 (online)

Date: March 8, 2021, 15:30 - 16:30 UTC

Webex link: TBD

Overview & Status - Mirja & Dave (MAPRG) (5 min)

IRTF Note-well: https://irtf.org/policies/irtf-note-well-2019-11.pdf

Collecting "Typical" Domain Names for Web Servers - Paul Hoffman (10 min)

See accompanying report here: https://www.icann.org/en/system/files/files/octo-023-24feb21-en.pdf

Measuring DNS over TLS from the Edge: Adoption, Reliability, and Response Times - Trinh Viet Doan (15 min)

This is a preview of a paper accepted for PAM 2021, March 29-31 2021: https://www.pam2021.b-tu.de/accepted/

Assessing the Privacy Benefits of Domain Name Encryption - Nguyen Phong Hoang (15 min)

This paper was presented at ASIA CCS ’20, October 5–9, 2020, Taipei, Taiwan, and is available here: https://arxiv.org/pdf/1911.00563.pdf

qlog: facilitating analysis of QUIC, HTTP/3 and other (encrypted) protocols - Robin Marx (5 min)

Explicit Congestion Notification (ECN) Deployment Observations - Pete Heist (10 min)

This talk discusses draft-heist-tsvwg-ecn-deployment-observations.

Abstracts

Collecting "Typical" Domain Names for Web Servers - Paul Hoffman

When researchers measure the properties of the authoritative Domain Name System (DNS) servers on the Internet, they first need to define the types of authoritative servers they are sampling. The authoritative servers might be for domain names used for websites, for mail servers, for Internet infrastructure, and so on. Collecting domain names used for web servers is seen by many researchers as being fairly easy, and is thus the basis of much research on authoritative name servers.

However, the current collections of domain names against which one can do research are not that good for making assessments about “typical” domain names. The most popular websites are usually better managed than average websites, so lists of the most popular websites are not terribly representative of the web itself. Extracts from generic top-level domain (gTLD) zone files have many inactive names that are parked or are abandoned, so they too are not representative of the web. Dumps from passive DNS collection systems are inherently regional, and also skewed strongly against websites that are real but not popular.

One source of URLs for typical websites is the collection of the Wikipedias for all the languages of the world. Wikipedia pages often have links to sites that other sources would not have, such as the governments of small cities, colleges and universities of all sizes, obscure sports teams, small regional music and movie studios, personal sites of people who wrote just one popular blog article, and so on.

Wikimedia, the parent organization for all the Wikipedia sites, makes it easy to cull all the outward-facing URLs from the pages of all the Wikipedias. With that set of URLs, it is straightforward to reduce them to just domain names, and from there to create a set of unique names. This paper shows a methodology for creating a list of unique names, how a sample of those names was used to determine how many domain names for websites have IPv6 addresses, and how many are signed with Domain Name System Security Extensions (DNSSEC).
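The reduction from URLs to unique domain names can be illustrated with a minimal Python sketch. It is not the code used in the report: the function name `unique_domain_names` and the sample URLs are hypothetical, and the choices to lowercase names, strip a trailing dot, and skip IP address literals are assumptions made for this illustration.

```python
import ipaddress
from urllib.parse import urlsplit


def _is_ip_literal(host):
    """Return True if host is an IPv4 or IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def unique_domain_names(urls):
    """Reduce an iterable of URLs to a sorted list of unique host names.

    Hypothetical helper for illustration only; URLs without a host part
    (for example mailto: links) and malformed URLs are skipped.
    """
    names = set()
    for url in urls:
        try:
            host = urlsplit(url.strip()).hostname
        except ValueError:
            continue  # malformed URL
        if not host:
            continue  # no host part at all
        host = host.rstrip(".").lower()
        if not host or _is_ip_literal(host):
            continue
        names.add(host)
    return sorted(names)


# Made-up sample input using reserved example domains.
sample_urls = [
    "https://www.example.org/about",
    "http://WWW.Example.org:8080/",
    "https://city.example.net/council?id=3",
    "mailto:someone@example.com",
    "http://192.0.2.1/index.html",
]
print(unique_domain_names(sample_urls))
```

A set is used so that each name is kept once regardless of how many pages link to it; a sample for IPv6 or DNSSEC checks could then be drawn from the resulting list.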

Note that although the dataset here is derived from Wikipedia data, it is in no way associated with Wikipedia itself.

Although the dataset described here cannot be considered fully “typical” of the web, it addresses the drawbacks of many more commonly used lists. This document also discusses the properties of the dataset that would make it less than “typical” for the web, and also compares it with datasets of the most popular websites.

Measuring DNS over TLS from the Edge: Adoption, Reliability, and Response Times - Trinh Viet Doan

The Domain Name System (DNS) is a cornerstone of communication on the Internet. DNS over TLS (DoT) was standardized in 2016 as an extension to the DNS protocol; however, its performance has not yet been extensively studied. In the first study that measures DoT from the edge, we leverage 3.2k RIPE Atlas probes deployed in home networks to assess the adoption, reliability, and response times of DoT in comparison with DNS over UDP/53 (Do53). Each probe issues 200 domain name lookups to 15 public resolvers, five of which support DoT, and to the probes’ local resolvers over a period of one week, resulting in 90M DNS measurements in total. We find that the support for DoT among open resolvers has increased by 23.1% after nine months in comparison with previous studies. However, we observe that DoT is still only supported by local resolvers for 0.4% of the RIPE Atlas probes. In terms of reliability, we find failure rates for DoT to be inflated by 0.4–32.2 percentage points when compared to Do53. While Do53 failure rates for most resolvers individually are consistent across continents, DoT failure rates have much higher variation. As for response times, we see high regional differences for DoT and find that nearly all DoT requests take at least 100 ms to return a response (in large part due to connection and session establishment), showing an inflation in response times of more than 100 ms compared to Do53. Despite the low adoption of DoT among local resolvers, they achieve DoT response times of around 140–150 ms, similar to public resolvers (130–230 ms), although local resolvers also exhibit higher failure rates in comparison.

Assessing the Privacy Benefits of Domain Name Encryption - Nguyen Phong Hoang

As Internet users have become more savvy about the potential for their Internet communication to be observed, the use of network traffic encryption technologies (e.g., HTTPS/TLS) is on the rise. However, even when encryption is enabled, users leak information about the domains they visit via DNS queries and via the Server Name Indication (SNI) extension of TLS. Two recent proposals to ameliorate this issue are DNS over HTTPS/TLS (DoH/DoT) and Encrypted SNI (ESNI). In this paper we aim to assess the privacy benefits of these proposals by considering the relationship between hostnames and IP addresses, the latter of which are still exposed. We perform DNS queries from nine vantage points around the globe to characterize this relationship. We quantify the privacy gain offered by ESNI for different hosting and CDN providers using two different metrics, the k-anonymity degree due to co-hosting and the dynamics of IP address changes. We find that 20% of the domains studied will not gain any privacy benefit since they have a one-to-one mapping between their hostname and IP address. On the other hand, 30% will gain a significant privacy benefit with a k value greater than 100, since these domains are co-hosted with more than 100 other domains. Domains whose visitors’ privacy will meaningfully improve are far less popular, while for popular domains the benefit is not significant. Analyzing the dynamics of IP addresses of long-lived domains, we find that only 7.7% of them change their hosting IP addresses on a daily basis. We conclude by discussing potential approaches for website owners and hosting/CDN providers for maximizing the privacy benefits of ESNI.
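The co-hosting metric can be illustrated with a small Python sketch. It is not the paper's implementation: the function name `cohosting_k` and the sample data are hypothetical, and the definition used here is an assumption. Here k is the number of domains observed on the same IP address, counting the domain itself, and for a domain with several addresses the smallest such count is taken; the paper's exact definition may differ.

```python
from collections import defaultdict


def cohosting_k(domain_to_ips):
    """Compute an illustrative co-hosting degree k for each domain.

    Assumption for this sketch: k is the number of domains seen on the
    same IP address (including the domain itself), and a domain with
    several addresses gets the minimum over its addresses.
    """
    # Invert the mapping: which domains were observed on each address?
    ip_to_domains = defaultdict(set)
    for domain, ips in domain_to_ips.items():
        for ip in ips:
            ip_to_domains[ip].add(domain)

    k = {}
    for domain, ips in domain_to_ips.items():
        if ips:  # domains with no resolved address are left out
            k[domain] = min(len(ip_to_domains[ip]) for ip in ips)
    return k


# Made-up observations using documentation-only names and addresses.
observations = {
    "a.example": {"192.0.2.10"},
    "b.example": {"192.0.2.10", "192.0.2.11"},
    "c.example": {"192.0.2.10"},
    "solo.example": {"198.51.100.7"},
}
k_values = cohosting_k(observations)
for name in sorted(k_values):
    print(name, k_values[name])
```

Under this definition, a domain with k equal to 1 has an address that identifies it uniquely, which corresponds to the one-to-one case described in the abstract.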

qlog: facilitating analysis of QUIC, HTTP/3 and other (encrypted) protocols - Robin Marx

The qlog project enables structured logging of protocol events directly at the endpoints. This bypasses scalability and privacy related issues inherent in utilizing packet captures, especially for encrypted protocols. It also allows for the creation of re-usable tools and the easier sharing of datasets.

While originally focused mainly on QUIC and HTTP/3, we are now starting an effort to make qlog a protocol-agnostic logging framework. This comes with some salient challenges, and we hope to gather feedback from MAPRG to make sure qlog can be optimally used for protocol measurement and analysis work in the future.

Explicit Congestion Notification (ECN) Deployment Observations - Pete Heist

This note presents data gathered at an Internet Service Provider's gateway on the observed deployment and usage of ECN. Relevant IP counter and flow tracking data was collected and analyzed for TCP and other protocols.
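As background for the kind of counting involved, the following minimal Python sketch tallies ECN codepoints from the low two bits of the IPv4 TOS (or IPv6 Traffic Class) byte, using the codepoint values defined in RFC 3168. It is not the tooling used in the draft; the function names and the sample byte values are hypothetical.

```python
from collections import Counter

# ECN field values from RFC 3168 (low two bits of the TOS / Traffic Class byte).
ECN_CODEPOINTS = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def ecn_codepoint(tos_byte):
    """Return the ECN codepoint name for a TOS / Traffic Class byte."""
    return ECN_CODEPOINTS[tos_byte & 0b11]


def tally_ecn(tos_bytes):
    """Count how often each ECN codepoint appears in a sequence of bytes."""
    return Counter(ecn_codepoint(b) for b in tos_bytes)


# Made-up sample bytes; 0xB8 carries a DSCP value with the ECN bits clear.
sample = [0x00, 0x02, 0x02, 0x03, 0x01, 0xB8]
counts = tally_ecn(sample)
for name in ("Not-ECT", "ECT(0)", "ECT(1)", "CE"):
    print(name, counts[name])
```

Masking with `0b11` separates the ECN field from the six DSCP bits, so packets with different DSCP markings fall into the same ECN bucket.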