Agenda IETF110: maprg

Meeting Agenda Measurement and Analysis for Protocols (maprg) RG
State Active
Last updated 2021-03-08

## IRTF MAPRG agenda for IETF-110 (online)

Date: March 8 2021, 15:30 - 16:30 UTC
Webex link: TBD

### Overview & Status - Mirja & Dave (MAPRG) (5 min)
#### IRTF Note-well:

### Collecting "Typical" Domain Names for Web Servers - Paul Hoffman (10 min)

See accompanying report here:

### Measuring DNS over TLS from the Edge: Adoption, Reliability, and Response Times - Trinh Viet Doan (15 min)

This is a preview of a paper accepted at PAM 2021, March 29-31, 2021:

### Assessing the Privacy Benefits of Domain Name Encryption - Nguyen Phong Hoang (15 min)

This paper was presented at ASIA CCS ’20, October 5–9, 2020, Taipei, Taiwan,
and is available here:

### qlog: facilitating analysis of QUIC, HTTP/3 and other (encrypted) protocols - Robin Marx (5 min)

### Explicit Congestion Notification (ECN) Deployment Observations - Pete Heist (10 min)

This is about draft-heist-tsvwg-ecn-deployment-observations.

### Abstracts

#### Collecting "Typical" Domain Names for Web Servers - Paul Hoffman

When researchers measure the properties of the authoritative
Domain Name System (DNS) servers on the Internet, they first need
to define the types of authoritative servers they are sampling. The
authoritative servers might be for domain names used for websites,
for mail servers, for Internet infrastructure, and so on. Collecting
domain names used for web servers is seen by many researchers
as being fairly easy, and is thus the basis of much research on
authoritative name servers.

However, the current collections of domain names against which
one can do research are not that good for making assessments
about “typical” domain names. The most popular websites are
usually better managed than average websites, so lists of the
most popular websites are not terribly representative of the
web itself. Extracts from generic top-level domain (gTLD) zone
files have many inactive names that are parked or are abandoned,
so they too are not representative of the web. Dumps from passive
DNS collection systems are inherently regional, and also skewed
strongly against websites that are real but not popular.

One source of URLs for typical websites is the collection of the
Wikipedias for all the languages of the world. Wikipedia pages
often have links to sites that other sources would not have, such
as the governments of small cities, colleges and universities of all
sizes, obscure sports teams, small regional music and movie studios,
personal sites of people who wrote just one popular blog article,
and so on.

Wikimedia, the parent organization for all the Wikipedia sites, makes
it easy to cull all the outward-facing URLs from the pages from all
the Wikipedias. With that set of URLs, it is straightforward to reduce
them to just domain names, and from there to create a set of unique
instances of each of those names. This paper shows a methodology for
creating a list
of unique names, how a sample of those names was used to determine
how many domain names for websites have IPv6 addresses, and how many
are signed with Domain Name System Security Extensions (DNSSEC).
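
The reduction from URLs to unique domain names described above can be sketched as follows. This is a minimal illustration, not the paper's actual tooling: the function names `unique_hostnames` and `is_ip_literal` are hypothetical, and the choice to drop IP-address literals and URLs without a host is an assumption made here.

```python
import ipaddress
from urllib.parse import urlsplit


def is_ip_literal(host):
    """Return True if host is an IPv4 or IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def unique_hostnames(urls):
    """Reduce an iterable of URLs to a sorted list of unique hostnames.

    Assumptions of this sketch: hostnames are compared case-insensitively,
    a trailing root dot is ignored, and URLs without a host (or with an
    IP-address literal as host) are skipped.
    """
    names = set()
    for url in urls:
        try:
            host = urlsplit(url.strip()).hostname  # already lowercased
        except ValueError:
            continue  # malformed URL, for example a broken IPv6 literal
        if not host:
            continue
        host = host.rstrip(".")
        if host and not is_ip_literal(host):
            names.add(host)
    return sorted(names)
```

A sample drawn from the resulting list could then be queried for AAAA records or DNSSEC signatures; that step needs a DNS library and is not shown here.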

Note that although the dataset here is derived from Wikipedia data, it
is in no way associated with Wikipedia itself.

Although the dataset described here cannot be considered fully
“typical” of the web, it addresses the drawbacks of many more
commonly used lists. This document also discusses the properties of
the dataset that would make it less than “typical” for the web,
and also compares it with datasets of the most popular websites.

#### Measuring DNS over TLS from the Edge: Adoption, Reliability, and Response Times - Trinh Viet Doan

The Domain Name System (DNS) is a cornerstone of communication on
the Internet. DNS over TLS (DoT) was standardized in 2016 as
an extension to the DNS protocol; however, its performance has not
been extensively studied yet. In the first study that measures DoT
from the edge, we leverage 3.2k RIPE Atlas probes deployed in home
networks to assess the adoption, reliability, and response times of
DoT in comparison with DNS over UDP/53 (Do53). Each probe issues 200
domain name lookups to 15 public resolvers, five of which support
DoT, and to the probes’ local resolvers over a period of one week,
resulting in 90M DNS measurements in total. We find that the support
for DoT among open resolvers has increased by 23.1% after nine months
in comparison with previous studies. However, we observe that DoT
is still only supported by local resolvers for 0.4% of the RIPE
Atlas probes. In terms of reliability, we find failure rates for
DoT to be inflated by 0.4–32.2 percentage points when compared to
Do53. While Do53 failure rates for most resolvers individually are
consistent across continents, DoT failure rates have much higher
variation. As for response times, we see high regional differences
for DoT and find that nearly all DoT requests take at least 100
ms to return a response (in large part due to connection and
session establishment), showing an inflation in response times of
more than 100 ms compared to Do53. Despite the low adoption of DoT
among local resolvers, they achieve DoT response times of around
140–150 ms, similar to public resolvers (130–230 ms), although
local resolvers also exhibit higher failure rates in comparison.
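
The abstract reports failure-rate inflation in percentage points, which differs from a relative percentage change. A minimal sketch of the two calculations is given below; the function names and the counts in the usage note are hypothetical and are not taken from the study.

```python
def failure_rate(failed, total):
    """Failure rate in percent for `failed` failures out of `total` queries."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total


def inflation_percentage_points(dot_rate, do53_rate):
    """Absolute difference between two rates, in percentage points."""
    return dot_rate - do53_rate


def relative_change_percent(dot_rate, do53_rate):
    """Relative change of the DoT rate with respect to the Do53 rate."""
    if do53_rate == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return 100.0 * (dot_rate - do53_rate) / do53_rate
```

For instance, with made-up counts of 50 failures in 1000 DoT queries and 10 failures in 1000 Do53 queries, the inflation is 4 percentage points, while the relative change is 400%.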

#### Assessing the Privacy Benefits of Domain Name Encryption - Nguyen Phong Hoang

As Internet users have become more savvy about the potential for
their Internet communication to be observed, the use of network
traffic encryption technologies (e.g., HTTPS/TLS) is on the rise.
However, even when encryption is enabled, users leak information
about the domains they visit via DNS queries and via the Server
Name Indication (SNI) extension of TLS. Two recent proposals
to ameliorate this issue are DNS over HTTPS/TLS (DoH/DoT) and
Encrypted SNI (ESNI). In this paper we aim to assess the privacy
benefits of these proposals by considering the relationship between
hostnames and IP addresses, the latter of which are still exposed.
We perform DNS queries from nine vantage points around the globe
to characterize this relationship. We quantify the privacy gain
offered by ESNI for different hosting and CDN providers using
two different metrics, the k-anonymity degree due to co-hosting
and the dynamics of IP address changes. We find that 20% of the
domains studied will not gain any privacy benefit since they have
a one-to-one mapping between their hostname and IP address. On the
other hand, 30% will gain a significant privacy benefit with a k
value greater than 100, since these domains are co-hosted with
more than 100 other domains. Domains whose visitors’ privacy
will meaningfully improve are far less popular, while for popular
domains the benefit is not significant. Analyzing the dynamics of
IP addresses of long-lived domains, we find that only 7.7% of them
change their hosting IP addresses on a daily basis. We conclude by
discussing potential approaches for website owners and hosting/CDN
providers for maximizing the privacy benefits of ESNI.
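
The co-hosting metric can be illustrated with a small sketch. It assumes a simplified reading of the abstract: the k value of a domain is the number of domains (including itself) that share an IP address with it, and for a domain with several addresses the smallest such count is taken as a conservative value. The function name `k_anonymity` and both assumptions are illustrative and may differ from the paper's exact definition.

```python
from collections import defaultdict


def k_anonymity(domain_ips):
    """Compute a co-hosting k value per domain.

    domain_ips maps each domain name to an iterable of IP address strings.
    Returns a dict mapping each domain with at least one address to the
    smallest number of domains hosted on any of its addresses.
    """
    # Normalize to sets so every iterable is consumed exactly once.
    normalized = {domain: set(ips) for domain, ips in domain_ips.items()}

    # Invert the mapping: which domains are hosted on each address?
    hosted = defaultdict(set)
    for domain, ips in normalized.items():
        for ip in ips:
            hosted[ip].add(domain)

    result = {}
    for domain, ips in normalized.items():
        if ips:
            result[domain] = min(len(hosted[ip]) for ip in ips)
    return result
```

Under this reading, a domain with a one-to-one hostname-to-address mapping has k = 1, meaning the visible IP address alone identifies it.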

#### qlog: facilitating analysis of QUIC, HTTP/3 and other (encrypted) protocols - Robin Marx

The qlog project enables structured logging of protocol events
directly at the endpoints. This bypasses the scalability- and
privacy-related issues inherent in utilizing packet captures, especially for
encrypted protocols.  It also allows for the creation of re-usable
tools and the easier sharing of datasets.

While originally focused mainly on QUIC and HTTP/3, we are
now starting an effort to make qlog a protocol-agnostic logging
framework.  This comes with some salient challenges, and we hope to
gather feedback from the maprg to make sure qlog can be optimally
used for protocol measurement and analysis work in the future.
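
To illustrate the general idea of structured event logging at an endpoint, here is a minimal sketch that writes one JSON object per event. It is not the qlog schema: the class name `EventLog` and the field names `time_ms`, `name`, and `data` are hypothetical choices made for this example only.

```python
import json
import time


class EventLog:
    """Write protocol events as newline-delimited JSON to a text sink."""

    def __init__(self, sink, clock=time.monotonic):
        self._sink = sink          # any object with a write(str) method
        self._clock = clock        # injectable so tests can control time
        self._start = clock()      # timestamps are relative to this point

    def log(self, name, data):
        """Record one event with a relative timestamp in milliseconds."""
        record = {
            "time_ms": round((self._clock() - self._start) * 1000, 3),
            "name": name,
            "data": data,
        }
        self._sink.write(json.dumps(record, sort_keys=True) + "\n")
```

Because the endpoint logs events itself, values that are encrypted on the wire (for example packet numbers or frame types) can be recorded without capturing or decrypting packets.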

#### Explicit Congestion Notification (ECN) Deployment Observations - Pete Heist

This note presents data gathered at an Internet Service Provider's
gateway on the observed deployment and usage of ECN. Relevant IP
counter and flow tracking data was collected and analyzed for TCP
and other protocols.
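
As background for the counters mentioned above, the ECN field occupies the two least significant bits of the IPv4 TOS byte (or IPv6 Traffic Class byte), with the codepoints defined in RFC 3168. The sketch below classifies and counts those codepoints; the function names are hypothetical, and it is not the collection method used in the draft.

```python
from collections import Counter

# ECN codepoints per RFC 3168 (low two bits of the TOS / Traffic Class byte).
ECN_NAMES = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def ecn_codepoint(tos_byte):
    """Return the ECN codepoint name for a TOS / Traffic Class byte."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("TOS byte must be in the range 0..255")
    return ECN_NAMES[tos_byte & 0b11]


def count_ecn(tos_bytes):
    """Count ECN codepoints over an iterable of TOS / Traffic Class bytes."""
    return Counter(ecn_codepoint(b) for b in tos_bytes)
```

The upper six bits of the same byte carry the DSCP value, which this sketch ignores.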