Minutes IETF113: irtfopen
minutes-113-irtfopen-01

Meeting Minutes IRTF Open Meeting (irtfopen)
Date and time 2022-03-22 13:30
State Active
Last updated 2022-03-31

IRTF Open Meeting
=================

Tuesday, 22 March 2022, at 13:30-15:30 UTC
Room: Grand Park Hall 2

Chair: Colin Perkins
Minutes: Mat Ford

## Introduction and Status Update

IRTF Chair
Slides:
https://datatracker.ietf.org/meeting/113/materials/slides-113-irtfopen-introduction-and-agenda-00

## Solar Superstorms: planning for an Internet apocalypse

Sangeetha Abdu Jyothi
Paper: https://dl.acm.org/doi/10.1145/3452296.3472916
Slides:
https://datatracker.ietf.org/meeting/113/materials/slides-113-irtfopen-solar-superstorms-planning-for-an-internet-apocalypse-00

Q&A:
  Wes Hardaker: Thank you for the excellent work. Will pass this on to ham
  radio friends. I am a root server operator. Root server or DNS instances
  may not be a problem in themselves, but if they become disconnected from
  the rest of the system they'll fail to get updates, at which point DNSSEC
  validation will start failing after a while as the signatures expire. Did
  you find islands of connectivity that would otherwise become cut off?

  SAJ: Important question. I don't have a complete answer yet. The initial
  paper only looked at the primary areas impacted, not the end-to-end
  picture. I'm currently doing more work to flesh out a more complete
  answer. It's possible there would be a few very big islands, but not a
  lot of tiny islands. Preliminary analysis suggests most Asia-to-Europe
  connectivity will stay up, but US<->Europe is most vulnerable. Complete
  disconnection is hard to predict, but a significant reduction in capacity
  between bigger land masses is possible.
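
  The failure mode Wes Hardaker describes can be sketched in a few lines:
  a disconnected island keeps resolving from cache until its DNSSEC
  signatures pass their expiration time. (The dates and the two-week
  validity window below are illustrative assumptions, not values from the
  discussion.)

  ```python
  # Hedged sketch: a cut-off island validates DNSSEC answers only while
  # its cached signatures are still inside their validity window.
  from datetime import datetime, timedelta, timezone

  def rrsig_valid(inception, expiration, now=None):
      """A signature validates only inside [inception, expiration]."""
      now = now or datetime.now(timezone.utc)
      return inception <= now <= expiration

  signed_at = datetime(2022, 3, 1, tzinfo=timezone.utc)   # assumed
  expires_at = signed_at + timedelta(days=14)             # assumed window

  # Shortly after being cut off, validation still succeeds:
  assert rrsig_valid(signed_at, expires_at, signed_at + timedelta(days=7))
  # Once the island cannot fetch fresh signatures, validation fails:
  assert not rrsig_valid(signed_at, expires_at, signed_at + timedelta(days=30))
  ```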

  Nicolas Kuhn: I think it is worth pointing out that there are regular
  solar storms and, apart from the Starlink incident with very low earth
  orbit satellites, satellites were fine. Do you have pointers to justify
  your assumptions about satellites being vulnerable?

  SAJ: Satellites typically have shielding to protect them from solar
  activity during their lifespan of 5-10 years. In the past decade or two
  the number of satellites has grown exponentially, and we have not had a
  large storm in that time. The Starlink satellites were impacted while
  they were at a lower altitude than their operating altitude: they were
  hit by two successive G1 (low intensity) storms and de-orbited. If a G5
  (most intense) storm happened, we don't know what the impact would be for
  satellites, even those at operating altitude. Radiation shielding does
  offer some protection, but it's not guaranteed to protect against a large
  solar storm. Satellites do have thrusters to correct de-orbiting events,
  but they require connectivity to ground stations on Earth for command and
  control, and storms could impact this connectivity for 10-12 hours. So
  very large storms do have the capacity to impact satellites even when
  operating at higher altitudes. Before the current International Space
  Station was installed, the US had Skylab, which was destroyed in the
  1970s as a consequence of solar storms. We have not experienced such a
  large storm very recently.

  Richard Scheffenegger: Is the impact focussed on the high latitude /
  sun-facing side? Or are induced currents the same globally (just
  depending on the inductor loop area)?

  SAJ: When it comes to the direct impact on satellites, those facing the
  sun are at a much higher risk, but induced currents are caused by the
  interaction of charged particles with Earth's magnetic field. For induced
  currents it's not just the sun-facing side: the dark side is also
  vulnerable. Higher latitudes on both the sun-facing side and the dark
  side are equally vulnerable.

  CSP: Is there anything we should be doing differently when we design
  protocols that would help with resilience to this type of event?

  SAJ: We need to consider resilience at every layer of the stack. With DNS
  for example, the root servers are well distributed, but we don't know how
  the entire hierarchical tree would be impacted - do we need to change
  caching, or change how DNS records are managed? That is not clear. When
  it comes to routing protocols, BGP, which selects only a single best
  path, might be too restrictive when capacity is severely limited; that's
  further analysis we are planning to do. Within an AS, OSPF and other
  intra-AS routing protocols fare very well because they are decentralised
  and can use whatever paths are available. The inter-domain protocol needs
  more investigation.

  CSP: Makes sense. I guess there's a whole bunch of coordination issues
  and management issues with large scale cloud provider infrastructure
  networks and so on as well. Some interesting problems.
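
  SAJ's contrast between single-best-path and decentralised routing can be
  sketched as follows. The toy topology, the pinned path, and the failure
  scenario are illustrative assumptions, not data from the talk: a policy
  that has committed to one best path (BGP-like) loses reachability when
  that path's link fails, while a protocol free to use any surviving path
  still finds a route.

  ```python
  # Hedged sketch: single pinned path vs. any-available-path routing
  # under link loss. Topology and names are purely illustrative.
  from collections import deque

  def find_path(links, src, dst):
      """Breadth-first search: return any available path, or None."""
      seen, queue = {src}, deque([[src]])
      while queue:
          path = queue.popleft()
          if path[-1] == dst:
              return path
          for nxt in links.get(path[-1], ()):
              if nxt not in seen:
                  seen.add(nxt)
                  queue.append(path + [nxt])
      return None

  # Toy inter-continental topology (assumed, for illustration only).
  links = {
      "US": ["Europe", "Asia"],
      "Europe": ["US", "Asia"],
      "Asia": ["US", "Europe"],
  }
  best_path = ["US", "Europe"]   # the one path a BGP-like policy selected

  # A storm takes out the US<->Europe links.
  links["US"].remove("Europe")
  links["Europe"].remove("US")

  # The pinned single path is now broken...
  assert not all(b in links[a] for a, b in zip(best_path, best_path[1:]))
  # ...but a protocol that can use any surviving path still gets through.
  assert find_path(links, "US", "Europe") == ["US", "Asia", "Europe"]
  ```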

## Unbiased experiments in congested networks

Bruce Spang
Paper: https://arxiv.org/abs/2110.00118
Slides:
https://datatracker.ietf.org/meeting/113/materials/slides-113-irtfopen-unbiased-experiments-in-congested-networks-00

Q&A:
  Jana Iyengar: Great talk - very illuminating at a minimum. This is
  something everyone should take into account when conducting these
  experiments. It shows clearly that small A/B tests aren't necessarily
  good enough. I'm wondering if you've examined what would happen if you
  made a client sticky for a particular A/B test. How exactly was the
  choice made for serving a particular type of content? The bottleneck has
  to be shared between control and experimental groups. If the bottleneck
  is close to the user, and the user is in a bucket that is either control
  or treatment, then the user would effectively be at the far end of your
  scale. Have you looked into this? Were your experiments sticky to users?

  BS: There are things we can do on the allocation side to avoid some of
  these issues. The experiments we ran were sticky to users. The overall
  point you're making is true - if you can guarantee that users will never
  share resources then you can avoid this bias; if you believe users are
  not sharing any bottleneck links with each other then you avoid the bias.
  We didn't explore this too much because we found it hard to measure
  whether users were sharing links, and didn't have a good sense of how
  often that would be the case. If you were allocating users instead of
  sessions, my gut instinct is that this is better in terms of
  interference. You could also allocate networks or particular servers, or
  try to reduce the probability that treatment and control share the same
  link - that will reduce the bias of the experiment.

  JI: Very helpful. If you tried to use these techniques, it could tell you
  something about where users do end up sharing bottlenecks as well.
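
  The "sticky to users" allocation discussed above is commonly done by
  hashing a stable user identifier, so every session from the same user
  lands in the same arm. The sketch below shows one such scheme; the
  function name, the experiment label, and the 50/50 split are assumptions
  for illustration, not the allocation the talk actually used.

  ```python
  # Hedged sketch: deterministic, sticky-to-user A/B arm assignment.
  import hashlib

  def assign_arm(user_id: str, experiment: str,
                 treatment_fraction: float = 0.5) -> str:
      """Map a user to 'treatment' or 'control', stably across sessions."""
      digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
      bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
      return "treatment" if bucket < treatment_fraction else "control"

  # The same user always gets the same arm, session after session:
  assert assign_arm("user-42", "new-cc-algo") == assign_arm("user-42", "new-cc-algo")
  ```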

  Brian Trammell: If we use what we know about networks to design
  experiments about networks, we can get a lot smarter about this. The
  switchback experiment looked like a diagram of TDMA; A/B tests are like
  CDMA. Are there ways to use this multiple-access metaphor to find other
  ways to analyse this? My SRE mind wants to automate the partitioning and
  watch what things show up in the power spectrum. I'm wondering if there's
  a more fundamental way to split this up. Not expecting an answer right
  now!

  BS: Super interesting question, don't have an answer offhand. One thing
  social networks do is allocate a user and all of their friends to a
  particular experiment. Could be something there for networks too.
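
  The switchback design behind Trammell's TDMA analogy can be sketched as
  time slicing: instead of splitting users across a shared bottleneck, the
  whole system alternates between treatment and control in randomized time
  slots, so the two arms never compete on the same link at the same
  instant. The slot length and seed below are illustrative assumptions.

  ```python
  # Hedged sketch: switchback (time-sliced, TDMA-like) experiment arms.
  import random

  def switchback_schedule(num_slots: int, seed: int = 0) -> list:
      """Randomly assign each time slot to one arm, system-wide."""
      rng = random.Random(seed)
      return [rng.choice(["treatment", "control"]) for _ in range(num_slots)]

  def arm_at(schedule, t_seconds, slot_length=300):
      """Which arm is active at time t; every user shares that arm."""
      return schedule[(t_seconds // slot_length) % len(schedule)]

  schedule = switchback_schedule(12)
  # All traffic within one slot runs under the SAME arm, so treatment and
  # control never share the bottleneck simultaneously.
  assert arm_at(schedule, 0) == arm_at(schedule, 299)
  assert arm_at(schedule, 300) == schedule[1]
  ```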

  Jonathan Morton: One of the big lessons to take away from this is how
  important it is to understand the testing methodology in detail when we
  look at a set of purported results, particularly ones that are used for
  marketing. Fine details can have a big effect, such as how a particular
  A/B test was conducted and how the potential bias that you identified has
  been mitigated.

  BS: Definitely.

  CSP: You said there was a need for better experiment methodology. What,
  if anything, should we be doing when we're designing and evaluating new
  protocols to improve confidence in results? Is there any general guidance
  we should be providing, or is it "read this paper and think about these
  issues"?

  BS: We build good systems with what we do today; this gives us another
  tool to think about when evaluating algorithms. When designing new
  algorithms in the IETF, I'd bear in mind that the way we run experiments
  to evaluate these algorithms can be biased, and think about other ways to
  run them that might mitigate those effects.

  CSP: Makes sense. Thanks!

Recordings of the talks, and links to the papers, are available from
https://irtf.org/anrp/