Minutes IETF113: irtfopen

Meeting Minutes IRTF Open Meeting (irtfopen) RAG
Date and time 2022-03-22 13:30
State Active
Last updated 2022-03-31

IRTF Open Meeting

Tuesday, 22 March 2022, at 13:30-15:30 UTC
Room: Grand Park Hall 2

Chair: Colin Perkins
Minutes: Mat Ford

## Introduction and Status Update

IRTF Chair

## Solar Superstorms: planning for an Internet apocalypse

Sangeetha Abdu Jyothi

  Wes Hardaker: Thank you for the excellent work. Will pass this on to ham
  radio friends. I am a root server operator. For root instances or other
  DNS instances, the installation itself may not pose a problem, but if they
  become disconnected from the rest of the system they'll fail to get
  updates, at which point DNSSEC will stop validating after a while as the
  signatures expire. Did you find islands of connectivity that would
  otherwise become cut off?

  SAJ: Important question. I don't have a complete answer yet. Initial
  paper only looked at primary areas impacted, not the end-to-end
  discussion. Currently I'm doing more work to flesh out a more complete
  answer. It is possible that there could be a few very big islands,
  but not a lot of tiny islands. Preliminary analysis suggests most of Asia
  to Europe connectivity will stay up, but US<->Europe is most vulnerable.
  Complete disconnection is hard to predict, but significant reduction in
  capacity between bigger land masses is possible.

  Nicolas Kuhn: I think it is worth pointing out that there are regular
  solar storms and, apart from the Starlink accident with very low earth
  orbit satellites, satellites were fine. Do you have pointers to justify
  your assumptions on satellites being vulnerable?

  SAJ: Sats typically have shielding to protect them from solar activity
  during their lifespan, 5-10 years. In the past decade or two the number
  of satellites has grown exponentially. We have not had a large storm in
  that time. Starlink satellites were impacted when they were at a lower
  altitude than their operating altitude, they were hit by two successive
  G1 (low intensity) storms, and de-orbited. If a G5 (most intense) storm
  happened, we don't know what the impact would be for sats, even those at
  operating altitude. Radiation shielding does offer some protection but
  it's not guaranteed to protect against a large solar storm. Satellites do
  have thrusters to correct de-orbiting events but they require connectivity
  to Earth ground stations for command and control. Storms could impact
  this connectivity for 10 - 12 hours. So very large storms do have the
  capacity to impact satellites even when operating at higher altitudes.
  Before the current International Space Station was installed, the US had
  Skylab, which was destroyed in the 1970s as a consequence of solar storms.
  We have not experienced such a large storm very recently.

  Richard Scheffenegger: Is the impact focussed on the high latitude /
  sun-facing side? Or are induced currents the same globally (just
  depending on the inductor loop area)?

  SAJ: When it comes to the direct impact on satellites, those facing the
  sun are at a much higher risk. Induced currents, however, are caused by
  the interaction of these magnetic particles with Earth's magnetic field.
  In the case of induced currents, it's not just the sun-facing side; the
  dark side is also vulnerable. Higher latitudes on both the sun-facing
  side and the dark side are equally vulnerable.

  CSP: Is there anything we should be doing differently when we design
  protocols that would help with resilience to this type of event?

  SAJ: Need to consider resilience at every layer of the stack. With DNS
  for example, root servers are well distributed, but we don't know how the
  entire hierarchical tree would be impacted - do we need to change
  caching, or change how DNS records are managed? That is not clear. When
  it comes to protocols, BGP, which allows only a single path, might be too
  restrictive when capacity is severely limited. That's further analysis we
  are planning to do. Within an AS, OSPF or other intra-AS routing
  protocols fare very well because they are decentralised and can use
  whatever paths are available. The inter-domain protocol needs more

  CSP: Makes sense. I guess there's a whole bunch of coordination issues
  and management issues with large scale cloud provider infrastructure
  networks and so on as well. Some interesting problems.

## Unbiased experiments in congested networks

Bruce Spang

  Jana Iyengar: Great talk - very illuminating at a minimum. This is
  something everyone should take into account when conducting these
  experiments. Shows clearly that small A/B tests aren't necessarily good
  enough. Wondering if you've examined what would happen if you made a
  client sticky for a particular A/B test. How exactly was choice made for
  serving a particular type of content? Bottleneck has to be shared between
  control and experimental groups. If bottleneck is close to the user then
  if user is in a bucket that is either control or treatment then the user
  would effectively be at the far end of your scale. Have you looked into
  this? Were your experiments sticky to users?

  BS: There are things we can do on the allocation side to avoid some of
  these issues. Experiments we ran were sticky to users. Overall point
  you're making is true - if you can guarantee that users will never share
  resources then you can avoid this bias. If you believe users are not
  sharing any bottleneck links with each other then you avoid bias. Didn't
  explore this too much because we found it hard to measure whether users
  were sharing links, and didn't have a good sense of how often that would
  be the case. If you were allocating users instead of sessions, my gut
  instinct is that this is better than just allocating sessions in terms of
  interference. You could also allocate networks or particular servers or
  try to reduce probability that treatment and control share the same link -
  will reduce bias of the experiment.
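
  The allocation choices BS describes (per session, per user, per network)
  can be illustrated with a minimal sketch, assuming a hash-based bucketing
  scheme. The function name `assign_arm`, the experiment label, and the
  sample sessions are hypothetical and do not come from the talk or paper.

```python
import hashlib


def assign_arm(unit_id: str, experiment: str, treatment_fraction: float = 0.5) -> str:
    """Deterministically map an allocation unit to 'treatment' or 'control'.

    Hashing the unit identifier makes the assignment "sticky": the same
    unit always lands in the same arm for a given experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).digest()
    # Interpret the first 4 bytes as an integer and scale it into [0, 1).
    point = int.from_bytes(digest[:4], "big") / 2**32
    return "treatment" if point < treatment_fraction else "control"


# Hypothetical sessions: (session id, user id, network id).
sessions = [
    ("s1", "alice", "net-A"),
    ("s2", "alice", "net-A"),
    ("s3", "bob", "net-A"),
    ("s4", "carol", "net-B"),
]

# Allocating by user: all of a user's sessions share one arm.
by_user = {sid: assign_arm(user, "exp-1") for sid, user, _ in sessions}

# Allocating by network: every session on a network shares one arm, so
# treatment and control never compete on that network's links.
by_network = {sid: assign_arm(net, "exp-1") for sid, _, net in sessions}
```

  Choosing a coarser unit (network rather than user, user rather than
  session) lowers the chance that treatment and control traffic share a
  bottleneck, at the cost of fewer independent units in the experiment.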

  JI: Very helpful. If you tried to use these techniques, it could tell you
  something about where users do end up sharing bottlenecks as well.

  Brian Trammell: If we use what we know about networks to design
  experiments about networks we can get a lot smarter about this.
  Switchback experiment looked like a diagram of TDMA. A/B tests are like
  CDMA. Are there ways to use this multiple access metaphor to find other
  ways to analyse this? My SRE mind wants to automate code partition, watch
  which things show up in the power spectrum. Wondering if there's a more
  fundamental way to split this up. Not expecting an answer right now!

  BS: Super interesting question, don't have an answer offhand. One thing
  social networks do is allocate a user and all of their friends to a
  particular experiment. Could be something there for networks too.

  Jonathan Morton: One of the big lessons to take away from this is how
  important it is to understand the testing methodology in detail when we
  look at a set of purported results, particularly ones that are used for
  marketing. Fine details, such as how a particular A/B test was conducted
  and how the potential bias that you identified has been mitigated, can
  have a big effect.

  BS: Definitely.

  CSP: You said there was a need for better experiment methodology. What if
  anything should we be doing when we're designing and evaluating new
  protocols to improve confidence and results? Is there any general
  guidance we should be providing, or is it read this paper and think about
  these issues?

  BS: We build good systems with what we do today. This gives us another
  tool to think about evaluating algorithms. When designing new algorithms
  in the IETF I'd think about the fact that the way we run experiments to
  evaluate these algorithms can be biased, so think about other ways to run
  them that might mitigate those effects.

  CSP: Makes sense. Thanks!

Recordings of the talks, and links to the papers, are available from