# IRTF Open Meeting

Monday, 27 March 2023, at 13:00-15:00 Japan Standard Time

Room: G401-G402

Chair: Colin Perkins
Minutes: Colin Perkins
Recording: [YouTube](https://www.youtube.com/watch?v=qHT-jlc1rTI)

## Introduction and Status Update

Speaker: Colin Perkins, IRTF Chair

[Slides](https://datatracker.ietf.org/meeting/116/materials/slides-116-irtfopen-agenda-01)

## Introducing the Usable Formal Methods Proposed Research Group

Speaker: Jonathan Hoyland

[Slides](https://datatracker.ietf.org/meeting/116/materials/slides-116-irtfopen-introducing-the-usable-formal-methods-research-group-00)

Jonathan Hoyland gave an introduction to the new Usable Formal Methods Proposed Research Group (UFMRG). This group is studying the use of mathematical techniques to assist in the specification, design, analysis, and implementation of network protocols. That is, it's asking whether we can bring mathematical rigour to our work with protocols.

Jonathan gave a very brief history of the use of formal methods for protocols. He summarised the difference between formal analysis and formal verification of protocols, with the former broadly focussing on design correctness and the latter on implementation correctness, and noted that the research group will consider both aspects of formal methods. He reviewed the way the academic formal methods research community engaged with the design of TLS 1.3 and found, and removed, several flaws in early versions of its design. This was effective, but the correctness proofs are hard to produce, hard to understand, hard to verify, and hard to adjust. For academics, formal analysis of protocols is also high risk and low reward: results that find new types of problem are publishable, but those that demonstrate correctness of a protocol are hard to publish.

The UFMRG is aiming to provide a place for experts to gather. It's aiming to accumulate a pool of knowledge and training materials around formal methods applied to protocols, and to provide feedback to designers of formal specification tools. It also hopes to provide a venue to publish proofs of correctness, possibly via the Applied Networking Research Workshop. A non-goal is to change the IETF processes. This is a research group that hopes to encourage the use of formal methods, not an attempt to force their use. Success will come if people see the work in the group and want to use formal methods in their protocol design.

Questions:

* Rod van Meter, Keio University: Proofs depend on a certain set of assumptions; is considering these in scope? Yes, there are some standard assumptions used, for example that a cryptographic algorithm is secure, that need to be verified separately.
* ???, Telefonica: Is the focus solely on security properties? No – that's Jonathan's background, but it's not the sole focus of the RG.

## Introducing the Research and Analysis of Standard-Setting Processes Proposed Research Group

Speaker: Ignacio Castro

[Slides](https://datatracker.ietf.org/meeting/116/materials/slides-116-irtfopen-introducing-the-research-and-analysis-of-standard-setting-processes-proposed-research-group-00)

Ignacio Castro introduced the new Research and Analysis of Standard-Setting Processes Proposed Research Group (RASPRG). This group is about understanding how the standards process works in the IETF, with the goal of producing data and understanding to inform the IETF and the broader research community.
For example, the group might analyse the email archives to help understand the consensus formation process, it might develop tools to make recommendations for draft reviewers, or it might study demographic shifts in the community that develops standards. The group will take a collaborative approach between practitioners, developers, policy makers, and researchers, and will provide a repository of data and tooling. It's not aiming to make hierarchical comparisons between different standards-developing organisations or to change operational practice in the IETF.

## Autonomous NIC Offloads

Speaker: Boris Pismenny

[Paper](http://www.cs.tau.ac.il/~mad/publications/asplos2021-offload.pdf)
[Slides](https://datatracker.ietf.org/meeting/116/materials/slides-116-irtfopen-autonomous-nic-offloads-00)

Discussion:

* Colin Perkins: The approach assumes certain properties of a protocol to be offloaded: it must be size preserving, incrementally computable, have a constant-size state, and so on. These obviously fit some protocols, but not all. To what extent are these fundamental constraints vs. limitations of the current work that might be removed in future?
* Boris: They're almost all fundamental. The only one that could be softened is the one around block sizes for AES-CBC, but since AES-CBC is deprecated that's not a priority.
* Dave Oran: Do the trade-offs change if you have a user-mode TCP stack? No.
* Colin Perkins: Are there any guidance or issues that the IETF should consider when designing protocols, to make this type of approach to offloading work better?
* Boris: This approach also works for TLS 1.3, but one of the issues there was that it's not possible to know if a record is application data or something else until the entire record has been decrypted. This introduces complexity in dealing with application data vs. handshake packets that could be avoided if different types of data could be distinguished after partial decryption.
* Rod van Meter: To what extent is the value of this work dependent on the particular state of the technology? Will the same ideas apply in future as hardware advances, or do they depend on particulars of current systems? If you could change one parameter in the system, e.g., double the bandwidth or add more cores, what would improve performance?
* Boris: We're in the early days of this technology, so it's hard to predict. Making the most out of this requires a lot of software changes. The trend is that NICs are getting faster but CPUs are not, which is pushing for offloading. As long as we're using TCP, he expects the technology to remain relevant.

## AI/ML for Network Security: The Emperor has no Clothes

Speaker: Arthur Selle Jacobs

[Paper](http://www.inf.ufrgs.br/~granville/wp-content/papercite-data/pdf/ccsjacobs2022.pdf)
[Slides](https://datatracker.ietf.org/meeting/116/materials/slides-116-irtfopen-aiml-for-network-security-the-emperor-has-no-clothes-00)

Discussion:

* Stephen Farrell: This depends on having access to the training data. Can it be extended to not need such access? Yes – you don't need the exact training data, although the insights are better if you have the data and can develop explanations.
* Stephen Farrell: Are there any recommendations that can be made to validate models? This is hard to automate unless you're familiar with the domain, but the paper does have some guidelines.
* Doug Montgomery: Is the resulting decision tree model unique or optimal in some way? It's not unique and probably not optimal.
It optimises for fidelity, and there could be multiple decision trees that have the same fidelity (a minimal sketch of this fidelity measure follows the discussion). The expressivity of the decision trees is lower than that of a neural network, and different decision tree explanations for the same neural network might all be true and reveal different aspects of the system.
* Doug Montgomery: Are you retraining on the data? No, they're evaluating on the original data to see how the model will perform. The techniques treat the models as black boxes, so there's no need to retrain the models on the synthetic data.
* Brian Trammell: The analysis is based on fairly deep domain analysis. Is it accurate to characterise the work as automation-assisted analysis of decision trees? Yes.
* Brian Trammell: Can you generate or extract synthetic network data using the model to automate the analysis? Yes, there was a HotNets 2021 paper doing something similar.
* Priyanka Sinha: The start of the paper is about explaining the models, but the later parts comment on the datasets. She's familiar with the "data excellence" work and with datasheets for datasets, and wonders if this has relevance? Most of the problems have arisen from bad data, but choosing which features to use from the data is an important part of the model; it's an important topic. Also, have you considered an information gain measure? No, but that's a good point.
* Diego Lopez: We have been trying to set up mechanisms for publishing usable datasets for training ML models. Would it be advisable for us to try to make public, high-quality data available to train the models? Yes, there's a lot of private data in networking and less sharing than in other domains. Making more data available would be valuable. Privacy issues are a concern, and they were not able to release all their data as a result.
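To illustrate the fidelity measure referred to in the discussion, the sketch below fits a surrogate decision tree to a black-box classifier's predictions and reports how often the two agree on held-out data. This is a minimal illustration of the general idea, not the paper's actual pipeline: the dataset is synthetic and the random forest is a stand-in for whatever black-box model is being explained.

```python
# Minimal sketch (not the paper's tooling): explain a black-box model by
# training a decision tree to mimic its predictions, then measure fidelity.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier  # stand-in black box
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for flow features with attack/benign labels.
X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The "opaque" model under study; only its predictions are used below.
black_box = RandomForestClassifier(n_estimators=100, random_state=0)
black_box.fit(X_train, y_train)

# Train the surrogate on the black box's outputs, not the true labels,
# so the tree explains the model rather than the underlying task.
surrogate = DecisionTreeClassifier(max_depth=4, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: agreement between surrogate and black box on held-out data.
fidelity = np.mean(surrogate.predict(X_test) == black_box.predict(X_test))
print(f"fidelity: {fidelity:.3f}")
print(export_text(surrogate))
```

As the discussion notes, several different trees can reach the same fidelity while highlighting different aspects of the model, so a high fidelity score alone does not make an explanation unique or optimal.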