rasprg Meeting Notes

(note taker: Rod Van Meter and hopefully others)

Agenda

9:30 - 9:40: Agenda bashing

9:40 - 9:50: Chairs' welcome, opening and panellists' introductions

9:50 - 10:10 : Panellist Statements on the topic: "How does
standardisation look in an LLM-enabled future?" (7 mins each)

10:10 - 11:00 : Discussion: led by the chairs, and for the panellists to
discuss with each other.

11:00 - 11:30 : Q&A

Panel Session/Open Conversation

First speaker: Jie Bian

Mining Mailing Lists for Internet Protocol Decisions
https://datatracker.ietf.org/meeting/124/materials/slides-124-rasprg-01-jie-bian-00

Jaime Jiménez

"Do not delegate understanding"
https://datatracker.ietf.org/meeting/124/materials/slides-124-rasprg-02-jaime-jimenez-01

AI is a tool to delegate tasks, not understanding

AI Agents can help today with standardization preparation (insight
generation, back office work) and specification writing and review.

(Eye chart slide showing three panels with Claude's analysis of an IETF
document.)

Challenges: compound errors require human in the loop (HITL) at every
step; infrastructure and practices need to be created and refined;
resistance to adoption, so trust must be built.

Joseph Potvin

Evolution of Standards in an ~AI World
https://datatracker.ietf.org/meeting/124/materials/slides-124-rasprg-03-jpotvin-rasprg-00

AI: apparent intelligence.

Wouldn't it be nice if we had a way to transform 200,000+ rules encoded
in >5,000 active RFCs into a searchable database?

Used the venice.ai LLM to analyze a BGP-4 dissemination rule.

More details in the slides.

See also https://xalgorithms.org/
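Joseph's idea of turning the normative rules in active RFCs into a searchable database can be sketched crudely without any LLM at all. The following is a hypothetical illustration (the keyword regex, schema, and sentence splitting are my own assumptions, not the xalgorithms.org method): it scrapes RFC 2119 requirement keywords out of plain text into a small SQLite table.

```python
import re
import sqlite3

# RFC 2119 requirement keywords, longest alternatives first so that
# "MUST NOT" matches before "MUST".
KEYWORDS = re.compile(
    r"\b(MUST NOT|MUST|SHALL NOT|SHALL|SHOULD NOT|SHOULD|MAY)\b")

def extract_rules(rfc_number, text):
    """Yield (rfc, keyword, sentence) for each normative sentence."""
    # Crude sentence split on '.' or ';'; real RFC text needs smarter handling.
    for sentence in re.split(r"(?<=[.;])\s+", text.replace("\n", " ")):
        m = KEYWORDS.search(sentence)
        if m:
            yield (rfc_number, m.group(1), sentence.strip())

def build_index(db_path, rfcs):
    """Index {rfc_number: full_text} into a searchable SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS rules "
                 "(rfc INTEGER, keyword TEXT, sentence TEXT)")
    for rfc, text in rfcs.items():
        conn.executemany("INSERT INTO rules VALUES (?, ?, ?)",
                         extract_rules(rfc, text))
    conn.commit()
    return conn

if __name__ == "__main__":
    sample = ("A BGP speaker MUST connect to the peer over TCP port 179. "
              "Implementations SHOULD log connection failures.")
    conn = build_index(":memory:", {4271: sample})
    for kw, s in conn.execute(
            "SELECT keyword, sentence FROM rules WHERE sentence LIKE '%TCP%'"):
        print(kw, "->", s)
```

A real pipeline would need far better sentence segmentation and full-text search, but even this crude pass makes "find every MUST involving TCP" a one-line query.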

Jean F. Quéralt

Prompt-Run Protocols

"Traditional" is "prompting" a compiler.

"Prompts" are very broad ideas, including things like the strategies
that humans follow.

Going to Middle Earth via Arthur C. Clarke. Yes, a magic spell is a
prompt.

A new role for the IETF? "The Royal Academy of the Technical Language"

Question: are we at the end of protocols?

Suggestions for IETF: explore an RG on this topic, explore how to
structure future RFCs to be prompt-friendly.

"I'd love to change the world, but they won't give me the source code."

(End of introductory remarks from panelists.)

Oral discussion among participants

Q from chair: What are the barriers to adoption?

Jaime: already adopted in commercial organizations. Automation is the
driver; once they try this they don't want to go back to old practices.

There are no established practices, and there is no determinism in the
way it's done today.

Far future, I don't know.

Joseph: Agreed, don't delegate understanding. I use LLMs a lot, but I
call my approach Artificial Naiveté.

We don't assume that any particular programmer is going to be aware of
every rule from IETF, OASIS, etc.

Ignacio: I'm wondering whether people are adopting this faster.

Jean: I don't see how it's going to move to second phase.

15 years ago, we didn't know about this, it's hard to see what's coming
down the road.

We need to make sure we are maintaining interoperability.

Joseph: Finding the "why" of any protocol decision would be helpful. I
find that LLMs are constantly running interference, throwing in things
from their training, throwing in extraneous ideas.

When I used the simpler LLM, Venice, it didn't insert a bunch of stuff I
didn't want, but the others all did.

Jaime: My experience with hallucinations is that they become entrenched.

Andrew Campling: AI sceptic. It shouldn't be called "AI"; it's not
intelligent by any reasonable definition. On using these as efficiency
tools: question the impact of that. Example of training an LLM vs. an
engineer: the issue is, you don't have an engineer. In 5 years, you
still just have a model. There's a danger of hollowing out the skills
that are available; where is the innovation going to come from?

Jean: keyword is "satisfactory". Stop using when good enough. I don't
see why it couldn't come in the future.

Jaime: We're on the top of the hype curve. Has to get to practical
reality.

Question from the chat - Colin Perkins
00:21
A question was "How to structure future RFCs to be LLM friendly?" Do the
panelists have a feel for the extent to which formal methods help,
because they can be automatically checked? That's exactly what we're
doing with our tool. Prompts to humans can be auto-generated, giving the
human a short summary of what it is; the metadata is then a correct link
to the RFC, and they find a 20-page standard that they didn't realize is
directly applicable to what they're working on.

Joseph: LLMs should enable humans to be more effective. If it's not
doing that, it's not interesting. How to structure future RFCs to be
LLM friendly?

Rodney Van Meter: I like the idea of "don't delegate understanding". A
lot of this re-packaging of RFCs seems like a really good idea. TCP, for
example, has evolved over decades - I often find that Wikipedia is a
better source for what I need. On using this: the harder the questions
are, the more wrong the LLMs are. That's a fundamental problem.

Jaime: chairing the CORE working group; I've been tinkering with LLMs on
drafts for this WG. It's pretty hard to spot errors. With agents you get
compound errors.

Jean: Don't lose sight of the fact that LLMs are software. Software has
gotten better and better over the years. There are serious conversations
about Netflix turning into a prompt-based service. I don't want it, but
maybe the new generation does? The question is, what is the
counter-argument for it not happening? About protocols: why would a
company spend so much money on ITU processes when node A and node B
talking to each other in a certain way may just become a prompt? How
should we prepare for this?

Colin Perkins: getting back to "how to structure RFCs to be
LLM-friendly". Formal methods may help. In programming languages a
compiler can check the result. More structured, formalized ways of
writing would help the LLMs with modeling the protocols.

Jaime: ABNF - we have plenty of ABNF in RFCs.

Joseph: maybe there is already a pattern language for RFCs that might
help to structure the way RFCs are expressed. It would help.

Jean: that's what I meant with "royal academy of technical language". We
have things like MUST, SHOULD etc because it solved a problem. We have
YANG. Maybe IETF should determine which models are working better,
giving a rank or something. It's something the IETF has to look into.

Jie: You can use a current LLM like ChatGPT to function as a reviewer
when writing something. We think what we write is clear, but maybe it's
not; currently you can use it as a reviewer. Companies like Triple AI do
this for paper reviews.

Sue: Having worked with YANG and done several complex chain models...
ChatGPT... automation for configuring and managing networks has had some
spectacular failures. See the news, look at Amazon; that was based on
fairly well-established automation. The problem is that people didn't
expect what was going to happen. ChatGPT had people concerned about
security, and about where data goes. Both are real issues - there are
screw-ups by accident and there are screw-ups by bad intent. LLMs have
tremendous power but also the potential to hit new crisis points. The
template approach, which I'm a hundred percent behind: a template to
write RFCs. Would love to hear some realism behind the tools.

Joseph: realism - you mean the LLMs in particular? E.g. the particular
method that you described: does that help address the issue, since you
mentioned using templates?

Sue: [lost much] important to think about crisis points. Research is
useful in this area.

Joseph: beyond research, there's running code. I did 10 years of
research on this; what I presented here was a very particular use case.
I was particularly focusing on how one could auto-generate prompts and
incorporate standards conformance on the fly.

Jaime: for developers that use Claude Code etc the bar is very low. IETF
could help in some of the infrastructure, maybe the IETF backend. There
is a process called fine tuning - you write questions and answers and
verify what the LLM has been answering. Make the LLM more accurate, we
could do that. Fine tuning is not expensive.
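Jaime's fine-tuning suggestion amounts to curating question/answer pairs and feeding them to a training run. A minimal sketch of the data-preparation step only, assuming the common chat-style "messages" JSONL layout (the exact schema varies by provider, and the sample Q&A pairs here are illustrative):

```python
import json

def to_finetune_jsonl(pairs, path):
    """Write (question, answer) pairs as chat-format JSONL training records."""
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in pairs:
            record = {"messages": [
                {"role": "system",
                 "content": "You answer questions about IETF RFCs accurately."},
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]}
            f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    pairs = [
        ("Which RFC specifies BGP-4?", "RFC 4271 specifies BGP-4."),
        ("What port does BGP use?", "BGP speakers connect over TCP port 179."),
    ]
    to_finetune_jsonl(pairs, "ietf_qa.jsonl")
```

As Jie notes later in the session, pairs like these are more useful if the assistant turns also spell out the reasoning, not just the final answer.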

Sue: would like to talk about this offline - RFCs that I'm nurturing on
BGP that do well for an LLM, because I'm trying to get people to follow
a template. I have a set of 20+ RFCs that use this template, so if
you're interested in LLMs I'd welcome that experimentation.

Chair: Interesting that two presenters picked BGP as an example. Talking
about RFC 4271, which is now being revised; in the next year or so the
draft will be revised. I'm not trying to put more work on you, but it
might be an opportunity to try this in the real world.

Sue: That BGP issue is for the first generation. BGP-LS and SR are where
I have up-and-coming RFCs that could use some of this approach; I'm
willing to give it a shot. We'd get better RFCs quicker.

Joseph: very happy to collaborate on that. I frequently use LUMO. I gave
it the DWD array and gave it the coordinate. Told it where I'm getting
it from and activated web access. Told it to search around a certain
RFC; what it generated seemed to be a reasonably confident summary. If
we focus on the template design and prompt engineering to get reasonably
good answers from any of the LLMs, we can get to the point where the
LLMs do the boring groundwork and we can take a rule and have a human
check it.

Jean: I think of "human as a subject". How do we solve the problem of
possible bad outcomes. Someone is going to suffer, but I don't see how
we could change the way technology has always been evolving. We have
never created anything that didn't help humans. The goal is always to
help humans. We're trying to replicate humans.

Jari Arkko: I'm in the same boat as Jaime; we have worked together, so
no surprise. About the discussion: we don't have all the information in
standards - there are things we have to know; there's a question of
general fit for problems. If you get a feature, where does it take us?
Faster programming = fewer programmers? Or maybe you then need to do
more, faster? Do we give up on standards so everybody can do their own
thing per context? What is it?

Jaime: a lot of this has nothing to do with AI, cycles are getting
faster. Question of quality, good quality RFCs from IETF are appreciated
in the long run. AI lets us do things faster with a certain quality
level. Things could change in the way we work. Could see small group
doing an implementation, whole standardization process being generated
from the code.

Jean: To Jari: fascinated by the way you used the word "determinism".
Humans are not deterministic. Asking different people the same request
may yield different output, but it may be sufficient in all cases.
Hallucinations: we keep re-inventing these terms - in the industry, we
used to call them mistakes. That's what a human does. We already have a
concept for that.

Jari: There's a distinction here: if there's a human in the loop it's
usually not a problem at all, as long as the human knows what they're
doing. A box where a router forwards packets - you can't have any
appreciable error rate there, or things will go very badly. It's about
the level of human involvement. We're all lazy; at some point we start
trusting things.

Jean: tolerance level / sufficiency concept: thinking about Harry Potter
- magicians need to go to school and learn specific spells. Same thing
here: engineers need to maintain the infrastructure, the abstraction
layer that makes it possible.

Joseph: regarding factors affecting the future, there will be useful
methods, failures, sophisticated abuse.

There is kind of a built-in bias in the things it is trained on.

Even when doing the technical work on our stuff, this is using tabular
(?) programming, which was very popular before computing was big, in the
60s and 70s. We were drawing upon older material very few LLMs have
access to. LLMs have difficulty being up to date on data.

Nicola Rustignoli: to Jaime - earlier you said you should not delegate
understanding to LLMs. I was confused about the optimism. In my
experience they are good at small tasks, but when it comes to
understanding, not so good. Curious about when it will be better.

Jaime: not an expert, but it's a statistical thing. You can fine-tune an
LLM and make it smaller.

Nicola: q2, do you know of any policies on the use of LLMs or discussion
of it?

Jaime: every company has policies for the code they generate. In general
in IETF information is public. LLMs have been trained with IETF data for
sure. For authorship, you can at least acknowledge it at the bottom -
thank you very much, Claude.

Alvaro: there's a difference between using an LLM to fix grammar etc.
and letting it write; in the latter case one should reference it.

Joseph: accessibility of standards to people whose first language is not
English. When doing testing of constrained grammar, we tested it on 26
very different languages. By restricting the grammar, we were able to
get the essence of rules expressed in a correct way for speakers of
different languages.

Michael: there is short term and long term. For the long term we can
worry about the inherent dangers of generating text with LLMs, but for
the short term we should focus on low-hanging fruit. Use tools to help
make RFCs shorter, clearer and more precise - no need for generative LLM
use, actually - and then we may have an easier time producing RFC text
that is also better suited for consumption by LLMs. This should be our
role now as the IETF.

James (Kunle) Olorundare: still growing. Not 100% perfection in whatever
way we are using it. I just want to throw a word of caution in here.
[note taker lost most of this comment]

Jie: about fine tuning: it's more than just giving a model input and
output. To have some trust in the LLM you should also provide your
reasoning; that will help the model learn how to reason. Ask it to give
not only the answer but also the writing process, so it's not like a
fantasy or hallucination.

Jaime: I am more familiar with chain of thought and research that is
letter-based, but there must be better ones coming.

Ignacio: Maybe this (with background noise) is a good time to wrap up
the session. Thanks to all of the speakers.

Additional comments/questions from chat

Susan Hares
00:17
What are the potential crisis points? Automation of configuring and
managing networks has had spectacular failures.

Susan Hares
00:21
Is there research on security for LLMs? Is there research that considers
how to protect against attacks or "screw-ups" by the algorithms?

Andrew Campling
00:34
For a different illustration of the current models - "It Takes Only 250
Documents to Poison Any AI Model" -
https://www.darkreading.com/application-security/only-250-documents-poison-any-ai-model

*limitations of the current models

Rodney Van Meter
00:59
How many pages/words/bytes in the RFCs and on the mailing lists over the
last 4-5 decades? I'm sure it's a lot of text for a human to grasp, but
should be modest compared to the size of e.g. Wikipedia or other major
corpuses.

Colin Perkins
01:02
@Rod the RFC archive is about 67 million words

Rodney Van Meter
01:03
Thanks, an interesting number. (My own rsync copy is far out of date and
offline in Dropbox, so I couldn't do a quick check.)
Rodney Van Meter
01:04
(For comparison,
https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia says English
Wikipedia is 5 billion words in 7 million articles.)
Colin Perkins
01:05
The mail archive is 37GB. Doing a word count on that is not trivial.
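(A rough count is still feasible by streaming the files; a minimal sketch, assuming a local mirror of the archive as plain-text files, and counting whitespace-separated tokens, so headers and quoted text inflate the total:)

```python
from pathlib import Path

def count_words(root, pattern="**/*"):
    """Rough streaming word count over a directory of text files.

    Reads line by line so memory stays flat even for tens of GB;
    counts whitespace-separated tokens, which overcounts real prose
    in mail archives (headers, signatures, quoted replies).
    """
    total = 0
    for path in Path(root).glob(pattern):
        if path.is_file():
            with path.open(errors="replace") as f:
                for line in f:
                    total += len(line.split())
    return total
```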

Susan Hares
01:00
I think we have created a knowledge base on how to better design
automation. Some of the things to look at are crisis points, load
points, whether complexity makes it easy to do "oops", and how to test
tools.

Susan Hares
01:01
Is the speaker ignoring this past knowledge, or is his point different
than this point?

Colin Perkins
01:11
It [use of AI in writing RFCs] is in the IRTF Code of Conduct –
correct

Rodney Van Meter
01:12
RFC 9775:
Generative artificial intelligence (AI) tools and systems must not be
listed as authors of IRTF documents, presentations, or other
materials. The use of generative AI to create text or other content
is permitted but must be disclosed if significant amounts of such
content are included, for example through an acknowledgement
describing which AI system was used and how it contributed. The use
of AI to perform spelling or grammar checks and corrections, to
translate between languages, or to otherwise improve the presentation
of content need not be disclosed.

Ignacio Castro
01:13
Nick Sullivan presented RFCGPT here, an early attempt at loading an LLM
with RFCs for Q&A
https://datatracker.ietf.org/meeting/119/materials/slides-119-rasprg-llms-and-rfcgpt-leveraging-large-language-model-platforms-to-understand-standards-01

Susan Hares
01:15
Joseph's point on non-native English speakers - is important.
Colin Perkins
01:15
+1 to Michael. There's a lot of low-hanging fruit here.