IETF 122 RTGWG Minutes
Chairs:
Jeff Tantsura (jefftant.ietf@gmail.com)
Yingzhen Qu (yingzhen.ietf@gmail.com)
WG Page: https://datatracker.ietf.org/group/rtgwg/about/
Materials: https://datatracker.ietf.org/meeting/122/session/rtgwg
## 9:30-11:30 - Thursday Session I, March 20th, 2025
- 9:30
Meeting Administrivia and WG Update
RTGWG Charter Update
Chairs (10 mins)
- 9:40
Dynamic Networks to Hybrid Cloud DCs: Problems and Mitigation Practices
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-net2cloud-problem-statement/
Linda Dunbar (10 mins)
- Linda: Hoping for another last call, looking for people to review
it.
- Yingzhen: Do you think we need another round of directorate review?
I can request that.
- 9:50
Multi-segment SD-WAN via Cloud DCs
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-multisegment-sdwan/
Linda Dunbar (10 mins)
- Linda: This draft is very stable. Asking for WG last call.
- Altanai: How is this approach different from service chaining?
- Linda: It has some similarity. Service chaining is chaining functions
together. However, this draft is about two different entities. You
have an enterprise between A and B. Conceptually, you're chaining
together gateway one and egress gateway 10. However, this draft is
really about how the gateway can avoid decrypting the traffic coming
from A and only authenticate it, and then, with the information
encoded in the Geneve header, forward it to the appropriate egress
gateway. There's similarity there, but the purpose is different.
- Yingzhen: We will wait for more people to review it, and we can
start some directorate review at the same time and get it ready for
the last call.
- 10:00
YANG Data Model for IPv6 Neighbor Discovery
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-ipv6-address-resolution-yang/
Fan Zhang (10 mins)
- Acee: Router discovery is already covered elsewhere. It would be good
to do an analysis of what is missing to cover everything in the
Neighbor Discovery RFCs, RFC 4861 and RFC 4862.
- Fan: We're just trying to cover the rest of it.
- Acee: If you keep it to just what's in there rather than all of it,
there should at least be a reason why you're not putting the rest in,
and somebody else should be able to augment it and add the rest. Or
you add it before Working Group Last Call.
- Fan: The Last Call is for the ARP YANG model.
- Acee: I know that.
- David Lamparter: Did you test the SEND pieces in some actual
deployment to see if there were issues with the model?
- David(from the chat): Really feels like the SEND bits should be
tested, in part because SEND is rarely done and thus few people are
easily able to spot issues with the model.
- Fan: We haven't.
- Yingzhen: I don't think they have tested it. So far all tests are
done through some YANG tools.
- Lou Berger: I'm confused why it's called address resolution when
it's v6 neighbor discovery.
- Fan: The original draft title was YANG Data Model for IPv6 Address
Resolution. We first limited the scope to just the IPv6 counterpart of
ARP. As we added more nodes, it was no longer only about address
resolution, so we changed the title to neighbor discovery.
- Lou: But you didn't change the model.
- Fan: The model name was changed from IETF ND. The original model name
was IETF ND; during the WG adoption poll, Acee suggested that it's only
about address resolution, so we should name it IETF IPv6 address
resolution, and we changed the model name. It's a little bit
conflicting; suggestions are welcome.
- Lou: If the direction is to add all the other elements that you
would need to manage neighbor discovery, I think you should call the
module IPv6 neighbor discovery.
- Yingzhen: Something for you to consider. I guess maybe we can review
the original specification to see whether more content needs to be
added. And later figure out what's really the right name.
- Erik Kline: Not to disagree with Lou, but my quick skim of this was
that it looked like the Linux proxy's netconf interface config for
just the address resolution parts. A bunch of other ND stuff seems to
be tracked elsewhere; ND also has node information query stuff and a
whole lot of other things. I don't know what the group wants to do,
centralize it or split it up. But there are a lot of extensions even
now for things that fall under the name ND but are not actually
neighbor discovery but some other ICMP-based protocol.
- Lou: I was keying off the comment from Acee that they're going to add
in everything for neighbor discovery, and if that's the intent, it
would make sense to change the name. You have to choose whether you're
doing neighbor discovery or address resolution. Pick one and be
consistent.
- Erik: I was rethinking because they had already said RAs and other
things were somewhere else. Are they coming back, or is the RA stuff
not coming back? I don't know.
- Fan: It's already been defined in another YANG model, I think.
- Acee: I'm not the co-author of the original that just did the RA,
but I am a co-author of the bis version of that draft. So I know
where it is. In a separate module.
- Erik: I agree it's just address resolution.
- Acee: Actually, they've added more, so there are some counters for
some other things in there. If the intent is to add everything in
there, it's fine the way it is. But it would be good to get as much of
the neighbor discovery protocol as possible, because I don't think
neighbor discovery is such an insurmountable protocol that you can't
put it all in one model, other than router advertisement, which is
already in an RFC. Kind of circling here.
- Yingzhen: Continue the discussion in the mailing list.
- 10:10
Fast failure detection in VRRP with Point to Point BFD
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-vrrp-bfd-p2p/
Aditya Dogra
- Jeff Haas: BFD chair hat on. Did the authors consider using seamless
BFD rather than standard RFC 5880 BFD?
- Aditya: This draft can encompass all variants of BFD; we are not
really focused on point-to-point BFD, multipoint BFD, or S-BFD. You can
use any of them.
- Jeff Haas: So the motivation for doing seamless BFD: if you're doing
standard RFC 5880 single-hop BFD for this, you have to have a running
state machine, and that looks like it's partially motivating the
procedure in the draft to have this no magic session. Whereas if
you're able to do seamless BFD, what you're actually doing is
addressing the device that happens to have the role, which might be
under the current VRID rather than a specific IP address. So this
might get you out of needing quite as much magic VRRP procedure to
decide whether you need to fail over or not.
- Aditya: We did look into seamless BFD recently because another draft
on seamless BFD came out. From that, what we understand is that we can
push some of the updates from the VRRP state machine toward seamless
BFD. That is definitely something we can take a look at. But as I
mentioned, for the peer learning model, which is very specific to
VRRP, we can use seamless BFD the way OSPF and IS-IS have used it to
carry the discriminators; that will help it scale. Comments
definitely taken. Seamless BFD can be considered for the scalability
aspects as well as to clean up some of the state machine.
- Jeff Haas: Rather than worrying about scalability, at that point
you're worrying less about keeping state in VRRP to enable BFD and
just focusing on the BFD procedures. You care about the role being
able to fail over. You don't care about having a point-to-point
session with the backup.
- Aditya: Sure.
- Acee: In this draft, basically the backup advertisements are sent at
the normal rate, and they're just to let all the VRRP routers that are
participating know with whom to set up BFD sessions, correct?
- Aditya: Right. But we did consider that we don't want to send the
BFD updates at the same frequency as the active router.
- Acee: I understand that. It's more about provisioning the BFD
sessions so you can have this full mesh of BFD sessions among the
routers that are participating in VRRP, right?
- Aditya: That is true.
- Acee: If I'm envisioning it right, it seems like you could avoid
adding that by using seamless BFD, as Jeff was saying, and that would
probably be simpler. It sounds like something to look at.
- Jeff T: We started this work kind of before seamless BFD came into
picture, but all valid points.
- Vengada Prasad Govindan: On slide 6, you mentioned 3 * Advert
interval. I think it should be a multiple of the Advert interval; 3 is
the default, but you can have it higher or lower.
- Aditya: That is true. The more you reduce the advertisement
interval, or the more granular you make it, the higher the load it
puts on the CPU, and that is one of the reasons why we want to do it
this way.
- Vengada: That's fine. But my point is it does not have to be three.
Second point, just my thought with no particular preference: I think
BFD is a slightly better fit because you have the neighbor signaling
capability. So even if you have asymmetric BFD timers, with one side
fast and the other slow, BFD could be a better fit because the faster
side detecting the failure can just signal the neighbor down and
therefore essentially bring the session back.
- Aditya: Yes, in that situation, the point to point BFD will be much
better solution than seamless BFD.
- Yingzhen: Right now, we have BFD for point to point and point to
multipoint, both in RTGWG and now Jeff brought up the seamless BFD.
So I think the chairs will work with the authors, we need some
coordination work to make sure everything goes right.
- 10:20
SR Policy Programming RPC
https://datatracker.ietf.org/doc/draft-ali-spring-sr-policy-programming-rpc/
Zafar Ali (10 mins)
- Boris Khasanov: We need more clarification about motivation, so far
serialization advantage does not sound quite convincing.
- Zafar Ali: RPC is much simpler and easier to implement.
- Boris Khasanov: I see the point but cannot agree with regard to PCEP
(there are PCEP libraries in Go); for BGP I could agree.
- Zafar Ali: Let's talk offline.
- Lou Berger: Why can't we use YANG? It is well defined; why is there
a need for a new tool?
- Zafar Ali: Fair point.
- Jeff Tantsura: Zafar, would be good to make a comparison table.
From the chat:
Dhruv Dhody: Is the NMDA ephemeral datastore not able to handle this?
Jeff Tantsura: +1 Dhruv
Lou Berger: Was going to ask the same question. This still sounds like
standard yang + xxx-conf|gNMI
Joel Halpern: We started I2RS assuming we needed something else, and
after the analysis concluded that indeed YANG can do the job.
Jeffrey Haas: The yang language was fine. The idea of the data store
relationships... that took more than a little effort to address.
Dhruv Dhody: @andrew - In Yang, you have the flexibility of updating the
full SR policy or just CP, so we don't have to worry about "the unit of
signalling"
Joel Halpern: Yep, and as Jeff pointed out, we did the work. So let's
leverage that.
Dhruv Dhody: Adding NMDA ephemeral state in comparison is needed,
currently it is not even mentioned.
Andrew Stone: @Dhruv true, but considering path updates can be
"frequent" having to update a higher level root rather than a more
specific draft reduces the overall payload processing on both the client
and server (reporting side) to understand what has changed. Or in other
words, most activity will involve updating SID list instructions. Having
to redeploy/reprocess the entire SR Policy with N CPs and N SID lists
just seems.. heavier to me
Jeffrey Haas: So, for part of that analysis how independent is the state
you're provisioning? If fully independent where commit checks aren't
needed... rpc could be done. But the minute you add relationships, you
want config. And SR is all about gluing stuff together. So, I find it
likely that eventually pushes you to really want it in ephemeral.
I'd want to spend more time staring at the draft, but rpc is fine if you
just want to poke the box and get stuff to happen. But the minute you
want stateful relationships with some flavor of possible persistence or
even nicer operational state, ephemeral config starts to be strongly
appealing
And as Andrew suggests above, the main headache even with ephemeral is
you have a "commit" operation which can be slow.
Zafar Ali: This is not related to "configure" SR policies by a
controller. But about defining an RPC to program ephemeral states at
router, e.g., gRPC.
Andrew, in most cases, we only need a single CP. Maybe in a rare
case, there may be a need for a policy with two CPs, so the overhead
of the policy being the unit of signaling is low. The correct data
model is at the SR policy level. But we can discuss more offline.
Lou, I will connect with you after the RTGWG.
Jeff -
Re: how independent is the state you're provisioning from config?
This is completely independent. Think of PCEP programming an SR Policy
is independent of the configuration.
- 10:30
The Challenges and Requirements for Routing in Computing Cluster Network
https://datatracker.ietf.org/doc/draft-li-rtgwg-computing-network-routing/
Yizhou Li (10 mins)
- Jeff T: What is the purpose of the work? Is it an informational
reference for all of us on what people have tried over the last 30
years (there have been many attempts to do this), or is it new routing
protocols that would potentially lead to the creation of a new working
group to develop them? What are you trying to achieve?
- Yizhou: It's informational first. We think today's computing
cluster networks, e.g., those running AI training or inference, have
special characteristics different from those of 20 years ago.
Currently we want to collect all the information and use this
document to provide an overall view, to see whether it is a good time
or the right thing to revisit whether we can adopt a different
strategy or come up with a better strategy to do the hybrid routing.
So that's the current goal.
- Jeff T: I think you need to take an approach similar to the one we
took for routing in data centers. About four years ago, we wrote a
draft that specifies the requirements for routing protocols in data
centers; this resulted in the RIFT and LSVR working groups, both of
which were successful. So unless you want to create a reference, I
think you need to start by specifying what's missing, what's not
working, and why this work is needed. So the motivation, not just
"these are the things we need to do."
- 10:40
Routing in Satellite Networks: Challenges & Considerations
https://datatracker.ietf.org/doc/draft-lj-rtgwg-sat-routing-consideration/
Tianji Jiang (10 mins)
- Jeff: What can the IETF do here?
- Tianji: My expectation is to promote some new technologies for the
routing part, but first start with the problem statement, use cases,
or requirements. There are many things that can be done here, like not
putting a full set of routing intelligence on board satellites because
of their limitations, and also using the predictable and predetermined
satellite footprint, which can help with the RTG work.
- Erik Kline: Can we use PCEP as a sort of solution? What's missing?
- Tianji: We want to get some soft based routing here because we won't
get all the routing intelligence on the ground to calculate paths; the
satellite has predictable and predetermined information to help the
PCE or headend do that sort of work. But that information has to get
down to the ground part somehow, for example through a routing
exchange, to do the work.
- Erik Kline: All these satellites already have a management plane.
They have usually some kind of S-band TT&C channel. These things are
already being controlled and monitored continuously from the ground
and I'm not saying that a network operator would want to reuse that
to relay PCE messages, but it could be done.
- Tianji: If you have multiple satellites above and they have to
talk with each other, it seems like there are some things they have
to exchange. The UE will talk to multiple satellites above, and those
will get multiple paths; also, for each path, you get multiple
satellites. This is a more complicated case.
- Erik Kline: In the UE case, isn't it directed which one to hand off?
- Tianji: This is something that's not going to be discussed from
day 1 for 6G. The UE can directly talk to multiple satellites; each
satellite goes over the ISL (inter-satellite link) to the next, they
merge at some point, and the traffic goes down to the ground station.
I'm not talking about the radio part.
- Jeffrey Zhang: You use the 5G/6G use cases to arrive at those
conclusions and the challenges on the routing aspects. Those are
just generic problems to solve in satellite networks, not specific to
5G/6G. Once we solve the general problem of routing among the
satellites, the 5G/6G problem will be solved from the transport side.
The 5G/6G side itself may need to do some work. But on the IETF side,
we basically have a generic problem to solve: routing among highly
mobile, limited-capacity nodes. Those are generic problems we need to
solve.
- Tony Li: We have problems with dynamics and scale, so it's about
routing; nothing new here. SDN is questionable if we think about
thousands of satellites, the RIB/FIB updates there, and UL bandwidth.
It surely needs a distributed solution.
- Tianji: A very good point. If I have thousands of satellites in my
network, we don't want to do all the LSDB flooding part. That's why
in the routing consideration, the first point is no full set of
routing intelligence on board the satellite. We try to simplify
things for that part and to integrate with the TN network.
- Yingzhen: Tony, do you want to mention something about the work
being done in TVR?
- Tony Li: No, this has got more to do with RFC9717 than TVR. But I
agree a distributed solution is more preferable than a centralized
solution.
- 10:50
IGP Color-Aware Shortcut
https://datatracker.ietf.org/doc/draft-cheng-lsr-igp-shortcut-enhancement/
Changwang Lin (10 mins)
- Peter Psenak: Happy to see you removed the IGP extensions, but
what is left? All you are describing is your steering policy on the
tunnel headend into the same destination. I don't see why we need any
RFC on this. It's local behavior; you can do whatever you want.
- Changwang Lin: The IGP shortcut is first described in RFC 3906. We
just want to update this RFC so that we can use the tag of the IGP
prefix to look up the colored SR policy.
- Peter Psenak: But that RFC doesn't specify how you load balance your
traffic across multiple tunnels to the same destination.
- Changwang Lin: This RFC does not mention using the tag of the prefix
to look up all the TE tunnels.
- Peter Psenak: They didn't want to solve that problem because that
problem is headend implementation.
- Liyan Gong: We have checked RFC 3906. There are some descriptions
of the shortcut calculation in the SPF calculation part and the
routing calculation part. We think we may need to change this
description, and we want to hear advice from RTGWG.
- Yingzhen: We need more people to look at this document and see
whether this informational thing needs to be specified.
- 11:00
Artificial Intelligence (AI) for Network Operations
https://datatracker.ietf.org/doc/draft-king-rokui-ainetops-usecases/
Cheng Li (15 mins)
- Jeff T: A chair comment. The NMRG in the IRTF has been working on
these topics for at least three years. There's a whole bunch of
drafts that they're progressing. That's exactly the place to have
more theoretical discussion. I don't think they're talking about
agentic AI specifically, but applicability of AI to network
operations for sure. In the IETF, we have NMOP, whose charter is
management of the network. There's also some discussion regarding
applicability of AI/machine learning technologies to network
management, so that's the right place to go.
- Tony Li: Our company does work on AI. Why can't you use existing
management technologies?
- Cheng Li: We don't refuse.
- Tony Li: So what should we standardize?
- Reza: The idea of this draft is to look at some aspects we need to
pay attention to. Maybe we will realize that we already have
everything needed. The idea is to check whether we have some gaps or
not. This is a very open question, but it's something we want to
explore together.
- Tony Li: We have pretty good management plane, if not please let us
know.
- Daniel King: Don't panic; there are about 20 use cases, and existing
IETF technologies may already have what is needed to cover all those
gaps. The research-related content will be removed.
- Michael: It's a nice idea, but have you considered a sandbox
environment before you apply your generative AI routing policy or
something? When I tried to create some routing policies with AI, some
of them had wrong origination or something else. Do you have some
consideration for a sandbox environment before applying to production?
- Reza: The idea of using sandboxes definitely makes sense. The idea of
a digital twin will be considered in another draft; it is a very good
idea. The whole idea is: is there any interest from the routing group
in doing this? Is this the right place to do it? This is mainly about
triggering questions rather than providing answers.
- Jeff: NMRG has been working on digital twins. I am not saying that
we cannot work on that here.
- Reza: We will present that there tomorrow.
- Jeff: Please do separate unsupervised learning from any kind of
statistical analysis; this is pure math, we've been doing it for 20
years, it's deployed at scale, and nothing needs to be solved there.
The existing NBI and SBI are good. All of that stuff is working. I
don't think we need any replacement or anything better than that,
but what we really need is correlation of events, because events are
really multidimensional. We are getting many events coming from
different things, and they are very difficult to correlate. It's
difficult to correlate a link flap on one side of the network with CPU
load going up on another side of the network. So the real value of
this work would be a definition of how to correlate events and,
potentially, if we are going into machine learning, how to cluster
them using standard clustering methodology.
- Tony Hill: But the type of work you're aiming to do here is probably
not protocol work. We can check the status of different documents;
it is downstream work, and it looks more suitable for ITU-T or ETSI.
- Daniel King: People found this doc quite useful as a cookbook. One
issue is that operators want to know how AI solutions can be a sort of
source of truth for them. We need some kind of visualization. We
need to understand how the engine has actually reached its
conclusion, and that's something that we're looking into as well.
If time permits:
Scenarios and Protocol Extension Requirements of a Generalized IPv6 Tunnel
https://datatracker.ietf.org/doc/draft-li-rtgwg-gip6-protocol-ext-requirements/
Xinxin Yi (5 mins)
Poll for "Should the WG work on a general tunneling mechanism that
supports iOAM etc.?"
Yes(24) No(11) No Opinion(7)
- Yingzhen: You can continue the discussion on the list.
SR based Loop-free implementation
https://www.ietf.org/archive/id/draft-deng-rtgwg-sr-loop-free-01.txt
Lijie Deng (10 mins)
(No time for presentation)