Minutes IETF122: rtgwg: Thu 02:30
minutes-122-rtgwg-202503200230-00

Meeting Minutes Routing Area Working Group (rtgwg) WG
Date and time 2025-03-20 02:30
Title Minutes IETF122: rtgwg: Thu 02:30
State Active
Last updated 2025-03-30

IETF 122 RTGWG Minutes

Chairs:
Jeff Tantsura (jefftant.ietf@gmail.com)
Yingzhen Qu (yingzhen.ietf@gmail.com)

WG Page: https://datatracker.ietf.org/group/rtgwg/about/
Materials: https://datatracker.ietf.org/meeting/122/session/rtgwg

9:30-11:30 - Thursday Session I, March 20th, 2025

  1. 9:30
    Meeting Administrivia and WG Update
    RTGWG Charter Update
    Chairs (10 mins)

  2. 9:40
    Dynamic Networks to Hybrid Cloud DCs: Problems and Mitigation
    Practices

    https://datatracker.ietf.org/doc/draft-ietf-rtgwg-net2cloud-problem-statement/

    Linda Dunbar (10 mins)

  • Linda: Hoping for another last call, looking for people to review
    it.
  • Yingzhen: Do you think we need another round of directorate review?
    I can request that.
  3. 9:50
    Multi-segment SD-WAN via Cloud DCs
    https://datatracker.ietf.org/doc/draft-ietf-rtgwg-multisegment-sdwan/

    Linda Dunbar (10 mins)

  • Linda: This draft is very stable. Ask for WG last call.
  • Altanai: How is this approach different from service chaining?
  • Linda: There are similarities. Service chaining chains functions
    together, whereas this draft involves two different entities. Say an
    enterprise has traffic between A and B. Conceptually, you're
    chaining together gateway one and egress gateway 10. However, this
    draft is really about how the gateway avoids decrypting the traffic
    coming from A and only authenticates it, then uses the information
    encoded in the Geneve header to forward to the appropriate egress
    gateway. There's similarity, but the purpose is different.
  • Yingzhen: We will wait for more people to review it, and we can
    start some directorate review at the same time and get it ready for
    the last call.
  4. 10:00
    YANG Data Model for IPv6 Neighbor Discovery
    https://datatracker.ietf.org/doc/draft-ietf-rtgwg-ipv6-address-resolution-yang/

    Fan Zhang (10 mins)

  • Acee: Router discovery is already covered somewhere else. It would
    be good to do an analysis of what is missing to cover everything in
    the Neighbor Discovery RFCs, RFC 4861 and RFC 4862.
  • Fan: We're just trying to cover the rest of it.
  • Acee: Keep it with just what's in there and not all of it, at least
    there should be a reason why you're not putting the rest in. And
    somebody else should be able to augment and add the rest of it. Or
    you add it before Working Group Last Call.
  • Fan: The Last Call is for the ARP YANG model.
  • Acee: I know that.
  • David Lamparter: Did you test the SEND pieces in some actual
    deployment to see if there were issues with the model?
  • David (from the chat): Really feels like the SEND bits should be
    tested, in part because SEND is rarely done and thus few people are
    easily able to spot issues with the model.
  • Fan: We haven't.
  • Yingzhen: I don't think they have tested it. So far all tests are
    done through some YANG tools.
  • Lou Berger: I'm confused why it's called address resolution when
    it's v6 neighbor discovery.
  • Fan: The original draft title was "YANG Data Model for IPv6 Address
    Resolution". We first limited the scope to the IPv6 counterpart of
    ARP. As we added more nodes, it was no longer only about address
    resolution, so we changed the title to neighbor discovery.
  • Lou: But you didn't change the model.
  • Fan: The model was changed from IETF ND. The original model name was
    IETF ND. During the WG adoption poll, Acee suggested that it's only
    about address resolution and should be named IETF IPv6 address
    resolution, so we changed the model name. It's a little bit
    conflicting; suggestions are welcome.
  • Lou: If the direction is to allow for, and then add, all the other
    elements that you would need to manage neighbor discovery, I think
    you should call the module v6 neighbor discovery.
  • Yingzhen: Something for you to consider. I guess maybe we can review
    the original specification to see whether more content needs to be
    added. And later figure out what's really the right name.
  • Erik Kline: Not to disagree with Lou, but from my quick skim this
    was kind of looking like Linux proxy's netconf interface config for
    just the address resolution parts. A bunch of other ND stuff seems
    to be tracked elsewhere, and ND also has node information query
    stuff and a whole lot of other things. I don't know what the group
    wants to do, centralize it or split it up. But there are a lot of
    extensions even now for things that fall under the name ND but that
    are not actually neighbor discovery but some other ICMP-based
    protocol.
  • Lou: I was keying off the comment from Acee that they're gonna add
    in everything for neighbor discovery and if that's the intent, it
    would make sense to change the name. You gotta choose whether you're
    doing neighbor discovery or address resolution. Pick and be
    consistent.
  • Erik: I was re-thinking because they had already said RAs and other
    things were somewhere else. They are coming back? The RA stuff is
    not coming back? I don't know.
  • Fan: It's already been defined in another YANG model, I think.
  • Acee: I'm not the co-author of the original that just did the RA,
    but I am a co-author of the bis version of that draft. So I know
    where it is. In a separate module.
  • Erik: I agree it's just address resolution.
  • Acee: Actually, they've added more, so there are some counters for
    some other things in there. If the intent is to add everything in
    there, it's fine the way it is. But it would be good to get as much
    of the neighbor discovery protocol in as possible, because I don't
    think neighbor discovery is so insurmountable a protocol that you
    can't put it all in one model, other than router advertisement,
    which is already in an RFC. We're kind of circling here.
  • Yingzhen: Let's continue the discussion on the mailing list.
  5. 10:10
    Fast failure detection in VRRP with Point to Point BFD
    https://datatracker.ietf.org/doc/draft-ietf-rtgwg-vrrp-bfd-p2p/
    Aditya Dogra
  • Jeff Haas: BFD chair hat on. Did the authors consider using seamless
    BFD rather than 5880 standard BFD?
  • Aditya: This draft can encompass all variants of BFD. We are not
    really focused on point-to-point BFD, multipoint BFD, or SBFD
    specifically; you can use any of them.
  • Jeff Haas: So the motivation for doing seamless BFD: if you're doing
    standard RFC 5880 single-hop BFD for this, you have to have a
    running state machine, and that looks like what is partially
    motivating the procedure in the draft to have this no magic session.
    Whereas if you're able to do seamless BFD, what you're actually
    doing is addressing the device that happens to have the role, which
    might be under a current VRID rather than a specific IP address. So
    this might get you out of needing quite as much magic VRRP procedure
    to decide whether you need to fail over or not.
  • Aditya: We did look into seamless BFD recently, because another
    draft came out for seamless BFD. What we understand from that is
    that we can push some of the updates from the VRRP state machine
    toward seamless BFD. That is definitely something we can take a
    look at. But as I mentioned, whatever the peer learning model is,
    it is very much specific to VRRP. We can use seamless BFD the way
    OSPF or IS-IS have used it to carry those discriminators, which will
    help it scale. Comments definitely taken. Seamless BFD can be
    considered for the scalability aspects as well as to clean up some
    of the state machine.
  • Jeff Haas: Rather than worrying about the scalability, at that
    point you're concerned with having less state in VRRP to enable BFD
    and just focusing on the BFD procedures. You care about the role of
    being able to fail over. You don't care about having a
    point-to-point session with the backup.
  • Aditya: Sure.
  • Acee: In this draft, basically the backup advertisements are sent at
    the normal rate, and they're just there to let all the participating
    VRRP routers know who to set up BFD sessions with, correct?
  • Aditya: Right. But we did consider that we don't want to send the
    BFD updates at the same frequency as the active.
  • Acee: I understand that. It's more a provisioning of the BFD
    sessions so you can have this full mesh of BFD sessions among the
    routers that are participating in the VRRP, right?
  • Aditya: That is true.
  • Acee: It seems that if you could avoid adding that by using seamless
    BFD, as Jeff was saying, that would probably be simpler, though I
    can't fully envision it right now. It sounds like something to look
    at.
  • Jeff T: We started this work kind of before seamless BFD came into
    picture, but all valid points.
  • Vengada Prasad Govindan: On slide 6, you mention 3 * Advert
    interval. I think it should be a multiple of the Advert interval; 3
    is the default, but it can be higher or lower.
  • Aditya: That is true. The more you reduce the advertisement
    interval, or the more granular you make it, the higher the load it
    puts on the CPU, and that is one of the reasons why we want to do it
    this way.
  • Vengada: That's fine. But my point is that it does not have to be
    three. Second point, just a thought with no particular preference: I
    think BFD is a slightly better fit because you have the neighbor
    signaling capability. So even if you have asymmetric BFD timers,
    where one side is fast and the other is slow, BFD could be a better
    fit because the faster side detecting the failure can just signal
    the neighbor down and therefore essentially bring the session back.
  • Aditya: Yes, in that situation, point-to-point BFD will be a much
    better solution than seamless BFD.
  • Yingzhen: Right now, we have BFD for point to point and point to
    multipoint, both in RTGWG and now Jeff brought up the seamless BFD.
    So I think the chairs will work with the authors, we need some
    coordination work to make sure everything goes right.
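
The timer discussion above (3 * Advert interval versus BFD detection) can be made concrete with a small sketch. It assumes the Master_Down_Interval and Skew_Time formulas from RFC 5798 (VRRPv3, centisecond units) and the asynchronous-mode detection time from RFC 5880. The function names and the `multiplier` parameter are illustrative assumptions and are not taken from the draft; RFC 5798 itself fixes the multiplier at 3.

```python
def vrrp_master_down_interval_cs(adver_interval_cs, priority, multiplier=3):
    """Master_Down_Interval in centiseconds, following the RFC 5798 formula.

    RFC 5798 fixes the multiplier at 3; it is a parameter here only to
    illustrate the point raised in the discussion (an assumption).
    """
    skew_time_cs = ((256 - priority) * adver_interval_cs) / 256
    return multiplier * adver_interval_cs + skew_time_cs


def bfd_detection_time_ms(remote_detect_mult, local_required_min_rx_ms,
                          remote_desired_min_tx_ms):
    """Asynchronous-mode BFD detection time in milliseconds (RFC 5880)."""
    # The remote system transmits at the slower of the two advertised rates.
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


if __name__ == "__main__":
    # 1 s advertisements (100 cs) and backup priority 100: 300 + 60.9375 cs
    print(vrrp_master_down_interval_cs(100, 100) * 10, "ms")  # 3609.375 ms
    # 50 ms intervals on both sides with a detect multiplier of 3
    print(bfd_detection_time_ms(3, 50, 50), "ms")             # 150 ms
```

With these example values, VRRP alone would take roughly 3.6 seconds to declare the active router down, while a BFD session would detect the failure in 150 ms without lowering the advertisement interval.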
  6. 10:20
    SR Policy Programming RPC
    https://datatracker.ietf.org/doc/draft-ali-spring-sr-policy-programming-rpc/

    Zafar Ali (10 mins)

  • Boris Khasanov: We need more clarification about the motivation. So
    far the serialization advantage does not sound quite convincing.
  • Zafar Ali: RPC is much simpler and easier to implement.
  • Boris Khasanov: I see the point but cannot agree with regard to PCEP
    (there are PCEP libraries in Go); for BGP I could agree.
  • Zafar Ali: Let's talk offline.
  • Lou Berger: Why can't we use YANG? It is well defined. Why is there
    a need for a new tool?
  • Zafar Ali: Fair point.
  • Jeff Tantsura: Zafar, it would be good to make a comparison table.

From the chat:

Dhruv Dhody: Is the NMDA ephemeral datastore not able to handle this?

Jeff Tantsura: +1 Dhruv
Lou Berger: Was going to ask the same question. This still sounds like
standard YANG + xxx-conf|gNMI
Joel Halpern: We started I2RS assuming we needed something else, and
after the analysis concluded that indeed YANG can do the job.
Jeffrey Haas: The yang language was fine. The idea of the data store
relationships... that took more than a little effort to address.
Dhruv Dhody: @andrew - In YANG, you have the flexibility of updating the
full SR policy or just the CP, so we don't have to worry about "the unit
of signalling"
Joel Halpern: Yep, and as Jeff pointed out, we did the work. So let's
leverage that.
Dhruv Dhody: Adding NMDA ephemeral state in comparison is needed,
currently it is not even mentioned.
Andrew Stone: @Dhruv true, but considering path updates can be
"frequent" having to update a higher level root rather than a more
specific draft reduces the overall payload processing on both the client
and server (reporting side) to understand what has changed. Or in other
words, most activity will involve updating SID list instructions. Having
to redeploy/reprocess the entire SR Policy with N CPs and N SID lists
just seems.. heavier to me
Jeffrey Haas: So, for part of that analysis how independent is the state
you're provisioning? If fully independent where commit checks aren't
needed... rpc could be done. But the minute you add relationships, you
want config. And SR is all about gluing stuff together. So, I find it
likely that eventually pushes you to really want it in ephemeral.
I'd want to spend more time staring at the draft, but rpc is fine if you
just want to poke the box and get stuff to happen. But the minute you
want stateful relationships with some flavor of possible persistence or
even nicer operational state, ephemeral config starts to be strongly
appealing
And as Andrew suggests above, the main headache even with ephemeral is
you have a "commit" operation which can be slow.
Zafar Ali: This is not related to "configuring" SR policies by a
controller, but about defining an RPC to program ephemeral state at the
router, e.g., gRPC.
Andrew, in most cases, we only need a single CP. Maybe in a remote
case there may be a need for a policy with two CPs, so the overhead of
the policy being the unit of signaling is low. The correct data
model is at the SR policy level. But we can discuss more offline.
Lou, I will connect with you after the RTGWG.
Jeff -
Re: how independent is the state you're provisioning from config?
This is completely independent. Think of how PCEP programming an SR
Policy is independent of the configuration.

  7. 10:30
    The Challenges and Requirements for Routing in Computing Cluster
    network

    https://datatracker.ietf.org/doc/draft-li-rtgwg-computing-network-routing/

    Yizhou Li (10 mins)

  • Jeff T: What is the purpose of the work? Is it informational
    reference for all of us of what people tried in the last 30 years,
    there are many attempts to do this, or new routing protocols that
    would potentially lead to the creation of a new working group to
    develop it? What are you trying to achieve?
  • Yizhou: It's informational first. We think computing cluster
    networks today, e.g., those running AI training or inference, have
    special characteristics that differ from those of 20 years ago.
    Currently we want to collect all the information and use this
    document to provide an overall view, to see whether it is a good
    time, or the right thing, to revisit whether we can come up with a
    different or better strategy for the hybrid routing. That's the
    current goal.
  • Jeff T: I think you need to take an approach similar to the one we
    took for routing in data centers. About four years ago, we wrote a
    draft that specifies the requirements for routing protocols in data
    centers. This resulted in the RIFT and LSVR working groups, both of
    which are successful. So unless you want to create a precedent, I
    think you need to start by specifying what's missing, what's not
    working, and why this work is needed. So the motivation, not just
    "these are the things we need to do".
  8. 10:40
    Routing in Satellite Networks: Challenges & Considerations
    https://datatracker.ietf.org/doc/draft-lj-rtgwg-sat-routing-consideration/

    Tianji Jiang (10 mins)

  • Jeff: What can the IETF do here?
  • Tianji: My expectation is to promote some new technologies for the
    routing part, but first to start with the problem statement, use
    cases, or requirements. There are many things that can be done here,
    such as not having a full set of routing intelligence on board the
    satellites because of their limitations, and using the predictable
    and predetermined satellite footprint to help with the RTG work.
  • Erik Kline: Can we use PCEP as sort of solution? What's missing?
  • Tianji: We want to get some soft based routing here because we won't
    get all the routing intelligence on the ground to calculate some
    path, because the satellite has predictable and predetermined
    information to help the PCE or headend do that sort of work. But
    that information has to get down to the ground part in some way,
    such as a routing exchange, to do the work.
  • Erik Kline: All these satellites already have a management plane.
    They have usually some kind of S-band TT&C channel. These things are
    already being controlled and monitored continuously from the ground
    and I'm not saying that a network operator would want to reuse that
    to relay PCE messages, but it could be done.
  • Tianji: If you have multiple satellites above and they have to talk
    with each other, it seems there are some things they have to
    exchange. The UE will talk to multiple satellites above and those
    will give multiple paths; also, for each path, you get multiple
    satellites. This is a more complicated case.
  • Erik Kline: In the UE case, isn't it directed which one to hand off?
  • Tianji: This is something that's not going to be discussed from day
    1 for 6G. The UE can directly talk to multiple satellites, and each
    satellite goes over the ISL link to the next, and they merge at some
    point and go down to the ground station. I'm not talking about the
    radio.
  • Jeffrey Zhang: You use the 5G/6G use cases to arrive at those
    conclusions and the challenges on the routing aspects. Those are
    just generic problems to solve in satellite networks, not specific
    to 5G/6G. Once we solve the general problem of routing among the
    satellites, the 5G/6G problem will be solved from the transport
    side. 5G/6G itself may need to do some work on its side. But on the
    IETF side, we basically have a generic problem to solve: routing
    among highly mobile, limited-capacity nodes.
  • Tony Li: We have problems of dynamics and scale, so it's about
    routing; nothing new here. SDN is questionable if we think about
    thousands of satellites, the RIB/FIB updates there, and the UL
    bandwidth. It surely has to be a distributed solution.
  • Tianji: A very good point. If I have thousands of satellites in my
    network, we don't want to do all the LSDB flooding part. That's why
    in the routing consideration, the first point is no full set of
    routing intelligence on board the satellite. We try to simplify
    things for that part and to integrate with the TN network.
  • Yingzhen: Tony, do you want to mention something about the work
    being done in TVR?
  • Tony Li: No, this has more to do with RFC 9717 than TVR. But I agree
    that a distributed solution is preferable to a centralized one.
  9. 10:50
    IGP Color-Aware Shortcut
    https://datatracker.ietf.org/doc/draft-cheng-lsr-igp-shortcut-enhancement/

    Changwang Lin (10 mins)

  • Peter Psenak: Happy to see you removed the IGP extensions, but what
    is left? All you are describing is what your steering policy is, on
    the tunnel headend, toward the same destination. I don't see why we
    need any RFC on this. It's a local behavior; you can do whatever you
    want.
  • Changwang Lin: The IGP shortcut was first described in RFC 3906. We
    just want to update this RFC so that we can use the tag of the IGP
    prefix to look at the colored SR policy.
  • Peter Psenak: But that RFC doesn't specify how you load balance your
    traffic across multiple tunnels to the same destination.
  • Changwang Lin: This RFC does not mention that the tag of the prefix
    will look up all the TE tunnels.
  • Peter Psenak: They didn't want to solve that problem because that
    problem is headend implementation.
  • Liyan Gong: We have checked RFC 3906. There are some descriptions
    of the shortcut calculation in the SPF calculation part and the
    routing calculation part. We think maybe we have changed this
    description, and we want to hear advice from the RTGWG.
  • Yingzhen: We need more people to look at this document and see
    whether this informational thing needs to be specified.
  10. 11:00
    Artificial Intelligence (AI) for Network Operations
    https://datatracker.ietf.org/doc/draft-king-rokui-ainetops-usecases/

    Cheng Li (15 mins)

  • Jeff T: A chair comment. The NMRG in the IRTF has been working on
    these topics for at least three years. There is a whole bunch of
    drafts that they're progressing. That's exactly the place to have a
    more theoretical discussion. I don't think they're talking about
    agentic AI specifically, but the applicability of AI to network
    operations for sure. In the IETF, we have nmops, whose charter is
    management of the network. There's also some discussion there
    regarding the applicability of AI and machine learning technologies
    to network management, so that's the right place to go.
  • Tony Li: Our company does work on AI. Why can't you use existing
    management technologies?
  • Cheng Li: We don't refuse.
  • Tony Li: So what should we standardize?
  • Reza: The idea of this draft is to identify the aspects we need to
    pay attention to. Maybe we will realize that we already have
    everything needed. The idea is to check whether we have gaps or not.
    This is a very open question, but it is something we want to explore
    together.
  • Tony Li: We have a pretty good management plane; if not, please let
    us know.
  • Daniel King: Don't panic. There are about 20 use cases, and the
    existing IETF technologies may already have what is needed to cover
    all those gaps. The content related to research will be removed.
  • Michael: It is a nice idea, but have you considered a sandbox
    environment before you apply your generative AI routing policy or
    something similar? In many of the cases where I tried to create
    routing policies from AI, some of them had a wrong origination or
    something else. Have you considered using a sandbox environment
    before applying to production?
  • Reza: The idea of using sandboxes definitely makes sense. The idea
    of a digital twin will be considered in another draft; it is a very
    good idea. The whole question is: is there any interest from the
    routing group in doing this? Is this the right place to do it? This
    is mainly about triggering questions rather than giving answers.
  • Jeff: NMRG has been working on digital twins. I am not saying that
    we cannot work on that here.
  • Reza: We will present that there tomorrow.
  • Jeff: Please do split unsupervised learning from any kind of
    statistical analysis. That is pure math; we have been doing it for
    20 years, it is deployed at scale, and nothing needs to be solved.
    The existing NBI and SBI are good. All of that stuff is working. I
    don't think we need any replacement or anything better than that.
    What we really need is correlation of events, because events are
    really multidimensional. We are getting many events coming from
    different things, and they are very difficult to correlate. It's
    difficult to correlate a link flap on one side of the network with
    CPU load going up on another side of the network. So the real value
    of this work would be a definition of how to correlate events and,
    potentially, if we are going into machine learning, how to cluster
    them using standard clustering methodology.

  • Tony Hill: But the type of work you're aiming to do here is probably
    not protocol work. We can check the status of different documents;
    it is downstream work, and it looks more suitable for ITU-T or ETSI.

  • Daniel King: People found this doc quite useful as a cookbook. One
    issue is that operators want to know how AI solutions can be sort of
    source of truth for them. We need some kind of visualization. We
    need to understand how the engine has actually reached that
    conclusion, and that's something that we're looking into as well.
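
The event-correlation point raised in the discussion above can be illustrated with a minimal sketch. It is not taken from the draft or from any presenter: the `Event` type, the `correlate_by_time` function, the 5-second window, and the sample events are all hypothetical, and simple time-window grouping stands in for whatever clustering method a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since an arbitrary epoch
    source: str       # hypothetical device name
    kind: str         # hypothetical event type


def correlate_by_time(events, window_s=5.0):
    """Group events that occur within window_s of the previous event.

    This is plain time-proximity grouping, a stand-in for a real
    clustering method; it ignores topology and event semantics.
    """
    groups = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if groups and ev.timestamp - groups[-1][-1].timestamp <= window_s:
            groups[-1].append(ev)
        else:
            groups.append([ev])
    return groups


if __name__ == "__main__":
    events = [
        Event(0.0, "r1", "link_flap"),
        Event(2.0, "r7", "cpu_high"),
        Event(60.0, "r3", "link_flap"),
    ]
    groups = correlate_by_time(events)
    print([[e.kind for e in g] for g in groups])
    # [['link_flap', 'cpu_high'], ['link_flap']]
```

Here the link flap on one router and the CPU spike on another land in the same group only because they are close in time, which shows why a shared definition of correlation across more dimensions than time would be the valuable part.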

If time permits:

Scenarios and Protocol Extension Requirements of a Generalized IPv6
Tunnel

https://datatracker.ietf.org/doc/draft-li-rtgwg-gip6-protocol-ext-requirements/

Xinxin Yi (5 mins)

Poll for "Should the WG work on a general tunneling mechanism that
supports iOAM etc.?"
Yes(24) No(11) No Opinion(7)

  • Yingzhen: You can continue the discussion on the list.

SR based Loop-free implementation
https://www.ietf.org/archive/id/draft-deng-rtgwg-sr-loop-free-01.txt
Lijie Deng (10 mins)
(No time for presentation)