
Minutes IETF119: rtgwg: Mon 23:30

Meeting Minutes Routing Area Working Group (rtgwg) WG
Date and time 2024-03-18 23:30
Title Minutes IETF119: rtgwg: Mon 23:30
State Active
Last updated 2024-04-06


IETF 119 RTGWG Minutes

Jeff Tantsura (
Yingzhen Qu (

WG Page:

9:30-11:30 - Tuesday Session I, March 19, 2024
Meeting Administrivia and WG Update
Chairs (10 mins)

  • (no questions/comments)


Multi-segment SD-WAN via Cloud DCs
Linda Dunbar (10 mins)

  • (no questions/comments)

  • "Do you think the multi-segment SD-WAN as described in the draft is
    a use case that the IETF should work on?"

    • yes: 20 no: 9 no_opinion: 33 total: 84
  • "Do you support adoption of this work in RTGWG?"

    • yes: 9 no: 9 no_opinion: 25 total: 92

Will be taken to list.


Path-aware Remote Protection Framework

Yisong Liu / Changwang Lin (10 mins)

  • Linda Dunbar: not too clear on whether BGP, IGP or BFD are enabled?

    • we can also use existing protocols, BFD. How to notify this
  • Linda Dunbar: Just to confirm, you'll use BFD for local detection,
    but for propagating you'll use something else?

    • yes, more specific solution on how to notify failure, want to
  • David Lamparter: it is not clear if the existing mechanisms have any
    issues supporting what you need.

    • want to improve specific topologies(?)
  • Antoine Fressancourt: about the selection of the remote repair node.
    Is it a self-election mechanism triggered by receiving a failure
    notification? If a node tries to repair a path, does it stop the
    upstream relay of the failure notification? Can two remote nodes
    repair a path in parallel?

  • Maria Matejka: Trying to fix BGP specifically? - should make the
    existing policies & implementations better. Or is this about using
    an existing routing protocol vs. a new one?

    • will give more specific info
  • Zhaohui Zhang: this seems not to be a problem if you're running an
    IGP? There's also RIFT which should solve this problem

  • Zhenbin Li: we have a mechanism of interworking BFD sessions,
    failure notification can be sent. Can this be reused in this

    • will look into this
  • Cheng Wei Qiang (China Mobile): last meeting a draft was presented
    re. protection requirements in datacenter; that draft outlines
    protection requirements. Existing convergence times are not enough,
    trying to provide a direction for microsecond protection
    possibilities. Really need specific solutions.

  • Jeff Tantsura: (1) well tuned routing protocols can provide e.g.
    below 100ms. Unless this can be improved on, it's not helpful. Are
    you looking at data plane based notifications? (2) if you lose a
    link, you don't lose reachability, just bandwidth. Need to consider
    this in notification. (3) If this is keyed on router ID - (?) -
    previous work in this field exists. Overall, complexity must not
    outweigh the benefit.

  • Jeff Haas: there's work in idr & probably elsewhere re. next-nexthop
    nodes / what is upstream for your ECMP paths. Also global
    convergence for protocols being considered wrt. too much rapid
    flapping caused by events. Dynamic signaling layer outside
    reachability being considered.

  • Keyur Patel: More clear problem definition would be beneficial,
    especially what time scales. Interesting work going on elsewhere,
    even wrt. predicting failures.

  • Dmitry Afanasiev: (...?...) entropy labels in protocol headers
    (...?...) times are really small

  • Jeff Tantsura: years ago there were presentations about changing
    entropy inputs from TCP, ... congestion signalling. But all of this
    happening on endpoints to allow routers to choose different links.


Destination/Source Routing
Shu Yang (10 mins)

  • Jeff Tantsura: had discussion with authors, there were no changes,
    considering fast tracking this.

  • Jen Linkova: yet another use case, seems to be required to properly
    implement correct source address selection for rule 5.5. Needed for
    flash renumbering.

  • Yingzhen Qu: Implementation section would be good.

  • Nan Geng: looking for places to find summaries of previous

  • Jeff Tantsura: everything should be on the rtgwg mailing list.


A Routing Architecture for Satellite Networks
Tony Li (20 mins)

  • Dongyu Yuan: (1) surface of the satellites could be divided into
    stripes, how does it work with inclined orbits? (2) When satellites
    enter some specific area and lose connection, how is it handled?

    • yes absolutely the orbits are inclined and this needs to be

    • no need to modify IGP, everything is flaky. Things will be
      handled normally, only issue is to make sure the routing churn
      doesn't kill the IGP.

  • Weiqiang Cheng: (...?...) overhead is too high? Bandwidth for the
    satellite is very expensive. (...?...) SRv6 solution may be more

    • didn't go to SRv6 because IPv4 wanted and overhead is even
      larger than SR-MPLS. More efficient solutions are possible, but
      wanted to use existing tools.
  • Weiqiang Cheng: even with IPv4, some segment routing solution

    • yes, proprietary approaches are possible but this is trying to
      do it with existing hardware.
  • Zhenbin Li: we know SR-MPLS label stack has overhead, is it possible
    to use existing TE solutions?

    • could use RSVP, but that'd need LSP setup; paths changing every
      15s would create a lot of overhead. with SR, the gateway can do
      this; label stack changes but only need to deal when a packet
      for that destination appears and there's no signalling. And
      off-stripe only adds 2 labels, max should be 64bit overhead.
  • Tianji Jiang (China Mobile): positions are predictable, question is
    how to use this information. It's already known ahead of time what
    the routing should be.

    • disagree; information is about what could work, but not about
      what does work. There will be failures.
  • Tianji Jiang: harsh environment, hardware capabilities limited.
    Bandwidth can be very low, too low to run IGPs or BGP.

    • have conflicting information that cannot be shared (room
  • Tianji Jiang: disagree with running the full stack

    • satellites don't need to run that much, it's just one instance
      of IS-IS. It has been done on 16MHz MC68000s.
  • Keyur Patel: concepts of gateway & slices seem good. Orbits are
    always predefined and failures can be predicted - is the attempt to
    bake this into the protocol?

    • yes, planned, but doesn't need to be baked in too much. The
      gateway can handle most.
  • Keyur Patel: are there specifics metrics/... for the gateway (?)

    • gateway has all the information. Major thing we want from the
      routing protocol is link liveness. Lots of cases where
      expected link doesn't come up.
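
As a rough check of the "2 labels, max should be 64bit overhead" figure given in the answer to Zhenbin Li above, the following minimal sketch computes label stack overhead. It assumes the standard 32-bit MPLS label stack entry from RFC 3032; the function names and the 1500-byte payload are hypothetical and are not taken from the draft or the presentation.

```python
# Size of one MPLS label stack entry in bits (RFC 3032: 20-bit label,
# 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL).
LABEL_STACK_ENTRY_BITS = 32


def label_stack_overhead_bits(num_labels: int) -> int:
    """Return the per-packet overhead, in bits, of an MPLS label stack."""
    if num_labels < 0:
        raise ValueError("num_labels must be non-negative")
    return num_labels * LABEL_STACK_ENTRY_BITS


def overhead_fraction(num_labels: int, payload_bytes: int) -> float:
    """Fraction of the bytes on the wire taken up by the label stack.

    payload_bytes is a hypothetical packet size chosen for illustration,
    not a figure from the minutes.
    """
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    overhead_bytes = label_stack_overhead_bits(num_labels) // 8
    return overhead_bytes / (overhead_bytes + payload_bytes)


if __name__ == "__main__":
    # Two extra labels for off-stripe forwarding.
    print(label_stack_overhead_bits(2))  # 64
    # Share of a hypothetical 1500-byte payload packet spent on those labels.
    print(overhead_fraction(2, 1500))
```

Under these assumptions the two off-stripe labels cost 8 bytes per packet, which is why the relative overhead matters mostly for small packets.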

(presentation continues @ "Off-Stripe Return Forwarding")

  • Jie Dong: scalability important, churn needs minimizing. In this
    proposal the striping helps with that, but does it result in
    non-optimal paths?

    • gateway has all the information, sees all links and no stripe
      boundaries, can pick best path. L2 almost irrelevant, mostly L1.
      Gateway has all information for all stripes at least in its
      area. E.g. covering California, if 5 stripes cover it, gateway
      needs at least these 5 (maybe not all e.g. 30)
  • Jie Dong: what does the user station need?

    • user station only has the information it needs, will get gateway
      SID or area(?) SID. Don't need more.
  • Shukri Abdallah: algorithm for selecting areas? how to prove this?

    • have a requirement, don't have an algorithm
  • Dmitry Afanasiev: did you consider inserting intermediate (?)
    particularly for satellites stable re. earth. Addressing geographic
    areas directly?

    • trying to avoid any connection to surface; don't need and don't
      want since it would need mapping back and forth. Seen geographic
      addressing, but how does that deal with failures? #1 is trying to
      deal with the access problem.

Jeff Tantsura/chair: little input from industry, if you're in the room
please help with requirements so we can build something that works.

  • "Do you think it’s ok for RTGWG to pick up some routing work related
    with satellite networks?"

    • yes: 42 no: 4 no_opinion: 3 total: 129

Jeff Tantsura/chair: Considering interim meeting for this before IETF
120, if there is enough interest.


Extension of Application-aware Networking (APN) Framework for Application Side

Zhenbin Li/Shuping Peng (10 mins)

  • Jeff Tantsura: since this was brought for adoption; this is out of
    scope for rtgwg. IESG rejected a previous charter. rtgwg is giving
    space for occasional updates to keep this work visible.
    Providing a way for updates.

  • Daniel Huang (ZTE): application attributes - encapsulated in user
    packet? Maybe this should be managed by control plane. Overhead from
    being in user packets.


Application-aware Data Center Network (APDN) Use Cases and Requirements

Hongyi Huang

  • Zili Meng: what kind of different information needs to be carried?
    Is it similar to each other?

    • does carry different information; only looking at the broader use
      case right now


Use Cases and Requirements for Implementing Lossless Techniques in Wide Area Networks
Hongyi Huang

  • out of time for Q/C


Use Cases-Standalone Service ID in Routing Network
Daniel Huang

  • draft not presented, out of time


Copied from the Chat

Ketan Talaulikar

Nice to see the chairs in the spotlight ;-)

Yingzhen Qu

please help with collective note taking:

Andrew Alston

Adoption calls still have to go to the list don't they?

Acee Lindem

Should go to INT Area

Yingzhen Qu

@Andrew, yes, the adoption call will go to the list.

Himanshu Shah

On Path aware remote protection - I believe the author is proposing a
scheme to handle the remote failure with a path aware backup path
already programmed in the FIB.

David Lamparter

@Antoine didn't catch your question for the notes either, sorry

Himanshu Shah

As soon as the notification arrives, switch over happens. The whole goal
is to reduce the service outage instead of waiting for BGP withdraw..

Himanshu Shah

The switchover scheme is not yet proposed.

Antoine Fressancourt

@David My question is about the selection of the remote repair node. Is
it a self-election mechanism from receiving a failure notification? If
a node tries to repair a path, does it stop the upstream relay of the
failure notification? Can two remote nodes be repairing a path in
parallel?

Himanshu Shah

Sorry i meant "notification scheme" is not yet proposed.

John Scudder

Did the person at the mic really say this solution would provide
microsecond scale repair?

John Scudder

Ain’t no way.

Himanshu Shah

I agree - has to be milliseconds depending on what the notification
scheme is..

Weiqiang Cheng

Weiqiang Cheng

This draft gives some analysis on requirements of fast protection in AI

Yingzhen Qu

@meetecho, please switch the camera to the presenter

Lorenzo Miniero


Yingzhen Qu

thank you

Antoine Fressancourt

@Weiqiang thanks for the link

David Lamparter

the queue is still stuck with people from the previous draft, can we
clear that?

Weiqiang Cheng

@John, I mentioned the requirement is sub-millisecond, even us. I
don't think the solution will provide the us scale. But it is valuable
to look for the way to improve the recovery time.

Himanshu Shah

@weiqiang - the proposal reminds me of AIS/RDI type of scheme.. :-)

Adrian Farrel

Looks like a pretty picture to me

Dave Phelan

Wasn’t drawn on a napkin.

Christopher Hawker

Another pretty picture!

Jeff Tantsura

@Jeff Haas - on the fast recovery topic - you have mentioned other drafts
that are being progressed in other WGs - please share the draft names

Acee Lindem

Is it just me or doesn't the Meetecho private chat work?

Christian Hopps

is the presentation over? the questioners seem to be assuming that

Christian Hopps

can we finish the presentation first?

Shukri Abdallah

Do you specify an algorithm for selecting orbits that form a stripe?

Christopher Hawker

@Acee just tried sending you a private chat.

Lorenzo Miniero

Acee: what do you mean by doesn't work? I use it regularly when I need
to get in touch with people in rooms, e.g., to provide assistance to
remote speakers

Lorenzo Miniero

You can try contacting me privately here too, if you want to test

Adrian Farrel

@lorenzo, it is not immediately obvious how to do it from the chat
window without starting up zulip

Lorenzo Miniero

Adrian: you need to click on the name of the participant from the
participants list, and options will appear. The balloon icon will open a
private chat

Adrian Farrel

Yup, but not the participants name in the chat window :-)

Lorenzo Miniero

Ah no, that's correct: those names are not clickable

Dawei Fan

off-stripe forwarding seems to find a short path, my question is how about
the propagation in this architecture. it is the same as current

Yisong Liu

@John, we hope to provide a solution for millisecond repair. but it may
need more work on the specific solution and we'll continue to do that

Andrew Stone

Fair to say TE goals are mainly for user stations, to TE path from
gateway to satellite is what's critical and TE from satellite to gateway
is not ?

Yisong Liu

@Jeff Haas, please help to confirm do you refer to this draft:

Andrew Stone

** for user downstream traffic

Tony Przygienda

well, from experience, satellite guys do -really- like their proprietary
stuff ;-)

Adrian Farrel

Quote Jeff: The satellite work is operating in a vacuum

Christian Hopps

that's fantastic

Tom Hill

Tony, I had some queries about approach, but I'll pick it up in a coffee
break. More musing on past efforts.

Tony Przygienda

well, security through obscurity is a concept. Plus, interop is only
interesting if you look to shop vendors. Most router vendors are not
particularly good satellite builders and though building a router is not
easy, putting a satellite into orbit is of different scale ...

Tony Przygienda

I doubt culturally the term "router scientist" will ever achieve the
same level of awe when mentioned as "rocket scientist" ;-)

Christian Hopps

Why should this be in the network? I just don't get it. What servers a
client uses should not be part of the routing database.

Andrew Alston

This is also outside of charter - APN is a large topic - and for RTGWG
to take on any larger topic - it has to be explicitly chartered to do so
- as per the current charter

Dmitry Afanasiev

Also, there probably going to be only few LEO constellations operating
at any given time, very likely interconnected only via ground gateways -
so there is little incentive to bother with standardization

Tony Przygienda

@Dima, well, with the amount of space junk Elon and AMZN are shooting up
there and grid arrays I'm not sure that is true anymore ...

Changwang Lin

There are two types of notification mechanisms for:

Proactively notify. The suggestion is to only protect the two-level
network and only notify upwards once.

Flow triggered notification. Send notifications to the direction of the
flow when it is perceived that it cannot be forwarded. This method can
notify upstream. If there is no protection path upstream, subsequent
traffic will trigger notification to the higher-level device again.
Remote nodes can simultaneously repair a remote path fault

The remote path aware document needs to address several issues:

In a specific topology, convergence does not depend on the control
plane protocol.
Control plane protocols, such as BGP, extend support to add remote
path information on the next hop of the route.
Fault perception: perceived by the remote end, and then notified to
other protocols to quickly notify the remote end.
Switching process: It does not rely on the control plane and completes
fast switching on the forwarding plane, achieving microsecond level

Christian Hopps

Write a protocol for applications to choose servers don't try and wedge
this into routing.

Andrew Alston

+1 Christian

Dmitry Afanasiev

but problem is certainly interesting and it seems it can be solved with
available tools + reasonable amount of tweaking and without too big

David Lamparter

did the slides just disappear or is it just me?

Tony Przygienda

is there a preso after that? Otherwise it's 2AM+ here and bed would be
nice instead of suffering through this stuff ...

David Lamparter

(nevermind, back now)

Christian Hopps

Tony this stuff is progressing b/c lots of people dislike it so much
they are ignoring it.

David Lamparter

we are at →
Extension of Application-aware Networking (APN) Framework for
Application Side

(it's 11:10 now, so >30min over)
Application-aware Data Center Network (APDN) Use Cases and Requirements

Use Cases and Requirements for Implementing Lossless Techniques in Wide
Area Networks

Use Cases-Standalone Service ID in Routing Network

Tony Przygienda

well, RFCs are pretty clear what you do with stuff after 2 failed BOFs.
But of course some grownups have to apply the procedural framework ...

Andrew Alston

Well - its failed 2 BOF's if I recall - and the IESG wouldn't approve
their proposed charter - and now it's being shopped - but as I said -
its explicitly out of charter

Dmitry Afanasiev

@Tony - number of sats is big, no doubt, and it is growing fast, but as
for systems - it's just 2, maybe 3-4 more of comparable scale will come
up later, but that's it

Jeff Tantsura

I'm here

Tony Przygienda

@David: ok, looks like I get some snooze time back and couple hours
sleep before morning meetings get me up again ...

David Lamparter


Tony Przygienda

@Jeff, yepp, probably but I doubt you'll ever grow up (and we love you
for it ;-)

David Lamparter

@Jeff: mic queue is still locked from previous draft btw

Yingzhen Qu

@David, thanks for the reminder

Jeff Tantsura


Tony Przygienda

well, I disagree with the premise that network substrate needs to
understand the application semantics ...

Andrew Alston

You are not alone in that Tony

Dmitry Afanasiev


Tony Przygienda

as @Dima once said: it's all distributed linear algebra at the end ;-)
and this does not know whether you're computing tensor cross-section to
calculate static stability or train some generative parrot

Dmitry Afanasiev

but collective ops is a special beast, HPC interconnects historically
provided support for it, at least some of them

David Lamparter

anyone know how this relates to coinrg?

Tony Przygienda

nothing against in-substrate support for folding of course ...

Dmitry Afanasiev

@David collective operations - e.g all-reduce, used in ML training,
doing intermediate reduction in network can improve performance.
Definitely computation in the network.

Jeff Tantsura

SHARP is doing this today

Dmitry Afanasiev

@Jeff exactly, it's a very good example

Tony Przygienda

thinking that to the end you basically want S-I-PMSI capable on the
substrate setting up such hierarchical folding "trees" and that's one of
the things I tried to talk to folks about as BIER use case (think
generalized sharp distribution substrate ;-)

Dmitry Afanasiev

but no SHARP for Eth/IP .. at least yet :)

Tony Przygienda

unfortunately, the folks dealing with that are religiously against any
type of multicast (for which I have some understanding)

Kehan Yao

@Dmitry, agree. collective operations offloaded to the switch is a
common solution for AI/HPC. So for AI networking, maybe people shouldn't
wear glasses, and some in-network behavior may be helpful.

John Scudder

I’m not able to stay for this talk on “lossless techniques“ but I want
to point out to anyone who isn’t aware that it sure sounds the same as
what detnet works on. 

Tony Przygienda

@Dima: RIFT a natural for it once we manage to get the multicast folks
to finish their work ;-) though of course the multicast here becomes not
multicast but really distributed folding operation

Yingzhen Qu

@John, noted.

Jeff Tantsura

multicast (or tree building at all) is the least of your problem, ASIC
doing reduction at line rate is though

Tony Przygienda

ASICs are easy, they build a new one every 6 months ;-)

Dmitry Afanasiev

@Tony yes, establish reduction topology - this is straightforward, but
also deal with data loss, reducer failures, latency bounds, maybe
agreement on quantization

Tony Przygienda

@Dima, you know me, I'm a control plane whinie, the larger the scale the
better ;-)

Tony Przygienda

but of course I agree, practically to get such distributed folding
deliver real gains is anything but easy at the volumes

Dmitry Afanasiev

vectors to be reduced can be quite large, so enough of buffer space on
reducing switches

Tony Przygienda

oh, people complained about private messaging not working because I just
discovered a long queue of little symbols at the top I ignored ;-) Cute

Dmitry Afanasiev

@Tony I'm with you wrt scale - the larger the better, also that's where
really interesting problems start to appear, and just throwing money is
not enough to make those problems go away :)

Tony Przygienda

@Dima, nah, money helps. You just start to throw it at smart people
rather than brute forcing it ;-)

Tony Przygienda

okey, bed now. fun session ;-)
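
For readers unfamiliar with the in-network all-reduce and hierarchical folding "trees" discussed in the chat above, the following is a minimal sketch of the idea, not a description of SHARP or of any proposal made in the session. It assumes element-wise summation as the reduction operator and models each reducing switch as a group of `fan_in` inputs; all names are hypothetical.

```python
from typing import List

Vector = List[float]


def add_vectors(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, standing in for the reduction operator."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return [x + y for x, y in zip(a, b)]


def tree_all_reduce(worker_vectors: List[Vector], fan_in: int = 2) -> List[Vector]:
    """Reduce vectors level by level, then hand the result to every worker.

    Each group of `fan_in` inputs models one hypothetical reducing switch,
    so only one partial result per group travels up to the next level.
    """
    if not worker_vectors:
        raise ValueError("need at least one worker")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = [list(v) for v in worker_vectors]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fan_in):
            group = level[i:i + fan_in]
            partial = group[0]
            for vec in group[1:]:
                partial = add_vectors(partial, vec)
            next_level.append(partial)
        level = next_level
    total = level[0]
    # Broadcast phase: every worker ends up with the same reduced vector.
    return [list(total) for _ in worker_vectors]


if __name__ == "__main__":
    workers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(tree_all_reduce(workers))
```

The sketch leaves out the hard parts raised in the chat: data loss, reducer failures, latency bounds, buffer space on the reducing switches, and doing the reduction at line rate.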