LSR Interim on IS-IS Flooding Speed
September 29th, 2021
Chairs: Acee Lindem (acee@cisco.com), Chris Hopps (chopps@chopps.org)
See meeting materials for presentations: https://datatracker.ietf.org/meeting/interim-2021-lsr-01/session/lsr

1. Joint Statement

Chris H: Are you going to start a new draft?
Bruno: It's going to be a new draft with some text from existing drafts.
Chris: Great news.
Acee: Hopefully we won't be restricted to five authors. I'll volunteer to go to contributor on the combined draft. This is a case where we should be able to let the authors figure it out. We have 8-9 people.
Peter: We can figure it out. It won't be an issue.

2. Team Bruno – Congestion Control and Flow Control

Acee: How did you decide most IS-IS implementations are CPU bound? I have more experience with OSPF implementations, and I found them more I/O bound. I know OSPF LSAs are smaller than IS-IS LSPs. How did you make the determination?
Bruno: Good question. We used FRR and tried to find the exact reason for the performance. It was not dropping LSPs on the I/O path; it was just the receiver being busy. We did some tests with Cisco IOS XR, where we have less data, only through debugging, so we might be wrong. We found the receiver is waiting to acknowledge the LSPs, and it's too late in the sense that there are lots of LSPs in its backlog. It can never keep up with the sender, either because the CPU is not capable of acknowledging LSPs fast enough, or because the acknowledgements don't arrive before the timeout on the sender. In general, it suggests to us that the receiver has to do some CPU work before acknowledging.
Acee: I was just wondering. Thanks.
Bruno: The limit is not when you have a single adjacency but when you have multiple senders with a single receiver.
Chris: Do you want to do an adoption call? How does this work with the combined draft? It doesn't make sense to do a code point in this case.
Bruno: I agree. The slides were done before the Monday meeting.
We will seek adoption on the combined draft.
Chris: People are going to like the combined draft.
Bruno: We will have less content in the combined draft, because we're restricted to common agreement. So maybe fewer details; we'll see.
Acee: I looked at the interim participants, and we have a lot of people that are going to have an opinion on it in the meeting anyway, with the exception of possibly some of the people in the Asian timezones.

3. Team Les – Congestion Control

Acee: Clarifying point here: you've modified the receiver to be able to set the rate limit at which it will drop, right?
Les: Correct. This is test code on the receiver that just arbitrarily says: after I've processed 100 LSPs, I'm going to stop and wait until the next second, but LSPs are still in the input queue.
Chris: If I'm not mistaken, both the RWIN and the PSNP interval are static values. It's not something that has to be signaled during a burst.
Les: I believe for RWIN to be effective, it has to be dynamic. Otherwise, the alternative is that you pick a conservative value that you think is safe, even under the maximum load conditions, and then you don't achieve the highest throughput that you might actually be able to support.
Chris: I used to think the same at a high level. But when you break the queues down per adjacency, you can give it a static value. You think it needs to be dynamic to adapt to the network conditions, but if each neighbor has a queue of 30, your RWIN is 30 no matter how many neighbors you have.
Les: The routers I work on are not implemented that way.
Chris: That's software.
Les: No, it's not just a software issue.
Chris: You can modify software to have a queue per neighbor.
Les: There are many comments that could be made here. But go back to the slides that represent the data plane. You're talking about what's available to the control plane, and we might still have a debate about that. But you're not talking about what's going on in the data plane. That's not easy to change.
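Chris's per-adjacency argument lends itself to a small sketch: if the receiver dedicates a queue per neighbor, the sender only needs to keep its un-acknowledged LSPs toward that neighbor below a fixed window. The Python below is an editor's illustration only; the `AdjacencyWindow` class, the `flood` helper, and the window value of 30 (taken from Chris's example) are not from any implementation discussed at the meeting.

```python
class AdjacencyWindow:
    """Static per-neighbor receive window: the sender never has more
    than `rwin` LSPs outstanding (sent but not yet PSNP-acknowledged)
    toward a given adjacency, so no dynamic signaling is needed."""

    def __init__(self, rwin=30):       # 30 = Chris's example queue depth
        self.rwin = rwin
        self.outstanding = set()       # LSP IDs awaiting acknowledgement

    def can_send(self):
        return len(self.outstanding) < self.rwin

    def on_send(self, lsp_id):
        self.outstanding.add(lsp_id)

    def on_psnp_ack(self, lsp_id):
        self.outstanding.discard(lsp_id)


def flood(window, pending):
    """Send as many pending LSPs as the static window allows;
    return (sent, still_pending)."""
    sent = []
    while pending and window.can_send():
        lsp_id = pending.pop(0)
        window.on_send(lsp_id)
        sent.append(lsp_id)
    return sent, pending
```

Note that because the window is per adjacency, the total load a receiver can see is bounded by `rwin` times its number of neighbors, which is the crux of Les's counter-argument: the bound grows with fan-in and says nothing about shared data-plane queues.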
Chris: You're correct. The RWIN is about how much each queue holds for each adjacency. If you want to bring everything together, including other traffic, you have to do other things; we're just getting started.
Les: If you're trying to determine the state of the receiver, and you want to know whether the receiver is getting overwhelmed, you cannot just look at the input queue to IS-IS. You have to look at the state of the router as a whole. You might be dropping packets in the data plane, and IS-IS in the control plane has no idea that this is happening. As far as IS-IS is concerned, the input queue is half empty.
Chris: This is why you are getting together. I'm not saying this is black and white, or that you have to choose. Your approach is dealing with the overall congestion that can happen. The RWIN is doing something else: you can send me this many LSPs and I won't drop them. It's another way to optimize the situation.
Guillaume: I want to go back to "RWIN needs to be dynamic". When you want to do congestion control, you need the window to be dynamic. You can use RWIN as an alternative signal of what your allowed window on the receiver side is.
Chris: I don't believe you need a dynamic RWIN.
Guillaume: I tend to believe you do. It's better if you want to avoid messes in between.
Chris: The green ones are the queues per neighbor. I mean, RWIN in this case is fine.
Tony Li: Implementations are all over the map right now, and hardware has been making great strides, making interesting progress. That said, there are a whole lot of implementations that are not particularly elegant, that comply with the first model that Les showed, with a giant punt queue and everything just funneling into that punt queue. Now there are more complications: the things that fall into that punt queue are rate-limited.
And so, even though the queue may be 10,000 packets deep, you can only dump 1,000 packets per second into it or something, and hardware is not very forgiving about that, and maybe you can program your queue rate or not. Whatever it is, it's a messy business, and there's no question about that. And there are a number of software-only implementations out there that I've seen that are pretty slim, thanks to containerization and people doing software as a service. Yes, we still see that too. So again, we've got an enormous amount of variation we have to try to capture, and RWIN is only a first approximation; maybe over time we learn better ways of doing things. But the whole point is to give us some mechanisms, because we realized that some feedback in the control loop is better than no feedback.
Acee: Because of all these implementations, this isn't going to be a draft with a hammer that says you must do this and you must not do that. But I'd encourage implementations to allow speed-up without all the receivers sending RWIN. You could have a way to bypass that, or statically configure it, so you don't depend totally on receiving RWIN.
Tony: Totally agree. RWIN is optional.
Acee: Anybody with a rate limit on a punt queue without some classification, that would just be very naive.
Tony: It's not what you want, it's what you get.
Chris: Most routers do some amount of classification. Your point is well made. It's all over the place. I don't think we should limit ourselves to it.
Tony: Why? We're making sense of the cases where you can get additional useful information.
Les: What came out of our consensus on Monday was (Tony's probably the one who voiced this): let's just provide the tools that are needed to support a variety of strategies, and over time we'll find out what works and what doesn't work. Three years from now, we may still have people who prefer one approach over another. But at least we have the tools to support a variety of strategies.
Chris: Great outcome.
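Tony Li's numbers above (a punt queue 10,000 packets deep that accepts only on the order of 1,000 packets per second) can be modeled as a token bucket in front of a bounded queue. This is an editor's sketch under exactly those assumptions, not a description of any real platform; the `PuntQueue` class and its parameters are hypothetical.

```python
class PuntQueue:
    """Deep punt queue with rate-limited input: packets beyond the
    configured rate or depth are dropped before the control plane
    ever sees them, which is why IS-IS alone cannot observe the loss."""

    def __init__(self, depth=10000, rate_pps=1000):
        self.depth = depth
        self.rate_pps = rate_pps
        self.tokens = float(rate_pps)   # start with a full bucket
        self.queue = []

    def tick(self, elapsed_s):
        """Refill tokens as time passes, capped at one second's worth."""
        self.tokens = min(float(self.rate_pps),
                          self.tokens + self.rate_pps * elapsed_s)

    def punt(self, pkt):
        """Return True if the packet is queued, False if silently dropped."""
        if self.tokens < 1.0 or len(self.queue) >= self.depth:
            return False
        self.tokens -= 1.0
        self.queue.append(pkt)
        return True
```

The model also illustrates Les's earlier point: an IS-IS instance draining `queue` sees a half-empty buffer and no drop counter, even while `punt` is refusing packets.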
Tony Li: We've also noticed that we're still learning about TCP mechanisms, so to think that we're going to get this right on the first pass would be nothing but arrogance.
Acee: We're going to wait on the combined draft and then do an adoption call. I don't think we have to wait for it to be presented at an IETF meeting.
Chris: I agree with that. If the authors get it out fast, we can do an adoption call before the next IETF.
Bruno: It's only a few weeks before the IETF. I think it's OK to present a new draft, but thank you for that.
Les: The submission deadline is Oct 25th; we'll try to make that. Agree?
Bruno: Yes. We can wait for the next meeting.
Les: But as far as the working group chairs go, anything you can do to accelerate the working group adoption process is, from the point of view of us authors, most welcome.
Acee: This is the best possible outcome. With a combined draft, there is a much better possibility of the different mechanisms working together, as opposed to having two independent tracks.
Aijun: We'd like to ask about the PUA draft adoption call. We've asked several times, and the draft has been discussed extensively. Would it be possible to start the adoption call?
Acee: Let's address it offline. There are some technical issues we need to discuss. It's not relevant to IS-IS flooding optimization.
Chris: It's on our minds. We'll circle back.

Chat History

Tony Przygienda: If I understood correctly, in the merged draft the advertisement of the window and the change in PSNP procedure is only a MAY in the normative sense?
Peter Psenak: That is correct.
Tony Przygienda: Thanks Peter, all clear.
Les Ginsberg: All new protocol extensions defined in the TBD draft are at the discretion of the implementation. No matter which approach an implementation takes, it needs to account for the fact that a given neighbor may not support a particular advertisement - either sender or receiver.
John Scudder: Idle thought: presumably the RWIN scheme has the usual characteristic that fairness suffers if one client has a very small RTT and another has a large RTT. Then again, given that we're doing flooding, maybe that's fine; I'm not sure fairness is of paramount importance.
Guillaume Solignac: @John, we considered indeed that fairness was not a primary design goal.
John Scudder: (Oh yeah, I think that's what this current slide says.)
John Scudder: Noting to ask Les (or for a co-author to answer here): is it realistic to assume that ack delay is both stable and knowable a priori?
Tony Przygienda: As I pointed out (maybe not on the list), IMO this configuration is unnecessary; using the % of outstanding un-acked LSPs of the current TX rate is a very good proxy in my implementation experience.
Peter Psenak: Yes, the "assumed" PSNP delay is something that we know upfront.
John Scudder: Slide 4 says getting the delay right is important (if I recall correctly). I get that you can compile or configure in an "assumed" number, but those quotation marks tell a story, and the story is that compiled-in constants are always wrong sooner or later, usually sooner. IME.
Tony Przygienda: You start slow, you don't queue too many LSPs outstanding, and with the RX acking fast, the TX rate goes up, and with that, naturally, the amount of LSPs sent without ack (i.e. the buffer on TX) is getting deeper if the RX processing is fast enough, i.e. the BW * delay product is kept roughly constant; with lower delay, BW goes up.
Guillaume Solignac: What you are describing sounds like a congestion window, @Tony, right?
Tony Przygienda: Yepp ... pretty much ...
Guillaume Solignac: :)
Peter Psenak: What we signal is the fastest PSNP interval that the sender may consider.
John Scudder: I see. Thanks.
Tony Przygienda: So the RX rate is kind of a proxy for your explicitly advertised window, but the difference is it comes roughly for free as a byproduct of normal protocol operations (albeit a well-signalled RX window, if you can afford to generate it, will be superior of course).
Guillaume Solignac: Yes. But if you start with a small window (ideally 1), and increase as long as you have TX pressure, you get the same results without relying on precise timers to pace your sending.
Christian Hopps: @John Scudder FWIW, partialSNPInterval is a standard defined thing in ISO 10589. I don't think the value is standardized though.
Tony Przygienda: And a TX'ed RX window (or a delay known upfront) allows for a faster start of course (depending how/if you decay the max rate in steady state). Yeah, 10589 only says "2 seconds is reasonable" ;-)
John Scudder: "Constants are always wrong sooner or later, usually sooner."
Tony Przygienda: Assuming data plane state/features or particular behavior to implement link-state is a slippery slope to perdition ;-) We deal with multi-protocol boxes with lots of greedy little things trying to grab the data plane from one milli to another.
Jeff Tantsura: More rudimentary implementations would benefit from clearly described goals an "optimized" implementation should aim for.
Tony Przygienda: The IETF is not in the business of standardizing implementations; the IETF's job is to standardize things on the wire and enough externally observable behavior to guarantee interoperability. Everything else ended up making things worse rather than better.
Peter Psenak: Very true.
Jeff Tantsura: We all agree ;-)
Tony Przygienda: Plus, highly scaled implementations are not only very complex, but lots of stuff is RFC 1925 #4, often met with disbelief by people who weren't there ;-)
Jeff Tantsura: Make reading 1925 a mandatory part of the overall process ;-)
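The scheme Tony Przygienda and Guillaume converge on in the chat (start with a window of roughly 1, grow it while acks return quickly, treat lingering un-acked LSPs as the congestion signal) is essentially a TCP-style congestion window. A minimal editor's sketch follows; the `LspCongestionWindow` class and its growth/decay constants are illustrative, not taken from any implementation mentioned in the discussion.

```python
class LspCongestionWindow:
    """TCP-like pacing for LSP flooding: the window bounds un-acked
    LSPs; fast acks grow it (so the TX rate rises), while a timeout
    halves it. Constants are illustrative only."""

    def __init__(self):
        self.cwnd = 1.0         # start slow, per Guillaume's suggestion
        self.outstanding = 0    # LSPs sent but not yet acknowledged

    def can_send(self):
        return self.outstanding < int(self.cwnd)

    def on_send(self):
        self.outstanding += 1

    def on_ack(self):
        """An ack means the receiver kept up: shrink the backlog and
        grow the window (slow-start-like; real code would cap this)."""
        self.outstanding = max(0, self.outstanding - 1)
        self.cwnd += 1.0

    def on_timeout(self):
        """Un-acked LSPs lingered too long: back off multiplicatively."""
        self.cwnd = max(1.0, self.cwnd / 2.0)
```

As Tony notes, an explicitly advertised RX window (or an ack delay known upfront) would mainly change the starting point, letting the sender skip the slow ramp-up.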