IETF 108 LSR Meeting

Chairs: Acee Lindem, Chris Hopps
Secretary: Yingzhen Qu

Date: Thursday, 30 July 2020
Time: 14:10 - 15:50 UTC

1 Administrivia and WG update
  Acee/Chris

Chris H: What is the status of the base network programming draft?
Peter: SRv6 programming is in AD review, and is progressing in the SPRING WG.
Chris H: It still needs to do IETF LC, right? Perhaps there could still be issues raised?
Martin: Yes, it's going to IETF LC. This will happen by the end of the week. It could be contentious. From the LSR perspective, this should progress. The appeal has been cleared.
Chris: One objection during the appeal was around the penultimate hop, and that affects the TLV space. We could do LC and submit it to the IESG, but we don't want to be seen as doing an end-run on the process, with people pointing at the TLV saying "see, the TLV is already there."
Martin: You need to be careful. I don't expect the network programming draft will progress faster. Most concerns were about PSP, and the new version has added clarifications and use cases. I believe we're on the safe side.
Chris: Hopefully this will progress between this and the next IETF.

2 IGP Flex Algorithm
  https://datatracker.ietf.org/doc/draft-ietf-lsr-flex-algo/
  Peter Psenak

Peter: Ready for WGLC. There are multiple implementations.
Chris H: This is reasonable. Any objection? (no) I think it's ready to move forward.
Acee: I will send some editorial comments.
Peter: Sure.

3 OSPF Prefix Originator Extensions
  https://datatracker.ietf.org/doc/draft-ietf-lsr-ospf-prefix-originator/
  Ketan Talaulikar

Acee: Speaking as co-author, we made lots of changes. Please read the new version. It now has two TLVs because of new use cases. We'd like to have more reviews before WGLC.

4 Experimental results on IS-IS flooding
  Sarah Chen

Acee: What's the TX interval for the PSNP numbers?
Sarah: 1 ms as TX interval.
Chris H: Thank you. This is fantastic. This is what had been asked for during the interim. Is there simulation code available?
Sarah: I used our production code, and I have already put the threshold-based PSNP in our production code.
Eduard: Thanks for the good research. In new production, ISIS will typically have some signature for security reasons, like MD5, and this adds 100 ms or 200 ms of delay and will influence the results. Do you have anything for certification purposes?
Sarah: No.
Acee: Can you put it to the list?
Eduard: It's long on purpose; the security computation protects against brute force attacks.
Jeff Haas: Nice to see these things. The results are similar to what we saw when we did TCP performance tuning in BGP. Tony Li confirmed that it's done on real hardware as well. These are optimistic results. One concern is that in a real network, it may start to deteriorate. For the future, I'd suggest putting in machinery to perturb things and see how quickly and how often things get dropped in bursts. That would simulate the real world a bit better.
Sarah: There are so many parameters that we can tune. The purpose of this simple setting is to show the correlation between LSP interval and PSNP interval.
Tony P: I'm assuming the retransmission is done ASAP. Suggesting faster PSNP is a good idea. I think the combination of this hybrid PSNP plus transmit ramping up should solve 99% of the problem. That's my gut feeling.
Les: Echoing other people's comments, great work. If the LSP transmission interval is 1 ms and you have 2000 LSPs, why did it take 3 seconds?
Sarah: It's our implementation. There is some overhead.
Les: This is great work. Hybrid PSNP is a great idea. Given that the testing doesn't include the dynamic adjustments of either draft, we can't draw a conclusion on whether the RX-based or the TX-based approach might be the best one.
Sarah: The initial work was to test the drafts, but I was happy to see that PSNP works fine and solved the issue.
Bruno: I'm interested in results from a stronger sender and a weaker receiver. Maybe try multiple senders?
Sarah: That's a good point.
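The threshold-based ("hybrid") PSNP behaviour discussed under item 4 can be sketched roughly as follows. This is a minimal illustration and not the production code mentioned above: the class name, method names, and the 2-second fallback timer are assumptions made for the example, and the default threshold of 15 entries is the figure quoted in the chat history. The receiver sends a PSNP as soon as `threshold` LSPs are waiting to be acknowledged, and otherwise falls back to a timer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HybridPsnpAcker:
    """Toy receiver-side ack scheduler: flush on a count threshold or on a timer.

    All names and defaults here are hypothetical, chosen for illustration only.
    """
    threshold: int = 15       # pending entries that trigger an immediate PSNP
    max_delay: float = 2.0    # seconds before the fallback timer forces a PSNP
    pending: List[int] = field(default_factory=list)
    oldest: Optional[float] = None  # arrival time of the oldest unacked LSP

    def on_lsp(self, lsp_id: int, now: float) -> List[int]:
        """Record a received LSP; return the LSP IDs to ack now, or []."""
        if not self.pending:
            self.oldest = now
        self.pending.append(lsp_id)
        if len(self.pending) >= self.threshold:
            return self._flush()
        return []

    def on_tick(self, now: float) -> List[int]:
        """Timer path: ack whatever is pending once max_delay has elapsed."""
        if self.pending and now - self.oldest >= self.max_delay:
            return self._flush()
        return []

    def _flush(self) -> List[int]:
        batch, self.pending, self.oldest = self.pending, [], None
        return batch


# Usage: 7 LSPs arriving 1 ms apart, with a small threshold of 3 for readability.
acker = HybridPsnpAcker(threshold=3, max_delay=2.0)
psnps = []
for i in range(7):
    batch = acker.on_lsp(i, now=i * 0.001)
    if batch:
        psnps.append(batch)
tail = acker.on_tick(now=2.5)  # the leftover LSP is acked by the timer

print(psnps)  # [[0, 1, 2], [3, 4, 5]]
print(tail)   # [6]
```

The count trigger keeps acknowledgements close behind a fast sender, while the timer covers the tail of a burst that never fills a full batch.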

5 IS-IS Topology-Transparent Zone
  https://datatracker.ietf.org/doc/draft-chen-isis-ttz/
  Huaimo Chen

Chris H: As a chair, we wish we had one draft between area proxy and TTZ. There are differences. I know the authors tried to work together at one point. Since we have multiple proposals that can't seem to come together, we thought we would try the experimental track and let the market decide. You also said most people think it's good; I don't know that that's the case. I know there are lots of people putting +1 out there, but how many implementations? How many operators want to deploy? I'm not saying those are barriers to adoption, but I'm not sure that you can claim most people support it. I saw lots of experts say this is overly complex: that having zones and smooth transition might add some value, but can't justify the complexity. So as a chair I want to let you know that if we move forward with this, many of the experts with industry knowledge are probably not going to review or provide feedback. If this then gets submitted to the IESG, the report will have to say that there was very rough consensus on the work and that many experts in the group did not agree with it.
Acee: With chair hat on, the timeline is a misrepresentation. The draft did go forward once and it was quite different; then it was rejected by the ISIS WG and expired, and came back after the area-proxy draft. It's not really the same. The draft is more like area-proxy. I don't know why it was changed in 2013 to get rid of the single-node abstraction. I also agree with Chris regarding the lots of "+1" support; I'm not sure how many read the draft. Speaking as a WG member, the other two drafts (area-proxy and route reflectors) have much better specifications. I share the concern with the distributed algorithm; maybe you have done some experiments for the transition. But things don't happen all at once without handshaking across the whole routing domain in the zone.
So I don't think the draft is as seamless as it claims, and the quality is not as good as the other two drafts, even though it has higher goals.
Huaimo: In 2013, the draft included two solutions. The first is to abstract the zone to a single node and select a Zone DR/leader, which generates the LSP for the zone. The second is to abstract the zone with edges fully connected.
Acee: When it was resurrected in recent history, it only included the full mesh of edges.
Huaimo: Along the path, we focused on one solution. Now that people are interested in abstracting as a single node, we brought back something published in 2013.
Chris H: It's an individual draft; trying different things is fine. I echo Acee's comments. It doesn't feel finished. You mentioned you just have two adjs on the link at one time, although it doesn't specify how that works. For P2P, that wouldn't work. You have mechanisms with TLVs to do the transition. We don't want to end up where you can do things 10 different ways. We try for a KISS solution: keep it simple, silly. All these options are running up against that.
Huaimo: The draft provides more options because we want to have discussions, and we will converge to one option.
Chris H: That's reasonable.
Huaimo: For smooth transition, we did an experimental implementation. If we don't have smooth transfer, when we abstract an area to a single node, we will have adjacency flaps. If we don't have a solution, we have service interruptions.
Chris H: Nobody is arguing that. What I see people arguing is that nobody really cares about a brief service interruption because it's not going to happen that often. Routers come up and down.

6 Flooding Topology Minimum Degree Algorithm
  https://tools.ietf.org/html/draft-ietf-lsr-flooding-topo-min-degree-00
  Huaimo Chen

Skipped due to time limitation - now WG document.

7 Using IS-IS MT for SR based Virtual Transport Network
  https://datatracker.ietf.org/doc/draft-xie-lsr-isis-sr-vtn-mt/
  Chenhao Ma / Jie Dong

Tony Li: Have you considered using flex-algo?
Seems you can simplify things.
Jie: We did. For flex-algo, we have a draft describing the routine.
Peter: What's this draft standardizing? I see nothing.
Jie: We had discussions on the list about how to use topology TE attributes to advertise information at policy level, which is not fully specified. Whether this draft is standard or informational, we need the WG's feedback.
Peter: I'm not sure you need to standardize.
Chris H: So it is for informational.
Acee: We'll discuss more after the base draft is adopted in SPRING.

8 IGP Extensions for Segment Routing Service Segment
  https://datatracker.ietf.org/doc/draft-lz-lsr-igp-sr-service-segments/
  Yao Liu

Acee: You could use BGP to communicate, so you don't need this.
Yao: Yes, if there is BGP.
Chris H: The IGP is special, we don't need extra info. The feedback is that BGP-LS is designed to carry such info, not the IGP.
Yao: The main purpose is to provide info to BGP-LS.
Chris H: You're using the IGP as transport. We can continue on the list.

9 IS-IS Extensions to Support Packet Network Slicing using SR
  https://tools.ietf.org/html/draft-zch-lsr-isis-network-slicing-03
  ISIS Extension to Support Network Slicing over IPv6 Dataplane
  https://tools.ietf.org/html/draft-peng-lsr-isis-network-slicing-srv6-00
  IGP Flexible Algorithm with L2bundles
  https://tools.ietf.org/html/draft-peng-lsr-flex-algo-l2bundles-01
  Ran Chen

10 Prefix Unreachable Announcement
  https://datatracker.ietf.org/doc/draft-wang-lsr-prefix-unreachable-annoucement/
  Aijun Wang / Zhibo Hu

Acee: If there are connections between ABRs, we don't need to mess up the protocol.
Praveen: How different is this from already existing mechanisms, like max-age? Is it specific to summary routes?
Acee: Yes.
Praveen: There is no special relation between BGP and IGP.
Acee: There are lots of solutions for this problem. The draft is half-baked. I don't think it's the right way.
Praveen: In RIFT, we have the default route issue.
[Aijun was having audio problems so couldn't reply]
Chris B: Please don't use this time to trash other people's drafts. The presenter is having audio problems, so no one is around to defend the work.
Chris H: Yes, there was some support on the list for this work, so I was hoping that another one of the coauthors would join the queue. Since that is not happening, I suggest we continue this on the list.

The Chat History

Jeffrey Haas: Question for Sarah's presentation on reduction interval slide (slide 5?) - is duration including the # of retransmits?
John Scudder: Kind of interesting that retransmissions drops as txinterval drops, until you get txint down to 1
Jeffrey Haas: it does feel like they're hitting a rate limiter. I'd also be interested in the transmission cadence. insufficient gapping with highly grouped bursts might lead to drops there as well
Tony Li: Yes, duration is wall clock from first LSP to last LSP.
Tony Przygienda: yeah, we have the stuff forever and we have no massive RTX'ssions. probably some implementation problem
Jeffrey Haas: Tony Li, that includes the retransmits?
Tony Przygienda: but there is tons good stuff here. I'll chime in
Tony Li: Yes, rexmit included. Thank you, Tony!
Jeffrey Haas: Interesting. It suggests paying the retransmit tax is still worth it for the reduced total time
Tony Li: No, hang on...
Tony Przygienda: hysteresis has to be very aggressive on back-off
Yingzhen Qu: I'm wondering whether data traffic has any impact?
Tony Przygienda: not if implemented correctly
Christian Hopps: Yingzhen, shouldn't
Jeffrey Haas: It's all queues. Depends on what's the resources being competed for.
Tony Li: These experiments were pure control plane. No hardware was harmed. Or used.
Tony Przygienda: yeah, if you don't prioritise properly from input until protocol read out you're toast no matter what
Jeffrey Haas: Tony Li - raw and not on traditional router with host plane ? I guess better question is what's the platform?
If raw linux, this discussion is a bit off for actual routing hardware
Tony Li: Control plane in namespace containers, all on one server.
Jeffrey Haas: But does help develop the optimistic limits. ah. container underlay mechanisms can lead to weird behavior on bursty traffic
Tony Li: We did repeat on physical hardware, same results.
Jeffrey Haas: observations from tcp in bgp land in similar environment
John Scudder: Did I correctly understand the last line to be "too fast to meter"?
Tony Li: Tool failure.
John Scudder: ah
Henk Smit: So faster bursting of LSPs causes fewer retransmissions ? It seems the sender's retransmission code is too aggressive ....
Tony Li: Faster PSNPs decreases retransmissions. By default, retransmissions are more aggressive than PSNP, which is just plain broken.
Henk Smit: Sure. But I assume that if you have the receiver send PSNPs every 2 sec, you have set the sender
Tony Przygienda: sigh, I say it on the mike, I assumed e'body knows that you obviously break the spec and send PSNP (ACKS) as fast as you can. mea culpa
Henk Smit: ... sender's retransmit-interval to at least 3 sec.
Tony Li: "Everybody knows" is pointless. We're out to revise specs. We need to write it down.
Tony Przygienda: correct, as I said, "mea culpa". too many implementations over course of my life I guess until one doesn't realise such stuff
Henk Smit: Tony, in the example by Sarah, when you set psnp-interval to 2 sec, what did you set the rxmit-interval to ?
Tony Przygienda: no interval, I think what they suggest i.e. fast start with ramp up is best solution. I was frankly always greedy & ACK'ed as fast as I could
Henk Smit: Question was for Tony Li.
Tony Przygienda: oh, sorry. differentiate ;-)
Henk Smit: Sorry. :)
Tony Li: Left the rexmit alone.
Henk Smit: Old IOS code has that set to 5 sec. Can I assume your code still has it to 5 sec ?
Tony Li: Bad assumption. But that's not the point. The point is that without fast PSNPs, delay seems like you need rexmit.
Tony Przygienda: and yes, what Jeff says about the bursts is very relevant but very HW specific and very hard to put finger on. It's the usual thrashing with too much context change vs. state compression slowing convergence
Henk Smit: Just trying to understand why # of retransmits increases when you transmit slower. It seems so counter-intuitive.
Tony Przygienda: and then once you start do that on tons interfaces lots of other stuff kicks on in terms of buffer management. nope, basically not enough PSNPs make it out to confirm all the stuff that came in so the TXer starts to RTX because timers kick in
Tony Li: Exactly.
Tony Przygienda: that's why the greedy PSNP basically works albeit the ramp up suggested here is somewhat better since it protects receiver. so this TX watching for retx/loss and backing off and ramping up PSNP @ receiver is goldilocks IMO. no need for any fast signalling, the protocol pacing basically drives max. hysteresis and both the TXer and the RXer can clamp down the hysteresis
Tony Li: Our data doesn't support that. All it says is that faster PSNP is a requirement.
Henk Smit: Sure. Another rule-of-thumb would be: psnp-interval should be (much) lower than rxmit-interval. I just wonder if that was the case here.
Tony Li: It wasn't. The point is that a PSNP of about 15 entries is near optimal.
Henk Smit: OK. Good to know that number. I'm just wondering why it is 15 ...
Tony Li: It's a balance between excessive overhead and acking before the transmitter gets bored.
Tony Przygienda: this will be implementation dependent, link etc anyway ... if both sides adjust they'll find the optimal point, that's the point of dynamic system hysteresis
Tony Li: It's not clear that we need sophisticated. The data shows that performance is good as long as you don't wait to 90.
Tony Przygienda: well, you break the spec on PSNP side as you suggest (and I always did ;-) and TX otherwise you won't speed up. how sophisticated you want to make it is kind of implementation technique IMO
Tony Li: The question is what do we write down.
Tony Przygienda: well, as usual it's dangerous to write implementation guidelines as Yakov always warned ;-) I like hysteresis since that will amount for weird things like losses due to link error, buffer overruns and so on but it's surely more complex than saying "just send TX as fast you want and send PSNP _fast_" ;-)
Jeffrey Haas: These days I'm fond of "implementation considerations". If you don't have great flow control, drop rate, and queue behaviors in reactions to bursty traffic are things to watch for. If you had a nice mechanism to communicate the sustainable xmit cadence, that'd help a ton
Tony Przygienda: back pressure signalling is always difficult/slow and surprising ;-) And we don't have _fast signalling_ into link-state seems to me personally like an overkill. but that's just me ;-) in rift doing weird stuff (well, HW did it for me unintentionally) I found that putting SeqNr on packets & seeing losses of eg. a PSNP packet and backing off very aggressively improves the RTX rate in overload/problems. But we don't have that on ISIS formats so that's it
Jeffrey Haas: how fast you tune the too hot/too cold on your faucet is certainly a problem. better to find a warm enough and leave it alone for a while
Tony Li: Thus 15.
Tony Przygienda: that's just hysteresis parameters and matter of taste, you want fast ramp-up, fast slow-down and so on and 15 may be a great number or may not depending on how fast the TX go/link and so on but it may be the simplest recommendation
Jeffrey Haas: for obvious reasons, I see this as akin to bfd session tuning
Tony Przygienda: same thing really
Tony Li: It's a control loop.
Tony Przygienda: yep ;-) wonderful if you can make 30 years career aspect of a single concept you study in dynamic control theory 101 ;-)
Tony Li: Re: TTZ -- I'm still open to collaborating, but it MUST be in public and NOT on a design team. Legal stuff.
Les Ginsberg: But is there a need to introduce "zone". Can't we do whatever is needed using areas?
Tony Li: I think so, but...
Robert Raszuk: Well BGP is not a dump track ... truck
John Scudder: aaaand this is why I'm not in the "virtual is the solution to everything" camp
Tony Li: Ayup
Christian Hopps: +1
Henk Smit: Solution for all these disconnect problems: someone should build a better Internet .... :)
Aijun Wang: Can we discuss this on the maillist? there still exists noise.
John Scudder: Plus we're over time anyway.
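
The chat exchange about why retransmissions rise when PSNPs are slow can be illustrated with a toy, loss-free model. It is a sketch under stated assumptions and not a reproduction of the presented experiment: the function name and every parameter value below are hypothetical, each LSP is assumed to arrive instantly, and a single PSNP is assumed to acknowledge everything pending at the receiver. The sender retransmits an LSP each time its retransmit timer expires before the acknowledgement arrives.

```python
def count_retransmits(n_lsps, tx_ms, psnp_ms, rxmit_ms, threshold=None):
    """Count timer-driven retransmissions in a loss-free toy model.

    All times are integer milliseconds. With threshold=None the receiver acks
    only on a periodic PSNP timer; otherwise it also acks as soon as
    `threshold` LSPs have arrived since the last full batch (hybrid PSNP).
    """
    total = 0
    for i in range(n_lsps):
        sent = i * tx_ms
        # Periodic PSNP: acked at the next timer tick strictly after arrival.
        acked = (sent // psnp_ms + 1) * psnp_ms
        if threshold is not None:
            # Hybrid PSNP: acked when the last LSP of its batch arrives,
            # provided the batch actually fills up.
            last_in_batch = (i // threshold + 1) * threshold - 1
            if last_in_batch < n_lsps:
                acked = min(acked, last_in_batch * tx_ms)
        wait = acked - sent
        # One retransmission for every k >= 1 with k * rxmit_ms < wait.
        total += max(0, (wait - 1) // rxmit_ms)
    return total


# 2000 LSPs sent 1 ms apart, receiver PSNP timer of 2 s.
slow_acks = count_retransmits(2000, tx_ms=1, psnp_ms=2000, rxmit_ms=500)
hybrid = count_retransmits(2000, tx_ms=1, psnp_ms=2000, rxmit_ms=500, threshold=15)
long_timer = count_retransmits(2000, tx_ms=1, psnp_ms=2000, rxmit_ms=5000)

print(slow_acks)   # 3000
print(hybrid)      # 0
print(long_timer)  # 0
```

In this model every retransmission is spurious, because nothing is ever lost. It disappears either when the retransmit interval exceeds the PSNP interval, which is the rule of thumb raised in the chat, or when the receiver acknowledges in small batches.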