LSR IETF 109
Chairs: Acee Lindem, Chris Hopps
Secretary: Yingzhen Qu
Date: Monday, 16 November 2020
Time: 12:00 – 14:00 ICT
WG Page: http://tools.ietf.org/wg/lsr/
Materials: https://datatracker.ietf.org/meeting/109/session/lsr
Note taking: https://codimd.ietf.org/notes-ietf-109-lsr
Meetecho: https://meetings.conf.meetecho.com/ietf109/?group=lsr&short=&item=1
Jabber: http://jabber.ietf.org/logs/lsr
iCalendar: https://datatracker.ietf.org/meeting/109/sessions/lsr.ics

* Administrivia and WG update 12:00 - 15m Acee/Chris

* Effect of Parameter Regulation on ISIS flooding Experimental results 12:15 - 15m Gang Yan
Les Ginsberg: Did you do any adjustment to the PSNP response time? There was a presentation last time from Sarah and Tony that indicated that it made a big difference.
Gang: We set it to 800ms instead of the standard 5s, a small enough value, and we attempted to send an ack ASAP.
Les: Second quick question: with the receive window that you've talked about, what you're actually doing is just monitoring the local retransmission queue for each interface, correct?
Gang: Yes. We discussed this internally. The unacknowledged queue size is one condition from which we can get some feedback from the receiver side. Sometimes we don't have enough information to control it. For some interfaces, we can only check the queue size. Sometimes we need feedback from the receiver side.
Les: Just wanted to clarify what you did in your tests, thank you.
Bruno Decraene: Similar to Les: when you do the test with a static window, it could be useful to acknowledge PSNP faster, as illustrated by Sarah and Tony.
Gyan Mishra: Was the test done with simulation or hardware?
Gang: We will try to run more tests, with more links and complex scenarios. We tested on Huawei hardware.
Xuesong Geng (from chat): Some supplemental info for the question from Les about the receive window: the length of the queue is local information, but the receive window is the threshold of the queue, which could be configured from local information or from information provided by the receiver.

* Using Flex-Algo for SR based VTN 12:30 - 10m Jie Dong
https://datatracker.ietf.org/doc/draft-zhu-lsr-isis-sr-vtn-flexalgo/
Acee: Speaking as chair, can you start more discussion on this draft to have people read it and socialize it? I read it, but the need for the V flag is not clear to me.
Jie: We will start to have more discussion on the mailing list, and then have the adoption started on the list?
Peter: I commented earlier on the way the total constraints are being used. Basically, a bundle must advertise the summary of all the included affinities of the individual members; you can't use an exclude, or combine exclude with include, because that would break. So, if this is what the authors had planned to do, then the restriction should be clearly described in the draft.
Jie: We had a discussion on the list, we'll add it to the next version of the draft.
Acee: Let's take the discussion to the list.

* Using IS-IS MT for SR based Virtual Transport Network 12:40 - 10m Chongfeng Xie
https://datatracker.ietf.org/doc/draft-xie-lsr-isis-sr-vtn-mt/
Acee: This one is more straightforward. We can put it in the queue for WG adoption; the key issue is whether we need it. I think it's good to have an informational draft.
Chris: Let's do a raise hands. Pretty good support, so we can start the adoption call soon on the list.

* OSPF Transport Instance 12:50 - 10m Acee Lindem
https://datatracker.ietf.org/doc/draft-acee-lsr-ospf-transport-instance/
Aijun: How about using a top-level TLV in the router LSA so we don't have to use a separate instance?
Acee: It depends on what information you want to put in the router LSA.
Aijun: You mentioned some information to the server and it's similar to Linda's draft.
I think trying to put the information in an independent TLV might be simpler.
Chris: I think that this should be discussed on the list for sure, especially in the context of some of the presentations today. It's coming up later and I think you're referring to it.
Les: I didn't realize until just now that you had revived this. RFC 6823 introduced an application identifier, with a registry, to control the assignment for the applications. Do you think that concept has any relevance here?
Acee: Yeah, we will look at that.
Les: Maybe we could, we'd have to change things a bit, but maybe we could share the registry between the two IGPs.
Zhenbin Li: I have a similar concern to Aijun. From my point of view, you introduce that instance, and this will add complexity for network evolution, because it may introduce more configuration. We know that BGP always uses the same session to support different services. So from my point of view, maybe we can also create a similar mechanism for better network evolution, and also reduce the possible cost to the IGP.
Gyan: This feature can work, I guess. Would it work, let's say, in conjunction with the layering concept where you can divide multiple instances? Let's say if you had 5000, maybe you could leverage flex-algo in conjunction, I guess.
Acee: This is about distributing information using OSPF. The transport instance is for information that isn't for routing.
Chris: Direct analogy to RFC 6823.

* IGP FlexAlgo in IP Networks 13:00 - 20m Parag Kaneriya
https://datatracker.ietf.org/doc/draft-bonica-lsr-ip-flexalgo/
Acee: Why did you define a new top-level TLV for the OSPF extended prefix LSA? Why not use the existing one? I'm not saying it's wrong, I'm just wondering about the reasoning.
Parag: Is that specific to algo 0?
Peter: The existing one would advertise algo 0 reachability.
Gyan: How will this inter-op with, say, SRv6? Is there any benefit?
Parag: It's independent of SR MPLS and SRv6.
Maybe the operator doesn't want to run SR, or they don't have the hardware capability; they can use this software-based programming into hardware and can achieve the same level of data plane separation for network slicing.
Ron Bonica: I'll add to that. So long as the addresses that you bind to flex algo are different from the locators that you bind to flex algo, you should be able to run them both at once. But the question is, why would you ever run them both at once?
Gyan: Just for corner cases.
Chris: I thought this is an alternative.
Ron: That's why I'm asking why you would ever run them both at once.
Peter Psenak: Nothing is preventing you from running both.
Bruno: What you're trying to do is already available in SRv6 flex algo. Why is there a different TLV for SRv6 and IPv6 flex algo?
Parag: SRv6 has more sub-TLVs which are not applicable here. So we can't use the same TLV for IPv6.
Peter: Yeah, in theory, one could do that and use the locator TLV without the SIDs. That is not prohibited at the moment so one can do it. I don't think, from an encoding perspective, that would really be something that I would prefer. But theoretically, you can advertise the locator without any SID and treat this as an IPv6 prefix. Whether we want to do it or not is another matter. I would rather not; I would prefer to have a clean encoding and not mix things.
Shraddha: Locators are not 128 bits, here it is.
Peter: In theory, it has prefix and mask, it can be. In theory, one can do that.
Bruno: In theory, some implementations would not support it?
Peter: Basically what you're asking is why we are not using the locator TLV. One can do it but I don't think we want to do it.
Tony Li: I just noticed you published the 01 version. Can you summarize the changes?
Parag: We added an MT-ID of 12 bits to reduce the number of top-level TLVs. Plus editorial changes.
Jie: It's a new application to use IP flex-algo. Can different applications use the same flex-algo ID?
If so, can the computation results be shared?
Parag: No. It's not good to use the same flex-algo, there is a forwarding conflict.
Peter: You need to run a separate calculation for each application; nothing stops you from using the same flex-algo ID.
Jie: The computation for different applications will be different.
Peter: There is a section in the draft that describes that.
Ketan: I support using a new TLV for IPv6, not the same one as SRv6. It's better to keep them separate.
Acee: This generated lots of interest. I think we can do adoption based on interest.
Chris H: Not enough time to show hands. Let's take it to the list.
Acee: I looked again and agree that a separate TLV is good for OSPF.

* IGP Extensions for Advertising Hop-by-Hop Options Header Processing Action 13:20 - 10m Yali Wang
https://datatracker.ietf.org/doc/draft-wang-lsr-hbh-process/
Chris: This reminds me of the IFIT draft I had issues with. I noticed the latest IFIT draft is now using the information for path computation. One of the complaints before was that this was the sort of capability that had to do with OAM, and not routing. So it wasn't appropriate to put it into the routing protocol. Are you going to use this to determine routes?
Yali: We want to mention that the motivation is to advertise the HBH processing capability, which is not the same as IFIT. The node capability of supporting the HBH header may impact path calculation, so we need to know it. If we have services requiring these HBH capabilities, we need to know which nodes or links can support them.
Chris: As an operator, if I want to do OAM, do I want to upgrade my line card? I'm not sure I believe the use case. I don't want my network traffic being determined by whether I've got an updated line card. It feels funky.
Acee: These are in-band OAM techniques?
Yali: Yes.
Acee: And you're going to only compute paths that support in-band OAM, so this is similar in function to the previous draft.
Yali: There are more use cases using the HBH header.
Ron Bonica: If the only application is path computation, can't you just change link color?
Chris: Yes. That's what I thought. I could just mark my interfaces that support this and then I'm done.
Jie Dong: To my understanding, it's to advertise some generic HBH processing capability. There are other use cases; some traffic needs this. We have another draft in 6man related to this which may need this. So this can be a generic mechanism. Some applications would select routes with this capability.
Chris: Note that my previous comments on Yali's presentation were as WG member.

* Passive Interface Attribute 13:30 - 10m Aijun Wang
https://datatracker.ietf.org/doc/draft-wang-lsr-passive-interface-attribute/
Chris: I'm curious about some of the use cases. It seemed like on the mailing list that there was one use case where it was inter-AS, and there was some discussion that this was already covered by a TLV. And then there's a reference to Linda Dunbar's draft, which I went and read, which was about using passive interfaces to describe load-balancing on servers with anycast addresses. And in that case, why wouldn't you just advertise the anycast address as a prefix and attach the attributes to that prefix? Why would you call a server a link? It seems weird to be using a link in these cases when, if it's inter-AS, there's already something for that, and if it's servers, then why not use the prefix? Thanks.
Aijun: The proposal by Linda is connected to a stub link, so I think it's suitable to put them in the passive interface attribute.
Chris: My point is I think that draft is wrong. It is describing a server and an anycast address, and then trying to attach the information for it to a link. Why not just do a prefix for the server? We can discuss it on the list.
Acee: With a generic way of advertising a stub link, you're using the fact that it's passive to make inferences for these use cases.
Why wouldn't you just advertise this as an inter-AS boundary or something like that, or whatever your use case is? We can discuss this on the list. I'll review the update.

* Prefix Unreachable Announcement 13:40 - 10m Gyan Mishra/Aijun Wang
https://datatracker.ietf.org/doc/draft-wang-lsr-prefix-unreachable-annoucement/
Chris: Speaking as WG member. When I was thinking about this, a lot of people seemed to have complaints about the mechanism for withdrawing the unreachable, and using timers, compared to how RIFT had a more deterministic way to deal with it. It really made me think whether there was a way to attach the unreachable to the summary route; then you would just re-advertise the summary route with or without the unreachable, the holes basically. And that way, you don't have that problem. I don't know if you're able to do it like that in OSPF.
Gyan: Right. So with this, the prefix unreachable advertisement would only be present when a component is down. So when that component comes up the unreachable stops immediately, so it wouldn't stay. It would help the propagation. But when the summary is advertised, it would be the summary plus this component attached to the summary route. So when the receivers, I guess on the other side, say on the left side, receive the summary, they'll see the routes, the components that are down, and it'll automatically program it in: program it into a FIB to drive traffic and not send any traffic to that summary route.
Chris: If you have a sub-TLV, then you can withdraw it by just changing the summary though.
Acee: Speaking as WG member. I was gonna say there's a lot of complexity here for the different use cases, and it looks like you even added some additional use cases in your slides, and the interaction with the RIB is going to be quite complicated. You should describe what you'd expect to be installed in these cases, and how it's going to work. I think that needs to be specified.
Gyan: It was the convergence and, let's say, if you have a false positive or false negative. I think with timers, because you're impacting the data plane, it's supposed to make the convergence better, not worse. So we want to make sure that, whatever the timers, when services come back, it would be flooded as part of the summary. So when the service comes back up, it's immediately flooded and removed. So you converge immediately, as it is part of the same summary advertisement, the same sub-TLV. So when the service is back, it comes back. When it goes down, the data plane converges immediately, really improving the data plane convergence. On the mailing list use case with BGP, in this drawing we actually have multiple areas, but if you had a single area, there is that issue with the next hop that you're tied to, because you're doing that next hop rewrite, because that's hierarchical. It depends on the loopback not going down; that's something that has existed for a long time. So this thing would make the next hop converge immediately, and I think that would be a huge gain for any service provider or operator. That's probably the biggest one. On the timers and converging the data plane, when services come up, we just have to make sure that the entry that's tied to the prefix is actually gone. With this new TLV, it would age out quickly; it would be immediately flooded as soon as the service comes up. Boom, that blackhole route is gone. The unreachable is gone. As soon as the component is down, the next hop goes away, then immediately the data plane converges. It would really have to be immediate. It can't be lingering. That's really critical, because we don't want to make things worse. We want to make it better and improve convergence.
Chris: I think this probably needs a lot more discussion.
Acee: It feels like the different use cases have different RIB behaviors. Mixing them is confusing as well.
Gyan: It is. There's some stuff I think we have to really clarify. But I agree there is a mix of use cases. We have to work on cleaning that up.
Tony Li: I have a concern about scalability. It seems to me like if you had a significant failure, you can use this to inject a whole lot of stuff into an IGP relatively suddenly. That seems like an unintentional stress test.
Gyan: Yes, that's a good point. But I think that is something that we will control. There is a timer that we would have. We have a variable, based on the idea that it would be user configurable, based on the number of component prefixes that are down versus the number of routes. So, how many negative components you have, and how many services are up, and it would be something configurable. By default, as long as you have one match, and there's one component that's up, the summary always gets out. So you could keep it the same or you could change it depending on the use case, if you don't want it to change and you want to always advertise a summary no matter what. Let's say you have 1000 component prefixes, but as long as one component is up you always advertise. And that's something that you could change, and you could say, well, make it like 50%: if 50% of your components are down, really, you shouldn't be sending out the summary maybe at that point. Hope that answers the question. Does anyone have any other questions or comments? We can pick it up, I guess, on the mailing list.
Chris: We should probably wrap up.
Acee: We will continue this discussion on the list.
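
Illustration (not from the draft or the slides): a minimal Python sketch of the configurable summary threshold Gyan described in his answer to Tony, under stated assumptions. The function name `should_advertise_summary` and the parameter `max_down_fraction` are hypothetical and appear neither in the draft nor in any implementation. The default models "advertise as long as one component is up"; the optional fraction models the "50% down" example, with the assumption that the summary is suppressed once the down fraction reaches the configured value.

```python
from typing import Optional


def should_advertise_summary(
    total_components: int,
    down_components: int,
    max_down_fraction: Optional[float] = None,
) -> bool:
    """Decide whether a summary route should still be advertised.

    Hypothetical helper for illustration only.
    - Default (max_down_fraction is None): advertise while at least one
      component prefix is up.
    - Otherwise: additionally suppress the summary once the fraction of
      down components reaches max_down_fraction (assumed semantics).
    """
    if total_components <= 0:
        raise ValueError("total_components must be positive")
    if not 0 <= down_components <= total_components:
        raise ValueError("down_components must be within 0..total_components")

    up_components = total_components - down_components
    if up_components == 0:
        # Nothing behind the summary is reachable.
        return False
    if max_down_fraction is None:
        # Default behaviour described in the discussion: one component up is enough.
        return True
    return (down_components / total_components) < max_down_fraction


if __name__ == "__main__":
    # 1000 component prefixes, 999 down, default policy: still advertise.
    print(should_advertise_summary(1000, 999))
    # Same network with a 50% threshold configured: stop advertising.
    print(should_advertise_summary(1000, 999, max_down_fraction=0.5))
```

How unreachable announcements for the individual down components interact with this decision, and what the RIB installs in each case, is left open here, as it was in the discussion.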