Minutes IETF109: rtgwg
minutes-109-rtgwg-00
Meeting Minutes | Routing Area Working Group (rtgwg) WG | |
---|---|---|
Date and time | 2020-11-17 05:00 | |
Title | Minutes IETF109: rtgwg | |
State | Active | |
Last updated | 2020-11-24 |
IETF 109 RTGWG

Chairs: Chris Bowers, Jeff Tantsura
Secretary: Yingzhen Qu

Date: Tuesday, 17 Nov 2020
Time: 12:00 – 14:00 ICT
Meetecho: https://meetings.conf.meetecho.com/ietf109/?group=rtgwg&short=&item=1

1 12:00 10m Administrivia and WG update
Jeff/Chris

Stephane: Regarding the TI-LFA draft, we have reduced the number of authors to comply with IETF rules. I think it is ready for WG LC. Can we add it to the LC queue?
Jeff: We'll handle your request and review your update.

2 12:10 15m Multilevel configuration
https://datatracker.ietf.org/doc/draft-bogdanovic-multilevel-configuration/
Dean Bogdanovic

Tony Li: How do you see it fitting with the hierarchical architecture in NETCONF or OpenConfig?
Dean: This is an area that requires a lot of work. I don't have a good answer yet.
Jeff: This is a large topic, and it's going to affect the models that we've been working on for years.

3 12:25 10m SRv6 Midpoint Protection
https://datatracker.ietf.org/doc/draft-chen-rtgwg-srv6-midpoint-protection/
Xuesong Geng

Parag Kaneriya: How do you differentiate the condition where the route is not present from link down? Is it that the IPv6 prefix is not advertised, or that the link is down?
Xuesong: On slide 2, there are three stages. When the IGP has already converged and we can't find the route anymore, some upstream nodes will do the proxy forwarding. If there is no IGP convergence yet, node B, which is adjacent to the failed node, will know that something went wrong and will do the proxy forwarding.
Parag: What about the function that is supposed to be executed at node E?
Xuesong: You mean the function should be limited to some nodes?
Parag: In SRv6, some function may be executed at node E, maybe accounting or rewrite?
Xuesong: So you're asking what happens if some functions end at node E. Node B can only do the forwarding function.
Parag: Can we use TI-LFA to avoid this situation?
Xuesong: The document discusses different cases for the type of B. There are different scenarios.
Parag: The idea would be to use a backup path to reach E via TI-LFA.
Xuesong: Actually, it skips the failed node E.
Jeff T: We're not converging. You're asking to use TI-LFA; please take your question to the list.
Stewart: I'm afraid my question is very similar. When you set up a segment routing path, you set that path up for a reason: either you want traffic to go to a particular node for a particular reason, or you want to protect other paths from being overloaded by that traffic. So with all the repairs and alternatives, I get really worried that you're going to bust the original reason for creating the SR path. I would imagine the only node that could actually reroute a path is the head-end node, which has the whole network to choose from, as opposed to a node in the middle, which could be busting someone else's SLA.
Xuesong: I think that is a very good question. I can respond in two parts. First, if some node cannot be bypassed, we have mechanisms that are already referenced in the document. Maybe some node, like node E, acts as the firewall for the whole path, so it can't be bypassed. We can, for example, extend the IGP to advertise "I cannot be bypassed, so do not bypass me". That is the first solution. Second, the motivation for this solution is just to give a method that supplements the TI-LFA mechanisms. If something goes wrong at the endpoint, we can have a method to bypass the failed node and go to the next endpoint, so that at least the packets will not be lost and the flow will not be interrupted. That is the motivation here. So I think the mechanism itself is reasonable and can solve some problems.
Stewart: If you can show you're not going to bust something else, then fine. But as I already mentioned, it's quite a complicated problem to make sure you aren't going to bust some carefully engineered traffic plan.
Zhenbin Li: Protecting the function at E is a different scenario. Second, the confusion comes from mixing this with TI-LFA. This is for SR traffic engineering: when E fails, we need B to proxy forward the traffic. This can be explained in detail on the mailing list.
Jeff T: The motivation is not clear in the draft. It's difficult to understand, so it is not good for WG adoption. I have a question about the experimental track. You don't define any new extension; you kind of define possible behavior. So informational would probably be a better track than experimental.
Xuesong: Actually, there is some new behavior defined for repair nodes. For example, to do the proxy forwarding we define some SRv6 functions. So there will be some new functions defined in this document.
Jeff T: I'm looking forward to reading the updated version of the draft.
Bruno: This is about terminating a segment. Let's have some discussions in SPRING. We've discussed with Jeff that there are multiple documents about terminating a TE path or a tunnel, and what you do with a VPN, etc.
Jeff T: Please post your questions to the list.

From chat:
Ketan Talaulikar: @ Parag - this proposal is somewhat similar to draft-ietf-spring-node-protection-for-sr-te-paths?
Ketan Talaulikar: @ Stewart - this was exactly the discussion that happened around the adoption of draft-ietf-spring-node-protection-for-sr-te-paths

4 12:35 10m Egress Protection for Segment Routing Networks
https://datatracker.ietf.org/doc/draft-hegde-rtgwg-egress-protection-sr-networks/
Shraddha Hegde

Stephane Litkowski: Could you elaborate more on how you are allocating the service labels from the SID?
Shraddha: Yes, that's right.
Stephane: So you can't make it work without repair label allocation.
Shraddha: That's too much overhead.
Stephane: You may want to clarify it in the draft.
Shraddha: Yes, I will.
Peter: I have two comments. First, similar to what Stephane was asking, I can see this working with preallocation. If we go to the very next hop, it becomes way more complex to synchronize the SIDs or labels between these, especially if you have multiple PEs, not just two but even more. Second, this looks more like a deployment description. It's not really standardizing any protocol extension, and people always ask whether these things should really be published as a draft. So, that's my comment.
Shraddha: Yes, there are other solutions floating around. This is one way to deploy it. We consider it a good informational document. We got some feedback that allocating the labels statically is a little complicated, so we are also working to see how we can automate that. That will most likely require protocol extensions.
Peter: Absolutely. That's definitely something to look at because, as I said, it can become very complicated. It sounds easy if you have two PEs and use the same label or SID. If you do an allocation other than per-VRF, it can really get more complicated.
Greg: Have you considered double failures?
Shraddha: With multiple failures, there's always a possibility that traffic gets dropped or undergoes micro-loops. That applies to TI-LFA in general as well. There is nothing special or different for egress protection.
Jeff T: FRR never claims to protect against double or triple failures; it's a formal statement. Please send your question to the list.
Xuesong: I have a concern about deployment complexity because of the required static assignment of labels. The second point is that SRm6 is still under discussion, so it may not be proper to reference it here.
Jeff T: Points taken.

5 12:45 15m Dynamic-Anycast in Compute First Networking use cases and Problem Statement
https://datatracker.ietf.org/doc/draft-geng-rtgwg-cfn-dyncast-ps-usecase/
Architecture of Dynamic-Anycast in Compute First Networking
https://datatracker.ietf.org/doc/draft-li-rtgwg-cfn-dyncast-architecture/
Yizhou Li

Jeff T: We're not going to take questions here. Please send your questions to Yizhou.

6 13:00 15m The Problem Statement for Precise Transport Networking
https://datatracker.ietf.org/doc/draft-xiong-rtgwg-precise-tn-problem-statement/
The Requirements for Precise Transport Networking
https://datatracker.ietf.org/doc/draft-xiong-rtgwg-precise-tn-requirements/
Daniel Huang

7 13:15 15m BFD for Multi-point Networks and VRRP Use Case
https://datatracker.ietf.org/doc/draft-mirsky-bfd-p2mp-vrrp-use-case/
BFD on Multi-Chassis Link Aggregation Group (MC-LAG) Interfaces
https://datatracker.ietf.org/doc/draft-mtm-rtgwg-bfd-mc-lag/
Greg Mirsky

8 13:30 20m Extension of Transport Aware Mobility in Data Network
https://datatracker.ietf.org/doc/draft-mcd-rtgwg-extension-tn-aware-mobility/
Kausik Majumdar

Dhruv Dhody: We were discussing this on chat. Flowspec would be a much better choice. It would provide much more granularity for steering, and we already have a mechanism in both BGP and PCEP to do generic flow specification. So why did you not use that, and instead define a new 5G metadata sub-TLV?
Kausik: You mean like the BGP SR policy case?
Dhruv: In BGP and PCEP, we have a flow specification, where you can have more granular control in defining which flows need to go into an SR policy or even into any PCEP-based tunnel. So we have much better techniques, in my opinion. Were you aware of that?
Kausik: I'm aware of flowspec. This is a generic approach. There is already a SAFI session, and we need the metadata. Whether we do it with a SAFI session or with flowspec, I don't see huge advantages either way. But we can discuss more.
Dhruv: Let's take this offline.
Uma: Flowspec can definitely be used, and it would give a much more granular specification in the combined UPF and PE scenario. But here that granularity is not required; rather, the aim is to maintain the end-to-end SLA. Across the mobility domain and the Internet domain, if you want security or one of their 30 characteristics, you want to maintain that, and they want to maintain the same characteristics that are used in the mobile domain, which is resource based. So, to me, you can actually use both ways.
Jeff T: So you're limited by the UDP port space. Why don't we look at the TEID in the GTP header next? It would give you much better context.
Uma: This is discussed in the DMM draft. I think the problem is that the TEID is a scalar. It doesn't represent the properties we are seeking, and it is more loaded with 5G control plane characteristics, so we don't want to touch it. That's the reason. We did a calculation of how many slices can be supported in a typical network; that's in the DMM draft. We found that for most practical deployments the UDP source port ranges are good enough. The problem is that we are limited by what you can put in the packet, and we don't want to enhance too many things in the data plane, so that's why we are limited to the UDP source port. It's a kind of middle-ground approach that we took.

From chat:
Dhruv Dhody: Can't this be done by flowspec?
Stephane Litkowski: Right, this is also my question :) Flowspec will also provide the full granularity for steering
Jeffrey Haas: Flowspec is usually implemented as a firewall rather than destination based lookups. Different hardware paths. Regardless of whether you're looking at encoding stuff fancier than just destination based in flowspec or tunnel encaps, that's the penalty you pay.
Tony Przygienda: no free lunch. more complex lookup, more gates, more energy, will cost more. as a side note, isn't that what we have flow-id in v6 for?
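
Editor's illustration (not part of the meeting record): the UDP source port approach Uma describes for item 8 amounts to a range lookup on the outer UDP header. The sketch below is a minimal example under stated assumptions. The port ranges, the class names, and the `classify` helper are all hypothetical and are not taken from the draft or from the DMM draft mentioned above.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class PortRange(NamedTuple):
    first: int            # lowest UDP source port in the range (inclusive)
    last: int             # highest UDP source port in the range (inclusive)
    transport_class: str  # hypothetical name for the transport characteristic


# Hypothetical, non-overlapping ranges sorted by their first port.
PORT_RANGES = [
    PortRange(49152, 53247, "low-latency"),
    PortRange(53248, 57343, "high-bandwidth"),
    PortRange(57344, 61439, "best-effort"),
]
_STARTS = [r.first for r in PORT_RANGES]


def classify(src_port: int) -> Optional[str]:
    """Return the transport class for a UDP source port, or None if unmapped."""
    # Find the last range whose first port is <= src_port.
    i = bisect_right(_STARTS, src_port) - 1
    if i < 0:
        return None
    r = PORT_RANGES[i]
    return r.transport_class if src_port <= r.last else None
```

A lookup of this kind only needs the source port field, which is the middle ground described in the discussion: coarser than a flowspec match, but with nothing added to the data plane beyond the port range convention.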

Date: Wednesday, 18 Nov 2020
Time: 14:30 – 15:30 ICT
Meetecho: https://meetings.conf.meetecho.com/ietf109/?group=rtgwg&short=&item=2

0 14:30 5m Administrivia
Jeff/Chris

1 14:35 10m Protocol Assisted Protocol (PAP)
https://datatracker.ietf.org/doc/draft-li-rtgwg-protocol-assisted-protocol/
Shuanglong Chen

Xiang Ji: First question: how do we address the IP path to run this protocol? Second question: for performance, instead of TCP or UDP, have you considered SCTP?
Zhenbin: The first one depends on configuration; reachability depends on the forwarding table. For the second, we only considered TCP and UDP. We will consider your suggestion.
Randy Bush: I like the separation. I think it's architecturally reasonable, but not MD5; pick something from this millennium.
Zhenbin: We will consider more options.
Jeff T: I would really advise the authors to articulate who the end consumer of the data is, because networking devices don't usually do anything with this kind of information. If the consumer of the data is the NMS, the real question is why not use the north-south communication we already have, such as gRPC. I'd really like to see a better justification for doing it horizontally rather than vertically, hopefully on the list.
Zhenbin: Thanks for your comments. We talked about the reason for introducing east-west network monitoring. This should be a lightweight protocol.
Randy Bush: Back to the discussion in IDR: the protocol has to have this extract mechanism. That's what I meant when I said I kind of like it architecturally.

2 14:45 20m GRASP
https://datatracker.ietf.org/doc/draft-ietf-anima-grasp/
Toerless Eckert

Zhenbin: Thank you for the information. This is a very valuable reference for PAP. My understanding is that GRASP is based on TCP, and I have a concern about resource consumption. If PAP is used to locate BGP errors, we may need full-mesh TCP sessions, which is very challenging.
Full-mesh BGP peering is already a challenge for operators, and PAP is only for network monitoring. That's my concern.
Toerless: GRASP doesn't really specify the underlying protocol. It can be used over TLS, QUIC, or any secure transport protocol standardized in the IETF. You can also use UDP or TCP; you just have to consider the security mechanisms.
Brian Carpenter: The prototype code was done by me; it's not production code. It is based on TCP or TLS at the moment, but I looked at what would be involved in switching to UDP. The messages could of course perfectly well be wrapped up in UDP packets and sent, but my impression was that the overhead that TCP allegedly adds, I would have to add back into the code, just to make sure that UDP messages are delivered to the right place. You would also have to put in more recovery logic, because UDP doesn't do it for you. So I'm not convinced that there's an efficiency argument against using a transport layer.
Toerless: Through the years in ANIMA we've seen a lot of opinions about the transport layer, so I think the prudent thing is to define the message protocol independently of the transport and the security, as we've done in GRASP, and then basically let the chips fall, or let the IETF, or whatever the wisdom is, decide. If we had, for example, more IoT people in here, they would say that even lower overhead, on things that are based on UDP, is really crucial for very low-end devices. That's why they made CoAP avoid TCP; it is based on UDP. So I don't think there is an industry-wide common understanding of what the best transport is. That's why we did GRASP the way we did it.
Zhenbin: Thanks for the information. We'll consider using GRASP for PAP.
Toerless: I'd like you to provide some detailed examples, such as BGP, for automation. With ANIMA, we have security mechanisms, and we'd be happy to build those examples. If we have this framework, we'd be happy to have routing people using it.
Zhenbin: Happy to do it.
Chris: That seems like a pretty good way to bootstrap the use cases. We could spend a year or two discussing things for PAP, but instead it makes sense to try these use cases with the existing code. It seems like you would get a lot of benefit from a reliable transport. To begin with, you wouldn't have to worry about a lot of protocol stuff if you could just be sure that messages were being received. That would be a pretty good way to get this moving a lot faster with solid, convincing use cases.
Toerless: I can't remember the name, but there was this one effort to try to unify and build better common security for the different routing protocols, and I think that ended in pretty much nothing. If people also think that securing the routing protocols is still something the industry struggles with, then, besides the other operational examples given in the PAP draft, I think the securing might be something that could most easily be done upfront.
Jeff T: Thanks, Toerless, for the presentation. Looking forward to more collaboration.

3 15:05 20m xBGP: When You Can't Wait for the IETF and Vendors
https://dl.acm.org/doi/10.1145/3422604.3425952
https://pluginized-protocols.org/
Olivier Bonaventure

Jeff Haas: The nice output of this presentation is that, yes, plugins can be done to allow code to be hooked in externally; the Linux kernel is an excellent example of that. The challenge that I think is not really covered in the paper, or at least in this presentation of the paper, is that most of the headache for BGP is really incremental deployment. Writing code isn't all that bad. Getting code that can run in implementations that are scattered around the Internet and that follows BGP rules in terms of validation and such, that's the main challenge. So I agree that having plugin frameworks is actually a very useful thing, but I think that's not necessarily the hard problem here.
Olivier: But I think there are benefits in being able to deploy an extension inside an AS, so for iBGP only. Doing it over external BGP is much more difficult, but for iBGP I think it makes sense, and it could address some use cases for network operators that have specific requirements. It could also allow network operators or network designers to develop new extensions before they are discussed within the IETF. So you could get information, a prototype, and running code that could be discussed within the IETF without having to change complex implementations or do a full implementation.
Jeff Haas: Understood. I'll leave my comments at this: I agree with you that doing this is not hard. If you contain what I'd call the blast radius of this problem to something that's strictly internal, you're right that this is not too much of a problem. But the blast radius is really the difficult discussion for incremental deployment. It's very common for features in development to have unfortunate side effects far away from where they are deployed. I will cite the attribute 128 issue that many people are familiar with as an example of a feature that had two different versions and caused large network outages. My suggestion is that perhaps part of the conversation, if you're going to talk at least about BGP, is whether or not BGP should basically be leveraged as a side protocol for different purposes, separating the inter-domain case from your local case. That's it. Thank you.
Jared Mauch: I think this is interesting, but I have a lot of concerns here, specifically around deployability and usability testing. If we were to do this, similar to what Jeff was talking about with the attribute 128 issue, for which I'm still trying to get the config out of our many network devices, we have a variety of code bases that we still run.
I'm concerned about what happens when we deploy these plugins and they have different results across different vendors, and how that kind of bug ecosystem interacts. It's one thing to specify a protocol and a method for transporting it, but it's a whole other thing to discuss what the operational impact is. This is similar to what Jeff commented about internal use cases versus external use cases. We have a lot of internal use cases where we transport BGP data and signal it around our internal infrastructure, and those use cases tend not to align with what shows up in the public Internet. A lot of what happens is that people have much more fine-grained signaling that they want to do for geolocation than what you actually want to announce in the routing table. As a result, I'm very concerned about something like this and what happens to the operational use case when we have a lot of better things: if we want to signal geofeed data or something, there is the geo RFC that was recently published. There was discussion last week about how to potentially authenticate and sign that data to provide more granular information than what you actually want to publicly expose in BGP. It's definitely an interesting idea, but I really fear the unintended side effects that this is going to introduce and the instability that it'll create as a result.
Olivier: There are two different parts in the presentation, and there are two different elements. The first one is the ability to consider a routing protocol as a kind of microkernel operating system that exposes an API and allows you to extend it. This is generic, although we implemented it in BGP because it was easy for us to extend BGP and to do tests with BGP.
We believe that it can apply to any routing protocol, and that's why we discussed with Jeff about presenting it here: this is an idea that is generic and would be applicable to any routing protocol. I agree with you that doing that over an external BGP session and over the public Internet would be dangerous and would need much more care. This is a research prototype, which is intended to show that you can extend a routing protocol and that you can view the routing protocol in a different way. For the BGP extensions and the BGP use case, we are focused on internal BGP issues. We don't consider external BGP as a possible solution right now.
Rudiger Volk: I love this view and this way of taking up the definition of manipulation. When you provide agility to complex systems, you introduce fragility, and you need to control it very carefully. I see Jeff's points; however, I note that in industrial production systems we usually already have some control points. In BGP, it's quite clear that the policy definitions are available to operators. My previous experience with policy configurations showed me that evolving the definitions of policy in a network is something that points exactly to the control problem that Jeff was mentioning. BTW, this is about communities being used in the Internet. I see an opening with your approach to actually allow users to do much nicer and better controlled stuff for extending functionality, far beyond what's been done so far with policy languages. With great power comes great responsibility; it should not be abused. Nevertheless, the agility is much needed.
Olivier: Let me try to answer some of your concerns. You mentioned that operators do a lot with BGP communities to implement their policies. Basically, BGP communities are an ad hoc solution, so you have to play with route filters, with access lists, with route maps, and stuff like that to be able to implement the policies that operators want.
On the other side, by using xBGP you have a standard programming language in our prototype, but this could be another language. When you have a standard programming language, it's also possible to use software validation techniques, where you can specify the properties that the plugin must satisfy to be acceptable to a router. So, as you can see, we don't want a plugin that runs forever, but we can have properties that are defined formally, and we can design tools, based on the progress in software verification, that will verify that the plugin is behaving correctly according to the appropriate properties of the underlying BGP. That's something you can do if you have a much more expressive language than the access lists and route maps that we have today. And we can leverage lots of advances in software verification from recent years to verify that.
Eduard V: Thanks for your presentation. I have one concern. By itself it is a good idea, but unfortunately, in general terms, worldwide multi-vendor interoperability is more important, and this cannot be traded off against multi-vendor interoperability. The API should be exactly the same for all implementations. On the previous slide, if you've seen that prototype implementation, in one case you have 400 API calls and in the other 600 API calls. This is a very typical alarm, a big alarm, because it looks like you don't have a mature, stable API that you could really demand from all vendors, and it means there will be no multi-vendor interoperability. This particular effort will not be accepted by the market. Therefore, from my point of view, you could mitigate the multi-vendor problem. If you clearly specify some basic features or basic API calls and make them mandatory (without being very rigid about the API as a whole), then it's potentially possible to progress.
Olivier: So just to answer your comments.
We have a very simple API, which is available on the website. The slide that Jeff is showing here shows the number of lines of code that we had to change to be able to implement the API, so this is not the number of API calls. We have about 10 different functions in the API, so the API is very simple. The slide shows only the number of lines of code that we had to change in the BGP implementation to be able to support the API, which means that, in fact, the API was already part of the FRRouting and BIRD implementations, and we did not have to change that much to be able to support the abstract API that I mentioned earlier with the workflow.
Jeff T: Thank you, Olivier, for the presentation. I'm looking forward to hosting you anytime. Thank you everyone for joining us tonight.

Some chat history:
Eduard V: ANIMA presentation is very good - easy to understand even for new people. Thanks to Toerless.
John Scudder: There are some liabilities with reliable transports, but the benefits usually vastly outweigh them.
Brian Carpenter: https://github.com/becarpenter/graspy for the code etc
Jeffrey Haas: audio not working
Jeffrey Haas: the effort was called karp
John Scudder: ^ routing protocol security, that is
Jeffrey Haas: the issue with karp was bootstrapping often required getting components of ip working before you could get the job of security or routing done.
Joel Halpern: Arguably, the issue with karp was that folks did not see enough return (value) for the effort. So they stopped.
John Scudder: An interesting metric of protocol complexity. If we are to believe it, it's actually encouraging it only is increasing linearly.
Jeffrey Haas: karp had many issues. but it was helpful in highlighting the bootstrapping issues. :-)
Joel Halpern: I grant the bootstrap part was (and is) non-trivial. BRSKI has a lot of moving parts to get around that.
Jeffrey Haas: my point, joel.
Brian Carpenter: Yes.
I waited so long for BRSKI that I added something called Quick And Dirty Security to the GRASP prototype.
Jeffrey Haas: Anyone care to start the betting pool for first outage from the plugins? RIPE had a record for a while on those for a bit
Jared Mauch: I had an unknown BGP attribute finder that I ran for awhile when there was the large round of leaks of them, around the timeframe that the 128 attribute issue occurred. Getting code upgraded on devices always takes longer than expected.
Jeffrey Haas: new code, faster, more agile, is not the problem. not blowing up the internet is. :-) that said, standardizing ways to interact with bgp pipelines is good.
Jared Mauch: I would be interested in improving the way the specs are written and how you do the capability signaling, etc.. vs bgp4+++++++++++++ but the path to transition would be difficult.
Tony Li: There are enough open source implementations so that if someone wants to blow up BGP, they don't need plugins.
Jeffrey Haas: The fun of plugins is that eventually you have same issue as things like linux kernel extensions: overlaps, ordering issues, etc.
John Scudder: quite. security through obscurity (or through gatekeepers) is a non-starter. Nonetheless https://i.imgur.com/joMRjHj.jpg
Jared Mauch: considering how we moved away from plugins in web browsers, etc.. i'm similarly concerned about going down that path in routing protocols.
Martin Horneffer: Tony: which work well enough for academia?
Jared Mauch: and that's before the security/attack profile
Tony Li: Martin: my sense of the bar for academia is very, very low.
Jeff Tantsura: within right boundaries some of this can be done, offloading best path to a custom algo was unthinkable just a few years ago
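
Editor's illustration (not part of the meeting record): two points from the xBGP discussion can be sketched together, namely confining plugins to internal sessions (the "blast radius" concern) and accepting plugin output only when it satisfies a stated property. Everything below is hypothetical: `Route`, `run_plugin`, and the chosen property are not the xBGP API. The sketch also uses a simple runtime check, which is a weaker substitute for the formal software verification Olivier describes, and it does not address plugins that never terminate.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]
    local_pref: int = 100


# A plugin maps a route to a (possibly modified) route, or None to filter it.
Plugin = Callable[[Route], Optional[Route]]


def run_plugin(plugin: Plugin, route: Route, session_is_ibgp: bool) -> Optional[Route]:
    """Apply a plugin on iBGP sessions only, rejecting output that breaks a property."""
    if not session_is_ibgp:
        return route  # plugins never run on external sessions
    try:
        result = plugin(route)
    except Exception:
        return route  # a crashing plugin falls back to the unmodified route
    if result is None:
        return None  # the plugin chose to filter the route
    # Hypothetical property: a plugin must not alter the prefix or the AS path.
    if result.prefix != route.prefix or result.as_path != route.as_path:
        return route
    return result


def prefer_short_paths(route: Route) -> Route:
    """Example plugin: raise local preference for routes with short AS paths."""
    return replace(route, local_pref=200) if len(route.as_path) <= 2 else route


def bad_plugin(route: Route) -> Route:
    """Example of a misbehaving plugin that rewrites the AS path."""
    return replace(route, as_path=())
```

With these definitions, `prefer_short_paths` takes effect on an iBGP session, has no effect on an external session, and the output of `bad_plugin` is discarded in favor of the original route.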