Minutes IETF114: sidrops
minutes-114-sidrops-00
Meeting Minutes | SIDR Operations (sidrops) WG | |
---|---|---|
Date and time | 2022-07-27 14:00 | |
Title | Minutes IETF114: sidrops | |
State | Active | |
Last updated | 2022-08-15 |
SIDROPS 114 Session: Wednesday July 27 2022, 14:00 - 16:00 (UTC)
Session recording: https://www.youtube.com/watch?v=FVYH7O0jN4Q

1) Agenda bashing and Chair's slides - [5 minutes]

Warren Kumari: I'm the OPS Area Director, and my term ends in March. I still enjoy being the OPS AD, but I would like others to run. So, if you would like to know more about this role, please talk to me.

2) Igor Lubashev/Sriram Kotikalapudi - [20 minutes]
Source Address Validation Using BGP UPDATEs, ASPA, and ROA (BAR-SAV)
https://datatracker.ietf.org/doc/html/draft-sriram-sidrops-bar-sav-00
Slides: https://datatracker.ietf.org/meeting/114/materials/slides-114-sidrops-source-address-validation-using-bgp-updates-aspa-and-roa-bar-sav-00

Keyur Patel (as a working group member): I think it's a great presentation. Two points I will make: It will be great to see how this solution solves the problem for iBGP cases while you have eBGP cases. Most of the problems will probably also require you to have a control within eBGP; it's quite possible in this scenario that across iBGP a solution may not be honoured, or may not be enforced. But what is critical is the iBGP cases; if we can zoom into those, that would be phenomenal.

Igor: It's very valuable to also look at iBGP. One of the things that we want to do on the internet is that if one network will not enforce it, maybe the next network should have a chance to still do it. Obviously, all the SAV filtering is best done as close as possible.

Jeff Haas: I offer two observations for you. The core enhancement from 8704 is, for one part, that you can add stuff to your source address validation from additional BGP data that's not being used for forwarding, so being able to seed it from other sources is great. The presentation is an excellent example of that. We talked about ROAs as one example, and I'm glad to see this is going forward.
But the second thing ties into the slide you are displaying here (slide: The problem: some stats), about the economics: why haven't we seen more of this stuff? For one part, the tooling for adding 8704 is not out there, but the bigger one is that Source Address Validation in hardware is predicated on burning FIB resources to do the extra lookup to see if you can actually do Source Address Validation. It's cheaper than doing firewalling, so from that perspective it's a wonderful thing. But it's still an additional cost in your FIB. In cases where you have SAV covered by what you are using for forwarding, you basically get it for free, just at the cost of extra forwarding lookups. For every other thing we are looking at here, and as you started expanding use cases, you're looking at effectively doubling or tripling the size of your FIB to be able to implement this functionality. So, part of what you're fighting against is the economic cost of something that's there for security that isn't actually selling moving bits around. It's actually helping stop you moving bits.

Igor: Economics is definitely a driver, probably more for some networks than others. Some networks would actually benefit. We see that 15% of the networks chose to implement something. So there is some value there. But for anything we do, it's important that it is as economical as possible, especially for the small networks, because that is where the Source Address Validation is done best, especially for the first movers.

Ben Maddison: There are three separate things. Firstly, to continue from the point that Jeff made: certainly all of that is true, but one caveat is that, speaking from the networks that I operate, we typically run out of interfaces long before we run out of packets per second. That interaction with the hardware is not necessarily a dealbreaker, even for reasonably large networks that have a similar sort of shape.
The second point is to reiterate what I mentioned in SAVNET (a new WG): using RPKI objects in this way breaks the fail-open semantics that they have in the current use case, and I think we need to think quite hard about that. I think that there are other use cases where we want something that's like a sticky RPKI object. I think that is going to be required when we replace the IRR, so I think there is other useful work that would need the same thing. The third point is that your example of pruning the customer cone using the ASPA is problematic semantically. I can imagine a scenario where a customer wants to use a transit link purely in the outbound direction and never intends to advertise any inbound reachability over it, and therefore doesn't include that adjacency in their ASPA, but expects the return traffic to continue working. Using it to prune the Source Address Validation and filter in that way breaks that assumption. If we want to do that, we have to be very careful how we document it, so it doesn't end up being a nasty surprise at four in the morning. It's a corner case, but a valid corner case. All of those points accumulated leave me wondering if we want to use RPKI objects for Source Address Validation. I think I would prefer looking at defining new objects with those precise semantics, rather than trying to shoehorn the stuff we already have into this hole. I think it's potentially worthwhile to do this, and I'll be happy to spend cycles on trying to get it done, but I don't love the idea of reusing the existing objects.

Igor: With regards to your comment about the implementation: yes, you are right. The devil is in the implementation. If you have a temporary loss of cache, your implementation should expect this to happen and not start to deny a lot more. To your last comment: Absolutely, having a purpose-built signal is much less of a hack than using another signal that was not built for the purpose.
I see it as a trade-off between doing one more thing that is new, versus a signal that already exists. Well, ASPA doesn't really exist, so it's an opportunity.

Koen van Hove: I will also continue on the point that Ben made, regarding what you expect the failure condition to be, because we have seen that RPKI publication points don't meet this 100% uptime availability. At any time there is some publication point that isn't quite working as it should. So you're not retrieving the ROAs or, in the future, the ASPA objects. From what I understood, that would mean that a lot of traffic is being blocked, which indeed goes against the fail-open nature of the RPKI. What happens if part of your cache goes?

Igor: You cache your information, and you assume it's still valid for 24 or 48 hours. And only after 48 hours you remove it. Other people might come up with better heuristics.

Geoff Huston: In looking through this, what I understand you're trying to do is to take the simple case of Source Address Validation, which is a stub network where you have enforced symmetric traffic, and you're trying to synthesise that form of stub by taking an arbitrary subnet of networks and saying what the total amount is that they can advertise in that arbitrary subnet. Which is a closed connectivity domain, and you're looking at the union of all possible source addresses. But you're using a tool, the ASPA, which is a policy-based tool that has directionality involved. You're not using a connectivity tool that doesn't care. In my head there is a lingering doubt: when you try and assemble that closed, connected customer cone (which is a cone of connectivity: all the possible source addresses are now known because of ROV etc., therefore I can now apply this ACL), your implicit assumption is that the policy directionality is the same as, or isomorphic to, straight connectivity. I don't think that's the case. If that is the case, there is a problem there. I'm not reassured that it's not true.
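As an illustration of the caching heuristic Igor describes above (keep the last successfully fetched data and only drop it after 48 hours), the following is a minimal Python sketch. The class name `StickyCache`, the method names, the publication point labels, and the payload tuples are all hypothetical and come from neither the BAR-SAV draft nor any relying party implementation; only the 48-hour figure comes from the discussion.

```python
from dataclasses import dataclass, field

GRACE_SECONDS = 48 * 3600  # assumption: the upper bound mentioned in the discussion


@dataclass
class StickyCache:
    """Hypothetical cache keeping the last good data per publication point."""
    grace: int = GRACE_SECONDS
    entries: dict = field(default_factory=dict)  # point -> (payloads, fetched_at)

    def record_fetch(self, point, payloads, now):
        """Store the payloads from a successful fetch at time `now` (seconds)."""
        self.entries[point] = (frozenset(payloads), now)

    def usable_payloads(self, now):
        """Return all payloads still inside the grace period; drop the rest."""
        usable = set()
        for point, (payloads, fetched_at) in list(self.entries.items()):
            if now - fetched_at <= self.grace:
                usable |= payloads
            else:
                # Only after the grace period is the stale data removed.
                del self.entries[point]
        return usable


if __name__ == "__main__":
    hour = 3600
    cache = StickyCache()
    cache.record_fetch("pp-a", {("AS64500", "192.0.2.0/24")}, now=0)
    cache.record_fetch("pp-b", {("AS64501", "198.51.100.0/24")}, now=0)
    # pp-a is refreshed later; pp-b stays unreachable.
    cache.record_fetch("pp-a", {("AS64500", "192.0.2.0/24")}, now=40 * hour)
    print(sorted(cache.usable_payloads(now=50 * hour)))
```

In this sketch a publication point outage shorter than the grace period does not shrink the permitted set, which is the behaviour Igor suggests; whether 48 hours is the right bound is left open in the discussion.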
Igor: Everything that has been done since 2000 is looking at the signals, and BGP is the same thing. It has directionality. ASPA is just another signal like that, and that is information we do have, and we don't have others right now. What can we do, given what we have? Because that's all we can do immediately. When we build something new, it has to have the properties that it's cheap for early adopters and they get immediate value. We think, by getting ASPA and ROA information, we can enhance the state of the art we have.

Geoff Huston: An overly restrictive view of the prefixes coming from this bounded set of networks will create filters that are too enthusiastic. The problem with going down this path is the operator push-back, where an otherwise perfectly valid packet gets discarded because of some automated tool, which then becomes an operational cost. That's the underlying concern when I review this work. You're starting from a small set that's constrained instead of from a large set that is maybe overly liberal, but at least encompasses all that connectivity. There is more work to do here, obviously, but policy and connectivity don't always align.

Igor: What SAV is doing is trying to expand the number of prefixes that it will find to put in your permissive list. It's trying to make the list more permissive. Maybe this goes against the original purpose of ASPA. I think it should be explored: another ASPA record type.

Geoff Huston: I suppose we agree that this is not ready yet, but it's an interesting path to take.

Keyur Patel (speaking as a working group member): You talk about customer cone and relationships. Isn't this problem wider than that? The attack can also happen from a peering AS.

Igor: Yes, the algorithm works on peering interfaces and customer interfaces. When you look at a peering interface you're trying to discover their customer cone.
Ben Maddison: I think the failure mode Keyur is hinting at is when you have a combination of a regionalised peering and a partial transit service offered by that peer, or even to one of your own customers in a different region. Then it breaks this kind of closed traffic relation, and you end up not discovering potentially valid sources, even under the expanded algorithm, because those parts don't show up. But I don't think that's related to the RPKI stuff, but to the algorithm assumptions.

Igor: There are two things there. One: If those networks actually bother to create an ASPA record, they would list themselves as providers, so it would discover it. In a particular location where the customer is, you would likely receive BGP messages coming from that interface.

Ben Maddison: There is a gap there, because of the fact that you don't have to talk about lateral peering in ASPA records. The paths don't show up in BGP because of policy, the adjacencies don't show up in the ASPA because they are peerings, and as a result that gets left out of the cone. I'm sure there is a gap there that we would need to look into. It only happens in the partial transit case.

Igor: We can look at it. It's an interesting case.

3) Tom Harrison - [10 minutes]
Signed TAL
https://datatracker.ietf.org/doc/html/draft-ietf-sidrops-signed-tal
Slides: https://datatracker.ietf.org/meeting/114/materials/slides-114-sidrops-draft-ietf-sidrops-signed-tal-10-00

There were no questions.

4) Tim Bruijnzeels - [20 minutes]
Delegated CAs and Repositories
Slides: https://datatracker.ietf.org/meeting/114/materials/slides-114-sidrops-delegated-cas-and-repositories-00

Chris Morrow: Yes please, standardise.

Tim: Ok, in that case I'll send something to the list.

5) Alexander Azimov - [20 minutes]
ASPA - 08+
Slides: https://datatracker.ietf.org/meeting/114/materials/slides-114-sidrops-aspa-01.pdf

Ben Maddison: I think I said everything on the mailing list.
To reiterate: I think it's less work for the Relying Party maintainers if there are fewer objects. I think that the overwhelmingly more common case is that networks have the same, or very close to the same, transits on both address families. One of the things I tried to point out in the recent discussion is that, for an operator that is used to a user interface which presents this model where both address families are the same topologically, the day that this operator needs to read these objects to see what is actually being transmitted on the wire is much less surprising if it doesn't diverge too much from that mental model. The other thing to point out is that there has been a lot of progress in new implementations over the last few weeks, and all of these are based on the version 8 profile. I think rolling that back is quite a lot of work for a fair number of people. I'm quite strongly in favour of the version 8 change.

Alexander: What are you suggesting to do with the RTR specification?

Ben Maddison: Certainly the formats in the RTR protocol and the ASN.1 diverge, but I don't feel that's a huge problem. I think that the consideration for the RTR protocol is to make things as convenient as possible for routers to use it in policy decisions. And I think the existing format is mostly fairly well suited for that. Whether we do this translation when it arrives at the router or whether it's being processed by the Relying Party, I think the profile change can co-exist with the existing RTR specifications pretty comfortably.

Alexander: We can split policies based on the RPKI cache, but don't you think that for debugging things can become really complicated?

Ben Maddison: I think you're right, but the structure to use in provisioning tools and conflict management tools is inevitably different from what is convenient for the router to store in its internal data structures. So the translation has to happen somewhere.
Probably the best place for that translation to happen is the RP, because the compute is cheap there, and my feeling is that it's the least surprising place for it to occur, because you're going from one protocol to another.

Alexander: In the meantime, I'm reading the chat and I see Randy Bush is still opposing this change, and he's authoring the RTR draft. I don't know how to make everybody happy about the ASPA object style. Maybe we should ask the chairs to be more involved in the process.

Ties de Kock: We have an internal implementation of the version 8 profile, and I'll explain why I prefer the version 7 profile in hindsight. In the end, the user interface that users will be presented with may or may not align closely to the objects that people create. It's all about making sure that the right objects are created and that there is no confusion in these objects themselves. What we realised after implementing is that we could create quite a lot of edge cases in the content of version 8, where the content semantically overlaps and you need to take a union within the objects, and covering this with a proper set of test objects was very hard. This is the main reason why I prefer the version 7 object, even though I really like the idea of having a single signed object per AS. I'm just afraid that covering all these cases where IPv4 and IPv6 overlap or not could lead to interesting edge cases.

Ruediger Volk: When I look at the proposed change of the profile, it looks like version 7 is much less complex than version 8. I guess some of the implications Ties mentioned are related to the more complex data structure. On the other hand, my understanding is that the more complex data structure is not used in any significant way to express more functionality. I'm not happy about moving to version 8. I was expecting the work on 8210-bis would not have to be re-done.
On the question of whether we are actually delaying the creation of the operational system: if a re-run of 8210-bis is required, I would strictly oppose the idea. Moving on with version 8, I'm very unhappy with the added complexity. My main concern is that I don't want to see development of the operational system being delayed. But complexity is going to have costs and should be avoided.

Tim Bruijnzeels: This discussion started with a desire to use up less space, and the AFI limit came to be as an additional thought in the process. So the first proposal I made then was to have a single AS object with two distinct lists, one for each address family. Then the address family limit was introduced as a way to compress even further, and then the idea came to be that this might actually reflect what people want to do. All in all, this can express the exact kind of data that you can express now with version 7. In that sense, it really is a matter of preference. I want to second what Ruediger said: I would hate for this discussion to delay deployment and experience with ASPA. If it would come to that, I'm willing to change my implementation to follow whichever profile the WG finds acceptable. To comment on the data format versus 8210-bis: if you look at ROAs, you can have multiple prefixes in a single ROA object. You don't have this structure in your router. You would have to validate multiple ROA objects and make a union of everything. That is what gets sent to the router. Similarly, whatever the profile is, the translation can happen at different levels. It can happen in the RP, as is currently done for ROAs already. It can also happen in the UI, where I can present users with an interface that allows them to provide a common list and my software can figure it out. It's trivial to do that. My main message is that I want this to work.

Randy Bush: What is presented to the user in the UI is arbitrary. In either scheme you can present separate or joint in the UI; it makes no difference.
What is on the wire is almost never seen by the operator. A few X.509 keys actually look to the garbage on the wire. When I want to see what's being published, I don't look into the repository, I don't look on the wire, I look at my router. What is on the router is separated, IPv4 and IPv6. The 8210-bis change, while technically and procedurally possible, is not what we want to do, because we want to keep the burden of any hacks north of the router. The whole purpose of 8210-bis is to minimise the load on the router. And the router chews up IPv4 and IPv6 separately. Like it or not, IPv4 and IPv6 topologies are not congruent. This is especially seen in Asia. Is there any operator in this meeting who actually uses multi-protocol BGP, so they have IPv4 and IPv6 in a single configuration with their peer, or is it all the rest of us, who have separate sessions for IPv4 and IPv6? It's not pretty, but this is the reality.

Rob Austein (relayed by Warren Kumari): I am extremely uncomfortable with requiring transit on different AFIs to be on the same path when we well know that sometimes they are not. Maybe I misunderstood the question.

Chris Morrow: It sounds like a lot of this discussion should be on the mailing list.

Ben Maddison: In response to Ruediger: I don't think we're arguing here about more or less complexity in the system as a whole. We're discussing where the complexity should be dealt with. But we need to fix this issue sooner rather than later.

Alexander: I'm not the person to declare consensus. Version 7 is simple and can fly faster. Nevertheless, let's continue the discussion on the list. I will try to summarise this discussion on the list.

Sriram Kotikalapudi: Can you go to slide 9? When you have a provider that has no tier 1, we said in the draft they can use an AS0 ASPA, and that's fine. I think we also said that an IXP route server will also register an AS0 ASPA; that's also fine.
A bit more tricky, and not in the draft yet, is the case where you have a transit provider who happens to be present as a client on an IXP route server; in that case, they should register an ASPA with the RS as the provider. Do you agree we should include that in the draft?

Alexander: It doesn't matter if they are Tier 1. If you think something is missing from the document, add it.

Sriram Kotikalapudi: I'll discuss some other things as well.

Randy Bush: If it's too complex, maybe you should take that as a warning. IXPs that put their ASN in the path are against specifications. It's not worth it; please reduce complexity.

Alexander: I'm doing my best. Although, I need to stress that they are in the wild.

Ben: It's true that a transit-free network at a non-transparent RS should list that RS as one of its providers, but that is wildly uncommon. In the interest of simplicity, the document should emphasise that a non-transparent IXP RS is just a transit provider, although it forwards on MAC addresses and not IP addresses.

Alexander: Thank you for the comments. I think we're on the same page here.

Sriram Kotikalapudi: It turns out that the non-transparent IXP is less complex than the transparent IXP, but both of them can be taken care of in the draft without too much complexity. It appears to me that if the RS is transparent, it's maybe worthwhile to just add an ASPA by the client. We need to think about that carefully.
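
For reference, the translation step debated under the ASPA item (turning a single per-AS object whose provider entries may carry an address family limit into separate IPv4 and IPv6 provider sets, taking a union where several objects exist for the same customer AS) might look like the minimal Python sketch below. It assumes that an entry without a limit applies to both address families. The tuple layout, the function name `providers_per_afi`, and the AS numbers are hypothetical illustrations, not the ASN.1 profile or the RTR PDU format.

```python
AFIS = ("ipv4", "ipv6")


def providers_per_afi(objects):
    """Union validated ASPA-like objects into per-AFI provider sets.

    `objects` is an iterable of (customer_asn, providers), where `providers`
    is a list of (provider_asn, afi_limit) and afi_limit is None (assumed to
    mean both address families), "ipv4", or "ipv6".
    """
    result = {}
    for customer_asn, providers in objects:
        per_afi = result.setdefault(customer_asn, {afi: set() for afi in AFIS})
        for provider_asn, afi_limit in providers:
            if afi_limit is None:
                targets = AFIS
            elif afi_limit in AFIS:
                targets = (afi_limit,)
            else:
                raise ValueError(f"unknown address family limit: {afi_limit!r}")
            for afi in targets:
                per_afi[afi].add(provider_asn)
    return result


if __name__ == "__main__":
    # Two hypothetical objects for the same customer AS with overlapping content.
    objects = [
        (64500, [(64501, None), (64502, "ipv6")]),
        (64500, [(64502, "ipv4"), (64503, "ipv4")]),
    ]
    for customer, per_afi in providers_per_afi(objects).items():
        for afi in AFIS:
            print(customer, afi, sorted(per_afi[afi]))
```

The sketch only shows that the union is mechanical once the semantics are fixed; it takes no position on whether this step belongs in the RP, the UI, or the router, which is the open question in the discussion.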