sidrops 113
Friday 25 March 2022, 11:30 UTC
https://www.youtube.com/watch?v=5e211IkY_ic

Agenda

1. Chair slides - Chris Morrow

Chris handed over the mic to Allison Mankin of the IETF ombudsteam. Allison explained that the ombudsteam has received quite a number of cases from people who are concerned about, and feel uncomfortable with, the discussions in sidrops. Allison reminded us of the Code of Conduct: there is a need for more respect and care. Warren added that he had never had any code of conduct issues in any of the working groups for which he is an AD.

Lars-Johan Liman: I would like to add one other recommendation that has saved me a few times. When you feel upset about something that someone has written, and you write an answer, do that, but don't send it. Let it sit there for a couple of hours, go back, have a cup of coffee, read it again, and you may find you want to change a few words here and there, or change the entire tone of the message.

Warren: Yes, and if the main discussion is between two or three people, and there are more than two or three e-mails a day, you might want to stop, take a step back and ask yourself whether you really should be having this discussion right now.

Allison: We, as the ombudsteam, will continue to monitor these cases, so do us a favour: keep reading the code of conduct and think about how to be good to each other.

Warren: If someone is feeling stressed or attacked, feel free to mention it to the chairs and feel free to mention it to me.

2. Job Snijders - Update on Resource Signed Checklist (RSC) and rpki-client
https://datatracker.ietf.org/meeting/113/materials/slides-113-sidrops-sidrops-rsc-00

Jeff Haas: A quick operational question: is there any long-term concern about adding a large number of objects to the RPKI system and the impact on the various applications that use it?

Job Snijders: It is very important to know that RSC files are distributed outside the global RPKI repository system.
To illustrate what that means exactly: ROAs, CRLs and manifest files are distributed inside the global repository system, so if you use rsync or RRDP, those are the files you pull into your system. RSC files are not distributed through that means; they are distributed in a one-to-one fashion. So I could generate an RSC file and e-mail it to you, and the other participants in the global ecosystem would never know that I generated one and sent it to you. So there is no burden on the global system.

Warren Kumari: What is the relation between this draft and "RPKI has no identity"? It feels like there is a close relation.

Job Snijders: The relationship is noted in the RSC draft itself. The RSC draft references the "no identity" draft and explains that RSC files cannot be used to confirm identity. All an RSC does is confirm that somebody has possession of the private keys, and that the resources with which it's signed are subordinate to the certificate authority. So from my perspective there is no conflict.

Ruediger Volk: Referring back to Jeff's question about the load on the general distribution mechanism, I was surprised by your request for revocation tooling, which quite obviously will place a load on the distribution system.

Job Snijders: The only load on the global system is that, if you revoke an RSC, its serial is appended to the CRL of that CA. So for each RSC you revoke, you add a few bytes to a CRL. But then again, RSC files could be short-lived. This is something we'll have to figure out in the wild.

3. Ben Maddison - Discard Origin Authorization (DOA)
https://datatracker.ietf.org/doc/slides-113-sidrops-sidrops-doa/

Jeff Haas: I have not read your draft. My question is: is this intended to address the RTBH adjacent-AS case?

Ben Maddison: That is the purpose of the Peer-AS ID field.
The default behaviour is that this will not allow transit for RTBH routes, but if you add one of your transit providers to that list of Peer-AS IDs, that's a signal to the receiver that you have authorised that transit and that the route should be matched and accepted.

4. Ignas Bagdonas - BGPsec performance scalability
https://datatracker.ietf.org/doc/slides-113-sidrops-sidrops-bgpsec-scalability/

Sriram Kotikalapudi: We did some studies on caching the signatures that have been verified. During signature verification you keep a cache of the AS path segments whose signatures you have verified. The next time the same update arrives, or another update that has an AS path segment in common with a previous one, you can make use of the cache. That is another way of improving performance. Perhaps you have thought of it?

Ignas Bagdonas: Yes, I did. This is actually contrary to the recommended practices for using elliptic curve signatures. You can do this only if your random number is stable, and that leaks your key. That's not the right thing to do. Caching is possible, and by rearranging a few things here and there you can cache, and that's the point. However, signature signing and verification for AS paths longer than, in this particular instance, 4 or 5 hops becomes less computationally expensive than calculating the hash. And that is the problem. So you're not limited by the performance of the elliptic curve operations as such; you are limited by the overall performance of the memory system.

Job Snijders: You asked: do we care? I can say, just as I did in IEPG, that I do care. And I do think now is a good time to start work on this. I think version zero will give us valuable operational feedback on how it works in the wild, provided that BGPsec router key publication becomes easily accessible to operators. From there, migrating to a performance-enhanced version seems a very logical and organic way to further the development of this protocol.
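
The verification caching that Sriram describes in this item can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: `SegmentVerificationCache` and `toy_verify` are hypothetical, and `toy_verify` is a stand-in for real signature verification, not an ECDSA implementation. The sketch only caches the result of verifying an already-seen (signed data, signature, key identifier) triple; it does not reuse anything on the signing side.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical verifier signature: (signed_bytes, signature, key_id) -> bool
VerifyFn = Callable[[bytes, bytes, bytes], bool]


class SegmentVerificationCache:
    """Remembers the outcome of verifying a path segment signature."""

    def __init__(self, verify: VerifyFn) -> None:
        self._verify = verify
        self._results: Dict[bytes, bool] = {}
        self.verify_calls = 0  # counts how often the real verifier ran

    @staticmethod
    def _key(signed_bytes: bytes, signature: bytes, key_id: bytes) -> bytes:
        # Length-prefix each part so different splits cannot collide.
        h = hashlib.sha256()
        for part in (signed_bytes, signature, key_id):
            h.update(len(part).to_bytes(4, "big"))
            h.update(part)
        return h.digest()

    def check(self, signed_bytes: bytes, signature: bytes, key_id: bytes) -> bool:
        key = self._key(signed_bytes, signature, key_id)
        cached = self._results.get(key)
        if cached is not None:
            return cached  # same segment seen before: skip verification
        self.verify_calls += 1
        result = self._verify(signed_bytes, signature, key_id)
        self._results[key] = result
        return result


def toy_verify(signed_bytes: bytes, signature: bytes, key_id: bytes) -> bool:
    # Stand-in for real verification (assumption): accepts a "signature"
    # that equals SHA-256(key_id || signed_bytes).
    return signature == hashlib.sha256(key_id + signed_bytes).digest()


if __name__ == "__main__":
    cache = SegmentVerificationCache(toy_verify)
    data, kid = b"AS64500|192.0.2.0/24", b"router-key-1"
    sig = hashlib.sha256(kid + data).digest()
    cache.check(data, sig, kid)  # first update: verifier runs
    cache.check(data, sig, kid)  # second update sharing the segment: cache hit
    print(cache.verify_calls)
```

The cache key in this sketch is itself a SHA-256 hash over the signed data, which is the kind of cost Ignas points to. The sketch shows the lookup logic only and makes no performance claim.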
5. Sriram Kotikalapudi - ASPA Verification Algorithms: Enhancements and RS Considerations
https://datatracker.ietf.org/meeting/113/materials/slides-113-sidrops-aspa-verification-procedures-01

Ben Maddison: I'm a little confused by the route server handling. What was broken prior to this update? There is no difference from the perspective of AS4: if the route server is transparent, it can be ignored altogether, and if it's a non-transparent route server, then it's indistinguishable from a transit provider. All that needs to happen for AS4 to correctly detect this as a leak is for AS1 to create an ASPA with any content, as long as it doesn't have AS3 in it. And that's what the previous version of the algorithm said. I find the additional corner cases a complication rather than a simplification.

Sriram Kotikalapudi: If you focus on AS4, which is what we're looking at here, then from AS4's point of view what you said is correct. It doesn't need to know whether the route server is transparent or not; it needs the ASPAs that the two RS clients should have, with any ASN. But if you look at it from the point of view of AS3, when AS3, as an RS client, is validating the AS path, it helps for it to see that AS1 has registered an ASPA that includes the RS ASN. It is not necessary to include the RS ASN in the ASPA, like you said, because we are already assuming that the non-transparent route server is a rarity. However, in cases where the RS is non-transparent, or even otherwise, it helps the route server client. To AS4 it doesn't matter, but for the route server client AS3 it matters to some extent to have the ASN of the route server in the ASPA.

Ben Maddison: I think it's important to realise that AS3 knows it's speaking to a route server. I don't think that having corner cases in the protocol helps anyone; I think it's more complication. And the validation that AS3 applies can take into account its local knowledge.
I think this makes the validation procedure harder to understand.

Alexander Azimov: The problem appears in a slightly different topology. Imagine, on this drawing, that AS4 is a customer. It receives a prefix from AS3, and in this case, if AS1 hasn't signed AS2 as its provider, it will treat such a route as a route leak. So, one hop away from the non-transparent IXP, we cannot distinguish whether it is a route leak or a non-transparent IXP.

Ben Maddison: Are you talking about the case where the IXP route server is non-transparent?

Alexander Azimov: Yes. And a slightly different drawing, where AS4 is a customer of AS1.

Ben Maddison: In that case, we embrace the fact that the route server is a transit provider.

Alexander Azimov: I agree with you. I had planned to change the document in this direction.

Sriram Kotikalapudi: To add to what you already said: the non-transparent case is not so rare. Ben, can you send an e-mail to the list?

6. Job Snijders - RPKI & Certificate Transparency
https://datatracker.ietf.org/meeting/113/materials/slides-113-sidrops-sidrops-ct-00

Ties de Kock: I like the idea of CT, and I see a lot of value in applying CT to EE certificates, like those of the Resource Signed Checklist, because it's very hard to observe objects and to know what was actually published. Unless you have CT on EE certificates, you cannot show an important attack in RPKI, which is the omission of objects from the view that you present to somebody. It is critical that EE certificates are included.

Job Snijders: It would be cool if EE certificates could be included in the CT log infrastructure, and I'm not excluding that path, but to reduce the scope and get somewhere, I think it's great if we start with CAs, and maybe add more later.

Ties de Kock: I don't see much benefit in removing that code path, so that for a CA certificate you submit it to the log and incorporate the RSC, and for EE certificates you don't. We'd have to prototype this.
If you want the Relying Parties to check CT, they will need to check the attestations that are in the CA certificates. That means that when you want to create a CA certificate, you need to get enough responses from qualified logs (at least in the web context), which implies that log availability puts an upper bound on CA availability. More brittleness in the RPKI scares me a lot, being an actual CA operator. How do you think about this risk?

Job Snijders: I consider RP implementations out of scope at this point in time. RPs are believers: they just absorb RRDP and rsync. Separately, we would create verifiers, maybe based on existing RP code bases, and they would absorb the logs and maybe use net monitoring alerts. An RP in the RPKI context is a believer, not a verifier.

Koen van Hove: In WebPKI, the end game of CT is that if a CA really misbehaves, we remove it from our trust store and no longer trust this CA. What do you see as the end game for RPKI? There are currently no alternatives to the RIRs.

Job Snijders: If an RIR misbehaves, I will remove them from my trust store.

Koen van Hove: So the goal is to see if an RIR misbehaves?

Job Snijders: The first call is to engage with the RIR, confirm the situation with them and request an RFO. But if the same type of incident happens over and over again, or if there are systematic issues, it could motivate some operators to temporarily remove, or permanently cease using, that Trust Anchor. So the goal of transparency is to be able to hold organisations accountable. Distrusting the root is the end of the process.

Russ Housley: I have real problems with this work. My concern is that RPKI, unlike WebPKI, was constructed so that the CA is authoritative for the resources that it issues. In the WebPKI, all of the roots are able to do anything with any aspect of the name space. When we started working on this, the Internet Architecture Board suggested that IANA run 0/0 and that the RIRs be subordinates.
To accommodate easier transfers amongst the RIRs, each of the RIRs became an equal root for 0/0. I argue that you don't need this (CT) if you go back to the first model.

Job Snijders: I should clarify: in the RPKI ecosystem there are 22,000 CAs. The ones I think CT should apply to are the RIRs that hold the 0/0 certificates, and their intermediate operational certificates. The moment this bounces to an LIR, they can only harm themselves.

Ruediger Volk: Following Russ: some of the basics of WebPKI and RPKI are very different. You should keep in mind that in RPKI, CA and identity are not really the same thing, so looking at WebPKI is not the most valuable thing to do. Establishing tracking mechanisms and monitoring for what is in the RPKI is important: for resource holders, an independent signalling of what the global view of their resources is. WebPKI is directing you down the wrong track. I don't want to dismiss this effort overall, but I don't think it's heading in the right direction.

Ben Maddison: We need to distinguish better between signing events and publication events. They happen close together in time, but they are not the same. This is about signing events that are not visible through any theoretical version of the publication system. CT is about the signing events. Also, I know and trust that my RIR does things with the right intentions. But that is not where the chain of trust needs to stop. I need to be able to demonstrate to a third party when the RPKI causes some substantial outage for one of my customers. Having some version of CT allows me to do this in a more robust fashion.

7. Koen van Hove - RPKI off the beaten happy path
https://datatracker.ietf.org/meeting/113/materials/slides-113-sidrops-rpki-off-the-beaten-happy-path-00 (1.52.16)

Job Snijders: I think that Publish in Parent is a technique that helps the entire ecosystem.
One of the fears is that a sibling CA of yours can do something that somehow knocks you out; for instance, in the partial RPKI data example you showed, if CA 3 has an issue, CA 4 disappears. A lot of scenarios are alleviated if Publish in Parent is used. For this and other reasons, we as an ecosystem should strive to make "Publish in your Parent" the default setting, because it makes life easier. If you publish in the parent, the parent can, out of band, apply some restrictions. Like with e-mail: the SMTP protocol does not encode that I can only send you up to 10 MB, but if I try to send you a 10 MB e-mail, your mail server might say it's too large. This is local policy; each parent repository can apply local policy as it sees fit.

Koen van Hove: You make a good point, but then you get a first-class citizen (Publish in Parent) and a second-class citizen (Delegated). I think that's a consequence of that solution that people need to be aware of.

Tim Bruijnzeels: I think this is an example of a number of issues that may occur, and the question is how we should deal with them. Also, to be a bit more specific, my feeling is that there are things to be discussed with regard to the suggestions Job just made. I think there might be work there, but the current reality is that parent CAs can only be reactive. I think we should look at more proactive measures. If we want to do things with repositories, that implies that we need to look at the publication protocol. It also implies that we may need to think about what trusted repositories are. The current reality is that we can only be reactive.

Ruediger Volk: Thinking of the bad characteristics of rsync: it has been identified as a danger spot and is being replaced. As for the volumetric attacks that you described, I think they will blow up earlier than they hit the routers.
What you really should be checking is your first slide, where you only showed the ROAs that certain CAs are supposed to publish, and you did not show what resource sets the CAs were holding. The issue that you constructed depended on the unusual idea that the delegation of resources was not hierarchical and overlapped between siblings. The monitoring and tracking system should show that. And the policies saying that this should not happen when running your registry have not been formally raised, but are very well understood.

Koen van Hove: I want to point out that this was based on a real-life example that currently exists in the APNIC and IDNIC relationship. It doesn't happen a lot, but it happens.

Ruediger Volk: That boils down to Russ's previous remark about having single or multiple roots, and not having clear and formal policies about how the resources are managed under the overlapping roots.

Ben Maddison: I think this is an important problem. We need to be clear on what action we take. I'm not convinced that any of the actions we take against the potential DoSes that exist should be changes to a protocol. I think we need to be much clearer on how Relying Parties deal with placing limits on their willingness to traverse trees and lots of directories and objects. It's not necessarily the case that they need to implement the same protection mechanisms. But it would be good if there was some collaboration between Relying Party implementers to document what recognised attack vectors there are and how they deal with them. This could be an informational document.

Jared Mauch: This reminds me of the early days of Usenet news. You would have these files and they would get transmitted over this protocol. One of the companies that decided to build commercial software to run a Usenet news server found out that leveraging the underlying operating system was actually inefficient.
The data is still just data, so an implementation should look at the question: do we abstract this out and store it in our own internal data store, optimised for that use case? Maybe that historical context is of use here. We should be looking beyond the actual file systems.

Koen van Hove: I agree, and for RRDP a lot of implementations already do that. The rsync protocol makes it more difficult to achieve, but rsync is still a requirement.

Ties de Kock: We had the initial reports about these issues and we resolved parts of these attack vectors. It got a lot harder to do a DoS attack on all Relying Parties worldwide. However, you also showed that if you want to attack a specific instance, you can still do that, because it's really hard to set these limits in such a way that the recursive case, which still needs to be quite wide because some CAs are quite wide, cannot be abused. So, in my opinion, we need some work on that in sidrops, so that at least relying party instances can detect when the administrative domain changes while traversing the tree. For some entities it may be logical that they have an extremely large repository, while a non-RIR has fewer objects in there. I think we should continue investigating this issue.

Warren closes the meeting and thanks the speakers.