---------- primary note taker --------------

Sidrops notes

* Oliver Borchert
  * Community interest in an informational RFC?
  * Oliver suggests the WG maintain a living document containing operational experience, similar to what was suggested in GROW
  * Other input?
  * Oliver requests that anyone who has ideas about what else to test contact him
* Keyur Patel
  * as_set is not sorted, RFC says it should be “not found”
  * 8097: you see multiple states, yes, because they are out of sync
* Alex Azimov
  * Do we need to update 8097 so that the state is not shared?
  * Oliver suggests there should be more than one cache for each router
  * Oliver suggests that you do validation in iBGP, rather than using the 8097 community
  * Alex: did you look at the BIRD implementation? The cache is internal.
  * Oliver suggests that we need to think about this issue. There are different ways to deal with it; it’s good to understand the issues so you can decide.
* Ruediger Volk
  * We thought that as_sets would disappear, but maybe they are here to stay. The aggregator should appear as the last AS before the as_set.
  * Oliver: We use as_sets to prevent loops
  * R: yes, we should document, and we should identify a repository for information
* Job Snijders
  * as_set in OpenBGPD is ignored when doing validation, BIRD has a knob for this
  * Oliver: look at your
  * Many people ask what the performance impact is. Job suggests a single document dedicated to showing there is no real performance impact.
* Oliver Borchert - 2 -> Validation unverified
  * The distinction between “not found” and “unverified” helps especially with debugging. It also allows for more precise local policy based on validation states.
* Rob Austein
  * No strong objection, but I think it’s a little confused. Origin Validation has three states for simplicity. Not found is everything that is neither valid nor invalid.
  * Invalid -> wrong and I can prove it
  * Valid -> right and I can prove it
  * Not found -> everything else, including not validated
  * BGPSec -> valid: right and I can prove it; invalid -> everything else
* Ruediger Volk
  * Marking the difference between “not found” and “unverified” is useful in origin validation. I expect “not found” to mean that there is no covering ROA.
* BGPSec
  * Oliver suggests that treating “unverified” the same as “not found” or “invalid”, respectively, is also possible
  * Ruediger suggests that this will allow the use of 8097 community signalling, allowing to use the
  * Chris asks where the work is, Oliver
* Alex Azimov
  * ASPA is not a general solution for all issues, but it solves a lot of operational security issues
  * Alex asks for adoption
  * Ruediger Volk asks about open-policy clean-up; Alex states there is some ad-hoc text that still needs to be cleaned up.
* Sriram Kotikalapudi (drop invalid if still routable)
  * Job Snijders: this may be overtaken by events, as origin validation is being deployed now. The majority of problem ROAs are with a few parties, and they can be reached to fix them. The timeline for new router software is a few years.
  * Warren Kumari: dropping invalids in the IETF network
  * Ruediger Volk: there should not be a special state for AS0
  * Tim said that if a special state for AS0 exists and is intended, then it needs special treatment
* George Michaelson (deploy reconsidered)
  * Ruediger Volk: we need an implementation and interop testing; Geoff Huston agrees.

----------- secondary note taker ------------

SIDROPS 103 - Bangkok

1a) Oliver Borchert - [10 minutes]
    https://datatracker.ietf.org/doc/draft-borchert-sidrops-rpki-state-unverified-00.txt

    Testing router implementations of RPKI and OV.
    Working on convergence time changes/etc.
    Based on no policy at all, how long to converge, what difference? - 2-7% averages
    With policy the change is 2-7% as well
    With validating caches - made tests with cache operations (if router crashes or cache crashes...)
    Detection of loss of cache - configuration based timeouts/etc.
    If fast discovery, more churn in iBGP - because validation changes incur re-announce.
    Turn down timers on the cache connection, to enable failure survival (use 2 caches)

    Validation state signaling using rfc8097
    With and without AS-SET, routes have some interesting challenges.
    (skipping a few slides since we covered the content previously)

    Impact on iBGP traffic using OV signaling:
    There's some interesting churn in announcement counts.
    If an announcement has 3 prefixes, and 2 of 3 validate, the output on iBGP is 2 announcements.
    Conflicting results in OV data can be caused by differences in cache content sent to routers.
    Consider how validation content is cached in your network, to avoid an oscillating decision process inside your network.

    Some conclusions: there's more/better documentation to be made.
    There should be more examples, and useful examples, in both router-vendor and RPKI cache vendor documentation.
    Be concerned about the stability of the cache and its view of the world; this MAY cause instability issues in routes inside your iBGP.

    "Should there be documentation/a-document about how to configure/use/etc these features?"
    Call-out to GROW's 'living documents' effort?

    A request for more ideas about tests and such from the community. What would be valuable from this perspective?

    (questions from the audience)
    Agreement from audience about writing a document.
    Discussion about AS-Sets and their confusing semantics in OV.

1b) Oliver Borchert - Proposing 'Unverified' state for routes.
    It's important to distinguish between valid/invalid/unfound.
    Unfound is 'watered down'.
    6811 could be updated with paragraph 2 prefix-to-as-mapping (slide 5)
    Unverified says that the router chose not to validate, or has a bug, on the next iBGP hop away.
    8097 should also have the proper changes too.
    Blind trust in the peer here is the problem.
    Oliver's stance is that policy writing about OV/BGPSEC can expose the need for this sort of state.
    8205 needs updating as well.
    "unverified" means: Router did not even attempt verification/validation.

    Audience Interaction time!
    'Rob Austein' - not objecting so much as saying that confusion reigns. There's a clear use of the states as information about the validation algorithm...
    'Rudiger Volk' - OV/BGPSEC are different from this unverified perspective.

2) Tim Bruijnzeels - TA Key Roll Document
    Conversations of last meeting ended with:
    "multiple keys in parallel should be possible, at least 2, support of planned/unplanned rollover."
    Learn some lessons from DNSSec...
    Make some running code which implements this as well. Should this be at the hackathon?

    Additional slides on actual process processing...
    TAL reconfig raises problems with deploying a new TAL, which could be messy...
    A TAL is just plaintext with keys and fingerprints.
    The new object could be: "TAK" - "Trust Anchor Keys"
    The validation software has the TAL, and can get the TAK; the TAK verifies the TAL, done.
    For adding more TAL/TAK, just cross sign the TAK files to the TALs... verify the TAL and keep on
    (follow the pictograms from slide 7 -> end)

    audience questions - keyrolls suck TL;DR(tm)...
    There are a bunch of 'holy crow' things that Wes may write up, from TLS, DNSSEC, etc...
    Prepare with many more keys than you expect.
    If things rely upon software updates, that can take time, lots of time.
    5011 type changes COULD take a long time as well.
    Manual config sucks too... everyone waits too long :(
    Tim: yea...
    Job: PowerDNS peeps would like to chat about this as well. (from dnssec lessons)
    Rudiger: rpki/dnssec - there are differences in their usages. RPKI RP software may/probably-does-have some alerting/etc when problems arise with key materials. Different from dns-sec, the RP software expects smarter ops (whoops)
    Russ - certs in rpki have TTL/expiry, dnssec does not.
    Randy - no data means 'not-found' - not 'invalid', which is nicer than how dns-sec managed the problem.
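Randy's point above (no data yields 'not-found', never 'invalid') can be illustrated with a minimal sketch of RFC 6811-style origin validation. This is only an illustration under stated assumptions, not any router's or RP's implementation: the `VRP` class, the `origin_validation` function name, and the example prefixes and AS numbers are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import Iterable


@dataclass(frozen=True)
class VRP:
    """Validated ROA Payload (hypothetical container): prefix, max length, origin AS."""
    prefix: str
    max_length: int
    asn: int


def origin_validation(route_prefix: str, origin_asn: int, vrps: Iterable[VRP]) -> str:
    """Classify a route as 'valid', 'invalid' or 'not-found' against a set of VRPs."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # A VRP covers the route if the route prefix lies inside the VRP prefix.
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # A covering VRP matches if the length is allowed and the origin AS agrees.
        if route.prefixlen <= vrp.max_length and vrp.asn == origin_asn:
            return "valid"
    # Covered but never matched is 'invalid'; no covering VRP at all is 'not-found'.
    return "invalid" if covered else "not-found"


if __name__ == "__main__":
    vrps = [VRP("192.0.2.0/24", 24, 64496)]
    print(origin_validation("192.0.2.0/24", 64496, vrps))  # matching VRP
    print(origin_validation("192.0.2.0/24", 64497, vrps))  # covered, wrong origin
    print(origin_validation("192.0.2.0/24", 64496, []))    # no data at all
```

With an empty VRP set (for example, when no cache data is available) every route falls through to 'not-found', which is the behaviour the comment contrasts with DNSSEC.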
3) Alexander Azimov - leak detection review.
    bgp-open-policies
    communities
    new-rpki object
    Possibly we don't need the communities based problem? Time could be the key here, no?
    ROA deployment is taking a 'long time'... Perhaps communities can help it (detection) happen quicker?

    ASPA updated: new abstract and restructure.
    Clarification in -01 - tries to detect hijacks/leaks - not for as-path problems on the whole.

    bgp-open-policy cleanup/last-call request (leak prevention)
    route-leak-detection-mitigation detects mistakes (last-call request?)
    aspa-verification - adoption call requested now.

4) Sriram - drop-invalids
    "What is this thing?" - Drop invalid if a valid or not-found less specific exists.
    Addressed questions / comments from list/readers:
    AS0 ROA wording adjusted/included. "if as0, always drop"
    Default route has to be excluded (right?) Yes, must be excluded...
    See slide 5 about multihoming picturey
    "multihoming customers have to understand RPKI.. because it's important"

    Draft updated to -02 wee!
    Some possible feasibility discussion coming in slides 7-8 (please review slides for details)

    Audience Questions:
    Job: not super sure if this draft is going to be helpful? OV deployment(s) seem to be: reject invalid and go make people get fixed. DISR might have been more 'ease into this..' instead of using your network position to fix problems. Polite operators note that just calling the offenders is probably the best path here.
    Warren: notes that the NOC / IETF is dropping invalids, no one has noticed and no tickets... so it's quite possible that there's no impact.
    Rudiger Volk: timing is important here, but AS0 is not special in any way... it's just that we know that AS0 routes should never exist.

5) George Michaelson - Deploying Validation Reconsidered
    Problem statement again - need to get coordination between RPs, Signers and such.. in order to move to the new validation-reconsidered world.
    Running over the possible changes required:
    "How hard is this?"
    - not entirely sure, lots of disagreement.
    There's a bunch of RIR angst about this problem...
    Please define a date to get stuff done:
      new model of validation
      new OID
      get discussion/etc at *NOG so operators can/should get their ducks lined up.
    A draft exists, for folk to read/review.
    Asking for adoption of this draft so we can discuss/move-forward.

    There's angst here about deadlock, and that without anyone moving forward we'll be stuck here 'forever'.
      code must change
      TALs must change
    Since the deployment appears small today (<500 or so?) perhaps NOW is the time to move, not when we have 'thousands' or 'tens of thousands' of users?
    Please adopt this draft... so we can move forward.

    CHAIR: "PLEASE ASK FOR ADOPTION ON-LIST"
    George is asking also about a date ... what is a good date? how far out?

--------------------------------------------------------------