Minutes IETF103: sidrops
minutes-103-sidrops-01
| Meeting Minutes | SIDR Operations (sidrops) WG | |
|---|---|---|
| Title | Minutes IETF103: sidrops | |
| State | Active | |
| Last updated | 2018-11-06 | |
---------- primary note taker --------------
Sidrops notes
* Oliver Borchert
* Community interest in an informational RFC?
* Oliver suggests the WG maintain a living document containing
operational experience, similar to what was suggested in GROW
* Other input?
* Oliver requests that anyone who has ideas about what else to test
contact him
* Keyur Patel
* as_set is not sorted; the RFC says it should be “not found”
* With 8097 you see multiple states; yes, because they are out of sync
* Alex Azimov
* Do we need to update 8097 so that the state is not shared?
* Oliver suggests there should be more than one cache for each router
* Oliver suggests that you do validation in iBGP, rather than using the
8097 community
* Alex: did you look at the BIRD implementation? The cache is internal.
* Oliver suggests that we need to think about this issue. There are
different ways to deal with it; it’s good to understand the issues so
you can decide.
* Ruediger Volk
* We thought that as_sets would disappear, but maybe they are here to
stay. The aggregator should appear as the last AS before the as_set.
* Oliver: We use as_sets to prevent loops
* Ruediger: yes, we should document, and we should identify a repository
for information
* Job Snijders
* as_set in OpenBGPD is ignored when doing validation; BIRD has a knob
for this
* Oliver: look at your
* Many people ask what the performance impact is. Job suggests a single
document dedicated to showing there is no real performance impact.
* Oliver Borchert - 2 -> Validation unverified
* The distinction between “not found” and “unverified” helps especially
with debugging. It also allows for more precise local policy based on
validation states.
* Rob Austein
* No strong objection, but I think it’s a little confused. Origin
Validation has three states for simplicity. Not found is everything
that is neither valid nor invalid.
* Invalid -> wrong and I can prove it
* Valid -> right and I can prove it
* Not found -> everything else, including not validated
* BGPSec -> valid: right and I can prove it; invalid: everything else
* Ruediger Volk
* Marking the difference between “not found” and “unverified” is
useful in origin validation. I expect “not found” to mean that there
is no covering ROA.
* BGPSec
* Oliver suggests that you can also treat “unverified” the same as “not
found” or “invalid” respectively
* Ruediger suggests that this will allow the use of 8097 community
signalling, allowing to use the
* Chris asks where the work is, Oliver
* Alex Azimov
* ASPA is not a general solution for all issues, but it solves a lot of
operational security issues
* Alex asks for adoption
* Ruediger Volk asks about open-policy clean-up; Alex states there is some
ad-hoc text that still needs to be cleaned up.
* Sriram Kotikalapudi (drop invalid if still routable)
* Job Snijders: this may be overtaken by events, as origin validation is
being deployed now. The majority of problem ROAs are with a few parties, and
they can be reached to fix them. The timeline for new router software is a
few years.
* Warren Kumari: the IETF network is dropping invalids
* Ruediger Volk: there should not be a special state for AS0
* Tim said that if a special state for AS0 exists and is intended, then it
needs special treatment
* George Michaelson (deploy reconsidered)
* Ruediger Volk: we need an implementation and interop testing; Geoff Huston
agrees.
----------- secondary note taker ------------
SIDROPS 103 - Bangkok
1a) Oliver Borchert - [10 minutes]
https://datatracker.ietf.org/doc/draft-borchert-sidrops-rpki-state-unverified-00.txt
testing router implementations of RPKI and OV.
Working on convergence time changes/etc based on no policy at all, how long to
converge
what difference? - 2-7% averages
With policy the change is 2-7% as well
With validating caches - made tests with cache operations (if router crashes
or cache crashes...)
Detection of loss of cache - configuration based timeouts/etc.
If fast discovery, more churn in iBGP - because validation changes incur
re-announce.
Turn down timers on the cache connection, to enable failure survival (use 2
caches)
Validation state signaling using rfc8097
With and without AS-SET routes have some interesting challenges.
(skipping a few slides since we covered the content previously)
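For reference, the RFC 8097 signaling noted above carries the validation
state in the last octet of an 8-octet opaque extended community. A minimal
Python sketch of that encoding follows; the function names are illustrative
only and are not taken from any router or cache implementation.

```python
import struct

# Validation state values defined in RFC 8097 (last octet of the community).
VALID, NOT_FOUND, INVALID = 0, 1, 2


def encode_ov_state(state: int) -> bytes:
    """Build the 8-octet origin validation state extended community."""
    if state not in (VALID, NOT_FOUND, INVALID):
        raise ValueError(f"unknown validation state: {state}")
    # Type 0x43 (non-transitive opaque), sub-type 0x00,
    # five reserved octets set to zero, then one state octet.
    return struct.pack("!BB5xB", 0x43, 0x00, state)


def decode_ov_state(community: bytes) -> int:
    """Return the validation state carried in the community."""
    if len(community) != 8 or community[0] != 0x43 or community[1] != 0x00:
        raise ValueError("not an origin validation state community")
    state = community[7]
    if state > INVALID:
        # This sketch simply rejects values outside the defined range.
        raise ValueError(f"undefined validation state: {state}")
    return state
```

Because the state travels as a path attribute, any change in a router's
validation result changes the attribute set of the route it sends to its
iBGP peers.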
Impact on iBGP traffic using OV signaling: there's some interesting churn in
announcement counts.
If an announcement carries 3 prefixes, and 2 of 3 validate, the output on
iBGP is 2 announcements. Conflicting results in OV data can be caused by
differences in cache content sent to routers.
Consider how validation content is cached in your network, to avoid an
oscillating decision process inside your network.
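The "3 prefixes in, 2 announcements out" observation can be illustrated with
a small sketch. It assumes that prefixes can share one UPDATE only when their
path attributes, including the signaled validation state, are identical. The
prefixes, the state labels and the function name are made up for
illustration; the state of the third prefix is an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def split_by_state(prefix_states: Dict[str, str]) -> Dict[str, List[str]]:
    """Group prefixes by validation state: one group per outgoing UPDATE."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for prefix, state in prefix_states.items():
        groups[state].append(prefix)
    return dict(groups)


# One received announcement with three prefixes; two of them validate.
received = {
    "192.0.2.0/24": "valid",
    "198.51.100.0/24": "valid",
    "203.0.113.0/24": "not-found",  # assumed state of the third prefix
}
updates = split_by_state(received)
print(len(updates), "announcements on iBGP")
```

If two caches disagree about one of these prefixes, routers fed by different
caches would build different groups, which is the kind of inconsistency the
notes warn about.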
Some conclusions: there's more/better documentation to be made. There should
be more examples, and more useful ones, in both router-vendor and RPKI cache
vendor documentation.
Be concerned about stability of the cache and its view of the world; this
MAY cause instability issues in routes inside your iBGP.
"Should there be documentation/a-document about how to configure/use/etc
these features?"
call-out to GROW's 'living documents' effort?
A request for more ideas about tests and such from the community.
What would be valuable from this perspective?
(questions from the audience)
Agreement from audience about writing a document.
Discussion about AS-Sets and their confusing semantics in OV.
1b) Oliver Borchert - Proposing 'Unverified' state for routes.
It's important to distinguish between valid/invalid/not-found. Not-found is
'watered down'. 6811 could be updated with paragraph 2 prefix-to-as-mapping
(slide 5)
Unverified says that the router chose not to validate, or has a bug, on the
next iBGP hop away. 8097 should also have the proper changes too. Blind trust
in the peer here is the problem.
Oliver's stance is that policy writing about OV/BGPSEC can expose the need
for this sort of state. 8205 as well needs updating. "unverified" means:
Router did not even attempt verification/validation.
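As a rough sketch of what the proposed state would mean for a validator: the
RFC 6811 procedure yields valid, invalid or not-found, and the proposal adds
a separate outcome for routes the router never tried to validate. Modelling
"did not attempt validation" as vrps=None is an assumption of this sketch, as
are all names in it; none of this is taken from the draft.

```python
import ipaddress
from typing import NamedTuple, Optional, Sequence


class VRP(NamedTuple):
    """Validated ROA payload: prefix, maximum length and origin AS."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(route_prefix: str,
                    origin_asn: Optional[int],
                    vrps: Optional[Sequence[VRP]]) -> str:
    # Assumption: None means the router did not attempt validation at all.
    if vrps is None:
        return "unverified"
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue  # this VRP does not cover the route
        covered = True
        # A match needs an allowed length and the same, non-zero origin AS.
        if (route.prefixlen <= vrp.max_length
                and vrp.asn != 0
                and origin_asn == vrp.asn):
            return "valid"
    # Covered but never matched is invalid; no covering VRP is not-found.
    return "invalid" if covered else "not-found"
```

With this split, "not-found" keeps the meaning "no covering ROA", while
"unverified" says nothing about the RPKI data at all.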
Audience Interaction time!
'rob austein' - not objecting so much as saying that confusion reigns.
There's a clear use of the states as information about the validation
algorithm...
'rudiger volk' - OV/BGPSEC are different from this unverified perspective.
2) Tim Bruijnzeels - TA Key Roll Document
Conversations of last meeting ended with:
"multiple keys in parallel should be possible, at least 2,
support of planned/unplanned rollover."
Learn some lessons from DNSSEC...
Make some running code which implements this as well. Should this be at the
hackathon? Additional slides on the actual process... TAL reconfiguration
requires deploying a new TAL, which could be messy...
A TAL is just plaintext with keys and fingerprints.
the new object could be: "TAK" - "Trust Anchor Keys"
The validation software has TAL, and can get the TAK, the TAK verifies the
TAL, done. For adding more TAL/TAK, just cross sign the TAK files to the
TALs... verify the TAL and keep on (follow the pictograms from slide 7-> end)
audience questions - keyrolls suck TL;DR(tm)...
There are a bunch of 'holy crow' things that Wes may write up, from TLS,
DNSSEC, etc... Prepare with many more keys than you expect. If things rely
upon software updates, that can take time, lots of time. 5011 type changes
COULD take a long time as well. manual config sucks too... everyone waits
too long :(
tim: yea...
job: powerdns peeps would like to chat about this as well. (from dnssec
lessons)
rudiger: rpki/dnssec there are differences in their usages.
rpki RP software may/probably-does-have some alerting/etc when problems
arise with key materials.
different from dns-sec, the RP software expects smarter ops (whoops)
Russ - certs in rpki have TTL/expiry, dnssec does not.
Randy - no data means 'not-found' - not 'invalid' which is nicer than how
dns-sec managed the problem.
3) alexander azimov - leak detection review.
bgp-open-policies
communities
new-rpki object
possibly we don't need the communities based problem?
time could be the key here, no?
ROA deployment is taking 'long time'... Perhaps communities can help
it(detection) happen quicker?
ASPA updated and now new abstract and restructure.
clarification in -01 - tries to detect hijacks/leaks - not for as-path
problems on the whole.
bgp-open-policy cleanup/last-call request (leak prevention)
route-leak-detection-mitigation detects mistakes (last-call request?)
aspa-verification - adoption call requested now.
4) sriram - drop-invalids
"What is this thing?" - Drop an invalid route if a valid or not-found less
specific exists.
Addressed questions / comments from list/readers:
AS0 ROA wording adjusted/included.
"if as0, always drop"
default route has to be excluded (right?)
yes, must be excluded...
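One possible reading of the rules noted above, as a sketch: an invalid route
is dropped when a valid or not-found less specific still covers it, an
invalid route covered by an AS0 ROA is always dropped, and the default route
does not count as a covering less specific. The Route type, the as0_covered
flag and the function name are hypothetical; the draft itself is the
authority on the exact procedure.

```python
import ipaddress
from typing import List, NamedTuple


class Route(NamedTuple):
    prefix: str
    state: str                 # "valid", "invalid" or "not-found"
    as0_covered: bool = False  # hypothetical flag: an AS0 ROA covers it


def should_drop(candidate: Route, rib: List[Route]) -> bool:
    """Decide whether an invalid route can be dropped safely."""
    if candidate.state != "invalid":
        return False
    if candidate.as0_covered:
        return True  # "if as0, always drop"
    cand_net = ipaddress.ip_network(candidate.prefix)
    for other in rib:
        net = ipaddress.ip_network(other.prefix)
        if net.version != cand_net.version:
            continue
        if net.prefixlen == 0:
            continue  # the default route is excluded
        if (other.state in ("valid", "not-found")
                and net.prefixlen < cand_net.prefixlen
                and cand_net.subnet_of(net)):
            return True  # destination stays reachable via a less specific
    return False
```

In this reading, an invalid route with no acceptable covering route is kept,
so the destination does not become unreachable.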
See slide 5 about the multihoming picture
"multihoming customers have to understand RPKI.. because it's important"
Draft updated to -02 wee!
Some possible feasibility discussion coming in slides 7-8
(please review slides for details)
Audience Questions:
Job: not super sure if this draft is going to be helpful?
OV deployment(s) seem to be reject invalid and go make people get
fixed. DISR might have been more 'ease into this..' instead of
using your network position to fix problems. Polite operators note
that just calling the offenders is probably the best path here.
warren: notes that the NOC / IETF is dropping invalids, no one has noticed
and no tickets... so it's quite possible that there's no impact.
rudiger volk: timing is important here, but AS0 is not special in any way...
it's just that we know that as0 routes should never exist.
5) George Michaelson - Deploying Validation Reconsidered
Problem statement again - need to get coordination between RPs, Signers and
such.. in order to move to the new validation-reconsidered world.
Running over the possible changes required: "How hard is this?" - not
entirely sure, lots of disagreement. There's a bunch of RIR angst about this
problem... Please define a date to get stuff done:
new model of validation
new OID
get discussion/etc at *NOG so operators can/should get their ducks
lined up.
a draft exists, for folk to read/review.
Asking for adoption of this draft so we can discuss/move-forward.
There's angst here about deadlock, and that without anyone moving forward
we'll be stuck here 'forever'. Code must change, TALs must change.
Since the deployment appears small today (<500 or so?) perhaps NOW is the
time to move not when we have 'thousands' or 'tens of thousands' of users?
Please adopt this draft... so we can move forward.
CHAIR: "PLEASE ASK FOR ADOPTION ON-LIST"
George is asking also about a date ... what is a good date? how far out?
--------------------------------------------------------------