
Minutes IETF103: sidrops
minutes-103-sidrops-01

Meeting Minutes SIDR Operations (sidrops) WG
Date and time 2018-11-06 06:50
Title Minutes IETF103: sidrops
State Active
Last updated 2018-11-06

---------- primary note taker --------------

Sidrops notes

* Oliver Borchert
   * Community interest in an informational RFC?
       * Oliver suggests that the WG maintain a living document containing
       operational experience, similar to what was suggested in GROW
   * Other input?
       * Oliver requests that anyone who has ideas about what else to test
       contact him
   * Keyur Patel
       * AS_SET is not sorted; the RFC says it should be “not found”
       * With RFC 8097 you see multiple states; yes, because they are out of sync
   * Alex Azimov
       * Do we need to update 8097 so that the state is not shared?
       * Oliver suggests there should be more than one cache for each router
       * Oliver suggests that you do validation in iBGP, rather than using the
       8097 community
       * Alex: did you look at the BIRD implementation? The cache is internal.
       * Oliver suggests that we need to think about this issue. There are
       different ways to deal with it; it’s good to understand the issues so
       you can decide.
   * Ruediger Volk
       * We thought that AS_SETs would disappear, but maybe they are here to
       stay. The aggregator should appear as the last AS before the AS_SET.
       * Oliver: We use AS_SETs to prevent loops
       * Ruediger: yes, we should document this, and we should identify a
       repository for information
   * Job Snijders
       * AS_SET is ignored in OpenBGPD when doing validation; BIRD has a knob
       for this
       * Oliver: look at your
       * Many people ask what the performance impact is. Job suggests a single
       document dedicated to showing there is no real performance impact.
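The AS_SET discussion above turns on how RFC 6811 derives a route's origin AS. A minimal sketch of that procedure follows, assuming a simplified AS_PATH model (a list of `("SEQ", [...])` / `("SET", [...])` segments, confederation segments omitted); the `Roa` type and function names are illustrative only. Under RFC 6811 a path ending in an AS_SET has origin NONE, which matches no ROA, so such a route is "invalid" when some ROA covers the prefix and "not found" otherwise.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, IPv6Network, ip_network
from typing import List, Optional, Sequence, Tuple, Union

Net = Union[IPv4Network, IPv6Network]
# One AS_PATH segment: ("SEQ", [asn, ...]) for AS_SEQUENCE, ("SET", [...]) for AS_SET.
Segment = Tuple[str, Sequence[int]]


@dataclass(frozen=True)
class Roa:
    prefix: Net
    max_length: int
    asn: int


def origin_as(as_path: Sequence[Segment], local_as: int) -> Optional[int]:
    """Route origin per RFC 6811, Section 2 (confederation segments omitted)."""
    if not as_path:
        return local_as                  # empty AS_PATH: our own AS
    seg_type, asns = as_path[-1]
    if seg_type == "SEQ" and asns:
        return asns[-1]                  # rightmost AS of a final AS_SEQUENCE
    return None                          # final AS_SET (or anything else): NONE


def validate(prefix: Net, as_path: Sequence[Segment], roas: List[Roa], local_as: int) -> str:
    origin = origin_as(as_path, local_as)
    covering = [r for r in roas
                if r.prefix.version == prefix.version and prefix.subnet_of(r.prefix)]
    if not covering:
        return "not found"
    if origin is not None and any(
            r.asn == origin and prefix.prefixlen <= r.max_length for r in covering):
        return "valid"
    return "invalid"                     # covered, but no ROA matches (incl. NONE)


roas = [Roa(ip_network("192.0.2.0/24"), 24, 64500)]
# Path ends in an AS_SET, so the origin is NONE and the covered route is invalid.
print(validate(ip_network("192.0.2.0/24"),
               [("SEQ", [64501]), ("SET", [64500, 64502])], roas, 65000))  # invalid
```

Implementations that "ignore" the AS_SET or expose a knob for it are departing from this baseline in different ways, which is one source of the inconsistent results mentioned above.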

   * Oliver Borchert (2): validation state “unverified”
       * The distinction between “not found” and “unverified” helps especially
       with debugging. It also allows for more precise local policy based on
       validation states.
       * Rob Austein
           * No strong objection, but I think it’s a little confused. Origin
           Validation has three states for simplicity. Not found is everything
           that is neither valid nor invalid.
           * Invalid -> wrong and I can prove it
           * Valid -> right and I can prove it
           * Not found -> everything else, including not validated
           * BGPsec -> valid: right and I can prove it; invalid: everything
           else
       * Ruediger Volk
           * Marking the difference between “not found” and “unverified” is
           useful in origin validation. I expect “not found” to mean that there
           is no covering ROA.
           * BGPsec
       * Oliver notes that treating “unverified” the same as “not found” or
       “invalid”, respectively, is also possible
       * Ruediger suggests that this will allow the use of RFC 8097 community
       signalling, allowing to use the
       * Chris asks where the work is, Oliver

* Alex Azimov
   * ASPA is not a general solution for all issues, but it solves a lot of
   operational security issues
   * Alex asks for adoption
   * Ruediger Volk asks about open-policy clean-up; Alex states there is some
   ad-hoc text that still needs to be cleaned up.

* Sriram Kotikalapudi (drop invalid if still routable)
   * Job Snijders: this may be overtaken by events, as origin validation is
   being deployed now. The majority of problem ROAs are with a few parties, and
   they can be reached to fix them. The timeline for new router software is a
   few years.
   * Warren Kumari: invalids are being dropped in the IETF network
   * Ruediger Volk: there should not be a special state for AS0
   * Tim said that if a special state for AS0 exists and is intended, then it
   needs special treatment

* George Michaelson (deploy reconsidered)
   * Ruediger Volk: we need an implementation and interop testing; Geoff
   Huston agrees.

----------- secondary note taker ------------
SIDROPS 103 - Bangkok

1a) Oliver Borchert - [10 minutes]
https://datatracker.ietf.org/doc/draft-borchert-sidrops-rpki-state-unverified-00.txt

Testing router implementations of RPKI and OV.
 Measured convergence time with no policy at all: how long does it take to
 converge, and what difference does validation make? 2-7% on average.
   With policy, the change is 2-7% as well.

 With validating caches: ran tests of cache operations (e.g. the router
 crashes or the cache crashes).
   Detection of cache loss depends on configuration-based timeouts, etc.
     Faster discovery means more churn in iBGP, because validation state
     changes trigger re-announcements.
  Turn down the timers on the cache connection to survive failures (use 2
  caches).
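The timer trade-off above can be sketched as a router choosing between two caches. This assumes RFC 8210's recommended defaults (Refresh 3600 s, Retry 600 s, Expire 7200 s) and its rule that data may be used until the Expire Interval passes without a successful sync; the `CacheSession` type, `select_cache` function and cache names are hypothetical, not any vendor's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Recommended defaults from RFC 8210 (RPKI-to-Router protocol, version 1), seconds.
REFRESH_INTERVAL = 3600
RETRY_INTERVAL = 600
EXPIRE_INTERVAL = 7200


@dataclass
class CacheSession:
    name: str                    # hypothetical label, e.g. "cache-a"
    preference: int              # lower value is preferred
    connected: bool
    last_sync: Optional[float]   # time of last complete sync (End of Data), or None

    def data_is_fresh(self, now: float, expire: int = EXPIRE_INTERVAL) -> bool:
        # Data may be used until the Expire Interval passes without a successful sync.
        return self.last_sync is not None and now - self.last_sync < expire


def select_cache(caches: Sequence[CacheSession], now: float,
                 expire: int = EXPIRE_INTERVAL) -> Optional[CacheSession]:
    """Prefer connected caches with fresh data, then by configured preference."""
    usable = [c for c in caches if c.data_is_fresh(now, expire)]
    return min(usable, key=lambda c: (not c.connected, c.preference), default=None)
```

A shorter `expire` makes a dead cache drop out sooner, but every switch between caches with slightly different content can change validation states and so re-announce routes into iBGP.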

  Validation state signaling using RFC 8097.
  Routes with and without AS_SETs present some interesting challenges.
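For reference, the RFC 8097 signal is a single 8-octet extended community: type 0x43 (non-transitive opaque), sub-type 0x00, five reserved zero octets, and a final octet carrying the state (0 = valid, 1 = not found, 2 = invalid). A minimal encode/decode sketch, with function names chosen for illustration:

```python
from typing import Optional

# RFC 8097: BGP Prefix Origin Validation State Extended Community.
OV_TYPE = 0x43        # non-transitive opaque extended community
OV_SUBTYPE = 0x00
STATE_CODES = {"valid": 0, "not found": 1, "invalid": 2}
CODE_STATES = {code: state for state, code in STATE_CODES.items()}


def encode_ov_community(state: str) -> bytes:
    # type, sub-type, 5 reserved octets (zero), validation state (last octet)
    return bytes([OV_TYPE, OV_SUBTYPE, 0, 0, 0, 0, 0, STATE_CODES[state]])


def decode_ov_community(community: bytes) -> Optional[str]:
    """Return the validation state, or None if this is not an 8097 community."""
    if len(community) != 8 or community[0] != OV_TYPE or community[1] != OV_SUBTYPE:
        return None
    return CODE_STATES.get(community[7])
```

Because the community is intended for signalling inside an AS, a router that receives it trusts whichever iBGP peer computed the state; that trust is what the "unverified" discussion later in these notes is about.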

  (skipping a few slides since we covered the content previously)

  Impact on iBGP traffic using OV signaling: there's some interesting churn in
  announcement counts.
    If an announcement carries 3 prefixes and 2 of the 3 validate, the output
    on iBGP is 2 announcements. Conflicting OV results can be caused by
    differences in the cache content sent to different routers.
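The doubling of announcements follows from how BGP UPDATEs work: path attributes, including the RFC 8097 community, apply to every prefix in one UPDATE, so prefixes that end up with different validation states cannot share an UPDATE. A minimal sketch, assuming a hypothetical `split_for_ibgp` helper and ignoring all other attributes:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_for_ibgp(prefixes_with_state: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group prefixes from one received UPDATE by validation state.

    Each group needs its own outgoing iBGP UPDATE, because the state is
    carried in a path attribute shared by all prefixes in an UPDATE.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for prefix, state in prefixes_with_state:
        groups[state].append(prefix)
    return dict(groups)


received = [("192.0.2.0/24", "valid"),
            ("198.51.100.0/24", "valid"),
            ("203.0.113.0/24", "not found")]
updates = split_for_ibgp(received)
print(len(updates))  # 2: one UPDATE per distinct validation state
```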

    Consider how validation content is cached in your network, to avoid an
    oscillating decision process inside your network.

    Some conclusions: more and better documentation is needed, with more
    useful examples, in both router-vendor and RPKI-cache-vendor
    documentation.

    Be concerned about the stability of the cache and its view of the world;
    this MAY cause instability of routes inside your iBGP.

    "Should there be documentation/a-document about how to configure/use/etc
    these features?"
      call-out to GROW's 'living documents' effort?

    A request for more ideas about tests and such from the community.
    What would be valuable from this perspective?

    (questions from the audience)
      Agreement from audience about writing a document.
      Discussion about AS-Sets and their confusing semantics in OV.

1b) Oliver Borchert - Proposing 'Unverified' state for routes.
  It's important to distinguish between valid/invalid/not found. 'Not found'
  is 'watered down'. RFC 6811 could be updated with paragraph 2,
  prefix-to-AS mapping (slide 5)

  'Unverified' says that the router chose not to validate (or hit a bug), as
  seen one iBGP hop away. RFC 8097 should get the corresponding changes too.
  Blind trust in the peer is the problem here.

  Oliver's stance is that writing policy around OV/BGPsec can expose the need
  for this sort of state. RFC 8205 needs updating as well. "Unverified" means:
  the router did not even attempt verification/validation.
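A minimal sketch of how the proposed state could sit next to the existing ones, assuming `None` stands for "validation was never attempted" and a hypothetical per-policy `mode` switch; neither function comes from the draft. It also shows the option raised in the room of collapsing "unverified" back to "not found" (origin validation) or "invalid" (BGPsec).

```python
from typing import Optional


def local_state(result: Optional[str]) -> str:
    """Proposed marking: None means the router never attempted validation."""
    return "unverified" if result is None else result


def effective_state(state: str, mode: str) -> str:
    """Hypothetical policy knob for operators who only want the existing states:
    'unverified' becomes 'not found' for origin validation, 'invalid' for BGPsec."""
    if state != "unverified":
        return state
    return "not found" if mode == "origin-validation" else "invalid"
```

Keeping the two functions separate means the router still records that it skipped validation (useful for debugging), while policy can choose whether to act on that distinction.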

  Audience Interaction time!
    Rob Austein - not objecting so much as saying that confusion reigns.
      There's a clear use of the states as information about the validation
      algorithm...

    Ruediger Volk - OV/BGPsec are different from this unverified perspective.

2) Tim Bruijnzeels - TA Key Roll Document
  Conversations of last meeting ended with:
      "multiple keys in parallel should be possible, at least 2,
       support of planned/unplanned rollover."

  Learn some lessons from DNSSEC...
  Make some running code which implements this as well. Should this be at the
  hackathon? Additional slides on the actual processing... TAL
  reconfiguration means deploying a new TAL, which could be messy...
   A TAL is just plaintext with keys and fingerprints.
   the new object could be: "TAK" - "Trust Anchor Keys"

   The validation software has the TAL and can get the TAK; the TAK verifies
   the TAL, done. For adding more TALs/TAKs, just cross-sign the TAK files to
   the TALs... verify the TAL and keep going (follow the diagrams from slide 7
   to the end)
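For context on "a TAL is just plaintext": under RFC 7730 a TAL is one or more rsync URIs, one per line, then an empty line, then the DER subjectPublicKeyInfo in base64, which may be wrapped over several lines. A minimal parser sketch follows; it does not decode the DER key or implement the proposed TAK object, and the `Tal` type and URI are illustrative.

```python
import base64
from dataclasses import dataclass
from typing import List


@dataclass
class Tal:
    uris: List[str]
    spki_der: bytes     # DER-encoded subjectPublicKeyInfo (not parsed here)


def parse_tal(text: str) -> Tal:
    lines = text.splitlines()            # accepts both LF and CRLF line breaks
    uris: List[str] = []
    i = 0
    while i < len(lines) and lines[i].strip():
        uris.append(lines[i].strip())    # URI section ends at the first empty line
        i += 1
    if not uris:
        raise ValueError("TAL has no URI section")
    key_b64 = "".join(line.strip() for line in lines[i + 1:])
    if not key_b64:
        raise ValueError("TAL has no public key")
    return Tal(uris, base64.b64decode(key_b64, validate=True))
```

Since the key is baked into this file, rolling it today means distributing a new TAL to every relying party, which is the problem the TAK proposal tries to avoid.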

   Audience questions - key rolls suck, TL;DR(tm)...
     There are a bunch of 'holy crow' things that Wes may write up, from TLS,
     DNSSEC, etc... Prepare with many more keys than you expect. If things rely
     upon software updates, that can take time, lots of time. RFC 5011-type
     changes COULD take a long time as well. Manual config sucks too...
     everyone waits too long :(
   Tim: yeah...
   Job: the PowerDNS folks would like to chat about this as well (from DNSSEC
   lessons).
   Ruediger: RPKI and DNSSEC differ in their usage.
     RPKI RP software may (probably does) have some alerting etc. when
     problems arise with key material.
         Different from DNSSEC, the RP software expects smarter ops (whoops)

    Russ - certs in RPKI have TTL/expiry, DNSSEC does not.
    Randy - no data means 'not found', not 'invalid', which is nicer than how
    DNSSEC managed the problem.

3) Alexander Azimov - leak detection review.
  bgp-open-policies
  communities
  new-rpki object

  Possibly we don't need the communities-based approach?
  Time could be the key here, no?
  ROA deployment is taking a 'long time'... Perhaps communities can help
  it (detection) happen quicker?

  ASPA updated, now with a new abstract and restructured.
    Clarification in -01: it tries to detect hijacks/leaks, not AS_PATH
    problems on the whole.

  bgp-open-policy clean-up/last-call request (leak prevention)
  route-leak-detection-mitigation detects mistakes (last-call request?)
  aspa-verification - adoption call requested now.
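As a rough illustration of what ASPA adds: an ASPA record lets a customer AS list the provider ASes it has authorized. The sketch below is a simplified per-hop check of an upstream path (received from a customer), written for illustration only; it is not the draft's verification procedure, which covers more cases such as routes learned from providers and peers. The names `hop_check` and `verify_upstream` are hypothetical, and the AS_PATH is a plain list with the neighbor first and the origin last.

```python
from typing import Dict, List, Set


def hop_check(customer: int, provider: int, aspas: Dict[int, Set[int]]) -> str:
    """Is `provider` an authorized provider of `customer`?"""
    if customer not in aspas:
        return "unknown"                 # customer has published no ASPA
    return "valid" if provider in aspas[customer] else "invalid"


def verify_upstream(as_path: List[int], aspas: Dict[int, Set[int]]) -> str:
    # Collapse AS prepending so repeated ASNs count as a single hop.
    path = [asn for i, asn in enumerate(as_path) if i == 0 or asn != as_path[i - 1]]
    # Walk from the origin towards the neighbor: each hop must go customer -> provider.
    results = [hop_check(path[i + 1], path[i], aspas) for i in range(len(path) - 1)]
    if "invalid" in results:
        return "invalid"
    if "unknown" in results:
        return "unknown"
    return "valid"
```

A leak shows up as a hop where an AS that published an ASPA sends the route to an AS it never authorized as a provider.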

4) Sriram - drop-invalids
  "What is this thing?" - Drop invalid if valid or not found less specific
  exists.

  Addressed questions / comments from list/readers:
    AS0 ROA wording adjusted/included.
      "if as0, always drop"
    default route has to be excluded (right?)
      yes, must be excluded...
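The rule as summarized above can be sketched as follows: drop an Invalid route only if a Valid or Not Found route for a less-specific prefix exists, always drop when the route is invalid because of an AS0 ROA, and never let the default route count as that less-specific. This is a simplification under stated assumptions: the RIB is a plain list of `(prefix, state)` pairs, the caller supplies a hypothetical `invalid_due_to_as0` flag, and the draft's exact RIB semantics may differ.

```python
from ipaddress import IPv4Network, IPv6Network, ip_network
from typing import Iterable, Tuple, Union

Net = Union[IPv4Network, IPv6Network]


def should_drop(prefix: Net, state: str, invalid_due_to_as0: bool,
                rib: Iterable[Tuple[Net, str]]) -> bool:
    if state != "invalid":
        return False
    if invalid_due_to_as0:
        return True                      # "if AS0, always drop"
    for other, other_state in rib:
        if other.version != prefix.version or other.prefixlen == 0:
            continue                     # default route is excluded
        if (other.prefixlen < prefix.prefixlen and prefix.subnet_of(other)
                and other_state in ("valid", "not found")):
            return True                  # still routable via the less-specific
    return False


rib = [(ip_network("0.0.0.0/0"), "valid"), (ip_network("10.0.0.0/16"), "not found")]
print(should_drop(ip_network("10.0.1.0/24"), "invalid", False, rib))  # True
```

An Invalid route covered only by the default route is kept, so traffic to it is not silently blackholed while the ROA problem is being fixed.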

    See slide 5 about the multihoming picture
      "multihoming customers have to understand RPKI.. because it's important"

    Draft updated to -02 wee!
    Some possible feasibility discussion coming in slides 7-8
    (please review slides for details)

    Audience Questions:
          Job: not super sure if this draft is going to be helpful?
            OV deployments seem to be "reject invalids and go make people get
            them fixed". DISR might have been more 'ease into this..' instead
            of using your network position to fix problems. Polite operators
            note that just calling the offenders is probably the best path here.

    Warren: notes that the IETF NOC is dropping invalids; no one has noticed
    and there are no tickets... so it's quite possible that there's no impact.

    Ruediger Volk: timing is important here, but AS0 is not special in any
    way... it's just that we know that AS0 routes should never exist.

5) George Michaelson - Deploying Validation Reconsidered
  Problem statement again - need to get coordination between RPs, signers and
  such.. in order to move to the new "validation reconsidered" world.

  Running over the possible changes required: "How hard is this?" - not
  entirely sure, lots of disagreement. There's a bunch of RIR angst about this
  problem... Please define a date to get stuff done:
        new model of validation
        new OID
        get discussion/etc at *NOG so operators can/should get their ducks
        lined up.
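The "new model of validation" here refers to RFC 8360 (Validation Reconsidered), which is why new OIDs are needed: under the original RFC 6487 rules a certificate claiming any resource its issuer does not hold is rejected outright, while under RFC 8360 the certificate is kept but its usable resources are trimmed to the intersection with the issuer's. A minimal sketch over ASN ranges only (IP prefixes work the same way); all function names are illustrative.

```python
from typing import List, Tuple

Range = Tuple[int, int]   # inclusive ASN range, e.g. (64496, 64511)


def normalize(ranges: List[Range]) -> List[Range]:
    """Sort ranges and merge overlapping or adjacent ones."""
    merged: List[Range] = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def intersect(child: List[Range], parent: List[Range]) -> List[Range]:
    """ASNs claimed by the child certificate that the parent actually holds."""
    out = []
    for c_lo, c_hi in normalize(child):
        for p_lo, p_hi in normalize(parent):
            lo, hi = max(c_lo, p_lo), min(c_hi, p_hi)
            if lo <= hi:
                out.append((lo, hi))
    return normalize(out)


def strict_accept(child: List[Range], parent: List[Range]) -> bool:
    """Original model: any over-claim makes the certificate invalid."""
    return intersect(child, parent) == normalize(child)


def reconsidered_resources(child: List[Range], parent: List[Range]) -> List[Range]:
    """Reconsidered model: keep the certificate, trimmed to the intersection."""
    return intersect(child, parent)
```

With a parent holding 64496-64511 and a child claiming 64500-64520, the strict model rejects the child (and everything under it), while the reconsidered model keeps it with 64500-64511; that difference in outcome is why signers and relying parties have to switch together.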

    a draft exists, for folk to read/review.
    Asking for adoption of this draft so we can discuss/move-forward.

    There's angst here about deadlock, and that without anyone moving forward
    we'll be stuck here 'forever'. Code must change, TALs must change.

    Since the deployment appears small today (<500 or so?) perhaps NOW is the
    time to move not when we have 'thousands' or 'tens of thousands' of users?

    Please adopt this draft... so we can move forward.
    CHAIR: "PLEASE ASK FOR ADOPTION ON-LIST"

    George is asking also about a date ... what is a good date? how far out?

--------------------------------------------------------------