Network Working Group                                   E. Levy-Abegnoli
Internet-Draft                                             Cisco Systems
Intended status: Standards Track                            June 2, 2009
Expires: December 4, 2009


                  Preference Level based Binding Table
                 <draft-levy-abegnoli-savi-plbt-00.txt>

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 4, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   A trusted database located on the first switch, storing the binding
   between end-nodes' Link-Layer Addresses (LLA) and their IPv6
   addresses, would be an essential part of source address validation.
   To build



Levy-Abegnoli           Expires December 4, 2009                [Page 1]


Internet-Draft    Preference Level based Binding Table         June 2009


   such a database, one must specify:
   1.  The source of information
   2.  How the information is maintained
   3.  How collisions are resolved
   Solutions differ by one or more of these elements.

   While also getting its binding data from NDP, this draft proposes an
   alternative to the "first-come first-serve" basis [fcfs], by
   specifying a preference algorithm to deal with collisions.  Instead
   of the simplistic first-come first-serve collision handling, the
   proposed algorithm relies on the following criteria to choose
   between two colliding entries:
   o  Where the entries were learnt from (access port, trunk port, etc.)
   o  Credential carried by the entries (CGA proof, certificate, MAC/LLA
      match, etc.)
   o  State of the current entry
   o  Age of the entry

   Since the state of the entry is one of the elements of the
   algorithm, this draft also describes a tracking mechanism to maintain
   entries in states where the preference algorithm can enable end-node
   movement.


Table of Contents

   1.  Introduction
   2.  Goals and assumptions
     2.1.  Definitions and Terminology
     2.2.  Scenarios considered
   3.  Source of information
   4.  Binding table
     4.1.  Data model
     4.2.  Entry preference algorithm
       4.2.1.  Preference Level
       4.2.2.  Entry update algorithm
       4.2.3.  Enabling slow movement
     4.3.  Binding entry tracking
     4.4.  Binding table state machine
   5.  Configuration
     5.1.  Switch port configuration
     5.2.  Binding table configuration
   6.  Bridging NDP traffic
     6.1.  Bridging DAD NS
     6.2.  Bridging other NDP messages
   7.  Normative References
   Appendix A.  Contributors and Acknowledgments
   Author's Address



1.  Introduction

   To populate the first-switch binding table, this document proposes a
   scheme based on NDP snooping, and introduces a preference level
   algorithm to deal with collisions.  It is organized as follows:
   o  Section 3 describes the source of information and Section 4.1 the
      binding table data model.  While the proposed approach would fit
      multiple sources of bindings, such as Neighbor Discovery Protocol
      (NDP) snooping, DHCP (snooping), MLD (snooping) and static
      entries, this document focuses on NDP.  Section 6 details how an
      L2-switch can leverage NDP DAD messages to populate its table.
   o  The lifecycle of entries must be fully controlled by the switch.
      Section 4.4 details how this is achieved.  This includes a
      mechanism to test the reachability of entries, described in
      Section 4.3.
   o  The resolution of collisions is detailed in Section 4.2.


2.  Goals and assumptions

   The primary goal of the proposed approach is for the layer2 switch to
   maintain an accurate view of the nodes attached to it, directly or
   via another layer2 switch.  This view is referred to as the switch
   "binding table".  The following goals are also pursued:
   o  Enable (slow) movement of nodes
   o  Prevent binding address spoofing

   The binding table includes each node's IPv6 address, its link-layer
   address, and the switch port it was learnt from, whether an access
   port or a trunk port (a port to another switch).

   This binding table is the keystone for detecting collisions and
   arbitrating between colliding entries.  It also brings a couple of
   interesting by-products: it can provide some address spoofing
   mitigation, and it can be used to limit multicast traffic forwarding.

2.1.  Definitions and Terminology

   The following terminology is used:

   plb-switch:  A switch that implements the algorithms described in
      this draft.


2.2.  Scenarios considered

   Three main scenarios are considered in this document:

   1.  Scenario A: a plb-switch connected to a set of L3-nodes, whether
       hosts or routers


     +------+
     |HostA +-----------------+
     +------+                 |
                              |
                        +-----+------+
     +------+           |            |
     |HostB +-----------+ SWITCH_A   +
     +------+           |            |
                        +-----+------+
                              |
     +------+                 |
     |HostC +-----------------+
     +------+


   2.  Scenario B: a plb-switch SWITCH_A connected to L3-nodes and to
       another plb-switch SWITCH_B

     +------+                                                   +------+
     |HostA +-----------------+                  +--------------+HostD |
     +------+                 |                  |              +------+
                              |                  |
                        +-----+------+     +-----+------+
     +------+           |            |     |            |       +------+
     |HostB +-----------+ SWITCH_A   +-----+ SWITCH_B   +-------+HostE |
     +------+           |            |     |            |       +------+
                        +-----+------+     +-----+------+
                              |                  |
     +------+                 |                  |              +------+
     |HostC +-----------------+                  +--------------+HostF |
     +------+                                                   +------+

   3.  Scenario C: a plb-switch SWITCH_A connected to L3-nodes and to a
       non plb-switch SWITCH_B


3.  Source of information

   The following sources of data can be used to fill the table:
   o  Neighbor Discovery, address initialization: when a node performs
      address initialization, it sends a DAD (Duplicate Address
      Detection) Neighbor Solicitation (NS) message.  The procedure is
      described in RFC4861 [RFC4861] and RFC4862 [RFC4862].  This
      message does not contain a link-layer address option.  However,
      the switch can find out the MAC address the DAD NS was sent from,
      as well as the port it was received on, and use that to create an
      entry in the binding table.  It can also issue its own DAD NS to
      the sender to trigger it to send a Neighbor Advertisement (NA)
      carrying the binding information needed.  Note that even nodes
      which get their address from DHCPv6 should perform DAD to validate
      it.  Quite commonly, upon finishing address initialization, a node
      will send an unsolicited NA (to all-nodes) to announce the
      address.  The switch can learn the binding from this message as
      well.
   o  Neighbor Discovery, address resolution: during the address
      resolution exchange, the owner of an address announces the
      binding with its link-layer address.  This information can be
      seen by the switch, and used to fill the binding table.
   o  Neighbor Discovery, other messages: many other NDP messages carry
      the binding between the IPv6 layer3 address and the link-layer
      address in a Source Link-Layer Address (SLLA) option.  These
      messages can also be snooped by the layer2 switch to learn the
      binding.

   Note that the binding information can also be learnt from other
   protocol sources such as DHCP, or even be configured statically on
   the switch.  Detailing how this would be performed is outside the
   scope of this document.  However, binding table entries learnt by
   non-NDP methods might collide with entries learnt via NDP snooping,
   and Section 4.2 describes how to prefer one entry over another.


4.  Binding table

   A table is maintained on the switch(es) that binds layer3 (IPv6)
   addresses to link-layer (MAC) addresses.

4.1.  Data model

   A record of the binding table should contain the following
   information:
   o  v6addr: Layer 3 address
   o  zoneid: the zone id
   o  port: layer2 interface from which the entry was learnt
   o  vlanid: VLAN identifier the address belongs to
   o  lla: Link-Layer Address (MAC)
   o  preflevel: Preference level for this entry
   o  state: entry state
   o  lifetime: Lifetime of the entry
   o  timestamp: Time of last update

   A global scope address should be unique across ports and vlans.  A
   link-local scope address is unique within a vlan.  Therefore, the
   database is a collection of l3-entries, keyed by ipv6-address and
   zoneid.  A zoneid is a function of the scope of the address (LINK-
   LOCAL, GLOBAL) and the vlanid:
   o  for scope GLOBAL, zoneid = 0
   o  for scope LINK-LOCAL, zoneid = vlanid

   A collision between an existing entry and a candidate entry occurs
   if the two entries have the same v6addr and zoneid.  These fields
   are referred to as the "key".

   The fields of an entry other than the key (port, vlanid, lla, etc.)
   will be referred to as attributes.  Changing the attributes of an
   entry requires complying with the entry update algorithm described
   in Section 4.2.2.
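
   As an illustration only, the record and its key could be sketched as
   follows in Python.  The names BindingEntry, State and zone_id are
   hypothetical, and treating every address that is not link-local as
   GLOBAL scope is an assumption of this sketch, not a rule stated in
   this draft.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
import ipaddress


class State(Enum):
    INCOMPLETE = "INCOMPLETE"
    REACHABLE = "REACHABLE"
    VERIFY = "VERIFY"
    STALE = "STALE"


def zone_id(v6addr: str, vlanid: int) -> int:
    """zoneid = vlanid for LINK-LOCAL scope, 0 for GLOBAL scope.

    Assumption: any address that is not link-local is treated as GLOBAL.
    """
    return vlanid if ipaddress.IPv6Address(v6addr).is_link_local else 0


@dataclass
class BindingEntry:
    v6addr: str             # layer 3 address
    zoneid: int             # derived from scope and vlanid
    port: str               # layer2 interface the entry was learnt from
    vlanid: int
    lla: Optional[str]      # None while the entry is INCOMPLETE
    preflevel: int
    state: State
    lifetime: float         # seconds
    timestamp: float        # time of last update

    @property
    def key(self):
        # Two entries collide when their keys are equal.
        return (ipaddress.IPv6Address(self.v6addr), self.zoneid)
```

   With this keying, the same global address seen in two vlans collides,
   while the same link-local address seen in two vlans does not.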

4.2.  Entry preference algorithm

4.2.1.  Preference Level

   The preference level (preflevel) is an attribute of an entry in the
   binding table.  It is set when the entry is learnt, based on where
   it is learnt from, the credentials associated with it and other
   criteria to be defined.  The preflevel is used to arbitrate between
   two candidate entries (with identical key) in the binding table.  The
   higher the preference level is, the more preferred the entry.

   One of the key elements of the preflevel associated with an entry is
   the port it was learnt from.  For example, an entry would have
   different preflevels if it is learnt from:
   o  An access port: it typically attaches to end-nodes
   o  A trunk port: it attaches to a non plb-switch
   o  A trusted access port: it attaches to trusted end-nodes
   o  A trusted trunk: it attaches to another plb-switch

   Another important element is the credentials associated with this
   learning.  An entry could be associated with cryptographic proof
   (CGA), and/or the LLA learnt could match the source MAC of the frame
   from which it was learnt.

   The following preflevel values have been identified (from lowest to
   highest):
   o  LLA_MAC_MATCH: LLA (found in NDP option) and MAC (found at layer2)
      are identical
   o  TRUNK_PORT: the entry was learnt from a trunk port (connected to
      another switch)
   o  ACCESS_PORT: the entry was learnt from an access port (connected
      to a host)
   o  TRUSTED_ACCESS: The entry was learnt from a trusted port
   o  TRUSTED_TRUNK: The entry was learnt from a trusted trunk
   o  DHCP_ASSIGNED: the entry is assigned by DHCP
   o  CGA_AUTHENTICATED: The entry is CGA authenticated, per [RFC3972]
   o  CERT_AUTHENTICATED: the entry is authenticated with a certificate,
      per [RFC3971]
   o  STATIC: this is a statically configured entry

   An entry can sum up preference values, for instance it could be
   TRUNK_PORT + LLA_MAC_MATCH.  However, the preference level values
   should be encoded in such a way that the sum of preferences 1 to N-1
   is smaller than preference N.  For example:
   o  An entry learnt from a trunk port with matching lla/mac would have
      a bigger preflevel than one simply matching lla/mac.
   o  However an entry learnt from an access port with matching mac/lla
      would have a smaller preflevel than an entry learnt from a trusted
      port.
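
   One encoding that satisfies this constraint is to give each
   preference value its own bit, in increasing order, since 1 + 2 + ...
   + 2^(N-1) = 2^N - 1 < 2^N.  The Python sketch below is one possible
   encoding, not one mandated by this draft; the class name Pref and
   the helper preflevel() are hypothetical.

```python
from enum import IntFlag


class Pref(IntFlag):
    # One bit per value, lowest to highest, so that the sum of all
    # lower preferences is always smaller than the next one.
    LLA_MAC_MATCH = 1 << 0
    TRUNK_PORT = 1 << 1
    ACCESS_PORT = 1 << 2
    TRUSTED_ACCESS = 1 << 3
    TRUSTED_TRUNK = 1 << 4
    DHCP_ASSIGNED = 1 << 5
    CGA_AUTHENTICATED = 1 << 6
    CERT_AUTHENTICATED = 1 << 7
    STATIC = 1 << 8


def preflevel(*prefs: Pref) -> int:
    """Combine preference values into a single comparable integer."""
    level = 0
    for p in prefs:
        level |= int(p)
    return level


# The two examples given in the text:
trunk_match = preflevel(Pref.TRUNK_PORT, Pref.LLA_MAC_MATCH)
access_match = preflevel(Pref.ACCESS_PORT, Pref.LLA_MAC_MATCH)
assert trunk_match > preflevel(Pref.LLA_MAC_MATCH)
assert access_match < preflevel(Pref.TRUSTED_ACCESS)
```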

4.2.2.  Entry update algorithm

   Once an entry is installed in the binding table, its attributes
   cannot be changed without complying with this "entry update
   algorithm".

   The algorithm is as follows.  Rules 1 to 5 are evaluated in that
   order until one rule is satisfied:
   1.  Updating an entry is allowed when the preflevel carried by the
       change is bigger than the preflevel stored in the entry.
   2.  Updating an entry is denied if the preflevel carried by the
       change is smaller than the preflevel stored in the entry.
   3.  Updating an entry in state INCOMPLETE is denied if the change
       is not associated with the port where this entry was first
       learnt.
   4.  Updating an entry is denied if the preflevel carried by the
       change is equal to the preflevel stored in the entry, and the
       entry is in state REACHABLE or VERIFY (see Section 4.4).
   5.  Updating an entry is allowed otherwise.
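
   The five rules can be read as a simple ordered decision function.
   The following Python sketch is illustrative; the Entry and Change
   types and the function name update_allowed are hypothetical, and the
   sketch assumes that Entry.port holds the port where the entry was
   first learnt.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    preflevel: int
    state: str   # "INCOMPLETE", "REACHABLE", "VERIFY" or "STALE"
    port: str    # assumed: port where the entry was first learnt


@dataclass
class Change:
    preflevel: int
    port: str    # port on which the candidate binding was seen


def update_allowed(entry: Entry, change: Change) -> bool:
    """Apply rules 1 to 5 in order; the first matching rule decides."""
    if change.preflevel > entry.preflevel:      # rule 1
        return True
    if change.preflevel < entry.preflevel:      # rule 2
        return False
    # From here on the two preflevels are equal.
    if entry.state == "INCOMPLETE" and change.port != entry.port:
        return False                            # rule 3
    if entry.state in ("REACHABLE", "VERIFY"):  # rule 4
        return False
    return True                                 # rule 5
```

   For instance, with equal preflevels a STALE entry can be taken over
   from another port, while a REACHABLE entry cannot.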

4.2.3.  Enabling slow movement

   It is quite a common scenario that an end-node moves from one port
   of the switch to another one, or to a different switch.  It is also
   possible that the end-node updates its hardware and starts using a
   different MAC address.  There are two conflicting goals with the
   trusted binding table: ensure entry ownership and enable movement.
   The former drives the locking of the address, MAC, and port all
   together, and prevents updates other than on the basis of
   preference.  It also works a lot better when the entry lifetime is
   very long or infinite.  The latter requires that a node can easily
   move from one port to another one, or from one MAC to another one.
   Enforcing address ownership will tend to lead to rejection of any
   movement and classify it as an attack.

   The algorithm described in Section 4.2.2, combined with the
   capability to manage entry states reviewed in Section 4.4, enables
   end-nodes to move from one switch port to another port (or from one
   MAC to another) under three scenarios:
   1.  The node disconnects from its original port for at least T1 (T1
       is configurable as described in Section 5) and the move does not
       lead to a less preferred entry.
   2.  The node disconnects for at least T3 (T3 is also configurable).
   3.  The entry seen after the node moves is "preferred", for instance
       because the node moved from an ACCESS_PORT to a TRUSTED_PORT.

   Note that movement driven by T1 is tied to the accuracy of the
   reachability state.  Maintaining this state with an entry tracking
   mechanism, as described in Section 4.3, is going to make it a lot
   more efficient.

4.3.  Binding entry tracking

   In order to maintain an accurate view of device location and device
   state, which is a key element of the binding table entry preference
   algorithm, an entry tracking mechanism can be enabled.  The tracking
   of entries is performed on a per-port, per-IPv6-address basis, by
   "layer-2 unicasting" a DAD NS on the port the address was first
   learnt from, to the Destination MAC (DMAC) known to be bound to that
   address.

   The DMAC can be learnt from the LLA option carried in some NDP
   messages, configured statically, or, as a last resort, taken from
   the source MAC (SMAC) address of NDP messages referring to that
   address.  For NDP messages not sourced from the UNSPECIFIED address,
   the address referred to is the source address of the message.  For a
   DAD NS, it is the target address.

4.4.  Binding table state machine

   The entry lifecycle is driven by the switch, not by NDP: this is
   especially important to ensure that entries are kept as long as
   needed in the table rather than following the rules of the ND cache,
   which are dictated by other requirements.

   Typically, an entry will be created INCOMPLETE, move to REACHABLE
   when the binding is known, move back and forth between REACHABLE and
   VERIFY if tracking is enabled, and at some point move to STALE when
   the device (the address owner) stops talking on the link.  The entry
   could stay in that state for a very long time, sometimes forever,
   depending on the configuration (see Section 5).

   Four states are defined:
   1.  INCOMPLETE: an entry is set in this state when it does not have
       the L3/L2 binding yet.  This happens when an entry is created
       without the LLA.  Typically, such an entry is created when an
       end-node coming up sends a DAD NS to verify address uniqueness
       (a DAD NS does not carry an SLLA option).  Creating an entry in
       that state still requires an L3 address, found in the target
       field of the DAD NS, or in the source field for any other
       message.  While the entry is created INCOMPLETE, the switch
       waits T0 to avoid collision.  Then it unicasts a DAD NS on the
       port where the first message was seen, to the SMAC address found
       in the received frame.  In the absence of a response, the DAD NS
       is retried every T0 up to R0 times.  There are three ways to get
       out of that state:
       *  After R0 retries without seeing a response, the entry is
          deleted.
       *  A response is received, carrying an SLLA option.  The entry
          moves into REACHABLE.
       *  The LLA is received in any other message seen on that port.
          The entry moves into REACHABLE.
   2.  REACHABLE: As soon as the LLA is learnt, the entry moves to
       REACHABLE and, if tracking is enabled, a timer T1 is started
       (see Section 5).  Upon T1 expiration, the entry moves into
       VERIFY state.  If tracking is not enabled, the entry remains at
       most T1 in that state without any reachability hint (obtained
       with NDP inspection or other features), before moving to STALE.
   3.  VERIFY: In this state, a binding is known (L3/L2) but must be
       verified.  A DAD NS is unicast to the L3/L2 destinations and a
       timer T2 is started.  There are two ways to get out of that
       state:
       *  T2 expires: the DAD NS is retried; after R2 retries, the entry
          is moved to STALE.
       *  An NA is received: the binding moves back to REACHABLE.
   4.  STALE: when getting into that state, a timer T3 is started based
       on the configuration (see Section 5).  Upon expiry, the entry is
       deleted.

   The binding table state machine is as follows:

         T0                                                  E1
       +------+ send DAD-NS                             +----------+
       |      | increment r0                            |          |
       |      V                                         |          |
   +---+--------------+                  +--------------+---+      |
   |                  |      E1          |                  |<-----+
   |  INCOMPLETE      +----------------->|  REACHABLE       |
   |                  |           T1     |                  |
   |                  |   /--------------+                  |
   +-----+------------+  /               +------+-----------+
         |R0            /                  A    |     A
         |             /                  /     |     |
         V            /                  /      |T1   |E1
       delete        /                  /       |     |
                    V                  /        V     |
   +------------------+         E1    /  +------------+-----+
   |                  +---------------   |                  |T3
   |   VERIFY         |         R2       |   STALE          +---> delete
   |                  +----------------->|                  |
   |                  |                  |                  |
   +---+--------------+                  +------------------+
       |      A
       |      | send DAD-NS
       +------+ increment r2
          T2

   The following events drive the state transitions:
   o  E1: A link-layer address (LLA) was received for the L3 address.
   o  T0: Timer expired.  Time an entry waits for any binding message
      (NA, etc.) in INCOMPLETE state before another NS is sent, up to
      INCOMPLETE_MAX_RETRIES.
   o  T1: Timer expired.  Time an entry stays in REACHABLE state before
      the switch starts verifying (polling) it or moves it to STALE.
   o  T2: Timer expired.  Time an entry waits for any binding message
      (NA, etc.) in VERIFY state before another NS is sent, up to
      VERIFY_MAX_RETRIES.
   o  T3: Timer expired.  Time an entry is left in STALE state until it
      is deleted or a binding message is received.
   o  R0: Exhaustion of INCOMPLETE_MAX_RETRIES
   o  R2: Exhaustion of VERIFY_MAX_RETRIES

   Default values are as follows:
   o  T0: 3 seconds
   o  T1: 300 seconds
   o  T2: 10 seconds
   o  T3: 24 hours
   o  INCOMPLETE_MAX_RETRIES: 3
   o  VERIFY_MAX_RETRIES: 3

   All the default values should be overridable by configuration.
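
   The states, events and defaults above can be captured in a small
   transition table.  The Python sketch below is one possible reading
   of the state machine; the names State, DEFAULTS and next_state are
   hypothetical, DELETED is a marker introduced only for this sketch,
   and leaving the state unchanged for an event that has no listed
   transition is an assumption.

```python
from enum import Enum


class State(Enum):
    INCOMPLETE = 1
    REACHABLE = 2
    VERIFY = 3
    STALE = 4
    DELETED = 5   # hypothetical marker: entry removed from the table


# Default timers (seconds) and retry counts listed above.
DEFAULTS = {
    "T0": 3,
    "T1": 300,
    "T2": 10,
    "T3": 24 * 3600,
    "INCOMPLETE_MAX_RETRIES": 3,
    "VERIFY_MAX_RETRIES": 3,
}

_TRANSITIONS = {
    (State.INCOMPLETE, "T0"): State.INCOMPLETE,  # resend DAD NS, r0 += 1
    (State.INCOMPLETE, "R0"): State.DELETED,
    (State.INCOMPLETE, "E1"): State.REACHABLE,
    (State.REACHABLE, "E1"): State.REACHABLE,
    (State.VERIFY, "T2"): State.VERIFY,          # resend DAD NS, r2 += 1
    (State.VERIFY, "R2"): State.STALE,
    (State.VERIFY, "E1"): State.REACHABLE,
    (State.STALE, "E1"): State.REACHABLE,
    (State.STALE, "T3"): State.DELETED,
}


def next_state(state: State, event: str, tracking: bool = True) -> State:
    """Return the state reached from `state` when `event` occurs."""
    if state is State.REACHABLE and event == "T1":
        # T1 leads to VERIFY when tracking is on, to STALE otherwise.
        return State.VERIFY if tracking else State.STALE
    # Assumption: events with no listed transition leave the state as is.
    return _TRANSITIONS.get((state, event), state)
```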


5.  Configuration

5.1.  Switch port configuration

   Qualifying a port of the switch is of primary importance to influence
   the "entry update algorithm" (see Section 4.2).  The switch
   configuration should allow the following values to be configured on a
   per-port basis:
   o  TRUNK_PORT: the port of the switch is connected to a port of
      another switch that is not a plb-switch.
   o  ACCESS_PORT: the port of the switch is connected to an end-node.
   o  TRUSTED_PORT: the port of the switch is connected to a trusted
      end-node.
   o  TRUSTED_TRUNK: the port of the switch is connected to another plb-
      switch.

5.2.  Binding table configuration

   The following elements, acting on the binding table behavior, should
   be configurable, globally or on a per-port basis:
   1.  T0: (global) interval at which the switch unicasts a DAD NS to
       obtain the link-layer address of an INCOMPLETE entry.  Default is
       three seconds.  The associated configuration element is:
       *  INCOMPLETE_MAX_RETRIES (R0), which is the maximum number of NS
          sent by the switch before deleting the entry.  Default is 3.
   2.  T1: (per-port) maximum reachable lifetime, the time an entry is
       kept in REACHABLE without sign of activity, before transitioning
       to VERIFY (if "tracking on") or STALE otherwise.  T1 may be set
       to "infinite".  Default value is 300 seconds.
   3.  Tracking on/off: (per-port) when turned on, it enables the
       tracking of entries in the binding table.  Reachability of
       entries is then tested every T1 by unicasting (at layer2) a DAD
       NS (unless reachability is established indirectly by NDP
       inspection).  Associated configuration elements are:
       *  T2: (global) verify-interval, the waiting time before
          re-sending the DAD NS, up to R2 times.  Default value for T2
          is 10 seconds.
       *  VERIFY_MAX_RETRIES (R2), the maximum number of DAD NS the
          switch will unicast to the entry owner before moving the entry
          to STALE.  Default value for R2 is 3.
   4.  T3: (per-port) maximum stale lifetime, the time an entry is kept
       in STALE without sign of activity, before being deleted from the
       binding table.  T3 may be set to "infinite".  Default value is 24
       hours.
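
   As a sketch of how the global and per-port split could be
   represented, the Python fragment below groups the elements above
   into two records.  The names GlobalConfig, PortConfig and PORT_ROLES
   are hypothetical.  This draft does not state a default for tracking,
   so the sketch requires it to be given explicitly, and it represents
   "infinite" with math.inf as an assumption.

```python
from dataclasses import dataclass
import math

# Port qualifications from Section 5.1.
PORT_ROLES = ("TRUNK_PORT", "ACCESS_PORT", "TRUSTED_PORT", "TRUSTED_TRUNK")


@dataclass
class GlobalConfig:
    t0: float = 3.0                  # seconds between NS in INCOMPLETE
    incomplete_max_retries: int = 3  # R0
    t2: float = 10.0                 # seconds between NS in VERIFY
    verify_max_retries: int = 3      # R2


@dataclass
class PortConfig:
    role: str
    tracking: bool                   # no default stated in the draft
    t1: float = 300.0                # max reachable lifetime, seconds
    t3: float = 24 * 3600.0          # max stale lifetime, seconds

    def __post_init__(self):
        if self.role not in PORT_ROLES:
            raise ValueError(f"unknown port role: {self.role}")


# Example: an access port whose STALE entries are never deleted.
access = PortConfig(role="ACCESS_PORT", tracking=True, t3=math.inf)
```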


6.  Bridging NDP traffic

   One important aspect of an "NDP-aware" switch is to efficiently
   bridge the NDP traffic to destinations.  In some areas, the switch
   might behave differently from a regular non plb-switch:
   1.  When intercepting an NDP message carrying binding information,
       the switch can look up its binding table, decide the message is
       not worth bridging and drop it.  This may be the case when a
       binding entry already exists and is not consistent with the one
       being received.
   2.  When the received message is a DAD NS for a target for which the
       switch has a pending (INCOMPLETE) entry, received from a
       different port, the switch may decide to drop it.  If it came
       "second", in the (small) window during which the switch is
       attempting to track the entry, this suggests it might be an
       attack.
   3.  When intercepting a multicast NDP message, such as a DAD NS, for
       which it already has an entry in its binding table, the switch
       may decide to forward it only to the target owner.
   4.  When receiving a DAD NS or other multicast NDP messages, a switch
       enabled for MLD snooping might decide to prevent the bridging of
       the message on trunk ports to other switches (based on MLD
       reports received on these ports).  The plb-switch however may
       decide to force a copy of these messages on these trunks, to
       ensure the other switch is able to populate its own binding
       table.  This behavior should be configurable on a per-port basis.

   The general bridging algorithm is as follows.  When an NDP message is
   received by the layer2 switch, the switch extracts the link-layer
   information, if any.  If no LLA is provided, the switch should bridge
   the message normally to its destination.  If an LLA is provided, the
   switch can look up its binding table for this entry.  If no entry is
   found, it creates one, and bridges the message normally.  If an entry
   is found with attributes consistent with the ones received (port,
   zoneid, etc.), it should bridge the message normally.  If the
   attributes are not consistent, and a change is allowed (see
   Section 4.2), it should update the attributes and bridge the message.
   If the change is disallowed, it should drop the message.
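
   The general algorithm above can be sketched as a single decision
   function.  In the Python sketch below, the message and table
   layouts, the function name handle_ndp and the caller-supplied
   update_allowed callable (standing in for the rules of Section 4.2)
   are all hypothetical.  Comparing only the lla and port attributes
   for consistency is a simplifying assumption.

```python
def handle_ndp(table, msg, update_allowed):
    """Decide what to do with one snooped NDP message.

    table: dict mapping (v6addr, zoneid) -> {"lla": ..., "port": ...}
    msg:   dict with keys "v6addr", "zoneid", "port" and "lla"
           ("lla" is None when the message carries no LLA)
    update_allowed: callable(entry, candidate) -> bool
    Returns "bridge" or "drop".
    """
    if msg["lla"] is None:
        return "bridge"                 # no binding information carried

    key = (msg["v6addr"], msg["zoneid"])
    candidate = {"lla": msg["lla"], "port": msg["port"]}
    entry = table.get(key)

    if entry is None:
        table[key] = candidate          # create the entry
        return "bridge"
    if entry == candidate:
        return "bridge"                 # consistent with stored binding
    if update_allowed(entry, candidate):
        table[key] = candidate          # update the attributes
        return "bridge"
    return "drop"                       # change disallowed
```

   For example, with an update_allowed that always refuses, a second
   node claiming an already bound address from another port has its
   message dropped and the stored binding is left untouched.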

6.1.  Bridging DAD NS

   Bridging DAD NS is critical to both security and binding table
   distribution.  The flows below examine some relevant cases.

   In scenario A, the switch SWITCH_A has only end-nodes connected to
   it.

   Scenario A:

   +--------+          +--------+          +--------+         +--------+
   | host 1 +          |SWITCH_A|          |host 2  |         | host 3 |
   +--------+          +--------+          +--------+         +--------+
       |                   |                   |                  |
       |                switch up              |                  |
       |                   |    DAD NS tgt=X   |                  |
       |                   |<------------------+                  |
       |                no hit                 |                  |
       |                X stored, pref=ACCESS  |                  |
       |                   |                   |                  |
       |  DAD NS tgt=X conditional forward (1) |                  |
       |<------------------O------------------------------------->|
       |  NA               |                   |                  |
       |------------------>|                   |                  |
       |                 hit, newpref=ACCESS   |                  |
       |                 do not replace        |                  |
       |                 drop                  |                  |
       |                   |                   |                  |
       |                   |   ...             |                  |
       |                   |                   |    DAD NS tgt=X  |
       |                   |<-------------------------------------|
       |                 hit, newpref=ACCESS   |                  |
       |                 forward to owner      |                  |
       |                   |------------------>|                  |
       |                   |                   |                  |
       |   DAD NS tgt=X conditional forward (1)|                  |
       |<------------------|                   |                  |
       |                replace                |                  |
       |  NA               |                   |                  |
       |<------------------|                   |                  |
       |                   |                   |                  |
       |                   |                   |                  |

   The switch is assumed to be up before the nodes come up.  As a
   result, since the switch stores an entry for every address it
   snoops, it has a fairly accurate view of the nodes (addresses) on
   the link.  Host 2 comes up and sends a DAD NS for target X, which is
   intercepted by the switch.  SWITCH_A does not have X in its binding
   table, stores it (INCOMPLETE), and bridges it to the other nodes,
   host 1 and host 3.  If MLD snooping is in effect, the switch might
   decide not to forward it at all (no other known group listener for
   the solicited-node multicast group), or to forward it only to a few
   hosts.  Regardless of MLD snooping, flow (1) is not strictly useful
   and could even be harmful.  If we assume the switch knows all
   addresses on the link/vlan, then it knows that nobody owns this
   address yet.  In that case, sending the NS to other hosts would
   invite an attack.  There is a tradeoff between two risks that are
   not equally probable: the risk of breaking DAD and the risk of being
   vulnerable to a DoS on address resolution.

   The latter is well understood: should the switch broadcast the DAD
   NS, an attacker can immediately claim ownership with an NA.  As for
   the former, it would happen if the following conditions are met:
   1.  The initial DAD NS for X, and any subsequent NDP packets (NA to
       all-nodes, etc.) were missed by the switch.
   2.  In addition, either:
       *  the newly received NS carries a duplicate address;
       *  or host 2 is the attacker.  However, it could not have seen X
          yet, since the switch has not, so it would have to learn it
          by non-trivial means.
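
   The switch-side handling of a DAD NS in scenario A can be sketched
   as follows.  This is a minimal, non-normative illustration: the
   names handle_dad_ns, Entry and flood_on_miss, and the use of a port
   name to identify the owner, are assumptions made for this example
   and are not defined by this document.  The flood_on_miss flag
   stands for the tradeoff described above.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    INCOMPLETE = "INCOMPLETE"  # address seen in a DAD NS, not yet confirmed


@dataclass
class Entry:
    port: str      # hypothetical: port on which the address was first seen
    state: State


def handle_dad_ns(table, target, in_port, ports, flood_on_miss=False):
    """Return the list of ports on which a DAD NS for `target` is forwarded."""
    entry = table.get(target)
    if entry is None:
        # No hit: record the tentative address against the ingress port.
        table[target] = Entry(port=in_port, state=State.INCOMPLETE)
        if not flood_on_miss:
            # The switch trusts its table: nobody owns `target`, so
            # forwarding the NS would only invite a spoofed NA.
            return []
        # Flow (1): bridge to every other port to protect DAD.
        return [p for p in ports if p != in_port]
    if entry.port == in_port:
        # Assumption of this sketch: a repeated NS from the recorded
        # port is not sent back to that port.
        return []
    # Hit: only the recorded owner needs to see the NS.
    return [entry.port]
```

   With flood_on_miss left at False, the first DAD NS for X from host 2
   is stored and not forwarded, and a later DAD NS for X from host 3 is
   forwarded only on host 2's port.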

   In scenario B, SWITCH_A is also connected to a second switch,
   SWITCH_B, which runs the same logic to populate its own binding
   table.

   Scenario B:

   +--------+          +--------+          +--------+         +--------+
   | host 1 |          |SWITCH_A|          |SWITCH_B|         | host 2 |
   +--------+          +--------+          +--------+         +--------+
       |                   |                switch up             |
       |                   |                   |    DAD NS tgt=X  |
       |                   |                   |<-----------------|
       |                   |             No hit, no trunk up      |
       |               switch up         X stored in Bt, pref= ACCESS
       |                   |                   |                  |
       |  DAD NS tgt=X     |                   |                  |
       |------------------>|                   |                  |
       |               no hit                  |                  |
       |               X stored, pref=ACCESS   |                  |
       |               forward on trunk (2)    |                  |
       |                   |------------------>|                  |
       |                   |                 hit (host2)          |
       |                   |                   | forward to owner |
       |                   |                   |----------------->|
       |                   |                   |    NA            |
       |                   |                   |<-----------------|
       |                   |                  hit, owner          |
       |                   |     NA           forward on trunk    |
       |                   |<------------------|                  |
       |                hit, newpref=TRUSTED_TRUNK                |
       |                replace                |                  |
       |  NA               |                   |                  |
       |<------------------|                   |                  |
       |                   |                   |                  |
       |                   |                   |                  |

   SWITCH_A may come up after SWITCH_B.  In this case, it is unaware of
   the end-nodes attached to SWITCH_B.  SWITCH_B, however, knows all of
   them, under the same assumptions as in scenario A.  Upon receiving a
   DAD NS for target X, and in the absence of a hit, SWITCH_A creates
   an INCOMPLETE entry and forwards the NS to SWITCH_B.
   1.  If SWITCH_B has X in its table, then it can forward the NS only
       on the interface of X's owner (host 2).  Host 2 responds, and
       the response reaches SWITCH_A.  SWITCH_A already has an entry
       for X associated with the interface to host 1, while this
       response is received from the trunk.  The trunk is a
       TRUSTED_TRUNK, hence entries received over it are preferred.
       SWITCH_A updates its binding table and propagates the NA to
       host 1.  This is the case of a valid address duplication.
   2.  If SWITCH_B, receiving the DAD NS over the trunk, does not have
       X in its table, it can drop the NS while creating an INCOMPLETE
       entry for X.  Or it can broadcast locally (with the same
       reasoning as for the previous scenario).

   Scenario C connects SWITCH_A to a SWITCH_B that does not run the same
   binding table algorithm (referred to as a non plb-switch).  In this
   scenario, SWITCH_A forwards a DAD NS for target X on the trunk.
   Configuration should tell whether any response coming from SWITCH_B
   is to be trusted (in the absence of a better credential such as a
   CGA/RSA proof).  If SWITCH_B is fully trusted, then the trunk is
   configured as "TRUSTED_TRUNK" and scenario B applies.  Otherwise,
   the trunk is configured as "TRUNK" and the response is ignored.

   Scenario C:

   +--------+          +--------+          +--------+         +--------+
   | host 1 |          |SWITCH_A|          |SWITCH_B|         | host 2 |
   +--------+          +--------+          +--------+         +--------+
       |                   |                switch up             |
       |                   |                   |   DAD NS tgt=X   |
       |                   |                   |<-----------------|
       |                   |                   |                  |
       |               switch up               |                  |
       |                   |                   |                  |
       |  DAD NS tgt=X     |                   |                  |
       |------------------>|                   |                  |
       |               no hit                  |                  |
       |               X stored, pref=ACCESS   |                  |
       |                   |------------------>|                  |
       |                   |                   |   to group       |
       |                   |                   |----------------->|
       |                   |                   |   NA             |
       |                   |                   |<-----------------|
       |                   |     NA            |                  |
       |                   |<------------------|                  |
       |                hit, newpref=TRUNK     |                  |
       |                do not replace         |                  |
       |                drop NA                |                  |
       |                   |                   |                  |
       |                   |                   |                  |
       |                   |                   |                  |
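
   In all three diagrams, what happens to an NA that hits an existing
   entry depends on how the preference level of the receiving port
   compares with the level already stored.  The sketch below is
   illustrative only.  The function name on_na, the tuple layout of a
   table entry and the numeric values of PrefLevel are assumptions made
   for this example; only the ordering TRUNK < ACCESS < TRUSTED_TRUNK
   is inferred from the diagrams.  The replacement shown at the end of
   the scenario A diagram is not modelled.

```python
from enum import IntEnum


class PrefLevel(IntEnum):
    # Placeholder values: only the relative ordering matters here.
    TRUNK = 1
    ACCESS = 2
    TRUSTED_TRUNK = 3


def on_na(table, target, in_port, new_pref):
    """Process an NA for `target`; return "create", "replace" or "drop".

    `table` maps an address to a (port, PrefLevel) tuple.
    """
    current = table.get(target)
    if current is None:
        table[target] = (in_port, new_pref)
        return "create"
    _, cur_pref = current
    if new_pref > cur_pref:
        # Scenario B: an NA from a TRUSTED_TRUNK overrides an ACCESS entry.
        table[target] = (in_port, new_pref)
        return "replace"
    # Scenario A (equal level) and scenario C (TRUNK below ACCESS):
    # keep the existing entry and drop the NA.
    return "drop"
```

   An NA received on a port configured as TRUNK therefore never
   displaces an entry learned on an access port, which is the behaviour
   shown for the non plb-switch case.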


6.2.  Bridging other NDP messages

   When running the proposed binding table population algorithm,
   switches are expected to have an accurate view of the end-nodes
   attached to them.  While scenario C is problematic, scenarios A and
   B are clearer.  If a switch has an entry in its table that conflicts
   with the binding observed in an NDP message just received, it should
   either drop the message (if the new data has a smaller preflevel) or
   update its entry and bridge the message.

   If the switch does not have such an entry, it should create the
   entry and bridge the message, including to trunks.

   Multicast messages should be bridged on trunks regardless of group
   registration, to give other switches a chance to build up a more
   accurate binding table.
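
   The bridging rules of this section can be summarised by the sketch
   below.  It is non-normative: the function names, the representation
   of a binding as a (port, preflevel) tuple and the integer preflevel
   values are assumptions made for this example.  The text above does
   not say what to do when the two preflevels are equal; the sketch
   keeps the existing entry in that case, following the scenario A
   diagram.

```python
def process_ndp(table, addr, in_port, preflevel):
    """Update `table` (addr -> (port, preflevel)); return "drop" or "bridge"."""
    entry = table.get(addr)
    if entry is None:
        # No entry: create it and bridge the message.
        table[addr] = (in_port, preflevel)
        return "bridge"
    if entry[0] == in_port:
        # Message agrees with the stored binding.
        return "bridge"
    if preflevel <= entry[1]:
        # Conflict, and the new data is not preferred (ties assumed to lose).
        return "drop"
    # Conflict, new data preferred: update the entry and bridge.
    table[addr] = (in_port, preflevel)
    return "bridge"


def multicast_out_ports(in_port, group_listeners, trunk_ports):
    """Output ports for a bridged multicast NDP message.

    Trunks are always included, whatever the group registration state.
    """
    return (set(group_listeners) | set(trunk_ports)) - {in_port}
```

   For example, a multicast message received on an access port with no
   registered listener for the group is still sent on every trunk.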


7.  Normative References

   [RFC3971]  Arkko, J., Kempf, J., Zill, B., and P. Nikander, "SEcure
              Neighbor Discovery (SEND)", RFC 3971, March 2005.

   [RFC3972]  Aura, T., "Cryptographically Generated Addresses (CGA)",
              RFC 3972, March 2005.

   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
              "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
              September 2007.

   [RFC4862]  Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless
              Address Autoconfiguration", RFC 4862, September 2007.

   [fcfs]     Nordmark, E. and M. Bagnulo, "First-Come First-Serve
              Source-Address Validation Implementation",
              draft-ietf-savi-fcfs-01 I-D, March 2009.


Appendix A.  Contributors and Acknowledgments

   This draft benefited from input from Pascal Thubert.


Author's Address

   Eric Levy-Abegnoli
   Cisco Systems
   Village d'Entreprises Green Side - 400, Avenue Roumanille
   Biot-Sophia Antipolis - 06410
   France

   Email: elevyabe@cisco.com





