draft-ietf-trill-rbridge-protocol-02

TRILL Working Group                                        Radia Perlman
INTERNET-DRAFT                                                       Sun
                                                             Silvano Gai
                                                           Nuova Systems
                                                          Dinesh G. Dutt
                                                                   Cisco
Expires: July 2007                                          January 2007



                 Rbridges: Base Protocol Specification
                 --------- ---- -------- -------------
               <draft-ietf-trill-rbridge-protocol-02.txt>


Status of This Document

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Distribution of this document is unlimited. Comments should be sent
   to the TRILL working group mailing list.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Abstract

   RBridges provide the ability to have an entire campus, with multiple
   physical links, look to IP like a single subnet. The design allows
   for zero configuration of switches within a campus, optimal pair-wise
   routing, safe forwarding even during periods of temporary loops, and
   the ability to cut down on ARP/ND traffic. The design also supports
   VLANs, and allows forwarding tables to be based on RBridge
   destinations (rather than endnode destinations), which allows
   internal routing tables to be substantially smaller than in
   conventional bridge systems.


Radia Perlman et al.                                            [Page 1]


INTERNET-DRAFT                                          RBridge Protocol


Acknowledgements

   Many people have contributed to this design, including the working
   group co-chairs Donald Eastlake 3rd and Erik Nordmark, and many other
   members of the working group such as, in alphabetic order, Alia
   Atlas, Stewart Bryant, Dino Farinacci, Don Fedyk, Eric Gray, Sanjay
   Sane, and Joe Touch. We invite you to join the mailing list at
   http://www.postel.org/rbridge.


Table of Contents

      Status of This Document....................................1
      Abstract...................................................1
      Acknowledgements...........................................2

      1. Introduction............................................3
      2. Detailed Rbridge Design.................................7
      2.1. Link State Protocol...................................7
      2.1.1. Separate Instances..................................7
      2.1.2. Multiple Rbridge IS-IS Instances....................8
      2.2. Distribution Tree Calculation.........................9
      2.3. Pruning the Ingress Rbridge Tree.....................10
      2.4. Designated Rbridge...................................12
      2.5. Wiring Closet Topology...............................13
      2.6. Learning Endnode Location............................14
      2.7. Forwarding Behavior..................................14
      2.7.1. Receipt of a Native Packet.........................14
      2.7.2. Receipt of an In-transit Packet....................15
      2.7.2.1. Flooded Packet...................................15
      2.7.2.2. Unicast Packet...................................16
      2.7.2.3. IS-IS Packet.....................................16
      2.8. IGMP Learning........................................16
      2.9. RBridge Nicknames....................................16
      2.10. Forwarding Header on 802 Links......................17
      2.11. Handling ARP/ND Queries.............................18
      2.12. Discovering IP Multicast Routers....................20
      2.13. Assuring Freshness of Endnode Information...........20
      3. Rbridge Addresses, Parameters, and Constants...........21
      4. Security Considerations................................22
      5. IANA Considerations....................................22
      6. Conclusions............................................22
      7. References.............................................23
      7.1. Normative References.................................23
      7.2. Informative References...............................23
      Disclaimer................................................24
      Additional IPR Provisions.................................24
      Author's Address..........................................25
      Expiration and File Name..................................25



1. Introduction

   In traditional IPv4 and IPv6 networks, each subnet must have a unique
   prefix.  This means that a node that moves from one subnet to another
   must change its IP address, and a node in multiple subnets must have
   multiple addresses.  It also means that a company with many subnets
   (separated by routers) will have difficulty making full use of its IP
   address block (since any subnet not fully populated will waste
   addresses), and IP routers require significant configuration. Bridges
   avoid these problems because bridges can transparently glue many
   physical links into what appears to IP to be a single LAN.

   However, bridge forwarding via the spanning tree using the layer 2
   header has some disadvantages:

   o The spanning tree limits which links can be used, and therefore
      concentrates traffic onto selected links

   o Forwarding based on a header without a TTL is dangerous, because
      temporary loops might arise due to topology changes, lost spanning
      tree messages, or components such as repeaters coming up

   o Forwarding cannot be pair-wise shortest paths, but instead whatever
      path remains after the spanning tree eliminates redundant paths

   We define the term "campus" to be the set of links connected by any
   combination of RBridges and bridges. A campus appears to IP nodes to
   be a single subnet.

   This document presents the design for RBridges (routing bridges),
   which combines the advantages of bridges and routers [RBridges].
   Like bridges, RBridges are zero configuration, and are transparent to
   IP nodes.  Like routers, RBridges forward on pair-wise shortest
   paths, and do not have dangerous behavior during temporary loops.
   RBridges have the additional advantage that they can optimize ARP
   (IPv4) [RFC826] and ND (IPv6) [RFC2461] by avoiding the broadcast /
   multicast behavior of the queries.

   RBridges are fully compatible with current bridges as well as current
   IPv4 and IPv6 routers and endnodes.  They are as invisible to current
   IP routers as bridges are, and like routers, they terminate a bridged
   spanning tree.

   The main idea is to have RBridges run a link state protocol amongst
   themselves. This enables them to have enough information to compute
   pairwise optimal paths for unicast, and to calculate distribution
   trees for delivery of packets to unknown destinations, or multicast /
   broadcast packets.

   RBridges must learn the location of endnodes. They learn the location
   and layer 2 addresses of attached nodes from the source address of
   native frames, as bridges do. Additionally, in order to facilitate
   proxy ARP or proxy ND optimizations, RBridges also learn the (layer
   3, layer 2) addresses of attached IP nodes from ARP or ND replies.

   Once an RBridge learns the location of a directly attached endnode,
   it informs the other RBridges in its link state information.

   RBridge forwarding can be done, as with a router, via pairwise
   shortest paths.

   To mitigate the temporary loop issues with bridges, RBridges must
   always forward based on a header with a hop count. Although the hop
   count will quickly discard looping frames, it is also desirable not
   to spawn additional copies of frames. This can be accomplished by
   having RBridges specify the next RBridge recipient while forwarding
   across a shared-media link.

   Frames must be encapsulated as they travel between RBridges for
   several reasons:

   1. to prevent source MAC learning from frames in transit

   2. to direct frames towards the egress RBridge.  This enables
      forwarding tables of RBridges to be sized with the number of
      RBridges rather than the total number of nodes in the common
      broadcast domain

   3. to include a hop count for frames in transit (for links, like
      Ethernet, that do not already contain a hop count)

   In order to coexist with Ethernet bridges [802.1D] on Ethernet links,
   frames in transit on Ethernet links must be encapsulated with an
   Ethernet header. The outer header of an RBridge-forwarded frame must
   look, to an Ethernet bridge on the path between two RBridges, like
   the header of a normal frame that the bridge will forward. To enable
   RBridges to distinguish encapsulated frames, a new Ethertype (to be
   assigned) will be used in the outer header.

   Inside that header is a shim header that RBridges will add to the
   frame that will contain:

   o the ingress-RBridge and a second RBridge which is either the root
      RBridge of a distribution tree (in the case of a broadcast /
      multicast / unknown destination frame) or the egress-RBridge (in
      the case of a unicast frame to a known destination)

   o  a hop count

   The simplest encoding of two RBridge IDs in the shim header would be
   two 6-byte MAC addresses. However, to achieve a more compact
   encoding, RBridges piggyback a nickname acquisition protocol on the
   link state protocol, to acquire unique (within the campus) 2-byte
   nicknames, and the nickname specifies the RBridge in the shim header.

   Inside the shim header is the original frame, as injected into the
   campus.
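
   The layering described above (outer Ethernet header, shim header,
   original frame) can be sketched as follows. This is only an
   illustration under stated assumptions: the Ethertype has not been
   assigned, so the IEEE local-experimental value 0x88B5 stands in for
   it, and the shim field order and the 1-byte hop count width are
   hypothetical choices, not taken from this document.

```python
import struct

# Assumptions for illustration only: the real Ethertype is "to be
# assigned", and the field order and hop count width are hypothetical.
# 0x88B5 is the IEEE local-experimental Ethertype.
PLACEHOLDER_ETHERTYPE = 0x88B5
# 2-byte ingress nickname, 2-byte second nickname, 1-byte hop count
SHIM = struct.Struct("!HHB")


def encapsulate(next_hop_mac, my_mac, ingress_nick, second_nick,
                hop_count, inner_frame):
    """Wrap an original frame in an outer Ethernet header plus shim."""
    outer = next_hop_mac + my_mac + struct.pack("!H", PLACEHOLDER_ETHERTYPE)
    shim = SHIM.pack(ingress_nick, second_nick, hop_count)
    return outer + shim + inner_frame


def decapsulate(frame):
    """Return (ingress_nick, second_nick, hop_count, inner_frame)."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != PLACEHOLDER_ETHERTYPE:
        raise ValueError("not an RBridge-encapsulated frame")
    ingress, second, hops = SHIM.unpack_from(frame, 14)
    return ingress, second, hops, frame[14 + SHIM.size:]
```

   With 2-byte nicknames the two RBridge identifiers occupy 4 bytes in
   this sketch, compared with 12 bytes for two MAC addresses.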

   RBridges must also support VLANs.

   A VLAN is a mechanism used within layer 2 to partition
   endnodes into different communities [802.1Q].  The usual method of
   determining which community a frame belongs to is based on the port
   from which it is received. The first bridge inserts a VLAN tag, based
   on its port configuration, and the last bridge removes the VLAN tag.
   However, sometimes the VLAN tag might be inserted or deleted by an
   endnode on a link (where "endnode" is a source or sink of traffic on
   the bridged LAN).

   RBridges may be configured with VLAN membership per port, just as
   bridges are. They will also ensure that a frame originating on a
   particular VLAN is delivered only to other links in the same VLAN.

   A side-effect of VLANs is that they make RBridges more scalable,
   since endnode membership in a VLAN is only of interest to RBridges
   that have an attached port configured to be in that VLAN. This means
   that endnode membership in VLAN A only needs to be announced to
   RBridges attached to a link in VLAN A.

   There are several types of frames which RBridges must deliver, and
   each is handled slightly differently:

   1. frames for known unicast destinations

   2. frames for unknown unicast destinations

   3. frames for layer 2 multicast addresses derived from IP multicast
      addresses

   4. frames for layer 2 broadcast / multicast addresses which are not
      derived from IP multicast addresses

   5. ARP/ND queries

   6. IGMP membership reports

   If a frame belongs in a particular VLAN, the frame must be delivered
   only to links in that VLAN.

   RBridges will calculate a distribution tree for each potential root
   RBridge Ri, which we will refer to as the "Ri-tree". In theory,
   RBridges could have calculated a single bidirectional distribution
   tree for the entire campus. However, it was decided that the
   additional computation necessary to compute additional RBridge trees
   was warranted because:

   1. using a tree rooted at the ingress RBridge optimizes the
      distribution path and (almost always) the cost of delivery when
      the destination links are a subset of all the links, as is the
      case with VLANs and IP multicasts

   2. for unknown destinations, using a tree rooted at the ingress
      RBridge minimizes out-of-order delivery because, in the case where
      a flow starts before the location of the destination is known by
      the RBridges, the path that flooded frames take to the destination
      will be the same as the direct path used once the location is
      known

   RBridges will not use the bridge spanning tree algorithm to calculate
   trees. Instead, the trees are calculated based on the link state
   information, selecting a particular RBridge as the root, and with a
   deterministic tie-breaker so all RBridges calculate the same
   distribution tree based on the same root and same link state
   database. Therefore the tree calculation is done without requiring
   any additional exchange of information between RBridges.

   Instead of calculating a single shared tree, the other extreme is to
   calculate a separate tree for each ingress RBridge, and distribute
   multicast along the tree with the ingress RBridge as root (where
   VLAN-tagged traffic and IP multicast traffic can be pruned, but
   otherwise all multicast traffic with the same ingress travels on the
   same links). There are two reasons why this solution might not be
   preferable:

   1. In some cases, a different tradeoff might be wanted in terms of
      expense of computation vs. optimality of traffic distribution (so
      fewer trees would be desired)

   2. It might be desirable to allow choosing a different distribution
      tree than the one rooted at the ingress RBridge, in order to allow
      multipathing of multicast traffic injected by a particular
      RBridge.

   For this reason, we allow an RBridge R1 to announce (via a flag in
   its link state announcement) whether RBridges should compute a tree
   rooted at R1. The default is yes. If R1 is a tree root, then any
   RBridge R2 can choose the R1-tree for distribution of multicast
   traffic that R2 is injecting into the campus. And in the shim header,
   RBridges can specify which (bidirectional) tree the multicast packet
   should travel along. Therefore, the default is tree calculation
   rooted at every RBridge, but configuration can cut down on the number
   of trees that must be calculated.


   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].



2. Detailed Rbridge Design



2.1. Link State Protocol

   Running a link state protocol among RBridges is straightforward.  It
   is the same as running a level 1 routing protocol in an area, with
   endnode addresses being layer 2 addresses rather than, say, IP
   addresses.  IS-IS [ISO10589] is a natural choice for a link state
   protocol because it is easy in IS-IS to define new TLVs for carrying
   new information, and because IS-IS can be done with zero
   configuration. All that is required to run IS-IS is for each RBridge
   to have a unique 6-byte system ID, which can be any of the RBridge's
   MAC addresses.



2.1.1. Separate Instances

   The instance of IS-IS that RBridges will implement is separate from
   any routing protocol that IP routers will implement, just as the
   spanning tree messages are not implemented by IP routers.

   To prevent potential confusion between an IS-IS instance being run by
   IP routers and the IS-IS instance being run by RBridges, RBridge IS-
   IS messages will be sent to a different layer 2 multicast address
   than layer 3 IS-IS routing messages.  The RBridge IS-IS instance is
   also differentiated by defaulting to a distinct, constant "area
   address" (the value 0) that would never appear as a real IS-IS area
   address.

   RBridge IS-IS messages will be sent with the same Ethertype (in the
   outer header) as RBridge-encapsulated data packets. RBridge IS-IS
   messages will be differentiated from RBridge-encapsulated data
   packets because RBridges will use a different multicast address (in
   the outer header) for IS-IS messages than for encapsulated multicast
   data messages. Unicast RBridge-encapsulated packets are sent to a
   specific neighbor, so would not have a group address in the outer
   header.






2.1.2. Multiple Rbridge IS-IS Instances

   There are two types of information that are carried in RBridge link
   state information: "core-RBridge information" and "endnode
   information". In theory this information could all be contained in
   one instance of RBridge IS-IS. However, since endnode information for
   a particular VLAN only needs to be known to RBridges that are
   connected to links configured to be in that VLAN, each RBridge R1
   will run a "core" instance of IS-IS for the core RBridge information,
   and an instance per VLAN that R1 is attached to, for the endnode
   information for those VLANs.

   The core-RBridge information, which is carried in the core-RBridge
   instance, is:

   1. the system IDs of RBridges which are neighbors of RBridge R1, and
      the cost of the link to each of those neighbors

   2. the VLAN numbers of VLANs directly connected to R1

   3. a Flag indicating whether RBridges should calculate a tree rooted
      at R1 (default = yes)

   4. the nickname of RBridge R1

   Even if RBridge R2 is not connected to VLAN A, it is relevant to R2
   that some other RBridge R1 is connected to VLAN A, even though R2
   does not need to know which endnodes are in VLAN A. The reason for
   this is to allow R2 to filter multicast / unknown destination packets
   that are VLAN-tagged. If R2 is forwarding a multicast packet tagged
   with VLAN A, R2 need not forward it onto branches of the distribution
   tree that have no downstream VLAN A links.

   The endnode information for VLAN A, which is carried in the VLAN A
   IS-IS instance injected by R1, contains:

   1. L2INFO: layer 2 addresses of nodes on a VLAN A link attached to R1
      which have transmitted frames but have not transmitted ARP or ND
      replies (i.e., these are not known to be IP nodes)

   2. L3and2INFO: layer 3, layer 2 addresses of IP nodes attached to R1,
      which R1 has learned through ARP/ND replies emitted by endnodes on
      an attached VLAN A link.  For data compression, only the portion
      of the address following the campus-wide prefix need be carried.
      (This is a more important optimization for IPv6 than for IPv4)

   3. Multicast Router attached: This is one bit of information that
      indicates whether there is an IP multicast router attached. This
      information is used because IGMP [RFC3376] Membership Reports must
      be transmitted to all links with IGMP routers, and not to links
      without IGMP routers. Also, all packets for IP-derived multicast
      addresses must be transmitted to all links with IGMP routers
      (within the VLAN), in addition to links from which an IP node has
      explicitly asked to join the group which the packet is for.

   4. Layer 2 addresses derived from IPv4 or IPv6 IGMP notification
      messages received from attached endnodes, indicating the location
      of listeners for these multicast addresses.

   If R1 has learned endnode E's location first from a data packet (and
   therefore has included E's layer 2 address in the L2INFO), and later
   E transmits an ARP/ND reply, R1 MUST include E in the L3and2INFO,
   and MAY remove E from L2INFO.
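
   The bookkeeping for the two endnode lists can be sketched as below.
   This is a minimal illustration, not a prescribed data structure: the
   class and attribute names are hypothetical, addresses are plain
   strings, and the sketch exercises the MAY by removing the endnode
   from the layer-2-only list once an ARP/ND reply is seen.

```python
class VlanEndnodeInfo:
    """Endnode information R1 would advertise for one VLAN (sketch)."""

    def __init__(self):
        # Layer 2 addresses seen only as sources of native frames.
        self.l2info = set()
        # Layer 2 address -> set of layer 3 addresses from ARP/ND replies.
        self.l3_l2_info = {}

    def learn_from_data(self, mac):
        # Only list a node as layer-2-only if it is not a known IP node.
        if mac not in self.l3_l2_info:
            self.l2info.add(mac)

    def learn_from_arp_nd_reply(self, mac, ip):
        # MUST: record the (layer 3, layer 2) pair.
        self.l3_l2_info.setdefault(mac, set()).add(ip)
        # MAY: drop the now-redundant layer-2-only entry (done here).
        self.l2info.discard(mac)
```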

   Given that RBridges must already support delivery only to links
   within a VLAN (for multicast or unknown frames marked with the VLAN's
   tag), the same mechanism is used by the per-VLAN instance of IS-IS to
   distribute endnode information solely to RBridges within a VLAN.

   The per-VLAN instance of IS-IS will appear to the RBridges to be
   running on a single link. R1 will originate a VLAN-A-specific IS-IS
   frame.  All RBridges will recognize the frame as a VLAN A multicast
   frame (even if they are not connected to VLAN A), and prune the
   specified distribution tree so as to only deliver the frame along
   branches with VLAN A links. This is the same behavior core RBridges
   would have for any VLAN A multicast / broadcast / unknown destination
   frame. RBridges that are connected to VLAN A links will, in addition
   to forwarding along the specified distribution tree, process the
   frame in their VLAN-A IS-IS instance.

   Thus suppose that RBridges R1, R2, and R3 are all on VLAN A, on links
   scattered throughout the campus. The VLAN A IS-IS instance will
   appear to be a single link (broadcast domain) with R1, R2, and R3 as
   neighbors. The only information carried in the instance is the
   endnode information for VLAN A. The other RBridges on the campus
   facilitate delivery within the VLAN A broadcast domain, and therefore
   may be on the path between R1 and R2, but will treat the VLAN A
   instance link state frames as ordinary datagrams.

   The way that RBridges distinguish which IS-IS instance the link state
   information is for is based on the VLAN tag in the inner header.



2.2. Distribution Tree Calculation

   Some frames (e.g., to unknown destinations, or multicast
   destinations) will need to be delivered to multiple links. RBridges
   must calculate at least one tree, and the default is to calculate a
   tree for every RBridge. However, in order to avoid requiring the
   RBridges in a campus to calculate that many trees, each RBridge MAY
   be configured to indicate that it should not be the root of a
   distribution tree.

   If all RBridges have their flags set to "don't calculate a tree
   rooted at me", then RBridges MUST calculate a tree rooted at the
   RBridge with lowest ID, regardless of that lowest-ID RBridge's flag
   setting, and that tree MUST be the one specified for all
   multicast/unknown frames.
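
   The root selection rule above can be sketched as follows. The
   function name and the representation of IDs as comparable Python
   values are assumptions made for illustration.

```python
def tree_roots(wants_root):
    """Return the IDs of RBridges at which trees must be calculated.

    wants_root maps an RBridge ID to the flag from its link state
    announcement (True means "calculate a tree rooted at me").
    """
    roots = sorted(rb for rb, flag in wants_root.items() if flag)
    if not roots:
        # Every RBridge declined: fall back to the lowest ID,
        # regardless of that RBridge's own flag setting.
        roots = [min(wants_root)]
    return roots
```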

   In IS-IS a shared link is modeled as a pseudonode, with a 7-byte ID
   consisting of a 6-byte ID owned by the Designated Router (DR), plus a
   nonzero byte assigned by the DR. The "I want to be a Root" flag is
   defaulted to "no" for pseudonodes.

   Calculation of a tree rooted at R1 is done by performing the SPF
   (shortest path first) calculation with R1 as the root, and with a
   deterministic tie-breaker, so that all RBridges calculate the same
   distribution tree. The tie-breaker is that if a node N can be
   attached to either parent P1 or P2 with the same minimal path cost
   from R1 to N, then choose P1 if P1's ID is lower than P2's.
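
   One way to realize this calculation is sketched below: a standard
   Dijkstra pass computes the minimal cost to every node, and each node
   then picks the lowest-ID neighbor among those that lie on a minimal
   cost path. The sketch assumes symmetric links with positive integer
   costs and IDs that compare with "<"; the function and variable names
   are hypothetical.

```python
import heapq


def distribution_tree(links, root):
    """Return {node: parent} for the tree rooted at `root`.

    links: {node: {neighbor: cost}}, symmetric, positive integer costs
    (an assumption of this sketch).
    """
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in links[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))

    parent = {}
    for n in dist:
        if n == root:
            continue
        # All neighbors through which n is reached at minimal cost.
        candidates = [p for p, cost in links[n].items()
                      if p in dist and dist[p] + cost == dist[n]]
        parent[n] = min(candidates)  # tie-breaker: lowest parent ID
    return parent
```

   Because the tie-breaker depends only on the link state database and
   the chosen root, every RBridge running this on the same inputs
   arrives at the same parent map.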

   The calculated tree is a bidirectional tree. Each RBridge R keeps a
   set of adjacencies (port, neighbor pair) selected for each
   distribution tree. So, for instance, for the distribution tree rooted
   at R1, R chooses the adjacency which connects R to its parent in that
   SPF tree, as well as any adjacencies that connect children to R. Once
   the adjacencies are chosen, it is irrelevant which ones are towards
   the root R1, and which are away from R1. So R might have calculated
   that adjacencies a, c, and f are in the tree. That means that if
   there is a multicast packet that indicates it should be transmitted
   on distribution tree R1, and it is received on any adjacency other
   than a, c, or f, R should discard the packet. If it is received on
   any of the selected adjacencies (a, c, or f), then R should forward
   onto the other two adjacencies.
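
   The adjacency selection and the accept-or-discard check described
   above can be sketched as follows. For brevity an adjacency is
   identified here by the neighbor alone rather than by a (port,
   neighbor) pair, and the parent map format and function names are
   assumptions of this sketch.

```python
def tree_adjacencies(parent, r):
    """Adjacencies of RBridge r in a tree given as {node: parent}."""
    adj = {child for child, p in parent.items() if p == r}
    if r in parent:
        adj.add(parent[r])  # the adjacency towards the root
    return adj


def forward_on_tree(adjacencies, arrival):
    """Return the adjacencies to forward on; empty set means discard."""
    if arrival not in adjacencies:
        return set()
    # Direction is irrelevant: send on every selected adjacency
    # except the one the packet arrived on.
    return set(adjacencies) - {arrival}
```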



2.3. Pruning the Ingress Rbridge Tree

   Packets which must be flooded (e.g., multicasts, unknown
   destinations), are flooded along the selected distribution tree
   rooted at the second RBridge specified in the shim header, and pruned
   based on whether there are potential receivers downstream of each of
   the branches. In the case of a VLAN-tagged packet, it is forwarded
   only on branches that have RBridges participating in that VLAN
   reachable via that branch.

   Further pruning is done in the case of IGMP Notification Messages
   [RFC3376], where these are to be delivered only to ports with IP
   Multicast Routers. In the case of a multicast derived from an IP
   multicast, these multicast data packets are delivered only to links
   that have registered listeners, plus links which have IP Multicast
   routers.

   The actual tree to forward along is chosen based on the specified
   RBridge in the shim header, say R1. Say that RBridge R knows that
   adjacencies (a, c, and f) are in the R1-distribution tree.

   R marks pruning information for each of the adjacencies in the
   R1-tree. For each adjacency for each tree, R marks:

   o  Set of VLANs reachable downstream, and for each one of those, a
      flag indicating whether there are IP multicast routers downstream,
      and the set of layer 2 multicast addresses derived from IP
      multicast group for which there are receivers downstream

   Forwarding of a non-unicast packet is done as follows when RBridge R
   receives the packet on one of its adjacencies and the shim header
   indicates that the selected tree is the R1-tree:

      - if the adjacency from which the packet was received is not one
        of the adjacencies in the R1-tree, the packet is dropped

      - if the inner packet is tagged as belonging to VLAN A, and the
        packet is unknown destination, or layer 2 multicast not derived
        from IP multicast, or distributed ARP/ND, the packet is
        forwarded onto an adjacency if and only if that adjacency is in
        the R1-tree, and marked as reaching VLAN-A links

      - if the packet is an IGMP Notification message, marked as VLAN-A,
        then the packet is forwarded onto adjacencies in the R1-tree
        that indicate there are downstream VLAN-A IP multicast routers

      - if the packet is a layer 2 multicast address derived from IP
        multicast groups, and marked as VLAN-A, then the packet is
        forwarded onto adjacencies in the R1-tree that indicate there
        are downstream VLAN-A IP multicast routers, as well as
        adjacencies that indicate there are downstream VLAN-A receivers
        for that group address.
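
   The rules above can be sketched as a single decision function. The
   data layout (a per-adjacency map from VLAN to pruning marks), the
   packet kind labels, and all names are hypothetical and chosen only
   for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VlanMark:
    """Pruning marks for one VLAN on one adjacency (sketch)."""
    mrouters: bool = False  # IP multicast routers downstream
    # IP-derived layer 2 group addresses with receivers downstream
    groups: set = field(default_factory=set)


def egress_adjacencies(tree, arrival, vlan, kind, group=None):
    """Choose the adjacencies on which to forward a non-unicast packet.

    tree: {adjacency: {vlan: VlanMark}} for the tree named in the shim.
    kind: "flood" (unknown destination, non-IP multicast, distributed
          ARP/ND), "igmp" (IGMP notification), or "ipmc" (IP-derived
          multicast data; `group` is its layer 2 address).
    """
    if arrival not in tree:
        return set()  # not received on a tree adjacency: drop
    out = set()
    for adj, marks in tree.items():
        if adj == arrival or vlan not in marks:
            continue  # never send back; prune branches without the VLAN
        mark = marks[vlan]
        if kind == "flood":
            out.add(adj)
        elif kind == "igmp" and mark.mrouters:
            out.add(adj)
        elif kind == "ipmc" and (mark.mrouters or group in mark.groups):
            out.add(adj)
    return out
```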

   For each link for which R is Designated RBridge, R additionally
   checks to see if it should decapsulate the frame and send it to the
   link (e.g., if it is a distributed ARP in the specified VLAN for that
   link), or process the packet (e.g., if it is a per-VLAN IS-IS
   instance link state announcement for a VLAN that R is attached to).






2.4. Designated Rbridge

   One RBridge on each link needs to be elected to have special duties.
   This elected RBridge is known as the Designated RBridge. IS-IS
   already holds such an election.

   The Designated RBridge is the one on the link that will learn and
   advertise the identities of attached endnodes, encapsulate and
   forward frames that originate on that link to the rest of the campus,
   decapsulate and forward frames onto that link received from other
   RBridges, initiate a distributed ARP when an ARP query is received
   for an unknown destination, and answer ARP queries when the target
   node is known.

   It is dangerous to have multiple RBridges being Designated RBridge.
   This could temporarily happen if a partitioned bridged LAN were
   connected with a bridge or repeater. The situation will resolve once
   the better priority RBridge's IS-IS Hello is received by the other
   RBridges on the link. However, it is possible that some intervening
   bridges might be temporarily discarding the IS-IS Hello messages due
   to being in preforwarding state.

   The one message type that is not delayed due to preforwarding state
   is the spanning tree BPDU. If RBridges listen to BPDUs, and if the
   LANs for which R1 was DR and for which R2 was DR get joined, then
   one or the other of R1 or R2 will note that the bridge Root has
   changed identity; let's say R2 notices.

   The conservative thing to do would be to invoke something like a
   preforwarding state, in which R2 stops forwarding anything to or from
   the link until it is sure the IS-IS link election would have
   completed. But the IS-IS election could get slowed down due to
   bridges in preforwarding state, and it would be undesirable to
   disrupt traffic to and from the link just because the root ID has
   changed.

   The solution is to have RBridges participate in the spanning tree
   election, with higher priority for becoming root (actually, lowest
   numerical priority value) than any of the bridges, and with the same
   priority as for becoming Designated RBridge on the link. Then an
   RBridge is Designated RBridge if and only if it is the spanning tree
   Root.

   Note that RBridges MUST NOT merge spanning trees from different
   ports. If two ports of R1 are connected to the same bridged LAN, then
   the regular bridge spanning tree algorithm will partition the LAN
   into distinct LANs for each of R1's ports. However, if two of R1's
   ports are connected to the same shared medium (without any bridges
   between the ports), then the regular bridge spanning tree algorithm
   will turn off one of R1's ports.


   So for example, R1 will initiate BPDUs on each of its ports, with
   itself as Root (with highest, i.e., numerically lowest priority), 0
   cost from Root, and the port ID. There are several possible cases:

   o  R1 is the highest priority RBridge on the bridged LAN, in which
      case it will become spanning tree Root and Designated RBridge.

   o  R1 receives a BPDU from itself (because two of its ports are on
      the same shared medium without any bridges between). In this case,
      the numerically lowest port will stay on, and the other port(s)
      will go into spanning tree backup state.

   o  R1 receives a BPDU from someone else with higher priority
      (numerically lower priority|ID), in which case R1 is not Root, and
      not Designated RBridge. This may happen because a bridge has been
      configured with the lowest numerical priority; if R1 then declines
      to be DR, the LAN becomes orphaned from the campus. We treat this
      case as a misconfiguration of bridges, and the LAN remains
      orphaned until the misconfiguration is corrected. In theory,
      however, an RBridge could eventually discover that it is not
      receiving any IS-IS Hellos, and become DR even though it is not
      spanning tree Root.
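
   The three cases above can be sketched as follows. This is only an
   illustrative sketch, not part of the protocol: the names Bpdu,
   classify, and port_to_keep are hypothetical, and priority|ID values
   are modeled as plain integers where a numerically lower value means
   higher priority.

```python
from typing import List, NamedTuple


class Bpdu(NamedTuple):
    root_id: int    # priority|ID of the claimed Root; numerically lower wins
    cost: int       # cost from Root
    sender_id: int  # priority|ID of the transmitting bridge or RBridge
    port_id: int    # transmitting port


def classify(my_id: int, heard: List[Bpdu]) -> str:
    """Return which of the three cases applies to an RBridge (hypothetical labels)."""
    if any(b.root_id < my_id for b in heard):
        return "not-root-not-dr"   # someone with higher priority exists
    if any(b.sender_id == my_id for b in heard):
        return "own-bpdu-heard"    # two of our ports share a medium
    return "root-and-dr"           # nothing better heard: Root and DR


def port_to_keep(my_ports: List[int]) -> int:
    """When R1 hears its own BPDU, the numerically lowest port stays on."""
    return min(my_ports)
```

   For example, an RBridge with ID 10 that hears only a BPDU claiming
   Root 5 is neither Root nor Designated RBridge under this sketch.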



2.5. Wiring Closet Topology

   In the case where there are two (or more) groups of endnodes, each
   attached to a bridge (say B1 and B2 respectively), and each bridge is
   attached to an RBridge (say R1 and R2 respectively), with a link
   connecting B1 and B2, it may be desirable to have the B1-B2 link only
   as a backup in case one of R1 and R2, or the links B1-R1 or B2-R2
   fail.

   Default behavior would be that one of R1 or R2 (say R1) would become
   Designated RBridge, and forward traffic to/from the link, so endnodes
   attached to B2 would be connected to the campus via the path B2-B1-
   R1, rather than the desired B2-R2.

   The solution is to configure R1 and R2 to be part of a "wiring closet
   group", with a configured ID Rx (which can be R1 or R2's ID). Both R1
   and R2 participate in the bridge spanning tree on the configured
   ports as root Rx, which will cause the spanning tree to break the
   B1-B2 link as desired, and both R1 and R2 will act as Designated
   RBridge on each of their respective partitions.

   In the BPDU, Root will be "Rx", cost to Root will be 0, Designated
   Bridge ID will be "R1" when R1 transmits, and "R2" when R2 transmits,
   and port ID will be a value chosen independently by each of R1 and R2
   to distinguish each of its own ports. If R1 and R2 were actually on
   the same shared medium with no bridges between them, the result will
   be that the one with the larger ID will see "better" BPDUs (because
   of the tie-breaker on the third field, the ID of the transmitting
   RBridge), and will turn off the port.
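
   The tie-breaker can be illustrated with a small sketch. It assumes
   a BPDU is compared field by field as (Root, cost to Root, Designated
   Bridge ID, port ID), with the numerically lower tuple being
   "better"; the function names and integer IDs are hypothetical.

```python
from typing import Tuple

BpduFields = Tuple[int, int, int, int]  # (root, cost, designated bridge, port)


def wiring_closet_bpdu(group_id: int, own_id: int, port_id: int) -> BpduFields:
    """BPDU sent by a group member: Root is the configured Rx, cost is 0."""
    return (group_id, 0, own_id, port_id)


def should_turn_off_port(own: BpduFields, heard: BpduFields) -> bool:
    """A port is turned off if a strictly better BPDU is heard on it."""
    return heard < own  # tuple comparison walks the fields in order


# R1 (ID 1) and R2 (ID 2) share a medium; the group ID Rx is R1's ID.
r1 = wiring_closet_bpdu(group_id=1, own_id=1, port_id=1)
r2 = wiring_closet_bpdu(group_id=1, own_id=2, port_id=1)
```

   Here the first two fields are equal, so the third field decides: R2,
   having the larger ID, sees a better BPDU from R1 and turns off its
   port, while R1 keeps its port on.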

   The only misconfiguration that can occur is if the link R1-R2 is on
   the cut set of the campus, and R2 and/or R1 have been configured to
   believe they are part of a wiring closet group.  In that case, the
   link will become partitioned and the campus will become partitioned.



2.6. Learning Endnode Location

   RBridges learn endnode location from native frames. They learn (layer
   3, layer 2) pairs (for the purpose of supporting ARP/ND optimization)
   from listening to ARP or ND replies.

   This endnode information is learned by the DR, and distributed to
   other RBridges through the link state protocol.



2.7. Forwarding Behavior



2.7.1. Receipt of a Native Packet

   R1 receives a native (i.e., not RBridge-encapsulated) unicast frame.
   R1 knows that this is a native frame because the Ethertype is not
   "RBridge encapsulated frame". The destination in the layer 2 header
   is D, the source is S.

   R1 inserts a VLAN tag if required, according to the same rules as
   bridges do.

   Once the VLAN (if any) is established, the layer 2 address of D is
   looked up in the destination table for that VLAN to find the egress
   RBridge R2, or discover that D is unknown.

   If D is known, with egress R2, then R1 encapsulates the packet, with
   R2 indicated in the shim header as egress RBridge and R1 as the
   ingress RBridge. In the outer header, R1 puts "R1" as source, and
   next hop RBridge (in the path to R2) as "destination", and
   "encapsulated RBridge packet" as the Ethertype.

   If D is unknown, R1 encapsulates the packet, with "R1" indicated as
   ingress RBridge in the shim header, and outer header with source=R1,
   destination = "all-RBridges". The egress RBridge field indicates the
   chosen distribution tree. The default is for R1 to put its own
   nickname there. However, R1 MAY be configured to select some other
   tree. If R1 is configured to decline to be a tree root, then R1 MUST
   select some other RBridge which has elected to be a tree root.
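
   The known and unknown destination cases can be sketched as below.
   This is a simplified illustration only: frames and headers are
   modeled as dictionaries, RBridges are identified by strings, and
   the names encapsulate_native, dest_table, next_hop, and tree_root
   are hypothetical. VLAN tagging is assumed to have been done already.

```python
from typing import Dict, Optional, Tuple

ALL_RBRIDGES = "all-RBridges"            # placeholder for the multicast address
ETHERTYPE_RBRIDGE = "RBridge-encapsulated"  # placeholder for the new Ethertype


def encapsulate_native(frame: dict, me: str,
                       dest_table: Dict[Tuple[int, str], str],
                       next_hop: Dict[str, str],
                       tree_root: Optional[str] = None) -> dict:
    """Encapsulate a native unicast frame received by RBridge `me`."""
    egress = dest_table.get((frame["vlan"], frame["dst"]))
    if egress is not None:
        # D is known: address the outer header to the next hop towards egress.
        shim = {"ingress": me, "egress": egress}
        outer_dst = next_hop[egress]
    else:
        # D is unknown: flood; the egress field names the distribution tree,
        # which defaults to the ingress RBridge's own tree.
        shim = {"ingress": me, "egress": tree_root or me}
        outer_dst = ALL_RBRIDGES
    outer = {"src": me, "dst": outer_dst, "ethertype": ETHERTYPE_RBRIDGE}
    return {"outer": outer, "shim": shim, "inner": frame}
```

   An RBridge configured to decline being a tree root would always
   pass an explicit tree_root in this sketch.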

   If the frame is an IGMP announcement, then R1 learns the group
   membership, and announces a receiver for that layer 2 group address
   in its per-VLAN link state instance. If the frame is a PIM or MRD
   message, R1 learns that there is an IP multicast router (for the
   specified VLAN) on its link, and adds that information into its per-
   VLAN IS-IS link state information. If the frame is an ARP/ND reply,
   then R1 learns the (layer 3, layer 2) correspondence, and adds that
   information into its per-VLAN link state information.

   If the frame is an IS-IS packet, then there is no need for an inner
   header.  The outer header will contain source=transmitting RBridge,
   destination="all RBridges" layer 2 multicast, and protocol type will
   equal "IS-IS". The shim header will indicate the nickname of the
   RBridge that initiated the packet (or 0, if that RBridge does not yet
   have a nickname), and the egress RBridge will indicate the specified
   tree along which to send this packet. There is no reason to have an
   inner Ethernet header, and for IS-IS, instead of an inner header,
   there will be only a 2-byte field, consisting of a flag indicating
   whether this is the core instance of IS-IS, and the VLAN tag, if this
   is not the core instance.

   If the frame is an IS-IS packet, then if it is not the core instance,
   it is forwarded like a non-IP-multicast flooded frame, as well as
   processed, if the RBridge belongs to the specified VLAN. If it is the
   core instance, then the IS-IS frame is never forwarded, and instead
   is processed.



2.7.2. Receipt of an In-transit Packet

   RBridge R1 receives an encapsulated frame (as indicated by
   Ethertype="RBridge-encapsulated").



2.7.2.1. Flooded Packet

   If the destination in the outer header is "all-RBridges", then R1
   forwards along the tree indicated by the shim header, pruned as
   specified in section 2.3.


2.7.2.2. Unicast Packet

   If the destination in the outer header is not R1, then R1 drops the
   frame.

   If the shim header indicates R1 is the egress RBridge, then R1
   extracts the inner frame and forwards it onto the link containing the
   destination, or processes the packet if the destination in the inner
   frame is R1.

   Else, R1 looks up the egress RBridge R2 indicated in the shim header,
   in its forwarding table, and forwards the packet towards R2, by
   replacing the outer header with one with source=R1,
   destination=nexthop RBridge towards R2, and Ethertype "RBridge-
   encapsulated".
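
   A minimal sketch of these three rules follows. Packets are modeled
   as dictionaries with "outer", "shim", and "inner" parts; the
   function name handle_unicast, the action labels, and the next_hop
   table are hypothetical.

```python
from typing import Dict, Optional, Tuple

ETHERTYPE_RBRIDGE = "RBridge-encapsulated"  # placeholder for the new Ethertype


def handle_unicast(pkt: dict, me: str,
                   next_hop: Dict[str, str]) -> Tuple[str, Optional[dict]]:
    """Decide what RBridge `me` does with an encapsulated unicast packet."""
    if pkt["outer"]["dst"] != me:
        return ("drop", None)                  # not addressed to us
    egress = pkt["shim"]["egress"]
    if egress == me:
        return ("decapsulate", pkt["inner"])   # we are the egress RBridge
    # Otherwise rewrite only the outer header; shim and inner are unchanged.
    outer = {"src": me, "dst": next_hop[egress], "ethertype": ETHERTYPE_RBRIDGE}
    return ("forward", {"outer": outer, "shim": pkt["shim"], "inner": pkt["inner"]})
```

   The sketch leaves out delivery of the inner frame onto the right
   link and local processing when the inner destination is R1 itself.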



2.7.2.3. IS-IS Packet

   If the protocol type in the outer header indicates this is an IS-IS
   packet, then R1 processes the packet accordingly.



2.8. IGMP Learning

   RBridges learn, based on seeing IGMP [RFC3376] packets, which
   multicast addresses should be forwarded onto which links.

   IGMP messages have to be forwarded throughout the campus, since IP
   routers in the broadcast domain also need to see these messages.

   IGMP messages are forwarded by RBridges throughout the campus like
   any layer 2 multicast. They are recognized by having protocol
   type=2 in the IP header. In addition, they are processed by RBridges
   in order to extract, from announcements, which egress RBridges have
   receivers for which groups.
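
   As an illustration of the recognition step, the sketch below checks
   the Protocol field of an IPv4 header, which sits at byte offset 9
   and carries the value 2 for IGMP. The function name is_igmp is
   hypothetical, and the input is assumed to be a raw IPv4 packet with
   the layer 2 header already removed.

```python
IPPROTO_IGMP = 2        # IP protocol number assigned to IGMP
MIN_IPV4_HEADER = 20    # bytes in an IPv4 header without options


def is_igmp(ipv4_packet: bytes) -> bool:
    """Return True if the packet is IPv4 and its Protocol field is IGMP."""
    if len(ipv4_packet) < MIN_IPV4_HEADER:
        return False
    version = ipv4_packet[0] >> 4   # high nibble of the first byte
    if version != 4:
        return False
    return ipv4_packet[9] == IPPROTO_IGMP
```

   Extracting group membership from the IGMP payload is a separate
   step that this sketch does not attempt.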



2.9. RBridge Nicknames

   To make the shim header smaller, RBridges dynamically acquire 2-byte
   nicknames that are unique within the campus. The nickname allocation
   protocol is piggybacked on the core IS-IS RBridge instance as
   follows:

   We will assign a new type value to be carried in the IS-IS core
   instance LSPs.  The TLV will carry the nickname the LSP source wishes
   to use.

   Each RBridge chooses its own nickname.  However, each RBridge is also
   responsible for ensuring that its nickname is unique.  If R1 chooses
   nickname x, and R1 discovers, through receipt of R2's LSP, that R2
   has also chosen x, then the RBridge with the lower system ID keeps
   the nickname, and the other one must choose a new nickname.
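
   The collision rule can be sketched in a few lines. The function
   name must_rename is hypothetical; system IDs are modeled as 6-byte
   strings, so that byte-wise comparison matches numerical comparison.

```python
def must_rename(my_system_id: bytes, my_nickname: int,
                other_system_id: bytes, other_nickname: int) -> bool:
    """Return True if this RBridge must give up its nickname.

    On a collision, the RBridge with the lower system ID keeps the
    nickname; the other one must choose a new nickname.
    """
    if my_nickname != other_nickname:
        return False                         # no collision
    return my_system_id > other_system_id    # the higher system ID yields
```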

   If two RBridge domains merge, then there might be a lot of nickname
   collisions for a short time, but as soon as each side receives the
   link state packets of the other, the RBridges that need to change
   nicknames will quickly become aware of this, and choose new nicknames
   that do not, to the best of their ability, collide with any existing
   nicknames.

   To minimize the probability of nickname collisions, each RBridge
   chooses its nickname randomly from the set of unassigned nicknames.
   Alternatively, we could use some sort of hash algorithm (such as the
   bottom 16 bits of the MD5 of the RBridge's system ID), to choose the
   first nickname, and then if there is a collision, go to the next 16
   bits of the MD5, and so on, until all 128 bits of the MD5 hash are
   exhausted, in which case the RBridge hashes its own system ID again,
   this time together with the constant "1".
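
   The hash-based alternative might look like the sketch below. Several
   details are assumptions made only for illustration: the constant is
   appended to the system ID as ASCII text, the constant keeps
   incrementing ("2", "3", ...) if further rounds are needed, and the
   value 0 is skipped because 0 is used elsewhere in this document to
   mean "no nickname yet". The function names are hypothetical.

```python
import hashlib
from itertools import count
from typing import Iterator, Set


def nickname_candidates(system_id: bytes) -> Iterator[int]:
    """Yield 16-bit candidates from successive slices of MD5(system ID)."""
    for round_no in count(0):
        # Round 0 hashes the system ID alone; later rounds append a constant.
        data = system_id if round_no == 0 else system_id + str(round_no).encode()
        value = int.from_bytes(hashlib.md5(data).digest(), "big")
        for i in range(8):                       # 128 bits = 8 slices of 16
            yield (value >> (16 * i)) & 0xFFFF   # bottom 16 bits first


def pick_nickname(system_id: bytes, in_use: Set[int]) -> int:
    """Return the first candidate that is non-zero and not known to be in use."""
    for candidate in nickname_candidates(system_id):
        if candidate != 0 and candidate not in in_use:
            return candidate
    raise AssertionError("unreachable: the candidate stream is endless")
```

   An RBridge would call pick_nickname again, with the updated set of
   nicknames seen in received LSPs, whenever it loses a collision.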

   There is no reason for all RBridges to use the same algorithm for
   choosing nicknames.  Picking them at random, or using a hash, is an
   attempt to avoid collisions when the network starts up, but that is
   only an optimization.  Even if all RBridges used the same algorithm
   (as a worst case, say they all start with "1" and count up
   sequentially until they find an uncontested nickname), the network
   will eventually stabilize.  And once it is stable, nicknames should
   remain stable even as RBridges go up or down.

   To minimize the probability of a new RBridge usurping a nickname
   already in use, an RBridge should wait to acquire the link state
   database from a neighbor before it announces its own nickname.



2.10. Forwarding Header on 802 Links

   It is essential that RBridges coexist with ordinary bridges.
   Therefore, a frame in transit must look to ordinary bridges like an
   ordinary layer 2 frame. However, it must also be differentiable from
   a native layer 2 frame by RBridges. To accomplish this, we use a new
   layer 2 protocol type ("Ethertype").

   A frame in transit on an 802 link will therefore have two 802
   headers, since the original frame (including the original 802 header)
   will be tunneled by the RBridges. But rather than just having an
   additional 802 header, we include additional information between the
   two headers, with a header we refer to as the "shim header".

   An encapsulated frame would look as follows:

           +--------------+-------------+-----------------+
           | outer header | shim header | original frame  |
           +--------------+-------------+-----------------+

                     Figure 1 Encapsulated Frame

   The outer header contains:

   o  L2 destination = next RBridge, or for flooded frames, a new (to be
      assigned) multicast layer 2 address meaning "all RBridges"

   o  L2 source = transmitting RBridge (the one that most recently
      handled this frame)

   o  protocol type = "to be assigned...RBridge encapsulated frame"

   The 6-byte shim header includes:

   o  TTL = starts at some value and is decremented by each RBridge.
      The frame is discarded if TTL=0. This field uses 6 bits for TTL,
      and the remaining 10 bits are reserved.

   o  ingress RBridge nickname. 16 bits

   o  egress RBridge nickname (or selected distribution tree, in the
      case of broadcast / multicast). 16 bits
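
   The 6-byte layout can be sketched with a pack and unpack pair. The
   field order (TTL word, ingress, egress) follows the list above, but
   the placement of the 6 TTL bits in the high-order bits of the first
   16-bit word, with the 10 reserved bits sent as zero, is an
   assumption made for illustration, as are the function names.

```python
import struct
from typing import Tuple

SHIM_FORMAT = "!HHH"   # three 16-bit fields in network byte order
TTL_SHIFT = 10         # assumed: TTL occupies the top 6 of the first 16 bits


def pack_shim(ttl: int, ingress: int, egress: int) -> bytes:
    """Build a 6-byte shim header; reserved bits are set to zero."""
    if not 0 <= ttl < 64:
        raise ValueError("TTL must fit in 6 bits")
    return struct.pack(SHIM_FORMAT, ttl << TTL_SHIFT, ingress, egress)


def unpack_shim(data: bytes) -> Tuple[int, int, int]:
    """Return (ttl, ingress nickname, egress nickname) from a shim header."""
    first, ingress, egress = struct.unpack(SHIM_FORMAT, data[:6])
    return first >> TTL_SHIFT, ingress, egress
```

   With this layout a 6-bit TTL can express at most 63 hops.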



2.11. Handling ARP/ND Queries

   We will use the term "optimized ARP/ND response" to cover several
   possible behaviors an RBridge might utilize. Non-optimized behavior
   would consist of treating an ARP or ND query as an ordinary layer 2
   broadcast/multicast, and sending the query to all links in the campus,
   allowing the target to respond as to an ordinary ARP/ND query. This
   behavior is essential when the location of the target is unknown,
   although RBridges could suppress multiple queries to the same target
   within some amount of time.

   When the target's location is assumed to be known by the first
   RBridge, it need not flood the query. Alternative behaviors of the
   first Designated RBridge that receives the ARP/ND query would be to:

   1. send a response directly to the querier, with the layer 2 address
      of the target, as believed by the RBridge

   2. encapsulate the ARP/ND query to the target's Designated RBridge,
      and have the Designated RBridge at the target forward the query to
      the target. This behavior has the advantage that a response to the
      query will be definitive. If the query does not reach the target,
      then the querier will not get a response

   3. block ARP/ND queries that occur for some time after a query to the
      same target has been launched, and then respond to the querier
      when the response to the recently-launched query to that target is
      received

   The reason not to do the most optimized behavior all the time is for
   timeliness of detecting a stale cache. Also, in the case of SEND
   [RFC3971], cryptography might prevent behavior 1, since the RBridge
   would not be able to sign the response with the target's private key.

   It is not essential that all RBridges use the same strategy for which
   option to select for a particular query. However, once the first
   Designated RBridge decides on a strategy for a particular query, the
   other RBridges must carry that through. If the first RBridge responds
   directly to the querier, or blocks the query, then no other RBridges
   are involved.

   If the first Designated RBridge R1 decides to unicast the query to
   the target's Designated RBridge R2, then R2 must decapsulate the
   query, and initiate an ARP/ND query on the target's link. When/if the
   target responds, R2 must encapsulate and unicast the response to R1,
   which will decapsulate the response and send it to the querier.

   If the first Designated RBridge R1 decides to flood the query (which
   it MUST do if the target is unknown, but MAY do if it wants to assure
   freshness of the information), the query is encapsulated to be
   flooded through the indicated VLAN.

   The distributed ARP query is carried by RBridges through the RBridge
   distribution tree. Each Designated RBridge, in addition to forwarding
   the query through the distribution tree, initiates an ARP query on
   its link(s). If a reply is received from the target by Designated
   RBridge R2, R2 initiates a link state update to inform all the other
   RBridges of D's location, layer 3 address, and layer 2 address, in
   addition to forwarding the reply to the querier.

   It is the querier's Designated RBridge R1 that chooses which strategy
   to employ when seeing an ARP query.

   Some mix of these strategies (responding directly, unicasting the
   query to the target's Designated RBridge, or flooding the query)
   might be the best solution. For instance, even if the target's
   location and (layer 3, layer 2) correspondence is in the link state
   information R1 received from R2, if the target's location has not
   been recently verified by R1 through a broadcast ARP/ND or unicast
   query to the target, then R1 MAY broadcast or unicast the query or
   respond directly. So for instance, RBridges could keep track of the
   last time a broadcast ARP/ND occurred for each endnode E (by any
   source, and injected by any RBridge). Let's say the parameter is 20
   seconds. If a source S on RBridge R1's link does an ARP/ND for D, if
   R1 has not seen an ARP/ND for D within the last 20 seconds, R1
   unicasts the query to force a reply from the target; otherwise it
   proxies the reply.
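
   The example policy in this section can be sketched as a small
   decision function. This is one possible mix of the strategies, not
   a required behavior; the function name choose_strategy, the strategy
   labels, and the use of floating-point timestamps in seconds are
   assumptions for illustration.

```python
from typing import Optional

FRESHNESS_SECONDS = 20.0   # the example parameter value used above


def choose_strategy(target_known: bool,
                    last_arp_seen: Optional[float],
                    now: float) -> str:
    """Pick how the querier's Designated RBridge handles an ARP/ND query."""
    if not target_known:
        return "flood"      # flooding is required when the target is unknown
    if last_arp_seen is None or now - last_arp_seen > FRESHNESS_SECONDS:
        return "unicast"    # force a reply from the target itself
    return "proxy"          # information is fresh enough to answer directly
```

   For example, choose_strategy(True, 100.0, 105.0) selects a proxied
   reply, while the same call with now=130.0 selects a unicast query.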

   When R2 forwards a unicast ARP/ND query, if the target does not
   respond, then R2 MAY replay the query, and if the target does not
   respond, R2 will remove the target from its link state information.



2.12. Discovering IP Multicast Routers

   Until Multicast Router Discovery [RFC4286] is universally deployed,
   RBridges must also discover IP multicast routers by observing the PIM
   messages they transmit. So an RBridge concludes there is an IP
   multicast router on its port if it receives either an MRD message or
   a PIM message on that port. A PIM message is recognized because the
   protocol type in the IP header is decimal 103.
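
   A minimal sketch of this bookkeeping follows. The class name
   RouterPorts and its methods are hypothetical, and recognition of MRD
   messages is assumed to happen elsewhere, with the result passed in
   as a flag.

```python
from typing import Set

IPPROTO_PIM = 103   # IP protocol number for PIM, as stated above


class RouterPorts:
    """Tracks the ports on which an IP multicast router has been detected."""

    def __init__(self) -> None:
        self.ports: Set[int] = set()

    def observe(self, port: int, ip_protocol: int, is_mrd: bool = False) -> None:
        """Record a router on `port` if the frame is an MRD or PIM message."""
        if is_mrd or ip_protocol == IPPROTO_PIM:
            self.ports.add(port)

    def has_router(self, port: int) -> bool:
        return port in self.ports
```

   The sketch never removes a port; aging out stale entries is left
   open here.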



2.13. Assuring Freshness of Endnode Information

   Designated RBridge R1 can ensure freshness of its endnode information
   by doing ARP/ND queries periodically to ensure that the endnodes are
   actually there. This can be a problem if the endnodes are in power-
   saver mode, and this should be a configuration parameter on R1 as to
   whether R1 should "ping" the endnodes by doing ARP/ND queries.


3. RBridge Addresses, Parameters, and Constants

   Each RBridge needs a unique ID within the campus.  The simplest such
   address is a unique 6-byte ID, since such an ID is easily obtainable
   as any of the EUI-48's owned by that RBridge.  IS-IS already requires
   each router to have such an address.

   A parameter is the value to which to initially set the TTL (hop
   count) in the shim header.  Recommended default=20.

   A new Ethertype must be assigned to indicate an RBridge-encapsulated
   frame.

   A layer 2 multicast address for "all RBridges" must be assigned for
   use as the destination address in flooded frames.

   To support VLANs, RBridges (like bridges today) must be configured,
   for each port, with the VLAN to which that port belongs.

   We may want a parameter to determine whether an RBridge should
   periodically do queries to ensure that the endnode information is
   fresh, and if so, with what frequency.

   A parameter indicates whether an RBridge wants to be the root of a
   distribution tree.

   Configuration for wiring closet topology consists of system ID of the
   RBridge with lowest ID. If R1 and R2 are part of a wiring closet
   topology, only R2 needs to be configured to know about this, and that
   R1 is the ID it should use in the spanning tree protocol on the
   specified port.


4. Security Considerations

   The goal is for RBridges to not add additional security issues over
   what would be present with traditional bridges.  RBridges will not be
   able to prevent nodes from impersonating other nodes, for instance,
   by issuing bogus ARP replies.  However, RBridges will not interfere
   with any schemes that would secure neighbor discovery.

   As with routing schemes, authentication of RBridge messages would be
   a simple addition to the design (and it would be accomplished the
   same way as it would be in IS-IS).  However, any sort of
   authentication requires additional configuration, which might
   interfere with the perception that RBridges, like bridges, are zero
   configuration.



5. IANA Considerations

   A new Ethertype must be assigned to indicate an RBridge-encapsulated
   frame.

   A layer 2 multicast address for "all RBridges" must be assigned for
   use as the destination address in flooded frames.



6. Conclusions

   This design allows transparent interconnection of multiple links into
   a single IP subnet.  Management would be just like with bridges
   (plug-and-play).  But this design avoids the disadvantages of
   bridges.  Temporary loops are not a problem so failover can be as
   fast as possible, and shortest paths can be followed.

   The design is compatible with current IP nodes and routers, and with
   current bridges.


7. References


7.1. Normative References

   [802.1D] "IEEE Standard for Local and metropolitan area networks /
   Media Access Control (MAC) Bridges", 802.1D-2004, 9 June 2004.

   [802.1Q] "IEEE Standard for Local and metropolitan area networks /
   Virtual Bridged Local Area Networks", 802.1Q-2005, 19 May 2006.

   [ISO10589] ISO/IEC 10589:2002, "Intermediate system to Intermediate
   system routeing information exchange protocol for use in conjunction
   with the Protocol for providing the Connectionless-mode Network
   Service (ISO 8473)," ISO/IEC 10589:2002.

   [RFC826] Plummer, D., "An Ethernet Address Resolution Protocol", RFC
   826, November 1982.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2461] Narten, T., Nordmark, E. and W. Simpson, "Neighbor
   Discovery for IP Version 6 (IPv6)", RFC 2461 (Standards Track),
   December 1998.

   [RFC3376] Cain, B., "Internet Group Management Protocol, Version 3",
   RFC 3376, October 2002.

   [RFC4286] Haberman, B., Martin, J., "Multicast Router Discovery", RFC
   4286, December 2005.

   [SNOOP] Christensen, M., Kimball, K, Solensky, F., "Considerations
   for IGMP and MLD Snooping Switches", draft-ietf-magma-snoop-12.txt


7.2. Informative References

   [Arch] Gray, E., "The Architecture of an RBridge Solution to TRILL",
   draft-ietf-trill-rbridge-arch-01.txt, October 2006, work in progress.

   [PAS] Touch, J., & R. Perlman "Transparent Interconnection of Lots of
   Links (TRILL) / Problem and Applicability Statement", draft-ietf-
   trill-prob-01.txt, October 2006, work in progress.

   [RBridges] Perlman, R., "RBridges: Transparent Routing", Proc.
   Infocom 2005, March 2004.

   [RFC3971] Arkko, J., Kempf, J., Zill, B., and P. Nikander, "SEcure
   Neighbor Discovery (SEND)", RFC 3971, March 2005.

   [RP1999] Perlman, R., "Interconnection: Bridges, Routers, Switches,
   and Internetworking Protocols", Addison Wesley Chapter 3, 1999.



Disclaimer

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.



Additional IPR Provisions

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at ietf-
   ipr@ietf.org.

   Copyright (C) The IETF Trust (2007).  This document is subject to the
   rights, licenses and restrictions contained in BCP 78, and except as
   set forth therein, the authors retain all their rights.


Authors' Addresses

   Radia Perlman
   Sun Microsystems

   Email: Radia.Perlman@sun.com


   Silvano Gai
   Nuova Systems

   Email: sgai@nuovasystems.com


   Dinesh G. Dutt
   Cisco Systems, Inc.
   170 Tasman Dr.
   San Jose, CA 95134-1706

   Phone: 1-408-527-0955
   EMail: ddutt@cisco.com



Expiration and File Name

   This draft expires in July 2007.

   Its file name is draft-ietf-trill-rbridge-protocol-02.txt.





















