TRILL Working Group Radia Perlman
INTERNET-DRAFT Intel Labs
Intended status: Informational Donald Eastlake
Huawei
Anoop Ghanwani
Brocade
Expires: September 6, 2011 March 7, 2011
RBridges: Appointed Forwarders
<draft-perlman-trill-rbridge-multilevel-00.txt>
Abstract
This document describes issues, and various possible approaches, to
extending TRILL to use multiple levels of IS-IS.
Status of This Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79. Distribution of this document is
unlimited. Comments should be sent to the TRILL working group
mailing list <rbridge@postel.org>.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Acknowledgements
The helpful comments of the following are hereby acknowledged: David
Michael Bond.
R. Perlman, et al [Page 1]
INTERNET-DRAFT RBridges: Multilevel TRILL
Table of Contents
1. Introduction............................................3
1.1 TRILL Scalability Issues...............................3
1.2 Improvements Due to Multilevel.........................4
1.3 More on Areas..........................................4
1.4 Terminology and Acronyms...............................5
2. Multilevel TRILL Issues.................................6
2.1 Non-zero Area Addresses................................6
2.2 Aggregated versus Unique Nicknames.....................7
2.3 Building Multi-Area Trees..............................9
2.4 The RPF Check for Trees...............................10
2.5 Area Nickname Acquisition.............................11
2.6 Link state representation of areas....................11
3. Area Partition.........................................13
4. Multidestination Scope.................................14
5. Co-Existence with Old RBridges.........................15
6. Summary................................................16
7. Security Considerations................................16
8. IANA Considerations....................................16
9. References.............................................17
9.1 Normative References..................................17
9.2 Informative References................................17
R. Perlman, et al [Page 2]
INTERNET-DRAFT RBridges: Multilevel TRILL
1. Introduction
The IETF TRILL protocol [RFCtrill] [RFCadj] provides optimal pair-
wise data frame forwarding without configuration, safe forwarding
even during periods of temporary loops, and support for multipathing
of both unicast and multicast traffic. TRILL accomplishes this by
using [IS-IS] link state routing and encapsulating traffic using a
header that includes a hop count. The design supports VLANs and
optimization of the distribution of multi-destination frames based on
VLANs and IP derived multicast groups. Devices that implement TRILL
are called RBridges.
Familiarity with [RFCtrill] is assumed in this document.
1.1 TRILL Scalability Issues
There are multiple issues that might limit the scalability of a
TRILL-based network:
o the routing computation load,
o the volatility of the LSP database creating too much control
traffic,
o the volatility of the LSP database causing the TRILL network to be
in an unconverged state too much of the time,
o the size of the LSP database,
o the size of the end node learning table (the table that remembers
(egress RBridge, VLAN/MAC) pairs),
o the traffic due to upper layer protocols use of broadcast and
multicast, and
o the hard limit of the number of RBridges, due to the 16-bit
nickname space.
Extending TRILL IS-IS to be multilevel (hierarchical) helps with some
of these issues.
IS-IS was designed to be multilevel [IS-IS] [RFC1195] be partitioned
into "areas". Routing within an area is known as "level 1 routing".
Routing between areas is known as "level 2 routing". The level 2 IS-
IS network consists of level 2 routers and links between the level 2
routers. Level 2 routers may participate in one or more areas, in
addition to their role as level 2 routers.
Each area is connected to the level 2 area through one or more
"border routers", which participate both as a router inside the area,
and as a router inside the level 2 "area".
R. Perlman, et al [Page 3]
INTERNET-DRAFT RBridges: Multilevel TRILL
1.2 Improvements Due to Multilevel
Partitioning the network into areas reduces the size of the LSP
database in each router, and stops volatility of the topology in one
area from disrupting other areas. Allowing TRILL to utilize IS-IS's
hierarchy solves the first 4 issues above, but does not necessarily
help the other 3 issues (size of end node learning table, traffic due
to upper layer protocols using multicast, hard limit of 16-bit
RBridge nicknames).
We propose two variants of hierarchical or multilevel TRILL. One we
call the "unique nickname" variant. The other we call the
"aggregated nickname" variant. In the aggregated nickname variant,
border RBridges replace either the ingress or egress nickname field
in the TRILL header of unicaat packets with an aggregated nickname
representing an entire area.
The aggregated nickname variant has the following advantages:
o it solves the 16-bit RBridge nickname limit,
o it lessens the amount of inter-area routing information that must
be passed in IS-IS,
o it greatly reduces the RPF information (since only the area
nickname needs to appear, rather than all the ingress RBridges in
that area), and
o it enables computation of trees such that the portion computed
within a given area is rooted within that area.
The unique nickname variant has the advantage that border RBridges do
not need to do end node learning for end nodes in their own area.
1.3 More on Areas
Each area is configured with an "area address", which is advertised
in IS-IS messages, so as to avoid accidentally interconnecting areas.
Note that although the area address had other purposes in CLNP, (IS-
IS was originally designed for CLNP/DECnet), for TRILL the only
purpose of the area address would be to avoid accidentally
interconnecting areas.
Currently, the TRILL specification says that the area address "must
be zero". If we change the specification so that the area address
value of zero is a default, then most of IS-IS multilevel machinery
works as originally designed. However, there are some TRILL-specific
issues, which we address below in this document.
R. Perlman, et al [Page 4]
INTERNET-DRAFT RBridges: Multilevel TRILL
1.4 Terminology and Acronyms
This document uses the acronyms defined in [RFCtrill] and the
following additional acronym:
DBRB - Designated Border RBridge
R. Perlman, et al [Page 5]
INTERNET-DRAFT RBridges: Multilevel TRILL
2. Multilevel TRILL Issues
The TRILL-specific issues introduced by hierarchy include the
following:
a) configuration of non-zero area addresses, encoding them in IS-IS
PDUs, and interworking with old RBridges that do not understand
nonzero area addresses,
b) nickname management,
c) advertisement of filtering information (VLAN reachability, IP
multicast addresses) across areas,
d) computation of trees across areas for multi-destination frames,
e) computation of RPF information for those trees, and
g) compatibility, as much as practical, with existing, unmodified
RBridges. The most important form of compatibility is with
existing TRILL fast path hardware. Changes that require upgrade to
the slow path firmware/software are more tolerable.
Filtering information is only an optimization, as long as
multidestination frames are not prematurely filtered. Thus, for
instance, border RBridges could advertise they can reach all possible
VLANs, and have an IP multicast router attached. This would cause
multidestination traffic to be transmitted to the border router, and
possibly filtered there, when the traffic could have been filtered
earlier based on VLAN or multicast group.
2.1 Non-zero Area Addresses
The current TRILL base protocol specification [RFCtrill] says that
the area address in IS-IS MUST be zero. The purpose of the area
address is to ensure that different areas are not accidentally hooked
together. Furthermore, zero is an invalid area address for layer 3
IS-IS, so it was chosen as an additional safety mechanism to ensure
that layer 3 IS-IS would not be confused with TRILL IS-IS. However,
TRILL uses a different multicast address and Ethertype to avoid such
confusion, so it is not necessary to worry about this.
Since current TRILL RBridges will reject any IS-IS messages with
nonzero area addresses, the choices are; all RBridges must be
upgraded, neighbors of old RBridges must remove the area address from
IS-IS messages when talking to an old RBridge (which might cause
inadvertent merging of areas), to ignore the problem of accidentally
merging areas entirely, or to keep the fixed "area address" field as
0 in TRILL, and add a new, optional TLV for "area name" that, if
present, could be compared, by new RBridges, to prevent accidental
merging
R. Perlman, et al [Page 6]
INTERNET-DRAFT RBridges: Multilevel TRILL
2.2 Aggregated versus Unique Nicknames
In the unique nickname variant, all nicknames across the campus must
be unique. In the aggregated nickname variant, RBridge nicknames are
only of local significance within an area, and the only nickname
externally (outside the area) visible is the "area nickname", which
aggregates all the internal nicknames.
The aggregated nickname approach eliminates the potential problem of
nickname exhaustion, minimizes the amount of nickname information
that would need to be forwarded between areas, minimizes the size of
the forwarding table, and simplifies RPF calculation and RPF
information.
With unique cross-area nicknames, it would be intractable to have a
flat nickname space with RBridges in different areas contending for
the same nicknames. Instead, each area would need to be configured
with a block of nicknames. Either some RBridges would need to
announce that all the nicknames other than that block are taken (to
prevent the RBridges inside the area from choosing nicknames outside
the area's nickname block), or a new TLV would be needed to announce
the allowable nicknames, and all RBridges in the area would need to
understand that new TLV.
Currently the encoding of nickname information in TLVs does not allow
any aggregation. The information could be encoded as ranges of
nicknames to make this somewhat manageable; however, a new TLV for
announcing nickname ranges would not be intelligible to old RBridges.
In contrast, the aggregated nickname approach enables passing far
less nickname information and works as follows:
Each area would be assigned a 16-bit nickname. This would not be the
nickname of any actual RBridge. Instead, it would be the nickname of
the area itself. Border RBridges would know the area nickname for
their own area(s).
In the following picture, R2 and R3 are area border RBridges. A
source S is attached to R1. The two areas have nicknames 15961 and
15918, respectively. R1 has a nickname, say 27, and R4 has a
nickname, say 44 (and in fact, they could even have the same
nickname, since the RBridge nickname will not be visible outside the
area).
R. Perlman, et al [Page 7]
INTERNET-DRAFT RBridges: Multilevel TRILL
Area 15961 level 2 Area 15918
+-------------------+ +-----------------+ +-------------+
| | | | | |
| S--R1---Rx--Rz-----R2----Rb---Rc--Rd---Re--R3---Rk--R4---D |
| 27 | | | | 44 |
| | | | | |
+-------------------+ +-----------------+ +-------------+
Let's say that S transmits a packet to destination D, and let's say
that D's location is learned by the relevant RBridges already. The
relevant RBridges have learned the following:
1) R1 has learned that D is connected to nickname 15918
2) R3 has learned that D is attached to nickname 44.
The following sequence of events will occur:
- S transmits an Ethernet packet with source MAC = S and destination
MAC = D.
- R1 encapsulates with a TRILL header with ingress RBridge = 27, and
egress = 15918.
- R2 has announced in the level 1 IS-IS instance in area 16961, that
it is attached to all the area nicknames, including 15918.
Therefore, IS-IS routes the packet to R2. (Alternatively, if a
distinguished range of nicknames is used for area, Level 1
RBridges seeing such an egress nickname will know to route to the
nearest border router.)
- R2, when transitioning the packet from level 1 to level 2,
replaces the ingress RBridge nickname with the area nickname, so
replaces 27 with 15961. Within level 2, the ingress RBridge field
in the TRILL header will therefore be 15961, and the egress
RBridge field will be 15918. Also R2 learns that S is attached to
nickname 27 in area 15961.
- The packet is forwarded through level 2, to R3, which has
advertised, in Level 2, reachability to the nickname 15918.
- R3, when forwarding into area 15918, replaces the egress nickname
in the TRILL header with R4's nickname (44). So, within the
destination area, the ingress nickname will be 15961 and the
egress nickname will be 44.
- R4, when decapsulating, learns that S is attached to nickname
15961.
Now suppose that D's location has not been learned by R1 and/or R3.
What will happen, as it would in TRILL today, is that R1 will forward
R. Perlman, et al [Page 8]
INTERNET-DRAFT RBridges: Multilevel TRILL
the packet as a multidestination frame, choosing a tree. As the
multidestination frame transitions into level 2, R2 replaces the
ingress nickname with the area nickname.
Now suppose that R1 has learned the location of D (attached to
nickname 15918), but R3 does not know where D is. In that case, R3
will turn the packet into a multidestination frame within the area.
Care must be taken so that, in case R3 is not the Designated
transitioner for that multidestination frame, but was on the unicast
path, that another RBridge within that area not forward the now
multidestination frame back into level 2. Therefore, it would be
desirable to have a marking, somehow, that indicates the scope of
this packet to be "only this area".
There is an issue with tree nicknames that would be a problem with
the unique nickname variant, but is solved with the aggregated
variant, as follows:
Suppose nicknames were unique within the TRILL campus, and that the
TRILL header was not rewritten by the border RBridges. In that case,
there would have to be globally known nicknames for the trees.
Suppose there are k trees. For all of the trees with nicknames
located outside an area, the trees would all be rooted at (one of)
the border RBridge(s). Therefore, there would be no path splitting
of multidestination with the area.
In contrast, with the aggregated nickname solution, each border
RBridge can have a mapping from the level 2 tree nickname to the
level 1 tree nickname. There need not even be agreement about the
total number of trees; just that the border RBridge have some
mapping, and replace the egress RBridge nickname (the tree name) when
transitioning levels.
Care must be taken that it be clear, when transitioning between level
2 and area X, which (single) border RBridge will transition the
packet between the levels.
2.3 Building Multi-Area Trees
It is easy to build a multi-area tree by building a tree in each area
separately, (including the level 2 "area"), and then having only a
single border RBridge, say R1, in each area, attach to the level 2
area. R1 would forward all multidestination packets between that
area and level 2.
People might find this unacceptable, however, because of the desire
to path split (not always sending all multidestination traffic
through the same border RBridge).
R. Perlman, et al [Page 9]
INTERNET-DRAFT RBridges: Multilevel TRILL
Having multiple border RBridges introduces some complexity:
a) calculating the RPF check when a multidestination frame originates
outside the area (which border RBridge injected the frame into the
area?)
b) calculating the filtering information (which border RBridge will
transition the frame into level 2?)
This might be solvable if all RBridges are multilevel aware, however
it is difficult to imagine how to ensure that old RBridges would
calculate RPF and filtering information sensibly.
Ignoring old RBridges for now, various possible solutions are
a) elect one border RBridge for transitioning all multidestination
frames between levels (call that the Designated Border RBridge
(DBRB))
b) allow the DBRB to appoint other border RBs to forward some subset
of the inter-level frames. (as the DRB does, on a per-VLAN basis,
on a link). Make the appointment information visible to the other
RBridges in the area so that they can calculate their RPF and
filtering information.
If b), then on what basis would the appointment be made? Various
possibilities are as follows:
o based on VLAN
o based on tree root
o based on ingress RBridge nickname
The more flexibility that is allowed, the more complex announcement
of information becomes, and the more complex the tree database
becomes. If appointment is made based on VLAN, then the RPF check
would need to be based on (tree, VLAN, ingress nickname), rather than
simply (tree, ingress nickname) as it is today.
2.4 The RPF Check for Trees
For multidestination frames originating in R1's area, computation of
the RPF check is done as today. For multidestination frames
originating outside R1's area, computation of the RPF check must be
done based on one of the border RBridges (say R1, R2, or R3).
An RBridge, say R4, located inside an area, must be able to know
which of R1, R2, or R3 transitioned the frame into the area from
level 2. (or into level 2 from an area).
R. Perlman, et al [Page 10]
INTERNET-DRAFT RBridges: Multilevel TRILL
This could be done based on having the DBRB announce the assignments
to all the RBs in the area.
2.5 Area Nickname Acquisition
In the aggregated nickname variant, each area must acquire a unique
area nickname. It is probably simpler to allocate a block of
nicknames (say, the top 2000) to be area addresses, and not used by
any RBridges.
The area nicknames need to be advertised and acquired through level
2.
Within an area, all the border RBridges must discover each other
through the level 1 IS-IS database, by advertising, in their LSP "I
am a border RBridge".
Of the border RBridges, one will have highest priority (say R7). It
will be R7 that dynamically participates, in level 2, to acquire a
nickname for the area. R7 will give the area a pseudonode name, such
as R7.5, within level 2. So an area will appear, in level 2, as a
pseudonode.
The pseudonode will participate, in level 2, in acquiring a nickname
for the area.
Within level 2, all the border RBridges [for the area] advertise
reachability to the pseudonode, which will mean connectivity to the
area nickname.
2.6 Link state representation of areas
Within an area, say area A, there is an election for the DBRB,
(Designated Border RB), say R1. This will be done through LSPs
within area A. The border RBs announce themselves, together with
DBRB priority. (Note that the election of the DBRB cannot be done
based on Hello messages, because the border RBs are not necessarily
physical neighbors of each other. They can, however, reach each
other through connectivity within the area, which is why it will work
to find each other through level 1 LSPs.)
R1 acquires the area nickname (in the aggregated nickname approach),
gives the area a pseudonode name (just like the DRB would give a
pseudonode name to a link). R1 advertises, in area A, what the
pseudonode name for the area is (and the area nickname that R1 has
acquired).
R. Perlman, et al [Page 11]
INTERNET-DRAFT RBridges: Multilevel TRILL
The pseudonode LSP initiated by R1 includes any information
extraneous to area A that should be input into area A (such as area
nicknames of external areas, or perhaps (in the unique nickname
variant), all the nicknames of external RBs in the TRILL campus and
filtering information such as IP multicast groups and VLANs). All
the other border RBs for the area announce (in their LSP) attachment
to that pseudonode.
Within level 2, R1 generates a level 2 LSP on behalf of the area,
also represented as a pseudonode. The same pseudonode name could be
used within level 1 and level 2, for the area. (There does not seem
any reason why it would be useful for it to be different, but there's
also no reason why it would need to be the same). Likewise, all the
area A border RBs would announce, in their level 2 LSPs, connection
to the pseudonode.
R. Perlman, et al [Page 12]
INTERNET-DRAFT RBridges: Multilevel TRILL
3. Area Partition
It is possible for an area to become partitioned, so that there is
still a path from one section of the area to the other, but that path
is via the level 2 area.
An area will naturally break into two areas in this case.
An area address might be configured to ensure two areas are not
inadvertently connected. That area address appears in Hellos and
LSPs within the area. If two chunks, connected only via level 2,
were configured with the same area address, this would not cause any
problems. (They would just operate as separate level 1 areas.)
A more serious problem occurs if the level 2 area is partitioned in
such a way that it healed by using a path through a level 1 area.
TRILL will not attempt to solve this problem. Within the level 1
area, a single border RBridge will be the DBRB, and will be in charge
of deciding which (single) RBridge will transition any particular
multidestination frames between that area and level 2. If the level
2 area is partitioned, this will result in multidestination frames
only reaching the portion of the TRILL campus reachable through the
partition attached to the RBridge that transitions that frame. It
will not cause a loop.
R. Perlman, et al [Page 13]
INTERNET-DRAFT RBridges: Multilevel TRILL
4. Multidestination Scope
It would be desirable to be able to mark a multidestination frame
with a scope that indicates this packet should not exit the area.
This is particularly true when, in the aggregated nickname variant, a
unicast packet turns into a multidestination packet.
This could be done by having two tree nicknames, for each tree; one
being the tree "only for this area", and the other being for multi-
area trees.
Alternatively, a packet intended only for the area could be tunneled
(within the area) to the RBridge Rx, that is the appointed
transitioner for that form of packet (say, based on VLAN), with
instructions that Rx only transmit the packet within the area, and Rx
could initiate the multidestination frame within the area. Since Rx
introduced the frame, and is the only one allowed to transition that
frame within levels, this would accomplish scoping of the packet to
within the area.
Since this case would only occur when unicast frames need to be
turned into multidestination (because the border RBridge in the
destination area does not know the location of the destination), the
suboptimality of tunneling between the border RBridge that receives
the unicast frame and the appointed level transitioner for that
frame, would not be an issue.
R. Perlman, et al [Page 14]
INTERNET-DRAFT RBridges: Multilevel TRILL
5. Co-Existence with Old RBridges
RBridges that are not multilevel aware have a problem with
calculating RPF check and filtering information, since they would not
be aware of assignment of border RBridge transitioning.
A possible solution, as long as any old RBridges exist within an
area, is to have the border RBridges elect a single DBRB (Designated
Border RBridge), and have all inter-area traffic go through the DBRB
(unicast as well as multidestination). If that DBRB goes down, a new
one will be elected, but at any one time, all inter-area traffic
(unicast as well as multidestination) would go through that one DRBR.
R. Perlman, et al [Page 15]
INTERNET-DRAFT RBridges: Multilevel TRILL
6. Summary
This draft outlines the issues and possible approaches to multilevel
TRILL. The variant involving area nicknames for aggregation has
significant advantages in terms of scalability; not just of avoiding
nickname exhaustion, but allowing, for instance, RPF checks to be
aggregated based on an entire area.
Some issues are not difficult, such as dealing with partitioned
areas. Some issues are more difficult, especially dealing with old
RBridges.
7. Security Considerations
TBD
8. IANA Considerations
This document requires no IANA actions. RFC Editor: Please delete
this section before publication.
R. Perlman, et al [Page 16]
INTERNET-DRAFT RBridges: Multilevel TRILL
9. References
Normative and Informational references for this document are listed
below.
9.1 Normative References
[IS-IS] - ISO/IEC 10589:2002, Second Edition, "Intermediate System to
Intermediate System Intra-Domain Routing Exchange Protocol for
use in Conjunction with the Protocol for Providing the
Connectionless-mode Network Service (ISO 8473)", 2002.
[RFC1195] - Callon, R., "Use of OSI IS-IS for routing in TCP/IP and
dual environments", RFC 1195, December 1990.
[RFCtrill] - Perlman, R., D. Eastlake, D. Dutt, S. Gai, and A.
Ghanwani, "RBridges: Base Protocol Specification", draft-ietf-
trill-rbridge-protocol-16.txt, in RFC Editor's queue.
[RFCadj] - Eastlake, D., R. Perlman, A. Ghanwani, D. Dutt, V. Manral,
"RBridges: Adjacency", draft-ietf-trill-adj, work in progress.
9.2 Informative References
None.
R. Perlman, et al [Page 17]
INTERNET-DRAFT RBridges: Multilevel TRILL
Authors' Addresses
Radia Perlman
Intel Labs
2200 Mission College Blvd.
Santa Clara, CA 95054-1549 USA
Phone: +1-408-765-8080
Email: Radia@alum.mit.edu
Donald Eastlake
Huawei Technologies
155 Beaver Street
Milford, MA 01757 USA
Phone: +1-508-333-2270
Email: d3e3e3@gmail.com
Anoop Ghanwani
Brocade Communications Systems
130 Holger Way
San Jose, CA 95134 USA
Phone: +1-408-333-7149
Email: anoop@brocade.com
R. Perlman, et al [Page 18]
INTERNET-DRAFT RBridges: Multilevel TRILL
Copyright and IPR Provisions
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the BSD License. The definitive version of an IETF
Document is that published by, or under the auspices of, the IETF.
Versions of IETF Documents that are published by third parties,
including those that are translated into other languages, should not
be considered to be definitive versions of IETF Documents. The
definitive version of these Legal Provisions is that published by, or
under the auspices of, the IETF. Versions of these Legal Provisions
that are published by third parties, including those that are
translated into other languages, should not be considered to be
definitive versions of these Legal Provisions. For the avoidance of
doubt, each Contributor to the IETF Standards Process licenses each
Contribution that he or she makes as part of the IETF Standards
Process to the IETF Trust pursuant to the provisions of RFC 5378. No
language to the contrary, or terms, conditions or rights that differ
from or are inconsistent with the rights and licenses granted under
RFC 5378, shall have any effect and shall be null and void, whether
published or posted by such Contributor, or included with or in such
Contribution.
R. Perlman, et al [Page 19]