TRILL (Transparent Interconnection of Lots of Links) Edge Directory Assistance Framework
draft-ietf-trill-directory-framework-03
The information below is for an old version of the document.
| Document | Type | Active Internet-Draft (trill WG) | |
|---|---|---|---|
| Authors | Linda Dunbar , Donald E. Eastlake 3rd , Radia Perlman , Igor Gashinsky | ||
| Last updated | 2013-02-21 | ||
| Replaces | draft-dunbar-trill-directory-assisted-edge | ||
| Stream | Internet Engineering Task Force (IETF) | ||
| Reviews | GENART Last Call review (of -05): Ready with Issues; SECDIR Last Call review (of -05): Has Issues | ||
| Stream | WG state | WG Document | |
| Document shepherd | (None) | ||
| IESG | IESG state | I-D Exists | |
| Consensus boilerplate | Unknown | ||
| Telechat date | (None) | ||
| Responsible AD | (None) | ||
| Send notices to | (None) |
TRILL working group L. Dunbar
Internet Draft D. Eastlake
Category: Informational Huawei
Radia Perlman
Intel
Igor Gashinsky
Yahoo
Expires: April 2013 February 21, 2013
TRILL (Transparent Interconnection of Lots of Links)
Edge Directory Assistance Framework
draft-ietf-trill-directory-framework-03
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
Dunbar, et al Expires November 2013 [Page 1]
Internet-Draft TRILL Directory Assistance Framework Oct 2012
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided
without warranty as described in the Simplified BSD License.
Abstract
Edge RBridges currently learn the mapping between MAC addresses
and their egress RBridges by observing the data packets they
forward or through the TRILL ESADI protocol. When an ingress
RBridge receives a data frame whose destination address
(MAC&VLAN) it does not know, the data frame is flooded within the
VLAN across the TRILL campus.
This draft describes the framework for using directory services
to assist edge RBridges by reducing multi-destination frames,
particularly flooded unknown unicast frames and ARP/ND messages,
thereby improving TRILL (Transparent Interconnection of Lots of
Links) network scalability in environments such as data centers.
Table of Contents
1. Introduction
2. Terminology
3. TRILL Campus Impact of Massive Number of End Stations in a DC
3.1. Issues of Flooding Based Learning in DCs
3.2. Some Examples
4. Benefits of Directory Assisted Edge RBridge in DC
5. Generic operation of Directory Assistance
5.1. Information in Directory for Edge RBridges
5.2. Push Model and Requirements
5.3. Pull Model and Requirements
6. Conclusion and Recommendation
7. Security Considerations
8. IANA Considerations
9. Acknowledgements
10. References
10.1. Normative References
10.2. Informative References
Authors' Addresses
1. Introduction
Edge RBridges (devices implementing [RFC6325], also known as
TRILL (Transparent Interconnection of Lots of Links) Switches)
currently learn the mapping between destination MAC addresses and
their egress RBridges by observing data packets or by ESADI (End
Station Address Distribution Information). When an ingress
RBridge receives a data frame for a destination address
(MAC&VLAN) that RBridge does not know, the data frame is flooded
within that VLAN across the TRILL campus.
This draft describes the framework for using directory services
to assist edge RBridges by reducing multi-destination frames,
particularly ARP [RFC826], ND [RFC4861], and unknown unicast,
improving TRILL network scalability in environments such as data
centers.
Data center networks are different from enterprise campus
networks in several ways, in particular:
1. Data centers, especially Internet and/or multi-tenant data
centers, tend to have a large number of end stations with a wide
variety of applications.
2. Topology is usually based on racks and rows.
- Guest OS assignment to Servers, Racks, and Rows is
orchestrated by a Server/VM Management system, not done at
random.
3. Rapid workload shifting in data centers can increase how
often physical servers are re-loaded with different
applications. Sometimes, the applications loaded onto one
physical server at different times belong to different subnets.
4. With server virtualization, there is an increasing trend to
dynamically create or delete VMs as resource demand changes, to
move VMs from overloaded servers to less loaded servers, or to
aggregate VMs onto fewer servers when demand is light.
Both 3) and 4) above can lead to applications in one subnet being
placed in different locations (racks or rows) or one rack having
applications belonging to different subnets.
2. Terminology
The terms "Subnet" and "VLAN" are used interchangeably in this
document because it is common to map one subnet to one VLAN.
Bridge: IEEE Std 802.1Q-2011 compliant device [802.1Q]. In this
draft, Bridge is used interchangeably with Layer 2
switch.
DA: Destination Address
DC: Data Center
EoR: End of Row switches in data center. Also known as
aggregation switches.
End Station: Guest OS running on a physical server or on a
virtual machine. An end station in this document has at
least one IP address and at least one MAC address, which
could be in DA or SA field of a data frame.
RBridge: "Routing Bridge", an alternative name for a TRILL
switch.
SA: Source Address
Station: A node, or a virtual node, with IP and/or MAC addresses,
which could be in the DA or SA of a data frame.
ToR: Top of Rack switch in a data center. Also known as an
access switch in some data centers.
TRILL: Transparent Interconnection of Lots of Links [RFC6325]
TRILL switch: A device implementing the TRILL protocol [RFC6325]
VM: Virtual Machine
3. TRILL Campus Impact of Massive Number of End Stations in a DC
3.1. Issues of Flooding Based Learning in DCs
It is common for Data Center networks to have multiple tiers of
switches, for example, one or two Access Switches for each server
rack (ToR), aggregation switches for some rows (or EoR switches),
and some core switches to interconnect the aggregation switches.
Many aggregation switches deployed in data centers have high port
density. It is not uncommon to see aggregation switches
interconnecting hundreds of ToR switches.
      +-------+         +------+
    +/------+ |       +/-----+ |
    | Aggr11| + ----- |AggrN1| +      EoR Switches
    +---+---+/        +------+/
     /     \            /      \
    /       \          /        \
 +---+    +---+      +---+     +---+
 |T11|... |T1x|      |T21| ..  |T2y|  ToR switches
 +---+    +---+      +---+     +---+
   |        |          |         |
 +-|-+    +-|-+      +-|-+     +-|-+
 |   |... |   |      |   | ..  |   |
 +---+    +---+      +---+     +---+  Server racks
 |   |... |   |      |   | ..  |   |
 +---+    +---+      +---+     +---+
 |   |... |   |      |   | ..  |   |
 +---+    +---+      +---+     +---+
Figure 1: Typical Data Center Network Design
The following problems could occur when TRILL is deployed in a
data center with a large number of end stations, where the end
stations in one subnet/VLAN can be placed under multiple edge
RBridges:
- Unnecessary filling of slots in the MAC learning table of
edge RBridges, e.g. RB1 (or T11), due to RB1 receiving
broadcast/multicast traffic (e.g. ARP/ND, cluster multicast,
etc.) from end stations under other edge RBridges that are
not actually communicating with any end stations attached to
RB1.
- Some edge RBridge ports being blocked for user traffic when
more than one RBridge port is connected to an edge bridged
LAN. When multiple RBridge ports are connected to a bridged
LAN in a given VLAN, only one (the Appointed Forwarder port)
can forward/receive native traffic for that bridged LAN or
VLAN. The rest have to be blocked from forwarding/receiving
native traffic for that VLAN. When servers have dual uplinks
to two different ToR switches (or edge RBridges), some links
may not be fully utilized.
- Packets being flooded across the TRILL campus when their DAs
are not in the ingress RBridge's cache.
- In an environment where VMs migrate, there is a higher chance
of cached entries becoming invalid, causing traffic to be
black holed by the egress RBridge. If VMs send out
gratuitous ARP/ND or [802.1Qbg] VDP messages upon arriving
at new locations, the ingress nodes might not have the MAC
entries for the newly arrived VMs, causing more unknown
unicast flooding.
3.2. Some Examples
Consider a data center with 1600 server racks. Each server rack
has at least one ToR switch. The ToR switches are further divided
into 8 groups, with each group being connected by a set of
aggregation switches. There could be 4 to 8 aggregation switches
in each set to achieve load sharing for traffic to/from server
racks. If TRILL is deployed in this data center environment,
let's consider the following two scenarios for the TRILL campus
boundary:
- Scenario #1: TRILL campus boundary starts at ToR switches:
If each server rack has one uplink to one ToR, there are
1600 edge RBridges. If each rack has dual uplinks to two ToR
switches, then there will be 3200 edge RBridges.
In this scenario, the TRILL domain will have more than 1600
(or 3200) + 8*4 (or 8*8) nodes, which is a large IS-IS
domain. Even though a mesh IS-IS domain can scale up to
thousands of nodes, it is challenging for aggregation
switches to handle IS-IS link state advertisement among
hundreds of parallel ports.
- Scenario #2: TRILL campus boundary starts at the aggregation
switches:
With the same assumption as before, the number of nodes in
the TRILL campus will be fewer than 100, and aggregation
switches don't have to handle IS-IS link state advertisements
among hundreds of parallel ports.
But bridged LANs are formed under the aggregation switches
in this scenario. With aggregation switches being the
RBridge edge nodes, multiple RBridge edge ports could be
connected to one bridged LAN. To avoid potential loops,
TRILL requires that only one of the RBridge edge ports
connected to each VLAN be designated as Appointed Forwarder
[RFC6439], and that the other ports be blocked for native
frames in that VLAN.
In addition, the number of MAC&VLAN<->Egress RBridge Mapping
entries to be learned and managed by an RBridge edge node can
be very large. In the example above, each edge RBridge has
200 edge ports facing the ToR switches. If each ToR has 40
downstream ports facing servers and each server has 10 VMs,
there could be 200*40*10 = 80000 end stations attached. If
those end stations are spread over 1600 VLANs (i.e. 50
locally attached end stations per VLAN) and each VLAN has 200
end stations in total across the campus, then under the
worst-case scenario the total number of MAC&VLAN entries to
be learned by the edge RBridge can be 1600*200 = 320000.
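The figures used in this section can be reproduced with a few
lines of arithmetic. The sketch below is illustrative only: the
variable names are hypothetical, and it assumes the 200 end
stations per VLAN are counted across the whole campus while 50 of
them are attached locally.

```python
# Illustrative arithmetic for the Section 3.2 example (names are hypothetical).
racks = 1600
groups = 8
aggr_per_group = (4, 8)          # 4 to 8 aggregation switches per group

# Scenario #1: campus boundary at the ToR switches.
for tors in (racks, 2 * racks):  # single or dual uplinks per rack
    low, high = (tors + groups * n for n in aggr_per_group)
    print(f"scenario 1: {tors} edge RBridges, {low}..{high} IS-IS nodes")

# Scenario #2: campus boundary at the aggregation switches.
scenario2_nodes = tuple(groups * n for n in aggr_per_group)
print("scenario 2 nodes:", scenario2_nodes)

# MAC&VLAN entries seen by one aggregation-layer edge RBridge.
tor_ports = racks // groups       # ToR-facing ports per edge RBridge
attached = tor_ports * 40 * 10    # 40 servers per ToR, 10 VMs per server
vlans = 1600
local_per_vlan = attached // vlans
worst_case_entries = vlans * 200  # assumed: 200 end stations per VLAN campus-wide
print(attached, local_per_vlan, worst_case_entries)
```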
4. Benefits of Directory Assisted Edge RBridge in DC
In the data center environment, application placement to servers,
racks, and rows is orchestrated by Server (or VM) Management
System(s). That is, there is a database or multiple databases
(distributed model) that have the knowledge of where each
application is placed. If the application location information
can be fed to RBridge edge nodes, in some form of Directory
Service, then there is much less chance of RBridge edge nodes
receiving unknown MAC-DA, therefore less chance of flooding.
Avoiding unknown unicast MAC-DA flooding to the TRILL campus is
especially valuable in the data center environment because there
is a higher chance of an edge RBridge receiving packets with
unknown unicast DA and broadcast/multicast messages due to VM
migration and servers being loaded with different applications.
When a VM is moved to a new location or a server is loaded with a
new application with different IP/MAC addresses, it is more
likely that the DAs of data packets sent out from those VMs are
unknown to their attached edge RBridges. In addition, gratuitous
ARP (IPv4, [RFC826]) or Unsolicited Neighbor Advertisement (IPv6,
[RFC4861]) messages sent out from those newly migrated or
activated VMs have to be flooded to other edge RBridges that have
VMs in the same subnets.
The benefits of using directory assistance include:
- Avoid flooding frames with an unknown unicast DA across the
TRILL campus. The directory-enforced MAC&VLAN <-> Egress
RBridge mapping table can determine whether a data packet
needs to be forwarded across the TRILL campus.
When multiple RBridge edge ports are connected via a bridged
LAN to end stations (servers/VMs), a directory assisted edge
RBridge won't need to flood unknown unicast DA data frames
to all ports of the edge RBridges in the frame's VLAN when
it ingresses a frame. It can depend on the directory to tell
it where the DA is. When the directory doesn't have the
needed information, the frames can be dropped or flooded
depending on the policy configured.
- Reduce flooding of decapsulated Ethernet frames with unknown
MAC-DA to a bridged LAN connected to RBridge edge ports.
When an RBridge receives a TRILL frame whose destination
Nickname matches with its own, the normal procedure is for
the RBridge to decapsulate the TRILL header and forward the
decapsulated Ethernet frame to the directly attached bridged
LAN. If the destination MAC is unknown, the normal
Ethernet switch's flooding will occur with the decapsulated
Ethernet frame. With directory assistance, the egress
RBridge can determine if the MAC-DA in a frame matches with
any end stations attached via the bridged LAN. Frames can be
discarded if their DAs do not match.
- Reduce the amount of MAC&VLAN <-> Egress RBridge mapping
maintained by edge RBridges. There is no need for an edge
RBridge to keep MAC entries of remote end stations that
don't communicate with the end stations locally attached.
- Eliminate ARP/ND messages being broadcast or multicast
through the TRILL core.
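The second benefit in the list above, egress filtering of
decapsulated frames, can be sketched as follows. This is a
minimal illustration rather than a specified procedure; the
function name, the return values, and the set of locally attached
stations are all assumptions made for the example.

```python
# Hypothetical sketch: egress RBridge handling of a decapsulated frame.
def egress_action(mac_da, vlan, local_stations, directory_assisted=True):
    """Return 'forward', 'discard', or 'flood' for a decapsulated frame.

    local_stations: set of (mac, vlan) pairs known to be attached through
    this RBridge's bridged LAN (assumed data structure).
    """
    if (mac_da, vlan) in local_stations:
        return "forward"
    # Without a directory the frame is handed to the bridged LAN, where
    # ordinary Ethernet flooding of the unknown MAC-DA takes place.
    return "discard" if directory_assisted else "flood"


local = {("00:00:5e:00:53:01", 10)}
known = egress_action("00:00:5e:00:53:01", 10, local)
unknown_with_directory = egress_action("00:00:5e:00:53:99", 10, local)
unknown_without = egress_action("00:00:5e:00:53:99", 10, local,
                                directory_assisted=False)
print(known, unknown_with_directory, unknown_without)
```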
5. Generic operation of Directory Assistance
5.1. Information in Directory for Edge RBridges
To achieve the benefits of directory assistance for TRILL, the
corresponding directory server entries will need, at a minimum,
the following logical attributes:
[IP, MAC/VLAN, {list of attached RBridge nicknames}, {list of
interested RBridges}]
The {list of attached RBridge nicknames} identifies the RBridges
to which the host (or VM) specified by the [IP or MAC/VLAN] in
the entry is attached. The {list of interested RBridges} contains
the remote RBridges that might have attached hosts that
communicate with the host in this entry.
When a host has multiple IP addresses, there will be multiple
entries.
The {list of interested RBridges} would be populated when an
RBridge queries for information, or it could be pushed down from
management systems. The list is used to notify those RBridges
when the connectivity between the host (specified by the
IP/MAC/VLAN in the entry) and its attached RBridges changes. An
explicit list in the directory is not needed as long as the
interested RBridges can be determined.
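As an illustration of the logical attributes listed in this
section, a directory entry could be modelled as below. The class
and field names are hypothetical, and nicknames are shown as
short strings purely for readability; the draft does not
prescribe any encoding.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    # Field names are hypothetical; the draft only lists logical attributes.
    ip: str
    mac: str
    vlan: int
    attached: set = field(default_factory=set)     # attached RBridge nicknames
    interested: set = field(default_factory=set)   # RBridges to notify

    def record_query(self, nickname):
        """An RBridge that queries this entry becomes 'interested'."""
        self.interested.add(nickname)

    def move(self, new_attached):
        """Update the attachment; return the RBridges that should be notified."""
        self.attached = set(new_attached)
        return sorted(self.interested)


entry = DirectoryEntry("192.0.2.10", "00:00:5e:00:53:01", 10, {"RB1"})
entry.record_query("RB7")
entry.record_query("RB3")
to_notify = entry.move({"RB2"})   # the host moved from RB1 to RB2
print(to_notify)
```

A host with several IP addresses would simply be represented by
several such entries, one per address.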
There are two different models for Directory assistance to
RBridge edge nodes: Push Model and Pull Model.
5.2. Push Model and Requirements
Under this model, Directory Server(s) push down the MAC&VLAN <->
Egress RBridge mapping for all the end stations that might
communicate with end stations attached to an RBridge edge node.
Under this model, if the packet's destination address can't be
found in the MAC&VLAN<->Egress RBridge table, the ingress RBridge
can be configured to:
- simply drop the data packet,
- flood it across the TRILL campus, or
- start the pull process to get information from the directory
server(s).
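The three configurable behaviours on a lookup miss can be
sketched as a small policy switch. The names MissPolicy and
ingress_lookup are invented for this example, and the pushed
table is assumed to be a simple dictionary.

```python
from enum import Enum


class MissPolicy(Enum):
    # Hypothetical names for the three configurable behaviours.
    DROP = "drop"
    FLOOD = "flood"
    PULL = "pull"


def ingress_lookup(table, mac_da, vlan, policy):
    """Return (action, egress_nickname) for a native frame at the ingress RBridge.

    table maps (mac, vlan) -> egress RBridge nickname, as pushed by the directory.
    """
    egress = table.get((mac_da, vlan))
    if egress is not None:
        return ("unicast", egress)
    # Destination not in the pushed table: fall back to the configured policy.
    return (policy.value, None)


pushed = {("00:00:5e:00:53:01", 10): "RB2"}
hit = ingress_lookup(pushed, "00:00:5e:00:53:01", 10, MissPolicy.DROP)
miss = ingress_lookup(pushed, "00:00:5e:00:53:02", 10, MissPolicy.PULL)
print(hit, miss)
```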
It may not be necessary for every edge RBridge to get the entire
mapping table for all the end stations in a data center. There
are many ways to narrow the full set down to a smaller set of
remote end stations that communicate with end stations attached
to an edge RBridge. A simple approach is to push down only the
mapping for the VLANs that have active end stations under an edge
RBridge. This approach can reduce the number of mapping entries
being pushed down.
However, the Push Model usually will push down more entries of
MAC&VLAN<->Egress RBridge mapping to edge RBridges than needed.
Under the normal process of edge RBridge cache aging and unknown
DA flooding, rarely used mapping entries would have been removed.
But it can be difficult for Directory Servers to predict the
communication patterns among applications within one VLAN.
Therefore, it is likely that the Directory Servers will push down
all the MAC&VLAN entries of a VLAN if any end station in that
VLAN is attached to the edge RBridge. This is a disadvantage of
the Push Model compared with the Pull Model described below.
In the Push Model, it is necessary to have a way for an RBridge
node to ask the directory server(s) to start pushing down the
mapping entries. The request should at least include the VLANs
enabled on the RBridge, so that the directory server doesn't need
to push down the mapping entries for all the end stations in the
data center. An RBridge must be able to get mapping entries when
it is initialized or restarted.
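A minimal sketch of how a directory server might scope a push to
the VLANs named in an RBridge's request is given below. The tuple
layout and the function name are assumptions; no message format
is implied.

```python
def entries_to_push(directory, enabled_vlans):
    """Select the mappings a directory server would push to one edge RBridge.

    directory: iterable of (mac, vlan, egress_nickname) tuples (assumed layout).
    enabled_vlans: the VLANs listed in the RBridge's push request.
    """
    return [
        (mac, vlan, egress)
        for mac, vlan, egress in directory
        if vlan in enabled_vlans
    ]


directory = [
    ("00:00:5e:00:53:01", 10, "RB1"),
    ("00:00:5e:00:53:02", 10, "RB2"),
    ("00:00:5e:00:53:03", 20, "RB2"),
    ("00:00:5e:00:53:04", 30, "RB3"),
]
pushed = entries_to_push(directory, enabled_vlans={10, 20})
print(len(pushed), "of", len(directory), "entries pushed")
```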
The Push Model's detailed method and any hand-shake mechanism
between RBridge and Directory Server(s) are beyond the scope of
this framework draft.
When a directory server needs to push down a very large number of
entries to edge RBridges, summarization should be considered. For
example, one edge RBridge Nickname can be associated with all
attached end stations' MAC addresses and VLANs, as shown below:
+------------+-------+--------------------------------+
| Nickname1  | VID-1 | IP/MAC1, IP/MAC2, ..., IP/MACn |
|            +-------+--------------------------------+
|            | VID-2 | IP/MAC1, IP/MAC2, ..., IP/MACn |
|            +-------+--------------------------------+
|            | ....  | IP/MAC1, IP/MAC2, ..., IP/MACn |
+------------+-------+--------------------------------+
| Nickname2  | VID-1 | IP/MAC1, IP/MAC2, ..., IP/MACn |
|            +-------+--------------------------------+
|            | VID-2 | IP/MAC1, IP/MAC2, ..., IP/MACn |
|            +-------+--------------------------------+
|            | ....  | IP/MAC1, IP/MAC2, ..., IP/MACn |
+------------+-------+--------------------------------+
| ....       | ....  | IP/MAC1, IP/MAC2, ..., IP/MACn |
+------------+-------+--------------------------------+
Table 1: Summarized table pushed down from directory
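The grouping shown in Table 1 can be produced from flat records
with a nested dictionary. The sketch below assumes a hypothetical
record layout of (nickname, VID, IP, MAC) and is not a wire
format.

```python
from collections import defaultdict


def summarize(entries):
    """Group flat (nickname, vid, ip, mac) records as in Table 1."""
    table = defaultdict(lambda: defaultdict(list))
    for nickname, vid, ip, mac in entries:
        table[nickname][vid].append((ip, mac))
    # Convert to plain dicts so the result prints and compares cleanly.
    return {nick: dict(vids) for nick, vids in table.items()}


flat = [
    ("Nickname1", 1, "192.0.2.1", "00:00:5e:00:53:01"),
    ("Nickname1", 1, "192.0.2.2", "00:00:5e:00:53:02"),
    ("Nickname1", 2, "192.0.2.3", "00:00:5e:00:53:03"),
    ("Nickname2", 1, "192.0.2.4", "00:00:5e:00:53:04"),
]
summary = summarize(flat)
for nickname, vids in summary.items():
    for vid, stations in vids.items():
        print(nickname, vid, stations)
```

Each nickname then appears once, instead of once per end station,
which is the saving that summarization is meant to provide.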
Whenever there is any change in the MAC&VLAN <-> Egress RBridge
mapping, which can be triggered by end stations being added,
moved, or de-commissioned, an incremental update can be sent to
the edge RBridges that are impacted by the change. For
incremental updates to be applied reliably, something like a
sequence number has to be maintained by directory servers and
RBridges. Detailed mechanisms will be specified in a separate
document.
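One way a sequence number could guard incremental updates is
sketched below. The update layout (a dictionary with "seq",
"add", and "remove" keys) and the three result strings are
assumptions for illustration; the actual mechanism is left to the
separate document.

```python
class PushedTable:
    """Edge RBridge copy of the pushed mapping, guarded by a sequence number."""

    def __init__(self):
        self.seq = 0
        self.mapping = {}          # (mac, vlan) -> egress nickname

    def apply(self, update):
        """Apply one incremental update and report what happened."""
        if update["seq"] <= self.seq:
            return "duplicate"     # already applied, safe to ignore
        if update["seq"] != self.seq + 1:
            return "gap"           # an update was missed: a resync is needed
        for key, egress in update.get("add", {}).items():
            self.mapping[key] = egress
        for key in update.get("remove", []):
            self.mapping.pop(key, None)
        self.seq = update["seq"]
        return "applied"


station = ("00:00:5e:00:53:01", 10)
table = PushedTable()
r1 = table.apply({"seq": 1, "add": {station: "RB1"}})
r2 = table.apply({"seq": 3, "remove": [station]})   # update 2 never arrived
r3 = table.apply({"seq": 1, "add": {station: "RB9"}})
print(r1, r2, r3, table.mapping)
```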
5.3. Pull Model and Requirements
Under this model, an RBridge pulls the MAC&VLAN<->Egress RBridge
mapping entry from the directory server when its cache doesn't
have the entry. There are several options to trigger the pulling
process:
- The RBridge edge node can send a pull request whenever it
receives a frame with an unknown MAC-DA, or
- The RBridge edge node can simply intercept all ARP/ND
requests and forward them to the Directory Server(s) that
have the information on where the target stations are
located.
The Pull Directory response could indicate that the MAC-DA
being queried is unknown or that the requestor is
administratively prohibited from getting an informative
response.
By using a Pull Directory, a frame with an unknown MAC-DA doesn't
have to be flooded across the TRILL domain and ARP/ND requests
don't have to be broadcast or multicast across the TRILL domain.
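The possible outcomes of a pull query (an informative answer, an
unknown target, or an administratively prohibited requestor) can
be sketched as follows. The Status values, the per-requestor VLAN
access list, and the dictionary layouts are hypothetical; the
draft does not define a message format here.

```python
from enum import Enum


class Status(Enum):
    FOUND = 1
    UNKNOWN = 2
    PROHIBITED = 3


def pull_query(directory, acl, requester, target_ip, vlan):
    """Answer an intercepted ARP/ND request from a (hypothetical) pull directory.

    directory maps (ip, vlan) -> (mac, egress_nickname).
    acl maps a requester nickname -> set of VLANs it may query (assumed policy).
    """
    if vlan not in acl.get(requester, set()):
        return (Status.PROHIBITED, None)
    hit = directory.get((target_ip, vlan))
    if hit is None:
        return (Status.UNKNOWN, None)
    return (Status.FOUND, hit)


directory = {("192.0.2.10", 10): ("00:00:5e:00:53:01", "RB2")}
acl = {"RB1": {10}}
found = pull_query(directory, acl, "RB1", "192.0.2.10", 10)
unknown = pull_query(directory, acl, "RB1", "192.0.2.99", 10)
denied = pull_query(directory, acl, "RB9", "192.0.2.10", 10)
print(found, unknown, denied)
```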
The ingress RBridge can cache the response pulled down from the
directory. The cache timer should be short in an environment
where VMs move frequently. The cache timer could be configured by
a management system or could be sent down along with the pulled
reply by the directory server(s).
One advantage of the Pull Model is that an edge RBridge can age
out MAC&VLAN entries if they haven't been used for a certain
configured period of time or a period of time provided by the
Directory. Therefore, each edge RBridge will only keep the
entries that are frequently used, so mapping table size will be
smaller. Edge RBridges would query the Directory Server(s) for
unknown MAC-DAs in data frames or ARP/ND and cache the response.
When end stations attached to remote edge RBridges rarely
communicate with the locally attached end stations, the
corresponding MAC&VLAN entries would be aged out from the
RBridge's cache.
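The aging behaviour described above can be sketched as a small
cache keyed by MAC&VLAN. The class below is an illustration under
stated assumptions: time is passed in explicitly as a number of
seconds, a lifetime supplied with the pulled reply overrides the
configured default, and each successful lookup counts as use.

```python
class PullCache:
    """Cache of pulled MAC&VLAN -> egress mappings with idle-time aging."""

    def __init__(self, default_lifetime):
        self.default_lifetime = default_lifetime
        self.entries = {}   # (mac, vlan) -> [egress, lifetime, last_used]

    def store(self, key, egress, now, lifetime=None):
        """Cache a pulled reply; a lifetime sent by the directory wins."""
        if lifetime is None:
            lifetime = self.default_lifetime
        self.entries[key] = [egress, lifetime, now]

    def lookup(self, key, now):
        """Return the egress nickname, or None if absent or aged out."""
        entry = self.entries.get(key)
        if entry is None:
            return None
        egress, lifetime, last_used = entry
        if now - last_used >= lifetime:
            del self.entries[key]      # unused for too long: age out
            return None
        entry[2] = now                 # a hit counts as use
        return egress


cache = PullCache(default_lifetime=30)
key = ("00:00:5e:00:53:01", 10)
cache.store(key, "RB2", now=0)
first = cache.lookup(key, now=10)    # used 10 s after being stored
second = cache.lookup(key, now=35)   # 25 s idle, still cached
third = cache.lookup(key, now=70)    # 35 s idle, aged out
print(first, second, third)
```

A miss after aging out would simply trigger a new pull request,
so rarely used entries cost a query rather than table space.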
An RBridge waiting for a response from Directory Servers upon
receiving a data frame with an unknown DA is similar to an L2/L3
boundary router waiting for an ARP/ND response upon receiving an
IP data frame whose DA is not in the router's IP/MAC cache table.
Most deployed routers today do hold the packet and send ARP/ND
requests to the target upon receiving a packet with an IP DA not
in their IP-MAC cache. When ARP/ND replies are received, the
router will send the data frame to the target. This practice
minimizes flooding when targets don't exist in the subnet.
When the target doesn't exist in the subnet, routers generally
re-send the ARP/ND request a few more times before dropping the
packets. So, the time routers hold packets while waiting for an
ARP/ND response can be longer than the time taken by the Pull
Model to get the IP-MAC mapping from the directory if the target
doesn't exist in the subnet.
RBridges that have mapping entries pushed down from a directory
server can be configured to use the Pull Model for targets that
don't exist in the pushed-down mapping data.
A separate document will specify the detailed messages and
mechanism for edge RBridges to pull information from directory
server(s).
6. Conclusion and Recommendation
The traditional RBridge learning approach of observing the data
plane will have difficulty keeping pace with the ever growing
number of end stations in data centers.
Therefore, TRILL should provide a directory assisted approach.
This document describes the basic framework of using a directory
assisted approach for RBridge edge nodes. More detailed
mechanisms will be described in separate drafts.
7. Security Considerations
Accurate mapping of IP addresses into MAC addresses is important
to the correct delivery of information. The security of specific
directory assisted mechanisms will be discussed in the document
or documents specifying those mechanisms.
For general TRILL security considerations, see [RFC6325].
8. IANA Considerations
This document requires no IANA actions. RFC Editor: please delete
this section before publication.
9. Acknowledgements
This document was prepared using 2-Word-v2.0.template.dot.
10. References
10.1. Normative References
As an Informational document, this draft has no Normative
References.
10.2. Informative References
[802.1Q] IEEE Std 802.1Q-2011, "IEEE Standard for Local and
metropolitan area networks - Virtual Bridged Local Area
Networks", May 2011.
[802.1Qbg] IEEE Std 802.1Qbg-2012, "Media Access Control (MAC)
Bridges and Virtual Bridged Local Area Networks --- Edge
Virtual Bridging", July 2012.
[RFC826] Plummer, D., "An Ethernet Address Resolution Protocol",
RFC 826, November 1982.
[RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
"Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
September 2007.
[RFC6325] Perlman, R., et al., "Routing Bridges (RBridges): Base
Protocol Specification", RFC 6325, July 2011,
https://datatracker.ietf.org/doc/rfc6325/.
[RFC6439] Perlman, R., et al., "Routing Bridges (RBridges):
Appointed Forwarders", RFC 6439, November 2011,
https://datatracker.ietf.org/doc/rfc6439/.
Authors' Addresses
Linda Dunbar
Huawei Technologies
5430 Legacy Drive, Suite #175
Plano, TX 75024, USA
Phone: (469) 277 5840
Email: ldunbar@huawei.com
Donald Eastlake
Huawei Technologies
155 Beaver Street
Milford, MA 01757 USA
Phone: 1-508-333-2270
Email: d3e3e3@gmail.com
Radia Perlman
Intel Labs
2200 Mission College Blvd.
Santa Clara, CA 95054-1549 USA
Phone: +1-408-765-8080
Email: Radia@alum.mit.edu
Igor Gashinsky
Yahoo
45 West 18th Street 6th floor
New York, NY 10011
Email: igor@yahoo-inc.com