Network Working Group M. Bhatia
Internet-Draft Alcatel-Lucent
Intended status: Standards Track M. Chen
Expires: June 15, 2012 Z. Wang
Huawei Technologies Co., Ltd
L. Guo
China Telecom
M. Binderberger
December 13, 2011
Bidirectional Forwarding Detection (BFD) on Link Aggregation Group (LAG)
Interfaces
draft-mmm-bfd-on-lags-00
Abstract
This document proposes a mechanism to run BFD on Link Aggregation
Group (LAG) interfaces. It does so by running an independent BFD
session on every LAG member link.
A dedicated well-known multicast IP address, one each for IPv4 and
IPv6, is introduced as the destination IP address of the BFD packets
when running BFD on the member links of the LAG.
There is currently no standard that describes how BFD should run on
LAG interfaces. As a result, multiple non-interoperable BFD
implementations for LAG interfaces exist. This draft provides a
short overview of them as context for the newly proposed mechanism.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
Bhatia, et al. Expires June 15, 2012 [Page 1]
Internet-Draft BFD for LAG Interfaces December 2011
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on June 15, 2012.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. BFD over LAG with a single session
2.1. Existing implementations
2.2. BFD over Big Pipe
3. BFD over member links of the LAG
3.1. BFD protocol details
3.2. BFD influence on the LAG Management Module
3.3. Concluded BFD state
3.4. Motivation for the technical design
4. BFD for LAG and layer-3 applications
5. Security Consideration
6. IANA Considerations
7. Acknowledgements
8. References
8.1. Normative References
8.2. Informative References
Authors' Addresses
1. Introduction
The Bidirectional Forwarding Detection (BFD) protocol [RFC5880]
provides a mechanism to detect faults in the bidirectional path
between two forwarding engines, including interfaces, data link(s),
and to the extent possible the forwarding engines themselves, with
potentially very low latency.
BFD can be used for detecting failures of the path between two
network devices. Typically the application clients, being layer-3
applications such as Open Shortest Path First (OSPF) [RFC2328] or
Border Gateway Protocol (BGP) [RFC4271], are not aware of any inner
structure of the underlying interface. While this works for
interfaces like Ethernet and Packet Over SONET (POS), it causes
problems for bundled interfaces like LAG.
A LAG is used to bind together several physical ports between two
adjacent nodes so they appear to higher-layer protocols as a single,
higher bandwidth "virtual" pipe. A LAG interface thereby allows
aggregation of multiple network interfaces as one virtual interface
for the purpose of providing fault-tolerance and higher bandwidth.
The problem for BFD is that a single BFD session is subject to the
load-balancing algorithm used for the LAG, i.e. BFD has no control
over which physical links are used and in which sequence. This makes
it impossible for BFD to guarantee detection of anything but a full
LAG shutdown. This LAG shutdown would be initiated by the LAG
Management Module (LMM) and is typically several times slower than
BFD detection (hundreds of milliseconds for the LMM versus tens of
milliseconds for BFD). The solution proposed in this document is to
run a BFD session on every physical member link the LAG is built
upon. This requires the LMM to request BFD sessions for every member
link, using BFD as a fast detection mechanism. BFD can combine this
information from the LMM with the layer-3 centric session requests
from OSPF and the like to provide fast detection for layer-3
applications.
2. BFD over LAG with a single session
2.1. Existing implementations
As mentioned, no standard exists on how to run BFD over LAG
interfaces. As a result, several implementations have chosen a
simple approach that lets them interoperate and solves the problem
of establishing a BFD session over a LAG. This is typically done by
treating the LAG as a big, virtual pipe and ignoring the underlying
structure (i.e., the component member links). This is
not desirable as it does not allow deterministic and fast detection
of individual member link failures. We call this conventional
approach to running BFD "BFD over Big Pipe", or "BBP" for short.
There are various ways of running BFD over LAG interfaces. Some
implementations send BFD packets only over the primary or the active
member link. Others spray BFD packets over all member links of the
LAG. There are issues with each of these approaches.
In the first approach, BFD will remain up as long as the primary port
is alive. It will go down once the primary port goes down, until
another port is selected as the primary. Another problem with this
design is that BFD is oblivious to the presence of other member
links in the LAG. If a non-primary member link goes down, BFD
remains unaffected as it can still send BFD packets over the primary
link. BFD will thus remain up and all traffic sent over the failed
member link will be dropped until an upper layer protocol like the
Link Aggregation Control Protocol (LACP) detects the failed link and
removes it from the LAG.
In the second approach, BFD packets are sprayed over all the member
links of a LAG. This is done naively via round-robin, where each BFD
packet is sent on the next member link in turn. It solves the
problem of BFD going down because of the primary port going down,
but it still does not solve the problem of traffic getting lost when
one of the member links goes down. This is because when a member
link goes down, BFD still remains up and traffic continues to go over
the failed link until a higher layer protocol detects this and
removes the offending link from the LAG.
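The limitation of round-robin spraying can be illustrated with a
small simulation. This is only a sketch under a simplified model that
is not part of this document: the single session is assumed to go
down only after detect_mult consecutive BFD packets are lost, and the
function names are hypothetical.

```python
def max_consecutive_losses(num_links, failed_link, packets=1000):
    """Longest run of lost BFD packets when spraying round-robin."""
    longest = run = 0
    for seq in range(packets):
        link = seq % num_links  # round-robin choice of member link
        if link == failed_link:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


def session_goes_down(num_links, failed_link, detect_mult=3):
    """True if the single session would time out in this simplified model."""
    return max_consecutive_losses(num_links, failed_link) >= detect_mult
```

With two or more member links, only every Nth packet is lost, so the
run of losses never reaches a detect multiplier of 3 and the session
stays up even though one link is dropping traffic.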
2.2. BFD over Big Pipe
This document proposes using one of the mechanisms described in the
previous section in combination with the new mechanism of a separate
BFD session per LAG member link, which will be defined in the next
section.
For this reason we need to standardize the simple approach. The main
task is to define what it means to treat a LAG as a single "big pipe".
It means:
o BFD must work no matter which member link packets are sent and
received on.
o The Rx/Tx link can change at any time, regularly or in any
pattern, without causing BFD to fail.
This allows the LAG to be used like any other interface; RFC 5880 and
RFC 5881 can be used without modification.
As described in the previous section, it is advantageous to spray the
packets in a round-robin fashion across the LAG member links, as
opposed to sending them over only the primary or the active port.
It is thus RECOMMENDED that implementations do so. However, some
issues remain with spraying the BFD packets; these are addressed by
the scheme described in the next section.
3. BFD over member links of the LAG
3.1. BFD protocol details
The proposal is to run a BFD session on every member link of the LAG.
The BFD packets are IP/UDP based packets as defined in RFC 5880
[RFC5880] and RFC 5881 [RFC5881]. Currently, only asynchronous mode
is considered in this document. The echo function is outside the
document's scope. At least one system MUST take the Active role
(possibly both). The BFD sessions on the member links are
independent sessions. They use their own, unique local discriminator
and their own set of state variables. Timer values MAY be different,
even between the sessions belonging to the same LAG.
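As an illustration of what "independent sessions" means in practice,
the following sketch keeps one state record per member link. The
class and function names are hypothetical, the timer fields loosely
mirror the RFC 5880 state variables, and the sequential discriminator
allocation is an assumption made for brevity.

```python
import itertools
from dataclasses import dataclass, field

# Assumption: a simple counter yields nonzero, locally unique discriminators.
_next_discriminator = itertools.count(1)


@dataclass
class MemberLinkSession:
    lag: str
    member_link: str
    desired_min_tx_us: int = 100_000   # may differ per member link
    required_min_rx_us: int = 100_000  # may differ per member link
    detect_mult: int = 3
    state: str = "Down"                # sessions start in Down
    role: str = "Active"
    remote_discriminator: int = 0      # unknown until learned from the peer
    local_discriminator: int = field(
        default_factory=lambda: next(_next_discriminator)
    )


def sessions_for_lag(lag, member_links):
    """Create one independent session per member link of a LAG."""
    return {link: MemberLinkSession(lag, link) for link in member_links}
```

Changing the timers or state of one record leaves the sessions on the
other member links of the same LAG untouched.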
The destination IP address is a dedicated well-known multicast IP
address (224.XXX.XXX.XXX for IPv4, FFXX:: for IPv6, to be assigned by
IANA). On Ethernet-based LAG member links the corresponding
destination multicast MAC addresses will be 01:00:5e:XX:XX:XX for
IPv4 and 33:33:XX:XX:XX:XX for IPv6. Each member link will use its
own MAC address as the source MAC address.
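The multicast MAC prefixes above follow from the standard mapping of
IP multicast groups onto Ethernet (the low-order 23 bits of an IPv4
group per RFC 1112, the low-order 32 bits of an IPv6 group per RFC
2464). A minimal sketch of that mapping follows; the function name is
hypothetical, and any address passed to it here is a placeholder, not
the value to be assigned by IANA.

```python
import ipaddress


def multicast_mac(group):
    """Map an IPv4 or IPv6 multicast group to its Ethernet MAC address."""
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    if addr.version == 4:
        low = int(addr) & 0x7FFFFF  # low-order 23 bits of the group
        octets = [0x01, 0x00, 0x5E,
                  (low >> 16) & 0xFF, (low >> 8) & 0xFF, low & 0xFF]
    else:
        low = int(addr) & 0xFFFFFFFF  # low-order 32 bits of the group
        octets = [0x33, 0x33, (low >> 24) & 0xFF,
                  (low >> 16) & 0xFF, (low >> 8) & 0xFF, low & 0xFF]
    return ":".join(f"{o:02x}" for o in octets)
```

For example, the placeholder group 233.252.0.1 maps to
01:00:5e:7c:00:01, and the placeholder group ff02::db8:0:1 maps to
33:33:00:00:00:01.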
The demultiplexing of a received packet is based solely on the Your
Discriminator field, if this field is nonzero. A zero value may
occur in the initial Down packet of a session. In this case,
demultiplexing a BFD for LAG packet MUST be based on some combination
of other fields, which MUST include either the destination IP or the
destination MAC address.
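The demultiplexing rule can be sketched as follows. The class name,
its methods, and the choice of (receiving member link, destination
IP) as the fallback key are assumptions for illustration; the text
above only requires that the fallback include the destination IP or
the destination MAC address.

```python
class BfdLagDemux:
    """Look up the per-member-link session for a received BFD packet."""

    def __init__(self, lag_group_ips):
        # Placeholder group addresses stand in for the IANA-assigned ones.
        self.lag_group_ips = set(lag_group_ips)
        self.by_discriminator = {}
        self.by_member_link = {}

    def add_session(self, local_discriminator, member_link, session):
        self.by_discriminator[local_discriminator] = session
        self.by_member_link[member_link] = session

    def lookup(self, your_discriminator, dst_ip, rx_member_link):
        if your_discriminator != 0:
            # Nonzero Your Discriminator: it is the only key used.
            return self.by_discriminator.get(your_discriminator)
        if dst_ip not in self.lag_group_ips:
            # Not a BFD for LAG packet; leave it to other BFD handling.
            return None
        # Zero Your Discriminator: fall back to the receiving member link.
        return self.by_member_link.get(rx_member_link)
```

A packet with a zero Your Discriminator that arrives on a member link
with the dedicated destination address is thereby matched to that
link's session, while later packets are matched by discriminator
alone.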
The Address Family used is fixed per LAG, i.e. the BFD sessions on
the member links of a particular LAG are either all using IPv4 or all
using IPv6. An implementation MUST provide a configuration knob to
select the address family and MAY extend this to some sort of auto-
discovery. The default address family is IPv4.
3.2. BFD influence on the LAG Management Module
The LAG Management Module (LMM) is a client of BFD, requesting BFD
sessions for all the LAG member links. For link failure detection
the LMM can use BFD instead of or in parallel with LACP.
A member link of the LAG is no longer used for data forwarding when
the BFD session running over that link goes down. The member link
MUST be removed from the LAG. The BFD session for the link remains,
i.e. it is not deleted.
To add a member link to the LAG, the LMM MAY wait for the BFD session
on the link to come Up. This can lead to a deadlock: if the link
interface is not active (e.g., layer 3 protocol down), BFD packets,
as well as other control protocol packets (e.g. ARP) that are tightly
coupled with the status of the interface, may be prevented from being
transmitted between the pair of interfaces, so that the interfaces
can never be brought up.
To avoid the deadlock, BFD packets SHOULD NOT be blocked by the layer
N protocol status of the interface when the application depends on
the BFD status to enable layer N of the interface. If this cannot be
achieved then the BFD status MUST be ignored by the application when
bringing up an interface. The BFD status can then be used afterwards
to bring the interface down.
If waiting for a BFD status of Up before adding a member link is
supported, the behaviour of the LMM MUST be configurable, to allow an
alternative mode of adding the member link irrespective of the BFD
state for interoperability purposes.
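The interaction between the LMM and the per-link BFD sessions
described in this section can be sketched as a small state holder.
The class, its method names, and the wait_for_bfd_up knob are
hypothetical; the sketch only mirrors the rules stated above and is
not a complete LMM.

```python
class LagMembership:
    """Tracks which member links are used for forwarding."""

    def __init__(self, wait_for_bfd_up=True):
        self.wait_for_bfd_up = wait_for_bfd_up  # hypothetical config knob
        self.configured = set()
        self.active = set()
        self.bfd_state = {}  # sessions persist after a link is removed

    def request_add(self, link):
        self.configured.add(link)
        self.bfd_state.setdefault(link, "Down")
        # Interoperability mode adds the link irrespective of BFD state.
        if not self.wait_for_bfd_up or self.bfd_state[link] == "Up":
            self.active.add(link)

    def on_bfd_transition(self, link, new_state):
        old_state = self.bfd_state.get(link, "Down")
        self.bfd_state[link] = new_state
        if new_state == "Up" and link in self.configured:
            self.active.add(link)
        elif old_state == "Up" and new_state == "Down":
            # The session went down: remove the link, keep the session.
            self.active.discard(link)
```

In the default mode a link joins the LAG only once its session is Up;
with the knob disabled it joins immediately, and BFD is used only
afterwards to take it out of the LAG.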
3.3. Concluded BFD state
An additional state variable is introduced for BFD on LAG: the
concluded state. The state values are Down and Up.
The details of how BFD derives the concluded state are outside the
scope of this document. The idea is that the LMM may declare a LAG
as down when a certain threshold has been hit, e.g. a minimum
required bandwidth for the LAG. BFD could, for example, duplicate
the LMM logic or it could use an API to the LMM to learn about the
decision of the LAG Management Module. What is relevant for BFD on
LAG is that the concluded state is the overall state of the LAG.
The concluded state is important for layer-3 clients requesting BFD
sessions over the LAG or over VLANs on the LAG. Details are
discussed in Section 4.
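Although the derivation is out of scope, one possible policy is
easy to sketch: the case where BFD duplicates a minimum-bandwidth
threshold of the LMM. The function name, the data layout, and the
policy itself are assumptions chosen for illustration only.

```python
def concluded_state(members, min_bandwidth_mbps):
    """Derive the concluded state from per-member-link session states.

    members maps a member link name to (bfd_state, bandwidth_mbps).
    """
    usable = sum(bw for state, bw in members.values() if state == "Up")
    return "Up" if usable >= min_bandwidth_mbps else "Down"
```

With three 10 Gbit/s member links of which one has a Down session,
the concluded state is Up for a 20000 Mbit/s threshold and Down for a
30000 Mbit/s threshold.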
3.4. Motivation for the technical design
The primary goal was to stay close to the existing standards RFC 5880
[RFC5880] and RFC 5881 [RFC5881], allowing the reuse of existing
implementations for IP-based point-to-point BFD. At the same time,
BFD for LAG and the already existing BFD over Big Pipe described in
Section 2 should be able to run in parallel. The destination IP
address can be used to distinguish the BFD packets over the member
links from the BFD packets over the Big Pipe, should an
implementation decide to support both.
To overcome the problem that a member link may not support ARP when
it is not an active LAG member, a multicast MAC address was chosen to
allow the destination interface port to accept the BFD packet.
Combining these two requirements results in using a well-defined IP
multicast address.
4. BFD for LAG and layer-3 applications
The information about the member links belonging to a LAG interface
comes from the LAG Management Module (LMM). BFD helps the LMM to
detect failures and converge quickly. Layer-3 protocols may use BFD
for LAG in one of the following ways:
o For sessions requested by layer-3 clients like OSPF a virtual
session is created. This virtual session does not generate actual
BFD packets on the LAG interface. Instead its state, which is
reported to the layer-3 client, is identical to the concluded
state.
BFD on LAG MUST support this mode, and it is the default mode.
With virtual sessions, synchronization between the two BFD modules
on either end of the LAG is guaranteed only by an identical
configuration and logic on both ends.
o BFD for LAG is requested by the LMM only. For session requests
from layer-3 applications the BFD over Big Pipe (BBP) mechanism is
used. BBP would run as an independent detection mechanism over
the LAG, detecting when the LMM is bringing the LAG down.
BFD on LAG SHOULD support this mode.
This may result in two separate detection steps before the layer-3
clients are informed: first a member link session detects the
failure, then BBP detects that the LAG has been taken down. To
improve the detection time, BFD MAY use the concluded state and
declare BBP sessions on the same LAG as Down when the concluded
state is Down.
A beneficial side effect of combining BFD on LAG member links with
BBP is the synchronization of both BFD modules on either end of
the LAG by the BBP BFD sessions.
An implementation MUST provide a configuration knob which lets the
user select the mode if both modes are supported.
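The two modes differ only in where the state reported to a layer-3
client comes from, which the following sketch makes explicit. The
enum, the function, and the use_concluded_for_bbp flag (standing in
for the optional fast-down behaviour) are hypothetical names.

```python
from enum import Enum


class L3Mode(Enum):
    VIRTUAL = "virtual"  # default mode: no BFD packets for the L3 session
    BBP = "bbp"          # a real BFD over Big Pipe session runs on the LAG


def state_for_l3_client(mode, concluded, bbp_state,
                        use_concluded_for_bbp=False):
    """Return the BFD state reported to a layer-3 client such as OSPF."""
    if mode is L3Mode.VIRTUAL:
        # The virtual session simply mirrors the concluded state.
        return concluded
    if use_concluded_for_bbp and concluded == "Down":
        # Optional shortcut: do not wait for the BBP session to time out.
        return "Down"
    return bbp_state
```

In BBP mode without the shortcut, a client keeps seeing Up until the
BBP session itself detects that the LAG has been taken down.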
5. Security Consideration
This document does not introduce any additional security issues, and
the security mechanisms defined in [RFC5880] apply to this document.
6. IANA Considerations
IANA is requested to assign a well-known multicast IP address for
each address family: 224.XXX.XXX.XXX for IPv4 and FFXX:: for IPv6.
7. Acknowledgements
Most of the text for this document came originally from
draft-chen-bfd-interface-00.
We would also like to thank the members of the BFD WG who expressed
strong support for such a mechanism.
8. References
8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection
(BFD)", RFC 5880, June 2010.
[RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection
(BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881,
June 2010.
8.2. Informative References
[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.
[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
Protocol 4 (BGP-4)", RFC 4271, January 2006.
Authors' Addresses
Manav Bhatia
Alcatel-Lucent
Bangalore, 560045
India
Email: manav.bhatia@alcatel-lucent.com
Mach(Guoyi) Chen
Huawei Technologies Co., Ltd
Q14 Huawei Campus, No. 156 Beiqing Road, Hai-dian District
Beijing 100095
China
Email: mach@huawei.com
Zuliang Wang
Huawei Technologies Co., Ltd
Q15 Huawei Campus, No. 156 Beiqing Road, Hai-dian District
Beijing 100095
China
Email: liang_tsing@huawei.com
Liang Guo
China Telecom
Guangzhou
China
Email: guoliang@gsta.com
Marc Binderberger
Lausanne,
Switzerland
Email: marc@sniff.de