Network Working Group                                          M. Bhatia
Internet-Draft                                            Alcatel-Lucent
Intended status: Standards Track                                 M. Chen
Expires: June 15, 2012                                           Z. Wang
                                            Huawei Technologies Co., Ltd
                                                                  L. Guo
                                                           China Telecom
                                                         M. Binderberger
                                                       December 13, 2011


Bidirectional Forwarding Detection (BFD) on Link Aggregation Group (LAG)
                               Interfaces
                        draft-mmm-bfd-on-lags-00

Abstract

   This document proposes a mechanism to run BFD on Link Aggregation
   Group (LAG) interfaces.  It does so by running an independent BFD
   session on every LAG member link.

   A dedicated well-known multicast IP address for both IPv4 and IPv6 is
   introduced as the destination IP address of the BFD packets when
   running BFD on the member links of the LAG.

   There is currently no standard that describes how BFD should run on
   LAG interfaces.  As a result multiple non-interoperable BFD
   implementations for LAG interfaces exist.  This draft provides a
   short overview as a context for the new proposed mechanism.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any



Bhatia, et al.            Expires June 15, 2012                 [Page 1]


Internet-Draft           BFD for LAG Interfaces            December 2011


   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 15, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.
































Bhatia, et al.            Expires June 15, 2012                 [Page 2]


Internet-Draft           BFD for LAG Interfaces            December 2011


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
   2.  BFD over LAG with a single session . . . . . . . . . . . . . .  4
     2.1.  Existing implementations . . . . . . . . . . . . . . . . .  4
     2.2.  BFD over Big Pipe  . . . . . . . . . . . . . . . . . . . .  5
   3.  BFD over member links of the LAG . . . . . . . . . . . . . . .  6
     3.1.  BFD protocol details . . . . . . . . . . . . . . . . . . .  6
     3.2.  BFD influence on the LAG Management Module . . . . . . . .  6
     3.3.  Concluded BFD state  . . . . . . . . . . . . . . . . . . .  7
     3.4.  Motivation for the technical design  . . . . . . . . . . .  7
   4.  BFD for LAG and layer-3 applications . . . . . . . . . . . . .  8
   5.  Security Consideration . . . . . . . . . . . . . . . . . . . .  9
   6.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . .  9
   7.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . .  9
   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . .  9
     8.1.  Normative References . . . . . . . . . . . . . . . . . . .  9
     8.2.  Informative References . . . . . . . . . . . . . . . . . .  9
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10
































Bhatia, et al.            Expires June 15, 2012                 [Page 3]


Internet-Draft           BFD for LAG Interfaces            December 2011


1.  Introduction

   The Bidirectional Forwarding Detection (BFD) protocol [RFC5880]
   provides a mechanism to detect faults in the bidirectional path
   between two forwarding engines, including interfaces, data link(s),
   and to the extent possible the forwarding engines themselves, with
   potentially very low latency.

   BFD can be used for detecting failures of the path between two
   network devices.  Typically the application clients are not aware of
   any inner structure of the underlying interface, being layer 3
   applications themselves like Open Shortest Path First (OSPF)
   [RFC2328] or Border Gateway Protocol (BGP)[RFC4271].  While this
   works for interfaces like Ethernet and Packet Over SONET (POS), it
   causes problems for bundled interfaces like LAG.

   A LAG is used to bind together several physical ports between two
   adjacent nodes so they appear to higher-layer protocols as a single,
   higher bandwidth "virtual" pipe.  A LAG interface thereby allows
   aggregation of multiple network interfaces as one virtual interface
   for the purpose of providing fault-tolerance and higher bandwidth.

   The problem for BFD is that a single BFD session is subject to the
   load-balance algorithm used for the LAG, i.e.  BFD has no control
   which physical links are used and in which sequence.  This makes it
   impossible for BFD to guarantee a detection of anything but a full
   LAG shutdown.  This LAG shutdown would be initiated by the LAG
   Management Module (LMM) and is typically multiple times slower than
   BFD detection times (multiple 100msec of LMM vs. multiple 10msec of
   BFD).  The solution proposed in this document is to run a BFD session
   on every physical member link the LAG is built upon.  This requires
   the LMM to request BFD sessions for every member link, using BFD as a
   fast detection mechanism.  BFD can combine this information from LMM
   with the layer-3 centric session requests from OSPF and alike to
   provide fast detection for layer-3 applications.


2.  BFD over LAG with a single session

2.1.  Existing implementations

   As mentioned, no standard exists on how to run BFD over LAG
   interfaces.  As a result, a simple approach has been chosen by
   several implementations, that allows it to interoperate and solves
   the problem of establishing a BFD session over a LAG.  This is
   typically done by treating LAG as a big, virtual pipe and ignoring
   the underlying structure (i.e., the component member links).  This is
   not desirable as it does not allow deterministic and fast detection



Bhatia, et al.            Expires June 15, 2012                 [Page 4]


Internet-Draft           BFD for LAG Interfaces            December 2011


   of individual member link failures.  We call this conventional
   approach of running BFD as "BFD over Big Pipe" or "BBP" in short.

   There are various ways of running BFD over LAG interfaces.  Some
   implementations send BFD packets only over the primary or the active
   member link .  Others spray BFD packets over all member links of the
   LAG.  There are issues with each of these approaches.

   In the first approach, BFD will remain up as long as the primary port
   is alive.  It will go down once the primary port goes down till
   another port is selected as the primary.  Another problem with this
   design is with BFD being oblivious to the presence of other member
   links in the LAG.  If a non-primary member link goes down, then BFD
   remains unaffected as it can still send BFD packets over the primary
   link.  BFD will thus remain up and all traffic sent over the failed
   member link will get dropped, till an upper layer protocol like Link
   Aggregation Control protocol (LACP) detects the failed link and
   removes it from the LAG.

   In the second approach, BFD packets are sprayed over all the member
   links of a LAG.  This is done naively via round-robin, where each BFD
   packet is sent using the subsequent member link, in a round-robin
   fashion.  It solves the problem of BFD going down because of the
   primary port going down, but it still does not solve the problem of
   traffic getting lost when one of the member link goes down.  This is
   because when a member link goes down, BFD still remains up and
   traffic continues to go over the link that has failed till a higher
   layer protocol detects this and removes the offending link from the
   LAG.

2.2.  BFD over Big Pipe

   This document proposes using one of the mechanisms described in the
   previous section in combination with the new mechanism of a separate
   BFD session per LAG member link, which will be defined in the next
   section.

   For this reason we need to standardize the simple approach.  The main
   task is to define what it means to treat LAG as a single "big pipe".
   It means:

   o  BFD must work no matter what member link, packets are sent and
      received on.

   o  the Rx/Tx link can change any time and/or regularly with every
      change pattern without causing BFD to fail

   This allows to use the LAG like any other interface and RFC 5880 and



Bhatia, et al.            Expires June 15, 2012                 [Page 5]


Internet-Draft           BFD for LAG Interfaces            December 2011


   RFC 5881 can be used without modification.

   As described in the last section it is advantageous to spray the
   packets in a round-robin fashion across the LAG member links, as
   opposed to sending those over only the primary or the active port.
   It is thus RECOMMENDED that implementations do that.  However, there
   are still some issues left with spraying the BFD packets, that will
   get addressed in the scheme described in the next section.


3.  BFD over member links of the LAG

3.1.  BFD protocol details

   The proposal is to run a BFD session on every member link of the LAG.
   The BFD packets are IP/UDP based packets as defined in RFC 5880
   [RFC5880] and RFC 5881 [RFC5881].  Currently, only asynchronous mode
   is considered in this document.  The echo function is outside the
   document's scope.  At least one system MUST take the Active role
   (possibly both).  The BFD sessions on the member links are
   independent sessions.  They use their own, unique local discriminator
   and their own set of state variables.  Timer values MAY be different,
   even between the sessions belonging to the same LAG.

   The destination IP address is a dedicated well-known multicast IP
   address (224.XXX.XXX.XXX for IPv4, FFXX:: for IPv6, to be assigned by
   IANA).  On Ethernet-based LAG member links the corresponding
   destination multicast MACs will be 01:00:5e:XX:XX:XX for IPv4 and 33:
   33:XX:XX:XX:XX for IPv6.  Each member link will use its own MAC
   address as the source MAC address.

   The demultiplexing of a received packet is solely based on the Your
   Discriminator field, if this field is nonzero.  A zero value may
   happen for the initial Down packet of a session.  In this case
   demultiplexing a BFD for LAG packet MUST be based on some combination
   of other fields which MUST include either the destination IP or the
   destination MAC address.

   The Address Family used is fixed per LAG, i.e. the BFD sessions on
   the member links of a particular LAG are either all using IPv4 or all
   using IPv6.  An implementation MUST provide a configuration knob to
   select the address family and MAY extend this to some sort of auto-
   discovery.  The default address family is IPv4.

3.2.  BFD influence on the LAG Management Module

   The LAG Management Module (LMM) is a client of BFD, requesting BFD
   sessions for all the LAG member links.  For link failure detection



Bhatia, et al.            Expires June 15, 2012                 [Page 6]


Internet-Draft           BFD for LAG Interfaces            December 2011


   the LMM can use BFD instead of or in parallel with LACP.

   A member link of the LAG is not used anymore for data forwarding when
   the particular BFD session running over that link goes down.  The
   member link MUST be removed from the LAG.  The BFD session for the
   link remains, i.e. it is not deleted.

   To add a member link to the LAG, LMM MAY wait for the BFD session on
   the link to come Up.  There may be a deadlock situation since the
   link interface not being active (e.g., layer 3 protocol down) may
   prevent BFD packets, including other control protocols packets (e.g.
   ARP) that are tightly coupled with the status of the interface, to be
   transmitted between the pair of interfaces, thus failing to bring up
   the interfaces.

   To avoid the deadlock, BFD packets SHOULD NOT be blocked by the layer
   N protocol status of the interface when the application depends on
   the BFD status to enable layer N of the interface.  If this cannot be
   achieved then the BFD status MUST be ignored by the application when
   bringing up an interface.  The BFD status can then be used afterwards
   to bring the interface down.

   The behaviour of the LMM MUST be configurable if waiting for BFD
   status of Up to add a member link is supported, to allow an
   alternative mode of adding the member link irrespective of the BFD
   state for interoperability purpose.

3.3.  Concluded BFD state

   An additional state variable is introduced for BFD on LAG: the
   concluded state.  The state values are Down and Up.

   The details of how BFD derives the concluded state is outside the
   scope of the document.  The idea is that the LMM may declare a LAG as
   down when a certain threshold has been hit, e.g. a minimum required
   bandwidth for the LAG.  BFD could for example duplicate the LMM logic
   or it could use an API to LMM to learn about the decision of the LAG
   management module.  What is relevant for BFD on LAG is that the
   concluded state is the overall state of the LAG.

   The concluded state is important for layer-3 clients requesting BFD
   sessions over the LAG or over Vlans on the LAG.  Details will be
   discussed in section 4.

3.4.  Motivation for the technical design

   The primary goal was to stay close to the existing standards RFC 5880
   [RFC5880] and RFC 5881 [RFC5881], allowing the reuse of existing



Bhatia, et al.            Expires June 15, 2012                 [Page 7]


Internet-Draft           BFD for LAG Interfaces            December 2011


   implementations for IP-based point-to-point BFD.  At the same time
   BFD for LAG and the already existing BFD over Big Pipe drafted in
   Section 2 should be able to run in parallel.  The destination IP
   address can be used to disambiguate the BFD packets over the member
   links and the BFD packets over the Big Pipe, should an implementation
   decide to support both.

   To overcome the problem that a member link may not support ARP when
   not being an active LAG member a multicast MAC was chosen to allow
   the destination interface port to accept the BFD packet.  Combining
   these two requirements results in using a well-defined IP Multicast
   address.


4.  BFD for LAG and layer-3 applications

   The information about the member links belonging to a LAG interface
   comes from the LAG management module (LMM).  BFD helps the LMM to
   detect and converge fast.  Layer-3 protocols may use BFD for LAG in
   one of the following ways:

   o  For sessions requested by layer-3 clients like OSPF a virtual
      session is created.  This virtual session is not creating actual
      BFD packets on the LAG interface.  Instead it's state, which is
      reported to the layer-3 client, is identical with the concluded
      state.

      BFD on LAG MUST support this mode, and it is the default mode.

      With virtual sessions synchronization between the two BFD modules
      on either end of the LAG is guaranteed only by an identical
      configuration and logic on both ends.

   o  BFD for LAG is requested by the LMM only.  For sessions requests
      from layer-3 applications the BFD over Big Pipe (BBP) mechanism is
      used.  BBP would run as an independent detection mechanism over
      the LAG, detecting when the LMM is bringing the LAG down.

      BFD on LAG SHOULD support this mode.

      This may result in two separate detection steps before the layer-3
      clients are informed: first the detection of a member links
      session, then BBP detecting the LAG has been taken down.  To
      improve the detection time BFD MAY use the concluded state and
      declare BBP sessions on the same LAG as Down when the concluded
      state is down.
      A beneficial side effect of combining BFD on LAG member links with
      BBP is the synchronization of both BFD modules on either end of



Bhatia, et al.            Expires June 15, 2012                 [Page 8]


Internet-Draft           BFD for LAG Interfaces            December 2011


      the LAG by the BBP BFD sessions.

   An implementation MUST provide a configuration knob which lets the
   user select the mode if both modes are supported.


5.  Security Consideration

   This document does not introduce any additional security issues and
   the security mechanisms defined in [RFC5880] apply in this document.


6.  IANA Considerations

   The IANA is requested to assign a well-known multicast IP address:
   "224.XXX.XXX.XXX" for IPv4 and FFXX:: for IPv6.


7.  Acknowledgements

   Most of the text for this document came originally from
   draft-chen-bfd-interface-00.

   We would also like to thank the members of the BFD WG who expressed
   strong support about needing such a mechanism.


8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, June 2010.

   [RFC5881]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881,
              June 2010.

8.2.  Informative References

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.




Bhatia, et al.            Expires June 15, 2012                 [Page 9]


Internet-Draft           BFD for LAG Interfaces            December 2011


Authors' Addresses

   Manav Bhatia
   Alcatel-Lucent
   Bangalore,   560045
   India

   Email: manav.bhatia@alcatel-lucent.com


   Mach(Guoyi) Chen
   Huawei Technologies Co., Ltd
   Q14 Huawei Campus, No. 156 Beiqing Road, Hai-dian District
   Beijing  100095
   China

   Email: mach@huawei.com


   Zuliang Wang
   Huawei Technologies Co., Ltd
   Q15 Huawei Campus, No. 156 Beiqing Road, Hai-dian District
   Beijing  100095
   China

   Email: liang_tsing@huawei.com


   Liang Guo
   China Telecom
   Guangzhou
   China

   Email: guoliang@gsta.com


   Marc Binderberger
   Lausanne,
   Switzerland

   Email: marc@sniff.de










Bhatia, et al.            Expires June 15, 2012                [Page 10]