INTERNET-DRAFT                                               Ali Sajassi
Intended Status: Standards Track                             Samer Salam
                                                            Sami Boutros
                                                             Keyur Patel
                                                                   Cisco
Expires: January 17, 2013                                  July 16, 2012


                      E-VPN Ethernet Segment Route
             draft-sajassi-l2vpn-evpn-segment-route-01.txt


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Copyright and License Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.



Sajassi et al.          Expires January 17, 2013                [Page 1]


INTERNET DRAFT   draft-sajassi-l2vpn-evpn-segment-route    July 16, 2012


Abstract

   [E-VPN] defines a solution and architecture for BGP MPLS-based
   Ethernet VPNs. This document specifies procedures and two additional
   BGP route attributes that enhance the multi-homing capabilities of
   the solution: the DF Election attribute and the Inter-chassis
   Communication attribute. It describes their usage, advantages, and
   encoding.


Table of Contents

   1  Introduction
     1.1  Terminology
   2  Motivation and Usage
     2.1  Support of Multi-Chassis Ethernet Bundles
     2.2  Avoiding Relearning of Subscriber/Session State
     2.3  Preventing Transient Loops and Packet Duplication
   3  BGP Encoding
     3.1  DF Election Attribute
     3.2  Inter-chassis Communication Attribute
   4  DF Election with Paxos Algorithm
   5  LACP State Synchronization
   6  Subscriber/Session State Synchronization
   7  Security Considerations
   8  IANA Considerations
   9  References
     9.1  Normative References
     9.2  Informative References
   Authors' Addresses

1  Introduction

   [E-VPN] defines a solution and architecture for BGP MPLS-based
   Ethernet L2VPN services with advanced multi-homing capabilities. In
   this draft we define procedures and extensions that enhance the
   multi-homing capabilities of the E-VPN solution in the following
   areas:

       - Preventing transient loops and packet duplication
       - Support of multi-chassis Ethernet bundles
       - Avoiding relearning of subscriber/session state

   Two new BGP route attributes are defined: the DF Election attribute
   and the Inter-chassis Communication attribute.

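   Section 2 discusses the motivation and usage of the new attributes,
   and Section 3 describes their BGP encoding. Section 4 specifies the
   DF election handshake, and Sections 5 and 6 describe LACP state and
   subscriber/session state synchronization, respectively.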

1.1  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2  Motivation and Usage

   This section explains the reasons for defining the two new BGP
   attributes and describes their usage in E-VPN.

2.1  Support of Multi-Chassis Ethernet Bundles

   When a CE is multi-homed to a set of PE nodes using the [802.1AX]
   Link Aggregation Control Protocol (LACP), the PEs must act as a
   single LACP speaker so that the Ethernet links form a bundle and
   operate correctly as a Link Aggregation Group (LAG). To achieve this,
   the PEs connected to the same multi-homed CE must synchronize LACP
   configuration and operational data among themselves. This
   synchronization is required for the following reasons:

   - to determine whether the links in the Ethernet bundle are to
   operate in all-active or hot-standby resiliency mode

   - to detect and handle CE misconfiguration when the LACP Port Key is
   configured on the PE

   - to detect and handle mis-wiring between CE and PE when the LACP
   Port Key is configured on the PE

   - to deterministically agree on which link(s) should join a bundle
   based on port and system priorities, especially when the number of
   links exceeds the aggregation capacity of the PEs, and the PE LACP
   System Priority is higher than the CE's

   - to detect and react to actor/partner churn where the LACP speakers
   are not able to converge

   Synchronization of LACP state between PEs is performed using the
   Inter-chassis Communication attribute carried in the Ethernet Segment
   route, as described in Section 5.
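   As an illustration of the fourth reason listed above, the sketch
   below shows how PEs that share each other's port information can
   compute the same set of selected links independently. It assumes the
   usual [802.1AX] convention that numerically lower values take
   precedence: the system with the lower (System Priority, System
   Identifier) pair controls selection, its port priorities rank the
   links, and ties are broken by the lower port number. The Link type,
   select_links(), and the max_links capacity parameter are hypothetical
   names, not part of this specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Link:
    """One CE-PE link; all names here are hypothetical."""
    pe: str            # PE terminating the link
    actor_prio: int    # PE-side LACP Port Priority
    actor_num: int     # PE-side LACP Port Number (unique across PEs)
    partner_prio: int  # CE-side LACP Port Priority
    partner_num: int   # CE-side LACP Port Number


def mac_to_int(mac: str) -> int:
    """Turn a colon-separated MAC address into an integer."""
    return int(mac.replace(":", ""), 16)


def select_links(pe_system: Tuple[int, str], ce_system: Tuple[int, str],
                 links: List[Link], max_links: int) -> List[Link]:
    """Return the links allowed to join the bundle.

    Each system is given as (System Priority, System Identifier). The
    numerically lower pair controls selection, so its port priorities
    (then port numbers) rank the links; the best max_links are kept.
    """
    pe_key = (pe_system[0], mac_to_int(pe_system[1]))
    ce_key = (ce_system[0], mac_to_int(ce_system[1]))
    pe_controls = pe_key < ce_key

    def rank(link: Link) -> Tuple[int, int]:
        if pe_controls:
            return (link.actor_prio, link.actor_num)
        return (link.partner_prio, link.partner_num)

    return sorted(links, key=rank)[:max_links]


if __name__ == "__main__":
    links = [
        Link("PE1", 32768, 1, 1, 10),
        Link("PE2", 100, 2, 1, 11),
        Link("PE1", 100, 3, 1, 12),
    ]
    # The PEs' System Priority (1) beats the CE's (32768), so the PE
    # port priorities decide: ports 2 and 3 are selected.
    chosen = select_links((1, "00:00:5e:00:53:01"),
                          (32768, "00:00:5e:00:53:aa"), links, 2)
    print([(link.pe, link.actor_num) for link in chosen])
```

   Because every input to select_links() is part of the synchronized
   state, each PE reaches the same answer without further negotiation.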

2.2  Avoiding Relearning of Subscriber/Session State

   For certain applications, the PE builds and maintains per-subscriber
   or per-session 'soft' state that is used either to optimize traffic
   forwarding or to enforce security. Examples of such per-subscriber or
   per-session state include:

   - multicast state derived from IGMP or PIM snooping

   - IP address to MAC address bindings gleaned from snooping ARP and/or
   DHCP packets, and used to prevent address spoofing or masquerading

   When a set of PE nodes provides multi-homed connectivity for an
   Ethernet Segment, this 'soft' state is built on the active PE node
   that forwards and snoops the relevant protocol packets. Upon a link
   or node failure, the state must be reconstructed on the backup PE
   (e.g., by waiting for the next IGMP query or ARP message, or by
   issuing unsolicited queries). This may cause traffic disruption and
   affect the availability of the service. Alternatively, the state can
   be synchronized among the PE nodes via BGP ahead of any failure, so
   that the backup PE can take over without relearning it, which
   improves the convergence of the service.

   Synchronization of subscriber/session state between PE nodes is
   performed using the Inter-chassis Communication attribute carried in
   the Ethernet Segment route, as described in Section 6.

2.3  Preventing Transient Loops and Packet Duplication

   During routing transients, different PEs may end up electing
   different DFs for the same Ethernet Segment due to inconsistent views
   of the network. If the Ethernet Segment is a multi-homed device, this
   may lead to transient packet duplication. If the Ethernet Segment is
   a multi-homed network, the presence of multiple DFs may lead to
   transient forwarding loops in addition to potential packet
   duplication.

   To eliminate these issues, an optional handshake mechanism is defined
   to ensure that the PE nodes connected to the same Ethernet Segment
   share a common view of the access network topology. This handshake is
   performed using the DF Election attribute carried in the Ethernet
   Segment route, as described in Section 4, 'DF Election with Paxos
   Algorithm'.
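   To make the failure mode concrete, the sketch below assumes, purely
   for illustration, that each PE independently picks the PE with the
   numerically lowest IP address in its own view of the Ethernet
   Segment as DF. If one PE has not yet received another PE's Ethernet
   Segment route, the two PEs elect different DFs. The function names
   are hypothetical, and all addresses are assumed to be of the same IP
   version.

```python
import ipaddress
from typing import Dict, Iterable, Set


def elect_df(view: Iterable[str]) -> str:
    """Pick the PE with the numerically lowest IP address in a view.

    Comparing ip_address objects gives numeric order; plain string
    comparison would wrongly rank "10.0.0.10" below "10.0.0.9".
    """
    return str(min(ipaddress.ip_address(a) for a in view))


def elected_dfs(views: Dict[str, Iterable[str]]) -> Set[str]:
    """Collect the DF chosen by each PE from its own local view."""
    return {elect_df(view) for view in views.values()}


if __name__ == "__main__":
    views = {
        "192.0.2.1": ["192.0.2.1", "192.0.2.2"],  # sees both PEs
        "192.0.2.2": ["192.0.2.2"],               # missed 192.0.2.1
    }
    dfs = elected_dfs(views)
    # More than one DF: both PEs forward multi-destination traffic.
    print(sorted(dfs), "inconsistent" if len(dfs) > 1 else "consistent")
```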

3  BGP Encoding

   This section defines the encoding of the BGP attributes.

3.1  DF Election Attribute

                +---------------------------------------+
                |      State  (2 octets)                |
                +---------------------------------------+
                |      Sequence No.   (4 octets)        |
                +---------------------------------------+
                | Local No. of links  (2 octets)        |
                +---------------------------------------+
                | Total No. of links  (2 octets)        |
                +---------------------------------------+
                |              Flags (1 octet)          |
                +---------------------------------------+
                |      No. of IP addresses (1 octet)    |
                +---------------------------------------+
                |  Ordered list of tuples:              |
                |  [IP address Length (1 octet),        |
                |  IP Address (4 or 16 octets)]         |
                +---------------------------------------+

   The State field takes one of the following values:

                0x0000 Initializing
                0x0001 Proposal Pending
                0x0002 Promise Pending
                0x0003 Active

   The Flags field is encoded as follows:

                Most significant 7 bits: Reserved
                Least significant bit: Protecting flag
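   The following Python sketch encodes and decodes the value of the DF
   Election attribute as laid out above. It assumes network (big-endian)
   byte order and that the per-address Length field counts octets (4 or
   16); this document specifies neither, nor the BGP path attribute
   header that would wrap this value. The class and constant names are
   illustrative only.

```python
import ipaddress
import struct
from dataclasses import dataclass, field
from typing import List

# State codes and Flags bit from Section 3.1.
INITIALIZING, PROPOSAL_PENDING, PROMISE_PENDING, ACTIVE = 0, 1, 2, 3
PROTECTING_FLAG = 0x01  # least significant bit of the Flags octet

# State (2), Sequence No. (4), Local No. of links (2),
# Total No. of links (2), Flags (1), No. of IP addresses (1)
_FIXED = struct.Struct("!HIHHBB")


@dataclass
class DFElectionAttr:
    state: int
    seq: int
    local_links: int
    total_links: int
    protecting: bool = False
    addresses: List[str] = field(default_factory=list)  # ordered list

    def encode(self) -> bytes:
        flags = PROTECTING_FLAG if self.protecting else 0
        out = bytearray(_FIXED.pack(self.state, self.seq,
                                    self.local_links, self.total_links,
                                    flags, len(self.addresses)))
        for addr in self.addresses:
            packed = ipaddress.ip_address(addr).packed  # 4 or 16 octets
            out.append(len(packed))  # ASSUMPTION: length in octets
            out += packed
        return bytes(out)

    @classmethod
    def decode(cls, data: bytes) -> "DFElectionAttr":
        state, seq, local, total, flags, count = _FIXED.unpack_from(data)
        offset = _FIXED.size
        addresses = []
        for _ in range(count):
            if offset >= len(data):
                raise ValueError("truncated address list")
            length = data[offset]
            raw = data[offset + 1:offset + 1 + length]
            if length not in (4, 16) or len(raw) != length:
                raise ValueError(f"bad IP address entry at {offset}")
            addr = (ipaddress.IPv4Address(raw) if length == 4
                    else ipaddress.IPv6Address(raw))
            addresses.append(str(addr))
            offset += 1 + length
        return cls(state, seq, local, total,
                   bool(flags & PROTECTING_FLAG), addresses)


if __name__ == "__main__":
    attr = DFElectionAttr(PROPOSAL_PENDING, 1, 2, 4,
                          addresses=["192.0.2.1", "192.0.2.2"])
    wire = attr.encode()
    print(wire.hex())
    print(DFElectionAttr.decode(wire) == attr)
```

   The fixed part is 12 octets; each listed address adds one Length
   octet plus 4 (IPv4) or 16 (IPv6) octets.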

3.2  Inter-chassis Communication Attribute

                +---------------------------------------+
                |       Type  (2 octets)                |
                +---------------------------------------+
                |       Length   (1 or 2 octets)        |
                +---------------------------------------+
                |       Opaque  (var)                   |
                +---------------------------------------+
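   The Inter-chassis Communication attribute is a Type/Length/Opaque
   triple. Because the Length field may be 1 or 2 octets and this
   revision does not say how a receiver tells the two apart, the sketch
   below simply assumes a 2-octet Length in network byte order, and
   further assumes that several triples may be concatenated. The
   function names and the Type values in the example are hypothetical.

```python
import struct
from typing import Iterator, Tuple

# ASSUMPTION: 2-octet Type and 2-octet Length, network byte order.
_TYPE_LEN = struct.Struct("!HH")


def encode_tlv(tlv_type: int, opaque: bytes) -> bytes:
    """Build one Type/Length/Opaque triple."""
    if len(opaque) > 0xFFFF:
        raise ValueError("Opaque too long for a 2-octet Length")
    return _TYPE_LEN.pack(tlv_type, len(opaque)) + opaque


def decode_tlvs(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (Type, Opaque) pairs from concatenated triples."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < _TYPE_LEN.size:
            raise ValueError("truncated Type/Length header")
        tlv_type, length = _TYPE_LEN.unpack_from(data, offset)
        offset += _TYPE_LEN.size
        opaque = data[offset:offset + length]
        if len(opaque) != length:
            raise ValueError("truncated Opaque field")
        yield tlv_type, opaque
        offset += length


if __name__ == "__main__":
    # Type values 1 and 2 are hypothetical; none are assigned.
    blob = encode_tlv(1, b"lacp-state") + encode_tlv(2, b"")
    print(list(decode_tlvs(blob)))
```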

4  DF Election with Paxos Algorithm

   The procedures in this section guarantee that all PE nodes in a given
   redundancy group agree on a unique DF for a given Ethernet Segment.
   This eliminates the transient forwarding loops and packet duplication
   described in Section 2.3. The procedures can be broken down into the
   following steps:

   1. When a PE discovers the ESI of the attached Ethernet Segment, it
   advertises an Ethernet Segment route with the associated ES-Import
   extended community attribute and with the 'Initializing' code in the
   State field of the DF Election attribute.

   2. The PE then starts a timer to allow the reception of Ethernet
   Segment routes from other PE nodes in the same redundancy group.

   3. When the timer expires, each PE builds a list of the IP addresses
   of all the PE nodes connected to the Ethernet Segment (including
   itself), sorted in increasing numeric order.

   4. The PE at the top of the ordered list (i.e., the one with the
   numerically lowest IP address) elects itself as the Arbiter Node
   (AN). It initiates the handshake by sending an Ethernet Segment route
   with the 'Proposal Pending' code in the State field of the DF
   Election attribute.

   5. When a PE node receives an Ethernet Segment route with the
   'Proposal Pending' code, it takes one of the following options:

      a. If the receiving PE ranks the transmitting PE's IP address as
      the top entry in its local ordered list, it acknowledges the
      handshake by responding with an Ethernet Segment route with the
      'Promise Pending' code in the State field of the DF Election
      attribute. This includes the scenario where the receiving PE
      forfeits the AN role to another advertising PE with a numerically
      lower IP address.

      b. If the receiving PE does not rank the transmitting PE's IP
      address as the top entry in its local ordered list, and the
      receiving PE had advertised an Ethernet Segment route with the
      'Initializing' code or with the 'Proposal Pending' code, then the
      PE takes no further action.


   6. When the AN receives 'Promise Pending' from all of the PE nodes in
   the ordered list, it sends an updated Ethernet Segment route with the
   'Active' code in the DF Election attribute.

   7. When the other PE nodes in the redundancy group receive the
   'Active' code from the AN, they respond with an updated Ethernet
   Segment route with the 'Active' code in the DF Election attribute.
   This concludes the handshake.
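   The steps above can be exercised with a small, single-process
   simulation. It is a sketch under several assumptions that are not
   part of this specification: Steps 1 and 2 are collapsed into each
   PE's initial view (the set of PE addresses it learned before its
   timer expired); route propagation is modeled as an in-order message
   queue delivered to every other PE; and a PE that receives a route
   from a PE missing from its view adds that PE to its ordered list. The
   PE class and run_handshake() are hypothetical names.

```python
import ipaddress
from collections import deque
from typing import Deque, Dict, Iterable, Tuple

INITIALIZING = "Initializing"
PROPOSAL_PENDING = "Proposal Pending"
PROMISE_PENDING = "Promise Pending"
ACTIVE = "Active"

Message = Tuple[str, str]  # (sender IP, advertised State)


class PE:
    """One PE in the redundancy group (hypothetical model)."""

    def __init__(self, ip: str, view: Iterable[str], bus: Deque[Message]):
        self.ip = ip
        self.view = set(view) | {ip}  # PEs known when the timer expired
        self.state = INITIALIZING     # Step 1
        self.promises = set()         # used only while acting as AN
        self.bus = bus                # stand-in for BGP propagation

    def top(self) -> str:
        # Step 3: the ordered list is sorted by numeric IP value.
        return min(self.view, key=ipaddress.ip_address)

    def advertise(self, state: str) -> None:
        self.state = state
        self.bus.append((self.ip, state))

    def timer_expired(self) -> None:
        if self.top() == self.ip:            # Step 4: become the AN
            self.advertise(PROPOSAL_PENDING)

    def receive(self, sender: str, state: str) -> None:
        self.view.add(sender)  # ASSUMPTION: late routes join the list
        is_an = self.top() == self.ip
        if state == PROPOSAL_PENDING and sender == self.top():
            self.advertise(PROMISE_PENDING)  # Step 5a (may forfeit AN)
        elif (state == PROMISE_PENDING and is_an
              and self.state == PROPOSAL_PENDING):
            self.promises.add(sender)
            if self.view - {self.ip} <= self.promises:
                self.advertise(ACTIVE)       # Step 6
        elif (state == ACTIVE and sender == self.top()
              and self.state != ACTIVE):
            self.advertise(ACTIVE)           # Step 7
        # Any other case, including Step 5b: no further action.


def run_handshake(views: Dict[str, Iterable[str]]) -> Dict[str, PE]:
    bus: Deque[Message] = deque()
    pes = {ip: PE(ip, view, bus) for ip, view in views.items()}
    for pe in pes.values():
        pe.timer_expired()
    while bus:
        sender, state = bus.popleft()
        for ip, pe in pes.items():
            if ip != sender:
                pe.receive(sender, state)
    return pes


if __name__ == "__main__":
    everyone = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
    pes = run_handshake({
        "192.0.2.1": everyone,
        "192.0.2.2": ["192.0.2.2", "192.0.2.3"],  # missed 192.0.2.1
        "192.0.2.3": everyone,
    })
    for ip, pe in pes.items():
        print(ip, pe.state, "AN =", pe.top())
```

   In the example, PE 192.0.2.2 initially misses 192.0.2.1 and proposes
   itself. When it receives the proposal from 192.0.2.1, Step 5a makes
   it forfeit the AN role, and all three PEs end in the 'Active' state
   with 192.0.2.1 as the AN.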

   When DF election is performed at the granularity of an Ethernet
   Segment, i.e., there is a single DF for all VLANs on the segment, the
   Arbiter Node is effectively the Designated Forwarder for the segment.
   In Step 1, every PE node starts with its ports towards the segment
   blocked for multi-destination traffic from the core. In Step 6, the
   PE confirmed as the AN (i.e., the DF) unblocks its port towards the
   Ethernet Segment. DF election at the granularity of (Ethernet
   Segment, VLAN), i.e., VLAN carving, is not specified in this revision
   of the document.
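   For the per-segment case just described, the forwarding decision
   towards the Ethernet Segment reduces to a simple predicate on the
   handshake state. The sketch below covers per-segment DF election
   only, uses hypothetical names, and encodes nothing beyond the
   blocking rule stated above.

```python
from enum import IntEnum


class State(IntEnum):
    """DF Election attribute State codes (Section 3.1)."""
    INITIALIZING = 0x0000
    PROPOSAL_PENDING = 0x0001
    PROMISE_PENDING = 0x0002
    ACTIVE = 0x0003


def port_unblocked(is_arbiter: bool, state: State) -> bool:
    """True if this PE forwards multi-destination traffic from the
    core onto the Ethernet Segment (per-segment DF election).

    Every PE starts blocked (Step 1); only the AN unblocks, once it has
    received all promises and advertised 'Active' (Step 6). Non-AN PEs
    stay blocked even after they move to 'Active' in Step 7.
    """
    return is_arbiter and state == State.ACTIVE


if __name__ == "__main__":
    for state in State:
        print(state.name, port_unblocked(True, state),
              port_unblocked(False, state))
```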


5  LACP State Synchronization

   To support CE multi-homing with multi-chassis Ethernet bundles, the
   PE nodes connected to a given CE should synchronize [802.1AX] LACP
   state among themselves. This state includes the following
   LACP-specific configuration parameters:

   - System Identifier (MAC Address): uniquely identifies a LACP
   speaker.
   - System Priority: determines which LACP speaker's port priorities
   are used in the Selection logic.
   - Aggregator Identifier: uniquely identifies a bundle within a LACP
   speaker.
   - Aggregator MAC Address: identifies the MAC address of the bundle.
   - Aggregator Key: used to determine which ports can join an
   Aggregator.
   - Port Number: uniquely identifies an interface within a LACP
   speaker.
   - Port Key: determines the set of ports that can be bundled.
   - Port Priority: determines a port's precedence level to join a
   bundle in case the number of eligible ports exceeds the maximum
   number of links allowed in a bundle.

   The above information must be synchronized between the PE nodes
   wishing to form a multi-chassis bundle with a given CE, so that
   together they appear to that CE as a single LACP peer. This is
   required for initial system bring-up and upon any configuration
   change. Furthermore, the PEs must also synchronize operational
   (run-time) data so that the LACP Selection logic state machines can
   execute. This operational data includes the following LACP
   operational parameters, on a per-port basis:

   - Partner System Identifier: the CE's System MAC address.
   - Partner System Priority: the CE's LACP System Priority.
   - Partner Port Number: the CE's port number for the attachment
   circuit (AC).
   - Partner Port Priority: the CE's Port Priority for the AC.
   - Partner Key: the CE's key for the AC.
   - Partner State: the CE's LACP state for the AC.
   - Actor State: the PE's LACP state for the AC.
   - Port State: the PE's AC port status.

   The above state needs to be communicated between PEs forming a
   multi-chassis bundle during LACP initial bring-up, upon any
   configuration change and upon the occurrence of a failure.
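   One use of the synchronized configuration is a consistency check
   before the PEs present themselves to the CE. The sketch below
   assumes, based on the parameter definitions earlier in this section,
   that the PEs must agree on the System Identifier, System Priority,
   and Aggregator Key (so that they look like one LACP speaker), and
   that Port Numbers, which identify interfaces within a speaker, must
   not collide across PEs. LacpConfig and config_conflicts() are
   hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LacpConfig:
    """Per-PE LACP configuration for one bundle (hypothetical model)."""
    system_id: str                 # System Identifier (MAC address)
    system_priority: int           # System Priority
    aggregator_key: int            # Aggregator Key
    port_numbers: Tuple[int, ...]  # Port Numbers of this PE's links


# ASSUMPTION: these must be identical on every PE in the bundle.
SHARED_FIELDS = ("system_id", "system_priority", "aggregator_key")


def config_conflicts(configs: Dict[str, LacpConfig]) -> List[str]:
    """Return human-readable conflicts; an empty list means consistent."""
    if not configs:
        return []
    problems = []
    ref_pe, ref = next(iter(configs.items()))
    for pe, cfg in configs.items():
        for name in SHARED_FIELDS:
            if getattr(cfg, name) != getattr(ref, name):
                problems.append(f"{pe}: {name} differs from {ref_pe}")
    owner: Dict[int, str] = {}
    for pe, cfg in configs.items():
        for port in cfg.port_numbers:
            if port in owner:
                problems.append(
                    f"port {port} used by {owner[port]} and {pe}")
            else:
                owner[port] = pe
    return problems
```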

   It should be noted that the above configuration and operational state
   is localized in scope and is only relevant to PEs within a given
   Redundancy Group, i.e., PEs that connect to the same Ethernet Segment
   over a given Ethernet bundle. Furthermore, state changes caused by
   failures must be communicated with minimal latency in order to
   minimize the switchover time and the consequent service disruption.

   Without synchronization of the above parameters, the system is
   subject to the issues outlined in Section 2.1.
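   As an example of one issue from Section 2.1, mis-wiring can be
   detected from the synchronized per-port operational data: if ports
   meant for the same bundle report different Partner System
   Identifiers or Partner Keys, the links do not all terminate on the
   same CE bundle. The sketch below is a hypothetical illustration of
   that check, not a procedure defined by this document.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class PortOperState:
    """Synchronized per-port operational data (subset, hypothetical)."""
    pe: str                 # PE reporting the port
    port_number: int        # PE-side LACP Port Number
    partner_system_id: str  # Partner System Identifier (CE MAC)
    partner_key: int        # Partner Key


Partner = Tuple[str, int]  # (Partner System Identifier, Partner Key)
Member = Tuple[str, int]   # (PE, Port Number)


def partner_groups(
    ports: Iterable[PortOperState],
) -> Dict[Partner, List[Member]]:
    """Group (PE, port) pairs by the CE identity each port reports."""
    groups: Dict[Partner, List[Member]] = defaultdict(list)
    for p in ports:
        groups[(p.partner_system_id, p.partner_key)].append(
            (p.pe, p.port_number))
    return dict(groups)


def is_miswired(ports: Iterable[PortOperState]) -> bool:
    """More than one partner identity means the links do not all
    reach the same CE bundle."""
    return len(partner_groups(ports)) > 1
```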


6  Subscriber/Session State Synchronization

   Synchronization of subscriber/session state between PE nodes is
   performed using the Inter-chassis Communication attribute carried in
   the Ethernet Segment route. Each application is responsible for
   encoding and decoding its own data, which is outside the scope of
   this draft; BGP provides reliable transport for that data.
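   Purely as a hypothetical illustration of such an application
   encoding, an ARP/DHCP snooping application might pack its IPv4
   address to MAC address bindings into the Opaque field as fixed
   10-octet records. The Type value, the 2-octet Length, and the record
   layout below are all assumptions; none of them is defined by this
   document.

```python
import ipaddress
import struct
from typing import List, Tuple

ARP_BINDINGS_TYPE = 0x0001        # HYPOTHETICAL Type; none is assigned
_TYPE_LEN = struct.Struct("!HH")  # ASSUMPTION: 2-octet Type and Length
_RECORD_LEN = 10                  # 4-octet IPv4 address + 6-octet MAC


def _mac_bytes(mac: str) -> bytes:
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"bad MAC address {mac!r}")
    return raw


def encode_bindings(bindings: List[Tuple[str, str]]) -> bytes:
    """Pack (IPv4 address, MAC address) pairs into one TLV."""
    value = b"".join(ipaddress.IPv4Address(ip).packed + _mac_bytes(mac)
                     for ip, mac in bindings)
    return _TYPE_LEN.pack(ARP_BINDINGS_TYPE, len(value)) + value


def decode_bindings(tlv: bytes) -> List[Tuple[str, str]]:
    """Inverse of encode_bindings()."""
    tlv_type, length = _TYPE_LEN.unpack_from(tlv)
    value = tlv[_TYPE_LEN.size:_TYPE_LEN.size + length]
    if (tlv_type != ARP_BINDINGS_TYPE or len(value) != length
            or length % _RECORD_LEN):
        raise ValueError("not a well-formed binding TLV")
    bindings = []
    for i in range(0, length, _RECORD_LEN):
        ip = ipaddress.IPv4Address(value[i:i + 4])
        mac = ":".join(f"{b:02x}" for b in value[i + 4:i + 10])
        bindings.append((str(ip), mac))
    return bindings
```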

7  Security Considerations

   There are no additional security aspects beyond those of VPLS/H-VPLS
   that need to be considered.


8  IANA Considerations

   To be added in a later revision.


9  References

9.1  Normative References

   [RFC2119]   S. Bradner, "Key words for use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC 2119, March 1997.



9.2  Informative References

   [802.1AX]  IEEE, "IEEE Standard for Local and metropolitan area
              networks - Link Aggregation", IEEE Std 802.1AX-2008.

   [E-VPN]    Aggarwal et al., "BGP MPLS Based Ethernet VPN",
              draft-raggarwa-sajassi-l2vpn-evpn-02.txt, work in
              progress, March 2011.

   [EVPN-REQ] Sajassi et al., "Requirements for Ethernet VPN (E-VPN)",
              draft-sajassi-raggarwa-l2vpn-evpn-req-00.txt, work in
              progress, October 2010.



Authors' Addresses

   Ali Sajassi
   Cisco
   170 West Tasman Drive
   San Jose, CA  95134, US
   Email: sajassi@cisco.com

   Samer Salam
   Cisco
   595 Burrard Street, Suite 2123
   Vancouver, BC V7X 1J1, Canada
   Email: ssalam@cisco.com

   Sami Boutros
   Cisco
   170 West Tasman Drive
   San Jose, CA  95134, US
   Email: sboutros@cisco.com

   Keyur Patel
   Cisco
   170 West Tasman Drive
   San Jose, CA  95134, US
   Email: keyupate@cisco.com