BESS Working Group                                           Ali Sajassi
Internet Draft                                             Gaurav Badoni
Category: Standards Track                                Priyanka Warade
                                                         Suresh Pasupula
                                                           Cisco Systems



Expires: January 2, 2018                                    July 2, 2017


            L3 Aliasing and Mass Withdrawal Support for EVPN
               draft-sajassi-bess-evpn-ip-aliasing-00.txt


Abstract

   This draft proposes an extension to [RFC7432] that provides Aliasing
   for Layer 3 routes, which symmetric IRB needs in order to build a
   complete IP ECMP path list.


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Copyright and License Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.



Sajassi, et al.         Expires January 2, 2018                 [Page 1]


INTERNET DRAFT       IP Aliasing Support for EVPN           July 2, 2017


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.



Table of Contents

   1  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1  Terminology . . . . . . . . . . . . . . . . . . . . . . . .  3
   2  IP Aliasing and Backup Path . . . . . . . . . . . . . . . . . .  4
     2.1 Constructing Ethernet A-D per EVPN Instance Route  . . . . .  5
   3 Fast Convergence for Routed Traffic  . . . . . . . . . . . . . .  6
     3.1 Constructing Ethernet A-D per Ethernet Segment Route . . . .  7
       3.1.1 Ethernet A-D Route Targets . . . . . . . . . . . . . . .  7
     3.2 Avoiding convergence issues by syncing IP prefixes . . . . .  7
     3.3 Handling Silent Host . . . . . . . . . . . . . . . . . . . .  8
     3.4 MAC Aging  . . . . . . . . . . . . . . . . . . . . . . . . .  8
   4 Determining Reachability to Unicast IP Addresses . . . . . . . .  9
     4.1 Local Learning . . . . . . . . . . . . . . . . . . . . . . .  9
     4.2 Remote Learning  . . . . . . . . . . . . . . . . . . . . . .  9
       4.2.1 Constructing MAC/IP Address Advertisement  . . . . . . .  9
       4.2.2 Route Resolution . . . . . . . . . . . . . . . . . . . .  9
   5  Forwarding Unicast Packets  . . . . . . . . . . . . . . . . . .  9
   6 Load Balancing of Unicast Packets  . . . . . . . . . . . . . . . 10
   7  Security Considerations . . . . . . . . . . . . . . . . . . . . 10
   8  IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 10
   9  References  . . . . . . . . . . . . . . . . . . . . . . . . . . 10
     9.1  Normative References  . . . . . . . . . . . . . . . . . . . 10
     9.2  Informative References  . . . . . . . . . . . . . . . . . . 10
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10


1  Introduction

                                     +---------+
                  +-------------+    |         |
                  |             |    |         |
                / |    PE1      |----|         |   +-------------+
               /  |             |    |  MPLS/  |   |             |
              /   +-------------+    |  VxLAN/ |   |     PE3     |---H3
         H1---                       |  NVGRE  |   |             |
              \   +-------------+    |         |---|             |
               \  |             |    |         |   +-------------+
                \ |     PE2     |----|         |
                  |             |    |         |
                  +-------------+    |         |
                                     |         |
                                     |         |
                                     +---------+

  Figure 1: Inter-subnet traffic between Multihoming PEs and Remote PE


   Consider a pair of multi-homing TORs PE1 and PE2. Let there be a host
   H1 attached to them. Consider another TOR PE3 and a host H3 attached
   to it.

   With Asymmetric IRB, if H3 sends inter-subnet traffic to H1, routing
   will happen at PE3. PE3 will have the destination SVI and will
   trigger ARP if it does not have an ARP adjacency to H1. Finally, the
   routing lookup will resolve the destination MAC to H1's MAC address.
   Furthermore, H1's MAC will point to a VxLAN ECMP to PE1 and PE2,
   either due to host route advertisement or MAC Aliasing as detailed in
   [RFC7432].

   With Symmetric IRB, if H3 sends inter-subnet traffic to H1, the
   routing lookup will happen at PE3. PE3 will do a routing lookup in
   the L3VNI-VRF context and is not expected to have the destination
   SVI. Therefore, at PE3, an IP ECMP list (PE1/PE2) needs to be built
   for H1's IP address for proper load balancing. If H1 is locally
   learnt at only one of the PEs (PE1 or PE2) due to port-channel
   hashing, PE3 will not be able to build the IP ECMP list, as there is
   no Aliasing for Layer 3 addresses.

   This draft proposes an extension that provides Aliasing for Layer 3
   routes, which symmetric IRB needs in order to build a complete IP
   ECMP path list.


1.1  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   IRB: Integrated Routing and Bridging

   IRB Interface: A virtual interface that connects the bridging module
   and the routing module on an NVE.

   Broadcast Domain: In a bridged network, the broadcast domain
   corresponds to a Virtual LAN (VLAN), where a VLAN is typically
   represented by a single VLAN ID (VID) but can be represented by
   several VIDs where Shared VLAN Learning (SVL) is used per [802.1Q].

   Bridge Table: An instantiation of a broadcast domain on a MAC-VRF.

   CE: Customer Edge device, e.g., a host, router, or switch.

   EVI: An EVPN instance spanning the Provider Edge (PE) devices
   participating in that EVPN.

   MAC-VRF: A Virtual Routing and Forwarding table for Media Access
   Control (MAC) addresses on a PE.

   Ethernet Segment (ES): When a customer site (device or network) is
   connected to one or more PEs via a set of Ethernet links, then that
   set of links is referred to as an 'Ethernet segment'.

   Ethernet Segment Identifier (ESI): A unique non-zero identifier that
   identifies an Ethernet segment is called an 'Ethernet Segment
   Identifier'.

   LACP: Link Aggregation Control Protocol.

   PE: Provider Edge device.

   Single-Active Redundancy Mode: When only a single PE, among all the
   PEs attached to an Ethernet segment, is allowed to forward traffic
   to/from that Ethernet segment for a given VLAN, then the Ethernet
   segment is defined to be operating in Single-Active redundancy mode.

   All-Active Redundancy Mode: When all PEs attached to an Ethernet
   segment are allowed to forward known unicast traffic to/from that
   Ethernet segment for a given VLAN, then the Ethernet segment is
   defined to be operating in All-Active redundancy mode.

2  IP Aliasing and Backup Path

   Host IP and MAC routes are learnt by PEs on the access side via a
   control plane protocol like ARP. When a CE is multihomed to multiple
   PE nodes using a LAG and is running in All-Active Redundancy Mode,
   the Host IP will be learnt and advertised in the MAC/IP Advertisement
   route only by the PE that receives the ARP packet. As a result, the
   remote PE sees only one next-hop for the Host IP and forwards traffic
   to that advertising PE. Hence, the remote PE is not able to
   effectively load balance the traffic towards the multihomed Ethernet
   Segment.

   To address this issue, the concept of Aliasing introduced in
   [RFC7432] can be extended to Layer 3 routes as well. The PE SHOULD
   advertise reachability to an L3 VRF instance on a given ES for IP
   addresses using the existing EAD/EVI route. In this case, the EVPN
   instance is the VRF table to which the host IP address belongs. This
   route will henceforth be referred to as the IP-EAD/EVI route.

   A remote PE that receives an IP route with a non-reserved ESI SHOULD
   consider it reachable by all PEs that have advertised the IP-EAD/EVI
   route and the EAD/ES route containing the VRF Route-Targets for that
   ES. The EAD/ES route must have the Single-Active bit in the flags of
   the ESI Label extended community set to 0 for Aliasing to take
   effect.

   The IP-EAD/EVI route cannot be used for route forwarding until the
   associated Ethernet A-D per ES route is received.
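
   The aliasing rule above can be sketched as follows. This is only an
   illustrative Python sketch under stated assumptions, not part of the
   specification: the EadEs class, the ip_ecmp_next_hops function, and
   the set-based route tables are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EadEs:
    """Hypothetical view of a received Ethernet A-D per ES route."""
    pe: str
    esi: str
    single_active: bool  # Single-Active bit of the ESI Label ext. comm.


def ip_ecmp_next_hops(esi, ead_es_routes, ip_ead_evi_pes):
    """Return the PEs usable as IP next hops for a host route on `esi`.

    ead_es_routes: iterable of EadEs routes received by the remote PE.
    ip_ead_evi_pes: set of (pe, esi) pairs for which an IP-EAD/EVI
    route has been imported into the IP-VRF.
    """
    next_hops = set()
    for route in ead_es_routes:
        if route.esi != esi or route.single_active:
            continue  # aliasing needs the Single-Active bit set to 0
        if (route.pe, esi) in ip_ead_evi_pes:
            next_hops.add(route.pe)
    return sorted(next_hops)


# H1 is learnt only on PE1, but both PEs advertise EAD/ES and
# IP-EAD/EVI routes for ES1, so PE3 can still build an ECMP list.
ead_es = [EadEs("PE1", "ES1", False), EadEs("PE2", "ES1", False)]
ip_ead_evi = {("PE1", "ES1"), ("PE2", "ES1")}
print(ip_ecmp_next_hops("ES1", ead_es, ip_ead_evi))
```

   A PE whose IP-EAD/EVI route has arrived without the matching EAD/ES
   route is left out of the list, which mirrors the rule that the
   IP-EAD/EVI route is unusable until the EAD/ES route is received.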

   In case of Single-Active redundancy mode, the remote PE SHOULD use
   the IP-EAD/EVI route EVPN Layer 2 attribute extended community as
   mentioned in draft-ietf-bess-evpn-vpws-07 in combination with the
   EAD/ES route to determine the Backup Path for the IP addresses for
   the given IP VRF context. This alternate path SHOULD be installed as
   a backup path for the IP address.
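
   One possible reading of the Single-Active case is sketched below. It
   assumes that the P (primary) and B (backup) control flags of the EVPN
   Layer 2 attribute extended community identify the primary and backup
   PE, as in the VPWS draft. The bit positions, the select_paths
   function, and its inputs are assumptions of this sketch.

```python
P_FLAG = 0x2  # assumed bit position of the P (primary PE) flag
B_FLAG = 0x1  # assumed bit position of the B (backup PE) flag


def select_paths(ip_ead_evi_flags, ead_es_pes):
    """Pick (primary, backup) PEs for the IP addresses on one ES.

    ip_ead_evi_flags: dict mapping PE -> control flags carried on its
    IP-EAD/EVI route.
    ead_es_pes: set of PEs whose EAD/ES route for the ES was received.
    """
    primary = None
    backup = None
    for pe in sorted(ip_ead_evi_flags):
        if pe not in ead_es_pes:
            continue  # the IP-EAD/EVI route is unusable without EAD/ES
        flags = ip_ead_evi_flags[pe]
        if flags & P_FLAG and primary is None:
            primary = pe
        elif flags & B_FLAG and backup is None:
            backup = pe
    return primary, backup


flags = {"PE1": P_FLAG, "PE2": B_FLAG}
print(select_paths(flags, {"PE1", "PE2"}))
```

   The backup PE returned here is the alternate path that would be
   installed as the backup path for the IP address.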


2.1 Constructing Ethernet A-D per EVPN Instance Route

   This draft proposes the advertisement of per EVI Ethernet A-D route
   for IP VRFs to enable Aliasing for IP addresses. The
   usage/construction of this route remains similar to that described in
   RFC 7432 with a few notable exceptions as below.

   * The Route-Distinguisher should be set to the corresponding L3VPN
   context.

   * The Ethernet Tag should be set to 0.

   * The L3 EAD/EVI SHOULD carry one or more IP VRF Route-Target (RT)
   attributes.

   * The L3 EAD/EVI SHOULD carry the RMAC Extended Community attribute.

   * The MPLS Label usage should be as described in RFC 7432.

   It is important to note that the prefix for an IP-EAD/EVI and an
   L2-EAD/EVI may be identical. However, since the RD of the IP-EAD/EVI
   is set to the corresponding L3VPN context and the RD of the
   L2-EAD/EVI is set to the corresponding MAC-VRF context, the import
   will happen in the respective IP-VRFs and MAC-VRFs and hence, the
   prefix will not be overwritten.
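
   The following sketch illustrates why the two routes do not collide.
   It models the route key as (RD, ESI, Ethernet Tag). The EadEviKey
   type, the RD values, and the ESI value are hypothetical and chosen
   only for illustration.

```python
from typing import NamedTuple


class EadEviKey(NamedTuple):
    """Hypothetical key of an Ethernet A-D per EVI route."""
    rd: str
    esi: str
    ethernet_tag: int


ESI = "00:11:22:33:44:55:66:77:88:99"  # illustrative non-reserved ESI

# Same ESI and Ethernet Tag; only the RD differs.
l2_key = EadEviKey(rd="192.0.2.1:100", esi=ESI, ethernet_tag=0)
ip_key = EadEviKey(rd="192.0.2.1:5000", esi=ESI, ethernet_tag=0)

rib = {
    l2_key: "import into MAC-VRF",
    ip_key: "import into IP-VRF",
}
print(len(rib))  # both routes are retained as separate entries
```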

3 Fast Convergence for Routed Traffic

   In EVPN, Host IP reachability is learned via the BGP control plane
   over the MPLS network. All the hosts that are dually connected behind
   an ES are advertised by the PEs belonging to the redundancy group. A
   remote TOR receiving these host routes can lose reachability via any
   of the PEs due to a box reload, a core failure, or an access failure
   at that PE.

   BGP PIC is the existing mechanism for fast convergence, as described
   in https://tools.ietf.org/html/draft-rtgwg-bgp-pic-02. The PIC
   feature does not solve the convergence issue for access failure
   cases, as the PEs are still reachable from the remote TOR.

   To alleviate this, EVPN defines a mechanism to efficiently and
   quickly signal, to remote PE nodes, the need to update their
   forwarding tables upon the occurrence of a failure in connectivity to
   an Ethernet segment.  This is done by having each PE advertise a set
   of one or more Ethernet A-D per ES routes for each locally attached
   Ethernet segment (refer to Section 3.1 below for details on how these
   routes are constructed).  A PE may need to advertise more than one
   Ethernet A-D per ES route for a given ES because the ES may be in a
   multiplicity of EVIs and the RTs for all of these EVIs may not fit
   into a single route.  Advertising a set of Ethernet A-D per ES routes
   for the ES allows each route to contain a subset of the complete set
   of RTs.  Each Ethernet A-D per ES route is differentiated from the
   other routes in the set by a different Route Distinguisher (RD).
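
   The splitting of RTs across several Ethernet A-D per ES routes can be
   sketched as below. The max_per_route limit is a hypothetical
   parameter standing in for whatever fits in a single route, and the
   dictionary layout and function names are assumptions of this sketch.

```python
def split_route_targets(rts, max_per_route):
    """Split the RT set into chunks that each fit in one route."""
    if max_per_route < 1:
        raise ValueError("max_per_route must be at least 1")
    ordered = sorted(set(rts))
    return [ordered[i:i + max_per_route]
            for i in range(0, len(ordered), max_per_route)]


def build_ead_es_routes(pe_ip, esi, rts, max_per_route):
    """Build one EAD/ES route per RT chunk, each with its own RD."""
    routes = []
    chunks = split_route_targets(rts, max_per_route)
    for index, chunk in enumerate(chunks, start=1):
        routes.append({
            "rd": f"{pe_ip}:{index}",  # distinct RD per route
            "esi": esi,
            "route_targets": chunk,
        })
    return routes


rts = ["65000:1", "65000:2", "65000:3", "65000:4", "65000:5"]
routes = build_ead_es_routes("192.0.2.1", "ES1", rts, max_per_route=2)
for route in routes:
    print(route["rd"], route["route_targets"])
```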

   Upon failure in connectivity to the attached ES, the PE withdraws the
   corresponding set of Ethernet A-D per ES routes.  This triggers all
   PEs that receive the withdrawal to update their next-hop adjacencies
   for all IP addresses across IP VRFs associated with the Ethernet
   segment in question.  If no other PE has advertised an Ethernet A-D
   route for the same segment, then the PE that received the withdrawal
   simply invalidates the IP entries for that segment. Otherwise,  the
   PE updates its next-hop adjacencies accordingly.
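
   The handling of a withdrawn Ethernet A-D per ES route at a receiving
   PE can be sketched as follows. The table layouts and the
   withdraw_ead_es function are hypothetical; the sketch only shows the
   two outcomes described above (invalidate or update next hops).

```python
def withdraw_ead_es(fib, ead_es_by_esi, esi, pe):
    """Apply the withdrawal of the EAD/ES routes of `pe` for `esi`.

    fib: dict mapping host IP -> {"esi": str, "next_hops": set}.
    ead_es_by_esi: dict mapping ESI -> set of PEs advertising EAD/ES.
    """
    remaining = ead_es_by_esi.setdefault(esi, set())
    remaining.discard(pe)
    for ip in list(fib):
        entry = fib[ip]
        if entry["esi"] != esi:
            continue  # entries on other segments are not touched
        if remaining:
            entry["next_hops"].discard(pe)  # update the adjacency
        else:
            del fib[ip]  # no PE left for this segment: invalidate


fib = {
    "10.0.0.1": {"esi": "ES1", "next_hops": {"PE1", "PE2"}},
    "10.0.0.2": {"esi": "ES1", "next_hops": {"PE1", "PE2"}},
    "10.0.1.1": {"esi": "ES2", "next_hops": {"PE4"}},
}
ead_es = {"ES1": {"PE1", "PE2"}, "ES2": {"PE4"}}

withdraw_ead_es(fib, ead_es, "ES1", "PE1")
print(sorted(fib["10.0.0.1"]["next_hops"]))
```

   A single withdrawal updates every IP entry on the segment, which is
   what makes the mechanism faster than per-host withdrawals.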

   These routes should be processed with higher priority than other MAC
   or MAC-IP withdrawals upon failure. Similar priority processing is
   needed on the intermediate RRs as well.

   This draft is addressing the mass withdrawal behavior for routed
   traffic. For Layer-2, please refer to Section 8.2 of RFC 7432.

3.1 Constructing Ethernet A-D per Ethernet Segment Route

   This section describes the procedures used to construct the Ethernet
   A-D per ES route, which is used for fast convergence (as discussed
   above). The usage/construction of this route remains similar to that
   described in Section 8.2.1 of RFC 7432, with a few notable exceptions
   as explained in the following sections.

3.1.1 Ethernet A-D Route Targets

   Each Ethernet A-D per ES route MUST carry one or more Route Target
   (RT) attributes. The set of Ethernet A-D per ES routes MUST carry the
   entire set of IP VRF RTs for all the IP VRFs, in addition to the
   MAC-VRF RTs for all the EVPN instances, to which the Ethernet segment
   belongs.
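
   A minimal check of the two requirements above might look like the
   following sketch. The function names and the representation of a
   route as a plain list of RT strings are assumptions made for
   illustration.

```python
def required_ead_es_rts(mac_vrf_rts, ip_vrf_rts):
    """RTs that the set of EAD/ES routes for an ES must carry."""
    return set(mac_vrf_rts) | set(ip_vrf_rts)


def covers_all_rts(routes, mac_vrf_rts, ip_vrf_rts):
    """Check a set of EAD/ES routes, each given as a list of RTs."""
    if any(len(route_rts) == 0 for route_rts in routes):
        return False  # each route MUST carry one or more RTs
    advertised = set().union(*routes)
    return required_ead_es_rts(mac_vrf_rts, ip_vrf_rts) <= advertised


mac_vrf_rts = ["65000:10", "65000:20"]
ip_vrf_rts = ["65000:5000"]
routes = [["65000:10", "65000:20"], ["65000:5000"]]
print(covers_all_rts(routes, mac_vrf_rts, ip_vrf_rts))
```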

3.2 Avoiding convergence issues by syncing IP prefixes

   Consider a pair of multi-homing TORs PE1 and PE2. Let there be a host
   H1 attached to them. Consider another TOR PE3 and a host H3 attached
   to it.

   If the host H1 is learnt on both the PEs, an ECMP path list is formed
   on PE3 pointing to (PE1/PE2). Traffic from H3 to H1 is not impacted
   even if one of the TORs becomes unreachable, as the path list gets
   corrected upon receiving the mass withdrawal (withdrawal of the
   Ethernet A-D per ES route).

   Let us consider a case where H1 is locally learnt only on PE1 due to
   port-channel hashing. At PE3, H1 has an ECMP path list (PE1/PE2)
   using Aliasing as described in Section 2 of this draft. Traffic from
   H3 can reach either of the TORs PE1 or PE2.

   On PE2, all the remote MAC-IP routes belonging to the same Ethernet
   Segment that are advertised by its respective peers (PE1 in our
   example) should be synced and installed locally on PE2 but not
   advertised as local routes by BGP. When the traffic from H3 reaches
   PE2, it will be able to forward the traffic to H1 without any
   convergence delay caused by triggering ARP/ND. In a scaled setup, the
   convergence delay can be significant as ARP and ND resolution can
   take a lot of time. So syncing the IPv4/IPv6 prefixes that belong to
   the same Ethernet Segment helps in solving convergence issues.
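
   The syncing behavior on PE2 can be sketched as below. The table
   layout, the "local" and "synced_from" fields, and the
   sync_peer_routes function are hypothetical names used only to
   illustrate "installed locally but not advertised".

```python
def sync_peer_routes(local_esis, remote_routes, table):
    """Install peer MAC-IP routes for shared ESes as synced entries.

    Returns the IPs that BGP would advertise as local routes; synced
    entries are never part of that list.
    """
    for route in remote_routes:
        if route["esi"] not in local_esis:
            continue  # not an ES that this PE is attached to
        entry = table.setdefault(route["ip"], {
            "mac": route["mac"],
            "esi": route["esi"],
            "local": False,       # not learnt on the access side
            "synced_from": set(),
        })
        entry["synced_from"].add(route["pe"])
    return sorted(ip for ip, e in table.items() if e["local"])


# PE2 is attached to ES1 only; PE1 advertises H1 on ES1.
remote = [
    {"pe": "PE1", "esi": "ES1", "ip": "10.0.0.1",
     "mac": "00:00:5e:00:53:01"},
    {"pe": "PE5", "esi": "ES9", "ip": "10.0.9.9",
     "mac": "00:00:5e:00:53:09"},
]
table = {}
advertised = sync_peer_routes({"ES1"}, remote, table)
print(sorted(table), advertised)
```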


3.3 Handling Silent Host

   In continuation with the discussion above, if the reachability of PE1
   is lost, PE3 will update the ECMP list for H1 to PE2, upon receiving
   mass withdrawal from PE1. If host H1 is also withdrawn from PE1, then
   the same route is withdrawn from PE2 and PE3. Hence traffic from H3
   to H1 is black-holed till H1 is re-learnt on PE2.

   This black-holing can be much worse if H1 behaves like a silent host.
   The IP address of H1 will not be re-learnt on PE2 till H1 re-ARPs or
   some traffic triggers ARP for H1.

   PE2 can detect the failure of PE1's reachability in the following
   ways:

   a) When a core failure or box reload happens on PE1, the loss of next
   hop reachability to PE1 can be detected by the underlay routing
   protocols.

   b) Upon access failure, PE1 withdraws the EAD/ES route and PE2 can
   use this as a trigger to detect the failure.

   Thus, to avoid the black-holing, when PE2 detects loss of
   reachability to PE1, it should trigger ARP/ND for all remote IP
   prefixes received from its ES peers (i.e., PE1) belonging to the same
   Ethernet Segment across IP-VRF contexts. This will force host H1 to
   reply to the solicited ARP/ND from PE2 and refresh both the MAC and
   IP entries for the corresponding host in PE2's tables.

   Even in a core failure scenario on PE1, PE1 must withdraw all its
   local L2 connectivity, as L2 traffic should not be received by PE1.
   So when ARP/ND is triggered from PE2, the replies from host H1 can
   only be received by PE2. Thus H1 will be learnt as a local route and
   also advertised from PE2.

   It is recommended to have a staggered or delayed deletion of the IP
   routes from PE1, so that ARP/ND refresh can happen on PE2 before the
   deletion.
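
   The silent-host handling on PE2 can be sketched as follows. The
   send_probe callback stands in for an ARP request or ND solicitation
   and, like the table layout and function names, is a hypothetical
   element of this sketch.

```python
def on_peer_unreachable(peer, table, send_probe):
    """Probe every host whose entry was synced from the failed peer."""
    probed = []
    for ip in sorted(table):
        if peer in table[ip]["synced_from"]:
            send_probe(ip)  # ARP/ND toward the host on the shared ES
            probed.append(ip)
    return probed


def on_probe_reply(ip, table):
    """The host answered on this PE: it is now a local route."""
    table[ip]["local"] = True  # eligible for advertisement by BGP


table = {
    "10.0.0.1": {"local": False, "synced_from": {"PE1"}},
    "10.0.0.2": {"local": False, "synced_from": {"PE1"}},
    "10.0.0.3": {"local": True, "synced_from": set()},
}
sent = []
probed = on_peer_unreachable("PE1", table, sent.append)
on_probe_reply("10.0.0.1", table)
print(probed, table["10.0.0.1"]["local"])
```

   The delayed deletion recommended above gives these probes time to
   complete before the entries synced from PE1 are removed.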

3.4 MAC Aging

   PE1 would do an ARP/ND refresh for H1 before it ages out. During this
   process, H1 can age out on PE1 either genuinely or due to the ARP/ND
   reply landing on PE2. PE1 must withdraw the local entry from BGP when
   the H1 entry ages out. PE1 deletes the entry from local forwarding
   only when there are no remote synced entries.
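
   The aging rule can be sketched as below. The withdraw callback, the
   table layout, and the on_local_age_out function are hypothetical
   names introduced for this illustration.

```python
def on_local_age_out(ip, table, withdraw):
    """Handle the local age-out of a host entry on a PE."""
    entry = table[ip]
    if entry["local"]:
        entry["local"] = False
        withdraw(ip)  # the local entry is withdrawn from BGP
    if not entry["synced_from"]:
        del table[ip]  # no remote synced entry keeps it in forwarding


table = {
    # H1 was also learnt by the ES peer, so a synced entry exists.
    "10.0.0.1": {"local": True, "synced_from": {"PE2"}},
    # This host is known only locally.
    "10.0.0.2": {"local": True, "synced_from": set()},
}
withdrawn = []
on_local_age_out("10.0.0.1", table, withdrawn.append)
on_local_age_out("10.0.0.2", table, withdrawn.append)
print(withdrawn, sorted(table))
```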

4 Determining Reachability to Unicast IP Addresses

4.1 Local Learning

   The procedures for local learning do not change from [RFC7432].

4.2 Remote Learning

   The procedures for remote learning do not change from [RFC7432].

4.2.1 Constructing MAC/IP Address Advertisement

   The procedures for constructing MAC/IP Address Advertisement do not
   change from [RFC7432].

4.2.2 Route Resolution

   If the ESI field is set to the reserved values of 0 or MAX-ESI, the
   IP route resolution MUST be based on the MAC-IP route alone.

   If the ESI field is set to a non-reserved ESI, the IP route
   resolution MUST happen only when both the MAC-IP route and the
   associated set of Ethernet A-D per ES routes have been received. To
   illustrate this with an example, consider a pair of multi-homed TORs
   PE1 and PE2 connected to an Ethernet Segment ES1 in all-active
   redundancy mode. A given host with IP address H1 is learnt by PE1 but
   not by PE2. When the MAC-IP advertisement route from PE1 and a set of
   EAD/ES and Layer 3 EAD/EVI routes from PE1 and PE2 are received, PE3
   can forward traffic destined to H1 to both PE1 and PE2. Call this
   state (1).

   If after (1) PE1 withdraws EAD/ES, then PE3 will forward the said
   traffic to PE2 only.

   If after (1) PE2 withdraws EAD/ES, then PE3 will forward the said
   traffic to PE1 only.

   If after (1) PE1 withdraws the MAC-IP route, then PE3 will do delayed
   deletion of H1, as described in Section 3.3.

   If after (1) PE2 advertises the MAC-IP route, but PE1 withdraws it,
   PE3 will continue forwarding to both PE1 and PE2 as long as it has
   the EAD/ES and the Layer 3 EAD/EVI routes from both.
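
   The resolution rules and the example above can be sketched as
   follows. The resolve function and its set-based inputs are
   hypothetical, and the delayed deletion of Section 3.3 is not
   modelled. The sketch represents the ESI as 10 octets, with MAX-ESI
   being all ones.

```python
ZERO_ESI = bytes(10)       # reserved ESI value 0
MAX_ESI = b"\xff" * 10     # reserved ESI value MAX-ESI


def is_reserved_esi(esi):
    return esi in (ZERO_ESI, MAX_ESI)


def resolve(mac_ip_pes, esi, ead_es_pes, ip_ead_evi_pes):
    """Return the PEs that PE3 may forward to for one host IP.

    mac_ip_pes: PEs currently advertising the MAC-IP route.
    ead_es_pes / ip_ead_evi_pes: PEs advertising EAD/ES and Layer 3
    EAD/EVI routes for the ES of the host.
    """
    if not mac_ip_pes:
        return []  # no MAC-IP route: nothing to resolve
    if is_reserved_esi(esi):
        return sorted(mac_ip_pes)  # based on the MAC-IP route alone
    return sorted(ead_es_pes & ip_ead_evi_pes)


ES1 = bytes.fromhex("00112233445566778899")
both = {"PE1", "PE2"}

state_1 = resolve({"PE1"}, ES1, both, both)
pe1_withdraws_ead_es = resolve({"PE1"}, ES1, {"PE2"}, both)
pe2_withdraws_ead_es = resolve({"PE1"}, ES1, {"PE1"}, both)
mac_ip_moves_to_pe2 = resolve({"PE2"}, ES1, both, both)
print(state_1, pe1_withdraws_ead_es, pe2_withdraws_ead_es,
      mac_ip_moves_to_pe2)
```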


5  Forwarding Unicast Packets

   Please refer to Section 5 of
   draft-ietf-bess-evpn-inter-subnet-forwarding-01.

6 Load Balancing of Unicast Packets

   The procedures for load balancing of Unicast Packets do not change
   from [RFC7432].

7  Security Considerations

   The mechanisms in this document use EVPN control plane as defined in
   [RFC7432]. Security considerations described in [RFC7432] are equally
   applicable.

   This document uses MPLS and IP-based tunnel technologies to support
   data plane transport. Security considerations described in [RFC7432]
   and in [ietf-evpn-overlay] are equally applicable.


8  IANA Considerations



9  References

9.1  Normative References

   [KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC1776]  Crocker, S., "The Address is the Message", RFC 1776, April
              1 1995.

   [TRUTHS]   Callon, R., "The Twelve Networking Truths", RFC 1925,
              April 1 1996.


9.2  Informative References


   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.





Authors' Addresses

              Ali Sajassi
              Cisco
              Email: sajassi@cisco.com

              Suresh Pasupula
              Cisco
              Email: spasupula@cisco.com

              Gaurav Badoni
              Cisco
              Email: gbadoni@cisco.com

              Priyanka Warade
              Cisco
              Email: pwarade@cisco.com