Internet-Draft EVPN-lite February 2021
Wang & Chen Expires 23 August 2021
Workgroup:
BESS WG
Published:
Intended Status:
Standards Track
Expires:
Authors:
Y. Wang
ZTE Corporation
R. Chen
ZTE Corporation

Light Weighted EVPN

Abstract

When PBB EVPN [RFC7623] is used in Segment Routing networks, it is complicated to use the SID list to carry a function that targets C-MACs.

[I-D.ietf-spring-srv6-network-programming] defines the End.DX2 function, which can be used in EVPN VPLS. When it is used in EVPN VPLS, the data-plane learning defined for the End.DT2U function can also be activated for the End.DX2 function. On the basis of such an End.DX2 function, SRv6 EVPN can meet all the requirements of [RFC7623] and bring some other benefits. Such an SRv6 EVPN is called light-weighted SRv6 EVPN, and it is simpler than PBB EVPN over SRv6.

It is easy for the light-weighted SRv6 EVPN to carry a SID that targets customer Ethernet packets, because there is no other Ethernet header between the SID list and the customer Ethernet header. These SIDs may be user-defined functions for the customer Ethernet headers.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 23 August 2021.

1. Introduction

1.1. Background

When there are too many customer MACs (C-MACs), the RRs and/or ASBRs will be overloaded by the RT-2 routes for these MACs according to [RFC7432]. This issue can be solved simply by having the remote C-MAC entries learnt via data-plane MAC learning (as PBB VPLS has done since [RFC7041]) rather than received from RT-2 routes. This simplified solution works as well as PBB VPLS, but it loses many important features that are based on the ESI concept, because the ingress ESI cannot be learnt via data-plane MAC learning at the egress PE. When data packets are forwarded following these MAC entries, they cannot benefit from the EAD/EVI routes of [RFC7432], so the All-Active redundancy mode for an ES cannot be supported. As a result, the simplified solution cannot work as well as PBB EVPN [RFC7623].

This document proposes some new extensions to [RFC7432] to achieve all-active mode ES redundancy on TPEs and to reduce the C-MAC load on RRs and ASBRs at the same time. With the help of these extensions, the new solution works even better than PBB EVPN, especially when no MPLS data plane is deployed.

Furthermore, it naturally brings the benefits of high scalability, faster network convergence, and reduced operational complexity. Because of these advantages, we call it light-weighted EVPN.

1.2. Overview

In [RFC7432], the C-MACs are advertised via RT-2 routes. This behavior is inherited by [RFC8365] and [I-D.ietf-bess-srv6-services]. But in order to solve the C-MAC overload problem for RRs and ASBRs, we have to return to PBB-like data-plane C-MAC learning procedures.

Section 2 discusses all the requirements for a light-weighted EVPN solution that pushes no C-MAC entries into the backbone network. Note that some of these requirements are not well supported by PBB EVPN.

In this document, the light-weighted EVPN solutions are also called EVPN-lite for short. A total of four EVPN-lite solutions have been proposed since [Revision-01]: VXLAN over EVPN IP-VRF, light-weighted VXLAN EVPN, light-weighted MPLS EVPN, and light-weighted SRv6 EVPN. This revision focuses on the SRv6 EVPNs.

In order to compare these solutions with [RFC7348] and [RFC7623], whose C-MAC entries are also not pushed into the backbone network, two terms are introduced in this document, because the comparisons need unified terminology. One term is "Global ESI Indicator (GEI)", which is called B-MAC in PBB EVPN. The other term is "EVI's Global Discriminator (EGD)", which is called I-SID in PBB EVPN.

Note that the EVI here corresponds to the I-Component of [RFC7623], not the B-Component. In fact, there are no typical B-Components in some of the above solutions.

Note that the GEI and EGD in different EVPN-lite solutions are very different. The details will be described in Section 4.

On the basis of the GEI concept, we define two route types for EVPN-lite. The first route type is the GEI/ES route, which is called the RT-2 route in PBB EVPN. The second route type is the GEI/EVI route, which is called the EAD/EVI route in [RFC7432].

The details of these terms are described in Section 1.3.

1.3. Terminology

Most of the terminology used in this document comes from [RFC7432] and [I-D.ietf-bess-srv6-services], except for the following:

  • Light-weighted EVPN: The EVPN solution with high scalability and reduced operational complexity.
  • EVPN-lite: The Light-weighted EVPN is also called EVPN-lite for short.
  • C-MAC: Customer MAC, it is the same as the C-MAC of PBB EVPN.
  • ISID: a broadcast domain identifier in PBB I-Component.
  • LDV: Local Discriminating Value. It is similar to the Local Discriminating Value of a type 3 ESI.
  • GDV: Global Discriminating Value. An identifier with global uniqueness.
  • EGD: EVI-GDV, an EVI's Global Discriminator, that is, the GDV of an EVI instance. An EGD is used to identify an EVPN Instance (EVI) in the data plane. For example, the EGD of [RFC7348] is a global VNI.
  • ESI Indicator: A global ID for an ESI. Note that different PEs may assign different ESI indicators to the same ESI, especially when the ES redundancy mode is single-active. For example, the ESI indicator of [RFC7623] is the B-MAC.
  • GEI: Global ESI Indicator. It is the same as the "ESI Indicator", except that it emphasizes global uniqueness. A GEI is used in the data plane to identify an ESI, because it is globally unique across the service domain of the corresponding EVPN Instance (EVI). An ESI may nevertheless have several GEIs, one per TPE, especially in the single-active mode of ES redundancy. In E-Tree scenarios, an ESI may have two GEIs on the same PE, one for Root ACs and one for Leaf ACs. For example, the GEIs for an ESI of [RFC8317] are two B-MACs, one for Root ACs and one for Leaf ACs.
  • GEI/ES: The EVPN route used to advertise the relation between an ESI and its GEI. The GEI/ES route is advertised per ESI on a specified PE. In PBB EVPN, the GEI/ES route is the MAC Advertisement route. Different solutions may have different GEI/ES routes, and a GEI/ES route does not have to be an EAD/ES route.
  • EAD/EVI: An Ethernet A-D route per EVI.
  • GEI/EVI: The EVPN route used to advertise the relation between an <ESI/GEI, EVI> and its EVPN label and MPLS nexthops. In PBB EVPN, such a route is not used. Different solutions may have different GEI/EVI routes, and a GEI/EVI route does not have to be an EAD/EVI route.
  • Arg.EGD: The argument part of a SID of the End.DX2AGG function is called Arg.EGD, because the value of that argument is an AC-ID.
  • RT-2: MAC/IP Advertisement Route.
  • MAC Entry: An entry in the EVPN MAC table in the data plane.
  • ESI SID: An SRv6 SID whose function type is End.DX2AGG. Note that when the ESI is in all-active mode, the ESI SID is the same on all PEs of that ES, according to Section 4.1. In that case, the ESI SID can also be called the ES anycast SID.
  • ESI IP: An End.DX2AGG SID with its Argument part being set to zero.
  • VXLAN EVPN: EVPN per [RFC8365].
  • EVPN VXLAN: A broadcast domain per [RFC7348] that uses the IMET routes of [RFC8365] to construct VXLAN tunnels. Note that an EVPN VXLAN does not use EAD/EVI routes or MAC/IP Advertisement Routes.
  • PLR: A router at the point of local repair in the underlay network. In egress node protection, it is the penultimate hop router on an anycast tunnel.
  • Anycast ECMP SID: An anycast SID that is load-balanced by the underlay network.
  • Anycast FRR SID: An anycast SID that is fast-rerouted by the underlay network.

2. Requirements

Light-weighted SRv6 EVPNs should satisfy the following requirements:

2.1. No C-MAC Awareness in the Backbone

In typical operation, an EVPN PE sends a BGP MAC Advertisement route per C-MAC address. In certain applications, this poses scalability challenges, as is the case in data center interconnect (DCI) scenarios where the number of virtual machines (VMs), and hence the number of C-MAC addresses, can be in the millions. This is called C-MAC overload of the DC backbone. In such scenarios, it is required to reduce the number of BGP MAC Advertisement routes by relying on an 'EVPN-lite' scheme, as is provided by the ESI and its equivalents (e.g. Pseudo B-MAC, ESI IP).

2.2. EVPN IRB Support

PBB-VPLS/PBB-EVPN is not friendly to the IRB use case because of its complicated protocol stack, so up to now the industry has used it only in pure L2VPN use cases.

The solution should provide efficient forwarding performance in EVPN IRB use cases.

2.3. Unified Encapsulation per Scenario

PBB EVPN, especially the MPLS encapsulation of its B-VPLS, is typically not used in DC scenarios, so PBB and MPLS encapsulation would be brought into the DC backbone only because of the C-MAC overload problem. EVPN IRB is widely deployed in DC scenarios, but PBB EVPN is not friendly to EVPN IRB use cases, so different solutions have to be used for the EVPN IRB and C-MAC reduction use cases. We believe that an operator who chooses the VXLAN/Geneve data plane will prefer to use the same data plane in all use cases, e.g. EVPN IRB and C-MAC reduction. So it is necessary for NVO3/MPLS/SRv6 EVPN to support Section 2.1 in order to provide a unified solution for data center and other scenarios.

2.4. ESI Features Remain Supported

Two redundancy modes are defined in [RFC7432]. They are All-Active mode and Single-Active mode.

In All-Active mode, C-MAC movement among the adjacent PE nodes of the same ESI should not be considered C-MAC mobility. In Single-Active mode, such movement can be considered C-MAC mobility.

2.5. Flexible Multi-homing Remains Supported

Flexible multi-homing means that different ES instances can have different adjacent PEs. In this document, we call the set of all adjacent PEs of an ES instance that ES's location-set. Flexible multi-homing thus means that different ESes can have different location-sets.

For example, ES1's location-set is {PE1}, ES2's location-set is {PE2, PE3}, ES3's location-set is {PE1, PE3}, and ES4's location-set is {PE2,PE4}.

2.6. C-MAC Address Learning and Confinement

In EVPN, all the PE nodes participating in the same EVPN instance are exposed to all the C-MAC addresses learnt by any one of these PE nodes because a C-MAC learnt by one of the PE nodes is advertised in BGP to other PE nodes in that EVPN instance. This is the case even if some of the PE nodes for that EVPN instance are not involved in forwarding traffic to, or from, these C-MAC addresses. Even if an implementation does not install hardware forwarding entries for C-MAC addresses that are not part of active traffic flows on that PE, the device memory is still consumed by keeping record of the C-MAC addresses in the routing information base (RIB) table. In network applications with millions of C-MAC addresses, this introduces a non-trivial waste of PE resources. As such, it is required to confine the scope of visibility of C-MAC addresses to only those PE nodes that are actively involved in forwarding traffic to, or from, these addresses.

2.7. No C-MAC Flushing for All-Active ESes

Just as in [RFC7432], it is required to avoid C-MAC address flushing upon link, port, or node failure for remote All-Active multihomed segments.

2.8. Independent C-MAC Flushing for Single-Active ESes

Just as in [RFC7432], upon a link or port failure of a single-active ESI, the C-MACs of other single-active ESes on the same PE will not be flushed.

2.9. Independent Convergency per <ESI, EVI>

When the physical port of an All-Active ES works well but a single Ethernet Tag ID (ETI) of that ES fails, the traffic to that ETI of that ES is re-routed to another adjacent PE of the same ES, while the traffic to other ETIs of the same ES is not affected.

Note that when an AC (ES link) fails but the PE node still works well, there should not be steady bypassing traffic either. The steady bypassing problem is discussed in [I-D.wang-bess-evpn-egress-protection].

2.10. Route Aggregation and Default Route in Backbone

The per-ESI routes can be aggregated in the backbone network. Even the default route should be supported when the B-Component is an EVPN IP-VRF (e.g. in VXLAN over IP-VRF solutions).

In SRv6 EVPN, different sub-interfaces of the same ESI can have different ESI-indicators in order to achieve Independent Convergency per <ESI, EVI>. But only the common prefix of them should be advertised (both in underlay network and in overlay network) before any of the sub-interfaces fails.

2.11. ARP Suppression

ARP suppression requires <IP,MAC> entries to be held steadily on all TPEs, so it conflicts with Section 2.6. But if the C-MAC confinement requirement is not so important in some scenarios, ARP suppression can be activated. This is an option.

2.12. ESI Indicator Aggregation

There is an obvious difference between "ESI Route Aggregation" and "ESI Indicator Aggregation". "ESI Route Aggregation" means that some ESI Indicators are advertised by underlay protocols in an aggregated manner, but different ESIs still have different ESI-Indicators. "ESI Indicator Aggregation" means that different ESIs use the same ESI-Indicator.

Note that "ESI Route Aggregation" is recommended whenever it is possible, but "ESI Indicator Aggregation" can only be used under certain constraints.

When two ESes are attached to the same redundancy group of PEs, they can share the same ESI indicator. But this brings some issues too. One issue is that they may be attached to different groups of PEs in the future. Another issue is that when only one of the ESes fails, the ESI indicator cannot be withdrawn by that PE, so steady bypassing for that ES arises immediately after its failure on that PE. If these issues are not so important in some scenarios, ESI Indicator Aggregation may be activated. This is an option.

Note that when ESI Indicator Aggregation is activated, the local-bias ES split-horizon procedures or its variations (like what [I-D.eastlake-bess-evpn-vxlan-bypass-vtep] does) should be used.

Note that ESI Indicator Aggregation works well with single-active ESIs (see Section 4.2); its steady bypassing problem arises with all-active ESIs only.

Note that the sub-interfaces of an ESI may be assigned different ESI-indicators, and these ESI-indicators can be aggregated into a common prefix that is assigned to the ESI. In that case, only the common prefix should be advertised before any of the sub-interfaces fails. This is not considered "ESI Indicator Aggregation"; it is "ESI Route Aggregation".
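
The common prefix mentioned above can be illustrated with a minimal sketch. The addresses are taken from the IPv6 documentation range and the helper name `common_prefix` is hypothetical; this document does not define either.

```python
import ipaddress


def common_prefix(indicators):
    """Return the longest IPv6 prefix that covers every given ESI-indicator."""
    values = [int(ipaddress.IPv6Address(a)) for a in indicators]
    differing = 0
    for v in values[1:]:
        differing |= values[0] ^ v      # collect every bit position that differs
    prefix_len = 128 - differing.bit_length()
    base = ipaddress.IPv6Address(values[0])
    # strict=False masks the host bits of the first address.
    return ipaddress.IPv6Network(f"{base}/{prefix_len}", strict=False)


# Assumed example: three sub-interfaces of one ESI, each with its own ESI-indicator.
sub_interface_indicators = [
    "2001:db8:0:1:0:100:0:1",
    "2001:db8:0:1:0:100:0:2",
    "2001:db8:0:1:0:100:0:3",
]
esi_prefix = common_prefix(sub_interface_indicators)
```

Here each sub-interface keeps its own ESI-indicator and only `esi_prefix` would be advertised while all sub-interfaces are up, which is ESI Route Aggregation. ESI Indicator Aggregation would instead give several ESIs one identical indicator.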

2.13. Unequal load-balance

The light-weighted EVPNs should support the unequal load-balance defined in [I-D.ietf-bess-evpn-unequal-lb].

2.14. AC-aware Service Interface

In the AC-aware bundling service interface, an ES may have two of its VLANs attached to the same broadcast domain. These two VLANs may be assigned to the same sub-interface or to different sub-interfaces.

2.15. ESI-agnostic Core Routers

We should not make the core routers aware of any per-EVI routing information of an ESI, because they are just underlay nodes.

The core routers may not be aware of any per-ES routing information of the ESIs either. In that case, the anycast ESI SID should be hidden in the SRH, where it is the inner SID behind the Node SID of the egress PE.

3. Light-Weighted EVPN Overview

3.1. Use Case

We assign a Global Discriminator EGD1 to an EVI instance EVI1; EGD1 is a number consisting of N bits. We assign an ESI-indicator GEI1 to ESI1 on PE1, and an ESI-indicator GEI2 to ESI1 on PE2. We call the relationships between ESI1 and its two ESI-indicators ESI1_GEI1 and ESI1_GEI2 respectively. The EGD and GEIs MUST be globally unique in EVI1's service domain.

                                 +----------+
                   PE1           |          |
              +-------------+    |          |
              | ESI1_GEI1   |    |          |         PE3
             /|             |----|          |   +-------------+
            / |             |    | IP/MPLS  |   |             |
       LAG /  +-------------+    | Backbone |   |   ESI2_GEI3 |---CE2
   CE1=====                      |   with   |   |             |
           \  +-------------+    |   EVPN   |---|             |
            \ |             |    |   RRs    |   +-------------+
             \|             |----|   and    |
              | ESI1_GEI2   |    |   SPEs   |
              +-------------+    |          |
                   PE2           |          |
                                 +----------+
Figure 1: EVPN MAC Reduction Use Case

We use IMET routes to build a broadcast-list, which is used to forward BUM traffic. Data-plane MAC learning on BUM traffic produces the first batch of C-MAC entries. Subsequent C-MAC entries can be learnt from unicast and/or BUM traffic. We deliberately do not use MAC/IP routes to advertise C-MAC entries, so that the RRs and/or SPEs are not overloaded by these C-MACs.

3.2. Packet Walkthrough

#1
[PE1 Forward ARP Request to PE2/PE3]
  • When CE1 requests CE2's ARP, PE1 receives the ARP Request BUM1 from an AC (say AC1) of ESI1. PE1 forwards the ARP Request following the broadcast-list of AC1's EVI instance (say EVI1). The broadcast-list is constructed from the IMET routes of PE2/PE3.

    PE1 forwards the ARP Request to PE2/PE3. The ARP Request is encapsulated with GEI1 and EVI1_GDV1. The inner SMAC of the ARP Request is M1, which is CE1's MAC address.

#2
[PE2/PE3's Dataplane MAC Learning]
  • When PE2/PE3 receive the ARP Request packet BUM1, they perform data-plane MAC learning independently. They learn that M1 is behind GEI1.

    Note that when PE2 learns that M1 is behind GEI1, it also assumes that M1 is behind the local AC whose ESI-indicator is GEI1. The local AC may have a higher priority than the remote one.

    After the data-plane MAC learning, the ARP Request packet BUM1 is broadcast to the local ACs, behind one of which is CE2.

#3
[PE2 Discard ARP Request to CE1]
  • On receiving BUM1 from PE1, PE2 uses the ingress GEI information in BUM1 to determine its ingress ESI, ESI1. When ESI1 is in all-active mode and PE2 is about to forward the ARP Request to CE1, PE2 finds that the ESI of the outgoing AC is also ESI1, so PE2 discards the packet to keep the ES loop-free.

    Note that before the ARP Request packet is discarded, its source MAC can be learnt, especially in the "AC-aware bundling service interface". The MAC entry is learnt against the GEI, but it takes the local sub-interface on that ES as its outgoing interface, in order to avoid unknown-unicast flooding.

    Note that in the "AC-aware bundling service interface", the AC-ID along with that GEI helps the MAC entry to be installed for the correct outgoing interface. Such a MAC entry is called a synced MAC entry.

    When ESI1 is in single-active mode, the outgoing AC may be in blocking state; otherwise its corresponding sub-interface on CE1 takes charge of dropping the packet instead. So although the ESI for the outgoing AC is not the same as ESI1, no loop will arise in the Ethernet Segment.

#4
[PE3 Forward ARP Reply to PE1/PE2]
  • When CE2 replies to CE1's ARP Request, PE3 forwards the ARP reply U1 according to the MAC entry for M1 learnt previously.

    PE3 forwards the ARP reply U1 to PE1 or PE2 according to ESI1's RT-1 per EVI routes and RT-1 per ES routes:

    When ESI1 is in all-active mode, GEI1 may be the same as GEI2; in that case, we call both of them GEI21. The traffic to M1 is load-balanced between PE1 and PE2, because GEI21 is advertised by both PE1 and PE2.

#5
[PE1 Forward ARP Reply to CE1]
  • When PE1 receives the ARP reply packet U1 from PE3, PE1 first matches the packet to its EVI instance EVI1 using U1's EGD information. PE1 does not discard the packet, because the egress ESI is not the same as the ingress ESI, which is determined from U1's GEI information.
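
Steps #2 to #4 can be sketched as follows. This is only an illustrative model under stated assumptions: the class `Pe`, its method names, and the string identifiers are hypothetical, ESI1 is assumed to be all-active with one shared GEI ("GEI21"), and the encapsulation itself is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Pe:
    name: str
    gei_by_local_ac: dict                           # local AC -> GEI of its ES
    mac_table: dict = field(default_factory=dict)   # (EGD, C-MAC) -> GEI

    def receive_bum(self, egd, ingress_gei, src_mac):
        """Steps #2 and #3: learn the source C-MAC, then flood with split horizon."""
        self.mac_table[(egd, src_mac)] = ingress_gei
        # An AC whose GEI equals the ingress GEI is on the same ES: do not flood to it.
        return [ac for ac, gei in self.gei_by_local_ac.items() if gei != ingress_gei]

    def lookup(self, egd, dst_mac):
        """Step #4: return the GEI to send towards, or None for unknown unicast."""
        return self.mac_table.get((egd, dst_mac))


pe2 = Pe("PE2", {"ac-to-CE1": "GEI21"})
pe3 = Pe("PE3", {"ac-to-CE2": "GEI3"})

# The ARP Request BUM1 from PE1 carries EGD1, ingress GEI21 and source MAC M1.
flooded_on_pe2 = pe2.receive_bum("EGD1", "GEI21", "M1")
flooded_on_pe3 = pe3.receive_bum("EGD1", "GEI21", "M1")
```

In this model PE2 learns M1 but floods to no local AC, while PE3 floods towards CE2 and can later resolve M1 to GEI21 for the ARP reply.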

4. Light-Weighted SRv6 EVPN

4.1. SRv6 Solution Overview

4.1.1. Aggregatable End.DX2 SID

When an Ethernet Segment ES1 is attached to an EVI, the attachment circuit AC1 for that <ESI,EVI> is assigned an End.DX2 SID. Different ACs of the same ESI are assigned different End.DX2 SIDs; we call them AC SIDs in this document. These different End.DX2 SIDs must be aggregatable into the same prefix, and this prefix is called the ESI prefix in light-weighted SRv6 EVPNs. The format of aggregatable End.DX2 SIDs is illustrated in the following figure:


    |<---  ESI-Indicator(128-N bits) ---->|<----     N bits     --->|
    +------------+------------+-----------+-------------------------+
    |    Block   |   Node     | ESI.LDV   |          AC-ID          |
    +------------+------------+-----------+-------------------------+
    |<------ Locator -------->|<------------- Function ------------>|

Figure 2: End.DX2 SID Format for Aggregation

Note that the ESI.LDV field is the Local Discriminating Value (LDV) of the ESI (especially of a type 3/4/5 ESI). The AC-ID field is the identifier of the AC of that End.DX2 SID's EVI. The ESI.LDV field and the AC-ID field together form the End.DX2 SID's Function part.

Note that in "AC-aware bundling service interface" the AC-ID field MUST be the same as the Attachment Circuit ID of [I-D.sajassi-bess-evpn-ac-aware-bundling]. But in other service interfaces the AC-ID field can also be the EGD of that AC's EVPN instance. Note that the EGD has a global meaning like a global VNI or a PBB I-SID, while the AC-ID part for an ordinary aggregatable End.DX2 SID typically is only a VLAN-ID on that ES.

Note that an SRv6 ESI-indicator is a 128-bit ESI SID with a zero argument; it is also called an ESI-IP. An ESI SID may have a non-zero argument part, but an ESI-IP always has a zero argument part.
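
The layout of Figure 2 can be sketched as simple bit packing. The field widths (48/16/32/32 bits, so N = 32) and the function names are assumptions made only for this illustration; the document leaves the widths to the deployment.

```python
import ipaddress

# Hypothetical field widths in bits; they must add up to 128.
BLOCK_BITS, NODE_BITS, LDV_BITS, ACID_BITS = 48, 16, 32, 32


def make_ac_sid(block, node, esi_ldv, ac_id):
    """Pack Block | Node | ESI.LDV | AC-ID into an aggregatable End.DX2 SID."""
    fields = ((block, BLOCK_BITS), (node, NODE_BITS),
              (esi_ldv, LDV_BITS), (ac_id, ACID_BITS))
    sid = 0
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError("field value does not fit its width")
        sid = (sid << bits) | value
    return ipaddress.IPv6Address(sid)


def esi_ip(ac_sid):
    """Zero the low N bits (the AC-ID) to obtain the ESI-IP of an AC SID."""
    keep_mask = ((1 << 128) - 1) ^ ((1 << ACID_BITS) - 1)
    return ipaddress.IPv6Address(int(ac_sid) & keep_mask)


# Two ACs (AC-ID 0x64 and 0x65) of the same ESI on the same node.
sid_a = make_ac_sid(0x20010DB80000, 0x0001, 0x00000100, 0x64)
sid_b = make_ac_sid(0x20010DB80000, 0x0001, 0x00000100, 0x65)
```

Both AC SIDs differ only in the AC-ID, so `esi_ip(sid_a)` and `esi_ip(sid_b)` give the same ESI-indicator.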

4.1.2. The Advertisement of ESI-IPs

The SRv6 SID in the IMET route is an End.DT2M SID with a zero argument length. GEI1 and GEI2 are ESI-IPs of the End.DX2AGG SID defined in Figure 3. We can use an IGP to advertise GEI1 and GEI2 to PE3 in the SRv6 underlay, so we do not have to use EAD/ES routes or EAD/EVI routes in SRv6 EVPN in this section.

Note that if ESI1 is in single-active mode, GEI1 is different from GEI2, but if ESI1 is in all-active mode, GEI1 is the same as GEI2.

Note that when the PE1 node fails and the ESI is all-active, the PLR node performs underlay anycast FRR switching for GEI1 (= GEI2). This results in fast network convergence.

Note that when the PE-CE link of GEI1 fails, the IGP route of GEI1 is withdrawn, so there is no steady bypassing for that ES, but a temporary bypass can be used to further improve convergence.

The detailed comparison between light-weighted SRv6 EVPN and PBB EVPN over SRv6 is described in Section 6.

4.2. SRv6-specific EVPN-lite Procedures

[6A]

In Step #1, PE1 forwards the ARP Request to PE2/PE3 with the following SRv6 BE encapsulation: its underlay Source IP is the End.DX2AGG SID on PE1 for ESI1, and its underlay Destination IP is the End.DT2M SID on PE2/PE3. The locator and function parts of the End.DX2AGG SID are GEI1. The Argument part of the End.DX2AGG SID is 0.

Note that for single-homed ingress ACs, the underlay SIP is the End.DT2U SID, because they do not need an ESI SID. Multi-homed ingress ACs with single-active behavior may not be assigned a dedicated ESI-indicator either. In such situations, the underlay SIP can also be the End.DT2U SID, and the ESI indicators of all single-active ESIs for the same EVI are aggregated into the same IPv6 address.

[6B]

In Step #3, PE2 can compare the ingress GEI of BUM1 with the GEI of the outgoing AC directly; no GEI-to-ESI lookup is needed.

Note that when the AC-ID is the EGD, PE2 can decapsulate the packet following either the End.DX2 function or the End.DX2AGG function. This is just a local matter, although the End.DX2AGG function can reduce the number of decapsulation forwarding entries. But when the AC-ID is that AC's VLAN-IDs, PE2 has to decapsulate the packet following the End.DX2 function.

[6C]

In Step #4, PE3 forwards the ARP reply to PE1 with the following SRv6 BE encapsulation: its underlay Source IP is the End.DX2AGG SID on PE3 for ESI2, and its underlay Destination IP is the End.DX2AGG SID on PE1 for ESI1 according to the MAC entry for M1. The Arg.EGD of the End.DX2AGG SID in the DIP is the EGD configured on PE3. Note that the EGD for the same EVI is configured with the same value on PE1/PE2/PE3.

When ESI1 is in all-active mode, GEI1 is the same as GEI2, so we call both of them GEI21. The traffic to M1 is load-balanced between PE1 and PE2 by the underlay network from PE3, because GEI21 is advertised by both PE1 and PE2 in the underlay IGP.

Note that if the DIP is the anycast Node SID of PE1 and PE2, then when the PE-CE link of ESI1 fails, the traffic is steadily bypassed until that link recovers.

[6D]

In Step #5, when PE1 receives the SRv6-encapsulated ARP reply packet from PE3, PE1 first matches the packet to the End.DX2AGG SID of ESI1 by the DIP, then matches the packet to the EVI instance EVI1 by Arg.EGD.
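
The choice of outer addresses in [6A] and [6C] can be sketched as below. The 32-bit Arg.EGD width, the function names, and all addresses (taken from the IPv6 documentation range) are assumptions for illustration; SRH handling is not modelled.

```python
import ipaddress
from typing import NamedTuple, Optional

ARG_BITS = 32   # hypothetical Arg.EGD width


class Outer(NamedTuple):
    sip: ipaddress.IPv6Address
    dip: ipaddress.IPv6Address


def encap_bum(esi_ip: Optional[str], local_dt2u: str, remote_dt2m: str) -> Outer:
    """[6A]: the SIP is the ESI-IP, or the End.DT2U SID if the AC has no ESI SID."""
    sip = esi_ip if esi_ip is not None else local_dt2u
    return Outer(ipaddress.IPv6Address(sip), ipaddress.IPv6Address(remote_dt2m))


def encap_unicast(local_esi_ip: str, learnt_gei: str, egd: int) -> Outer:
    """[6C]: the DIP is the learnt GEI with the EVI's EGD written into Arg.EGD."""
    if not 0 <= egd < (1 << ARG_BITS):
        raise ValueError("EGD does not fit Arg.EGD")
    dip = int(ipaddress.IPv6Address(learnt_gei)) | egd
    return Outer(ipaddress.IPv6Address(local_esi_ip), ipaddress.IPv6Address(dip))


# PE1 floods an ARP Request from a single-homed AC, so no ESI-IP is available.
request = encap_bum(None, "2001:db8:0:1::d2", "2001:db8:0:3::d3")
# PE3 sends the ARP reply towards the GEI learnt for M1, with EGD 5000.
reply = encap_unicast("2001:db8:0:3:0:200::", "2001:db8:0:1:0:100::", egd=5000)
```

The learnt GEI has a zero argument, so OR-ing the EGD into the low bits leaves the locator and function parts untouched.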

4.2.1. Decapsulation Optimization

We want to decapsulate the packets destined to different ESIs of the same EVI using the same forwarding entry. To achieve this, we can use the EGD of an AC's EVI as the AC-ID of that AC's AC SID.

These AC SIDs are aggregatable End.DX2 SIDs, so we can consider the ESI prefix aggregated from these End.DX2 SIDs as a SID of a new SRv6 function called End.DX2AGG. The format of the End.DX2AGG SID is illustrated in the following figure:


    |<------ Locator -------->|<- FUNC -->|<------ Arg.EGD -------->|
    +------------+------------+-----------+-------------------------+
    |    Block   |   Node     | ESI.LDV   |          EGD            |
    +------------+------------+-----------+-------------------------+

Figure 3: End.DX2AGG SID Format

Note that whether these SIDs are treated as many End.DX2 SIDs or as a single End.DX2AGG SID with different arguments is a local matter for each PE node; other PEs of the same EVI are not aware of the difference between these two implementations.

A SID with the End.DX2AGG function is called an "ESI SID" in this document. The ESI's GEI is the locator and function part of its corresponding ESI SID. The argument part of the ESI SID is the AC-ID of the corresponding AC. The AC-ID plus the ESI.LDV works like the function part of an End.DX2 SID. The argument part of an ESI SID is called Arg.EGD in this document, where EGD is the abbreviation of EVI's Global Discriminator.

Note that when the End.DX2AGG function is used in the AC-aware bundling service interface, the VLAN-IDs of the ingress AC (the AC-ID of the ingress AC) can be carried in the inner Ethernet packet.

4.2.1.1. End.DX2AGG Function and Arg.EGD

The "Endpoint with decapsulation and Aggregated L2 table forwarding" behavior (End.DX2AGG for short) is a variant of the End.DX2 behavior.

Two of the applications of the End.DX2AGG behavior are the EVPN VPLS [RFC7432] and the EVPN ETREE [RFC8317] use-cases.

Any SID instance of this behavior is associated with an ESI E. The behavior also takes an argument, "Arg.EGD", which provides a local mapping to an EVI V. The outgoing interface corresponding to <ESI E, EVI V> is OIF, and the EVI V's bridge table is L2 Table T.

The End.DX2AGG SID MUST be the last segment in an SR Policy.

When N receives a packet whose IPv6 DA is S and S is a local End.DX2AGG SID, the processing is identical to the End.DT2U behavior except for the Upper-layer header processing which is as follows:

   S01. If (Upper-Layer Header type == 143(Ethernet) ) {
   S02.    Remove the outer IPv6 Header with all its extension headers.
   S03.    Determine the L2 Table T using Arg.EGD.
   S04.    Learn the exposed MAC Source Address in L2 Table T.
   S05.    Find out the OIF, and forward the Ethernet frame to the OIF.
   S06. } Else {
   S07.    Process as per Section 4.1.1
               of [I-D.ietf-spring-srv6-network-programming].
   S08. }

Note that the OIF can be found using the MAC entries in L2 Table T when the EVI V is an E-LAN service.

Note that in the AC-aware bundling service interface, the source MAC should be learnt against the ingress AC SID and the VLAN-IDs of the inner Ethernet packet.
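
The upper-layer processing in S01 to S08 can be modelled with a short sketch. It is not a forwarding-plane implementation: the class `EndDx2Agg`, the 32-bit argument width, the per-EGD OIF table, and learning the source MAC against the IPv6 source address are all assumptions made for illustration, and the non-Ethernet branch (S06 to S07) is reduced to returning None.

```python
import ipaddress

ETHERNET_NEXT_HEADER = 143   # IPv6 Next Header value for Ethernet


class EndDx2Agg:
    def __init__(self, gei, arg_bits, oif_by_egd):
        self.gei = gei                          # ESI-IP as an int (zero argument)
        self.arg_mask = (1 << arg_bits) - 1
        self.oif_by_egd = dict(oif_by_egd)      # Arg.EGD -> OIF of <ESI E, EVI V>
        self.l2_tables = {egd: {} for egd in self.oif_by_egd}

    def process(self, da, sa, next_header, src_mac):
        """Return the OIF for the inner frame, or None if it is not handled here."""
        if da & ~self.arg_mask != self.gei:
            raise ValueError("DA is not an instance of this End.DX2AGG SID")
        if next_header != ETHERNET_NEXT_HEADER:         # S01 / S06
            return None
        egd = da & self.arg_mask                        # S03: Arg.EGD selects table T
        table = self.l2_tables.get(egd)
        if table is None:
            return None                                 # unknown EGD: drop
        table[src_mac] = sa                             # S04: learn the source MAC
        return self.oif_by_egd[egd]                     # S05: forward to the OIF


gei1 = int(ipaddress.IPv6Address("2001:db8:0:1:0:100::"))
pe3_esi_sid = int(ipaddress.IPv6Address("2001:db8:0:3:0:200::"))
sid = EndDx2Agg(gei1, arg_bits=32, oif_by_egd={5000: "AC1"})

oif = sid.process(gei1 | 5000, pe3_esi_sid, 143, "00:00:5e:00:53:02")
```

A single `EndDx2Agg` instance serves every EVI of the ESI, which is the reduction in decapsulation entries that the aggregated behavior aims at.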

5. Advanced Considerations

5.1. ESI Indicator Advertisement Optimization

5.1.1. Advertise ESI SIDs in Underlay Network

The End.DX2AGG SIDs can be advertised as IP prefixes in the underlay IGP. Although each of them is the aggregation of many AC SIDs, the ESI SIDs may still be too many for the underlay network, and the core routers, which are service-agnostic, have to install these prefixes.

In order to solve these problems, the ESI SIDs can be advertised via EVPN routes in the overlay network.

Note that when uRPF (Unicast Reverse Path Forwarding) is enabled and the ESI SIDs are used as source IPs, the ESI SIDs should be advertised in the underlay network, even if they are not used as destination IPs. Otherwise the source ESI SID should be hidden in the SRH too.

5.1.2. Advertise ESI SIDs for Overlay Network

When we use EVPN routes to advertise ESI SIDs among the PEs for the overlay network, these routes are not imported by the core routers. In that case, when the ESI SIDs are used as destination IP addresses, they should be hidden behind the Node SID of the corresponding egress PE router.

Note that the association between an ESI SID and its corresponding Node SID is also advertised by such EVPN routes.

We can use the EAD/ES route (or the EAD/EVI route) to advertise the Global ESI Indicator (GEI) (and the EGD); these EAD routes are called GEI/ES or GEI/EVI routes in this document. When the GEI/EVI route is used to advertise the GEI, the End.DX2AGG SID is advertised in its SRv6 L2 Service TLV, not in its nexthop. The EGD may be carried in the Arg.EGD field of the End.DX2AGG SID, or it can be determined from its EVI-RTs.

Both GEI/EVI routes and GEI/ES routes are advertised and imported for the Global Routing Table (GRT), so their Route Targets (RTs) are configured for the GRT, because there is no dedicated B-Component as in PBB VPLS and PBB EVPN. Note that the GEI/EVI routes can be installed as /128 routes, and the Arg.EGD part can be set to the actual EGD of the corresponding EVI. In that case, when a C-MAC is learnt over an End.DX2AGG SID (as IPv6 SA) in the data plane, the Arg.EGD field of that SID should be set to the EVI's EGD when the C-MAC entry is installed.

Although GEIs are imported into the GRT, only PE nodes are aware of them; the transit nodes in the underlay network are not aware of individual GEIs (they may be aware of the common prefix of these GEIs), which reduces FIB consumption. We can use the Argument Length in the SRv6 SID Structure Sub-Sub-TLV to check whether the EGD is too big for the End.DX2AGG SID, so that the function part of the End.DX2AGG SID is not corrupted and a flexible EGD length can be used.
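
The argument-length check can be expressed in a few lines. This is a sketch only: the function names are hypothetical, and `argument_length` stands for the Argument Length value received in the SRv6 SID Structure Sub-Sub-TLV.

```python
def egd_fits(egd: int, argument_length: int) -> bool:
    """True if the EGD can be written into an argument of the given bit length."""
    return 0 <= egd < (1 << argument_length)


def set_arg_egd(esi_sid: int, egd: int, argument_length: int) -> int:
    """Write the EGD into Arg.EGD without touching the locator or function bits."""
    if not egd_fits(egd, argument_length):
        raise ValueError("EGD is too big for the advertised argument length")
    arg_mask = (1 << argument_length) - 1
    return (esi_sid & ~arg_mask) | egd
```

For example, a 24-bit EGD such as a VNI-sized value fits an argument length of 24 or more, while the same value is rejected for a 16-bit argument instead of overflowing into the function part.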

5.1.3. Advertise AC SIDs for Overlay Network

In order to solve the problem described in Section 2.9, we may have to advertise AC SIDs. But the number of AC SIDs may be hundreds of times larger than the number of ESI SIDs. Light-weighted SRv6 EVPN therefore needs to reduce the advertisement of AC SIDs.

The AC SID of a given <ESI,EVI> is not advertised by its PEs until these PEs know that the <ESI,EVI> has failed on at least one of them.

Note that the AC SID for that <ESI,EVI> can be used as the source IP of the SRv6 encapsulation before that AC SID is advertised via EVPN routes. This is because, when a MAC is learnt over that AC SID, packets destined to that MAC can still be forwarded according to the IP prefix of the corresponding ESI SID, owing to the longest-match procedure of the IP lookup.
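The longest-match behaviour described above can be sketched as below. It assumes that each AC SID is allocated inside the prefix of its ESI SID. The prefix, the AC SID, and the next-hop labels are hypothetical.

```python
import ipaddress

def longest_match(dest, routes):
    """Return the next hop of the most specific route covering dest, or None."""
    addr = ipaddress.IPv6Address(dest)
    best = None
    for prefix, nexthop in routes.items():
        net = ipaddress.IPv6Network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, nexthop)
    return best[1] if best else None

ac_sid = "2001:db8:e51:10:0:a1::"                 # hypothetical AC SID
routes = {"2001:db8:e51:10::/64": "esi-prefix"}   # hypothetical ESI SID prefix

print(longest_match(ac_sid, routes))   # AC SID not yet advertised
routes[ac_sid + "/128"] = "ac-sid"     # AC SID advertised after a failure
print(longest_match(ac_sid, routes))
```

Before the /128 route exists, traffic towards a MAC learnt over the AC SID follows the ESI SID prefix; once the AC SID is advertised, the more specific route takes over.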

The detailed AC-SID advertisement procedures will be added in future versions.

5.2. Unequal LB Advertisement

When the ESI SIDs are advertised by EVPN routes for the overlay network according to Section 5.1.2, we can advertise the EVPN Link Bandwidth extended community (see [I-D.ietf-bess-evpn-unequal-lb]) or other information along with the ESI SIDs in those EVPN routes.

Note that this extra information (which is advertised along with the EVPN routes) is visible to the PEs only. The underlay network does not have to be aware of it.

Note that when the EVPN Link Bandwidth extended community is advertised along with the ESI SID, the nexthop of the GEI/ES route should not be set to the anycast ECMP Node SID of the advertising PE (egress PE). On receiving such a GEI/ES route, the ingress PE may push the route's nexthop on top of the End.DX2AGG/End.DX2 SID when constructing the SID stack, if unequal load balancing is required.
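One possible way for an ingress PE to use the advertised link bandwidths is a weighted selection over the per-PE nexthops, sketched below. The hashing scheme, the function names, and the addresses are assumptions made for illustration; neither this document nor [I-D.ietf-bess-evpn-unequal-lb] mandates them.

```python
import zlib

def pick_by_point(point, bandwidth_by_nexthop):
    """Map a point in [0, total bandwidth) to a nexthop, proportionally to bandwidth."""
    for nexthop, bw in sorted(bandwidth_by_nexthop.items()):
        if point < bw:
            return nexthop
        point -= bw
    raise ValueError("point is outside the total bandwidth range")

def pick_nexthop(flow_key, bandwidth_by_nexthop):
    """Hash a flow onto a GEI/ES route nexthop; flows of one key stay on one PE."""
    total = sum(bandwidth_by_nexthop.values())
    if total <= 0:
        raise ValueError("no usable bandwidth advertised")
    return pick_by_point(zlib.crc32(flow_key) % total, bandwidth_by_nexthop)

def sid_stack(nexthop, service_sid):
    """Push the chosen (non-anycast) nexthop on top of the End.DX2AGG/End.DX2 SID."""
    return [nexthop, service_sid]

# Hypothetical per-PE nexthops with relative link bandwidths 3:1.
bandwidth = {"2001:db8:2::1": 3, "2001:db8:3::1": 1}
chosen = pick_nexthop(b"00:11:22:33:44:55->66:77:88:99:aa:bb", bandwidth)
print(sid_stack(chosen, "2001:db8:e51:0:10::"))
```

This only works if each PE advertises a distinct nexthop, which is why the anycast ECMP Node SID is not suitable as the nexthop in this case.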

5.3. EVPN Egress Protection

5.3.1. EVPN Egress Node Protection

There are two methods to achieve EVPN egress node protection:

  • The first method: Both the ESI SID and the AC SID are anycast SIDs, and they are hidden behind the corresponding egress Node SID according to Section 5.1.2. When the egress node fails, the PLR can perform "midpoint protection" for that Node SID. As a result, the destination IP is rewritten to the ESI SID behind that Node SID.

    Note that the ESI SID is an anycast SID, so it will be re-routed by the underlay network after that failure.

    Note that this method requires no special extensions, so it is suitable for more SRv6 devices than the mirror SID approach.

  • The second method: the egress protection procedures of [I-D.wang-bess-evpn-egress-protection], which use an anycast FRR Node SID, can be applied to the GEI/ES route's nexthop to achieve underlay anycast FRR protection.
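The first method can be sketched from the PLR's point of view as follows. This is a simplified reading of midpoint protection: a real PLR also validates the SRH and decides when a failure has occurred. The function name and the addresses are hypothetical.

```python
def midpoint_protect(da, segment_list, segments_left, failed_node_sids):
    """If the active SID belongs to a failed node, skip it: decrement
    Segments Left and rewrite the DA to the next SID in the SRH.
    segment_list is in SRH order (index 0 is the last segment)."""
    if da in failed_node_sids and segments_left > 0:
        segments_left -= 1
        da = segment_list[segments_left]
    return da, segments_left

node_sid = "2001:db8:2::1"          # Node SID of the failed egress PE
esi_sid = "2001:db8:e51:0:10::"     # anycast ESI SID hidden behind it
srh = [esi_sid, node_sid]           # SRH order: last segment first

print(midpoint_protect(node_sid, srh, 1, failed_node_sids={node_sid}))
```

After the rewrite the packet is addressed to the anycast ESI SID, so the underlay can deliver it to another PE that serves the same ES.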

Note that the PLR does not have the unequal load-balancing information, so neither of these two methods can meet the unequal load-balancing requirements after the failure. This is nevertheless the best achievable result unless the unequal load-balancing information can be advertised via the IGP.

The details will be added in future versions, but the procedures for the synced MAC entry in [Section 3.2, Paragraph 5, Item 1] will be helpful.

5.4. C-MAC Flush Notification Procedure

The withdrawal of a GEI advertisement can be used as a C-MAC flush notification, as is done in [RFC8317] and [I-D.ietf-bess-pbb-evpn-isid-cmacflush].

Note that even if the GEI/EVI routes of Section 5.1 are not advertised, the withdrawal of those GEI/EVI routes can still be used as a C-MAC flush notification for their <ESI,EVI>.
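A receiving PE's reaction to such a withdrawal might look like the sketch below. The MAC table layout (keyed by EVI and C-MAC, storing the GEI over which the C-MAC was learnt) and the identifiers are assumptions for illustration.

```python
def flush_on_gei_withdraw(mac_table, evi, gei):
    """Remove every C-MAC of the given EVI that was learnt against the
    withdrawn GEI; return the number of flushed entries."""
    stale = [key for key, learnt_gei in mac_table.items()
             if key[0] == evi and learnt_gei == gei]
    for key in stale:
        del mac_table[key]
    return len(stale)

# Hypothetical table: (EVI, C-MAC) -> GEI the C-MAC was learnt against.
mac_table = {
    (100, "00:11:22:33:44:55"): "2001:db8:e51:0:10::",
    (100, "00:11:22:33:44:66"): "2001:db8:e51:0:20::",
    (200, "00:11:22:33:44:77"): "2001:db8:e51:0:10::",
}
print(flush_on_gei_withdraw(mac_table, evi=100, gei="2001:db8:e51:0:10::"))
print(sorted(mac_table))
```

Only the C-MACs of the affected <ESI,EVI> are removed; entries of other EVIs on the same ES stay in place.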

5.5. E-Tree Support Considerations

The E-Tree support extensions are similar to [RFC8317] Section 5, except for the following notable differences: the leaf B-MACs are replaced by leaf GEIs, and the root B-MACs are replaced by root GEIs; the PBB encapsulation is replaced by other encapsulations; the B-component is replaced by an IP-VRF or the underlay GRT; and the B-MAC Advertisement route is replaced by the GEI/EVI route or ESI/IP route.
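By analogy with the source-B-MAC-based egress filtering of [RFC8317] Section 5, an egress PE could filter on the source GEI, as in the following sketch. The set of leaf GEIs and the function name are hypothetical, and this is only one reading of how the substitution above might be applied.

```python
def etree_egress_allow(source_gei, leaf_geis, local_ac_is_leaf):
    """Block leaf-to-leaf traffic: a frame whose source GEI is a leaf GEI
    must not be forwarded to a local leaf AC."""
    return not (source_gei in leaf_geis and local_ac_is_leaf)

leaf_geis = {"2001:db8:e51:0:20::"}   # hypothetical leaf GEI
root_gei = "2001:db8:e51:0:10::"      # hypothetical root GEI

print(etree_egress_allow("2001:db8:e51:0:20::", leaf_geis, local_ac_is_leaf=True))
print(etree_egress_allow(root_gei, leaf_geis, local_ac_is_leaf=True))
```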

5.6. EVPN IRB Support Considerations

The data plane in this draft is no more complex than that of typical SRv6 EVPN, so it should work as efficiently as expected in the SRv6 EVPN IRB use case.

5.7. Use AC SID in MAC/IP Advertisement Routes

The AC SID can also be used in MAC/IP Advertisement routes, even if C-MAC overload is not a real threat. Doing so unifies the data plane across these use cases.

Note that the AC SID is also a typical End.DX2 SID.

6. Comparison with Other Solutions

We briefly compared light-weighted SRv6 EVPN with the PBB-VPLS, PBB-EVPN and VXLAN solutions in [Revision-01]; further brief comparisons with VTEP Group (and its transplantation into SRv6 networks) were described in [Revision-02]. So in this revision we only add the detailed comparisons between EVPN-lite SRv6 and PBB EVPN over SRv6.

6.1. Detailed Comparisons with PBB EVPN over SRv6

I think the "PBB EVPN over SRv6 underlay" solution will be complex if we try to address too many things. Some examples follow:

  • The upper-layer header for SRv6 is the PBB header for B-MACs, not the Ethernet header for C-MACs, so the SID list (SR path or network programming instructions) in the SRH can't be constructed for the sake of the I-Component. For example, when an SRv6 SID for MAC-guarding (or something else, just an example) is present in the SRH for PBB EVPN over SRv6, I think it means B-MAC guarding, not C-MAC guarding.

  • The B-MACs for the all-active ESIs can't be aggregated, but the SRv6 SIDs for ESIs can be aggregated. The underlay can advertise the aggregated prefixes only, so the burden on the underlay network may not increase much. Even when the underlay routes are aggregated, the C-MACs can still be learnt against a /128 source IP. This is an advantage of light-weighted SRv6 EVPN that can't be gained from a PBB header.

  • The B-MACs are for overlay protection (the real overlay is the I-VPLS, but the B-VPLS is also an overlay network from the viewpoint of the SRv6 network). But the SRv6 SIDs for ESIs are for underlay protection, which works like egress protection. These are two different types of solutions.

  • Although PBB EVPN can be transplanted into SRv6 networks along with the PBB header, it seems more complicated to me. Take the EVPN IRB use cases for example: they require seven steps of header processing, (SRv6/B-MAC/C-MAC)(Inner-IP)(C-MAC/B-MAC/SRv6), during the overlay L3 forwarding. I think that will be hard for some ASICs to implement. When the processing is simplified to (SRv6/C-MAC)(Inner-IP)(C-MAC/SRv6), it sounds like a step forward, not backward, IMHO. We can achieve this goal easily inside the EVPN framework, provided that data-plane learning can still be considered an option after PBB EVPN.

Fortunately, SRv6 is still too young to have had a transplantation of PBB EVPN, so nothing is wasted if the SRv6 nodes give up the PBB header, which these SRv6 nodes have never used. Note that the SRv6 functions for L2VPNs (End.DT2U and End.DT2M) have long supported source-IP-based data-plane learning.

In the EVPN IRB use case, [I-D.ietf-bess-evpn-irb-extended-mobility] defines some optional extensions to support some specific IRB use cases, in which the <MAC,IP> bindings change across VM moves. These extensions can't be applied to PBB EVPNs, and they can't be applied to light-weighted EVPNs either. This does not prevent PBB EVPNs and light-weighted EVPNs from supporting typical IRB use cases.

6.2. Detailed Comparisons with Anycast Node SID

The "Anycast Node SID" solution here is the transplantation of the Anycast-VTEP-IP solution into the SRv6 data plane, where the Anycast Node SID is the equivalent of the Anycast VTEP IP address. Note that the SRv6 Anycast Node SID is the ultimate aggregation of ESI indicators. The detailed comparisons will be added in future versions.

7. Security Considerations

Security considerations will be added in future versions.

8. IANA Considerations

8.1. End.DX2AGG SID

IANA is requested to allocate a new code point for the new SRv6 Endpoint Behavior defined in this document.


+------+-------------+---------------+
| Type | Description | Reference     |
+------+-------------+---------------+
| TBD1 | End.DX2AGG  | This Document |
+------+-------------+---------------+

Figure 4: End.DX2AGG

9. Acknowledgements

The authors would like to thank the following for their comments and review of this document:

Ye Shu.

10. Normative References

[I-D.ietf-bess-evpn-unequal-lb]
Malhotra, N., Sajassi, A., Rabadan, J., Drake, J., Lingala, A., and S. Thoria, "Weighted Multi-Path Procedures for EVPN All-Active Multi-Homing", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-unequal-lb-07, <https://tools.ietf.org/html/draft-ietf-bess-evpn-unequal-lb-07>.
[I-D.ietf-bess-mvpn-evpn-aggregation-label]
Zhang, Z., Rosen, E., Lin, W., Li, Z., and I. Wijnands, "MVPN/EVPN Tunnel Aggregation with Common Labels", Work in Progress, Internet-Draft, draft-ietf-bess-mvpn-evpn-aggregation-label-05, <https://tools.ietf.org/html/draft-ietf-bess-mvpn-evpn-aggregation-label-05>.
[I-D.ietf-bess-srv6-services]
Dawra, G., Filsfils, C., Talaulikar, K., Raszuk, R., Decraene, B., Zhuang, S., and J. Rabadan, "SRv6 BGP based Overlay services", Work in Progress, Internet-Draft, draft-ietf-bess-srv6-services-05, <https://tools.ietf.org/html/draft-ietf-bess-srv6-services-05>.
[I-D.ietf-spring-srv6-network-programming]
Filsfils, C., Camarillo, P., Leddy, J., Voyer, D., Matsushima, S., and Z. Li, "SRv6 Network Programming", Work in Progress, Internet-Draft, draft-ietf-spring-srv6-network-programming-28, <https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming-28>.
[I-D.sajassi-bess-evpn-ac-aware-bundling]
Sajassi, A., Mishra, M., Thoria, S., Brissette, P., Rabadan, J., and J. Drake, "AC-Aware Bundling Service Interface in EVPN", Work in Progress, Internet-Draft, draft-sajassi-bess-evpn-ac-aware-bundling-02, <https://tools.ietf.org/html/draft-sajassi-bess-evpn-ac-aware-bundling-02>.
[RFC4448]
Martini, L., Ed., Rosen, E., El-Aawar, N., and G. Heron, "Encapsulation Methods for Transport of Ethernet over MPLS Networks", RFC 4448, DOI 10.17487/RFC4448, <https://www.rfc-editor.org/info/rfc4448>.
[RFC7348]
Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, L., Sridhar, T., Bursell, M., and C. Wright, "Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", RFC 7348, DOI 10.17487/RFC7348, <https://www.rfc-editor.org/info/rfc7348>.
[RFC7432]
Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, <https://www.rfc-editor.org/info/rfc7432>.
[RFC7623]
Sajassi, A., Ed., Salam, S., Bitar, N., Isaac, A., and W. Henderickx, "Provider Backbone Bridging Combined with Ethernet VPN (PBB-EVPN)", RFC 7623, DOI 10.17487/RFC7623, <https://www.rfc-editor.org/info/rfc7623>.
[RFC8317]
Sajassi, A., Ed., Salam, S., Drake, J., Uttaro, J., Boutros, S., and J. Rabadan, "Ethernet-Tree (E-Tree) Support in Ethernet VPN (EVPN) and Provider Backbone Bridging EVPN (PBB-EVPN)", RFC 8317, DOI 10.17487/RFC8317, <https://www.rfc-editor.org/info/rfc8317>.
[RFC8365]
Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., Uttaro, J., and W. Henderickx, "A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, DOI 10.17487/RFC8365, <https://www.rfc-editor.org/info/rfc8365>.

11. Informative References

[I-D.eastlake-bess-evpn-vxlan-bypass-vtep]
Eastlake, D., Li, Z., Zhuang, S., and R. White, "EVPN VXLAN Bypass VTEP", Work in Progress, Internet-Draft, draft-eastlake-bess-evpn-vxlan-bypass-vtep-07, <https://tools.ietf.org/html/draft-eastlake-bess-evpn-vxlan-bypass-vtep-07>.
[I-D.ietf-bess-evpn-irb-extended-mobility]
Malhotra, N., Sajassi, A., Pattekar, A., Lingala, A., Rabadan, J., and J. Drake, "Extended Mobility Procedures for EVPN-IRB", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-irb-extended-mobility-04, <https://tools.ietf.org/html/draft-ietf-bess-evpn-irb-extended-mobility-04>.
[I-D.ietf-bess-pbb-evpn-isid-cmacflush]
Rabadan, J., Sathappan, S., Nagaraj, K., Miyake, M., and T. Matsuda, "PBB-EVPN ISID-based CMAC-Flush", Work in Progress, Internet-Draft, draft-ietf-bess-pbb-evpn-isid-cmacflush-01, <https://tools.ietf.org/html/draft-ietf-bess-pbb-evpn-isid-cmacflush-01>.
[I-D.wang-bess-evpn-context-label-02]
Wang, Y., "'SR-MPLS signalling for CSL-based Context VC' in I-D.wang-bess-evpn-context-label-02", <https://tools.ietf.org/html/draft-wang-bess-evpn-context-label-02#section-4.2>.
[I-D.wang-bess-evpn-egress-protection]
Wang, Y. and R. Chen, "EVPN Egress Protection", Work in Progress, Internet-Draft, draft-wang-bess-evpn-egress-protection-04, <https://tools.ietf.org/html/draft-wang-bess-evpn-egress-protection-04>.
[Revision-01]
"Revision-01 of this draft", <https://tools.ietf.org/html/draft-wang-bess-evpn-cmac-overload-reduction-01>.
[Revision-02]
"Revision-02 of this draft", <https://tools.ietf.org/html/draft-wang-bess-evpn-cmac-overload-reduction-02>.
[RFC7041]
Balus, F., Ed., Sajassi, A., Ed., and N. Bitar, Ed., "Extensions to the Virtual Private LAN Service (VPLS) Provider Edge (PE) Model for Provider Backbone Bridging", RFC 7041, DOI 10.17487/RFC7041, <https://www.rfc-editor.org/info/rfc7041>.

Authors' Addresses

Yubao Wang
ZTE Corporation
No.68 of Zijinghua Road, Yuhuatai District
Nanjing
China
Ran Chen
ZTE Corporation
No. 50 Software Ave, Yuhuatai District
Nanjing
China