Internet-Draft IP-Aliasing-no-IRB September 2021
Wang & Zhang Expires 5 March 2022 [Page]
Workgroup:
BESS WG
Published:
Intended Status:
Standards Track
Expires:
Authors:
Y. Wang
ZTE Corporation
Z. Zhang
ZTE Corporation

ARP/ND Synching And IP Aliasing without IRB

Abstract

This draft discusses several signalling modes of EVPN Signalled L3VPNs, which are used to improve L3VPNs for some new use cases. It then discusses which style of RT-5 route can be selected for these new use cases, and why.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 5 March 2022.


1. Introduction

This draft discusses several signalling modes of EVPN Signalled L3VPNs. They are:

  • Adjacencies Discovery modes: spreadable mode and non-spreadable mode.

  • CE-Prefix Auto-discovery modes: Centralized mode and Distributed mode.

  • Styles of RT-5 routes: RT-5L (L-Style), RT-5G (G-Style), RT-5E (E-Style), RT-5M (M-Style).

These signalling modes can help to improve L3VPNs for some new use cases. This draft then discusses which style of RT-5 route should be selected for each new use case, and why.

1.1. Terminology and Acronyms

Most of the acronyms and terms used in this document come from [RFC7432], [I-D.sajassi-bess-evpn-ip-aliasing] and [I-D.wang-bess-evpn-ether-tag-id-usage], except for the following:

* VRF AC -

An Attachment Circuit (AC) that attaches a CE to an IP-VRF but is not an IRB interface.

* VRF Interface -

An IRB interface, a VRF AC, or an IRC interface. Note that a VRF interface is bound to the routing space of an IP-VRF.

* L3 EVI -

An EVPN instance spanning the Provider Edge (PE) devices participating in that EVPN, which contains VRF ACs and may also contain IRB interfaces or IRC interfaces.

* IP-AD/EVI -

Ethernet Auto-Discovery route per EVI, where the EVI is an IP-VRF. Note that the Ethernet Tag ID of an IP-AD/EVI route may be non-zero.

* IP-AD/ES -

Ethernet Auto-Discovery route per ES, where the EVI of one of its route targets is an IP-VRF.

* RMAC -

Router's MAC, which is signaled in the Router's MAC extended community.

* ESI Overlay Index -

ESI as overlay index.

* ET-ID -

Ethernet Tag ID, also called ETI for short in this document.

* RT-2R -

When a MAC/IP Advertisement Route with a non-zero ESI is used for IP-VRF forwarding, it is called an RT-2R in this draft. When it is used for MAC-VRF forwarding, it is not called an RT-2R.

* RT-5E -

An EVPN Prefix Advertisement Route with a non-reserved ESI as its overlay index (the E-style RT-5).

* IRC -

Integrated Routing and Cross-connecting; an IRC interface is the virtual interface connecting an IP-VRF and an EVPN VPWS.

* CE-BGP -

The BGP session between a PE and a CE. Note that a CE-BGP route does not carry an RD or a Route Target.

* CE-Prefix -

An IP prefix behind a CE is called that CE's CE-Prefix.

* RT-5G -

An EVPN Prefix Advertisement Route with a zero ESI and a non-zero GW-IP (the G-style RT-5).

* RT-5L -

An EVPN Prefix Advertisement Route with both zero ESI and zero GW-IP, but a non-zero EVPN Label (the L-style RT-5).

* RT-5M -

An EVPN Prefix Advertisement Route with zero ESI, zero GW-IP and zero EVPN Label, but a non-zero Router's MAC (the M-style RT-5).

* EVC -

Ethernet Virtual Connection, which is typically constructed per <Port, VLAN> basis.

* Internal Remote PE:
When PEx is called an EVPN route ERy's internal remote PE, this means that PEx is on the ES identified by ERy's ESI field. When ERy's SOI is non-zero, it also means that PEx is attached to the Ethernet tag identified by <ESI, SOI>.
* External Remote PE:
When PEx is called an EVPN route ERy's external remote PE, this means that PEx is not on the ES identified by ERy's ESI field. When ERy's SOI is non-zero, PEx may also be a PE that is not attached to the Ethernet tag identified by <ESI, SOI>.
* CE-Prefix:
When an IP prefix can be reached from PEy through CEx, that IP prefix is called PEy's CE-prefix behind CEx in this draft, or PEy's CE-prefix for short.
* Common CE-Prefix:
When a CE-Prefix can be reached from PEy through either CEy or CEz, it is called a common CE-Prefix of CEy and CEz from the viewpoint of PEy.
* Exclusive CE-Prefix:
When a CE-Prefix of PEy can be reached through CEy but not through any other CE of PEy, it is called an exclusive CE-Prefix of CEy from the viewpoint of PEy.
* SNGW:
Subnet-specific Gateway IP address. The SNGW of a subnet is the IP address that the hosts of that subnet use as the next hop of their default route.
* Intermediate subnet:
The subnet that connects a PE and a CE of an L3 EVI.
* Intermediate SNGW:
The SNGW of an intermediate subnet. In this draft it is the IP address of an IRC interface.
* Intermediate nexthop:
The CE's IP address in the intermediate subnet.
* Overlay nexthop:
The CE-Prefix's next-hop IP address, which is in the address space of the L3 EVI.
* Original Overlay nexthop:
The overlay nexthop that is advertised by the CE through a PE-CE routing protocol.
* L3EVI-Specific EADR -

When the <ESI, L3EVI> uses L3EVI-Specific Ethernet Auto-discovery mode, the only Ethernet A-D per EVI route of that <ESI, L3EVI> (which will be <ESI, ET-ID=0>) is called an L3EVI-Specific EADR in this draft.

* ACI-Specific EADR -

When the <ESI, SOI> uses ACI-Specific Ethernet Auto-discovery mode, the Ethernet A-D per EVI routes of that <ESI, SOI> are called as ACI-Specific EADRs in this draft.
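The remote-PE and CE-prefix definitions above are essentially set-membership tests. The following is a minimal Python sketch of them under simplifying assumptions: ESIs are plain strings, an SOI of 0 means "not used", and the names (`EvpnRoute`, `Pe`, `is_internal_remote_pe`, `split_ce_prefixes`) are hypothetical, not taken from any specification or implementation.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class EvpnRoute:
    esi: str      # ESI field of the EVPN route ERy (hypothetical string form)
    soi: int = 0  # Supplementary Overlay Index; 0 means "not used"


@dataclass
class Pe:
    name: str
    es_set: Set[str]            # ESes this PE is on
    tags: Set[Tuple[str, int]]  # <ESI, SOI> Ethernet tags this PE is attached to


def is_internal_remote_pe(pe: Pe, route: EvpnRoute) -> bool:
    """PEx is ERy's internal remote PE if it is on ERy's ES and, when ERy's
    SOI is non-zero, also attached to the <ESI, SOI> Ethernet tag."""
    if route.esi not in pe.es_set:
        return False
    return route.soi == 0 or (route.esi, route.soi) in pe.tags


def is_external_remote_pe(pe: Pe, route: EvpnRoute) -> bool:
    # In this sketch the two definitions are complementary.
    return not is_internal_remote_pe(pe, route)


def split_ce_prefixes(by_ce: Dict[str, Set[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """From one PE's viewpoint, split CE-prefixes into common prefixes
    (reachable through more than one CE) and per-CE exclusive prefixes."""
    count: Dict[str, int] = {}
    for prefixes in by_ce.values():
        for p in prefixes:
            count[p] = count.get(p, 0) + 1
    common = {p for p, n in count.items() if n > 1}
    exclusive = {ce: prefixes - common for ce, prefixes in by_ce.items()}
    return common, exclusive


pe2 = Pe("PE2", {"ESI21"}, {("ESI21", 1)})
print(is_internal_remote_pe(pe2, EvpnRoute("ESI21", 1)))  # True
print(is_internal_remote_pe(pe2, EvpnRoute("ESI21", 2)))  # False
common, exclusive = split_ce_prefixes({"R1": {"Prefix1", "Prefix3"}, "R2": {"Prefix2", "Prefix3"}})
print(sorted(common))  # ['Prefix3']
```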

2. Service Interfaces of L3 EVIs

Service interface describes how an ES is attached to an L3EVI. [I-D.wang-bess-evpn-ether-tag-id-usage] discusses the following service interfaces:

  • Mono VLAN-based Service Interface.

  • Multiple VLAN-based Service Interface.

  • Separated Risk VLAN-bundle Service Interface.

  • Shared Risk VLAN-bundle Service Interface.

  • IRC Service Interface.

Different service interfaces require different control-plane procedures, so this draft discusses the advertisement of RT-5 routes for each service interface, especially RT-5 routes with ESI or GW-IP as the overlay index.

Note that an ES may be attached to different L3EVIs via different VLANs, and multiple ESes can be attached to the same L3EVI. So a service interface is both ESI-specific and EVI-specific: when ES1 uses a VLAN-bundle Service Interface for EVI1, it may use a Mono VLAN-based Service Interface for EVI2. Thus service interfaces of L3EVIs are <ESI, EVI>-specific in this draft.

3. IP Discovery Mode

IP discovery in L3EVIs includes ARP/ND adjacency discovery and CE-prefix discovery. Adjacency discovery is done in a distributed manner by each VRF AC using ARP/ND, but CE-prefixes can be discovered in two ways.

3.1. Adjacencies Discovery

3.1.1. Spreadable Mode (Faraway Mode)

In Spreadable Mode, an adjacent MAC/IP can be imported into IP-VRF by both external remote PEs and internal remote PEs.

In this mode, the RT-2 route of the MAC/IP is used to synchronize adjacency information (<MAC, IP> mapping) to internal remote PEs, and it is also used to advertise a host route to external remote PEs.

When the CEs are hosts, this mode greatly increases the number of EVPN routes.

The spreadable mode is also called Faraway Mode because the external remote PEs of the MAC/IP entries can import the RT-2 routes of these MAC/IP entries into their IP-VRFs.

The spreadable mode can be used to avoid making a detour when there is a straightforward path. This mode is typically used in EVPN IRB scenarios, where different hosts of the same BD may be reached through different ESes. Another example of spreadable mode can be found in Section 2.1.2 of [I-D.wz-bess-evpn-vpws-as-vrf-ac], where the CEs are a few routers.

3.1.2. Non-Spreadable Mode (Nearby Mode)

In Non-Spreadable Mode, an adjacent MAC/IP can only be imported into IP-VRF by internal remote PEs. In other words, the MAC/IP cannot be imported into IP-VRF by its external remote PEs.

In this mode, the RT-2 route of the MAC/IP is used only to synchronize adjacency information (<MAC, IP> mapping) to internal remote PEs.

In non-spreadable mode, it should be ensured that only the internal remote PEs of the MAC/IP entry can import the RT-2 route of that entry into IP-VRF. Thus that RT-2 route should carry only the EVI-RT and the ES-Import RT, which is why non-spreadable mode is also called nearby mode.
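As a rough illustration of the difference between the two adjacency discovery modes, the sketch below decides which remote PEs import an RT-2 route into their IP-VRF. It assumes the route's SOI is zero, so that "internal" simply means "attached to the route's ESI"; the enum and function names are hypothetical, and route-target handling is not modelled.

```python
from enum import Enum
from typing import Dict, List, Set


class AdjacencyMode(Enum):
    SPREADABLE = "faraway"     # Section 3.1.1
    NON_SPREADABLE = "nearby"  # Section 3.1.2


def ip_vrf_importers(rt2_esi: str, remote_pes: Dict[str, Set[str]],
                     mode: AdjacencyMode) -> List[str]:
    """Return the remote PEs that import an RT-2 (MAC/IP) route into IP-VRF.

    remote_pes maps a PE name to the set of ESIs that PE is attached to.
    A PE on the route's ES is an internal remote PE; any other PE is external.
    """
    importers = []
    for pe, es_set in sorted(remote_pes.items()):
        internal = rt2_esi in es_set
        if internal or mode is AdjacencyMode.SPREADABLE:
            importers.append(pe)
    return importers


remote = {"PE2": {"ESI21"}, "PE3": set()}
print(ip_vrf_importers("ESI21", remote, AdjacencyMode.SPREADABLE))      # ['PE2', 'PE3']
print(ip_vrf_importers("ESI21", remote, AdjacencyMode.NON_SPREADABLE))  # ['PE2']
```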

An example of non-spreadable mode can be found in Section 2.1.1 of [I-D.wz-bess-evpn-vpws-as-vrf-ac], where the CEs are many hosts, all of which can be reached through the same VPWS service instance.

3.2. CE-Prefixes Auto-Discovery Modes

There are two ways to discover the IP prefixes behind a CE (called CE-prefixes for short): distributed A-D mode and centralized A-D mode.

3.2.1. Distributed A-D Mode

The CE-Prefixes inside a DC are discovered by each NVE separately. Then these NVEs advertise their CE-prefixes to DC Gateways and other NVEs of that DC.

Note that external prefixes (received from other DCs) are discovered by DC gateways even in distributed A-D mode. Distributed A-D mode and centralized A-D mode only concern how the CE-prefixes inside the DC are discovered.

3.2.2. Centralized A-D Mode

The CE-Prefixes (behind each NVE) are discovered by the same group of DC Gateways. Then these DC Gateways advertise these CE-prefixes to NVEs.

Regardless of the A-D mode, distributed forwarding behavior is expected in this draft. That is, traffic between two subnets behind two NVEs inside the same DC should not be required to pass through the DC Gateway.

4. Styles of RT-5 Routes

When an RT-5 route is used to forward a data packet, the label/VNI/SID in that packet's EVPN header may be obtained based on one of four different fields of that RT-5.

In other words, RT-5 routes can be classified into four styles, called L-style, G-style, E-style and M-style respectively.

These styles have different usages and are suitable for different scenarios.
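The four styles can be told apart from the RT-5 fields alone, following the RT-5L/RT-5G/RT-5E/RT-5M definitions in Section 1.1. A minimal sketch, assuming ESIs are written as colon-separated hex strings and GW-IPs as IPv4 strings; `Rt5` and `rt5_style` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

ZERO_ESI = "00:00:00:00:00:00:00:00:00:00"
MAX_ESI = "ff:ff:ff:ff:ff:ff:ff:ff:ff:ff"  # also a reserved ESI value
ZERO_IP = "0.0.0.0"                        # IPv4-only assumption


@dataclass(frozen=True)
class Rt5:
    esi: str = ZERO_ESI
    gw_ip: str = ZERO_IP
    label: int = 0
    rmac: Optional[str] = None  # Router's MAC extended community, if carried


def rt5_style(r: Rt5) -> str:
    """Return 'E', 'G', 'L' or 'M' for an RT-5, or raise for invalid combinations."""
    if r.esi == MAX_ESI:
        raise ValueError("reserved ESI cannot be an overlay index")
    if r.esi != ZERO_ESI:
        if r.gw_ip != ZERO_IP:
            raise ValueError("GW-IP MUST be zero when ESI is the overlay index")
        return "E"
    if r.gw_ip != ZERO_IP:
        return "G"  # ESI is zero, GW-IP is the overlay index
    if r.label != 0:
        return "L"  # no overlay index, own label is used
    if r.rmac is not None:
        return "M"  # only the Router's MAC is left
    raise ValueError("no overlay index, label or RMAC")


print(rt5_style(Rt5(label=1001)))         # L
print(rt5_style(Rt5(gw_ip="10.0.0.2")))   # G
```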

4.1. L-Style: no Overlay Index

When an L-style RT-5 is used to forward a data packet, the label/VNI/SID of that packet's EVPN header is obtained from the RT-5's own MPLS Label field (of its NLRI), which is why it is called L-Style, and the forwarding path is determined by its own underlay next hop (BGP next hop).

An L-style RT-5 route is also called an RT-5L in this draft.

Note that the ESI and GW-IP fields of an RT-5L are both zero; otherwise the route is considered to be of another style.
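A sketch of RT-5L forwarding: a longest-prefix match over RT-5L routes, where the label comes from the matching route's own Label field and the path from its BGP next hop. The `Rt5L` type, the label values and the next-hop names are hypothetical; the 3.3.3.3/32 entry mirrors the centralized loopback of Section 5.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rt5L:
    prefix: str        # IP prefix from the RT-5 NLRI
    label: int         # the RT-5's own MPLS Label field (label/VNI/SID)
    bgp_next_hop: str  # underlay next hop that determines the path


def forward_via_rt5l(dst: str, table: List[Rt5L]) -> Optional[Tuple[str, int]]:
    """Longest-prefix match over RT-5L routes; return (underlay next hop, label)."""
    addr = ipaddress.ip_address(dst)
    best: Optional[Rt5L] = None
    best_len = -1
    for route in table:
        net = ipaddress.ip_network(route.prefix)
        if addr in net and net.prefixlen > best_len:
            best, best_len = route, net.prefixlen
    if best is None:
        return None
    return best.bgp_next_hop, best.label


table = [Rt5L("3.3.3.3/32", 1001, "DGW1"), Rt5L("0.0.0.0/0", 2002, "DGW1'")]
print(forward_via_rt5l("3.3.3.3", table))  # ('DGW1', 1001)
```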

4.1.1. RT-5L Advertisement in Distributed A-D Mode

When N1/N2 establish CE-BGP sessions with both PE1 and PE2, it is enough for PE1/PE2 to advertise RT-5L routes to DGW1. There is no need for PE1/PE2 to advertise RT-5G or RT-5E routes in that use case.

Note that when N1/N2 establish CE-BGP sessions with both PE1 and PE2, the downlink VRF-interface addresses on PE1 and PE2 may be different IP addresses of the same subnet. Alternatively, loopback interfaces may be used to establish the CE-BGP sessions.

4.1.2. RT-5L Advertisement in Centralized A-D Mode

When a PE advertises RT-5Ls only for its own direct subnets, this can be used in both distributed A-D mode and centralized A-D mode. When a PE advertises RT-5Ls for CE-prefixes, this cannot be used in centralized A-D mode, otherwise the data forwarding would be centralized too: in centralized A-D mode the CE-prefixes are advertised by the DC Gateways, and an RT-5L would make the advertising gateway the forwarding next hop. When a PE (which is a DC Gateway) advertises RT-5Ls for external prefixes (which are received from other DCs or non-EVPN neighbors), this can be used in either centralized A-D mode or distributed A-D mode.

4.2. G-Style: GW-IP as Overlay Index

When a G-style RT-5 is used to forward a data packet, the label/VNI/SID of that packet's EVPN header is obtained from another EVPN route whose IP field (of its NLRI) matches this RT-5 route's own GW-IP field (of its NLRI), which is why it is called G-Style, and the forwarding path is determined by that EVPN route.

A G-style RT-5 route is also called an RT-5G in this draft.

RT-5G can be used whether the CE-prefix A-D mode is centralized or distributed, and whether the Service Interface is Mono VLAN-based or Multiple VLAN-based. It can therefore be a uniform approach to advertising CE-prefixes regardless of the EVPN mode.

Note that the ESI field of a RT-5G route MUST be zero as per [I-D.ietf-bess-evpn-prefix-advertisement].
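The one-level recursive lookup behind an RT-5G can be sketched as follows, assuming the matching EVPN route (e.g. an RT-2R) is indexed by its IP field. Addresses use the shorthand of the figures (e.g. 10.2); the types, labels and names are hypothetical, and any further ESI-based aliasing of the matching route is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Rt5G:
    prefix: str
    gw_ip: str  # overlay index; the ESI of an RT-5G MUST be zero


@dataclass(frozen=True)
class OverlayRoute:
    """Another EVPN route whose NLRI IP field may match a GW-IP, e.g. an RT-2R."""
    ip: str
    label: int
    bgp_next_hop: str


def resolve_rt5g(route: Rt5G, by_ip: Dict[str, OverlayRoute]) -> Optional[Tuple[str, int]]:
    """Return (underlay next hop, label) from the route matching the GW-IP,
    or None while no such route exists (the RT-5G stays unresolved)."""
    target = by_ip.get(route.gw_ip)
    if target is None:
        return None
    return target.bgp_next_hop, target.label


by_ip = {"10.2": OverlayRoute("10.2", 501, "PE1")}
print(resolve_rt5g(Rt5G("Prefix1", "10.2"), by_ip))     # ('PE1', 501)
print(resolve_rt5g(Rt5G("Prefix3", "7.7.7.7"), by_ip))  # None
```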

4.2.1. RT-5G Advertisement in Distributed A-D Mode

RT-5G advertisement in distributed A-D mode follows section 2.3.2 and section 6.2 of [I-D.wz-bess-evpn-vpws-as-vrf-ac]. Note that these procedures can be used with every L3EVI Service Interface, not just the IRC Service Interface.

4.2.2. RT-5G Advertisement in Centralized A-D Mode

When a PE (which may be a DC gateway) learns that CE-prefix prefix1's overlay next hop is IP1, it advertises an RT-5G for prefix1 with the GW-IP set to IP1.

An example of RT-5G advertisement in centralized A-D mode can be found in Section 5.

4.3. E-Style: ESI as Overlay Index

When an E-style RT-5 is used to forward a data packet, the label/VNI/SID of that packet's EVPN header is obtained from another RT-1 route whose ESI and Ethernet Tag ID match this RT-5 route's ESI (which is why it is called E-Style) and Supplementary Overlay Index (Section 3.3 of [I-D.wang-bess-evpn-ether-tag-id-usage] and Section 6.3.3 of [I-D.wz-bess-evpn-vpws-as-vrf-ac]), and the forwarding path is determined by that RT-1 route.

An E-style RT-5 route is also called an RT-5E in this draft.

RT-5E can be used to advertise CE-prefixes only when the CE-prefix A-D mode is distributed. RT-5E can be used with a Mono VLAN-based Service Interface, but when RT-5E is used with a Multiple VLAN-based Service Interface or a Separated Risk VLAN-bundle Service Interface, the ACI-specific Ethernet auto-discovery of [I-D.wang-bess-evpn-ether-tag-id-usage] should be followed.

Note that the GW-IP field of a RT-5E route MUST be zero as per [I-D.ietf-bess-evpn-prefix-advertisement].
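A minimal sketch of E-style resolution, assuming all RT-1 per EVI routes of the IP-VRF are kept in a list. Each RT-1 whose <ESI, Ethernet Tag ID> equals the RT-5E's <ESI, SOI> contributes one path, so two PEs on the same ES yield two load-balanced paths. Type names and label values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rt5E:
    prefix: str
    esi: str
    soi: int = 0  # Supplementary Overlay Index, matched against the RT-1's Ethernet Tag ID


@dataclass(frozen=True)
class Rt1PerEvi:
    esi: str
    eth_tag: int
    label: int
    bgp_next_hop: str


def resolve_rt5e(route: Rt5E, rt1s: List[Rt1PerEvi]) -> List[Tuple[str, int]]:
    """Collect one (next hop, label) path per RT-1 per EVI whose
    <ESI, Ethernet Tag ID> equals the RT-5E's <ESI, SOI>."""
    return sorted((r.bgp_next_hop, r.label) for r in rt1s
                  if r.esi == route.esi and r.eth_tag == route.soi)


rt1s = [Rt1PerEvi("ESI21", 1, 101, "PE1"),
        Rt1PerEvi("ESI21", 1, 201, "PE2"),
        Rt1PerEvi("ESI21", 2, 202, "PE2")]
print(resolve_rt5e(Rt5E("Prefix1", "ESI21", 1), rt1s))  # [('PE1', 101), ('PE2', 201)]
```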

4.3.1. RT-5E in Bump-in-the-wire use case

The RT-5 route that specifies an ESI as overlay index was first defined in Section 4.3 of [I-D.ietf-bess-evpn-prefix-advertisement], which also defines the Bump-in-the-wire use case (the former RT-5E usage).

It is further discussed in Section 2.4 and Section 3.6.4 of [I-D.wang-bess-evpn-ether-tag-id-usage]. The RT-5E routes of Section 6 of revision-02 [I-D.wang-bess-evpn-arp-nd-synch-without-irb-02] and Section 1.3 of [I-D.sajassi-bess-evpn-ip-aliasing] (the latter RT-5E usage) differ from the RT-5E routes of the Bump-in-the-wire use case in the following respects:

  • Source MAC - In the former usage, the Ethernet header cannot be omitted even if the data plane is MPLS, and the source MAC MUST be set to the MAC address of the IRB interface of BD-10. In the latter usage, the Ethernet header can be omitted if the data plane is MPLS.

  • Recursive Resolution - In the former usage, recursive resolution is done in the context of a BD, but in the latter usage it is done in the context of an IP-VRF.

  • EVPN label - In the former usage, the EVPN label of the corresponding RT-1 per EVI route is an MPLS label that identifies a BD, but in the latter usage it is an MPLS label that identifies an IP-VRF.

  • ESI - In the former usage, the ESI is attached to a BD, but in the latter usage ESIs are attached to IP-VRFs.

The Bump-in-the-wire use case is a special form of the EVPN IRB use case, which is why it differs from the non-IRB use cases.

4.3.2. RT-5E Advertisement on Distributed L3 GW

PE1/PE2 (see Figure 1) can install a synced ARP entry on the proper VRF interface thanks to the RT-2 route of Section 3.1, so it is not necessary for PE1/PE2 to advertise per-host IP prefixes to remote PEs (e.g. PE3) using RT-2 routes. Instead, it is recommended that PE1/PE2 advertise one RT-5E route per subnet to PE3. The ESI of these RT-5E routes can be set to the ESI of the corresponding VRF interface. If the VRF interface fails, PE3 converges faster for these subnets, because the withdrawal of the single corresponding IP-AD/EVI route affects all of them at once.

Note that N1/N2 may be a host or a router. When it is a router, the subnets advertised by RT-5E routes are the CE-prefixes behind it. When N1 and N2 are hosts, those subnets are the intermediate subnets (the subnet of N1/N2's own IP address).

When RT-5E routes are used to advertise direct-subnets, the details can be found in Section 4.3.3. When RT-5E routes are used to advertise CE-prefixes, there are two approaches, the details can be found in Section 1.3 of [I-D.sajassi-bess-evpn-ip-aliasing] and Section 6.3 of [I-D.wz-bess-evpn-vpws-as-vrf-ac].

4.3.3. RT-5E Advertisement in Centralized A-D mode

When the CE-prefixes are discovered by centralized auto-discovery, RT-5E routes can be used to advertise the direct subnets of NVE1/NVE2, but not to advertise the CE-Prefixes.

When the direct subnets are advertised by RT-5E routes and the main interface of the corresponding ESI fails, mass-withdraw procedures can be triggered for these prefixes. This is the advantage of advertising direct subnets through RT-5E routes instead of RT-5L routes.

Note that an example of the mass-withdraw use case of RT-5E routes can be found in Section 5.4; it applies to distributed A-D mode too.

4.4. M-Style: MAC as Overlay Index

When an M-style RT-5 is used to forward a data packet, the label/VNI/SID of that packet's EVPN header is obtained from another RT-2 route whose MAC field (of its NLRI) matches this RT-5 route's own RMAC, and the forwarding path is determined by that RT-2 route.

An M-style RT-5 route is also called an RT-5M in this draft.

RT-5M is used in Interfaceful IP-VRF-to-IP-VRF mode and Bump-in-the-wire use case as per [I-D.ietf-bess-evpn-prefix-advertisement].
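M-style resolution keys on the Router's MAC instead of an IP address. The sketch below assumes the RT-2 routes are available as a list and compares MACs case-insensitively; the names, the label and the documentation-range MAC are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Rt5M:
    prefix: str
    rmac: str  # Router's MAC extended community of the RT-5


@dataclass(frozen=True)
class Rt2:
    mac: str   # MAC field of the RT-2 NLRI
    label: int
    bgp_next_hop: str


def resolve_rt5m(route: Rt5M, rt2s: Iterable[Rt2]) -> Optional[Tuple[str, int]]:
    """Return (underlay next hop, label) of the RT-2 whose MAC equals the RMAC."""
    wanted = route.rmac.lower()
    for r in rt2s:
        if r.mac.lower() == wanted:
            return r.bgp_next_hop, r.label
    return None


rt2s = [Rt2("00:00:5E:00:53:01", 301, "PE1")]
print(resolve_rt5m(Rt5M("Prefix1", "00:00:5e:00:53:01"), rt2s))  # ('PE1', 301)
```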

5. Centralized RT-5G Advertisement for Distributed L3 Forwarding

When N1/N2/N3 is a router, it is called R1/R2/R3 in the following figure. Note that Figure 6 only illustrates the physical Ethernet links, whereas Figure 1 illustrates the logical L3 adjacencies between PE and CE. We assume that ESI21 is attached to L3EVI VPNx as in Section 1.1.2 of [I-D.wang-bess-evpn-ether-tag-id-usage].

                   PE1
                  +----------+
                  | +------+ | ------>
 R1               | |      | | RT-1
+-------+         | | VPNx | | ESI21
|       |  P1.1   | |      | | ETI1
| ...................(10.9)| |
| .     |  ESI21  | +------+ |                          DGW1
| .     |    +    +----------+                    +-------------+
| .     |    |       ^                <---------- |             |
| .     |    |       | RT-2            RT-5G      | +---------+ |
|(10.2) |    |       | 10.2            CE-Prefix1 | |  VPNx   | |
| .     |    |       | ESI21           GW-IP=10.2 | |         |....R3
| .     |    |       |                            | |(3.3.3.3)| |
| .     |    +    +----------+ ------>            | +---------+ |
| .     |  ESI21  | +------+ | RT-2R              |   ^         |
| ...................(10.9)| | 10.2               |   |         |
|       |  P2.1   | |      | | ESI21              +---|---------+
+-------+         | | VPNx | |                        |
    |             | |      | | ------>                | CE-BGP
    |             | +------+ | RT-1                   | Prefix1
    |             +----------+ ESI21                  | NH=10.2
    |              PE2         ETI1                   |
    |                   CE-BGP                        |
    +--------------------->---------------------------+

Figure 1: Centralized RT-5G Advertisement

If R1 prefers to establish a single CE-BGP session, it can establish that CE-BGP session with the DC GW instead (e.g. PE3 of Section 1.1.2 of [I-D.wang-bess-evpn-ether-tag-id-usage]). This CE-BGP session is called the centralized CE-BGP session. When a centralized CE-BGP session is used, RT-5G routes should be used instead.

Note that the centralized CE-BGP session is used only to discover CE-prefixes; a distributed Layer 3 forwarding framework is still expected.

5.1. CE-side Configurations

Let us assume that CCC Active-Active Protection is used inside PNEC1; that is, when R1 sends packets to 10.9, these packets will be load-balanced between PE1 and PE2.

5.2. Why Centerlized A-D mode is used

Because of the factors discussed in Section 5.1, it may not be possible to establish the CE-BGP session between 10.2 and 10.9: packets from R1 to 10.9 are load-balanced between PE1 and PE2, which both own 10.9.

There may also be other reasons that prevent a routing protocol session from being established between 10.2 and 10.9.

5.3. Basic Control Plane Procedures

5.3.1. Centerlized CE-BGP

The CE-BGP session between R1 and DGW1 (when PE3 is a DC GW, it is called DGW1) is established between 10.2 and 3.3.3.3. The IP address 10.2 is called the uplink interface address of R1 in this document. The IP address 3.3.3.3 is called the centralized loopback address of VPNx in this document. The IP address 10.9 is called the downlink VRF-interface address of PE1/PE2 in this document.

R1 advertises a BGP route for a prefix (say "Prefix1") behind it to DGW1 via that CE-BGP session. The nexthop for Prefix1 is R1's uplink interface address (say 10.2).

Note that data packets from R1 to the centralized loopback address may be routed by the default route on R1. Thus DGW1 does not need to use the CE-BGP session to advertise the prefixes of VPNx to R1.

5.3.2. RT-2R Advertisement from PE1/PE2 to DGW1

When PE1 learns the ARP entry of 10.2, it advertises an RT-2R route to DGW1. The ESI of the RT-2R route is ESI21, which is the ESI of PE1's downlink VRF interface for R1. The RT-2R route is constructed following Section 3.1. Because this is a mono VLAN-based service interface, the Ethernet Tag ID (ETI1) of that RT-2R route can be 0.

Note that in [RFC7432], when the ESI is single-active, MAC forwarding uses only the label and the BGP next hop of the RT-2R route, as long as they are valid for forwarding. For RT-5E routes, however, we assume that the ESI is always preferred, even if the ESI is single-active. This follows Table 1 in Section 3.2 of [I-D.ietf-bess-evpn-prefix-advertisement].

5.3.3. RT-5G Advertisement from DGW1 to PE1/PE2

When DGW1 receives Prefix1 over the CE-BGP session with next hop 10.2, DGW1 advertises an RT-5G route for Prefix1 to PE1/PE2. The GW-IP of that RT-5G route is 10.2.

Note that DGW1 can load-balance packets for Prefix1 via the IP-AD/EVI routes (of ESI21) from PE1 and PE2, because ESI21 (which is advertised along with the RT-2R of 10.2) is the ESI of Prefix1's GW-IP.
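The control-plane chain of Sections 5.3.2 and 5.3.3 (CE-BGP route, then RT-5G, then the RT-2R of the GW-IP, then the IP-AD/EVI routes of ESI21) can be sketched as below, from DGW1's point of view. This is a simplified model under stated assumptions: the ETI is 0 (mono VLAN-based service interface), the labels are made up, and all type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rt5G:
    prefix: str
    gw_ip: str
    esi: str = "0"  # MUST be zero for an RT-5G


@dataclass(frozen=True)
class Rt2R:
    ip: str
    esi: str
    eth_tag: int


@dataclass(frozen=True)
class IpAdEvi:
    esi: str
    eth_tag: int
    pe: str
    label: int


def rt5g_from_ce_bgp(prefix: str, ce_next_hop: str) -> Rt5G:
    """DGW1 re-advertises a CE-BGP route as an RT-5G with GW-IP = CE next hop."""
    return Rt5G(prefix=prefix, gw_ip=ce_next_hop)


def ecmp_paths(route: Rt5G, rt2r_by_ip: Dict[str, Rt2R],
               ad_evis: List[IpAdEvi]) -> List[Tuple[str, int]]:
    """GW-IP -> RT-2R -> <ESI, ETI> -> every IP-AD/EVI route for that pair."""
    rt2r = rt2r_by_ip.get(route.gw_ip)
    if rt2r is None:
        return []
    return sorted((ad.pe, ad.label) for ad in ad_evis
                  if (ad.esi, ad.eth_tag) == (rt2r.esi, rt2r.eth_tag))


rt5g = rt5g_from_ce_bgp("Prefix1", "10.2")
rt2rs = {"10.2": Rt2R("10.2", "ESI21", 0)}
ads = [IpAdEvi("ESI21", 0, "PE1", 101), IpAdEvi("ESI21", 0, "PE2", 201)]
print(ecmp_paths(rt5g, rt2rs, ads))  # [('PE1', 101), ('PE2', 201)]
```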

Note that the centralized loopback address is advertised to PE1/PE2 by DGW1 via an RT-5L route. The next hop of the RT-5L route is DGW1. The label of the RT-5L route is VPNx's label on DGW1. The RMAC of the RT-5L route is DGW1's MAC when the encapsulation is VXLAN.

5.3.4. RT-2R Advertisement between PE1 and PE2

The RT-2R routes exchanged between PE1 and PE2 are used to sync their ARP entries to each other in order to avoid ARP misses. The ESI of these two RT-2R routes is ESI21.
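ARP synching between PE1 and PE2 can be sketched as follows: the PE that learns the ARP entry builds an RT-2R carrying the ESI of its downlink VRF interface, and a PE on the same ES installs a synced entry on its own VRF interface for that ESI. This is a simplified, hypothetical model (`PeArp`, `learn` and `receive` are illustrative names, and the MAC is from the documentation range); route targets and labels are not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Rt2R:
    esi: str
    eth_tag: int
    mac: str
    ip: str


@dataclass
class PeArp:
    name: str
    vrf_if_by_esi: Dict[str, str]  # ESI -> local downlink VRF interface
    arp: Dict[str, Dict[str, str]] = field(default_factory=dict)  # interface -> {ip: mac}

    def learn(self, esi: str, ip: str, mac: str) -> Rt2R:
        """Install a locally learnt ARP entry and build the RT-2R to advertise."""
        ifname = self.vrf_if_by_esi[esi]
        self.arp.setdefault(ifname, {})[ip] = mac
        return Rt2R(esi=esi, eth_tag=0, mac=mac, ip=ip)  # ETI 0: mono VLAN-based

    def receive(self, route: Rt2R) -> bool:
        """Install a synced ARP entry only if this PE is on the route's ES."""
        ifname = self.vrf_if_by_esi.get(route.esi)
        if ifname is None:
            return False  # not on the ES: no synced ARP entry here
        self.arp.setdefault(ifname, {})[route.ip] = route.mac
        return True


pe1 = PeArp("PE1", {"ESI21": "P1.1"})
pe2 = PeArp("PE2", {"ESI21": "P2.1"})
route = pe1.learn("ESI21", "10.2", "00:00:5e:00:53:02")
pe2.receive(route)
print(pe2.arp)  # {'P2.1': {'10.2': '00:00:5e:00:53:02'}}
```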

5.4. Mass-Withdraw by EAD/ES Route

In the figure of Section 1.1.2 of [I-D.wang-bess-evpn-ether-tag-id-usage], there are two L3EVIs, VPNx and VPNy. Section 5.3 took only VPNx as an example; now we consider these two L3EVIs together.

                                    +-----------------------+
 PNEC1                      PE1     |                       |
+-------------+          +----------+--------+              |
|             |          |  __(20.9)__(VPNy) | Withdraw     |
| Prefix1  "  |   P1     | /                 | IP-AD/ES     |
|  /       #===========X==<                  | ----X--->    | DGW1
| R1_______"  |  ESI21   | \__      __       |         +----+----+
|    10.2  "  |    +     |    (10.9)  (VPNx) |         |         |
|          "  |    |     +-----------+-------+         |(3.3.3.3)|
|          "  |    |                 |                 |    |    |
| Prefix2  "  |    |                 |                 |  (VPNx)---+N3
|  /       "  |    |        PE2      |                 |         |
| R2_______"  |    |     +-----------+-------+         |  (VPNy)---+N5
|    20.2  "  |    +     |  __(20.9)__(VPNy) |         |         |
|          "  |  ESI21   | /                 |         +----+----+
|          #==============<                  |              |
|          "  |   P2     | \__      __       |              |
|             |          |    (10.9)  (VPNx) |              |
+-------------+          +----------+--------+              |
                                    |                       |
                                    +-----------------------+
Figure 2: Mono VLAN-based S-I Use Case

When the physical interface of the downlink VRF interface (P1) on PE1 fails (illustrated by the 'X' on P1), PE1 withdraws the IP-AD/ES route of ESI21, so DGW1 re-routes 10.2 for VPNx's CE-Prefix1 and 20.2 for VPNy's CE-Prefix2 at the same time. Data packets for CE-Prefix1 and CE-Prefix2 are then sent to PE2 instead.
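The value of the IP-AD/ES withdrawal is that one route update re-routes every CE-prefix that depends on ESI21, across both L3EVIs. A deliberately simplified sketch of DGW1's bookkeeping, assuming each CE-prefix has already been resolved to the ESI of its GW-IP and ignoring the per-EVI routes; `EsTracker` and the mapping are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class EsTracker:
    """DGW-side view of which PEs currently advertise IP-AD/ES per ESI."""

    def __init__(self) -> None:
        self.pes: Dict[str, Set[str]] = defaultdict(set)

    def advertise(self, esi: str, pe: str) -> None:
        self.pes[esi].add(pe)

    def withdraw(self, esi: str, pe: str) -> None:
        self.pes[esi].discard(pe)

    def paths(self, esi: str) -> List[str]:
        return sorted(self.pes[esi])


# (L3EVI, CE-prefix) -> ESI of the GW-IP the CE-prefix resolves to
ce_prefix_esi: Dict[Tuple[str, str], str] = {
    ("VPNx", "Prefix1"): "ESI21",  # via 10.2
    ("VPNy", "Prefix2"): "ESI21",  # via 20.2
}

dgw1 = EsTracker()
dgw1.advertise("ESI21", "PE1")
dgw1.advertise("ESI21", "PE2")
dgw1.withdraw("ESI21", "PE1")  # P1 fails: a single IP-AD/ES withdrawal
for key, esi in ce_prefix_esi.items():
    print(key, dgw1.paths(esi))
# ('VPNx', 'Prefix1') ['PE2']
# ('VPNy', 'Prefix2') ['PE2']
```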

5.5. If Multiple VLAN-based Service Interface is Used

Now we assume that ESI21 is attached to L3EVI VPN1 according to Section 1.1.3 of [I-D.wang-bess-evpn-ether-tag-id-usage], and that CCC Active-Active Protection is used inside PNEC1.

                                 +--------------------------+
 PNEC1                      PE1  |                          |
+-------------+          +-------+------+                   | DGW1'
|             |          | X__(20.9)    | ----X---->   +----+----+
|          "  |   P1     | /        \   | Withdraw     |         |
|          #==============<      (VPN1) | IP-AD/EVI    |  (VPN1)---+N6
| R1_______"  |  ESI21   | \__      /   | ET-ID=2      |         |
|    10.2  "  |    +     |    (10.9)    |              +----+----+
|          "  |    |     +--------+-----+                   |
|          "  |    |              |                         |
|          "  |    |              |                         | DGW1
|          "  |    |        PE2   |                    +----+----+
| R2_______"  |    |     +--------+-----+              |         |
|    20.2  "  |    +     |  __(20.9)    |              |(3.3.3.3)|
|          "  |  ESI21   | /        \   | Withdraw     |    |    |
|          #==============<      (VPN1) | IP-AD/EVI    |  (VPN1)---+N3
|          "  |   P2     | \__      /   | ET-ID=1      |         |
|             |          | X  (10.9)    | ----X---->   +----+----+
+-------------+          +-------+------+                   |
                                 |                          |
                                 +--------------------------+
Figure 3: Multiple VLAN-based S-I Use Case

When physical port P3 (see Figure 6, which illustrates the physical links of Figure 3) fails, the CFM session of P2.1 (10.9 of PE2) goes down (illustrated by the 'X' inside PE2), while the CFM session of P2.2 (20.9 of PE2) stays up. Thus only the IP-AD/EVI route of P2.1 (ET-ID=1) should be withdrawn by PE2; the IP-AD/EVI route of P2.2 (ET-ID=2) and the IP-AD/ES route should not be withdrawn by PE2.

Note that if the ET-IDs of these two IP-AD/EVI routes were the same, then when P2.1 fails, DGW1 would continue to load-balance traffic with DA=10.2 to PE2, because there would still be another IP-AD/EVI route (of VPN1) with the same ESI and ET-ID. That is why the ACI-specific Ethernet auto-discovery mode of [I-D.wang-bess-evpn-ether-tag-id-usage] should be followed in this case.
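The sketch below contrasts the two cases for the host 10.2 behind P2.1: with ACI-specific ET-IDs, PE2's withdrawal removes PE2 from 10.2's paths, while with a shared ET-ID the still-up P2.2 keeps the single route alive. It assumes that DGW1 learns each host's <ESI, ET-ID> from the host's RT-2R and that a PE keeps an IP-AD/EVI route while any attachment circuit mapped to it is up; all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

EsTag = Tuple[str, int]  # (ESI, ET-ID)


def advertised_ip_ad_evi(acs: List[Tuple[str, int, bool]]) -> Set[EsTag]:
    """One PE: an IP-AD/EVI route for <ESI, ET-ID> stays advertised while
    at least one attachment circuit mapped to it is up."""
    return {(esi, tag) for esi, tag, up in acs if up}


def pes_for_host(host: str, host_tag: Dict[str, EsTag],
                 adverts: Dict[str, Set[EsTag]]) -> List[str]:
    """DGW1: PEs that still advertise the <ESI, ET-ID> of the host's RT-2R."""
    tag = host_tag[host]
    return sorted(pe for pe, tags in adverts.items() if tag in tags)


# ACI-specific: P2.1 (to R1, 10.2) uses ET-ID 1, P2.2 (to R2, 20.2) uses ET-ID 2.
aci = {"PE1": advertised_ip_ad_evi([("ESI21", 1, True), ("ESI21", 2, True)]),
       "PE2": advertised_ip_ad_evi([("ESI21", 1, False), ("ESI21", 2, True)])}
print(pes_for_host("10.2", {"10.2": ("ESI21", 1)}, aci))  # ['PE1']

# Shared ET-ID 0: P2.2 keeps the single route alive, so PE2 is still used for 10.2.
shared = {"PE1": advertised_ip_ad_evi([("ESI21", 0, True), ("ESI21", 0, True)]),
          "PE2": advertised_ip_ad_evi([("ESI21", 0, False), ("ESI21", 0, True)])}
print(pes_for_host("10.2", {"10.2": ("ESI21", 0)}, shared))  # ['PE1', 'PE2']
```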

Note that we assume that the ARP entry for 10.2 is learnt on PE1 only, and the ARP entry for 20.2 on PE2 only. Note also that the two downlink VRF interfaces P2.1 (to R1) and P2.2 (to R2) on PE2 are sub-interfaces of the same physical interface P2, so they have the same ESI. ESI21 is attached to L3EVI VPN1 using a multiple VLAN-based service interface, thus the mass-withdraw procedures of Section 5.4 can be used in this case too.

5.6. If VLAN-bundle Service Interface is Used

If R1 and R2 can share the same gateway IP address, P2.1 and P2.2 can be aggregated into the same sub-interface (on which the shared gateway IP is configured). Even when aggregated, they still do not share the same risks: when the physical interface P3 (see Figure 6) fails, one of them fails while the other continues to work.

Thus distinct IP-AD/EVI routes (with different ET-IDs) for P2.1 and P2.2 should be advertised separately. That is why [I-D.wang-bess-evpn-ether-tag-id-usage] should be followed in this case.

5.7. On the Failure of PE3 Node

Taking Figure 3 as an example, on the failure of DGW1, PE1/PE2 should delay the deletion of the RT-5G routes from DGW1; otherwise the L3 traffic between R1 and R2 will be interrupted. DGW1 can use a new BGP attribute to indicate the delayed-deletion requirement to PE1/PE2. Fortunately, DGW1 will typically have a redundant node (DGW1' in Figure 3), which can take DGW1's place when DGW1 fails.
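Delayed deletion can be sketched as a stale-route timer on PE1/PE2. The draft leaves the BGP attribute that requests it undefined, so the `delay_flag` below, the hold time and the `Rt5GTable` class are all hypothetical; in this sketch a fresh advertisement (for example from DGW1') clears the stale mark.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Rt5GTable:
    """PE-side RT-5G table with a hypothetical delayed-deletion timer."""
    hold_time: float
    routes: Dict[str, Tuple[str, bool]] = field(default_factory=dict)  # prefix -> (advertiser, delay_flag)
    stale_since: Dict[str, float] = field(default_factory=dict)

    def learn(self, prefix: str, advertiser: str, delay_flag: bool) -> None:
        self.routes[prefix] = (advertiser, delay_flag)
        self.stale_since.pop(prefix, None)  # a fresh advertisement clears staleness

    def advertiser_down(self, advertiser: str, now: float) -> None:
        for prefix, (adv, delay) in list(self.routes.items()):
            if adv != advertiser:
                continue
            if delay:
                self.stale_since.setdefault(prefix, now)  # keep forwarding for now
            else:
                del self.routes[prefix]  # ordinary, immediate removal

    def expire(self, now: float) -> None:
        for prefix, since in list(self.stale_since.items()):
            if now - since >= self.hold_time:
                del self.routes[prefix]
                del self.stale_since[prefix]


t = Rt5GTable(hold_time=60.0)
t.learn("Prefix1", "DGW1", delay_flag=True)
t.advertiser_down("DGW1", now=0.0)
t.expire(now=30.0)
print("Prefix1" in t.routes)  # True: still used while DGW1 is down
t.learn("Prefix1", "DGW1'", delay_flag=True)  # redundant node re-advertises
t.expire(now=90.0)
print("Prefix1" in t.routes)  # True
```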

Note that from the viewpoint of R1 and R2, the combination of PE1, PE2, DGW1, DGW1' and the underlay network between them is regarded as the following VNF:

+---------------------------------+
|                                 |
|    +----------------------+     |
|    |  MPU1 (DGW1)         |     |
|    +----------------------+     |
|                                 |
|    +----------------------+     |
|    |  MPU2 (DGW1')        |     |
|    +----------------------+     |
|                                 |
|    +----------------------+     |
|    |  LPU1 (PE1)          |----------------R1
|    +----------------------+     |
|                                 |
|    +----------------------+     |
|    |  LPU2 (PE2)          |----------------R2
|    +----------------------+     |
|                                 |
+---------------------------------+
Figure 4: EVPN Instance as a VNF

R1 and R2 connect to the LPUs of the VNF, and the data packets between R1 and R2 pass through the LPUs only, not through the MPUs. However, R1/R2 establish their BGP sessions with the MPUs, not the LPUs. When MPU1 (actually DGW1) fails, the LPUs (actually PE1/PE2) keep the forwarding state unchanged until MPU1 or MPU2 comes up. The delayed deletion on PE1/PE2 on behalf of DGW1 is understandable for the same reason.

Note that for north-bound traffic, the DC GWs also play an LPU role in this VNF.

5.8. For Common CE-prefixes behind R1 and R2

Assume that there is a common prefix (say Prefix3) behind both R1 and R2; that is, DGW1 can reach Prefix3 through either R1 or R2. When R1 advertises Prefix3 to DGW1 over its CE-BGP session, 10.2 may not be the best choice for Prefix3's BGP next hop.

      EVPN Instance as a VNF
+---------------------------------+
|                                 |
|    +----------------------+     |
|    |  MPU1 (DGW1)         |<---------<-----+
|    +----------------------+     |          |
|                                 |          ^
|    +----------------------+     |          | CE-BGP
|    |  MPU2 (DGW1')        |     |          | Prefix3
|    +----------------------+     |          | NH=7.7.7.7
|                                 |          |
|    +----------------------+     |     10.2 |
|    |  LPU1 (PE1)          |---------------[R1(7.7.7.7)]---+
|    +----------------------+     |                         |
|                                 |                       Prefix3
|    +----------------------+     |     20.2                |
|    |  LPU2 (PE2)          |---------------[R2(7.7.7.7)]---+
|    +----------------------+     |
|                                 |
+---------------------------------+
Figure 5: IP Aliasing of Common CE-Prefixes

In such a case, a common anycast loopback address (say 7.7.7.7) can be configured on both R1 and R2. Then, when R1 advertises Prefix3 to DGW1, R1 chooses 7.7.7.7 as the BGP next hop of the advertisement. Thus, DGW1 will advertise the RT-5G route of Prefix3 with GW-IP=7.7.7.7.

In addition to the common prefixes behind R1 and R2, there will be exclusive prefixes particular to R1 or R2. R1/R2 may be unable to distinguish the common prefixes from the exclusive prefixes, in which case R1/R2 simply advertise all prefixes behind them to the PEs over CE-BGP using the same next hop (e.g. 10.2). The PEs then cannot distinguish the common prefixes from the exclusive prefixes either. Thus, RT-5E routes cannot be used even in the distributed CE-prefix auto-discovery mode, because PE1/PE2 cannot advertise different ESIs for the common prefixes and the exclusive prefixes.

The ECMP-Merging approaches of Section 6.2.1 and Section 6.2.2 of [I-D.wz-bess-evpn-vpws-as-vrf-ac] can also be used in such cases in order to simplify the required recursive resolution.

If some PEs cannot resolve the GW-IP (e.g. 7.7.7.7) of an RT-5G route to another RT-5 route (e.g. the RT-5L route of 7.7.7.7), DGW1 can proxy the recursive resolution for them. When 7.7.7.7 can be resolved to two RT-2 routes, for 10.2 and 20.2, DGW1 can advertise the RT-5G route of the CE-prefix with either GW-IP=10.2 or GW-IP=20.2. Alternatively, DGW1 may advertise two RT-5G routes for that CE-prefix, one with GW-IP=10.2 and the other with GW-IP=20.2.
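The proxy resolution described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that DGW1 keeps a local map from the anycast address 7.7.7.7 to the CE-facing addresses it resolves to (10.2 and 20.2), plus the set of addresses for which it holds RT-2 routes. The names `Rt5G`, `proxy_resolve`, `next_hops_of` and `rt2_ips` are hypothetical and do not correspond to any BGP implementation's API; the behavior when nothing resolves is also an assumption.

```python
from dataclasses import dataclass


# Hypothetical, reduced model of an RT-5G route: only the fields used here.
@dataclass(frozen=True)
class Rt5G:
    prefix: str  # CE-prefix, e.g. "Prefix3"
    gw_ip: str   # Overlay Index (GW-IP)


def proxy_resolve(route, next_hops_of, rt2_ips, split=True):
    """Hypothetical sketch of DGW1 proxying the recursive resolution of a GW-IP.

    next_hops_of: map from an anycast address to the CE-facing addresses
                  it resolves to on DGW1 (assumption).
    rt2_ips:      addresses for which DGW1 holds an RT-2 route.
    split:        True  -> one RT-5G per resolved address,
                  False -> a single RT-5G using the first resolved address.
    """
    resolved = sorted(h for h in next_hops_of.get(route.gw_ip, ()) if h in rt2_ips)
    if not resolved:
        # Nothing to proxy: keep advertising the original GW-IP (assumption).
        return [route]
    if split:
        return [Rt5G(route.prefix, h) for h in resolved]
    return [Rt5G(route.prefix, resolved[0])]


original = Rt5G("Prefix3", "7.7.7.7")
print(proxy_resolve(original, {"7.7.7.7": {"10.2", "20.2"}}, {"10.2", "20.2"}))
# [Rt5G(prefix='Prefix3', gw_ip='10.2'), Rt5G(prefix='Prefix3', gw_ip='20.2')]
```

With `split=True`, PEs that only understand GW-IP-to-RT-2 resolution still see both egress paths for Prefix3; with `split=False`, they see a single one.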

6. Load Balancing of Unicast Packets

6.1. IP Aliasing using GW-IP

When the GW-IP of an RT-5G route can be resolved to an ECMP list of RT-5L routes (e.g. as in Section 6.2.1 of [I-D.wz-bess-evpn-vpws-as-vrf-ac]), IP aliasing is said to be implemented using GW-IP.

Note that in this case, when the encapsulation is VXLAN, PE3 will encapsulate the RMAC on a per-path basis for each path of that ECMP list.
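A minimal sketch of IP aliasing using GW-IP on PE3 follows. It assumes that the GW-IP is covered by /32 RT-5L host routes from several PEs, and that PE3 picks one path per flow with a CRC32 hash; the hash is an illustrative choice that this document does not specify. The class and function names and the RMAC values (taken from the documentation MAC range) are hypothetical.

```python
import zlib
from dataclasses import dataclass


# Hypothetical, reduced model of an RT-5L route.
@dataclass(frozen=True)
class Rt5L:
    prefix: str    # host prefix of the GW-IP, e.g. "7.7.7.7/32" (assumption)
    next_hop: str  # egress PE advertising the route
    vni: int
    rmac: str      # Router's MAC of that egress PE


def resolve_gw_ip(gw_ip, rt5l_routes):
    """Return the ECMP list of RT-5L routes covering gw_ip as a /32 host route."""
    return [r for r in rt5l_routes if r.prefix == gw_ip + "/32"]


def encapsulate(flow_key, ecmp):
    """Pick one path per flow; return (egress PE, VNI, inner destination MAC).

    The inner destination MAC is the RMAC of the chosen path, so each path
    of the ECMP list is encapsulated with its own RMAC.
    """
    if not ecmp:
        raise LookupError("GW-IP is not resolvable to any RT-5L route")
    path = ecmp[zlib.crc32(flow_key.encode()) % len(ecmp)]
    return path.next_hop, path.vni, path.rmac


routes = [
    Rt5L("7.7.7.7/32", "PE1", 5010, "00:00:5e:00:53:01"),
    Rt5L("7.7.7.7/32", "PE2", 5010, "00:00:5e:00:53:02"),
]
ecmp = resolve_gw_ip("7.7.7.7", routes)
print(encapsulate("198.51.100.7->Prefix3", ecmp))
```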

6.2. IP Aliasing using ESI

When the GW-IP of an RT-5G route can only be resolved to a single RT-2R route (e.g. in Section 5.3.3, where the RT-5G is a locally discovered RT-5G), but the <ESI,SOI> of that RT-2R route can be resolved to an ECMP list of RT-1 routes, IP aliasing is said to be implemented using ESI.
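The two-step resolution of this section (GW-IP to a single RT-2R route, then <ESI,SOI> to an ECMP list of RT-1 routes) can be sketched as follows. Route attributes are reduced to the fields needed for the lookup, and all names and values, including the ESI strings, SOI numbers and the GW-IP used in the example, are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical, reduced route models for illustration only.
@dataclass(frozen=True)
class Rt2R:
    ip: str    # the GW-IP carried in the RT-2R route
    esi: str
    soi: int
    rmac: str


@dataclass(frozen=True)
class Rt1:
    esi: str
    soi: int
    next_hop: str


def alias_by_esi(gw_ip, rt2r_routes, rt1_routes):
    """Resolve GW-IP -> single RT-2R -> ECMP list of RT-1 routes keyed by <ESI,SOI>.

    Returns None when the GW-IP does not resolve to exactly one RT-2R route,
    i.e. when the case described in this section does not apply.
    """
    matches = [r for r in rt2r_routes if r.ip == gw_ip]
    if len(matches) != 1:
        return None
    rt2r = matches[0]
    ecmp = [r for r in rt1_routes if (r.esi, r.soi) == (rt2r.esi, rt2r.soi)]
    return rt2r, ecmp


rt2r_table = [Rt2R("10.2", "ESI-1", 100, "00:00:5e:00:53:10")]
rt1_table = [Rt1("ESI-1", 100, "PE1"), Rt1("ESI-1", 100, "PE2"), Rt1("ESI-2", 100, "PE3")]
_, paths = alias_by_esi("10.2", rt2r_table, rt1_table)
print([p.next_hop for p in paths])  # ['PE1', 'PE2']
```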

This is similar to [I-D.sajassi-bess-evpn-ip-aliasing], with a few notable exceptions, as explained below.

o How to encapsulate the Destination MAC?
* The IP-AD/EVI routes do not have their own RMAC:

Note that when the encapsulation is VXLAN, PE3 will encapsulate the RMAC of the RT-2R route for the corresponding GW-IP address. In this case, the RMAC of PE1 MUST have the same value as the RMAC of PE2, which can be achieved by configuration.

* The IP-AD/EVI routes have their own RMAC:

Note that when the encapsulation is VXLAN, PE3 will encapsulate the RMAC of an IP-AD/EVI route in that ECMP list. When an IP packet is encapsulated with a VNI label according to an IP-AD/EVI route, the packet SHOULD be encapsulated with a Destination MAC according to the RMAC of that IP-AD/EVI route, if and only if that IP-AD/EVI route has an RMAC of its own.

o How to select the IP-AD/EVI routes?

When selecting the corresponding IP-AD/EVI routes for an RT-5E route, the procedures discussed in Section 3.2 of [I-D.wang-bess-evpn-ether-tag-id-usage] should be followed.
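The Destination MAC rules for VXLAN in the first bullet above can be summarized in a short sketch. The route is reduced to the fields the rules use, and `IpAdEvi`, `inner_dmac` and `shared_rmac_configured` are hypothetical names, not part of any specification or implementation.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, reduced model of an IP-AD/EVI route for illustration only.
@dataclass(frozen=True)
class IpAdEvi:
    next_hop: str
    vni: int
    rmac: Optional[str] = None  # None: the route carries no RMAC of its own


def inner_dmac(path: IpAdEvi, rt2r_rmac: str) -> str:
    """Destination MAC for a VXLAN packet sent according to one IP-AD/EVI route."""
    if path.rmac is not None:
        return path.rmac  # the route has its own RMAC: use it
    return rt2r_rmac      # otherwise: RMAC of the RT-2R route for the GW-IP


def shared_rmac_configured(pe_rmacs: dict) -> bool:
    """Check that all PEs (e.g. PE1 and PE2) are configured with the same RMAC."""
    return len(set(pe_rmacs.values())) == 1


print(inner_dmac(IpAdEvi("PE1", 5010), "00:00:5e:00:53:aa"))  # 00:00:5e:00:53:aa
print(inner_dmac(IpAdEvi("PE2", 5010, "00:00:5e:00:53:02"), "00:00:5e:00:53:aa"))  # 00:00:5e:00:53:02
```

The configuration check matters only in the first case, where every path falls back to the single RMAC of the RT-2R route and the egress PEs must therefore all accept it.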

7. IANA Considerations

This document makes no requests of IANA.

9. References

9.1. Normative References

[I-D.wang-bess-evpn-ether-tag-id-usage]
Wang, Y., "Ethernet Tag ID Usage Update for Ethernet A-D per EVI Route", Work in Progress, Internet-Draft, draft-wang-bess-evpn-ether-tag-id-usage-03, <https://datatracker.ietf.org/doc/html/draft-wang-bess-evpn-ether-tag-id-usage-03>.
[I-D.sajassi-bess-evpn-ip-aliasing]
Sajassi, A., Badoni, G., Warade, P., Pasupula, S., Drake, J., and J. Rabadan, "EVPN Support for L3 Fast Convergence and Aliasing/Backup Path", Work in Progress, Internet-Draft, draft-sajassi-bess-evpn-ip-aliasing-02, <https://datatracker.ietf.org/doc/html/draft-sajassi-bess-evpn-ip-aliasing-02>.
[I-D.ietf-bess-evpn-prefix-advertisement]
Rabadan, J., Henderickx, W., Drake, J., Lin, W., and A. Sajassi, "IP Prefix Advertisement in EVPN", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-prefix-advertisement-11, <https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-prefix-advertisement-11>.
[I-D.ietf-bess-evpn-inter-subnet-forwarding]
Sajassi, A., Salam, S., Thoria, S., Drake, J., and J. Rabadan, "Integrated Routing and Bridging in EVPN", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-inter-subnet-forwarding-15, <https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-inter-subnet-forwarding-15>.
[RFC7432]
Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, <https://www.rfc-editor.org/info/rfc7432>.
[RFC8679]
Shen, Y., Jeganathan, M., Decraene, B., Gredler, H., Michel, C., and H. Chen, "MPLS Egress Protection Framework", RFC 8679, DOI 10.17487/RFC8679, <https://www.rfc-editor.org/info/rfc8679>.
[I-D.wz-bess-evpn-vpws-as-vrf-ac]
Wang, Y. and Z. Zhang, "EVPN VPWS as VRF Attachment Circuit", Work in Progress, Internet-Draft, draft-wz-bess-evpn-vpws-as-vrf-ac-02, <https://datatracker.ietf.org/doc/html/draft-wz-bess-evpn-vpws-as-vrf-ac-02>.
[I-D.sajassi-bess-evpn-ac-aware-bundling]
Sajassi, A., Brissette, P., Mishra, M. P., Thoria, S., Rabadan, J., and J. Drake, "AC-Aware Bundling Service Interface in EVPN", Work in Progress, Internet-Draft, draft-sajassi-bess-evpn-ac-aware-bundling-04, <https://datatracker.ietf.org/doc/html/draft-sajassi-bess-evpn-ac-aware-bundling-04>.

9.2. Informative References

[I-D.ietf-idr-tunnel-encaps]
Patel, K., Velde, G., Sangli, S., and J. Scudder, "The BGP Tunnel Encapsulation Attribute", Work in Progress, Internet-Draft, draft-ietf-idr-tunnel-encaps-22, <https://datatracker.ietf.org/doc/html/draft-ietf-idr-tunnel-encaps-22>.
[I-D.wang-bess-evpn-arp-nd-synch-without-irb-02]
Wang, Y. and Z. Zhang, "ARP/ND Synching And IP Aliasing without IRB", Work in Progress, Internet-Draft, draft-wang-bess-evpn-arp-nd-synch-without-irb-02, <https://datatracker.ietf.org/doc/html/draft-wang-bess-evpn-arp-nd-synch-without-irb-02>.

Appendix A. Explanation for Physical Links of the Use-cases

There are three PEs, two L2NEs (Layer 2 Network Elements) and five L3NEs (Layer 3 Network Elements) in the above network. The PEs are PE1, PE2 and PE3. The L2NEs are L2NE1 and L2NE2. The L3NEs are N1, N2, N3, N4 and N5. They are all illustrated in Figure 6.

There are 9 physical links among these 10 physical devices, as illustrated in Figure 6. These physical links are referred to as PLi (i=1,2...8). Both physical ports of the same physical link PLi are referred to as Pi (i=1,2...8).

As illustrated in Figure 6, some of these physical ports may have subinterfaces. A subinterface of physical port Pi with VLAN ID j is referred to as Pi.j. For example, P1.2 is a subinterface of physical port P1, and its VLAN ID is 2.

There are three NIs (Network Instances) among PE1, PE2 and PE3: VPNx, VPNy and NIz. Two subinterfaces, P1.1 and P2.1, are attached to VPNx. Two other subinterfaces, P1.2 and P2.2, are attached to VPNy. N3 is also attached to VPNx, while N5 is also attached to VPNy.

There are two EVCs (Ethernet Virtual Connections) between L2NE1 and L2NE2: EVC1 and EVC2. L2NE1's EVC1 instance (illustrated as the "O" on L2NE1) has three member interfaces, P4, P1.1 and P3.1, where P3.1 and P1.1 belong to the same protection group. L2NE2's EVC1 instance has two member interfaces, P3.1 and P2.1. L2NE2's EVC2 instance (illustrated as the "O" on L2NE2) has three member interfaces, P5, P2.2 and P3.2, where P2.2 and P3.2 belong to the same protection group. L2NE1's EVC2 instance has two member interfaces, P3.2 and P1.2. L2NE2's EVC1 instance and L2NE1's EVC2 instance are both CCC (Circuit Cross Connection) local connections.

VPNx and VPNy are associated with NIz on each PE.

A.1. Failure Detections for P1.2 (or P2.1)

There is a CFM session, CFM1, between P1.2 of PE1 and P3.2 of L2NE2; when physical port P3 fails, CFM1 will go down. There is a CFM session, CFM2, between P2.1 of PE2 and P3.1 of L2NE1; when physical port P3 fails, CFM2 will go down.

A.2. Protection Approaches for N1 (or N2)

A.2.1. CCC-Approaches

L2NE1's EVC1 instance and L2NE2's EVC2 instance are also both CCC local connections. In L2NE1's EVC1 instance, P1.1 and P3.1 belong to the same protection group, PG1. In L2NE2's EVC2 instance, P2.2 and P3.2 belong to the same protection group, PG2. In PG1, both P1.1 and P3.1 will receive data packets. In PG2, both P2.2 and P3.2 will receive data packets.

A.2.1.1. CCC Active-Active Protection

L2NE1 (or L2NE2) will load-balance N1's (or N2's) data packets between P1.1 and P3.1 (or between P2.2 and P3.2).

A.2.1.2. CCC Active-Standby Protection

In PG1, P1.1 is the active path, P3.1 is the backup path. In PG2, P2.2 is the active path, P3.2 is the backup path.

That is, L2NE1 (or L2NE2) will not send N1's (or N2's) data packets over P3.1 (or P3.2) unless P1.1 (or P2.2) or P1 (or P2) has already failed.
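The member selection for the two CCC protection modes of Appendix A.2.1 can be sketched as follows, using PG1 on L2NE1 as an example. A subinterface is treated as down when it or its physical port has failed, mirroring the "P1.1 (or P2.2) or P1 (or P2)" condition above. The CRC32 per-flow hash in active-active mode, the use of only the surviving members in that mode, and the function names are assumptions made for illustration.

```python
import zlib


def is_down(member, failed):
    """A subinterface such as "P1.1" is down if it, or its physical port "P1", failed."""
    return member in failed or member.split(".")[0] in failed


def select_member(pg, failed, mode, flow_key=""):
    """Hypothetical sketch: choose the protection-group member that carries a packet.

    pg:   (active, backup), e.g. ("P1.1", "P3.1") for PG1 on L2NE1.
    mode: "active-active" (A.2.1.1) or "active-standby" (A.2.1.2).
    """
    active, backup = pg
    if mode == "active-active":
        # Load-balance per flow across the members that are still up (assumption).
        alive = [m for m in pg if not is_down(m, failed)]
        if not alive:
            return None
        return alive[zlib.crc32(flow_key.encode()) % len(alive)]
    if mode == "active-standby":
        # The backup is used only after the active subinterface or its port has failed.
        if not is_down(active, failed):
            return active
        return None if is_down(backup, failed) else backup
    raise ValueError(f"unknown mode: {mode}")


PG1 = ("P1.1", "P3.1")
print(select_member(PG1, set(), "active-standby"))   # P1.1
print(select_member(PG1, {"P1"}, "active-standby"))  # P3.1
```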

A.2.2. VSI-Approaches

In this case, L2NE1's EVC1 instance and L2NE2's EVC2 instance are both VSI instances. P1.1, P3.1, P2.2 and P3.2 are all individual ACs in these VSIs.

Note that L2NE2's EVC1 instance and L2NE1's EVC2 instance are still both CCC local connections in this case. There is no PG1 or PG2, and there are no PWs.

Appendix B. Different Understandings on Resolve GW-IP to RT-5

B.1. Section 3.2 of I-D.ietf-bess-evpn-prefix-advertisement

The following bullets are quoted from Section 3.2 of [I-D.ietf-bess-evpn-prefix-advertisement]:

   "RT-5 routes support recursive lookup resolution through the use of
   Overlay Indexes as follows:

   o ... It is important to note that recursive
     resolution of the Overlay Index applies upon installation into an
     IP-VRF, and not upon BGP propagation (for instance, on an ASBR).

   ...

   o In order to enable the recursive lookup resolution at the ingress
     NVE, an NVE that is a possible egress NVE for a given Overlay Index
     must originate a route advertising itself as the BGP next hop on
     the path to the system denoted by the Overlay Index. For instance:

     . ...
     . If the RT-5 specifies an ESI as the Overlay Index, recursive
       resolution can only be done if the NVE has received and installed
       an RT-1 (Auto-Discovery per-EVI) route specifying that ESI.
     . If the RT-5 specifies a GW IP address as the Overlay Index,
       recursive resolution can only be done if the NVE has received and
       installed an RT-2 (MAC/IP route) specifying that IP address in
       the IP address field of its NLRI.
     . ...

     Note that the RT-1 or RT-2 routes needed for the recursive
     resolution may arrive before or after the given RT-5 route.

   o ..."

B.2. How to Interpret Above Paragraphs

Note that the above section can be interpreted as having been written based on the following principles:

  • The following paragraph (say Paragraph 1) specifies how the recursive lookup resolution will be done:

    "In order to enable the recursive lookup resolution at the ingress NVE, an NVE that is a possible egress NVE for a given Overlay Index must originate a route advertising itself as the BGP next hop on the path to the system denoted by the Overlay Index. For instance:"

  • The examples introduced by the phrase "For instance:" describe some use cases that follow the above paragraph, with the understanding that new use cases are possible in the future with new documents, as long as the rules of Paragraph 1 are respected.

B.3. Special PEs

If some devices have interpreted the GW IP address bullet quoted in Appendix B.1 (say Paragraph 2) as follows:

"if the recursive resolution can't find out a RT-2 for that RT-5's GW-IP, that RT-5 should not be installed."

Such PE behavior should not be regarded as specified by Section 3.2 of [I-D.ietf-bess-evpn-prefix-advertisement]; it is simply not covered by [I-D.ietf-bess-evpn-prefix-advertisement].
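The difference between the two readings can be made concrete with a small predicate. This sketch assumes that the only routes considered are RT-2 routes and RT-5 host routes, each keyed by IP address; `may_install` and its parameters are hypothetical names.

```python
def may_install(gw_ip: str, rt2_ips: set, rt5_host_ips: set, strict: bool) -> bool:
    """Hypothetical check: may an RT-5 whose Overlay Index is gw_ip be installed?

    strict=True  -> the reading described in B.3: only an RT-2 carrying gw_ip
                    in its IP address field can resolve it.
    strict=False -> the Paragraph 1 reading: a route advertised by a possible
                    egress NVE for gw_ip (here an RT-5 host route, such as the
                    RT-5L of 7.7.7.7) may also resolve it.
    """
    if gw_ip in rt2_ips:
        return True
    return (not strict) and gw_ip in rt5_host_ips


print(may_install("7.7.7.7", set(), {"7.7.7.7"}, strict=True))   # False
print(may_install("7.7.7.7", set(), {"7.7.7.7"}, strict=False))  # True
```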

B.4. GW-IP or a new TLV

Regardless of how Section 3.2 of [I-D.ietf-bess-evpn-prefix-advertisement] is interpreted, assume now that the function of the GW-IP field is replaced by a new TLV (e.g. the IP-mapping SOI extended community, similar to what has been done in Section 6.3 of [I-D.wz-bess-evpn-vpws-as-vrf-ac]). The two implementations can then be compared to see whether a new TLV brings any benefit.

Now assume that the scenario of Figure 5 of this draft is changed to the distributed CE-prefix auto-discovery mode (which is similar to Section 6.3 of [I-D.wz-bess-evpn-vpws-as-vrf-ac]). The comparison is shown in the following table:

Table 1: GW-IP vs IP-mapping SOI
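+-----+------------------------------------+-------+---------+---------+
| No. | Compared Point                     | GW-IP | New TLV | RT-5E   |
+-----+------------------------------------+-------+---------+---------+
| 1   | Can non-upgraded RRs accept it?    | yes   | yes     | yes     |
| 2   | Can non-upgraded DGWs* install it? | maybe | no      | no      |
| 3   | Should PE1/PE2 be upgraded?        | yes   | yes     | yes     |
| 4   | Will it confuse non-upgraded RRs?  | no    | no      | no      |
| 5   | Will it confuse non-upgraded DGWs? | no    | no      | maybe** |
+-----+------------------------------------+-------+---------+---------+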

Notes:

*  Figure 4 of Section 6.3 of [I-D.wz-bess-evpn-vpws-as-vrf-ac] can also be taken as an example; in that case, its PE3 may be a DGW.

** If the RT-5E routes of the original bump-in-the-wire use case are advertised with the route target of the IP-VRF (and thus without the RTs of BD-10), then when DGW1 receives an RT-5E route and there is an SBD IRB in the IP-VRF instance, DGW1 may select RT-1 per-EVI routes for the RT-5E route in the context of that SBD. This is discussed in Section 3.6.4 of [I-D.wang-bess-evpn-ether-tag-id-usage].

As the above table shows, a new TLV would be no better than the original GW-IP field.

Note that when the PEs cannot distinguish the common prefixes from the exclusive prefixes, only a CE-BGP next-hop-based Overlay Index can be used for IP aliasing, because the PEs cannot advertise different ESIs for the common prefixes and the exclusive prefixes. (Independent CE-BGP sessions and RT-5L routes can also be used, as per Section 6.1 of [I-D.wz-bess-evpn-vpws-as-vrf-ac], but this is not IP aliasing.)

Authors' Addresses

Yubao Wang
ZTE Corporation
No.68 of Zijinghua Road, Yuhuatai District
Nanjing
China
Zheng(Sandy) Zhang
ZTE Corporation
No. 50 Software Ave, Yuhuatai District
Nanjing
China