Early Review of draft-ietf-bess-evpn-mh-pa-07

Request Review of draft-ietf-bess-evpn-mh-pa-07
Requested revision 07 (document currently at 10)
Type Early Review
Team Routing Area Directorate (rtgdir)
Deadline 2022-12-02
Requested 2022-11-17
Requested by Stephane Litkowski
Authors Patrice Brissette , Luc André Burdet , Bin Wen , Eddie Leyton , Jorge Rabadan
I-D last updated 2023-02-16
Completed reviews Genart Early review of -09 by Paul Kyzivat (diff)
Rtgdir Early review of -07 by Ketan Talaulikar (diff)
Assignment Reviewer Ketan Talaulikar
State Completed
Request Early review on draft-ietf-bess-evpn-mh-pa by Routing Area Directorate Assigned
Posted at
Reviewed revision 07 (document currently at 10)
Result Has issues
Completed 2023-02-16
16	Abstract

18	   The Multi-Chassis Link Aggregation Group (MC-LAG) technology enables
19	   establishing a logical link-aggregation connection with a redundant
20	   group of independent nodes.  The purpose of multi-chassis LAG is to
21	   provide a solution to achieve higher network availability, while

[nit] please remove the comma after "availability" 

22	   providing different modes of sharing/balancing of traffic.  RFC7432
23	   defines EVPN based MC-LAG with single-active and all-active

[nit] s/EVPN based/EVPN-based

24	   multi-homing load-balancing mode.  The current draft expands on

[nit] s/current draft/this document/g - applies to other references to "draft"

25	   existing redundancy mechanisms supported by EVPN and introduces
26	   support for port-active load-balancing mode.

85	1.  Introduction

87	   EVPN, as per [RFC7432], provides all-active per flow load-balancing

[nit] s/per flow/per-flow/g

88	   for multi-homing.  It also defines single-active with service carving
89	   mode, where one of the PEs, in redundancy relationship, is active per

[nit] s/in redundancy/in a redundancy

90	   service.

92	   While these two multi-homing scenarios are most widely utilized in

[minor] Would be good to give the reference to RFC7432? Suggestion:

... two multi-homing scenarios (specified in [RFC7432]) are ...

93	   data center and service provider access networks, there are scenarios
94	   where active-standby per interface multi-homing load-balancing is
95	   useful and required.  The main consideration for this mode of

[minor] Suggestion:

... for this new mode of ...

96	   load-balancing is the determinism of traffic forwarding through a
97	   specific interface rather than statistical per flow load-balancing
98	   across multiple PEs providing multi-homing.  The determinism provided
99	   by active-standby per interface is also required for certain QOS

[minor] Suggestion:

... provided by this per-interface active-standby mode is also ...

[nit] s/per interface/per-interface/g

100	   features to work.  While using this mode, customers also expect
101	   minimized convergence during failures.

[major] The terms "active-standby per-interface", "per-interface active-standby",
and "port-active" are used interchangeably throughout the document.
Is it possible to converge on one term that is used consistently? Perhaps
define the term in this Sec 1 and then use just "port-active" through the rest
of the document?

[minor] "minimized" sounds a bit odd. Did you mean "fast convergence" perhaps?

103	   A new type of load-balancing mode, port-active load-balancing, is
104	   defined.  This draft describes how the new load-balancing mode can be
105	   supported via EVPN.  The new mode may also be referred to as per
106	   interface active/standby.

[minor] Text seems a bit fragmented. Suggestion:

This document defines a new type of multi-homing mode called port-active 
load-balancing, and describes how this new mode can be supported via EVPN.

[major] The new mode does provide multi-homing, but I am not sure that it 
provides load-balancing of traffic in the true sense. 
Can you please clarify what is meant by load-balancing?

108	                    +-----+
109	                    | PE3 |
110	                    +-----+
111	                 +-----------+
112	                 |  MPLS/IP  |
113	                 |  CORE     |
114	                 +-----------+
115	               +-----+   +-----+
116	               | PE1 |   | PE2 |
117	               +-----+   +-----+
118	                  |         |
119	                  I1       I2
120	                    \     /
121	                     \   /
122	                     +---+
123	                     |CE1|
124	                     +---+

126	                         Figure 1: MC-LAG Topology

128	   Figure 1 shows a MC-LAG multi-homing topology where PE1 and PE2 are

[nit] s/a MC-LAG/an MC-LAG

129	   part of the same redundancy group providing multi-homing to CE1 via
130	   interfaces I1 and I2.  Interfaces I1 and I2 are members of a LAG
131	   running LACP protocol.  The core, shown as IP or MPLS enabled,
132	   provides wide range of L2 and L3 services.  MC-LAG multi-homing

[nit] s/provides wide/provides a wide

133	   functionality is decoupled from those services in the core and it
134	   focuses on providing multi-homing to the CE.  With per-port active/
135	   standby load-balancing, only one of the two interface I1 or I2 would

[nit] s/two interface/two interfaces

136	   be in forwarding, the other interface will be in standby.  This also

[nit] s/forwarding, the/forwarding and the

137	   implies that all services on the active interface are in active mode
138	   and all services on the standby interface operate in standby mode.

140	1.1.  Requirements Language

142	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
144	   "OPTIONAL" in this document are to be interpreted as described in BCP
145	   14 [RFC2119] [RFC8174] when, and only when, they appear in all
146	   capitals, as shown here.

148	2.  Multi-Chassis Link Aggregation

[minor] Is this mode only applicable for MC-LAG or other types of access as
well?

150	   When a CE is multi-homed to a set of PE nodes using the
151	   [IEEE.802.1AX_2014] Link Aggregation Control Protocol (LACP), the PEs
152	   must act as if they were a single LACP speaker for the Ethernet links
153	   to form and operate as a Link Aggregation Group (LAG).  To achieve
154	   this, the PEs connected to the same multi-homed CE must synchronize
155	   LACP configuration and operational data among them.  Interchassis
156	   Communication Protocol (ICCP) [RFC7275] has been used for that
157	   purpose.  EVPN LAG simplifies greatly that solution.  Along with the
158	   simplification come a few assumptions:

[major] Are these assumptions or requirements/constraints? Please consider
using normative language for such operational requirements as done in Sec 3.

160	   *  a CE device connected to multi-homing PEs may have a single LAG
161	      with all its active links i.e. links in the LAG operate in all-
162	      active load-balancing mode.

[major] Why "may have"? Is it not a requirement that the CE considers all
links to both the PEs as active and it is the PEs who would set the link
down/out-of-sync on their side based on the EVPN signaling?

164	   *  Same LACP parameters MUST be configured on peering PEs such as
165	      system id, port priority and port key.

[nit] s/priority and/priority, and

167	   Any discrepancies from this list are out of the scope of this

[minor] If both the above are made normative MUST, then it is not really out
of scope, right? The handling of misconfigurations/miswiring can be out of
scope.

168	   document, as are mis-configuration and mis-wiring detection across

[nit] misconfiguration & miswiring

169	   peering PEs.

171	3.  Port-active Load-balancing Procedure

173	   Following steps describe the proposed procedure with EVPN LAG to

[nit] The following

174	   support port-active load-balancing mode:

176	   a.  The Ethernet-Segment Identifier (ESI) MUST be assigned per access
177	       interface as described in [RFC7432], which may be auto derived or

[nit] auto-derived

178	       manually assigned.  Access interface MAY be a Layer-2 or Layer-3

[nit] The access

179	       interface.  The usage of ESI over Layer-3 interface is newly

[nit] over a Layer-3

180	       described in this document.

182	   b.  Ethernet-Segment (ES) MUST be configured in port-active
183	       load-balancing mode on peering PEs for specific access interface.

185	   c.  Peering PEs MAY exchange only Ethernet-Segment (ES) route
186	       (Route Type-4) when ESI is configured on a Layer-3 interface.

188	   d.  PEs in the redundancy group leverage the DF election defined in
189	       [RFC8584] to determine which PE keeps the port in active mode and
190	       which one(s) keep it in standby mode.  While the DF election

[nit] one keeps

191	       defined in [RFC8584] is per [ES, Ethernet Tag] granularity, for
192	       port-active mode of multi-homing, the DF election is done per

[nit] the port-active

193	       <ES>.  The details of this algorithm are described in Section 4.

195	   e.  DF router MUST keep corresponding access interface in up and
196	       forwarding active state for that Ethernet-Segment

198	   f.  Non-DF routers will by default implement a bidirectional blocking
199	       scheme for all traffic in line with [RFC7432] Single-Active
200	       blocking scheme, albeit across all VLANS.

[nit] VLANs

202	       *  Non-DF routers MAY bring and keep peering access interface
203	          attached to it in operational down state.

[nit] an operational

205	       *  If the interface is running LACP protocol, then the non-DF PE
206	          MAY also set the LACP state to OOS (Out of Sync) as opposed to
207	          interface state down.  This allows for better convergence on

[nit] an interface down state

208	          standby to active transition.

210	   g.  For EVPN-VPWS service, the usage of primary/backup bits of EVPN
211	       Layer-2 attributes extended community [RFC8214] is highly
212	       recommended to achieve better convergence.

214	4.  Designated Forwarder Algorithm to Elect per Port-active PE

216	   The ES routes, running in port-active load-balancing mode, are
217	   advertised with the new Port Mode Load-Balancing capability in the DF
218	   Election Extended Community defined in [RFC8584].  Moreover, the ES
219	   associated to the port leverages existing procedure of Single-Active,

[nit] associated with
[nit] leverages the existing

220	   and signals Single-Active Multihomed site redundancy mode along with
221	   Ethernet-AD per-ES route (Section 7.5 of [RFC7432]).  Finally the
222	   ESI-label based split-horizon procedures in Section 8.3 of [RFC7432]

[nit] ESI label-based

223	   should be used to avoid transient echo'ed packets when Layer-2
224	   circuits are involved.

226	   The various algorithms for DF Election are discussed in Sections 4.2
227	   to 4.5 for completeness, although the choice of algorithm in this

[nit] completeness even though the choice of the algorithm

228	   solution doesn't affect complexity or performance as in other load-
229	   balancing modes.

231	4.1.  Capability Flag

233	   [RFC8584] defines a DF Election extended community, and a Bitmap
234	   field to encode "capabilities" to use with the DF election algorithm
235	   in the DF algorithm field.  Bitmap (2 octets) is extended by the
236	   following value:

[major] The extension is only the P bit. The text gives a wrong impression
that the D and AC-DF bits are also being extended by this document. Please
consider changing this text to clarify that the D and AC-DF bits are existing
bits that are also used by this mode.

238	                            1 1 1 1 1 1
239	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
240	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
241	       |D|A|     |P|                   |
242	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

244	    Figure 2: Amended Bitmap field in the DF Election Extended Community

246	   Bit 0:    D bit or 'Don't Preempt' bit, as explained in
247	             [I-D.ietf-bess-evpn-pref-df].

249	   Bit 1:    AC-DF Capability (AC-Influenced DF election), as explained
250	             in [RFC8584].

252	   Bit 5:    (corresponds to Bit 29 of the DF Election Extended
253	             Community and it is defined by this document): 'Port Mode

[minor] Suggest removing this "Bit 29" - I don't see similar counting of bits
within the entire ExtComm being done anywhere. The "Bit 5" of the field is
clear enough.

254	             Load-Balancing' Capability (P bit hereafter), determines

[nit] the use of quotes seems odd

255	             that the DF-Algorithm should be modified to consider the
256	             port ES only and not the Ethernet Tags.

[major] Seems odd to call this "port mode load-balancing" when there is no
load-balancing? Wouldn't "port active mode multihoming" be more accurate?
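
For illustration only, the bit positions quoted above can be sketched in
Python as follows. The sketch assumes the usual IETF diagram convention that
bit 0 is the most significant bit of the 2-octet Bitmap field; the helper
names (bit_mask, encode_bitmap, decode_bitmap) are hypothetical and are not
taken from the draft or from [RFC8584].

```python
# Bit positions within the 2-octet Bitmap field, as quoted from the draft.
D_BIT = 0  # Don't Preempt
A_BIT = 1  # AC-DF capability
P_BIT = 5  # Port Mode capability (the bit the draft under review defines)


def bit_mask(bit: int) -> int:
    """Mask for a bit position, assuming bit 0 is the most significant bit."""
    if not 0 <= bit <= 15:
        raise ValueError("bit position must be in 0..15")
    return 1 << (15 - bit)


def encode_bitmap(d: bool = False, a: bool = False, p: bool = False) -> bytes:
    """Build the 2-octet Bitmap in network byte order."""
    value = 0
    if d:
        value |= bit_mask(D_BIT)
    if a:
        value |= bit_mask(A_BIT)
    if p:
        value |= bit_mask(P_BIT)
    return value.to_bytes(2, "big")


def decode_bitmap(raw: bytes) -> dict:
    """Extract the D, A and P flags from a 2-octet Bitmap."""
    if len(raw) != 2:
        raise ValueError("Bitmap field is 2 octets")
    value = int.from_bytes(raw, "big")
    return {
        "D": bool(value & bit_mask(D_BIT)),
        "A": bool(value & bit_mask(A_BIT)),
        "P": bool(value & bit_mask(P_BIT)),
    }
```

Under that bit-ordering assumption, a Bitmap carrying only the P bit encodes
as the octets 0x04 0x00.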

258	4.2.  Modulo-based Algorithm

260	   The default DF Election algorithm, or modulus-based algorithm as in
261	   [RFC7432] and updated by [RFC8584], is used here, at the granularity
262	   of ES only.  Given that ES-Import Route Target extended community may
263	   be auto-derived and directly inherits its auto-derived value from ESI
264	   bytes 1-6, many operators differentiate ESI primarily within these
265	   bytes.  As a result, bytes 3-6 are used to determine the designated
266	   forwarder using Modulo-based DF assignment, achieving good entropy
267	   during Modulo calculation across ESIs:
268	   Assuming a redundancy group of N PE nodes, the PE with ordinal i is
269	   the DF for an <EE> when (Es mod N) = i, where Es represents bytes 3-6
270	   of that ESI.
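
As a rough illustration of the modulo rule quoted above, the sketch below
picks the DF from bytes 3-6 of the ESI. It assumes a 10-octet ESI indexed from
0 (type octet first), a big-endian reading of the four octets, and the
[RFC7432] ordering of PEs by increasing IP address with ordinals starting at
0. The function name modulo_df and the sample addresses are hypothetical.

```python
import ipaddress


def modulo_df(esi: bytes, pe_addresses: list[str]) -> str:
    """Return the IP address of the PE elected DF for the given ES."""
    if len(esi) != 10:
        raise ValueError("an ESI is 10 octets long")
    if not pe_addresses:
        raise ValueError("the redundancy group must contain at least one PE")
    # Ordinals follow the PE IP addresses sorted in increasing numeric value.
    ordered = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))
    # Assumption: "bytes 3-6" means octets 3..6 when counting from 0.
    es_value = int.from_bytes(esi[3:7], "big")
    return ordered[es_value % len(ordered)]


if __name__ == "__main__":
    example_esi = bytes.fromhex("00000000000001000000")
    print(modulo_df(example_esi, ["192.0.2.2", "192.0.2.1"]))
```

Because the Ethernet Tag plays no part in the computation, every service on
the port resolves to the same DF, which is the per-ES granularity the section
describes.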

272	4.3.  HRW Algorithm

274	   Highest Random Weight (HRW) algorithm defined in [RFC8584] MAY also
275	   be used and signaled, and modified to operate at the granularity of
276	   <ES> rather than per <ES, VLAN>.

278	   Section 3.2 of [RFC8584] describes computing a 32 bit CRC over the

[nit] 32-bit

279	   concatenation of Ethernet Tag and ESI.  For port-active
280	   load-balancing mode, the Ethernet Tag is simply removed from the CRC
281	   computation.

283	   DF(Es) denotes the DF and BDF(Es) denote the BDF for the ESI es; Si
284	   is the IP address of PE i; and Weight is a function of Si, and Es.

286	   1.  DF(Es) = Si| Weight(Es, Si) >= Weight(Es, Sj), for all j.  In the
287	       case of a tie, choose the PE whose IP address is numerically the
288	       least.  Note that 0 <= i,j < number of PEs in the redundancy
289	       group.

291	   2.  BDF(Es) = Sk| Weight(Es, Si) >= Weight(Es, Sk), and Weight(Es,
292	       Sk) >= Weight(Es, Sj).  In the case of a tie, choose the PE whose
293	       IP address is numerically the least.

295	   Where:

297	   *  DF(Es) is defined to be the address Si (index i) for which
298	      Weight(Es, Si) is the highest; 0 <= i < N-1.

300	   *  BDF(Es) is defined as that PE with address Sk for which the
301	      computed Weight is the next highest after the Weight of the DF.  j
302	      is the running index from 0 to N-1; i and k are selected values.
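
The DF and BDF selection quoted above can be sketched as follows, at <ES>
granularity with the Ethernet Tag left out of the CRC input. This is a minimal
sketch under stated assumptions: Python's zlib.crc32 stands in for the 32-bit
CRC of [RFC8584], and the input is taken to be the ESI followed by the packed
PE address; neither choice is confirmed by the quoted text. The names weight
and hrw_elect are hypothetical.

```python
import ipaddress
import zlib
from typing import Optional


def weight(esi: bytes, pe_ip: str) -> int:
    """Weight(Es, Si): CRC over the ESI and the PE address (assumed order)."""
    return zlib.crc32(esi + ipaddress.ip_address(pe_ip).packed)


def hrw_elect(esi: bytes, pe_ips: list[str]) -> tuple[str, Optional[str]]:
    """Return (DF, BDF) for the ES; BDF is None when only one PE is present."""
    if not pe_ips:
        raise ValueError("the redundancy group must contain at least one PE")
    # Highest weight first; ties go to the numerically lowest IP address.
    ranked = sorted(
        pe_ips,
        key=lambda ip: (-weight(esi, ip), int(ipaddress.ip_address(ip))),
    )
    df = ranked[0]
    bdf = ranked[1] if len(ranked) > 1 else None
    return df, bdf
```

Sorting once and taking the first two entries yields the highest and
next-highest weights, which matches the DF and BDF definitions in the quoted
text.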

304	4.4.  Preference-based DF Election

306	   When the new capability 'Port-Mode' is signaled, the algorithm is
307	   modified to consider the port only and not any associated Ethernet
308	   Tags.  Furthermore, the "port-based" capability MUST be compatible
309	   with the "Don't Preempt" bit.  When an interface recovers, a peering
310	   PE signaling D-bit will enable non-revertive behaviour at the port

[nit] behavior

311	   level.

313	4.5.  AC-Influenced DF Election

315	   The AC-DF bit MUST be set to 0 when advertising Port Mode Load-
316	   Balancing capability (P=1).  When an AC (sub-interface) goes down, it
317	   does not influence the DF election.  The peer's Ethernet A-D per EVI
318	   is ignored in all Port Mode DF Election algorthms.

[nit] algorithms

320	   Upon receiving AC-DF bit set (A=1) from a remote PE, it MUST be

[nit] the AC-DF bit set

321	   ignored when performing Port-Mode DF Election.
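
The two AC-DF rules quoted above (A=0 on advertisement when P=1, and A ignored
on receipt in Port-Mode) could be expressed along these lines. The masks
assume bit 0 is the most significant bit of the 2-octet Bitmap, and the
function names are hypothetical.

```python
A_MASK = 0x4000  # Bit 1: AC-DF capability
P_MASK = 0x0400  # Bit 5: Port Mode capability


def bitmap_for_port_mode(bitmap: int) -> int:
    """Bitmap to advertise in Port-Mode: set P and force the AC-DF bit to 0."""
    return (bitmap | P_MASK) & ~A_MASK & 0xFFFF


def ac_df_applies(received_bitmap: int, port_mode: bool) -> bool:
    """Whether a peer's AC-DF bit is honoured when running the DF election."""
    if port_mode:
        # A=1 from a remote PE is ignored for Port-Mode DF Election.
        return False
    return bool(received_bitmap & A_MASK)
```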

323	5.  Convergence considerations

325	   To improve the convergence, upon failure and recovery, when

[nit] when the

326	   port-active load-balancing mode is used, some advanced
327	   synchronization between peering PEs may be required.  Port-active is
328	   challenging in a sense that the "standby" port is in down state.  It

[nit] in the sense
[nit] in a down

329	   takes some time to bring a "standby" port in up-state and settle the

[nit] port to an up state

330	   network.  For IRB and L3 services, ARP / ND cache may be
331	   synchronized.  Moreover, associated VRF tables may also be
332	   synchronized.  For L2 services, MAC table synchronization may be
333	   considered.

335	   Finally, for members of a LAG running LACP the ability to set the
336	   "standby" port in "out-of-sync" state a.k.a "warm-standby" can be
337	   leveraged.

339	5.1.  Primary / Backup per Ethernet-Segment

341	   The EVPN Layer 2 Attributes Control Flags extended community SHOULD
342	   be advertised in Ethernet A-D per ES route for fast convergence.

344	   Only the P and B bits are relevant to this document, and only in the
345	   context of Ethernet A-D per ES routes:

[minor] Please consider providing references for the ExtComm and the bits on
their first use.

347	   *  When advertised, the EVPN Layer 2 Attributes Control Flags
348	      extended community SHALL have only P or B bits set and all other
349	      bits and fields MUST be zero.

351	   *  A remote PE receiving the optional EVPN Layer 2 Attributes Control
352	      Flags extended community in Ethernet A-D per ES routes SHALL
353	      consider only P and B bits.

[minor] In other words, the other bits are ignored and this is not considered
an error/malformed, right?

355	   For EVPN Layer 2 Attributes Control Flags extended community sent and
356	   received in Ethernet A-D per EVI routes used in [RFC8214], [RFC7432]
357	   and [I-D.ietf-bess-evpn-vpws-fxc]:

359	   *  P and B bits received are overridden by "parent" bits on Ethernet
360	      A-D per ES above.

362	   *  Other fields and bits of the extended community are used according
363	      to the procedures of those documents.

365	5.2.  Backward Compatibility

367	   Implementations that comply with [RFC7432] or [RFC8214] only (i.e.,
368	   implementations that predate this document) will not advertise the

[nit] predate this specification

369	   EVPN Layer 2 Attributes Control Flags extended community in Ethernet
370	   A-D per ES routes.  That means that all remote PEs in the ES will not
371	   receive P and B bit per ES and will continue to receive and honour

[major] Don't we need normative language to this effect in Sec 4 or 5 above?
[nit] honor

372	   the P and B bits received in Ethernet A-D per EVI route(s).
373	   Similarly, an implementation that complies with [RFC7432] or
374	   [RFC8214] only and that receives an EVPN Layer 2 Attributes Control
375	   Flags extended community will ignore it and will continue to use the
376	   default path resolution algorithm.

[minor] The Sec Cons section touches upon this, but it would be good to
describe here in brief the multi-homing/load-balancing mode that would result
with some reference pointers.
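
To make the precedence described in Sec 5.1 and 5.2 concrete, here is a
minimal sketch of how a remote PE might resolve the P and B bits: the per-ES
value overrides the per-EVI value, and the per-EVI value is used when no
per-ES community was received (the legacy case). The PBFlags type and the
effective_pb function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class PBFlags(NamedTuple):
    """P and B bits of the EVPN Layer 2 Attributes Control Flags."""
    primary: bool
    backup: bool


def effective_pb(
    per_es: Optional[PBFlags], per_evi: Optional[PBFlags]
) -> Optional[PBFlags]:
    """Resolve the P/B bits a remote PE acts on for one peer.

    per_es is None when the peer did not attach the community to its
    Ethernet A-D per ES route.
    """
    if per_es is not None:
        return per_es  # "parent" bits override the per-EVI bits
    return per_evi  # fall back to the per-EVI bits, if any
```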

378	6.  Applicability

[minor] Suggestion: Consider rolling the first half of this section into
Section 1 to give better context to the reader, and the second half into
Section 2.

380	   A common deployment is to provide L2 or L3 service on the PEs
381	   providing multi-homing.  The services could be any L2 EVPN such as
382	   EVPN VPWS, EVPN [RFC7432], etc.  L3 service could be in VPN context

[nit] a VPN

383	   [RFC4364] or in global routing context.  When a PE provides first hop

[nit] in a global

384	   routing, EVPN IRB could also be deployed on the PEs.  The mechanism
385	   defined in this document is used between the PEs providing L2 and/or
386	   L3 services, when per interface single-active load-balancing is
387	   desired.

389	   A possible alternate solution is the one described in this draft is
390	   MC-LAG with ICCP [RFC7275] active-standby redundancy.  However, ICCP
391	   requires LDP to be enabled as a transport of ICCP messages.  There
392	   are many scenarios where LDP is not required e.g. deployments with
393	   VXLAN or SRv6.  The solution defined in this draft with EVPN does not
394	   mandate the need to use LDP or ICCP and is independent of the
395	   underlay encapsulation.

397	7.  Overall Advantages

[minor] Suggestion: Consider moving this text up front to give the reader
better context on the benefits of, and reasons for, introducing this mode.

399	   The use of port-active multi-homing brings the following benefits to
400	   EVPN networks:

402	   a.  Open standards based per interface single-active load-balancing

[nit] standards-based

403	       mechanism that eliminates the need to run ICCP and LDP (e.g. they

[nit] e.g.,

404	       may be running VXLAN or SRv6 in the network).

406	   b.  Agnostic of underlay technology (MPLS, VXLAN, SRv6) and
407	       associated services (L2, L3, Bridging, E-LINE, etc).

409	   c.  Provides a way to enable deterministic QOS over MC-LAG attachment
410	       circuits.

412	   d.  Fully compliant with [RFC7432], does not require any new protocol
413	       enhancement to existing EVPN RFCs.

415	   e.  Can leverage various DF election algorithms e.g. modulo, HRW,
416	       etc.

418	   f.  Replaces legacy MC-LAG ICCP-based solution, and offers following

[nit] the following

419	       additional benefits:

421	       *  Efficiently supports 1+N redundancy mode (with EVPN using BGP
422	          RR) where as ICCP requires full mesh of LDP sessions among PEs
423	          in redundancy group.

[nit] whereas
[nit] requires a full
[nit] in the redundancy

425	       *  Fast convergence with mass-withdraw is possible with EVPN, no
426	          equivalent in ICCP.

428	8.  IANA Considerations

430	   This document solicits the allocation of the following values:

[major] Please specify that this is from the "BGP Extended Communities"
registry group

432	   *  Bit 5 in the [RFC8584] DF Election Capabilities registry, with
433	      name "P" for Port Mode Load-Balancing.

[minor] consider naming "P bit - XXX" so it is more descriptive.

435	9.  Security Considerations

437	   The same Security Considerations described in [RFC7432] and [RFC8584]
438	   are valid for this document.

440	   By introducing a new capability, a new requirement for unanimity (or
441	   lack thereof) between PEs is added.  Without consensus on the new DF
442	   election procedures and Port Mode, the DF election algorithm falls
443	   back to the default DF election as provided in [RFC8584] and
444	   [RFC7432].  This behavior could be exploited by an attacker that
445	   manages to modify the configuration of one PE in the ES so that the
446	   DF election algorithm and capabilities in all the PEs in the ES fall
447	   back to the default DF election.  If that is the case, the PEs will
448	   be exposed to the same unfair load balancing, service disruption, and
449	   possibly black-holing or duplicate traffic mentioned in those
450	   documents and their security sections.

[minor] If we are talking about attackers modifying configs, then would they
not do more harm by making the configs on the dual-homed PEs inconsistent?
Without a detection mechanism, the service impact may be far greater in this
case.

452	10.  Acknowledgements

509	12.2.  Informative References

511	   [I-D.ietf-bess-evpn-vpws-fxc]
512	              Sajassi, A., Brissette, P., Uttaro, J., Drake, J.,
513	              Boutros, S., and J. Rabadan, "EVPN VPWS Flexible Cross-
514	              Connect Service", Work in Progress, Internet-Draft, draft-
515	              ietf-bess-evpn-vpws-fxc-05, 8 February 2022,
516	              <
517	              vpws-fxc-05.txt>.

519	   [IEEE.802.1AX_2014]
520	              IEEE, "IEEE Standard for Local and metropolitan area
521	              networks -- Link Aggregation", IEEE 802.1AX-2014,
522	              DOI 10.1109/IEEESTD.2014.7055197, 24 December 2014,
523	              <
524	              opac?punumber=6997981>.

[major] Should the reference to MC-LAG not be normative since the document
talks about setting the port in the "out-of-sync" state?

526	   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
527	              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
528	              2006, <>.