BESS Workgroup J. Rabadan, Ed.
Internet Draft S. Sathappan
Intended status: Standards Track Alcatel-Lucent
T. Przygienda
Ericsson
W. Lin
T. Singh
Juniper Networks
A. Sajassi
S. Mohanty
Cisco Systems
Expires: May 28, 2016 November 25, 2015
Preference-based EVPN DF Election
draft-rabadan-bess-evpn-pref-df-00
Abstract
RFC7432 defines the Designated Forwarder (DF) in (PBB-)EVPN networks
as the PE responsible for sending broadcast, multicast and unknown
unicast traffic (BUM) to a multi-homed device/network in the case of
an all-active multi-homing ES, or BUM and unicast in the case of
single-active multi-homing.
The DF is selected out of a candidate list of PEs that advertise the
Ethernet Segment Identifier (ESI) to the EVPN network, according to
the 'service-carving' algorithm.
While 'service-carving' provides an efficient and automated way of
selecting the DF across different EVIs or ISIDs in the ES, there are
some use-cases where a more 'deterministic' and user-controlled
method is required. At the same time, Service Providers require an
easy way to force an on-demand DF switchover in order to carry out
some maintenance tasks on the existing DF or control whether a new
active PE can preempt the existing DF PE.
This document proposes an extension to the current RFC7432 DF
election procedures so that the above requirements can be met.
Status of this Memo
Rabadan et al. Expires May 28, 2016 [Page 1]
Internet-Draft Preference-based EVPN DF Election November 25, 2015
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on May 28, 2016.
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Problem Statement . . . . . . . . . . . . . . . . . . . . . . . 3
2. Solution requirements . . . . . . . . . . . . . . . . . . . . . 3
3. EVPN BGP Attributes for Deterministic DF Election . . . . . . . 4
4. Solution description . . . . . . . . . . . . . . . . . . . . . 5
4.1 Use of the Preference algorithm . . . . . . . . . . . . . . 5
4.2 Use of the Preference algorithm in RFC7432
Ethernet-Segments . . . . . . . . . . . . . . . . . . . . . 7
4.3 The Non-Revertive option . . . . . . . . . . . . . . . . . . 7
Rabadan et al. Expires May 28, 2016 [Page 2]
Internet-Draft Preference-based EVPN DF Election November 25, 2015
5. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . 9
11. Conventions used in this document . . . . . . . . . . . . . . 9
12. Security Considerations . . . . . . . . . . . . . . . . . . . 10
13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10
15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10
15.1 Normative References . . . . . . . . . . . . . . . . . . . 10
15.2 Informative References . . . . . . . . . . . . . . . . . . 10
16. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 10
17. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 10
17. Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 10
1. Problem Statement
RFC7432 defines the Designated Forwarder (DF) in (PBB-)EVPN networks
as the PE responsible for sending broadcast, multicast and unknown
unicast traffic (BUM) to a multi-homed device/network in the case of
an all-active multi-homing ES or BUM and unicast traffic to a multi-
homed device or network.
The DF is selected out of a candidate list of PEs that advertise the
Ethernet Segment Identifier (ESI) to the EVPN network and according
to the 'service-carving' algorithm.
While 'service-carving' provides an efficient and automated way of
selecting the DF across different EVIs or ISIDs in the ES, there are
some use-cases where a more 'deterministic' and user-controlled
method is required. At the same time, Service Providers require an
easy way to force an on-demand DF switchover in order to carry out
some maintenance tasks on the existing DF or control whether a new
active PE can preempt the existing DF PE.
This document proposes an extension to the current RFC7432 DF
election procedures so that the above requirements can be met.
2. Solution requirements
This document proposes an extension of the RFC7432 'service-carving'
DF election algorithm motivated by the following requirements:
a) The solution MUST provide an administrative preference option so
that the user can control in what order the candidate PEs may
become DF, assuming they are all operationally ready to take over.
b) This extension MUST work for RFC7432 Ethernet Segments (ES) and
Rabadan et al. Expires May 28, 2016 [Page 3]
Internet-Draft Preference-based EVPN DF Election November 25, 2015
virtual ES, as defined in [vES].
c) The user MUST be able to force a PE to preempt the existing DF for
a given EVI/ISID without re-configuring all the PEs in the ES.
d) The solution SHOULD allow an option to NOT preempt the current DF,
even if the former DF PE comes back up after a failure. This is
also known as "non-revertive" behavior, as opposed to the RFC7432
DF election procedures that are always revertive.
e) The solution MUST work for single-active and all-active multi-
homing Ethernet Segments.
3. EVPN BGP Attributes for Deterministic DF Election
This solution reuses and extends the DF Election Extended Community
defined in [EVPN-HRW-DF] that is advertised along with the ES route:
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type=0x06 | Sub-Type(TBD) | DF Type |DP| Reserved=0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Reserved = 0 | DF Preference (2 octets) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Where the following fields are re-defined as follows:
o DF Type can have the following values:
- Type 0 - Default, mod based DF election as per RFC7432.
- Type 1 - HRW algorithm as per [EVPN-HRW-DF]
- Type 2 - Preference algorithm (this document)
o DP or 'Don't Preempt' bit, determines if the PE advertising the ES
route requests the remote PEs in the ES not to preempt it as DF.
The default value is DP=0, which is compatible with the current
'preempt' or 'revertive' behavior in RFC7432. The DP bit SHOULD be
ignored if the DF Type is different than 2.
o DF Preference defines a 2-octet value that indicates the PE
preference to become the DF in the ES. The default value MUST be
32767. This value is the midpoint in the allowed Preference range
of values, which gives the operator the flexibility of choosing a
significant number of values, above or below the default
Preference.
Rabadan et al. Expires May 28, 2016 [Page 4]
Internet-Draft Preference-based EVPN DF Election November 25, 2015
4. Solution description
Figure 1 illustrates an example that will be used in the description
of the solution.
EVPN network
+-------------------+
| +-------+ ENNI Aggregation
| <---ESI1,500 | PE1 | /\ +----Network---+
| <-----ESI2,100 | |===||=== |
| | |===||== \ vES1 | +----+
+-----+ | | \/ |\----------------+CE1 |
CE3--+ PE4 | +-------+ | \ ------------+ |
+-----+ | | \ / | +----+
| | | X |
| <---ESI1,255 +-----+============ \ |
| <-----ESI2,200 | PE2 |========== \ vES2 | +----+
| +-----+ | \ ----------+CE2 |
| | | --------------| |
| +-----+ ----------------------+ |
| <-----ESI2,300 | PE3 +--/ | | +----+
| +-----+ +--------------+
--------------------+
Figure 1 ES and Deterministic DF Election
Figure 1 shows three PEs that are connecting EVCs coming from the
Aggregation Network to their EVIs in the EVPN network. CE1 is
connected to vES1 - that spans PE1 and PE2 - and CE2 is connected to
vES2, that is defined in PE1, PE2 and PE3.
If the algorithm chosen for vES1 and vES2 is type 2, i.e. Preference-
based, the PEs may become DF irrespective of their IP address and
based on an administrative Preference value. The following sections
provide some examples of the new defined procedures and how they are
applied in the use-case in Figure 1.
4.1 Use of the Preference algorithm
Assuming the operator wants to control - in a flexible way - what PE
becomes the DF for a given vES and the order in which the PEs become
DF in case of multiple failures, the following procedure may be used:
a) vES1 and vES2 are now configurable with three optional parameters
that are signaled in the DF Election extended community. These
parameters are the Preference, Preemption option (or "Don't
Rabadan et al. Expires May 28, 2016 [Page 5]
Internet-Draft Preference-based EVPN DF Election November 25, 2015
Preempt Me" option) and DF algorithm type. We will represent these
parameters as [Pref,DP,type]. Let's assume vES1 is configured as
[500,0,Pref] in PE1, and [255,0,Pref] in PE2. vES2 is configured
as [100,0,Pref], [200,0,Pref] and [300,0,Pref] in PE1, PE2 and PE3
respectively.
b) The PEs will advertise an ES route for each vES, including the 3
parameters in the DF Election Extended Community.
c) According to RFC7432, each PE will wait for the DF timer to expire
before running the DF election algorithm. After the timer expires,
each PE runs the Preference-based DF election algorithm as
follows:
o The PE will check the DF type in each ES route, and assuming all
the ES routes are consistent in this DF type and the value is 2
(Preference-based), the PE will run the new extended procedure.
Otherwise, the procedure will fall back to RFC7432 'service-
carving'.
o In this extended procedure, each PE builds a list of candidate
PEs, ordered based on the Preference. E.g. PE1 will build a list
of candidate PEs for vES1 ordered by the Preference, from high
to low: PE1>PE2. Hence PE1 will become the DF for vES1. In the
same way, PE3 becomes the DF for vES2.
d) Note that, by default, the Highest-Preference is chosen for each
ES or vES, however the ES configuration can be changed to the
Lowest-Preference algorithm as long as this option is consistent
in all the PEs in the ES. E.g. vES1 could have been explicitly
configured as type Preference-based with Lowest-Preference, in
which case, PE2 would have been the DF.
e) Assuming some maintenance tasks had to be executed on PE3, the
operator could set vES2's preference to e.g. 50 so that PE2 is
forced to take over as DF for vES2. Once the maintenance on PE3 is
over, the operator could decide to leave the existing preference
or configure the old preference back.
f) In case of equal Preference in two or more PEs in the ES, the tie-
breakers will be the DP bit and the lowest IP PE in that order.
For instance:
o If vES1 parameters were [500,0,Pref] in PE1 and [500,1,Pref] in
PE2, PE2 would be elected due to the DP bit.
o If vES1 parameters were [500,0,Pref] in PE1 and [500,0,Pref] in
PE2, PE1 would be elected, assuming PE1's IP address is lower
Rabadan et al. Expires May 28, 2016 [Page 6]
Internet-Draft Preference-based EVPN DF Election November 25, 2015
than PE2's.
g) The Preference is an administrative option that MUST be configured
on a per-ES basis from the management plane, but MAY also be
dynamically changed based on the use of local policies. For
instance, on PE1, ES1's Preference can be lowered from 500 to 100
in case the bandwidth on the ENNI port is decreased a 50% (that
could happen if e.g. the 2-port LAG between PE1 and the
Aggregation Network loses one port). Policies MAY also trigger
dynamic Preference changes based on the PE's bandwidth
availability in the core, of specific ports going operationally
down, etc. The definition of the actual local policies is out of
scope of this document. The default Preference value is 32767.
4.2 Use of the Preference algorithm in RFC7432 Ethernet-Segments
While the Preference-based DF type described in section 4.1 is
typically used in virtual ES scenarios where there is normally an
individual EVI per vES, the existing RFC7432 definition of ES allows
potentially up to thousands of EVIs on the same ES. If this is the
case, and the operator still wants to control who the DF is for a
given EVI, the use of the Preference-based DF type can also provide
the desired level of load balancing.
In this type of scenarios, the ES is configured with an
administrative Preference value, but then a range of EVI/ISIDs can be
defined to use the Highest-Preference or the Lowest-Preference
depending on the desired behavior. With this option, the PE will
build a list of candidate PEs ordered by the Preference, however the
DF for a given EVI/ISID will be determined by the local
configuration.
For instance:
o Assuming ES3 is defined in PE1 and PE2, PE1 may be configured as
[500,0,Preference] for ES3 and PE2 as [100,0,Preference].
o In addition, assuming vlan-based service interfaces, the PEs will
be configured with (vlan/ISID-range,high_or_low), e.g. (1-
2000,high) and (2001-4000, low).
o This will result in PE1 being DF for EVI/ISIDs 1-2000 and PE2 being
DF for EVI/ISIDs 2001-4000.
4.3 The Non-Revertive option
As discussed in section 2(d), an option to NOT preempt the existing
Rabadan et al. Expires May 28, 2016 [Page 7]
Internet-Draft Preference-based EVPN DF Election November 25, 2015
DF for a given EVI/ISID is required and therefore added to the DF
Election extended community. This option will allow a non-revertive
behavior in the DF election.
Note that, when a given PE in an ES is taken down for maintenance
operations, before bringing it back, the Preference may be changed in
order to provide a non-revertive behavior. The DP bit and the
mechanism explained in this section will be used for those cases when
a former DF comes back up without any controlled maintenance
operation, and the non-revertive option is desired in order to avoid
service impact.
In Figure 1, we assume that based on the Highest-Pref, PE3 is the DF
for ESI2.
If PE3 has a link, EVC or node failure, PE2 would take over as DF.
If/when PE3 comes back up again, PE3 will take over, causing some
unnecessary packet loss in the ES.
The following procedure avoids preemption upon failure recovery
(please refer to Figure 1):
1) A new "Don't Preempt Me" parameter is defined on a per-PE per-ES
basis. If "Don't Preempt Me" is disabled (default behavior) the
advertised DP bit will be 0. If "Don't Preempt Me" is enabled on a
PE, the ES route will be advertised with DP=1 ("Don't Preempt Me")
once the DF timer is expired and the DF elected.
2) Assuming we want to avoid 'preemption', the three PEs are
configured with the "Don't Preempt Me" option. Note that each PE
individually MAY be configured with different preemption value. In
this example, we assume ESI2 is configured as 'DP=disabled' in PE1
but 'DP=enabled' in PE2 and PE3.
3) When ES2 is enabled in the three PEs, and after the DF timer, the
PEs (due to the Highest-Pref type) select PE3 as DF for EVI1. Only
after the timer and the DF election, the PEs will check the 'DP'
configuration and since it is enabled on PE2 and PE3, these two
PEs will send an ES route update, now with DP=1. This update will
not cause any change in the existing DFs since there is no change
in the Preference value.
4) If PE3's vES2 goes down (due to EVC failure - detected by OAM, or
port failure or node failure), PE2 will become the DF for
ESI2/EVI1.
5) When PE3's vES2 comes back up, PE3 will start a boot-timer (if
booting up) or hold-timer (if the port or EVC recovers). That
Rabadan et al. Expires May 28, 2016 [Page 8]
Internet-Draft Preference-based EVPN DF Election November 25, 2015
timer will allow some time for PE3 to receive the ES routes from
PE1 and PE2. PE3 will then check its own
[Pref,DP,type]=[300,1,Pref] and if its Pref is higher than any of
the other PE's Pref, then PE3 will send the ES route with an 'in-
use' Preference equal to the highest received Preference. In this
case, since PE2 advertised [Pref,DP,type]=[200,1,Pref], PE3 will
then send [200,0,Pref].
Note that, a PE will always send DP=0 the first time it advertises
an ES route after the ES becomes active, and irrespective of the
configuration. Also a PE will always send DP=0 as long as the
advertised Pref is the 'in-use' Pref (as opposed to the 'admin'
Pref).
This ES route update sent by PE3 (with [200,0,Pref]) will not
cause any changes in the DF election and PE2 will continue being
DF. This is because the DP bit will be used as a tie-breaker in
the DF election. That is, if a PE has two candidate PEs with the
same Pref, it will pick up the one with DP=1.
6) Only in case of PE2's failure, PE3 will become DF again (assuming
it wins the DF election to PE1), and will resend the ES route with
the admin Pref (as opposed to the 'in-use' Pref) and the DP bit
that corresponds to its configuration.
5. Conclusions
Service Providers are seeking for options where the DF election can
be controlled by the user in a deterministic way and with a non-
revertive behavior. This document defines the use of a Preference
algorithm that can be configured and used in a flexible manner to
achieve those objectives.
11. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [RFC2119].
In this document, these words will appear with that interpretation
only when in ALL CAPS. Lower case uses of these words are not to be
interpreted as carrying RFC-2119 significance.
In this document, the characters ">>" preceding an indented line(s)
indicates a compliance requirement statement using the key words
listed above. This convention aids reviewers in quickly identifying
Rabadan et al. Expires May 28, 2016 [Page 9]
Internet-Draft Preference-based EVPN DF Election November 25, 2015
or finding the explicit compliance requirements of this RFC.
12. Security Considerations
This section will be added in future versions.
13. IANA Considerations
This document solicits the allocation of DF type = 2 in the registry
created by [vES] for the DF type field.
15. References
15.1 Normative References
[RFC7432]Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet
VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015, <http://www.rfc-
editor.org/info/rfc7432>.
15.2 Informative References
[vES] Sajassi et al. "EVPN Virtual Ethernet Segment", draft-sajassi-
bess-evpn-virtual-eth-segment-01, work-in-progress, July 6, 2015.
[EVPN-HRW-DF] Mohanty S. et al. "A new Designated Forwarder Election
for the EVPN", draft-mohanty-bess-evpn-df-election-02, work-in-
progress, October 19, 2015.
16. Acknowledgments
17. Contributors
In addition to the authors listed, the following individuals also
contributed to this document:
Vinod Prabhu, ALU
Kiran Nagaraj, ALU
John Drake, Juniper
Selvakumar Sivaraj, Juniper
17. Authors' Addresses
Rabadan et al. Expires May 28, 2016 [Page 10]
Internet-Draft Preference-based EVPN DF Election November 25, 2015
Jorge Rabadan
Alcatel-Lucent
777 E. Middlefield Road
Mountain View, CA 94043 USA
Email: jorge.rabadan@alcatel-lucent.com
Senthil Sathappan
Alcatel-Lucent
Email: senthil.sathappan@alcatel-lucent.com
Tony Przygienda
Ericsson
Email: antoni.przygienda@ericsson.com
Tapraj Singh
Juniper Networks, Inc.
Email: tsingh@juniper.net
Wen Lin
Juniper Networks, Inc.
Email: wlin@juniper.net
Ali Sajassi
Cisco Systems, Inc.
Email: sajassi@cisco.com
Satya Ranjan Mohanty
Cisco Systems, Inc.
Email: satyamoh@cisco.com
Rabadan et al. Expires May 28, 2016 [Page 11]