Optimized Ingress Replication Solution for Ethernet VPN (EVPN)
draft-ietf-bess-evpn-optimized-ir-12

Note: This ballot was opened for revision 09 and is now closed.

Erik Kline
No Objection
Francesca Palombini
No Objection
Comment (2021-10-21 for -09) Sent
Thanks for the work on this document.

I strongly support John's DISCUSS and COMMENT points, especially:

* overuse of abbreviations and the assumption that the reader is familiar with all concepts and terms used make the document really hard to read for non-experts in the field. I'll also point out that a terminology section with expansions but no references is not as useful as one with proper descriptions and references.

* there are several occurrences of SHOULD in the document that left me wondering in which cases it is acceptable not to follow the recommended behavior. Again, I believe John has pointed out 2 of them, but I would suggest checking all normative SHOULDs, as only a couple were put in context. Also note that even if the context might be obvious to the authors and the WG, one more sentence clarifying it would help the reader.

Francesca
John Scudder
(was Discuss) No Objection
Comment (2022-01-06 for -11) Sent
Thanks for the updates!

Overall comments:

1. This document suffers from what I think is an overuse of
   abbreviations.  See https://www.psychologicalscience.org/observer/alienating-the-audience-how-abbreviations-hamper-scientific-communication
   for one perspective on why this is problematic.  Any individual
   one of these doesn't rise to the level of being objectionable, but in 
   aggregate at some point it makes the document a lot less accessible to
   anyone who isn't part of the in-group who has memorized the abbreviations.
   4r xm, <- ... is !@? to rd, 4r no gd rn, [see terminology section below]
   even though anyone who goes to the effort of looking up the terminology
   can decode it.  I would really prefer it if this were improved; I think 
   it's not that much work for the authors and will make the resulting spec 
   more usable.  I had intended to offer an example edit that expands many
   of the abbreviations, but have run out of time; I'd still be willing to
   do it later if requested, let me know.
   
   (Consider also the contrast with RFC 6514; for instance instead of 
   referring to "the L-flag", when mentioning that flag it says "the Leaf
   Information Required flag".  Since we don't pay by the byte for publishing
   our documents, it seems to be worth spending a few more keystrokes to 
   make it easier to read them.)
   
2. The document starts in the middle.  It jumps right from the requirements
   to the tunnel attribute diagram, with no overview or outline of the 
   solution.  This is related to Pascal's review comment, mentioned by Éric
   Vyncke.
   
Terminology:
   
   4r: for
   xm: example
   <-: this
   ...: sentence
   !@?: difficult
   rd: read
   gd: good
   rn: reason
   
Detailed review:

I’ve done my comments in the form of an edited copy of the draft.  I
don't think the datatracker tooling allows me to use attachments, so
I'll follow up to this with an email with attached edited copy, as well
as a PDF of the rfcdiff output for your convenience if you’d like to use
it. I’ve also pasted a traditional diff below to capture the comments
for the record and in case you want to use it for in-line reply. I’d
appreciate feedback regarding whether you found this a useful way to
receive my comments as compared to a more traditional numbered list of
comments with selective quotation from the draft.


*** draft-ietf-bess-evpn-optimized-ir-09.txt	2021-10-20 13:48:15.000000000 -0400
--- draft-ietf-bess-evpn-optimized-ir-09-jgs-markup.txt	2021-10-20 20:39:39.000000000 -0400
***************
*** 19,25 ****
  
  Abstract
  
!    Network Virtualization Overlay (NVO) networks using EVPN as control
     plane may use Ingress Replication (IR) or PIM (Protocol Independent
     Multicast) based trees to convey the overlay Broadcast, Unknown
     unicast and Multicast (BUM) traffic.  PIM provides an efficient
--- 19,25 ----
  
  Abstract
  
!    Network Virtualization Overlay (NVO) networks using EVPN as their control
     plane may use Ingress Replication (IR) or PIM (Protocol Independent
     Multicast) based trees to convey the overlay Broadcast, Unknown
     unicast and Multicast (BUM) traffic.  PIM provides an efficient
***************
*** 105,111 ****
  
     Ethernet Virtual Private Networks (EVPN) may be used as the control
     plane for a Network Virtualization Overlay (NVO) network.  Network
!    Virtualization Edge (NVE) devices and Provider Edges (PEs) that are
  
  
  
--- 105,111 ----
  
     Ethernet Virtual Private Networks (EVPN) may be used as the control
     plane for a Network Virtualization Overlay (NVO) network.  Network
!    Virtualization Edge (NVE) and Provider Edge (PEs) devices that are
  
  
  
***************
*** 182,187 ****
--- 182,191 ----
     "OPTIONAL" in this document are to be interpreted as described in BCP
     14 [RFC2119] [RFC8174] when, and only when, they appear in all
     capitals, as shown here.
+    
+ Is there any logic to the order in which the terms are presented? If so,
+ it escaped me. It would have been much better for my reading of the document
+ if the terms had been given in alphabetical order, for obvious reasons.
  
     The following terminology is used throughout the document:
  
***************
*** 236,241 ****
--- 240,247 ----
        Replicator-AR route.  It is used to identify the ingress packets
        that must follow AR procedures ONLY in the Single-IP AR-REPLICATOR
        case.
+       
+ A reference to section 8 would be helpful in the above.
  
     -  IR-VNI: VNI advertised along with the RT-3 for IR.
  
***************
*** 288,296 ****
--- 294,313 ----
     hereafter) meets the following requirements:
  
     a.  It provides an IR optimization for BM (Broadcast and Multicast)
+    
+ Thank you for expanding "BM", but... you've already defined it in your
+ Terminology section, so maybe you don't need to define it again. (But see
+ also my general comment on the subject of abbreviations in general; 
+ depending on how we resolve that this comment may be overtaken by events.)
+ 
         traffic without the need for PIM, while preserving the packet
         order for unicast applications, i.e., known and unknown unicast
         traffic should follow the same path.  This optimization is
+        
+ ... the same path as what? If you mean unknown should follow the same path
+ as known, then use "... i.e., unknown unicast traffic should follow the same
+ path as known unicast traffic". If you mean something different, what is it?
+ 
         required in low-performance NVEs.
  
     b.  It reduces the flooded traffic in NVO networks where some NVEs do
***************
*** 361,369 ****
--- 378,403 ----
  
     The Flags field is 8 bits long.  This document defines the use of 4
     bits of this Flags field:
+    
+ It would be quite helpful to include a diagram of the Flags field as in
+ RFC 6514 §5:
+ 
+    The Flags field has the following format:
+ 
+        0 1 2 3 4 5 6 7
+       +-+-+-+-+-+-+-+-+
+       |  reserved   |L|
+       +-+-+-+-+-+-+-+-+
+ 
+ except of course with all the new and previously-defined flags filled
+ in too.
  
     -  bits 3 and 4, forming together the Assisted-Replication Type (T)
        field
+       
+ Up here you call it the Assisted-Replication Type field.  Just a few lines
+ later you call it the AR Type field.  Can you make up your mind and use 
+ one or the other, please?
  
     -  bit 5, called the Broadcast and Multicast (BM) flag
  
***************
*** 406,411 ****
--- 440,448 ----
     -  Flag L is an existing flag defined in [RFC6514] (L=Leaf
        Information Required) and it will be used only in the Selective AR
        Solution.
+       
+ I think it would be nice to provide the bit position for this flag, as in
+ "(L=Leaf Information Required, bit 7)"
  
     Please refer to Section 11 for the IANA considerations related to the
     PTA flags.
***************
*** 420,436 ****
--- 457,497 ----
        address that we denominate IR-IP in this document.  When
        advertised by an AR-LEAF node, the Regular-IR route SHOULD be
        advertised with type T= AR-LEAF.
+       
+ Your use of SHOULD implies there is at least one case where a reasonable
+ implementation could choose to advertise a Regular-IR route from an 
+ AR-LEAF node with a different type.  I am left to guess what the case is,
+ and what value it should choose then.  Maybe it would use RNVE instead?
+ Please say something about this.  On the other hand if there isn't any 
+ such case, this should be a MUST.
  
     -  Replicator-AR route: this route is used by the AR-REPLICATOR to
        advertise its AR capabilities, with the fields set as follows:
  
        o  Originating Router's IP Address MUST be set to an IP address of
           the PE that should be common to all the EVIs on the PE (usually
+          
+ What's "the PE" in this context?  I'm assuming it means "the advertising
+ router".  If that's right, please say that instead of "the PE".
+ 
           this is the PE's loopback address).  The Tunnel Identifier and
           Next-Hop SHOULD be set to the same IP address as the
           Originating Router's IP address when the NVE/PE originates the
           route.  The Next-Hop address is referred to as the AR-IP and
           SHOULD be different than the IR-IP for a given PE/NVE.
+          
+ Similar question to my earlier one about the two SHOULDs above.  I guess
+ in the case of the second SHOULD, it MAY be the same in the case of a 
+ router unable to support two different IP addresses for this purpose, in
+ which case the procedures of Section 8 MUST be applied?  If that's right,
+ please add language to that effect.  
+ 
+ As for the first SHOULD, does this imply that the Tunnel Identifier and
+ Next-Hop MAY be set to the IP address of some other router?  
+ 
+ Also, "when the NVE/PE originates the route" -- in this section aren't
+ we always talking about the NVE/PE originating the route?  This clause
+ makes me think there is another case, but I can't figure out what it is.
  
        o  Tunnel Type = Assisted-Replication Tunnel.  Section 11 provides
           the allocated type value.
***************
*** 440,446 ****
        o  L (Leaf Information Required) = 0 (for non-selective AR) or 1
           (for selective AR).
  
!    In addition, this document also uses the Leaf A-D route (RT-11)
     defined in [I-D.ietf-bess-evpn-bum-procedure-updates] in case the
  
  
--- 501,507 ----
        o  L (Leaf Information Required) = 0 (for non-selective AR) or 1
           (for selective AR).
  
!    In addition, this document also uses the Leaf Auto-Discovery (A-D) route (RT-11)
     defined in [I-D.ietf-bess-evpn-bum-procedure-updates] in case the
  
  
***************
*** 452,457 ****
--- 513,522 ----
  
     selective AR mode is used.  The Leaf A-D route MAY be used by the AR-
     LEAF in response to a Replicator-AR route (with the L flag set) to
+    
+ The above is ambiguous.  Maybe "An AR-LEAF MAY send a Leaf A-D route in
+ response to reception of a Replicator-AR route whose L flag is set."?
+ 
     advertise its desire to receive the BM traffic from a specific AR-
     REPLICATOR.  It is only used for selective AR and its fields are set
     as follows:
***************
*** 459,466 ****
--- 524,538 ----
  
  
        o  Originating Router's IP Address is set to the advertising PE's
+       
+ What's "the PE" in this context?  I'm assuming it means "the advertising
+ router".  If that's right, please say that instead of "the PE".
+ 
           IP address (same IP used by the AR-LEAF in regular-IR routes).
           The Next-Hop address is set to the IR-IP.
+          
+ ... and the IR-IP is different from the "advertising PE's IP address" I
+ guess?
  
        o  Route Key is the "Route Type Specific" NLRI of the Replicator-
           AR route for which this Leaf A-D route is generated.
***************
*** 477,483 ****
  
        o  The Leaf A-D route MUST include the PMSI Tunnel attribute with
           the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel
!          Identifier set to the IP of the advertising AR-LEAF.  The PMSI
           Tunnel attribute MUST carry a downstream-assigned MPLS label or
           VNI that is used by the AR-REPLICATOR to send traffic to the
           AR-LEAF.
--- 549,555 ----
  
        o  The Leaf A-D route MUST include the PMSI Tunnel attribute with
           the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel
!          Identifier set to the IP address of the advertising AR-LEAF.  The PMSI
           Tunnel attribute MUST carry a downstream-assigned MPLS label or
           VNI that is used by the AR-REPLICATOR to send traffic to the
           AR-LEAF.
***************
*** 488,494 ****
  
     Each node attached to the BD may understand and process the BM/U
     flags.  Note that these BM/U flags may be used to optimize the
!    delivery of multi-destination traffic and its use SHOULD be an
     administrative choice, and independent of the AR role.
  
     Non-optimized-IR nodes will be unaware of the new PMSI attribute flag
--- 560,566 ----
  
     Each node attached to the BD may understand and process the BM/U
     flags.  Note that these BM/U flags may be used to optimize the
!    delivery of multi-destination traffic and their use SHOULD be an
     administrative choice, and independent of the AR role.
  
     Non-optimized-IR nodes will be unaware of the new PMSI attribute flag
***************
*** 512,518 ****
     AR function is enabled.  Three different roles are defined for a
     given BD: AR-REPLICATOR, AR-LEAF and RNVE (Regular NVE).  The
     solution is called "non-selective" because the chosen AR-REPLICATOR
!    for a given flow MUST replicate the BM traffic to 'all' the NVE/PEs
     in the BD except for the source NVE/PE.
  
                             (           )
--- 584,590 ----
     AR function is enabled.  Three different roles are defined for a
     given BD: AR-REPLICATOR, AR-LEAF and RNVE (Regular NVE).  The
     solution is called "non-selective" because the chosen AR-REPLICATOR
!    for a given flow MUST replicate the BM traffic to all the NVE/PEs
     in the BD except for the source NVE/PE.
  
                             (           )
***************
*** 567,572 ****
--- 639,658 ----
     An AR-REPLICATOR is defined as an NVE/PE capable of replicating
     ingress BM (Broadcast and Multicast) traffic received on an overlay
     tunnel to other overlay tunnels and local Attachment Circuits (ACs).
+    
+ This is different from the definition you have in the terminology section,
+ which is:
+ 
+    -  AR-REPLICATOR: Assisted Replication - REPLICATOR, refers to an
+       NVE/PE that can replicate Broadcast or Multicast traffic received
+       on overlay tunnels to other overlay tunnels.
+       
+ In the definition here, you mention local attachment circuits, in §2 you
+ don't.  Probably you should harmonize these definitions.  Having done so,
+ it's not clear to me that you need to repeat the definition here (though
+ if you think you need to remind the reader of what you already told them, 
+ it's OK).
+    
     The AR-REPLICATOR signals its role in the control plane and
     understands where the other roles (AR-LEAF nodes, RNVEs and other AR-
     REPLICATORs) are located.  A given AR-enabled BD service may have
***************
*** 584,608 ****
         generate a Regular-IR route if it does not have local attachment
         circuits (AC).  If the Regular-IR route is advertised, the AR
         Type field is set to zero.
  
     c.  The Replicator-AR and Regular-IR routes are generated according
         to section 3.  The AR-IP and IR-IP used by the AR-REPLICATOR are
         different routable IP addresses.
  
     d.  When a node defined as AR-REPLICATOR receives a BM packet on an
!        overlay tunnel, it will do a tunnel destination IP lookup and
         apply the following procedures:
  
!        o  If the destination IP is the AR-REPLICATOR IR-IP Address the
            node will process the packet normally as in [RFC7432].
  
!        o  If the destination IP is the AR-REPLICATOR AR-IP Address the
            node MUST replicate the packet to local ACs and overlay
            tunnels (excluding the overlay tunnel to the source of the
            packet).  When replicating to remote AR-REPLICATORs the tunnel
!           destination IP will be an IR-IP.  That will be an indication
            for the remote AR-REPLICATOR that it MUST NOT replicate to
!           overlay tunnels.  The tunnel source IP used by the AR-
            REPLICATOR MUST be its IR-IP when replicating to either AR-
            REPLICATOR or AR-LEAF nodes.
  
--- 670,705 ----
         generate a Regular-IR route if it does not have local attachment
         circuits (AC).  If the Regular-IR route is advertised, the AR
         Type field is set to zero.
+        
+ Do you mean "... the AR Type field of the Replicator-AR route MUST be 
+ set to zero"?  If so, please say that.
  
     c.  The Replicator-AR and Regular-IR routes are generated according
         to section 3.  The AR-IP and IR-IP used by the AR-REPLICATOR are
+ 
+ I think you mean Section 4?
+ 
         different routable IP addresses.
+        
+ I think you'll find that "routable IP address" isn't a well-defined
+ term (for example I'm sure you're NOT talking specifically about non-
+ RFC 1918 addresses).  Can you choose different language here to say 
+ what you mean?
  
     d.  When a node defined as AR-REPLICATOR receives a BM packet on an
!        overlay tunnel, it will do a tunnel destination IP address lookup and
         apply the following procedures:
  
!        o  If the destination IP address is the AR-REPLICATOR IR-IP Address the
            node will process the packet normally as in [RFC7432].
  
!        o  If the destination IP address is the AR-REPLICATOR AR-IP Address the
            node MUST replicate the packet to local ACs and overlay
            tunnels (excluding the overlay tunnel to the source of the
            packet).  When replicating to remote AR-REPLICATORs the tunnel
!           destination IP address will be an IR-IP.  That will be an indication
            for the remote AR-REPLICATOR that it MUST NOT replicate to
!           overlay tunnels.  The tunnel source IP address used by the AR-
            REPLICATOR MUST be its IR-IP when replicating to either AR-
            REPLICATOR or AR-LEAF nodes.
  
***************
*** 628,642 ****
        and remote NVE/PEs), skipping the non-BM overlay tunnels.
  
     -  When an AR-REPLICATOR receives a BM packet on an overlay tunnel,
!       it will check the destination IP of the underlay IP header and:
  
!       o  If the destination IP matches its AR-IP, the AR-REPLICATOR will
           forward the BM packet to its flooding list (ACs and overlay
           tunnels) excluding the non-BM overlay tunnels.  The AR-
!          REPLICATOR will do source squelching to ensure the traffic is
           not sent back to the originating AR-LEAF.
  
!       o  If the destination IP matches its IR-IP, the AR-REPLICATOR will
           skip all the overlay tunnels from the flooding list, i.e.  it
           will only replicate to local ACs.  This is the regular IR
           behavior described in [RFC7432].
--- 725,742 ----
        and remote NVE/PEs), skipping the non-BM overlay tunnels.
  
     -  When an AR-REPLICATOR receives a BM packet on an overlay tunnel,
!       it will check the destination IP address of the underlay IP header and:
  
!       o  If the destination IP address matches its AR-IP, the AR-REPLICATOR will
           forward the BM packet to its flooding list (ACs and overlay
           tunnels) excluding the non-BM overlay tunnels.  The AR-
!          REPLICATOR will ensure the traffic is
           not sent back to the originating AR-LEAF.
+     
+ Above, I suggested the removal of "do source squelching" since AFAICT
+ it removes jargon while leaving the intention clear. 
  
!       o  If the destination IP address matches its IR-IP, the AR-REPLICATOR will
           skip all the overlay tunnels from the flooding list, i.e.  it
           will only replicate to local ACs.  This is the regular IR
           behavior described in [RFC7432].
***************
*** 645,650 ****
--- 745,754 ----
        is different for BM traffic, as far as Unknown unicast traffic
        forwarding is concerned, AR-LEAF nodes behave exactly in the same
        way as AR-REPLICATORs do.
+       
+ I'm unclear why you're defining the behavior of AR-LEAF nodes here, when
+ you started by saying "An AR-REPLICATOR will follow..."  Surely, defining
+ AR-LEAF behavior here is misplaced?
  
     -  The AR-REPLICATOR/LEAF nodes will build an Unknown unicast flood-
        list composed of ACs and overlay tunnels to the IR-IP Addresses of
***************
*** 655,660 ****
--- 759,767 ----
        o  When an AR-REPLICATOR/LEAF receives an unknown packet on an AC,
           it will forward the unknown packet to its flood-list, skipping
           the non-U overlay tunnels.
+          
+ Possibly the term "unknown packet" is well-understood by the target 
+ audience, but I think it needs either an explanation or a reference here.
  
        o  When an AR-REPLICATOR/LEAF receives an unknown packet on an
           overlay tunnel will forward the unknown packet to its local ACs
***************
*** 688,696 ****
     b.  In this non-selective AR solution, the AR-LEAF MUST advertise a
         single Regular-IR inclusive multicast route as in [RFC7432].  The
         AR-LEAF SHOULD set the AR Type field to AR-LEAF.  Note that
!        although this flag does not make any difference for the egress
         nodes when creating an EVPN destination to the AR-LEAF, it is
!        RECOMMENDED to use this flag for an easy operation and
         troubleshooting of the BD.
  
     c.  In a service where there are no AR-REPLICATORs, the AR-LEAF MUST
--- 795,803 ----
     b.  In this non-selective AR solution, the AR-LEAF MUST advertise a
         single Regular-IR inclusive multicast route as in [RFC7432].  The
         AR-LEAF SHOULD set the AR Type field to AR-LEAF.  Note that
!        although this field does not make any difference for the egress
         nodes when creating an EVPN destination to the AR-LEAF, it is
!        RECOMMENDED to use this field for an easy operation and
         troubleshooting of the BD.
  
     c.  In a service where there are no AR-REPLICATORs, the AR-LEAF MUST
***************
*** 701,706 ****
--- 808,816 ----
         IGP or any other detection mechanism).  Ingress replication MUST
         use the forwarding information given by the remote Regular-IR
         Inclusive Multicast Routes as described in [RFC7432].
+        
+ I found the above paragraph to be confusing.  Does it boil down to,
+ if there are no AR-REPLICATORS, use regular IR?
  
     d.  In a service where there is one or more AR-REPLICATORs (based on
         the received Replicator-AR routes for the BD), the AR-LEAF can
***************
*** 709,720 ****
         o  A single AR-REPLICATOR MAY be selected for all the BM packets
            received on the AR-LEAF attachment circuits (ACs) for a given
            BD.  This selection is a local decision and it does not have
!           to match other AR-LEAF's selection within the same BD.
  
         o  An AR-LEAF MAY select more than one AR-REPLICATOR and do
            either per-flow or per-BD load balancing.
  
!        o  In case of a failure on the selected AR-REPLICATOR, another
            AR-REPLICATOR will be selected.
  
         o  When an AR-REPLICATOR is selected, the AR-LEAF MUST send all
--- 819,830 ----
         o  A single AR-REPLICATOR MAY be selected for all the BM packets
            received on the AR-LEAF attachment circuits (ACs) for a given
            BD.  This selection is a local decision and it does not have
!           to match other AR-LEAFs' selections within the same BD.
  
         o  An AR-LEAF MAY select more than one AR-REPLICATOR and do
            either per-flow or per-BD load balancing.
  
!        o  In case of a failure of the selected AR-REPLICATOR, another
            AR-REPLICATOR will be selected.
  
         o  When an AR-REPLICATOR is selected, the AR-LEAF MUST send all
***************
*** 752,757 ****
--- 862,874 ----
         to the AR-REPLICATOR and be programmed.  While the AR-REPLICATOR-
         activation-time is running, the AR-LEAF node will use regular
         ingress replication.
+        
+ Probably you should say something about the case where a router has
+ selected its preferred AR-REPLICATOR from the set that are available,
+ and then a new AR-REPLICATOR shows up that is more preferable.  Should
+ the router shift to the new, preferred replicator?  Should it stick 
+ with the one it was already using even though less-preferred?  Is it a
+ matter of local policy?
  
     An AR-LEAF will follow a data path implementation compatible with the
     following rules:
***************
*** 849,874 ****
         REPLICATORs will fall back to non-selective AR mode.
  
     c.  The Selective AR-REPLICATOR MUST follow the procedures described
!        in section Section 5.1, except for the following differences:
  
         o  The Replicator-AR route MUST include L=1 (Leaf Information
            Required) in the Replicator-AR route.  This flag is used by
            the AR-REPLICATORs to advertise their 'selective' AR-
            REPLICATOR capabilities.  In addition, the AR-REPLICATOR auto-
            configures its IP-address-specific import route-target as
!           described in section Section 4.
  
         o  The AR-REPLICATOR will build a 'selective' AR-LEAF-set with
            the list of nodes that requested replication to its own AR-IP.
            For instance, assuming NVE1 and NVE2 advertise a Leaf A-D
            route with PE1's IP-address-specific route-target and NVE3
            advertises a Leaf A-D route with PE2's IP-address-specific
!           route-target, PE1 MUST only add NVE1/NVE2 to its selective AR-
!           LEAF-set for BD-1, and exclude NVE3.
  
!        o  When a node defined and operating as Selective AR-REPLICATOR
            receives a packet on an overlay tunnel, it will do a tunnel
!           destination IP lookup and if the destination IP is the AR-
            REPLICATOR AR-IP Address, the node MUST replicate the packet
            to:
  
--- 966,997 ----
         REPLICATORs will fall back to non-selective AR mode.
  
     c.  The Selective AR-REPLICATOR MUST follow the procedures described
!        in Section 5.1, except for the following differences:
  
         o  The Replicator-AR route MUST include L=1 (Leaf Information
            Required) in the Replicator-AR route.  This flag is used by
            the AR-REPLICATORs to advertise their 'selective' AR-
            REPLICATOR capabilities.  In addition, the AR-REPLICATOR auto-
            configures its IP-address-specific import route-target as
!           described in the third bullet of the procedures for Leaf A-D 
!           route in Section 4.
  
         o  The AR-REPLICATOR will build a 'selective' AR-LEAF-set with
            the list of nodes that requested replication to its own AR-IP.
            For instance, assuming NVE1 and NVE2 advertise a Leaf A-D
            route with PE1's IP-address-specific route-target and NVE3
            advertises a Leaf A-D route with PE2's IP-address-specific
!           route-target, PE1 will only add NVE1/NVE2 to its selective AR-
!           LEAF-set for BD-1, and exclude NVE3.  Likewise, PE2 will only
!           add NVE3 to its selective AR-LEAF-set for BD-1, and exclude
!           NVE1/NVE2.
!           
! I changed the MUST to "will" above -- it's an example, and it's inappropriate
! to use RFC 2119 type keywords in it.
  
!        o  When a node defined and operating as a Selective AR-REPLICATOR
            receives a packet on an overlay tunnel, it will do a tunnel
!           destination IP lookup and if the destination IP address is the AR-
            REPLICATOR AR-IP Address, the node MUST replicate the packet
            to:
  
***************
*** 878,893 ****
               overlay tunnel to the source AR-LEAF).
  
            +  overlay tunnels to the RNVEs if the tunnel source IP is the
!              IR-IP of an AR-LEAF (in any other case, the AR-REPLICATOR
!              MUST NOT replicate the BM traffic to remote RNVEs).  In
               other words, only the first-hop selective AR-REPLICATOR
               will replicate to all the RNVEs.
  
            +  overlay tunnels to the remote Selective AR-REPLICATORs if
!              the tunnel source IP is an IR-IP of its own AR-LEAF-set (in
               any other case, the AR-REPLICATOR MUST NOT replicate the BM
!              traffic to remote AR-REPLICATORs), where the tunnel
!              destination IP is the AR-IP of the remote Selective AR-
               REPLICATOR.  The tunnel destination IP AR-IP will be an
  
  
--- 1001,1016 ----
               overlay tunnel to the source AR-LEAF).
  
            +  overlay tunnels to the RNVEs if the tunnel source IP is the
!              IR-IP of an AR-LEAF.  In any other case, the AR-REPLICATOR
!              MUST NOT replicate the BM traffic to remote RNVEs.  In
               other words, only the first-hop selective AR-REPLICATOR
               will replicate to all the RNVEs.
  
            +  overlay tunnels to the remote Selective AR-REPLICATORs if
!              the tunnel source IP address is an IR-IP of its own AR-LEAF-set.  In
               any other case, the AR-REPLICATOR MUST NOT replicate the BM
!              traffic to remote AR-REPLICATORs.  When doing this replication, the tunnel
!              destination IP address is the AR-IP of the remote Selective AR-
               REPLICATOR.  The tunnel destination IP AR-IP will be an
  
  
***************
*** 911,916 ****
--- 1034,1042 ----
            destination IP addresses.  Some of those overlay tunnels MAY
            be flagged as non-BM receivers based on the BM flag received
            from the remote nodes in the BD.
+           
+ Why would you include "overlay tunnels ... flagged as non-BM receivers"
+ in a flood-list that's used for flooding BM traffic?
  
        2.  Flood-list #2 - composed of ACs, a Selective AR-LEAF-set and a
            Selective AR-REPLICATOR-set, where:
***************
*** 928,945 ****
     -  When a Selective AR-REPLICATOR receives a BM packet on an AC, it
        will forward the BM packet to its flood-list #1, skipping the non-
        BM overlay tunnels.
  
     -  When a Selective AR-REPLICATOR receives a BM packet on an overlay
        tunnel, it will check the destination and source IPs of the
        underlay IP header and:
  
!       o  If the destination IP matches its AR-IP and the source IP
           matches an IP of its own Selective AR-LEAF-set, the AR-
           REPLICATOR will forward the BM packet to its flood-list #2, as
           long as the list of AR-REPLICATORs for the BD matches the
           Selective AR-REPLICATOR-set.  If the Selective AR-REPLICATOR-
           set does not match the list of AR-REPLICATORs, the node reverts
           back to non-selective mode and flood-list #1 is used.
  
        o  If the destination IP matches its AR-IP and the source IP does
           not match any IP of its Selective AR-LEAF-set, the AR-
--- 1054,1104 ----
     -  When a Selective AR-REPLICATOR receives a BM packet on an AC, it
        will forward the BM packet to its flood-list #1, skipping the non-
        BM overlay tunnels.
+       
+ It sure seems like it would have been cleaner to have expressed this by
+ naming a list (Flood-list #3, whatever) that doesn't include the non-BM
+ overlay tunnels to begin with, and then saying that's the list used in 
+ this case. I guess this also relates to my previous comment/question --
+ basically, why are the non-BM overlay tunnels even included?
  
     -  When a Selective AR-REPLICATOR receives a BM packet on an overlay
        tunnel, it will check the destination and source IPs of the
        underlay IP header and:
  
!       o  If the destination IP address matches its AR-IP and the source IP address
           matches an IP of its own Selective AR-LEAF-set, the AR-
           REPLICATOR will forward the BM packet to its flood-list #2, as
           long as the list of AR-REPLICATORs for the BD matches the
           Selective AR-REPLICATOR-set.  If the Selective AR-REPLICATOR-
           set does not match the list of AR-REPLICATORs, the node reverts
           back to non-selective mode and flood-list #1 is used.
+          
+ Presumably this time the non-BM overlay tunnels are NOT excluded?
+ 
+ Also, I guess the language above is where the answer to the "fall back to
+ non-selective AR mode" puzzle from point b, above, is hidden. It requires 
+ that I make some assumptions:
+ 
+ - The "list of AR-REPLICATORS for the BD" is derived from the set of
+   AR-REPLICATOR advertisements for the BD. (This is not intuitively
+   obvious; "list" is very generic and could be, for example, configured
+   or something.)
+ - The Selective AR-REPLICATOR-set is all the members of the above list
+   that have advertised L=1.
+ - Ergo, if the sets aren't identical, some of them must have advertised
+   L=0.
+ 
+ It seems to me as though it would be more understandable to say something
+ like:
+ 
+ --
+       o  If the destination IP address matches its AR-IP and the source IP address
+          matches an IP of its own Selective AR-LEAF-set, the AR-
+          REPLICATOR will forward the BM packet to its flood-list #2,
+          unless some AR-REPLICATOR within the BD has advertised L=0.
+          In the latter case, the node reverts
+          back to non-selective mode and flood-list #1 is used.
+ --
  
        o  If the destination IP matches its AR-IP and the source IP does
           not match any IP of its Selective AR-LEAF-set, the AR-
***************
*** 960,970 ****
           This is the regular-IR behavior described in [RFC7432].
  
     -  In any case, non-BM overlay tunnels are excluded from flood-lists
!       and, also, source squelching is always done in order to ensure the
        traffic is not sent back to the originating source.  If the
!       encapsulation is MPLSoGRE (or MPLSoUDP) and the BD label is not
        the bottom of the stack, the AR-REPLICATOR MUST copy the rest of
        the labels when forwarding them to the egress overlay tunnels.
  
  6.2.  Selective AR-LEAF procedures
  
--- 1119,1142 ----
           This is the regular-IR behavior described in [RFC7432].
  
     -  In any case, non-BM overlay tunnels are excluded from flood-lists
!    
! That seems inconsistent with what point 1, above, says -- the place where
! I asked why you'd include non-BM receivers.  In any case, there you say
! they can be part of flood-list #1. Here you say they "are excluded". 
! Which is it?
! 
!       and, also, 
        traffic is not sent back to the originating source.  If the
!       encapsulation is MPLSoGRE or MPLSoUDP and the BD label is not
        the bottom of the stack, the AR-REPLICATOR MUST copy the rest of
        the labels when forwarding them to the egress overlay tunnels.
+       
+ Above, I removed "source squelching" again since it seemed not to add
+ anything, as previously.
+ 
+ Reference needed for "BD label".  I also wonder, is the requirement that
+ the replicator copy the rest of the labels a new one introduced here, or 
+ are you just repeating an existing requirement from an underlying spec?
  
  6.2.  Selective AR-LEAF procedures
  
***************
*** 991,996 ****
--- 1163,1174 ----
     b.  The AR-LEAF MAY advertise a Regular-IR route if there are RNVEs
         in the BD.  The Selective AR-LEAF MUST advertise a Leaf A-D route
         after receiving a Replicator-AR route with L=1.  It is
+        
+ "after receiving" -- so, does this mean it MUST NOT advertise a Leaf A-D
+ route prior to receiving any Replicator-AR route with L=1?  That would also
+ imply that if all Replicator-AR routes with L=1 are withdrawn, the Leaf A-D
+ route MUST be withdrawn?
+ 
         RECOMMENDED that the Selective AR-LEAF waits for a AR-LEAF-join-
         wait-timer (in seconds, default value is 3) before sending the
         Leaf A-D route, so that the AR-LEAF can collect all the
***************
*** 998,1004 ****
         route.
  
     c.  In a service where there is more than one Selective AR-
!        REPLICATORs the Selective AR-LEAF MUST locally select a single
         Selective AR-REPLICATOR for the BD.  Once selected:
  
  
--- 1176,1182 ----
         route.
  
     c.  In a service where there is more than one Selective AR-
!        REPLICATOR the Selective AR-LEAF MUST locally select a single
         Selective AR-REPLICATOR for the BD.  Once selected:
  
  
***************
*** 1021,1026 ****
--- 1199,1211 ----
         o  In case of a failure on the selected AR-REPLICATOR, another
            AR-REPLICATOR will be selected and a new Leaf A-D update will
            be issued for the new AR-REPLICATOR.  This new route will
+           
+ What does "in case of a failure on the selected AR-REPLICATOR" mean, 
+ practically speaking?  How is this detected?  I presume the failure
+ is detected when the relevant route becomes infeasible as the result
+ of any of the relevant underlying BGP mechanisms (nexthop unresolvability,
+ holdtime expired, route withdrawal, etc).
+ 
            update the selective list in the new Selective AR-REPLICATOR.
            In case of failure on the active Selective AR-REPLICATOR, it
            is RECOMMENDED for the Selective AR-LEAF to revert to IR
***************
*** 1030,1035 ****
--- 1215,1223 ----
            AR mode with the new Selective AR-REPLICATOR.  The AR-
            REPLICATOR-activation-timer MAY be the same configurable
            parameter as in Section 5.2.
+           
+ What happens if a new AR-REPLICATOR is learned by the AR-LEAF, and the 
+ new replicator is preferred over the currently-selected one?
  
     All the AR-LEAFs in a BD are expected to be configured as either
     selective or non-selective.  A mix of selective and non-selective AR-
***************
*** 1045,1051 ****
  
        1.  Flood-list #1 - composed of ACs and the overlay tunnel to the
            selected AR-REPLICATOR (using the AR-IP as the tunnel
!           destination IP).
  
        2.  Flood-list #2 - composed of ACs and overlay tunnels to the
            remote IR-IP Addresses.
--- 1233,1239 ----
  
        1.  Flood-list #1 - composed of ACs and the overlay tunnel to the
            selected AR-REPLICATOR (using the AR-IP as the tunnel
!           destination IP address).
  
        2.  Flood-list #2 - composed of ACs and overlay tunnels to the
            remote IR-IP Addresses.
***************
*** 1054,1061 ****
        there is any selected AR-REPLICATOR.  If there is, flood-list #1
        will be used.  Otherwise, flood-list #2 will.
  
!    -  When an AR-LEAF receives a BM packet on an overlay tunnel, will
!       forward the BM packet to its local ACs and never to an overlay
        tunnel.  This is the regular IR behavior described in [RFC7432].
  
  
--- 1242,1249 ----
        there is any selected AR-REPLICATOR.  If there is, flood-list #1
        will be used.  Otherwise, flood-list #2 will.
  
!    -  When an AR-LEAF receives a BM packet on an overlay tunnel, it will
!       forward the packet to its local ACs and never to an overlay
        tunnel.  This is the regular IR behavior described in [RFC7432].
  
  
***************
*** 1071,1076 ****
--- 1259,1267 ----
     In addition to AR, the second optimization supported by this solution
     is the ability for the all the BD nodes to signal Pruned-Flood-Lists
     (PFL).  As described in section 3, an EVPN node can signal a given
+    
+ I guess you meant Section 4?
+ 
     value for the BM and U PFL flags in the IR Inclusive Multicast
     Routes, where:
  
***************
*** 1085,1090 ****
--- 1276,1286 ----
     PFL flag and remove the sender from the corresponding flood-list.  A
     given BD node receiving BUM traffic on an overlay tunnel MUST
     replicate the traffic normally, regardless of the signaled PFL flags.
+    
+ What exactly does "replicate the traffic normally" mean, in the context
+ of this specification?  I guess you should say something like "replicate
+ the traffic according to [reference]".  Also, I don't get it: what are the
+ flags FOR, if they're ignored when receiving on an overlay tunnel?
  
     This optimization MAY be used along with the AR solution.
  
***************
*** 1123,1128 ****
--- 1319,1328 ----
  
  
         NVE2, but not to NVE3.  PE2 and NVE2 will replicate the BM
+        
+ "but not to NVE3".  What happened to "MUST replicate the traffic normally"?
+ To me, these two pieces of text seem to contradict one another.
+ 
         packets to their local ACs but we will avoid NVE3 having to
         replicate unnecessarily those BM packets to VM31 and VM32.
  
***************
*** 1135,1147 ****
--- 1335,1357 ----
         NVE3 to NVE2, PE1 and PE2 but not NVE1.  The solution avoids the
         unnecessary replication to NVE1, since the destination of the
         unknown traffic cannot be at NVE1.
+        
+ It's not clear to me why the destination can't be at NVE1.
  
     4.  Any Unknown unicast packet sent from TS1 will be forwarded by PE1
         to the WAN link, PE2 and NVE2 but not to NVE1 and NVE3, since the
         target of the unknown traffic cannot be at those NVEs.
+        
+ Similarly, I don't get why this is the case.
  
  8.  AR Procedures for single-IP AR-REPLICATORS
  
+ I'm curious why the design choice was made to specify two different ways to 
+ do the same thing.  You motivate why not all routers can use distinguished
+ IP addresses for the two different functional modes; however, presumably all
+ routers could make use of distinguished VNIs as you do here.  I'd appreciate
+ a few words about why you didn't choose to just always use the VNI approach.
+ 
     The procedures explained in sections Section 5 and Section 6 assume
     that the AR-REPLICATOR can use two local routable IP addresses to
     terminate and originate NVO tunnels, i.e. IR-IP and AR-IP addresses.
***************
*** 1184,1201 ****
  9.  AR Procedures and EVPN All-Active Multi-homing Split-Horizon
  
     This section extends the procedures for the cases where AR-LEAF nodes
!    or AR-REPLICATOR nodes are attached to the the same Ethernet Segment
     in the BD.  The case where one (or more) AR-LEAF node(s) and one (or
     more) AR-REPLICATOR node(s) are attached to the same Ethernet Segment
     is out of scope.
  
  9.1.  Ethernet Segments on AR-LEAF nodes
  
     If VXLAN or NVGRE are used, and if the Split-horizon is based on the
     tunnel IP SA and "Local-Bias" as described in [RFC8365], the Split-
     horizon check will not work if there is an Ethernet-Segment shared
!    between two AR-LEAF nodes, and the AR-REPLICATOR changes the tunnel
     IP SA of the packets with its own AR-IP.
  
     In order to be compatible with the IP SA split-horizon check, the AR-
     REPLICATOR MAY keep the original received tunnel IP SA when
--- 1394,1418 ----
  9.  AR Procedures and EVPN All-Active Multi-homing Split-Horizon
  
     This section extends the procedures for the cases where AR-LEAF nodes
!    or AR-REPLICATOR nodes are attached to the same Ethernet Segment
     in the BD.  The case where one (or more) AR-LEAF node(s) and one (or
     more) AR-REPLICATOR node(s) are attached to the same Ethernet Segment
     is out of scope.
+    
+ I just can't understand what this paragraph is telling me. :-(  Apart from
+ anything else, to the casual reader the second sentence seems to contradict
+ the first.
  
  9.1.  Ethernet Segments on AR-LEAF nodes
  
     If VXLAN or NVGRE are used, and if the Split-horizon is based on the
     tunnel IP SA and "Local-Bias" as described in [RFC8365], the Split-
     horizon check will not work if there is an Ethernet-Segment shared
!    between two AR-LEAF nodes, and the AR-REPLICATOR replaces the tunnel
     IP SA of the packets with its own AR-IP.
+    
+ I changed "changes" to "replaces"; it's my best guess as to what you meant.
+ If that's wrong, please help me understand what you did mean.
  
     In order to be compatible with the IP SA split-horizon check, the AR-
     REPLICATOR MAY keep the original received tunnel IP SA when
***************
*** 1203,1209 ****
     LEAF nodes to apply Split-horizon check procedures for BM packets,
     before sending them to the local Ethernet-Segment.  Even if the AR-
     LEAF's IP SA is preserved when replicating to AR-LEAFs or RNVEs, the
!    AR-REPLICATOR MUST always use its IR-IP as IP SA when replicating to
     other AR-REPLICATORs.
  
     When EVPN is used for MPLS over GRE (or UDP), the ESI-label based
--- 1420,1426 ----
     LEAF nodes to apply Split-horizon check procedures for BM packets,
     before sending them to the local Ethernet-Segment.  Even if the AR-
     LEAF's IP SA is preserved when replicating to AR-LEAFs or RNVEs, the
!    AR-REPLICATOR MUST always use its IR-IP as the IP SA when replicating to
     other AR-REPLICATORs.
  
     When EVPN is used for MPLS over GRE (or UDP), the ESI-label based
***************
*** 1220,1226 ****
  
  9.2.  Ethernet Segments on AR-REPLICATOR nodes
  
!    Ethernet Segments associated to one or more AR-REPLICATOR nodes
     SHOULD follow "Local-Bias" procedures for EVPN all-active multi-
     homing, as follows:
  
--- 1437,1443 ----
  
  9.2.  Ethernet Segments on AR-REPLICATOR nodes
  
!    Ethernet Segments associated with one or more AR-REPLICATOR nodes
     SHOULD follow "Local-Bias" procedures for EVPN all-active multi-
     homing, as follows:
  
***************
*** 1240,1245 ****
--- 1457,1464 ----
        it had been received on a local AC that is part of the ES and will
        be forwarded to all local ES, irrespective of their DF or NDF
        state.
+       
+ Please define/expand "ES".
  
     -  BUM traffic received on an AR-REPLICATOR overlay tunnel with IR-IP
        as the IP DA, will follow regular [RFC8365] "Local-Bias" rules and
***************
*** 1254,1259 ****
--- 1473,1483 ----
     In addition, the procedures introduced by this document may bring
     some new risks for the successful delivery of BM traffic.  Unicast
     traffic is not affected by this document.  The forwarding of
+    
+ If unicast traffic isn't affected, what's the U flag even for?  It sure
+ seems as though it's intended to affect the forwarding of (unknown)
+ unicast traffic.
+ 
     Broadcast and Multicast (BM) traffic is modified though, and BM
     traffic from the AR-LEAF nodes will be attracted by the existance of
     AR-REPLICATORs in the BD.  An AR-LEAF will forward BM traffic to its
***************
*** 1262,1270 ****
  
     An implementation following the procedures in this document should
     not create BM loops, since the AR-REPLICATOR will always forward the
     BM traffic using the correct tunnel IP Destination Address that
     indicates the remote nodes how to forward the traffic.  This is true
!    in both, the Non-Selective and Selective modes defined in this
     document.
  
     The Selective mode provides a multi-staged replication solution,
--- 1486,1503 ----
  
     An implementation following the procedures in this document should
     not create BM loops, since the AR-REPLICATOR will always forward the
+    
+ Instead of "should not create BM loops" I suggest "will not create" or
+ if you can't actually promise that, "is not expected to create".  I assume
+ you're using "should" in the sense of weak expectation, and not like a 
+ RFC 2119 SHOULD.
+ 
     BM traffic using the correct tunnel IP Destination Address that
     indicates the remote nodes how to forward the traffic.  This is true
!    
! Instead of "indicates", try "instructs", "cues", or "directs"?
!    
!    in both the Non-Selective and Selective modes defined in this
     document.
  
     The Selective mode provides a multi-staged replication solution,
Murray Kucherawy
No Objection
Comment (2021-10-21 for -09) Sent
The shepherd writeup asserts that there are no IANA actions, but the document contains two such actions.

Section 2 defines "IR forwarding mode", but that term is not present elsewhere in this document.  Neither is "GENEVE".
Roman Danyliw
No Objection
Comment (2021-10-18 for -09) Sent
Thank you to Derek Atkins for the SECDIR review.

** Section 10.   The following sentence doesn’t parse for me.  Can the guidance on avoiding loops and the AR-REPLICATOR please be clarified.

   An implementation following the procedures in this document should
   not create BM loops, since the AR-REPLICATOR will always forward the
   BM traffic using the correct tunnel IP Destination Address that
   indicates the remote nodes how to forward the traffic.

** Section 10.  Editorial?

OLD
The forwarding of
   Broadcast and Multicast (BM) traffic is modified though, and BM
   traffic from …

NEW
The forwarding of Broadcast and Multicast (BM) traffic is modified; and BM traffic from ...

** Section 10.  Typo. s/existance/existence/

** Editorial.  Multiple places.  Typo likely due to rendering. s/section Section X.X/Section X.X/
Warren Kumari
No Objection
Zaheduzzaman Sarker
No Objection
Comment (2021-10-20 for -09) Not sent
Keeping my review strictly TSV focused, I haven't noticed any transport related issues. Thanks Michael Tuxen for the TSVART review.
Éric Vyncke
No Objection
Comment (2021-10-18 for -09) Sent
Thank you for the work put into this document.

Please reply and address the points raised by Pascal Thubert (thank you Pascal!) in his internet directorate review at:
https://datatracker.ietf.org/doc/review-ietf-bess-evpn-optimized-ir-09-intdir-telechat-thubert-2021-10-15/ or at
https://mailarchive.ietf.org/arch/msg/bess/qLMMP2d49xjKy_JXNCUo1DBOwVs/

With Pascal's review, I have only skimmed the document and have only 1 nit: please introduce references to IR/EVPN/PIM early in the text.

I also wonder why the specific case of NVO is discussed in the document as I would assume that the issues are in all EVPN deployments.

Pascal and I hope that this helps to improve the document,

Regards,

-éric
Martin Vigoureux Former IESG member
Yes
Yes (for -09) Unknown

                            
Alvaro Retana Former IESG member
No Objection
No Objection (2021-10-21 for -09) Sent
Thanks to Julien Meuric for the rtg-dir review.  Please reply to it.


(1) §4: 
   -  T is the AR Type field (2 bits) that defines the AR role of the
...
     o  11 (decimal 3) = RESERVED

What should a receiver do if the reserved value is received?


(2) §4: 

   Each AR-enabled node MUST understand and process the AR type field in
   the PTA (Flags field) of the routes, and MUST signal the
   corresponding type (1 or 2) according to its administrative choice.

"MUST understand and process the AR type field"

From a normative action point of view, this statement has no value as it is equivalent to saying that the AR node has to support this document...   Please remove the normative statement.


(3) §5.1: "The Replicator-AR and Regular-IR routes are generated according to section 3."   s/3/4


(4) §7: "As described in section 3..."  s/3/4


(5) §5.2: The non-existence of an AR-REPLICATOR results in the AR-LEAF having to use regular IR.  That seems like the right/only action.  However, because the AR-LEAF is defined as a node with "poor replication performance", it concerns me that a rogue replicator can use a non-REPLICATOR type with the objective of impacting the application (as described in the Introduction).

The Security Considerations already mention an attack on the AR-REPLICATOR.  It would be good if this other vector was also added.
Benjamin Kaduk Former IESG member
No Objection
No Objection (2021-10-27 for -09) Sent
Thanks to Derek Atkins for the secdir review.

Thanks as well to John Scudder for his detailed review; I support his
discuss position and have omitted a few of my comments that he has
already covered.  (There are probably a few more that I could have
omitted, but I did not do an exhaustive check.  Feel free to just point
to his ballot thread instead of repeating the explanation to me.)

I think it would be very helpful to clearly state early on what the
difference between the "selective" and "non-selective" setups is.  The
first description I see is not until §6.2 (I comment below where it
appears as well).

Section 2

   -  AR-IP: IP address owned by the AR-REPLICATOR and used to
      differentiate the ingress traffic that must follow the AR
      procedures.

From context I infer that the AR-IP is advertised along with the
Replicator-AR RT-3 route.  Since we talk about other defined values as
being advertised along with such RT-3 routes, should we also say that
this IP is advertised along with the corresponding RT-3 route?

   -  AR-VNI: VNI advertised by the AR-REPLICATOR along with the
      Replicator-AR route.  It is used to identify the ingress packets
      that must follow AR procedures ONLY in the Single-IP AR-REPLICATOR
      case.

This phrasing seems ambiguous: please distinguish whether this is used
only in the single-IP AR-REPLICATOR case or it identifies packets that
sometimes follow AR procedures (in the single-IP AR-REPLICATOR case) and
sometimes do not.

   -  PTA: PMSI Tunnel Attribute

PMSI is not marked as "well-known" at
https://www.rfc-editor.org/materials/abbrev.expansion.txt and should be
expanded on first use or otherwise defined.

   -  EVI: EVPN Instance.  An EVPN instance spanning the Provider Edge
      (PE) devices participating in that EVPN

This seems rather circular.  Can we define "EVPN Instance" without
reference to "EVPN instance"?

Section 3

   c.  The solution is compatible with [RFC7432] and [RFC8365] and has
       no impact on the EVPN procedures for BM traffic.  In particular,

I do not think that "no impact on the EVPN procedures" is what was
intended -- it obviously has impact on the procedures, since it is
implemented differently.  Perhaps it has no impact on the CE, but that's
not what this text seems to say.

Section 4

I agree with the directorate reviewer that splitting the RT-3 NLRI
layout and the PTA general format into separate figures is quite
worthwhile.  I would also suggest naming the first one as the NLRI of
the RT-3 route type, rather than leaving that implicit.

   The Inclusive Multicast Ethernet Tag route (RT-3) and its PMSI Tunnel
   Attribute's (PTA) general format used in [RFC7432] are shown below:

I suggest referencing RFC 6514 as the source of the PTA format.

   The Flags field is 8 bits long.  This document defines the use of 4
   bits of this Flags field:

That's half of the flag bits!  Why is it better to allocate so many
flags than to move more structure into the tunnel identifier portion of
the PTA?  I guess RFC 7902 does provision for extended tunnel attribute
flags, but the question of whether these all belong as flags still seems
valid.
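For readers following the flag-allocation question above, the receive-side effect of the BM and U Pruned-Flood-List (PFL) flags, as described in the draft text quoted in John Scudder's comments (a receiver removes the sender from the corresponding flood-list), can be sketched as follows.  This is a minimal illustration that assumes the flags have already been decoded from the PTA; the names ImetRoute, bm_pfl and u_pfl are hypothetical, no bit positions are implied, and the sketch only covers the lists a node builds for its own ingress replication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImetRoute:
    """Hypothetical, already-decoded view of a received IR Inclusive Multicast route."""
    sender_ip: str  # tunnel endpoint of the advertising node
    bm_pfl: bool    # BM PFL flag set: prune the sender from the BM flood-list
    u_pfl: bool     # U PFL flag set: prune the sender from the Unknown-unicast flood-list


def build_flood_lists(routes):
    """Return (bm_list, u_list): remote endpoints to replicate to, per traffic class.

    A sender that sets a PFL flag is removed only from the corresponding list;
    local ACs and AR-REPLICATOR selection are not modeled here.
    """
    bm_list = sorted(r.sender_ip for r in routes if not r.bm_pfl)
    u_list = sorted(r.sender_ip for r in routes if not r.u_pfl)
    return bm_list, u_list


routes = [ImetRoute("192.0.2.2", False, False), ImetRoute("192.0.2.3", True, False)]
print(build_flood_lists(routes))  # (['192.0.2.2'], ['192.0.2.2', '192.0.2.3'])
```

Keeping the two lists independent mirrors the per-flag semantics: pruning a node from BM flooding says nothing about whether it still needs unknown unicast.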

   -  Regular-IR route: in this route, Originating Router's IP Address,
      Tunnel Type (0x06), MPLS Label and Tunnel Identifier MUST be used
      as described in [RFC7432] when Ingress Replication is in use.  The
      NVE/PE that advertises the route will set the Next-Hop to an IP
      address that we denominate IR-IP in this document.  When
      advertised by an AR-LEAF node, the Regular-IR route SHOULD be
      advertised with type T= AR-LEAF.

When would I violate this SHOULD (and what other behaviors would be
usable)?

      o  Originating Router's IP Address MUST be set to an IP address of
         the PE that should be common to all the EVIs on the PE (usually
         this is the PE's loopback address).  The Tunnel Identifier and

Is it really the usual case that a PE has only one loopback address (so
that the definite article "the" applies)?  This seems particularly
poignant since we assume that AR-REPLICATORs will have multiple
addresses available for use, to distinguish inbound IR and AR traffic.

         Next-Hop SHOULD be set to the same IP address as the
         Originating Router's IP address when the NVE/PE originates the
         route.  [...]

Should we say anything about what they are set to when the NVE/PE does
not originate the route?

                It is only used for selective AR and its fields are set
   as follows:

The antecedent for "its fields" seems to be "the Leaf A-D route
(RT-11)"; I suggest using the precise terminology that the fields of the
"route type specific portion of the route" are what are described.
Precise use of terminology makes the documents much more approachable to
unfamiliar readers that rely on textual search to correlate the relevant
parts of the various documents in question.

   -  Replicator-AR route: this route is used by the AR-REPLICATOR to
      advertise its AR capabilities, with the fields set as follows:

      o  Originating Router's IP Address MUST be set to an IP address of
         the PE that should be common to all the EVIs on the PE (usually
         this is the PE's loopback address).  The Tunnel Identifier and

I note that the guidance in RFC 7432 for constructing what we in this
document refer to as the "Regular-IR route" also has text about "the
PE's loopback address" being useful for what we would call the IR-IP,
but this address here is the AR-IP and (if we keep reading) SHOULD be
different than the IR-IP.  I think we need to say something about
whether PEs are really expected to only have one ("the") loopback
address vs multiple, and if there is only one how to decide whether to
use it as AR-IP or IR-IP.  To use language ("the PE's loopback address")
that implies there is only one, while strongly suggesting that it be
used for two different purposes and also strongly suggesting that those
two different purposes have different addresses, seems to be internally
inconsistent.

      o  The AR-LEAF constructs an IP-address-specific route-target as
         indicated in [I-D.ietf-bess-evpn-bum-procedure-updates], by
         placing the IP address carried in the Next-Hop field of the
         received Replicator-AR route in the Global Administrator field
         of the Community, with the Local Administrator field of this
         Community set to 0.  [...]

The analogous text in draft-ietf-bess-evpn-bum-procedure-updates also
mentions "setting the Extended Communities attribute of the Leaf A-D
route to that Community"; would that be useful to include here as well?
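The route-target construction quoted above can be sketched for the IPv4 case, assuming the Next-Hop of the received Replicator-AR route is an IPv4 address.  The byte layout is the IPv4-address-specific Route Target extended community of RFC 4360 (type 0x01, sub-type 0x02); an IPv6 Next-Hop would need the IPv6-address-specific variant of RFC 5701, which this sketch does not cover.  The function name is hypothetical.

```python
import ipaddress
import struct


def ip_specific_route_target(next_hop: str, local_admin: int = 0) -> bytes:
    """Encode an IPv4-address-specific Route Target extended community (RFC 4360).

    8 octets: type 0x01 (transitive IPv4-address-specific), sub-type 0x02
    (Route Target), 4-octet Global Administrator (the IPv4 address taken from
    the Replicator-AR route's Next-Hop), 2-octet Local Administrator (0 here).
    """
    addr = ipaddress.IPv4Address(next_hop)  # raises ValueError for non-IPv4 input
    if not 0 <= local_admin <= 0xFFFF:
        raise ValueError("Local Administrator must fit in 16 bits")
    return struct.pack("!BB4sH", 0x01, 0x02, addr.packed, local_admin)


print(ip_specific_route_target("192.0.2.1").hex())  # 01020c000201 0000 without the space
```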

      o  The Leaf A-D route MUST include the PMSI Tunnel attribute with
         the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel
         Identifier set to the IP of the advertising AR-LEAF.  The PMSI
         Tunnel attribute MUST carry a downstream-assigned MPLS label or
         VNI that is used by the AR-REPLICATOR to send traffic to the
         AR-LEAF.

This seems to be the only place where we specify the actual
format/contents (i.e., including Tunnel Identifier contents) of the "AR"
PTA tunnel type.  I would have expected a more explicit declaration
that the IANA registration could point to.

Section 5.1

It's a bit unfortunate that there's so much overlap between the list of
"considerations" and the "rules" that an implementation must be
compatible with, but it may be too risky to try to coalesce them at this
time.

   b.  An AR-REPLICATOR MUST advertise a Replicator-AR route and MAY
       advertise a Regular-IR route.  The AR-REPLICATOR MUST NOT
       generate a Regular-IR route if it does not have local attachment
       circuits (AC).  If the Regular-IR route is advertised, the AR
       Type field is set to zero.

This seems to merit some more substantial discussion, since the value of
zero in the AR type field is otherwise avoided in this document.  That
is, we have specific values for "leaf" and "acting as replicator", but
the value zero is normally "does not support optimized-ir".  Except here
it's also used for "replicator advertising as non-replicator role"; it's
probably appropriate to not abuse "leaf" for this case, but using zero
seems to in some sense be a different abuse.  Would the '11' value have
been usable to indicate this distinction?

Section 6

                                                                   The
   solution is called "selective" because a given AR-REPLICATOR MUST
   replicate the BM traffic to only the AR-LEAF that requested the
   replication (as opposed to all the AR-LEAF nodes) and MAY replicate
   the BM traffic to the RNVEs.  [...]

I'm not sure I understand the motivation behind MAY, here.  If we don't
replicate the BM traffic to RNVEs isn't that data loss?

Section 6.1

       o  When a node defined and operating as Selective AR-REPLICATOR
          receives a packet on an overlay tunnel, it will do a tunnel
          destination IP lookup and if the destination IP is the AR-
          REPLICATOR AR-IP Address, the node MUST replicate the packet
          to:
       [...]
          +  overlay tunnels to the remote Selective AR-REPLICATORs if
             the tunnel source IP is an IR-IP of its own AR-LEAF-set (in
             any other case, the AR-REPLICATOR MUST NOT replicate the BM
             traffic to remote AR-REPLICATORs), where the tunnel
             destination IP is the AR-IP of the remote Selective AR-
             REPLICATOR.  The tunnel destination IP AR-IP will be a

It seems like it would require less cognitive burden on the reader if we
disambiguated "tunnel source IP" as it relates to the incoming tunnel on
which the packet in question was received vs the outgoing tunnel to
which it is being replicated.  ("tunnel destination IP" is arguably
already disambiguated by the lead-in text that talks about doing a
lookup based on the tunnel the packet was received on.)  Given that the
"rules" that appear later specifically say that it checks both
destination and source of the underlay IP header, it seems reasonable to
say something similar here when listing the "considerations".

          +  The Selective AR-REPLICATOR-set is composed of the overlay
             tunnels to all the AR-REPLICATORs that send a Replicator-AR
             route with L=1.  The AR-IP addresses are used as tunnel
             destination IP.

I'm not sure why the "that send a Replicator-AR route with L=1" clause
is needed -- if there are AR-REPLICATORS that send with L=0 then aren't
we required to fall back to the non-selective procedures?

   -  In any case, non-BM overlay tunnels are excluded from flood-lists
      and, also, source squelching is always done in order to ensure the
      traffic is not sent back to the originating source.  If the
      encapsulation is MPLSoGRE (or MPLSoUDP) and the BD label is not
      the bottom of the stack, the AR-REPLICATOR MUST copy the rest of
      the labels when forwarding them to the egress overlay tunnels.

I'm not sure that I understand which labels "the rest of the labels"
are in this context.
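The Selective AR-REPLICATOR rules discussed above (flood-list #1 for BM traffic received on an AC; flood-list #2 for traffic addressed to the AR-IP from its own Selective AR-LEAF-set, reverting to flood-list #1 when the Selective AR-REPLICATOR-set differs from the list of AR-REPLICATORs for the BD; non-BM tunnels excluded and source squelching applied) can be sketched as below.  This non-authoritative sketch adopts the interpretation in John Scudder's comment (the list of AR-REPLICATORs is learned from Replicator-AR routes, and the selective set is the subset that advertised L=1).  All names are hypothetical, overlay cases that the quoted excerpt truncates return None, and source squelching is simplified to comparing each tunnel destination with the received source IP.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatorState:
    """Hypothetical per-BD state of a Selective AR-REPLICATOR."""
    ar_ip: str
    selective_leaf_set: set = field(default_factory=set)        # IR-IPs of its own Selective AR-LEAFs
    replicators_in_bd: set = field(default_factory=set)         # AR-REPLICATORs learned for the BD
    selective_replicator_set: set = field(default_factory=set)  # those that advertised L=1


def choose_flood_list(state, from_ac, dst_ip=None, src_ip=None):
    """Pick the flood-list for a BM packet per the quoted Section 6.1 rules."""
    if from_ac:
        return "flood-list #1"
    if dst_ip == state.ar_ip and src_ip in state.selective_leaf_set:
        if state.replicators_in_bd == state.selective_replicator_set:
            return "flood-list #2"
        # Some AR-REPLICATOR is not selective: revert to non-selective mode.
        return "flood-list #1"
    return None  # remaining overlay cases are truncated in the quoted excerpt


def egress_tunnels(flood_list, non_bm_tunnels, source_ip):
    """Drop non-BM overlay tunnels and never send back to the originating source."""
    return [t for t in flood_list if t not in non_bm_tunnels and t != source_ip]
```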

Section 6.2

   In the example of Figure 1, we consider NVE1/NVE2/NVE3 as Selective
   AR-LEAFs.  NVE1 selects PE1 as its Selective AR-REPLICATOR.  If that
   is so, NVE1 will send all its BM traffic for BD-1 to PE1.  If other
   AR-LEAF/REPLICATORs send BM traffic, NVE1 will receive that traffic
   from PE1.  These are the differences in the behavior of a Selective
   AR-LEAF compared to a non-selective AR-LEAF:

I think this might be the first time we concretely say what makes the
"selective" procedures earn that name (the combined selectivity of all
BM traffic, in both directions, between leaf and replicator, as opposed
to the non-selective case where leaves pick only the replicator that they
send to, and must receive from everywhere).  This seems like something
that would be useful to have much earlier in the document, e.g., in the
introduction.  (It's also somewhat different from the sense in which RFC
6513 contrasts selective and inclusive tunnels, though I expect it's
probably too late to try to change the terminology used here.)
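To make that distinction concrete, a minimal Python sketch of which peers an AR-LEAF exchanges BM traffic with in each mode. The function and the peer names are hypothetical; the sketch encodes only the behavior summarized in this comment, not the timers or fallback rules of Section 6.

```python
def bm_peers(selective, chosen_replicator, bd_peers):
    """Nodes an AR-LEAF sends BM traffic to, and receives BM traffic from.

    Selective: the chosen AR-REPLICATOR in both directions.
    Non-selective: sends to the chosen AR-REPLICATOR, receives from any BD peer.
    """
    send_to = {chosen_replicator}
    receive_from = {chosen_replicator} if selective else set(bd_peers)
    return send_to, receive_from


peers = {"PE1", "PE2", "NVE2", "NVE3"}  # hypothetical BD peers of NVE1
send, recv = bm_peers(False, "PE1", peers)
print(sorted(send), sorted(recv))  # ['PE1'] ['NVE2', 'NVE3', 'PE1', 'PE2']
```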

Section 8

   -  An AR-REPLICATOR will perform IR or AR forwarding mode for the
      incoming Overlay packets based on an ingress VNI lookup, as
      opposed to the tunnel IP DA lookup.  Note that, when replicating
      to remote AR-REPLICATOR nodes, the use of the IR-VNI or AR-VNI
      advertised by the egress node will determine the IR or AR
      forwarding mode at the subsequent AR-REPLICATOR.

Does this implicitly put a requirement on all AR-REPLICATOR
implementations to support the VNI-based scheme, since they might be
called upon to forward to another replicator using it?
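A minimal Python sketch of the VNI-based mode selection in the quoted text, assuming a single BD and that each node advertises one IR-VNI and one AR-VNI (the `AdvertisedVnis` type is hypothetical). How a replicator decides which mode the next AR-REPLICATOR should apply is not stated in the quote, so it is passed in by the caller.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    IR = "IR"  # regular ingress-replication forwarding
    AR = "AR"  # assisted-replication forwarding


@dataclass(frozen=True)
class AdvertisedVnis:
    """Hypothetical per-BD pair of VNIs advertised by one node."""
    ir_vni: int
    ar_vni: int


def forwarding_mode(ingress_vni, local):
    """Choose IR or AR mode from the received VNI, not from the tunnel IP DA."""
    if ingress_vni == local.ar_vni:
        return Mode.AR
    if ingress_vni == local.ir_vni:
        return Mode.IR
    raise LookupError(f"VNI {ingress_vni} was not advertised by this node")


def egress_vni(remote, mode_at_remote):
    """VNI written into a copy sent to a remote AR-REPLICATOR.

    Using the remote node's IR-VNI or AR-VNI selects its forwarding mode.
    """
    return remote.ar_vni if mode_at_remote is Mode.AR else remote.ir_vni


local = AdvertisedVnis(ir_vni=1000, ar_vni=2000)
remote = AdvertisedVnis(ir_vni=3000, ar_vni=4000)
print(forwarding_mode(2000, local))  # Mode.AR
print(egress_vni(remote, Mode.IR))   # 3000
```

Note that `egress_vni` only works if the remote replicator advertised both VNIs, i.e. it presumes that node supports the VNI-based scheme.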

Section 9.1

   In order to be compatible with the IP SA split-horizon check, the AR-
   REPLICATOR MAY keep the original received tunnel IP SA when
   replicating packets to a remote AR-LEAF or RNVE.  This will allow AR-
   LEAF nodes to apply Split-horizon check procedures for BM packets,
   before sending them to the local Ethernet-Segment.  Even if the AR-
   LEAF's IP SA is preserved when replicating to AR-LEAFs or RNVEs, the
   AR-REPLICATOR MUST always use its IR-IP as IP SA when replicating to
   other AR-REPLICATORs.

It seems unfortunate that an AR-LEAF node needs to have knowledge of the
configuration in use at remote AR-REPLICATORs in order to know if the
split-horizon check will be effective.  Is there no way to always
require certain replicator behavior and give the leaves reliable
knowledge in split-horizon scenarios?
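A minimal Python sketch of the quoted IP SA rule and the AR-LEAF split-horizon check it is meant to support. Function names and addresses are hypothetical, and the quote does not say which SA a replicator uses toward AR-LEAFs/RNVEs when it does not preserve the received one; this sketch assumes its own IR-IP. Running both cases shows the check only works when the replicator preserves the SA.

```python
from enum import Enum


class Role(Enum):
    AR_REPLICATOR = "AR-REPLICATOR"
    AR_LEAF = "AR-LEAF"
    RNVE = "RNVE"


def copy_source_ip(dest_role, own_ir_ip, received_sa, preserve_sa):
    """Tunnel IP SA an AR-REPLICATOR puts on one replicated copy."""
    if dest_role is Role.AR_REPLICATOR:
        return own_ir_ip    # MUST use its own IR-IP toward other AR-REPLICATORs
    if preserve_sa:
        return received_sa  # MAY keep the received SA toward AR-LEAFs and RNVEs
    return own_ir_ip        # assumption: otherwise its own IR-IP


def leaf_floods_to_es(outer_sa, es_peer_ips):
    """AR-LEAF split-horizon check on the IP SA before flooding to its local ES."""
    return outer_sa not in es_peer_ips


replicator_ir_ip, leaf2_ip = "192.0.2.10", "198.51.100.2"
es_peers = {leaf2_ip}  # leaf2 shares an Ethernet Segment with the receiving leaf
for preserve in (True, False):
    sa = copy_source_ip(Role.AR_LEAF, replicator_ir_ip, leaf2_ip, preserve)
    print(preserve, leaf_floods_to_es(sa, es_peers))
# True False   (check works: the copy is not flooded back to the ES)
# False True   (check is defeated: the copy is flooded to the ES)
```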

Section 9.2

   Ethernet Segments associated to one or more AR-REPLICATOR nodes
   SHOULD follow "Local-Bias" procedures for EVPN all-active multi-
   homing, as follows:

Is it really the Ethernet Segments that would follow local-bias
procedures, or the EVPN nodes attached to them?  Does putting SHOULD-level
guidance to this effect here amount to updating the core EVPN specification
to privilege one way of handling multi-homing over others?  (Maybe not,
since the requirements only come into play when AR-REPLICATORs are
involved and we disclaim applicability to cases where AR-REPLICATOR and
AR-LEAF are on the same Ethernet Segment ... we might consider citing
that as part of the reason that case is out of scope.)

Also, if we know of procedures other than local-bias that will still be
effective, we might mention them as some justification for why this is
only a SHOULD and not a MUST.

Section 10

Since we use the Leaf A-D route from [bum-procedure-update], we might
want to pull in its security considerations as well.

I feel like there may be some more considerations to mention that are
specific to the multi-homing case, but I don't think I understand that
scenario well enough to be able to state them, myself.

We might also mention that AR-REPLICATORs are, by design, using more
bandwidth than stock RFC7432 PEs would, and if they exceed their local
bandwidth, that will cause service disruption.

The text that's here already does do a pretty good job of capturing the
important topics for the common case, though -- thanks!

   An implementation following the procedures in this document should
   not create BM loops, since the AR-REPLICATOR will always forward the
   BM traffic using the correct tunnel IP Destination Address that
   indicates the remote nodes how to forward the traffic.  This is true
   in both, the Non-Selective and Selective modes defined in this
   document.

(In the vein of my earlier comment,) what about the case when the tunnel
destination is expecting to use VNI to determine how to forward the
traffic?

Section 14.2

Having SHOULD-level guidance to use the "local bias" procedures detailed
in RFC 8365 might require that document to be promoted to a normative
reference; see
https://www.ietf.org/about/groups/iesg/statements/normative-informative-references/

NITS

Section 1

   Section 3 lists the requirements of the combined optimized-IR
   solution, whereas Section 5 and Section 6 describe the Assisted-
   Replication (AR) solution, and Section 7 the Pruned-Flood-Lists (PFL)
   solution.

I suggest mentioning that sections 5 and 6 differ in that they cover the
non-selective and selective cases, respectively.

Section 2

   -  Regular-IR: Refers to Regular Ingress Replication, where the
      source NVE/PE sends a copy to each remote NVE/PE part of the BD.

s/part of/that is part of/

Section 4

   -  Regular-IR route: in this route, Originating Router's IP Address,
      Tunnel Type (0x06), MPLS Label and Tunnel Identifier MUST be used
      as described in [RFC7432] when Ingress Replication is in use.  The
      NVE/PE that advertises the route will set the Next-Hop to an IP
      address that we denominate IR-IP in this document.  When
      advertised by an AR-LEAF node, the Regular-IR route SHOULD be
      advertised with type T= AR-LEAF.

Hmm, down near the end of page 9 we say that AR-enabled nodes MUST
signal the proper AR type (1 or 2) according to its administrative
choice -- how is that MUST compatible with the SHOULD here?

Also, if we're going to write out T = 01 (AR-REPLICATOR) just a few lines
later, we should write out T = 10 (AR-LEAF) here.

      o  The AR-LEAF constructs an IP-address-specific route-target as
         indicated in [I-D.ietf-bess-evpn-bum-procedure-updates], by
         placing the IP address carried in the Next-Hop field of the

Pedantically, "as indicated in [bum-procedure-update]" would involve
"placing the IP address carried in the Next Hop of the received I/S-PMSI
A-D route in the Global Administrator field of the Community", which is
obviously not going to be applicable in this case.  So "analogously to"
might be more appropriate than "as indicated".

         received Replicator-AR route in the Global Administrator field
         of the Community, with the Local Administrator field of this
         Community set to 0.  Note that the same IP-address-specific
         import route-target is auto-configured by the AR-REPLICATOR
         that sent the Replicator-AR, in order to control the acceptance
         of the Leaf A-D routes.

This "Note that ... is auto-configured by" phrasing suggests to me that
there is some more detailed text elsewhere laying out a requirement to
do this (and any needed procedures, though I suspect there are no real
procedures to document).  However, later on, §6.1 refers back to §4 (here)
for "the AR-REPLICATOR auto-configures its IP-address-specific import
route-target as described in section Section 4."  Maybe we could write
this in a way that's more clearly a specification and binding on the
AR-REPLICATOR?
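A minimal Python sketch of the route-target construction in the quoted text, for an IPv4 Next-Hop only, using the IPv4-Address-Specific Extended Community layout from RFC 4360 (Type 0x01, Sub-Type 0x02 for Route Target). IPv6 Next-Hops would need the IPv6-Address-Specific community of RFC 5701, which is not shown. Function names are hypothetical; the point is that the AR-LEAF's export value and the AR-REPLICATOR's auto-configured import value are the same bytes.

```python
import ipaddress
import struct


def ip_specific_route_target(next_hop):
    """IPv4-Address-Specific Route Target built from a Replicator-AR Next-Hop.

    Layout per RFC 4360: Type 0x01 (transitive IPv4-address-specific),
    Sub-Type 0x02 (Route Target), 4-byte Global Administrator carrying the
    IPv4 address, 2-byte Local Administrator set to 0.
    """
    addr = ipaddress.IPv4Address(next_hop)
    return struct.pack("!BB4sH", 0x01, 0x02, addr.packed, 0)


def replicator_imports(leaf_ad_route_targets, own_next_hop):
    """AR-REPLICATOR side: accept a Leaf A-D route carrying its own auto-configured RT."""
    return ip_specific_route_target(own_next_hop) in leaf_ad_route_targets


rt = ip_specific_route_target("192.0.2.1")  # built by the AR-LEAF
print(rt.hex())  # 0102c00002010000
print(replicator_imports({rt}, "192.0.2.1"))  # True
```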

      o  The Leaf A-D route MUST include the PMSI Tunnel attribute with
         the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel

"type" here seems to refer to the new T field in the PTA flags, and
should probably be referenced using consistent terminology.

   Each AR-enabled node MUST understand and process the AR type field in
   the PTA (Flags field) of the routes, and MUST signal the

(same point about consistent terminology for T/AR-type)

   corresponding type (1 or 2) according to its administrative choice.

I suggest writing "01" and "10" to match the previous treatment of the
two-bit field.
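A minimal Python sketch showing that "1 or 2" and "01"/"10" name the same two-bit T values. The `ArType` and `parse_ar_type` names are hypothetical; the sketch does not model where T sits inside the Flags octet, and treats 00 and 11 as unknown because the text quoted here does not define them.

```python
from enum import IntEnum


class ArType(IntEnum):
    """Values of the two-bit AR type (T) field in the PTA Flags."""
    AR_REPLICATOR = 0b01  # "1", written T = 01
    AR_LEAF = 0b10        # "2", written T = 10


def parse_ar_type(value):
    """Map a raw two-bit T value to a role; None for values not covered here."""
    if not 0 <= value <= 0b11:
        raise ValueError("T is a two-bit field")
    try:
        return ArType(value)
    except ValueError:
        return None  # 00 and 11 are not described in the text quoted above


print(parse_ar_type(1).name, format(int(ArType.AR_LEAF), "02b"))  # AR_REPLICATOR 10
```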

Section 5.1

   -  When an AR-REPLICATOR receives a BM packet on an AC, it will
      forward the BM packet to its flooding list (including local ACs
      and remote NVE/PEs), skipping the non-BM overlay tunnels.

I assume that it goes without saying that the AR-REPLICATOR does not
flood the packet back to the AC it came in on.  (The "rules" later in
the section do specify source squelching.)

Section 5.2

   b.  In this non-selective AR solution, the AR-LEAF MUST advertise a
       single Regular-IR inclusive multicast route as in [RFC7432].  The
       AR-LEAF SHOULD set the AR Type field to AR-LEAF.  Note that
       although this flag does not make any difference for the egress
       nodes when creating an EVPN destination to the AR-LEAF, it is

egress, or ingress?
Lars Eggert Former IESG member
No Objection
No Objection (2021-10-11 for -09) Sent
All comments below are about very minor potential issues that you may choose to
address in some way - or ignore - as you see fit. Some were flagged by
automated tools (via https://github.com/larseggert/ietf-reviewtool), so there
will likely be some false positives. There is no need to let me know what you
did with these suggestions.

Section 10. , paragraph 3, nit:
-    traffic from the AR-LEAF nodes will be attracted by the existance of
-                                                                 ^
+    traffic from the AR-LEAF nodes will be attracted by the existence of
+                                                                 ^

Section 4. , paragraph 16, nit:
> as the AR-IP and SHOULD be different than the IR-IP for a given PE/NVE. o Tu
>                                      ^^^^
Did you mean "different from"? "Different than" is often considered colloquial
style.

Section 5.2. , paragraph 6, nit:
> REPLICATORS do, as described in section Section 5.1. 5.3. RNVE procedures RN
>                                 ^^^^^^^^^^^^^^^
Possible typo: you repeated a word.

Section 5.2. , paragraph 13, nit:
> ow the procedures described in section Section 5.1, except for the following
>                                ^^^^^^^^^^^^^^^
Possible typo: you repeated a word.

Section 5.2. , paragraph 15, nit:
> rt route-target as described in section Section 4. o The AR-REPLICATOR will b
>                                 ^^^^^^^^^^^^^^^
Possible typo: you repeated a word.

Section 6.1. , paragraph 4, nit:
> ist of AR-REPLICATORs, the node reverts back to non-selective mode and flood
>                                 ^^^^^^^^^^^^
Consider using "reverts".

Section 6.1. , paragraph 12, nit:
>  that the Selective AR-LEAF waits for a AR-LEAF-join-wait-timer (in seconds,
>                                       ^
Use "an" instead of "a" if the following word starts with a vowel sound, e.g.
"an article", "an hour".

Section 6.2. , paragraph 16, nit:
> R-REPLICATOR nodes are attached to the the same Ethernet Segment in the BD.
>                                    ^^^^^^^
Two determiners in a row. Choose either "the" or "the".

Document references draft-ietf-bess-evpn-bum-procedure-updates-10, but -11 is
the latest available revision.
Martin Duke Former IESG member
No Objection
No Objection (2021-10-12 for -09) Not sent
Thanks to Michael Tuexen for the TSVART review.