Last Call Review of draft-ietf-bess-evpn-fast-df-recovery-09
review-ietf-bess-evpn-fast-df-recovery-09-genart-lc-davies-2024-08-12-00
I am the assigned Gen-ART reviewer for this draft. The General Area Review Team (Gen-ART) reviews all IETF documents being processed by the IESG for the IETF Chair. Please treat these comments just like any other last call comments.

For more information, please see the FAQ at <https://wiki.ietf.org/en/group/gen/GenArtFAQ>.

Document: draft-ietf-bess-evpn-fast-df-recovery-09
Reviewer: Elwyn Davies
Review Date: 2024-08-12
IETF LC End Date: 2024-07-31
IESG Telechat date: 2024-08-22

Summary:
I apologise for the rather late delivery of this review. This was partly due to domestic duties (it was our family birthday/anniversary period, which diverted me from reviewing) and also due to my taking some time to come to grips with the document. I am not an expert in the EVPN technology; although in essence it is not complex, it took me some time to get a handle on the technology which this document is trying to improve.

Be that as it may, there appear to be some areas where the document is not internally consistent (see the Major Issue), and I think the reliance on what is an extended example (s3) to explain the operation of the technique has led to a less than robust explanation of the generic system, particularly the conflation of the values of the SCT offset from the time when local recovery is complete and the time delay used to await the arrival of additional RT-4 messages from other PEs. These should be separate parameters in my opinion, and conflating them could lead to the skew offset resulting in operations being scheduled before the end of the time delay period.

Major Issues:
s2 vs s3: If I read the text correctly, the last two paragraphs of s2 appear to imply that the newly inserted PE performs its service carving (just some transition to DF state) at the advertised SCT, whereas the partner PEs make all their transitions DF->NDF and NDF->DF at SCT+skew (where skew is negative).
However, the latter part of s3 appears to imply that the partner PEs only make their DF->NDF transitions at SCT+skew and both the inserted and partner PEs make their NDF->DF transitions at SCT. This seems to be inconsistent.

Minor issues:
s2 and s3: Appropriate choice of SCT: The SCT is an absolute time. It is passed to the other PEs, which then have to calculate another absolute time, 'skew' earlier than the SCT value, at which the other PEs are intended to take action. Thus, at a minimum, the SCT needs to be 'skew' in the future at the time it is transmitted to the other PEs so that this time of action is not in the past. I think there needs to be a discussion of the calculation of the SCT to avoid the other PEs being requested to take action at a time which has already passed, or before they might have received all RT-4s. The discussion in s3 conflates the offset of the SCT with the Timer period for awaiting other RT-4 receptions. I think this means that SCT+skew is before the expiry of the Timer.

s2.1, para 3: Improving NTP Era handling: The need to worry about the NTP Era seems unfortunate. If it was assumed that the current NTP Era applied to all SCT values, only values of SCT less than the value of 'skew' would cause issues as the time value is used here. Constraining SCT to be greater than 'skew' is not an enormous computational burden, and postponing the restart of a PE device by one 'skew', if it was lucky enough to need to restart within one 'skew' of the era changeover, is unlikely to be problematic.

Nits/editorial comments:
Global: s/i.e./i.e.,/ (2 instances)
Global: s/BGP Extended Community/BGP EVPN Extended Community/
Abstract, para 1: Provide a note of RFC 7432 as the basic RFC for the EVPN solution and flag RFC 8584 when HRW is first mentioned. Also s/[RFC8584]/(RFC 8584)/ as references are not allowed in the Abstract.
Abstract, para 1: s/Highest Random/the Highest Random/, s/of the failed link/of a failed link/
Abstract and s1, para 2: These paras mention 'signalling between the recovered node' but the previous words refer to a recovered node or link. If it is a link that is recovered, which node is involved, or how else is the recovery improved?
Abstract and s1, para 1: The terms 'becoming pervasive' and 'next generation' are not future proof. Suggest s/becoming pervasive/extensively used/ and omit 'next generation'.
s1, para 2: s/Frowarder/Forwarder/
s1.3, para 1: The term 'Layer2 duplicate' is used. Since we are dealing with an Ethernet infrastructure by definition, presumably this means a duplicated Ethernet packet. Can this term be used instead? Otherwise this needs some explanation.
s1.3, para 2: The term 'redundancy group' appears in bullet point 3 of Section 8.5 of RFC7432 without precise definition. According to the Cisco EVPN features documentation for IOS XR Release 7.6 (https://www.cisco.com/c/en/us/td/docs/iosxr/ncs5500/vpn/76x/b-l2vpn-cg-ncs5500-76x/evpn-features.html), Redundancy Group membership is configured during startup. I think this term might merit some more specific explanation in this document (or an erratum registered for RFC7432).
s1.3, para 2, 2nd sentence and para 3: "Under certain conditions, this may cause Layer2 duplicates and potential loops if there is a momentary overlap in forwarding roles between two or more PE devices, consequently leading to broadcast storms." Where can one see evidence for this statement and identification of the conditions that lead to these problems? I think this may be covered by the initial part of s3, in which case a pointer to this would be helpful. I am not sure if s1.3, para 3 refers to another difficulty or is a duplication. Please clarify and again provide evidence and identification of the conditions. Also the last segment of s1.3 repeats a description of the nature of the problem described in para 2.
I think the section needs tightening up to give a single description of the symptoms and possibly give a pointer to where the problem has been identified and quantified.
s1.2: Additional terms need to be defined: NDF, SCT (usefully included in the terminology section).
s1.3, para 5: s/HRW also cannot help/HRW cannot help either/
s1.4, para 1: s/presents multiples advantages/offers multiple advantages/
s1.4, bullet 2: I cannot parse: "by ensuring that PEs any unrecognized new BGP Extended Community."
s1.4, bullet 4: suggest
OLD: (Route Type 4)
NEW: (Route Type 4; See [RFC7432] Sections 7 and 7.4)
END
s1.4, bullet 5: "....and normalizes to NTP for EVPN signalling only." I don't think 'normalizes' is the right term here. Do you mean defaults? Maybe I will see when I read further on.
s2, para 3:
OLD: A new BGP Extended Community, the Service Carving Timestamp
NEW: IANA has allocated a new sub-type for the BGP EVPN Extended Community (type 0x06) [RFC7153], defining a community of PEs that utilize the time synchronization recovery mechanism. The "Service Carving Timestamp" with sub-type value 0x0F (see Section 6) is used in communicating the Service Carving Time (SCT) for each Ethernet Segment route (RT-4) to other partners to ensure an orderly start up or transfer of forwarding duties.
END
s2, para 3: It may be obvious, but I think it needs to be emphasised that the skew value must be consistent across all the PEs. I assume that the intention is that the skew value should be administratively configurable in PEs supporting RT-4. Should there be some advice on the range of sensible values?
s2, para 3: The term RT-4 needs to be expanded on first use (or better, RT-4 and SCT should be expanded in the terminology section).
s2.1, paras 1 and 2: These paragraphs largely duplicate the definition of the Service Carving Timestamp in s2.
I suggest they are replaced with:
The BGP advertisement of each Ethernet Segment route (RT-4) where this scheme is to be used contains an EVPN Extended Community (type 0x06) with Service Carving Timestamp sub-type (Type 0x0F). The expected Service Carving Time is encoded as an 8-octet value as follows:
s3.1, para 3: s/the 64-bit NTP Timestamp Format/an adapted form of the 64-bit NTP Timestamp Format/
s2.1, para 7:
OLD: The use of a 16-bit fractional seconds yields adequate precision of 15 microseconds (2^-16 s).
NEW: The use of a 16-bit fractional seconds value yields adequate precision of approximately 15 microseconds (2^-16 s).
END
s2.1, para 8: Note that the short naming of the flags as 'A' and 'T' is purely local to this document. The IANA registry does not register this naming, although 'A' is used in the same way in RFC 8584. I suggest
OLD: This document introduces a new flag called "T" (for Time Synchronization) to the bitmap field of the DF Election Extended Community defined in [RFC8584].
NEW: This document introduces a new flag called Time Synchronization, indicated by "T" in the bitmap field of the DF Election Extended Community defined in [RFC8584] (see Figure 3).
END
s3.1/s4: What should happen if a PE with SCT capability is in the process of recovering and a PE without SCT capability that was not previously in the redundancy group starts recovery? Doubtless a very rare occurrence, but it might occur, for example, if a hardware replacement happened.
s6: This section needs to be redrafted in more conventional IANA Considerations format. There should not be a date column. It would be helpful to have references to the IANA registries in the Normative Refs.
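
Returning to the Minor issue on the choice of SCT (s2 and s3): the constraint I have in mind can be sketched in a few lines of Python. This is only an illustration of my reading, not something taken from the draft. The names SctPlan, partner_action_time and check_sct, the use of integer milliseconds, and the example values (a 3 s wait for RT-4s and a skew of -10 ms) are all my own assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SctPlan:
    """Hypothetical container; all times are integer milliseconds on a common clock."""
    now_ms: int       # when the recovering PE sends its RT-4
    sct_ms: int       # advertised Service Carving Time (absolute)
    skew_ms: int      # offset applied by partner PEs (assumed negative)
    rt4_wait_ms: int  # period spent awaiting RT-4s from other PEs


def partner_action_time(plan: SctPlan) -> int:
    """Absolute time at which partner PEs act, i.e. SCT + skew."""
    return plan.sct_ms + plan.skew_ms


def check_sct(plan: SctPlan) -> List[str]:
    """Return the problems with this choice of SCT (empty list if none)."""
    problems: List[str] = []
    action_ms = partner_action_time(plan)
    if action_ms <= plan.now_ms:
        problems.append("partner action time is already in the past")
    if action_ms < plan.now_ms + plan.rt4_wait_ms:
        problems.append("partner action time precedes the end of the RT-4 wait")
    return problems


# SCT set exactly to the end of the RT-4 wait (the conflation described above).
conflated = SctPlan(now_ms=1_000_000, sct_ms=1_003_000, skew_ms=-10, rt4_wait_ms=3_000)
# SCT pushed out by the magnitude of the skew, treated as a separate parameter.
separated = SctPlan(now_ms=1_000_000, sct_ms=1_003_010, skew_ms=-10, rt4_wait_ms=3_000)

print(check_sct(conflated))
print(check_sct(separated))
```

With the SCT set exactly to the end of the RT-4 wait, the check reports that the partner action time falls before the wait has ended. Pushing the SCT out by at least the magnitude of the skew clears it, which is why I think the two values should be separate parameters.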