Micro-loop prevention by introducing a local convergence delay
draft-ietf-rtgwg-uloop-delay-08

Summary: Has a DISCUSS. Has enough positions to pass once DISCUSS positions are resolved.

Alvaro Retana Discuss

Discuss (2017-10-10 for -07)
Section 5.1. (Definitions) refers to a couple of “existing IGP timers”.  I understand the concepts, but can you please reference the IGP documents where these timers are defined?  I quickly checked rfc2328 and couldn’t find a specific place that talked about LSP_GEN_TIMER (or LSA, of course!), or a similar concept.  SPF_DELAY seems to be introduced by I-D.ietf-rtgwg-backoff-algo.  Given that the rest of Section 5. (Specification) is built on these “existing IGP timers”, I think that the references should be Normative.

Note also that Section 5.2. (Current IGP reactions) is described (in 5.3) as the “standard IP convergence” and carries a “MUST” associated with it.  Section 5.1 mentions that the timers in question are “often associated with a damping mechanism”, which is not part of the base IGP specifications.

I’m putting this comment in as a DISCUSS given that understanding the definitions (and having them as Normative references) is necessary for the implementation of the mechanism described.  I think it should be easy to resolve by just adding the appropriate references.

Comment (2017-10-10 for -07)
(1) Where do the numbers in the “Route computation event time scale” table come from?  Please put a reference or at least some guidance to the origin of the information.  If it's just for informational purposes, then please say so.  BTW, please also put a number on the table.  [I have the same question for the tables in Section 9.]

(2) Section 5.4. (Local delay for link down) specifies that “update of the RIB and the FIB SHOULD be delayed for ULOOP_DELAY_DOWN_TIMER msecs.  Otherwise, the RIB and FIB SHOULD be updated immediately.”  When would ULOOP_DELAY_DOWN_TIMER not be applied?  Along the same lines, if there’s no delay mentioned in Step 5 of 5.3, when would the RIB/FIB not be updated immediately?  IOW, why are these “SHOULDs” not “MUSTs”?

(3) What should be the default setting for ULOOP_DELAY_DOWN_TIMER?  Section 9. (Examples) shows a couple of manually configured (?) scenarios, but no guidance is present in the document.  Please include guidance (maybe based on the local network convergence, or even a default that manufacturers can use) in the Deployment Considerations section.

(4) Section 11. (Existing implementations).  Please take a look at RFC7942.
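
To make the question in (2) concrete, here is a minimal Python sketch of one reading of the Section 5.4 behavior. It assumes, as an interpretation that is not stated in the comment itself, that the delay applies only when the trigger is a single local link-down event. The names ConvergenceEvent, rib_fib_update_delay_ms, and the field names are hypothetical and do not come from the draft.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceEvent:
    """Hypothetical summary of what triggered the route computation."""
    is_local: bool          # detected on a directly attached link
    is_link_down: bool      # the event is a link going down
    changes_in_batch: int   # topology changes seen for this computation


def rib_fib_update_delay_ms(event: ConvergenceEvent,
                            uloop_delay_down_timer_ms: int) -> int:
    """Return how long to hold the RIB/FIB update, in milliseconds.

    Assumption: the delay applies only to a single local link-down event.
    Every other case updates the RIB and FIB immediately (delay of 0).
    """
    single_local_link_down = (
        event.is_local and event.is_link_down and event.changes_in_batch == 1
    )
    return uloop_delay_down_timer_ms if single_local_link_down else 0


# A single local link-down event is delayed; a remote event is not.
local_down = ConvergenceEvent(is_local=True, is_link_down=True, changes_in_batch=1)
remote_down = ConvergenceEvent(is_local=False, is_link_down=True, changes_in_batch=1)
delay_local = rib_fib_update_delay_ms(local_down, 2000)    # 2000
delay_remote = rib_fib_update_delay_ms(remote_down, 2000)  # 0
```

Under this reading the two "SHOULD" statements cover disjoint cases, which is why the comment asks whether there is any situation in which an implementation would legitimately deviate from either one.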


Nits:

s/any traffic destined to D if a neighbor did not/any traffic destined to D; if a neighbor did not

s/can be work/can work

“IGP shortcut feature”: a reference would be nice

Alia Atlas Yes

Deborah Brungard No Objection

Ben Campbell No Objection

Comment (2017-10-11 for -07)
(Oops, sorry, I entered the bit about addressing my comments for the wrong draft. The following comments still apply.)

- General: Do I understand correctly that this is a black-box implementation detail? I note that section 4 explicitly says that it is a local-only feature that does not require interoperability. If so, then standards track seems inappropriate. BCP or informational seems to make more sense. Since there are recommendations here, I think BCP is the right choice.  (I note Adam made a similar comment.)

-11: Do you expect this section to stay in the RFC? It is likely to become outdated rather quickly.

Editorial Comments:

- General: Please number the tables.

- Sections 2 and 3 and their child sections have quite a few grammar errors. Please proofread them again. I mention a few specifics below, but doubt I caught everything.

- 2, first paragraph: " That means that all non-D neighbors of S on the topology will send to S any traffic destined to D if a neighbor did not, then that neighbor would be loop-free."
I can't parse that sentence. Is it a run-on sentence, or are there missing words?
-- s/can be work/can work/

-3: " may cause high damages for a network."
I suggest " may cause significant network damage".

-4, last paragraph: "This benefit comes at the expense of eliminating transient forwarding loops involving the local router. "
How is that an "expense"? Isn't it the whole point?

-5.3, first paragraph and paragraph before figure 4:
The MUST is stated twice. Please avoid redundant normative statements. Even if they agree now, they can cause maintenance issues down the road.

Alissa Cooper No Objection

Spencer Dawkins No Objection

Suresh Krishnan No Objection

Warren Kumari No Objection

Comment (2017-10-11 for -07)
Section 1.  Introduction:
"That means that all non-D neighbors of S on the
   topology will send to S any traffic destined to D if a neighbor did
   not, then that neighbor would be loop-free."
 -- I was unable to parse the above. I may just be overtired, but it feels like there are some missing words.


Nits:
" When S-D fails, a transient forwarding loop may appear between S and
   B if S updates its forwarding entry to D before B."
 -- Perhaps "... entry to D before B does." or "... before B updates its forwarding entry"? 

Section 2.1.  Fast reroute inefficiency
"On the  router C, the nexthop to D is the tunnel T thanks to the IGP  shortcut." 
s/the// 

"On C, the tail-end of the TE tunnel (router B) is no more on the shortest-path tree (SPT) to D, ..."
s/is no more on/is no longer on/
(related)
"... so C does not encapsulate anymore the traffic to D..."
s/does not encapsulate anymore/no longer encapsulates/

Section 3.  Overview of the solution
"This ordered convergence, is similar to the ordered FIB ..."
s/,// (superfluous).
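
As an illustration of the Section 1 sentence about S and B quoted in the nits above, the following Python sketch traces traffic toward D through per-router next hops. It is only a sketch under stated assumptions: that S's post-failure next hop to D is B, that B's pre-failure next hop to D is S, and that B reaches D via a further hop labelled C once it has converged. The function name trace_to_d and the table layout are hypothetical.

```python
def trace_to_d(next_hop, start, dest="D", max_hops=16):
    """Follow next hops from start toward dest.

    Returns (path, looped), where looped is True if a router is revisited.
    """
    path = [start]
    node = start
    while node != dest and len(path) <= max_hops:
        node = next_hop[node]
        if node in path:
            # The packet came back to a router it already crossed.
            return path + [node], True
        path.append(node)
    return path, False


# S has already updated its entry for D, B has not (assumed state).
s_updated_first = {"S": "B", "B": "S"}
loop_path, looped = trace_to_d(s_updated_first, "S")
# loop_path == ["S", "B", "S"], looped is True

# Both routers have converged (C is an assumed downstream hop).
both_converged = {"S": "B", "B": "C", "C": "D"}
ok_path, ok_looped = trace_to_d(both_converged, "S")
# ok_path == ["S", "B", "C", "D"], ok_looped is False
```

The loop exists only while the two tables disagree, which is why the suggested rewording makes explicit that the ordering concerns the moment B updates its forwarding entry.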

Mirja Kühlewind No Objection

Comment (2017-10-09 for -06)
Nit in section 9:
You should probably not talk about 'our' solution or mechanism in an RFC:
s/our/this/ or s/our X/the X described in this document/ 
This appears multiple times in section 9.

Kathleen Moriarty No Objection

Comment (2017-10-10 for -07)
Thanks for addressing the SecDir review comments:
https://mailarchive.ietf.org/arch/msg/secdir/tnRc2LPp6FqfDeyqd2cJExEtdXA

Eric Rescorla No Objection

Comment (2017-10-11 for -07)
Line 115
   Consider the case in Figure 1 where S does not have an LFA to protect
   its traffic to D.  That means that all non-D neighbors of S on the
You need to define LFA.


Line 118
   topology will send to S any traffic destined to D if a neighbor did
   not, then that neighbor would be loop-free.  Regardless of the
   advanced fast-reroute (FRR) technique used, when S converges to the
This is not a grammatical sentence.


Line 132
        S ------ B
             1
        Figure 1
What do the numbers in this box mean? I assume they are route metrics, but you need to say so.


Line 136
   When S-D fails, a transient forwarding loop may appear between S and
   B if S updates its forwarding entry to D before B.
Something seems to have gone badly wrong with this paragraph. Are these lines supposed to be in the previous paragraph?


Line 326
      unstable.  As an example, [I-D.ietf-rtgwg-backoff-algo] defines a
      standard SPF delay algorithm.
You need to define SPF here.


Line 338
   1.  The Up/Down event is notified to the IGP.
Usually, one would say that the IGP is notified of...


Line 552
           S

             Figure 7
Is this the same as the previous figure with T running CEAB?

Adam Roach No Objection

Comment (2017-10-10 for -07)
This document doesn't really define any new on-the-wire protocol. Was publication as a BCP rather than a standards track document considered?

The Introduction contains the following text:

   That means that all non-D
   neighbors of S on the topology will send to S any traffic destined to
   D if a neighbor did not, then that neighbor would be loop-free.

I can't parse this sentence. Is there supposed to be a sentence break somewhere in there?

The introduction starts talking about post-failure events (e.g., "when S converges to the new topology") before mentioning a failure of the S-D link. This makes it very hard to follow. Would suggest mentioning the failure being considered before talking about the ensuing events.

Section 4 begins:

   This document defines a two-step convergence initiated by the router
   detecting a failure and advertising the topological changes in the
   IGP.  This introduces a delay between the convergence of the local
   router and the network wide convergence.

This reads backwards to me. With this technique, the network converges first, followed by an introduced delay, followed by router convergence. Right?

Further on in that section:

   This benefit comes at the
   expense of eliminating transient forwarding loops involving the local
   router.

I can't make sense of this. Eliminating transient forwarding loops is a good thing, right? Not an expense?

I agree with Alvaro that the lack of a recommended default for ULOOP_DELAY_DOWN_TIMER is an issue, especially as the values configured in the examples seem to change arbitrarily from 1 second to 2 seconds.

Benoit Claise No Record

Terry Manderson No Record

Alexey Melnikov No Record