Early Review of draft-ietf-rtgwg-backoff-algo-04
review-ietf-rtgwg-backoff-algo-04-rtgdir-early-vainshtein-2017-04-27-00

Request Review of draft-ietf-rtgwg-backoff-algo-04
Requested rev. 04 (document currently at 10)
Type Early Review
Team Routing Area Directorate (rtgdir)
Deadline 2017-04-30
Requested 2017-03-31
Requested by Jeff Tantsura
Other Reviews Secdir Last Call review of -07 by Benjamin Kaduk (diff)
Genart Last Call review of -07 by Elwyn Davies (diff)
Opsdir Last Call review of -07 by Mehmet Ersue (diff)
Review State Completed
Reviewer Sasha Vainshtein
Review review-ietf-rtgwg-backoff-algo-04-rtgdir-early-vainshtein-2017-04-27
Posted at https://mailarchive.ietf.org/arch/msg/rtg-dir/QYtccPtnx5FYn0Nl_I5dc7_1j-o
Reviewed rev. 04 (document currently at 10)
Review result Has Issues
Draft last updated 2017-04-27
Review completed: 2017-04-27

Review

Hello,

I have been selected as the Routing Directorate early reviewer for this draft. The Routing Directorate seeks to review all routing or routing-related drafts as they pass through IETF last call and IESG review, and sometimes on special request. In this case an early review has been requested by Jeff Tantsura, one of the co-chairs of the RTGWG.
The purpose of the review is to provide assistance to the Routing ADs. For more information about the Routing Directorate, please see http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir

Document: draft-ietf-rtgwg-backoff-algo-04
Reviewer: Alexander (“Sasha”) Vainshtein
Review Date: 27-Apr-17
IETF LC End Date: N/A
Intended Status: Standards Track

Summary:
I have some minor concerns about this document that, from my POV, should be resolved before publication.

Comments:
The draft is very well written, and, with one notable exception, easy to understand.
It represents an attempt to standardize one aspect of the behavior of link-state routing protocols: the delay between the first IGP event that triggers a new SPF computation and the SPF computation itself. Until now, this has been left for the implementers to play with freely. The resulting differences have been known for quite some time to result, in some cases, in transient micro-loops.

The resolution proposed in this draft includes a well-defined FSM and a full set of tunable parameters (timers) used in this FSM.
The range and granularity of all the tunable parameters are explicitly defined in the document so that operators can tune their networks to use exactly the same SPF delay algorithm with exactly the same parameters. (The default values are not defined because one size does not fit all in this case.)
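
As an illustration only (not taken from the draft), an operator-side check that all routers really use exactly the same parameters could look like the sketch below. The router names and millisecond values are invented, and the parameter names simply follow the spelling used in this review.

```python
# Hypothetical per-router timer settings in milliseconds. The parameter names
# follow this review; the router names and values are invented for illustration.
ROUTER_TIMERS = {
    "r1": {"INITIAL_SPF_DELAY": 50, "SHORT_SPF_DELAY": 200, "LONG_SPF_DELAY": 5000},
    "r2": {"INITIAL_SPF_DELAY": 50, "SHORT_SPF_DELAY": 200, "LONG_SPF_DELAY": 5000},
    "r3": {"INITIAL_SPF_DELAY": 50, "SHORT_SPF_DELAY": 100, "LONG_SPF_DELAY": 5000},
}


def inconsistent_parameters(router_timers):
    """Map each parameter whose value differs between routers to {value: [routers]}."""
    seen = {}
    # Iterate in sorted router order so the router lists are deterministic.
    for router, timers in sorted(router_timers.items()):
        for name, value in timers.items():
            seen.setdefault(name, {}).setdefault(value, []).append(router)
    # A parameter is inconsistent when more than one distinct value was seen.
    return {name: values for name, values in seen.items() if len(values) > 1}
```

Calling `inconsistent_parameters(ROUTER_TIMERS)` on the invented data above flags only SHORT_SPF_DELAY, because r3 is configured differently from r1 and r2.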

It should be noted that the draft does not intend to provide a comprehensive solution to the micro-loop problem.
Rather, it provides a common baseline upon which specific solutions for these problems can be built (e.g., see draft-ietf-rtgwg-uloop-delay<https://datatracker.ietf.org/doc/draft-ietf-rtgwg-uloop-delay/?include_text=1>).

Major Issues: None found.

Minor Issues:

1.       The exception to good readability of the draft refers to the term “proximate failures/IGP events” that appears 4 times in the draft. English is not my mother tongue, and the reference<https://www.merriam-webster.com/thesaurus/proximate> I’ve looked up did not help much. “Temporally close” (or something along these lines) looks like a suitable alternative. (Does this comment run strictly against the recommendation for the RTG-DIR reviewers “to avoid raising esoteric questions of English usage”?)

2.       Section 5 mentions starting the SPF_TIMER (with one of the 3 values defined for it) as part of the response to some FSM events if it was not already running, but it does not specify what happens when this timer expires. I assume that its expiration leaves the FSM in its current state and results in running the SPF computation. If this is correct, it would be nice to say so explicitly.

3.       Section 7 recommends that, in order to mitigate micro-loop problems using the proposed algorithm, “all routers in the IGP domain, or at least all the routers in the same area/level, have exactly the same configured values” of the relevant timers. However, the draft does not specify whether these timers should be configured just at the protocol instance level or also at the level of each specific area/level. From my POV, the granularity of configuration should be defined in this draft, one way or another.

4.       The latest versions of the YANG data model drafts for IS-IS and OSPF already define the timers introduced in this draft, but there are no references to these drafts in the document. From my POV such references (Informative and therefore non-blocking) would be useful for the readers, and I suggest adding them.

5.       I have some concerns regarding incremental introduction and activation of the proposed algorithm. An operator running a well-tuned network may experience transient problems when some of its routers have already been upgraded and use the proposed back-off algorithm while others still cannot. Some text explaining potential issues in this scenario and, if possible, their mitigation, would be most helpful.

6.       The explanatory text in the draft seems to strongly suggest that INITIAL_SPF_DELAY <= SHORT_SPF_DELAY <= LONG_SPF_DELAY, but this is not formalized as a requirement anywhere in the text. From my POV, satisfying this relationship should be RECOMMENDED to the operators.
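
To make items 2 and 6 concrete, here is a minimal sketch, under stated assumptions, of what I have in mind. It is not the FSM from the draft: the class names, the opaque state label and the millisecond values are hypothetical, and the timer-expiry behavior is only my reading as described in item 2. The parameter names follow the spelling used in the NITS below.

```python
from dataclasses import dataclass


@dataclass
class SpfDelayConfig:
    # Values in milliseconds; field names mirror the three SPF delay parameters.
    initial_spf_delay: int
    short_spf_delay: int
    long_spf_delay: int

    def ordering_ok(self) -> bool:
        """True if INITIAL <= SHORT <= LONG, the relationship discussed in item 6."""
        return self.initial_spf_delay <= self.short_spf_delay <= self.long_spf_delay


class SpfTimerSketch:
    """Toy model of the reading in item 2; the real FSM states are not modelled."""

    def __init__(self, state: str):
        self.state = state              # opaque label standing in for the FSM state
        self.spf_timer_running = False
        self.spf_runs = 0

    def on_igp_event(self) -> bool:
        """Start SPF_TIMER only if it is not already running; report whether it started."""
        if self.spf_timer_running:
            return False
        self.spf_timer_running = True
        return True

    def on_spf_timer_expiry(self) -> None:
        # Assumed behavior: run the SPF computation and stay in the current state.
        self.spf_timer_running = False
        self.spf_runs += 1
```

With this model, two IGP events arriving before the timer expires lead to a single SPF run, and the state label is the same before and after the expiry.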

NITS:

1.       Section 3 lists 3 possible values for the SPF_DELAY variable, called INITIAL_SPF_DELAY, SHORT_SPF_DELAY and LONG_SPF_DELAY. Then, in the last paragraph, it refers to a previously undefined value, INITIAL_WAIT. This is an obvious typo and should be replaced with INITIAL_SPF_DELAY.

2.       One of the parameters of the algorithm is called HOLD_DOWN_INTERVAL in Section 3 and Section 6, but HOLDDOWN_INTERVAL in Section 5. This also looks like an obvious typo, and the same name should be used across the document.

I have discussed my concerns about the draft with the authors, who have been most cooperative.
I believe that we have reached an agreement on an acceptable resolution of all concerns listed above.

Regards,
Sasha

Office: +972-39266302
Cell:      +972-549266302
Email:   Alexander.Vainshtein@ecitele.com
