Benchmarking Methodology for Link-State IGP Data-Plane Route Convergence

Note: This ballot was opened for revision 12 and is now closed.

(David Ward) Discuss

Discuss (2007-07-17 for -)
A few comments on this draft:

0) it is unclear why RIP isn't covered 

1) it is unclear why the recommended timer values in 3.2.4 do not correspond to typical values configured in the network, but the route scaling numbers do

2) the packet sampling time in 3.2.5 seems out of date. IGPs converge faster than the packet sample time today.

3) it is recommended that the results are in usecs only

4) it is unclear if there are packet order test requirements for ECMP paths. Since many ECMP tests are called out, is there any 'correct' test or outcome that is desired for selection of ECMP path?

5) On bcast interfaces it is unclear if both p2p/nbma and bcast need to be configured

6) why don't the tests include generation of LSA/LSP as well as change in data plane? IOW, what is SUT/DUT in Fig2 as well as 4.1.3?

7) The results from the remote failure case 4.1.3 aren't quite correct:

"The additional
   convergence time contributed by LSP Propagation can be
   obtained by subtracting the Rate-Derived Convergence Time
   measured in 4.1.2 (Convergence Due to Neighbor Interface 
   Failure) from the Rate-Derived Convergence Time measured in 
   this test case."

Though the point is mostly academic, it isn't technically correct.

8) why don't we include a fiber pull test and/or enable/disable interface?

9) Do we want to specify tests for "link up" vs just "link down?" Link up is a critical event in the network and frequently causes loops/microloops.

10) In the "metric change" test in 4.5 since traffic is moving from one intf to another there should be an observable convergence event unlike what is stated in expected results:

"There should be no externally observable IGP Route Convergence ..."

11) In general the results section of each test should state what should be observed during the test: packet loss, packets tx/rx between any SUT and required systems, etc. Right now, there is only a very brief description of the influencing variables of the test. It would not be possible to verify a true positive on a test w/ the current text.

12) There needs to be a notion and testing of specific prefixes:

first, last and then a median and mean

13) There needs to be a notion of important prefixes or those that are biased for prioritized convergence. E.g. BGP Nexthops.

14) There should be a measure of any microloops formed and duration of any loops|microloops

15) Is graceful restart and time to restore FIB considered a convergence event? If not, why?
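
To make item 12 concrete, the following is a minimal Python sketch, assuming the tester can report one convergence time per prefix. The function name, the dictionary layout and the sample values are hypothetical illustrations; they come from neither the draft nor any test equipment. Times are kept in microseconds, in line with item 3.

```python
from statistics import mean, median


def summarize_prefix_convergence(times_us):
    """Summarize per-prefix convergence times given in microseconds.

    times_us maps a prefix string to the time at which traffic for that
    prefix was restored (hypothetical input format).
    """
    if not times_us:
        raise ValueError("no per-prefix measurements supplied")

    values = list(times_us.values())
    # The first prefix to converge has the smallest time, the last the largest.
    first_prefix = min(times_us, key=times_us.get)
    last_prefix = max(times_us, key=times_us.get)

    return {
        "first_prefix": first_prefix,
        "first_us": times_us[first_prefix],
        "last_prefix": last_prefix,
        "last_us": times_us[last_prefix],
        "median_us": median(values),
        "mean_us": mean(values),
    }


# Hypothetical sample data: four prefixes and their convergence times.
sample = {
    "10.0.0.0/24": 180_000,
    "10.0.1.0/24": 220_000,
    "10.0.2.0/24": 450_000,
    "10.0.3.0/24": 310_000,
}
summary = summarize_prefix_convergence(sample)
print(summary)
```

Reporting the first and last prefix alongside the median and mean keeps the tail of the distribution visible, which a single aggregate convergence time would hide.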

(Ron Bonica) Yes

Jari Arkko No Objection

(Stewart Bryant) (was Discuss) No Objection

Comment (2011-08-15)
This is much improved since the previous version, and the RFC Editor's note addresses my remaining concerns.

(Ross Callon) No Objection

(Gonzalo Camarillo) No Objection

(Ralph Droms) No Objection

(Lisa Dusseault) No Objection

(Lars Eggert) No Objection

(Adrian Farrel) (was Discuss) No Objection

Comment (2010-06-30)
Section 1

   The test cases in this document are black-box
   tests that emulate the network events that cause convergence, as
   described in [Po09a].

Do the events cause convergence or necessitate convergence?


Please also address the large number of minor issues raised in the Routing Area Directorate review from Julien Meuric as follows...


I have been selected as the Routing Directorate reviewer for this draft. 
The Routing Directorate seeks to review all routing or routing-related 
drafts as they pass through IETF last call and IESG review. The purpose 
of the review is to provide assistance to the Routing ADs. For more 
information about the Routing Directorate, please see

Although these comments are primarily for the use of the Routing ADs, it 
would be helpful if you could consider them along with any other IETF 
Last Call comments that you receive, and strive to resolve them through 
discussion or by updating the draft.

Document: draft-ietf-bmwg-igp-dataplane-conv-meth-21.txt
Reviewer: Julien Meuric (with the help of an anonymous colleague eating 
IGPs at breakfast)
Review Date: 06/30/2010
Intended Status: Informational

I have some minor concerns about this document that I think should be 
resolved before publication.

The document is rather heavy: it covers multiple scenarios, gives 
several sequences of testing actions, analyses details about 
uncertainty... As a result, for someone not used to the BMWG (please 
keep in mind that this is my 1st review on a document from BMWG) it is 
not so easy to follow in every detail and it requires some back-up 
reading (draft-ietf-bmwg-igp-dataplane-conv-term for instance).

*Major Issues:*
No major issues found.

*Minor Issues:*
1/ I imagine it has already been discussed on the WG (sorry if I bring 
back a troll), but it seems unusual to use RFC 2119 language for an 
Informational document, and that is why it is explicitly stated in 
section 2. Considering the status remains the same, instead of 
advertising that fact, would not it be simpler to avoid the capital 
letters in the corresponding words?
2/ My GMPLS background brings me to think that an IGP adjacency may be 
independent from the corresponding data link. The document seems to 
focus on the classical IGP use, but it would be better to make that 
context clearer through a simple sentence than considering it is the 
3/ There is unfortunately no reference to traffic-engineering 
extensions, while it might impact IGP convergence. Adding a few words on 
this so as to state it is out of scope (if so) would be welcome.
4/ By reading section 3, we understand that the causes considered for 
testing in this methodology concern failures and administrative changes 
(status, costs). Therefore, the link insertion/recovery is apparently 
not part of the testing. However, we can find it in section 8 if we take 
a close look at the procedure steps. As a consequence, in order to stay 
clearly consistent to draft-ietf-bmwg-igp-dataplane-conv-app-17 
referenced here, it would be useful to clarify somewhere in section 3 
that interface or link insertion/recovery is treated along with the 
failure events and is therefore taken into account.
5/ The document will also gain in stating from the introduction the 
scope of this methodology regarding router stress in front of 
convergence performance (i.e. what is addressed in section 5). For 
example, add something like:
"Convergence performance is tightly linked to the number of tasks a 
router has to deal with. As the most impacting tasks are mainly related 
to the control plane and the data plane, the more the DUT is stressed as 
in a live environment, the more accurate performance results (i.e. the 
ones that would be observed in a live environment) will be. Section 5 
gives details on the recommended environment for IGP convergence 
performance benchmarking."

Even though it may be usual in the WG, the way document references are 
built ("AuthID#") is much less readable than "Summarized-Title" as used 
in some other places. Let us hope most of them will be updated with RFC 
numbers (not more convenient in fact, but stable reference).
The phrase "next-hop router" may be confusing (at least until going into 
the details), especially because in some contexts like BGP, a next-hop 
router may not be adjacent but remote. How about "adjacent routers" to 
reuse IS-IS terminology or "neighbor routers" to reuse OSPF terminology?
The "ECMP" acronym is expanded in section 3.4 (where it is actually 
tested) while it has been used since section 3.1: expansion should be 
moved (or duplicated) there.
A mix of "Loss of connectivity" and "LoC" acronym are used 
alternatively: strict consistency along the document may not be a goal, 
but association between them should at least be explicit at 1st use 
(section 4).
"IS-IS" is always referred to as "ISIS", I would add the dash.
Some titles on figures (e.g. 9) and equations (e.g. 3) are closer to the 
following paragraph than the corresponding item, swapping or reducing 
the amount of blank lines would be easier to read.
Section 2:
s/in other BMWG work/in other documents issued by the Benchmarking 
Methodology Working Group/
Section 3.1:
At 1st occurrence, it might be more accurate to specify that "N >= 1" or 
"N > 0".
Section 3.4:
"the tester emulates N next-hop routers"
Without the figure, it is difficult to quickly picture the 
configuration. It may ease the understanding to add something like 
"(N-1 adjacent to R1; 1 adjacent to R2)".
Section 5.4: "LSA", "LSP" and "SPF" are not expanded: they may be usual, 
but IGP is expanded in the abstract and introduction (and "LSP" has 2 
usual meanings in the Routing area)... The same question may arise for 
"IS-IS" and "OSPF" expansion, but they are considered as "well-known" on (while 
the former are not).
Section 5.6:
s/topologies 3, 4, and 6/topologies 3, 4 and 6/
s/packets are transmitted/packets be transmitted/
Section 5.9:
s/test case has/test case have/
Section 7:
s/loss or not./loss or not?/
s/Complete the table below/The table below should be completed/
Section 8:
"DUT's" and "Tester's" read weird to me with respect to what I was 
taught at school, but someone put "the car's wheels" on Wikipedia. I 
thus leave this issue to native English speakers. :-)
Section 8.1.4:
s/may influenced/may be influenced/

Stephen Farrell No Objection

(Sam Hartman) No Objection

(Russ Housley) No Objection

(Cullen Jennings) No Objection

(Chris Newman) No Objection

(Tim Polk) No Objection

(Pete Resnick) No Objection

Comment (2011-08-25)
This document seems to be misusing RFC 2119 language. They don't seem to follow the admonition in section 6 of 2119:

   Imperatives of the type defined in this memo must be used with care
   and sparingly.  In particular, they MUST only be used where it is
   actually required for interoperation or to limit behavior which has
   potential for causing harm (e.g., limiting retransmisssions)  For
   example, they must not be used to try to impose a particular method
   on implementors where the method is not required for
   interoperability.

(Dan Romascanu) No Objection

Comment (2007-07-18 for -)
1. The Abstract says 'The methodology can be applied to any link-state IGP, such as ISIS and OSPF.' Is it true that the methodology applies only to link-state IGPs? If true, I would suggest that the title be changed to add 'link-state'. Else strike out 'link-state' from the Abstract.

2. Section 3.2.2 - 'To obtain results similar to those that would be 
   observed in an operational network, it is recommended that the 
   number of installed routes closely approximates that the network.' 

Probably '... that of the network'  

An indication of the degree of magnitude of this number also seems to be in place here.

3. Section 4.2 - what does 'remove layer 2 session' mean? I read a layer 2 failure as a failure that is detected at layer 2, but that can reflect a fault that happens in the lower layer and can be as trivial as a cable failure. Am I wrong?

(Peter Saint-Andre) No Objection

(Robert Sparks) No Objection

(spt) No Objection

(Magnus Westerlund) No Objection