Last Call Review of draft-ietf-bmwg-protection-meth-09

Request Review of draft-ietf-bmwg-protection-meth
Requested revision No specific revision (document currently at 14)
Type Last Call Review
Team General Area Review Team (Gen-ART) (genart)
Deadline 2012-04-10
Requested 2012-03-09
Authors Rajiv Papneja, Samir Vapiwala, Jay Karthik, Scott Poretsky, Shankar Rao, JL. Le Roux
I-D last updated 2012-08-10
Completed reviews Genart Last Call review of -?? by Joel M. Halpern
Secdir Last Call review of -?? by Kathleen Moriarty
Assignment Reviewer Joel M. Halpern
State Completed
Request Last Call review on draft-ietf-bmwg-protection-meth by General Area Review Team (Gen-ART) Assigned
Result Ready
Completed 2012-08-10
review-ietf-bmwg-protection-meth-genart-lc-halpern-2012-08-10-00
I am the assigned Gen-ART reviewer for this draft. For background on
Gen-ART, please see the FAQ at
<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Please resolve these comments along with any other Last Call comments
you may receive.

Document: draft-ietf-bmwg-protection-meth-09.txt
    Methodology for benchmarking MPLS protection mechanisms
Reviewer: Joel M. Halpern
Review Date: 9-March-2012
IETF LC End Date: 20-March-2012
IESG Telechat date: N/A



Summary: This document is almost ready for publication as an
Informational RFC.

Major issues:
    I find the approach of the definitions section (Section 3) unclear:
it says "This document also uses existing terminology defined in other
BMWG work." and follows this with examples, but gives neither a
complete list of terms nor a complete list of documents.  If the
reader finds terms they do not know, they have no good indication as
to what document(s) they should read to repair the gap.

   The description of the General Reference Topology (section 4) seems
unclear to me.  The text starts out discussing, and the diagram
explicitly shows, a Traffic Generator and a Traffic Analyzer.  In the
diagram these are two disparate devices, connected to routers R1 and
R5 respectively.  So far, so good.  However, the text then talks about
"the Tester" as being made up of the Traffic Generator and the Traffic
Analyzer, and describes "the Tester" as being directly connected to
the Device Under Test.  It is exceedingly unclear whether this is
supposed to mean that the full collection of routers R1-R6 is the
device under test, or whether R1, R5, or some other specific router is
the device under test.

    Could an effort be made to reword section 5.7?  First it says "one
or more traffic streams".  Then it says "16 flows".  Then it talks
about traffic spreading across some set of prefixes.  And the
description of the reason for not doing round-robin across the
prefixes leaves me even more confused about what one actually should
set up.

    Section 5.8, describing the capabilities of "the Tester", seems to
contradict section 4, where "the Tester is comprised of" the Traffic
Generator and the Traffic Analyzer.  The capabilities listed in
section 5.8 go well beyond that.

    The 8 scenarios shown in section 6 all have Mid-Point PLRs, as far
as I can tell.  Section 7 says that the test it describes can be
applied to all 8 cases from section 6.  But it then carefully
describes cases of Headend, Mid-Point, and Egress PLR, even though no
examples of the first or third have been shown.  Thus I do not see
how, for example, as described in section 7.1, one can select a
scenario from section 6 and then establish a headend PLR.

   This reviewer would like to verify that the test procedures
described produce a meaningful value for items like Failover Packet
Loss and Failover Time.  Is there a specific reference for these,
since the actual calculations are not described here?
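
   For concreteness, the calculation I would have guessed is intended
is the usual loss-derived one, sketched below in Python purely as my
own illustration; the function and variable names are mine, not the
draft's, and the draft should state explicitly whether this is what is
meant.

    # Hypothetical sketch of a loss-derived failover calculation; this
    # is my assumption, not text from the draft.
    def failover_packet_loss(tx_packets: int, rx_packets: int) -> int:
        """Packets offered by the Traffic Generator but never seen by
        the Traffic Analyzer across the failover event."""
        return tx_packets - rx_packets

    def failover_time_from_loss(packets_lost: int,
                                offered_rate_pps: float) -> float:
        """Failover time in seconds, assuming a constant offered rate."""
        return packets_lost / offered_rate_pps

    # Example: 1500 packets lost at an offered load of 30000 packets
    # per second implies a failover time of 0.05 s (50 ms).
    lost = failover_packet_loss(tx_packets=31500, rx_packets=30000)
    print(failover_time_from_loss(lost, offered_rate_pps=30000.0))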






    Finding the definition of the Failover Time calculation methods
hidden in the reporting format (section 8) was quite surprising.
Given that these are important definitions for the meaning of the
tests, they should occur before the test descriptions, not in the
reporting format.

Minor issues:


    As noted by id-nits, section 3 references TERM-ID as a document
defining terminology, but there is no such ID in the list of
references.  And why do the section headers for 5.1, 5.2, and 5.6 also
have "[TERM-ID]"?  Note that even if those section headers are defined
terms, it is stylistically unusual to put the reference into the
section header.  (It almost looks like "TERM-ID" was a marker for
things which still needed a proper reference.)

    In section 5.1 a set of example failure events is listed.  It is
unclear whether these are the events to be tested for, or just "some"
events.  In addition, it is unclear why there is inconsistency in the
coverage of the descriptions of the failures.  The three different
monitoring methods are mentioned explicitly with the Interface
Shutdown failures, but are not even mentioned for the other failures.
And then, while most of the failures list a local or remote side, the
last two failures do not indicate a side.  Why?

    Some of the abbreviations in section 6 are unclear.  For example,
since there is no real provider, it is not clear which router(s) are
meant by PE as distinct from P routers.  Also, while I am familiar
with Layer3 VPN, I am not familiar with the usage "Layer2 VC".
Further, given that VPNs have different label usages, I suspect that
both "Layer3 VPN" and "Layer2 VC" are insufficiently specific to match
to a label stack of a specific size.

    As an example of the above confusion, in the figure in section
6.1.2, the number of labels in the Layer3 VPN from the PE to the P
router is described as going from 2 to 3 upon failure.  The PE->P link
in the diagram is R1->R2, which is upstream of the failure.  So the
number of labels on that link won't change.  The number of labels on
the R6->R3 P->PE link (assuming I have properly guessed what PE is)
does go from 2 to 3.  But the lines refer simply to PE-P.

    Similarly, while I suspect that the numbers are accurate, it is
very hard to map the pre-failure label counts to the diagrams in a way
that explains the difference between the numbers in sections
6.1.1/6.1.2 and 6.1.3 and onward.  Assuming PE-PE traffic is HE-TE
traffic, the internal topology should not affect the label count on
that.  So you probably mean something else.  But I don't know what.
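
    To make my own assumption explicit (this is my reading of ordinary
facility-backup behaviour, not something taken from the draft): for a
Layer3 VPN carried over a protected TE LSP I would expect label stacks
roughly like the following, which is why the 2 -> 3 change should show
up on the bypass segment rather than on a link upstream of the
failure.

    # Assumed label stacks for a Layer3 VPN over a protected TE LSP;
    # purely illustrative, not taken from the draft.
    labels_upstream_of_plr = ["TE LSP label", "VPN label"]
    # 2 labels on links upstream of the PLR, before and after failure.

    labels_on_bypass_segment = ["bypass label", "TE LSP label", "VPN label"]
    # 3 labels, but only on the backup path after the PLR switches over.

    print(len(labels_upstream_of_plr), len(labels_on_bypass_segment))  # 2 3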






    It is unclear what section 7 means by "Select an overlay
technology (e.g. IGP, VPN, or VC)."  Please clarify.

    Why is section 7.1.3 (determining tailend performance) included in
the document, when no test cases include tailend failure?

    Is it an issue that the timestamp-based method for determining the
failover time will, on average, overestimate the failover time by one
inter-packet interval?  (Based on assuming that the failure and the
recovery are each uniformly distributed across the inter-packet
interval.)
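
    (A quick numerical check of that claim, written as my own
illustration rather than anything taken from the draft: assume packets
are offered at a perfectly constant rate and timestamped on receipt,
and let the failure and recovery instants fall at random within the
inter-packet interval.  The measured gap between the last packet
received before the failure and the first packet received after
recovery then exceeds the true outage by roughly one interval on
average.)

    # Monte Carlo sketch of the overestimate; my illustration only.
    import random

    T = 1.0  # inter-packet interval (arbitrary units)
    overestimates = []
    for _ in range(100_000):
        t_fail = random.uniform(100 * T, 101 * T)    # failure instant
        true_outage = random.uniform(5 * T, 10 * T)  # actual failover time
        t_recover = t_fail + true_outage             # recovery instant

        # Last packet delivered before the failure and first packet
        # delivered after recovery, with packets sent every T seconds.
        last_rx_before = T * (t_fail // T)
        first_rx_after = T * (-(-t_recover // T))
        measured = first_rx_after - last_rx_before   # timestamp-based result

        overestimates.append(measured - true_outage)

    # Prints a value close to T, i.e. one inter-packet interval.
    print(sum(overestimates) / len(overestimates))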





Nits/editorial comments:


    Section 7.1.2 item A refers to 9 scenarios from section 6.  There
are only 8.

    Section 7.4 decides to leave out the number of scenarios from
section 6, leading to a surprising, but otherwise probably
meaningless, difference in wording.