Benchmarking Methodology for Software-Defined Networking (SDN) Controller Performance
RFC 8456
Note: This ballot was opened for revision 08 and is now closed.
Warren Kumari Yes
Alvaro Retana No Objection
(Adam Roach; former steering group member) No Objection
I again share Martin's concerns about the use of the word "standard" in this document's abstract and introduction.
(Alissa Cooper; former steering group member) No Objection
Regarding this text: "The test SHOULD use one of the test setups described in section 3.1 or section 3.2 of this document in combination with Appendix A." Appendix A is titled "Example Test Topology." If it's really an example, then it seems like it should not be normatively required. So either the appendix needs to be re-named, or the normative language needs to be removed. And if it is normatively required, why is it in an appendix? The document would also benefit from describing what the exception cases to the SHOULD are (I guess if the tester doesn't care about having comparable results with other tests?).
(Benjamin Kaduk; former steering group member) No Objection
In the Abstract:
This document defines the methodologies for benchmarking control plane performance of SDN controllers.
Why "the" methodologies? That seems more authoritative than is appropriate in an Informational document.
Why do we need the test setup diagrams in both the terminology draft and this one? It seems like there is some excess redundancy here.
In Section 4.1, how can we even have a topology with just one network device? This "at least 1" seems too low. Similarly, how would TP1 and TP2 *not* be connected to the same node if there is only one device?
Thank you for adding consideration to key distribution in Section 4.4, as noted by the secdir review. But insisting on having key distribution done prior to testing gives the impression that keys are distributed once and updated never, which has questionable security properties. Perhaps there is value in doing some testing while rekeying is in progress?
I agree with others that the statistical methodology is not clearly justified, such as the sample size of 10 in Section 4.7 (with no consideration for sample relative variance), use of sample vs. population variance, etc.
It seems like the measurements being described sometimes start the timer at an event at a network element and other times start the timer when a message enters the SDN controller itself (similarly for outgoing messages), which seems to include a different treatment of propagation delays in the network, for different tests. Assuming these differences were made by conscious choice, it might be nice to describe why the network propagation is/is not included for any given measurement.
It looks like the term "Nrxn" is introduced implicitly and the reader is supposed to infer that the 'n' represents a counter, with Nrx1 corresponding to the first measurement, Nrx2 the second, etc. It's probably worth mentioning this explicitly, for all fields that are measured on a per-trial/counter basis.
I'm not sure that the end condition for the test in Section 5.2.2 makes sense.
It seems like the test in Section 5.2.3 should not allow flexibility in "unique source and/or destination address" and rather should specify exactly what happens.
In Section 5.3.1, only considering 2% of asynchronous messages as invalid implies a preconception about what might be the reason for such invalid messages, but that assumption might not hold in the case of an active attack, which may be somewhat different from the pure DoS scenario considered in the following section.
Section 5.4.1 says "with incremental sequence number and source address" -- are both the sequence number and source address incrementing for each packet sent? This could be more clear. It also is a little jarring to refer to "test traffic generator TP2" when TP2 is just receiving traffic and not generating it.
Appendix B.3 indicates that plain TCP or TLS can be used for communications between switch and controller. It seems like this would be a highly relevant test parameter to report with the results for the tests described in this document, since TLS would introduce additional overhead to be quantified!
The figure in Section B.4.5 leaves me a little confused as to what is being measured, if the SDN Application is depicted as just spontaneously installing a flow at some time vaguely related to traffic generation but not dependent on or triggered by the traffic generation.
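On the sample versus population variance question raised in this comment, a minimal Python sketch may help show how far the two estimators differ at the ten-trial sample size of Section 4.7. The trial values below are invented purely for illustration; they are not measurements and do not come from the document.

```python
import statistics

# Hypothetical per-trial results (for example, discovery times in ms).
# These ten numbers are made up for illustration only.
trials = [10.0, 12.0, 11.0, 13.0, 9.0, 10.0, 12.0, 11.0, 10.0, 12.0]

n = len(trials)
mean = statistics.mean(trials)

# Sample variance: divides by (n - 1). This is the form the draft's
# "Total Trials - 1" denominator corresponds to.
sample_var = statistics.variance(trials)

# Population variance: divides by n.
pop_var = statistics.pvariance(trials)

print(f"n={n} mean={mean:.2f}")
print(f"sample variance (n-1): {sample_var:.4f}")
print(f"population variance (n): {pop_var:.4f}")
print(f"ratio: {sample_var / pop_var:.4f}")
```

With only ten trials the n-1 form is larger than the n form by a factor of 10/9, which is why it matters that the document says which one it intends.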
(Deborah Brungard; former steering group member) No Objection
(Eric Rescorla; former steering group member) No Objection
Rich version of this review at: https://mozphab-ietf.devsvcdev.mozaws.net/D3948
COMMENTS
> reported.
>
> 4.7. Test Repeatability
>
> To increase the confidence in measured result, it is recommended
> that each test SHOULD be repeated a minimum of 10 times.
Nit: you might be happier with "RECOMMENDED that each test be repeated ..." Also, where does 10 come from? Generally, the number of trials you need depends on the variance of each trial.
> Test Reporting
>
> Each test has a reporting format that contains some global and
> identical reporting components, and some individual components that
> are specific to individual tests. The following test configuration
> parameters and controller settings parameters MUST be reflected in
This is an odd MUST, as it's not required for interop.
> 5. Stop the trial when the discovered topology information matches
> the deployed network topology, or when the discovered topology
> information return the same details for 3 consecutive queries.
> 6. Record the time last discovery message (Tmn) sent to controller
> from the forwarding plane test emulator interface (I1) when the
> trial completed successfully. (e.g., the topology matches).
How large is the TD usually? How much does 3 seconds compare to that?
>                                             Total Trials
>
>                                          SUM[SQUAREOF(Tri-TDm)]
> Topology Discovery Time Variance (TDv)   ----------------------
>                                             Total Trials -1
>
You probably don't need to specify individual formulas for mean and variance. However, you probably do want to explain why you are using the n-1 sample variance formula.
>
> Measurement:
>
>                                             (R1-T1) + (R2-T2)..(Rn-Tn)
> Asynchronous Message Processing Time Tr1 =  -----------------------
>                                                        Nrx
Incidentally, this formula is the same as \sum_i{R_i} - \sum_i{T_i}
> messages transmitted to the controller.
>
> If this test is repeated with varying number of nodes with same
> topology, the results SHOULD be reported in the form of a graph. The
> X coordinate SHOULD be the Number of nodes (N), the Y coordinate
> SHOULD be the average Asynchronous Message Processing Time.
This is an odd metric because an implementation which handled overload by dropping every other message would look better than one which handled overload by queuing.
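The concern about dropped messages can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not a description of any real controller: messages are sent at a fixed rate, each takes a fixed service time, and the two hypothetical controllers (`queueing_controller`, `dropping_controller`) differ only in how they behave when busy. The metric is the one quoted above, the sum of (Rn-Tn) over the responses received, divided by Nrx.

```python
def queueing_controller(send_times, service_time):
    """Answer every message in order; a message waits while the controller is busy."""
    free_at = 0.0
    pairs = []  # (T, R) for each answered message
    for t in send_times:
        start = max(t, free_at)
        free_at = start + service_time
        pairs.append((t, free_at))
    return pairs


def dropping_controller(send_times, service_time):
    """Silently drop any message that arrives while the controller is busy."""
    free_at = 0.0
    pairs = []
    for t in send_times:
        if t < free_at:
            continue  # dropped: never answered, so never counted in Nrx
        free_at = t + service_time
        pairs.append((t, free_at))
    return pairs


def avg_processing_time(pairs):
    """Sum of (R - T) over answered messages, divided by their count (Nrx)."""
    return sum(r - t for t, r in pairs) / len(pairs)


# Assumed overload scenario: one message per ms, 2 ms of work per message.
send_times = [float(i) for i in range(100)]
queued = queueing_controller(send_times, 2.0)
dropped = dropping_controller(send_times, 2.0)

print("queueing:", len(queued), "answered, avg", avg_processing_time(queued))
print("dropping:", len(dropped), "answered, avg", avg_processing_time(dropped))
```

In this toy scenario the dropping controller answers only half of the messages yet reports a far lower average, which suggests the number of messages received would need to be reported alongside the average for the metric to be comparable.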
(Ignas Bagdonas; former steering group member) No Objection
The document seems to assume the OpenFlow dataplane abstraction model, which is only one of the possible models; the practical applicability of that model to anything beyond experimental deployments is a completely separate question outside the scope of this document. The methodology tends to apply to a broader set of centrally controlled systems, and not only to data plane operations, so the document seems to establish at least something practically usable for benchmarking such systems. Possibly the document could state the assumptions it makes about the overall model to which the methodology applies. A nit: s/Khasanov Boris/Boris Khasanov/, unless Boris himself would insist otherwise.
(Martin Vigoureux; former steering group member) No Objection
Hello, I have the same question/comment as on the companion document: I wonder about the use of the term "standard" in the abstract in view of the intended status of the document (Informational). Could the use of this word confuse the reader?
(Mirja Kühlewind; former steering group member) No Objection
Editorial comments:
1) sdn-controller-benchmark-term should probably be referenced in the intro (instead of the abstract).
2) Is the test setup needed in both docs (this and sdn-controller-benchmark-term), or would a reference to sdn-controller-benchmark-term maybe be sufficient?
3) Appendix A.1 should probably also be moved to sdn-controller-benchmark-term.
(Spencer Dawkins; former steering group member) No Objection
I have a few questions, at the No Objection level ... do the right thing, of course.
I apologize for attempting to play amateur statistician, but it seems to me that this text
4.7. Test Repeatability
To increase the confidence in measured result, it is recommended
that each test SHOULD be repeated a minimum of 10 times.
is recommending a heuristic, when I'd think that you'd want to repeat a test until the results seem to be converging on some measure of central tendency, given some acceptable margin of error, and this text
Procedure:
1. Establish the network connections between controller and network
nodes.
2. Query the controller for the discovered network topology
information and compare it with the deployed network topology
information.
3. If the comparison is successful, increase the number of nodes by 1
and repeat the trial.
If the comparison is unsuccessful, decrease the number of nodes by
1 and repeat the trial.
4. Continue the trial until the comparison of step 3 is successful.
5. Record the number of nodes for the last trial (Ns) where the
topology comparison was successful.
seems to beg for a binary search, especially if you're testing whether a controller can support a large number of nodes ...
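A minimal Python sketch of the binary search suggested here follows. It is an illustration only: `topology_matches` is a hypothetical callback standing in for steps 1 and 2 of the quoted procedure (deploy n nodes, query the controller, compare topologies), and the sketch assumes that if the comparison succeeds for n nodes it also succeeds for every smaller count.

```python
from typing import Callable


def max_supported_nodes(topology_matches: Callable[[int], bool], start: int = 1) -> int:
    """Return the largest node count for which topology_matches() is True.

    Returns 0 if even a single node fails. `start` must be at least 1.
    Assumes success is monotonic in the node count.
    """
    if topology_matches(start):
        # Double the node count until the comparison first fails.
        lo, hi = start, start * 2
        while topology_matches(hi):
            lo, hi = hi, hi * 2
    else:
        lo, hi = 0, start

    # Invariant: lo is known good (or 0), hi is known bad.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if topology_matches(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Usage with a stand-in controller that copes with up to 137 nodes.
calls = []

def fake_controller(n: int) -> bool:
    calls.append(n)
    return n <= 137

ns = max_supported_nodes(fake_controller)
print("Ns =", ns, "after", len(calls), "trials")
```

Compared with stepping the node count by 1, this needs a number of trials that grows with the logarithm of Ns, at the cost of relying on the monotonicity assumption above.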
This text
Reference Test Setup:
The test SHOULD use one of the test setups described in section 3.1
or section 3.2 of this document in combination with Appendix A.
or some variation is repeated about 16 times, and I'm not understanding why this is using BCP 14 language, and if BCP 14 language is the right thing to do, I'm not understanding why it's always SHOULD.
I get the part that this will help compare results, if two researchers are running the same tests. Is there more to the requirement than that?
In this text,
Procedure:
1. Perform the listed tests and launch a DoS attack towards
controller while the trial is running.
Note:
DoS attacks can be launched on one of the following interfaces.
a. Northbound (e.g., Query for flow entries continuously on
northbound interface)
b. Management (e.g., Ping requests to controller's management
interface)
c. Southbound (e.g., TCP SYN messages on southbound interface)
is there a canonical description of "DoS attack" that researchers should be using, in order to compare results? These are just examples, right?
Is the choice of
[OpenFlow Switch Specification] ONF,"OpenFlow Switch Specification"
Version 1.4.0 (Wire Protocol 0x05), October 14, 2013.
intentional? I'm googling that the current version of OpenFlow is 1.5.1, from 2015.
(Suresh Krishnan; former steering group member) No Objection
I share Ignas's concern about this being too tightly associated with the OpenFlow model.
* Section 4.1
The test cases SHOULD use Leaf-Spine topology with at least 1 Network Device in the topology for benchmarking.
How is it even possible to have a leaf-spine topology with one Network Device?
(Terry Manderson; former steering group member) No Objection