Benchmarking Methodology for Software-Defined Networking (SDN) Controller Performance (RFC 8456)

Note: This ballot was opened for revision 08 and is now closed.

Ignas Bagdonas No Objection

Comment (2018-04-19 for -08)
The document seems to assume the OpenFlow dataplane abstraction model, which is only one of the possible models; the practical applicability of such a model to anything beyond experimental deployments is a completely separate question outside the scope of this document. The methodology tends to apply to a broader set of centrally controlled systems, and not only to data plane operations; the document therefore seems to provide at least something practically usable for benchmarking such central control systems. Possibly the document could state the assumptions it makes about the overall model to which the defined methodology applies.

A nit: s/Khasanov Boris/Boris Khasanov, unless Boris himself would insist otherwise.

Alissa Cooper No Objection

Comment (2018-04-18 for -08)
Regarding this text:

"The test SHOULD use one of the test setups described in section 3.1
or section 3.2 of this document in combination with Appendix A."

Appendix A is titled "Example Test Topology." If it's really an example, then it seems like it should not be normatively required. So either the appendix needs to be re-named, or the normative language needs to be removed. And if it is normatively required, why is it in an appendix? The document would also benefit from describing what the exception cases to the SHOULD are (I guess if the tester doesn't care about having comparable results with other tests?).

(Spencer Dawkins) No Objection

Comment (2018-04-16 for -08)
I have a few questions, at the No Objection level ... do the right thing, of course.

I apologize for attempting to play amateur statistician, but it seems to me that this text

4.7. Test Repeatability

To increase the confidence in measured result, it is recommended
that each test SHOULD be repeated a minimum of 10 times.

is recommending a heuristic, when I'd think that you'd want to repeat a test until the results seem to be converging on some measure of central tendency, given some acceptable margin of error, and this text

Procedure:

1. Establish the network connections between controller and network
nodes.
2. Query the controller for the discovered network topology
information and compare it with the deployed network topology
information.
3. If the comparison is successful, increase the number of nodes by 1
and repeat the trial.
If the comparison is unsuccessful, decrease the number of nodes by
1 and repeat the trial.
4. Continue the trial until the comparison of step 3 is successful.
5. Record the number of nodes for the last trial (Ns) where the
topology comparison was successful.

seems to beg for a binary search, especially if you're testing whether a controller can support a large number of controllers ...
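The binary search suggested here could look roughly like the sketch below. It is not taken from the draft: `topology_matches` is a hypothetical callback standing in for one full trial (deploy n nodes, query the controller, compare topologies), and the search assumes that a controller which handles n nodes also handles every smaller count.

```python
from typing import Callable


def max_supported_nodes(topology_matches: Callable[[int], bool],
                        low: int, high: int) -> int:
    """Return the largest n in [low, high] for which topology_matches(n) is True.

    Assumes monotonic behaviour: success at n implies success at every
    smaller node count. Returns low - 1 if even `low` nodes fail.
    """
    best = low - 1
    while low <= high:
        mid = (low + high) // 2
        if topology_matches(mid):   # one trial with `mid` nodes
            best = mid
            low = mid + 1           # success: try a larger topology
        else:
            high = mid - 1          # failure: try a smaller topology
    return best


if __name__ == "__main__":
    # Stand-in trial: pretend the controller copes with up to 700 nodes.
    print(max_supported_nodes(lambda n: n <= 700, 1, 1024))
```

With an upper bound of 1024 nodes this needs at most 11 trials, whereas stepping by one node per trial can need hundreds.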

This text

Reference Test Setup:

The test SHOULD use one of the test setups described in section 3.1
or section 3.2 of this document in combination with Appendix A.

or some variation is repeated about 16 times, and I'm not understanding why this is using BCP 14 language, and if BCP 14 language is the right thing to do, I'm not understanding why it's always SHOULD.

I get the part that this will help compare results, if two researchers are running the same tests. Is there more to the requirement than that?

In this text,

Procedure:

1. Perform the listed tests and launch a DoS attack towards
controller while the trial is running.

Note:

DoS attacks can be launched on one of the following interfaces.

a. Northbound (e.g., Query for flow entries continuously on
northbound interface)
b. Management (e.g., Ping requests to controller's management
interface)
c. Southbound (e.g., TCP SYN messages on southbound interface)

is there a canonical description of "DoS attack" that researchers should be using, in order to compare results? These are just examples, right?

Is the choice of

[OpenFlow Switch Specification]  ONF,"OpenFlow Switch Specification"
Version 1.4.0 (Wire Protocol 0x05), October 14, 2013.

intentional? I'm googling that the current version of OpenFlow is 1.5.1, from 2015.

Comment (2018-04-19 for -08)
In the Abstract:

This document defines the methodologies for benchmarking control
plane performance of SDN controllers.

Why "the" methodologies?   That seems more authoritative than is
appropriate in an Informational document.

Why do we need the test setup diagrams in both the terminology draft
and this one?  It seems like there is some excess redundancy, here.

In Section 4.1, how can we even have a topology with just one
network device?  This "at least 1" seems too low.  Similarly, how
would TP1 and TP2 *not* be connected to the same node if there is
only one device?

Thank you for adding consideration to key distribution in Section
4.4, as noted by the secdir review.  But insisting on having key
distribution done prior to testing gives the impression that keys
are distributed once and updated never, which has questionable
security properties.  Perhaps there is value in doing some testing
while rekeying is in progress?

I agree with others that the statistical methodology is not clearly
justified, such as the sample size of 10 in Section 4.7 (with no
consideration for sample relative variance), use of sample vs.
population variance, etc.

It seems like the measurements being described sometimes start the
timer at an event at a network element and other times start the
timer when a message enters the SDN controller itself (similarly for
outgoing messages), which seems to include a different treatment of
propagation delays in the network, for different tests.  Assuming
these differences were made by conscious choice, it might be nice to
describe why the network propagation is/is not included for any
given measurement.

It looks like the term "Nrxn" is introduced implicitly and the
reader is supposed to infer that the 'n' represents a counter, with
Nrx1 corresponding to the first measurement, Nrx2 the second, etc.
It's probably worth mentioning this explicitly, for all fields that
are measured on a per-trial/counter basis.

I'm not sure that the end condition for the test in Section 5.2.2
makes sense.

It seems like the test in Section 5.2.3 should not allow flexibility
in "unique source and/or destination address" and rather should
specify exactly what happens.

In Section 5.3.1, only considering 2% of asynchronous messages as
invalid implies a preconception about what might be the reason for
such invalid messages, but that assumption might not hold in the
case of an active attack, which may be somewhat different from the
pure DoS scenario considered in the following section.

Section 5.4.1 says "with incremental sequence number and source
incrementing for each packet sent?  This could be more clear.
It also is a little jarring to refer to "test traffic generator TP2"
when TP2 is just receiving traffic and not generating it.

Appendix B.3 indicates that plain TCP or TLS can be used for
communications between switch and controller.  It seems like this
would be a highly relevant test parameter to report with the results
for the tests described in this document, since TLS would introduce

The figure in Section B.4.5 leaves me a little confused as to what
is being measured, if the SDN Application is depicted as just
spontaneously installing a flow at some time vaguely related to
traffic generation but not dependent on or triggered by the traffic
generation.

Suresh Krishnan No Objection

Comment (2018-04-19 for -08)
I share Ignas's concern about this being too tightly associated with the OpenFlow model.

* Section 4.1
The test cases SHOULD use Leaf-Spine topology with at least 1
Network Device in the topology for benchmarking.

How is it even possible to have a leaf-spine topology with one Network Device?

Mirja Kühlewind No Objection

Comment (2018-04-18 for -08)
Editorial comments:

1) sdn-controller-benchmark-term should probably be referenced in the intro (instead of the abstract).

2) Is the test setup needed in both docs (this and sdn-controller-benchmark-term) or would a reference to sdn-controller-benchmark-term maybe be sufficient?

3) Appendix A.1 should probably also be moved to sdn-controller-benchmark-term

(Eric Rescorla) No Objection

Comment (2018-04-17 for -08)
Rich version of this review at:
https://mozphab-ietf.devsvcdev.mozaws.net/D3948

>      reported.
>
>   4.7. Test Repeatability
>
>      To increase the confidence in measured result, it is recommended
>      that each test SHOULD be repeated a minimum of 10 times.

Nit: you might be happier with "RECOMMENDED that each test be repeated
..."

Also, where does 10 come from? Generally, the number of trials you
need depends on the variance of each trial.
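One way to make the trial count depend on the observed variance is to keep repeating until the confidence interval of the mean is narrow enough. The sketch below is an illustration only, not something the draft specifies: `run_trial` is a hypothetical callback returning one measurement, the 5% relative margin is an arbitrary choice, and z = 1.96 uses a normal approximation for a 95% interval (a Student-t quantile would be more accurate for small samples).

```python
import math
import statistics
from typing import Callable, List, Tuple


def repeat_until_stable(run_trial: Callable[[], float],
                        rel_margin: float = 0.05,
                        min_trials: int = 10,
                        max_trials: int = 1000,
                        z: float = 1.96) -> Tuple[float, float, int]:
    """Repeat trials until z * s / sqrt(n) <= rel_margin * |mean|.

    Returns (mean, half_width, number_of_trials). Stops at max_trials
    even if the target margin has not been reached.
    """
    if max_trials < 2:
        raise ValueError("max_trials must be at least 2")
    samples: List[float] = []
    while len(samples) < max_trials:
        samples.append(run_trial())
        n = len(samples)
        if n < max(min_trials, 2):
            continue                      # too few samples to judge yet
        mean = statistics.mean(samples)
        half_width = z * statistics.stdev(samples) / math.sqrt(n)
        if half_width <= rel_margin * abs(mean):
            break                         # interval is narrow enough
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width, len(samples)
```

A controller with very stable results stops at the minimum of 10 trials, while a noisy one is automatically measured for longer.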

>      Test Reporting
>
>      Each test has a reporting format that contains some global and
>      identical reporting components, and some individual components that
>      are specific to individual tests. The following test configuration
>      parameters and controller settings parameters MUST be reflected in

This is an odd MUST, as it's not required for interop.

>      5. Stop the trial when the discovered topology information matches
>        the deployed network topology, or when the discovered topology
>        information return the same details for 3 consecutive queries.
>      6. Record the time last discovery message (Tmn) sent to controller
>        from the forwarding plane test emulator interface (I1) when the
>        trial completed successfully. (e.g., the topology matches).

How large is the TD usually? How much does 3 seconds compare to that?

>                                                   Total Trials
>
>                                              SUM[SQUAREOF(Tri-TDm)]
>      Topology Discovery Time Variance (TDv)  ----------------------
>                                                  Total Trials -1
>

You probably don't need to specify individual formulas for mean and
variance. However, you probably do want to explain why you are using
the n-1 sample variance formula.
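For context, dividing by n-1 (Bessel's correction) is the usual choice when the trials are treated as a sample of the controller's behaviour rather than the whole population, because dividing by n underestimates the variance on average. The sketch below mirrors the quoted TDv formula and compares it with Python's standard library; the function name and the sample values are made up for illustration.

```python
import statistics
from typing import Sequence


def topology_discovery_variance(times: Sequence[float]) -> float:
    """TDv as quoted: sum of squared deviations from the mean / (trials - 1)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two trials")
    mean = sum(times) / n
    return sum((t - mean) ** 2 for t in times) / (n - 1)


if __name__ == "__main__":
    trial_times = [10.0, 12.0, 11.0, 13.0, 14.0]     # illustrative values only
    print(topology_discovery_variance(trial_times))  # n-1 divisor, as in the draft
    print(statistics.variance(trial_times))          # sample variance (n-1 divisor)
    print(statistics.pvariance(trial_times))         # population variance (n divisor)
```

`statistics.variance` agrees with the quoted formula, while `statistics.pvariance` gives the smaller population figure.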

>
>   Measurement:
>
>                                              (R1-T1) + (R2-T2)..(Rn-Tn)
>      Asynchronous Message Processing Time Tr1 = -----------------------
>                                                          Nrx

Incidentally, this formula is the same as \sum_i{R_i} - \sum_i{T_i}
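Written out with the symbols from the quoted formula, and keeping its division by Nrx, the regrouping looks like this:

```latex
\[
  \mathrm{Tr1}
  = \frac{\sum_{i=1}^{n} (R_i - T_i)}{N_{rx}}
  = \frac{\sum_{i=1}^{n} R_i \;-\; \sum_{i=1}^{n} T_i}{N_{rx}}
\]
```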

>      messages transmitted to the controller.
>
>      If this test is repeated with varying number of nodes with same
>      topology, the results SHOULD be reported in the form of a graph. The
>      X coordinate SHOULD be the Number of nodes (N), the Y coordinate
>      SHOULD be the average Asynchronous Message Processing Time.

This is an odd metric because an implementation which handled overload
by dropping every other message would look better than one which
handled overload by queuing.
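A small deterministic simulation illustrates the concern. It is a toy model, not part of the draft: it assumes a single-threaded controller with a fixed service time, messages arriving faster than they can be served, and an average taken only over the messages that receive a response.

```python
from typing import List, Optional


def simulate(n_msgs: int, arrival_gap: float, service_time: float,
             drop_every_other: bool) -> List[Optional[float]]:
    """Return per-message latency (R_i - T_i); None marks a dropped message."""
    latencies: List[Optional[float]] = []
    server_free_at = 0.0
    for i in range(n_msgs):
        sent = i * arrival_gap
        if drop_every_other and i % 2 == 1:
            latencies.append(None)            # controller discards this message
            continue
        start = max(sent, server_free_at)     # wait if the controller is busy
        server_free_at = start + service_time
        latencies.append(server_free_at - sent)
    return latencies


def average_processing_time(latencies: List[Optional[float]]) -> float:
    """Average over answered messages only, i.e. sum(R_i - T_i) / Nrx."""
    answered = [x for x in latencies if x is not None]
    return sum(answered) / len(answered)


if __name__ == "__main__":
    queued = simulate(100, arrival_gap=1.0, service_time=2.0, drop_every_other=False)
    dropped = simulate(100, arrival_gap=1.0, service_time=2.0, drop_every_other=True)
    print(average_processing_time(queued), average_processing_time(dropped))
```

In this model the queuing controller answers every message but reports a far higher average, while the dropping controller loses half the messages and still looks much faster, which suggests reporting the loss count alongside the average.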

Alvaro Retana No Objection

I again share Martin's concerns about the use of the word "standard" in this document's abstract and introduction.
Could the use of this word confuse the reader?