Benchmarking Methodology for Network Security Device Performance
draft-ietf-bmwg-ngfw-performance-13

Discuss: Lars Eggert, Murray Kucherawy, Roman Danyliw, Éric Vyncke

Yes: Warren Kumari

No Objection: Alvaro Retana, Erik Kline, Martin Duke, Zaheduzzaman Sarker

No Record: Andrew Alston, Francesca Palombini, John Scudder, Paul Wouters, Robert Wilton
Summary: Has 4 DISCUSSes. Has enough positions to pass once DISCUSS positions are resolved.

Lars Eggert Discuss

Discuss (2022-02-03)
This document needs TSV and ART people to help with straightening out a lot of
issues related to TCP, TLS, and H1/2/3. Large parts of the document don't
correctly reflect the complex realities of what "HTTP" is these days (i.e.,
that we have H1 and H2 over either TCP or TLS, and H3 over only QUIC.) The
document is also giving unnecessarily detailed behavioral descriptions of TCP
and its parameters, while at the same time not being detailed enough about TLS,
H2 and esp. QUIC/H3. It feels like this started out as an H1/TCP document that
was then incompletely extended to H2/H3.

Section 4.3.1.1. , paragraph 2, discuss:
>    The TCP stack SHOULD use a congestion control algorithm at client and
>    server endpoints.  The IPv4 and IPv6 Maximum Segment Size (MSS)
>    SHOULD be set to 1460 bytes and 1440 bytes respectively and a TX and
>    RX initial receive windows of 64 KByte.  Client initial congestion
>    window SHOULD NOT exceed 10 times the MSS.  Delayed ACKs are
>    permitted and the maximum client delayed ACK SHOULD NOT exceed 10
>    times the MSS before a forced ACK.  Up to three retries SHOULD be
>    allowed before a timeout event is declared.  All traffic MUST set the
>    TCP PSH flag to high.  The source port range SHOULD be in the range
>    of 1024 - 65535.  Internal timeout SHOULD be dynamically scalable per
>    RFC 793.  The client SHOULD initiate and close TCP connections.  The
>    TCP connection MUST be initiated via a TCP three-way handshake (SYN,
>    SYN/ACK, ACK), and it MUST be closed via either a TCP three-way close
>    (FIN, FIN/ACK, ACK), or a TCP four-way close (FIN, ACK, FIN, ACK).

There are a lot of requirements in here that are either no-ops ("SHOULD use a
congestion control algorithm"), nonsensical ("maximum client delayed ACK SHOULD
NOT exceed 10 times the MSS") or under the sole control of the stack. This
needs to be reviewed and corrected by someone who understands TCP.
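The point about parameters being under the sole control of the stack is easy to illustrate: on Linux, only a few of the quoted knobs are settable per-socket, while initial congestion window, delayed-ACK behavior, and retry counts are kernel policy. A minimal sketch (Linux-specific socket options; the 1460-byte value is taken from the quoted text):

```python
import socket

# A test client can set a handful of the draft's parameters per-socket;
# the rest (initial cwnd, delayed-ACK thresholds, retransmission limits)
# are kernel policy and not reachable through the sockets API.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Clamp the MSS to the draft's 1460-byte IPv4 value (the kernel may
# still adjust it downward based on path MTU).
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, 1460)

# The congestion control algorithm is selectable per-socket on Linux.
TCP_CONGESTION = getattr(socket, "TCP_CONGESTION", 13)  # Linux-only option
s.setsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, b"cubic")

# Read back the algorithm actually in effect (NUL-padded name).
algo = s.getsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, 16)
s.close()
```

This also shows why "SHOULD use a congestion control algorithm" is a no-op: there is no portable way to run a TCP socket without one.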
Comment (2022-02-03)
Section 1. , paragraph 2, comment:
>    18 years have passed since IETF recommended test methodology and
>    terminology for firewalls initially ([RFC3511]).  The requirements
>    for network security element performance and effectiveness have
>    increased tremendously since then.  In the eighteen years since

These sentences don't age well - rephrase without talking about particular
years?

Section 4.3.2.3. , paragraph 2, comment:
>    The server pool for HTTP SHOULD listen on TCP port 80 and emulate the
>    same HTTP version (HTTP 1.1 or HTTP/2 or HTTP/3) and settings chosen
>    by the client (emulated web browser).  The Server MUST advertise

An H3 server will not listen on TCP port 80. In general, the document needs to
be checked for the implicit assumption that HTTP uses TCP; there is text
throughout that is nonsensical for H3 (like this example).

Section 6.3. , paragraph 6, comment:
>       The average number of successfully established TCP connections per
>       second between hosts across the DUT/SUT, or between hosts and the
>       DUT/SUT.  The TCP connection MUST be initiated via a TCP three-way
>       handshake (SYN, SYN/ACK, ACK).  Then the TCP session data is sent.
>       The TCP session MUST be closed via either a TCP three-way close
>       (FIN, FIN/ACK, ACK), or a TCP four-way close (FIN, ACK, FIN, ACK),
>       and MUST NOT by RST.

This prohibits TCP Fast Open, why? Also, wouldn't it be enough to say that the
connection must not be abnormally reset, rather than describing the acceptable
TCP packet sequences? Those are not the only possible sequences, cf. loss and
reordering.
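The criterion "not abnormally reset" is simple to state as a check, independent of the exact segment sequence. A sketch with a hypothetical helper (flag strings follow the quoted text; this is not from the draft):

```python
def close_is_graceful(flags_seen):
    """Classify a connection teardown from the TCP flags observed on it.

    Graceful: no RST anywhere, and a FIN from each side (>= 2 FINs total).
    This accepts both the three-way and four-way closes the draft lists,
    plus reordered or retransmitted variants that it does not.
    """
    if any("RST" in f for f in flags_seen):
        return False
    return sum("FIN" in f for f in flags_seen) >= 2

# Three-way close from the draft's text:
assert close_is_graceful(["FIN", "FIN/ACK", "ACK"])
# Abortive close:
assert not close_is_graceful(["FIN", "RST"])
```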

Section 6.3. , paragraph 6, comment:
>       The average number of successfully completed transactions per
>       second.  For a particular transaction to be considered successful,
>       all data MUST have been transferred in its entirety.  In case of
>       HTTP(S) transactions, it MUST have a valid status code (200 OK),
>       and the appropriate FIN, FIN/ACK sequence MUST have been
>       completed.

H3 doesn't do FIN/ACK, etc. See above.

Section 7.1.3.4. , paragraph 4, comment:
>    a.  Number of failed application transactions (receiving any HTTP
>        response code other than 200 OK) MUST be less than 0.001% (1 out
>        of 100,000 transactions) of total attempted transactions.
>
>    b.  Number of Terminated TCP connections due to unexpected TCP RST
>        sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
>        connections) of total initiated TCP connections.

Why is a 0.001% failure rate deemed acceptable? (Also elsewhere.)

Section 7.2.1. , paragraph 2, comment:
>    Using HTTP traffic, determine the sustainable TCP connection
>    establishment rate supported by the DUT/SUT under different
>    throughput load conditions.

H3 doesn't do TCP.

Section 7.2.3.2. , paragraph 9, comment:
>    The client SHOULD negotiate HTTP and close the connection with FIN
>    immediately after completion of one transaction.  In each test
>    iteration, client MUST send GET request requesting a fixed HTTP
>    response object size.

H3 doesn't do TCP FIN.

Section 7.2.3.3. , paragraph 6, comment:
>    c.  During the sustain phase, traffic SHOULD be forwarded at a
>        constant rate (considered as a constant rate if any deviation of
>        traffic forwarding rate is less than 5%).

What does this mean? How would traffic NOT be forwarded at a constant rate?
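One plausible reading (my assumption; the draft does not define it) is deviation of per-interval forwarding rates relative to their mean over the sustain phase:

```python
def max_rate_deviation(samples):
    """Largest relative deviation of per-interval forwarding-rate
    samples from their mean over the sustain phase."""
    mean = sum(samples) / len(samples)
    return max(abs(s - mean) / mean for s in samples)

# Rates within +/-5% of the mean would pass the "constant rate" test:
assert max_rate_deviation([980.0, 1000.0, 1020.0]) <= 0.05
```

Even under this reading, the text would need to say what the rate is measured relative to and over what interval.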

Section 7.2.3.3. , paragraph 5, comment:
>    d.  Concurrent TCP connections MUST be constant during steady state
>        and any deviation of concurrent TCP connections SHOULD be less
>        than 10%. This confirms the DUT opens and closes TCP connections
>        at approximately the same rate.

What does it mean for a TCP connection to be constant?

Section 7.4.1. , paragraph 4, comment:
>    Scenario 1: The client MUST negotiate HTTP and close the connection
>    with FIN immediately after completion of a single transaction (GET
>    and RESPONSE).

H3 sessions don't send TCP FINs. (Also elsewhere.)

Section 7.7. , paragraph 1, comment:
> 7.7.  HTTPS Throughput

Is this HTTPS as in H1, H2 or H3? All of the above?

Found terminology that should be reviewed for inclusivity; see
https://www.rfc-editor.org/part2/#inclusive_language for background and more
guidance:

 * Term "dummy"; alternatives might be "placeholder", "sample", "stand-in",
   "substitute".

Thanks to Matt Joras for their General Area Review Team (Gen-ART) review
(https://mailarchive.ietf.org/arch/msg/gen-art/NUycZt5uKAZejOvCr6tdi_7SvPA).

-------------------------------------------------------------------------------
All comments below are about very minor potential issues that you may choose to
address in some way - or ignore - as you see fit. Some were flagged by
automated tools (via https://github.com/larseggert/ietf-reviewtool), so there
will likely be some false positives. There is no need to let me know what you
did with these suggestions.

Section 4.1. , paragraph 8, nit:
>  actively inspected by the DUT/SUT. Also "Fail-Open" behavior MUST be disable
>                                     ^^^^
A comma may be missing after the conjunctive/linking adverb "Also".

Section 4.2. , paragraph 9, nit:
> security vendors implement ACL decision making.) The configured ACL MUST NOT
>                                ^^^^^^^^^^^^^^^
The noun "decision-making" (= the process of deciding something) is spelled
with a hyphen.

Section 4.2.1. , paragraph 1, nit:
>  the MSS. Delayed ACKs are permitted and the maximum client delayed ACK SHOUL
>                                     ^^^^
Use a comma before "and" if it connects two independent clauses (unless they
are closely connected and short).

Section 4.3.1.3. , paragraph 3, nit:
>  the MSS. Delayed ACKs are permitted and the maximum server delayed ACK MUST
>                                     ^^^^
Use a comma before "and" if it connects two independent clauses (unless they
are closely connected and short).

Section 4.3.1.3. , paragraph 4, nit:
> IPv6 with a ratio identical to the clients distribution ratio. Note: The IAN
>                                    ^^^^^^^
An apostrophe may be missing.

Section 4.3.3.1. , paragraph 2, nit:
> S throughput performance test with smallest object size. 3. Ensure that any
>                                    ^^^^^^^^
A determiner may be missing.

Section 6.1. , paragraph 19, nit:
> sion with a more specific Kbit/s in parenthesis. * Time to First Byte (TTFB)
>                                  ^^^^^^^^^^^^^^
Did you mean "in parentheses"? "parenthesis" is the singular.

Section 7.5.3. , paragraph 2, nit:
> s and key strengths as well as forward looking stronger keys. Specific test
>                                ^^^^^^^^^^^^^^^
This word is normally spelled with a hyphen.

Section 7.5.4.2. , paragraph 3, nit:
> SHOULD NOT be reported, if the above mentioned KPI (especially inspected thro
>                                ^^^^^^^^^^^^^^^
The adjective "above-mentioned" is spelled with a hyphen.

Section 7.6.1. , paragraph 4, nit:
> s and key strengths as well as forward looking stronger keys. Specific test
>                                ^^^^^^^^^^^^^^^
This word is normally spelled with a hyphen.

Section 7.9.3.4. , paragraph 1, nit:
> * Accuracy of DUT/SUT statistics in term of vulnerabilities reporting A.2. T
>                                  ^^^^^^^^^^
Did you mean the commonly used phrase "in terms of"?

Section 7.9.4. , paragraph 2, nit:
> tected attack traffic MUST be dropped and the session SHOULD be reset A.3.2.
>                                      ^^^^
Use a comma before "and" if it connects two independent clauses (unless they
are closely connected and short).

Murray Kucherawy Discuss

Discuss (2022-02-02)
I may be wandering into unfamiliar territory here, i.e., how benchmarking specs are typically written, but this is sufficiently confusing that I'd like to discuss it.

I note that RFC 3511, which this document obsoletes, didn't cite RFC 2119 (BCP 14) but rather defined those same key words on its own.  Then it used SHOULD rather liberally, in a way that seems kind of peculiar to me (especially compared to the text of Section 6 of RFC 2119).  Do any of them matter to the outcome of the benchmark being constructed or executed?  If so and they would spoil the test, shouldn't they be MUSTs?  If not, why include them?  Or in the alternative, why might I, as someone setting up a test, legitimately do something contrary to the SHOULD in each case (which "SHOULD" expressly permits)?

This document does cite BCP 14 directly, and then seems to take that curious pattern to the next level.  Among the 130+ SHOULDs in here, I'm particularly confused by stuff like this in Section 4.3.1:

   This section specifies which parameters SHOULD be considered while
   configuring clients using test equipment.  

I have no idea what this means to the test.  If I've simply thought about these parameters, have I met the burden here?

This in Section 4.3.1.1 ("TCP Stack Attributes") seems an odd thing to have to stipulate:

  The client SHOULD initiate and close TCP connections.

Then Section 7.1.3, which contains subsections about each of the test parameters for the benchmark described in Section 7.1, consists of this text:

   In this section, the benchmarking test specific parameters SHOULD be
   defined.

As I read it, this is a self-referential SHOULD about this document!  I'm very confused.  This happens again in Section 7.2.3, 7.3.3, etc., up to 7.9.3, and even Appendix A.3.  I think in each case you just want:

   This section defines test-specific parameters for this benchmark.
Comment (2022-02-02)
Nits not yet mentioned by others:

Section 4.2:

* "... users SHOULD configure their device ..." -- s/device/devices/ (unless all users share one device)

Section 6.3:

* "The value SHOULD be expressed in millisecond." -- s/millisecond/milliseconds/

Roman Danyliw Discuss

Discuss (2022-02-02)
** A key element of successfully running the throughput tests described in Section 7 appears to be ensuring how to configure the device under test.  Section 4.2 helpfully specifies feature sets with recommended configurations.  However, there appear to be elements of under-specification given the level of detail specified with normative language.  Specifically:

-- Section 4.2.1 seems unspecified regarding all the capabilities in Table 1 and 2.  The discussion around vulnerabilities (CVEs) does not appear to be relevant to configuration of anti-spyware, anti-virus, anti-botnet, DLP, and DDOS.  

-- Recognizing that NGFW, NGIPS and UTM are not precise product categories, offerings in this space commonly rely on statistical models or AI techniques (e.g., machine learning) to improve detection rates and reduce false positives to realize the capabilities in Table 1 and 2.  If even possible, how should these settings be tuned?  How should the training period be handled when describing the steps of the test regime (e.g., in Section 4.3.4? Section 7.2.4?)

** Appendix A.  The KPI measures don’t seem precise here – CVEs are unlikely to be the measure seen on the wire.  Wouldn’t it be exploits associated with a particular vulnerability (that’s numbered via CVE)?  There can be a one-to-many relationship between the vulnerability and exploits (e.g., multiple products affected by a single CVE); or the multiple implementations of an exploit.
Comment (2022-02-02)
** Abstract.  NGFW, NGIPS and UTM are fuzzy product categories.  Do you want to define them somewhere?  How do they differ in functionality?  UTM is mentioned here, but not again in the document.

** Section 1.
The requirements
   for network security element performance and effectiveness have
   increased tremendously since then.  In the eighteen years since
   [RFC3511] was published, recommending test methodology and
   terminology for firewalls, requirements and expectations for network
   security elements has increased tremendously.  

I don’t follow how the intent of these two sentences is different.  Given the other text in this paragraph, these sentences also appear redundant.

** Section 3. Per “This document focuses on advanced, …”, what makes a testing method “advanced”?

** Section 4.2.  The abstract said that testing for NGFW, NGIPS and UTM would be provided.  This section is silent on UTM.

** Section 4.2.  Should the following additional features be noted as a feature of NGFWs and NGIPS (Tables 1 and 2)?

-- reconnaissance detection

-- geolocation or network topology-based classification/filtering

** Section 4.2. Thanks for the capability taxonomies described here.  Should it be noted that “Table 1 and 2 are approximate taxonomies of features commonly found in currently deployed NGFW and NGIPS.  The features provided by specific implementations may be named differently and not necessarily have configuration settings that align to the taxonomy.”

** Table 1.  Is there a reason that DPI and Anti-Evasion (listed in Table 2 for NGIPS) are not mentioned here (for NGFW)?  I don’t see how many (all?) of the features listed as RECOMMENDED could be done without them.

** Table 3.  For Anti-Botnet, should it read “detects and blocks”?

** Table 3.  For Web Filtering, is this scoped to be classification and threat detection by URI?

** Table 3.  This table is missing a description for DoS from Table 1 and DPI and Anti-Evasion from Table 2.

** Section 4.2.  Per “Logging SHOULD be enabled.”  How does this “SHOULD” align with “logging and reporting” being a RECOMMENDED in Table 1 and 2?  Same question on “Application Identification and Control SHOULD be configured”

** Section 4.3.1.1.  Why is such well-formed and well-behaved traffic assumed for a security device?

** Section 4.3.1.  What cipher suites should be used for TLS 1.3 based tests? The text is prescriptive for TLS 1.2 (using a RECOMMENDED) but simply restates all of those registered by RFC8446.

** Section 9.  Given that the configurations of these tests will include working exploits, it would be helpful to provide a reminder on the need to control access to them.

** Section A.1.
In parallel, the CVEs will be sent to the DUT/SUT as
   encrypted and as well as clear text payload formats using a traffic
   generator.  

This guidance doesn’t seem appropriate for all cases.  Couldn’t the vulnerability being exploited involve a payload in the unencrypted part or a phase in the communication exchange before a secure channel is negotiated?

** Editorial nits
-- Section 1.  Editorial. s/for firewalls initially/for firewalls/

-- Section 5.  Typo. s/as test equipments/as test equipment/

Éric Vyncke Discuss

Discuss (2022-02-03)
Thank you for the work put into this document. 

Please find below one blocking DISCUSS point (probably easy to address but really important) and some non-blocking COMMENT points (but replies would be appreciated even if only for my own education).

Thanks to Toerless for his deep and detailed IoT directorate review, I have seen as well that the authors are engaged in email discussions on this review:
https://datatracker.ietf.org/doc/review-ietf-bmwg-ngfw-performance-13-iotdir-telechat-eckert-2022-01-30/

Special thanks to Al Morton for the shepherd's write-up including the section about the WG consensus. 

I hope that this helps to improve the document,

Regards,

-éric


# DISCUSS

As noted in https://www.ietf.org/blog/handling-iesg-ballot-positions/, a DISCUSS ballot is a request to have a discussion on the following topics

The document obsoletes RFC 3511, but it does not include any performance testing of IP fragmentation (which RFC 3511 did), which is AFAIK still a performance/evasion problem. What was the reason for this lack of IP fragmentation support? At the bare minimum, there should be some text explaining why IP fragmentation can be ignored.
Comment (2022-02-03)
One generic comment about the lack of testing with IPv6 extension headers as they usually reduce the performance (even for NGFW/NGIPS). There should be some words about this lack of testing.

## Section 4.1

Please always use "ARP/ND" rather than "ARP".

## Section 4.2

Any reason why "SSL" is used rather than "TLS"?

Suggest to replace "IP subnet" by "IP prefix".

## Section 4.3.1.2 (and other sections)

"non routable Private IPv4 address ranges": unsure what this means.  RFC 1918 addresses are routable albeit private; or is it about link-local IPv4 addresses?  169.254.0.0/16 or 198.18.0.0/15?

## Section 4.3.1.3

Suggest to add a date information (e.g., 2022) in the sentence "The above ciphers and keys were those commonly used enterprise grade encryption cipher suites for TLS 1.2".

In "[RFC8446] defines the following cipher suites for use with TLS 1.3." is this about a SHOULD or a MUST?

## Section 6.1

In "Results SHOULD resemble a pyramid in how it is reported" I have no clue how a report could resemble a pyramid. Explanations/descriptions are welcome in the text.

## Section 7.8.4 (and other sections)

In "This test procedure MAY be repeated multiple times with different IP types (IPv4 only, IPv6 only and IPv4 and IPv6 mixed traffic distribution)" should it be a "SHOULD" rather than a "MAY" ?

Warren Kumari Yes

Alvaro Retana No Objection

Comment (2022-02-02)
The datatracker should indicate that this document replaces draft-balarajah-bmwg-ngfw-performance.

Erik Kline No Objection

Comment (2022-02-02)
[throughout; comment]

* In all sections describing Configuration Parameters, both Client and
  Server "IP address range" is mentioned in the singular.  I think
  appropriate s/range/ranges/ might make sense.

Martin Duke No Objection

Comment (2022-02-01)
(4.3.1.3) RFC8446 is not the reference for HTTP/2.

(4.3.1.1), (4.3.2.1) Is there a reason that delayed ack limits are defined only in terms of number of bytes, instead of time? What if an HTTP request (for example) ends, and the delayed ack is very long? Note also that the specification for delayed acks limits it to every two packets, although in the real world many endpoints use much higher thresholds. [It's OK to keep it at 10*MSS if you prefer].

(4.3.3.1) What is a "TCP persistence stack"?

Zaheduzzaman Sarker No Objection

Comment (2022-02-03)
Thanks for the efforts on this specification. I have been part of writing two testcase documents for real-time congestion control algorithms and understand getting things in a reasonable shape is hard. 

I have a similar observation as Murray and Éric when it comes to obsoleting the previous specification.  Hence I support their DISCUSSes.

Some more comments/questions below -

  * Section 5: What is the "packet loss latency" metric? Where is it defined? How do I measure it?

  * The traffic profile is missing in all the benchmark tests, which is a MUST to have. If this is intentional, then a rationale needs to be added.

  * Section 7.3 and 7.7 : The HTTP throughput will look different not only because of object size but also because of how often the requests are sent. If the requests are sent all at once, the resulting throughput may look like a long file download; if they are sparse, they will look like small downloads in a sparse timeline. Here, it is not clear to me what the intention is. Again the traffic profile is missing, and I am starting to think that Section 7.1.3.3 might be part of Section 7.1.3.2.

  * Section 7.4 and 7.8 : I have a similar view as per my comment on Section 7.3. It is not clear to me that only object size matters here for the latency.

Andrew Alston No Record

Francesca Palombini No Record

John Scudder No Record

Paul Wouters No Record

Robert Wilton No Record

(Benjamin Kaduk; former steering group member) (was Discuss) No Objection

No Objection (2022-03-19)
[Updated to remove my Discuss point, as my colleagues have convinced me
that my concern was not reasonable]

I support Roman's Discuss (which you have already begun to resolve, thank
you).

Perhaps it is time to retire the term "SSL" in favor of the current
protocol name, "TLS".

Section 4.1

   In some deployment scenarios, the network security devices (Device
   Under Test/System Under Test) are connected to routers and switches,
   which will reduce the number of entries in MAC or ARP tables of the
   Device Under Test/System Under Test (DUT/SUT).  If MAC or ARP tables
   have many entries, this may impact the actual DUT/SUT performance due
   to MAC and ARP/ND (Neighbor Discovery) table lookup processes.  This

I understand the motivation for benchmarking the maximum performance from
the device under controlled circumstances, but it also seems that if a
device really will exhibit degraded performance due to the number of
entries in its MAC/ARP table, that would be useful information to have.
Perhaps a remark about how future work could include repeating
benchmarking results with different numbers of other devices on the local
network segment is in order.

Section 4.2

   Table 1 and Table 2 below describe the RECOMMENDED and OPTIONAL sets
   of network security feature list for NGFW and NGIPS respectively.

I agree with the IoTdir reviewer that Certificate Validation should surely
be a recommended feature for NGFWs.  But see also the DISCUSS point.

    | SSL Inspection | DUT/SUT intercepts and decrypts inbound HTTPS  |
    |                | traffic between servers and clients.  Once the |
    |                | content inspection has been completed, DUT/SUT |
    |                | encrypts the HTTPS traffic with ciphers and    |
    |                | keys used by the clients and servers.          |

This description could stand to be more clear, especially in light of the
fundamental differences between TLS 1.2 and TLS 1.3.
First, the description starts off with "intercepts and decrypts" and then
goes on to say that once inspection is over, the DUT/SUT "encrypts the
HTTPS traffic".  Does this mean that the DUT/SUT specifically needs to
re-encrypt after decrypting, or is it permissible to retain the original
ciphertext and just relay that ciphertext onward?
Second, in TLS 1.3, it is by construction impossible for a single set of
traffic encryption keys to be shared by all three of client, server, and
DUT/SUT -- RSA key transport is forbidden and ephemeral key exchange is
required.  In order to perform content inspection, such a middlebox needs
to be able to impersonate the server to the client (i.e., holding a
certificate and private key that is trusted by the client and represents
the identity of the real server, which is expected to require specific
configuration on the client to enable) and complete separate TLS
connections to client and to server.  In this scenario the middlebox must
remain as a "machine in the middle" for the duration of the entire
connection and decrypt/reencrypt all content using the different keys for
the client/middlebox and middlebox/server connections.

   *  Geographical location filtering, and Application Identification
      and Control SHOULD be configured to trigger based on a site or
      application from the defined traffic mix.

Do we have a sense for how sensitive the performance results are going to
be with respect to the proportion of traffic that triggers these classes
of filtering/control?  Would it be appropriate to require that this
breakdown be included in the report?

Section 4.3.1.3

   validation.  Depending on test scenarios and selected HTTP version,
   HTTP header compression MAY be set to enable or disable.  This

I didn't think it was possible to fully disable header compression for
HTTP/2 and HTTP/3 (just to set the dynamic table size to zero).

   [RFC8446] defines the following cipher suites for use with TLS 1.3.
   [...]

TLS_AES_128_CCM_8_SHA256 is marked as Recommended=N in the registry; I
think we should indicate that there is little need to benchmark it except
for those special circumstances where the cipher is appropriate.
Even TLS_AES_128_CCM_SHA256 (with full-length authentication tag) is
mostly only going to be used in IoT environments and is likely not needed
for a target of "enterprise grade encryption cipher suites".
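This observation is easy to spot-check against a stock OpenSSL build, which in my experience (not guaranteed for every build or version) omits the CCM suites from its default TLS 1.3 list:

```python
import ssl

# Default client context. TLS 1.3 suite names all start with "TLS_",
# unlike OpenSSL's TLS 1.2 names (e.g. "ECDHE-RSA-AES128-GCM-SHA256"),
# so filtering on the prefix isolates the 1.3 suites.
ctx = ssl.create_default_context()
tls13 = [c["name"] for c in ctx.get_ciphers() if c["name"].startswith("TLS_")]

# The GCM/ChaCha20 suites are enabled by default; the CCM suites usually
# are not, consistent with their Recommended=N / IoT-niche status.
assert "TLS_AES_256_GCM_SHA384" in tls13
```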

Section 4.3.2.3

   server, TLS 1.2 or higher MUST be used with a maximum record size of
   16 KByte and MUST NOT use ticket resumption or session ID reuse.  The

Why is TLS resumption prohibited?
(As a technical matter, TLS 1.3 resumption uses a different mechanism than
the two TLS 1.2 resumption mechanisms, and it may be prudent to
specifically note whether TLS 1.3 resumption is also forbidden.)

   server SHALL serve a certificate to the client.  The HTTPS server
   MUST check host SNI information with the FQDN if SNI is in use.

What does "check host SNI information with the FQDN" mean?  Where is the
FQDN in question obtained from?  (In §4.3.3.1 we say that the proposed
(SNI) FQDN is compared to "the domain embedded in the certificate".  Note
that, of course, the certificate can contain more than one domain name,
e.g., via the now-quite-common use of subjectAltName.)

Section 6.1

       e.  Key test parameters

           *  Used cipher suites and keys

Do we really need to report the specific *keys* used (as opposed to
cryptographic parameters of the TLS connection like the group used for key
exchange, algorithm and key size of the server certificate, etc.)?

           *  Percentage of encrypted traffic and used cipher suites and
              keys (The RECOMMENDED ciphers and keys are defined in
              Section 4.3.1.3)

For what it's worth, trends in generic web traffic are rapidly converging
towards near-universal HTTPS usage.  I am not really sure that measuring
unencrypted traffic is going to be very interesting to many users (though
I concede that some will still be using it and find the corresponding
benchmarking results useful).

Section 7.3.3.2

   RECOMMENDED HTTP response object size: 1, 16, 64, 256 KByte, and
   mixed objects defined in Table 4.

With the explosion of video use on the modern Web, it might be worth
revisiting these recommended object sizes.  Is there likely to be value in
having very large objects for any of the tests?

Section 7.6.1, 7.7.1

   Test iterations MUST include common cipher suites and key strengths
   as well as forward looking stronger keys.  Specific test iterations

How/where would an implementor obtain more guidance on "common cipher suites
and key strengths" and "forward looking stronger keys"?  (With
understanding that this guidance will change over time and cannot be
permanently enshrined in this RFC-to-be.)

(Why does Section 7.8.1 not have similar language?)

Section 9

Hmm, I thought we typically had some language about how if the
benchmarking techniques specified in this document were used outside a
laboratory isolated test environment, security and other risks could arise
(e.g., due to DoS of nearby nodes/services).

Appendix A

I agree with Roman that the text around "CVEs" is imprecise and should be
talking about exploits that are identified by CVEs.