Benchmarking Methodology for Network Security Device Performance
draft-ietf-bmwg-ngfw-performance-15

Note: This ballot was opened for revision 13 and is now closed.

Warren Kumari
Yes
Éric Vyncke
(was Discuss) Yes
Comment (2022-09-12 for -14) Sent
Thank you for the work put into this document and for addressing my previous DISCUSS and my previous COMMENT. They are kept below only for archiving purposes.

Thanks to Toerless for his deep and detailed IoT directorate review; I have also seen that the authors are engaged in email discussions on this review:
https://datatracker.ietf.org/doc/review-ietf-bmwg-ngfw-performance-13-iotdir-telechat-eckert-2022-01-30/

Special thanks to Al Morton for the shepherd's write-up including the section about the WG consensus. 

I hope that this helps to improve the document,

Regards,

-éric


# previous DISCUSS for archiving

As noted in https://www.ietf.org/blog/handling-iesg-ballot-positions/, a DISCUSS ballot is a request to have a discussion on the following topics

The document obsoletes RFC 3511, but it does not include any performance testing of IP fragmentation (which RFC 3511 did), which is AFAIK still a performance/evasion problem. What was the reason for this lack of IP fragmentation support? At a bare minimum, there should be some text explaining why IP fragmentation can be ignored.

# previous COMMENT for archiving

One generic comment about the lack of testing with IPv6 extension headers, as they usually reduce performance (even for NGFW/NGIPS). There should be some words about this lack of testing.

## Section 4.1

Please always use "ARP/ND" rather than "ARP".

## Section 4.2

Any reason why "SSL" is used rather than "TLS"?

Suggest replacing "IP subnet" with "IP prefix".

## Section 4.3.1.2 (and other sections)

"non routable Private IPv4 address ranges": unsure what this is. RFC 1918 addresses are routable albeit private; or is this about link-local IPv4 addresses, i.e., 169.254.0.0/16, or 198.18.0.0/15?

## Section 4.3.1.3

Suggest to add a date information (e.g., 2022) in the sentence "The above ciphers and keys were those commonly used enterprise grade encryption cipher suites for TLS 1.2".

In "[RFC8446] defines the following cipher suites for use with TLS 1.3." is this about a SHOULD or a MUST ?

## Section 6.1

In "Results SHOULD resemble a pyramid in how it is reported" I have no clue how a report could resemble a pyramid. Explanations/descriptions are welcome in the text.

## Section 7.8.4 (and other sections)

In "This test procedure MAY be repeated multiple times with different IP types (IPv4 only, IPv6 only and IPv4 and IPv6 mixed traffic distribution)" should it be a "SHOULD" rather than a "MAY" ?
Erik Kline
No Objection
Comment (2022-02-02 for -13) Not sent
[throughout; comment]

* In all sections describing Configuration Parameters, both Client and
  Server "IP address range" is mentioned in the singular.  I think
  appropriate s/range/ranges/ might make sense.
Murray Kucherawy
(was Discuss) No Objection
Comment (2022-10-27) Sent
Thanks for handling my DISCUSS about all the normative language used here.

Nits not yet mentioned by others:

Section 4.2:

* "... users SHOULD configure their device ..." -- s/device/devices/ (unless all users share one device)

Section 6.3:

* "The value SHOULD be expressed in millisecond." -- s/millisecond/milliseconds/
Roman Danyliw
(was Discuss) No Objection
Comment (2022-10-22) Sent for earlier
Thanks for addressing my DISCUSS and COMMENT feedback.
Zaheduzzaman Sarker
No Objection
Comment (2022-02-03 for -13) Sent
Thanks for the efforts on this specification. I have been part of writing two testcase documents for real-time congestion control algorithms and understand that getting things into reasonable shape is hard.

I have a similar observation to Murray's and Éric's when it comes to obsoleting the previous specification; hence I support their DISCUSSes.

Some more comments/questions below -

  * Section 5: What is the "packet loss latency" metric? Where is it defined? How do I measure it?

  * The traffic profile is missing in all the benchmark tests, which is a MUST to have. If this is intentional, then a rationale needs to be added.

  * Section 7.3 and 7.7: The HTTP throughput will look different not only because of object size but also because of how often the requests are sent. If the requests are sent all at once, the resulting throughput may look like a long file download; if they are sparse, it will look like small downloads in a sparse timeline. Here, it is not clear to me what the intention is. Again, the traffic profile is missing, and I am starting to think that Section 7.1.3.3 might be part of Section 7.1.3.2.

  * Section 7.4 and 7.8: I have a similar view as per my comment on Section 7.3. It is not clear to me that only object size matters here for the latency.
Alvaro Retana Former IESG member
No Objection
No Objection (2022-02-02 for -13) Sent
The datatracker should indicate that this document replaces draft-balarajah-bmwg-ngfw-performance.
Benjamin Kaduk Former IESG member
(was Discuss) No Objection
No Objection (2022-03-19 for -13) Sent for earlier
[Updated to remove my Discuss point, as my colleagues have convinced me
that my concern was not reasonable]

I support Roman's Discuss (which you have already begun to resolve, thank
you).

Perhaps it is time to retire the term "SSL" in favor of the current
protocol name, "TLS".

Section 4.1

   In some deployment scenarios, the network security devices (Device
   Under Test/System Under Test) are connected to routers and switches,
   which will reduce the number of entries in MAC or ARP tables of the
   Device Under Test/System Under Test (DUT/SUT).  If MAC or ARP tables
   have many entries, this may impact the actual DUT/SUT performance due
   to MAC and ARP/ND (Neighbor Discovery) table lookup processes.  This

I understand the motivation for benchmarking the maximum performance from
the device under controlled circumstances, but it also seems that if a
device really will exhibit degraded performance due to the number of
entries in its MAC/ARP table, that would be useful information to have.
Perhaps a remark about how future work could include repeating
benchmarking results with different numbers of other devices on the local
network segment is in order.

Section 4.2

   Table 1 and Table 2 below describe the RECOMMENDED and OPTIONAL sets
   of network security feature list for NGFW and NGIPS respectively.

I agree with the IoTdir reviewer that Certificate Validation should surely
be a recommended feature for NGFWs.  But see also the DISCUSS point.

    | SSL Inspection | DUT/SUT intercepts and decrypts inbound HTTPS  |
    |                | traffic between servers and clients.  Once the |
    |                | content inspection has been completed, DUT/SUT |
    |                | encrypts the HTTPS traffic with ciphers and    |
    |                | keys used by the clients and servers.          |

This description could stand to be more clear, especially in light of the
fundamental differences between TLS 1.2 and TLS 1.3.
First, the description starts off with "intercepts and decrypts" and then
goes on to say that once inspection is over, the DUT/SUT "encrypts the
HTTPS traffic".  Does this mean that the DUT/SUT specifically needs to
re-encrypt after decrypting, or is it permissible to retain the original
ciphertext and just relay that ciphertext onward?
Second, in TLS 1.3, it is by construction impossible for a single set of
traffic encryption keys to be shared by all three of client, server, and
DUT/SUT -- RSA key transport is forbidden and ephemeral key exchange is
required.  In order to perform content inspection, such a middlebox needs
to be able to impersonate the server to the client (i.e., holding a
certificate and private key that is trusted by the client and represents
the identity of the real server, which is expected to require specific
configuration on the client to enable) and complete separate TLS
connections to client and to server.  In this scenario the middlebox must
remain as a "machine in the middle" for the duration of the entire
connection and decrypt/reencrypt all content using the different keys for
the client/middlebox and middlebox/server connections.
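The split-connection model described above can be sketched in Python's `ssl` module; the function names and the certificate/key handling are illustrative, not from the draft:

```python
import ssl

# Under TLS 1.3 an inspecting middlebox cannot share one set of traffic
# keys with both endpoints, so it terminates two independent TLS
# connections: one toward the real server, one toward the client.

def middlebox_client_ctx():
    # Context for the middlebox's own connection to the real server.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx

def middlebox_server_ctx(certfile, keyfile):
    # Context for impersonating the origin toward the client; the client
    # must be explicitly configured to trust this certificate.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    ctx.load_cert_chain(certfile, keyfile)
    return ctx
```

Content then has to be decrypted on one context's connection and re-encrypted on the other for the lifetime of the session.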

   *  Geographical location filtering, and Application Identification
      and Control SHOULD be configured to trigger based on a site or
      application from the defined traffic mix.

Do we have a sense for how sensitive the performance results are going to
be with respect to the proportion of traffic that triggers these classes
of filtering/control?  Would it be appropriate to require that this
breakdown be included in the report?

Section 4.3.1.3

   validation.  Depending on test scenarios and selected HTTP version,
   HTTP header compression MAY be set to enable or disable.  This

I didn't think it was possible to fully disable header compression for
HTTP/2 and HTTP/3 (just to set the dynamic table size to zero).

   [RFC8446] defines the following cipher suites for use with TLS 1.3.
   [...]

TLS_AES_128_CCM_8_SHA256 is marked as Recommended=N in the registry; I
think we should indicate that there is little need to benchmark it except
for those special circumstances where the cipher is appropriate.
Even TLS_AES_128_CCM_SHA256 (with full-length authentication tag) is
mostly only going to be used in IoT environments and is likely not needed
for a target of "enterprise grade encryption cipher suites".
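For reference, the five RFC 8446 suites with the IANA "Recommended" flag the point above refers to, as a harness might encode them when picking a default benchmark set (flag values as of this review; confirm against the live IANA registry):

```python
# RFC 8446 TLS 1.3 cipher suites, mapped to the IANA "Recommended" flag.
TLS13_SUITES = {
    "TLS_AES_128_GCM_SHA256": True,
    "TLS_AES_256_GCM_SHA384": True,
    "TLS_CHACHA20_POLY1305_SHA256": True,
    "TLS_AES_128_CCM_SHA256": True,    # Recommended, but mostly IoT-oriented
    "TLS_AES_128_CCM_8_SHA256": False, # Recommended=N: truncated auth tag
}

# Default set for "enterprise grade" benchmarking per the comment above:
# recommended suites, excluding the IoT-oriented CCM variants.
ENTERPRISE_SET = [s for s, rec in TLS13_SUITES.items()
                  if rec and "CCM" not in s]
```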

Section 4.3.2.3

   server, TLS 1.2 or higher MUST be used with a maximum record size of
   16 KByte and MUST NOT use ticket resumption or session ID reuse.  The

Why is TLS resumption prohibited?
(As a technical matter, TLS 1.3 resumption uses a different mechanism than
the two TLS 1.2 resumption mechanisms, and it may be prudent to
specifically note whether TLS 1.3 resumption is also forbidden.)
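To make the question concrete: an emulated HTTPS server built on Python/OpenSSL would have to disable the TLS 1.2 and TLS 1.3 mechanisms separately, which is why it matters whether the MUST NOT covers both (a sketch, not from the draft):

```python
import ssl

# Disabling TLS resumption on an emulated HTTPS server. Whether the
# draft's "MUST NOT use ticket resumption or session ID reuse" also
# covers TLS 1.3 PSK resumption is the open question above.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.options |= ssl.OP_NO_TICKET  # no TLS 1.2 session tickets (RFC 5077)
ctx.num_tickets = 0              # no TLS 1.3 NewSessionTicket messages (Python 3.8+)
```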

   server SHALL serve a certificate to the client.  The HTTPS server
   MUST check host SNI information with the FQDN if SNI is in use.

What does "check host SNI information with the FQDN" mean?  Where is the
FQDN in question obtained from?  (In §4.3.3.1 we say that the proposed
(SNI) FQDN is compared to "the domain embedded in the certificate".  Note
that, of course, the certificate can contain more than one domain name,
e.g., via the now-quite-common use of subjectAltName.)

Section 6.1

       e.  Key test parameters

           *  Used cipher suites and keys

Do we really need to report the specific *keys* used (as opposed to
cryptographic parameters of the TLS connection like the group used for key
exchange, algorithm and key size of the server certificate, etc.)?

           *  Percentage of encrypted traffic and used cipher suites and
              keys (The RECOMMENDED ciphers and keys are defined in
              Section 4.3.1.3)

For what it's worth, trends in generic web traffic are rapidly converging
towards near-universal HTTPS usage.  I am not really sure that measuring
unencrypted traffic is going to be very interesting to many users (though
I concede that some will still be using it and find the corresponding
benchmarking results useful).

Section 7.3.3.2

   RECOMMENDED HTTP response object size: 1, 16, 64, 256 KByte, and
   mixed objects defined in Table 4.

With the explosion of video use on the modern Web, it might be worth
revisiting these recommended object sizes.  Is there likely to be value in
having very large objects for any of the tests?

Section 7.6.1, 7.7.1

   Test iterations MUST include common cipher suites and key strengths
   as well as forward looking stronger keys.  Specific test iterations

How/where would an implementor obtain more guidance on "common cipher suites
and key strengths" and "forward looking stronger keys"?  (With
understanding that this guidance will change over time and cannot be
permanently enshrined in this RFC-to-be.)

(Why does Section 7.8.1 not have similar language?)

Section 9

Hmm, I thought we typically had some language about how if the
benchmarking techniques specified in this document were used outside a
laboratory isolated test environment, security and other risks could arise
(e.g., due to DoS of nearby nodes/services).

Appendix A

I agree with Roman that the text around "CVEs" is imprecise and should be
talking about exploits that are identified by CVEs.
Lars Eggert Former IESG member
(was Discuss) No Objection
No Objection (2022-02-03 for -14) Sent for earlier
Section 1. , paragraph 2, comment:
>    18 years have passed since IETF recommended test methodology and
>    terminology for firewalls initially ([RFC3511]).  The requirements
>    for network security element performance and effectiveness have
>    increased tremendously since then.  In the eighteen years since

These sentences don't age well - rephrase without talking about particular
years?

Section 4.3.2.3. , paragraph 2, comment:
>    The server pool for HTTP SHOULD listen on TCP port 80 and emulate the
>    same HTTP version (HTTP 1.1 or HTTP/2 or HTTP/3) and settings chosen
>    by the client (emulated web browser).  The Server MUST advertise

An H3 server will not listen on TCP port 80. In general, the document needs to
be checked for the implicit assumption that HTTP uses TCP; there is text
throughout that is nonsensical for H3 (like this example).

Section 6.3. , paragraph 6, comment:
>       The average number of successfully established TCP connections per
>       second between hosts across the DUT/SUT, or between hosts and the
>       DUT/SUT.  The TCP connection MUST be initiated via a TCP three-way
>       handshake (SYN, SYN/ACK, ACK).  Then the TCP session data is sent.
>       The TCP session MUST be closed via either a TCP three-way close
>       (FIN, FIN/ACK, ACK), or a TCP four-way close (FIN, ACK, FIN, ACK),
>       and MUST NOT by RST.

This prohibits TCP Fast Open; why? Also, wouldn't it be enough to say that the
connection must not be abnormally reset, rather than describing the acceptable
TCP packet sequences? Those are not the only possible sequences, cf. loss and
reordering.
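The requirement in the quoted text (three-way handshake, session data, then a FIN-based close rather than RST) can be exercised with a loopback sketch like the following; it is illustrative only, and relies on the fact that a peer's graceful FIN is observed as an orderly EOF (`b""`) rather than a `ConnectionResetError`:

```python
import socket
import threading

def run_once():
    # Listener on an ephemeral loopback port.
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def server():
        conn, _ = srv.accept()
        conn.recv(1024)          # session data from the client
        conn.sendall(b"ok")
        conn.close()             # graceful close: FIN, not RST

    t = threading.Thread(target=server)
    t.start()

    cli = socket.create_connection(("127.0.0.1", port))  # three-way handshake
    cli.sendall(b"hello")
    reply = cli.recv(1024)
    eof = cli.recv(1024)         # b"" == orderly EOF from the peer's FIN
    cli.close()
    t.join()
    srv.close()
    return reply, eof
```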

Section 6.3. , paragraph 6, comment:
>       The average number of successfully completed transactions per
>       second.  For a particular transaction to be considered successful,
>       all data MUST have been transferred in its entirety.  In case of
>       HTTP(S) transactions, it MUST have a valid status code (200 OK),
>       and the appropriate FIN, FIN/ACK sequence MUST have been
>       completed.

H3 doesn't do FIN/ACK, etc. See above.

Section 7.1.3.4. , paragraph 4, comment:
>    a.  Number of failed application transactions (receiving any HTTP
>        response code other than 200 OK) MUST be less than 0.001% (1 out
>        of 100,000 transactions) of total attempted transactions.
>
>    b.  Number of Terminated TCP connections due to unexpected TCP RST
>        sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
>        connections) of total initiated TCP connections.

Why is a 0.001% failure rate deemed acceptable? (Also elsewhere.)
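Whatever the rationale, the criterion is at least easy to state exactly in integer arithmetic; a harness check might look like this (the helper name is mine):

```python
# "MUST be less than 0.001%" == strictly fewer than 1 failure per 100,000.
def meets_failure_criterion(failed, attempted):
    return failed * 100_000 < attempted

# Note the strict inequality: exactly 1 failure in 100,000 attempts
# does NOT meet the draft's criterion.
```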

Section 7.2.1. , paragraph 2, comment:
>    Using HTTP traffic, determine the sustainable TCP connection
>    establishment rate supported by the DUT/SUT under different
>    throughput load conditions.

H3 doesn't do TCP.

Section 7.2.3.2. , paragraph 9, comment:
>    The client SHOULD negotiate HTTP and close the connection with FIN
>    immediately after completion of one transaction.  In each test
>    iteration, client MUST send GET request requesting a fixed HTTP
>    response object size.

H3 doesn't do TCP FIN.

Section 7.2.3.3. , paragraph 6, comment:
>    c.  During the sustain phase, traffic SHOULD be forwarded at a
>        constant rate (considered as a constant rate if any deviation of
>        traffic forwarding rate is less than 5%).

What does this mean? How would traffic NOT be forwarded at a constant rate?

Section 7.2.3.3. , paragraph 5, comment:
>    d.  Concurrent TCP connections MUST be constant during steady state
>        and any deviation of concurrent TCP connections SHOULD be less
>        than 10%. This confirms the DUT opens and closes TCP connections
>        at approximately the same rate.

What does it mean for a TCP connection to be constant?
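One way to make both "considered constant" checks concrete; the reference point (here, the sustain-phase mean) is my interpretation, since the draft does not pin it down:

```python
# Section 7.2.3.3: forwarding rate is "constant" if deviation < 5% (0.05);
# concurrent TCP connections are "constant" if deviation < 10% (0.10).
def is_constant(samples, limit):
    mean = sum(samples) / len(samples)
    return all(abs(s - mean) / mean < limit for s in samples)
```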

Section 7.4.1. , paragraph 4, comment:
>    Scenario 1: The client MUST negotiate HTTP and close the connection
>    with FIN immediately after completion of a single transaction (GET
>    and RESPONSE).

H3 sessions don't send TCP FINs. (Also elsewhere.)

Section 7.7. , paragraph 1, comment:
> 7.7.  HTTPS Throughput

Is this HTTPS as in H1, H2 or H3? All of the above?

Found terminology that should be reviewed for inclusivity; see
https://www.rfc-editor.org/part2/#inclusive_language for background and more
guidance:

 * Term "dummy"; alternatives might be "placeholder", "sample", "stand-in",
   "substitute".

Thanks to Matt Joras for their General Area Review Team (Gen-ART) review
(https://mailarchive.ietf.org/arch/msg/gen-art/NUycZt5uKAZejOvCr6tdi_7SvPA).

-------------------------------------------------------------------------------
All comments below are about very minor potential issues that you may choose to
address in some way - or ignore - as you see fit. Some were flagged by
automated tools (via https://github.com/larseggert/ietf-reviewtool), so there
will likely be some false positives. There is no need to let me know what you
did with these suggestions.

Section 4.1. , paragraph 8, nit:
>  actively inspected by the DUT/SUT. Also "Fail-Open" behavior MUST be disable
>                                     ^^^^
A comma may be missing after the conjunctive/linking adverb "Also".

Section 4.2. , paragraph 9, nit:
> security vendors implement ACL decision making.) The configured ACL MUST NOT
>                                ^^^^^^^^^^^^^^^
The noun "decision-making" (= the process of deciding something) is spelled
with a hyphen.

Section 4.2.1. , paragraph 1, nit:
>  the MSS. Delayed ACKs are permitted and the maximum client delayed ACK SHOUL
>                                     ^^^^
Use a comma before "and" if it connects two independent clauses (unless they
are closely connected and short).

Section 4.3.1.3. , paragraph 3, nit:
>  the MSS. Delayed ACKs are permitted and the maximum server delayed ACK MUST
>                                     ^^^^
Use a comma before "and" if it connects two independent clauses (unless they
are closely connected and short).

Section 4.3.1.3. , paragraph 4, nit:
> IPv6 with a ratio identical to the clients distribution ratio. Note: The IAN
>                                    ^^^^^^^
An apostrophe may be missing.

Section 4.3.3.1. , paragraph 2, nit:
> S throughput performance test with smallest object size. 3. Ensure that any
>                                    ^^^^^^^^
A determiner may be missing.

Section 6.1. , paragraph 19, nit:
> sion with a more specific Kbit/s in parenthesis. * Time to First Byte (TTFB)
>                                  ^^^^^^^^^^^^^^
Did you mean "in parentheses"? "parenthesis" is the singular.

Section 7.5.3. , paragraph 2, nit:
> s and key strengths as well as forward looking stronger keys. Specific test
>                                ^^^^^^^^^^^^^^^
This word is normally spelled with a hyphen.

Section 7.5.4.2. , paragraph 3, nit:
> SHOULD NOT be reported, if the above mentioned KPI (especially inspected thro
>                                ^^^^^^^^^^^^^^^
The adjective "above-mentioned" is spelled with a hyphen.

Section 7.6.1. , paragraph 4, nit:
> s and key strengths as well as forward looking stronger keys. Specific test
>                                ^^^^^^^^^^^^^^^
This word is normally spelled with a hyphen.

Section 7.9.3.4. , paragraph 1, nit:
> * Accuracy of DUT/SUT statistics in term of vulnerabilities reporting A.2. T
>                                  ^^^^^^^^^^
Did you mean the commonly used phrase "in terms of"?

Section 7.9.4. , paragraph 2, nit:
> tected attack traffic MUST be dropped and the session SHOULD be reset A.3.2.
>                                      ^^^^
Use a comma before "and" if it connects two independent clauses (unless they
are closely connected and short).
Martin Duke Former IESG member
No Objection
No Objection (2022-02-01 for -13) Sent
(4.3.1.3) RFC8446 is not the reference for HTTP/2.

(4.3.1.1), (4.3.2.1) Is there a reason that delayed ACK limits are defined only in terms of number of bytes, instead of time? What if an HTTP request (for example) ends, and the delayed ACK is very long? Note also that the specification for delayed ACKs limits them to every two packets, although in the real world many endpoints use much higher thresholds. [It's OK to keep it at 10*MSS if you prefer].

(4.3.3.1) What is a "TCP persistence stack"?