
Telechat Review of draft-ietf-bmwg-ngfw-performance-13
review-ietf-bmwg-ngfw-performance-13-iotdir-telechat-eckert-2022-01-30-00

Request Review of draft-ietf-bmwg-ngfw-performance
Requested revision No specific revision (document currently at 15)
Type Telechat Review
Team Internet of Things Directorate (iotdir)
Deadline 2022-01-30
Requested 2022-01-18
Requested by Éric Vyncke
Authors Balamuhunthan Balarajah, Carsten Rossenhoevel, Brian Monkman
I-D last updated 2022-01-30
Completed reviews Secdir Early review of -00 by Kathleen Moriarty (diff)
Tsvart Last Call review of -12 by Tommy Pauly (diff)
Tsvart Telechat review of -13 by Tommy Pauly (diff)
Iotdir Telechat review of -13 by Toerless Eckert (diff)
Genart Telechat review of -13 by Matt Joras (diff)
Assignment Reviewer Toerless Eckert
State Completed
Request Telechat review on draft-ietf-bmwg-ngfw-performance by Internet of Things Directorate Assigned
Posted at https://mailarchive.ietf.org/arch/msg/iot-directorate/p_nx-4R8RHAT1dt1W_laupy32WE
Reviewed revision 13 (document currently at 15)
Result On the Right Track
Completed 2022-01-30
Reviewer: Toerless Eckert
Review result: On the Right Track

Summary:
Thanks a lot for this work. It's an immensely complex and important problem to
tackle. In my time I have only measured router traffic performance, and that
already was an infinite matrix. This looks to me like a problem that is some
order of infinity bigger.

Meaning: however nitpicking my review feedback may be about the document, I
think that the document in its existing form is already a great advancement in
measuring performance for these security devices, and in case of doubt it
should be progressed faster rather than slower, especially because in my
(limited) understanding of the market, many security device vendors will only
provide actual feedback once it is an RFC (a community that I think is overall
more conservative in adopting IETF work, with most not proactively engaging
during the draft stage).

But of course: feel free to improve the document with any of the
feedback/suggestions in my review that you feel are useful.

At a high level, I would most importantly suggest adding more explanations,
especially in an appropriate section, about those aspects known NOT to be
considered (but potentially important), so that adopters of the draft can
better put the applicability of the described tests into perspective for their
real-world situations.

Favorite pet topic: add a requirement to measure the DUT through a power meter
and report its power consumption, so we can start making sure that products
with lower power consumption see sales benefits when numbers from this
document are reported (see details inline).

Formal:
I chose to keep the whole document inline to make it easier for readers to vet
my comments without having to open a copy of the whole document in parallel.

The rest is inline - the email ends with the string EOF (I have seen some
email truncation happen).

Thanks!
    Toerless

---
Please fix the following nits - from https://www.ietf.org/tools/idnits
idnits 2.17.00 (12 Aug 2021)

> /tmp/idnits29639/draft-ietf-bmwg-ngfw-performance-13.txt:
> ...
>
>   Checking nits according to https://www.ietf.org/id-info/checklist :
>   ----------------------------------------------------------------------------
>
>   ** The abstract seems to contain references ([RFC3511]), which it
>      shouldn't.  Please replace those with straight textual mentions of the
>      documents in question.
>
>   == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
>      in the document.  If these are example addresses, they should be changed.
>
>   == There are 1 instance of lines with non-RFC3849-compliant IPv6 addresses
>      in the document.  If these are example addresses, they should be changed.
>
>   -- The draft header indicates that this document obsoletes RFC3511, but the
>      abstract doesn't seem to directly say this.  It does mention RFC3511
>      though, so this could be OK.
>
>
>   Miscellaneous warnings:
>   ----------------------------------------------------------------------------
>
>   == The document seems to lack the recommended RFC 2119 boilerplate, even if
>      it appears to use RFC 2119 keywords.
>
>      (The document does seem to have the reference to RFC 2119 which the
>      ID-Checklist requires).
>
>

The line numbers in the following commented copy of the document are from idnits, too.

When a comment/question is preceded with "Nit:", it indicates that it seems to
me the best answer would be modified draft text.

When a comment/question is preceded with "Q:", I am actually not so sure what
the outcome could be, so an answer by mail would be a start.

2       Benchmarking Methodology Working Group                      B. Balarajah
3       Internet-Draft
4       Obsoletes: 3511 (if approved)                            C. Rossenhoevel
5       Intended status: Informational                                  EANTC AG
6       Expires: 16 July 2022                                         B. Monkman
7                                                                     NetSecOPEN
8                                                                   January 2022

10          Benchmarking Methodology for Network Security Device Performance
11                        draft-ietf-bmwg-ngfw-performance-13

13      Abstract

15         This document provides benchmarking terminology and methodology for
16         next-generation network security devices including next-generation
17         firewalls (NGFW), next-generation intrusion prevention systems
18         (NGIPS), and unified threat management (UTM) implementations.  The

Nit: Why does it have to be next-generation for all example device types
except UTMs, and what does next-generation mean? I would suggest rewriting
the text so the reader does not ask herself these questions.

18         (NGIPS), and unified threat management (UTM) implementations.  The
19         main areas covered in this document are test terminology, test
20         configuration parameters, and benchmarking methodology for NGFW and
21         NGIPS.  This document aims to improve the applicability,

I don't live and breathe the security device TLA space, but I start to
suspect a UTM is some platform on which FW and IPS could run as software
modules, and because it's only software you assume the UTM does not have
to be next-gen? I wonder how much of this guesswork/thought process you
want the reader to go through, or whether you want to avoid that by being
somewhat clearer...

21         NGIPS.  This document aims to improve the applicability,
22         reproducibility, and transparency of benchmarks and to align the test
23         methodology with today's increasingly complex layer 7 security
24         centric network application use cases.  As a result, this document
25         makes [RFC3511] obsolete.

[minor] I kind of wonder if / how obsoleting RFC3511 could/should work.
I understand it when we do a bis of a standard protocol and really don't
want anyone to implement the older version. But unless there is a
similar IETF mandate going along with this draft that says
non-NG FW and non-NG IPS are hereby obsoleted by the IETF, I cannot
see how this draft can obsolete RFC3511, because it simply applies
to a different type of benchmarked entity. And RFC3511 would stay
on forever for whatever we call non-NG.

[minor] At least I think that is the case, unless this document actually does
apply also to non-NG FW/IPS and can therefore supersede RFC3511 and actually
obsolete it. But the text so far says the opposite.

[major] I observe that RFC3511 asks to measure and report goodput (5.6.5.2),
and this document does not mention the term; the loss in performance of
client/server TCP or QUIC connections through behavior of the DUT (such as
proxying) is at best covered indirectly by mentioning parameters such as
"less than 5% reduction in throughput". If this document is superseding
RFC3511, I think it should have a very explicit section discussing goodput -
and maybe expanding on it.

Consider for example the impact on TCP connection throughput and goodput.
Very likely a DUT proxying TCP connections will have quite a different
performance/goodput impact for a classical web page vs. video streaming.
Therefore I am also worried about sending only average bitrates per session,
as opposed to some sessions going up to e.g. 500Mbps for a video streaming
connection (example: best commercially available UHD video streaming today).
Those types of sessions might incur a lot of goodput loss with bad DUTs, but
if I understand the test profile correctly, the per-TCP-connection throughput
of the test profiles will be much less than 100Mbps. If such a range of client
session bitrates is not meant to be tested, it might at least be useful to add
a section listing candidate gaps like this. Another one, for example, is the
impact of higher RTT, especially between the DUT and a server in the Internet.
This mostly challenges TCP window size operation on DUTs operating as TCP
hosts and also their ability to buffer for retransmissions. Test equipment
IMHO may/should be able to emulate such long RTTs. But this is not included in
this document (RTT is not mentioned).
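
[comment] As a back-of-the-envelope sketch of the RTT/window point (purely
illustrative; the RTT values are arbitrary examples of mine, not numbers from
the draft): the bandwidth-delay product of the 500Mbps streaming example
quickly exceeds the 64 KByte initial receive window required later in Section
4.3.1.1:

    # Bandwidth-delay product for the 500 Mbit/s UHD streaming example above.
    # The RTT values are arbitrary illustrative Internet RTTs.
    def window_bytes(bitrate_mbps: float, rtt_ms: float) -> float:
        return bitrate_mbps * 1e6 / 8 * rtt_ms / 1e3

    for rtt_ms in (10, 50, 100):
        mbytes = window_bytes(500, rtt_ms) / 1e6
        print(f"500 Mbit/s at {rtt_ms} ms RTT needs ~{mbytes:.2f} MByte of window/buffer")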

Besides goodput-related issues, there are a couple of other points in this
review that may be too difficult to fix this late in the development of the
document; for any of those considered to be useful input, maybe add them to a
section "out-of-scope (for future versions) considerations" or the like to
capture them.

27      Status of This Memo

29         This Internet-Draft is submitted in full conformance with the
30         provisions of BCP 78 and BCP 79.

32         Internet-Drafts are working documents of the Internet Engineering
33         Task Force (IETF).  Note that other groups may also distribute
34         working documents as Internet-Drafts.  The list of current Internet-
35         Drafts is at https://datatracker.ietf.org/drafts/current/.

37         Internet-Drafts are draft documents valid for a maximum of six months
38         and may be updated, replaced, or obsoleted by other documents at any
39         time.  It is inappropriate to use Internet-Drafts as reference
40         material or to cite them other than as "work in progress."

42         This Internet-Draft will expire on 5 July 2022.

44      Copyright Notice

46         Copyright (c) 2022 IETF Trust and the persons identified as the
47         document authors.  All rights reserved.

49         This document is subject to BCP 78 and the IETF Trust's Legal
50         Provisions Relating to IETF Documents (https://trustee.ietf.org/
51         license-info) in effect on the date of publication of this document.
52         Please review these documents carefully, as they describe your rights
53         and restrictions with respect to this document.  Code Components
54         extracted from this document must include Revised BSD License text as
55         described in Section 4.e of the Trust Legal Provisions and are
56         provided without warranty as described in the Revised BSD License.

58      Table of Contents

60         1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   4
61         2.  Requirements  . . . . . . . . . . . . . . . . . . . . . . . .   4
62         3.  Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . .   4
63         4.  Test Setup  . . . . . . . . . . . . . . . . . . . . . . . . .   4
64           4.1.  Testbed Configuration . . . . . . . . . . . . . . . . . .   5
65           4.2.  DUT/SUT Configuration . . . . . . . . . . . . . . . . . .   6
66             4.2.1.  Security Effectiveness Configuration  . . . . . . . .  12
67           4.3.  Test Equipment Configuration  . . . . . . . . . . . . . .  12
68             4.3.1.  Client Configuration  . . . . . . . . . . . . . . . .  12
69             4.3.2.  Backend Server Configuration  . . . . . . . . . . . .  15
70             4.3.3.  Traffic Flow Definition . . . . . . . . . . . . . . .  17
71             4.3.4.  Traffic Load Profile  . . . . . . . . . . . . . . . .  17
72         5.  Testbed Considerations  . . . . . . . . . . . . . . . . . . .  18
73         6.  Reporting . . . . . . . . . . . . . . . . . . . . . . . . . .  19
74           6.1.  Introduction  . . . . . . . . . . . . . . . . . . . . . .  19
75           6.2.  Detailed Test Results . . . . . . . . . . . . . . . . . .  21
76           6.3.  Benchmarks and Key Performance Indicators . . . . . . . .  21
77         7.  Benchmarking Tests  . . . . . . . . . . . . . . . . . . . . .  23
78           7.1.  Throughput Performance with Application Traffic Mix . . .  23
79             7.1.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  23
80             7.1.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  23
81             7.1.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  23
82             7.1.4.  Test Procedures and Expected Results  . . . . . . . .  25
83           7.2.  TCP/HTTP Connections Per Second . . . . . . . . . . . . .  26
84             7.2.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  26
85             7.2.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  27
86             7.2.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  27
87             7.2.4.  Test Procedures and Expected Results  . . . . . . . .  28
88           7.3.  HTTP Throughput . . . . . . . . . . . . . . . . . . . . .  30
89             7.3.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  30
90             7.3.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  30
91             7.3.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  30
92             7.3.4.  Test Procedures and Expected Results  . . . . . . . .  32
93           7.4.  HTTP Transaction Latency  . . . . . . . . . . . . . . . .  33
94             7.4.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  33
95             7.4.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  33
96             7.4.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  34
97             7.4.4.  Test Procedures and Expected Results  . . . . . . . .  35
98           7.5.  Concurrent TCP/HTTP Connection Capacity . . . . . . . . .  36
99             7.5.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  36
100            7.5.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  36
101            7.5.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  37
102            7.5.4.  Test Procedures and Expected Results  . . . . . . . .  38
103          7.6.  TCP/HTTPS Connections per Second  . . . . . . . . . . . .  39
104            7.6.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  40
105            7.6.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  40
106            7.6.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  40
107            7.6.4.  Test Procedures and Expected Results  . . . . . . . .  42
108          7.7.  HTTPS Throughput  . . . . . . . . . . . . . . . . . . . .  43
109            7.7.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  43
110            7.7.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  43
111            7.7.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  43
112            7.7.4.  Test Procedures and Expected Results  . . . . . . . .  45
113          7.8.  HTTPS Transaction Latency . . . . . . . . . . . . . . . .  46
114            7.8.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  46
115            7.8.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  46
116            7.8.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  46
117            7.8.4.  Test Procedures and Expected Results  . . . . . . . .  48
118          7.9.  Concurrent TCP/HTTPS Connection Capacity  . . . . . . . .  49
119            7.9.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  49
120            7.9.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  49
121            7.9.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  49
122            7.9.4.  Test Procedures and Expected Results  . . . . . . . .  51
123        8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  52
124        9.  Security Considerations . . . . . . . . . . . . . . . . . . .  53
125        10. Contributors  . . . . . . . . . . . . . . . . . . . . . . . .  53
126        11. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  53
127        12. References  . . . . . . . . . . . . . . . . . . . . . . . . .  53
128          12.1.  Normative References . . . . . . . . . . . . . . . . . .  53
129          12.2.  Informative References . . . . . . . . . . . . . . . . .  53
130        Appendix A.  Test Methodology - Security Effectiveness
131                Evaluation  . . . . . . . . . . . . . . . . . . . . . . .  54
132          A.1.  Test Objective  . . . . . . . . . . . . . . . . . . . . .  55
133          A.2.  Testbed Setup . . . . . . . . . . . . . . . . . . . . . .  55
134          A.3.  Test Parameters . . . . . . . . . . . . . . . . . . . . .  55
135            A.3.1.  DUT/SUT Configuration Parameters  . . . . . . . . . .  55
136            A.3.2.  Test Equipment Configuration Parameters . . . . . . .  55
137          A.4.  Test Results Validation Criteria  . . . . . . . . . . . .  56
138          A.5.  Measurement . . . . . . . . . . . . . . . . . . . . . . .  56
139          A.6.  Test Procedures and Expected Results  . . . . . . . . . .  57
140            A.6.1.  Step 1: Background Traffic  . . . . . . . . . . . . .  57
141            A.6.2.  Step 2: CVE Emulation . . . . . . . . . . . . . . . .  58
142        Appendix B.  DUT/SUT Classification . . . . . . . . . . . . . . .  58
143        Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  58

145     1.  Introduction

147        18 years have passed since IETF recommended test methodology and
148        terminology for firewalls initially ([RFC3511]).  The requirements
149        for network security element performance and effectiveness have
150        increased tremendously since then.  In the eighteen years since

[nit] What is a network security element? Please provide a reference or define
it. If we are talking about them in this document, why are they not mentioned
in the abstract?

150        increased tremendously since then.  In the eighteen years since
151        [RFC3511] was published, recommending test methodology and
152        terminology for firewalls, requirements and expectations for network
153        security elements has increased tremendously.  Security function

[nit] This does not parse as correct English to me: "recommending test
methodology ... has increased tremendously". It would, if you mean that more
and more test methodologies were recommended, but not if there is an
outstanding need to do so (which this document intends to fill).

[nit] Why does the recommending part apply only to firewalls, and the
requirements and expectations only to security elements?

153        security elements has increased tremendously.  Security function

[nit] What is a security function? (I know, but I don't know whether the
reader is supposed to know.) Aka: provide a reference, add a terminology
section, or define it. Maybe the easiest fix is to restructure this intro
paragraph to start with the explanation of the evolution from firewalls to
network security elements which support one or more security functions,
including firewall, intrusion detection, etc. - and then conclude how this
requires this document to define all the good BMWG stuff it hopefully does.

Although a terminology section is never a bad thing either ;-)

154        implementations have evolved to more advanced areas and have
155        diversified into intrusion detection and prevention, threat
156        management, analysis of encrypted traffic, etc.  In an industry of
157        growing importance, well-defined, and reproducible key performance
158        indicators (KPIs) are increasingly needed to enable fair and
159        reasonable comparison of network security functions.  All these

[nit] Maybe add what to compare - performance, functionality, scale,
flexibility, adjustability - or, if you knowingly only discuss a subset
of these aspects, then maybe still list all the aspects you are aware
of to be of interest to likely readers of this document and summarize
those that you will and those that you won't cover in this document, so
that readers don't have to continue reading the document hoping to find
them described.

160        reasons have led to the creation of a new next-generation network
161        security device benchmarking document, which makes [RFC3511]
162        obsolete.

[nit] As mentioned above, whether or not the "obsoletes" claim is justified is
not yet clear to me.

164     2.  Requirements

166        The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
167        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
168        "OPTIONAL" in this document are to be interpreted as described in BCP
169        14 [RFC2119], [RFC8174] when, and only when, they appear in all
170        capitals, as shown here.

172     3.  Scope

174        This document provides testing terminology and testing methodology
175        for modern and next-generation network security devices that are
176        configured in Active ("Inline", see Figure 1 and Figure 2) mode.  It

[nit] The word "Active" does not appear again in the document; instead, the
description on line 261 defines Inline mode as "active", which in my book
makes 176+261 a perfect circular definition. I would suggest having a
terminology section that defines "Inline", for example by also adding the
most likely possible alternative mode description.

177        covers the validation of security effectiveness configurations of

[nit] security configuration effectiveness ?

179        network security devices, followed by performance benchmark testing.
179        This document focuses on advanced, realistic, and reproducible
180        testing methods.  Additionally, it describes testbed environments,

[nit] Are you sure "advanced" and "realistic" are meant to characterize the
testing method rather than the scenario that is being tested? "reproducible
testing methods for advanced real-world scenarios"?

181        test tool requirements, and test result formats.

183     4.  Test Setup

185        Test setup defined in this document applies to all benchmarking tests

[nit] "/Test setup defined/The test setup defined/

186        described in Section 7.  The test setup MUST be contained within an
187        Isolated Test Environment (see Section 3 of [RFC6815]).

189     4.1.  Testbed Configuration

191        Testbed configuration MUST ensure that any performance implications
192        that are discovered during the benchmark testing aren't due to the

[nit] /aren't/are not/

193        inherent physical network limitations such as the number of physical
194        links and forwarding performance capabilities (throughput and
195        latency) of the network devices in the testbed.  For this reason,
196        this document recommends avoiding external devices such as switches
197        and routers in the testbed wherever possible.

199        In some deployment scenarios, the network security devices (Device
200        Under Test/System Under Test) are connected to routers and switches,
201        which will reduce the number of entries in MAC or ARP tables of the
202        Device Under Test/System Under Test (DUT/SUT).  If MAC or ARP tables
203        have many entries, this may impact the actual DUT/SUT performance due
204        to MAC and ARP/ND (Neighbor Discovery) table lookup processes.  This
205        document also recommends using test equipment with the capability of

[nit] /also/therefore/

206        emulating layer 3 routing functionality instead of adding external
207        routers in the testbed.

209        The testbed setup Option 1 (Figure 1) is the RECOMMENDED testbed
210        setup for the benchmarking test.

212        +-----------------------+                   +-----------------------+
213        | +-------------------+ |   +-----------+   | +-------------------+ |
214        | | Emulated Router(s)| |   |           |   | | Emulated Router(s)| |
215        | |    (Optional)     | +----- DUT/SUT  +-----+    (Optional)     | |
216        | +-------------------+ |   |           |   | +-------------------+ |
217        | +-------------------+ |   +-----------+   | +-------------------+ |
218        | |     Clients       | |                   | |      Servers      | |
219        | +-------------------+ |                   | +-------------------+ |
220        |                       |                   |                       |
221        |   Test Equipment      |                   |   Test Equipment      |
222        +-----------------------+                   +-----------------------+

224                          Figure 1: Testbed Setup - Option 1

226        If the test equipment used is not capable of emulating layer 3
227        routing functionality or if the number of used ports is mismatched
228        between test equipment and the DUT/SUT (need for test equipment port
229        aggregation), the test setup can be configured as shown in Figure 2.

231         +-------------------+      +-----------+      +--------------------+
232         |Aggregation Switch/|      |           |      | Aggregation Switch/|
233         | Router            +------+  DUT/SUT  +------+ Router             |
234         |                   |      |           |      |                    |
235         +----------+--------+      +-----------+      +--------+-----------+
236                    |                                           |
237                    |                                           |
238        +-----------+-----------+                   +-----------+-----------+
239        |                       |                   |                       |
240        | +-------------------+ |                   | +-------------------+ |
241        | | Emulated Router(s)| |                   | | Emulated Router(s)| |
242        | |     (Optional)    | |                   | |     (Optional)    | |
243        | +-------------------+ |                   | +-------------------+ |
244        | +-------------------+ |                   | +-------------------+ |
245        | |      Clients      | |                   | |      Servers      | |
246        | +-------------------+ |                   | +-------------------+ |
247        |                       |                   |                       |
248        |    Test Equipment     |                   |    Test Equipment     |
249        +-----------------------+                   +-----------------------+

251                          Figure 2: Testbed Setup - Option 2

[nit] Please elaborate on the "number of used ports", and if possible show it
in Figure 2 by drawing multiple links. I guess that in a common case, the test
equipment might provide few but fast ports, whereas the DUT/SUT might provide
more but slower ports, and one would then use external switches as port
multiplexers? Or vice versa? But if such adaptation is performed, I wonder how
different setups might impact the measurements. So for example, let's say the
Test Equipment (TE) has a 100Gbps port, and the DUT has 4 * 10Gbps ports, so
you need on each side a switch with one 100Gbps port and 2 * 10Gbps ports.
Would you try to use VLANs towards the TE, or would you just build a single
LAN? Any recommendations for the switch config, and why?

[major] The fact that the left side says only clients and the right side says
only servers is worth some more discussion, especially because the filtering
in Figure 3 also makes me wonder in which direction traffic is meant to be
filtered/inspected. Are you considering the case where clients are responders
to (TCP/QUIC/UDP) connections? For example, the left side is "inside", the DUT
is a site firewall to the Internet (right side), and there is some server on
the left side (e.g. SMTP). How about the case where you have on the right an
Internet interface and a separate site DMZ interface, and then of course
traffic not only between left and right, but also between those interfaces on
the right?

More broadly applicable: dynamic port discovery for ICE/STUN, where you want
to permit inside-to-outside connections (to the STUN server) in order to
permit new connections from other external nodes to go back inside. E.g., it
would be good to have some elaboration about the type of connections covered
by this document. If it is only initiators on the left and responders on the
right, that is fine, but it should be said so, and maybe point to those above
cases (DMZ, inside servers, STUN/ICE) as not covered by this document.

253     4.2.  DUT/SUT Configuration

255        A unique DUT/SUT configuration MUST be used for all benchmarking
256        tests described in Section 7.  Since each DUT/SUT will have its own
257        unique configuration, users SHOULD configure their device with the
258        same parameters and security features that would be used in the
259        actual deployment of the device or a typical deployment in order to
260        achieve maximum network security coverage.  The DUT/SUT MUST be

[nit] What is a "unique configuration" ? It could be different configurations
across two different DUT but both achieving the same service/filtering, just
difference in syntax, or it could be difference in functional outcome. Would
be good to be more precise what is meant.

[nit] Why would a user choose an actual deployment vs. a typical deployment?
I am imagining that a user would choose an actual deployment to measure
performance specifically for that deployment, but a typical deployment when
the DUT would need to be deployed in different setups of which not each can be
measured individually, or because the results are meant to be comparable with
those of other users who may have taken performance numbers. It would be good
to elaborate a bit more so readers have a clearer understanding of what
"actual deployment" and "typical deployment" mean and how/why to pick one over
the other.

[nit] I do not understand how the text up to "in order to" justifies that it
will achieve the maximum network security coverage. I also do not know what
"maximum network security coverage" means. If there is a definition, please
provide it. Otherwise, introduce one.

260        achieve maximum network security coverage.  The DUT/SUT MUST be
261        configured in "Inline" mode so that the traffic is actively inspected
262        by the DUT/SUT.  Also "Fail-Open" behavior MUST be disabled on the
263        DUT/SUT.

265        Table 1 and Table 2 below describe the RECOMMENDED and OPTIONAL sets
266        of network security feature list for NGFW and NGIPS respectively.
267        The selected security features SHOULD be consistently enabled on the
268        DUT/SUT for all benchmarking tests described in Section 7.

270        To improve repeatability, a summary of the DUT/SUT configuration
271        including a description of all enabled DUT/SUT features MUST be
272        published with the benchmarking results.

274               +============================+=============+==========+
275               | DUT/SUT (NGFW) Features    | RECOMMENDED | OPTIONAL |
276               +============================+=============+==========+
277               | SSL Inspection             |      x      |          |
278               +----------------------------+-------------+----------+
279               | IDS/IPS                    |      x      |          |
280               +----------------------------+-------------+----------+
281               | Anti-Spyware               |      x      |          |
282               +----------------------------+-------------+----------+
283               | Anti-Virus                 |      x      |          |
284               +----------------------------+-------------+----------+
285               | Anti-Botnet                |      x      |          |
286               +----------------------------+-------------+----------+
287               | Web Filtering              |             |    x     |
288               +----------------------------+-------------+----------+
289               | Data Loss Protection (DLP) |             |    x     |
290               +----------------------------+-------------+----------+
291               | DDoS                       |             |    x     |
292               +----------------------------+-------------+----------+
293               | Certificate Validation     |             |    x     |
294               +----------------------------+-------------+----------+

[major] This may be bogus because I don't know well enough how, for the
purpose of this document, security devices are expected to inspect HTTPS
connections from client to server. Maybe it is a sane approach where the
security device operates as a client-trusted HTTPS proxy, maybe it is one of
the more hacky approaches (faked server certs). But however it works, I think
that a security device cannot get away from validating the certificate of the
server in a connection. Otherwise it shouldn't be called a security DUT.

But I am not sure if that validation is what you call "Certificate Validation".

294               +----------------------------+-------------+----------+
295               | Logging and Reporting      |      x      |          |
296               +----------------------------+-------------+----------+
297               | Application Identification |      x      |          |
298               +----------------------------+-------------+----------+

300                           Table 1: NGFW Security Features

[nit] Why are "Web Filtering"..."Certificate Validation" only MAY ?
Please point to a place in the document (or elsewhere) that rationales
the SHOULD/MAY recommendations. Same applies to Table 2.


302               +============================+=============+==========+
303               | DUT/SUT (NGIPS) Features   | RECOMMENDED | OPTIONAL |
304               +============================+=============+==========+
305               | SSL Inspection             |      x      |          |
306               +----------------------------+-------------+----------+
307               | Anti-Malware               |      x      |          |
308               +----------------------------+-------------+----------+
309               | Anti-Spyware               |      x      |          |
310               +----------------------------+-------------+----------+
311               | Anti-Botnet                |      x      |          |
312               +----------------------------+-------------+----------+
313               | Logging and Reporting      |      x      |          |
314               +----------------------------+-------------+----------+
315               | Application Identification |      x      |          |
316               +----------------------------+-------------+----------+
317               | Deep Packet Inspection     |      x      |          |
318               +----------------------------+-------------+----------+
319               | Anti-Evasion               |      x      |          |
320               +----------------------------+-------------+----------+

322                           Table 2: NGIPS Security Features

[nit] I ended up scrolling up and down to compare the tables.
It might be useful for other readers like me to merge the tables,
aka: put the columns for NGFW and NGIPS into one table.

[nit] Please start with Table 3, as it introduces the security features;
otherwise the two tables above introduce a lot of features without defining them.

324        The following table provides a brief description of the security
325        features.

327         +================+================================================+
328         | DUT/SUT        | Description                                    |
329         | Features       |                                                |
330         +================+================================================+
331         | SSL Inspection | DUT/SUT intercepts and decrypts inbound HTTPS  |
332         |                | traffic between servers and clients.  Once the |
333         |                | content inspection has been completed, DUT/SUT |
334         |                | encrypts the HTTPS traffic with ciphers and    |
335         |                | keys used by the clients and servers.          |
336         +----------------+------------------------------------------------+
337         | IDS/IPS        | DUT/SUT detects and blocks exploits targeting  |
338         |                | known and unknown vulnerabilities across the   |
339         |                | monitored network.                             |
340         +----------------+------------------------------------------------+
341         | Anti-Malware   | DUT/SUT detects and prevents the transmission  |
342         |                | of malicious executable code and any           |
343         |                | associated communications across the monitored |
344         |                | network.  This includes data exfiltration as   |
345         |                | well as command and control channels.          |
346         +----------------+------------------------------------------------+
347         | Anti-Spyware   | Anti-Spyware is a subcategory of Anti Malware. |
348         |                | Spyware transmits information without the      |
349         |                | user's knowledge or permission.  DUT/SUT       |
350         |                | detects and block initial infection or         |
351         |                | transmission of data.                          |
352         +----------------+------------------------------------------------+
353         | Anti-Botnet    | DUT/SUT detects traffic to or from botnets.    |
354         +----------------+------------------------------------------------+
355         | Anti-Evasion   | DUT/SUT detects and mitigates attacks that     |
356         |                | have been obfuscated in some manner.           |
357         +----------------+------------------------------------------------+
358         | Web Filtering  | DUT/SUT detects and blocks malicious website   |
359         |                | including defined classifications of website   |
360         |                | across the monitored network.                  |
361         +----------------+------------------------------------------------+
362         | DLP            | DUT/SUT detects and prevents data breaches and |
363         |                | data exfiltration, or it detects and blocks    |
364         |                | the transmission of sensitive data across the  |
365         |                | monitored network.                             |
366         +----------------+------------------------------------------------+
367         | Certificate    | DUT/SUT validates certificates used in         |
368         | Validation     | encrypted communications across the monitored  |
369         |                | network.                                       |
370         +----------------+------------------------------------------------+
371         | Logging and    | DUT/SUT logs and reports all traffic at the    |
372         | Reporting      | flow level across the monitored network.       |
373         +----------------+------------------------------------------------+
374         | Application    | DUT/SUT detects known applications as defined  |
375         | Identification | within the traffic mix selected across the     |
376         |                | monitored network.                             |
377         +----------------+------------------------------------------------+

379                        Table 3: Security Feature Description

[nit] Why are DDoS and DPI not listed in this table? I just randomly stumbled
across that one, but maybe there are more mismatches between Tables 1 and 2.
Please make sure all Table 1/2 features are mentioned.

[nit] I have about 1000 questions and concerns about this stuff: Are there
actually IETF specifications for how any of these features on the DUT do or
should work, or is this all vendor-proprietary functionality? For anything
that is a vendor / market proprietary specification, how would the TE (Test
Equipment) know what the DUT does, so that it can effectively test it? I
imagine that if there is a difference in how a particular feature functions
across different vendor DUTs, the same is true for TE, so some TE would have
more functional overlap with a given DUT than others?

[nit (continued)] E.g., let's say some DUT1 feature, e.g. DLP, is really
simple and therefore not very secure. But that makes it a lot faster than a
DUT2 DLP feature which is a lot more secure. Maybe there is a metric for this
security, like, if I remember correctly from the past, the number of
signatures in virus detection or the like... How would such differences be
taken into account in measurement?

381        Below is a summary of the DUT/SUT configuration:

383        *  DUT/SUT MUST be configured in "inline" mode.

385        *  "Fail-Open" behavior MUST be disabled.

387        *  All RECOMMENDED security features are enabled.

389        *  Logging SHOULD be enabled.  DUT/SUT SHOULD log all traffic at the
390           flow level - Logging to an external device is permissible.

[nit] Does that mean logging of ALL flows or only of flows that trigger some
security issue? Logging of ALL flows seems like a big performance hog, may be
infeasible in fast deployments, and may need to be tested as a separate case
by itself (but my concern may be outdated).

[nit] If logging is to an external device, it may be useful to indicate in
Figure 1/2 such a logging receiver, and ideally have it operate via a link from
the DUT that does not pass test traffic so that it does not interfere.

392        *  Geographical location filtering, and Application Identification
393           and Control SHOULD be configured to trigger based on a site or
394           application from the defined traffic mix.

[nit] Geographic location filtering does not sound like a generically
necessary or applicable security feature. If you are, for example, a high-tech
manufacturer that sells all over the world, you may appreciate customers
visiting your webserver from countries that happen to also host a lot of
botnets. Or is this document focused on a narrower set of use cases, e.g. a
DUT only meant to filter anything that cannot be put into the cloud (such as
web services)? It would be good to write up some justification for the GeoLoc
SHOULD that would then help readers better understand when/how to configure it
and when/how not to.

396        In addition, a realistic number of access control rules (ACL) SHOULD
397        be configured on the DUT/SUT where ACLs are configurable and
398        reasonable based on the deployment scenario.  This document
399        determines the number of access policy rules for four different
400        classes of DUT/SUT: Extra Small (XS), Small (S), Medium (M), and
401        Large (L).  A sample DUT/SUT classification is described in
402        Appendix B.

[major] IMHO, you cannot put numbers such as those in Figure 3 into the main
text of the document while putting the speed definitions of the four classes
into an Appendix B. It seems clear to me that the numbers in Figure 3 (and
probably elsewhere) were derived from the assumption that the four speed
classes are defined as in Appendix B. Suggestion: inline the text of Appendix
B here and mention that numbers such as those in Figure 3 are derived from the
assumption of those XS/S/M/L numbers. Add (if necessary, else not) that it may
be appropriate to choose other numbers for XS/S/M/L, but if one does that,
then the dependent numbers (such as those from Figure 3) may also need to be
re-evaluated.

404        The Access Control Rules (ACL) defined in Figure 3 MUST be configured
405        from top to bottom in the correct order as shown in the table.  This
406        is due to ACL types listed in specificity decreasing order, with
407        "block" first, followed by "allow", representing a typical ACL based
408        security policy.  The ACL entries SHOULD be configured with routable
409        IP subnets by the DUT/SUT.  (Note: There will be differences between
410        how security vendors implement ACL decision making.)  The configured

[nit] /security vendors/DUT/

[nit] I don't understand what I am supposed to learn from the (Note: ...)
sentence. Rephrase or remove?

410        how security vendors implement ACL decision making.)  The configured
411        ACL MUST NOT block the security and measurement traffic used for the
412        benchmarking tests.

[nit] what is "security traffic" ? what is "measurement traffic" ?  Don't see
these terms defined before. Those two terms do not immediately click to me. I
guess measured user/client-server traffic vs. test-setup management traffic
(including logging) ?? In any case introduce the terms, define them and use
them consistently. Whatever they are.

414                                                            +---------------+
415                                                            | DUT/SUT       |
416                                                            | Classification|
417                                                            | # Rules       |
418        +-----------+-----------+--------------------+------+---+---+---+---+
419        |           | Match     |                    |      |   |   |   |   |
420        | Rules Type| Criteria  |   Description      |Action| XS| S | M | L |
421        +-------------------------------------------------------------------+
422        |Application|Application| Any application    | block| 5 | 10| 20| 50|
423        |layer      |           | not included in    |      |   |   |   |   |
424        |           |           | the measurement    |      |   |   |   |   |
425        |           |           | traffic            |      |   |   |   |   |
426        +-------------------------------------------------------------------+
427        |Transport  |SRC IP and | Any SRC IP subnet  | block| 25| 50|100|250|
428        |layer      |TCP/UDP    | used and any DST   |      |   |   |   |   |
429        |           |DST ports  | ports not used in  |      |   |   |   |   |
430        |           |           | the measurement    |      |   |   |   |   |
431        |           |           | traffic            |      |   |   |   |   |
432        +-------------------------------------------------------------------+
433        |IP layer   |SRC/DST IP | Any SRC/DST IP     | block| 25| 50|100|250|
434        |           |           | subnet not used    |      |   |   |   |   |
435        |           |           | in the measurement |      |   |   |   |   |
436        |           |           | traffic            |      |   |   |   |   |
437        +-------------------------------------------------------------------+

[nit] Would suggest removing the word "Any" to minimize misinterpretation.

[nit] These three blocks seem to never get exercised by the actual measurement
traffic, right? So the purpose would then be to simply load up the DUT with
them, in case the DUT implementation is naive enough to let them cause
relevant performance impacts even when not exercised by traffic. It would be
good to write this down as a rationale after the table, especially because the
"Any" had me confused at first: in a real-world deployment you would of course
not include 250 individual application/port/prefix rules, but just a simple
block-all.

[nit] Even 27 years ago I have seen routers acting as firewalls for
universities that had thousands of such ACL entries. Aka: I think these
numbers are way too low.

438        |Application|Application| Half of the        | allow| 10| 10| 10| 10|
439        |layer      |           | applications       |      |   |   |   |   |
440        |           |           | included in the    |      |   |   |   |   |
441        |           |           | measurement traffic|      |   |   |   |   |
442        |           |           |(see the note below)|      |   |   |   |   |
443        +-------------------------------------------------------------------+
444        |Transport  |SRC IP and | Half of the SRC    | allow| >1| >1| >1| >1|
445        |layer      |TCP/UDP    | IPs used and any   |      |   |   |   |   |
446        |           |DST ports  | DST ports used in  |      |   |   |   |   |
447        |           |           | the measurement    |      |   |   |   |   |
448        |           |           | traffic            |      |   |   |   |   |
449        |           |           | (one rule per      |      |   |   |   |   |
450        |           |           | subnet)            |      |   |   |   |   |
451        +-------------------------------------------------------------------+
452        |IP layer   |SRC IP     | The rest of the    | allow| >1| >1| >1| >1|
453        |           |           | SRC IP subnet      |      |   |   |   |   |
454        |           |           | range used in the  |      |   |   |   |   |
455        |           |           | measurement        |      |   |   |   |   |
456        |           |           | traffic            |      |   |   |   |   |
457        |           |           | (one rule per      |      |   |   |   |   |
458        |           |           | subnet)            |      |   |   |   |   |
459        +-----------+-----------+--------------------+------+---+---+---+---+

[major] There should be an explanation of how this is supposed to work, and
it seems there are rules missing:

      The rule on row 438 explicitly permits half the traffic sent by the test
      equipment, so supposedly only the other half has to be checked by the
      rule on row 444. So when 444 says "Half of the SRC...", is that half of
      the total? Would that have to be set up so that after 444 we now have
      75% of the measurement traffic going through? Likewise, does rule 452
      then bring the total amount of permitted traffic to 87.5%?
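
[comment] A minimal sketch of the interpretation I am guessing at above (each
allow rule permitting half of the measurement traffic not yet permitted by the
previous rules) - only to show where the 75% / 87.5% figures would come from:

    # Hypothetical reading of the three allow rules (rows 438, 444, 452):
    # each permits half of the measurement traffic left over so far.
    permitted, remaining = 0.0, 1.0
    for rule in ("row 438 (application)", "row 444 (transport)", "row 452 (IP)"):
        permitted += remaining / 2
        remaining /= 2
        print(f"after {rule}: {permitted:.1%} of measurement traffic permitted")
    # -> 50.0%, 75.0%, 87.5%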

[nit] Ultimately, we only have "allows" here.
      Is there an assumption that after row 459 there is an implicit
      deny-anything-else ? I guess so, but it should be written out explicitly
      in the table.

461                            Figure 3: DUT/SUT Access List

463        Note: If half of the applications included in the measurement traffic
464        is less than 10, the missing number of ACL entries (dummy rules) can
465        be configured for any application traffic not included in the
466        measurement traffic.

468     4.2.1.  Security Effectiveness Configuration

470        The Security features (defined in Table 1 and Table 2) of the DUT/SUT
471        MUST be configured effectively to detect, prevent, and report the
472        defined security vulnerability sets.  This section defines the
473        selection of the security vulnerability sets from Common

[nit] "from the CVE" ?!

474        vulnerabilities and Exposures (CVE) list for the testing.  The

[nit] Add a reference for CVE. (Not sure what the best reference is: a spec,
Wikipedia, or cve.org, ...)

475        vulnerability set SHOULD reflect a minimum of 500 CVEs from no older
476        than 10 calendar years to the current year.  These CVEs SHOULD be
477        selected with a focus on in-use software commonly found in business
478        applications, with a Common vulnerability Scoring System (CVSS)
479        Severity of High (7-10).

481        This document is primarily focused on performance benchmarking.
482        However, it is RECOMMENDED to validate the security features
483        configuration of the DUT/SUT by evaluating the security effectiveness
484        as a prerequisite for performance benchmarking tests defined in the

[nit]  /in the/in/

485        section 7.  In case the benchmarking tests are performed without
486        evaluating security effectiveness, the test report MUST explain the
487        implications of this.  The methodology for evaluating security
488        effectiveness is defined in Appendix A.

490     4.3.  Test Equipment Configuration

492        In general, test equipment allows configuring parameters in different
493        protocol layers.  These parameters thereby influence the traffic
494        flows which will be offered and impact performance measurements.

496        This section specifies common test equipment configuration parameters
497        applicable for all benchmarking tests defined in Section 7.  Any
498        benchmarking test specific parameters are described under the test
499        setup section of each benchmarking test individually.

501     4.3.1.  Client Configuration

503        This section specifies which parameters SHOULD be considered while
504        configuring clients using test equipment.  Also, this section
505        specifies the RECOMMENDED values for certain parameters.  The values
506        are the defaults used in most of the client operating systems
507        currently.

509     4.3.1.1.  TCP Stack Attributes

511        The TCP stack SHOULD use a congestion control algorithm at client and
512        server endpoints.  The IPv4 and IPv6 Maximum Segment Size (MSS)
513        SHOULD be set to 1460 bytes and 1440 bytes respectively and a TX and
514        RX initial receive windows of 64 KByte.  Client initial congestion
515        window SHOULD NOT exceed 10 times the MSS.  Delayed ACKs are
516        permitted and the maximum client delayed ACK SHOULD NOT exceed 10
517        times the MSS before a forced ACK.  Up to three retries SHOULD be
518        allowed before a timeout event is declared.  All traffic MUST set the
519        TCP PSH flag to high.  The source port range SHOULD be in the range
520        of 1024 - 65535.  Internal timeout SHOULD be dynamically scalable per
521        RFC 793.  The client SHOULD initiate and close TCP connections.  The
522        TCP connection MUST be initiated via a TCP three-way handshake (SYN,
523        SYN/ACK, ACK), and it MUST be closed via either a TCP three-way close
524        (FIN, FIN/ACK, ACK), or a TCP four-way close (FIN, ACK, FIN, ACK).

[nit] It would be nice to have a reference to where/how these parameters are
determined, and to mention why these parameters were chosen. Probably to
reflect the most common current TCP behavior that achieves the best
performance?

[minor] The document mentions QUIC in three places, but has no equivalent
section for QUIC here as it has for TCP. I would suggest adding a section
here, even if it just says "Due to the absence of sufficient experience, QUIC
parameters are unspecified. Similarly to TCP, parameters should be chosen that
best reflect state-of-the-art performance results for QUIC client/server
traffic".

526     4.3.1.2.  Client IP Address Space

528        The sum of the client IP space SHOULD contain the following
529        attributes.

531        *  The IP blocks SHOULD consist of multiple unique, discontinuous
532           static address blocks.

534        *  A default gateway is permitted.

[comment] How is this relevant? What do you expect it to do? What would happen
if you just removed it?

536        *  The DSCP (differentiated services code point) marking is set to DF
537           (Default Forwarding) '000000' on IPv4 Type of Service (ToS) field
538           and IPv6 traffic class field.

540        The following equation can be used to define the total number of
541        client IP addresses that will be configured on the test equipment.

543        Desired total number of client IP = Target throughput [Mbit/s] /
544        Average throughput per IP address [Mbit/s]

546        As shown in the example list below, the value for "Average throughput
547        per IP address" can be varied depending on the deployment and use
548        case scenario.

550        (Option 1)  DUT/SUT deployment scenario 1 : 6-7 Mbit/s per IP (e.g.
551                    1,400-1,700 IPs per 10Gbit/s throughput)

553        (Option 2)  DUT/SUT deployment scenario 2 : 0.1-0.2 Mbit/s per IP
554                    (e.g.  50,000-100,000 IPs per 10Gbit/s throughput)
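
[comment] Purely illustrative: a quick sanity check of the equation above using
the draft's own example numbers (Python):

    target_throughput_mbps = 10_000               # 10 Gbit/s
    print(round(target_throughput_mbps / 6.5))    # scenario 1: ~1538 IPs (1,400-1,700)
    print(round(target_throughput_mbps / 0.15))   # scenario 2: ~66667 IPs (50,000-100,000)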

556        Based on deployment and use case scenario, client IP addresses SHOULD
557        be distributed between IPv4 and IPv6.  The following options MAY be
558        considered for a selection of traffic mix ratio.

560        (Option 1)  100 % IPv4, no IPv6

562        (Option 2)  80 % IPv4, 20% IPv6

564        (Option 3)  50 % IPv4, 50% IPv6

566        (Option 4)  20 % IPv4, 80% IPv6

568        (Option 5)  no IPv4, 100% IPv6

[minor] This guidance is IMHO not very helpful. It seems to me the intended
guidance is that the percentage of IPv4 vs. IPv6 addresses should be based on
the expected ratio of IPv4 vs. IPv6 traffic in the target deployment, because
given the way the test setup is done, N% IPv4 addresses will also roughly
result in N% IPv4 traffic in the test.

That type of explanation might be very helpful, because the risk is that
readers may think they can derive the percentage of test IPv4/IPv6 addresses
from the ratio of IPv4/IPv6 addresses in the target deployment, but that very
often will not work:

For example, in the common dual-stack deployment every client has an IPv4 and
an IPv6 address, so the address ratio is 50% IPv4, but the actual percentage
of IPv4 traffic will very much depend on the application scenario. Some
enterprises may see 90% or more IPv6 traffic if the main traffic is all newer
cloud services traffic. And vice versa, it could be as little as 10% IPv6 if
all the cloud services are legacy apps in the cloud not supporting IPv6.

570        Note: The IANA has assigned IP address range for the testing purpose
571        as described in Section 8.  If the test scenario requires more IP
572        addresses or subnets than the IANA assigned, this document recommends
573        using non routable Private IPv4 address ranges or Unique Local
574        Address (ULA) IPv6 address ranges for the testing.

[minor] See comments in Section 8. It might be useful to merge the text of this
paragraph with the one in Section 8; otherwise the addressing recommendations
end up somewhat split across two places.

[minor] It would be prudent to add a disclaimer that this document does not
attempt to determine whether a DUT may embody performance optimizations for
known testing address ranges. Such a disclaimer could be more general and go at
the end of the document, e.g. before the IANA section: no considerations are
made against DUT optimizations for known test scenarios, including addressing
ranges or other test-profile-specific parameters.

576     4.3.1.3.  Emulated Web Browser Attributes

578        The client emulated web browser (emulated browser) contains
579        attributes that will materially affect how traffic is loaded.  The

[nit] What does "how traffic is loaded" mean? Rephrase.

580        objective is to emulate modern, typical browser attributes to improve
581        realism of the result set.

[nit] /result set/resulting traffic/ ?

583        For HTTP traffic emulation, the emulated browser MUST negotiate HTTP
584        version 1.1 or higher.  Depending on test scenarios and chosen HTTP
585        version, the emulated browser MAY open multiple TCP connections per
586        Server endpoint IP at any time depending on how many sequential
587        transactions need to be processed.  For HTTP/2 or HTTP/3, the
588        emulated browser MAY open multiple concurrent streams per connection
589        (multiplexing).  HTTP/3 emulated browser uses QUIC ([RFC9000]) as
590        transport protocol.  HTTP settings such as number of connection per
591        server IP, number of requests per connection, and number of streams
592        per connection MUST be documented.  This document refers to [RFC8446]
593        for HTTP/2.  The emulated browser SHOULD advertise a User-Agent
594        header.  The emulated browser SHOULD enforce content length
595        validation.  Depending on test scenarios and selected HTTP version,
596        HTTP header compression MAY be set to enable or disable.  This
597        setting (compression enabled or disabled) MUST be documented in the
598        report.
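
[comment] Not a requested change, just checking that I read the intent right: a
minimal sketch (Python http.client, with hypothetical host and object names) of
the emulated-browser behavior described above - HTTP/1.1, an advertised
User-Agent header, and content-length validation of the response:

    import http.client

    conn = http.client.HTTPConnection("server1.example.test", 80)
    conn.request("GET", "/objects/16k.bin",
                 headers={"User-Agent": "emulated-browser/1.0"})
    resp = conn.getresponse()
    body = resp.read()
    # content length validation: received body must match the advertised length
    assert resp.status == 200
    assert len(body) == int(resp.getheader("Content-Length"))
    conn.close()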

600        For encrypted traffic, the following attributes SHALL define the
601        negotiated encryption parameters.  The test clients MUST use TLS
602        version 1.2 or higher.  TLS record size MAY be optimized for the

[minor] I would bet SEC review will challenge you to comment on TLS 1.3. It
would make sense to add a sentence stating that the ratio of TLS 1.2 vs. TLS 1.3
traffic should be chosen based on the expected target deployment and may range
from 100% TLS 1.2 to 100% TLS 1.3. In the absence of known ratios, a 50/50%
ratio is RECOMMENDED.

602        version 1.2 or higher.  TLS record size MAY be optimized for the
603        HTTPS response object size up to a record size of 16 KByte.  If
604        Server Name Indication (SNI) is required in the traffic mix profile,
605        the client endpoint MUST send TLS extension Server Name Indication
606        (SNI) information when opening a security tunnel.  Each client

[minor] SNI is pretty standard today. I would remove the "if" and make the
whole sentence a MUST.

606        (SNI) information when opening a security tunnel.  Each client
607        connection MUST perform a full handshake with server certificate and
608        MUST NOT use session reuse or resumption.

610        The following TLS 1.2 supported ciphers and keys are RECOMMENDED to
611        use for HTTPS based benchmarking tests defined in Section 7.

613        1.  ECDHE-ECDSA-AES128-GCM-SHA256 with Prime256v1 (Signature Hash
614            Algorithm: ecdsa_secp256r1_sha256 and Supported group: secp256r1)

616        2.  ECDHE-RSA-AES128-GCM-SHA256 with RSA 2048 (Signature Hash
617            Algorithm: rsa_pkcs1_sha256 and Supported group: secp256r1)

619        3.  ECDHE-ECDSA-AES256-GCM-SHA384 with Secp521 (Signature Hash
620            Algorithm: ecdsa_secp384r1_sha384 and Supported group: secp521r1)

622        4.  ECDHE-RSA-AES256-GCM-SHA384 with RSA 4096 (Signature Hash
623            Algorithm: rsa_pkcs1_sha384 and Supported group: secp256r1)

625        Note: The above ciphers and keys were those commonly used enterprise
626        grade encryption cipher suites for TLS 1.2.  It is recognized that
627        these will evolve over time.  Individual certification bodies SHOULD
628        use ciphers and keys that reflect evolving use cases.  These choices
629        MUST be documented in the resulting test reports with detailed
630        information on the ciphers and keys used along with reasons for the
631        choices.

633        [RFC8446] defines the following cipher suites for use with TLS 1.3.

635        1.  TLS_AES_128_GCM_SHA256

637        2.  TLS_AES_256_GCM_SHA384

639        3.  TLS_CHACHA20_POLY1305_SHA256

641        4.  TLS_AES_128_CCM_SHA256

643        5.  TLS_AES_128_CCM_8_SHA256
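
[comment] Illustrative only (Python ssl module, hypothetical server name and CA
file): how a test client could be pinned to TLS 1.2 and one of the RECOMMENDED
cipher suites above, with SNI sent on connection setup; resumption is avoided by
using a fresh context per emulated connection:

    import socket, ssl

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    # cipher 2 from the list: ECDHE-RSA-AES128-GCM-SHA256 (RSA 2048 server key)
    ctx.set_ciphers("ECDHE-RSA-AES128-GCM-SHA256")
    ctx.load_verify_locations("testbed-ca.pem")          # hypothetical test CA
    raw = socket.create_connection(("server1.example.test", 443))
    tls = ctx.wrap_socket(raw, server_hostname="server1.example.test")  # sends SNI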

645     4.3.2.  Backend Server Configuration

647        This section specifies which parameters should be considered while
648        configuring emulated backend servers using test equipment.

650     4.3.2.1.  TCP Stack Attributes

652        The TCP stack on the server side SHOULD be configured similar to the
653        client side configuration described in Section 4.3.1.1.  In addition,
654        server initial congestion window MUST NOT exceed 10 times the MSS.
655        Delayed ACKs are permitted and the maximum server delayed ACK MUST
656        NOT exceed 10 times the MSS before a forced ACK.

658     4.3.2.2.  Server Endpoint IP Addressing

660        The sum of the server IP space SHOULD contain the following
661        attributes.

663        *  The server IP blocks SHOULD consist of unique, discontinuous
664           static address blocks with one IP per server Fully Qualified
665           Domain Name (FQDN) endpoint per test port.

[minor] The "per FQDN per test port" is likely underspecified/confusing.
How would you recommend to configure the testbed if the same FQDN may be
reachable across more than one DUT server port and the DUT is doing load
balancing ? If that is not supposed to be considered, then it seems as if every
FQDN is supposed to be reachable across only one DUT port, but then the
sentence ikely should just say "per FQDN" (without the "per test port
qualification"). Not 100% sure...

[minor] Especially for IPv4, there is obviously a big trend in data centers to
save IPv4 address space by using SNI. Therefore a realistic scenario would be
to have more than one FQDN per IPv4 address, maybe as high as 10:1 (guesswork).
In any case I think it is prudent to include testing of such SNI overloading
of IP addresses because it can likely impact performance (demux of processing
state not solely based on the 5-tuple).

667        *  A default gateway is permitted.  The DSCP (differentiated services

[minor] Again wondering why the default gateway mention adds value to the doc.

667        *  A default gateway is permitted.  The DSCP (differentiated services
668           code point) marking is set to DF (Default Forwarding) '000000' on
669           IPv4 Type of Service (ToS) field and IPv6 traffic class field.

671        *  The server IP addresses SHOULD be distributed between IPv4 and
672           IPv6 with a ratio identical to the clients distribution ratio.

674        Note: The IANA has assigned IP address range for the testing purpose
675        as described in Section 8.  If the test scenario requires more IP
676        addresses or subnets than the IANA assigned, this document recommends
677        using non routable Private IPv4 address ranges or Unique Local
678        Address (ULA) IPv6 address ranges for the testing.

[minor] Same note as in the client section about moving these addressing
recommendations out.

680     4.3.2.3.  HTTP / HTTPS Server Pool Endpoint Attributes

682        The server pool for HTTP SHOULD listen on TCP port 80 and emulate the
683        same HTTP version (HTTP 1.1 or HTTP/2 or HTTP/3) and settings chosen
684        by the client (emulated web browser).  The Server MUST advertise
685        server type in the Server response header [RFC7230].  For HTTPS
686        server, TLS 1.2 or higher MUST be used with a maximum record size of
687        16 KByte and MUST NOT use ticket resumption or session ID reuse.  The
688        server SHOULD listen on TCP port 443 for HTTP version 1.1 and 2.  For
689        HTTP/3 (HTTP over QUIC) the server SHOULD listen on UDP 443.  The
690        server SHALL serve a certificate to the client.  The HTTPS server
691        MUST check host SNI information with the FQDN if SNI is in use.
692        Cipher suite and key size on the server side MUST be configured
693        similar to the client side configuration described in
694        Section 4.3.1.3.

696     4.3.3.  Traffic Flow Definition

698        This section describes the traffic pattern between client and server
699        endpoints.  At the beginning of the test, the server endpoint
700        initializes and will be ready to accept connection states including
701        initialization of the TCP stack as well as bound HTTP and HTTPS
702        servers.  When a client endpoint is needed, it will initialize and be
703        given attributes such as a MAC and IP address.  The behavior of the
704        client is to sweep through the given server IP space, generating a
705        recognizable service by the DUT.  Sequential and pseudorandom sweep
706        methods are acceptable.  The method used MUST be stated in the final
707        report.  Thus, a balanced mesh between client endpoints and server
708        endpoints will be generated in a client IP and port to server IP and
709        port combination.  Each client endpoint performs the same actions as
710        other endpoints, with the difference being the source IP of the
711        client endpoint and the target server IP pool.  The client MUST use
712        the server IP address or FQDN in the host header [RFC7230].

[minor] Given the prevalence of SNI-centric server selection, I would suggest
changing server IP to server FQDN and noting that the server IP is simply
derived from the server FQDN. Likewise, the server port is derived from the
server protocol, which seems to be just HTTP or HTTPS, so it is unclear to me
where we would get ports different from 80 and 443 (maybe that is mentioned
later). In other words, the server port may not be relevant to mention.

714     4.3.3.1.  Description of Intra-Client Behavior

716        Client endpoints are independent of other clients that are
717        concurrently executing.  When a client endpoint initiates traffic,
718        this section describes how the client steps through different
719        services.  Once the test is initialized, the client endpoints
720        randomly hold (perform no operation) for a few milliseconds for
721        better randomization of the start of client traffic.  Each client
722        will either open a new TCP connection or connect to a TCP persistence
723        stack still open to that specific server.  At any point that the
724        traffic profile may require encryption, a TLS encryption tunnel will
725        form presenting the URL or IP address request to the server.  If
726        using SNI, the server MUST then perform an SNI name check with the
727        proposed FQDN compared to the domain embedded in the certificate.
728        Only when correct, will the server process the HTTPS response object.
729        The initial response object to the server is based on benchmarking
730        tests described in Section 7.  Multiple additional sub-URLs (response
731        objects on the service page) MAY be requested simultaneously.  This
732        MAY be to the same server IP as the initial URL.  Each sub-object
733        will also use a canonical FQDN and URL path, as observed in the
734        traffic mix used.

[minor] This may be necessary to keep the configuration complexity at bay,
but in practice each particular IP client will likely exhibit quite different
traffic profiles. One may continuously request HTTP video segments when
streaming video. Another one may continuously do WebRTC (Zoom), and the like.
By having every client randomly do all the services (this is what I figure from
the above description), you forgo the important performance aspect of the
"worst hit client" if the DUT exhibits specific issues with specific services
(false filtering, performance degradation, etc.). IMHO it would be great if
test equipment could create different client traffic profiles by segmenting
the possible application space into groups and then assigning new clients
randomly to groups. Besides making it easier to find performance issues, it
would also yield more real-world performance numbers, which might be higher.
For example, in a multi-core CPU based DUT there may be heuristics assigning
different clients' traffic to different CPU cores, so that the L1..L3 caches
of a CPU core can be kept better focused on the code space for a particular
type of client inspection (just guessing).

736     4.3.4.  Traffic Load Profile

738        The loading of traffic is described in this section.  The loading of
739        a traffic load profile has five phases: Init, ramp up, sustain, ramp
740        down, and collection.

742        1.  Init phase: Testbed devices including the client and server
743            endpoints should negotiate layer 2-3 connectivity such as MAC
744            learning and ARP.  Only after successful MAC learning or ARP/ND
745            resolution SHALL the test iteration move to the next phase.  No
746            measurements are made in this phase.  The minimum RECOMMENDED
747            time for Init phase is 5 seconds.  During this phase, the
748            emulated clients SHOULD NOT initiate any sessions with the DUT/
749            SUT, in contrast, the emulated servers should be ready to accept
750            requests from DUT/SUT or from emulated clients.

752        2.  Ramp up phase: The test equipment SHOULD start to generate the
753            test traffic.  It SHOULD use a set of the approximate number of
754            unique client IP addresses to generate traffic.  The traffic
755            SHOULD ramp up from zero to desired target objective.  The target
756            objective is defined for each benchmarking test.  The duration
757            for the ramp up phase MUST be configured long enough that the
758            test equipment does not overwhelm the DUT/SUTs stated performance
759            metrics defined in Section 6.3 namely, TCP Connections Per
760            Second, Inspected Throughput, Concurrent TCP Connections, and
761            Application Transactions Per Second.  No measurements are made in
762            this phase.

764        3.  Sustain phase: Starts when all required clients are active and
765            operating at their desired load condition.  In the sustain phase,
766            the test equipment SHOULD continue generating traffic to constant
767            target value for a constant number of active clients.  The
768            minimum RECOMMENDED time duration for sustain phase is 300
769            seconds.  This is the phase where measurements occur.  The test
770            equipment SHOULD measure and record statistics continuously.  The
771            sampling interval for collecting the raw results and calculating
772            the statistics SHOULD be less than 2 seconds.

774        4.  Ramp down phase: No new connections are established, and no
775            measurements are made.  The time duration for ramp up and ramp
776            down phase SHOULD be the same.

778        5.  Collection phase: The last phase is administrative and will occur
779            when the test equipment merges and collates the report data.
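
[comment] Illustrative only: one possible encoding of the five-phase load
profile with the minimum RECOMMENDED durations from this section (the ramp
up/down duration is the tester's choice; 60 s is an assumption here):

    RAMP = 60  # seconds, assumed; MUST be long enough not to overwhelm the DUT/SUT
    phases = [
        ("init",       5,    "no client traffic, servers ready"),
        ("ramp_up",    RAMP, "0 -> target objective, no measurements"),
        ("sustain",    300,  "measurements taken, sampling interval < 2 s"),
        ("ramp_down",  RAMP, "no new connections, no measurements"),
        ("collection", 0,    "administrative: merge and collate report data"),
    ]
    for name, seconds, note in phases:
        print(f"{name:10s} {seconds:4d} s  {note}")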

781     5.  Testbed Considerations

783        This section describes steps for a reference test (pre-test) that
784        control the test environment including test equipment, focusing on
785        physical and virtualized environments and as well as test equipments.
786        Below are the RECOMMENDED steps for the reference test.

788        1.  Perform the reference test either by configuring the DUT/SUT in
789            the most trivial setup (fast forwarding) or without presence of

[nit] Define/explain "fast forwarding" or provide a reference for it.

790            the DUT/SUT.

[minor] Is the DUT/SUT assumed to operate as a router or as a transparent L2
switch? Asking because "or without presence" should (IMHO) be amended to mention
that instead of the DUT one would put a router or switch in its place,
pre-loaded with a configuration equivalent to that of the DUT but without any
security functions, just passing traffic at rates that bring the TE to its
limits.

792        2.  Generate traffic from traffic generator.  Choose a traffic
793            profile used for HTTP or HTTPS throughput performance test with
794            smallest object size.

796        3.  Ensure that any ancillary switching or routing functions added in
797            the test equipment does not limit the performance by introducing
798            network metrics such as packet loss and latency.  This is
799            specifically important for virtualized components (e.g.,
800            vSwitches, vRouters).

802        4.  Verify that the generated traffic (performance) of the test
803            equipment matches and reasonably exceeds the expected maximum
804            performance of the DUT/SUT.

806        5.  Record the network performance metrics packet loss latency
807            introduced by the test environment (without DUT/SUT).

809        6.  Assert that the testbed characteristics are stable during the
810            entire test session.  Several factors might influence stability
811            specifically, for virtualized testbeds.  For example, additional
812            workloads in a virtualized system, load balancing, and movement
813            of virtual machines during the test, or simple issues such as
814            additional heat created by high workloads leading to an emergency
815            CPU performance reduction.

[minor] Add something to test the performance of the logging system. Without
the DUT actually generating logs, this will not have been validated so far.
Maybe the TE can generate logging records? Especially burst logging from the
DUT without loss is important to verify (no packet loss of logged events).

817        The reference test SHOULD be performed before the benchmarking tests
818        (described in section 7) start.

820     6.  Reporting

[minor] I would swap sections 6 and 7, because it is problematic to read what
is to be reported without first knowing what is to be measured. For example,
when I read Section 6 first, it was not clear to me if/how you would test the
performance limits, so the report data left a lot of questions for me.

Of course, when you actually run the testbed you should have read both sections
beforehand.

822        This section describes how the benchmarking test report should be
823        formatted and presented.  It is RECOMMENDED to include two main
824        sections in the report, namely the introduction and the detailed test
825        results sections.

827     6.1.  Introduction

829        The following attributes SHOULD be present in the introduction
830        section of the test report.

[minor] I'd suggest saying here that the test report needs to include all
information sufficient for independent third-party reproduction of the test
setup, to permit third-party falsification of the test results. This includes,
but may not be limited to, the following...

832        1.  The time and date of the execution of the tests

834        2.  Summary of testbed software and hardware details
835            a.  DUT/SUT hardware/virtual configuration

837                *  This section SHOULD clearly identify the make and model of
838                   the DUT/SUT

840                *  The port interfaces, including speed and link information

842                *  If the DUT/SUT is a Virtual Network Function (VNF), host
843                   (server) hardware and software details, interface
844                   acceleration type such as DPDK and SR-IOV, used CPU cores,
845                   used RAM, resource sharing (e.g.  Pinning details and NUMA
846                   Node) configuration details, hypervisor version, virtual
847                   switch version

849                *  details of any additional hardware relevant to the DUT/SUT
850                   such as controllers

852            b.  DUT/SUT software

854                *  Operating system name

856                *  Version

858                *  Specific configuration details (if any)

[minor] Any software details necessary and sufficient to reproduce the
software setup of the DUT/SUT.

860            c.  DUT/SUT enabled features

862                *  Configured DUT/SUT features (see Table 1 and Table 2)

864                *  Attributes of the above-mentioned features

866                *  Any additional relevant information about the features

868            d.  Test equipment hardware and software

870                *  Test equipment vendor name

872                *  Hardware details including model number, interface type

874                *  Test equipment firmware and test application software
875                   version

877            e.  Key test parameters

879                *  Used cipher suites and keys

881                *  IPv4 and IPv6 traffic distribution
882                *  Number of configured ACL

884            f.  Details of application traffic mix used in the benchmarking
885                test "Throughput Performance with Application Traffic Mix"
886                (Section 7.1)

888                *  Name of applications and layer 7 protocols

890                *  Percentage of emulated traffic for each application and
891                   layer 7 protocols

893                *  Percentage of encrypted traffic and used cipher suites and
894                   keys (The RECOMMENDED ciphers and keys are defined in
895                   Section 4.3.1.3)

897                *  Used object sizes for each application and layer 7
898                   protocols

900        3.  Results Summary / Executive Summary

902            a.  Results SHOULD resemble a pyramid in how it is reported, with
903                the introduction section documenting the summary of results
904                in a prominent, easy to read block.

906     6.2.  Detailed Test Results

908        In the result section of the test report, the following attributes
909        SHOULD be present for each benchmarking test.

911        a.  KPIs MUST be documented separately for each benchmarking test.
912            The format of the KPI metrics SHOULD be presented as described in
913            Section 6.3.

915        b.  The next level of details SHOULD be graphs showing each of these
916            metrics over the duration (sustain phase) of the test.  This
917            allows the user to see the measured performance stability changes
918            over time.

920     6.3.  Benchmarks and Key Performance Indicators

922        This section lists key performance indicators (KPIs) for overall
923        benchmarking tests.  All KPIs MUST be measured during the sustain
924        phase of the traffic load profile described in Section 4.3.4.  All
925        KPIs MUST be measured from the result output of test equipment.

[minor] Elsewhere in the document I seem to remember seeing DUT self-reporting
being used. Shouldn't the self-reporting of the DUT then be vetted as well,
e.g. compared against the TE report data?

927        *  Concurrent TCP Connections
928           The aggregate number of simultaneous connections between hosts
929           across the DUT/SUT, or between hosts and the DUT/SUT (defined in
930           [RFC2647]).

[minor] Add a reference to the section in RFC 2647 where this is defined.
Also: if you refer to but do not reproduce ...

932        *  TCP Connections Per Second

934           The average number of successfully established TCP connections per
935           second between hosts across the DUT/SUT, or between hosts and the
936           DUT/SUT.  The TCP connection MUST be initiated via a TCP three-way
937           handshake (SYN, SYN/ACK, ACK).  Then the TCP session data is sent.
938           The TCP session MUST be closed via either a TCP three-way close
939           (FIN, FIN/ACK, ACK), or a TCP four-way close (FIN, ACK, FIN, ACK),
940           and MUST NOT by RST.

942        *  Application Transactions Per Second

944           The average number of successfully completed transactions per
945           second.  For a particular transaction to be considered successful,
946           all data MUST have been transferred in its entirety.  In case of
947           HTTP(S) transactions, it MUST have a valid status code (200 OK),
948           and the appropriate FIN, FIN/ACK sequence MUST have been
949           completed.

951        *  TLS Handshake Rate

953           The average number of successfully established TLS connections per
954           second between hosts across the DUT/SUT, or between hosts and the
955           DUT/SUT.

957        *  Inspected Throughput

959           The number of bits per second of examined and allowed traffic a
960           network security device is able to transmit to the correct
961           destination interface(s) in response to a specified offered load.
962           The throughput benchmarking tests defined in Section 7 SHOULD
963           measure the average Layer 2 throughput value when the DUT/SUT is
964           "inspecting" traffic.  This document recommends presenting the
965           inspected throughput value in Gbit/s rounded to two places of
966           precision with a more specific Kbit/s in parenthesis.
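
[comment] Tiny illustration only, of the recommended reporting format for
Inspected Throughput (Gbit/s to two decimal places, with Kbit/s in parenthesis):

    bits_per_second = 9_876_543_210
    print(f"{bits_per_second / 1e9:.2f} Gbit/s ({bits_per_second / 1e3:.0f} Kbit/s)")
    # -> 9.88 Gbit/s (9876543 Kbit/s)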

968        *  Time to First Byte (TTFB)

970           TTFB is the elapsed time between the start of sending the TCP SYN
971           packet from the client and the client receiving the first packet
972           of application data from the server or DUT/SUT.  The benchmarking
973           tests HTTP Transaction Latency (Section 7.4) and HTTPS Transaction
974           Latency (Section 7.8) measure the minimum, average and maximum
975           TTFB.  The value SHOULD be expressed in milliseconds.

977        *  URL Response time / Time to Last Byte (TTLB)

979           URL Response time / TTLB is the elapsed time between the start of
980           sending the TCP SYN packet from the client and the client
981           receiving the last packet of application data from the server or
982           DUT/SUT.  The benchmarking tests HTTP Transaction Latency
983           (Section 7.4) and HTTPS Transaction Latency (Section 7.8) measure
984           the minimum, average and maximum TTLB.  The value SHOULD be
985           expressed in millisecond.
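
[comment] Illustrative only: how the TTFB/TTLB benchmarks could be computed
from per-transaction timestamps (in seconds) and reported as
minimum/average/maximum in milliseconds, per the definitions above:

    def latency_stats(syn_sent, first_data, last_data):
        ttfb = [(f - s) * 1000.0 for s, f in zip(syn_sent, first_data)]
        ttlb = [(l - s) * 1000.0 for s, l in zip(syn_sent, last_data)]
        summary = lambda v: {"min": min(v), "avg": sum(v) / len(v), "max": max(v)}
        return summary(ttfb), summary(ttlb)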

[minor] Up to this point I don't think the report would include a comparison of
these KPIs between no-DUT-present and DUT-present. Is that true? How then is
the reader of the report meant to be able to vet the relative impact of the
DUT on all these metrics vs. the DUT not being present?

987     7.  Benchmarking Tests

[minor] I think it would be good to insert here a descriptive and comparative
overview of the tests in the different 7.x sections.

For example, I guess (but cannot tell from the text) that the 7.1 test is meant
to perform a throughput test for non-HTTP/HTTPS applications; otherwise, if all
the applications in 7.1 were HTTP/HTTPS, it would duplicate the results of
7.3 and 7.7, right? I am not sure, though, if/where it is written out that you
therefore want a traffic mix of only non-HTTP/HTTPS application traffic for 7.1.

If instead the customer-relevant application mix (7.1.1) does include some
percentage of HTTP/HTTPS applications, then shouldn't all the tests, even those
focusing on the HTTP/HTTPS characteristics, also always include the
non-HTTP/HTTPS application flows as a kind of "background" traffic, even if
they are not measured in the tests of a particular 7.x subsection?

[minor] Section 7 is a lot of work to get right. I observe that there is a lot
of procedural replication across the steps. It would be easier to read if all
that duplication was removed and described once - such as the
initial/max/iterative step description. But I can understand how much work it
might be to extract only the differences for each 7.x section and describe
only those differences there.

989     7.1.  Throughput Performance with Application Traffic Mix

991     7.1.1.  Objective

993        Using a relevant application traffic mix, determine the sustainable
994        inspected throughput supported by the DUT/SUT.

996        Based on the test customer's specific use case, testers can choose
997        the relevant application traffic mix for this test.  The details
998        about the traffic mix MUST be documented in the report.  At least the
999        following traffic mix details MUST be documented and reported
1000       together with the test results:

1002          Name of applications and layer 7 protocols

1004          Percentage of emulated traffic for each application and layer 7
1005          protocol

1007          Percentage of encrypted traffic and used cipher suites and keys
1008          (The RECOMMENDED ciphers and keys are defined in Section 4.3.1.3.)

1010          Used object sizes for each application and layer 7 protocols

1012    7.1.2.  Test Setup

1014       Testbed setup MUST be configured as defined in Section 4.  Any
1015       benchmarking test specific testbed configuration changes MUST be
1016       documented.

1018    7.1.3.  Test Parameters

1020       In this section, the benchmarking test specific parameters SHOULD be
1021       defined.

1023    7.1.3.1.  DUT/SUT Configuration Parameters

1025       DUT/SUT parameters MUST conform to the requirements defined in
1026       Section 4.2.  Any configuration changes for this specific
1027       benchmarking test MUST be documented.  In case the DUT/SUT is
1028       configured without SSL inspection, the test report MUST explain the
1029       implications of this to the relevant application traffic mix
1030       encrypted traffic.

[nit] /SSL inspection/SSL Inspection/ - capitalized in all other places in the
doc.

[minor] I am not quite familiar with the details, so I hope a reader knows what
"MUST explain the implications" means.

[minor] What is the equivalent for TLS (inspection), and why is it not equally
mentioned ?

1032    7.1.3.2.  Test Equipment Configuration Parameters

1034       Test equipment configuration parameters MUST conform to the
1035       requirements defined in Section 4.3.  The following parameters MUST
1036       be documented for this benchmarking test:

1038          Client IP address range defined in Section 4.3.1.2

1040          Server IP address range defined in Section 4.3.2.2

1042          Traffic distribution ratio between IPv4 and IPv6 defined in
1043          Section 4.3.1.2

1045          Target inspected throughput: Aggregated line rate of interface(s)
1046          used in the DUT/SUT or the value defined based on requirement for
1047          a specific deployment scenario

[minor] Maybe add: or based on DUT-specified performance limits (the DUT may
not always provide "line rate" throughput, so the ultimate test would be to see
if/how much of the vendor-promised performance is reachable).

1049          Initial throughput: 10% of the "Target inspected throughput" Note:
1050          Initial throughput is not a KPI to report.  This value is
1051          configured on the traffic generator and used to perform Step 1:
1052          "Test Initialization and Qualification" described under the
1053          Section 7.1.4.

1055          One of the ciphers and keys defined in Section 4.3.1.3 are
1056          RECOMMENDED to use for this benchmarking test.

1058    7.1.3.3.  Traffic Profile

1060       Traffic profile: This test MUST be run with a relevant application
1061       traffic mix profile.

1063    7.1.3.4.  Test Results Validation Criteria

1065       The following criteria are the test results validation criteria.  The
1066       test results validation criteria MUST be monitored during the whole
1067       sustain phase of the traffic load profile.

1069       a.  Number of failed application transactions (receiving any HTTP
1070           response code other than 200 OK) MUST be less than 0.001% (1 out
1071           of 100,000 transactions) of total attempted transactions.

[minor] So this is the right number, as opposed to the 0.01% in A.4...
If you don't intend to fix A.4 (as requested there), please explain the reason
for the difference.

1073       b.  Number of Terminated TCP connections due to unexpected TCP RST
1074           sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
1075           connections) of total initiated TCP connections.
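
[comment] Illustrative only: the two 0.001% criteria expressed as a check a
test harness could run over the sustain-phase counters:

    def meets_validation_criteria(failed_transactions, total_transactions,
                                  unexpected_rsts, total_tcp_connections):
        # a. non-200-OK transactions below 0.001% of attempted transactions
        # b. DUT/SUT-sent unexpected RSTs below 0.001% of initiated connections
        return (failed_transactions < 0.00001 * total_transactions and
                unexpected_rsts < 0.00001 * total_tcp_connections)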

1077    7.1.3.5.  Measurement

1079       Following KPI metrics MUST be reported for this benchmarking test:

1081       Mandatory KPIs (benchmarks): Inspected Throughput, TTFB (minimum,
1082       average, and maximum), TTLB (minimum, average, and maximum) and
1083       Application Transactions Per Second

1085       Note: TTLB MUST be reported along with the object size used in the
1086       traffic profile.

1088       Optional KPIs: TCP Connections Per Second and TLS Handshake Rate

[minor] I would prefer TCP Connections Per Second to be mandatory too. That
makes it easier to communicate test data with lower-layer folks. For example,
network-layer equipment often has per-5-tuple flow state, also with
build/churn-rate limits, so to match a security SUT with the other networking
equipment this TCP connection rate is quite important.

1090    7.1.4.  Test Procedures and Expected Results

1092       The test procedures are designed to measure the inspected throughput
1093       performance of the DUT/SUT at the sustaining period of traffic load
1094       profile.  The test procedure consists of three major steps: Step 1
1095       ensures the DUT/SUT is able to reach the performance value (initial
1096       throughput) and meets the test results validation criteria when it
1097       was very minimally utilized.  Step 2 determines the DUT/SUT is able
1098       to reach the target performance value within the test results
1099       validation criteria.  Step 3 determines the maximum achievable
1100       performance value within the test results validation criteria.

1102       This test procedure MAY be repeated multiple times with different IP
1103       types: IPv4 only, IPv6 only, and IPv4 and IPv6 mixed traffic
1104       distribution.

1106    7.1.4.1.  Step 1: Test Initialization and Qualification

1108       Verify the link status of all connected physical interfaces.  All
1109       interfaces are expected to be in "UP" status.

1111       Configure traffic load profile of the test equipment to generate test
1112       traffic at the "Initial throughput" rate as described in
1113       Section 7.1.3.2.  The test equipment SHOULD follow the traffic load
1114       profile definition as described in Section 4.3.4.  The DUT/SUT SHOULD
1115       reach the "Initial throughput" during the sustain phase.  Measure all
1116       KPI as defined in Section 7.1.3.5.  The measured KPIs during the
1117       sustain phase MUST meet all the test results validation criteria
1118       defined in Section 7.1.3.4.

1120       If the KPI metrics do not meet the test results validation criteria,
1121       the test procedure MUST NOT be continued to step 2.

1123    7.1.4.2.  Step 2: Test Run with Target Objective

1125       Configure test equipment to generate traffic at the "Target inspected
1126       throughput" rate defined in Section 7.1.3.2.  The test equipment
1127       SHOULD follow the traffic load profile definition as described in
1128       Section 4.3.4.  The test equipment SHOULD start to measure and record
1129       all specified KPIs.  Continue the test until all traffic profile
1130       phases are completed.

1132       Within the test results validation criteria, the DUT/SUT is expected
1133       to reach the desired value of the target objective ("Target inspected
1134       throughput") in the sustain phase.  Follow step 3, if the measured
1135       value does not meet the target value or does not fulfill the test
1136       results validation criteria.

1138    7.1.4.3.  Step 3: Test Iteration

1140       Determine the achievable average inspected throughput within the test
1141       results validation criteria.  Final test iteration MUST be performed
1142       for the test duration defined in Section 4.3.4.

1144    7.2.  TCP/HTTP Connections Per Second

1146    7.2.1.  Objective

1148       Using HTTP traffic, determine the sustainable TCP connection
1149       establishment rate supported by the DUT/SUT under different
1150       throughput load conditions.

1152       To measure connections per second, test iterations MUST use different
1153       fixed HTTP response object sizes (the different load conditions)
1154       defined in Section 7.2.3.2.

1156    7.2.2.  Test Setup

1158       Testbed setup SHOULD be configured as defined in Section 4.  Any
1159       specific testbed configuration changes (number of interfaces and
1160       interface type, etc.)  MUST be documented.

1162    7.2.3.  Test Parameters

1164       In this section, benchmarking test specific parameters SHOULD be
1165       defined.

1167    7.2.3.1.  DUT/SUT Configuration Parameters

1169       DUT/SUT parameters MUST conform to the requirements defined in
1170       Section 4.2.  Any configuration changes for this specific
1171       benchmarking test MUST be documented.

1173    7.2.3.2.  Test Equipment Configuration Parameters

1175       Test equipment configuration parameters MUST conform to the
1176       requirements defined in Section 4.3.  The following parameters MUST
1177       be documented for this benchmarking test:

1179       Client IP address range defined in Section 4.3.1.2

1181       Server IP address range defined in Section 4.3.2.2

1183       Traffic distribution ratio between IPv4 and IPv6 defined in
1184       Section 4.3.1.2

1186       Target connections per second: Initial value from product datasheet
1187       or the value defined based on requirement for a specific deployment
1188       scenario

1190       Initial connections per second: 10% of "Target connections per
1191       second" (Note: Initial connections per second is not a KPI to report.
1192       This value is configured on the traffic generator and used to perform
1193       the Step1: "Test Initialization and Qualification" described under
1194       the Section 7.2.4.

1196       The client SHOULD negotiate HTTP and close the connection with FIN
1197       immediately after completion of one transaction.  In each test
1198       iteration, client MUST send GET request requesting a fixed HTTP
1199       response object size.

1201       The RECOMMENDED response object sizes are 1, 2, 4, 16, and 64 KByte.

1203    7.2.3.3.  Test Results Validation Criteria

1205       The following criteria are the test results validation criteria.  The
1206       Test results validation criteria MUST be monitored during the whole
1207       sustain phase of the traffic load profile.

1209       a.  Number of failed application transactions (receiving any HTTP
1210           response code other than 200 OK) MUST be less than 0.001% (1 out
1211           of 100,000 transactions) of total attempted transactions.

1213       b.  Number of terminated TCP connections due to unexpected TCP RST
1214           sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
1215           connections) of total initiated TCP connections.

1217       c.  During the sustain phase, traffic SHOULD be forwarded at a
1218           constant rate (considered as a constant rate if any deviation of
1219           traffic forwarding rate is less than 5%).

1221       d.  Concurrent TCP connections MUST be constant during steady state
1222           and any deviation of concurrent TCP connections SHOULD be less
1223           than 10%. This confirms the DUT opens and closes TCP connections
1224           at approximately the same rate.

1226    7.2.3.4.  Measurement

1228       TCP Connections Per Second MUST be reported for each test iteration
1229       (for each object size).

[minor] Add variance or min/max rates to the report, in case the problem
described in point d above (line 1221) does exist?

1231    7.2.4.  Test Procedures and Expected Results

1233       The test procedure is designed to measure the TCP connections per
1234       second rate of the DUT/SUT at the sustaining period of the traffic
1235       load profile.  The test procedure consists of three major steps: Step
1236       1 ensures the DUT/SUT is able to reach the performance value (Initial
1237       connections per second) and meets the test results validation
1238       criteria when it was very minimally utilized.  Step 2 determines the
1239       DUT/SUT is able to reach the target performance value within the test
1240       results validation criteria.  Step 3 determines the maximum
1241       achievable performance value within the test results validation
1242       criteria.

1244       This test procedure MAY be repeated multiple times with different IP
1245       types: IPv4 only, IPv6 only, and IPv4 and IPv6 mixed traffic
1246       distribution.

1248    7.2.4.1.  Step 1: Test Initialization and Qualification

1250       Verify the link status of all connected physical interfaces.  All
1251       interfaces are expected to be in "UP" status.

1253       Configure the traffic load profile of the test equipment to establish
1254       "Initial connections per second" as defined in Section 7.2.3.2.  The
1255       traffic load profile SHOULD be defined as described in Section 4.3.4.

1257       The DUT/SUT SHOULD reach the "Initial connections per second" before
1258       the sustain phase.  The measured KPIs during the sustain phase MUST
1259       meet all the test results validation criteria defined in
1260       Section 7.2.3.3.

1262       If the KPI metrics do not meet the test results validation criteria,
1263       the test procedure MUST NOT continue to "Step 2".

1265    7.2.4.2.  Step 2: Test Run with Target Objective

1267       Configure test equipment to establish the target objective ("Target
1268       connections per second") defined in Section 7.2.3.2.  The test
1269       equipment SHOULD follow the traffic load profile definition as
1270       described in Section 4.3.4.

1272       During the ramp up and sustain phase of each test iteration, other
1273       KPIs such as inspected throughput, concurrent TCP connections and
1274       application transactions per second MUST NOT reach the maximum value
1275       the DUT/SUT can support.  The test results for specific test
1276       iterations SHOULD NOT be reported, if the above-mentioned KPI
1277       (especially inspected throughput) reaches the maximum value.
1278       (Example: If the test iteration with 64 KByte of HTTP response object
1279       size reached the maximum inspected throughput limitation of the DUT/
1280       SUT, the test iteration MAY be interrupted and the result for 64
1281       KByte SHOULD NOT be reported.)

1283       The test equipment SHOULD start to measure and record all specified
1284       KPIs.  Continue the test until all traffic profile phases are
1285       completed.

1287       Within the test results validation criteria, the DUT/SUT is expected
1288       to reach the desired value of the target objective ("Target
1289       connections per second") in the sustain phase.  Follow step 3, if the
1290       measured value does not meet the target value or does not fulfill the
1291       test results validation criteria.

1293    7.2.4.3.  Step 3: Test Iteration

1295       Determine the achievable TCP connections per second within the test
1296       results validation criteria.

1298    7.3.  HTTP Throughput

1300    7.3.1.  Objective

1302       Determine the sustainable inspected throughput of the DUT/SUT for
1303       HTTP transactions varying the HTTP response object size.

[nit] At a high level, what is the difference between 7.2 and 7.3? Some more
explanation would be useful. One interpretation I came up with is that 7.2
measures the performance of, e.g., HTTP connections where each connection
performs a single GET, and 7.3 measures long-lived HTTP connections in which a
high rate of HTTP GETs is performed (so as to differentiate transactions at the
TCP+HTTP level (7.2) from those happening only at the HTTP level (7.3)). If
that is a lucky guess, it might help other similarly guessing readers to write
this out more explicitly.

1305    7.3.2.  Test Setup

1307       Testbed setup SHOULD be configured as defined in Section 4.  Any
1308       specific testbed configuration changes (number of interfaces and
1309       interface type, etc.)  MUST be documented.

1311    7.3.3.  Test Parameters

1313       In this section, benchmarking test specific parameters SHOULD be
1314       defined.

1316    7.3.3.1.  DUT/SUT Configuration Parameters

1318       DUT/SUT parameters MUST conform to the requirements defined in
1319       Section 4.2.  Any configuration changes for this specific
1320       benchmarking test MUST be documented.

1322    7.3.3.2.  Test Equipment Configuration Parameters

1324       Test equipment configuration parameters MUST conform to the
1325       requirements defined in Section 4.3.  The following parameters MUST
1326       be documented for this benchmarking test:

1328       Client IP address range defined in Section 4.3.1.2

1330       Server IP address range defined in Section 4.3.2.2

1332       Traffic distribution ratio between IPv4 and IPv6 defined in
1333       Section 4.3.1.2

1335       Target inspected throughput: Aggregated line rate of interface(s)
1336       used in the DUT/SUT or the value defined based on requirement for a
1337       specific deployment scenario
1338       Initial throughput: 10% of "Target inspected throughput" Note:
1339       Initial throughput is not a KPI to report.  This value is configured
1340       on the traffic generator and used to perform Step 1: "Test
1341       Initialization and Qualification" described under Section 7.3.4.

1343       Number of HTTP response object requests (transactions) per
1344       connection: 10

1346       RECOMMENDED HTTP response object size: 1, 16, 64, 256 KByte, and
1347       mixed objects defined in Table 4.

1349               +=====================+============================+
1350               | Object size (KByte) | Number of requests/ Weight |
1351               +=====================+============================+
1352               | 0.2                 | 1                          |
1353               +---------------------+----------------------------+
1354               | 6                   | 1                          |
1355               +---------------------+----------------------------+
1356               | 8                   | 1                          |
1357               +---------------------+----------------------------+
1358               | 9                   | 1                          |
1359               +---------------------+----------------------------+
1360               | 10                  | 1                          |
1361               +---------------------+----------------------------+
1362               | 25                  | 1                          |
1363               +---------------------+----------------------------+
1364               | 26                  | 1                          |
1365               +---------------------+----------------------------+
1366               | 35                  | 1                          |
1367               +---------------------+----------------------------+
1368               | 59                  | 1                          |
1369               +---------------------+----------------------------+
1370               | 347                 | 1                          |
1371               +---------------------+----------------------------+

1373                              Table 4: Mixed Objects

[minor] Interesting/useful data. If there were any reference to or explanation
of how these numbers were derived, that would be great to add.
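
[comment] Illustrative only: with equal weights, the mean response object size
of the Table 4 mix works out to roughly 52.5 KByte, which may help when
comparing results against the fixed-object-size iterations:

    sizes_kbyte = [0.2, 6, 8, 9, 10, 25, 26, 35, 59, 347]
    print(sum(sizes_kbyte) / len(sizes_kbyte))   # 52.52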

1375    7.3.3.3.  Test Results Validation Criteria

1377       The following criteria are the test results validation criteria.  The
1378       test results validation criteria MUST be monitored during the whole
1379       sustain phase of the traffic load profile.

1381       a.  Number of failed application transactions (receiving any HTTP
1382           response code other than 200 OK) MUST be less than 0.001% (1 out
1383           of 100,000 transactions) of attempt transactions.

1385       b.  Traffic SHOULD be forwarded at a constant rate (considered as a
1386           constant rate if any deviation of traffic forwarding rate is less
1387           than 5%).

1389       c.  Concurrent TCP connections MUST be constant during steady state
1390           and any deviation of concurrent TCP connections SHOULD be less
1391           than 10%. This confirms the DUT opens and closes TCP connections
1392           at approximately the same rate.

1394    7.3.3.4.  Measurement

1396       Inspected Throughput and HTTP Transactions per Second MUST be
1397       reported for each object size.

1399    7.3.4.  Test Procedures and Expected Results

1401       The test procedure is designed to measure HTTP throughput of the DUT/
1402       SUT.  The test procedure consists of three major steps: Step 1
1403       ensures the DUT/SUT is able to reach the performance value (Initial
1404       throughput) and meets the test results validation criteria when it
1405       was very minimal utilized.  Step 2 determines the DUT/SUT is able to
1406       reach the target performance value within the test results validation
1407       criteria.  Step 3 determines the maximum achievable performance value
1408       within the test results validation criteria.

1410       This test procedure MAY be repeated multiple times with different
1411       IPv4 and IPv6 traffic distribution and HTTP response object sizes.

1413    7.3.4.1.  Step 1: Test Initialization and Qualification

1415       Verify the link status of all connected physical interfaces.  All
1416       interfaces are expected to be in "UP" status.

1418       Configure traffic load profile of the test equipment to establish
1419       "Initial inspected throughput" as defined in Section 7.3.3.2.

1421       The traffic load profile SHOULD be defined as described in
1422       Section 4.3.4.  The DUT/SUT SHOULD reach the "Initial inspected
1423       throughput" during the sustain phase.  Measure all KPI as defined in
1424       Section 7.3.3.4.

1426       The measured KPIs during the sustain phase MUST meet the test results
1427       validation criteria "a" defined in Section 7.3.3.3.  The test results
1428       validation criteria "b" and "c" are OPTIONAL for step 1.

1430       If the KPI metrics do not meet the test results validation criteria,
1431       the test procedure MUST NOT be continued to "Step 2".

1433    7.3.4.2.  Step 2: Test Run with Target Objective

1435       Configure test equipment to establish the target objective ("Target
1436       inspected throughput") defined in Section 7.3.3.2.  The test
1437       equipment SHOULD start to measure and record all specified KPIs.
1438       Continue the test until all traffic profile phases are completed.

1440       Within the test results validation criteria, the DUT/SUT is expected
1441       to reach the desired value of the target objective in the sustain
1442       phase.  Follow step 3, if the measured value does not meet the target
1443       value or does not fulfill the test results validation criteria.

1445    7.3.4.3.  Step 3: Test Iteration

1447       Determine the achievable inspected throughput within the test results
1448       validation criteria and measure the KPI metric Transactions per
1449       Second.  Final test iteration MUST be performed for the test duration
1450       defined in Section 4.3.4.

1452    7.4.  HTTP Transaction Latency

[nit] It would be nice to have explanatory text on why 7.4 requires separate
test runs as opposed to just measuring the transaction latency as part of 7.2
and 7.3. I have not tried to compare the descriptions here in detail to figure
out the differences in test runs, but even if there are differences, why would
transaction latency not also be measured as a metric in 7.2 and 7.3?

1454    7.4.1.  Objective

1456       Using HTTP traffic, determine the HTTP transaction latency when DUT
1457       is running with sustainable HTTP transactions per second supported by
1458       the DUT/SUT under different HTTP response object sizes.

1460       Test iterations MUST be performed with different HTTP response object
1461       sizes in two different scenarios.  One with a single transaction and
1462       the other with multiple transactions within a single TCP connection.
1463       For consistency both the single and multiple transaction test MUST be
1464       configured with the same HTTP version

1466       Scenario 1: The client MUST negotiate HTTP and close the connection
1467       with FIN immediately after completion of a single transaction (GET
1468       and RESPONSE).

1470       Scenario 2: The client MUST negotiate HTTP and close the connection
1471       FIN immediately after completion of 10 transactions (GET and
1472       RESPONSE) within a single TCP connection.

1474    7.4.2.  Test Setup

1476       Testbed setup SHOULD be configured as defined in Section 4.  Any
1477       specific testbed configuration changes (number of interfaces and
1478       interface type, etc.)  MUST be documented.

1480    7.4.3.  Test Parameters

1482       In this section, benchmarking test specific parameters SHOULD be
1483       defined.

1485    7.4.3.1.  DUT/SUT Configuration Parameters

1487       DUT/SUT parameters MUST conform to the requirements defined in
1488       Section 4.2.  Any configuration changes for this specific
1489       benchmarking test MUST be documented.

1491    7.4.3.2.  Test Equipment Configuration Parameters

1493       Test equipment configuration parameters MUST conform to the
1494       requirements defined in Section 4.3.  The following parameters MUST
1495       be documented for this benchmarking test:

1497       Client IP address range defined in Section 4.3.1.2

1499       Server IP address range defined in Section 4.3.2.2

1501       Traffic distribution ratio between IPv4 and IPv6 defined in
1502       Section 4.3.1.2

1504       Target objective for scenario 1: 50% of the connections per second
1505       measured in benchmarking test TCP/HTTP Connections Per Second
1506       (Section 7.2)

1508       Target objective for scenario 2: 50% of the inspected throughput
1509       measured in benchmarking test HTTP Throughput (Section 7.3)

1511       Initial objective for scenario 1: 10% of "Target objective for
1512       scenario 1"

1514       Initial objective for scenario 2: 10% of "Target objective for
1515       scenario 2"

1517       Note: The Initial objectives are not a KPI to report.  These values
1518       are configured on the traffic generator and used to perform the
1519       Step1: "Test Initialization and Qualification" described under the
1520       Section 7.4.4.

1522       HTTP transaction per TCP connection: Test scenario 1 with single
1523       transaction and test scenario 2 with 10 transactions.

1525       HTTP with GET request requesting a single object.  The RECOMMENDED
1526       object sizes are 1, 16, and 64 KByte.  For each test iteration,
1527       client MUST request a single HTTP response object size.
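
[example] A small worked example may help readers sanity-check the derived
objectives. This is only an illustration in Python; the 400,000 connections
per second and 80 Gbit/s inputs are invented, not taken from the draft:

    # Hypothetical results from the 7.2 and 7.3 benchmarks (assumed values).
    max_connections_per_second = 400_000   # TCP/HTTP Connections Per Second (7.2)
    max_inspected_throughput_gbps = 80.0   # HTTP Throughput (7.3)

    # Scenario 1 (single transaction per connection): connection-rate driven.
    target_scenario_1 = 0.5 * max_connections_per_second    # 200,000 conn/s
    initial_scenario_1 = 0.1 * target_scenario_1            # 20,000 conn/s

    # Scenario 2 (10 transactions per connection): throughput driven.
    target_scenario_2 = 0.5 * max_inspected_throughput_gbps   # 40 Gbit/s
    initial_scenario_2 = 0.1 * target_scenario_2              # 4 Gbit/s

    print(target_scenario_1, initial_scenario_1,
          target_scenario_2, initial_scenario_2)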

1529    7.4.3.3.  Test Results Validation Criteria

1531       The following criteria are the test results validation criteria.  The
1532       Test results validation criteria MUST be monitored during the whole
1533       sustain phase of the traffic load profile.

1535       a.  Number of failed application transactions (receiving any HTTP
1536           response code other than 200 OK) MUST be less than 0.001% (1 out
1537           of 100,000 transactions) of attempt transactions.

1539       b.  Number of terminated TCP connections due to unexpected TCP RST
1540           sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
1541           connections) of total initiated TCP connections.

1543       c.  During the sustain phase, traffic SHOULD be forwarded at a
1544           constant rate (considered as a constant rate if any deviation of
1545           traffic forwarding rate is less than 5%).

1547       d.  Concurrent TCP connections MUST be constant during steady state
1548           and any deviation of concurrent TCP connections SHOULD be less
1549           than 10%. This confirms the DUT opens and closes TCP connections
1550           at approximately the same rate.

1552       e.  After ramp up the DUT MUST achieve the "Target objective" defined
1553           in Section 7.4.3.2 and remain in that state for the entire test
1554           duration (sustain phase).
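
[example] Criterion "c" (and the analogous criteria throughout section 7)
leaves "deviation of traffic forwarding rate" slightly open. A reviewer-side
reading in Python, assuming deviation means the largest excursion of the
per-interval forwarding rate from its mean over the sustain phase:

    # Forwarding rate samples per measurement interval during the sustain
    # phase, e.g. in Gbit/s - invented values.
    rate_samples = [39.8, 40.1, 40.0, 39.5, 40.3]

    mean_rate = sum(rate_samples) / len(rate_samples)
    # Assumed definition: largest relative excursion from the mean rate.
    deviation = max(abs(r - mean_rate) for r in rate_samples) / mean_rate

    constant_rate = deviation < 0.05   # "constant" if deviation is below 5%
    print(f"deviation = {deviation:.2%}, constant rate: {constant_rate}")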

1556    7.4.3.4.  Measurement

1558       TTFB (minimum, average, and maximum) and TTLB (minimum, average and
1559       maximum) MUST be reported for each object size.

1561    7.4.4.  Test Procedures and Expected Results

1563       The test procedure is designed to measure TTFB or TTLB when the DUT/
1564       SUT is operating close to 50% of its maximum achievable connections
1565       per second or inspected throughput.  The test procedure consists of
1566       two major steps: Step 1 ensures the DUT/SUT is able to reach the
1567       initial performance values and meets the test results validation
1568       criteria when it was very minimally utilized.  Step 2 measures the
1569       latency values within the test results validation criteria.

1571       This test procedure MAY be repeated multiple times with different IP
1572       types (IPv4 only, IPv6 only and IPv4 and IPv6 mixed traffic
1573       distribution), HTTP response object sizes and single and multiple
1574       transactions per connection scenarios.

1576    7.4.4.1.  Step 1: Test Initialization and Qualification

1578       Verify the link status of all connected physical interfaces.  All
1579       interfaces are expected to be in "UP" status.

1581       Configure traffic load profile of the test equipment to establish
1582       "Initial objective" as defined in Section 7.4.3.2.  The traffic load
1583       profile SHOULD be defined as described in Section 4.3.4.

1585       The DUT/SUT SHOULD reach the "Initial objective" before the sustain
1586       phase.  The measured KPIs during the sustain phase MUST meet all the
1587       test results validation criteria defined in Section 7.4.3.3.

1589       If the KPI metrics do not meet the test results validation criteria,
1590       the test procedure MUST NOT be continued to "Step 2".

1592    7.4.4.2.  Step 2: Test Run with Target Objective

1594       Configure test equipment to establish "Target objective" defined in
1595       Section 7.4.3.2.  The test equipment SHOULD follow the traffic load
1596       profile definition as described in Section 4.3.4.

1598       The test equipment SHOULD start to measure and record all specified
1599       KPIs.  Continue the test until all traffic profile phases are
1600       completed.

1602       Within the test results validation criteria, the DUT/SUT MUST reach
1603       the desired value of the target objective in the sustain phase.

1605       Measure the minimum, average, and maximum values of TTFB and TTLB.
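
[example] For the TTFB/TTLB reporting, a minimal Python sketch of the
per-object-size aggregation the test equipment would perform; the sample
values are invented and would in practice come from the traffic generator's
per-transaction timestamps:

    # Per-transaction latency samples in milliseconds (invented values).
    ttfb_ms = [1.8, 2.1, 2.0, 3.4, 1.9]   # time to first byte
    ttlb_ms = [4.9, 5.3, 5.1, 7.8, 5.0]   # time to last byte

    def summarize(samples):
        # Minimum, average, and maximum, as required by 7.4.3.4 / 7.8.3.4.
        return min(samples), sum(samples) / len(samples), max(samples)

    print("TTFB min/avg/max:", summarize(ttfb_ms))
    print("TTLB min/avg/max:", summarize(ttlb_ms))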

1607    7.5.  Concurrent TCP/HTTP Connection Capacity

[nit] Again, a summary comparison of the traffic in 7.5 vs. the prior traffic
profiles would help readers understand the benefit of these test runs. Is this
driven by a real-world requirement, or is it more of a synthetic performance
number for unrealistic HTTP connections (which would still be a useful number
IMHO, I just want to know)?

The traffic profile below is somewhat strange because it defines the rate of
GETs within a TCP connection not based on real-world application behavior, but
simply to create some rate of GETs per TCP connection over the steady state. I
guess the goal is something like "measure the maximum sustainable number of
TCP/HTTP connections, where each connection carries as little traffic as
possible and a sufficiently low number of HTTP (GET) transactions that the DUT
is loaded mostly by HTTP/TCP flow maintenance rather than by HTTP-level
inspection"?

In general, describing up front for each 7.x section the goal and design
criteria of the test runs in such high-level terms would be very beneficial for
reviewers to vet whether, and how well, the detailed description meets those
goals. Otherwise one is left puzzling over that question. In other words:
enhance the 7.x.1 objective sections with that level of detail.

1609    7.5.1.  Objective

1611       Determine the number of concurrent TCP connections that the DUT/ SUT
1612       sustains when using HTTP traffic.

1614    7.5.2.  Test Setup

1616       Testbed setup SHOULD be configured as defined in Section 4.  Any
1617       specific testbed configuration changes (number of interfaces and
1618       interface type, etc.)  MUST be documented.

1620    7.5.3.  Test Parameters

1622       In this section, benchmarking test specific parameters SHOULD be
1623       defined.

1625    7.5.3.1.  DUT/SUT Configuration Parameters

1627       DUT/SUT parameters MUST conform to the requirements defined in
1628       Section 4.2.  Any configuration changes for this specific
1629       benchmarking test MUST be documented.

1631    7.5.3.2.  Test Equipment Configuration Parameters

1633       Test equipment configuration parameters MUST conform to the
1634       requirements defined in Section 4.3.  The following parameters MUST
1635       be noted for this benchmarking test:

1637          Client IP address range defined in Section 4.3.1.2

1639          Server IP address range defined in Section 4.3.2.2

1641          Traffic distribution ratio between IPv4 and IPv6 defined in
1642          Section 4.3.1.2

1644          Target concurrent connection: Initial value from product datasheet
1645          or the value defined based on requirement for a specific
1646          deployment scenario.

1648          Initial concurrent connection: 10% of "Target concurrent
1649          connection" Note: Initial concurrent connection is not a KPI to
1650          report.  This value is configured on the traffic generator and
1651          used to perform the Step1: "Test Initialization and Qualification"
1652          described under the Section 7.5.4.

1654          Maximum connections per second during ramp up phase: 50% of
1655          maximum connections per second measured in benchmarking test TCP/
1656          HTTP Connections per second (Section 7.2)

1658          Ramp up time (in traffic load profile for "Target concurrent
1659          connection"): "Target concurrent connection" / "Maximum
1660          connections per second during ramp up phase"

1662          Ramp up time (in traffic load profile for "Initial concurrent
1663          connection"): "Initial concurrent connection" / "Maximum
1664          connections per second during ramp up phase"

1666       The client MUST negotiate HTTP and each client MAY open multiple
1667       concurrent TCP connections per server endpoint IP.

1669       Each client sends 10 GET requests requesting 1 KByte HTTP response
1670       object in the same TCP connection (10 transactions/TCP connection)
1671       and the delay (think time) between each transaction MUST be X
1672       seconds.

1674       X = ("Ramp up time" + "steady state time") /10

1676       The established connections SHOULD remain open until the ramp down
1677       phase of the test.  During the ramp down phase, all connections
1678       SHOULD be successfully closed with FIN.
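
[example] To illustrate the ramp up time and think time ("X") arithmetic
above, a small Python sketch; the 4,000,000 target connections, the
400,000 conn/s assumed to have been measured in 7.2, and the 300 s steady
state are invented numbers:

    # Assumed inputs (not from the draft).
    target_concurrent_connections = 4_000_000
    initial_concurrent_connections = 0.1 * target_concurrent_connections
    max_cps_from_7_2 = 400_000                  # measured in 7.2 (assumed)
    ramp_up_cps = 0.5 * max_cps_from_7_2        # 50% of the 7.2 result
    steady_state_time_s = 300

    # Ramp up times as defined in 7.5.3.2.
    ramp_up_target_s = target_concurrent_connections / ramp_up_cps     # 20 s
    ramp_up_initial_s = initial_concurrent_connections / ramp_up_cps   # 2 s

    # Think time X spreads the 10 GETs of a connection over ramp up plus
    # steady state (using the ramp up time of the profile being run).
    think_time_s = (ramp_up_target_s + steady_state_time_s) / 10       # 32 s

    print(ramp_up_target_s, ramp_up_initial_s, think_time_s)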

1680    7.5.3.3.  Test Results Validation Criteria

1682       The following criteria are the test results validation criteria.  The
1683       Test results validation criteria MUST be monitored during the whole
1684       sustain phase of the traffic load profile.

1686       a.  Number of failed application transactions (receiving any HTTP
1687           response code other than 200 OK) MUST be less than 0.001% (1 out
1688           of 100,000 transaction) of total attempted transactions.

1690       b.  Number of terminated TCP connections due to unexpected TCP RST
1691           sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
1692           connections) of total initiated TCP connections.

1694       c.  During the sustain phase, traffic SHOULD be forwarded at a
1695           constant rate (considered as a constant rate if any deviation of
1696           traffic forwarding rate is less than 5%).

1698    7.5.3.4.  Measurement

1700       Average Concurrent TCP Connections MUST be reported for this
1701       benchmarking test.

1703    7.5.4.  Test Procedures and Expected Results

1705       The test procedure is designed to measure the concurrent TCP
1706       connection capacity of the DUT/SUT at the sustaining period of
1707       traffic load profile.  The test procedure consists of three major
1708       steps: Step 1 ensures the DUT/SUT is able to reach the performance
1709       value (Initial concurrent connection) and meets the test results
1710       validation criteria when it was very minimally utilized.  Step 2
1711       determines the DUT/SUT is able to reach the target performance value
1712       within the test results validation criteria.  Step 3 determines the
1713       maximum achievable performance value within the test results
1714       validation criteria.

1716       This test procedure MAY be repeated multiple times with different
1717       IPv4 and IPv6 traffic distribution.

1719    7.5.4.1.  Step 1: Test Initialization and Qualification

1721       Verify the link status of all connected physical interfaces.  All
1722       interfaces are expected to be in "UP" status.

1724       Configure test equipment to establish "Initial concurrent TCP
1725       connections" defined in Section 7.5.3.2.  Except ramp up time, the
1726       traffic load profile SHOULD be defined as described in Section 4.3.4.

1728       During the sustain phase, the DUT/SUT SHOULD reach the "Initial
1729       concurrent TCP connections".  The measured KPIs during the sustain
1730       phase MUST meet all the test results validation criteria defined in
1731       Section 7.5.3.3.

1733       If the KPI metrics do not meet the test results validation criteria,
1734       the test procedure MUST NOT be continued to "Step 2".

1736    7.5.4.2.  Step 2: Test Run with Target Objective

1738       Configure test equipment to establish the target objective ("Target
1739       concurrent TCP connections").  The test equipment SHOULD follow the
1740       traffic load profile definition (except ramp up time) as described in
1741       Section 4.3.4.

1743       During the ramp up and sustain phase, the other KPIs such as
1744       inspected throughput, TCP connections per second, and application
1745       transactions per second MUST NOT reach the maximum value the DUT/SUT
1746       can support.

1748       The test equipment SHOULD start to measure and record KPIs defined in
1749       Section 7.5.3.4.  Continue the test until all traffic profile phases
1750       are completed.

1752       Within the test results validation criteria, the DUT/SUT is expected
1753       to reach the desired value of the target objective in the sustain
1754       phase.  Follow step 3, if the measured value does not meet the target
1755       value or does not fulfill the test results validation criteria.

1757    7.5.4.3.  Step 3: Test Iteration

1759       Determine the achievable concurrent TCP connections capacity within
1760       the test results validation criteria.
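
[example] The three-step procedure used here (and in most of the other
section 7 tests) could be summarized as pseudocode. How step 3 searches for
the maximum achievable value is not specified in the draft; the decreasing
objective loop below is only one possible, assumed strategy:

    def run_phase(objective):
        """Run ramp up / sustain / ramp down at the given objective and
        return (measured_value, validation_ok). Placeholder for the real
        traffic generator control - an assumed interface, not from the
        draft."""
        raise NotImplementedError

    def three_step_procedure(initial_objective, target_objective):
        # Step 1: test initialization and qualification at the initial
        # objective; if validation fails, step 2 MUST NOT be run.
        _, ok = run_phase(initial_objective)
        if not ok:
            return None

        # Step 2: single run at the target objective.
        measured, ok = run_phase(target_objective)
        if ok and measured >= target_objective:
            return target_objective

        # Step 3: iterate downwards until a run passes validation
        # (assumed search strategy).
        objective = 0.9 * target_objective
        while objective > initial_objective:
            measured, ok = run_phase(objective)
            if ok and measured >= objective:
                return objective
            objective *= 0.9
        return None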

1762    7.6.  TCP/HTTPS Connections per Second

[minor] One big performance factor that I think is neither documented nor
suggested for comparison is the cost of certificate (chain) validation for
different key-length certificates used for the TCP/HTTPS connections. The
parameters for TLS 1.2 and TLS 1.3 mentioned earlier in the document do not
cover that. I think it would be prudent to define a common Internet
minimum-complexity (fastest to process) certificate and a common
maximum-complexity certificate. The latter may simply be the case where
revocation is enabled, e.g., checking the server certificate against a
revocation list.

I mention this because server certificate verification may dominate connection
setup performance - unless you want to argue that it is irrelevant because,
due to the limited number of servers in the test, the DUT is assumed/known to
cache server certificate validation results during the ramp up phase, so it
becomes irrelevant during the steady state phase. But it would at least be
good to describe this in the text.

1763    7.6.1.  Objective

1765       Using HTTPS traffic, determine the sustainable SSL/TLS session
1766       establishment rate supported by the DUT/SUT under different
1767       throughput load conditions.

1769       Test iterations MUST include common cipher suites and key strengths
1770       as well as forward looking stronger keys.  Specific test iterations
1771       MUST include ciphers and keys defined in Section 7.6.3.2.

1773       For each cipher suite and key strengths, test iterations MUST use a
1774       single HTTPS response object size defined in Section 7.6.3.2 to
1775       measure connections per second performance under a variety of DUT/SUT
1776       security inspection load conditions.

1778    7.6.2.  Test Setup

1780       Testbed setup SHOULD be configured as defined in Section 4.  Any
1781       specific testbed configuration changes (number of interfaces and
1782       interface type, etc.)  MUST be documented.

1784    7.6.3.  Test Parameters

1786       In this section, benchmarking test specific parameters SHOULD be
1787       defined.

1789    7.6.3.1.  DUT/SUT Configuration Parameters

1791       DUT/SUT parameters MUST conform to the requirements defined in
1792       Section 4.2.  Any configuration changes for this specific
1793       benchmarking test MUST be documented.

1795    7.6.3.2.  Test Equipment Configuration Parameters

1797       Test equipment configuration parameters MUST conform to the
1798       requirements defined in Section 4.3.  The following parameters MUST
1799       be documented for this benchmarking test:

1801       Client IP address range defined in Section 4.3.1.2

1803       Server IP address range defined in Section 4.3.2.2

1805       Traffic distribution ratio between IPv4 and IPv6 defined in
1806       Section 4.3.1.2

1808       Target connections per second: Initial value from product datasheet
1809       or the value defined based on requirement for a specific deployment
1810       scenario.

1812       Initial connections per second: 10% of "Target connections per
1813       second" Note: Initial connections per second is not a KPI to report.
1814       This value is configured on the traffic generator and used to perform
1815       the Step1: "Test Initialization and Qualification" described under
1816       the Section 7.6.4.

1818       RECOMMENDED ciphers and keys defined in Section 4.3.1.3

1820       The client MUST negotiate HTTPS and close the connection with FIN
1821       immediately after completion of one transaction.  In each test
1822       iteration, client MUST send GET request requesting a fixed HTTPS
1823       response object size.  The RECOMMENDED object sizes are 1, 2, 4, 16,
1824       and 64 KByte.

1826    7.6.3.3.  Test Results Validation Criteria

1828       The following criteria are the test results validation criteria.  The
1829       test results validation criteria MUST be monitored during the whole
1830       test duration.

1832       a.  Number of failed application transactions (receiving any HTTP
1833           response code other than 200 OK) MUST be less than 0.001% (1 out
1834           of 100,000 transactions) of attempt transactions.

1836       b.  Number of terminated TCP connections due to unexpected TCP RST
1837           sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
1838           connections) of total initiated TCP connections.

1840       c.  During the sustain phase, traffic SHOULD be forwarded at a
1841           constant rate (considered as a constant rate if any deviation of
1842           traffic forwarding rate is less than 5%).

1844       d.  Concurrent TCP connections MUST be constant during steady state
1845           and any deviation of concurrent TCP connections SHOULD be less
1846           than 10%. This confirms the DUT opens and closes TCP connections
1847           at approximately the same rate.

1849    7.6.3.4.  Measurement

1851       TCP connections per second MUST be reported for each test iteration
1852       (for each object size).

1854       The KPI metric TLS Handshake Rate can be measured in the test using 1
1855       KByte object size.

1857    7.6.4.  Test Procedures and Expected Results

1859       The test procedure is designed to measure the TCP connections per
1860       second rate of the DUT/SUT at the sustaining period of traffic load
1861       profile.  The test procedure consists of three major steps: Step 1
1862       ensures the DUT/SUT is able to reach the performance value (Initial
1863       connections per second) and meets the test results validation
1864       criteria when it was very minimally utilized.  Step 2 determines the
1865       DUT/SUT is able to reach the target performance value within the test
1866       results validation criteria.  Step 3 determines the maximum
1867       achievable performance value within the test results validation
1868       criteria.

1870       This test procedure MAY be repeated multiple times with different
1871       IPv4 and IPv6 traffic distribution.

1873    7.6.4.1.  Step 1: Test Initialization and Qualification

1875       Verify the link status of all connected physical interfaces.  All
1876       interfaces are expected to be in "UP" status.

1878       Configure traffic load profile of the test equipment to establish
1879       "Initial connections per second" as defined in Section 7.6.3.2.  The
1880       traffic load profile SHOULD be defined as described in Section 4.3.4.

1882       The DUT/SUT SHOULD reach the "Initial connections per second" before
1883       the sustain phase.  The measured KPIs during the sustain phase MUST
1884       meet all the test results validation criteria defined in
1885       Section 7.6.3.3.

1887       If the KPI metrics do not meet the test results validation criteria,
1888       the test procedure MUST NOT be continued to "Step 2".

1890    7.6.4.2.  Step 2: Test Run with Target Objective

1892       Configure test equipment to establish "Target connections per second"
1893       defined in Section 7.6.3.2.  The test equipment SHOULD follow the
1894       traffic load profile definition as described in Section 4.3.4.

1896       During the ramp up and sustain phase, other KPIs such as inspected
1897       throughput, concurrent TCP connections, and application transactions
1898       per second MUST NOT reach the maximum value the DUT/SUT can support.
1899       The test results for specific test iteration SHOULD NOT be reported,
1900       if the above mentioned KPI (especially inspected throughput) reaches
1901       the maximum value.  (Example: If the test iteration with 64 KByte of
1902       HTTPS response object size reached the maximum inspected throughput
1903       limitation of the DUT, the test iteration MAY be interrupted and the
1904       result for 64 KByte SHOULD NOT be reported).

1906       The test equipment SHOULD start to measure and record all specified
1907       KPIs.  Continue the test until all traffic profile phases are
1908       completed.

1910       Within the test results validation criteria, the DUT/SUT is expected
1911       to reach the desired value of the target objective ("Target
1912       connections per second") in the sustain phase.  Follow step 3, if the
1913       measured value does not meet the target value or does not fulfill the
1914       test results validation criteria.

1916    7.6.4.3.  Step 3: Test Iteration

1918       Determine the achievable connections per second within the test
1919       results validation criteria.

1921    7.7.  HTTPS Throughput

1923    7.7.1.  Objective

1925       Determine the sustainable inspected throughput of the DUT/SUT for
1926       HTTPS transactions varying the HTTPS response object size.

1928       Test iterations MUST include common cipher suites and key strengths
1929       as well as forward looking stronger keys.  Specific test iterations
1930       MUST include the ciphers and keys defined in Section 7.7.3.2.

1932    7.7.2.  Test Setup

1934       Testbed setup SHOULD be configured as defined in Section 4.  Any
1935       specific testbed configuration changes (number of interfaces and
1936       interface type, etc.)  MUST be documented.

1938    7.7.3.  Test Parameters

1940       In this section, benchmarking test specific parameters SHOULD be
1941       defined.

1943    7.7.3.1.  DUT/SUT Configuration Parameters

1945       DUT/SUT parameters MUST conform to the requirements defined in
1946       Section 4.2.  Any configuration changes for this specific
1947       benchmarking test MUST be documented.

1949    7.7.3.2.  Test Equipment Configuration Parameters

1951       Test equipment configuration parameters MUST conform to the
1952       requirements defined in Section 4.3.  The following parameters MUST
1953       be documented for this benchmarking test:

1955       Client IP address range defined in Section 4.3.1.2

1957       Server IP address range defined in Section 4.3.2.2

1959       Traffic distribution ratio between IPv4 and IPv6 defined in
1960       Section 4.3.1.2

1962       Target inspected throughput: Aggregated line rate of interface(s)
1963       used in the DUT/SUT or the value defined based on requirement for a
1964       specific deployment scenario.

1966       Initial throughput: 10% of "Target inspected throughput" Note:
1967       Initial throughput is not a KPI to report.  This value is configured
1968       on the traffic generator and used to perform the Step1: "Test
1969       Initialization and Qualification" described under the Section 7.7.4.

1971       Number of HTTPS response object requests (transactions) per
1972       connection: 10

1974       RECOMMENDED ciphers and keys defined in Section 4.3.1.3

1976       RECOMMENDED HTTPS response object size: 1, 16, 64, 256 KByte, and
1977       mixed objects defined in Table 4 under Section 7.3.3.2.

1979    7.7.3.3.  Test Results Validation Criteria

1981       The following criteria are the test results validation criteria.  The
1982       test results validation criteria MUST be monitored during the whole
1983       sustain phase of the traffic load profile.

1985       a.  Number of failed Application transactions (receiving any HTTP
1986           response code other than 200 OK) MUST be less than 0.001% (1 out
1987           of 100,000 transactions) of attempt transactions.

1989       b.  Traffic SHOULD be forwarded at a constant rate (considered as a
1990           constant rate if any deviation of traffic forwarding rate is less
1991           than 5%).

1993       c.  Concurrent TCP connections MUST be constant during steady state
1994           and any deviation of concurrent TCP connections SHOULD be less
1995           than 10%. This confirms the DUT opens and closes TCP connections
1996           at approximately the same rate.

1998    7.7.3.4.  Measurement

2000       Inspected Throughput and HTTP Transactions per Second MUST be
2001       reported for each object size.

2003    7.7.4.  Test Procedures and Expected Results

2005       The test procedure consists of three major steps: Step 1 ensures the
2006       DUT/SUT is able to reach the performance value (Initial throughput)
2007       and meets the test results validation criteria when it was very
2008       minimally utilized.  Step 2 determines the DUT/SUT is able to reach
2009       the target performance value within the test results validation
2010       criteria.  Step 3 determines the maximum achievable performance value
2011       within the test results validation criteria.

2013       This test procedure MAY be repeated multiple times with different
2014       IPv4 and IPv6 traffic distribution and HTTPS response object sizes.

2016    7.7.4.1.  Step 1: Test Initialization and Qualification

2018       Verify the link status of all connected physical interfaces.  All
2019       interfaces are expected to be in "UP" status.

2021       Configure traffic load profile of the test equipment to establish
2022       "Initial throughput" as defined in Section 7.7.3.2.

2024       The traffic load profile SHOULD be defined as described in
2025       Section 4.3.4.  The DUT/SUT SHOULD reach the "Initial throughput"
2026       during the sustain phase.  Measure all KPI as defined in
2027       Section 7.7.3.4.

2029       The measured KPIs during the sustain phase MUST meet the test results
2030       validation criteria "a" defined in Section 7.7.3.3.  The test results
2031       validation criteria "b" and "c" are OPTIONAL for step 1.

2033       If the KPI metrics do not meet the test results validation criteria,
2034       the test procedure MUST NOT be continued to "Step 2".

2036    7.7.4.2.  Step 2: Test Run with Target Objective

2038       Configure test equipment to establish the target objective ("Target
2039       inspected throughput") defined in Section 7.7.3.2.  The test
2040       equipment SHOULD start to measure and record all specified KPIs.
2041       Continue the test until all traffic profile phases are completed.

2043       Within the test results validation criteria, the DUT/SUT is expected
2044       to reach the desired value of the target objective in the sustain
2045       phase.  Follow step 3, if the measured value does not meet the target
2046       value or does not fulfill the test results validation criteria.

2048    7.7.4.3.  Step 3: Test Iteration

2050       Determine the achievable average inspected throughput within the test
2051       results validation criteria.  Final test iteration MUST be performed
2052       for the test duration defined in Section 4.3.4.

2054    7.8.  HTTPS Transaction Latency

2056    7.8.1.  Objective

2058       Using HTTPS traffic, determine the HTTPS transaction latency when
2059       DUT/SUT is running with sustainable HTTPS transactions per second
2060       supported by the DUT/SUT under different HTTPS response object size.

2062       Scenario 1: The client MUST negotiate HTTPS and close the connection
2063       with FIN immediately after completion of a single transaction (GET
2064       and RESPONSE).

2066       Scenario 2: The client MUST negotiate HTTPS and close the connection
2067       with FIN immediately after completion of 10 transactions (GET and
2068       RESPONSE) within a single TCP connection.

2070    7.8.2.  Test Setup

2072       Testbed setup SHOULD be configured as defined in Section 4.  Any
2073       specific testbed configuration changes (number of interfaces and
2074       interface type, etc.)  MUST be documented.

2076    7.8.3.  Test Parameters

2078       In this section, benchmarking test specific parameters SHOULD be
2079       defined.

2081    7.8.3.1.  DUT/SUT Configuration Parameters

2083       DUT/SUT parameters MUST conform to the requirements defined in
2084       Section 4.2.  Any configuration changes for this specific
2085       benchmarking test MUST be documented.

2087    7.8.3.2.  Test Equipment Configuration Parameters

2089       Test equipment configuration parameters MUST conform to the
2090       requirements defined in Section 4.3.  The following parameters MUST
2091       be documented for this benchmarking test:

2093       Client IP address range defined in Section 4.3.1.2

2095       Server IP address range defined in Section 4.3.2.2
2096       Traffic distribution ratio between IPv4 and IPv6 defined in
2097       Section 4.3.1.2

2099       RECOMMENDED cipher suites and key sizes defined in Section 4.3.1.3

2101       Target objective for scenario 1: 50% of the connections per second
2102       measured in benchmarking test TCP/HTTPS Connections per second
2103       (Section 7.6)

2105       Target objective for scenario 2: 50% of the inspected throughput
2106       measured in benchmarking test HTTPS Throughput (Section 7.7)

2108       Initial objective for scenario 1: 10% of "Target objective for
2109       scenario 1"

2111       Initial objective for scenario 2: 10% of "Target objective for
2112       scenario 2"

2114       Note: The Initial objectives are not a KPI to report.  These values
2115       are configured on the traffic generator and used to perform the
2116       Step1: "Test Initialization and Qualification" described under the
2117       Section 7.8.4.

2119       HTTPS transaction per TCP connection: Test scenario 1 with single
2120       transaction and scenario 2 with 10 transactions

2122       HTTPS with GET request requesting a single object.  The RECOMMENDED
2123       object sizes are 1, 16, and 64 KByte.  For each test iteration,
2124       client MUST request a single HTTPS response object size.

2126    7.8.3.3.  Test Results Validation Criteria

2128       The following criteria are the test results validation criteria.  The
2129       Test results validation criteria MUST be monitored during the whole
2130       sustain phase of the traffic load profile.

2132       a.  Number of failed application transactions (receiving any HTTP
2133           response code other than 200 OK) MUST be less than 0.001% (1 out
2134           of 100,000 transactions) of attempt transactions.

2136       b.  Number of terminated TCP connections due to unexpected TCP RST
2137           sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
2138           connections) of total initiated TCP connections.

2140       c.  During the sustain phase, traffic SHOULD be forwarded at a
2141           constant rate (considered as a constant rate if any deviation of
2142           traffic forwarding rate is less than 5%).

2144       d.  Concurrent TCP connections MUST be constant during steady state
2145           and any deviation of concurrent TCP connections SHOULD be less
2146           than 10%. This confirms the DUT opens and closes TCP connections
2147           at approximately the same rate.

2149       e.  After ramp up the DUT/SUT MUST achieve the "Target objective"
2150           defined in the parameter Section 7.8.3.2 and remain in that state
2151           for the entire test duration (sustain phase).

2153    7.8.3.4.  Measurement

2155       TTFB (minimum, average, and maximum) and TTLB (minimum, average and
2156       maximum) MUST be reported for each object size.

2158    7.8.4.  Test Procedures and Expected Results

2160       The test procedure is designed to measure TTFB or TTLB when the DUT/
2161       SUT is operating close to 50% of its maximum achievable connections
2162       per second or inspected throughput.  The test procedure consists of
2163       two major steps: Step 1 ensures the DUT/SUT is able to reach the
2164       initial performance values and meets the test results validation
2165       criteria when it was very minimally utilized.  Step 2 measures the
2166       latency values within the test results validation criteria.

2168       This test procedure MAY be repeated multiple times with different IP
2169       types (IPv4 only, IPv6 only and IPv4 and IPv6 mixed traffic
2170       distribution), HTTPS response object sizes and single, and multiple
2171       transactions per connection scenarios.

2173    7.8.4.1.  Step 1: Test Initialization and Qualification

2175       Verify the link status of all connected physical interfaces.  All
2176       interfaces are expected to be in "UP" status.

2178       Configure traffic load profile of the test equipment to establish
2179       "Initial objective" as defined in the Section 7.8.3.2.  The traffic
2180       load profile SHOULD be defined as described in Section 4.3.4.

2182       The DUT/SUT SHOULD reach the "Initial objective" before the sustain
2183       phase.  The measured KPIs during the sustain phase MUST meet all the
2184       test results validation criteria defined in Section 7.8.3.3.

2186       If the KPI metrics do not meet the test results validation criteria,
2187       the test procedure MUST NOT be continued to "Step 2".

2189    7.8.4.2.  Step 2: Test Run with Target Objective

2191       Configure test equipment to establish "Target objective" defined in
2192       Section 7.8.3.2.  The test equipment SHOULD follow the traffic load
2193       profile definition as described in Section 4.3.4.

2195       The test equipment SHOULD start to measure and record all specified
2196       KPIs.  Continue the test until all traffic profile phases are
2197       completed.

2199       Within the test results validation criteria, the DUT/SUT MUST reach
2200       the desired value of the target objective in the sustain phase.

2202       Measure the minimum, average, and maximum values of TTFB and TTLB.

2204    7.9.  Concurrent TCP/HTTPS Connection Capacity

2206    7.9.1.  Objective

2208       Determine the number of concurrent TCP connections the DUT/SUT
2209       sustains when using HTTPS traffic.

2211    7.9.2.  Test Setup

2213       Testbed setup SHOULD be configured as defined in Section 4.  Any
2214       specific testbed configuration changes (number of interfaces and
2215       interface type, etc.)  MUST be documented.

2217    7.9.3.  Test Parameters

2219       In this section, benchmarking test specific parameters SHOULD be
2220       defined.

2222    7.9.3.1.  DUT/SUT Configuration Parameters

2224       DUT/SUT parameters MUST conform to the requirements defined in
2225       Section 4.2.  Any configuration changes for this specific
2226       benchmarking test MUST be documented.

2228    7.9.3.2.  Test Equipment Configuration Parameters

2230       Test equipment configuration parameters MUST conform to the
2231       requirements defined in Section 4.3.  The following parameters MUST
2232       be documented for this benchmarking test:

2234          Client IP address range defined in Section 4.3.1.2

2236          Server IP address range defined in Section 4.3.2.2
2237          Traffic distribution ratio between IPv4 and IPv6 defined in
2238          Section 4.3.1.2

2240          RECOMMENDED cipher suites and key sizes defined in Section 4.3.1.3

2242          Target concurrent connections: Initial value from product
2243          datasheet or the value defined based on requirement for a specific
2244          deployment scenario.

2246          Initial concurrent connections: 10% of "Target concurrent
2247          connections" Note: Initial concurrent connection is not a KPI to
2248          report.  This value is configured on the traffic generator and
2249          used to perform the Step1: "Test Initialization and Qualification"
2250          described under the Section 7.9.4.

2252          Connections per second during ramp up phase: 50% of maximum
2253          connections per second measured in benchmarking test TCP/HTTPS
2254          Connections per second (Section 7.6)

2256          Ramp up time (in traffic load profile for "Target concurrent
2257          connections"): "Target concurrent connections" / "Maximum
2258          connections per second during ramp up phase"

2260          Ramp up time (in traffic load profile for "Initial concurrent
2261          connections"): "Initial concurrent connections" / "Maximum
2262          connections per second during ramp up phase"

2264       The client MUST perform HTTPS transaction with persistence and each
2265       client can open multiple concurrent TCP connections per server
2266       endpoint IP.

2268       Each client sends 10 GET requests requesting 1 KByte HTTPS response
2269       objects in the same TCP connections (10 transactions/TCP connection)
2270       and the delay (think time) between each transaction MUST be X
2271       seconds.

2273       X = ("Ramp up time" + "steady state time") /10

2275       The established connections SHOULD remain open until the ramp down
2276       phase of the test.  During the ramp down phase, all connections
2277       SHOULD be successfully closed with FIN.

2279    7.9.3.3.  Test Results Validation Criteria

2281       The following criteria are the test results validation criteria.  The
2282       Test results validation criteria MUST be monitored during the whole
2283       sustain phase of the traffic load profile.

2285       a.  Number of failed application transactions (receiving any HTTP
2286           response code other than 200 OK) MUST be less than 0.001% (1 out
2287           of 100,000 transactions) of total attempted transactions.

2289       b.  Number of terminated TCP connections due to unexpected TCP RST
2290           sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
2291           connections) of total initiated TCP connections.

2293       c.  During the sustain phase, traffic SHOULD be forwarded at a
2294           constant rate (considered as a constant rate if any deviation of
2295           traffic forwarding rate is less than 5%).

2297    7.9.3.4.  Measurement

2299       Average Concurrent TCP Connections MUST be reported for this
2300       benchmarking test.

2302    7.9.4.  Test Procedures and Expected Results

2304       The test procedure is designed to measure the concurrent TCP
2305       connection capacity of the DUT/SUT at the sustaining period of
2306       traffic load profile.  The test procedure consists of three major
2307       steps: Step 1 ensures the DUT/SUT is able to reach the performance
2308       value (Initial concurrent connection) and meets the test results
2309       validation criteria when it was very minimally utilized.  Step 2
2310       determines the DUT/SUT is able to reach the target performance value
2311       within the test results validation criteria.  Step 3 determines the
2312       maximum achievable performance value within the test results
2313       validation criteria.

2315       This test procedure MAY be repeated multiple times with different
2316       IPv4 and IPv6 traffic distribution.

2318    7.9.4.1.  Step 1: Test Initialization and Qualification

2320       Verify the link status of all connected physical interfaces.  All
2321       interfaces are expected to be in "UP" status.

2323       Configure test equipment to establish "Initial concurrent TCP
2324       connections" defined in Section 7.9.3.2.  Except ramp up time, the
2325       traffic load profile SHOULD be defined as described in Section 4.3.4.

2327       During the sustain phase, the DUT/SUT SHOULD reach the "Initial
2328       concurrent TCP connections".  The measured KPIs during the sustain
2329       phase MUST meet the test results validation criteria "a" and "b"
2330       defined in Section 7.9.3.3.

2332       If the KPI metrics do not meet the test results validation criteria,
2333       the test procedure MUST NOT be continued to "Step 2".

2335    7.9.4.2.  Step 2: Test Run with Target Objective

2337       Configure test equipment to establish the target objective ("Target
2338       concurrent TCP connections").  The test equipment SHOULD follow the
2339       traffic load profile definition (except ramp up time) as described in
2340       Section 4.3.4.

2342       During the ramp up and sustain phase, the other KPIs such as
2343       inspected throughput, TCP connections per second, and application
2344       transactions per second MUST NOT reach to the maximum value that the
2345       DUT/SUT can support.

2347       The test equipment SHOULD start to measure and record KPIs defined in
2348       Section 7.9.3.4.  Continue the test until all traffic profile phases
2349       are completed.

2351       Within the test results validation criteria, the DUT/SUT is expected
2352       to reach the desired value of the target objective in the sustain
2353       phase.  Follow step 3, if the measured value does not meet the target
2354       value or does not fulfill the test results validation criteria.

2356    7.9.4.3.  Step 3: Test Iteration

2358       Determine the achievable concurrent TCP connections within the test
2359       results validation criteria.

[major] I would really love to see DUT power consumption numbers captured and
reported for the 10% and the maximum achieved rates of the 7.x tests (during
steady state).

Energy consumption is becoming a more and more important factor in networking,
and the high-touch operations of security devices are among the most power-
and compute-hungry operations of any network device, with wide variation
depending on how they are implemented. It is also extremely simple to just
plug a power meter into the supply line of the DUT.

This would encourage DUT vendors to reduce power consumption, something that
can often be achieved simply by selecting appropriate components (lowest-power
CPU options, FPGA-based designs, etc.).

Personally, I am of course also interested in easily derived performance
factors such as comparing power consumption at 100% load for the HTTP vs.
HTTPS case - the cost of end-to-end security, that is. If a DUT shows line
rate for both HTTP and HTTPS, but with double the power consumption for HTTPS,
that may even impact deployment decisions - even in small installations with a
small 19" rack, limited ventilation, and a limited power budget, it makes a
difference whether the draw is 100 W or 500 W.
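
[example] To make this concrete, the derived efficiency figures I have in
mind are trivial to compute once power is logged during the sustain phase;
all numbers below are invented:

    # Invented steady-state measurements for one DUT.
    http_throughput_gbps  = 80.0
    http_power_watts      = 250.0
    https_throughput_gbps = 80.0
    https_power_watts     = 480.0

    # Efficiency in Gbit/s per watt and the relative power cost of TLS
    # inspection.
    http_efficiency  = http_throughput_gbps / http_power_watts
    https_efficiency = https_throughput_gbps / https_power_watts
    tls_power_cost   = https_power_watts / http_power_watts   # ~1.9x here

    print(f"HTTP:  {http_efficiency:.3f} Gbit/s per W")
    print(f"HTTPS: {https_efficiency:.3f} Gbit/s per W")
    print(f"Power cost of HTTPS vs HTTP: {tls_power_cost:.2f}x")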

2361    8.  IANA Considerations

2363       This document makes no specific request of IANA.

2365       The IANA has assigned IPv4 and IPv6 address blocks in [RFC6890] that
2366       have been registered for special purposes.  The IPv6 address block
2367       2001:2::/48 has been allocated for the purpose of IPv6 Benchmarking
2368       [RFC5180] and the IPv4 address block 198.18.0.0/15 has been allocated
2369       for the purpose of IPv4 Benchmarking [RFC2544].  This assignment was
2370       made to minimize the chance of conflict in case a testing device were
2371       to be accidentally connected to part of the Internet.

[minor] I don't think the second paragraph belongs in an IANA Considerations
section. This section is usually reserved for actions IANA is supposed to take
for this document. I would suggest moving this paragraph to an earlier
section, perhaps even a new one such as "Addressing for tests".

2373    9.  Security Considerations

2375       The primary goal of this document is to provide benchmarking
2376       terminology and methodology for next-generation network security
2377       devices for use in a laboratory isolated test environment.  However,
2378       readers should be aware that there is some overlap between
2379       performance and security issues.  Specifically, the optimal
2380       configuration for network security device performance may not be the
2381       most secure, and vice-versa.  The cipher suites recommended in this
2382       document are for test purpose only.  The cipher suite recommendation
2383       for a real deployment is outside the scope of this document.

2385    10.  Contributors

2387       The following individuals contributed significantly to the creation
2388       of this document:

2390       Alex Samonte, Amritam Putatunda, Aria Eslambolchizadeh, Chao Guo,
2391       Chris Brown, Cory Ford, David DeSanto, Jurrie Van Den Breekel,
2392       Michelle Rhines, Mike Jack, Ryan Liles, Samaresh Nair, Stephen
2393       Goudreault, Tim Carlin, and Tim Otto.

2395    11.  Acknowledgements

2397       The authors wish to acknowledge the members of NetSecOPEN for their
2398       participation in the creation of this document.  Additionally, the
2399       following members need to be acknowledged:

2401       Anand Vijayan, Chris Marshall, Jay Lindenauer, Michael Shannon, Mike
2402       Deichman, Ryan Riese, and Toulnay Orkun.

2404    12.  References

2406    12.1.  Normative References

2408       [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2409                  Requirement Levels", BCP 14, RFC 2119,
2410                  DOI 10.17487/RFC2119, March 1997,
2411                  <https://www.rfc-editor.org/info/rfc2119>.

2413       [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2414                  2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
2415                  May 2017, <https://www.rfc-editor.org/info/rfc8174>.

2417    12.2.  Informative References

2419       [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
2420                  Network Interconnect Devices", RFC 2544,
2421                  DOI 10.17487/RFC2544, March 1999,
2422                  <https://www.rfc-editor.org/info/rfc2544>.

2424       [RFC2647]  Newman, D., "Benchmarking Terminology for Firewall
2425                  Performance", RFC 2647, DOI 10.17487/RFC2647, August 1999,
2426                  <https://www.rfc-editor.org/info/rfc2647>.

2428       [RFC3511]  Hickman, B., Newman, D., Tadjudin, S., and T. Martin,
2429                  "Benchmarking Methodology for Firewall Performance",
2430                  RFC 3511, DOI 10.17487/RFC3511, April 2003,
2431                  <https://www.rfc-editor.org/info/rfc3511>.

2433       [RFC5180]  Popoviciu, C., Hamza, A., Van de Velde, G., and D.
2434                  Dugatkin, "IPv6 Benchmarking Methodology for Network
2435                  Interconnect Devices", RFC 5180, DOI 10.17487/RFC5180, May
2436                  2008, <https://www.rfc-editor.org/info/rfc5180>.

2438       [RFC6815]  Bradner, S., Dubray, K., McQuaid, J., and A. Morton,
2439                  "Applicability Statement for RFC 2544: Use on Production
2440                  Networks Considered Harmful", RFC 6815,
2441                  DOI 10.17487/RFC6815, November 2012,
2442                  <https://www.rfc-editor.org/info/rfc6815>.

2444       [RFC6890]  Cotton, M., Vegoda, L., Bonica, R., Ed., and B. Haberman,
2445                  "Special-Purpose IP Address Registries", BCP 153,
2446                  RFC 6890, DOI 10.17487/RFC6890, April 2013,
2447                  <https://www.rfc-editor.org/info/rfc6890>.

2449       [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
2450                  Protocol (HTTP/1.1): Message Syntax and Routing",
2451                  RFC 7230, DOI 10.17487/RFC7230, June 2014,
2452                  <https://www.rfc-editor.org/info/rfc7230>.

2454       [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
2455                  Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018,
2456                  <https://www.rfc-editor.org/info/rfc8446>.

2458       [RFC9000]  Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
2459                  Multiplexed and Secure Transport", RFC 9000,
2460                  DOI 10.17487/RFC9000, May 2021,
2461                  <https://www.rfc-editor.org/info/rfc9000>.

2463    Appendix A.  Test Methodology - Security Effectiveness Evaluation

[nit] /Evaluation/Test/ - called test in the rest of this doc.

2464    A.1.  Test Objective

2466       This test methodology verifies the DUT/SUT is able to detect,

[nit] /verifies the/ verifies that the/

2467       prevent, and report the vulnerabilities.

2469       In this test, background test traffic will be generated to utilize
2470       the DUT/SUT.  In parallel, the CVEs will be sent to the DUT/SUT as
2471       encrypted and as well as clear text payload formats using a traffic
2472       generator.  The selection of the CVEs is described in Section 4.2.1.

2474       The following KPIs are measured in this test:

2476       *  Number of blocked CVEs

2478       *  Number of bypassed (nonblocked) CVEs

2480       *  Background traffic performance (verify if the background traffic
2481          is impacted while sending CVE toward DUT/SUT)

2483       *  Accuracy of DUT/SUT statistics in term of vulnerabilities
2484          reporting

2486    A.2.  Testbed Setup

2488       The same testbed MUST be used for security effectiveness test and as
2489       well as for benchmarking test cases defined in Section 7.

2491    A.3.  Test Parameters

2493       In this section, the benchmarking test specific parameters SHOULD be
2494       defined.

[nit] /SHOULD/are/ - a requirement on the authors of the document to write
desirable text in the document is not normative.

2496    A.3.1.  DUT/SUT Configuration Parameters

2498       DUT/SUT configuration parameters MUST conform to the requirements
2499       defined in Section 4.2.  The same DUT configuration MUST be used for
2500       Security effectiveness test and as well as for benchmarking test
2501       cases defined in Section 7.  The DUT/SUT MUST be configured in inline
2502       mode and all detected attack traffic MUST be dropped and the session

[nit] /detected attack traffic/detected CVE traffic/ - there is also
background traffic, which I guess should not be dropped, right?

[nit] /the session/its session/ ?

2503       SHOULD be reset

2505    A.3.2.  Test Equipment Configuration Parameters

2507       Test equipment configuration parameters MUST conform to the
2508       requirements defined in Section 4.3.  The same client and server IP
2509       ranges MUST be configured as used in the benchmarking test cases.  In
2510       addition, the following parameters MUST be documented for this
2511       benchmarking test:

2513       *  Background Traffic: 45% of maximum HTTP throughput and 45% of
2514          Maximum HTTPS throughput supported by the DUT/SUT (measured with
2515          object size 64 KByte in the benchmarking tests "HTTP(S)
2516          Throughput" defined in Section 7.3 and Section 7.7).

[nit] RECOMMENDED Background Traffic ?

2518       *  RECOMMENDED CVE traffic transmission Rate: 10 CVEs per second

2520       *  It is RECOMMENDED to generate each CVE multiple times
2521          (sequentially) at 10 CVEs per second

2523       *  Ciphers and keys for the encrypted CVE traffic MUST use the same
2524          cipher configured for HTTPS traffic related benchmarking tests
2525          (Section 7.6 - Section 7.9)

2527    A.4.  Test Results Validation Criteria

2529       The following criteria are the test results validation criteria.  The
2530       test results validation criteria MUST be monitored during the whole
2531       test duration.

[nit] /criteria are/lists/ - avoids the duplication of "criteria" in the sentence.

2533       a.  Number of failed application transaction in the background
2534           traffic MUST be less than 0.01% of attempted transactions.

2536       b.  Number of terminated TCP connections of the background traffic
2537           (due to unexpected TCP RST sent by DUT/SUT) MUST be less than
2538           0.01% of total initiated TCP connections in the background
2539           traffic.

[comment] That is quite high. Shouldn't this be at least five nines of
success, i.e., 99.999% -> 0.001% maximum error rate? I thought that is the
commonly cited minimum quality requirement for service provider products.

2541       c.  During the sustain phase, traffic SHOULD be forwarded at a
2542           constant rate (considered as a constant rate if any deviation of
2543           traffic forwarding rate is less than 5%).

[minor] This seems underspecified. I guess in the ideally behaving DUT case
all background traffic is passed unmodified and all CVE connection traffic is
dropped. So the total amount of traffic carrying CVE events must be configured
to be less than 5%?! What additional information does this 5% give me that I
do not already get from a. and b.? E.g., if some background connection fails,
the impact depends on how large that connection would have been, but it does
not seem as if I get new information from whether a large Netflix background
flow got killed and therefore 5 GByte less background traffic was observed, or
whether the same happened to a 200 KByte Amazon shopping connection. It might
just cause DUTs to do less inspection on big flows for fear of triggering
false resets on them?? Is that what we want from DUTs?

2545       d.  False positive MUST NOT occur in the background traffic.

[comment] I do not understand d. When a background transaction from a. fails,
how is that different from the transaction being false-positively classified
as a CVE - it would be dropped then, right? Or are you saying that a./b. cover
the case where background traffic receives errors from the DUT even though the
DUT does NOT recognize it as a CVE? Any example of why that would happen?

2547    A.5.  Measurement

2549       Following KPI metrics MUST be reported for this test scenario:

2551       Mandatory KPIs:

2553       *  Blocked CVEs: It SHOULD be represented in the following ways:

2555          -  Number of blocked CVEs out of total CVEs

2557          -  Percentage of blocked CVEs

2559       *  Unblocked CVEs: It SHOULD be represented in the following ways:

2561          -  Number of unblocked CVEs out of total CVEs

2563          -  Percentage of unblocked CVEs

2565       *  Background traffic behavior: It SHOULD be represented one of the
2566          followings ways:

2568          -  No impact: Considered as "no impact'" if any deviation of
2569             traffic forwarding rate is less than or equal to 5 % (constant
2570             rate)

2572          -  Minor impact: Considered as "minor impact" if any deviation of
2573             traffic forwarding rate is greater than 5% and less than or
2574             equal to10% (i.e. small spikes)

2576          -  Heavily impacted: Considered as "Heavily impacted" if any
2577             deviation of traffic forwarding rate is greater than 10% (i.e.
2578             large spikes) or reduced the background HTTP(S) throughput
2579             greater than 10%

[minor] I would prefer reporting the a./b. numbers, e.g., the percentage of
failed background connections. As mentioned before, I find the total
background traffic rate impact a rather problematic / less valuable metric.
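
[example] If the rate-impact classification is kept, it boils down to a
simple threshold mapping. A sketch, with "deviation" computed as in the
section 7 criteria and "throughput_drop" as the relative reduction of the
background HTTP(S) throughput (both assumptions on my part):

    def classify_background_impact(deviation, throughput_drop):
        # Thresholds per A.5: <=5% no impact, <=10% minor, otherwise heavy.
        if deviation > 0.10 or throughput_drop > 0.10:
            return "Heavily impacted"
        if deviation > 0.05:
            return "Minor impact"
        return "No impact"

    print(classify_background_impact(0.03, 0.01))   # -> No impact
    print(classify_background_impact(0.08, 0.02))   # -> Minor impact
    print(classify_background_impact(0.04, 0.15))   # -> Heavily impacted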

2581       *  DUT/SUT reporting accuracy: DUT/SUT MUST report all detected
2582          vulnerabilities.

2584       Optional KPIs:

2586       *  List of unblocked CVEs

[minor] I think this KPI should be a SHOULD or even a MUST. Otherwise one
cannot trace security impacts (when one does not know which CVE it is). This
is still the security effectiveness appendix, and reporting is not effective
without this.

2588    A.6.  Test Procedures and Expected Results

2590       The test procedure is designed to measure the security effectiveness
2591       of the DUT/SUT at the sustaining period of the traffic load profile.
2592       The test procedure consists of two major steps.  This test procedure
2593       MAY be repeated multiple times with different IPv4 and IPv6 traffic
2594       distribution.

2596    A.6.1.  Step 1: Background Traffic

2598       Generate background traffic at the transmission rate defined in
2599       Appendix A.3.2.

2601       The DUT/SUT MUST reach the target objective (HTTP(S) throughput) in
2602       sustain phase.  The measured KPIs during the sustain phase MUST meet
2603       all the test results validation criteria defined in Appendix A.4.

2605       If the KPI metrics do not meet the acceptance criteria, the test
2606       procedure MUST NOT be continued to "Step 2".

2608    A.6.2.  Step 2: CVE Emulation

2610       While generating background traffic (in sustain phase), send the CVE
2611       traffic as defined in the parameter section.

2613       The test equipment SHOULD start to measure and record all specified
2614       KPIs.  Continue the test until all CVEs are sent.

2616       The measured KPIs MUST meet all the test results validation criteria
2617       defined in Appendix A.4.

2619       In addition, the DUT/SUT SHOULD report the vulnerabilities correctly.

2621    Appendix B.  DUT/SUT Classification

2623       This document aims to classify the DUT/SUT in four different
2624       categories based on its maximum supported firewall throughput
2625       performance number defined in the vendor datasheet.  This
2626       classification MAY help user to determine specific configuration
2627       scale (e.g., number of ACL entries), traffic profiles, and attack
2628       traffic profiles, scaling those proportionally to DUT/SUT sizing
2629       category.

2631       The four different categories are Extra Small (XS), Small (S), Medium
2632       (M), and Large (L).  The RECOMMENDED throughput values for the
2633       following categories are:

2635       Extra Small (XS) - Supported throughput less than or equal to1Gbit/s

2637       Small (S) - Supported throughput greater than 1Gbit/s and less than
2638       or equal to 5Gbit/s

2640       Medium (M) - Supported throughput greater than 5Gbit/s and less than
2641       or equal to10Gbit/s

2643       Large (L) - Supported throughput greater than 10Gbit/s
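
[example] The category boundaries are easy to misread with the "less than
or equal to" / "greater than" wording; a small sketch of my reading of the
intended mapping:

    def dut_size_category(max_firewall_throughput_gbps):
        # Boundaries per Appendix B (datasheet firewall throughput).
        if max_firewall_throughput_gbps <= 1:
            return "Extra Small (XS)"
        if max_firewall_throughput_gbps <= 5:
            return "Small (S)"
        if max_firewall_throughput_gbps <= 10:
            return "Medium (M)"
        return "Large (L)"

    for gbps in (0.5, 1, 3, 10, 40):
        print(gbps, "Gbit/s ->", dut_size_category(gbps))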

2645    Authors' Addresses

2647       Balamuhunthan Balarajah
2648       Berlin
2649       Germany

2651       Email: bm.balarajah@gmail.com
2652       Carsten Rossenhoevel
2653       EANTC AG
2654       Salzufer 14
2655       10587 Berlin
2656       Germany

2658       Email: cross@eantc.de

2660       Brian Monkman
2661       NetSecOPEN
2662       417 Independence Court
2663       Mechanicsburg, PA 17050
2664       United States of America

2666       Email: bmonkman@netsecopen.org

EOF