Benchmarking Methodology Working Group                           N. Tran
Internet-Draft                                       Soongsil University
Intended status: Informational                                    S. Rao
Expires: 6 January 2024                             The Linux Foundation
                                                                  J. Lee
                                                                  Y. Kim
                                                     Soongsil University
                                                             5 July 2023

  Considerations for Benchmarking Network Performance in Containerized
                            Infrastructures
                 draft-dcn-bmwg-containerized-infra-11

Abstract

   Recently, the Benchmarking Methodology Working Group has extended
   its laboratory characterization from physical network functions
   (PNFs) to virtual network functions (VNFs).  Considering the trend
   of network function implementation moving from virtual
   machine-based to container-based, system configurations and
   deployment scenarios for benchmarking will partially change
   depending on how resource allocation and network technologies are
   specified for containerized VNFs.  This draft describes additional
   considerations for benchmarking network performance when network
   functions are containerized and run on general-purpose hardware.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 6 January 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Containerized Infrastructure Overview
   4.  Benchmarking Considerations
     4.1.  Networking Models
       4.1.1.  Kernel-space non-Acceleration Model
       4.1.2.  User-space Acceleration Model
       4.1.3.  eBPF Acceleration Model
       4.1.4.  Smart-NIC Acceleration Model
       4.1.5.  Model Combination
     4.2.  Resources Configuration
       4.2.1.  CPU Isolation / NUMA Affinity
       4.2.2.  Pod Hugepages
       4.2.3.  Pod CPU Cores and Memory Allocation
       4.2.4.  Service Function Chaining
       4.2.5.  Additional Considerations
   5.  Security Considerations
   6.  References
     6.1.  Informative References
   Appendix A.  Benchmarking Experience (Networking Models)
     A.1.  Benchmarking Environment
     A.2.  Benchmarking Results
   Appendix B.  Benchmarking Experience (Resources Configuration in
           Single Pod Scenario)
     B.1.  Benchmarking Environment
     B.2.  Benchmarking Results
   Appendix C.  Benchmarking Experience (Networking Model Combination
           and Resources Configuration in Multi-Pod Scenario)
     C.1.  Benchmarking Environment
     C.2.  Benchmarking Results
   Appendix D.  Change Log (to be removed by RFC Editor before
           publication)
     D.1.  Since draft-dcn-bmwg-containerized-infra-10
     D.2.  Since draft-dcn-bmwg-containerized-infra-09
     D.3.  Since draft-dcn-bmwg-containerized-infra-08
     D.4.  Since draft-dcn-bmwg-containerized-infra-07
     D.5.  Since draft-dcn-bmwg-containerized-infra-06
     D.6.  Since draft-dcn-bmwg-containerized-infra-05
     D.7.  Since draft-dcn-bmwg-containerized-infra-04
     D.8.  Since draft-dcn-bmwg-containerized-infra-03
     D.9.  Since draft-dcn-bmwg-containerized-infra-02
     D.10. Since draft-dcn-bmwg-containerized-infra-01
     D.11. Since draft-dcn-bmwg-containerized-infra-00
   Contributors
   Acknowledgments
   Authors' Addresses

1.  Introduction

   The Benchmarking Methodology Working Group (BMWG) has recently
   expanded its benchmarking scope from Physical Network Functions
   (PNFs) running on dedicated hardware systems to Network Function
   Virtualization (NFV) infrastructure and Virtualized Network
   Functions (VNFs).  [RFC8172] describes considerations for
   configuring NFV infrastructure and benchmarking metrics, and
   [RFC8204] gives guidelines for benchmarking a virtual switch that
   connects VNFs in the Open Platform for NFV (OPNFV).

   Recently, NFV infrastructure has evolved to include a lightweight
   virtualized platform called the containerized infrastructure, where
   network functions are virtualized using host operating system (OS)
   virtualization instead of the hypervisor-based hardware
   virtualization used in virtual machine (VM)-based infrastructure.
   In comparison to VMs, containers have neither separate hardware nor
   a separate kernel.  Containerized virtual network functions
   (C-VNFs) share the same kernel space on the same host, while their
   resources are logically isolated in different namespaces.  Given
   this architectural difference between container-based and VM-based
   NFV systems, containerized NFV network performance benchmarking
   might require different System Under Test (SUT) and Device Under
   Test (DUT) configurations compared with both black-box benchmarking
   and VM-based NFV infrastructure as described in [RFC8172].

   In terms of networking, a container network plugin is required to
   route traffic between containers that are isolated in different
   network namespaces.  This network plugin creates the network
   interface inside the container and connects it to the host network
   via a Linux bridge, a virtual switch (vSwitch), or directly to a
   Network Interface Card (NIC), depending on the chosen networking
   technique.  These techniques include several packet acceleration
   solutions that have recently been applied in containerized
   infrastructure to enhance container network throughput and achieve
   line-rate transmission speed.  The architectural differences
   between these acceleration solutions lead to different
   containerized networking models, which should be taken into account
   when benchmarking containerized network performance.  In addition,
   the unique architecture of container networking might call for
   additional resource configuration considerations.

   This draft aims to provide additional considerations as
   specifications to guide containerized infrastructure benchmarking,
   complementing the previous benchmarking methodology for common NFV
   infrastructure.  These considerations include an investigation of
   multiple networking models based on the usage of different packet
   acceleration techniques, and an investigation of several resource
   configurations that might impact containerized network performance,
   such as CPU isolation, hugepages, CPU core and memory allocation,
   and service function chaining.  Benchmarking experiences covering
   these considerations are also presented in this draft as
   references.  Note that, although the detailed configurations of the
   two infrastructures differ, the benchmarks and metrics defined in
   [RFC8172] and [RFC8204] can be equally applied to containerized
   infrastructure from a generic-NFV point of view; therefore,
   defining additional evaluation metrics or methodologies is out of
   scope.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [RFC2119].
   This document uses the terminology described in [RFC8172],
   [RFC8204], and [ETSI-TST-009].

3.  Containerized Infrastructure Overview

   With the proliferation and popularity of Kubernetes, in a common
   containerized infrastructure a pod is defined as the basic unit for
   orchestration and management that can host multiple containers with
   shared storage and network resources.  Kubernetes supports several
   container runtime options such as Docker, CRI-O, and containerd.
   In this document, the terms container and pod are used
   interchangeably, and Kubernetes concepts are used for general
   containerized infrastructure.

   For benchmarking of the containerized infrastructure, as mentioned
   in [RFC8172], the basic approach is to reuse existing benchmarking
   methods developed within the BMWG.  Various network function
   specifications defined in the BMWG should still be applied to
   containerized VNFs (C-VNFs) for performance comparison with
   physical network functions and VM-based VNFs.  A major distinction
   of the containerized infrastructure from the VM-based
   infrastructure is the absence of a hypervisor.  Without a
   hypervisor, all C-VNFs share the same host and kernel space.
   Storage, computing, and networking resources are logically isolated
   between containers via different namespaces.

   Container networking is provided by Container Network Interface
   (CNI) plugins.  CNI plugins create the network link between
   containers and the host's external (real) interfaces.  Different
   kinds of CNI plugins leverage different networking technologies and
   solutions to create this link.  These include bringing a host
   network device into the container namespace, or creating network
   interface pairs with one side attached to the container network
   namespace and the other attached to the host network namespace,
   either as a direct point-to-point link or via a bridge/switching
   function.  To support packet acceleration techniques such as
   user-space networking, SR-IOV, or eBPF, specific CNI plugins are
   required.  The architectural differences between these CNIs bring
   additional considerations when benchmarking network performance in
   containerized infrastructure.

4.  Benchmarking Considerations

4.1.  Networking Models

   Container networking services in Kubernetes are provided by CNI
   plugins, which describe the network configuration in JSON format.
   Initially, when a pod or container is first instantiated, it has no
   network.  A CNI plugin inserts a network interface into the
   isolated container network namespace and performs the other tasks
   necessary to connect the host and container network namespaces.  It
   then allocates an IP address to the interface and configures routes
   consistent with the IP address management plugin.  Different CNIs
   use different networking technologies to implement this connection.
   Based on the chosen networking technologies, and on how packets are
   processed/accelerated via the kernel space and/or the user space of
   the host, these CNIs can be categorized into different container
   networking models.  The choice of networking model and its
   corresponding CNIs can affect container networking performance.
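
   For illustration, a minimal CNI network configuration for the
   standard bridge plugin (one of the kernel-space models described in
   Section 4.1.1) is sketched below.  The name, bridge, and subnet
   values are placeholders, not recommendations.

     {
       "cniVersion": "0.4.0",
       "name": "example-net",
       "type": "bridge",
       "bridge": "cni0",
       "isGateway": true,
       "ipMasq": true,
       "ipam": {
         "type": "host-local",
         "subnet": "10.22.0.0/16",
         "routes": [ { "dst": "0.0.0.0/0" } ]
       }
     }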

4.1.1.  Kernel-space non-Acceleration Model

    +------------------------------------------------------------------+
    | User Space                                                       |
    |   +-----------+                                  +-----------+   |
    |   |   C-VNF   |                                  |   C-VNF   |   |
    |   | +-------+ |                                  | +-------+ |   |
    |   +-|  eth  |-+                                  +-|  eth  |-+   |
    |     +---^---+                                      +---^---+     |
    |         |                                              |         |
    |         |     +----------------------------------+     |         |
    |         |     |                                  |     |         |
    |         |     |  Networking Controller / Agent   |     |         |
    |         |     |                                  |     |         |
    |         |     +-----------------^^---------------+     |         |
    ----------|-----------------------||---------------------|----------
    |     +---v---+                   ||                 +---v---+     |
    |  +--|  veth |-------------------vv-----------------|  veth |--+  |
    |  |  +-------+     Switching/Routing Component      +-------+  |  |
    |  |         (Kernel Routing Table, OVS Kernel Datapath,        |  |
    |  |         Linux Bridge, MACVLAN/IPVLAN sub-interfaces)       |  |
    |  |                                                            |  |
    |  +-------------------------------^----------------------------+  |
    |                                  |                               |
    | Kernel Space         +-----------v----------+                    |
    +----------------------|          NIC         |--------------------+
                           +----------------------+

         Figure 1: Example architecture of the Kernel-Space non-
                            Acceleration Model

   Figure 1 shows the kernel-space non-acceleration model.  In this
   model, the virtual ethernet (veth) interface on the host side can
   be attached to different switching/routing components depending on
   the chosen CNI.  In the case of Calico, it is a direct
   point-to-point attachment to the host namespace, with the kernel
   routing table used for routing between containers.  For Flannel, it
   is the Linux bridge.  In the case of MACVLAN/IPVLAN, it is the
   corresponding virtual sub-interfaces.  For dynamic networking
   configuration, the forwarding policy can be pushed by the
   controller/agent located in the user space.  In the case of Open
   vSwitch (OVS) [OVS] configured with the kernel datapath, the first
   packet of a 'non-matching' flow can be sent to the user-space
   networking controller/agent (ovs-vswitchd) for a dynamic forwarding
   decision.

   In general, since the switching/routing component runs in kernel
   space, data packets are processed in the host kernel's network
   stack before being transferred to the C-VNF running in user space.
   Not only pod-to-external but also pod-to-pod traffic is processed
   in the kernel space.  This design makes networking performance
   worse than in the other networking models, which utilize the packet
   acceleration techniques described in the sections below.
   Kernel-space networking models are listed below:

   o Docker Network [Docker-network], Flannel Network [Flannel],
   Calico [Calico], OVS (Open vSwitch) [OVS], OVN (Open Virtual
   Network) [OVN], MACVLAN, IPVLAN
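
   As a concrete example of this model, a CNI configuration for the
   standard macvlan plugin is sketched below; it creates MACVLAN
   sub-interfaces on the host parent interface named in "master".  The
   interface name and subnet are placeholders.

     {
       "cniVersion": "0.4.0",
       "name": "macvlan-net",
       "type": "macvlan",
       "master": "eth0",
       "mode": "bridge",
       "ipam": {
         "type": "host-local",
         "subnet": "192.168.1.0/24"
       }
     }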

4.1.2.  User-space Acceleration Model

    +------------------------------------------------------------------+
    | User Space                                                       |
    |   +---------------+                          +---------------+   |
    |   |     C-VNF     |                          |     C-VNF     |   |
    |   | +-----------+ |    +-----------------+   | +-----------+ |   |
    |   | |  virtio   | |    |    Networking   |   | |  virtio   | |   |
    |   +-|  /memif   |-+    | Controller/Agent|   +-|  /memif   |-+   |
    |     +-----^-----+      +-------^^--------+     +-----^-----+     |
    |           |                    ||                    |           |
    |           |                    ||                    |           |
    |     +-----v-----+              ||              +-----v-----+     |
    |     | vhost-user|              ||              | vhost-user|     |
    |  +--|  / memif  |--------------vv--------------|  / memif  |--+  |
    |  |  +-----------+                              +-----------+  |  |
    |  |                          vSwitch                           |  |
    |  |                      +--------------+                      |  |
    |  +----------------------|      PMD     |----------------------+  |
    |                         |              |                         |
    |                         +-------^------+                         |
    ----------------------------------|---------------------------------
    |                                 |                                |
    |                                 |                                |
    |                                 |                                |
    | Kernel Space         +----------V-----------+                    |
    +----------------------|          NIC         |--------------------+
                           +----------------------+

   Figure 2: Example architecture of the User-Space Acceleration Model

   Figure 2 shows the user-space acceleration model, in which data
   packets from the physical network port bypass kernel processing and
   are delivered directly to the vSwitch running in user space.  This
   model is commonly considered a Data Plane Acceleration (DPA)
   technology, since it can achieve a higher packet processing rate
   than a kernel-space network, whose packet throughput is limited.
   To bypass the kernel and transfer packets directly to the vSwitch,
   the Data Plane Development Kit (DPDK) is essentially required.
   With DPDK, an additional driver called the Poll Mode Driver (PMD)
   is created on the vSwitch.  A PMD must be created for each NIC
   separately.  The Userspace CNI [userspace-cni] is required to
   create a user-space network interface (virtio or memif) in each
   container.  User-space vSwitch models are listed below:

   o OVS-DPDK [ovs-dpdk], VPP [vpp]
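
   A sketch of a Multus NetworkAttachmentDefinition using the
   Userspace CNI with OVS-DPDK is shown below.  The field names follow
   the plugin's published examples, but the network name, bridge name,
   and vhost-user modes are deployment-specific assumptions and should
   be checked against the plugin's documentation [userspace-cni].

     apiVersion: "k8s.cni.cncf.io/v1"
     kind: NetworkAttachmentDefinition
     metadata:
       name: userspace-ovs-net
     spec:
       config: '{
         "cniVersion": "0.3.1",
         "type": "userspace",
         "name": "userspace-ovs-net",
         "host": {
           "engine": "ovs-dpdk",
           "iftype": "vhostuser",
           "netType": "bridge",
           "vhost": { "mode": "client" },
           "bridge": { "bridgeName": "br0" }
         },
         "container": {
           "engine": "ovs-dpdk",
           "iftype": "vhostuser",
           "netType": "interface",
           "vhost": { "mode": "server" }
         }
       }'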

4.1.3.  eBPF Acceleration Model

    +------------------------------------------------------------------+
    | User Space                                                       |
    |    +----------------+                     +----------------+     |
    |    |      C-VNF     |                     |      C-VNF     |     |
    |    | +------------+ |                     | +------------+ |     |
    |    +-|     eth    |-+                     +-|     eth    |-+     |
    |      +-----^------+                         +------^-----+       |
    |            |                                       |             |
    -------------|---------------------------------------|--------------
    |      +-----v-------+                        +-----v-------+      |
    |      |  +------+   |                        |  +------+   |      |
    |      |  | eBPF |   |                        |  | eBPF |   |      |
    |      |  +------+   |                        |  +------+   |      |
    |      | veth tc hook|                        | veth tc hook|      |
    |      +-----^-------+                        +------^------+      |
    |            |                                       |             |
    |            |   +-------------------------------+   |             |
    |            |   |                               |   |             |
    |            |   |       Networking Stack        |   |             |
    |            |   |                               |   |             |
    |            |   +-------------------------------+   |             |
    |      +-----v-------+                        +-----v-------+      |
    |      |  +------+   |                        |  +------+   |      |
    |      |  | eBPF |   |                        |  | eBPF |   |      |
    |      |  +------+   |                        |  +------+   |      |
    |      | veth tc hook|                        | veth tc hook|      |
    |      +-------------+                        +-------------+      |
    |      |     OR      |                        |     OR      |      |
    |    +-|-------------|------------------------|-------------|--+   |
    |    | +-------------+                        +-------------+  |   |
    |    | |  +------+   |                        |  +------+   |  |   |
    |    | |  | eBPF |   |         NIC Driver     |  | eBPF |   |  |   |
    |    | |  +------+   |                        |  +------+   |  |   |
    |    | |  XDP hook   |                        |  XDP hook   |  |   |
    |    | +-------------+                        +-------------+  |   |
    |    +---------------------------^-----------------------------+   |
    |                                |                                 |
    | Kernel Space          +--------v--------+                        |
    +-----------------------|       NIC       |------------------------+
                            +-----------------+

     Figure 3: Example architecture of the eBPF Acceleration Model -
                                non-AFXDP

    +------------------------------------------------------------------+
    | User Space                                                       |
    |    +-----------------+                    +-----------------+    |
    |    |      C-VNF      |                    |      C-VNF      |    |
    |    | +-------------+ |  +--------------+  | +-------------+ |    |
    |    +-|     eth     |-+  |   CNDP APIs  |  +-|     eth     |-+    |
    |      +-----^-------+    +--------------+    +------^------+      |
    |            |                                       |             |
    |      +-----v-------+                        +------v------+      |
    -------|    AFXDP    |------------------------|    AFXDP    |------|
    |      |    socket   |                        |    socket   |      |
    |      +-----^-------+                        +-----^-------+      |
    |            |                                       |             |
    |            |   +-------------------------------+   |             |
    |            |   |                               |   |             |
    |            |   |       Networking Stack        |   |             |
    |            |   |                               |   |             |
    |            |   +-------------------------------+   |             |
    |            |                                       |             |
    |    +-------|---------------------------------------|--------+    |
    |    | +-----|------+                           +----|-------+|    |
    |    | |  +--v---+  |                           |  +-v----+  ||    |
    |    | |  | eBPF |  |         NIC Driver        |  | eBPF |  ||    |
    |    | |  +------+  |                           |  +------+  ||    |
    |    | |  XDP hook  |                           |  XDP hook  ||    |
    |    | +-----^------+                           +----^-------+|    |
    |    +-------|-------------------^-------------------|--------+    |
    |            |                                       |             |
    -------------|---------------------------------------|--------------
    |            +---------+                   +---------+             |
    |               +------|-------------------|----------+            |
    |               | +----v-------+       +----v-------+ |            |
    |               | |   netdev   |       |   netdev   | |            |
    |               | |     OR     |       |     OR     | |            |
    |               | | sub/virtual|       | sub/virtual| |            |
    |               | |  function  |       |  function  | |            |
    | Kernel Space  | +------------+  NIC  +------------+ |            |
    +---------------|                                     |------------+
                    +-------------------------------------+

     Figure 4: Example architecture of the eBPF Acceleration Model -
                       using an AFXDP-supported CNI

    +------------------------------------------------------------------+
    | User Space                                                       |
    |   +---------------+                          +---------------+   |
    |   |     C-VNF     |                          |     C-VNF     |   |
    |   | +-----------+ |    +-----------------+   | +-----------+ |   |
    |   | |  virtio   | |    |    Networking   |   | |  virtio   | |   |
    |   +-|  /memif   |-+    | Controller/Agent|   +-|  /memif   |-+   |
    |     +-----^-----+      +-------^^--------+     +-----^-----+     |
    |           |                    ||                    |           |
    |           |                    ||                    |           |
    |     +-----v-----+              ||              +-----v-----+     |
    |     | vhost-user|              ||              | vhost-user|     |
    |  +--|  / memif  |--------------vv--------------|  / memif  |--+  |
    |  |  +-----^-----+                              +-----^-----+  |  |
    |  |        |                 vSwitch                  |        |  |
    |  |  +-----v-----+                              +-----v-----+  |  |
    |  +--| AFXDP PMD |------------------------------| AFXDP PMD |--+  |
    |     +-----^-----+                              +-----^-----+     |
    |           |                                          |           |
    |     +-----v-----+                              +-----v-----+     |
    ------|   AFXDP   |------------------------------|   AFXDP   |-----|
    |     |   socket  |                              |   socket  |     |
    |     +-----^-----+                              +-----^-----+     |
    |           |                                          |           |
    |           |    +-------------------------------+     |           |
    |           |    |                               |     |           |
    |           |    |       Networking Stack        |     |           |
    |           |    |                               |     |           |
    |           |    +-------------------------------+     |           |
    |           |                                          |           |
    |    +------|------------------------------------------|--------+  |
    |    | +----|-------+                           +------|-----+  |  |
    |    | |  +-v----+  |                           |  +---v--+  |  |  |
    |    | |  | eBPF |  |         NIC Driver        |  | eBPF |  |  |  |
    |    | |  +------+  |                           |  +------+  |  |  |
    |    | |  XDP hook  |                           |  XDP hook  |  |  |
    |    | +------------+                           +------------+  |  |
    |    +----------------------------^-----------------------------+  |
    |                                 |                                |
    ----------------------------------|---------------------------------
    |                                 |                                |
    | Kernel Space         +----------v-----------+                    |
    +----------------------|          NIC         |--------------------+
                           +----------------------+

     Figure 5: Example architecture of the eBPF Acceleration Model -
          using a user-space vSwitch that supports an AFXDP PMD

   The eBPF acceleration model leverages the extended Berkeley Packet
   Filter (eBPF) technology [eBPF] to achieve high-performance packet
   processing.  eBPF enables the execution of sandboxed programs
   inside abstract virtual machines within the Linux kernel, without
   changing the kernel source code or loading kernel modules.  To
   accelerate data plane performance, eBPF programs are attached to
   different BPF hooks inside the Linux kernel stack.

   One type of BPF hook is the eXpress Data Path (XDP) hook at the
   networking driver.  It is the first hook that triggers an eBPF
   program upon packet reception from the external network.  The other
   type of BPF hook is the Traffic Control Ingress/Egress eBPF hook
   (tc eBPF).  An eBPF program running at the tc hook enforces policy
   on all traffic exiting the pod, while an eBPF program running at
   the XDP hook enforces policy on all traffic coming from the NIC.

   On the egress datapath side, whenever a packet exits the pod, it
   first goes through the pod's veth interface.  The destination that
   receives the packet then depends on the CNI plugin chosen to create
   the container networking.  If the chosen CNI plugin is a
   non-AFXDP-based CNI, the packet is received by the eBPF program
   running at the veth interface's tc hook.  If the chosen CNI plugin
   is an AFXDP-supported CNI, the packet is received by the AFXDP
   socket [AFXDP].  The AFXDP socket is a Linux socket type that
   provides a fast packet delivery tunnel between itself and the XDP
   hook at the networking driver.  This tunnel bypasses the
   kernel-space network stack to provide high-performance raw packet
   networking.  Packets are transmitted between user space and the
   AFXDP socket via a shared memory buffer.  Once an egress packet
   arrives at the AFXDP socket or the tc hook, it is directly
   forwarded to the NIC.

   On the ingress datapath side, eBPF programs at the XDP hook or tc
   hook pick up packets from the NIC network devices (NIC ports).  In
   the case of the AFXDP CNI plugin [afxdp-cni], there are two
   operation modes: "primary" and "cdq".  In "primary" mode, NIC
   network devices can be directly allocated to pods.  Meanwhile, in
   "cdq" mode, NIC network devices can be efficiently partitioned into
   subfunctions or SR-IOV virtual functions, which enables multiple
   pods to share a primary network device.  Then, from the network
   devices, packets are directly delivered to the veth interface pair
   or to the AFXDP socket (depending on the chosen CNI), bypassing all
   kernel network layer processing such as iptables.  In the case of
   the Cilium CNI [Cilium], the context switch into the pod network
   namespace can also be bypassed.
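
   As a hedged illustration, a Multus NetworkAttachmentDefinition
   referencing the AFXDP CNI plugin is sketched below.  The resource
   name, pool name, and IPAM values are assumptions for illustration;
   the "primary" or "cdq" mode selection is part of the plugin's
   deployment configuration and should be checked against the
   plugin's documentation [afxdp-cni].

     apiVersion: "k8s.cni.cncf.io/v1"
     kind: NetworkAttachmentDefinition
     metadata:
       name: afxdp-network
       annotations:
         k8s.v1.cni.cncf.io/resourceName: afxdp/myPool
     spec:
       config: '{
         "cniVersion": "0.3.0",
         "type": "afxdp",
         "ipam": {
           "type": "host-local",
           "subnet": "192.168.1.0/24"
         }
       }'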

   Notable eBPF acceleration models can be classified into the three
   categories below.  Their corresponding model architectures are
   shown in Figure 3, Figure 4, and Figure 5.

   o non-AFXDP: eBPF-supported CNIs such as Calico [Calico] and Cilium
   [Cilium]

   o using an AFXDP-supported CNI: the AFXDP K8s plugin [afxdp-cni]
   used by the Cloud Native Data Plane project [CNDP]

   o using a user-space vSwitch that supports an AFXDP PMD: OVS-DPDK
   [ovs-dpdk] and VPP [vpp] are vSwitches that have AFXDP device
   driver support.  The Userspace CNI [userspace-cni] is used to
   enable container networking via these vSwitches.

   The container network performance of the Cilium project is reported
   by the project itself in [cilium-benchmark].  Meanwhile, AFXDP
   performance and its comparison against DPDK are reported in
   [intel-AFXDP] and [LPC18-DPDK-AFXDP], respectively.

4.1.4.  Smart-NIC Acceleration Model

    +------------------------------------------------------------------+
    | User Space                                                       |
    |    +-----------------+                    +-----------------+    |
    |    |      C-VNF      |                    |      C-VNF      |    |
    |    | +-------------+ |                    | +-------------+ |    |
    |    +-|  vf driver  |-+                    +-|  vf driver  |-+    |
    |      +-----^-------+                        +------^------+      |
    |            |                                       |             |
    -------------|---------------------------------------|--------------
    |            +---------+                   +---------+             |
    |               +------|-------------------|------+                |
    |               | +----v-----+       +-----v----+ |                |
    |               | | virtual  |       | virtual  | |                |
    |               | | function |       | function | |                |
    | Kernel Space  | +----^-----+  NIC  +-----^----+ |                |
    +---------------|      |                   |      |----------------+
                    | +----v-------------------v----+ |
                    | |      Classify and Queue     | |
                    | +-----------------------------+ |
                    +---------------------------------+

            Figure 6: Examples of Smart-NIC Acceleration Model

   Figure 6 shows the Smart-NIC acceleration model, which does not use
   a vSwitch component.  This model can be divided into two
   technologies.

   One is Single-Root I/O Virtualization (SR-IOV), an extension of the
   PCIe specification that enables multiple partitions running
   simultaneously within a system to share PCIe devices.  In the NIC,
   there are virtual replicas of PCI functions known as virtual
   functions (VFs), each of which is directly connected to a
   container's network interface.  Using SR-IOV, data packets from
   outside bypass both kernel and user space and are directly
   forwarded to the container's virtual network interface.  The SR-IOV
   network device plugin for Kubernetes [SR-IOV] is recommended for
   creating the special interface in each container, controlled by the
   VF driver.

   The other technology is offloading eBPF/XDP programs to the
   Smart-NIC card, as mentioned in the previous section.  It enables
   generic acceleration of eBPF: eBPF programs attached to XDP run on
   the Smart-NIC card, which allows server CPUs to perform more
   application-level work.  However, not all Smart-NIC cards provide
   eBPF/XDP offloading support.
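
   To illustrate the SR-IOV case, a pod requesting one virtual
   function from the SR-IOV network device plugin is sketched below.
   The resource name follows the example naming in the plugin's
   documentation, and the network name and image are assumptions.

     apiVersion: v1
     kind: Pod
     metadata:
       name: sriov-pod
       annotations:
         k8s.v1.cni.cncf.io/networks: sriov-net
     spec:
       containers:
       - name: c-vnf
         image: l2fwd:latest     # placeholder image
         resources:
           requests:
             intel.com/intel_sriov_netdevice: '1'
           limits:
             intel.com/intel_sriov_netdevice: '1'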

4.1.5.  Model Combination

     +-------------------------------------------------------+
     | User Space                                            |
     | +--------------------+         +--------------------+ |
     | |        C-VNF       |         |        C-VNF       | |
     | | +------+  +------+ |         | +------+  +------+ | |
     | +-|  eth |--|  eth |-+         +-|  eth |--|  eth |-+ |
     |   +---^--+  +---^--+             +--^---+  +---^--+   |
     |       |         |                   |          |      |
     |       |         |                   |          |      |
     |       |     +---v--------+  +-------v----+     |      |
     |       |     | vhost-user |  | vhost-user |     |      |
     |       |  +--|  / memif   |--|  / memif   |--+  |      |
     |       |  |  +------------+  +------------+  |  |      |
     |       |  |             vSwitch              |  |      |
     |       |  +----------------------------------+  |      |
     |       |                                        |      |
     --------|----------------------------------------|-------
     |       +-----------+              +-------------+      |
     |              +----|--------------|---+                |
     |              |+---v--+       +---v--+|                |
     |              ||  vf  |       |  vf  ||                |
     |              |+------+       +------+|                |
     | Kernel Space |                       |                |
     +--------------|           NIC         |----------------+
                    +-----------------------+

             Figure 7: Examples of Model Combination deployment

   Figure 7 shows the networking model that combines the user-space
   vSwitch model and the Smart-NIC acceleration model.  This model is
   frequently considered in service function chaining scenarios where
   two different types of traffic flow are present: North/South
   traffic and East/West traffic.

   North/South traffic is the type in which packets are received from
   other servers and routed through a VNF.  For this traffic type, a
   Smart-NIC model such as SR-IOV is preferred because packets always
   have to pass through the NIC; involving a user-space vSwitch in
   North/South traffic would create additional bottlenecks.  East/West
   traffic, on the other hand, is traffic sent and received between
   containers deployed on the same server, possibly passing through
   multiple containers.  For this type, user-space vSwitch models such
   as OVS-DPDK and VPP are preferred because packets are routed within
   the user space only and do not pass through the NIC.

   The throughput advantages of these different networking models
   under different traffic direction cases are reported in
   [Intel-SRIOV-NFV].
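
   With Multus, a pod can attach to both model types at once by
   listing several networks in its annotation, which is one way to
   realize the combination shown in Figure 7.  In the sketch below,
   both network names are assumptions referring to
   NetworkAttachmentDefinitions like those sketched in the previous
   sections.

     apiVersion: v1
     kind: Pod
     metadata:
       name: chained-vnf
       annotations:
         # one SR-IOV network for North/South traffic and one
         # user-space vSwitch network for East/West traffic
         k8s.v1.cni.cncf.io/networks: sriov-net, userspace-ovs-net
     spec:
       containers:
       - name: c-vnf
         image: l2fwd:latest     # placeholder image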

4.2.  Resources Configuration

   The resource configuration considerations listed here apply not
   only to the C-VNF but also to the other components of a
   containerized SUT.  A containerized SUT is composed of NICs,
   possible cables between hosts, the kernel and/or vSwitch, and
   C-VNFs.

4.2.1.  CPU Isolation / NUMA Affinity

   CPU pinning provides benefits such as maximizing cache utilization,
   eliminating operating-system thread-scheduling overhead, and
   coordinating network I/O by guaranteeing resources.  One example
   CPU pinning technology in containerized infrastructure is the CPU
   Manager for Kubernetes (CMK) [CMK].  This technology proved
   effective in avoiding the "noisy neighbor" problem, as shown in an
   existing experience report [Intel-EPA].  Moreover, the benefits of
   CPU isolation techniques are not limited to the "noisy neighbor"
   problem: different VNFs are also neighbors of each other, and of
   the vSwitch if one is used.

   NUMA affects the speed with which different CPU cores access
   different memory regions.  CPU cores in the same NUMA node can
   locally access the shared memory in that node, which is faster than
   remotely accessing memory in a different NUMA node.  In a
   containerized network, packet forwarding is processed through the
   NIC, the VNF, and possibly a vSwitch, depending on the chosen
   networking model.  The NIC's NUMA node alignment can be checked via
   the PCI device's node affinity (e.g.,
   /sys/bus/pci/devices/<PCI address>/numa_node).  Meanwhile, specific
   CPU cores can be directly assigned to the VNF and the vSwitch via
   their configuration settings.  Network performance can vary
   depending on whether the physical network interface, vSwitch, and
   VNF are attached to the same NUMA node.  There is benchmarking
   experience for cross-NUMA performance impacts
   [cross-NUMA-vineperf].  Those tests measured cross-NUMA performance
   in three scenarios, varying the location of the traffic generator
   and the traffic endpoint.  The results verified that:

   o A single NUMA node serving multiple interfaces is worse than the
   performance degradation of crossing NUMA nodes

   o Performance is worse when a VNF shares CPUs across NUMA nodes

   Note that CPU pinning and NUMA affinity configuration
   considerations might also apply to VM-based VNFs.  As mentioned
   above, dedicated CPU cores of a specific NUMA node can be assigned
   to the VNF and vSwitch via their own running configurations.  The
   NIC's NUMA node can be checked from the PCI device's information.
   A host's NUMA nodes can be scheduled to virtual machines by
   specifying the chosen nodes in their settings.
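
   In Kubernetes, one common way to obtain exclusive cores is the
   kubelet's static CPU Manager policy: a pod in the Guaranteed QoS
   class whose containers request an integer number of CPUs, with
   requests equal to limits, is given dedicated cores.  The sketch
   below assumes that policy is enabled; the pod name, image, and
   values are illustrative.

     apiVersion: v1
     kind: Pod
     metadata:
       name: pinned-vnf
     spec:
       containers:
       - name: c-vnf
         image: l2fwd:latest     # placeholder image
         resources:
           requests:
             cpu: "4"            # integer CPUs, requests == limits
             memory: 2Gi         # -> Guaranteed QoS; exclusive cores
           limits:               #    under the static CPU Manager
             cpu: "4"            #    policy
             memory: 2Gi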

4.2.2.  Pod Hugepages

   Hugepages configure a large memory page size in order to reduce the
   Translation Lookaside Buffer (TLB) miss rate and increase
   application performance.  This improves the performance of
   logical/virtual-to-physical address lookups performed by the CPU's
   memory management unit, and thus overall system performance.  In
   the containerized infrastructure, the container is isolated at the
   application level, and administrators can set hugepages at a more
   granular level (e.g., Kubernetes allows the use of 2 MB or 1 GB
   hugepages per container).  Moreover, these pages are dedicated to
   the application and not shared with other processes, so the
   application uses them more efficiently.  From a network
   benchmarking point of view, however, the impact on general packet
   processing can be relatively negligible, and it may be necessary to
   consider the application level to measure the impact properly.  In
   the case of a DPDK application, as reported in [Intel-EPA],
   hugepages were verified to improve network performance because the
   packet handling processes run inside the application.
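
   A pod requesting pre-allocated 2 MB hugepages in Kubernetes is
   sketched below; the sizes and the image are illustrative, and
   hugepages must already be reserved on the host.

     apiVersion: v1
     kind: Pod
     metadata:
       name: hugepage-vnf
     spec:
       containers:
       - name: c-vnf
         image: dpdk-l2fwd:latest     # placeholder image
         volumeMounts:
         - mountPath: /hugepages
           name: hugepage
         resources:
           limits:
             hugepages-2Mi: 512Mi     # 2 MB hugepages
             memory: 1Gi              # regular memory must also be set
       volumes:
       - name: hugepage
         emptyDir:
           medium: HugePages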

4.2.3.  Pod CPU Cores and Memory Allocation

   Different resource allocation choices may impact container network
   performance.  These include allocating different numbers of CPU
   cores and amounts of RAM to pods, and allocating different CPU
   cores to the Poll Mode Driver and the vSwitch.  Benchmarking
   experience from [ViNePERF], published in
   [GLOBECOM-21-benchmarking-kubernetes], verified that:

   o 2 CPUs per pod is insufficient for all packet frame sizes.  With
   large packet frame sizes (over 1024 bytes), increasing the CPUs per
   pod significantly increases throughput.  Different RAM allocations
   to pods also produce different throughput results

   o Not assigning dedicated CPU cores to the DPDK PMD causes
   significant performance drops

   o Increasing the CPU core allocation to the OVS-DPDK vSwitch does
   not affect its performance.  However, increasing the CPU core
   allocation to the VPP vSwitch results in better latency.

   Moreover, regarding the user-space acceleration model, which uses a
   PMD to poll packets into the user-space vSwitch, assigning
   dedicated CPU cores to the PMD's Rx queues (e.g., via OVS-DPDK's
   pmd-cpu-mask and pmd-rxq-affinity options) might improve network
   performance.

4.2.4.  Service Function Chaining

   When we consider benchmarking for containerized and VM-based
   infrastructure and network functions, benchmarking scenarios may
   contain various operational use cases.  Traditional black-box
   benchmarking focuses on measuring the in-out performance of packets
   through physical network ports, since the hardware is tightly
   coupled with its function and only a single function runs on its
   dedicated hardware.  However, in the NFV environment, a physical
   network port is commonly connected to multiple VNFs (i.e., the
   multiple PVP test setup architectures described in [ETSI-TST-009])
   rather than dedicated to a single VNF.  This scenario is called
   Service Function Chaining.  Therefore, benchmarking scenarios
   should reflect operational considerations such as the number of
   VNFs, or network services defined as a set of VNFs, in a single
   host.  [service-density] proposed a way of measuring the
   performance of multiple NFV service instances at varied service
   densities on a single host, which is one example of these
   operational benchmarking aspects.  Another aspect that should be
   considered in benchmarking service function chaining scenarios is
   the different network acceleration technologies: network
   performance differences may occur because of the different traffic
   patterns produced by the chosen acceleration method.

4.2.5.  Additional Considerations

   Apart from the single-host test scenario, the multi-host scenario
   should also be considered in container network benchmarking, where
   container services are deployed across different servers.  To
   provide network connectivity for container-based VNFs between
   different server nodes, inter-node networking is required.
   According to [ETSI-NFV-IFA-038], there are several technologies to
   enable inter-node networking: overlay technologies using a tunnel
   endpoint (e.g., VXLAN, IP-in-IP), routing using the Border Gateway
   Protocol (BGP), a layer 2 underlay, a direct network using a
   dedicated NIC for each pod, or a load balancer using the
   LoadBalancer service type in Kubernetes.  The different protocols
   used by these technologies may cause performance differences in
   container networking.
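
   As one concrete example of selecting an inter-node technology,
   Flannel's net-conf.json chooses the backend used for traffic
   between nodes; switching the backend type from the VXLAN overlay
   to, e.g., "host-gw" changes the inter-node datapath and may
   therefore change benchmark results.  The subnet value below is
   illustrative.

     {
       "Network": "10.244.0.0/16",
       "Backend": {
         "Type": "vxlan"
       }
     }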

5.  Security Considerations

   Benchmarking activities as described in this memo are limited to
   technology characterization of a Device Under Test/System Under Test
   (DUT/SUT) using controlled stimuli in a laboratory environment with
   dedicated address space and the constraints specified in the sections
   above.

   The benchmarking network topology will be an independent test setup
   and MUST NOT be connected to devices that may forward the test
   traffic into a production network or misroute traffic to the test
   management network.

   Further, benchmarking is performed on a "black-box" basis and relies
   solely on measurements observable external to the DUT/SUT.

   Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
   benchmarking purposes.  Any implications for network security arising
   from the DUT/SUT SHOULD be identical in the lab and in production
   networks.

6.  References

6.1.  Informative References

   [AFXDP]    "AF_XDP", September 2022,
              <https://www.kernel.org/doc/html/v4.19/networking/
              af_xdp.html>.

   [afxdp-cni]
              "AF_XDP Plugins for Kubernetes",
              <https://github.com/intel/afxdp-plugins-for-kubernetes>.

   [Calico]   "Project Calico", July 2019,
              <https://docs.projectcalico.org/>.

   [Cilium]   "Cilium Documentation", March 2022,
              <https://docs.cilium.io/en/stable/>.

   [cilium-benchmark]
              Cilium, "CNI Benchmark: Understanding Cilium Network
              Performance", May 2021,
              <https://cilium.io/blog/2021/05/11/cni-benchmark>.

   [CMK]      Intel, "CPU Manager for Kubernetes", February 2021,
              <https://github.com/intel/CPU-Manager-for-Kubernetes>.

   [CNDP]     "CNDP - Cloud Native Data Plane", September 2022,
              <https://cndp.io/>.

   [cross-NUMA-vineperf]
              Anuket Project, "Cross-NUMA performance measurements with
              VSPERF", March 2019, <https://wiki.anuket.io/display/HOME/
              Cross-NUMA+performance+measurements+with+VSPERF>.

   [Docker-network]
              "Docker, Libnetwork design", July 2019,
              <https://github.com/docker/libnetwork/>.

   [eBPF]     "eBPF, extended Berkeley Packet Filter", July 2019,
              <https://www.iovisor.org/technology/ebpf>.

   [ETSI-NFV-IFA-038]
              "Network Functions Virtualisation (NFV) Release 4;
              Architectural Framework; Report on network connectivity
              for container-based VNF", November 2021.

   [ETSI-TST-009]
              "Network Functions Virtualisation (NFV) Release 3;
              Testing; Specification of Networking Benchmarks and
              Measurement Methods for NFVI", October 2018.

   [Flannel]  "flannel 0.10.0 Documentation", July 2019,
              <https://coreos.com/flannel/>.

   [GLOBECOM-21-benchmarking-kubernetes]
              Sridhar, R., Paganelli, F., and A. Morton, "Benchmarking
              Kubernetes Container-Networking for Telco Usecases",
              December 2021.

   [intel-AFXDP]
              Karlsson, M., "AF_XDP Sockets: High Performance Networking
              for Cloud-Native Networking Technology Guide", January
              2021.

   [Intel-EPA]
              Intel, "Enhanced Platform Awareness in Kubernetes", 2018,
              <https://builders.intel.com/docs/networkbuilders/enhanced-
              platform-awareness-feature-brief.pdf>.

   [Intel-SRIOV-NFV]
              Patrick, K. and J. Brian, "SR-IOV for NFV Solutions
              Practical Considerations and Thoughts", February 2017.

   [LPC18-DPDK-AFXDP]
              Karlsson, M. and B. Topel, "The Path to DPDK Speeds for
              AF_XDP", November 2018.

   [OVN]      "How to use Open Virtual Networking with Kubernetes", July
              2019, <https://github.com/ovn-org/ovn-kubernetes>.

   [OVS]      "Open Virtual Switch", July 2019,
              <https://www.openvswitch.org/>.

   [ovs-dpdk] "Open vSwitch with DPDK", July 2019,
              <http://docs.openvswitch.org/en/latest/intro/install/
              dpdk/>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", RFC 2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
              Network Interconnect Devices", RFC 2544, March 1999,
              <https://www.rfc-editor.org/rfc/rfc2544>.

   [RFC8172]  Morton, A., "Considerations for Benchmarking Virtual
              Network Functions and Their Infrastructure", RFC 8172,
              July 2017, <https://www.rfc-editor.org/rfc/rfc8172>.

   [RFC8204]  Tahhan, M., O'Mahony, B., and A. Morton, "Benchmarking
              Virtual Switches in the Open Platform for NFV (OPNFV)",
              RFC 8204, September 2017,
              <https://www.rfc-editor.org/rfc/rfc8204>.

   [service-density]
              Konstantynowicz, M. and P. Mikus, "NFV Service Density
              Benchmarking", March 2019, <https://tools.ietf.org/html/
              draft-mkonstan-nf-service-density-00>.

   [SR-IOV]   "SRIOV for Container-networking", July 2019,
              <https://github.com/intel/sriov-cni>.

   [userspace-cni]
              Intel, "Userspace CNI Network Plugin", August 2021,
              <https://github.com/intel/userspace-cni-network-plugin>.

   [ViNePERF] "Project: Virtual Network Performance for Telco NFV",
              <https://wiki.anuket.io/display/HOME/ViNePERF>.

   [vpp]      "VPP with Containers", July 2019, <https://fdio-
              vpp.readthedocs.io/en/latest/usecases/containers.html>.

Appendix A.  Benchmarking Experience (Networking Models)

A.1.  Benchmarking Environment

   This appendix presents our IETF Hackathon proof of concept for the
   different networking model benchmarking considerations.  This
   appendix can be removed if the document is approved.

   The purpose of this test is to measure the performance of different
   containerized networking acceleration models: user-space, eBPF, and
   Smart-NIC.  The selected solutions for each model are VPP, AFXDP
   (both the OVS AFXDP-PMD case and the AFXDP CNI plugin case), and
   SR-IOV, respectively.  The test is set up as described below.

   o Benchmarking physical servers' specifications

+-------------------+-------------------------+-------------------------+
|     Node Name     |    Specification        |      Description        |
+-------------------+-------------------------+-------------------------+
| Master Node       |- Intel(R) Xeon(R)       | Container Deployment    |
|                   |  Gold 5220R @ 2.4Ghz    |and Network Allocation   |
|                   |  (10 Cores)             |- Centos 7.7             |
|                   |- MEM 128GB              |- Kubernetes Master      |
|                   |- DISK 500GB             |- MULTUS CNI             |
|                   |- Control plane : 1G     |  Userspace CNI          |
|                   |                         |  Kubernetes SRIOV plugin|
|                   |                         |  Kubernetes AFXDP plugin|
+-------------------+-------------------------+-------------------------+
| Worker Node       |- Intel(R) Xeon(R)       | Container Service       |
|                   |  Gold 5220R @ 2.4Ghz    |- Ubuntu 22.04           |
|                   |  (80 Cores)             |  (18.04 for VPP test)   |
|                   |- MEM 256G               |- Kubernetes Worker      |
|                   |- DISK 2T                |- Layer 2 Forwarding     |
|                   |- Control plane : 1G     |  DPDK application       |
|                   |- Data plane : XL710-qda2|- MULTUS CNI             |
|                   |  (1NIC 2PORT- 40Gb)     |  Userspace CNI          |
|                   |                         |  Kubernetes SRIOV plugin|
|                   |                         |  Kubernetes AFXDP plugin|
+-------------------+-------------------------+-------------------------+
| Packet Generation |- Intel(R) Xeon(R)       | Packet Generator        |
| Node              |  Gold 6148 @ 2.4GHz     |- CentOS 7.7             |
|                   |  (2 Socket x 20 Core)   |- T-Rex 2.4 installed    |
|                   |- MEM 128G               |                         |
|                   |- DISK 2T                | Benchmarking Application|
|                   |- Control plane : 1G     |- T-Rex Non Drop Rate    |
|                   |- Data plane : XL710-qda2|                         |
|                   |  (1NIC 2PORT- 40Gb)     |                         |
+-------------------+-------------------------+-------------------------+

           Figure 8: Test Environment - Server Specifications

   o Benchmarking general architecture

   +-------------------------------------------------------------------+
   |              Containerized Infrastructure Worker Node             |
   | +--------------------------------------------------+              |
   | |           POD - Multus CNI                       |              |
   | |              (l2fwd)                             |              |
   | |         +---------------+                        |              |
   | |         |               |                        |              |
   | |   +-----v------+    +---v--------+  +----------+ |              |
   | |   | Userspace/ |    | Userspace/ |  | Flannel  | |              |
   | |   | SRIOV/AFXDP|    | SRIOV/AFXDP|  |          | |              |
   | |   |    eth1    |    |    eth2    |  |   eth0   | |              |
   | |   +-----^------+    +----^-------+  +----------+ |              |
   | +---------|----------------|-----------------------+              |
   |           |                |                                      |
   | +---------v----------------v-----------------------+              |
   | |          Different Acceleration Options          |              |
   | |       +-------------+      +-------------+       |              |
   | |  +----| vhost/memif |------| vhost/memif |-----+ |              |
   | |  |    +-------------+      +-------------+     | |              |
   | |  |              OVS/VPP vSwitch                | |              |
   | |  |                                             | |              |
   | |  |    +-------------+      +-------------+     | |              |
   | |  +----|  DPDK PMD   |------|  DPDK PMD   |-----+ |              |
   | |       +-------------+      +-------------+       |   User Space |
   +-|--------------------------------------------------|--------------+
   | |                                                  |              |
   | |                                                  | Kernel Space |
   +-|--------------------------------------------------|--------------+
   | |  +---------------------------------------------+ |              |
   | |  | +-----------------+     +-----------------+ | |              |
   | |  | | VF (SRIOV case) |     | VF (SRIOV case) | | |              |
   | |  | +-----------------+     +-----------------+ | |              |
   | |  | +-----------------+     +-----------------+ | |              |
   | |  | | XDP (AFXDP case)|     | XDP (AFXDP case)| | |              |
   | |  | +-----------------+     +-----------------+ | |              |
   | |  +---------------------------------------------+ |              |
   | +-------^----------------------^-------------------+              |
   |         |                      |                       NIC Driver |
   +---+ +---v----+            +----v---+ +----------------------------+
       | | PORT 0 |  40G NIC   | PORT 1 | |
       | +---^----+            +----^---+ |
       +-----|----------------------|-----+
       +-----|----------------------|-----+
   +---| +---v----+            +----v---+ |----------------------------+
   |   | | PORT 0 |  40G NIC   | PORT 1 | |   Packet Generator (Trex)  |
   |   | +--------+            +--------+ |                            |
   |   +----------------------------------+                            |
   +-------------------------------------------------------------------+

                Figure 9: Networking Model Test Architecture

   Multus CNI is set up to enable attaching multiple network interfaces
   to the pods.  Flannel CNI is used for control-plane networking
   between the Kubernetes master and worker nodes.  For the user-space
   networking model, Userspace CNI is used for packet forwarding
   between the pod's interfaces and the VPP user-space vSwitch.  For
   the eBPF networking model, the Kubernetes AFXDP plugin is used in
   the AFXDP CNI plugin case, and Userspace CNI is used in the case of
   the OVS vSwitch with AFXDP PMD support.  For the Smart-NIC
   networking model, the Kubernetes SR-IOV plugin is used for packet
   forwarding between the pod's interfaces and the NIC's SR-IOV virtual
   functions.
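
   As an illustration, the SR-IOV interfaces in this setup can be
   requested through Multus NetworkAttachmentDefinition objects similar
   to the minimal sketch below; the network name "sriov-net1" and the
   resource pool "intel.com/sriov_40g" are hypothetical and depend on
   how the SR-IOV device plugin is configured on the worker node.

      apiVersion: "k8s.cni.cncf.io/v1"
      kind: NetworkAttachmentDefinition
      metadata:
        name: sriov-net1
        annotations:
          k8s.v1.cni.cncf.io/resourceName: intel.com/sriov_40g
      spec:
        config: '{
            "cniVersion": "0.3.1",
            "type": "sriov",
            "name": "sriov-net1"
          }'

   A pod then attaches two such interfaces (eth1 and eth2 in Figure 9)
   through the Multus networks annotation, for example:

      metadata:
        annotations:
          k8s.v1.cni.cncf.io/networks: sriov-net1, sriov-net2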

   The detailed packet flow of each networking model is described in
   Section 4.1.

A.2.  Benchmarking Results

   Figure 10 shows our zero packet loss throughput test results for
   different frame sizes, as specified in [RFC2544].  The results show
   the different throughput performances of the networking models.
   SR-IOV and eBPF using AFXDP-CNI have the best performance, followed
   by the VPP vSwitch and eBPF using OVS-AFXDP.  The performance gap
   between the two eBPF model variations might be caused by the limited
   performance of the vhost-user interface of the OVS vSwitch, as
   mentioned in [GLOBECOM-21-benchmarking-kubernetes].
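
   For reference, the zero-loss ceiling of the 40GbE link at a given
   frame size can be estimated from the 20 bytes of per-frame wire
   overhead (7-byte preamble, 1-byte start-of-frame delimiter, and a
   12-byte minimum inter-frame gap):

      Max L2 throughput = 40 Gbps * FrameSize / (FrameSize + 20)
                        = 40 * 1518 / 1538 ~= 39.48 Gbps (1518 bytes)
                        = 40 *   64 /   84 ~= 30.47 Gbps (64 bytes)

   Hence, the SR-IOV and AFXDP CNI results at large frame sizes are
   effectively at line rate, while all models remain well below line
   rate at small frame sizes.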

       +------------+---------------------------------------------------+
       |            |                 Model                             |
       | Frame Size +---------------------------------------------------+
       |   (bytes)  |  Userspace  |   eBPF    |   eBPF    |  Smart-NIC  |
       |            |    (VPP)    |(OVS-AFXDP)|(AFXDP CNI)|  (SR-IOV)   |
       +------------+-------------+-----------+-----------+-------------+
       |    64      |    7.25     |   1.64    |    4.32   |    10.48    |
       +------------+-------------+-----------+-----------+-------------+
       |    128     |    13.32    |   2.69    |    8.32   |    25.37    |
       +------------+-------------+-----------+-----------+-------------+
       |    256     |    19.26    |   3.54    |    14.47  |    30.38    |
       +------------+-------------+-----------+-----------+-------------+
       |    512     |    25.62    |   7.32    |    27.13  |    37.11    |
       +------------+-------------+-----------+-----------+-------------+
       |    1024    |    30.12    |   13.42   |    37.16  |    39.10    |
       +------------+-------------+-----------+-----------+-------------+
       |    1280    |    31.23    |   17.83   |    39.23  |    39.23    |
       +------------+-------------+-----------+-----------+-------------+
       |    1518    |    31.26    |   21.37   |    39.25  |    39.28    |
       +------------+-------------+-----------+-----------+-------------+

       Figure 10: Different Networking Models Zero Packet Loss
                    Throughput Test Results (Gbps)

Appendix B.  Benchmarking Experience (Resources Configuration in Single
             Pod Scenario)

   This appendix presents our IETF Hackathon proof-of-concept tests for
   the resources configuration benchmarking considerations.  This
   appendix can be removed if the document is approved.

B.1.  Benchmarking Environment

   In this test, we evaluated different NUMA and CPU Pinning
   configurations on the VPP user-space networking model.  For the CPU
   Pinning configuration test, we deployed a noisy-neighbor pod
   alongside the layer 2 forwarding C-VNF.  The noisy neighbor is
   implemented using a CPU stress application.  Both pods are managed
   by CPU Manager for Kubernetes (CMK), a command-line program that
   enables CPU core pinning for container-based workloads.  For the
   NUMA configuration test, we aligned the C-VNF, the vSwitch, and the
   NIC interface across the two NUMA nodes.  The different CPU Pinning
   and NUMA Alignment configuration scenarios are described below.
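
   In the Shared and Exclusive modes, the pod commands are wrapped with
   CMK's "isolate" command.  The sketch below assumes CMK is already
   installed with its "shared" and "exclusive" pools configured; the
   application binaries and arguments are illustrative only.

      # C-VNF pod command: request dedicated cores from the
      # "exclusive" pool (the l2fwd arguments are hypothetical)
      cmk isolate --conf-dir=/etc/cmk --pool=exclusive \
          l2fwd -- -l 1 -n 4

      # Noisy-neighbor pod command: run the CPU stressor on cores
      # from the "shared" pool
      cmk isolate --conf-dir=/etc/cmk --pool=shared \
          stress-ng -- --cpu 2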

   o Benchmarking physical servers' Specifications: Same as Appendix A

   o Benchmarking Architecture

   +---------------------------------------------------------------+
   |                    Align Pod, vSwitch, NIC in                 |
   |                    NUMA node 0 or NUMA node 1                 |
   +---------------------------------------------------------------+
   |           Containerized Infrastructure Worker Node            |
   |        +--------------------------+       +-----------------+ |
   |        |         Pod (l2fwd)*     |       | Noisy neighbor* | |
   |        |    +-------------+       |       |                 | |
   |        |    |             |       |       |                 | |
   |        | +--v----+    +---v---+   |       |                 | |
   |        | |  eth1 |    |  eth2 |   |       |    Stress-ng    | |
   |        | +--^----+    +---^---+   |       |                 | |
   |        +----|-------------|-------+       +-----------------+ |
   |             |             |                                   |
   |        +----v--+      +---v---+                               |
   |   +----| memif |------| memif |------+                        |
   |   |    +-------+      +-------+      |                        |
   |   |            VPP vSwitch           |                        |
   |   |                                  |                        |
   |   |    +--------+     +-------+      |                        |
   |   +----|  PMD   |-----|  PMD  |------+                        |
   |        +--^-----+     +-----^-+                    User Space |
   +-----------|-----------------|---------------------------------+
   |           |                 |                                 |
   |           |                 |                    Kernel Space |
   +-----------|-----------------|---------------------------------+
   |           |                 |                             NIC |
   +-----+ +---v----+          +-v------+ +------------------------+
         | | PORT 0 |  40G NIC | PORT 1 | |
         | +---^----+          +-^------+ |
         +-----|-----------------|--------+
         +-----|-----------------|--------+
   +-----| +---v----+          +-v------+ |------------------------+
   |     | | PORT 0 |  40G NIC | PORT 1 | | Packet Generator (Trex)|
   |     | +--------+          +--------+ |                        |
   |     +--------------------------------+                        |
   +---------------------------------------------------------------+

   *- CPU Manager for Kubernetes configured

     Figure 11: Resource Configuration Test Architecture in Single Pod
                                  scenario

   o CPU Pinning Scenarios

   Both the C-VNF pod and the noisy-neighbor pod are configured with
   three kinds of CMK modes: Disabled (no CPU Pinning), Shared (both
   pods share the same assigned CPU cores), and Exclusive (dedicated
   CPU cores for each pod).
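
   Note that CMK is only one possible pinning tool.  As a reference
   point, an effect similar to the Exclusive mode can also be obtained
   with the native Kubernetes CPU Manager "static" policy, under which
   a Guaranteed-QoS pod that requests an integer number of CPUs
   receives dedicated cores.  A minimal sketch (the pod name and image
   are hypothetical):

      apiVersion: v1
      kind: Pod
      metadata:
        name: l2fwd
      spec:
        containers:
        - name: l2fwd
          image: l2fwd:latest
          resources:
            # An integer CPU count with requests == limits yields
            # Guaranteed QoS, i.e., exclusively pinned cores when the
            # kubelet CPU Manager policy is "static".
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              cpu: "2"
              memory: "4Gi"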

   o NUMA Alignment Scenarios

           +------------+---------+---------+---------+
           |  Scenario  |   NIC   | vSwitch |   pod   |
           +------------+---------+---------+---------+
           |     s1     |  NUMA0  |  NUMA0  |  NUMA0  |
           +------------+---------+---------+---------+
           |     s2     |  NUMA0  |  NUMA0  |  NUMA1  |
           +------------+---------+---------+---------+
           |     s3     |  NUMA0  |  NUMA1  |  NUMA1  |
           +------------+---------+---------+---------+
           |     s4     |  NUMA0  |  NUMA1  |  NUMA0  |
           +------------+---------+---------+---------+

         Figure 12: NUMA Alignment Scenarios in Single-Pod scenario
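
   The NUMA placement used in each scenario can be steered and verified
   with standard Linux and VPP controls; a minimal sketch, in which the
   interface name and core numbers are illustrative:

      # NUMA node the 40G NIC is attached to (-1 means unknown)
      cat /sys/class/net/ens801f0/device/numa_node

      # CPU cores belonging to each NUMA node
      lscpu | grep "NUMA node"

      # Pin the VPP vSwitch main thread and workers to cores of the
      # chosen NUMA node (cpu section of /etc/vpp/startup.conf)
      cpu {
          main-core 2
          corelist-workers 4,6
      }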

B.2.  Benchmarking Results

   For the CPU Pinning test, in Shared mode, we assigned two CPU cores
   shared across the pods.  In Exclusive mode, we dedicated one CPU
   core to each pod independently.  First, the test was conducted to
   figure out the line rate of the VPP vSwitch and the baseline
   Kubernetes performance when CMK is disabled.  After that, CMK-Shared
   mode and CMK-Exclusive mode were applied.  During each CPU Pinning
   scenario test, the four different NUMA alignment scenarios were also
   applied.  The results are shown in Figure 13.

   The test results confirm that CPU Pinning can mitigate the effect of
   the noisy neighbor.  Exclusive mode worked better than Shared mode
   because of its dedicated CPU core assignment.  Regarding the NUMA
   alignment configuration, aligning the C-VNF, the vSwitch, and the
   NIC interface in the same NUMA node optimizes the network
   performance (scenario 1).  Meanwhile, aligning the vSwitch and the
   C-VNF in different NUMA nodes can cause significant throughput
   degradation (scenarios 2 and 4), as packets between them must cross
   the inter-socket interconnect.

       +--------------------+-------------------------------------------+
       |     CPU Pinning    |         NUMA Alignment Scenarios          |
       |                    +----------+----------+----------+----------+
       |      Scenarios     |    s1    |    s2    |    s3    |    s4    |
       +--------------------+----------+----------+----------+----------+
       |    Without CMK     |   4.78   |   2.34   |   4.39   |   2.41   |
       +--------------------+----------+----------+----------+----------+
       | CMK-Exclusive Mode |   15.63  |   7.67   |   14.33  |   7.84   |
       +--------------------+----------+----------+----------+----------+
       |  CMK-shared Mode   |   11.16  |   5.47   |   10.23  |   5.52   |
       +--------------------+----------+----------+----------+----------+

     Figure 13: Zero Packet Loss Throughput Test Results (Gbps) of
      Different Resource Configurations with 1518-byte Packets in
                       the Single-Pod Scenario

Appendix C.  Benchmarking Experience (Networking Model Combination and
             Resources Configuration in Multi-Pod Scenario)

   This appendix presents our IETF Hackathon proof-of-concept tests for
   the model combination and resources configuration benchmarking
   considerations.  This appendix can be removed if the document is
   approved.

C.1.  Benchmarking Environment

   The main goal of this experiment was to benchmark the multi-pod
   scenario, in which packets traverse two pods.  We conducted two
   experiments.  First, we compared the networking performance of the
   model combination (SR-IOV-VPP) with that of VPP only.  Second, we
   evaluated different NUMA alignment configurations in the multi-pod
   case.  As there are two pods in this case, the NUMA alignment
   scenarios differ from the single-pod case.  Meanwhile, because the
   CPU Pinning scenarios are the same, we did not evaluate CPU Pinning
   in this multi-pod case.  Figure 14 shows the benchmarking
   architecture in this test, where two pods run on the same host, the
   vSwitch delivers packets between the two pods, and SR-IOV VFs handle
   the input/output packets of the worker node.  For the VPP-only case,
   the VPP vSwitch handles all packet forwarding processes, as
   illustrated in the User-space Acceleration model section.
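
   For illustration, in the combined case each pod attaches one SR-IOV
   interface for the node-external (north-south) leg and one memif
   interface for the pod-to-pod (east-west) leg through the Multus
   networks annotation; the network names below are hypothetical.

      apiVersion: v1
      kind: Pod
      metadata:
        name: pod1
        annotations:
          k8s.v1.cni.cncf.io/networks: sriov-net1, vpp-memif-net

   Here, "vpp-memif-net" would point to a NetworkAttachmentDefinition
   whose CNI config uses the Userspace CNI plugin, following the
   host/container sections documented in [userspace-cni]; a sketch:

      {
        "cniVersion": "0.3.1",
        "type": "userspace",
        "name": "vpp-memif-net",
        "host": { "engine": "vpp", "iftype": "memif",
                  "netType": "bridge",
                  "memif": { "role": "master", "mode": "ethernet" } },
        "container": { "engine": "vpp", "iftype": "memif",
                       "netType": "interface",
                       "memif": { "role": "slave", "mode": "ethernet" } }
      }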

   o Benchmarking physical servers' Specifications: Same as Appendix A

   o Benchmarking Architecture

   +---------------------------------------------------------------+
   |                Align Pod1, Pod2, vSwitch, NIC in              |
   |                    NUMA node 0 or NUMA node 1                 |
   +---------------------------------------------------------------+
   |             Containerized Infrastructure Worker Node          |
   |  +--------------------------+    +--------------------------+ |
   |  |      Pod1 (l2fwd)        |    |       Pod2 (l2fwd)       | |
   |  |    +-------------+       |    |    +-------------+       | |
   |  |    |             |       |    |    |             |       | |
   |  | +--v----+    +---v---+   |    | +--v----+    +---v---+   | |
   |  | |  eth1 |    |  eth2 |   |    | |  eth1 |    |  eth2 |   | |
   |  | +--^----+    +---^---+   |    | +--^----+    +---^---+   | |
   |  +----|-------------|-------+    +----|-------------|-------+ |
   |       |             |                 |             |         |
   |       |         +---v---+         +---v---+         |         |
   |       |    +----| memif |---------| memif |---+     |         |
   |       |    |    +-------+         +-------+   |     |         |
   |       |    |            VPP vSwitch           |     |         |
   |       |    +----------------------------------+     |    User |
   |       |                                             |   Space |
   +-------|---------------------------------------------|---------+
   |       |                                             |   Kernel|
   |       |                                             |   Space |
   +-------|---------------------------------------------|---------+
   |    +--v---+                                     +---v--+      |
   |    | VF0  |             NIC Driver              | VF1  |      |
   |    +--|---+                                     +---|--+      |
   +-+ +---v----+                                   +----v---+ +---+
     | | PORT 0 |             40G NIC               | PORT 1 | |
     | +---^----+                                   +----^---+ |
     +-----|---------------------------------------------|-----+
     +-----|---------------------------------------------|-----+
   +-| +---v----+                                   +----v---+ |---+
   | | | PORT 0 |             40G NIC               | PORT 1 | |   |
   | | +--------+                                   +--------+ |   |
   | +---------------------------------------------------------+   |
   |                Packet Generator (T-Rex)                       |
   +---------------------------------------------------------------+

                 Figure 14: Multi-pod Benchmarking Scenario

   o NUMA Alignment Scenarios

   Based on the results from single-pod case, aligning pod and vSwitch
   in different NUMA node can cause degraded performance.  Hence, in
   multi-pod case, we did not consider the cases which vSwitch and both
   pods are aligned to different nodes.

         +------------+---------+---------+---------+---------+
         |  Scenario  |   NIC   | vSwitch |  pod1   |  pod2   |
         +------------+---------+---------+---------+---------+
         |     s1     |  NUMA0  |  NUMA0  |  NUMA0  |  NUMA0  |
         +------------+---------+---------+---------+---------+
         |     s2     |  NUMA0  |  NUMA0  |  NUMA0  |  NUMA1  |
         +------------+---------+---------+---------+---------+
         |     s3     |  NUMA0  |  NUMA0  |  NUMA1  |  NUMA0  |
         +------------+---------+---------+---------+---------+
         |     s4     |  NUMA0  |  NUMA1  |  NUMA1  |  NUMA1  |
         +------------+---------+---------+---------+---------+
         |     s5     |  NUMA0  |  NUMA1  |  NUMA1  |  NUMA0  |
         +------------+---------+---------+---------+---------+
         |     s6     |  NUMA0  |  NUMA1  |  NUMA0  |  NUMA1  |
         +------------+---------+---------+---------+---------+

          Figure 15: NUMA Alignment Scenarios in the Multi-Pod Scenario

C.2.  Benchmarking Results

   o Networking Model Combination Performance

   The results in Figure 16 confirm that combining the Smart-NIC model
   (SR-IOV) and the user-space model (VPP) can enhance network
   throughput performance.  SR-IOV improves on VPP for north-south
   traffic, as it directly forwards traffic from the NIC to the pod
   interface.  Meanwhile, VPP is better for east-west traffic between
   pods, as packet forwarding is handled directly in user space.

                     +------------+-------------------------+
                     |            |          Model          |
                     | Frame Size +-------------------------+
                     |   (bytes)  |  Userspace  | Combined  |
                     |            |    (VPP)    |(SRIOV-VPP)|
                     +------------+-------------+-----------+
                     |    64      |    7.23     |   9.62    |
                     +------------+-------------+-----------+
                     |    128     |    13.38    |   15.71   |
                     +------------+-------------+-----------+
                     |    256     |    19.23    |   23.91   |
                     +------------+-------------+-----------+
                     |    512     |    25.58    |   31.76   |
                     +------------+-------------+-----------+
                     |    1024    |    30.07    |   39.15   |
                     +------------+-------------+-----------+
                     |    1280    |    31.16    |   39.33   |
                     +------------+-------------+-----------+
                     |    1518    |    31.25    |   39.32   |
                     +------------+-------------+-----------+

          Figure 16: Networking Model Combination Zero Packet Loss
                       Throughput Test Results (Gbps)

   o Different NUMA Alignments Performance in Multi-pod scenario

   The results in Figure 17 show that aligning both pods, the vSwitch,
   and the NIC to the same NUMA node optimizes the network performance
   (scenario 1).  Splitting the pods across different NUMA nodes might
   degrade the performance (scenarios 2, 3, 5, and 6).  Besides,
   aligning the vSwitch and the pod that forwards packets out of the
   worker node to the same NUMA node might give better performance
   (scenarios 3 and 6).

         +-------------+-----------------------------------------------+
         |             |             NUMA Alignment Scenarios          |
         |             +-------+-------+-------+-------+-------+-------+
         |             |  s1   |  s2   |  s3   |  s4   |  s5   |  s6   |
         +-------------+-------+-------+-------+-------+-------+-------+
         |  Throughput | 39.31 | 23.67 | 29.23 | 37.25 | 23.58 | 29.36 |
         +-------------+-------+-------+-------+-------+-------+-------+

      Figure 17: Zero Packet Loss Throughput Test Results (Gbps) of
        Different NUMA Alignments with 1518-byte Packets in the
                          Multi-Pod Scenario

Appendix D.  Change Log (to be removed by RFC Editor before publication)

D.1.  Since draft-dcn-bmwg-containerized-infra-10

   Updated the Benchmarking Experience appendixes with the latest
   results from Hackathon events.

   Re-organized the Benchmarking Experience appendixes to match the
   proposed benchmarking considerations inside the draft (Networking
   Models and Resources Configuration)

   Minor enhancements to the Introduction and Resources Configuration
   consideration sections, such as a general description of container
   network plugins and which resources can also be applied to VM-VNFs.

D.2.  Since draft-dcn-bmwg-containerized-infra-09

   Removed Additional Deployment Scenarios (Section 4.1 of version 09).
   We agreed with reviews from ViNePERF that the performance difference
   between with-VM and without-VM scenarios is negligible

   Removed Additional Configuration Parameters (Section 4.2 of version
   09).  We agreed with reviews from ViNePERF that these parameters are
   explained in the Performance Impacts/Resources Configuration section

   Following ViNePERF's suggestion to categorize the networking models
   based on how they can accelerate network performance, renamed the
   titles of Sections 4.3.1 and 4.3.2 of version 09 from Kernel-space
   vSwitch model and User-space vSwitch model to Kernel-space
   non-Acceleration model and User-space Acceleration model.  Updated
   the corresponding explanation of the kernel-space non-Acceleration
   model

   ViNePERF suggested replacing the general architecture of the eBPF
   Acceleration model with 3 separate architectures for 3 different
   eBPF Acceleration models: non-AFXDP, using an AFXDP-supported CNI,
   and using a user-space vSwitch that supports AFXDP PMD.  Updated the
   corresponding explanation of the eBPF Acceleration model

   Renamed Performance Impacts section (section 4.4 of version 09) to
   Resources Configuration.

   We agreed with ViNePERF reviews to add the "CPU Cores and Memory
   Allocation" consideration into the Resources Configuration section

D.3.  Since draft-dcn-bmwg-containerized-infra-08

   Added new Section 4.  Benchmarking Considerations.  Previous
   Section 4.  Networking Models in Containerized Infrastructure was
   moved into this new Section 4 as a subsection

   Re-organized Additional Deployment Scenarios for containerized
   network benchmarking contents from Section 3.  Containerized
   Infrastructure Overview to new Section 4.  Benchmarking
   Considerations as the Additional Deployment Scenarios subsection

   Added new Additional Configuration Parameters subsection to new
   Section 4.  Benchmarking Considerations

   Moved previous Section 5.  Performance Impacts into new Section 4.
   Benchmarking Considerations as the Deployment settings impact on
   network performance section

   Updated eBPF Acceleration Model with AFXDP deployment option

   Enhanced Abstract and Introduction's description about the draft's
   motivation and contribution.

D.4.  Since draft-dcn-bmwg-containerized-infra-07

   Added eBPF Acceleration Model in Section 4.  Networking Models in
   Containerized Infrastructure

   Added Model Combination in Section 4.  Networking Models in
   Containerized Infrastructure

   Added Service Function Chaining in Section 5.  Performance Impacts

   Added Troubleshooting and Results for SRIOV-DPDK Benchmarking
   Experience

D.5.  Since draft-dcn-bmwg-containerized-infra-06

   Added Benchmarking Experience of Multi-pod Test

D.6.  Since draft-dcn-bmwg-containerized-infra-05

   Removed Section 3.  Benchmarking Considerations, Removed Section 4.
   Benchmarking Scenarios for the Containerized Infrastructure

   Added new Section 3.  Containerized Infrastructure Overview, Added
   new Section 4.  Networking Models in Containerized Infrastructure.
   Added new Section 5.  Performance Impacts

   Re-organized Subsection Comparison with the VM-based Infrastructure
   of previous Section 3.  Benchmarking Considerations and previous
   Section 4.  Benchmarking Scenarios for the Containerized
   Infrastructure to new Section 3.  Containerized Infrastructure
   Overview

   Re-organized Subsection Container Networking Classification of
   previous Section 3.  Benchmarking Considerations to new Section 4.
   Networking Models in Containerized Infrastructure.  Kernel-space
   vSwitch models and User-space vSwitch models were presented as
   separate subsections in this new Section 4.

   Re-organized Subsection Resource Considerations of previous
   Section 3.  Benchmarking Considerations to new Section 5.
   Performance Impacts as 2 separate subsections CPU Isolation / NUMA
   Affinity and Hugepages.  Previous Section 5.  Additional
   Considerations was moved into this new Section 5 as the Additional
   Considerations subsection.

   Moved Benchmarking Experience contents to Appendix

D.7.  Since draft-dcn-bmwg-containerized-infra-04

   Added Benchmarking Experience of SRIOV-DPDK.

D.8.  Since draft-dcn-bmwg-containerized-infra-03

   Added Benchmarking Experience of Contiv-VPP.

D.9.  Since draft-dcn-bmwg-containerized-infra-02

   Editorial changes only.

D.10.  Since draft-dcn-bmwg-containerized-infra-01

   Editorial changes only.

D.11.  Since draft-dcn-bmwg-containerized-infra-00

   Added Container Networking Classification in Section 3.
   Benchmarking Considerations (Kernel Space network model and User
   Space network model).

   Added Resource Considerations in Section 3.  Benchmarking
   Considerations (Hugepage, NUMA, RX/TX Multiple-Queue).

   Renamed Section 4.  Test Scenarios to Benchmarking Scenarios for the
   Containerized Infrastructure, added 2 additional scenarios BMP2VMP
   and VMP2VMP.

   Added Additional Consideration as new Section 5.

Contributors

   Kyoungjae Sun - ETRI - Republic of Korea

   Email: kjsun@etri.re.kr

   Hyunsik Yang - KT - Republic of Korea

   Email: yangun@dcn.ssu.ac.kr

Acknowledgments

   The authors would like to thank Al Morton for their valuable ideas
   and comments on this work.

Authors' Addresses

   Tran Minh Ngoc
   Soongsil University
   369, Sangdo-ro, Dongjak-gu
   Seoul
   06978
   Republic of Korea
   Phone: +82 28200841
   Email: mipearlska1307@dcn.ssu.ac.kr

   Sridhar Rao
   The Linux Foundation
   B801, Renaissance Temple Bells, Yeshwantpur
   Bangalore 560022
   India
   Phone: +91 9900088064
   Email: srao@linuxfoundation.org

   Jangwon Lee
   Soongsil University
   369, Sangdo-ro, Dongjak-gu
   Seoul
   06978
   Republic of Korea
   Phone: +82 1074484664
   Email: jangwon.lee@dcn.ssu.ac.kr

   Younghan Kim
   Soongsil University
   369, Sangdo-ro, Dongjak-gu
   Seoul
   06978
   Republic of Korea
   Phone: +82 1026910904
   Email: younghak@ssu.ac.kr
