Network Working Group                                          A. Morton
Internet-Draft                                                 AT&T Labs
Intended status: Informational                          February 2, 2015
Expires: August 6, 2015

  Considerations for Benchmarking Virtual Network Functions and Their
                            Infrastructure


   The Benchmarking Methodology Working Group has traditionally
   conducted laboratory characterization of dedicated physical
   implementations of internetworking functions.  This memo
   investigates additional considerations when network functions are
   virtualized and performed in commodity off-the-shelf hardware.


   3.4 Added interactions/dependencies within resource domains

   4.3 Added new metrics for characterization: PDV, reordering, mean
   delay, etc.

   4.4 Resolved the question of capacity and the 3x3 Matrix

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

Morton                   Expires August 6, 2015                 [Page 1]

Internet-Draft     Benchmarking VNFs and Related Inf.      February 2015

   This Internet-Draft will expire on August 6, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . .   3
   3.  Considerations for Hardware and Testing . . . . . . . . . . .   4
     3.1.  Hardware Components . . . . . . . . . . . . . . . . . . .   4
     3.2.  Configuration Parameters  . . . . . . . . . . . . . . . .   4
     3.3.  Testing Strategies  . . . . . . . . . . . . . . . . . . .   5
     3.4.  Attention to Shared Resources . . . . . . . . . . . . . .   5
   4.  Benchmarking Considerations . . . . . . . . . . . . . . . . .   6
     4.1.  Comparison with Physical Network Functions  . . . . . . .   6
     4.2.  Continued Emphasis on Black-Box Benchmarks  . . . . . . .   6
     4.3.  New Benchmarks and Related Metrics  . . . . . . . . . . .   7
     4.4.  Assessment of Benchmark Coverage  . . . . . . . . . . . .   7
   5.  Security Considerations . . . . . . . . . . . . . . . . . . .   9
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   9
   7.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   9
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  10
     8.1.  Normative References  . . . . . . . . . . . . . . . . . .  10
     8.2.  Informative References  . . . . . . . . . . . . . . . . .  11
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  11

1.  Introduction

   The Benchmarking Methodology Working Group (BMWG) has traditionally
   conducted laboratory characterization of dedicated physical
   implementations of internetworking functions.  The Black-box
   Benchmarks of Throughput, Latency, Forwarding Rates and others have
   served our industry for many years.  [RFC1242] and [RFC2544] are the
   cornerstones of the work.

   An emerging set of service provider and vendor development goals is
   to reduce costs while increasing flexibility of network devices, and
   drastically accelerate their deployment.  Network Function
   Virtualization (NFV) has the promise to achieve these goals, and
   therefore has garnered much attention.  It now seems certain that
   some network functions will be virtualized following the success of
   cloud computing and virtual desktops supported by sufficient network
   path capacity, performance, and widespread deployment; many of the
   same techniques will help achieve NFV.

   See the ETSI NFV materials for more background; for example, the
   white papers there may be a useful starting place.  The Performance
   and Portability Best Practices [NFV.PER001] are particularly relevant
   to BMWG.  There are currently work-in-progress documents available in
   the Open Area, including drafts describing Infrastructure aspects and
   service quality.

2.  Scope

   BMWG will consider the new topic of Virtual Network Functions and
   related Infrastructure to ensure that common issues are recognized
   from the start, using background materials from industry and SDOs
   (e.g., IETF, ETSI NFV).

   This memo investigates additional methodological considerations
   necessary when benchmarking VNF instantiated and hosted in commodity
   off-the-shelf (COTS) hardware.  An essential consideration is
   benchmarking both physical and virtual network functions, thereby
   allowing direct comparison.

   A clearly related goal: the benchmarks for the capacity of COTS to
   host a plurality of VNF instances should be investigated.  Existing
   networking technology benchmarks will also be considered for
   adaptation to NFV and closely associated technologies.

   A non-goal is any overlap with traditional computer benchmark
   development and their specific metrics (such as the SPECmark suites).

   A colossal non-goal is any form of architecture development related
   to NFV and associated technologies in BMWG, as has been the case
   since BMWG began work in 1989.

3.  Considerations for Hardware and Testing

   This section lists the new considerations which must be addressed to
   benchmark VNF(s) and their supporting infrastructure.

3.1.  Hardware Components

   New Hardware devices will become part of the test set-up.

   1.  High volume server platforms (COTS, possibly with virtual
       technology enhancements).

   2.  Storage systems with large capacity, high speed, and high

   3.  Network Interface ports specially designed for efficient service
       of many virtual NICs.

   4.  High capacity Ethernet Switches.

   Labs conducting comparisons of different VNFs may be able to use the
   same hardware platform over many studies, until the steady march of
   innovations overtakes their capabilities (as happens with the lab's
   traffic generation and testing devices today).

3.2.  Configuration Parameters

   It will be necessary to configure and document the settings for the
   entire COTS platform, including:

   o  number of server blades (shelf occupation)

   o  CPUs

   o  caches

   o  storage system

   o  I/O

   as well as configurations that support the devices which host the
   VNF:

   o  Hypervisor

   o  Virtual Machine

   o  Infrastructure Virtual Network

   and finally, the VNF itself, with items such as:

   o  specific function being implemented in VNF

   o  number of VNF components in the service function chain

   o  number of physical interfaces and links transited in the service
      function chain
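
   As one illustration, the settings listed above could be captured in
   a single record that accompanies each set of results.  The Python
   sketch below is an assumption made for illustration, not a format
   defined by BMWG; every class name, field name, and example value is
   hypothetical.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class PlatformConfig:
    """COTS platform settings (hypothetical field names)."""
    server_blades: int
    cpus: str
    caches: str
    storage_system: str
    io: str


@dataclass
class HostingConfig:
    """Settings for the layers that host the VNF."""
    hypervisor: str
    virtual_machine: str
    infrastructure_virtual_network: str


@dataclass
class VnfConfig:
    """Settings for the VNF itself."""
    function: str
    components_in_chain: int
    physical_links_transited: int


@dataclass
class BenchmarkConfiguration:
    platform: PlatformConfig
    hosting: HostingConfig
    vnf: VnfConfig

    def to_json(self) -> str:
        # Sorted keys make two reports easy to compare line by line.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values only; a real report would record the lab's settings.
example = BenchmarkConfiguration(
    platform=PlatformConfig(2, "example CPU model", "example cache sizes",
                            "example storage", "example I/O"),
    hosting=HostingConfig("example hypervisor", "example VM sizing",
                          "example virtual network"),
    vnf=VnfConfig("example function", 3, 2),
)
print(example.to_json())
```

   Keeping the three groups separate mirrors the three lists above, so
   a change in one layer is visible without re-reading the others.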

3.3.  Testing Strategies

   The concept of characterizing performance at capacity limits may
   change.  For example:

   1.  It may be more representative of system capacity to characterize
       the case where Virtual Machines (VM, hosting the VNF) are
       operating at 50% Utilization, and therefore sharing the "real"
       processing power across many VMs.

   2.  Another important case stems from the need for partitioning
       functions.  A noisy neighbor (VM hosting a VNF in an infinite
       loop) would ideally be isolated and the performance of other VMs
       would continue according to their specifications.

   3.  System errors will likely occur as transients, implying a
       distribution of performance characteristics with a long tail
       (like latency), leading to the need for longer-term tests of each
       set of configuration and test parameters.

   4.  The desire for Elasticity and flexibility among network functions
       will include tests where there is constant flux in the VM
       instances.  Requests for new VMs and Releases for VMs hosting
       VNFs no longer needed would be a normal operational condition.

   5.  All physical things can fail, and benchmarking efforts can also
       examine recovery aided by the virtual architecture with different
       approaches to resiliency.
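
   To illustrate the long-tail point in item 3, the following Python
   sketch summarizes a set of latency samples with upper percentiles
   rather than a single average.  The nearest-rank percentile method,
   the chosen percentiles, and the sample values are assumptions made
   for this example only.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile (integer p in 1..100) of a non-empty list."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Multiply before dividing so the rank is computed without rounding error.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def tail_summary(latencies_ms):
    """Summarize a latency distribution, including its upper tail."""
    return {
        "min": min(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }


# Made-up samples: 98 ordinary results and 2 transient outliers.
samples = [1.0] * 98 + [50.0, 80.0]
print(tail_summary(samples))
```

   With only a handful of transient events per run, the upper
   percentiles are determined by very few samples, which is one reason
   longer tests are needed.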

3.4.  Attention to Shared Resources

   Since many components of the new NFV Infrastructure are virtual, test
   set-up design must have prior knowledge of interactions/dependencies
   within the various resource domains in the System Under Test (SUT).
   For example, a virtual machine performing the role of a traditional
   tester function such as generating and/or receiving traffic should
   avoid sharing any SUT resources with the Device Under Test (DUT).
   Otherwise, the results will have unexpected dependencies not
   encountered in physical device benchmarking.  The shared-resource
   aspect of test design remains one of the critical challenges to
   overcome in a reasonable way to produce useful results.
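
   A test set-up could be checked for such overlaps before a run.  The
   Python sketch below assumes that the resources assigned to the
   tester function and to the DUT are known and can be listed per
   resource domain; the domain names and identifiers shown are
   hypothetical.

```python
def shared_resources(tester, dut):
    """Return the resources, per domain, used by both tester and DUT.

    Each argument maps a resource domain name (for example "cpu_cores")
    to the set of identifiers assigned to that function.
    """
    overlap = {}
    for domain in tester.keys() & dut.keys():
        common = tester[domain] & dut[domain]
        if common:
            overlap[domain] = common
    return overlap


# Hypothetical assignments for a tester VM and a DUT in the same SUT.
tester_vm = {"cpu_cores": {0, 1}, "nics": {"nic0"}}
dut_vm = {"cpu_cores": {1, 2, 3}, "nics": {"nic1"}}

print(shared_resources(tester_vm, dut_vm))
```

   An empty result only shows that the listed domains do not overlap;
   resources that were not listed (for example, shared caches) would
   still need separate attention.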

4.  Benchmarking Considerations

   This section discusses considerations related to Benchmarks
   applicable to VNFs and their associated technologies.

4.1.  Comparison with Physical Network Functions

   In order to compare the performance of virtual designs and
   implementations with their physical counterparts, identical
   benchmarks must be used.  Since BMWG has developed specifications for
   many network functions already, there will be re-use of existing
   benchmarks through references, while allowing for the possibility of
   benchmark curation during development of new methodologies.
   Consideration should be given to quantifying the number of parallel
   VNFs required to achieve comparable performance with a given physical
   device, or whether some limit of scale was reached before the VNFs
   could achieve the comparable level.
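
   The comparison described above can be sketched as follows, assuming
   that the aggregate rate has been measured for 1, 2, 3, ... parallel
   VNFs on the same benchmark as the physical device.  The function
   name and all rates are hypothetical.

```python
def vnfs_for_parity(physical_rate, aggregate_rates):
    """Return the smallest VNF count whose measured aggregate rate
    reaches physical_rate, or None if the scale limit came first.

    aggregate_rates[n - 1] is the measured rate with n parallel VNFs.
    """
    for count, aggregate in enumerate(aggregate_rates, start=1):
        if aggregate >= physical_rate:
            return count
    return None


# Made-up rates (same units as the physical device's result).
print(vnfs_for_parity(10.0, [3.0, 6.0, 9.0, 11.5]))
print(vnfs_for_parity(10.0, [3.0, 5.8, 8.1, 9.0, 9.2]))
```

   The second call returns None because the measured rates level off
   below the physical device's result, which is the "limit of scale"
   outcome.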

4.2.  Continued Emphasis on Black-Box Benchmarks

   When the network functions under test are based on Open Source code,
   there may be a tendency to rely on internal measurements to some
   extent, especially when the externally-observable phenomena only
   support an inference of internal events (such as routing protocol
   convergence).  However, external observations remain essential as the
   basis for Benchmarks.  Internal observations with fixed specification
   and interpretation may be provided in parallel, to assist the
   development of operations procedures when the technology is deployed,
   for example.  Internal metrics and measurements from Open Source
   implementations may be the only direct source of performance results
   in a desired dimension, but corroborating external observations are
   still required to assure that the integrity of measurement
   discipline was maintained for all reported results.

   A related aspect of benchmark development is where the scope includes
   multiple approaches to a common function under the same benchmark.
   For example, there are many ways to arrange for activation of a
   network path between interface points and the activation times can be
   compared if the start-to-stop activation interval has a generic and
   unambiguous definition.  Thus, generic benchmark definitions are
   preferred over technology/protocol specific definitions where
   possible.

4.3.  New Benchmarks and Related Metrics

   There will be new classes of benchmarks needed for network design and
   assistance when developing operational practices (possibly automated
   management and orchestration of deployment scale).  Examples follow
   in the paragraphs below, many of which are prompted by the goals of
   increased elasticity and flexibility of the network functions, along
   with accelerated deployment times.

   Time to deploy VNFs: In cases where the COTS hardware is already
   deployed and ready for service, it is valuable to know the response
   time when a management system is tasked with "standing-up" hundreds
   of virtual machines and the VNFs they will host.
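
   As a minimal sketch of how such a response time might be computed,
   the Python example below assumes that the time of the management
   request and the time at which each VNF was observed to be ready are
   available as timestamps in seconds.  The function name, its
   arguments, and the values are hypothetical.

```python
def time_to_deploy(request_time, ready_times, requested):
    """Return (seconds until the last requested VNF was ready, failures).

    ready_times holds one timestamp per VNF that became ready; VNFs that
    never became ready are simply absent.  The elapsed time is None
    unless every requested VNF became ready.
    """
    failures = requested - len(ready_times)
    if failures > 0 or not ready_times:
        return None, failures
    return max(ready_times) - request_time, failures


# Made-up timestamps for a request to stand up three VNFs.
print(time_to_deploy(100.0, [112.5, 118.0, 115.25], requested=3))
print(time_to_deploy(100.0, [112.5, 118.0, 115.25], requested=4))
```

   Reporting the failure count alongside the elapsed time avoids
   presenting a fast result that was achieved only because some
   stand-ups did not complete.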

   Time to migrate VNFs: In cases where a rack or shelf of hardware must
   be removed from active service, it is valuable to know the response
   time when a management system is tasked with "migrating" some number
   of virtual machines and the VNFs they currently host to alternate
   hardware that will remain in-service.

   Time to create a virtual network in the COTS infrastructure: This is
   a somewhat simplified version of existing benchmarks for convergence
   time, in that the process is initiated by a request from (centralized
   or distributed) control, rather than inferred from network events
   (link failure).  The successful response time would remain dependent
   on dataplane observations to confirm that the network is ready to

   Also, it appears to be valuable to measure traditional packet
   transfer performance metrics during the assessment of traditional and
   new benchmarks, including metrics that may be used to support service
   engineering such as the Spatial Composition metrics found in
   [RFC6049].  Examples include Mean one-way delay in section 4.1 of
   [RFC6049], Packet Delay Variation (PDV) in [RFC5481], and Packet
   Reordering [RFC4737] [RFC4689].
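
   For illustration, the Python sketch below computes a mean one-way
   delay and a PDV value from a list of per-packet one-way delays,
   using the PDV form discussed in [RFC5481], in which each packet's
   delay is compared with the minimum delay of the stream.  The
   nearest-rank percentile, the default percentile, and the sample
   delays are assumptions made for this example.

```python
import math


def mean_delay(delays):
    """Mean one-way delay over the packets that arrived."""
    return sum(delays) / len(delays)


def pdv(delays, p=99):
    """PDV: pth percentile of (delay - minimum delay), nearest rank."""
    d_min = min(delays)
    variation = sorted(d - d_min for d in delays)
    # Multiply before dividing so the rank is computed without rounding error.
    rank = max(1, math.ceil(p * len(variation) / 100))
    return variation[rank - 1]


# Made-up one-way delays in milliseconds for ten packets.
delays_ms = [10, 12, 11, 10, 15, 10, 11, 30, 10, 11]
print(mean_delay(delays_ms), pdv(delays_ms, 90))
```

   Lost packets have no finite delay and are left out of both
   calculations here; how loss is reported is a separate choice.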

4.4.  Assessment of Benchmark Coverage

   It can be useful to organize benchmarks according to their applicable
   lifecycle stage and the performance criteria they intend to assess.
   The table below provides a way to organize benchmarks such that there
   is a clear indication of coverage for the intersection of lifecycle
   stages and performance criteria.

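   +---------------+-------------+------------+---------------+
   |               |   SPEED     |  ACCURACY  |  RELIABILITY  |
   +---------------+-------------+------------+---------------+
   |  Activation   |             |            |               |
   +---------------+-------------+------------+---------------+
   |  Operation    |             |            |               |
   +---------------+-------------+------------+---------------+
   | De-activation |             |            |               |
   +---------------+-------------+------------+---------------+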

   For example, the "Time to deploy VNFs" benchmark described above
   would be placed in the intersection of Activation and Speed, making
   it clear that there are other potential performance criteria to
   benchmark, such as the "percentage of unsuccessful VM/VNF stand-ups"
   in a set of 100 attempts.  This example emphasizes that the
   Activation and De-activation lifecycle stages are key areas for NFV
   and related infrastructure, and encourages expansion beyond
   traditional benchmarks for normal operation.  Thus, reviewing the
   benchmark coverage using this table (sometimes called the 3x3 matrix)
   can be a worthwhile exercise in BMWG.
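
   A coverage review of this kind can be sketched in a few lines of
   Python.  The data structure below is only one possible
   representation, and placing the "percentage of unsuccessful VM/VNF
   stand-ups" benchmark in the Reliability column is an assumption made
   for the example.

```python
STAGES = ("Activation", "Operation", "De-activation")
CRITERIA = ("Speed", "Accuracy", "Reliability")


def empty_matrix():
    """Return a 3x3 matrix with an empty benchmark list in every cell."""
    return {stage: {crit: [] for crit in CRITERIA} for stage in STAGES}


def uncovered(matrix):
    """List the (stage, criterion) cells that have no benchmark yet."""
    return [(s, c) for s in STAGES for c in CRITERIA if not matrix[s][c]]


matrix = empty_matrix()
matrix["Activation"]["Speed"].append("Time to deploy VNFs")
matrix["Activation"]["Reliability"].append(
    "Percentage of unsuccessful VM/VNF stand-ups")

# Print each cell that still lacks a benchmark.
for cell in uncovered(matrix):
    print(cell)
```

   Listing the empty cells makes gaps in the De-activation row, for
   example, visible at a glance.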

   In one of the first applications of the 3x3 matrix in BMWG, we
   discovered that metrics on measured size, capacity, or scale do not
   easily match one of the three columns above.  There are three
   possibilities to resolve this:

   o  Add a column, Scalability, but then it would be expected to have
      metrics in most of the Activation, Operation, and De-activation
      functions (which may not be the case).

   o  Include Scalability under Reliability: This fits the user
      perspective of the 3x3 matrix because the size or capacity of a
      device contributes to the likelihood that a request will be
      blocked, or that operation will be un-reliable when operating in
      an overload state.

   o  Keep size, capacity, and scale metrics separate from the 3x3
      matrix.

   After some discussion, including some of the original developers of
   the 3x3 matrix, it is suggested to keep capacity metrics separate
   from the 3x3 matrix and list them separately.  This approach
   encourages use of the 3x3 matrix to organize reports of results,
   where the capacity at which the various metrics were measured would
   be included in the title of the matrix (and results for multiple
   capacities would result in separate 3x3 matrices, if there were
   sufficient measurements/results to organize in that way).

5.  Security Considerations

   Benchmarking activities as described in this memo are limited to
   technology characterization of a Device Under Test/System Under Test
   (DUT/SUT) using controlled stimuli in a laboratory environment, with
   dedicated address space and the constraints specified in the sections
   above.

   The benchmarking network topology will be an independent test setup
   and MUST NOT be connected to devices that may forward the test
   traffic into a production network, or misroute traffic to the test
   management network.

   Further, benchmarking is performed on a "black-box" basis, relying
   solely on measurements observable external to the DUT/SUT.

   Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
   benchmarking purposes.  Any implications for network security arising
   from the DUT/SUT SHOULD be identical in the lab and in production.

6.  IANA Considerations

   No IANA Action is requested at this time.

7.  Acknowledgements

   The author acknowledges an encouraging conversation on this topic
   with Mukhtiar Shaikh and Ramki Krishnan in November 2013.  Bhavani
   Parise and Ilya Varlashkin have provided useful suggestions to expand
   these considerations.  Bhuvaneswaran Vengainathan has already tried
   the 3x3 matrix with the SDN controller draft, and contributed to many
   discussions.  Scott Bradner quickly pointed out shared resource
   dependencies in an early vSwitch measurement proposal, and the topic
   was included here as a key consideration.

8.  References

8.1.  Normative References

   [NFV.PER001]
              "Network Function Virtualization: Performance and
              Portability Best Practices", Group Specification ETSI GS
              NFV-PER 001 V1.1.1 (2014-06), June 2014.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2330]  Paxson, V., Almes, G., Mahdavi, J., and M. Mathis,
              "Framework for IP Performance Metrics", RFC 2330, May
              1998.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
              Network Interconnect Devices", RFC 2544, March 1999.

   [RFC2679]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
              Delay Metric for IPPM", RFC 2679, September 1999.

   [RFC2680]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
              Packet Loss Metric for IPPM", RFC 2680, September 1999.

   [RFC2681]  Almes, G., Kalidindi, S., and M. Zekauskas, "A Round-trip
              Delay Metric for IPPM", RFC 2681, September 1999.

   [RFC3393]  Demichelis, C. and P. Chimento, "IP Packet Delay Variation
              Metric for IP Performance Metrics (IPPM)", RFC 3393,
              November 2002.

   [RFC3432]  Raisanen, V., Grotefeld, G., and A. Morton, "Network
              performance measurement with periodic streams", RFC 3432,
              November 2002.

   [RFC4689]  Poretsky, S., Perser, J., Erramilli, S., and S. Khurana,
              "Terminology for Benchmarking Network-layer Traffic
              Control Mechanisms", RFC 4689, October 2006.

   [RFC4737]  Morton, A., Ciavattone, L., Ramachandran, G., Shalunov,
              S., and J. Perser, "Packet Reordering Metrics", RFC 4737,
              November 2006.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, October 2008.

   [RFC5905]  Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network
              Time Protocol Version 4: Protocol and Algorithms
              Specification", RFC 5905, June 2010.

8.2.  Informative References

   [RFC1242]  Bradner, S., "Benchmarking terminology for network
              interconnection devices", RFC 1242, July 1991.

   [RFC5481]  Morton, A. and B. Claise, "Packet Delay Variation
              Applicability Statement", RFC 5481, March 2009.

   [RFC6049]  Morton, A. and E. Stephan, "Spatial Composition of
              Metrics", RFC 6049, January 2011.

   [RFC6248]  Morton, A., "RFC 4148 and the IP Performance Metrics
              (IPPM) Registry of Metrics Are Obsolete", RFC 6248, April
              2011.

   [RFC6390]  Clark, A. and B. Claise, "Guidelines for Considering New
              Performance Metric Development", BCP 170, RFC 6390,
              October 2011.

Author's Address

   Al Morton
   AT&T Labs
   200 Laurel Avenue South
   Middletown, NJ  07748

   Phone: +1 732 420 1571
   Fax:   +1 732 368 1192
