A Performance-Oriented Digital Twin for Carrier Networks
draft-paillisse-nmrg-performance-digital-twin-00

Document	Type	This is an older version of an Internet-Draft whose latest revision state is "Expired".
	Authors	Jordi Paillisse , Paul Almasan , Miquel Ferriol , Pere Barlet , Albert Cabellos , Shihan Xiao , Xiang Shi , Xiangle Cheng , Diego Perino , Diego Lopez , Antonio Pastor
	Last updated	2022-07-11
Network Management Research Group                           J. Paillisse
Internet-Draft                                                P. Almasan
Intended status: Informational                                M. Ferriol
Expires: 12 January 2023                                       P. Barlet
                                                             A. Cabellos
                                                       UPC-BarcelonaTech
                                                                 S. Xiao
                                                                  X. Shi
                                                                X. Cheng
                                                                  Huawei
                                                               D. Perino
                                                                D. Lopez
                                                               A. Pastor
                                                          Telefonica I+D
                                                            11 July 2022

        A Performance-Oriented Digital Twin for Carrier Networks
            draft-paillisse-nmrg-performance-digital-twin-00

Abstract

   This draft introduces the concept of a Network Digital Twin (NDT) for
   performance evaluation.  A Performance NDT is able to produce
   performance estimates (delay, jitter, loss) of a given input network
   with a specific topology, traffic demand, and routing and scheduling
   configuration.  Also, this draft discusses the interface of the
   digital twin, how it relates to existing control plane elements, use
   cases, and possible implementation options.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 12 January 2023.

Paillisse, et al.        Expires 12 January 2023                [Page 1]
Internet-Draft      Network Performance Digital Twin           July 2022

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Architecture of the Network Performance Digital Twin
   4.  Interfaces
     4.1.  Administrator
     4.2.  Configuration Interface
     4.3.  Digital Twin Interface (DTI)
   5.  Mapping to the Network Digital Twin Architecture
   6.  Use Cases
     6.1.  Network Operations and Management
       6.1.1.  Network planning
       6.1.2.  What-if scenarios
       6.1.3.  Troubleshooting
       6.1.4.  Anomaly detection
       6.1.5.  Training
     6.2.  Network Optimization
   7.  Implementation Challenges
     7.1.  Simulation
     7.2.  Emulation
     7.3.  Analytical Modelling
     7.4.  Neural Networks
       7.4.1.  MultiLayer Perceptron
       7.4.2.  Recurrent Neural Networks
       7.4.3.  Convolutional Neural Networks
       7.4.4.  Graph Neural Networks
       7.4.5.  NN Comparison
   8.  Training
   9.  IANA Considerations
   10. Security Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Acknowledgements
   Authors' Addresses

1.  Introduction

   A Digital Twin for computer networks is a virtual replica of an
   existing network with a behavior equivalent to that of the real one.
   The key advantage of a Network Digital Twin (NDT) is the ability to
   recreate the complexities and particularities of the network
   infrastructure without the deployment cost of a real network.  Hence,
   network administrators can test, deploy and modify network
   configurations safely, without worrying about the impact on the real
   network.  Once the administrator has found a configuration that
   fulfills the expected objectives, it is deployed to the real network.
   In addition, an NDT is faster, safer and more cost-effective than
   interacting with the physical network.  All these characteristics
   make an NDT useful for different network management tasks, ranging
   from network planning and troubleshooting to optimization.

   The concept of an NDT has been proposed in different contexts:
   network management
   [I-D.draft-zhou-nmrg-digitaltwin-network-concepts], 5G networks
   [digital-twin-5G], vehicular networks [digital-twin-vanets],
   artificial intelligence [digital-twin-AI], and Industry 4.0
   [digital-twin-industry], among others.

   This draft proposes a Digital Twin for network management with a
   focus on performance evaluation.  That is, given several input
   parameters (topology, traffic matrix, etc.), a Network Performance
   Digital Twin (NPDT) predicts network performance metrics such as
   delay (per path or per link), jitter, or loss.  This draft defines
   the inputs and outputs of such a Digital Twin and the associated
   interfaces with other modules in the network control plane, and
   details use cases.

   In addition, this draft discusses possible implementation options for
   the NPDT, with a special emphasis on those based on Machine Learning.
   The aim of Section 7 (Implementation Challenges) is to describe the
   advantages and limitations of these techniques.  For example, most
   Machine Learning technologies rely heavily on large amounts of data
   to achieve acceptable accuracy.  Other considerations include
   adjusting the architecture of the Neural Network to successfully
   understand the structure of the input data.

   To be usable in practical scenarios (cf. Section 6), such as network
   optimization, a Network Performance Digital Twin (NPDT) should meet
   certain requirements:


   Fast:  low delay when making predictions (on the order of
      milliseconds), so that it can be used in optimization scenarios
      that need to test a large number of configuration variables (cf.
      Section 6.2).

   Accurate:  the error of the prediction (versus the ground truth) has
      to be below a certain threshold to be deployable in real-world
      networks.

   Scalable:  support networks with arbitrarily large topologies.

   Variety of Inputs:  accept a wide range of combinations of:

   *  Routing configurations

   *  Scheduling configurations (FIFO, Weighted Fair Queueing, Deficit
      Round Robin, etc)

   *  Topologies

   *  Traffic Matrices

   *  Traffic Models (constant bitrate, Poisson, ON/OFF, etc)

   Accessible:  regardless of its internal architecture, the NPDT needs
      to be easy to use for network engineers and administrators.  This
      includes, but is not limited to: interfaces to communicate with
      the NPDT that are well-known in the networking community, metrics
      that are readily understood by network engineers, and confidence
      values for the estimations.

   Note that the inputs and outputs described here are an example, but
   other inputs and outputs are possible depending on the specificities
   of each scenario.

2.  Terminology

   Digital Twin (DT):
      A virtual replica of a physical system.

   Network Digital Twin (NDT):
      A virtual replica of a physical network.

   Network Performance Digital Twin (NPDT):
      A virtual replica of a physical network that can accurately
      predict several performance metrics of the physical network.


   Network Optimizer:
      An algorithm capable of finding the optimal configuration
      parameters of a network, e.g.  OSPF weights, given an optimization
      objective, e.g. latency below a certain threshold.

   Control Plane:
      Any system, hardware or software, centralized or decentralized, in
      charge of controlling and managing a physical network.  Examples
      are routing protocols, SDN controllers, etc.

3.  Architecture of the Network Performance Digital Twin

   Figure 1 presents an overview of the architecture of a Network
   Performance Digital Twin (NPDT).

          Administrator Intent
                       |
                       |
                       |Intent-Based Interface
                       |
                       |
   +-------------+-----------------------------+
   |             |     |                       |
   |             |   Intent-Based   Optimizer  |
   |             |   Renderer                  |         +-------------+
   |             |                             |   DTI   |   Network   |
   | Management  |                             |Interface| Performance |
   | Plane       |                             |<------->|   Digital   |
   |             |                             |         |    Twin     |
   |             |                             |         |             |
   |             |   Measure        Configure  |         +-------------+
   |             |  |                  |       |
   +-------------+-----------------------------+
                    |                  |
                    |                  |
       Measurement  |                  |  Configuration
         Interface  |                  |  Interface
                    |                  |
           +--------------------------------------+
           |                                      |
           |          Physical Network            |
           |                                      |
           +--------------------------------------+

            Figure 1: Global architecture of the Performance DT

   Each element is defined as:


   Network Performance Digital Twin (NPDT):  a system capable of
      generating performance estimates of a specific instance of a
      network.

   Physical Network:  a real-world network that can be configured via
      standard interfaces.

   Management Plane:  the set of hardware and software elements in
      charge of controlling the Physical Network.  This includes
      routing processes, optimization algorithms, network controllers,
      visibility platforms, etc.  The definition, organization and
      implementation of the elements within the management plane are
      outside the scope of this document.  The elements of the
      management plane that are relevant to this document are described
      below.

   *  Optimizer: a network optimizer that can tune the configuration
      parameters of a network given one or more optimization objectives,
      e.g. do not exceed a latency threshold in all paths, minimize the
      load of the most used link, and avoid more than 10 Gbps of traffic
      at router R4 [DEFO].

   *  Intent-Based Renderer: a system capable of understanding network
      intent, according to the definitions in
      [irtf-nmrg-ibn-concepts-definitions-09].

   *  Measure: any system to measure the status and performance of a
      network, e.g., NetFlow [RFC3954] or streaming telemetry
      [streaming-telemetry].

   *  Configure: any system to apply configuration settings to the
      network devices, e.g. a NETCONF Manager or an end-to-end system to
      manage device configuration files [facebook-config].

   And the functions of each interface are:

   DT Interface (DTI):  an interface to communicate with the Network
      Performance Digital Twin (NPDT).  Inputs to the NPDT are a
      description of the network (topology, routing configuration,
      etc.), and the outputs are performance metrics (delay, jitter,
      loss; cf. Section 4).

   Configuration Interface (CI):  a standard interface to configure the
      physical network, such as NETCONF [RFC6241], YANG, OpenFlow
      [OFspec], LISP [RFC6830], etc.

   Measurement Interface (MI):  a standard interface to collect network
      status information, such as NetFlow [RFC3954], SNMP, streaming
      telemetry [openconfig-rtgwg-gnmi-spec-01], etc.

   Intent-Based Interface (IBI):  an interface for the network
      administrator to define optimization objectives or run the DT to
      obtain performance estimates, among others.

4.  Interfaces

4.1.  Administrator

   This interface can be a simple CLI or a state-of-the-art GUI,
   depending on the final product.  In summary, it has to offer the
   network administrator the following options/features:

   *  Predict the performance of one or more network scenarios, defined
      by the administrator.  Several use-cases related to this option
      are detailed in Section 6.1.

   *  Define network optimization objectives and run the network
      optimizer.

   *  Apply the optimized configuration to the physical network.

4.2.  Configuration Interface

   This interface is used to configure the Physical Network with the
   configuration parameters obtained from the optimizer.  It can be
   composed of one or more IETF protocols for network configuration; a
   non-exhaustive list is NETCONF [RFC6241], RESTCONF/YANG [RFC8040],
   PCE [RFC4655], OVSDB [RFC7047], and LISP [RFC6830].  It is also
   possible to use other standards defined outside the IETF that allow
   the configuration of elements in the forwarding plane, e.g.  OpenFlow
   [OFspec] or P4 Runtime [P4Rspec].

4.3.  Digital Twin Interface (DTI)

   This interface can be defined with any widespread data format, such
   as CSV files or JSON objects.  It carries two groups of data, inputs
   and outputs, described below for a network with N nodes.

   Inputs:  data sent to the NPDT to calculate the performance
      estimates:

   *  Topology: description of the network topology in graph format,
      e.g., NetworkX [NetworkXlib].


   *  Routing configuration: a matrix of size N*N.  Each cell contains
      the path from source N(i) to destination N(j) as a series of nodes
      of the topology.  Note that not all source-destination pairs may
      have a path.  Since the NPDT only needs a sequence of nodes to
      define a route, it supports different routing protocols, from
      OSPF, IS-IS or BGP, to SRv6, LISP, etc.

   *  Traffic Demands: a definition of the traffic that is injected into
      the network.  It can be specified with different granularities,
      ranging from a list of 5-tuple flows and their associated traffic
      intensity, to an N*N matrix defining the traffic intensity for each
      source-destination pair.  Some source-destination pairs may have
      zero traffic intensity.  The traffic intensity defines parameters
      of the traffic: bits per second, number of packets, average packet
      size, etc.

   *  Traffic Model: the statistical properties of the input traffic,
      e.g.  Video on Demand, backup, VoIP traffic, etc.  It can be
      defined globally for the whole network or individually for each
      flow in the Traffic Demands.

   *  Scheduling configuration: attributes associated with the nodes of
      the topology graph describing the scheduling configuration of the
      network, that is, (1) the scheduling policy (e.g., FIFO, WFQ, or
      DRR), and (2) the number of queues per output port.

   Outputs:  performance estimates of the NPDT: three matrices of size
      N*N containing the delay, jitter and loss for all the paths in the
      input topology.

   Note that this is an example of the inputs/outputs of a performance
   NPDT, but other inputs and outputs are possible depending on the
   specificities of each scenario.
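
   As an illustration of the inputs and outputs listed above, the
   following Python sketch builds a DTI request for a three-node
   network and serializes it as JSON.  The field names ("topology",
   "routing", "traffic_demands", etc.), the helper empty_matrix and all
   values are assumptions made for this example only; this draft does
   not define a schema.

```python
import json

N = 3  # number of nodes in the example network


def empty_matrix(n, fill=None):
    """Return an n*n matrix (list of lists) filled with `fill`."""
    return [[fill for _ in range(n)] for _ in range(n)]


# Topology as a node list and a link list with capacities (hypothetical).
topology = {
    "nodes": [0, 1, 2],
    "links": [
        {"src": 0, "dst": 1, "capacity_bps": 1_000_000_000},
        {"src": 1, "dst": 2, "capacity_bps": 1_000_000_000},
    ],
}

# Routing: cell [i][j] is the node sequence from i to j, None if no path.
routing = empty_matrix(N)
routing[0][1] = [0, 1]
routing[1][2] = [1, 2]
routing[0][2] = [0, 1, 2]

# Traffic demands: cell [i][j] is the offered load in bits per second.
traffic = empty_matrix(N, fill=0)
traffic[0][2] = 200_000_000

# Scheduling: per-node policy and number of queues per output port.
scheduling = {
    str(node): {"policy": "FIFO", "queues_per_port": 1}
    for node in topology["nodes"]
}

request = {
    "topology": topology,
    "routing": routing,
    "traffic_demands": traffic,
    "traffic_model": "poisson",
    "scheduling": scheduling,
}
encoded = json.dumps(request)

# Shape of a reply: three N*N matrices, one per metric, to be filled in
# by the NPDT.
reply_template = {m: empty_matrix(N) for m in ("delay", "jitter", "loss")}
```

   Source-destination pairs without a path are encoded as null in the
   JSON output, and pairs without traffic carry an intensity of zero.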

5.  Mapping to the Network Digital Twin Architecture

   Since the NPDT is a type of Network Digital Twin, its elements can be
   mapped to the reference architecture of an NDT described in
   [I-D.draft-zhou-nmrg-digitaltwin-network-concepts].  Table 1 maps the
   elements of the NDT reference architecture to those of the NPDT.
   Note that the Physical Network is the same for both architectures.


     +=====================================+========================+
     |      NDT Reference Architecture     |       This draft       |
     +====================+================+========================+
     | Application Layer  |                | Intent-Based Interface |
     |                    |                +------------------------+
     |                    |                | Optimizer              |
     +--------------------+----------------+------------------------+
     | Digital Twin Layer | Management     | Management Plane       |
     |                    +----------------+------------------------+
     |                    | Service        | Network Performance    |
     |                    | Mapping Models | Digital Twin           |
     |                    +----------------+------------------------+
     |                    | Data           | Optional in production |
     |                    | Repository     | deployments            |
     +--------------------+----------------+------------------------+
     | Physical Network   | Data           | Measurement Interface  |
     |                    | Collection     |                        |
     |                    +----------------+------------------------+
     |                    | Control        | Configuration          |
     |                    |                | Interface              |
     +--------------------+----------------+------------------------+

        Table 1: Mapping of NDT reference architecture elements to
             the architecture of the Network Performance DT.

6.  Use Cases

6.1.  Network Operations and Management

6.1.1.  Network planning

   The size and traffic of networks have doubled every year
   [network-capacity].  To accommodate this growth in users and network
   applications, networks need periodic upgrades.  For example, ISPs
   may want to increase certain link capacities or add new connections
   to alleviate the burden on the existing infrastructure.  This is
   typically a cumbersome process that relies on expert knowledge.
   Furthermore, modern networks are becoming larger and more complex,
   which makes it harder for existing planning solutions to scale
   [planning-scalability].

   Since the NPDT models large infrastructures and can produce accurate
   and fast performance estimates, it can help in different tasks
   related to network capacity and planning:

   *  Estimate when an existing network will run out of resources,
      assuming a given growth in users.

   *  Use performance estimates to plan the optimal upgrade that can
      cope with user growth.  Network operators can leverage the NPDT to
      make better planning decisions and anticipate network upgrades.

   *  Find unconventional topologies: in some networking scenarios,
      especially datacenter networks, some topologies are well-known to
      offer high performance [Google-Clos].  However, it is also
      possible to search for new topologies that optimize performance
      with the help of algorithms.  On one hand, the algorithm explores
      different topologies and, on the other hand, the NPDT provides
      fast performance estimations to the algorithm.  Hence, the NPDT
      guides the optimization algorithm towards the topologies with
      better performance [auto-dc-topology].
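
   The first task in the list above can be approximated with simple
   arithmetic before involving the NPDT.  The sketch below is an
   assumption-laden simplification: it treats "running out of
   resources" as the offered load on a link exceeding its capacity, and
   assumes a constant growth rate per period.  The function name and
   parameters are hypothetical.  In practice, each growth step would be
   sent to the NPDT to obtain delay, jitter and loss instead of
   comparing raw load against capacity.

```python
def periods_until_exhaustion(load_bps, capacity_bps, growth_rate,
                             max_periods=1000):
    """Count growth periods until the load first exceeds the capacity.

    Returns 0 if the link is already overloaded, and None if the load
    never exceeds the capacity within max_periods.
    """
    periods = 0
    while load_bps <= capacity_bps:
        if growth_rate <= 0 or periods >= max_periods:
            return None
        load_bps *= 1 + growth_rate  # compound growth per period
        periods += 1
    return periods


# Example: 400 Mbps on a 1 Gbps link, growing 50% per period.
example = periods_until_exhaustion(400e6, 1e9, 0.5)
```

   In this example the load grows to 600 Mbps, 900 Mbps and then 1.35
   Gbps, so the capacity is exceeded after three periods.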

6.1.2.  What-if scenarios

   The NPDT is a unique tool to perform what-if analysis, that is, to
   analyze the impact of potential scenarios and configurations safely,
   without any impact on the real network.  In this context, the NPDT
   acts as a safe sandbox to which different configurations are applied
   in order to understand their impact on the network.  Some examples
   of what-if analysis are:

   *  What is the impact on network performance if we acquire company
      ACME and incorporate all its employees?

   *  When will the network run out of capacity if we have an organic
      growth of users?

   *  What is the optimal network hardware upgrade given a budget?

   *  We need to update this path.  What is the impact on the
      performance of the other flows?

   *  A particular day has a spike of 10% in traffic intensity.  How
      much loss will it introduce?  Can we reduce this loss if we rate-
      limit another flow?

   *  How many links can fail until the SLA is degraded?

   *  What happens if link B fails?  Is the network able to process the
      current traffic load?
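
   As a sketch of how a link-failure question could be prepared for the
   NPDT, the following Python code finds the source-destination pairs
   whose current path crosses a failed link, using the routing matrix
   format of Section 4.3.  The function name and the example routing
   are hypothetical.  The affected pairs would then be re-routed and
   the resulting configuration sent to the NPDT to estimate the new
   performance; that step is not shown.

```python
def pairs_affected_by_link_failure(routing, failed_link):
    """Return the (src, dst) pairs whose path uses failed_link.

    routing[i][j] is the node sequence from i to j, or None if there
    is no path.  The link is treated as bidirectional.
    """
    a, b = failed_link
    affected = []
    for src, row in enumerate(routing):
        for dst, path in enumerate(row):
            if not path:
                continue
            hops = list(zip(path, path[1:]))  # consecutive node pairs
            if (a, b) in hops or (b, a) in hops:
                affected.append((src, dst))
    return affected


# Three nodes in a line: 0 - 1 - 2.
routing = [
    [None, [0, 1], [0, 1, 2]],
    [[1, 0], None, [1, 2]],
    [[2, 1, 0], [2, 1], None],
]
affected = pairs_affected_by_link_failure(routing, failed_link=(1, 2))
```

   Here every path to or from node 2 is affected, because the link
   between nodes 1 and 2 is its only connection to the network.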


6.1.3.  Troubleshooting

   There are many factors that cause network failures (e.g., invalid
   network configurations, unexpected protocol interactions).  Debugging
   modern networks is complex and time-consuming.  Currently,
   troubleshooting is typically done by human experts with years of
   experience using networking tools.

   Network operators can leverage an NPDT to reproduce previous network
   failures, in order to find the source of service disruptions.
   Specifically, network operators can replicate past network failure
   scenarios and analyze their impact on network performance, making it
   easier to find specific configuration errors.  In addition, the NPDT
   helps in finding more robust network configurations that prevent
   service disruptions in the future.

6.1.4.  Anomaly detection

   Since the NPDT models the behaviour of a real-world network, network
   operators have access to an estimation of the expected network
   behaviour.  When the real-world network behaviour deviates from the
   NPDT's behaviour, it can act as an indicator of an anomaly in the
   real-world network.  Such anomalies can appear at different places in
   a network (e.g., core, edge, IoT), and different data sources can be
   used to detect such anomalies.
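
   A minimal sketch of this comparison is shown below.  It flags the
   source-destination pairs whose measured delay deviates from the
   NPDT estimate by more than a relative tolerance.  The function
   name, the 20% default tolerance and the sample values are
   assumptions for illustration; a deployment would choose thresholds
   according to the accuracy of its NPDT and the variability of its
   measurements.

```python
def find_anomalies(predicted, measured, rel_tolerance=0.2):
    """Compare two N*N delay matrices cell by cell.

    Returns a list of (src, dst, predicted, measured) for every pair
    whose measured value differs from the prediction by more than
    rel_tolerance times the prediction.  Cells set to None are skipped.
    """
    anomalies = []
    for i, (pred_row, meas_row) in enumerate(zip(predicted, measured)):
        for j, (p, m) in enumerate(zip(pred_row, meas_row)):
            if p is None or m is None:
                continue
            if abs(m - p) > rel_tolerance * p:
                anomalies.append((i, j, p, m))
    return anomalies


# Delays in milliseconds for a two-node example (made-up values).
predicted = [[None, 10.0], [12.0, None]]
measured = [[None, 10.5], [30.0, None]]
flagged = find_anomalies(predicted, measured)
```

   In this example only the path from node 1 to node 0 is flagged: its
   measured delay (30 ms) is far above the estimate (12 ms), whereas
   the other path is within the tolerance.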

6.1.5.  Training

   As discussed before, the NPDT can be understood as a safe playground
   where misconfigurations don't affect the real-world system
   performance.  In this context, the NPDT can play an important role in
   improving the education and certification process of network
   professionals, both in basic networking training and advanced
   scenarios.  For example:

   *  In basic network training, understand how routing modifications
      impact delay.

   *  In more advanced studies, showcase the impact of scheduling
      configuration on flow performance, and how to use it to optimize
      SLAs.

   *  In cybersecurity scenarios, evaluate the effects of network
      attacks and possible counter-measures.


6.2.  Network Optimization

   Since the NPDT can provide performance estimates in short timescales,
   it is possible to pair it with a network optimizer (Figure 2).  The
   network administrator defines one or more optimization objectives,
   e.g., a maximum average delay for all paths in the network.  The
   optimizer can be implemented with a classical optimization algorithm,
   such as Constraint Programming [DEFO] or Local Search [LS], or with a
   Machine Learning one, such as Deep Neural Networks [DNN-TM] or
   Multi-Agent Reinforcement Learning [MARL-TE].  Regardless of the
   implementation, the optimizer tests various configurations to find
   the network configuration parameters that satisfy the optimization
   objectives.  To learn the performance of a specific network
   configuration, the optimizer sends that configuration to the NPDT,
   which predicts its performance metrics.

                      +------------+   Candidate        +-------------+
                      |            |   Network Config.  |   Network   |
    Optimization----> | Network    |------------------->| Performance |
    objectives        | Optimizer  |                    |   Digital   |
                      |            |<-------------------|    Twin     |
                      +------------+    Estimated       +-------------+
                            |           Performance
                            |
                            |
                            v
              Optimized Network Configuration

        Figure 2: Using an NPDT as a network model for an optimizer.
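
   The loop in Figure 2 can be sketched in Python as follows.  The
   optimizer here is a plain exhaustive search, chosen only for
   brevity; it is not one of the algorithms cited above.  The function
   toy_npdt is a hypothetical stand-in that returns a made-up delay for
   a pair of link weights; a real optimizer would query the NPDT over
   the DTI instead.

```python
from itertools import product


def toy_npdt(link_weights):
    """Hypothetical stand-in for the NPDT.

    Maps a candidate configuration (two link weights) to an estimated
    mean delay in milliseconds using an arbitrary formula.
    """
    w_a, w_b = link_weights
    return 5.0 + abs(w_a - 3) + 2.0 * abs(w_b - 1)


def optimize(estimate, candidates, max_delay_ms):
    """Return (config, delay) for the lowest-delay candidate that
    satisfies the objective, or None if no candidate satisfies it."""
    best = None
    for config in candidates:
        delay = estimate(config)  # one query to the network model
        if delay <= max_delay_ms and (best is None or delay < best[1]):
            best = (config, delay)
    return best


# Candidate configurations: weights 1..4 on each of two links.
candidates = list(product(range(1, 5), repeat=2))
result = optimize(toy_npdt, candidates, max_delay_ms=6.0)
```

   Each candidate costs one query to the network model, which is why
   the prediction speed of the NPDT bounds how many configurations an
   optimizer can explore.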

   An example use case is multi-objective optimization: commonly, the
   network administrator defines a set of optimization goals that must
   be met concurrently [DEFO], for example:

   *  Bound the latency of all links to a maximum.

   *  Do not exceed a link utilization of 80%, but for only a sub-set of
      all the links.

   *  Route all flows of type B through node 10.

   *  Avoid more than 35 Gbps of traffic to router R5.

   *  Minimize the routing cost, that is, the number of flows to
      re-route [ReRoute-Cost].


7.  Implementation Challenges

   This section presents different technologies that can be used to
   build an NPDT, and details the advantages and disadvantages of each.
   It takes into account how they perform with respect to the
   requirements of accuracy, speed, and scale of the NPDT predictions.

7.1.  Simulation

   Packet-level simulators, such as OMNeT++ [OMNET] and ns-3 [ns-3],
   simulate the operation of a network by processing a series of
   events, such as the transmission of a packet, enqueuing and
   dequeuing packets in a router, etc.  Hence, they offer excellent
   accuracy when predicting network performance metrics (delay, jitter
   and loss), but they take a significant amount of time to run: the
   simulation time grows linearly with the number of packets to
   simulate.

   In fact, the simulation time depends on the number of events to
   process [limitations-net-sim].  This limits the scalability of
   simulators, even if the topology does not change: higher traffic
   intensities take longer to simulate because more packets enter the
   network per unit of time.  Likewise, simulating the same traffic
   intensity in larger topologies also increases the simulation time.
   For example, consider a simulator that takes 11 hours to process 4
   billion events (these values are obtained from an actual
   simulation).  Although 4 billion events may appear to be a large
   figure, consider:

   *  A 1 Gbps Ethernet link, transmitting regular frames with the
      maximum size of 1518 bytes.

   *  This translates to approx. 82k packets crossing the link per
      second.

   *  Assuming a network with 50 links, and that the transmission of a
      packet over a link equals a single event in the simulator, such a
      network translates to 82k packets/s/link * 50 links * 1
      event/packet ~ 4 million events to simulate one second of network
      activity.

   *  Then, with a budget of 4 billion events, it takes 11 hours to
      simulate only about 16 minutes of network activity.

   These figures show that, despite the high accuracy of network
   simulators, they take too much time to calculate performance
   estimations.
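
   The arithmetic of the example above can be reproduced with the
   following Python sketch.  It uses the same figures as the text (1
   Gbps links, 1518-byte frames, 50 links, one event per packet per
   link, 4 billion events in 11 hours) and, like the text, ignores the
   Ethernet preamble and inter-frame gap.  The variable names are
   chosen for this example.

```python
LINK_RATE_BPS = 1_000_000_000   # 1 Gbps
FRAME_BYTES = 1518              # maximum regular Ethernet frame
NUM_LINKS = 50
EVENTS_PER_PACKET = 1           # one event per packet per link
EVENT_BUDGET = 4_000_000_000    # events processed by the simulator
WALL_CLOCK_HOURS = 11           # time taken to process the budget

# Packets crossing one fully loaded link per second.
packets_per_second = LINK_RATE_BPS / (FRAME_BYTES * 8)

# Events needed to simulate one second of network activity.
events_per_sim_second = packets_per_second * NUM_LINKS * EVENTS_PER_PACKET

# Network time covered by the event budget.
simulated_seconds = EVENT_BUDGET / events_per_sim_second
simulated_minutes = simulated_seconds / 60

# Ratio between wall-clock time and simulated network time.
slowdown = WALL_CLOCK_HOURS * 3600 / simulated_seconds
```

   With these figures the simulator covers a little over 16 minutes of
   network activity, that is, it runs roughly 40 times slower than real
   time.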


7.2.  Emulation

   Network emulators run the original network software in a virtualized
   environment.  This makes them easy to deploy, and depending on the
   emulation hardware, they can produce reasonably fast estimations.
   However, for large-scale networks their speed will eventually
   decrease because they do not run on hardware purpose-built for
   networking.  For fully-virtualized networks, emulating a network
   requires as many resources as the real one, which is not cost-
   effective.

   In addition, some studies have reported variable accuracy depending
   on the emulation conditions, including both the emulation parameters
   and the underlying hardware and OS configuration [emulation-perf].
   Hence, emulators show some limitations if we want to build a fast
   and scalable NPDT.  Nevertheless, emulators are useful in other use
   cases, for example in training, debugging, or testing new features.

7.3.  Analytical Modelling

   Queueing Theory (QT) is an analytical tool that models computer
   networks as a set of interconnected queues that are evaluated
   analytically.  The key advantage of QT is its speed, because the
   calculations rely on mathematical equations rather than on
   processing individual packets.  QT is arguably the most popular
   modeling technique, and represents a well-established framework that
   can model complex and large networks.

   However, the main limitation of QT is the traffic model: although it
   offers high accuracy for Poisson traffic models, it presents poor
   accuracy under realistic traffic models [qt-precision].  Internet
   traffic has been extensively analyzed in the past two decades, and
   although the community has not agreed on a universal model, there is
   consensus that, in general, aggregated traffic shows strong
   autocorrelation and a heavy tail [inet-traffic].
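
   As an illustration of why QT is fast, the sketch below evaluates the
   textbook M/M/1 queue, whose mean time in the system is
   W = 1 / (mu - lambda) for Poisson arrivals at rate lambda and
   exponential service at rate mu.  This document does not prescribe a
   specific queueing model; the choice of M/M/1, the function names,
   and the approximation of a path delay as the sum of independent
   per-link delays are assumptions made for the example.

```python
from typing import Iterable


def mm1_mean_delay(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system, W = 1 / (mu - lambda), of a stable M/M/1 queue.

    Rates are in packets per second; the result is in seconds.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)


def path_delay(link_arrival_rates: Iterable[float], service_rate: float) -> float:
    """Approximate path delay as the sum of independent per-link delays."""
    return sum(mm1_mean_delay(rate, service_rate) for rate in link_arrival_rates)


# Hypothetical links that can each serve 100k packets/s.
SERVICE_RATE = 100_000.0
for load in (50_000.0, 90_000.0, 99_000.0):
    delay_us = mm1_mean_delay(load, SERVICE_RATE) * 1e6
    print(f"load {load / SERVICE_RATE:.0%}: mean delay {delay_us:.0f} us")
```

   Each evaluation is a single division, regardless of how many packets
   cross the link.  The formula, however, only holds under the Poisson
   assumption discussed above.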

7.4.  Neural Networks

   Finally, Neural Networks (NN) and other Machine Learning (ML) tools
   are as fast as QT (on the order of milliseconds), and can provide
   accuracy similar to that of packet-level simulators.  They represent
   an interesting alternative, but have two key limitations.  First,
   they require training the NN with a large amount of data from a wide
   range of network scenarios: different routings, topologies, and
   scheduling configurations, as well as link failures and network
   congestion.  This dataset may not always be accessible, or easy to
   produce in a production network (see Section 8).  Second, not all NN
   architectures keep sufficient accuracy when scaling to larger
   topologies; therefore, some use cases need custom NN architectures.

7.4.1.  MultiLayer Perceptron

   A MultiLayer Perceptron [MLP] is a basic kind of NN from the family
   of feedforward NN.  In short, input data is propagated
   unidirectionally from the input layer of neurons to the output
   layer.  There may be an arbitrary number of hidden layers between
   the input and output layers.  MLPs are widely used for basic ML
   applications, such as regression.
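
   The unidirectional propagation can be sketched in plain Python as
   follows.  The helper names (dense, relu, mlp_forward) and the
   hand-picked weights are hypothetical and only illustrate the data
   flow; a real MLP would learn its weights during training, and the
   ReLU activation is just one common choice.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def dense(inputs: Sequence[float], weights: List[List[float]],
          biases: Sequence[float]) -> Vector:
    """One fully connected layer: a weighted sum plus bias per neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values: Sequence[float]) -> Vector:
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def mlp_forward(inputs: Sequence[float], layers: Sequence[Layer]) -> Vector:
    """Propagate the input through every layer, from input to output.

    Hidden layers use ReLU; the last layer is linear (regression).
    """
    hidden = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        hidden = dense(hidden, weights, biases)
        if index < len(layers) - 1:
            hidden = relu(hidden)
    return hidden


# Two inputs -> one hidden layer with two neurons -> one output.
LAYERS: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer
    ([[2.0, 1.0]], [0.5]),                    # output layer
]
print(mlp_forward([1.0, 2.0], LAYERS))
```

   Note that the size of the input layer is fixed by the weight
   matrices, so the same MLP cannot directly accept a network with a
   different number of elements.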

7.4.2.  Recurrent Neural Networks

   Recurrent Neural Networks [RNN] are a more advanced type of NN: they
   feed the output of some layers back to previous ones, which gives
   them the ability to store state.  They are mostly used to process
   sequential data, such as handwriting, text, or audio.  They have been
   used extensively in speech processing [RNN-speech] and, more
   generally, in Natural Language Processing applications [NLP].
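
   The following sketch shows how the feedback connection lets an RNN
   store state.  It uses a single scalar recurrent unit with the update
   h_t = tanh(w_x * x_t + w_h * h_(t-1) + b); the function names and
   the weight values are hypothetical and untrained, and practical RNNs
   (e.g., LSTM [RNN]) use vector states and gating.

```python
import math
from typing import List, Sequence


def rnn_step(x: float, h_prev: float, w_x: float, w_h: float, b: float) -> float:
    """One recurrent step: the new state depends on input AND old state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def rnn_run(sequence: Sequence[float], w_x: float = 0.5, w_h: float = 0.8,
            b: float = 0.0, h0: float = 0.0) -> List[float]:
    """Process a sequence element by element, carrying the hidden state."""
    h = h0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


# The same input value (1.0) yields a different state at each position,
# because the state remembers what was seen before.
print(rnn_run([1.0, 1.0, 1.0]))
```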

7.4.3.  Convolutional Neural Networks

   Convolutional Neural Networks (CNN) are Deep Learning NNs designed
   to process structured arrays of data such as images.  CNNs are highly
   performant when detecting patterns in the input data.  This makes
   them widely used in computer vision tasks, and they have become the
   state of the art for many visual applications, such as image
   classification [CNN-images].  However, because computer networks are
   graphs rather than regular grids of data, their current design
   presents limited applicability to computer networks.

7.4.4.  Graph Neural Networks

   Graph Neural Networks [GNN] are a type of neural network designed to
   work with graph-structured data.  A relevant type of GNN with
   interesting characteristics for computer networks is the Message
   Passing Neural Network (MPNN).  In a nutshell, an MPNN exchanges a
   set of messages between the graph nodes in order to learn the
   relationship between the input graph and the expected outputs of the
   training dataset.  MPNNs are composed of three functions, which are
   repeated for several iterations, depending on the size of the graph:

   *  Message: encodes information about the relationship of two
      contiguous elements of the graph in a message (an n-element
      array).

   *  Aggregation: combines the different messages received on a
      particular node.  It is typically an element-wise summation.  The
      result is an array of constant length, independently of the number
      of received messages.

   *  Update: combines the hidden states of a node with the aggregated
      message.  The result of this function is used as input to the next
      message-passing iteration.

   Note that the internal architecture of an MPNN is rebuilt for each
   input graph.

   This ability to understand graph-structured data naturally makes
   MPNNs interesting for a Network Performance Digital Twin.  Since
   computer networks are fundamentally graphs, MPNNs have the potential
   to take as input a graph of the network, and produce as output
   performance estimations of that network [qt-precision].
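
   The three functions and their iteration can be sketched as follows.
   This is a structural illustration only: in a real MPNN the message
   and update functions are trained neural networks, whereas here they
   are replaced by fixed placeholder formulas (an element-wise mean and
   a tanh), and all names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Graph = Dict[str, List[str]]  # node -> list of neighbouring nodes


def message(h_src: Vector, h_dst: Vector) -> Vector:
    """Message: encode the relation of two contiguous nodes as an array.

    Placeholder for a trained NN: element-wise mean of both states.
    """
    return [(a + b) / 2.0 for a, b in zip(h_src, h_dst)]


def aggregate(messages: List[Vector], size: int) -> Vector:
    """Aggregation: element-wise sum, constant length for any node degree."""
    total = [0.0] * size
    for msg in messages:
        total = [t + m for t, m in zip(total, msg)]
    return total


def update(h: Vector, aggregated: Vector) -> Vector:
    """Update: combine the hidden state with the aggregated message.

    Placeholder for a trained NN: tanh of the element-wise sum.
    """
    return [math.tanh(a + b) for a, b in zip(h, aggregated)]


def message_passing(graph: Graph, hidden: Dict[str, Vector],
                    iterations: int) -> Dict[str, Vector]:
    """Run the message/aggregation/update loop over the input graph."""
    size = len(next(iter(hidden.values())))
    for _ in range(iterations):
        new_hidden = {}
        for node, neighbours in graph.items():
            msgs = [message(hidden[n], hidden[node]) for n in neighbours]
            new_hidden[node] = update(hidden[node], aggregate(msgs, size))
        hidden = new_hidden  # output feeds the next iteration
    return hidden


# A three-node line topology A - B - C with two-element hidden states.
GRAPH: Graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
HIDDEN = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.5, 0.5]}
print(message_passing(GRAPH, HIDDEN, iterations=2))
```

   Because the loop is driven by the adjacency of the input graph, the
   same three functions apply unchanged to a topology with a different
   number of nodes or links.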

7.4.5.  NN Comparison

   Figure 3 presents a comparison of different types of NN that predict
   the delay of a given input network.  We use a dataset of the
   performance of different network topologies, created with simulation
   data (i.e., ground truth) from OMNeT++ [OMNET].  We measure the error
   relative to the delay of the simulation data.  In order to evaluate
   how well the different NNs deal with different network topologies,
   we train and test each NN in three different scenarios:

   *  Same topology: the training and testing datasets contain the same
      network topologies.

   *  Different topology: the training and testing datasets contain
      different sets of network topologies.  The objective is to
      determine whether the NN keeps the same accuracy when it is shown
      a topology it has never seen.

   *  Link failures: here we remove a random link from the topology.

       +----------------------------------------------------------+
       |  Mean Absolute Percentage Error of the delay prediction  |
       +----------------------+-----------------------------------+
       |       Scenario       |    MLP    |    RNN    |    GNN    |
       +----------------------+-----------+-----------+-----------+
       |  Same topology       |   0.123   |   0.1     |   0.020   |
       |  Different topology  |  11.5     |   0.305   |   0.019   |
       |  Link failures       |   1.15    |   0.638   |   0.042   |
       +----------------------+-----------+-----------+-----------+

       Figure 3: Performance comparison of different NN architectures

   We can see that all NNs predict the network delay with low error
   (between 2% and 12.3%) if we don't change the topology used during
   training.  Note that the errors in Figure 3 are expressed as
   fractions, e.g., 0.123 corresponds to 12.3%.  However, when it comes
   to new topologies, the error of the MLP is unacceptable (1150%), as
   is that of the RNN (around 30%).  On the other hand, the GNN can
   understand new topologies, with an error below 2%.  Similarly, if a
   link fails, the RNN has difficulties offering accurate predictions
   (around 64% error), while the GNN maintains its accuracy (4.2%).
   These results show the potential of GNNs to build a Network
   Performance Digital Twin.
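
   For reference, the error metric can be computed as sketched below,
   assuming that Figure 3 reports the usual Mean Absolute Percentage
   Error (MAPE) expressed as a fraction.  The function name and the
   sample delay values are hypothetical and are not taken from the
   dataset used in Figure 3.

```python
from typing import Sequence


def mape(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, returned as a fraction.

    `actual` holds the ground-truth delays (e.g., from the simulator),
    which must be non-zero; a result of 0.02 means a 2% error.
    """
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) / abs(a)
               for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical per-path delays in milliseconds.
simulated = [1.0, 2.0, 4.0]
estimated = [1.1, 1.8, 4.0]
print(f"MAPE: {mape(estimated, simulated):.3f}")
```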

8.  Training

   Digital Twins based on Machine Learning require a training process
   before they can be deployed.  Commonly, the training process makes
   use of a dataset of inputs and expected outputs, which guides the
   adjustment of the internal parameters of, e.g., the neural network.
   There are some caveats regarding the training process:

   *  In order to obtain sufficient accuracy, the training dataset needs
      to be representative, that is, contain samples of a wide range of
      possible inputs and outputs.  In networks, this translates to
      including samples of a congested network, of a network with a
      link failure, etc.  Otherwise, the resulting model cannot predict
      such situations.

   *  However, some kinds of samples, e.g., those of a congested or
      disrupted network, are difficult to obtain from a production
      network.

   *  One way to acquire those samples is to use a testbed, although
      this may not be possible for some networks, especially large-scale
      ones.  A possible solution in this situation is developing Neural
      Networks that are invariant to some of the properties of the
      graph, e.g., the number of nodes.  That is, the NN does not lose
      accuracy if the number of nodes increases.  This makes it possible
      to train the NN in a testbed, and then deploy it in a network that
      is larger than the testbed without losing accuracy.
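
   The need for a representative dataset can be illustrated with a toy
   experiment.  The sketch below trains a deliberately simple model (a
   straight line fitted by gradient descent) only on samples of a
   lightly loaded link, and then asks it to predict the delay of a
   congested link.  The ground-truth curve 1 / (1 - utilization), the
   linear model, and all names are assumptions chosen for brevity; they
   do not correspond to the NNs evaluated in this document.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (link utilization, normalized delay)


def true_delay(utilization: float) -> float:
    """Toy ground truth (assumption): delay grows as 1 / (1 - u)."""
    return 1.0 / (1.0 - utilization)


def train_linear(samples: List[Sample], learning_rate: float = 0.5,
                 epochs: int = 5000) -> Tuple[float, float]:
    """Fit delay ~ w * u + b by gradient descent on the squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        residuals = [(w * u + b - d, u) for u, d in samples]
        grad_w = sum(2.0 * r * u for r, u in residuals) / n
        grad_b = sum(2.0 * r for r, _u in residuals) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def relative_error(model: Tuple[float, float], utilization: float) -> float:
    """Relative error of the trained model at a given utilization."""
    w, b = model
    truth = true_delay(utilization)
    return abs(w * utilization + b - truth) / truth


# The training set only contains lightly loaded samples (10% to 50%).
LOW_LOAD = [0.1, 0.2, 0.3, 0.4, 0.5]
model = train_linear([(u, true_delay(u)) for u in LOW_LOAD])

print(f"error at 30% load (seen range):   {relative_error(model, 0.3):.1%}")
print(f"error at 90% load (never seen):   {relative_error(model, 0.9):.1%}")
```

   The model is reasonably accurate inside the range covered by the
   training samples, but its error grows by an order of magnitude for
   the congested case that was absent from the dataset.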

9.  IANA Considerations

   This memo includes no request to IANA.

10.  Security Considerations

   An attacker can alter the software image of the NPDT.  This could
   produce inaccurate performance estimations, which could result in
   network misconfigurations, disruptions or outages.  Hence, in order
   to prevent the accidental deployment of a malicious NPDT, the
   software image of the NPDT MUST be digitally signed by the vendor.

11.  References

11.1.  Normative References

11.2.  Informative References

   [OMNET]    "https://omnetpp.org/", 2022.

   [ns-3]     "https://www.nsnam.org/", 2022.

   [P4Rspec]  "https://p4.org/p4-spec/p4runtime/main/P4Runtime-
              Spec.html", 2021.

   [OFspec]   "TS-025: OpenFlow Switch Specification
              https://opennetworking.org/wp-content/uploads/2014/10/
              openflow-switch-v1.5.1.pdf", 2015.

   [NetworkXlib]
              "https://networkx.org/", 2022.

   [openconfig-rtgwg-gnmi-spec-01]
              Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack,
              C., and C. Morrow, "gRPC Network Management Interface
              (gNMI)", March 2018,
              <https://datatracker.ietf.org/doc/html/draft-openconfig-
              rtgwg-gnmi-spec-01>.

   [RFC8040]  Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF
              Protocol", RFC 8040, DOI 10.17487/RFC8040, January 2017,
              <https://www.rfc-editor.org/info/rfc8040>.

   [RFC6241]  Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed.,
              and A. Bierman, Ed., "Network Configuration Protocol
              (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011,
              <https://www.rfc-editor.org/info/rfc6241>.

   [RFC6830]  Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The
              Locator/ID Separation Protocol (LISP)", RFC 6830,
              DOI 10.17487/RFC6830, January 2013,
              <https://www.rfc-editor.org/info/rfc6830>.

   [RFC4655]  Farrel, A., Vasseur, J.-P., and J. Ash, "A Path
              Computation Element (PCE)-Based Architecture", RFC 4655,
              DOI 10.17487/RFC4655, August 2006,
              <https://www.rfc-editor.org/info/rfc4655>.

   [RFC7047]  Pfaff, B. and B. Davie, Ed., "The Open vSwitch Database
              Management Protocol", RFC 7047, DOI 10.17487/RFC7047,
              December 2013, <https://www.rfc-editor.org/info/rfc7047>.

   [RFC3954]  Claise, B., Ed., "Cisco Systems NetFlow Services Export
              Version 9", RFC 3954, DOI 10.17487/RFC3954, October 2004,
              <https://www.rfc-editor.org/info/rfc3954>.

   [I-D.draft-zhou-nmrg-digitaltwin-network-concepts]
              Zhou, C., Yang, H., Duan, X., Lopez, D., Pastor, A., Wu,
              Q., Boucadair, M., and C. Jacquenet, "Digital Twin Network:
              Concepts and Reference Architecture", Work in Progress,
              Internet-Draft, draft-zhou-nmrg-digitaltwin-network-
              concepts-06, 2 December 2021,
              <https://datatracker.ietf.org/doc/html/draft-zhou-nmrg-
              digitaltwin-network-concepts-06>.

   [irtf-nmrg-ibn-concepts-definitions-09]
              Clemm, A., Ciavaglia, L., Granville, L. Z., and J.
              Tantsura, "Intent-Based Networking - Concepts and
              Definitions", March 2022,
              <https://datatracker.ietf.org/doc/html/draft-irtf-nmrg-
              ibn-concepts-definitions-09>.

   [digital-twin-5G]
              Nguyen, H. X., Trestian, R., To, D., and M. Tatipamula,
              "Digital Twin for 5G and Beyond", 2021,
              <https://doi.org/10.1109/MCOM.001.2000343>.

   [digital-twin-vanets]
              Zhao, L., Han, G., Li, Z., and L. Shu, "Intelligent
              Digital Twin-Based Software-Defined Vehicular Networks",
              2020, <https://doi.org/10.1109/MNET.011.1900587>.

   [digital-twin-industry]
              Groshev, M., Guimarães, C., Martín-Pérez, J., and A. D. L.
              Oliva, "Toward Intelligent Cyber-Physical Systems: Digital
              Twin Meets Artificial Intelligence", 2021,
              <https://doi.org/10.1109/MCOM.001.2001237>.

   [streaming-telemetry]
              Gupta, A., Harrison, R., Canini, M., Feamster, N.,
              Rexford, J., and W. Willinger, "Sonata: Query-Driven
              Streaming Network Telemetry", 2018,
              <https://doi.org/10.1145/3230543.3230555>.

   [network-capacity]
              Ellis, A. D., Suibhne, N. M., Saad, D., and D. N. Payne,
              "Communication networks beyond the capacity crunch", 2016,
              <https://royalsocietypublishing.org/doi/abs/10.1098/
              rsta.2015.0191>.

   [planning-scalability]
              Zhu, H., Gupta, V., Ahuja, S. S., Tian, Y., Zhang, Y., and
              X. Jin, "Network Planning with Deep Reinforcement
              Learning", 2021,
              <https://doi.org/10.1145/3452296.3472902>.

   [limitations-net-sim]
              Rampfl, S., "Network simulation and its limitations",
              2013, <https://doi.org/10.2313/NET-2013-08-1_08>.

   [emulation-perf]
              Jurgelionis, A., Laulajainen, J., Hirvonen, M., and A. I.
              Wang, "An Empirical Study of NetEm Network Emulation
              Functionalities", 2011,
              <https://doi.org/10.1109/ICCCN.2011.6005933>.

   [qt-precision]
              Ferriol-Galmés, M., Rusek, K., Suárez-Varela, J., Xiao,
              S., Cheng, X., Barlet-Ros, P., and A. Cabellos-Aparicio,
              "RouteNet-Erlang: A Graph Neural Network for Network
              Performance Evaluation", 2022,
              <https://arxiv.org/abs/2202.13956>.

   [inet-traffic]
              Popoola, J. and R. Ipinyomi, "Empirical Performance of
              Weibull Self-Similar Tele-traffic Model", 2017.

   [MLP]      Pal, S. and S. Mitra, "Multilayer perceptron, fuzzy sets,
              and classification", 1992,
              <https://doi.org/10.1109/72.159058>.

   [RNN]      Hochreiter, S. and J. Schmidhuber, "Long Short-Term
              Memory", 1997,
              <https://doi.org/10.1162/neco.1997.9.8.1735>.

   [RNN-speech]
              Mikolov, T., Kombrink, S., Burget, L., Černocký, J., and
              S. Khudanpur, "Extensions of recurrent neural network
              language model", 2011,
              <https://doi.org/10.1109/ICASSP.2011.5947611>.

   [GNN]      Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M.,
              and G. Monfardini, "The Graph Neural Network Model", 2009,
              <https://doi.org/10.1109/TNN.2008.2005605>.

   [DEFO]     Hartert, R., Vissicchio, S., Schaus, P., Bonaventure, O.,
              Filsfils, C., Telkamp, T., and P. Francois, "A Declarative
              and Expressive Approach to Control Forwarding Paths in
              Carrier-Grade Networks", 2015,
              <https://doi.org/10.1145/2785956.2787495>.

   [facebook-config]
              Sung, Y. E., Tie, X., Wong, S. H., and H. Zeng, "Robotron:
              Top-down Network Management at Facebook Scale", 2016,
              <https://doi.org/10.1145/2934872.2934874>.

   [auto-dc-topology]
              Salman, S., Streiffer, C., Chen, H., Benson, T., and A.
              Kadav, "DeepConf: Automating Data Center Network
              Topologies Management with Machine Learning", 2018,
              <https://doi.org/10.1145/3229543.3229554>.

   [CNN-images]
              Krizhevsky, A., Sutskever, I., and G. E. Hinton, "ImageNet
              Classification with Deep Convolutional Neural Networks",
              2012, <https://proceedings.neurips.cc/paper/2012/file/
              c399862d3b9d6b76c8436e924a68c45b-Paper.pdf>.

   [MARL-TE]  Bernárdez, G., Suárez-Varela, J., López, A., Wu, B., Xiao,
              S., Cheng, X., Barlet-Ros, P., and A. Cabellos-Aparicio,
              "Is Machine Learning Ready for Traffic Engineering
              Optimization?", 2021,
              <https://doi.org/10.1109/ICNP52444.2021.9651930>.

   [LS]       Gay, S., Hartert, R., and S. Vissicchio, "Expect the
              unexpected: Sub-second optimization for segment routing",
              2017, <https://doi.org/10.1109/INFOCOM.2017.8056971>.

   [DNN-TM]   Valadarsky, A., Schapira, M., Shahaf, D., and A. Tamar,
              "Learning to Route", 2017,
              <https://doi.org/10.1145/3152434.3152441>.

   [ReRoute-Cost]
              Zheng, J., Xu, Y., Wang, L., Dai, H., and G. Chen, "Online
              Joint Optimization on Traffic Engineering and Network
              Update in Software-defined WANs", 2021,
              <https://doi.org/10.1109/INFOCOM42981.2021.9488837>.

   [NLP]      Chowdhary, K. R., "Natural Language Processing", 2020,
              <https://doi.org/10.1007/978-81-322-3972-7_19>.

   [Google-Clos]
              Singh, A., Ong, J., Agarwal, A., Anderson, G., Armistead,
              A., Bannon, R., Boving, S., Desai, G., Felderman, B.,
              Germano, P., Kanagala, A., Provost, J., Simmons, J.,
              Tanda, E., Wanderer, J., Hölzle, U., Stuart, S., and
              A. Vahdat, "Jupiter Rising: A Decade of Clos Topologies
              and Centralized Control in Google's Datacenter Network",
              2015, <https://doi.org/10.1145/2785956.2787508>.

   [digital-twin-AI]
              Mozo, A., Karamchandani, A., Gómez-Canaval, S., Sanz, M.,
              Moreno, J. I., and A. Pastor, "B5GEMINI: AI-Driven Network
              Digital Twin", 2022,
              <https://www.mdpi.com/1424-8220/22/11/4106>.

Acknowledgements

   TBD

Authors' Addresses

   Jordi Paillisse
   UPC-BarcelonaTech
   c/ Jordi Girona 1-3
   08034 Barcelona Catalonia
   Spain
   Email: jordi.paillisse@upc.edu

   Paul Almasan
   UPC-BarcelonaTech
   c/ Jordi Girona 1-3
   08034 Barcelona Catalonia
   Spain
   Email: felician.paul.almasan@upc.edu

   Miquel Ferriol
   UPC-BarcelonaTech
   c/ Jordi Girona 1-3
   08034 Barcelona Catalonia
   Spain
   Email: miquel.ferriol@upc.edu

   Pere Barlet
   UPC-BarcelonaTech
   c/ Jordi Girona 1-3
   08034 Barcelona Catalonia
   Spain
   Email: pere.barlet@upc.edu

   Albert Cabellos
   UPC-BarcelonaTech
   c/ Jordi Girona 1-3
   08034 Barcelona Catalonia
   Spain
   Email: alberto.cabellos@upc.edu

   Shihan Xiao
   Huawei
   China
   Email: xiaoshihan@huawei.com

   Xiang Shi
   Huawei
   China
   Email: shixiang16@huawei.com

   Xiangle Cheng
   Huawei
   China
   Email: chengxiangle1@huawei.com

   Diego Perino
   Telefonica I+D
   Barcelona
   Spain
   Email: diego.perino@telefonica.com

   Diego Lopez
   Telefonica I+D
   Seville
   Spain
   Email: diego.r.lopez@telefonica.com

   Antonio Pastor
   Telefonica I+D
   Madrid
   Spain
   Email: antonio.pastorperales@telefonica.com
