Data Collection Requirements and Technologies for Digital Twin Network
draft-zcz-nmrg-digitaltwin-data-collection-00

This is an older version of an Internet-Draft whose latest revision state is "Expired".
Authors Cheng Zhou , Danyang Chen , Pedro Martinez-Julia
Last updated 2022-07-10
Internet Research Task Force                                     C. Zhou
Internet-Draft                                                   D. Chen
Intended status: Informational                              China Mobile
Expires: 11 January 2023                          P. Martinez-Julia, Ed.
                                                                    NICT
                                                            10 July 2022

 Data Collection Requirements and Technologies for Digital Twin Network
             draft-zcz-nmrg-digitaltwin-data-collection-00

Abstract

   A Digital Twin Network is a network system composed of a Physical
   Network and a Twin Network, which are mapped to each other
   interactively in real time.  Constructing a Digital Twin Network
   requires real-time data from the Physical Network to update the
   state of the Twin Network.  This document describes the data
   collection requirements and provides data collection methods or
   tools to build the data repository for a digital twin network.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 11 January 2023.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Zhou, et al.             Expires 11 January 2023                [Page 1]
Internet-Draft            Network Working Group                July 2022

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Definitions and Acronyms  . . . . . . . . . . . . . . . . . .   3
   3.  Data Collection Requirements for Digital Twin Network . . . .   3
     3.1.  Target Driven and On-demand Collection  . . . . . . . . .   3
     3.2.  Diverse Tools for Various Data  . . . . . . . . . . . . .   4
     3.3.  Lightweight and Efficient Collection  . . . . . . . . . .   5
     3.4.  Open and Standardized Interfaces  . . . . . . . . . . . .   5
     3.5.  Naming for Caching  . . . . . . . . . . . . . . . . . . .   6
     3.6.  Efficient Multi-Destination Delivery  . . . . . . . . . .   6
   4.  An Efficient Data Collection Method for Digital Twin
           Network . . . . . . . . . . . . . . . . . . . . . . . . .   6
     4.1.  Overview  . . . . . . . . . . . . . . . . . . . . . . . .   6
     4.2.  Efficient Data Collection Mechanism . . . . . . . . . . .   6
     4.3.  Data Collection Process . . . . . . . . . . . . . . . . .   8
   5.  Summary . . . . . . . . . . . . . . . . . . . . . . . . . . .   9
   6.  Security Considerations . . . . . . . . . . . . . . . . . . .  10
   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  10
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  10
     8.1.  Normative References  . . . . . . . . . . . . . . . . . .  10
     8.2.  Informative References  . . . . . . . . . . . . . . . . .  10
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  10

1.  Introduction

   With the deployment of the Internet of Things (IoT), cloud computing,
   data centers, etc., the scale of current networks keeps growing.
   This growth also increases network complexity and induces many
   problems.  In order to improve the autonomy of the network and
   reduce potential negative effects on physical and virtual networks,
   we consider an endogenously intelligent and autonomous network
   architecture that achieves self-optimization and self-decision (in
   general, self-management and self-operation) to be indispensable.
   Digital twin technology addresses the challenge of building self-
   management systems because it can optimize and validate policies
   through real-time and interactive mapping with physical entities
   [I-D.irtf-nmrg-network-digital-twin-arch].

   Data is the cornerstone for constructing a digital twin of a
   network, namely a Digital Twin Network (DTN).  At large network
   scale, data collection, storage, and management face great
   challenges.  Data collection methods and tools should therefore be
   target-driven, diverse, lightweight, and efficient, while being open
   and standardized.  Among these requirements, a lightweight and
   efficient data collection method is the most important.  If
   full-data collection is adopted, huge storage space and bandwidth
   are needed, especially in complex scenarios that require real-time
   data and traffic from multi-source and heterogeneous devices.
   Therefore, it is extremely important to agree on lightweight and
   efficient data collection, aggregation, and correlation methods as
   the basis for the telemetry data transmission, processing, and
   storage required to build a DTN system.

2.  Definitions and Acronyms

   PN: Physical Network

   IMC: Instruction Management Center

   DSC: Data Storage Center

   DTN: Digital Twin Network

   TSE: Telemetry Streaming Element

   RDF: Resource Description Framework

   CEP: Complex Event Processing

3.  Data Collection Requirements for Digital Twin Network

3.1.  Target Driven and On-demand Collection

   The monitoring data of a network is the basis to build a DTN system.
   Such data is collected from physical and virtual networks.  It
   includes, but is not limited to, the following types:

   *  Provisional and operational status of physical or virtual devices,
      as well as the network topology with all network elements.

   *  Running status of physical, logical, or virtual ports and links.

   *  Log and event records of all the network elements.

   *  Statistics (packet loss, traffic throughput, latency, etc.) of
      flows and ports.

   *  Various data regarding users and services.

   *  Life-cycle operation data of all network elements.

   *  All above data in time series.

   The collection of network data for maintaining a DTN should be
   target-driven and on-demand.  It is not always necessary to collect
   all the network data listed above, because of the high cost in
   resources (CPU, memory, bandwidth, etc.).  The type, frequency, and
   method of data collection depend on the specific network topology
   and on the requirements of the applications that the DTN serves.
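
   As an illustration only, the target-driven and on-demand mode can be
   sketched as a collection plan that is derived from the targets that
   are currently active.  The target names, data type labels, method
   labels, and periods below are hypothetical and are not defined by
   this document.

```python
from dataclasses import dataclass

# Hypothetical labels for the data types listed in this section.
CATALOGUE = {
    "device_status", "topology", "port_status", "logs",
    "flow_statistics", "user_service_data", "lifecycle_data",
}


@dataclass(frozen=True)
class CollectionTask:
    data_type: str   # one of the CATALOGUE labels
    period_s: float  # collection period in seconds
    method: str      # illustrative label only, e.g. "telemetry"


# Hypothetical targets: each one names only the data it needs.
TARGETS = {
    "capacity_planning": [
        CollectionTask("topology", 3600.0, "netconf"),
        CollectionTask("flow_statistics", 60.0, "telemetry"),
    ],
    "fault_diagnosis": [
        CollectionTask("port_status", 1.0, "telemetry"),
        CollectionTask("logs", 10.0, "telemetry"),
        CollectionTask("flow_statistics", 10.0, "telemetry"),
    ],
}


def build_plan(active_targets):
    """Merge the tasks of the active targets.

    When two targets need the same data type, the shortest period wins,
    so the data is collected once at the most demanding rate.
    """
    plan = {}
    for target in active_targets:
        for task in TARGETS[target]:
            if task.data_type not in CATALOGUE:
                raise ValueError(f"unknown data type: {task.data_type}")
            current = plan.get(task.data_type)
            if current is None or task.period_s < current.period_s:
                plan[task.data_type] = task
    return plan


plan = build_plan(["capacity_planning", "fault_diagnosis"])
for task in plan.values():
    print(task.data_type, task.period_s, task.method)
```

   Data types that no active target requests (for example, the
   hypothetical "user_service_data") never enter the plan, which is the
   intent of on-demand collection.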

3.2.  Diverse Tools for Various Data

   The different types of network data used to maintain a DTN have
   different characteristics.  Some data (e.g., port statistics, key
   link information, etc.) requires a higher collection frequency, and
   some data (e.g., flow status, link faults, etc.) must be delivered
   closer to real time.  Some data (e.g., device status, port
   statistics, etc.) can be collected directly and simply with common
   tools, while other data (e.g., per-flow latency, traffic matrix,
   etc.) can only be acquired through complex network measurement.
   Therefore, multiple tools or methods are needed to collect the
   massive data required to build the DTN entity.

   Currently, widely used tools such as SNMP, NETCONF, Telemetry, INT
   (In-band Network Telemetry), and DPI (Deep Packet Inspection) are
   candidates for collecting data for a digital twin network.  Going
   forward, it is necessary to study new data collection technologies
   in the following areas, in combination with the data requirements
   of the network applications of the DTN:

   *  High-performance data collection technology based on programmable
      circuits.

   *  Measurement methods for complex network data such as network
      performance and network traffic.

   *  Collaborative data collection technology for multiple data
      sources.

   *  Distributed and collaborative data collection technology for
      complex networks, including the time synchronization of data
      acquisition.

3.3.  Lightweight and Efficient Collection

   Data collection tools and methods should be as lightweight as
   possible, so as to reduce the use of network equipment resources
   and ensure that data collection does not affect the normal operation
   of the network.  The major requirements are listed below.

   *  Data collection tools and methods need to execute efficiently and
      reduce the cost of computing, storage, and communication
      bandwidth.

   *  The collection of redundant data should be avoided or minimized.

   *  For the data set that needs to be collected, data compression
      should be used to reduce the resource cost of the collection
      phase.
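
   A minimal sketch of the last two requirements follows.  It assumes
   that samples are dictionaries with hypothetical "device", "metric",
   "value", and "t" fields, drops samples whose value has not changed,
   and compresses the remainder with the DEFLATE implementation in the
   Python standard library.  Other redundancy criteria and compression
   schemes are equally possible.

```python
import json
import zlib


def suppress_unchanged(samples):
    """Keep a sample only if its value differs from the last kept one
    for the same (device, metric) pair."""
    last = {}
    kept = []
    for sample in samples:
        key = (sample["device"], sample["metric"])
        if key not in last or last[key] != sample["value"]:
            kept.append(sample)
            last[key] = sample["value"]
    return kept


def pack(samples):
    """Serialize compactly and compress before transmission."""
    raw = json.dumps(samples, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 6)


def unpack(blob):
    """Inverse of pack(), as run by the receiver."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# 100 identical status reports followed by one state change.
samples = [
    {"device": "r1", "metric": "if_oper_status", "value": "up", "t": t}
    for t in range(100)
]
samples.append(
    {"device": "r1", "metric": "if_oper_status", "value": "down", "t": 100}
)

reduced = suppress_unchanged(samples)
blob = pack(reduced)
print(len(samples), "samples reduced to", len(reduced))
print("payload bytes:", len(blob))
```

   Change suppression discards the timestamps of the dropped samples,
   so it only suits data for which the time of the state change is what
   matters.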

3.4.  Open and Standardized Interfaces

   The data collection interfaces used to build the DTN should be open
   and standardized, to help avoid hardware or software vendor lock-in
   and to achieve interoperability.  The major requirements of data
   collection interfaces are:

   *  Support configuration management, including the data collection
      protocol, frequency or period, etc.

   *  Support several speed options (e.g. minute-level, 10-second level,
      second level (near real time), and real time level) to accommodate
      different data requirements from applications.

   *  Be extensible so that more features can be added with limited
      parameter changes and with backward compatibility.

   *  Provide a secure and reliable information exchange mechanism.
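
   The following sketch shows one way such an interface configuration
   could be modelled.  The class, field, and key names are assumptions
   made for illustration; this document does not define a data model.
   Unknown keys are preserved rather than rejected, which is one simple
   way to stay extensible and backward compatible.

```python
from dataclasses import dataclass, field
from enum import Enum


class Speed(Enum):
    """Speed options named after the levels listed above.

    The value is the nominal period in seconds; 0.0 stands for
    real-time delivery without a fixed period.
    """
    MINUTE = 60.0
    TEN_SECOND = 10.0
    SECOND = 1.0
    REAL_TIME = 0.0


@dataclass
class CollectionConfig:
    protocol: str
    speed: Speed
    secure: bool = True
    extensions: dict = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw):
        """Build a configuration, keeping unknown keys as extensions."""
        known = {"protocol", "speed", "secure"}
        extra = {k: v for k, v in raw.items() if k not in known}
        return cls(
            protocol=raw["protocol"],
            speed=Speed[raw["speed"]],  # KeyError on an unknown level
            secure=raw.get("secure", True),
            extensions=extra,
        )


cfg = CollectionConfig.from_dict(
    {"protocol": "netconf", "speed": "TEN_SECOND", "sampling": "1/100"}
)
print(cfg.protocol, cfg.speed.name, cfg.speed.value, cfg.extensions)
```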

3.5.  Naming for Caching

   Both raw network data and knowledge items obtained from monitoring
   must be uniquely addressable.  This means giving each data or
   knowledge item a unique identifier, or "name", that references it.
   Caching mechanisms use this name to store the item and to provide it
   to clients, which use the same name in their requests.
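
   As a sketch, a name can be built deterministically from attributes
   of the item so that the producer, the caches, and the clients all
   derive the same name.  The hierarchical name format and the cache
   class below are hypothetical; this document does not prescribe a
   naming scheme.

```python
import hashlib


def item_name(source, data_type, window_start, window_end):
    """Build a deterministic name for a data or knowledge item.

    The "/dtn/<source>/<type>/<start>-<end>" layout is an assumption
    made for this example.
    """
    return f"/dtn/{source}/{data_type}/{window_start}-{window_end}"


class NamedCache:
    """In-memory cache keyed by item name."""

    def __init__(self):
        self._items = {}

    def put(self, name, payload):
        """Store the payload and return its SHA-256 digest."""
        digest = hashlib.sha256(payload).hexdigest()
        self._items[name] = (digest, payload)
        return digest

    def get(self, name):
        """Return the payload stored under the name, or None."""
        entry = self._items.get(name)
        return None if entry is None else entry[1]


cache = NamedCache()
name = item_name("r1", "port_statistics", 1657411200, 1657411260)
cache.put(name, b'{"rx_bytes": 1024}')

# A client that knows the same attributes derives the same name.
same = item_name("r1", "port_statistics", 1657411200, 1657411260)
print(same, cache.get(same))
```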

3.6.  Efficient Multi-Destination Delivery

   The maintenance of DTN systems will not be the sole purpose of
   communicating monitoring information and knowledge.  Other
   applications may also request raw telemetry data or knowledge items,
   using the name to identify them.  The telemetry system, following
   the recommendations of RFC 9232 [RFC9232], will deliver the
   requested data or knowledge items to the requesters as efficiently
   as possible.  On the one hand, items will be provided by the cache
   closest to the destination of the data.  On the other hand, items
   will be replicated in the best nodes, following an efficient
   multicast spanning tree.  Different underlying protocols can be used
   to achieve this mechanism.
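
   The "closest cache" part of this behaviour can be illustrated with a
   breadth-first search over the topology, using hop count as the
   distance.  The topology, the node names, and the choice of hop count
   as the metric are assumptions for this example; replication along a
   multicast tree is not covered by it.

```python
from collections import deque


def closest_cache(graph, requester, holders):
    """Return (node, hops) for the nearest node that holds the item.

    graph maps each node to its neighbours; holders is the set of
    nodes that cache the named item.  Returns (None, None) when no
    holder is reachable.
    """
    seen = {requester}
    queue = deque([(requester, 0)])
    while queue:
        node, hops = queue.popleft()
        if node in holders:
            return node, hops
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None, None


# Hypothetical topology: two applications, two caches, one origin.
graph = {
    "app1": ["c1"],
    "c1": ["app1", "core"],
    "core": ["c1", "c2", "origin"],
    "c2": ["core", "app2"],
    "app2": ["c2"],
    "origin": ["core"],
}
holders = {"c2", "origin"}

print(closest_cache(graph, "app2", holders))
print(closest_cache(graph, "app1", holders))
```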

4.  An Efficient Data Collection Method for Digital Twin Network

4.1.  Overview

   The system that manages the DTN maps the PN to the DTN in real time.
   However, existing methods collect the full data from the PN for
   modeling and do not consider problems such as time lag, insufficient
   storage resources, low computational efficiency, and the waste of
   bandwidth caused by data transmission.  To solve these problems,
   this section introduces an efficient data collection method for
   maintaining the DTN.  The method is based on sending instructions to
   the elements of the PN so that they pre-process the data (data
   cleaning or knowledge representation) before sending it back to be
   applied to the DTN.

4.2.  Efficient Data Collection Mechanism

   The management system structure consists of the PN and the DTN.  The
   PN includes multiple Data Storage Centers (DSCs) and a Telemetry
   Streaming Element (TSE), and the DTN includes the Instruction
   Management Center (IMC) and a Data Storage Center (DSC).  The TSE has
   multiple functions, including data collection, data aggregation, data
   correlation, knowledge representation, and query.  In addition, a
   Complex Event Processing (CEP) engine is integrated into the TSE to
   run queries on the streamed data.  The IMC has two functions.  On
   the one hand, it manages the registration of the DSCs on the PN
   side.  The registration information can include key information
   such as the IP address of the DSC on the PN side, the chosen data
   type, the index names in the data, the data source name, and the
   data size.  On the other hand, it adaptively configures data
   collection instructions according to the collection requirements of
   the DSC on the DTN side and looks up the IP addresses to which the
   instructions are sent.  The information carried by an instruction
   includes rule-based mathematical expressions, executable models in
   .exe format, dynamic collection frequency, parameter lists, program
   text files in .m format, text files with parameter configuration,
   and other types of files.  Instructions are flexible and
   programmable, and can be created, modified, combined, and deleted at
   any time according to requirements.  When the DSC on the DTN side
   requests data from the IMC, the IMC searches the registration
   database, which is indexed by critical information such as data type
   and data name, for the IP address of the matching DSC.  Functional
   instructions for data processing or knowledge representation are
   then configured according to the demand.  The DSC on the DTN side
   stores the effective information returned by the TSE after data
   processing and knowledge representation.
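
   The two IMC functions described above (registration management and
   instruction configuration) can be sketched as follows.  The class
   and field names, the expression syntax, and the addresses (taken
   from the documentation range 192.0.2.0/24) are assumptions made for
   illustration and do not define an interface.

```python
from dataclasses import dataclass, field


@dataclass
class Registration:
    ip: str          # address of the registering element
    data_type: str   # e.g. "port_statistics"
    source: str      # data source name
    size_bytes: int  # advertised data size


@dataclass
class Instruction:
    target_ip: str
    data_type: str
    expression: str  # rule-based expression, hypothetical syntax
    period_s: float  # dynamic collection frequency, as a period
    params: dict = field(default_factory=dict)


class InstructionManagementCenter:
    def __init__(self):
        self._registry = []

    def register(self, registration):
        """First function: keep the registration information."""
        self._registry.append(registration)

    def configure(self, data_type, expression, period_s, **params):
        """Second function: look up the addresses that registered the
        requested data type and build one instruction for each."""
        return [
            Instruction(reg.ip, data_type, expression, period_s,
                        dict(params))
            for reg in self._registry
            if reg.data_type == data_type
        ]


imc = InstructionManagementCenter()
imc.register(Registration("192.0.2.10", "port_statistics", "dsc-a", 1024))
imc.register(Registration("192.0.2.11", "logs", "dsc-b", 4096))
imc.register(Registration("192.0.2.12", "port_statistics", "dsc-c", 2048))

instructions = imc.configure(
    "port_statistics", "mean(rx_bytes)", 10.0, window=6
)
for instruction in instructions:
    print(instruction.target_ip, instruction.expression,
          instruction.params)
```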

   The DSC on the PN side has two functions.  On the one hand, it stores
   data of various types, such as performance indicators, operational
   status, logs, traffic scheduling, business requirements, etc.  On the
   other hand, it automatically parses the instructions sent by the
   TSE.  The operating environment of the instruction is then
   configured according to the needs of the instruction, and data
   processing or knowledge representation is performed based on the
   instruction.  Data processing mainly includes data cleaning, filling
   in missing data, normalization, conflict verification, etc.
   Knowledge representation refers to representing the original data
   as a data structure that can be used for efficient computation.
   Such a representation is closer to machine language, which is
   conducive to the rapid and accurate construction of the model.
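
   The data processing steps named above can be sketched for a single
   numeric series as follows.  The validity range, the forward-fill
   strategy, and min-max normalization are choices made for this
   example; this document does not mandate specific algorithms.  The
   sketch assumes that the series contains at least one valid value.

```python
def clean(values, low, high):
    """Data cleaning: mark out-of-range readings as missing (None)."""
    return [
        v if v is not None and low <= v <= high else None
        for v in values
    ]


def fill_missing(values):
    """Fill missing entries with the previous valid value.

    Leading gaps take the first valid value in the series.
    """
    last = next((v for v in values if v is not None), None)
    filled = []
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def normalize(values):
    """Min-max normalization to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Link utilization in percent; -5.0 is an invalid reading.
raw = [10.0, None, 30.0, -5.0, 50.0]
processed = normalize(fill_missing(clean(raw, 0.0, 100.0)))
print(processed)
```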

       +------------------------------+   +-----------------------+
       |   Physical  Network          |   |  Digital Twin Network |
       | +-----+    +-----+  +------+ |   |  +------+  +-------+  |
       | |     |    |     |  |      | |   |  |      |  |       |  |
       | | DSC |... | DSC |  | TSE  | |   |  |  IMC |  |  DSC  |  |
       | |     |    |     |  |      | |   |  |      |  |       |  |
       | +-+---+    +--+--+  +---+--+ |   |  +---+--+  +----+--+  |
       |   |           |         |    |   |      |          |     |
       +------------------------------+   +-----------------------+
           |           |         |               |          |
           | 1.1. Register       |               |          |
           +-----------+--------->               |          |
           |           |         |               |          |
           |           | 1.2. Register           |          |
           |           +--------->               |          |
           |           |         | 1.3. Register |          |
           |           |         +--------------->          |
           |           |         |             2. Data req. |
           |           |         |               <----------+
           |           |         | 3. Query and instruction |
           |           |         |    configuration         |
           |           |         |               +          |
           |           |         4. Send instructions       |
           |           |         <---------------+          |
           |           |         |               |          |
           |           |   5. Parse and execute  |          |
           |           |      instruction        |          |
           | 6. Data subscript.  |               |          |
           <---------------------+               |          |
           | 7. Knowledge        |               |          |
           |    representation   |               |          |
           |     8. Data pushing |               |          |
           +--------------------->               |          |
           |           | 9. Data aggregation and |          |
           |           |    correlation          |          |
           |           |         | 10. Send processed data  |
           |           |         +-------------------------->
           |           |         |               |          |

                     Figure 1: Data Collection Process

4.3.  Data Collection Process

   The specific process is as follows:

   *  The DSC on the PN side registers with the TSE.  The TSE registers
      with the IMC.  Both provide their IP addresses, the data type, the
      data source, the data size, etc.

   *  The DSC on the DTN side sends the data collection request to the
      IMC.

   *  According to the data collection request, the IMC queries the
      registration addressing information and configures the data
      processing instruction.

   *  The IMC on the DTN side sends the instruction that corresponds to
      the query result to the TSE.

   *  After receiving the instructions, the TSE parses and executes
      them.  The query function can be performed by the CEP engine,
      which receives all telemetry data and processes it with all the
      queries provided.

   *  The TSE sends a data subscription to the DSC on the PN side.

   *  The DSC on the PN side either represents the data semantically in
      RDF form or sends the data in raw form to the TSE, which then
      produces the semantic representation.

   *  The DSC on the PN side pushes the data or knowledge item to the
      TSE.

   *  The TSE aggregates and correlates the collected data or knowledge
      items and then, according to the actual needs, generates
      aggregated data or knowledge items.

   *  The TSE sends the resulting data or knowledge items to the DSC on
      the DTN side.
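
   The aggregation and correlation step performed by the TSE can be
   illustrated as follows.  The sketch assumes samples with
   hypothetical "dsc", "metric", "t", and "value" fields, averages each
   metric over fixed time windows across all DSCs, and pairs two
   metrics that fall into the same window.  Real deployments may use
   other aggregation functions and correlation keys.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, window_s):
    """Average each metric per time window across all DSCs.

    Returns a dict keyed by (window_start, metric).
    """
    buckets = defaultdict(list)
    for sample in samples:
        window = int(sample["t"] // window_s) * window_s
        buckets[(window, sample["metric"])].append(sample["value"])
    return {key: mean(values) for key, values in buckets.items()}


def correlate(aggregated, metric_a, metric_b):
    """Pair the values of two metrics that share a time window."""
    windows_a = {w for (w, m) in aggregated if m == metric_a}
    windows_b = {w for (w, m) in aggregated if m == metric_b}
    return [
        (w, aggregated[(w, metric_a)], aggregated[(w, metric_b)])
        for w in sorted(windows_a & windows_b)
    ]


samples = [
    {"dsc": "dsc1", "metric": "latency_ms", "t": 1, "value": 10.0},
    {"dsc": "dsc2", "metric": "latency_ms", "t": 4, "value": 20.0},
    {"dsc": "dsc1", "metric": "loss_pct", "t": 3, "value": 0.5},
    {"dsc": "dsc1", "metric": "latency_ms", "t": 12, "value": 30.0},
]

aggregated = aggregate(samples, 10)
pairs = correlate(aggregated, "latency_ms", "loss_pct")
print(pairs)
```

   Only the aggregated and correlated items, not the four raw samples,
   would then be sent to the DSC on the DTN side.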

5.  Summary

   This draft describes the requirements for data collection and
   provides data collection methods or tools to build the data
   repository for maintaining DTN systems.  These methods or tools
   should be target-driven, diverse, lightweight, and efficient, while
   being open and standardized.  Among these requirements, lightweight
   and efficient collection is the most important.  Thus, this draft
   provides a lightweight and efficient data collection method that is
   optimized for maintaining DTN systems.  Going forward, more methods
   (transformation and aggregation functions) and tools (solutions)
   will be studied to extend the contents of this draft.

6.  Security Considerations

   TBD.

7.  IANA Considerations

   This document has no requests to IANA.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC9232]  Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and
              A. Wang, "Network Telemetry Framework", RFC 9232,
              DOI 10.17487/RFC9232, May 2022,
              <https://www.rfc-editor.org/info/rfc9232>.

8.2.  Informative References

   [I-D.irtf-nmrg-network-digital-twin-arch]
              Zhou, C., Yang, H., Duan, X., Lopez, D., Pastor, A., Wu,
              Q., Boucadair, M., and C. Jacquenet, "Digital Twin
              Network: Concepts and Reference Architecture", Work in
              Progress, Internet-Draft, draft-irtf-nmrg-network-digital-
              twin-arch-00, 21 March 2022,
              <https://www.ietf.org/archive/id/draft-irtf-nmrg-network-
              digital-twin-arch-00.txt>.

Authors' Addresses

   Cheng Zhou
   China Mobile
   Beijing
   100053
   China
   Email: zhouchengyjy@chinamobile.com

   Danyang Chen
   China Mobile
   Beijing
   100053
   China
   Email: chendanyang@chinamobile.com

   Pedro Martinez-Julia (editor)
   NICT
   4-2-1, Nukui-Kitamachi, Koganei, Tokyo
   184-8795
   Japan
   Email: pedro@nict.go.jp
