Skip to main content

Knowledge Graphs for YANG-based Network Management
draft-marcas-nmop-knowledge-graph-yang-05

Document Type Active Internet-Draft (individual)
Authors Ignacio Dominguez Martinez-Casanueva , Lucía Cabanillas Rodríguez , Pedro Martinez-Julia
Last updated 2024-10-21
RFC stream (None)
Intended RFC status (None)
Formats
Additional resources GitHub Repository
Stream Stream state (No stream defined)
Consensus boilerplate Unknown
RFC Editor Note (None)
IESG IESG state I-D Exists
Telechat date (None)
Responsible AD (None)
Send notices to (None)
draft-marcas-nmop-knowledge-graph-yang-05
Network Management Operations                   I. D. Martinez-Casanueva
Internet-Draft                                             L. Cabanillas
Intended status: Informational                                Telefonica
Expires: 24 April 2025                              P. M. Martinez-Julia
                                                                    NICT
                                                         21 October 2024

           Knowledge Graphs for YANG-based Network Management
               draft-marcas-nmop-knowledge-graph-yang-05

Abstract

   The success of the YANG language and YANG-based protocols for
   managing the network has unlocked new opportunities in network
   analytics.  However, the wide heterogeneity of YANG models hinders
   the consumption and analysis of network data.  Besides, data encoding
   formats and transport protocols will differ depending on the network
   management protocol supported by the network device.  These
   challenges call for new data management paradigms that facilitate the
   discovery, understanding, integration and access to silos of
   heterogenous YANG data, abstracting from the complexities of the
   network devices.

   This document introduces the knowledge graph paradigm as a solution
   to this data management problem, with focus on YANG-based network
   management.  The document provides background on related topics such
   as ontologies and graph standards, and shares guidelines for
   implementing knowledge graphs from YANG data.

About This Document

   This note is to be removed before publishing as an RFC.

   The latest revision of this draft can be found at
   https://idomingu.github.io/knowledge-graph-yang/draft-marcas-
   knowledge-graph-yang.html.  Status information for this document may
   be found at https://datatracker.ietf.org/doc/draft-marcas-nmop-
   knowledge-graph-yang/.

   Discussion of this document takes place on the Network Management
   Operations Working Group mailing list (mailto:nmop@ietf.org), which
   is archived at https://mailarchive.ietf.org/arch/browse/nmop/.
   Subscribe at https://www.ietf.org/mailman/listinfo/nmop/.

   Source for this draft and an issue tracker can be found at
   https://github.com/idomingu/knowledge-graph-yang.

Martinez-Casanueva, et al.Expires 24 April 2025                 [Page 1]
Internet-Draft            knowledge-graph-yang              October 2024

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 24 April 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
   2.  Conventions and Definitions . . . . . . . . . . . . . . . . .   4
     2.1.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   4
     2.2.  Acronyms  . . . . . . . . . . . . . . . . . . . . . . . .   5
   3.  A Bief Introduction to Knowledge Graphs . . . . . . . . . . .   5
     3.1.  What is a Knowledge Graph?  . . . . . . . . . . . . . . .   5
     3.2.  Key Graph Standards . . . . . . . . . . . . . . . . . . .   6
     3.3.  Knowledge Objects . . . . . . . . . . . . . . . . . . . .   7
   4.  Knowledge Graph Construction (KGC)  . . . . . . . . . . . . .   7
     4.1.  Ontology Development  . . . . . . . . . . . . . . . . . .   7
       4.1.1.  Automatic knowledge Extraction from YANG Models . . .   7
       4.1.2.  Standard Development Methodologies  . . . . . . . . .   8
     4.2.  Knowledge Graph Construction Pipeline . . . . . . . . . .   9
       4.2.1.  Ingestion . . . . . . . . . . . . . . . . . . . . . .   9
       4.2.2.  Mapping . . . . . . . . . . . . . . . . . . . . . . .  10

Martinez-Casanueva, et al.Expires 24 April 2025                 [Page 2]
Internet-Draft            knowledge-graph-yang              October 2024

       4.2.3.  Materialization . . . . . . . . . . . . . . . . . . .  11
   5.  Knowledge Graph Applications  . . . . . . . . . . . . . . . .  11
   6.  Challenges  . . . . . . . . . . . . . . . . . . . . . . . . .  13
   7.  Security Considerations . . . . . . . . . . . . . . . . . . .  14
   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  15
   9.  Open Issues . . . . . . . . . . . . . . . . . . . . . . . . .  15
   10. References  . . . . . . . . . . . . . . . . . . . . . . . . .  15
     10.1.  Normative References . . . . . . . . . . . . . . . . . .  15
     10.2.  Informative References . . . . . . . . . . . . . . . . .  16
   Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . .  21
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  21

1.  Introduction

   The size and complexity of networks keeps increasing, thus the path
   towards enabling an autonomous network requires the combination of
   network telemetry mechanisms [RFC9232].  These mechanisms range from
   legacy protocols like SNMP to the recent model-driven telemetry (MDT)
   based on the YANG language [RFC7950] and network management protocols
   such as NETCONF [RFC6241], RESTCONF [RFC8040], or gNMI [GNMI].

   MDT, in particular, has drawn the attention of the network industry
   owing to the benefits of modeling configuration and status data of
   the network with a formal data modeling language like YANG.  However,
   since the inception of YANG, the network industry has experienced the
   massive creation of YANG data models developed by vendors, standards
   developing organizations (e.g., IETF), and consortia (e.g.,
   OpenConfig).  In turn, these data models target different abstraction
   layers of the network, namely, network elements, and network service
   [RFC8199].  Additionally, YANG data models may augment (or deviate
   from) other models to define new features (or remove or adjust
   existing ones) depending on the implementation.  In summary, this
   trend has resulted into a wide variety of independent YANG data
   models, hence, the creation of data silos in the network.  Refer to
   Sections 4.1 and 4.4 of [I-D.boucadair-nmop-rfc3535-20years-later]
   for a discussion on the fragmented YANG ecosystem and the integration
   complexity issues.

   Such amount and heterogeneity of YANG data models has hindered the
   collection and combination of network data for advanced network
   analytics.  The current landscape shows different YANG models
   referencing the same concepts in a different way.  For example, the
   IETF "ietf-interface" [RFC8343] and OpenConfig "openconfig-
   interfaces" [oc-interfaces] follow different structures and syntax,
   but both reference the same "interface" concept.

Martinez-Casanueva, et al.Expires 24 April 2025                 [Page 3]
Internet-Draft            knowledge-graph-yang              October 2024

   Similarly, there are YANG models, like Service Assurance [RFC9418],
   that convey semantic relationships with other concepts via
   identifiers.  Figure 1 depicts the YANG tree diagram [RFC8340] for a
   subservice augmentation where the leaf "device" hints a relationship
   between the "subservice" concept and the "device" concept.

        module: ietf-service-assurance-device

          augment /sain:subservices/sain:subservice/sain:parameter:
            +--rw parameters
              +--rw device    string

         Figure 1: YANG tree diagram of a subservice augmentation.

   The extraction of this hidden knowledge from YANG models would enable
   the integration of YANG data silos at a conceptual level, regardless
   of the physical implementation (i.e., the YANG schema, syntax, and
   encoding format).  In this regard, the knowledge graph is a promising
   technology that can link data silos based on common concepts like
   "device" that are captured in ontologies.  Besides, by transforming
   the YANG data into a graph structure the relationships between data
   silos are represented as first class citizens in the graph instead of
   "foreign keys" where the relationship is made implicit.  This
   document provides guidelines for building a knowledge graph for data
   sources based on the YANG language.

2.  Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.1.  Terminology

   This document defines the following terms:

   Data materialization: Technique that collects data from remote data
   source and persists a copy the data in a target data storage.  This
   process can also be seen as Extract-Transform-Load (ETL).

Martinez-Casanueva, et al.Expires 24 April 2025                 [Page 4]
Internet-Draft            knowledge-graph-yang              October 2024

   Data virtualization: Technique wherein an intermediate component
   (i.e., data virtualization layer) exposes data available in a remote
   data sources without creating an copy of the data.  The data
   virtualization layer keeps pointers to the original location of data,
   so when a data consumer asks for these data, the virtualization layer
   collects the data from the source and directly serves the data to the
   consumer.

   Ontology: Formal, shared representation of knowledge in a domain.

2.2.  Acronyms

   CQ: Competency Question

   ETL: Extract-Transform-Load

   KG: Knowledge Graph

   KGC: Knowledge Graph Construction

   LOT: Linked Open Terms

   LPG: Labelled Property Graph

   OWL: Web Ontology Language

   RDF: Resource Description Framework

   RDFS: RDF Schema

   RML: RDF Mapping Language

   SAREF: Smart Applications REFerence

   SHACL: Shapes Constraint Language

   W3C: World Wide Web Consortium

3.  A Bief Introduction to Knowledge Graphs

3.1.  What is a Knowledge Graph?

   A knowledge graph contains a collection of facts alongside what know
   we about them and represents following a graph structure.  Knowledge
   graphs enable a contextualized understanding of data as the data
   (e.g., the individuals, instance) travel with the meaning of the data
   themselves (e.g., the concepts, knowledge).  For example, a knowledge
   graph may contain data about an interface "eth0", but also, that an

Martinez-Casanueva, et al.Expires 24 April 2025                 [Page 5]
Internet-Draft            knowledge-graph-yang              October 2024

   interface can be physical or virtual, belongs to a network device,
   and has a name, description, and an MTU.

   To this end, knowledge graphs build upon on ontologies, which are
   explicit representations of conceptualizations in a specific domains.
   In other words, ontologies can be seen as representations of
   conceptual models following a formal logic that allows machines to
   understand and reason over them.  In this regard, a conceptual model
   model, also known as information model, may translate into different
   data models depending on the data source technology [RFC3444].

   By mapping data models (i.e., physical level) with the concepts
   represented in ontologies (i.e., conceptual level), we can find
   heterogenous datasets scattered in the network that reference common
   concepts such as "interface" or "device".  Based on this semantic
   mapping, in addition to the flexibility of the graph structure,
   knowledge graphs enable the integration of heterogenous data based
   their semantics is what knowledge graphs can deliver.

3.2.  Key Graph Standards

   The Resource Description Framework (RDF) [RDF] data model from the
   W3C Semantic Web has been considered as the standard graph data model
   given its maturity.  For that reason, most of the knowledge graph
   implementations have relied upon the RDF standard and other standards
   from the Semantic Web like RDF Schema (RDFS) [RDFS], Ontology
   Language (OWL) [OWL], Shapes Constraint Language (SHACL) [SHACL], and
   SPARQL [SPARQL].

   However, the late success of graph databases like Neo4j have proved
   the Labelled Property Graph (LPG) data model as an alternative for
   implementing knowledge graphs.  Aiming to bridge the gap between
   these two graph data models, the W3C RDF-Star working group is
   investigating evolving RDF to facilitate the representation of
   statement about statements.

   Similarly, the ETSI ISG CIM defined the NGSI-LD standard
   [ETSI-GS-CIM-009], which builds upon two:

   *  An NGSI-LD information model which derives from the Labeled
      Property Graph (LPG) model and grounds on the RDF for a semantic
      annotation of the data in the graph.

   *  The NGSI-LD API, which defines a REST API for building and
      interacting with the graph.

Martinez-Casanueva, et al.Expires 24 April 2025                 [Page 6]
Internet-Draft            knowledge-graph-yang              October 2024

3.3.  Knowledge Objects

   The intrinsic nature of knowledge graphs is to connect as much
   knowledge as possible within certain scope---time and/or space.
   However, not all processes and operations require whole knowledge
   graphs.  For instance, the communication of a piece of telemetry
   data, organized according to NTF [RFC9232], can be repreented as a
   subset of the knowledge graph of all measurements.

   A knowledge object, as defined in [EERVC], consists in a knowledge
   graph subset of an arbitrary size---from single atoms to tens or
   hundreds of triples---that is decorated with metadata to facilitate
   its contextualization.

   Knowledge objects are particularly well suited to enable entities
   that work with knowledge graphs to communicate to each other
   knowledge pieces, obtained from their knowledge graphs or newly
   created from other sources, such as monitoring.  It has been
   demonstrated in [SECDEP].

4.  Knowledge Graph Construction (KGC)

   The construction of a knowledge graph can be divided into two main
   activities: ontology development (Section 4.1) and knowledge graph
   construction pipeline (Section 4.2).

4.1.  Ontology Development

   Ontologies provide the formal representation of the conceptual models
   that capture the semantics of data, and building on this, the
   integration of data in the knowledge graph.  Ontologies can be
   developed following different techniques, ranging from manual to
   fully automated, depending on the characteristics of the data to be
   integrated in the knowledge graph (e.g., format or schema).

4.1.1.  Automatic knowledge Extraction from YANG Models

   The extraction of knowledge from YANG models can be automated, in
   particular, by analyzing YANG identities to generate controlled
   vocabularies and taxonomies.

   [RFC7950] defines a YANG identity as "globally unique, abstract, and
   untyped identity", therefore, a relation between a YANG identity and
   a concept is straightforward.  Additionally, YANG identities can
   inherit from other YANG identities via the "base" statement.  These
   ideas align with the notion of a taxonomy, where concepts are
   hierarchically linked with other concepts.

Martinez-Casanueva, et al.Expires 24 April 2025                 [Page 7]
Internet-Draft            knowledge-graph-yang              October 2024

   To support the creation of knowledge structures like taxonomies or
   thesauri, the W3C standardized the Simple Knowledge Organization
   System (SKOS).  In such ontology, a concept scheme comprises a set of
   concepts that can be linked with other concepts via hierarchical and
   associative relations.  Typically, a YANG model containing YANG
   identities can be represented as an instance of the
   "skos:ConceptScheme" class.  Next, all YANG identities included in a
   YANG model can be represented as "skos:Concept instances" that are
   contained in the concept scheme.  Lastly, those YANG identities that
   include the "base" statement, the respective SKOS concept will
   include a relation "skos:broader" whose range is the SKOS concept
   representing the parent YANG identity.

4.1.2.  Standard Development Methodologies

   Automating the extraction of all the knowledge from YANG models is
   impossible, and therefore, manual intervention from domain experts is
   required.  To ease this process, a recommended practice is to develop
   the ontology by following a standard methodology like Linked Open
   Terms (LOT) [Poveda-Villalon2022].

   LOT is an ontology development methodology that adopts best practices
   from agile software development.  The methodology has been widely
   used in European projects as well as in the creation of the ETSI
   SAREF ontology and its extensions.  Precisely, with SAREF Ontology
   ETSI tackled a similar problem in the scope of IoT, where there is a
   heterogeneous variety of standard data models and protocols.  The
   methodology iterates over a workflow of the following four
   activities:

   1.  ontology requirements specification

   2.  ontology implementation

   3.  ontology publication, and

   4.  ontology maintenance.

   The workflow starts with the specification of requirements that the
   ontology must fulfill.  To that aim, the methodology requires
   collecting knowledge from domain experts, but also by analyzing the
   data sources (e.g., network devices) and schemas for the data (e.g.,
   YANG models) to be ingested and integrated in the knowledge graph.
   LOT recommends several approaches such as competency questions (CQs),
   natural language statements, or tabular information inspired by
   METHONTOLOGY.

Martinez-Casanueva, et al.Expires 24 April 2025                 [Page 8]
Internet-Draft            knowledge-graph-yang              October 2024

4.2.  Knowledge Graph Construction Pipeline

   The construction of a knowledge graph is supported by a data pipeline
   that follows the archetypical Extract-Transform-Load (ETL), wherein
   the raw data is collected from the source(s), transformed, and
   finally, stored for consumption.  The knowledge graph creation
   pipeline can thus be split into multiple steps as depicted in
   Figure 2.

         +-----------+       +---------+       +-----------------+
         |           |       |         |       |                 |
         | Ingestion +------>| Mapping +------>| Materialization |
         |           | Raw   |         | RDF   |                 |
         +-----------+ data  +---------+ data  +--------+--------+
               ^      (YANG)                            |
          Raw  |                                        | RDF
          data |                                        | data
         (YANG)|                                        |
               |                                        v
         +-----+----+                             +-----------+
         |   Data   |                             | Knowledge |
         |  Source  |                             |   Graph   |
         | (device) |                             +-----------+
         +----------+

           Figure 2: High-level architecture of a Knowledge Graph
                           Construction Pipeline

   These steps are the following: ingestion, mapping, and
   materialization.

4.2.1.  Ingestion

   Represents the first step in the creation of the knowledge graph.
   This step is realized by means of collectors that ingest raw data
   from the selected data source.  These collectors implement data
   access protocols which are specific to the technology and type of the
   data source.  When it comes to network management protocols based on
   YANG, these protocols can be NETCONF [RFC6241], RESTCONF [RFC8040]
   and gNMI [GNMI].

   Two main types of data sources are identified based on the techniques
   used to ingest the data, namely, batch and streaming.  In the case of
   batch data sources data are pulled (once or periodically) from the
   data source.  This could be represented by queries sent to a YANG
   server like an SDN controller to fetch the network topology
   [RFC8345].

Martinez-Casanueva, et al.Expires 24 April 2025                 [Page 9]
Internet-Draft            knowledge-graph-yang              October 2024

   Regarding streaming data sources, the collector subscribes to a YANG
   server to receive notifications of YANG data periodically or upon
   changes in the data source (e.g., a network device whose interface
   goes down).  These subscriptions can be realized, either based on
   configuration or dynamically, using mechanisms like YANG Push
   [RFC8641].  But additionally, another common scenario is the use of
   message broker systems like Apache Kafka for decoupling the ingestion
   of streams of YANG data
   [I-D.netana-nmop-yang-message-broker-integration].  Hence, knowledge
   graph collectors could also support the ingestion of YANG data from
   these kinds of message brokers.

4.2.2.  Mapping

   This second step consists at receiving the raw data data from the
   Ingestion step.  Here, the raw data is mapped to the concepts capture
   in one or more ontologies.  By applying these mapping rules, the raw
   data is semantically annotated and transformed into RDF data.  These
   mappings can be declared using declarative languages like RDF Mapping
   Language (RML) [Iglesias-Molina2023].

   RML is a declarative language that is currently being standardized
   within the W3C Knowledge Graph Construction Community group [W3C-KGC]
   that allows for defining mappings rules for raw data encoded in semi-
   structured formats like XML or JSON.  The benefits of using a
   declarative language like RML are twofold: i) the engine that
   implements the RML rules is generic, thus the mappings rules are
   decoupled from the code; ii) the explicit representation of mapping
   and transformation rules as part of the knowledge graph provides data
   lineage insights that can greatly improve data quality and the
   troubleshooting of data pipelines.  RML is making progress towards
   becoming a standard, but support of additional YANG encoding formats
   like CBOR [RFC8949] or Protobuf remains a challenge.  The knowledge
   payload carried by CBOR and/or Protobuf is organized as knowledge
   objects transmitted by the mapping entities and received by the
   materialization entities.  The use of knowledge objects allows them
   to easily "cut" knowledge graphs into smaller pieces, transmit them,
   and "paste" and/or "glue" the pieces onto the destination knowledge
   graph.  Consistency is retained by making the same ontologies be used
   with the particular knowledge objects.

Martinez-Casanueva, et al.Expires 24 April 2025                [Page 10]
Internet-Draft            knowledge-graph-yang              October 2024

4.2.3.  Materialization

   This is the final step of the knowledge graph creation.  This step
   receives as an input the knowledge object that contains RDF data
   generated in the Mapping step, which has easily manageable semantic
   triples---or quadruples---, as well as metadata to contextualize them
   and facilicate the incorporation of the knwoledge to the local
   knowledge graph storage element.  At this point, the RDF data can be
   sent to an RDF triple store like Apache Jena Fuseki [Fuseki] for
   consumption via SPARQL.  But alternatively, this step may transform
   the RDF data into an LPG structure and store the resulting data in a
   graph database like Neoj4 [Neo4j].  Similarly, the RDF data could
   also be transformed into the ETSI NGSI-LD standard and stored in an
   NGSI-LD Context Broker.

5.  Knowledge Graph Applications

   Network performance KPIs:  The integration of data at different
      levels of abstraction in the network can facilitate the
      computation of network performance KPIs, such as throughput or
      packet loss ratio.  By integrating data silos such as the network
      topology with the status of network interfaces, a network
      analytics application could ask the knowledge graph to compute the
      throughput or packets loss ratio at a specific link in the
      network.

   Anomaly detection and incident management:  Projects like NORIA
      [Tailhardat2023] have demonstrated how knowledge graphs can help
      in the detection of anomalies in network systems.  This approach
      links data pertaining to different data silos like network
      infrastructure, logs, alarms, and ticketing.  In another example,
      the combination network topology data with data about network
      interface status, consumers of the knowledge graph can detect
      network anomalies like link fault because an network interface has
      been unexpectedly disabled but it was configured to be enabled.

   Service assurance:  A knowledge graph can enable the implementation
      of the service assurance for intent-based networking architecture
      defined in [RFC9417].  Precisely, this architecture, and the
      companion YANG data models from RFC 9418, define an assurance
      graph where dependencies among network services and their
      associated health and symptoms are captured.  All these data,
      which can be further linked with other data silos like network
      topology or network interface status, can be naturally integrated
      and represented in a knowledge graph.

   Network digital twin:  Knowledge graph are considered promising

Martinez-Casanueva, et al.Expires 24 April 2025                [Page 11]
Internet-Draft            knowledge-graph-yang              October 2024

      candidates for the realization of network digital twins
      [I-D.irtf-nmrg-network-digital-twin-arch].  Particularly, the
      benefits of using knowledge graphs to construct netwokr digital
      twins dedicated to he detection of network faults is demonstrated
      in [ANSA].  Services of such type benefit from the ability to
      integrate heterogenous silos of data, in combination with the
      explicit representation of the semantics of the data; making the
      knowledge graph a powerful technology for building and connecting
      multiple network digital twins.  In addition, the representation
      of concepts by means of ontologies, produces abstract
      representations of network digital twins, regardless of the
      complexities of the underlying technologies.  For instance, an
      abstract representation of a network topology Digital Map
      [I-D.havel-nmop-digital-map] in the knowledge graph can be
      translated into a descriptor or data model that is specific to the
      technology used (e.g., KNE, ContainerLab, or OSM).

   Evolution of YANG Catalog:  The flexibility and extensibility of
      knowledge graphs have made them a popular choice for implementing
      data catalogs.  The purpose of a data catalog is to provide
      consumers with a registry of datasets exposed by data sources
      where to find data of interest.  Additionally, these datasets can
      be linked to the (business) concepts that they refer to, so that
      consumers can search for datasets based on relevant concepts such
      as “interface”. Taking inspiration from these implementations, and
      building on a knowledge graph, the YANG Catalog could evolve
      towards a data catalog, where the YANG modules represent those
      datasets of interest.  The dependencies between YANG models
      (import, deviations, augments) can be naturally represented in the
      knowledge graph.  In turn, these YANG models can be linked with
      concepts that are represented in ontologies.  Additionally, these
      YANG models, can be combined with the implementation details of
      network devices yang lib augment
      [I-D.lincla-netconf-yang-library-augmentation] that could be part
      of an inventory [I-D.ietf-ivy-network-inventory-yang].

   Contextualized telemetry data:  Having context of how YANG telemetry
      data [I-D.ietf-opsawg-collected-data-manifest] is being collected
      can improve the understanding of the data for network analytics or
      closed-loop automation.  Knowledge graphs can help in this task by
      linking the collected data with: (i) the metadata that
      characterizes the platform producing the data; and (ii) the
      metadata that characterizes how and when the data were metered.
      As shown in [TKDP], the application of both knowledge graphs and
      knowledge objects to the management of telemetry data facilitates
      reducing the level of coupling of multiple tasks and overall
      operations, which enables the resolution of current network
      problems through the collaboration of different stakeholers.

Martinez-Casanueva, et al.Expires 24 April 2025                [Page 12]
Internet-Draft            knowledge-graph-yang              October 2024

   Artificial intelligence particularization to network management:  The
      application of knowledge graphs to telemetry data is particularly
      necessary for applying artificial intelligence (AI) methods to
      network management, as supported by [AINEMA].  Multiple AI
      elements can interoperate coherently when they make use of the
      same ontology, they manage several sub-graphs of a bigger
      knowledge graph, and they use knowledge objects to communicate
      knowledge-based messages to each other.  The benefit is
      demonstrated in typical network scenarios, as discussed in
      [EERVC].

6.  Challenges

   Ontology development:  Time-consuming task that requires skills in
      knowledge management and conceptual modeling.  Additionally,
      ontology developers should maintain a tight coordination with
      domain owners and ontology users.  Following a standard
      methodology like LOT provides guidance in the process but still,
      the development of the ontology requires manual work.  Tools that
      can produce or bootstrap ontologies from existing YANG data models
      in a semi-automatic, or even automatic, are desirable.  In this
      sense, the future release of the YANG language could be extended
      to facilitate this task at design time.  YANG data models could
      include explicit semantics in the data models, in the same way
      that JSON-LD [JSON-LD] or CSVW [CSVW] include metadata indicating
      which concepts from concepts are referenced by the data.  In the
      current version of YANG, this could be achieved at runtime using
      the YANG Metadata extension [RFC7952].  With this extension, YANG
      data models could include additional metadata to indicate the
      ontology concept a YANG data node is referring to, though this
      approach only works at runtime, and additionally, it would require
      augmenting existing YANG data models.

   Pipeline performance:  To integrate the raw data from the original
      source into the knowledge graph entails several steps as described
      before.  This steps add an extra latency before having the data
      stored in the knowledge graph for consumption.  This latency can
      be an important limitation for real-time analytics use cases.

   Scalability:  The knowledge graph must be able to integrate massive
      amounts of data collected from the network.  Distributed and
      federated architectures can improve the scalability of a global,
      composable knowledge graph.  However, these architectures add
      complexity to the management of knowledge graph as well as extra
      latency when federating requests.

   Virtualization:  The common approach for data integration is by

Martinez-Casanueva, et al.Expires 24 April 2025                [Page 13]
Internet-Draft            knowledge-graph-yang              October 2024

      materializing the data in the knowledge graph, which entails
      duplicating the data.  However, this approach presents multiple
      limitations in terms of data governance and data cadence.
      Regarding data governance, having copies of the original data
      hampers keeping track of all the available data.  With respect to
      data cadence, in particular for batch data sources, data are
      periodically pulled from the source at particular frequency, which
      might not be optimal depending on the use case.  In this sense,
      data virtualization introduces a new data access technique that
      can overcome these limitations.  With this technique, the
      knowledge graph defines pointers to the data at the original
      source, and the KGC pipeline performs the ingestion and mapping of
      the data, and eventually the delivery of data to the consumer,
      only when requested on demand.

   Network configuration:  This document has focused on integrating
      telemetry data in the knowledge graph for monitoring purposes.
      But knowledge graphs could also be leveraged for integrating data
      related to the configuration of devices and services in the
      network.  This approach could enable closed-loop network
      management since both configuration and operational data are
      stored in the knowledge graph.

7.  Security Considerations

   Access control to data:  The knowledge graph becomes an integrator of
      data, and, in many cases, sensible.  Therefore, data access
      control mechanisms must be present to ensure that only authorized
      consumers can discover and access data from the knowledge graph.
      Access control policies based on roles or attributes are common
      approaches, but additional aspects like sensitivity of data could
      be included in the policy.

   Integrity and authenticity of mappings:  The declaration of mappings
      of raw data to concepts in ontologies is a critical step in the
      knowledge graph construction.  Unauthorized mappings, or even
      tampered mappings, can lead to security breaches and anomalies
      producing a great impact on analytics and machine learning
      applications that consume data from the knowledge graph.  To
      protect consumers from these scenarios, the knowledge graph must
      include mechanisms that verify the correctness, authenticity, and
      integrity of the mappings used in the construction of the graph.
      Only data owners, as accountable of their data, should be
      authorized to define and deploy mappings for the knowledge graph
      construction.

   Data provenance:  Keeping track of the history of data as they go

Martinez-Casanueva, et al.Expires 24 April 2025                [Page 14]
Internet-Draft            knowledge-graph-yang              October 2024

      through the knowledge graph construction pipeline can improve the
      quality of the data of the knowledge graph.  As part of the
      knowledge graph construction, signatures can be appended to the
      data [I-D.lopez-opsawg-yang-provenance], can help in verifying
      that such data come from the golden data source, and therefore,
      that the data can be trusted.

8.  IANA Considerations

   This document has no IANA actions.

9.  Open Issues

   *  Should RML mappings reference data at the YANG level using XPath
      or subtree filters?  Or references should remain based on the
      actual encoding format used by the network management protocol,
      e.g., JSON, XML.

   *  Should this document provide guidelines for generating URIs of
      nodes/subjects in the knowledge graph?  Take into account there
      are several levels of abstraction device vs network/service level.
      For example, the URI that identifies a network interface cannot be
      generated only from the name of the interface as there could
      conflicts with other interfaces of other network devices having
      the same name.

   *  Definition of YANG data sources with formal vocabulary, similar to
      what Web of Things ontology has done for MQTT or REST APIs or D2RQ
      ontology for relational databases.  Having the specification of
      the data source in the knowledge graph improves provenance and
      decouples the configuration from the implementation, e.g., via
      custom INI config file.

   *  Implementations?  References to examples based on open-source
      implementations.  Integration with YANG-Push-Kafka architecture.
      Target future hackathons.

   *  Document focused on YANG data sources.  Should the document open
      the scope to other kinds of data sources like IPFIX?

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

Martinez-Casanueva, et al.Expires 24 April 2025                [Page 15]
Internet-Draft            knowledge-graph-yang              October 2024

   [RFC7950]  Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language",
              RFC 7950, DOI 10.17487/RFC7950, August 2016,
              <https://www.rfc-editor.org/rfc/rfc7950>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

10.2.  Informative References

   [AINEMA]   Pedro Martinez-Julia, Shunsuke Homma, Diego Lopez.,
              "Artificial Intelligence Framework for Network Management,
              draft-pedro-nmrg-ai-framework.", n.d.,
              <https://datatracker.ietf.org/doc/draft-pedro-nmrg-ai-
              framework/>.

   [ANSA]     Pedro Martinez-Julia, Ved P. Kafle, Hitoshi Asaeda.,
              "Application of Category Theory to Network Service Fault
              Detection. IEEE Open Journal of the Communications Society
              5 (2024): 4417-4443.", n.d..

   [CSVW]     "CSVW - CSV on the Web", n.d., <https://csvw.org>.

   [EERVC]    Pedro Martinez-Julia, Ved P. Kafle, Hiroaki Harai.,
              "Exploiting External Events for Resource Adaptation in
              Virtual Computer and Network Systems, IEEE Transactions on
              Network and Service Management 15 (2018): 555-566.", n.d..

   [ETSI-GS-CIM-009]
              "Context Information Management (CIM); NGSI-LD API", March
              2024, <https://www.etsi.org/deliver/etsi_gs/
              CIM/001_099/009/01.08.01_60/gs_CIM009v010801p.pdf>.

   [Fuseki]   Apache, "Apache Jena Fuseki", n.d.,
              <https://jena.apache.org/documentation/fuseki2/>.

   [GNMI]     OpenConfig, "gRPC Network Management Interface (gNMI)",
              n.d.,
              <https://github.com/openconfig/reference/blob/master/rpc/
              gnmi/gnmi-specification.md>.

Martinez-Casanueva, et al.Expires 24 April 2025                [Page 16]
Internet-Draft            knowledge-graph-yang              October 2024

   [I-D.boucadair-nmop-rfc3535-20years-later]
              Boucadair, M., Contreras, L. M., de Dios, O. G., Graf, T.,
              Rahman, R., and L. Tailhardat, "RFC 3535, 20 Years Later:
              An Update of Operators Requirements on Network Management
              Protocols and Modelling", Work in Progress, Internet-
              Draft, draft-boucadair-nmop-rfc3535-20years-later-05, 16
              October 2024, <https://datatracker.ietf.org/doc/html/
              draft-boucadair-nmop-rfc3535-20years-later-05>.

   [I-D.havel-nmop-digital-map]
              Havel, O., Claise, B., de Dios, O. G., Elhassany, A., and
              T. Graf, "Modeling the Digital Map based on RFC 8345:
              Sharing Experience and Perspectives", Work in Progress,
              Internet-Draft, draft-havel-nmop-digital-map-02, 21
              October 2024, <https://datatracker.ietf.org/doc/html/
              draft-havel-nmop-digital-map-02>.

   [I-D.ietf-ivy-network-inventory-yang]
              Yu, C., Belotti, S., Bouquier, J., Peruzzini, F., and P.
              Bedard, "A YANG Data Model for Network Inventory", Work in
              Progress, Internet-Draft, draft-ietf-ivy-network-
              inventory-yang-03, 7 July 2024,
              <https://datatracker.ietf.org/doc/html/draft-ietf-ivy-
              network-inventory-yang-03>.

   [I-D.ietf-opsawg-collected-data-manifest]
              Claise, B., Quilbeuf, J., Lopez, D., Martinez-Casanueva,
              I. D., and T. Graf, "A Data Manifest for Contextualized
              Telemetry Data", Work in Progress, Internet-Draft, draft-
              ietf-opsawg-collected-data-manifest-05, 21 October 2024,
              <https://datatracker.ietf.org/doc/html/draft-ietf-opsawg-
              collected-data-manifest-05>.

   [I-D.irtf-nmrg-network-digital-twin-arch]
              Zhou, C., Yang, H., Duan, X., Lopez, D., Pastor, A., Wu,
              Q., Boucadair, M., and C. Jacquenet, "Network Digital
              Twin: Concepts and Reference Architecture", Work in
              Progress, Internet-Draft, draft-irtf-nmrg-network-digital-
              twin-arch-08, 16 October 2024,
              <https://datatracker.ietf.org/doc/html/draft-irtf-nmrg-
              network-digital-twin-arch-08>.

Martinez-Casanueva, et al.Expires 24 April 2025                [Page 17]
Internet-Draft            knowledge-graph-yang              October 2024

   [I-D.lincla-netconf-yang-library-augmentation]
              Lin, Z., Claise, B., and I. D. Martinez-Casanueva,
              "Augmented-by Addition into the IETF-YANG-Library", Work
              in Progress, Internet-Draft, draft-lincla-netconf-yang-
              library-augmentation-01, 4 March 2024,
              <https://datatracker.ietf.org/doc/html/draft-lincla-
              netconf-yang-library-augmentation-01>.

   [I-D.lopez-opsawg-yang-provenance]
              Lopez, D., Pastor, A., Feng, A. H., Birkholz, H., and S.
              Garcia, "Applying COSE Signatures for YANG Data
              Provenance", Work in Progress, Internet-Draft, draft-
              lopez-opsawg-yang-provenance-03, 6 July 2024,
              <https://datatracker.ietf.org/doc/html/draft-lopez-opsawg-
              yang-provenance-03>.

   [I-D.netana-nmop-yang-message-broker-integration]
              Graf, T. and A. Elhassany, "An Architecture for YANG-Push
              to Message Broker Integration", Work in Progress,
              Internet-Draft, draft-netana-nmop-yang-message-broker-
              integration-00, 22 April 2024,
              <https://datatracker.ietf.org/doc/html/draft-netana-nmop-
              yang-message-broker-integration-00>.

   [Iglesias-Molina2023]
              Iglesias-Molina, A., "The RML Ontology: A Community-Driven
              Modular Redesign After a Decade of Experience in Mapping
              Heterogeneous Data to RDF", The Semantic Web – ISWC 2023 ,
              October 2023,
              <https://doi.org/10.1007/978-3-031-47243-5_9>.

   [JSON-LD]  W3C, "JSON-LD 1.1: A JSON-based Serialization for Linked
              Data", July 2020, <https://www.w3.org/TR/json-ld11/>.

   [Neo4j]    "rdflib-neo4j - RDFLib Store backed by neo4j", n.d.,
              <https://github.com/neo4j-labs/rdflib-neo4j>.

   [oc-interfaces]
              OpenConfig, "openconfig-interfaces modules", n.d.,
              <https://openconfig.net/projects/models/schemadocs/
              yangdoc/openconfig-interfaces.html>.

   [OWL]      W3C, "OWL 2 Web Ontology Language Document Overview
              (Second Edition)", December 2012,
              <https://www.w3.org/TR/owl2-overview/>.

Martinez-Casanueva, et al.Expires 24 April 2025                [Page 18]
Internet-Draft            knowledge-graph-yang              October 2024

   [Poveda-Villalon2022]
              Engineering Applications of Artificial Intelligence, "LOT:
              An industrial oriented ontology engineering framework",
              May 2022,
              <https://doi.org/10.1016/j.engappai.2022.104755>.

   [RDF]      W3C, "Resource Description Framework (RDF): Concepts and
              Abstract Syntax", February 2014,
              <https://www.w3.org/TR/rdf11-concepts/>.

   [RDFS]     W3C, "RDF Schema 1.1", February 2014,
              <https://www.w3.org/TR/rdf-schema/>.

   [RFC3444]  Pras, A. and J. Schoenwaelder, "On the Difference between
              Information Models and Data Models", RFC 3444,
              DOI 10.17487/RFC3444, January 2003,
              <https://www.rfc-editor.org/rfc/rfc3444>.

   [RFC6241]  Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed.,
              and A. Bierman, Ed., "Network Configuration Protocol
              (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011,
              <https://www.rfc-editor.org/rfc/rfc6241>.

   [RFC7952]  Lhotka, L., "Defining and Using Metadata with YANG",
              RFC 7952, DOI 10.17487/RFC7952, August 2016,
              <https://www.rfc-editor.org/rfc/rfc7952>.

   [RFC8040]  Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF
              Protocol", RFC 8040, DOI 10.17487/RFC8040, January 2017,
              <https://www.rfc-editor.org/rfc/rfc8040>.

   [RFC8199]  Bogdanovic, D., Claise, B., and C. Moberg, "YANG Module
              Classification", RFC 8199, DOI 10.17487/RFC8199, July
              2017, <https://www.rfc-editor.org/rfc/rfc8199>.

   [RFC8340]  Bjorklund, M. and L. Berger, Ed., "YANG Tree Diagrams",
              BCP 215, RFC 8340, DOI 10.17487/RFC8340, March 2018,
              <https://www.rfc-editor.org/rfc/rfc8340>.

   [RFC8343]  Bjorklund, M., "A YANG Data Model for Interface
              Management", RFC 8343, DOI 10.17487/RFC8343, March 2018,
              <https://www.rfc-editor.org/rfc/rfc8343>.

   [RFC8345]  Clemm, A., Medved, J., Varga, R., Bahadur, N.,
              Ananthakrishnan, H., and X. Liu, "A YANG Data Model for
              Network Topologies", RFC 8345, DOI 10.17487/RFC8345, March
              2018, <https://www.rfc-editor.org/rfc/rfc8345>.

Martinez-Casanueva, et al.Expires 24 April 2025                [Page 19]
Internet-Draft            knowledge-graph-yang              October 2024

   [RFC8641]  Clemm, A. and E. Voit, "Subscription to YANG Notifications
              for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641,
              September 2019, <https://www.rfc-editor.org/rfc/rfc8641>.

   [RFC8949]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", STD 94, RFC 8949,
              DOI 10.17487/RFC8949, December 2020,
              <https://www.rfc-editor.org/rfc/rfc8949>.

   [RFC9232]  Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and
              A. Wang, "Network Telemetry Framework", RFC 9232,
              DOI 10.17487/RFC9232, May 2022,
              <https://www.rfc-editor.org/rfc/rfc9232>.

   [RFC9417]  Claise, B., Quilbeuf, J., Lopez, D., Voyer, D., and T.
              Arumugam, "Service Assurance for Intent-Based Networking
              Architecture", RFC 9417, DOI 10.17487/RFC9417, July 2023,
              <https://www.rfc-editor.org/rfc/rfc9417>.

   [RFC9418]  Claise, B., Quilbeuf, J., Lucente, P., Fasano, P., and T.
              Arumugam, "A YANG Data Model for Service Assurance",
              RFC 9418, DOI 10.17487/RFC9418, July 2023,
              <https://www.rfc-editor.org/rfc/rfc9418>.

   [SECDEP]   Ana Hermosilla, Jose Manuel Manjón-Cáliz, Pedro Martinez-
              Julia, Antonio Pastor, Jordi Ortiz, Diego R. Lopez,
              Antonio Skarmeta., "Secure deployment of third-party
              applications over 5G-NFV ML-empowered infrastructures, the
              7th International Conference on Mobile Internet Security
              (MobiSec '23), Dec 19-21, 2023, Okinawa, Japan.", n.d..

   [SHACL]    W3C, "Shapes Constraint Language (SHACL)", July 2017,
              <https://www.w3.org/TR/shacl/>.

   [SPARQL]   W3C, "SPARQL 1.1 Query Language", March 2013,
              <https://www.w3.org/TR/sparql11-query/>.

   [Tailhardat2023]
              KGCW’23: 4th International Workshop on Knowledge Graph
              Construction, "Designing NORIA: a Knowledge Graph-based
              Platform for Anomaly Detection and Incident Management in
              ICT Systems", May 2023,
              <https://ceur-ws.org/Vol-3471/paper3.pdf>.

Martinez-Casanueva, et al.Expires 24 April 2025                [Page 20]
Internet-Draft            knowledge-graph-yang              October 2024

   [TKDP]     Pedro Martinez-Julia, Ved P. Kafle, Hitoshi Asaeda.,
              "Telemetry Knowledge Distributed Processing for Network
              Digital Twins and Network Resilience. NOMS 2023-2023 IEEE/
              IFIP Network Operations and Management Symposium (2023):
              1-6.", n.d..

   [W3C-KGC]  W3C, "Knowledge Graph Construction Community Group", n.d.,
              <https://www.w3.org/community/kg-construct/>.

Acknowledgments

   This document is based on work partially funded by the EU Horizon
   Europe projects aerOS (grant agreement no. 101069732) and ROBUST-6G
   (grant agreement no. 101139068).

   The authors would like to thank Med, Benoit, Lionel, and Thomas for
   their review and valuable comments.

Authors' Addresses

   Ignacio Dominguez Martinez-Casanueva
   Telefonica
   Email: ignacio.dominguezmartinez@telefonica.com

   Lucia Cabanillas
   Telefonica
   Email: lucia.cabanillasrodriguez@telefonica.com

   Pedro
   NICT
   Email: pedro@nict.go.jp

Martinez-Casanueva, et al.Expires 24 April 2025                [Page 21]