Semantic Metadata Annotation for Network Anomaly Detection
draft-netana-opsawg-nmrg-network-anomaly-semantics-00

The information below is for an old version of the document.
Document Type: This is an older version of an Internet-Draft whose latest revision state is "Replaced".
Authors: Thomas Graf, Wanting Du, Alex Huang Feng
Last updated: 2023-10-23
Replaced by: draft-netana-nmop-network-anomaly-semantics
Network Working Group                                            T. Graf
Internet-Draft                                                     W. Du
Intended status: Experimental                                   Swisscom
Expires: 25 April 2024                                     A. Huang Feng
                                                               INSA-Lyon
                                                         23 October 2023

       Semantic Metadata Annotation for Network Anomaly Detection
         draft-netana-opsawg-nmrg-network-anomaly-semantics-00

Abstract

   This document explains why and how semantic metadata annotation helps
   to test and validate outlier detection, supports supervised and semi-
   supervised machine learning development, and makes anomalies
   comprehensible to humans.  The proposed semantics unify the exchange
   of network anomaly data between and among operators and vendors so
   that they can improve their network outlier detection systems.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 25 April 2024.

Graf, et al.              Expires 25 April 2024                 [Page 1]
Internet-Draft          Network Anomaly Semantics           October 2023

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Outlier Detection
   3.  Data Mesh
   4.  Observed Symptoms
   5.  Semantic Metadata
     5.1.  Overview of the Model
   6.  Security Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Network Anomaly Detection Architecture [Ahf23] provides an overall
   introduction to how anomaly detection is applied in the IP network
   domain and which operational data is needed.  It approaches the
   problem space by automating what a Network Engineer would normally
   do when verifying a network connectivity service: monitoring from
   different network plane perspectives to understand whether one
   network plane affects another negatively.

   In order to fine-tune outlier detection, the results provided as
   analytical data need to be reviewed by a Network Engineer.  This
   keeps the human out of the monitoring but still involves them in
   the alert verification loop.

   This document describes what information a Network Engineer needs to
   understand the output of the outlier detection.  At the same time,
   this information is semantically structured so that it can be used
   to test outlier detection by comparing results systematically and to
   set a baseline for supervised machine learning, which requires
   labeled operational data.

2.  Outlier Detection

   Outlier Detection, also known as anomaly detection, describes a
   systematic approach to identify rare data points deviating
   significantly from the majority.  Outliers are commonly classified in
   three categories:

   Global outliers:  A data point is considered a global outlier if its
      value is far outside the entirety of a data set.  For example,
      the average dropped packet count is between 0 and 10 per minute
      during a one-week observation, and the observed global outlier is
      100000 packets.

   Contextual outliers:  A data point is considered a contextual outlier
      if its value significantly deviates from the rest of the data
      points in the same time series context.  For example, the
      forwarded packet volume in a time series changes during the day
      like an oscillation curve, and the observed contextual packet
      volume outlier is outside the oscillation curve at that moment in
      time.  At another time the same value could be considered normal.

   Collective outliers:  A subset of data points within a data set is
      considered anomalous if those values as a collection deviate
      significantly from the entire data set, but the values of the
      individual data points are not themselves anomalous in either a
      contextual or global sense.  In Network Telemetry time series, one
      way this can manifest is that the number of network path and
      interface state changes coincides with the time range in which
      the forwarded packet volume decreases as a group.

   For each outlier a score between 0 and 1 is calculated.  The higher
   the value, the higher the probability that the observed data point
   is an outlier.  Anomaly detection: A survey [VAP09] gives additional
   details on anomaly detection and its types.
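
   The difference between global and contextual outliers, and a score
   between 0 and 1, can be illustrated with a minimal Python sketch.
   The scoring method (a z-score squashed with z / (1 + z)), the
   function names and the sample values are assumptions made for
   illustration only; this document does not prescribe how the score is
   computed, and collective outliers are not covered here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def squash(z):
    """Map a non-negative z-score into [0, 1); larger means more anomalous."""
    return z / (1.0 + z)


def global_scores(values):
    """Score every point against the mean and spread of the whole data set."""
    mu, sigma = mean(values), pstdev(values)
    return [squash(abs(v - mu) / sigma) if sigma else 0.0 for v in values]


def contextual_scores(values, contexts):
    """Score every point against the points sharing its context label."""
    groups = defaultdict(list)
    for value, context in zip(values, contexts):
        groups[context].append(value)
    scores = []
    for value, context in zip(values, contexts):
        mu, sigma = mean(groups[context]), pstdev(groups[context])
        scores.append(squash(abs(value - mu) / sigma) if sigma else 0.0)
    return scores


# Dropped packets per minute: one value far outside the whole data set.
drops = [3, 5, 2, 7, 4, 100000, 6, 1]
drop_scores = global_scores(drops)

# Forwarded packet volume: 1000 is normal by day but unusual at night.
volume = [100, 110, 90, 105, 95, 1000, 1000, 1010, 990, 1005, 995, 1000]
period = ["night"] * 6 + ["day"] * 6
volume_global = global_scores(volume)
volume_contextual = contextual_scores(volume, period)
```

   In this sketch the value 1000 receives the same global score at
   night and during the day, while the contextual score singles out the
   night-time occurrence.  A production system would likely use a more
   robust baseline than a plain mean and standard deviation.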

3.  Data Mesh

   The Data Mesh [Deh22] Architecture distinguishes between operational
   and analytical data.  Operational data refers to data collected from
   operational systems, while analytical data refers to insights gained
   from operational data.

   In terms of network observability, the semantics of operational
   network metrics are defined by the IETF and are categorized, as
   described in the Network Telemetry Framework [RFC9232], into the
   following three network planes:

   Management Plane:  Time series data describing the state changes and
      statistics of a network node and its components.  For example,
      interface state and statistics modeled in ietf-interfaces.yang
      [RFC8343].

   Control Plane:  Time series data describing the state and state
      changes of network reachability.  For example, BGP VPNv6 unicast
      updates and withdrawals exported in BGP Monitoring Protocol (BMP)
      [RFC7854] and modeled in BGP [RFC4364].

   Forwarding Plane:  Time series data describing the forwarding
      behavior of packets and their data-plane context.  For example,
      dropped packet count modeled in the IPFIX entities
      forwardingStatus(IE89) [RFC7270] and packetDeltaCount(IE2)
      [RFC5102] and exported with IPFIX [RFC7011].
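
   As an illustration of the Forwarding Plane example above, the
   following Python sketch derives a dropped packet count from
   forwardingStatus and packetDeltaCount.  It relies on the layout
   given in [RFC7270], where the two most significant bits of the
   forwardingStatus octet carry the status (unknown, forwarded, dropped,
   consumed) and the remaining six bits carry a reason code.  The
   dictionary-based record format and the function names are
   hypothetical, and the value is assumed to be already decoded into a
   one-octet integer.

```python
from enum import IntEnum


class ForwardingStatus(IntEnum):
    """Status carried in the two most significant bits of the octet."""
    UNKNOWN = 0
    FORWARDED = 1
    DROPPED = 2
    CONSUMED = 3


def decode_status(forwarding_status):
    """Return the status class; the lower six bits (reason code) are ignored."""
    return ForwardingStatus((forwarding_status >> 6) & 0b11)


def dropped_packets(flow_records):
    """Sum packetDeltaCount over all records whose status is DROPPED."""
    return sum(
        record["packetDeltaCount"]
        for record in flow_records
        if decode_status(record["forwardingStatus"]) is ForwardingStatus.DROPPED
    )


# Hypothetical, already-decoded flow records (not a wire format).
records = [
    {"forwardingStatus": 64, "packetDeltaCount": 1200},   # 0b01...... forwarded
    {"forwardingStatus": 131, "packetDeltaCount": 40},    # 0b10...... dropped
    {"forwardingStatus": 129, "packetDeltaCount": 2},     # 0b10...... dropped
    {"forwardingStatus": 192, "packetDeltaCount": 7},     # 0b11...... consumed
]
total_dropped = dropped_packets(records)
```

   A per-minute series of such totals is the kind of Forwarding Plane
   time series an outlier detection system could consume.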

   In terms of network observability, the semantics of analytical data
   refer to incident notifications or service level indicators.  For
   example, the incident notification described in Section 7.2 of
   [I-D.feng-opsawg-incident-management], the health status and symptoms
   described in Service Assurance for Intent-Based Networking
   [RFC9418], the precision availability metrics defined in
   [I-D.ietf-ippm-pam], or network anomalies and their symptoms as
   described in this document.

4.  Observed Symptoms

   In this section observed network symptoms are specified and
   categorized according to the following scheme:

   Action:  Which action the network node performed for a packet in the
      Forwarding Plane, for a path or adjacency in the Control Plane, or
      for state or statistical changes in the Management Plane.  For
      the Forwarding Plane we distinguish between missing, where the
      drop occurred outside the measured network node, and drop and
      on-path delay, which were measured on the network node.  For the
      Control Plane we distinguish between reachability, which refers
      to a change in the routing or forwarding information base
      (RIB/FIB), and adjacency, which refers to a change in peering or
      link-layer resolution.  For the Management Plane we refer to
      state or statistical changes on interfaces.

   Reason:  For each action, one or more reasons describing why this
      action was used.  For drops in the Forwarding Plane we
      distinguish between Unreachable, because network layer
      reachability information was missing; Administered, because an
      administrator configured a rule preventing the forwarding of this
      packet; and Corrupt, where the network node was unable to
      determine where to forward to due to a packet, software or
      hardware error.  For on-path delay we distinguish between
      minimum, mean and maximum delay for a given flow.

   Relation:  For each reason, one or more relations describing the
      cause why the action was chosen.  A relation can refer to a
      network plane entity, a packet, or a control-plane or node-
      administered instruction.

   Table 1 consolidates for the forwarding plane a list of common
   symptoms with their actions, reasons and relations.

            +=========+==============+========================+
            | Action  | Reason       | Relation               |
            +=========+==============+========================+
            | Missing | Previous     | Time                   |
            +---------+--------------+------------------------+
            | Drop    | Unreachable  | next-hop               |
            +---------+--------------+------------------------+
            | Drop    | Unreachable  | link-layer             |
            +---------+--------------+------------------------+
            | Drop    | Unreachable  | Time To Live expired   |
            +---------+--------------+------------------------+
            | Drop    | Unreachable  | Fragmentation needed   |
            |         |              | and Don't Fragment set |
            +---------+--------------+------------------------+
            | Drop    | Administered | Access-List            |
            +---------+--------------+------------------------+
            | Drop    | Administered | Unicast Reverse Path   |
            |         |              | Forwarding             |
            +---------+--------------+------------------------+
            | Drop    | Administered | Discard Route          |
            +---------+--------------+------------------------+
            | Drop    | Administered | Policed                |
            +---------+--------------+------------------------+
            | Drop    | Administered | Shaped                 |
            +---------+--------------+------------------------+
            | Drop    | Corrupt      | Bad Packet             |
            +---------+--------------+------------------------+
            | Drop    | Corrupt      | Bad Egress Interface   |
            +---------+--------------+------------------------+
            | Delay   | Min          | -                      |
            +---------+--------------+------------------------+
            | Delay   | Mean         | -                      |
            +---------+--------------+------------------------+
            | Delay   | Max          | -                      |
            +---------+--------------+------------------------+

              Table 1: Describing Symptoms and their Actions,
                  Reason and Relation for Forwarding Plane

   Table 2 consolidates for the control plane a list of common symptoms
   with their actions, reasons and relations.

                +==============+=============+============+
                | Action       | Reason      | Relation   |
                +==============+=============+============+
                | Reachability | Update      | Imported   |
                +--------------+-------------+------------+
                | Reachability | Update      | Received   |
                +--------------+-------------+------------+
                | Reachability | Withdraw    | Received   |
                +--------------+-------------+------------+
                | Reachability | Withdraw    | Peer Down  |
                +--------------+-------------+------------+
                | Adjacency    | Established | Peer       |
                +--------------+-------------+------------+
                | Adjacency    | Established | Link-Layer |
                +--------------+-------------+------------+
                | Adjacency    | Torn Down   | Peer       |
                +--------------+-------------+------------+
                | Adjacency    | Torn Down   | Link-Layer |
                +--------------+-------------+------------+

                   Table 2: Describing Symptoms and their
                      Actions, Reason and Relation for
                               Control Plane

   Table 3 consolidates for the management plane a list of common
   symptoms with their actions, reasons and relations.

               +===========+==================+============+
               | Action    | Reason           | Relation   |
               +===========+==================+============+
               | Interface | Up               | Link-Layer |
               +-----------+------------------+------------+
               | Interface | Down             | Link-Layer |
               +-----------+------------------+------------+
               | Interface | Errors           | -          |
               +-----------+------------------+------------+
               | Interface | Discards         | -          |
               +-----------+------------------+------------+
               | Interface | Unknown Protocol | -          |
               +-----------+------------------+------------+

                   Table 3: Describing Symptoms and their
                      Actions, Reason and Relation for
                              Management Plane
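
   The Action, Reason and Relation scheme of Tables 1 to 3 can be
   represented as a small data structure, which makes it possible to
   check whether an annotated symptom matches one of the listed rows.
   The following Python sketch is one possible encoding; the class and
   function names are hypothetical, and the representation as tuples is
   an assumption rather than part of the proposed model.

```python
from dataclasses import dataclass
from enum import Enum


class Plane(Enum):
    FORWARDING = "forwarding"
    CONTROL = "control"
    MANAGEMENT = "management"


@dataclass(frozen=True)
class Symptom:
    plane: Plane
    action: str
    reason: str
    relation: str = "-"  # "-" mirrors the tables: no relation listed


# (action, reason, relation) rows following Tables 1 to 3.
ROWS = {
    Plane.FORWARDING: [
        ("Missing", "Previous", "Time"),
        ("Drop", "Unreachable", "next-hop"),
        ("Drop", "Unreachable", "link-layer"),
        ("Drop", "Unreachable", "Time To Live expired"),
        ("Drop", "Unreachable", "Fragmentation needed and Don't Fragment set"),
        ("Drop", "Administered", "Access-List"),
        ("Drop", "Administered", "Unicast Reverse Path Forwarding"),
        ("Drop", "Administered", "Discard Route"),
        ("Drop", "Administered", "Policed"),
        ("Drop", "Administered", "Shaped"),
        ("Drop", "Corrupt", "Bad Packet"),
        ("Drop", "Corrupt", "Bad Egress Interface"),
        ("Delay", "Min", "-"),
        ("Delay", "Mean", "-"),
        ("Delay", "Max", "-"),
    ],
    Plane.CONTROL: [
        ("Reachability", "Update", "Imported"),
        ("Reachability", "Update", "Received"),
        ("Reachability", "Withdraw", "Received"),
        ("Reachability", "Withdraw", "Peer Down"),
        ("Adjacency", "Established", "Peer"),
        ("Adjacency", "Established", "Link-Layer"),
        ("Adjacency", "Torn Down", "Peer"),
        ("Adjacency", "Torn Down", "Link-Layer"),
    ],
    Plane.MANAGEMENT: [
        ("Interface", "Up", "Link-Layer"),
        ("Interface", "Down", "Link-Layer"),
        ("Interface", "Errors", "-"),
        ("Interface", "Discards", "-"),
        ("Interface", "Unknown Protocol", "-"),
    ],
}

KNOWN_SYMPTOMS = frozenset(
    Symptom(plane, *row) for plane, rows in ROWS.items() for row in rows
)


def is_known(symptom):
    """Return True if the symptom matches a row of Tables 1 to 3."""
    return symptom in KNOWN_SYMPTOMS
```

   For instance, is_known(Symptom(Plane.FORWARDING, "Drop",
   "Administered", "Access-List")) holds, whereas a forwarding plane
   action such as "Drop" combined with Plane.CONTROL does not match any
   row.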

5.  Semantic Metadata

   Metadata adds context to data.  In networks, an example is the
   software version of the network node from which management plane
   metrics are obtained, as described in
   [I-D.claise-opsawg-collected-data-manifest].  Semantic Metadata, in
   contrast, describes the meaning or ontology of the annotated data.
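
   One way annotated data could support the testing of outlier
   detection mentioned in the Introduction is to compare detector
   output against labels confirmed by a Network Engineer.  The Python
   sketch below computes precision and recall per observation window.
   All field and function names are hypothetical and are not taken from
   the ietf-anomaly-detection-semantic-metadata module, whose content is
   not yet specified in this version of the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical label a Network Engineer attaches to an observation."""
    window_start: int   # start of the observation window (epoch seconds)
    plane: str          # e.g. "forwarding"
    action: str         # e.g. "Drop"
    reason: str         # e.g. "Unreachable"
    relation: str       # e.g. "next-hop"
    confirmed: bool     # True if the engineer confirmed a real anomaly


def precision_recall(detected_windows, annotations):
    """Compare detector output with confirmed annotations per time window."""
    truth = {a.window_start for a in annotations if a.confirmed}
    detected = set(detected_windows)
    true_positives = len(detected & truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Three annotated windows, two of them confirmed as anomalies.
labels = [
    Annotation(0, "forwarding", "Drop", "Unreachable", "next-hop", True),
    Annotation(60, "forwarding", "Drop", "Administered", "Policed", False),
    Annotation(120, "control", "Reachability", "Withdraw", "Peer Down", True),
]
# Windows flagged by a hypothetical detector under test.
precision, recall = precision_recall([0, 60, 180], labels)
```

   Here one of the three flagged windows is a confirmed anomaly and one
   of the two confirmed anomalies is found, so different detectors or
   parameter settings can be compared on the same labeled data.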

5.1.  Overview of the Model

   Figure 1 contains the YANG tree diagram [RFC8340] of the ietf-
   anomaly-detection-semantic-metadata module.

   module: ietf-anomaly-detection-semantic-metadata

      Figure 1: YANG tree diagram for ietf-anomaly-detection-semantic-
                                  metadata

   Describe YANG module

6.  Security Considerations

   The security considerations.

7.  Acknowledgements

   The authors would like to thank xxx for their review and valuable
   comments.

8.  References

8.1.  Normative References

   [Ahf23]    Huang Feng, A., "Daisy: Practical Anomaly Detection in
              large BGP/MPLS and BGP/SRv6 VPN Networks", IETF 117,
              Applied Networking Research Workshop,
              DOI 10.1145/3606464.3606470, July 2023,
              <https://anrw23.hotcrp.com/doc/anrw23-paper8.pdf>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC8340]  Bjorklund, M. and L. Berger, Ed., "YANG Tree Diagrams",
              BCP 215, RFC 8340, DOI 10.17487/RFC8340, March 2018,
              <https://www.rfc-editor.org/info/rfc8340>.

   [RFC9232]  Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and
              A. Wang, "Network Telemetry Framework", RFC 9232,
              DOI 10.17487/RFC9232, May 2022,
              <https://www.rfc-editor.org/info/rfc9232>.

8.2.  Informative References

   [Deh22]    Dehghani, Z., "Data Mesh", O'Reilly Media,
              ISBN 9781492092391, March 2022,
              <https://www.oreilly.com/library/view/data-
              mesh/9781492092384/>.

   [I-D.claise-opsawg-collected-data-manifest]
              Claise, B., Quilbeuf, J., Lopez, D., Martinez-Casanueva,
              I. D., and T. Graf, "A Data Manifest for Contextualized
              Telemetry Data", Work in Progress, Internet-Draft, draft-
              claise-opsawg-collected-data-manifest-06, 10 March 2023,
              <https://datatracker.ietf.org/doc/html/draft-claise-
              opsawg-collected-data-manifest-06>.

   [I-D.feng-opsawg-incident-management]
              Feng, C., Hu, T., Contreras, L. M., Graf, T., Wu, Q., Yu,
              C., and N. Davis, "Incident Management for Network
              Services", Work in Progress, Internet-Draft, draft-feng-
              opsawg-incident-management-02, 21 October 2023,
              <https://datatracker.ietf.org/doc/html/draft-feng-opsawg-
              incident-management-02>.

   [I-D.ietf-ippm-pam]
              Mirsky, G., Halpern, J. M., Min, X., Clemm, A., Strassner,
              J., and J. François, "Precision Availability Metrics for
              Services Governed by Service Level Objectives (SLOs)",
              Work in Progress, Internet-Draft, draft-ietf-ippm-pam-08,
              18 October 2023, <https://datatracker.ietf.org/doc/html/
              draft-ietf-ippm-pam-08>.

   [I-D.ietf-opsawg-ipfix-on-path-telemetry]
              Graf, T., Claise, B., and A. H. Feng, "Export of On-Path
              Delay in IPFIX", Work in Progress, Internet-Draft, draft-
              ietf-opsawg-ipfix-on-path-telemetry-04, 6 July 2023,
              <https://datatracker.ietf.org/doc/html/draft-ietf-opsawg-
              ipfix-on-path-telemetry-04>.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
              2006, <https://www.rfc-editor.org/info/rfc4364>.

   [RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
              Meyer, "Information Model for IP Flow Information Export",
              RFC 5102, DOI 10.17487/RFC5102, January 2008,
              <https://www.rfc-editor.org/info/rfc5102>.

   [RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
              "Specification of the IP Flow Information Export (IPFIX)
              Protocol for the Exchange of Flow Information", STD 77,
              RFC 7011, DOI 10.17487/RFC7011, September 2013,
              <https://www.rfc-editor.org/info/rfc7011>.

   [RFC7270]  Yourtchenko, A., Aitken, P., and B. Claise, "Cisco-
              Specific Information Elements Reused in IP Flow
              Information Export (IPFIX)", RFC 7270,
              DOI 10.17487/RFC7270, June 2014,
              <https://www.rfc-editor.org/info/rfc7270>.

   [RFC7854]  Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP
              Monitoring Protocol (BMP)", RFC 7854,
              DOI 10.17487/RFC7854, June 2016,
              <https://www.rfc-editor.org/info/rfc7854>.

   [RFC8343]  Bjorklund, M., "A YANG Data Model for Interface
              Management", RFC 8343, DOI 10.17487/RFC8343, March 2018,
              <https://www.rfc-editor.org/info/rfc8343>.

   [RFC9418]  Claise, B., Quilbeuf, J., Lucente, P., Fasano, P., and T.
              Arumugam, "A YANG Data Model for Service Assurance",
              RFC 9418, DOI 10.17487/RFC9418, July 2023,
              <https://www.rfc-editor.org/info/rfc9418>.

   [VAP09]    Chandola, V., Banerjee, A., and V. Kumar, "Anomaly
              detection: A survey", ACM Computing Surveys, Vol. 41,
              Issue 3, DOI 10.1145/1541880.1541882, July 2009,
              <https://www.researchgate.net/
              publication/220565847_Anomaly_Detection_A_Survey>.

Authors' Addresses

   Thomas Graf
   Swisscom
   Binzring 17
   CH-8045 Zurich
   Switzerland
   Email: thomas.graf@swisscom.com

   Wanting Du
   Swisscom
   Binzring 17
   CH-8045 Zurich
   Switzerland
   Email: wanting.du@swisscom.com

   Alex Huang Feng
   INSA-Lyon
   Lyon
   France
   Email: alex.huang-feng@insa-lyon.fr
