Incident Management for Network Services
draft-feng-opsawg-incident-management-00

The information below is for an old version of the document.
Document	Type	This is an older version of an Internet-Draft whose latest revision state is "Replaced".
	Authors	Chong Feng , Tong Hu , Luis M. Contreras , Qin Wu , Chaode Yu
	Last updated	2023-03-13
	Replaced by	draft-feng-nmop-network-incident-yang
	RFC stream	(None)
Stream	Stream state	(No stream defined)
	Consensus boilerplate	Unknown
	RFC Editor Note	(None)
IESG	IESG state	I-D Exists
	Telechat date	(None)
	Responsible AD	(None)
	Send notices to	(None)
OPSAWG                                                      C. Feng, Ed.
Internet-Draft                                                    Huawei
Intended status: Standards Track                                   T. Hu
Expires: 14 September 2023                                          CMCC
                                                           LM. Contreras
                                                          Telefonica I+D
                                                                   Q. Wu
                                                                   C. Yu
                                                                  Huawei
                                                           13 March 2023

                Incident Management for Network Services
                draft-feng-opsawg-incident-management-00

Abstract

   This document provides an architecture for an incident management
   system and specifies the related functional interface requirements.

   This document also defines a YANG module to support incident
   lifecycle management.  This YANG module is meant to provide a
   standard way to report, diagnose, and resolve incidents in order to
   enhance network services.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 14 September 2023.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Feng, et al.            Expires 14 September 2023               [Page 1]
Internet-Draft             Incident Management                March 2023

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Sample Use Cases
     3.1.  Incident-Based Trouble Tickets dispatching
     3.2.  Fault Locating
     3.3.  Fault Labelling
     3.4.  Energy Conservation
   4.  Incident Management Architecture
   5.  Functional Interface Requirements between the Client and the Agent
     5.1.  Incident Detection
     5.2.  Incident Diagnosis
     5.3.  Incident Resolution
   6.  Incident Data Model Concepts
     6.1.  Identifying the Incident Instance
     6.2.  The Incident Lifecycle
       6.2.1.  Incident Instance Lifecycle
       6.2.2.  Operator Incident Lifecycle
   7.  Incident Data Model
     7.1.  Overview
     7.2.  Incident Notifications
     7.3.  Incident Acknowledge
     7.4.  Incident Diagnose
     7.5.  Incident Resolution
   8.  Incident Management YANG Module
   9.  IANA Considerations
     9.1.  The "IETF XML" Registry
     9.2.  The "YANG Module Names" Registry
   10. Security Considerations
   11. Contributors
   12. Acknowledgments
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses

1.  Introduction

   In networking infrastructures, performance management and fault
   management are typically used separately for monitoring and
   troubleshooting.  Metrics, alarms, and transaction operations are
   usually monitored centrally, and incident tickets are triggered
   accordingly.  The YANG [RFC7950] data model for alarm management
   [RFC8632] defines a standard interface for alarm management.

   The data model for Network and VPN Service Performance Monitoring
   [I-D.opsawg-yang-vpn-service-pm] defines a standard interface for
   performance management.  In addition, the distributed tracing
   mechanism defined in [W3C-Trace-Context] can be used to follow,
   analyze, and debug operations, such as configuration transactions,
   across multiple distributed systems.

   However, the alarm-centric solution described in [RFC8632], the
   performance-centric solution described in
   [I-D.opsawg-yang-vpn-service-pm], and the trace-context-centric
   solution each rely on data-source-specific information and on the
   experience of maintenance engineers, and they fall short when these
   sources are tracked separately in different management systems.  For
   example, the frequency and quantity of alarms reported to the
   Operations Support System (OSS) have increased dramatically (in many
   cases by multiple orders of magnitude) with the growth in service
   types and complexity.  These alarms are hard to aggregate in a single
   domain together with key performance metrics and various events and
   notifications; they overwhelm OSS platforms and result in low
   processing efficiency, inaccurate root cause identification, and
   duplicated tickets.

   Network models, from the device level up through the connection and
   service layers, usually follow existing standards.  When a failure
   occurs on a network device, correlated alarms may appear at the upper
   layers.  In theory, a series of alarms can therefore be compressed
   into fewer incidents, and traditional troubleshooting already relies
   on this correlation.  However, the traditional approach is
   time-consuming and labor-intensive, which reduces efficiency, and it
   depends heavily on the experience of maintenance engineers.
   Moreover, investigating some faults also requires other data, such as
   topology or performance data.  This complicates both network
   troubleshooting and the correlation of alarms with network services,
   making it difficult to assess the impact of alarms on network
   services.

   To address these challenges, an incident-centric solution is
   proposed, which also supports cross-domain and cross-layer root cause
   analysis and network troubleshooting.  A network incident refers to
   an unexpected interruption of a network service, degradation of
   network service quality, or sub-health of a network service, whereas
   an alarm, as described in [RFC8632], represents an undesirable state
   in a resource that requires corrective action.  An alarm is always
   reported when a network resource behaves unexpectedly, while an
   incident is reported only when network services are affected.  For
   example, device-level symptoms (e.g., CPU overload) as defined in
   [I-D.opsawg-service-assurance-yang], or root cause alarms, can be
   used to generate and report incidents when a network service is in a
   sub-health state or is degraded.  An incident may be triggered by the
   aggregation and analysis of multiple alarms or other network
   anomalies: for example, when an interface goes down, the protocols
   running over it fail to work properly and, as a result, the network
   service becomes unavailable.  An incident may also be raised through
   the analysis of network performance metrics: for example, when the
   delay or packet loss rate exceeds its threshold, the network service
   is degraded.

   Artificial Intelligence (AI) and Machine Learning (ML) play an
   important role in processing large amounts of data with complex
   correlations.  For example, neural network algorithms or hierarchical
   aggregation algorithms can replace manual alarm correlation.  Through
   online and offline learning, these algorithms can be continuously
   optimized to improve the efficiency of fault diagnosis.

   This document defines the concepts, requirements, and architecture of
   incident management.  It also defines a YANG data model for incident
   lifecycle management, which aims to improve troubleshooting
   efficiency, help ensure network service quality, and advance network
   automation [RFC8969].

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The following term is defined in [RFC8632] and is not redefined here:

   *  alarm

   The following terms are defined in this document:

   Incident:  An unexpected interruption of a network service,
      degradation of network service quality, or sub-health of a network
      service.

   Incident management:  Lifecycle management of incidents, including
      incident identification, reporting, acknowledgement, diagnosis,
      and resolution.

   Incident management system:  An entity that implements incident
      management.  It includes an incident management agent and an
      incident management client.

   Incident management agent:  An entity that provides incident
      management functions.  For example, it can detect incidents and
      perform incident diagnosis, resolution, and prediction.

   Incident management client:  An entity that manages incidents.  For
      example, it can receive incident notifications, query incident
      information, and instruct the incident management agent to
      diagnose or resolve incidents.

3.  Sample Use Cases

3.1.  Incident-Based Trouble Tickets dispatching

   Currently, trouble tickets are mostly dispatched on the basis of
   alarms.  At some operators, maintenance engineers monitor alarms,
   identify those that may be linked to the same fault, and then assign
   these alarms to the same trouble ticket.  This process has a low
   level of automation, and its human cost grows with the number of
   alarms.

   Some operators preset whitelists and adopt coarse-grained association
   rules for alarm management.  This appears to improve the automation
   of fault management.  However, if the filtering conditions are too
   tight, some trouble tickets may be missed; if they are too loose,
   multiple trouble tickets may be dispatched for the same fault.

   With traditional working practices, it is hard to strike a balance
   between automation and duplicated trouble tickets.  With incident
   management, however, large numbers of alarms can be aggregated into a
   few incidents, so that far fewer trouble tickets need to be
   dispatched, while high accuracy and a high level of automation are
   maintained.  This could address this pain point of traditional
   trouble ticket dispatching.
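
   The ticket saving can be illustrated with a minimal sketch that
   compares per-alarm and per-incident dispatching.  It assumes the
   incident management agent has already associated each alarm with an
   incident; the Alarm type, its fields, and the sample data are
   hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    # Hypothetical alarm record: its identifier and the incident the
    # incident management agent has aggregated it into.
    alarm_id: str
    incident_id: str


def tickets_per_alarm(alarms: list[Alarm]) -> list[list[str]]:
    # Traditional dispatching: one trouble ticket for every alarm.
    return [[a.alarm_id] for a in alarms]


def tickets_per_incident(alarms: list[Alarm]) -> list[list[str]]:
    # Incident-based dispatching: alarms sharing an incident share one
    # ticket; dicts keep incidents in first-seen order.
    grouped: dict[str, list[str]] = {}
    for a in alarms:
        grouped.setdefault(a.incident_id, []).append(a.alarm_id)
    return list(grouped.values())


alarms = [
    Alarm("alarm1", "incident1"),
    Alarm("alarm2", "incident1"),
    Alarm("alarm3", "incident1"),
    Alarm("alarm4", "incident2"),
]
print(len(tickets_per_alarm(alarms)))     # 4
print(len(tickets_per_incident(alarms)))  # 2
```

   With the four sample alarms, per-alarm dispatching opens four
   tickets, while per-incident dispatching opens two.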

3.2.  Fault Locating

   Currently, to isolate and locate faults, maintenance experts need to
   combine topology data and service data with a huge amount of alarm
   data for analysis.  Sometimes they also need cooperation from
   construction engineers working on site, who make repair attempts on
   devices, after which the root cause must be investigated further.

   For example, for a common cable interruption, maintenance experts
   need to identify the root cause alarm among a massive number of
   alarms and then trace it to the faulty span segment by segment.
   Next, site engineers perform tests at the source station to locate
   the interruption and identify the faulty optical exchange station,
   and then travel to that station to replace or splice fibers.
   Throughout the process, multiple people are needed both on and off
   site.

   With incident management, the system can automatically locate the
   faulty span, eliminating the need for manual analysis.  In
   cooperation with the Optical Time-Domain Reflectometer (OTDR)
   integrated in the equipment, the target optical exchange station can
   be determined before any site visit, which saves time and avoids
   repeated site visits.
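
   The following minimal sketch shows how an OTDR reading could narrow a
   site visit: it maps a break distance, measured from the source
   station, onto the span between two optical exchange stations.  The
   route, station names, and span lengths are hypothetical, and a real
   system would also need to account for factors such as fiber slack.

```python
import bisect

# Hypothetical route: stations in order from the source station, and
# the fiber length (km) of each span between neighbouring stations.
stations = ["Source", "OXS-A", "OXS-B", "OXS-C", "Sink"]
span_km = [12.0, 8.5, 20.0, 5.5]


def locate_break(otdr_km: float) -> tuple[str, str, float]:
    """Map an OTDR break distance to (span start, span end, km into span)."""
    # Cumulative distance of each station from the source station.
    cumulative = [0.0]
    for length in span_km:
        cumulative.append(cumulative[-1] + length)
    if not 0.0 <= otdr_km <= cumulative[-1]:
        raise ValueError("break distance lies outside the route")
    # A break exactly at a station is attributed to the following span;
    # a break at the far end is attributed to the last span.
    i = min(bisect.bisect_right(cumulative, otdr_km) - 1, len(span_km) - 1)
    return stations[i], stations[i + 1], otdr_km - cumulative[i]


print(locate_break(15.0))  # ('OXS-A', 'OXS-B', 3.0)
```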

3.3.  Fault Labelling

   Fiber cutover is a common maintenance scenario for operators.  During
   the cutover process, maintenance experts must identify the affected
   devices based on the cutover object and their own experience.  They
   mark these devices to remind other maintenance engineers that no
   trouble tickets need to be dispatched for them until the cutover
   ends.

   However, relying on human experience is error-prone: some affected
   devices may be left unmarked, and some unaffected devices may be
   marked by mistake.  Unmarked devices cause unnecessary trouble
   tickets to be dispatched during the cutover, while wrongly marked
   devices cause faults unrelated to the cutover to be missed.

   With incident management, maintenance experts only need to mark the
   cutover objects, not the devices that would be affected.  Because it
   aggregates alarms and knows the relationship between root cause
   alarms and correlated alarms, the fault management system can
   automatically identify the correlated alarms and refrain from
   dispatching trouble tickets for the affected devices.
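
   A minimal sketch of this labelling logic, assuming the fault
   management system already provides a mapping from each alarm to its
   root cause resource; the alarm names, resource names, and the mapping
   itself are hypothetical.

```python
# Hypothetical correlation result: alarm -> root cause resource.
root_cause_of = {
    "alarm-los-ne1": "fiber-17",
    "alarm-los-ne2": "fiber-17",
    "alarm-bgp-ne3": "fiber-17",
    "alarm-fan-ne4": "fan-ne4",
}


def tickets_to_dispatch(alarms: list[str], cutover_objects: set[str]) -> list[str]:
    """Return the alarms that still need a trouble ticket.

    Only cutover objects are marked; an alarm whose root cause is one of
    them is treated as an expected consequence of the cutover.  An alarm
    with no known correlation is treated as its own root cause."""
    return [a for a in alarms if root_cause_of.get(a, a) not in cutover_objects]


print(tickets_to_dispatch(list(root_cause_of), {"fiber-17"}))
# ['alarm-fan-ne4']
```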

3.4.  Energy Conservation

   Following the global trend toward energy conservation, emission
   reduction, and safety management, more and more enterprises have
   adopted measures such as turning off power during non-working hours.
   However, this proactive power-off periodically generates a large
   number of alarms on the network, and traditional operations and
   management systems cannot effectively distinguish such false faults,
   caused by enterprise users' own operations, from real ones.
   Operators therefore need to identify and rectify these faults
   manually based on expert experience, wasting considerable human
   resources.

   Incident management can automatically identify faults caused by
   periodic power-off on the tenant side.  As a result, operators no
   longer need to dispatch trouble tickets for such faults, which helps
   reduce human resource costs.
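
   The text does not specify how periodic power-off is recognized.  As
   one hypothetical heuristic, the sketch below flags a site's power-off
   alarms as periodic when they recur on at least a minimum number of
   distinct days at roughly the same time of day.  The thresholds are
   assumptions, and the comparison ignores wrap-around at midnight.

```python
from datetime import datetime
from statistics import median


def is_periodic_power_off(
    raise_times: list[datetime], min_days: int = 3, tolerance_min: int = 30
) -> bool:
    """Hypothetical heuristic: power-off alarms recur at about the same
    time of day on at least `min_days` distinct days."""
    if not raise_times:
        return False
    # Minutes after midnight for each alarm (midnight wrap is ignored).
    minutes = [t.hour * 60 + t.minute for t in raise_times]
    typical = median(minutes)
    # Distinct days with an alarm close to the typical time of day.
    days = {
        t.date()
        for t, m in zip(raise_times, minutes)
        if abs(m - typical) <= tolerance_min
    }
    return len(days) >= min_days


evenings = [
    datetime(2023, 3, 6, 19, 2),
    datetime(2023, 3, 7, 19, 5),
    datetime(2023, 3, 8, 18, 58),
]
print(is_periodic_power_off(evenings))      # True
print(is_periodic_power_off(evenings[:2]))  # False
```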

4.  Incident Management Architecture

               +----------------------+-------------------+
               |                                          |
               |            Incident Management Client    |
               |                                          |
               |                                          |
               +------------+---------+---------+---------+
                  ^         |         |         |
                  |Incident |Incident |Incident |Incident
                  |Report   |Ack      |Diagnose |Resolve
                  |         |         |         |
                  |         V         V         V
               +--+-------------------+---------+----------+
               |                                           |
               |                                           |
               |            Incident Management Agent      |
               |                                           |
               |                                           |
               |                                           |
               |                                           |
               +----------------------+-----+--+-----------+
                     ^       ^Abnormal         ^
                     |Alarm  |Operations       |Metrics
                     |Report |Report           |/Telemetry
                     |       |                 V
        +--------+-+-+-------+--------------++------------------+
        |                                                       |
        |                     Network                           |
        |                                                       |
        +------------------------------------+------------------+

                 Figure 1: Incident Management Architecture

   Figure 1 illustrates the incident management architecture.  The two
   key components of incident management are the incident management
   client and the incident management agent.

   The incident management agent can be deployed in a network analytics
   platform, a controller, or an orchestrator, and provides functions
   such as incident detection, reporting, diagnosis, resolution, and
   querying for incident lifecycle management.

   The incident management client can be deployed in the network OSS or
   other operator business systems, and invokes the functions provided
   by the incident management agent to meet the fault management
   requirements of the business.

   A typical workflow of incident management is as follows:

   *  Alarms, abnormal operations, or network performance metrics are
      reported from the network.  The incident management agent
      receives them and analyzes their impact on network services.  If
      the analysis indicates that network services are affected, an
      incident is reported to the client.

   *  The incident management client receives the incident raised by
      the agent and acknowledges it.  The client may invoke the
      'incident-diagnose' RPC to diagnose the incident and find its root
      causes.

   *  If the root causes have been found, the client can resolve the
      incident by invoking the 'incident-resolve' RPC operation,
      dispatching a ticket, or using other functions (e.g., routing
      calculation or configuration).

5.  Functional Interface Requirements between the Client and the Agent

5.1.  Incident Detection

   In an alarm-centric solution, although alarms are processed (based on
   manual or preconfigured rules) before being sent to the network OSS,
   many alarms still reach the network OSS.  Determining whether these
   alarms affect network services, and to what extent, is left largely
   to the network OSS, which reduces the efficiency of network
   maintenance.

           +--------------+
        +--|  Incident1   |
        |  +--+-----------+
        |     |  +-----------+
        |     +--+  alarm1   |
        |     |  +-----------+
        |     |
        |     |  +-----------+
        |     +--+  alarm2   |
        |     |  +-----------+
        |     |
        |     |  +-----------+
        |     +--+  alarm3   |
        |        +-----------+
        |  +--------------+
        +--|  Incident2   |
        |  +--+-----------+
        |     |  +-----------+
        |     +--+  metric1  |
        |     |  +-----------+
        |     |  +-----------+
        |     +--+  metric2  |
        |        +-----------+
        |
        |  +--------------+
        +--|  Incident3   |
           +--+-----------+
              |  +-----------+
              +--+ alarm1    |
              |  +-----------+
              |
              |  +-----------+
              +--| metric1   |
                 +-----------+

                        Figure 2: Incident Detection

   The incident management agent MUST be capable of detecting
   incidents.  It can analyze the impact of numerous alarms on network
   services, or monitor network service quality directly.  Whenever
   network service quality does not meet expectations, the incident
   management agent MUST report an incident.

   As shown in Figure 2, multiple alarms, multiple metrics, or a mix of
   both can be aggregated into an incident after analysis.  Each
   incident is associated with network services.
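
   Read as data, Figure 2 is a many-to-many relation if the repeated
   labels (alarm1, metric1) denote the same alarm and metric: one alarm
   or metric may contribute to several incidents.  The sketch below, a
   hypothetical client-side index, inverts the incident-to-source
   mapping so that the incidents linked to a given alarm or metric can
   be looked up.

```python
from collections import defaultdict

# Figure 2 as data: each incident lists the alarms and metrics it
# aggregates.
incidents = {
    "Incident1": {"alarm1", "alarm2", "alarm3"},
    "Incident2": {"metric1", "metric2"},
    "Incident3": {"alarm1", "metric1"},
}


def incidents_by_source(incidents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert incident -> sources into source -> incidents."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for incident, sources in incidents.items():
        for source in sources:
            index[source].add(incident)
    return dict(index)


index = incidents_by_source(incidents)
print(sorted(index["alarm1"]))   # ['Incident1', 'Incident3']
print(sorted(index["metric1"]))  # ['Incident2', 'Incident3']
```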

                        +----------------------+
                        |                      |
                        |     Orchestrator     |
                        |                      |
                        +----+-----------------+
                             ^VPN A Unavailable
                             |
                         +---+----------------+
                         |                    |
                         |     Controller     |
                         |                    |
                         |                    |
                         +-+-+-+-----+--+-----+
                         ^ ^            ^
                     IGP | |Interface   |IGP Peer
                    Down | |Down        | Abnormal
                         | |            |
        VPN A            | |            |
       +-----------------+-+------------+------------------+
       | \  +---+       ++-++         +-+-+        +---+  /|
       |  \ |   |       |   |         |   |        |   | / |
       |   \|PE1+-------| P1+X--------|P2 +--------|PE2|/  |
       |    +---+       +---+         +---+        +---+   |
       +---------------------------------------------------+

                 Figure 3: Example 1 of Incident Detection

   As shown in Figure 3, VPN A is deployed from PE1 to PE2.  If an
   interface of P1 goes down, many alarms are triggered, such as
   'interface down' and 'IGP down', as well as 'IGP peer abnormal' from
   P2.  These alarms are aggregated and analyzed by the controller,
   which then raises the incident 'VPN A unavailable'.
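
   The aggregation in this example can be sketched as follows.  The
   sketch assumes the controller knows which nodes each VPN traverses
   and treats the three alarm types as service-affecting; the
   inventory, the node each alarm is attributed to, and the rule are
   hypothetical, and actual root cause analysis (identifying the P1
   interface as the origin) is not attempted.

```python
# Hypothetical inventory: nodes each VPN traverses.  VPN A follows
# Figure 3; VPN B is an extra, unaffected service for contrast.
vpn_paths = {
    "VPN A": ["PE1", "P1", "P2", "PE2"],
    "VPN B": ["PE3", "P3", "PE4"],
}

# Alarms raised after the interface of P1 goes down (Figure 3).
alarms = [
    {"node": "P1", "type": "interface-down"},
    {"node": "P1", "type": "igp-down"},
    {"node": "P2", "type": "igp-peer-abnormal"},
]

# Hypothetical rule: alarm types assumed to break forwarding on a path.
SERVICE_AFFECTING = {"interface-down", "igp-down", "igp-peer-abnormal"}


def detect_incidents(alarms, vpn_paths):
    """Raise at most one 'unavailable' incident per affected VPN."""
    incidents = []
    for vpn, path in vpn_paths.items():
        related = [
            a for a in alarms
            if a["node"] in path and a["type"] in SERVICE_AFFECTING
        ]
        if related:
            incidents.append({"name": f"{vpn} unavailable", "sources": related})
    return incidents


found = detect_incidents(alarms, vpn_paths)
print(len(found), found[0]["name"], len(found[0]["sources"]))
# 1 VPN A unavailable 3
```

   Three alarms collapse into a single incident for VPN A, and no
   incident is raised for VPN B.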

                       +----------------------+
                       |                      |
                       |     Orchestrator     |
                       |                      |
                       +----+-----------------+
                            ^VPN A Degradation
                            |
                        +---+----------------+
                        |                    |
                        |     controller     |
                        |                    |
                        |                    |
                        +-+-+-+-----+--+-----+
                          ^            ^
                          |Packet      |Path Delay
                          |Loss        |
                          |            |
       VPN A              |            |
      +-------------------+------------+-------------------+
      | \  +---+       ++-++         +-+-+        +---+  / |
      |  \ |   |       |   |         |   |        |   | /  |
      |   \|PE1+-------|P1 +---------|P2 +--------|PE2|/   |
      |    +---+       +---+         +---+        +---+    |
      +----------------------------------------------------+

                 Figure 4: Example 2 of Incident Detection

   As shown in Figure 4, the controller collects network metrics from
   the network elements.  If it finds that the packet loss on P1 and the
   path delay on P2 exceed their thresholds, it may raise the incident
   'VPN A degradation' after analysis.
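
   A minimal sketch of threshold-based degradation detection for this
   example.  The threshold values and metric names are hypothetical;
   the text does not specify them.

```python
# Hypothetical per-metric thresholds; the text gives no values.
THRESHOLDS = {"packet-loss-pct": 0.5, "path-delay-ms": 20.0}


def check_degradation(service, samples):
    """Return a degradation incident if any sample exceeds its threshold.

    samples: iterable of (network element, metric name, value)."""
    violations = [
        (element, metric, value)
        for element, metric, value in samples
        if value > THRESHOLDS[metric]
    ]
    if not violations:
        return None
    return {"name": f"{service} degradation", "violations": violations}


samples = [
    ("P1", "packet-loss-pct", 1.2),
    ("P2", "path-delay-ms", 35.0),
    ("PE1", "path-delay-ms", 4.0),
]
incident = check_degradation("VPN A", samples)
print(incident["name"], len(incident["violations"]))  # VPN A degradation 2
```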

5.2.  Incident Diagnosis

   After an incident is reported to the incident management client, the
   client MAY diagnose the incident to determine its root cause.  Some
   diagnosis operations may affect running network services; the client
   can choose not to perform such an operation if it determines that the
   impact is not trivial.  The incident management agent can also
   perform self-diagnosis; however, self-diagnosis MUST NOT affect
   running network services.  Possible diagnosis methods include link
   reachability detection, link quality detection, alarm/log analysis,
   and short-term, fine-grained monitoring of network quality metrics.
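
   The constraint on self-diagnosis can be expressed as a filter over
   the available methods.  In this sketch, which methods are
   service-affecting is an assumption made for illustration (active
   link quality probing is treated as intrusive); the method names and
   the select_methods function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosisMethod:
    name: str
    service_affecting: bool  # True if it may disturb running services


# Method names follow the list above; the service_affecting flags are
# assumptions for illustration only.
METHODS = [
    DiagnosisMethod("link-reachability", False),
    DiagnosisMethod("link-quality", True),  # e.g., active probing
    DiagnosisMethod("alarm-log-analysis", False),
    DiagnosisMethod("fine-grained-monitoring", False),
]


def select_methods(methods, self_diagnosis, allow_impact=False):
    """Return usable method names.  Self-diagnosis never uses a
    service-affecting method; a client uses one only if it accepts the
    impact."""
    if self_diagnosis:
        allow_impact = False
    return [m.name for m in methods if allow_impact or not m.service_affecting]


print(select_methods(METHODS, self_diagnosis=True, allow_impact=True))
# ['link-reachability', 'alarm-log-analysis', 'fine-grained-monitoring']
```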

5.3.  Incident Resolution

   After the root cause is diagnosed, the client MAY resolve the
   incident.  The client MAY choose to resolve the incident by invoking
   other functions, such as a routing calculation function or a
   configuration function, by dispatching a ticket, or by asking the
   agent to resolve it.  Generally, the client would attempt to resolve
   the root cause directly.  If the root cause cannot be resolved, an
   alternative solution SHOULD be used.  For example, if an incident is
   caused by a physical component failure that cannot be resolved
   automatically, a standby link can be used to bypass the faulty
   component.

   Once the incident has been resolved, the client MAY instruct the
   agent to change the incident status to 'cleared'.  If the incident
   was resolved by the agent, this instruction is unnecessary.

   Incident resolution may affect running network services.  The client
   can choose not to perform such operations if it determines that the
   impact is not trivial.
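
   The resolution order described above can be sketched as follows:
   fix the root cause, otherwise bypass it, otherwise (in this sketch)
   dispatch a ticket.  The function name, its parameters, and the
   result strings are hypothetical; try_fix stands in for whatever
   routing calculation or configuration function the client invokes.

```python
from typing import Callable, Optional


def resolve_incident(
    root_cause: str,
    try_fix: Callable[[str], bool],
    standby_link: Optional[str] = None,
) -> tuple[str, bool]:
    """Return (action taken, whether the service impact is resolved)."""
    # Preferred: resolve the root cause directly.
    if try_fix(root_cause):
        return f"fixed {root_cause}", True
    # Alternative: bypass a component that cannot be repaired
    # automatically.
    if standby_link is not None:
        return f"bypassed {root_cause} via {standby_link}", True
    # Otherwise, in this sketch, hand the incident over via a ticket.
    return f"ticket dispatched for {root_cause}", False


# A physical failure that try_fix cannot repair is bypassed instead.
print(resolve_incident("P1 line card", lambda cause: False, "standby P1-P2"))
# ('bypassed P1 line card via standby P1-P2', True)
```

   When the result indicates that the impact is resolved and the agent
   did not perform the resolution itself, the client MAY then instruct
   the agent to set the incident status to 'cleared'.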

6.  Incident Data Model Concepts

6.1.  Identifying the Incident Instance

   An incident instance is associated with a specific network service
   instance and an incident name.  An incident ID identifies an incident
   instance: whenever a new incident instance is detected, a new
   incident ID is created.  The incident ID MUST be unique within the
   whole system.
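
   One way to approach the uniqueness requirement is sketched below,
   assuming a single agent-side registry; the IncidentRegistry class is
   hypothetical.  Random version 4 UUIDs make collisions across systems
   vanishingly unlikely, and the registry additionally refuses to reuse
   an ID it has already issued.  Each detected instance receives a new
   ID even when the service instance and incident name repeat.

```python
import uuid


class IncidentRegistry:
    """Hypothetical agent-side store that issues incident IDs."""

    def __init__(self) -> None:
        self._incidents: dict[str, dict[str, str]] = {}

    def raise_incident(self, service_instance: str, name: str) -> str:
        # Every detected instance gets a fresh ID, even if the service
        # instance and incident name repeat.
        incident_id = str(uuid.uuid4())
        while incident_id in self._incidents:  # never reuse an issued ID
            incident_id = str(uuid.uuid4())
        self._incidents[incident_id] = {
            "service-instance": service_instance,
            "name": name,
        }
        return incident_id

    def get(self, incident_id: str) -> dict[str, str]:
        return self._incidents[incident_id]


registry = IncidentRegistry()
first = registry.raise_incident("vpn-a", "vpn-unavailable")
second = registry.raise_incident("vpn-a", "vpn-unavailable")
print(first != second)  # True
```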

6.2.  The Incident Lifecycle

6.2.1.  Incident Instance Lifecycle

   From an incident instance perspective, an incident has the following
   lifecycle states: 'raised', 'updated', and 'cleared'.  When an
   incident is generated, its status is 'raised'.  If the status changes
   after the incident is generated without reaching 'cleared' (for
   example, due to self-diagnosis, a diagnosis command issued by the
   client, or any other condition), the status becomes 'updated'.  When
   the incident is successfully resolved, the status changes to
   'cleared'.
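
   The instance lifecycle can be checked with a small state machine.
   This is a sketch: the Status enumeration and next_status function
   are hypothetical, and because the text does not say whether a
   cleared incident can return to another state, 'cleared' is treated
   as terminal here.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    RAISED = "raised"
    UPDATED = "updated"
    CLEARED = "cleared"


# Allowed transitions; None means "not yet generated".  'cleared' is
# treated as terminal, an assumption the text leaves open.
TRANSITIONS: dict[Optional[Status], set[Status]] = {
    None: {Status.RAISED},
    Status.RAISED: {Status.UPDATED, Status.CLEARED},
    Status.UPDATED: {Status.UPDATED, Status.CLEARED},
    Status.CLEARED: set(),
}


def next_status(current: Optional[Status], new: Status) -> Status:
    """Validate a status change and return the new status."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"invalid transition: {current} -> {new}")
    return new


status = None
for step in (Status.RAISED, Status.UPDATED, Status.UPDATED, Status.CLEARED):
    status = next_status(status, step)
print(status.value)  # cleared
```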

6.2.2.  Operator Incident Lifecycle

   From an operator perspective, the lifecycle of an incident instance
   includes 'acknowledged', 'diagnosed', and 'resolved'.  When an
   incident instance is generated, the operator SHOULD acknowledge the
   incident.  The operator then attempts to diagnose the incident (for
   example, to find the root cause and affected components).
   Diagnosis is not mandatory: if the root cause and affected
   components are already known when the incident is generated,
   diagnosis is not required.  After locating the root cause and
   affected components, the operator can try to resolve the incident.
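
   The operator lifecycle maps onto the RPCs listed in the module tree
   in Section 7.1.  The sketch below, with a hypothetical plan_rpcs
   function, returns the RPCs a client would invoke for one incident,
   skipping incident-diagnose when the root cause is already known.  A
   client may also resolve an incident by other means, such as
   dispatching a ticket, in which case incident-resolve would not be
   used.

```python
# Operator lifecycle stage -> RPC name from the module tree.
RPC_FOR_STAGE = {
    "acknowledged": "incident-acknowledge",
    "diagnosed": "incident-diagnose",
    "resolved": "incident-resolve",
}


def plan_rpcs(root_cause_known: bool) -> list[str]:
    """Return, in order, the RPCs a client would invoke for one incident."""
    stages = ["acknowledged"]  # the operator SHOULD acknowledge first
    if not root_cause_known:  # diagnosis is optional when cause is known
        stages.append("diagnosed")
    stages.append("resolved")
    return [RPC_FOR_STAGE[s] for s in stages]


print(plan_rpcs(root_cause_known=True))
# ['incident-acknowledge', 'incident-resolve']
```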

7.  Incident Data Model

7.1.  Overview

      module: ietf-incident
        +--ro incidents
           +--ro incident* [incident-id]
              +--ro incident-id string
              +--ro csn uint64
              +--ro service-instance* string
              +--ro name string
              +--ro type enumeration
              +--ro domain identityref
              +--ro priority incident-priority
              +--ro status? enumeration
              +--ro ack-status? enumeration
              +--ro category identityref
              +--ro tenant? string
              +--ro detail? string
              +--ro resolve-suggestion? string
              +--ro sources
              | ...
              +--ro root-causes
              | ...
              +--ro events
              | ...
              +--ro raise-time? yang:date-and-time
              +--ro occur-time? yang:date-and-time
              +--ro clear-time? yang:date-and-time
              +--ro ack-time? yang:date-and-time
              +--ro last-updated? yang:date-and-time
        rpcs:
          +---x incident-acknowledge
          | ...
          +---x incident-diagnose
          | ...
          +---x incident-resolve
            ...
        notifications:
          +---n incident-notification
             +--ro incident-id? string
             ...

7.2.  Incident Notifications

     notifications:
         +---n incident-notification
            +--ro incident-id? string
            +--ro csn uint64
            +--ro service-instance* string
            +--ro name string
            +--ro type enumeration
            +--ro domain identityref
            +--ro priority incident-priority
            +--ro status? enumeration
            +--ro ack-status? enumeration
            +--ro category identityref
            +--ro tenant? string
            +--ro detail? string
            +--ro resolve-suggestion? string
            +--ro sources
            |  +--ro source* [node]
            |     +--ro node
            |             -> /nw:networks/nw:network/nw:node/nw-inv:name
            |     +--ro resource* [name]
            |        +--ro name al:resource
            +--ro root-causes
            |  +--ro root-cause* [node]
            |     +--ro node
            |             -> /nw:networks/nw:network/nw:node/nw-inv:name
            |     +--ro resource* [name]
            |     |  +--ro name al:resource
            |     |  +--ro cause-name? string
            |     |  +--ro detail? string
            |     +--ro cause-name? string
            |     +--ro detail? string
            +--ro events
            |  +--ro event* [type original-node]
            |     +--ro type enumeration
            |     +--ro original-node union
            |     +--ro is-root? boolean
            |     +--ro (event-type-info)?
            |        +--:(alarm)
            |        |  +--ro alarm
            |        |     +--ro resource? leafref
            |        |     +--ro alarm-type-id? leafref
            |        |     +--ro alarm-type-qualifier? leafref
            |        +--:(notification)
            |        +--:(log)
            |        +--:(KPI)
            |        +--:(unknown)
            +--ro time? yang:date-and-time

   A single generic notification, incident-notification, is defined.
   It is sent when an incident instance is detected.  Afterwards, if the
   incident management agent performs self-diagnosis, or if the client
   uses the interfaces provided by the agent to trigger diagnosis or
   resolution actions, an update notification is sent, for example,
   when the root cause objects or affected objects change.  When an
   incident is successfully resolved, its status is set to 'cleared'
   and a notification is sent.
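
   A client consuming incident-notification can keep a local view of
   open incidents by merging each notification into a table keyed by
   incident-id.  The Python sketch below assumes that notifications
   arrive already decoded into dictionaries whose keys are the leaf
   names of the module, and that incident-id (optional in the
   notification) is always present; the function names are
   hypothetical.

```python
from typing import Any, Dict, List

IncidentTable = Dict[str, Dict[str, Any]]


def apply_notification(table: IncidentTable,
                       notif: Dict[str, Any]) -> Dict[str, Any]:
    """Merge one decoded incident-notification into the local table."""
    entry = table.setdefault(notif["incident-id"], {})
    # 'raised' creates the entry, 'updated' overwrites the changed leaves
    # (for example, root causes found by diagnosis), and 'cleared' marks
    # the incident as resolved while keeping its last known content.
    entry.update(notif)
    entry["status"] = notif.get("status", "raised")  # YANG default
    return entry


def open_incidents(table: IncidentTable) -> List[str]:
    """Identifiers of incidents that have not been cleared yet."""
    return sorted(i for i, e in table.items() if e["status"] != "cleared")


table: IncidentTable = {}
apply_notification(table, {"incident-id": "inc-1", "priority": "high"})
apply_notification(table, {"incident-id": "inc-2", "priority": "low"})
apply_notification(table, {"incident-id": "inc-1", "status": "updated",
                           "root-causes": {"root-cause": [{"node": "pe1"}]}})
apply_notification(table, {"incident-id": "inc-2", "status": "cleared"})
print(open_incidents(table))  # ['inc-1']
```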

7.3.  Incident Acknowledge

       +---x incident-acknowledge
       |  +---w input
       |  |  +---w incident-id* string

   After an incident is generated, updated, or cleared, the operator
   needs to acknowledge it to confirm that the client is aware of the
   incident.  In some scenarios where automatic diagnosis and resolution
   are supported, the status of an incident may be updated multiple
   times or the incident may even be resolved automatically.

   The incident-acknowledge rpc can acknowledge multiple incidents at a
   time.
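
   As an illustration, the Python sketch below builds a single
   incident-acknowledge request that acknowledges several incidents at
   once.  It assumes RESTCONF [RFC8040] with the JSON encoding of YANG
   data defined in RFC 7951, where the RPC input is carried in a member
   named "ietf-incident:input" and the operation is invoked with POST on
   {+restconf}/operations/ietf-incident:incident-acknowledge; the
   "/restconf" root and the helper name are assumptions.

```python
import json
from typing import List, Tuple

# Assumption: the RESTCONF root; a real client discovers it through
# /.well-known/host-meta as described in RFC 8040.
RESTCONF_ROOT = "/restconf"


def build_acknowledge_request(incident_ids: List[str]) -> Tuple[str, str]:
    """Return (target path, JSON body) for one incident-acknowledge call."""
    if not incident_ids:
        # The model does not forbid an empty list; this sketch rejects it.
        raise ValueError("at least one incident-id is expected")
    path = RESTCONF_ROOT + "/operations/ietf-incident:incident-acknowledge"
    # All identifiers go into the 'incident-id' leaf-list of the input.
    body = {"ietf-incident:input": {"incident-id": list(incident_ids)}}
    return path, json.dumps(body)


path, body = build_acknowledge_request(["inc-1", "inc-2"])
print(path)  # /restconf/operations/ietf-incident:incident-acknowledge
print(body)  # {"ietf-incident:input": {"incident-id": ["inc-1", "inc-2"]}}
```

   The body would be sent with the media type application/yang-data+json.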

7.4.  Incident Diagnose

       +---x incident-diagnose
       |  +---w input
       |  |  +---w incident-id* string
       |  +--ro output
       |     +--ro incident* [incident-id]
       |        +--ro incident-id? string
       |        +--ro (result)?
       |           +--:(success)
       |           |  +--ro service-instance? string
       |           |  +--ro name? string
       |           |  +--ro domain? identityref
       |           |  +--ro priority? incident-priority
       |           |  +--ro impact? enumeration
       |           |  +--ro status? enumeration
       |           |  +--ro ack-status? enumeration
       |           |  +--ro category? identityref
       |           |  +--ro tenant? string
       |           |  +--ro detail? string
       |           |  +--ro resolve-suggestion? string
       |           |  +--ro sources
       |           |  |  +--ro source* [node]
       |           |  |     +--ro node? leafref
       |           |  |     +--ro resource* [name]
       |           |  |        +--ro name? al:resource
       |           |  +--ro root-causes
       |           |  |  +--ro root-cause* [node]
       |           |  |     +--ro node? leafref
       |           |  |     +--ro resource* [name]
       |           |  |     |  +--ro name? al:resource
       |           |  |     |  +--ro cause-name? string
       |           |  |     |  +--ro detail? string
       |           |  |     +--ro cause-name? string
       |           |  |     +--ro detail? string
       |           |  +--ro affects
       |           |  |  +--ro affect* [node]
       |           |  |     +--ro node? leafref
       |           |  |     +--ro resource* [name]
       |           |  |     |  +--ro name? al:resource
       |           |  |     |  +--ro state? enumeration
       |           |  |     |  +--ro detail? string
       |           |  |     +--ro state? enumeration
       |           |  |     +--ro detail? string
       |           |  +--ro links
       |           |  |  +--ro link* leafref
       |           |  +--ro events
       |           |  |  +--ro event* [type original-node]
       |           |  |     +--ro type? enumeration
       |           |  |     +--ro original-node? union
       |           |  |     +--ro is-root? boolean
       |           |  |     +--ro (event-type-info)?
       |           |  |        +--:(alarm)
       |           |  |        |  +--ro alarm
       |           |  |        |     +--ro resource? leafref
       |           |  |        |     +--ro alarm-type-id? leafref
       |           |  |        |     +--ro alarm-type-qualifier? leafref
       |           |  |        +--:(notification)
       |           |  |        +--:(log)
       |           |  |        +--:(KPI)
       |           |  |        +--:(unknown)
       |           |  +--ro time? yang:date-and-time
       |           +--:(failure)
       |              +--ro error-code? string
       |              +--ro error-message? string

   After an incident is generated, the incident-diagnose rpc can be used
   to diagnose the incident and locate its root causes.  Diagnosis can
   be performed by means of detection tasks such as BFD detection, flow
   detection, telemetry collection, short-term threshold alarms,
   configuration error checks, or test packet injection.

   If the diagnosis succeeds, the latest state of the incident is
   returned and an incident update notification is triggered.  If the
   diagnosis fails, an error code and an error message are returned.
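
   Because YANG choice and case names do not appear in instance data, a
   client has to infer from the leaves present whether each entry of
   the incident-diagnose output is a 'success' or a 'failure' case.
   The Python sketch below does this for a reply decoded from the
   RESTCONF JSON encoding; the function name and the sample values
   (including the error code) are hypothetical, and an entry carrying
   neither case (the choice is optional) is treated as a success here.

```python
from typing import Any, Dict, Tuple

Successes = Dict[str, Dict[str, Any]]
Failures = Dict[str, str]


def split_diagnose_output(reply: Dict[str, Any]) -> Tuple[Successes, Failures]:
    """Split a decoded incident-diagnose reply into successes and failures."""
    succeeded: Successes = {}
    failed: Failures = {}
    output = reply.get("ietf-incident:output", {})
    for entry in output.get("incident", []):
        incident_id = entry["incident-id"]
        if "error-code" in entry or "error-message" in entry:
            # 'failure' case: keep code and message for the operator.
            failed[incident_id] = "{}: {}".format(
                entry.get("error-code", "?"), entry.get("error-message", ""))
        else:
            # 'success' case: the entry holds the latest incident state.
            succeeded[incident_id] = {
                k: v for k, v in entry.items() if k != "incident-id"}
    return succeeded, failed


reply = {"ietf-incident:output": {"incident": [
    {"incident-id": "inc-1", "status": "updated",
     "root-causes": {"root-cause": [{"node": "pe1"}]}},
    {"incident-id": "inc-2", "error-code": "E42",
     "error-message": "node unreachable"},
]}}
ok, bad = split_diagnose_output(reply)
print(sorted(ok))  # ['inc-1']
print(bad)         # {'inc-2': 'E42: node unreachable'}
```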

7.5.  Incident Resolution

         +---x incident-resolve
            +---w input
            |  +---w incident* [incident-id]
            |     +---w incident-id
            |             -> /inc:incidents/inc:incident/inc:incident-id
            |     +---w resolved? empty
            +--ro output
               +--ro incident* [incident-id]
                  +--ro incident-id string
                  +--ro (result)?
                     +--:(success)
                     |  +--ro success? empty
                     |  +--ro time? yang:date-and-time
                     +--:(failure)
                        +--ro error-code? string
                        +--ro error-message? string

   After the root cause and impact are determined, the incident-resolve
   rpc can be used either to resolve the incident (if the agent is able
   to do so) or to indicate that the incident instances have been
   resolved by other means.  How an incident instance is resolved is out
   of the scope of this document.

   The incident-resolve rpc allows multiple incident instances to be
   resolved at a time.  If an incident instance is successfully
   resolved, the success flag and the resolution time are returned, and
   a notification is triggered to update the incident status to
   'cleared'.  If an incident fails to be resolved, an error code and an
   error message are returned.  If the incident content changes during
   this process, an update notification is triggered.
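
   The per-instance behavior described above can be sketched from the
   agent side in Python.  This is a hypothetical handler, not part of
   the model: the function and callback names and the "resolve-failed"
   error code are assumptions, and how an incident is actually resolved
   is left to a caller-supplied hook, since it is out of the scope of
   this document.  Unknown identifiers are assumed to be rejected
   earlier by input validation, because incident-id in the input is a
   leafref to existing incidents.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional


def handle_incident_resolve(
        incidents: Dict[str, Dict[str, Any]],
        rpc_input: List[Dict[str, Any]],
        try_resolve: Callable[[str], Optional[str]],
        notify_cleared: Callable[[str, str], None]) -> List[Dict[str, Any]]:
    """Hypothetical agent-side processing of incident-resolve.

    try_resolve(incident_id) returns None on success or an error message
    on failure; notify_cleared(incident_id, time) is expected to emit an
    incident-notification with status 'cleared'.
    """
    output: List[Dict[str, Any]] = []
    for item in rpc_input:
        incident_id = item["incident-id"]
        # 'resolved' present: already fixed by other means, only record it.
        error = None if "resolved" in item else try_resolve(incident_id)
        if error is not None:
            # 'failure' case; the error-code value is an assumption.
            output.append({"incident-id": incident_id,
                           "error-code": "resolve-failed",
                           "error-message": error})
            continue
        now = datetime.now(timezone.utc).isoformat()
        incidents[incident_id]["status"] = "cleared"
        incidents[incident_id]["clear-time"] = now
        notify_cleared(incident_id, now)
        # 'success' case; an 'empty' leaf is encoded as [null] in JSON.
        output.append({"incident-id": incident_id, "success": [None],
                       "time": now})
    return output
```

   Handling each entry independently means that one failed resolution
   does not prevent the other listed incidents from being cleared, which
   matches the per-instance success or failure results in the output.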

8.  Incident Management YANG Module

   <CODE BEGINS>
          file="ietf-incident@2023-03-13.yang"
      module ietf-incident {
        yang-version 1.1;
        namespace "urn:ietf:params:xml:ns:yang:ietf-incident";
        prefix inc;
        import ietf-yang-types {
          prefix yang;
          reference
            "RFC 6991: Common YANG Data Types";
        }
        import ietf-network {
          prefix nw;
          reference
            "RFC 8345: A YANG Data Model for Network Topologies";
        }
        import ietf-network-inventory {
          prefix nw-inv;
          reference
            "draft-wzwb-opsawg-network-inventory-management-01:
            An Inventory Management Model for Enterprise Networks";
        }
        import ietf-alarms {
          prefix al;
          reference
            "RFC 8632: A YANG Data Model for Alarm Management";
        }
        organization
          "IETF OPSAWG Working Group";
        contact
          "WG Web:   <https://datatracker.ietf.org/wg/opsawg/>
           WG List:  <mailto:opsawg@ietf.org>
           Author:   Chong Feng  <mailto:frank.fengchong@huawei.com>
           Author:   Tong Hu  <mailto:hutong@cmhi.chinamobile.com>
           Author:   Luis Miguel Contreras Murillo <mailto:
                     luismiguel.contrerasmurillo@telefonica.com>
           Author:   Qin Wu   <mailto:bill.wu@huawei.com>
           Author:   ChaoDe Yu   <mailto:yuchaode@huawei.com>";

        description
          "This module defines the interfaces for incident lifecycle
           management.

           This module is intended for the following use cases:
           * incident lifecycle management:
             - incident report: report an incident instance to the
                                client when it is detected.
             - incident acknowledge: acknowledge an incident instance.
             - incident diagnose: diagnose an incident instance.
             - incident resolve: resolve an incident instance.

           Copyright (c) 2023 IETF Trust and the persons identified as
           authors of the code.  All rights reserved.

           Redistribution and use in source and binary forms, with or
           without modification, is permitted pursuant to, and subject
           to the license terms contained in, the Revised BSD License
           set forth in Section 4.c of the IETF Trust's Legal Provisions
           Relating to IETF Documents
           (https://trustee.ietf.org/license-info).
           This version of this YANG module is part of RFC XXXX; see the
           RFC itself for full legal notices.";
        revision 2023-03-13 {
          description "initial version";
          reference "RFC XXXX: Incident Management for Network Services";
        }
        //identities
        identity incident-domain {
          description "The abstract identity to indicate the domain of
                       an incident.";
        }
        identity single-domain {
          base incident-domain;
          description "single domain.";
        }
        identity access {
          base single-domain;
          description "access domain.";
        }
        identity ran {
          base access;
          description "radio access network domain.";
        }
        identity transport {
          base single-domain;
          description "transport domain.";
        }
        identity otn {
          base transport;
          description "optical transport network domain.";
        }
        identity ip {
          base single-domain;
          description "ip domain.";
        }
        identity ptn {
          base ip;
          description "packet transport network domain.";
        }

        identity cross-domain {
          base incident-domain;
          description "cross domain.";
        }

        identity incident-category {
          description "The abstract identity for incident category.";
        }
        identity device {
          base incident-category;
          description "device category.";
        }
        identity power-environment {
          base device;
          description "power system category.";
        }
        identity device-hardware {
          base device;
          description "hardware of device category.";
        }
        identity device-software {
          base device;
          description "software of device category";
        }
        identity line {
          base device-hardware;
          description "line card category.";
        }
        identity maintenance {
          base incident-category;
          description "maintenance category.";
        }
        identity network {
          base incident-category;
          description "network category.";
        }
        identity protocol {
          base incident-category;
          description "protocol category.";
        }
        identity overlay {
          base incident-category;
          description "overlay category";
        }
        identity vm {
          base incident-category;
          description "vm category.";
        }

        //typedefs
        typedef incident-priority {
          type enumeration {
            enum critical {
              description "the incident MUST be handled immediately.";
            }
            enum high {
              description "the incident should be handled as soon as
                           possible.";
            }
            enum medium {
              description "network services are not affected, or the
                           services are slightly affected, but corrective
                           measures need to be taken.";
            }
            enum low {
              description "potential or imminent service-affecting
                           incidents are detected, but services are
                           not affected currently.";
            }
          }
          description "Defines the priority of an incident.";
        }
        typedef node-ref {
          type leafref {
            path "/nw:networks/nw:network/nw:node/nw-inv:name";
          }
          description "reference a network node.";
        }
        //groupings
        grouping resources-info {
          description "the grouping which defines the network
                       resources of a node.";
          leaf node {
            type node-ref;
            description "reference to a network node.";
          }
          list resource {
            key name;
            description "the resources of a network node.";
            leaf name {
              type al:resource;
              description "network resource name.";
            }
          }
        }

        grouping incident-time-info {
          description "the grouping defines incident time information.";
          leaf raise-time {
            type yang:date-and-time;
            description "the time when an incident instance is raised.";
          }
          leaf occur-time {
            type yang:date-and-time;
            description "the time when an incident instance occurred.
                         It is the occurrence time of the first event
                         during incident detection.";
          }
          leaf clear-time {
            type yang:date-and-time;
            description "the time when an incident instance is
                         resolved.";
          }
          leaf ack-time {
            type yang:date-and-time;
            description "the time when an incident instance is
                         acknowledged.";
          }
          leaf last-updated {
            type yang:date-and-time;
            description "the latest time when an incident instance is
                         updated";
          }
        }

        grouping incident-info {
          description "the grouping defines the information of an
                       incident.";
          leaf csn {
            type uint64;
            mandatory true;
            description "The sequence number of the incident instance.";
          }
          leaf-list service-instance {
            type string;
            description "the related network service instances of
                         the incident instance.";
          }
          leaf name {
            type string;
            mandatory true;
            description "the name of an incident.";
          }
          leaf type {
            type enumeration {
              enum fault {
                description "Indicates that the incident is a fault,
                             for example, an interface that fails to
                             work.";
              }
              enum potential-risk {
                description "Indicates that the incident is a potential
                             risk, for example, a high CPU usage rate
                             that may cause a fault in the future.";
              }
            }
            mandatory true;
            description "The type of an incident.";
          }
          leaf domain {
            type identityref {
              base incident-domain;
            }
            mandatory true;
            description "the domain of an incident.";
          }
          leaf priority {
            type incident-priority;
            mandatory true;
            description "the priority of an incident instance.";
          }

          leaf status {
            type enumeration {
              enum raised {
                description "an incident instance is raised.";
              }
              enum updated {
                description "the information of an incident instance
                             is updated.";
              }
              enum cleared {
                description "an incident is cleared.";
              }
            }
            default raised;
            description "The status of an incident instance.";
          }
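          leaf ack-status {
            type enumeration {
              enum acknowledged {
                description "the incident instance has been
                             acknowledged.";
              }
              enum unacknowledged {
                description "the incident instance has not been
                             acknowledged yet.";
              }
            }
            default unacknowledged;
            description "the acknowledgement status of an incident.";
          }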

          leaf category {
            type identityref {
              base incident-category;
            }
            mandatory true;
            description "The category of an incident.";
          }

          leaf tenant {
            type string;
            description "the identifier of related tenant.";
          }
          leaf detail {
            type string;
            description "detail information of this incident.";
          }
          leaf resolve-suggestion {
            type string;
            description "The suggestion to resolve this incident.";
          }
          container sources {
            description "The source components.";
            list source {
              key node;
              uses resources-info;
              min-elements 1;
              description "The source components of incident.";
            }
          }

          container root-causes {
            description "The root cause objects.";
            list root-cause {
              key node;
              description "the root causes of the incident.";
              grouping root-cause-info {
                description "The information of root cause.";
                leaf cause-name {
                  type string;
                  description "the name of the cause.";
                }
                leaf detail {
                  type string;
                  description "the detail information of the cause.";
                }
              }
              uses resources-info {
                augment resource {
                  description "augment root cause information.";
                  //if root cause object is a resource of a node
                  uses root-cause-info;
                }
              }
              //if root cause object is a node
              uses root-cause-info;
            }
          }
          container events {
            description "related event.";
            list event {
              key "type original-node";
              description "related event.";
              leaf type {
                type enumeration {
                  enum alarm {
                    description "alarm type";
                  }
                  enum notification {
                    description "notification type";
                  }
                  enum log {
                    description "log type";
                  }
                  enum KPI {
                    description "KPI type";
                  }
                  enum unknown {
                    description "unknown type";
                  }
                }
                description "event type.";
              }
              leaf original-node {
                type union {
                  type node-ref;
                  type empty;//self
                }
                description "the original node where the event occurs.";
              }
              leaf is-root {
                type boolean;
                default false;
                description "Indicates whether this event is a root
                             cause of the incident.";
              }
              choice event-type-info {
                description "event type information.";
                case alarm {
                  when "type = 'alarm'";
                  container alarm {
                    description "alarm type event.";
                    leaf resource {
                      type leafref {
                        path "/al:alarms/al:alarm-list/al:alarm"
                            +"/al:resource";
                      }
                      description "network resource.";
                      reference "RFC 8632: A YANG Data Model for Alarm
                                 Management";
                    }
                    leaf alarm-type-id {
                      type leafref {
                        path "/al:alarms/al:alarm-list/al:alarm"
                            +"[al:resource = current()/../resource]"
                            +"/al:alarm-type-id";
                      }
                      description "alarm type id";
                      reference "RFC 8632: A YANG Data Model for Alarm
                                  Management";
                    }
                    leaf alarm-type-qualifier {
                      type leafref {
                        path "/al:alarms/al:alarm-list/al:alarm"
                            +"[al:resource = current()/../resource]"
                            +"[al:alarm-type-id = current()/.."
                            +"/alarm-type-id]/al:alarm-type-qualifier";
                      }
                      description "alarm type qualifier";
                      reference "RFC 8632: A YANG Data Model for Alarm
                                 Management";
                    }
                  }
                }
                case notification {
                  //TODO
                }
                case log {
                  //TODO
                }
                case KPI {
                  //TODO
                }
                case unknown {
                  //TODO
                }
              }
            }

          }

        }

        //data definitions
        container incidents {
          config false;
          description "the information of incidents.";
          list incident {
            key incident-id;
            description "the information of incident.";
            leaf incident-id {
              type string;
              description "the identifier of an incident instance.";
            }
            uses incident-info;
            uses incident-time-info;
          }
        }

        // notifications
        notification incident-notification {
          description "incident notification. It will be triggered when
                       the incident is raised, updated or cleared.";
          leaf incident-id {
            type string;
            description "the identifier of an incident instance.";
          }
          uses incident-info;
          leaf time {
            type yang:date-and-time;
            description "occur time of an incident instance.";
          }
        }
        // rpcs
        rpc incident-acknowledge {
          description "This rpc can be used to acknowledge the specified
                       incidents.";
          input {
            leaf-list incident-id {
              type string;
              description "the identifier of an incident instance.";
            }
          }
        }
        rpc incident-diagnose {
          description "This rpc can be used to diagnose the specified
                       incidents.";
          input {
            leaf-list incident-id {
              type string;
              description
                "the identifier of an incident instance.";
            }
          }
          output {
            list incident {
              key incident-id;
              description "The entry of returned incidents.";
              leaf incident-id {
                type string;
                description
                  "the identifier of an incident instance.";
              }
              choice result {
                description "result information.";
                case success {
                  uses incident-info;
                  leaf time {
                    type yang:date-and-time;
                    description
                      "The update time of an incident.";
                  }
                }
                case failure {
                  leaf error-code {
                    type string;
                    description "error code";
                  }
                  leaf error-message {
                    type string;
                    description "error message";
                  }
                }
              }
            }
          }
        }

        rpc incident-resolve {
          description "This rpc can be used to resolve the specified
                       incidents.  It can also be used to mark incident
                       instances as resolved when they have been
                       resolved by an external system.";
          input {
            list incident {
              key incident-id;
              min-elements 1;
              description "incident instances.";
              leaf incident-id {
                type leafref {
                  path "/inc:incidents/inc:incident/inc:incident-id";
                }
                description
                  "the identifier of an incident instance.";
              }
              leaf resolved {
                type empty;
                description "indicate the incident instance has
                             been resolved.";
              }

            }
          }
          output {
            list incident {
              key incident-id;
              description "incident instances";
              leaf incident-id {
                type string;
                description "the identifier of incident instance";
              }
              choice result {
                description "result information";
                case success {
                  leaf success {
                    type empty;
                    description "the incident instance is resolved
                                 successfully.";
                  }
                  leaf time {
                    type yang:date-and-time;
                    description "The resolved time of an incident.";
                  }
                }
                case failure {
                  leaf error-code {
                    type string;
                    description "error code";
                  }
                  leaf error-message {
                    type string;
                    description "error message.";
                  }
                }
              }
            }
          }
        }
      }
   <CODE ENDS>

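   The 'domain' and 'category' leaves are identityrefs, so a client
   interested in one domain usually wants that identity and everything
   derived from it, which is what the YANG 1.1 XPath function
   derived-from-or-self() checks.  The Python sketch below reproduces
   that check locally for the domain identities defined in the module
   above.  It assumes identity values have already been stripped of
   their module prefix and, as is the case for every identity in this
   module, that each identity has a single base; the names are
   hypothetical.

```python
from typing import Any, Dict, List, Optional

# Base relationships copied from the domain identities of ietf-incident.
DOMAIN_BASE: Dict[str, Optional[str]] = {
    "incident-domain": None,
    "single-domain": "incident-domain",
    "access": "single-domain",
    "ran": "access",
    "transport": "single-domain",
    "otn": "transport",
    "ip": "single-domain",
    "ptn": "ip",
    "cross-domain": "incident-domain",
}


def derived_from_or_self(identity: str, base: str) -> bool:
    """Walk the single-base chain from 'identity' up to the root."""
    current: Optional[str] = identity
    while current is not None:
        if current == base:
            return True
        # Unknown identities (e.g. defined in other modules) end the walk.
        current = DOMAIN_BASE.get(current)
    return False


def filter_by_domain(incidents: List[Dict[str, Any]], domain: str) -> List[str]:
    """Identifiers of incidents whose domain is 'domain' or derived from it."""
    return [i["incident-id"] for i in incidents
            if derived_from_or_self(i["domain"], domain)]


incidents = [
    {"incident-id": "inc-1", "domain": "otn"},
    {"incident-id": "inc-2", "domain": "ptn"},
    {"incident-id": "inc-3", "domain": "cross-domain"},
]
print(filter_by_domain(incidents, "transport"))      # ['inc-1']
print(filter_by_domain(incidents, "single-domain"))  # ['inc-1', 'inc-2']
```
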
9.  IANA Considerations

9.1.  The "IETF XML" Registry

   This document registers one XML namespace URN in the 'IETF XML
   registry', following the format defined in [RFC3688].

   URI: urn:ietf:params:xml:ns:yang:ietf-incident
   Registrant Contact: The IESG.
   XML: N/A; the requested URI is an XML namespace.

9.2.  The "YANG Module Names" Registry

   This document registers one module name in the 'YANG Module Names'
   registry, defined in [RFC6020].

   name: ietf-incident
   prefix: inc
   namespace: urn:ietf:params:xml:ns:yang:ietf-incident
   RFC: XXXX
   // RFC Ed.: replace XXXX and remove this comment

10.  Security Considerations

   The YANG module specified in this document defines a schema for data
   that is designed to be accessed via network management protocols
   such as NETCONF [RFC6241] or RESTCONF [RFC8040].  The lowest NETCONF
   layer is the secure transport layer, and the mandatory-to-implement
   secure transport is Secure Shell (SSH) [RFC6242].  The lowest
   RESTCONF layer is HTTPS, and the mandatory-to-implement secure
   transport is TLS [RFC8446].

   The Network Configuration Access Control Model (NACM) [RFC8341]
   provides the means to restrict access for particular NETCONF or
   RESTCONF users to a preconfigured subset of all available NETCONF or
   RESTCONF protocol operations and content.

   There are a number of data nodes defined in this YANG module that are
   writable/creatable/deletable (i.e., config true, which is the
   default).  These data nodes may be considered sensitive or vulnerable
   in some network environments.  Write operations (e.g., edit-config)
   to these data nodes without proper protection can have a negative
   effect on network operations.  These are the subtrees and data nodes
   and their sensitivity/vulnerability:

   Some of the readable data nodes in this YANG module may be considered
   sensitive or vulnerable in some network environments.  It is thus
   important to control read access (e.g., via get, get-config, or
   notification) to these data nodes.  These are the subtrees and data
   nodes and their sensitivity/vulnerability:

   Some of the RPC operations in this YANG module may be considered
   sensitive or vulnerable in some network environments.  It is thus
   important to control access to these operations.  These are the
   operations and their sensitivity/vulnerability:

11.  Contributors

   Aihua Guo
   Futurewei Technologies
   Email: aihuaguo.ietf@gmail.com

12.  Acknowledgments

   The authors would like to thank Mohamed Boucadair, Zhidong Yin,
   Guoxiang Liu, Haomian Zheng, and YuanYao for their valuable comments
   and input to this work.

13.  References

13.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC3688]  Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
              DOI 10.17487/RFC3688, January 2004,
              <https://www.rfc-editor.org/info/rfc3688>.

   [RFC6020]  Bjorklund, M., Ed., "YANG - A Data Modeling Language for
              the Network Configuration Protocol (NETCONF)", RFC 6020,
              DOI 10.17487/RFC6020, October 2010,
              <https://www.rfc-editor.org/info/rfc6020>.

   [RFC7950]  Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language",
              RFC 7950, DOI 10.17487/RFC7950, August 2016,
              <https://www.rfc-editor.org/info/rfc7950>.

   [RFC8345]  Clemm, A., Medved, J., Varga, R., Bahadur, N.,
              Ananthakrishnan, H., and X. Liu, "A YANG Data Model for
              Network Topologies", RFC 8345, DOI 10.17487/RFC8345, March
              2018, <https://www.rfc-editor.org/info/rfc8345>.

   [RFC8632]  Vallin, S. and M. Bjorklund, "A YANG Data Model for Alarm
              Management", RFC 8632, DOI 10.17487/RFC8632, September
              2019, <https://www.rfc-editor.org/info/rfc8632>.

13.2.  Informative References

   [I-D.ietf-opsawg-yang-vpn-service-pm]
              Wu, B., Wu, Q., Boucadair, M., de Dios, O. G., and B. Wen,
              "A YANG Model for Network and VPN Service Performance
              Monitoring", Work in Progress, Internet-Draft, draft-ietf-
              opsawg-yang-vpn-service-pm-15, 11 November 2022,
              <https://datatracker.ietf.org/doc/html/draft-ietf-opsawg-
              yang-vpn-service-pm-15>.

   [I-D.wzwb-opsawg-network-inventory-management]
              Wu, B., Zhou, C., Wu, Q., and M. Boucadair, "An Inventory
              Management Model for Enterprise Networks", Work in
              Progress, Internet-Draft, draft-wzwb-opsawg-network-
              inventory-management-01, 10 February 2023,
              <https://datatracker.ietf.org/doc/html/draft-wzwb-opsawg-
              network-inventory-management-01>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC8969]  Wu, Q., Ed., Boucadair, M., Ed., Lopez, D., Xie, C., and
              L. Geng, "A Framework for Automating Service and Network
              Management with YANG", RFC 8969, DOI 10.17487/RFC8969,
              January 2021, <https://www.rfc-editor.org/info/rfc8969>.

   [W3C-Trace-Context]
              W3C, "Trace Context", W3C Recommendation, 23 November
              2021, <https://www.w3.org/TR/2021/REC-trace-context-
              1-20211123/>.

Authors' Addresses

   Chong Feng (editor)
   Huawei
   101 Software Avenue, Yuhua District
   Nanjing
   Jiangsu, 210012
   China
   Email: frank.fengchong@huawei.com

   Tong Hu
   China Mobile (Hangzhou) Information Technology Co., Ltd
   Building A01, 1600 Yuhangtang Road, Wuchang Street, Yuhang District
   Hangzhou
   ZheJiang, 311121
   China
   Email: hutong@cmhi.chinamobile.com

   Luis Miguel Contreras Murillo
   Telefonica I+D
   Madrid
   Spain
   Email: luismiguel.contrerasmurillo@telefonica.com

   Qin Wu
   Huawei
   101 Software Avenue, Yuhua District
   Nanjing
   Jiangsu, 210012
   China
   Email: bill.wu@huawei.com

   Chaode Yu
   Huawei
   Email: yuchaode@huawei.com
