Network Management Research Group                               J. Nobre
Internet-Draft                                              L. Granville
Intended status: Informational   Federal University of Rio Grande do Sul
Expires: December 22, 2014                                      A. Clemm
                                                               A. Prieto
                                                           Cisco Systems
                                                           June 20, 2014

     Autonomic Networking Use Case for Distributed Detection of SLA
                                Violations


Abstract

   This document describes a use case for autonomic networking in
   distributed detection of SLA violations.  It is one of a series of
   use cases intended to illustrate requirements for autonomic
   networking.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 22, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must

Nobre, et al.           Expires December 22, 2014               [Page 1]

Internet-Draft   AN Use Case Detection of SLA Violations       June 2014

   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Problem Statement
   3.  Benefits of an Autonomic Solution
   4.  Intended User and Administrator Experience
   5.  Analysis of Parameters and Information Involved
     5.1.  Device Based Self-Knowledge and Decisions
     5.2.  Interaction with other devices
     5.3.  Information needed from Intent
     5.4.  Monitoring, diagnostics and reporting
   6.  Comparison with current solutions
   7.  Related IETF Work
   8.  Acknowledgements
   9.  IANA Considerations
   10. Security Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   The Internet has grown dramatically in size, capacity, and
   accessibility in recent years.  At the same time, the communication
   requirements of distributed services and applications running on top
   of the Internet have become increasingly stringent.  Violations of
   these requirements usually cause significant financial losses to
   organizations and end users.  Thus, meeting the service level
   requirements of critical networked services has become a key concern
   for network administrators.  To ensure that Service Level Agreements
   (SLAs) are not being violated, which would usually incur costly
   penalties, service levels need to be constantly monitored at the
   network infrastructure layer.  To that end, network measurements
   must take place.

   Network measurements are performed using either active or passive
   measurement techniques.  In passive measurement, network conditions
   are checked in a non-intrusive way because the measurement process
   itself creates no monitoring traffic.  Passive measurement mechanisms
   have been defined in the context of the IP Flow Information Export
   (IPFIX) WG, building on flow record export formats such as NetFlow
   version 9 [RFC3954].  Active measurement, on the other hand, is
   intrusive because it injects synthetic traffic into the network to
   measure network performance.  The IP Performance Metrics (IPPM) WG
   produced documents
   that describe active measurement mechanisms, such as the One-Way
   Active Measurement Protocol (OWAMP) [RFC4656] and the Two-Way Active
   Measurement Protocol (TWAMP) [RFC5357].  Another example is the Cisco
   Service-Level Assurance Protocol [RFC6812].  Active measurement
   mechanisms usually offer better accuracy and privacy than passive
   measurement mechanisms.  Furthermore, active measurement mechanisms
   are able to detect end-to-end network performance problems in a
   fine-grained way.  As a result, active measurement is preferred over
   passive measurement for SLA monitoring.  Measurement probes must be
   hosted and activated in network devices to compute the current
   network metrics (e.g., those described in [RFC4148]).  This
   activation should be dynamic in order to follow changes in network
   conditions, such as routes being added or new customer demands.
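
   The following Python sketch illustrates how an active measurement
   turns timestamps carried by injected test packets into a metric.  It
   computes round-trip delay in the style commonly used with TWAMP
   [RFC5357]: the sender records the transmit time T1 and receive time
   T4, the reflector records its receive time T2 and transmit time T3,
   and the reflector's processing time (T3 - T2) is subtracted from the
   total.  The TestSample class and its field names are hypothetical;
   this is not the TWAMP wire format, and packet loss is ignored.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TestSample:
    """Timestamps (seconds) for one hypothetical two-way test packet."""
    t1_sent: float          # sender transmit time (sender clock)
    t2_reflector_rx: float  # reflector receive time (reflector clock)
    t3_reflector_tx: float  # reflector transmit time (reflector clock)
    t4_received: float      # sender receive time (sender clock)


def round_trip_delay(s: TestSample) -> float:
    """Round-trip delay excluding the reflector's processing time."""
    total = s.t4_received - s.t1_sent                    # sender clock only
    processing = s.t3_reflector_tx - s.t2_reflector_rx   # reflector clock only
    return total - processing


def mean_rtt_ms(samples: list[TestSample]) -> float:
    """Average round-trip delay over a batch of samples, in ms."""
    return mean(round_trip_delay(s) for s in samples) * 1000.0


# The reflector clock is deliberately offset by 100 s from the sender.
samples = [
    TestSample(0.000, 100.010, 100.011, 0.025),
    TestSample(1.000, 101.012, 101.013, 1.029),
]
print(f"{mean_rtt_ms(samples):.1f} ms")  # 26.0 ms
```

   Each interval is measured on a single host's clock, so the 100 s
   offset between the two clocks does not affect the result; one-way
   metrics, as in OWAMP [RFC4656], would instead need synchronized
   clocks.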

2.  Problem Statement

   The activation of active measurement probes (sender and responder, in
   the architecture described in [RFC6812]) is expensive in terms of
   resource consumption (e.g., CPU cycles and memory footprint), and
   these resources would otherwise be available for primary network
   functions (e.g., routing and switching).  In addition, the probes
   increase the network load because of the injected traffic.  The
   resources required and the traffic generated by the measurement
   probes grow with the number of measured network destinations: the
   more destinations, the more resources and traffic are needed to
   deploy the probes.  Thus, better monitoring coverage requires
   deploying more probes, which in turn increases resource consumption.
   Conversely, observing only a small subset of all network flows can
   lead to insufficient coverage.

   In current active measurement deployments, the best practice for
   distributing the available measurement probes across the network
   relies entirely on the expertise of human administrators to infer
   the best locations to activate the probes.  This is done in several
   steps.  First, traffic information is collected in order to build
   the traffic matrix.  Then, the administrator uses this information
   to infer the best destinations for measurement probes.  After that,
   the administrator activates probes on the chosen subset of
   destinations, considering the available resources.  This practice,
   however, does not scale well because it is labor-intensive and
   error-prone for the administrator to compute which probes should be
   activated given the set of critical flows that need to be measured.
   Even worse, this practice fails completely in networks whose
   critical flows are short-lived and traverse dynamically changing
   network paths, as in modern cloud environments.  This is because
   probes must be reconfigured quickly, and administrators are simply
   not fast enough at computing and activating the new set of probes
   required every time
   the network traffic pattern changes.  Finally, current active
   measurement practice usually covers only a fraction of the network
   flows that should be observed, which invariably leads to the damaging
   consequence of undetected SLA violations.

   Management software can be embedded inside network devices to control
   the deployment of active measurement mechanisms.  In fact, some
   network equipment vendors already do this, especially to avoid
   resource starvation of network devices (e.g., due to configuration
   errors or lack of experience of human administrators).  However, this
   approach does not enhance the active measurement capabilities in
   important respects, such as scalability and efficiency.  For example,
   the number of locally available measurements (and, consequently, of
   detected SLA violations) is still bounded by the number of deployed
   probes.  Thus, if the number of SLA violations is greater than the
   number of available probes, only a fraction of the violations will be
   observed.  Also, devices cannot share resources and knowledge about
   the network infrastructure in order to take advantage of remote
   management information (e.g., measurement results).
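
   The manual practice described above (collect a traffic matrix, rank
   destinations, and activate probes within the available resources)
   can be sketched as a greedy selection.  The Python code below is a
   minimal sketch under stated assumptions, not a method defined in
   this document: the per-probe cost, the budget, the destination names,
   and the traffic figures are hypothetical, ranking by traffic volume
   is only one possible heuristic, and coverage is measured here as the
   fraction of traffic sent to probed destinations.

```python
def select_probes(traffic: dict[str, float], probe_cost: float,
                  budget: float) -> list[str]:
    """Greedily pick the highest-traffic destinations that fit the budget.

    traffic:    destination -> observed traffic volume (from the matrix)
    probe_cost: resources one active probe consumes (hypothetical units)
    budget:     resources the device can spare for probes
    """
    max_probes = max(0, int(budget // probe_cost))
    ranked = sorted(traffic, key=lambda d: traffic[d], reverse=True)
    return ranked[:max_probes]


def coverage(traffic: dict[str, float], chosen: list[str]) -> float:
    """Fraction of total traffic sent to probed destinations."""
    total = sum(traffic.values())
    return sum(traffic[d] for d in chosen) / total if total else 0.0


traffic_matrix = {"dst-a": 500.0, "dst-b": 300.0,
                  "dst-c": 150.0, "dst-d": 50.0}
chosen = select_probes(traffic_matrix, probe_cost=2.0, budget=5.0)
print(chosen, f"coverage={coverage(traffic_matrix, chosen):.0%}")
# ['dst-a', 'dst-b'] coverage=80%
```

   With these numbers only two probes fit in the budget, so 20% of the
   traffic goes unmonitored, and any change in the traffic matrix means
   rerunning the selection: exactly the step that does not scale when
   performed by hand.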

3.  Benefits of an Autonomic Solution

   The use case considered here is distributed autonomic detection of
   SLA violations.  Autonomic Networking (AN) properties can help steer
   the activation of measurement probes [P2PBNM-Nobre-2012].
   Peer-to-Peer (P2P) technology can be embedded in network devices in
   order to improve probe activation decisions using autonomic loops.
   Thus, it would be possible to coordinate probe activation and to
   share measurement results among different network devices.  The
   problem to be solved by AN in the present use case is how to steer
   the process of measurement probe activation by a complete solution
   that sets all necessary parameters for this activation to operate
   efficiently, reliably, and securely, with minimal human intervention
   and without the need for.

   An autonomic solution for the distributed detection of SLA violations
   can provide several benefits.  First, it could optimize resource
   consumption and avoid resource starvation on network devices.  This
   optimization comes from different sources: sharing of measurement
   results, more efficient probe activation decisions, etc.  Second, the
   number of detected SLA violations could be increased, owing to better
   coverage of the network.  Third, the solution could decrease the time
   necessary to detect SLA violations, since the adaptivity of an
   autonomic loop could capture network dynamics faster than a human
   administrator.  Finally, the solution could help to reduce the
   workload of human administrators or, at least, spare them from
   performing operational tasks.

   The active measurement model assumes that a typical infrastructure
   will have multiple network segments and Autonomous Systems (ASs), and
   a reasonably large number of routers and hosts.  It also
   considers that multiple Service Level Objectives (SLOs) can be in
   place at a given time.  Since interoperability in a heterogeneous
   network is a goal, features found in different active measurement
   mechanisms (e.g., OWAMP, TWAMP, and IPSLA) and programmability
   interfaces (e.g., Cisco's EEM and onePK) could be used for the
   implementation.  The autonomic solution should include and/or
   reference specific algorithms, protocols, metrics, and technologies
   for the implementation of distributed detection of SLA violations as
   a whole.
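
   One of the sources of resource savings listed above is the sharing
   of measurement results among devices.  The Python sketch below shows,
   under assumed semantics, how a device might consult results received
   from management peers before spending local resources on a probe: a
   remote result for the same destination is reused if it is younger
   than a freshness window, and otherwise a local probe is run.  The
   SharedResultCache class, the freshness window, and the probe callback
   are all hypothetical; this document does not define a result-sharing
   protocol.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Result:
    destination: str
    delay_ms: float
    timestamp: float  # when the measurement was taken (epoch seconds)
    source: str       # device that produced the measurement


@dataclass
class SharedResultCache:
    """Hypothetical store of local and peer-supplied results."""
    max_age_s: float = 30.0
    results: dict[str, Result] = field(default_factory=dict)

    def add_remote(self, r: Result) -> None:
        # Keep only the newest result per destination.
        old = self.results.get(r.destination)
        if old is None or r.timestamp > old.timestamp:
            self.results[r.destination] = r

    def get_or_probe(self, destination: str,
                     probe: Callable[[str], float],
                     now: Optional[float] = None) -> tuple[Result, bool]:
        """Return (result, probed_locally); probe only if nothing is fresh."""
        now = time.time() if now is None else now
        cached = self.results.get(destination)
        if cached is not None and now - cached.timestamp <= self.max_age_s:
            return cached, False
        fresh = Result(destination, probe(destination), now, "local")
        self.results[destination] = fresh
        return fresh, True


cache = SharedResultCache(max_age_s=30.0)
cache.add_remote(Result("dst-a", 12.5, timestamp=100.0, source="peer-1"))
r, probed = cache.get_or_probe("dst-a", probe=lambda d: 99.0, now=120.0)
print(r.source, probed)  # peer-1 False
```

   A fixed freshness window is the simplest reuse policy; whether a
   peer's result is relevant to the local device at all is a separate
   question, discussed under interaction with other devices.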

4.  Intended User and Administrator Experience

   The autonomic solution should avoid human intervention in the
   distributed detection of SLA violations.  In addition, it could
   enable less experienced human administrators to control SLA
   monitoring.  However, some information must still be provided by the
   human administrator.  For example, the human administrator should
   provide the SLOs of the SLA being monitored.  The configuration and
   bootstrapping of network devices using the autonomic solution should
   require minimal effort from the human administrator.  It would
   probably be sufficient to provide the address of a device that is
   already running the solution; the devices themselves could then
   exchange configuration data.
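
   To make the administrator's input concrete, the following Python
   sketch models the two items mentioned above: the SLOs of the
   monitored SLA and the address of a device already running the
   solution.  The class and field names, the example metric name, and
   the threshold semantics (a measured value above the threshold
   violates the SLO) are assumptions for illustration; this document
   does not define an intent format.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """One Service Level Objective supplied by the administrator."""
    metric: str       # hypothetical metric name, e.g. "rtt-ms"
    threshold: float  # assumption: values above this violate the SLO

    def violated_by(self, measured: float) -> bool:
        return measured > self.threshold


@dataclass(frozen=True)
class AdminInput:
    """Everything the administrator is assumed to provide."""
    slos: tuple[Slo, ...]
    seed_peer: str    # address of a device already running the solution

    def __post_init__(self) -> None:
        # Fail early on a malformed seed address (raises ValueError).
        ipaddress.ip_address(self.seed_peer)


intent = AdminInput(slos=(Slo("rtt-ms", 50.0),), seed_peer="192.0.2.10")
print(intent.slos[0].violated_by(63.2))  # True
```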

5.  Analysis of Parameters and Information Involved

5.1.  Device Based Self-Knowledge and Decisions

   Each device has self-knowledge about its local SLA monitoring.  This
   could take the form of historical measurement data and SLOs.  In
   addition, the devices would have algorithms that could decide which
   probes should be activated at a given time.  The choice of which
   algorithm is better for a specific situation would also be
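
   As one example of such a local decision algorithm, the Python sketch
   below ranks destinations by how close their recent measurements are
   to the SLO threshold and activates probes on the riskiest ones, up to
   the number of probes the device can afford.  The ranking rule (mean
   of the last few samples divided by the threshold), the window size,
   and all names are hypothetical choices for illustration, not the
   algorithm of [P2PBNM-Nobre-2012].  Capping the number of probes also
   guards against the resource starvation discussed in the problem
   statement.

```python
from statistics import mean


def pick_probes(history: dict[str, list[float]], threshold: float,
                max_probes: int, window: int = 5) -> list[str]:
    """Choose destinations whose recent measurements are closest to
    (or beyond) the SLO threshold.

    history:    destination -> measurements in chronological order
    threshold:  SLO limit for the metric (assumed: higher is worse)
    max_probes: probes the device can afford to run concurrently
    """
    def risk(dest: str) -> float:
        recent = history[dest][-window:]
        # Destinations with no data yet get the highest priority.
        return float("inf") if not recent else mean(recent) / threshold

    ranked = sorted(history, key=risk, reverse=True)
    return ranked[:max(0, max_probes)]


history = {
    "dst-a": [20.0, 22.0, 21.0],  # comfortably within a 50 ms SLO
    "dst-b": [44.0, 47.0, 49.0],  # approaching the limit
    "dst-c": [],                  # never measured
    "dst-d": [35.0, 30.0, 38.0],
}
print(pick_probes(history, threshold=50.0, max_probes=2))
# ['dst-c', 'dst-b']
```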

5.2.  Interaction with other devices

   Network devices could share information about service level
   measurement results.  This information could speed up the detection
   of SLA violations and increase the number of detected SLA violations.
   In any case, it is necessary to ensure that results from remote
   devices are locally relevant.  The definition of the network devices
   that exchange measurement data, i.e., management peers, creates a new
   topology.  Different approaches could be used to define this topology
   (e.g., correlated peers [P2PBNM-Nobre-2012]).  To bootstrap peer
   selection, each device could use the endpoints it already knows
   (e.g., from its FIB and RIB) as the initial seed for discovering
   candidate peers.
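
   The peer selection bootstrap can be sketched as follows.  In this
   Python sketch, a device seeds its candidate peers from the next hops
   in its routing table and then keeps the candidates whose measured
   destinations overlap most with its own, on the assumption that such
   peers' results are more likely to be locally relevant.  It assumes
   each candidate advertises the destinations it measures, which is a
   hypothetical exchange.  Jaccard similarity is only a simple stand-in
   for the correlation criterion of [P2PBNM-Nobre-2012]; the routing
   table representation, the similarity threshold, and all names are
   hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def seed_candidates(rib: dict[str, str]) -> set[str]:
    """rib: prefix -> next-hop address; next hops become seed peers."""
    return set(rib.values())


def select_peers(own_dests: set[str], candidates: set[str],
                 peer_dests: dict[str, set[str]],
                 min_similarity: float = 0.3) -> list[str]:
    """Keep candidates whose destination sets correlate with ours,
    most similar first."""
    scored = [(jaccard(own_dests, peer_dests.get(c, set())), c)
              for c in candidates]
    return [c for s, c in sorted(scored, reverse=True)
            if s >= min_similarity]


rib = {"198.51.100.0/24": "192.0.2.1", "203.0.113.0/24": "192.0.2.2"}
own = {"dst-a", "dst-b", "dst-c"}
advertised = {"192.0.2.1": {"dst-a", "dst-b"}, "192.0.2.2": {"dst-z"}}
print(select_peers(own, seed_candidates(rib), advertised))
# ['192.0.2.1']
```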

5.3.  Information needed from Intent


5.4.  Monitoring, diagnostics and reporting


6.  Comparison with current solutions

   There is no standardized solution for distributed autonomic detection
   of SLA violations.  Current solutions are restricted to ad hoc
   scripts running on a per-node basis to automate some of the
   administrator's actions.  There are some proposals for passive probe
   activation (e.g., DECON and CSAMP), but they do not focus on
   autonomic features.  It is also worth mentioning a proposal by
   Barford et al. to detect and localize links that cause anomalies
   along a network path.

7.  Related IETF Work

   The following paragraphs discuss related IETF work and are provided
   for reference.  This section is not exhaustive; rather, it provides
   an overview of the various initiatives and how they relate to
   autonomic distributed detection of SLA violations.

   1.  [LMAP]: The Large-Scale Measurement of Broadband Performance
       Working Group aims to develop standards for large-scale
       performance measurement.  Since its mechanisms also involve
       deploying measurement probes, the autonomic solution could be
       relevant to LMAP, especially for SLA violation screening.  In
       addition, a solution that decreases the workload of human
       administrators at service providers is probably highly desirable.

   2.  [IPFIX]: The IP Flow Information Export (IPFIX) Working Group
       standardizes the export of IP flow information (e.g., NetFlow
       records).  IPFIX uses measurement probes (i.e., metering
       exporters) to gather flow data.  In this context, the autonomic
       solution for the activation of active measurement probes could
       possibly be extended to also address passive measurement probes.
       In addition, flow information could be used in probe activation
       decisions.

   3.  [ALTO]: The Application-Layer Traffic Optimization Working Group
       aims to provide topological information at a higher abstraction
       layer, which can be based upon network policy, and with
       application-relevant service functions located in it.  Its work
       could be leveraged to define the topology of the network devices
       that exchange measurement data.

8.  Acknowledgements

   We wish to acknowledge the helpful contributions, comments, and
   suggestions that were received from Bruno Klauser, Eric Voit, and
   Hanlin Fang.

9.  IANA Considerations

   This memo includes no request to IANA.

10.  Security Considerations

   The bootstrapping of a new device follows the homenet approach
   [draft-autonomic-homenet]; thus, a device should register before it
   can exchange data.  This registration could be performed by a
   "Registrar" device or by a cloud service provided by the organization
   to facilitate autonomic mechanisms.  The new device sends its own
   credentials to the Registrar and, after successful authentication,
   receives domain information that enables its subsequent enrollment
   in the domain.  The Registrar sends all required information: a
   device name, a domain name, and some operational parameters.
   Measurement data exchanged among devices should be signed and
   encrypted, since these data could carry sensitive information about
   network infrastructures.

   Several attacks should be considered when analyzing the security of
   the autonomic solution.  Denial-of-service (DoS) attacks could be
   performed if the solution is tampered with so that it activates more
   local probes than the available resources allow.  In addition, a
   device (the attacker) could forge results so that it is considered a
   peer of a specific device (the target).  This could be done to gain
   information about a network.
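
   As a minimal illustration of the signing part of this requirement,
   the Python sketch below attaches an HMAC-SHA256 tag to a measurement
   record and verifies it on receipt, using only the standard library.
   It assumes a symmetric key shared by the devices of a domain, for
   example one distributed by the Registrar during enrollment; that key
   distribution, the JSON record layout, and the function names are
   hypothetical.  Encryption is not shown.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> dict:
    """Return a message carrying the record and its HMAC-SHA256 tag."""
    payload = json.dumps(record, sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode(), "tag": tag}


def verify_record(message: dict, key: bytes) -> dict:
    """Return the record if the tag is valid; raise ValueError otherwise."""
    payload = message["payload"].encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters match.
    if not hmac.compare_digest(expected, message["tag"]):
        raise ValueError("measurement record failed authentication")
    return json.loads(payload)


domain_key = b"hypothetical-key-from-enrollment"
msg = sign_record({"dst": "dst-a", "delay_ms": 12.5, "src": "peer-1"},
                  domain_key)
print(verify_record(msg, domain_key)["delay_ms"])  # 12.5

msg["payload"] = msg["payload"].replace("12.5", "1.5")  # tampered result
try:
    verify_record(msg, domain_key)
except ValueError as err:
    print(err)  # measurement record failed authentication
```

   A key shared across the domain only shows that a record came from
   some enrolled device; it does not stop a compromised member from
   forging its own results, which is the attack described above, so
   per-device credentials would be needed to attribute results to a
   specific peer.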

11.  References

11.1.  Normative References

   [P2PBNM-Nobre-2012]
              Nobre, J., Granville, L., Clemm, A., and A. Prieto,
              "Decentralized Detection of SLA Violations Using P2P
              Technology", 8th International Conference on Network and
              Service Management (CNSM), 2012.

   [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
              Zekauskas, "A One-way Active Measurement Protocol
              (OWAMP)", RFC 4656, September 2006.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, October 2008.

   [RFC6812]  Chiba, M., Clemm, A., Medley, S., Salowey, J., Thombare,
              S., and E. Yedavalli, "Cisco Service-Level Assurance
              Protocol", RFC 6812, January 2013.

   [draft-autonomic-homenet]
              Behringer, M., Pritikin, M., and S. Bjarnason, "draft-
              behringer-homenet-trust-bootstrap", draft-behringer-
              homenet-trust-bootstrap-02 (work in progress), February

11.2.  Informative References

   [RFC3954]  Claise, B., "Cisco Systems NetFlow Services Export Version
              9", RFC 3954, October 2004.

   [RFC4148]  Stephan, E., "IP Performance Metrics (IPPM) Metrics
              Registry", BCP 108, RFC 4148, August 2005.

Authors' Addresses

   Jeferson Campos Nobre
   Federal University of Rio Grande do Sul
   Porto Alegre


   Lisandro Zambenedetti Granville
   Federal University of Rio Grande do Sul
   Porto Alegre


   Alexander Clemm
   Cisco Systems
   San Jose


   Alberto Gonzalez Prieto
   Cisco Systems
   San Jose