NFVRG C. Meirosu
Internet Draft Ericsson
Intended status: Informational A. Manzalini
Expires: January 2016 Telecom Italia
J. Kim
Deutsche Telekom
R. Steinert
SICS
S. Sharma
iMinds
G. Marchetto
Politecnico di Torino
I. Papafili
Hellenic Telecommunications Organization
K. Pentikousis
EICT
S. Wright
AT&T
July 6, 2015
DevOps for Software-Defined Telecom Infrastructures
draft-unify-nfvrg-devops-02.txt
Abstract
Carrier-grade network management was optimized for environments built
with monolithic physical nodes and involves significant deployment,
integration and maintenance efforts from network service providers.
The introduction of virtualization technologies, from the physical
layer all the way up to the application layer, however, invalidates
several well-established assumptions in this domain. This draft opens
the discussion in NFVRG about challenges related to transforming the
telecom network infrastructure into an agile, model-driven production
environment for communication services. We take inspiration from data
center DevOps regarding how to simplify and automate management
processes for a telecom service provider software-defined
infrastructure (SDI). Finally, we introduce challenges associated
with operationalizing DevOps principles at scale in software-defined
telecom networks in three areas related to key monitoring,
verification and troubleshooting processes.
Meirosu, et al. Expires January 6, 2016 [Page 1]
Internet-Draft DevOps Challenges July 2015
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on January 6, 2016.
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction...................................................3
2. Software-Defined Telecom Infrastructure: Roles and DevOps
principles........................................................5
2.1. Service Developer Role....................................5
2.2. VNF Developer role........................................5
2.3. Operator role.............................................6
2.4. DevOps Principles.........................................6
3. Continuous Integration.........................................7
4. Continuous Delivery............................................8
5. Stability Challenges...........................................8
6. Consistency, Availability and Partitioning Challenges.........10
7. Observability Challenges......................................11
8. Verification Challenges.......................................11
9. Troubleshooting Challenges....................................13
10. Programmable network management..............................14
11. DevOps Performance Metrics...................................15
12. Security Considerations......................................16
13. IANA Considerations..........................................16
14. Informative References.......................................16
15. Acknowledgments..............................................18
1. Introduction
Carrier-grade network management was developed as an incremental
solution once a particular network technology matured and came to be
deployed in parallel with legacy technologies. This approach requires
significant integration efforts when new network services are
launched. Both centralized and distributed algorithms have been
developed in order to solve very specific problems related to
configuration, performance and fault management. However, such
algorithms consider a network that is by and large functionally
static. Thus, management processes related to introducing new or
maintaining functionality are complex and costly due to significant
efforts required for verification and integration.
Network virtualization, by means of Software-Defined Networking (SDN)
and Network Function Virtualization (NFV), creates an environment
where network functions are no longer static or strictly embedded
in physical boxes deployed at fixed points. The virtualized network
is dynamic and open to fast-paced innovation, enabling efficient
network management and reduction of operating cost for network
operators. A significant part of network capabilities is expected to
become available through interfaces that resemble the APIs widespread
within datacenters instead of the traditional telecom means of
management such as the Simple Network Management Protocol, Command
Line Interfaces or CORBA. Such an API-based approach, combined with
the programmability offered by SDN interfaces [RFC7426], opens
opportunities for handling infrastructure, resources, and Virtual
Network Functions (VNFs) as code, employing techniques from software
engineering.
The efficiency and integration of existing management techniques in
virtualized and dynamic network environments are limited, however.
Monitoring tools, e.g. based on simple counters, physical network
taps and active probing, do not scale well and provide only a small
part of the observability features required in such a dynamic
environment. Although huge amounts of monitoring data can be
collected from the nodes, the typical granularity is rather coarse.
Debugging and troubleshooting techniques for software-defined
environments have attracted interest in the research community in
recent years. Still, it is yet to be
explored how to integrate them into an operational network management
system. Moreover, research tools developed in academia (such as
NetSight [H2014], OFRewind [W2011], FlowChecker [S2010], etc.) were
limited to solving very particular, well-defined problems, and
oftentimes are not built for automation and integration into carrier-
grade network operations workflows.
The topics at hand have already attracted several standardization
organizations to look into the issues arising in this new
environment. For example, IETF working groups have activities in the
area of OAM and Verification for Service Function Chaining
[I-D.aldrin-sfc-oam-framework] [I-D.lee-sfc-verification]. At IRTF,
[RFC7149] asks a set of relevant
questions regarding operations of SDNs. The ETSI NFV ISG defines the
MANO interfaces [NFVMANO], and TMForum investigates gaps between
these interfaces and existing specifications in [TR228]. The need for
programmatic APIs in the orchestration of compute, network and
storage resources is discussed in [I-D.unify-nfvrg-challenges].
From a research perspective, problems related to operations of
software-defined networks are in part outlined in [SDNsurvey] and
research referring to both cloud and software-defined networks is
discussed in [D4.1].
The purpose of this first version of this document is to act as a
discussion opener in NFVRG by describing a set of principles that are
relevant for applying DevOps ideas to managing software-defined
telecom network infrastructures. We identify a set of challenges
related to developing tools, interfaces and protocols that would
support these principles, and consider how standard APIs can be
leveraged to simplify management tasks.
2. Software-Defined Telecom Infrastructure: Roles and DevOps principles
Agile methods used in many software-focused companies aim at
releasing small iterations of code to implement VNFs with high
velocity and high quality into a production environment. Similarly,
service providers are interested in releasing incremental
improvements in the network services that they create from
virtualized network functions. The cycle time for DevOps as applied
in many open source projects is on the order of one quarter year, or
13 weeks.
The code needs to undergo a significant amount of automated testing
and verification with pre-defined templates in a realistic setting.
From the point of view of infrastructure management, the verification
of the network configuration as result of network policy
decomposition and refinement, as well as the configuration of virtual
functions, is one of the most sensitive operations. When
troubleshooting the cause of unexpected behavior, fine-grained
visibility into all resources supporting the virtual functions
(either compute, or network-related) is paramount to facilitating
fast resolution times. While compute resources are typically very
well covered by debugging and profiling toolsets based on many years
of advances in software engineering, programmable network resources
are still a novelty and tools exploiting their potential are
scarce.
2.1. Service Developer Role
We identify two dimensions of the "developer" role in software-
defined infrastructure (SDI). One dimension relates to determining
which high-level functions should be part of a particular service,
deciding what logical interconnections are needed between these
blocks and defining a set of high-level constraints or goals related
to parameters that define, for instance, a Service Function Chain.
This could be determined by the product owner for a particular family
of services offered by a telecom provider. Or, it might be a key
account representative that adapts an existing service template to
the requirements of a particular customer by adding or removing a
small number of functional entities. We refer to this person as the
Service Developer and for simplicity (access control, training on
technical background, etc.) we consider the role to be internal to
the telecom provider.
2.2. VNF Developer role
The other dimension of the "developer" role is a person that writes
the software code for a new virtual network function (VNF). Depending
on the actual VNF being developed, this person might be internal or
external to the telecom provider. We refer to them as VNF Developers.
2.3. Operator role
The role of an Operator in SDI is to ensure that the deployment
processes are successful and that the performance indicators
associated with a service are met while the service is supported on
virtual infrastructure within the domain of a telecom provider.
System integration roles are important and we intend to approach them
in a future revision of this draft.
2.4. DevOps Principles
In line with the generic DevOps concept outlined in [DevOpsP], we
consider these four principles important for adapting DevOps
ideas to SDI:
* Deploy with repeatable, reliable processes: Service and VNF
Developers should be supported by automated build, orchestrate and
deploy processes that are identical in the development, test and
production environments. Such processes need to be made reliable and
trusted in the sense that they should reduce the chance of human
error and provide visibility at each stage of the process, as well as
have the possibility to enable manual interactions in certain key
stages.
* Develop and test against production-like systems: both Service
Developers and VNF Developers need to have the opportunity to verify
and debug their respective SDI code in systems that have
characteristics which are very close to the production environment
where the code is expected to be ultimately deployed. Customizations
of Service Function Chains or VNFs could thus be released frequently
to a production environment in compliance with policies set by the
Operators. Adequate isolation and protection of the services active
in the infrastructure from services being tested or debugged should
be provided by the production environment.
* Monitor and validate operational quality: Service Developers, VNF
Developers and Operators must be equipped with tools, automated as
much as possible, that enable them to continuously monitor the
operational quality of the services deployed on SDI. Monitoring
tools should be
complemented by tools that allow verifying and validating the
operational quality of the service in line with established
procedures which might be standardized (for example, Y.1564 Ethernet
Activation [Y1564]) or defined through best practices specific to a
particular telecom operator.
* Amplify development cycle feedback loops: An integral part of the
DevOps ethos is building a cross-cultural environment that bridges
the cultural gap between the desire for continuous change by the
Developers and the demand by the Operators for stability and
reliability of the infrastructure. Feedback from customers is
collected and transmitted throughout the organization. From a
technical perspective, such cultural aspects could be addressed
through common sets of tools and APIs that are aimed at providing a
shared vocabulary for both Developers and Operators, as well as
simplifying the reproduction of problematic situations in the
development, test and operations environments.
Network operators that would like to move to agile methods to deploy
and manage their networks and services face a different environment
compared to typical software companies where simplified trust
relationships between personnel are the norm. In such companies, it
is not uncommon that the same person may be rotating between
different roles. In contrast, in a telecom service provider, there
are strong organizational boundaries between suppliers (whether in
Developer roles for network functions, or in Operator roles for
outsourced services) and the carrier's own personnel that might also
take both Developer and Operator roles. How DevOps principles reflect
on these trust relationships and to what extent initiatives such as
co-creation could transform the environment to facilitate closer Dev
and Ops integration across business boundaries is an interesting area
for business studies, but we could not for now identify a specific
technological challenge.
3. Continuous Integration
Software integration is the process of bringing together the software
component subsystems into one software system, and ensuring that the
subsystems function together as a system. Software integration can
apply regardless of the size of the software components. The
objective of Continuous Integration is to prevent integration
problems close to the expected release of a software development
project into a production (operations) environment. Continuous
Integration is therefore closely coupled with the notion of DevOps as
a mechanism to ease the transition from development to operations.
Continuous integration may result in multiple builds per day. It is
also typically used in conjunction with test driven development
approaches that integrate unit testing into the build process. The
unit testing is typically automated through build servers. Such
servers may implement a variety of additional static and dynamic
tests as well as other quality control and documentation extraction
functions. The reduced cycle times of continuous integration enable
improved software quality by applying small efforts frequently.
Continuous Integration applies to developers of VNFs as they integrate
the components that they need to deliver their VNF. The VNFs may
contain components developed by different teams within the VNF
Provider, or may integrate code developed externally - e.g. in
commercial code libraries or in open source communities.
Service providers also apply continuous integration in the
development of network services. Network services comprise
various aspects including VNFs and connectivity within and between
them as well as with various associated resource authorizations. The
components of the network service are all dynamic, and largely
represented by software that must be integrated regularly to maintain
consistency. Some of the software components that Service Providers
use may be sourced from VNF Providers or from open source
communities. Service Providers are increasingly motivated to engage
with open source communities [OSandS]. Open source interfaces
supported by open
source communities may be more useful than traditional paper
interface specifications. Even where Service Providers are deeply
engaged in the open source community (e.g. OPNFV) many service
providers may prefer to obtain the code through some software
provider as a business practice. Such software providers have the
same interests in software integration as other VNF providers.
4. Continuous Delivery
The practice of Continuous Delivery extends Continuous Integration by
ensuring that the software checked in on the mainline is always in a
user deployable state and enables rapid deployment by those users.
5. Stability Challenges
The dimensions, dynamicity and heterogeneity of networks are growing
continuously. Monitoring and managing the network behavior in order
to meet technical and business objectives is becoming increasingly
complicated and challenging, especially when considering the need to
predict and tame potential instabilities.
In general, instability in networks may both jeopardize performance
and compromise an optimized use of resources, even across multiple
layers: instability of end-to-end communication paths may depend on
the underlying transport network as well as on the higher-level
components specific to flow control and dynamic routing. For example,
arguments for
introducing advanced flow admission control are essentially derived
from the observation that the network otherwise behaves in an
inefficient and potentially unstable manner. Even with resource
over-provisioning, a network without an efficient flow admission
control
has instability regions that can even lead to congestion collapse in
certain configurations. Another example is the instability which is
characteristic of any dynamically adaptive routing system. Routing
instability, which can be (informally) defined as the quick change of
network reachability and topology information, has a number of
possible origins, including problems with connections, router
failures, high levels of congestion, software configuration errors,
transient physical and data link problems, and software bugs.
As a matter of fact, the states monitored and used to implement the
different control and management functions in network nodes are
governed by several low-level configuration commands (today still
done mostly manually). Further, there are several dependencies among
these states and the logic updating the states (most of which are not
kept aligned automatically). Normally, high-level network goals (such
as the connectivity matrix, load-balancing, traffic engineering
goals, survivability requirements, etc.) are translated into low-level
configuration commands (mostly manually) individually executed on the
network elements (e.g., forwarding table, packet filters, link-
scheduling weights, and queue-management parameters, as well as
tunnels and NAT mappings). Network instabilities due to configuration
errors can spread from node to node and propagate throughout the
network.
DevOps in the data center is a source of inspiration regarding how to
simplify and automate management processes for software-defined
infrastructure.
As a specific example, automated configuration functions are expected
to take the form of a "control loop" that monitors (i.e., measures)
current states of the network, performs a computation, and then
reconfigures the network. These types of functions must work
correctly even in the presence of failures, variable delays in
communicating with a distributed set of devices, and frequent changes
in network conditions. Nevertheless, cascading and nesting of
automated configuration processes can lead to the emergence of
non-linear network behaviors, and as such sudden instabilities (i.e.
identical local dynamics can give rise to widely different global
dynamics).
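The "control loop" described above can be illustrated with a minimal
sketch. It assumes a single managed parameter and a toy network whose
measured utilization simply follows the last configuration pushed. The
names ControlLoop, measure, apply, gain and simulate are hypothetical
and are not taken from any existing management interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ControlLoop:
    """Monitor -> compute -> reconfigure loop for one parameter (hypothetical)."""
    measure: Callable[[], float]     # reads the current network state
    apply: Callable[[float], None]   # pushes a new configuration value
    target: float                    # desired value of the measured state
    gain: float                      # fraction of the error corrected per step
    setting: float = 0.0             # last configuration value pushed

    def step(self) -> float:
        """Run one iteration and return the error observed before acting."""
        error = self.target - self.measure()
        self.setting += self.gain * error
        self.apply(self.setting)
        return error


def simulate(gain: float, steps: int = 20) -> List[float]:
    """Drive a toy network whose utilization equals the applied setting."""
    state = {"utilization": 0.0}

    def measure() -> float:
        return state["utilization"]

    def apply(value: float) -> None:
        state["utilization"] = value

    loop = ControlLoop(measure, apply, target=0.7, gain=gain)
    return [abs(loop.step()) for _ in range(steps)]


stable = simulate(gain=0.5)    # error shrinks at every step
unstable = simulate(gain=2.5)  # the same logic overshoots and diverges
```

In this toy model the only difference between the two runs is the gain,
yet one converges and the other oscillates with growing amplitude. A
real loop would additionally have to cope with failed or delayed
measurements, which this sketch leaves out.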
6. Consistency, Availability and Partitioning Challenges
The CAP theorem [CAP] states that any networked shared-data system
can have at most two of the following three properties: 1) Consistency
(C) equivalent to having a single up-to-date copy of the data; 2)
high Availability (A) of that data (for updates); and 3) tolerance to
network Partitions (P).
Looking at a telecom SDI as a distributed computational system
(routing/forwarding packets can be seen as a computational problem),
at most two of the three CAP properties can be guaranteed at the
same time. A CP system favors consistency, an AP system favors
availability, and a CA system assumes there are no partitions. This
has profound implications for technologies that need to be developed
in line with the "deploy with repeatable, reliable processes"
principle for configuring SDI states. Latency or delay and
partitioning properties are closely related, and such relation
becomes more important in the case of telecom service providers where
Devs and Ops interact with widely distributed infrastructure.
Limitations of interactions between centralized management and
distributed control need to be carefully examined in such
environments. Traditionally, connectivity was the main concern: C and
A were about delivering packets to the destination. The features and
capabilities of SDN and NFV are changing the concerns: for example
in SDN, control plane Partitions no longer imply data plane
Partitions, so A does not imply C. In practice, CAP reflects the need
for a balance between local/distributed operations and
remote/centralized operations.
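As an illustration of the CP versus AP choice when configuring SDI
state, the following minimal sketch models a configuration store with
two replicas. The classes Replica and ReplicatedStore, the replica
names and the "mode" flag are hypothetical and only serve to show the
trade-off; they do not describe an existing system.

```python
from typing import Dict, List


class Replica:
    """One copy of the configuration state (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.config: Dict[str, str] = {}


class ReplicatedStore:
    """Two-replica store that behaves as CP or AP during a partition."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas: List[Replica] = [Replica("central"), Replica("edge")]
        self.partitioned = False

    def write(self, replica_index: int, key: str, value: str) -> bool:
        """Return True if the update was accepted."""
        if not self.partitioned:
            # No partition: every replica receives the update.
            for replica in self.replicas:
                replica.config[key] = value
            return True
        if self.mode == "CP":
            # Favor consistency: refuse updates that cannot be replicated.
            return False
        # Favor availability: accept locally and let the replicas diverge.
        self.replicas[replica_index].config[key] = value
        return True

    def consistent(self) -> bool:
        return self.replicas[0].config == self.replicas[1].config


cp, ap = ReplicatedStore("CP"), ReplicatedStore("AP")
cp.partitioned = ap.partitioned = True
cp_accepted = cp.write(1, "vnf1/queue-limit", "512")
ap_accepted = ap.write(1, "vnf1/queue-limit", "512")
```

During the partition the CP store rejects the update and stays
consistent, while the AP store accepts it and must reconcile the
replicas once connectivity returns.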
In addition to CAP aspects related to individual protocols,
interdependencies between CAP choices for both resources and VNFs
that are interconnected in a forwarding graph need to be considered.
This is particularly relevant for the "Monitor and Validate
Operational Quality" principle, as apart from transport protocols,
most OAM functionality is generally configured in processes that are
separated from the configuration of the monitored entities. Also,
partitioning in a monitoring plane implemented through VNFs executed
on compute resources does not necessarily mean that the dataplane of
the monitored VNF was partitioned as well.
7. Observability Challenges
Monitoring algorithms need to operate in a scalable manner while
providing the specified level of observability in the network, either
for operation purposes (Ops part) or for debugging in a development
phase (Dev part). We consider the following challenges:
* Scalability - relates to the granularity of network observability,
computational efficiency, communication overhead, and strategic
placement of monitoring functions.
* Distributed operation and information exchange between monitoring
functions - monitoring functions supported by the nodes may perform
specific operations (such as aggregation or filtering) locally on the
collected data or within a defined data neighborhood and forward only
the result to a management system. Such operation may require
modifications of existing standards and development of protocols for
efficient information exchange and messaging between monitoring
functions. Different levels of granularity may need to be offered for
the data exchanged through the interfaces, depending on the Dev or
Ops role.
* Configurability and conditional observability - monitoring
functions that go beyond measuring simple metrics (such as delay, or
packet loss) require expressive monitoring annotation languages for
describing the functionality such that it can be programmed by a
controller. Monitoring algorithms implementing self-adaptive
monitoring behavior relative to local network situations may employ
such annotation languages to receive high-level objectives (KPIs
controlling tradeoffs between accuracy and measurement frequency, for
example) and conditions for varying the measurement intensity.
* Automation - includes mapping of monitoring functionality from a
logical forwarding graph to virtual or physical instances executing
in the infrastructure, as well as placement and re-placement of
monitoring functionality for required observability coverage and
configuration consistency upon updates in a dynamic network
environment.
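Two of the challenges listed above, local aggregation and
self-adaptive measurement intensity, can be sketched together as
follows. The class AdaptiveMonitor, its parameters and the halving and
doubling policy are assumptions made for illustration; the policy is
not prescribed by this document or by any standard.

```python
from statistics import mean
from typing import Dict, List, Optional


class AdaptiveMonitor:
    """Node-local monitoring function (hypothetical).

    It samples a delay metric, adapts its own sampling interval to the
    local situation and forwards only an aggregate to the management
    system.
    """

    def __init__(self, threshold_ms: float,
                 min_interval_s: float = 1.0,
                 max_interval_s: float = 60.0) -> None:
        self.threshold_ms = threshold_ms      # high-level objective
        self.min_interval_s = min_interval_s
        self.max_interval_s = max_interval_s
        self.interval_s = max_interval_s      # start with low intensity
        self.samples: List[float] = []

    def observe(self, delay_ms: float) -> float:
        """Record a sample and return the interval until the next one."""
        self.samples.append(delay_ms)
        if delay_ms > self.threshold_ms:
            # Suspicious value: measure more often.
            self.interval_s = max(self.min_interval_s, self.interval_s / 2)
        else:
            # Normal value: back off to reduce overhead.
            self.interval_s = min(self.max_interval_s, self.interval_s * 2)
        return self.interval_s

    def export_summary(self) -> Optional[Dict[str, float]]:
        """Aggregate locally and forward only the result."""
        if not self.samples:
            return None
        summary = {"count": len(self.samples),
                   "mean_ms": mean(self.samples),
                   "max_ms": max(self.samples)}
        self.samples = []
        return summary


monitor = AdaptiveMonitor(threshold_ms=10.0, max_interval_s=8.0)
intervals = [monitor.observe(d) for d in (5.0, 20.0, 30.0, 5.0)]
summary = monitor.export_summary()
```

In this sketch only the summary leaves the node, which limits the
communication overhead, while the threshold acts as the high-level
objective that a controller would supply.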
8. Verification Challenges
Enabling ongoing verification of code is an important goal of
continuous integration as part of the data center DevOps concept. In
a telecom SDI, service definitions, decompositions and configurations
need to be expressed in machine-readable encodings. For example,
configuration parameters could be expressed in terms of YANG data
models. However, the infrastructure management layers (such as
Software-Defined Network Controllers and Orchestration functions)
might not always export such machine-readable descriptions of the
runtime configuration state. In this case, the management layer
itself could be expected to include a verification process that has
the same challenges as the stand-alone verification processes we
outline later in this section. In that sense, verification can be
considered as a set of features providing gatekeeper functions to
verify both the abstract service models and the proposed resource
configuration before or right after the actual instantiation on the
infrastructure layer takes place.
A verification process can involve different layers of the network
and service architecture. Starting from a high-level verification of
the customer input (for example, a Service Graph as defined in
[I-D.unify-nfvrg-challenges]), the verification process could go
more in depth to reflect on the Service Function Chain
configuration. At the
lowest layer, the verification would handle the actual set of
forwarding rules and other configuration parameters associated to a
Service Function Chain instance. This enables the verification of
more quantitative properties (e.g. compliance with resource
availability), as well as a more detailed and precise verification of
the abovementioned topological ones. Existing SDN verification tools
could be deployed in this context, but the majority of them only
operate on flow space rules commonly expressed using OpenFlow syntax.
Moreover, such verification tools were designed for networks where
the flow rules are necessary and sufficient to determine the
forwarding state. This assumption is valid in networks composed only
of network functions that forward traffic by analyzing only the
packet headers (e.g. simple routers, stateless firewalls, etc.).
Unfortunately, most real networks contain active network
functions, represented by middle-boxes that dynamically change the
forwarding path of a flow according to function-local algorithms and
an internal state (that is based on the received packets), e.g. load
balancers, packet marking modules and intrusion detection systems.
The existing verification tools do not consider active network
functions because they do not incorporate the dynamic evolution of
such internal state into the verification process.
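The following minimal sketch shows the kind of check that is possible
when flow rules alone determine the forwarding state. The rule format
(an ordered list of destination matches and next hops per switch) and
the function trace are hypothetical simplifications, not OpenFlow
syntax and not the method of any particular tool.

```python
from typing import Dict, List, Tuple

# switch name -> ordered list of (destination match, next hop).
# "*" matches any destination. This format is an assumption.
FlowTables = Dict[str, List[Tuple[str, str]]]


def trace(tables: FlowTables, start: str, dst: str) -> Tuple[str, List[str]]:
    """Follow the rules for packets addressed to dst, starting at start.

    Returns a verdict ("delivered", "dropped" or "loop") and the path.
    """
    path = [start]
    node = start
    while node != dst:
        next_hop = next((hop for match, hop in tables.get(node, [])
                         if match in (dst, "*")), None)
        if next_hop is None:
            return "dropped", path
        if next_hop in path:
            # Stateless forwarding: revisiting a node means a loop.
            return "loop", path + [next_hop]
        path.append(next_hop)
        node = next_hop
    return "delivered", path


good = {"s1": [("h2", "s2")], "s2": [("h2", "h2")]}
bad = {"s1": [("h2", "s2")], "s2": [("h2", "s1")]}
ok_result = trace(good, "s1", "h2")
loop_result = trace(bad, "s1", "h2")
drop_result = trace(good, "s1", "h3")
```

The loop verdict is sound here only because the next hop depends on
the packet header alone. A load balancer or another active function
would make the next hop depend on internal state, which this kind of
trace cannot represent.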
Defining a set of verification tools that can account for active
network functions is a significant challenge. In order to perform
verification based on formal properties of the system, the internal
states of an active (virtual or not) network function would need to
be represented. Although these states would increase the verification
process complexity (e.g., using simple model checking would not be
feasible due to state explosion), they help to better represent the
forwarding behavior in real networks. A way to address this challenge
is by attempting to summarize the internal state of an active network
function in a way that allows for the verification process to finish
within a reasonable time interval.
9. Troubleshooting Challenges
One of the problems brought up by the complexity introduced by NFV
and SDN is pinpointing the cause of a failure in an infrastructure
that is under continuous change. Developing an agile and low-
maintenance debugging mechanism for an architecture that is comprised
of multiple layers and discrete components is a particularly
challenging task to carry out. Verification, observability, and
probe-based tools are key to troubleshooting processes, regardless
of whether they are followed by Dev or Ops personnel.
* Automated troubleshooting workflows
Failure is a frequently occurring event in network operation.
Therefore, it is crucial to monitor components of the system
periodically. Moreover, the troubleshooting system should search for
the cause automatically in the case of failure. If the system follows
a multi-layered architecture, monitoring and debugging actions should
be performed on components from the topmost layer to the bottom layer
in a chain. Likewise, the results of operations should be reported in
reverse order. In this regard, one should be able to define
monitoring and debugging actions through a common interface that
employs layer hopping logic. Besides, this interface should allow
fine-grained and automatic on-demand control for the integration of
other monitoring and verification mechanisms and tools.
* Troubleshooting with active measurement methods
Besides detecting network changes based on passively collected
information, active probes to quantify delay, network utilization and
loss rate are important to debug errors and to evaluate the
performance of network elements. While tools that are effective in
determining such conditions for particular technologies were
specified by IETF and other standardization organizations, their use
requires a significant amount of manual labor in terms of both
configuration and interpretation of the results.
In contrast, methods that test and debug networks systematically
based on models generated from the router configuration, router
interface tables or forwarding tables, would significantly simplify
management. They could be made usable by Dev personnel that have
little expertise in diagnosing network defects. Such tools naturally
lend themselves to integration into complex troubleshooting workflows
that could be generated automatically based on the description of a
particular service chain. However, there are scalability challenges
associated with deploying such tools in a network. Some tools may
poll each networking device for the forwarding table information to
calculate the minimum number of test packets to be transmitted in the
network. Therefore, as the network size and the forwarding table size
increase, forwarding table updates for the tools may put a
non-negligible load on the network.
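To make the test packet calculation mentioned above more concrete,
the following sketch treats it as a set cover problem and solves it
with a greedy approximation. This is an assumption made for
illustration: the text does not state which algorithm a given tool
uses, and the greedy choice does not guarantee a true minimum. The
function name and the packet and rule identifiers are hypothetical.

```python
from typing import Dict, List, Set


def select_test_packets(coverage: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick test packets until every forwarding rule is covered.

    `coverage` maps a candidate test packet to the set of forwarding
    rules it would exercise, as derived from the polled forwarding
    tables. Greedy set cover is an approximation, not an exact minimum.
    """
    uncovered: Set[str] = set().union(*coverage.values())
    chosen: List[str] = []
    while uncovered:
        # Pick the packet exercising the most still-uncovered rules;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(coverage),
                   key=lambda p: len(coverage[p] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen
```

The input to such a calculation grows with both the number of devices
and the size of their forwarding tables, which is the source of the
polling load discussed above.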
10. Programmable network management
The ability to automate a set of actions to be performed on the
infrastructure, be it virtual or physical, is key to productivity
increases following the application of DevOps principles. Previous
sections in this document touched on different dimensions of
programmability:
- Section 6 approached programmability in the context of developing
new capabilities for monitoring and for dynamically setting
configuration parameters of deployed monitoring functions
- Section 7 reflected on the need to determine the correctness of
actions that are to be performed on the infrastructure as a result
of executing a set of high-level instructions
- Section 8 considered programmability in the perspective of an
interface to facilitate dynamic orchestration of troubleshooting
steps towards building workflows and for reducing the manual steps
required in troubleshooting processes
We expect that programmable network management - along the lines of
[RFC7426] - will draw more interest as we move forward. For
example, in [I-D.unify-nfvrg-challenges], the authors identify the
need for presenting programmable interfaces that accept instructions
in a standards-supported manner for the Two-Way Active Measurement
Protocol (TWAMP). More specifically, an excellent
example in this case is traffic measurements, which are extensively
used today to determine SLA adherence as well as debug and
troubleshoot pain points in service delivery. TWAMP is widely
implemented by established vendors and deployed by most global
operators. However, TWAMP management and control today relies solely
on diverse and proprietary tools provided by the respective vendors
of the equipment. For large, virtualized, and dynamically
instantiated infrastructures where network functions are placed
according to orchestration algorithms, proprietary mechanisms for
managing TWAMP measurements have severe limitations. For example,
today's TWAMP implementations are managed through vendor-specific
interfaces, typically command-line interfaces (CLIs), which can be
scripted on a platform-by-platform basis. As a result, although the
control and test measurement protocols are standardized, their
respective management is not. This dramatically hinders the
integration of such deployed functionality into the SP-DevOps
concept. In
this particular case, recent efforts in the IPPM WG
[I-D.cmzrjp-ippm-twamp-yang] aim to define a standard TWAMP data
model and effectively increase the programmability of TWAMP
deployments in the future.
Data center DevOps tools, such as those surveyed in [D4.1], developed
proprietary methods for describing and interacting through interfaces
with the managed infrastructure. Within certain communities, they
became de facto standards in the same way particular CLIs became
de facto standards for Internet professionals. Although open-source
components and strong community involvement exist, the diversity
of the new languages and interfaces creates a burden both for
vendors, who must choose which ones to prioritize for support and
then develop the functionality, and for operators, who must determine
what fits best for the requirements of their systems.
11. DevOps Performance Metrics
Defining a set of metrics that are used as performance indicators is
important for service providers to ensure the successful deployment
and operation of a service in the software-defined telecom
infrastructure.
We identify three types of considerations that are particularly
relevant for these metrics: 1) technical considerations directly
related to the service provided, 2) process-related considerations
regarding the deployment, maintenance and troubleshooting of the
service, i.e. concerning the operation of VNFs, and 3) cost-related
considerations associated with the benefits of using a
Software-Defined Telecom Infrastructure.
First, technical performance metrics shall be service-dependent or
service-oriented and may address, inter alia, service performance in
terms of delay, throughput, congestion, energy consumption,
availability, etc.
Acceptable performance levels should be mapped to SLAs and the
requirements of the service users. Metrics in this category were
defined in IETF working groups and other standardization
organizations with responsibility over particular service or
infrastructure descriptions.
Second, process-related metrics shall serve a wider perspective in
the sense that they shall be applicable for multiple types of
services. For instance, process-related metrics may include: number
of probes for end-to-end QoS monitoring, number of on-site
interventions, number of unused alarms, number of configuration
mistakes, incident/trouble resolution delay, delay between service
order and delivery, or number of self-care operations.
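As a minimal sketch, two of the process-related metrics listed above
(incident resolution delay and the delay between service order and
delivery) could be computed from pairs of timestamps. The function
name and the timestamp-pair input format are assumptions made for this
example; real ticketing or ordering systems will expose such data
differently.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Iterable, Tuple


def mean_delay(intervals: Iterable[Tuple[datetime, datetime]]) -> timedelta:
    """Average the elapsed time over (start, end) timestamp pairs.

    The pairs could be (incident opened, incident resolved) or
    (service ordered, service delivered). Raises
    statistics.StatisticsError if no interval is supplied.
    """
    seconds = [(end - start).total_seconds() for start, end in intervals]
    return timedelta(seconds=mean(seconds))
```

The same helper applies to any metric in the list that is defined as a
delay between two recorded events, which is one way such metrics can
stay applicable across multiple types of services.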
Third, cost-related metrics shall be used to monitor and assess the
benefit of employing SDI compared to the usage of legacy hardware
infrastructure with respect to operational costs, e.g. possible
reductions in man-hours, elimination of deployment and configuration
mistakes, etc.
Finally, identifying a set of highly relevant metrics for DevOps, and
especially monitoring and measuring them, is challenging because of
the number and availability of data sources that could be aggregated
within one such metric, e.g. calculation of human intervention, or
confidential aspects of costs.
12. Security Considerations
TBD
13. IANA Considerations
This memo includes no request to IANA.
14. Informative References
[NFVMANO] ETSI, "Network Function Virtualization (NFV) Management
and Orchestration V0.6.1 (draft)", Jul. 2014
[I-D.aldrin-sfc-oam-framework] S. Aldrin, R. Pignataro, N. Akiya.
"Service Function Chaining Operations, Administration and
Maintenance Framework", draft-aldrin-sfc-oam-framework-01,
(work in progress), July 2014.
[I-D.lee-sfc-verification] S. Lee and M. Shin. "Service Function
Chaining Verification", draft-lee-sfc-verification-00,
(work in progress), February 2014.
[RFC7426] E. Haleplidis (Ed.), K. Pentikousis (Ed.), S. Denazis, J.
Hadi Salim, D. Meyer, and O. Koufopavlou, "Software Defined
Networking (SDN): Layers and Architecture Terminology",
RFC 7426, January 2015
[RFC7149] M. Boucadair and C. Jacquenet. "Software-Defined Networking:
A Perspective from within a Service Provider Environment",
RFC 7149, March 2014.
[TR228] TMForum Gap Analysis Related to MANO Work. TR228, May 2014
[I-D.unify-nfvrg-challenges] R. Szabo et al. "Unifying Carrier and
Cloud Networks: Problem Statement and Challenges", draft-
unify-nfvrg-challenges-02 (work in progress), July 2015
[I-D.cmzrjp-ippm-twamp-yang] Civil, R., Morton, A., Zheng, L.,
Rahman, R., Jethanandani, M., and K. Pentikousis, "Two-Way
Active Measurement Protocol (TWAMP) Data Model", draft-
cmzrjp-ippm-twamp-yang-01 (work in progress), July 2015.
[D4.1] W. John et al. D4.1 Initial requirements for the SP-DevOps
concept, universal node capabilities and proposed tools,
August 2014.
[SDNsurvey] D. Kreutz, F. M. V. Ramos, P. Verissimo, C. Esteve
Rothenberg, S. Azodolmolky, S. Uhlig. "Software-Defined
Networking: A Comprehensive Survey." To appear in
proceedings of the IEEE, 2015.
[DevOpsP] "DevOps, the IBM Approach" 2013. [Online].
[Y1564] ITU-T Recommendation Y.1564: Ethernet service activation
test methodology, March 2011
[CAP] E. Brewer, "CAP twelve years later: How the "rules" have
changed", IEEE Computer, vol. 45, no. 2, pp. 23-29, Feb. 2012.
[H2014] N. Handigol, B. Heller, V. Jeyakumar, D. Mazieres, N.
McKeown; "I Know What Your Packet Did Last Hop: Using
Packet Histories to Troubleshoot Networks", In Proceedings
of the 11th USENIX Symposium on Networked Systems Design
and Implementation (NSDI 14), pp.71-95
[W2011] A. Wundsam, D. Levin, S. Seetharaman, A. Feldmann;
"OFRewind: Enabling Record and Replay Troubleshooting for
Networks". In Proceedings of the USENIX Annual Technical
Conference (USENIX ATC '11), pp. 327-340
[S2010] E. Al-Shaer and S. Al-Haj. "FlowChecker: configuration
analysis and verification of federated Openflow
infrastructures" In Proceedings of the 3rd ACM workshop on
Assurable and usable security configuration (SafeConfig
'10). Pp. 37-44
[OSandS] S. Wright, D. Druta, "Open Source and Standards: The Role
of Open Source in the Dialogue between Research and
Standardization", Globecom Workshops (GC Wkshps), 2014,
pp. 650-655, 8-12 Dec. 2014
15. Acknowledgments
The research leading to these results has received funding from the
European Union Seventh Framework Programme FP7/2007-2013 under grant
agreement no. 619609 - the UNIFY project. The views expressed here
are those of the authors only. The European Commission is not liable
for any use that may be made of the information in this document.
We would like to thank in particular the UNIFY WP4 contributors, the
internal reviewers of the UNIFY WP4 deliverables, and Wolfgang John
from Ericsson for the useful discussions and insightful comments.
This document was prepared using 2-Word-v2.0.template.dot.
Authors' Addresses
Catalin Meirosu
Ericsson Research
S-16480 Stockholm, Sweden
Email: catalin.meirosu@ericsson.com
Antonio Manzalini
Telecom Italia
Via Reiss Romoli, 274
10148 - Torino, Italy
Email: antonio.manzalini@telecomitalia.it
Juhoon Kim
Deutsche Telekom AG
Winterfeldtstr. 21
10781 Berlin, Germany
Email: J.Kim@telekom.de
Rebecca Steinert
SICS Swedish ICT AB
Box 1263, SE-16429 Kista, Sweden
Email: rebste@sics.se
Sachin Sharma
Ghent University-iMinds
Research group IBCN - Department of Information Technology
Zuiderpoort Office Park, Blok C0
Gaston Crommenlaan 8 bus 201
B-9050 Gent, Belgium
Email: sachin.sharma@intec.ugent.be
Guido Marchetto
Politecnico di Torino
Corso Duca degli Abruzzi 24
10129 - Torino, Italy
Email: guido.marchetto@polito.it
Ioanna Papafili
Hellenic Telecommunications Organization
Measurements and Wireless Technologies Section
Laboratories and New Technologies Division
2, Spartis & Pelika str., Maroussi,
GR-15122, Attica, Greece
Building E, Office 102
Email: iopapafi@oteresearch.gr
Kostas Pentikousis
EICT GmbH
Torgauer Strasse 12-15
Berlin 10829
Germany
Email: k.pentikousis@eict.de
Steven Wright
AT&T Services Inc.
1057 Lenox Park Blvd NE, STE 4D28
Atlanta, GA 30319
USA
Email: sw3588@att.com