Problem Statement for Reliable Virtualized Network Function (VNF)
draft-zong-vnfpool-problem-statement-01
| Document | Type | Active Internet-Draft (individual) | |
|---|---|---|---|
| Authors | Ning Zong, Linda Dunbar, Melinda Shore | ||
| Last updated | 2013-09-05 | ||
Network Working Group                                            N. Zong
Internet-Draft                                                 L. Dunbar
Intended status: Informational                       Huawei Technologies
Expires: March 10, 2014                                         M. Shore
                                                    No Mountain Software
                                                      September 06, 2013
Problem Statement for Reliable Virtualized Network Function (VNF)
draft-zong-vnfpool-problem-statement-01
Abstract
Virtualization technology has been widely supported by both Network
Operators and Data Center providers to provide services with reduced
operational and capital costs, automated deployment, and enhanced
elasticity. A challenge is how to make virtualized network functions
reliable and highly available enough to support reliable services.
This document focuses on the problems related to the reliability and
high availability of virtualized network functions. A discussion of
reliable virtualized network function pools is presented to scope the
solution space. Related work, together with its potential for reuse
and extension, is also introduced.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 10, 2014.
Zong, et al. Expires March 10, 2014 [Page 1]
Internet-Draft Reliable VNF September 2013
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Background
2. Terminology
3. Problems
  3.1. VNF Instance Selection and Status Monitoring
  3.2. Backup Selection and Announcement
  3.3. Service State Synchronization
  3.4. Transition Handling
  3.5. Policy Enforcement
4. Working Scope
  4.1. Reference Architecture
  4.2. Proposed Working Scope
5. Related Works
  5.1. Reliable Server Pool
  5.2. Virtual Router Redundancy Protocol
  5.3. VNF Forwarding Graph
6. Security Considerations
7. IANA Considerations
8. Acknowledgements
9. References
  9.1. Normative References
  9.2. Informative References
10. References
Authors' Addresses
1. Background
Network functions such as firewall (FW), Deep Packet Inspection
(DPI), Load Balancer (LB), and WAN Optimization are conventionally
deployed as sets of dedicated devices in both Network Operators'
networks and Data Center (DC) networks, as the building blocks of
services. In recent years, virtualization technology, ranging from
server virtualization and network virtualization to network function
virtualization, has been gaining wider industry adoption by both
Network Operators and DC providers, to achieve elastic service
offerings and reduced operational cost [NFV-WP]. The European
Telecommunications Standards Institute (ETSI) has launched an
Industry Specification Group (ISG) to study the use cases,
requirements and architecture of Network Function Virtualization
(NFV) from the Network Operators' perspective.
Building upon general purpose servers, instances of Virtualized
Network Function (VNF) can be placed into various locations including
DC networks, Network Operator networks and even customer premises.
Furthermore, there are potentially more factors that cause VNF
instance transition (e.g. scaling, migration) or even failure, such
as resource contention among instances, hardware status change, and
hardware/software failure at various levels. Therefore, a major
challenge is how to achieve the reliability and high availability
capabilities of the VNF under highly distributed and dynamic
conditions of the VNF instances. For example, the Reliability and
Availability Working Group (RELAV WG) of ETSI ISG NFV aims to
identify the resiliency problems and requirements of the services
provided by Network Operators [NFV-REL]. An overview of VNF use
cases focusing on reliability and high availability issues can be
found in [VNFP-UC].
In this document, we first give an overview of the problems related
to the reliability and high availability of VNFs. We then present an
applicable architecture of reliable VNF pools to scope potential
solutions. Finally, we refer to some related work for potential
reuse and extension of existing approaches.
2. Terminology
Reliability and High Availability: the capability of a functional
entity to consistently provide its function under dynamic and even
unexpected conditions such as faults and overload.
Virtualized Network Function (VNF): a VNF provides the same
functional behavior and interfaces as the equivalent network
function, but is deployed as software instances built on top of a
virtualization platform [NFV-TERM].
VNF Pool: a group of VNF instances providing the same network
function.
Pool Element (PE): a VNF instance inside a VNF pool.
Pool User (PU): an entity that requests a network function provided
by a VNF pool.
Pool Manager (PM): an entity that manages pool elements and
interacts with pool users to provide the network function.
3. Problems
Many network services require multiple network functions to be
performed sequentially on data packets. A traditional model for a
multi-tier service is shown below: for each network function, all
instances connect to a corresponding entrance point (e.g. an LB)
that is responsible for sending/receiving data packets to/from the
selected instance(s) and for steering the data packets between
different network functions.
                Service (e.g. VOIP, Web)
+--------------+  +--------------+       +--------------+
|  function#1  |  |  function#2  |       |  function#n  |
| +----------+ |  | +----------+ |       | +----------+ |
| | Instance | |  | | Instance | |... ...| | Instance | |
| +----------+ |  | +----------+ |       | +----------+ |
|      |data   |  |      |data   |       |      |data   |
|      |conn   |  |      |conn   |       |      |conn   |
| +----------+ |  | +----------+ |       | +----------+ |
| | Entrance | |  | | Entrance | |       | | Entrance | |
| |  Point   | |  | |  Point   | |       | |  Point   | |
| +----------+ |  | +----------+ |       | +----------+ |
+-----+--------+  +-------+------+       +-------+------+
      |data conn          |data conn             |
      +-------------------+----------------------+
              Figure 1: Multi-tier Service.
Such a model works well when all instances of the same network function
are topologically close to each other. However, VNF instances are
highly distributed in DC networks, Network Operator networks and even
customer premises. When VNF instances are topologically far from
each other, there could be many network links/nodes between an
instance and the corresponding entrance point for steering the data
packets. For two VNF instances of different network functions, it is
possible that they are on the same physical server, but the entrance
points are many links/nodes away. To improve network efficiency, it
is desirable to establish direct data connections between VNF
instances, as shown below.
                 Service (e.g. VOIP, Web)
+----------+           +----------+           +----------+
|  VNF#1   | data conn |  VNF#2   | data conn |  VNF#n   |
| Instance |-----------| Instance |- ... ... -| Instance |
+----------+           +----------+           +----------+
                            ^
                            | Virtualization
+--------------------------------------------------------+
|                Virtualization Platform                 |
+--------------------------------------------------------+
        Figure 2: VNF Instances Direct Connection.
Many of today's dedicated network devices have built-in failure
protection and recovery mechanisms for reliability and high
availability. However, VNF instances are software instances running
on general purpose servers via a virtualization platform. There are
potentially more factors that cause VNF instance transition or even
failure, such as:
1) hardware failure;
2) hardware status change such as server over-utilization or network
congestion;
3) software failure at various levels, including the hypervisor, the
Virtual Machine (VM), and the VNF instance;
4) performance degradation due to resource contention between VNF
instances;
5) instance migration due to server consolidation, or configuration
change due to policy.
Generally, VNF instance transition refers to the actions taken to
address varying conditions of hardware, software or configuration.
A transition could be scaling an instance in/out or up/down. It
could also be replacing an instance in the same location, or moving
an instance to another location.
Therefore, the major challenge is how to achieve reliability and high
availability of the VNF during VNF instance transition or failure in
the model where VNF instances are directly connected. The potential
issues to be addressed are described in the following subsections.
3.1. VNF Instance Selection and Status Monitoring
One basic goal of reliable VNF is to select a suitable VNF instance
from a group of candidates and to replace that instance if it fails.
The issues are:
1) Who is responsible for selecting the VNF instance, and how is the
selection made?
2) Who is responsible for monitoring the status of the instance, and
how is it monitored?
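As an illustration only, the two issues above can be sketched in a few lines of Python. The sketch assumes a heartbeat-based liveness check and a "least loaded" selection rule; neither is specified by this document, and the names PoolElement, VnfPool, report and select are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PoolElement:
    """Hypothetical record kept for one VNF instance."""
    address: str
    load: float = 0.0            # assumed metric in [0, 1]; lower is better
    last_heartbeat: float = 0.0  # time of the last status report


@dataclass
class VnfPool:
    """Hypothetical pool of instances providing the same network function."""
    function: str
    timeout: float = 3.0         # assumed liveness window, in seconds
    elements: dict = field(default_factory=dict)

    def report(self, address, load, now):
        """Status monitoring: record a heartbeat from an instance."""
        self.elements[address] = PoolElement(address, load, now)

    def alive(self, now):
        """Instances whose last heartbeat is within the liveness window."""
        return [e for e in self.elements.values()
                if now - e.last_heartbeat <= self.timeout]

    def select(self, now):
        """Instance selection: least-loaded live instance, or None."""
        return min(self.alive(now), key=lambda e: e.load, default=None)


pool = VnfPool("FW")
pool.report("10.0.0.1", load=0.2, now=0.0)
pool.report("10.0.0.2", load=0.5, now=0.0)
first = pool.select(now=1.0)    # both instances are live

pool.report("10.0.0.2", load=0.5, now=4.0)
second = pool.select(now=5.0)   # 10.0.0.1 has missed its window
```

Which entity runs such logic (a pool manager, a designated instance, or the pool user) is exactly the open question raised in this subsection.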
3.2. Backup Selection and Announcement
Before a VNF instance fails, one or more backup instances of the same
network function have to be selected and announced to the directly
connected instances in the adjacent VNFs. The issues are:
1) Who is responsible for selecting and announcing the backup
instances, and how is this done?
2) How to deal with the transition or failure of a backup instance?
3.3. Service State Synchronization
For a stateful network function, the service state of the VNF
instance should also be synchronized between the VNF instance and its
backup instances. The issues are:
1) Who is responsible for collecting and keeping the service state of
the instance, and how is this done?
2) How to synchronize service state with the backup instances?
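One possible shape of such synchronization is sketched below, purely as an illustration. It assumes that the serving instance pushes each state update to its backups with a version number, and that a backup which detects a gap falls back to copying a full snapshot. The class StatefulInstance and its methods are hypothetical and are not taken from any existing protocol.

```python
class StatefulInstance:
    """Hypothetical stateful VNF instance (e.g. a NAT with a flow table)."""

    def __init__(self, name):
        self.name = name
        self.state = {}     # service state, e.g. flow -> translation
        self.version = 0    # number of updates applied so far
        self.backups = []   # backup instances to keep in sync

    def update(self, key, value):
        """Apply an update locally, then push it to every backup."""
        self.version += 1
        self.state[key] = value
        for backup in self.backups:
            backup.replicate(self.version, key, value, self)

    def replicate(self, version, key, value, source):
        """Apply one pushed update; copy a full snapshot if one was missed."""
        if version == self.version + 1:
            self.state[key] = value
            self.version = version
        elif version > self.version + 1:
            self.state = dict(source.state)
            self.version = source.version
        # Otherwise the update is a duplicate and is ignored.


primary = StatefulInstance("nat-a")
backup = StatefulInstance("nat-b")
primary.backups.append(backup)
primary.update("flow-1", "198.51.100.7:4001")

late = StatefulInstance("nat-c")    # joins after the first update
primary.backups.append(late)
primary.update("flow-2", "198.51.100.7:4002")
```

In this sketch the late-joining backup never saw the first update, so the second push triggers a snapshot copy and it ends up with the same state as the primary.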
3.4. Transition Handling
It is important to maintain service reliability during VNF instance
transition. The issues are:
1) Who is responsible for notifying the directly connected instances
in the adjacent VNFs of a VNF instance transition, and how is the
notification delivered?
2) How to re-establish the network connection and session between a
new VNF instance and the directly connected instances with an
acceptable level of service continuity?
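The following sketch illustrates how a directly connected instance might react to backup announcements (Section 3.2) and transition notifications. It is a simplification under stated assumptions: the Neighbor class and its methods are hypothetical, re-establishing connections and sessions is not modeled, and the document does not say who sends these messages.

```python
class Neighbor:
    """Hypothetical directly connected instance in an adjacent VNF."""

    def __init__(self, name):
        self.name = name
        self.next_hop = None   # instance currently used for this VNF
        self.backups = []      # announced backups, in preference order

    def announce(self, primary, backups):
        """Backup announcement: learn the serving instance and its backups."""
        self.next_hop = primary
        self.backups = list(backups)

    def on_transition(self, old, new=None):
        """Transition notification: switch to `new`, else to a known backup."""
        if self.next_hop != old:
            return self.next_hop          # stale notification, nothing to do
        if new is None:
            if not self.backups:
                raise LookupError("no backup known for " + old)
            new = self.backups.pop(0)
        elif new in self.backups:
            self.backups.remove(new)
        self.next_hop = new
        return new


dpi = Neighbor("dpi-1")
dpi.announce("fw-1", ["fw-2", "fw-3"])
after_failure = dpi.on_transition("fw-1")             # falls back to fw-2
after_migration = dpi.on_transition("fw-2", "fw-9")   # told the new location
```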
3.5. Policy Enforcement
There could be policies that reflect the different reliability
classes of services and hence affect the selection of VNF instances.
Examples include isolation policies requiring that VNF instances be
placed on separate physical servers or in separate DC sites. Another
example is a policy to place some VNF instances in topologically
close locations. The issues are:
1) Who is responsible for receiving and enforcing the policy?
2) Who is responsible for collecting enough information (e.g. network
topology, view of instance distribution) to enforce the policy, and
how is it collected?
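As a hedged illustration of the isolation example above, the sketch below filters backup candidates so that they do not share a physical server, or a DC site, with the serving instance. The Placement record and the isolated_candidates function are hypothetical; a real enforcer would need the topology information mentioned in issue 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical view of where one VNF instance runs."""
    instance: str
    server: str
    site: str


def isolated_candidates(primary, candidates, level="server"):
    """Keep only candidates isolated from `primary` at the given level.

    level="server": different physical server.
    level="site":   different DC site.
    """
    if level not in ("server", "site"):
        raise ValueError("unknown isolation level: " + level)
    return [c for c in candidates
            if getattr(c, level) != getattr(primary, level)]


primary = Placement("fw-1", server="srv-a", site="dc-east")
candidates = [
    Placement("fw-2", server="srv-a", site="dc-east"),  # same server
    Placement("fw-3", server="srv-b", site="dc-east"),  # same site only
    Placement("fw-4", server="srv-c", site="dc-west"),  # fully separate
]
by_server = [c.instance for c in isolated_candidates(primary, candidates)]
by_site = [c.instance
           for c in isolated_candidates(primary, candidates, "site")]
```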
4. Working Scope
Reliability and high availability aspects of VNF fall within a
broader problem space than has been identified in this draft. The
current focus of our work is to develop tools to improve VNF
reliability and availability, which will be applicable to a wide
range of services.
4.1. Reference Architecture
There are a number of existing technologies for providing reliable
and highly available functions, such as Reliable Server Pool
(RSerPool) [RFC5351] and Virtual Router Redundancy Protocol (VRRP)
[RFC5798]. Although these technologies apply to different scenarios
and use different protocols, the underlying idea is similar. Both
technologies present the service with an abstract object (e.g. the
pool handle in RSerPool, the Virtual Router ID in VRRP) that
represents a group of functional instances. The dynamic mapping of
the abstract object to the actual serving instance, i.e. the
selection of the serving instance, is managed internally by the group
and covers the failover procedure. The advantage of this model is
that it provides the service with a reliable and highly available
function in a manner that is transparent to both end-hosts and other
service sub-systems.
Based on the above model, we could derive a reference architecture
for reliable and highly available VNF called reliable VNF pools,
which is illustrated below. Note that we reuse the term "pool" to
represent a group of VNF instances without loss of generality.
                        +-----------------+
                        |    Pool User    |
                        +-----------------+
                            ^         ^
                            |   (2)   |
                +-----------+         +-----------+
                |                                 |
                v                                 v
         +--------------+      (4)       +--------------+
         | Pool Manager |<-------------->| Pool Manager |
         +--------------+                +--------------+
                ^                                 ^
                | (1)                             |
                v                                 v
+------------------------------+   +------------------------------+
|+----------+     +----------+ |   | +----------+     +----------+|
||  VNF#1   |     |  VNF#1   | |(3)| |  VNF#2   |     |  VNF#2   ||
|| Instance | ... | Instance |<+---+>| Instance | ... | Instance ||
|+----------+     +----------+ |   | +----------+     +----------+|
|          VNF#1 Pool          |   |          VNF#2 Pool          |
+------------------------------+   +------------------------------+
                   Figure 3: Reliable VNF Pools.
In the architecture of reliable VNF pools, there are multiple VNF
pools, each containing a group of VNF instances providing the same
network function. Each VNF pool has a Pool Manager (PM) that manages
an abstract object of the VNF pool, including the VNF identity (e.g.
"FW", "DPI") and the associated addresses of the VNF instances. A
Pool User (PU) could be either an application end-host or a service
sub-system (e.g. an orchestrator in a DC service) requesting a
network function from the PM. The PM could also be a designated VNF
instance in the VNF pool, in the case where the VNF instances
self-organize to select a designated instance. In that case, the PM
itself provides the network function to the PU as well.
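To make the role of the abstract object concrete, the following minimal sketch shows a PM that maps a VNF identity to the addresses of its instances and answers PU requests. It is an assumption-laden illustration: the PoolManager class, its method names and the round-robin choice are hypothetical and are not part of the reference architecture.

```python
class PoolManager:
    """Hypothetical PM holding the abstract object of one VNF pool."""

    def __init__(self, vnf_identity):
        self.vnf_identity = vnf_identity   # e.g. "FW" or "DPI"
        self._addresses = []               # addresses of the VNF instances
        self._next = 0                     # round-robin cursor

    def register(self, address):
        """Add a VNF instance to the pool (no effect if already present)."""
        if address not in self._addresses:
            self._addresses.append(address)

    def deregister(self, address):
        """Remove a failed or retired VNF instance from the pool."""
        if address in self._addresses:
            self._addresses.remove(address)

    def resolve(self):
        """Map the VNF identity to one serving instance (round robin)."""
        if not self._addresses:
            raise LookupError("no instance for " + self.vnf_identity)
        address = self._addresses[self._next % len(self._addresses)]
        self._next += 1
        return address


pm = PoolManager("FW")
pm.register("10.0.0.1")
pm.register("10.0.0.2")
first = pm.resolve()
second = pm.resolve()
pm.deregister("10.0.0.1")   # instance failure, hidden from the PU
third = pm.resolve()
```

The PU only ever asks for "FW"; which address it receives, and how failures change the answer, stays internal to the pool.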
4.2. Proposed Working Scope
Based on the reference architecture of reliable VNF pools, we believe
the following design goals or working scope need to be considered as
aspects of providing reliable and highly available VNF.
1) The communication between VNF instances and the responsible PM
to transmit messages such as VNF Instance Selection, Status
Monitoring, Service State Synchronization;
2) The communication between PU and PM to address issues like
Policy Enforcement, VNF Instance query/response;
3) The communication between VNF instances in different VNF pools
to transmit messages such as Backup Announcement, Service State
Synchronization, Transition Handling;
4) The communication between different PMs to achieve fault-
tolerance of PMs themselves including pool information redundancy.
The main purpose of this section is to scope the solution space.
Proposed solutions will be addressed in a separate draft.
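For quick reference, the four communication paths listed in this section, numbered as in Figure 3, can be summarized as a lookup table. The Python form below is only a convenience sketch; the names INTERFACES and interfaces_for are hypothetical, while the endpoints and topics are copied from the list above.

```python
# Interface number -> (endpoint A, endpoint B, topics carried).
INTERFACES = {
    1: ("VNF instance", "PM",
        ["VNF Instance Selection", "Status Monitoring",
         "Service State Synchronization"]),
    2: ("PU", "PM",
        ["Policy Enforcement", "VNF Instance query/response"]),
    3: ("VNF instance", "VNF instance in a different pool",
        ["Backup Announcement", "Service State Synchronization",
         "Transition Handling"]),
    4: ("PM", "PM",
        ["Pool information redundancy"]),
}


def interfaces_for(topic):
    """Return the interface numbers on which a topic appears."""
    return sorted(number
                  for number, (_, _, topics) in INTERFACES.items()
                  if topic in topics)


sync_paths = interfaces_for("Service State Synchronization")
policy_paths = interfaces_for("Policy Enforcement")
```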
5. Related Works
In this section, we refer to some related work for potential reuse
and extension.
5.1. Reliable Server Pool
Reliable Server Pool (RSerPool) supports high availability and the
scalability of applications through the use of pools of servers
[RFC5351]. The main functions of RSerPool involve server pool
management, as well as receiving requests from a client to bind to a
desired server. The main protocols developed by RSerPool are called
Aggregate Server Access Protocol (ASAP) [RFC5352] and Endpoint
Handlespace Redundancy Protocol (ENRP) [RFC5353]. The architecture
of RSerPool is shown below.
     +--------------+
     |  Pool User   |
     +--------------+
            ^
            | ASAP
            V
     +--------------+   ENRP   +--------------+
     | ENRP Server  |<-------->| ENRP Server  |
     +--------------+          +--------------+
            ^
            | ASAP
            V
+--------------------------------------------------+
| +----------+   +----------+         +----------+ |
| |    PE    |   |    PE    | ... ... |    PE    | |
| +----------+   +----------+         +----------+ |
|                   Server Pool                    |
+--------------------------------------------------+
          Figure 4: Reliable Server Pool.
The similarities between RSerPool and reliable VNF pools, and the
applicability of RSerPool to them, include:
1) Pool Elements (PEs) can be regarded as VNF instances, and an
ENRP Server can be regarded as a PM;
2) ASAP could be applicable to reliable VNF pools for VNF instance
management, policy enforcement, backup announcement, service state
synchronization, transition handling, and so on;
3) ENRP could be applicable to reliable VNF pools for fault-
tolerant PM.
Potential extensions to meet the full objective of reliable VNF pools
include:
1) Extend ASAP to support more policy enforcement such as failure
isolation;
2) Extend ASAP to support more efficient instance transition.
5.2. Virtual Router Redundancy Protocol
Virtual Router Redundancy Protocol (VRRP) specifies an election
protocol that dynamically assigns responsibility for a virtual router
to one of the VRRP routers (i.e. Master) on a LAN [RFC5798]. The
election process provides dynamic failover in the forwarding
responsibility should the Master become unavailable. The advantage
of VRRP is a higher availability default path without requiring
configuration of dynamic routing or router discovery protocols on
every end-host. An example is shown below.
        +---------------+       +---------------+
        | VRRP Router#1 |       | VRRP Router#2 |
        |(Master, IP A) |       |(Backup, IP B) |
        |               |       |               |
VRID=1  +---------------+       +---------------+
                |                       |
       ---------+-----------------------+---------
                            ^
                            |
                         (IP A)
                            |
                         +--+--+
                         | Host|
                         +-----+
        Figure 5: Virtual Router (VRID=1)
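The outcome of the election in Figure 5 can be approximated by the sketch below. It is not the VRRP state machine: real VRRP routers decide independently, based on advertisements and timers, whereas this sketch collapses the result into one function. It does follow the [RFC5798] rule that the higher priority wins and that a tie is broken by the higher primary IP address. The VrrpRouter record and elect_master are hypothetical names, and all routers are assumed to use the same address family.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class VrrpRouter:
    """Hypothetical summary of one VRRP router for a single VRID."""
    name: str
    address: str      # primary IP address on the LAN
    priority: int     # 255 is used by the owner of the virtual address
    alive: bool = True


def elect_master(routers):
    """Pick the live router with the highest (priority, address) pair."""
    live = [r for r in routers if r.alive]
    return max(
        live,
        key=lambda r: (r.priority, ipaddress.ip_address(r.address)),
        default=None,
    )


r1 = VrrpRouter("Router#1", "192.0.2.1", priority=255)  # owns IP A
r2 = VrrpRouter("Router#2", "192.0.2.2", priority=100)
master_before = elect_master([r1, r2]).name

r1.alive = False                 # the Master becomes unavailable
master_after = elect_master([r1, r2]).name
```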
The similarities between VRRP and reliable VNF pools, and the
applicability of VRRP to them, include:
1) VRRP Routers can be regarded as VNF instances;
2) The Master advertisement and the transition procedure between
Master and Backup can be part of the function of a PM acting as the
designated VNF instance in the VNF pool.
The gaps between VRRP and reliable VNF pools include:
1) In VRRP, the loss of the Master is infrequent, while in reliable
VNF pools the more frequent transition of VNF instances means that
failover efficiency is a more pressing concern;
2) There is no policy enforcement related to reliability in VRRP.
5.3. VNF Forwarding Graph
VNF forwarding graph (a.k.a. service chain in a wider sense) defines
the sequence of VNF instances that a user session must traverse [NFV-
UC]. An example of a VNF forwarding graph is a topology in which
user packets traverse a sequence of VNF instances of Intrusion
Detection Service (IDS), FW, Network Address Translation (NAT) and
LB. Different services have different VNF forwarding graphs based on
specific user needs and therefore service logic.
The VNF forwarding graph and reliable VNF pools are independent of,
but complementary to, each other in the following way:
1) The VNF forwarding graph determines the sequential relation
between VNF instances, while reliable VNF pools maintain the
reliability and high availability of the VNF.
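This division of labor can be illustrated with a short sketch, assuming the forwarding graph is a simple ordered list of VNF identities and each pool is a list of live instance names. Both representations, and the function resolve_chain, are hypothetical simplifications.

```python
def resolve_chain(forwarding_graph, pools):
    """Bind each VNF in the graph to a live instance from its pool.

    forwarding_graph: ordered list of VNF identities.
    pools: dict mapping a VNF identity to its list of live instances.
    """
    path = []
    for vnf in forwarding_graph:
        instances = pools.get(vnf, [])
        if not instances:
            raise LookupError("no live instance for " + vnf)
        path.append((vnf, instances[0]))
    return path


graph = ["IDS", "FW", "NAT", "LB"]      # order fixed by the service logic
pools = {
    "IDS": ["ids-1"],
    "FW": ["fw-1", "fw-2"],
    "NAT": ["nat-1"],
    "LB": ["lb-1", "lb-2"],
}
before = resolve_chain(graph, pools)

pools["FW"].remove("fw-1")              # fw-1 fails; the graph is unchanged
after = resolve_chain(graph, pools)
```

The sequence of functions stays the same across the failure; only the instance bound to "FW" changes, which is the part that reliable VNF pools would handle.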
6. Security Considerations
Any technology which allows the insertion, deletion, reordering, or
manipulation of network functions has the potential to be subverted
by an attacker, with serious consequences. Distributed VNFs
introduce an additional attack vector, in which bad actors join
several VNFs of a service. Replay attacks have the potential to
create denials of service, reordering, adding, or removing VNFs. VNF
reliability technologies must provide cryptographic protections
against spoofing and insertion attacks as well as replay attacks, in
the form of client authentication, origin authentication on VNF
reliability management (control plane) traffic, and replay
protections. There may be circumstances under which an attacker
masquerading as a VNF manager can introduce data leakage or similar
attacks, and consequently server authentication would be required, as
well.
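As one hedged example of origin authentication and replay protection on control plane traffic, the sketch below tags each message with an HMAC-SHA256 over the sender, a sequence number and the payload, and rejects any sequence number that is not strictly increasing. It assumes a pre-shared key and sender names that do not contain "|"; key distribution, encryption and the message format itself are out of scope, and the names sign and Receiver are hypothetical.

```python
import hashlib
import hmac


def sign(key, sender, seq, payload):
    """Compute an HMAC-SHA256 tag over sender, sequence number and payload."""
    message = f"{sender}|{seq}|".encode() + payload
    return hmac.new(key, message, hashlib.sha256).digest()


class Receiver:
    """Hypothetical receiver of VNF reliability management messages."""

    def __init__(self, key):
        self.key = key
        self.last_seq = {}   # sender -> highest sequence number accepted

    def accept(self, sender, seq, payload, tag):
        """Accept only authentic messages with a fresh sequence number."""
        expected = sign(self.key, sender, seq, payload)
        if not hmac.compare_digest(expected, tag):
            return False     # spoofed or modified message
        if seq <= self.last_seq.get(sender, -1):
            return False     # replayed or out-of-date message
        self.last_seq[sender] = seq
        return True


key = b"pre-shared key for this sketch only"
receiver = Receiver(key)
payload = b"backup=fw-2"
tag = sign(key, "pm-1", 1, payload)

fresh = receiver.accept("pm-1", 1, payload, tag)
replayed = receiver.accept("pm-1", 1, payload, tag)
forged = receiver.accept("pm-1", 2, payload, tag)   # tag covers seq 1 only
```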
7. IANA Considerations
This document has no actions for IANA.
8. Acknowledgements
The authors would like to thank Daniel King from Lancaster
University, UK for his valuable comments on this draft.
9. References
9.1. Normative References
TBD.
9.2. Informative References
[NFV-WP] NFV Whitepaper: "Network Function Virtualization", issue 1,
2012, http://portal.etsi.org/NFV/NFV_White_Paper.pdf.
[NFV-REL] ETSI GS NFV REL 001: "Network Function Virtualization;
Resiliency Requirements", Version 0.0.1, 2013.
[VNFP-UC] L. Xia, Q. Wu and D. King, "Use cases and Requirements for
Virtual Service Node Pool Management", draft-xia-vsnpool-management-
use-case-01, August 2013.
[NFV-TERM] ETSI GS NFV 003: "Terminology for Main Conceptional
Entities in NFV", Version 0.0.4, 2013.
[RFC5351] P. Lei, L. Ong, M. Tuexen and T. Dreibholz, "An Overview of
Reliable Server Pooling Protocols", RFC5351, September 2008.
[RFC5352] R. Stewart, Q. Xie, M. Stillman and M. Tuexen, "Aggregate
Server Access Protocol (ASAP)", RFC5352, September 2008.
[RFC5353] Q. Xie, R. Stewart, M. Stillman, M. Tuexen and A.
Silverton, "Endpoint Handlespace Redundancy Protocol (ENRP)",
RFC5353, September 2008.
[RFC5798] S. Nadas, "Virtual Router Redundancy Protocol (VRRP) Version
3 for IPv4 and IPv6", RFC5798, March 2010.
[NFV-UC] ETSI GS NFV 001: "Network Function Virtualization; Use
Cases", Version 0.0.2, 2013.
10. References
Authors' Addresses
Ning Zong
Huawei Technologies
Email: zongning@huawei.com
Linda Dunbar
Huawei Technologies
Email: linda.dunbar@huawei.com
Melinda Shore
No Mountain Software
Email: melinda.shore@nomountain.net