Network Working Group E. Harjula
Internet-Draft M. Ylianttila
Intended status: Informational University of Oulu
Expires: September 9, 2010 March 8, 2010
Load balancing models for DHT-based Peer-to-Peer Networks
draft-harjula-p2psip-loadbalancing-survey-01
Abstract
Load balancing in structured (DHT-based) Peer-to-Peer (P2P) systems
has been thoroughly studied during the past years. This draft
provides a review of the existing and proposed load balancing
mechanisms for structured P2P systems, categorizes them to four
fundamental load balancing models, and introduces their main
characteristics.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 9, 2010.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
Harjula & Ylianttila Expires September 9, 2010 [Page 1]
Internet-Draft DHT Load Balancing Models March 2010
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Fundamental load balancing models . . . . . . . . . . . . . . 4
3.1. Virtual servers . . . . . . . . . . . . . . . . . . . . . 4
3.2. Controlling the object location . . . . . . . . . . . . . 5
3.3. Controlling the node location . . . . . . . . . . . . . . 5
3.4. Address space balancing . . . . . . . . . . . . . . . . . 6
3.5. Other mechanisms . . . . . . . . . . . . . . . . . . . . . 6
4. Comparison of the models . . . . . . . . . . . . . . . . . . . 6
4.1. Virtual servers . . . . . . . . . . . . . . . . . . . . . 6
4.2. Controlling the object location . . . . . . . . . . . . . 7
4.3. Controlling the node location . . . . . . . . . . . . . . 8
4.4. Address space balancing . . . . . . . . . . . . . . . . . 8
4.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 8
5. Future work . . . . . . . . . . . . . . . . . . . . . . . . . 10
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10
8. Security Considerations . . . . . . . . . . . . . . . . . . . 10
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10
9.1. Normative References . . . . . . . . . . . . . . . . . . . 10
9.2. Informative References . . . . . . . . . . . . . . . . . . 10
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 11
Harjula & Ylianttila Expires September 9, 2010 [Page 2]
Internet-Draft DHT Load Balancing Models March 2010
1. Introduction
Load balancing in P2P has special aspects and requirements that
distinguish it from load balancing in distributed server systems. In
most of the dedicated client/server systems it is enough to prevent
the load within a node to grow in such extent that the node would
fail in performing its tasks. However, in P2P systems the
requirements for the fairness of load distribution are higher since
P2P nodes are not usually dedicated to certain tasks. Instead, the
P2P software usually runs on the background while the user of the
device is doing other tasks with the node. This emphasizes the
importance of fair load distribution between the nodes in a P2P
system.
The network management and routing in structured P2P systems are
based on a collaborative effort of the participating nodes instead of
centralized control. Structured P2P networks were developed to
improve routing efficiency and success rate, and thus to decrease the
communication overhead [P2PSURVEY]. In structured P2P networks, the
overlay uses specialized algorithms for generating topologies for the
nodes participating the system. The overlay has specific control for
the routing between the nodes, and the overlay controls the content
placement.
P2P systems usually consist of two distinguishable functions: 1)
Search and discovery of nodes, content, or services (also called as
signaling part), and 2) Transferring content or information between
the nodes in either realtime or non-realtime manner (also called as
data- or media part). The load generated by the signaling part is
called signaling load, and the load generated by the data part is
called data transfer load. In the basic P2P scenario, the content is
moved directly between the nodes, and thus the content transfer
generates load only to the source and target node. In addition, most
of the systems allows the content to be stored outside the node that
is actually sharing it, enabling e.g. content 4 sharing while the
sharing node is offline. In the case of realtime content, the media
cannot be split in similar manner to third party nodes. However,
realtime streams may be routed through one or more third party nodes.
The reason for using third party streaming relays may be e.g. the
restrictions of the access network of source or target node.
This draft provides a survey of the existing and emerging load
balancing technologies in structured P2P networking. The presented
overlay load balancing models aim to balance both the signaling and
data transfer load by affecting the node's responsibilities in the
overlay, such as the address space or the number of objects an
overlay node is responsible for. As the issues concerning realtime
media transfers are outside the scope of P2PSIP WG, the load
Harjula & Ylianttila Expires September 9, 2010 [Page 3]
Internet-Draft DHT Load Balancing Models March 2010
balancing models focusing specifically to streaming between the end
nodes are excluded from this draft. The main purpose of this draft
is to provide a scientific background for defining which of the
available load balancing models are the most suitable for P2PSIP
environment.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
Load: The stress caused to a certain node in the network, concerning
e.g. the node's computational, network, memory, storage, or battery
capacity.
Signaling load: The load caused by the overlay maintenance and
payload signalling.
Data transfer load: The load caused by the overlay data transfers,
including storing, retrieving, relocation and replication of data.
Overload: A computing entity can be interpreted as overloaded if its
functionality, performance, reliability, or usability substantially
suffers from the load.
Load balancing: A technique to distribute workload fairly across two
or more networked nodes, in order to attain optimal resource
utilization, maximize throughput, minimize response time, and avoid
overload of any node.
3. Fundamental load balancing models
This section presents the fundamental overlay load balancing models
that aim to balance the signaling and data transfer load by affecting
the node's responsibilities in the overlay. Note that several
proposed mechanisms may fall under the same fundamental model, as
well as one proposal may contain 4 elements from more than one
fundamental models.
3.1. Virtual servers
During the past years, a great deal of research has been conducted to
improve the load balancing in structured P2P networks. Many of the
proposed load balancing mechanisms are based on the concept of
virtual servers [CHORD], where single physical node may host several
Harjula & Ylianttila Expires September 9, 2010 [Page 4]
Internet-Draft DHT Load Balancing Models March 2010
virtual nodes (virtual servers) that are all participating
independently the same overlay. The basic concept of multiple
virtual servers per physical node balances the load statistically.
Fundamentally, the model also enables resource-aware load balancing,
since the number of virtual servers allocated per node can vary. In
dynamic virtual server load balancing [VIRTUALSERVER_1],
[VIRTUALSERVER_2], the load between the physical nodes is balanced by
dynamically controlling the number of virtual servers per physical
node (i.e. moving them between physical nodes during runtime).
3.2. Controlling the object location
Another fundamental load balancing concept is to directly control the
object location on the overlay. Power of Two Choices [POWEROF2], is
based on the concept of using multiple hash functions per object.
When inserting an object to an overlay, all respective hash values
are computed and the load information of the corresponding peer
candidates is retrieved. Finally, the object is stored on the peer
with the lightest load. Searching the object can be implemented by;
1) sending separate lookups using each respective hash values, or 2)
using redirection pointers between the corresponding peer candidates,
and sending single lookup using some of the hash functions (if the
peer receiving the lookup does not contain the object, it forwards
the request to the containing peer, using the redirection pointer).
3.3. Controlling the node location
Controlling the node location on the overlay can as well be used for
load balancing. Node migration, proposed by Karger&Ruhl
[SIMPLELOADBAL], is a mechanism allowing underloaded nodes to migrate
to portions of the address space occupied by too many data items, to
share the load of the node responsible for the overloaded address
space. In this model, an underloaded node queries the load of a
randomly chosen remote node and makes a pairwise load comparison
between itself and the remote node. If the load of the remote node
is higher, the node relocates to share the address space of the
remote node.
Rieche et al [HEATDISP], presented another load balancing model,
which is fundamentally based on moving objects between the overlay
nodes. In this model, the address space is divided to intervals,
containing f nodes, where f is a fixed minimum number of nodes
assigned to an interval. The nodes within the interval are
collectively responsible for all of the objects stored in the
interval. If the interval gets overloaded,; 1) the overloaded
interval can be split to two intervals if the interval contains more
than 2f nodes, 2) move nodes from other intervals to the overloaded
interval to reach 2f nodes in the interval and split it (as in the
Harjula & Ylianttila Expires September 9, 2010 [Page 5]
Internet-Draft DHT Load Balancing Models March 2010
first case), or 3) move the border between the overloaded interval
and its successor or predecessor to balance the load between them.
3.4. Address space balancing
Yet another fundamental load balancing model is address space
balancing between the overlay nodes. The concept was introduced in
[SIMPLELOADBAL], together with the previously introduced object
location controlling model. The main idea in the address space
balancing is that each node has a fixed set of O(log N) possible
locations, called virtual nodes, on the overlay, calculated using
different hash algorithms (notice the difference in the terms between
virtual node and virtual server). Each node has only one of these
virtual nodes active at any time. The virtual node that spans the
smallest address space between the real node owning it and the
succeeding real node, is selected as the active node. The selection
is made occasionally. With high probability, each node will be
responsible for O(1/N) fraction of the address space.
3.5. Other mechanisms
Several improvements have been developed for most of the presented
fundamental load balancing models, offering e.g. better performance,
lower overhead or locality awareness. However, as they do not
significantly change the fundamental working principles, they do not
form separate load balancing categories.
4. Comparison of the models
Common goal for each of the presented fundamental load balancing
mechanisms is to achieve fair load distribution between the overlay
nodes without generating too much overhead to the overlay. However,
there are significant differences in both the quality of load
distribution and the generated overhead. The following subsections
briefly discuss the benefits and drawbacks of each of the presented
fundamental mechanisms.
4.1. Virtual servers
In its basic form, the virtual server model affects the load balance
statistically. The situation can be compared to throwing a die once
versus throwing it n times and calculating their average result. The
heterogeneity between the nodes can be addressed by allocating more
virtual servers to higher-capacity devices and less virtual servers
to lower-capacity devices. Dynamic load variations can be addressed
by a mechanism that allows moving virtual servers (and the objects
they are responsible for) between the nodes during runtime.
Harjula & Ylianttila Expires September 9, 2010 [Page 6]
Internet-Draft DHT Load Balancing Models March 2010
The general problem with the virtual server model is the signaling
overhead it generates. As a physical node appears as multiple
virtual nodes (virtual servers) in the overlay, the overlay size
grows with O(N) relation to the number of virtual servers per
physical device. The problem is emphasized with structured P2P
networks, due to their dependence on network management signaling,
which grows with the overlay size. With some optimizations, such as
sending only one request message per physical node and keeping shared
node-specific routing tables (instead of virtual server-specific),
the signaling overhead does not grow with the same proportion, but is
still a major problem. The overhead gets even worse with dynamic
load balancing, which requires mechanisms for runtime load monitoring
and moving virtual servers between the nodes.
4.2. Controlling the object location
Controlling the object location, with e.g. the power of two choices
method, is simple but efficient load balancing method. The method
does not generate any additional management overhead, since each
physical device appears as a single node in the overlay, in contrast
to the virtual server model. Instead, the model generates some
additional overhead to payload messaging due to the redundancy in
storing and looking up an object, as pointed out in section 2.
However, as shown by multiple research articles, such as
[MAINTENANCE_CHORD] and [CONTROLMESSAGE_OVERHEAD], the management
signaling dominates the signaling load in structured P2P networks.
Thus, the generated overall overhead is significantly lower than with
virtual server model in most of the P2P networks. With redirection
pointers, the payload signaling overhead can be alleviated with the
cost of increased maintenance signalling, as the pointers need
maintenance. The model also inherently takes into account the node
heterogeneity, as the load is calculated locally at each node.
The power of two choices is a passive load balancing model, i.e. it
does not provide active load transfers between the nodes. Thus, even
though with high probability an overloaded node does not receive any
additional objects during runtime, the objects (causing overload to
the node) cannot be moved to another node during their lifetime. If
the redirection pointers are in use, and thus the responsible node
candidates for a certain object are aware of each other, they can
move objects between each other to provide dynamic load balancing.
When compared to virtual server model, the power of two choices model
generates less overhead but is also slower in reacting to dynamic
load changes. The dynamic load balancing between the candidate nodes
(mentioned in previous paragraph) brings some help, but the mechanism
is quite inefficient, since the load can be balanced only between the
candidate nodes. Therefore the model works better in dynamic high
Harjula & Ylianttila Expires September 9, 2010 [Page 7]
Internet-Draft DHT Load Balancing Models March 2010
churn networks with short object lifetimes, since the model does not
significantly increase the already high maintenance signaling
overhead, and load balancing is more effective as it can take place
more often (as it happens only in object insertions).
4.3. Controlling the node location
Similarly to previous model, controlling the node location aims to
balance the load between nodes by controlling the number of objects
per node. However, instead of controlling object locations, the
locations of nodes themselves are controlled by the load balancing
mechanism. In a passive form, additional overhead is generated only
during the joining phase and when the nodes are looked up. Object
lookups are not affected by the model. Active node location
management generates also load transfer overhead, as the objects are
moved between the moved node and the node taking over the objects the
moving node was storing in the original location and, similarly,
between the moving node and the node that was holding the objects in
the new location. In addition, the distribution of load information
between the devices also generates overhead to the overlay. This
model takes also into account the node heterogeneity, as the load is
calculated locally.
4.4. Address space balancing
In practice, this load balancing model is somewhat similar to the
previously presented node location management mechanism, but the
general idea of how to achieve the load balance is different.
Address space balancing aims to place the nodes to the overlay so
that the address space controlled by each node is as equal as
possible. This model generates both management and load transfer
overhead, as the optimal node location is determined every time the
address space of a node is changed due to neighbour node departures
and new node arrivals, and changed when needed. A major drawback of
this model is that it only aims to balance the address-space between
the nodes, and does not take into account the actual load of the
node.
4.5. Summary
The summary of the main characteristics of the fundamental load
balancing models is presented in the table below.
+---------------+------------+----------------+---------------------+
| LB model | Achievable | Responsiveness | Cost |
| | LB quality | to rapid load | |
| | | changes | |
+---------------+------------+----------------+---------------------+
Harjula & Ylianttila Expires September 9, 2010 [Page 8]
Internet-Draft DHT Load Balancing Models March 2010
| Virtual | High | High if | -High maintenance |
| Server | | dynamic VN | overhead (also high |
| | | reallocation, | transfer cost if |
| | | otherwise no | dynamic VN |
| | | responsiveness | reallocation) |
| | | at all | |
+---------------+------------+----------------+---------------------+
| Controlling | High | Low (moderate | -No additional |
| Object | | if dynamic | maintenance |
| Location | | object | overhead |
| | | relocation | -Increased object |
| | | | lookup overhead |
| | | | (if redirection |
| | | | pointers in use, |
| | | | no additional |
| | | | lookup overhead, |
| | | | but increased |
| | | | maintenance |
| | | | overhead) |
| | | | -Multiple hash |
| | | | generation in |
| | | | object insertion & |
| | | | lookup (also |
| | | | transfer cost if |
| | | | dynamic object |
| | | | relocations) |
+---------------+------------+----------------+---------------------+
| Controlling | High | High if heavy | -Increased |
| Node Location | | node probes | maintenance |
| | | (otherwise | overhead |
| | | low) | -Incresed node |
| | | | lookup overhead |
| | | | -Transfer cost in |
| | | | node relocations |
+---------------+------------+----------------+---------------------+
| Address-space | Moderate | Not at all | -Increased |
| Balancing | | | maintenance |
| | | | overhead |
| | | | -Incresed node |
| | | | lookup overhead |
| | | | -Transfer cost in |
| | | | virtual node |
| | | | relocations |
+---------------+------------+----------------+---------------------+
Figure 1: Load balancing model characteristics
Harjula & Ylianttila Expires September 9, 2010 [Page 9]
Internet-Draft DHT Load Balancing Models March 2010
5. Future work
The future work includes more detailed analysis of the fundamental
load balancing models, as well as adding more references to existing
load balancing mechanisms, and mapping them with the fundamental
models.
6. Acknowledgements
No acknowledgements at this time.
7. IANA Considerations
This draft includes no request to IANA at this time.
8. Security Considerations
No security considerations at this time.
9. References
9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
9.2. Informative References
[CHORD] Stoica, I., Morris, R., Liben-Nowell, D., Karger, D.,
Kaashoek, M., Dabek, F., and H. Balakrishnan, "Chord: a
scalable peer-to-peer lookup protocol for Internet
applications, IEEE/ACM Transactions on Networking, Volume
11, Issue 1, Feb. 2003, Page(s) 17-32.".
[CONTROLMESSAGE_OVERHEAD]
Hong, S., Hilt, V., and H. Schulzrinne, "Evaluation of
Control Message Overhead of a DHT-Based P2P System, Bell
Labs Technical Journal, Volume 13, Issue 3, 2008.".
[HEATDISP]
Rieche, S., Petra, L., and K. Wehrle, "A Thermal-
Dissipation-based Approach for Balancing Data Load in
Distributed Hash Tables, IEEE Conference on Local Computer
Networks (LCN 2004), Tampa, USA, 2004.".
Harjula & Ylianttila Expires September 9, 2010 [Page 10]
Internet-Draft DHT Load Balancing Models March 2010
[MAINTENANCE_CHORD]
Maenpaa, J. and G. Camarillo, "Study on Maintenance
Operations in a Chord-based Peer-to-Peer Session
Initiation Protocol Overlay Network, IEEE International
Symposium on Parallel&Distributed Processing (IPDPS 2009),
Rome, Italy, 2009.".
[P2PSURVEY]
Androutsellis-Theotokis, S. and D. Spinellis, "A survey of
peer-to-peer content distribution technologies, ACM
Computing Surveys, 36(4):335-371, December 2004.".
[POWEROF2]
Byers, J., Considine, J., and M. Mitzenmacher, "Simple
Load Balancing for DHTs, 2nd International Workshop on
Peer-to-Peer Systems (IPTPS 03), Berkeley, USA, 2003,
IEEE.".
[SIMPLELOADBAL]
Karger, D. and M. Ruhl, "Simple Efficient Load Balancing
Algorithms for Peer-to-Peer Systems, 4th International
Workshop on Peer-to-Peer Systems (IPTPS 04), San Diego,
USA, 2004.".
[VIRTUALSERVER_1]
Rao, A., Laksminarayanan, K., Surana, S., Karp, R., and I.
Stoica, "Load Balancing in Structured P2P Systems,
International Workshop on Peer-to-Peer Systems (IPTPS),
pages 68-79.".
[VIRTUALSERVER_2]
Dabek, F., Kaashoek, M., Karger, D., Morris, R., and I.
Stoica, "Wide-area Cooperative Storage with CFS, 18th ACM
Symposium on Operating System Principles (SOSP), Chateau
Lake Louise, Banff, Alberta, Canada. pp. 202-215.".
Authors' Addresses
Erkki Harjula
University of Oulu
Erkki Koiso-Kanttilan katu 3
University of Oulu, 90014
Finland
Phone: +358 8 553 2522
Email: erkki.harjula@ee.oulu.fi
Harjula & Ylianttila Expires September 9, 2010 [Page 11]
Internet-Draft DHT Load Balancing Models March 2010
Mika Ylianttila
University of Oulu
Erkki Koiso-Kanttilan katu 3
University of Oulu, 90014
Finland
Phone: +358 8 553 25311
Email: mika.ylianttila@ee.oulu.fi
Harjula & Ylianttila Expires September 9, 2010 [Page 12]