Network Working Group R. Alimi
Internet-Draft Yale University
Intended status: Informational Z. Lu
Expires: April 29, 2010 Fudan University
H. Song
Huawei
Y. Yang
Yale University
October 26, 2009
A Survey of In-network Storage Systems
draft-song-decade-survey-01
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 29, 2010.
Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document.
Alimi, et al. Expires April 29, 2010 [Page 1]
Internet-Draft DECADE Survey October 2009
Abstract
This document describes existing in-network storage systems and their
applicability for DECADE.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5
2. Survey Overview . . . . . . . . . . . . . . . . . . . . . . . 5
2.1. Terminology and Concepts . . . . . . . . . . . . . . . . . 5
2.2. Historical Context . . . . . . . . . . . . . . . . . . . . 5
2.3. In-network Storage System Components . . . . . . . . . . . 7
2.3.1. Data Access Interface . . . . . . . . . . . . . . . . 7
2.3.2. Data Management Operations . . . . . . . . . . . . . . 7
2.3.3. Data Search Capability . . . . . . . . . . . . . . . . 7
2.3.4. Access Control Authorization . . . . . . . . . . . . . 7
2.3.5. Resource Control Interface . . . . . . . . . . . . . . 7
2.3.6. Discovery Mechanism . . . . . . . . . . . . . . . . . 8
2.3.7. Storage Mode . . . . . . . . . . . . . . . . . . . . . 8
3. P2P Cache . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.1. Transparent P2P Caches . . . . . . . . . . . . . . . . . . 8
3.1.1. Data Access Interface . . . . . . . . . . . . . . . . 9
3.1.2. Data Management Operations . . . . . . . . . . . . . . 9
3.1.3. Data Search Capability . . . . . . . . . . . . . . . . 9
3.1.4. Access Control Authorization . . . . . . . . . . . . . 9
3.1.5. Resource Control Interface . . . . . . . . . . . . . . 9
3.1.6. Discovery Mechanism . . . . . . . . . . . . . . . . . 9
3.1.7. Storage Mode . . . . . . . . . . . . . . . . . . . . . 9
3.2. Non-transparent P2P Caches . . . . . . . . . . . . . . . . 9
3.2.1. Data Access Interface . . . . . . . . . . . . . . . . 9
3.2.2. Data Management Operations . . . . . . . . . . . . . . 10
3.2.3. Data Search Capability . . . . . . . . . . . . . . . . 10
3.2.4. Access Control Authorization . . . . . . . . . . . . . 10
3.2.5. Resource Control Interface . . . . . . . . . . . . . . 10
3.2.6. Discovery Mechanism . . . . . . . . . . . . . . . . . 10
3.2.7. Storage Mode . . . . . . . . . . . . . . . . . . . . . 10
4. Web Cache . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.1. Data Access Interface . . . . . . . . . . . . . . . . . . 11
4.2. Data Management Operations . . . . . . . . . . . . . . . . 11
4.3. Data Search Capability . . . . . . . . . . . . . . . . . . 11
4.4. Access Control Authorization . . . . . . . . . . . . . . . 11
4.5. Resource Control Interface . . . . . . . . . . . . . . . . 11
4.6. Discovery Mechanism . . . . . . . . . . . . . . . . . . . 11
4.7. Storage Mode . . . . . . . . . . . . . . . . . . . . . . . 11
5. CDN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.1. Data Access Interface . . . . . . . . . . . . . . . . . . 12
5.2. Data Management Operations . . . . . . . . . . . . . . . . 12
5.3. Data Search Capability . . . . . . . . . . . . . . . . . . 13
5.4. Access Control Authorization . . . . . . . . . . . . . . . 13
5.5. Resource Control Interface . . . . . . . . . . . . . . . . 13
5.6. Discovery Mechanism . . . . . . . . . . . . . . . . . . . 13
5.7. Storage Mode . . . . . . . . . . . . . . . . . . . . . . . 13
5.8. Comments . . . . . . . . . . . . . . . . . . . . . . . . . 13
6. NFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
6.1. Data Access Interface . . . . . . . . . . . . . . . . . . 13
6.2. Data Management Operations . . . . . . . . . . . . . . . . 13
6.3. Data Search Capability . . . . . . . . . . . . . . . . . . 14
6.4. Access Control Authorization . . . . . . . . . . . . . . . 14
6.5. Resource Control Interface . . . . . . . . . . . . . . . . 14
6.6. Discovery Mechanism . . . . . . . . . . . . . . . . . . . 14
6.7. Storage Mode . . . . . . . . . . . . . . . . . . . . . . . 14
6.8. Comments . . . . . . . . . . . . . . . . . . . . . . . . . 14
7. Amazon S3 . . . . . . . . . . . . . . . . . . . . . . . . . . 14
7.1. Data Access Interface . . . . . . . . . . . . . . . . . . 15
7.2. Data Management Operations . . . . . . . . . . . . . . . . 15
7.3. Data Search Capability . . . . . . . . . . . . . . . . . . 15
7.4. Access Control Authorization . . . . . . . . . . . . . . . 15
7.5. Resource Control Interface . . . . . . . . . . . . . . . . 15
7.6. Discovery Mechanism . . . . . . . . . . . . . . . . . . . 15
7.7. Storage Mode . . . . . . . . . . . . . . . . . . . . . . . 15
8. OceanStore . . . . . . . . . . . . . . . . . . . . . . . . . . 15
8.1. Data Access Interface . . . . . . . . . . . . . . . . . . 16
8.2. Data Management Operations . . . . . . . . . . . . . . . . 16
8.3. Data Search Capability . . . . . . . . . . . . . . . . . . 16
8.4. Access Control Authorization . . . . . . . . . . . . . . . 16
8.5. Resource Control Interface . . . . . . . . . . . . . . . . 16
8.6. Discovery Mechanism . . . . . . . . . . . . . . . . . . . 16
8.7. Storage Mode . . . . . . . . . . . . . . . . . . . . . . . 16
9. Cache-and-Forward Architecture . . . . . . . . . . . . . . . . 16
9.1. Data Access Interface . . . . . . . . . . . . . . . . . . 17
9.2. Data Management Operations . . . . . . . . . . . . . . . . 17
9.3. Data Search Capability . . . . . . . . . . . . . . . . . . 17
9.4. Access Control Authorization . . . . . . . . . . . . . . . 17
9.5. Resource Control Interface . . . . . . . . . . . . . . . . 17
9.6. Discovery Mechanism . . . . . . . . . . . . . . . . . . . 17
9.7. Storage Mode . . . . . . . . . . . . . . . . . . . . . . . 17
10. Network Traffic Redundancy Elimination . . . . . . . . . . . . 17
10.1. Data Access Interface . . . . . . . . . . . . . . . . . . 18
10.2. Data Management Operations . . . . . . . . . . . . . . . . 18
10.3. Data Search Capability . . . . . . . . . . . . . . . . . . 18
10.4. Access Control Authorization . . . . . . . . . . . . . . . 18
10.5. Resource Control Interface . . . . . . . . . . . . . . . . 18
10.6. Discovery Mechanism . . . . . . . . . . . . . . . . . . . 18
10.7. Storage Mode . . . . . . . . . . . . . . . . . . . . . . . 18
11. BranchCache . . . . . . . . . . . . . . . . . . . . . . . . . 18
11.1. Data Access Interface . . . . . . . . . . . . . . . . . . 19
11.2. Data Management Operations . . . . . . . . . . . . . . . . 19
11.3. Data Search Capability . . . . . . . . . . . . . . . . . . 19
11.4. Access Control Authorization . . . . . . . . . . . . . . . 19
11.5. Resource Control Interface . . . . . . . . . . . . . . . . 19
11.6. Discovery Mechanism . . . . . . . . . . . . . . . . . . . 19
11.7. Storage Mode . . . . . . . . . . . . . . . . . . . . . . . 20
12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 20
13. Security Considerations . . . . . . . . . . . . . . . . . . . 20
14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20
15. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 20
16. References . . . . . . . . . . . . . . . . . . . . . . . . . . 21
16.1. Normative References . . . . . . . . . . . . . . . . . . . 21
16.2. Informative References . . . . . . . . . . . . . . . . . . 21
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22
1. Introduction
DECADE (DECoupled Application Data Enroute) is an architecture that
provides applications with access to in-network storage.
A major motivation for DECADE is the substantial increase in capacity
and reduction in cost offered by storage systems. In particular,
over the last decade, capacity of solid-state storage has increased
100-fold, while cost dropped to $50/GB; capacity of magnetic storage
devices has increased 100-fold, while cost dropped to $0.50/GB.
High-capacity and low-cost in-network storage devices introduce
substantial opportunities. One example of in-network storage is
content caches supporting Web and P2P content. Unlike existing
content caches, whose control resides entirely with the owners of
the caching devices, DECADE also allows applications to control
access to their allocated in-network storage, as well as the
resources consumed while accessing that storage (bandwidth,
connections, storage space). While designed in the context of P2P
applications, it may be useful to other applications as well. This
document provides details on existing in-network storage solutions,
and evaluates their suitability for DECADE.
We note that the techniques presented in this document are only
representative of the research in this area. Rather than trying to
enumerate an exhaustive list, we have chosen some typical techniques
that have led to derivative works.
2. Survey Overview
2.1. Terminology and Concepts
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
This document uses terms defined in
[I-D.song-decade-problem-statement].
2.2. Historical Context
In-network storage has been used previously in numerous scenarios to
reduce network traffic and enable more efficient content
distribution. Systems have been developed with particular use cases
in mind. Thus, this survey is not meant to point out shortcomings of
existing solutions, but rather to indicate where certain capabilities
required in DECADE are not provided by existing systems.
In the early stage of Internet development, most Web content was
stored at a central server and clients requested Web content from the
central server. In this architecture, the central server was
required to provide a large amount of bandwidth. Web browsing is
still a primary activity on today's Internet. As more and more users
access Web content, a central server can become overloaded. The use
of web caches is one technique to reduce load on a central server.
Web caches store frequently-requested content, and provide bandwidth
for serving the content to clients.
The ongoing growth of broadband technology in the worldwide market
has been driven by the hunger of customers for new multimedia
services as well as Web content. In particular, the use of audio and
video streaming formats has become common for delivery of rich
information to the public - both residential and business.
To meet this challenge of massive multimedia consumption, installing
more Web caches alone will not be enough. Moving content closer
to the consumer results in greater network efficiency, improved QoS,
and lower latency, while facilitating personalization of content
through broadband content applications. Among these edge
technologies, the Content Delivery Network (CDN) is representative.
A CDN is based on a large-scale distributed network of servers located
closer to the edges of the Internet for efficient delivery of digital
content including various forms of multimedia content.
Although CDN is an effective means of information access and
delivery, there are two barriers to making CDN a more common service:
cost and replication integrity. Deploying a CDN for publicly
available content is expensive. It requires administrative control
over nodes with large storage capacity at geographically dispersed
locations with adequate connectivity. A CDN can be scalable but,
due to this administrative and cost overhead, is not rapidly
deployable by the common user.
The emergence and maturation of Peer-to-Peer (P2P) systems has allowed
improvements to many network applications. P2P allows the use of
client resources, such as CPU, memory, storage, and bandwidth, for
serving content. This can reduce the amount of resources required by
a content provider. Multimedia content delivery using various peer-
to-peer or peer-assisted frameworks has been shown to greatly reduce
the dependence on CDN and central content servers. However,
popularity of P2P applications has resulted in increased traffic on
ISP networks.
DECADE aims to provide a standard protocol allowing P2P applications
(including Content Providers) to make use of in-network storage to
reduce the traffic burden on ISP networks, while enabling P2P
applications to control access to content they have placed in in-
network storage.
2.3. In-network Storage System Components
Before surveying individual technologies, we describe the basic
components of in-network storage systems used to evaluate them in the
context of DECADE.
Note that the network protocol(s) used by a storage system are also
an important part of the design. We omit details of particular
protocol choices in the current version of this document.
2.3.1. Data Access Interface
A set of operations is available to a user for accessing data in the
in-network storage. Solutions typically allow both read and write,
though the mechanisms for doing so can differ drastically.
2.3.2. Data Management Operations
Storage systems may provide users the ability to manage stored
content. For example, operations such as delete and move can be
provided to users. In this survey, we focus on data management
operations that are provided to client users and omit those provided
to system administrators.
2.3.3. Data Search Capability
Some storage systems may provide the capability to search or
enumerate content that has been stored. In this survey, we focus on
search capabilities that are provided to client users and omit those
provided to system administrators.
2.3.4. Access Control Authorization
A user is able to authorize individual users to retrieve the content
stored in its in-network storage. In-network storage can check the
authorization of a client before it stores or retrieves content, and
permits only authorized users to access the corresponding content.
Admission could be based on user, content, time period, etc.
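As an illustration of such an admission check, the following Python
sketch evaluates grants keyed on user, content, and time period. The
names Grant and is_authorized are hypothetical; they are not taken
from any surveyed system or from DECADE itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical authorization record issued by the storage owner."""
    user: str          # user allowed to access the content
    content_id: str    # content the grant applies to
    not_before: float  # start of validity period (seconds since epoch)
    not_after: float   # end of validity period (seconds since epoch)


def is_authorized(grants, user, content_id, now):
    """Return True if any grant admits `user` to `content_id` at time `now`."""
    return any(
        g.user == user
        and g.content_id == content_id
        and g.not_before <= now <= g.not_after
        for g in grants
    )


# Example: alice may access one chunk during a fixed time window.
grants = [Grant("alice", "chunk-17", 100.0, 200.0)]
```

A storage node would run such a check before serving or accepting
content; how grants are issued and transported is left open here.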
2.3.5. Resource Control Interface
This is the interface through which users manage the resources on in-
network storage that can be used by other peers, e.g., the bandwidth
or connections. The storage system may also allow users to indicate
a time for which resources are granted.
2.3.6. Discovery Mechanism
Users use the discovery mechanism to find the location of in-network
storage and to find its access interface, resource control interface,
or other interfaces.
2.3.7. Storage Mode
The data managed by the in-network storage could be of various types.
Example storage modes are file-based, object-based, or block-based.
3. P2P Cache
Caching of P2P traffic is a useful approach to reduce P2P network
traffic, because objects in P2P systems are mostly immutable and the
traffic is highly repetitive. In addition, P2P caches do not require
changes to P2P protocols and can be deployed transparently to
clients.
P2P caches operate similarly to web caches, in that they temporarily
store frequently-requested content. Requests for content already
stored in the cache can be served from local storage instead of
requiring the data to be transmitted over expensive network links.
Two types of P2P caches exist: non-transparent P2P caches and
transparent P2P caches. A non-transparent cache appears as a super
peer; it explicitly peers with other P2P clients. For a transparent
cache, once a P2P cache is established, the network will
transparently redirect P2P traffic to the cache, which either serves
the file directly or passes the request on to a remote P2P user and
simultaneously caches that data. Transparency is typically
implemented using deep packet inspection (DPI). DPI products
identify and pass P2P packets to the P2P caching system so it can
cache the traffic and accelerate it.
To enable operation with existing P2P software, P2P caches directly
support P2P application protocols. A large number of P2P protocols
are used by P2P software, and hence are supported by caches, leading
to higher complexity. Additionally, these protocols evolve over
time, and new protocols are introduced.
3.1. Transparent P2P Caches
3.1.1. Data Access Interface
The Data Access Interface allows P2P content to be cached (stored) and
supplied (retrieved) locally such that network traffic is reduced.
The cache is transparent to P2P users, who implicitly use the
data-access interface (in the form of their native P2P application
protocol) to store or retrieve content.
3.1.2. Data Management Operations
Not provided.
3.1.3. Data Search Capability
Not provided.
3.1.4. Access Control Authorization
Not provided.
3.1.5. Resource Control Interface
Not provided.
3.1.6. Discovery Mechanism
Because Deep Packet Inspection is used, no discovery mechanism is
provided to P2P users; the cache is transparent to them. Since DPI
must recognize the private protocols of P2P applications, P2P caches
become increasingly complicated as P2P applications keep evolving.
3.1.7. Storage Mode
Object-based. Chunks (typically, the unit of transfer amongst P2P
clients) of content are stored in the cache.
3.2. Non-transparent P2P Caches
3.2.1. Data Access Interface
The Data Access Interface allows P2P content to be cached (stored) and
supplied (retrieved) locally such that network traffic is reduced.
P2P users implicitly store and retrieve from the cache using the P2P
application's native protocol.
3.2.2. Data Management Operations
Not provided.
3.2.3. Data Search Capability
Not provided.
3.2.4. Access Control Authorization
Not provided.
3.2.5. Resource Control Interface
Not provided.
3.2.6. Discovery Mechanism
The cache joins the P2P overlay network as if it were a normal peer.
Other P2P users find cache nodes through the overlay routing
mechanism and treat them as normal neighbor nodes.
3.2.7. Storage Mode
Object-based. Chunks (typically, the unit of transfer amongst P2P
clients) of content are stored in the cache.
4. Web Cache
Web caching has been a well-established technology since the late
1990s and has been widely deployed by many ISPs to reduce bandwidth
consumption and web access latency. A web cache stores web documents
(e.g., HTML pages, images) between server and client to reduce
bandwidth usage, server load, and perceived lag. A web cache server is
typically shared by many clients, and stores copies of documents
passing through it; subsequent requests may be satisfied from the
cache if certain conditions are met.
Another form of cache is a client-side cache, typically implemented
in web browsers. A client-side cache can keep a local copy of all
pages recently displayed by the browser, and when the user returns to
one of these web pages, the locally cached copy is reused.
A related protocol that lets P2P applications use web caches is HPTP
(HTTP based Peer to Peer). It proposes to share chunks of P2P files/
streams using HTTP with cache-control headers.
4.1. Data Access Interface
Users explicitly read from a web cache by making requests, but they
cannot explicitly write data into it. Data is implicitly stored into
the web cache by requesting content that is not already cached and
that meets the policy restrictions of the cache provider.
4.2. Data Management Operations
Not provided.
4.3. Data Search Capability
Not provided.
4.4. Access Control Authorization
Not provided.
4.5. Resource Control Interface
Not provided.
4.6. Discovery Mechanism
Web Caches can be transparently deployed between Web Server and Web
Clients, employing DPI for discovery. Alternatively, web caches
could be explicitly discovered by clients using techniques such as
DNS or manual configuration.
4.7. Storage Mode
Object-based. Web content is keyed within the cache by HTTP Request
fields, such as Method, URI, and Headers.
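The keying described above, together with the implicit write on a
cache miss, can be sketched in Python as follows. The class WebCache,
its fetch_from_origin callback, and the choice of Accept-Encoding as
the only header in the key are assumptions made for illustration;
freshness and provider policy checks are not modelled.

```python
class WebCache:
    """Toy shared cache keyed by HTTP request fields (hypothetical sketch)."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin: callable(method, uri, headers) -> response body
        self._fetch = fetch_from_origin
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(method, uri, headers, keyed_headers=("accept-encoding",)):
        # Only selected headers take part in the key (an assumption here).
        lowered = {k.lower(): v for k, v in headers.items()}
        selected = tuple((h, lowered.get(h, "")) for h in keyed_headers)
        return (method.upper(), uri, selected)

    def request(self, method, uri, headers):
        key = self._key(method, uri, headers)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Miss: the object is stored implicitly, as a side effect of the read.
        self.misses += 1
        body = self._fetch(method, uri, headers)
        self._store[key] = body
        return body


# Two clients request the same object; only the first reaches the origin.
cache = WebCache(lambda method, uri, headers: f"body of {uri}")
first = cache.request("GET", "http://example.com/a", {"Accept-Encoding": "gzip"})
second = cache.request("GET", "http://example.com/a", {"accept-encoding": "gzip"})
```

Note that clients never call a store operation; the only way to place
content in the cache is to request it.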
5. CDN
Pathan et al. introduced the main idea and function of Content
Delivery Networks (CDN) [PR07]. CDNs provide services that improve
network performance by maximizing bandwidth, improving accessibility
and maintaining correctness through content replication. They offer
fast and reliable applications and services by distributing content
to cache or edge servers located close to users.
A CDN has some combination of content-delivery, request-routing,
distribution and accounting infrastructure. The content-delivery
infrastructure consists of a set of edge servers (also called
surrogates) that deliver copies of content to end-users. The
request-routing infrastructure is responsible for directing client
requests to appropriate edge servers. It also interacts with the
distribution infrastructure to keep an up-to-date view of the content
stored in the CDN caches. The distribution infrastructure moves
content from the origin server to the CDN edge servers and ensures
consistency of content in the caches. The accounting infrastructure
maintains logs of client accesses and records the usage of the CDN
servers. This information is used for traffic reporting and usage-
based billing.
In practice, CDNs typically host static content including images,
video, media clips, advertisements, and other embedded objects for
dynamic Web content. A focus for CDNs is the ability to publish and
deliver content to end-users in a reliable and timely manner. A CDN
focuses on building its network infrastructure to provide the
following services and functionalities: storage and management of
content; distribution of content among surrogates; cache management;
delivery of static, dynamic and streaming content; backup and
disaster recovery solutions; and monitoring, performance measurement
and reporting.
Examples of existing CDNs are Akamai, Limelight, and CloudFront.
The following description uses the term Content Provider to refer to
the entity purchasing CDN service, and the term Client to refer
to the subscriber requesting content via the CDN from the Content
Provider.
5.1. Data Access Interface
A CDN is typically a closed system internally. It provides clients
with a read (retrieve) access interface but not a write (store)
access interface. A content provider can access edge servers and
store content on them, or edge servers can retrieve content from the
content provider, but client nodes can only retrieve content from
edge servers.
5.2. Data Management Operations
A Content Provider can manage the data distributed across cache
nodes, for example by moving a frequently-accessed object from one
cache node to another or by deleting a rarely-accessed object from a
cache node. Client nodes have no right to perform these operations.
5.3. Data Search Capability
A Content Provider can search or enumerate the data that each cache
node holds, but client nodes have no right to perform these
operations.
5.4. Access Control Authorization
Content Providers typically cannot control per-client access to
content accessed via a CDN.
5.5. Resource Control Interface
Not provided.
5.6. Discovery Mechanism
Content providers can directly find internal CDN cache nodes to store
content, since they typically have an explicit business relationship.
Clients can locate CDN nodes through DNS or other redirection
mechanisms.
5.7. Storage Mode
Mostly file-based. In most cases, CDN cache nodes cache the entire
file from the content provider, though sometimes they cache only
parts of it, such as a file prefix or file suffix.
5.8. Comments
6. NFS
The Network File System is designed to allow users to access files
over a network in a manner similar to how local storage is accessed.
NFS is typically used in local area network or enterprise settings,
though changes made in later versions of NFS make it easier to
operate over the Internet.
6.1. Data Access Interface
Traditional file-system operations such as read, write, and update
(overwrite) are provided.
6.2. Data Management Operations
Traditional file-system operations such as move and delete are
provided.
6.3. Data Search Capability
Users have the ability to list contents of directories to find
filenames matching desired criteria.
6.4. Access Control Authorization
Files and directories can be protected using read, write, and execute
permissions for the file's owner, group, and the public (others).
Extended ACLs can provide additional protections to explicitly allow
access to a subset of users and groups. Per-user access control is
only provided to users with accounts at the storage server.
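The owner/group/others check can be sketched in Python using the
permission-bit constants of the standard stat module. The function
may_access and its parameters are hypothetical names; extended ACLs,
the superuser, and the way a server learns the caller's identity are
not modelled.

```python
import stat


def may_access(mode, file_uid, file_gid, uid, gids, want):
    """Evaluate classic owner/group/other permission bits.

    `want` is one of "r", "w", or "x". The first matching class
    (owner, then group, then others) decides the outcome.
    """
    owner_bit, group_bit, other_bit = {
        "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
        "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
        "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
    }[want]
    if uid == file_uid:
        return bool(mode & owner_bit)
    if file_gid in gids:
        return bool(mode & group_bit)
    return bool(mode & other_bit)


# A file with mode rw-r----- owned by uid 1000 and gid 50.
MODE, OWNER, GROUP = 0o640, 1000, 50
```

Because the decision depends on the uid and gids known to the server,
a remote peer without an account falls into the "others" class.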
6.5. Resource Control Interface
Disk space quotas can be configured, but they typically limit only
the total amount of storage allocated to a particular user. User
control of bandwidth and connections used by remote peers is not
provided.
6.6. Discovery Mechanism
Manual configuration is typically used. Clients address NFS servers
by providing a hostname and a directory that should be mounted.
6.7. Storage Mode
File-based storage, allowing files to be organized into directories.
6.8. Comments
The efficiency and scalability of the NFS access control method is a
concern in the context of DECADE. A user owning storage may be
required to explicitly reconfigure permissions for files and
directories often (e.g., for each object transferred to each peer),
resulting in additional overhead for both the user and storage
server.
7. Amazon S3
Amazon S3 [AmazonS3] provides an online storage service. Users
create buckets, and each bucket can contain stored objects. Users
are provided an interface through which they can manage their
buckets. Amazon S3 is a popular storage backend for other services.
Another related storage service is the Blob Service provided by
Windows Azure [Azure].
7.1. Data Access Interface
Users can read and write objects.
7.2. Data Management Operations
Users can delete previously-stored objects.
7.3. Data Search Capability
Users can list contents of buckets to find objects matching desired
criteria.
7.4. Access Control Authorization
Access to stored objects can be restricted to the owner, to a list of
other Amazon Web Service users, or to all Amazon Web Service users, or
it can be opened to all users (anonymous access). Another option is
for the owner to generate and sign a query (e.g., a query to read an
object) that can be used by any user until an owner-defined
expiration time.
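The idea of an owner-signed query with an expiration time can be
sketched generically in Python with an HMAC. This is not Amazon's
actual signing scheme; the message layout, the functions sign_query
and verify_query, and the shared secret are assumptions chosen only to
show the principle.

```python
import hashlib
import hmac


def sign_query(secret, method, object_key, expires):
    """Owner side: sign (method, object, expiry) with the owner's secret key."""
    message = f"{method}\n{object_key}\n{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_query(secret, method, object_key, expires, signature, now):
    """Storage side: accept the query only if unexpired and correctly signed."""
    if now > expires:
        return False
    expected = sign_query(secret, method, object_key, expires)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)


secret = b"owner-secret"  # hypothetical key known to the owner and the service
signature = sign_query(secret, "GET", "bucket/object", expires=1000)
```

Any user holding the signed query can present it until the expiration
time; changing the method, object, or expiry invalidates the signature.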
7.5. Resource Control Interface
Not provided.
7.6. Discovery Mechanism
Users are provided a well-known DNS name (either a default provided
by Amazon, or one customized by a particular user). Users accessing
S3 storage use DNS to discover an IP address where S3 requests can be
sent.
7.7. Storage Mode
Object-based, with the extension that objects can be organized into
user-defined buckets.
8. OceanStore
OceanStore is a storage platform developed at UC Berkeley that
provides globally-distributed storage. OceanStore implements a model
where multiple storage providers can pool resources together. Thus,
a major focus is on resiliency, self-organization, and
self-maintenance.
The protocol is resilient to some storage nodes being compromised by
utilizing Byzantine agreement and erasure codes to store data at
primary replicas.
8.1. Data Access Interface
Users may read and write objects. Objects may be replaced by newer
versions, and multiple versions of an object may be maintained.
8.2. Data Management Operations
Not provided.
8.3. Data Search Capability
Not provided.
8.4. Access Control Authorization
Provided, but specifics are unclear from the published paper.
8.5. Resource Control Interface
Not provided.
8.6. Discovery Mechanism
Users require an entry-point into the system in the form of one
storage node that is part of OceanStore.
8.7. Storage Mode
Object-based, though interfaces have been provided for NFS and HTTP.
9. Cache-and-Forward Architecture
Cache-and-Forward (CNF) [PRDW08] is an architecture for content
delivery services in the future Internet. In this architecture,
storage can be exploited at nodes within the network, either directly
at routers or deployed near routers. CNF is based on the concept of
store-and-forward routers with large storage, providing for
opportunistic delivery to occasionally disconnected mobile users and
for in-network caching of content. The proposed CNF protocol uses
reliable hop-by-hop transfer of large data files between CNF routers
in place of an end-to-end transport protocol like TCP.
9.1. Data Access Interface
Users implicitly store content at Cache-and-forward routers by
requesting files. Endhosts read content from in-network storage by
submitting queries for content.
9.2. Data Management Operations
Not provided.
9.3. Data Search Capability
Not provided.
9.4. Access Control Authorization
Not provided.
9.5. Resource Control Interface
Not provided.
9.6. Discovery Mechanism
A query including a location-independent content ID is sent to the
network, and routed to a Cache-and-forward router, which handles
retrieval of the data and forwarding to the endhost.
9.7. Storage Mode
Files. The architecture proposes to cache files at storage within
the network, though files could be made to represent smaller chunks
of larger files.
10. Network Traffic Redundancy Elimination
Another form of in-network storage is Redundancy Elimination (RE), or
identifying and removing repeated content from network transfers.
This technique has been proposed to improve network performance in
many types of networks, such as ISP backbones and enterprise access
links. One example redundancy elimination proposal is SmartRE,
proposed by Anand et al., which focuses on network-wide redundancy
elimination. In packet-level redundancy elimination, forwarding
elements are equipped with additional storage which can be used to
cache data from forwarded packets. Upstream routers may replace
packet data with a fingerprint that tells a downstream router how to
decode and reconstruct the packet based on cached data.
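The encode and decode steps can be sketched as follows. This is a
simplified illustration under stated assumptions, not the SmartRE
design: it fingerprints whole payloads with SHA-256, whereas real
proposals typically match redundancy at a finer granularity and must
keep upstream and downstream caches consistent. The names ReCache,
encode, and decode are hypothetical.

```python
import hashlib
from typing import Dict, Tuple

# An encoded packet is either ("raw", payload) or ("ref", fingerprint).
Encoded = Tuple[str, bytes]


def fingerprint(payload: bytes) -> bytes:
    """Fingerprint of a payload (assumption: SHA-256 over the whole payload)."""
    return hashlib.sha256(payload).digest()


class ReCache:
    """Packet store held by a forwarding element, indexed by fingerprint."""

    def __init__(self) -> None:
        self._by_fp: Dict[bytes, bytes] = {}

    def encode(self, payload: bytes) -> Encoded:
        """Upstream side: replace previously seen data with its fingerprint."""
        fp = fingerprint(payload)
        if fp in self._by_fp:
            return ("ref", fp)
        self._by_fp[fp] = payload  # first sighting: cache and send in full
        return ("raw", payload)

    def decode(self, item: Encoded) -> bytes:
        """Downstream side: reconstruct the payload from cached data."""
        kind, body = item
        if kind == "ref":
            return self._by_fp[body]
        self._by_fp[fingerprint(body)] = body  # cache for later references
        return body


# Usage: the repeated payload crosses the link as a 32-byte fingerprint.
upstream, downstream = ReCache(), ReCache()
payload = b"x" * 1000
first = upstream.encode(payload)
second = upstream.encode(payload)
out1 = downstream.decode(first)
out2 = downstream.decode(second)
```

The sketch assumes the downstream router sees every packet the
upstream router encodes, in order; otherwise a reference could point
at data the downstream cache does not hold.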
10.1. Data Access Interface
Redundancy-elimination systems are typically transparent to the user.
Writing into the storage is done by transferring data that has not
already been cached. Storage is read when users transmit data
identical to previously-transmitted data.
10.2. Data Management Operations
Not provided.
10.3. Data Search Capability
Not provided.
10.4. Access Control Authorization
Not provided. However, note that the content provider still retains
control over which peers receive the requested data. The returned
data is simply "compressed" as it is transferred within the network.
10.5. Resource Control Interface
Not provided. The content provider still retains control over the
rate at which packets are sent to a peer. The packet size within the
network may be reduced.
10.6. Discovery Mechanism
No discovery mechanism is necessary. Routers can use redundancy-
elimination without the users' knowledge.
10.7. Storage Mode
Object-based, with "objects" being data from packets transmitted
within the network.
11. BranchCache
BranchCache [BranchCache] is a feature integrated into Windows that
allows files retrieved from web servers and file servers in an
external network to be cached within a local network (e.g., a branch
office). BranchCache operates transparently by instrumenting the
HTTP and SMB components of the networking stack. It also provides
two modes of operation: Distributed Cache and Hosted Cache.
In the Hosted Cache mode, a server acts as a cache for files
retrieved over the external network. A client first consults the
cache for the desired file. If it is not present in the cache, the
client retrieves it from the content server and sends it to the cache
for storage.
In the Distributed Cache mode, a client first queries other clients
in the same network using the Web Services Discovery multicast
protocol. As in the Hosted Cache mode, the client retrieves the file
from the content server if it is not available locally. After
retrieving the file (either from another client or the content
server), the client stores the file locally.
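The two lookup flows can be sketched as follows. This is an
illustrative model only, not the BranchCache implementation: the
classes HostedCache and Client and the fetch callback are
hypothetical, multicast discovery is replaced by iterating over a
known list of peers, and authorization and encryption are omitted.

```python
from typing import Callable, Dict, List, Optional


class HostedCache:
    """Server that caches files retrieved over the external network."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def get(self, name: str) -> Optional[bytes]:
        return self._files.get(name)

    def put(self, name: str, data: bytes) -> None:
        self._files[name] = data


class Client:
    def __init__(
        self,
        fetch_from_server: Callable[[str], bytes],
        hosted: Optional[HostedCache] = None,
        peers: Optional[List["Client"]] = None,
    ) -> None:
        self._fetch = fetch_from_server  # stands in for the content server
        self.hosted = hosted
        self.peers = peers or []
        self.local: Dict[str, bytes] = {}
        self.server_fetches = 0

    def retrieve(self, name: str) -> bytes:
        if self.hosted is not None:
            # Hosted Cache mode: consult the cache, fall back to the
            # content server, then send the file to the cache.
            data = self.hosted.get(name)
            if data is None:
                self.server_fetches += 1
                data = self._fetch(name)
                self.hosted.put(name, data)
            return data
        # Distributed Cache mode: ask other clients first.
        data = None
        for peer in self.peers:
            data = peer.local.get(name)
            if data is not None:
                break
        if data is None:
            self.server_fetches += 1
            data = self._fetch(name)
        self.local[name] = data  # store locally for other clients
        return data


server = {"report.doc": b"quarterly numbers"}


def fetch(name: str) -> bytes:
    return server[name]


# Hosted Cache mode: only the first client contacts the content server.
hosted = HostedCache()
a, b = Client(fetch, hosted=hosted), Client(fetch, hosted=hosted)
a.retrieve("report.doc")
b.retrieve("report.doc")

# Distributed Cache mode: d obtains the file from peer c.
c = Client(fetch)
d = Client(fetch, peers=[c])
c.retrieve("report.doc")
d.retrieve("report.doc")
```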
The original content server still authorizes requests from clients.
Cached content is encrypted, and clients can only decrypt the data
using keys derived from metadata returned by the content server. In
addition to instrumenting the networking stack at clients, content
servers must also support BranchCache.
11.1. Data Access Interface
Clients transparently retrieve (read) data from a cache (other
clients or a Hosted Cache) since it operates by instrumenting the
networking stack. In Hosted Cache mode, clients write data to the
Hosted Cache once it is retrieved from the content server.
11.2. Data Management Operations
Not provided.
11.3. Data Search Capability
Not provided.
11.4. Access Control Authorization
Transferred content is encrypted, and can only be decrypted by keys
derived from data received from the original content server. Though data
may be transferred to unauthorized clients, end-to-end security is
maintained by only allowing authorized clients to decrypt the data.
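The idea that cached data is useful only to clients holding
server-supplied key material can be illustrated with a toy sketch.
It does not reproduce BranchCache's actual key derivation or cipher:
the HMAC-based derive_key function, the "server-only-secret" value,
and the SHA-256 keystream are assumptions chosen so the example runs
with the Python standard library alone, and the cipher shown is not
suitable for real use.

```python
import hashlib
import hmac


def derive_key(server_secret: bytes, content_hash: bytes) -> bytes:
    """Hypothetical derivation: key = HMAC-SHA256(server_secret, content_hash)."""
    return hmac.new(server_secret, content_hash, hashlib.sha256).digest()


def xor_keystream(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT secure): XOR data with SHA-256(key || counter).

    Applying the function twice with the same key returns the input.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


content = b"cached branch-office file"
content_hash = hashlib.sha256(content).digest()

# The content server hands the key only to clients it has authorized.
key = derive_key(b"server-only-secret", content_hash)
cached_blob = xor_keystream(key, content)  # what a cache stores and serves

# An authorized client recovers the file; a client without the key
# receives the same bytes from the cache but cannot decrypt them.
authorized = xor_keystream(key, cached_blob)
wrong_key = derive_key(b"guess", content_hash)
unauthorized = xor_keystream(wrong_key, cached_blob)
```

The point of the sketch is the separation of roles: the cache may hand
cached_blob to any client, while the content server alone decides who
receives the key.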
11.5. Resource Control Interface
Not provided.
11.6. Discovery Mechanism
The Distributed Cache mode uses multicast for discovery of other
clients and content within a local network. Currently, the Hosted
Cache mode uses manual configuration of the server used as the Hosted
Cache.
11.7. Storage Mode
File-based.
12. Conclusions
Though there have been many successful in-network storage systems,
they have been designed for use cases different from those defined in
DECADE. As a result, their functionality and feature sets do not
meet the requirements defined for DECADE. DECADE aims to provide
a standard protocol for P2P applications and content providers to
access and control in-network storage, resulting in increased network
efficiency while retaining control over content shared with peers.
Additionally, defining a standard protocol can reduce complexity of
in-network storage since multiple P2P application protocols no longer
need to be implemented by in-network storage systems.
13. Security Considerations
This draft is a survey of existing in-network storage systems, and
does not introduce any security considerations beyond those of the
surveyed systems.
For more information on security considerations of DECADE, see
[I-D.song-decade-problem-statement].
14. IANA Considerations
This document does not have any IANA Considerations.
15. Acknowledgments
The authors would like to thank Yu-Shun Wang and Ning Zong for
comments and contributions to this document.
16. References
16.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
16.2. Informative References
[I-D.song-decade-problem-statement]
Yongchao, S., Zong, N., Yang, Y., and R. Alimi, "DECoupled
Application Data Enroute (DECADE) Problem Statement",
draft-song-decade-problem-statement-00 (work in progress),
October 2009.
[I-D.gu-decade-reqs]
Yingjie, G., Yongchao, S., Yang, Y., and R. Alimi,
"DECoupled Application Data Enroute (DECADE)
Requirements", draft-gu-decade-reqs-01 (work in progress),
October 2009.
[HYAL08] H. Xie, Y. R. Yang, A. Krishnamurthy, Y. Liu, and A.
Silberschatz., "P4P: Provider Portal for Applications.",
In ACM SIGCOMM 2008.
[MCM08] M. Hefeeda, C. Hsu, and K. Mokhtarian, "pCache: A Proxy
Cache for Peer-to-Peer Traffic", In ACM SIGCOMM'08
Technical Demonstration.
[JZL08] Jie Wu, ZhiHui Lu, BiSheng Liu, et al., "PeerCDN: A Novel
P2P Network Assisted Streaming Content Delivery Network
Scheme", In 8th IEEE International Conference on Computer
and Information Technology (CIT2008).
[GYZ07] G. Shen, Y. Wang, Y. Xiong, B.Y. Zhao, Z.-L. Zhang, "HPTP:
Relieving the Tension between ISPs and P2P", In 6th
International Workshop on Peer-To-Peer Systems
(IPTPS2007).
[JCL09] Jiajun Wang, Cheng Huang, Jin Li., "On ISP-friendly rate
allocation for peer-assisted VoD", In ACM Multimedia 2008.
[GH09] Geoff Huston, Telstra., "Web Caching", In The Internet
Protocol Journal Volume 2, No. 3.
[McGraw02]
Scott Hull et al., "Content Delivery Networks: Web
Switching for Security, Availability, and Speed".
[PR07] Pathan, A.K., Buyya, R., "A Taxonomy and Survey of Content
Delivery Networks.", In Grid Computing and Distributed
Systems Laboratory in University of Melbourne, Technology
Report, Feb. 2007.
[AmazonS3]
Amazon, "Amazon Simple Storage Service (Amazon S3).",
http://aws.amazon.com/s3/.
[Azure] Microsoft Corporation., "Windows Azure Blob - Programming
Blob Storage.",
http://go.microsoft.com/fwlink/?LinkId=153400.
[OceanStore]
S. Rhea, P. Eaton, D. Geels, H. Weatherspoon, B. Zhao, and
J. Kubiatowicz., "Pond: the OceanStore Prototype.", In
FAST 2003.
[AVA09] A. Anand, V. Sekar, A. Akella., "SmartRE: An Architecture
for Coordinated Network-wide Redundancy Elimination.", In
SIGCOMM 2009.
[PRDW08] S. Paul, R. Yates, D. Raychaudhuri, J. Kurose., "The
Cache-and-Forward Network Architecture for Efficient
Mobile Content Delivery Services in the Future Internet",
In Innovations in NGN: Future Network and Services, 2008.
[BranchCache]
Microsoft Corporation., "BranchCache",
http://technet.microsoft.com/en-us/network/dd425028.aspx.
Authors' Addresses
Richard Alimi
Yale University
Email: richard.alimi@yale.edu
ZhiHui Lu
Fudan University
Email: lzh@fudan.edu.cn
Song Haibin
Huawei
Baixia Road No. 91
Nanjing, Jiangsu Province 210001
P.R.China
Phone: +86-25-84565867
Email: melodysong@huawei.com
Yang Richard Yang
Yale University
Email: yry@cs.yale.edu