Internet Draft Ingrid Melve
Expires: December 1999 UNINETT
Informational Gary Tomlinson
WREC Working Group Novell
Ian Cooper
Mirror Image Internet
June 25, 1999
Internet Web Replication and Caching Taxonomy
draft-ietf-wrec-taxonomy-01.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This memo specifies standard terminology and the current taxonomy of
web replication and caching infrastructure deployed today. It
introduces standard concepts and protocols used today within this
application domain. Currently deployed solutions employing these
technologies are presented to establish a standard taxonomy.
Research issues and known problems with HTTP proxy caching are covered
in two accompanying documents, and are not part of this document. This
document presents open protocols and points to published RFCs for
each protocol.
Melve, Tomlinson, Cooper [Page 1]
Replication and Caching Taxonomy June 25, 1999
Contents
1. Introduction
2. Terminology
3. Distributed System Relationships
4. Client to Replica Communication
5. Inter-Replica Communication
6. Client to Proxy Configuration
7. Inter-Cache Communication
8. Network Element Communication
9. Security Considerations
10. Acknowledgements
11. References
12. Authors' Addresses
1. Introduction
Since its introduction in 1990, the World-Wide Web has evolved from a
simple client-server model into a sophisticated distributed
architecture. This evolution has been driven largely by the
scaling problems associated with exponential growth. Distinct
paradigms and solutions have emerged to satisfy specific
requirements. Two core infrastructural components being employed to
meet the demands of this growth are replication and caching. In many
cases, there is a need for web caches and replicated services to be
able to coexist.
There are many protocols, both open and proprietary, employed in web
replication and caching today. The open protocols in use include
DNS[21], CacheDigest[16], CARP[9], HTTP[6], ICP[10], PAC[7],
SOCKS[19], TPACT[22], WPAD[8], and WCCP[18]. Additional protocols are
being planned to address emerging solution requirements.
This memo specifies standard terminology and the current taxonomy of
web replication and caching infrastructure deployed in the Internet
today. The principal goal of this document is to establish a common
understanding and reference point of this application domain.
We also expect that this document will be used in the creation of a
standard architectural framework for efficient, reliable, and
predictable service in a web which includes both replicas and caches.
2. Terminology
Where possible, existing definitions [5, 6] have been used in this
document. Additional terminology has been agreed upon and defined in
this document. All of the terminology used in this document is
considered to be standardized with respect to IETF WREC working group
RFCs.
In this document a number of terms are used to refer to the roles
played by participants in, and objects of, the HTTP communication.
The following definitions are taken from the HTTP/1.1 specification [6].
However, these definitions may have come to have differing meanings
within the Web caching community. In those cases, additional
clarification is given:
client
An application program that establishes connections for the
purpose of sending requests.
user agent
The client which initiates a request. These are often
browsers, editors, spiders (web-traversing robots), or
other end user tools.
server
An application program that accepts connections in order to
service requests by sending back responses. Any given
program may be capable of being both a client and a server;
our use of these terms refers only to the role being
performed by the program for a particular connection,
rather than to the program's capabilities in
general. Likewise, any server may act as an origin server,
proxy, gateway, or tunnel, switching behavior based on the
nature of each request.
origin server
The server on which a given resource resides or is to be
created.
[Ed note; IAN: The following is subtly different from the
definition given in HTTP/1.1. (Should we now
revert to the definition in HTTP/1.1 and document
the difference?) As a community we must be
careful about which type of "transparent proxy" is
being discussed.]
proxy
An intermediary system which acts as both a server and a
client for the purpose of making requests on behalf of
other clients. Requests are serviced internally or by
passing them on, with possible translation, to other
servers. A proxy MUST implement both the client and server
requirements of this specification. A "transparent proxy"
is a proxy that does not modify the request or response
beyond what is required for proxy authentication and
identification. A "non-transparent proxy" is a proxy that
modifies the request or response in order to provide some
added service to the user agent, such as group annotation
services, media type transformation, protocol reduction,
or anonymity filtering. Except where either transparent or
non-transparent behavior is explicitly stated, the HTTP
proxy requirements apply to both types of proxies.
Note: The term "transparent proxy" given in [6] has different
meaning within the Web caching community. Further
unspecified references in this document (including the
following paragraph) are to the Web caching community
definition, which is given later.
The condition requiring implementation of both server and
client requirements of HTTP/1.1 is only appropriate for a
non-transparent proxy.
[Ed note; IAN: The following is also subtly different from
HTTP/1.1. Should also consider comments from Joe
Touch on whether we should distinguish types of
tunnels.]
tunnel
An intermediary system which is acting as a blind relay
between two connections. Once active, a tunnel is not
considered a party to the HTTP communication, though the
tunnel may have been initiated by an HTTP request. The
tunnel ceases to exist when both ends of the relayed
connections are closed.
[Ed note; IAN: The following has been slightly modified from
HTTP/1.1 to consider server load. Need to consider
comment from Joe Touch regarding clarification of
not using a cache when tunnelling.]
cache
A program's local store of response messages and the
subsystem that controls its message storage, retrieval, and
deletion. A cache stores cacheable responses in order to
reduce the response time, server load and network
bandwidth consumption on future, equivalent requests. Any
client or server may include a cache, though a cache
cannot be used by a server while it is acting as a tunnel.
[Ed note; IAN: The following has been edited from RFC2616 to
reference that document.]
cacheable
A response is cacheable if a cache is allowed to store a
copy of the response message for use in answering
subsequent requests. The rules for determining the
cacheability of HTTP responses are defined in section 13
of [6]. Even if a resource is cacheable, there may be
additional constraints on whether a cache can use the
cached copy for a particular request.
To these we add the following:
authoritative reference
the owner of data; content production system; possibly an
origin server
content consumer
the user or system that makes requests of an origin server
(which may in turn be handled by a proxy).
caching proxy
A proxy with a cache, acting as server to clients, and
a client to servers
origin server accelerator
an application of a caching proxy where the proxy is
placed closer to the origin server than to the content
consumers in order to off-load the handling of cacheable
responses from the server; also as a means to reduce
traffic within the server's network.
surrogate
[Ed note; IAN: need a definition.]
network element
router or switch
[Ed note; IAN: This term probably needs a better name.]
browser
a special instance of a user agent that acts as a content
presentation device for a content consumer
cluster
a tightly coupled set of devices acting together to share
load
reverse proxy
An intermediary system which acts as both a server and a
client for the purpose of serving requests on behalf of
origin servers. Requests are serviced internally or by
passing them on to the origin server they are representing.
A reverse proxy must interpret and, if necessary, rewrite a
request message before forwarding it. Reverse proxies are
often used as server-side portals through network firewalls
and as helper applications for off-loading requests from
origin servers.
[Ed note; IAN: leaving this as a placeholder until we can
work out proxies/reverse proxies/surrogates
and accelerators]
The following definitions are added to describe caching device
topology:
user agent cache
the cache within the user agent program
local caching proxy
the caching proxy a user agent connects to
[Ed note; IAN: should this be renamed 'primary proxy'?]
intermediate caching proxy
seen from the content consumer's view, all caches
participating in the caching mesh that are not the user
agent's local caching proxy
cache server
a server that answers requests made by local and upper level caching
proxies, but which does not act as a proxy
cache array
diffused array
cache cluster
a cluster of caching proxies, acting logically as one
service and partitioning the URL name space across the
array
caching mesh
a loosely coupled set of cooperating proxy or caching
servers, or clusters, acting independently but sharing
cacheable content between themselves using inter-cache
communication protocols (see Section 7)
Efforts to insert proxies into the network in such a way that the
content consumer is unaware of their presence have created a set of
terms whose definitions may not be consistent with other uses. This
section references prior definitions but also gives their meaning in
the realm of Web caching.
[Ed note; IAN: snooping, redirection, interception - need to
clarify if we only need the first two]
traffic redirection
redirection of traffic from a user agent or network
element to a specific proxy, used to deploy Web-caching
without the need to manually reconfigure individual user
agents, or to force the use of a proxy where such use
would not otherwise occur
network traffic snooping
the examination of network traffic within a network
element to determine whether it should be redirected
transparent proxy (additional definition)
the term "transparent proxy" is defined in [6] (and quoted
above). However, in the realm of Web caching, this has
come to mean a proxy which receives traffic as a result
of network traffic snooping. The term typically
describes the use of a proxy together with the additional
systems which perform network traffic snooping. The use of
the proxy is transparent to the client. Transparent
proxies are used to remove the need for configuration of
clients to use a proxy.
proxy discovery
this describes the discovery and configuration for use of
a proxy in an environment where the content consumer may
be unaware of the proxy's existence. The use of the proxy
is transparent to the content consumer, but not to the
client.
[Ed note; IAN: should we consider the ability of proxies
to discover each other? Would this be
better titled as "transparent proxy
configuration"?]
The following terms describe the roles of servers and caches in the
realm of caching and replication:
[Ed note; IAN: This section needs significant work]
temporal domain, sparse working set cache
a subset of the content from one or more origin servers,
stored temporarily and collected from requests made by
content consumers
persistent domain
a collection of origin servers maintaining a persistent
data set from the authoritative reference
replica origin server
origin server storing a persistent replica of a data set
stored at the authoritative reference
3. Distributed System Relationships
[Ed note; GARY: Consider eliminating this big picture, it doesn't
capture all of the relationships and is difficult to communicate]
Diagram of the components that make up a web replication and caching
infrastructure, with communication between the components.
------------------     -----------------     ------------------
| Replica Origin |-----| Master Origin |-----| Replica Origin |
|     Server     |     |     Server    |     |     Server     |
------------------     -----------------     ------------------
         \                     |                     /
          \                    |                    /
           -----------------------------------------
                               |  Client to
                       -----------------  Replica Server
                       |   Top-Level   |
                       | Caching Proxy |
                       -----------------
                      /                 \      Inter Cache
                     /                   \     Communication
         -----------------           -----------------
         |  Upper-Level  |-----------|  Upper-Level  |
         | Caching Proxy |           | Caching Proxy |
         -----------------           -----------------
           /        Inter Cache                    \
          /         Communication                   \
         /                                           \
        /              -----------------              \
-----------------     -----------------|      -----------------
|  First Level  |-----| Caching Proxy ||------|  First Level  |
| Caching Proxy |     |     Array     |-      | Caching Proxy |
-----------------     -----------------       -----------------
        |  Client to          |  Cache to
        |  Proxy Cache        |  Network Element
-----------------     -----------------
|     Client    |     |    Network    |
|               |     |    Element    |
-----------------     -----------------
                              |
                              |
                      -----------------
                      |     Client    |
                      -----------------
3.1 Replication Relationships
[Ed note; describe the replication system relationship domain]
3.1.1 Client to Replica
[Ed note; recast this as relationship not the definition which
follows in section 4] Client to Replica: cooperation and
communication between clients (both browser/user agents and proxy
caches) and replica origin servers. Used to discover optimal replica
proximity.
Persistent Domain
Complete Idem-Potent Set Replication
------------------     -----------------     ------------------
| Replica Origin |     | Master Origin |     | Replica Origin |
|     Server     |     |     Server    |     |     Server     |
------------------     -----------------     ------------------
         \                     |                     /
          \                    |                    /
           -----------------------------------------
                               |  Client to
                       -----------------  Replica Server
                       |     Client    |
                       |               |
                       -----------------
3.1.2 Inter-Replica
[Ed note; recast this as relationship not the definition which
follows in section 5] Inter-Replica: cooperation and communication
between replica origin servers. Used in replicating data sets
between origin servers.
Persistent Domain
Complete Idem-Potent Set Replication
------------------     -----------------     ------------------
| Replica Origin |-----| Master Origin |-----| Replica Origin |
|     Server     |     |     Server    |     |     Server     |
------------------     -----------------     ------------------
3.2 Caching Relationships
[Ed note; describe the caching system relationship domain]
3.2.1 Client to Proxy
[Ed note; recast this as relationship not the definition which
follows in section 6] Client to Proxy: configuration, cooperation and
communication between end user clients (browsers and applications)
and a caching proxy.
Temporal Domain
Sparse Working Set Cache
-----------------     -----------------     -----------------
|  First Level  |     |  First Level  |     |  First Level  |
| Caching Proxy |     | Caching Proxy |     | Caching Proxy |
-----------------     -----------------     -----------------
        \                     |                     /
         \                    |                    /
          -----------------------------------------
                              |
                      -----------------
                      |     Client    |
                      -----------------
3.2.2 Reverse Proxy to Origin Server
[Ed note; describe the accelerator relationship]
3.2.3 Inter-Cache
[Ed note; recast this as relationship not the definition which
follows in section 7] Inter-Cache: cooperation and communication
between caching proxies.
Temporal Domain
Sparse Working Set Cache
                      -----------------
                      |   Top-Level   |
                      | Caching Proxy |
                      -----------------
                        /           \
                       /             \
           -----------------     -----------------
           |  Upper-Level  |-----|  Upper-Level  |
           | Caching Proxy |     | Caching Proxy |
           -----------------     -----------------
              /         \           /         \
             /           \         /           \
            /             \       /             \
           /               \     /               \
-----------------     -----------------     -----------------
|  First Level  |-----|  First Level  |-----|  First Level  |
| Caching Proxy |     | Caching Proxy |     | Caching Proxy |
-----------------     -----------------     -----------------
Network Element to Caching Proxy
[Ed note; recast this as relationship not the definition which
follows in section 8] Network Element to Proxy Cache: cooperation and
communication between caching proxy and network elements. Examples
include routers and switches. Generally used for transparent caching
and/or diffused arrays.
Temporal Domain
Sparse Working Set Cache
-----------------     -----------------     -----------------
| Caching Proxy |     | Caching Proxy |     | Caching Proxy |
|     Array     |     |     Array     |     |     Array     |
-----------------     -----------------     -----------------
        \                     |                     /
         \                    |                    /
          -----------------------------------------
                              |
                      -----------------
                      |    Network    |
                      |    Element    |
                      -----------------
                              |
                              |
                      -----------------
                      |     Client    |
                      -----------------
Caching Proxies with Transparency
[Ed note: Currently contains citations from NetApp document, need
rewording to avoid specific products and concentrate on generic
properties. Explain network elements and NATs and other ways
interception may happen. Intro to usage and "normal" setup.]
Reference [1,2,3,4] for introduction to caching proxies with
transparency.
The goal of intercepting web traffic is to provide a transparent web
proxy, thus avoiding the hassle of individually configuring each
client.
Transparency means that the user does not need to be aware of the
proxy.
The origin server sees connections coming from the proxy, not from the
individual end user. Authentication based on the client IP address
does not work if there is a transparent proxy cache in the path to the
web server.
A web cache is said to be transparent if clients can access the cache
without having to configure their browsers with either a proxy
auto-configuration URL or a manual proxy setting. Transparent caches
appear as a seamless part of the network infrastructure, rather than
a set of discrete proxy servers, and function much like a transparent
firewall. Many ISPs and carriers want transparent caches because they
let them retrofit their networks with caching without any action at
the client. However, when deployed transparently, a web cache must be
as fail-safe and scalable as the rest of the network. [2]
A transparent cache acts much like a gateway or firewall -- it
effectively sits between the users and the network. The advantage of
transparent caching is that it eliminates the need to configure
browsers to use caching. Another strength (and sometimes a weakness)
is that it is impossible to bypass caching. [2]
Conceptually, transparency works by modifying the TCP/IP stack of a
cache so that it operates in "promiscuous mode" and effectively binds
itself to all possible IP addresses. [2]
We need to give a far more abstract definition which includes the way
that router and switch redirection, and within-router action,
operate.
Some of the problems include:
* the limited number of ports which can be captured
* problems caused by "unexpected" data on other ports (or even on
well known ports), as experienced by setting up various services
on port 80
* well known problems with use of HTTP for transport [20]
Out-of-path Transparent Caching Proxies
An Out-of-path Transparent Caching Proxy performs the same proxy and
caching functions as a Transparent Caching Proxy and is similarly
transparent to the client. However, it does not lie on the forwarding
path between a client and a server and does not perform web traffic
interception. Instead it relies upon a redirecting network element in
the path between client and server to intercept and redirect web
traffic to it. One advantage of this method of transparent caching is
that in the case of cache failure the network element can, providing
it monitors the state of the caches, revert to forwarding web traffic
direct to the server. It is also possible for the network element to
distribute the web traffic load across a group of caches. This method
of transparent caching generally requires a protocol to be run
between the redirecting network element and the cache or caches.
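The following Python sketch illustrates how a redirecting network element might spread web traffic over a group of caches and revert to direct forwarding when none of them is alive. It assumes that the element hashes the destination address and already knows which caches are healthy (for example from a keepalive exchange with them). The function name and the hashing choice are hypothetical and are not taken from any particular redirection protocol.

```python
from __future__ import annotations

import hashlib
from typing import Optional, Sequence


def choose_cache(dst_ip: str, caches: Sequence[str], healthy: set[str]) -> Optional[str]:
    """Pick the cache for traffic to dst_ip, or None to forward it straight to the server.

    Hypothetical helper: `healthy` is assumed to be maintained elsewhere by the
    network element's monitoring of cache state.
    """
    live = [cache for cache in caches if cache in healthy]
    if not live:
        # Every cache has failed: revert to forwarding web traffic direct to the server.
        return None
    # Hashing the destination keeps requests for one server on the same cache,
    # so each cache holds a distinct part of the working set.
    digest = hashlib.md5(dst_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:4], "big") % len(live)
    return live[index]
```

Taking the hash modulo the number of live caches reshuffles most destinations whenever a cache fails or recovers; a deployed scheme may prefer a fixed table of hash buckets that is reassigned only for the failed cache.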
4. Client to Replica Communication
This section describes the cooperation and communication between
clients (both user agents and proxy caches) and replica origin web
servers. It is used to discover an optimal web origin server replica
with which a web client can establish service. Optimality is a
policy-based decision, often based upon proximity, but may be based
on other criteria such as load.
4.1 Navigation Hyperlinks
Authoritative reference:
This memo.
Description:
The simplest of the client to replica communication
mechanisms. It utilizes hyperlink URLs embedded in web
pages that point to the mirror sites. The human user
manually selects the link of the replica origin server
they wish to use.
Security:
Relies on the protocol security associated with the URL
scheme.
Deployment:
Probably the most commonly deployed client to replica
communication mechanism. Ubiquitous interoperability
with humans.
Submitter:
Document editors.
4.2 URL Redirection
Authoritative reference:
This memo.
Description:
A simple and commonly used mechanism to connect web
clients with origin server replicas is to use URL
redirection. Clients are redirected to an optimal web
server replica via the use of the HTTP [6] protocol
response code 307 Temporary Redirect. A web client
establishes HTTP communication with one of the web server
replicas. The initially contacted replica origin web
server can either choose to accept the service or redirect
the client to the proper replica. Refer to section 10.3.8
in HTTP/1.1 RFC2616 for information on HTTP response code
307.
Security:
Relies entirely upon HTTP security.
Deployment:
Observed at a number of large web sites. Extent of usage
in the Internet is unknown at this time.
Submitter:
Document editors.
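The decision made by the initially contacted replica can be sketched in Python as follows. The replica table, the host names, and the idea that the client's region is already known (for instance from its source address) are all hypothetical; the text leaves the choice of "proper replica" to local policy.

```python
from http import HTTPStatus
from typing import Dict, Tuple

# Hypothetical replica table: replica host name -> client region it serves best.
REPLICAS: Dict[str, str] = {
    "www-eu.example.org": "eu",
    "www-us.example.org": "us",
}


def handle_request(self_host: str, client_region: str, path: str) -> Tuple[int, Dict[str, str]]:
    """Decide whether the replica self_host serves a request or redirects it."""
    if REPLICAS.get(self_host) == client_region:
        # This replica is the optimal one for the client: accept the service.
        return HTTPStatus.OK, {}
    for host, region in REPLICAS.items():
        if region == client_region:
            # A better replica exists: 307 tells the client to repeat the request there.
            return HTTPStatus.TEMPORARY_REDIRECT, {"Location": f"http://{host}{path}"}
    # No better replica is known, so serve the request locally.
    return HTTPStatus.OK, {}
```

A 307 response asks the client to repeat the same request, with the same method, at the URL in the Location header.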
4.3 DNS Redirection [21]
Authoritative reference:
Load balancing: RFC1794 DNS Support for Load Balancing
Proximity: This memo
[Ed note; it would have been nice to cite SONAR, but draft has
expired]
Description:
The Domain Name System (DNS) provides a more
sophisticated client to replica communication mechanism.
This is accomplished by DNS servers that order the
addresses they return according to quality of service
policies. When a web client resolves the name of a web
server, the enhanced DNS server orders the IP addresses of
the web server starting with the most optimal replica and
ending with the least optimal replica.
Security:
Relies entirely upon DNS security.
Deployment:
Observed at a number of large web sites and large ISP web
hosted services. Extent of usage in the Internet is
unknown at this time.
Submitter:
Document editors.
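The ordering step performed by such an enhanced DNS server can be sketched in Python. The per-address metric (for example a measured round-trip time, lower being better) is a hypothetical input; how a server obtains it is outside the scope of this sketch.

```python
from typing import Dict, List


def order_addresses(addresses: List[str], metric: Dict[str, float]) -> List[str]:
    """Return replica addresses ordered from most to least optimal.

    `metric` is a hypothetical policy score per address (lower is better).
    """
    # Addresses without a score are treated as least optimal; sorted() is stable,
    # so they keep their configured order at the end of the list.
    return sorted(addresses, key=lambda addr: metric.get(addr, float("inf")))


answer = order_addresses(
    ["192.0.2.1", "198.51.100.7", "203.0.113.9"],
    {"198.51.100.7": 20.0, "192.0.2.1": 85.0},
)
# answer == ["198.51.100.7", "192.0.2.1", "203.0.113.9"]
```

A client that tries the addresses in the order returned will therefore contact the most optimal replica first.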
5. Inter-Replica Communication
This section describes the cooperation and communication between
replica origin servers. It is used in replicating data sets between
origin servers.
5.1 Batch Driven Mirror Replication
Authoritative reference:
This memo.
Description:
In this model, the replica web server to be updated
initiates communication with a master origin web server.
The communication is established at intervals based upon
queued transactions which are scheduled for deferred
processing. The scheduling mechanism policies vary, but
generally recur at a specified time. Once
communication is established, data sets are copied to the
initiating replica web server.
Security:
Relies upon the protocol being used to transfer the data
set. FTP and RDIST are the most common protocols observed.
Deployment:
Very common for mirror synchronization in the Internet.
Submitter:
Document editors.
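At each scheduled run, the initiating replica has to decide which parts of the data set to copy. The Python sketch below compares a manifest of sizes and modification times taken from the master against the replica's own. The manifest format is an assumption made for illustration; FTP- and rdist-based mirroring tools each have their own way of making this comparison.

```python
from typing import Dict, List, Tuple

# path -> (size in bytes, modification time); a hypothetical manifest format.
Manifest = Dict[str, Tuple[int, float]]


def plan_sync(master: Manifest, replica: Manifest) -> Tuple[List[str], List[str]]:
    """Return (paths to copy from the master, paths to delete on the replica)."""
    # Copy anything that is new on the master or whose size or time differs.
    to_copy = sorted(path for path, meta in master.items() if replica.get(path) != meta)
    # Remove anything the master no longer has, so the mirror stays exact.
    to_delete = sorted(path for path in replica if path not in master)
    return to_copy, to_delete
```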
5.2 Demand Driven Mirror Replication
Authoritative reference:
This memo.
Description:
In this model, the replica web server acquires the content
as needed due to demand. This is generally done by web
server accelerators (reverse proxy) operating as origin
server replicas. When a web client requests a URL that is
not in the data set of the replica origin server, the
replica server attempts to acquire it from a master origin
server and forwards it on to the requesting web client.
Security:
Relies upon the protocol being used to transfer the URLs.
FTP, Gopher, HTTP and ICP are the most common protocols
observed.
Deployment:
Observed at several large web sites. Extent of usage in
the Internet is unknown at this time.
Submitter:
Document editors.
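A minimal Python sketch of demand-driven acquisition follows. The `fetch_from_master` callable is a hypothetical stand-in for whichever transfer protocol is used (for example an HTTP GET to the master origin server), and the sketch deliberately omits the freshness and expiry handling a real accelerator would apply.

```python
from typing import Callable, Dict


class DemandDrivenReplica:
    """Replica that copies an object from the master only when a client asks for it."""

    def __init__(self, fetch_from_master: Callable[[str], bytes]) -> None:
        # Hypothetical transfer function supplied by the caller.
        self._fetch = fetch_from_master
        self._data: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        if url not in self._data:
            # Not yet in the replica's data set: acquire it from the master.
            self._data[url] = self._fetch(url)
        # Serve the (now local) copy to the requesting client.
        return self._data[url]
```

Only the first request for a URL reaches the master; later requests are answered from the replica's copy.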
5.3 Synchronized Replication
Authoritative reference:
This memo. [Ed note; there is no IETF protocol specified at
this time. The editors are aware of at least
two open source protocols, AFS and CODA, along
with one expired IETF draft
<draft-leach-cifs-v1-spec-01.txt> and one
proprietary protocol Novell NRS; none of which
can be considered an authoritative reference]
Description:
In this model, the replicated origin servers cooperate
using synchronized strategies and specialized replica
protocols to keep the replica data sets coherent.
Synchronization strategies range from tightly coherent (a
few minutes) to loosely coherent (a few or more hours).
Updates occur between replicas based upon the
synchronization time constraints of the coherency model
employed and are generally in the form of deltas only.
Security:
All of the known protocols utilize strong cryptographic key
exchange methods, which are either based upon the Kerberos
shared secret model or the public/private key RSA model.
Deployment:
Observed at a few sites, primarily at university campuses.
Submitter:
Document editors.
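The idea of exchanging deltas only can be sketched in Python at whole-object granularity: one replica computes what changed between two versions of its data set, and another applies just that change. The data-set representation is a hypothetical simplification; the protocols mentioned above may track changes at a finer grain and add ordering and conflict handling that this sketch omits.

```python
from typing import Dict, Optional

DataSet = Dict[str, bytes]
Delta = Dict[str, Optional[bytes]]


def make_delta(old: DataSet, new: DataSet) -> Delta:
    """Entries added or changed since `old` map to new content; removed entries map to None."""
    delta: Delta = {key: value for key, value in new.items() if old.get(key) != value}
    delta.update({key: None for key in old if key not in new})
    return delta


def apply_delta(data: DataSet, delta: Delta) -> DataSet:
    """Bring another replica's copy up to date using only the delta."""
    result = dict(data)
    for key, value in delta.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = value
    return result
```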
6. Client to Proxy Configuration
This section describes the configuration, cooperation and
communication between end user clients (browsers and applications) and
a proxy.
6.1 Manual Proxy Configuration
Authoritative reference:
This memo.
Description:
Each user needs to configure their web client by typing in
information pertaining to proxied protocols and local
policies.
Security:
The potential for misconfiguration is high, as each user
individually sets preferences.
Deployment:
Widely deployed, used in all current browsers. Most
browsers support other options as well.
Submitter:
Document editors.
6.2 Proxy Auto Configuration (PAC) [7]
[Ed note: Does it really need to be submitted for Informational RFC?]
Authoritative reference:
No RFC published, no Internet-Draft
Navigator Proxy Auto-Config File Format. Available from
http://home.netscape.com/eng/mozilla/2.0/
relnotes/demo/proxy-live.html
Description:
A JavaScript file on a web server hands out information on
where to find proxies. Clients need to be pointed at the URL
of this file. There is no bootstrap mechanism, so manual
configuration is necessary.
Manual configuration is made easier by centralizing the
script to one URL.
Security:
Common policy per organization possible. Does still require
manual configuration. PAC is better than "manual proxy
configuration" because with PAC administrators can update
the proxy configuration without user intervention.
Interoperability of PAC files is not as good as desired,
since the more popular browsers interpret the script
slightly differently, and this may lead to undesired
effects.
Deployment:
Implemented in most web clients.
Submitter:
Document editors.
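A PAC file is JavaScript that defines a function FindProxyForURL(url, host) returning a string such as "PROXY host:port; DIRECT". To keep the examples in one language, the sketch below mirrors that decision logic in Python; browsers run the JavaScript original, not this code. The local domain and proxy name are hypothetical.

```python
# Hypothetical site values; a real PAC file would hard-code its own.
LOCAL_DOMAIN = ".example.org"
PROXY = "proxy.example.org:3128"


def find_proxy_for_url(url: str, host: str) -> str:
    """Python analogue of a PAC script's FindProxyForURL(url, host) function."""
    # `url` is unused here, but a PAC script may also inspect it (e.g. the scheme).
    if "." not in host or host.endswith(LOCAL_DOMAIN):
        # Plain host names and hosts in the local domain are fetched directly.
        return "DIRECT"
    # Everything else goes through the proxy, falling back to a direct
    # connection if the proxy cannot be reached.
    return f"PROXY {PROXY}; DIRECT"
```

Because the script is fetched from one central URL, an administrator can change PROXY or the local-domain rule once and every configured client picks it up.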
6.3 Cache Array Routing Protocol (CARP) v1.0 [9]
[Ed note: Current draft expired. A new draft must be submitted and this
section completed for this protocol to be considered in the Taxonomy]
Authoritative reference:
Expired Internet-Draft draft-vinod-carp-v1-03.txt
Work in progress.
Description:
Clients may use CARP directly as a hash function based
proxy selection mechanism. They need to be configured with
the location of the cluster information.
Security:
Deployment:
Submitter:
6.4 Web Proxy Auto-Discovery Protocol (WPAD) [8]
Authoritative reference:
Internet Draft <draft-ietf-wrec-wpad-00.txt>
[Ed note; I-D submission anticipated by 6/25/99]
Work in progress.
Description:
WPAD uses a collection of pre-existing Internet resource
discovery mechanisms to perform web proxy auto-discovery.
The only goal of WPAD is to locate the PAC URL. WPAD does
not specify which proxies will be used. WPAD gets you to
the PAC URL, and the PAC script chooses the proxies for
you.
The WPAD protocol specifies the following:
+ how to use each mechanism for the specific purpose of
web proxy auto-discovery
+ the order in which the mechanisms should be performed
+ the minimal set of mechanisms which must be attempted
by a WPAD compliant web client
The resource discovery mechanisms utilized by WPAD are as
follows:
+ Dynamic Host Configuration Protocol (DHCP)
+ Service Location Protocol (SLP)
+ "Well Known Aliases" using DNS A records
+ DNS SRV records
+ "service: URLs" in DNS TXT records
Security:
Relies upon DNS and HTTP security.
Deployment:
Implemented in web clients and caching proxy servers. More
than two independent implementations.
Submitter:
Josh Cohen, Microsoft, joshco@microsoft.com
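For the "well known aliases" mechanism, a client looks up a host named "wpad" in its own domain and then in successively shorter parent domains. The Python sketch below only generates the candidate PAC URLs in that order. The "/wpad.dat" path and the rule of stopping before the top-level domain are assumptions for illustration, since the WPAD specification is still work in progress.

```python
from typing import List


def wpad_candidates(client_fqdn: str) -> List[str]:
    """Candidate PAC URLs for the DNS "well known alias" search, most specific first."""
    labels = client_fqdn.lower().rstrip(".").split(".")
    domain = labels[1:]  # drop the client's own host label
    urls: List[str] = []
    # Walk up the domain tree, stopping before the bare top-level domain.
    while len(domain) >= 2:
        urls.append("http://wpad." + ".".join(domain) + "/wpad.dat")
        domain = domain[1:]
    return urls


print(wpad_candidates("pc1.dept.example.com"))
# prints ['http://wpad.dept.example.com/wpad.dat', 'http://wpad.example.com/wpad.dat']
```

The first URL that resolves and returns a script is the PAC URL; from then on the PAC script, not WPAD, chooses the proxies.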
7. Inter-Cache Communication
[Ed note: INGRID. Review and chase submissions (push Duane)]
This section describes the cooperation and communication between
caching proxies.
7.1 Internet Cache Protocol (ICP) [10, 11, 12, 13, 14]
Authoritative reference:
RFC 2186 Internet Cache Protocol (ICP), version 2
Description:
ICP is used by caches to query other caches about web
objects, to see if a web object is present at the other
cache.
ICP uses UDP. Since UDP is unreliable, the rate at which ICP
messages are lost can be used to estimate network congestion
and availability. Together with round trip times, this
rudimentary loss measurement provides a load balancing
method for caches.
Security:
ICP does not convey information about HTTP headers
associated with a web object. HTTP headers may include
access control and cache directives. Since caches ask for
objects with ICP and then download them using HTTP, false
cache hits may occur (an object present in a cache but not
accessible to a sibling cache is one example).
ICP suffers from all the security problems of UDP.
Deployment:
Widely deployed. Most current cache implementations support
ICP in one form or another.
Submitter:
Document editors.
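The ICP version 2 message layout from RFC 2186 can be sketched with Python's struct module: a 20-byte header (opcode, version, message length, request number, options, option data, sender host address) followed by an opcode-specific payload. Sending the datagram over UDP and timing the reply are left out, and filling the address fields with zeros is a simplification made for this sketch.

```python
import socket
import struct
from typing import Tuple

# Opcodes and version from RFC 2186.
ICP_OP_QUERY = 1
ICP_OP_HIT = 2
ICP_OP_MISS = 3
ICP_VERSION = 2

# 20-byte header: opcode, version, message length, request number,
# options, option data, sender host address (network byte order).
HEADER = struct.Struct("!BBHIII4s")


def build_query(request_number: int, url: str) -> bytes:
    """Build an ICP_OP_QUERY asking a neighbour whether it holds `url`."""
    # Query payload: requester host address (zeros here), then the null-terminated URL.
    payload = socket.inet_aton("0.0.0.0") + url.encode("ascii") + b"\x00"
    length = HEADER.size + len(payload)
    header = HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length, request_number,
                         0, 0, socket.inet_aton("0.0.0.0"))
    return header + payload


def parse_reply(message: bytes) -> Tuple[int, int, str]:
    """Return (opcode, request number, URL) from an ICP reply such as HIT or MISS."""
    opcode, _version, length, request_number, _opts, _data, _sender = HEADER.unpack_from(message)
    # Reply payloads carry only the null-terminated URL.
    url = message[HEADER.size:length].rstrip(b"\x00").decode("ascii")
    return opcode, request_number, url
```

The request number lets a cache match each HIT or MISS to its query; queries that never get a reply are the losses used above as a congestion estimate.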
7.2 Hyper Text Caching Protocol (HTCP/0.0) [15]
[Ed note: Current draft expired. A new draft must be submitted for this
protocol to be considered in the Taxonomy. Based upon reviewers'
comments, the editors would like to drop this protocol from current
Taxonomy consideration, due to its experimental nature]
Authoritative reference:
Expired Internet Draft draft-vixie-htcp-proto-03.txt,
Work in Progress
Description:
HTCP is a protocol for discovering HTTP caches and cached
data, managing sets of HTTP caches, and monitoring cache
activity.
Unlike ICPv2, HTCP includes HTTP headers, which are vital
information for web proxy caches.
Security:
Optionally uses MD5 shared-secret authentication. When
authentication is not used, the protocol is subject to
attack.
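Shared-secret authentication of this kind can be illustrated with HMAC-MD5 from the Python standard library. This is not the HTCP wire format. The HTCP draft [15] defines which message fields the digest covers and how the result is carried, and neither is reproduced here. The function names and the sample message bytes are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, message: bytes) -> bytes:
    """Keyed MD5 digest (HMAC-MD5) of message under the shared secret."""
    return hmac.new(secret, message, hashlib.md5).digest()


def verify(secret: bytes, message: bytes, signature: bytes) -> bool:
    """Accept the message only if it was signed with the same secret."""
    # compare_digest runs in constant time, so timing does not reveal
    # how many leading bytes of the signature matched.
    return hmac.compare_digest(sign(secret, message), signature)
```

Without the optional signature, any host that can reach a cache could send it forged messages. This is the exposure the Security entry above refers to.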
Deployment:
Implemented in caching proxies (two independent
implementations).
Submitter:
Document editors.
7.3 Cache Array Routing Protocol (CARP) v1.0 [9]
[Ed note: Current draft expired. A new draft must be submitted and
this section completed for this protocol to be considered in the
Taxonomy]
Authoritative reference:
Work in Progress: Internet-Draft draft-vinod-carp-v1-03.txt
Description:
CARP defines a hashing function for dividing URL space among
a cluster of proxy caches. CARP also defines a Proxy Array
Membership Table and ways to download this information.
An HTTP client agent (either a proxy server or a client
browser) that implements CARP v1.0 can deterministically
route each request to the member of the Proxy Array that
is responsible for that URL. Because requests are
partitioned among the proxies in this way, duplication of
cache contents is eliminated and global cache hit rates may
be improved.
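The routing idea can be sketched as highest-score ("highest random weight") hashing. For each request, the client combines a hash of the URL with each array member's name and routes to the member with the highest score. The MD5-based score below is a stand-in, and it is not one of the hash functions that CARP v1.0 [9] defines. The sketch also omits CARP's per-member load factors. The function names and member host names are hypothetical.

```python
import hashlib


def _score(url: str, member: str) -> int:
    # Stand-in combined hash of (member, URL), truncated to 64 bits.
    # CARP v1.0 specifies its own hash functions; these are NOT them.
    digest = hashlib.md5(f"{member}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def route(url: str, members: list[str]) -> str:
    """Return the array member with the highest combined score for url."""
    return max(members, key=lambda m: _score(url, m))


if __name__ == "__main__":
    array = ["proxy1.example.com", "proxy2.example.com", "proxy3.example.com"]
    print(route("http://www.example.com/index.html", array))
```

Every client holding the same membership list computes the same answer, so each URL is cached on only one member. When a member leaves, only the URLs it owned move elsewhere. The rest of URL space keeps its previous assignment.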
Security:
Deployment:
Implemented in caching proxy servers. More than two
independent implementations.
Submitter:
7.4 Cache Digest [16]
[Ed note: Does it really need to be submitted for Informational RFC?]
Authoritative reference:
No RFC published, no Internet-Draft
Cache Digest specification
http://squid.nlanr.net/Squid/CacheDigest/cache-digest-v5.txt
Squid Digest FAQ entry
http://squid.nlanr.net/Squid/FAQ/FAQ-16.html
Description:
Cache Digests are a response to the problems of latency
and congestion associated with previous inter-cache
communication mechanisms such as the Internet Cache
Protocol (ICP) [10, 11] and the Hyper Text Caching Protocol
(HTCP) [15]. Unlike these protocols, Cache Digests support
peering between cache servers without a request-response
exchange taking place. Instead, a summary of the contents
of the server (the Digest) is fetched by other servers
which peer with it. Using Cache Digests, it is possible to
determine with a relatively high degree of accuracy whether
a given URL is cached by a particular server.
Cache Digests are both an exchange protocol and a data
format [16a,16b].
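The "relatively high degree of accuracy" follows from the data format, which is essentially a Bloom filter. The minimal sketch below derives four bit positions from an MD5 hash of a cache key. It is loosely modeled on the specification [16a]. The key string, the 5 bits-per-entry sizing and the class name are assumptions for illustration, and the sketch does not reproduce the exact Cache Digest encoding.

```python
import hashlib


class CacheDigest:
    """Minimal Bloom-filter sketch of a cache digest (hypothetical class)."""

    def __init__(self, capacity: int, bits_per_entry: int = 5):
        self.nbits = max(8, capacity * bits_per_entry)  # assumed sizing rule
        self.bits = bytearray((self.nbits + 7) // 8)

    def _positions(self, key: str):
        # Split the 16-byte MD5 of the key into four 32-bit values,
        # each reduced modulo the number of bits in the digest.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        for i in range(4):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.nbits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, key: str) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

A peer that has fetched this digest checks may_contain() locally before forwarding a request, so no per-request query is sent. A false positive costs one wasted HTTP request to the peer. The size of the digest trades memory and transfer volume against that false-positive rate.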
Security:
If the contents of a Digest are sensitive, the Digest should
be protected from unauthorized access. Any methods which
would normally be applied to secure an HTTP connection can
be applied to Cache Digests.
A 'Trojan horse' attack is currently possible in a cache
mesh: cache A can build a fake peer Digest for cache B and
serve it to B's peers if requested. In this way, A can
direct traffic toward or away from B. The impact of this
problem is minimized by the 'pull' model of transferring
Cache Digests from one server to another.
Cache Digests provide knowledge about peer cache content
on a URL level. Hence, they do not dictate a particular
level of policy management and can be used to implement
various policies on any level (user, organization, etc.).
Deployment:
Cache Digests are supported in Squid; several commercial
vendors are looking into Digest support.
Cache Meshes:
+ NLANR Mesh
+ TF-CACHE mesh (European Academic networks)
Submitter:
Alex Rousskov, NLANR, rousskov@nlanr.net
7.5 Cache Pre-filling [23]
Authoritative reference:
Internet Draft <draft-lovric-francetelecom-satellites-00.txt>
Work in progress.
Description:
Cache pre-filling is a push-caching implementation. It is
particularly well adapted to IP-multicast networks because
it allows preselected URLs to be inserted, in a single
transmission, into all the caches that belong to the
targeted multicast group. Different implementations of
cache pre-filling already exist, especially in satellite
contexts. However, there is still no standard for this
kind of push-caching, and vendors propose solutions based
either on dedicated equipment or on public-domain caches
extended with a pre-filling module.
Security:
Relies on the inter cache protocols being employed.
Deployment:
Observed in two commercial content distribution service
providers.
Submitter:
Ivan Lovric, France Telecom,
ivan.lovric@cnet.francetelecom.fr
8. Network Element Communication
This section describes the cooperation and communication between
caching proxies and network elements such as routers and switches.
Such communication is generally used for transparent caching and/or
diffused arrays.
8.1 Web Cache Coordination Protocol (WCCP)
Authoritative reference:
Internet Draft <draft-ietf-wrec-web-pro-00.txt> [18]
Work in progress.
Description:
WCCP V1 runs between a router functioning as a redirecting
network element and out-of-path transparent caching
proxies. The protocol allows one or more caching proxies
to register themselves with a single router to receive
redirected web traffic. It also allows one of the proxies,
the designated proxy, to dictate to the router how
redirected web traffic is distributed across the caching
proxies.
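The division of labor between the designated proxy and the router can be sketched as a bucket table. Here the designated proxy fills a fixed-size table of hash buckets with cache addresses, and the router hashes each destination address to a bucket and redirects to the cache that owns it. The 256-bucket size, the XOR-of-octets hash, the round-robin assignment policy and all names are assumptions for illustration. The draft [18] defines the actual WCCP V1 messages and hashing.

```python
import ipaddress

NUM_BUCKETS = 256  # assumed table size


def assign_buckets(caches: list[str]) -> list[str]:
    """Designated-proxy role: spread the buckets evenly over the caches."""
    ordered = sorted(caches)  # deterministic order for every participant
    return [ordered[i % len(ordered)] for i in range(NUM_BUCKETS)]


def bucket_for(dest_ip: str) -> int:
    """Router role: map a destination address to a bucket.

    Hypothetical hash (XOR of the four octets), not WCCP's own.
    """
    b = ipaddress.IPv4Address(dest_ip).packed
    return b[0] ^ b[1] ^ b[2] ^ b[3]


def redirect_target(table: list[str], dest_ip: str) -> str:
    """Cache to which the router redirects traffic for dest_ip."""
    return table[bucket_for(dest_ip)]
```

Keeping the assignment decision in one place, the designated proxy, means the router only needs a table lookup per redirected flow.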
Security:
WCCP V1 has no security features.
Deployment:
Network elements: WCCP V1 is deployed on a wide range of
Cisco routers.
Caching proxies: WCCP V1 is deployed on a number of
vendors' caches.
Submitter:
David Forster, Cisco, dforster@cisco.com
8.2 Transparent Proxy Agent Control Protocol (TPACT)
Authoritative reference: [Ed note: anticipated submission]
Internet Draft <draft-ietf-wrec-tpact-00.txt> [22] [Ed
note: I-D submission anticipated by 6/25/99]
Work in progress.
Description:
TPACT runs between a network element (router or switch)
functioning as a redirecting network element and
out-of-path transparent caching proxies. The protocol
allows one or more caching proxies to register themselves
with a single network element to receive redirected web
traffic. All of the participating caching proxies act as
a quorum in dictating how web traffic is distributed
across the group.
Security:
MD5 is optionally employed for authentication. Sequence
numbers are employed as security against replay attacks.
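Combining MD5 authentication with sequence numbers can be sketched as a receiver-side check. A message is accepted only if its keyed MD5 digest verifies, and only if its sequence number is higher than the last one accepted from that sender. HMAC-MD5 stands in here for "MD5 authentication". The class name, the 32-bit sequence field and the per-sender bookkeeping are assumptions, and the sketch does not reproduce the TPACT message layout in [22].

```python
import hashlib
import hmac
import struct


class ReplayGuard:
    """Hypothetical receiver: authenticate, then reject stale sequence numbers."""

    def __init__(self, secret: bytes):
        self.secret = secret
        self.last_seq: dict[str, int] = {}  # highest accepted seq per sender

    def _mac(self, seq: int, body: bytes) -> bytes:
        # The sequence number is covered by the MAC so it cannot be altered.
        return hmac.new(self.secret, struct.pack("!I", seq) + body, hashlib.md5).digest()

    def seal(self, seq: int, body: bytes) -> bytes:
        """Digest a sender attaches to (seq, body)."""
        return self._mac(seq, body)

    def accept(self, sender: str, seq: int, body: bytes, mac: bytes) -> bool:
        if not hmac.compare_digest(self._mac(seq, body), mac):
            return False  # forged or corrupted message
        if seq <= self.last_seq.get(sender, -1):
            return False  # replayed or out-of-date message
        self.last_seq[sender] = seq
        return True
```

Authentication alone would let an attacker resend a captured, validly signed message. The monotonic sequence check is what turns such a replay into a rejected message.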
Deployment:
Network elements: TPACT is prototyped and being evaluated
on multiple vendors' L4 switches.
Caching proxies: TPACT is prototyped and being evaluated
on multiple vendors' caches.
Submitter:
John Martin, Network Appliance, jmartin@netapp.com
8.3 SOCKS [19]
Authoritative reference:
RFC1928 SOCKS Protocol Version 5
Description:
SOCKS is primarily used as a protocol between a proxy cache
and a firewall. Although firewalls do not fit the narrow
definition of network elements as routers and switches,
they are an integral part of the network infrastructure.
When used in conjunction with a firewall, SOCKS provides an
authenticated tunnel between the proxy cache and the
firewall.
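The opening of the proxy-cache-to-firewall exchange can be shown concretely. The sketch below encodes the client's method-selection greeting and a CONNECT request using the byte layouts given in RFC 1928 [19]. The method-specific authentication sub-negotiation and the firewall's replies are omitted. The function names are hypothetical.

```python
import ipaddress
import struct

SOCKS_VERSION = 5
CMD_CONNECT = 0x01
ATYP_IPV4 = 0x01
ATYP_DOMAIN = 0x03
METHOD_NO_AUTH = 0x00
METHOD_GSSAPI = 0x01
METHOD_USERPASS = 0x02


def greeting(methods: list[int]) -> bytes:
    """Method-selection message: VER | NMETHODS | METHODS..."""
    return bytes([SOCKS_VERSION, len(methods), *methods])


def connect_request(host: str, port: int) -> bytes:
    """Request: VER | CMD | RSV | ATYP | DST.ADDR | DST.PORT."""
    try:
        atyp, addr = ATYP_IPV4, ipaddress.IPv4Address(host).packed
    except ipaddress.AddressValueError:
        # Not an IPv4 literal: send a length-prefixed domain name.
        name = host.encode("idna")
        atyp, addr = ATYP_DOMAIN, bytes([len(name)]) + name
    return bytes([SOCKS_VERSION, CMD_CONNECT, 0x00, atyp]) + addr + struct.pack("!H", port)
```

The firewall chooses one of the offered methods, such as GSSAPI or username/password. After that sub-negotiation succeeds, the CONNECT request asks it to open the onward connection on the proxy cache's behalf.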
Security:
An extensive framework provides for multiple authentication
methods. Currently, SSL, CHAP, DES and 3DES are known to be
available.
Deployment:
SOCKS has been widely deployed in the Internet.
Submitter:
Document editors.
9. Security Considerations
[Ed note: INGRID. Send to list, more information needed]
Information on security in each protocol is provided in the
description of the protocol, and in the accompanying RFC for each
protocol.
Refer to Section 15 of HTTP/1.1bis, draft-ietf-http-v11-spec-rev-06.txt.
Man-in-the-middle attacks
Refer to HTTP/1.1bis, Section 15.7.
HTTP proxies are men-in-the-middle, and therefore the perfect place
for a man-in-the-middle attack.
Denial of service
Individual protocols
See documentation for each protocol for discussion of security
issues.
Trusted parties
Users need to be able to trust their proxy.
Poor configuration
It is quite easy to configure a proxy in a way that harms service
for end users.
Privacy
Logs from proxies need to be kept secure, as they provide information
about users and their usage patterns. A proxy log is even more
sensitive than a web server log, as all requests from the user
population go through the proxy. Logs from replication servers may
need to be amalgamated to obtain aggregated statistics for a service;
transporting logs across national borders may have legal implications.
Log handling is restricted by law in some countries.
Requirements for object security and privacy are the same in a web
replication and caching system as they are in the Internet at large.
The only reliable solution is strong cryptography. End-to-end
encryption may prevent objects from being cached, as is the case
with SSL-encrypted web sessions.
Communication
Transient copies
Legislatures around the world have not yet settled whether transient
copies, such as those kept in replication and caching systems, are
legal. The legal implications of replication and caching are subject
to local law.
10. Acknowledgements
[Ed note: No decision made on authors list. Submitters of individual
entries are acknowledged in the text. Need to sort out how to give
credits where they are due.]
David Forster, Cisco, dforster@cisco.com, provided information on
out-of-path transparent caching proxies.
Alex Rousskov, David Forster, Josh Cohen and John Martin for protocol
information.
John Dilley, Ivan Lovric and Joe Touch for terminology and taxonomy
information.
David Forster, Josh Cohen, Henrik Nordstrom and Patrick McManus for
their help in defining proxy transparency.
11. References
[1] Duane Wessels. Squid FAQ: Transparent Caching/Proxying.
National Laboratory for Applied Network Research. Available from:
http://squid.nlanr.net/Squid/FAQ/FAQ-17.html
[2] Peter Danzig and Karl L. Swartz. Transparent, Scalable,
Fail-Safe Web Caching. Network Appliance, Inc. Available from
http://www.netapp.com/technology/level3/3033.html
[3] Bert Williams. Transparent Web Caching Solutions. Alteon
Networks.
[4] Tony Hain. Architectural Implications of NAT. Internet
Architecture Board. Internet Draft (Work in Progress). Available from
ftp://ftp.nordu.net/internet-drafts/draft-iab-nat-implications-02.txt
[5] Ingrid Melve, Lars Slettjord, Ton Verschuren, Henny Bekker,
Technical report European Union RE1004-M4.3 "Web caching
architecture"
[6] Fielding, et al. Hypertext Transfer Protocol -- HTTP/1.1. IETF
RFC2616. Available from http://www.rfc-editor.org/rfc/rfc2616.txt
[7] Netscape, Inc. Navigator Proxy Auto-Config File Format.
Available from
http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html
[8] Paul Gauthier, J. Cohen, Martin Dunsmuir and Charles Perkins.
The Web Proxy Auto-Discovery Protocol. Internet Draft. Available from
http://www.ietf.org/internet-drafts/draft-ietf-wrec-wpad-00.txt
[9] Vinod Valloppillil and Keith W. Ross. Cache Array Routing
Protocol. Internet Draft (Work in Progress). Available from
ftp://ftp.nordu.net/internet-drafts/draft-vinod-carp-v1-03.txt
[10] D. Wessels and K. Claffy. Internet Cache Protocol (ICP), version
2, RFC2186. Available from ftp://ftp.nordu.net/rfc/rfc2186.txt
[11] D. Wessels and K. Claffy. Application of Internet Cache Protocol
(ICP), version 2, RFC2187. Available from
ftp://ftp.nordu.net/rfc/rfc2187.txt
[12] Ivan Lovric. Internet Cache Protocol Extension. Internet Draft
(Work in Progress). Available from
ftp://ftp.nordu.net/internet-drafts/draft-lovric-icp-ext-01.txt
[13] Duane Wessels. ICP Home Page. National Laboratory for Applied
Network Research. Available from http://ircache.nlanr.net/Cache/ICP/
[14] University of Southern California. Internet Cache Protocol
Specification 1.4. Available from
http://excalibur.usc.edu/icpdoc/icp.html
[15] Paul Vixie and Duane Wessels. Hyper Text Caching Protocol
(HTCP/0.0). Internet Draft (Work in Progress). Available from
ftp://ftp.nordu.net/internet-drafts/draft-vixie-htcp-proto-03.txt
[16] Alex Rousskov and Duane Wessels. Cache Digests. National
Laboratory for Applied Network Research. Available from:
[16a] Cache Digest specification
http://squid.nlanr.net/Squid/CacheDigest/cache-digest-v5.txt
[16b] Squid Digest FAQ entry
http://squid.nlanr.net/Squid/FAQ/FAQ-16.html
[17] Berners-Lee, et al. Hypertext Transfer Protocol -- HTTP/1.0. IETF
RFC1945. Available from http://www.rfc-editor.org/rfc/rfc1945.txt
[18] Cisco Web Cache Coordination Protocol V1.0. Internet Draft.
Available from
http://www.ietf.org/internet-drafts/draft-ietf-wrec-web-pro-00.txt
[19] Leech, et al. SOCKS Protocol Version 5, RFC1928. Available from
http://www.rfc-editor.org/rfc/rfc1928.txt
[20] Keith Moore. On the use of HTTP as a Substrate for Other
Protocols. Internet Draft (Work in Progress). Available from
ftp://ftp.nordu.net/internet-drafts/draft-iesg-using-http-00.txt
[21] Brisco, T. DNS Support for Load Balancing. RFC1794. Available
from http://www.rfc-editor.org/rfc/rfc1794.txt
[22] Cerpa, et al. Transparent Proxy Agent Control Protocol.
Internet Draft. Available from
http://www.ietf.org/internet-drafts/draft-ietf-wrec-tpact-00.txt
[23] Goutard, et al. Pre-filling a cache - A satellite overview.
Internet Draft. Available from
http://www.ietf.org/internet-drafts/draft-lovric-francetelecom-satellites-00.txt
12. Authors' Addresses
Ingrid Melve
UNINETT
Tempeveien 22, Trondheim, NORWAY
Phone: +47 73 55 79 07
Email: Ingrid.Melve@uninett.no
Gary Tomlinson
Novell, Inc.
122 East 1700 South
Provo, Utah 84606 USA
Phone: +1 801 861 7021
Email: garyt@novell.com
Ian Cooper
Mirror Image Internet, Inc.
18 Commerce Way, Suite 4800
Woburn, MA 01801 USA
Phone: +1 800 353 2923
Email: ian@mirror-image.com