Network Working Group Jeffrey Mogul, Compaq WRL,
Internet-Draft Fred Douglis, AT&T,
Expires: 25 February 2001 Daniel Hellerstein, ERS/USDA
24 August 2000
HTTP Delta Clusters and Templates
draft-mogul-http-dcluster-00.txt
STATUS OF THIS MEMO
This document is an Internet-Draft and is in full
conformance with all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet
Engineering Task Force (IETF), its areas, and its working
groups. Note that other groups may also distribute working
documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by
other documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other
than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be
accessed at http://www.ietf.org/shadow.html.
Distribution of this document is unlimited. Please send
comments to the authors.
ABSTRACT
HTTP "Delta encoding," the transmission of a compact
encoding of the change between instances of a Web resource
instead of retransmitting the entire new value, has been
shown to be of potential value. Research has shown
additional benefits if deltas can be computed between
instances of different resources. This document describes
a compatible extension to HTTP delta encoding to support
"clustering", where multiple resources (URLs) are treated
as a pool, and the use of "templates", where a large set of
resource instances are most naturally described as deltas
from a chosen template resource.
Mogul et al. [Page 1]
Internet-Draft Delta clustering 24 August 2000 16:15
TABLE OF CONTENTS
1 Introduction
1.1 Related research and proposals
2 Terminology
3 Delta-encoding and clustering
4 Use of templates
5 Specification
5.1 Modified basic requirements for delta-encoded responses
5.2 Modified header specifications
5.2.1 A-IM
5.3 New header specifications
5.3.1 DCluster
5.3.2 DTemplate
5.4 Rules for determining base instances in a uniqueness scope
6 Security Considerations
6.1 Spoofing attacks using the DCluster header
6.2 Privacy attacks using the DCluster header
6.3 Data leakage attacks using the DCluster header
7 History
7.1 draft-mogul-http-dcluster-00.txt
8 Acknowledgements
9 References
10 Authors' addresses
1 Introduction
WARNING: THIS SPECIFICATION WILL CHANGE. DO NOT DEPLOY
ANY IMPLEMENTATIONS BASED ON THIS SPECIFICATION.
The World Wide Web is a distributed system, and so often benefits
from caching to reduce retrieval delays. Retrieval of a Web resource
(such as a document, image, icon, or applet) over the Internet or other
wide-area network usually takes enough time that the delay is over
the human threshold of perception. Often, that delay is measured in
seconds. Caching can often eliminate or significantly reduce
retrieval delays.
Many Web resources change over time, so a practical caching approach
must include a coherency mechanism, to avoid presenting stale
information to the user. Originally, the Hypertext Transfer Protocol
(HTTP) provided little support for caching, but under operational
pressures, it quickly evolved to support a simple mechanism for
maintaining cache coherency.
In HTTP/1.0 [2], the server may supply a ``last-modified'' timestamp
with a response. If a client stores this response in a cache entry,
and then later wishes to re-use the response, it may transmit a
request message with an ``If-modified-since'' field containing that
timestamp; this is known as a conditional retrieval. Upon receiving
a conditional request, the server may either reply with a full
response, or, if the resource has not changed, it may send an
abbreviated reply, indicating that the client's cache entry is still
valid. HTTP/1.0 also includes a means for the server to indicate,
via an ``Expires'' timestamp, that a response will be valid until
that time; if so, a client may use a cached copy of the response
until that time, without first validating it using a conditional
retrieval.
HTTP/1.1 [6] adds many new features to improve cache coherency and
performance. However, it preserves the all-or-none model for
responses to conditional retrievals: either the server indicates that
the resource value has not changed at all, or it must transmit the
entire current value.
Common sense suggests (and traces confirm), however, that even when a
Web resource does change, the new instance is often substantially
similar to the old one. If the difference, or ``delta'', between the
two instances could be sent to the client instead of the entire new
instance, a client holding a cached copy of the old instance could
apply the delta to construct the new version. In a world of finite
bandwidth, the reduction in response size and delay could be
significant.
One can think of deltas as a way to squeeze as much benefit as
possible from client and proxy caches. Rather than treating an
entire response as the ``cache line,'' with deltas we can treat
arbitrary pieces of a cached response as the replaceable unit, and
avoid transferring pieces that have not changed.
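To make the idea concrete, the following is a minimal sketch of a
copy/insert delta. It assumes Python's standard difflib module as the
differencing engine and uses a toy operation list; it is not vcdiff or
any other delta format defined in [8]. The names make_delta,
apply_delta, and literal_size are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe `new` as copies from `old` plus literal inserts (toy format)."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse cached text old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # text the client does not hold
        # "delete" needs no operation: the old text is simply not copied
    return ops


def apply_delta(old, ops):
    """Rebuild the new instance from the cached old instance and the delta."""
    pieces = []
    for op in ops:
        if op[0] == "copy":
            pieces.append(old[op[1]:op[2]])
        else:
            pieces.append(op[1])
    return "".join(pieces)


def literal_size(ops):
    """Count the characters that must actually be transferred as literals."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")
```

A client holding the old instance needs only the operation list; the
literal text in that list is the part of the response that was not
already in its cache.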
A separate document [8] specifies a set of compatible extensions to
HTTP/1.1 that allow clients and servers to use delta encoding with
minimal overhead. That mechanism only supports deltas between
instances of a single resource.
This document specifies further extensions to the delta encoding
mechanism. These extensions allow deltas to be computed between
instances of different resources. This increases the likelihood that
a compact delta might be found to encode the current instance of a
requested resource.
We assume that the reader is familiar with the HTTP/1.1
specification, and with the delta encoding specification.
1.1 Related research and proposals
The WebExpress project [7] appears to be the first published
description of an implementation of delta encoding for HTTP (which
they call ``differencing''). WebExpress is aimed specifically at
wireless environments, and includes a number of orthogonal
optimizations. Also, the WebExpress design does not propose changing
the HTTP protocol itself, but rather uses a pair of interposed
proxies to convert the HTTP message stream into an optimized form.
The results reported for WebExpress differencing are impressive, but
are limited to a few selected benchmarks.
The WebExpress paper also pointed out that in many cases, the
individual responses to different queries with the same ``URL
prefix'' (that is, the prefix of the URL before the ``?'' character)
are often similar enough to make delta encoding effective. Since
users frequently make numerous different queries using the same URL
prefix, it might be much more effective to compute deltas between
different queries for a given URL prefix, rather than simply between
different queries using an identical URL. Banga et al. [1] make a
similar observation. A 1997 trace-based study [9] showed that this
approach has significant potential for improving the bandwidth
requirements. The "clustering" mechanism described in this
specification is intended to support the use of delta encoding in
contexts where the delta is computed between two different URLs.
The WebExpress project [7] adopted the concept of a designated ``base
object'', rather than simply relying on a prefix-matching mechanism.
WebExpress included a mechanism for ``rebasing'' a client (providing
it with a new base object). The "templates" mechanism described in
this specification supports a very similar approach.
The approaches described above, and in this specification, operate
independently of the syntax and semantics of the data being transferred
(although delta encoding algorithms for images may require some
specialization). They function by decomposing responses at the bit
or byte level into currently-cached and need-to-be-transferred
components. One can also do this decomposition at a higher level.
Douglis et al. [5] describe an "HTML macro" mechanism, in which a set
of similar HTML pages is decomposed into a constant component (akin
to a macro body) and a variable component (akin to macro arguments).
In many cases, the variable component can be quite small; this means
once the constant component is in a cache, references to similar
pages require fetching only the small variable component, at a
significant cost savings over transferring a monolithic response.
The main drawback to the HTML macro approach is that it requires
direct involvement by the designer (or software) when generating the
Web pages, including some careful attention to the decomposition of a
set of similar pages. It might also require some additional
language-level standardization, although this perhaps could be
obviated through the use of Java-based macros. Therefore, support
for HTML macros is beyond the scope of this specification.
2 Terminology
HTTP/1.1 [6] defines the following terms:
resource A network data object or service that can be
identified by a URI, as defined in section 3.2.
Resources may be available in multiple
representations (e.g. multiple languages, data
formats, size, resolutions) or vary in other ways.
entity The information transferred as the payload of a
request or response. An entity consists of
metainformation in the form of entity-header fields
and content in the form of an entity-body, as
described in section 7.
variant A resource may have one, or more than one,
representation(s) associated with it at any given
instant. Each of these representations is termed a
`variant.' Use of the term `variant' does not
necessarily imply that the resource is subject to
content negotiation.
The specification for delta encoding [8] defined these additional
terms:
instance The entity that would be returned in a status-200
response to a GET request, at the current time, for
the selected variant of the specified resource, with
the application of zero or more content-codings, but
without the application of any instance manipulations
or transfer-codings.
instance manipulation
An operation on one or more instances which may
result in an instance being conveyed from server to
client in parts, or in more than one response
message. For example, a range selection or a delta
encoding. Instance manipulations are end-to-end, and
often involve the use of a cache at the client.
See that specification for further discussion of those terms.
For the extensions specified in this document, we define one more
term:
uniqueness scope
The uniqueness scope of an entity tag is the set of
resources across which this entity tag is unique for
all time. That is, within this set of resources, if
two instances share an entity tag, then the values of
these instances (including their instance bodies and
their instance headers) are equal.
In unmodified HTTP/1.1, the uniqueness scope of an entity tag is
always a single resource. In this proposal, we provide a means to
extend the uniqueness scope to include multiple resources.
3 Delta-encoding and clustering
The basic delta-encoding model assumes that deltas are computed
between two instances of a specific resource; i.e., both instances are
associated with a single URL. However, the WebExpress project [7]
suggested that by treating a query URL (that is, a URL with an
embedded ``?'') as a prefix followed by a set of parameters, one
could then profitably compute deltas between resource values whose
URLs have identical prefixes, but perhaps different parameters
(suffixes). Our trace-based study confirmed this [10]. We believe
that this might be generalized to certain other patterns of URLs
(i.e., not just those using ``?'' as a separator). We use the term
``clustering'' for this approach.
For example, if a client has cached a response for a DEC stock quote
(``http://quote.yahoo.com/q?s=DEC&d=f''), and then requests a quote
for AT&T from the same server (``http://quote.yahoo.com/q?s=T&d=f''),
the prefix for the cluster would be ``http://quote.yahoo.com/q?''.
In order to support clustering, we need a mechanism for the server to
indicate to the client which URLs are eligible for clustering (since
it would be highly inefficient for the client to send the entity tags
of every resource in its cache on every request).
We propose a new, optional response header for this purpose, to
specify a URL-prefix for other resources that ``cluster'' with the
given response. The header name is ``DCluster''.
Once a cluster-eligible response is cached, when the client is about
to make a subsequent request, it would match the request-URI against
all of the URL-prefixes in its cache. (As specified in section
5.3.1, only cache entries received after the matching DCluster header
are eligible.) The ``If-None-Match'' field in its request could then
list the entity tags for all of the matching entries. In some cases,
it might be more efficient to list only a subset (such as the most
recently received cache entries), to avoid excessive request header
lengths.
For example, if a client makes this initial request:
GET /foo?p=1 HTTP/1.1
Host: bar.example.net
and receives this response:
HTTP/1.1 200 OK
Date: Sun, 06 Nov 1994 08:49:37 GMT
Etag: "abc"
DCluster: "//bar.example.net/foo?"
then when the client later makes a request for
``http://bar.example.net/foo?p=2'', it can match the stored cluster
prefix in its cache, and generate this request:
GET /foo?p=2 HTTP/1.1
Host: bar.example.net
If-None-Match: "abc"
A-IM: vcdiff
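The client-side matching step in this example can be sketched as
follows. This is an illustration only: the cache layout (a list of
dictionaries with "uri", "etag", and "prefixes" keys) and the function
names are hypothetical, and the stored prefixes are assumed to have
already been resolved to absolute form. The sketch also ignores the
ordering and spoofing restrictions imposed in sections 5.3.1 and 5.4.

```python
from urllib.parse import urlsplit


def matching_etags(cache, request_uri):
    """Collect entity tags of cache entries that cluster with request_uri."""
    tags = []
    for entry in cache:
        same_uri = entry["uri"] == request_uri
        in_cluster = any(request_uri.startswith(p) for p in entry["prefixes"])
        if (same_uri or in_cluster) and entry["etag"] not in tags:
            tags.append(entry["etag"])
    return tags


def build_request_headers(cache, request_uri):
    """Build headers for a GET, asking for a delta when a base is cached."""
    headers = {"Host": urlsplit(request_uri).netloc}
    tags = matching_etags(cache, request_uri)
    if tags:
        headers["If-None-Match"] = ", ".join(tags)
        headers["A-IM"] = "vcdiff"  # the delta-coding used in this document's examples
    return headers
```

With a cache holding the response shown above, a request for
http://bar.example.net/foo?p=2 yields the If-None-Match and A-IM
fields of the example request, while a request outside the cluster
carries neither.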
As a generalization, the DCluster header field may include multiple
URL-prefixes, to allow specification of a set of URIs that do not
share a single common prefix.
In order to use this approach to clustering, we need to impose one
important constraint. HTTP/1.1 requires so-called ``strong'' entity
tags to be unique for a given URI, but does not impose any broader
requirements on the uniqueness of entity tags. However, if a server
sends a ``DCluster'' header, this implies that the entity tag in the
response is unique not only for the Request-URI, but also for all
URIs for which the string given by ``DCluster'' is a prefix.
We call this set of URIs the ``uniqueness scope'' of the entity tag.
Note that a response might carry multiple ``DCluster'' header fields
(or, by the basic HTTP syntax rules, one such header field with a
comma-separated list of prefix strings). This means that the
uniqueness scope is the union of the scopes specified by the set of
prefixes, plus the original Request-URI. Because the URI in a
``DCluster'' header field can be an absolute URI (i.e., contain a
host name), a uniqueness scope can span multiple servers.
Presumably, these servers have some out-of-band means to maintain the
uniqueness property.
A client making a request may have cache entries for many different
resources in the uniqueness scope of the Request-URI. This is
another situation where the ability of ``If-None-Match'' to carry
multiple entity tags is employed. Abstractly, when the client makes
a request for which it wants a delta-encoded response, it finds all
of its cache entries in the same uniqueness scope, then sends the
entity tags for these cache entries in an ``If-None-Match'' header.
It would not make sense to have an extremely broad uniqueness scope
(i.e., one that includes large numbers of resources), because this
would imply that a client that has cache entries for many of those
files would send lots of entity-tags in its request for a delta.
This would bloat the request message, obviating the transfer-time
reduction of the delta encoding. Therefore, in actual use, the
``DCluster'' header field value should represent not the entire
uniqueness scope, but a subset of the uniqueness scope that is most
likely to result in small deltas.
Client implementations, however, should be prepared to prune their
``If-None-Match'' headers in case a server inadvertently (or
maliciously) specifies an over-broad uniqueness scope.
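One possible pruning policy, sketched under stated assumptions, keeps
the most recently received entity tags until either a tag count or a
byte budget is exhausted. The limits of 8 tags and 256 bytes are
arbitrary illustrative values, not recommendations of this
specification, and prune_etags is a hypothetical name.

```python
def prune_etags(entries, max_tags=8, max_header_bytes=256):
    """Return an If-None-Match value limited in tag count and length.

    entries: iterable of (etag, received_at) pairs, where a larger
    received_at means a more recently received cache entry.
    """
    kept = []
    used = 0
    for etag, _received_at in sorted(entries, key=lambda e: e[1], reverse=True):
        cost = len(etag) + (2 if kept else 0)  # account for the ", " separator
        if len(kept) >= max_tags or used + cost > max_header_bytes:
            break
        kept.append(etag)
        used += cost
    return ", ".join(kept)
```

Preferring recent entries follows the suggestion made earlier in this
section; other orderings (for example, by expected delta size) would
fit the same structure.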
Server implementations that support clustering should minimize the
length of the entity tags that they generate, consistent with the
other requirements for entity tags, since the effect of overlong
entity tags on request-header size is potentially multiplied many
times by the use of clustering.
Note that the ``DCluster'' header can be used in a potential spoofing
attack. This attack, and defenses against it, are discussed in
section 6.1.
4 Use of templates
The model of delta encoding outlined so far requires the server to
compute a delta between the current instance of the resource and some
previous instance of that resource, or (if clustering is used) a
previous instance of some other resource. This means that the base
instance is, in effect, a moving target, since we do not want to
require servers or clients to retain old instances for indefinite
periods.
Douglis et al. describe an approach to dynamically-generated
documents in which the document is broken down into separate static
and dynamic parts [5]. The static part is a macro with unbound
variables, and the dynamic part is a set of bindings between
variables and specific values. In their mechanism, the client
retains the static part, called a ``template'', in its cache. It
repeatedly requests, as needed, a new instance of the dynamic part,
and then reevaluates the template macro, with its variables bound as
specified in the dynamic part, in order to generate the current
instance of the entire document. Their macro language is an
extension to HTML, although other languages (such as Java) might be
just as suitable.
The WebExpress project [7] adopted the concept of a designated ``base
object'', which is nearly identical to the template concept described
here. WebExpress included a mechanism for ``rebasing'' a client
(providing it with a new base object). The primary difference
between the WebExpress approach and our approach is the time at which
a client discovers the identity of a (possibly new) template.
We can apply a similar template-based mechanism to substantially
simplify the use of delta encoding. In this approach, the server
``computes'' the delta between the current instance of a resource,
and a separately-identified template resource. (Depending on the
encoding format, it might be possible to generate the delta directly,
rather than generating the current instance and then computing a
delta.) The client then applies the delta to the template resource,
rather than to a previous instance of the requested resource.
Since this approach avoids the need to retain old instances of the
dynamic resource at either the client or the server, it greatly
simplifies the implementation and optimization of base instance
management at both client and server. However, it requires a new
mechanism to inform the client of the appropriate template resource,
and its success may depend on the proper construction of the
template.
To support template-based deltas, therefore, we define a new response
header that the origin server uses as a ``hint'' to inform a client
of the URI of the template resource. For example, if the client
request is
GET /foo.html HTTP/1.1
Host: bar.example.net
A-IM: vcdiff
the server might send:
HTTP/1.1 200 OK
Date: Sun, 06 Nov 1994 08:49:37 GMT
Etag: "abc"
DTemplate: "http://bar.example.net/foo.tplt"
The implication of the DTemplate header is that, on subsequent
requests for http://bar.example.net/foo.html, the client should ask
for a delta between http://bar.example.net/foo.tplt and the current
instance. This means, of course, that the client would first have
obtained and cached an instance of http://bar.example.net/foo.tplt.
The client might retrieve the template either on demand (i.e., just
before making the new request for foo.html), or during an otherwise
idle moment, or not at all (since the use of deltas is fully
optional).
The DTemplate header implies that the specified URL is within the
uniqueness scope of the Request-URI (or else it would not be
meaningful to ask for a delta between the template and the
Request-URI). For example, if the client requests the template:
GET /foo.tplt HTTP/1.1
Host: bar.example.net
and receives the response:
HTTP/1.1 200 OK
Date: Sun, 06 Nov 1994 08:49:47 GMT
Etag: "pqr"
then the client can make a subsequent request for foo.html as:
GET /foo.html HTTP/1.1
Host: bar.example.net
If-None-Match: "pqr"
A-IM: vcdiff
Alternatively, the DTemplate header field can be used to specify that
a specific instance of a resource (rather than any available
instance) be used as a template, by including an entity tag in the
header field. For example:
HTTP/1.1 200 OK
Date: Sun, 06 Nov 1994 08:49:37 GMT
Etag: "abc"
DTemplate: "http://bar.example.net/foo.tplt"/etag="pqr"
This form of the header further simplifies the instance-management
problem, by eliminating any ambiguity about which instances are worth
saving. It might, however, reduce the possibilities for delta
encoding.
Finally, the DTemplate and DCluster headers can be combined. For
example:
HTTP/1.1 200 OK
Date: Sun, 06 Nov 1994 08:49:37 GMT
Etag: "abc"
DTemplate: "http://bar.example.net/foo.tplt"
DCluster: "//bar.example.net/foo?"
This means that for any Request-URI matching the prefix specified in
the DCluster header field, the URI specified in the DTemplate field
is an appropriate template.
Note that an origin server need not send a DTemplate header field
on every response; doing so could waste network bandwidth if the
recipient is not delta-capable. Instead, the
server should employ heuristics to decide whether to send this header
field. For example, it might be worth sending it whenever the
client's request message indicates its willingness to accept a
delta-encoded response, and when the If-None-Match field in the
request does not already specify the entity-tag of the template
resource.
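The heuristic suggested in the preceding paragraph might look like the
sketch below. The set of delta-coding names is an assumption (this
document's examples use only vcdiff), request headers are assumed to
be available in a dictionary keyed by their canonical names, and the
comma splitting is a simplification that does not handle commas
inside quoted entity tags.

```python
# Assumed set of delta-coding names; only "vcdiff" appears in this document.
DELTA_CODINGS = {"vcdiff"}


def accepts_delta(a_im_value):
    """True if an A-IM field value lists at least one delta-coding."""
    for item in a_im_value.split(","):
        name = item.split(";")[0].strip().lower()  # drop any parameters
        if name in DELTA_CODINGS:
            return True
    return False


def should_send_dtemplate(request_headers, template_etag):
    """Send DTemplate only to delta-capable clients lacking the template."""
    if not accepts_delta(request_headers.get("A-IM", "")):
        return False
    offered = request_headers.get("If-None-Match", "")
    listed = [tag.strip() for tag in offered.split(",")]
    return template_etag not in listed
```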
5 Specification
The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
in this document are to be interpreted as described in RFC2119 [4].
5.1 Modified basic requirements for delta-encoded responses
The basic requirements for delta-encoded responses, specified in [8],
are modified for servers that support the DCluster and/or DTemplate
header fields.
A server MAY send a delta-encoded response if:
1. The server would be able to send a 200 (OK) response for
the request.
2. The client's request includes an A-IM header field listing
at least one delta-coding.
3. The client's request includes an If-None-Match header
field listing at least one valid entity tag for an
instance (a "base instance") of at least one of:
a. the Request-URI.
b. a different URI within the uniqueness scope of the
Request-URI.
c. a URI that matches a uri-prefix in a DTemplate
header field that was sent in a response for a URI
within the uniqueness scope of the Request-URI.
XXX Anything else?
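The three conditions above can be sketched as a single server-side
check. This is only an illustration: the function name and arguments
are hypothetical, the set of delta-coding names is an assumption, and
base_etags stands for whatever set of entity tags the server still
holds for instances covered by cases (a) through (c).

```python
# Assumed set of delta-coding names; only "vcdiff" appears in this document.
DELTA_CODINGS = {"vcdiff"}


def usable_base_etags(can_send_200, a_im, if_none_match, base_etags):
    """Return the offered entity tags the server could use as a base.

    An empty result means the server has no basis for a delta-encoded
    response under conditions 1 through 3.
    """
    if not can_send_200:                                   # condition 1
        return []
    codings = [c.split(";")[0].strip().lower() for c in a_im.split(",")]
    if not any(c in DELTA_CODINGS for c in codings):       # condition 2
        return []
    offered = [t.strip() for t in if_none_match.split(",") if t.strip()]
    return [t for t in offered if t in base_etags]         # condition 3
```

Even when the result is non-empty the server is not obliged to send a
delta, since the requirement is expressed with MAY.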
5.2 Modified header specifications
One of the headers defined in the specification for delta
encoding [8] has a slightly different meaning when delta clustering
or delta templates are used.
5.2.1 A-IM
When an A-IM request-header field includes one or more delta-coding
values, the request MUST contain an If-None-Match header field,
listing one or more entity tags from URIs in the uniqueness scope of
an entity tag from a prior response for the request-URI.
Section 5.4 defines rules that a client uses for determining the set
of base instances in the uniqueness scope of a request-URI.
5.3 New header specifications
The following headers are defined, for use as entity-headers. (Due
to the terminological confusion discussed in [8], some entity-headers
are more properly associated with instances than with entities.)
5.3.1 DCluster
The DCluster entity-header field is used in a response to specify a
subset of the uniqueness scope of the entity tag given in the Etag
header field of the response. The uniqueness scope is the set of
URIs across which this strong entity tag is guaranteed to be unique,
for all time. A uniqueness scope is specified by providing one or
more prefixes for other URIs in the set.
DCluster   = "DCluster" ":" #( <"> uri-prefix <"> )
uri-prefix = scheme ":" "//" host [ ":" port ] [ abs_path ]
           | abs_path
           | rel_path
If the uri-prefix is an abs_path or rel_path, the implied scheme is
the scheme used in the Request-URI. (Typically, the scheme would be
"http".) If the uri-prefix is an abs_path, it is interpreted
relative to the origin server host name. If the uri-prefix is a
rel_path, it is interpreted relative to the Request-URI.
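A client might parse and resolve the field value roughly as follows.
This is a minimal sketch with hypothetical function names. It assumes
that a prefix beginning with "//" (the form used in this document's
examples, which the grammar above does not strictly produce) takes the
scheme of the Request-URI, and it resolves a rel_path against the
directory portion of the Request-URI path.

```python
import re
from urllib.parse import urlsplit


def parse_dcluster(value):
    """Extract the quoted uri-prefix strings from a DCluster field value."""
    return re.findall(r'"([^"]*)"', value)


def resolve_prefix(prefix, request_uri):
    """Turn a uri-prefix into an absolute prefix string."""
    parts = urlsplit(request_uri)
    if "://" in prefix:                     # already absolute
        return prefix
    if prefix.startswith("//"):             # assumed: scheme is implied
        return f"{parts.scheme}:{prefix}"
    if prefix.startswith("/"):              # abs_path on the same server
        return f"{parts.scheme}://{parts.netloc}{prefix}"
    base_dir = parts.path.rsplit("/", 1)[0] + "/"
    return f"{parts.scheme}://{parts.netloc}{base_dir}{prefix}"  # rel_path
```

String concatenation is used deliberately instead of a general URI
resolver, so that a trailing "?" in a prefix is preserved exactly.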
The uniqueness scope of a strong entity tag in an ETag header field
always includes the Request-URI of the corresponding request, and the
union of all URIs matching one or more of the uri-prefix strings in
the DCluster header field of the response. It may include other URIs
not described in a DCluster header field. That is, the set of URIs
for which a uri-prefix in a DCluster header field is a prefix MUST be
a subset of the uniqueness scope, and MAY be a proper subset.
Generally, the DCluster header does not necessarily describe the
entire uniqueness scope of an entity tag. Rather, it describes a
subset of the uniqueness scope whose members are likely to differ by
small deltas.
A server SHOULD NOT include a uri-prefix in a DCluster header field
if the server is not likely to be able to generate deltas between the
Request-URI and the URIs matching that uri-prefix.
The uniqueness scope specified by a DCluster header is valid for use
by the client only for entity tags received in the same response or
in subsequent responses, never for entity tags received in previous
responses.
Section 5.4 defines rules that a client uses for determining the set
of base instances in the uniqueness scope of a request-URI.
5.3.2 DTemplate
The DTemplate entity-header field is used in a response to specify
another resource that the origin server prefers to use as the base
instance for computing deltas for the Request-URI, or for other
resources in the uniqueness scope specified by a DCluster header
field in the response.
DTemplate = "DTemplate" ":" #( <"> dt-uri <"> [ "/" dt-param ] )
dt-uri    = absoluteURI | abs_path
dt-param  = "etag" "=" entity-tag
If the dt-uri is an abs_path, it is interpreted relative to the
origin server host name.
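The field value can be parsed with a short regular expression, as in
the sketch below. The function name is hypothetical, and the pattern
assumes that neither the URI nor the entity tag contains an embedded
double-quote character.

```python
import re

# One list element: a quoted dt-uri, optionally followed by /etag=<entity-tag>.
_DT_ITEM = re.compile(r'"([^"]*)"(?:\s*/\s*etag\s*=\s*((?:W/)?"[^"]*"))?')


def parse_dtemplate(value):
    """Return (dt-uri, entity-tag or None) pairs from a DTemplate value."""
    return [(m.group(1), m.group(2)) for m in _DT_ITEM.finditer(value)]
```

An entity tag of None means that any available instance of the
template resource may serve as the base instance.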
A URI specified in a DTemplate header field is, by definition, in the
uniqueness scope of the Request-URI.
If a client has received a DTemplate header field within a given
uniqueness scope, the client SHOULD use an instance of the specified
template resource(s) as the base instance for any future delta
requests for other resources in the uniqueness scope.
If the DTemplate header field includes an entity tag with a URI, then
the client SHOULD use only the specified instance of the template
resource as the base instance for any future delta requests for other
resources in the uniqueness scope.
The URI specified by a DTemplate header is valid for use by the
client only with entity tags received in the same response or in
subsequent responses, never for use with entity tags received in
previous responses.
5.4 Rules for determining base instances in a uniqueness scope
When a client is about to make a request for a given Request-URI, and
wishes to choose entity tags to include in the request's If-None-Match header
field, it follows a set of rules to determine which base instances
(and hence, which entity tags) may be included. These rules do not
require the client to include any entity tags, and for reasons of
performance, a client implementation should not necessarily include
all of the legal choices.
Recall that the uniqueness scope of an entity tag is the set of
resources across which this entity tag is unique for all time. In
other words, if the client and server correctly agree that the
Request-URI is contained in the uniqueness scope for an entity tag E
for some URI X, then if the client sends this entity tag E in an
If-None-Match header field, the server will know unambiguously which
resource it refers to (even though X is not explicitly named in the
request).
The client's view of the uniqueness scope of an entity tag might be a
subset of the server's view. (It cannot be a superset, or the server
would be unable to interpret the If-None-Match field.) For example,
a server might not list all possible uri-prefix values in a DCluster
header, for performance reasons, or the client might not support the
DTemplate header. A client probably will not have received responses
for more than a small subset of the URIs in a uniqueness scope, or it
might have deleted some of the instances in order to create space in
its cache. A client SHOULD NOT list an entity tag in an
If-None-Match header unless it has a cache entry containing at least
part of the corresponding instance, since this would otherwise lead
to uninterpretable delta responses.
A Request-URI is in the uniqueness scope of an entity tag E for an
instance of URI X if one or more of these conditions holds:
1. X is the Request-URI.
2. The DCluster header field of a prior response for the
Request-URI includes a prefix of X. The base instance
associated with entity tag E MUST NOT have been received
before the first such DCluster header field.
3. The DCluster header field of a prior response for X
includes a prefix of the Request-URI. The base instance
associated with entity tag E MUST NOT have been received
before the first such DCluster header field.
4. X has been listed in the DTemplate header field of a prior
response for the Request-URI, or of a prior response for
another URI Y in the uniqueness scope of the Request-URI
(by recursive application of these conditions to an
instance of URI Y).
XXX Is this unambiguous?
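The conditions above can be sketched as a client-side predicate. The
data model here is hypothetical: each cached instance and each logged
header carries a "seq" number recording the order in which responses
arrived, prefixes and template URIs are assumed to be stored in
absolute form, and only the direct case of condition 4 is handled (the
recursive case through another URI Y is omitted). Condition 3 is
disabled by default, reflecting the security restriction stated in
this section, and the ordering test applied to condition 4 follows
section 5.3.2.

```python
def first_seq(log, response_uri, covers):
    """Sequence number of the first logged header for response_uri that covers."""
    seqs = [r["seq"] for r in log if r["uri"] == response_uri and covers(r)]
    return min(seqs) if seqs else None


def in_scope(request_uri, inst, dcluster_log, dtemplate_log,
             allow_condition_3=False):
    """True if request_uri is in the uniqueness scope of inst's entity tag."""
    x = inst["uri"]
    if x == request_uri:                                    # condition 1
        return True
    # Condition 2: DCluster of a response for the Request-URI has a prefix of X.
    s = first_seq(dcluster_log, request_uri,
                  lambda r: any(x.startswith(p) for p in r["prefixes"]))
    if s is not None and inst["seq"] >= s:
        return True
    # Condition 3: DCluster of a response for X has a prefix of the Request-URI.
    if allow_condition_3:
        s = first_seq(dcluster_log, x,
                      lambda r: any(request_uri.startswith(p)
                                    for p in r["prefixes"]))
        if s is not None and inst["seq"] >= s:
            return True
    # Condition 4 (direct case only): X was named as a template for the Request-URI.
    s = first_seq(dtemplate_log, request_uri, lambda r: x in r["templates"])
    return s is not None and inst["seq"] >= s
```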
Security considerations (see section 6.1) require that a client not
always trust every DCluster header that it receives. A malicious
server might send a DCluster header that could cause the client to
believe that a URI is within the uniqueness scope of an entity tag
when, in fact, it is not. Therefore, a client MUST NOT use condition
#3 above (DCluster of a prior response for X includes prefix of
Request-URI) unless it can securely verify that a resulting delta is
not spoofed.
Our current belief is that spoofing can be detected by any one of the
following means:
- The delta-encoded response is accompanied by a secure
message digest covering the entire current instance,
generated by the origin server. This allows the client to
verify that it has received the current instance of the
Request-URI.
- All of the URIs in the uniqueness scope of the Request-URI
have the same "hostport" as the Request-URI; see
RFC2396 [3] for the specification of this term. This
ensures that, if no interception mechanism is in use, the
client receives what the server wishes it to receive.
(In general, malicious interception mechanisms create
broader risks than the spoofing of deltas.)
- All of the base instances associated with the entity tags
listed in the client's If-None-Match header came from URIs listed in
DCluster or DTemplate headers in responses for prior
Request-URIs having the same "hostport" as the current
Request-URI. This ensures that the chosen base instances
came from origin servers trusted by the origin server for
the current Request-URI.
Note: the spoofing detection mechanisms listed above should be
reviewed by competent security experts.
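The second means above (all URIs in the uniqueness scope share the
"hostport" of the Request-URI) might be checked along the following
lines. This is only an illustrative sketch, not part of the
specification. The function names are hypothetical, and treating an
omitted port as the scheme's default port is an assumption of the
sketch rather than something this document requires.

```python
from urllib.parse import urlsplit

# Assumption of this sketch: an omitted port is treated as the scheme default.
DEFAULT_PORTS = {"http": 80, "https": 443}


def hostport(uri):
    """Return a (host, port) pair for an absolute URI (hypothetical helper)."""
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (host, port)


def scope_shares_hostport(request_uri, scope_uris):
    """True if every URI in the uniqueness scope has the Request-URI's hostport."""
    target = hostport(request_uri)
    return all(hostport(uri) == target for uri in scope_uris)
```

A client using this check would apply condition #3 only when
scope_shares_hostport() returns True for the URIs it believes to be in
the uniqueness scope, or when one of the other means listed above
applies.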
6 Security Considerations
Note: This aspect of the specification is the subject of some
controversy, and the details of protections against spoofing
attacks in particular are likely to change. We will seek a
more formal security review of this specification as part of
the IETF standardization process.
6.1 Spoofing attacks using the DCluster header
We have identified a potential spoofing attack via the ``DCluster''
header. In this scenario, a malicious server (e.g.,
malicious.example.org) generates a response (e.g., for
http://malicious.example.org/trap.html) with a ``DCluster'' header
indicating that the uniqueness scope of the entity tag in the
response includes another server (e.g., victim.example.com). Suppose
that the response includes the entity tag "abc". Now suppose that
the client makes this request:
GET /foo.html HTTP/1.1
host: victim.example.com
If-None-Match: "abc"
A-IM: vcdiff
If the victim.example.com server does actually have an instance with
entity tag "abc", either for http://victim.example.com/foo.html or
for a resource that really is in the same uniqueness scope, then the
server will generate a delta. However, if the client applies this
delta to the cached response for
http://malicious.example.org/trap.html, it will end up either with
garbage, or (more perniciously) with an apparently genuine result
that actually contains bogus information inserted by
malicious.example.org. (The response for
http://malicious.example.org/trap.html might contain the bogus
information concealed in HTML comments.)
Protection against this attack can be accomplished by the use of
end-to-end digests on the instances, as described in another
proposal [11]. (Message digests, such as those provided by ``Content-MD5''
or by Digest Authentication, are not sufficient, since none of the
individual messages are tampered with in this attack.)
Note that protection against spoofing via the ``DCluster'' header
does not inherently require a keyed digest. Since the delta-encoded
response for http://victim.example.com/foo.html is not itself
generated by malicious.example.org, an end-to-end digest included
with this response by victim.example.com is sufficient to prove that
the client's reconstruction of foo.html is correct. However, if
message tampering is also a possibility, then the server should also
provide a keyed message digest.
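As a rough sketch of the unkeyed end-to-end check described above, a
client could recompute a digest over the instance it reconstructed
from the delta and compare it with the value supplied by the origin
server. How the digest is carried, and which algorithms are permitted,
is defined by the instance-digest proposal [11] and is not reproduced
here. The sketch assumes the header has already been parsed into an
algorithm name and a base64-encoded value, and the choice of SHA-256 in
the usage lines is purely illustrative.

```python
import base64
import hashlib
import hmac


def instance_matches_digest(reconstructed, algorithm, expected_b64):
    """Compare a digest of the reconstructed instance with the server's value.

    `reconstructed` is the instance body (bytes) obtained by applying the
    delta to the chosen base instance. `algorithm` and `expected_b64` are
    assumed to have been parsed from the response already.
    """
    actual = hashlib.new(algorithm, reconstructed).digest()
    return hmac.compare_digest(
        base64.b64encode(actual), expected_b64.encode("ascii")
    )


# Usage: the digest is computed by the origin server over the genuine instance.
genuine = b"<html>genuine foo.html</html>"
server_value = base64.b64encode(hashlib.sha256(genuine).digest()).decode("ascii")
ok = instance_matches_digest(genuine, "sha256", server_value)
spoofed = instance_matches_digest(b"<html>bogus</html>", "sha256", server_value)
```

If the delta was applied to a base instance supplied by
malicious.example.org, the reconstructed bytes differ from the genuine
instance and the comparison fails, even though no individual message
was altered in transit.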
Another defense against such an attack is for the client to ignore a
``DCluster'' header that specifies a different server. However, this
defense is only effective if servers that generate delta-encoded
responses are not shared among multiple, possibly mutually
untrustworthy, content providers. It also reduces the potential
effectiveness of clustering, especially for large sites split across
multiple servers.
Note that because the DTemplate header field also adds one or more
URIs to the uniqueness scope of an entity tag, the same spoofing
attack is possible using the DTemplate header, and the same defenses
apply.
We recommend that if a client receives a delta-encoded response
without an accompanying Digest, and if the client's view of the
uniqueness scope for the Request-URI includes more than one server
hostname, then the response should either be discarded, or presented
to the user as potentially corrupt.
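The recommendation above amounts to a simple client-side test, which
might be sketched as follows. The function name is hypothetical, and
the sketch assumes that the client's view of the uniqueness scope is
available as a list of absolute URIs that includes the Request-URI
itself.

```python
from urllib.parse import urlsplit


def should_distrust_delta(has_digest, scope_uris):
    """True if a delta-encoded response should be discarded or flagged.

    `has_digest` says whether the response carried an accompanying Digest.
    `scope_uris` is the client's view of the uniqueness scope (assumed to
    be absolute URIs, including the Request-URI).
    """
    hostnames = {urlsplit(uri).hostname for uri in scope_uris}
    return (not has_digest) and len(hostnames) > 1
```

The sketch does not choose between discarding the response and
presenting it to the user as potentially corrupt; that decision is
left to the user agent.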
6.2 Privacy attacks using the DCluster header
Many people have drawn attention to the privacy risks associated with
HTTP Cookies, which allow a site (or group of cooperating sites) to
track the activity of a user. More recently, Martin Pool has
identified a similar tracking mechanism based on cache validators,
especially entity tags [12]. In this attack, a site encodes
user-specific information in an entity tag, and then tracks repeated
requests by that user to the same resource, as the user's browser
attempts to validate its cache entry using that entity tag.
Although this tracks only the requests for a specific resource (URL),
a site can indirectly track references to many other pages by
embedding an image reference to the tracked URL on each of those
pages.
Just as with Cookies, the entity-tag tracking mechanism depends upon
the server's ability to induce the client to send back a specific
string on subsequent requests. However, the basic entity-tag
tracking mechanism only allows a site to track access to pages that
it controls.
The ``DCluster'' header field specified in this document makes this
tracking mechanism more powerful, by allowing one site to gain access
to entity tags from many other sites. For example, suppose that the
site evil.example.com knows the format used to encode client-specific
information in entity tags issued by the site naive.example.com. Any
client who visits http://evil.example.com/home.html and receives a
DCluster: http://naive.example.com/
header in response might then later make a delta-capable request to
evil.example.com that includes entity tags issued by
naive.example.com.
It might be possible to defend against such ``hijacked'' tracking
attacks by choosing a cryptographically strong encoding for the
client-specific data hidden in entity tags, but this might not always
be feasible. In any event, this could not hide from evil.example.com
the fact that the client had at some point visited naive.example.com
(which could be significant if this site provided, for example,
medical information about an embarrassing disease).
Cryptographic digests of instances, as described in section 6.1 to
protect against DCluster spoofing, do not help, because the malicious
site in this case is the source of the requested data, and need not
actually use a delta encoding to accomplish its attack.
As in section 6.1, one possible defense is for the client to ignore a
``DCluster'' header that specifies a different server, but (also as
discussed in section 6.1) this is not ideal.
User agents SHOULD provide a method to allow users to disable the use
of the ``DCluster'' header, preferably either in all cases, or in
cross-site cases.
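One way a user agent might honor such a setting is to filter the URI
prefixes taken from a ``DCluster'' header before adding them to any
uniqueness scope. The sketch below is illustrative only: the mode
names "all", "same-site", and "none" are hypothetical, the prefixes are
assumed to have been parsed into absolute URIs already, and
"cross-site" is interpreted here as "different hostname", which is one
possible reading.

```python
from urllib.parse import urlsplit


def usable_dcluster_prefixes(response_uri, prefixes, mode):
    """Return the DCluster prefixes the user's preference allows.

    mode is a hypothetical user preference:
      "all"       - honor every prefix
      "same-site" - honor only prefixes on the responding host
      "none"      - ignore the DCluster header entirely
    """
    if mode == "none":
        return []
    if mode == "all":
        return list(prefixes)
    if mode != "same-site":
        raise ValueError("unknown mode: %r" % (mode,))
    origin = urlsplit(response_uri).hostname
    return [p for p in prefixes if urlsplit(p).hostname == origin]
```

With the "same-site" setting, the header in the example above
(received from evil.example.com and naming http://naive.example.com/)
would contribute nothing to the client's uniqueness scope.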
6.3 Data leakage attacks using the DCluster header
Suppose that a server has asserted, using a DCluster header, that
resources URL1 and URL2 are in the same uniqueness scope. Also
suppose that a client is allowed to access URL1, but is not allowed
to access URL2. (Access may be denied due to a lack of
authentication, or a server configuration setting, or some other
mechanism.) Finally, suppose that the client can guess or obtain the
entity tag ET2 of some instance of URL2. If the client asks the
server for the current instance of URL1 as a delta from the ET2
instance of URL2, and the server responds with such a delta, this may
reveal information about the contents of URL2. (The amount of
information revealed depends strongly on the delta-coding format, and
probably will not be enough to recover the full contents of URL2.)
A server MUST NOT reply using a delta encoding, if the chosen base
instance is not an instance of the Request-URI, unless the server can
verify that the client would currently be allowed access to both the
chosen base instance and the Request-URI.
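The requirement above might be enforced by a server-side check such as
the following sketch. The function and parameter names are
hypothetical. In particular, client_may_access stands for whatever
access-control decision the server already applies to the current
request, and is assumed to be a callable taking a URI and returning a
boolean.

```python
def may_send_delta(request_uri, base_uri, client_may_access):
    """Decide whether a delta against the chosen base instance is permitted.

    `base_uri` is the URI whose instance was chosen as the base.
    `client_may_access` is an assumed callable: URI -> bool, evaluated
    for the current client at the time of the request.
    """
    if base_uri == request_uri:
        # The base is an instance of the Request-URI itself, so the
        # cross-resource restriction does not apply.
        return True
    return client_may_access(base_uri) and client_may_access(request_uri)
```

When the check fails, the server can still reply with the full current
instance of the Request-URI (subject to its normal access controls),
which reveals nothing about the base instance.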
7 History
7.1 draft-mogul-http-dcluster-00.txt
This document was split off from draft-mogul-http-delta-*.txt, to
avoid having the security issues affect the basic HTTP delta encoding
specification, and to ensure that the design of clusters and
templates was done so that they are entirely optional for
implementors of basic delta encoding.
8 Acknowledgements
Andrew Birrell alerted us to the possibility of data leakage attacks
using the DCluster header. Koen Holtman contributed to the drafting
of this document, and especially to the security considerations and
mechanisms.
9 References
NOTE TO RFC EDITOR: many of the references here might be out of date.
Please verify these with the primary author of this Internet-Draft
before issuing this document as an RFC.
1. Gaurav Banga, Fred Douglis, and Michael Rabinovich. Optimistic
Deltas for WWW Latency Reduction. Proc. 1997 USENIX Technical
Conference, Anaheim, CA, January, 1997, pp. 289-303.
2. T. Berners-Lee, R. Fielding, and H. Frystyk. Hypertext Transfer
Protocol -- HTTP/1.0. RFC 1945, HTTP Working Group, May, 1996.
3. T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource
Identifiers (URI): Generic Syntax. RFC 2396, IETF, August, 1998.
4. S. Bradner. Key words for use in RFCs to Indicate Requirement
Levels. RFC 2119, Harvard University, March, 1997.
5. Fred Douglis, Antonio Haro, and Michael Rabinovich. HPP: HTML
Macro-Preprocessing to Support Dynamic Document Caching. Proc.
USENIX Symposium on Internet Technologies and Systems, USENIX,
Monterey, CA, December, 1997, pp. 83-94.
6. Roy T. Fielding, Jim Gettys, Jeffrey C. Mogul, Henrik Frystyk
Nielsen, Larry Masinter, Paul Leach, and Tim Berners-Lee. Hypertext
Transfer Protocol -- HTTP/1.1. RFC 2616, HTTP Working Group, June,
1999.
7. Barron C. Housel and David B. Lindquist. WebExpress: A System
for Optimizing Web Browsing in a Wireless Environment. Proc. 2nd
Annual Intl. Conf. on Mobile Computing and Networking, ACM, Rye, New
York, November, 1996, pp. 108-116.
http://www.networking.ibm.com/art/artwewp.htm.
8. Jeffrey C. Mogul, Balachander Krishnamurthy, Fred Douglis, Anja
Feldmann, Yaron Goland, and Arthur van Hoff. Delta encoding in HTTP.
Internet-Draft draft-mogul-http-delta-06, IETF, August, 2000. This is
a work in progress.
9. Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander
Krishnamurthy. Potential benefits of delta encoding and data
compression for HTTP. Proc. SIGCOMM '97, Cannes, France, September,
1997, pp. 181-194.
10. Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander
Krishnamurthy. Potential benefits of delta encoding and data
compression for HTTP. Research Report 97/4, DECWRL, July, 1997. URL
http://www.research.digital.com/wrl/techreports/abstracts/97.4.html.
11. Jeffrey C. Mogul and Arthur Van Hoff. Instance Digests in HTTP.
Internet-Draft draft-mogul-http-digest-02, IETF, March, 2000. This is
a work in progress.
12. Martin Pool. meantime: non-consensual http user tracking using
caches. http://www.linuxcare.com.au/mbp/meantime/.
10 Authors' addresses
Jeffrey C. Mogul
Western Research Laboratory
Compaq Computer Corporation
250 University Avenue
Palo Alto, California, 94305, U.S.A.
Email: mogul@pa.dec.com
Phone: 1 650 617 3304 (email preferred)
Fred Douglis
AT&T Labs - Research
180 Park Ave, Room B-137
Florham Park, NJ 07932-0971, U.S.A.
Email: douglis@research.att.com
Phone: 1 973 360-8775
Daniel M. Hellerstein
Economic Research Service, USDA
1909 Franwall Ave, Wheaton MD 20902
E-mail: danielh@crosslink.net or webmaster@srehttp.org
Phone: 1 202 694-5613 or 1 301 649-4728