Network Working Group                         Jeffrey Mogul, Compaq WRL,
Internet-Draft                                       Fred Douglis, AT&T,
Expires: 25 February 2001                   Daniel Hellerstein, ERS/USDA
                                                          24 August 2000



                   HTTP Delta Clusters and Templates

                    draft-mogul-http-dcluster-00.txt


STATUS OF THIS MEMO

        This document is an Internet-Draft and is in full
        conformance with all provisions of Section 10 of RFC2026.

        Internet-Drafts are working documents of the Internet
        Engineering Task Force (IETF), its areas, and its working
        groups. Note that other groups may also distribute working
        documents as Internet-Drafts.

        Internet-Drafts are draft documents valid for a maximum of
        six months and may be updated, replaced, or obsoleted by
        other documents at any time. It is inappropriate to use
        Internet-Drafts as reference material or to cite them other
        than as "work in progress."

        The list of current Internet-Drafts can be accessed at
        http://www.ietf.org/ietf/1id-abstracts.txt

        The list of Internet-Draft Shadow Directories can be
        accessed at http://www.ietf.org/shadow.html.

        Distribution of this document is unlimited.  Please send
        comments to the authors.


ABSTRACT

        HTTP "Delta encoding," the transmission of a compact
        encoding of the change between instances of a Web resource
        instead of retransmitting the entire new value, has been
        shown to be of potential value.  Research has shown
        additional benefits if deltas can be computed between
        instances of different resources.  This document describes
        a compatible extension to HTTP delta encoding to support
        "clustering", where multiple resources (URLs) are treated
        as a pool, and the use of "templates", where a large set of
        resource instances are most naturally described as deltas
        from a chosen template resource.




Mogul et al.                                                    [Page 1]


Internet-Draft              Delta clustering        24 August 2000 16:15


                           TABLE OF CONTENTS

1 Introduction
     1.1 Related research and proposals
2 Terminology
3 Delta-encoding and clustering
4 Use of templates
5 Specification
     5.1 Modified basic requirements for delta-encoded responses
     5.2 Modified header specifications
          5.2.1 A-IM
     5.3 New header specifications
          5.3.1 DCluster
          5.3.2 DTemplate
     5.4 Rules for determining base instances in a uniqueness scope
6 Security Considerations
     6.1 Spoofing attacks using the DCluster header
     6.2 Privacy attacks using the DCluster header
     6.3 Data leakage attacks using the DCluster header
7 History
     7.1 draft-mogul-http-dcluster-00.txt
8 Acknowledgements
9 References
10 Authors' addresses


1 Introduction

          WARNING: THIS SPECIFICATION WILL CHANGE.  DO NOT DEPLOY
       ANY IMPLEMENTATIONS BASED ON THIS SPECIFICATION.

   The World Wide Web is a distributed system, and so often benefits
   from caching to reduce retrieval delays.  Retrieval of a Web resource
   (such as a document, image, icon, or applet) over the Internet or other
   wide-area network usually takes enough time that the delay is over
   the human threshold of perception.  Often, that delay is measured in
   seconds.  Caching can often eliminate or significantly reduce
   retrieval delays.

   Many Web resources change over time, so a practical caching approach
   must include a coherency mechanism, to avoid presenting stale
   information to the user.  Originally, the Hypertext Transfer Protocol
   (HTTP) provided little support for caching, but under operational
   pressures, it quickly evolved to support a simple mechanism for
   maintaining cache coherency.

   In HTTP/1.0 [2], the server may supply a ``last-modified'' timestamp
   with a response.  If a client stores this response in a cache entry,
   and then later wishes to re-use the response, it may transmit a
   request message with an ``If-modified-since'' field containing that
   timestamp; this is known as a conditional retrieval.  Upon receiving
   a conditional request, the server may either reply with a full
   response, or, if the resource has not changed, it may send an
   abbreviated reply, indicating that the client's cache entry is still
   valid.  HTTP/1.0 also includes a means for the server to indicate,
   via an ``Expires'' timestamp, that a response will be valid until
   that time; if so, a client may use a cached copy of the response
   until that time, without first validating it using a conditional
   retrieval.

   HTTP/1.1 [6] adds many new features to improve cache coherency and
   performance.  However, it preserves the all-or-none model for
   responses to conditional retrievals: either the server indicates that
   the resource value has not changed at all, or it must transmit the
   entire current value.

   Common sense suggests (and traces confirm), however, that even when a
   Web resource does change, the new instance is often substantially
   similar to the old one.  If the difference, or ``delta'', between the
   two instances could be sent to the client instead of the entire new
   instance, a client holding a cached copy of the old instance could
   apply the delta to construct the new version.  In a world of finite
   bandwidth, the reduction in response size and delay could be
   significant.

   One can think of deltas as a way to squeeze as much benefit as
   possible from client and proxy caches.  Rather than treating an
   entire response as the ``cache line,'' with deltas we can treat
   arbitrary pieces of a cached response as the replaceable unit, and
   avoid transferring pieces that have not changed.

   A separate document [8] specifies a set of compatible extensions to
   HTTP/1.1 that allow clients and servers to use delta encoding with
   minimal overhead.  That mechanism only supports deltas between
   instances of a single resource.

   This document specifies further extensions to the delta encoding
   mechanism.  These extensions allow deltas to be computed between
   instances of different resources.  This increases the likelihood that
   a compact delta might be found to encode the current instance of a
   requested resource.

   We assume that the reader is familiar with the HTTP/1.1
   specification, and with the delta encoding specification.

1.1 Related research and proposals
   The WebExpress project [7] appears to be the first published
   description of an implementation of delta encoding for HTTP (which
   they call ``differencing'').  WebExpress is aimed specifically at
   wireless environments, and includes a number of orthogonal
   optimizations.  Also, the WebExpress design does not propose changing
   the HTTP protocol itself, but rather uses a pair of interposed
   proxies to convert the HTTP message stream into an optimized form.
   The results reported for WebExpress differencing are impressive, but
   are limited to a few selected benchmarks.

   The WebExpress paper also pointed out that in many cases, the
   individual responses to different queries with the same ``URL
   prefix'' (that is, the prefix of the URL before the ``?'' character)
   are often similar enough to make delta encoding effective.  Since
   users frequently make numerous different queries using the same URL
   prefix, it might be much more effective to compute deltas between
   different queries for a given URL prefix, rather than simply between
   different queries using an identical URL.  Banga et al. [1] make a
   similar observation.  A 1997 trace-based study [9] showed that this
   approach has significant potential for reducing bandwidth
   requirements.  The "clustering" mechanism described in this
   specification is intended to support the use of delta encoding in
   contexts where the delta is computed between two different URLs.

   The WebExpress project [7] adopted the concept of a designated ``base
   object'', rather than simply relying on a prefix-matching mechanism.
   WebExpress included a mechanism for ``rebasing'' a client (providing
   it with a new base object).  The "templates" mechanism described in
   this specification supports a very similar approach.

   The approaches described above, and in this specification, operate
   independently of the syntax and semantics of the data being transferred
   (although delta encoding algorithms for images may require some
   specialization).  They function by decomposing responses at the bit
   or byte level into currently-cached and need-to-be-transferred
   components.  One can also do this decomposition at a higher level.
   Douglis et al. [5] describe an "HTML macro" mechanism, in which a set
   of similar HTML pages is decomposed into a constant component (akin
   to a macro body) and a variable component (akin to macro arguments).
   In many cases, the variable component can be quite small; this means
   once the constant component is in a cache, references to similar
   pages require fetching only the small variable component, at a
   significant cost savings over transferring a monolithic response.

   The main drawback to the HTML macro approach is that it requires
   direct involvement by the designer (or software) when generating the
   Web pages, including some careful attention to the decomposition of a
   set of similar pages.  It might also require some additional
   language-level standardization, although this perhaps could be
   obviated through the use of Java-based macros.  Therefore, support
   for HTML macros is beyond the scope of this specification.


2 Terminology

   HTTP/1.1 [6] defines the following terms:

   resource         A network data object or service that can be
                   identified by a URI, as defined in section 3.2.
                   Resources may be available in multiple
                   representations (e.g. multiple languages, data
                   formats, size, resolutions) or vary in other ways.

   entity           The information transferred as the payload of a
                   request or response.  An entity consists of
                   metainformation in the form of entity-header fields
                   and content in the form of an entity-body, as
                   described in section 7.

   variant          A resource may have one, or more than one,
                   representation(s) associated with it at any given
                   instant. Each of these representations is termed a
                   `variant.' Use of the term `variant' does not
                   necessarily imply that the resource is subject to
                   content negotiation.

   The specification for delta encoding [8] defined these additional
   terms:

   instance         The entity that would be returned in a status-200
                   response to a GET request, at the current time, for
                   the selected variant of the specified resource, with
                   the application of zero or more content-codings, but
                   without the application of any instance manipulations
                   or transfer-codings.

   instance manipulation
                   An operation on one or more instances which may
                   result in an instance being conveyed from server to
                   client in parts, or in more than one response
                   message.  For example, a range selection or a delta
                   encoding.  Instance manipulations are end-to-end, and
                   often involve the use of a cache at the client.

   See that specification for further discussion of those terms.

   For the extensions specified in this document, we define one more
   term:

   uniqueness scope
                   The uniqueness scope of an entity tag is the set of
                   resources across which this entity tag is unique for
                   all time.  That is, within this set of resources, if
                   two instances share an entity tag, then the values of
                   these instances (including their instance bodies and
                   their instance headers) are equal.

   In unmodified HTTP/1.1, the uniqueness scope of an entity tag is
   always a single resource.  In this proposal, we provide a means to
   extend the uniqueness scope to include multiple resources.


3 Delta-encoding and clustering

   The basic delta-encoding model assumes that deltas are computed
   between two instances of a specific resource; i.e., both instances
   are associated with a single URL.  However, the WebExpress project [7]
   suggested that by treating a query URL (that is, a URL with an
   embedded ``?'')  as a prefix followed by a set of parameters, one
   could then profitably compute deltas between resource values whose
   URLs have identical prefixes, but perhaps different parameters
   (suffixes).  Our trace-based study confirmed this [10].  We believe
   that this might be generalized to certain other patterns of URLs
   (i.e., not just those using ``?'' as a separator).  We use the term
   ``clustering'' for this approach.

   For example, if a client has cached a response for a DEC stock quote
   (``http://quote.yahoo.com/q?s=DEC&d=f''), and then requests a quote
   for AT&T from the same server (``http://quote.yahoo.com/q?s=T&d=f''),
   the prefix for the cluster would be ``http://quote.yahoo.com/q?''.

   In order to support clustering, we need a mechanism for the server to
   indicate to the client which URLs are eligible for clustering (since
   it would be highly inefficient for the client to send the entity tags
   of every resource in its cache on every request).

   We propose a new, optional response header for this purpose, to
   specify a URL-prefix for other resources that ``cluster'' with the
   given response.  The header name is ``DCluster''.

   Once a cluster-eligible response is cached, when the client is about
   to make a subsequent request, it would match the request-URI against
   all of the URL-prefixes in its cache.  (As specified in section
   5.3.1, only cache entries received after the matching DCluster header
   are eligible.)  The ``If-None-Match'' field in its request could then
   list the entity tags for all of the matching entries.  In some cases,
   it might be more efficient to list only a subset (such as the most
   recently received cache entries), to avoid excessive request header
   lengths.

   For example, if a client makes this initial request:

      GET /foo?p=1 HTTP/1.1
      Host: bar.example.net

   and receives this response:

      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "abc"
      DCluster: "//bar.example.net/foo?"

   then when the client later makes a request for
   ``http://bar.example.net/foo?p=2'', it can match the stored cluster
   prefix in its cache, and generate this request:

      GET /foo?p=2 HTTP/1.1
      Host: bar.example.net
      If-None-Match: "abc"
      A-IM: vcdiff
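
   As a non-normative illustration, the client-side matching step
   described above might be sketched in Python as follows.  The
   CacheEntry and ClusterPrefix types, the "received" arrival counter,
   and the function name are hypothetical, and the sketch assumes that
   every stored prefix has already been expanded to an absolute URL.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    url: str       # absolute URL of the cached response
    etag: str      # entity tag, including its quotes
    received: int  # arrival order of the response (hypothetical counter)


@dataclass
class ClusterPrefix:
    prefix: str    # absolute URL-prefix taken from a DCluster header
    received: int  # arrival order of the response that carried it


def cluster_etags(request_url, entries, prefixes):
    """Return entity tags of cached entries usable for request_url."""
    # Prefixes that cover the URL about to be requested.
    matching = [p for p in prefixes if request_url.startswith(p.prefix)]
    etags = []
    for entry in entries:
        same_resource = entry.url == request_url
        # A clustered entry must be covered by a matching prefix and must
        # have arrived with, or after, the header carrying that prefix.
        clustered = any(
            entry.url.startswith(p.prefix) and entry.received >= p.received
            for p in matching
        )
        if same_resource or clustered:
            etags.append(entry.etag)
    return etags
```

   With the cache state of the example above (one entry for
   ``http://bar.example.net/foo?p=1'' carrying "abc", stored together
   with the prefix ``http://bar.example.net/foo?''), a call for
   ``http://bar.example.net/foo?p=2'' yields the single entity tag
   "abc", which is what the request shown above carries.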

   As a generalization, the DCluster header field may include multiple
   URL-prefixes, to allow specification of a set of URIs that do not
   share a single common prefix.

   In order to use this approach to clustering, we need to impose one
   important constraint.  HTTP/1.1 requires so-called ``strong'' entity
   tags to be unique for a given URI, but does not impose any broader
   requirements on the uniqueness of entity tags.  However, if a server
   sends a ``DCluster'' header, this implies that the entity tag in the
   response is unique not only for the Request-URI, but also for all
   URIs for which the string given by ``DCluster'' is a prefix.

   We call this set of URIs the ``uniqueness scope'' of the entity tag.
   Note that a response might carry multiple ``DCluster'' header fields
   (or, by the basic HTTP syntax rules, one such header field with a
   comma-separated list of prefix strings).  This means that the
   uniqueness scope is the union of the scopes specified by the set of
   prefixes, plus the original Request-URI.  Because the URI in a
   ``DCluster'' header field can be an absolute URI (i.e., contain a
   host name), a uniqueness scope can span multiple servers.
   Presumably, these servers have some out-of-band means to maintain the
   uniqueness property.
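
   The union rule can be illustrated with a short, non-normative Python
   sketch.  It assumes that the DCluster field values of one response
   are available as a list of strings and that the prefixes are already
   absolute; both function names are hypothetical.

```python
import re


def dcluster_prefixes(field_values):
    """Collect the uri-prefix strings from every DCluster field value."""
    prefixes = []
    for value in field_values:
        # Each list element is a double-quoted uri-prefix.
        for prefix in re.findall(r'"([^"]*)"', value):
            if prefix not in prefixes:
                prefixes.append(prefix)
    return prefixes


def in_advertised_scope(uri, request_uri, prefixes):
    """True if uri is the Request-URI or starts with any listed prefix."""
    return uri == request_uri or any(uri.startswith(p) for p in prefixes)
```

   Note that this test covers only the part of the uniqueness scope
   that the server chose to advertise; the full scope may be larger.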

   A client making a request may have cache entries for many different
   resources in the uniqueness scope of the Request-URI.  This is
   another situation where the ability of ``If-None-Match'' to carry
   multiple entity tags is employed.  Abstractly, when the client makes
   a request for which it wants a delta-encoded response, it finds all
   of its cache entries in the same uniqueness scope, then sends the
   entity tags for these cache entries in an ``If-None-Match'' header.

   It would not make sense to have an extremely broad uniqueness scope
   (i.e., one that includes large numbers of resources), because a
   client holding cache entries for many of those resources would then
   send many entity tags in its request for a delta.  This would bloat
   the request message, negating the transfer-time reduction of the
   delta encoding.  Therefore, in actual use, the
   ``DCluster'' header field value should represent not the entire
   uniqueness scope, but a subset of the uniqueness scope that is most
   likely to result in small deltas.

   Client implementations, however, should be prepared to prune their
   ``If-None-Match'' headers in case a server inadvertently (or
   maliciously) specifies an over-broad uniqueness scope.
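
   One possible pruning policy, shown here only as a sketch, is to cap
   the length of the If-None-Match field value.  The function name and
   the default budget of 256 bytes are arbitrary assumptions, and the
   input list is assumed to be ordered with the preferred (for example,
   most recently received) entity tags first.

```python
def prune_if_none_match(etags, max_bytes=256):
    """Keep leading entity tags while the field value fits in max_bytes."""
    kept = []
    length = 0
    for tag in etags:
        # Account for the ", " separator between list elements.
        extra = len(tag) + (2 if kept else 0)
        if length + extra > max_bytes:
            break
        kept.append(tag)
        length += extra
    return ", ".join(kept)
```

   An empty result simply means that the client sends no entity tags
   and so receives an ordinary, non-delta response.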

   Server implementations that support clustering should minimize the
   length of the entity tags that they generate, consistent with the
   other requirements for entity tags, since the effect of overlong
   entity tags on request-header size is potentially multiplied many
   times by the use of clustering.

   Note that the ``DCluster'' header can be used in a potential spoofing
   attack.  This attack, and defenses against it, are discussed in
   section 6.1.


4 Use of templates

   The model of delta encoding outlined so far requires the server to
   compute a delta between the current instance of the resource and some
   previous instance of that resource, or (if clustering is used) a
   previous instance of some other resource.  This means that the base
   instance is, in effect, a moving target, since we do not want to
   require servers or clients to retain old instances for indefinite
   periods.

   Douglis et al. describe an approach to dynamically-generated
   documents in which the document is broken down into separate static
   and dynamic parts [5].  The static part is a macro with unbound
   variables, and the dynamic part is a set of bindings between
   variables and specific values.  In their mechanism, the client
   retains the static part, called a ``template,'' in its cache.  It
   repeatedly requests, as needed, a new instance of the dynamic part,
   and then reevaluates the template macro, with its variables bound as
   specified in the dynamic part, in order to generate the current
   instance of the entire document.  Their macro language is an
   extension to HTML, although other languages (such as Java) might be
   just as suitable.

   The WebExpress project [7] adopted the concept of a designated ``base
   object'', which is nearly identical to the template concept described
   here.  WebExpress included a mechanism for ``rebasing'' a client
   (providing it with a new base object).  The primary difference
   between the WebExpress approach and our approach is the time at which
   a client discovers the identity of a (possibly new) template.

   We can apply a similar template-based mechanism to substantially
   simplify the use of delta encoding.  In this approach, the server
   ``computes'' the delta between the current instance of a resource,
   and a separately-identified template resource.  (Depending on the
   encoding format, it might be possible to generate the delta directly,
   rather than generating the current instance and then computing a
   delta.)  The client then applies the delta to the template resource,
   rather than to a previous instance of the requested resource.

   Since this approach avoids the need to retain old instances of the
   dynamic resource at either the client or the server, it greatly
   simplifies the implementation and optimization of base instance
   management at both client and server.  However, it requires a new
   mechanism to inform the client of the appropriate template resource,
   and its success may depend on the proper construction of the
   template.

   To support template-based deltas, therefore, we define a new response
   header that the origin server uses as a ``hint'' to inform a client
   of the URI of the template resource.  For example, if the client
   request is

      GET /foo.html HTTP/1.1
      Host: bar.example.net
      A-IM: vcdiff

   the server might send:

      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "abc"
      DTemplate: "http://bar.example.net/foo.tplt"

   The implication of the DTemplate header is that, on subsequent
   requests for http://bar.example.net/foo.html, the client should ask
   for a delta between http://bar.example.net/foo.tplt and the current
   instance.  This means, of course, that the client would first have
   obtained and cached an instance of http://bar.example.net/foo.tplt.
   The client might retrieve the template either on demand (i.e., just
   before making the new request for foo.html), or during an otherwise
   idle moment, or not at all (since the use of deltas is fully
   optional).

   The DTemplate header implies that the specified URL is within the
   uniqueness scope of the Request-URI (or else it would not be
   meaningful to ask for a delta between the template and the
   Request-URI).  For example, if the client requests the template:

      GET /foo.tplt HTTP/1.1
      Host: bar.example.net

   and receives the response:

      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:47 GMT
      Etag: "pqr"

   then the client can make a subsequent request for foo.html as:

      GET /foo.html HTTP/1.1
      Host: bar.example.net
      If-None-Match: "pqr"
      A-IM: vcdiff

   Alternatively, the DTemplate header field can be used to specify that
   a specific instance of a resource (rather than any available
   instance) be used as a template, by including an entity tag in the
   header field.  For example:

      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "abc"
      DTemplate: "http://bar.example.net/foo.tplt"/etag="pqr"

   This form of the header further simplifies the instance-management
   problem, by eliminating any ambiguity about which instances are worth
   saving.  It might, however, reduce the possibilities for delta
   encoding.

   Finally, the DTemplate and DCluster headers can be combined.  For
   example:

      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "abc"
      DTemplate: "http://bar.example.net/foo.tplt"
      DCluster: "//bar.example.net/foo?"

   This means that for any Request-URI matching the prefix specified in
   the DCluster header field, the URI specified in the DTemplate field
   is an appropriate template.

   Note that an origin server need not send a DTemplate header field on
   every response; doing so could waste network bandwidth if the
   recipient is not delta-capable.  Instead, the
   server should employ heuristics to decide whether to send this header
   field.  For example, it might be worth sending it whenever the
   client's request message indicates its willingness to accept a
   delta-encoded response, and when the If-None-Match field in the
   request does not already specify the entity-tag of the template
   resource.


5 Specification

   In this document, the key words "MUST", "MUST NOT", "SHOULD",
   "SHOULD NOT", and "MAY" are to be interpreted as described in
   RFC2119 [4].

5.1 Modified basic requirements for delta-encoded responses
   The basic requirements for delta-encoded responses, specified in [8],
   are modified for servers that support the DCluster and/or DTemplate
   header fields.

   A server MAY send a delta-encoded response if:

      1. The server would be able to send a 200 (OK) response for
         the request.

      2. The client's request includes an A-IM header field listing
         at least one delta-coding.

      3. The client's request includes an If-None-Match header
         field listing at least one valid entity tag for an
         instance (a "base instance") of at least one of:

            a. the Request-URI.

            b. a different URI within the uniqueness scope of the
               Request-URI.

            c. a URI that matches a dt-uri in a DTemplate
               header field that was sent in a response for a URI
               within the uniqueness scope of the Request-URI.
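
   The three conditions above can be read as a server-side check, as in
   the following non-normative Python sketch.  All names are
   hypothetical.  The sketch assumes that the A-IM and If-None-Match
   fields have already been parsed into lists, that the server keeps a
   table mapping each retained entity tag to the URI of its instance,
   and that "vcdiff" is the only delta-coding the server supports.

```python
def usable_base_tags(can_send_200, a_im, if_none_match, held_instances,
                     request_uri, scope_uris, template_uris,
                     delta_codings=frozenset({"vcdiff"})):
    """Return the entity tags that may serve as base instances.

    An empty list means the server must not send a delta-encoded
    response for this request.
    """
    # Condition 1: an ordinary 200 (OK) response must be possible.
    if not can_send_200:
        return []
    # Condition 2: the client must accept at least one delta-coding.
    if not any(coding in delta_codings for coding in a_im):
        return []
    # Condition 3: a tag qualifies if it names a retained instance of
    # (a) the Request-URI, (b) another URI in its uniqueness scope, or
    # (c) a template resource announced for that scope.
    allowed = {request_uri} | set(scope_uris) | set(template_uris)
    return [tag for tag in if_none_match
            if held_instances.get(tag) in allowed]
```

   Because entity tags are unique across the uniqueness scope, a single
   table lookup per entity tag is enough to identify the base instance.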

   XXX Anything else?

5.2 Modified header specifications
   One of the headers defined in the specification for delta
   encoding [8] has a slightly different meaning when delta clustering
   or delta templates are used.

5.2.1 A-IM
   When an A-IM request-header field includes one or more delta-coding
   values, the request MUST contain an If-None-Match header field,
   listing one or more entity tags from URIs in the uniqueness scope of
   an entity tag from a prior response for the request-URI.

   Section 5.4 defines rules that a client uses for determining the set
   of base instances in the uniqueness scope of a request-URI.

5.3 New header specifications
   The following headers are defined, for use as entity-headers.  (Due
   to the terminological confusion discussed in [8], some entity-headers
   are more properly associated with instances than with entities.)

5.3.1 DCluster
   The DCluster entity-header field is used in a response to specify a
   subset of the uniqueness scope of the entity tag given in the Etag
   header field of the response.  The uniqueness scope is the set of
   URIs across which this strong entity tag is guaranteed to be unique,
   for all time.  A uniqueness scope is specified by providing one or
   more prefixes for other URIs in the set.

       DCluster = "DCluster" ":" #( <"> uri-prefix <">)
       uri-prefix = scheme ":" "//" host [ ":" port ] [ abs_path ]
               | abs_path
               | rel_path

   If the uri-prefix is an abs_path or rel_path, the implied scheme is
   the scheme used in the Request-URI.  (Typically, the scheme would be
   "http".)  If the uri-prefix is an abs_path, it is interpreted
   relative to the origin server host name.  If the uri-prefix is a
   rel_path, it is interpreted relative to the Request-URI.
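
   A client might expand each uri-prefix to an absolute form before
   storing it, as in the following non-normative Python sketch.  The
   function name is hypothetical.  Two points are assumptions rather
   than part of this specification: a rel_path is resolved against the
   directory portion of the Request-URI, and a prefix beginning with
   "//" (the form used in the examples of section 3, although not
   listed in the grammar above) inherits the scheme of the Request-URI.

```python
import re
from urllib.parse import urlsplit

# Matches a leading "scheme://" such as "http://".
_HAS_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://")


def resolve_uri_prefix(uri_prefix, request_url):
    """Expand a DCluster uri-prefix to an absolute URL-prefix."""
    base = urlsplit(request_url)
    origin = f"{base.scheme}://{base.netloc}"
    if _HAS_SCHEME.match(uri_prefix):
        # Already absolute: scheme, host, optional port and path.
        return uri_prefix
    if uri_prefix.startswith("//"):
        # Assumed form: host given, scheme taken from the Request-URI.
        return f"{base.scheme}:{uri_prefix}"
    if uri_prefix.startswith("/"):
        # abs_path: relative to the origin server host name.
        return origin + uri_prefix
    # rel_path: relative to the directory of the Request-URI (assumed).
    directory = base.path[: base.path.rfind("/") + 1] or "/"
    return origin + directory + uri_prefix
```

   Plain string handling is used instead of a general URL-joining
   routine so that a trailing "?" in a prefix such as "/foo?" is
   preserved exactly.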

   The uniqueness scope of a strong entity tag in an ETag header field
   always includes the Request-URI of the corresponding request, and the
   union of all URIs matching one or more of the uri-prefix strings in
   the DCluster header field of the response.  It may include other URIs
   not described in a DCluster header field.  That is, the set of URIs
   for which a uri-prefix in a DCluster header field is a prefix MUST be
   a subset of the uniqueness scope, and MAY be a proper subset.

   Generally, the DCluster header does not necessarily describe the
   entire uniqueness scope of an entity tag.  Rather, it describes a
   subset of the uniqueness scope whose members are likely to differ by
   small deltas.

   A server SHOULD NOT include a uri-prefix in a DCluster header field
   if the server is not likely to be able to generate deltas between the
   Request-URI and the URIs matching that uri-prefix.

   The uniqueness scope specified by a DCluster header is valid for use
   by the client only for entity tags received in the same response or
   in subsequent responses, never for entity tags received in previous
   responses.

   Section 5.4 defines rules that a client uses for determining the set
   of base instances in the uniqueness scope of a request-URI.

5.3.2 DTemplate
   The DTemplate entity-header field is used in a response to specify
   another resource that the origin server prefers to use as the base
   instance for computing deltas for the Request-URI, or for other
   resources in the uniqueness scope specified by a DCluster header
   field in the response.

       DTemplate = "DTemplate" ":"
                        #( <"> dt-uri <"> [ "/" dt-param])
       dt-uri = absoluteURI | abs_path
       dt-param = "etag" "=" entity-tag

   If the dt-uri is an abs_path, it is interpreted relative to the
   origin server host name.
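
   A minimal, non-normative Python sketch of a parser for this grammar
   follows.  The function name is hypothetical, and the sketch assumes
   that neither the dt-uri nor the entity tag contains an escaped
   double quote and that no whitespace surrounds the "/" separator.

```python
import re

# One list element: a quoted dt-uri, optionally followed by /etag=<tag>,
# where the entity tag may carry the weak prefix "W/".
_DT_ITEM = re.compile(r'"([^"]*)"(?:/etag=((?:W/)?"[^"]*"))?')


def parse_dtemplate(value):
    """Return (dt-uri, entity-tag or None) pairs from a DTemplate value."""
    return [(m.group(1), m.group(2)) for m in _DT_ITEM.finditer(value)]
```

   For the header value used in section 4, this returns one pair whose
   first member is the template URI and whose second member is the
   quoted entity tag "pqr", or None when no etag parameter is present.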

   A URI specified in a DTemplate header field is, by definition, in the
   uniqueness scope of the Request-URI.

   If a client has received a DTemplate header field within a given
   uniqueness scope, the client SHOULD use an instance of the specified
   template resource(s) as the base instance for any future delta
   requests for other resources in the uniqueness scope.

   If the DTemplate header field includes an entity tag with a URI, then
   the client SHOULD use only the specified instance of the template
   resource as the base instance for any future delta requests for other
   resources in the uniqueness scope.

   The URI specified by a DTemplate header is valid for use by the
   client only with entity tags received in the same response or in
   subsequent responses, never for use with entity tags received in
   previous responses.
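
   The client-side preference described in this section might be
   sketched as follows.  This is an assumption-laden illustration, not
   a required algorithm: the function name, the shape of the templates
   list, and the cache mapping are all hypothetical, and the rule about
   entity tags received in previous responses is not modelled.

```python
def template_base_etags(templates, cache):
    """Select entity tags of cached template instances to offer as bases.

    templates: (uri, entity_tag_or_None) pairs taken from a DTemplate
    header field.
    cache: hypothetical mapping from URI to the set of entity tags of
    the instances the client currently holds for that URI.
    """
    chosen = []
    for uri, etag in templates:
        held = cache.get(uri, set())
        if etag is None:
            # No entity tag given: any held instance of the template
            # resource may serve as the base instance.
            chosen.extend(sorted(held))
        elif etag in held:
            # Entity tag given: only the named instance may be used.
            chosen.append(etag)
    return chosen
```

   A template that the client has never fetched contributes nothing;
   a client could choose to fetch it first, at the cost of an extra
   request.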

5.4 Rules for determining base instances in a uniqueness scope
   When a client is about to make a request for a given Request-URI, and
   wishes to choose entity tags to include in the request's
   If-None-Match header field, it follows a set of rules to determine
   which base instances (and hence, which entity tags) may be included.
   These rules do not
   require the client to include any entity tags, and for reasons of
   performance, a client implementation should not necessarily include
   all of the legal choices.

   Recall that the uniqueness scope of an entity tag is the set of
   resources across which this entity tag is unique for all time.  In
   other words, if the client and server correctly agree that the
   Request-URI is contained in the uniqueness scope for an entity tag E
   for some URI X, then if the client sends this entity tag E in an
   If-None-Match header field, the server will know unambiguously which
   resource it refers to (even though X is not explicitly named in the
   request).

   The client's view of the uniqueness scope of an entity tag might be a
   subset of the server's view.  (It cannot be a superset, or the server
   would be unable to interpret the If-None-Match field.)  For example,
   a server might not list all possible uri-prefix values in a DCluster
   header, for performance reasons, or the client might not support the
   DTemplate header.  A client probably will not have received responses
   for more than a small subset of the URIs in a uniqueness scope, or it
   might have deleted some of the instances in order to create space in
   its cache.  A client SHOULD NOT list an entity tag in an
   If-None-Match header unless it has a cache entry containing at least
   part of the corresponding instance, since this would otherwise lead
   to uninterpretable delta responses.
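
   A minimal sketch of this precaution, assuming a hypothetical cache
   keyed by entity tag, is shown below.  The function name and the
   limit parameter are illustrative; the limit merely reflects that a
   client need not send every legal entity tag.

```python
def if_none_match_value(candidate_etags, cache, limit=4):
    """Build an If-None-Match field value from candidate entity tags.

    cache is a hypothetical mapping from entity tag to the cached bytes
    (possibly partial) of the corresponding instance.  Tags with no
    cached bytes are dropped, since a delta against them could not be
    applied.
    """
    usable = [tag for tag in candidate_etags if cache.get(tag)]
    return ", ".join(usable[:limit])
```

   An empty result means that the client has no usable base instance
   and would omit the If-None-Match header field.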

   A Request-URI is in the uniqueness scope of an entity tag E for an
   instance of URI X if one or more of these conditions holds:

      1. X is the Request-URI.

      2. The DCluster header field of a prior response for the
         Request-URI includes a prefix of X. The base instance
         associated with entity tag E MUST NOT have been received
         before the first such DCluster header field.

      3. The DCluster header field of a prior response for X
         includes a prefix of the Request-URI.  The base instance
         associated with entity tag E MUST NOT have been received
         before the first such DCluster header field.

      4. X has been listed in the DTemplate header field of a prior
         response for the Request-URI, or of a prior response for
         another URI Y in the uniqueness scope of the Request-URI
         (by recursive application of these conditions to an
         instance of URI Y).

   XXX Is this unambiguous?
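
   One possible reading of the four conditions is sketched below in
   Python.  The Response record, the seq ordering, and the function
   names are hypothetical; the recursion in condition 4 is interpreted
   as applying to the instance of URI Y that carried the DTemplate
   header field, and the timing rule for DTemplate is taken from
   section 5.3.2.  The security restriction on condition 3 is not
   modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    seq: int               # order in which the client received it
    uri: str               # Request-URI of the response
    dcluster: tuple = ()   # uri-prefix values from its DCluster header
    dtemplate: tuple = ()  # URIs from its DTemplate header


def first_cluster_seq(history, response_uri, target_uri):
    """seq of the first response for response_uri whose DCluster header
    includes a prefix of target_uri, or None if there is none."""
    seqs = [r.seq for r in history
            if r.uri == response_uri
            and any(target_uri.startswith(p) for p in r.dcluster)]
    return min(seqs) if seqs else None


def in_scope(history, request_uri, x, received_seq, _seen=frozenset()):
    """True if request_uri is in the uniqueness scope of the instance of
    URI x that the client received at position received_seq."""
    if x == request_uri:                                        # condition 1
        return True
    for first in (first_cluster_seq(history, request_uri, x),   # condition 2
                  first_cluster_seq(history, x, request_uri)):  # condition 3
        # The base instance must not predate the first such header.
        if first is not None and received_seq >= first:
            return True
    for r in history:                                           # condition 4
        if x not in r.dtemplate or received_seq < r.seq:
            continue
        if r.uri == request_uri:
            return True
        key = (r.uri, r.seq)  # guard against cyclic template references
        if key not in _seen and in_scope(history, request_uri, r.uri,
                                         r.seq, _seen | {key}):
            return True
    return False
```

   The _seen argument exists only to guarantee termination when
   DTemplate header fields refer to one another.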

   Security considerations (see section 6.1) require that a client not
   always trust every DCluster header that it receives.  A malicious
   server might send a DCluster header that could cause the client to
   believe that a URI is within the uniqueness scope of an entity tag
   when, in fact, it is not.  Therefore, a client MUST NOT use condition
   #3 above (DCluster of a prior response for X includes prefix of
   Request-URI) unless it can securely verify that a resulting delta is
   not spoofed.

   Our current belief is that spoofing can be detected by any one of the
   following means:

      - The delta-encoded response is accompanied by a secure
        message digest covering the entire current instance,
        generated by the origin server.  This allows the client to
        verify that it has received the current instance of the
        Request-URI.

      - All of the URIs in the uniqueness scope of the Request-URI
        have the same "hostport" as the Request-URI; see
        RFC2396 [3] for the specification of this term.  This
        ensures that, if no interception mechanism is in use, the
        client receives what the server wishes it to receive.
        (In general, malicious interception mechanisms create
        broader risks than the spoofing of deltas.)

      - All of the base instances associated with the entity tags
        listed in the client's If-None-Match header came from URIs
        listed in DCluster or DTemplate headers in responses for
        prior Request-URIs having the same "hostport" as the current
        Request-URI.  This ensures that the chosen base instances
        came from origin servers trusted by the origin server for
        the current Request-URI.

      Note: the spoofing detection mechanisms listed above should be
      reviewed by competent security experts.
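
   The three means listed above could be combined into a single
   client-side test, sketched here under several assumptions: all URIs
   are absolute, "hostport" is compared as a (host, port) pair without
   normalising default ports, and the function and parameter names are
   hypothetical.  Like the list itself, this sketch has not been
   reviewed by security experts.

```python
from urllib.parse import urlsplit


def hostport(uri):
    """(host, port) of an absolute URI; port is None when not given."""
    parts = urlsplit(uri)
    return (parts.hostname, parts.port)


def spoofing_detectable(request_uri, has_instance_digest, scope_uris,
                        base_instance_sources):
    """Return True if at least one of the three listed means applies.

    scope_uris: URIs the client believes share the uniqueness scope.
    base_instance_sources: for each chosen base instance, the
    Request-URI of the response whose DCluster or DTemplate header
    listed it.
    """
    here = hostport(request_uri)
    # 1. A secure digest of the whole current instance accompanies the
    #    delta-encoded response.
    if has_instance_digest:
        return True
    # 2. Every URI in the uniqueness scope shares the same hostport.
    if all(hostport(u) == here for u in scope_uris):
        return True
    # 3. Every chosen base instance was named by a response from the
    #    same hostport as the current Request-URI.
    return bool(base_instance_sources) and all(
        hostport(u) == here for u in base_instance_sources)
```

   Note that the first test can only be evaluated once the response
   has arrived, whereas the other two can be evaluated before the
   request is sent.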


6 Security Considerations

      Note: This aspect of the specification is the subject of some
      controversy, and the details of protections against spoofing
      attacks in particular are likely to change.  We will seek a
      more formal security review of this specification as part of
      the IETF standardization process.

6.1 Spoofing attacks using the DCluster header
   We have identified a potential spoofing attack via the "DCluster"
   header.  In this scenario, a malicious server (e.g.,
   malicious.example.org) generates a response (e.g., for
   http://malicious.example.org/trap.html) with a "DCluster" header
   indicating that the uniqueness scope of the entity tag in the
   response includes another server (e.g., victim.example.com).  Suppose
   that the response includes the entity tag "abc".  Now suppose that
   the client makes this request:

      GET /foo.html HTTP/1.1
      Host: victim.example.com
      If-None-Match: "abc"
      A-IM: vcdiff

   If the victim.example.com server does actually have an instance with
   entity tag "abc", either for http://victim.example.com/foo.html or
   for a resource that really is in the same uniqueness scope, then the
   server will generate a delta.  However, if the client applies this
   delta to the cached response for
   http://malicious.example.org/trap.html, it will end up either with
   garbage, or (more perniciously) with an apparently genuine result
   that actually contains bogus information inserted by
   malicious.example.org.  (The response for
   http://malicious.example.org/trap.html might contain the bogus
   information concealed in HTML comments.)

   Protection against this attack can be accomplished by the use of
   end-to-end digests on the instances, as described in another
   proposal [11].  (Message digests, such as those provided by
   "Content-MD5" or by Digest Authentication, are not sufficient, since
   none of the individual messages are tampered with in this attack.)

   Note that protection against spoofing via the "DCluster" header
   does not inherently require a keyed digest.  Since the delta-encoded
   response for http://victim.example.com/foo.html is not itself
   generated by malicious.example.org, an end-to-end digest included
   with this response by victim.example.com is sufficient to prove that
   the client's reconstruction of foo.html is correct.  However, if
   message tampering is also a possibility, then the server should also
   provide a keyed message digest.
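
   The client-side check implied here can be sketched as follows.  The
   "sha-256=<base64>" value format and the choice of SHA-256 are
   assumptions made for illustration and are not taken from [11]; the
   function name is likewise hypothetical.

```python
import base64
import hashlib


def instance_matches_digest(reconstructed, digest_value):
    """Check reconstructed instance bytes against an end-to-end digest.

    reconstructed: the bytes obtained by applying the delta to the
    chosen base instance.
    digest_value: assumed to look like "sha-256=<base64 of the digest>".
    """
    algorithm, _, encoded = digest_value.partition("=")
    if algorithm.strip().lower() != "sha-256":
        return False  # unknown algorithm: treat the result as unverified
    expected = base64.b64decode(encoded)
    return hashlib.sha256(reconstructed).digest() == expected
```

   If the client had applied the delta to the cached response for
   trap.html rather than to a genuine base instance, the reconstructed
   bytes would not match the digest supplied by victim.example.com,
   and the result could be discarded.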

   Another defense against such an attack is for the client to ignore a
   "DCluster" header that specifies a different server.  However, this
   defense is only effective if servers that generate delta-encoded
   responses are not shared among multiple, possibly mutually
   untrustworthy, content providers.  It also reduces the potential
   effectiveness of clustering, especially for large sites split across
   multiple servers.

   Note that because the DTemplate header field also adds one or more
   URIs to the uniqueness scope of an entity tag, the same spoofing
   attack is possible using the DTemplate header, and the same defenses
   apply.

   We recommend that if a client receives a delta-encoded response
   without an accompanying Digest, and if the client's view of the
   uniqueness scope for the Request-URI includes more than one server
   hostname, then the response should either be discarded, or presented
   to the user as potentially corrupt.

6.2 Privacy attacks using the DCluster header
   Many people have drawn attention to the privacy risks associated with
   HTTP Cookies, which allow a site (or group of cooperating sites) to
   track the activity of a user.  More recently, Martin Pool has
   identified a similar tracking mechanism based on cache validators,
   especially entity tags [12].  In this attack, a site encodes
   user-specific information in an entity tag, and then tracks repeated
   requests by that user to the same resource, as the user's browser
   attempts to validate its cache entry using that entity tag.

   Although this tracks only the requests for a specific resource (URL),
   a site can indirectly track references to many other pages by
   embedding an image reference to the tracked URL on each of those
   pages.

   Just as with Cookies, the entity-tag tracking mechanism depends upon
   the server's ability to induce the client to send back a specific
   string on subsequent requests.  However, the basic entity-tag
   tracking mechanism only allows a site to track access to pages that
   it controls.

   The "DCluster" header field specified in this document makes this
   tracking mechanism more powerful, by allowing one site to gain access
   to entity tags from many other sites.  For example, suppose that the
   site evil.example.com knows the format used to encode client-specific
   information in entity tags issued by the site naive.example.com.  Any
   client who visits http://evil.example.com/home.html and receives a

       DCluster: http://naive.example.com/

   header in response might then later make a delta-capable request to
   evil.example.com that includes entity tags issued by
   naive.example.com.

   It might be possible to defend against such "hijacked" tracking
   attacks by choosing a cryptographically strong encoding for the
   client-specific data hidden in entity tags, but this might not always
   be feasible.  In any event, this could not hide from evil.example.com
   the fact that the client had at some point visited naive.example.com
   (which could be significant if this site provided, for example,
   medical information about an embarrassing disease).

   Cryptographic digests of instances, as described in section 6.1 to
   protect against DCluster spoofing, do not help, because the malicious
   site in this case is the source of the requested data, and need not
   actually use a delta encoding to accomplish its attack.

   As in section 6.1, one possible defense is for the client to ignore a
   "DCluster" header that specifies a different server, but (also as
   discussed in section 6.1) this is not ideal.

   User agents SHOULD provide a method to allow users to disable the use
   of the "DCluster" header, preferably either in all cases, or in
   cross-site cases.
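
   Such a user setting might be applied as sketched below.  The
   function name and the three values of the disable parameter are
   hypothetical, servers are compared as (host, port) pairs without
   normalising default ports, and a uri-prefix with no host part is
   assumed to refer to the responding server.

```python
from urllib.parse import urlsplit


def usable_prefixes(request_uri, uri_prefixes, disable="cross-site"):
    """Filter DCluster uri-prefix values according to a user setting.

    disable (hypothetical values):
      "all"        - ignore every uri-prefix
      "cross-site" - ignore prefixes that name a different server
      "never"      - keep every uri-prefix
    """
    if disable == "all":
        return []
    if disable == "never":
        return list(uri_prefixes)
    origin = urlsplit(request_uri)
    here = (origin.hostname, origin.port)
    kept = []
    for prefix in uri_prefixes:
        parts = urlsplit(prefix)
        # Keep prefixes without a host part and those on the same server.
        if not parts.netloc or (parts.hostname, parts.port) == here:
            kept.append(prefix)
    return kept
```

   With the "cross-site" setting, the http://naive.example.com/ prefix
   in the example above would be dropped from a response for
   http://evil.example.com/home.html.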

6.3 Data leakage attacks using the DCluster header
   Suppose that a server has asserted, using a DCluster header, that
   resources URL1 and URL2 are in the same uniqueness scope.  Also
   suppose that a client is allowed to access URL1, but is not allowed
   to access URL2.  (Access may be denied due to a lack of
   authentication, or a server configuration setting, or some other
   mechanism.)  Finally, suppose that the client can guess or obtain the
   entity tag ET2 of some instance of URL2.  If the client asks the
   server for the current instance of URL1 as a delta from the ET2
   instance of URL2, and the server responds with such a delta, this may
   reveal information about the contents of URL2.  (The amount of
   information revealed depends strongly on the delta-coding format, and
   probably will not be enough to recover the full contents of URL2.)

   A server MUST NOT reply using a delta encoding, if the chosen base
   instance is not an instance of the Request-URI, unless the server can
   verify that the client would currently be allowed access to both the
   chosen base instance and the Request-URI.
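
   On the server side, this requirement amounts to a guard of roughly
   the following shape.  The function name and the client_can_access
   callable are hypothetical; the latter stands in for whatever
   authorization check the server already performs for the current
   request.

```python
def may_send_delta(request_uri, base_uri, client_can_access):
    """Decide whether a delta against an instance of base_uri is allowed.

    client_can_access: hypothetical callable that takes a URI and
    returns True if the requesting client is currently allowed to
    access it.
    """
    if base_uri == request_uri:
        # The base is an instance of the Request-URI itself, so the
        # cross-resource restriction does not apply.
        return True
    # Cross-resource delta: access to both resources must be verified.
    return client_can_access(base_uri) and client_can_access(request_uri)
```

   When the guard fails, the server can fall back to sending the
   current instance without a delta encoding.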


7 History

7.1 draft-mogul-http-dcluster-00.txt
   This document was split off from draft-mogul-http-delta-*.txt, to
   avoid having the security issues affect the basic HTTP delta encoding
   specification, and to ensure that the design of clusters and
   templates was done so that they are entirely optional for
   implementors of basic delta encoding.


8 Acknowledgements

   Andrew Birrell alerted us to the possibility of data leakage attacks
   using the DCluster header.  Koen Holtman contributed to the drafting
   of this document, and especially to the security considerations and
   mechanisms.


9 References

   NOTE TO RFC EDITOR: many of the references here might be out of date.
   Please verify these with the primary author of this Internet-Draft
   before issuing this document as an RFC.

   1.  Gaurav Banga, Fred Douglis, and Michael Rabinovich.  Optimistic
   Deltas for WWW Latency Reduction.  Proc. 1997 USENIX Technical
   Conference, Anaheim, CA, January, 1997, pp. 289-303.

   2.  T. Berners-Lee, R. Fielding, and H. Frystyk.  Hypertext Transfer
   Protocol -- HTTP/1.0.  RFC 1945, HTTP Working Group, May, 1996.

   3.  T. Berners-Lee, R. Fielding, and L. Masinter.  Uniform Resource
   Identifiers (URI): Generic Syntax.  RFC 2396, IETF, August, 1998.

   4.  S. Bradner.  Key words for use in RFCs to Indicate Requirement
   Levels.  RFC 2119, Harvard University, March, 1997.

   5.  Fred Douglis, Antonio Haro, and Michael Rabinovich.  HPP: HTML
   Macro-Preprocessing to Support Dynamic Document Caching.  Proc.
   USENIX Symposium on Internet Technologies and Systems, USENIX,
   Monterey, CA, December, 1997, pp. 83-94.

   6.  Roy T. Fielding, Jim Gettys, Jeffrey C. Mogul, Henrik Frystyk
   Nielsen, Larry Masinter, Paul Leach, and Tim Berners-Lee.  Hypertext
   Transfer Protocol -- HTTP/1.1.  RFC 2616, HTTP Working Group, June,
   1999.

   7.  Barron C. Housel and David B. Lindquist.  WebExpress: A System
   for Optimizing Web Browsing in a Wireless Environment.  Proc. 2nd
   Annual Intl. Conf. on Mobile Computing and Networking, ACM, Rye, New
   York, November, 1996, pp. 108-116.
   http://www.networking.ibm.com/art/artwewp.htm.

   8.  Jeffrey C. Mogul, Balachander Krishnamurthy, Fred Douglis, Anja
   Feldmann, Yaron Goland, and Arthur van Hoff.  Delta encoding in HTTP.
   Internet-Draft draft-mogul-http-delta-06, IETF, August, 2000. This is
   a work in progress.

   9.  Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander
   Krishnamurthy.  Potential benefits of delta encoding and data
   compression for HTTP.  Proc. SIGCOMM '97, Cannes, France, September,
   1997, pp. 181-194.

   10.  Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander
   Krishnamurthy.  Potential benefits of delta encoding and data
   compression for HTTP.  Research Report 97/4, DECWRL, July, 1997. URL
   http://www.research.digital.com/wrl/techreports/abstracts/97.4.html.

   11.  Jeffrey C. Mogul and Arthur Van Hoff.  Instance Digests in HTTP.
   Internet-Draft draft-mogul-http-digest-02, IETF, March, 2000. This is
   a work in progress.

   12.  Martin Pool.  meantime: non-consensual http user tracking using
   caches.  http://www.linuxcare.com.au/mbp/meantime/.


10 Authors' addresses

   Jeffrey C. Mogul
   Western Research Laboratory
   Compaq Computer Corporation
   250 University Avenue
   Palo Alto, California, 94305, U.S.A.
   Email: mogul@pa.dec.com
   Phone: 1 650 617 3304 (email preferred)

   Fred Douglis
   AT&T Labs - Research
   180 Park Ave, Room B-137
   Florham Park, NJ 07932-0971, U.S.A.
   Email: douglis@research.att.com
   Phone: 1 973 360-8775

   Daniel M. Hellerstein
   Economic Research Service, USDA
   1909 Franwall Ave, Wheaton MD 20902
   E-mail: danielh@crosslink.net or webmaster@srehttp.org
   Phone: 1 202 694-5613 or 1 301 649-4728