Network Working Group                             Jeffrey Mogul, DECWRL,
Internet-Draft                                  Arthur van Hoff, Marimba
Expires: 15 October 1998                                   15 April 1998




                     Duplicate Suppression in HTTP

                     draft-mogul-http-dupsup-00.txt


STATUS OF THIS MEMO

        This document is an Internet-Draft. Internet-Drafts are
        working documents of the Internet Engineering Task Force
        (IETF), its areas, and its working groups. Note that other
        groups may also distribute working documents as
        Internet-Drafts.

        Internet-Drafts are draft documents valid for a maximum of
        six months and may be updated, replaced, or obsoleted by
        other documents at any time. It is inappropriate to use
        Internet-Drafts as reference material or to cite them other
        than as "work in progress."

        To learn the current status of any Internet-Draft, please
        check the "1id-abstracts.txt" listing contained in the
        Internet-Drafts Shadow Directories on ftp.is.co.za
        (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific
        Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US
        West Coast).

        Distribution of this document is unlimited.  Please send
        comments to the authors.


ABSTRACT

        A significant fraction of Web content is often exactly
        duplicated under several different URIs.  This duplication
        can lead to suboptimal use of network bandwidth, and
        unnecessary latency for users.  Much of this duplication
        can be avoided through the use of a simple mechanism,
        described here, which allows a cache to efficiently
        substitute one byte-for-byte identical value for another.
        By doing so, the cache avoids some or all of the network
        costs associated with retrieving the duplicate value.







Mogul, van Hoff                                                 [Page 1]


Internet-Draft         HTTP duplicate suppression    15 April 1998 18:19


                           TABLE OF CONTENTS

1 Introduction                                                         2
2 Terminology                                                          3
3 Goals                                                                4
4 Overview                                                             5
     4.1 Scenario                                                      5
     4.2 Requirements for a duplicate-suppression mechanism            6
     4.3 How instance bodies may be uniquely indicated                 7
          4.3.1 Use of cryptographic digests                           8
          4.3.2 Use of Rabin fingerprints                              8
          4.3.3 Use of entity tags and uniqueness scopes               9
          4.3.4 Other forms of indicia                                10
     4.4 Cache entry freshness                                        10
     4.5 Message headers                                              11
     4.6 Server hints                                                 12
     4.7 Other semantic issues                                        13
          4.7.1 Is SubOK hop-by-hop?                                  13
          4.7.2 Response caching in intermediate proxies              13
          4.7.3 Conditional GETs                                      14
          4.7.4 Interaction with Hit-metering                         14
          4.7.5 Interaction with compression and delta-encoding       16
          4.7.6 Interaction with range retrievals                     16
          4.7.7 Other points                                          17
5 Specification                                                       17
     5.1 Protocol parameter specifications                            17
          5.1.1 Indicia schemes                                       17
          5.1.2 Indicia values                                        17
     5.2 Header specifications                                        17
          5.2.1 SubOK                                                 18
          5.2.2 Subst                                                 19
6 IANA Considerations                                                 19
7 Security Considerations                                             19
     7.1 Manipulation of instance-bodies                              20
     7.2 Manipulation of a uniqueness scope                           20
     7.3 False association of an indicia with a URI                   22
8 Acknowledgements                                                    22
9 References                                                          22
10 Authors' addresses                                                 24


1 Introduction

   A significant fraction of Web content is often exactly duplicated
   under several different URIs.  One trace-based study found that 18%
   of the non-empty message-bodies were identical to at least one other
   message-body for a different resource [4].  Another study showed
   that, of 30 million HTML and text documents found by a search-engine
   crawler, more than 5.3 million (18%) were identical duplicates [3].



   Content can be duplicated for many reasons.  The observed rate of
   duplication of HTML and text documents suggests that many
   duplications are made instead of hyperlinking to a remote copy of a
   document.  Many graphical elements, such as logos, bullets,
   backgrounds, bars, buttons, etc. are duplicated so that a particular
   page can be displayed without depending on the simultaneous
   availability of several Web servers.  Push-based distribution
   mechanisms often send the same content to many users.  The automatic
   distribution of software packages via the Web can lead to duplicated
   distribution of the same program or program components (such as a
   library).

   This duplication can lead to suboptimal use of network bandwidth, and
   unnecessary latency for users.  Each time a duplicate is loaded
   across the network instead of from a local or nearby cache, network
   bandwidth is wasted.  Such retrievals may also increase the latency
   seen by the ultimate client, especially if the message is large or
   the available bandwidth is small.

   This document describes a simple, optional, efficient, and compatible
   extension to HTTP/1.1 that can be used to avoid much of this
   duplication.  In particular, it allows a client to inform a proxy
   cache that the client will accept either a response for the requested
   resource, or the substitution of the cached response from a different
   resource whose entity-body is, with extremely high probability,
   byte-for-byte identical to that of the requested resource.  By
   performing the substitution, the cache avoids some or all of the
   network costs associated with retrieving the duplicate value.


2 Terminology

   HTTP/1.1 [5] defines the following terms:

   resource         A network data object or service that can be
                   identified by a URI, as defined in section 3.2.
                   Resources may be available in multiple
                   representations (e.g. multiple languages, data
                   formats, size, resolutions) or vary in other ways.

   entity           The information transferred as the payload of a
                   request or response.  An entity consists of
                   metainformation in the form of entity-header fields
                   and content in the form of an entity-body, as
                   described in section 7.

   variant          A resource may have one, or more than one,
                   representation(s) associated with it at any given
                   instant. Each of these representations is termed a
                   `variant.' Use of the term `variant' does not
                   necessarily imply that the resource is subject to
                   content negotiation.

   The dictionary definition for ``entity'' is ``something that has
   separate and distinct existence and objective or conceptual
   reality'' [11].  Unfortunately, the definition for ``entity'' in
   HTTP/1.1 is similar to that used in MIME [7], based on an entirely
   false analogy between MIME and HTTP.

   In MIME, electronic mail messages do have distinct and separate
   existences, so the MIME definition of ``entity'' as something that
   ``refers specifically to the MIME-defined header fields and contents
   of either a message or one of the parts in the body of a multipart
   entity'' makes sense.

   In HTTP, however, a response message to a GET does not have a
   distinct and separate existence.  Rather, it is describing the
   current state of a resource (or a variant, subject to a set of
   constraints).  The HTTP/1.1 specification provides no term to
   describe ``the value that would be returned in response to a GET
   request at the current time for the selected variant of the specified
   resource.''  This leads to awkward wordings in the HTTP/1.1
   specification in places where this concept is necessary.

   It is too late to fix the terminological failure in the HTTP/1.1
   specification, so we instead define a new term, for use in this
   document:

   instance         The entity that would be returned in a status-200
                   response to a GET request, at the current time, for
                   the selected variant of the specified resource, but
                   without the application of any content-coding or
                   transfer-coding.

   One can think of an instance as a snapshot in the life of a resource.

   It is convenient to think of an entity tag, in HTTP/1.1, as being
   associated with an instance, rather than an entity.  That is, for a
   given resource, two different response messages might include the
   same entity tag, but two different instances of the resource should
   never be associated with the same (strong) entity tag.


3 Goals

   The goals of this proposal are:

      1. Allow substitution of a cached response for a response of
         a requested resource, when the cached instance body is
         byte-for-byte identical to that of the requested instance,
         and when the URIs for the two resources do not match.

      2. Interoperate with all HTTP/1.1-compliant implementations.


      3. Add minimal overhead to HTTP messages.

      4. Add minimal execution time for HTTP implementations.

      5. Be entirely optional for any implementation and at any
         time.

   The goals do not include:

      - Compression of HTTP entity bodies; this is supported in the
        basic HTTP protocol.

      - Delta encoding (or ``differential'' download) of an
        instance body that differs slightly from an entry in the
        client's own cache; this is supported in an independent
        extension [13].

      - Providing a mechanism for clients to discover the entity
        tag, checksum, digest, or other instance-specific
        information that allows the accurate specification of
        substitutable instance bodies.  This information can be
        provided by numerous techniques; the extension described in
        this document assumes the existence of one or more such
        techniques.


4 Overview

4.1 Scenario
   We start by describing part of a scenario in which duplicate
   suppression might be used.  Imagine that an HTTP client has obtained,
   by some means (we will address this later), the MD5 digest for the
   body of a resource instance that the client wishes to retrieve.
   Imagine further that the client is using a caching proxy, and that
   the client has some reason to believe that the resource may be
   duplicated in the Web.

   To be specific, assume that the resource is http://foo.com/logo.gif,
   and that the known MD5 digest is "HUXZLQLMuI/KZ5KDcJPcOA==".  The
   client sends this request to the proxy:

      GET http://foo.com/logo.gif HTTP/1.1
      Host: foo.com
      SubOK: md5="HUXZLQLMuI/KZ5KDcJPcOA==", inform

   The meaning of this request is ``I want the value of
   http://foo.com/logo.gif, but you can substitute any response whose
   MD5 instance digest is HUXZLQLMuI/KZ5KDcJPcOA==.  Please inform me of
   any substitution, though.''  (Instance digests are described in
   another document [15].)


   If the cache has a fresh copy of the requested resource, it simply
   returns that to the client, and ignores the SubOK header field.
   Otherwise, the proxy may then check its cache to see if any cached
   instance has the right MD5 digest.  If not, it simply forwards the
   request as it normally would.

   However, if the proxy does find a cached instance of another resource
   with the specified MD5 digest, it may return the cached resource to
   the client.

   For example, the proxy might return:

      HTTP/1.1 200 OK
      Date: Thu, 29 Jan 1998 17:47:55 GMT
      Age: 37
      Etag: "xyzzy"
      Content-Type: image/gif
      Subst: http://bar.com/foo_logo.gif

   The Subst header field is added because the client used the
   ``inform'' directive; it tells the client where the response actually
   originated.
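
   The proxy behavior in this scenario can be sketched as follows.
   This is a minimal illustration, not part of the proposal: the
   names CacheEntry, ProxyCache, handle_get, and forward are
   hypothetical, the SubOK value is assumed to be already parsed into
   a digest string and an ``inform'' flag, and the sketch does not
   consult the freshness of the substitute entry.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[Dict[str, str], bytes]


@dataclass
class CacheEntry:
    uri: str
    md5_b64: str              # base64 MD5 digest of the instance body
    headers: Dict[str, str]
    body: bytes
    fresh: bool = True


@dataclass
class ProxyCache:
    by_uri: Dict[str, CacheEntry] = field(default_factory=dict)
    by_md5: Dict[str, CacheEntry] = field(default_factory=dict)

    def store(self, entry: CacheEntry) -> None:
        self.by_uri[entry.uri] = entry
        self.by_md5[entry.md5_b64] = entry

    def handle_get(self, uri: str, subok_md5: Optional[str], inform: bool,
                   forward: Callable[[str], Response]) -> Response:
        # 1. A fresh copy of the requested resource wins; SubOK is ignored.
        own = self.by_uri.get(uri)
        if own is not None and own.fresh:
            return dict(own.headers), own.body
        # 2. Otherwise look for any cached instance with the given digest.
        dup = self.by_md5.get(subok_md5) if subok_md5 else None
        if dup is not None:
            headers = dict(dup.headers)
            if inform and dup.uri != uri:
                headers["Subst"] = dup.uri   # tell the client what was used
            return headers, dup.body
        # 3. No match: forward the request as it normally would be.
        return forward(uri)


cache = ProxyCache()
cache.store(CacheEntry("http://bar.com/foo_logo.gif",
                       "HUXZLQLMuI/KZ5KDcJPcOA==",
                       {"Content-Type": "image/gif"}, b"GIF89a"))
```

   With this cache content, a request for http://foo.com/logo.gif
   carrying the digest above is answered from the bar.com entry, and
   a request with an unknown digest falls through to forward().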

   Note that there is an assumption behind this scenario, that if the
   client receives the ``right'' instance body, it does not care if it
   receives the ``wrong'' headers.  In most cases, this is not an issue.
   Either the header is pretty much guaranteed to be the same for the
   original and for the substitute (e.g., it is unlikely that the
   Content-Type would differ if the instance body is the same), or it is
   not consequential to the use of the response (e.g., the Date header
   field).  However, if the client does want the actual headers from the
   requested URI, then we provide an optional mechanism, to be described
   later, that makes this possible.

4.2 Requirements for a duplicate-suppression mechanism
   To generalize from this scenario, a mechanism for duplicate
   suppression needs:

      1. A way for the client to indicate that it is willing to
         accept substitutions.

      2. A way for the client to concisely and reliably indicate
         what instance body value it wants to receive.

   Both requirements are met by the SubOK header field, which allows a
   client to indicate the circumstances under which it will accept
   substitutions.  It also allows the client to give various directives
   regarding the behavior of the proxy cache.  The full specification of
   the SubOK header field is given in section 5.2.1.



   It is important to understand that the client will be using a
   mechanism not described in this document to obtain a value, such as
   an instance digest, that defines the specific instance body being
   retrieved.  We will refer to such a value as an ``indicia'', ``an
   identifying marking ... used to single out one thing from
   another'' [12] (the plural can be either ``indicia'' or
   ``indicias'').

   Several such mechanisms are available:

      - The HTTP Distribution and Replication Protocol (DRP),
        proposed to W3C by Marimba, Netscape, Sun, Novell, and At
        Home, aims to provide a collection of new features for
        HTTP, to support ``the efficient replication of data over
        HTTP'' [9].  DRP includes an ``index'' data structure, to
        ``describe the exact state of a set of data files.''  A DRP
        index can include attributes, such as an entity tag,
        instance digest, or URN, associated with a given URI.

      - Similarly, the proposed ``Extensions for Distributed
        Authoring on the World Wide Web'' (WEBDAV) [8] includes a
        method for obtaining arbitrary properties of a resource,
        including its entity tag, or for a collection of resources.
        The WEBDAV property mechanism appears to be easily extended
        to other types of information about resources.

      - Several researchers have proposed that Web servers can
        provide, in a response about one URI, some ``hint''
        information pertaining to other resources.  For example, in
        a ``predictive prefetching'' mechanism [18], the server can
        suggest other URIs that the client application might
        beneficially prefetch.  Or a server might indicate which
        cached responses are still valid or invalid [10].  In
        either case, the ``hint'' information could convey an
        entity tag or digest for the current instance(s) of the
        relevant resource(s).  This information would not only
        prevent the client from attempting to retrieve something it
        already has a fresh copy of; it could also be used in the
        duplicate suppression mechanism.

4.3 How instance bodies may be uniquely indicated
   The second requirement for the SubOK header is that the client can
   use some form of indicia to ``concisely and reliably indicate'' to a
   proxy cache what instance body it wants to receive.  Generally, a URL
   is unacceptable as an indicia, because the binding between a URL and
   an instance may vary both with time (as the resource is modified) and
   with arbitrary parameters of the request, such as the
   ``Accept-Language'' header.  (While in principle the client could
   discover which request fields the desired response varies on, in
   practice this is likely to be too complicated.)


   The two obvious candidates for indicia are cryptographic instance
   digests and strong entity tags.

4.3.1 Use of cryptographic digests
   A cryptographic digest algorithm, such as MD5 or SHA, takes a
   sequence of bytes (i.e., an instance body) as its input, and produces
   a short bit-string as its output.  Such algorithms have two
   attractive properties:

      - The output form is concise (24 octets for MD5, 28 octets
        for SHA, using the somewhat inefficient base64 encoding).

      - The probability that two inputs will generate the same
        output (``collide'') is extremely small.

   Although the probability of a collision is non-zero, a carefully
   constructed digest algorithm should reduce this probability to a
   point far below, for example, the probability of undetected TCP data
   corruption.  For MD5, ``[it] is conjectured that the difficulty of
   coming up with two messages having the same message digest is on
   the order of 2^64 operations'' [20].  It is widely
   assumed that the output of MD5 is ``close to random,'' so the
   probability of an MD5 collision between two distinct instance bodies
   picked at random should be quite small.
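
   The digest sizes quoted above can be checked with a short sketch.
   It assumes that ``SHA'' refers to the 160-bit SHA-1 algorithm and
   uses Python's standard hashlib and base64 modules; the helper name
   b64_digest and the sample body are illustrative only.

```python
import base64
import hashlib


def b64_digest(algorithm: str, instance_body: bytes) -> str:
    """Return the base64 text form of a digest of the instance body."""
    raw = hashlib.new(algorithm, instance_body).digest()
    return base64.b64encode(raw).decode("ascii")


body = b"example instance body"   # stand-in for real instance bytes
for name in ("md5", "sha1"):
    encoded = b64_digest(name, body)
    # 16 raw octets encode to 24 characters, 20 raw octets to 28.
    print(name, len(encoded), encoded)
```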

4.3.2 Use of Rabin fingerprints
   Another similar way to produce appropriate indicia values is to use
   the Rabin ``fingerprint'' method [19].  A Rabin fingerprint is a
   polynomial checksum with the property that if two files are
   different, the probability that their fingerprints are equal (a
   ``collision'') is extremely low.  More precisely, in a universe with
   N documents of mean length L bits, using k-bit fingerprints, the
   expected number of collisions is (N^2)*L/(2^k).

   For example, if one assumes that the entire web contains 1
   quadrillion (10^15) files, and that the mean length is 1 megabyte
   (about 2^23 bits), and one uses a 128-bit fingerprint, then the
   expected number of collisions, which bounds the probability of even
   one collision, is

       ((10^15)^2)*(2^23)/(2^128) = (10^30)/(2^105) = 0.025

   Note that any given proxy cache would presumably contain a much
   smaller number of cache entries at any given time, and so the actual
   probability of a collision in the duplicate-suppression algorithm
   would be infinitesimal.
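
   The arithmetic above can be reproduced directly from the formula
   (N^2)*L/(2^k).  In this sketch the function name is ours, and the
   cache size of ten million entries is an assumed figure chosen only
   to illustrate the point about a single proxy cache.

```python
from fractions import Fraction


def expected_collisions(n_documents: int, mean_length_bits: int,
                        fingerprint_bits: int) -> Fraction:
    """Evaluate (N^2)*L/(2^k) exactly, using rational arithmetic."""
    return Fraction(n_documents ** 2 * mean_length_bits,
                    2 ** fingerprint_bits)


# Whole-web example from the text: N = 10^15, L = 2^23 bits, k = 128.
web = expected_collisions(10 ** 15, 2 ** 23, 128)
print(float(web))

# A single proxy cache with an assumed 10^7 entries of the same mean size.
cache_bound = expected_collisions(10 ** 7, 2 ** 23, 128)
print(float(cache_bound))
```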

   This probabilistic guarantee depends, however, on the random choice of a
   parameter of the Rabin algorithm, so that this parameter is
   independent of any of the possible input files.  This implies, in
   turn, that an adversary knowing the value of this parameter can
   easily construct a pair of different files with identical
   fingerprints.

   Since we cannot standardize on an indicia function with a secret
   parameter, this means that the Rabin fingerprint is not a full
   substitute for a secure message digest, such as MD5 or SHA.  However,
   simple experiments [2] suggest that the cost of computing a
   fingerprint is significantly less than computing an MD5 or SHA
   digest, and so for applications (such as within an intranet) where
   security is not necessary, fingerprints might be the most appropriate
   way to generate indicia for duplicate suppression.
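
   The weakness of a publicly known parameter can be illustrated with
   a simplified sketch.  It computes a fingerprint as the remainder of
   the input, viewed as a polynomial over GF(2), modulo a fixed
   64-bit-degree polynomial, and then builds a different input with
   the same fingerprint.  The polynomial, the function names, and the
   leading 1 bit are assumptions made for illustration; this is not
   the fingerprint construction of [19] in full, which chooses its
   polynomial at random.

```python
# x^64 + x^4 + x^3 + x + 1, fixed and public here purely for illustration.
POLY = (1 << 64) | 0b11011


def fingerprint(data: bytes, poly: int = POLY) -> int:
    """Remainder of the input polynomial modulo poly, over GF(2)."""
    degree = poly.bit_length() - 1
    # Prefix a 1 bit so that leading zero octets still affect the result.
    value = int.from_bytes(b"\x01" + data, "big")
    while value.bit_length() > degree:
        # Cancel the highest set bit by XORing in a shifted copy of poly.
        value ^= poly << (value.bit_length() - 1 - degree)
    return value


def colliding_variant(data: bytes, poly: int = POLY) -> bytes:
    """Return different bytes of the same length with the same fingerprint."""
    if len(data) * 8 < poly.bit_length():
        raise ValueError("input too short to absorb the polynomial")
    # Adding (XORing) a multiple of poly never changes the remainder.
    altered = int.from_bytes(data, "big") ^ poly
    return altered.to_bytes(len(data), "big")


original = b"an example instance body"
forged = colliding_variant(original)
print(original != forged, fingerprint(original) == fingerprint(forged))
```

   Anyone who knows the polynomial can apply the same step, which is
   why a fingerprint with a published parameter cannot stand in for a
   secure message digest.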

4.3.3 Use of entity tags and uniqueness scopes
   The HTTP/1.1 specification [5] specifies that a strong entity tag
   ``may be shared by two entities of a resource only if they are
   equivalent by octet equality.'' This requirement is stated only with
   respect to a specific resource, so under this rule, instances of two
   different resources could have identical strong entity tags.
   Therefore, entity tags as specified in HTTP/1.1 are not sufficient
   for detecting duplication among different resources.

   However, HTTP/1.1 does not forbid the use of entity tags that are
   unique across a broader scope.  The proposal for support of delta
   encoding in HTTP [13] introduced the concept of a ``uniqueness
   scope'' of an entity tag: the set of resources across which an entity
   tag is unique for all time, and specified a DCluster header, which
   allows an origin server to describe a uniqueness scope for the entity
   tag carried by a response.  A uniqueness scope is described as the
   set of URLs that share a set of prefixes.

   This approach can also be applied to the problem of duplicate
   suppression.  Assume that the proxy cache has a cached response for
   an instance of resource http://bar.com/foo_logo.gif, with an entity
   tag of "xyzzy", and a uniqueness scope that includes the prefix
   ``http://foo.com/''.  If the client then sends the proxy this
   request:

      GET http://foo.com/logo.gif HTTP/1.1
      Host: foo.com
      SubOK: etag="xyzzy"

   then the proxy can determine that (1) the requested resource instance
   and the cached instance are in the same uniqueness scope, and (2) the
   entity tags do match.  This means, by the definition of a
   ``uniqueness scope'', and because ``xyzzy'' is a strong entity tag,
   that the cached instance body is exactly identical to the requested
   instance body.

   Note that the requesting client need not know the uniqueness scope of
   the entity tag used in the SubOK header field.  It is the proxy cache
   that has to determine whether the requested resource is in the
   uniqueness scope of any of its cached responses.  This implies the
   use of a somewhat sophisticated data structure by the proxy, but not
   one with great cost or complexity.
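
   One possible shape for such a data structure is sketched below:
   cached responses are indexed by strong entity tag, and each
   candidate is then checked against its uniqueness-scope prefixes.
   The class and field names are hypothetical, and the sketch assumes
   that the proxy has already extracted the scope prefixes from
   whatever header the origin server used to convey them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ScopedEntry:
    uri: str
    etag: str                        # strong entity tag, quotes removed
    scope_prefixes: Tuple[str, ...]  # URL prefixes of the uniqueness scope
    body: bytes


class EtagIndex:
    def __init__(self) -> None:
        self._by_etag: Dict[str, List[ScopedEntry]] = {}

    def add(self, entry: ScopedEntry) -> None:
        self._by_etag.setdefault(entry.etag, []).append(entry)

    def find_substitute(self, request_uri: str,
                        etag: str) -> Optional[ScopedEntry]:
        """Return a cached entry whose tag matches and whose uniqueness
        scope covers the requested URI, or None if there is no such entry."""
        for entry in self._by_etag.get(etag, []):
            if any(request_uri.startswith(p) for p in entry.scope_prefixes):
                return entry
        return None


index = EtagIndex()
index.add(ScopedEntry("http://bar.com/foo_logo.gif", "xyzzy",
                      ("http://bar.com/", "http://foo.com/"), b"GIF89a"))
match = index.find_substitute("http://foo.com/logo.gif", "xyzzy")
print(match.uri if match else None)
```

   A matching tag alone is not sufficient: a request for a URI
   outside every listed prefix yields no substitute, even if the
   entity tag string is the same.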

   The DCluster header field proposed in [13] has one drawback for this
   application: because it was specified for use with delta encoding,
   rather than duplicate suppression, the intention is that ``the
   DCluster header does not necessarily describe the entire uniqueness
   scope of an entity tag.  Rather, it describes a subset of the
   uniqueness scope whose members are likely to differ by small
   deltas.''  When doing duplicate suppression, on the other hand, one
   would like to know the entire uniqueness scope, so as to maximize the
   likelihood of finding a duplicate instance body.  This implies either
   the use of a distinct header name for purposes of duplicate
   suppression, or a modification of the DCluster proposal to allow a
   distinction to be made.

   In this document, we propose a distinct header field name, to allow
   independent discussion of the duplicate suppression proposal and the
   delta encoding proposal, but at some point it might prove reasonable
   to merge the two headers into one.

4.3.4 Other forms of indicia
   It is possible that some form of URN could be used to indicate the
   correct instance body, especially if the URN embodies some indication
   of version number.  Immutability is not enough, however.  For
   example, although the namespace of IETF RFCs describes a set of
   immutable documents, each RFC can appear in several formats.  Thus,
   two resources that each represent the same RFC may not have
   byte-for-byte identical instance bodies.

   The DRP proposal [9] defined a variety of URN that embeds an MD5 or
   SHA digest within the URN.  This kind of URN has the necessary
   properties for duplicate suppression; however, it might not be
   convenient for other purposes.  The mechanism described in this
   document allows the use of URNs in the SubOK header field, but does
   not fully specify the constraints on URNs used in this way.

4.4 Cache entry freshness
   Suppose a client requests a resource

      GET http://foo.com/logo.gif HTTP/1.1
      Host: foo.com
      SubOK: md5="HUXZLQLMuI/KZ5KDcJPcOA==", inform

   and the proxy cache responds by substituting an instance of another
   resource:

      HTTP/1.1 200 OK
      Date: Thu, 29 Jan 1998 17:47:55 GMT
      Age: 37
      Cache-control: max-age=3600
      Etag: "xyzzy"
      Content-Type: image/gif
      Subst: http://bar.com/foo_logo.gif

   The response sent by the proxy indicates, through its Age and
   Cache-control headers, that it is still fresh for most of an hour.

   However, the Age and Cache-Control headers were drawn from a response
   for http://bar.com/foo_logo.gif, not a response for
   http://foo.com/logo.gif.  How does the client know how long it can
   safely cache this response as a representation of
   http://foo.com/logo.gif?

   Recall that we have already assumed the existence of a mechanism,
   outside the scope of this document, that the origin server has used
   to provide an indicia to the client, presumably for an instance that
   is current at the time the indicia is sent.  The same mechanism could
   easily be used to transfer freshness-lifetime information about that
   instance: whenever the origin server sends the indicia, it also sends
   an expiration time or maximum-age value.

   For example, if the indicia for an instance is sent in a DRP index
   data structure, the freshness lifetime could either be provided
   within the index, as an attribute of the instance, or it could be the
   freshness lifetime of the response containing the index (i.e., the
   Expires or max-age value from that response's HTTP headers).
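
   The distinction can be made concrete with a small sketch.  The
   first function computes the freshness that the substituted
   response's own headers describe; the second computes what the
   client may assume for the requested URI, using lifetime
   information delivered along with the indicia.  Both function names
   are ours, and treating a missing hint as ``already stale'' is a
   conservative assumption of this sketch, not something the proposal
   specifies.

```python
from typing import Optional


def substitute_remaining(max_age: int, age: int) -> int:
    """Seconds of freshness left according to the substituted
    response's own Cache-control and Age values."""
    return max(0, max_age - age)


def client_remaining(hint_max_age: Optional[int], hint_age: int) -> int:
    """Seconds the client may treat the body as fresh for the requested
    URI, based on the lifetime supplied with the indicia."""
    if hint_max_age is None:
        return 0   # assumption: without a hint, revalidate before reuse
    return max(0, hint_max_age - hint_age)


# Values from the example response: max-age=3600, Age: 37.
print(substitute_remaining(3600, 37))
# Hypothetical hint: the indicia arrived 600 seconds ago with max-age 900.
print(client_remaining(900, 600))
```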

4.5 Message headers
   In section 4.1, we noted that the simple mechanism described so far
   does not necessarily provide to the client the actual headers that
   would have been obtained if an instance of the requested resource had
   been returned, rather than a cached instance of a different resource
   with the same instance body.  We also noted that this might not, in
   general, be a problem.

   However, if it is important that the client get the actual headers
   for the requested resource, the duplicate-suppression mechanism can
   be augmented by several simple extensions.

   The first extension is outside the scope of this document, and would
   be part of the mechanism(s) used by the origin server to provide the
   indicia to the client (as described in section 4.2).  Along with the
   indicia value that describes the instance body, the origin server
   would provide a directive telling the client to obtain the actual
   resource headers even if the instance body is obtained from a
   duplicate.  For example, this might be done using an additional
   attribute in a DRP index data structure (as with the
   freshness-lifetime information; see section 4.4).

   Once the client knows that it ought to obtain the actual headers of
   the requested instance, it needs to indicate this requirement to the
   proxy cache (if any) performing the substitution.  It does so using a
   directive in the SubOK header field, e.g.:

      GET http://foo.com/logo.gif HTTP/1.1
      Host: foo.com
      SubOK: md5="HUXZLQLMuI/KZ5KDcJPcOA==", hdrs

   The ``hdrs'' directive tells the proxy cache that the client will
   accept substitution for the instance body of http://foo.com/logo.gif
   (as long as the MD5 digest matches), but the proxy MUST provide the
   actual instance headers of the resource given by the Request-URI.

   If the proxy has a fresh cached copy of the headers for the
   Request-URI (that is, the Age of the cache entry with these headers
   has not reached the max-age value, or the cached Expires deadline has
   not been reached), then the proxy MAY simply return these cached
   headers, along with the substituted instance body.  However, if the
   proxy does not have a fresh cache entry containing these headers,
   then the ``hdrs'' directive means that the proxy MUST contact the
   origin server to obtain a fresh copy of the headers.

   Note that while this does require the proxy to send a request to the
   origin server, it does not require the proxy to retrieve the
   instance-body from the origin server.  Therefore, even in the case
   where the client's request causes a message to be sent to the origin
   server, the duplicate suppression mechanism might still avoid
   retrieving the instance body from the origin server.  This could
   significantly reduce bandwidth utilization and latency, if the body
   is large.

   The best means for the proxy to obtain fresh headers for the
   Request-URI is to use a HEAD request.  The HEAD method, by its
   specification, returns exactly what a corresponding GET request would
   return, except for a message-body.

   A conditional GET request will not work, because the HTTP/1.1
   specification for the 304 (Not Modified) response requires that the
   origin server SHOULD NOT provide many of the entity-headers for the
   instance.  For example, if the Content-Encoding for the substituted
   instance really did differ from that of the requested instance, a 304
   response would not reliably reveal this difference.
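
   The handling of the ``hdrs'' directive described above can be
   sketched as follows.  The names CachedHeaders, respond_with_hdrs,
   and send_head are hypothetical; send_head stands for whatever the
   proxy uses to issue a HEAD request to the origin server, and only
   the max-age form of the freshness test is shown (the Expires form
   is omitted for brevity).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Headers = Dict[str, str]


@dataclass
class CachedHeaders:
    headers: Headers
    age: int        # current Age of the cache entry, in seconds
    max_age: int    # max-age value that applies to the entry

    def is_fresh(self) -> bool:
        return self.age < self.max_age


def respond_with_hdrs(request_uri: str, substitute_body: bytes,
                      cached: Optional[CachedHeaders],
                      send_head: Callable[[str], Headers]
                      ) -> Tuple[Headers, bytes]:
    """Pair the substituted body with headers for the Request-URI."""
    if cached is not None and cached.is_fresh():
        headers = dict(cached.headers)            # MAY reuse fresh headers
    else:
        headers = dict(send_head(request_uri))    # MUST ask the origin
    return headers, substitute_body
```

   In either branch the instance body comes from the cache, so only
   the headers cross the network when the origin server must be
   contacted.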

   Note that the unlikelihood of this scenario (two byte-for-byte
   identical instance bodies with different Content-Encodings) implies
   that the ``SubOK: hdrs'' mechanism probably will not be used very
   often (and so the attendant overhead will not be incurred).  On the
   other hand, the possibility of such a scenario (perhaps involving
   some other, as yet undefined, entity-header) implies the need for
   such a mechanism.

4.6 Server hints
   The mechanism by which the client obtains indicia information from
   the server is, as stated earlier, outside the scope of this document.
   We have explained elsewhere that certain other useful information
   could be supplied by the server at the same time:

      - Freshness lifetime information for the instance described
        by the indicia (section 4.4).

      - Indication of the need for the client to obtain the actual
        HTTP headers for the Request-URI (section 4.5).

   The server might also provide other ``hint'' information along with
   the indicia.  This could include

      - An indication of likelihood of duplication; the client
        would only expend request bytes, and proxy processing time,
        on the SubOK header in cases where the likelihood of
        duplication is above some threshold.  (Note that the mere
        fact that the server provides an indicia value might not,
        in itself, be an indication of likely duplication.  For
        example, an MD5 digest might be provided simply for
        security reasons.)

      - [Other hints?]

4.7 Other semantic issues
   There are a number of issues that have not yet been adequately
   addressed in this design, or that need to be considered when
   evaluating the design.

   Right now, some of the text here is just a placeholder for a full
   discussion.

4.7.1 Is SubOK hop-by-hop?
   Should the SubOK header be sent hop-by-hop (i.e., protected by a
   Connection header)?  If not, then it could be acted upon by a
   "distant" cache.  But this might not be a huge benefit.  It might be
   reasonable to make it forwardable by an implementation without
   requiring the implementation to fully implement the header.

4.7.2 Response caching in intermediate proxies
   If client C1 sends a substitutable request for resource R1 via proxy
   P1 and then via proxy P2, and then P2 substitutes R2 and then
   responds, should P1 cache the response as if for R2 or as if for R1?

   If the SubOK header isn't hop-by-hop, and P1 is naive about
   SubOK/Subst, then it would seem risky to allow P1 to cache the
   response.  Another naive client (C2) might make a request for R1 via
   P1, and P1 would then naively return the substituted cached response.
   C2 would thus receive a substitute without realizing it.

   Is there a good mechanism to prevent this? Perhaps some application
   of "Cache-Control: s-maxage=0" would work, but at a cost of increased
   complexity.  Alternatively, this could be a good argument in favor of
   making SubOK a hop-by-hop header.


   However, it's not entirely clear that C2, in the example above, would
   actually suffer from its naivete.  The cached response for R2 should
   be effectively equivalent to a cached response for R1, and C2
   shouldn't notice the difference.

4.7.3 Conditional GETs
   In principle, a conditional GET request (e.g., one including an
   If-Modified-Since or If-None-Match header field) can include the
   SubOK header field.  The proxy's interpretation of such a
   combination is straightforward:

      - Evaluate the conditional(s).  If this would result in a
        response status of 304 (Not Modified) or 412 (Precondition
        failed), ignore the SubOK field.

      - If the conditionals have no effect, then the proxy may
        evaluate the SubOK header to see if a substitution can be
        made.

   Informally, the meaning is "if my cache entry is valid, tell me so.
   Otherwise, send me a full response but feel free to substitute an
   equivalent instance body."
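
   The evaluation order described above can be sketched as follows.
   This is only an illustration: the function name and its string
   results are hypothetical, and the caller is assumed to have already
   determined what the conditionals alone would yield.

```python
from typing import Optional


def handle_conditional_subok(conditional_status: Optional[int],
                             substitute_available: bool) -> str:
    """Order of evaluation for a conditional GET that carries SubOK.

    conditional_status is the status the conditionals alone would
    yield (304 or 412), or None if they have no effect.
    """
    if conditional_status in (304, 412):
        # The conditionals decide the response; SubOK is ignored.
        return f"respond-{conditional_status}"
    if substitute_available:
        return "respond-with-substitute"   # the proxy may substitute
    return "respond-normally"
```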

   In practice, it may not make much sense for a client to include a
   SubOK header field in a conditional GET.  Any mechanism that the
   client uses to obtain indicia values would probably also give it
   direct information on the validity of its cache entries.  If so, the
   client could make the validity determination locally, rather than
   asking the proxy (via an If-Modified-Since or If-None-Match header
   field) to make this determination.

4.7.4 Interaction with Hit-metering
   Hit-metering [14] is a proposed extension to HTTP/1.1 that allows an
   origin server to negotiate with proxy caches to receive reports about
   the number of times a cached response is used.  This allows the
   origin server to obtain accurate use-counts for a resource, without
   defeating caching.

   The hit-metering design depends on the existence of a connected path
   between the requesting client and the origin server, since
   negotiations are made with respect to hops along this path, and
   reports are forwarded hop-by-hop.  However, if a client using the
   duplicate suppression mechanism makes a request for resource R1 via a
   proxy cache, and the proxy cache substitutes an instance of resource
   R2, there may never be a complete request path connecting the proxy
   to the origin server for resource R1.  In such a case, the proxy
   cannot negotiate the use of hit-metering for resource R1.  In other
   words, if the origin server is expecting to count uses of R1, and the
   proxy substitutes an instance of R2 for a potential use of R1, the
   origin server's expectation will be violated.


   There are several ways to resolve the contradiction between
   hit-metering and duplicate suppression:

      1. The duplicate suppression mechanism requires that the
         client know an indicia value for the requested resource.
         Although this document does not specify how a client
         obtains an indicia value for a resource instance, it
         presumably comes from the origin server for that resource.
         Therefore, if an origin server wishes to reliably
         hit-meter a resource, it need only refrain from providing
         indicia values.  This would prevent any substitution for
         the resource in question.

      2. If the origin server desires only to know the sum of the
         use-counts for a collection of mutually substitutable
         instances, rather than the individual counts, then the
         existing hit-metering mechanism should suffice.  (This
         assumes that the origin server is the only source for
         instances with the given indicia value.)  The substitution
         counts as a use of the cache entry (by the ``counting
         rules'' of the hit-metering specification [14]) and so it
         will be reported to the origin server, albeit for a
         resource other than the client's Request-URI.

      3. If the client's request includes the ``hdrs''
         subok-directive, this forces the substituting proxy to
         complete the path to the origin server for the
         Request-URI, which gives the origin server information
         about the initial reference.  It also allows the origin
         server to inform the proxy that the response, if cached,
         should be hit-metered.  The proxy will then count (and report)
         further uses of the cache entry for the Request-URI (if
         one adopts a slightly broad view of the requirements of
         the ``counting rules'' of the hit-metering
         specification [14]: in this case, the proxy should
         increment the count for the Request-URI, and not for the
         substituted cache entry).  Therefore, if the mechanism by
         which the client obtains an indicia value from the origin
         server also has a means to require that the client use the
         ``hdrs'' subok-directive if it sends a SubOK header field,
         the origin server will obtain accurate use-counts.

   The second alternative may not be suitable for many applications of
   hit-metering (for example, counting uses of an ad banner may be less
   interesting than knowing which sites lead to its display).  The third
   alternative requires additional mechanism and minor changes to the
   hit-metering specification, and only works if the user agents can be
   trusted to honor the origin server's demand to use the ``hdrs''
   subok-directive.  Therefore, in practice the first alternative
   (provide no indicia for hit-metered resources) may be the best.


4.7.5 Interaction with compression and delta-encoding
   HTTP/1.1 supports the use of compression both as a content-coding and
   as a transfer-coding.  A more recent proposal describes the use of
   delta encoding both as a content-coding and as a
   transfer-coding [13].  In delta encoding, the server sends the
   differences between two instances of a resource, instead of sending
   the newer instance in its entirety.

   The duplicate suppression mechanism described in this specification
   operates on instances.  By the definition in section 2, an instance
   body does not include any content-coding or transfer-coding.
   Therefore, when a proxy performing duplicate suppression decides that
   a particular cache entry is a suitable substitute, this decision is
   independent of whether the cached response was received with a
   content-coding.  Even if the substitution is appropriate according to
   the indicia, the proxy cannot return the cache entry if its
   content-coding is incompatible with the request (unless the proxy is
   able to undo the incompatible content-coding).

   Transfer-codings are, by definition, hop-by-hop.  Therefore, at least
   in concept, a cache entry does not have a transfer-coding.  (In
   practice, a cache implementation may choose to store a transfer-coded
   version of a response, as a performance optimization, if this
   behavior has no external visibility.)

   Therefore, duplicate suppression is essentially orthogonal to
   compression and delta-encoding.

4.7.6 Interaction with range retrievals
   HTTP/1.1 supports the retrieval of sub-ranges of a resource value.
   The range selection is done after the application of any
   content-coding, and before the application of any transfer-coding.
   As the result of receiving a Partial Content response, a proxy cache
   might create an entry that contains only a partial instance (or it
   might update an existing full or partial cache entry, resulting in a
   full instance).

   It is certainly possible for a substitution to be returned in
   response to a request with a Range header.  The substitution, since it
   is done on an instance-by-instance basis, is performed before any
   range selection.

   Because the duplicate suppression mechanism operates on instances, it
   is not possible to use a partial cache entry as a substitution,
   unless the request uses a Range header that specifies a subset of the
   partial cache entry.  In any case, the content-coding of the cache
   entry must also be compatible with the Accept-Encoding header of the
   request.
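
   A minimal sketch of this test, in Python, might look like the
   following.  It assumes a single byte range per request, assumes that
   both ranges refer to the same content-coded form of the instance,
   and uses "identity" to mean no content-coding; the function name and
   parameters are hypothetical.

```python
from typing import Optional, Set, Tuple

ByteRange = Tuple[int, int]  # inclusive (first-byte, last-byte) positions


def partial_entry_usable(stored: ByteRange,
                         requested: Optional[ByteRange],
                         entry_coding: str,
                         acceptable_codings: Set[str]) -> bool:
    """Can a partial cache entry serve as a substitute for this request?"""
    if entry_coding not in acceptable_codings:
        return False              # content-coding must be compatible
    if requested is None:
        return False              # no Range header: full instance needed
    # The requested range must lie entirely within the stored range.
    return stored[0] <= requested[0] and requested[1] <= stored[1]
```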




4.7.7 Other points
   T. B. S.


5 Specification

5.1 Protocol parameter specifications

5.1.1 Indicia schemes
   An indicia scheme is an algorithm used to generate an indicia from an
   instance body.

      indicia-scheme = token

   The Internet Assigned Numbers Authority (IANA) acts as a registry for
   indicia-scheme tokens. Initially, the registry contains the following
   tokens:

   MD5             The MD5 algorithm, as specified in RFC 1321 [20].
                   The output of this algorithm is encoded using the
                   base64 encoding [1].

   SHA             The SHA-1 algorithm [16].  The output of this
                   algorithm is encoded using the base64 encoding [1].

   UNIXcksum       The algorithm computed by the UNIX ``cksum'' command,
                   as defined by the Single UNIX Specification, Version
                   2 [17].  The output of this algorithm is an ASCII
                   digit string representing the 32-bit CRC, which is
                   the first word of the output of the UNIX ``cksum''
                   command.
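
   As an illustration, the MD5 and SHA schemes could be computed with
   the Python standard library as sketched below.  The function name
   compute_indicia is hypothetical, and the UNIXcksum scheme is not
   covered by this sketch.

```python
import base64
import hashlib


def compute_indicia(scheme: str, instance_body: bytes) -> str:
    """Return the base64-encoded digest for the MD5 or SHA scheme."""
    name = scheme.upper()          # scheme tokens are case-insensitive
    if name == "MD5":
        digest = hashlib.md5(instance_body).digest()
    elif name == "SHA":
        digest = hashlib.sha1(instance_body).digest()
    else:
        raise ValueError(f"unsupported indicia scheme: {scheme}")
    return base64.b64encode(digest).decode("ascii")
```

   The returned string is the unquoted form; it would be placed inside
   a quoted-string when used as an indicia-value.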

5.1.2 Indicia values
   An indicia value is the result of applying an indicia scheme to an
   instance body.

       indicia-value = quoted-string

      ---------
      Syntax issues:

         1. Should we limit indicia-value to quoted-string, or
            could we also allow a token (e.g., a base64 encoding
            without quotes)?

      ---------

5.2 Header specifications
   The following headers are defined.



5.2.1 SubOK
   The SubOK request header field is used to provide directives from an
   end-client to a proxy cache regarding the possible substitution of an
   instance body from a cached response for one resource instance for
   the instance body of the resource instance specified by the client's
   request.  A proxy MAY ignore the SubOK request header field on any
   request.

       SubOK = "SubOK" ":" #subok-directive
       subok-directive  = subok-mandatory-directive
                        | subok-indicia-directive
                        | subok-extension-directive
       subok-mandatory-directive =
                         "inform"
                       | "hdrs"
       subok-indicia-directive = indicia-scheme "=" indicia-value
       subok-extension-directive =
                       token [ "=" (quoted-string | token) ]

   All comparisons of subok-directive tokens are case-insensitive.
   Comparisons of quoted-strings in subok-directive values are
   case-sensitive.
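
   The grammar above could be parsed roughly as in the following Python
   sketch.  It is a simplification, not a complete HTTP list parser:
   empty list elements are rejected, and the name parse_subok and the
   dictionary representation are hypothetical.

```python
import re
from typing import Dict, Optional

_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"   # HTTP token characters
_QUOTED = r'"((?:[^"\\]|\\.)*)"'           # quoted-string, group = content
_DIRECTIVE = re.compile(
    r"\s*(" + _TOKEN + r")"                               # directive name
    r"(?:\s*=\s*(?:" + _QUOTED + r"|(" + _TOKEN + r")))?"  # optional value
    r"\s*(?:,|$)"                                         # list separator
)


def parse_subok(field_value: str) -> Dict[str, Optional[str]]:
    """Parse a SubOK field value into {lower-cased name: value or None}."""
    directives: Dict[str, Optional[str]] = {}
    pos = 0
    while pos < len(field_value):
        m = _DIRECTIVE.match(field_value, pos)
        if m is None:
            raise ValueError(f"malformed SubOK field at offset {pos}")
        name, quoted, bare = m.group(1), m.group(2), m.group(3)
        if quoted is not None:
            value = re.sub(r"\\(.)", r"\1", quoted)  # undo quoted-pairs
        else:
            value = bare                             # token value or None
        directives[name.lower()] = value   # names are case-insensitive
        pos = m.end()
    return directives
```

   Directive names are folded to lower case, while values are kept
   exactly as received so that later comparisons stay case-sensitive.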

   If a proxy does not ignore the entire SubOK field, it MUST ignore any
   subok-indicia-directive or subok-extension-directive that it does not
   understand, but it MUST implement all elements of the
   subok-mandatory-directive set.

   A proxy MAY substitute an appropriately fresh cache entry from a
   resource other than the Request-URI, if an indicia-value in the SubOK
   header field of the request matches the corresponding value for that
   cache entry.

   If the proxy's cache includes a usable entry for the Request-URI, it
   SHOULD NOT substitute an entry from a different resource.
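
   The two rules above might be combined as in this sketch.  The cache
   index keyed by (scheme, indicia-value) is a hypothetical data
   structure assumed to contain only appropriately fresh entries, and
   the SubOK directives are assumed to be available as a dictionary
   mapping each directive name to its value (or None for a bare
   directive).

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical index: (lower-cased scheme, indicia-value) -> entry URI.
IndiciaIndex = Dict[Tuple[str, str], str]


def choose_entry(request_uri: str,
                 subok: Dict[str, Optional[str]],
                 usable_uris: Set[str],
                 index: IndiciaIndex) -> Optional[str]:
    """Return the URI of the cache entry to serve, or None on a miss."""
    if request_uri in usable_uris:
        return request_uri        # SHOULD NOT substitute in this case
    for scheme, value in subok.items():
        if value is None:
            continue              # "inform", "hdrs", or a bare extension
        # Scheme tokens are case-insensitive; values are case-sensitive.
        match = index.get((scheme.lower(), value))
        if match is not None:
            return match          # MAY substitute this fresh entry
    return None
```

   Directives with schemes the proxy does not understand simply fail to
   match any index key, which has the effect of ignoring them.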

      ---------
      Syntax issues:

         1. Should we allow a token as the value of a
            subok-extension-directive, or should we restrict this
            to a quoted-string?
         2. This syntax does not have an extensible way to mark
            directives as mandatory.  This means either that we
            have to specify all of the subok-mandatory-directive
            values now, or we need to devise a marking mechanism.

      ---------

   The meaning of the subok-mandatory-directive values is:


   inform          If a substitution is made, the response MUST include
                   a valid Subst header field.

   hdrs            If a substitution is made, the HTTP headers in the
                   response MUST be a fresh superset of the headers that
                   would have been returned for a HEAD method on the
                   Request-URI.  If the proxy has a cache entry
                   containing such headers that has not exceeded its
                   freshness lifetime, the proxy may return these cached
                   headers.  Otherwise, it MUST forward a HEAD (or GET)
                   request towards the origin server for the
                   Request-URI, and then forward the response.  (The
                   HEAD request may be satisfied by an intervening proxy
                   cache, subject to any Cache-Control directives in the
                   original client's request.)

5.2.2 Subst
   The Subst response-header field MUST be used by a proxy to supply the
   URI of the original source of an entity-body, if the source is
   different from the client's Request-URI, and if the client's request
   included the ``inform'' directive in a SubOK request header field.
   Otherwise, a proxy MAY send a Subst response-header field, if it
   makes a substitution based on the information in a SubOK request
   header field.

       Subst = "Subst" ":" absoluteURI
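
   The requirement levels above can be summarized in a short sketch.
   The function names are hypothetical, URIs are compared as plain
   strings for simplicity, and the SubOK directives are assumed to be
   given as a dictionary keyed by lower-cased directive name.

```python
from typing import Dict, Optional


def subst_requirement(request_uri: str,
                      source_uri: str,
                      subok: Dict[str, Optional[str]]) -> str:
    """Classify the Subst obligation as "none", "MAY", or "MUST"."""
    if source_uri == request_uri:
        return "none"             # no substitution took place
    if "inform" in subok:
        return "MUST"             # the client asked to be informed
    return "MAY"                  # substitution made without ``inform''


def subst_header(source_uri: str) -> str:
    """Format the Subst response-header for an absolute source URI."""
    return f"Subst: {source_uri}"
```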

      ---------
      Is it possible to use Content-Location instead of defining a
      new header, or would this be a conflict with the existing
      definition of Content-Location?
      ---------


6 IANA Considerations

      ---------
      This section will provide the necessary IANA considerations
      information, for the entering of indicia-scheme algorithms into
      the registries for HTTP indicia-schemes.  (Will reflect rules
      in draft-iesg-iana-considerations-01.)
      ---------


7 Security Considerations

   The duplicate suppression mechanism is vulnerable to three kinds of
   spoofing attack:


      1. Manipulation of instance-bodies so as to cause an
         incorrect substitution by an unwitting proxy cache.

      2. Manipulation of the ``uniqueness scope'' of an entity tag
         so as to cause an incorrect substitution by an unwitting
         proxy cache.

      3. False association of an indicia with a URI.

7.1 Manipulation of instance-bodies
   In the first kind of attack, assume that the attacker can predict
   that a client C will request, via proxy P, a resource R1 and give a
   particular indicia value I in the SubOK header field of the
   request.  If the attacker can cause proxy P to load an instance of a
   different resource R2, with the same indicia I, into P's cache
   before client C makes its request, then P might unwittingly send C
   the instance of R2, instead of the requested instance of R1.  The
   attacker can do this by constructing the appropriate value of R2,
   and then sending a request for R2 via proxy P.

   This is only really a problem if the instances of R1 and R2 are
   different, but in a way not easily detectable by the user.  For
   example, the attacker might cause the user to see a somewhat modified
   version of a document that the user wanted to see.

   This attack depends on the kind of indicia in use.  If a secure
   message digest, such as MD5 or SHA, is used for the indicia, there is
   no obvious vulnerability to spoofing (provided that there is no
   feasible attack on the digest algorithm).  This is because it is
   infeasible for an attacker to construct a bogus instance body that
   has the same secure message digest as a known instance body.

   However, if the Rabin fingerprinting method is used to generate the
   indicia, it is quite feasible for an attacker to construct a bogus
   instance body with a specific fingerprint value.  This means that the
   Rabin fingerprinting method is not an appropriate method to use in an
   insecure environment, although it may be useful within an Intranet.

   The Rabin fingerprinting method could be protected by using a secure
   end-to-end instance digest, but this would obviate much of the
   performance benefit of using the Rabin method.  However, there might
   still be a benefit in cases where the server supplying the indicia
   and the client are both able to efficiently compute a digest, but
   duplicate values could come from other servers not willing to attach
   a secure instance digest to their responses.
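
   One way to picture such an end-to-end check is a client that
   recomputes a secure digest over the body it received and compares it
   with a digest value it obtained from the origin server.  The sketch
   below assumes an MD5 digest in base64 form; the function name is
   hypothetical, and how the client learns the expected value is
   outside the scope of this document.

```python
import base64
import hashlib


def body_matches_md5_digest(body: bytes, expected_b64: str) -> bool:
    """Recompute the MD5 digest of a received body and compare it."""
    actual = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    # base64 text is case-sensitive, so an exact comparison is required.
    return actual == expected_b64
```

   A client that finds a mismatch would presumably discard the
   substituted body and repeat the request without a SubOK header
   field.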

7.2 Manipulation of a uniqueness scope
   A similar problem arises when using entity tags as indicia.
   Suppose the attacker can predict that the client C will soon make
   this request via proxy P:


      GET http://foo.com/logo.gif HTTP/1.1
      Host: foo.com
      SubOK: etag="xyzzy"

   Suppose that the attacker first requests, via proxy P, this response
   for http://evilhacker.com/blob.gif (from a server controlled by the
   attacker):

      HTTP/1.1 200 OK
      Date: Thu, 29 Jan 1998 18:47:55 GMT
      ETag: "xyzzy"
      DCluster: "//foo.com/"

   Then, when client C does make its request for
   http://foo.com/logo.gif, proxy P will believe that it can substitute
   its cached instance body from the response for
   http://evilhacker.com/blob.gif (because the entity tags match, and
   the client's Request-URI matches the prefix in the DCluster header
   field of the cached response).

   A similar spoofing attack, via the DCluster header field, is possible
   when using delta encoding.  Therefore, the specification for delta
   encoding [13] includes a set of recommendations for preventing this
   attack; we provide an analogous set here.  Note, however, that the
   circumstances are sufficiently different that the defenses are also
   somewhat different.

   One possible protection against this attack would be for the server
   to provide a secure message digest along with the entity-tag based
   indicia value.  Under the assumption that the attacker cannot
   construct a bogus resource instance with the same message-digest
   value, this should protect against the spoofing attack.  However, if
   the server does provide a secure message digest, it seems preferable
   to use this directly as the indicia value, rather than to use it only
   as prevention against spoofing.

   If the responses in the proxy's cache are signed by the origin
   server, this would allow the client to detect spoofing (provided that
   the client has some means of discovering the true identity of the
   server).  It might be possible to use the proposed Digest
   Authentication scheme [6] for this purpose, but we have not done the
   necessary analysis.  Also, restriction of duplicate suppression to
   properly signed responses greatly restricts the potential benefits of
   duplicate suppression, by eliminating the possibility of using a
   duplicate response from any of a large set of servers.

   Another defense against such an attack is for the proxy to ignore a
   ``DCluster'' header that specifies a different server from that in
   the Request-URI.  However, this defense is ineffective if a server is
   shared among multiple, possibly mutually untrustworthy, content
   providers.  As with signature-based defenses, it also greatly reduces
   the potential effectiveness of duplicate suppression.
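
   This defense could be applied when a response is stored, along the
   lines of the following sketch.  It assumes the DCluster value has
   already been unquoted and has the "//host/path" form used in the
   example above; the function name is hypothetical, and ports are
   ignored for simplicity.

```python
from urllib.parse import urlsplit


def dcluster_allowed(request_uri: str, dcluster_value: str) -> bool:
    """Accept a DCluster prefix only if it names the Request-URI's host."""
    request_host = urlsplit(request_uri).hostname
    cluster_host = urlsplit(dcluster_value).hostname
    if request_host is None or cluster_host is None:
        return False              # no host to compare: be conservative
    return request_host == cluster_host
```

   With the example above, a response for
   http://evilhacker.com/blob.gif that carries a DCluster value of
   //foo.com/ would be rejected by this test.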

   We recommend, therefore, that the use of entity tags as indicia
   values be restricted to environments, such as well-protected
   intranets, where the threat of spoofing attacks is prevented by other
   means (such as a trustworthy community of employees).

7.3 False association of an indicia with a URI
   In the third kind of attack, a malicious server tells a client that
   the current instance of a resource R1 has a given indicia value I,
   and thereby induces the client to make a request for that resource
   with a SubOK header field carrying indicia value I. If the attacker
   can predict that the request will be sent via a proxy P which has a
   cache entry for an instance of a resource R2 that also has an indicia
   value of I, then the client may receive an instance of R2 in its
   response.  This would be bad if the true value of R1 was actually
   quite different.

   This attack has two potential consequences.  It might be that the
   attacker really is the origin server for resource R1, but for some
   reason wants to be able to convince the client that the value of R1
   is something different.  It is not clear if this kind of attack is a
   serious threat.

   The other consequence is more significant.  If the attacker can
   provide the client with bogus indicia for a resource R1 that it does
   not actually control, then the attacker can easily cause the client
   to see the wrong apparent value for R1, with the unwitting assistance
   of the proxy.  The provision of a secure message digest with, or as,
   the indicia value does not help in this case, since the attacker
   could easily provide the message digest for R2.

   This implies that clients should not accept indicia information for a
   resource from a server that cannot be trusted to provide honest
   information about a resource.  For example, if a DRP index data
   structure sent by server evilhacker.com provides an indicia for a
   resource at server victimserver.com, the client should probably
   ignore the indicia, unless it has additional information implying
   that evilhacker.com reliably speaks on behalf of victimserver.com.
   (This defense breaks down if a server hostname is shared among
   multiple, possibly mutually untrustworthy, content providers.)
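
   A client-side version of this check might look like the sketch
   below.  The delegations table, which records hosts known by some
   out-of-band means to speak on behalf of other hosts, is a
   hypothetical structure; nothing in this document defines how it
   would be populated.

```python
from typing import Dict, Set
from urllib.parse import urlsplit


def accept_indicia(index_source_uri: str,
                   resource_uri: str,
                   delegations: Dict[str, Set[str]]) -> bool:
    """Should a client trust indicia for resource_uri from this source?"""
    source_host = urlsplit(index_source_uri).hostname
    resource_host = urlsplit(resource_uri).hostname
    if source_host is None or resource_host is None:
        return False
    if source_host == resource_host:
        return True   # the server is describing its own resource
    # Otherwise require out-of-band knowledge that source_host speaks
    # on behalf of resource_host.
    return source_host in delegations.get(resource_host, set())
```

   As noted above, a host-based test like this one offers no protection
   when a single hostname is shared by mutually untrustworthy content
   providers.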


8 Acknowledgements

   Andrei Broder helped improve our understanding of digest and
   fingerprint functions.


9 References

   NOTE TO RFC EDITOR: many of the references here might be out of date.
   Please verify these with the primary author of this Internet-Draft
   before issuing this document as an RFC.

   1.  N. Borenstein and N. Freed.  MIME (Multipurpose Internet Mail
   Extensions) Part One:  Mechanisms for Specifying and Describing the
   Format of Internet Message Bodies.  RFC 1521, IETF, September, 1993.

   2.  Andrei Broder.  Some applications of Rabin's fingerprinting
   method.  In R. Capocelli, A. De Santis, and U. Vaccaro, Ed.,
   Sequences II: Methods in Communications, Security, and Computer
   Science, Springer-Verlag, 1993, pp. 143-152.

   3.  Andrei Z. Broder, Steven C. Glassman, and Mark S. Manasse.
   Syntactic Clustering of the Web.  Proc. 6th International World Wide
   Web Conf., Santa Clara, CA, April, 1997, pp. 391-404.
   http://www6.nttlabs.com/HyperNews/get/PAPER205.html.

   4.  Fred Douglis, Anja Feldmann, Balachander Krishnamurthy, and
   Jeffrey Mogul.  Rate of Change and Other Metrics:  a Live Study of
   the World Wide Web.  Proc. Symposium on Internet Technologies and
   Systems, USENIX, Monterey, CA, December, 1997. To appear.

   5.  Roy T. Fielding, Jim Gettys, Jeffrey C. Mogul, Henrik Frystyk
   Nielsen, and Tim Berners-Lee.  Hypertext Transfer Protocol --
   HTTP/1.1.  RFC 2068, HTTP Working Group, January, 1997.

   6.  J. Franks, P. Hallam-Baker, J. Hostetler, P. Leach, A. Luotonen,
   E. Sink, L. Stewart.  An Extension to HTTP: Digest Access
   Authentication.  RFC 2069, HTTP Working Group, January, 1997.

   7.  N. Freed and N. Borenstein.  Multipurpose Internet Mail
   Extensions (MIME) Part One:  Format of Internet Message Bodies.  RFC
   2045, Network Working Group, November, 1996.

   8.  Y. Y. Goland, E. J. Whitehead, Jr., A. Faizi, S. R. Carter, and
   D. Jensen.  Extensions for Distributed Authoring on the World Wide
   Web -- WEBDAV.  Internet-Draft draft-ietf-webdav-protocol-06, HTTP
   Working Group, January, 1998. This is a work in progress.

   9.  Arthur van Hoff, John Giannandrea, Mark Hapner, Steve Carter, and
   Milo Medin.  The HTTP Distribution and Replication Protocol.
   Technical Report NOTE-DRP, World Wide Web Consortium, August, 1997.
   URL http://www.w3.org/TR/NOTE-drp-19970825.html.

   10.  Balachander Krishnamurthy and Craig E. Willis.  Piggyback server
   invalidation for proxy cache coherency.  Proc. 7th International
   World Wide Web Conf., Australia, April, 1998, pp. ??-??. To appear.

   11.  Merriam-Webster.  Webster's Seventh New Collegiate Dictionary.
   G. & C. Merriam Co., Springfield, MA, 1963.

   12.  Merriam-Webster.  Webster's Third New International Dictionary.
   Merriam-Webster Inc., Springfield, MA, 1986.


   13.  Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, Balachander
   Krishnamurthy, Yaron Goland, and Arthur van Hoff.  Delta encoding in
   HTTP.  Internet-Draft draft-mogul-http-delta-00, HTTP Working Group,
   January, 1998. This is a work in progress.

   14.  Jeffrey C. Mogul and Paul Leach.  Simple Hit-Metering and
   Usage-Limiting for HTTP.  RFC 2227, Internet Engineering Task Force,
   October, 1997.

   15.  Jeffrey C. Mogul and Arthur van Hoff.  Instance Digests in HTTP.
   Internet-Draft draft-mogul-http-digest-00, HTTP Working Group,
   January, 1998. This is a work in progress.

   16.  National Institute of Standards and Technology.  Secure Hash
   Standard.  FEDERAL INFORMATION PROCESSING STANDARDS PUBLICATION
   180-1, U.S. Department of Commerce, April, 1995.
   http://csrc.nist.gov/fips/fip180-1.txt.

   17.  The Open Group.  The Single UNIX Specification, Version 2 - 6
   Vol Set for UNIX 98.  Document number T912, The Open Group, February,
   1997.

   18.  Venkata N. Padmanabhan and Jeffrey C. Mogul.  "Using Predictive
   Prefetching to Improve World Wide Web Latency".  Computer
   Communication Review 26, 3 (1996), 22-36.

   19.  M. O. Rabin.  Fingerprinting by random polynomials.  Report
   TR-15-81, Department of Computer Science, Harvard University, 1981.

   20.  R. Rivest.  The MD5 Message-Digest Algorithm.  RFC 1321, Network
   Working Group, April, 1992.


10 Authors' addresses

   Jeffrey C. Mogul
   Western Research Laboratory
   Digital Equipment Corporation
   250 University Avenue
   Palo Alto, California, 94305, U.S.A.
   Email: mogul@wrl.dec.com
   Phone: 1 650 617 3304 (email preferred)

   Arthur van Hoff
   Marimba, Inc.
   440 Clyde Avenue
   Mountain View, CA 94043
   Email: avh@marimba.com
   Phone: 1 650 930 5283


