Web Replication and Caching Working Group                     J. Dilley
INTERNET DRAFT                                          HP Laboratories
draft-ietf-wrec-known-prob-00.txt                                   (editor)

1 September 1999                                   Expires 1 March 2000

                     Known HTTP Proxy/Caching Problems

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of current Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This Internet-Draft will expire on 1 March 2000.

Copyright Notice

Copyright (C) The Internet Society (1999). All Rights Reserved.

Abstract

This memo catalogs a number of known problems with World Wide Web proxy
and cache servers. The goal of the document is to provide a discussion
of the problems and proposed workarounds, and ultimately to improve
conditions by illustrating problems. The construction of this document
is a joint effort of the web caching community. It is being done under
the auspices of the IETF Web Replication and Caching working group. We
gratefully acknowledge RFC 2525, which helped define the initial format
for this known problems list.

Introduction

This memo discusses problems with both Proxy servers, which act as
application-level gateways for web requests, and Cache servers,
which hold copies of previously requested documents in the hope of
saving future network bandwidth and latency for users. Proxies often
perform a caching function, but the two are not necessarily
linked. Refer to the work in progress report Internet Web Replication
and Caching Taxonomy for definitions of proxy and cache terminology used
in this memo.

No individual or organization has complete knowledge of the known
problems in web caching. If you know of a problem that is not documented
on this list, you are encouraged to send it to the WREC mailing list,
wrec@cs.utk.edu, for discussion or to the memo's editor, jad@hpl.hp.com,
for review and inclusion in the list.

This document is available in HTML and in text format. The text-only
version of the document is available at

http://www.hpl.hp.com/personal/John_Dilley/caching/draft-wrec-known-prob-00.txt

Problem Template

Each problem is defined in a common format, summarized in the following
table and described below.
------------------------------------------------------------------------

Name:            short, descriptive name of the problem (3-5 words)
Classification:  classifies the problem: performance, security, etc
Description:     describes the problem succinctly
Significance:    magnitude of problem, environments where it exists
Implications:    the impact of the problem on systems and networks
See Also:        a reference to a related known problem
Indications:     states how to detect the presence of this problem
Solution(s):     describe the solution(s) to this problem, if any
Workaround:      practical workaround for the problem
References:      information about the problem or solution
Contact:         contact name and email address for this section

------------------------------------------------------------------------

Name
     A short, descriptive name (3-5 words) associated with the problem.
Classification
     Problems are grouped into categories of similar problems for ease
     of reading of this memo. Choose the category that best describes
     the problem. The suggested categories include three general
     categories and several more specific categories.
     o Architecture: the fundamental design is incomplete, or incorrect.
     o Specification: the spec is ambiguous, incomplete, or incorrect.
     o Implementation: the implementation of the spec is incorrect.
       -----------------------------------------------------------------
     o Performance: perceived page response at the client is excessive;
       network bandwidth consumption is excessive; demand on origin or
       proxy servers exceeds reasonable bounds.
     o Administration: care and feeding of caches is or causes a
       problem.
     o Security: privacy, integrity, or authentication concerns.
     This is the first draft of this memo. The classification structure
     is in revision. In the published drafts of the memo the
     classification structure should be fixed but may be revised from
     time to time.
Description
     A definition of the problem, succinct but including necessary
     background information.
Significance (High, Medium, Low)
     May include a brief summary of the environments for which the
     problem is significant.
Implications
     Why the problem is viewed as a problem. What inappropriate behavior
     results from it? This section should substantiate the magnitude of
     any problem indicated with High significance.
See Also
     Optional. List of other known problems that are related to this
     one.
Indications
     How to detect the presence of the problem. This may include
     references to one or more substantiating documents that demonstrate
     the problem.  This should include the network configuration that
     led to the problem such that it can be reproduced. Problems that
     are not reproducible will not appear in this memo.
Solution(s)
     Solutions that permanently fix the problem, if such are known. For
     example, what version of the software does not exhibit the problem?
     Indicate if the solution is accepted by the community, one of
     several solutions pending agreement, or open possibly with
     experimental solutions.
Workaround
     Practical workaround if no solution is available or usable. The
     workaround should have sufficient detail for someone experiencing
     the problem to get around it.
References
     References to related information in technical publications or on
     the web. Where can someone interested in learning more go to find
     out more about this problem, its solution, or workarounds?
Contact
     Contact name and email address of the person who supplied the
     information for this section. If you would prefer to remain
     anonymous the editor's name will appear here instead, but we
     believe in credit where credit is due.

Document Template

Templates for submission of known problems can be found on the web at

http://www.hpl.hp.com/personal/John_Dilley/caching/known-prob-template-00.html

------------------------------------------------------------------------
Known Problems

The remaining sections present the currently documented known
problems. The problems are ordered by classification and
significance. Issues with web cache protocol specification or
architecture are first, followed by implementation issues. Issues of
high significance are first, followed by lower significance. The list
below names each of the known problems described in this memo.

     Known Problems List - Tue Aug 31 15:17:45 1999
   * Network transparent proxies break client cache directives
   * Network transparent proxies prevent introduction of new HTTP methods
   * Cannot specify multiple URIs for replicated resources
   * Replica distance is unknown
   * Proxy resource location
   * Cache peer selection in heterogeneous networks
   * ICP performance
   * Cache meshes can break HTTP serialization of content
   * Use of Cache-Control headers
   * Lack of HTTP/1.1 compliance for proxy caches
   * ETag support
   * Client proxy failover
   * Servers and content should be optimized for caching
   * Some servers send bad Content-Length header
   * Lack of fine-grained, standardized hierarchy controls
   * Proxy/Server exhaustive log format standard for analysis
   * Trace log timestamps

Please send any updated or new problems to the document editor,
jad@hpl.hp.com. I will update this document and re-post it as needed.
Thank you!

------------------------------------------------------------------------

Architecture

------------------------------------------------------------------------
Name
     Network transparent proxies break client cache directives
Classification
     Architecture
Description
     HTTP is designed for the client to be aware of whether it is
     connected to an origin server or to a proxy. Clients that believe
     they are transacting with an origin server but are really connected
     to a network transparent proxy may fail to send critical
     cache-control information they would otherwise have included in
     their request.
Significance
     High
Implications
     Clients may receive data that is not synchronized with the origin
     even when they request an end-to-end refresh, because the request
     lacks a cache-control: no-cache or must-revalidate header. These
     headers have no impact on origin server behavior, so the browser
     may omit them if it believes it is connected to the origin
     server. Other related data implications are possible as
     well. For instance data security may be compromised by the lack of
     inclusion of private or no-store clauses of the cache-control
     header under similar conditions.
Indications
     Easily detected by placing fresh (un-expired) content on a proxy
     while changing the authoritative copy and requesting an end to end
     reload of the data through a proxy in both transparent and explicit
     modes.
Solution(s)
     Eliminate the need for network transparent proxies and IP spoofing,
     which will return correct context awareness to the client.
Workaround
     Include relevant cache-control: directives in every request at the
     cost of increased bandwidth and CPU requirements.
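
     As a minimal sketch of this workaround, assuming a Python client
     built on the standard urllib.request module, a request can carry
     explicit directives whether or not the client believes a proxy is
     present. The function name and URL are hypothetical, and Pragma is
     included only as an assumption about HTTP/1.0 intermediaries.

```python
import urllib.request


def build_refresh_request(url):
    """Build (but do not send) a GET that carries explicit cache directives.

    The request asks caches on the path not to answer from a stored copy
    without first contacting the origin server.
    """
    headers = {
        "Cache-Control": "no-cache",  # HTTP/1.1 directive
        "Pragma": "no-cache",         # assumption: HTTP/1.0 caches may be on the path
    }
    return urllib.request.Request(url, headers=headers)


# Placeholder URL; nothing is sent over the network here.
request = build_refresh_request("http://www.example.com/index.html")
```

     The extra header bytes on every request are the bandwidth cost
     mentioned above.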
Contact
     Patrick McManus <mcmanus@AppliedTheory.com>

------------------------------------------------------------------------
Name
     Network transparent proxies prevent introduction of new HTTP methods
Classification
     Architecture
Description
     A proxy that receives a request with a method unknown to it is
     required to generate an HTTP 501 Error as a response. HTTP methods
     are designed to be extensible so there may be applications deployed
     with initial support just for the user agent and origin server. A
     transparent proxy that hijacks requests with new methods destined
     for servers that have implemented that method creates a de-facto
     firewall where none may be intended.
Significance
     Medium within network transparent proxy environments.
Implications
     Renders new compliant applications useless unless modifications are
     made to proxy software. Because new methods are not required to be
     globally standardized it is impossible to keep up to date in the
     general case.
Solution(s)
     Eliminate the need for network transparent proxies. A client
     receiving a 501 in a traditional HTTP environment may either choose
     to repeat the request to the origin server directly, or perhaps be
     configured to use a different cache.
Workaround
     Level 5 switches (sometimes called Level 7 or application layer
     switches) can be used to keep HTTP traffic with unknown methods out
     of the proxy. However, these devices have heavy buffering
     responsibilities, still require TCP sequence number spoofing, and
     do not interact well with persistent connections.

     The HTTP/1.1 specification allows a proxy to switch over to tunnel
     mode when it receives a request with a method or HTTP version it
     does not understand how to handle.
Contact
     Patrick McManus <mcmanus@AppliedTheory.com>
     Henrik Nordstrom <hno@hem.passagen.se> (HTTP/1.1 clarification)

------------------------------------------------------------------------
Name
     Cannot specify multiple URIs for replicated resources
Classification
     Architecture
Description
     There is no way to specify that multiple URIs may be used for a
     single resource, one for each replica of the resource. Similarly,
     there is no way to say that some set of proxies (each identified by
     a URI) may be used to resolve a URI.
Significance
     Medium
Implications
     Forces users to understand the replication model and
     mechanism. Makes it difficult to create a replication framework
     without protocol support for replication and naming.
Indications
     Inherent in HTTP 1.0, HTTP 1.1.
Solution(s)
     Architectural - protocol design is necessary.
Workaround
     Replication mechanisms force users to locate a replica or mirror
     site for replicated content.
Contact
     Daniel LaLiberte <liberte@w3.org>

------------------------------------------------------------------------
Name
     Replica distance is unknown
Classification
     Architecture
Description
     There is no recommended way to find out which of several servers or
     proxies is closer either to the requesting client or to another
     machine, either geographically or in the network topology.
Significance
     Medium
Implications
     Clients must guess which replica is closer to them when requesting
     a copy of a document that may be served from multiple
     locations. Users must know the set of servers that can serve a
     particular object.  This in general is hard to determine and
     maintain. Users must understand network topology in order to choose
     the closest copy. Note that the closest copy is not always the one
     that will result in quickest service. A nearby but heavily loaded
     server may be slower than a more distant but lightly loaded server.
Indications
     Inherent in HTTP 1.0, HTTP 1.1.
Solution(s)
     Architectural - protocol work is necessary. This is a specific
     instance of a general problem in widely distributed systems. A
     general solution is unlikely; however, a specific solution in the
     web context is possible.
Workaround
     Servers can (many do) provide location hints in a replica selection
     web page. Users choose one based upon their location. Users can
     learn which replica server gives them best performance. Note that
     the closest replica geographically is not necessarily the closest
     in terms of network topology. Expecting users to understand network
     topology is unreasonable.
Contact
     Daniel LaLiberte <liberte@w3.org>

------------------------------------------------------------------------
Name
     Proxy resource location
Classification
     Architecture
Description
     There is no way to tell a proxy that it may request a resource from
     another location, with the receiver then checking the authenticity
     of the resource it obtains.
Significance
     Medium
Implications
     Proxies have no systematic way to locate resources within other
     proxies or origin servers. This makes it more difficult to share
     information among proxies. Information sharing would improve global
     efficiency.
Indications
     Inherent in HTTP 1.0, HTTP 1.1.
Solution(s)
     Architectural - protocol design is necessary.
Workaround
     Certain proxies share location hints in the form of summary digests
     of their contents (e.g., Squid). Certain proxy protocols enable a
     proxy to query another for its contents (e.g., ICP). (See however
     the "ICP performance" issue.)
Contact
     Daniel LaLiberte <liberte@w3.org>

------------------------------------------------------------------------
Name
     Cache peer selection in heterogeneous networks
Classification
     Architecture
Description
     Cache peer selection in networks with large variance in latency and
     bandwidth between peers can lead to non-optimal peer selection. For
     example take cache C with two siblings, Sib1 and Sib2, and the
     following network topology (summarized).
        o Cache C's link to Sib1, 2 Mbit/sec with 300 msec latency
        o Cache C's link to Sib2, 64 Kbit/sec with 10 msec latency.

     ICP won't work well in this context. Suppose a user submits a
     request to Cache C for page P that results in a miss. C will send
     an ICP request to Sib1 and Sib2. Assume both siblings have the
     requested object P. The ICP-HIT reply will always come from Sib2
     before Sib1. However, for large objects it is clear that retrieval
     will be faster from Sib1 than from Sib2.

     In fact, the problem is more complex because Sib1 and Sib2 can't
     have a 100% hit ratio. With a hit rate of 10%, it is more efficient
     to use Sib1 with URLs larger than 48K. The best choice depends on
     at least the hit rate and link characteristics; maybe other
     parameters as well.
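
     The trade-off between the two links can be sketched with a
     deliberately simple model: fetch time is one latency period plus
     the transfer time at the link's bandwidth. This model is an
     assumption made for illustration. It ignores hit rates, protocol
     overhead, and load, so it does not reproduce the 48K figure quoted
     above; it only shows that a break-even object size exists. The
     function names are hypothetical.

```python
def fetch_time(size_bytes, bandwidth_bps, latency_s):
    """Estimated seconds to fetch an object: one latency period plus transfer time."""
    return latency_s + (size_bytes * 8) / bandwidth_bps


def break_even_size(bw_a, lat_a, bw_b, lat_b):
    """Object size in bytes at which peers A and B are equally fast.

    Assumes A has both the higher bandwidth and the higher latency.
    """
    return (lat_a - lat_b) / (8 * (1 / bw_b - 1 / bw_a))


# Link figures from the example topology: (bits per second, latency in seconds).
SIB1 = (2_000_000, 0.300)
SIB2 = (64_000, 0.010)

threshold = break_even_size(*SIB1, *SIB2)
small = (fetch_time(1_000, *SIB1), fetch_time(1_000, *SIB2))      # 1,000-byte object
large = (fetch_time(100_000, *SIB1), fetch_time(100_000, *SIB2))  # 100,000-byte object
```

     Under these assumptions the break-even size is roughly 2.4
     Kbytes: smaller objects arrive sooner from Sib2, larger ones from
     Sib1, even though Sib2 always answers the ICP query first.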
Significance
     Medium
Implications
     By selecting the first peer to respond, peer selection algorithms
     are not optimizing retrieval latency to end users. Furthermore they
     are causing more work for the high-latency peer since it must
     respond to such requests but will never be chosen to serve content
     if the lower latency peer has a copy.
Indications
     Inherent in design of ICP v1, ICP v2, and any cache mesh protocol
     that selects a peer based upon first response.

     This problem is not exhibited by cache digest or other protocols
     which (attempt to) maintain knowledge of peer contents and only hit
     peers that are believed to have a copy of the requested page.
Solution(s)
     This problem is architectural with the peer selection protocol.
Workaround
     Cache mesh design when using such a protocol should be done in such
     a way that there is not a high latency variance among peers. In the
     example presented in the Description the high latency high
     bandwidth peer could be used as a parent, but should not be used as
     a sibling.
Contact
     Ivan LOVRIC <ivan.lovric@cnet.francetelecom.fr>
     John Dilley <jad@hpl.hp.com>

------------------------------------------------------------------------
Name
     ICP performance
Classification
     Architecture (ICP), Performance
Description
     The ICP protocol exhibits O(n^2) scaling properties, where n is the
     number of peer proxies participating in the protocol. This can lead
     ICP traffic to dominate HTTP traffic within a network.
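
     The quadratic growth can be illustrated by counting messages. The
     sketch below assumes a full mesh in which every proxy queries all
     of its peers on each local miss, every peer replies, and every
     proxy takes the same number of misses. These are simplifying
     assumptions, and the function names are hypothetical.

```python
def icp_messages_per_miss(n_proxies):
    """ICP messages caused by one local miss in a full mesh of n proxies.

    The missing proxy sends a query to each of its n - 1 peers and each
    peer sends one reply.
    """
    return 2 * (n_proxies - 1)


def icp_messages_per_round(n_proxies, misses_per_proxy=1):
    """Total ICP messages when every proxy takes the same number of misses."""
    return n_proxies * misses_per_proxy * icp_messages_per_miss(n_proxies)


# Message totals for meshes of increasing size, one miss per proxy.
growth = {n: icp_messages_per_round(n) for n in (2, 4, 8, 16)}
```

     Doubling the mesh from 8 to 16 proxies raises the total from 112
     to 480 messages in this model, slightly more than a fourfold
     increase.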
Significance
     Medium
Implications
     If a proxy has many ICP peers the bandwidth demand of ICP can be
     excessive. Cache managers must carefully regulate ICP peering. ICP
     also leads proxies to become homogeneous in what they serve. This
     means if your proxy does not have a document it is unlikely your
     peers will have it either. Therefore, ICP traffic requests are
     largely unable to locate a local copy of an object [credit to
     Ingrid Melve's 3WCW talk for this].
Indications
     Inherent in design of ICP v1, ICP v2.
Solution(s)
     This problem is architectural - protocol redesign or replacement
     is required to solve it if ICP is to continue to be used.
Workaround
     Implementation workarounds exist, for example to turn off use of
     ICP, to carefully regulate peering, or to use another mechanism if
     available, such as cache digests. A cache digest protocol shares a
     summary of cache contents using a Bloom Filter technique. This
     allows a cache to estimate whether a peer has a document. Filters
     are updated regularly but are not always up-to-date so cannot help
     when a spike in popularity occurs. They also increase traffic but
     not as much as ICP.
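
     The idea behind a Bloom filter digest can be sketched as follows.
     This is an illustration only, not the digest format of any
     particular product: the class name, bit array size, number of hash
     functions, and the use of SHA-256 are all assumptions.

```python
import hashlib


class BloomDigest:
    """Tiny Bloom filter over URLs; sizes and hash scheme are illustrative only."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, url):
        # Derive several bit positions by hashing the URL with a counter prefix.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{url}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, url):
        """Record that the cache holds this URL."""
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url):
        """False means definitely absent; True may be a false positive."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url)
        )


# A peer builds its digest and ships the bit array to its neighbors.
peer_digest = BloomDigest()
peer_digest.add("http://www.example.com/a.html")
```

     A cache holding a peer's digest would contact that peer only when
     might_contain() returns True. A stale digest can still name
     objects the peer has evicted or miss objects it has since fetched,
     which is the staleness limitation noted above.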

     Cache clustering protocols, which organize caches into a mesh,
     provide another alternative. There is ongoing research on this
     topic.
Contact
     John Dilley <jad@hpl.hp.com>

------------------------------------------------------------------------
Name
     Cache meshes can break HTTP serialization of content
Classification
     Architecture (HTTP protocol)
Description
     A cache mesh where a request may travel different paths depending
     on the state of the mesh and associated caches can break HTTP
     content serialization, possibly causing the end user to receive
     older content than seen on an earlier request where the request
     traveled another path in the mesh.
Significance
     Medium
Implications
     Can cause end user confusion. May in some situations (sibling cache
     hit, object has changed state from cacheable to uncacheable) be
     close to impossible to get the caches properly updated with the new
     content.
Indications
     Older content is unexpectedly returned from a cache mesh after some
     time.
Solution(s)
     Work with cache vendors and researchers to find a suitable protocol
     for maintaining cache relations and object state in a cache mesh.
Workaround
     When designing a cache hierarchy/mesh, make sure that for each
     (end user, URL) combination there is only one path in the mesh
     during normal operation.
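
     One way to obtain a single path is to pick the upstream cache
     deterministically from the URL. The sketch below is an assumption
     about how such a rule might look, not a description of any
     product; the function name and host names are hypothetical. It
     keys on the URL alone, so all users behind the same first-level
     cache share one path per URL.

```python
import hashlib


def choose_parent(url, parents):
    """Map a URL to exactly one parent so repeated requests follow the same path."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(parents)
    return parents[index]


# Hypothetical parent caches.
PARENTS = ["parent-a.example.net", "parent-b.example.net", "parent-c.example.net"]

first = choose_parent("http://www.example.com/news.html", PARENTS)
second = choose_parent("http://www.example.com/news.html", PARENTS)
```

     The mapping changes when the parent list changes, for example
     after a failure, which is why the guarantee holds only during
     normal operation.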
Contact
     Henrik Nordstrom <hno@hem.passagen.se>

------------------------------------------------------------------------

Implementation

------------------------------------------------------------------------
Name
     Use of Cache-Control headers
Classification
     Implementation
Description
     Many (if not most) implementations incorrectly interpret
     Cache-Control response headers.
Significance
     High
Implications
     Cache-Control (CC) headers will be spurned by end users if there
     are conflicting or non-standard implementations.
Indications
     Check: Squid, NetCache, Cache Engine, HTTP State Management draft
     for use of CC: no-cache and must-revalidate against HTTP/1.1rev6.
Solution(s)
     Work with vendors and others to assure proper application.
Workaround
     None
Contact
     Mark Nottingham <mnot@pobox.com>

------------------------------------------------------------------------
Name
     Lack of HTTP/1.1 compliance for proxy caches
Classification
     Implementation
Description
     Although performance benchmarking of caches is starting to be
     explored, protocol compliance is just as important.
Significance
     High
Implications
     Cache vendors implement their interpretation of the spec; because
     the specification is very large, sometimes vague and ambiguous,
     this can lead to inconsistent behavior between proxy caches.

     Proxy caches need to comply with the specification (or the
     specification needs to change).
Indications
     There is no currently known compliance test being used.

     There is work underway to quantify how closely servers comply with
     the current specification. A joint technical report between AT&T
     (#990803-05-TM, available at
     http://www.research.att.com/~bala/papers/procow-1.ps.gz) and HP Labs
     (to be published) describes the compliance testing. This report
     examines how well each of a set of top traffic-producing sites
     supports certain HTTP/1.1 features.

     The IRCache group is working to develop protocol compliance testing
     software. Running such a conformance test suite against proxy cache
     products would measure compliance and ultimately would help assure
     they comply with the specification.
Solution(s)
     Testing should commence and be reported in an open industry forum.
     Proxy implementations should conform to the specification.
Workaround
     There is no workaround for non-compliance.
Contact
     Mark Nottingham <mnot@pobox.com>
     IRCache: Duane Wessels <wessels@ircache.net>
     Glenn Chisholm <glenn@ircache.net>

------------------------------------------------------------------------
Name
     ETag support
Classification
     Implementation
Description
     No currently released cache implements ETag (strong) validation.
Significance
     Medium
Implications
     LM/IMS validation is inappropriate for many requirements, both
     because of its weakness and its use of dates. Lack of a usable,
     strong coherency protocol leads developers and end users not to
     trust caches.
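
     The weakness of date-based validation can be illustrated with two
     versions of an object written within the same second. In the
     sketch below the ETag is derived from a content hash, which is
     only one possible choice and an assumption of this example;
     servers are free to generate entity tags differently. The function
     name, bodies, and timestamps are hypothetical.

```python
import hashlib
from email.utils import formatdate


def validators(body, mtime_seconds):
    """Return (Last-Modified, ETag) for a response body.

    Last-Modified has one-second resolution; the ETag here is a quoted,
    truncated content hash (an assumption of this sketch).
    """
    last_modified = formatdate(int(mtime_seconds), usegmt=True)
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return last_modified, etag


# Two versions of the same object written within the same second.
lm_v1, etag_v1 = validators(b"price: 10\n", 936144000.2)
lm_v2, etag_v2 = validators(b"price: 12\n", 936144000.9)

ims_says_fresh = (lm_v1 == lm_v2)        # date comparison cannot see the change
etag_says_fresh = (etag_v1 == etag_v2)   # entity tag comparison can
```

     A cache validating with If-Modified-Since would keep serving the
     first version here, while one validating with If-None-Match would
     fetch the second.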
Indications
     -
Solution(s)
     Work with vendors to implement ETags; work for better validation
     protocols.
Workaround
     Use LM/IMS validation.
Contact
     Mark Nottingham <mnot@pobox.com>

------------------------------------------------------------------------
Name
     Client proxy failover
Classification
     Implementation
Description
     Failover between proxies at the client level (using a proxy.pac
     file) is erratic and no standard behavior is defined. Additionally,
     behavior is hard-coded into the browser, so that proxy
     administrators cannot use failover at the client level effectively.
Significance
     Medium
Implications
     Cache system architects are forced to implement failover at the
     cache itself, when it may be more appropriate and economical to do
     it at the client.
Indications
     If a browser detects that its primary proxy is down, it will wait n
     minutes before trying the next one it is configured to use. It will
     then wait y minutes before asking the user if they'd like to try
     the original proxy again. This is very confusing for end users.
Solution(s)
     Work with browser vendors to establish standard extensions to
     JavaScript proxy.pac libraries that will allow configuration of
     these timeouts.
Workaround
     User education; redundancy at the proxy level.
Contact
     Mark Nottingham <mnot@pobox.com>

------------------------------------------------------------------------
Name
     Servers and content should be optimized for caching
Classification
     Implementation (Performance)
Description
     Many web servers and much web content could be implemented to be
     more conducive to caching, reducing bandwidth demand and page load
     delay.
Significance
     Medium
Implications
     By making poor use of caches, origin servers encourage longer load
     times, greater load on cache servers, and increased network demand.
Indications
     The problem is most apparent for pages that have low or zero
     expires time, yet do not change.
Solution(s)
     ...
Workaround
     For example, servers could start using unique object identifiers
     for write-once content: if an object changes it gets a new name;
     otherwise it is considered to be immutable and therefore has an
     infinite expire age. Certain hosting providers do this already.
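
     A naming scheme of this kind can be sketched by embedding a short
     content hash in the object name. The function name, the hash
     length, and the example paths are assumptions made for
     illustration.

```python
import hashlib
import posixpath


def versioned_name(path, content):
    """Embed a short content hash in the name so changed content gets a new URL."""
    stem, ext = posixpath.splitext(path)
    tag = hashlib.sha256(content).hexdigest()[:12]
    return f"{stem}.{tag}{ext}"


# The same logical object before and after an edit (placeholder bytes).
old_url = versioned_name("/images/logo.gif", b"GIF89a...version 1")
new_url = versioned_name("/images/logo.gif", b"GIF89a...version 2")
```

     Because a given name never refers to different bytes, the server
     can attach a far-future expiry to each versioned object, and pages
     that reference the object pick up the new name when it changes.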
Contact
     Peter Danzig <danzig@netapp.com>

------------------------------------------------------------------------
Name
     Some servers send bad Content-Length header for files that contain
     CR
Classification
     Implementation
Description
     Certain web servers send a Content-Length value that is larger than
     the number of bytes in the HTTP message body. This happens when the
     server strips off CR characters from text files with lines
     terminated with CRLF as the file is written to the client. The
     server probably uses the stat() system call to get the file size
     for the Content-Length header. Servers that exhibit this behavior
     include the GN Web server (version 2.14 at least)
     (http://gopher.unicom.com/gn-info/).
Significance
     Low. Surveys indicate only a small number of sites run faulty
     servers.
Implications
     In this case, an HTTP agent (client or proxy) may believe it
     received a partial response. HTTP/1.1 (RFC 2616) advises that
     caches MAY store partial responses.
Indications
     Count the number of bytes in the message body and compare it to
     the Content-Length value. If they differ the server exhibits this
     problem.
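
     The check, and the way the mismatch arises, can be sketched as
     follows. The file contents are invented for illustration and the
     function name is hypothetical; the sketch simulates a server that
     takes the length from the on-disk size but strips CR while
     writing.

```python
def content_length_mismatch(declared_length, body):
    """Return declared minus actual byte count; non-zero indicates the problem."""
    return declared_length - len(body)


# A text file stored with CRLF line endings.
on_disk = b"line one\r\nline two\r\nline three\r\n"

declared = len(on_disk)                  # size as stat() would report it
sent = on_disk.replace(b"\r\n", b"\n")   # server strips CR while writing

shortfall = content_length_mismatch(declared, sent)
```

     Here the header overstates the body by three bytes, one per line,
     so an agent waiting for the declared length would see what looks
     like a truncated response.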
Solution(s)
     Upgrade or replace the buggy server.
Workaround
     Some browsers and proxies use one TCP connection per object and
     ignore the Content-Length. The document end of file is identified
     by the close of the TCP socket.
Contact
     Duane Wessels <wessels@ircache.net>

------------------------------------------------------------------------

Administration

------------------------------------------------------------------------
Name
     Lack of fine-grained, standardized hierarchy controls
Classification
     Administration
Description
     There is no standard for instructing a cache as to how it should
     resolve what parent to fetch a given object from. Because of this,
     implementations vary greatly, and it can be difficult to make them
     interoperate correctly in a complex environment.
Significance
     Medium
Implications
     Complications in deployment of caches in a complex network (esp.
     corporate networks).
Indications
     Inability of some caches to be configured to direct traffic based
     on domain name, reverse lookup IP address, raw IP address, in
     normal operation and in failover mode. Inability in some caches to
     set a preferred parent / backup parent configuration.
Solution(s)
     ?
Workaround
     Work with vendors to establish an acceptable configuration within
     the limits of their product; standardize on one product.
Contact
     Mark Nottingham <mnot@pobox.com>

------------------------------------------------------------------------
Name
     Proxy/Server exhaustive log format standard for analysis
Classification
     Administration
Description
     Most proxy or origin server logs used for characterization or
     evaluation do not provide sufficient detail to determine
     cachability of responses.
Significance
     Low (for operations; high significance for research efforts)
Implications
     Characterizations and simulations are based on non-representative
     workloads.
See Also
     W3C Web Characterization Activity (http://www.w3.org/WCA/), since
     they are also concerned with collecting high quality logs and
     building characterizations from them.
Indications
Solution(s)
     To properly clean logs and to accurately determine cachability of
     responses, a complete log is required, including all request
     headers (such as User-agent, for removal of spiders) as well as
     all response headers (such as Expires, max-age, set-cookie,
     no-cache, etc.).
Workaround
References
     See "Web Traffic Logs: An Imperfect Resource for Evaluation" in
     INET99 (http://www.cs.rutgers.edu/~davison/pubs/inet99/) for some
     discussion of this.
Contact
     Brian D. Davison <davison@cs.rutgers.edu>
     Terence Kelly <tpkelly@eecs.umich.edu>

------------------------------------------------------------------------
Name
     Trace log timestamps
Classification
     Administration
Description
     Some proxies/servers log requests without sufficient timing detail.
     Millisecond resolution is often too coarse to preserve request
     ordering. In addition, servers should record either the request
     reception time in addition to the completion time, or the elapsed
     time plus either one.
Significance
     Low (for operations; medium significance for research efforts)
Implications
     Characterization and simulation fidelity is improved with accurate
     timing and ordering information. Since logs are generally written
     in order of request completion, these logs cannot be re-played
     without knowing request generation times and reordering
     accordingly.
See Also
Indications
     Timestamps can be identical for multiple entries (when only
     millisecond resolution is used). Request orderings can be jumbled
     when clients open additional connections for embedded objects while
     still receiving the container object.
Solution(s)
     Since request completion time is common (e.g. Squid), recommend
     continuing to use it (with microsecond resolution if possible) plus
     recording elapsed time since request reception.
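
     With completion time and elapsed time both logged, the original
     arrival order can be reconstructed for replay. The sketch below
     assumes a hypothetical record layout (completion time in seconds,
     elapsed time in seconds, URL); real log formats differ.

```python
from collections import namedtuple

# Hypothetical log record: completion timestamp, elapsed seconds, requested URL.
LogEntry = namedtuple("LogEntry", "completed elapsed url")


def by_arrival(entries):
    """Reorder completion-ordered entries by reconstructed reception time."""
    return sorted(entries, key=lambda e: e.completed - e.elapsed)


# Logged in completion order: the small image finished before the page did.
log = [
    LogEntry(completed=100.250, elapsed=0.050, url="/logo.gif"),
    LogEntry(completed=100.900, elapsed=0.800, url="/index.html"),
]

replay_order = [e.url for e in by_arrival(log)]
```

     The container page was requested first even though it was logged
     second; subtracting elapsed time restores that order.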
Workaround
References
     See "Web Traffic Logs: An Imperfect Resource for Evaluation" in
     INET99 (http://www.cs.rutgers.edu/~davison/pubs/inet99/) for some
     discussion of this.
Contact
     Brian D. Davison <davison@cs.rutgers.edu>