INTERNET-DRAFT                                       Vinod Valloppillil
<draft-vinod-icp-traffic-dist-00.txt>             Microsoft Corporation
                                                             Josh Cohen
                                                Netscape Communications
                                                          21 April 1997
                                                   Expires October 1997


                Hierarchical HTTP Routing Protocol

Status of this Memo

  This document is an Internet-Draft.  Internet-Drafts are working
  documents of the Internet Engineering Task Force (IETF), its areas,
  and its working groups.  Note that other groups may also distribute
  working documents as Internet-Drafts.

  Internet-Drafts are draft documents valid for a maximum of six months
  and may be updated, replaced, or obsoleted by other documents at any
  time.  It is inappropriate to use Internet-Drafts as reference
  material or to cite them other than as ``work in progress.''

  To learn the current status of any Internet-Draft, please check the
  ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
  Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
  munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
  ftp.isi.edu (US West Coast).

Abstract

  Recent interest in finding solutions for traffic problems stemming
  from HTTP has centered around the use of cooperating proxy-caches.

  We contend that by using a deterministic, hash-based approach for
  routing URLs within an "array" of proxy servers, many of the benefits
  of alternative cache cooperation protocols (such as ICP) may be
  realized.

  As an example of such an implementation we propose the use of
  "Proxy Client Configuration Files" between proxy servers in order
  to exchange routing information.  This implementation is motivated
  in part by the adoption of this file by existing, popular web
  browsers to provide intelligent URL request routing.

  This draft discusses the adoption of this well-understood, widely
  implemented browser protocol by web proxies in order to facilitate
  intelligent routing of requests within a network of proxy servers.


Valloppillil & Cohen                                          [Page 1]


INTERNET-DRAFT   Hierarchical HTTP Routing Protocol    21 April 1997

1. Introduction

  There is significant interest in the Internet community, and in the
  ICP working group in particular, in finding mechanisms by which the
  public caches on individual proxy servers can be further aggregated
  and shared by as many browsers as possible.

  Philosophically, protocols such as ICPv2 are based on dynamic
  "pinging" of neighboring proxy servers in an attempt to locate
  copies of cached objects.

  We propose an alternate approach based on hash-based routing of
  URLs.  The hash-based routing approach documented here uses a known
  "request resolution path" through a network of proxies that is
  determined by the URL of the request.  An interesting side effect of
  this deterministic mechanism is that cache duplication is avoided.

  Hashing distributes the URL space among several proxies which are
  assumed to be relatively equidistant from each other.  Additionally,
  this hash-based approach is better suited to "hierarchical"
  deployments of proxy servers.  One example of this might be a
  departmental-level proxy which routes into an "array" of top-level
  proxies in a corporation which provide the gateway to an ISP.  The
  ISP, in turn, might operate another "array" of proxies at its POP.
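
  As a minimal sketch of the idea (not a routing script proposed by
  this draft), the following Python fragment maps a URL onto one
  member of an array.  The proxy names, the choice of MD5, and the
  modulo step are illustrative assumptions only.

```python
import hashlib

# Hypothetical array members; the names are placeholders.
ARRAY = ["proxy1.example.com:8080",
         "proxy2.example.com:8080",
         "proxy3.example.com:8080"]


def route(url, array=ARRAY):
    """Pick one proxy for a URL using a deterministic hash."""
    # MD5 is used only because it is stable across runs and hosts;
    # Python's built-in hash() of a string is randomized per process.
    digest = hashlib.md5(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(array)
    return array[index]


if __name__ == "__main__":
    for u in ("http://www.example.com/", "http://www.example.com/a.gif"):
        print(u, "->", route(u))
```

  Any browser or proxy that evaluates the same function over the same
  array reaches the same answer, so no query traffic is needed to
  decide where an object would be cached.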

  By contrast, ICP networks typically involve peered caches which
  may operate at the top level of many ISP hierarchies.

  As an example of an implementation of hash-based routing, we propose
  extending the existing "Proxy Client Configuration File" protocol used
  by browsers to intelligently route HTTP requests.

  Our proposal would implement this protocol on proxy servers in order
  to provide a vendor independent mechanism for specifying sophisticated
  hop-by-hop HTTP routing between groups of proxy servers.

  We also demonstrate that intelligent utilization of this routing
  protocol can yield almost all of the benefits of alternative cache
  cooperation protocols.

  We do NOT propose any specific routing scripts and instead leave
  determination of such scripts up to individual vendor
  implementations.

  Although there are clear advantages to the use of the
  Proxy Client Configuration File as the vehicle for transporting
  routing information, there may be interest in the working group
  in exploring other vehicles (e.g. publishing a static data table
  that lists the proxies in an "array", with a well-known hash
  function implemented within the proxies).

2. Proxy Client Configuration File

  The Proxy Client Configuration File is described in [1] and [2].
  Additionally, multiple interoperable implementations of this protocol
  are available in popular client browsers.

  As originally constructed, this file is intended for consumption by
  client programs (web browsers) and is evaluated per URL to be
  retrieved by the browser.  The output of this script provides an
  ordered series of proxy servers to be used by the browser to retrieve
  the object specified by the URL.
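
  To illustrate what "an ordered series of proxy servers" looks like
  to the consumer of the script, the sketch below parses a result
  string of the form returned by the FindProxyForURL function
  described in [1] (for example "PROXY host:port; DIRECT").  The
  helper name and host names are hypothetical, and only the PROXY and
  DIRECT keywords are handled.

```python
def parse_pac_result(result):
    """Turn 'PROXY a:8080; PROXY b:8080; DIRECT' into an ordered list.

    Each entry is a (kind, target) pair; target is None for DIRECT.
    """
    hops = []
    for part in result.split(";"):
        fields = part.split()
        if not fields:
            continue  # tolerate empty segments such as a trailing ';'
        kind = fields[0].upper()
        if kind == "DIRECT":
            hops.append(("DIRECT", None))
        elif kind == "PROXY" and len(fields) == 2:
            hops.append(("PROXY", fields[1]))
        else:
            raise ValueError("unsupported entry: %r" % part.strip())
    return hops


if __name__ == "__main__":
    print(parse_pac_result(
        "PROXY p1.example.com:8080; PROXY p2.example.com:8080; DIRECT"))
```

  A proxy consuming the script would try the entries in order, exactly
  as a browser does.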

  One of the excellent properties of the HTTP proxy protocol [5] is
  that it exposes proxy servers to upstream servers and upstream
  proxies as regular clients.  Because the administrator of a group of
  proxies may wish to make assumptions about a downstream client's
  ability to interpret a script, we wish to extend the metaphor to
  include use of the configuration file by proxies as well as
  "classical" clients.


3. Example implementation

  Researchers have documented the concept of using client-side
  hash-based routing to spread load across multiple proxy servers.
  The deterministic nature of many of these algorithms has the
  additional benefit of improving cache hit rates by creating the
  image of a single logical cache spread over many proxies [4].

  In this proposal, the administrator of an "array" of proxies at an
  ISP may wish to construct a script that hashes URLs and distributes
  the hash space across each of his/her proxy servers.  Using the same
  downstream script, the administrator should be able to service both
  dial-in clients (whose browsers already support the protocol) and
  corporate proxies connected over leased lines.
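
  One way such a script might divide the hash space is sketched below
  in Python rather than in the script language of [1].  The array
  members, the use of CRC-32, the equal-sized slices, and the function
  names are all assumptions made for illustration; the draft does not
  prescribe any of them.

```python
import zlib

# Hypothetical ISP array; names are placeholders.
ISP_ARRAY = ["cache-a.isp.example:3128",
             "cache-b.isp.example:3128",
             "cache-c.isp.example:3128",
             "cache-d.isp.example:3128"]

HASH_SPACE = 2 ** 32  # CRC-32 yields values in [0, 2**32)


def owner_index(url, n):
    """Return which of n equal slices of the hash space holds url."""
    h = zlib.crc32(url.encode("utf-8")) & 0xFFFFFFFF
    return h * n // HASH_SPACE


def find_proxy_for_url(url, array=ISP_ARRAY):
    """Return a PAC-style result string.

    The slice owner comes first; the remaining members follow in
    array order as fallbacks.
    """
    i = owner_index(url, len(array))
    ordered = array[i:] + array[:i]
    return "; ".join("PROXY " + p for p in ordered)


if __name__ == "__main__":
    print(find_proxy_for_url("http://www.example.com/index.html"))
```

  Because the result depends only on the URL and the array, the same
  script serves a dial-in browser and a corporate proxy alike.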

  The hop-by-hop nature of the routing provides additional flexibility
  in this example.  The corporation may wish to use one particular
  routing script internally (one which tells clients to directly access
  intranet content, for example) whereas the ISP may wish for the
  corporation's proxy servers to use a different script to route into
  the ISP's proxies (one which routes all requests through the caches
  for maximum hit rates).
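
  The two scripts in this example might look roughly as follows.
  This is a Python sketch under stated assumptions: the intranet
  domain, the proxy names, and both function names are hypothetical,
  and a real deployment would express the same logic in the script
  format of [1].

```python
import hashlib
from urllib.parse import urlsplit

INTRANET_SUFFIX = ".corp.example"              # hypothetical domain
CORPORATE_PROXY = "proxy.corp.example:8080"    # hypothetical proxy
ISP_ARRAY = ["cache1.isp.example:3128",        # hypothetical array
             "cache2.isp.example:3128"]


def corporate_script(url):
    """Script handed to browsers inside the corporation."""
    host = urlsplit(url).hostname or ""
    if host.endswith(INTRANET_SUFFIX):
        return "DIRECT"                  # intranet content: no proxy
    return "PROXY " + CORPORATE_PROXY    # everything else: next hop


def isp_script(url):
    """Script handed to the corporate proxy: always use an ISP cache."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return "PROXY " + ISP_ARRAY[digest[0] % len(ISP_ARRAY)]


if __name__ == "__main__":
    for u in ("http://wiki.corp.example/home", "http://www.example.org/"):
        print(u, "|", corporate_script(u), "|", isp_script(u))
```

  Each hop evaluates only the script published by the next level up,
  so the two administrators can change their policies independently.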



4. Security Considerations

  Security issues are not directly addressed in this document.  Any
  security functionality is derived from the underlying HTTP layer.

  Some consideration may need to be given to ensuring the integrity
  and security of the initial script transfer.  More specifically, this
  draft doesn't address issues that may stem from the possibility that
  malicious scripts may be constructed.

5. Advantages of script-based routing vs. ICP v2

  We now provide a comparison of this proposal vs. the current Internet
  Cache Protocol draft [3].

  a. Symmetric protocol between client -> proxy and proxy -> proxy

    This preserves the symmetry of HTTP's presentation of proxy servers
    as "mega clients" to upstream servers / proxies.

    ICP is not currently processed / generated by client browsers.

  b. Eliminate messages for cache 'miss' events.

    A very significant percentage of all ICP messages exchanged in the
    field are cache "misses." [NLANR's field experience indicates that
    85-90% of all ICP transactions are "misses".]

    Because this protocol eliminates querying, miss messages no longer
    occur (the outcome of every forward is now either "cache hit" or
    "continue resolving upstream").

  c. Takes advantage of all HTTP work including options, cache-control,
  authentication, etc.

    HTTP already provides protocol options to perform functions such as
    proxy to proxy authentication, etc.  These functions don't have to
    be re-invented.

    Additionally, much of the new behavior in the HTTP 1.1 cache-control
    headers is not expressible in ICPv2.  Forwarding the entire HTTP
    request to the next upstream/neighboring proxy allows it to be
    privy to these options.

  d. Already implemented on the browser

    Eases compliance testing and demonstrates soundness of the protocol
    (in a limited case).


  e. Sorted requests between proxies = single logical cache

    Over time, assuming that URL requests are randomly routed (e.g.
    round robin DNS) to a set of peer ICP neighbors (e.g. on a LAN
    within an ISP's head-end), the contents of these neighboring
    caches will eventually become roughly identical.

    A deterministic hash-based routing scheme, however, provides for a
    single logical cache image across 'n' proxies instead of 'n'
    identical caches.

    ICP's peer to peer queries are replaced by intelligent request
    routing in the previous level of the hierarchy.
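
    The difference can be seen in a toy model.  The Python sketch
    below replays the same request stream under round-robin and under
    hashed routing and counts how many object copies end up stored.
    It is a deliberate simplification: it ignores ICP queries, object
    expiry, and cache eviction, and the proxy names, URLs, and hash
    function are placeholders.

```python
import hashlib

PROXIES = ["p0", "p1", "p2"]                       # placeholder names
URLS = ["http://example.com/%d" % i for i in range(4)]


def hashed(url, request_number):
    """Deterministic choice: depends only on the URL."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return digest[0] % len(PROXIES)


def round_robin(url, request_number):
    """URL-blind choice: depends only on arrival order."""
    return request_number % len(PROXIES)


def simulate(choose, requests=12):
    """Replay a request stream; return the set cached at each proxy."""
    caches = [set() for _ in PROXIES]
    for n in range(requests):
        url = URLS[n % len(URLS)]
        caches[choose(url, n)].add(url)   # a miss fills the chosen cache
    return caches


def stored_copies(caches):
    return sum(len(c) for c in caches)


if __name__ == "__main__":
    print("round robin:", stored_copies(simulate(round_robin)))  # 12
    print("hashed:     ", stored_copies(simulate(hashed)))       # 4
```

    With 4 URLs and 3 proxies, round-robin routing ends with every
    proxy holding every URL (12 stored copies), while hashed routing
    stores each URL exactly once (4 copies).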

  f. No new transport protocols

    The behavior of HTTP is already well understood by system
    administrators and passed through firewalls, etc.  By contrast,
    ICP is relatively unknown in the vast majority of intranets,
    which may affect speed of deployment.

    In general, the development and deployment of new wire protocols
    should be a carefully evaluated endeavor due to huge support
    costs and "entropy" effects on corporate networks.

6. Advantages of ICP v2 vs. script-based routing

  a. Exchange of messages over WAN

    ICP is sometimes used across very wide area links to perform
    cache look-ups.  An example of this might be peered top-level
    caches between two overseas ISPs.  The protocol proposed here is
    intended more for use by proxies that are in relative proximity to
    each other.

    One critical question is whether these transoceanic cache
    look-ups are worth their cost.  This is especially a concern given
    the opportunity to build larger caches within a traditional cache
    hierarchy.  Do large local caches "skim" most of the potential
    cache hits?  This question could be answered with some idea of the
    hit rate for ICP over WAN links between very large peer caches.

  b. Exchange of messages across peer administrative domains

    Correct implementation of the proxy configuration script is in part
    dependent on having a series of proxies within the same
    administrative domain which share their logical cache.

    Because ICP maintains a very loose relationship between neighbors,
    it is easier to implement across such domains.  However, once
    again, the question of whether anything more than 2 or 3 levels of
    cache look-ups is valuable becomes pertinent.  If not, then a 2-3
    level hierarchical array of proxies within corporations & ISPs
    might be sufficient for maximum cache hit rates.

  c. Binary protocol

    ICP is clearly faster and easier to parse than HTTP due to its
    binary nature.  However, the construction of efficient HTTP engines
    is already at a premium due to the wide deployment of the protocol.

  d. Connectionless transport

    ICP can be, and often is, transported over UDP, which is lighter
    weight than HTTP's TCP connection.  Many of the disadvantages of
    TCP may be mitigated by performance optimizations such as
    keep-alives and pipelining.

    Additionally, notice that in the case of a cache hit, ICP may
    require construction of a TCP connection to transport the requested
    object.

    Furthermore, the lack of congestion control on ICP messages is
    the obvious downside of connectionless transport.  In this scheme
    connections between proxy servers would almost certainly be HTTP
    Keep-Alive sessions.

  e. Failure case benefit.

    If, for some reason, the ICP cache that holds a URL is too slow to
    respond or is down, an alternate cache will be used to fulfill
    the request.  It is likely that this cache will cache the
    results.  At any later point in time, this cache will respond
    with a HIT message when queried about the URL.  This allows
    very busy URLs to be spread among multiple caches and stems from
    the non-deterministic nature of the protocol.

    In the hashing scheme, if a busy set of URLs is assigned to one
    cache via the hash, and that server is too slow or down, another
    cache will handle and cache that request.  Unfortunately, once the
    assigned server recovers, that cached copy is of no further use:
    clients and proxies will never go to the other proxy for the URL
    again, because it doesn't match the hash function.
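
    The stranded copy can be reproduced in a few lines.  The Python
    sketch below assumes a fallback order (hash owner first, then the
    other proxies in ring order) that the draft does not specify; the
    proxy names, URL, and function names are likewise hypothetical.

```python
import hashlib

PROXIES = ["p0", "p1", "p2"]          # placeholder names


def preference_list(url):
    """Hash owner first, then the other proxies in ring order."""
    i = hashlib.md5(url.encode("utf-8")).digest()[0] % len(PROXIES)
    return PROXIES[i:] + PROXIES[:i]


def fetch(url, caches, down=()):
    """Send url to the first live proxy; return (proxy, was_hit)."""
    for proxy in preference_list(url):
        if proxy in down:
            continue
        hit = url in caches[proxy]
        caches[proxy].add(url)        # a miss is filled from upstream
        return proxy, hit
    raise RuntimeError("no proxy available")


if __name__ == "__main__":
    caches = {p: set() for p in PROXIES}
    url = "http://busy.example.com/"
    owner, fallback = preference_list(url)[:2]
    print(fetch(url, caches, down={owner}))  # fallback serves a miss
    print(fetch(url, caches))                # owner is back: miss again
    print(url in caches[fallback])           # True: copy is stranded
```

    After the owner returns, requests go to it again and miss, while
    the copy held by the fallback proxy is never consulted.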

  f. Server distance determination

    In the field, a secondary benefit of ICP has been use of its
    UDP round-trip times as a means of gauging relative distance
    between peer caches.  Because hash-based routing relies on TCP
    and implies hierarchies known a priori, this feature of ICP
    isn't realized.

  g. Current installed base

    ICP currently has an installed base of ~3000 proxies.

7. Open Issues

  As specified via Proxy Client Configuration files, there are
  two primary open issues associated with this protocol:

  1)  Standardization of the Proxy-client configuration file.

    Currently, this protocol is only a de facto standard and has not
    been formally accepted / endorsed by the IETF.

  2)  Performance of script evaluation on proxy servers.

    There are potentially significant issues with evaluating proxy
    configuration scripts per URL processed by a proxy server.
    Requiring an interpreter for JavaScript [1] may be outside of
    the bounds of the working group.

    Additionally, performance of the script + script interpreter may
    be a significant cost for proxy servers which need to handle high
    transaction volumes.


8. Acknowledgements

  The authors would like to thank Brian Smith, Kip Compton, Ari
  Luotonen, and Kerry Schwartz for their assistance in preparing
  this document.

9. References

  [1] Luotonen, A., "Navigator Proxy Auto-Config File Format",
      Netscape Corporation,
      http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html,
      March 1996.

  [2] Microsoft Corporation, "Automatic Proxy Configuration",
      http://www.microsoft.com/ie/ieak/autosys.htm, March 21, 1997.

  [3] Wessels, D., "Internet Cache Protocol Version 2",
      http://ds.internic.net/internet-drafts/draft-wessels-icp-v2-00.txt,
      March 21, 1997.

  [4] Sharp Corporation, "Super Proxy Script",
      http://naragw.sharp.co.jp/sps/, August 9, 1996.

  [5] Fielding, R., et al., "Hypertext Transfer Protocol -- HTTP/1.1",
      RFC 2068, UC Irvine, January 1997.


10.  Author Information

    Vinod Valloppillil
    Microsoft Corporation
    One Microsoft Way
    Redmond, WA 98052

    Phone:  1.206.703.3460
    Email:  VinodV@Microsoft.Com

    Josh Cohen
    Netscape Communications Corporation
    501 E. Middlefield Rd.
    Mountain View, CA 94043

    Phone: 1.415.937.4157
    Email: Josh@Netscape.Com




