Network Working Group                                       Ingrid Melve
INTERNET-DRAFT                                           Simon Wilkinson
draft-melve-cachecontrol-00.txt

Expires September 1998




          Access-restricted, HTTP/1.1 Cache Control Extension



Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

Abstract

   User agents which act on behalf of more than one user, such as caches
   and web indexers, are often given access to documents which are
   restricted by IP address or domain.  These agents then republish this
   information to users outside the allowed block, as there is currently
   no means of marking these objects with their access restrictions.

   This document details an extension to the Cache-control header in
   HTTP/1.1 [HTTP/1.1] to add information about IP or domain based
   access restrictions.  It also stresses that Cache-control should
   apply to all User-agents which work on behalf of a number of users,
   and not just to caches.


1. The rationale for header information about access restrictions




Melve, Wilkinson                                                [Page 1]


Access-restricted extension            March 1998


   Web caches and indexing robots are examples of user agents which do
   not act on behalf of one end user.  The problem of access control
   when sharing indexes or caches is not trivial for documents which
   have access control based on IP address or domain name, since there
   is no indication of access control being used for the particular
   document.  Web servers do not send any information about access
   control done by IP address.  If a user within the allowed IP address
   range requests the document, the document is stored in the cache and
   subsequent requests will be served from the cache.  This may cause
   documents to be served to users without access rights.

   Several popular web servers permit users to create their own access
   controls, as Apache does with local .htaccess files, so the local
   web master may not know about access restrictions.  The local cache
   master is even less likely to know about such restrictions.  The
   same problem arises for site licenses for information and software,
   if access control is implemented on the basis of IP addresses or
   domain names.  A sibling cache requests a document that is available
   at the cache server (this server has an IP address within the allowed
   range), and the document may then be handed out to the sibling.  The
   solution to this has been manual configuration and collection of
   information about such site licenses by the cache administrators.

2. Access restrictions

   A number of methods have been proposed for communicating some access
   control information to visiting User-Agents.  HTTP/1.1 provides the
   Cache-Control header which can indicate the "private" or "public"
   nature of a document, but provides no information as to the community
   that the information is private to.

   An example of why this causes problems is with site licenses for web
   information.  A server may be located in the United States and the
   users in Norway, yet using the "private" directive prevents any
   shared cache from caching the data.  Making such documents
   uncacheable is undesirable, as the latency is often too high to
   ensure a good service for the end users.

   Another method proposed for robots is the use of the robots.txt file.
   This method works on a centrally controlled server, where the
   maintainer of the robots.txt file is aware of all access restrictions
   in place, but breaks down on a server where any user may add access
   restrictions to their pages.

   Proposed Cache Control Extension
    Access-restricted="IP:"
    Access-restricted="Domain:"




   This header does not ensure the security of a document, but gives
   multi-user agents an opportunity to restrict access.  If an unknown
   realm is encountered, the indexing robot or cache should treat the
   document as restricted and not share information.
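
   As an illustration only, the following Python sketch shows how a
   cache or robot might pull the proposed extension out of a
   Cache-Control value and notice an unknown realm.  The function names,
   the regular expression and the case-insensitive treatment of realm
   names are assumptions of this sketch, not part of the proposal.

```python
import re

# One directive: a token, optionally followed by =token or ="quoted string",
# terminated by a comma or by the end of the header value.
_DIRECTIVE = re.compile(
    r'\s*([^=,\s"]+)\s*(?:=\s*(?:"([^"]*)"|([^,\s"]*)))?\s*(?:,|$)'
)

# The two realms described in this document (lower-cased by assumption).
KNOWN_REALMS = {"ip", "domain"}


def parse_cache_control(value):
    """Map each lower-cased directive name to its argument (or None)."""
    directives = {}
    pos = 0
    while pos < len(value):
        match = _DIRECTIVE.match(value, pos)
        if match is None:
            break  # malformed remainder: stop rather than guess
        quoted, bare = match.group(2), match.group(3)
        name = match.group(1).lower()
        directives[name] = quoted if quoted is not None else bare
        pos = match.end()
    return directives


def parse_access_restricted(argument):
    """Split 'IP:...,Domain:...' into (realm, value) pairs."""
    entries = []
    for item in argument.split(","):
        realm, _, val = item.strip().partition(":")
        entries.append((realm.lower(), val))
    return entries


def has_unknown_realm(entries):
    """True if any entry uses a realm this sketch does not know."""
    return any(realm not in KNOWN_REALMS for realm, _ in entries)
```

   A caller following the rule above would treat the document as
   restricted whenever has_unknown_realm() returns true.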

3. The Access-restricted extension

   HTTP/1.1 allows Cache-Control to be extended with new directives, and
   these extensions may act as modifiers to the base directives.

   We propose the addition of an "Access-restricted" extension which
   would be used with the "private" directive to give additional control
   of cache information.


   3.1 Using Access-restricted with IP address

   Access restriction by IP address is popular and may be locally
   configured by users for their own web pages, which puts it out of
   the web master's control.  In open shared communities, like
   universities, this may cause problems as restricted documents are
   indexed or cached.

   Information about which IP address ranges are allowed to access the
   document lets a cache or robot avoid passing it on to unauthorized
   users.

   The Access-restricted extension is followed by a comma separated list
   of IP ranges for which access to the document is permitted.

   Example
    Cache-Control: private, Access-restricted="IP:158.38.60.0/24"


   3.2 Using Access-restricted with domains

   An access restriction by domain should be interpreted as granting
   access to every FQDN in the domain and in all subdomains of the
   domain name.  Domain names are restricted from left to right.

   Example
    Cache-Control: private, Access-restricted="Domain:*.tjener.uninett.no"




   This restricts access to all hosts in the tjener.uninett.no subdomain.

   Example
    Cache-Control: private, Access-restricted="Domain:nurket.uninett.no"

   This restricts access to the host nurket.uninett.no, and no other
   hosts.
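
   The two examples suggest a matching rule that can be sketched as
   below.  The function name domain_allowed is hypothetical, and two
   points are assumptions rather than statements of the proposal: that
   comparison is case-insensitive, and that "*.tjener.uninett.no" does
   not match the bare name tjener.uninett.no itself.

```python
def domain_allowed(hostname, pattern):
    """Match a client host name against one Domain: value."""
    host = hostname.lower().rstrip(".")
    pat = pattern.lower().rstrip(".")
    if pat.startswith("*."):
        # "*.example.org" matches any name ending in ".example.org";
        # keeping the leading dot stops "badexample.org" from matching.
        return host.endswith(pat[1:])
    # Without a wildcard the value names exactly one host.
    return host == pat


print(domain_allowed("www.tjener.uninett.no", "*.tjener.uninett.no"))
print(domain_allowed("other.uninett.no", "nurket.uninett.no"))
```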

   3.3 Comma separated lists

   Access restrictions may be combined by using a comma separated list.

   Example
    Cache-Control: private,
              Access-restricted="IP:158.38.60.0/24,Domain:*.dcs.ed.ac.uk"

   This restricts access to all hosts in the IP address range
   158.38.60.* as well as hosts in the subdomain dcs.ed.ac.uk (the
   example is broken across lines to fit in the text).
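
   One possible reading of a combined list is that a client is
   acceptable if it matches any entry, and that a single unknown realm
   makes the whole document restricted.  The sketch below follows that
   reading.  The function name may_serve is hypothetical, and the
   client host name is assumed to have been obtained by the cache in
   some way (for instance a reverse DNS lookup) that is outside the
   scope of this example.

```python
import ipaddress


def may_serve(argument, client_ip, client_host):
    """Decide whether a shared cache may hand a stored object to a client.

    argument is the quoted value of Access-restricted, for example
    'IP:158.38.60.0/24,Domain:*.dcs.ed.ac.uk'.
    """
    addr = ipaddress.ip_address(client_ip)
    host = client_host.lower().rstrip(".")
    matched = False
    for item in argument.split(","):
        realm, _, value = item.strip().partition(":")
        realm = realm.lower()
        if realm == "ip":
            if addr in ipaddress.ip_network(value, strict=False):
                matched = True
        elif realm == "domain":
            pattern = value.lower()
            if pattern.startswith("*."):
                if host.endswith(pattern[1:]):
                    matched = True
            elif host == pattern:
                matched = True
        else:
            # Unknown realm: treat the document as restricted.
            return False
    return matched


restriction = "IP:158.38.60.0/24,Domain:*.dcs.ed.ac.uk"
print(may_serve(restriction, "158.38.60.9", "host.example.org"))
print(may_serve(restriction, "192.0.2.1", "www.example.org"))
```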

4. Security considerations

   This proposal enhances the security of access restricted web objects,
   as it stops today's practice of accidental sharing.

   Information about access restrictions should only be handed out with
   the web objects, to prevent users without access from getting
   information about these restricted web objects.

   Some servers may have restrictions which are time or load-dependent,
   and expressing those can be a problem (e.g. a server intended as an
   EU mirror of U.S. data may refuse or redirect U.S. requests, unless
   its load is below some set point).

   Releasing the information on restrictions may provide an opportunity
   for someone to follow up with an IP or domain-spoofed request for the
   data.  The proposal is to give access restriction information only to
   hosts which are not restricted; this reduces the problem.
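
   On the origin server, that rule might look like the following
   sketch.  The function name cache_control_for and the convention of
   returning None for a refused request are assumptions made for
   illustration; a real server would apply its own access control
   before building any response headers.

```python
import ipaddress


def cache_control_for(client_ip, allowed_range):
    """Return the Cache-Control value for a permitted client, else None."""
    addr = ipaddress.ip_address(client_ip)
    if addr not in ipaddress.ip_network(allowed_range, strict=False):
        # The request is refused, so nothing about the restriction
        # is revealed to this client.
        return None
    return 'private, Access-restricted="IP:%s"' % allowed_range


print(cache_control_for("158.38.60.5", "158.38.60.0/24"))
print(cache_control_for("192.0.2.1", "158.38.60.0/24"))
```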

   Content providers must bear in mind that there is no guarantee of a
   particular user agent honouring either the Cache-Control header or
   the Access-restricted extension.  Alternative measures should be
   taken if document confidentiality is important.

5. References

   [HTTP/1.1] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., and
     Berners-Lee, T., "Hypertext Transfer Protocol -- HTTP/1.1",
     RFC 2068, January 1997.


6.  Authors' Addresses

   Ingrid Melve
   UNINETT
   Tempeveien 22, Trondheim, NORWAY
   Phone: +47 73 55 79 07
   Email: Ingrid.Melve@uninett.no


   Simon Wilkinson
   Department of Computer Science, University of Edinburgh
   Kings Buildings
   Mayfield Road, Edinburgh
   Scotland, UK
   Email: sxw@dcs.ed.ac.uk