INTERNET-DRAFT                                        FRANCE TELECOM
February 18, 1999                                     Cedric Goutard,
Expires: July 18, 1999                                   Ivan Lovric,
draft-lovric-francetelecom-satellites-00.txt   Eric Maschio-Esposito




                Pre-filling a cache - A satellite overview


Status of this Memo

This document is an Internet-Draft and is in full conformance with
all the provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress".

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt


The list of Internet-Drafts Shadow Directories can be accessed at
http://www.ietf.org/shadow.html .


Abstract

Today, satellites are becoming a major vector for information
diffusion on the Internet. They are well suited to cache pre-filling
because they allow large volumes of data to be transferred at high
speed (up to 45 Mb/s) and distributed simultaneously to several
reception dishes. Once this content has been pre-filled into the
cache, users benefit from shorter access times to the stored pages.
In this context, the satellite improves the quality of service for
the end user by optimizing satellite links and by transferring large
volumes of data only when the traffic on the network is low.








France Telecom                Expires: July 1999              [Page 1]


                           Internet-Draft              February 1999


Table of contents

I  Introduction

II Experiments

  II-1 Proxy-satellite experimentation

  II-2 Caches pre-filling experiments over satellite

    II-2.1 Technical solution of cache pre-filling by co-operation

      II-2.1.1 Choice of the ICP co-operation mode

      II-2.1.2 Impact of a second cache in the architecture

      II-2.1.3 Collecting the test URLs

      II-2.1.4 FTP transfer over satellite

      II-2.1.5 Results of caches pre-filling by co-operation

    II-2.2 Technical solution of pre-filling by HTTP redirection

      II-2.2.1 Principle

      II-2.2.2 Experiments of pre-filling by redirection

III Technical description of pre-filling methods

  III-1 Technical description of the pre-filling with Wcol

    III-1.1 Description of the files structure hierarchy with Wcol

    III-1.2 Complementary files generated by Wcol

      III-1.2.1 Creation of the INFO file

      III-1.2.2 Creation of the HEAD file

    III-1.3 Complete pre-filling process with Wcol

  III-2 Technical description of the pre-filling with SQUID

    III-2.1 Description of files structure hierarchy with SQUID

      III-2.1.1 Description of the LOG file

      III-2.1.2 Localization of the storage path

    III-2.2 Content of a file stored by SQUID

    III-2.3 LOG file generation

    III-2.4 Full pre-filling process with SQUID

    III-2.5 Difficulties encountered with dates

  III-3 Description file of URLS for the pre-filling with Wcol
        and SQUID

IV Advantages highlighted by these experiments

V Next experiments

  V-1 Multicast diffusion toward several remote caches

  V-2 Automatic feeding of the HTTP server

  V-3 Diffusion services toward communities of interest

  V-4 Pre-filling using ICP extensions

VI Partnership France Telecom - EUTELSAT

VII References

VIII Acknowledgments

IX Authors' addresses

I Introduction

To meet France Telecom's internal needs for information broadcast
technologies on the Internet, we chose to define services on top of a
satellite diffusion infrastructure. Thanks to the intrinsic broadcast
nature of satellites, this is the most practical way to develop this
kind of service. Multicast technologies, which are designed for
diffusion, have not managed to establish themselves on the market, and
their deployment costs (i.e. the modification of all the routers to
work in multicast mode) are too high. The satellite is becoming an
unavoidable communication mode for the Internet community. Thanks to
satellite, it is possible to increase the bandwidth without investing
huge amounts of money in connectivity. Transmission speed and
information diffusion are critical issues that all ISPs are trying to
solve, so far without finding an effective and inexpensive means.

We began with the observation that Web traffic is asymmetrical: the
amount of information returned by Web servers is much higher than the
amount generated by client requests. So we studied how to deploy a
unidirectional high-speed access over satellite (for the return path
only) for a company Intranet, in order to decrease the traffic on the
leased connections. This Intranet is built around a LAN which is
connected through frame relay, ISDN or specialized connections that
have an Internet access point.



II Experiments

II-1 Proxy-Satellite experimentation

During this experimentation on a proxy-satellite, we simulated the
needs of a company that has to broadcast bulky documents on its
Intranet (to several offices in several areas). We supposed that this
company only has a limited bandwidth (64 kb/s to 256 kb/s) and does
not want to change its network infrastructure (routers, etc.). With
this configuration, the traffic generated by the broadcast of
documents rapidly saturates the local network, and the other
applications no longer have enough bandwidth to run correctly. All
the client/server applications that regularly use a part of the
network resources are blocked.
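
As a rough illustration of this saturation problem, the sketch below
computes how long a single document occupies a link at the rates
quoted in this document. The 10-megabyte document size is an
assumption chosen for illustration only, and protocol overhead is
ignored.

```python
def transfer_seconds(size_bytes, rate_bits_per_s):
    """Ideal time to send size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / rate_bits_per_s


# Assumed document size (10 MB); this figure is not taken from the experiment.
DOCUMENT_BYTES = 10_000_000

# Link rates quoted in this document.
LINKS = {
    "terrestrial, 64 kb/s": 64_000,
    "terrestrial, 256 kb/s": 256_000,
    "satellite, 45 Mb/s": 45_000_000,
}

for name, rate in LINKS.items():
    print(f"{name}: {transfer_seconds(DOCUMENT_BYTES, rate):.1f} s")
```

Under these assumptions, one such document would monopolize a 64 kb/s
link for about 21 minutes, which is consistent with the blocking
effect described above.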

So, we modified the company's Internet/Intranet access architecture
by installing a proxy on the local network (taking into account the
existing parameters), at the reception point of the satellite. As it
is not possible to build differentiated services for the use of the
Intranet and for the use of the Internet, this proxy concentrates all
the Intranet/Internet services (HTTP, FTP, NNTP). The default routing
parameters then direct the requests (clients to servers) toward the
terrestrial network and allow the responses (servers to clients) to
be received over the satellite link.
On the client side, this solution only requires the browsers to use
an automatic configuration utility (e.g. a proxy.pac file) so that
they access the Intranet/Internet services through the proxy.
We also modified the routing table of the last router to redirect all
the packets whose destination address corresponds to the proxy towards
the satellite uplink site.


The diagram below shows the architecture defined for our experiment.

                            ___________
                 ////////  /           \  ////////
                //////////X  Satellite  X//////////
                 ////////  \___________/  ////////

                  .                            .
                .                                .
         \\   .                                    .  //
        \\  .                                        . //
        \\.                                           .//
       * \\                                          // *
      / \  \\                                      //  / \
     /___\                                            /___\
      ||                                Satellite board ||
 +------------------------+                        +---------+
 |       ISP              | /|                     |         |
 |  CACHE             R   |/ |     ISDN            |  PROXY  |
 |                    E   |  |=====================|         |
 +---------------+    M   |\ |  FRAME RELAY        +---------+
        ||       |    O   | \|                      ||
        ||       |    T   |                         ||
      +++++      |    E   |                LAN ======================
    ++     ++    |        |                     |             |
  +           +  | ACCESS |                  +--------+     +--------+
 +             + |        |                  | User 1 |     | User 2 |
 +  INTERNET   + +--------+                  +--------+     +--------+
  +           +
    ++     ++
      +++++

Although this solution already improves the response times, it is
insufficient because it does not use the broadcasting facilities of
the satellite. It remains a unicast communication tool in a
potentially multicast environment.


II-2 Caches pre-filling experiments over satellite

In order to optimize the efficiency of the satellite connections, we
undertook to pre-fill the content of the proxy server (client side).
The pre-filling is done in two main steps:
1. The first step consists in analyzing the logs of the proxy and
   determining a significant number of requested URLs.
2. The second step aims at refreshing the contents (ISP cache server)
   and preparing a pre-filling file which is broadcast over satellite
   (for example as a background task during the night, when the
   activity of the company is reduced). When the download is done, an
   application immediately installs the updated URLs on the local
   cache or on an HTTP server.
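
The first step (log analysis) can be sketched as follows. The log
layout is an assumption: the sketch expects the request to appear
quoted as in the Common Log Format, which may differ from the format
actually produced by the proxy used in the experiment. The function
name and the sample lines are illustrative.

```python
import re
from collections import Counter

# Assumed log layout: the request is quoted, as in the Common Log Format,
# e.g. ... "GET http://www.ft.fr/index.html HTTP/1.0" 200 2048
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def most_requested(log_lines, count):
    """Return the `count` most frequently requested URLs with their hits."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits.most_common(count)


sample_log = [
    'host1 - - [18/Feb/1999:10:00:00 +0100] '
    '"GET http://www.ft.fr/index.html HTTP/1.0" 200 2048',
    'host2 - - [18/Feb/1999:10:00:05 +0100] '
    '"GET http://www.ft.fr/ima1.gif HTTP/1.0" 200 512',
    'host1 - - [18/Feb/1999:10:01:00 +0100] '
    '"GET http://www.ft.fr/index.html HTTP/1.0" 200 2048',
]
print(most_requested(sample_log, 2))
```

The resulting list is the candidate set of URLs to refresh and
broadcast; how many URLs to keep is a tuning choice that this sketch
leaves to the caller.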


So there are two different methods to pre-fill the cache:
- Pre-filling by co-operation with another local cache.
- Pre-filling by redirection of the traffic towards a local HTTP
  server.

In both cases, the most frequently requested URLs are locally
available and are delivered directly. No backbone connection to the
original remote servers is needed.
As the update of the local cache or of the HTTP server over satellite
is a unidirectional mechanism, the bandwidth of the backbone is not
affected at all by the refreshing of contents.


II-2.1 Technical solution of pre-filling by co-operation

The Netscape Proxy Server 3.5 used for the experiment cannot easily
be pre-filled. Although it includes a development API, this API has
no functions to act on the cache content. In addition, the file
hierarchy generated on the proxy server is complex and the file names
are transcoded. So we decided to install a second cache. Because
their sources are publicly available, we chose the proxy caches Wcol
and Squid. Access to the source files allowed us to better understand
how these caches manage their content.

The optimization of the communications remains a major problem. So,
to avoid increasing the number of remote connections, we chose to
install the cache that we wanted to pre-fill on the local network, at
the reception point of the satellite. On this network, the
pre-filling technique is based on the ICP protocol.

                            ___________
                 ////////  /           \  ////////
                //////////X  Satellite  X//////////
                 ////////  \___________/  ////////
               .                           .
             .                               .
      \\   .                                   .  //
     \\  .                                       . //
     \\.                                          .//
    * \\                                         // *
   / \  \\                                     //  / \
  /___\                                           /___\
    ||                                             ||
 +------------------------+                   +------+      +-------+
 |       ISP              | /|                |      |      |  PRE  |
 |  CACHE             R   |/ |     ISDN       |PROXY | ICP  |FILLING|
 |                    E   |  |================|      |<---->| CACHE |
 +---------------+    M   |\ |  FRAME RELAY   +------+      +-------+
        ||       |    O   | \|                   ||            ||
      ++++++++   |    T   |                      ||            ||
     ++      ++  |    E   |                  ===================== LAN
    +          + |        |                     |             |
   +  INTERNET  +| ACCESS |                  +--------+     +--------+
   +            +|        |                  | User 1 |     | User 2 |
     ++++++++++  +--------+                  +--------+     +--------+


II-2.1.1 Choice of the ICP co-operation mode

The ICP protocol allows a hierarchy of co-operating caches to be
defined. Usually, it is natural to define a vertical hierarchy with
two or three levels of parents/children.

                                Internet
                                   ||
First Level                      Parent
                              /         \
                             /           \
Second Level              Child1       Child2
                          /    \       /    \
                         /      \     /      \
Third Level             C3      C4   C5      C6

We favoured the child/child relationship ("sibling" mode) for the
following two reasons:

1- The pre-filling operation requires stopping the cache, launching
   the pre-filling process and then restarting the cache. In
   operational mode, the Internet/Intranet services should not be
   stopped during updates, even temporarily. Moreover, the content
   to pre-fill can quickly become voluminous and can require a
   relatively long processing time. Further studies should allow
   these parameters to be improved.

2- A parent/child hierarchy obliges the child to systematically
   query its parent, whatever the URL it needs. The parent must
   fetch this URL from the Internet if it does not have it in its
   cache. In a child/child relationship, if the queried child does
   not have the document, it only replies with a MISS. In no case
   does the queried child connect to the network to get the document:
   it is the querying child that gets the document directly from the
   Internet.
   The initial way of working of the querying cache is therefore not
   modified.
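
The messages exchanged between siblings can be sketched as follows.
The header layout and opcode values are those of ICP version 2 as
specified in RFC 2186; the function names, the request number and the
zeroed addresses are illustrative assumptions, and this is not the
code used in the experiment.

```python
import socket
import struct

# Opcodes and 20-byte header layout of ICP version 2 (RFC 2186).
ICP_OP_QUERY, ICP_OP_HIT, ICP_OP_MISS = 1, 2, 3
ICP_VERSION = 2
# opcode, version, message length, request number, options,
# option data, sender host address
HEADER = struct.Struct("!BBHIII4s")
NO_ADDRESS = socket.inet_aton("0.0.0.0")  # illustrative placeholder


def build_query(request_number, url, requester_ip="0.0.0.0"):
    """Encode an ICP_OP_QUERY asking a sibling whether it holds `url`."""
    payload = socket.inet_aton(requester_ip) + url.encode("ascii") + b"\x00"
    header = HEADER.pack(ICP_OP_QUERY, ICP_VERSION,
                         HEADER.size + len(payload),
                         request_number, 0, 0, NO_ADDRESS)
    return header + payload


def build_reply(query, hit):
    """Encode the sibling's answer: HIT if it holds the object, else MISS."""
    request_number = HEADER.unpack_from(query)[3]
    url_bytes = query[HEADER.size + 4:]  # skip the requester address
    opcode = ICP_OP_HIT if hit else ICP_OP_MISS
    header = HEADER.pack(opcode, ICP_VERSION,
                         HEADER.size + len(url_bytes),
                         request_number, 0, 0, NO_ADDRESS)
    return header + url_bytes


def parse_opcode(message):
    """Return the opcode carried in the first byte of an ICP message."""
    return message[0]


query = build_query(1, "http://www.ft.fr/index.html")
print(parse_opcode(build_reply(query, hit=False)) == ICP_OP_MISS)
```

A sibling that answers MISS sends nothing more than this short
datagram, which is why the querying cache keeps its usual behaviour
and fetches the document itself.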


II-2.1.2 Impact of a second cache in the architecture

The pre-filled cache complements the main proxy-cache. The two
diagrams below explain the modification introduced by the pre-filled
cache in the architecture.

Sibling mode, chosen for the reasons given above, generates little
traffic between the querying and the replying caches.

A - THE URL is contained in the pre-filled cache.

                                    Query (ICP)
              +--------------+      2               +--------------+
              |              |   --------------- >  |     WCOLD    |
              | PROXY SERVER |                      |       or     |
              |   NETSCAPE   |   < ---------------  |     SQUID    |
              +--------------+      3 HIT           +--------------+
                 / \    |
                  |     |
                1 |     | 4
                  |    \ /
              +--------------+
              |              |
              |    CLIENT    |
              |              |
              +--------------+

The pre-filled cache contains the requested URL: it replies HIT and
subsequently returns the URL. As we are on the local network of the
company, transfer times are almost immediate.

B - THE URL is not contained in caches


          Direct to the main server

                 / \    |
                  |     |
                4 |     | 5
                  |    \ /         Query (ICP)
              +--------------+      2               +--------------+
              |              |   --------------- >  |     WCOLD    |
              | PROXY SERVER |                      |       or     |
              |   NETSCAPE   |   < ---------------  |     SQUID    |
              +--------------+      3 MISS          +--------------+
                 / \    |
                  |     |
                1 |     | 6
                  |    \ /
              +--------------+
              |              |
              |    CLIENT    |
              |              |
              +--------------+

The pre-filled cache does not contain the requested URL. The proxy
server, upon receiving a MISS reply, contacts the original server to
get the URL. Although the answer is negative, the cache response time
is negligible compared to the connection time to the remote original
server.
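
The two cases A and B can be summarized by the following sketch of
the decision taken by the main proxy. The dictionary standing in for
the pre-filled cache and the fetch_from_origin helper are
hypothetical; the real exchange uses ICP messages between the two
caches.

```python
def resolve(url, sibling_cache, fetch_from_origin):
    """Serve `url` the way the main proxy does in sibling mode.

    sibling_cache: mapping URL -> document, standing in for the
        pre-filled cache (hypothetical stand-in).
    fetch_from_origin: callable used only after a MISS (hypothetical).
    Returns a pair (ICP answer, document).
    """
    if url in sibling_cache:
        # Case A, step 3: the sibling replies HIT and returns the URL.
        return "HIT", sibling_cache[url]
    # Case B, step 3: the sibling replies MISS and never fetches the
    # document itself; the querying proxy contacts the original server
    # (steps 4 and 5).
    return "MISS", fetch_from_origin(url)


def fake_origin(url):
    """Illustrative replacement for a request to the remote server."""
    return b"<html>fetched from origin</html>"


prefilled = {"http://www.ft.fr/index.html": b"<html>pre-filled</html>"}

print(resolve("http://www.ft.fr/index.html", prefilled, fake_origin)[0])
print(resolve("http://www.cnet.fr/intro.html", prefilled, fake_origin)[0])
```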



II-2.1.3 Collecting the test URLs

We used a software tool that enables us to download all or part of a
Web site. Files are recorded in their original formats (HTML, GIF,
JPG, etc.) while preserving the original path of the information. In
effect, this tool replicates a part of the downloaded site.
A ZIP compression tool is used to optimize the transfer times.

Remark:
We noticed that it was not easy to predict the right depth for the
download. It is necessary to optimize the pre-filled contents to
avoid downloading documents that would be of very little interest.
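
The packaging of a replicated site into a ZIP file can be sketched as
follows. This is not the tool used in the experiment: the function
name and directory layout are assumptions, and the archive is expected
to be written outside the directory being packed.

```python
import os
import zipfile


def pack_site(root_dir, archive_path):
    """Zip every file under root_dir, keeping paths relative to root_dir.

    archive_path should be located outside root_dir so that the archive
    does not include itself. Returns the list of stored names.
    """
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _subdirs, files in os.walk(root_dir):
            for filename in sorted(files):
                full_path = os.path.join(folder, filename)
                relative = os.path.relpath(full_path, root_dir)
                # The original path of each document is preserved.
                archive.write(full_path, arcname=relative)
                names.append(relative.replace(os.sep, "/"))
    return names
```

Keeping relative paths in the archive means that the receiving side
can unpack the file and recover the same hierarchy as the replicated
site.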


II-2.1.4 FTP transfer over satellite

For the purposes of feasibility and demonstration, we used an FTP
server available on our experimental Internet platform. We placed a
zipped file on the platform and launched the FTP download from the
server containing the satellite board and the Netscape proxy.


II-2.1.5 Results of caches pre-filling by co-operation

The progressive integration of the different elements of the
experimentation (the satellite, then the proxy-satellite couple, then
the proxy pre-filling) shows a constant improvement in access time to
documents. Every element of the process contributes to reducing
download times. This is all the more remarkable when tests are carried
out on video sequences or on Web sites containing a large number of
high-definition pictures. When used in unicast mode, the FTP transfer
is more rapid. Background updates of pre-filled content considerably
increase the quality of service and limit the remote connection load.


II-2.2 Technical solution of pre-filling by HTTP redirection

This part describes an alternative to the pre-filling of a cache by
co-operation. This solution also allows the URLs to be pre-fetched
over satellite in order to improve the quality of service.


II-2.2.1 Principle

Being a cache, the proxy is able to filter the requests it receives.
From this capability, one can deduce that it must also be able to
redirect those requests toward an HTTP server of our choice.

The following diagram presents the functional principle of the
pre-filling by redirection that we experimented with:


     +++++                          +------+      +---------+
   ++     ++      /|                |      |      |  HTTP   |
 +           +   / |                |PROXY | HTTP | SERVER  |
+             + |  |================|      |<---->|         |
+  INTERNET   +  \ |                +------+      +---------+
 +           +    \|                   ||             ||
   ++     ++                           ||             ||
     +++++                             ||             ||
                                    ====================== LAN
                                        |             |
                                  +--------+     +--------+
                                  | User 1 |     | User 2 |
                                  +--------+     +--------+


We can analyze this process in two distinct parts:

1. The redirection of client requests by the cache toward
   an HTTP server.
2. The feeding of an HTTP server with up-to-date documents.

All HTTP requests from a client are parsed by the cache. If a filter
is applicable, the request is modified in order to be transmitted to
the local HTTP server; otherwise, the request is processed normally
by the cache. The modification of the request follows this principle:

Initial URL requested by the client to the cache:
http://remote_server/document.html

URL submitted by the cache to the HTTP server (the document is then
returned to the client):
http://local_server/remote_server/document.html

The principle of the pre-filling consists in applying filters to the
client requests so that the cache requests some documents directly
from the local HTTP server. The filters can use regular expressions
and can be, for example, the following:

  asked URL                 |  mapped URL
----------------------------------------------------------------------
http://www.ft.fr/           | http://server/www.ft.fr/index.html
http://www.ft.fr/ima1.gif   | http://server/www.ft.fr/ima1.gif
http://www.ft.fr/ima2.gif   | http://server/www.ft.fr/ima2.gif
http://www.cnet.fr/         | http://server/www.cnet.fr/
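
The mapping in the table above can be sketched as a small lookup
function. The rule tables reproduce the four example filters; the
function name and the split into per-document and per-site rules are
assumptions made for this sketch, which uses plain prefix matching
rather than the regular expressions and filter syntax of the proxy.

```python
# Rules for individual documents: asked URL -> mapped URL.
DOCUMENT_RULES = {
    "http://www.ft.fr/": "http://server/www.ft.fr/index.html",
    "http://www.ft.fr/ima1.gif": "http://server/www.ft.fr/ima1.gif",
    "http://www.ft.fr/ima2.gif": "http://server/www.ft.fr/ima2.gif",
}

# Rules for whole sites: asked URL prefix -> mapped URL prefix.
SITE_RULES = {
    "http://www.cnet.fr/": "http://server/www.cnet.fr/",
}


def map_url(url):
    """Return the local URL for a pre-filled document, or None when no
    filter applies and the cache must process the request normally."""
    if url in DOCUMENT_RULES:
        return DOCUMENT_RULES[url]
    for prefix, target in SITE_RULES.items():
        if url.startswith(prefix):
            return target + url[len(prefix):]
    return None


print(map_url("http://www.ft.fr/"))
print(map_url("http://www.cnet.fr/intro.html"))
print(map_url("http://www.ft.fr/other.html"))
```

The last call returns None because only selected documents of
www.ft.fr are redirected, whereas every URL of www.cnet.fr is mapped.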

These examples allow us to redirect either some particular documents
(those of the www.ft.fr site) or a whole site (www.cnet.fr).
Particular attention must be paid to the writing of filters to make
sure that only the documents to be pre-filled are taken into account.
These filters must be updated as soon as the content of the local
HTTP server is modified. Finally, the great advantage of this type of
redirection is that it is transparent for the client. The client
thinks he reaches the original Web server whereas, in fact, the
document he receives comes from another HTTP server, while the
address he requested remains unchanged (contrary to the redirection
defined in the HTTP protocol).

The local HTTP server is regularly fed with up-to-date documents; the
study of this file transfer will be the subject of a future
experiment.


II-2.2.2 Experiments of pre-filling by redirection

For this experimentation, we used the Netscape proxy-cache 3.52 on
Solaris 2.6. This solution was chosen because it enabled us to easily
create filtering and mapping rules just by modifying a configuration
file (obj.conf) and restarting the cache with the command-line
"restart" command.

The HTTP server that we used is Apache, but any other server would be
suitable for the experimentation. The two requirements for this
server, in our experimentation, are that it be easy to feed and very
efficient.

We developed a shell script that uses a list of URLs to create the
Netscape configuration file. This script creates an up-to-date
configuration file and then restarts the proxy-cache. The file
containing the URLs has the following form:

http://www.ft.fr/index.html
http://www.ft.fr/ima1.gif
http://www.ft.fr/ima2.gif
http://www.cnet.fr/intro.html
   ...

This script demonstrated the feasibility of a cache pre-filling
service based on a simple and effective principle of HTTP traffic
redirection. This kind of service can therefore be an efficient
alternative to the experiments previously described.

For the moment, this script needs to be launched manually once the
up-to-date URLs are downloaded and the filter file is created on the
targeted server.

III Technical description of pre-filling methods

This chapter describes the methods used to pre-fill the Wcol and
Squid caches. These methods were successfully implemented in the
technical solution of cache pre-filling by ICP co-operation described
previously in this document.




III-1 Technical description of the pre-filling with Wcol

Wcol (see [http://shika.aist-nara.ac.jp/products/wcol/wcol.html]) is a
cache which has particular pre-fetching functionalities, but these
capabilities were not used in our cache pre-filling studies.
For us, the first interest of Wcol is its capacity to support all or
part of the ICP protocol since the WcolD version (the following
version, WcolE, fully implements the ICPv2 protocol, whereas the WcolD
version only implements a small part of the ICP messages, which is
however sufficient for our experiments). The second interest of Wcol
is the simplicity of the hierarchy of the Web pages stored in the
cache.


III-1.1 Description of the files structure hierarchy with Wcol

Under a main directory "http", corresponding to the protocol, a
hash-coding key allows a first selection of the URLs and constitutes
a first directory level. The URLs are then stored within the hash
directory, whose name has the format "hxxx" (xxx represents a number
between 000 and 999), under a directory level corresponding to the
server name, then the HTTP port, and then the different directory
names of the URL path, stored hierarchically. This storage mode is
very similar to the one used in Web servers, except for the
hash-coding level. So, the internal storage hierarchy is easy to
recreate, and that is the reason why Wcol was an interesting solution
for the cache pre-filling experiment.

Example:
If the internal storage directory is /home/cache/ (set with the
CacheDir keyword in the configuration file of Wcol), the URL
http://sample/Welcome.html stored in the cache has the following
path:
/home/cache/http/h001/sample/80/Welcome.html
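
The construction of this path can be sketched as follows. The hash
function that selects the "hxxx" directory is internal to Wcol and is
not described here, so the sketch takes the hash value as a parameter;
the function and parameter names are assumptions.

```python
import posixpath
from urllib.parse import urlsplit


def wcol_path(cache_dir, url, bucket):
    """Build the storage path of `url` inside a Wcol cache directory.

    bucket: hash value (0 to 999) selecting the "hxxx" directory. Wcol
        computes it internally; here the caller supplies it (assumption).
    """
    parts = urlsplit(url)
    port = parts.port or 80  # default HTTP port when none is given
    return posixpath.join(cache_dir, parts.scheme, f"h{bucket:03d}",
                          parts.hostname, str(port),
                          parts.path.lstrip("/"))


print(wcol_path("/home/cache/", "http://sample/Welcome.html", bucket=1))
# prints /home/cache/http/h001/sample/80/Welcome.html
```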


III-1.2 Complementary files generated by Wcol

When a URL is stored within Wcol (for example Welcome.html), the cache
completes the stored URL with an information file with the ",info"
extension (e.g. Welcome.html,info), which contains information related
to the stored URL for internal use.
This information file holds attributes such as the number of times
that the document has been accessed, the last modification date, the
creation date, etc.
For every stored URL, there is also a header file with the ",head"
extension (e.g. Welcome.html,head). This file contains the HTTP header
and all related information. If the information file or the header
file is missing, Wcol does not consider the URL as valid even though
it is stored at the right path. Therefore, for a URL to be correctly
pre-loaded in the cache, it is essential to create the "HEAD" and
"INFO" files.


In our experimentation, in order to pre-fill the cache, it was
therefore necessary to reproduce the internal mechanism that Wcol
uses to create the HEAD and INFO files.

Remark:
Once the INFO and HEAD files are created and the URL is stored at the
right place in the storage space of Wcol, the file is validated by
the cache even though the information in the HEAD and INFO files is
partial.


III-1.2.1 Creation of the INFO file

The creation of the information file is difficult and requires calls
to specific routines of Wcol stored in the modules "base.c" and
"info.c".
The routine named "AssignFileName", stored in "base.c", gives, for a
given URL, its exact location in the internal storage space of the
cache.
The "NewInfo" and "SaveInfo" routines of the "info.c" module allow
the INFO file corresponding to a specific URL to be created
automatically. Although many attributes are not initialized in the
INFO structure created by a call to these routines, we noticed that a
restricted information file created in this way is sufficient for
the URL to be recognized by Wcol as valid, provided that at least the
fields "attr.name", "attr.state", and "attr.last" of the INFO
structure are correctly initialized.


III-1.2.2 Creation of the HEAD file

For a pre-filled URL, a HEAD file must always be created for the URL
to be recognized by the cache.
In fact, it is sufficient to create a short HEAD file that contains
only the following information:

HTTP/1.1 200 OK
Content-type: -the MIME type corresponding to the URL-
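
A minimal sketch of the creation of such a HEAD file is given below.
The function names, the use of the Python mimetypes module to guess
the MIME type, the fallback type and the line endings are assumptions;
only the two header lines and the ",head" extension come from the
description above.

```python
import mimetypes


def minimal_head(document_name):
    """Return the text of a minimal HEAD file for a pre-filled document."""
    mime_type, _encoding = mimetypes.guess_type(document_name)
    if mime_type is None:
        # Fallback chosen for this sketch when the type cannot be guessed.
        mime_type = "application/octet-stream"
    return f"HTTP/1.1 200 OK\nContent-type: {mime_type}\n"


def write_head(document_path):
    """Create the ",head" companion file next to a stored document."""
    with open(document_path + ",head", "w", newline="") as head_file:
        head_file.write(minimal_head(document_path))


print(minimal_head("Welcome.html"), end="")
```

For Welcome.html, the guessed type is text/html, so the HEAD file
contains the status line followed by "Content-type: text/html".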

III-1.3 Complete pre-filling process with Wcol

Once this mechanism was understood and implemented in software
recreating the HEAD and INFO files, the next step consisted in
creating a tool that processes the whole description file, which
links each URL to preload to its physical location. The format of
this file is described in chapter III-3.
The tool creates the information and header files. It also moves the
URL to store from its initial physical location on the hard disk to
the right place in the storage space of Wcol. The tool executes this
process for each entry in the description file.
Once the description file is created, the files to include in the
cache must be stored, temporarily or not, at the places stated in the
description file. Then the tool described above just has to be
launched. Therefore, the pre-filling mechanism of Wcol that was
implemented for the experiment of cache pre-filling over satellite
contains these three elements:

- the process recreating the INFO and HEAD files
- the tool processing the description file of the URLs
- the description file itself

Remark:
For our experiments, we stopped Wcol before each pre-filling process
and restarted it afterwards, in order to simplify the complete
experiment and to prevent data held in memory by Wcol from
interfering with the preloaded data.


III-2 Technical description of the pre-filling with SQUID

As described previously, the Wcol cache has the advantage of storing
the information in a very simple way, very similar to the hierarchies
of files stored on Web servers. The disadvantage of this solution is
that Wcol is not sufficiently widespread compared to the main caches
found on the market (Netscape Proxy Server, SQUID, etc.). That is why
the second part of the experimentation consisted in studying the
possibility of pre-filling the content of a frequently used cache
which supports ICP and whose source files are publicly available. The
only one we found is SQUID. This well-known cache is also recognized
for its quality and its robustness under heavy load; experiments were
done with version 1.1.22 of SQUID (see [http://squid.nlanr.net/]).


III-2.1 Description of the files structure hierarchy with SQUID

With SQUID, the hierarchy of stored files is more complex than with
Wcol. In fact, the hierarchy of files stored by SQUID has nothing in
common with that of a Web server, because it was designed to optimize
file lookup by using hash-coding keys on two distinct levels of
hierarchy, whereas Wcol has only one hash-coding key level. Moreover,
files are not stored directly within the cache as is the case with
Wcol: they are transformed and renamed before storage.
The exact location of a file within SQUID is found through the
analysis of the "log" file, which contains the link between a URL and
the stored file.


III-2.1.1 Description of the LOG file

The information needed to locate a file is specified in the "log"
file, whose exact location is given in the configuration file of
SQUID (keyword: cache_dir).

In this LOG file, there is one line for each URL stored in the cache.
Each line contains the following fields:

- name of the file, written as 8 hexadecimal characters
- creation date
- expiration date
- last modification date
- length of the file
- URL corresponding to the file

Thus, it appears that, without this LOG file, it is not possible to
make the link between a URL and the corresponding stored file.

Therefore, it is necessary to generate this LOG file to be able to
pre-fill SQUID.
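
As an illustration of the line format listed above, the following
minimal Python sketch splits one LOG line into its six fields. It is
not taken from the SQUID sources. It assumes that the three dates are
hexadecimal numbers of seconds, that the length is written in
decimal, and that fields are separated by white space; the names
LogEntry and parse_log_line are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    file_number: int  # stored file name (8 hexadecimal characters)
    created: int      # creation date (assumed hexadecimal seconds)
    expires: int      # expiration date (assumed hexadecimal seconds)
    modified: int     # last modification date (same assumption)
    length: int       # length of the stored file (assumed decimal)
    url: str          # URL corresponding to the file


def parse_log_line(line: str) -> LogEntry:
    """Split one LOG line into its six fields, in the listed order."""
    name, created, expires, modified, length, url = line.split(None, 5)
    return LogEntry(
        file_number=int(name, 16),
        created=int(created, 16),
        expires=int(expires, 16),
        modified=int(modified, 16),
        length=int(length),
        url=url.strip(),
    )


# Illustrative line with six fields in the order described above.
entry = parse_log_line(
    "00000009 3a6c6c6c fffffffe 3a6c6c60 250 http://sample/Welcome.html"
)
```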


III-2.1.2 Localization of the storage path

The name of a file stored in a SQUID cache, after transformation, is
coded as 8 hexadecimal digits. This name alone is not sufficient to
determine where the file is physically stored in the cache: the
values set in the configuration file (keywords: swap_level1_dirs and
swap_level2_dirs) are also needed to calculate the final file path.
The physical storage path on the disk is then generated using the
following formulas:

- first level of directory = name of file % swap_level1_dirs

- second level of directory = (name of file / swap_level1_dirs) %
                              swap_level2_dirs

In these formulas "/" is an integer division and "%" is the remainder.

These formulas come from the function "storeSwapFullPath" defined in
the "store.c" module.
By applying these formulas, it is then possible to calculate the final
location on the hard drive from the Squid filename.

For example:
the file 00000001 is stored at /Squid/cache/01/00/00000001 if keyword
cache_dir is equal to /Squid/cache in the configuration file of SQUID.
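
The two formulas can be sketched in Python as follows. This is only
an illustration, not the SQUID code itself. The function name
squid_storage_path is hypothetical, the default values 16 and 256 for
swap_level1_dirs and swap_level2_dirs are assumed example settings,
and the output format (two hexadecimal digits per directory level,
eight for the file name) is an assumption chosen to match the example
above.

```python
def squid_storage_path(file_number: int, cache_dir: str,
                       swap_level1_dirs: int = 16,
                       swap_level2_dirs: int = 256) -> str:
    """Apply the two formulas to an integer SQUID file name."""
    # First level of directory: remainder by swap_level1_dirs.
    level1 = file_number % swap_level1_dirs
    # Second level: integer quotient, then remainder.
    level2 = (file_number // swap_level1_dirs) % swap_level2_dirs
    # Assumed layout: cache_dir/L1/L2/name, all in hexadecimal.
    return "%s/%02X/%02X/%08X" % (cache_dir, level1, level2,
                                  file_number)


# File 00000001 with cache_dir set to /Squid/cache.
path = squid_storage_path(0x00000001, "/Squid/cache")
```

With these assumed settings, the call reproduces the path given in the
example, /Squid/cache/01/00/00000001.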


III-2.2 Content of a file stored in SQUID

Once the physical storage place in the cache is known, it is
necessary to create the file to store. This file is based on the URL
and includes the supplementary information that SQUID requires in
order to consider the stored file as valid. This complementary
information must be stored at the beginning of the file, followed by
the full content of the URL. The complementary information to add at
the beginning of the file is:

HTTP/1.1 200 OK
Content-type: -the MIME type of the stored object-


It is interesting to see that this complementary information is
precisely the same as that required in the HEAD file of Wcol. Other
information can be added, such as the information that appears in the
property menus of web browsers, but the main characteristics
described above are sufficient for the stored files to be considered
valid.
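
A minimal Python sketch of the construction of such a file is given
here for illustration. The function name build_stored_object is
hypothetical. The sketch assumes that header lines end with CR LF and
that an empty line separates the header from the content, as in an
HTTP response; the text above does not state these details.

```python
def build_stored_object(mime_type: str, body: bytes) -> bytes:
    """Prefix the object with the two header lines described above."""
    header = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-type: {mime_type}\r\n"
        "\r\n"  # assumed empty line between header and content
    ).encode("ascii")
    # The full content of the URL follows the header unchanged.
    return header + body


body = b"<html><body>Welcome</body></html>"
stored = build_stored_object("text/html", body)
# Size to record for this object: header size + object size.
stored_size = len(stored)
```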

III-2.3 LOG file generation

In order for the pre-filling to be correctly taken into account by
SQUID, it is still necessary to generate the line of the LOG file
corresponding to the URL that must be forced into the cache. In our
case, we found that each line must contain the following information:

- name of the stored file (8 hexadecimal digits), incremented by
  one for each new line.
- creation date.
- fffffffe for the expiration date (see paragraph III-2.5).
- modification date earlier than the creation date.
- size of the stored file in the cache, that is, size of the header
  information + size of the object to store.
- URL corresponding to the stored file.

Entry sample in the LOG file:

00000009 3a6c6c6c fffffffe 3a6c6c60 250 http://sample/Welcome.html
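
The generation of such a line can be sketched in Python as follows.
The function name format_log_line and the constant DEFAULT_EXPIRES
are hypothetical. The sketch assumes that the dates are written as 8
lowercase hexadecimal digits, that the size is written in decimal and
that fields are separated by a single space, which is what the sample
entry suggests.

```python
DEFAULT_EXPIRES = 0xFFFFFFFE  # expiration value used in the list above


def format_log_line(file_number: int, created: int, modified: int,
                    size: int, url: str,
                    expires: int = DEFAULT_EXPIRES) -> str:
    """Build one LOG line for a pre-filled object."""
    if modified >= created:
        # The modification date must be earlier than the creation date.
        raise ValueError("modification date must precede creation")
    return "%08x %08x %08x %08x %d %s" % (
        file_number, created, expires, modified, size, url)


line = format_log_line(
    file_number=9,
    created=0x3A6C6C6C,
    modified=0x3A6C6C60,
    size=250,
    url="http://sample/Welcome.html",
)
```

With these values the function returns the sample entry shown above.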


III-2.4 Full pre-filling process with SQUID

After the previous descriptions of the way SQUID stores URLs, the
work necessary to pre-fill the cache consisted in creating the
following three elements:
- a process that generates the file header and creates the file to
  store within SQUID from the URL to include; this process also
  generates the line of the LOG file corresponding to the URL and
  inserts the file to store into the cache.
- a process responsible for processing the description file of the
  URLs
- the description file itself, whose syntax is precisely the same as
  the one used for Wcol

The method used to pre-fill SQUID is very similar to the one used
for Wcol. The additional complexity of SQUID is due to the more
complex representation of a URL in its storage place.

Remark:
On each SQUID startup, the link between the URLs and the files stored
on the disk is recreated in memory in order to optimize lookup time.
Thus, for the experiment, it was necessary to stop SQUID before
every pre-filling process in order to prevent the data held in memory
from interfering with the pre-filled information.


III-2.5 Difficulties encountered with dates

A problem appeared concerning the expiration date of pre-filled
documents. We observed that SQUID considers a pre-filled document
outdated after only a few minutes, which means that supplementary
information about the creation and last modification dates must also
be added to the header at the beginning of the stored documents.
However, for the purposes of the experiment, it was sufficient to set
an expiration date in the LOG file other than fffffffe and later than
the creation date; SQUID then no longer considers the document
outdated until that date has passed.


III-3 Description file of URLS for the pre-filling with Wcol and SQUID

The description file has a very simple structure because only three
attributes are strictly necessary to pre-fill Wcol and SQUID
correctly.
The data required on each line of the file are the URL in its
normalized format (see [RFC1738]), the exact location (on the hard
drive or on the network) of the document that must be included in the
cache, and the MIME type of this document (see [RFC1341]).
The fields must be separated by a space. The description file
contains "n" lines for "n" URLs to include in the cache.

Example of description file:

http://sample/Welcome.html /home/sample/Welcome.html text/html
http://sample/Welcome.gif /home/sample/Welcome.gif image/gif
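
Reading such a description file can be sketched in Python as follows.
The names DescriptionEntry and parse_description are hypothetical, and
skipping empty lines is an assumption made for convenience; the format
itself only requires three space-separated fields per line.

```python
from typing import List, NamedTuple


class DescriptionEntry(NamedTuple):
    url: str        # URL in its normalized format
    location: str   # location of the document to include
    mime_type: str  # MIME type of the document


def parse_description(text: str) -> List[DescriptionEntry]:
    """Return one entry per non-empty line of the description file."""
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: empty lines are ignored
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(
                f"line {number}: expected 3 fields, got {len(fields)}")
        entries.append(DescriptionEntry(*fields))
    return entries


entries = parse_description(
    "http://sample/Welcome.html /home/sample/Welcome.html text/html\n"
    "http://sample/Welcome.gif /home/sample/Welcome.gif image/gif\n"
)
```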


IV - Advantages highlighted by these experiments

The cache pre-filling techniques implemented and described in this
document have proved feasible in real experiments. Reading the
publicly available SQUID and Wcol sources helped us considerably in
finding a pre-filling solution quickly. Moreover, these experiments
allowed complex co-operating cache architectures using ICP to be
validated across two different operating systems and two different
cache products. This shows the interoperability of these solutions
and also the advantages of the ICP protocol.

The use of a satellite link highlights the great potential of this
medium for transferring bulky content very quickly to a point as
close as possible to the end users. Information access times are
considerably improved. The combination of a satellite link and a
pre-filled cache avoids part of the problems caused by traffic
congestion. In spite of the 300 ms delay introduced by a GEO
satellite, the benefit of this connection becomes undeniable as soon
as the size of the requested document exceeds 5 KBytes. Some of our
tests involved large volumes of video and high-definition pictures.
In that case, playing a video animation while the transfer continues
in the background gives smooth video sequences, without the
interruptions that frequently occur on ISDN connections (64 Kb/s).
Moreover, the reliability of satellite links makes it possible to use
lightweight error correction protocols.

Smart anticipation of users' needs and careful update processing can
even give users the impression that their Internet bandwidth is equal
to the bandwidth of the local area network. Satellite is also a simple
way to provide powerful and very fast access in critical or poorly
served areas with no high-speed infrastructure.
Moreover, when an ISP upgrades its architecture by adding a new
cache server, these techniques can be used to pre-fill this cache and
initialize it with preset content chosen according to the interests
of the users. This saves time and makes the cache effective
immediately, whereas in practice a long initialization period is
otherwise necessary, during which the first users gain no benefit
from the cache.


V - Next Experiments

We will first improve the tools described above. Our next research
will then concern the diffusion of content: we aim to pre-fill
several caches simultaneously. For this purpose, we will use
satellite and multicast protocols, for example the MFTP protocol.
Moreover, we will study the use of satellite diffusion to pre-fill
caches with content specifically targeted at communities of
interest.


V-1 Multicast diffusion toward several remote caches

The next step consists of experimenting with pre-filling using a
real multicast transfer between the satellite up-link site and the
different reception sites.


V-2 Automatic feeding of the HTTP server

In the case of pre-filling by redirection, the experiment does not
yet include the feeding of the HTTP server with up-to-date
documents. We have also not yet implemented either an automatic file
transfer method or an automatic selection of the files to be loaded
onto the HTTP server.

Our next work will therefore consist of improving our script in
order to automate and optimize updates and to restart the cache as
soon as a new directory structure is written on the HTTP server.

The second stage will be the use of a file transfer method to the HTTP
server using a satellite link.


V-3 Diffusion services toward communities of interest

Using the architectures defined in the previous experiments, we will
work on the definition of a service that feeds up-to-date documents.
For that purpose we will use a file transfer method that we will
develop later. This service could rely on an analysis of the logs
collected on the caches and could decide, for example, that the next
satellite update of the HTTP server will only concern the most
popular URLs. In that case, feedback on the most used cached data
(obtained by analyzing the log files) will help make the pre-filling
more cost-effective and more relevant.

Identifying and linking the major points of interest extracted from
the analysis of the logs could enable us to create groups of interest
(which can be different on each site).
As all clients do not have the same points of interest, studies will
be conducted to optimize the transfer of content common to all
caches and then the transfer of personalized content for each
community of interest.

We will also work on dynamic pages.


V-4 Pre-filling using ICP extensions

Another kind of cache pre-filling, based on the ICP extensions, will
be implemented in a further experiment.
Indeed, the ICP extensions proposed in the referenced draft
[draft-lovric-icp-ext-01.txt] permit the content of any cache to be
pre-filled by means of push-caching messages.

A process that sends an ICP_OP_SET message with the ICP_FLAG_ALIAS
flag set to a targeted cache could force a URL into that cache. For
that purpose it uses a protocol like "file://..." to specify to the
cache the network path of the stored URL alias. The cache must then
fetch this alias within the delay set in the ICP_OP_SET message. It
is also possible to pre-fill a full list of URLs by sending an
ICP_OP_SET_TAB message with the ICP_FLAG_ALIAS flag set. In this case,
the alias contains the list of the URLs to pre-fill. Each URL must
also have an alias specifying its network path.

Example of list file
(see [draft-lovric-icp-ext-01.txt] for list file syntax)
to pre-fill the following URLs :  http://sample/Welcome.html
                                  http://sample/Welcome.gif
1,http
2,sample
3,80
4,/
5,I,Welcome.html,A,file://home/sample/Welcome.html
5,I,Welcome.gif,A,file://home/sample/Welcome.gif
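
For illustration only, the following Python sketch reproduces the
layout of the example above for URLs that share the same scheme, host,
port and directory. The layout is inferred solely from this example;
the actual list file syntax is defined in
[draft-lovric-icp-ext-01.txt], which may allow or require more than
what is shown here. The function name build_list_file is hypothetical,
and using port 80 when the URL gives none is an assumption.

```python
from urllib.parse import urlsplit


def build_list_file(items):
    """items: (url, alias) pairs sharing scheme, host, port and
    directory. Returns the list file text in the example's layout."""
    first = urlsplit(items[0][0])
    port = first.port or 80  # assumption: default port when absent
    directory = first.path.rsplit("/", 1)[0] + "/"
    prefix = (first.scheme, first.hostname, port, directory)
    lines = ["1,%s" % first.scheme, "2,%s" % first.hostname,
             "3,%d" % port, "4,%s" % directory]
    for url, alias in items:
        parts = urlsplit(url)
        head, name = parts.path.rsplit("/", 1)
        current = (parts.scheme, parts.hostname, parts.port or 80,
                   head + "/")
        if current != prefix:
            # Only the single-prefix case of the example is covered.
            raise ValueError("URL does not share the prefix: " + url)
        lines.append("5,I,%s,A,%s" % (name, alias))
    return "\n".join(lines) + "\n"


list_file = build_list_file([
    ("http://sample/Welcome.html", "file://home/sample/Welcome.html"),
    ("http://sample/Welcome.gif", "file://home/sample/Welcome.gif"),
])
```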


Note: ICP Extensions also permit compressed aliases to be pre-filled.


VI Partnership France Telecom - EUTELSAT

A partnership between France Telecom and EUTELSAT will focus on the
evaluation of the previously described solutions on a large-scale
multicast platform using satellites.
EUTELSAT will provide the up-link site and the satellite bandwidth,
and France Telecom will provide the cache pre-filling solutions.
New services for diffusing personalized content to different
communities of interest will also be evaluated on this platform.

The results of these evaluations will be published in a second draft,
which will be written by both partners.

VII References

[RFC1341]
Borenstein, N. and N. Freed, "MIME (Multipurpose Internet Mail
Extensions): Mechanisms for Specifying and Describing the Format of
Internet Message Bodies", RFC 1341, Bellcore, June 1992.

[RFC1738]
Berners-Lee, T., Masinter, L. and M. McCahill, "Uniform Resource
Locators (URL)", RFC 1738, CERN, Xerox PARC, University of Minnesota,
December 1994.

[RFC2186]
Wessels, D. and K. Claffy, "Internet Cache Protocol (ICP), version 2",
RFC 2186, National Laboratory for Applied Network Research/UCSD,
September 1997.

[draft-lovric-icp-ext-01.txt]
Lovric, I., "Internet Cache Protocol Extension", France Telecom,
October 1998.


VIII Acknowledgments

The authors wish to thank Sandrine CHELLES, Christophe NETILLARD,
Gilles GRATTARD, Betty PREHU and Sylvie LOVRIC for their help in
writing this document.


IX Authors' addresses

Cedric Goutard
France Telecom
Centre National des Etudes en Telecommunications
42, rue des Coutures BP 6243
14066 Caen Cedex
France
Phone: +33 2 31 75 91 49
Fax: +33 2 31 73 56 26
E-mail: cedric.goutard@cnet.francetelecom.fr

Ivan Lovric
France Telecom
Centre National des Etudes en Telecommunications
42, rue des Coutures BP 6243
14066 Caen Cedex
France
Phone: +33 2 31 75 91 25
Fax: +33 2 31 73 56 26
E-mail: ivan.lovric@cnet.francetelecom.fr

Eric Maschio-Esposito
France Telecom
Centre National des Etudes en Telecommunications
42, rue des Coutures BP 6243
14066 Caen Cedex
France
Phone: +33 2 31 75 91 63
Fax: +33 2 31 73 56 26
E-mail: eric.maschio-esposito@cnet.francetelecom.fr