INTERNET DRAFT                                     F. Giudici, A. Sappia
Category: Informational                       University of Genoa, Italy
February 22, 1997                                Expires August 22, 1997

             An Extension to the Web Robots Control Method
                      for supporting Mobile Agents

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress''.

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

1. Abstract

   The Web Robots Control Standard [1] is a method for administrators of
   sites on the World-Wide-Web to give instructions to visiting Web
   robots. This document describes an extension for supporting Robots
   based on Mobile Agents, in a way that is independent of the
   technology used for their actual implementation.

2. Introduction

   Web Robots are Web client programs that automatically traverse the
   World Wide Web by retrieving a document and recursively retrieving
   all documents that are referenced. Robots are used for maintenance,
   indexing and search purposes.

   ``Classic'' Robots perform their job from the host from which they

Giudici Sappia    draft-giudici-web-robots-cntrl-00.txt         [Page 1]

INTERNET DRAFT  An Extension to the Web Robots Control...   Feb 22, 1997

   have been launched; recent technologies offer the possibility of
   writing Robots that are able to physically move through the network
   and operate within the website that hosts the data being processed.

   Mobile Robots can lead to bandwidth and computational power savings,
   as well as to personalized search robots. A more detailed discussion
   of the pros and cons of Mobile Robots is beyond the scope of this
   document.

   Mobile Agents [5] are a technology that, among other things, allows
   the implementation of Mobile Robots. They follow a computational
   paradigm in which programs can ``migrate'' from host to host,
   preserving their current state.

   To migrate through the Internet, Mobile Agents have to transfer both
   their code and their internal data structures over the network. For
   this purpose, they need a communication protocol.

   To receive and execute a Mobile Agent, a host must be equipped with a
   suitable daemon that listens on a port for incoming requests.

   Given the protocol name and the port number on which the daemon is
   listening, addresses for Mobile Agent destinations can be written in
   the form of a URL [2] as follows:

     <protocol> :// <network address> : <port number>

   For instance, considering the Agent Transfer Protocol (ATP) [3] and
   given a fictional site www.fict.org, a valid address for dispatching
   a Mobile Agent could be


3. Specification

   To control the way Robots can access a WWW site, a method is
   currently in use [1]. Simply speaking, the method states that a special
   document, named /robots.txt and whose MIME type is text/plain, should
   be available at the root of the website. Referring to the previous
   example, the URL of this document would be


   /robots.txt contains a list of records that describe in detail which
   subtrees of the website are available for exploration by a given
   Robot and which are not. The format of these records is the following:

     <record name> ":" <record contents>

   A typical example follows:

     User-agent:  webcrawler
     Allow:       /
     Disallow:    /reserved

   The method specifications allow extensions to this structure, so new
   records can be added by just defining new tokens.
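
   The record format above can be read with very little code. The
   following Python fragment is a minimal sketch, not part of the
   method itself: the function name parse_records is hypothetical, and
   the handling of "#" comments and the case-insensitive treatment of
   record names are assumptions not stated in this document. Because
   unknown record names are simply returned to the caller, new record
   types need no change to this reader.

```python
def parse_records(text):
    """Split /robots.txt text into (name, contents) pairs, one per record line."""
    records = []
    for raw in text.splitlines():
        # Assumption: '#' starts a comment that runs to the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue  # blank line or not a record
        # Split on the first ':' only, so URLs in the contents stay intact.
        name, contents = line.split(":", 1)
        # Assumption: record names are compared case-insensitively.
        records.append((name.strip().lower(), contents.strip()))
    return records


sample = """\
User-agent:  webcrawler
Allow:       /
Disallow:    /reserved
"""
for name, contents in parse_records(sample):
    print(name, "->", contents)
```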

3.1. The Mobile-agent-server record

   To control dispatching of Mobile Robots, a new record type is defined
   with the following form (the formal syntax is described in the next
   section):

     Mobile-agent-server: <path> <url>

   These records associate a well-defined path on the website to the URL
   of a host that accepts Mobile Robots for exploring that path.

   More than one Mobile-agent-server line can be used; in this case,
   later lines always override earlier ones. Using multiple lines makes
   it possible to assign different subtrees to different Mobile Agent
   capable hosts, or to none at all. In the following example the
   website root (/) is not assigned to any host, while /dir1 and
   /dir1/dir2 are assigned to different targets:

     Mobile-agent-server: /          none
     Mobile-agent-server: /dir1      atp://www.fict.org:544
     Mobile-agent-server: /dir1/dir2 atp://www.fict.org:543

   This mechanism is independent of the protocol and the programming
   language used for implementing the Mobile Robot.
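
   As an illustration, the records of this section could be collected
   by a Robot as sketched below in Python. The function name
   parse_mobile_agent_servers is hypothetical, and silently skipping
   malformed lines is an assumption, since this document does not say
   how they should be handled. File order is preserved so that the
   override rule can be applied later.

```python
def parse_mobile_agent_servers(text):
    """Return the Mobile-agent-server records as (path, url_or_None), in file order."""
    rules = []
    for line in text.splitlines():
        name, sep, contents = line.partition(":")
        if not sep or name.strip().lower() != "mobile-agent-server":
            continue  # some other record type
        fields = contents.split()
        if len(fields) != 2:
            continue  # assumption: malformed lines are ignored
        path, target = fields
        # The literal token "none" means that no host serves this path.
        rules.append((path, None if target == "none" else target))
    return rules


example = """\
Mobile-agent-server: /          none
Mobile-agent-server: /dir1      atp://www.fict.org:544
Mobile-agent-server: /dir1/dir2 atp://www.fict.org:543
"""
for path, target in parse_mobile_agent_servers(example):
    print(path, "->", target)
```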

3.2. Formal Syntax

   This is a BNF-like description of the Mobile-agent-server record
   line, using the conventions of RFC 822 [4], except that "|" is used
   to designate alternatives. Briefly, literals are quoted with "",
   parentheses "(" and ")" are used to group elements, optional elements
   are enclosed in [brackets], and elements may be preceded with <n>* to
   designate n or more repetitions of the following element; n defaults
   to 0.

   The Mobile Robot extension defines a new record line as follows:

     mobileagentrec  = "Mobile-agent-server:" *space path
                       *space (simplified_url | "none")

     simplified_url  = scheme "://" net_loc
     scheme          = 1*( alpha | digit | "+" | "-" | "." )
     net_loc         =  *( pchar | ";" | "?" )

     space           = 1*(SP | HT)

   The simplified URL is a subcase of a URL as defined in RFC 1808 [2]
   and only designates a protocol, a network location and a port number.

   The syntax for "path" and the other symbols is defined in RFC 1808
   and is reproduced here for convenience:

     path        = fsegment *( "/" segment)
     fsegment    = 1*pchar
     segment     =  *pchar

     pchar       = uchar | ":" | "@" | "&" | "="
     uchar       = unreserved | escape
     unreserved  = alpha | digit | safe | extra

     escape      = "%" hex hex
     hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                           "a" | "b" | "c" | "d" | "e" | "f"

     alpha       = lowalpha | hialpha
     lowalpha    = "a" | "b" | "c" | "d" | "e" | "f" | "g" |
                   "h" | "i" | "j" | "k" | "l" | "m" | "n" |
                   "o" | "p" | "q" | "r" | "s" | "t" | "u" |
                   "v" | "w" | "x" | "y" | "z"
     hialpha     = "A" | "B" | "C" | "D" | "E" | "F" | "G" |
                   "H" | "I" | "J" | "K" | "L" | "M" | "N" |
                   "O" | "P" | "Q" | "R" | "S" | "T" | "U" |
                   "V" | "W" | "X" | "Y" | "Z"

     digit       = "0" | "1" | "2" | "3" | "4" | "5" | "6" |
                   "7" | "8" | "9"

     safe        = "$" | "-" | "_" | "." | "+"
     extra       = "!" | "*" | "'" | "(" | ")" | ","
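
   The productions for simplified_url can be transcribed almost
   mechanically into a regular expression. The Python fragment below is
   a sketch under that reading of the grammar; the names
   SIMPLIFIED_URL and is_valid_target are hypothetical. It only checks
   the target field of a record, that is, either the literal "none" or
   a simplified URL.

```python
import re

# Character classes transcribed from the productions above.
UNRESERVED = r"[A-Za-z0-9$\-_.+!*'(),]"        # alpha | digit | safe | extra
ESCAPE = r"%[0-9A-Fa-f]{2}"                    # "%" hex hex
PCHAR = rf"(?:{UNRESERVED}|{ESCAPE}|[:@&=])"   # uchar | ":" | "@" | "&" | "="
SCHEME = r"[A-Za-z0-9+\-.]+"                   # 1*( alpha | digit | "+" | "-" | "." )
NET_LOC = rf"(?:{PCHAR}|[;?])*"                # *( pchar | ";" | "?" )
SIMPLIFIED_URL = re.compile(rf"{SCHEME}://{NET_LOC}")


def is_valid_target(text):
    """True if text is the literal "none" or matches simplified_url."""
    return text == "none" or SIMPLIFIED_URL.fullmatch(text) is not None


print(is_valid_target("atp://www.fict.org:544"))
print(is_valid_target("atp://www.fict.org/agents"))
```

   Note that net_loc does not admit "/", so a target with a path
   component is rejected, while an empty net_loc (as in "atp://") is
   accepted because the production allows zero repetitions.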

4. Examples

   This section contains an example of how an extended /robots.txt may
   be used.

   Let us suppose that a fictional site has the following URLs:


   Let user1.fict.org and user2.fict.org be two hosts equipped for
   receiving Mobile Agents, for example by means of ATP.

   The /robots.txt file contains Mobile Agent directives as follows:

     Mobile-agent-server: /             atp://www.fict.org:8001
     Mobile-agent-server: /home/        none
     Mobile-agent-server: /home/user1/  atp://user1.fict.org:854
     Mobile-agent-server: /home/user2/  atp://user2.fict.org:831

   The following table shows whether Mobile Agents are supported for
   indexing a given document, and on which host:

     URL                                        HOST

     http://www.fict.org/index.html             atp://www.fict.org:8001
     http://www.fict.org/services/              atp://www.fict.org:8001
     http://www.fict.org/services/index.html    atp://www.fict.org:8001
     http://www.fict.org/robots.txt             atp://www.fict.org:8001
     http://www.fict.org/home/                  not available
     http://www.fict.org/home/user1/            atp://user1.fict.org:854
     http://www.fict.org/home/user1/index.html  atp://user1.fict.org:854
     http://www.fict.org/home/user2/            atp://user2.fict.org:831
     http://www.fict.org/home/user2/index.html  atp://user2.fict.org:831
     http://www.fict.org/home/user3/            not available
     http://www.fict.org/home/user3/index.html  not available
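
   The lookup behind such a table can be sketched in Python as follows.
   This is one possible reading of the override rule, not a normative
   algorithm: a record is assumed to apply when its path is a plain
   string prefix of the document path, and the last applicable record
   in file order wins. The names RULES and agent_server_for are
   hypothetical.

```python
from urllib.parse import urlsplit

# The directives of this section, in file order; None stands for "none".
RULES = [
    ("/", "atp://www.fict.org:8001"),
    ("/home/", None),
    ("/home/user1/", "atp://user1.fict.org:854"),
    ("/home/user2/", "atp://user2.fict.org:831"),
]


def agent_server_for(url, rules=RULES):
    """Return the Mobile Agent host for a document URL, or None if not available."""
    path = urlsplit(url).path or "/"
    result = None
    for prefix, target in rules:
        # Assumption: a record applies when its path is a string prefix.
        if path.startswith(prefix):
            result = target  # a later matching line overrides earlier ones
    return result


for url in (
    "http://www.fict.org/index.html",
    "http://www.fict.org/home/",
    "http://www.fict.org/home/user2/index.html",
    "http://www.fict.org/home/user3/",
):
    print(url, "->", agent_server_for(url) or "not available")
```

   With plain prefix matching, a record for /dir1 would also apply to
   /dir10; an implementation that wants to match whole path segments
   only would need a stricter comparison.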

5. Security considerations

   The Mobile-agent-server record can expose the existence of resources
   not otherwise linked to on the site, which may help people guess
   URLs.

   If the exposed resource is the URL of a document, no further risks
   are introduced beyond those already implied by the Standard
   Specification [1].

   If the exposed resource is the URL of a site that can host Mobile
   Agents, security problems must be dealt with at the site itself by
   means of a proper security model that allows incoming Robots to
   perform only those operations needed for exploring the assigned
   website subtrees. However, this issue is specific to the technology
   used to implement the Mobile Robots and is not discussed here.

   The same considerations about impersonation and encryption stated in
   the Standard Specification also apply here.

6. References

   [1] Koster, M. "A Standard for Robot Exclusion",
   http://info.webcrawler.com/mak/projects/robot/norobots.html, June

   [2] Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform Resource
   Locators (URL)", RFC 1738, CERN, Xerox PARC, University of Minnesota,
   December 1994.

   [3] Lange, D. B., "Agent Transfer Protocol - ATP/0.1 Draft", IBM
   Tokyo Research Laboratory,
   http://www.trl.ibm.co.jp/aglets/atp/atp.htm, July 1996.

   [4] Crocker, D., "Standard for the Format of ARPA Internet Text
   Messages", STD 11, RFC 822, UDEL, August 1982.

   [5] Chang, D. T., and Lange, D. B., "Mobile Agents: A New Paradigm
   for Distributed Object Computing on the WWW", IBM Tokyo Research
   Laboratory, OOPSLA'96 Workshop "Toward the integration of WWW and
   Distributed Object Technology",

7. Authors' Addresses

   Fabrizio Giudici, fritz@dibe.unige.it, phone: +39-10-3532192
   Andrea Sappia, sappia@dibe.unige.it, phone: +39-10-3532192

   Electronic Systems and Networking Group
   Department of Biophysical and Electronic Engineering
   University of Genoa
   Via Opera Pia 11/a, 16145 - Genoa, ITALY

   Expires August 22, 1997
