Internet Draft                                        Paul Hoffman
draft-hoffkohn-rfc1738bis-00.txt                    VPN Consortium
June 19, 2003                                             Dan Kohn
Expires in six months                             Skymoon Ventures
Intended status: Standards Track

                      Definitions of Early URI Schemes

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This document specifies many Uniform Resource Identifier (URI) schemes
that were originally specified in RFC 1738 [RFC1738]. Some of these
schemes are specified more fully in this document. The purpose of
this document is to allow RFC 1738 to be moved to Historic status while
keeping the information about the schemes on standards track.

1. Introduction

URIs are currently defined in RFC 2396, which is being updated by
[RFC2396BIS]. Those documents also specify how to define schemes for
URIs.

The first definition for many URI schemes appeared in RFC 1738. Because
that document may be moved to Historic status, this document copies the
still-needed material from it to allow that material to remain on
standards track. Specifically, this document copies the URI schemes.

Some of the URI scheme definitions have been changed. The following
lists all of the changes:

- http: was removed because it is specified in RFC 2616

- mailto: was removed because it is specified in RFC 2368

It should be noted that three of the schemes for protocols that are
described in this document (Gopher+, WAIS, and Prospero) were never
documented in RFCs, and the references to them are URLs that may not be
long-lasting. In fact, at least two of those URLs are no longer
working at the time of this writing.

1.1 Open issues

Section 2.8: will be updated to include specific usage of the file:
scheme on different operating systems

References: some of the references are to URLs that no longer work or
are likely to be abandoned in the future. How do we want to deal with
this?

2. Specific Schemes

The mapping for some existing standard and experimental protocols is
outlined in the BNF syntax definition.  Notes on particular protocols
follow. The schemes covered are:

ftp                     File Transfer protocol
gopher                  The Gopher protocol
news                    USENET news
nntp                    USENET news using NNTP access
telnet                  Reference to interactive sessions
wais                    Wide Area Information Servers
file                    Host-specific file names
prospero                Prospero Directory Service

2.1. Common Internet Scheme Syntax

While the syntax for the rest of the URL may vary depending on the
particular scheme selected, URL schemes that involve the direct use
of an IP-based protocol to a specified host on the Internet use a
common syntax for the scheme-specific data:

        //<user>:<password>@<host>:<port>/<url-path>

Some or all of the parts "<user>:<password>@", ":<password>",
":<port>", and "/<url-path>" may be excluded.  The scheme specific
data starts with a double slash "//" to indicate that it complies with
the common Internet scheme syntax. The different components obey the
following rules:

user
        An optional user name. Some schemes (e.g., ftp) allow the
        specification of a user name.

password
        An optional password. If present, it follows the user
        name separated from it by a colon.

The user name (and password), if present, are followed by a
commercial at-sign "@". Within the user and password field, any ":",
"@", or "/" must be encoded.

Note that an empty user name or password is different than no user
name or password; there is no way to specify a password without
specifying a user name. E.g., <URL:ftp://@host.com/> has an empty
user name and no password, <URL:ftp://host.com/> has no user name,
while <URL:ftp://foo:@host.com/> has a user name of "foo" and an
empty password.

host
        The fully qualified domain name of a network host, or its IP
        address as a set of four decimal digit groups separated by
        ".". Fully qualified domain names take the form as described
        in Section 3.5 of RFC 1034 [STD13] and Section 2.1 of RFC 1123
        [STD3]: a sequence of domain labels separated by ".", each domain
        label starting and ending with an alphanumerical character and
        possibly also containing "-" characters. The rightmost domain
        label will never start with a digit, though, which
        syntactically distinguishes all domain names from the IP
        addresses.

port
        The port number to connect to. Most schemes designate
        protocols that have a default port number. Another port number
        may optionally be supplied, in decimal, separated from the
        host by a colon. If the port is omitted, the colon is as well.

url-path
        The rest of the locator consists of data specific to the
        scheme, and is known as the "url-path". It supplies the
        details of how the specified resource can be accessed. Note
        that the "/" between the host (or port) and the url-path is
        NOT part of the url-path.

The url-path syntax depends on the scheme being used, as does the
manner in which it is interpreted.
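
The rules above can be sketched in Python. This is only an
illustration, not a normative parser: the function name
split_common_syntax and the returned dictionary keys are hypothetical,
and the input is assumed to be the scheme-specific data with the scheme
name and its colon already removed. An absent part is reported as None
and an empty part as an empty string, which mirrors the distinction
drawn for user names and passwords.

```python
from urllib.parse import unquote


def split_common_syntax(scheme_specific):
    """Split "//<user>:<password>@<host>:<port>/<url-path>" into parts.

    A part that is absent is returned as None; an empty part as "".
    """
    if not scheme_specific.startswith("//"):
        raise ValueError("scheme-specific data must start with '//'")

    # The first "/" after the host (or port) is a delimiter and is not
    # part of the url-path.
    authority, slash, url_path = scheme_specific[2:].partition("/")
    if not slash:
        url_path = None

    user = password = None
    hostport = authority
    # ":", "@", and "/" must be encoded inside user and password, so a
    # literal "@" can only be the delimiter.
    if "@" in authority:
        userinfo, _, hostport = authority.partition("@")
        user, colon, raw_password = userinfo.partition(":")
        user = unquote(user)
        password = unquote(raw_password) if colon else None

    host, colon, port = hostport.partition(":")
    if colon and not (port.isascii() and port.isdigit()):
        raise ValueError("port must be decimal digits")
    return {
        "user": user,
        "password": password,
        "host": host,
        "port": int(port) if colon else None,
        "url_path": url_path,
    }
```

For instance, split_common_syntax("//foo:@host.com/") yields a user of
"foo" and an empty password, while split_common_syntax("//host.com/")
yields None for both.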

2.2. FTP

The FTP URL scheme is used to designate files and directories on
Internet hosts accessible using the FTP protocol (RFC959).

An FTP URL follows the syntax described in Section 2.1.  If :<port> is
omitted, the port defaults to 21.

2.2.1. FTP Name and Password

A user name and password may be supplied; they are used in the ftp
"USER" and "PASS" commands after first making the connection to the
FTP server.  If no user name or password is supplied and one is
requested by the FTP server, the conventions for "anonymous" FTP are
to be used, as follows:

        The user name "anonymous" is supplied.

        The password is supplied as the Internet e-mail address
        of the end user accessing the resource.

If the URL supplies a user name but no password, and the remote
server requests a password, the program interpreting the FTP URL
should request one from the user.

2.2.2. FTP url-path

The url-path of an FTP URL has the following syntax:

        <cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode>

Where <cwd1> through <cwdN> and <name> are (possibly encoded) strings
and <typecode> is one of the characters "a", "i", or "d".  The part
";type=<typecode>" may be omitted. The <cwdx> and <name> parts may be
empty. The whole url-path may be omitted, including the "/"
delimiting it from the prefix containing user, password, host, and
port.

The url-path is interpreted as a series of FTP commands as follows:

  Each of the <cwd> elements is to be supplied, sequentially, as the
  argument to a CWD (change working directory) command.

  If the typecode is "d", perform an NLST (name list) command with
  <name> as the argument, and interpret the results as a file
  directory listing.

  Otherwise, perform a TYPE command with <typecode> as the argument,
  and then access the file whose name is <name> (for example, using
  the RETR command).

Within a name or CWD component, the characters "/" and ";" are
reserved and must be encoded. The components are decoded prior to
their use in the FTP protocol.  In particular, if the appropriate FTP
sequence to access a particular file requires supplying a string
containing a "/" as an argument to a CWD or RETR command, it is
necessary to encode each "/".

For example, the URL <URL:ftp://myname@host.dom/%2Fetc/motd> is
interpreted by FTP-ing to "host.dom", logging in as "myname"
(prompting for a password if it is asked for), and then executing
"CWD /etc" and then "RETR motd". This has a different meaning from
<URL:ftp://myname@host.dom/etc/motd> which would "CWD etc" and then
"RETR motd"; the initial "CWD" might be executed relative to the
default directory for "myname". On the other hand,
<URL:ftp://myname@host.dom//etc/motd>, would "CWD " with a null
argument, then "CWD etc", and then "RETR motd".
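
The interpretation of the url-path as a series of FTP commands can be
sketched as follows. This is a minimal illustration under stated
assumptions: the function name ftp_commands is hypothetical, the input
is the url-path without the "/" that follows the host or port, and the
file is assumed to be fetched with RETR (the text above gives RETR only
as an example).

```python
from urllib.parse import unquote


def ftp_commands(url_path):
    """Return the FTP commands implied by the url-path of an FTP URL."""
    # ";" is reserved inside names, so the first literal ";type=" is
    # the typecode delimiter.
    path, sep, typecode = url_path.partition(";type=")
    if not sep:
        typecode = None  # the client has to guess the transfer mode

    # Split on the literal "/" first and decode afterwards, so that an
    # encoded "/" (%2F) stays inside a single command argument.
    *cwds, name = path.split("/")
    commands = ["CWD " + unquote(cwd) for cwd in cwds]
    name = unquote(name)

    if typecode == "d":
        commands.append("NLST " + name)
    else:
        if typecode is not None:
            commands.append("TYPE " + typecode)
        commands.append("RETR " + name)
    return commands
```

With this sketch, the url-path "%2Fetc/motd" produces "CWD /etc" and
"RETR motd", while "etc/motd" produces "CWD etc" and "RETR motd".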

FTP URLs may also be used for other operations; for example, it is
possible to update a file on a remote file server, or infer
information about it from the directory listings. The mechanism for
doing so is not spelled out here.

2.2.3. FTP Typecode is Optional

The entire ;type=<typecode> part of an FTP URL is optional. If it is
omitted, the client program interpreting the URL must guess the
appropriate mode to use. In general, the data content type of a file
can only be guessed from the name, e.g., from the suffix of the name;
the appropriate type code to be used for transfer of the file can
then be deduced from the data content of the file.

2.2.4. Hierarchy

For some file systems, the "/" used to denote the hierarchical
structure of the URL corresponds to the delimiter used to construct a
file name hierarchy, and thus, the filename will look similar to the
URL path. This does NOT mean that the URL is a Unix filename.

2.2.5. Optimization

Clients accessing resources via FTP may employ additional heuristics
to optimize the interaction. For some FTP servers, for example, it
may be reasonable to keep the control connection open while accessing
multiple URLs from the same server. However, there is no common
hierarchical model to the FTP protocol, so if a directory change
command has been given, it is impossible in general to deduce what
sequence should be given to navigate to another directory for a
second retrieval, if the paths are different.  The only reliable
algorithm is to disconnect and reestablish the control connection.

2.3. GOPHER

The Gopher URL scheme is used to designate Internet resources
accessible using the Gopher protocol.

The base Gopher protocol is described in RFC 1436 and supports items
and collections of items (directories). The Gopher+ protocol is a set
of upward compatible extensions to the base Gopher protocol and is
described in [Gopher+]. Gopher+ supports associating arbitrary sets of
attributes and alternate data representations with Gopher items.
Gopher URLs accommodate both Gopher and Gopher+ items and item
attributes.

2.3.1. Gopher URL syntax

A Gopher URL takes the form:

  gopher://<host>:<port>/<gopher-path>

where <gopher-path> is one of

   <gophertype><selector>
   <gophertype><selector>%09<search>
   <gophertype><selector>%09<search>%09<gopher+_string>

If :<port> is omitted, the port defaults to 70.  <gophertype> is a
single-character field to denote the Gopher type of the resource to
which the URL refers. The entire <gopher-path> may also be empty, in
which case the delimiting "/" is also optional and the <gophertype>
defaults to "1".

<selector> is the Gopher selector string.  In the Gopher protocol,
Gopher selector strings are a sequence of octets which may contain
any octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal
(US-ASCII character LF), and 0D hexadecimal (US-ASCII character CR).

Gopher clients specify which item to retrieve by sending the Gopher
selector string to a Gopher server.

Within the <gopher-path>, no characters are reserved.

Note that some Gopher <selector> strings begin with a copy of the
<gophertype> character, in which case that character will occur twice
consecutively. The Gopher selector string may be an empty string;
this is how Gopher clients refer to the top-level directory on a
Gopher server.

2.3.2 Specifying URLs for Gopher Search Engines

If the URL refers to a search to be submitted to a Gopher search
engine, the selector is followed by an encoded tab (%09) and the
search string. To submit a search to a Gopher search engine, the
Gopher client sends the <selector> string (after decoding), a tab,
and the search string to the Gopher server.

2.3.3 URL syntax for Gopher+ items

URLs for Gopher+ items have a second encoded tab (%09) and a Gopher+
string. Note that in this case, the %09<search> string must be
supplied, although the <search> element may be the empty string.

The <gopher+_string> is used to represent information required for
retrieval of the Gopher+ item. Gopher+ items may have alternate
views, arbitrary sets of attributes, and may have electronic forms
associated with them.

To retrieve the data associated with a Gopher+ URL, a client will
connect to the server and send the Gopher selector, followed by a tab
and the search string (which may be empty), followed by a tab and the
Gopher+ commands.
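
The way a client derives what it sends from a <gopher-path> can be
sketched as follows. This is an illustration only: the function name
gopher_request is hypothetical, the input is assumed to be the
<gopher-path> without the "/" that follows the host or port, and the
line terminator that ends the request is left to the caller.

```python
from urllib.parse import unquote_to_bytes


def gopher_request(gopher_path):
    """Return (gophertype, octets_to_send) for a <gopher-path>."""
    if gopher_path == "":
        # An empty path defaults to type "1" with an empty selector,
        # which refers to the top-level directory.
        return "1", b""

    gophertype, rest = gopher_path[0], gopher_path[1:]
    # Split on the encoded tab before decoding, giving the selector,
    # then the optional search string and Gopher+ string.
    fields = rest.split("%09", 2)
    # Selectors are octet sequences, so decode to bytes, not text.
    return gophertype, b"\t".join(unquote_to_bytes(f) for f in fields)
```

Decoding happens only after the split, so the tab octets that separate
the fields on the wire come solely from the "%09" delimiters.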

2.3.4 Default Gopher+ data representation

When a Gopher server returns a directory listing to a client, the
Gopher+ items are tagged with either a "+" (denoting Gopher+ items)
or a "?" (denoting Gopher+ items which have a +ASK form associated
with them). A Gopher URL with a Gopher+ string consisting of only a
"+" refers to the default view (data representation) of the item
while a Gopher+ string containing only a "?" refers to an item with a
Gopher electronic form associated with it.

2.3.5 Gopher+ items with electronic forms

Gopher+ items which have a +ASK associated with them (i.e. Gopher+
items tagged with a "?") require the client to fetch the item's +ASK
attribute to get the form definition, and then ask the user to fill
out the form and return the user's responses along with the selector
string to retrieve the item.  Gopher+ clients know how to do this but
depend on the "?" tag in the Gopher+ item description to know when to
handle this case. The "?" is used in the Gopher+ string to be
consistent with Gopher+ protocol's use of this symbol.

2.3.6 Gopher+ item attribute collections

To refer to the Gopher+ attributes of an item, the Gopher URL's
Gopher+ string consists of "!" or "$". "!" refers to all of a
Gopher+ item's attributes. "$" refers to all the item attributes for
all items in a Gopher directory.

2.3.7 Referring to specific Gopher+ attributes

To refer to specific attributes, the URL's gopher+_string is
"!<attribute_name>" or "$<attribute_name>". For example, to refer to
the attribute containing the abstract of an item, the gopher+_string
would be "!+ABSTRACT".

To refer to several attributes, the gopher+_string consists of the
attribute names separated by coded spaces. For example,
"!+ABSTRACT%20+SMELL" refers to the +ABSTRACT and +SMELL attributes
of an item.

2.3.8 URL syntax for Gopher+ alternate views

Gopher+ allows for optional alternate data representations (alternate
views) of items. To retrieve a Gopher+ alternate view, a Gopher+
client sends the appropriate view and language identifier (found in
the item's +VIEW attribute). To refer to a specific Gopher+ alternate
view, the URL's Gopher+ string would be in the form:

   +<view_name>%20<language_name>

For example, a Gopher+ string of "+application/postscript%20Es_ES"
refers to the Spanish language postscript alternate view of a Gopher+
item.

2.3.9 URL syntax for Gopher+ electronic forms

The gopher+_string for a URL that refers to an item referenced by a
Gopher+ electronic form (an ASK block) filled out with specific
values is a coded version of what the client sends to the server.
The gopher+_string is of the form:

+%091%0D%0A+-1%0D%0A<ask_item1_value>%0D%0A<ask_item2_value>%0D%0A.%0D%0A

To retrieve this item, the Gopher client sends:

   <a_gopher_selector><tab>+<tab>1<cr><lf>
   +-1<cr><lf>
   <ask_item1_value><cr><lf>
   <ask_item2_value><cr><lf>
   .<cr><lf>

to the Gopher server.
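
The coded form above can be produced mechanically from the form
values. The following is a minimal sketch: the function name
ask_gopher_plus_string is hypothetical, and the values are assumed to
be text strings that contain no line breaks of their own.

```python
from urllib.parse import quote


def ask_gopher_plus_string(values):
    """Build the coded gopher+_string for a filled-out ASK form."""
    # What follows the leading "+": a tab, "1", then one CR LF
    # terminated line per item, opened by "+-1" and closed by ".".
    lines = ["+-1", *values, "."]
    raw = "\t1\r\n" + "".join(line + "\r\n" for line in lines)
    # Leave "+" literal, as in the form shown above; tab, CR, and LF
    # become %09, %0D, and %0A.
    return "+" + quote(raw, safe="+")
```

Decoding the result yields exactly the octets that follow the
selector and its tab in the client's request.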

2.4. NEWS

The news URL scheme is used to refer to either news groups or
individual articles of USENET news, as specified in RFC 1036.

A news URL takes one of two forms:

 news:<newsgroup-name>
 news:<message-id>

A <newsgroup-name> is a period-delimited hierarchical name, such as
"comp.infosystems.www.misc". A <message-id> corresponds to the
Message-ID of section 2.1.5 of RFC 1036, without the enclosing "<"
and ">"; it takes the form <unique>@<full_domain_name>.  A message
identifier may be distinguished from a news group name by the
presence of the commercial at "@" character. No additional characters
are reserved within the components of a news URL.

If <newsgroup-name> is "*" (as in <URL:news:*>), it is used to refer
to "all available news groups".

The news URLs are unusual in that by themselves, they do not contain
sufficient information to locate a single resource, but, rather, are
location-independent.
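
Telling the forms of a news URL apart needs only the "@" test
described above. A minimal sketch follows; the function name
classify_news_url and the tuple layout are hypothetical.

```python
def classify_news_url(url):
    """Classify a news URL as a message-id, a newsgroup, or "*"."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "news":
        raise ValueError("not a news URL")
    # Only a message identifier contains the commercial at "@".
    if "@" in rest:
        unique, _, domain = rest.partition("@")
        return ("message-id", unique, domain)
    if rest == "*":
        return ("all-groups",)
    return ("newsgroup", rest)
```

For example, classify_news_url("news:comp.infosystems.www.misc")
returns ("newsgroup", "comp.infosystems.www.misc").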

2.5. NNTP

The nntp URL scheme is an alternative method of referencing news
articles, useful for specifying news articles from NNTP servers (RFC
977).

An nntp URL takes the form:

  nntp://<host>:<port>/<newsgroup-name>/<article-number>

where <host> and <port> are as described in Section 2.1. If :<port>
is omitted, the port defaults to 119.

The <newsgroup-name> is the name of the group, while the
<article-number> is the numeric id of the article within that newsgroup.

Note that while nntp: URLs specify a unique location for the article
resource, most NNTP servers currently on the Internet are
configured only to allow access from local clients, and thus nntp
URLs do not designate globally accessible resources. Thus, the news:
form of URL is preferred as a way of identifying news articles.
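
An nntp URL can be taken apart as sketched below. The function name
parse_nntp_url is hypothetical, and the host name used in the usage
note is an invented example.

```python
def parse_nntp_url(url):
    """Return (host, port, newsgroup_name, article_number)."""
    prefix = "nntp://"
    if not url.lower().startswith(prefix):
        raise ValueError("not an nntp URL")
    hostport, _, path = url[len(prefix):].partition("/")
    host, colon, port = hostport.partition(":")
    group, _, article = path.partition("/")
    if not group or not (article.isascii() and article.isdigit()):
        raise ValueError("expected <newsgroup-name>/<article-number>")
    # The port defaults to 119 when ":<port>" is omitted.
    return host, int(port) if colon else 119, group, int(article)
```

For example, parse_nntp_url("nntp://news.example.org/alt.test/7")
returns ("news.example.org", 119, "alt.test", 7).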

2.6. TELNET

The Telnet URL scheme is used to designate interactive services that
may be accessed by the Telnet protocol.

A telnet URL takes the form:

   telnet://<user>:<password>@<host>:<port>/

as specified in Section 2.1. The final "/" character may be omitted.
If :<port> is omitted, the port defaults to 23.  The :<password> can
be omitted, as well as the whole <user>:<password> part.

This URL does not designate a data object, but rather an interactive
service. Remote interactive services vary widely in the means by
which they allow remote logins; in practice, the <user> and
<password> supplied are advisory only: clients accessing a telnet URL
merely advise the user of the suggested username and password.

2.7.  WAIS

The WAIS URL scheme is used to designate WAIS databases, searches, or
individual documents available from a WAIS database. WAIS is
described in [WAIS]. The WAIS protocol is described in RFC 1625 [RFC1625].
Although the WAIS protocol is based on Z39.50-1988, the WAIS URL
scheme is not intended for use with arbitrary Z39.50 services.

A WAIS URL takes one of the following forms:

 wais://<host>:<port>/<database>
 wais://<host>:<port>/<database>?<search>
 wais://<host>:<port>/<database>/<wtype>/<wpath>

where <host> and <port> are as described in Section 2.1. If :<port>
is omitted, the port defaults to 210.  The first form designates a
WAIS database that is available for searching. The second form
designates a particular search.  <database> is the name of the WAIS
database being queried.

The third form designates a particular document within a WAIS
database to be retrieved. In this form <wtype> is the WAIS
designation of the type of the object. Many WAIS implementations
require that a client know the "type" of an object prior to
retrieval, the type being returned along with the internal object
identifier in the search response.  The <wtype> is included in the
URL in order to allow the client interpreting the URL adequate
information to actually retrieve the document.

The <wpath> of a WAIS URL consists of the WAIS document-id. The WAIS
document-id should be treated opaquely; it may only be decomposed by
the server that issued it.
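
The three forms can be distinguished as sketched below. This is an
illustration only: the function name parse_wais_url and the dictionary
keys are hypothetical, and the <search> and <wpath> parts are returned
exactly as written, without decoding.

```python
def parse_wais_url(url):
    """Classify a WAIS URL as a database, search, or document form."""
    prefix = "wais://"
    if not url.lower().startswith(prefix):
        raise ValueError("not a WAIS URL")
    hostport, _, rest = url[len(prefix):].partition("/")
    host, colon, port = hostport.partition(":")
    # The port defaults to 210 when ":<port>" is omitted.
    result = {"host": host, "port": int(port) if colon else 210}

    if "?" in rest:
        database, _, search = rest.partition("?")
        result.update(form="search", database=database, search=search)
    elif "/" in rest:
        parts = rest.split("/", 2)
        if len(parts) != 3:
            raise ValueError("expected <database>/<wtype>/<wpath>")
        database, wtype, wpath = parts
        # The document-id is opaque, so it is kept as written.
        result.update(form="document", database=database,
                      wtype=wtype, wpath=wpath)
    else:
        result.update(form="database", database=rest)
    return result
```

Keeping <wpath> undecoded reflects the requirement that only the
issuing server may decompose the document-id.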

2.8 FILES

The file URL scheme is used to designate files accessible on a
particular host computer. This scheme, unlike most other URL schemes,
does not designate a resource that is universally accessible over the
Internet.

A file URL takes the form:

   file://<host>/<path>

where <host> is the fully qualified domain name of the system on
which the <path> is accessible, and <path> is a hierarchical
directory path of the form <directory>/<directory>/.../<name>.

For example, a VMS file

 DISK$USER:[MY.NOTES]NOTE123456.TXT

might become

 <URL:file://vms.host.edu/disk$user/my/notes/note123456.txt>

As a special case, <host> can be the string "localhost" or the empty
string; this is interpreted as `the machine from which the URL is
being interpreted'.
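
A file URL can be split into its host and path components as sketched
below. The function name parse_file_url is hypothetical, and mapping
the resulting components onto a native file name is left to the host
system, since that mapping is not defined here.

```python
from urllib.parse import unquote


def parse_file_url(url):
    """Return (host, is_local, path_components) for a file URL."""
    prefix = "file://"
    if not url.lower().startswith(prefix):
        raise ValueError("not a file URL")
    host, slash, path = url[len(prefix):].partition("/")
    if not slash:
        raise ValueError("a '/' must follow the host")
    # "localhost" and the empty string both mean the machine on which
    # the URL is being interpreted.
    is_local = host == "" or host.lower() == "localhost"
    # Split on "/" first, then decode each component on its own.
    return host, is_local, [unquote(part) for part in path.split("/")]
```

For example, parse_file_url("file:///etc/motd") returns
("", True, ["etc", "motd"]).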

The file URL scheme is unusual in that it does not specify an
Internet protocol or access method for such files; as such, its
utility in network protocols between hosts is limited.

2.9 PROSPERO

The Prospero URL scheme is used to designate resources that are
accessed via the Prospero Directory Service. The Prospero protocol is
described elsewhere [PROSPERO].

A prospero URL takes the form:

  prospero://<host>:<port>/<hsoname>;<field>=<value>

where <host> and <port> are as described in Section 2.1. If :<port>
is omitted, the port defaults to 1525. No username or password is
allowed.

The <hsoname> is the host-specific object name in the Prospero
protocol, suitably encoded.  This name is opaque and interpreted by
the Prospero server.  The semicolon ";" is reserved and may not
appear without quoting in the <hsoname>.

Prospero URLs are interpreted by contacting a Prospero directory
server on the specified host and port to determine appropriate access
methods for a resource, which might themselves be represented as
different URLs. External Prospero links are represented as URLs of
the underlying access method and are not represented as Prospero
URLs.

Note that a slash "/" may appear in the <hsoname> without quoting and
no significance may be assumed by the application.  Though slashes
may indicate hierarchical structure on the server, such structure is
not guaranteed. Note that many <hsoname>s begin with a slash, in
which case the host or port will be followed by a double slash: the
slash from the URL syntax, followed by the initial slash from the
<hsoname>. (E.g., <URL:prospero://host.dom//pros/name> designates a
<hsoname> of "/pros/name".)

In addition, after the <hsoname>, optional fields and values
associated with a Prospero link may be specified as part of the URL.
When present, each field/value pair is separated from each other and
from the rest of the URL by a ";" (semicolon).  The name of the field
and its value are separated by a "=" (equal sign). If present, these
fields serve to identify the target of the URL.  For example, the
OBJECT-VERSION field can be specified to identify a specific version
of an object.
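
The structure of a Prospero URL can be sketched as follows. This is an
illustration only: the function name parse_prospero_url is
hypothetical, and the OBJECT-VERSION value and port used in the usage
note are invented for the example.

```python
from urllib.parse import unquote


def parse_prospero_url(url):
    """Return (host, port, hsoname, fields) for a Prospero URL."""
    prefix = "prospero://"
    if not url.lower().startswith(prefix):
        raise ValueError("not a Prospero URL")
    hostport, _, rest = url[len(prefix):].partition("/")
    if "@" in hostport:
        raise ValueError("no user name or password is allowed")
    host, colon, port = hostport.partition(":")

    # ";" is reserved, so every literal ";" separates field/value
    # pairs from the <hsoname> and from each other.
    hsoname, *pairs = rest.split(";")
    fields = {}
    for pair in pairs:
        name, eq, value = pair.partition("=")
        if not eq:
            raise ValueError("expected <field>=<value>")
        fields[unquote(name)] = unquote(value)
    # The port defaults to 1525 when ":<port>" is omitted.
    return host, int(port) if colon else 1525, unquote(hsoname), fields
```

For example, parse_prospero_url("prospero://host.dom//pros/name")
returns ("host.dom", 1525, "/pros/name", {}), and appending
";OBJECT-VERSION=3" to that URL adds {"OBJECT-VERSION": "3"} to the
fields.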


3. Security Considerations

There are many security considerations for URIs, as described in
[RFC2396BIS].


4. References

[Gopher+] Anklesaria, F., Lindner, P., McCahill, M., Torrey, D.,
Johnson, D., and B. Alberti, "Gopher+: Upward compatible enhancements to
the Internet Gopher protocol", University of Minnesota, July 1993.
<URL:ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol
/Gopher+/Gopher+.txt>

[PROSPERO] Neuman, B., and S. Augart, "The Prospero Protocol",
USC/Information Sciences Institute, June 1993.
<URL:ftp://prospero.isi.edu/pub/prospero/doc/prospero-protocol.PS.Z>

[RFC1625] St. Pierre, et al., "WAIS over Z39.50-1988", RFC 1625, June
1994.

[RFC1738] Berners-Lee, et al., "Uniform Resource Locators (URL)", RFC
1738, December 1994.

[RFC2396BIS] Berners-Lee, et al., "Uniform Resource Identifier (URI):
Generic Syntax", draft-fielding-uri-rfc2396bis

[STD3] Braden, R., Editor, "Requirements for Internet Hosts --
Application and Support", STD 3, RFC 1123, October 1989.

[STD13] Mockapetris, P., "Domain Names - Concepts and Facilities", STD
13, RFC 1034, November 1987.

[WAIS] Davis, et al., "WAIS Interface Protocol Prototype Functional
Specification", (v1.5), Thinking Machines Corporation, April 1990.
<URL:ftp://quake.think.com/pub/wais/doc/protspec.txt>


5. Authors' Contact Information

Dan Kohn
Skymoon Ventures
3045 Park Boulevard
Palo Alto, California  94306  USA
Phone: +1-650-327-2600
EMail: dan@dankohn.com
URI:   http://www.dankohn.com/

Paul Hoffman
VPN Consortium
127 Segre Place
Santa Cruz, CA 95060 USA
Phone: +1-831-426-9827
EMail: paul.hoffman@vpnc.org