INTERNET-DRAFT                                               M. Mealling
Expires six months from June 1998                Network Solutions, Inc.
Intended category: Experimental
draft-mealling-human-friendly-identifier-arch-00.txt

     An Architecture for Supporting Human Friendly Identifiers

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents
of the Internet Engineering Task Force (IETF), its areas, and its working
groups. Note that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and
may be updated, replaced, or obsoleted by other documents at any time. It
is inappropriate to use Internet-Drafts as reference material or to cite
them other than as work in progress.

To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
(Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
(US West Coast).

Abstract

   This document describes an architecture that satisfies the
   requirements for Human Friendly Identifiers as specified in [HFI-REQ].
   Specifically it describes the URI scheme "go" as an HFI encoding
   mechanism, a protocol for the resolution of HFIs, and a scalable and
   open infrastructure for resolving those HFIs.

1. The Architecture

   This architecture borrows heavily from DNS both in terms of local
   servers holding data while the root holds only referrals and in
   terms of its operational organization reflecting the current
   direction of DNS root management toward registrars and registries.

   There are five distinct parts of the architecture:

        The Root -- Due to the flatness of the HFI space, this service
            will be heavily loaded. Thus, the data served from the root
            should be small. Like DNS, it should only contain referrals
            to locally maintained servers. This can also be thought
            of as a registry in the parlance of the current gTLD
            debate.

        Registrars -- There are two classes of these: qualified and
            unqualified. Qualified registrars offer a guaranteed
            level of service as applied to the data that is presented
            by the service.  This distinction between qualified and
            unqualified data is presented to the client by ranking
            responses so that hits from qualified
            registrars are ranked higher than those from unqualified
            registrars. Unqualified registrars can be any entity. This
            allows anyone to write entries to the root. Qualified vs.
            Unqualified is discussed in more detail below.

        Content Servers -- Since referrals are the only entities kept
            in the root, the actual data returned during resolution
            is retrieved from a separate server. This server can
            be maintained by a registrar or by the entity that
            requests the HFI.

        Local Servers -- Much like DNS, these servers act
            as caches and contain data for use only by the
            local entity. They use the same protocol as the root
            and operate on a chained or referral basis depending on
            their configuration.

        Clients -- The entire reason for most systems, the clients
            are the part that actually sends queries and processes
            results.


              +-----------------------------------------------+
              |               Root (Registry)                 |
              |          (HFI, referral, contexts)            |
              +-----------------------------------------------+
                                       /|\/|\  R|   /|\
   +---------------+                    |  |   e|    |R
   | Registrar     |---Qualified Entry--+  |   f|    |o
   +---------------+                       |   e|    |o
                                           |   r|    |t
   +---------------+                       |   r|    |Q
   | End user      |---Unqualified Entry---+   a|    |u
   +---------------+                           l|    |e
                                    +-----------+    |r
   +----------------+               |                |y
   | Content Server |               |        +-------------------------+
   +----------------+               |        | Enterprise Level Server |
      |     /|\                     |        +-------------------------+
     C|     R|                      |               /|\
     o|     e|                      |                |
     n|     q|       Referral       |        +-------------------------+
     t|     u|     +----------------+        | Department Level Server |
     e|     e|     |                         +------------+------------+
     n|     s|     |                                /|\   |Possible    |
     t|     t|    \|/                                |    |Local       |
      |    +---------------+      Resolution Request |    |Content     |
      +--->| Client        |-------------------------+    +------------+
           +---------------+

                                Figure 1.

   Data Model

   The data model used by the architecture is fairly simple. The root
   only contains the actual identifier string, zero or more
   discriminating contexts, and enough information to refer the client
   to the host that contains the required data. Contexts are values
   specified by the registrant that discriminate the particular HFI from
   other HFIs with the same value. Potential contexts include geographic
   region, topic area/industry segment, popularity, or unique identifier.
   It has not been determined which contexts are required, if any.

   The metadata that is returned to the client resides on Content Servers.
   The referral to the client contains a host/port tuple that refers to
   the Content Server. The data maintained there is encoded in an RDF [1]
   object that adheres to the RDF Schema specified in Appendix A. Since
   RDF allows multiple schemas, the local Content Server maintainer has the
   ability to include community specific information within the returned
   object. The client is only required to understand the schema in
   Appendix A.
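
   As an illustration only, the root record described above could be
   modeled as follows. This is a minimal sketch: the class names
   (Referral, RootEntry), the field names, and the example hosts are
   assumptions made for this example and are not part of the
   architecture. Contexts are kept as plain strings, since this
   document has not yet determined which contexts are required.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Referral:
    """Where the client should go for the actual metadata (hypothetical name)."""
    host: str  # domain name of the Content Server
    port: int  # port on which to contact that host


@dataclass
class RootEntry:
    """One record in the root: identifier, referral, and optional contexts."""
    hfi: str            # the identifier string itself (Unicode)
    referral: Referral  # enough information to reach the Content Server
    contexts: dict = field(default_factory=dict)  # zero or more scalar contexts


# Two entries may share the same HFI value and differ only in their contexts.
# The host names below are placeholders, not real Content Servers.
entries = [
    RootEntry("Nike", Referral("content.example.com", 8080),
              {"location": "us", "industry": "28"}),
    RootEntry("Nike", Referral("other.example.net", 80)),
]
```

   Note that the root entry carries no metadata of its own; everything
   beyond the identifier, contexts, and referral lives on the Content
   Server.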

   Match Semantics

   The first match is done on the HFI itself. The user's query can specify
   simple syntactic matches at this point. Since the HFI is in Unicode
   there may be language specific matches that are possible. Unicode
   specific match semantics are a topic of much discussion.

   Once one or more syntactic matches are made, the user supplied contexts
   are matched with the result set. Due to the expected size and load on
   the root, contexts should be thought of as simple scalar values.
   For example, if geographic area is specified as a context then the
   values should be normalized outside of the root. This allows
   the root to do very simple and fast comparisons on normalized codes.
   The root should not be required to support a GIS back-end in order to
   understand geographic location.

   Syntactic matches are matches based on the exact Unicode values
   of the HFI strings. These include exact and substring matches where
   appropriate. It is probably NOT possible to support soundex style
   matches across such a large, multi-lingual dataset.
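
   The two matching stages described in this section can be sketched
   as shown here. This is not a definitive algorithm. The function
   names and the dictionary layout of an entry are assumptions, as is
   the policy that an entry must carry every context supplied by the
   user with an equal value; this document does not specify how
   missing contexts are treated.

```python
def syntactic_match(query_type, query_hfi, entry_hfi):
    """Stage 1: compare the exact Unicode values of the HFI strings."""
    if query_type == "exact":
        return query_hfi == entry_hfi
    if query_type == "substring":
        return query_hfi in entry_hfi
    raise ValueError("unsupported query type: " + query_type)


def context_match(wanted, entry_contexts):
    """Stage 2: plain equality on normalized scalar context values.

    Assumed policy: every user supplied context must be present in the
    entry with the same value.
    """
    return all(entry_contexts.get(name) == value
               for name, value in wanted.items())


def resolve(query_type, query_hfi, wanted, entries):
    """Apply the syntactic match first, then filter the result set by context."""
    candidates = [e for e in entries
                  if syntactic_match(query_type, query_hfi, e["hfi"])]
    return [e for e in candidates if context_match(wanted, e["contexts"])]


# Illustrative data only; the values are invented for this sketch.
sample_entries = [
    {"hfi": "Nike", "contexts": {"location": "us", "industry": "28"}},
    {"hfi": "Nike Town", "contexts": {"industry": "35"}},
    {"hfi": "Cartman", "contexts": {}},
]
matches = resolve("substring", "Nike", {"industry": "28"}, sample_entries)
```

   Because the context comparison is simple string equality, any
   normalization (for example of geographic codes) has to happen before
   the values reach the root, as described above.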

2. The "go" URI scheme

   In order for an HFI to be used within the existing Internet and
   WorldWideWeb infrastructure it must adhere to the syntax and
   semantics of Uniform Resource Identifiers [RFC2396]. The HFI
   requirement that it be short suggests a URI scheme that is
   small but recognizable. Thus the scheme "go" is specified as
   the default method of specifying an HFI.

   The "go" scheme contains a single element which is the HFI
   itself. Since the HFI is required to be internationalized the scheme
   will need to be able to handle any language or character set.
   This requirement suggests that UTF-8 encoded Unicode is appropriate.

   When displayed to the user an HFI should not be shown in its
   URI encoded form unless no other form is available. Instead an
   HFI should be shown according to the localization rules of the
   user.

   As with URNs (and most URIs for that matter), the "go" scheme
   is considered independent of its resolution method. While the
   protocol for that resolution is specified in this document, the
   reader should take care to realize that a "go" URI can and will
   be resolved by other protocols.

   Example:
        Displayed Form              Encoded Form
        -------------------------   ---------------------------------
        go:Nike                     go:Nike
        go:Network Solutions        go:Network%20Solutions
        go:Martin J. Duerst         go:Martin%20J.%20D%C3%BCrst


   NOTE: In the last example the limits of this ASCII document do not
   allow for the correct representation of Martin's last name.
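
   The mapping between the displayed and encoded forms can be sketched
   with standard percent-encoding of the UTF-8 octets. The helper names
   encode_hfi and decode_hfi are assumptions for this example, and the
   sketch assumes that every character outside the URI unreserved set
   is escaped; this document does not define the exact escaping rules.

```python
from urllib.parse import quote, unquote

SCHEME = "go:"


def encode_hfi(name):
    """Turn a displayed HFI (without the scheme) into its encoded "go" URI."""
    # safe="" escapes everything except letters, digits, and "_.-~".
    return SCHEME + quote(name, safe="", encoding="utf-8")


def decode_hfi(uri):
    """Recover the displayable HFI from an encoded "go" URI."""
    if not uri.startswith(SCHEME):
        raise ValueError("not a go URI: " + uri)
    return unquote(uri[len(SCHEME):], encoding="utf-8")


encoded = encode_hfi("Network Solutions")  # "go:Network%20Solutions"
# "\u00fc" is U+00FC (LATIN SMALL LETTER U WITH DIAERESIS), which this
# ASCII document cannot show directly.
encoded_name = encode_hfi("Martin J. D\u00fcrst")
```

   A client would apply decode_hfi before display, so that the user
   sees the localized form rather than the escaped one.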

3. The HFI resolution protocols

   This architecture has several client-server interactions of differing
   flavors. The protocol between the qualified registrars and the
   registry is almost out of scope since it is an operational issue
   that may have its own policy and security issues. The query protocol
   between the Client and the Local Servers should be identical to the
   query protocol used with the root since there shouldn't be any
   architectural difference between the two. The protocol between the
   Client and the Content Server can be handled by any existing
   retrieval protocol. HTTP immediately comes to mind as a very valid
   Content Server protocol.

3.1 Client to Server Query Protocol (CSQP?)

   For speed the protocol should be simple and small. For a low barrier
   to adoption the protocol should not require a great deal of encoding.
   To balance these the protocol will be UTF-8 encoded Unicode. The
   interaction is simple: the client connects and issues a query after
   which the server responds with 1 or more referrals. Since both the
   query and responses are atomic, the protocol can use either TCP or
   UDP as its transport. TCP uses a simple text based, line oriented
   interaction while UDP uses a simple, TFTP [RFC1350] style, packet
   reconstruction.

3.1.1 UDP Interaction

   Specification of exact UDP interaction should go here. See
   TFTP [RFC1350] for a good example of how it should be done.

3.1.2 TCP Interaction

   Specification of exact TCP interaction should go here. This should
   be fairly easy since it is simply the UDP version without any
   block numbers or acknowledgments.

3.1.3 The Query

   /* Author's Temporary Comment: These formats are arbitrary and   */
   /* thus can (and probably should) be changed.                    */
   The Query is made up of 3 elements: the query type, the HFI
   and n contexts. They are specified as follows:

   query = query-type " " hfi *(crlf context) crlf
   query-type = "substring" / "exact" / 1*alphadigit
   hfi = <"go" scheme URI>
   context = context-name ":" context-value
   context-name = 1*alphadigit
   context-value = 1*alphadigit
   alphadigit = alpha / digit / "_" / "-"
   alpha = "a".."z" / "A".."Z"
   digit = "0".."9"
   lf = <ASCII LF (linefeed)>
   cr = <ASCII CR (carriage return)>
   crlf = cr lf

   Example:

   substring go:Nike
   location:us-ga-atlanta-lawrenceville
   industry:28

   This example shows a query for the HFI "Nike" in the city of
   Lawrenceville where the entity is in the International
   Trademark Class 28 ("Toys and sporting goods").


   exact go:Network%20Solutions
   location:us
   industry:38

   This example shows a query for the HFI "Network Solutions" in
   the United States where the entity is in the International
   Trademark Class 38 ("Communication services").
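
   A query in this format can be built and parsed with a few lines of
   code. The sketch below assumes the layout shown in the examples: one
   line with the query type and the HFI, followed by one
   "name:value" line per context, each line terminated by CRLF, and the
   whole query encoded as UTF-8. The function names are assumptions for
   this example, and the formats themselves may still change.

```python
CRLF = "\r\n"


def build_query(query_type, hfi_uri, contexts):
    """Serialize a query as UTF-8 encoded, CRLF terminated lines."""
    lines = [query_type + " " + hfi_uri]
    lines += [name + ":" + value for name, value in contexts.items()]
    return (CRLF.join(lines) + CRLF).encode("utf-8")


def parse_query(data):
    """Split a serialized query back into (query type, HFI URI, contexts)."""
    lines = [line for line in data.decode("utf-8").split(CRLF) if line]
    query_type, hfi_uri = lines[0].split(" ", 1)
    contexts = dict(line.split(":", 1) for line in lines[1:])
    # strip() tolerates a trailing space after the HFI.
    return query_type, hfi_uri.strip(), contexts


wire = build_query("substring", "go:Nike",
                   {"location": "us-ga-atlanta-lawrenceville",
                    "industry": "28"})
parsed = parse_query(wire)
```

   The same bytes could be sent over either transport, since the query
   is a single atomic unit.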

3.1.4 The Response

   /* Author's Temporary Comment: These formats are arbitrary and   */
   /* thus can (and probably should) be changed.                    */
   A response is a simple list of hits where each hit is a tuple of
   the actual HFI that matched, the domain-name of the Content Server,
   the port on which to contact that host, and a unique id that is used
   by the Content Server to ensure that the correct HFI is requested.
   It is in the following format:

   response = *hit
   hit = hfi " " domain " " port " " unique-id crlf
   hfi = <"go" scheme URI>
   domain = <domain-name of the Content Server>
   port = "0".."65535"
   unique-id = 1*alphadigit
   alphadigit = alpha / digit / "_" / "-"
   alpha = "a".."z" / "A".."Z"
   digit = "0".."9"
   lf = <ASCII LF (linefeed)>
   cr = <ASCII CR (carriage return)>
   crlf = cr lf

   Example:

   go:Network%20Solutions services.netsol.com 8080 01BDF839.D979BBA0@netsol.com

   This example shows the HFI that matched ("Network Solutions"), the
   host to be contacted (services.netsol.com), the port (8080) and the
   unique-id (01BDF839.D979BBA0@netsol.com). The unique-id identifies
   the object to be retrieved from the Content Server. This is needed
   for cases where a Content Server maintains multiple objects that share
   the same HFI.
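
   A client could split such a response into hits roughly as follows.
   This sketch assumes that the four fields of a hit are separated by
   single spaces, as in the example above, and that each hit ends with
   CRLF. The Hit type and the function name are assumptions for this
   example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One referral returned by the server (hypothetical type name)."""
    hfi: str        # the "go" URI that matched
    domain: str     # domain name of the Content Server
    port: int       # port on which to contact that host
    unique_id: str  # identifier used when asking the Content Server


def parse_response(data):
    """Parse zero or more CRLF terminated hits from a UTF-8 response."""
    hits = []
    for line in data.decode("utf-8").split("\r\n"):
        if not line:
            continue  # skip the empty piece after the final CRLF
        hfi, domain, port_text, unique_id = line.split(" ")
        port = int(port_text)
        if not 0 <= port <= 65535:
            raise ValueError("port out of range: " + port_text)
        hits.append(Hit(hfi, domain, port, unique_id))
    return hits


response = (b"go:Network%20Solutions services.netsol.com 8080 "
            b"01BDF839.D979BBA0@netsol.com\r\n")
hits = parse_response(response)
```

   Splitting on a space is safe here only because the HFI is carried in
   its encoded form, in which a space appears as %20.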

3.2 The Content Retrieval Protocol

   The protocol for retrieving the actual RDF object is HTTP. The
   host is contacted on the given port and the path is requested.
   The requested path is "/hfi/<uid>", where <uid> is the unique-id
   found in the referral. The response from the server should be
   a text/xml or application/xml object that contains an RDF object
   following the specification in Appendix A.

   Example:

   The user requests the HFI for "go:Network%20Solutions" and is
   presented with the hit from the above example. The client then
   connects to services.netsol.com on port 8080 and, using HTTP,
   requests the resource "/hfi/01BDF839.D979BBA0%40netsol.com". The
   response should be either an application/xml or text/xml
   resource that contains the RDF object.

   All standard HTTP functions are valid.
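
   Building the request target from a referral can be sketched as
   follows. The function name content_url is an assumption for this
   example, as is the choice to percent-encode every reserved character
   in the unique-id; under that assumption the "@" (octet 0x40) becomes
   "%40". The sketch only constructs the URL and does not contact any
   server.

```python
from urllib.parse import quote


def content_url(domain, port, unique_id):
    """Build the HTTP URL for the RDF object named by a referral."""
    # safe="" also escapes "@" and "/", so the unique-id stays one path segment.
    path = "/hfi/" + quote(unique_id, safe="")
    return "http://" + domain + ":" + str(port) + path


url = content_url("services.netsol.com", 8080,
                  "01BDF839.D979BBA0@netsol.com")
```

   Any HTTP client could then retrieve the resource at that URL and
   check that the returned media type is text/xml or application/xml.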

4. Qualified vs Unqualified

   The reasoning behind allowing non-registrars to write unqualified
   entries to the root is to allow for the two communities that are
   being targeted with HFIs: the business community and the end user.
   Businesses desire HFIs that are of a higher quality and that have
   a bit of uniqueness to them. In their case, trademark is extremely
   important. The end user is simply looking for a cool identifier
   for use by friends and online contacts. Uniqueness and trademark
   status are unimportant whereas coolness and vanity are of utmost
   importance. In order for the system to be used by both, there is
   the need for the two types of entries to be disambiguated.

   For example, the South Park cartoon character Cartman is an important
   trademark for Comedy Central. At the same time, South Park's popularity
   has caused many online game players to use Cartman as a nickname to
   identify their online character. Both can use the identifier
   go:Cartman without there being any confusion as to which one is
   Comedy Central's official Cartman HFI. One additional feature is
   that, since the root contains both, Comedy Central has a fairly
   easy method for checking on infringers and, if so desired, could
   discover unqualified entries that it wished to pursue infringement
   litigation against.
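
   The ranking of qualified entries above unqualified ones, mentioned
   in Section 1, could be as simple as a stable sort. This sketch
   assumes that each hit carries a boolean "qualified" flag; that flag
   is hypothetical, since the response format in Section 3.1.4 does not
   currently carry one, and the example hosts are placeholders.

```python
def rank_hits(hits):
    """Order hits so that qualified entries come before unqualified ones.

    sorted() is stable, so hits keep their original relative order
    within each class.
    """
    return sorted(hits, key=lambda hit: not hit["qualified"])


# Illustrative data only.
hits = [
    {"hfi": "go:Cartman", "domain": "gamer.example.net", "qualified": False},
    {"hfi": "go:Cartman", "domain": "official.example.com", "qualified": True},
    {"hfi": "go:Cartman", "domain": "fan.example.org", "qualified": False},
]
ranked = rank_hits(hits)
```

   With this ordering the official entry is presented first, while the
   unqualified entries remain visible to the user.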

5. Author Contact Information

Michael Mealling
Network Solutions
505 Huntmar Park Drive
Herndon, VA 22070
voice: (703) 742-0400
fax: (703) 742-9552
email: michaelm@rwhois.net


Appendix A -- XML DTD for Content

This is just an example. I'm sure it will end up being a bit more elaborate
than this.

<!-- HFI DTD -->
<!ELEMENT hfi-content (hfi | identifiers | address | contacts)>
<!ELEMENT hfi (#PCDATA)>
<!ELEMENT identifiers (homepage | urn)>
<!ELEMENT homepage (#PCDATA)>
<!ELEMENT urn (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT contacts (technical | administrative)* >
<!ELEMENT technical (#PCDATA)>
<!ELEMENT administrative (#PCDATA)>