INTERNET-DRAFT                                                M.Mealling
Expires six months from June 1998                Network Solutions, Inc.
Intended category: Experimental
draft-mealling-human-friendly-identifier-req-00.txt

                Requirements for Human Friendly Identifiers

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents
of the Internet Engineering Task Force (IETF), its areas, and its working
groups. Note that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and
may be updated, replaced, or obsoleted by other documents at any time. It
is inappropriate to use Internet-Drafts as reference material or to cite
them other than as work in progress.

To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
(Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
(US West Coast).

Abstract

This document includes a set of requirements for an identifier that
is engineered for human consumption. While the identifier is still
machine consumable, the services and capabilities of the underlying
system are designed with humans in mind. This includes concepts of
geographic and context specific constraints, non-uniqueness, and
natural language match semantics.

1. Introduction

The many identifiers used on the Internet are, in general, designed with
machines in mind. Domains, URIs, and email addresses are all used to
identify some component of the network. They are not engineered to provide
the easiest interface for users. Users routinely handle identifiers where
two entities are known by the same name (companies with the same name in
two different geographic locations) or short versions of long names
(Coke and Coca-Cola). This document specifies requirements for
an identifier and resolution system that are can be engineered to solve
human oriented identification needs. Identifiers that solve these
problems are referred to as "human friendly identifiers".

2. Justification

The phenomenal growth of the Internet over the past few years has
had in immense impact on systems designed for a community of users
who were highly computer literate. These early users were willing
(eager?) to use systems in a manner that suited the machine more than
the user.

Some would argue that DNS was built specifically for the purpose of
making life easier for the user. On closer examination though, DNS'
intended user was significantly more sophisticated than today's
Internet user. The dot notation and the strict hierarchy are
foreign to today's users and do not match their methods of
organizing information and resources very well.

To exacerbate the user-unfriendliness of domain-names, the
WorldWideWeb has added the additional property of specifying
resources and protocols at the particular host identified
by the domain-name. Many of those in the URI working group cringed
when the first attempts at URLs were heard on radio or printed
in newspapers. The quintessential example was heard on the
Larry King Live show on CNN. The guest was David Letterman.

   LETTERMAN: Can I just take a second here, Larry -- I'm
   sorry, I don't mean to interrupt -- To give our World Wide Web
   address. If people want to e-mail, we are on the World Wide Web
   as well.

   KING: You are too?

   LETTERMAN: wwwwww.com com com com ........ com com
   diggity diggity diggity dank.com.com diggity www.com
   Dave.com.com. So give us some of that e-mail or something.

   KING: Hold on.

   LETTERMAN: Have you got that, Larry?

   KING: Would you repeat that? I want to get it right.

   LETTERMAN: Come on, Larry. The bit's over. Pick it up.

While humorous, Letterman's point is that URLs and domain-names
are not suited for the regular, day-to-day information
needs of humans. Internet identifiers usually contain odd characters
that are needed to delimit syntax elements. The mutual exclusivity
of DNS means that two entities cannot have the same name, thus
causing those without the desired domain to resort to acronyms
or other combinations that simply do not meet users expectations.

This inappropriate use of existing identifiers has created two problems:

        Users are left confused and intimidated. While growth
        of the Internet is large, there are significant sections
        of the population that are so intimidated by the
        technical lingo that they refuse to go online.

        Identifiers and services are abused in order to squeeze
        out some modicum of human oriented functionality. DNS'
        ".com" domain is one such example. Companies and governments
        routinely attempt to apply trademark law to a medium that
        cannot cope with the basic tenets of trademarks.

These two problems cannot be solved using existing Internet systems.
Intimidated users will not feel comfortable until they can use
the same identifiers they use in everyday conversation. Existing
systems will be further pressed into service until some system
accommodates most of the needs of marketers and lawyers.

A solution is needed. This document intends to explore the requirements
needed to supply a solution. The first task is to identify the specific
user communities and the specific HFI oriented problems they face.
Secondly, specific parts of the problem space are analysed for being
in or out of scope for this effort.  The remaining problems are then merged
into a simple set of requirements that define a solvalbe and useful
problem space.

3. Intended Audience

There are three distinct user communities that have an interest in
a human friendly identifier:

        Users - The general Internet user desires an identifier that
                can be easily remembered and guessed. This makes it
                much easier to find important resources.

        Marketing - Businesses desire an identifier that gives
                their marketing campaigns the greatest latitude
                in terms of character sets, length, and simplicity.
                In many cases the identifier will be determined as
                much by the media in which it is conveyed as the
                idea it is attempting to convey.

        Trademark holders - Businesses that own trademarks desire
                an easy way to protect those resources. Many
                have invested large amounts of money in protecting
                their marks according to an existing legal
                framework.

The features that make sense to users are fairly straightforward. They
desire an identifier that is as close as possible to the identifiers
they use in everyday life. When someone mentioned the term "tide", most
users can differentiate the laundry detergent from the rise
and fall of oceans by context. At the very least a user would expect an
identifier to be able to support two definitions for the same term.

The features that a marketing campaign needs are subtly different.
Currently there is a desire in marketing campaigns that deal with
the Internet to use the Internet connection as an additional marketing
point. The ".com" suffix has become a brandname of sorts that
signifies that the resource being marketed as "Internet savvy".
Additionally, marketing desires identifiers that are short and
not syntactically complex. It should be very easy for either
the user or marketer to use the same identifier for radio and
television as well as the Internet. For example, Network Solutions
does not like to use "NSI" as an identifier because it
does not convey meaning. The string "Network Solutions" is preferable.
In existing Internet identifiers the space (ASCII 20) character
is problematic. A marketing campaign should not have to know this
or change their techniques just to advertise on the Internet.

Once a marketing campaign begins to use a slogan or name in
the public, that name or slogan takes on value as a trademark.
Trademark makes one very important assumption: a mark can be used
by two different entities as long as they are either geographically
separate or exist in two distinctly different industry segments.
There are exceptions to this of course (federal anti-dilution
laws) but by and large it is how trademarks have been used
for hundreds of years. Any system that hopes to be usable by
a marketing campaign must also be capable of co-existing to some
degree with existing trademark law. This means any identifier should
be capable of being used by two separate entities. It
also means that in order to create the geographic and
industry specific segmentation, the user should be able to
specify these components when requesting the resource for
the identifier.

4. Scope

Each of these user communities, when asked, would probably suggest a rather
expansive system that would normally be characterized as a full directory
service.  The task here is to decide which of those features are
required and which are out of scope.

One feature that the end-user will probably request is that the system
allow for keyword searches on the data returned by an identifier or that
the HFIs be organized into some topical hierarchy to be used for browsing.
These features are simply to elaborate and would turn the entire system
into a simple search engine. These already exist and should not be
standardized as part of this problem.

Another feature that the marketing and trademark owners would prefer is
that the system itself protect trademarks by inserting legal/business
logic deep into the resolution system. Due to the massive differences
in legal systems, customs and user expectations, this is simply impossible
to do with current technology. Thus, all decisions about what entity is
allowed to insert which identifiers is a policy issue to be decided
by those entities that participate in the system. In other words, this
system is not a generic trademark enforcement mechanism anymore than
the printing press is. Trademark disputes are still adjudicated within
the legal system. This system should merely reflect that, not enforce it.

Succinctly stated, the requirements that are considered out of scope are
generic search/navigation and trademark enforcement.

5. Requirements

The requirements that are left are fairly simple and should allow for
a system that can be implemented but that still solves enough problems
to be useful.

Shortness

The identifier should be short so that those dealing with marketing
and media can create very short identifiers that users can remember
easily.

Internationalization

The identifier should be fully internationalized. This includes
matching semantics for left-right, right-left, top-bottom orientation;
multi-language soundex, etc.

N-to-N mapping

A single identifier should be capable of being used by two separate
entities. Conversely, an entity should be capable of having more than
one identifier.

Matching semantics

At the least, substring matches are required. Other methods
of matching should be evaluated based on performance and ability
to give the user an accurate result set.

User level context

The client should be able to communicate to the resolution service
its geographic and semantic context so that matches can be ranked
according to location and relevance to the users current context.
The system should be capable of conveying other contexts on a
per-application basis.

Hierarchy

The identifier should be capable of expressing hierarchy. In some
cases it makes sense for an identifier to appear to belong to a
hierarchy. But this is merely a capability. It is not a hierarchy.
It is expected that hierarchical identifiers will be a distinct
minority.

Openness

The system should allow for end-users to insert their own identifiers
into the system in an open manner.

Quality of Service

The user should be presented with some simple system for understanding
whether the identifier was created by an entity that puts a higher
quality of service on the data represented by the identifier. The basic
level of service is where any entity can insert any identifier so
long as no gaurantees are made about that identifiers legal or commercial
status.  The highest level of service is where the identifier is
gauranteed to be a legal trademark in all of the specified contexts and
the data returned by the service is gauranteed to be complete and accurate.

Distributed

While the namespace is inherently flag, the top level servers must be
distributable and should only contain referrals to servers where the
actual data is stored.

Data representation

The data returned to the client should be in a format that allows for
fairly rich content but that does not require the content to be rich.

6. Conclusions

These requirements define a problem space that currently does not have
a solution. They do, on the other hand, describe a problem that is
solvable using existing or easiliy developed/evolved technology.

7. Author Contact Information

Michael Mealling
Network Solutions
505 Huntmar Park Drive
Herndon, VA 22070
voice: (703) 742-0400
fax: (703) 742-9552
email: michaelm@rwhois.net