Internet-Draft                                                 Herbert
  Document: draft-vandesompel-info-uri-00.txt              Van de Sompel
  Expires: March 2004                                Los Alamos National
                                                              Laboratory
                                                            Tony Hammond
                                                                Elsevier
                                                           Eamonn Neylon
                                                      Manifest Solutions
                                                        Stuart L. Weibel
                                                    OCLC Online Computer
                                                          Library Center
  
                                                          September 2003
  
  
                The "info" URI Scheme for Information Assets
                    with Identifiers in Public Namespaces
  
  Status of this Memo
  
     This document is an Internet-Draft and is in full conformance with
     all provisions of Section 10 of RFC 2026.
  
     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups.  Note that
     other groups may also distribute working documents as Internet-
     Drafts.
  
     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time. It is inappropriate to use Internet-Drafts
     as reference material or to cite them other than as "work in
     progress."
  
     The list of current Internet-Drafts can be accessed at
          http://www.ietf.org/ietf/1id-abstracts.txt
     The list of Internet-Draft Shadow Directories can be accessed at
          http://www.ietf.org/shadow.html.
  
  Abstract
  
     This document defines the "info" Uniform Resource Identifier (URI)
     scheme for information assets with identifiers in public
     namespaces. Namespaces participating in the "info" URI scheme are
     regulated by an "info" Registry mechanism.
  
  Table of Contents
  
     1  Introduction..................................................2
  
  
  
  Van de Sompel            Expires û March 2004                 [Page 1]


                          The "info" URI Scheme          September 2003
  
  
     2  Terminology...................................................3
     3  Application of the "info" URI Scheme..........................3
     4  The "info" Registry...........................................4
     5  The "info" URI Scheme.........................................5
     6  Normalization and Comparison of "info" URIs...................8
     7  Rationale.....................................................8
     8  Security Considerations.......................................9
     9  Acknowledgements.............................................10
     10   Normative References.......................................10
     11   Non-Normative References...................................10
     12   Authors' Addresses.........................................11
     13   Full Copyright Statement...................................12
  
  1  Introduction
  
     This document defines the "info" Uniform Resource Identifier (URI)
     scheme for information assets that have identifiers in public
     namespaces that are not part of the URI allocation. By information
     asset this document intends any information construct û i.e. any
     abstraction or manifestation - that has identity within a public
     namespace.
  
     There exist many information assets with identifiers in public
     namespaces that are not referenceable by URI schemes. Examples of
     such namespaces include Dewey Decimal Classifications [DEWEY],
     Library of Congress Control Numbers [LCCN], NASA Astrophysics Data
     System Bibcodes [BIBCODE], and Open Archives Initiative
     identifiers [OAI].  Other possible namespaces could include
     National Library of Medicine PubMed identifiers [PUBMED],
     International DOI Foundation Digital Object Identifiers (DOI)
     [DOI], Publisher Item Identifiers (PII) [PII], NISO Serial Item
     and Contribution Identifier (SICI) codes [SICI], and NISO OpenURL
     Framework identifiers [OFI].
  
     The "info" URI scheme facilitates the referencing of information
     assets that have identifiers in such public namespaces by means of
     URIs.  When referencing an information asset by means of its
     "info" URI, the asset SHALL be considered a "resource" as defined
     in RFC 2396 [RFC2396]. As such, the "info" URI scheme enables
     public namespaces that are not part of the URI allocation to
     become part of the URI allocation. The "info" URI scheme thus
     provides a bridging mechanism to allow public namespaces to become
     part of the URI allocation.
  
     Namespaces declared under the "info" URI scheme are regulated by
     an "info" Registry mechanism.  The "info" Registry allows a public
     namespace that is not part of the URI allocation to be declared in
     a registration process by the organization that manages it (the
     Namespace Authority).  The "info" Registry supports the
  
  
  Van de Sompel            Expires - March 2004                 [Page 2]


                          The "info" URI Scheme          September 2003
  
  
     declaration of public namespaces that are not part of the URI
     allocation in a manner that facilitates the construction of URIs
     for information assets without imposing the necessity of URI
     maintenance on the Namespace Authority. Information assets
     identified within a registered namespace SHALL be added or deleted
     according to the business processes of the Namespace Authority,
     and yet MAY be referenced within network applications via the
     "info" URI in an open, standardized way without additional action
     on the part of the Namespace Authority.
  
     The "info" URI scheme explicitly decouples identification from
     resolution.  Applications SHOULD NOT assume that an "info" URI can
     be dereferenced to a representation of the resource identified by
     the URI, though some business processes MAY make "info" URIs
     resolvable either directly or conditionally. The purposes of the
     "info" URI scheme are the identification of information assets and
     the standardization of rules for declaring and comparing identity
     of information assets without regard to any resolution of the URI
     or even whether the information asset identified by the URI is
     accessible on the Internet.
  
  2  Terminology
  
     In this document the key words "MUST", "MUST NOT", "SHALL", "SHALL
     NOT", "SHOULD", "SHOULD NOT", and "MAY" are to be interpreted as
     described in RFC 2119 [RFC2119] and indicate requirement levels
     for compliant implementations.
  
  3  Application of the "info" URI Scheme
  
     Public namespaces that are used for the identification of
     information assets, and that are not referenceable within the URI
     allocation, MAY be registered as namespaces within the "info"
     Registry. Namespace Authorities MAY register these namespaces in
     the "info" URI Registry, thereby making these namespaces
     addressable in applications that use URIs.  Registrations of
     public namespaces that are not part of the URI allocation by
     parties other than the Namespace Authority SHALL NOT be permitted,
     thereby insuring against hostile usurpation or inappropriate usage
     of registered service marks or the public namespaces of others.
  
     Registration under the "info" Registry of a public namespace that
     is not part of the URI allocation implies no particular
     functionalities of the identifiers from the registered namespace
     other than the identification of information assets within the
     namespace being registered. There is no assurance of resolution
     mechanisms associated with the "info" URI scheme, though for any
     particular namespace there MAY exist mechanisms for resolving
     identifiers to information assets. The definition of such services
  
  
  Van de Sompel            Expires - March 2004                 [Page 3]


                          The "info" URI Scheme          September 2003
  
  
     falls outside the scope of the "info" URI scheme. Registration
     does not define namespace-specific semantics for identifiers
     within a registered namespace, though allowable character sets and
     normalization rules are specified in sections 5 and 6 so as to
     ensure that the URIs created using such identifiers are compliant
     with applications that use URIs.
  
     The registration of a public namespace in the "info" Registry
     SHALL NOT preclude further development of services associated with
     that namespace which MAY qualify the namespace for additional
     publication elsewhere within the URI allocation.
  
  4  The "info" Registry
  
     The "info" Registry provides a mechanism for the registration of
     public namespaces that are used for the identification of
     information assets, irrespective of whether such assets are
     accessible on the Internet.
  
     NISO [NISO], the National Information Standards Organization, will
     act as the Maintenance Authority for the "info" Registry, and will
     delegate the day-to-day operation of the "info" Registry to a
     Registry Operator. As the Maintenance Agency, NISO will ensure
     that the Registry Operator operates the "info" Registry in
     accordance with publicly articulated policy established under NISO
     governance and available at <http://info-uri.niso.org/info-uri-
     policy>.  The "info" Registry policy defines a review process for
     candidate namespaces and provides measures of quality control and
     suitability for entry of namespaces.
  
  4.1 Management Characteristics of the "info" Registry
  
     The "info" Registry will be managed according to policies
     established under the auspices of NISO.  All such policies, as
     well as the namespace declarations in the "info" Registry, will be
     public.
  
  4.2 Functional Characteristics of the "info" Registry
  
     The "info" Registry will be publicly accessible and will support
     discovery (by both humans and machines) of:
  
         . string literals identifying the namespaces
         . names and contact information of Namespace Authorities
         . syntax requirement for identifiers maintained in such
           namespaces
         . normalization methodology for identifiers maintained in such
           namespaces
  
  
  
  Van de Sompel            Expires - March 2004                 [Page 4]


                          The "info" URI Scheme          September 2003
  
  
         . if applicable, resolution mechanism, and accompanying
           security considerations, for identifiers maintained in such
           namespaces
  
     where the Registry entries refer to the corresponding "namespace"
     and "identifier" components which are defined in the ABNF given in
     section 5.1 of this document.
  
  4.3 Maintenance of the "info" Registry
  
     The public namespaces that MAY be registered in the "info"
     Registry will be those of interest to the communities served by
     NISO and therefore NISO is committed to act as Maintenance
     Authority for the "info" Registry, and to assign a Registry
     Operator to operate it on a day-to-day basis.
  
     NISO, a non-profit association accredited by the American National
     Standards Institute (ANSI), identifies, develops, maintains, and
     publishes technical standards to manage information in the digital
     environment. NISO standards apply technologies to the full range
     of information-related needs, including retrieval, re-purposing,
     storage, metadata, and preservation.
  
     Founded in 1939, incorporated as a not-for-profit education
     association in 1983, and assuming its current name the following
     year, NISO draws its support from the communities it serves. The
     leaders of over 70 organizations in the fields of publishing,
     libraries, IT and media serve as its voting members. Hundreds of
     experts and practitioners serve on NISO committees and as officers
     of the association.
  
     NISO has been designated by ANSI to represent US interests to the
     International Organization for Standardization's (ISO) Technical
     Committee 46 on Information and Documentation.
  
     The NISO headquarters office is located at: 4733 Bethesda Ave.,
     Bethesda, MD 20814, USA.
  
  5  The "info" URI Scheme
  
  5.1 Definition of "info" URI Syntax
  
     The "info" URI syntax presented in this document conforms to the
     generic URI syntax defined in RFC 2396 [RFC2396]. This
     specification uses the Augmented Backus-Naur Form (ABNF) notation
     of RFC 2234 [RFC2234] to define the URI. The following core ABNF
     productions are used by this specification as defined by Section
  
  
  
  
  Van de Sompel            Expires - March 2004                 [Page 5]


                          The "info" URI Scheme          September 2003
  
  
     6.1 of RFC 2234: ALPHA, DIGIT, HEXDIG. The complete URI syntax is
     as follows:
  
       info-uri        = info-scheme ":" info-identifier
  
       info-scheme     = "info"
  
       info-identifier = info-namespace "/" identifier
  
       info-namespace  = ALPHA *name-char
  
       identifier      = *path-char
  
       name-char       = ALPHA / DIGIT / "+" / "-" / "."
  
       path-char       = unescaped / escaped
  
       unescaped       = ALPHA / DIGIT /
                         "-" / "_" / "." / "!" / "~" / "*" / "'" /
                         "(" / ")" / ";" / ":" / "@" / "&" / "=" /
                         "+" / "$" / ","
  
       escaped         = "%" HEXDIG HEXDIG
  
     An "info" URI has an "info-identifier" as its scheme-specific
     part. An "info-identifier" is constructed by appending an
     "identifier" component to an "info-namespace" component separated
     by a slash "/" character. The "info" scheme is hierarchical as
     indicated by the presence of the slash "/" character but is pruned
     to be one level deep.
  
     Values for the "info-namespace" component of the "info" URI are
     name tokens composed of name characters ("name-char") only.  They
     identify the public namespace in which the value for the
     "identifier" component originates, and are registered in the
     "info" Registry, which guarantees their uniqueness. Although the
     "info-namespace" component is case-insensitive, the canonical form
     is lowercase and documents that specify values for the "info-
     namespace" component MUST do so using lowercase letters. An
     implementation SHOULD accept uppercase letters as equivalent to
     lowercase in "info-namespace" names, for the sake of robustness,
     but SHOULD only generate lowercase "info-namespace" names, for
     consistency.
  
     Values for the "identifier" component of the "info" URI are opaque
     strings composed of path segment characters ("path-char").  In
     their originating public namespace, the values for the
     "identifier" component identify an information asset. The values
     for the "identifier" component MUST be %-escaped, and thus display
  
  
  Van de Sompel            Expires - March 2004                 [Page 6]


                          The "info" URI Scheme          September 2003
  
  
     no visible URI structure. The "identifier" component MUST be
     treated as case-sensitive, although the "info" Registry MAY record
     the case-sensitivity of identifiers from within particular
     registered public namespaces.
  
  5.2 Allowed Characters Under the "info" URI Scheme
  
     The "info" URI syntax uses the same set of allowed US-ASCII
     characters as specified in RFC 2396 [RFC2396] for a generic URI.
     Reserved characters as well as excluded US-ASCII characters and
     non-US-ASCII characters MUST be %-escaped before forming the URI.
     Details of the %-escape encoding can be found in RFC 2396, section 
     2.4.
  
  5.3 Examples of "info" URIs
  
     Some examples of syntactically valid "info" URIs are given below:
  
       a) info:ddc/22%2Feng%2F%2F004.678
  
     where "ddc" is the "info-namespace" component for the namespace of
     Dewey Decimal Classifications [DEWEY] and "22%2Feng%2F%2F004.678"
     is the "identifier" component for an identifier of an information
     asset within that namespace in %-escaped form. In unescaped form
     this identifier is "22/eng//004.678", i.e. the (22nd Ed.) English-
     language Dewey Decimal Classification for "Internet".
  
       b) info:lccn/2002022641
  
     where "lccn" is the "info-namespace" component for a Library of
     Congress Control Number [LCCN] namespace and "2002022641" is the
     "identifier" component for an identifier of an information asset
     within that namespace.
  
       c) <rdf:Description about="info:bibcode/2003Icar..163..263Z"/>
  
     where "bibcode" is the "info-namespace" component for a NASA ADS
     Bibcode [BIBCODE] namespace and "2003Icar..163..263Z " is the
     "identifier" component for an identifier of an information asset
     within that namespace.
  
       d) info:oai/arXiv.org:hep-th%2F9901001
  
     where "oai" is the "info-namespace" component for the namespace of
     OAI identifiers [OAI] and "arXiv.org:hep-th%2F9901001" is the
     "identifier" component for an identifier of an information asset
     within that namespace in %-escaped form. In unescaped form this
     identifier is represented using the Unicode [UNICODE] character
  
  
  Van de Sompel            Expires - March 2004                 [Page 7]


                          The "info" URI Scheme          September 2003
  
  
     set and is encoded in UTF-8 [RFC2279] as "arXiv.org:hep-
     th/9901001".
  
  6  Normalization and Comparison of "info" URIs
  
     In order to facilitate comparison of "info" URIs, a sequence of
     normalization steps SHOULD be applied to minimize the amount of
     software processing.
  
     Since the "info" URI is case-sensitive, a canonical form MAY only
     be arrived at by consulting the "info" Registry for possible
     information on the case-sensitivity of identifiers from a
     registered public namespace, and any case normalization step to
     apply.
  
     The following normalization steps SHOULD be applied:
  
         a) Normalize the case of the URI scheme token to be
            lowercase
         b) Normalize the case of the "info-namespace" component to be
            lowercase
         c) Unescape all unreserved %-escaped characters
         d) Normalize the case of any %-escaped characters to be
            uppercase
  
     The following unnormalized forms of an "info" URI
  
         U1. INFO:OAI/arXiv.org:hep-th%2F9901001
         U2. info:oai/ARXIV.ORG:hep-th%2f9901001
         U3. info:oai/arXiv.org:hep-th%2f9901001
         U4. info:OAI/arXiv.org%3AHEP-TH%2F9901001
  
     are normalized to the following respective forms
  
         N1. info:oai/arXiv.org:hep-th%2F9901001
         N2. info:oai/ARXIV.ORG:hep-th%2F9901001
         N3. info:oai/arXiv.org:hep-th%2F9901001
         N4. info:oai/arXiv.org:HEP-TH%2F9901001
  
     If the "oai" public namespace were registered in the "info"
     Registry as being case-insensitive and normalized to a lowercase
     form, then the above URI forms can be reduced to the following
     unique canonical form
  
         N0. info:oai/arxiv.org:hep-th%2F9901001
  
  7  Rationale
  
  
  
  
  Van de Sompel            Expires - March 2004                 [Page 8]


                          The "info" URI Scheme          September 2003
  
  
  7.1 Why Create a New URI Scheme for Identifiers from Public
  Namespaces?
  
     Under RFC 2718, "Guidelines for new URL Schemes" [RFC2718], it is
     stated that a URI scheme SHOULD have a "demonstrated utility", and
     in particular SHOULD be applied to "things that cannot be referred
     to in any other way". The "info" URI scheme allows identifiers
     within registered public namespaces, used for the identification
     of information assets, to be referred to within the URI
     allocation. Once a namespace is registered in the "info" Registry,
     the "info" URI scheme enables an information asset with an
     identifier in that namespace to be referenced by means of a URI.
     As a result, the information asset SHALL be considered a resource
     as defined in RFC 2396 [RFC2396].
  
  7.2 Why Not Use a URN Namespace ID for Identifiers from Public
  Namespaces?
  
     RFC 2396 [RFC2396] states that a "URN differs from a URL in that
     itÆs primary purpose is persistent labeling of a resource with an
     identifier". An "info" URI on the other hand does not assert
     persistence of resource names or of the resource itself, but
     rather declares namespaces of identifiers managed according to the
     policies and business models of the Namespace Authorities.  Some
     of these namespaces will not have persistence of identifiers as a
     primary purpose, while others will have locator semantics as well
     as name semantics. It would therefore be inappropriate to employ a
     URN Namespace ID for such namespaces.
  
  8  Security Considerations
  
     The "info" URI scheme syntax is subject to the same security
     considerations as the generic URI syntax described in RFC 2396
     [RFC2396].
  
     An "info" URI provides no assurance that any resolution mechanism
     is deployed and therefore security considerations regarding the
     use of "info" URIs MUST be limited solely to the nature of the
     identifier string itself. This document makes no guarantees about
     the safety of resolving "info" URIs. Should a namespace registered
     under the "info" scheme provide mechanisms for resolving "info"
     URIs to information assets, an implementation MUST consult the
     "info" Registry for any security considerations. The definition of
     such resolution mechanisms falls outside the scope of the "info"
     URI scheme.
  
     If an implementation for whatever reason cannot consult the "info"
     Registry for entries regarding a particular "info" namespace it
  
  
  Van de Sompel            Expires - March 2004                 [Page 9]


                          The "info" URI Scheme          September 2003
  
  
     MUST NOT make any assumptions regarding security arrangements for
     that namespace beyond the nature of the identifier string itself.
  
  9  Acknowledgements
  
     The authors acknowledge the contributions of Michael Mealling,
     Verisign, and Patrick Hochstenbach, Los Alamos National
     Laboratory.
  
  10 Normative References
  
     [RFC2119] Bradner, S., "Key Words for Use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC 2119, March 1997.
               Retrieved September 20, 2003 from
               <http://www.ietf.org/rfc/rfc2119.txt>.
  
     [RFC2234] Crocker, D.H. and Overell, P., "Augmented BNF for Syntax
               Specifications: ABNF", RFC 2234, November 1997.
               Retrieved September 20, 2003 from
               <http://www.ietf.org/rfc/rfc2234.txt>.
  
     [RFC2279] Yergeau, F., "UTF-8, A Transformation Format for Unicode
               and ISO10646", RFC 2279, October 1996. Retrieved
               September 20, 2003 from
               <http://www.ietf.org/rfc/rfc2279.txt>.
  
     [RFC2396] Berners-Lee, T., R. Fielding and L. Manister, "Uniform
               Resource Identifiers (URI): Generic Syntax", RFC 2396,
               August 1998. Retrieved September 20, 2003 from
               <http://www.ietf.org/rfc/rfc2396.txt>.
  
     [RFC2718] Masinter, L., H. Alvestrand, D. Zigmond and P. Petke,
               "Guidelines for new URL Schemes", RFC 2718, November
               1999. Retrieved September 20, 2003 from
               <http://www.ietf.org/rfc/rfc2718.txt>.
  
     [UNICODE] The Unicode Consortium. The Unicode Standard, Version
               4.0.0, defined by: The Unicode Standard, Version 4.0
               (Reading, MA, Addison-Wesley, 2003). ISBN 0-321-18578-1.
  
  11 Non-Normative References
  
     [BIBCODE] "NASA Astrophysics Data System Bibliographic Code".
               Retrieved August 1, 2003 from
               <http://adsdoc.harvard.edu/abs_doc/help_pages/data.html>
               .
  
     [DEWEY]   "Dewey Decimal Classification". Retrieved September 20,
               2003 from <http://www.oclc.org/dewey/>.
  
  
  Van de Sompel            Expires - March 2004                [Page 10]


                          The "info" URI Scheme          September 2003
  
  
  
     [DOI]     ANSI/NISO Z39.84-2000, "Syntax for the Digital Object
               Identifier", ISBN 1-880124-47-5. Retrieved September 25,
               2003 from <http://www.niso.org/standards/resources/Z39-
               84-2000.pdf>.
  
     [LCCN]    "Library of Congress Control Number". Retrieved August
               1, 2003 from
               <http://lcweb.loc.gov/marc/lccn_structure.html>.
  
     [NISO]    National Information Standards Organization.  Retrieved
               August 1, 2003 from <http://www.niso.org>.
  
     [OAI]     Lagoze, C., H. Van de Sompel, M. Nelson and S. Warner.
               "Specification and XML Schema for the OAI Identifier
               Format", June 2002.  Retrieved September 4, 2003 from
               <http://www.openarchives.org/OAI/2.0/guidelines-oai-
               identifier.htm>.
  
     [OFI]     Draft Standard for Trial Use ANSI/NISO Z39.88, "The
               OpenURL Framework for Context-Sensitive Services".
               Retrieved September 20, 2003 from
               <http://library.caltech.edu/openurl/Public_Comments.htm>
  
     [PII]     "Publisher Item Identifier as a means of document
               identification". Retrieved September 25, 2003 from
               <http://www.elsevier.nl/inca/homepage/about/pii/>.
  
     [PUBMED]  "PubMed Overview", National Library of Medicine.
               Retrieved September 25, 2003 from
               <http://www.ncbi.nlm.nih.gov/entrez/query/static/overvie
               w.html>.
  
     [SICI]    ANSI/NISO Z39.56-1996 (R2002), "Serial Item and
               Contribution Identifier (SICI)", ISBN 1-880124-28-9.
               Retrieved September 25, 2003 from
               <http://www.niso.org/standards/resources/Z39-56-
               1996.pdf>
  
  12 Authors' Addresses
  
     Herbert Van de Sompel
     Los Alamos National Laboratory,
     Research Library, MS-P362,
     PO Box 1663,
     Los Alamos, NM 87545-1362, USA
     herbertv@lanl.gov
  
     Tony Hammond
  
  
  Van de Sompel            Expires - March 2004                [Page 11]


                          The "info" URI Scheme          September 2003
  
  
     Elsevier Ltd
     32 Jamestown Road
     London, NW1 7BY, UK
     t.hammond@elsevier.com
  
     Eamonn Neylon
     Manifest Solutions
     Bicester
     Oxfordshire, OX26 2HX, UK
     eneylon@manifestsolutions.com
  
     Stu Weibel
     OCLC Online Computer Library Center, Inc.
     6565 Frantz Road
     Dublin, OH 43017-3395, USA
     weibel@oclc.org
  
  13 Full Copyright Statement
  
     Copyright (C) The Internet Society (2003).  All Rights Reserved.
  
     This document and translations of it may be copied and furnished
     to others, and derivative works that comment on or otherwise
     explain it or assist in its implementation may be prepared, copied,
     published and distributed, in whole or in part, without
     restriction of any kind, provided that the above copyright notice
     and this paragraph are included on all such copies and derivative
     works.  However, this document itself may not be modified in any
     way, such as by removing the copyright notice or references to the
     Internet Society or other Internet organizations, except as needed
     for the purpose of developing Internet standards in which case the
     procedures for copyrights defined in the Internet Standards
     process MUST be followed, or as required to translate it into
     languages other than English.
  
     The limited permissions granted above are perpetual and will not
     be revoked by the Internet Society or its successors or assigns.
  
     This document and the information contained herein is provided on
     an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
     ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
     IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
  
  
  
  
  
  
  
  Van de Sompel            Expires - March 2004                [Page 12]