Uniform Resource Names (urnbis) J.C. Klensin
Internet-Draft April 7, 2014
Updates: 3986 (if approved)
Intended status: Standards Track
Expires: October 07, 2014
Names are Not Locators and URNs are Not URIs
draft-ietf-urnbis-urns-are-not-uris-00.txt
Abstract
Experience has shown that identifiers associated with persistent
names are quite different from identifiers associated with the
locations of objects. This is especially true when such names are
expected to be stable for a very long time or when they identify
large and complex entities. In order to allow Uniform Resource Names
(URNs) to evolve to meet the needs of the information sciences
community and other users, this specification separates the syntax
for URNs from the generic syntax for Uniform Resource Identifiers
(URIs) specified in RFC 3986, updating the latter specification
accordingly.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 07, 2014.
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (http://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Simplified BSD License text
as described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Simplified BSD License.
Klensin Expires October 07, 2014 [Page 1]
Internet-Draft URNs are not URIs April 2014
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2
2. A perspective on locations and names . . . . . . . . . . . . . 2
3. Changes to RFC 3986 . . . . . . . . . . . . . . . . . . . . . 5
4. Other Required Actions . . . . . . . . . . . . . . . . . . . . 5
5. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 5
6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 5
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 6
8. Security Considerations . . . . . . . . . . . . . . . . . . . 6
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 6
9.1. Normative References . . . . . . . . . . . . . . . . . . . 6
9.2. Informative References . . . . . . . . . . . . . . . . . . 6
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 6
1. Introduction
The Internet community now has many years of experience with both
name-type identifiers (notably Uniform Resource Names (URNs) [RFC2141]
[RFC2141bis]) and location-based identifiers (notably Uniform
Resource Locators (URLs) [RFC1738]). That experience leads to the
conclusion that it is impractical to constrain URNs to the syntax and
high-level semantics of URLs. Generalization from URLs to generic
Uniform Resource Identifiers (URIs) [RFC3986], especially to
name-based, high-stability, long-persistence identifiers of the URN
variety, has failed because the assumed similarities do not exist to
a sufficient degree. Ultimately, locators, which typically depend on
particular accessing protocols and a specification relative to some
physical space or network topology, are simply different creatures
from long-persistence, location-independent, object identifiers. The
syntax and semantic constraints that are appropriate for locators are
either irrelevant to or interfere with the needs of resource names as
a class. That was tolerable as long as the URN system did not need
additional capabilities, but experience since RFC 2141 was published
has shown that they are, in fact, needed.
This specification updates the Generic URI Syntax specification
[RFC3986] to exclude URNs from its coverage. Put differently, with
the publication of this specification, URNs are no longer considered
a member of the class of URIs to which RFC 3986 applies.
[[Note in draft: the above leaves it ambiguous as to whether it
remains appropriate to call URNs "URIs". That ambiguity is
intentional and, if possible, should keep the question in the
"someone else's problem" category.]]
For URLs and such other URIs as may exist or be created in the
future, this specification does not change the syntax rules and other
requirements and recommendations of RFC 3986.
2. A perspective on locations and names
Content industries (e.g., publishers) and memory organizations (e.g.,
libraries, archives, and museums) invest substantial resources in
naming things, and naming and classification are important
information science issues. Tens, if not hundreds, of millions of
persistent identifiers have been assigned during the last decade.
Several identifier systems have been developed for persistent and
unique identification of resources. When there is a real need to
preserve something important (such as scientific publications,
research data, government publications, etc.) for the long term, URNs
or other persistent identifiers are used; URLs (or other generic
URIs) are not being used for identification or even linking purposes.
Naming and locating resources (e.g., library resources) are both
complex activities, but they have different aims. Traditionally,
naming and locating resources have been separate activities, and the
rules for the former are much more stringent than for the latter. The same
principles are being applied to digital materials as well as more
traditional ones. In a library, any book, be it printed or digital,
has both a unique and persistent International Standard Book Number
(ISBN) and location information that is non-unique (each copy has its
own) and short-lived, and that cannot be trusted in the long run.
The ISBN never changes, but both shelf locations and Web addresses
usually do, often many times during the book's life span.
Giving location information a role in identification would not only
force libraries to adopt different policies for printed and digital
content, it would also undermine the value of existing identifier
systems. Let us assume that ten people independently upload a copy
of an electronic book to different locations on the Web. Are all
ten of these URLs valid identifiers of the book? And what is their
relation to the ISBN or other identification information of the book
such as its title?
From the perspective of the communities who depend on persistent
identifiers, critical issues include:
1. Resource identification has to be a managed process. Assigning
URIs generally is not. Although it may be possible to introduce
some level of control to URI assignment, a user cannot determine
whether some URI is reliable or not.
2. Anyone may assign new URIs to resources even if these resources
already have proper identifiers assigned to them. Claiming that
these URIs actually identify something undermines the value of
proper identifiers.
3. There is no 1:1 relation between the resource identified and
URIs. An e-book on the Web may be represented as one or more files
(URIs), and a single file may contain several books. And books
are simple; we also need to name very complex objects such as
research data sets, or component parts within those data sets.
4. One resource such as a scientific article is typically available
from multiple locations, including (for instance) the publisher's
document supply service, a university's open repositories and
other cooperative repository systems, legal deposit collections
and the Internet Archive. A resource should have one and only
one identifier of a given type; URIs do not meet this
requirement.
5. URIs relate to instances (copies) of resources, whereas
traditionally identification has a much broader scope. Identifiers
may be assigned to, e.g., an immaterial work (such as Hamlet),
its expressions (e.g., a Finnish translation of Hamlet), and
manifestations of works and expressions (e.g., a PDF version of a
Finnish translation of Hamlet).
6. Over time, different resources (or different versions of the same
resource) may be found at the same non-URN URI. A user has no
way of knowing whether the resource has changed. One of the
basic principles for proper identifier systems is that the same
identifier is never assigned to another resource. In general,
URIs do not meet this requirement.
7. Persistent identification must be available for resources which
are available only in databases and other environments that are
often identified today as the "deep web". URIs for these resources
tend to be very complicated, and it will be difficult to keep them
alive even with the help of DNS redirection when, e.g., the
underlying database management system changes.
8. The role that URI fragment and query components could or should
have in identification is unclear, and the statements in RFC 3986 are
definitely problematic from the points of view of existing
identifier systems and management of naming.
Does a fragment identify a location or a certain section of a
resource? In the evolving set of URN Internet standards, the fragment
will not be part of the Namespace Specific String. The fragment then
only indicates a place or segment within the identified resource, but
does not identify it. If the fragment had a role in identification,
fragments would extend the scope of existing standard identifiers to
component parts of resources. For instance, anyone could use a URN
based on an ISBN plus a fragment to identify chapters of electronic
books.
Things get even more complicated with the query, since what an
identifier plus a query resolves to may not have anything to do with
the original resource. For instance, a URN based on an ISBN plus a
query may resolve to the metadata record describing the book. These
records have their own identifiers, which are not based on ISBNs.
[[Note in draft: Most of the discussion above may belong in 2141bis
rather than here.]]
9. For many organizations, persistence means decades or centuries.
Anything that is protocol dependent will eventually fail. URLs
do not change by themselves, but in the long run it is very
difficult for people to not change them or the objects to which
they point.
The mention of centuries is intentional. Content industries, memory
organizations (such as national and repository libraries and national
archives), and universities and other research organizations need
identifiers that will persist for hundreds of years. Such
identifiers might even need to outlast the institutions themselves,
and definitely should be usable even if current technologies such as
the Web and the Internet cease to exist or are supplanted by
something new (as unlikely as that might seem today).
In addition, operations on, or additional specifications about, names
and the associated objects must be possible, as stable as the names
themselves, and reasonably efficient. For example, if a URN were
assigned to an encyclopedia that consisted of many volumes, it should
be feasible to identify (and locate and retrieve if that were
desired) a particular volume or even a particular article without
accessing or retrieving the entire set.
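As a non-normative illustration of the distinction drawn in this
section, the following Python sketch keeps a persistent name fixed
while the set of locations associated with it changes over time. The
class name NameRecord, its methods, and the example name and URLs are
hypothetical and are not taken from any URN specification; the sketch
assumes only that a name is assigned once and that copies of the
named resource may appear, move, or disappear independently.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class NameRecord:
    """A persistent name and the locations currently known for it.

    The record is frozen, so the name cannot be reassigned after
    creation; only the contents of the location set may change.
    """

    name: str
    locations: Set[str] = field(default_factory=set)

    def add_location(self, url: str) -> None:
        # Several copies of one resource may exist at the same time.
        self.locations.add(url)

    def relocate(self, old_url: str, new_url: str) -> None:
        # A copy moves; the name that identifies the resource does not.
        self.locations.discard(old_url)
        self.locations.add(new_url)


# Hypothetical usage: one name, two copies, one of which later moves.
record = NameRecord("urn:example:hamlet")
record.add_location("http://a.example/hamlet.pdf")
record.add_location("http://b.example/hamlet.pdf")
record.relocate("http://a.example/hamlet.pdf", "http://c.example/hamlet.pdf")
```

In this sketch none of the URLs is treated as an identifier of the
resource; each is only a place where a copy can currently be found.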
3. Changes to RFC 3986
This specification removes URNs from the scope of RFC 3986. It makes
no changes for URI types that remain within that scope.
4. Other Required Actions
The basic URN syntax specification [RFC2141] was published well
before RFC 3986 and therefore does not depend on it. Successors to
that specification will need to fully spell out the syntax and
semantics of URNs without generic or implicit reference to any URI
specification.
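To illustrate what a self-contained treatment might look like, the
following Python sketch splits a string of the RFC 2141 form
"urn:" <NID> ":" <NSS> without calling any generic URI parser. It is
a simplified sketch, not a conforming implementation: it checks only
the overall shape and the NID rule from RFC 2141 (a letter or digit
followed by up to 31 letters, digits, or hyphens) and does not
validate the characters of the NSS. Treating "#" as the start of a
fragment that lies outside the NSS follows the discussion in Section
2 and is an assumption about successor specifications. The names
ParsedURN and parse_urn are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# NID per RFC 2141: one letter or digit, then up to 31 letters,
# digits, or hyphens.
_NID = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,31}")


class ParsedURN(NamedTuple):
    nid: str                 # Namespace Identifier, lowercased
    nss: str                 # Namespace Specific String, kept verbatim
    fragment: Optional[str]  # held outside the NSS; None if absent


def parse_urn(text: str) -> ParsedURN:
    """Split "urn:<NID>:<NSS>" using only string operations."""
    # Assumption: a "#" begins a fragment that is not part of the name.
    body, sep, frag = text.partition("#")
    parts = body.split(":", 2)
    # The leading "urn" is case-insensitive in RFC 2141.
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError("not of the form urn:<NID>:<NSS>")
    _, nid, nss = parts
    if not _NID.fullmatch(nid):
        raise ValueError("invalid NID")
    if not nss:
        raise ValueError("empty NSS")
    # The NID is case-insensitive, so it is normalized to lowercase.
    return ParsedURN(nid.lower(), nss, frag if sep else None)
```

Because the fragment is returned separately, two strings that differ
only in their fragment yield the same NID and NSS, which matches the
view that a fragment indicates a place within the identified resource
but does not itself identify anything.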
5. Acknowledgments
This specification was inspired by a search in the IETF URNBIS WG for
other alternatives that would both satisfy the needs of persistent
name-type identifiers and still fully conform to the specifications
and intent of RFC 3986. That search lasted several years and
considered many alternatives. Discussions with Leslie Daigle, Juha
Hakala, Barry Leiba, Keith Moore, Andrew Newton, and Peter Saint-
Andre during the last quarter of 2013 and the first quarter of 2014
were particularly helpful in getting to the conclusion that a
conceptual separation of notions of location-based identifiers (e.g.,
URLs) and the types of persistent identifiers represented by URNs was
necessary. Peter Saint-Andre provided significant text in a pre-
publication review.
6. Contributors
Juha Hakala contributed most of the text of Section 2.
Contact Information:
Juha Hakala
The National Library of Finland
P.O. Box 15, Helsinki University
Helsinki, MA FIN-00014
Finland
Email: juha.hakala@helsinki.fi
7. IANA Considerations
[[RFC Editor: Please remove this section before publication.]]
This memo is not believed to require any action on IANA's part. In
particular, we note that there is a collection of "Uniform Resource
Identifier (URI) Schemes" that does not include URNs and a series of
URN-specific registries that do not rely on the URI specifications.
8. Security Considerations
All drafts are required to have a security considerations section.
9. References
9.1. Normative References
[RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997.
[RFC3986] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66, RFC
3986, January 2005.
9.2. Informative References
[DeterministicURI]
Mazahir, O., Thaler, D. and G. Montenegro, "Deterministic
URI Encoding", February 2014, <http://www.ietf.org/id/
draft-montenegro-httpbis-uri-encoding-00.txt>.
[RFC1738] Berners-Lee, T., Masinter, L. and M. McCahill, "Uniform
Resource Locators (URL)", RFC 1738, December 1994.
[RFC2141bis]
Saint-Andre, P., "Uniform Resource Name (URN) Syntax",
January 2014, <https://datatracker.ietf.org/doc/draft-
ietf-urnbis-rfc2141bis-urn/>.
Author's Address
John C Klensin
1770 Massachusetts Ave, Ste 322
Cambridge, MA 02140
USA
Phone: +1 617 245 1457
Email: john-ietf@jck.com