Network Working Group N. Walsh
Internet-Draft Sun Microsystems, Inc.
Expires: August 14, 2001 J. Cowan
Reuters Health Information
P. Grosso
Arbortext, Inc.
February 13, 2001
A URN Namespace for Public Identifiers
draft-walsh-urn-publicid-00
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 14, 2001.
Copyright Notice
Copyright (C) The Internet Society (2001). All Rights Reserved.
Abstract
This document describes a URN namespace that is designed to allow
Public Identifiers to be expressed in URI syntax.
1. Introduction
XML[1] external entities may have two identifiers: a public identifier
and a system identifier. The system identifier is, by definition, a
URI, but the public identifier is simply a string.
Walsh, et al. Expires August 14, 2001 [Page 1]
Internet-Draft A URN Namespace for Public Ids February 2001
Historically, the system identifier of an external entity has been a
local or system-specific identifier, while the public identifier has
been a more global, persistent name.
Unfortunately, public identifiers do not fit neatly into the
existing web architecture because they are not legal URIs. Many new
specifications (XSLT, XML Schema, etc.) have the implicit or
explicit requirement that all external identifiers be URIs.
Any string which consists only of the public identifier characters
(defined by Production 13 of Extensible Markup Language (XML) 1.0
Second Edition[1]) is a legal public identifier. SGML[3], however,
defines a restricted subset of public identifiers called "Formal
Public Identifiers" (FPIs). For the purpose of this document, the
significant difference between public identifiers and FPIs is that
FPIs have internal structure and may have registered owner
identifiers.
This document describes a scheme for representing public identifiers
as URNs by introducing a public identifier namespace, "publicid".
This namespace specification is for a formal namespace.
2. Specification Template
Namespace ID:
"publicid" requested.
Registration Information:
Registration Version Number: 1
Registration Date: 2001-02-13
Declared registrant of the namespace:
Norman Walsh
Sun Microsystems, Inc.
One Network Drive MS UBURO2-201
Burlington, MA
01803-0902
Norman.Walsh@East.Sun.COM
Declaration of structure:
The purpose of this namespace is to allow public identifiers
to be encoded in URNs in a reliable, comparable way. To that
end, this document mandates that public identifiers be
normalized before encoding them into URNs. As described in ISO
8879[3], a public identifier is normalized by removing all
leading and trailing whitespace and replacing each remaining
sequence of one or more whitespace characters with a single
space.
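The normalization step can be sketched in Python as follows. This is a minimal illustration, not a normative implementation: the function name normalize_public_id is hypothetical, and the sketch assumes that "whitespace" means the characters of the XML S production (space, tab, carriage return, and line feed).

```python
import re

# A run of one or more XML whitespace characters:
# space (#x20), tab (#x9), carriage return (#xD), line feed (#xA).
_WS_RUN = re.compile(r"[ \t\r\n]+")


def normalize_public_id(public_id: str) -> str:
    """Collapse every whitespace run to a single space, then drop the
    single spaces left at the start and end (hypothetical helper)."""
    return _WS_RUN.sub(" ", public_id).strip(" ")
```

Collapsing runs before stripping means a lone tab or newline inside the identifier also becomes a single space, which is what the "+" transcription rule below relies on.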
For public identifiers that are not FPIs, the Namespace
Specific String (NSS) for URNs in the "publicid" namespace has
the following structure:
urn:publicid:{public-identifier-text}
The character set of public identifiers is constrained by
XML[1]. Most of the legal public identifier characters are
also legal characters in URNs. Unless otherwise noted, the
characters in the {public-identifier-text} are directly
transcribed from the corresponding character in the public
identifier. The following exceptions are made:
+ Spaces in the public identifier are transcribed as "+"
characters. Because whitespace normalization must be performed
before constructing a URN in the "publicid" namespace, the
sequence of characters "++" should never occur in such URNs.
+ Literal "+" characters in the public identifier must be
%-encoded.
+ Literal ":" characters in the public identifier must be
%-encoded.
+ The reserved characters that may appear in public
identifiers, "%", "/", "?", and "#", must be %-encoded.
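A minimal Python sketch of these rules for public identifiers that are not FPIs, assuming the input contains only XML public identifier characters. The name public_id_to_urn is hypothetical. The rules above do not say whether hex digits in %-escapes are upper- or lowercase; the sketch uses uppercase to match the examples in Section 3.

```python
import re

# %-encodings for literal characters that may not appear as-is in the
# URN: the reserved characters plus "+" (used for spaces) and ":".
_ESCAPES = {"%": "%25", "/": "%2F", "?": "%3F", "#": "%23",
            "+": "%2B", ":": "%3A"}


def public_id_to_urn(public_id: str) -> str:
    # Normalize whitespace first so that "++" can never appear.
    normalized = re.sub(r"[ \t\r\n]+", " ", public_id).strip(" ")
    # Spaces become "+"; escaped characters are replaced; all other
    # characters are copied unchanged.
    return "urn:publicid:" + "".join(
        "+" if ch == " " else _ESCAPES.get(ch, ch) for ch in normalized
    )
```

For example, public_id_to_urn("3+3=6") returns "urn:publicid:3%2B3=6". Working one character at a time ensures that a "%" produced by one escape is never re-encoded by another.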
Formal Public Identifiers are a subset of public identifiers.
They are strings composed from the same range of characters,
but have an explicit internal structure. The structure of
Formal Public Identifiers is normatively described in SGML[3];
we review it here for convenience.
Most Formal Public Identifiers consist of the following
fields, in this order: an owner identifier, a public text
class, a public text description, a public text language or
public text designating sequence, and an optional public text
display version.
Owner identifiers may begin with "-//" or "+//". Apart from
that prefix, "//" delimits the fields of the FPI, except that
the public text class is separated from the public text
description by a space.
In other words, most FPIs look like this:
owner//class description//language//version
and most owners begin with "+//" or "-//", although they are
not required to. Here are some example FPIs:
+//IDN python.org//DTD XML Bookmark Exchange Language 1.0//EN//XML
-//OASIS//DTD DocBook XML V4.1.2//EN
-//ArborText::prod//DTD Help Navigation Document::19970708//EN
ISO/IEC 10179:1996//DTD DSSSL Architecture//EN
ISO 8879:1986//ENTITIES Added Latin 1//EN
An algorithm for correctly identifying a Formal Public
Identifier and determining the various fields within it is out
of scope for this document. We begin our discussion of the
representation of FPIs in our URN namespace under the
assumption that these steps have already been taken.
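Because FPI recognition is out of scope here, the following is only an illustrative heuristic, not the SGML algorithm. It assumes an already normalized string is an FPI when, after an optional "+//" or "-//" prefix, splitting on "//" yields three or four fields and the second field contains a space. The names FPI and split_fpi are hypothetical, and the sketch does not handle the unavailable text indicator; a real implementation should follow SGML[3].

```python
from typing import NamedTuple, Optional


class FPI(NamedTuple):
    owner: str              # includes any "+//" or "-//" prefix
    text_class: str
    description: str
    language: str           # or designating sequence
    version: Optional[str]  # public text display version, if present


def split_fpi(fpi: str) -> Optional[FPI]:
    """Naively split a normalized FPI of the form
    owner//class description//language[//version]; return None if
    the string does not have that shape."""
    prefix, rest = "", fpi
    if fpi.startswith(("+//", "-//")):
        prefix, rest = fpi[:3], fpi[3:]
    fields = rest.split("//")
    if len(fields) not in (3, 4):
        return None
    # The public text class is separated from the description by a space.
    text_class, sep, description = fields[1].partition(" ")
    if not sep:
        return None
    version = fields[3] if len(fields) == 4 else None
    return FPI(prefix + fields[0], text_class, description, fields[2], version)
```

Under this heuristic, "-//Acme, Inc.//DTD General Book Markup Version 1.0" yields None, consistent with the treatment of that string in Section 3.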
The Namespace Specific String (NSS) for the URNs in the
"publicid" namespace that represent Formal Public Identifiers
have the following structure:
urn:publicid:{owner-identifier}:{text-class}
:{text-description}:{language|designating-sequence}
{:display-version}?
Where:
{owner-identifier} is derived from the owner identifier in
the FPI. Owner identifiers in FPIs have one of three forms:
"+//" followed by a string, "-//" followed by a string, or
a string that does not contain "//". The following rules
apply to derive a URN {owner-identifier} from the owner
identifier in an FPI:
- Owner identifiers that begin with "+//" are transcribed into
the URN {owner-identifier} by replacing "+//" with "+:"
and transcribing the remaining string.
- Owner identifiers that begin with "-//" are transcribed into
the URN {owner-identifier} by replacing "-//" with "-:"
and transcribing the remaining string.
- All other owner identifiers are transcribed directly into
the URN {owner-identifier}.
{text-class} is the public text class from the FPI. The
public text class of FPIs is constrained by SGML[3] to the
following 13 strings: "CAPACITY", "CHARSET", "DOCUMENT",
"DTD", "ELEMENTS", "ENTITIES", "LPD", "NONSGML",
"NOTATION", "SHORTREF", "SUBDOC", "SYNTAX", or "TEXT". The
"publicid" URN namespace explicitly relaxes this
constraint. Any string may be used.
{text-description} is the public text description transcribed
from the FPI.
{language} is the public text language transcribed from the
FPI. The {language} codes used in "publicid" URNs should be
drawn from RFC 3066[6].
{designating-sequence} is the public text designating
sequence transcribed from the FPI. Formal Public
Identifiers that describe character sets may use the
designating sequence (a string defined by ISO 2022[2]) to
identify the character set.
{display-version} is the public text display version
transcribed from the FPI.
Most of the legal public identifier characters are also legal
characters in URNs. Unless otherwise noted, the characters in
the {owner-identifier}, {text-class}, {text-description},
{language}, {designating-sequence}, and {display-version} are
directly transcribed from the corresponding character in the
Formal Public Identifier. The following exceptions are made:
+ Spaces in the FPI are transcribed as "+" characters.
Because whitespace normalization must be performed before
constructing a URN in the "publicid" namespace, the sequence
of characters "++" should never occur in such URNs.
+ Literal "+" characters in the FPI must be %-encoded, with
one exception: the leading "+" of a "+//"-form owner
identifier, which becomes the "+" at the beginning of the
URN {owner-identifier}, must not be %-encoded.
+ The sequence "::" in the owner identifier or public text
description is transcribed as "::"; all other uses of a
literal ":" in the FPI must be %-encoded.
+ The reserved characters that may appear in FPIs, "%", "/",
"?", and "#", must be %-encoded.
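The field transcription rules can be sketched in Python as follows, assuming the FPI has already been normalized and split into its fields (field identification being out of scope). The helper names are hypothetical, and the uppercase hex digits follow the examples in Section 3 rather than a stated rule.

```python
from typing import Optional

# %-encodings for literal characters that may not appear as-is in the
# NSS; ":" is handled separately because "::" is sometimes preserved.
_ESCAPES = {"%": "%25", "/": "%2F", "?": "%3F", "#": "%23", "+": "%2B"}


def transcribe_field(text: str, keep_double_colon: bool) -> str:
    """Transcribe one whitespace-normalized FPI field."""
    out = []
    i = 0
    while i < len(text):
        if keep_double_colon and text.startswith("::", i):
            out.append("::")  # "::" in owner/description is kept as-is
            i += 2
            continue
        ch = text[i]
        if ch == " ":
            out.append("+")
        elif ch == ":":
            out.append("%3A")  # any other literal ":" is %-encoded
        else:
            out.append(_ESCAPES.get(ch, ch))
        i += 1
    return "".join(out)


def transcribe_owner(owner: str) -> str:
    """Map a "+//" or "-//" prefix to "+:" or "-:". The leading "+" of a
    "+//" owner is the only "+" that is not %-encoded."""
    if owner.startswith(("+//", "-//")):
        return owner[0] + ":" + transcribe_field(owner[3:], True)
    return transcribe_field(owner, True)


def fpi_fields_to_urn(owner: str, text_class: str, description: str,
                      language: str, version: Optional[str] = None) -> str:
    """Join the transcribed fields with ":" (hypothetical helper)."""
    parts = [
        transcribe_owner(owner),
        transcribe_field(text_class, False),
        transcribe_field(description, True),
        transcribe_field(language, False),
    ]
    if version is not None:
        parts.append(transcribe_field(version, False))
    return "urn:publicid:" + ":".join(parts)
```

Only the owner identifier and the public text description pass keep_double_colon=True, mirroring the rule that "::" is preserved in those two fields alone.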
A small subset of Formal Public Identifiers cannot be
represented by this namespace. An FPI cannot be represented if
either of the following conditions applies:
+ After transcription, the {owner-identifier}, {text-class},
{text-description}, {language}, or {designating-sequence}
would be empty. Allowing any of these fields to be empty
could introduce ambiguous "::" sequences into the URN.
+ The FPI uses the optional unavailable text indicator,
which is defined in SGML[3] but rarely used in practice.
Relevant ancillary documentation:
Extensible Markup Language (XML) Version 1.0 Second Edition[1]
Standard Generalized Markup Language (SGML)[3]
Registration procedures for public text owner identifiers[4]
Identifier uniqueness considerations:
The identifier uniqueness considerations for URNs in the
"publicid" namespace are the same as the identifier uniqueness
considerations for public identifiers. Formal Public
Identifiers with registered owner identifiers are required to
be unique. Public identifiers with unregistered owner
identifiers, and public identifiers that are not FPIs, may or
may not be unique; no enforcement policy can be asserted.
Identifier persistence considerations:
The persistence of URNs in the "publicid" namespace is the
same as the persistence of the corresponding public identifier.
Process of identifier assignment:
Identifiers in the "publicid" namespace may be assigned by the
same policies and procedures as public identifiers.
Process of identifier resolution:
Identifiers in the "publicid" namespace may be resolved by the
same policies and procedures as public identifiers.
Rules for Lexical Equivalence:
Whitespace normalization is performed before constructing a
URN in the "publicid" namespace, so such URNs are lexically
equivalent if they are lexically identical.
Conformance with URN Syntax:
No special considerations.
Validation mechanism:
None specified.
Scope:
Global
3. Examples
The following examples are not guaranteed to be real. They are
listed for pedagogical reasons only.
"ISO/IEC 10179:1996//DTD DSSSL Architecture//EN" becomes
"urn:publicid:ISO%2FIEC+10179%3A1996:DTD:DSSSL+Architecture:EN"
"ISO 8879:1986//ENTITIES Added Latin 1//EN" becomes
"urn:publicid:ISO+8879%3A1986:ENTITIES:Added+Latin+1:EN"
"-//OASIS//DTD DocBook XML V4.1.2//EN" becomes
"urn:publicid:-:OASIS:DTD:DocBook+XML+V4.1.2:EN"
"+//IDN python.org//DTD XML Bookmark Exchange Language 1.0//EN//XML"
becomes
"urn:publicid:+:IDN+python.org:DTD:XML+Bookmark+Exchange+Language+1.0:EN:XML"
"-//ArborText::prod//DTD Help Navigation Document::19970708//EN"
becomes
"urn:publicid:-:ArborText::prod:DTD:Help+Navigation+Document::19970708:EN"
"foo" becomes
"urn:publicid:foo"
"3+3=6" becomes
"urn:publicid:3%2B3=6"
"-//Acme, Inc.//DTD General Book Markup Version 1.0" becomes
"urn:publicid:-%2F%2FAcme,+Inc.%2F%2FDTD+General+Book+Markup+Version+1.0"
because it is not an FPI (it has no public text language or
designating sequence).
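The examples above can be reproduced with an end-to-end Python sketch that combines normalization, a naive FPI split, and the transcription rules. It is illustrative only. The FPI test is an assumed heuristic: after an optional "+//" or "-//" prefix, splitting on "//" must yield three or four fields, and the second field must contain a space. The unavailable text indicator is not detected. Raising ValueError for FPIs with empty fields is one reading of the representability rules in Section 2, and all names are hypothetical.

```python
import re

_ESCAPES = {"%": "%25", "/": "%2F", "?": "%3F", "#": "%23", "+": "%2B"}


def _encode(text: str, keep_double_colon: bool) -> str:
    # Spaces become "+"; "::" is optionally kept; any other ":" and the
    # reserved characters are %-encoded.
    out, i = [], 0
    while i < len(text):
        if keep_double_colon and text.startswith("::", i):
            out.append("::")
            i += 2
            continue
        ch = text[i]
        if ch == " ":
            out.append("+")
        elif ch == ":":
            out.append("%3A")
        else:
            out.append(_ESCAPES.get(ch, ch))
        i += 1
    return "".join(out)


def _split_fpi(pubid: str):
    # Naive heuristic: owner//class description//language[//version].
    prefix, rest = "", pubid
    if pubid.startswith(("+//", "-//")):
        prefix, rest = pubid[0] + ":", pubid[3:]
    fields = rest.split("//")
    if len(fields) not in (3, 4):
        return None
    text_class, sep, description = fields[1].partition(" ")
    if not sep:
        return None
    return prefix, fields[0], text_class, description, fields[2:]


def to_publicid_urn(public_id: str) -> str:
    pubid = re.sub(r"[ \t\r\n]+", " ", public_id).strip(" ")
    parsed = _split_fpi(pubid)
    if parsed is None:
        # Not an FPI: transcribe the whole string; every ":" is encoded.
        return "urn:publicid:" + _encode(pubid, False)
    prefix, owner, text_class, description, tail = parsed
    if not all((owner, text_class, description, tail[0])):
        raise ValueError("FPI has an empty field and cannot be represented")
    parts = [prefix + _encode(owner, True),
             _encode(text_class, False),
             _encode(description, True)]
    parts += [_encode(field, False) for field in tail]  # language[, version]
    return "urn:publicid:" + ":".join(parts)


if __name__ == "__main__":
    examples = {
        "ISO/IEC 10179:1996//DTD DSSSL Architecture//EN":
            "urn:publicid:ISO%2FIEC+10179%3A1996:DTD:DSSSL+Architecture:EN",
        "-//OASIS//DTD DocBook XML V4.1.2//EN":
            "urn:publicid:-:OASIS:DTD:DocBook+XML+V4.1.2:EN",
        "foo": "urn:publicid:foo",
        "3+3=6": "urn:publicid:3%2B3=6",
    }
    for pubid, urn in examples.items():
        assert to_publicid_urn(pubid) == urn, pubid
```

Because the FPI check is a heuristic, a string that merely resembles an FPI is treated as one. A production implementation would need a proper SGML-aware recognizer.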
4. Security Considerations
There are no additional security considerations other than those
normally associated with the use and resolution of URNs in general.
References
[1] W3C, XML WG, "Extensible Markup Language (XML) 1.0 Second
Edition", October 2000,
<http://www.w3.org/TR/REC-xml>.
[2] JTC 1, SC 2, "ISO 2022:1994 Information technology --
Character code structure and extension techniques (fourth
edition)", 1994.
[3] JTC 1, SC 34, "ISO 8879:1986 Information processing -- Text and
office systems -- Standard Generalized Markup Language (SGML)",
1986.
[4] JTC 1, SC 34, "ISO/IEC 9070:1991 Information technology -- SGML
support facilities -- Registration procedures for public text
owner identifiers", 1991.
[5] Moats, R., "URN Syntax", RFC 2141, May 1997.
[6] Alvestrand, H., "Tags for the Identification of Languages", RFC
3066, January 2001.
Authors' Addresses
Norman Walsh
Sun Microsystems, Inc.
One Network Drive MS UBURO2-201
Burlington, MA 01803-0902
US
EMail: Norman.Walsh@East.Sun.COM
John Cowan
Reuters Health Information
1700 Broadway, 31st Floor
New York, NY 10019
US
EMail: jcowan@reutershealth.com
Paul Grosso
Arbortext, Inc.
1000 Victors Way
Ann Arbor, MI 48108-2744
US
EMail: pgrosso@arbortext.com
Full Copyright Statement
Copyright (C) The Internet Society (2001). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC editor function is currently provided by the
Internet Society.