URN Syntax
RFC 2141
Document | Type | RFC - Proposed Standard (May 1997; Errata); obsoleted by RFC 8141; was draft-ietf-urn-syntax (urn WG) | |
---|---|---|---|
Author | Ryan Moats | ||
Last updated | 2014-07-29 | ||
Stream | IETF | ||
Stream | WG state | (None) | |
Document shepherd | No shepherd assigned | ||
IESG | IESG state | RFC 2141 (Proposed Standard) | |
Consensus Boilerplate | Unknown | ||
Telechat date | |||
Responsible AD | (None) | ||
Send notices to | (None) |
Network Working Group                                           R. Moats
Request for Comments: 2141                                          AT&T
Category: Standards Track                                       May 1997

                               URN Syntax

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Abstract

   Uniform Resource Names (URNs) are intended to serve as persistent,
   location-independent, resource identifiers. This document sets
   forward the canonical syntax for URNs. A discussion of both existing
   legacy and new namespaces and requirements for URN presentation and
   transmission are presented. Finally, there is a discussion of URN
   equivalence and how to determine it.

1. Introduction

   Uniform Resource Names (URNs) are intended to serve as persistent,
   location-independent, resource identifiers and are designed to make
   it easy to map other namespaces (which share the properties of URNs)
   into URN-space. Therefore, the URN syntax provides a means to encode
   character data in a form that can be sent in existing protocols,
   transcribed on most keyboards, etc.

2. Syntax

   All URNs have the following syntax (phrases enclosed in quotes are
   REQUIRED):

   <URN> ::= "urn:" <NID> ":" <NSS>

   where <NID> is the Namespace Identifier, and <NSS> is the Namespace
   Specific String. The leading "urn:" sequence is case-insensitive.
   The Namespace ID determines the _syntactic_ interpretation of the
   Namespace Specific String (as discussed in [1]).

   RFC 1630 [2] and RFC 1737 [3] each presents additional considerations
   for URN encoding, which have implications as far as limiting syntax.
   On the other hand, the requirement to support existing legacy naming
   systems has the effect of broadening syntax.
   Thus, we discuss the acceptable syntax for both the Namespace
   Identifier and the Namespace Specific String separately.

2.1 Namespace Identifier Syntax

   The following is the syntax for the Namespace Identifier. To (a) be
   consistent with all potential resolution schemes and (b) not put any
   undue constraints on any potential resolution scheme, the syntax for
   the Namespace Identifier is:

   <NID> ::= <let-num> [ 1,31<let-num-hyp> ]

   <let-num-hyp> ::= <upper> | <lower> | <number> | "-"

   <let-num> ::= <upper> | <lower> | <number>

   <upper> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
               "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
               "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
               "Y" | "Z"

   <lower> ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
               "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
               "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
               "y" | "z"

   <number> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                "8" | "9"

   This is slightly more restrictive than what is stated in [4] (which
   allows the characters "." and "+"). Further, the Namespace
   Identifier is case insensitive, so that "ISBN" and "isbn" refer to
   the same namespace.

   To avoid confusion with the "urn:" identifier, the NID "urn" is
   reserved and MUST NOT be used.

2.2 Namespace Specific String Syntax

   As required by RFC 1737, there is a single canonical representation
   of the NSS portion of an URN. The format of this single canonical
   form follows:

   <NSS> ::= 1*<URN chars>

   <URN chars> ::= <trans> | "%" <hex> <hex>

   <trans> ::= <upper> | <lower> | <number> | <other> | <reserved>

   <hex> ::= <number> | "A" | "B" | "C" | "D" | "E" | "F" |
             "a" | "b" | "c" | "d" | "e" | "f"

   <other> ::= "(" | ")" | "+" | "," | "-" | "." |
               ":" | "=" | "@" | ";" | "$" |
               "_" | "!" | "*" | "'"

   Depending on the rules governing a namespace, valid identifiers in a
   namespace might contain characters that are not members of the URN
   character set above (<URN chars>).
   Such strings MUST be translated into canonical NSS format before
   using them as protocol elements or otherwise passing them on to
   other applications. Translation is done by encoding each character
   outside the URN character set as a sequence of one to six octets
   using UTF-8 encoding [5], and the
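
The grammar in Sections 2 through 2.2 can be checked mechanically. The following Python sketch is illustrative only and is not part of the RFC: the names `NID_RE`, `NSS_RE`, and `split_urn` are hypothetical, and because `<reserved>` is referenced by `<trans>` but not defined in the text shown here, the sketch assumes no characters from that set apart from "%" in the `"%" <hex> <hex>` form.

```python
import re

# Section 2.1: one letter or digit, optionally followed by 1 to 31
# letters, digits, or hyphens (so 1 to 32 characters in total).
NID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,31}")

# Section 2.2: one or more <URN chars>, each either a letter, a digit,
# a character from <other>, or "%" followed by two hex digits.
# Assumption: <reserved> is not defined in the excerpt, so it is left out.
NSS_RE = re.compile(r"(?:[A-Za-z0-9()+,\-.:=@;$_!*']|%[0-9A-Fa-f]{2})+")


def split_urn(urn):
    """Return (nid, nss) if urn matches the grammar, otherwise None."""
    # The leading "urn:" sequence is case-insensitive.
    prefix, sep, rest = urn.partition(":")
    if not sep or prefix.lower() != "urn":
        return None
    # The NID cannot contain ":", so the first ":" ends it; the NSS may
    # contain further ":" characters.
    nid, sep, nss = rest.partition(":")
    if not sep:
        return None
    # The NID "urn" is reserved and MUST NOT be used.
    if not NID_RE.fullmatch(nid) or nid.lower() == "urn":
        return None
    if not NSS_RE.fullmatch(nss):
        return None
    return nid, nss
```

For example, `split_urn("URN:ISBN:0-395-36341-1")` returns `("ISBN", "0-395-36341-1")`. Because the Namespace Identifier is case insensitive, a caller comparing namespaces would likely compare `nid.lower()` rather than the raw value.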