Compact, Grammar-Friendly Representations for UUIDs
draft-taylor-uuid-ncname-00

Document type: older version of an Internet-Draft whose latest revision state is "Expired"
Author: Dorian Taylor
Last updated: 2020-07-26
IESG state: I-D Exists
Network Working Group                                          D. Taylor
Internet-Draft                                               Independent
Updates: RFC4122 (if approved)                              26 July 2020
Intended status: Informational                                          
Expires: 27 January 2021

          Compact, Grammar-Friendly Representations for UUIDs
                      draft-taylor-uuid-ncname-00

Abstract

   The Universally Unique Identifier is a suitable standard for, as the
   name suggests, uniquely identifying entities in a symbol space large
   enough that the identifiers do not collide.  The literal
   representation, however, specified in RFC 4122 and elsewhere, cannot
   be used in conjunction with a number of formal grammars where it
   would be beneficial to do so.  This document provides the UUID with
   two additional representations to make these applications possible.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 27 January 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Taylor                   Expires 27 January 2021                [Page 1]
Internet-Draft  Compact, Grammar-Friendly Representations for UUIDs  July 2020

Table of Contents

   1.  Introduction
     1.1.  Motivation & Applications
   2.  Terminology
   3.  Strategy
   4.  Syntax
     4.1.  Recognizing UUID-NCName Symbols
     4.2.  Equivalency
   5.  Algorithms
     5.1.  Encoding Algorithm
     5.2.  Decoding Algorithm
   6.  IANA Considerations
   7.  Security Considerations
   8.  Normative References
   9.  Informative References
   Appendix A.  Samples
   Appendix B.  Implementations
   Author's Address

1.  Introduction

   There are a number of places in formal languages where it would be
   useful to put UUIDs, but the grammar forbids it.  Many grammars
   forbid identifiers from beginning with digits, or from containing
   hyphens or colons (as with the URN representation in RFC 4122
   [RFC4122]).  The NCName production [XML-NAMES], which is pervasive in
   XML and RDF applications, is one such example.  Up until a recent
   change, the HTML ID production had similar constraints.  Virtually
   every programming language likewise requires identifiers such as
   variables and function names to start with a letter or underscore,
   and very few admit hyphens.  This constraint causes developers to
   turn to ad-hoc solutions when they want to use UUIDs in these places.

   This document specifies a representation - or rather, two
   representations - as well as the related transformations to and from
   the familiar UUID format.  A provisional name for these
   representations is _UUID-NCName_, with the two variants styled as
   _UUID-NCName-32_ and _UUID-NCName-64_, referring to the base of their
   respective encodings.  The goal of this specification is in part to
   eliminate an extra decision on the part of developers who find
   themselves in this position, and in part to provide alternative
   representations for UUIDs which remain valid but are shorter than the
   original.

1.1.  Motivation & Applications

   The purpose of an identifier in general is to pick out some
   information resource or other, such that it can be referred to,
   ideally unambiguously.  The purpose of a large, generated identifier
   like the UUID is to satisfy the uniqueness criterion while also
   specifying a datatype and normal form for said identifiers, and
   ultimately to alleviate the need to sit down and think these
   identifiers up.  The reason to insert UUIDs in places they would not
   otherwise fit is so that these UUIDs can be cross-referenced in some
   other database where they _do_ fit.  Consider:

   *  A component content management system that uses UUIDs to identify
      elementary content components uses the UUID-NCName-64
      representations of the same UUIDs as fragment identifiers for when
      those components are transcluded.

   *  A literate programming system uses the UUID-NCName-32
      representation as stable identifiers for all symbols (variables,
      constants, class names, etc.), enabling said identifiers to be
      defined and described elsewhere, while still yielding
      syntactically-correct code.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Strategy

   Not all 128 bits of a UUID are data; rather, several bits are masked.
   The top four bits of the third segment, known as
   "time_hi_and_version", specify the UUID's version, which is fixed.
   Up to three high bits in the following segment, called
   "clock_seq_hi_and_reserved", specify the variant: how the UUID - if
   applicable - is meant to be read.  We remove these masked quartets
   (we take an extra bit for the variant) and use them as "bookends" for
   the rest of the identifier, mapping them to the first sixteen symbols
   of the Base32 table [RFC4648], which are all letters.  The remaining
   120 bits, which we bit-shift to close the gaps of the two masked
   quartets we removed, now divide evenly by both 5 and 6, the number of
   bits per character in Base32 and Base64, respectively.

   The transformation takes the UUID
   "4abc6330-f548-4e67-b9f9-12d4323769cd", and returns the result
   "ESrxjMPVI5nn5EtQyN2nNL" for base64, and "ejk6ggmhvjdtht6is2qzdo2onl"
   for base32.  These symbols will always start and end with case-
   insensitive letters, and the entire base32 symbol is case-
   insensitive.
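
   As a rough illustration of the bookkeeping described in this section,
   the following Python sketch pulls the version and variant quartets out
   of the sample UUID and maps them to bookend letters.  The names
   "BOOKENDS", "first", and "last" are illustrative assumptions and are
   not defined by this document.

```python
import uuid

# The sample UUID used in the text above.
u = uuid.UUID("4abc6330-f548-4e67-b9f9-12d4323769cd")
octets = u.bytes  # 16 octets in network order

# The high quartet of octet 6 (top of time_hi_and_version) is the version.
version = octets[6] >> 4
# The high quartet of octet 8 (top of clock_seq_hi_and_reserved) holds
# the variant bits plus one extra bit.
variant = octets[8] >> 4

# First sixteen symbols of the RFC 4648 Base32 alphabet, all letters.
BOOKENDS = "ABCDEFGHIJKLMNOP"
first = BOOKENDS[version]
last = BOOKENDS[variant]

# 128 bits minus the two 4-bit bookends leaves 120 bits of payload,
# which divides evenly into 5-bit and 6-bit characters.
payload_bits = 128 - 2 * 4
base32_chars = payload_bits // 5
base64_chars = payload_bits // 6
```

   Under these assumptions the sample UUID yields the bookends "E" and
   "L", which agree with the first and last characters of the symbols
   shown above.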

4.  Syntax

   Here is the ABNF grammar for the productions "uuid-ncname-32" and
   "uuid-ncname-64":

   uuid-ncname-32 = bookend 24base32 bookend
   uuid-ncname-64 = bookend 20base64url bookend
   bookend        = %x41-50 / %x61-70 ; [A-Pa-p]
   base32         = %x32-37 / %x41-5a / %x61-7a ; [2-7A-Za-z]
   base64url      = %x2d / %x30-39 / %x41-5a / %x5f / %x61-7a
                    ; [-0-9A-Z_a-z]

   "Bookends" are 4-bit sequences (nybbles, quartets, etc.) which we map
   directly onto the Base32 table from [RFC4648].  Indeed, this portion
   of the Base64 table is identical, though we say Base32 to
   underscore the fact that bookend characters are case-insensitive.
   Certain environments encode meaning into the case of the first
   character of a symbol, so it is important that its literal
   representation be flexible.  There is likewise little value in
   arbitrarily constraining the last character.  Nevertheless, UUID-
   NCName-64 symbols SHOULD be generated with upper-case bookend
   characters, while UUID-NCName-32 bookends (and indeed the entire
   symbol) SHOULD be lower-case.
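
   One way to check a candidate symbol against this grammar is to
   transcribe the ABNF into regular expressions.  The Python sketch below
   does so; the function names "is_uuid_ncname_32" and
   "is_uuid_ncname_64" are hypothetical helpers chosen for illustration.

```python
import re

# Regular-expression transcription of the ABNF productions above.
BOOKEND = r"[A-Pa-p]"
UUID_NCNAME_32 = re.compile(BOOKEND + r"[2-7A-Za-z]{24}" + BOOKEND)
UUID_NCNAME_64 = re.compile(BOOKEND + r"[-0-9A-Z_a-z]{20}" + BOOKEND)


def is_uuid_ncname_32(symbol: str) -> bool:
    """True if the whole string matches the uuid-ncname-32 production."""
    return UUID_NCNAME_32.fullmatch(symbol) is not None


def is_uuid_ncname_64(symbol: str) -> bool:
    """True if the whole string matches the uuid-ncname-64 production."""
    return UUID_NCNAME_64.fullmatch(symbol) is not None
```

   Because the two productions have different fixed lengths, no string
   can match both patterns.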

4.1.  Recognizing UUID-NCName Symbols

   UUID-NCName symbols always have a fixed length and certain
   characteristics: UUID-NCName-32 symbols are always exactly 26
   characters long while UUID-NCName-64 symbols are always 22 characters
   long.  The version (first bookend character) is mapped to the Base32
   table where "A" is 0, so "B" is 1, etc.  Random (version 4) UUIDs
   will therefore always start with the letter "E".  Any value higher
   than "F" (version 5/truncated SHA-1 UUID) is unspecified (though
   there is room for future UUID specifications to go all the way up to
   version 15).  Likewise, for UUIDs of the variant defined in
   [RFC4122], the variant bit-mask will cause the symbol to always end,
   modulo upper/lower-case, in "I", "J", "K", or "L" (8, 9, 10, 11).

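   The recognition rules above can be sketched in Python as follows.
   This is a minimal illustration that looks only at the length and the
   two bookend characters; it does not validate the characters in
   between.  The function name "describe" and its return shape are
   assumptions made for this example.

```python
# First sixteen symbols of the RFC 4648 Base32 alphabet: "A" is 0, "B" is 1, ...
BOOKENDS = "ABCDEFGHIJKLMNOP"


def describe(symbol: str):
    """Return (encoding, version, variant) read from length and bookends."""
    if len(symbol) == 26:
        encoding = "base32"
    elif len(symbol) == 22:
        encoding = "base64"
    else:
        raise ValueError("not a UUID-NCName length")
    # Bookends are case-insensitive, so fold them before the lookup.
    # str.index raises ValueError for characters outside A-P.
    version = BOOKENDS.index(symbol[0].upper())
    variant = BOOKENDS.index(symbol[-1].upper())
    return encoding, version, variant
```

   For instance, a 22-character symbol that starts with "E" and ends
   with "L" is reported as Base64, version 4, with a variant quartet
   of 11.
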
4.2.  Equivalency

   Two UUID-NCName symbols are equivalent if they produce the same
   UUID.  Two UUID-NCName-32 symbols are equivalent if their string
   values match when normalized to all upper- or lower-case letters.
   Two UUID-NCName-64 symbols are equivalent if their string values
   match when the bookend characters are normalized to either upper-
   or lower-case.
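
   A comparison along these lines might look like the Python sketch
   below.  It assumes both inputs have already been checked against the
   grammar in Section 4; "normalize" and "equivalent" are hypothetical
   names used only here.

```python
def normalize(symbol: str) -> str:
    """Fold the case-insensitive parts of a UUID-NCName symbol."""
    if len(symbol) == 26:
        # UUID-NCName-32: the entire symbol is case-insensitive.
        return symbol.lower()
    if len(symbol) == 22:
        # UUID-NCName-64: only the two bookends are case-insensitive.
        return symbol[0].upper() + symbol[1:-1] + symbol[-1].upper()
    raise ValueError("not a UUID-NCName length")


def equivalent(a: str, b: str) -> bool:
    """Compare two symbols after case normalization."""
    return normalize(a) == normalize(b)
```

   Note that folding the interior of a UUID-NCName-64 symbol would
   change the UUID it denotes, which is why only its bookends are
   normalized.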

5.  Algorithms

   These are candidate algorithms for encoding and decoding the symbols,
   transforming them to and from the conventional UUID representation.
   There are certainly many equivalents.

5.1.  Encoding Algorithm

   First we apply the shifting algorithm:

   1.  Convert the UUID to a binary string "bin".

   2.  Convert "bin" to an array of four 32-bit unsigned network-endian
       integers "ints".

   3.  Extract "version" as "(ints[1] & 0x0000f000) >> 12".

   4.  Extract "variant" as "(ints[2] & 0xf0000000) >> 24".

   5.  Assign "ints[1] = (ints[1] & 0xffff0000) | ((ints[1] &
       0x00000fff) << 4) | ((ints[2] & 0x0fffffff) >> 24)".

   6.  Assign "ints[2] = (ints[2] & 0x00ffffff) << 8 | (ints[3] >> 24)".

   7.  Assign "ints[3] = (ints[3] << 8) | variant".

   8.  Convert "ints" back into a binary string and return it along with
       the "version".
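
   The shifting steps above can be sketched in Python as follows.  This
   is one possible rendering, not a reference implementation; the
   function name "shift" is an assumption.  Because Python integers are
   not limited to 32 bits, the sketch masks the result of step 7
   explicitly.

```python
import struct
import uuid


def shift(u: uuid.UUID):
    """Return (shifted 16-octet string, version) per the steps above."""
    # Steps 1-2: four 32-bit unsigned network-endian integers.
    ints = list(struct.unpack("!4I", u.bytes))
    # Step 3: the version quartet.
    version = (ints[1] & 0x0000F000) >> 12
    # Step 4: the variant quartet, left in the high half of the low octet.
    variant = (ints[2] & 0xF0000000) >> 24
    # Step 5: close the gap left by the version quartet.
    ints[1] = ((ints[1] & 0xFFFF0000)
               | ((ints[1] & 0x00000FFF) << 4)
               | ((ints[2] & 0x0FFFFFFF) >> 24))
    # Step 6: close the gap left by the variant quartet.
    ints[2] = ((ints[2] & 0x00FFFFFF) << 8) | (ints[3] >> 24)
    # Step 7: masked to 32 bits because Python integers are unbounded.
    ints[3] = ((ints[3] << 8) & 0xFFFFFFFF) | variant
    # Step 8: back to a binary string, returned with the version.
    return struct.pack("!4I", *ints), version
```

   After this step the final octet carries the variant quartet in its
   high four bits, which is what the formatting algorithms below
   operate on.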

   Then we apply one of the formatting algorithms.  Here is Base64:

   1.  Take the binary string "bin" and shift the last octet to the
       right by two bits.

   2.  Encode "bin" with the base64url algorithm to get the string
       "b64".

   3.  Truncate "b64" to 21 characters.

   4.  Convert "version" to its value in the Base32 table.

   5.  Return "version" concatenated to "b64".
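
   A minimal Python sketch of the Base64 formatting steps follows.  The
   function name "format_base64" is hypothetical, and the "shifted"
   value is an assumption: it was worked out by hand by applying the
   shifting algorithm to the sample UUID from Section 3.

```python
import base64

# First sixteen symbols of the RFC 4648 Base32 alphabet.
BOOKENDS = "ABCDEFGHIJKLMNOP"


def format_base64(shifted: bytes, version: int) -> str:
    """Format a shifted 16-octet string as a UUID-NCName-64 symbol."""
    # Step 1: shift only the last octet right by two bits.
    data = shifted[:15] + bytes([shifted[15] >> 2])
    # Step 2: base64url-encode; 16 octets yield 22 characters plus "==".
    b64 = base64.urlsafe_b64encode(data).decode("ascii")
    # Step 3: keep 21 characters; the 21st carries the variant.
    b64 = b64[:21]
    # Steps 4-5: prefix the version's letter from the Base32 table.
    return BOOKENDS[version] + b64


# Hand-derived shifted form of 4abc6330-f548-4e67-b9f9-12d4323769cd.
shifted = bytes.fromhex("4abc6330f548e679f912d4323769cdb0")
symbol = format_base64(shifted, 4)
```

   With these inputs, "symbol" is "ESrxjMPVI5nn5EtQyN2nNL", the Base64
   result quoted in Section 3.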

   And Base32:

   1.  Take the binary string "bin" and shift the last octet to the
       right by one bit.

   2.  Encode "bin" with the base32 algorithm to get the string "b32".

   3.  Truncate "b32" to 25 characters.

   4.  Convert "version" to its value in the Base32 table.

   5.  Return "version" concatenated to "b32", optionally in either
       upper or lower case.
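
   The Base32 variant can be sketched the same way.  As before,
   "format_base32" is a hypothetical name and the "shifted" value is a
   hand-derived assumption based on the sample UUID from Section 3.
   The sketch returns lower case, following the recommendation in
   Section 4.

```python
import base64

# First sixteen symbols of the RFC 4648 Base32 alphabet.
BOOKENDS = "ABCDEFGHIJKLMNOP"


def format_base32(shifted: bytes, version: int) -> str:
    """Format a shifted 16-octet string as a UUID-NCName-32 symbol."""
    # Step 1: shift only the last octet right by one bit.
    data = shifted[:15] + bytes([shifted[15] >> 1])
    # Step 2: Base32-encode; 16 octets yield 26 characters plus "======".
    b32 = base64.b32encode(data).decode("ascii")
    # Step 3: keep 25 characters; the 25th carries the variant.
    b32 = b32[:25]
    # Steps 4-5: prefix the version's letter and fold to lower case.
    return (BOOKENDS[version] + b32).lower()


# Hand-derived shifted form of 4abc6330-f548-4e67-b9f9-12d4323769cd.
shifted = bytes.fromhex("4abc6330f548e679f912d4323769cdb0")
symbol = format_base32(shifted, 4)
```

   With these inputs, "symbol" is "ejk6ggmhvjdtht6is2qzdo2onl", the
   Base32 result quoted in Section 3.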

5.2.  Decoding Algorithm

   1.  First verify the syntax and determine whether the symbol "ncname"
       is base32 or base64.

   2.  If "ncname" is base64 and the last character is lowercase, set it
       to uppercase.

   3.  Remove the first character of the symbol "ncname" and convert it
       into an integer according to the base32 spec; call that integer
       "version".

   4.  Append padding if necessary to satisfy the decoder, "A======" for
       Base32 and "A==" for Base64.

   5.  Decode the remainder of "ncname" by either the base32 or
       base64url decoding algorithm into binary string "bin".

   6.  If "ncname" was base32, shift the last octet of "bin" one bit to
       the left; if base64 shift it two bits.
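
   The first half of decoding might be sketched in Python as below.
   The name "unformat" and the two compiled patterns are assumptions
   made for illustration; the patterns are transcribed from the ABNF in
   Section 4.  The sketch upper-cases Base32 input because Python's
   "base64.b32decode" expects upper case by default.

```python
import base64
import re

BOOKENDS = "ABCDEFGHIJKLMNOP"
B32 = re.compile(r"[A-Pa-p][2-7A-Za-z]{24}[A-Pa-p]")
B64 = re.compile(r"[A-Pa-p][-0-9A-Z_a-z]{20}[A-Pa-p]")


def unformat(ncname: str):
    """Return (shifted 16-octet string, version) for a UUID-NCName symbol."""
    # Step 1: verify the syntax and determine the encoding.
    if B32.fullmatch(ncname):
        is_b32 = True
    elif B64.fullmatch(ncname):
        is_b32 = False
    else:
        raise ValueError("not a UUID-NCName symbol")
    # Step 3: the first character is the version bookend.
    version = BOOKENDS.index(ncname[0].upper())
    rest = ncname[1:]
    if is_b32:
        # Steps 4-5: pad and decode.
        data = base64.b32decode(rest.upper() + "A======")
        last = (data[15] << 1) & 0xFF  # step 6: one bit to the left
    else:
        # Step 2: fold the trailing bookend to upper case.
        rest = rest[:-1] + rest[-1].upper()
        # Steps 4-5: pad and decode.
        data = base64.urlsafe_b64decode(rest + "A==")
        last = (data[15] << 2) & 0xFF  # step 6: two bits to the left
    return data[:15] + bytes([last]), version
```

   Both encodings of the same UUID should produce the same binary
   string at this point, which is then handed to the reverse shifting
   steps.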

   Now we apply the shifting algorithm in reverse:

   1.   Ensure "version" is in the range of 0-15 by masking it with
        "0xf".

   2.   Convert the binary string "bin" into four 32-bit unsigned
        network-endian integers "ints".

   3.   Assign "variant = (ints[3] & 0xf0) << 24".

   4.   Shift and assign "ints[3] >>= 8".

   5.   Union and assign "ints[3] |= ((ints[2] & 0xff) << 24)".

   6.   Shift and assign "ints[2] >>= 8".

   7.   Union and assign "ints[2] |= ((ints[1] & 0xf) << 24) | variant".

   8.   Assign "ints[1] = (ints[1] & 0xffff0000) | (version << 12) |
        ((ints[1] >> 4) & 0xfff)".

   9.   Convert "ints" back into the new binary string "bin".

   10.  Format "bin" as a UUID.
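
   The reverse shifting steps can be sketched in Python as follows.
   The function name "unshift" is an assumption, and the sample input
   is the hand-derived shifted form of the UUID from Section 3.

```python
import struct
import uuid


def unshift(shifted: bytes, version: int) -> uuid.UUID:
    """Rebuild the UUID from a shifted 16-octet string and a version."""
    version &= 0xF                                   # step 1
    ints = list(struct.unpack("!4I", shifted))       # step 2
    variant = (ints[3] & 0xF0) << 24                 # step 3
    ints[3] >>= 8                                    # step 4
    ints[3] |= (ints[2] & 0xFF) << 24                # step 5
    ints[2] >>= 8                                    # step 6
    ints[2] |= ((ints[1] & 0xF) << 24) | variant     # step 7
    ints[1] = ((ints[1] & 0xFFFF0000)
               | (version << 12)
               | ((ints[1] >> 4) & 0xFFF))           # step 8
    # Steps 9-10: back to a binary string, formatted as a UUID.
    return uuid.UUID(bytes=struct.pack("!4I", *ints))


# Hand-derived shifted form of 4abc6330-f548-4e67-b9f9-12d4323769cd.
shifted = bytes.fromhex("4abc6330f548e679f912d4323769cdb0")
restored = unshift(shifted, 4)
```

   Here "restored" prints as "4abc6330-f548-4e67-b9f9-12d4323769cd",
   the UUID the example started from.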

6.  IANA Considerations

   There are no discernible IANA considerations associated with this
   specification.

7.  Security Considerations

   As UUID-NCName symbols are isomorphic to their conventional UUID
   representations, the security considerations for these symbols are
   the same as those of [RFC4122], though we repeat here the admonition
   not to assume that UUIDs are hard to guess.

8.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC4122]  Leach, P., Mealling, M., and R. Salz, "A Universally
              Unique IDentifier (UUID) URN Namespace", RFC 4122,
              DOI 10.17487/RFC4122, July 2005,
              <https://www.rfc-editor.org/info/rfc4122>.

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
              <https://www.rfc-editor.org/info/rfc4648>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

9.  Informative References

   [XML-NAMES]
              Bray, T., Hollander, D., Layman, A., Tobin, R., and H. S.
              Thompson, "Namespaces in XML 1.0 (Third Edition)", 8
              December 2009,
              <https://www.w3.org/TR/2009/REC-xml-names-20091208/>.

Appendix A.  Samples

       +===================+======================================+
       | Version           | Canonical UUID Representation        |
       +===================+======================================+
       | 0, Nil            | 00000000-0000-0000-0000-000000000000 |
       +===================+--------------------------------------+
       | 1, Timestamp      | ca6be4c8-cbaf-11ea-b2ab-00045a86c8a1 |
       +===================+--------------------------------------+
       | 2, DCE "Security" | 000003e8-cbb9-21ea-b201-00045a86c8a1 |
       +===================+--------------------------------------+
       | 3, MD5            | 3d813cbb-47fb-32ba-91df-831e1593ac29 |
       +===================+--------------------------------------+
       | 4, Random         | 01867b2c-a0dd-459c-98d7-89e545538d6c |
       +===================+--------------------------------------+
       | 5, SHA-1          | 21f7f8de-8051-5b89-8680-0195ef798b6a |
       +===================+--------------------------------------+

            Table 1: Samples of canonical UUID representations

   +============+============================+========================+
   | Version    | Base32                     | Base64                 |
   +============+============================+========================+
   | 0, Nil     | aaaaaaaaaaaaaaaaaaaaaaaaaa | AAAAAAAAAAAAAAAAAAAAAA |
   +============+----------------------------+------------------------+
   | 1,         | bzjv6jsglv4pkfkyaarninsfbl | BymvkyMuvHqKrAARahsihL |
   | Timestamp  |                            |                        |
   +============+----------------------------+------------------------+
   | 2, DCE     | caaaah2glxepkeaiaarninsfbl | CAAAD6Mu5HqIBAARahsihL |
   | "Security" |                            |                        |
   +============+----------------------------+------------------------+
   | 3, MD5     | dhwatzo2h7mv2dx4ddykzhlbjj | DPYE8u0f7K6Hfgx4Vk6wpJ |
   +============+----------------------------+------------------------+
   | 4, Random  | eagdhwlfa3vm4rv4j4vcvhdlmj | EAYZ7LKDdWcjXieVFU41sJ |
   +============+----------------------------+------------------------+
   | 5, SHA-1   | feh37rxuakg4jnaabsxxxtc3ki | FIff43oBRuJaAAZXveYtqI |
   +============+----------------------------+------------------------+

             Table 2: Samples of UUID-NCName representations

Appendix B.  Implementations

   As of this writing, there are two implementations of UUID-NCName:

   *  Perl, https://metacpan.org/pod/Data::UUID::NCName

   *  Ruby, https://rubygems.org/gems/uuid-ncname

Author's Address

   Dorian Taylor
   Independent

   Email: ietf@doriantaylor.com
   URI:   https://doriantaylor.com/
