Gordian dCBOR: Deterministic CBOR Implementation Practices
draft-mcnally-deterministic-cbor-01

The information below is for an old version of the document.
Document type: an older version of an Internet-Draft whose latest
revision state is "Active".
Authors: Wolf McNally, Christopher Allen
Last updated: 2023-05-04 (latest revision 2023-03-08)

Network Working Group                                         W. McNally
Internet-Draft                                                  C. Allen
Intended status: Experimental                         Blockchain Commons
Expires: 5 November 2023                                      4 May 2023

       Gordian dCBOR: Deterministic CBOR Implementation Practices
                  draft-mcnally-deterministic-cbor-01

Abstract

   CBOR has many advantages over other data serialization formats.  One
   of its strengths is that it provides specifications and guidelines
   for serializing data deterministically, such that multiple agents
   serializing the same data automatically achieve consensus on the
   exact byte-level form of that serialized data.  Nonetheless,
   determinism is an opt-in feature of the specification, and most
   existing CBOR codecs leave the primary burden of serializing
   deterministically, and of validating deterministic encoding during
   deserialization, to the engineer.  This document specifies a set of
   norms and practices for CBOR codec implementors intended to support
   deterministic CBOR ("dCBOR") at the codec API level.

Discussion Venues

   This note is to be removed before publishing as an RFC.

   Source for this draft and an issue tracker can be found at
   https://github.com/BlockchainCommons/WIPs-IETF-draft-deterministic-cbor.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 5 November 2023.

McNally & Allen          Expires 5 November 2023                [Page 1]
Internet-Draft                    dCBOR                         May 2023

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Serialization Level
     3.1.  Base Requirements
     3.2.  Reduction of Floating Point Values to Integers
     3.3.  Reduction of NaNs and Infinities
     3.4.  Reduction of BigNums to Integers
     3.5.  CBOR_NEGATIVE_INT_MAX disallowed
   4.  Application Level
     4.1.  Optional/Default Values
     4.2.  Tagging Items
   5.  Future Work
   6.  API-Level Recommendations
     6.1.  General Practices for dCBOR Codecs
     6.2.  API Handling of Maps
     6.3.  API Handling of Numeric Values
     6.4.  Validation Errors
   7.  Reference Implementations
   8.  Security Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Acknowledgments
   Authors' Addresses

1.  Introduction

   The goal of determinism in data encoding is that multiple agents
   serializing the same data will automatically achieve consensus on the
   byte-level form of that serialized data.  Many data serialization
   formats give developers wide latitude on the serialized form, for
   example:

   *  The use of whitespace in JSON, which may be omitted or used to
      taste.

   *  The key-value pairs of map/dictionary structures are usually
      considered unordered.  Therefore their order of serialization is
      taken to be semantically insignificant and so varies depending on
      the implementation.

   *  Standards for the binary encoding of floating point numeric values
      often include bit patterns that are functionally equivalent, such
      as 0.0 and -0.0 or NaN and signalling NaN.

   *  The number of bytes used to encode an integer or floating point
      value; e.g., in well-formed CBOR there are five valid ways to
      encode the integer 1 and three valid ways to encode the floating
      point value 1.0, giving a total of eight valid ways to encode the
      semantic concept 1.0.  In JSON the problem is even worse, given
      that 1, 1.0, 1.00, 1.000, etc. are equivalent representations of
      the same value.
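   The counts above can be checked directly.  The following minimal
   Python sketch uses only the standard library.  It lists every
   well-formed CBOR encoding of the integer 1: the immediate form, plus
   the 1-, 2-, 4-, and 8-byte argument forms of major type 0.  It then
   lists the encodings of the floating point value 1.0 in half, single,
   and double precision, using the `e`, `f`, and `d` formats of
   `struct`.  The function names are illustrative only.

```python
import struct


def integer_one_encodings() -> list[bytes]:
    """All well-formed encodings of the unsigned integer 1 (major type 0)."""
    # Additional information 1: the value is stored in the initial byte.
    encodings = [bytes([0x01])]
    # Additional information 24..27: a 1-, 2-, 4-, or 8-byte argument follows.
    for additional_info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        encodings.append(bytes([additional_info]) + (1).to_bytes(width, "big"))
    return encodings


def float_one_encodings() -> list[bytes]:
    """All encodings of 1.0 as a major type 7 floating point value."""
    return [
        b"\xf9" + struct.pack(">e", 1.0),  # half precision
        b"\xfa" + struct.pack(">f", 1.0),  # single precision
        b"\xfb" + struct.pack(">d", 1.0),  # double precision
    ]


for encoding in integer_one_encodings() + float_one_encodings():
    print(encoding.hex())
```

   Running it prints eight distinct byte strings, from `01` through
   `fb3ff0000000000000`.  Of these, only `01` is dCBOR, because
   Section 3.2 requires integral floating point values to be encoded as
   integers.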

   Each of these choices, made differently by separate agents, yields
   different binary serializations.  These cannot be compared based on
   their hash values, and must therefore be separately parsed and
   validated semantically field-by-field to decide whether they are
   identical.  Fast identity comparison using hashes is important in
   certain classes of application, where the hash is published or
   incorporated into other documents, hence "freezing" the form of the
   document.  Where the hash is known or fixed and the hash function is
   cryptographically secure, it is computationally infeasible to
   substitute a different document for the original, even one that
   differs by only a single bit.
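   As an illustration, the following Python sketch serializes the same
   two-entry map twice, once with its keys in each order, and compares
   SHA-256 digests.  Both byte strings decode to the same map, yet their
   digests differ, so a hash-based identity check treats them as
   different documents.  The map contents are arbitrary.

```python
import hashlib

# The CBOR map {1: "a", 2: "b"} serialized with two different key orders.
# 0xa2 = map of 2 pairs; 0x01/0x02 = integer keys; 0x61 = 1-byte text string.
in_order = bytes.fromhex("a2" "01" "6161" "02" "6162")
reversed_order = bytes.fromhex("a2" "02" "6162" "01" "6161")

digest_a = hashlib.sha256(in_order).hexdigest()
digest_b = hashlib.sha256(reversed_order).hexdigest()

# Same semantic content, different bytes, therefore different digests.
print(digest_a == digest_b)  # False
```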

   The CBOR standard addresses this problem in [RFC8949] §4.2, by
   narrowing the scope of choices available for encoding various values,
   but does not specify a set of norms and practices for CBOR codec
   implementors who value the benefits of deterministic CBOR,
   hereinafter called "dCBOR".

   This document's goal is to specify such a set of norms and practices
   for dCBOR codec implementors.

   It is important to stress that dCBOR is _not_ a new dialect of CBOR,
   and that all dCBOR is well-formed CBOR that can be read by existing
   CBOR codecs.

   This document is divided into four sections, which cover norms and
   practices that:

   *  MUST be implemented in the codec (Serialization level);

   *  MUST be implemented by developers of specifications dependent on
      dCBOR (Application level);

   *  are acknowledged to fall under the purview of this document, but
      are not yet specified (Future work); and

   *  are RECOMMENDED for dCBOR codec implementors (Recommendations).

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   This specification makes use of the following terminology:

   byte  Used in its now-customary sense as a synonym for "octet".

   codec  "coder-decoder", a software suite that both encodes
      (serializes) and decodes (deserializes) a data format.

   dCBOR  "Deterministic CBOR": CBOR encoded in conformance with the
      requirements of this document.

   insert/extract  To convert platform-native or application-centric
      data structures to/from an in-memory symbolic representation of
      CBOR.

   serialize/deserialize  To convert an in-memory symbolic
      representation of CBOR to/from a byte stream.

3.  Serialization Level

   This section defines requirements and practices falling in the
   purview of the dCBOR codec.

3.1.  Base Requirements

   dCBOR encoders MUST emit only CBOR conforming to the requirements of
   [RFC8949] §4.2.1.  To summarize:

   *  Variable-length integers MUST be as short as possible.

   *  Floating-point values MUST use the shortest form that preserves the
      value.

   *  Indefinite-length arrays and maps MUST NOT be used.

   *  Map keys MUST be sorted in bytewise lexicographic order of their
      deterministic encodings.

   dCBOR codecs MUST validate their input and return errors for any CBOR
   that does not conform to these requirements.
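   The first and last of these rules can be sketched in Python.  The
   helper `encode_head` emits a CBOR item head using the shortest
   argument that holds the value.  The helper `encode_map` sorts entries
   by the bytewise lexicographic order of their already-encoded keys;
   Python compares `bytes` objects in exactly that order.  Both helpers
   are hypothetical and are not any particular codec's API.  The sketch
   assumes that keys and values have already been encoded as dCBOR.

```python
def encode_head(major: int, argument: int) -> bytes:
    """Encode a CBOR head (major type + argument) in its shortest form."""
    if not 0 <= argument < 2**64:
        raise ValueError("argument out of range for a CBOR head")
    initial = major << 5
    if argument < 24:
        return bytes([initial | argument])  # argument fits in the initial byte
    for additional_info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 2 ** (8 * width):
            return bytes([initial | additional_info]) + argument.to_bytes(width, "big")
    raise AssertionError("unreachable: argument < 2**64 was checked above")


def encode_uint(value: int) -> bytes:
    """Unsigned integer (major type 0) in shortest form."""
    return encode_head(0, value)


def encode_map(entries: list[tuple[bytes, bytes]]) -> bytes:
    """Encode a definite-length map from (encoded key, encoded value) pairs.

    Entries are emitted in bytewise lexicographic order of the encoded keys.
    """
    keys = [key for key, _ in entries]
    if len(set(keys)) != len(keys):
        raise ValueError("duplicate map key")
    out = bytearray(encode_head(5, len(entries)))  # major type 5 = map
    for key, value in sorted(entries):  # keys are unique, so this sorts by key
        out += key + value
    return bytes(out)
```

   For example, `encode_uint(500)` yields `1901f4`.  A map built from
   the keys 10 and 1 is emitted with the key 1 first, because `01`
   sorts before `0a`.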

3.2.  Reduction of Floating Point Values to Integers

   While there is no requirement that dCBOR codecs implement support for
   floating point numbers, dCBOR codecs that do support them MUST encode
   any floating point value that has no fractional part as an integer.
   They MUST use the shortest integer encoding that represents the value
   exactly.  If a numeric value has a fractional part, or an exponent
   that takes it out of the range of representable integers, then it
   SHALL be encoded as a floating point value.

   This practice still produces well-formed CBOR according to the
   standard, and all existing implementations will be able to read it.
   It does, however, exclude a map such as the following from being
   validated as dCBOR.  After reduction both keys have the same
   encoding, so the map would have a duplicate key:

   {
      10: "ten",
      10.0: "floating ten"
   }
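   The following minimal Python sketch shows this reduction.  The
   hypothetical `reduce_numeric` helper returns an `int` when a `float`
   has no fractional part and lies within the integer range of CBOR
   major types 0 and 1.  That range excludes -2^64, which Section 3.5
   disallows.  Otherwise the helper returns the value unchanged as a
   `float`.  These range bounds are the sketch's reading of the text
   above and are an assumption, not normative.

```python
import math
from typing import Union

# Integers representable by CBOR major types 0 and 1, excluding -2**64,
# which dCBOR disallows (see Section 3.5). These bounds are an assumption.
MIN_DCBOR_INT = -(2**64) + 1
MAX_DCBOR_INT = 2**64 - 1


def reduce_numeric(value: Union[int, float]) -> Union[int, float]:
    """Return an int if `value` is an integral float within integer range."""
    if isinstance(value, int):
        return value
    if math.isfinite(value) and value.is_integer():
        as_int = int(value)  # exact: a finite integral float converts losslessly
        if MIN_DCBOR_INT <= as_int <= MAX_DCBOR_INT:
            return as_int  # -0.0 also reduces to the integer 0
    return value  # fractional, out of range, infinite, or NaN


print(reduce_numeric(10.0))  # 10
print(reduce_numeric(10.5))  # 10.5
print(reduce_numeric(1e20))  # 1e+20 (exceeds 2**64 - 1, so stays floating point)
```

   Because `reduce_numeric(10.0)` and `reduce_numeric(10)` both yield
   the integer 10, the two keys in the map above encode to identical
   bytes.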

3.3.  Reduction of NaNs and Infinities

   [IEEE754] defines the NaN (Not a Number) value [NAN].  NaNs are
   usually divided into two types, _quiet NaNs_ and _signalling NaNs_.
   These types are distinguished by the most significant bit of the
   significand field, not by the sign bit.  Apart from that bit, a
   NaN's sign bit and the remaining "payload" bits of the significand
   may take arbitrary values.  These bits have no meaning defined by
   CBOR and could be used to break CBOR determinism.

   dCBOR encoders that support floating point MUST reduce all NaN values
   to the half-precision quiet NaN having the canonical bit pattern
   0x7e00 (encoded in CBOR as 0xf97e00).

   Similarly, encoders that support floating point MUST reduce all +INF
   values to the half-precision +INF having the canonical bit pattern
   0x7c00, and likewise all -INF values to the half-precision -INF
   having the bit pattern 0xfc00.
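   The following Python sketch shows one way an encoder might produce
   the shortest floating point encoding required by Section 3.1.  It
   also canonicalizes NaNs and infinities as described above.  It relies
   on the standard `struct` module, whose `e`, `f`, and `d` formats pack
   IEEE 754 half-, single-, and double-precision values.  The helper
   `encode_float` is hypothetical.  It assumes that the caller has
   already reduced integral values to integers per Section 3.2.

```python
import math
import struct

CANONICAL_NAN = bytes.fromhex("f97e00")      # half-precision quiet NaN
POSITIVE_INFINITY = bytes.fromhex("f97c00")  # half-precision +INF
NEGATIVE_INFINITY = bytes.fromhex("f9fc00")  # half-precision -INF


def encode_float(value: float) -> bytes:
    """Encode a float in the shortest IEEE 754 width that preserves it."""
    if math.isnan(value):
        return CANONICAL_NAN  # every NaN, whatever its sign or payload
    if math.isinf(value):
        return POSITIVE_INFINITY if value > 0 else NEGATIVE_INFINITY
    # Try half (0xf9), then single (0xfa) precision; keep the first width
    # that round-trips to exactly the same value.
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # magnitude too large for this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return b"\xfb" + struct.pack(">d", value)  # double precision (0xfb)
```

   For example, 1.5 encodes as `f93e00`.  The value 1.1 has no exact
   half- or single-precision form, so it needs the full 9-byte
   double-precision encoding.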

3.4.  Reduction of BigNums to Integers

   While there is no requirement that dCBOR codecs implement support for
   BigNums (tags 2 and 3, used chiefly for integers that do not fit in
   CBOR major types 0 and 1), codecs that do support them MUST use the
   regular integer encodings wherever those can represent the value.
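   The following Python sketch shows an integer encoder that applies
   this rule.  Values in the range of major types 0 and 1 use the
   ordinary integer encoding.  Only values outside that range fall back
   to tag 2 or tag 3, together with -2^64, which Section 3.5 disallows
   as a plain integer.  Following the preferred serialization described
   in [RFC8949] §3.4.3, the BigNum byte string omits leading zero bytes.
   The helper `encode_integer` is hypothetical.

```python
def encode_head(major: int, argument: int) -> bytes:
    """Shortest-form CBOR head (major type + argument)."""
    initial = major << 5
    if argument < 24:
        return bytes([initial | argument])
    for additional_info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 2 ** (8 * width):
            return bytes([initial | additional_info]) + argument.to_bytes(width, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_integer(value: int) -> bytes:
    """Encode an integer, using a BigNum only when major types 0/1 cannot."""
    if 0 <= value < 2**64:
        return encode_head(0, value)  # major type 0: unsigned
    if -(2**64) < value < 0:
        return encode_head(1, -1 - value)  # major type 1 encodes -1 - n
    # Outside the 64-bit range, or exactly -2**64 (disallowed by Section 3.5).
    if value >= 0:
        tag, magnitude = 2, value  # tag 2: unsigned bignum
    else:
        tag, magnitude = 3, -1 - value  # tag 3: represents -1 - n
    # Big-endian magnitude with no leading zero bytes.
    payload = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    return encode_head(6, tag) + encode_head(2, len(payload)) + payload
```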

3.5.  CBOR_NEGATIVE_INT_MAX disallowed

   The most negative integer that can be represented in 64-bit two's
   complement (STANDARD_NEGATIVE_INT_MAX) is -2^63 (bit pattern
   0x8000000000000000).

   However, the most negative integer that can be represented in CBOR
   (CBOR_NEGATIVE_INT_MAX) is -2^64, whose magnitude
   (0x10000000000000000) requires 65 bits.  The CBOR encoding for
   CBOR_NEGATIVE_INT_MAX is 0x3BFFFFFFFFFFFFFFFF.

   Because of this incompatibility between the CBOR and standard
   representations, dCBOR disallows CBOR_NEGATIVE_INT_MAX.  Conformant
   encoders MUST NOT emit this sequence, and conformant decoders MUST
   reject CBOR_NEGATIVE_INT_MAX as not valid dCBOR, even though it is
   well-formed CBOR.

   Implementations that support BigNums can encode and decode this value
   as a BigNum (tag 3).

4.  Application Level

4.1.  Optional/Default Values

   Protocols that depend on dCBOR MUST specify the optionality and
   semantics of field values.  In key-value paired structures like CBOR
   maps, protocols MUST specify whether each field is:

   *  REQUIRED and the value MUST NOT be null.

   *  OPTIONAL but if present the value MUST NOT be null.

   *  REQUIRED and the value MAY be null.

   *  OPTIONAL and the value MAY be null.

   In the last case, the protocol specifier MUST state the semantic
   difference between the field being not present at all, and being
   present but having a null value.  For example, in a map representing
   user preferences:

   *  The absence of the field means the user needs to be asked for
      their preference.

   *  The presence of the field with a null value means the user has
      been asked, but specified that they accept the current default.

   *  The presence of the field with a non-null value means the user has
      affirmatively specified a preference.

   The rationale for this specificity is to remove semantic ambiguity
   and eliminate the choice over whether to encode a key-value pair
   where the value is null or omit it entirely.
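   An application might keep these three states distinct before handing
   a map to a dCBOR encoder, as in the following Python sketch.  It uses
   a hypothetical `NOT_ASKED` sentinel to mean "field absent", `None` to
   mean CBOR null, and any other value to mean an affirmative
   preference.  The field name `"theme"` and the in-memory `dict`
   representation are illustrative assumptions.

```python
from typing import Any


class _NotAsked:
    """Sentinel type: the user has not yet been asked."""

    def __repr__(self) -> str:
        return "NOT_ASKED"


NOT_ASKED = _NotAsked()


def preferences_map(theme: Any) -> dict[str, Any]:
    """Build the map for a protocol where "theme" is OPTIONAL and MAY be null."""
    prefs: dict[str, Any] = {}
    if theme is NOT_ASKED:
        return prefs  # absent: the user still needs to be asked
    prefs["theme"] = theme  # None -> CBOR null: the user accepted the default
    return prefs  # otherwise: an affirmative preference


print(preferences_map(NOT_ASKED))  # {}
print(preferences_map(None))  # {'theme': None}
print(preferences_map("dark"))  # {'theme': 'dark'}
```

   Each state maps to exactly one in-memory form.  As a result, two
   agents cannot disagree about whether to omit a null-valued key.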

4.2.  Tagging Items

   Protocols that depend on dCBOR MUST specify the circumstances under
   which a data item MUST or MUST NOT be tagged.

   The codec API SHOULD afford conveniences such as protocol
   conformances that allow the association of a tag with a particular
   data type.  The encoder MUST use such an associated tag when
   serializing, and the decoder MUST expect the associated tag when
   extracting a structure of that type.
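   In Python, such a convenience might take the form of a base class
   whose subclasses declare their tag.  The encoder always writes the
   tag, and the decoder refuses untagged or wrongly tagged input.  The
   following sketch makes several assumptions: the class names, the
   `CBOR_TAG` attribute, and the use of tag 1 (epoch-based date/time,
   [RFC8949] §3.4.2) are all illustrative.  It also handles only
   non-negative integer content.

```python
def encode_head(major: int, argument: int) -> bytes:
    """Shortest-form CBOR head (major type + argument)."""
    initial = major << 5
    if argument < 24:
        return bytes([initial | argument])
    for additional_info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 2 ** (8 * width):
            return bytes([initial | additional_info]) + argument.to_bytes(width, "big")
    raise ValueError("argument does not fit in 64 bits")


def decode_uint(data: bytes) -> int:
    """Decode exactly one shortest-form unsigned integer (major type 0)."""
    if not data or data[0] >> 5 != 0:
        raise ValueError("expected an unsigned integer")
    additional_info = data[0] & 0x1F
    if additional_info < 24:
        value = additional_info
    elif additional_info <= 27:
        value = int.from_bytes(data[1 : 1 + (1 << (additional_info - 24))], "big")
    else:
        raise ValueError("unsupported additional information")
    # Re-encoding must reproduce the input exactly. This one comparison
    # rejects truncated, trailing, and non-shortest encodings.
    if encode_head(0, value) != data:
        raise ValueError("non-canonical or malformed integer")
    return value


class TaggedItem:
    """Base class for types bound to a CBOR tag; subclasses set CBOR_TAG."""

    CBOR_TAG: int


class EpochSeconds(TaggedItem):
    CBOR_TAG = 1  # epoch-based date/time

    def __init__(self, seconds: int) -> None:
        if seconds < 0:
            raise ValueError("this sketch handles only non-negative seconds")
        self.seconds = seconds

    def to_cbor(self) -> bytes:
        # The encoder always emits the associated tag (major type 6).
        return encode_head(6, self.CBOR_TAG) + encode_head(0, self.seconds)

    @classmethod
    def from_cbor(cls, data: bytes) -> "EpochSeconds":
        # The decoder insists on the associated tag before reading content.
        tag_head = encode_head(6, cls.CBOR_TAG)
        if not data.startswith(tag_head):
            raise ValueError(f"missing required tag {cls.CBOR_TAG}")
        return cls(decode_uint(data[len(tag_head):]))
```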

5.  Future Work

   The following issues are currently left for future work:

   *  How to deal with subnormal floating point values [SUBNORMAL].

6.  API-Level Recommendations

   This section is informative.

   Many existing CBOR implementations give little or no guidance at the
   API level as to whether the CBOR being read conforms to the CBOR
   specification for deterministic encoding [RFC8949] §4.2.  For
   example, they may emit no errors or warnings at deserialization
   time, leaving that burden to developers.  Similarly, many existing
   implementations take on no responsibility for ensuring that CBOR is
   serialized in conformance with the CBOR deterministic encoding
   specification, again putting that burden on developers.

   The authors of this document believe that for applications where
   dCBOR correctness as specified in this document is important, the
   codec itself should carry as much of this burden as possible.  This
   is important both to minimize cognitive load during development and
   to help ensure interoperability between implementations.

6.1.  General Practices for dCBOR Codecs

   It is RECOMMENDED that dCBOR codecs:

   *  Make it easy to emit compliant dCBOR.

   *  Make it hard to emit non-compliant CBOR.

   *  Make it an error to read non-compliant CBOR.

6.2.  API Handling of Maps

   It is RECOMMENDED that dCBOR APIs provide a dCBOR Map structure or
   similar that models the dCBOR canonical key encoding and order:

   *  Supports insertion of unencoded key-value pairs.

   *  Supports iteration through entries in dCBOR canonical key order.

   *  Treats keys that have identical dCBOR encodings as duplicates,
      e.g., 10 and 10.0.

   The dCBOR decoder SHOULD return an error if it encounters misordered
   or duplicate map keys.
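   The following minimal Python sketch shows such a map type, using the
   hypothetical name `DCBORMap`.  It stores entries keyed by their dCBOR
   encoding.  As a result, keys that encode identically, such as 10 and
   10.0 after the reduction in Section 3.2, collapse into one entry, and
   a later insertion replaces an earlier one.  Iteration follows the
   bytewise order of the encoded keys.  To stay short, the sketch
   encodes only integers, integral floats, and text strings within the
   64-bit range.  A separate helper, `check_key_order`, shows the
   decoder-side error check.

```python
from typing import Any, Iterator, Union


def encode_head(major: int, argument: int) -> bytes:
    """Shortest-form CBOR head (major type + argument)."""
    initial = major << 5
    if argument < 24:
        return bytes([initial | argument])
    for additional_info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 2 ** (8 * width):
            return bytes([initial | additional_info]) + argument.to_bytes(width, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_key(key: Union[int, float, str]) -> bytes:
    """dCBOR-encode a key; supports only the types needed for this sketch."""
    if isinstance(key, float) and key.is_integer():
        key = int(key)  # Section 3.2: integral floats are encoded as integers
    if isinstance(key, int):
        return encode_head(0, key) if key >= 0 else encode_head(1, -1 - key)
    if isinstance(key, str):
        utf8 = key.encode("utf-8")
        return encode_head(3, len(utf8)) + utf8
    raise TypeError(f"unsupported key type in this sketch: {type(key).__name__}")


class DCBORMap:
    def __init__(self) -> None:
        self._entries: dict[bytes, tuple[Any, Any]] = {}

    def insert(self, key: Any, value: Any) -> None:
        # Keys with identical encodings replace one another.
        self._entries[encode_key(key)] = (key, value)

    def __len__(self) -> int:
        return len(self._entries)

    def __iter__(self) -> Iterator[tuple[Any, Any]]:
        # Canonical order: bytewise lexicographic order of encoded keys.
        for encoded in sorted(self._entries):
            yield self._entries[encoded]


def check_key_order(encoded_keys: list[bytes]) -> None:
    """Decoder-side check: encoded keys must be strictly increasing."""
    for previous, current in zip(encoded_keys, encoded_keys[1:]):
        if current == previous:
            raise ValueError("duplicateMapKey")
        if current < previous:
            raise ValueError("misorderedMapKey")
```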

6.3.  API Handling of Numeric Values

   The authors make the following recommendations:

   *  The encoder API SHOULD accept any supported numeric type for
      insertion into the CBOR stream and decide the dCBOR-conformant
      form for its encoding.

   *  The API SHOULD allow any supported numeric type to be extracted,
      and return errors when the actual type encountered is not
      representable in the requested type.  For example,

      -  If the encoded value is "1.5" then requesting extraction of the
         value as floating point will succeed but requesting extraction
         as an integer will fail.

      -  Similarly, if the value has a large exponent and therefore can
         be represented as either a floating point value or a BigNum,
         then attempting to extract it as a machine integer will fail.
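   The following Python sketch shows these extraction rules.  It
   assumes the decoder has already produced either a Python `int` or a
   Python `float` for a numeric item.  The helper names `extract_int64`
   and `extract_float` are hypothetical, and a signed 64-bit integer
   stands in for "machine integer".

```python
from typing import Union


class ExtractionError(ValueError):
    """The decoded value cannot be represented exactly in the requested type."""


INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def extract_int64(value: Union[int, float]) -> int:
    # In dCBOR, an integral value within integer range is always encoded as
    # an integer (Section 3.2), so any float here is fractional or too large.
    if isinstance(value, float):
        raise ExtractionError(f"{value!r} is not an integer")
    if not INT64_MIN <= value <= INT64_MAX:
        raise ExtractionError(f"{value} does not fit in 64 bits")
    return value


def extract_float(value: Union[int, float]) -> float:
    if isinstance(value, float):
        return value
    try:
        result = float(value)
    except OverflowError:
        raise ExtractionError(f"{value} is too large for a float") from None
    if int(result) != value:  # rounding occurred, so the value would change
        raise ExtractionError(f"{value} is not exactly representable as a float")
    return result
```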

6.4.  Validation Errors

   It is RECOMMENDED that a dCBOR decoder return errors when it
   encounters any of these conditions in the input stream:

   *  underrun: early end of stream

   *  badHeaderValue: unsupported CBOR major/minor item header

   *  nonCanonicalNumeric: An integer, floating-point value, or BigNum
      was encoded in non-canonical form

   *  invalidString: An invalid UTF-8 string was encountered

   *  unusedData: Unused data encountered past the end of the top-level
      data item

   *  misorderedMapKey: A map has keys not in canonical order

   *  duplicateMapKey: A map has a duplicate key
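   The following Python sketch shows how a small recursive decoder
   might detect each of these conditions.  It is deliberately partial.
   It handles integers, byte and text strings, arrays, maps, and the
   simple values false, true, and null.  It reports floating point
   values and tags as `badHeaderValue` rather than decoding them.  The
   error kinds reuse the names above.  `DCBORError` and `decode_dcbor`
   are hypothetical names.  Reporting CBOR_NEGATIVE_INT_MAX as
   `nonCanonicalNumeric` is an assumption, because the list has no
   dedicated entry for it.

```python
from typing import Any


class DCBORError(ValueError):
    """A dCBOR validation error; `kind` is one of the condition names."""

    def __init__(self, kind: str, detail: str = "") -> None:
        super().__init__(f"{kind}: {detail}" if detail else kind)
        self.kind = kind


def _shortest_head(major: int, argument: int) -> bytes:
    initial = major << 5
    if argument < 24:
        return bytes([initial | argument])
    for additional_info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 2 ** (8 * width):
            return bytes([initial | additional_info]) + argument.to_bytes(width, "big")
    raise ValueError("argument does not fit in 64 bits")


class _Reader:
    def __init__(self, data: bytes) -> None:
        self.data, self.pos = data, 0

    def take(self, count: int) -> bytes:
        if self.pos + count > len(self.data):
            raise DCBORError("underrun")
        chunk = self.data[self.pos : self.pos + count]
        self.pos += count
        return chunk

    def read_head(self) -> tuple[int, int]:
        start = self.pos
        initial = self.take(1)[0]
        major, additional_info = initial >> 5, initial & 0x1F
        if major == 7:
            return major, additional_info  # simple values: checked by caller
        if additional_info < 24:
            argument = additional_info
        elif additional_info <= 27:
            argument = int.from_bytes(self.take(1 << (additional_info - 24)), "big")
        else:  # 28-30 are reserved; 31 (indefinite length) is not dCBOR
            raise DCBORError("badHeaderValue", hex(initial))
        if self.data[start : self.pos] != _shortest_head(major, argument):
            raise DCBORError("nonCanonicalNumeric", "argument not in shortest form")
        return major, argument

    def read_item(self) -> Any:
        start = self.pos
        major, argument = self.read_head()
        if major == 0:
            return argument
        if major == 1:
            if argument == 2**64 - 1:  # CBOR_NEGATIVE_INT_MAX (Section 3.5)
                raise DCBORError("nonCanonicalNumeric", "CBOR_NEGATIVE_INT_MAX")
            return -1 - argument
        if major == 2:
            return self.take(argument)
        if major == 3:
            try:
                return self.take(argument).decode("utf-8")
            except UnicodeDecodeError:
                raise DCBORError("invalidString") from None
        if major == 4:
            return [self.read_item() for _ in range(argument)]
        if major == 5:
            # Maps become lists of (key, value) pairs so any key type works.
            pairs: list[tuple[Any, Any]] = []
            previous_key = None
            for _ in range(argument):
                key_start = self.pos
                key = self.read_item()
                encoded_key = self.data[key_start : self.pos]
                if previous_key is not None and encoded_key == previous_key:
                    raise DCBORError("duplicateMapKey")
                if previous_key is not None and encoded_key < previous_key:
                    raise DCBORError("misorderedMapKey")
                previous_key = encoded_key
                pairs.append((key, self.read_item()))
            return pairs
        if major == 7 and argument in (20, 21, 22):
            return {20: False, 21: True, 22: None}[argument]
        raise DCBORError("badHeaderValue", hex(self.data[start]))


def decode_dcbor(data: bytes) -> Any:
    """Decode one complete dCBOR item, rejecting any trailing bytes."""
    reader = _Reader(data)
    item = reader.read_item()
    if reader.pos != len(data):
        raise DCBORError("unusedData")
    return item
```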

7.  Reference Implementations

   This section is informative.

   The current reference implementations that conform to these
   specifications are:

   *  Swift implementation [SwiftDCBOR]

   *  Rust implementation [RustDCBOR]

8.  Security Considerations

   This document inherits the security considerations of CBOR [RFC8949].

   Vulnerabilities regarding dCBOR will revolve around whether an
   attacker can find value in either:

   *  producing semantically different documents that are serialized
      using identical byte streams, or

   *  producing semantically equivalent documents that are nonetheless
      serialized into non-identical byte streams

   The first consideration is unlikely, because identical byte streams
   decode to identical data items (the Law of Identity: A is A).  The
   second consideration could indicate the failure of a dCBOR decoder
   to correctly validate according to this document, or the failure of
   the developer to properly specify or implement application-level
   requirements for dCBOR.  Whether these possibilities present an
   identifiable attack surface is an open question that developers
   should consider.

9.  IANA Considerations

   This document makes no requests of IANA.

   We considered requesting a new media type [RFC6838] for deterministic
   CBOR, e.g., application/d+cbor, but chose not to pursue this as all
   dCBOR is well-formed CBOR.  Therefore, existing CBOR codecs can read
   dCBOR, and many existing codecs can also write dCBOR if the encoding
   rules are observed.  Protocols that adopt dCBOR will simply have more
   stringent requirements for the CBOR they emit and ingest.

10.  References

10.1.  Normative References

   [IEEE754]  IEEE, "IEEE Standard for Floating-Point Arithmetic",
              IEEE Std 754-2019, DOI 10.1109/IEEESTD.2019.8766229,
              <https://ieeexplore.ieee.org/document/8766229>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
              Specifications and Registration Procedures", BCP 13,
              RFC 6838, DOI 10.17487/RFC6838, January 2013,
              <https://www.rfc-editor.org/rfc/rfc6838>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

   [RFC8949]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", STD 94, RFC 8949,
              DOI 10.17487/RFC8949, December 2020,
              <https://www.rfc-editor.org/rfc/rfc8949>.

10.2.  Informative References

   [NAN]      "NaN", n.d., <https://en.wikipedia.org/wiki/NaN>.

   [RustDCBOR]
              "Deterministic CBOR ("dCBOR") for Rust.", n.d.,
              <https://github.com/BlockchainCommons/bc-dcbor-rust>.

   [SUBNORMAL]
              "Subnormal number", n.d.,
              <https://en.wikipedia.org/wiki/Subnormal_number>.

   [SwiftDCBOR]
              "Deterministic CBOR ("dCBOR") for Swift.", n.d.,
              <https://github.com/BlockchainCommons/BCSwiftDCBOR>.

Acknowledgments

   TODO acknowledge.

Authors' Addresses

   Wolf McNally
   Blockchain Commons
   Email: wolf@wolfmcnally.com

   Christopher Allen
   Blockchain Commons
   Email: christophera@lifewithalacrity.com
