Workgroup: Network Working Group
Internet-Draft: draft-mcnally-deterministic-cbor-05
Published: August 2023
Intended Status: Experimental
Expires: 9 February 2024
Authors: W. McNally (Blockchain Commons), C. Allen (Blockchain Commons)

Gordian dCBOR: A Deterministic CBOR Application Profile

Abstract

CBOR (RFC 8949) defines "Deterministically Encoded CBOR" in its Section 4.2. The present document provides the application profile "dCBOR" that can be used to help achieve interoperable deterministic encoding.

Discussion Venues

This note is to be removed before publishing as an RFC.

Source for this draft and an issue tracker can be found at https://github.com/BlockchainCommons/WIPs-IETF-draft-deterministic-cbor.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 9 February 2024.

1. Introduction

CBOR [RFC8949] has many advantages over other data serialization formats. One of its strengths is specifications and guidelines for serializing data deterministically, such that multiple agents serializing the same data automatically achieve consensus on the exact byte-level form of that serialized data. This is particularly useful when data must be compared for semantic equivalence by comparing the hash of its contents.

Nonetheless, determinism is an opt-in feature of CBOR, and most existing CBOR codecs leave the primary burden of correct deterministic serialization, and of validating deterministic encoding during deserialization, to the engineer. This document specifies a set of requirements for the application profile "dCBOR" that MUST be implemented at the codec level. These requirements include but go beyond [RFC8949] §4.2.

1.1. Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Application Profile

The dCBOR Application Profile specifies the use of Deterministic Encoding as defined in Section 4.2 of [RFC8949] together with some application-level rules specified in this section.

The application-level rules specified here do not "fork" CBOR. A dCBOR implementation produces well-formed, deterministically encoded CBOR according to [RFC8949], and existing generic CBOR decoders will therefore be able to decode it, including those that check for deterministic encoding. Similarly, generic CBOR encoders will be able to produce valid dCBOR if an application hands them data-model-level information that conforms to dCBOR.

Note that the separation between standard CBOR processing and the processing required by the dCBOR application profile is a conceptual one: Both dCBOR processing and standard CBOR processing may be combined into a unified dCBOR/CBOR codec. The requirements in this document apply to encoding or decoding of dCBOR data, regardless of whether the codec is a unified dCBOR/CBOR codec operating in dCBOR-compliant modes, or a single-purpose dCBOR codec. Both of these are generically referred to as "dCBOR codecs" in this document.

This application profile is intended to be used in conjunction with an application, which typically will use a subset of CBOR, which in turn influences which subset of the application profile is used. As a result, this application profile places no direct requirement on what subset of CBOR is implemented. For instance, there is no requirement that dCBOR implementations support floating point numbers (or any other kind of number, such as arbitrary precision integers or 64-bit negative integers) when they are used with applications that do not use them. However, this document does place requirements on dCBOR implementations that support negative 64-bit integers and 64-bit or smaller floating point numbers.

2.1. Base Requirements

dCBOR encoders MUST only emit CBOR conforming to the requirements "Core Deterministic Encoding Requirements" of [RFC8949] §4.2.1. To summarize,

dCBOR encoders:

  1. MUST encode variable-length integers using the shortest form possible.
  2. MUST encode floating-point values using the shortest form that preserves the value.
  3. MUST NOT encode indefinite-length arrays or maps.
  4. MUST sort map keys in bytewise lexicographic order of their deterministic encodings.
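
The encoder rules above can be illustrated with a minimal sketch. The function names `encode_head` and `encode_map` are hypothetical and are not taken from any reference implementation. The sketch assumes that map keys and values have already been encoded to bytes.

```python
import struct


def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head (major type plus argument) in the shortest form."""
    if not 0 <= major <= 7:
        raise ValueError("major type must be in 0..7")
    if not 0 <= arg < 2**64:
        raise ValueError("argument must fit in 64 bits")
    mt = major << 5
    if arg < 24:
        return bytes([mt | arg])                    # argument in the initial byte
    if arg < 2**8:
        return bytes([mt | 24, arg])                # 1-byte argument
    if arg < 2**16:
        return bytes([mt | 25]) + struct.pack(">H", arg)
    if arg < 2**32:
        return bytes([mt | 26]) + struct.pack(">I", arg)
    return bytes([mt | 27]) + struct.pack(">Q", arg)


def encode_map(pairs: list[tuple[bytes, bytes]]) -> bytes:
    """Encode a definite-length map from already-encoded (key, value) pairs.

    Keys are sorted in bytewise lexicographic order of their encodings.
    """
    ordered = sorted(pairs, key=lambda kv: kv[0])
    return encode_head(5, len(ordered)) + b"".join(k + v for k, v in ordered)
```

For instance, `encode_map([(b"\x61\x61", b"\x01"), (b"\x0a", b"\x02")])` places the key 10 (0x0a) before the key "a" (0x6161), because 0x0a sorts first bytewise.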

In addition, dCBOR decoders:

  1. MUST reject any variable length integers that are not encoded in the shortest form possible.
  2. MUST reject any floating-point values that are not encoded in the shortest form that preserves the value.
  3. MUST reject any indefinite-length arrays or maps.
  4. MUST reject any maps whose keys are not sorted in bytewise lexicographic order of their deterministic encodings.
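
On the decoding side, the shortest-form and indefinite-length checks might look like the following sketch. The name `decode_head` is hypothetical. The sketch only validates heads of major types 0 through 6. Major type 7 heads (simple values and floating point) are passed through unchecked here, since they need the separate checks described later in this document.

```python
def decode_head(data: bytes, pos: int = 0) -> tuple[int, int, int]:
    """Return (major type, argument, next position), or raise ValueError."""
    if pos >= len(data):
        raise ValueError("truncated input")
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    if info < 24:
        return major, info, pos + 1
    if info == 31:
        raise ValueError("indefinite-length item or break is not allowed")
    if info > 27:
        raise ValueError("reserved additional information value")
    size = 1 << (info - 24)                 # 1, 2, 4 or 8 argument bytes
    raw = data[pos + 1 : pos + 1 + size]
    if len(raw) != size:
        raise ValueError("truncated input")
    arg = int.from_bytes(raw, "big")
    if major != 7:
        # Smallest argument that genuinely needs this many bytes:
        # 24 for 1 byte, 2^8 for 2 bytes, 2^16 for 4 bytes, 2^32 for 8 bytes.
        minimum = 24 if size == 1 else 1 << (4 * size)
        if arg < minimum:
            raise ValueError("argument is not encoded in the shortest form")
    return major, arg, pos + 1 + size
```

With this sketch, `decode_head(b"\x18\x18")` yields the integer 24, while `decode_head(b"\x18\x17")` is rejected because 23 fits in the initial byte.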

2.2. Duplicate Map Keys

Standard CBOR [RFC8949] defines maps with duplicate keys as invalid, but leaves how to handle such cases to the implementor (§2.2, §3.1, §5.4, §5.6).

dCBOR encoders:

  1. MUST NOT emit CBOR that contains duplicate map keys.

dCBOR decoders:

  1. MUST reject encoded maps with duplicate keys.
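
Because map keys must also be sorted (Section 2.1), a decoder can detect duplicates and ordering errors in a single pass by requiring the encoded keys to be strictly ascending. The following is a sketch under that assumption. The function name `check_map_keys` is hypothetical, and the keys are assumed to be given as their encoded bytes in the order they appear in the map.

```python
def check_map_keys(encoded_keys: list[bytes]) -> None:
    """Raise ValueError unless the encoded keys are strictly ascending."""
    for previous, current in zip(encoded_keys, encoded_keys[1:]):
        if previous == current:
            raise ValueError("duplicate map key")
        if previous > current:
            raise ValueError("map keys are not in bytewise lexicographic order")
```

Python compares `bytes` objects bytewise and lexicographically, which matches the ordering required of map keys.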

2.3. Numeric Reduction

dCBOR codecs that support floating point numbers (CBOR major type 7):

  1. MUST support floating point [IEEE754] binary16 as the most-preferred encoding for floating point values, followed by binary32, then binary64.

dCBOR encoders that support floating point numbers:

  1. MUST reduce floating point values with no fractional part to the shortest integer encoding that can accurately represent them.
  2. MUST reduce floating point values with a non-zero fractional part to the shortest floating point encoding that can accurately represent them.

dCBOR decoders that support floating point numbers:

  1. MUST reject any encoded floating point values that are not encoded as the shortest encoding that can accurately represent them.

The above rules still produce well-formed CBOR according to the standard, and all existing generic decoders will be able to read it. They do, however, exclude a map such as the following from being validated as dCBOR, even though it would be allowed in standard CBOR:

{
   10: "ten",
   10.0: "floating ten"
}

This map is not valid dCBOR because:

  • 10.0 is an invalid numeric value in dCBOR, and
  • using the unsigned integer value 10 more than once as a map key is not allowed.
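
The encoder-side reduction described in this section can be sketched as follows. The names `encode_int` and `encode_number` are hypothetical. The integer range used (from -2^63 up to 2^64 - 1) and the canonical NaN follow the rules given elsewhere in this document. The sketch relies on Python's `struct` module, whose `e`, `f`, and `d` formats correspond to binary16, binary32, and binary64.

```python
import math
import struct


def encode_int(n: int) -> bytes:
    """Shortest-form CBOR integer (major type 0 or 1)."""
    major, arg = (0, n) if n >= 0 else (1, -1 - n)
    if arg < 24:
        return bytes([major << 5 | arg])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([major << 5 | info]) + arg.to_bytes(size, "big")
    raise ValueError("integer does not fit in a 64-bit CBOR argument")


def encode_number(value: float) -> bytes:
    """Reduce a float to an integer if possible, else to the shortest float."""
    if math.isnan(value):
        return bytes.fromhex("f97e00")              # canonical NaN
    if value.is_integer() and -(2**63) <= value < 2**64:
        return encode_int(int(value))               # no fractional part
    for fmt, initial in ((">e", 0xF9), (">f", 0xFA), (">d", 0xFB)):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue                                # too large for this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial]) + packed        # round-trips exactly
    raise AssertionError("unreachable: binary64 always round-trips")
```

Under these assumptions, `encode_number(10.0)` produces the same single byte 0x0a as the integer 10, which is why the two keys in the map above collide.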

2.3.1. Reduction of Negative Zero

[IEEE754] defines a negative zero value -0.0.

dCBOR encoders that support floating point:

  1. MUST reduce all negative zero values to the integer value 0.

dCBOR decoders that support floating point:

  1. MUST reject any encoded negative zero values.

Therefore with dCBOR, 0.0, -0.0, and 0 all encode to the same canonical single-byte value 0x00.
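
A decoder can recognize an encoded negative zero purely from its bit pattern: the sign bit is set and every other bit is zero, at any of the three floating point widths. The following sketch assumes the initial bytes 0xf9, 0xfa, and 0xfb for binary16, binary32, and binary64 as defined in [RFC8949]. The function name `is_negative_zero` is hypothetical.

```python
FLOAT_WIDTHS = {0xF9: 2, 0xFA: 4, 0xFB: 8}   # initial byte -> payload size


def is_negative_zero(item: bytes) -> bool:
    """Return True if item is exactly one encoded floating point -0.0."""
    size = FLOAT_WIDTHS.get(item[0]) if item else None
    if size is None or len(item) != 1 + size:
        return False
    return item[1:] == b"\x80" + b"\x00" * (size - 1)
```

A dCBOR decoder would reject any item for which this check returns true. Note that an encoded positive 0.0 is also not valid dCBOR, but for a different reason: the numeric reduction rule requires it to be encoded as the integer 0.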

2.3.2. Reduction of NaNs and Infinities

[IEEE754] defines the NaN (Not a Number) value [NAN]. This is usually divided into two types: quiet NaNs and signalling NaNs, and the most significant bit of the trailing significand field is conventionally used to distinguish between these two types (the sign bit of a NaN carries no such meaning). The specification also includes a range of "payload" bits. These bit fields have no definite purpose and could be used to break determinism or exfiltrate data.

dCBOR encoders that support floating point:

  1. MUST reduce all NaN values to the binary16 quiet NaN value having the canonical bit pattern 0x7e00.
  2. MUST reduce all +INF values to the binary16 +INF having the canonical bit pattern 0x7c00.
  3. MUST reduce all -INF values to the binary16 -INF having the canonical bit pattern 0xfc00.

dCBOR decoders that support floating point:

  1. MUST reject any encoded NaN values not having the canonical bit pattern 0x7e00.
  2. MUST reject any encoded +INF values not having the canonical bit pattern 0x7c00.
  3. MUST reject any encoded -INF values not having the canonical bit pattern 0xfc00.
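
The NaN and infinity rules can be sketched as a pair of helpers. The names `encode_special` and `check_special` are hypothetical. The sketch assumes that the full encoded item is the initial byte 0xf9 (binary16) followed by the canonical bit pattern, and that wider encodings use the initial bytes 0xfa and 0xfb as defined in [RFC8949].

```python
import math
import struct

CANONICAL_NAN = bytes.fromhex("f97e00")
CANONICAL_POS_INF = bytes.fromhex("f97c00")
CANONICAL_NEG_INF = bytes.fromhex("f9fc00")
FLOAT_FORMATS = {0xF9: ">e", 0xFA: ">f", 0xFB: ">d"}


def encode_special(value: float) -> bytes:
    """Encode NaN or an infinity in its canonical binary16 form."""
    if math.isnan(value):
        return CANONICAL_NAN                 # sign and payload are discarded
    if math.isinf(value):
        return CANONICAL_POS_INF if value > 0 else CANONICAL_NEG_INF
    raise ValueError("value is neither NaN nor an infinity")


def check_special(item: bytes) -> None:
    """Raise ValueError for a NaN or infinity that is not canonical."""
    fmt = FLOAT_FORMATS.get(item[0]) if item else None
    if fmt is None or len(item) != 1 + struct.calcsize(fmt):
        raise ValueError("not a single encoded floating point item")
    value = struct.unpack(fmt, item[1:])[0]
    if math.isnan(value) and item != CANONICAL_NAN:
        raise ValueError("non-canonical NaN")
    if math.isinf(value) and item not in (CANONICAL_POS_INF, CANONICAL_NEG_INF):
        raise ValueError("non-canonical infinity")
```

Comparing the whole encoded item against the canonical bytes rejects both wider encodings (for example a binary32 NaN) and binary16 NaNs that carry a payload or a set sign bit.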

2.4. 65-bit Negative Integers

The largest negative integer that can be represented in 64-bit two's complement (STANDARD_NEGATIVE_INT_MAX) is -2^63 (0x8000000000000000).

However, standard CBOR major type 1 can encode negative integers as low as CBOR_NEGATIVE_INT_MAX, which is -2^64 (two's complement: 0x10000000000000000, CBOR: 0x3BFFFFFFFFFFFFFFFF).

Negative integers in the range [CBOR_NEGATIVE_INT_MAX ... STANDARD_NEGATIVE_INT_MAX - 1] require 65 bits of precision, and are thus not representable in typical machine-sized integers.

Because of this incompatibility between standard CBOR and typical machine-size representations, dCBOR disallows encoding negative integer values in the range [CBOR_NEGATIVE_INT_MAX ... STANDARD_NEGATIVE_INT_MAX - 1].

dCBOR encoders:

  1. MUST NOT encode these values as CBOR major type 1.

dCBOR decoders:

  1. MUST reject these encoded major type 1 CBOR values.
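
As a sketch of the decoder-side check: a major type 1 item with argument n represents the value -1 - n, so the excluded range corresponds to arguments of 2^63 and above. The function name `decode_negative` is hypothetical. The two constants reuse the names introduced in this section.

```python
STANDARD_NEGATIVE_INT_MAX = -(2**63)   # lowest 64-bit two's complement value
CBOR_NEGATIVE_INT_MAX = -(2**64)       # lowest value major type 1 can encode


def decode_negative(arg: int) -> int:
    """Convert a major type 1 argument to its value, enforcing the dCBOR range."""
    if not 0 <= arg < 2**64:
        raise ValueError("argument must fit in 64 bits")
    value = -1 - arg
    if value < STANDARD_NEGATIVE_INT_MAX:
        raise ValueError("negative integer requires 65 bits of precision")
    return value
```

The largest accepted argument is 2^63 - 1, which decodes to -2^63. The argument 0xFFFFFFFFFFFFFFFF, which decodes to CBOR_NEGATIVE_INT_MAX in standard CBOR, is rejected.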

2.5. Simple Values

CBOR Major Type 7 includes the floating point values (0xf9, 0xfa, 0xfb) and also the "simple values" false (0xf4), true (0xf5), and null (0xf6).

dCBOR encoders:

  1. MUST NOT encode major type 7 values other than false, true, null, and the floating point values.

dCBOR decoders:

  1. MUST reject any encoded major type 7 values other than false, true, null, and the floating point values.
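
Since every permitted major type 7 item is identified by its initial byte alone, the check can be sketched as a set lookup. The name `check_major7` is hypothetical. The floating point initial bytes 0xf9 (binary16), 0xfa (binary32), and 0xfb (binary64) are those defined in [RFC8949].

```python
ALLOWED_MAJOR7 = {
    0xF4,  # false
    0xF5,  # true
    0xF6,  # null
    0xF9,  # binary16 floating point
    0xFA,  # binary32 floating point
    0xFB,  # binary64 floating point
}


def check_major7(initial: int) -> None:
    """Raise ValueError if a major type 7 initial byte is not allowed in dCBOR."""
    if initial >> 5 != 7:
        raise ValueError("not a major type 7 initial byte")
    if initial not in ALLOWED_MAJOR7:
        raise ValueError("major type 7 value not permitted in dCBOR")
```

This rejects, for example, undefined (0xf7), simple values carried in a following byte (0xf8), and the break stop code (0xff).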

2.6. All Requirements are Narrowing

Any apparent conflict between the requirements above is resolved by understanding that all of the requirements in this document are narrowing, meaning that starting from the CBOR specification [RFC8949] each requirement herein narrows the set of valid dCBOR encodings.

For example: due to the requirements in §2.4, there are no valid dCBOR major type 1 values that can encode negative integers requiring more than 64 bits of precision, hence reduction of negative floating point values with no fractional part to negative integers (§2.3) is narrowed to the range of valid dCBOR major type 1 negative integer encodings. Therefore any negative floating point values with no fractional part that fall outside this range are encoded as floating point values (§2.3).
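
The narrowed reduction rule can be expressed as a single predicate. This is a sketch, and the name `reduces_to_integer` is hypothetical. It assumes the valid dCBOR integer range runs from -2^63 (the lower bound set by the 65-bit negative integer rule) up to 2^64 - 1 (the largest major type 0 value).

```python
def reduces_to_integer(value: float) -> bool:
    """True if a float must be encoded as a dCBOR integer rather than a float."""
    return value.is_integer() and -(2**63) <= value < 2**64


# Whole-number floats around the boundaries of the integer range.
for candidate in (-(2.0**63), -(2.0**64), float(2**64 - 2048), 2.0**64):
    kind = "integer" if reduces_to_integer(candidate) else "floating point"
    print(f"{candidate!r} is encoded as {kind}")
```

The value -2^63 is reduced to a major type 1 integer, while -2^64 has no fractional part but falls outside the valid range and so remains a floating point value.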

3. Reference Implementations

This section is informative.

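These are single-purpose dCBOR codecs that conform to these specifications:

  • Swift implementation [SwiftDCBOR].
  • Rust implementation [RustDCBOR].
  • TypeScript implementation [TypescriptDCBOR].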

4. Security Considerations

This document inherits the security considerations of CBOR [RFC8949].

Vulnerabilities regarding dCBOR will revolve around whether an attacker can find value in producing semantically equivalent documents that are nonetheless serialized into non-identical byte streams. Such documents could be used to contain malicious payloads or exfiltrate sensitive data. The ability to create such documents could indicate the failure of a dCBOR decoder to correctly validate according to this document, or the failure of the developer to properly specify or implement application protocol requirements using dCBOR. Whether these possibilities present an identifiable attack surface is a question that developers should consider.

5. IANA Considerations

This document makes no requests of IANA.

6. Other Approaches

As of this writing the specification of deterministic CBOR beyond [RFC8949] is an active item before the CBOR working group. [BormannDCBOR] and [RundgrenDCBOR] are other approaches to deterministic CBOR.

7. References

7.1. Normative References

[IEEE754]
IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE Std 754-2019, DOI 10.1109/IEEESTD.2019.8766229, <https://ieeexplore.ieee.org/document/8766229>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/rfc/rfc8174>.
[RFC8949]
Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", STD 94, RFC 8949, DOI 10.17487/RFC8949, <https://www.rfc-editor.org/rfc/rfc8949>.

7.2. Informative References

[BormannDCBOR]
"dCBOR – an Application Profile for Use with CBOR Deterministic Encoding", n.d., <https://www.ietf.org/archive/id/draft-bormann-cbor-dcbor-00.html>.
[NAN]
"NaN", n.d., <https://en.wikipedia.org/wiki/NaN>.
[RundgrenDCBOR]
"Deterministically Encoded CBOR (D-CBOR)", n.d., <https://www.ietf.org/archive/id/draft-rundgren-deterministc-cbor-02.html>.
[RustDCBOR]
"Deterministic CBOR ("dCBOR") for Rust.", n.d., <https://github.com/BlockchainCommons/bc-dcbor-rust>.
[SwiftDCBOR]
"Deterministic CBOR ("dCBOR") for Swift.", n.d., <https://github.com/BlockchainCommons/BCSwiftDCBOR>.
[TypescriptDCBOR]
"Deterministic CBOR ("dCBOR") for Typescript.", n.d., <https://github.com/BlockchainCommons/bc-dcbor-ts>.

Acknowledgments

The authors are grateful for the contributions of Carsten Bormann, Joe Hildebrand, and Anders Rundgren in the CBOR working group.

Authors' Addresses

Wolf McNally
Blockchain Commons
Christopher Allen
Blockchain Commons