Internet-Draft | Envelope | December 2022 |
McNally & Allen | Expires 4 June 2023 |
- Workgroup: Network Working Group
- Internet-Draft: draft-mcnally-envelope-00
- Published: December 2022
- Intended Status: Experimental
- Expires: 4 June 2023
The Envelope Structured Data Format
Abstract
The envelope protocol specifies a structured format for hierarchical binary data, designed so that the data can be transmitted in a privacy-preserving way. Envelopes are designed to facilitate "smart documents" and have a number of unique features, including: easy representation of a variety of semantic structures, a built-in Merkle-like digest tree, deterministic representation using CBOR, and the ability for the holder of a document to selectively encrypt or elide specific parts of a document without invalidating the document structure, including the digest tree, or any cryptographic signatures that rely on it.
Discussion Venues
This note is to be removed before publishing as an RFC.¶
Source for this draft and an issue tracker can be found at https://github.com/BlockchainCommons/envelope-internet-draft.¶
Status of This Memo
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 4 June 2023.¶
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
1. Introduction
Gordian Envelope was designed with two key goals in mind: to be Structure-Ready, allowing for the reliable and interoperable storage of information; and to be Privacy-Ready, ensuring that transmission of that data can occur in a privacy-protecting manner.
- Structure-Ready. Gordian Envelope is designed as a Smart Document, meant to store information about a subject. More than that, it's a meta-document that can contain or refer to other documents. It can support multiple data formats, from simple hierarchical structures to labeled property graphs, semantic triples, and other forms of structured graphs. Though its fundamental structure is a tree, it can be used to create Directed Acyclic Graphs (DAGs) through references between Envelopes.¶
- Privacy-Ready. Gordian Envelope protects the privacy of its data through progressive trust, allowing for holders to minimally disclose information by using elision or encryption, and then to optionally increase that disclosure over time. The fact that a holder can control data revelation, not just an issuer, creates a new level of privacy for all stakeholders. The progressive trust in Gordian Envelopes is accomplished through hashing of all elements, which creates foundational support for cryptographic functions such as signing and encryption, without actually defining which cryptographic functions must be used.¶
The following architectural decisions support these goals:¶
- Structured Merkle Tree. The digests of the elements in the Envelope are arranged into a tree of digests, forming a variant of the Merkle Tree structure. (In this "structured Merkle Tree", all nodes contain both semantic content and digests, rather than semantic content being limited to leaves.)
- Deterministic Representation. There is only one way to encode any semantic representation within a Gordian Envelope. This is accomplished through the use of Deterministic CBOR and the sorting of the Envelope by hashes to create a lexicographic order. Any Envelope that doesn't follow these strict rules can be rejected; as a result, there's no need to worry about different people adding the assertions in a different order or at different times: if two Envelopes contain the same data, they will be encoded the same way.¶
1.1. Elision Support
- Elision of All Elements. Gordian Envelopes innately support elision of any part of their data, including subjects, predicates, and objects.
- Elision, Compression, and Encryption. Elision can be used for a variety of purposes including redaction (removing information), compression (removing duplicate information), and encryption (enciphering information).¶
- Holder-initiated Elision. Elision can be performed by the Holder of a Gordian Envelope, not just the Issuer.¶
- Granular Holder Control. Elision can not only be performed by any Holder, but also for any data, allowing each entity to elide data as is appropriate for the management of their personal (or business) risk.¶
- Progressive Trust. The elision mechanics in Gordian Envelopes allow for progressive trust, where increasing amounts of data are revealed over time, and can be combined with encryption to escrow data to later be revealed.¶
- Consistent Hashing. Even when elided or encrypted, hashes for those parts of the Gordian Envelope remain the same.¶
1.2. Privacy Support
- Proof of Inclusion. As an alternative to presenting elided structures, proofs of inclusion can be included in top-level hashes.¶
- Herd Privacy. Proofs of inclusion allow for herd privacy where all members of a class can share data such as a VC or DID without revealing individual information.¶
- Non-Correlation. Encrypted Gordian Envelope data can optionally be made less correlatable with the addition of salt.¶
1.3. Authentication Support
- Symmetric Key Permits. Gordian Envelopes can be locked ("closed") using a symmetric key.¶
- SSKR Permits. Gordian Envelopes can alternatively be locked ("closed") using a symmetric key sharded with Shamir's Secret Sharing, with the shares stored with copies of the Envelope, and the whole envelope thus openable if copies of the Envelope with a quorum of different shares are gathered.
- Public Key Permits. Gordian Envelopes can alternatively be locked ("closed") with a public key and then be opened with the associated private key, or vice versa.¶
- Multiple Permits. Gordian Envelopes can simultaneously be locked ("closed") via a variety of means and then openable by any appropriate individual method, with different methods likely held by different people.¶
1.4. Future Looking
- Data Storage. The initial inspiration for Gordian Envelopes was for secure data storage.¶
- Credentials & Presentations. The usage of Gordian Envelope signing techniques allows for the creation of credentials and the ability to present them to different verifiers in different ways.¶
- Distributed or Decentralized Identifiers. Self-Certifying Identifiers (SCIDs) can be created and shared with peers, certified with a trust authority, or registered on a blockchain.
- Future Techniques. Beyond its technical specifics, Gordian Envelope still allows for cl-sigs, bbs+, and other privacy-preserving techniques such as zk-proofs, differential privacy, etc.
- Cryptography Agnostic. Generally, the Gordian Envelope architecture is cryptography agnostic, allowing it to work with everything from older algorithms with silicon support through more modern algorithms suited to blockchains and to future zk-proof or quantum-attack resistant cryptographic choices. These choices are made in sets via ciphersuites.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This specification makes use of the following terminology:¶
3. Envelope Format Specification
This section is normative and specifies the binary format of envelopes in terms of their CBOR components and their sequencing. The formal language used is the Concise Data Definition Language (CDDL) [RFC8610]. To be considered a well-formed envelope, a sequence of bytes MUST be well-formed deterministic CBOR [RFC8949] and MUST conform to the specifications in this section.
3.1. Top Level
An envelope is a tagged enumerated type with seven cases. Four of these cases have no children: leaf, known-value, encrypted, and elided.
Two of these cases, encrypted and elided, "declare" their digest, i.e., they actually encode their digest in the envelope serialization. For all other cases, their digest is implicit in the data itself and may be computed and cached by implementations when an envelope is deserialized.
The other three cases have one or more children:¶
- The node case has a child for its subject and an additional child for each of its assertions.
- The wrapped-envelope case has exactly one child: the envelope that has been wrapped.
- The assertion case has exactly two children: the predicate and the object.
    envelope = #6.200(
        envelope-content
    )

    envelope-content = (
        leaf /
        known-value /
        encrypted /
        elided /
        node /
        wrapped-envelope /
        assertion
    )
3.2. Cases Without Children
3.2.1. Leaf Case Format
A leaf case is used when the envelope contains only user-defined CBOR content. It is tagged using #6.24, per [RFC8949] section 3.4.5.1, "Encoded CBOR Data Item".
leaf = #6.24(bytes)¶
3.2.2. Known Value Case Format
A known-value case is used to specify an unsigned integer in a namespace of well-known values. Known values are frequently used as predicates. Any envelope can be used as a predicate in an assertion, but many predicates are commonly used, e.g., verifiedBy for signatures, hence it is desirable to keep common predicates short.
known-value = #6.223(uint)¶
3.2.3. Encrypted Case Format
An encrypted case is used for an envelope that has been encrypted using an Authenticated Encryption with Associated Data (AEAD), and where the digest of the plaintext is declared by the encrypted structure's Additional Authenticated Data (AAD) field. This subsection specifies the construct used in the current reference implementation and is informative.
encrypted = crypto-msg¶
For crypto-msg, the reference implementation [ENVELOPE-REFIMPL] uses the definition in "UR Type Definition for Secure Messages" [CRYPTO-MSG] and we repeat the salient specification here. This format specifies the use of "ChaCha20 and Poly1305 for IETF Protocols" as described in [RFC8439]. When used with envelopes, the crypto-msg construct's aad (additional authenticated data) field contains the digest of the plaintext, authenticating the declared digest using the Poly1305 MAC.

    crypto-msg = #6.201([ ciphertext, nonce, auth, ? aad ])

    ciphertext = bytes       ; encrypted using ChaCha20
    aad = digest             ; Additional Authenticated Data
    nonce = bytes .size 12   ; Random, generated at encryption-time
    auth = bytes .size 16    ; Authentication tag created by Poly1305
3.2.4. Elided Case Format
An elided case is used as a placeholder for an element that has been elided: the element itself is removed, and its digest, produced by a cryptographic hash algorithm, is left in its place. This subsection specifies the construct used in the current reference implementation and is informative.
elided = digest¶
For digest, the reference implementation [ENVELOPE-REFIMPL] uses the BLAKE3 cryptographic hash function [BLAKE3] to generate a 32-byte digest.

    digest = #6.203(blake3-digest)

    blake3-digest = bytes .size 32
3.3. Cases With Children
3.3.1. Node Case Format
A node case is encoded as a CBOR array, and MUST be used when one or more assertions are present on the envelope. It MUST NOT be present when there is not at least one assertion. The first element of the array is the envelope's subject, followed by one or more assertion-elements, each of which MUST be an assertion, or the encrypted or elided transformation of that assertion. The assertion elements MUST appear in ascending lexicographic order by their digest. The array MUST NOT contain any assertion elements with identical digests.

    node = [envelope-content, + assertion-element]

    assertion-element = ( assertion / encrypted / elided )
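The ordering and uniqueness rules for assertion elements can be checked in a single pass, because a strictly ascending sequence cannot contain two identical digests. The following Python sketch is illustrative only and is not taken from the reference implementation: the function name `check_assertion_order` is hypothetical, and it assumes the digests are already available as `bytes` values. The sample digests are those of the three "knows" assertions in the Node Case Digest Calculation example.

```python
def check_assertion_order(assertion_digests):
    """Return True if the digests are in strictly ascending lexicographic order.

    Python compares bytes objects lexicographically by unsigned byte value.
    Requiring strict inequality also rejects identical digests.
    """
    pairs = zip(assertion_digests, assertion_digests[1:])
    return all(a < b for a, b in pairs)


# Digests of the "knows" assertions from the node digest example.
bob = bytes.fromhex(
    "55560bdf060f1220199c87e84e29cecef96ef811de4f399dab2fde9425d0d418")
carol = bytes.fromhex(
    "71a3069088c61c928f54ec50859f3f09b9318e9ca6734e6a3b5f77aa3159a711")
edward = bytes.fromhex(
    "1e0b049b8d2b21d4bb32f90b4a9e6b5031526f868da303268a9c1c75c0082446")

print(check_assertion_order([edward, bob, carol]))  # True
print(check_assertion_order([bob, carol, edward]))  # False: not ascending
print(check_assertion_order([bob, bob]))            # False: duplicate digest
```

A decoder could run a check of this kind on each node it reads and reject the envelope when it fails.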
3.3.2. Wrapped Envelope Case Format
A wrapped-envelope case is used where an envelope including all its assertions should be treated as a single element, e.g. for the purpose of signing.
wrapped-envelope = #6.224(envelope-content)¶
3.3.3. Assertion Case Format
An assertion case is used for each of the assertions in an envelope. It is encoded as a CBOR array with exactly two elements in order:
- the envelope representing the predicate of the assertion, followed by¶
- the envelope representing the object of the assertion.¶
    assertion = #6.221([predicate-envelope, object-envelope])

    predicate-envelope = envelope
    object-envelope = envelope
4. Computing the Digest Tree
This section specifies how the digests for each of the envelope cases are computed. The minimum size of the digest and order of operations specified is normative, but the specific cryptographic hash algorithm used by the reference implementation [BLAKE3] is informative. When implementing using BLAKE3, the examples in this section may be used as test vectors.¶
Each of the seven enumerated envelope cases produces an image which is used as input to a cryptographic hash function to produce a digest of its contents.¶
The overall digest of an envelope is the digest of its specific case.¶
In this and subsequent sections:
- digest(image) is the BLAKE3 hash function that produces a 32-byte digest.
- The .digest attribute is the digest of the named element computed as specified herein.
- The || operator represents concatenation of byte sequences.
4.1. Leaf Case Digest Calculation
The leaf case consists of any CBOR object. Tagging the leaf CBOR is RECOMMENDED, especially for compound structures with a specified layout. The envelope image is the CBOR serialization of that object:
digest(cbor)¶
4.1.1. Example
The CBOR serialization of the plaintext string "Hello" (not including the quotes) is 6548656C6C6F. The following command line calculates the BLAKE3 sum of this sequence:

    $ echo "6548656C6C6F" | xxd -r -p | b3sum --no-names
    bd6c78899fc1f22c667cfe6893aa2414f8124f25ae6ea80a1a66c2d1d6b455ea
Using the envelope command line tool [ENVELOPE-CLI], we create an envelope with this string as the subject and display the envelope's digest. The digest below matches the one above.¶
    $ envelope subject "Hello" | envelope digest --hex
    bd6c78899fc1f22c667cfe6893aa2414f8124f25ae6ea80a1a66c2d1d6b455ea
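To show where the bytes 6548656C6C6F come from, the following Python sketch hand-encodes a short UTF-8 text string as CBOR per [RFC8949]. It is an illustration only: the helper name `encode_short_text` is hypothetical, and it handles only strings shorter than 24 bytes, where the length fits in the initial byte. Computing the BLAKE3 digest is left to a tool such as b3sum, since BLAKE3 is not part of the Python standard library.

```python
def encode_short_text(text):
    """Encode a text string of fewer than 24 UTF-8 bytes as CBOR.

    Major type 3 (text string) occupies the top three bits of the initial
    byte (0x60); the low five bits carry the byte length when it is < 24.
    """
    data = text.encode("utf-8")
    if len(data) >= 24:
        raise ValueError("this sketch only handles strings under 24 bytes")
    return bytes([0x60 | len(data)]) + data


print(encode_short_text("Hello").hex().upper())  # 6548656C6C6F
```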
4.2. Known Value Case Digest Calculation
The envelope image of the known-value case is the CBOR serialization of the value's unsigned integer tagged with #6.223, as specified in the Known Value Case Format section above.
digest(#6.223(uint))¶
4.2.1. Example
The known value verifiedBy in CBOR diagnostic notation is 223(3), which in hex is D8DF03. The BLAKE3 sum of this sequence is:

    $ echo "D8DF03" | xxd -r -p | b3sum --no-names
    d59f8c0ffd798eac7602d1dfb15c457d8e51c3ce34d499e5d2a4fbd2cfe3773f
Using the envelope command line tool [ENVELOPE-CLI], we create an envelope with this known value as the subject and display the envelope's digest. The digest below matches the one above.¶
    $ envelope subject --known verifiedBy | envelope digest --hex
    d59f8c0ffd798eac7602d1dfb15c457d8e51c3ce34d499e5d2a4fbd2cfe3773f
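The three bytes D8DF03 can be derived by hand from the CBOR head encoding rules in [RFC8949]. The Python sketch below is a minimal illustration, not the reference implementation; the helper names `encode_head` and `encode_known_value` are hypothetical, and only arguments below 65536 are handled. It always picks the shortest head, as deterministic CBOR requires.

```python
def encode_head(major_type, argument):
    """Encode a CBOR head (major type plus argument) in its shortest form."""
    initial = major_type << 5
    if argument < 24:
        return bytes([initial | argument])
    if argument < 0x100:
        return bytes([initial | 24, argument])
    if argument < 0x10000:
        return bytes([initial | 25]) + argument.to_bytes(2, "big")
    raise ValueError("this sketch only handles arguments below 65536")


def encode_known_value(value):
    """Encode #6.223(uint): tag 223 (major type 6) around a uint (major type 0)."""
    return encode_head(6, 223) + encode_head(0, value)


print(encode_known_value(3).hex().upper())  # D8DF03
```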
4.3. Encrypted Case Digest Calculation
The encrypted case declares its digest to be the digest of the plaintext before encryption. The declaration is authenticated using a MAC, and when decrypting an element the implementation MUST compare the digest of the decrypted element to the declared digest and flag an error if they do not match.
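The comparison required after decryption can be sketched as follows. This is a hedged illustration, not the reference implementation: the names `check_declared_digest` and `stand_in_hash` are hypothetical, and SHA-256 is used purely as a stand-in because BLAKE3 is not in the Python standard library. The constant-time comparison is a cautious choice, not something this document requires.

```python
import hashlib
import hmac


def check_declared_digest(plaintext, declared_digest, hash_fn):
    """Raise ValueError unless hash_fn(plaintext) equals the declared digest."""
    actual = hash_fn(plaintext)
    # hmac.compare_digest compares the two byte strings in constant time.
    if not hmac.compare_digest(actual, declared_digest):
        raise ValueError("decrypted content does not match declared digest")
    return actual


def stand_in_hash(data):
    # ASSUMPTION: SHA-256 stands in for BLAKE3 here; both yield 32 bytes.
    return hashlib.sha256(data).digest()


plaintext = bytes.fromhex("6548656C6C6F")  # CBOR serialization of "Hello"
declared = stand_in_hash(plaintext)
check_declared_digest(plaintext, declared, stand_in_hash)  # passes silently
```

With a BLAKE3 binding substituted for `stand_in_hash`, the declared digest would be the value carried in the aad field of the encrypted case.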
4.3.1. Example
If we create the envelope from the leaf example above, encrypt it, and then request its digest:¶
    $ KEY=`envelope generate key`
    $ envelope subject "Hello" | \
        envelope encrypt --key $KEY | \
        envelope digest --hex
    bd6c78899fc1f22c667cfe6893aa2414f8124f25ae6ea80a1a66c2d1d6b455ea
...we see that its digest is the same as its plaintext form:¶
    $ envelope subject "Hello" | envelope digest --hex
    bd6c78899fc1f22c667cfe6893aa2414f8124f25ae6ea80a1a66c2d1d6b455ea
4.4. Elided Case Digest Calculation
The elided case declares its digest to be the digest of the envelope for which it is a placeholder.
4.4.1. Example
If we create the envelope from the leaf example above, elide it, and then request its digest:¶
    $ envelope subject "Hello" | envelope elide | envelope digest --hex
    bd6c78899fc1f22c667cfe6893aa2414f8124f25ae6ea80a1a66c2d1d6b455ea
...we see that its digest is the same as its unelided form:¶
    $ envelope subject "Hello" | envelope digest --hex
    bd6c78899fc1f22c667cfe6893aa2414f8124f25ae6ea80a1a66c2d1d6b455ea
4.5. Node Case Digest Calculation
The envelope image of the node case is the concatenation of the digest of its subject and the digests of its assertions sorted in ascending lexicographic order.
With a node case, there MUST always be at least one assertion.
digest(subject.digest || assertion-0.digest || assertion-1.digest || ... || assertion-n.digest)¶
4.5.1. Example
We create four separate envelopes and display their digests:¶
    $ SUBJECT=`envelope subject "Alice"`
    $ envelope digest --hex $SUBJECT
    278403504ad3a9a9c24c1b35a3673eee165a5d523f8d2a5cf5ce6dd25a37f110
    $ ASSERTION_0=`envelope subject assertion "knows" "Bob"`
    $ envelope digest --hex $ASSERTION_0
    55560bdf060f1220199c87e84e29cecef96ef811de4f399dab2fde9425d0d418
    $ ASSERTION_1=`envelope subject assertion "knows" "Carol"`
    $ envelope digest --hex $ASSERTION_1
    71a3069088c61c928f54ec50859f3f09b9318e9ca6734e6a3b5f77aa3159a711
    $ ASSERTION_2=`envelope subject assertion "knows" "Edward"`
    $ envelope digest --hex $ASSERTION_2
    1e0b049b8d2b21d4bb32f90b4a9e6b5031526f868da303268a9c1c75c0082446
We combine the envelopes into a single envelope with three assertions:¶
    $ ENVELOPE=`envelope assertion add envelope $ASSERTION_0 $SUBJECT | \
        envelope assertion add envelope $ASSERTION_1 | \
        envelope assertion add envelope $ASSERTION_2`
    $ envelope $ENVELOPE
    "Alice" [
        "knows": "Bob"
        "knows": "Carol"
        "knows": "Edward"
    ]
    $ envelope digest --hex $ENVELOPE
    0abac60ae3a45a8a7b448b309cca30bdd747f42f508a9a97ea64d657d1f7ea81
Note that in the envelope notation representation above, the assertions are sorted alphabetically, with "knows": "Edward" coming last. But internally, the three assertions are ordered by digest in ascending lexicographic order, with "Edward" coming first because its digest starting with 1e0b049b is the lowest, as in the tree formatted display below:

    $ envelope --tree $ENVELOPE
    0abac60a NODE
        27840350 subj "Alice"
        1e0b049b ASSERTION
            7092d620 pred "knows"
            d5a375ff obj "Edward"
        55560bdf ASSERTION
            7092d620 pred "knows"
            9a771715 obj "Bob"
        71a30690 ASSERTION
            7092d620 pred "knows"
            ad2c454b obj "Carol"
To replicate this, we make a list of digests, starting with the subject, and then each assertion's digest in ascending lexicographic order:¶
    278403504ad3a9a9c24c1b35a3673eee165a5d523f8d2a5cf5ce6dd25a37f110
    1e0b049b8d2b21d4bb32f90b4a9e6b5031526f868da303268a9c1c75c0082446
    55560bdf060f1220199c87e84e29cecef96ef811de4f399dab2fde9425d0d418
    71a3069088c61c928f54ec50859f3f09b9318e9ca6734e6a3b5f77aa3159a711
We then calculate the BLAKE3 hash of the concatenation of these four digests, and note that this is the same digest as the composite envelope's digest:¶
    $ echo "278403504ad3a9a9c24c1b35a3673eee165a5d523f8d2a5cf5ce6dd2\
    5a37f1101e0b049b8d2b21d4bb32f90b4a9e6b5031526f868da303268a9c1c\
    75c008244655560bdf060f1220199c87e84e29cecef96ef811de4f399dab2f\
    de9425d0d41871a3069088c61c928f54ec50859f3f09b9318e9ca6734e6a3b\
    5f77aa3159a711" | xxd -r -p | b3sum --no-names
    0abac60ae3a45a8a7b448b309cca30bdd747f42f508a9a97ea64d657d1f7ea81
    $ envelope digest --hex $ENVELOPE
    0abac60ae3a45a8a7b448b309cca30bdd747f42f508a9a97ea64d657d1f7ea81
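The image construction used in this example can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the reference implementation: the function name `node_image` is hypothetical, and the final BLAKE3 hashing step is left out because BLAKE3 is not in the Python standard library. The sketch only builds the byte sequence that is fed to the hash.

```python
def node_image(subject_digest, assertion_digests):
    """Build the node-case image.

    The image is the subject digest followed by the assertion digests
    in ascending lexicographic order, all concatenated.
    """
    return subject_digest + b"".join(sorted(assertion_digests))


# Digests from the example above, in the order the envelopes were created.
subject = bytes.fromhex(
    "278403504ad3a9a9c24c1b35a3673eee165a5d523f8d2a5cf5ce6dd25a37f110")
bob = bytes.fromhex(
    "55560bdf060f1220199c87e84e29cecef96ef811de4f399dab2fde9425d0d418")
carol = bytes.fromhex(
    "71a3069088c61c928f54ec50859f3f09b9318e9ca6734e6a3b5f77aa3159a711")
edward = bytes.fromhex(
    "1e0b049b8d2b21d4bb32f90b4a9e6b5031526f868da303268a9c1c75c0082446")

image = node_image(subject, [bob, carol, edward])
print(len(image))           # 128
print(image.hex()[64:72])   # 1e0b049b
```

The resulting `image.hex()` is the same hex string passed to xxd in the command above, so hashing it with BLAKE3 should reproduce the envelope digest shown there.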
4.6. Wrapped Envelope Case Digest Calculation
The envelope image of the wrapped-envelope case is the digest of the wrapped envelope:
digest(envelope.digest)¶
4.6.1. Example
As above, we note the digest of a leaf envelope is the digest of its CBOR:¶
    $ envelope subject "Hello" | envelope digest --hex
    bd6c78899fc1f22c667cfe6893aa2414f8124f25ae6ea80a1a66c2d1d6b455ea
    $ echo "6548656C6C6F" | xxd -r -p | b3sum --no-names
    bd6c78899fc1f22c667cfe6893aa2414f8124f25ae6ea80a1a66c2d1d6b455ea
Now we note that the digest of a wrapped envelope is the digest of the wrapped envelope's digest:¶
    $ envelope subject "Hello" | \
        envelope subject --wrapped | \
        envelope digest --hex
    55d4e04399c54bec23346ebf612bf237e659a72e34df14420e18e0290f28c31b
    $ echo "bd6c78899fc1f22c667cfe6893aa2414f8124f25ae6ea80a1a66\
    c2d1d6b455ea" | xxd -r -p | b3sum --no-names
    55d4e04399c54bec23346ebf612bf237e659a72e34df14420e18e0290f28c31b
4.7. Assertion Case Digest Calculation
The envelope image of the assertion case is the concatenation of the digests of the assertion's predicate and object in that order:
digest(predicate.digest || object.digest)¶
4.7.1. Example
We create an assertion from two separate envelopes and display their digests:¶
    $ PREDICATE=`envelope subject "knows"`
    $ envelope digest --hex $PREDICATE
    7092d62002c3d0f3c889058092e6915bad908f03263c2dc91bfea6fd8ee62fab
    $ OBJECT=`envelope subject "Bob"`
    $ envelope digest --hex $OBJECT
    9a7717153d7a31b0390011413bdf9500ff4d8870ccf102ae31eaa165ab25df1a
    $ ASSERTION=`envelope subject assertion "knows" "Bob"`
    $ envelope digest --hex $ASSERTION
    55560bdf060f1220199c87e84e29cecef96ef811de4f399dab2fde9425d0d418
To replicate this, we make a list of the predicate digest and the object digest, in that order:¶
    7092d62002c3d0f3c889058092e6915bad908f03263c2dc91bfea6fd8ee62fab
    9a7717153d7a31b0390011413bdf9500ff4d8870ccf102ae31eaa165ab25df1a
We then calculate the BLAKE3 hash of the concatenation of these two digests, and note that this is the same digest as the composite envelope's digest:¶
    $ echo "7092d62002c3d0f3c889058092e6915bad908f03263c2dc91bfea6fd8e\
    e62fab9a7717153d7a31b0390011413bdf9500ff4d8870ccf102ae31eaa165ab\
    25df1a" | xxd -r -p | b3sum --no-names
    55560bdf060f1220199c87e84e29cecef96ef811de4f399dab2fde9425d0d418
    $ envelope digest --hex $ASSERTION
    55560bdf060f1220199c87e84e29cecef96ef811de4f399dab2fde9425d0d418
5. Envelope Hierarchy
This section is informative and describes envelopes from the perspective of their hierarchical structure and the various ways they can be formatted.
An envelope consists of a subject and one or more predicate-object pairs called assertions:

    subject [
        predicate0: object0
        predicate1: object1
        ...
        predicateN: objectN
    ]
A concrete example of this might be:

    "Alice" [
        "knows": "Bob"
        "knows": "Carol"
        "knows": "Edward"
    ]
In the diagram above, there are five distinct "positions" of elements, each of which is itself an envelope and which therefore produces its own digest:¶
The examples above are printed in "envelope notation," which is designed to make the semantic content of envelopes human-readable, but it doesn't show the actual digests associated with each of the positions. To see the structure more completely, we can display every element of the envelope in Tree Notation:¶
    0abac60a NODE
        27840350 subj "Alice"
        1e0b049b ASSERTION
            7092d620 pred "knows"
            d5a375ff obj "Edward"
        55560bdf ASSERTION
            7092d620 pred "knows"
            9a771715 obj "Bob"
        71a30690 ASSERTION
            7092d620 pred "knows"
            ad2c454b obj "Carol"
We can also show the digest tree graphically using Mermaid [MERMAID]:¶
For easy recognition, envelope trees and Mermaid diagrams only show the first four bytes of each digest, but internally all digests are 32 bytes.¶
From the above envelope and its tree, we make the following observations:¶
- The envelope is a node case, which holds the overall envelope digest.
- The subject "Alice" has its own digest.
- Each of the three assertions has its own digest.
- The predicate and object of each assertion each have their own digests.
- The assertions appear in the structure in ascending lexicographic order by digest, which is distinct from envelope notation, where they appear sorted alphabetically.
The following subsections present each of the seven enumerated envelope cases in five different output formats:¶
These examples may be used as test vectors. In addition, each subsection starts with the envelope command line [ENVELOPE-CLI] needed to generate the envelope being formatted.¶
5.1. Leaf Case
5.1.1. Envelope CLI Command Line
envelope subject "Alice"¶
5.1.2. Envelope Notation
"Alice"¶
5.1.5. CBOR Diagnostic Notation
    200(   ; envelope
       24("Alice")   ; leaf
    )
5.2. Known Value Case
5.2.1. Envelope CLI Command Line
envelope subject --known verifiedBy¶
5.2.2. Envelope Notation
verifiedBy¶
5.2.5. CBOR Diagnostic Notation
    200(   ; envelope
       223(3)   ; known-value
    )
5.3. Encrypted Case
5.3.1. Envelope CLI Command Line
    envelope subject "Alice" | envelope encrypt \
        --key `envelope generate key`
5.3.2. Envelope Notation
ENCRYPTED¶
5.3.5. CBOR Diagnostic Notation
    200(   ; envelope
       201(   ; crypto-msg
          [
             h'6bfa027df241def0',
             h'5520ca6d9d798ffd32d075c4',
             h'd4b43d97a37eb280fdd89cf152ccf57d',
             h'd8cb5820278403504ad3a9a9c24c1b35a3673eee165a5d523f8d2a5cf5ce6dd25a37f110'
          ]
       )
    )
5.4. Elided Case
5.4.1. Envelope CLI Command Line
envelope subject "Alice" | envelope elide¶
5.4.2. Envelope Notation
ELIDED¶
5.4.5. CBOR Diagnostic Notation
    200(   ; envelope
       203(   ; crypto-digest
          h'278403504ad3a9a9c24c1b35a3673eee165a5d523f8d2a5cf5ce6dd25a37f110'
       )
    )
5.5. Node Case
5.5.1. Envelope CLI Command Line
envelope subject "Alice" | envelope assertion "knows" "Bob"¶
5.5.2. Envelope Notation

    "Alice" [
        "knows": "Bob"
    ]
5.5.3. Tree
    e54d6fd3 NODE
        27840350 subj "Alice"
        55560bdf ASSERTION
            7092d620 pred "knows"
            9a771715 obj "Bob"
5.5.5. CBOR Diagnostic Notation
    200(   ; envelope
       [
          200(   ; envelope
             24("Alice")   ; leaf
          ),
          200(   ; envelope
             221(   ; assertion
                [
                   200(   ; envelope
                      24("knows")   ; leaf
                   ),
                   200(   ; envelope
                      24("Bob")   ; leaf
                   )
                ]
             )
          )
       ]
    )
5.6. Wrapped Envelope Case
5.6.1. Envelope CLI Command Line
envelope subject "Alice" | envelope subject --wrapped¶
5.6.2. Envelope Notation
{ "Alice" }¶
5.6.5. CBOR Diagnostic Notation
    200(   ; envelope
       224(   ; wrapped-envelope
          24("Alice")   ; leaf
       )
    )
5.7. Assertion Case
5.7.1. Envelope CLI Command Line
envelope subject assertion "knows" "Bob"¶
5.7.2. Envelope Notation
"knows": "Bob"¶
5.7.5. CBOR Diagnostic Notation
    200(   ; envelope
       221(   ; assertion
          [
             200(   ; envelope
                24("knows")   ; leaf
             ),
             200(   ; envelope
                24("Bob")   ; leaf
             )
          ]
       )
    )
6. Known Values
This section is informative.¶
Known values are a specific case of envelope that defines a namespace consisting of single unsigned integers. The expectation is that the most common and widely useful predicates will be assigned in this namespace, but known values may be used in any position in an envelope.¶
Most of the examples in this document use UTF-8 strings as predicates, but in real-world applications the same predicate may be used many times in a document and across a body of knowledge. Since the size of an envelope is proportionate to the size of its content, a predicate made using a string like a human-readable sentence or a URL could take up a great deal of space in a typical envelope. Even emplacing the digest of a known structure takes 32 bytes. Known values provide a way to compactly represent predicates and other common values in as few as three bytes.¶
Other CBOR tags can be used to define completely separate namespaces if desired, but the reference implementation [ENVELOPE-REFIMPL] and its tools [ENVELOPE-CLI] recognize specific known values and their human-readable names.¶
Custom ontologies such as Web Ontology Language [OWL] or Friend of a Friend [FOAF] may someday be represented as ranges of integers in this known space, or be defined in their own namespaces.¶
A specification for a standard minimal ontology of known values is TBD.¶
The following table lists all the known values currently defined in the reference implementation [ENVELOPE-REFIMPL]. This list is currently informative, but all these known values have been used in the reference implementation for various examples and test vectors.¶
Note that a work-in-progress specification for remote procedure calls using envelope has been assigned a namespace starting at 100.¶
Value | Name | Used as | Description |
---|---|---|---|
1 | id | predicate | A domain-unique identifier of some kind. |
2 | isA | predicate | A domain-specific type identifier. |
3 | verifiedBy | predicate | A signature on the digest of the subject, verifiable with the signer's public key. |
4 | note | predicate | A human-readable informative note. |
5 | hasRecipient | predicate | A sealed message encrypting to a specific recipient the ephemeral encryption key that was used to encrypt the subject. |
6 | sskrShare | predicate | A single SSKR [SSKR] share of the ephemeral encryption key that was used to encrypt the subject. |
7 | controller | predicate | A domain-unique identifier of the party that controls the contents of this document. |
8 | publicKeys | predicate | A "public key base" consisting of the information needed to encrypt messages to a party or verify messages signed by them. |
9 | dereferenceVia | predicate | A domain-unique pointer such as a URL indicating from where the elided envelope subject can be recovered. |
10 | entity | predicate | A document representing an entity of interest in the current context. |
11 | hasName | predicate | The human-readable name of the subject. |
12 | language | predicate | The ISO 639 [ISO639] code for the human natural language used to write the subject. |
13 | issuer | predicate | A domain-unique identifier of the document's issuing entity. |
14 | holder | predicate | A domain-unique identifier of the document's holder, i.e., the entity to which the document pertains. |
15 | salt | predicate | A block of random data used to deliberately perturb the digest tree for the purpose of decorrelation. |
16 | date | predicate | A timestamp, e.g., the time at which a remote procedure call request was signed. |
100 | body | predicate | RPC: The body of a function call. The object is the function identifier and the assertions on the object are the function parameters. |
101 | result | predicate | RPC: A result of a successful function call. The object is the returned value. |
102 | error | predicate | RPC: A result of an unsuccessful function call. The object is a message or other diagnostic state. |
103 | ok | object | RPC: The object of a result predicate for a successful remote procedure call that has no other return value. |
104 | processing | object | RPC: The object of a result predicate where a function call is accepted for processing and has not yet produced a result or error. |
7. Existence Proofs
This section is informative.¶
Because each element of an envelope provides a unique digest, and because changing an element in an envelope changes the digest of all elements upwards towards its root, the structure of an envelope is comparable to a Merkle tree [MERKLE].¶
In a Merkle tree, all semantically significant information is carried by the tree's leaves (for example, the transactions in a block of Bitcoin transactions) while the internal nodes of the tree are nothing but digests computed from combinations of pairs of lower nodes, all the way up to the root of the tree (the "Merkle root").¶
In an envelope, every digest references some semantically significant content: it could reference the subject of the envelope, or one of the assertions in the envelope, or the predicate or object of a given assertion. Of course, those elements are all envelopes themselves, and thus potentially the root of their own subtree.¶
In a Merkle tree, the minimum subset of hashes necessary to confirm that a specific leaf node (the "target") must be present is called a "Merkle proof." For envelopes, an analogous proof would be a transformation of the envelope that is entirely elided but preserves the structure necessary to reveal the target.¶
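For comparison, the classic Merkle proof described above can be sketched in Python. This is a minimal illustration and not part of the envelope specification: SHA-256 stands in for the digest function, an unpaired node is promoted unchanged to the next level, and the names build_levels, make_proof, and verify are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Digest helper. SHA-256 is a stand-in chosen for this sketch."""
    return hashlib.sha256(data).digest()


def build_levels(leaves: list) -> list:
    """Return every level of the tree, from the leaf digests up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(h(level[i] + level[i + 1]))
            else:
                nxt.append(level[i])  # assumption: odd node is promoted unchanged
        level = nxt
        levels.append(level)
    return levels


def make_proof(levels: list, index: int) -> list:
    """Collect the sibling digests on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            # Record the sibling and whether it sits to the left of our node.
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    """Recompute the root from the leaf and its siblings, then compare."""
    digest = h(leaf)
    for sibling, sibling_is_left in proof:
        digest = h(sibling + digest) if sibling_is_left else h(digest + sibling)
    return digest == root
```

Only the sibling digests along one path are disclosed, so the proof grows with the depth of the tree rather than with the number of leaves.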
As an example, we produce an envelope representing a simple FOAF [FOAF] style graph:¶
$ ALICE_FRIENDS=`envelope subject Alice |
    envelope assertion knows Bob |
    envelope assertion knows Carol |
    envelope assertion knows Dan`
$ envelope $ALICE_FRIENDS
"Alice" [
    "knows": "Bob"
    "knows": "Carol"
    "knows": "Dan"
]
We then elide the entire envelope, leaving only the root-level digest. This digest is a cryptographic commitment to the envelope's contents.¶
$ COMMITMENT=`envelope elide $ALICE_FRIENDS`
$ envelope --tree $COMMITMENT
cd84aa96 ELIDED
A third party, having received this commitment, can then request proof that the envelope contains a particular assertion, called the target.¶
$ REQUESTED_ASSERTION=`envelope subject assertion knows Bob`
$ envelope --tree $REQUESTED_ASSERTION
55560bdf ASSERTION
    7092d620 pred "knows"
    9a771715 obj "Bob"
The holder can then produce a proof, which is an elided form of the original document that contains a minimum spanning set of digests including the target.¶
$ KNOWS_BOB_DIGEST=`envelope digest $REQUESTED_ASSERTION`
$ KNOWS_BOB_PROOF=`envelope proof create $ALICE_FRIENDS \
    $KNOWS_BOB_DIGEST`
$ envelope --tree $KNOWS_BOB_PROOF
cd84aa96 NODE
    27840350 subj ELIDED
    55560bdf ELIDED
    71a30690 ELIDED
    907c8857 ELIDED
Note that the proof:¶
- has the same root digest as the commitment,¶
- includes the digest of the knows-Bob assertion: 55560bdf,¶
- includes only the other digests necessary to calculate the digest tree from the target back to the root, without revealing any additional information about the envelope.¶
The third criterion was met when the proof was produced. The first and second criteria are checked by the command line tool when confirming the proof:¶
$ envelope proof confirm --silent $COMMITMENT $KNOWS_BOB_PROOF \
    $KNOWS_BOB_DIGEST && echo "Success"
Success
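The create and confirm steps shown above can be modeled on a toy envelope in Python. This is a sketch under stated assumptions, not the reference implementation: SHA-256 stands in for BLAKE3, leaf values are hashed as raw UTF-8 rather than as CBOR, a node digest is taken over the subject digest followed by the sorted assertion digests, and all class and function names are hypothetical. Digests produced here therefore differ from those printed by the command line tool.

```python
import hashlib
from dataclasses import dataclass


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for BLAKE3


@dataclass(frozen=True)
class Leaf:
    value: str

    @property
    def digest(self) -> bytes:
        return h(self.value.encode())  # toy encoding, not CBOR


@dataclass(frozen=True)
class Elided:
    digest: bytes  # only the digest survives elision


@dataclass(frozen=True)
class Assertion:
    pred: object
    obj: object

    @property
    def digest(self) -> bytes:
        return h(self.pred.digest + self.obj.digest)


@dataclass(frozen=True)
class Node:
    subject: object
    assertions: tuple

    @property
    def digest(self) -> bytes:
        parts = sorted(a.digest for a in self.assertions)
        return h(self.subject.digest + b"".join(parts))


def digests_in(env):
    """Yield the digest of every element still visible in the tree."""
    yield env.digest
    if isinstance(env, Node):
        yield from digests_in(env.subject)
        for a in env.assertions:
            yield from digests_in(a)
    elif isinstance(env, Assertion):
        yield from digests_in(env.pred)
        yield from digests_in(env.obj)


def make_proof(env, target: bytes):
    """Elide everything except the structure on the path to the target."""
    if env.digest == target or target not in set(digests_in(env)):
        return Elided(env.digest)
    if isinstance(env, Node):
        return Node(make_proof(env.subject, target),
                    tuple(make_proof(a, target) for a in env.assertions))
    return Assertion(make_proof(env.pred, target), make_proof(env.obj, target))


def confirm_proof(commitment, proof, target: bytes) -> bool:
    """Same root digest as the commitment, and the target digest is present."""
    return (proof.digest == commitment.digest
            and target in set(digests_in(proof)))
```

Because elision replaces an element with its digest, the proof's root digest is recomputed from the same inputs as the original, which is why the first check can succeed without revealing any content.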
8. Reference Implementation
This section is informative.¶
The current reference implementation of envelope is written in Swift and is part of the Blockchain Commons Secure Components Framework [ENVELOPE-REFIMPL].¶
The envelope command line tool [ENVELOPE-CLI] is also written in Swift.¶
9. Future Proofing
This section is informative.¶
Because envelope is a specification for documents that may persist indefinitely, it is a design goal of this specification that later implementation versions are able to parse envelopes produced by earlier versions. Furthermore, later implementations should be able to compose new envelopes using older envelopes as components.¶
The authors considered adding a version number to every envelope, but deemed this unnecessary as any code that parses later envelopes can determine what features are required from the CBOR structure alone.¶
The general migration strategy is that the specific structure of envelopes defined in the first general release of this specification is the baseline, and later specifications may incrementally add structural features such as envelope cases, new tags, or support for new structures or algorithms, but are generally expected to maintain backward compatibility.¶
An example of such an addition would be support for an additional method of encryption. The CDDL of the crypto-msg specification defines a CBOR array with either three or four elements:¶
crypto-msg = #6.201([ ciphertext, nonce, auth, ? aad ])

ciphertext = bytes       ; encrypted using ChaCha20
aad = digest             ; Additional Authenticated Data
nonce = bytes .size 12   ; Random, generated at encryption-time
auth = bytes .size 16    ; Authentication tag created by Poly1305
For the sake of this example we assume the new method to be supported has all the same fields, but needs to be processed differently. In this case, the first element of the array could become an optional integer:¶
crypto-msg = #6.201([ ? version, ciphertext, nonce, auth, ? aad ])

version = uint           ; absent for old method, 1 for new method
If present, the first field specifies the later encryption method. If absent, the original encryption method is specified. For version numbers below 24, which CBOR encodes in a single byte, the storage cost of specifying a later version is one byte, and backwards compatibility is preserved.¶
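A parser can tell the two layouts apart from the CBOR structure alone, because the first element is an unsigned integer only when a version is present. The following Python sketch illustrates this for the hypothetical extension above. It operates on the array after CBOR decoding, treats aad as an opaque value, and the names parse_crypto_msg and encoded_uint_size are assumptions made for this example.

```python
def parse_crypto_msg(fields: list) -> dict:
    """Interpret the array carried inside tag 201, already decoded from CBOR."""
    version = None  # absent means the original encryption method
    if fields and isinstance(fields[0], int) and not isinstance(fields[0], bool):
        version, fields = fields[0], fields[1:]
    if len(fields) not in (3, 4):
        raise ValueError("expected ciphertext, nonce, auth and optional aad")
    ciphertext, nonce, auth = fields[:3]
    aad = fields[3] if len(fields) == 4 else None
    if not all(isinstance(f, bytes) for f in (ciphertext, nonce, auth)):
        raise ValueError("ciphertext, nonce and auth must be byte strings")
    if len(nonce) != 12 or len(auth) != 16:
        raise ValueError("nonce must be 12 bytes and auth 16 bytes")
    return {"version": version, "ciphertext": ciphertext,
            "nonce": nonce, "auth": auth, "aad": aad}


def encoded_uint_size(n: int) -> int:
    """Bytes needed for a CBOR unsigned integer in shortest form (RFC 8949)."""
    if n < 0:
        raise ValueError("not an unsigned integer")
    if n < 24:
        return 1   # value fits in the initial byte
    if n < 2 ** 8:
        return 2
    if n < 2 ** 16:
        return 3
    if n < 2 ** 32:
        return 5
    return 9
```

Rejecting any array that matches neither layout keeps the parser in line with the strict validation stance taken elsewhere in this document.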
10. Security Considerations
This section is informative unless noted otherwise.¶
10.1. Structural Considerations
10.1.1. CBOR Considerations
Generally, this document inherits the security considerations of CBOR [RFC8949]. Though CBOR has limited web usage, it has received strong usage in hardware, resulting in a mature specification.¶
10.2. Cryptographic Considerations
10.2.1. Inherited Considerations
Generally, this document inherits the security considerations of the cryptographic constructs it uses such as IETF-ChaCha20-Poly1305 [RFC8439] and BLAKE3 [BLAKE3].¶
10.2.2. Choice of Cryptographic Primitives (No Set Curve)
Though envelope recommends the use of certain cryptographic algorithms, most are not required (with the exception of BLAKE3 usage, noted below). In particular, envelope has no required curve. Different choices will obviously result in different security considerations.¶
10.3. Validation Requirements
Unlike HTML, envelope is intended to be conservative in both what it sends and what it accepts. This means that receivers of envelope-based documents should carefully validate them. Any deviation from the validation requirements of this specification MUST result in the rejection of the entire envelope. Even after validation, envelope contents should be treated with due skepticism.¶
10.4. Signature Considerations
This specification allows the signing of envelopes that are partially (or even entirely) elided. There may be use cases for this, such as when multiple users are each signing partially elided envelopes that will then be united. However, it's generally a dangerous practice. Our own tools require overrides to allow it. Other developers should take care to warn users of the dangers of signing elided envelopes.¶
10.5. Hashing
10.5.1. Choice of BLAKE3 Hash Primitive
Although BLAKE2 is more widely supported by IETF specifications, envelope instead makes use of BLAKE3. This is to take advantage of advances in the updated protocol: the new BLAKE3 implementation uses a Merkle Tree format that allows for streaming and for incremental updates as well as high levels of parallelism. The fact that BLAKE3 is newer should be taken into consideration, but its foundation in BLAKE2 and its support by experts such as the Zcash Foundation are considered to grant it sufficient maturity.¶
Whereas envelope is written to allow for the easy exchange of most of its cryptographic protocols, this is not true for BLAKE3: swapping in another hash protocol would result in incompatible envelopes. Thus, any security considerations related to BLAKE3 should be given careful attention.¶
10.5.2. Well-Known Hashes
Because they are short unsigned integers, well-known values produce well-known digests. Elided envelopes may in some cases inadvertently reveal information by transmitting digests that may be correlated to known information. Envelopes can be salted by adding assertions that contain random data to perturb the digest tree, hence decorrelating it from any known values.¶
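The correlation risk, and the effect of salting, can be illustrated with a short Python sketch. It is a toy model only: SHA-256 stands in for BLAKE3, well-known values are encoded as two-byte integers rather than as CBOR, the salted digest is computed as a hash over the value digest and the salt digest, and the 16-byte salt length is an arbitrary choice for the example. The function names are hypothetical.

```python
import hashlib
import os


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for BLAKE3


def guess(elided_digest: bytes, candidates: list):
    """Dictionary attack: hash each candidate and compare with the digest."""
    for candidate in candidates:
        if digest(candidate) == elided_digest:
            return candidate
    return None


def salted_digest(value: bytes, salt: bytes = None):
    """Toy salting: combine the value digest with the digest of random data."""
    if salt is None:
        salt = os.urandom(16)  # assumed length; this document sets no size
    return digest(digest(value) + digest(salt)), salt
```

An observer holding a table of candidate values recovers an unsalted value from its digest immediately, but cannot do so for the salted digest unless the salt is also revealed.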
10.5.3. Digest Trees
Existence proofs include the minimal set of digests that are necessary to calculate the digest tree from the target to the root, but may themselves leak information about the contents of the envelope due to the other digests that must be included in the spanning set. Designers of envelope-based formats should anticipate such attacks and use decorrelation mechanisms like salting where necessary.¶
10.5.4. A Tree, Not a List
Envelope makes use of a hash tree instead of a hash list to allow this sort of minimal revelation. This decision may also have advantages in scaling. However, there should be further investigation of the limitations of hash trees regarding scaling, particularly for the scaling of large, elided structures.¶
There should also be careful consideration of the best practices needed for the creation of deeply nested envelopes, for the usage of subenvelopes created at different times, and for other technical details related to the use of a potentially broad hash tree, as such best practices do not currently exist.¶
10.5.5. Salts
Specifics for the size and usage of salt are not included in this specification. There are also no requirements for whether salts should be revealed or can be elided. Careful attention may be required for these factors to ensure that they don't accidentally introduce vulnerabilities into usage.¶
10.5.6. Collisions
Hash trees tend to make it harder to create collisions than the use of a raw hash function. If attackers manage to find a collision for a hash, they can only replace one node (and its children), so the impact is limited, especially since finding collisions higher in a hash tree grows increasingly difficult because the collision must be a concatenation of multiple hashes. This should generally reduce issues with collisions: finding collisions that fit a hash tree tends to be harder than finding regular collisions. However, the issue should always be considered.¶
10.5.7. Leaf-Node Attacks
Envelope's hash tree is proof against the leaf-node weakness of Bitcoin that can affect SPVs because its predicates are an unordered set, serialized in increasing lexicographic order by digest, with no possibility for duplication and thus fully deterministic ordering of the tree.¶
See https://bitslog.com/2018/06/09/leaf-node-weakness-in-bitcoin-merkle-tree-design/ for the leaf-node attack.¶
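The deterministic ordering described above can be sketched in Python. This is an illustration under assumptions rather than the normative digest rule: SHA-256 stands in for BLAKE3, the node digest is taken over the subject digest followed by the sorted child digests, and node_digest is a hypothetical name.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for BLAKE3


def node_digest(subject_digest: bytes, child_digests) -> bytes:
    """Digest of a node whose children form an unordered, duplicate-free set."""
    children = list(child_digests)
    unique = set(children)
    if len(unique) != len(children):
        # Duplicates are rejected outright instead of being hashed twice.
        raise ValueError("duplicate digest in set")
    # Sorting by digest bytes gives one canonical order for any input order.
    return h(subject_digest + b"".join(sorted(unique)))
```

Since every insertion order yields the same byte string to hash, and a repeated child is an error, there is no second arrangement of the same children that produces the same parent digest.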
10.5.8. Forgery Attacks on Unbalanced Trees
Envelopes should also be proof against forgery attacks because of their different construction, where all nodes contain both data and hashes. Nonetheless, care must still be taken with trees, especially when also using elision, which limits visible information.¶
See https://bitcointalk.org/?topic=102395 for the forgery attack.¶
10.6. Elision
10.6.1. Duplication of Claims
Support for elision allows for the possibility of contradictory claims where one is kept hidden at any time. So, for example, an envelope could contain contradictory predictions of election results and only reveal the one that matches the actual results. As a result, revealed material should be carefully assessed for this possibility when elided material also exists.¶
10.7. Additional Specification Creation
Creators of specifications for envelope-based documents should give due consideration to security implications that are outside the scope of this specification to anticipate or avert. One example would be the number and type of assertions allowed in a particular document, and whether additional assertions (metadata) are allowed on those assertions.¶
11. IANA Considerations
11.2. Media Type
The proposed media type [RFC6838] for envelope is application/envelope+cbor.¶
- Type name: application¶
- Subtype name: envelope+cbor¶
- Required parameters: n/a¶
- Optional parameters: n/a¶
- Encoding considerations: binary¶
- Security considerations: See the previous section of this document¶
- Interoperability considerations: n/a¶
- Published specification: This document¶
- Applications that use this media type: None yet, but it is expected that this format will be deployed in protocols and applications.¶
- Additional information:¶
- Person & email address to contact for further information:¶
  - Christopher Allen christophera@blockchaincommons.com¶
  - Wolf McNally wolf@wolfmcnally.com¶
- Intended usage: COMMON¶
- Restrictions on usage: none¶
- Author:¶
  - Wolf McNally wolf@wolfmcnally.com¶
- Change controller:¶
  - The IESG iesg@ietf.org¶
12. Appendix: Why CBOR?
The Concise Binary Object Representation, or CBOR, was chosen as the foundational data structure of envelopes for a variety of reasons. These include:¶
- IETF Standardization. CBOR is a mature open international IETF standard [RFC8949].¶
- IANA Registration. CBOR is further standardized by the registration of common data type tags through IANA [IANA-CBOR-TAGS].¶
- Fully Extensible. Beyond that, CBOR is entirely extensible with any data types desired, such as our own listing of UR tags [BC-UR-TAGS].¶
- Self-describing Descriptions. CBOR-encoded data is self-describing, so there are no requirements for pre-defined schemas nor more complex descriptions such as those found in ASN.1 [ASN-1].¶
- Constraint Friendly. CBOR is built to be frugal with CPU and memory, so it works well in constrained environments such as on cryptographic silicon chips.¶
- Unambiguous Encoding. Our use of Deterministic CBOR, combined with our own specification rules, such as the sorting of Envelopes by hash, results in a singular, unambiguous encoding.¶
- Multiple Implementations. Implementations are available in a variety of languages [CBOR-IMPLS].¶
- Compact Implementations. Compactness of encoding and decoding is one of CBOR's core goals; implementations are built on headers or snippets of code, and do not require any external tools.¶
Also see a comparison to Protocol Buffers [UR-QA], a comparison to Flatbuffers [CBOR-FLATBUFFERS], and a comparison to other binary formats [CBOR-FORMAT-COMPARISON].¶
13. References
13.1. Normative References
- [BLAKE3]
- "BLAKE3 Cryptographic Hash Function", n.d., <https://blake3.io>.
- [CRYPTO-MSG]
- "UR Type Definition for Secure Messages", n.d., <https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-2022-001-secure-message.md>.
- [ENVELOPE-CLI]
- "Envelope Command Line Tool", n.d., <https://github.com/BlockchainCommons/envelope-cli-swift>.
- [ENVELOPE-REFIMPL]
- "Envelope Reference Implementation, part of the Blockchain Commons Secure Components Framework", n.d., <https://github.com/BlockchainCommons/BCSwiftSecureComponents>.
- [IANA-CBOR-TAGS]
- "IANA, Concise Binary Object Representation (CBOR) Tags", n.d., <https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml>.
- [RFC2119]
- Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/rfc/rfc2119>.
- [RFC6838]
- Freed, N., Klensin, J., and T. Hansen, "Media Type Specifications and Registration Procedures", BCP 13, RFC 6838, DOI 10.17487/RFC6838, <https://www.rfc-editor.org/rfc/rfc6838>.
- [RFC8174]
- Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/rfc/rfc8174>.
- [RFC8439]
- Nir, Y. and A. Langley, "ChaCha20 and Poly1305 for IETF Protocols", RFC 8439, DOI 10.17487/RFC8439, <https://www.rfc-editor.org/rfc/rfc8439>.
- [RFC8610]
- Birkholz, H., Vigano, C., and C. Bormann, "Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, <https://www.rfc-editor.org/rfc/rfc8610>.
- [RFC8949]
- Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", STD 94, RFC 8949, DOI 10.17487/RFC8949, <https://www.rfc-editor.org/rfc/rfc8949>.
13.2. Informative References
- [ASN-1]
- "X.680 : Information technology - Abstract Syntax Notation One (ASN.1): Specification of basic notation", n.d., <https://www.itu.int/rec/T-REC-X.680/>.
- [BC-UR-TAGS]
- "Registry of Uniform Resource (UR) Types", n.d., <https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-2020-006-urtypes.md>.
- [CBOR-FLATBUFFERS]
- "Flatbuffers vs CBOR", n.d., <https://stackoverflow.com/questions/47799396/flatbuffers-vs-cbor>.
- [CBOR-FORMAT-COMPARISON]
- "Comparison of Other Binary Formats to CBOR's Design Objectives", n.d., <https://www.rfc-editor.org/rfc/rfc8949#name-comparison-of-other-binary->.
- [CBOR-IMPLS]
- "CBOR Implementations", n.d., <http://cbor.io/impls.html>.
- [FOAF]
- "Friend of a Friend (FOAF)", n.d., <https://en.wikipedia.org/wiki/FOAF>.
- [ISO639]
- "ISO 639 - Standard for representation of names for language and language groups", n.d., <https://en.wikipedia.org/wiki/ISO_639>.
- [MERKLE]
- "Merkle Tree", n.d., <https://en.wikipedia.org/wiki/Merkle_tree>.
- [MERMAID]
- "Mermaid.js", n.d., <https://mermaid-js.github.io/mermaid/#/>.
- [OWL]
- "Web Ontology Language (OWL)", n.d., <https://www.w3.org/OWL/>.
- [SSKR]
- "Sharded Secret Key Recovery (SSKR)", n.d., <https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-2020-011-sskr.md>.
- [UR-QA]
- "UR (Uniform Resources) Q&A", n.d., <https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-2020-005-ur.md#qa>.
Acknowledgments
TODO acknowledge.¶