Authenticated Transfer Repository and Synchronization
draft-holmgren-at-repository-00

This document is an Internet-Draft (I-D). Anyone may submit an I-D to the IETF. This I-D is not endorsed by the IETF and has no formal standing in the IETF standards process.
Document	Type	Active Internet-Draft (individual)
	Authors	Daniel Holmgren , Bryan Newbold
	Last updated	2025-09-14
Network Working Group                                        D. Holmgren
Internet-Draft                                                B. Newbold
Intended status: Standards Track                          Bluesky Social
Expires: 18 March 2026                                 14 September 2025

         Authenticated Transfer Repository and Synchronization
                    draft-holmgren-at-repository-00

Abstract

   This document specifies the repository and synchronization semantics
   for Authenticated Transfer (AT), a protocol for cryptographically-
   verifiable storage and distribution of structured user-controlled
   data.  It defines the AT repository that serves as the fundamental
   data storage model.  It further specifies synchronization mechanisms
   that allow efficient distribution of repository changes to interested
   parties.

   This document specifically deals with the repository and sync
   protocol.  Overall network architecture is described further in
   [AT-ARCH].

About This Document

   This note is to be removed before publishing as an RFC.

   Status information for this document may be found at
   https://datatracker.ietf.org/doc/draft-holmgren-at-repository/.

   Source for this draft and an issue tracker can be found at
   https://github.com/bluesky-social/ietf-drafts.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

Holmgren & Newbold        Expires 18 March 2026                 [Page 1]
Internet-Draft              AT Repo and Sync              September 2025

   This Internet-Draft will expire on 18 March 2026.

Copyright Notice

   Copyright (c) 2025 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Repository
     2.1.  Repository Semantics
     2.2.  Repository Structure
     2.3.  User Identifiers
     2.4.  Commit Objects
     2.5.  MST Construction
       2.5.1.  Tree Structure
       2.5.2.  Layer Calculation
       2.5.3.  MST Construction Example
       2.5.4.  Empty Nodes
       2.5.5.  MST Node Schema
       2.5.6.  MST Node example
     2.6.  Commit Signatures
       2.6.1.  Signature Generation
       2.6.2.  Supported Curves
       2.6.3.  Signature Canonicalization
     2.7.  Deterministic CBOR Encoding
     2.8.  Repository Serialization Format
       2.8.1.  Header Format
       2.8.2.  Block Format
       2.8.3.  Block Ordering
   3.  Synchronization
     3.1.  Repository Revisions
       3.1.1.  Timestamp Identifier Format
     3.2.  Repository Diffs
     3.3.  Diff Verification Limitations
   4.  Real-time synchronization
     4.1.  Cursors
     4.2.  Streaming Events
       4.2.1.  Commit Events
       4.2.2.  Sync Events
     4.3.  Commit Validation
     4.4.  Re-synchronization
   5.  Security Considerations
     5.1.  CBOR Processing limits
     5.2.  MST Structure Attacks
     5.3.  Repository Import Validation
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Authors' Addresses

1.  Introduction

   The Authenticated Transfer (AT) repository and synchronization
   protocol addresses the challenges of building decentralized
   applications that require consistent data replication across
   distributed multi-party infrastructure.  Traditional web platforms
   maintain user data at a single network location, creating vendor
   lock-in and limiting user agency over their digital identity and
   published content.

   In the AT model, user data is stored in cryptographically signed
   repositories that can be hosted, synchronized, and distributed by any
   compatible server while preserving data authenticity and user
   ownership.  Each repository consists of a set of CBOR-encoded objects
   called records, organized lexicographically by type.  The
   cryptographic structure allows repository contents to be re-
   distributed and cached by any network participant without requiring
   trust in intermediary hosts.

   The synchronization system provides efficient mechanisms for
   propagating repository state changes across the network, supporting
   both real-time streaming updates and bulk synchronization scenarios.
   The protocol can detect dropped or withheld updates and provides
   cryptographic proofs for all operations, including record deletions,
   ensuring that consumers can maintain accurate and complete views of
   repository state.

   This document specifically deals with the repository and sync
   protocol.  Overall network architecture is described further in
   [AT-ARCH].

2.  Repository

   An AT repository provides a sorted key-value interface where values
   are CBOR-encoded objects referred to as records.  Applications
   interact with repositories through standard CRUD operations while the
   underlying Merkle tree structure enables cryptographic verification
   of all modifications.

   Repository authority is established through Decentralized Identifiers
   (DIDs).  Each repository is associated with exactly one DID, which
   resolves to the cryptographic key material necessary for verifying
   repositories.

   The repository structure provides several advantages over
   individually signed objects:

   *  Simplified key rotation through a single repository-level
      signature rather than record-level signatures

   *  Cryptographic proofs of record deletion

   *  Completeness guarantees that enable observers to detect withheld,
      missing, or outdated records

   This document describes version 3 of the AT repository format.  Both
   previous versions are deprecated, and implementations do not need to
   support them.

2.1.  Repository Semantics

   Records are discrete units of user data, each CBOR-encoded and
   identified by a unique key within the repository.  The repository is
   schema-agnostic and provides the foundational layer for higher-level
   data models and application semantics.

   Repositories support individual record operations as well as batch
   writes that group multiple operations under a single commit, or
   signed mutation, to the repository.  When applying batch operations,
   implementations should ensure that the resulting changes can be
   adequately represented within the synchronization system (see
   Section 4.2.1).

   By convention, records are organized using a hierarchical two-part
   key structure consisting of a collection identifier and a record key.
   Record keys may be derived from timestamps or other monotonically
   increasing values, ensuring that new records are typically added to
   the lexicographically rightmost position within their collection.

   Repository efficiency, especially in partial synchronization
   situations, benefits from grouping related records around
   lexicographically similar keys.  This grouping allows for structural
   sharing within the repository data structure and reduces
   cryptographic proof sizes.

2.2.  Repository Structure

   AT repositories are organized as a Merkle Search Tree
   (https://inria.hal.science/hal-02303490/document) ([MST]) with a
   cryptographically signed commit referencing the tree root.

   The MST provides several fundamental properties for repository
   operations.  As a content-addressed structure, it enables efficient
   verification of data.  The MST maintains lexicographic key ordering,
   enabling structural sharing of intermediate tree nodes for related
   records.  It is probabilistically self-balancing, offering consistent
   performance characteristics.  Additionally, the MST exhibits unicity,
   meaning that any given set of keys and values will always produce the
   same tree structure and root hash regardless of insertion order.

   Repository contents are encoded using deterministic CBOR
   serialization and organized as a directed acyclic graph where data
   objects reference each other through content hashes.  These hash-
   identified data objects, referred to as "blocks," include three
   distinct types: commit objects, MST internal nodes, and user records.

   Large binary data such as images and media files are not stored
   directly within repositories.  Instead, such data is stored
   externally and referenced in records by a hash link.

2.3.  User Identifiers

   Repository authority is established through a resolvable user
   identifier specified in the repository commit (Section 2.4).  AT
   employs Decentralized Identifiers (DIDs) as defined in [DID] for this
   purpose.

   DIDs are globally unique identifiers that resolve to DID Documents
   containing cryptographic key material and other metadata associated
   with the identifier.  Resolution enables independent verification of
   repository commits without dependence on centralized authorities.

   Each repository must reference exactly one DID, and each DID may be
   associated with at most one AT repository.

   The signing key for repository commits is specified within the DID
   document's verificationMethod array.  The key entry must have an id
   field ending in #atproto.  When multiple possible verification
   methods are present, implementations must use the first valid entry
   and ignore subsequent ones.  The public key must be encoded using the
   publicKeyMultibase format as specified in [CONTROLLEDID].  The
   signing key must use one of the signing algorithms described in
   Section 2.6.2.
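
   The selection rule above can be sketched as follows.  This is a
   minimal illustration, not a normative procedure: the function name
   find_signing_key is hypothetical, the DID document is assumed to be
   already resolved and parsed into a dictionary, and "valid" is
   interpreted narrowly as "the id ends in #atproto and a non-empty
   publicKeyMultibase string is present".  Checking that the key uses a
   supported curve is left out.

```python
from typing import Optional


def find_signing_key(did_document: dict) -> Optional[str]:
    """Return the publicKeyMultibase of the first usable '#atproto' entry.

    Hypothetical helper: the notion of a "valid" entry used here is an
    assumption made for illustration.
    """
    for method in did_document.get("verificationMethod", []):
        if not isinstance(method, dict):
            continue
        method_id = method.get("id")
        key = method.get("publicKeyMultibase")
        id_matches = isinstance(method_id, str) and method_id.endswith("#atproto")
        key_present = isinstance(key, str) and key != ""
        if id_matches and key_present:
            # The first matching entry wins; later ones are ignored.
            return key
    return None
```

   Returning None rather than raising leaves it to the caller to decide
   whether a repository without a usable signing key is an error.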

   DID resolution may return supplementary information beyond the
   signing key, including canonical repository hosting locations,
   alternative user identifiers, or relevant service endpoints.

   To ensure interoperability, AT restricts support to specific DID
   methods.  Currently supported methods are did:web and did:plc.  The
   resolution mechanisms and specifications for these methods are
   described in [DIDWEB] and [DIDPLC].

2.4.  Commit Objects

   Commit objects serve as the authoritative root of each repository,
   establishing cryptographic ownership and providing a verifiable
   reference to the state of a repository at a particular point in time.
   Each commit is digitally signed by the repository owner and contains
   metadata necessary for verification and synchronization.

   A commit object contains the following data fields:

   *  *did* (string, required): The resolvable user identifier
      associated with the repository as described in Section 2.3

   *  *version* (integer, required): Repository format version, fixed
      value of *3* for the current specification

   *  *data* (hash link, required): Hash pointer to the root of the
      repository’s MST structure

   *  *rev* (string, required): Repository revision identifier that
      functions as a logical clock and must increase monotonically (see
      Section 3.1)

   *  *prev* (hash link, nullable): Optional pointer to the previous
      commit object in the repository's history chain.  While included
      for backward compatibility with version 2 repositories, this field
      is typically null in version 3 implementations

   *  *sig* (byte array, required): Cryptographic signature over the
      commit contents

   Commit signature generation and verification procedures are detailed
   in Section 2.6.1.

2.5.  MST Construction

   The MST structure is deterministically reproducible from any given
   key-value mapping, where keys are non-empty byte strings and values
   are hash link references to records.  This deterministic construction
   ensures that identical input sets always produce the same root hash
   regardless of insertion order.

   The tree's structural organization depends solely on the keys
   present, not on the record values they reference.  When a record
   value changes, the new content hash propagates up through the tree
   nodes to the root, but the tree's shape and node organization remain
   unchanged.

2.5.1.  Tree Structure

   Each MST node contains a list of key-value entries and references to
   child subtrees.  Entries and subtree links are maintained in
   lexicographic order, with all keys in a linked subtree falling within
   the range corresponding to that link's position.  The ordering
   proceeds from left (lexicographically first) to right
   (lexicographically last).

   Keys are assigned to tree levels based on a layer value computed from
   the key itself.  Nodes at each level contain all keys with the
   corresponding layer value, while subtree links point to nodes
   containing keys that fall within specific lexicographic ranges but
   have lower layer values.  Adjacent keys may appear within the same
   node, but adjacent subtrees must be separated by at least one key
   entry to prevent structural ambiguity.

2.5.2.  Layer Calculation

   The layer for a given key is calculated using SHA-256 with a 2-bit
   grouping scheme that provides an average fanout of 4:

   1.  Compute the SHA-256 hash of the key (byte string) with binary
       output

   2.  Count the number of leading binary zeros in the hash

   3.  Divide by 2, rounding down to the nearest integer

   Examples of layer calculation:

   *  key1: SHA-256 begins 100000010111... → layer 0

   *  key7: SHA-256 begins 000111100011... → layer 1

   *  key515: SHA-256 begins 000000000111... → layer 4
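
   The three steps of the layer calculation can be sketched in Python
   using only the standard library.  The function names are
   illustrative and not part of this specification.  The digest-based
   helper is separated out so that the bit patterns from the examples
   above can be checked directly without relying on specific key
   hashes.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits before the first 1 bit in `digest`."""
    count = 0
    for byte in digest:
        if byte == 0:
            count += 8
            continue
        # For a non-zero byte, 8 - bit_length() is its number of leading zeros.
        count += 8 - byte.bit_length()
        break
    return count


def layer_from_digest(digest: bytes) -> int:
    """Step 2 and 3: leading zero bits divided by 2, rounded down."""
    return leading_zero_bits(digest) // 2


def key_layer(key: bytes) -> int:
    """Step 1 to 3: hash the key with SHA-256, then derive the layer."""
    return layer_from_digest(hashlib.sha256(key).digest())
```

   For instance, a digest beginning with the bits 000000000111 has nine
   leading zeros, so layer_from_digest yields 4, matching the last
   example above.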

   When processing the MST structure, implementations must verify the
   layer assignment and ordering of keys.  While this verification is
   most essential for untrusted inputs, implementations should perform
   these checks consistently regardless of data source.  Additional
   validation of node size limits and other structural parameters is
   required to prevent resource exhaustion attacks, as detailed in
   Security Considerations (Section 5).

2.5.3.  MST Construction Example

   The following is a Merkle Search Tree containing 9 records with keys
   A-I.  Each key would include a pointer to some record hash, though
   that hash is irrelevant to the construction of the tree.  Each
   asterisk (*) represents a hash pointer to the subtree under it.

   For the sake of illustration assume the following layer calculations:

   *  layer(D) = 2

   *  layer(A|E|I) = 1

   *  layer(B|C|F|G|H) = 0

            *
            |
      -------------
     |      |      |
     *      D      *
     |             |
    ---          -----
   |   |        |  |  |
   A   *        E  *  I
       |           |
      ---        -----
     |   |      |  |  |
     B   C      F  G  H

                      Figure 1: Example MST Structure
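
   The shape in Figure 1 can be reproduced with a short recursive
   sketch.  It is a simplified model rather than a full MST
   implementation: layers are supplied as a plain mapping instead of
   being computed from key hashes, record values are omitted because
   they do not affect the shape, and nodes are plain dictionaries with
   an "l" link and an "e" list whose entries carry a key "k" and a
   right-hand subtree "t".  The names build_node and build_tree are
   hypothetical.

```python
from typing import Callable, Optional


def build_node(keys: list, layer: int, layer_of: Callable[[str], int]) -> Optional[dict]:
    """Build the node covering sorted `keys`, all of which have layer <= `layer`."""
    if not keys:
        return None  # an empty key range gets no subtree link
    node = {"l": None, "e": []}
    pending = []  # lower-layer keys seen since the last entry of this node

    def attach(subtree: Optional[dict]) -> None:
        # A subtree hangs to the right of the latest entry, or on the
        # left link when no entry has been placed in this node yet.
        if node["e"]:
            node["e"][-1]["t"] = subtree
        else:
            node["l"] = subtree

    for key in keys:
        if layer_of(key) == layer:
            attach(build_node(pending, layer - 1, layer_of))
            node["e"].append({"k": key, "t": None})
            pending = []
        else:
            pending.append(key)
    attach(build_node(pending, layer - 1, layer_of))
    return node


def build_tree(layers: dict) -> dict:
    """Build the whole tree from a key -> layer mapping."""
    if not layers:
        return {"l": None, "e": []}  # empty repository: one node, no entries
    return build_node(sorted(layers), max(layers.values()), layers.__getitem__)
```

   Because the root is built at the highest layer present and every
   step descends exactly one layer, a key range with no keys at the
   next layer down yields a node with a subtree link but no entries,
   which is how links avoid skipping layers.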

2.5.4.  Empty Nodes

   An empty repository containing no records is represented as a single
   MST node with no entries.  This is the only case where a node without
   entries is permitted.

   Nodes that contain no key entries but do contain subtree links are
   allowed at intermediate positions, provided those subtrees eventually
   contain key entries.  However, such nodes are not permitted at the
   root position - the root must either contain key entries or be the
   special case of a completely empty repository.  Similarly, nodes
   without key entries are not permitted at leaf positions except for
   the empty repository case.

   This structure ensures that nodes lacking key-value entries are
   pruned from the top and bottom of the tree while preserving
   intermediate nodes that maintain proper height relationships and
   prevent subtree links from skipping layers.

2.5.5.  MST Node Schema

   Given their prevalence through the repository structure, MST nodes
   require a compact binary representation for storage efficiency.  Keys
   within each node use prefix compression, where each entry specifies
   the number of bytes it shares with the preceding key in the array.
   The first entry in each node contains the complete key with a prefix
   length of zero.  This compression applies only within individual
   nodes and does not extend across node boundaries.  The compression
   scheme is mandatory to ensure deterministic MST structure across all
   implementations.

   MST nodes contain the following fields:

   *  l (hash link, nullable): Reference to a subtree node at a lower
      layer containing keys that sort lexicographically before all keys
      in the current node

   *  e (array, required): Ordered array of entry objects, each
      containing:

      -  p (integer, required): Number of bytes shared with the previous
         entry in this node

      -  k (byte string, required): Key suffix remaining after removing
         the shared prefix bytes

      -  v (hash link, required): Reference to the record data for this
         entry

      -  t (hash link, nullable): Reference to a subtree node at a lower
         layer containing keys that sort after this entry's key but
         before the next entry's key in the current node

2.5.6.  MST Node example

   The following example shows an MST node at layer 1 containing two
   subtree pointers and two key-value entries.  The node contents in
   order are:

   *  Left subtree: hash link 0x017112643b9326...

   *  Entry: key7 → record hash link 0x0171122d9aa87e...

   *  Right subtree: hash link 0x01711247e2886f...

   *  Entry: key10 → record hash link 0x01711210b6da2c...

   This node would be encoded as follows:

   {
           l: 0x017112643b9326...,
           e: [
                   {
                           p: 0,
                           k: "key7",
                           v: 0x0171122d9aa87e...,
                           t: 0x01711247e2886f...
                   },
                   {
                           p: 3,
                           k: "10",
                           v: 0x01711210b6da2c...,
                           t: null
                   }
           ]
   }
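
   The p and k values in the example follow from the prefix compression
   rule: key7 and key10 share the three bytes "key", so the second
   entry stores p = 3 and the suffix "10".  A minimal sketch of the
   compression and its inverse is given here; the function names are
   illustrative, and only the p and k fields are modelled.

```python
def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that `a` and `b` have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def compress_keys(keys: list) -> list:
    """Turn the full keys of one node into {p, k} entries."""
    entries = []
    previous = b""  # the first key shares nothing, so its p is 0
    for key in keys:
        p = shared_prefix_len(previous, key)
        entries.append({"p": p, "k": key[p:]})
        previous = key
    return entries


def expand_keys(entries: list) -> list:
    """Recover the full keys from {p, k} entries of one node."""
    keys = []
    previous = b""
    for entry in entries:
        if entry["p"] > len(previous):
            raise ValueError("prefix length exceeds previous key length")
        key = previous[: entry["p"]] + entry["k"]
        keys.append(key)
        previous = key
    return keys
```

   Resetting the previous key to the empty string for each node
   reflects that compression does not extend across node boundaries.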

2.6.  Commit Signatures

   Commit objects are signed by the key declared by the repository
   owner’s resolvable identifier.  Neither the signature nor the signed
   commit object contains information about the curve type or specific
   public key used for signing.  This information must be obtained by
   resolving the repository's DID as specified in Section 2.3.

   The most recent commit must always be verifiable using the currently
   resolvable signing key.  When rotating signing keys, a new repository
   commit must be created, even if the contents and structure of the
   repository remain unchanged.

2.6.1.  Signature Generation

   To generate a commit signature:

   1.  Populate all commit data fields except the sig field

   2.  Serialize the unsigned commit using deterministic CBOR encoding
       (see Section 2.7)

   3.  Compute the SHA-256 hash of the serialized bytes

   4.  Sign the hash using the current signing key associated with the
       repository's DID

   5.  Format the signature as a concatenation of the 32-byte r and
       32-byte s values

   6.  Add the resulting 64-byte signature to the commit object as the
       sig field
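
   The steps above can be sketched as a single function.  This is an
   outline under stated assumptions, not a complete implementation:
   encode_cbor and sign_digest are hypothetical callables supplied by
   the caller, standing in for a deterministic CBOR encoder
   (Section 2.7) and an ECDSA signer that returns the integers r and s.
   Real cryptographic libraries expose different interfaces, and low-S
   canonicalization (Section 2.6.3) is not applied here.

```python
import hashlib
from typing import Callable, Tuple


def sign_commit(
    commit: dict,
    encode_cbor: Callable[[dict], bytes],
    sign_digest: Callable[[bytes], Tuple[int, int]],
) -> dict:
    """Return a copy of `commit` with a freshly generated sig field."""
    # Step 1: all commit fields except sig.
    unsigned = {name: value for name, value in commit.items() if name != "sig"}
    # Step 2: deterministic CBOR serialization (caller-supplied, assumed).
    serialized = encode_cbor(unsigned)
    # Step 3: SHA-256 over the serialized bytes.
    digest = hashlib.sha256(serialized).digest()
    # Step 4: ECDSA signing (caller-supplied, assumed to return r and s).
    r, s = sign_digest(digest)
    # Step 5: fixed-width big-endian r followed by s, 64 bytes in total.
    signature = r.to_bytes(32, "big") + s.to_bytes(32, "big")
    # Step 6: attach the signature as the sig field.
    return {**unsigned, "sig": signature}
```

   Passing the encoder and signer as parameters keeps the sketch
   independent of any particular CBOR or ECDSA library.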

2.6.2.  Supported Curves

   AT implementations must support both of the following elliptic curves
   and signature algorithms:

   *  NIST P-256 (also known as secp256r1 or p256) [SEC2]

   *  secp256k1 (also known as k256) [SEC2]

2.6.3.  Signature Canonicalization

   ECDSA signatures exhibit malleability, allowing transformation into
   distinct but equally valid signatures without access to the private
   key or original data.  While the security impact is limited,
   signature malleability could enable broadcast of multiple valid
   versions of the same repository commit with different hashes,
   potentially causing synchronization confusion.

   To prevent such scenarios, AT requires all ECDSA signatures to be
   canonicalized in low-S form.  Specifically, the s component of the
   signature must satisfy s ≤ n/2, where n is the order of the curve's
   base point.
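
   The low-S rule can be checked and enforced on a 64-byte r-and-s
   signature with integer arithmetic alone.  The sketch below uses the
   published group orders of the two supported curves; the function
   names are illustrative.  Replacing s with n - s yields the other
   valid signature for the same message, which is the malleability
   described above.

```python
# Orders (n) of the base points of the two supported curves.
SECP256K1_ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
P256_ORDER = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551


def _split(signature: bytes):
    """Split a 64-byte signature into its r bytes and the integer s."""
    if len(signature) != 64:
        raise ValueError("expected a 64-byte signature (32-byte r, 32-byte s)")
    return signature[:32], int.from_bytes(signature[32:], "big")


def is_low_s(signature: bytes, order: int) -> bool:
    """True if s is in the lower half of the range, i.e. s <= n/2."""
    _, s = _split(signature)
    # n is odd, so integer division gives the same bound as n/2.
    return 1 <= s <= order // 2


def to_low_s(signature: bytes, order: int) -> bytes:
    """Return the equivalent signature with s in low-S form."""
    r_bytes, s = _split(signature)
    if s > order // 2:
        s = order - s
    return r_bytes + s.to_bytes(32, "big")
```

   A verifier would reject a commit when is_low_s returns False, while
   a signer whose library does not canonicalize could apply to_low_s
   before storing the signature.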

2.7.  Deterministic CBOR Encoding

   Repository content requires consistent binary representation across
   all implementations to ensure identical content hashes and verifiable
   integrity.  All records, MST nodes, and commits must be encoded using
   Deterministically Encoded CBOR as specified in Section 4.2 of [CBOR],
   with map key ordering following the original specification in
   Section 3.9 of [RFC7049] for historical compatibility.

   For interoperability purposes, hash links between repository objects
   are encoded using a specific format within the CBOR structure.
   SHA-256 hash links are represented as CBOR byte strings under tag 42,
   with the byte string containing the 32-byte hash value prefixed by
   the fixed byte sequence 0x017112.

   Hash links that point to arbitrary binary data instead of other
   repository objects should be encoded similarly though prefixed by the
   fixed byte sequence 0x015512.
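
   To illustrate the map key ordering, the following is a deliberately
   small encoder sketch.  It covers only integers, byte strings, text
   strings, arrays, maps, booleans, and null; tag 42 hash links and
   floating point values are omitted, and the function names are
   illustrative.  Map keys are sorted by the length of their encoded
   form first and bytewise second, which is the rule from Section 3.9
   of [RFC7049].

```python
import struct


def _head(major: int, value: int) -> bytes:
    """Encode a CBOR head using the shortest form for `value`."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 1 << 8:
        return bytes([(major << 5) | 24, value])
    if value < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    if value < 1 << 64:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", value)
    raise ValueError("integer out of range for CBOR")


def encode(item) -> bytes:
    """Encode a restricted set of Python values deterministically."""
    if item is None:
        return b"\xf6"
    if item is True:
        return b"\xf5"
    if item is False:
        return b"\xf4"
    if isinstance(item, int):
        return _head(0, item) if item >= 0 else _head(1, -1 - item)
    if isinstance(item, bytes):
        return _head(2, len(item)) + item
    if isinstance(item, str):
        raw = item.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(item, list):
        return _head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        pairs = [(encode(k), encode(v)) for k, v in item.items()]
        # Shorter encoded keys first, then bytewise order among equals.
        pairs.sort(key=lambda pair: (len(pair[0]), pair[0]))
        return _head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(item).__name__}")
```

   Under this ordering the fields of a commit object serialize as did,
   rev, sig, data, prev, version, regardless of the order in which
   they were inserted into the map.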

2.8.  Repository Serialization Format

   Repositories are serialized for transmission and storage as a
   concatenated sequence of block data, where blocks represent the CBOR-
   encoded records, MST nodes, and commit objects that comprise the
   repository structure.  The serialization is prefixed with a header
   that identifies the root block, typically the repository's commit
   object.

   Serialized repositories may contain partial repository state, such as
   when transmitting cryptographic proofs for specific records.  In
   these situations, they may not include unrelated MST nodes or records
   outside the proof path.

2.8.1.  Header Format

   The header is constructed by CBOR-encoding an object with the
   following fields:

   *  version (integer, required): Fixed value of 1

   *  root (array, required): Single-element array containing the hash
      link of the commit block

   The CBOR-encoded header is prefixed with its byte length encoded as
   an unsigned LEB128 integer as described in Section 7.6 of [DWARF].

2.8.2.  Block Format

   Following the header, each repository block is serialized by
   concatenating:

   1.  The combined byte length of the following two components, encoded
       as an unsigned LEB128 integer

   2.  The block's content hash, prefixed with 0x017112 as specified in
       Section 2.7

   3.  The CBOR-encoded block data

  |------- Header -------| |------------------ Data ------------------|
  [ int | Header block ] [ int | hash | block ] [ int | hash | block ] …

                   Figure 2: Repo Serialization Layout
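
   The length-prefixed layout in Figure 2 can be sketched as follows.
   The code treats the header and each hash-plus-block pair as opaque
   byte sections and does not parse CBOR or hash links; splitting a
   data section into its hash and block parts is left to the caller.
   The function names are illustrative.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data: bytes, offset: int = 0):
    """Decode unsigned LEB128 at `offset`; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated LEB128 integer")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def frame(*parts: bytes) -> bytes:
    """Prefix the concatenated parts with their combined byte length."""
    body = b"".join(parts)
    return encode_uleb128(len(body)) + body


def read_sections(data: bytes):
    """Yield each length-prefixed section: the header first, then blocks."""
    offset = 0
    while offset < len(data):
        length, offset = decode_uleb128(data, offset)
        end = offset + length
        if end > len(data):
            raise ValueError("section extends past end of input")
        yield data[offset:end]
        offset = end
```

   A serialized repository is then frame(header) followed by
   frame(hash, block) for each block.  A production parser would also
   bound the decoded lengths before allocating memory.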

2.8.3.  Block Ordering

   Block ordering should follow preorder traversal of the included
   repository portion when possible, though parsers must be tolerant of
   other or unexpected orderings.

   Preorder traversal enables streaming verification of repositories,
   allowing parsers to walk the MST structure and output key-to-record
   mappings while maintaining minimal MST state in memory.  This
   approach supports efficient processing of large repositories without
   requiring complete buffering of the serialized data.

3.  Synchronization

   The AT synchronization model operates on the principle that any
   participant can independently verify repository updates.  This allows
   sync to occur between any client and server without requiring that
   the server be a canonical or trusted host.

   AT supports multiple synchronization patterns: full repository
   synchronization for complete replicas, partial synchronization for
   specific record subsets, and proof-only synchronization for
   cryptographic verification without content retrieval.

   The typical synchronization workflow establishes baseline state
   through full synchronization, then maintains currency through
   incremental updates.  Full synchronization is performed by fetching a
   complete serialized repository over HTTPS.

3.1.  Repository Revisions

   Each repository maintains a rev field (short for “revision”) that
   functions as a logical clock for the progression of the contents of
   the repository over time.  The revision value is a short string value
   that must increase lexicographically with each new commit.

   Revisions may be used when comparing two repositories, especially
   when obtained from a non-canonical host, to determine which is more
   recent.

3.1.1.  Timestamp Identifier Format

   The recommended mechanism for generating revision values is the
   Timestamp Identifier (TID) format.

   TIDs provide a standardized revision format with the following
   properties:

   *  64-bit integer with big-endian byte ordering

   *  Base32-sortable encoding using characters
      234567abcdefghijklmnopqrstuvwxyz

   *  Fixed 13-character length with no padding (integer zero encodes as
      2222222222222)

   The layout of the 64-bit integer is:

   *  The top bit is always 0

   *  The next 53 bits represent microseconds since the UNIX epoch.  53
      bits is chosen because it is the largest integer width that a
      64-bit floating point number, as used by JavaScript, can represent
      exactly.

   *  The final 10 bits are a random "clock identifier."
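
   The bit layout and encoding can be sketched as follows.  The
   function names are illustrative, and the caller is assumed to supply
   the timestamp and clock identifier.  The 13 characters carry 65
   bits, so the leading character only ever encodes the topmost bits of
   the 64-bit integer.

```python
ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"


def encode_tid(timestamp_us: int, clock_id: int) -> str:
    """Pack a microsecond timestamp and clock identifier into a TID."""
    if not 0 <= timestamp_us < 1 << 53:
        raise ValueError("timestamp must fit in 53 bits")
    if not 0 <= clock_id < 1 << 10:
        raise ValueError("clock identifier must fit in 10 bits")
    # 53 + 10 = 63 bits, so the top bit of the 64-bit integer stays 0.
    value = (timestamp_us << 10) | clock_id
    chars = []
    for _ in range(13):
        chars.append(ALPHABET[value & 0x1F])  # lowest 5 bits first
        value >>= 5
    return "".join(reversed(chars))


def decode_tid(tid: str):
    """Return (timestamp_us, clock_id) for a 13-character TID."""
    if len(tid) != 13:
        raise ValueError("a TID is exactly 13 characters long")
    value = 0
    for ch in tid:
        value = (value << 5) | ALPHABET.index(ch)
    if value >> 63:
        raise ValueError("top bit of the 64-bit integer must be 0")
    return value >> 10, value & 0x3FF
```

   Because the alphabet is in ascending ASCII order and the length is
   fixed, comparing two TIDs as strings gives the same result as
   comparing the underlying integers, which is what allows rev values
   to be compared lexicographically.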

3.2.  Repository Diffs

   Repository diffs enable efficient synchronization by containing only
   the data that changed between two repository revisions.  A diff
   includes the commit object, MST nodes, and records that differ
   between an older baseline revision and the current revision.
   Applying a diff to the baseline repository reconstructs the complete
   current repository state.

   Diffs use the same serialization format as complete repositories,
   with the commit block serving as the root.  A diff must include:

   *  The new commit block

   *  All created and updated record blocks

   *  All MST nodes in the current repository that did not exist in the
      baseline revision

   Required blocks must be included in the diff regardless of their
   presence in earlier repository history.  For example, if an MST node
   was previously present in the repository, then deleted, and
   subsequently reintroduced during the range that the diff represents,
   then the diff must include that block even though it appeared in
   prior revisions.

   Deleted records and past versions of updated records are excluded
   from diffs.

   With the exception of deleted record data, the diff may include
   additional blocks which receivers should ignore.

3.3.  Diff Verification Limitations

   Repository diffs present verification challenges for consumers who do
   not maintain complete repository state.  These consumers often wish
   to authenticate repository content and utilize records without
   persisting the entire repository structure, making diffs an
   attractive option for lightweight verification.

   Diffs partially support this use case by providing a signed commit
   and the relevant portions of the Merkle tree, creating a verifiable
   proof chain for record creations, updates, and deletions.  When a
   recipient possesses both a diff and a corresponding list of
   operations, they can use the diff contents to cryptographically
   verify that the operations are authentic.

   However, observers without knowledge of the complete baseline
   repository state cannot reliably enumerate all operations by
   examining the diff contents alone.  While comprehensive diffs reveal
   created or updated records by traversing to the leaf nodes, they
   provide no information about deletion operations that occurred during
   the period that the diff represents.

   This means that while diffs enable verification of a known operation
   list, they cannot be used to exhaustively reconstruct the complete
   operation list from diff contents alone.  However, if a recipient has
   a complete repository structure from some prior revision and receives
   a diff representing changes since that revision, they can compute the
   complete set of operations that occurred between the two versions.
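
   When both the baseline and current states are available as
   key-to-record-hash mappings, the operation list follows from a
   direct comparison.  The sketch below illustrates that case; the
   Operation type and the enumerate_operations function are
   hypothetical names, and the mappings stand in for a fully traversed
   repository structure.

```python
from typing import Dict, List, NamedTuple, Optional


class Operation(NamedTuple):
    action: str               # "create", "update", or "delete"
    key: str
    prev_hash: Optional[str]  # hash before the change, if any
    new_hash: Optional[str]   # hash after the change, if any


def enumerate_operations(
    baseline: Dict[str, str], current: Dict[str, str]
) -> List[Operation]:
    """Compare two key -> record hash mappings and list the changes."""
    ops: List[Operation] = []
    for key in sorted(baseline.keys() | current.keys()):
        old, new = baseline.get(key), current.get(key)
        if old is None:
            ops.append(Operation("create", key, None, new))
        elif new is None:
            # Deletions are only visible because the baseline is known.
            ops.append(Operation("delete", key, old, None))
        elif old != new:
            ops.append(Operation("update", key, old, new))
    return ops
```

   Without the baseline mapping, the "delete" branch can never be
   taken, which is the limitation described in this section.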

   This asymmetry means diffs alone cannot substitute for complete state
   tracking when comprehensive operation enumeration is required.  An
   efficient mechanism for cross-verification of a diff and enumerated
   operation list against the prior repository commit state is described
   in Section 4.3.

4.  Real-Time Synchronization

   AT supports real-time synchronization, enabling applications to
   receive repository updates with minimal latency through a pull-based
   WebSocket connection.

   Real-time streams of repository updates are often referred to as the
   “firehose”. The firehose delivers events containing repository diffs
   along with supporting metadata necessary for verification and
   processing.

   Each event includes a monotonic cursor that establishes a total
   ordering across all repository changes from a given host.  This
   ordering enables reliable event replay and ensures that consumers can
   maintain consistent state even when reconnecting after network
   interruptions.

   AT allows consumers to maintain fully-verified copies of repository
   records without storing the underlying Merkle tree structure,
   providing an efficient method for applications that need
   authenticated content access without the overhead of complete
   repository replication.

4.1.  Cursors

   Real-time synchronization streams include per-message cursors to
   improve transmission reliability.  Cursors are positive integers that
   increase monotonically across the stream.  Cursor semantics are
   flexible, and they may contain arbitrary gaps between consecutive
   messages.

   Consumers track the last cursor value they successfully processed and
   can specify this cursor when reconnecting to receive any missed
   messages within the provider's rollback window.  Providers maintain
   no persistent consumer state across connections, relying entirely on
   the cursor values supplied by consumers during reconnection.

   Stream behavior depends on the cursor value specified during
   connection:

   *  *No cursor specified*: The provider begins transmitting from the
      current stream position, providing only new messages generated
      after the connection is established.

   *  *Future cursor*: When the requested cursor exceeds the current
      stream cursor, the provider sends an error message and closes the
      connection.

   *  *Cursor within rollback window*: The provider transmits all
      persisted messages with cursor numbers greater than or equal to
      the requested cursor, then continues with the real-time stream
      once caught up.

   *  *Cursor older than rollback window*: The provider sends an
      informational message indicating that the requested cursor is too
      old, then begins transmission at the oldest available event, sends
      the entire rollback window, and continues with the real-time
      stream.

   *  *Cursor value of 0*: The provider treats this as a request for the
      complete available history, starting at the oldest available
      event, transmitting the entire rollback window, then continuing
      with the real-time stream.
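
   The cursor cases above can be summarized as a small decision
   function.  This is a sketch from the provider's point of view; the
   ReplayPlan type, the plan_replay function, and the message strings
   are hypothetical placeholders and are not defined by this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplayPlan:
    error: Optional[str] = None         # set when the connection is closed
    info: Optional[str] = None          # informational message to send first
    start_cursor: Optional[int] = None  # None means "new messages only"


def plan_replay(
    requested: Optional[int], oldest: int, current: int
) -> ReplayPlan:
    """Decide how to serve a connection.

    oldest is the first cursor still held in the rollback window and
    current is the most recent cursor emitted on the stream.
    """
    if requested is None:
        # No cursor specified: live stream only.
        return ReplayPlan()
    if requested > current:
        # Future cursor: send an error and close the connection.
        return ReplayPlan(error="future-cursor")
    if requested == 0:
        # Cursor 0: replay the entire rollback window.
        return ReplayPlan(start_cursor=oldest)
    if requested < oldest:
        # Too old: notify, then replay the entire rollback window.
        return ReplayPlan(info="cursor-too-old", start_cursor=oldest)
    # Within the window: replay messages with cursor >= requested.
    return ReplayPlan(start_cursor=requested)
```

   In every non-error case with a start cursor, the provider would
   continue with the real-time stream once the replay has caught up.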

4.2.  Streaming Events

   The real-time stream delivers two types of events: commit and sync.

4.2.1.  Commit Events

   Commit events represent an atomic set of repository modifications and
   consist of a repository diff combined with some supporting metadata.

   The diff MUST include the new commit and all blocks in the Merkle
   proof chain for any modified key, as well as blocks for keys directly
   adjacent to the modified keys.  The rationale for including adjacent
   keys is detailed in Section 4.3.

   The metadata provides additional context required for processing and
   verification and includes:

   *  The revision of the repository after the modifications

   *  The revision of the repository before the diff

   *  The root hash of the repository MST before the diff

   *  A description of the operations contained in the diff with each
      containing

      -  the key

      -  the hash of the new record at the key (in the case of a create/
         update)

      -  the hash of the old record at the key (in the case of an
         update/delete)

   A single commit event must contain no more than 200 repository
   operations, and the full serialized event should be no larger than
   2 MB.  Mutations that do not fit within these limits should instead
   be communicated through Sync Events.
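
   As an illustration, the commit event metadata and its limits could
   be modeled as below.  All type and field names (RepoOp, CommitEvent,
   rev, since, prev_root, and so on) are hypothetical and do not
   reflect a wire format defined here.  Reading the size limit as
   2,000,000 bytes is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_OPS = 200
MAX_EVENT_BYTES = 2_000_000  # assumption: the limit read as decimal bytes


@dataclass
class RepoOp:
    key: str
    new_hash: Optional[str] = None   # present for create and update
    prev_hash: Optional[str] = None  # present for update and delete

    @property
    def action(self) -> str:
        if self.new_hash is not None and self.prev_hash is not None:
            return "update"
        if self.new_hash is not None:
            return "create"
        if self.prev_hash is not None:
            return "delete"
        raise ValueError("operation carries neither a new nor an old hash")


@dataclass
class CommitEvent:
    rev: str          # revision after the modifications
    since: str        # revision before the diff
    prev_root: str    # MST root hash before the diff
    ops: List[RepoOp]
    diff: bytes       # serialized repository diff


def fits_commit_event(event: CommitEvent, serialized_size: int) -> bool:
    """Check both limits; the serialized size is supplied by the caller."""
    return len(event.ops) <= MAX_OPS and serialized_size <= MAX_EVENT_BYTES
```

   The serialized size is passed in rather than computed because the
   event encoding is outside the scope of this sketch.  A producer
   whose mutation fails this check would fall back to a sync event.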

4.2.2.  Sync Events

   Sync events declare the current state of a repository, regardless of
   the previous state.

   Sync events are emitted when commit events cannot adequately describe
   the transition between repository revisions.  This may occur in
   several scenarios:

   *  Large mutations that exceed the practical size limits for commit
      events

   *  Data loss or corruption that breaks the continuity of commit
      history

   *  Account migration between different infrastructure providers

   In these cases, a sync event provides a reset point that encourages
   consumers to resynchronize against the current authoritative state
   without requiring knowledge of the intervening changes.

4.3.  Commit Validation

   Commit validation occurs through a two-step process that ensures both
   the validity of the repository transition and the consumer's
   resulting synchronization state.

   First, the consumer validates that the commit represents a valid
   transition from a previous repository revision (revA) to the new
   revision (revB).  Second, the consumer confirms that they last
   observed the repository at revA.  Together, these steps establish
   that the repository is now definitively at revB.

   The validation process inverts all operations against the partial MST
   provided in the diff.  That is, each “create” operation is inverted
   into a “delete” operation on the same key and applied to the tree.
   Each “delete” becomes a “create” of the same record, and each
   “update” is reverted to the previous value.

   If the operation list is complete and accurate, applying the inverse
   operations will reconstruct the tree state as it existed before the
   commit.  The hash of this reconstructed tree must match the previous
   root hash of the MST as specified in the commit event.  If the hashes
   match then the provided list of operations is accurate and
   exhaustive.

   Because the previous MST root hash is included in the commit event,
   commits can be validated for internal consistency independent of any
   local state.  If the operation inversion process fails to produce a
   tree hash matching the declared previous root, the entire commit
   event should be treated as invalid.

   If the commit is internally consistent but its declared previous root
   does not match the previous MST root stored locally, then the
   consumer has become desynchronized, indicating missed events or a
   disjunction in the producer’s commit history.
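
   The inversion check can be sketched with a deliberately simplified
   stand-in for the tree.  The code below is NOT an MST: it uses a
   flat key-to-hash mapping and a SHA-256 digest over the sorted
   entries as a placeholder root hash, and it operates on the whole
   mapping rather than a partial tree.  Only the control flow (invert,
   rehash, compare) mirrors the process described above, and all names
   are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# (key, previous record hash or None, new record hash or None)
Op = Tuple[str, Optional[str], Optional[str]]


def root_hash(tree: Dict[str, str]) -> str:
    """Placeholder root hash over sorted entries (not a real MST root)."""
    digest = hashlib.sha256()
    for key in sorted(tree):
        digest.update(f"{key}\0{tree[key]}\n".encode())
    return digest.hexdigest()


def invert_ops(tree_after: Dict[str, str], ops: List[Op]) -> Dict[str, str]:
    """Undo each operation to rebuild the state before the commit."""
    tree = dict(tree_after)
    for key, prev, new in ops:
        if prev is None and new is None:
            raise ValueError(f"operation on {key!r} has no hashes")
        if tree.get(key) != new:
            raise ValueError(f"operation on {key!r} does not match the tree")
        if prev is None:
            del tree[key]     # invert a create into a delete
        else:
            tree[key] = prev  # invert a delete or revert an update
    return tree


def validate_commit(
    tree_after: Dict[str, str],
    ops: List[Op],
    declared_prev_root: str,
    local_root: Optional[str],
) -> str:
    """Return "ok", "invalid", or "desynchronized"."""
    try:
        before = invert_ops(tree_after, ops)
    except ValueError:
        return "invalid"
    # Step 1: internal consistency, independent of local state.
    if root_hash(before) != declared_prev_root:
        return "invalid"
    # Step 2: the consumer must have last observed the declared root.
    if local_root is not None and local_root != declared_prev_root:
        return "desynchronized"
    return "ok"
```

   In this simplified model, omitting an operation from the list
   leaves the reconstructed state different from the true prior state,
   so the hash comparison in step 1 fails and the event is rejected.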

4.4.  Re-synchronization

   When a consumer detects desynchronization, either through a
   disjunction in commit history or a sync event that does not match
   their local state, they must perform a complete re-synchronization
   process to restore consistency with the current repository state.

   Re-synchronization requires fetching and processing the full
   repository structure, though the record contents themselves are
   optional depending on the consumer's needs.  If the repository data
   is delivered in pre-order traversal, it can be validated
   incrementally as it streams in, producing a mapping of keys to record
   hashes that represents the complete repository state.

   This key-to-hash mapping can be compared against existing local state
   to identify discrepancies and verify the integrity of the re-
   synchronization.  Once validated, this mapping establishes the new
   baseline state against which future commit events can be applied.

   During the re-synchronization process, any incoming commit events for
   the repository should be buffered rather than processed immediately.
   Once re-synchronization completes successfully, these buffered
   commits can be validated and applied in sequence to bring the
   consumer fully up to date with the current repository state.
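
   The buffering behavior might be organized as in the sketch below.
   The RepoConsumer class and its method names are hypothetical, and
   the apply_commit callback is assumed to perform whatever validation
   the consumer requires before mutating the state.

```python
from typing import Any, Callable, Dict, List


class RepoConsumer:
    """Tracks one repository and buffers commits while re-synchronizing."""

    def __init__(
        self, apply_commit: Callable[[Dict[str, str], Any], None]
    ) -> None:
        self.apply_commit = apply_commit
        self.state: Dict[str, str] = {}  # key -> record hash
        self.resyncing = False
        self.buffer: List[Any] = []

    def begin_resync(self) -> None:
        self.resyncing = True

    def on_commit(self, commit: Any) -> None:
        if self.resyncing:
            # Hold the event until the new baseline is established.
            self.buffer.append(commit)
        else:
            self.apply_commit(self.state, commit)

    def finish_resync(self, key_to_hash: Dict[str, str]) -> None:
        # Adopt the validated mapping as the new baseline state.
        self.state = dict(key_to_hash)
        pending, self.buffer = self.buffer, []
        self.resyncing = False
        # Replay buffered commits in arrival order.
        for commit in pending:
            self.apply_commit(self.state, commit)
```

   A buffered commit that predates the fetched repository state would
   be expected to fail validation inside the callback; how such events
   are skipped is left to the consumer in this sketch.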

5.  Security Considerations

   Repositories constitute untrusted input as account holders have
   complete control over repository contents and repository hosts
   control binary encoding.  Implementations must handle potential
   denial of service vectors from both malicious actors and accidental
   conditions such as corrupted data or implementation bugs.

5.1.  CBOR Processing Limits

   Generic precautions must be followed when processing CBOR data,
   including enforcement of maximum serialized object size, maximum
   recursion depth for nested structures, and memory budget limits for
   deserialized data.  While some CBOR implementations include these
   protections by default, implementations should verify and configure
   appropriate limits regardless of library defaults.

5.2.  MST Structure Attacks

   The efficiency of MST data structures depends on a uniform
   distribution of key hashes.  Since account holders control record
   keys, they can perform key mining to generate sets of keys with
   specific layer assignments and sorting characteristics, resulting in
   inefficient tree structures.  Such attacks can cause excessive
   storage overhead and network amplification during synchronization.

   To mitigate these attacks, implementations should:

   *  Limit the number of entries per MST node to a statistically
      reasonable maximum

   *  Impose limits on overall repository height

   *  Monitor and restrict other structural parameters that could be
      exploited through sophisticated key mining

5.3.  Repository Import Validation

   When importing repositories, implementations should verify the
   completeness and integrity of the repository structure.  Serialized
   repositories may contain additional unrelated blocks beyond those
   required for the repository structure.  Care should be taken during
   storage to avoid resource waste on unreferenced blocks and to prevent
   potential storage exhaustion attacks.
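
   A completeness check of this kind can be sketched as a traversal
   from the commit block.  The example assumes the imported blocks
   have already been decoded into a mapping from block identifier to
   the identifiers each block links to; the audit_import name and the
   returned tuple are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def audit_import(
    links: Dict[str, List[str]], root: str
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Classify imported blocks relative to the repository root.

    Returns (referenced, missing, unreferenced):
      referenced   - supplied blocks reachable from the root
      missing      - blocks linked from the structure but not supplied
      unreferenced - supplied blocks that nothing reachable links to
    """
    referenced: Set[str] = set()
    missing: Set[str] = set()
    stack = [root]
    while stack:
        block = stack.pop()
        if block in referenced or block in missing:
            continue
        if block not in links:
            missing.add(block)
            continue
        referenced.add(block)
        stack.extend(links[block])
    unreferenced = set(links) - referenced
    return referenced, missing, unreferenced
```

   Under these assumptions, an importer could reject the repository
   when the missing set is non-empty and decline to persist anything
   in the unreferenced set.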

6.  References

6.1.  Normative References

   [CBOR]     Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", STD 94, RFC 8949,
              DOI 10.17487/RFC8949, December 2020,
              <https://www.rfc-editor.org/rfc/rfc8949>.

   [CONTROLLEDID]
              Longley, D., Sporny, M., Sabadello, M., Reed, D., Steele,
              O., and C. Allen, "Controlled Identifiers v1.0", May 2025,
              <https://www.w3.org/TR/2025/REC-cid-1.0-20250515/>.

   [DID]      Sporny, M., Longley, D., Sabadello, M., Reed, D., Steele,
              O., and C. Allen, "Decentralized Identifiers (DIDs) v1.0",
              July 2022,
              <https://www.w3.org/TR/2022/REC-did-core-20220719/>.

   [DWARF]    DWARF Debugging Information Format Committee, "DWARF
              Debugging Information Format Version 5", February 2017,
              <https://dwarfstd.org/doc/DWARF5.pdf>.

   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
              October 2013, <https://www.rfc-editor.org/rfc/rfc7049>.

   [SEC2]     Standards for Efficient Cryptography Group, "SEC 2:
              Recommended Elliptic Curve Domain Parameters", January
              2010, <https://www.secg.org/sec2-v2.pdf>.

6.2.  Informative References

   [AT-ARCH]  Newbold, B. and D. Holmgren, "Authenticated Transfer:
              Architecture Overview", September 2025.

   [DIDPLC]   Holmgren, D., "did:plc Method Specification v0.1", May
              2023, <https://web.plc.directory/spec/v0.1/did-plc>.

   [DIDWEB]   Gribneau, C., Prorock, M., Steele, O., Terbu, O., Xu, M.,
              and D. Zagidulin, "did:web Method Specification (Draft)",
              July 2024, <https://w3c-ccg.github.io/did-method-web/>.

   [MST]      Auvolat, A. and F. Taïani, "Merkle Search Trees: Efficient
              State-Based CRDTs in Open Networks", October 2019,
              <https://inria.hal.science/hal-02303490/document>.

Authors' Addresses

   Daniel Holmgren
   Bluesky Social
   Email: daniel@blueskyweb.xyz

   Bryan Newbold
   Bluesky Social
   Email: bryan@blueskyweb.xyz
