Verifiable Distributed Aggregation Functions
draftirtfcfrgvdaf12
This document is an InternetDraft (ID) that has been submitted to the Internet Research Task Force (IRTF) stream.
This ID is not endorsed by the IETF and has no formal standing in the
IETF standards process.
The information below is for an old version of the document.
Document  Type 
This is an older version of an InternetDraft whose latest revision state is "Active".



Authors  Richard Barnes , David Cook , Christopher Patton , Phillipp Schoppmann  
Last updated  20241004 (Latest revision 20240822)  
Replaces  draftpattoncfrgvdaf  
RFC stream  Internet Research Task Force (IRTF)  
Formats  
Additional resources  Mailing list discussion  
Stream  IRTF state  Active RG Document  
Consensus boilerplate  Unknown  
Document shepherd  (None)  
IESG  IESG state  ID Exists  
Telechat date  (None)  
Responsible AD  (None)  
Send notices to  (None) 
draftirtfcfrgvdaf12
CFRG R. L. Barnes InternetDraft Cisco Intended status: Informational D. Cook Expires: 7 April 2025 ISRG C. Patton Cloudflare P. Schoppmann Google 4 October 2024 Verifiable Distributed Aggregation Functions draftirtfcfrgvdaf12 Abstract This document describes Verifiable Distributed Aggregation Functions (VDAFs), a family of multiparty protocols for computing aggregate statistics over user measurements. These protocols are designed to ensure that, as long as at least one aggregation server executes the protocol honestly, individual measurements are never seen by any server in the clear. At the same time, VDAFs allow the servers to detect if a malicious or misconfigured client submitted an measurement that would result in an invalid aggregate result. Discussion Venues This note is to be removed before publishing as an RFC. Discussion of this document takes place on the Crypto Forum Research Group mailing list (cfrg@ietf.org), which is archived at https://mailarchive.ietf.org/arch/search/?email_list=cfrg. Source for this draft and an issue tracker can be found at https://github.com/cjpatton/vdaf. Status of This Memo This InternetDraft is submitted in full conformance with the provisions of BCP 78 and BCP 79. InternetDrafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as InternetDrafts. The list of current Internet Drafts is at https://datatracker.ietf.org/drafts/current/. Barnes, et al. Expires 7 April 2025 [Page 1] InternetDraft VDAF October 2024 InternetDrafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use InternetDrafts as reference material or to cite them other than as "work in progress." This InternetDraft will expire on 7 April 2025. Copyright Notice Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ licenseinfo) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1. Change Log . . . . . . . . . . . . . . . . . . . . . . . 8 2. Conventions and Definitions . . . . . . . . . . . . . . . . . 15 3. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 16 4. Definition of DAFs . . . . . . . . . . . . . . . . . . . . . 18 4.1. Sharding . . . . . . . . . . . . . . . . . . . . . . . . 20 4.2. Preparation . . . . . . . . . . . . . . . . . . . . . . . 22 4.3. Validity of Aggregation Parameters . . . . . . . . . . . 22 4.4. Aggregation . . . . . . . . . . . . . . . . . . . . . . . 22 4.5. Unsharding . . . . . . . . . . . . . . . . . . . . . . . 23 4.6. Execution of a DAF . . . . . . . . . . . . . . . . . . . 24 5. Definition of VDAFs . . . . . . . . . . . . . . . . . . . . . 26 5.1. Sharding . . . . . . . . . . . . . . . . . . . . . . . . 28 5.2. Preparation . . . . . . . . . . . . . . . . . . . . . . . 29 5.3. Validity of Aggregation Parameters . . . . . . . . . . . 32 5.4. Aggregation . . . . . . . . . . . . . . . . . . . . . . . 32 5.5. Unsharding . . . . . . . . . . . . . . . . . . . . . . . 32 5.6. Execution of a VDAF . . . . . . . . . . . . . . . . . . . 33 5.7. Communication Patterns for Preparation . . . . . . . . . 35 5.8. PingPong Topology (Only Two Aggregators) . . . . . . . . 36 5.9. Star Topology (Any Number of Aggregators) . . . . . . . . 42 6. Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . 43 6.1. Finite Fields . . . . . . . . . . . . . . . . . . . . . . 43 6.1.1. Auxiliary Functions . . . . . . . . . . . . . . . . . 45 6.1.2. NTTFriendly Fields . . . . . . . . . . . . . . . . . 46 6.1.3. Parameters . . . . . . . . . . . . . . . . . . . . . 46 6.2. Extendable Output Functions . . . . . . . . . . . . . . . 47 6.2.1. XofTurboShake128 . . . . . . . . . . . . . . . . . . 49 Barnes, et al. Expires 7 April 2025 [Page 2] InternetDraft VDAF October 2024 6.2.2. XofFixedKeyAes128 . . . . . . . . . . . . . . . . . . 50 6.2.3. The Domain Separation Tag and Binder String . . . . . 51 7. Prio3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 7.1. Fully Linear Proof (FLP) Systems . . . . . . . . . . . . 53 7.1.1. Encoding the Input . . . . . . . . . . . . . . . . . 57 7.1.2. Multiple Proofs . . . . . . . . . . . . . . . . . . . 58 7.2. Construction . . . . . . . . . . . . . . . . . . . . . . 58 7.2.1. Sharding . . . . . . . . . . . . . . . . . . . . . . 60 7.2.2. Preparation . . . . . . . . . . . . . . . . . . . . . 65 7.2.3. Validity of Aggregation Parameters . . . . . . . . . 68 7.2.4. Aggregation . . . . . . . . . . . . . . . . . . . . . 68 7.2.5. Unsharding . . . . . . . . . . . . . . . . . . . . . 69 7.2.6. Auxiliary Functions . . . . . . . . . . . . . . . . . 69 7.2.7. Message Serialization . . . . . . . . . . . . . . . . 71 7.3. The FLP of BBCGGI19 . . . . . . . . . . . . . . . . . . . 74 7.3.1. Overview . . . . . . . . . . . . . . . . . . . . . . 74 7.3.2. Validity Circuits . . . . . . . . . . . . . . . . . . 77 7.3.3. Construction . . . . . . . . . . . . . . . . . . . . 79 7.4. Instantiations . . . . . . . . . . . . . . . . . . . . . 84 7.4.1. Prio3Count . . . . . . . . . . . . . . . . . . . . . 84 7.4.2. Prio3Sum . . . . . . . . . . . . . . . . . . . . . . 85 7.4.3. Prio3SumVec . . . . . . . . . . . . . . . . . . . . . 88 7.4.4. Prio3Histogram . . . . . . . . . . . . . . . . . . . 93 7.4.5. Prio3MultihotCountVec . . . . . . . . . . . . . . . . 96 8. Poplar1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 8.1. Incremental Distributed Point Functions (IDPFs) . . . . . 102 8.1.1. Encoding inputs as indices . . . . . . . . . . . . . 104 8.2. Construction . . . . . . . . . . . . . . . . . . . . . . 105 8.2.1. Sharding . . . . . . . . . . . . . . . . . . . . . . 107 8.2.2. Preparation . . . . . . . . . . . . . . . . . . . . . 110 8.2.3. Validity of Aggregation Parameters . . . . . . . . . 113 8.2.4. Aggregation . . . . . . . . . . . . . . . . . . . . . 114 8.2.5. Unsharding . . . . . . . . . . . . . . . . . . . . . 115 8.2.6. Message Serialization . . . . . . . . . . . . . . . . 115 8.3. The IDPF scheme of BBCGGI21 . . . . . . . . . . . . . . . 120 8.3.1. Key Generation . . . . . . . . . . . . . . . . . . . 120 8.3.2. Key Evaluation . . . . . . . . . . . . . . . . . . . 122 8.3.3. Auxiliary Functions . . . . . . . . . . . . . . . . . 125 8.4. Instantiation . . . . . . . . . . . . . . . . . . . . . . 126 9. Security Considerations . . . . . . . . . . . . . . . . . . . 126 9.1. Requirements for the Verification Key . . . . . . . . . . 127 9.2. Requirements for the Nonce . . . . . . . . . . . . . . . 128 9.3. Requirements for the Public Share . . . . . . . . . . . . 129 9.4. Requirements for Aggregation Parameters . . . . . . . . . 129 9.4.1. Additional Privacy Considerations . . . . . . . . . . 129 9.4.2. Safe Usage of IDPF Outputs . . . . . . . . . . . . . 130 9.5. Requirements for XOFs . . . . . . . . . . . . . . . . . . 130 9.6. Choosing FLP Parameters . . . . . . . . . . . . . . . . . 131 Barnes, et al. Expires 7 April 2025 [Page 3] InternetDraft VDAF October 2024 9.7. Choosing the Number of Aggregators . . . . . . . . . . . 132 9.8. DefenseinDepth Measures . . . . . . . . . . . . . . . . 132 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 133 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 135 11.1. Normative References . . . . . . . . . . . . . . . . . . 135 11.2. Informative References . . . . . . . . . . . . . . . . . 135 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 138 Test Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . 138 Schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 Prio3Count . . . . . . . . . . . . . . . . . . . . . . . . . 139 Prio3Sum . . . . . . . . . . . . . . . . . . . . . . . . . . 139 Prio3SumVec . . . . . . . . . . . . . . . . . . . . . . . . . 140 Prio3Histogram . . . . . . . . . . . . . . . . . . . . . . . 140 Prio3MultihotCountVec . . . . . . . . . . . . . . . . . . . . 140 Poplar1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 140 1. Introduction (RFC EDITOR: remove this paragraph.) The source for this draft and and the reference implementation can be found at https://github.com/cfrg/draftirtfcfrgvdaf. The ubiquity of the Internet makes it an ideal platform for measurement of largescale phenomena, whether public health trends or the behavior of computer systems at scale. There is substantial overlap, however, between information that is valuable to measure and information that users consider private. For example, consider an application that provides health information to users. The operator of an application might want to know which parts of their application are used most often, as a way to guide future development of the application. Specific users' patterns of usage, though, could reveal sensitive things about them, such as which users are researching a given health condition. In many situations, the measurement collector is only interested in aggregate statistics, e.g., which portions of an application are most used or what fraction of people have experienced a given disease. Thus systems that provide aggregate statistics while protecting individual measurements can deliver the value of the measurements while protecting users' privacy. This problem is often formulated in terms of differential privacy (DP) [Dwo06]. Roughly speaking, a data aggregation system that is differentially private ensures that the degree to which any individual measurement influences the value of the aggregate result can be precisely controlled. For example, in systems like RAPPOR Barnes, et al. Expires 7 April 2025 [Page 4] InternetDraft VDAF October 2024 [EPK14], each user samples noise from a wellknown distribution and adds it to their measurement before submitting to the aggregation server. The aggregation server then adds up the noisy measurements, and because it knows the distribution from which the noise was sampled, it can estimate the true sum with reasonable accuracy. Even when noise is added to the measurements, collecting them in the clear still reveals a significant amount of information to the collector. On the one hand, depending on the "amount" of noise a client adds to its measurement, it may be possible for a curious collector to make a reasonable guess of the measurement's true value. On the other hand, the more noise the clients add, the less reliable will be the server's estimate of the aggregate. Thus systems relying solely on a DP mechanism must strike a delicate balance between privacy and utility. The ideal goal for a privacypreserving measurement system is that of secure multiparty computation (MPC): no participant in the protocol should learn anything about an individual measurement beyond what it can deduce from the aggregate. MPC achieves this goal by distributing the computation of the aggregate across multiple aggregation servers, one of which is presumed to be honest, i.e., not under control of the attacker. Moreover, MPC can be composed with various DP mechanisms to ensure the aggregate itself does leak too much information about any one of the measurements [MPRV09]. This document describes two classes of MPC protocols, each aiming for a different set of goals. In a Distributed Aggregation Function (DAF, Section 4), each client splits its measurement into multiple secret shares, one for each aggregation server. DAFs require two properties of the secret sharing scheme. First, we can reconstruct the underlying measurement by simply adding up all of the shares. (Typically the shares are vectors over some finite field.) Second, given all but one of the shares, it is impossible to learn anything about the underlying measurement. These properties give rise to a simple strategy for privately aggregating the measurements: each aggregation server adds up its measurement shares locally before revealing their sum to the data collector; then all the data collector has to do is add up these sums to get the aggregate. Barnes, et al. Expires 7 April 2025 [Page 5] InternetDraft VDAF October 2024 This strategy is compatible with any aggregation function that can be represented as the sum of some encoding of the measurements. Examples include: summary statistics such as sum, mean, and standard deviation; estimation of quantiles, e.g., median; histograms; linear regression; or counting data structures, e.g., Bloom filters. However, not all functions fit into this rubric, as it is constrained to linear computations over the encoded measurements. In fact, our framework admits DAFs with slightly more functionality, computing aggregation functions of the form f(agg_param, meas_1, ..., meas_N) = g(agg_param, meas_1) + ... + g(agg_param, meas_N) where meas_1, ..., meas_N are the measurements, g is a possibly non linear function, and agg_param is a parameter of that function chosen by the data collector. This paradigm, known as function secret sharing [BGI15], allows for more sophisticated data analysis tasks, such as grouping metrics by private client attributes [MPDST25] or computing heavy hitters [BBCGGI21]. (More on the latter task below.) The second class of protocols defined in this document are called Verifiable Distributed Aggregation Functions (VDAFs, Section 5). In addition to being private, VDAFs are verifiable in the following sense. By design, a secret sharing of a valid measurement, e.g., a number between 1 and 10, is indistinguishable from a secret sharing of an invalid measurement, e.g., a number larger than 10. This means that DAFs are vulnerable to attacks from malicious clients attempting to disrupt the computation by submitting invalid measurements. Thus VDAFs are designed to allow the servers to detect and remove these measurements prior to aggregation. We refer to this property as robustness. Achieving robustness without sacrificing privacy requires the servers to interact with one another over a number of rounds of communication. DAFs on the other hand are noninteractive, and are therefore easier to deploy; but they do not provide robustness on their own. This may be tolerable in some applications. For instance, if the client's software is executed in a trusted execution environment, it may be reasonable to assume that no client is malicious. The DAF and VDAF abstractions encompass a variety of MPC techniques in the literature. These protocols vary in their operational and security requirements, sometimes in subtle but consequential ways. This document therefore has two important goals: Barnes, et al. Expires 7 April 2025 [Page 6] InternetDraft VDAF October 2024 1. Providing higherlevel protocols like [DAP] (RFC EDITOR: remove this reference if not published before the current document) with a simple, uniform interface for accessing privacypreserving measurement schemes, documenting relevant operational and security requirements, and specifying constraints for safe usage: 1. General patterns of communications among the various actors involved in the system (clients, aggregation servers, and the collector of the aggregate result); 2. Capabilities of a malicious coalition of servers attempting to divulge information about client measurements; and 3. Conditions that are necessary to ensure that malicious clients cannot corrupt the computation. 2. Providing cryptographers with design criteria that provide a clear deployment roadmap for new constructions. This document also specifies two concrete VDAF schemes, each based on a protocol from the literature. * The Prio system [CGB17] allows for the privacypreserving computation of a variety of aggregate statistics, combining additive secret sharing as described above with a mechanism for checking the validity of each measurement. In Section 7 we specify Prio3, a VDAF that follows the same overall framework as the original Prio protocol, but incorporates techniques introduced in [BBCGGI19] that result in significant performance gains. * The Poplar protocol [BBCGGI21] solves the heavyhitters problem in a privacypreserving manner. Here each client holds a bitstring, and the goal of the aggregation servers is to compute the set of strings that occur at least t times for some threshold t. The core primitive in their protocol is a secret sharing of a point function [GI14] (g in the notation above) that allows the servers to privately count how many of the clients' strings begin with a given prefix (agg_param in the notation above). In Section 8 we specify a VDAF called Poplar1 that implements this functionality. Barnes, et al. Expires 7 April 2025 [Page 7] InternetDraft VDAF October 2024 The remainder of this document is organized as follows: Section 2 lists definitions and conventions used for specification; Section 3 gives a brief overview of DAFs and VDAFs, the parties involved in the computation, and the requirements for noncollusion; Section 4 defines the syntax for DAFs; Section 5 defines the syntax for VDAFs; Section 6 defines various functionalities that are common to our constructions; Section 7 describes the Prio3 construction; Section 8 describes the Poplar1 construction; and Section 9 enumerates the security considerations for DAFs and VDAFs. 1.1. Change Log (RFC EDITOR: remove this section.) (*) Indicates a change that breaks wire compatibility with the previous draft. 12: * (V)DAF: Add an application context string parameter to sharding and preparation. The motivation for this change is to harden Prio3 against offline attacks. More generally, however, it allows designing schemes for which correct execution requires agreement on the application context. Accordingly, both Prio3 and Poplar1 have been modified to include the context in the domain separation tag of each XOF invocation. (*) * Prio3: Improve soundness of the base proof system and the circuits of some variants. Generally speaking, wherever we evaluate a univariate polynomial at a random point, we can instead evaluate a multivariate polynomial of lower degree. (*) * Prio3: Replace the helper's measurement and proof share seeds with a single seed. (*) * Prio3Sum: Update the circuit to support a more general range check and avoid using joint randomness. (*) * Prio3Histogram, Prio3MultihotCountVec: Move the final reduction of the intermediate outputs out of the circuit. (*) * IDPF: Add the application context string to key generation end evaluation and bind it to the fixed AES key. (*) * IDPF: Use XofTurboShake128 for deriving the leaf nodes in order to ensure the construction is extractable. (*) * IDPF: Simplify the public share encoding. (*) Barnes, et al. Expires 7 April 2025 [Page 8] InternetDraft VDAF October 2024 * XofTurboShake128: Change SEED_SIZE from 16 bytes to 32 to mitigate offline attacks on Prio3 robustness. In addition, allow seeds of different lengths so that we can continue to use XofTurboShake128 with IDPF. (*) * XofTurboShake128, XofFixedKeyAes128: Increase the length prefix for the domain separation tag from one by to two bytes. This is to accommodate the application context. (*) * Reassign codepoints for all Prio3 variants and Poplar1. (*) * Security considerations: Add a section on defenseindepth measures taken by Prio3 and Poplar1 and more discussion about choosing FLP parameters. 11: * Define message formats for the Poplar1 aggregation parameter and IDPF public share. * IDPF: Require the IDPF binder must be a random nonce. * VDAF: Replace the pseudocode description of the pingping topology with Python and sketch the star topology. * DAF: Align aggregation parameter validation with VDAF. * Replace Union[A, B] type with A  B. * Rename FFT ("Fast Fourier Transform") with NTT ("Number Theoretic Transform"). 10: * Define Prio3MultihotCountVec, a variant of Prio3 for aggregating bit vectors with bounded weight. * FLP: Allow the output of the circuit to be a vector. This makes it possible to skip joint randomness derivation in more cases. * Poplar1: On the first round of preparation, handle None as an error. Previously this message was interpreted as a length3 vector of zeros. * Prio3: Move specification of the field from the FLP validity circuit to the VDAF itself. Barnes, et al. Expires 7 April 2025 [Page 9] InternetDraft VDAF October 2024 * Clarify the extent to which the attacker controls the network in our threat models for privacy and robustness. * Clean up various aspects of the code, including: Follow existing objectoriented programming patterns for Python more closely; make the type hints enforceable; and avoid shadowing variables. * Poplar1: Align terminology with [BBCGGI23]. * IDPF: Add guidance for encoding byte strings as indices. 09: * Poplar1: Make prefix tree traversal stricter by requiring each node to be a child of a node that was already visited. This change is intended to make it harder for a malicious Aggregator to steer traversal towards nonheavyhitting measurements. * Prio3: Add more explicit guidance for choosing the field size. * IDPF: Define extractability and clarify (un)safe usage of intermediate prefix counts. Accordingly, add text ensuring public share consistency to security considerations. 08: * Poplar1: Bind the report nonce to the authenticator vector programmed into the IDPF. (*) * IdpfPoplar: Modify extend() by stealing each control bit from its corresponding seed. This improves performance by reducing the number of AES calls per level from 3 to 2. The cost is a slight reduction in the concrete privacy bound. (*) * Prio3: Add support for generating and verifying mutliple proofs per measurement. This enables a tradeoff between communication cost and runtime: if more proofs are used, then a smaller field can be used without impacting robustness. (*) * Replace SHAKE128 with TurboSHAKE128. (*) 07: * Rename PRG to XOF ("eXtendable Output Function"). Accordingly, rename PrgSha3 to XofShake128 and PrgFixedKeyAes128 to XofFixedKeyAes128. "PRG" is a misnomer since we don't actually treat this object as a pseudorandom generator in existing security analysis. Barnes, et al. Expires 7 April 2025 [Page 10] InternetDraft VDAF October 2024 * Replace cSHAKE128 with SHAKE128, reimplementing domain separation for the customization string using a simpler scheme. This change addresses the reality that implementations of cSHAKE128 are less common. (*) * Define a new VDAF, called Prio3SumVec, that generalizes Prio3Sum to a vector of summands. * Prio3Histogram: Update the codepoint and use the parallel sum optimization introduced by Prio3SumVec to reduce the proof size. (*) * Daf, Vdaf: Rename interface methods to match verbiage in the draft. * Daf: Align with Vdaf by adding a nonce to shard() and prep(). * Vdaf: Have prep_init() compute the first prep share. This change is intended to simplify the interface by making the input to prep_next() not optional. * Prio3: Split sharding into two auxiliary functions, one for sharding with joint randomness and another without. This change is intended to improve readability. * Fix bugs in the pingpong interface discovered after implementing it. 06: * Vdaf: Define a wrapper interface for preparation that is suitable for the "pingpong" topology in which two Aggregators exchange messages over a request/response protocol, like HTTP, and take turns executing the computation until input from the peer is required. * Prio3Histogram: Generalize the measurement type so that the histogram can be used more easily with discrete domains. (*) * Daf, Vdaf: Change the aggregation parameter validation algorithm to take the set of previous parameters rather than a list. (The order of the parameters is irrelevant.) * Daf, Vdaf, Idpf: Add parameter RAND_SIZE that specifies the number of random bytes consumed by the randomized algorithm (shard() for Daf and Vdaf and gen() for Idpf). 05: Barnes, et al. Expires 7 April 2025 [Page 11] InternetDraft VDAF October 2024 * IdpfPoplar: Replace PrgSha3 with PrgFixedKeyAes128, a fixedkey mode for AES128 based on a construction from [GKWWY20]. This change is intended to improve performance of IDPF evaluation. Note that the new PRG is not suitable for all applications. (*) * Idpf: Add a binder string to the keygeneration and evaluation algorithms. This is used to plumb the nonce generated by the Client to the PRG. * Plumb random coins through the interface of randomized algorithms. Specifically, add a random input to (V)DAF sharding algorithm and IDPF keygeneration algorithm and require implementations to specify the length of the random input. Accordingly, update Prio3, Poplar1, and IdpfPoplar to match the new interface. This change is intended to improve coverage of test vectors. * Use littleendian byteorder for field element encoding. (*) * Poplar1: Move the last step of sketch evaluation from prep_next() to prep_shares_to_prep(). 04: * Align security considerations with the security analysis of [DPRS23]. * Vdaf: Pass the nonce to the sharding algorithm. * Vdaf: Rather than allow the application to choose the nonce length, have each implementation of the Vdaf interface specify the expected nonce length. (*) * Prg: Split "info string" into two components: the "customization string", intended for domain separation; and the "binder string", used to bind the output to ephemeral values, like the nonce, associated with execution of a (V)DAF. * Replace PrgAes128 with PrgSha3, an implementation of the Prg interface based on SHA3, and use the new scheme as the default. Accordingly, replace Prio3Aes128Count with Prio3Count, Poplar1Aes128 with Poplar1, and so on. SHA3 is a safer choice for instantiating a random oracle, which is used in the analysis of Prio3 of [DPRS23]. (*) * Prio3, Poplar1: Ensure each invocation of the Prg uses a distinct customization string, as suggested by [DPRS23]. This is intended to make domain separation clearer, thereby simplifying security analysis. (*) Barnes, et al. Expires 7 April 2025 [Page 12] InternetDraft VDAF October 2024 * Prio3: Replace "joint randomness hints" sent in each input share with "joint randomness parts" sent in the public share. This reduces communication overhead when the number of shares exceeds two. (*) * Prio3: Bind nonce to joint randomness parts. This is intended to address birthday attacks on robustness pointed out by [DPRS23]. (*) * Poplar1: Use different Prg invocations for producing the correlated randomness for inner and leaf nodes of the IDPF tree. This is intended to simplify implementations. (*) * Poplar1: Don't bind the candidate prefixes to the verifier randomness. This is intended to improve performance, while not impacting security. According to the analysis of [DPRS23], it is necessary to restrict Poplar1 usage such that no report is aggregated more than once at a given level of the IDPF tree; otherwise, attacks on privacy may be possible. In light of this restriction, there is no added benefit of binding to the prefixes themselves. (*) * Poplar1: During preparation, assert that all candidate prefixes are unique and appear in order. Uniqueness is required to avoid erroneously rejecting a valid report; the ordering constraint ensures the uniqueness check can be performed efficiently. (*) * Poplar1: Increase the maximum candidate prefix count in the encoding of the aggregation parameter. (*) * Poplar1: Bind the nonce to the correlated randomness derivation. This is intended to provide defenseindepth by ensuring the Aggregators reject the report if the nonce does not match what the Client used for sharding. (*) * Poplar1: Clarify that the aggregation parameter encoding is OPTIONAL. Accordingly, update implementation considerations around crossaggregation state. * IdpfPoplar: Add implementation considerations around branching on the values of control bits. * IdpfPoplar: When decoding the the control bits in the public share, assert that the trailing bits of the final byte are all zero. (*) 03: Barnes, et al. Expires 7 April 2025 [Page 13] InternetDraft VDAF October 2024 * Define codepoints for (V)DAFs and use them for domain separation in Prio3 and Poplar1. (*) * Prio3: Align joint randomness computation with revised paper [BBCGGI19]. This change mitigates an attack on robustness. (*) * Prio3: Remove an intermediate PRG evaluation from query randomness generation. (*) * Add additional guidance for choosing FFTfriendly fields. 02: * Complete the initial specification of Poplar1. * Extend (V)DAF syntax to include a "public share" output by the Client and distributed to all of the Aggregators. This is to accommodate "extractable" IDPFs as required for Poplar1. (See [BBCGGI21], Section 4.3 for details.) * Extend (V)DAF syntax to allow the unsharding step to take into account the number of measurements aggregated. * Extend FLP syntax by adding a method for decoding the aggregate result from a vector of field elements. The new method takes into account the number of measurements. * Prio3: Align aggregate result computation with updated FLP syntax. * Prg: Add a method for statefully generating a vector of field elements. * Field: Require that field elements are fully reduced before decoding. (*) * Define new field Field255. 01: * Require that VDAFs specify serialization of aggregate shares. * Define Distributed Aggregation Functions (DAFs). * Prio3: Move proof verifier check from prep_next() to prep_shares_to_prep(). (*) * Remove public parameter and replace verification parameter with a "verification key" and "Aggregator ID". Barnes, et al. Expires 7 April 2025 [Page 14] InternetDraft VDAF October 2024 2. Conventions and Definitions The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. Algorithms in this document are written in Python 3. Type hints are used to define input and output types. The type variable F is used in signatures to signify any type that is a subclass of Field. A fatal error in a program (e.g., failure to parse one of the function parameters) is usually handled by raising an exception. A variable with type bytes is a byte string. This document defines several bytestring constants. When comprised of printable ASCII characters, they are written as Python 3 bytestring literals (e.g., b'some constant string'). A global constant VERSION of type int is defined, which algorithms are free to use as desired. Its value SHALL be 12. This document describes algorithms for multiparty computations in which the parties typically communicate over a network. Wherever a quantity is defined that must be be transmitted from one party to another, this document prescribes a particular encoding of that quantity as a byte string. OPEN ISSUE It might be better to not be prescriptive about how quantities are encoded on the wire. See issue #58. Some common functionalities: * zeros(len: int) > bytes returns an array of zero bytes. The length of output MUST be len. * gen_rand(len: int) > bytes returns an array of random bytes generated by a cryptographically secure pseudorandom number generator (CSPRNG). The length of output MUST be len. * byte(int: int) > bytes returns the representation of int as a byte string. The value of int MUST be in [0,256). * concat(parts: list[bytes]) > bytes returns the concatenation of the input byte strings, i.e., parts[0]  ...  parts[len(parts) 1]. Barnes, et al. Expires 7 April 2025 [Page 15] InternetDraft VDAF October 2024 * front(length: int, vec: list[Any]) > (list[Any], list[Any]) splits vec into two vectors, where the first vector is made up of the first length elements of the input. I.e., (vec[:length], vec[length:]). * xor(left: bytes, right: bytes) > bytes returns the bitwise XOR of left and right. An exception is raised if the inputs are not the same length. * to_be_bytes(val: int, length: int) > bytes converts val to big endian bytes; its value MUST be at least 0 but less than 2**(8*length). Function from_be_bytes(encoded: bytes) > int computes the inverse. * to_le_bytes(val: int, length: int) > bytes converts val to littleendian bytes; its value MUST be at least 0 but less than 2**(8*length). Function from_le_bytes(encoded: bytes) > int computes the inverse. * next_power_of_2(n: int) > int returns the smallest integer greater than or equal to n that is also a power of two. * additive_secret_share(vec: list[Field], num_shares: int, field: type) > list[list[Field]] takes a vector of field elements and returns multiple vectors of the same length, such that they all add up to the input vector, and each proper subset of the vectors are indistinguishable from random. * cast(typ: type, val: object) > object returns the input value unchanged. This is only present to assist with static analysis of the Python code. Type checkers will ignore the inferred type of the input value, and assume the output value has the given type. * range(stop) or range(start, stop[, step]) is the range function from the Python standard library. The oneargument form returns the integers from zero (inclusive) to stop, exclusive. The two and threeargument forms allow overriding the start of the range and overriding the step between successive output values. 3. Overview Barnes, et al. Expires 7 April 2025 [Page 16] InternetDraft VDAF October 2024 ++ +> Aggregator 0 +  ++   ^      V   ++   +> Aggregator 1 +    ++   +++  ^  +>++  Client +  +> Collector > Aggregate +++ +>++  ...        V   ++  +> Aggregator N1 + ++ Input shares Aggregate shares Figure 1: Overall data flow of a (V)DAF In a DAF or VDAFbased private measurement system, we distinguish three types of actors: Clients, Aggregators, and Collectors. The overall flow of the measurement process is as follows: * To submit an individual measurement, the Client shards the measurement into "input shares" and sends one input share to each Aggregator. We sometimes refer to this sequence of input shares collectively as the Client's "report". * The Aggregators refine their input shares into "output shares".  Output shares are in onetoone correspondence with the input shares.  Just as each Aggregator receives one input share of each measurement, if this process succeeds, then each aggregator holds one output share.  In VDAFs, Aggregators will need to exchange information among themselves as part of the validation process. * Each Aggregator combines the output shares in the batch to compute the "aggregate share" for that batch, i.e., its share of the desired aggregate result. Barnes, et al. Expires 7 April 2025 [Page 17] InternetDraft VDAF October 2024 * The Aggregators submit their aggregate shares to the Collector, who combines them to obtain the aggregate result over the batch. Aggregators are a new class of actor relative to traditional measurement systems where Clients submit measurements to a single server. They are critical for both the privacy properties of the system and, in the case of VDAFs, the correctness of the measurements obtained. The privacy properties of the system are assured by non collusion among Aggregators, and Aggregators are the entities that perform validation of Client measurements. Thus Clients trust Aggregators not to collude (typically it is required that at least one Aggregator is honest; see Section 9.7), and Collectors trust Aggregators to correctly run the protocol. Within the bounds of the noncollusion requirements of a given (V)DAF instance, it is possible for the same entity to play more than one role. For example, the Collector could also act as an Aggregator, effectively using the other Aggregator(s) to augment a basic client server protocol. In this document, we describe the computations performed by the actors in this system. It is up to the higherlevel protocol making use of the (V)DAF to arrange for the required information to be delivered to the proper actors in the proper sequence. In general, we assume that all communications are confidential and mutually authenticated, with the exception that Clients submitting measurements may be anonymous. 4. Definition of DAFs By way of a gentle introduction to VDAFs, this section describes a simpler class of schemes called Distributed Aggregation Functions (DAFs). Unlike VDAFs, DAFs do not provide verifiability of the computation. Clients must therefore be trusted to compute their input shares correctly. Because of this fact, the use of a DAF is NOT RECOMMENDED for most applications. See Section 9 for additional discussion. A DAF scheme is used to compute a particular "aggregation function" over a set of measurements generated by Clients. Depending on the aggregation function, the Collector might select an "aggregation parameter" and disseminates it to the Aggregators. The semantics of this parameter is specific to the aggregation function, but in general it is used to represent the set of "queries" that can be made on the measurement set. For example, the aggregation parameter is used to represent the candidate prefixes in Poplar1 Section 8. Execution of a DAF has four distinct stages: Barnes, et al. Expires 7 April 2025 [Page 18] InternetDraft VDAF October 2024 * Sharding  Each Client generates input shares from its measurement and distributes them among the Aggregators. * Preparation  Each Aggregator converts each input share into an output share compatible with the aggregation function. This computation involves the aggregation parameter. In general, each aggregation parameter may result in a different an output share. * Aggregation  Each Aggregator combines a sequence of output shares into its aggregate share and sends the aggregate share to the Collector. * Unsharding  The Collector combines the aggregate shares into the aggregate result. Sharding and Preparation are done once per measurement. Aggregation and Unsharding are done over a batch of measurements (more precisely, over the recovered output shares). A concrete DAF specifies an algorithm for the computation needed in each of these stages. The interface of each algorithm is defined in the remainder of this section. In addition, a concrete DAF defines the associated constants and types enumerated in the following table. Barnes, et al. Expires 7 April 2025 [Page 19] InternetDraft VDAF October 2024 +=============+=====================================================+  Parameter  Description  +=============+=====================================================+  ID: int  Algorithm identifier for this    DAF, in range(2**32).  +++  SHARES: int  Number of input shares into which    each measurement is sharded.  +++  NONCE_SIZE:  Size of the nonce passed by the   int  application.  +++  RAND_SIZE:  Size of the random byte string   int  passed to sharding algorithm.  +++  Measurement  Type of each measurement.  +++  PublicShare  Type of each public share.  +++  InputShare  Type of each input share.  +++  AggParam  Type of aggregation parameter.  +++  OutShare  Type of each output share.  +++  AggShare  Type of the aggregate share.  +++  AggResult  Type of the aggregate result.  +++ Table 1: Constants and types defined by each concrete DAF. These types define the inputs and outputs of DAF methods at various stages of the computation. Some of these values need to be written to the network in order to carry out the computation. In particular, it is RECOMMENDED that concrete instantiations of the Daf interface specify a method of encoding the PublicShare, InputShare, and AggShare. Each DAF is identified by a unique, 32bit integer ID. Identifiers for each (V)DAF specified in this document are defined in Table 23. 4.1. Sharding In order to protect the privacy of its measurements, a DAF Client shards its measurements into a sequence of input shares. The shard method is used for this purpose. Barnes, et al. Expires 7 April 2025 [Page 20] InternetDraft VDAF October 2024 * daf.shard(ctx: bytes, measurement: Measurement, nonce: bytes, rand: bytes) > tuple[PublicShare, list[InputShare]] is the randomized sharding algorithm run by each Client that consumes the application context, a measurement, and a nonce and produces a "public share" distributed to each of the Aggregate and a corresponding sequence of input shares, one for each Aggregator. Preconditions:  nonce MUST have length equal to NONCE_SIZE and MUST be generated using a CSPRNG.  rand consists of the random bytes consumed by the algorithm. It MUST have length equal to RAND_SIZE and MUST be generated using a CSPRNG. Postconditions:  The number of input shares MUST equal SHARES. Sharding is bound to a specific "application context". The application context is a string intended to uniquely identify an instance of the higher level protocol that uses the DAF. This is intended to ensure that aggregation succeeds only if the Clients and Aggregators agree on the application context. (Preparation binds the application context, too; see Section 4.2.) Note that, unlike VDAFs (Section 5), there is no explicit signal of disagreement; it may only manifest as a garbled aggregate result. Client ====== measurement  V ++  shard  ++       ...  public_share      +++       V  V  V  input_share_0 input_share_1 input_share_[SHARES1]     ...   V V V V V V Aggregator 0 Aggregator 1 Aggregator SHARES1 Barnes, et al. Expires 7 April 2025 [Page 21] InternetDraft VDAF October 2024 Figure 2: The Client divides its measurement into input shares and distributes them to the Aggregators. The public share is broadcast to all Aggregators. 4.2. Preparation Once an Aggregator has received the public share and one of the input shares, the next step is to prepare the input share for aggregation. This is accomplished using the following algorithm: * daf.prep(ctx: bytes, agg_id: int, agg_param: AggParam, nonce: bytes, public_share: PublicShare, input_share: InputShare) > OutShare is the deterministic preparation algorithm. It takes as input the public share and one of the input shares generated by a Client, the application context, the Aggregator's unique identifier, the aggregation parameter selected by the Collector, and a nonce and returns an output share. Preconditions:  agg_id MUST be in range(SHARES) and match the index of input_share in the sequence of input shares produced by the Client.  nonce MUST have length NONCE_SIZE. 4.3. Validity of Aggregation Parameters Concrete DAFs implementations MAY impose certain restrictions for input shares and aggregation parameters. Protocols using a DAF MUST ensure that for each input share and aggregation parameter agg_param, daf.prep is only called if daf.is_valid(agg_param, previous_agg_params) returns True, where previous_agg_params contains all aggregation parameters that have previously been used with the same input share. DAFs MUST implement the following function: * daf.is_valid(agg_param: AggParam, previous_agg_params: list[AggParam]) > bool: checks if the agg_param is compatible with all elements of previous_agg_params. 4.4. Aggregation Once an Aggregator holds output shares for a batch of measurements (where batches are defined by the application), it combines them into a share of the desired aggregate result: Barnes, et al. Expires 7 April 2025 [Page 22] InternetDraft VDAF October 2024 * daf.aggregate(agg_param: AggParam, out_shares: list[OutShare]) > AggShare is the deterministic aggregation algorithm. It is run by each Aggregator a set of recovered output shares. Aggregator 0 Aggregator 1 Aggregator SHARES1 ============ ============ =================== out_share_0_0 out_share_1_0 out_share_[SHARES1]_0 out_share_0_1 out_share_1_1 out_share_[SHARES1]_1 out_share_0_2 out_share_1_2 out_share_[SHARES1]_2 ... ... ... out_share_0_B out_share_1_B out_share_[SHARES1]_B    V V V ++ ++ ++  aggregate   aggregate  ...  aggregate  ++ ++ ++    V V V agg_share_0 agg_share_1 agg_share_[SHARES1] Figure 3: Aggregation of output shares. `B` indicates the number of measurements in the batch. For simplicity, we have written this algorithm in a "oneshot" form, where all output shares for a batch are provided at the same time. Many DAFs may also support a "streaming" form, where shares are processed one at a time. Implementation note: for most natural DAFs (and VDAFs) it is not necessary for an Aggregator to store all output shares individually before aggregating. Typically it is possible to merge output shares into aggregate shares as they arrive, merge these into other aggregate shares, and so on. In particular, this is the case when the output shares are vectors over some finite field and aggregating them involves merely adding up the vectors elementwise. Such is the case for Prio3 Section 7 and Poplar1 Section 8. 4.5. Unsharding After the Aggregators have aggregated a sufficient number of output shares, each sends its aggregate share to the Collector, who runs the following algorithm to recover the following output: Barnes, et al. Expires 7 April 2025 [Page 23] InternetDraft VDAF October 2024 * daf.unshard(agg_param: AggParam, agg_shares: list[AggShare], num_measurements: int) > AggResult is run by the Collector in order to compute the aggregate result from the Aggregators' shares. The length of agg_shares MUST be SHARES. num_measurements is the number of measurements that contributed to each of the aggregate shares. This algorithm is deterministic. Aggregator 0 Aggregator 1 Aggregator SHARES1 ============ ============ =================== agg_share_0 agg_share_1 agg_share_[SHARES1]    V V V ++  unshard  ++  V agg_result Collector ========= Figure 4: Computation of the final aggregate result from aggregate shares. 4.6. Execution of a DAF Securely executing a DAF involves emulating the following procedure. def run_daf( daf: Daf[ Measurement, AggParam, PublicShare, InputShare, OutShare, AggShare, AggResult, ], ctx: bytes, agg_param: AggParam, measurements: list[Measurement], nonces: list[bytes]) > AggResult: """ Run a DAF on a list of measurements. Preconditions: Barnes, et al. Expires 7 April 2025 [Page 24] InternetDraft VDAF October 2024  `type(agg_param) == daf.AggParam`  `type(measurement) == daf.Measurement` for each `measurement` in `measurements`  `len(nonce) == daf.NONCE_SIZE` for each `nonce` in `nonces`  `len(nonces) == len(measurements)` """ if any(len(nonce) != daf.NONCE_SIZE for nonce in nonces): raise ValueError("incorrect nonce size") if len(nonces) != len(measurements): raise ValueError( "measurements and nonces lists have different lengths" ) out_shares: list[list[OutShare]] out_shares = [[] for j in range(daf.SHARES)] for (measurement, nonce) in zip(measurements, nonces): # Each Client shards its measurement into input shares and # distributes them among the Aggregators. rand = gen_rand(daf.RAND_SIZE) (public_share, input_shares) = \ daf.shard(ctx, measurement, nonce, rand) # Each Aggregator prepares its input share for aggregation. for j in range(daf.SHARES): out_shares[j].append( daf.prep(ctx, j, agg_param, nonce, public_share, input_shares[j])) # Each Aggregator aggregates its output shares into an aggregate # share and sends it to the Collector. agg_shares = [] for j in range(daf.SHARES): agg_share_j = daf.aggregate(agg_param, out_shares[j]) agg_shares.append(agg_share_j) # Collector unshards the aggregate result. num_measurements = len(measurements) agg_result = daf.unshard(agg_param, agg_shares, num_measurements) return agg_result Figure 5: Execution of a DAF. The inputs to this procedure are the same as the aggregation function computed by the DAF: an aggregation parameter and a sequence of measurements. The procedure prescribes how a DAF is executed in a "benign" environment in which there is no adversary and the messages Barnes, et al. Expires 7 April 2025 [Page 25] InternetDraft VDAF October 2024 are passed among the protocol participants over secure pointtopoint channels. In reality, these channels need to be instantiated by some "wrapper protocol", such as [DAP], that realizes these channels using suitable cryptographic mechanisms. Moreover, some fraction of the Aggregators (or Clients) may be malicious and diverge from their prescribed behaviors. Section 9 describes the execution of the DAF in various adversarial environments and what properties the wrapper protocol needs to provide in each. 5. Definition of VDAFs Like DAFs described in the previous section, a VDAF scheme is used to compute a particular aggregation function over a set of Client generated measurements. Evaluation of a VDAF involves the same four stages as for DAFs: Sharding, Preparation, Aggregation, and Unsharding. However, the Preparation stage will require interaction among the Aggregators in order to facilitate verifiability of the computation's correctness. Accommodating this interaction will require syntactic changes. Overall execution of a VDAF comprises the following stages: * Sharding  Computing input shares from an individual measurement * Preparation  Conversion and verification of input shares to output shares compatible with the aggregation function being computed * Aggregation  Combining a sequence of output shares into an aggregate share * Unsharding  Combining a sequence of aggregate shares into an aggregate result In contrast to DAFs, the Preparation stage for VDAFs now performs an additional task: verification of the validity of the recovered output shares. This process ensures that aggregating the output shares will not lead to a garbled aggregate result. The remainder of this section defines the VDAF interface. The attributes are listed in Table 2 are defined by each concrete VDAF. Barnes, et al. Expires 7 April 2025 [Page 26] InternetDraft VDAF October 2024 +=================+==========================================+  Parameter  Description  +=================+==========================================+  ID  Algorithm identifier for this VDAF.  +++  VERIFY_KEY_SIZE  Size (in bytes) of the verification key    (Section 5.2).  +++  RAND_SIZE  Size of the random byte string passed to    sharding algorithm.  +++  NONCE_SIZE  Size (in bytes) of the nonce.  +++  ROUNDS  Number of rounds of communication during    the Preparation stage (Section 5.2).  +++  SHARES  Number of input shares into which each    measurement is sharded (Section 5.1).  +++  Measurement  Type of each measurement.  +++  PublicShare  Type of each public share.  +++  InputShare  Type of each input share.  +++  AggParam  Type of aggregation parameter.  +++  OutShare  Type of each output share.  +++  AggShare  Type of the aggregate share.  +++  AggResult  Type of the aggregate result.  +++  PrepState  Aggregator's state during preparation.  +++  PrepShare  Type of each prep share.  +++  PrepMessage  Type of each prep message.  +++ Table 2: Constants and types defined by each concrete VDAF. Some of these values need to be written to the network in order to carry out the computation. In particular, it is RECOMMENDED that concrete instantiations of the Vdaf interface specify a method of encoding the PublicShare, InputShare, AggShare, PrepShare, and PrepMessage. Barnes, et al. Expires 7 April 2025 [Page 27] InternetDraft VDAF October 2024 Each VDAF is identified by a unique, 32bit integer ID. Identifiers for each (V)DAF specified in this document are defined in Table 23. The following method is defined for each VDAF specified in this document: def domain_separation_tag(self, usage: int, ctx: bytes) > bytes: """ Format domain separation tag for this VDAF with the given application context and usage. Preconditions:  `usage` in `range(2**16)` """ return format_dst(0, self.ID, usage) + ctx It is used to construct a domain separation tag for an instance of Xof used by the VDAF. (See Section 6.2.) 5.1. Sharding Sharding transforms a measurement and nonce into a public share and input shares as it does in DAFs (cf. Section 4.1): * vdaf.shard(ctx: bytes, measurement: Measurement, nonce: bytes, rand: bytes) > tuple[PublicShare, list[InputShare]] is the randomized sharding algorithm run by each Client that consumes the application context, a measurement, and a nonce and produces a public share distributed to each of the Aggregate and a corresponding sequence of input shares, one for each Aggregator. Depending on the VDAF, the input shares may encode additional information used to verify the recovered output shares (e.g., the "proof shares" in Prio3 Section 7) Preconditions:  nonce MUST have length equal to NONCE_SIZE and MUST be generated using a CSPRNG. (See Section 9 for details.)  rand consists of the random bytes consumed by the algorithm. It MUST have length equal to RAND_SIZE and MUST be generated using a CSPRNG. Postconditions:  The number of input shares MUST equal SHARES. Barnes, et al. Expires 7 April 2025 [Page 28] InternetDraft VDAF October 2024 Like DAFs, sharding is bound to the application context via the ctx string. Again, this is intended to ensure that aggregation succeeds only if the Clients and Aggregators agree on the application context. Unlike DAFs, however, disagreement on the context should manifest as a failure to validate the report, causing the report to be rejected without garbling the aggregate result. The application context also provides some defenseindepth against cross protocol attacks; see Section 9.8. 5.2. Preparation To recover and verify output shares, the Aggregators interact with one another over ROUNDS rounds. Prior to each round, each Aggregator constructs an outbound message. Next, the sequence of outbound messages is combined into a single message, called a "preparation message", or "prep message" for short. (Each of the outbound messages are called "preparationmessage shares", or "prep shares" for short.) Finally, the preparation message is distributed to the Aggregators to begin the next round. An Aggregator begins the first round with its input share and it begins each subsequent round with the previous prep message. Its output in the last round is its output share and its output in each of the preceding rounds is a prep share. This process involves a value called the "aggregation parameter" used to map the input shares to output shares. The Aggregators need to agree on this parameter before they can begin preparing the measurement shares for aggregation. Barnes, et al. Expires 7 April 2025 [Page 29] InternetDraft VDAF October 2024 Aggregator 0 Aggregator 1 Aggregator SHARES1 ============ ============ =================== input_share_0 input_share_1 input_share_[SHARES1]   ...  V V V ++ ++ ++  prep_init   prep_init   prep_init  ++ ++ ++     ...   V  V  V  ++ \    prep_shares_to_prep    ++      ...    V V V V V   x ROUNDS ++ ++ ++   prep_next   prep_next   prep_next   ++ ++ ++     ...    V V V V V / ... ... ...   ...  V V V out_share_0 out_share_1 out_share_[SHARES1] Figure 6: VDAF preparation process on the input shares for a single measurement. At the end of the computation, each Aggregator holds an output share or an error. To facilitate the preparation process, a concrete VDAF implements the following methods: * vdaf.prep_init(verify_key: bytes, ctx: bytes, agg_id: int, agg_param: AggParam, nonce: bytes, public_share: PublicShare, input_share: InputShare) > tuple[PrepState, PrepShare] is the deterministic preparationstate initialization algorithm run by each Aggregator to begin processing its input share into an output share. Its inputs are the shared verification key (verify_key), the application context (ctx), the Aggregator's unique identifier (agg_id), the aggregation parameter (agg_param), the nonce provided by the environment (nonce, see Figure 7), the public share (public_share), and one of the input shares generated by the Client (input_share). Its output is the Aggregator's initial preparation state and initial prep share. Barnes, et al. Expires 7 April 2025 [Page 30] InternetDraft VDAF October 2024 It is up to the high level protocol in which the VDAF is used to arrange for the distribution of the verification key prior to generating and processing reports. (See Section 9 for details.) Protocols MUST ensure that public share consumed by each of the Aggregators is identical. This is security critical for VDAFs such as Poplar1. Preconditions:  verify_key MUST have length vdaf.VERIFY_KEY_SIZE.  agg_id MUST be the integer in range(vdaf.SHARES) that matches the index of input_share in the sequence of input shares output by the Client.  nonce MUST have length vdaf.NONCE_SIZE. * vdaf.prep_next(ctx: bytes, prep_state: PrepState, prep_msg: PrepMessage) > tuple[PrepState, PrepShare]  OutShare is the deterministic preparationstate update algorithm run by each Aggregator. It updates the Aggregator's preparation state (prep_state) and returns either its next preparation state and its message share for the current round or, if this is the last round, its output share. An exception is raised if a valid output share could not be recovered. The input of this algorithm is the inbound preparation message. * vdaf.prep_shares_to_prep(ctx: bytes, agg_param: AggParam, prep_shares: list[PrepShare]) > PrepMessage is the deterministic preparationmessage preprocessing algorithm. It combines the prep shares generated by the Aggregators in the previous round into the prep message consumed by each in the next round. In effect, each Aggregator moves through a linear state machine with ROUNDS states. The Aggregator enters the first state on using the initialization algorithm, and the update algorithm advances the Aggregator to the next state. Thus, in addition to defining the number of rounds (ROUNDS), a VDAF instance defines the state of the Aggregator after each round. The preparationstate update accomplishes two tasks: recovery of output shares from the input shares and ensuring that the recovered output shares are valid. The abstraction boundary is drawn so that an Aggregator only recovers an output share if it is deemed valid (at least, based on the Aggregator's view of the protocol). Another way to draw this boundary would be to have the Aggregators recover output shares first, then verify that they are valid. However, this would Barnes, et al. Expires 7 April 2025 [Page 31] InternetDraft VDAF October 2024 allow the possibility of misusing the API by, say, aggregating an invalid output share. Moreover, in protocols like Prio+ [AGJOP21] based on oblivious transfer, it is necessary for the Aggregators to interact in order to recover aggregatable output shares at all. 5.3. Validity of Aggregation Parameters Similar to DAFs (see Section 4.3), VDAFs MAY impose restrictions for input shares and aggregation parameters. Protocols using a VDAF MUST ensure that for each input share and aggregation parameter agg_param, the preparation phase (including vdaf.prep_init, vdaf.prep_next, and vdaf.prep_shares_to_prep; see Section 5.2) is only called if vdaf.is_valid(agg_param, previous_agg_params) returns True, where previous_agg_params contains all aggregation parameters that have previously been used with the same input share. VDAFs MUST implement the following function: * vdaf.is_valid(agg_param: AggParam, previous_agg_params: list[AggParam]) > bool: checks if the agg_param is compatible with all elements of previous_agg_params. 5.4. Aggregation VDAF Aggregation is identical to DAF Aggregation (cf. Section 4.4): * vdaf.aggregate(agg_param: AggParam, out_shares: list[OutShare]) > AggShare is the deterministic aggregation algorithm. It is run by each Aggregator over the output shares it has computed for a batch of measurements. The data flow for this stage is illustrated in Figure 3. Here again, we have the aggregation algorithm in a "oneshot" form, where all shares for a batch are provided at the same time. VDAFs typically also support a "streaming" form, where shares are processed one at a time. 5.5. Unsharding VDAF Unsharding is identical to DAF Unsharding (cf. Section 4.5): * vdaf.unshard(agg_param: AggParam, agg_shares: list[AggShare], num_measurements: int) > AggResult is run by the Collector in order to compute the aggregate result from the Aggregators' shares. The length of agg_shares MUST be SHARES. num_measurements is the number of measurements that contributed to each of the aggregate shares. This algorithm is deterministic. Barnes, et al. Expires 7 April 2025 [Page 32] InternetDraft VDAF October 2024 The data flow for this stage is illustrated in Figure 4. 5.6. Execution of a VDAF Secure execution of a VDAF involves simulating the following procedure. def run_vdaf( vdaf: Vdaf[ Measurement, AggParam, PublicShare, InputShare, list[F], # OutShare AggShare, AggResult, PrepState, PrepShare, PrepMessage, ], verify_key: bytes, agg_param: AggParam, ctx: bytes, nonces: list[bytes], measurements: list[Measurement]) > AggResult: """ Run the VDAF on a list of measurements. Preconditions:  `len(verify_key) == vdaf.VERIFY_KEY_SIZE`  `len(nonces) == len(measurements)`  `all(len(nonce) == vdaf.NONCE_SIZE for nonce in nonces)` """ if len(verify_key) != vdaf.VERIFY_KEY_SIZE: raise ValueError("incorrect verify_key size") if any(len(nonce) != vdaf.NONCE_SIZE for nonce in nonces): raise ValueError("incorrect nonce size") if len(nonces) != len(measurements): raise ValueError( "measurements and nonces lists have different lengths" ) out_shares = [] for (nonce, measurement) in zip(nonces, measurements): assert len(nonce) == vdaf.NONCE_SIZE Barnes, et al. Expires 7 April 2025 [Page 33] InternetDraft VDAF October 2024 # Each Client shards its measurement into input shares. rand = gen_rand(vdaf.RAND_SIZE) (public_share, input_shares) = \ vdaf.shard(ctx, measurement, nonce, rand) # Each Aggregator initializes its preparation state. prep_states = [] outbound_prep_shares = [] for j in range(vdaf.SHARES): (state, share) = vdaf.prep_init(verify_key, ctx, j, agg_param, nonce, public_share, input_shares[j]) prep_states.append(state) outbound_prep_shares.append(share) # Aggregators recover their output shares. for i in range(vdaf.ROUNDS  1): prep_msg = vdaf.prep_shares_to_prep(ctx, agg_param, outbound_prep_shares) outbound_prep_shares = [] for j in range(vdaf.SHARES): out = vdaf.prep_next(ctx, prep_states[j], prep_msg) assert isinstance(out, tuple) (prep_states[j], prep_share) = out outbound_prep_shares.append(prep_share) # The final outputs of the prepare phase are the output # shares. prep_msg = vdaf.prep_shares_to_prep(ctx, agg_param, outbound_prep_shares) outbound_out_shares = [] for j in range(vdaf.SHARES): out_share = vdaf.prep_next(ctx, prep_states[j], prep_msg) assert not isinstance(out_share, tuple) outbound_out_shares.append(out_share) out_shares.append(outbound_out_shares) # Each Aggregator aggregates its output shares into an # aggregate share. In a distributed VDAF computation, the # aggregate shares are sent over the network. agg_shares = [] Barnes, et al. Expires 7 April 2025 [Page 34] InternetDraft VDAF October 2024 for j in range(vdaf.SHARES): out_shares_j = [out[j] for out in out_shares] agg_share_j = vdaf.aggregate(agg_param, out_shares_j) agg_shares.append(agg_share_j) # Collector unshards the aggregate. num_measurements = len(measurements) agg_result = vdaf.unshard(agg_param, agg_shares, num_measurements) return agg_result Figure 7: Execution of a VDAF. The inputs to this algorithm are the aggregation parameter, a list of measurements, and a nonce for each measurement. This document does not specify how the nonces are chosen, but security requires that the nonces be unique. See Section 9 for details. As explained in Section 4.6, the secure execution of a VDAF requires the application to instantiate secure channels between each of the protocol participants. 5.7. Communication Patterns for Preparation In each round of preparation, each Aggregator writes a prep share to some broadcast channel, which is then processed into the prep message using the public prep_shares_to_prep() algorithm and broadcast to the Aggregators to start the next round. In this section we describe some approaches to realizing this broadcast channel functionality in protocols that use VDAFs. The state machine of each Aggregator is shown in Figure 8. ++   v  Start > Continued(prep_state, prep_round) > Finished(out_share)     +> Rejected <+ Figure 8: State machine for VDAF preparation. State transitions are made when the state is acted upon by the host's local inputs and/or messages sent by the peers. The initial state is Start. The terminal states are Rejected, which indicates that the report cannot be processed any further, and Finished(out_share), which indicates that the Aggregator has recovered an output share out_share. Barnes, et al. Expires 7 April 2025 [Page 35] InternetDraft VDAF October 2024 class State: pass class Start(State): pass class Continued(State, Generic[PrepState]): def __init__(self, prep_state: PrepState, prep_round: int): self.prep_state = prep_state self.prep_round = prep_round def __eq__(self, other: object) > bool: return isinstance(other, Continued) and \ self.prep_state == other.prep_state and \ self.prep_round == other.prep_round class Finished(State, Generic[OutShare]): def __init__(self, out_share: OutShare): self.out_share = out_share def __eq__(self, other: object) > bool: return isinstance(other, Finished) and \ self.out_share == other.out_share class Rejected(State): pass The methods described in this section are defined in terms of opaque byte strings. A compatible Vdaf MUST specify methods for encoding public shares, input shares, prep shares, prep messages, and aggregation parameters. Implementations of Prio3 and Poplar1 MUST use the encoding scheme specified in Section 7.2.7 and Section 8.2.6 respectively. 5.8. PingPong Topology (Only Two Aggregators) For VDAFs with precisely two Aggregators (i.e., SHARES == 2), the following "ping pong" communication pattern can be used. It is compatible with any request/response transport protocol, such as HTTP. Let us call the initiating party the "Leader" and the responding party the "Helper". The highlevel idea is that the Leader and Helper will take turns running the computation locally until input from their peer is required: Barnes, et al. Expires 7 April 2025 [Page 36] InternetDraft VDAF October 2024 * For a 1round VDAF (e.g., Prio3 (Section 7)), the Leader sends its prep share to the Helper, who computes the prep message locally, computes its output share, then sends the prep message to the Leader. Preparation requires just one round trip between the Leader and the Helper. * For a 2round VDAF (e.g., Poplar1 (Section 8)), the Leader sends its firstround prep share to the Helper, who replies with the firstround prep message and its secondround prep share. In the next request, the Leader computes its secondround prep share locally, computes its output share, and sends the secondround prep message to the Helper. Finally, the Helper computes its own output share. * In general, each request includes the Leader's prep share for the previous round and/or the prep message for the current round; correspondingly, each response consists of the prep message for the current round and the Helper's prep share for the next round. The Aggregators proceed in this pingponging fashion until a step of the computation fails (indicating the report is invalid and should be rejected) or preparation is completed. All told there there are ceil((ROUNDS+1)/2) requests sent. Each message in the pingpong protocol is structured as follows (expressed in TLS syntax as defined in Section 3 of [RFC8446]): enum { initialize(0), continue(1), finish(2), (255) } MessageType; struct { MessageType type; select (Message.type) { case initialize: opaque prep_share<0..2^321>; case continue: opaque prep_msg<0..2^321>; opaque prep_share<0..2^321>; case finish: opaque prep_msg<0..2^321>; }; } Message; Barnes, et al. Expires 7 April 2025 [Page 37] InternetDraft VDAF October 2024 These messages are used to transition between the states described in Section 5.7. The Leader's initial transition is computed with the following method, implemented on Vdaf: def ping_pong_leader_init( self, vdaf_verify_key: bytes, ctx: bytes, agg_param: bytes, nonce: bytes, public_share: bytes, input_share: bytes) > tuple[State, Optional[bytes]]: """Called by the leader to initialize pingponging.""" try: (prep_state, prep_share) = self.prep_init( vdaf_verify_key, ctx, 0, self.decode_agg_param(agg_param), nonce, self.decode_public_share(public_share), self.decode_input_share(0, input_share), ) encoded_prep_share = self.encode_prep_share(prep_share) return ( Continued(prep_state, 0), encode(0, encoded_prep_share), # initialize ) except: return (Rejected(), None) The output is the State to which the Leader has transitioned and an encoded Message. If the Leader's state is Rejected, then processing halts. Otherwise, if the state is Continued, then processing continues. Function encode is used to encode the outbound message, here the initialize variant (hence 0). The Leader sends the outbound message to the Helper. The Helper's initial transition is computed using the following procedure: Barnes, et al. Expires 7 April 2025 [Page 38] InternetDraft VDAF October 2024 def ping_pong_helper_init( self, vdaf_verify_key: bytes, ctx: bytes, agg_param: bytes, nonce: bytes, public_share: bytes, input_share: bytes, inbound: bytes) > tuple[State, Optional[bytes]]: """ Called by the helper in response to the leader's initial message. """ try: (prep_state, prep_share) = self.prep_init( vdaf_verify_key, ctx, 1, self.decode_agg_param(agg_param), nonce, self.decode_public_share(public_share), self.decode_input_share(1, input_share), ) (inbound_type, inbound_items) = decode(inbound) if inbound_type != 0: # initialize return (Rejected(), None) encoded_prep_share = inbound_items[0] prep_shares = [ self.decode_prep_share(prep_state, encoded_prep_share), prep_share, ] return self.ping_pong_transition( ctx, self.decode_agg_param(agg_param), prep_shares, prep_state, 0, ) except: return (Rejected(), None) Procedure decode decodes the inbound message and returns the MessageType variant (initialize, continue, or finalize) and the sequence of fields. Procedure ping_pong_transition takes in the prep shares, combines them into the prep message, and computes the next prep state of the caller: Barnes, et al. Expires 7 April 2025 [Page 39] InternetDraft VDAF October 2024 def ping_pong_transition( self, ctx: bytes, agg_param: AggParam, prep_shares: list[PrepShare], prep_state: PrepState, prep_round: int) > tuple[State, bytes]: prep_msg = self.prep_shares_to_prep(ctx, agg_param, prep_shares) encoded_prep_msg = self.encode_prep_msg(prep_msg) out = self.prep_next(ctx, prep_state, prep_msg) if prep_round+1 == self.ROUNDS: return ( Finished(out), encode(2, encoded_prep_msg), # finalize ) (prep_state, prep_share) = cast( tuple[PrepState, PrepShare], out) encoded_prep_share = self.encode_prep_share(prep_share) return ( Continued(prep_state, prep_round+1), encode(1, encoded_prep_msg, encoded_prep_share) # continue ) The output is the State to which the Helper has transitioned and an encoded Message. If the Helper's state is Finished or Rejected, then processing halts. Otherwise, if the state is Continued, then processing continues. Next, the Helper sends the outbound message to the Leader. The Leader computes its next state transition using the function ping_pong_leader_continued: def ping_pong_leader_continued( self, ctx: bytes, agg_param: bytes, state: State, inbound: bytes, ) > tuple[State, Optional[bytes]]: """ Called by the leader to start the next step of pingponging. """ return self.ping_pong_continued( True, ctx, agg_param, state, inbound) def ping_pong_continued( Barnes, et al. Expires 7 April 2025 [Page 40] InternetDraft VDAF October 2024 self, is_leader: bool, ctx: bytes, agg_param: bytes, state: State, inbound: bytes, ) > tuple[State, Optional[bytes]]: try: if not isinstance(state, Continued): return (Rejected(), None) prep_round = state.prep_round (inbound_type, inbound_items) = decode(inbound) if inbound_type == 0: # initialize return (Rejected(), None) encoded_prep_msg = inbound_items[0] prep_msg = self.decode_prep_msg( state.prep_state, encoded_prep_msg, ) out = self.prep_next(ctx, state.prep_state, prep_msg) if prep_round+1 < self.ROUNDS and \ inbound_type == 1: # continue (prep_state, prep_share) = cast( tuple[PrepState, PrepShare], out) encoded_prep_share = inbound_items[1] prep_shares = [ self.decode_prep_share( prep_state, encoded_prep_share, ), prep_share, ] if is_leader: prep_shares.reverse() return self.ping_pong_transition( ctx, self.decode_agg_param(agg_param), prep_shares, prep_state, prep_round+1, ) elif prep_round+1 == self.ROUNDS and \ inbound_type == 2: # finish return (Finished(out), None) else: return (Rejected(), None) Barnes, et al. Expires 7 April 2025 [Page 41] InternetDraft VDAF October 2024 except: return (Rejected(), None) If the Leader's state is Finished or Rejected, then processing halts. Otherwise, the Leader sends the outbound message to the Helper. The Helper computes its next state transition using the function ping_pong_helper_continued: def ping_pong_helper_continued( self, ctx: bytes, agg_param: bytes, state: State, inbound: bytes, ) > tuple[State, Optional[bytes]]: """Called by the helper to continue pingponging.""" return self.ping_pong_continued( False, ctx, agg_param, state, inbound) They continue in this way until processing halts. Note that, depending on the number of rounds of preparation that are required, there may be one more message to send before the peer can also finish processing (i.e., outbound != None). 5.9. Star Topology (Any Number of Aggregators) The pingpong topology of the previous section is only suitable for VDAFs involving exactly two Aggregators. If the VDAF supports more than two Aggregators, then the star topology described in this section can be used instead. We again designate an Aggregator to initiate the computation. We refer to this Aggregator as the Leader and to all other Aggregators as Helpers. At the start of each round, the Leader requests from each Helper its prep share. After gathering each of the prep shares, the Leader computes the next prep message (via prep_shares_to_prep()) and broadcasts it to the Helpers. At this point, each Aggregator runs prep_next() locally to either recover an output share or, if more rounds of preparation are required, compute its updated state and prep share. If more are required, then the Helper responds to the broadcast message with its next prep share. The Aggregators proceed in this way until each recovers an output share or some step of the computation fails. Barnes, et al. Expires 7 April 2025 [Page 42] InternetDraft VDAF October 2024 6. Preliminaries This section describes the primitives that are common to the VDAFs specified in this document. 6.1. Finite Fields Both Prio3 and Poplar1 use finite fields of prime order. Finite field elements are represented by a class Field with the following associated parameters: * MODULUS: int is the prime modulus that defines the field. * ENCODED_SIZE: int is the number of bytes used to encode a field element as a byte string. A concrete Field also implements the following class methods: * Field.zeros(length: int) > list[Self] returns a vector of zeros. Preconditions:  length MUST be greater than or equal 0. Postconditions:  The length of the output MUST be length. * Field.rand_vec(length: int) > list[Self] returns a vector of random field elements. Same pre and postconditions as for Field.zeros(). A field element is an instance of a concrete Field. The concrete class defines the usual arithmetic operations on field elements. In addition, it defines the following instance method for converting a field element to an unsigned integer: * elem.as_unsigned() > int returns the integer representation of field element elem. Likewise, each concrete Field implements a constructor for converting an unsigned integer into a field element: * Field(integer: int) returns integer represented as a field element. The value of integer MUST be nonnegative and less than Field.MODULUS. Barnes, et al. Expires 7 April 2025 [Page 43] InternetDraft VDAF October 2024 Each concrete Field has two derived class methods, one for encoding a vector of field elements as a byte string and another for decoding a vector of field elements. def encode_vec(cls, vec: list[Self]) > bytes: """ Encode a vector of field elements `vec` as a byte string. """ encoded = bytes() for x in vec: encoded += to_le_bytes(x.as_unsigned(), cls.ENCODED_SIZE) return encoded def decode_vec(cls, encoded: bytes) > list[Self]: """ Parse a vector of field elements from `encoded`. """ L = cls.ENCODED_SIZE if len(encoded) % L != 0: raise ValueError( 'input length must be a multiple of the size of an ' 'encoded field element') vec = [] for i in range(0, len(encoded), L): encoded_x = encoded[i:i+L] x = from_le_bytes(encoded_x) if x >= cls.MODULUS: raise ValueError('modulus overflow') vec.append(cls(x)) return vec Figure 9: Derived class methods for finite fields. Finally, Field implements the following methods for representing a value as a sequence of field elements, each of which represents a bit of the input. Barnes, et al. Expires 7 April 2025 [Page 44] InternetDraft VDAF October 2024 def encode_into_bit_vector( cls, val: int, bits: int) > list[Self]: """ Encode the bit representation of `val` with at most `bits` number of bits, as a vector of field elements. Preconditions:  `val >= 0`  `bits >= 0` """ if val >= 2 ** bits: # Sanity check we are able to represent `val` with `bits` # number of bits. raise ValueError("Number of bits is not enough to represent " "the input integer.") encoded = [] for l in range(bits): encoded.append(cls((val >> l) & 1)) return encoded def decode_from_bit_vector(cls, vec: list[Self]) > Self: """ Decode the field element from the bit representation, expressed as a vector of field elements `vec`. """ bits = len(vec) if cls.MODULUS >> bits == 0: raise ValueError("Number of bits is too large to be " "represented by field modulus.") decoded = cls(0) for (l, bit) in enumerate(vec): decoded += cls(1 << l) * bit return decoded Figure 10: Derived class methods to encode integers into bit vector representation. 6.1.1. Auxiliary Functions The following auxiliary functions on vectors of field elements are used in the remainder of this document. Note that an exception is raised by each function if the operands are not the same length. Barnes, et al. Expires 7 April 2025 [Page 45] InternetDraft VDAF October 2024 def vec_sub(left: list[F], right: list[F]) > list[F]: """ Subtract the right operand from the left and return the result. """ if len(left) != len(right): raise ValueError("mismatched vector sizes") return list(map(lambda x: x[0]  x[1], zip(left, right))) def vec_add(left: list[F], right: list[F]) > list[F]: """Add the right operand to the left and return the result.""" if len(left) != len(right): raise ValueError("mismatched vector sizes") return list(map(lambda x: x[0] + x[1], zip(left, right))) def vec_neg(vec: list[F]) > list[F]: """Negate the input vector.""" return list(map(lambda x: x, vec)) Figure 11: Common functions for finite fields. 6.1.2. NTTFriendly Fields Some VDAFs require fields that are suitable for efficient computation of the number theoretic transform (NTT) [SML24], as this allows for fast polynomial interpolation. (One example is Prio3 (Section 7) when instantiated with the FLP of Section 7.3.3.) Specifically, a field is said to be "NTTfriendly" if, in addition to satisfying the interface described in Section 6.1, it implements the following method: * Field.gen() > Field returns the generator of a large subgroup of the multiplicative group. To be NTTfriendly, the order of this subgroup MUST be a power of 2. In addition, the size of the subgroup dictates how large interpolated polynomials can be. It is RECOMMENDED that a generator is chosen with order at least 2^20. NTTfriendly fields also define the following parameter: * GEN_ORDER: int is the order of a multiplicative subgroup generated by Field.gen(). 6.1.3. Parameters The tables below define finite fields used in the remainder of this document. Barnes, et al. Expires 7 April 2025 [Page 46] InternetDraft VDAF October 2024 +==============+================+=======================+==========+  Parameter  Field64  Field128  Field255  +==============+================+=======================+==========+  MODULUS  2^32 *  2^66 *  2^255     4294967295 + 1  4611686018427387897 +  19     1   +++++  ENCODED_SIZE  8  16  32  +++++  Generator  7^4294967295  7^4611686018427387897  n/a  +++++  GEN_ORDER  2^32  2^66  n/a  +++++ Table 3: Parameters for the finite fields used in this document. 6.2. Extendable Output Functions VDAFs in this specification use extendable output functions (XOFs) to extract short, fixedlength strings we call "seeds" from long input strings and expand seeds into long output strings. We specify a single interface that is suitable for both purposes. XOFs are defined by a class Xof with the following associated parameter and methods: * SEED_SIZE: int is the size (in bytes) of a seed. * Xof(seed: bytes, dst: bytes, binder: bytes) constructs an instance of Xof from the given seed, domain separation tag, and binder string. (See below for definitions of these.) The length of the seed will typically be SEED_SIZE, but some XOFs may support multiple seed sizes. The seed MUST be generated securely (i.e., it is either the output of a CSPRNG or a previous invocation of the XOF). * xof.next(length: int) returns the next length bytes of output of xof. Each Xof has three derived methods. The first is used to derive a fresh seed from an existing one. The second is used to compute a sequence of field elements. The third is a convenience method to construct an Xof from a seed, domain separation tag, and binder string, and then use it to compute a sequence of field elements. Barnes, et al. Expires 7 April 2025 [Page 47] InternetDraft VDAF October 2024 def derive_seed(cls, seed: bytes, dst: bytes, binder: bytes) > bytes: """ Derive a new seed. Preconditions:  `len(seed) == Xof.SEED_SIZE` """ xof = cls(seed, dst, binder) return xof.next(cls.SEED_SIZE) def next_vec(self, field: type[F], length: int) > list[F]: """ Output the next `length` field elements. Preconditions:  `field` is subclass of `Field`  `length > 0` """ m = next_power_of_2(field.MODULUS)  1 vec: list[F] = [] while len(vec) < length: x = from_le_bytes(self.next(field.ENCODED_SIZE)) x &= m if x < field.MODULUS: vec.append(field(x)) return vec def expand_into_vec(cls, field: type[F], seed: bytes, dst: bytes, binder: bytes, length: int) > list[F]: """ Expand the input `seed` into vector of `length` field elements. Preconditions:  `field` is subclass of `Field`  `len(seed) == Xof.SEED_SIZE`  `length > 0` """ xof = cls(seed, dst, binder) Barnes, et al. Expires 7 April 2025 [Page 48] InternetDraft VDAF October 2024 return xof.next_vec(field, length) Figure 12: Derived methods for XOFs. 6.2.1. XofTurboShake128 This section describes XofTurboShake128, an XOF based on the TurboSHAKE128 [TurboSHAKE]. (RFC EDITOR: Update this reference to the RFC for draftirtfcfrgkangarootwelve once published.) This XOF is RECOMMENDED for all use cases within VDAFs. The length of the domain separation string dst passed to XofTurboShake128 MUST NOT exceed 65535 bytes. class XofTurboShake128(Xof): """XOF wrapper for TurboSHAKE128.""" # Associated parameters SEED_SIZE = 32 def __init__(self, seed: bytes, dst: bytes, binder: bytes): self.l = 0 self.m = \ to_le_bytes(len(dst), 2) + dst \ to_le_bytes(len(seed), 1) + seed + \ binder def next(self, length: int) > bytes: self.l += length # Function `TurboSHAKE128(M, D, L)` is as defined in # Section 2.2 of [TurboSHAKE]. # # Implementation note: rather than regenerate the output # stream each time `next()` is invoked, most implementations # of TurboSHAKE128 will expose an "absorbthensqueeze" API # that allows stateful handling of the stream. stream = TurboSHAKE128(self.m, 1, self.l) return stream[length:] Figure 13: Definition of XOF XofTurboShake128. Barnes, et al. Expires 7 April 2025 [Page 49] InternetDraft VDAF October 2024 6.2.2. XofFixedKeyAes128 While XofTurboShake128 as described above can be securely used in all cases where a XOF is needed in the VDAFs described in this document, there are some cases where a more efficient instantiation based on fixedkey AES is possible. For now, this is limited to the XOF used inside the Idpf Section 8.1 implementation in Poplar1 Section 8.3. It is NOT RECOMMENDED to use this XOF anywhere else. The length of the domain separation string dst passed to XofFixedKeyAes128 MUST NOT exceed 65535 bytes. See Section 9 for a more detailed discussion. class XofFixedKeyAes128(Xof): """ XOF based on a circular collisionresistant hash function from fixedkey AES. """ # Associated parameters SEED_SIZE = 16 def __init__(self, seed: bytes, dst: bytes, binder: bytes): if len(seed) != self.SEED_SIZE: raise ValueError("incorrect seed size") self.length_consumed = 0 # Use TurboSHAKE128 to derive a key from the binder string # and domain separation tag. Note that the AES key does not # need to be kept secret from any party. However, when used # with an IDPF, we require the binder to be a random nonce. # # Implementation note: this step can be cached across XOF # evaluations with many different seeds. dst_length = to_le_bytes(len(dst), 2) self.fixed_key = TurboSHAKE128( dst_length + dst + binder, 2, 16, ) self.seed = seed def next(self, length: int) > bytes: offset = self.length_consumed % 16 new_length = self.length_consumed + length block_range = range( self.length_consumed // 16, new_length // 16 + 1 ) Barnes, et al. Expires 7 April 2025 [Page 50] InternetDraft VDAF October 2024 self.length_consumed = new_length hashed_blocks = [ self.hash_block(xor(self.seed, to_le_bytes(i, 16))) for i in block_range ] return concat(hashed_blocks)[offset:offset+length] def hash_block(self, block: bytes) > bytes: """ The multiinstance tweakable circular correlationrobust hash function of [GKWWY20] (Section 4.2). The tweak here is the key that stays constant for all XOF evaluations of the same Client, but differs between Clients. Function `AES128(key, block)` is the AES128 blockcipher. """ lo, hi = block[:8], block[8:] sigma_block = concat([hi, xor(hi, lo)]) return xor(AES128(self.fixed_key, sigma_block), sigma_block) 6.2.3. The Domain Separation Tag and Binder String XOFs are used to map a seed to a finite domain, e.g., a fresh seed or a vector of field elements. To ensure domain separation, the derivation is needs to be bound to some distinguished domain separation tag. The domain separation tag encodes the following values: 1. The document version (i.e.,VERSION); 2. The "class" of the algorithm using the output (e.g., VDAF); 3. A unique identifier for the algorithm; and 4. Some indication of how the output is used (e.g., for deriving the measurement shares in Prio3 Section 7). The following algorithm is used in the remainder of this document in order to format the domain separation tag: Barnes, et al. Expires 7 April 2025 [Page 51] InternetDraft VDAF October 2024 def format_dst(algo_class: int, algo: int, usage: int) > bytes: """ Format XOF domain separation tag for use within a (V)DAF. Preconditions:  `algo_class` in `range(0, 2 ** 8)`  `algo` in `range(0, 2 ** 32)`  `usage` in `range(0, 2 ** 16)` """ return concat([ to_be_bytes(VERSION, 1), to_be_bytes(algo_class, 1), to_be_bytes(algo, 4), to_be_bytes(usage, 2), ]) It is also sometimes necessary to bind the output to some ephemeral value that multiple parties need to agree on. We call this input the "binder string". 7. Prio3 This section describes Prio3, a VDAF for Prio [CGB17]. Prio is suitable for a wide variety of aggregation functions, including (but not limited to) sum, mean, standard deviation, estimation of quantiles (e.g., median), and linear regression. In fact, the scheme described in this section is compatible with any aggregation function that has the following structure: * Each measurement is encoded as a vector over some finite field. * Measurement validity is determined by an arithmetic circuit evaluated over the encoded measurement. (An "arithmetic circuit" is a function comprised of arithmetic operations in the field.) The circuit's output is a single field element: if zero, then the measurement is said to be "valid"; otherwise, if the output is nonzero, then the measurement is said to be "invalid". * The aggregate result is obtained by summing up the encoded measurement vectors and computing some function of the sum. At a high level, Prio3 distributes this computation as follows. Each Client first shards its measurement by first encoding it, then splitting the vector into secret shares and sending a share to each Aggregator. Next, in the preparation phase, the Aggregators carry Barnes, et al. Expires 7 April 2025 [Page 52] InternetDraft VDAF October 2024 out a multiparty computation to determine if their shares correspond to a valid measurement (as determined by the arithmetic circuit). This computation involves a "proof" of validity generated by the Client. Next, each Aggregator sums up its shares locally. Finally, the Collector sums up the aggregate shares and computes the aggregate result. This VDAF does not have an aggregation parameter. Instead, the output share is derived from the measurement share by applying a fixed map. See Section 8 for an example of a VDAF that makes meaningful use of the aggregation parameter. The core component of Prio3 is a "Fully Linear Proof (FLP)" system. Introduced by [BBCGGI19], the FLP encapsulates the functionality required for encoding and validating measurements. Prio3 can be thought of as a transformation of a particular class of FLPs into a VDAF. The remainder of this section is structured as follows. The syntax for FLPs is described in Section 7.1. The generic transformation of an FLP into Prio3 is specified in Section 7.2. Next, a concrete FLP suitable for any validity circuit is specified in Section 7.3. Finally, instantiations of Prio3 for various types of measurements are specified in Section 7.4. Test vectors can be found in Appendix "Test Vectors". 7.1. Fully Linear Proof (FLP) Systems Conceptually, an FLP is a twoparty protocol executed by a prover and a verifier. In actual use, however, the prover's computation is carried out by the Client, and the verifier's computation is distributed among the Aggregators. The Client generates a "proof" of its measurement's validity and distributes shares of the proof to the Aggregators. Each Aggregator then performs some computation on its measurement share and proof share locally and sends the result to the other Aggregators. Combining the exchanged messages allows each Aggregator to decide if it holds a share of a valid measurement. (See Section 7.2 for details.) As usual, we will describe the interface implemented by a concrete FLP in terms of an abstract base class Flp that specifies the set of methods and parameters a concrete FLP must provide. The parameters provided by a concrete FLP are listed in Table 4. Barnes, et al. Expires 7 April 2025 [Page 53] InternetDraft VDAF October 2024 +================+==========================================+  Parameter  Description  +================+==========================================+  PROVE_RAND_LEN  Length of the prover randomness, the    number of random field elements consumed    by the prover when generating a proof  +++  QUERY_RAND_LEN  Length of the query randomness, the    number of random field elements consumed    by the verifier  +++  JOINT_RAND_LEN  Length of the joint randomness, the    number of random field elements consumed    by both the prover and verifier  +++  MEAS_LEN  Length of the encoded measurement    (Section 7.1.1)  +++  OUTPUT_LEN  Length of the aggregatable output    (Section 7.1.1)  +++  PROOF_LEN  Length of the proof  +++  VERIFIER_LEN  Length of the verifier message generated    by querying the measurement and proof  +++  Measurement  Type of the measurement  +++  AggResult  Type of the aggregate result  +++  field  Class object for the field (Section 6.1)  +++ Table 4: Constants and types defined by a concrete FLP. An FLP specifies the following algorithms for generating and verifying proofs of validity (encoding is described below in Section 7.1.1): * flp.prove(meas: list[F], prove_rand: list[F], joint_rand: list[F]) > list[F] is the deterministic proofgeneration algorithm run by the prover. Its inputs are the encoded measurement, the "prover randomness" prove_rand, and the "joint randomness" joint_rand. The prover randomness is used only by the prover, but the joint randomness is shared by both the prover and verifier. Barnes, et al. Expires 7 April 2025 [Page 54] InternetDraft VDAF October 2024 * flp.query(meas: list[F], proof: list[F], query_rand: list[F], joint_rand: list[F], num_shares: int) > list[F] is the query generation algorithm run by the verifier. This is used to "query" the measurement and proof. The result of the query (i.e., the output of this function) is called the "verifier message". In addition to the measurement and proof, this algorithm takes as input the query randomness query_rand and the joint randomness joint_rand. The former is used only by the verifier. num_shares specifies how many shares were generated. * flp.decide(verifier: list[F]) > bool is the deterministic decision algorithm run by the verifier. It takes as input the verifier message and outputs a boolean indicating if the measurement from which it was generated is valid. Our application requires that the FLP is "fully linear" in the sense defined in [BBCGGI19]. As a practical matter, what this property implies is that, when run on a share of the measurement and proof, the querygeneration algorithm outputs a share of the verifier message. Furthermore, the privacy property of the FLP system ensures that the verifier message reveals nothing about the measurement other than whether it is valid. Therefore, to decide if a measurement is valid, the Aggregators will run the querygeneration algorithm locally, exchange verifier shares, combine them to recover the verifier message, and run the decision algorithm. The querygeneration algorithm includes a parameter num_shares that specifies the number of shares that were generated. If these data are not secret shared, then num_shares == 1. This parameter is useful for certain FLP constructions. For example, the FLP in Section 7.3 is defined in terms of an arithmetic circuit; when the circuit contains constants, it is sometimes necessary to normalize those constants to ensure that the circuit's output, when run on a valid measurement, is the same regardless of the number of shares. An FLP is executed by the prover and verifier as follows: Barnes, et al. Expires 7 April 2025 [Page 55] InternetDraft VDAF October 2024 def run_flp( flp: Flp[Measurement, AggResult, F], meas: list[F], num_shares: int) > bool: """Run the FLP on an encoded measurement.""" joint_rand = flp.field.rand_vec(flp.JOINT_RAND_LEN) prove_rand = flp.field.rand_vec(flp.PROVE_RAND_LEN) query_rand = flp.field.rand_vec(flp.QUERY_RAND_LEN) # Prover generates the proof. proof = flp.prove(meas, prove_rand, joint_rand) # Shard the measurement and the proof. meas_shares = additive_secret_share( meas, num_shares, flp.field, ) proof_shares = additive_secret_share( proof, num_shares, flp.field, ) # Verifier queries the meas shares and proof shares. verifier_shares = [ flp.query( meas_share, proof_share, query_rand, joint_rand, num_shares, ) for meas_share, proof_share in zip(meas_shares, proof_shares) ] # Combine the verifier shares into the verifier. verifier = flp.field.zeros(len(verifier_shares[0])) for verifier_share in verifier_shares: verifier = vec_add(verifier, verifier_share) # Verifier decides if the measurement is valid. return flp.decide(verifier) Figure 14: Execution of an FLP. Barnes, et al. Expires 7 April 2025 [Page 56] InternetDraft VDAF October 2024 The proof system is constructed so that, if meas is valid, then run_flp(flp, meas, 1) always returns True. On the other hand, if meas is invalid, then as long as joint_rand and query_rand are generated uniform randomly, the output is False with high probability. False positives are possible: there is a small probability that a verifier accepts an invalid input as valid. An FLP is said to be "sound" if this probability is sufficiently small. The soundness of the FLP depends on a variety of parameters, like the length of the input and the size of the field. See Section 7.3 for details. Note that soundness of an FLP system is not the same as robustness for a VDAF In particular, soundness of the FLP is necessary, but insufficient for robusntess of Prio3 (Section 7). See Section 9.6 for details. We remark that [BBCGGI19] defines a much larger class of fully linear proof systems than we consider here. In particular, what is called an "FLP" here is called a 1.5round, publiccoin, interactive oracle proof system in their paper. 7.1.1. Encoding the Input The type of measurement being aggregated is defined by the FLP. Hence, the FLP also specifies a method of encoding raw measurements as a vector of field elements: * flp.encode(measurement: Measurement) > list[F] encodes a raw measurement as a vector of field elements. The return value MUST be of length MEAS_LEN. For some FLPs, the encoded measurement also includes redundant field elements that are useful for checking the proof, but which are not needed after the proof has been checked. An example is the "integer sum" data type from [CGB17] in which an integer in range(2**k) is encoded as a vector of k field elements, each representing a bit of the integer (this type is also defined in Section 7.4.2). After consuming this vector, all that is needed is the integer it represents. Thus the FLP defines an algorithm for truncating the encoded measurement to the length of the aggregated output: * flp.truncate(meas: list[F]) > list[F] maps an encoded measurement (e.g., the bitencoding of the measurement) to an aggregatable output (e.g., the singleton vector containing the measurement). The length of the input MUST be MEAS_LEN and the length of the output MUST be OUTPUT_LEN. Barnes, et al. Expires 7 April 2025 [Page 57] InternetDraft VDAF October 2024 Once the aggregate shares have been computed and combined together, their sum can be converted into the aggregate result. This could be a projection from the FLP's field to the integers, or it could include additional postprocessing. * flp.decode(output: list[F], num_measurements: int) > AggResult maps a sum of aggregate shares to an aggregate result. Preconditions:  The length of output MUST be OUTPUT_LEN.  num_measurements MUST equal the number of measurements that contributed to the output. We remark that, taken together, these three functionalities correspond roughly to the notion of "Affineaggregatable encodings (AFEs)" from [CGB17]. 7.1.2. Multiple Proofs It is sometimes desirable to generate and verify multiple independent proofs for the same input. First, this improves the soundness of the proof system without having to change any of its parameters. Second, it allows a smaller field to be used (e.g., replace Field128 with Field64, see Section 7.3) without sacrificing soundness. Generally, choosing a smaller field can significantly reduce communication cost. (This is a tradeoff, of course, since generating and verifying more proofs requires more time.) Given these benefits, this feature is implemented by Prio3 (Section 7). To generate these proofs for a specific measurement, the prover calls flp.prove multiple times, each time using an independently generated prover and joint randomness string. The verifier checks each proof independently, each time with an independently generated query randomness string. It accepts the measurement only if all the decision algorithm accepts on each proof. See Section 9.6 below for discussions on choosing the right number of proofs. 7.2. Construction This section specifies Prio3, an implementation of the Vdaf interface (Section 5). It has three generic parameters: an NttField ({{ntt field}}), anFlp({{flp}}) and aXof({{xof}}). It also has an associated constant,PROOFS, with a value inrange(1, 256)`, denoting the number of FLPs generated by the Client (Section 7.1.2). Barnes, et al. Expires 7 April 2025 [Page 58] InternetDraft VDAF October 2024 The associated constants and types required by the Vdaf interface are defined in Table 5. The methods required for sharding, preparation, aggregation, and unsharding are described in the remaining subsections. These methods refer to constants enumerated in Table 6. +=================+==============================================+  Parameter  Value  +=================+==============================================+  VERIFY_KEY_SIZE  Xof.SEED_SIZE  +++  RAND_SIZE  Xof.SEED_SIZE * SHARES if flp.JOINT_RAND_LEN    == 0 else 2 * Xof.SEED_SIZE * SHARES  +++  NONCE_SIZE  16  +++  ROUNDS  1  +++  SHARES  in [2, 256)  +++  Measurement  Flp.Measurement  +++  AggParam  None  +++  PublicShare  Optional[list[bytes]]  +++  InputShare  tuple[list[F], list[F], Optional[bytes]]     tuple[bytes, Optional[bytes]]  +++  OutShare  list[F]  +++  AggShare  list[F]  +++  AggResult  Flp.AggResult  +++  PrepState  tuple[list[F], Optional[bytes]]  +++  PrepShare  tuple[list[F], Optional[bytes]]  +++  PrepMessage  Optional[bytes]  +++ Table 5: VDAF parameters for Prio3. Barnes, et al. Expires 7 April 2025 [Page 59] InternetDraft VDAF October 2024 +=============================+=======+  Variable  Value  +=============================+=======+  USAGE_MEAS_SHARE: int  1  +++  USAGE_PROOF_SHARE: int  2  +++  USAGE_JOINT_RANDOMNESS: int  3  +++  USAGE_PROVE_RANDOMNESS: int  4  +++  USAGE_QUERY_RANDOMNESS: int  5  +++  USAGE_JOINT_RAND_SEED: int  6  +++  USAGE_JOINT_RAND_PART: int  7  +++ Table 6: Constants used by Prio3. 7.2.1. Sharding Recall from Section 7.1 that the FLP syntax calls for "joint randomness" shared by the prover (i.e., the Client) and the verifier (i.e., the Aggregators). VDAFs have no such notion. Instead, the Client derives the joint randomness from its measurement in a way that allows the Aggregators to reconstruct it from their shares. (This idea is based on the FiatShamir heuristic and is described in Section 6.2.3 of [BBCGGI19].) The sharding algorithm involves the following steps: 1. Encode the Client's measurement for the FLP 2. Shard the measurement into a sequence of measurement shares 3. Derive the joint randomness from the measurement shares and nonce 4. Run the FLP proofgeneration algorithm using the derived joint randomness 5. Shard the proof into a sequence of proof shares 6. Return the public share, consisting of the joint randomness parts, and the input shares, each consisting of the measurement share, proof share, and blind of one of the Aggregators Barnes, et al. Expires 7 April 2025 [Page 60] InternetDraft VDAF October 2024 As described in Section 7.1.2, the soundness of the FLP can be amplified by generating and verifying multiple FLPs. (This in turn improves the robustness of Prio3.) To support this, in Prio3: * In step 3, derive as much joint randomness as required by PROOFS proofs * Repeat step 4 PROOFS times, each time with a unique joint randomness Depending on the FLP, joint randomness may not be required. In particular, when flp.JOINT_RAND_LEN == 0, the Client does not derive the joint randomness (Step 3). The sharding algorithm is specified below. def shard( self, ctx: bytes, measurement: Measurement, nonce: bytes, rand: bytes) > tuple[ Optional[list[bytes]], list[Prio3InputShare]]: if len(nonce) != self.NONCE_SIZE: raise ValueError("incorrect nonce size") if len(rand) != self.RAND_SIZE: raise ValueError("incorrect size of random bytes argument") l = self.xof.SEED_SIZE seeds = [rand[i:i + l] for i in range(0, self.RAND_SIZE, l)] meas = self.flp.encode(measurement) if self.flp.JOINT_RAND_LEN > 0: return self.shard_with_joint_rand(ctx, meas, nonce, seeds) else: return self.shard_without_joint_rand(ctx, meas, seeds) Figure 15: Inputdistribution algorithm for Prio3. It starts by splitting the randomness into seeds. It then encodes the measurement as prescribed by the FLP and calls one of two methods, depending on whether joint randomness is required by the FLP. The methods are defined in the subsections below. 7.2.1.1. FLPs without joint randomness The following method is used for FLPs that do not require joint randomness, i.e., when flp.JOINT_RAND_LEN == 0: Barnes, et al. Expires 7 April 2025 [Page 61] InternetDraft VDAF October 2024 def shard_without_joint_rand( self, ctx: bytes, meas: list[F], seeds: list[bytes]) > tuple[ Optional[list[bytes]], list[Prio3InputShare[F]]]: k_helper_shares, seeds = front(self.SHARES  1, seeds) (k_prove,), seeds = front(1, seeds) # Shard the encoded measurement into shares. leader_meas_share = meas for j in range(self.SHARES  1): leader_meas_share = vec_sub( leader_meas_share, self.helper_meas_share(ctx, j + 1, k_helper_shares[j]), ) # Generate and shard each proof into shares. prove_rands = self.prove_rands(ctx, k_prove) leader_proofs_share = [] for _ in range(self.PROOFS): prove_rand, prove_rands = front( self.flp.PROVE_RAND_LEN, prove_rands) leader_proofs_share += self.flp.prove(meas, prove_rand, []) for j in range(self.SHARES  1): leader_proofs_share = vec_sub( leader_proofs_share, self.helper_proofs_share( ctx, j + 1, k_helper_shares[j], ), ) # Each Aggregator's input share contains its measurement share # and share of proof(s). input_shares: list[Prio3InputShare[F]] = [] input_shares.append(( leader_meas_share, leader_proofs_share, None, )) for j in range(self.SHARES  1): input_shares.append(( k_helper_shares[j], None, )) Barnes, et al. Expires 7 April 2025 [Page 62] InternetDraft VDAF October 2024 return (None, input_shares) Figure 16: Sharding an encoded measurement without joint randomness. The steps in this method are as follows: 1. Shard the encoded measurement into shares 2. Generate and shard each proof into shares 3. Encode each measurement and shares of each proof into an input share Notice that only one pair of measurement and proof(s) share (called the "leader" shares above) are vectors of field elements. The other shares (called the "helper" shares) are represented instead by an XOF seed, which is expanded into vectors of field elements. The methods on Prio3 for deriving the prover randomness, measurement shares, and proof shares and the methods for encoding the input shares are defined in Section 7.2.6. 7.2.1.2. FLPs with joint randomness The following method is used for FLPs that require joint randomness, i.e., for which flp.JOINT_RAND_LEN > 0: def shard_with_joint_rand( self, ctx: bytes, meas: list[F], nonce: bytes, seeds: list[bytes]) > tuple[ Optional[list[bytes]], list[Prio3InputShare[F]]]: k_helper_seeds, seeds = front((self.SHARES  1) * 2, seeds) k_helper_shares = [ k_helper_seeds[i] for i in range(0, (self.SHARES  1) * 2, 2) ] k_helper_blinds = [ k_helper_seeds[i] for i in range(1, (self.SHARES  1) * 2, 2) ] (k_leader_blind, k_prove), seeds = front(2, seeds) # Shard the encoded measurement into shares and compute the # joint randomness parts. Barnes, et al. Expires 7 April 2025 [Page 63] InternetDraft VDAF October 2024 leader_meas_share = meas k_joint_rand_parts = [] for j in range(self.SHARES  1): helper_meas_share = self.helper_meas_share( ctx, j + 1, k_helper_shares[j]) leader_meas_share = vec_sub(leader_meas_share, helper_meas_share) k_joint_rand_parts.append(self.joint_rand_part( ctx, j + 1, k_helper_blinds[j], helper_meas_share, nonce)) k_joint_rand_parts.insert(0, self.joint_rand_part( ctx, 0, k_leader_blind, leader_meas_share, nonce)) # Generate the proof and shard it into proof shares. prove_rands = self.prove_rands(ctx, k_prove) joint_rands = self.joint_rands( ctx, self.joint_rand_seed(ctx, k_joint_rand_parts)) leader_proofs_share = [] for _ in range(self.PROOFS): prove_rand, prove_rands = front( self.flp.PROVE_RAND_LEN, prove_rands) joint_rand, joint_rands = front( self.flp.JOINT_RAND_LEN, joint_rands) leader_proofs_share += self.flp.prove( meas, prove_rand, joint_rand, ) for j in range(self.SHARES  1): leader_proofs_share = vec_sub( leader_proofs_share, self.helper_proofs_share( ctx, j + 1, k_helper_shares[j], ), ) # Each Aggregator's input share contains its measurement share, # share of proof(s), and blind. The public share contains the # Aggregators' joint randomness parts. input_shares: list[Prio3InputShare[F]] = [] input_shares.append(( leader_meas_share, leader_proofs_share, k_leader_blind, )) for j in range(self.SHARES  1): Barnes, et al. Expires 7 April 2025 [Page 64] InternetDraft VDAF October 2024 input_shares.append(( k_helper_shares[j], k_helper_blinds[j], )) return (k_joint_rand_parts, input_shares) Figure 17: Sharding an encoded measurement with joint randomness. The difference between this procedure and previous one is that here we compute joint randomnesses joint_rands, split it into multiple joint_rand, and pass each joint_rand to the proof generationg algorithm. (In Figure 16 the joint randomness is the empty vector, [].) This requires generating an additional value, called the "blind", that is incorporated into each input share. The joint randomness computation involves the following steps: 1. Compute a "joint randomness part" from each measurement share and blind 2. Compute a "joint randomness seed" from the joint randomness parts 3. Compute the joint randomness for each proof evaluation from the joint randomness seed This threestep process is designed to ensure that the joint randomness does not leak the measurement to the Aggregators while preventing a malicious Client from tampering with the joint randomness in a way that allows it to break robustness. To bootstrap the required check, the Client encodes the joint randomness parts in the public share. (See Section 7.2.2 for details.) The methods used in this computation are defined in Section 7.2.6. 7.2.2. Preparation This section describes the process of recovering output shares from the input shares. The highlevel idea is that each Aggregator first queries its measurement and share of proof(s) locally, then exchanges its share of verifier(s) with the other Aggregators. The shares of verifier(s) are then combined into the verifier message(s) used to decide whether to accept. In addition, for FLPs that require joint randomness, the Aggregators must ensure that they have all used the same joint randomness for the querygeneration algorithm. To do so, they collectively rederive the joint randomness from their measurement shares just as the Client did during sharding. Barnes, et al. Expires 7 April 2025 [Page 65] InternetDraft VDAF October 2024 In order to avoid extra round of communication, the Client sends each Aggregator a "hint" consisting of the joint randomness parts. This leaves open the possibility that the Client cheated by, say, forcing the Aggregators to use joint randomness that biases the proof check procedure some way in its favor. To mitigate this, the Aggregators also check that they have all computed the same joint randomness seed before accepting their output shares. To do so, they exchange their parts of the joint randomness along with their shares of verifier(s). Implementation note: the preparation state for Prio3 includes the output share that will be released once preparation is complete. In some situations, it may be necessary for the Aggregator to encode this state as bytes and store it for retrieval later on. For all but the first Aggregator, it is possible to save storage by storing the measurement share rather than output share itself. It is relatively inexpensive to expand this seed into the input share, then truncate the input share to get the output share. The definitions of constants and a few auxiliary functions are defined in Section 7.2.6. def prep_init( self, verify_key: bytes, ctx: bytes, agg_id: int, _agg_param: None, nonce: bytes, public_share: Optional[list[bytes]], input_share: Prio3InputShare[F]) > tuple[ Prio3PrepState[F], Prio3PrepShare[F]]: k_joint_rand_parts = public_share (meas_share, proofs_share, k_blind) = \ self.expand_input_share(ctx, agg_id, input_share) out_share = self.flp.truncate(meas_share) # Compute the joint randomness. joint_rand: list[F] = [] k_corrected_joint_rand, k_joint_rand_part = None, None if self.flp.JOINT_RAND_LEN > 0: assert k_blind is not None assert k_joint_rand_parts is not None k_joint_rand_part = self.joint_rand_part( ctx, agg_id, k_blind, meas_share, nonce) k_joint_rand_parts[agg_id] = k_joint_rand_part k_corrected_joint_rand = self.joint_rand_seed( ctx, k_joint_rand_parts) Barnes, et al. Expires 7 April 2025 [Page 66] InternetDraft VDAF October 2024 joint_rands = self.joint_rands(ctx, k_corrected_joint_rand) # Query the measurement and proof share. query_rands = self.query_rands(verify_key, ctx, nonce) verifiers_share = [] for _ in range(self.PROOFS): proof_share, proofs_share = front( self.flp.PROOF_LEN, proofs_share) query_rand, query_rands = front( self.flp.QUERY_RAND_LEN, query_rands) if self.flp.JOINT_RAND_LEN > 0: joint_rand, joint_rands = front( self.flp.JOINT_RAND_LEN, joint_rands) verifiers_share += self.flp.query( meas_share, proof_share, query_rand, joint_rand, self.SHARES, ) prep_state = (out_share, k_corrected_joint_rand) prep_share = (verifiers_share, k_joint_rand_part) return (prep_state, prep_share) def prep_next( self, _ctx: bytes, prep_state: Prio3PrepState[F], prep_msg: Optional[bytes] ) > tuple[Prio3PrepState[F], Prio3PrepShare[F]]  list[F]: k_joint_rand = prep_msg (out_share, k_corrected_joint_rand) = prep_state # If joint randomness was used, check that the value computed by # the Aggregators matches the value indicated by the Client. if k_joint_rand != k_corrected_joint_rand: raise ValueError('joint randomness check failed') return out_share def prep_shares_to_prep( self, ctx: bytes, _agg_param: None, prep_shares: list[Prio3PrepShare[F]]) > Optional[bytes]: # Unshard the verifier shares into the verifier message. verifiers = self.flp.field.zeros( Barnes, et al. Expires 7 April 2025 [Page 67] InternetDraft VDAF October 2024 self.flp.VERIFIER_LEN * self.PROOFS) k_joint_rand_parts = [] for (verifiers_share, k_joint_rand_part) in prep_shares: verifiers = vec_add(verifiers, verifiers_share) if self.flp.JOINT_RAND_LEN > 0: assert k_joint_rand_part is not None k_joint_rand_parts.append(k_joint_rand_part) # Verify that each proof is wellformed and input is valid for _ in range(self.PROOFS): verifier, verifiers = front(self.flp.VERIFIER_LEN, verifiers) if not self.flp.decide(verifier): raise ValueError('proof verifier check failed') # Combine the joint randomness parts computed by the # Aggregators into the true joint randomness seed. This is # used in the last step. k_joint_rand = None if self.flp.JOINT_RAND_LEN > 0: k_joint_rand = self.joint_rand_seed(ctx, k_joint_rand_parts) return k_joint_rand Figure 18: Preparation state for Prio3. 7.2.3. Validity of Aggregation Parameters Every input share MUST only be used once, regardless of the aggregation parameters used. def is_valid( self, _agg_param: None, previous_agg_params: list[None]) > bool: """ Checks if `previous_agg_params` is empty, as input shares in Prio3 may only be used once. """ return len(previous_agg_params) == 0 Figure 19: Validity of aggregation parameters for Prio3. 7.2.4. Aggregation Aggregating a set of output shares is simply a matter of adding up the vectors elementwise. Barnes, et al. Expires 7 April 2025 [Page 68] InternetDraft VDAF October 2024 def aggregate( self, _agg_param: None, out_shares: list[list[F]]) > list[F]: agg_share = self.flp.field.zeros(self.flp.OUTPUT_LEN) for out_share in out_shares: agg_share = vec_add(agg_share, out_share) return agg_share Figure 20: Aggregation algorithm for Prio3. 7.2.5. Unsharding To unshard a set of aggregate shares, the Collector first adds up the vectors elementwise. It then converts each element of the vector into an integer. def unshard( self, _agg_param: None, agg_shares: list[list[F]], num_measurements: int) > AggResult: agg = self.flp.field.zeros(self.flp.OUTPUT_LEN) for agg_share in agg_shares: agg = vec_add(agg, agg_share) return self.flp.decode(agg, num_measurements) Figure 21: Computation of the aggregate result for Prio3. 7.2.6. Auxiliary Functions This section defines a number of auxiliary functions referenced by the main algorithms for Prio3 in the preceding sections. The following methods are called by the sharding and preparation algorithms. def helper_meas_share( self, ctx: bytes, agg_id: int, k_share: bytes) > list[F]: return self.xof.expand_into_vec( self.flp.field, k_share, self.domain_separation_tag(USAGE_MEAS_SHARE, ctx), byte(agg_id), self.flp.MEAS_LEN, Barnes, et al. Expires 7 April 2025 [Page 69] InternetDraft VDAF October 2024 ) def helper_proofs_share( self, ctx: bytes, agg_id: int, k_share: bytes) > list[F]: return self.xof.expand_into_vec( self.flp.field, k_share, self.domain_separation_tag(USAGE_PROOF_SHARE, ctx), byte(self.PROOFS) + byte(agg_id), self.flp.PROOF_LEN * self.PROOFS, ) def expand_input_share( self, ctx: bytes, agg_id: int, input_share: Prio3InputShare[F]) > tuple[ list[F], list[F], Optional[bytes]]: if agg_id > 0: assert len(input_share) == 2 (k_share, k_blind) = input_share meas_share = self.helper_meas_share(ctx, agg_id, k_share) proofs_share = self.helper_proofs_share(ctx, agg_id, k_share) else: assert len(input_share) == 3 (meas_share, proofs_share, k_blind) = input_share return (meas_share, proofs_share, k_blind) def prove_rands(self, ctx: bytes, k_prove: bytes) > list[F]: return self.xof.expand_into_vec( self.flp.field, k_prove, self.domain_separation_tag(USAGE_PROVE_RANDOMNESS, ctx), byte(self.PROOFS), self.flp.PROVE_RAND_LEN * self.PROOFS, ) def query_rands( self, verify_key: bytes, ctx: bytes, nonce: bytes) > list[F]: return self.xof.expand_into_vec( Barnes, et al. Expires 7 April 2025 [Page 70] InternetDraft VDAF October 2024 self.flp.field, verify_key, self.domain_separation_tag(USAGE_QUERY_RANDOMNESS, ctx), byte(self.PROOFS) + nonce, self.flp.QUERY_RAND_LEN * self.PROOFS, ) def joint_rand_part( self, ctx: bytes, agg_id: int, k_blind: bytes, meas_share: list[F], nonce: bytes) > bytes: return self.xof.derive_seed( k_blind, self.domain_separation_tag(USAGE_JOINT_RAND_PART, ctx), byte(agg_id) + nonce + self.flp.field.encode_vec(meas_share), ) def joint_rand_seed(self, ctx: bytes, k_joint_rand_parts: list[bytes]) > bytes: """Derive the joint randomness seed from its parts.""" return self.xof.derive_seed( zeros(self.xof.SEED_SIZE), self.domain_separation_tag(USAGE_JOINT_RAND_SEED, ctx), concat(k_joint_rand_parts), ) def joint_rands(self, ctx: bytes, k_joint_rand_seed: bytes) > list[F]: """Derive the joint randomness from its seed.""" return self.xof.expand_into_vec( self.flp.field, k_joint_rand_seed, self.domain_separation_tag(USAGE_JOINT_RANDOMNESS, ctx), byte(self.PROOFS), self.flp.JOINT_RAND_LEN * self.PROOFS, ) 7.2.7. Message Serialization This section defines serialization formats for messages exchanged over the network while executing Prio3. It is RECOMMENDED that implementations provide serialization methods for them. Barnes, et al. Expires 7 April 2025 [Page 71] InternetDraft VDAF October 2024 Message structures are defined following Section 3 of [RFC8446]). In the remainder we use S as an alias for prio3.xof.SEED_SIZE and F as an alias for prio3.field.ENCODED_SIZE. XOF seeds are represented as follows: opaque Prio3Seed[S]; Field elements are encoded in littleendian byte order (as defined in Section 6.1) and represented as follows: opaque Prio3Field[F]; 7.2.7.1. Public Share The encoding of the public share depends on whether joint randomness is required for the underlying FLP (i.e., prio3.flp.JOINT_RAND_LEN > 0). If joint randomness is not used, then the public share is the empty string. If joint randomness is used, then the public share encodes the joint randomness parts as follows: struct { Prio3Seed k_joint_rand_parts[S * prio3.SHARES]; } Prio3PublicShareWithJointRand; 7.2.7.2. Input share Just as for the public share, the encoding of the input shares depends on whether joint randomness is used. If so, then each input share includes the Aggregator's blind for generating its joint randomness part. In addition, the encoding of the input shares depends on which aggregator is receiving the message. If the aggregator ID is 0, then the input share includes the full measurement and share of proof(s). Otherwise, if the aggregator ID is greater than 0, then the measurement and shares of proof(s) are represented by an XOF seed. We shall call the former the "Leader" and the latter the "Helpers". In total there are four variants of the input share. When joint randomness is not used, the Leader's share is structured as follows: struct { Prio3Field meas_share[F * prio3.flp.MEAS_LEN]; Prio3Field proofs_share[F * prio3.flp.PROOF_LEN * prio3.PROOFS]; } Prio3LeaderShare; When joint randomness is not used, the Helpers' shares are structured as follows: Barnes, et al. Expires 7 April 2025 [Page 72] InternetDraft VDAF October 2024 struct { Prio3Seed k_share; } Prio3HelperShare; When joint randomness is used, the Leader's input share is structured as follows: struct { Prio3LeaderShare inner; Prio3Seed k_blind; } Prio3LeaderShareWithJointRand; Finally, when joint randomness is used, the Helpers' shares are structured as follows: struct { Prio3HelperShare inner; Prio3Seed k_blind; } Prio3HelperShareWithJointRand; 7.2.7.3. Prep Share When joint randomness is not used, the prep share is structured as follows: struct { Prio3Field verifiers_share[ F * prio3.flp.VERIFIER_LEN * prio3.PROOFS ]; } Prio3PrepShare; When joint randomness is used, the prep share includes the Aggregator's joint randomness part and is structured as follows: struct { Prio3Field verifiers_share[ F * prio3.flp.VERIFIER_LEN * prio3.PROOFS ]; Prio3Seed k_joint_rand_part; } Prio3PrepShareWithJointRand; 7.2.7.4. Prep Message When joint randomness is not used, the prep message is the empty string. Otherwise the prep message consists of the joint randomness seed computed by the Aggregators: Barnes, et al. Expires 7 April 2025 [Page 73] InternetDraft VDAF October 2024 struct { Prio3Seed k_joint_rand; } Prio3PrepMessageWithJointRand; 7.2.7.5. Aggregation Aggregate shares are structured as follows: struct { Prio3Field agg_share[F * prio3.flp.OUTPUT_LEN]; } Prio3AggShare; 7.3. The FLP of [BBCGGI19] This section describes an FLP based on the construction from [BBCGGI19], Section 4.2. We begin in Section 7.3.1 with an overview of their proof system and some extensions to it. The construction is specified in Section 7.3.3. 7.3.1. Overview Conventional zeroknowledge proof systems involve two parties: * The prover, who holds a measurement and generates a proof of the measurement's validity * The verifier who holds an encryption of, or commitment to, the measurement and checks the proof Our proof system is much the same, except the verifier is split across multiple Aggregators, each of which has a secret share of the measurement rather than an encryption of it. Validity is defined in terms of an arithmetic circuit evaluated over the measurement. The inputs to this circuit are elements of a finite field that comprise the encoded measurement; the gates of the circuit are multiplication, addition, and subtraction operations; and the output of the circuit is a single field element. If the value is zero, then the measurement is deemed valid; otherwise, if the output is nonzero, then the measurement is deemed invalid. For example, the simplest circuit specified in this document is the following (Section 7.4.1): C(x) = x * (x1) Barnes, et al. Expires 7 April 2025 [Page 74] InternetDraft VDAF October 2024 This circuit contains one subtraction gate (x  1) and one multiplication gate (x * (x  1)). Observe that C(x) = 0 if and only if x in range(2). Our goal is to allow each Aggregator, who holds a secret share of x, to correctly compute a secret share of C(x). This allows the Aggregators to determine validity by combining their shares of the output. Suppose for a moment that the validity circuit C is an affine arithmetic circuit, meaning its only operations are addition, subtraction, and multiplicationbyconstant. (The circuit above is nonaffine because it contains a multiplication gate with two non constant inputs.) Then each Aggregator can compute its share locally, since C(x_shares[0] + ... + x_shares[SHARES1]) = C(x_shares[0]) + ... + C(x_shares[SHARES1]) (Note that, for this equality to hold, it is necessary to scale any addition of a constant in the circuit by 1/SHARES.) However, this is not the case if C contains multiplication gates with two nonconstant inputs. Thus our goal is to transform these multiplication gates into computations on secret shared data that each Aggregator can perform locally. The key idea is to have the prover construct a polynomial p such that p(j) is equal to the output of the jth multiplication gate. Polynomial evaluation is fully linear, which means the coefficients of the polynomial can be secret shared in a way that allows each Aggregator to compute a share of p(j) for any j. These intermediate results can then be combined with the affine arithmetic operations of the validity circuit to produce the final output. Applying this idea to the example circuit C above: 1. The Client, given its measurement x, constructs the lowest degree polynomial p for which p(0) = s and p(1) = x * (x  1), where s is a random blinding value generated by the Client. (The blinding value is to protect the privacy of the measurement.) It sends secret shares of x and the coefficients of p to each of the Aggregators. 2. Each Aggregator locally computes and broadcasts its share of p(1), which is equal to its share of C(x). Barnes, et al. Expires 7 April 2025 [Page 75] InternetDraft VDAF October 2024 In fact, our FLP is slightly more general than this. We can replace the multiplication gate with any nonaffine subcircuit and apply the same idea. For example, in Section 7.4.2, the validity circuit uses the following subcircuit multiple times: Range2(x) = x * (x  1) = x^2  x (This is the same functionality computed by the example circuit C above.) Here again we can interpolate the lowest degree polynomial p for which p(j) is the value of the jth call to Range2 in the validity circuit. Each validity circuit defines a subcircuit that encapsulates its nonaffine arithmetic operations. We refer to this subcircuit as the "gadget". This idea provides the needed functionality, but it has one more problem: it is possible for a malicious Client to produce a gadget polynomial p that would result in C(x) being computed incorrectly, potentially resulting in an invalid measurement being accepted. To prevent this, the Aggregators perform a probabilistic test to check that the gadget polynomial was constructed properly. This "gadget test", and the procedure for constructing the polynomial, are described in detail in Section 7.3.3. 7.3.1.1. Extensions The FLP described in {#flpbbcggi19construction} extends the proof system of [BBCGGI19], Section 4.2 in a few ways. First, the validity circuit in our construction includes an additional, random input (this is the "joint randomness" derived from the measurement shares in Prio3; see Section 7.2). This allows for circuit optimizations that trade a small soundness error for a shorter proof. For example, consider a circuit that recognizes the set of lengthN vectors for which each element is either one or zero. A deterministic circuit could be constructed for this language, but it would involve a large number of multiplications that would result in a large proof. (See the discussion in [BBCGGI19], Section 5.2 for details). A much shorter proof can be constructed for the following randomized circuit: C(x, r) = r * Range2(x[0]) + ... + r^N * Range2(x[N1]) Barnes, et al. Expires 7 April 2025 [Page 76] InternetDraft VDAF October 2024 (Note that this is a special case of [BBCGGI19], Theorem 5.2.) Here x is the lengthN input and r is a random field element. The gadget circuit Range2 is the "rangecheck" polynomial described above, i.e., Range2(x) = x^2  x. The idea is that, if x is valid (i.e., each x[j] is in range(2)), then the circuit will evaluate to 0 regardless of the value of r; but if some x[j] is not in range(2), then output will be nonzero with high probability. The second extension implemented by our FLP allows the validity circuit to contain multiple gadget types. (This generalization was suggested in [BBCGGI19], Remark 4.5.) This provides additional flexibility for designing circuits by allowing multiple, nonaffine subcircuits. For example, the following circuit is allowed: C(meas, r) = r * Range2(meas[0]) + ... + r^L * Range2(meas[L1]) + \ r^(L+1) * Range3(meas[L]) + ... + r^N * Range3(meas[N1]) where Range3(x) = x^3  3x^2 + 2x. This circuit checks that the first L inputs are in range(2) and the last NL inputs are in range(3). The same circuit can be expressed using a simpler gadget, namely multiplication, but the resulting proof would be longer. Third, rather than interpolate the gadget polynomial at inputs 1, 2, ..., j, ..., where j is the jth invocation of the gadget, we use powers of alpha, where alpha is a root of unity for the field. This allows us to construct each gadget polynomial via the number theoretic transform [SML24], which is far more efficient than generic formulas. This requires the field to be "NTTfriendly" as defined in Section 6.1.2. Finally, the validity circuit in our FLP may have any number of outputs (at least one). The input is said to be valid if each of the outputs is zero. To save bandwidth, we take a random linear combination of the outputs. If each of the outputs is zero, then the reduced output will be zero; but if one of the outputs is nonzero, then the reduced output will be nonzero with high probability. 7.3.2. Validity Circuits The FLP described in Section 7.3.3 is defined in terms of a validity circuit Valid that implements the interface described here. A concrete Valid defines the following parameters: Barnes, et al. Expires 7 April 2025 [Page 77] InternetDraft VDAF October 2024 +=================+=======================================+  Parameter  Description  +=================+=======================================+  GADGETS  A list of gadgets  +++  GADGET_CALLS  Number of times each gadget is called  +++  MEAS_LEN  Length of the measurement  +++  OUTPUT_LEN  Length of the aggregatable output  +++  JOINT_RAND_LEN  Length of the random input  +++  EVAL_OUTPUT_LEN  Length of the circuit output  +++  Measurement  The type of measurement  +++  AggResult  Type of the aggregate result  +++  field  Class object for the field  +++ Table 7: Validity circuit parameters. Each gadget G in GADGETS defines a constant DEGREE that specifies the circuit's "arithmetic degree". This is defined to be the degree of the polynomial that computes it. For example, the Mul circuit in Section 7.3.1 is defined by the polynomial Mul(x) = x * x, which has degree 2. Hence, the arithmetic degree of this gadget is 2. Each gadget also defines a parameter ARITY that specifies the circuit's arity (i.e., the number of input wires). Gadgets provide a method to evaluate their circuit on a list of inputs, eval(). The inputs can either belong to the validity circuit's field, or the polynomial ring over that field. A concrete Valid provides the following methods for encoding a measurement as an input vector, truncating an input vector to the length of an aggregatable output, and converting an aggregated output to an aggregate result: * valid.encode(measurement: Measurement) > list[F] returns a vector of length MEAS_LEN representing a measurement. * valid.truncate(meas: list[F]) > list[F] returns a vector of length OUTPUT_LEN representing an aggregatable output. Barnes, et al. Expires 7 April 2025 [Page 78] InternetDraft VDAF October 2024 * valid.decode(output: list[F], num_measurements: int) > AggResult returns an aggregate result. Finally, the following methods are derived for each concrete Valid: def prove_rand_len(self) > int: """Length of the prover randomness.""" return sum(g.ARITY for g in self.GADGETS) def query_rand_len(self) > int: """Length of the query randomness.""" query_rand_len = len(self.GADGETS) if self.EVAL_OUTPUT_LEN > 1: query_rand_len += self.EVAL_OUTPUT_LEN return query_rand_len def proof_len(self) > int: """Length of the proof.""" length = 0 for (g, g_calls) in zip(self.GADGETS, self.GADGET_CALLS): P = next_power_of_2(1 + g_calls) length += g.ARITY + g.DEGREE * (P  1) + 1 return length def verifier_len(self) > int: """Length of the verifier message.""" length = 1 for g in self.GADGETS: length += g.ARITY + 1 return length def check_valid_eval( self, meas: list[F], joint_rand: list[F]) > None: if len(meas) != self.MEAS_LEN: raise ValueError('incorrect measurement length') if len(joint_rand) != self.JOINT_RAND_LEN: raise ValueError('incorrect joint randomness length') Figure 22: Derived methods for validity circuits. 7.3.3. Construction This section specifies an implementation of the Flp interface (Section 7.1). It has as a generic parameter a validity circuit Valid implementing the interface defined in Section 7.3.2. Barnes, et al. Expires 7 April 2025 [Page 79] InternetDraft VDAF October 2024 The parameters are defined in Table 8. The required methods for generating the proof, generating the verifier, and deciding validity are specified in the remaining subsections. In the remainder, we let [n] denote the set {1, ..., n} for positive integer n. We also define the following constants: * Let H = len(valid.GADGETS) * For each i in [H]:  Let G_i = valid.GADGETS[i]  Let L_i = valid.GADGETS[i].ARITY  Let M_i = valid.GADGET_CALLS[i]  Let P_i = next_power_of_2(M_i+1)  Let alpha_i = field.gen()^(field.GEN_ORDER / P_i) +================+============================================+  Parameter  Value  +================+============================================+  PROVE_RAND_LEN  valid.prove_rand_len() (see Section 7.3.2)  +++  QUERY_RAND_LEN  valid.query_rand_len() (see Section 7.3.2)  +++  JOINT_RAND_LEN  valid.JOINT_RAND_LEN  +++  MEAS_LEN  valid.MEAS_LEN  +++  OUTPUT_LEN  valid.OUTPUT_LEN  +++  PROOF_LEN  valid.proof_len() (see Section 7.3.2)  +++  VERIFIER_LEN  valid.verifier_len() (see Section 7.3.2)  +++  Measurement  valid.Measurement  +++ Table 8: Parameters of FLP of BBCGGI19. 7.3.3.1. Proof Generation Barnes, et al. Expires 7 April 2025 [Page 80] InternetDraft VDAF October 2024 ++  Prove   ++    Valid < meas   ++ < joint rand    Gadget   < prove rand   ++    ++  ++  V proof Figure 23: The proof generation algorithm invokes the validity circuit on the encoded measurement and joint randomness. The validity circuit in turn invokes the gadget(s) defined by the circuit. The prove randomness is used to construct the gadget polynomial(s). On input of meas, prove_rand, and joint_rand, the proof is computed as follows: 1. For each i in [H] create an empty table wire_i. 2. Partition the prover randomness prove_rand into subvectors seed_1, ..., seed_H where len(seed_i) == L_i for all i in [H]. Let us call these the "wire seeds" of each gadget. 3. Evaluate Valid on input of meas and joint_rand, recording the inputs of each gadget in the corresponding table. Specifically, for every i in [H], set wire_i[j1,k1] to the value on the jth wire into the kth call to gadget G_i. 4. Compute the "wire polynomials". That is, for every i in [H] and j in [L_i], construct poly_wire_i[j1], the jth wire polynomial for the ith gadget, as follows: * Let w = [seed_i[j1], wire_i[j1,0], ..., wire_i[j1,M_i1]]. * Let padded_w = w + field.zeros(P_i  len(w)). * Let poly_wire_i[j1] be the lowest degree polynomial for which poly_wire_i[j1](alpha_i^k) == padded_w[k] for all k in [P_i]. 5. Compute the "gadget polynomials". That is, for every i in [H]: Barnes, et al. Expires 7 April 2025 [Page 81] InternetDraft VDAF October 2024 * Let poly_gadget_i = G_i(poly_wire_i[0], ..., poly_wire_i[L_i 1]). That is, evaluate the circuit G_i on the wire polynomials for the ith gadget. (Arithmetic is in the ring of polynomials over field.) The proof is the vector proof = seed_1 + coeff_1 + ... + seed_H + coeff_H, where coeff_i is the vector of coefficients of poly_gadget_i for each i in [H]. 7.3.3.2. Query Generation  proof (share) ++ V  ++  Query    ++    Valid  < meas (share)   ++ < joint rand    V   < query rand   ++    ++  ++  V verifier (share) Figure 24: The query generation algorithm invokes the validity circuit on the encoded measurement and joint randomness. It evaluates the gadget polynomials(s) encoded by the proof (share) to produce (a share of) each gadget output. The verifier (share) consists of (a share of) the validity circuit's output and (a share of) each gadget test. The tests consume the query randomness. On input of meas, proof, query_rand, and joint_rand, the verifier message is generated as follows: 1. For every i in [H] create an empty table wire_i. 2. Partition proof into the subvectors seed_1, coeff_1, ..., seed_H, coeff_H defined in Section 7.3.3.1. 3. Evaluate Valid on input of meas and joint_rand, recording the inputs of each gadget in the corresponding table. This step is similar to the prover's step (3.) except the verifier does not evaluate the gadgets. Instead, it computes the output of the kth call to G_i by evaluating poly_gadget_i(alpha_i^k). Let out denote the output of the circuit evaluation. Barnes, et al. Expires 7 April 2025 [Page 82] InternetDraft VDAF October 2024 4. Next, reduce out as follows. If EVAL_OUTPUT_LEN > 1, then consume the first EVAL_OUTPUT_LEN elements of query_rand by letting r, query_rand = front(1, query_rand). Then let v = r[0]*out[0] + r[1]*out[1] + r[2]*out[2] + .... 5. Compute the wire polynomials just as in the prover's step (4.). 6. Compute the tests for wellformedness of the gadget polynomials. That is, for every i in [H]: * Let t = query_rand[i]. Check if t^(P_i) == 1: if so, then raise ERR_ABORT and halt. (This prevents the verifier from inadvertently leaking a gadget output in the verifier message.) * Let y_i = poly_gadget_i(t). * For each j in [0,L_i) let x_i[j1] = poly_wire_i[j1](t). The verifier message is the vector verifier = [v] + x_1 + [y_1] + ... + x_H + [y_H]. 7.3.3.3. Decision  verifier V ++  Decide  ++  V is_valid Figure 25: The decision algorithm consumes the verifier, consisting of the reduced output of the validity circuit and each gadget test. The verifier is produced by adding up the verifier shares output by each Aggregator. On input of vector verifier, the verifier decides if the measurement is valid as follows: 1. Parse verifier into v, x_1, y_1, ..., x_H, y_H as defined in Section 7.3.3.2. 2. Check for wellformedness of the gadget polynomials. For every i in [H]: * Let z = G_i(x_i). That is, evaluate the circuit G_i on x_i and set z to the output. Barnes, et al. Expires 7 April 2025 [Page 83] InternetDraft VDAF October 2024 * If z != y_i, then return False and halt. 3. Return True if v == 0 and False otherwise. 7.3.3.4. Encoding The FLP encoding and truncation methods invoke valid.encode, valid.truncate, and valid.decode in the natural way. 7.4. Instantiations This section specifies instantiations of Prio3 for various measurement types. Each is determined by a field (Section 6.1), a validity circuit (Section 7.3.2), an XOF (Section 6.2). and the number of proofs to generate and verify. Test vectors for each can be found in Appendix "Test Vectors". 7.4.1. Prio3Count +===========+==================================+  Parameter  Value  +===========+==================================+  Valid  Count(Field64) (this section)  +++  Field  Field64 (Table 3)  +++  PROOFS  1  +++  Xof  XofTurboShake128 (Section 6.2.1)  +++ Table 9: Parameters for Prio3Count. Our first instance of Prio3 is for a simple counter: each measurement is either one or zero and the aggregate result is the sum of the measurements. Its validity circuit, denoted Count, uses the following degree2, arity2 gadget, denoted Mul: def eval(self, _field: type[F], inp: list[F]) > F: self.check_gadget_eval(inp) return inp[0] * inp[1] The call to check_gadget_eval() raises an error if the length of the input is not equal to the gadget's ARITY parameter. The Count validity circuit is defined as Barnes, et al. Expires 7 April 2025 [Page 84] InternetDraft VDAF October 2024 def eval( self, meas: list[F], joint_rand: list[F], _num_shares: int) > list[F]: self.check_valid_eval(meas, joint_rand) squared = self.GADGETS[0].eval(self.field, [meas[0], meas[0]]) return [squared  meas[0]] The measurement is encoded and decoded as a singleton vector in the natural way. The parameters for this circuit are summarized below. +=================+=================+  Parameter  Value  +=================+=================+  GADGETS  [Mul]  +++  GADGET_CALLS  [1]  +++  MEAS_LEN  1  +++  OUTPUT_LEN  1  +++  JOINT_RAND_LEN  0  +++  EVAL_OUTPUT_LEN  1  +++  Measurement  int in range(2)  +++  AggResult  int  +++ Table 10: Parameters of validity circuit Count. 7.4.2. Prio3Sum +===========+==============================================+  Parameter  Value  +===========+==============================================+  Valid  Sum(Field64, max_measurement) (this section)  +++  Field  Field64 (Table 3)  +++  PROOFS  1  +++  Xof  XofTurboShake128 (Section 6.2.1)  +++ Barnes, et al. Expires 7 April 2025 [Page 85] InternetDraft VDAF October 2024 Table 11: Parameters for Prio3Sum. The next instance of Prio3 supports summing of integers in a pre determined range. Each measurement is an integer in range(max_measurement+1), where max_measurement is an associated parameter equal to the largest valid measurement. The range check is accomplished by encoding the measurement as a bit vector, encoding the measurement plus an offset as a bit vector, then checking that the two encoded integers are consistent. Let * bits = max_measurement.bit_length(), the number of bits needed to encode the largest valid measurement * offset = 2**bits  1  max_measurement The first bitencoded integer is the measurement itself. Note that only measurements between 0 and 2**bits  1 can be encoded this way with bits bits. The second bitencoded integer is the sum of the measurement and offset. Observe that this sum can only be encoded this way if it is between 0 and 2**bits  1, which implies that the measurement is between offset and max_measurement. The circuit, denoted Sum, first checks that each entry of both bit vectors is a one or a zero. It then decodes both the measurement and the offset measurement, and subtracts offset from the latter. It then checks if these two values are equal. Since both the measurement and the measurement plus offset are in the same range of range(2**bits), this means that the measurement itself is between 0 and max_measurement. The measurement is encoded in 2*bits field elements as follows: Barnes, et al. Expires 7 April 2025 [Page 86] InternetDraft VDAF October 2024 def encode(self, measurement: int) > list[F]: encoded = [] encoded += self.field.encode_into_bit_vector( measurement, self.bits ) encoded += self.field.encode_into_bit_vector( measurement + self.offset.as_unsigned(), self.bits ) return encoded def truncate(self, meas: list[F]) > list[F]: return [self.field.decode_from_bit_vector(meas[:self.bits])] def decode(self, output: list[F], _num_measurements: int) > int: return output[0].as_unsigned() The validity circuit checks that the input consists of ones and zeros. Its gadget, denoted Range2, is the degree2, arity1 gadget defined as def eval(self, _field: type[F], inp: list[F]) > F: self.check_gadget_eval(inp) return inp[0] * inp[0]  inp[0] The Sum validity circuit is defined as def eval( self, meas: list[F], joint_rand: list[F], num_shares: int) > list[F]: self.check_valid_eval(meas, joint_rand) shares_inv = self.field(num_shares).inv() out = [] for b in meas: out.append(self.GADGETS[0].eval(self.field, [b])) range_check = self.offset * shares_inv + \ self.field.decode_from_bit_vector(meas[:self.bits])  \ self.field.decode_from_bit_vector(meas[self.bits:]) out.append(range_check) return out Barnes, et al. Expires 7 April 2025 [Page 87] InternetDraft VDAF October 2024 +=================+=================================+  Parameter  Value  +=================+=================================+  GADGETS  [Range2]  +++  GADGET_CALLS  [2*bits]  +++  MEAS_LEN  2*bits  +++  OUTPUT_LEN  1  +++  JOINT_RAND_LEN  0  +++  EVAL_OUTPUT_LEN  2*bits + 1  +++  Measurement  int in range(max_measurement+1)  +++  AggResult  int  +++ Table 12: Parameters of validity circuit Sum. 7.4.3. Prio3SumVec +===========+================================+  Parameter  Value  +===========+================================+  Valid  SumVec(Field128, length, bits,    chunk_lengh) (this section)  +++  Field  Field128 (Table 3)  +++  PROOFS  1  +++  Xof  XofTurboShake128    (Section 6.2.1)  +++ Table 13: Parameters for Prio3SumVec. This instance of Prio3 supports summing a vector of integers. It has three parameters, length, bits, and chunk_length. Each measurement is a vector of positive integers with length equal to the length parameter. Each element of the measurement is an integer in the range(2**bits). It is RECOMMENDED to set chunk_length to an integer near the square root of length * bits (see Section 7.4.3.1). Barnes, et al. Expires 7 April 2025 [Page 88] InternetDraft VDAF October 2024 The validity circuit is denoted SumVec. Measurements are encoded as a vector of field elements with length length * bits. The field elements in the encoded vector represent all the bits of the measurement vector's elements, consecutively, in LSB to MSB order: def encode(self, measurement: list[int]) > list[F]: if len(measurement) != self.length: raise ValueError('incorrect measurement length') encoded = [] for val in measurement: if val not in range(2**self.bits): raise ValueError( 'entry of measurement vector is out of range' ) encoded += self.field.encode_into_bit_vector(val, self.bits) return encoded def truncate(self, meas: list[F]) > list[F]: truncated = [] for i in range(self.length): truncated.append(self.field.decode_from_bit_vector( meas[i * self.bits: (i + 1) * self.bits] )) return truncated def decode( self, output: list[F], _num_measurements: int) > list[int]: return [x.as_unsigned() for x in output] This validity circuit uses a ParallelSum gadget to achieve a smaller proof size. This optimization for "parallelsum circuits" is described in [BBCGGI19], section 4.4. Briefly, for circuits that add up the output of multiple identical subcircuits, it is possible to achieve smaller proof sizes (on the order of O(sqrt(MEAS_LEN)) instead of O(MEAS_LEN)) by packaging more than one such subcircuit into a gadget. Barnes, et al. Expires 7 April 2025 [Page 89] InternetDraft VDAF October 2024 The ParallelSum gadget is parameterized with an arithmetic subcircuit, and a count of how many times it evaluates that subcircuit. It takes in a list of inputs and passes them through to instances of the subcircuit in the same order. It returns the sum of the subcircuit outputs. Note that only the ParallelSum gadget itself, and not its subcircuit, participates in the FLP's wire recording during evaluation, gadget consistency proofs, and proof validation, even though the subcircuit is provided to ParallelSum as an implementation of the Gadget interface. def eval(self, field: type[F], inp: list[F]) > F: self.check_gadget_eval(inp) out = field(0) for i in range(self.count): start_index = i * self.subcircuit.ARITY end_index = (i + 1) * self.subcircuit.ARITY out += self.subcircuit.eval( field, inp[start_index:end_index], ) return out The SumVec validity circuit checks that the encoded measurement consists of ones and zeros. Rather than use the Range2 gadget on each element, as in the Sum validity circuit, it instead uses Mul subcircuits and "free" constant multiplication and addition gates to simultaneously evaluate the same range check polynomial on each element, and multiply by a constant. One of the two Mul subcircuit inputs is equal to a measurement element multiplied by a power of one of the elements of the joint randomness vector, and the other is equal to the same measurement element minus one. These Mul subcircuits are evaluated by a ParallelSum gadget, and the results are added up both within the ParallelSum gadget and after it. Barnes, et al. Expires 7 April 2025 [Page 90] InternetDraft VDAF October 2024 def eval( self, meas: list[F], joint_rand: list[F], num_shares: int) > list[F]: self.check_valid_eval(meas, joint_rand) out = self.field(0) shares_inv = self.field(num_shares).inv() for i in range(self.GADGET_CALLS[0]): r = joint_rand[i] r_power = r inputs: list[Optional[F]] inputs = [None] * (2 * self.chunk_length) for j in range(self.chunk_length): index = i * self.chunk_length + j if index < len(meas): meas_elem = meas[index] else: meas_elem = self.field(0) inputs[j * 2] = r_power * meas_elem inputs[j * 2 + 1] = meas_elem  shares_inv r_power *= r out += self.GADGETS[0].eval( self.field, cast(list[F], inputs), ) return [out] Barnes, et al. Expires 7 April 2025 [Page 91] InternetDraft VDAF October 2024 +=================+====================================+  Parameter  Value  +=================+====================================+  GADGETS  [ParallelSum(Mul(), chunk_length)]  +++  GADGET_CALLS  [(length * bits + chunk_length     1) // chunk_length]  +++  MEAS_LEN  length * bits  +++  OUTPUT_LEN  length  +++  JOINT_RAND_LEN  GADGET_CALLS[0]  +++  EVAL_OUTPUT_LEN  1  +++  Measurement  list[int], each element in    range(2**bits)  +++  AggResult  list[int]  +++ Table 14: Parameters of validity circuit SumVec. 7.4.3.1. Selection of ParallelSum chunk length The chunk_length parameter provides a tradeoff between the arity of the ParallelSum gadget and the number of times the gadget is called. The proof length is asymptotically minimized when the chunk length is near the square root of the length of the measurement. However, the relationship between VDAF parameters and proof length is complicated, involving two forms of rounding (the circuit pads the inputs to its last ParallelSum gadget call, up to the chunk length, and proof system rounds the degree of wire polynomials  determined by the number of times a gadget is called  up to the next power of two). Therefore, the optimal choice of chunk_length for a concrete measurement size will vary, and must be found through trial and error. Setting chunk_length equal to the square root of the appropriate measurement length will result in proofs up to 50% larger than the optimal proof size. Barnes, et al. Expires 7 April 2025 [Page 92] InternetDraft VDAF October 2024 7.4.4. Prio3Histogram +===========+=============================+  Parameter  Value  +===========+=============================+  Valid  Histogram(Field128, length,    chunk_lengh) (this section)  +++  Field  Field128 (Table 3)  +++  PROOFS  1  +++  Xof  XofTurboShake128    (Section 6.2.1)  +++ Table 15: Parameters for Prio3Histogram. This instance of Prio3 allows for estimating the distribution of some quantity by computing a simple histogram. Each measurement increments one histogram bucket, out of a set of fixed buckets. (Bucket indexing begins at 0.) For example, the buckets might quantize the real numbers, and each measurement would report the bucket that the corresponding client's realnumbered value falls into. The aggregate result counts the number of measurements in each bucket. The validity circuit is denoted Histogram. It has two parameters, length, the number of histogram buckets, and chunk_length, which is used by by a circuit optimization described below. It is RECOMMENDED to set chunk_length to an integer near the square root of length (see Section 7.4.3.1). The measurement is encoded as a onehot vector representing the bucket into which the measurement falls: Barnes, et al. Expires 7 April 2025 [Page 93] InternetDraft VDAF October 2024 def encode(self, measurement: int) > list[F]: encoded = [self.field(0)] * self.length encoded[measurement] = self.field(1) return encoded def truncate(self, meas: list[F]) > list[F]: return meas def decode( self, output: list[F], _num_measurements: int) > list[int]: return [bucket_count.as_unsigned() for bucket_count in output] The Histogram validity circuit checks for onehotness in two steps, by checking that the encoded measurement consists of ones and zeros, and by checking that the sum of all elements in the encoded measurement is equal to one. The individual checks constitute the output of the circuit. As in the SumVec validity circuit (Section 7.4.3), the first part of the validity circuit uses the ParallelSum gadget to perform range checks while achieving a smaller proof size. The ParallelSum gadget uses Mul subcircuits to evaluate a range check polynomial on each element, and includes an additional constant multiplication. One of the two Mul subcircuit inputs is equal to a measurement element multiplied by a power of an element of the joint randomness vector, and the other is equal to the same measurement element minus one. The results are added up both within the ParallelSum gadget and after it. Barnes, et al. Expires 7 April 2025 [Page 94] InternetDraft VDAF October 2024 def eval( self, meas: list[F], joint_rand: list[F], num_shares: int) > list[F]: self.check_valid_eval(meas, joint_rand) # Check that each bucket is one or zero. range_check = self.field(0) shares_inv = self.field(num_shares).inv() for i in range(self.GADGET_CALLS[0]): r = joint_rand[i] r_power = r inputs: list[Optional[F]] inputs = [None] * (2 * self.chunk_length) for j in range(self.chunk_length): index = i * self.chunk_length + j if index < len(meas): meas_elem = meas[index] else: meas_elem = self.field(0) inputs[j * 2] = r_power * meas_elem inputs[j * 2 + 1] = meas_elem  shares_inv r_power *= r range_check += self.GADGETS[0].eval( self.field, cast(list[F], inputs), ) # Check that the buckets sum to 1. sum_check = shares_inv for b in meas: sum_check += b return [range_check, sum_check] Note that this circuit depends on the number of shares into which the measurement is sharded. This is provided to the FLP by Prio3. Barnes, et al. Expires 7 April 2025 [Page 95] InternetDraft VDAF October 2024 +=================+===============================================+  Parameter  Value  +=================+===============================================+  GADGETS  [ParallelSum(Mul(), chunk_length)]  +++  GADGET_CALLS  [(length + chunk_length  1) // chunk_length]  +++  MEAS_LEN  length  +++  OUTPUT_LEN  length  +++  JOINT_RAND_LEN  GADGET_CALLS[0]  +++  EVAL_OUTPUT_LEN  2  +++  Measurement  int  +++  AggResult  list[int]  +++ Table 16: Parameters of validity circuit Histogram. 7.4.5. Prio3MultihotCountVec +===========+=========================================+  Parameter  Value  +===========+=========================================+  Valid  MultihotCountVec(Field128, length,    max_weight, chunk_lengh) (this section)  +++  Field  Field128 (Table 3)  +++  PROOFS  1  +++  Xof  XofTurboShake128 (Section 6.2.1)  +++ Table 17: Parameters for Prio3MultihotCountVec. For this instance of Prio3, each measurement is a vector of ones and zeros, where the number of ones is bounded. This provides a functionality similar to Prio3Histogram except that more than one entry may be nonzero. This allows Prio3MultihotCountVec to be composed with a randomized response mechanism, like [EPK14], for providing differential privacy. (For example, each Client would set each entry to one with some small probability.) Barnes, et al. Expires 7 April 2025 [Page 96] InternetDraft VDAF October 2024 Prio3MultihotCountVec uses XofTurboShake128 (Section 6.2.1) as its XOF. Its validity circuit is denoted MultihotCountVec. It has three parameters: length, the number of of entries in the count vector; max_weight, the maximum number of nonzero entries (i.e., the weight must be at most max_weight); and chunk_length, used the same way as in Section 7.4.3 and Section 7.4.4. Validation works as follows. Let * bits_for_weight = max_weight.bit_length() * offset = 2**bits_for_weight  1  max_weight The Client reports the weight of the count vector by adding offset to it and bitencoding the result. Observe that only a weight of at most max_weight can be encoded with bits_for_weight bits. The verifier checks that each entry of the encoded measurement is a bit (i.e., either one or zero). It then decodes the reported weight and subtracts it from offset + sum(count_vec), where count_vec is the count vector. The result is zero if and only if the reported weight is equal to the true weight. Encoding, truncation, and decoding are defined as follows: Barnes, et al. Expires 7 April 2025 [Page 97] InternetDraft VDAF October 2024 def encode(self, measurement: list[int]) > list[F]: if len(measurement) != self.length: raise ValueError('invalid Client measurement length') # The first part is the vector of counters. count_vec = list(map(self.field, measurement)) # The second part is the reported weight. weight_reported = sum(count_vec, self.field(0)) encoded = [] encoded += count_vec encoded += self.field.encode_into_bit_vector( (self.offset + weight_reported).as_unsigned(), self.bits_for_weight) return encoded def truncate(self, meas: list[F]) > list[F]: return meas[:self.length] def decode( self, output: list[F], _num_measurements: int) > list[int]: return [bucket_count.as_unsigned() for bucket_count in output] Circuit evaluation is defined as follows: Barnes, et al. Expires 7 April 2025 [Page 98] InternetDraft VDAF October 2024 def eval( self, meas: list[F], joint_rand: list[F], num_shares: int) > list[F]: self.check_valid_eval(meas, joint_rand) # Check that each entry in the input vector is one or zero. range_check = self.field(0) shares_inv = self.field(num_shares).inv() for i in range(self.GADGET_CALLS[0]): r = joint_rand[i] r_power = r inputs: list[Optional[F]] inputs = [None] * (2 * self.chunk_length) for j in range(self.chunk_length): index = i * self.chunk_length + j if index < len(meas): meas_elem = meas[index] else: meas_elem = self.field(0) inputs[j * 2] = r_power * meas_elem inputs[j * 2 + 1] = meas_elem  shares_inv r_power *= r range_check += self.GADGETS[0].eval( self.field, cast(list[F], inputs), ) # Check that the weight `offset` plus the sum of the counters # is equal to the value claimed by the Client. count_vec = meas[:self.length] weight = sum(count_vec, self.field(0)) weight_reported = \ self.field.decode_from_bit_vector(meas[self.length:]) weight_check = self.offset*shares_inv + weight  \ weight_reported return [range_check, weight_check] Barnes, et al. Expires 7 April 2025 [Page 99] InternetDraft VDAF October 2024 +=================+====================================+  Parameter  Value  +=================+====================================+  GADGETS  [ParallelSum(Mul(), chunk_length)]  +++  GADGET_CALLS  [(length + bits_for_weight +    chunk_length  1) // chunk_length]  +++  MEAS_LEN  length + bits_for_weight  +++  OUTPUT_LEN  length  +++  JOINT_RAND_LEN  GADGET_CALLS[0]  +++  EVAL_OUTPUT_LEN  2  +++  Measurement  list[int]  +++  AggResult  list[int]  +++ Table 18: Parameters of validity circuit MultihotCountVec. 8. Poplar1 This section specifies Poplar1, a VDAF for the following task. Each Client holds a bitstring of length BITS and the Aggregators hold a sequence of Lbit strings, where L <= BITS. We will refer to the latter as the set of "candidate prefixes". The Aggregators' goal is to count how many measurements are prefixed by each candidate prefix. This functionality is the core component of the Poplar protocol [BBCGGI21], which was designed to compute the heavy hitters over a set of input strings. At a high level, the protocol works as follows. 1. Each Client splits its string into input shares and sends one share to each Aggregator. 2. The Aggregators agree on an initial set of candidate prefixes, say 0 and 1. 3. The Aggregators evaluate the VDAF on each set of input shares and aggregate the recovered output shares. The aggregation parameter is the set of candidate prefixes. Barnes, et al. Expires 7 April 2025 [Page 100] InternetDraft VDAF October 2024 4. The Aggregators send their aggregate shares to the Collector, who combines them to recover the counts of each candidate prefix. 5. Let H denote the set of prefixes that occurred at least t times. If the prefixes all have length BITS, then H is the set of t heavyhitters. Otherwise compute the next set of candidate prefixes, e.g., for each p in H, add p  0 and p  1 to the set. Repeat step 3 with the new set of candidate prefixes. Poplar1 is constructed from an "Incremental Distributed Point Function (IDPF)", a primitive described by [BBCGGI21] that generalizes the notion of a Distributed Point Function (DPF) [GI14]. Briefly, a DPF is used to distribute the computation of a "point function", a function that evaluates to zero on every input except at a programmable "point". The computation is distributed in such a way that no one party knows either the point or what it evaluates to. An IDPF generalizes this "point" to a path on a full binary tree from the root to one of the leaves. It is evaluated on an "index" representing a unique node of the tree. If the node is on the programmed path, then the function evaluates to a nonzero value; otherwise it evaluates to zero. This structure allows an IDPF to provide the functionality required for the above protocol: to compute the hit count for an index, just evaluate each set of IDPF shares at that index and add up the results. Consider the subtree constructed from a set of input strings and a target threshold t by including all indices that prefix at least t of the input strings. We shall refer to this structure as the "prefix tree" for the batch of inputs and target threshold. To compute the theavy hitters for a set of inputs, the Aggregators and Collector first compute the prefix tree, then extract the heavy hitters from the leaves of this tree. (Note that the prefix tree may leak more information about the set than the heavy hitters themselves; see Section 9.4.1 for details.) Poplar1 composes an IDPF with the arithmetic sketch of [BBCGGI21], Section 4.2. (The paper calls this a "secure sketch", but the underlying technique was later generalized in [BBCGGI23], where it is called "arithmetic sketching".) This protocol ensures that evaluating a set of input shares on a unique set of candidate prefixes results in shares of a "onehot" vector, i.e., a vector that is zero everywhere except for at most one element, which is equal to one. The remainder of this section is structured as follows. IDPFs are defined in Section 8.1; a concrete instantiation is given Section 8.3. The Poplar1 VDAF is defined in Section 8.2 in terms of Barnes, et al. Expires 7 April 2025 [Page 101] InternetDraft VDAF October 2024 a generic IDPF. Finally, a concrete instantiation of Poplar1 is specified in Section 8.4; test vectors can be found in Appendix "Test Vectors". 8.1. Incremental Distributed Point Functions (IDPFs) An IDPF is defined over a domain of size 2^BITS, where BITS is a constant. Indices into the IDPF tree are bit strings. (In Poplar1, each Client's bit string is an index; see Section 8.1.1 for details.) The Client specifies an index alpha and a vector of values beta, one for each "level" L in range(BITS). The key generation algorithm generates one IDPF "key" for each Aggregator. When evaluated at level L and index prefix, each IDPF key returns an additive share of beta[L] if prefix is the Lbit prefix of alpha and shares of zero otherwise. Each of the programmed points beta is a vector of elements of some finite field. We distinguish two types of fields: one for inner nodes (denoted FieldInner), and one for leaf nodes (FieldLeaf). (Our instantiation of Poplar1 (Section 8.4) will use a much larger field for leaf nodes than for inner nodes. This is to ensure the IDPF is "extractable" as defined in [BBCGGI21], Definition 1.) A concrete IDPF defines the types and constants enumerated in Table 19. In the remainder we write Output as shorthand for the type list[list[FieldInner]]  list[list[FieldLeaf]]. (This type denotes either a vector of inner node field elements or leaf node field elements.) The scheme is comprised of the following algorithms: * idpf.gen(alpha: tuple[bool, ...], beta_inner: list[list[FieldInner]], beta_leaf: list[FieldLeaf], ctx: bytes, nonce: bytes, rand: bytes) > tuple[PublicShare, list[bytes]] is the randomized IDPFkey generation algorithm. Its inputs are the index alpha, the values beta, an application context string, and a nonce string. The output is a public part (of type PublicShare) that is sent to all Aggregators and a vector of private IDPF keys, one for each aggregator. The binder string is used to derive the key in the underlying XofFixedKeyAes128 XOF that is used for expanding seeds at each level. Preconditions:  alpha MUST have length BITS.  beta_inner MUST have length BITS  1. Barnes, et al. Expires 7 April 2025 [Page 102] InternetDraft VDAF October 2024  beta_inner[level] MUST have length VALUE_LEN for each level in range(BITS  1).  beta_leaf MUST have length VALUE_LEN.  rand MUST be generated by a CSPRNG and have length RAND_SIZE.  nonce MUST be of length Idpf.NONCE_SIZE and chosen uniformly at random by the Client (see Section 9.2). * idpf.eval(agg_id: int, public_share: PublicShare, key: bytes, level: int, prefixes: Sequence[tuple[bool, ...]], ctx: bytes, nonce: bytes) > Output is the deterministic, stateless IDPFkey evaluation algorithm run by each Aggregator. Its inputs are the Aggregator's unique identifier, the public share distributed to all of the Aggregators, the Aggregator's IDPF key, the "level" at which to evaluate the IDPF, the sequence of candidate prefixes, an application context string, and a nonce string. It returns the share of the value corresponding to each candidate prefix. The output type (i.e., Output) depends on the value of level: if level < BITS1, the output is the value for an inner node, which has type list[list[FieldInner]]; otherwise, if level == BITS1, then the output is the value for a leaf node, which has type list[list[FieldLeaf]]. Preconditions:  agg_id MUST be in range(SHARES) and match the index of key in the sequence of IDPF keys output by the Client.  level MUST be in range(0, BITS).  Each prefix in prefixes MUST be distinct and have length level + 1. In addition, the following method is derived for each concrete Idpf: def current_field( self, level: int) > type[FieldInner]  type[FieldLeaf]: if level < self.BITS  1: return self.field_inner return self.field_leaf Finally, an implementation note. The interface for IDPFs specified here is stateless, in the sense that there is no state carried between IDPF evaluations. This is to align the IDPF syntax with the Barnes, et al. Expires 7 April 2025 [Page 103] InternetDraft VDAF October 2024 VDAF abstraction boundary, which does not include shared state across VDAF evaluations. In practice, of course, it will often be beneficial to expose a stateful API for IDPFs and carry the state across evaluations. See Section 8.3 for details. +=============+====================================================+  Parameter  Description  +=============+====================================================+  SHARES  Number of IDPF keys output by IDPFkey generator  +++  BITS  Length in bits of each input string  +++  VALUE_LEN  Number of field elements of each output value  +++  RAND_SIZE  Size of the random string consumed by the IDPFkey    generator. Equal to twice the XOF's seed size.  +++  NONCE_SIZE  Size of the randon nonce generated by the Client.  +++  KEY_SIZE  Size in bytes of each IDPF key  +++  FieldInner  Implementation of Field (Section 6.1) used for    values of inner nodes  +++  FieldLeaf  Implementation of Field used for values of leaf    nodes  +++  PublicShare  Type of public share for this IDPF  +++  Output  Alias of list[list[FieldInner]]     list[list[FieldLeaf]]  +++  FieldVec  Alias of list[FieldInner]  list[FieldLeaf]  +++ Table 19: Constants and types defined by a concrete IDPF. 8.1.1. Encoding inputs as indices How data are represented as IDPF indices is up to the application. When the inputs are fixedlength byte strings, the most natural choice of representation is as a bit string formed from all the bits of the byte string, first ordered by byte position, then ordered from most significant bit to least significant bit within each byte. This ensures that, when a byte string is a prefix of another, so too is its corresponding index. (Index prefixes are defined in Section 8.1.) For example, Barnes, et al. Expires 7 April 2025 [Page 104] InternetDraft VDAF October 2024 Byte string: 01 02 Bit string: 00000001 00000010 is a prefix of Byte string: 01 02 03 Bit string: 00000001 00000010 00000011 Additionally, lexicographic ordering is preserved by this mapping from a byte string to a bit string. When the inputs are variable length, it is necessary to pad each input to some fixed length. Further, the padding scheme must be non ambiguous. For example, each input could be padded with b"\x01" followed by as many b"\x00" bytes as needed. 8.2. Construction This section specifies Poplar1, an implementation of the Vdaf interface (Section 5). It is defined in terms of any Idpf (Section 8.1) for which SHARES == 2 and VALUE_LEN == 2 and an implementation of Xof (Section 6.2). The associated constants and types required by the Vdaf interface are defined in Table 20. The methods required for sharding, preparation, aggregation, and unsharding are described in the remaining subsections. These methods make use of constants defined in Table 21. Barnes, et al. Expires 7 April 2025 [Page 105] InternetDraft VDAF October 2024 +=================+========================================+  Parameter  Value  +=================+========================================+  VERIFY_KEY_SIZE  Xof.SEED_SIZE  +++  RAND_SIZE  Xof.SEED_SIZE * 3 + Idpf.RAND_SIZE  +++  NONCE_SIZE  16  +++  ROUNDS  2  +++  SHARES  2  +++  Measurement  tuple[bool, ...]  +++  AggParam  tuple[int, Sequence[tuple[bool, ...]]]  +++  PublicShare  same as the IDPF  +++  InputShare  tuple[bytes, bytes, list[FieldInner],    list[FieldLeaf]]  +++  OutShare  FieldVec  +++  AggShare  FieldVec  +++  AggResult  list[int]  +++  PrepState  tuple[bytes, int, FieldVec]  +++  PrepShare  FieldVec  +++  PrepMessage  Optional[FieldVec]  +++ Table 20: VDAF parameters for Poplar1. Barnes, et al. Expires 7 April 2025 [Page 106] InternetDraft VDAF October 2024 +========================+=======+  Variable  Value  +========================+=======+  USAGE_SHARD_RAND: int  1  +++  USAGE_CORR_INNER: int  2  +++  USAGE_CORR_LEAF: int  3  +++  USAGE_VERIFY_RAND: int  4  +++ Table 21: Constants used by Poplar1. 8.2.1. Sharding The Client's measurement is an IDPF index, denoted alpha. (See Section 8.1.1 for guidelines on index encoding.) The programmed IDPF values are pairs of field elements (1, k) where each k is chosen at random. This random value is used as part of the arithmetic sketching protocol of [BBCGGI21], Appendix C.4. After evaluating their IDPF key shares on a given sequence of candidate prefixes, the sketching protocol is used by the Aggregators to verify that they hold shares of a onehot vector. In addition, for each level of the tree, the prover generates random elements a, b, and c and computes A = 2*a + k B = a^2 + b  k*a + c and sends additive shares of a, b, c, A and B to the Aggregators. Putting everything together, the sharding algorithm is defined as follows. def shard( self, ctx: bytes, measurement: tuple[bool, ...], nonce: bytes, rand: bytes, ) > tuple[Poplar1PublicShare, list[Poplar1InputShare]]: if len(nonce) != self.NONCE_SIZE: raise ValueError("incorrect nonce size") if len(rand) != self.RAND_SIZE: raise ValueError("incorrect size of random bytes argument") l = self.xof.SEED_SIZE Barnes, et al. Expires 7 April 2025 [Page 107] InternetDraft VDAF October 2024 # Split the random input into the random input for IDPF key # generation, correlated randomness, and sharding. if len(rand) != self.RAND_SIZE: raise ValueError('incorrect rand size') idpf_rand, rand = front(self.idpf.RAND_SIZE, rand) seeds = [rand[i:i + l] for i in range(0, 3 * l, l)] corr_seed, seeds = front(2, seeds) (k_shard,), seeds = front(1, seeds) xof = self.xof( k_shard, self.domain_separation_tag(USAGE_SHARD_RAND, ctx), nonce, ) # Construct the IDPF values for each level of the IDPF tree. # Each "data" value is 1; in addition, the Client generates # a random "authenticator" value used by the Aggregators to # evaluate the sketch during preparation. This sketch is used # to verify the onehotness of their output shares. beta_inner = [ [self.idpf.field_inner(1), k] for k in xof.next_vec(self.idpf.field_inner, self.idpf.BITS  1) ] beta_leaf = [self.idpf.field_leaf(1)] + \ xof.next_vec(self.idpf.field_leaf, 1) # Generate the IDPF keys. (public_share, keys) = self.idpf.gen( measurement, beta_inner, beta_leaf, ctx, nonce, idpf_rand, ) # Generate correlated randomness used by the Aggregators to # evaluate the sketch over their output shares. Seeds are used # to encode shares of the `(a, b, c)` triples. (See [BBCGGI21, # Appendix C.4].) corr_offsets: list[Field] = vec_add( self.xof.expand_into_vec( self.idpf.field_inner, corr_seed[0], self.domain_separation_tag(USAGE_CORR_INNER, ctx), byte(0) + nonce, Barnes, et al. Expires 7 April 2025 [Page 108] InternetDraft VDAF October 2024 3 * (self.idpf.BITS  1), ), self.xof.expand_into_vec( self.idpf.field_inner, corr_seed[1], self.domain_separation_tag(USAGE_CORR_INNER, ctx), byte(1) + nonce, 3 * (self.idpf.BITS  1), ), ) corr_offsets += vec_add( self.xof.expand_into_vec( self.idpf.field_leaf, corr_seed[0], self.domain_separation_tag(USAGE_CORR_LEAF, ctx), byte(0) + nonce, 3, ), self.xof.expand_into_vec( self.idpf.field_leaf, corr_seed[1], self.domain_separation_tag(USAGE_CORR_LEAF, ctx), byte(1) + nonce, 3, ), ) # For each level of the IDPF tree, shares of the `(A, B)` # pairs are computed from the corresponding `(a, b, c)` # triple and authenticator value `k`. corr_inner: list[list[Field64]] = [[], []] for level in range(self.idpf.BITS): field = cast(type[Field], self.idpf.current_field(level)) k = beta_inner[level][1] if level < self.idpf.BITS  1 \ else beta_leaf[1] (a, b, c), corr_offsets = corr_offsets[:3], corr_offsets[3:] A = field(2) * a + k B = a ** 2 + b  a * k + c corr1 = xof.next_vec(field, 2) corr0 = vec_sub([A, B], corr1) if level < self.idpf.BITS  1: corr_inner[0] += cast(list[Field64], corr0) corr_inner[1] += cast(list[Field64], corr1) else: corr_leaf = [ cast(list[Field255], corr0), cast(list[Field255], corr1), ] Barnes, et al. Expires 7 April 2025 [Page 109] InternetDraft VDAF October 2024 # Each input share consists of the Aggregator's IDPF key # and a share of the correlated randomness. input_shares = list(zip(keys, corr_seed, corr_inner, corr_leaf)) return (public_share, input_shares) Figure 26: The sharding algorithm for Poplar1. 8.2.2. Preparation The aggregation parameter encodes a sequence of candidate prefixes. When an Aggregator receives an input share from the Client, it begins by evaluating its IDPF share on each candidate prefix, recovering a data_share and auth_share for each. The Aggregators use these and the correlation shares provided by the Client to verify that the sequence of data_share values are additive shares of a onehot vector. Aggregators MUST ensure the candidate prefixes are all unique and appear in lexicographic order. (This is enforced in the definition of prep_init() below.) Uniqueness is necessary to ensure the refined measurement (i.e., the sum of the output shares) is in fact a onehot vector. Otherwise, sketch verification might fail, causing the Aggregators to erroneously reject a report that is actually valid. Note that enforcing the order is not strictly necessary, but this does allow uniqueness to be determined more efficiently. def prep_init( self, verify_key: bytes, ctx: bytes, agg_id: int, agg_param: Poplar1AggParam, nonce: bytes, public_share: Poplar1PublicShare, input_share: Poplar1InputShare) > tuple[ Poplar1PrepState, FieldVec]: (level, prefixes) = agg_param (key, corr_seed, corr_inner, corr_leaf) = input_share field = self.idpf.current_field(level) # Ensure that candidate prefixes are all unique and appear in # lexicographic order. for i in range(1, len(prefixes)): if prefixes[i  1] >= prefixes[i]: raise ValueError('out of order prefix') # Evaluate the IDPF key at the given set of prefixes. Barnes, et al. Expires 7 April 2025 [Page 110] InternetDraft VDAF October 2024 value = self.idpf.eval( agg_id, public_share, key, level, prefixes, ctx, nonce) # Get shares of the correlated randomness for evaluating the # Aggregator's share of the sketch. if level < self.idpf.BITS  1: corr_xof = self.xof( corr_seed, self.domain_separation_tag(USAGE_CORR_INNER, ctx), byte(agg_id) + nonce, ) # Fastforward the XOF state to the current level. corr_xof.next_vec(field, 3 * level) else: corr_xof = self.xof( corr_seed, self.domain_separation_tag(USAGE_CORR_LEAF, ctx), byte(agg_id) + nonce, ) (a_share, b_share, c_share) = corr_xof.next_vec(field, 3) if level < self.idpf.BITS  1: (A_share, B_share) = cast( list[Field], corr_inner[2 * level:2 * (level + 1)], ) else: (A_share, B_share) = cast(list[Field], corr_leaf) # Evaluate the Aggregator's share of the sketch. These are # called the "masked input values" [BBCGGI21, Appendix C.4]. verify_rand_xof = self.xof( verify_key, self.domain_separation_tag(USAGE_VERIFY_RAND, ctx), nonce + to_be_bytes(level, 2), ) verify_rand = cast( list[Field], verify_rand_xof.next_vec(field, len(prefixes)), ) sketch_share = [a_share, b_share, c_share] out_share = [] for (i, r) in enumerate(verify_rand): data_share = cast(Field, value[i][0]) auth_share = cast(Field, value[i][1]) sketch_share[0] += data_share * r sketch_share[1] += data_share * r ** 2 sketch_share[2] += auth_share * r out_share.append(data_share) Barnes, et al. Expires 7 April 2025 [Page 111] InternetDraft VDAF October 2024 prep_mem = [A_share, B_share, field(agg_id)] + out_share return ( ( b'evaluate sketch', level, cast(FieldVec, prep_mem), ), cast(FieldVec, sketch_share), ) def prep_next( self, _ctx: bytes, prep_state: Poplar1PrepState, prep_msg: Optional[FieldVec] ) > tuple[Poplar1PrepState, FieldVec]  FieldVec: prev_sketch = cast(list[Field], prep_msg) (step, level, prep_mem) = prep_state if step == b'evaluate sketch': if prev_sketch is None: raise ValueError('expected value, got none') elif len(prev_sketch) != 3: raise ValueError('incorrect sketch length') A_share = cast(Field, prep_mem[0]) B_share = cast(Field, prep_mem[1]) agg_id = cast(Field, prep_mem[2]) prep_mem = prep_mem[3:] sketch_share = [ agg_id * (prev_sketch[0] ** 2  prev_sketch[1]  prev_sketch[2]) + A_share * prev_sketch[0] + B_share ] return cast( tuple[Poplar1PrepState, FieldVec], ( ( b'reveal sketch', level, prep_mem, ), sketch_share, ) ) elif step == b'reveal sketch': Barnes, et al. Expires 7 April 2025 [Page 112] InternetDraft VDAF October 2024 if prev_sketch is None: return prep_mem # Output shares else: raise ValueError('invalid prep message') raise ValueError('invalid prep state') def prep_shares_to_prep( self, _ctx: bytes, agg_param: Poplar1AggParam, prep_shares: list[FieldVec]) > Optional[FieldVec]: if len(prep_shares) != 2: raise ValueError('incorrect number of prep shares') (level, _) = agg_param field = self.idpf.current_field(level) sketch = vec_add( cast(list[Field], prep_shares[0]), cast(list[Field], prep_shares[1]), ) if len(sketch) == 3: return cast(FieldVec, sketch) elif len(sketch) == 1: if sketch == field.zeros(1): # In order to reduce communication overhead, let `None` # denote a successful sketch verification. return None else: raise ValueError('sketch verification failed') else: raise ValueError('incorrect sketch length') Figure 27: Preparation state for Poplar1. 8.2.3. Validity of Aggregation Parameters Aggregation parameters are valid for a given input share if no aggregation parameter with the same level has been used with the same input share before. The whole preparation phase MUST NOT be run more than once for a given combination of input share and level. This function checks that levels are increasing between calls, and also enforces that the prefixes at each level are suffixes of the previous level's prefixes. Barnes, et al. Expires 7 April 2025 [Page 113] InternetDraft VDAF October 2024 def get_ancestor( index: tuple[bool, ...], level: int) > tuple[bool, ...]: """ Helper function to determine the prefix of `index` at `level`. """ return index[:level + 1] def is_valid( self, agg_param: Poplar1AggParam, previous_agg_params: list[Poplar1AggParam]) > bool: """ Checks that levels are increasing between calls, and also enforces that the prefixes at each level are suffixes of the previous level's prefixes. """ if len(previous_agg_params) < 1: return True (level, prefixes) = agg_param (last_level, last_prefixes) = previous_agg_params[1] last_prefixes_set = set(last_prefixes) # Check that level increased. if level <= last_level: return False # Check that prefixes are suffixes of the last level's prefixes. for prefix in prefixes: last_prefix = get_ancestor(prefix, last_level) if last_prefix not in last_prefixes_set: # Current prefix not a suffix of last level's prefixes. return False return True Figure 28: Validity of aggregation parameters for Poplar1. 8.2.4. Aggregation Aggregation involves simply adding up the output shares. Barnes, et al. Expires 7 April 2025 [Page 114] InternetDraft VDAF October 2024 def aggregate( self, agg_param: Poplar1AggParam, out_shares: list[FieldVec]) > FieldVec: (level, prefixes) = agg_param field = self.idpf.current_field(level) agg_share = cast(list[Field], field.zeros(len(prefixes))) for out_share in out_shares: agg_share = vec_add(agg_share, cast(list[Field], out_share)) return cast(FieldVec, agg_share) Figure 29: Aggregation algorithm for Poplar1. 8.2.5. Unsharding Finally, the Collector unshards the aggregate result by adding up the aggregate shares. def unshard( self, agg_param: Poplar1AggParam, agg_shares: list[FieldVec], _num_measurements: int) > list[int]: (level, prefixes) = agg_param field = self.idpf.current_field(level) agg = cast(list[Field], field.zeros(len(prefixes))) for agg_share in agg_shares: agg = vec_add(agg, cast(list[Field], agg_share)) return [x.as_unsigned() for x in agg] Figure 30: Computation of the aggregate result for Poplar1. 8.2.6. Message Serialization This section defines serialization formats for messages exchanged over the network while executing Poplar1. It is RECOMMENDED that implementations provide serialization methods for them. Message structures are defined following Section 3 of [RFC8446]). In the remainder we use Fi as an alias for poplar1.idpf.field_inner.ENCODED_SIZE, Fl as an alias for poplar1.idpf.field_leaf.ENCODED_SIZE, and B as an alias for poplar1.idpf.BITS. Elements of the inner field are encoded in littleendian byte order (as defined in Section 6.1) and are represented as follows: opaque Poplar1FieldInner[Fi]; Barnes, et al. Expires 7 April 2025 [Page 115] InternetDraft VDAF October 2024 Likewise, elements of the leaf field are encoded in littleendian byte order (as defined in Section 6.1) and are represented as follows: opaque Poplar1FieldLeaf[Fl]; 8.2.6.1. Public Share The public share of the IDPF scheme in Section 8.3 consists of a sequence of "correction words". A correction word has three components: 1. the XOF seed of type bytes; 2. the control bits of type tuple[Field2, Field2]; and 3. the payload of type list[Field64] for the first BITS1 words and list[Field255] for the last word. The encoding is a straightforward structure of arrays, except that the control bits are packed as tightly as possible. The encoded public share is structured as follows: struct { opaque packed_control_bits[packed_len]; opaque seed[poplar1.idpf.KEY_SIZE*B]; Poplar1FieldInner payload_inner[Fi*poplar1.idpf.VALUE_LEN*(B1)]; Poplar1FieldLeaf payload_leaf[Fl*poplar1.idpf.VALUE_LEN]; } Poplar1PublicShare; Here packed_len = (2*B + 7) // 8 is the length of the packed control bits. Field packed_control_bits is encoded with the following function: packed_control_buf = [int(0)] * packed_len for i, bit in enumerate(control_bits): packed_control_buf[i // 8] = bit.as_unsigned() << (i % 8) packed_control_bits = bytes(packed_control_buf) It encodes each group of eight bits into a byte, in LSB to MSB order, padding the most significant bits of the last byte with zeros as necessary, and returns the byte array. Decoding performs the reverse operation: it takes in a byte array and a number of bits, and returns a list of bits, extracting eight bits from each byte in turn, in LSB to MSB order, and stopping after the requested number of bits. If the byte array has an incorrect length, or if unused bits in the last bytes are not zero, it throws an error: Barnes, et al. Expires 7 April 2025 [Page 116] InternetDraft VDAF October 2024 control_bits = [] for i in range(length): control_bits.append(Field2( (packed_control_bits[i // 8] >> (i % 8)) & 1 )) leftover_bits = packed_control_bits[1] >> ( (length + 7) % 8 + 1 ) if (length + 7) // 8 != len(packed_control_bits) or \ leftover_bits != 0: raise ValueError('trailing bits') 8.2.6.2. Input Share Each input share is structured as follows: struct { opaque idpf_key[poplar1.idpf.KEY_SIZE]; opaque corr_seed[poplar1.xof.SEED_SIZE]; Poplar1FieldInner corr_inner[Fi * 2 * (poplar1.idpf.BITS  1)]; Poplar1FieldLeaf corr_leaf[Fl * 2]; } Poplar1InputShare; 8.2.6.3. Prep Share Encoding of the prep share depends on the round of sketching: if the first round, then each sketch share has three field elements; if the second round, then each sketch share has one field element. The field that is used depends on the level of the IDPF tree specified by the aggregation parameter, either the inner field or the leaf field. For the first round and inner field: struct { Poplar1FieldInner sketch_share[Fi * 3]; } Poplar1PrepShareRoundOneInner; For the first round and leaf field: struct { Poplar1FieldLeaf sketch_share[Fl * 3]; } Poplar1PrepShareRoundOneLeaf; For the second round and inner field: struct { Poplar1FieldInner sketch_share; } Poplar1PrepShareRoundTwoInner; Barnes, et al. Expires 7 April 2025 [Page 117] InternetDraft VDAF October 2024 For the second round and leaf field: struct { Poplar1FieldLeaf sketch_share; } Poplar1PrepShareRoundTwoLeaf; 8.2.6.4. Prep Message Likewise, the structure of the prep message for Poplar1 depends on the sketching round and field. For the first round and inner field: struct { Poplar1FieldInner[Fi * 3]; } Poplar1PrepMessageRoundOneInner; For the first round and leaf field: struct { Poplar1FieldLeaf sketch[Fl * 3]; } Poplar1PrepMessageRoundOneLeaf; Note that these messages have the same structures as the prep shares for the first round. The secondround prep message is the empty string. This is because the sketch shares are expected to sum to a particular value if the output shares are valid; we represent a successful preparation with the empty string and otherwise return an error. 8.2.6.5. Aggregate Share The encoding of the aggregate share depends on whether the inner or leaf field is used, and the number of candidate prefixes. Both of these are determined by the aggregation parameter. Let prefix_count denote the number of candidate prefixes. For the inner field: struct { Poplar1FieldInner agg_share[Fi * prefix_count]; } Poplar1AggShareInner; For the leaf field: struct { Poplar1FieldLeaf agg_share[Fl * prefix_count]; } Poplar1AggShareLeaf; Barnes, et al. Expires 7 April 2025 [Page 118] InternetDraft VDAF October 2024 8.2.6.6. Aggregation Parameter The aggregation parameter is encoded as follows: struct { uint16_t level; uint32_t num_prefixes; opaque encoded_prefixes[prefixes_len]; } Poplar1AggParam; The fields in this struct are: level, the level of the IDPF tree of each prefixes; num_prefixes, the number of prefixes to evaluate; and encoded_prefixes, the sequence of prefixes encoded into a byte string of length prefixes_len. The prefixes are encoded with the following procedure: prefixes_len = ((level + 1) + 7) // 8 * len(prefixes) encoded_prefixes = bytearray() for prefix in prefixes: for chunk in itertools.batched(prefix, 8): byte_out = 0 for (bit_position, bit) in enumerate(chunk): byte_out = bit << (7  bit_position) encoded_prefixes.append(byte_out) Decoding involves the following procedure: prefixes = [] bytes_per_prefix = ((level + 1) + 7) // 8 for chunk in itertools.batched(encoded_prefixes, bytes_per_prefix): prefix = [] for i in range(level + 1): byte_index = i // 8 bit_offset = 7  (i % 8) bit = (chunk[byte_index] >> bit_offset) & 1 != 0 prefix.append(bit) prefixes.append(tuple(prefix)) Implementation note: the aggregation parameter includes the level of the IDPF tree and the sequence of indices to evaluate. For implementations that perform perreport caching across executions of the VDAF, this may be more information than is strictly needed. In particular, it may be sufficient to convey which indices from the previous execution will have their children included in the next. This would help reduce communication overhead. Barnes, et al. Expires 7 April 2025 [Page 119] InternetDraft VDAF October 2024 8.3. The IDPF scheme of [BBCGGI21] In this section we specify a concrete IDPF suitable for instantiating Poplar1. The scheme gets its name from the name of the protocol of [BBCGGI21]. The constant and type definitions required by the Idpf interface are given in Table 22. Our IDPF requires an XOF for deriving the output shares, as well as a variety of other artifacts used internally. For performance reasons, we instantiate this object using XofFixedKeyAes128 (Section 6.2.2) wherever possible. See Section 9.5 for more information. +============+======================+  Parameter  Value  +============+======================+  SHARES  2  +++  BITS  any positive integer  +++  VALUE_LEN  any positive integer  +++  KEY_SIZE  Xof.SEED_SIZE  +++  FieldInner  Field64 (Table 3)  +++  FieldLeaf  Field255 (Table 3)  +++ Table 22: Constants and type definitions for the IDPF of BBCGGI21. 8.3.1. Key Generation TODO Describe the construction in prose, beginning with a gentle introduction to the high level idea. The description of the IDPFkey generation algorithm makes use of auxiliary functions extend() and convert() defined in Section 8.3.3. In the following, we let Field2 denote the field GF(2). Barnes, et al. Expires 7 April 2025 [Page 120] InternetDraft VDAF October 2024 def gen( self, alpha: tuple[bool, ...], beta_inner: list[list[Field64]], beta_leaf: list[Field255], ctx: bytes, nonce: bytes, rand: bytes) > tuple[list[CorrectionWord], list[bytes]]: if len(alpha) != self.BITS: raise ValueError("incorrect alpha length") if len(beta_inner) != self.BITS  1: raise ValueError("incorrect beta_inner length") if len(rand) != self.RAND_SIZE: raise ValueError("incorrect rand size") if len(nonce) != self.NONCE_SIZE: raise ValueError("incorrect nonce size") key = [ rand[:XofFixedKeyAes128.SEED_SIZE], rand[XofFixedKeyAes128.SEED_SIZE:], ] seed = key.copy() ctrl = [Field2(0), Field2(1)] public_share = [] for level in range(self.BITS): keep = int(alpha[level]) lose = 1  keep bit = Field2(keep) (s0, t0) = self.extend(level, seed[0], ctx, nonce) (s1, t1) = self.extend(level, seed[1], ctx, nonce) seed_cw = xor(s0[lose], s1[lose]) ctrl_cw = ( t0[0] + t1[0] + bit + Field2(1), t0[1] + t1[1] + bit, ) # Implementation note: these conditional XORs and # inputdependent array indices should be replaced with # constanttime selects in practice in order to reduce # leakage via timing side channels. if ctrl[0].as_unsigned(): x0 = xor(s0[keep], seed_cw) ctrl[0] = t0[keep] + ctrl_cw[keep] else: x0 = s0[keep] ctrl[0] = t0[keep] Barnes, et al. Expires 7 April 2025 [Page 121] InternetDraft VDAF October 2024 if ctrl[1].as_unsigned(): x1 = xor(s1[keep], seed_cw) ctrl[1] = t1[keep] + ctrl_cw[keep] else: x1 = s1[keep] ctrl[1] = t1[keep] (seed[0], w0) = self.convert(level, x0, ctx, nonce) (seed[1], w1) = self.convert(level, x1, ctx, nonce) if level < self.BITS  1: b = cast(list[Field], beta_inner[level]) else: b = cast(list[Field], beta_leaf) if len(b) != self.VALUE_LEN: raise ValueError( "length of beta must match the value length" ) w_cw = vec_add(vec_sub(b, w0), w1) # Implementation note: this conditional negation should be # replaced with a constant time select or a constant time # multiplication in practice in order to reduce leakage via # timing side channels. if ctrl[1].as_unsigned(): for i in range(len(w_cw)): w_cw[i] = w_cw[i] public_share.append((seed_cw, ctrl_cw, w_cw)) return (public_share, key) Figure 31: IDPFkey generation algorithm of BBCGGI21. 8.3.2. Key Evaluation TODO Describe in prose how IDPFkey evaluation algorithm works. The description of the IDPFevaluation algorithm makes use of auxiliary functions extend() and convert() defined in Section 8.3.3. def eval( self, agg_id: int, public_share: list[CorrectionWord], key: bytes, level: int, prefixes: Sequence[tuple[bool, ...]], ctx: bytes, nonce: bytes) > list[list[Field64]]  list[list[Field255]]: Barnes, et al. Expires 7 April 2025 [Page 122] InternetDraft VDAF October 2024 if agg_id not in range(self.SHARES): raise ValueError('aggregator id out of range') if level not in range(self.BITS): raise ValueError('level out of range') if len(set(prefixes)) != len(prefixes): raise ValueError('prefixes must be unique') out_share = [] for prefix in prefixes: if len(prefix) != level + 1: raise ValueError('incorrect prefix length') # The Aggregator's output share is the value of a node of # the IDPF tree at the given `level`. The node's value is # computed by traversing the path defined by the candidate # `prefix`. Each node in the tree is represented by a seed # (`seed`) and a control bit (`ctrl`). seed = key ctrl = Field2(agg_id) y: FieldVec for current_level in range(level + 1): bit = int(prefix[current_level]) # Implementation note: typically the current round of # candidate prefixes would have been derived from # aggregate results computed during previous rounds. # For example, when using the IDPF to compute heavy # hitters, a string whose hit count exceeded the # given threshold in the last round would be the # prefix of each `prefix` in the current round. (See # [BBCGGI21, Section 5.1].) In this case, part of the # path would have already been traversed. # # Recomputing nodes along previously traversed paths is # wasteful. Implementations can eliminate this added # complexity by caching nodes (i.e., `(seed, ctrl)` # pairs) output by previous calls to `eval_next()`. (seed, ctrl, y) = self.eval_next( seed, ctrl, public_share[current_level], current_level, bit, ctx, nonce, ) if agg_id == 0: out_share.append(cast(list[Field], y)) Barnes, et al. Expires 7 April 2025 [Page 123] InternetDraft VDAF October 2024 else: out_share.append(vec_neg(cast(list[Field], y))) return cast( list[list[Field64]]  list[list[Field255]], out_share, ) def eval_next( self, prev_seed: bytes, prev_ctrl: Field2, correction_word: CorrectionWord, level: int, bit: int, ctx: bytes, nonce: bytes) > tuple[bytes, Field2, FieldVec]: """ Compute the next node in the IDPF tree along the path determined by a candidate prefix. The next node is determined by `bit`, the bit of the prefix corresponding to the next level of the tree. """ seed_cw = correction_word[0] ctrl_cw = correction_word[1] w_cw = cast(list[Field], correction_word[2]) (s, t) = self.extend(level, prev_seed, ctx, nonce) # Implementation note: these conditional operations and # inputdependent array indices should be replaced with # constanttime selects in practice in order to reduce leakage # via timing side channels. if prev_ctrl.as_unsigned(): s[0] = xor(s[0], seed_cw) s[1] = xor(s[1], seed_cw) t[0] += ctrl_cw[0] t[1] += ctrl_cw[1] next_ctrl = t[bit] convert_output = self.convert(level, s[bit], ctx, nonce) next_seed = convert_output[0] y = cast(list[Field], convert_output[1]) # Implementation note: this conditional addition should be # replaced with a constanttime select in practice in order to # reduce leakage via timing side channels. if next_ctrl.as_unsigned(): for i in range(len(y)): y[i] += w_cw[i] Barnes, et al. Expires 7 April 2025 [Page 124] InternetDraft VDAF October 2024 return (next_seed, next_ctrl, cast(FieldVec, y)) Figure 32: IDPFevaluation generation algorithm of BBCGGI21. 8.3.3. Auxiliary Functions def extend( self, level: int, seed: bytes, ctx: bytes, nonce: bytes) > tuple[list[bytes], list[Field2]]: xof = self.current_xof(level, seed, format_dst(1, 0, 0) + ctx, nonce) s = [ bytearray(xof.next(self.KEY_SIZE)), bytearray(xof.next(self.KEY_SIZE)), ] # Use the least significant bits as the control bit correction, # and then zero it out. This gives effectively 127 bits of # security, but reduces the number of AES calls needed by 1/3. t = [Field2(s[0][0] & 1), Field2(s[1][0] & 1)] s[0][0] &= 0xFE s[1][0] &= 0xFE return ([bytes(s[0]), bytes(s[1])], t) def convert( self, level: int, seed: bytes, ctx: bytes, nonce: bytes) > tuple[bytes, FieldVec]: xof = self.current_xof(level, seed, format_dst(1, 0, 1) + ctx, nonce) next_seed = xof.next(self.KEY_SIZE) field = self.current_field(level) w = xof.next_vec(field, self.VALUE_LEN) return (next_seed, cast(FieldVec, w)) def current_xof(self, level: int, seed: bytes, dst: bytes, nonce: bytes) > Xof: if level < self.BITS1: return XofFixedKeyAes128(seed, dst, nonce) return XofTurboShake128(seed, dst, nonce) Figure 33: Helper functions for the IDPF. Barnes, et al. Expires 7 April 2025 [Page 125] InternetDraft VDAF October 2024 8.4. Instantiation By default, Poplar1 is instantiated with the IDPF in Section 8.3 (VALUE_LEN == 2) and XofTurboShake128 (Section 6.2.1). This VDAF is suitable for any positive value of BITS. Test vectors can be found in Appendix "Test Vectors". 9. Security Considerations VDAFs (Section 5) have two essential security goals: 1. Privacy: an attacker that controls the Collector and a subset of Clients and Aggregators learns nothing about the measurements of honest Clients beyond what it can deduce from the aggregate result. We assume the attacker controls the entire network except for channels between honest Clients and honest Aggregators. In particular, it cannot forge or prevent transmission of messages on these channels. 2. Robustness: an attacker that controls a subset of Clients cannot cause the Collector to compute anything other than the aggregate of the measurements of honest Clients. We assume the attacker eavesdrops on the network but does not control transmission of messages between honest parties. Formal definitions of privacy and robustness can be found in [DPRS23]. A VDAF is the core cryptographic primitive of a protocol that achieves the above privacy and robustness goals. It is not sufficient on its own, however. The application will need to assure a few security properties, for example: * Securely distributing the longlived parameters, in particular the verification key. * Establishing secure channels:  Confidential and authentic channels among Aggregators, and between the Aggregators and the Collector; and  Confidential and Aggregatorauthenticated channels between Clients and Aggregators. * Enforcing the noncollusion properties required of the specific VDAF in use. In such an environment, a VDAF provides the highlevel privacy property described above: the Collector learns only the aggregate measurement, and nothing about individual measurements aside from Barnes, et al. Expires 7 April 2025 [Page 126] InternetDraft VDAF October 2024 what can be inferred from the aggregate result. The Aggregators learn neither individual measurements nor the aggregate result. The Collector is assured that the aggregate statistic accurately reflects the inputs as long as the Aggregators correctly executed their role in the VDAF. On their own, VDAFs do not provide: 1. Mitigation of Sybil attacks [Dou02]. In this attack, the adversary observes a subset of input shares transmitted by a Client it is interested in. It allows the input shares to be processed, but corrupts and picks bogus measurements for the remaining Clients. Applications can guard against these risks by adding additional controls on report submission, such as Client authentication and rate limits. 2. Differential privacy [Dwo06]. Depending on the distribution of the measurements, the aggregate result itself can still leak a significant amount of information about an individual measurement or the person that generated it. 3. Robustness in the presence of a malicious Aggregator. An Aggregator can, without detection, manipulate the aggregate result by modifying its own aggregate share. 4. Guaranteed output delivery [GSZ20]. An attacker that controls transmission of messages between honest parties can prevent computation of the aggregate result by dropping messages. 9.1. Requirements for the Verification Key The Aggregators are responsible for exchanging the verification key in advance of executing the VDAF. Any procedure is acceptable as long as the following conditions are met: 1. To ensure robustness of the computation, the Aggregators MUST NOT reveal the verification key to the Clients. Otherwise, a malicious Client might be able to exploit knowledge of this key to craft an invalid report that would be accepted by the Aggregators. 2. To ensure privacy of the measurements, the Aggregators MUST commit to the verification key prior to processing reports generated by Clients. Otherwise, a malicious Aggregator may be able to craft a verification key that, for a given report, causes an honest Aggregator to leak information about the measurement during preparation. Barnes, et al. Expires 7 April 2025 [Page 127] InternetDraft VDAF October 2024 Meeting these conditions is required in order to leverage security analysis in the framework of [DPRS23]. Their definition of robustness allows the attacker, playing the role of a cohort of malicious Clients, to submit arbitrary reports to the Aggregators and eavesdrop on their communications as they process them. Security in this model is achievable as long as the verification key is kept secret from the attacker. The privacy definition of [DPRS23] considers an active attacker that controls the network and a subset of Aggregators; in addition, the attacker is allowed to choose the verification key used by each honest Aggregator over the course of the experiment. Security is achievable in this model as long as the key is picked at the start of the experiment, prior to any reports being generated. (The model also requires nonces to be generated at random; see Section 9.2 below.) Meeting these requirements is relatively straightforward. For example, the Aggregators may designate one of their peers to generate the verification key and distribute it to the others. To assure Clients of key commitment, the Clients and (honest) Aggregators could bind reports to a shared context string derived from the key. For instance, the "task ID" of DAP [DAP] could be set to the hash of the verification key; then as long as honest Aggregators only consume reports for the task indicated by the Client, forging a new key after the fact would reduce to finding collisions in the underlying hash function. (Keeping the key secret from the Clients would require the hash function to be oneway.) However, since rotating the key implies rotating the task ID, this scheme would not allow key rotation over the lifetime of a task. 9.2. Requirements for the Nonce The sharding and preparation steps of VDAF execution depend on a nonce associated with the Client's report. To ensure privacy of the underlying measurement, the Client MUST generate this nonce using a CSPRNG. This is required in order to leverage security analysis for the privacy definition of [DPRS23], which assumes the nonce is chosen at random prior to generating the report. Other security considerations may require the nonce to be non repeating. For example, to achieve differential privacy it is necessary to avoid "over exposing" a report by including it too many times in a single batch or across multiple batches. It is RECOMMENDED that the nonce generated by the Client be used by the Aggregators for replay protection. Barnes, et al. Expires 7 April 2025 [Page 128] InternetDraft VDAF October 2024 9.3. Requirements for the Public Share The Aggregators MUST ensure they have both received the same public share from the Client. It is sufficient, for example, to exchange a hash of the public share over a secure channel. 9.4. Requirements for Aggregation Parameters As described in Section 4.3 and Section 5.3 respectively, DAFs and VDAFs may impose restrictions on the reuse of input shares. This is to ensure that correlated randomness provided by the Client through the input share is not used more than once, which might compromise confidentiality of the Client's measurements. Protocols that make use of VDAFs therefore MUST call vdaf.is_valid on the set of all aggregation parameters used for a Client's input share, and only proceed with the preparation and aggregation phases if that function call returns True. 9.4.1. Additional Privacy Considerations Aggregating a batch of reports multiple times, each time with a different aggregation parameter, could result in information leakage beyond what is used by the application. For example, when Poplar1 is used for heavy hitters, the Aggregators learn not only the heavy hitters themselves, but also the prefix tree (as defined in Section 8) computed along the way. Indeed, this leakage is inherent to any construction that uses an IDPF (Section 8.1) in the same way. Depending on the distribution of the measurements, the prefix tree can leak a significant amount of information about unpopular inputs. For instance, it is possible (though perhaps unlikely) for a large set of nonheavyhitter values to share a common prefix, which would be leaked by a prefix tree with a sufficiently small threshold. A malicious adversary controlling the Collector and one of the Aggregators can further turn arbitrary nonheavy prefixes into heavy ones by tampering with the IDPF output at any position. While our construction ensures that the nodes evaluated at one level are children of the nodes evaluated at the previous level, this still may allow an adversary to discover individual nonheavy strings. The only practical, generalpurpose approach to mitigating these leakages is via differential privacy, which is RECOMMENDED for all protocols using Poplar1 for heavyhitter type applications. Barnes, et al. Expires 7 April 2025 [Page 129] InternetDraft VDAF October 2024 9.4.2. Safe Usage of IDPF Outputs The arithmetic sketch described in Section 8 is used by the Aggregators to check that the shares of the vector obtained by evaluating a Client's IDPF at a sequence of candidate prefixes has at most one nonzero value, and that the nonzero value is 1. Depending on how the values are used, the arithmetic sketch on its own may not be sufficient for robustness of the application. In particular, a malicious Client may attempt to influence the computation by choosing an IDPF that evaluates to 1 at more than one node at a given level of the tree. This issue can be mitigated by using an IDPF that is extractable as defined in in Appendix D of [BBCGGI21]. Extractability ensures that, for a particular level of the tree, it is infeasible for an attacker to control values of the IDPF such that it takes on chosen nonzero values at more than one node. (It can practically only achieve the zero function, a point function, or a pseudorandom function.) The IDPF specified in Section 8.1 only guarantees extractability at the last level of the tree. (This is by virtue of using a larger field for the leaves than for inner nodes and using an XOF to derive leaves that is safe to model as a random oracle (see Section 9.5).) For intermediate levels, it is feasible for a client to produce IDPF shares with two controlled nonzero nodes. This is not an issue for running heavy hitters, since (1) each node in the prefix tree is a child of a previously traversed node, (2) the arithmetic sketch would detect double voting at every level of the prefix tree, and (3) the IDPF is extractable at the last level of the tree. However, the lack of extractability at intermediate levels may result in attacks on the robustness of certain applications. Thus applications SHOULD NOT use prefix counts for intermediate levels for any purpose beyond the heavyhitters tree traversal. 9.5. Requirements for XOFs As described in Section 6.2, our constructions rely on eXtendable Output Functions (XOFs). In the security analyses of our protocols, these are usually modeled as random oracles. XofTurboShake128 is designed to be indifferentiable from a random oracle [MRH04], making it a suitable choice for most situations. The one exception is the IDPF of Section 8.3. Here, a random oracle is not needed to prove privacy, since the analysis of [BBCGGI21], Proposition 1, only requires a Pseudorandom Generator (PRG). As observed in [GKWY20], a PRG can be instantiated from a correlation Barnes, et al. Expires 7 April 2025 [Page 130] InternetDraft VDAF October 2024 robust hash function H. Informally, correlation robustness requires that for a random r, H(xor(r, x)) is computationally indistinguishable from a random function of x. A PRG can therefore be constructed as PRG(r) = H(xor(r, 1))  H(xor(r, 2))  ... since each individual hash function evaluation is indistinguishable from a random function. Our construction at Section 6.2.2 implements a correlationrobust hash function using fixedkey AES. For security, it assumes that AES with a fixed key can be modeled as a random permutation [GKWY20]. Additionally, we use a different AES key for every client, which in the ideal cipher model leads to better concrete security [GKWWY20]. We note that for robustness, the analysis of [BBCGGI21] still assumes a random oracle to make the IDPF extractable. We therefore use XofTurboShake128 instead for the last level of the tree. It is important that XofTurboShake128 supports 16 byte seeds, as this is the seed size for the inner levels. While XofFixedKeyAes128 has been shown to be differentiable from a random oracle [GKWWY20], there are no known attacks exploiting this difference. We also stress that even if the IDPF is not extractable, Poplar1 guarantees that every client can contribute to at most one prefix among the ones being evaluated by the helpers. 9.6. Choosing FLP Parameters Prio3 and other systems built from FLPs (Section 7.3 in particular) may benefit from choosing a field size that is as small as possible. Generally speaking, a smaller field results in lower communication and storage costs. Care must be taken, however, since a smaller field also results in degraded (or even vacuous) robustness. Different variants of Prio3 (Section 7) use different field sizes: Prio3Count and Prio3Sum use Field64; but Prio3SumVec, Prio3Histogram, and Prio3MultihotCountVec all use Field128, a field that is twice as large as Field64. This is due to the use of joint randomness (Section 7.1) in the latter variants. Joint randomness allows for more flexible circuit design (see Section 7.3.1.1), but opens up Prio3 to offline attacks in which the attacker searches for input shares for an invalid measurement that derive joint randomness that causes the circuit to accept. Choosing a large enough field ensures this computation is too expensive to be feasible. (See [DPRS23], Theorem 1.) Note that privacy is not susceptible to such attacks. Barnes, et al. Expires 7 April 2025 [Page 131] InternetDraft VDAF October 2024 Another way to mitigate this issue (or improve robustness in general) is to generate and verify multiple, independent proofs. (See Section 7.1.2.) For Prio3, the PROOFS parameter controls the number of proofs (at least one) that are generated and verified. In general the soundness error of the FLP is given by the following formula: (circuit_soundness + flp_soundness) ** PROOFS where: * circuit_soundness is the soundness of the validity circuit (Section 7.3.2) * flp_soundness is the base soundness of the proof system ([BBCGGI19], Theorem 4.3) For circuits involving joint randomness, we aim for the soundness error to be close to 2^128 in order to mitigate offline attacks. Such circuits MUST use Field128 with at least one proof or Field64 with at least three proofs. Depending on the circuit, Field64 with two proofs might have significantly lower soundness than Field128 with one proof. We stress that weak parameters (too small a field, too few proofs, or both) can be exploited to attack any aggregation task using those parameters. To mitigate offline attacks, it is necessary to disable all tasks that use the weak parameters. 9.7. Choosing the Number of Aggregators Two Aggregators are required for privacy in our threat model, but some (V)DAFs, including Prio3 (Section 7), allow for any number of Aggregators, only one of which needs to be trusted in order for the computation to be private. To hedge against corruptions that happen during the course of the attack, deployments may consider involving more than two Aggregators as described for example in Section 5.9. Note however that some schemes are not compatible with this mode of operation, such as Poplar1. 9.8. DefenseinDepth Measures Prio3 and Poplar1 are designed to resist some attacks that fall outside the main threat model for VDAFs. Broadly speaking, domain separation is used to prevent cross protocol attacks, in which data from evaluation of one VDAF translates to an attack against another. For example: Barnes, et al. Expires 7 April 2025 [Page 132] InternetDraft VDAF October 2024 1. Weak entropy sources: the VDAF algorithm ID is bound to each XOF invocation, thereby ensuring the outputs are different between VDAF invocations, even if the underlying randomness is the same. For example, two different instances of Prio3 would compute different measurement shares. 2. Weak parameters: Prio3 variants that require joint randomness are subject to offline attacks against robustness. These attacks are feasible if the field size or number of proofs is sufficiently small. (See Section 9.6.) The joint randomness derivation is bound to both the field (via the algorithm ID) and the number of proofs, thereby ensuring that joint randomness derived for weak parameters is not reused for stronger parameters. In addition, the joint randomness is bound to the application context, meaning any work the attacker does to attack some application is not useful for other applications that use the same parameters. There are also some important limitations to be aware of. For example, Prio3 provides domain separation between families of circuits, but does not provide domain separation between instances of a circuit. Concretely, it is possible for Aggregators to accept a report for Prio3SumVec from a Client who disagrees with them on the value of bits and length. This is because there is no binding of the circuit parameters to the computation. 10. IANA Considerations IANA is requested to make one new registry: * DAF and VDAF Identifiers This registry should be created under the heading "Verifiable Distributed Aggregation Functions (VDAF)", and administered under the Specification Required policy [RFC8126]. The "VDAF Identifiers" registry lists identifiers for Distributed Aggregation Functions (DAFs) and Verifiable Distributed Aggregation Functions (VDAFs). These identifiers are fourbyte values, so the minimum possible value is 0x00000000 and the maximum possible value is 0xffffffff. Template: * Value: The fourbyte identifier for the DAF or VDAF * Scheme: The name of the DAF or VDAF Barnes, et al. Expires 7 April 2025 [Page 133] InternetDraft VDAF October 2024 * Type: Either "DAF" for a Distributed Aggregation Function or "VDAF" for a Verifiable Distributed Aggregation Function * Reference: Where the algorithm is defined The initial contents of the registry are as follows: +===============+=======================+======+===============+  Value  Scheme  Type  Reference  +===============+=======================+======+===============+  0x00000000  Reserved  n/a  RFC XXXX  +++++  0x00000001  Prio3Count  VDAF  Section 7.4.1      of RFC XXXX  +++++  0x00000002  Prio3Sum  VDAF  Section 7.4.2      of RFC XXXX  +++++  0x00000003  Prio3SumVec  VDAF  Section 7.4.3      of RFC XXXX  +++++  0x00000004  Prio3Histogram  VDAF  Section 7.4.4      of RFC XXXX  +++++  0x00000005  Prio3MultihotCountVec  VDAF  Section 7.4.5      of RFC XXXX  +++++  0x00000006  Poplar1  VDAF  Section 8.4      of RFC XXXX  +++++  0xFFFF0000 to  Reserved for Private  n/a  n/a   0xFFFFFFFF  Use    +++++ Table 23: Verifiable Distributed Aggregation Function Identifiers Registry (RFC EDITOR: Please replace "RFC XXXX" above with the RFC number assigned to this document.) VDAF identifiers are used for domain separation, as described in Section 6.2.3. Domain separation guards against failures of entropy sources, by ensuring that invocations of different VDAFs use different derived values, even if they are invoked with the same underlying random data. Barnes, et al. Expires 7 April 2025 [Page 134] InternetDraft VDAF October 2024 The benefits of domain separation are undermined if different VDAFs are used with the same VDAF Identifier. The "Reserved for Private Use" code points should thus be used judiciously, because they provide no defense against such collisions. Applications SHOULD prefer the use of registered code points. 11. References 11.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfceditor.org/rfc/rfc2119>. [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126, DOI 10.17487/RFC8126, June 2017, <https://www.rfceditor.org/rfc/rfc8126>. [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfceditor.org/rfc/rfc8174>. [RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, <https://www.rfceditor.org/rfc/rfc8446>. [TurboSHAKE] Viguier, B., Wong, D., Van Assche, G., Dang, Q., and J. Daemen, "KangarooTwelve and TurboSHAKE", Work in Progress, InternetDraft, draftirtfcfrgkangarootwelve14, 9 May 2024, <https://datatracker.ietf.org/doc/html/draftirtf cfrgkangarootwelve14>. 11.2. Informative References [AGJOP21] Addanki, S., Garbe, K., Jaffe, E., Ostrovsky, R., and A. Polychroniadou, "Prio+: Privacy Preserving Aggregate Statistics via Boolean Shares", 2021, <https://ia.cr/2021/576>. [BBCGGI19] Boneh, D., Boyle, E., CorriganGibbs, H., Gilboa, N., and Y. Ishai, "ZeroKnowledge Proofs on SecretShared Data via Fully Linear PCPs", CRYPTO 2019 , 2019, <https://ia.cr/2019/188>. Barnes, et al. Expires 7 April 2025 [Page 135] InternetDraft VDAF October 2024 [BBCGGI21] Boneh, D., Boyle, E., CorriganGibbs, H., Gilboa, N., and Y. Ishai, "Lightweight Techniques for Private Heavy Hitters", IEEE S&P 2021 , 2021, <https://ia.cr/2021/017>. [BBCGGI23] Boneh, D., Boyle, E., CorriganGibbs, H., Gilboa, N., and Y. Ishai, "Arithmetic Sketching", CRYPTO 2023 , 2023, <https://ia.cr/2023/1012>. [BGI15] Boyle, E., Gilboa, N., and Y. Ishai, "Function Secret Sharing", EUROCRYPT 2015 , 2015, <https://www.iacr.org/archive/ eurocrypt2015/90560300/90560300.pdf>. [CGB17] CorriganGibbs, H. and D. Boneh, "Prio: Private, Robust, and Scalable Computation of Aggregate Statistics", NSDI 2017 , 2017, <https://dl.acm.org/doi/10.5555/3154630.3154652>. [DAP] Geoghegan, T., Patton, C., Pitman, B., Rescorla, E., and C. A. Wood, "Distributed Aggregation Protocol for Privacy Preserving Measurement", Work in Progress, InternetDraft, draftietfppmdap11, 21 May 2024, <https://datatracker.ietf.org/doc/html/draftietfppmdap 11>. [Dou02] Douceur, J., "The Sybil Attack", IPTPS 2002 , 2002, <https://doi.org/10.1007/3540457488_24>. [DPRS23] Davis, H., Patton, C., Rosulek, M., and P. Schoppmann, "Verifiable Distributed Aggregation Functions", PETS 2023 , 2023, <https://ia.cr/2023/130>. [Dwo06] Dwork, C., "Differential Privacy", ICALP 2006 , 2006, <https://link.springer.com/chapter/10.1007/11787006_1>. [ENPA] "Exposure Notification Privacypreserving Analytics (ENPA) White Paper", 2021, <https://covid19static.cdn apple.com/applications/covid19/current/static/contact tracing/pdf/ENPA_White_Paper.pdf>. [EPK14] Erlingsson, Ú., Pihur, V., and A. Korolova, "RAPPOR: Randomized Aggregatable PrivacyPreserving Ordinal Response", CCS 2014 , 2014, <https://dl.acm.org/doi/10.1145/2660267.2660348>. Barnes, et al. Expires 7 April 2025 [Page 136] InternetDraft VDAF October 2024 [GI14] Gilboa, N. and Y. Ishai, "Distributed Point Functions and Their Applications", EUROCRYPT 2014 , 2014, <https://link.springer.com/ chapter/10.1007/9783642552205_35>. [GKWWY20] Guo, C., Katz, J., Wang, X., Weng, C., and Y. Yu, "Better concrete security for halfgates garbling (in the multi instance setting)", CRYPTO 2020 , 2020, <https://link.springer.com/ chapter/10.1007/9783030568801_28>. [GKWY20] Guo, C., Katz, J., Wang, X., and Y. Yu, "Efficient and Secure Multiparty Computation from FixedKey Block Ciphers", S&P 2020 , 2020, <https://eprint.iacr.org/2019/074>. [GSZ20] Goyal, V., Song, Y., and C. Zhu, "Guaranteed Output Delivery Comes Free in Honest Majority MPC", CRYPTO 2020 , 2020, <https://link.springer.com/ chapter/10.1007/9783030568801_22>. [MPDST25] Mouris, D., Patton, C., Davis, H., Sarkar, P., and N. G. Tsoutsos, "Mastic: Private Weighted HeavyHitters and AttributeBased Metrics", PETS 2025 , 2025, <https://eprint.iacr.org/2024/221>. [MPRV09] Mironov, I., Pandey, O., Reingold, O., and S. Vadhan, "Computational Differential Privacy", CRYPTO 2009 , n.d., <https://link.springer.com/ chapter/10.1007/9783642033568_8>. [MRH04] Maurer, U., Renner, R., and C. Holenstein, "Indifferentiability, impossibility results on reductions, and applications to the random oracle methodology", In TCC 2004: Theory of Cryptography, pages 2139, DOI 10.1007/9783540246381_2, February 2004, <https://doi.org/10.1007/9783540246381_2>. [OriginTelemetry] "Origin Telemetry", 2020, <https://web.archive.org/web/20221025174046/ https://firefoxsource docs.mozilla.org/toolkit/components/telemetry/collection/ origin.html>. [SML24] Satriawan, A., Mareta, R., and H. Lee, "A Complete Beginner Guide to the Number Theoretic Transform (NTT)", 2024, <https://eprint.iacr.org/2024/585>. Barnes, et al. Expires 7 April 2025 [Page 137] InternetDraft VDAF October 2024 [TestVectors] "Test vectors for Prio3 and Poplar1", commit hash <TODO> , September 2024, <https://github.com/cfrg/draftirtfcfrg vdaf/tree/main/test_vec/vdaf>. Acknowledgments The impetus of this work is the success of recent deployments of predecessors of Prio3. The Mozilla Origin Telemetry project [OriginTelemetry] and the Exposure Notification Private Analytics collaboration among the Internet Security Research Group (ISRG), Google, Apple, and others [ENPA] have together aggregated data from hundreds of millions of users. As the name implies, Prio3 is a descendant of the original Prio construction [CGB17]. A second iteration was deployed in the [ENPA] system, and like the VDAF described here, the ENPA system was built from techniques introduced in [BBCGGI19] that significantly improve communication cost. That system was specialized for a particular aggregation function; the goal of Prio3 is to provide the same level of generality as the original construction. The security considerations in Section 9 are based largely on the security analysis of [DPRS23]. Thanks to Hannah Davis and Mike Rosulek, who lent their time to developing definitions and security proofs. Thanks to Junye Chen, Henry CorriganGibbs, Armando FazHernández, Simon Friedberger, Tim Geoghegan, Albert Liu, Brandon Pitman, Mariana Raykova, Michael Rosenberg, Jacob Rothstein, Shan Wang, Xiao Wang, Bas Westerbaan, and Christopher Wood for useful feedback on and contributions to the spec. Test Vectors TODO Update the reference in [TestVectors] by replacing <TODO> with the commit hash with the final version of the test vectors.) The test vectors are available at [TestVectors]. The directory contains a set of JSON files. Each file contains a test vector for an instance of Vdaf (Section 5). A test vector covers sharding, preparation, aggregation, and unsharding of each of several measurements. The test vector schema is defined below. Schema ctx: The application context string encoded in hexadecimal. Barnes, et al. Expires 7 April 2025 [Page 138] InternetDraft VDAF October 2024 verify_key: The verification key encoded in hexadecimal. agg_param: The aggregation parameter of type Vdaf.AggParam. prep: A list of objects with the following schema: measurement: The measurement of type Vdaf.Measurement. nonce: The nonce encoded in hexadecimal. rand: The sharding randomness encoded in hexadecimal. public_share: The expected public share encoded in hexadecimal. input_shares: The expected list of input shares, each incoded in hexadecimal. prep_shares: The expected list of prep shares generated by each Aggregator at each round of preparation, encoded in hexadecimal. prep_messages: The expected list of prep messages for each round of preparation, encoded in hexadecimal. out_shares: The expected list of output shares, encoded in hexadecimal. agg_shares: The expected aggregate shares encoded in hexadecimal. agg_result: The expected aggregate result of type Vdaf.AggResult. The schema also includes whatever parameters are required to instantiate the VDAF. These are listed in the subsections below. Prio3Count shares: The number of shares, an integer. Prio3Sum shares: The number of shares, an integer. max_measurement: The largest measurement, an integer. Each element is in range range(max_measurement+1). Barnes, et al. Expires 7 April 2025 [Page 139] InternetDraft VDAF October 2024 Prio3SumVec shares: The number of shares, an integer. length: The lengh of the vector, an integer. chunk_length: the length of each vector chunk, an integer. bits: the bit length of each element of the vector, an integer. each element is in range(2 ** bits). Prio3Histogram shares: The number of shares, an integer. length: The lengh of the vector, an integer. chunk_length: the length of each vector chunk, an integer. Prio3MultihotCountVec shares: The number of shares, an integer. length: The lengh of the vector, an integer. chunk_length: the length of each vector chunk, an integer. max_weight: The largest vector weight, an integer. The sum of the elements must be in range(max_weight+1). Poplar1 bits: The length of each input, an integer. Authors' Addresses Richard L. Barnes Cisco Email: rlb@ipv.sx David Cook ISRG Email: divergentdave@gmail.com Christopher Patton Cloudflare Barnes, et al. Expires 7 April 2025 [Page 140] InternetDraft VDAF October 2024 Email: chrispatton+ietf@gmail.com Phillipp Schoppmann Google Email: schoppmann@google.com Barnes, et al. Expires 7 April 2025 [Page 141]