Skip to main content

Verifiable Distributed Aggregation Functions
draft-irtf-cfrg-vdaf-12

Document Type Active Internet-Draft (cfrg RG)
Authors Richard Barnes , David Cook , Christopher Patton , Phillipp Schoppmann
Last updated 2024-10-04
Replaces draft-patton-cfrg-vdaf
RFC stream Internet Research Task Force (IRTF)
Intended RFC status Informational
Formats
Additional resources Mailing list discussion
Stream IRTF state Active RG Document
Consensus boilerplate Unknown
Document shepherd (None)
IESG IESG state I-D Exists
Telechat date (None)
Responsible AD (None)
Send notices to (None)
draft-irtf-cfrg-vdaf-12
CFRG                                                        R. L. Barnes
Internet-Draft                                                     Cisco
Intended status: Informational                                   D. Cook
Expires: 7 April 2025                                               ISRG
                                                               C. Patton
                                                              Cloudflare
                                                           P. Schoppmann
                                                                  Google
                                                          4 October 2024

              Verifiable Distributed Aggregation Functions
                        draft-irtf-cfrg-vdaf-12

Abstract

   This document describes Verifiable Distributed Aggregation Functions
   (VDAFs), a family of multi-party protocols for computing aggregate
   statistics over user measurements.  These protocols are designed to
   ensure that, as long as at least one aggregation server executes the
   protocol honestly, individual measurements are never seen by any
   server in the clear.  At the same time, VDAFs allow the servers to
   detect if a malicious or misconfigured client submitted an
   measurement that would result in an invalid aggregate result.

Discussion Venues

   This note is to be removed before publishing as an RFC.

   Discussion of this document takes place on the Crypto Forum Research
   Group mailing list (cfrg@ietf.org), which is archived at
   https://mailarchive.ietf.org/arch/search/?email_list=cfrg.

   Source for this draft and an issue tracker can be found at
   https://github.com/cjpatton/vdaf.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

Barnes, et al.            Expires 7 April 2025                  [Page 1]
Internet-Draft                    VDAF                      October 2024

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 7 April 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   4
     1.1.  Change Log  . . . . . . . . . . . . . . . . . . . . . . .   8
   2.  Conventions and Definitions . . . . . . . . . . . . . . . . .  15
   3.  Overview  . . . . . . . . . . . . . . . . . . . . . . . . . .  16
   4.  Definition of DAFs  . . . . . . . . . . . . . . . . . . . . .  18
     4.1.  Sharding  . . . . . . . . . . . . . . . . . . . . . . . .  20
     4.2.  Preparation . . . . . . . . . . . . . . . . . . . . . . .  22
     4.3.  Validity of Aggregation Parameters  . . . . . . . . . . .  22
     4.4.  Aggregation . . . . . . . . . . . . . . . . . . . . . . .  22
     4.5.  Unsharding  . . . . . . . . . . . . . . . . . . . . . . .  23
     4.6.  Execution of a DAF  . . . . . . . . . . . . . . . . . . .  24
   5.  Definition of VDAFs . . . . . . . . . . . . . . . . . . . . .  26
     5.1.  Sharding  . . . . . . . . . . . . . . . . . . . . . . . .  28
     5.2.  Preparation . . . . . . . . . . . . . . . . . . . . . . .  29
     5.3.  Validity of Aggregation Parameters  . . . . . . . . . . .  32
     5.4.  Aggregation . . . . . . . . . . . . . . . . . . . . . . .  32
     5.5.  Unsharding  . . . . . . . . . . . . . . . . . . . . . . .  32
     5.6.  Execution of a VDAF . . . . . . . . . . . . . . . . . . .  33
     5.7.  Communication Patterns for Preparation  . . . . . . . . .  35
     5.8.  Ping-Pong Topology (Only Two Aggregators) . . . . . . . .  36
     5.9.  Star Topology (Any Number of Aggregators) . . . . . . . .  42
   6.  Preliminaries . . . . . . . . . . . . . . . . . . . . . . . .  43
     6.1.  Finite Fields . . . . . . . . . . . . . . . . . . . . . .  43
       6.1.1.  Auxiliary Functions . . . . . . . . . . . . . . . . .  45
       6.1.2.  NTT-Friendly Fields . . . . . . . . . . . . . . . . .  46
       6.1.3.  Parameters  . . . . . . . . . . . . . . . . . . . . .  46
     6.2.  Extendable Output Functions . . . . . . . . . . . . . . .  47
       6.2.1.  XofTurboShake128  . . . . . . . . . . . . . . . . . .  49

Barnes, et al.            Expires 7 April 2025                  [Page 2]
Internet-Draft                    VDAF                      October 2024

       6.2.2.  XofFixedKeyAes128 . . . . . . . . . . . . . . . . . .  50
       6.2.3.  The Domain Separation Tag and Binder String . . . . .  51
   7.  Prio3 . . . . . . . . . . . . . . . . . . . . . . . . . . . .  52
     7.1.  Fully Linear Proof (FLP) Systems  . . . . . . . . . . . .  53
       7.1.1.  Encoding the Input  . . . . . . . . . . . . . . . . .  57
       7.1.2.  Multiple Proofs . . . . . . . . . . . . . . . . . . .  58
     7.2.  Construction  . . . . . . . . . . . . . . . . . . . . . .  58
       7.2.1.  Sharding  . . . . . . . . . . . . . . . . . . . . . .  60
       7.2.2.  Preparation . . . . . . . . . . . . . . . . . . . . .  65
       7.2.3.  Validity of Aggregation Parameters  . . . . . . . . .  68
       7.2.4.  Aggregation . . . . . . . . . . . . . . . . . . . . .  68
       7.2.5.  Unsharding  . . . . . . . . . . . . . . . . . . . . .  69
       7.2.6.  Auxiliary Functions . . . . . . . . . . . . . . . . .  69
       7.2.7.  Message Serialization . . . . . . . . . . . . . . . .  71
     7.3.  The FLP of BBCGGI19 . . . . . . . . . . . . . . . . . . .  74
       7.3.1.  Overview  . . . . . . . . . . . . . . . . . . . . . .  74
       7.3.2.  Validity Circuits . . . . . . . . . . . . . . . . . .  77
       7.3.3.  Construction  . . . . . . . . . . . . . . . . . . . .  79
     7.4.  Instantiations  . . . . . . . . . . . . . . . . . . . . .  84
       7.4.1.  Prio3Count  . . . . . . . . . . . . . . . . . . . . .  84
       7.4.2.  Prio3Sum  . . . . . . . . . . . . . . . . . . . . . .  85
       7.4.3.  Prio3SumVec . . . . . . . . . . . . . . . . . . . . .  88
       7.4.4.  Prio3Histogram  . . . . . . . . . . . . . . . . . . .  93
       7.4.5.  Prio3MultihotCountVec . . . . . . . . . . . . . . . .  96
   8.  Poplar1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
     8.1.  Incremental Distributed Point Functions (IDPFs) . . . . . 102
       8.1.1.  Encoding inputs as indices  . . . . . . . . . . . . . 104
     8.2.  Construction  . . . . . . . . . . . . . . . . . . . . . . 105
       8.2.1.  Sharding  . . . . . . . . . . . . . . . . . . . . . . 107
       8.2.2.  Preparation . . . . . . . . . . . . . . . . . . . . . 110
       8.2.3.  Validity of Aggregation Parameters  . . . . . . . . . 113
       8.2.4.  Aggregation . . . . . . . . . . . . . . . . . . . . . 114
       8.2.5.  Unsharding  . . . . . . . . . . . . . . . . . . . . . 115
       8.2.6.  Message Serialization . . . . . . . . . . . . . . . . 115
     8.3.  The IDPF scheme of BBCGGI21 . . . . . . . . . . . . . . . 120
       8.3.1.  Key Generation  . . . . . . . . . . . . . . . . . . . 120
       8.3.2.  Key Evaluation  . . . . . . . . . . . . . . . . . . . 122
       8.3.3.  Auxiliary Functions . . . . . . . . . . . . . . . . . 125
     8.4.  Instantiation . . . . . . . . . . . . . . . . . . . . . . 126
   9.  Security Considerations . . . . . . . . . . . . . . . . . . . 126
     9.1.  Requirements for the Verification Key . . . . . . . . . . 127
     9.2.  Requirements for the Nonce  . . . . . . . . . . . . . . . 128
     9.3.  Requirements for the Public Share . . . . . . . . . . . . 129
     9.4.  Requirements for Aggregation Parameters . . . . . . . . . 129
       9.4.1.  Additional Privacy Considerations . . . . . . . . . . 129
       9.4.2.  Safe Usage of IDPF Outputs  . . . . . . . . . . . . . 130
     9.5.  Requirements for XOFs . . . . . . . . . . . . . . . . . . 130
     9.6.  Choosing FLP Parameters . . . . . . . . . . . . . . . . . 131

Barnes, et al.            Expires 7 April 2025                  [Page 3]
Internet-Draft                    VDAF                      October 2024

     9.7.  Choosing the Number of Aggregators  . . . . . . . . . . . 132
     9.8.  Defense-in-Depth Measures . . . . . . . . . . . . . . . . 132
   10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 133
   11. References  . . . . . . . . . . . . . . . . . . . . . . . . . 135
     11.1.  Normative References . . . . . . . . . . . . . . . . . . 135
     11.2.  Informative References . . . . . . . . . . . . . . . . . 135
   Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 138
   Test Vectors  . . . . . . . . . . . . . . . . . . . . . . . . . . 138
     Schema  . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
       Prio3Count  . . . . . . . . . . . . . . . . . . . . . . . . . 139
       Prio3Sum  . . . . . . . . . . . . . . . . . . . . . . . . . . 139
       Prio3SumVec . . . . . . . . . . . . . . . . . . . . . . . . . 140
       Prio3Histogram  . . . . . . . . . . . . . . . . . . . . . . . 140
       Prio3MultihotCountVec . . . . . . . . . . . . . . . . . . . . 140
       Poplar1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . 140

1.  Introduction

   (RFC EDITOR: remove this paragraph.)  The source for this draft and
   and the reference implementation can be found at
   https://github.com/cfrg/draft-irtf-cfrg-vdaf.

   The ubiquity of the Internet makes it an ideal platform for
   measurement of large-scale phenomena, whether public health trends or
   the behavior of computer systems at scale.  There is substantial
   overlap, however, between information that is valuable to measure and
   information that users consider private.

   For example, consider an application that provides health information
   to users.  The operator of an application might want to know which
   parts of their application are used most often, as a way to guide
   future development of the application.  Specific users' patterns of
   usage, though, could reveal sensitive things about them, such as
   which users are researching a given health condition.

   In many situations, the measurement collector is only interested in
   aggregate statistics, e.g., which portions of an application are most
   used or what fraction of people have experienced a given disease.
   Thus systems that provide aggregate statistics while protecting
   individual measurements can deliver the value of the measurements
   while protecting users' privacy.

   This problem is often formulated in terms of differential privacy
   (DP) [Dwo06].  Roughly speaking, a data aggregation system that is
   differentially private ensures that the degree to which any
   individual measurement influences the value of the aggregate result
   can be precisely controlled.  For example, in systems like RAPPOR

Barnes, et al.            Expires 7 April 2025                  [Page 4]
Internet-Draft                    VDAF                      October 2024

   [EPK14], each user samples noise from a well-known distribution and
   adds it to their measurement before submitting to the aggregation
   server.  The aggregation server then adds up the noisy measurements,
   and because it knows the distribution from which the noise was
   sampled, it can estimate the true sum with reasonable accuracy.

   Even when noise is added to the measurements, collecting them in the
   clear still reveals a significant amount of information to the
   collector.  On the one hand, depending on the "amount" of noise a
   client adds to its measurement, it may be possible for a curious
   collector to make a reasonable guess of the measurement's true value.
   On the other hand, the more noise the clients add, the less reliable
   will be the server's estimate of the aggregate.  Thus systems relying
   solely on a DP mechanism must strike a delicate balance between
   privacy and utility.

   The ideal goal for a privacy-preserving measurement system is that of
   secure multi-party computation (MPC): no participant in the protocol
   should learn anything about an individual measurement beyond what it
   can deduce from the aggregate.  MPC achieves this goal by
   distributing the computation of the aggregate across multiple
   aggregation servers, one of which is presumed to be honest, i.e., not
   under control of the attacker.  Moreover, MPC can be composed with
   various DP mechanisms to ensure the aggregate itself does leak too
   much information about any one of the measurements [MPRV09].

   This document describes two classes of MPC protocols, each aiming for
   a different set of goals.

   In a Distributed Aggregation Function (DAF, Section 4), each client
   splits its measurement into multiple secret shares, one for each
   aggregation server.  DAFs require two properties of the secret
   sharing scheme.  First, we can reconstruct the underlying measurement
   by simply adding up all of the shares.  (Typically the shares are
   vectors over some finite field.)  Second, given all but one of the
   shares, it is impossible to learn anything about the underlying
   measurement.  These properties give rise to a simple strategy for
   privately aggregating the measurements: each aggregation server adds
   up its measurement shares locally before revealing their sum to the
   data collector; then all the data collector has to do is add up these
   sums to get the aggregate.

Barnes, et al.            Expires 7 April 2025                  [Page 5]
Internet-Draft                    VDAF                      October 2024

   This strategy is compatible with any aggregation function that can be
   represented as the sum of some encoding of the measurements.
   Examples include: summary statistics such as sum, mean, and standard
   deviation; estimation of quantiles, e.g., median; histograms; linear
   regression; or counting data structures, e.g., Bloom filters.
   However, not all functions fit into this rubric, as it is constrained
   to linear computations over the encoded measurements.

   In fact, our framework admits DAFs with slightly more functionality,
   computing aggregation functions of the form

   f(agg_param, meas_1, ..., meas_N) =
       g(agg_param, meas_1) + ... + g(agg_param, meas_N)

   where meas_1, ..., meas_N are the measurements, g is a possibly non-
   linear function, and agg_param is a parameter of that function chosen
   by the data collector.  This paradigm, known as function secret
   sharing [BGI15], allows for more sophisticated data analysis tasks,
   such as grouping metrics by private client attributes [MPDST25] or
   computing heavy hitters [BBCGGI21].  (More on the latter task below.)

   The second class of protocols defined in this document are called
   Verifiable Distributed Aggregation Functions (VDAFs, Section 5).  In
   addition to being private, VDAFs are verifiable in the following
   sense.  By design, a secret sharing of a valid measurement, e.g., a
   number between 1 and 10, is indistinguishable from a secret sharing
   of an invalid measurement, e.g., a number larger than 10.  This means
   that DAFs are vulnerable to attacks from malicious clients attempting
   to disrupt the computation by submitting invalid measurements.  Thus
   VDAFs are designed to allow the servers to detect and remove these
   measurements prior to aggregation. We refer to this property as
   robustness.

   Achieving robustness without sacrificing privacy requires the servers
   to interact with one another over a number of rounds of
   communication.  DAFs on the other hand are non-interactive, and are
   therefore easier to deploy; but they do not provide robustness on
   their own.  This may be tolerable in some applications.  For
   instance, if the client's software is executed in a trusted execution
   environment, it may be reasonable to assume that no client is
   malicious.

   The DAF and VDAF abstractions encompass a variety of MPC techniques
   in the literature.  These protocols vary in their operational and
   security requirements, sometimes in subtle but consequential ways.
   This document therefore has two important goals:

Barnes, et al.            Expires 7 April 2025                  [Page 6]
Internet-Draft                    VDAF                      October 2024

   1.  Providing higher-level protocols like [DAP] (RFC EDITOR: remove
       this reference if not published before the current document) with
       a simple, uniform interface for accessing privacy-preserving
       measurement schemes, documenting relevant operational and
       security requirements, and specifying constraints for safe usage:

       1.  General patterns of communications among the various actors
           involved in the system (clients, aggregation servers, and the
           collector of the aggregate result);

       2.  Capabilities of a malicious coalition of servers attempting
           to divulge information about client measurements; and

       3.  Conditions that are necessary to ensure that malicious
           clients cannot corrupt the computation.

   2.  Providing cryptographers with design criteria that provide a
       clear deployment roadmap for new constructions.

   This document also specifies two concrete VDAF schemes, each based on
   a protocol from the literature.

   *  The Prio system [CGB17] allows for the privacy-preserving
      computation of a variety of aggregate statistics, combining
      additive secret sharing as described above with a mechanism for
      checking the validity of each measurement.  In Section 7 we
      specify Prio3, a VDAF that follows the same overall framework as
      the original Prio protocol, but incorporates techniques introduced
      in [BBCGGI19] that result in significant performance gains.

   *  The Poplar protocol [BBCGGI21] solves the heavy-hitters problem in
      a privacy-preserving manner.  Here each client holds a bit-string,
      and the goal of the aggregation servers is to compute the set of
      strings that occur at least t times for some threshold t.  The
      core primitive in their protocol is a secret sharing of a point
      function [GI14] (g in the notation above) that allows the servers
      to privately count how many of the clients' strings begin with a
      given prefix (agg_param in the notation above).  In Section 8 we
      specify a VDAF called Poplar1 that implements this functionality.

Barnes, et al.            Expires 7 April 2025                  [Page 7]
Internet-Draft                    VDAF                      October 2024

   The remainder of this document is organized as follows: Section 2
   lists definitions and conventions used for specification; Section 3
   gives a brief overview of DAFs and VDAFs, the parties involved in the
   computation, and the requirements for non-collusion; Section 4
   defines the syntax for DAFs; Section 5 defines the syntax for VDAFs;
   Section 6 defines various functionalities that are common to our
   constructions; Section 7 describes the Prio3 construction; Section 8
   describes the Poplar1 construction; and Section 9 enumerates the
   security considerations for DAFs and VDAFs.

1.1.  Change Log

   (RFC EDITOR: remove this section.)

   (*) Indicates a change that breaks wire compatibility with the
   previous draft.

   12:

   *  (V)DAF: Add an application context string parameter to sharding
      and preparation.  The motivation for this change is to harden
      Prio3 against offline attacks.  More generally, however, it allows
      designing schemes for which correct execution requires agreement
      on the application context.  Accordingly, both Prio3 and Poplar1
      have been modified to include the context in the domain separation
      tag of each XOF invocation. (*)

   *  Prio3: Improve soundness of the base proof system and the circuits
      of some variants.  Generally speaking, wherever we evaluate a
      univariate polynomial at a random point, we can instead evaluate a
      multivariate polynomial of lower degree. (*)

   *  Prio3: Replace the helper's measurement and proof share seeds with
      a single seed. (*)

   *  Prio3Sum: Update the circuit to support a more general range check
      and avoid using joint randomness. (*)

   *  Prio3Histogram, Prio3MultihotCountVec: Move the final reduction of
      the intermediate outputs out of the circuit. (*)

   *  IDPF: Add the application context string to key generation end
      evaluation and bind it to the fixed AES key. (*)

   *  IDPF: Use XofTurboShake128 for deriving the leaf nodes in order to
      ensure the construction is extractable. (*)

   *  IDPF: Simplify the public share encoding. (*)

Barnes, et al.            Expires 7 April 2025                  [Page 8]
Internet-Draft                    VDAF                      October 2024

   *  XofTurboShake128: Change SEED_SIZE from 16 bytes to 32 to mitigate
      offline attacks on Prio3 robustness.  In addition, allow seeds of
      different lengths so that we can continue to use XofTurboShake128
      with IDPF. (*)

   *  XofTurboShake128, XofFixedKeyAes128: Increase the length prefix
      for the domain separation tag from one by to two bytes.  This is
      to accommodate the application context. (*)

   *  Reassign codepoints for all Prio3 variants and Poplar1. (*)

   *  Security considerations: Add a section on defense-in-depth
      measures taken by Prio3 and Poplar1 and more discussion about
      choosing FLP parameters.

   11:

   *  Define message formats for the Poplar1 aggregation parameter and
      IDPF public share.

   *  IDPF: Require the IDPF binder must be a random nonce.

   *  VDAF: Replace the pseudocode description of the ping-ping topology
      with Python and sketch the star topology.

   *  DAF: Align aggregation parameter validation with VDAF.

   *  Replace Union[A, B] type with A | B.

   *  Rename FFT ("Fast Fourier Transform") with NTT ("Number Theoretic
      Transform").

   10:

   *  Define Prio3MultihotCountVec, a variant of Prio3 for aggregating
      bit vectors with bounded weight.

   *  FLP: Allow the output of the circuit to be a vector.  This makes
      it possible to skip joint randomness derivation in more cases.

   *  Poplar1: On the first round of preparation, handle None as an
      error.  Previously this message was interpreted as a length-3
      vector of zeros.

   *  Prio3: Move specification of the field from the FLP validity
      circuit to the VDAF itself.

Barnes, et al.            Expires 7 April 2025                  [Page 9]
Internet-Draft                    VDAF                      October 2024

   *  Clarify the extent to which the attacker controls the network in
      our threat models for privacy and robustness.

   *  Clean up various aspects of the code, including: Follow existing
      object-oriented programming patterns for Python more closely; make
      the type hints enforceable; and avoid shadowing variables.

   *  Poplar1: Align terminology with [BBCGGI23].

   *  IDPF: Add guidance for encoding byte strings as indices.

   09:

   *  Poplar1: Make prefix tree traversal stricter by requiring each
      node to be a child of a node that was already visited.  This
      change is intended to make it harder for a malicious Aggregator to
      steer traversal towards non-heavy-hitting measurements.

   *  Prio3: Add more explicit guidance for choosing the field size.

   *  IDPF: Define extractability and clarify (un)safe usage of
      intermediate prefix counts.  Accordingly, add text ensuring public
      share consistency to security considerations.

   08:

   *  Poplar1: Bind the report nonce to the authenticator vector
      programmed into the IDPF. (*)

   *  IdpfPoplar: Modify extend() by stealing each control bit from its
      corresponding seed.  This improves performance by reducing the
      number of AES calls per level from 3 to 2.  The cost is a slight
      reduction in the concrete privacy bound. (*)

   *  Prio3: Add support for generating and verifying mutliple proofs
      per measurement.  This enables a trade-off between communication
      cost and runtime: if more proofs are used, then a smaller field
      can be used without impacting robustness. (*)

   *  Replace SHAKE128 with TurboSHAKE128. (*)

   07:

   *  Rename PRG to XOF ("eXtendable Output Function").  Accordingly,
      rename PrgSha3 to XofShake128 and PrgFixedKeyAes128 to
      XofFixedKeyAes128.  "PRG" is a misnomer since we don't actually
      treat this object as a pseudorandom generator in existing security
      analysis.

Barnes, et al.            Expires 7 April 2025                 [Page 10]
Internet-Draft                    VDAF                      October 2024

   *  Replace cSHAKE128 with SHAKE128, re-implementing domain separation
      for the customization string using a simpler scheme.  This change
      addresses the reality that implementations of cSHAKE128 are less
      common. (*)

   *  Define a new VDAF, called Prio3SumVec, that generalizes Prio3Sum
      to a vector of summands.

   *  Prio3Histogram: Update the codepoint and use the parallel sum
      optimization introduced by Prio3SumVec to reduce the proof size.
      (*)

   *  Daf, Vdaf: Rename interface methods to match verbiage in the
      draft.

   *  Daf: Align with Vdaf by adding a nonce to shard() and prep().

   *  Vdaf: Have prep_init() compute the first prep share.  This change
      is intended to simplify the interface by making the input to
      prep_next() not optional.

   *  Prio3: Split sharding into two auxiliary functions, one for
      sharding with joint randomness and another without.  This change
      is intended to improve readability.

   *  Fix bugs in the ping-pong interface discovered after implementing
      it.

   06:

   *  Vdaf: Define a wrapper interface for preparation that is suitable
      for the "ping-pong" topology in which two Aggregators exchange
      messages over a request/response protocol, like HTTP, and take
      turns executing the computation until input from the peer is
      required.

   *  Prio3Histogram: Generalize the measurement type so that the
      histogram can be used more easily with discrete domains. (*)

   *  Daf, Vdaf: Change the aggregation parameter validation algorithm
      to take the set of previous parameters rather than a list.  (The
      order of the parameters is irrelevant.)

   *  Daf, Vdaf, Idpf: Add parameter RAND_SIZE that specifies the number
      of random bytes consumed by the randomized algorithm (shard() for
      Daf and Vdaf and gen() for Idpf).

   05:

Barnes, et al.            Expires 7 April 2025                 [Page 11]
Internet-Draft                    VDAF                      October 2024

   *  IdpfPoplar: Replace PrgSha3 with PrgFixedKeyAes128, a fixed-key
      mode for AES-128 based on a construction from [GKWWY20].  This
      change is intended to improve performance of IDPF evaluation.
      Note that the new PRG is not suitable for all applications. (*)

   *  Idpf: Add a binder string to the key-generation and evaluation
      algorithms.  This is used to plumb the nonce generated by the
      Client to the PRG.

   *  Plumb random coins through the interface of randomized algorithms.
      Specifically, add a random input to (V)DAF sharding algorithm and
      IDPF key-generation algorithm and require implementations to
      specify the length of the random input.  Accordingly, update
      Prio3, Poplar1, and IdpfPoplar to match the new interface.  This
      change is intended to improve coverage of test vectors.

   *  Use little-endian byte-order for field element encoding. (*)

   *  Poplar1: Move the last step of sketch evaluation from prep_next()
      to prep_shares_to_prep().

   04:

   *  Align security considerations with the security analysis of
      [DPRS23].

   *  Vdaf: Pass the nonce to the sharding algorithm.

   *  Vdaf: Rather than allow the application to choose the nonce
      length, have each implementation of the Vdaf interface specify the
      expected nonce length. (*)

   *  Prg: Split "info string" into two components: the "customization
      string", intended for domain separation; and the "binder string",
      used to bind the output to ephemeral values, like the nonce,
      associated with execution of a (V)DAF.

   *  Replace PrgAes128 with PrgSha3, an implementation of the Prg
      interface based on SHA-3, and use the new scheme as the default.
      Accordingly, replace Prio3Aes128Count with Prio3Count,
      Poplar1Aes128 with Poplar1, and so on.  SHA-3 is a safer choice
      for instantiating a random oracle, which is used in the analysis
      of Prio3 of [DPRS23]. (*)

   *  Prio3, Poplar1: Ensure each invocation of the Prg uses a distinct
      customization string, as suggested by [DPRS23].  This is intended
      to make domain separation clearer, thereby simplifying security
      analysis. (*)

Barnes, et al.            Expires 7 April 2025                 [Page 12]
Internet-Draft                    VDAF                      October 2024

   *  Prio3: Replace "joint randomness hints" sent in each input share
      with "joint randomness parts" sent in the public share.  This
      reduces communication overhead when the number of shares exceeds
      two. (*)

   *  Prio3: Bind nonce to joint randomness parts.  This is intended to
      address birthday attacks on robustness pointed out by [DPRS23].
      (*)

   *  Poplar1: Use different Prg invocations for producing the
      correlated randomness for inner and leaf nodes of the IDPF tree.
      This is intended to simplify implementations. (*)

   *  Poplar1: Don't bind the candidate prefixes to the verifier
      randomness.  This is intended to improve performance, while not
      impacting security.  According to the analysis of [DPRS23], it is
      necessary to restrict Poplar1 usage such that no report is
      aggregated more than once at a given level of the IDPF tree;
      otherwise, attacks on privacy may be possible.  In light of this
      restriction, there is no added benefit of binding to the prefixes
      themselves. (*)

   *  Poplar1: During preparation, assert that all candidate prefixes
      are unique and appear in order.  Uniqueness is required to avoid
      erroneously rejecting a valid report; the ordering constraint
      ensures the uniqueness check can be performed efficiently. (*)

   *  Poplar1: Increase the maximum candidate prefix count in the
      encoding of the aggregation parameter. (*)

   *  Poplar1: Bind the nonce to the correlated randomness derivation.
      This is intended to provide defense-in-depth by ensuring the
      Aggregators reject the report if the nonce does not match what the
      Client used for sharding. (*)

   *  Poplar1: Clarify that the aggregation parameter encoding is
      OPTIONAL.  Accordingly, update implementation considerations
      around cross-aggregation state.

   *  IdpfPoplar: Add implementation considerations around branching on
      the values of control bits.

   *  IdpfPoplar: When decoding the the control bits in the public
      share, assert that the trailing bits of the final byte are all
      zero. (*)

   03:

Barnes, et al.            Expires 7 April 2025                 [Page 13]
Internet-Draft                    VDAF                      October 2024

   *  Define codepoints for (V)DAFs and use them for domain separation
      in Prio3 and Poplar1. (*)

   *  Prio3: Align joint randomness computation with revised paper
      [BBCGGI19].  This change mitigates an attack on robustness. (*)

   *  Prio3: Remove an intermediate PRG evaluation from query randomness
      generation. (*)

   *  Add additional guidance for choosing FFT-friendly fields.

   02:

   *  Complete the initial specification of Poplar1.

   *  Extend (V)DAF syntax to include a "public share" output by the
      Client and distributed to all of the Aggregators.  This is to
      accommodate "extractable" IDPFs as required for Poplar1.  (See
      [BBCGGI21], Section 4.3 for details.)

   *  Extend (V)DAF syntax to allow the unsharding step to take into
      account the number of measurements aggregated.

   *  Extend FLP syntax by adding a method for decoding the aggregate
      result from a vector of field elements.  The new method takes into
      account the number of measurements.

   *  Prio3: Align aggregate result computation with updated FLP syntax.

   *  Prg: Add a method for statefully generating a vector of field
      elements.

   *  Field: Require that field elements are fully reduced before
      decoding. (*)

   *  Define new field Field255.

   01:

   *  Require that VDAFs specify serialization of aggregate shares.

   *  Define Distributed Aggregation Functions (DAFs).

   *  Prio3: Move proof verifier check from prep_next() to
      prep_shares_to_prep(). (*)

   *  Remove public parameter and replace verification parameter with a
      "verification key" and "Aggregator ID".

Barnes, et al.            Expires 7 April 2025                 [Page 14]
Internet-Draft                    VDAF                      October 2024

2.  Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   Algorithms in this document are written in Python 3.  Type hints are
   used to define input and output types.  The type variable F is used
   in signatures to signify any type that is a subclass of Field.  A
   fatal error in a program (e.g., failure to parse one of the function
   parameters) is usually handled by raising an exception.

   A variable with type bytes is a byte string.  This document defines
   several byte-string constants.  When comprised of printable ASCII
   characters, they are written as Python 3 byte-string literals (e.g.,
   b'some constant string').

   A global constant VERSION of type int is defined, which algorithms
   are free to use as desired.  Its value SHALL be 12.

   This document describes algorithms for multi-party computations in
   which the parties typically communicate over a network.  Wherever a
   quantity is defined that must be be transmitted from one party to
   another, this document prescribes a particular encoding of that
   quantity as a byte string.

      OPEN ISSUE It might be better to not be prescriptive about how
      quantities are encoded on the wire.  See issue #58.

   Some common functionalities:

   *  zeros(len: int) -> bytes returns an array of zero bytes.  The
      length of output MUST be len.

   *  gen_rand(len: int) -> bytes returns an array of random bytes
      generated by a cryptographically secure pseudorandom number
      generator (CSPRNG).  The length of output MUST be len.

   *  byte(int: int) -> bytes returns the representation of int as a
      byte string.  The value of int MUST be in [0,256).

   *  concat(parts: list[bytes]) -> bytes returns the concatenation of
      the input byte strings, i.e., parts[0] || ... || parts[len(parts)-
      1].

Barnes, et al.            Expires 7 April 2025                 [Page 15]
Internet-Draft                    VDAF                      October 2024

   *  front(length: int, vec: list[Any]) -> (list[Any], list[Any])
      splits vec into two vectors, where the first vector is made up of
      the first length elements of the input.  I.e., (vec[:length],
      vec[length:]).

   *  xor(left: bytes, right: bytes) -> bytes returns the bitwise XOR of
      left and right.  An exception is raised if the inputs are not the
      same length.

   *  to_be_bytes(val: int, length: int) -> bytes converts val to big-
      endian bytes; its value MUST be at least 0 but less than
      2**(8*length).  Function from_be_bytes(encoded: bytes) -> int
      computes the inverse.

   *  to_le_bytes(val: int, length: int) -> bytes converts val to
      little-endian bytes; its value MUST be at least 0 but less than
      2**(8*length).  Function from_le_bytes(encoded: bytes) -> int
      computes the inverse.

   *  next_power_of_2(n: int) -> int returns the smallest integer
      greater than or equal to n that is also a power of two.

   *  additive_secret_share(vec: list[Field], num_shares: int, field:
      type) -> list[list[Field]] takes a vector of field elements and
      returns multiple vectors of the same length, such that they all
      add up to the input vector, and each proper subset of the vectors
      are indistinguishable from random.

   *  cast(typ: type, val: object) -> object returns the input value
      unchanged.  This is only present to assist with static analysis of
      the Python code.  Type checkers will ignore the inferred type of
      the input value, and assume the output value has the given type.

   *  range(stop) or range(start, stop[, step]) is the range function
      from the Python standard library.  The one-argument form returns
      the integers from zero (inclusive) to stop, exclusive.  The two-
      and three-argument forms allow overriding the start of the range
      and overriding the step between successive output values.

3.  Overview

Barnes, et al.            Expires 7 April 2025                 [Page 16]
Internet-Draft                    VDAF                      October 2024

                    +--------------+
              +---->| Aggregator 0 |----+
              |     +--------------+    |
              |             ^           |
              |             |           |
              |             V           |
              |     +--------------+    |
              | +-->| Aggregator 1 |--+ |
              | |   +--------------+  | |
   +--------+-+ |           ^         | +->+-----------+
   | Client |---+           |         +--->| Collector |--> Aggregate
   +--------+-+                         +->+-----------+
              |            ...          |
              |                         |
              |             |           |
              |             V           |
              |    +----------------+   |
              +--->| Aggregator N-1 |---+
                   +----------------+

         Input shares           Aggregate shares

                  Figure 1: Overall data flow of a (V)DAF

   In a DAF- or VDAF-based private measurement system, we distinguish
   three types of actors: Clients, Aggregators, and Collectors.  The
   overall flow of the measurement process is as follows:

   *  To submit an individual measurement, the Client shards the
      measurement into "input shares" and sends one input share to each
      Aggregator.  We sometimes refer to this sequence of input shares
      collectively as the Client's "report".

   *  The Aggregators refine their input shares into "output shares".

      -  Output shares are in one-to-one correspondence with the input
         shares.

      -  Just as each Aggregator receives one input share of each
         measurement, if this process succeeds, then each aggregator
         holds one output share.

      -  In VDAFs, Aggregators will need to exchange information among
         themselves as part of the validation process.

   *  Each Aggregator combines the output shares in the batch to compute
      the "aggregate share" for that batch, i.e., its share of the
      desired aggregate result.

Barnes, et al.            Expires 7 April 2025                 [Page 17]
Internet-Draft                    VDAF                      October 2024

   *  The Aggregators submit their aggregate shares to the Collector,
      who combines them to obtain the aggregate result over the batch.

   Aggregators are a new class of actor relative to traditional
   measurement systems where Clients submit measurements to a single
   server.  They are critical for both the privacy properties of the
   system and, in the case of VDAFs, the correctness of the measurements
   obtained.  The privacy properties of the system are assured by non-
   collusion among Aggregators, and Aggregators are the entities that
   perform validation of Client measurements.  Thus Clients trust
   Aggregators not to collude (typically it is required that at least
   one Aggregator is honest; see Section 9.7), and Collectors trust
   Aggregators to correctly run the protocol.

   Within the bounds of the non-collusion requirements of a given (V)DAF
   instance, it is possible for the same entity to play more than one
   role.  For example, the Collector could also act as an Aggregator,
   effectively using the other Aggregator(s) to augment a basic client-
   server protocol.

   In this document, we describe the computations performed by the
   actors in this system.  It is up to the higher-level protocol making
   use of the (V)DAF to arrange for the required information to be
   delivered to the proper actors in the proper sequence.  In general,
   we assume that all communications are confidential and mutually
   authenticated, with the exception that Clients submitting
   measurements may be anonymous.

4.  Definition of DAFs

   By way of a gentle introduction to VDAFs, this section describes a
   simpler class of schemes called Distributed Aggregation Functions
   (DAFs).  Unlike VDAFs, DAFs do not provide verifiability of the
   computation.  Clients must therefore be trusted to compute their
   input shares correctly.  Because of this fact, the use of a DAF is
   NOT RECOMMENDED for most applications.  See Section 9 for additional
   discussion.

   A DAF scheme is used to compute a particular "aggregation function"
   over a set of measurements generated by Clients.  Depending on the
   aggregation function, the Collector might select an "aggregation
   parameter" and disseminates it to the Aggregators.  The semantics of
   this parameter is specific to the aggregation function, but in
   general it is used to represent the set of "queries" that can be made
   on the measurement set.  For example, the aggregation parameter is
   used to represent the candidate prefixes in Poplar1 Section 8.

   Execution of a DAF has four distinct stages:

Barnes, et al.            Expires 7 April 2025                 [Page 18]
Internet-Draft                    VDAF                      October 2024

   *  Sharding - Each Client generates input shares from its measurement
      and distributes them among the Aggregators.

   *  Preparation - Each Aggregator converts each input share into an
      output share compatible with the aggregation function.  This
      computation involves the aggregation parameter.  In general, each
      aggregation parameter may result in a different an output share.

   *  Aggregation - Each Aggregator combines a sequence of output shares
      into its aggregate share and sends the aggregate share to the
      Collector.

   *  Unsharding - The Collector combines the aggregate shares into the
      aggregate result.

   Sharding and Preparation are done once per measurement.  Aggregation
   and Unsharding are done over a batch of measurements (more precisely,
   over the recovered output shares).

   A concrete DAF specifies an algorithm for the computation needed in
   each of these stages.  The interface of each algorithm is defined in
   the remainder of this section.  In addition, a concrete DAF defines
   the associated constants and types enumerated in the following table.

Barnes, et al.            Expires 7 April 2025                 [Page 19]
Internet-Draft                    VDAF                      October 2024

   +=============+=====================================================+
   | Parameter   | Description                                         |
   +=============+=====================================================+
   | ID: int     | Algorithm identifier for this                       |
   |             | DAF, in range(2**32).                               |
   +-------------+-----------------------------------------------------+
   | SHARES: int | Number of input shares into which                   |
   |             | each measurement is sharded.                        |
   +-------------+-----------------------------------------------------+
   | NONCE_SIZE: | Size of the nonce passed by the                     |
   | int         | application.                                        |
   +-------------+-----------------------------------------------------+
   | RAND_SIZE:  | Size of the random byte string                      |
   | int         | passed to sharding algorithm.                       |
   +-------------+-----------------------------------------------------+
   | Measurement | Type of each measurement.                           |
   +-------------+-----------------------------------------------------+
   | PublicShare | Type of each public share.                          |
   +-------------+-----------------------------------------------------+
   | InputShare  | Type of each input share.                           |
   +-------------+-----------------------------------------------------+
   | AggParam    | Type of aggregation parameter.                      |
   +-------------+-----------------------------------------------------+
   | OutShare    | Type of each output share.                          |
   +-------------+-----------------------------------------------------+
   | AggShare    | Type of the aggregate share.                        |
   +-------------+-----------------------------------------------------+
   | AggResult   | Type of the aggregate result.                       |
   +-------------+-----------------------------------------------------+

         Table 1: Constants and types defined by each concrete DAF.

   These types define the inputs and outputs of DAF methods at various
   stages of the computation.  Some of these values need to be written
   to the network in order to carry out the computation.  In particular,
   it is RECOMMENDED that concrete instantiations of the Daf interface
   specify a method of encoding the PublicShare, InputShare, and
   AggShare.

   Each DAF is identified by a unique, 32-bit integer ID.  Identifiers
   for each (V)DAF specified in this document are defined in Table 23.

4.1.  Sharding

   In order to protect the privacy of its measurements, a DAF Client
   shards its measurements into a sequence of input shares.  The shard
   method is used for this purpose.

Barnes, et al.            Expires 7 April 2025                 [Page 20]
Internet-Draft                    VDAF                      October 2024

   *  daf.shard(ctx: bytes, measurement: Measurement, nonce: bytes,
      rand: bytes) -> tuple[PublicShare, list[InputShare]] is the
      randomized sharding algorithm run by each Client that consumes the
      application context, a measurement, and a nonce and produces a
      "public share" distributed to each of the Aggregate and a
      corresponding sequence of input shares, one for each Aggregator.

      Pre-conditions:

      -  nonce MUST have length equal to NONCE_SIZE and MUST be
         generated using a CSPRNG.

      -  rand consists of the random bytes consumed by the algorithm.
         It MUST have length equal to RAND_SIZE and MUST be generated
         using a CSPRNG.

      Post-conditions:

      -  The number of input shares MUST equal SHARES.

   Sharding is bound to a specific "application context".  The
   application context is a string intended to uniquely identify an
   instance of the higher level protocol that uses the DAF.  This is
   intended to ensure that aggregation succeeds only if the Clients and
   Aggregators agree on the application context.  (Preparation binds the
   application context, too; see Section 4.2.)  Note that, unlike VDAFs
   (Section 5), there is no explicit signal of disagreement; it may only
   manifest as a garbled aggregate result.

       Client
       ======

       measurement
         |
         V
       +----------------------------------------------+
       | shard                                        |
       +----------------------------------------------+
         |              |              |     |
         |              |         ...  |    public_share
         |              |              |     |
         |    +---------|-----+--------|-----+
         |    |         |     |        |     |
         V    |         V     |        V     |
        input_share_0  input_share_1  input_share_[SHARES-1]
         |    |         |     |   ...  |     |
         V    V         V     V        V     V
       Aggregator 0   Aggregator 1    Aggregator SHARES-1

Barnes, et al.            Expires 7 April 2025                 [Page 21]
Internet-Draft                    VDAF                      October 2024

       Figure 2: The Client divides its measurement into input shares
       and distributes them to the Aggregators.  The public share is
                       broadcast to all Aggregators.

4.2.  Preparation

   Once an Aggregator has received the public share and one of the input
   shares, the next step is to prepare the input share for aggregation.
   This is accomplished using the following algorithm:

   *  daf.prep(ctx: bytes, agg_id: int, agg_param: AggParam, nonce:
      bytes, public_share: PublicShare, input_share: InputShare) ->
      OutShare is the deterministic preparation algorithm.  It takes as
      input the public share and one of the input shares generated by a
      Client, the application context, the Aggregator's unique
      identifier, the aggregation parameter selected by the Collector,
      and a nonce and returns an output share.

      Pre-conditions:

      -  agg_id MUST be in range(SHARES) and match the index of
         input_share in the sequence of input shares produced by the
         Client.

      -  nonce MUST have length NONCE_SIZE.

4.3.  Validity of Aggregation Parameters

   Concrete DAFs implementations MAY impose certain restrictions for
   input shares and aggregation parameters.  Protocols using a DAF MUST
   ensure that for each input share and aggregation parameter agg_param,
   daf.prep is only called if daf.is_valid(agg_param,
   previous_agg_params) returns True, where previous_agg_params contains
   all aggregation parameters that have previously been used with the
   same input share.

   DAFs MUST implement the following function:

   *  daf.is_valid(agg_param: AggParam, previous_agg_params:
      list[AggParam]) -> bool: checks if the agg_param is compatible
      with all elements of previous_agg_params.

4.4.  Aggregation

   Once an Aggregator holds output shares for a batch of measurements
   (where batches are defined by the application), it combines them into
   a share of the desired aggregate result:

Barnes, et al.            Expires 7 April 2025                 [Page 22]
Internet-Draft                    VDAF                      October 2024

   *  daf.aggregate(agg_param: AggParam, out_shares: list[OutShare]) ->
      AggShare is the deterministic aggregation algorithm.  It is run by
      each Aggregator a set of recovered output shares.

       Aggregator 0    Aggregator 1        Aggregator SHARES-1
       ============    ============        ===================

       out_share_0_0   out_share_1_0       out_share_[SHARES-1]_0
       out_share_0_1   out_share_1_1       out_share_[SHARES-1]_1
       out_share_0_2   out_share_1_2       out_share_[SHARES-1]_2
            ...             ...                     ...
       out_share_0_B   out_share_1_B       out_share_[SHARES-1]_B
         |               |                   |
         V               V                   V
       +-----------+   +-----------+       +-----------+
       | aggregate |   | aggregate |   ... | aggregate |
       +-----------+   +-----------+       +-----------+
         |               |                   |
         V               V                   V
       agg_share_0     agg_share_1         agg_share_[SHARES-1]

    Figure 3: Aggregation of output shares. `B` indicates the number of
                         measurements in the batch.

   For simplicity, we have written this algorithm in a "one-shot" form,
   where all output shares for a batch are provided at the same time.
   Many DAFs may also support a "streaming" form, where shares are
   processed one at a time.

   Implementation note: for most natural DAFs (and VDAFs) it is not
   necessary for an Aggregator to store all output shares individually
   before aggregating.  Typically it is possible to merge output shares
   into aggregate shares as they arrive, merge these into other
   aggregate shares, and so on.  In particular, this is the case when
   the output shares are vectors over some finite field and aggregating
   them involves merely adding up the vectors element-wise.  Such is the
   case for Prio3 Section 7 and Poplar1 Section 8.

4.5.  Unsharding

   After the Aggregators have aggregated a sufficient number of output
   shares, each sends its aggregate share to the Collector, who runs the
   following algorithm to recover the following output:

Barnes, et al.            Expires 7 April 2025                 [Page 23]
Internet-Draft                    VDAF                      October 2024

   *  daf.unshard(agg_param: AggParam, agg_shares: list[AggShare],
      num_measurements: int) -> AggResult is run by the Collector in
      order to compute the aggregate result from the Aggregators'
      shares.  The length of agg_shares MUST be SHARES. num_measurements
      is the number of measurements that contributed to each of the
      aggregate shares.  This algorithm is deterministic.

       Aggregator 0    Aggregator 1        Aggregator SHARES-1
       ============    ============        ===================

       agg_share_0     agg_share_1         agg_share_[SHARES-1]
         |               |                   |
         V               V                   V
       +-----------------------------------------------+
       | unshard                                       |
       +-----------------------------------------------+
         |
         V
       agg_result

       Collector
       =========

          Figure 4: Computation of the final aggregate result from
                             aggregate shares.

4.6.  Execution of a DAF

   Securely executing a DAF involves emulating the following procedure.

   def run_daf(
           daf: Daf[
               Measurement,
               AggParam,
               PublicShare,
               InputShare,
               OutShare,
               AggShare,
               AggResult,
           ],
           ctx: bytes,
           agg_param: AggParam,
           measurements: list[Measurement],
           nonces: list[bytes]) -> AggResult:
       """
       Run a DAF on a list of measurements.

       Pre-conditions:

Barnes, et al.            Expires 7 April 2025                 [Page 24]
Internet-Draft                    VDAF                      October 2024

           - `type(agg_param) == daf.AggParam`
           - `type(measurement) == daf.Measurement` for each
             `measurement` in `measurements`
           - `len(nonce) == daf.NONCE_SIZE` for each `nonce` in `nonces`
           - `len(nonces) == len(measurements)`
       """
       if any(len(nonce) != daf.NONCE_SIZE for nonce in nonces):
           raise ValueError("incorrect nonce size")
       if len(nonces) != len(measurements):
           raise ValueError(
               "measurements and nonces lists have different lengths"
           )

       out_shares: list[list[OutShare]]
       out_shares = [[] for j in range(daf.SHARES)]
       for (measurement, nonce) in zip(measurements, nonces):
           # Each Client shards its measurement into input shares and
           # distributes them among the Aggregators.
           rand = gen_rand(daf.RAND_SIZE)
           (public_share, input_shares) = \
               daf.shard(ctx, measurement, nonce, rand)

           # Each Aggregator prepares its input share for aggregation.
           for j in range(daf.SHARES):
               out_shares[j].append(
                   daf.prep(ctx, j, agg_param, nonce,
                            public_share, input_shares[j]))

       # Each Aggregator aggregates its output shares into an aggregate
       # share and sends it to the Collector.
       agg_shares = []
       for j in range(daf.SHARES):
           agg_share_j = daf.aggregate(agg_param,
                                       out_shares[j])
           agg_shares.append(agg_share_j)

       # Collector unshards the aggregate result.
       num_measurements = len(measurements)
       agg_result = daf.unshard(agg_param, agg_shares,
                                num_measurements)
       return agg_result

                       Figure 5: Execution of a DAF.

   The inputs to this procedure are the same as the aggregation function
   computed by the DAF: an aggregation parameter and a sequence of
   measurements.  The procedure prescribes how a DAF is executed in a
   "benign" environment in which there is no adversary and the messages

Barnes, et al.            Expires 7 April 2025                 [Page 25]
Internet-Draft                    VDAF                      October 2024

   are passed among the protocol participants over secure point-to-point
   channels.  In reality, these channels need to be instantiated by some
   "wrapper protocol", such as [DAP], that realizes these channels using
   suitable cryptographic mechanisms.  Moreover, some fraction of the
   Aggregators (or Clients) may be malicious and diverge from their
   prescribed behaviors.  Section 9 describes the execution of the DAF
   in various adversarial environments and what properties the wrapper
   protocol needs to provide in each.

5.  Definition of VDAFs

   Like DAFs described in the previous section, a VDAF scheme is used to
   compute a particular aggregation function over a set of Client-
   generated measurements.  Evaluation of a VDAF involves the same four
   stages as for DAFs: Sharding, Preparation, Aggregation, and
   Unsharding.  However, the Preparation stage will require interaction
   among the Aggregators in order to facilitate verifiability of the
   computation's correctness.  Accommodating this interaction will
   require syntactic changes.

   Overall execution of a VDAF comprises the following stages:

   *  Sharding - Computing input shares from an individual measurement

   *  Preparation - Conversion and verification of input shares to
      output shares compatible with the aggregation function being
      computed

   *  Aggregation - Combining a sequence of output shares into an
      aggregate share

   *  Unsharding - Combining a sequence of aggregate shares into an
      aggregate result

   In contrast to DAFs, the Preparation stage for VDAFs now performs an
   additional task: verification of the validity of the recovered output
   shares.  This process ensures that aggregating the output shares will
   not lead to a garbled aggregate result.

   The remainder of this section defines the VDAF interface.  The
   attributes are listed in Table 2 are defined by each concrete VDAF.

Barnes, et al.            Expires 7 April 2025                 [Page 26]
Internet-Draft                    VDAF                      October 2024

      +=================+==========================================+
      | Parameter       | Description                              |
      +=================+==========================================+
      | ID              | Algorithm identifier for this VDAF.      |
      +-----------------+------------------------------------------+
      | VERIFY_KEY_SIZE | Size (in bytes) of the verification key  |
      |                 | (Section 5.2).                           |
      +-----------------+------------------------------------------+
      | RAND_SIZE       | Size of the random byte string passed to |
      |                 | sharding algorithm.                      |
      +-----------------+------------------------------------------+
      | NONCE_SIZE      | Size (in bytes) of the nonce.            |
      +-----------------+------------------------------------------+
      | ROUNDS          | Number of rounds of communication during |
      |                 | the Preparation stage (Section 5.2).     |
      +-----------------+------------------------------------------+
      | SHARES          | Number of input shares into which each   |
      |                 | measurement is sharded (Section 5.1).    |
      +-----------------+------------------------------------------+
      | Measurement     | Type of each measurement.                |
      +-----------------+------------------------------------------+
      | PublicShare     | Type of each public share.               |
      +-----------------+------------------------------------------+
      | InputShare      | Type of each input share.                |
      +-----------------+------------------------------------------+
      | AggParam        | Type of aggregation parameter.           |
      +-----------------+------------------------------------------+
      | OutShare        | Type of each output share.               |
      +-----------------+------------------------------------------+
      | AggShare        | Type of the aggregate share.             |
      +-----------------+------------------------------------------+
      | AggResult       | Type of the aggregate result.            |
      +-----------------+------------------------------------------+
      | PrepState       | Aggregator's state during preparation.   |
      +-----------------+------------------------------------------+
      | PrepShare       | Type of each prep share.                 |
      +-----------------+------------------------------------------+
      | PrepMessage     | Type of each prep message.               |
      +-----------------+------------------------------------------+

       Table 2: Constants and types defined by each concrete VDAF.

   Some of these values need to be written to the network in order to
   carry out the computation.  In particular, it is RECOMMENDED that
   concrete instantiations of the Vdaf interface specify a method of
   encoding the PublicShare, InputShare, AggShare, PrepShare, and
   PrepMessage.

Barnes, et al.            Expires 7 April 2025                 [Page 27]
Internet-Draft                    VDAF                      October 2024

   Each VDAF is identified by a unique, 32-bit integer ID.  Identifiers
   for each (V)DAF specified in this document are defined in Table 23.
   The following method is defined for each VDAF specified in this
   document:

   def domain_separation_tag(self, usage: int, ctx: bytes) -> bytes:
       """
       Format domain separation tag for this VDAF with the given
       application context and usage.

       Pre-conditions:

           - `usage` in `range(2**16)`
       """
       return format_dst(0, self.ID, usage) + ctx

   It is used to construct a domain separation tag for an instance of
   Xof used by the VDAF.  (See Section 6.2.)

5.1.  Sharding

   Sharding transforms a measurement and nonce into a public share and
   input shares as it does in DAFs (cf.  Section 4.1):

   *  vdaf.shard(ctx: bytes, measurement: Measurement, nonce: bytes,
      rand: bytes) -> tuple[PublicShare, list[InputShare]] is the
      randomized sharding algorithm run by each Client that consumes the
      application context, a measurement, and a nonce and produces a
      public share distributed to each of the Aggregate and a
      corresponding sequence of input shares, one for each Aggregator.
      Depending on the VDAF, the input shares may encode additional
      information used to verify the recovered output shares (e.g., the
      "proof shares" in Prio3 Section 7)

      Pre-conditions:

      -  nonce MUST have length equal to NONCE_SIZE and MUST be
         generated using a CSPRNG.  (See Section 9 for details.)

      -  rand consists of the random bytes consumed by the algorithm.
         It MUST have length equal to RAND_SIZE and MUST be generated
         using a CSPRNG.

      Post-conditions:

      -  The number of input shares MUST equal SHARES.

Barnes, et al.            Expires 7 April 2025                 [Page 28]
Internet-Draft                    VDAF                      October 2024

   Like DAFs, sharding is bound to the application context via the ctx
   string.  Again, this is intended to ensure that aggregation succeeds
   only if the Clients and Aggregators agree on the application context.
   Unlike DAFs, however, disagreement on the context should manifest as
   a failure to validate the report, causing the report to be rejected
   without garbling the aggregate result.  The application context also
   provides some defense-in-depth against cross protocol attacks; see
   Section 9.8.

5.2.  Preparation

   To recover and verify output shares, the Aggregators interact with
   one another over ROUNDS rounds.  Prior to each round, each Aggregator
   constructs an outbound message.  Next, the sequence of outbound
   messages is combined into a single message, called a "preparation
   message", or "prep message" for short.  (Each of the outbound
   messages are called "preparation-message shares", or "prep shares"
   for short.)  Finally, the preparation message is distributed to the
   Aggregators to begin the next round.

   An Aggregator begins the first round with its input share and it
   begins each subsequent round with the previous prep message.  Its
   output in the last round is its output share and its output in each
   of the preceding rounds is a prep share.

   This process involves a value called the "aggregation parameter" used
   to map the input shares to output shares.  The Aggregators need to
   agree on this parameter before they can begin preparing the
   measurement shares for aggregation.

Barnes, et al.            Expires 7 April 2025                 [Page 29]
Internet-Draft                    VDAF                      October 2024

       Aggregator 0   Aggregator 1           Aggregator SHARES-1
       ============   ============           ===================

       input_share_0  input_share_1          input_share_[SHARES-1]
         |              |                 ...  |
         V              V                      V
       +-----------+  +-----------+          +-----------+
       | prep_init |  | prep_init |          | prep_init |
       +-----------+  +-----------+          +-----------+
         |       |      |       |         ...  |       |
         V       |      V       |              V       |
       +---------|--------------|----------------------|-+   \
       |         |              | prep_shares_to_prep  | |   |
       +---------|--------------|----------------------|-+   |
         |       |      |       |         ...  |       |     |
         V       V      V       V              V       |     | x ROUNDS
       +-----------+  +-----------+          +-----------+   |
       | prep_next |  | prep_next |          | prep_next |   |
       +-----------+  +-----------+          +-----------+   |
         |       |      |                 ...  |       |     |
         V       V      V                      V       V     /
        ...            ...                    ...
         |              |                 ...  |
         V              V                      V
       out_share_0    out_share_1         out_share_[SHARES-1]

    Figure 6: VDAF preparation process on the input shares for a single
   measurement.  At the end of the computation, each Aggregator holds an
                         output share or an error.

   To facilitate the preparation process, a concrete VDAF implements the
   following methods:

   *  vdaf.prep_init(verify_key: bytes, ctx: bytes, agg_id: int,
      agg_param: AggParam, nonce: bytes, public_share: PublicShare,
      input_share: InputShare) -> tuple[PrepState, PrepShare] is the
      deterministic preparation-state initialization algorithm run by
      each Aggregator to begin processing its input share into an output
      share.  Its inputs are the shared verification key (verify_key),
      the application context (ctx), the Aggregator's unique identifier
      (agg_id), the aggregation parameter (agg_param), the nonce
      provided by the environment (nonce, see Figure 7), the public
      share (public_share), and one of the input shares generated by the
      Client (input_share).  Its output is the Aggregator's initial
      preparation state and initial prep share.

Barnes, et al.            Expires 7 April 2025                 [Page 30]
Internet-Draft                    VDAF                      October 2024

      It is up to the high level protocol in which the VDAF is used to
      arrange for the distribution of the verification key prior to
      generating and processing reports.  (See Section 9 for details.)

      Protocols MUST ensure that public share consumed by each of the
      Aggregators is identical.  This is security critical for VDAFs
      such as Poplar1.

      Pre-conditions:

      -  verify_key MUST have length vdaf.VERIFY_KEY_SIZE.

      -  agg_id MUST be the integer in range(vdaf.SHARES) that matches
         the index of input_share in the sequence of input shares output
         by the Client.

      -  nonce MUST have length vdaf.NONCE_SIZE.

   *  vdaf.prep_next(ctx: bytes, prep_state: PrepState, prep_msg:
      PrepMessage) -> tuple[PrepState, PrepShare] | OutShare is the
      deterministic preparation-state update algorithm run by each
      Aggregator.  It updates the Aggregator's preparation state
      (prep_state) and returns either its next preparation state and its
      message share for the current round or, if this is the last round,
      its output share.  An exception is raised if a valid output share
      could not be recovered.  The input of this algorithm is the
      inbound preparation message.

   *  vdaf.prep_shares_to_prep(ctx: bytes, agg_param: AggParam,
      prep_shares: list[PrepShare]) -> PrepMessage is the deterministic
      preparation-message pre-processing algorithm.  It combines the
      prep shares generated by the Aggregators in the previous round
      into the prep message consumed by each in the next round.

   In effect, each Aggregator moves through a linear state machine with
   ROUNDS states.  The Aggregator enters the first state on using the
   initialization algorithm, and the update algorithm advances the
   Aggregator to the next state.  Thus, in addition to defining the
   number of rounds (ROUNDS), a VDAF instance defines the state of the
   Aggregator after each round.

   The preparation-state update accomplishes two tasks: recovery of
   output shares from the input shares and ensuring that the recovered
   output shares are valid.  The abstraction boundary is drawn so that
   an Aggregator only recovers an output share if it is deemed valid (at
   least, based on the Aggregator's view of the protocol).  Another way
   to draw this boundary would be to have the Aggregators recover output
   shares first, then verify that they are valid.  However, this would

Barnes, et al.            Expires 7 April 2025                 [Page 31]
Internet-Draft                    VDAF                      October 2024

   allow the possibility of misusing the API by, say, aggregating an
   invalid output share.  Moreover, in protocols like Prio+ [AGJOP21]
   based on oblivious transfer, it is necessary for the Aggregators to
   interact in order to recover aggregatable output shares at all.

5.3.  Validity of Aggregation Parameters

   Similar to DAFs (see Section 4.3), VDAFs MAY impose restrictions for
   input shares and aggregation parameters.  Protocols using a VDAF MUST
   ensure that for each input share and aggregation parameter agg_param,
   the preparation phase (including vdaf.prep_init, vdaf.prep_next, and
   vdaf.prep_shares_to_prep; see Section 5.2) is only called if
   vdaf.is_valid(agg_param, previous_agg_params) returns True, where
   previous_agg_params contains all aggregation parameters that have
   previously been used with the same input share.

   VDAFs MUST implement the following function:

   *  vdaf.is_valid(agg_param: AggParam, previous_agg_params:
      list[AggParam]) -> bool: checks if the agg_param is compatible
      with all elements of previous_agg_params.

5.4.  Aggregation

   VDAF Aggregation is identical to DAF Aggregation (cf.  Section 4.4):

   *  vdaf.aggregate(agg_param: AggParam, out_shares: list[OutShare]) ->
      AggShare is the deterministic aggregation algorithm.  It is run by
      each Aggregator over the output shares it has computed for a batch
      of measurements.

   The data flow for this stage is illustrated in Figure 3.  Here again,
   we have the aggregation algorithm in a "one-shot" form, where all
   shares for a batch are provided at the same time.  VDAFs typically
   also support a "streaming" form, where shares are processed one at a
   time.

5.5.  Unsharding

   VDAF Unsharding is identical to DAF Unsharding (cf.  Section 4.5):

   *  vdaf.unshard(agg_param: AggParam, agg_shares: list[AggShare],
      num_measurements: int) -> AggResult is run by the Collector in
      order to compute the aggregate result from the Aggregators'
      shares.  The length of agg_shares MUST be SHARES. num_measurements
      is the number of measurements that contributed to each of the
      aggregate shares.  This algorithm is deterministic.

Barnes, et al.            Expires 7 April 2025                 [Page 32]
Internet-Draft                    VDAF                      October 2024

   The data flow for this stage is illustrated in Figure 4.

5.6.  Execution of a VDAF

   Secure execution of a VDAF involves simulating the following
   procedure.

   def run_vdaf(
           vdaf: Vdaf[
               Measurement,
               AggParam,
               PublicShare,
               InputShare,
               list[F],  # OutShare
               AggShare,
               AggResult,
               PrepState,
               PrepShare,
               PrepMessage,
           ],
           verify_key: bytes,
           agg_param: AggParam,
           ctx: bytes,
           nonces: list[bytes],
           measurements: list[Measurement]) -> AggResult:
       """
       Run the VDAF on a list of measurements.

       Pre-conditions:

           - `len(verify_key) == vdaf.VERIFY_KEY_SIZE`
           - `len(nonces) == len(measurements)`
           - `all(len(nonce) == vdaf.NONCE_SIZE for nonce in nonces)`
       """

       if len(verify_key) != vdaf.VERIFY_KEY_SIZE:
           raise ValueError("incorrect verify_key size")
       if any(len(nonce) != vdaf.NONCE_SIZE for nonce in nonces):
           raise ValueError("incorrect nonce size")
       if len(nonces) != len(measurements):
           raise ValueError(
               "measurements and nonces lists have different lengths"
           )

       out_shares = []
       for (nonce, measurement) in zip(nonces, measurements):
           assert len(nonce) == vdaf.NONCE_SIZE

Barnes, et al.            Expires 7 April 2025                 [Page 33]
Internet-Draft                    VDAF                      October 2024

           # Each Client shards its measurement into input shares.
           rand = gen_rand(vdaf.RAND_SIZE)
           (public_share, input_shares) = \
               vdaf.shard(ctx, measurement, nonce, rand)

           # Each Aggregator initializes its preparation state.
           prep_states = []
           outbound_prep_shares = []
           for j in range(vdaf.SHARES):
               (state, share) = vdaf.prep_init(verify_key, ctx, j,
                                               agg_param,
                                               nonce,
                                               public_share,
                                               input_shares[j])
               prep_states.append(state)
               outbound_prep_shares.append(share)

           # Aggregators recover their output shares.
           for i in range(vdaf.ROUNDS - 1):
               prep_msg = vdaf.prep_shares_to_prep(ctx,
                                                   agg_param,
                                                   outbound_prep_shares)

               outbound_prep_shares = []
               for j in range(vdaf.SHARES):
                   out = vdaf.prep_next(ctx, prep_states[j], prep_msg)
                   assert isinstance(out, tuple)
                   (prep_states[j], prep_share) = out
                   outbound_prep_shares.append(prep_share)

           # The final outputs of the prepare phase are the output
           # shares.
           prep_msg = vdaf.prep_shares_to_prep(ctx,
                                               agg_param,
                                               outbound_prep_shares)

           outbound_out_shares = []
           for j in range(vdaf.SHARES):
               out_share = vdaf.prep_next(ctx, prep_states[j], prep_msg)
               assert not isinstance(out_share, tuple)
               outbound_out_shares.append(out_share)

           out_shares.append(outbound_out_shares)

       # Each Aggregator aggregates its output shares into an
       # aggregate share. In a distributed VDAF computation, the
       # aggregate shares are sent over the network.
       agg_shares = []

Barnes, et al.            Expires 7 April 2025                 [Page 34]
Internet-Draft                    VDAF                      October 2024

       for j in range(vdaf.SHARES):
           out_shares_j = [out[j] for out in out_shares]
           agg_share_j = vdaf.aggregate(agg_param, out_shares_j)
           agg_shares.append(agg_share_j)

       # Collector unshards the aggregate.
       num_measurements = len(measurements)
       agg_result = vdaf.unshard(agg_param, agg_shares,
                                 num_measurements)
       return agg_result

                       Figure 7: Execution of a VDAF.

   The inputs to this algorithm are the aggregation parameter, a list of
   measurements, and a nonce for each measurement.  This document does
   not specify how the nonces are chosen, but security requires that the
   nonces be unique.  See Section 9 for details.  As explained in
   Section 4.6, the secure execution of a VDAF requires the application
   to instantiate secure channels between each of the protocol
   participants.

5.7.  Communication Patterns for Preparation

   In each round of preparation, each Aggregator writes a prep share to
   some broadcast channel, which is then processed into the prep message
   using the public prep_shares_to_prep() algorithm and broadcast to the
   Aggregators to start the next round.  In this section we describe
   some approaches to realizing this broadcast channel functionality in
   protocols that use VDAFs.

   The state machine of each Aggregator is shown in Figure 8.

                     +----------------+
                     |                |
                     v                |
   Start ----> Continued(prep_state, prep_round) --> Finished(out_share)
    |                |
    |                |
    +--> Rejected <--+

               Figure 8: State machine for VDAF preparation.

   State transitions are made when the state is acted upon by the host's
   local inputs and/or messages sent by the peers.  The initial state is
   Start.  The terminal states are Rejected, which indicates that the
   report cannot be processed any further, and Finished(out_share),
   which indicates that the Aggregator has recovered an output share
   out_share.

Barnes, et al.            Expires 7 April 2025                 [Page 35]
Internet-Draft                    VDAF                      October 2024

   class State:
       pass

   class Start(State):
       pass

   class Continued(State, Generic[PrepState]):
       def __init__(self, prep_state: PrepState, prep_round: int):
           self.prep_state = prep_state
           self.prep_round = prep_round

       def __eq__(self, other: object) -> bool:
           return isinstance(other, Continued) and \
               self.prep_state == other.prep_state and \
               self.prep_round == other.prep_round

   class Finished(State, Generic[OutShare]):
       def __init__(self, out_share: OutShare):
           self.out_share = out_share

       def __eq__(self, other: object) -> bool:
           return isinstance(other, Finished) and \
               self.out_share == other.out_share

   class Rejected(State):
       pass

   The methods described in this section are defined in terms of opaque
   byte strings.  A compatible Vdaf MUST specify methods for encoding
   public shares, input shares, prep shares, prep messages, and
   aggregation parameters.

   Implementations of Prio3 and Poplar1 MUST use the encoding scheme
   specified in Section 7.2.7 and Section 8.2.6 respectively.

5.8.  Ping-Pong Topology (Only Two Aggregators)

   For VDAFs with precisely two Aggregators (i.e., SHARES == 2), the
   following "ping pong" communication pattern can be used.  It is
   compatible with any request/response transport protocol, such as
   HTTP.

   Let us call the initiating party the "Leader" and the responding
   party the "Helper".  The high-level idea is that the Leader and
   Helper will take turns running the computation locally until input
   from their peer is required:

Barnes, et al.            Expires 7 April 2025                 [Page 36]
Internet-Draft                    VDAF                      October 2024

   *  For a 1-round VDAF (e.g., Prio3 (Section 7)), the Leader sends its
      prep share to the Helper, who computes the prep message locally,
      computes its output share, then sends the prep message to the
      Leader.  Preparation requires just one round trip between the
      Leader and the Helper.

   *  For a 2-round VDAF (e.g., Poplar1 (Section 8)), the Leader sends
      its first-round prep share to the Helper, who replies with the
      first-round prep message and its second-round prep share.  In the
      next request, the Leader computes its second-round prep share
      locally, computes its output share, and sends the second-round
      prep message to the Helper.  Finally, the Helper computes its own
      output share.

   *  In general, each request includes the Leader's prep share for the
      previous round and/or the prep message for the current round;
      correspondingly, each response consists of the prep message for
      the current round and the Helper's prep share for the next round.

   The Aggregators proceed in this ping-ponging fashion until a step of
   the computation fails (indicating the report is invalid and should be
   rejected) or preparation is completed.  All told there there are
   ceil((ROUNDS+1)/2) requests sent.

   Each message in the ping-pong protocol is structured as follows
   (expressed in TLS syntax as defined in Section 3 of [RFC8446]):

   enum {
     initialize(0),
     continue(1),
     finish(2),
     (255)
   } MessageType;

   struct {
     MessageType type;
     select (Message.type) {
       case initialize:
         opaque prep_share<0..2^32-1>;
       case continue:
         opaque prep_msg<0..2^32-1>;
         opaque prep_share<0..2^32-1>;
       case finish:
         opaque prep_msg<0..2^32-1>;
     };
   } Message;

Barnes, et al.            Expires 7 April 2025                 [Page 37]
Internet-Draft                    VDAF                      October 2024

   These messages are used to transition between the states described in
   Section 5.7.  The Leader's initial transition is computed with the
   following method, implemented on Vdaf:

   def ping_pong_leader_init(
           self,
           vdaf_verify_key: bytes,
           ctx: bytes,
           agg_param: bytes,
           nonce: bytes,
           public_share: bytes,
           input_share: bytes) -> tuple[State, Optional[bytes]]:
       """Called by the leader to initialize ping-ponging."""
       try:
           (prep_state, prep_share) = self.prep_init(
               vdaf_verify_key,
               ctx,
               0,
               self.decode_agg_param(agg_param),
               nonce,
               self.decode_public_share(public_share),
               self.decode_input_share(0, input_share),
           )
           encoded_prep_share = self.encode_prep_share(prep_share)
           return (
               Continued(prep_state, 0),
               encode(0, encoded_prep_share),  # initialize
           )
       except:
           return (Rejected(), None)

   The output is the State to which the Leader has transitioned and an
   encoded Message.  If the Leader's state is Rejected, then processing
   halts.  Otherwise, if the state is Continued, then processing
   continues.  Function encode is used to encode the outbound message,
   here the initialize variant (hence 0).

   The Leader sends the outbound message to the Helper.  The Helper's
   initial transition is computed using the following procedure:

Barnes, et al.            Expires 7 April 2025                 [Page 38]
Internet-Draft                    VDAF                      October 2024

   def ping_pong_helper_init(
           self,
           vdaf_verify_key: bytes,
           ctx: bytes,
           agg_param: bytes,
           nonce: bytes,
           public_share: bytes,
           input_share: bytes,
           inbound: bytes) -> tuple[State, Optional[bytes]]:
       """
       Called by the helper in response to the leader's initial
       message.
       """
       try:
           (prep_state, prep_share) = self.prep_init(
               vdaf_verify_key,
               ctx,
               1,
               self.decode_agg_param(agg_param),
               nonce,
               self.decode_public_share(public_share),
               self.decode_input_share(1, input_share),
           )

           (inbound_type, inbound_items) = decode(inbound)
           if inbound_type != 0:  # initialize
               return (Rejected(), None)

           encoded_prep_share = inbound_items[0]
           prep_shares = [
               self.decode_prep_share(prep_state, encoded_prep_share),
               prep_share,
           ]
           return self.ping_pong_transition(
               ctx,
               self.decode_agg_param(agg_param),
               prep_shares,
               prep_state,
               0,
           )
       except:
           return (Rejected(), None)

   Procedure decode decodes the inbound message and returns the
   MessageType variant (initialize, continue, or finalize) and the
   sequence of fields.  Procedure ping_pong_transition takes in the prep
   shares, combines them into the prep message, and computes the next
   prep state of the caller:

Barnes, et al.            Expires 7 April 2025                 [Page 39]
Internet-Draft                    VDAF                      October 2024

   def ping_pong_transition(
           self,
           ctx: bytes,
           agg_param: AggParam,
           prep_shares: list[PrepShare],
           prep_state: PrepState,
           prep_round: int) -> tuple[State, bytes]:
       prep_msg = self.prep_shares_to_prep(ctx,
                                           agg_param,
                                           prep_shares)
       encoded_prep_msg = self.encode_prep_msg(prep_msg)
       out = self.prep_next(ctx, prep_state, prep_msg)
       if prep_round+1 == self.ROUNDS:
           return (
               Finished(out),
               encode(2, encoded_prep_msg),  # finalize
           )
       (prep_state, prep_share) = cast(
           tuple[PrepState, PrepShare], out)
       encoded_prep_share = self.encode_prep_share(prep_share)
       return (
           Continued(prep_state, prep_round+1),
           encode(1, encoded_prep_msg, encoded_prep_share)  # continue
       )

   The output is the State to which the Helper has transitioned and an
   encoded Message.  If the Helper's state is Finished or Rejected, then
   processing halts.  Otherwise, if the state is Continued, then
   processing continues.

   Next, the Helper sends the outbound message to the Leader.  The
   Leader computes its next state transition using the function
   ping_pong_leader_continued:

   def ping_pong_leader_continued(
       self,
       ctx: bytes,
       agg_param: bytes,
       state: State,
       inbound: bytes,
   ) -> tuple[State, Optional[bytes]]:
       """
       Called by the leader to start the next step of ping-ponging.
       """
       return self.ping_pong_continued(
           True, ctx, agg_param, state, inbound)

   def ping_pong_continued(

Barnes, et al.            Expires 7 April 2025                 [Page 40]
Internet-Draft                    VDAF                      October 2024

       self,
       is_leader: bool,
       ctx: bytes,
       agg_param: bytes,
       state: State,
       inbound: bytes,
   ) -> tuple[State, Optional[bytes]]:
       try:
           if not isinstance(state, Continued):
               return (Rejected(), None)
           prep_round = state.prep_round

           (inbound_type, inbound_items) = decode(inbound)
           if inbound_type == 0:  # initialize
               return (Rejected(), None)

           encoded_prep_msg = inbound_items[0]
           prep_msg = self.decode_prep_msg(
               state.prep_state,
               encoded_prep_msg,
           )
           out = self.prep_next(ctx, state.prep_state, prep_msg)
           if prep_round+1 < self.ROUNDS and \
                   inbound_type == 1:  # continue
               (prep_state, prep_share) = cast(
                   tuple[PrepState, PrepShare], out)
               encoded_prep_share = inbound_items[1]
               prep_shares = [
                   self.decode_prep_share(
                       prep_state,
                       encoded_prep_share,
                   ),
                   prep_share,
               ]
               if is_leader:
                   prep_shares.reverse()
               return self.ping_pong_transition(
                   ctx,
                   self.decode_agg_param(agg_param),
                   prep_shares,
                   prep_state,
                   prep_round+1,
               )
           elif prep_round+1 == self.ROUNDS and \
                   inbound_type == 2:  # finish
               return (Finished(out), None)
           else:
               return (Rejected(), None)

Barnes, et al.            Expires 7 April 2025                 [Page 41]
Internet-Draft                    VDAF                      October 2024

       except:
           return (Rejected(), None)

   If the Leader's state is Finished or Rejected, then processing halts.
   Otherwise, the Leader sends the outbound message to the Helper.  The
   Helper computes its next state transition using the function
   ping_pong_helper_continued:

   def ping_pong_helper_continued(
       self,
       ctx: bytes,
       agg_param: bytes,
       state: State,
       inbound: bytes,
   ) -> tuple[State, Optional[bytes]]:
       """Called by the helper to continue ping-ponging."""
       return self.ping_pong_continued(
           False, ctx, agg_param, state, inbound)

   They continue in this way until processing halts.  Note that,
   depending on the number of rounds of preparation that are required,
   there may be one more message to send before the peer can also finish
   processing (i.e., outbound != None).

5.9.  Star Topology (Any Number of Aggregators)

   The ping-pong topology of the previous section is only suitable for
   VDAFs involving exactly two Aggregators.  If the VDAF supports more
   than two Aggregators, then the star topology described in this
   section can be used instead.

   We again designate an Aggregator to initiate the computation.  We
   refer to this Aggregator as the Leader and to all other Aggregators
   as Helpers.

   At the start of each round, the Leader requests from each Helper its
   prep share.  After gathering each of the prep shares, the Leader
   computes the next prep message (via prep_shares_to_prep()) and
   broadcasts it to the Helpers.  At this point, each Aggregator runs
   prep_next() locally to either recover an output share or, if more
   rounds of preparation are required, compute its updated state and
   prep share.  If more are required, then the Helper responds to the
   broadcast message with its next prep share.

   The Aggregators proceed in this way until each recovers an output
   share or some step of the computation fails.

Barnes, et al.            Expires 7 April 2025                 [Page 42]
Internet-Draft                    VDAF                      October 2024

6.  Preliminaries

   This section describes the primitives that are common to the VDAFs
   specified in this document.

6.1.  Finite Fields

   Both Prio3 and Poplar1 use finite fields of prime order.  Finite
   field elements are represented by a class Field with the following
   associated parameters:

   *  MODULUS: int is the prime modulus that defines the field.

   *  ENCODED_SIZE: int is the number of bytes used to encode a field
      element as a byte string.

   A concrete Field also implements the following class methods:

   *  Field.zeros(length: int) -> list[Self] returns a vector of zeros.

      Pre-conditions:

      -  length MUST be greater than or equal 0.

      Post-conditions:

      -  The length of the output MUST be length.

   *  Field.rand_vec(length: int) -> list[Self] returns a vector of
      random field elements.  Same pre- and post-conditions as for
      Field.zeros().

   A field element is an instance of a concrete Field.  The concrete
   class defines the usual arithmetic operations on field elements.  In
   addition, it defines the following instance method for converting a
   field element to an unsigned integer:

   *  elem.as_unsigned() -> int returns the integer representation of
      field element elem.

   Likewise, each concrete Field implements a constructor for converting
   an unsigned integer into a field element:

   *  Field(integer: int) returns integer represented as a field
      element.  The value of integer MUST be non-negative and less than
      Field.MODULUS.

Barnes, et al.            Expires 7 April 2025                 [Page 43]
Internet-Draft                    VDAF                      October 2024

   Each concrete Field has two derived class methods, one for encoding a
   vector of field elements as a byte string and another for decoding a
   vector of field elements.

   def encode_vec(cls, vec: list[Self]) -> bytes:
       """
       Encode a vector of field elements `vec` as a byte string.
       """
       encoded = bytes()
       for x in vec:
           encoded += to_le_bytes(x.as_unsigned(), cls.ENCODED_SIZE)
       return encoded

   def decode_vec(cls, encoded: bytes) -> list[Self]:
       """
       Parse a vector of field elements from `encoded`.
       """
       L = cls.ENCODED_SIZE
       if len(encoded) % L != 0:
           raise ValueError(
               'input length must be a multiple of the size of an '
               'encoded field element')

       vec = []
       for i in range(0, len(encoded), L):
           encoded_x = encoded[i:i+L]
           x = from_le_bytes(encoded_x)
           if x >= cls.MODULUS:
               raise ValueError('modulus overflow')
           vec.append(cls(x))
       return vec

             Figure 9: Derived class methods for finite fields.

   Finally, Field implements the following methods for representing a
   value as a sequence of field elements, each of which represents a bit
   of the input.

Barnes, et al.            Expires 7 April 2025                 [Page 44]
Internet-Draft                    VDAF                      October 2024

   def encode_into_bit_vector(
           cls,
           val: int,
           bits: int) -> list[Self]:
       """
       Encode the bit representation of `val` with at most `bits` number
       of bits, as a vector of field elements.

       Pre-conditions:

           - `val >= 0`
           - `bits >= 0`
       """
       if val >= 2 ** bits:
           # Sanity check we are able to represent `val` with `bits`
           # number of bits.
           raise ValueError("Number of bits is not enough to represent "
                            "the input integer.")
       encoded = []
       for l in range(bits):
           encoded.append(cls((val >> l) & 1))
       return encoded

   def decode_from_bit_vector(cls, vec: list[Self]) -> Self:
       """
       Decode the field element from the bit representation, expressed
       as a vector of field elements `vec`.
       """
       bits = len(vec)
       if cls.MODULUS >> bits == 0:
           raise ValueError("Number of bits is too large to be "
                            "represented by field modulus.")
       decoded = cls(0)
       for (l, bit) in enumerate(vec):
           decoded += cls(1 << l) * bit
       return decoded

    Figure 10: Derived class methods to encode integers into bit vector
                              representation.

6.1.1.  Auxiliary Functions

   The following auxiliary functions on vectors of field elements are
   used in the remainder of this document.  Note that an exception is
   raised by each function if the operands are not the same length.

Barnes, et al.            Expires 7 April 2025                 [Page 45]
Internet-Draft                    VDAF                      October 2024

   def vec_sub(left: list[F], right: list[F]) -> list[F]:
       """
       Subtract the right operand from the left and return the result.
       """
       if len(left) != len(right):
           raise ValueError("mismatched vector sizes")
       return list(map(lambda x: x[0] - x[1], zip(left, right)))

   def vec_add(left: list[F], right: list[F]) -> list[F]:
       """Add the right operand to the left and return the result."""
       if len(left) != len(right):
           raise ValueError("mismatched vector sizes")
       return list(map(lambda x: x[0] + x[1], zip(left, right)))

   def vec_neg(vec: list[F]) -> list[F]:
       """Negate the input vector."""
       return list(map(lambda x: -x, vec))

               Figure 11: Common functions for finite fields.

6.1.2.  NTT-Friendly Fields

   Some VDAFs require fields that are suitable for efficient computation
   of the number theoretic transform (NTT) [SML24], as this allows for
   fast polynomial interpolation.  (One example is Prio3 (Section 7)
   when instantiated with the FLP of Section 7.3.3.)  Specifically, a
   field is said to be "NTT-friendly" if, in addition to satisfying the
   interface described in Section 6.1, it implements the following
   method:

   *  Field.gen() -> Field returns the generator of a large subgroup of
      the multiplicative group.  To be NTT-friendly, the order of this
      subgroup MUST be a power of 2.  In addition, the size of the
      subgroup dictates how large interpolated polynomials can be.  It
      is RECOMMENDED that a generator is chosen with order at least
      2^20.

   NTT-friendly fields also define the following parameter:

   *  GEN_ORDER: int is the order of a multiplicative subgroup generated
      by Field.gen().

6.1.3.  Parameters

   The tables below define finite fields used in the remainder of this
   document.

Barnes, et al.            Expires 7 April 2025                 [Page 46]
Internet-Draft                    VDAF                      October 2024

   +==============+================+=======================+==========+
   | Parameter    | Field64        | Field128              | Field255 |
   +==============+================+=======================+==========+
   | MODULUS      | 2^32 *         | 2^66 *                | 2^255 -  |
   |              | 4294967295 + 1 | 4611686018427387897 + | 19       |
   |              |                | 1                     |          |
   +--------------+----------------+-----------------------+----------+
   | ENCODED_SIZE | 8              | 16                    | 32       |
   +--------------+----------------+-----------------------+----------+
   | Generator    | 7^4294967295   | 7^4611686018427387897 | n/a      |
   +--------------+----------------+-----------------------+----------+
   | GEN_ORDER    | 2^32           | 2^66                  | n/a      |
   +--------------+----------------+-----------------------+----------+

     Table 3: Parameters for the finite fields used in this document.

6.2.  Extendable Output Functions

   VDAFs in this specification use extendable output functions (XOFs) to
   extract short, fixed-length strings we call "seeds" from long input
   strings and expand seeds into long output strings.  We specify a
   single interface that is suitable for both purposes.

   XOFs are defined by a class Xof with the following associated
   parameter and methods:

   *  SEED_SIZE: int is the size (in bytes) of a seed.

   *  Xof(seed: bytes, dst: bytes, binder: bytes) constructs an instance
      of Xof from the given seed, domain separation tag, and binder
      string.  (See below for definitions of these.)  The length of the
      seed will typically be SEED_SIZE, but some XOFs may support
      multiple seed sizes.  The seed MUST be generated securely (i.e.,
      it is either the output of a CSPRNG or a previous invocation of
      the XOF).

   *  xof.next(length: int) returns the next length bytes of output of
      xof.

   Each Xof has three derived methods.  The first is used to derive a
   fresh seed from an existing one.  The second is used to compute a
   sequence of field elements.  The third is a convenience method to
   construct an Xof from a seed, domain separation tag, and binder
   string, and then use it to compute a sequence of field elements.

Barnes, et al.            Expires 7 April 2025                 [Page 47]
Internet-Draft                    VDAF                      October 2024

   def derive_seed(cls,
                   seed: bytes,
                   dst: bytes,
                   binder: bytes) -> bytes:
       """
       Derive a new seed.

       Pre-conditions:

           - `len(seed) == Xof.SEED_SIZE`
       """
       xof = cls(seed, dst, binder)
       return xof.next(cls.SEED_SIZE)

   def next_vec(self, field: type[F], length: int) -> list[F]:
       """
       Output the next `length` field elements.

       Pre-conditions:

           - `field` is sub-class of `Field`
           - `length > 0`
       """
       m = next_power_of_2(field.MODULUS) - 1
       vec: list[F] = []
       while len(vec) < length:
           x = from_le_bytes(self.next(field.ENCODED_SIZE))
           x &= m
           if x < field.MODULUS:
               vec.append(field(x))
       return vec

   def expand_into_vec(cls,
                       field: type[F],
                       seed: bytes,
                       dst: bytes,
                       binder: bytes,
                       length: int) -> list[F]:
       """
       Expand the input `seed` into vector of `length` field elements.

       Pre-conditions:

           - `field` is sub-class of `Field`
           - `len(seed) == Xof.SEED_SIZE`
           - `length > 0`
       """
       xof = cls(seed, dst, binder)

Barnes, et al.            Expires 7 April 2025                 [Page 48]
Internet-Draft                    VDAF                      October 2024

       return xof.next_vec(field, length)

                    Figure 12: Derived methods for XOFs.

6.2.1.  XofTurboShake128

   This section describes XofTurboShake128, an XOF based on the
   TurboSHAKE128 [TurboSHAKE].  (RFC EDITOR: Update this reference to
   the RFC for draft-irtf-cfrg-kangarootwelve once published.)  This XOF
   is RECOMMENDED for all use cases within VDAFs.  The length of the
   domain separation string dst passed to XofTurboShake128 MUST NOT
   exceed 65535 bytes.

   class XofTurboShake128(Xof):
       """XOF wrapper for TurboSHAKE128."""

       # Associated parameters
       SEED_SIZE = 32

       def __init__(self, seed: bytes, dst: bytes, binder: bytes):
           self.l = 0
           self.m = \
               to_le_bytes(len(dst), 2) + dst \
               to_le_bytes(len(seed), 1) + seed + \
               binder

       def next(self, length: int) -> bytes:
           self.l += length

           # Function `TurboSHAKE128(M, D, L)` is as defined in
           # Section 2.2 of [TurboSHAKE].
           #
           # Implementation note: rather than re-generate the output
           # stream each time `next()` is invoked, most implementations
           # of TurboSHAKE128 will expose an "absorb-then-squeeze" API
           # that allows stateful handling of the stream.
           stream = TurboSHAKE128(self.m, 1, self.l)
           return stream[-length:]

               Figure 13: Definition of XOF XofTurboShake128.

Barnes, et al.            Expires 7 April 2025                 [Page 49]
Internet-Draft                    VDAF                      October 2024

6.2.2.  XofFixedKeyAes128

   While XofTurboShake128 as described above can be securely used in all
   cases where a XOF is needed in the VDAFs described in this document,
   there are some cases where a more efficient instantiation based on
   fixed-key AES is possible.  For now, this is limited to the XOF used
   inside the Idpf Section 8.1 implementation in Poplar1 Section 8.3.
   It is NOT RECOMMENDED to use this XOF anywhere else.  The length of
   the domain separation string dst passed to XofFixedKeyAes128 MUST NOT
   exceed 65535 bytes.  See Section 9 for a more detailed discussion.

   class XofFixedKeyAes128(Xof):
       """
       XOF based on a circular collision-resistant hash function from
       fixed-key AES.
       """

       # Associated parameters
       SEED_SIZE = 16

       def __init__(self, seed: bytes, dst: bytes, binder: bytes):
           if len(seed) != self.SEED_SIZE:
               raise ValueError("incorrect seed size")

           self.length_consumed = 0

           # Use TurboSHAKE128 to derive a key from the binder string
           # and domain separation tag. Note that the AES key does not
           # need to be kept secret from any party. However, when used
           # with an IDPF, we require the binder to be a random nonce.
           #
           # Implementation note: this step can be cached across XOF
           # evaluations with many different seeds.
           dst_length = to_le_bytes(len(dst), 2)
           self.fixed_key = TurboSHAKE128(
               dst_length + dst + binder,
               2,
               16,
           )
           self.seed = seed

       def next(self, length: int) -> bytes:
           offset = self.length_consumed % 16
           new_length = self.length_consumed + length
           block_range = range(
               self.length_consumed // 16,
               new_length // 16 + 1
           )

Barnes, et al.            Expires 7 April 2025                 [Page 50]
Internet-Draft                    VDAF                      October 2024

           self.length_consumed = new_length

           hashed_blocks = [
               self.hash_block(xor(self.seed, to_le_bytes(i, 16)))
               for i in block_range
           ]
           return concat(hashed_blocks)[offset:offset+length]

       def hash_block(self, block: bytes) -> bytes:
           """
           The multi-instance tweakable circular correlation-robust hash
           function of [GKWWY20] (Section 4.2). The tweak here is the
           key that stays constant for all XOF evaluations of the same
           Client, but differs between Clients.

           Function `AES128(key, block)` is the AES-128 blockcipher.
           """
           lo, hi = block[:8], block[8:]
           sigma_block = concat([hi, xor(hi, lo)])
           return xor(AES128(self.fixed_key, sigma_block), sigma_block)

6.2.3.  The Domain Separation Tag and Binder String

   XOFs are used to map a seed to a finite domain, e.g., a fresh seed or
   a vector of field elements.  To ensure domain separation, the
   derivation is needs to be bound to some distinguished domain
   separation tag.  The domain separation tag encodes the following
   values:

   1.  The document version (i.e.,VERSION);

   2.  The "class" of the algorithm using the output (e.g., VDAF);

   3.  A unique identifier for the algorithm; and

   4.  Some indication of how the output is used (e.g., for deriving the
       measurement shares in Prio3 Section 7).

   The following algorithm is used in the remainder of this document in
   order to format the domain separation tag:

Barnes, et al.            Expires 7 April 2025                 [Page 51]
Internet-Draft                    VDAF                      October 2024

   def format_dst(algo_class: int,
                  algo: int,
                  usage: int) -> bytes:
       """
       Format XOF domain separation tag for use within a (V)DAF.

       Pre-conditions:

           - `algo_class` in `range(0, 2 ** 8)`
           - `algo` in `range(0, 2 ** 32)`
           - `usage` in `range(0, 2 ** 16)`
       """
       return concat([
           to_be_bytes(VERSION, 1),
           to_be_bytes(algo_class, 1),
           to_be_bytes(algo, 4),
           to_be_bytes(usage, 2),
       ])

   It is also sometimes necessary to bind the output to some ephemeral
   value that multiple parties need to agree on.  We call this input the
   "binder string".

7.  Prio3

   This section describes Prio3, a VDAF for Prio [CGB17].  Prio is
   suitable for a wide variety of aggregation functions, including (but
   not limited to) sum, mean, standard deviation, estimation of
   quantiles (e.g., median), and linear regression.  In fact, the scheme
   described in this section is compatible with any aggregation function
   that has the following structure:

   *  Each measurement is encoded as a vector over some finite field.

   *  Measurement validity is determined by an arithmetic circuit
      evaluated over the encoded measurement.  (An "arithmetic circuit"
      is a function comprised of arithmetic operations in the field.)
      The circuit's output is a single field element: if zero, then the
      measurement is said to be "valid"; otherwise, if the output is
      non-zero, then the measurement is said to be "invalid".

   *  The aggregate result is obtained by summing up the encoded
      measurement vectors and computing some function of the sum.

   At a high level, Prio3 distributes this computation as follows.  Each
   Client first shards its measurement by first encoding it, then
   splitting the vector into secret shares and sending a share to each
   Aggregator.  Next, in the preparation phase, the Aggregators carry

Barnes, et al.            Expires 7 April 2025                 [Page 52]
Internet-Draft                    VDAF                      October 2024

   out a multi-party computation to determine if their shares correspond
   to a valid measurement (as determined by the arithmetic circuit).
   This computation involves a "proof" of validity generated by the
   Client.  Next, each Aggregator sums up its shares locally.  Finally,
   the Collector sums up the aggregate shares and computes the aggregate
   result.

   This VDAF does not have an aggregation parameter.  Instead, the
   output share is derived from the measurement share by applying a
   fixed map.  See Section 8 for an example of a VDAF that makes
   meaningful use of the aggregation parameter.

   The core component of Prio3 is a "Fully Linear Proof (FLP)" system.
   Introduced by [BBCGGI19], the FLP encapsulates the functionality
   required for encoding and validating measurements.  Prio3 can be
   thought of as a transformation of a particular class of FLPs into a
   VDAF.

   The remainder of this section is structured as follows.  The syntax
   for FLPs is described in Section 7.1.  The generic transformation of
   an FLP into Prio3 is specified in Section 7.2.  Next, a concrete FLP
   suitable for any validity circuit is specified in Section 7.3.
   Finally, instantiations of Prio3 for various types of measurements
   are specified in Section 7.4.  Test vectors can be found in
   Appendix "Test Vectors".

7.1.  Fully Linear Proof (FLP) Systems

   Conceptually, an FLP is a two-party protocol executed by a prover and
   a verifier.  In actual use, however, the prover's computation is
   carried out by the Client, and the verifier's computation is
   distributed among the Aggregators.  The Client generates a "proof" of
   its measurement's validity and distributes shares of the proof to the
   Aggregators.  Each Aggregator then performs some computation on its
   measurement share and proof share locally and sends the result to the
   other Aggregators.  Combining the exchanged messages allows each
   Aggregator to decide if it holds a share of a valid measurement.
   (See Section 7.2 for details.)

   As usual, we will describe the interface implemented by a concrete
   FLP in terms of an abstract base class Flp that specifies the set of
   methods and parameters a concrete FLP must provide.

   The parameters provided by a concrete FLP are listed in Table 4.

Barnes, et al.            Expires 7 April 2025                 [Page 53]
Internet-Draft                    VDAF                      October 2024

       +================+==========================================+
       | Parameter      | Description                              |
       +================+==========================================+
       | PROVE_RAND_LEN | Length of the prover randomness, the     |
       |                | number of random field elements consumed |
       |                | by the prover when generating a proof    |
       +----------------+------------------------------------------+
       | QUERY_RAND_LEN | Length of the query randomness, the      |
       |                | number of random field elements consumed |
       |                | by the verifier                          |
       +----------------+------------------------------------------+
       | JOINT_RAND_LEN | Length of the joint randomness, the      |
       |                | number of random field elements consumed |
       |                | by both the prover and verifier          |
       +----------------+------------------------------------------+
       | MEAS_LEN       | Length of the encoded measurement        |
       |                | (Section 7.1.1)                          |
       +----------------+------------------------------------------+
       | OUTPUT_LEN     | Length of the aggregatable output        |
       |                | (Section 7.1.1)                          |
       +----------------+------------------------------------------+
       | PROOF_LEN      | Length of the proof                      |
       +----------------+------------------------------------------+
       | VERIFIER_LEN   | Length of the verifier message generated |
       |                | by querying the measurement and proof    |
       +----------------+------------------------------------------+
       | Measurement    | Type of the measurement                  |
       +----------------+------------------------------------------+
       | AggResult      | Type of the aggregate result             |
       +----------------+------------------------------------------+
       | field          | Class object for the field (Section 6.1) |
       +----------------+------------------------------------------+

          Table 4: Constants and types defined by a concrete FLP.

   An FLP specifies the following algorithms for generating and
   verifying proofs of validity (encoding is described below in
   Section 7.1.1):

   *  flp.prove(meas: list[F], prove_rand: list[F], joint_rand: list[F])
      -> list[F] is the deterministic proof-generation algorithm run by
      the prover.  Its inputs are the encoded measurement, the "prover
      randomness" prove_rand, and the "joint randomness" joint_rand.
      The prover randomness is used only by the prover, but the joint
      randomness is shared by both the prover and verifier.

Barnes, et al.            Expires 7 April 2025                 [Page 54]
Internet-Draft                    VDAF                      October 2024

   *  flp.query(meas: list[F], proof: list[F], query_rand: list[F],
      joint_rand: list[F], num_shares: int) -> list[F] is the query-
      generation algorithm run by the verifier.  This is used to "query"
      the measurement and proof.  The result of the query (i.e., the
      output of this function) is called the "verifier message".  In
      addition to the measurement and proof, this algorithm takes as
      input the query randomness query_rand and the joint randomness
      joint_rand.  The former is used only by the verifier. num_shares
      specifies how many shares were generated.

   *  flp.decide(verifier: list[F]) -> bool is the deterministic
      decision algorithm run by the verifier.  It takes as input the
      verifier message and outputs a boolean indicating if the
      measurement from which it was generated is valid.

   Our application requires that the FLP is "fully linear" in the sense
   defined in [BBCGGI19].  As a practical matter, what this property
   implies is that, when run on a share of the measurement and proof,
   the query-generation algorithm outputs a share of the verifier
   message.  Furthermore, the privacy property of the FLP system ensures
   that the verifier message reveals nothing about the measurement other
   than whether it is valid.  Therefore, to decide if a measurement is
   valid, the Aggregators will run the query-generation algorithm
   locally, exchange verifier shares, combine them to recover the
   verifier message, and run the decision algorithm.

   The query-generation algorithm includes a parameter num_shares that
   specifies the number of shares that were generated.  If these data
   are not secret shared, then num_shares == 1.  This parameter is
   useful for certain FLP constructions.  For example, the FLP in
   Section 7.3 is defined in terms of an arithmetic circuit; when the
   circuit contains constants, it is sometimes necessary to normalize
   those constants to ensure that the circuit's output, when run on a
   valid measurement, is the same regardless of the number of shares.

   An FLP is executed by the prover and verifier as follows:

Barnes, et al.            Expires 7 April 2025                 [Page 55]
Internet-Draft                    VDAF                      October 2024

   def run_flp(
           flp: Flp[Measurement, AggResult, F],
           meas: list[F],
           num_shares: int) -> bool:
       """Run the FLP on an encoded measurement."""

       joint_rand = flp.field.rand_vec(flp.JOINT_RAND_LEN)
       prove_rand = flp.field.rand_vec(flp.PROVE_RAND_LEN)
       query_rand = flp.field.rand_vec(flp.QUERY_RAND_LEN)

       # Prover generates the proof.
       proof = flp.prove(meas, prove_rand, joint_rand)

       # Shard the measurement and the proof.
       meas_shares = additive_secret_share(
           meas,
           num_shares,
           flp.field,
       )
       proof_shares = additive_secret_share(
           proof,
           num_shares,
           flp.field,
       )

       # Verifier queries the meas shares and proof shares.
       verifier_shares = [
           flp.query(
               meas_share,
               proof_share,
               query_rand,
               joint_rand,
               num_shares,
           )
           for meas_share, proof_share in zip(meas_shares, proof_shares)
       ]

       # Combine the verifier shares into the verifier.
       verifier = flp.field.zeros(len(verifier_shares[0]))
       for verifier_share in verifier_shares:
           verifier = vec_add(verifier, verifier_share)

       # Verifier decides if the measurement is valid.
       return flp.decide(verifier)

                      Figure 14: Execution of an FLP.

Barnes, et al.            Expires 7 April 2025                 [Page 56]
Internet-Draft                    VDAF                      October 2024

   The proof system is constructed so that, if meas is valid, then
   run_flp(flp, meas, 1) always returns True.  On the other hand, if
   meas is invalid, then as long as joint_rand and query_rand are
   generated uniform randomly, the output is False with high
   probability.  False positives are possible: there is a small
   probability that a verifier accepts an invalid input as valid.  An
   FLP is said to be "sound" if this probability is sufficiently small.
   The soundness of the FLP depends on a variety of parameters, like the
   length of the input and the size of the field.  See Section 7.3 for
   details.

   Note that soundness of an FLP system is not the same as robustness
   for a VDAF In particular, soundness of the FLP is necessary, but
   insufficient for robusntess of Prio3 (Section 7).  See Section 9.6
   for details.

   We remark that [BBCGGI19] defines a much larger class of fully linear
   proof systems than we consider here.  In particular, what is called
   an "FLP" here is called a 1.5-round, public-coin, interactive oracle
   proof system in their paper.

7.1.1.  Encoding the Input

   The type of measurement being aggregated is defined by the FLP.
   Hence, the FLP also specifies a method of encoding raw measurements
   as a vector of field elements:

   *  flp.encode(measurement: Measurement) -> list[F] encodes a raw
      measurement as a vector of field elements.  The return value MUST
      be of length MEAS_LEN.

   For some FLPs, the encoded measurement also includes redundant field
   elements that are useful for checking the proof, but which are not
   needed after the proof has been checked.  An example is the "integer
   sum" data type from [CGB17] in which an integer in range(2**k) is
   encoded as a vector of k field elements, each representing a bit of
   the integer (this type is also defined in Section 7.4.2).  After
   consuming this vector, all that is needed is the integer it
   represents.  Thus the FLP defines an algorithm for truncating the
   encoded measurement to the length of the aggregated output:

   *  flp.truncate(meas: list[F]) -> list[F] maps an encoded measurement
      (e.g., the bit-encoding of the measurement) to an aggregatable
      output (e.g., the singleton vector containing the measurement).
      The length of the input MUST be MEAS_LEN and the length of the
      output MUST be OUTPUT_LEN.

Barnes, et al.            Expires 7 April 2025                 [Page 57]
Internet-Draft                    VDAF                      October 2024

   Once the aggregate shares have been computed and combined together,
   their sum can be converted into the aggregate result.  This could be
   a projection from the FLP's field to the integers, or it could
   include additional post-processing.

   *  flp.decode(output: list[F], num_measurements: int) -> AggResult
      maps a sum of aggregate shares to an aggregate result.

      Pre-conditions:

      -  The length of output MUST be OUTPUT_LEN.

      -  num_measurements MUST equal the number of measurements that
         contributed to the output.

   We remark that, taken together, these three functionalities
   correspond roughly to the notion of "Affine-aggregatable encodings
   (AFEs)" from [CGB17].

7.1.2.  Multiple Proofs

   It is sometimes desirable to generate and verify multiple independent
   proofs for the same input.  First, this improves the soundness of the
   proof system without having to change any of its parameters.  Second,
   it allows a smaller field to be used (e.g., replace Field128 with
   Field64, see Section 7.3) without sacrificing soundness.  Generally,
   choosing a smaller field can significantly reduce communication cost.
   (This is a trade-off, of course, since generating and verifying more
   proofs requires more time.)  Given these benefits, this feature is
   implemented by Prio3 (Section 7).

   To generate these proofs for a specific measurement, the prover calls
   flp.prove multiple times, each time using an independently generated
   prover and joint randomness string.  The verifier checks each proof
   independently, each time with an independently generated query
   randomness string.  It accepts the measurement only if all the
   decision algorithm accepts on each proof.

   See Section 9.6 below for discussions on choosing the right number of
   proofs.

7.2.  Construction

   This section specifies Prio3, an implementation of the Vdaf interface
   (Section 5).  It has three generic parameters: an NttField ({{ntt-
   field}}), anFlp({{flp}}) and aXof({{xof}}).  It also has an
   associated constant,PROOFS, with a value inrange(1, 256)`, denoting
   the number of FLPs generated by the Client (Section 7.1.2).

Barnes, et al.            Expires 7 April 2025                 [Page 58]
Internet-Draft                    VDAF                      October 2024

   The associated constants and types required by the Vdaf interface are
   defined in Table 5.  The methods required for sharding, preparation,
   aggregation, and unsharding are described in the remaining
   subsections.  These methods refer to constants enumerated in Table 6.

    +=================+==============================================+
    | Parameter       | Value                                        |
    +=================+==============================================+
    | VERIFY_KEY_SIZE | Xof.SEED_SIZE                                |
    +-----------------+----------------------------------------------+
    | RAND_SIZE       | Xof.SEED_SIZE * SHARES if flp.JOINT_RAND_LEN |
    |                 | == 0 else 2 * Xof.SEED_SIZE * SHARES         |
    +-----------------+----------------------------------------------+
    | NONCE_SIZE      | 16                                           |
    +-----------------+----------------------------------------------+
    | ROUNDS          | 1                                            |
    +-----------------+----------------------------------------------+
    | SHARES          | in [2, 256)                                  |
    +-----------------+----------------------------------------------+
    | Measurement     | Flp.Measurement                              |
    +-----------------+----------------------------------------------+
    | AggParam        | None                                         |
    +-----------------+----------------------------------------------+
    | PublicShare     | Optional[list[bytes]]                        |
    +-----------------+----------------------------------------------+
    | InputShare      | tuple[list[F], list[F], Optional[bytes]] |   |
    |                 | tuple[bytes, Optional[bytes]]                |
    +-----------------+----------------------------------------------+
    | OutShare        | list[F]                                      |
    +-----------------+----------------------------------------------+
    | AggShare        | list[F]                                      |
    +-----------------+----------------------------------------------+
    | AggResult       | Flp.AggResult                                |
    +-----------------+----------------------------------------------+
    | PrepState       | tuple[list[F], Optional[bytes]]              |
    +-----------------+----------------------------------------------+
    | PrepShare       | tuple[list[F], Optional[bytes]]              |
    +-----------------+----------------------------------------------+
    | PrepMessage     | Optional[bytes]                              |
    +-----------------+----------------------------------------------+

                   Table 5: VDAF parameters for Prio3.

Barnes, et al.            Expires 7 April 2025                 [Page 59]
Internet-Draft                    VDAF                      October 2024

                  +=============================+=======+
                  | Variable                    | Value |
                  +=============================+=======+
                  | USAGE_MEAS_SHARE: int       | 1     |
                  +-----------------------------+-------+
                  | USAGE_PROOF_SHARE: int      | 2     |
                  +-----------------------------+-------+
                  | USAGE_JOINT_RANDOMNESS: int | 3     |
                  +-----------------------------+-------+
                  | USAGE_PROVE_RANDOMNESS: int | 4     |
                  +-----------------------------+-------+
                  | USAGE_QUERY_RANDOMNESS: int | 5     |
                  +-----------------------------+-------+
                  | USAGE_JOINT_RAND_SEED: int  | 6     |
                  +-----------------------------+-------+
                  | USAGE_JOINT_RAND_PART: int  | 7     |
                  +-----------------------------+-------+

                     Table 6: Constants used by Prio3.

7.2.1.  Sharding

   Recall from Section 7.1 that the FLP syntax calls for "joint
   randomness" shared by the prover (i.e., the Client) and the verifier
   (i.e., the Aggregators).  VDAFs have no such notion.  Instead, the
   Client derives the joint randomness from its measurement in a way
   that allows the Aggregators to reconstruct it from their shares.
   (This idea is based on the Fiat-Shamir heuristic and is described in
   Section 6.2.3 of [BBCGGI19].)

   The sharding algorithm involves the following steps:

   1.  Encode the Client's measurement for the FLP

   2.  Shard the measurement into a sequence of measurement shares

   3.  Derive the joint randomness from the measurement shares and nonce

   4.  Run the FLP proof-generation algorithm using the derived joint
       randomness

   5.  Shard the proof into a sequence of proof shares

   6.  Return the public share, consisting of the joint randomness
       parts, and the input shares, each consisting of the measurement
       share, proof share, and blind of one of the Aggregators

Barnes, et al.            Expires 7 April 2025                 [Page 60]
Internet-Draft                    VDAF                      October 2024

   As described in Section 7.1.2, the soundness of the FLP can be
   amplified by generating and verifying multiple FLPs.  (This in turn
   improves the robustness of Prio3.)  To support this, in Prio3:

   *  In step 3, derive as much joint randomness as required by PROOFS
      proofs

   *  Repeat step 4 PROOFS times, each time with a unique joint
      randomness

   Depending on the FLP, joint randomness may not be required.  In
   particular, when flp.JOINT_RAND_LEN == 0, the Client does not derive
   the joint randomness (Step 3).  The sharding algorithm is specified
   below.

   def shard(
           self,
           ctx: bytes,
           measurement: Measurement,
           nonce: bytes,
           rand: bytes) -> tuple[
               Optional[list[bytes]],
               list[Prio3InputShare]]:
       if len(nonce) != self.NONCE_SIZE:
           raise ValueError("incorrect nonce size")
       if len(rand) != self.RAND_SIZE:
           raise ValueError("incorrect size of random bytes argument")

       l = self.xof.SEED_SIZE
       seeds = [rand[i:i + l] for i in range(0, self.RAND_SIZE, l)]

       meas = self.flp.encode(measurement)
       if self.flp.JOINT_RAND_LEN > 0:
           return self.shard_with_joint_rand(ctx, meas, nonce, seeds)
       else:
           return self.shard_without_joint_rand(ctx, meas, seeds)

             Figure 15: Input-distribution algorithm for Prio3.

   It starts by splitting the randomness into seeds.  It then encodes
   the measurement as prescribed by the FLP and calls one of two
   methods, depending on whether joint randomness is required by the
   FLP.  The methods are defined in the subsections below.

7.2.1.1.  FLPs without joint randomness

   The following method is used for FLPs that do not require joint
   randomness, i.e., when flp.JOINT_RAND_LEN == 0:

Barnes, et al.            Expires 7 April 2025                 [Page 61]
Internet-Draft                    VDAF                      October 2024

   def shard_without_joint_rand(
           self,
           ctx: bytes,
           meas: list[F],
           seeds: list[bytes]) -> tuple[
               Optional[list[bytes]],
               list[Prio3InputShare[F]]]:
       k_helper_shares, seeds = front(self.SHARES - 1, seeds)
       (k_prove,), seeds = front(1, seeds)

       # Shard the encoded measurement into shares.
       leader_meas_share = meas
       for j in range(self.SHARES - 1):
           leader_meas_share = vec_sub(
               leader_meas_share,
               self.helper_meas_share(ctx, j + 1, k_helper_shares[j]),
           )

       # Generate and shard each proof into shares.
       prove_rands = self.prove_rands(ctx, k_prove)
       leader_proofs_share = []
       for _ in range(self.PROOFS):
           prove_rand, prove_rands = front(
               self.flp.PROVE_RAND_LEN, prove_rands)
           leader_proofs_share += self.flp.prove(meas, prove_rand, [])
       for j in range(self.SHARES - 1):
           leader_proofs_share = vec_sub(
               leader_proofs_share,
               self.helper_proofs_share(
                   ctx,
                   j + 1,
                   k_helper_shares[j],
               ),
           )

       # Each Aggregator's input share contains its measurement share
       # and share of proof(s).
       input_shares: list[Prio3InputShare[F]] = []
       input_shares.append((
           leader_meas_share,
           leader_proofs_share,
           None,
       ))
       for j in range(self.SHARES - 1):
           input_shares.append((
               k_helper_shares[j],
               None,
           ))

Barnes, et al.            Expires 7 April 2025                 [Page 62]
Internet-Draft                    VDAF                      October 2024

       return (None, input_shares)

    Figure 16: Sharding an encoded measurement without joint randomness.

   The steps in this method are as follows:

   1.  Shard the encoded measurement into shares

   2.  Generate and shard each proof into shares

   3.  Encode each measurement and shares of each proof into an input
       share

   Notice that only one pair of measurement and proof(s) share (called
   the "leader" shares above) are vectors of field elements.  The other
   shares (called the "helper" shares) are represented instead by an XOF
   seed, which is expanded into vectors of field elements.

   The methods on Prio3 for deriving the prover randomness, measurement
   shares, and proof shares and the methods for encoding the input
   shares are defined in Section 7.2.6.

7.2.1.2.  FLPs with joint randomness

   The following method is used for FLPs that require joint randomness,
   i.e., for which flp.JOINT_RAND_LEN > 0:

   def shard_with_joint_rand(
           self,
           ctx: bytes,
           meas: list[F],
           nonce: bytes,
           seeds: list[bytes]) -> tuple[
               Optional[list[bytes]],
               list[Prio3InputShare[F]]]:
       k_helper_seeds, seeds = front((self.SHARES - 1) * 2, seeds)
       k_helper_shares = [
           k_helper_seeds[i]
           for i in range(0, (self.SHARES - 1) * 2, 2)
       ]
       k_helper_blinds = [
           k_helper_seeds[i]
           for i in range(1, (self.SHARES - 1) * 2, 2)
       ]
       (k_leader_blind, k_prove), seeds = front(2, seeds)

       # Shard the encoded measurement into shares and compute the
       # joint randomness parts.

Barnes, et al.            Expires 7 April 2025                 [Page 63]
Internet-Draft                    VDAF                      October 2024

       leader_meas_share = meas
       k_joint_rand_parts = []
       for j in range(self.SHARES - 1):
           helper_meas_share = self.helper_meas_share(
               ctx, j + 1, k_helper_shares[j])
           leader_meas_share = vec_sub(leader_meas_share,
                                       helper_meas_share)
           k_joint_rand_parts.append(self.joint_rand_part(
               ctx, j + 1, k_helper_blinds[j],
               helper_meas_share, nonce))
       k_joint_rand_parts.insert(0, self.joint_rand_part(
           ctx, 0, k_leader_blind, leader_meas_share, nonce))

       # Generate the proof and shard it into proof shares.
       prove_rands = self.prove_rands(ctx, k_prove)
       joint_rands = self.joint_rands(
           ctx, self.joint_rand_seed(ctx, k_joint_rand_parts))
       leader_proofs_share = []
       for _ in range(self.PROOFS):
           prove_rand, prove_rands = front(
               self.flp.PROVE_RAND_LEN, prove_rands)
           joint_rand, joint_rands = front(
               self.flp.JOINT_RAND_LEN, joint_rands)
           leader_proofs_share += self.flp.prove(
               meas,
               prove_rand,
               joint_rand,
           )
       for j in range(self.SHARES - 1):
           leader_proofs_share = vec_sub(
               leader_proofs_share,
               self.helper_proofs_share(
                   ctx,
                   j + 1,
                   k_helper_shares[j],
               ),
           )

       # Each Aggregator's input share contains its measurement share,
       # share of proof(s), and blind. The public share contains the
       # Aggregators' joint randomness parts.
       input_shares: list[Prio3InputShare[F]] = []
       input_shares.append((
           leader_meas_share,
           leader_proofs_share,
           k_leader_blind,
       ))
       for j in range(self.SHARES - 1):

Barnes, et al.            Expires 7 April 2025                 [Page 64]
Internet-Draft                    VDAF                      October 2024

           input_shares.append((
               k_helper_shares[j],
               k_helper_blinds[j],
           ))
       return (k_joint_rand_parts, input_shares)

     Figure 17: Sharding an encoded measurement with joint randomness.

   The difference between this procedure and previous one is that here
   we compute joint randomnesses joint_rands, split it into multiple
   joint_rand, and pass each joint_rand to the proof generationg
   algorithm.  (In Figure 16 the joint randomness is the empty vector,
   [].)  This requires generating an additional value, called the
   "blind", that is incorporated into each input share.

   The joint randomness computation involves the following steps:

   1.  Compute a "joint randomness part" from each measurement share and
       blind

   2.  Compute a "joint randomness seed" from the joint randomness parts

   3.  Compute the joint randomness for each proof evaluation from the
       joint randomness seed

   This three-step process is designed to ensure that the joint
   randomness does not leak the measurement to the Aggregators while
   preventing a malicious Client from tampering with the joint
   randomness in a way that allows it to break robustness.  To bootstrap
   the required check, the Client encodes the joint randomness parts in
   the public share.  (See Section 7.2.2 for details.)

   The methods used in this computation are defined in Section 7.2.6.

7.2.2.  Preparation

   This section describes the process of recovering output shares from
   the input shares.  The high-level idea is that each Aggregator first
   queries its measurement and share of proof(s) locally, then exchanges
   its share of verifier(s) with the other Aggregators.  The shares of
   verifier(s) are then combined into the verifier message(s) used to
   decide whether to accept.

   In addition, for FLPs that require joint randomness, the Aggregators
   must ensure that they have all used the same joint randomness for the
   query-generation algorithm.  To do so, they collectively re-derive
   the joint randomness from their measurement shares just as the Client
   did during sharding.

Barnes, et al.            Expires 7 April 2025                 [Page 65]
Internet-Draft                    VDAF                      October 2024

   In order to avoid extra round of communication, the Client sends each
   Aggregator a "hint" consisting of the joint randomness parts.  This
   leaves open the possibility that the Client cheated by, say, forcing
   the Aggregators to use joint randomness that biases the proof check
   procedure some way in its favor.  To mitigate this, the Aggregators
   also check that they have all computed the same joint randomness seed
   before accepting their output shares.  To do so, they exchange their
   parts of the joint randomness along with their shares of verifier(s).

   Implementation note: the preparation state for Prio3 includes the
   output share that will be released once preparation is complete.  In
   some situations, it may be necessary for the Aggregator to encode
   this state as bytes and store it for retrieval later on.  For all but
   the first Aggregator, it is possible to save storage by storing the
   measurement share rather than output share itself.  It is relatively
   inexpensive to expand this seed into the input share, then truncate
   the input share to get the output share.

   The definitions of constants and a few auxiliary functions are
   defined in Section 7.2.6.

   def prep_init(
           self,
           verify_key: bytes,
           ctx: bytes,
           agg_id: int,
           _agg_param: None,
           nonce: bytes,
           public_share: Optional[list[bytes]],
           input_share: Prio3InputShare[F]) -> tuple[
               Prio3PrepState[F],
               Prio3PrepShare[F]]:
       k_joint_rand_parts = public_share
       (meas_share, proofs_share, k_blind) = \
           self.expand_input_share(ctx, agg_id, input_share)
       out_share = self.flp.truncate(meas_share)

       # Compute the joint randomness.
       joint_rand: list[F] = []
       k_corrected_joint_rand, k_joint_rand_part = None, None
       if self.flp.JOINT_RAND_LEN > 0:
           assert k_blind is not None
           assert k_joint_rand_parts is not None
           k_joint_rand_part = self.joint_rand_part(
               ctx, agg_id, k_blind, meas_share, nonce)
           k_joint_rand_parts[agg_id] = k_joint_rand_part
           k_corrected_joint_rand = self.joint_rand_seed(
               ctx, k_joint_rand_parts)

Barnes, et al.            Expires 7 April 2025                 [Page 66]
Internet-Draft                    VDAF                      October 2024

           joint_rands = self.joint_rands(ctx, k_corrected_joint_rand)

       # Query the measurement and proof share.
       query_rands = self.query_rands(verify_key, ctx, nonce)
       verifiers_share = []
       for _ in range(self.PROOFS):
           proof_share, proofs_share = front(
               self.flp.PROOF_LEN, proofs_share)
           query_rand, query_rands = front(
               self.flp.QUERY_RAND_LEN, query_rands)
           if self.flp.JOINT_RAND_LEN > 0:
               joint_rand, joint_rands = front(
                   self.flp.JOINT_RAND_LEN, joint_rands)
           verifiers_share += self.flp.query(
               meas_share,
               proof_share,
               query_rand,
               joint_rand,
               self.SHARES,
           )

       prep_state = (out_share, k_corrected_joint_rand)
       prep_share = (verifiers_share, k_joint_rand_part)
       return (prep_state, prep_share)

   def prep_next(
       self,
       _ctx: bytes,
       prep_state: Prio3PrepState[F],
       prep_msg: Optional[bytes]
   ) -> tuple[Prio3PrepState[F], Prio3PrepShare[F]] | list[F]:
       k_joint_rand = prep_msg
       (out_share, k_corrected_joint_rand) = prep_state

       # If joint randomness was used, check that the value computed by
       # the Aggregators matches the value indicated by the Client.
       if k_joint_rand != k_corrected_joint_rand:
           raise ValueError('joint randomness check failed')

       return out_share

   def prep_shares_to_prep(
           self,
           ctx: bytes,
           _agg_param: None,
           prep_shares: list[Prio3PrepShare[F]]) -> Optional[bytes]:
       # Unshard the verifier shares into the verifier message.
       verifiers = self.flp.field.zeros(

Barnes, et al.            Expires 7 April 2025                 [Page 67]
Internet-Draft                    VDAF                      October 2024

           self.flp.VERIFIER_LEN * self.PROOFS)
       k_joint_rand_parts = []
       for (verifiers_share, k_joint_rand_part) in prep_shares:
           verifiers = vec_add(verifiers, verifiers_share)
           if self.flp.JOINT_RAND_LEN > 0:
               assert k_joint_rand_part is not None
               k_joint_rand_parts.append(k_joint_rand_part)

       # Verify that each proof is well-formed and input is valid
       for _ in range(self.PROOFS):
           verifier, verifiers = front(self.flp.VERIFIER_LEN, verifiers)
           if not self.flp.decide(verifier):
               raise ValueError('proof verifier check failed')

       # Combine the joint randomness parts computed by the
       # Aggregators into the true joint randomness seed. This is
       # used in the last step.
       k_joint_rand = None
       if self.flp.JOINT_RAND_LEN > 0:
           k_joint_rand = self.joint_rand_seed(ctx, k_joint_rand_parts)
       return k_joint_rand

                  Figure 18: Preparation state for Prio3.

7.2.3.  Validity of Aggregation Parameters

   Every input share MUST only be used once, regardless of the
   aggregation parameters used.

   def is_valid(
           self,
           _agg_param: None,
           previous_agg_params: list[None]) -> bool:
       """
       Checks if `previous_agg_params` is empty, as input shares in
       Prio3 may only be used once.
       """
       return len(previous_agg_params) == 0

          Figure 19: Validity of aggregation parameters for Prio3.

7.2.4.  Aggregation

   Aggregating a set of output shares is simply a matter of adding up
   the vectors element-wise.

Barnes, et al.            Expires 7 April 2025                 [Page 68]
Internet-Draft                    VDAF                      October 2024

   def aggregate(
           self,
           _agg_param: None,
           out_shares: list[list[F]]) -> list[F]:
       agg_share = self.flp.field.zeros(self.flp.OUTPUT_LEN)
       for out_share in out_shares:
           agg_share = vec_add(agg_share, out_share)
       return agg_share

                Figure 20: Aggregation algorithm for Prio3.

7.2.5.  Unsharding

   To unshard a set of aggregate shares, the Collector first adds up the
   vectors element-wise.  It then converts each element of the vector
   into an integer.

   def unshard(
           self,
           _agg_param: None,
           agg_shares: list[list[F]],
           num_measurements: int) -> AggResult:
       agg = self.flp.field.zeros(self.flp.OUTPUT_LEN)
       for agg_share in agg_shares:
           agg = vec_add(agg, agg_share)
       return self.flp.decode(agg, num_measurements)

         Figure 21: Computation of the aggregate result for Prio3.

7.2.6.  Auxiliary Functions

   This section defines a number of auxiliary functions referenced by
   the main algorithms for Prio3 in the preceding sections.

   The following methods are called by the sharding and preparation
   algorithms.

   def helper_meas_share(
           self,
           ctx: bytes,
           agg_id: int,
           k_share: bytes) -> list[F]:
       return self.xof.expand_into_vec(
           self.flp.field,
           k_share,
           self.domain_separation_tag(USAGE_MEAS_SHARE, ctx),
           byte(agg_id),
           self.flp.MEAS_LEN,

Barnes, et al.            Expires 7 April 2025                 [Page 69]
Internet-Draft                    VDAF                      October 2024

       )

   def helper_proofs_share(
           self,
           ctx: bytes,
           agg_id: int,
           k_share: bytes) -> list[F]:
       return self.xof.expand_into_vec(
           self.flp.field,
           k_share,
           self.domain_separation_tag(USAGE_PROOF_SHARE, ctx),
           byte(self.PROOFS) + byte(agg_id),
           self.flp.PROOF_LEN * self.PROOFS,
       )

   def expand_input_share(
           self,
           ctx: bytes,
           agg_id: int,
           input_share: Prio3InputShare[F]) -> tuple[
               list[F],
               list[F],
               Optional[bytes]]:
       if agg_id > 0:
           assert len(input_share) == 2
           (k_share, k_blind) = input_share
           meas_share = self.helper_meas_share(ctx, agg_id, k_share)
           proofs_share = self.helper_proofs_share(ctx, agg_id, k_share)
       else:
           assert len(input_share) == 3
           (meas_share, proofs_share, k_blind) = input_share
       return (meas_share, proofs_share, k_blind)

   def prove_rands(self, ctx: bytes, k_prove: bytes) -> list[F]:
       return self.xof.expand_into_vec(
           self.flp.field,
           k_prove,
           self.domain_separation_tag(USAGE_PROVE_RANDOMNESS, ctx),
           byte(self.PROOFS),
           self.flp.PROVE_RAND_LEN * self.PROOFS,
       )

   def query_rands(
           self,
           verify_key: bytes,
           ctx: bytes,
           nonce: bytes) -> list[F]:
       return self.xof.expand_into_vec(

Barnes, et al.            Expires 7 April 2025                 [Page 70]
Internet-Draft                    VDAF                      October 2024

           self.flp.field,
           verify_key,
           self.domain_separation_tag(USAGE_QUERY_RANDOMNESS, ctx),
           byte(self.PROOFS) + nonce,
           self.flp.QUERY_RAND_LEN * self.PROOFS,
       )

   def joint_rand_part(
           self,
           ctx: bytes,
           agg_id: int,
           k_blind: bytes,
           meas_share: list[F],
           nonce: bytes) -> bytes:
       return self.xof.derive_seed(
           k_blind,
           self.domain_separation_tag(USAGE_JOINT_RAND_PART, ctx),
           byte(agg_id) + nonce + self.flp.field.encode_vec(meas_share),
       )

   def joint_rand_seed(self,
                       ctx: bytes,
                       k_joint_rand_parts: list[bytes]) -> bytes:
       """Derive the joint randomness seed from its parts."""
       return self.xof.derive_seed(
           zeros(self.xof.SEED_SIZE),
           self.domain_separation_tag(USAGE_JOINT_RAND_SEED, ctx),
           concat(k_joint_rand_parts),
       )

   def joint_rands(self,
                   ctx: bytes,
                   k_joint_rand_seed: bytes) -> list[F]:
       """Derive the joint randomness from its seed."""
       return self.xof.expand_into_vec(
           self.flp.field,
           k_joint_rand_seed,
           self.domain_separation_tag(USAGE_JOINT_RANDOMNESS, ctx),
           byte(self.PROOFS),
           self.flp.JOINT_RAND_LEN * self.PROOFS,
       )

7.2.7.  Message Serialization

   This section defines serialization formats for messages exchanged
   over the network while executing Prio3.  It is RECOMMENDED that
   implementations provide serialization methods for them.

Barnes, et al.            Expires 7 April 2025                 [Page 71]
Internet-Draft                    VDAF                      October 2024

   Message structures are defined following Section 3 of [RFC8446]).  In
   the remainder we use S as an alias for prio3.xof.SEED_SIZE and F as
   an alias for prio3.field.ENCODED_SIZE.  XOF seeds are represented as
   follows:

   opaque Prio3Seed[S];

   Field elements are encoded in little-endian byte order (as defined in
   Section 6.1) and represented as follows:

   opaque Prio3Field[F];

7.2.7.1.  Public Share

   The encoding of the public share depends on whether joint randomness
   is required for the underlying FLP (i.e., prio3.flp.JOINT_RAND_LEN >
   0).  If joint randomness is not used, then the public share is the
   empty string.  If joint randomness is used, then the public share
   encodes the joint randomness parts as follows:

   struct {
       Prio3Seed k_joint_rand_parts[S * prio3.SHARES];
   } Prio3PublicShareWithJointRand;

7.2.7.2.  Input share

   Just as for the public share, the encoding of the input shares
   depends on whether joint randomness is used.  If so, then each input
   share includes the Aggregator's blind for generating its joint
   randomness part.

   In addition, the encoding of the input shares depends on which
   aggregator is receiving the message.  If the aggregator ID is 0, then
   the input share includes the full measurement and share of proof(s).
   Otherwise, if the aggregator ID is greater than 0, then the
   measurement and shares of proof(s) are represented by an XOF seed.
   We shall call the former the "Leader" and the latter the "Helpers".

   In total there are four variants of the input share.  When joint
   randomness is not used, the Leader's share is structured as follows:

   struct {
       Prio3Field meas_share[F * prio3.flp.MEAS_LEN];
       Prio3Field proofs_share[F * prio3.flp.PROOF_LEN * prio3.PROOFS];
   } Prio3LeaderShare;

   When joint randomness is not used, the Helpers' shares are structured
   as follows:

Barnes, et al.            Expires 7 April 2025                 [Page 72]
Internet-Draft                    VDAF                      October 2024

   struct {
       Prio3Seed k_share;
   } Prio3HelperShare;

   When joint randomness is used, the Leader's input share is structured
   as follows:

   struct {
       Prio3LeaderShare inner;
       Prio3Seed k_blind;
   } Prio3LeaderShareWithJointRand;

   Finally, when joint randomness is used, the Helpers' shares are
   structured as follows:

   struct {
       Prio3HelperShare inner;
       Prio3Seed k_blind;
   } Prio3HelperShareWithJointRand;

7.2.7.3.  Prep Share

   When joint randomness is not used, the prep share is structured as
   follows:

   struct {
       Prio3Field verifiers_share[
           F * prio3.flp.VERIFIER_LEN * prio3.PROOFS
       ];
   } Prio3PrepShare;

   When joint randomness is used, the prep share includes the
   Aggregator's joint randomness part and is structured as follows:

   struct {
       Prio3Field verifiers_share[
           F * prio3.flp.VERIFIER_LEN * prio3.PROOFS
       ];
       Prio3Seed k_joint_rand_part;
   } Prio3PrepShareWithJointRand;

7.2.7.4.  Prep Message

   When joint randomness is not used, the prep message is the empty
   string.  Otherwise the prep message consists of the joint randomness
   seed computed by the Aggregators:

Barnes, et al.            Expires 7 April 2025                 [Page 73]
Internet-Draft                    VDAF                      October 2024

   struct {
       Prio3Seed k_joint_rand;
   } Prio3PrepMessageWithJointRand;

7.2.7.5.  Aggregation

   Aggregate shares are structured as follows:

   struct {
       Prio3Field agg_share[F * prio3.flp.OUTPUT_LEN];
   } Prio3AggShare;

7.3.  The FLP of [BBCGGI19]

   This section describes an FLP based on the construction from
   [BBCGGI19], Section 4.2.  We begin in Section 7.3.1 with an overview
   of their proof system and some extensions to it.  The construction is
   specified in Section 7.3.3.

7.3.1.  Overview

   Conventional zero-knowledge proof systems involve two parties:

   *  The prover, who holds a measurement and generates a proof of the
      measurement's validity

   *  The verifier who holds an encryption of, or commitment to, the
      measurement and checks the proof

   Our proof system is much the same, except the verifier is split
   across multiple Aggregators, each of which has a secret share of the
   measurement rather than an encryption of it.

   Validity is defined in terms of an arithmetic circuit evaluated over
   the measurement.  The inputs to this circuit are elements of a finite
   field that comprise the encoded measurement; the gates of the circuit
   are multiplication, addition, and subtraction operations; and the
   output of the circuit is a single field element.  If the value is
   zero, then the measurement is deemed valid; otherwise, if the output
   is non-zero, then the measurement is deemed invalid.

   For example, the simplest circuit specified in this document is the
   following (Section 7.4.1):

   C(x) = x * (x-1)

Barnes, et al.            Expires 7 April 2025                 [Page 74]
Internet-Draft                    VDAF                      October 2024

   This circuit contains one subtraction gate (x - 1) and one
   multiplication gate (x * (x - 1)).  Observe that C(x) = 0 if and only
   if x in range(2).

   Our goal is to allow each Aggregator, who holds a secret share of x,
   to correctly compute a secret share of C(x).  This allows the
   Aggregators to determine validity by combining their shares of the
   output.

   Suppose for a moment that the validity circuit C is an affine
   arithmetic circuit, meaning its only operations are addition,
   subtraction, and multiplication-by-constant.  (The circuit above is
   non-affine because it contains a multiplication gate with two non-
   constant inputs.)  Then each Aggregator can compute its share
   locally, since

   C(x_shares[0] + ... + x_shares[SHARES-1]) =
       C(x_shares[0]) + ... + C(x_shares[SHARES-1])

   (Note that, for this equality to hold, it is necessary to scale any
   addition of a constant in the circuit by 1/SHARES.)  However, this is
   not the case if C contains multiplication gates with two non-constant
   inputs.  Thus our goal is to transform these multiplication gates
   into computations on secret shared data that each Aggregator can
   perform locally.

   The key idea is to have the prover construct a polynomial p such that
   p(j) is equal to the output of the j-th multiplication gate.
   Polynomial evaluation is fully linear, which means the coefficients
   of the polynomial can be secret shared in a way that allows each
   Aggregator to compute a share of p(j) for any j.  These intermediate
   results can then be combined with the affine arithmetic operations of
   the validity circuit to produce the final output.

   Applying this idea to the example circuit C above:

   1.  The Client, given its measurement x, constructs the lowest degree
       polynomial p for which p(0) = s and p(1) = x * (x - 1), where s
       is a random blinding value generated by the Client.  (The
       blinding value is to protect the privacy of the measurement.)  It
       sends secret shares of x and the coefficients of p to each of the
       Aggregators.

   2.  Each Aggregator locally computes and broadcasts its share of
       p(1), which is equal to its share of C(x).

Barnes, et al.            Expires 7 April 2025                 [Page 75]
Internet-Draft                    VDAF                      October 2024

   In fact, our FLP is slightly more general than this.  We can replace
   the multiplication gate with any non-affine sub-circuit and apply the
   same idea.  For example, in Section 7.4.2, the validity circuit uses
   the following sub-circuit multiple times:

   Range2(x) = x * (x - 1) = x^2 - x

   (This is the same functionality computed by the example circuit C
   above.)  Here again we can interpolate the lowest degree polynomial p
   for which p(j) is the value of the j-th call to Range2 in the
   validity circuit.  Each validity circuit defines a sub-circuit that
   encapsulates its non-affine arithmetic operations.  We refer to this
   sub-circuit as the "gadget".

   This idea provides the needed functionality, but it has one more
   problem: it is possible for a malicious Client to produce a gadget
   polynomial p that would result in C(x) being computed incorrectly,
   potentially resulting in an invalid measurement being accepted.  To
   prevent this, the Aggregators perform a probabilistic test to check
   that the gadget polynomial was constructed properly.  This "gadget
   test", and the procedure for constructing the polynomial, are
   described in detail in Section 7.3.3.

7.3.1.1.  Extensions

   The FLP described in {#flp-bbcggi19-construction} extends the proof
   system of [BBCGGI19], Section 4.2 in a few ways.

   First, the validity circuit in our construction includes an
   additional, random input (this is the "joint randomness" derived from
   the measurement shares in Prio3; see Section 7.2).  This allows for
   circuit optimizations that trade a small soundness error for a
   shorter proof.  For example, consider a circuit that recognizes the
   set of length-N vectors for which each element is either one or zero.
   A deterministic circuit could be constructed for this language, but
   it would involve a large number of multiplications that would result
   in a large proof.  (See the discussion in [BBCGGI19], Section 5.2 for
   details).  A much shorter proof can be constructed for the following
   randomized circuit:

   C(x, r) = r * Range2(x[0]) + ... + r^N * Range2(x[N-1])

Barnes, et al.            Expires 7 April 2025                 [Page 76]
Internet-Draft                    VDAF                      October 2024

   (Note that this is a special case of [BBCGGI19], Theorem 5.2.)  Here
   x is the length-N input and r is a random field element.  The gadget
   circuit Range2 is the "range-check" polynomial described above, i.e.,
   Range2(x) = x^2 - x.  The idea is that, if x is valid (i.e., each
   x[j] is in range(2)), then the circuit will evaluate to 0 regardless
   of the value of r; but if some x[j] is not in range(2), then output
   will be non-zero with high probability.

   The second extension implemented by our FLP allows the validity
   circuit to contain multiple gadget types.  (This generalization was
   suggested in [BBCGGI19], Remark 4.5.)  This provides additional
   flexibility for designing circuits by allowing multiple, non-affine
   sub-circuits.  For example, the following circuit is allowed:

   C(meas, r) = r * Range2(meas[0]) + ... + r^L * Range2(meas[L-1]) + \
               r^(L+1) * Range3(meas[L]) + ... + r^N * Range3(meas[N-1])

   where Range3(x) = x^3 - 3x^2 + 2x.  This circuit checks that the
   first L inputs are in range(2) and the last N-L inputs are in
   range(3).  The same circuit can be expressed using a simpler gadget,
   namely multiplication, but the resulting proof would be longer.

   Third, rather than interpolate the gadget polynomial at inputs 1, 2,
   ..., j, ..., where j is the j-th invocation of the gadget, we use
   powers of alpha, where alpha is a root of unity for the field.  This
   allows us to construct each gadget polynomial via the number
   theoretic transform [SML24], which is far more efficient than generic
   formulas.  This requires the field to be "NTT-friendly" as defined in
   Section 6.1.2.

   Finally, the validity circuit in our FLP may have any number of
   outputs (at least one).  The input is said to be valid if each of the
   outputs is zero.  To save bandwidth, we take a random linear
   combination of the outputs.  If each of the outputs is zero, then the
   reduced output will be zero; but if one of the outputs is non-zero,
   then the reduced output will be non-zero with high probability.

7.3.2.  Validity Circuits

   The FLP described in Section 7.3.3 is defined in terms of a validity
   circuit Valid that implements the interface described here.

   A concrete Valid defines the following parameters:

Barnes, et al.            Expires 7 April 2025                 [Page 77]
Internet-Draft                    VDAF                      October 2024

        +=================+=======================================+
        | Parameter       | Description                           |
        +=================+=======================================+
        | GADGETS         | A list of gadgets                     |
        +-----------------+---------------------------------------+
        | GADGET_CALLS    | Number of times each gadget is called |
        +-----------------+---------------------------------------+
        | MEAS_LEN        | Length of the measurement             |
        +-----------------+---------------------------------------+
        | OUTPUT_LEN      | Length of the aggregatable output     |
        +-----------------+---------------------------------------+
        | JOINT_RAND_LEN  | Length of the random input            |
        +-----------------+---------------------------------------+
        | EVAL_OUTPUT_LEN | Length of the circuit output          |
        +-----------------+---------------------------------------+
        | Measurement     | The type of measurement               |
        +-----------------+---------------------------------------+
        | AggResult       | Type of the aggregate result          |
        +-----------------+---------------------------------------+
        | field           | Class object for the field            |
        +-----------------+---------------------------------------+

                   Table 7: Validity circuit parameters.

   Each gadget G in GADGETS defines a constant DEGREE that specifies the
   circuit's "arithmetic degree".  This is defined to be the degree of
   the polynomial that computes it.  For example, the Mul circuit in
   Section 7.3.1 is defined by the polynomial Mul(x) = x * x, which has
   degree 2.  Hence, the arithmetic degree of this gadget is 2.

   Each gadget also defines a parameter ARITY that specifies the
   circuit's arity (i.e., the number of input wires).

   Gadgets provide a method to evaluate their circuit on a list of
   inputs, eval().  The inputs can either belong to the validity
   circuit's field, or the polynomial ring over that field.

   A concrete Valid provides the following methods for encoding a
   measurement as an input vector, truncating an input vector to the
   length of an aggregatable output, and converting an aggregated output
   to an aggregate result:

   *  valid.encode(measurement: Measurement) -> list[F] returns a vector
      of length MEAS_LEN representing a measurement.

   *  valid.truncate(meas: list[F]) -> list[F] returns a vector of
      length OUTPUT_LEN representing an aggregatable output.

Barnes, et al.            Expires 7 April 2025                 [Page 78]
Internet-Draft                    VDAF                      October 2024

   *  valid.decode(output: list[F], num_measurements: int) -> AggResult
      returns an aggregate result.

   Finally, the following methods are derived for each concrete Valid:

   def prove_rand_len(self) -> int:
       """Length of the prover randomness."""
       return sum(g.ARITY for g in self.GADGETS)

   def query_rand_len(self) -> int:
       """Length of the query randomness."""
       query_rand_len = len(self.GADGETS)
       if self.EVAL_OUTPUT_LEN > 1:
           query_rand_len += self.EVAL_OUTPUT_LEN
       return query_rand_len

   def proof_len(self) -> int:
       """Length of the proof."""
       length = 0
       for (g, g_calls) in zip(self.GADGETS, self.GADGET_CALLS):
           P = next_power_of_2(1 + g_calls)
           length += g.ARITY + g.DEGREE * (P - 1) + 1
       return length

   def verifier_len(self) -> int:
       """Length of the verifier message."""
       length = 1
       for g in self.GADGETS:
           length += g.ARITY + 1
       return length

   def check_valid_eval(
           self,
           meas: list[F],
           joint_rand: list[F]) -> None:
       if len(meas) != self.MEAS_LEN:
           raise ValueError('incorrect measurement length')
       if len(joint_rand) != self.JOINT_RAND_LEN:
           raise ValueError('incorrect joint randomness length')

             Figure 22: Derived methods for validity circuits.

7.3.3.  Construction

   This section specifies an implementation of the Flp interface
   (Section 7.1).  It has as a generic parameter a validity circuit
   Valid implementing the interface defined in Section 7.3.2.

Barnes, et al.            Expires 7 April 2025                 [Page 79]
Internet-Draft                    VDAF                      October 2024

   The parameters are defined in Table 8.  The required methods for
   generating the proof, generating the verifier, and deciding validity
   are specified in the remaining subsections.

   In the remainder, we let [n] denote the set {1, ..., n} for positive
   integer n.  We also define the following constants:

   *  Let H = len(valid.GADGETS)

   *  For each i in [H]:

      -  Let G_i = valid.GADGETS[i]

      -  Let L_i = valid.GADGETS[i].ARITY

      -  Let M_i = valid.GADGET_CALLS[i]

      -  Let P_i = next_power_of_2(M_i+1)

      -  Let alpha_i = field.gen()^(field.GEN_ORDER / P_i)

      +================+============================================+
      | Parameter      | Value                                      |
      +================+============================================+
      | PROVE_RAND_LEN | valid.prove_rand_len() (see Section 7.3.2) |
      +----------------+--------------------------------------------+
      | QUERY_RAND_LEN | valid.query_rand_len() (see Section 7.3.2) |
      +----------------+--------------------------------------------+
      | JOINT_RAND_LEN | valid.JOINT_RAND_LEN                       |
      +----------------+--------------------------------------------+
      | MEAS_LEN       | valid.MEAS_LEN                             |
      +----------------+--------------------------------------------+
      | OUTPUT_LEN     | valid.OUTPUT_LEN                           |
      +----------------+--------------------------------------------+
      | PROOF_LEN      | valid.proof_len() (see Section 7.3.2)      |
      +----------------+--------------------------------------------+
      | VERIFIER_LEN   | valid.verifier_len() (see Section 7.3.2)   |
      +----------------+--------------------------------------------+
      | Measurement    | valid.Measurement                          |
      +----------------+--------------------------------------------+

                  Table 8: Parameters of FLP of BBCGGI19.

7.3.3.1.  Proof Generation

Barnes, et al.            Expires 7 April 2025                 [Page 80]
Internet-Draft                    VDAF                      October 2024

   +------------------+
   | Prove            |
   | +-------------+  |
   | | Valid       |<---- meas
   | | +--------+  |<---- joint rand
   | | | Gadget |  |  |<- prove rand
   | | +--------+  |  |
   | +-------------+  |
   +------------------+
    |
    V proof

       Figure 23: The proof generation algorithm invokes the validity
       circuit on the encoded measurement and joint randomness.  The
       validity circuit in turn invokes the gadget(s) defined by the
       circuit.  The prove randomness is used to construct the gadget
                               polynomial(s).

   On input of meas, prove_rand, and joint_rand, the proof is computed
   as follows:

   1.  For each i in [H] create an empty table wire_i.

   2.  Partition the prover randomness prove_rand into sub-vectors
       seed_1, ..., seed_H where len(seed_i) == L_i for all i in [H].
       Let us call these the "wire seeds" of each gadget.

   3.  Evaluate Valid on input of meas and joint_rand, recording the
       inputs of each gadget in the corresponding table.  Specifically,
       for every i in [H], set wire_i[j-1,k-1] to the value on the jth
       wire into the kth call to gadget G_i.

   4.  Compute the "wire polynomials".  That is, for every i in [H] and
       j in [L_i], construct poly_wire_i[j-1], the jth wire polynomial
       for the ith gadget, as follows:

       *  Let w = [seed_i[j-1], wire_i[j-1,0], ..., wire_i[j-1,M_i-1]].

       *  Let padded_w = w + field.zeros(P_i - len(w)).

       *  Let poly_wire_i[j-1] be the lowest degree polynomial for which
          poly_wire_i[j-1](alpha_i^k) == padded_w[k] for all k in [P_i].

   5.  Compute the "gadget polynomials".  That is, for every i in [H]:

Barnes, et al.            Expires 7 April 2025                 [Page 81]
Internet-Draft                    VDAF                      October 2024

       *  Let poly_gadget_i = G_i(poly_wire_i[0], ..., poly_wire_i[L_i-
          1]).  That is, evaluate the circuit G_i on the wire
          polynomials for the ith gadget.  (Arithmetic is in the ring of
          polynomials over field.)

   The proof is the vector proof = seed_1 + coeff_1 + ... + seed_H +
   coeff_H, where coeff_i is the vector of coefficients of poly_gadget_i
   for each i in [H].

7.3.3.2.  Query Generation

    | proof (share)
    +--------+
    V        |
   +---------|--------+
   | Query   |        |
   | +-------|-----+  |
   | | Valid |     |<---- meas (share)
   | | +-----|--+  |<---- joint rand
   | | |     V  |  |  |<- query rand
   | | +--------+  |  |
   | +-------------+  |
   +------------------+
    |
    V verifier (share)

       Figure 24: The query generation algorithm invokes the validity
        circuit on the encoded measurement and joint randomness.  It
      evaluates the gadget polynomials(s) encoded by the proof (share)
     to produce (a share of) each gadget output.  The verifier (share)
       consists of (a share of) the validity circuit's output and (a
          share of) each gadget test.  The tests consume the query
                                randomness.

   On input of meas, proof, query_rand, and joint_rand, the verifier
   message is generated as follows:

   1.  For every i in [H] create an empty table wire_i.

   2.  Partition proof into the sub-vectors seed_1, coeff_1, ...,
       seed_H, coeff_H defined in Section 7.3.3.1.

   3.  Evaluate Valid on input of meas and joint_rand, recording the
       inputs of each gadget in the corresponding table.  This step is
       similar to the prover's step (3.) except the verifier does not
       evaluate the gadgets.  Instead, it computes the output of the kth
       call to G_i by evaluating poly_gadget_i(alpha_i^k).  Let out
       denote the output of the circuit evaluation.

Barnes, et al.            Expires 7 April 2025                 [Page 82]
Internet-Draft                    VDAF                      October 2024

   4.  Next, reduce out as follows.  If EVAL_OUTPUT_LEN > 1, then
       consume the first EVAL_OUTPUT_LEN elements of query_rand by
       letting r, query_rand = front(1, query_rand).  Then let v =
       r[0]*out[0] + r[1]*out[1] + r[2]*out[2] + ....

   5.  Compute the wire polynomials just as in the prover's step (4.).

   6.  Compute the tests for well-formedness of the gadget polynomials.
       That is, for every i in [H]:

       *  Let t = query_rand[i].  Check if t^(P_i) == 1: if so, then
          raise ERR_ABORT and halt.  (This prevents the verifier from
          inadvertently leaking a gadget output in the verifier
          message.)

       *  Let y_i = poly_gadget_i(t).

       *  For each j in [0,L_i) let x_i[j-1] = poly_wire_i[j-1](t).

   The verifier message is the vector verifier = [v] + x_1 + [y_1] + ...
   + x_H + [y_H].

7.3.3.3.  Decision

    | verifier
    V
   +------------------+
   | Decide           |
   +------------------+
    |
    V is_valid

          Figure 25: The decision algorithm consumes the verifier,
     consisting of the reduced output of the validity circuit and each
      gadget test.  The verifier is produced by adding up the verifier
                     shares output by each Aggregator.

   On input of vector verifier, the verifier decides if the measurement
   is valid as follows:

   1.  Parse verifier into v, x_1, y_1, ..., x_H, y_H as defined in
       Section 7.3.3.2.

   2.  Check for well-formedness of the gadget polynomials.  For every i
       in [H]:

       *  Let z = G_i(x_i).  That is, evaluate the circuit G_i on x_i
          and set z to the output.

Barnes, et al.            Expires 7 April 2025                 [Page 83]
Internet-Draft                    VDAF                      October 2024

       *  If z != y_i, then return False and halt.

   3.  Return True if v == 0 and False otherwise.

7.3.3.4.  Encoding

   The FLP encoding and truncation methods invoke valid.encode,
   valid.truncate, and valid.decode in the natural way.

7.4.  Instantiations

   This section specifies instantiations of Prio3 for various
   measurement types.  Each is determined by a field (Section 6.1), a
   validity circuit (Section 7.3.2), an XOF (Section 6.2). and the
   number of proofs to generate and verify.  Test vectors for each can
   be found in Appendix "Test Vectors".

7.4.1.  Prio3Count

             +===========+==================================+
             | Parameter | Value                            |
             +===========+==================================+
             | Valid     | Count(Field64) (this section)    |
             +-----------+----------------------------------+
             | Field     | Field64 (Table 3)                |
             +-----------+----------------------------------+
             | PROOFS    | 1                                |
             +-----------+----------------------------------+
             | Xof       | XofTurboShake128 (Section 6.2.1) |
             +-----------+----------------------------------+

                   Table 9: Parameters for Prio3Count.

   Our first instance of Prio3 is for a simple counter: each measurement
   is either one or zero and the aggregate result is the sum of the
   measurements.

   Its validity circuit, denoted Count, uses the following degree-2,
   arity-2 gadget, denoted Mul:

   def eval(self, _field: type[F], inp: list[F]) -> F:
       self.check_gadget_eval(inp)
       return inp[0] * inp[1]

   The call to check_gadget_eval() raises an error if the length of the
   input is not equal to the gadget's ARITY parameter.

   The Count validity circuit is defined as

Barnes, et al.            Expires 7 April 2025                 [Page 84]
Internet-Draft                    VDAF                      October 2024

   def eval(
           self,
           meas: list[F],
           joint_rand: list[F],
           _num_shares: int) -> list[F]:
       self.check_valid_eval(meas, joint_rand)
       squared = self.GADGETS[0].eval(self.field, [meas[0], meas[0]])
       return [squared - meas[0]]

   The measurement is encoded and decoded as a singleton vector in the
   natural way.  The parameters for this circuit are summarized below.

                   +=================+=================+
                   | Parameter       | Value           |
                   +=================+=================+
                   | GADGETS         | [Mul]           |
                   +-----------------+-----------------+
                   | GADGET_CALLS    | [1]             |
                   +-----------------+-----------------+
                   | MEAS_LEN        | 1               |
                   +-----------------+-----------------+
                   | OUTPUT_LEN      | 1               |
                   +-----------------+-----------------+
                   | JOINT_RAND_LEN  | 0               |
                   +-----------------+-----------------+
                   | EVAL_OUTPUT_LEN | 1               |
                   +-----------------+-----------------+
                   | Measurement     | int in range(2) |
                   +-----------------+-----------------+
                   | AggResult       | int             |
                   +-----------------+-----------------+

                      Table 10: Parameters of validity
                               circuit Count.

7.4.2.  Prio3Sum

       +===========+==============================================+
       | Parameter | Value                                        |
       +===========+==============================================+
       | Valid     | Sum(Field64, max_measurement) (this section) |
       +-----------+----------------------------------------------+
       | Field     | Field64 (Table 3)                            |
       +-----------+----------------------------------------------+
       | PROOFS    | 1                                            |
       +-----------+----------------------------------------------+
       | Xof       | XofTurboShake128 (Section 6.2.1)             |
       +-----------+----------------------------------------------+

Barnes, et al.            Expires 7 April 2025                 [Page 85]
Internet-Draft                    VDAF                      October 2024

                    Table 11: Parameters for Prio3Sum.

   The next instance of Prio3 supports summing of integers in a pre-
   determined range.  Each measurement is an integer in
   range(max_measurement+1), where max_measurement is an associated
   parameter equal to the largest valid measurement.

   The range check is accomplished by encoding the measurement as a bit
   vector, encoding the measurement plus an offset as a bit vector, then
   checking that the two encoded integers are consistent.  Let

   *  bits = max_measurement.bit_length(), the number of bits needed to
      encode the largest valid measurement

   *  offset = 2**bits - 1 - max_measurement

   The first bit-encoded integer is the measurement itself.  Note that
   only measurements between 0 and 2**bits - 1 can be encoded this way
   with bits bits.  The second bit-encoded integer is the sum of the
   measurement and offset.  Observe that this sum can only be encoded
   this way if it is between 0 and 2**bits - 1, which implies that the
   measurement is between -offset and max_measurement.

   The circuit, denoted Sum, first checks that each entry of both bit
   vectors is a one or a zero.  It then decodes both the measurement and
   the offset measurement, and subtracts offset from the latter.  It
   then checks if these two values are equal.  Since both the
   measurement and the measurement plus offset are in the same range of
   range(2**bits), this means that the measurement itself is between 0
   and max_measurement.

   The measurement is encoded in 2*bits field elements as follows:

Barnes, et al.            Expires 7 April 2025                 [Page 86]
Internet-Draft                    VDAF                      October 2024

   def encode(self, measurement: int) -> list[F]:
       encoded = []
       encoded += self.field.encode_into_bit_vector(
           measurement,
           self.bits
       )
       encoded += self.field.encode_into_bit_vector(
           measurement + self.offset.as_unsigned(),
           self.bits
       )
       return encoded

   def truncate(self, meas: list[F]) -> list[F]:
       return [self.field.decode_from_bit_vector(meas[:self.bits])]

   def decode(self, output: list[F], _num_measurements: int) -> int:
       return output[0].as_unsigned()

   The validity circuit checks that the input consists of ones and
   zeros.  Its gadget, denoted Range2, is the degree-2, arity-1 gadget
   defined as

   def eval(self, _field: type[F], inp: list[F]) -> F:
       self.check_gadget_eval(inp)
       return inp[0] * inp[0] - inp[0]

   The Sum validity circuit is defined as

   def eval(
           self,
           meas: list[F],
           joint_rand: list[F],
           num_shares: int) -> list[F]:
       self.check_valid_eval(meas, joint_rand)
       shares_inv = self.field(num_shares).inv()

       out = []
       for b in meas:
           out.append(self.GADGETS[0].eval(self.field, [b]))

       range_check = self.offset * shares_inv + \
           self.field.decode_from_bit_vector(meas[:self.bits]) - \
           self.field.decode_from_bit_vector(meas[self.bits:])
       out.append(range_check)
       return out

Barnes, et al.            Expires 7 April 2025                 [Page 87]
Internet-Draft                    VDAF                      October 2024

           +=================+=================================+
           | Parameter       | Value                           |
           +=================+=================================+
           | GADGETS         | [Range2]                        |
           +-----------------+---------------------------------+
           | GADGET_CALLS    | [2*bits]                        |
           +-----------------+---------------------------------+
           | MEAS_LEN        | 2*bits                          |
           +-----------------+---------------------------------+
           | OUTPUT_LEN      | 1                               |
           +-----------------+---------------------------------+
           | JOINT_RAND_LEN  | 0                               |
           +-----------------+---------------------------------+
           | EVAL_OUTPUT_LEN | 2*bits + 1                      |
           +-----------------+---------------------------------+
           | Measurement     | int in range(max_measurement+1) |
           +-----------------+---------------------------------+
           | AggResult       | int                             |
           +-----------------+---------------------------------+

               Table 12: Parameters of validity circuit Sum.

7.4.3.  Prio3SumVec

              +===========+================================+
              | Parameter | Value                          |
              +===========+================================+
              | Valid     | SumVec(Field128, length, bits, |
              |           | chunk_lengh) (this section)    |
              +-----------+--------------------------------+
              | Field     | Field128 (Table 3)             |
              +-----------+--------------------------------+
              | PROOFS    | 1                              |
              +-----------+--------------------------------+
              | Xof       | XofTurboShake128               |
              |           | (Section 6.2.1)                |
              +-----------+--------------------------------+

                  Table 13: Parameters for Prio3SumVec.

   This instance of Prio3 supports summing a vector of integers.  It has
   three parameters, length, bits, and chunk_length.  Each measurement
   is a vector of positive integers with length equal to the length
   parameter.  Each element of the measurement is an integer in the
   range(2**bits).  It is RECOMMENDED to set chunk_length to an integer
   near the square root of length * bits (see Section 7.4.3.1).

Barnes, et al.            Expires 7 April 2025                 [Page 88]
Internet-Draft                    VDAF                      October 2024

   The validity circuit is denoted SumVec.  Measurements are encoded as
   a vector of field elements with length length * bits.  The field
   elements in the encoded vector represent all the bits of the
   measurement vector's elements, consecutively, in LSB to MSB order:

   def encode(self, measurement: list[int]) -> list[F]:
       if len(measurement) != self.length:
           raise ValueError('incorrect measurement length')

       encoded = []
       for val in measurement:
           if val not in range(2**self.bits):
               raise ValueError(
                   'entry of measurement vector is out of range'
               )

           encoded += self.field.encode_into_bit_vector(val, self.bits)
       return encoded

   def truncate(self, meas: list[F]) -> list[F]:
       truncated = []
       for i in range(self.length):
           truncated.append(self.field.decode_from_bit_vector(
               meas[i * self.bits: (i + 1) * self.bits]
           ))
       return truncated

   def decode(
           self,
           output: list[F],
           _num_measurements: int) -> list[int]:
       return [x.as_unsigned() for x in output]

   This validity circuit uses a ParallelSum gadget to achieve a smaller
   proof size.  This optimization for "parallel-sum circuits" is
   described in [BBCGGI19], section 4.4.  Briefly, for circuits that add
   up the output of multiple identical subcircuits, it is possible to
   achieve smaller proof sizes (on the order of O(sqrt(MEAS_LEN))
   instead of O(MEAS_LEN)) by packaging more than one such subcircuit
   into a gadget.

Barnes, et al.            Expires 7 April 2025                 [Page 89]
Internet-Draft                    VDAF                      October 2024

   The ParallelSum gadget is parameterized with an arithmetic
   subcircuit, and a count of how many times it evaluates that
   subcircuit.  It takes in a list of inputs and passes them through to
   instances of the subcircuit in the same order.  It returns the sum of
   the subcircuit outputs.  Note that only the ParallelSum gadget
   itself, and not its subcircuit, participates in the FLP's wire
   recording during evaluation, gadget consistency proofs, and proof
   validation, even though the subcircuit is provided to ParallelSum as
   an implementation of the Gadget interface.

   def eval(self, field: type[F], inp: list[F]) -> F:
       self.check_gadget_eval(inp)
       out = field(0)
       for i in range(self.count):
           start_index = i * self.subcircuit.ARITY
           end_index = (i + 1) * self.subcircuit.ARITY
           out += self.subcircuit.eval(
               field,
               inp[start_index:end_index],
           )
       return out

   The SumVec validity circuit checks that the encoded measurement
   consists of ones and zeros.  Rather than use the Range2 gadget on
   each element, as in the Sum validity circuit, it instead uses Mul
   subcircuits and "free" constant multiplication and addition gates to
   simultaneously evaluate the same range check polynomial on each
   element, and multiply by a constant.  One of the two Mul subcircuit
   inputs is equal to a measurement element multiplied by a power of one
   of the elements of the joint randomness vector, and the other is
   equal to the same measurement element minus one.  These Mul
   subcircuits are evaluated by a ParallelSum gadget, and the results
   are added up both within the ParallelSum gadget and after it.

Barnes, et al.            Expires 7 April 2025                 [Page 90]
Internet-Draft                    VDAF                      October 2024

   def eval(
           self,
           meas: list[F],
           joint_rand: list[F],
           num_shares: int) -> list[F]:
       self.check_valid_eval(meas, joint_rand)

       out = self.field(0)
       shares_inv = self.field(num_shares).inv()
       for i in range(self.GADGET_CALLS[0]):
           r = joint_rand[i]
           r_power = r
           inputs: list[Optional[F]]
           inputs = [None] * (2 * self.chunk_length)
           for j in range(self.chunk_length):
               index = i * self.chunk_length + j
               if index < len(meas):
                   meas_elem = meas[index]
               else:
                   meas_elem = self.field(0)

               inputs[j * 2] = r_power * meas_elem
               inputs[j * 2 + 1] = meas_elem - shares_inv

               r_power *= r

           out += self.GADGETS[0].eval(
               self.field,
               cast(list[F], inputs),
           )

       return [out]

Barnes, et al.            Expires 7 April 2025                 [Page 91]
Internet-Draft                    VDAF                      October 2024

         +=================+====================================+
         | Parameter       | Value                              |
         +=================+====================================+
         | GADGETS         | [ParallelSum(Mul(), chunk_length)] |
         +-----------------+------------------------------------+
         | GADGET_CALLS    | [(length * bits + chunk_length -   |
         |                 | 1) // chunk_length]                |
         +-----------------+------------------------------------+
         | MEAS_LEN        | length * bits                      |
         +-----------------+------------------------------------+
         | OUTPUT_LEN      | length                             |
         +-----------------+------------------------------------+
         | JOINT_RAND_LEN  | GADGET_CALLS[0]                    |
         +-----------------+------------------------------------+
         | EVAL_OUTPUT_LEN | 1                                  |
         +-----------------+------------------------------------+
         | Measurement     | list[int], each element in         |
         |                 | range(2**bits)                     |
         +-----------------+------------------------------------+
         | AggResult       | list[int]                          |
         +-----------------+------------------------------------+

             Table 14: Parameters of validity circuit SumVec.

7.4.3.1.  Selection of ParallelSum chunk length

   The chunk_length parameter provides a trade-off between the arity of
   the ParallelSum gadget and the number of times the gadget is called.
   The proof length is asymptotically minimized when the chunk length is
   near the square root of the length of the measurement.  However, the
   relationship between VDAF parameters and proof length is complicated,
   involving two forms of rounding (the circuit pads the inputs to its
   last ParallelSum gadget call, up to the chunk length, and proof
   system rounds the degree of wire polynomials -- determined by the
   number of times a gadget is called -- up to the next power of two).
   Therefore, the optimal choice of chunk_length for a concrete
   measurement size will vary, and must be found through trial and
   error.  Setting chunk_length equal to the square root of the
   appropriate measurement length will result in proofs up to 50% larger
   than the optimal proof size.

Barnes, et al.            Expires 7 April 2025                 [Page 92]
Internet-Draft                    VDAF                      October 2024

7.4.4.  Prio3Histogram

                +===========+=============================+
                | Parameter | Value                       |
                +===========+=============================+
                | Valid     | Histogram(Field128, length, |
                |           | chunk_lengh) (this section) |
                +-----------+-----------------------------+
                | Field     | Field128 (Table 3)          |
                +-----------+-----------------------------+
                | PROOFS    | 1                           |
                +-----------+-----------------------------+
                | Xof       | XofTurboShake128            |
                |           | (Section 6.2.1)             |
                +-----------+-----------------------------+

                  Table 15: Parameters for Prio3Histogram.

   This instance of Prio3 allows for estimating the distribution of some
   quantity by computing a simple histogram.  Each measurement
   increments one histogram bucket, out of a set of fixed buckets.
   (Bucket indexing begins at 0.)  For example, the buckets might
   quantize the real numbers, and each measurement would report the
   bucket that the corresponding client's real-numbered value falls
   into.  The aggregate result counts the number of measurements in each
   bucket.

   The validity circuit is denoted Histogram.  It has two parameters,
   length, the number of histogram buckets, and chunk_length, which is
   used by by a circuit optimization described below.  It is RECOMMENDED
   to set chunk_length to an integer near the square root of length (see
   Section 7.4.3.1).

   The measurement is encoded as a one-hot vector representing the
   bucket into which the measurement falls:

Barnes, et al.            Expires 7 April 2025                 [Page 93]
Internet-Draft                    VDAF                      October 2024

   def encode(self, measurement: int) -> list[F]:
       encoded = [self.field(0)] * self.length
       encoded[measurement] = self.field(1)
       return encoded

   def truncate(self, meas: list[F]) -> list[F]:
       return meas

   def decode(
           self,
           output: list[F],
           _num_measurements: int) -> list[int]:
       return [bucket_count.as_unsigned() for bucket_count in output]

   The Histogram validity circuit checks for one-hotness in two steps,
   by checking that the encoded measurement consists of ones and zeros,
   and by checking that the sum of all elements in the encoded
   measurement is equal to one.  The individual checks constitute the
   output of the circuit.

   As in the SumVec validity circuit (Section 7.4.3), the first part of
   the validity circuit uses the ParallelSum gadget to perform range
   checks while achieving a smaller proof size.  The ParallelSum gadget
   uses Mul subcircuits to evaluate a range check polynomial on each
   element, and includes an additional constant multiplication.  One of
   the two Mul subcircuit inputs is equal to a measurement element
   multiplied by a power of an element of the joint randomness vector,
   and the other is equal to the same measurement element minus one.
   The results are added up both within the ParallelSum gadget and after
   it.

Barnes, et al.            Expires 7 April 2025                 [Page 94]
Internet-Draft                    VDAF                      October 2024

   def eval(
           self,
           meas: list[F],
           joint_rand: list[F],
           num_shares: int) -> list[F]:
       self.check_valid_eval(meas, joint_rand)

       # Check that each bucket is one or zero.
       range_check = self.field(0)
       shares_inv = self.field(num_shares).inv()
       for i in range(self.GADGET_CALLS[0]):
           r = joint_rand[i]
           r_power = r
           inputs: list[Optional[F]]
           inputs = [None] * (2 * self.chunk_length)
           for j in range(self.chunk_length):
               index = i * self.chunk_length + j
               if index < len(meas):
                   meas_elem = meas[index]
               else:
                   meas_elem = self.field(0)

               inputs[j * 2] = r_power * meas_elem
               inputs[j * 2 + 1] = meas_elem - shares_inv

               r_power *= r

           range_check += self.GADGETS[0].eval(
               self.field,
               cast(list[F], inputs),
           )

       # Check that the buckets sum to 1.
       sum_check = -shares_inv
       for b in meas:
           sum_check += b

       return [range_check, sum_check]

   Note that this circuit depends on the number of shares into which the
   measurement is sharded.  This is provided to the FLP by Prio3.

Barnes, et al.            Expires 7 April 2025                 [Page 95]
Internet-Draft                    VDAF                      October 2024

    +=================+===============================================+
    | Parameter       | Value                                         |
    +=================+===============================================+
    | GADGETS         | [ParallelSum(Mul(), chunk_length)]            |
    +-----------------+-----------------------------------------------+
    | GADGET_CALLS    | [(length + chunk_length - 1) // chunk_length] |
    +-----------------+-----------------------------------------------+
    | MEAS_LEN        | length                                        |
    +-----------------+-----------------------------------------------+
    | OUTPUT_LEN      | length                                        |
    +-----------------+-----------------------------------------------+
    | JOINT_RAND_LEN  | GADGET_CALLS[0]                               |
    +-----------------+-----------------------------------------------+
    | EVAL_OUTPUT_LEN | 2                                             |
    +-----------------+-----------------------------------------------+
    | Measurement     | int                                           |
    +-----------------+-----------------------------------------------+
    | AggResult       | list[int]                                     |
    +-----------------+-----------------------------------------------+

            Table 16: Parameters of validity circuit Histogram.

7.4.5.  Prio3MultihotCountVec

          +===========+=========================================+
          | Parameter | Value                                   |
          +===========+=========================================+
          | Valid     | MultihotCountVec(Field128, length,      |
          |           | max_weight, chunk_lengh) (this section) |
          +-----------+-----------------------------------------+
          | Field     | Field128 (Table 3)                      |
          +-----------+-----------------------------------------+
          | PROOFS    | 1                                       |
          +-----------+-----------------------------------------+
          | Xof       | XofTurboShake128 (Section 6.2.1)        |
          +-----------+-----------------------------------------+

              Table 17: Parameters for Prio3MultihotCountVec.

   For this instance of Prio3, each measurement is a vector of ones and
   zeros, where the number of ones is bounded.  This provides a
   functionality similar to Prio3Histogram except that more than one
   entry may be non-zero.  This allows Prio3MultihotCountVec to be
   composed with a randomized response mechanism, like [EPK14], for
   providing differential privacy.  (For example, each Client would set
   each entry to one with some small probability.)

Barnes, et al.            Expires 7 April 2025                 [Page 96]
Internet-Draft                    VDAF                      October 2024

   Prio3MultihotCountVec uses XofTurboShake128 (Section 6.2.1) as its
   XOF.  Its validity circuit is denoted MultihotCountVec.  It has three
   parameters: length, the number of of entries in the count vector;
   max_weight, the maximum number of non-zero entries (i.e., the weight
   must be at most max_weight); and chunk_length, used the same way as
   in Section 7.4.3 and Section 7.4.4.

   Validation works as follows.  Let

   *  bits_for_weight = max_weight.bit_length()

   *  offset = 2**bits_for_weight - 1 - max_weight

   The Client reports the weight of the count vector by adding offset to
   it and bit-encoding the result.  Observe that only a weight of at
   most max_weight can be encoded with bits_for_weight bits.

   The verifier checks that each entry of the encoded measurement is a
   bit (i.e., either one or zero).  It then decodes the reported weight
   and subtracts it from offset + sum(count_vec), where count_vec is the
   count vector.  The result is zero if and only if the reported weight
   is equal to the true weight.

   Encoding, truncation, and decoding are defined as follows:

Barnes, et al.            Expires 7 April 2025                 [Page 97]
Internet-Draft                    VDAF                      October 2024

   def encode(self, measurement: list[int]) -> list[F]:
       if len(measurement) != self.length:
           raise ValueError('invalid Client measurement length')

       # The first part is the vector of counters.
       count_vec = list(map(self.field, measurement))

       # The second part is the reported weight.
       weight_reported = sum(count_vec, self.field(0))

       encoded = []
       encoded += count_vec
       encoded += self.field.encode_into_bit_vector(
           (self.offset + weight_reported).as_unsigned(),
           self.bits_for_weight)
       return encoded

   def truncate(self, meas: list[F]) -> list[F]:
       return meas[:self.length]

   def decode(
           self,
           output: list[F],
           _num_measurements: int) -> list[int]:
       return [bucket_count.as_unsigned() for bucket_count in output]

   Circuit evaluation is defined as follows:

Barnes, et al.            Expires 7 April 2025                 [Page 98]
Internet-Draft                    VDAF                      October 2024

   def eval(
           self,
           meas: list[F],
           joint_rand: list[F],
           num_shares: int) -> list[F]:
       self.check_valid_eval(meas, joint_rand)

       # Check that each entry in the input vector is one or zero.
       range_check = self.field(0)
       shares_inv = self.field(num_shares).inv()
       for i in range(self.GADGET_CALLS[0]):
           r = joint_rand[i]
           r_power = r
           inputs: list[Optional[F]]
           inputs = [None] * (2 * self.chunk_length)
           for j in range(self.chunk_length):
               index = i * self.chunk_length + j
               if index < len(meas):
                   meas_elem = meas[index]
               else:
                   meas_elem = self.field(0)

               inputs[j * 2] = r_power * meas_elem
               inputs[j * 2 + 1] = meas_elem - shares_inv

               r_power *= r

           range_check += self.GADGETS[0].eval(
               self.field,
               cast(list[F], inputs),
           )

       # Check that the weight `offset` plus the sum of the counters
       # is equal to the value claimed by the Client.
       count_vec = meas[:self.length]
       weight = sum(count_vec, self.field(0))
       weight_reported = \
           self.field.decode_from_bit_vector(meas[self.length:])
       weight_check = self.offset*shares_inv + weight - \
           weight_reported

       return [range_check, weight_check]

Barnes, et al.            Expires 7 April 2025                 [Page 99]
Internet-Draft                    VDAF                      October 2024

         +=================+====================================+
         | Parameter       | Value                              |
         +=================+====================================+
         | GADGETS         | [ParallelSum(Mul(), chunk_length)] |
         +-----------------+------------------------------------+
         | GADGET_CALLS    | [(length + bits_for_weight +       |
         |                 | chunk_length - 1) // chunk_length] |
         +-----------------+------------------------------------+
         | MEAS_LEN        | length + bits_for_weight           |
         +-----------------+------------------------------------+
         | OUTPUT_LEN      | length                             |
         +-----------------+------------------------------------+
         | JOINT_RAND_LEN  | GADGET_CALLS[0]                    |
         +-----------------+------------------------------------+
         | EVAL_OUTPUT_LEN | 2                                  |
         +-----------------+------------------------------------+
         | Measurement     | list[int]                          |
         +-----------------+------------------------------------+
         | AggResult       | list[int]                          |
         +-----------------+------------------------------------+

                 Table 18: Parameters of validity circuit
                            MultihotCountVec.

8.  Poplar1

   This section specifies Poplar1, a VDAF for the following task.  Each
   Client holds a bit-string of length BITS and the Aggregators hold a
   sequence of L-bit strings, where L <= BITS.  We will refer to the
   latter as the set of "candidate prefixes".  The Aggregators' goal is
   to count how many measurements are prefixed by each candidate prefix.

   This functionality is the core component of the Poplar protocol
   [BBCGGI21], which was designed to compute the heavy hitters over a
   set of input strings.  At a high level, the protocol works as
   follows.

   1.  Each Client splits its string into input shares and sends one
       share to each Aggregator.

   2.  The Aggregators agree on an initial set of candidate prefixes,
       say 0 and 1.

   3.  The Aggregators evaluate the VDAF on each set of input shares and
       aggregate the recovered output shares.  The aggregation parameter
       is the set of candidate prefixes.

Barnes, et al.            Expires 7 April 2025                [Page 100]
Internet-Draft                    VDAF                      October 2024

   4.  The Aggregators send their aggregate shares to the Collector, who
       combines them to recover the counts of each candidate prefix.

   5.  Let H denote the set of prefixes that occurred at least t times.
       If the prefixes all have length BITS, then H is the set of t-
       heavy-hitters.  Otherwise compute the next set of candidate
       prefixes, e.g., for each p in H, add p || 0 and p || 1 to the
       set.  Repeat step 3 with the new set of candidate prefixes.

   Poplar1 is constructed from an "Incremental Distributed Point
   Function (IDPF)", a primitive described by [BBCGGI21] that
   generalizes the notion of a Distributed Point Function (DPF) [GI14].
   Briefly, a DPF is used to distribute the computation of a "point
   function", a function that evaluates to zero on every input except at
   a programmable "point".  The computation is distributed in such a way
   that no one party knows either the point or what it evaluates to.

   An IDPF generalizes this "point" to a path on a full binary tree from
   the root to one of the leaves.  It is evaluated on an "index"
   representing a unique node of the tree.  If the node is on the
   programmed path, then the function evaluates to a non-zero value;
   otherwise it evaluates to zero.  This structure allows an IDPF to
   provide the functionality required for the above protocol: to compute
   the hit count for an index, just evaluate each set of IDPF shares at
   that index and add up the results.

   Consider the sub-tree constructed from a set of input strings and a
   target threshold t by including all indices that prefix at least t of
   the input strings.  We shall refer to this structure as the "prefix
   tree" for the batch of inputs and target threshold.  To compute the
   t-heavy hitters for a set of inputs, the Aggregators and Collector
   first compute the prefix tree, then extract the heavy hitters from
   the leaves of this tree.  (Note that the prefix tree may leak more
   information about the set than the heavy hitters themselves; see
   Section 9.4.1 for details.)

   Poplar1 composes an IDPF with the arithmetic sketch of [BBCGGI21],
   Section 4.2.  (The paper calls this a "secure sketch", but the
   underlying technique was later generalized in [BBCGGI23], where it is
   called "arithmetic sketching".)  This protocol ensures that
   evaluating a set of input shares on a unique set of candidate
   prefixes results in shares of a "one-hot" vector, i.e., a vector that
   is zero everywhere except for at most one element, which is equal to
   one.

   The remainder of this section is structured as follows.  IDPFs are
   defined in Section 8.1; a concrete instantiation is given
   Section 8.3.  The Poplar1 VDAF is defined in Section 8.2 in terms of

Barnes, et al.            Expires 7 April 2025                [Page 101]
Internet-Draft                    VDAF                      October 2024

   a generic IDPF.  Finally, a concrete instantiation of Poplar1 is
   specified in Section 8.4; test vectors can be found in Appendix "Test
   Vectors".

8.1.  Incremental Distributed Point Functions (IDPFs)

   An IDPF is defined over a domain of size 2^BITS, where BITS is a
   constant.  Indices into the IDPF tree are bit strings.  (In Poplar1,
   each Client's bit string is an index; see Section 8.1.1 for details.)
   The Client specifies an index alpha and a vector of values beta, one
   for each "level" L in range(BITS).  The key generation algorithm
   generates one IDPF "key" for each Aggregator.  When evaluated at
   level L and index prefix, each IDPF key returns an additive share of
   beta[L] if prefix is the L-bit prefix of alpha and shares of zero
   otherwise.

   Each of the programmed points beta is a vector of elements of some
   finite field.  We distinguish two types of fields: one for inner
   nodes (denoted FieldInner), and one for leaf nodes (FieldLeaf).  (Our
   instantiation of Poplar1 (Section 8.4) will use a much larger field
   for leaf nodes than for inner nodes.  This is to ensure the IDPF is
   "extractable" as defined in [BBCGGI21], Definition 1.)

   A concrete IDPF defines the types and constants enumerated in
   Table 19.  In the remainder we write Output as shorthand for the type
   list[list[FieldInner]] | list[list[FieldLeaf]].  (This type denotes
   either a vector of inner node field elements or leaf node field
   elements.)  The scheme is comprised of the following algorithms:

   *  idpf.gen(alpha: tuple[bool, ...], beta_inner:
      list[list[FieldInner]], beta_leaf: list[FieldLeaf], ctx: bytes,
      nonce: bytes, rand: bytes) -> tuple[PublicShare, list[bytes]] is
      the randomized IDPF-key generation algorithm.  Its inputs are the
      index alpha, the values beta, an application context string, and a
      nonce string.

      The output is a public part (of type PublicShare) that is sent to
      all Aggregators and a vector of private IDPF keys, one for each
      aggregator.  The binder string is used to derive the key in the
      underlying XofFixedKeyAes128 XOF that is used for expanding seeds
      at each level.

      Pre-conditions:

      -  alpha MUST have length BITS.

      -  beta_inner MUST have length BITS - 1.

Barnes, et al.            Expires 7 April 2025                [Page 102]
Internet-Draft                    VDAF                      October 2024

      -  beta_inner[level] MUST have length VALUE_LEN for each level in
         range(BITS - 1).

      -  beta_leaf MUST have length VALUE_LEN.

      -  rand MUST be generated by a CSPRNG and have length RAND_SIZE.

      -  nonce MUST be of length Idpf.NONCE_SIZE and chosen uniformly at
         random by the Client (see Section 9.2).

   *  idpf.eval(agg_id: int, public_share: PublicShare, key: bytes,
      level: int, prefixes: Sequence[tuple[bool, ...]], ctx: bytes,
      nonce: bytes) -> Output is the deterministic, stateless IDPF-key
      evaluation algorithm run by each Aggregator.  Its inputs are the
      Aggregator's unique identifier, the public share distributed to
      all of the Aggregators, the Aggregator's IDPF key, the "level" at
      which to evaluate the IDPF, the sequence of candidate prefixes, an
      application context string, and a nonce string.  It returns the
      share of the value corresponding to each candidate prefix.

      The output type (i.e., Output) depends on the value of level: if
      level < BITS-1, the output is the value for an inner node, which
      has type list[list[FieldInner]]; otherwise, if level == BITS-1,
      then the output is the value for a leaf node, which has type
      list[list[FieldLeaf]].

      Pre-conditions:

      -  agg_id MUST be in range(SHARES) and match the index of key in
         the sequence of IDPF keys output by the Client.

      -  level MUST be in range(0, BITS).

      -  Each prefix in prefixes MUST be distinct and have length level
         + 1.

   In addition, the following method is derived for each concrete Idpf:

   def current_field(
           self,
           level: int) -> type[FieldInner] | type[FieldLeaf]:
       if level < self.BITS - 1:
           return self.field_inner
       return self.field_leaf

   Finally, an implementation note.  The interface for IDPFs specified
   here is stateless, in the sense that there is no state carried
   between IDPF evaluations.  This is to align the IDPF syntax with the

Barnes, et al.            Expires 7 April 2025                [Page 103]
Internet-Draft                    VDAF                      October 2024

   VDAF abstraction boundary, which does not include shared state across
   VDAF evaluations.  In practice, of course, it will often be
   beneficial to expose a stateful API for IDPFs and carry the state
   across evaluations.  See Section 8.3 for details.

   +=============+====================================================+
   | Parameter   | Description                                        |
   +=============+====================================================+
   | SHARES      | Number of IDPF keys output by IDPF-key generator   |
   +-------------+----------------------------------------------------+
   | BITS        | Length in bits of each input string                |
   +-------------+----------------------------------------------------+
   | VALUE_LEN   | Number of field elements of each output value      |
   +-------------+----------------------------------------------------+
   | RAND_SIZE   | Size of the random string consumed by the IDPF-key |
   |             | generator.  Equal to twice the XOF's seed size.    |
   +-------------+----------------------------------------------------+
   | NONCE_SIZE  | Size of the randon nonce generated by the Client.  |
   +-------------+----------------------------------------------------+
   | KEY_SIZE    | Size in bytes of each IDPF key                     |
   +-------------+----------------------------------------------------+
   | FieldInner  | Implementation of Field (Section 6.1) used for     |
   |             | values of inner nodes                              |
   +-------------+----------------------------------------------------+
   | FieldLeaf   | Implementation of Field used for values of leaf    |
   |             | nodes                                              |
   +-------------+----------------------------------------------------+
   | PublicShare | Type of public share for this IDPF                 |
   +-------------+----------------------------------------------------+
   | Output      | Alias of list[list[FieldInner]] |                  |
   |             | list[list[FieldLeaf]]                              |
   +-------------+----------------------------------------------------+
   | FieldVec    | Alias of list[FieldInner] | list[FieldLeaf]        |
   +-------------+----------------------------------------------------+

        Table 19: Constants and types defined by a concrete IDPF.

8.1.1.  Encoding inputs as indices

   How data are represented as IDPF indices is up to the application.
   When the inputs are fixed-length byte strings, the most natural
   choice of representation is as a bit string formed from all the bits
   of the byte string, first ordered by byte position, then ordered from
   most significant bit to least significant bit within each byte.  This
   ensures that, when a byte string is a prefix of another, so too is
   its corresponding index.  (Index prefixes are defined in
   Section 8.1.)  For example,

Barnes, et al.            Expires 7 April 2025                [Page 104]
Internet-Draft                    VDAF                      October 2024

   Byte string: 01 02
   Bit string: 00000001 00000010

   is a prefix of

   Byte string: 01 02 03
   Bit string: 00000001 00000010 00000011

   Additionally, lexicographic ordering is preserved by this mapping
   from a byte string to a bit string.

   When the inputs are variable length, it is necessary to pad each
   input to some fixed length.  Further, the padding scheme must be non-
   ambiguous.  For example, each input could be padded with b"\x01"
   followed by as many b"\x00" bytes as needed.

8.2.  Construction

   This section specifies Poplar1, an implementation of the Vdaf
   interface (Section 5).  It is defined in terms of any Idpf
   (Section 8.1) for which SHARES == 2 and VALUE_LEN == 2 and an
   implementation of Xof (Section 6.2).  The associated constants and
   types required by the Vdaf interface are defined in Table 20.  The
   methods required for sharding, preparation, aggregation, and
   unsharding are described in the remaining subsections.  These methods
   make use of constants defined in Table 21.

Barnes, et al.            Expires 7 April 2025                [Page 105]
Internet-Draft                    VDAF                      October 2024

       +=================+========================================+
       | Parameter       | Value                                  |
       +=================+========================================+
       | VERIFY_KEY_SIZE | Xof.SEED_SIZE                          |
       +-----------------+----------------------------------------+
       | RAND_SIZE       | Xof.SEED_SIZE * 3 + Idpf.RAND_SIZE     |
       +-----------------+----------------------------------------+
       | NONCE_SIZE      | 16                                     |
       +-----------------+----------------------------------------+
       | ROUNDS          | 2                                      |
       +-----------------+----------------------------------------+
       | SHARES          | 2                                      |
       +-----------------+----------------------------------------+
       | Measurement     | tuple[bool, ...]                       |
       +-----------------+----------------------------------------+
       | AggParam        | tuple[int, Sequence[tuple[bool, ...]]] |
       +-----------------+----------------------------------------+
       | PublicShare     | same as the IDPF                       |
       +-----------------+----------------------------------------+
       | InputShare      | tuple[bytes, bytes, list[FieldInner],  |
       |                 | list[FieldLeaf]]                       |
       +-----------------+----------------------------------------+
       | OutShare        | FieldVec                               |
       +-----------------+----------------------------------------+
       | AggShare        | FieldVec                               |
       +-----------------+----------------------------------------+
       | AggResult       | list[int]                              |
       +-----------------+----------------------------------------+
       | PrepState       | tuple[bytes, int, FieldVec]            |
       +-----------------+----------------------------------------+
       | PrepShare       | FieldVec                               |
       +-----------------+----------------------------------------+
       | PrepMessage     | Optional[FieldVec]                     |
       +-----------------+----------------------------------------+

                  Table 20: VDAF parameters for Poplar1.

Barnes, et al.            Expires 7 April 2025                [Page 106]
Internet-Draft                    VDAF                      October 2024

                    +========================+=======+
                    | Variable               | Value |
                    +========================+=======+
                    | USAGE_SHARD_RAND: int  | 1     |
                    +------------------------+-------+
                    | USAGE_CORR_INNER: int  | 2     |
                    +------------------------+-------+
                    | USAGE_CORR_LEAF: int   | 3     |
                    +------------------------+-------+
                    | USAGE_VERIFY_RAND: int | 4     |
                    +------------------------+-------+

                       Table 21: Constants used by
                                 Poplar1.

8.2.1.  Sharding

   The Client's measurement is an IDPF index, denoted alpha.  (See
   Section 8.1.1 for guidelines on index encoding.)  The programmed IDPF
   values are pairs of field elements (1, k) where each k is chosen at
   random.  This random value is used as part of the arithmetic
   sketching protocol of [BBCGGI21], Appendix C.4.  After evaluating
   their IDPF key shares on a given sequence of candidate prefixes, the
   sketching protocol is used by the Aggregators to verify that they
   hold shares of a one-hot vector.  In addition, for each level of the
   tree, the prover generates random elements a, b, and c and computes

       A = -2*a + k
       B = a^2 + b - k*a + c

   and sends additive shares of a, b, c, A and B to the Aggregators.
   Putting everything together, the sharding algorithm is defined as
   follows.

   def shard(
       self,
       ctx: bytes,
       measurement: tuple[bool, ...],
       nonce: bytes,
       rand: bytes,
   ) -> tuple[Poplar1PublicShare, list[Poplar1InputShare]]:
       if len(nonce) != self.NONCE_SIZE:
           raise ValueError("incorrect nonce size")
       if len(rand) != self.RAND_SIZE:
           raise ValueError("incorrect size of random bytes argument")

       l = self.xof.SEED_SIZE

Barnes, et al.            Expires 7 April 2025                [Page 107]
Internet-Draft                    VDAF                      October 2024

       # Split the random input into the random input for IDPF key
       # generation, correlated randomness, and sharding.
       if len(rand) != self.RAND_SIZE:
           raise ValueError('incorrect rand size')
       idpf_rand, rand = front(self.idpf.RAND_SIZE, rand)
       seeds = [rand[i:i + l] for i in range(0, 3 * l, l)]
       corr_seed, seeds = front(2, seeds)
       (k_shard,), seeds = front(1, seeds)

       xof = self.xof(
           k_shard,
           self.domain_separation_tag(USAGE_SHARD_RAND, ctx),
           nonce,
       )

       # Construct the IDPF values for each level of the IDPF tree.
       # Each "data" value is 1; in addition, the Client generates
       # a random "authenticator" value used by the Aggregators to
       # evaluate the sketch during preparation. This sketch is used
       # to verify the one-hotness of their output shares.
       beta_inner = [
           [self.idpf.field_inner(1), k]
           for k in xof.next_vec(self.idpf.field_inner,
                                 self.idpf.BITS - 1)
       ]
       beta_leaf = [self.idpf.field_leaf(1)] + \
           xof.next_vec(self.idpf.field_leaf, 1)

       # Generate the IDPF keys.
       (public_share, keys) = self.idpf.gen(
           measurement,
           beta_inner,
           beta_leaf,
           ctx,
           nonce,
           idpf_rand,
       )

       # Generate correlated randomness used by the Aggregators to
       # evaluate the sketch over their output shares. Seeds are used
       # to encode shares of the `(a, b, c)` triples. (See [BBCGGI21,
       # Appendix C.4].)
       corr_offsets: list[Field] = vec_add(
           self.xof.expand_into_vec(
               self.idpf.field_inner,
               corr_seed[0],
               self.domain_separation_tag(USAGE_CORR_INNER, ctx),
               byte(0) + nonce,

Barnes, et al.            Expires 7 April 2025                [Page 108]
Internet-Draft                    VDAF                      October 2024

               3 * (self.idpf.BITS - 1),
           ),
           self.xof.expand_into_vec(
               self.idpf.field_inner,
               corr_seed[1],
               self.domain_separation_tag(USAGE_CORR_INNER, ctx),
               byte(1) + nonce,
               3 * (self.idpf.BITS - 1),
           ),
       )
       corr_offsets += vec_add(
           self.xof.expand_into_vec(
               self.idpf.field_leaf,
               corr_seed[0],
               self.domain_separation_tag(USAGE_CORR_LEAF, ctx),
               byte(0) + nonce,
               3,
           ),
           self.xof.expand_into_vec(
               self.idpf.field_leaf,
               corr_seed[1],
               self.domain_separation_tag(USAGE_CORR_LEAF, ctx),
               byte(1) + nonce,
               3,
           ),
       )

       # For each level of the IDPF tree, shares of the `(A, B)`
       # pairs are computed from the corresponding `(a, b, c)`
       # triple and authenticator value `k`.
       corr_inner: list[list[Field64]] = [[], []]
       for level in range(self.idpf.BITS):
           field = cast(type[Field], self.idpf.current_field(level))
           k = beta_inner[level][1] if level < self.idpf.BITS - 1 \
               else beta_leaf[1]
           (a, b, c), corr_offsets = corr_offsets[:3], corr_offsets[3:]
           A = -field(2) * a + k
           B = a ** 2 + b - a * k + c
           corr1 = xof.next_vec(field, 2)
           corr0 = vec_sub([A, B], corr1)
           if level < self.idpf.BITS - 1:
               corr_inner[0] += cast(list[Field64], corr0)
               corr_inner[1] += cast(list[Field64], corr1)
           else:
               corr_leaf = [
                   cast(list[Field255], corr0),
                   cast(list[Field255], corr1),
               ]

Barnes, et al.            Expires 7 April 2025                [Page 109]
Internet-Draft                    VDAF                      October 2024

       # Each input share consists of the Aggregator's IDPF key
       # and a share of the correlated randomness.
       input_shares = list(zip(keys, corr_seed, corr_inner, corr_leaf))
       return (public_share, input_shares)

               Figure 26: The sharding algorithm for Poplar1.

8.2.2.  Preparation

   The aggregation parameter encodes a sequence of candidate prefixes.
   When an Aggregator receives an input share from the Client, it begins
   by evaluating its IDPF share on each candidate prefix, recovering a
   data_share and auth_share for each.  The Aggregators use these and
   the correlation shares provided by the Client to verify that the
   sequence of data_share values are additive shares of a one-hot
   vector.

   Aggregators MUST ensure the candidate prefixes are all unique and
   appear in lexicographic order.  (This is enforced in the definition
   of prep_init() below.)  Uniqueness is necessary to ensure the refined
   measurement (i.e., the sum of the output shares) is in fact a one-hot
   vector.  Otherwise, sketch verification might fail, causing the
   Aggregators to erroneously reject a report that is actually valid.
   Note that enforcing the order is not strictly necessary, but this
   does allow uniqueness to be determined more efficiently.

   def prep_init(
           self,
           verify_key: bytes,
           ctx: bytes,
           agg_id: int,
           agg_param: Poplar1AggParam,
           nonce: bytes,
           public_share: Poplar1PublicShare,
           input_share: Poplar1InputShare) -> tuple[
               Poplar1PrepState,
               FieldVec]:
       (level, prefixes) = agg_param
       (key, corr_seed, corr_inner, corr_leaf) = input_share
       field = self.idpf.current_field(level)

       # Ensure that candidate prefixes are all unique and appear in
       # lexicographic order.
       for i in range(1, len(prefixes)):
           if prefixes[i - 1] >= prefixes[i]:
               raise ValueError('out of order prefix')

       # Evaluate the IDPF key at the given set of prefixes.

Barnes, et al.            Expires 7 April 2025                [Page 110]
Internet-Draft                    VDAF                      October 2024

       value = self.idpf.eval(
           agg_id, public_share, key, level, prefixes, ctx, nonce)

       # Get shares of the correlated randomness for evaluating the
       # Aggregator's share of the sketch.
       if level < self.idpf.BITS - 1:
           corr_xof = self.xof(
               corr_seed,
               self.domain_separation_tag(USAGE_CORR_INNER, ctx),
               byte(agg_id) + nonce,
           )
           # Fast-forward the XOF state to the current level.
           corr_xof.next_vec(field, 3 * level)
       else:
           corr_xof = self.xof(
               corr_seed,
               self.domain_separation_tag(USAGE_CORR_LEAF, ctx),
               byte(agg_id) + nonce,
           )
       (a_share, b_share, c_share) = corr_xof.next_vec(field, 3)
       if level < self.idpf.BITS - 1:
           (A_share, B_share) = cast(
               list[Field],
               corr_inner[2 * level:2 * (level + 1)],
           )
       else:
           (A_share, B_share) = cast(list[Field], corr_leaf)

       # Evaluate the Aggregator's share of the sketch. These are
       # called the "masked input values" [BBCGGI21, Appendix C.4].
       verify_rand_xof = self.xof(
           verify_key,
           self.domain_separation_tag(USAGE_VERIFY_RAND, ctx),
           nonce + to_be_bytes(level, 2),
       )
       verify_rand = cast(
           list[Field],
           verify_rand_xof.next_vec(field, len(prefixes)),
       )
       sketch_share = [a_share, b_share, c_share]
       out_share = []
       for (i, r) in enumerate(verify_rand):
           data_share = cast(Field, value[i][0])
           auth_share = cast(Field, value[i][1])
           sketch_share[0] += data_share * r
           sketch_share[1] += data_share * r ** 2
           sketch_share[2] += auth_share * r
           out_share.append(data_share)

Barnes, et al.            Expires 7 April 2025                [Page 111]
Internet-Draft                    VDAF                      October 2024

       prep_mem = [A_share, B_share, field(agg_id)] + out_share
       return (
           (
               b'evaluate sketch',
               level,
               cast(FieldVec, prep_mem),
           ),
           cast(FieldVec, sketch_share),
       )

   def prep_next(
       self,
       _ctx: bytes,
       prep_state: Poplar1PrepState,
       prep_msg: Optional[FieldVec]
   ) -> tuple[Poplar1PrepState, FieldVec] | FieldVec:
       prev_sketch = cast(list[Field], prep_msg)
       (step, level, prep_mem) = prep_state

       if step == b'evaluate sketch':
           if prev_sketch is None:
               raise ValueError('expected value, got none')
           elif len(prev_sketch) != 3:
               raise ValueError('incorrect sketch length')
           A_share = cast(Field, prep_mem[0])
           B_share = cast(Field, prep_mem[1])
           agg_id = cast(Field, prep_mem[2])
           prep_mem = prep_mem[3:]
           sketch_share = [
               agg_id * (prev_sketch[0] ** 2
                         - prev_sketch[1]
                         - prev_sketch[2])
               + A_share * prev_sketch[0]
               + B_share
           ]
           return cast(
               tuple[Poplar1PrepState, FieldVec],
               (
                   (
                       b'reveal sketch',
                       level,
                       prep_mem,
                   ),
                   sketch_share,
               )
           )

       elif step == b'reveal sketch':

Barnes, et al.            Expires 7 April 2025                [Page 112]
Internet-Draft                    VDAF                      October 2024

           if prev_sketch is None:
               return prep_mem  # Output shares
           else:
               raise ValueError('invalid prep message')

       raise ValueError('invalid prep state')

   def prep_shares_to_prep(
           self,
           _ctx: bytes,
           agg_param: Poplar1AggParam,
           prep_shares: list[FieldVec]) -> Optional[FieldVec]:
       if len(prep_shares) != 2:
           raise ValueError('incorrect number of prep shares')
       (level, _) = agg_param
       field = self.idpf.current_field(level)
       sketch = vec_add(
           cast(list[Field], prep_shares[0]),
           cast(list[Field], prep_shares[1]),
       )
       if len(sketch) == 3:
           return cast(FieldVec, sketch)
       elif len(sketch) == 1:
           if sketch == field.zeros(1):
               # In order to reduce communication overhead, let `None`
               # denote a successful sketch verification.
               return None
           else:
               raise ValueError('sketch verification failed')
       else:
           raise ValueError('incorrect sketch length')

                 Figure 27: Preparation state for Poplar1.

8.2.3.  Validity of Aggregation Parameters

   Aggregation parameters are valid for a given input share if no
   aggregation parameter with the same level has been used with the same
   input share before.  The whole preparation phase MUST NOT be run more
   than once for a given combination of input share and level.  This
   function checks that levels are increasing between calls, and also
   enforces that the prefixes at each level are suffixes of the previous
   level's prefixes.

Barnes, et al.            Expires 7 April 2025                [Page 113]
Internet-Draft                    VDAF                      October 2024

   def get_ancestor(
           index: tuple[bool, ...],
           level: int) -> tuple[bool, ...]:
       """
       Helper function to determine the prefix of `index` at
       `level`.
       """
       return index[:level + 1]

   def is_valid(
           self,
           agg_param: Poplar1AggParam,
           previous_agg_params: list[Poplar1AggParam]) -> bool:
       """
       Checks that levels are increasing between calls, and also
       enforces that the prefixes at each level are suffixes of the
       previous level's prefixes.
       """
       if len(previous_agg_params) < 1:
           return True

       (level, prefixes) = agg_param
       (last_level, last_prefixes) = previous_agg_params[-1]
       last_prefixes_set = set(last_prefixes)

       # Check that level increased.
       if level <= last_level:
           return False

       # Check that prefixes are suffixes of the last level's prefixes.
       for prefix in prefixes:
           last_prefix = get_ancestor(prefix, last_level)
           if last_prefix not in last_prefixes_set:
               # Current prefix not a suffix of last level's prefixes.
               return False
       return True

         Figure 28: Validity of aggregation parameters for Poplar1.

8.2.4.  Aggregation

   Aggregation involves simply adding up the output shares.

Barnes, et al.            Expires 7 April 2025                [Page 114]
Internet-Draft                    VDAF                      October 2024

   def aggregate(
           self,
           agg_param: Poplar1AggParam,
           out_shares: list[FieldVec]) -> FieldVec:
       (level, prefixes) = agg_param
       field = self.idpf.current_field(level)
       agg_share = cast(list[Field], field.zeros(len(prefixes)))
       for out_share in out_shares:
           agg_share = vec_add(agg_share, cast(list[Field], out_share))
       return cast(FieldVec, agg_share)

               Figure 29: Aggregation algorithm for Poplar1.

8.2.5.  Unsharding

   Finally, the Collector unshards the aggregate result by adding up the
   aggregate shares.

   def unshard(
           self,
           agg_param: Poplar1AggParam,
           agg_shares: list[FieldVec],
           _num_measurements: int) -> list[int]:
       (level, prefixes) = agg_param
       field = self.idpf.current_field(level)
       agg = cast(list[Field], field.zeros(len(prefixes)))
       for agg_share in agg_shares:
           agg = vec_add(agg, cast(list[Field], agg_share))
       return [x.as_unsigned() for x in agg]

        Figure 30: Computation of the aggregate result for Poplar1.

8.2.6.  Message Serialization

   This section defines serialization formats for messages exchanged
   over the network while executing Poplar1.  It is RECOMMENDED that
   implementations provide serialization methods for them.

   Message structures are defined following Section 3 of [RFC8446]).  In
   the remainder we use Fi as an alias for
   poplar1.idpf.field_inner.ENCODED_SIZE, Fl as an alias for
   poplar1.idpf.field_leaf.ENCODED_SIZE, and B as an alias for
   poplar1.idpf.BITS.

   Elements of the inner field are encoded in little-endian byte order
   (as defined in Section 6.1) and are represented as follows:

   opaque Poplar1FieldInner[Fi];

Barnes, et al.            Expires 7 April 2025                [Page 115]
Internet-Draft                    VDAF                      October 2024

   Likewise, elements of the leaf field are encoded in little-endian
   byte order (as defined in Section 6.1) and are represented as
   follows:

   opaque Poplar1FieldLeaf[Fl];

8.2.6.1.  Public Share

   The public share of the IDPF scheme in Section 8.3 consists of a
   sequence of "correction words".  A correction word has three
   components:

   1.  the XOF seed of type bytes;

   2.  the control bits of type tuple[Field2, Field2]; and

   3.  the payload of type list[Field64] for the first BITS-1 words and
       list[Field255] for the last word.

   The encoding is a straightforward structure of arrays, except that
   the control bits are packed as tightly as possible.  The encoded
   public share is structured as follows:

   struct {
       opaque packed_control_bits[packed_len];
       opaque seed[poplar1.idpf.KEY_SIZE*B];
       Poplar1FieldInner payload_inner[Fi*poplar1.idpf.VALUE_LEN*(B-1)];
       Poplar1FieldLeaf payload_leaf[Fl*poplar1.idpf.VALUE_LEN];
   } Poplar1PublicShare;

   Here packed_len = (2*B + 7) // 8 is the length of the packed control
   bits.  Field packed_control_bits is encoded with the following
   function:

   packed_control_buf = [int(0)] * packed_len
   for i, bit in enumerate(control_bits):
       packed_control_buf[i // 8] |= bit.as_unsigned() << (i % 8)
   packed_control_bits = bytes(packed_control_buf)

   It encodes each group of eight bits into a byte, in LSB to MSB order,
   padding the most significant bits of the last byte with zeros as
   necessary, and returns the byte array.  Decoding performs the reverse
   operation: it takes in a byte array and a number of bits, and returns
   a list of bits, extracting eight bits from each byte in turn, in LSB
   to MSB order, and stopping after the requested number of bits.  If
   the byte array has an incorrect length, or if unused bits in the last
   bytes are not zero, it throws an error:

Barnes, et al.            Expires 7 April 2025                [Page 116]
Internet-Draft                    VDAF                      October 2024

   control_bits = []
   for i in range(length):
       control_bits.append(Field2(
           (packed_control_bits[i // 8] >> (i % 8)) & 1
       ))
   leftover_bits = packed_control_bits[-1] >> (
       (length + 7) % 8 + 1
   )
   if (length + 7) // 8 != len(packed_control_bits) or \
           leftover_bits != 0:
       raise ValueError('trailing bits')

8.2.6.2.  Input Share

   Each input share is structured as follows:

   struct {
       opaque idpf_key[poplar1.idpf.KEY_SIZE];
       opaque corr_seed[poplar1.xof.SEED_SIZE];
       Poplar1FieldInner corr_inner[Fi * 2 * (poplar1.idpf.BITS - 1)];
       Poplar1FieldLeaf corr_leaf[Fl * 2];
   } Poplar1InputShare;

8.2.6.3.  Prep Share

   Encoding of the prep share depends on the round of sketching: if the
   first round, then each sketch share has three field elements; if the
   second round, then each sketch share has one field element.  The
   field that is used depends on the level of the IDPF tree specified by
   the aggregation parameter, either the inner field or the leaf field.

   For the first round and inner field:

   struct {
       Poplar1FieldInner sketch_share[Fi * 3];
   } Poplar1PrepShareRoundOneInner;

   For the first round and leaf field:

   struct {
       Poplar1FieldLeaf sketch_share[Fl * 3];
   } Poplar1PrepShareRoundOneLeaf;

   For the second round and inner field:

   struct {
       Poplar1FieldInner sketch_share;
   } Poplar1PrepShareRoundTwoInner;

Barnes, et al.            Expires 7 April 2025                [Page 117]
Internet-Draft                    VDAF                      October 2024

   For the second round and leaf field:

   struct {
       Poplar1FieldLeaf sketch_share;
   } Poplar1PrepShareRoundTwoLeaf;

8.2.6.4.  Prep Message

   Likewise, the structure of the prep message for Poplar1 depends on
   the sketching round and field.  For the first round and inner field:

   struct {
       Poplar1FieldInner[Fi * 3];
   } Poplar1PrepMessageRoundOneInner;

   For the first round and leaf field:

   struct {
       Poplar1FieldLeaf sketch[Fl * 3];
   } Poplar1PrepMessageRoundOneLeaf;

   Note that these messages have the same structures as the prep shares
   for the first round.

   The second-round prep message is the empty string.  This is because
   the sketch shares are expected to sum to a particular value if the
   output shares are valid; we represent a successful preparation with
   the empty string and otherwise return an error.

8.2.6.5.  Aggregate Share

   The encoding of the aggregate share depends on whether the inner or
   leaf field is used, and the number of candidate prefixes.  Both of
   these are determined by the aggregation parameter.

   Let prefix_count denote the number of candidate prefixes.  For the
   inner field:

   struct {
       Poplar1FieldInner agg_share[Fi * prefix_count];
   } Poplar1AggShareInner;

   For the leaf field:

   struct {
       Poplar1FieldLeaf agg_share[Fl * prefix_count];
   } Poplar1AggShareLeaf;

Barnes, et al.            Expires 7 April 2025                [Page 118]
Internet-Draft                    VDAF                      October 2024

8.2.6.6.  Aggregation Parameter

   The aggregation parameter is encoded as follows:

   struct {
       uint16_t level;
       uint32_t num_prefixes;
       opaque encoded_prefixes[prefixes_len];
   } Poplar1AggParam;

   The fields in this struct are: level, the level of the IDPF tree of
   each prefixes; num_prefixes, the number of prefixes to evaluate; and
   encoded_prefixes, the sequence of prefixes encoded into a byte string
   of length prefixes_len.  The prefixes are encoded with the following
   procedure:

   prefixes_len = ((level + 1) + 7) // 8 * len(prefixes)
   encoded_prefixes = bytearray()
   for prefix in prefixes:
       for chunk in itertools.batched(prefix, 8):
           byte_out = 0
           for (bit_position, bit) in enumerate(chunk):
               byte_out |= bit << (7 - bit_position)
           encoded_prefixes.append(byte_out)

   Decoding involves the following procedure:

   prefixes = []
   bytes_per_prefix = ((level + 1) + 7) // 8
   for chunk in itertools.batched(encoded_prefixes, bytes_per_prefix):
       prefix = []
       for i in range(level + 1):
           byte_index = i // 8
           bit_offset = 7 - (i % 8)
           bit = (chunk[byte_index] >> bit_offset) & 1 != 0
           prefix.append(bit)
       prefixes.append(tuple(prefix))

   Implementation note: the aggregation parameter includes the level of
   the IDPF tree and the sequence of indices to evaluate.  For
   implementations that perform per-report caching across executions of
   the VDAF, this may be more information than is strictly needed.  In
   particular, it may be sufficient to convey which indices from the
   previous execution will have their children included in the next.
   This would help reduce communication overhead.

Barnes, et al.            Expires 7 April 2025                [Page 119]
Internet-Draft                    VDAF                      October 2024

8.3.  The IDPF scheme of [BBCGGI21]

   In this section we specify a concrete IDPF suitable for instantiating
   Poplar1.  The scheme gets its name from the name of the protocol of
   [BBCGGI21].

   The constant and type definitions required by the Idpf interface are
   given in Table 22.

   Our IDPF requires an XOF for deriving the output shares, as well as a
   variety of other artifacts used internally.  For performance reasons,
   we instantiate this object using XofFixedKeyAes128 (Section 6.2.2)
   wherever possible.  See Section 9.5 for more information.

                   +============+======================+
                   | Parameter  | Value                |
                   +============+======================+
                   | SHARES     | 2                    |
                   +------------+----------------------+
                   | BITS       | any positive integer |
                   +------------+----------------------+
                   | VALUE_LEN  | any positive integer |
                   +------------+----------------------+
                   | KEY_SIZE   | Xof.SEED_SIZE        |
                   +------------+----------------------+
                   | FieldInner | Field64 (Table 3)    |
                   +------------+----------------------+
                   | FieldLeaf  | Field255 (Table 3)   |
                   +------------+----------------------+

                        Table 22: Constants and type
                        definitions for the IDPF of
                                 BBCGGI21.

8.3.1.  Key Generation

      TODO Describe the construction in prose, beginning with a gentle
      introduction to the high level idea.

   The description of the IDPF-key generation algorithm makes use of
   auxiliary functions extend() and convert() defined in Section 8.3.3.
   In the following, we let Field2 denote the field GF(2).

Barnes, et al.            Expires 7 April 2025                [Page 120]
Internet-Draft                    VDAF                      October 2024

   def gen(
           self,
           alpha: tuple[bool, ...],
           beta_inner: list[list[Field64]],
           beta_leaf: list[Field255],
           ctx: bytes,
           nonce: bytes,
           rand: bytes) -> tuple[list[CorrectionWord], list[bytes]]:
       if len(alpha) != self.BITS:
           raise ValueError("incorrect alpha length")
       if len(beta_inner) != self.BITS - 1:
           raise ValueError("incorrect beta_inner length")
       if len(rand) != self.RAND_SIZE:
           raise ValueError("incorrect rand size")
       if len(nonce) != self.NONCE_SIZE:
           raise ValueError("incorrect nonce size")

       key = [
           rand[:XofFixedKeyAes128.SEED_SIZE],
           rand[XofFixedKeyAes128.SEED_SIZE:],
       ]

       seed = key.copy()
       ctrl = [Field2(0), Field2(1)]
       public_share = []
       for level in range(self.BITS):
           keep = int(alpha[level])
           lose = 1 - keep
           bit = Field2(keep)

           (s0, t0) = self.extend(level, seed[0], ctx, nonce)
           (s1, t1) = self.extend(level, seed[1], ctx, nonce)
           seed_cw = xor(s0[lose], s1[lose])
           ctrl_cw = (
               t0[0] + t1[0] + bit + Field2(1),
               t0[1] + t1[1] + bit,
           )

           # Implementation note: these conditional XORs and
           # input-dependent array indices should be replaced with
           # constant-time selects in practice in order to reduce
           # leakage via timing side channels.
           if ctrl[0].as_unsigned():
               x0 = xor(s0[keep], seed_cw)
               ctrl[0] = t0[keep] + ctrl_cw[keep]
           else:
               x0 = s0[keep]
               ctrl[0] = t0[keep]

Barnes, et al.            Expires 7 April 2025                [Page 121]
Internet-Draft                    VDAF                      October 2024

           if ctrl[1].as_unsigned():
               x1 = xor(s1[keep], seed_cw)
               ctrl[1] = t1[keep] + ctrl_cw[keep]
           else:
               x1 = s1[keep]
               ctrl[1] = t1[keep]
           (seed[0], w0) = self.convert(level, x0, ctx, nonce)
           (seed[1], w1) = self.convert(level, x1, ctx, nonce)

           if level < self.BITS - 1:
               b = cast(list[Field], beta_inner[level])
           else:
               b = cast(list[Field], beta_leaf)
           if len(b) != self.VALUE_LEN:
               raise ValueError(
                   "length of beta must match the value length"
               )

           w_cw = vec_add(vec_sub(b, w0), w1)
           # Implementation note: this conditional negation should be
           # replaced with a constant time select or a constant time
           # multiplication in practice in order to reduce leakage via
           # timing side channels.
           if ctrl[1].as_unsigned():
               for i in range(len(w_cw)):
                   w_cw[i] = -w_cw[i]

           public_share.append((seed_cw, ctrl_cw, w_cw))
       return (public_share, key)

           Figure 31: IDPF-key generation algorithm of BBCGGI21.

8.3.2.  Key Evaluation

      TODO Describe in prose how IDPF-key evaluation algorithm works.

   The description of the IDPF-evaluation algorithm makes use of
   auxiliary functions extend() and convert() defined in Section 8.3.3.

   def eval(
           self,
           agg_id: int,
           public_share: list[CorrectionWord],
           key: bytes,
           level: int,
           prefixes: Sequence[tuple[bool, ...]],
           ctx: bytes,
           nonce: bytes) -> list[list[Field64]] | list[list[Field255]]:

Barnes, et al.            Expires 7 April 2025                [Page 122]
Internet-Draft                    VDAF                      October 2024

       if agg_id not in range(self.SHARES):
           raise ValueError('aggregator id out of range')
       if level not in range(self.BITS):
           raise ValueError('level out of range')
       if len(set(prefixes)) != len(prefixes):
           raise ValueError('prefixes must be unique')

       out_share = []
       for prefix in prefixes:
           if len(prefix) != level + 1:
               raise ValueError('incorrect prefix length')

           # The Aggregator's output share is the value of a node of
           # the IDPF tree at the given `level`. The node's value is
           # computed by traversing the path defined by the candidate
           # `prefix`. Each node in the tree is represented by a seed
           # (`seed`) and a control bit (`ctrl`).
           seed = key
           ctrl = Field2(agg_id)
           y: FieldVec
           for current_level in range(level + 1):
               bit = int(prefix[current_level])

               # Implementation note: typically the current round of
               # candidate prefixes would have been derived from
               # aggregate results computed during previous rounds.
               # For example, when using the IDPF to compute heavy
               # hitters, a string whose hit count exceeded the
               # given threshold in the last round would be the
               # prefix of each `prefix` in the current round. (See
               # [BBCGGI21, Section 5.1].) In this case, part of the
               # path would have already been traversed.
               #
               # Re-computing nodes along previously traversed paths is
               # wasteful. Implementations can eliminate this added
               # complexity by caching nodes (i.e., `(seed, ctrl)`
               # pairs) output by previous calls to `eval_next()`.
               (seed, ctrl, y) = self.eval_next(
                   seed,
                   ctrl,
                   public_share[current_level],
                   current_level,
                   bit,
                   ctx,
                   nonce,
               )
           if agg_id == 0:
               out_share.append(cast(list[Field], y))

Barnes, et al.            Expires 7 April 2025                [Page 123]
Internet-Draft                    VDAF                      October 2024

           else:
               out_share.append(vec_neg(cast(list[Field], y)))
       return cast(
           list[list[Field64]] | list[list[Field255]],
           out_share,
       )

   def eval_next(
           self,
           prev_seed: bytes,
           prev_ctrl: Field2,
           correction_word: CorrectionWord,
           level: int,
           bit: int,
           ctx: bytes,
           nonce: bytes) -> tuple[bytes, Field2, FieldVec]:
       """
       Compute the next node in the IDPF tree along the path determined
       by a candidate prefix. The next node is determined by `bit`, the
       bit of the prefix corresponding to the next level of the tree.
       """

       seed_cw = correction_word[0]
       ctrl_cw = correction_word[1]
       w_cw = cast(list[Field], correction_word[2])
       (s, t) = self.extend(level, prev_seed, ctx, nonce)

       # Implementation note: these conditional operations and
       # input-dependent array indices should be replaced with
       # constant-time selects in practice in order to reduce leakage
       # via timing side channels.
       if prev_ctrl.as_unsigned():
           s[0] = xor(s[0], seed_cw)
           s[1] = xor(s[1], seed_cw)
           t[0] += ctrl_cw[0]
           t[1] += ctrl_cw[1]

       next_ctrl = t[bit]
       convert_output = self.convert(level, s[bit], ctx, nonce)
       next_seed = convert_output[0]
       y = cast(list[Field], convert_output[1])
       # Implementation note: this conditional addition should be
       # replaced with a constant-time select in practice in order to
       # reduce leakage via timing side channels.
       if next_ctrl.as_unsigned():
           for i in range(len(y)):
               y[i] += w_cw[i]

Barnes, et al.            Expires 7 April 2025                [Page 124]
Internet-Draft                    VDAF                      October 2024

       return (next_seed, next_ctrl, cast(FieldVec, y))

        Figure 32: IDPF-evaluation generation algorithm of BBCGGI21.

8.3.3.  Auxiliary Functions

   def extend(
           self,
           level: int,
           seed: bytes,
           ctx: bytes,
           nonce: bytes) -> tuple[list[bytes], list[Field2]]:
       xof = self.current_xof(level, seed, format_dst(1, 0, 0) + ctx, nonce)
       s = [
           bytearray(xof.next(self.KEY_SIZE)),
           bytearray(xof.next(self.KEY_SIZE)),
       ]
       # Use the least significant bits as the control bit correction,
       # and then zero it out. This gives effectively 127 bits of
       # security, but reduces the number of AES calls needed by 1/3.
       t = [Field2(s[0][0] & 1), Field2(s[1][0] & 1)]
       s[0][0] &= 0xFE
       s[1][0] &= 0xFE
       return ([bytes(s[0]), bytes(s[1])], t)

   def convert(
           self,
           level: int,
           seed: bytes,
           ctx: bytes,
           nonce: bytes) -> tuple[bytes, FieldVec]:
       xof = self.current_xof(level, seed, format_dst(1, 0, 1) + ctx, nonce)
       next_seed = xof.next(self.KEY_SIZE)
       field = self.current_field(level)
       w = xof.next_vec(field, self.VALUE_LEN)
       return (next_seed, cast(FieldVec, w))

   def current_xof(self,
                   level: int,
                   seed: bytes,
                   dst: bytes,
                   nonce: bytes) -> Xof:
       if level < self.BITS-1:
           return XofFixedKeyAes128(seed, dst, nonce)
       return XofTurboShake128(seed, dst, nonce)

                 Figure 33: Helper functions for the IDPF.

Barnes, et al.            Expires 7 April 2025                [Page 125]
Internet-Draft                    VDAF                      October 2024

8.4.  Instantiation

   By default, Poplar1 is instantiated with the IDPF in Section 8.3
   (VALUE_LEN == 2) and XofTurboShake128 (Section 6.2.1).  This VDAF is
   suitable for any positive value of BITS.  Test vectors can be found
   in Appendix "Test Vectors".

9.  Security Considerations

   VDAFs (Section 5) have two essential security goals:

   1.  Privacy: an attacker that controls the Collector and a subset of
       Clients and Aggregators learns nothing about the measurements of
       honest Clients beyond what it can deduce from the aggregate
       result.  We assume the attacker controls the entire network
       except for channels between honest Clients and honest
       Aggregators.  In particular, it cannot forge or prevent
       transmission of messages on these channels.

   2.  Robustness: an attacker that controls a subset of Clients cannot
       cause the Collector to compute anything other than the aggregate
       of the measurements of honest Clients.  We assume the attacker
       eavesdrops on the network but does not control transmission of
       messages between honest parties.

   Formal definitions of privacy and robustness can be found in
   [DPRS23].  A VDAF is the core cryptographic primitive of a protocol
   that achieves the above privacy and robustness goals.  It is not
   sufficient on its own, however.  The application will need to assure
   a few security properties, for example:

   *  Securely distributing the long-lived parameters, in particular the
      verification key.

   *  Establishing secure channels:

      -  Confidential and authentic channels among Aggregators, and
         between the Aggregators and the Collector; and

      -  Confidential and Aggregator-authenticated channels between
         Clients and Aggregators.

   *  Enforcing the non-collusion properties required of the specific
      VDAF in use.

   In such an environment, a VDAF provides the high-level privacy
   property described above: the Collector learns only the aggregate
   measurement, and nothing about individual measurements aside from

Barnes, et al.            Expires 7 April 2025                [Page 126]
Internet-Draft                    VDAF                      October 2024

   what can be inferred from the aggregate result.  The Aggregators
   learn neither individual measurements nor the aggregate result.  The
   Collector is assured that the aggregate statistic accurately reflects
   the inputs as long as the Aggregators correctly executed their role
   in the VDAF.

   On their own, VDAFs do not provide:

   1.  Mitigation of Sybil attacks [Dou02].  In this attack, the
       adversary observes a subset of input shares transmitted by a
       Client it is interested in.  It allows the input shares to be
       processed, but corrupts and picks bogus measurements for the
       remaining Clients.  Applications can guard against these risks by
       adding additional controls on report submission, such as Client
       authentication and rate limits.

   2.  Differential privacy [Dwo06].  Depending on the distribution of
       the measurements, the aggregate result itself can still leak a
       significant amount of information about an individual measurement
       or the person that generated it.

   3.  Robustness in the presence of a malicious Aggregator.  An
       Aggregator can, without detection, manipulate the aggregate
       result by modifying its own aggregate share.

   4.  Guaranteed output delivery [GSZ20].  An attacker that controls
       transmission of messages between honest parties can prevent
       computation of the aggregate result by dropping messages.

9.1.  Requirements for the Verification Key

   The Aggregators are responsible for exchanging the verification key
   in advance of executing the VDAF.  Any procedure is acceptable as
   long as the following conditions are met:

   1.  To ensure robustness of the computation, the Aggregators MUST NOT
       reveal the verification key to the Clients.  Otherwise, a
       malicious Client might be able to exploit knowledge of this key
       to craft an invalid report that would be accepted by the
       Aggregators.

   2.  To ensure privacy of the measurements, the Aggregators MUST
       commit to the verification key prior to processing reports
       generated by Clients.  Otherwise, a malicious Aggregator may be
       able to craft a verification key that, for a given report, causes
       an honest Aggregator to leak information about the measurement
       during preparation.

Barnes, et al.            Expires 7 April 2025                [Page 127]
Internet-Draft                    VDAF                      October 2024

   Meeting these conditions is required in order to leverage security
   analysis in the framework of [DPRS23].  Their definition of
   robustness allows the attacker, playing the role of a cohort of
   malicious Clients, to submit arbitrary reports to the Aggregators and
   eavesdrop on their communications as they process them.  Security in
   this model is achievable as long as the verification key is kept
   secret from the attacker.

   The privacy definition of [DPRS23] considers an active attacker that
   controls the network and a subset of Aggregators; in addition, the
   attacker is allowed to choose the verification key used by each
   honest Aggregator over the course of the experiment.  Security is
   achievable in this model as long as the key is picked at the start of
   the experiment, prior to any reports being generated.  (The model
   also requires nonces to be generated at random; see Section 9.2
   below.)

   Meeting these requirements is relatively straightforward.  For
   example, the Aggregators may designate one of their peers to generate
   the verification key and distribute it to the others.  To assure
   Clients of key commitment, the Clients and (honest) Aggregators could
   bind reports to a shared context string derived from the key.  For
   instance, the "task ID" of DAP [DAP] could be set to the hash of the
   verification key; then as long as honest Aggregators only consume
   reports for the task indicated by the Client, forging a new key after
   the fact would reduce to finding collisions in the underlying hash
   function.  (Keeping the key secret from the Clients would require the
   hash function to be one-way.)  However, since rotating the key
   implies rotating the task ID, this scheme would not allow key
   rotation over the lifetime of a task.

9.2.  Requirements for the Nonce

   The sharding and preparation steps of VDAF execution depend on a
   nonce associated with the Client's report.  To ensure privacy of the
   underlying measurement, the Client MUST generate this nonce using a
   CSPRNG.  This is required in order to leverage security analysis for
   the privacy definition of [DPRS23], which assumes the nonce is chosen
   at random prior to generating the report.

   Other security considerations may require the nonce to be non-
   repeating.  For example, to achieve differential privacy it is
   necessary to avoid "over exposing" a report by including it too many
   times in a single batch or across multiple batches.  It is
   RECOMMENDED that the nonce generated by the Client be used by the
   Aggregators for replay protection.

Barnes, et al.            Expires 7 April 2025                [Page 128]
Internet-Draft                    VDAF                      October 2024

9.3.  Requirements for the Public Share

   The Aggregators MUST ensure they have both received the same public
   share from the Client.  It is sufficient, for example, to exchange a
   hash of the public share over a secure channel.

9.4.  Requirements for Aggregation Parameters

   As described in Section 4.3 and Section 5.3 respectively, DAFs and
   VDAFs may impose restrictions on the re-use of input shares.  This is
   to ensure that correlated randomness provided by the Client through
   the input share is not used more than once, which might compromise
   confidentiality of the Client's measurements.

   Protocols that make use of VDAFs therefore MUST call vdaf.is_valid on
   the set of all aggregation parameters used for a Client's input
   share, and only proceed with the preparation and aggregation phases
   if that function call returns True.

9.4.1.  Additional Privacy Considerations

   Aggregating a batch of reports multiple times, each time with a
   different aggregation parameter, could result in information leakage
   beyond what is used by the application.

   For example, when Poplar1 is used for heavy hitters, the Aggregators
   learn not only the heavy hitters themselves, but also the prefix tree
   (as defined in Section 8) computed along the way.  Indeed, this
   leakage is inherent to any construction that uses an IDPF
   (Section 8.1) in the same way.  Depending on the distribution of the
   measurements, the prefix tree can leak a significant amount of
   information about unpopular inputs.  For instance, it is possible
   (though perhaps unlikely) for a large set of non-heavy-hitter values
   to share a common prefix, which would be leaked by a prefix tree with
   a sufficiently small threshold.

   A malicious adversary controlling the Collector and one of the
   Aggregators can further turn arbitrary non-heavy prefixes into heavy
   ones by tampering with the IDPF output at any position.  While our
   construction ensures that the nodes evaluated at one level are
   children of the nodes evaluated at the previous level, this still may
   allow an adversary to discover individual non-heavy strings.

   The only practical, general-purpose approach to mitigating these
   leakages is via differential privacy, which is RECOMMENDED for all
   protocols using Poplar1 for heavy-hitter type applications.

Barnes, et al.            Expires 7 April 2025                [Page 129]
Internet-Draft                    VDAF                      October 2024

9.4.2.  Safe Usage of IDPF Outputs

   The arithmetic sketch described in Section 8 is used by the
   Aggregators to check that the shares of the vector obtained by
   evaluating a Client's IDPF at a sequence of candidate prefixes has at
   most one non-zero value, and that the non-zero value is 1.  Depending
   on how the values are used, the arithmetic sketch on its own may not
   be sufficient for robustness of the application.  In particular, a
   malicious Client may attempt to influence the computation by choosing
   an IDPF that evaluates to 1 at more than one node at a given level of
   the tree.

   This issue can be mitigated by using an IDPF that is extractable as
   defined in in Appendix D of [BBCGGI21].  Extractability ensures that,
   for a particular level of the tree, it is infeasible for an attacker
   to control values of the IDPF such that it takes on chosen non-zero
   values at more than one node.  (It can practically only achieve the
   zero function, a point function, or a pseudorandom function.)

   The IDPF specified in Section 8.1 only guarantees extractability at
   the last level of the tree.  (This is by virtue of using a larger
   field for the leaves than for inner nodes and using an XOF to derive
   leaves that is safe to model as a random oracle (see Section 9.5).)
   For intermediate levels, it is feasible for a client to produce IDPF
   shares with two controlled non-zero nodes.

   This is not an issue for running heavy hitters, since (1) each node
   in the prefix tree is a child of a previously traversed node, (2) the
   arithmetic sketch would detect double voting at every level of the
   prefix tree, and (3) the IDPF is extractable at the last level of the
   tree.  However, the lack of extractability at intermediate levels may
   result in attacks on the robustness of certain applications.

   Thus applications SHOULD NOT use prefix counts for intermediate
   levels for any purpose beyond the heavy-hitters tree traversal.

9.5.  Requirements for XOFs

   As described in Section 6.2, our constructions rely on eXtendable
   Output Functions (XOFs).  In the security analyses of our protocols,
   these are usually modeled as random oracles.  XofTurboShake128 is
   designed to be indifferentiable from a random oracle [MRH04], making
   it a suitable choice for most situations.

   The one exception is the IDPF of Section 8.3.  Here, a random oracle
   is not needed to prove privacy, since the analysis of [BBCGGI21],
   Proposition 1, only requires a Pseudorandom Generator (PRG).  As
   observed in [GKWY20], a PRG can be instantiated from a correlation-

Barnes, et al.            Expires 7 April 2025                [Page 130]
Internet-Draft                    VDAF                      October 2024

   robust hash function H.  Informally, correlation robustness requires
   that for a random r, H(xor(r, x)) is computationally
   indistinguishable from a random function of x.  A PRG can therefore
   be constructed as

   PRG(r) = H(xor(r, 1)) || H(xor(r, 2)) || ...

   since each individual hash function evaluation is indistinguishable
   from a random function.

   Our construction at Section 6.2.2 implements a correlation-robust
   hash function using fixed-key AES.  For security, it assumes that AES
   with a fixed key can be modeled as a random permutation [GKWY20].
   Additionally, we use a different AES key for every client, which in
   the ideal cipher model leads to better concrete security [GKWWY20].

   We note that for robustness, the analysis of [BBCGGI21] still assumes
   a random oracle to make the IDPF extractable.  We therefore use
   XofTurboShake128 instead for the last level of the tree.  It is
   important that XofTurboShake128 supports 16 byte seeds, as this is
   the seed size for the inner levels.

   While XofFixedKeyAes128 has been shown to be differentiable from a
   random oracle [GKWWY20], there are no known attacks exploiting this
   difference.  We also stress that even if the IDPF is not extractable,
   Poplar1 guarantees that every client can contribute to at most one
   prefix among the ones being evaluated by the helpers.

9.6.  Choosing FLP Parameters

   Prio3 and other systems built from FLPs (Section 7.3 in particular)
   may benefit from choosing a field size that is as small as possible.
   Generally speaking, a smaller field results in lower communication
   and storage costs.  Care must be taken, however, since a smaller
   field also results in degraded (or even vacuous) robustness.

   Different variants of Prio3 (Section 7) use different field sizes:
   Prio3Count and Prio3Sum use Field64; but Prio3SumVec, Prio3Histogram,
   and Prio3MultihotCountVec all use Field128, a field that is twice as
   large as Field64.  This is due to the use of joint randomness
   (Section 7.1) in the latter variants.  Joint randomness allows for
   more flexible circuit design (see Section 7.3.1.1), but opens up
   Prio3 to offline attacks in which the attacker searches for input
   shares for an invalid measurement that derive joint randomness that
   causes the circuit to accept.  Choosing a large enough field ensures
   this computation is too expensive to be feasible.  (See [DPRS23],
   Theorem 1.)  Note that privacy is not susceptible to such attacks.

Barnes, et al.            Expires 7 April 2025                [Page 131]
Internet-Draft                    VDAF                      October 2024

   Another way to mitigate this issue (or improve robustness in general)
   is to generate and verify multiple, independent proofs.  (See
   Section 7.1.2.)  For Prio3, the PROOFS parameter controls the number
   of proofs (at least one) that are generated and verified.  In general
   the soundness error of the FLP is given by the following formula:

   (circuit_soundness + flp_soundness) ** PROOFS

   where:

   *  circuit_soundness is the soundness of the validity circuit
      (Section 7.3.2)

   *  flp_soundness is the base soundness of the proof system
      ([BBCGGI19], Theorem 4.3)

   For circuits involving joint randomness, we aim for the soundness
   error to be close to 2^-128 in order to mitigate offline attacks.
   Such circuits MUST use Field128 with at least one proof or Field64
   with at least three proofs.  Depending on the circuit, Field64 with
   two proofs might have significantly lower soundness than Field128
   with one proof.

   We stress that weak parameters (too small a field, too few proofs, or
   both) can be exploited to attack any aggregation task using those
   parameters.  To mitigate offline attacks, it is necessary to disable
   all tasks that use the weak parameters.

9.7.  Choosing the Number of Aggregators

   Two Aggregators are required for privacy in our threat model, but
   some (V)DAFs, including Prio3 (Section 7), allow for any number of
   Aggregators, only one of which needs to be trusted in order for the
   computation to be private.  To hedge against corruptions that happen
   during the course of the attack, deployments may consider involving
   more than two Aggregators as described for example in Section 5.9.
   Note however that some schemes are not compatible with this mode of
   operation, such as Poplar1.

9.8.  Defense-in-Depth Measures

   Prio3 and Poplar1 are designed to resist some attacks that fall
   outside the main threat model for VDAFs.

   Broadly speaking, domain separation is used to prevent cross protocol
   attacks, in which data from evaluation of one VDAF translates to an
   attack against another.  For example:

Barnes, et al.            Expires 7 April 2025                [Page 132]
Internet-Draft                    VDAF                      October 2024

   1.  Weak entropy sources: the VDAF algorithm ID is bound to each XOF
       invocation, thereby ensuring the outputs are different between
       VDAF invocations, even if the underlying randomness is the same.
       For example, two different instances of Prio3 would compute
       different measurement shares.

   2.  Weak parameters: Prio3 variants that require joint randomness are
       subject to offline attacks against robustness.  These attacks are
       feasible if the field size or number of proofs is sufficiently
       small.  (See Section 9.6.)  The joint randomness derivation is
       bound to both the field (via the algorithm ID) and the number of
       proofs, thereby ensuring that joint randomness derived for weak
       parameters is not reused for stronger parameters.  In addition,
       the joint randomness is bound to the application context, meaning
       any work the attacker does to attack some application is not
       useful for other applications that use the same parameters.

   There are also some important limitations to be aware of.  For
   example, Prio3 provides domain separation between families of
   circuits, but does not provide domain separation between instances of
   a circuit.  Concretely, it is possible for Aggregators to accept a
   report for Prio3SumVec from a Client who disagrees with them on the
   value of bits and length.  This is because there is no binding of the
   circuit parameters to the computation.

10.  IANA Considerations

   IANA is requested to make one new registry:

   *  DAF and VDAF Identifiers

   This registry should be created under the heading "Verifiable
   Distributed Aggregation Functions (VDAF)", and administered under the
   Specification Required policy [RFC8126].

   The "VDAF Identifiers" registry lists identifiers for Distributed
   Aggregation Functions (DAFs) and Verifiable Distributed Aggregation
   Functions (VDAFs).  These identifiers are four-byte values, so the
   minimum possible value is 0x00000000 and the maximum possible value
   is 0xffffffff.

   Template:

   *  Value: The four-byte identifier for the DAF or VDAF

   *  Scheme: The name of the DAF or VDAF

Barnes, et al.            Expires 7 April 2025                [Page 133]
Internet-Draft                    VDAF                      October 2024

   *  Type: Either "DAF" for a Distributed Aggregation Function or
      "VDAF" for a Verifiable Distributed Aggregation Function

   *  Reference: Where the algorithm is defined

   The initial contents of the registry are as follows:

     +===============+=======================+======+===============+
     | Value         | Scheme                | Type | Reference     |
     +===============+=======================+======+===============+
     | 0x00000000    | Reserved              | n/a  | RFC XXXX      |
     +---------------+-----------------------+------+---------------+
     | 0x00000001    | Prio3Count            | VDAF | Section 7.4.1 |
     |               |                       |      | of RFC XXXX   |
     +---------------+-----------------------+------+---------------+
     | 0x00000002    | Prio3Sum              | VDAF | Section 7.4.2 |
     |               |                       |      | of RFC XXXX   |
     +---------------+-----------------------+------+---------------+
     | 0x00000003    | Prio3SumVec           | VDAF | Section 7.4.3 |
     |               |                       |      | of RFC XXXX   |
     +---------------+-----------------------+------+---------------+
     | 0x00000004    | Prio3Histogram        | VDAF | Section 7.4.4 |
     |               |                       |      | of RFC XXXX   |
     +---------------+-----------------------+------+---------------+
     | 0x00000005    | Prio3MultihotCountVec | VDAF | Section 7.4.5 |
     |               |                       |      | of RFC XXXX   |
     +---------------+-----------------------+------+---------------+
     | 0x00000006    | Poplar1               | VDAF | Section 8.4   |
     |               |                       |      | of RFC XXXX   |
     +---------------+-----------------------+------+---------------+
     | 0xFFFF0000 to | Reserved for Private  | n/a  | n/a           |
     | 0xFFFFFFFF    | Use                   |      |               |
     +---------------+-----------------------+------+---------------+

          Table 23: Verifiable Distributed Aggregation Function
                           Identifiers Registry

   (RFC EDITOR: Please replace "RFC XXXX" above with the RFC number
   assigned to this document.)

   VDAF identifiers are used for domain separation, as described in
   Section 6.2.3.  Domain separation guards against failures of entropy
   sources, by ensuring that invocations of different VDAFs use
   different derived values, even if they are invoked with the same
   underlying random data.

Barnes, et al.            Expires 7 April 2025                [Page 134]
Internet-Draft                    VDAF                      October 2024

   The benefits of domain separation are undermined if different VDAFs
   are used with the same VDAF Identifier.  The "Reserved for Private
   Use" code points should thus be used judiciously, because they
   provide no defense against such collisions.  Applications SHOULD
   prefer the use of registered code points.

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017,
              <https://www.rfc-editor.org/rfc/rfc8126>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018,
              <https://www.rfc-editor.org/rfc/rfc8446>.

   [TurboSHAKE]
              Viguier, B., Wong, D., Van Assche, G., Dang, Q., and J.
              Daemen, "KangarooTwelve and TurboSHAKE", Work in Progress,
              Internet-Draft, draft-irtf-cfrg-kangarootwelve-14, 9 May
              2024, <https://datatracker.ietf.org/doc/html/draft-irtf-
              cfrg-kangarootwelve-14>.

11.2.  Informative References

   [AGJOP21]  Addanki, S., Garbe, K., Jaffe, E., Ostrovsky, R., and A.
              Polychroniadou, "Prio+: Privacy Preserving Aggregate
              Statistics via Boolean Shares", 2021,
              <https://ia.cr/2021/576>.

   [BBCGGI19] Boneh, D., Boyle, E., Corrigan-Gibbs, H., Gilboa, N., and
              Y. Ishai, "Zero-Knowledge Proofs on Secret-Shared Data via
              Fully Linear PCPs", CRYPTO 2019 , 2019,
              <https://ia.cr/2019/188>.

Barnes, et al.            Expires 7 April 2025                [Page 135]
Internet-Draft                    VDAF                      October 2024

   [BBCGGI21] Boneh, D., Boyle, E., Corrigan-Gibbs, H., Gilboa, N., and
              Y. Ishai, "Lightweight Techniques for Private Heavy
              Hitters", IEEE S&P 2021 , 2021, <https://ia.cr/2021/017>.

   [BBCGGI23] Boneh, D., Boyle, E., Corrigan-Gibbs, H., Gilboa, N., and
              Y. Ishai, "Arithmetic Sketching", CRYPTO 2023 , 2023,
              <https://ia.cr/2023/1012>.

   [BGI15]    Boyle, E., Gilboa, N., and Y. Ishai, "Function Secret
              Sharing", EUROCRYPT 2015 , 2015,
              <https://www.iacr.org/archive/
              eurocrypt2015/90560300/90560300.pdf>.

   [CGB17]    Corrigan-Gibbs, H. and D. Boneh, "Prio: Private, Robust,
              and Scalable Computation of Aggregate Statistics", NSDI
              2017 , 2017,
              <https://dl.acm.org/doi/10.5555/3154630.3154652>.

   [DAP]      Geoghegan, T., Patton, C., Pitman, B., Rescorla, E., and
              C. A. Wood, "Distributed Aggregation Protocol for Privacy
              Preserving Measurement", Work in Progress, Internet-Draft,
              draft-ietf-ppm-dap-11, 21 May 2024,
              <https://datatracker.ietf.org/doc/html/draft-ietf-ppm-dap-
              11>.

   [Dou02]    Douceur, J., "The Sybil Attack", IPTPS 2002 , 2002,
              <https://doi.org/10.1007/3-540-45748-8_24>.

   [DPRS23]   Davis, H., Patton, C., Rosulek, M., and P. Schoppmann,
              "Verifiable Distributed Aggregation Functions", PETS
              2023 , 2023, <https://ia.cr/2023/130>.

   [Dwo06]    Dwork, C., "Differential Privacy", ICALP 2006 , 2006,
              <https://link.springer.com/chapter/10.1007/11787006_1>.

   [ENPA]     "Exposure Notification Privacy-preserving Analytics (ENPA)
              White Paper", 2021, <https://covid19-static.cdn-
              apple.com/applications/covid19/current/static/contact-
              tracing/pdf/ENPA_White_Paper.pdf>.

   [EPK14]    Erlingsson, Ú., Pihur, V., and A. Korolova, "RAPPOR:
              Randomized Aggregatable Privacy-Preserving Ordinal
              Response", CCS 2014 , 2014,
              <https://dl.acm.org/doi/10.1145/2660267.2660348>.

Barnes, et al.            Expires 7 April 2025                [Page 136]
Internet-Draft                    VDAF                      October 2024

   [GI14]     Gilboa, N. and Y. Ishai, "Distributed Point Functions and
              Their Applications", EUROCRYPT 2014 , 2014,
              <https://link.springer.com/
              chapter/10.1007/978-3-642-55220-5_35>.

   [GKWWY20]  Guo, C., Katz, J., Wang, X., Weng, C., and Y. Yu, "Better
              concrete security for half-gates garbling (in the multi-
              instance setting)", CRYPTO 2020 , 2020,
              <https://link.springer.com/
              chapter/10.1007/978-3-030-56880-1_28>.

   [GKWY20]   Guo, C., Katz, J., Wang, X., and Y. Yu, "Efficient and
              Secure Multiparty Computation from Fixed-Key Block
              Ciphers", S&P 2020 , 2020,
              <https://eprint.iacr.org/2019/074>.

   [GSZ20]    Goyal, V., Song, Y., and C. Zhu, "Guaranteed Output
              Delivery Comes Free in Honest Majority MPC", CRYPTO 2020 ,
              2020, <https://link.springer.com/
              chapter/10.1007/978-3-030-56880-1_22>.

   [MPDST25]  Mouris, D., Patton, C., Davis, H., Sarkar, P., and N. G.
              Tsoutsos, "Mastic: Private Weighted Heavy-Hitters and
              Attribute-Based Metrics", PETS 2025 , 2025,
              <https://eprint.iacr.org/2024/221>.

   [MPRV09]   Mironov, I., Pandey, O., Reingold, O., and S. Vadhan,
              "Computational Differential Privacy", CRYPTO 2009 , n.d.,
              <https://link.springer.com/
              chapter/10.1007/978-3-642-03356-8_8>.

   [MRH04]    Maurer, U., Renner, R., and C. Holenstein,
              "Indifferentiability, impossibility results on reductions,
              and applications to the random oracle methodology", In TCC
              2004: Theory of Cryptography, pages 21-39,
              DOI 10.1007/978-3-540-24638-1_2, February 2004,
              <https://doi.org/10.1007/978-3-540-24638-1_2>.

   [OriginTelemetry]
              "Origin Telemetry", 2020,
              <https://web.archive.org/web/20221025174046/
              https://firefox-source-
              docs.mozilla.org/toolkit/components/telemetry/collection/
              origin.html>.

   [SML24]    Satriawan, A., Mareta, R., and H. Lee, "A Complete
              Beginner Guide to the Number Theoretic Transform (NTT)",
              2024, <https://eprint.iacr.org/2024/585>.

Barnes, et al.            Expires 7 April 2025                [Page 137]
Internet-Draft                    VDAF                      October 2024

   [TestVectors]
              "Test vectors for Prio3 and Poplar1", commit hash <TODO> ,
              September 2024, <https://github.com/cfrg/draft-irtf-cfrg-
              vdaf/tree/main/test_vec/vdaf>.

Acknowledgments

   The impetus of this work is the success of recent deployments of
   predecessors of Prio3.  The Mozilla Origin Telemetry project
   [OriginTelemetry] and the Exposure Notification Private Analytics
   collaboration among the Internet Security Research Group (ISRG),
   Google, Apple, and others [ENPA] have together aggregated data from
   hundreds of millions of users.

   As the name implies, Prio3 is a descendant of the original Prio
   construction [CGB17].  A second iteration was deployed in the [ENPA]
   system, and like the VDAF described here, the ENPA system was built
   from techniques introduced in [BBCGGI19] that significantly improve
   communication cost.  That system was specialized for a particular
   aggregation function; the goal of Prio3 is to provide the same level
   of generality as the original construction.

   The security considerations in Section 9 are based largely on the
   security analysis of [DPRS23].  Thanks to Hannah Davis and Mike
   Rosulek, who lent their time to developing definitions and security
   proofs.

   Thanks to Junye Chen, Henry Corrigan-Gibbs, Armando Faz-Hernández,
   Simon Friedberger, Tim Geoghegan, Albert Liu, Brandon Pitman, Mariana
   Raykova, Michael Rosenberg, Jacob Rothstein, Shan Wang, Xiao Wang,
   Bas Westerbaan, and Christopher Wood for useful feedback on and
   contributions to the spec.

Test Vectors

      TODO Update the reference in [TestVectors] by replacing <TODO>
      with the commit hash with the final version of the test vectors.)

   The test vectors are available at [TestVectors].  The directory
   contains a set of JSON files.  Each file contains a test vector for
   an instance of Vdaf (Section 5).  A test vector covers sharding,
   preparation, aggregation, and unsharding of each of several
   measurements.  The test vector schema is defined below.

Schema

   ctx:  The application context string encoded in hexadecimal.

Barnes, et al.            Expires 7 April 2025                [Page 138]
Internet-Draft                    VDAF                      October 2024

   verify_key:  The verification key encoded in hexadecimal.

   agg_param:  The aggregation parameter of type Vdaf.AggParam.

   prep:  A list of objects with the following schema:

      measurement:  The measurement of type Vdaf.Measurement.

      nonce:  The nonce encoded in hexadecimal.

      rand:  The sharding randomness encoded in hexadecimal.

      public_share:  The expected public share encoded in hexadecimal.

      input_shares:  The expected list of input shares, each incoded in
         hexadecimal.

      prep_shares:  The expected list of prep shares generated by each
         Aggregator at each round of preparation, encoded in
         hexadecimal.

      prep_messages:  The expected list of prep messages for each round
         of preparation, encoded in hexadecimal.

      out_shares:  The expected list of output shares, encoded in
         hexadecimal.

   agg_shares:  The expected aggregate shares encoded in hexadecimal.

   agg_result:  The expected aggregate result of type Vdaf.AggResult.

   The schema also includes whatever parameters are required to
   instantiate the VDAF.  These are listed in the subsections below.

Prio3Count

   shares:  The number of shares, an integer.

Prio3Sum

   shares:  The number of shares, an integer.

   max_measurement:  The largest measurement, an integer.  Each element
      is in range range(max_measurement+1).

Barnes, et al.            Expires 7 April 2025                [Page 139]
Internet-Draft                    VDAF                      October 2024

Prio3SumVec

   shares:  The number of shares, an integer.

   length:  The lengh of the vector, an integer.

   chunk_length:  the length of each vector chunk, an integer.

   bits:  the bit length of each element of the vector, an integer. each
      element is in range(2 ** bits).

Prio3Histogram

   shares:  The number of shares, an integer.

   length:  The lengh of the vector, an integer.

   chunk_length:  the length of each vector chunk, an integer.

Prio3MultihotCountVec

   shares:  The number of shares, an integer.

   length:  The lengh of the vector, an integer.

   chunk_length:  the length of each vector chunk, an integer.

   max_weight:  The largest vector weight, an integer.  The sum of the
      elements must be in range(max_weight+1).

Poplar1

   bits:  The length of each input, an integer.

Authors' Addresses

   Richard L. Barnes
   Cisco
   Email: rlb@ipv.sx

   David Cook
   ISRG
   Email: divergentdave@gmail.com

   Christopher Patton
   Cloudflare

Barnes, et al.            Expires 7 April 2025                [Page 140]
Internet-Draft                    VDAF                      October 2024

   Email: chrispatton+ietf@gmail.com

   Phillipp Schoppmann
   Google
   Email: schoppmann@google.com

Barnes, et al.            Expires 7 April 2025                [Page 141]