Skip to main content

PSI based on ECDH
draft-wang-ppm-ecdh-psi-00

Document Type Active Internet-Draft (individual)
Authors wangyuchen , Chang , Yufei Lu , Cheng Hong , Jin Peng
Last updated 2024-11-15
Replaces draft-ecdh-psi
RFC stream (None)
Intended RFC status (None)
Formats
Stream Stream state (No stream defined)
Consensus boilerplate Unknown
RFC Editor Note (None)
IESG IESG state I-D Exists
Telechat date (None)
Responsible AD (None)
Send notices to (None)
draft-wang-ppm-ecdh-psi-00
Privacy Preserving Measurement                                   Y. Wang
Internet-Draft                                                  W. Chang
Intended status: Informational                                     Y. Lu
Expires: 19 May 2025                                             C. Hong
                                                                 J. Peng
                                                               Ant Group
                                                        15 November 2024

                           PSI based on ECDH
                       draft-wang-ppm-ecdh-psi-00

Abstract

   This document describes Elliptic Curve Diffie-Hellman Private Set
   Intersection (ECDH-PSI).  It instantiates the classical Meadows
   match-making protocol with standard elliptic curves and hash-to-curve
   methods.  In ECDH-PSI, data items are encoded to points on an
   elliptic curve, and masked by the private keys of both parties.
   After collecting the mutually masked datasets from both parties, a
   participant computes their intersection and outputs the corresponding
   original data items as result.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 19 May 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.

Wang, et al.               Expires 19 May 2025                  [Page 1]
Internet-Draft                  ECDH-PSI                   November 2024

   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Requirements Terminology  . . . . . . . . . . . . . . . .   3
     1.2.  Data Structures and Notations . . . . . . . . . . . . . .   3
   2.  Background  . . . . . . . . . . . . . . . . . . . . . . . . .   3
     2.1.  Elliptic Curves . . . . . . . . . . . . . . . . . . . . .   4
     2.2.  Hashing to Elliptic Curves  . . . . . . . . . . . . . . .   4
   3.  Protocol  . . . . . . . . . . . . . . . . . . . . . . . . . .   5
     3.1.  Overview  . . . . . . . . . . . . . . . . . . . . . . . .   7
     3.2.  Handshake . . . . . . . . . . . . . . . . . . . . . . . .   9
       3.2.1.  HandshakeRequest  . . . . . . . . . . . . . . . . . .   9
       3.2.2.  HandshakeResponse . . . . . . . . . . . . . . . . . .  13
     3.3.  Data Exchange . . . . . . . . . . . . . . . . . . . . . .  15
       3.3.1.  EcdhPsiBatch  . . . . . . . . . . . . . . . . . . . .  15
       3.3.2.  EcdhPsiBatchResponse  . . . . . . . . . . . . . . . .  17
   4.  Implementation Considerations . . . . . . . . . . . . . . . .  18
     4.1.  The Representation of EC Point  . . . . . . . . . . . . .  18
     4.2.  The Management of Index . . . . . . . . . . . . . . . . .  18
     4.3.  Support More hash_to_curve Suites . . . . . . . . . . . .  19
   5.  Security Considerations . . . . . . . . . . . . . . . . . . .  20
     5.1.  Data Truncation . . . . . . . . . . . . . . . . . . . . .  20
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  21
   7.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  21
     7.1.  Normative References  . . . . . . . . . . . . . . . . . .  21
     7.2.  Informative References  . . . . . . . . . . . . . . . . .  22
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  25

1.  Introduction

   Private Set Intersection (PSI) schemes enable the discovery of shared
   elements among different parties' datasets without revealing
   individual data.  They are widely used when there's a need to
   identify overlapping data elements between two or more parties while
   preserving the confidentiality of each party's original data.  PSI is
   one of the most frequently used privacy preserving techniques in
   business, e.g. it enables a user to detect whether his/her password
   is leaked without giving away the password to the server [MS21], or
   multiple companies to find their common customers without giving each
   other their raw data.  Furthermore, many privacy-respecting systems
   can leverage PSI schemes to collaborate with other data providers for
   the purpose of enhancing the privacy of data-exchange processes.  As

Wang, et al.               Expires 19 May 2025                  [Page 2]
Internet-Draft                  ECDH-PSI                   November 2024

   an example, a service provider who has aggregated attributes from
   individuals following [I-D.ietf-ppm-dap][draft-irtf-cfrg-vdaf-12] can
   use PSIs to compare its results with another entity without
   disclosing any attribute that does not owned by the partner.

   The classical Diffie-Hellman PSI [Meadows86] has been published for
   more than thirty years.  The academic has also proposed numerous PSI
   schemes with new functionalities and secure guarantees, such as
   [KKRT16][CHLR18][RR22], but DH-PSI is still preferable due to its
   simplicity and communication efficiency.  This document describes a
   widely deployed instance of DH-PSI denoted by Elliptic Curve Diffie-
   Hellman PSI (ECDH-PSI).  Compared with the finite field parameters
   used in the original version, elliptic curves are commonly considered
   to be more efficient and secure [IMC][LOGJAM], and are extensively
   adopted by recent standards including [RFC8031][RFC8446][RFC9497].

   In document describes 2-party ECDH-PSI as a two stage protocol,
   including a handshake phase and a data exchange phase.  In the
   handshake phase, the parties agree on the cipher suite and message
   flow to be used in data exchange phase.  Then, during the data
   exchange phase, original data elements are masked and transmitted in
   batches, allowing one or both parties to output the result.

1.1.  Requirements Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

1.2.  Data Structures and Notations

   In this document, data structures are defined and encoded according
   to the conventions laid out in Section 3 of [RFC8446].

2.  Background

   This section gives brief introductions for elliptic curves and hash-
   to-curve methods.

Wang, et al.               Expires 19 May 2025                  [Page 3]
Internet-Draft                  ECDH-PSI                   November 2024

2.1.  Elliptic Curves

   An elliptic curve E is defined by a two-variable equation over a
   finite field F with prime characteristic p larger than three.  Except
   for a special point called point at infinity, a point on E is a
   (x,y)-coordinate pair that satisfies the curve equation, where x and
   y are elements of F.  All distinct points of curve E consists an
   algebraic group of order n, where the point at infinity is the
   identity element.  Applications and standards generally use a prime
   order (sub)group <G> generated by a point G on E.  The order of <G>
   is denoted by r.

   This document uses term "addition" to refer to the group operation of
   a elliptic curve group, meaning two points P and Q can be added
   together to get a third point R.  Scalar multiplication refers to the
   operation of adding a point P with itself repeatedly.  This document
   uses scalar_mul to denote the scalar multiplication process.  It
   takes a integer x<r and a point P as inputs and outputs the
   multiplication result Q, which is written by Q=scalar_mul(P,x).

   Many cryptographic applications require the participants to generate
   elliptic curve key pairs.  Usually, a key pair on refers to a (PK,sk)
   tuple, where sk is a integer generated uniformly between 1 and r-1,
   and PK=scalar_mul(G,sk).  In the context of ECDH-PSI, only sk, namely
   the private key, is used.  Thus, this document uses the notation
   sk=keygen() to refer to the process of generating a private key on
   the elliptic curve and omits PK.

2.2.  Hashing to Elliptic Curves

   A hash_to_curve function encodes data of arbitrary length to a point
   on a elliptic curve.  The encoding process first hashes the byte
   string to one or more elements of fixed length, and then calculates a
   point on the curve with the elements using so-called map_to_curve and
   clear_cofactor functions.

   [RFC9380] describes a series of hash_to_curve methods (which are also
   called "suites") for standard curves such as NIST P-256 and
   curve25519.  It specifies two encoding types denoted by
   "encode_to_curve" and "hash_to_curve".  Both encodings can be proven
   indifferentiable from random oracles (ROs)[MRH04], but have different
   output distributions.  This document only uses "hash_to_curve"
   encodings which output uniformly random points.  It also uses the
   notation P=hash_to_curve(data) to refer to the process of encoding
   data of arbitrary length to a point P on the curve given in the
   context.

Wang, et al.               Expires 19 May 2025                  [Page 4]
Internet-Draft                  ECDH-PSI                   November 2024

   [RFC9380] requires all uses of the hash_to_curve suites must enforce
   domain separation with domain separation tags (DSTs).  In particular,
   DSTs are distinct strings that enable different applications to
   simulate distinct RO instances with the same hash function.  That is,
   each application concatenates the DST with its message as the input
   to hash function.  This document allocates each suit with a unique
   DST.  For simplicity, DST strings are omitted when writing the
   process of calculating the hash value of a string.  The notation
   tag=hash(str) means that the hash function actually takes "dst||str"
   as its input rather than str alone, where dst is the DST string
   defined for the corresponding hash_to_curve suite.  Similarly, this
   document also omits DST strings when describing hash_to_curve
   functions, assuming that the corresponding DST has been taken as a
   part of the input.

   A suite defined by [RFC9380] includes a series of parameters such as
   the curve, hash function, and encoding type.  In applications, these
   suites are very efficient to be referred to, as the included
   parameters can also be used beyond the scenario of mapping data to
   curves.  For example, the participants can negotiate a hash_to_curve
   suite, and then use the curve deduced from the suite for other
   cryptographic functions such as encryption and key establishment.

   To extend the usage of hash_to_curve suites, [RFC9380] also describes
   methods and naming conventions for defining new suites, which enables
   developers to define suites with new curves and hash functions.

3.  Protocol

   We begin with a very brief description on how ECDH-PSI works.
   Firstly, the participants, A and B, agree on a EC group <G> along
   with the hash_to_curve suites, and generate private keys over <G>.
   Then, both A and B convert their data to points over <G> and multiply
   the points with both parties' private keys.  Next, they exchange the
   sets of masked elements as sets of EC points, and compute the
   intersection over these sets.  Finally, they output the original data
   items corresponding to the intersection of masked sets.

   Assume that the parties have agreed on the suite, and generated
   private keys of sk_A and sk_B, an simplified protocol flow of ECDH-
   PSI is shown by Figure 1, and explained step-by-step as follows:

   1.  A participant maps its data items to EC points with
       hash_to_curve, and masks the points locally with its own private
       key by scalar_mul.

   2.  A and B exchange their locally masked elements.

Wang, et al.               Expires 19 May 2025                  [Page 5]
Internet-Draft                  ECDH-PSI                   November 2024

   3.  Upon receiving the masked data from its partner, a participant
       doubly masks the received points with its private key.

   4.  A participant sends the doubly-masked elements back to its
       partner.

   5.  The participant calculates intersection of the set calculated in
       Step 3 and the set received in Step 4, and finally outputs the
       corresponding original elements.

   In order to clarify the statement, this document uses "the first
   round" to refer to the data exchange process of Step 2, and "the
   second round" for the process of Step 4.

   The classical description of ECDH-PSI enables both parties to output
   PSI results and thus requires two complete rounds (i.e.  Step 2 and
   Step 4) to exchange masked data.  However, many application scenarios
   require that should only one participant output the intersection and
   the other one get nothing.  In such scenarios, Step 4 and Step 5 are
   executed unidirectionally, where only one party sends doubly-masked
   data in Step 4 and only the receiver of that message can output the
   result.

   To output PSI result, the participants should also match the original
   data items with their corresponding EC points.  In particular, the
   relationship can be established with unique indexes, where the
   original data item and the masked EC point(s) are linked to the same
   index.

Wang, et al.               Expires 19 May 2025                  [Page 6]
Internet-Draft                  ECDH-PSI                   November 2024

               A(data_A,sk_A)                  B(data_B,sk_B)
 -----------------------------------------------------------------------
 Step 1:             |                               |
                     |                               |
             p_A=scalar_mul(sk_A,            p_B=scalar_mul(sk_B,
             hash_to_curve(data_A))          hash_to_curve(data_B))
                     |                               |
 Step 2:             |                               |
                     |                               |
                     |-------------p_A-------------->|
                     |<------------p_B---------------|
                     |                               |
 Step 3:             |                               |
                     |                               |
        p_AB=scalar_mul(sk_A,p_B)     p_BA = scalar_mul(sk_B,p_A)
                     |                               |
                     |                               |
                     |                               |
 Step 4:             |                               |
                     |                               |
                     |-------------p_AB------------->|
                     |<------------p_BA--------------|
                     |                               |
 Step 5:             |                               |
                     |                               |
          set_A=intersect(p_AB,p_BA)      set_B=intersect(p_BA,p_AB)
                     |                               |
                     |                               |
          output match(set_A,data_A)      output match(set_B,data_B)

            Figure 1: A Simplified Protocol Flow of ECDH-PSI

3.1.  Overview

   The specification of ECDH-PSI consists of two phases as shown in
   Figure 2, including a handshake phase which negotiates the protocol
   parameters and a data exchange phase that performs the operations
   described by Figure 1.

   * In the handshake phase, a participant, which is denoted by
      requester, sends a HandshakeRequest message to initiate the ECDH-
      PSI protocol flow.  This message carries lists of parameters
      supported by the requester and information about the requester's
      data.  Then, the other participant, denoted by responder, selects
      parameters from the requester's lists, and sends the parameters
      with its data information in a HandshakeResponse message.

Wang, et al.               Expires 19 May 2025                  [Page 7]
Internet-Draft                  ECDH-PSI                   November 2024

   * Next, in the data exchange phase, both parties perform ECDH-PSI
      operations with parameters selected in the handshake phase, where
      EcdhPsiBatch messages are exchanged.  The receiver of an
      EcdhPsiBatch replies with a EcdhPsiBatchResponse message, which
      tells sender the handling status of last EcdhPsiBatch message.

   This specification enables masked data to be transferred by many
   batches, as the entire data set may be extremely large and requires
   numerous resources to be handled with properly.  For example, the
   size of a data set may exceed the limit of a single package of the
   underlying network protocol.  The participants can divide the data
   set to multiple batches and use multiple EcdhPsiBatch messages to
   send them so as to meet the limits of network, storage or computation
   resources.

   Besides hash_to_curve suites and data formats, the handshake phase
   also determines the interaction pattern for data exchange phase.  The
   pattern is decided by the following options:

   * The participant that outputs the result.  If only one participant
      (for example, A) has the privilege of obtaining the result, then,
      in the second round, EcdhPsiBatch messages will be only sent from
      B to A.

   * The maximum size of a single EcdhPsiBatch message.  Upon the
      determination of maximum batch size, the participants can deduce
      the number of batch messages sent or received in each round by
      dividing the size of entire data set with the maximum size.

   * The order of sending EcdhPsiBatch messages.  When both participants
      need to send batch messages in the same round, the requester will
      always send the first one.  Then, they can send EcdhPsiBatch
      messages in turn, or let the requester send all batches in
      sequence before the responder sending its data.

Wang, et al.               Expires 19 May 2025                  [Page 8]
Internet-Draft                  ECDH-PSI                   November 2024

                   A                                     B
 -----------------------------------------------------------------------
                   |                                     |
 Handshake Phase:  |                                     |
                   |----------HandshakeRequest---------->|
                   |<---------HandshakeResponse----------|
                   |                                     |
 Data Exchange     |
 Phase:            |                                     |
                   |                                     |
                   |-------------EcdhPsiBatch----------->|
                   |<--------EcdhPsiBatchResponse--------|
                   |                   .                 |
                   |                   .                 |
                   |                   .                 |
                   |-------------EcdhPsiBatch----------->|
                   |<--------EcdhPsiBatchResponse--------|
                   |                   .                 |
                   |                   .                 |
                   |                   .                 |

    Figure 2: An Overview of the Two-Phase Specification of ECDH-PSI

3.2.  Handshake

   The handshake phase includes a HandshakeRequest and a
   HandshakeResponse message.  The structures of these messages are
   documented as follows.

3.2.1.  HandshakeRequest

   Structure of HandshakeRequest:

     struct {
       uint8 version = 1;
       ProtocolProposal protocol_param;
       DataProposal data_param;
     } HandshakeRequest;

   version: The version of ECDH-PSI protocol.

   protocol_param: ECDH-PSI protocol configurations supported by the
   requester.

   data_param: The requester's data information, including its data set
   size and proposals for data exchange pattern.

Wang, et al.               Expires 19 May 2025                  [Page 9]
Internet-Draft                  ECDH-PSI                   November 2024

3.2.1.1.  Protocol Proposal

   The ProtocolProposal message describes ECDH-PSI protocol parameters
   supported by the requester.

   Structure of this message:

   enum {
     null (0),
     P256_XMD:SHA-256_SSWU_NU_ (1),
     P384_XMD:SHA-384_SSWU_NU_ (2),
     P521_XMD:SHA-512_SSWU_NU_ (3),
     curve25519_XMD:SHA-512_ELL2_NU_ (4),
     (255)
   } HashToCurveSuite

   enum { compressed (0), uncompressed  (1), (255) } PointOctetFormat

   enum {
     no_truncation (0),
     128_bit_truncation (1),
     (255)
     } TruncationOption

   struct {
     HashToCurveSuite    suites<1..255>;
     PointOctetFormat    point_octet_formats<1..255>;
     TruncationOption    truncation_options<1..255>;
   } ProtocolProposal

   suites: The list of supported HashToCurveSuite in order of the
   requester's preference.  .

   Different suites use different DST strings as required by [RFC9380].
   In particular, ECDH-PSI uses "ECDH-PSI-V<xx>-<suites>" as its DST,
   where <xx> is a two-digit number indicating the current protocol
   version as specified by the version field of HandshakeRequest, and
   <suites> is the ASCII string representation of the currently adopted
   suite as defined by [RFC9380].  As an example, when both participants
   agree to use the suite for P256 curve, the corresponding DST is
   "ECDH-PSI-V01-P256_XMD:SHA-256_SSWU_NU_".

   point_octet_formats: The list of octet string formats representing EC
   points, in the order of requester's preference.  This field enables
   the participants to weigh the tradeoffs between the costs of
   communication and computation as discussed by Section 4.1.

Wang, et al.               Expires 19 May 2025                 [Page 10]
Internet-Draft                  ECDH-PSI                   November 2024

   * For the curves defined in [FIPS186-4], the PointOctetFormat values
      refer to the corresponding compressed/uncompressed formats
      documented by [ECDSA].

   * Points on Curve25519 are always encoded as the input and output of
      X25519 function defined by Section 5 of [RFC7748] as 32 byte
      strings.  Unlike the other curves, the scalar multiplication of
      curve25519 only relies on one 32-byte coordinate without the cost
      of decompressing to the (x,y)-coordinate representation, which
      eliminates the need of making a tradeoff.

   truncation_options: Whether the requester supports truncation for
   data transferred in the second round.  The options are listed in the
   requester's preference.  The truncation could decrease the costs of
   data transmission and storage, and even accelerate the computation
   process of intersection, at the expense of a (negligible) false-
   positive rate.

   In this document, only 128-bit truncation is allowed, with a
   restriction that the total number of data items owned by both
   participants SHOULD be no more than 2^40, ensuring that the
   probability of collision is smaller than 2^-48.  See Section 5.1 for
   a detailed discussion.  The requester MUST set this field by a single
   no_truncation option when its data set size has already been equal or
   larger than 2^40.

   The truncation process utilizes Key Derivation Functions (KDFs) as
   many key-exchange protocols that use shared secrets to derive shorter
   strings which can be seen as uniformly distributed cryptographic keys
   (e.g., [RFC8446][RFC9382]).  In this document, the KDFs are
   instantiated by Hashed Message Authentication Code (HMAC) [RFC2104],
   along with HMAC-based KDF (HKDF)[RFC5869], and the hash function
   included in hash_to_curve suite.

   Following [RFC5869], a KDF function kdf() takes a input keying
   material ikm, a salt value salt, an application specific information
   info and output length len as its inputs.  Let mask_data be the
   string representation of a doubly-masked element, a participant
   SHOULD truncate mask_data with:

   truncated_mask_data = kdf(mask_data, nil, "ECDH-PSI", 128)

   In particular, the KDF takes mask_data as the input keying material.
   It does not take a salt value, uses ASCII string "ECDH-PSI" as the
   application specification information, and outputs a 128-bit string
   as the truncation result.

Wang, et al.               Expires 19 May 2025                 [Page 11]
Internet-Draft                  ECDH-PSI                   November 2024

3.2.1.2.  Data Proposal

   The data_param field describes requester's data size and provides
   supported options on how to exchange EcdhPsiBatch messages.

   Structure of this message:

   enum { continuous (0), interactive (1), (255) } BatchMode

   enum { both (0), requester (1), responder (2), (255) } OutputMode

   struct{
     BatchMode  supported_batch_modes<1..255>;
     OutputMode supported_output_modes<1..255>;
     uint64     max_batch_size;
     uint64     item_num;
   } DataProposal

   supported_batch_modes: The list of interaction patterns supported by
   the requester in the order of preference.  BatchMode describes the
   message interaction pattern when both parties need to send their data
   sets with multiple EcdhPsiBatch messages in the same round.

   *  continuous: A participant sends its entire data set with
      continuous EcdhPsiBatch messages.

   *  interactive: The participants send EcdhPsiBatch messages in turn.
      If one party has sent all its batch messages before its partner,
      the partner SHOULD send its messages continuously.

   When both parties need to send data batches in the same round, the
   requester SHOULD always send the first EcdhPsiBatch message.

   supported_output_modes: The list of output modes supported by the
   requester in the order of its preference, where OutputMode describes
   which party has the privilege of obtaining the result.

   The meaning of OutputMode is explained as follows:

   *  both: Both parties output the intersection result, which means
      EcdhPsiBatch are sent mutually in the second round.

   *  requester: Only the requester outputs PSI result, which means only
      the responder sends message batches in the second round.

   *  responder: Only the responder outputs PSI result, which means only
      the requester sends message batches in the second round.

Wang, et al.               Expires 19 May 2025                 [Page 12]
Internet-Draft                  ECDH-PSI                   November 2024

   max_batch_size: The maximum byte size of a single EcdhPsiBatch
   message allowed by the requester.  That is, the requester will not
   send a batch message beyond this limit, and it cannot receive a batch
   message having larger size.  The requester SHOULD decide the value of
   this field comprehensively according to its hardware/software
   configuration and environment.

   item_num: The total number of requester's data items.

3.2.2.  HandshakeResponse

   The responder sends HandshakeResponse to reply to a HandshakeRequest
   message.  It includes selected parameters and the responder's data
   information.

   Structure of this message:

   enum {
     success(0),
     generic_error         (0x01),
     unsupported_version   (0x02)
     invalid_request       (0x03),
     out_of_resource       (0x04),
     unsupported_parameter (0x05),
     (0xFF)
   } HandshakeStatus

   struct {
     HandshakeStatus status;
     ProtocolResult protocol_param;
     DataResult data_param;
   } HandshakeResponse;

   status: Indicating the status of handshake.  The meanings of error
   statuses are listed as follows:

   *  generic_error: The error cannot be categorized to the rest error
      statuses defined by HandshakeStatus.

   *  unsupported_version: The responder does not support the ECDH-PSI
      protocol version given by version.

   *  invalid_request: The responder cannot parse the request correctly.
      The message may be in wrong format and cannot be parsed as a
      normal HandshakeRequest message.

Wang, et al.               Expires 19 May 2025                 [Page 13]
Internet-Draft                  ECDH-PSI                   November 2024

   *  out_of_resource: The responder does not have enough resource to
      handle the ECDH-PSI process following to the options provided by
      the requester.

   *  unsupported_parameter: The responder rejects all options proposed
      by one of the suggested option lists included in HandshakeRequest.

   The responder MUST ignore all options or parameters that it cannot
   recognize when parsing the HandshakeRequest message.  If one of the
   suggested option lists is filled with unrecognized parameters, it
   SHOULD reply with a HandshakeResponse carrying unsupported_parameter.

   Upon receiving a HandshakeResponse message without success status,
   the requester MUST ignore the other fields included in this message.

   protocol_param: The protocol parameter selected by the responder,
   including the hash_to_curve suite, EC point string format and
   truncation option.

   data_param: This message includes the data exchange pattern selected
   by the responder and the size of responder's dataset.

3.2.2.1.  Protocol Result

   The structure of ProtocolResult is defined as follows:

   struct {
     HashToCurveSuite    suite;
     PointOctetFormat    point_octet_format;
     TruncationOption    truncation_option;
   } ProtocolResult;

   suite: The hash_to_curve suite selected by the responder.

   point_octet_format: The format of EC point octet string chosen by the
   responder.

   truncation_option: The truncation option selected by the responder.
   When deciding the field, the responder SHOULD consider the sum of
   both participants' dataset sizes.  If the sum is larger than 2^40,
   truncation MUST no be used, which means the responder MUST set this
   field by no_truncation.

3.2.2.2.  Data Result

   The structure of DataInfoResult is defined as follows:

Wang, et al.               Expires 19 May 2025                 [Page 14]
Internet-Draft                  ECDH-PSI                   November 2024

   struct  {
     BatchMode   batch_mode;
     OutputMode  output_mode;
     uint64      max_batch_size;
     uint64      item_num;
   } DataInfoResult;

   batch_mode: The BatchMode selected by responder.

   output_mode: The OutputMode selected by responder.

   max_batch_size: The maximum size of a EcdhPsiBatch message, which
   SHOULD NOT be larger than the max_batch_size given by the requester.

   item_num: The size of responder's dataset.

3.3.  Data Exchange

   The data exchange phase consists of two rounds.  In each round, a
   participant may send or receive multiple EcdhPsiBatch messages.  Upon
   receiving a EcdhPsiBatch and handling the data properly, the
   participant SHOULD respond with a EcdhPsiBatchResponse message,
   indicating its partner that it is ready to receive the next
   EcdhPsiBatch message.

3.3.1.  EcdhPsiBatch

   The structure of EcdhPsiBatch is defined as follows:

   enum { transmit(0), retransmit(1), fatal_error(2), (255)} BatchStatus

   struct {
     uint64 index;
     opaque octet;
   } ECPointOctet;

   struct {
     BatchStatus   status;
     uint32        batch_type;
     uint64        batch_index;
     uint64        batch_count;
     uint32        is_last_batch;
     ECPointOctet  data<1..2^64-1>;
   } EcdhPsiBatch;

   status: Status of current batch message.  Values of this field are
   explained as follows:

Wang, et al.               Expires 19 May 2025                 [Page 15]
Internet-Draft                  ECDH-PSI                   November 2024

   *  transmit: It is a normal batch message which has not been
      transmitted before.

   *  retransmit: This message has been transmitted, but it is re-
      transmitted due to the receiver's request.  A batch message may be
      retransmitted several times before it is properly handled by the
      receiver.  For example, the receiver may need a relative long time
      to prepare the storage resource, and require the sender to re-send
      the batch until it has enough space.

   *  fatal_error: The sender of this batch message cannot continue the
      ECDH-PSI protocol procedure due to some fatal errors.  The
      receiver MUST ignore the other fields included in the current
      EcdhPsiBatch message and terminate the PSI process.

      Such fatal error may occur in the handshake phase or data exchange
      phase.  For example, if the responder sends a HandshakeRespond
      message with parameters unsupported by the requester, the
      requester should construct and send a EcdhPsiBatch message
      carrying fatal_error status so as to terminate the process.

   batch_type: This field indicates the mask status of current batch.
   That is, value "1" stands for that the data included in the current
   batch are only masked by the owner's private key, and "2" means the
   data has been doubly-masked with both participants' private keys.

   batch_index: The index of current EcdhPsiBatch message, which is an
   incremental counter starting with "1".  The participant SHOULD use
   independent counters for different rounds and protocol runs.  For
   example, if participant A needs to send dataset p_A in the first
   round by 10 batches, and sends p_AB in the second round by 5 batches,
   it should use batch_index from 1 to 10 for the first round, and 1 to
   5 for the second.

   batch_count: Number of EC points included in the current EcdhPsiBatch
   message.

   is_last_batch: Indicating whether this batch is the last one, where
   "1" means the batch is the last one from the sender in current round,
   and "0" for there exists subsequent batches.

   data: This field carries multiple EC points with ECPointOctet, which
   number is specified by the batch_count field.  Each ECPointOctet
   structure includes an unique index allocated by the owner of the
   corresponding original data, and the encoded octet string.

Wang, et al.               Expires 19 May 2025                 [Page 16]
Internet-Draft                  ECDH-PSI                   November 2024

   The purpose of index is to associate the original data item with the
   (doubly)-masked EC points, such that the participant can match its
   dataset with the intersection of masked datasets.  The values of
   index are generated by the owner of data so as to identify each data
   item uniquely.  When masking the data from its partner, a participant
   MUST reserve the index for every record.  Section 4.2 gives more
   discussions on the implementation and maintenance of such indexes.

   The content of each octet string is determined by the selected
   hash_to_curve suite, octet format and truncation option.  In the
   first round, the receiver SHOULD decode the octet strings with the
   configurations negotiated in the handshake phase in order to mask the
   data provided by the sender.  However, in the second round where
   masking is not required, the receiver SHOULD treat the contents as
   octet strings rather than decode them to EC points, as such strings
   are enough for computing the intersection.

3.3.2.  EcdhPsiBatchResponse

   The structure of EcdhPsiBatchResponse is defined as follows:

   enum {
     success (0),
     retransmit (1),
     fatal_error (2),
     (255)
   } BatchResponseStatus;

   struct {
     BatchResponseStatus status;
     uint64 batch_index;
   } EcdhPsiBatchResponse;

   status: Indicating whether the EcdhPsiBatch message identified by
   batch_index has been handled properly.  Values of this field are
   explained as follows:

   *  success: The EcdhPsiBatch message has been handled in a proper
      way.  Upon receiving a response with this status, the sender of
      EcdhPsiBatch can send the subsequence batch or prepare to receive
      the EcdhPsiBatch message from its partner.

   *  retransmit: Upon receiving retransmit, the participant MUST re-
      send the EcdhPsiBatch message identified by batch_index.

   *  fatal_error: The ECDH-PSI process meets some fatal error such that
      both the participants MUST terminate the current session, and
      discard all data received.

Wang, et al.               Expires 19 May 2025                 [Page 17]
Internet-Draft                  ECDH-PSI                   November 2024

4.  Implementation Considerations

4.1.  The Representation of EC Point

   When deciding the point encoding format for the elliptic curves
   expect curve25519, the participants should consider the aspects of
   bandwidth, storage and computation comprehensively.  Compressed point
   format could decrease the costs of transmission and storage, but it
   also needs more computation resources to "decompress" the point.  The
   "decompress" process may be as costly as scalar multiplication
   [CZZDL24].  Uncompressed format requires more storage spaces and
   needs more time to transmit, but the participant can perform scalar
   multiplications without extra effort to recover the points.

   For example, when both parties are deployed in the same data center
   and linked with a high-bandwidth local area network (LAN), they can
   choose to use the uncompressed format to achieve better performance.

   As another example, the parties may be deployed in geographically
   separated data centers connected with low bandwidth wide area network
   (WAN), and equipped with high-end computation acceleration devices
   such as Graphics Processing Units (GPUs).  In this case, the
   computation resource may not be the bottleneck, and the participants
   can choose the compress format to minimize the costs of data
   transmission.

4.2.  The Management of Index

   A RECOMMENDED method of maintaining indexes is storing the records
   with a database table and use the row numbers or prime keys as
   indexes.  The intersection result can be matched with original data
   items by a simple join operation.  The participant could also design
   a different indexing mechanism on its own, as long as guaranteeing
   its uniqueness.

   This document uses explicit indexes to identify data items rather
   than treats the order of records as "implicit" indexes.  Compare with
   the implicit counterpart, explicit indexes can be efficiently created
   and maintained by modern databases, and do not require the
   participants to implement extra logic to preserve the order of
   records.

Wang, et al.               Expires 19 May 2025                 [Page 18]
Internet-Draft                  ECDH-PSI                   November 2024

   When masking the EC points from its partner, a participant MUST send
   the doubly-masked data with correct indexes.  To achieve this, it
   keeps the indexes of each data items received in the first round,
   masks the records with its private key, and associates the masked
   records with the same indexes.  These operations can also be
   implemented easily with database operations by viewing the received
   records and masked records as columns in the same table, which
   enables them to be linked naturally via the primary key.

4.3.  Support More hash_to_curve Suites

   Participants can negotiate hash_to_curve suites besides the ones
   listed by Section 3.2.1.1.  They can use other suites defined by
   [RFC9380], or use a self-defined suite following the syntax and
   convention given by Section 8.9 and 8.10 of [RFC9380].

   As an example, this document next defines a hash_to_curve suite for
   the ShangMi(SM) curve and cryptography algorithms.  The SM algorithms
   are becoming mandatory in China, and have been accepted by
   international standards such as [ISO-SM2], [ISO-SM3] and [RFC8998].

   The new suite, denoted by curveSM2_XMD:SM3_SSWU_RO, encodes data to
   points on the so-called curveSM2[GBT.32918.5-2017] with SM3 hash
   algorithm[GBT.32905-2016], and reuses the expand_message_xmd and
   Simplified Shallue-van de Woestijne-Ulas (SSWU) methods for message
   expansion and mapping.

   In particular, curveSM2_XMD:SM3_SSWU_RO_ is defined as follows:

   *  encoding type: hash_to_curve

   *  E: y^2 = x^3 + A * x + B, where

      -  A = -3

      -  B = 0x28e9fa9e9d9f5e344d5a9e4bcf6509a7f39789f515ab8f92ddbcbd414
         d940e93

   *  p: 2^256-2^224-2^96+2^64-1

   *  m: 1

   *  k: 128

   *  expand_message: expand_message_xmd (Section 5.3.1 of [RFC9380])

   *  H: SM3

Wang, et al.               Expires 19 May 2025                 [Page 19]
Internet-Draft                  ECDH-PSI                   November 2024

   *  L: 48

   *  f: SSWU method (Section 6.6.2 of of [RFC9380])

   *  Z: -9

   *  h_eff: 1

   The value Z is calculated by the sage script given by Appendix H.2 of
   [RFC9380] with the p, A and B parameters of curveSM2.

   The participants SHOULD agree on the meaning of HashToCurveSuite
   values.  In this document, suite curveSM2_XMD:SM3_SSWU_RO_ is
   identified as follows:

   enum {
     null                              (0),
     ... // (1) to (4) follow section 3.2.1.1
     curveSM2_XMD:SM3_SSWU_RO_         (5),
     ... // other hash_to_curve suites
     (255)
   } HashToCurveSuite;

5.  Security Considerations

5.1.  Data Truncation

   This section provides a detailed discussion on the truncation
   mechanism presented in this document.

   The truncation, undoubtedly, will raise the probability of message
   collision.  That is, two different data item may be mapped to the
   same string after the procedures of masking and truncation by
   coincidence.  The collision may happen across the datasets owned by
   different participants, which causes a false-positive case that two
   different records are matched by PSI, or happen in the same dataset.
   The later situation may also lead to a false-positive case.  To be
   more specific, consider a participant A has two different data items
   data_X and data_Y which are mapped to the same record mask_data_XY.
   Its partner, say B, also has record data_X which is mapped to
   mask_data_XY.  Finally, A outputs both data_X and data_Y as the
   result of PSI, as it cannot distinguish which one matches the record
   of mask_data_XY.

Wang, et al.               Expires 19 May 2025                 [Page 20]
Internet-Draft                  ECDH-PSI                   November 2024

   To avoid such false-positive case, we have to consider the collision
   probability with respect to the sum number of data items owned by A
   and B.  Such probability can be computed with the well-known birthday
   paradox bound.  Let n be the number of sampling and d be the output
   space, the probability of collision can be approximately computed
   with:

   p(n,d)=1-e^(-n(n-1)/2d)

   This document uses a truncated length of 128 bit, which means
   d=2^128.  For different n, which is the sum size of records, the
   probability are calculated as follows:

   *  If n=2^50, p(2^50, 2^128) = 2^-29

   *  If n=2^45, p(2^45, 2^128) = 2^-33

   *  If n=2^41, p(2^41, 2^128) = 2^-47

   *  If n=2^40, p(2^40, 2^128) = 2^-49

   *  If n=2^39, p(2^39, 2^128) = 2^-51

   That is, if the number of records is less than 2^40, the probability
   of false-positive will be smaller than 2^-48.  The participant can
   also decide the truncation option by calculating the collision
   probability, and only use truncation when they both agree that the
   probability is acceptable.

6.  IANA Considerations

   This document has no IANA actions.

7.  References

7.1.  Normative References

   [ECDSA]    American National Standards Institute, "Public Key
              Cryptography for the Financial Services Industry: The
              Elliptic Curve Digital Signature Algorithm (ECDSA)",
              ANSI ANS X9.62-2005, November 2005.

   [FIPS186-4]
              National Institute of Standards and Technology (NIST),
              "Digital Signature Standard (DSS)", FIPS 186-4,
              DOI 10.6028/NIST.FIPS.186-4, July 2013,
              <https://nvlpubs.nist.gov/nistpubs/FIPS/
              NIST.FIPS.186-4.pdf>.

Wang, et al.               Expires 19 May 2025                 [Page 21]
Internet-Draft                  ECDH-PSI                   November 2024

   [GBT.32905-2016]
              Standardization Administration of China, "Information
              security technology --- SM3 cryptographic hash algorithm",
              GB/T 32905-2016, March 2017, <http://www.gmbz.org.cn/
              upload/2018-07-24/1532401392982079739.pdf>.

   [GBT.32918.5-2017]
              Standardization Administration of the People's Republic of
              China, "Information security technology --- Public key
              cryptographic algorithm SM2 based on elliptic curves ---
              Part 5: Parameter definition", GB/T 32918.5-2017, December
              2017, <http://www.gmbz.org.cn/
              upload/2018-07-24/1532401863206085511.pdf>.

   [RFC2104]  Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
              Hashing for Message Authentication", RFC 2104,
              DOI 10.17487/RFC2104, February 1997,
              <https://www.rfc-editor.org/info/rfc2104>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC5869]  Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand
              Key Derivation Function (HKDF)", RFC 5869,
              DOI 10.17487/RFC5869, May 2010,
              <https://www.rfc-editor.org/info/rfc5869>.

   [RFC7748]  Langley, A., Hamburg, M., and S. Turner, "Elliptic Curves
              for Security", RFC 7748, DOI 10.17487/RFC7748, January
              2016, <https://www.rfc-editor.org/info/rfc7748>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC9380]  Faz-Hernandez, A., Scott, S., Sullivan, N., Wahby, R. S.,
              and C. A. Wood, "Hashing to Elliptic Curves", August 2023.

7.2.  Informative References

   [CHLR18]   Chen, H., Huang, Z., Laine, K., and P. Rindal, "Labeled
              PSI from Fully Homomorphic Encryption with Malicious
              Security", Proceedings of the 2018 {ACM} {SIGSAC}
              Conference on Computer and Communications Security, {CCS}
              2018, Toronto, ON, Canada, October 15-19, 2018, October
              2018, <https://doi.org/10.1145/3243734.3243836>.

Wang, et al.               Expires 19 May 2025                 [Page 22]
Internet-Draft                  ECDH-PSI                   November 2024

   [CZZDL24]  Chen, Y., Zhang, M., Zhang, C., Dong, M., and W. Liu,
              "Private Set Operations from Multi-query Reverse Private
              Membership Test", Public-Key Cryptography - {PKC} 2024 -
              27th {IACR} International Conference on Practice and
              Theory of Public-Key Cryptography, 2024, Proceedings, Part
              {III}, April 2024,
              <https://doi.org/10.1007/978-3-031-57725-3_13>.

   [draft-irtf-cfrg-vdaf-12]
              Barnes, R., Cook, D., Patton, C., and P. Schoppmann,
              "Verifiable Distributed Aggregation Functions", Work in
              Progress, Internet-Draft, draft-irtf-cfrg-vdaf-12, 4
              October 2024, <https://datatracker.ietf.org/doc/html/
              draft-irtf-cfrg-vdaf-12>.

   [I-D.ietf-ppm-dap]
              Geoghegan, T., Patton, C., Pitman, B., Rescorla, E., and
              C. A. Wood, "Distributed Aggregation Protocol for Privacy
              Preserving Measurement", Work in Progress, Internet-Draft,
              draft-ietf-ppm-dap-12, 10 October 2024,
              <https://datatracker.ietf.org/doc/html/draft-ietf-ppm-dap-
              12>.

   [IMC]      Katz, J. and Y. Lindell, "Introduction to Modern
              Cryptography, Third Edition".

   [ISO-SM2]  International Organization for Standardization, "IT
              Security techniques -- Digital signatures with appendix --
              Part 3: Discrete logarithm based mechanisms", ISO/
              IEC 14888-3:2018, November 2018,
              <https://www.iso.org/standard/76382.html>.

   [ISO-SM3]  International Organization for Standardization, "IT
              Security techniques -- Hash-functions -- Part 3: Dedicated
              hash-functions", ISO/IEC 10118-3:2018, October 2018,
              <https://www.iso.org/standard/67116.html>.

   [KKRT16]   Kolesnikov, V., Kumaresan, R., Rosulek, M., and N. Trieu,
              "Efficient Batched Oblivious {PRF} with Applications to
              Private Set Intersection", Proceedings of the 2016 {ACM}
              {SIGSAC} Conference on Computer and Communications
              Security, Vienna, Austria, October 24-28, 2016, October
              2016, <https://doi.org/10.1145/2976749.2978381>.

   [LOGJAM]   Adrian, D., Bhargavan, K., Durumeric, Z., Gaudry, P.,
              Green, M., Halderman, J. A., Heninger, N., Springall, D.,
              Thomé, E., Valenta, L., VanderSloot, B., Wustrow, E.,
              Zanella-BĂ©guelin, S., and P. Zimmermann, "Imperfect

Wang, et al.               Expires 19 May 2025                 [Page 23]
Internet-Draft                  ECDH-PSI                   November 2024

              Forward Secrecy: How Diffie-Hellman Fails in Practice",
              Proceedings of the 22nd {ACM} {SIGSAC} Conference on
              Computer and Communications Security, Denver, CO, USA,
              October 12-16, 2015, October 2015,
              <https://doi.org/10.1145/2810103.2813707>.

   [Meadows86]
              Meadows, C., "A More Efficient Cryptographic Matchmaking
              Protocol for Use in the Absence of a Continuously
              Available Third Party", 1986 IEEE Symposium on Security
              and Privacy, <https://doi.org/10.1109/SP.1986.10022>.

   [MRH04]    Maurer, U., Renner, R., and C. Holenstein,
              "Indifferentiability, Impossibility Results on Reductions,
              and Applications to the Random Oracle Methodology", In TCC
              2004: Theory of Cryptography, pages 21-39,
              DOI 10.1007/978-3-540-24638-1_2, February 2004,
              <https://doi.org/10.1007/978-3-540-24638-1_2>.

   [MS21]     Lauter, K., Kannepalli, S., Laine, K., and R. C. Moreno,
              "Password Monitor: Safeguarding passwords in Microsoft
              Edge", <https://www.microsoft.com/en-us/research/blog/
              password-monitor-safeguarding-passwords-in-microsoft-
              edge/>.

   [RFC8031]  Nir, Y. and S. Josefsson, "Curve25519 and Curve448 for the
              Internet Key Exchange Protocol Version 2 (IKEv2) Key
              Agreement", RFC 8031, DOI 10.17487/RFC8031, December 2016,
              <https://www.rfc-editor.org/info/rfc8031>.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018,
              <https://www.rfc-editor.org/info/rfc8446>.

   [RFC8998]  Yang, P., "ShangMi (SM) Cipher Suites for TLS 1.3",
              RFC 8998, DOI 10.17487/RFC8998, March 2021,
              <https://www.rfc-editor.org/info/rfc8998>.

   [RFC9382]  Ladd, W., "SPAKE2, a Password-Authenticated Key Exchange",
              RFC 9382, DOI 10.17487/RFC9382, September 2023,
              <https://www.rfc-editor.org/info/rfc9382>.

   [RFC9497]  Davidson, A., Faz-Hernandez, A., Sullivan, N., and C. A.
              Wood, "Oblivious Pseudorandom Functions (OPRFs) Using
              Prime-Order Groups", RFC 9497, DOI 10.17487/RFC9497,
              December 2023, <https://www.rfc-editor.org/info/rfc9497>.

Wang, et al.               Expires 19 May 2025                 [Page 24]
Internet-Draft                  ECDH-PSI                   November 2024

   [RR22]     Raghuraman, S. and P. Rindal, "Blazing Fast PSI from
              Improved OKVS and Subfield VOLE", Proceedings of the 2022
              {ACM} {SIGSAC} Conference on Computer and Communications
              Security, {CCS} 2022, Los Angeles, CA, USA, November 7-11,
              2022, November 2022,
              <https://doi.org/10.1145/3548606.3560658>.

Authors' Addresses

   Yuchen Wang
   Ant Group
   Email: tianwu.wyc@antgroup.com

   Wenting Chang
   Ant Group
   Postfach 330440
   Beijing
   Phone: +49-421-218-63921
   Email: bainuan.cwt@antgroup.com

   Yufei Lu
   Ant Group
   Email: yuwen.lyf@antgroup.com

   Cheng Hong
   Ant Group
   Email: vince.hc@alibaba-inc.com

   Jin Peng
   Ant Group
   Email: jim.pj@antgroup.com

Wang, et al.               Expires 19 May 2025                 [Page 25]