DLT Gateway Crash Recovery Mechanism
draft-belchior-gateway-recovery-02

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft whose latest revision state is "Replaced".
Authors Rafael Belchior , Miguel Correia , Thomas Hardjono
Last updated 2021-05-25
Replaced by draft-belchior-satp-gateway-recovery
Internet Engineering Task Force                              R. Belchior
Internet-Draft                                                M. Correia
Intended status: Informational      INESC-ID, Instituto Superior Tecnico
Expires: November 26, 2021                                   T. Hardjono
                                                                     MIT
                                                            May 25, 2021

                  DLT Gateway Crash Recovery Mechanism
                   draft-belchior-gateway-recovery-02

Abstract

   This memo describes the crash recovery mechanism for the Open Digital
   Asset Protocol (ODAP), called ODAP-2PC.  The goal is to ensure that
   gateways running ODAP can recover from crashes and thus preserve the
   consistency of an asset across ledgers (i.e., that double spending
   does not occur).  This draft describes the messaging and logging
   flows necessary for the correct functioning of ODAP-2PC.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 26, 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect

Belchior, et al.        Expires November 26, 2021               [Page 1]
Internet-Draft           Gateway Crash Recovery                 May 2021

   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Logging Model
     3.1.  Example
     3.2.  Log Storage Types
     3.3.  Log Storage API
       3.3.1.  Response Codes
   4.  Format of log entries
   5.  ODAP-2PC
     5.1.  Crash Recovery Model
     5.2.  Recovery Procedure
       5.2.1.  Transfer Initiation Flow
       5.2.2.  Lock-Evidence Flow
       5.2.3.  Commitment Establishment Flow
     5.3.  ODAP-2PC Messages
       5.3.1.  RECOVER
       5.3.2.  RECOVER-UPDATE
       5.3.3.  RECOVER-UPDATE ACK
       5.3.4.  RECOVER-SUCCESS
       5.3.5.  ROLLBACK
     5.4.  Examples
       5.4.1.  Crashing before issuing a command to the counterparty
               gateway
       5.4.2.  Crashing after issuing a command to the counterparty
               gateway
   6.  Security Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Authors' Addresses

1.  Introduction

   Gateway systems that perform virtual asset transfers among DLTs must
   possess a degree of resiliency and fault tolerance in the face of
   possible crashes.  Accounting for the possibility of crashes is
   particularly important to guarantee asset consistency across DLTs.

   ODAP-2PC [BVCH21] uses 2PC, an atomic commitment protocol (ACP). 2PC
   considers two roles: a Coordinator that manages the execution of the
   protocol and Participants that manage the resources that must be kept
   consistent.  The source gateway plays the ACP role of Coordinator,
   and the recipient gateway plays the Participant role in relay mode.
   Gateways exchange messages corresponding to the protocol execution,
   generating log entries for each one.

   Log entries are organized into logs.  Logs enable either the same or
   other backup gateways to resume any phase of ODAP.  This log can also
   serve as an accountability tool in case of disputes.  Another key
   component is an atomic commit protocol (ACP) that guarantees that the
   source and target DLTs are modified consistently (atomicity) and
   permanently (durability), e.g., that assets that are taken from the
   source DLT are persisted into the recipient DLT.

   Log entries are thus the basis for satisfying one of the key
   deployment requirements of gateways for asset transfers: a high
   degree of availability.  In this document, we consider two common
   strategies to increase availability: (1) to support the recovery of
   the gateways and (2) to employ backup gateways with the ability to
   resume a stalled transfer.

   This memo proposes: (i) the logging model of ODAP-2PC; (ii) the log
   storage types; (iii) the log storage API; (iv) the log entry format;
   and (v) the recovery and rollback procedures.

2.  Terminology

   The following terminology is used in this document:

   o  Gateway: The nodes of a DLT system that are functionally capable
      of handling an asset transfer with another DLT.  Gateway nodes
      implement the gateway-to-gateway asset transfer protocol.

   o  Primary Gateway: The node of a DLT system that has been selected
      or elected to act as a gateway in an asset transfer.

   o  Backup Gateway: The node of a DLT system that has been selected or
      elected to act as a backup gateway to a primary gateway.

   o  Message Flow Parameters: The parameters and payload employed in a
      message flow between a sending gateway and receiving gateway.

   o  Source Gateway (or G1): The gateway that initiates the transfer
      protocol.  Acts as a coordinator of the ACP and mediates the
      message flow.

   o  Recipient Gateway (or G2): The gateway that is the target of an
      asset transfer.  It follows instructions from the source gateway.

   o  Source DLT: The DLT of the source gateway.

   o  Target DLT: The DLT of the recipient gateway.

   o  Log: A set of log entries ordered by their time of creation.

   o  Public (or Shared) Log: A log that several nodes can read from and
      write to.

   o  Private Log: A log that only one node can read from and write to.

   o  Log data: The log information retained by a gateway regarding a
      message exchanged within an asset transfer protocol.

   o  Log entry: The log information generated and persisted by a
      gateway regarding one specific message flow step.

   o  Log format: The format of log data generated by a gateway.

   o  Atomic commit protocol (ACP): A protocol that guarantees that
      assets that are taken from a DLT are persisted into the other DLT.
      Examples are the two-phase and three-phase commit protocols (2PC
      and 3PC, respectively) and non-blocking atomic commit protocols.

   o  Fault: A fault is an event that alters the expected behavior of a
      system.

   o  Crash-fault tolerant models: models allowing a system to keep
      operating correctly despite having a set of faulty components.

   o  Digital asset: a form of digital medium recordation that is used
      as a digital representation of a tangible or intangible asset.

3.  Logging Model

   Gateways store logs to map state.  There are two types of logs: a
   private log that stores the current state; and a shared log that
   stores the joint state between two gateways.  Using a shared,
   decentralized log can alleviate trust assumptions between gateways
   by providing an agreed-upon source of truth.

   We consider the log file to be a stack of log entries.  Each time a
   log entry is added, it goes to the top of the stack (the highest
   index).

   To manipulate the log, we define a set of log primitives, that
   translate log entry requests from a process into log entries,
   realized by the log storage API (for the context of ODAP, see
   Section 3.3):

      writeLogEntry(e,L) (WRITE) - appends a log entry e in the log L
      (held by the corresponding Log Storage Support).

      getLogEntry(i,L) (READ) - retrieves a log entry with index i from
      log L.

   From these primitives, other functions can be built:

      getLogLength(L) (READ) - obtains the number of log entries in log
      L.

      getLogDiff(l1,l2) (READ) - obtains the difference between two
      logs.

      getLastEntry(L) (READ) - obtains the last log entry from log L.

      getLog(L) (READ) - retrieves the whole log L.

      updateLog(l1,l2) (WRITE) - updates l1 based on l2 (uses getLogDiff
      and writeLogEntry).
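
   The primitives above can be sketched over an in-memory list as
   follows.  This is a minimal illustration, not part of the protocol:
   the Python names are our own, a log is modeled as a list whose last
   element is the top of the stack, and getLogDiff is assumed to return
   the entries of l2 that l1 lacks when l1 is a prefix of l2 (this memo
   does not define the difference more precisely).

```python
from typing import Any, Dict, List

# Hypothetical in-memory model: a log is a list, newest entry last.
LogEntry = Dict[str, Any]
Log = List[LogEntry]


def write_log_entry(entry: LogEntry, log: Log) -> int:
    """WRITE: append the entry at the top of the log, return its index."""
    log.append(entry)
    return len(log) - 1


def get_log_entry(index: int, log: Log) -> LogEntry:
    """READ: return the entry stored at the given index."""
    return log[index]


def get_log_length(log: Log) -> int:
    """READ: number of entries currently in the log."""
    return len(log)


def get_last_entry(log: Log) -> LogEntry:
    """READ: built from getLogLength and getLogEntry."""
    return get_log_entry(get_log_length(log) - 1, log)


def get_log_diff(l1: Log, l2: Log) -> Log:
    """READ: entries of l2 missing from l1 (assumes l1 is a prefix of l2)."""
    return l2[get_log_length(l1):]


def update_log(l1: Log, l2: Log) -> None:
    """WRITE: bring l1 up to date with l2 via getLogDiff and writeLogEntry."""
    for entry in get_log_diff(l1, l2):
        write_log_entry(entry, l1)
```

   Under these assumptions, a gateway holding only the first entry of a
   two-entry shared log obtains the second entry from get_log_diff and
   ends up with an identical copy after update_log.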

   The example in Section 3.1 (Figure 1) shows a simplified log for the
   transfer initiation flow of ODAP.  Each log entry (simplified; the
   full definition is in Section 4) is composed of metadata (phase,
   sequence number) and one attribute from the payload (operation).
   Operations map behavior to state (see Section 4).

   The tables in Section 3.3 (Figures 2 and 3) illustrate the log
   storage API.  The Function column describes the primitive supported
   by the log storage API.  The Parameters column specifies the
   parameters given to the endpoint as query parameters.  The Endpoint
   column specifies the endpoint to which a log primitive maps.  The
   Returns column specifies what the contents of "response_data" mean;
   the Response Example column illustrates this field.

3.1.  Example

     ,--.                     ,--.                                 ,-------.
     |G1|                     |G2|                                 |Log API|
     `--'                     `--'                                 `-------'
      |             [1]: writeLogEntry <1,1,init-validate>             |
      | --------------------------------------------------------------->
      |                        |                                       |
      | initiate ODAP's phase 1|                                       |
      | ----------------------->                                       |
      |                        |                                       |
      |                        | [2]: writeLogEntry <1,2,exec-validate>|
      |                        | -------------------------------------->
      |                        |                                       |
      |                        |----.                                  |
      |                        |    | execute validate from p1         |
      |                        |<---'                                  |
      |                        |                                       |
      |                        | [3]: writeLogEntry <1,3,done-validate>|
      |                        | -------------------------------------->
      |                        |                                       |
      |                        | [4]: writeLogEntry <1,4,ack-validate> |
      |                        | -------------------------------------->
      |                        |                                       |
      |   validation complete  |                                       |
      | <-----------------------                                       |
     ,--.                     ,--.                                 ,-------.
     |G1|                     |G2|                                 |Log API|
     `--'                     `--'                                 `-------'

                                 Figure 1

   Figure 1 shows the sequence of logging operations over part of the
   first phase of ODAP (simplified):

      At step 1, G1 writes an init-validate operation, meaning it will
      require G2 to initiate the validate function: This generates a log
      entry (p1, 1, init-validate).

      At step 2, G2 writes an exec-validate operation, meaning it will
      try to execute the validate function: This generates a log entry
      (p1, 2, exec-validate).

      At step 3, G2 writes a done-validate operation, meaning it
      successfully executed the validate function: This generates a log
      entry (p1, 3, done-validate).

      At step 4, G2 writes an ack-validate operation, meaning it will
      send an acknowledgement to G1 regarding the done-validate: This
      generates a log entry (p1, 4, ack-validate).
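
   The four simplified entries of Figure 1 can be written down as data,
   as in the sketch below.  The Entry type and the helper names are
   hypothetical, and reading a trailing init- or exec- entry as "a step
   that was started but not yet completed" is our own interpretation of
   the operation prefixes, offered only as an illustration of how a
   gateway might inspect the top of its log.

```python
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    """Simplified log entry: (phase, sequence number, operation)."""
    phase: int
    sequence_number: int
    operation: str


# The log produced by steps 1 to 4 of Figure 1.
FIGURE_1_LOG: List[Entry] = [
    Entry(1, 1, "init-validate"),
    Entry(1, 2, "exec-validate"),
    Entry(1, 3, "done-validate"),
    Entry(1, 4, "ack-validate"),
]


def split_operation(operation: str) -> Tuple[str, str]:
    """Split 'exec-validate' into the prefix 'exec' and the name 'validate'."""
    prefix, _, name = operation.partition("-")
    return prefix, name


def is_unfinished(log: List[Entry]) -> bool:
    """True if the newest entry records a step that started but did not finish.

    Assumption: only init- and exec- entries denote work still in flight.
    """
    prefix, _ = split_operation(log[-1].operation)
    return prefix in ("init", "exec")
```

   For instance, a log cut off after step 2 ends in exec-validate and is
   reported as unfinished, whereas the log after step 3 or step 4 is
   not.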

3.2.  Log Storage Types

   Different log storage types (or log support) exist.

   The private log can be kept on several supports: 1) off-chain storage
   (with the possibility of a hash of the logs being stored on-chain),
   where logs are stored on the hard drive of the computer system
   performing the role of a gateway; 2) cloud storage; 3) on-chain
   storage, i.e., using a DLT.  Shared logs can use supports 2 and 3.

   Saving logs locally is faster than saving them on the respective
   ledger but delivers weaker integrity and availability guarantees.
   Saving log entries on a DLT may slow down the protocol because
   issuing a transaction is several orders of magnitude slower than
   writing on disk or accessing a cloud service.

   We assume the storage service used provides the means necessary to
   assure the logs' confidentiality and integrity, stored and in
   transit.  The service must provide an authentication and
   authorization scheme, e.g., based on OAuth and OIDC [OIDC], and use
   secure channels based on TLS/HTTPS [TLS].

3.3.  Log Storage API

   The log storage API allows developers to abstract the log storage
   support, providing a standardized way to interact with logs
   regardless of how they are stored (e.g., relational vs.
   non-relational, local vs. on-chain).  It also handles access control
   if needed.

+---------------------------------------+----------------------------------+------------------------------------------------------------------------+
| Function                              | Parameters                       | Endpoint                                                               |
+---------------------------------------+----------------------------------+------------------------------------------------------------------------+
| Append log entry                      | logId - log entry to be appended | POST /writeLogEntry/:logId Host: example.org Accept: application/json  |
+---------------------------------------+----------------------------------+------------------------------------------------------------------------+
| Obtains a log entry                   | id - log entry id                | GET getLogEntry/:id Host: example.org                                  |
+---------------------------------------+----------------------------------+------------------------------------------------------------------------+
| Obtains the length of the log         | None                             | GET getLogLength Host: example.org                                     |
+---------------------------------------+----------------------------------+------------------------------------------------------------------------+
| Obtains the difference                | log - log to be compared         | GET getLogDiff Host: example.org                                       |
| between a given log and a current log |                                  |                                                                        |
+---------------------------------------+----------------------------------+------------------------------------------------------------------------+
| Obtains the last log entry            | None                             | GET getLastEntry Host: example.org                                     |
+---------------------------------------+----------------------------------+------------------------------------------------------------------------+
| Obtains the whole log                 | None                             | GET getLog Host: example.org                                           |
+---------------------------------------+----------------------------------+------------------------------------------------------------------------+

                                 Figure 2
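
   As an illustration of Figure 2, the mapping from primitives to HTTP
   methods and endpoints can be captured in a small lookup table.  This
   is a sketch under assumptions: the function name request_line is
   hypothetical, every path is written with a leading slash (the table
   is not consistent on this point), and the query parameter of
   getLogDiff is left out because its encoding is not specified here.

```python
from typing import Dict, Optional, Tuple

# Primitive name -> (HTTP method, path template), following Figure 2.
# "{arg}" stands for the ":logId" / ":id" path parameter of the table.
ENDPOINTS: Dict[str, Tuple[str, str]] = {
    "writeLogEntry": ("POST", "/writeLogEntry/{arg}"),
    "getLogEntry": ("GET", "/getLogEntry/{arg}"),
    "getLogLength": ("GET", "/getLogLength"),
    "getLogDiff": ("GET", "/getLogDiff"),
    "getLastEntry": ("GET", "/getLastEntry"),
    "getLog": ("GET", "/getLog"),
}


def request_line(primitive: str, arg: Optional[str] = None) -> str:
    """Build the HTTP/1.1 request line for a log primitive."""
    method, template = ENDPOINTS[primitive]
    if "{arg}" in template:
        if arg is None:
            raise ValueError(f"{primitive} requires a path parameter")
        path = template.format(arg=arg)
    else:
        path = template
    return f"{method} {path} HTTP/1.1"
```

   With this table, request_line("getLogEntry", "7") yields the request
   line "GET /getLogEntry/7 HTTP/1.1"; the Host and Accept headers of
   Figure 2 would be added by the HTTP client.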

   The following table lists, in the same row order as Figure 2, the
   respective return values and response examples:

+---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Returns                         | Response Example                                                                                                                                      |
+=================================+=======================================================================================================================================================+
| The entry index of the last log | HTTP/1.1 200 OK Cache-Control: private Date: Mon, 02 Mar 2020 05:07:35 GMT Content-Type: application/json { "success": true, "response_data":"2" }    |
| (string)                        |                                                                                                                                                       |
+---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| A log entry                     | HTTP/1.1 200 OK Cache-Control: private Date: Mon, 02 Mar 2020 05:07:35 GMT Content-Type: application/json { "success": true, "response_data": {...} } |
+---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| The length of the log           | HTTP/1.1 200 OK Cache-Control: private Date: Mon, 02 Mar 2020 05:07:35 GMT Content-Type: application/json { "success": true, "response_data":"2" }    |
| (string)                        |                                                                                                                                                       |
+---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| The difference between two logs | HTTP/1.1 200 OK Cache-Control: private Date: Mon, 02 Mar 2020 05:07:35 GMT Content-Type: application/json { "success": true, "response_data": {...} } |
+---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| A log entry                     | HTTP/1.1 200 OK Cache-Control: private Date: Mon, 02 Mar 2020 05:07:35 GMT Content-Type: application/json { "success": true, "response_data": {...} } |
+---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| The log                         | HTTP/1.1 200 OK Cache-Control: private Date: Mon, 02 Mar 2020 05:07:35 GMT Content-Type: application/json { "success": true, "response_data": {...} } |
+---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+

                                 Figure 3

3.3.1.  Response Codes

   The log storage API MUST respond with return codes indicating the
   failure (error 5XX) or success (200) of the operation.  The
   application may carry out further operations in the future to
   determine the ultimate status of the operation.

   The log storage API response is in JSON format and contains two
   fields: 1) success: true if the operation was successful, and 2)
   response_data: contains the payload of the response generated by the
   log storage API.
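
   The response envelope described above can be produced and consumed
   as in the following sketch.  The helper names are hypothetical, and
   the choice of 500 among the 5XX codes is an assumption made only for
   the example.

```python
import json
from typing import Any, Tuple


def build_response(success: bool, response_data: Any) -> Tuple[int, str]:
    """Server side: return (status code, JSON body) for a log operation.

    Assumption: failures are reported as 500; any 5XX code would satisfy
    the text above.
    """
    status = 200 if success else 500
    body = json.dumps({"success": success, "response_data": response_data})
    return status, body


def parse_response(status: int, body: str) -> Any:
    """Client side: return response_data, or raise if the call failed."""
    payload = json.loads(body)
    if status != 200 or not payload["success"]:
        raise RuntimeError(f"log storage API call failed with status {status}")
    return payload["response_data"]
```

   A successful getLogLength call on a two-entry log would thus carry
   the body {"success": true, "response_data": "2"}, matching the
   response examples of Figure 3.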

4.  Format of log entries

   The log entries are stored by a gateway in its log, and they capture
   gateway operations.  Entries account for the current status of one
   of the three ODAP flows: Transfer Initiation flow, Lock-Evidence
   flow, and Commitment Establishment flow.

   The recommended format for log entries is JSON [xxx], with protocol-
   specific mandatory fields and support for a free-format field for
   plaintext or encrypted payloads directed at the DLT gateway or an
   underlying DLT.  Although the recommended format is JSON, other
   formats can be used (e.g., XML).

   The mandatory fields of a log entry, that are generated by ODAP, are:

      Version: ODAP protocol Version (major, minor).

      Session ID: unique identifier (UUIDv2) representing a session.

      Sequence Number: monotonically increasing counter that uniquely
      represents a message from a session.

      ODAP Phase: current ODAP phase.

      Resource URL: Location of Resource to be accessed.

      Developer URN: Assertion of developer / application identity.

      Action/Response: GET/POST and arguments (or Response Code).

      Credential Profile: Specify type of auth (e.g., SAML, OAuth,
      X.509).

      Credential Block: Credential token, certificate, string.

      Payload Profile: Asset Profile provenance and capabilities.

      Application Profile: Vendor or Application specific profile.

      Payload: Payload for POST, responses, and native DLT txns.  The
      payload is specific to the current ODAP phase.

      Payload Hash: hash of the current message payload.

   In addition to the attributes that belong to ODAP's schema, each log
   entry REQUIRES the following attributes:

      timestamp REQUIRED: timestamp referring to when the log entry was
      generated (UNIX format).

      source_gateway_pubkey REQUIRED: the public key of the gateway
      initiating a transfer.

      source_gateway_dlt_system REQUIRED: the ID of the source DLT.

      recipient_gateway_pubkey REQUIRED: the public key of the gateway
      involved in a transfer.

      recipient_gateway_dlt_system REQUIRED: the ID of the DLT of the
      recipient gateway involved in a transfer.

      logging_profile REQUIRED: contains the profile regarding the
      logging procedure.  Default is local store.

      Message_signature REQUIRED: Gateway ECDSA signature over the log
      entry.

      Last_entry_hash REQUIRED: Hash of previous log entry.

      Access_control_profile REQUIRED: the profile regarding the
      confidentiality of the log entries being stored.  Default is only
      the gateway that created the logs can access them.

      Operation: the high level operation being executed by the gateway
      on that step.  There are five types of operations: Operation init-
      states the intention of a node to execute a particular operation;
      Operation exec- expresses that the node is executing the
      operation; Operation done- states when a node successfully
      executed a step of the protocol; Operation ack- refers to when a
      node acknowledges a message received from another (e.g., command
      executed); Operation fail- occurs when an agent fails to execute a
      specific step.

      operation history: a map between operations and sequence numbers
      of ODAP.
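
   The Last_entry_hash attribute links each entry to its predecessor,
   which lets a reader detect a modified or missing entry.  The sketch
   below illustrates the idea under assumptions that this memo does not
   fix: SHA-256 as the hash function, a sorted-key JSON encoding as the
   canonical form, and an empty string as the hash stored in the first
   entry.  The helper names are hypothetical, and the remaining required
   attributes (public keys, signature, profiles) are omitted for
   brevity.

```python
import hashlib
import json
import time
from typing import Any, Dict, List

LogEntry = Dict[str, Any]


def entry_hash(entry: LogEntry) -> str:
    """Hash an entry over a deterministic JSON encoding (assumed SHA-256)."""
    encoded = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def append_chained(log: List[LogEntry], operation: str,
                   sequence_number: int) -> LogEntry:
    """Append an entry whose Last_entry_hash covers the previous entry."""
    last_hash = entry_hash(log[-1]) if log else ""  # assumed genesis value
    entry = {
        "timestamp": int(time.time()),  # UNIX format
        "sequence_number": sequence_number,
        "operation": operation,
        "Last_entry_hash": last_hash,
    }
    log.append(entry)
    return entry


def verify_chain(log: List[LogEntry]) -> bool:
    """Check that every entry references the hash of its predecessor."""
    return all(
        log[i]["Last_entry_hash"] == entry_hash(log[i - 1])
        for i in range(1, len(log))
    )
```

   Changing any attribute of an earlier entry changes its hash, so
   verify_chain returns False for the tampered log.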

   Optional field entries are:

      source_gateway_uid OPTIONAL: the uid of the source gateway
      involved in a transfer.

      recipient_gateway_uid OPTIONAL: the uid of the recipient gateway
      involved in a transfer.

      recovery message: the type of recovery message, if gateway is
      involved in a recovery procedure.

      recovery payload: the payload associated with the recovery
      message.

   Example of a log entry created by G1, corresponding to locking an
   asset (phase 2.3 of the ODAP protocol):

                   TODO

                                 Figure 4

   Example of a log entry created by G2, acknowledging G1 locking an
   asset (phase 2.4 of the ODAP protocol):

                   TODO

                                 Figure 5

5.  ODAP-2PC

   This section defines general considerations about crash recovery.
   ODAP-2PC is the application of the gateway crash recovery mechanism
   to asset transfers, across all ODAP phases.

5.1.  Crash Recovery Model

   We assume gateways fail by crashing, i.e., by becoming silent; we do
   not consider arbitrary (Byzantine) faults.  We assume authenticated
   reliable channels obtained using TLS/HTTPS [TLS].  To recover from
   these crashes, gateways keep data about the current step of the
   protocol in persistent storage.  This allows the system to recover by
   getting from the log the first step that may have failed.  We
   consider two recovery models:

      Self-healing mode: assumes that after a crash, a gateway
      eventually recovers.  The recovered gateway informs the other
      party of its recovery and continues the protocol execution.

      Primary-backup mode: assumes that after a crash, a gateway may
      never recover, but that this failure can be detected by timeout
      [AD76].  When a node is crashed indefinitely, a backup is spun
      off, using the log storage API to retrieve the most recent version
      of the log.

   In Self-healing mode, when a gateway restarts after a crash, it reads
   the state from the log and continues executing the protocol from that
   point on.  We assume the gateway does not lose its long-term keys
   (public-private key pair) and can reestablish all TLS connections.

   In Primary-backup mode, we assume that after a period T of the
   primary gateway failure, a backup gateway detects that failure
   unequivocally and takes the role of the primary gateway.  The failure
   is detected using heartbeat messages and a conservative value for T.
   The backup gateway does virtually the same as the gateway in self-
   healing mode: reads the log and continues the process.  The
   difference is that the log must be shared between the primary and the
   backup gateways.  If there is more than one backup, a leader-election
   protocol may be executed to decide which backup will take the primary
   role.
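
   The timeout-based detection used in Primary-backup mode can be
   sketched as follows.  The class and method names are hypothetical,
   and time is passed in explicitly so that the example stays
   deterministic; a deployment would read a monotonic clock and choose T
   conservatively, as stated above.

```python
class HeartbeatMonitor:
    """Backup-side detector: the primary is suspected after T silent seconds."""

    def __init__(self, timeout_t: float, now: float) -> None:
        self.timeout_t = timeout_t      # the period T from the text
        self.last_heartbeat = now       # time of the latest heartbeat seen

    def record_heartbeat(self, now: float) -> None:
        """Called whenever a heartbeat message from the primary arrives."""
        self.last_heartbeat = now

    def primary_failed(self, now: float) -> bool:
        """True once more than T seconds passed without a heartbeat."""
        return now - self.last_heartbeat > self.timeout_t
```

   Once primary_failed returns True, the backup would retrieve the most
   recent log through the log storage API and continue the protocol in
   place of the primary.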

5.2.  Recovery Procedure

   Gateways can crash at several points of the protocol.

   In 2PC and 3PC, recovery requires that the protocol steps are
   recorded in a log immediately before sending a message and
   immediately after receiving a message.

   Upon recovery, the recovered node attempts to retrieve the most
   recent log of operations.  Two situations might occur.  If the
   gateway has a local log plus a shared log, it attempts to update its
   local log using getLogDiff against the shared log.

   If there is no shared log, the recovered gateway needs to synchronize
   itself with the counterparty gateway by sending it a recovery message
   containing its latest log before the crash.  This message allows the
   non-crashed gateway to determine which log entries the recovered
   gateway is potentially missing.  After that, the non-crashed gateway
   shares those entries with the recovered gateway.

   The recovered gateway can now reconstruct the updated log and derive
   the current state of the asset transfer.  For each phase:

5.2.1.  Transfer Initiation Flow

   For every step of this phase, logs are written before operations are
   executed.  A log entry is also written when an operation finishes its
   execution.  If a gateway crashes, upon recovery, it sends a special
   message RECOVER to the counterparty gateway.  The counterparty
   gateway derives the latest log entry the recovered gateway holds, and
   calculates the difference from its own log (RECOVER-UPDATE).  After
   that, it sends the difference back to the recovered gateway, which
   then updates its own log.  After that, a recovery confirmation
   message is sent (RECOVER-UPDATE ACK), and the respective
   acknowledgment is sent by the counterparty gateway (RECOVER-SUCCESS).
   The gateways now share the same log, and can proceed with their
   operation.  Note that if the shared log is blockchain or cloud based,
   the same flow applies, but the recovered gateway derives the new log,
   rather than the counterparty gateway.
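
   The logging discipline described at the start of this section (one
   entry before an operation starts, one when it finishes) can be
   sketched as below.  The function names, the in-memory list used as a
   log, and the "exec-" and "done-" prefixes are assumptions made for
   illustration; the prefixes are only loosely modelled on the labels
   used in the figures of Section 5.4.

```python
def logged_operation(log, session_id, name, operation):
    """Run `operation`, logging one entry before it and one after it."""
    # Entry written BEFORE execution: after a crash, its presence without a
    # matching "done-" entry shows the operation may not have completed.
    log.append((session_id, len(log) + 1, "exec-" + name))
    result = operation()
    # Entry written AFTER the operation finished its execution.
    log.append((session_id, len(log) + 1, "done-" + name))
    return result


def pending_operations(log):
    """Names of operations that started but have no completion entry."""
    started = [op[len("exec-"):] for _, _, op in log if op.startswith("exec-")]
    done = {op[len("done-"):] for _, _, op in log if op.startswith("done-")}
    return [name for name in started if name not in done]


log = []
logged_operation(log, 1, "init", lambda: "ok")
try:
    # Simulated crash in the middle of the second operation.
    logged_operation(log, 1, "lock", lambda: 1 / 0)
except ZeroDivisionError:
    pass
```

   On recovery, pending_operations(log) tells the gateway which
   operations it had started when it crashed, which is the information
   it needs before sending RECOVER.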

5.2.2.  Lock-Evidence Flow

   If a crash occurs during the lock-evidence flow, the procedure is the
   same as the transfer initiation flow.  However

5.2.3.  Commitment Establishment Flow

   This flow requires changes in distributed ledgers, which implies
   issuing transactions against them.  As transactions cannot be undone
   on blockchains, we use a rollback list, which keeps a history of the
   issued transactions.  If a crash occurs and requires reverting state,
   transactions with the contrary effects of those present on the
   rollback list are issued.

      Rollback lists for all the gateways involved are initialized.

      On step 2.3, add a pre-lock transaction to the source gateway
      rollback list.

      On step 3.2, if the request is denied, then abort the transaction
      and apply rollbacks on the source gateway.

      On step 3.3, add a lock transaction to the source gateway rollback
      list.

      On step 3.4, if the commit fails, then abort the transaction and
      apply rollbacks on the source gateway.

      On step 3.5, add a create asset transaction to the rollback list
      of the recipient gateway.

      On step 3.8, if the commit is successful, ODAP terminates.

      Otherwise, if the last commit is not successful, then abort the
      transaction and apply rollbacks to both gateways.
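
   The rollback list can be sketched as a per-gateway history that is
   replayed in reverse with compensating transactions.  This is an
   illustration under stated assumptions, not the normative procedure:
   the RollbackList class, the INVERSE mapping, the transaction names
   other than UNLOCK and BURN, and the choice to undo the most recent
   transaction first are all hypothetical.

```python
# Hypothetical mapping from an issued transaction to the transaction with the
# contrary effect. UNLOCK and BURN are the examples named in Section 5.3.5;
# the remaining names are placeholders.
INVERSE = {
    "PRE_LOCK": "RELEASE_PRE_LOCK",
    "LOCK": "UNLOCK",
    "CREATE_ASSET": "BURN",
}


class RollbackList:
    """History of transactions a gateway has issued against its ledger."""

    def __init__(self):
        self._issued = []

    def add(self, transaction):
        self._issued.append(transaction)

    def rollback(self):
        """Return the compensating transactions, most recent first."""
        compensations = [INVERSE[tx] for tx in reversed(self._issued)]
        self._issued.clear()
        return compensations


source, recipient = RollbackList(), RollbackList()
source.add("PRE_LOCK")          # step 2.3
source.add("LOCK")              # step 3.3
recipient.add("CREATE_ASSET")   # step 3.5
# The final commit fails: both gateways apply their rollbacks.
source_actions = source.rollback()
recipient_actions = recipient.rollback()
```

   Clearing the list after a rollback keeps a repeated call from issuing
   the same compensating transactions twice.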

5.3.  ODAP-2PC Messages

   ODAP-2PC messages are used to recover from crashes in the various
   ODAP phases.  These messages inform gateways of the current state of
   a recovery procedure.  ODAP-2PC messages follow the log format
   defined in Section 4.

5.3.1.  RECOVER

   A recover message is sent from the crashed gateway to the
   counterparty gateway, conveying its most recent state.  This message
   type is encoded in the recovery message field of an ODAP log.

   The parameters of the recovery message payload consist of the
   following:

      ODAP phase: latest ODAP phase registered.

      Sequence number: latest sequence number registered.

      Last_entry_hash REQUIRED: Hash of previous log entry.
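
   The payload above could be represented as in the following sketch.
   The field names mirror the three parameters; the use of SHA-256 over
   a canonical JSON encoding, the dictionary-based log entries, and the
   helper names are assumptions, since this document does not fix a hash
   function or a serialization.  The sketch also assumes that the
   "previous log entry" is the last entry the recovering gateway holds.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RecoverPayload:
    odap_phase: str        # latest ODAP phase registered
    sequence_number: int   # latest sequence number registered
    last_entry_hash: str   # REQUIRED: hash of the previous log entry


def hash_entry(entry):
    """SHA-256 over a canonical JSON encoding (an assumption of this sketch)."""
    encoded = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def build_recover(log):
    """Build the RECOVER payload from the last entry the gateway still holds."""
    last = log[-1]
    return RecoverPayload(
        odap_phase=last["phase"],
        sequence_number=last["sequence_number"],
        last_entry_hash=hash_entry(last),
    )


log = [{"phase": "transfer-initiation", "sequence_number": 1,
        "operation": "init-validate"}]
payload = build_recover(log)
message = asdict(payload)  # plain dict, ready to be serialized and sent
```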

5.3.2.  RECOVER-UPDATE

   The recover update message is sent by the counterparty gateway after
   receiving a recover message from a recovered gateway.  The recovered
   gateway informs the counterparty gateway of its current state (via
   the current state of its log).  The counterparty gateway then
   calculates the difference between the log entry corresponding to the
   sequence number received from the recovered gateway and the latest
   sequence number (corresponding to the latest log entry).  This
   difference is sent to the recovered gateway.

   The parameters of the recover update payload consist of the
   following:

      recovered logs: the list of log messages that the recovered
      gateway needs to update.
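
   On the counterparty side, the difference described above can be
   sketched as a range selection over sequence numbers.  The function
   name, the dictionary-based messages, and the "recovered_logs" key are
   hypothetical; the sketch also assumes the counterparty log is ordered
   by sequence number.

```python
def build_recover_update(counterparty_log, recover_message):
    """Counterparty side: list the entries the recovered gateway is missing."""
    received = recover_message["sequence_number"]
    latest = counterparty_log[-1]["sequence_number"] if counterparty_log else 0
    # Everything in the half-open range (received, latest] is the difference.
    missing = [
        entry for entry in counterparty_log
        if received < entry["sequence_number"] <= latest
    ]
    return {"type": "RECOVER-UPDATE", "recovered_logs": missing}


g2_log = [
    {"sequence_number": n, "operation": op}
    for n, op in enumerate(
        ["init-validate", "exec-validate", "done-init", "ack-init"], start=1)
]
update = build_recover_update(g2_log, {"sequence_number": 1})
```

   If the recovered gateway already holds the latest entry, the list of
   recovered logs is empty.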

5.3.3.  RECOVER-UPDATE ACK

   The recover-update ack message (response to RECOVER-UPDATE) states
   whether the recovered gateway's log has been successfully updated.
   If inconsistencies are detected, the recovered gateway initiates a
   dispute (RECOVER-DISPUTE message).

   The parameters of this message consist of the following:

      success: true/false.

      entries changed: list of hashes of log entries that were appended
      to the recovered gateway log.
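
   A possible way for the recovered gateway to apply a RECOVER-UPDATE
   and produce this payload is sketched below.  The consistency check
   used here (received entries must continue the local sequence numbers
   without gaps) is an assumption, as this document does not define what
   counts as an inconsistency; the hash function and helper names are
   likewise hypothetical.

```python
import hashlib
import json


def entry_hash(entry):
    # Assumption: SHA-256 over canonical JSON; the draft does not fix a hash.
    data = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def apply_recover_update(local_log, recovered_logs):
    """Append the received entries and build the RECOVER-UPDATE ACK payload."""
    expected = local_log[-1]["sequence_number"] + 1 if local_log else 1
    appended = []
    for entry in recovered_logs:
        if entry["sequence_number"] != expected:
            # Gap or overlap: report failure so that a dispute can be raised.
            # Nothing has been appended to the local log at this point.
            return {"success": False, "entries_changed": []}
        appended.append(entry)
        expected += 1
    local_log.extend(appended)
    return {"success": True,
            "entries_changed": [entry_hash(e) for e in appended]}


g1_log = [{"sequence_number": 1, "operation": "init-validate"}]
ack = apply_recover_update(g1_log, [
    {"sequence_number": 2, "operation": "exec-validate"},
    {"sequence_number": 3, "operation": "done-init"},
])
bad = apply_recover_update(g1_log, [{"sequence_number": 7, "operation": "x"}])
```

   Entries are only appended once the whole update has been checked, so
   a failed update leaves the local log unchanged.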

5.3.4.  RECOVER-SUCCESS

   The recover-success message is sent by the counterparty gateway to
   the recovered gateway acknowledging that the state is synchronized.

   The parameters of this message consist of the following:

      success: true/false.

5.3.5.  ROLLBACK

   A rollback message is sent by a gateway that initiated a rollback as
   defined by ODAP-2PC.

   The parameters of this message consist of the following:

      success: true/false.

      actions performed: actions performed to roll back a state (e.g.,
      UNLOCK; BURN).

      proofs: TBD.

5.4.  Examples

   There are several situations when a crash may occur.

5.4.1.  Crashing before issuing a command to the counterparty gateway

   The following figure represents the source gateway (G1) crashing
   before it issued an init command to the recipient gateway (G2).

        ,--.                           ,--.             ,-------.
        |G1|                           |G2|             |Log API|
        `--'                           `--'             `-------'
         |     [1]: writeLogEntry <1, 1, init-validate>     |
         | ------------------------------------------------->
         |                              |                   |
         |----.                         |                   |
         |    | [2]  Crash              |                   |
         |<---'  ...                    |                   |
         |      [3] recover             |                   |
         |                              |                   |
         |                              |                   |
         |      [4] <1, 2, RECOVER>     |                   |
         | ----------------------------->                   |
         |                              |                   |
         |                              | [5] getLogEntry(i)|
         |                              | ------------------>
         |                              |                   |
         |                              |   [6] logEntries  |
         |                              | <- - - - - - - - -
         |                              |                   |
         |   [7] <1,3,RECOVER-UPDATE>   |                   |
         | <-----------------------------                   |
         |                              |                   |
         |----.                         |                   |
         |    | [8] process log         |                   |
         |<---'                         |                   |
         |                              |                   |
         |              [9] <1,4,writeLogEntry>             |
         | ------------------------------------------------->
         |                              |                   |
         | [10] <1,5,RECOVER-UPDATE-ACK>|                   |
         | ----------------------------->                   |
         |                              |                   |
         |  [11] <1,6,RECOVER-SUCCESS>  |                   |
         | <-----------------------------                   |
         |                              |                   |
         |           [12]: <1,7,init-validateNext>          |
         | ------------------------------------------------->
        ,--.                           ,--.             ,-------.
        |G1|                           |G2|             |Log API|
        `--'                           `--'             `-------'

                                 Figure 6

5.4.2.  Crashing after issuing a command to the counterparty gateway

   The second scenario requires further synchronization (figure below).
   When it retrieves the latest log entries, G1 notices that its log is
   outdated.  It updates its log after the necessary validation and then
   communicates its recovery to G2.  The process then continues as
   defined.

     ,--.                           ,--.                             ,-------.
     |G1|                           |G2|                             |Log API|
     `--'                           `--'                             `-------'
      |              [1]: writeLogEntry <1,1,init-validate>              |
      | ----------------------------------------------------------------->
      |                              |                                   |
      |   [2]: <1,1,init-validate>   |                                   |
      | ----------------------------->                                   |
      |                              |                                   |
      |----.                         |                                   |
      |    | [3] Crash               |                                   |
      |<---'                         |                                   |
      |                              |                                   |
      |                              | [4]: writeLogEntry <exec-validate>|
      |                              | ---------------------------------->
      |                              |                                   |
      |                              |----.                              |
      |                              |    | [5]: execute init            |
      |                              |<---'                              |
      |                              |                                   |
      |                              |   [6]: writeLogEntry <done-init>  |
      |                              | ---------------------------------->
      |                              |                                   |
      |                              |   [7]: writeLogEntry <ack-init>   |
      |                              | ---------------------------------->
      |                              |                                   |
      | [8] <1,2,init-validate-ack>  |                                   |
      |  discovers that G1 crashed   |                                   |
      |  via timeout                 |                                   |
      | <-----------------------------                                   |
      |                              |                                   |
      |----.                         |                                   |
      |    | [9] Recover             |                                   |
      |<---'                         |                                   |
      |                              |                                   |
      |     [10] <1, 2, RECOVER>     |                                   |
      | ----------------------------->                                   |
      |                              |                                   |
      |                              |        [11] getLogEntry(i)        |
      |                              | ---------------------------------->
      |                              |                                   |
      |                              |          [12] logEntries          |
      |                              | <- - - - - - - - - - - - - - - - -
      |                              |                                   |
      |   [13] <1,3,RECOVER-UPDATE>  |                                   |
      | <-----------------------------                                   |
      |                              |                                   |
      |----.                         |                                   |
      |    | [14] process log        |                                   |
      |<---'                         |                                   |
      |                              |                                   |
      |                     [15] <1,4,writeLogEntry>                     |
      | ----------------------------------------------------------------->
      |                              |                                   |
      | [16] <1,5,RECOVER-UPDATE-ACK>|                                   |
      | ----------------------------->                                   |
      |                              |                                   |
      |  [17] <1,6,RECOVER-SUCCESS>  |                                   |
      | <-----------------------------                                   |
      |                              |                                   |
      |                   [18]: <1,7,init-validateNext>                  |
      | ----------------------------------------------------------------->
     ,--.                           ,--.                             ,-------.
     |G1|                           |G2|                             |Log API|
     `--'                           `--'                             `-------'

                                 Figure 7

6.  Security Considerations

   We assume a trusted, secure communication channel between gateways
   (i.e., messages cannot be spoofed and/or altered by an adversary)
   using TLS 1.3 or higher.  Clients support "acceptable" credential
   schemes such as OAuth 2.0.

   The present protocol is crash fault-tolerant, meaning that it handles
   gateways that crash for several reasons (e.g., power outage).  The
   present protocol does not support Byzantine faults, where gateways
   can behave arbitrarily (including being malicious).  This implies
   that both gateways are considered trusted.  We assume logs are not
   tampered with or lost.

   Log entries need integrity, availability, and confidentiality
   guarantees, as they are an attractive point of attack [BVC19].  Every
   log entry contains a hash of its payload for guaranteeing integrity.
   If extra guarantees are needed (e.g., non-repudiation), a log entry
   might be signed by its creator.  Availability is guaranteed by the
   usage of the log storage API that connects a gateway to a dependable
   storage (local, external, or DLT-based).  Each underlying storage
   provides different guarantees.  Access control can be enforced via
   the access control profile associated with each log, i.e., the
   profile can be resolved to indicate who can access a log entry and
   under which conditions.  Access control profiles can be implemented
   with access control lists for simple authorization.  The
   authentication of the entities accessing the logs is done at the Log
   Storage API level (e.g., username+password authentication in local
   storage vs. blockchain-based access control in a DLT).
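
   The integrity and access control measures above can be illustrated
   with the following sketch.  It assumes SHA-256 as the payload hash
   and models an access control profile as a plain mapping from a
   principal to its allowed actions; both choices, as well as all names
   in the code, are hypothetical and not mandated by this document.

```python
import hashlib


def make_entry(payload: bytes):
    """Store the payload together with the SHA-256 hash of the payload."""
    return {"payload": payload, "hash": hashlib.sha256(payload).hexdigest()}


def verify_entry(entry):
    """True if the stored hash still matches the payload."""
    return hashlib.sha256(entry["payload"]).hexdigest() == entry["hash"]


def can_access(acl, principal, action):
    """Resolve a simple access control list: principal -> allowed actions."""
    return action in acl.get(principal, set())


entry = make_entry(b"<1,1,init-validate>")
# A copy whose payload was modified while the stored hash was left unchanged.
tampered = dict(entry, payload=b"<1,1,commit>")
acl = {"gateway-1": {"read", "write"}, "auditor": {"read"}}
```

   A hash alone only detects accidental or unsophisticated modification,
   since anyone able to rewrite the payload can also rewrite the hash;
   this is why the text above suggests signing entries when stronger
   guarantees such as non-repudiation are needed.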

   For extra guarantees, the nodes running the log storage API (or the
   gateway nodes themselves) can be protected by hardening technologies
   such as Intel SGX [CD16].

7.  References

7.1.  Normative References

   [ODAP]     Hargreaves, M. and T. Hardjono, "Open Digital Asset
              Protocol", Work in Progress, Internet-Draft,
              draft-hargreaves-odap-00, October 2020,
              <https://datatracker.ietf.org/doc/draft-hargreaves-odap/>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [TLS]      Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", RFC 8446, August 2018,
              <https://tools.ietf.org/rfc/rfc8446>.

7.2.  Informative References

   [AD76]     Alsberg, P. and D. Day, "A principle for resilient sharing
              of distributed resources", Proc. of the 2nd Int. Conf. on
              Software Engineering, 1976.

   [BHG87]    Bernstein, P., Hadzilacos, V., and N. Goodman,
              "Concurrency Control and Recovery in Database Systems",
              Chapter 7, Addison Wesley Publishing Company,
              ISBN 978-0-201-10715-9, 1987.

   [BVC19]    Belchior, R., Vasconcelos, A., and M. Correia, "Towards
              Secure, Decentralized, and Automatic Audits with
              Blockchain", European Conference on Information Systems,
              2019, <https://aisel.aisnet.org/ecis2020_rp/68/>.

   [BVCH21]   Belchior, R., Vasconcelos, A., Correia, M., and T.
              Hardjono, "HERMES: Fault-Tolerant Middleware for
              Blockchain Interoperability", 2021,
              <https://www.techrxiv.org/articles/preprint/HERMES_Fault-T
              olerant_Middleware_for_Blockchain_Interoperability/1412029
              1>.

   [Clar88]   Clark, D., "The Design Philosophy of the DARPA Internet
              Protocols", ACM Computer Communication Review, Proc.
              SIGCOMM 88, vol. 18, no. 4, pp. 106-114, August 1988.

   [HS2019]   Hardjono, T. and N. Smith, "Decentralized Trusted
              Computing Base for Blockchain Infrastructure Security",
              Frontiers Journal, Special Issue on Blockchain Technology,
              Vol. 2, No. 24, December 2019,
              <https://doi.org/10.3389/fbloc.2019.00024>.

   [OIDC]     Sakimura, N., Bradley, J., Jones, M., de Medeiros, B., and
              C. Mortimore, "OpenID Connect Core 1.0", 2014,
              <http://openid.net/specs/openid-connect-core-1_0.html>.

   [SRC84]    Saltzer, J., Reed, D., and D. Clark, "End-to-End Arguments
              in System Design", ACM Transactions on Computer Systems,
              vol. 2, no. 4, pp. 277-288, November 1984.

Authors' Addresses

   Rafael Belchior
   INESC-ID, Instituto Superior Tecnico

   Email: rafael.belchior@tecnico.ulisboa.pt

   Miguel Correia
   INESC-ID, Instituto Superior Tecnico

   Email: miguel.p.correia@tecnico.ulisboa.pt

   Thomas Hardjono
   MIT

   Email: hardjono@mit.edu
