Internet Engineering Task Force                               R. Montero
Internet-Draft                                    University of A Coruna
Intended status: Informational                           August 13, 2020
Expires: February 14, 2021

Protocol for Evaluating Reinforcement Learning Environments in Real Time
                          draft-perlert-wg-00

Abstract

   This document defines a simple UDP-based protocol for communication
   between a server that simulates a reinforcement learning environment
   and a client that observes that environment and responds with
   actions.

   Reinforcement learning problems are usually defined within the scope
   of a Markov Decision Process (MDP), in which an agent sends an action
   belonging to an action space to an environment.  The environment acts
   as a black box that returns an observation and a reward to the agent,
   whose goal is to maximize the total reward obtained.
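
   The interaction loop described above can be sketched as follows.
   This is a minimal illustration only: the LineWorld environment, its
   step() method, and the run_episode() helper are hypothetical names
   chosen for this example and are not part of PERLERT.

```python
class LineWorld:
    """Toy black-box environment (hypothetical, not defined by PERLERT).

    The agent moves along the integers starting at 0 and receives a
    reward of 1.0 each time it lands on the goal position.
    """

    ACTIONS = (-1, +1)  # the action space: step left or step right

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def step(self, action):
        """Apply one action and return (observation, reward)."""
        if action not in self.ACTIONS:
            raise ValueError("action outside the action space")
        self.position += action
        reward = 1.0 if self.position == self.goal else 0.0
        return self.position, reward


def run_episode(environment, policy, steps):
    """Run the agent-environment loop and return the total reward."""
    total_reward = 0.0
    observation = None  # the agent has seen nothing before its first action
    for _ in range(steps):
        action = policy(observation)
        observation, reward = environment.step(action)
        total_reward += reward
    return total_reward


def seek_goal(observation):
    """Move right until the goal (assumed to be 3) is reached, then left."""
    return +1 if observation is None or observation < 3 else -1


if __name__ == "__main__":
    print(run_episode(LineWorld(), seek_goal, steps=5))
```

   The agent only ever sees what step() returns, which is the sense in
   which the environment is a black box.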

   Although the problem statement is easy to understand, there is no
   convention for how a reinforcement learning simulation communicates
   with a client agent, either in a local network or over the Internet.
   Establishing one is especially useful for multi-agent support and
   analysis.

   The PERLERT protocol defined in this document assumes that the server
   and the client have shared certain information beforehand through
   another channel, such as a web page served over HTTP.  For example,
   the client must know a port number and an instance number before it
   can participate in a simulation run on a server.

   Although receiving the full feedback from the environment is often
   desirable, PERLERT focuses on real-time interaction, in which human
   agents can interact with AI agents, even if that means some
   information is lost due to network packet loss.
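
   As a rough sketch of what loss-tolerant UDP interaction can look
   like, the example below sends one datagram carrying an instance
   number and an action over the loopback interface and treats a
   receive timeout as a lost reply.  The datagram layout (ACTION_FORMAT),
   the helper names, and the reply payload are assumptions made for
   illustration only; the actual PERLERT message formats are those
   given in the Messages Specification section.

```python
import socket
import struct

# Hypothetical layout, NOT the PERLERT wire format:
# instance number (unsigned 32-bit) followed by an action (unsigned 8-bit),
# both in network byte order.
ACTION_FORMAT = "!IB"


def send_action(sock, server_address, instance, action):
    """Send one action datagram; UDP gives no delivery guarantee."""
    sock.sendto(struct.pack(ACTION_FORMAT, instance, action), server_address)


def receive_or_none(sock, timeout_seconds):
    """Return (payload, sender), or None if nothing arrives in time."""
    sock.settimeout(timeout_seconds)
    try:
        return sock.recvfrom(2048)
    except socket.timeout:
        return None  # treat as a lost datagram and carry on


def demo():
    """Exchange datagrams on loopback, including one simulated loss."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
        send_action(client, server.getsockname(), instance=7, action=1)

        payload, client_address = receive_or_none(server, 1.0)
        instance, action = struct.unpack(ACTION_FORMAT, payload)

        # The server has not replied yet, so this models a lost reply.
        lost_reply = receive_or_none(client, 0.05)

        # The client simply continues and picks up the next reply.
        server.sendto(b"obs", client_address)
        reply, _sender = receive_or_none(client, 1.0)
        return instance, action, lost_reply, reply
    finally:
        server.close()
        client.close()


if __name__ == "__main__":
    print(demo())
```

   Returning None on timeout, rather than retransmitting, matches the
   real-time emphasis: a late observation is usually less useful than
   the next one.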

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

Montero                 Expires February 14, 2021               [Page 1]
Internet-Draft                   PERLERT                     August 2020

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 14, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   3
   2.  Communication Phases  . . . . . . . . . . . . . . . . . . . .   3
   3.  Messages Specification  . . . . . . . . . . . . . . . . . . .   3
     3.1.  Terms . . . . . . . . . . . . . . . . . . . . . . . . . .   3
     3.2.  Client Message Types  . . . . . . . . . . . . . . . . . .   5
     3.3.  Server Message Types  . . . . . . . . . . . . . . . . . .   6
   4.  UDP/IP Ports  . . . . . . . . . . . . . . . . . . . . . . . .   7
   5.  Example Case  . . . . . . . . . . . . . . . . . . . . . . . .   8
   6.  Additional Considerations . . . . . . . . . . . . . . . . . .   8
   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   9
   8.  Security Considerations . . . . . . . . . . . . . . . . . . .   9
   9.  Normative References  . . . . . . . . . . . . . . . . . . . .   9
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  10

1.  Introduction

   This document specifies PERLERT (Protocol for Evaluating
   Reinforcement Learning Environments in Real Time).

   It is intended to be used in the analysis of reinforcement learning
   problems.  In a reinforcement learning problem, an agent sends
   an action to an environment.  The environment acts as a black box
   returning an observation and a reward for the agent, whose goal is to