Network Working Group                                       T. Dreibholz
Internet-Draft                                Simula Research Laboratory
Intended status: Informational                           August 06, 2017
Expires: February 7, 2018


   Applicability of Reliable Server Pooling for Real-Time Distributed
                               Computing
            draft-dreibholz-rserpool-applic-distcomp-23.txt

Abstract

   This document describes the applicability of the Reliable Server
   Pooling architecture to manage real-time distributed computing pools
   and access the resources of such pools.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 7, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.




Dreibholz               Expires February 7, 2018                [Page 1]


Internet-Draft     RSerPool for Distributed Computing        August 2017


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Scope . . . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.2.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Distributed Computing using RSerPool  . . . . . . . . . . . .   3
     2.1.  Requirements  . . . . . . . . . . . . . . . . . . . . . .   3
     2.2.  Architecture  . . . . . . . . . . . . . . . . . . . . . .   4
     2.3.  Limitations . . . . . . . . . . . . . . . . . . . . . . .   4
   3.  Reference Implementation  . . . . . . . . . . . . . . . . . .   5
   4.  Testbed Platform  . . . . . . . . . . . . . . . . . . . . . .   5
   5.  Security Considerations . . . . . . . . . . . . . . . . . . .   5
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   5
   7.  References  . . . . . . . . . . . . . . . . . . . . . . . . .   5
     7.1.  Normative References  . . . . . . . . . . . . . . . . . .   6
     7.2.  Informative References  . . . . . . . . . . . . . . . . .   7
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .   9

1.  Introduction

   Reliable Server Pooling defines protocols for providing highly
   available services.  The services are located in a pool of redundant
   servers and if a server fails, another server will take over.  The
   only requirement put on these servers belonging to the pool is that
   if state is maintained by the server, this state must be transferred
   to the other server taking over.

   The goal is to provide server-based redundancy.  Transport and
   network level redundancy are handled by the transport and network
   layer protocols.

   The application may choose to distribute its traffic over the servers
   of the pool conforming to a certain policy.

1.1.  Scope

   The scope of this document is to explain the way of using Reliable
   Server Pooling mechanisms to manage and access pools of Distributed
   Computing resources.

1.2.  Terminology

   The terms are commonly identified in related work and can be found in
   the Aggregate Server Access Protocol and Endpoint Handlespace
   Redundancy Protocol Common Parameters document [RFC5354].






Dreibholz               Expires February 7, 2018                [Page 2]


Internet-Draft     RSerPool for Distributed Computing        August 2017


2.  Distributed Computing using RSerPool

2.1.  Requirements

   The application scenario for Distributed Computing is defined as
   follows:

   o  Clients generate large computation jobs.  Jobs have to be
      processed by servers as soon as possible (real-time), i.e. unlike
      concepts like SETI@home [SETIatHome-Website], it is not possible
      to let clients fetch a job, process it later and may be some day
      upload the result.

   o  Jobs may be partitionable, i.e. they can be split up to smaller
      pieces which can be processed independently and the processing
      results can be concatenated to the processing result of the
      complete job.  Jobs have to be processed by servers.

   o  Servers may be unreliable; i.e. user computers may be temporarily
      added to the pool of computing resources and may be revoked when
      they are used again by their owners.  Furthermore, they may simply
      disappear because of broken network connections (modems, etc.) or
      power turned off.

   o  The processing power of servers in a pool of computing resources
      may be very heterogeneous, i.e. a few supercomputers and many low-
      end user PCs.

   Maintaining a Distributed Computing pool for the scenario described
   above arises the following requirements to the pool management:

   o  It must be possible to manage large server pools, e.g. up to some
      hundreds or even thousands of servers.

   o  Due to heterogeneous processing resources within a pool, it must
      be possible to use appropriate server selection procedures to
      meaningfully utilize the available resources.

   o  It must be possible to dynamically add and remove servers.

   o  Servers may be unreliable, especially when the servers are
      represented by user PCs.  Failover mechanisms are required to
      continue an interrupted computation session.








Dreibholz               Expires February 7, 2018                [Page 3]


Internet-Draft     RSerPool for Distributed Computing        August 2017


2.2.  Architecture

   All requirements for pool and session management of the Distributed
   Computing scenario defined in the previous section can be fulfilled
   by the Reliable Server Pooling architecture:

   o  An efficient implementation of the handlespace management
      structures allows pools to contain thousands of elements.
      Handlespace management structures have been proposed, implemented
      and analyzed in [IJHIT2008], [Dre2006].

   o  RSerPool allows to specify server selection rules by pool member
      selection policies [RFC5356].  A set of adaptive and non-adaptive
      policies is already defined.  To fulfill the requirements of new
      applications, it is also possible to define new policies.
      Research has already been made on the subject of load distribution
      efficiency of pool policies in Distributed Computing scenarios:
      see [Dre2006], [IJAIT2009], [LCN2005], [Tencon2005],
      [Euromicro2007] for details.

   o  Dynamic addition and removal of PEs is a feature of RSerPool
      [RFC5352].

   o  The control/data channel concept [RFC5351] of RSerPool realizes a
      session layer.  That is, RSerPool already handles the main task of
      maintaining and monitoring connections between PUs and PEs; the
      only task of the application layer to provide full failover
      functionality is to realize an application-dependent failover
      procedure.  By the usage of client-based state synchronization
      [IJAIT2009], [LCN2002] in the form of ASAP Cookies, a failover may
      be fully transparent to the PU while only a state restoration is
      necessary on the PE side.  A demo application [RSerPool-Website]
      using the RSerPool session layer in a Distributed Computing
      application is described in [Infocom2005].

2.3.  Limitations

   Applying RSerPool for distributed computing applications, the duties
   of the RSerPool architecture are still limited to the management of
   pools and independent sessions only.  It is in particular a non-goal
   to provide functionalities like data synchronization among sessions,
   user authentication, accounting or the support for more than one
   administrative domain.  Such functionalities are considered to be
   application-specific and are therefore out of the scope of RSerPool.







Dreibholz               Expires February 7, 2018                [Page 4]


Internet-Draft     RSerPool for Distributed Computing        August 2017


3.  Reference Implementation

   The RSerPool reference implementation RSPLIB, including example
   Distributed Computing applications, can be found at
   [RSerPool-Website].  It supports the functionalities defined by
   [RFC5351], [RFC5352], [RFC5353], [RFC5354] and [RFC5355] as well as
   the options [I-D.dreibholz-rserpool-asap-hropt],
   [I-D.dreibholz-rserpool-enrp-takeover] and
   [I-D.dreibholz-rserpool-delay].  An introduction to this
   implementation is provided in [Dre2006].

4.  Testbed Platform

   A large-scale and realistic Internet testbed platform with support
   for the multi-homing feature of the underlying SCTP protocol is
   NorNet.  A description of NorNet is provided in [PAMS2013-NorNet],
   some further information can be found on the project website
   [NorNet-Website].

5.  Security Considerations

   The protocols used in the Reliable Server Pooling architecture only
   try to increase the availability of the servers in the network.
   RSerPool protocols do not contain any protocol mechanisms which are
   directly related to user message authentication, integrity and
   confidentiality functions.  For such features, it depends on the
   IPSEC protocols or on Transport Layer Security (TLS) protocols for
   its own security and on the architecture and/or security features of
   its user protocols.

   The RSerPool architecture allows the use of different transport
   protocols for its application and control data exchange.  These
   transport protocols may have mechanisms for reducing the risk of
   blind denial-of-service attacks and/or masquerade attacks.  If such
   measures are required by the applications, then it is advised to
   check the SCTP (see [RFC4960]) applicability statement [RFC3257] for
   guidance on this issue.

6.  IANA Considerations

   This document introduces no additional considerations for IANA.

7.  References








Dreibholz               Expires February 7, 2018                [Page 5]


Internet-Draft     RSerPool for Distributed Computing        August 2017


7.1.  Normative References

   [RFC3257]  Coene, L., "Stream Control Transmission Protocol
              Applicability Statement", RFC 3257, DOI 10.17487/RFC3257,
              April 2002, <http://www.rfc-editor.org/info/rfc3257>.

   [RFC4960]  Stewart, R., Ed., "Stream Control Transmission Protocol",
              RFC 4960, DOI 10.17487/RFC4960, September 2007,
              <http://www.rfc-editor.org/info/rfc4960>.

   [RFC5351]  Lei, P., Ong, L., Tuexen, M., and T. Dreibholz, "An
              Overview of Reliable Server Pooling Protocols", RFC 5351,
              DOI 10.17487/RFC5351, September 2008,
              <http://www.rfc-editor.org/info/rfc5351>.

   [RFC5352]  Stewart, R., Xie, Q., Stillman, M., and M. Tuexen,
              "Aggregate Server Access Protocol (ASAP)", RFC 5352,
              DOI 10.17487/RFC5352, September 2008,
              <http://www.rfc-editor.org/info/rfc5352>.

   [RFC5353]  Xie, Q., Stewart, R., Stillman, M., Tuexen, M., and A.
              Silverton, "Endpoint Handlespace Redundancy Protocol
              (ENRP)", RFC 5353, DOI 10.17487/RFC5353, September 2008,
              <http://www.rfc-editor.org/info/rfc5353>.

   [RFC5354]  Stewart, R., Xie, Q., Stillman, M., and M. Tuexen,
              "Aggregate Server Access Protocol (ASAP) and Endpoint
              Handlespace Redundancy Protocol (ENRP) Parameters",
              RFC 5354, DOI 10.17487/RFC5354, September 2008,
              <http://www.rfc-editor.org/info/rfc5354>.

   [RFC5355]  Stillman, M., Ed., Gopal, R., Guttman, E., Sengodan, S.,
              and M. Holdrege, "Threats Introduced by Reliable Server
              Pooling (RSerPool) and Requirements for Security in
              Response to Threats", RFC 5355, DOI 10.17487/RFC5355,
              September 2008, <http://www.rfc-editor.org/info/rfc5355>.

   [RFC5356]  Dreibholz, T. and M. Tuexen, "Reliable Server Pooling
              Policies", RFC 5356, DOI 10.17487/RFC5356, September 2008,
              <http://www.rfc-editor.org/info/rfc5356>.

   [I-D.dreibholz-rserpool-asap-hropt]
              Dreibholz, T., "Handle Resolution Option for ASAP", draft-
              dreibholz-rserpool-asap-hropt-20 (work in progress),
              January 2017.






Dreibholz               Expires February 7, 2018                [Page 6]


Internet-Draft     RSerPool for Distributed Computing        August 2017


   [I-D.dreibholz-rserpool-delay]
              Dreibholz, T. and X. Zhou, "Definition of a Delay
              Measurement Infrastructure and Delay-Sensitive Least-Used
              Policy for Reliable Server Pooling", draft-dreibholz-
              rserpool-delay-19 (work in progress), January 2017.

   [I-D.dreibholz-rserpool-enrp-takeover]
              Dreibholz, T. and X. Zhou, "Takeover Suggestion Flag for
              the ENRP Handle Update Message", draft-dreibholz-rserpool-
              enrp-takeover-17 (work in progress), January 2017.

7.2.  Informative References

   [Dre2006]  Dreibholz, T., "Reliable Server Pooling - Evaluation,
              Optimization and Extension of a Novel IETF Architecture",
              March 2007, <https://duepublico.uni-duisburg-
              essen.de/servlets/DerivateServlet/Derivate-16326/
              Dre2006_final.pdf>.

   [Euromicro2007]
              Dreibholz, T., Zhou, X., and E. Rathgeb, "A Performance
              Evaluation of RSerPool Server Selection Policies in
              Varying Heterogeneous Capacity Scenarios", Proceedings of
              the 33rd IEEE EuroMirco Conference on Software Engineering
              and Advanced Applications Pages 157-164,
              ISBN 0-7695-2977-1, DOI 10.1109/EUROMICRO.2007.9, August
              2007, <https://www.wiwi.uni-due.de/fileadmin/fileupload/I-
              TDR/ReliableServer/Publications/EuroMicro2007.pdf>.

   [IJAIT2009]
              Dreibholz, T. and E. Rathgeb, "Overview and Evaluation of
              the Server Redundancy and Session Failover Mechanisms in
              the Reliable Server Pooling Framework", International
              Journal on Advances in Internet Technology (IJAIT) Number
              1, Volume 2, Pages 1-14, ISSN 1942-2652, June 2009,
              <https://www.wiwi.uni-due.de/fileadmin/fileupload/I-
              TDR/ReliableServer/Publications/IJAIT2009.pdf>.

   [IJHIT2008]
              Dreibholz, T. and E. Rathgeb, "An Evaluation of the Pool
              Maintenance Overhead in Reliable Server Pooling Systems",
              SERSC International Journal on Hybrid Information
              Technology (IJHIT) Number 2, Volume 1, Pages 17-32,
              ISSN 1738-9968, April 2008, <https://www.wiwi.uni-
              due.de/fileadmin/fileupload/I-
              TDR/ReliableServer/Publications/IJHIT2008.pdf>.





Dreibholz               Expires February 7, 2018                [Page 7]


Internet-Draft     RSerPool for Distributed Computing        August 2017


   [Infocom2005]
              Dreibholz, T. and E. Rathgeb, "An Application
              Demonstration of the Reliable Server Pooling Framework",
               Proceedings of the 24th IEEE INFOCOM, March 2005,
              <https://www.wiwi.uni-due.de/fileadmin/fileupload/I-
              TDR/ReliableServer/Publications/Infocom2005.pdf>.

   [LCN2002]  Dreibholz, T., "An Efficient Approach for State Sharing in
              Server Pools", Proceedings of the 27th IEEE Local Computer
              Networks Conference (LCN) Pages 348-349,
              ISBN 0-7695-1591-6, DOI 10.1109/LCN.2002.1181806, November
              2002, <https://www.wiwi.uni-due.de/fileadmin/fileupload/I-
              TDR/ReliableServer/Publications/StateSharing-Paper-
              ShortVersion.pdf>.

   [LCN2005]  Dreibholz, T. and E. Rathgeb, "On the Performance of
              Reliable Server Pooling Systems", Proceedings of the IEEE
              Conference on Local Computer Networks (LCN) 30th
              Anniversary Pages 200-208, ISBN 0-7695-2421-4,
              DOI 10.1109/LCN.2005.98, November 2005,
              <https://www.wiwi.uni-due.de/fileadmin/fileupload/I-
              TDR/ReliableServer/Publications/LCN2005.pdf>.

   [Tencon2005]
              Dreibholz, T. and E. Rathgeb, "The Performance of Reliable
              Server Pooling Systems in Different Server Capacity
              Scenarios", Proceedings of the IEEE
              TENCON ISBN 0-7803-9312-0, DOI 10.1109/TENCON.2005.300939,
              November 2005, <https://www.wiwi.uni-
              due.de/fileadmin/fileupload/I-
              TDR/ReliableServer/Publications/Tencon2005.pdf>.

   [PAMS2013-NorNet]
              Dreibholz, T. and E. Gran, "Design and Implementation of
              the NorNet Core Research Testbed for Multi-Homed Systems",
              Proceedings of the 3nd International Workshop on Protocols
              and Applications with Multi-Homing Support (PAMS) Pages
              1094-1100, ISBN 978-0-7695-4952-1,
              DOI 10.1109/WAINA.2013.71, March 2013,
              <https://www.simula.no/file/
              threfereedinproceedingsreference2012-12-207643198512pdf/
              download>.

   [SETIatHome-Website]
              SETI Project, , "SETI@home: Search for Extraterrestrial
              Intelligence at home", 2016,
              <http://setiathome.ssl.berkeley.edu>.




Dreibholz               Expires February 7, 2018                [Page 8]


Internet-Draft     RSerPool for Distributed Computing        August 2017


   [RSerPool-Website]
              Dreibholz, T., "Thomas Dreibholz's RSerPool Page",
              Online: http://www.iem.uni-due.de/~dreibh/rserpool/, 2016,
              <http://www.iem.uni-due.de/~dreibh/rserpool/>.

   [NorNet-Website]
              Dreibholz, T., "NorNet - A Real-World, Large-Scale Multi-
              Homing Testbed", Online: https://www.nntb.no/, 2017,
              <https://www.nntb.no/>.

Author's Address

   Thomas Dreibholz
   Simula Research Laboratory, Network Systems Group
   Martin Linges vei 17
   1364 Fornebu, Akershus
   Norway

   Phone: +47-6782-8200
   Fax:   +47-6782-8201
   Email: dreibh@simula.no
   URI:   http://www.iem.uni-due.de/~dreibh/





























Dreibholz               Expires February 7, 2018                [Page 9]