Network Working Group                                         Greg Rabil
INTERNET DRAFT                                               Mike Dooley
                                                              Arun Kapur
                                                       Quadritek Systems

                                                             Ralph Droms
                                                     Bucknell University

                                                           November 1997
                                                        Expires May 1998


                         DHCP Failover Protocol
                    <draft-ietf-dhc-failover-00.txt>

Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

Abstract

   DHCP [RFC 2131] allows for multiple servers to be operating on a
   single network. Some sites are interested in running multiple servers
   in such a way as to provide redundancy in case of server failure.
   In order for this to work reliably, the servers must maintain a
   consistent database of the lease information.  This implies that
   servers will need to coordinate any and all lease activity so that
   this information is synchronized in case of failover.

   This document defines a protocol to provide this synchronization
   between two servers.  One server is designated the "primary" server,
   the other is the "secondary" server.  Additionally, this document
   describes a protocol for the automatic transfer of control from the
   primary to the secondary in the case of failure (failover), as well
   as the re-establishment of control by the primary server.


1.0 Introduction

   As the use of DHCP servers in networked environments grows, the
   dependency of those networks on the DHCP server increases.  This is
   particularly true of the hosts that receive their configuration
   information from the DHCP server.  Therefore, it is very important to
   be able to provide reliable, continuous availability of DHCP
   services.

   This specification describes a protocol to support automatic failover
   from a primary to its secondary server.  The failover mechanism
   allows the secondary server to perform DHCP actions while the primary
   is down.  Additionally, the protocol defines how control is passed
   back to the primary when it becomes operational again.

   To support failover, the protocol specifies how to guarantee reliable
   delivery of binding changes to the secondary.  This is required to
   keep the secondary's lease data synchronized with that of the
   primary.  The protocol further specifies a mechanism for determining
   the state (operational or not) of the primary server.  The secondary
   will be able to automatically service DHCP requests upon failover.
   When the primary server becomes available again, the secondary will
   convey any changes that occurred since the time of failover back to
   the primary before the primary resumes control.

1.1 Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY" and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC 2119].

1.2 Terminology

   This document uses the following terms:


   o "DHCP client" or "client"

     A DHCP client is an Internet host using DHCP to obtain
     configuration parameters such as a network address.

   o "DHCP server" or "server"

     A DHCP server is an Internet host that returns configuration
     parameters to DHCP clients.

   o "primary server" or "primary"

     A DHCP server configured to provide primary service to a set of
     DHCP clients.

   o "secondary server" or "secondary"

     A DHCP server configured to act as a backup to a primary server;
     the secondary answers requests from DHCP clients only if its
     primary is unable to respond.

   o "bindings database"

     The collection of bindings managed by a primary and secondary.

2.0  Protocol Summary

   The messages needed to provide redundant (failover) servers can be
   grouped into three areas:

   o Messages to keep the secondary server's lease data synchronized
     with that of the primary so that when failover occurs, there is no
     degradation of service

   o Messages that allow the secondary to determine the operational
     state of the primary, so as to know when to start servicing DHCP
     traffic

   o Messages that are used to coordinate the primary regaining control
     when it has become available again.

2.1  Primary keeps secondary lease data synchronized

   The messages for keeping the secondary's lease data up to date
   include the following:

      DHCPBNDADD - Primary notifies secondary of new binding
      DHCPBNDUPD - Primary notifies secondary of modified binding
                   (e.g., extended lease)
      DHCPBNDDEL - Primary notifies secondary of deleted binding
                   (e.g., expired or released lease)

   The secondary server answers each of the above messages with a
   message describing the status of the binding addition, modification,
   or deletion.

      DHCPBNDACK - Positive acknowledgment of binding change
      DHCPBNDNAK - Negative acknowledgment of binding change


2.2  Determination of operational state of a server

   In order to determine the state of a given server, a participant can
   use the following message to poll (or "ping") the server:

      DHCPPOLL - Check if server is operational

   After sending a DHCPPOLL message, the participant listens for the
   following reply:

      DHCPPRPL - Poll reply


2.3  Primary requests control from the secondary

   After a failover, when the primary server is restarted, the following
   messages are used to coordinate the primary taking control back from
   the secondary:

      DHCPCTLREQ - Request for control
      DHCPCTLRET - Return of control initiated
      DHCPCTLACK - Return of control completed

3 Message formats and semantics

   The failover protocol messages are encoded as a DHCP/BOOTP option in
   a DHCP message.  A DHCP message carrying a failover protocol message
   carries only the failover protocol message option and no other
   options.  The DHCP message is unicast from the source to the
   destination.

   The option code for these messages is TBD.  Within each failover
   protocol message, the specific message type is indicated by an option
   subcode in the first octet of the data area of the option.  The 'len'
   field counts the subcode octet plus any additional data octets
   carried in the failover protocol message.
   Bindings are encoded in a format that is TBD.
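
   As a minimal sketch of this encoding, the following Python fragment
   builds and parses a failover protocol option.  The option code is TBD
   in this document, so the value given to FAILOVER_OPTION_CODE is a
   placeholder only, and the names MessageType, encode_failover_option
   and decode_failover_option are illustrative.  The subcode values are
   those shown in the message formats of sections 3.1 through 3.3.
   Binding information is treated as opaque octets because its format
   is TBD.

```python
from enum import IntEnum
from typing import Tuple

# Placeholder only: this document leaves the option code TBD ("XX").
FAILOVER_OPTION_CODE = 250


class MessageType(IntEnum):
    """Failover message subcodes, as shown in sections 3.1 through 3.3."""
    DHCPBNDADD = 1
    DHCPBNDUPD = 2
    DHCPBNDDEL = 3
    DHCPBNDACK = 4
    DHCPBNDNAK = 5
    DHCPPOLL = 6
    DHCPPRPL = 7
    DHCPCTLREQ = 8
    DHCPCTLRET = 9
    DHCPCTLACK = 10


def encode_failover_option(msg_type: MessageType, data: bytes = b"") -> bytes:
    """Return option code, 'len', subcode and data as one octet string."""
    length = 1 + len(data)  # subcode octet plus any additional data
    if length > 255:
        raise ValueError("option data does not fit in a one-octet 'len'")
    return bytes([FAILOVER_OPTION_CODE, length, msg_type]) + data


def decode_failover_option(option: bytes) -> Tuple[MessageType, bytes]:
    """Split an encoded option into (message type, additional data)."""
    if len(option) < 3 or option[0] != FAILOVER_OPTION_CODE:
        raise ValueError("not a failover protocol option")
    length = option[1]
    if length < 1 or len(option) != 2 + length:
        raise ValueError("'len' does not match the option data")
    return MessageType(option[2]), option[3:]
```

   With these definitions, a DHCPBNDACK is encoded as three octets
   (option code, a 'len' of 1, and subcode 4), matching the diagram in
   section 3.1.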

   DISCUSSION

      The use of the REQUEST/REPLY field in the DHCP message header and
      the UDP port to be used needs to be considered.

      The use of existing DHCP options and header fields to encode
      bindings needs to be considered.

   The sender places a 32-bit number in the DHCP header 'xid' field to
   uniquely identify each failover protocol message.  The receiver
   copies the contents of the 'xid' field into any reply or
   acknowledgment message.

   The sender is responsible for reliable transmission and any
   retransmission.
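
   This document does not specify a retransmission strategy.  One
   possible sketch, assuming a fixed retransmission interval and a limit
   on the number of transmissions (both invented here for illustration),
   keeps a table of unacknowledged messages keyed by 'xid' and removes
   an entry when a reply carrying the same 'xid' arrives.  The class and
   method names are hypothetical.

```python
import itertools
from typing import Dict, List, Tuple


class PendingMessages:
    """Track failover messages that have not yet been acknowledged."""

    def __init__(self, interval: float = 2.0, max_tries: int = 4) -> None:
        self.interval = interval      # seconds between transmissions (assumed)
        self.max_tries = max_tries    # total transmissions allowed (assumed)
        self._next_xid = itertools.count(1)
        # xid -> (payload, time of last transmission, transmissions so far)
        self._pending: Dict[int, Tuple[bytes, float, int]] = {}

    def send(self, payload: bytes, now: float) -> int:
        """Record a first transmission and return the xid assigned to it."""
        xid = next(self._next_xid) & 0xFFFFFFFF   # 'xid' is a 32-bit number
        self._pending[xid] = (payload, now, 1)
        return xid

    def acknowledge(self, xid: int) -> bool:
        """Handle a reply; return False if its xid matches nothing pending."""
        return self._pending.pop(xid, None) is not None

    def due(self, now: float) -> Tuple[List[int], List[int]]:
        """Return (xids to retransmit now, xids that have been given up)."""
        resend: List[int] = []
        failed: List[int] = []
        for xid, (payload, sent_at, tries) in list(self._pending.items()):
            if now - sent_at < self.interval:
                continue
            if tries >= self.max_tries:
                del self._pending[xid]
                failed.append(xid)
            else:
                self._pending[xid] = (payload, now, tries + 1)
                resend.append(xid)
        return resend, failed
```

   The caller passes in the current time, which keeps the sketch free of
   any particular timer or socket interface.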

3.1 Primary keeps secondary lease data synchronized

   DHCPBNDADD

      ------------------------------------------
      | XX | len | 1 | Binding information (TBD)
      ------------------------------------------

   The primary sends a DHCPBNDADD message to inform the secondary of a
   binding that has been added to the primary's set of bindings.

   DHCPBNDUPD

      ------------------------------------------
      | XX | len | 2 | Binding information (TBD)
      ------------------------------------------

   The primary sends a DHCPBNDUPD message to inform the secondary of a
   binding that has been changed in the primary's set of bindings.

   DHCPBNDDEL

      ------------------------------------------
      | XX | len | 3 | Binding information (TBD)
      ------------------------------------------

   The primary sends a DHCPBNDDEL message to inform the secondary of a
   binding that has been deleted from the primary's set of bindings.


   DHCPBNDACK

      --------------
      | XX | 1 | 4 |
      --------------

   The secondary sends a DHCPBNDACK message to the primary to inform the
   primary that the binding change request identified by the 'xid' field
   has successfully been completed.

   DHCPBNDNAK

      --------------
      | XX | 1 | 5 |
      --------------

   The secondary sends a DHCPBNDNAK message to the primary to inform the
   primary that the secondary could not complete the binding change
   request.  For example, the secondary would send a DHCPBNDNAK in
   response to a DHCPBNDUPD request for which the secondary had no
   recorded binding.
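
   The secondary's handling of binding changes might be sketched as
   follows.  Because the binding format is TBD, a binding is modeled
   here as opaque octets stored under an address string; that model and
   the function name are assumptions.  Only the DHCPBNDNAK case for an
   update to an unrecorded binding comes from this document.  Treating a
   repeated add or delete as success, and rejecting an unknown change
   type, are choices made for this sketch.

```python
from typing import Dict

DHCPBNDADD, DHCPBNDUPD, DHCPBNDDEL = 1, 2, 3
DHCPBNDACK, DHCPBNDNAK = 4, 5


def apply_binding_change(bindings: Dict[str, bytes], subcode: int,
                         address: str, info: bytes = b"") -> int:
    """Apply one change to the secondary's bindings; return ACK or NAK."""
    if subcode == DHCPBNDADD:
        bindings[address] = info          # assumed: a repeated add is harmless
        return DHCPBNDACK
    if subcode == DHCPBNDUPD:
        if address not in bindings:       # the NAK example given in the text
            return DHCPBNDNAK
        bindings[address] = info
        return DHCPBNDACK
    if subcode == DHCPBNDDEL:
        bindings.pop(address, None)       # assumed: a repeated delete is harmless
        return DHCPBNDACK
    return DHCPBNDNAK                     # assumed: unknown change type
```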

   DISCUSSION
      The use of an additional field to indicate the reason for the
      DHCPBNDNAK message should be considered.

3.2  Determination of operational state of a server

   DHCPPOLL

      ----------------------
      | XX | 2 | 6 | flags |
      ----------------------

   A DHCP participant sends a DHCPPOLL message to a server to determine
   whether that server is currently operational.

   A DHCP secondary periodically sends a DHCPPOLL to its primary to
   determine if the primary is currently operational.

   A DHCP primary sends a DHCPPOLL to its secondary if the primary needs
   to determine if the secondary is operational.

   A DHCP client sends a DHCPPOLL to a DHCP server to determine if the
   server is currently operational.

   The flags octet is defined as follows: CRRRRRRR, where the secondary
   sets the 'C' bit to 1 to indicate that it has taken control of the
   bindings database, and the 'R' bits are reserved for future use.

   DHCPPRPL

      ----------------------
      | XX | 2 | 7 | flags |
      ----------------------

   A DHCP participant replies to a DHCPPOLL message with a DHCPPRPL
   message.  The sender copies the 'xid' field from the DHCPPOLL message
   header into the 'xid' field in the DHCPPRPL message.

   The flags octet is defined as follows: ERRRRRRR, where the primary
   sets the 'E' bit to 1 (in response to a DHCPPOLL message with the 'C'
   bit set to 1) to indicate to the secondary that the primary has not
   relinquished control of the database.  See section 4 for additional
   details.
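
   The flags octets of DHCPPOLL and DHCPPRPL could be handled as in the
   sketch below.  This document gives the layouts as CRRRRRRR and
   ERRRRRRR without stating the bit order; the sketch assumes that the
   leftmost letter is the most significant bit (0x80).  The helper names
   are hypothetical.

```python
# Bit positions are an assumption: leftmost letter = most significant bit.
C_BIT = 0x80  # in DHCPPOLL: the secondary has taken control
E_BIT = 0x80  # in DHCPPRPL: the primary has not relinquished control


def make_flags(bit_set: bool, bit: int) -> int:
    """Build a flags octet; the seven reserved 'R' bits are sent as zero."""
    return bit if bit_set else 0


def has_flag(flags: int, bit: int) -> bool:
    """Test one flag, ignoring the reserved bits of a received octet."""
    return bool(flags & bit)
```

   Ignoring the reserved bits on receipt leaves room for later
   definitions of those bits.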

   DISCUSSION

      The DHCPPOLL and DHCPPRPL messages might also be useful to DHCP
      clients to aid in determining the availability of specific DHCP
      servers.  Such use would avoid overloading the DHCPDISCOVER
      message.

3.3  Primary requests control from the secondary

   DHCPCTLREQ

      --------------
      | XX | 1 | 8 |
      --------------

   A primary sends a DHCPCTLREQ message to its secondary to request
   control of the bindings database from the secondary.

   DHCPCTLRET

      --------------
      | XX | 1 | 9 |
      --------------

   A secondary sends a DHCPCTLRET to its primary to begin the process of
   returning control of the bindings database to the primary.  After
   sending the DHCPCTLRET message, the secondary sends a sequence of
   DHCPBNDADD, DHCPBNDUPD and DHCPBNDDEL messages to synchronize the
   primary's bindings database with the secondary's database.

   DHCPCTLACK

      ---------------
      | XX | 1 | 10 |
      ---------------

   A secondary sends a DHCPCTLACK to its primary to indicate that the
   secondary has finished returning control to the primary.

   DISCUSSION

      Primary and secondary servers may need to exchange some additional
      information in DHCPCTLREQ, DHCPCTLRET and DHCPCTLACK messages.
      This information would be encoded in an additional 'flags' or
      'data' field added to the control messages.

      The synchronization essentially requires a reliable transmission
      protocol using DHCPBND* and DHCPBNDACK messages.  An alternative
      to using DHCPBND* messages to transfer bindings updates to the
      primary would be to devise a separate transfer protocol based on
      TCP.

4 Exchange of control between primary and secondary

   The primary and secondary servers coordinate the exchange of control
   over the bindings database through the use of DHCPPOLL and DHCPCTLREQ
   messages.  In normal operation:

   o the primary sends notification of each change to its bindings
     database to the secondary, and the secondary keeps its bindings
     database synchronized with the primary's database

   o the secondary periodically sends DHCPPOLL messages to the primary,
     and the primary responds to each DHCPPOLL message with a DHCPPRPL
     message

     If the secondary does not receive a DHCPPRPL response message, the
     secondary takes control of the bindings database and begins
     answering requests from DHCP clients.

     DISCUSSION

        The conditions under which a secondary takes control of the
        bindings database, e.g., the number of consecutive missing
        acknowledgments, should be configurable in the secondary by the
        DHCP administrator.
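
     A configurable takeover condition of this kind might be sketched as
     a counter of consecutive unanswered DHCPPOLL messages.  The
     threshold of three, the class name and the method names are
     assumptions made for illustration.

```python
class PollMonitor:
    """Decide when the secondary should take control of the bindings database."""

    def __init__(self, max_missed: int = 3) -> None:
        self.max_missed = max_missed   # set by the administrator (value assumed)
        self.missed = 0                # consecutive polls without a DHCPPRPL
        self.in_control = False

    def reply_received(self) -> None:
        """A DHCPPRPL arrived for the outstanding DHCPPOLL."""
        self.missed = 0

    def reply_timed_out(self) -> bool:
        """No DHCPPRPL arrived in time; return True if control is taken now."""
        self.missed += 1
        if not self.in_control and self.missed >= self.max_missed:
            self.in_control = True
            return True
        return False
```

     In this sketch a later DHCPPRPL does not return control to the
     primary by itself; control is handed back only through the
     DHCPCTLREQ exchange.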

     The secondary records any changes it makes to the bindings database
     while it has control.  The secondary continues to send DHCPPOLL
     messages to the primary, with the 'C' bit set.

     To regain control of the bindings database, e.g., after recovering
     from a failure, the primary sends a DHCPCTLREQ message to the
     secondary.  The secondary stops answering DHCP client requests, and
     responds to its primary with a DHCPCTLRET message.  After sending
     the DHCPCTLRET message, the secondary sends DHCPBND* messages for
     each of the changes it has made to the bindings database.  The
     primary sends a DHCPBNDACK for each of the DHCPBND* messages it
     receives.  The secondary completes the transfer of control by
     sending a DHCPCTLACK message to its primary.
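
     The order of messages sent by the secondary during the return of
     control can be sketched as below.  The change log is modeled as a
     list of (subcode, binding data) pairs recorded while the secondary
     had control; that representation and the function name are
     assumptions, and the binding data stays opaque because its format
     is TBD.

```python
from typing import List, Tuple

DHCPBNDADD, DHCPBNDUPD, DHCPBNDDEL = 1, 2, 3
DHCPCTLRET, DHCPCTLACK = 9, 10

# One message: (subcode, binding data in a format that is TBD).
Message = Tuple[int, bytes]


def return_control(change_log: List[Message]) -> List[Message]:
    """List, in order, what a secondary sends after receiving a DHCPCTLREQ."""
    messages: List[Message] = [(DHCPCTLRET, b"")]
    for subcode, binding in change_log:
        if subcode not in (DHCPBNDADD, DHCPBNDUPD, DHCPBNDDEL):
            raise ValueError(f"not a binding change: {subcode}")
        messages.append((subcode, binding))
    messages.append((DHCPCTLACK, b""))
    return messages
```

     The sketch shows ordering only.  A real secondary would presumably
     wait for the DHCPBNDACK matching each DHCPBND* message before
     sending the DHCPCTLACK.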

     If the primary server has not failed and has been answering DHCP
     client requests, and receives a DHCPPOLL message from its secondary
     with the 'C' bit set, then both the primary and the secondary have
     been answering DHCP client requests, and their bindings databases
     may be unsynchronized.  In this situation, the primary responds to
     the secondary with a DHCPPRPL message with the 'E' bit set.  Both
     the primary and secondary servers notify a network administrator,
     who must take steps to manually resynchronize the two bindings
     databases.
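
     The detection of simultaneous control described above might be
     sketched as follows.  The bit positions (0x80 for both 'C' and
     'E'), the function names and the notify_admin callback are
     assumptions; this document does not say how the administrator is
     notified.

```python
from typing import Callable

# Bit positions are an assumption: leftmost letter = most significant bit.
C_BIT = 0x80  # DHCPPOLL: the secondary has taken control
E_BIT = 0x80  # DHCPPRPL: the primary has not relinquished control


def primary_handle_poll(poll_flags: int, primary_serving: bool,
                        notify_admin: Callable[[str], None]) -> int:
    """Return the flags octet for the primary's DHCPPRPL."""
    if primary_serving and poll_flags & C_BIT:
        notify_admin("primary: secondary has also been answering clients")
        return E_BIT
    return 0


def secondary_handle_reply(reply_flags: int,
                           notify_admin: Callable[[str], None]) -> bool:
    """Return True if the reply shows both servers have been in control."""
    if reply_flags & E_BIT:
        notify_admin("secondary: primary has not relinquished control")
        return True
    return False
```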

     DISCUSSION

        It may be appropriate to state that, under administrator
        control, the primary and secondary both stop some or all DHCP
        services when the servers discover that both have been
        allocating DHCP addresses simultaneously and their databases are
        potentially unsynchronized.

5 Acknowledgments

6 References

     [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
     Requirement Levels", RFC 2119, March 1997.

     [RFC 2131] Droms, R., "Dynamic Host Configuration Protocol",
     RFC 2131, March 1997.

     [RFC 2132] Alexander, S. and R. Droms, "DHCP Options and BOOTP
     Vendor Extensions", RFC 2132, March 1997.

7 Security Considerations

8 Authors' Addresses

     Greg Rabil, Mike Dooley, Arun Kapur
     Quadritek Systems, Inc.
     10 Valley Stream Parkway, Suite 240
     Malvern, PA 19355

     Phone:  (800) 408-2747
     E-mail: grabil@quadritek.com
             mdooley@quadritek.com
             akapur@quadritek.com

     Ralph Droms
     323 Dana Engineering
     Bucknell University
     Lewisburg, PA 17837

     Phone:  (717) 524-1145
     E-mail: droms@bucknell.edu