Network Working Group Greg Rabil
INTERNET DRAFT Mike Dooley
Arun Kapur
Quadritek Systems
Ralph Droms
Bucknell University
November 1997
Expires May 1998
DHCP Failover Protocol
<draft-ietf-dhc-failover-00.txt>
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''
To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).
Abstract
DHCP [RFC 2131] allows for multiple servers to be operating on a
single network. Some sites are interested in running multiple servers
in such a way so as to provide redundancy in case of server failure.
In order for this to work reliably, the servers must maintain a
consistent database of the lease information. This implies that
servers will need to coordinate any and all lease activity so that
this information is synchronized in case of failover.
This document defines a protocol to provide this synchronization
between two servers. One server is designated the ''primary'' server,
the other is the ''secondary'' server. Additionally, this document
describes a protocol for the automatic transfer of control from the
primary to the secondary in the case of failure (failover), as well
Rabil, Dooley, Kapur, Droms [Page 1]
DRAFT DHCP Failover Protocol November 1997
as the re-establishment of control by the primary server.
1.0 Introduction
As the use of DHCP servers in networked environments grows, the
dependency of those networks on the DHCP server increases. This is
particularly true of the hosts that receive their configuration
information from the DHCP server. Therefore, it is very important to
be able to provide reliable, continuous availability of DHCP
services.
This specification describes a protocol to support automatic failover
from a primary to its secondary server. The failover mechanism
allows the secondary server to perform DHCP actions while the primary
is down. Additionally, the protocol defines how control is passed
back to the primary when it becomes operational again.
In providing the specification for the failover, the protocol
specifies how to guarantee reliable delivery of changes to the
secondary. This is required to synchronize the secondary's lease
data with that of the primary. The protocol further specifies a
mechanism for determining the state (operational or not) of the
primary server. The secondary will be able to automatically service
DHCP requests upon failover. When the primary server becomes
available again, the secondary will convey any changes that occurred
since the time of failover back to the primary prior to the primary
becoming operational.
1.1 Requirements
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY" and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC 2119].
1.2 Terminology
This document uses the following terms:
o "DHCP client" or "client"
A DHCP client is an Internet host using DHCP to obtain
configuration parameters such as a network address.
o "DHCP server" or "server"
A DHCP server is an Internet host that returns configuration
Rabil, Dooley, Kapur, Droms [Page 2]
DRAFT DHCP Failover Protocol November 1997
parameters to DHCP clients.
o "primary server" or "primary"
A DHCP server configured to provide primary service to a set of
DHCP clients.
o "secondary server" or "secondary"
A DHCP server configured to act as a backup to a primary server;
the secondary answers requests from DHCP clients only if its
primary is unable to respond.
o "bindings database"
The collection of bindings managed by a primary and secondary.
2.0 Protocol Summary
The protocol necessary in providing redundant/failover servers can be
grouped in three areas:
o Messages to keep the secondary server's lease data synchronized
with that of the primary so that when failover occurs, there is no
degradation of service
o Messages that allow the secondary to determine the operational
state of the primary, so as to know when to start servicing DHCP
traffic
o Messages that are used to coordinate the primary regaining control
when it has become available again.
2.1 Primary keeps secondary lease data synchronized
The messages for keeping the secondary's lease data up to date
include the following:
DHCPBNDADD - Primary notifies secondary of new binding
DHCPBNDUPD - Primary notifies secondary of modified binding
(e.g., extended lease)
DHCPBNDDEL - Primary notifies secondary of deleted binding
(e.g., expired or released lease)
In response to any of the above messages, the secondary server will
respond to the primary with a message describing the status of the
binding addition, modification, or deletion.
Rabil, Dooley, Kapur, Droms [Page 3]
DRAFT DHCP Failover Protocol November 1997
DHCPBNDACK - Positive acknowledgment of binding change
DHCPBNDNAK - Negative acknowledgment of binding change
2.2 Determination of operational state of a server
In order to determine the state of a given server, a participant can
use the following message to poll (or "ping") the server:
DHCPPOLL - Check if server is operational
In response to the DHCPPOLL message, the participant will listen for
the following:
DHCPPRPL - Poll reply
2.3 Primary requests control from the secondary
After a failover, when the primary server is restarted, the following
messages are used to coordinate the primary taking control back from
the secondary:
DHCPCTLREQ - Request for control
DHCPCTLRET - Return of control initiated
DHCPCTLACK - Return of control completed
3 Message formats and semantics
The failover protocol messages are encoded as a DHCP/BOOTP option in
a DHCP message. A DHCP message carrying a failover protocol message
carries only the failover protocol message option and no other
options. The DHCP message is unicast from the source to the
destination.
The option code for these messages is TBD. Within each failover
protocol message, the specific message type is indicated by an option
subcode in the first octet of the data area of the option. The 'len'
field includes the number of octets in the option subcode byte and in
any additional data carried in the failover protocol message.
Bindings are encoded in a format that is TBD.
DISCUSSION
The use of the REQUEST/REPLY field in the DHCP message header and
the UDP port to be used needs to be considered.
The use of existing DHCP options and header fields to encode
Rabil, Dooley, Kapur, Droms [Page 4]
DRAFT DHCP Failover Protocol November 1997
bindings needs to be considered.
The sender places a 32-bit number in the DHCP header 'xid' field to
uniquely identify each failover protocol message. The receiver
copies the contents of the 'xid' field into any reply or
acknowledgment message.
The sender is responsible for reliable transmission and any
retransmission.
3.1 Primary keeps secondary lease data synchronized
DHCPBNDADD
------------------------------------------
| XX | len | 1 | Binding information (TBD)
------------------------------------------
The primary sends a DHCPBNDADD message to inform the secondary of a
binding that has been added to the primary's set of bindings.
DHCPBNDUPD
------------------------------------------
| XX | len | 2 | Binding information (TBD)
------------------------------------------
The primary sends a DHCPBNDUPD message to inform the secondary of a
binding that has been changed in the primary's set of bindings.
DHCPBNDDEL
------------------------------------------
| XX | len | 3 | Binding information (TBD)
------------------------------------------
The primary sends a DHCPBNDDEL message to inform the secondary of a
binding that has been deleted from the primary's set of bindings.
DHCPBNDACK
--------------
| XX | 1 | 4 |
--------------
The secondary sends a DHCPBNDACK message to the primary to inform the
primary that the binding change request identified by the 'xid' field
Rabil, Dooley, Kapur, Droms [Page 5]
DRAFT DHCP Failover Protocol November 1997
has successfully been completed.
DHCPBNDNAK
--------------
| XX | 1 | 5 |
--------------
The secondary sends a DHCPBNDNAK message to the primary to inform the
primary that the secondary could not complete the binding change
request. For example, the secondary would send a DHCPBNDNAK in
response to a DHCPBNDUPD request for which the secondary had no
recorded binding.
DISCUSSION
The use of an additional field to indicate the reason for the
DHCPBNDNAK message should be considered.
3.2 Determination of operational state of a server
DHCPPOLL
----------------------
| XX | 2 | 6 | flags |
----------------------
A DHCP participant sends a DHCPPOLL message to a server to determine
whether that server is currently operational.
A DHCP secondary periodically sends a DHCPPOLL to its primary to
determine if the primary is currently operational.
A DHCP primary sends a DHCPPOLL to its secondary if the primary needs
to determine if the secondary is operational.
A DHCP client sends a DHCPPOLL to a DHCP server to determine if the
server is currently operational.
The flags octet is defined as follows: CRRRRRRR, where the secondary
sets the 'C' bit to 1 to indicate that it has taken control of the
bindings database, and the 'R' bits are reserved for future use.
DHCPPRPL
----------------------
| XX | 2 | 7 | flags |
----------------------
Rabil, Dooley, Kapur, Droms [Page 6]
DRAFT DHCP Failover Protocol November 1997
A DHCP participant replies to a DHCPPOLL message with a DHCPPRPL
message. The sender copies the 'xid' field from the DHCPPOLL message
header into the 'xid' field in the DHCPPRPL message,
The flags octet is defined as follows: ERRRRRRR, where the primary
sets the 'E' bit to 1 (in response to a DHCPPOLL message with the 'C'
bit set to 1) to indicate to the secondary that the primary has not
relinquished control of the database. See section 4 for additional
details.
DISCUSSION
The DHCPPOLL and DHCPPRPL messages might also be useful to DHCP
clients to aid in determining the availability of specific DHCP
servers. Such use would avoid overloading the DHCPDISCOVER
message.
3.3 Primary requests control from the secondary
DHCPCTLREQ
--------------
| XX | 1 | 8 |
--------------
A primary sends a DHCPCTLREQ message to its secondary to request
control of the bindings database from the secondary.
DHCPCTLRET
--------------
| XX | 1 | 9 |
--------------
A secondary sends a DHCPCTLRET to its primary to begin the process of
returning control of the bindings database to the secondary. After
sending the DHCPCTLRET message, the secondary sends a sequence of
DHCPBNDADD, DHCPBNDUPD and DHCPBNDDEL messages to synchronize the
primary's bindings database with the secondary's database.
DHCPCTLACK
---------------
| XX | 1 | 10 |
---------------
A secondary sends a DHCPCTLACK to its primary to indicate that the
secondary has finished returning control to the primary.
Rabil, Dooley, Kapur, Droms [Page 7]
DRAFT DHCP Failover Protocol November 1997
DISCUSSION
Primary and secondary servers may need to exchange some additional
information in DHCPCTLREQ, DHCPCTLRET and DHCPCTLACK messages.
This information would be encoded in an additional 'flags' or
'data' field added to the control messages.
The synchronization essentially requires a reliable transmission
protocol using DHCPBND* and DHCPBNDACK messages. An alternative
to using DHCPBND* messages to transfer bindings updates to the
primary would be to devise a separate transfer protocol based on
TCP.
4 Exchange of control between primary and secondary
The primary and secondary servers coordinate the exchange control
over the bindings database through the use of DHCPPOLL and DHCPCTLREQ
messages. In normal operation:
o the primary sends notification of each change to its bindings
database to the secondary, and the secondary keeps its bindings
database synchronized with the primary's database
o the secondary periodically sends DHCPPOLL messages to the primary,
and the primary responds to each DHCPPOLL message with a DHCPPRPL
message
If the secondary does not receive a DHCPPRPL response message, the
secondary takes control of the bindings database and begins
answering requests from DHCP clients.
DISCUSSION
The conditions under which a secondary takes control of the
bindings database, e.g., the number of consecutive missing
acknowledgments, should be configurable in the secondary by the
DHCP administrator.
The secondary records any changes it makes to the bindings database
while it has control. The secondary continues to send DHCPPOLL
messages to the primary, with the 'D' bit set.
To regain control of the bindings database, e.g., after the primary
server has failed, the primary sends a DHCPCTLREQ message to the
secondary. The secondary stops answering DHCP client requests, and
responds to its primary with a DHCPCTLRET message. After sending
the DHCPCTLRET message, the secondary sends DHCPBND* messages for
each of the changes it has made to the bindings database. The
Rabil, Dooley, Kapur, Droms [Page 8]
DRAFT DHCP Failover Protocol November 1997
primary sends a DHCPBNDACK for each of the DHCPBND* messages it
receives. The secondary completes the transfer of control by
sending a DHCPCTLACK message to its primary.
If the primary server has not failed and has been answering DHCP
client requests, and receives a DHCPPOLL message from its secondary
with the 'D' bit set, then both the primary and the secondary have
been answering DHCP client requests, and their bindings databases
may be unsynchronized. In this situation, the primary responds to
the secondary with a DHCPPRPL message with the 'E' bit set. Both
the primary and secondary servers notify a network administrator,
who must take steps to manually resynchronize the two bindings
databases.
DISCUSSION
It may be appropriate to state that, under administrator
control, the primary and secondary both stop some or all DHCP
services when the servers discover that both have been
allocating DHCP addresses simultaneously and their databases are
potentially unsynchronized.
5 Acknowledgments
6 References
[RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", RFC 2119, March 1997.
[RFC 2131] Droms, R., "Dynamic Host Configuration Protocol",
RFC2131, March 1997.
[RFC 2132] Droms, R., "DHCP Options and BOOTP Vendor Extensions",
RFC2132, March 1997.
7 Security Considerations
8 Authors' Addresses
Greg Rabil, Mike Dooley, Arun Kapur
Quadritek Systems, Inc.
10 Valley Stream Parkway, Quite 240
Malvern, PA 19355
Phone: (800) 408-2747
E-mail: grabil@quadritek.com
mdooley@quadritek.com
akapur@quadritek.com
Rabil, Dooley, Kapur, Droms [Page 9]
DRAFT DHCP Failover Protocol November 1997
Ralph Droms
323 Dana Engineering
Bucknell University
Lewisburg, PA 17837
Phone: (717) 524-1145
E-mail: droms@bucknell.edu
Rabil, Dooley, Kapur, Droms [Page 10]