Network Working Group Ralph Droms
INTERNET DRAFT Bucknell University
Greg Rabil
Mike Dooley
Arun Kapur
Quadritek Systems
Kim Kinnear
Mark Stapp
Cisco Systems
Steve Gonczi
Bernie Volz
Process Software
November 1998
Expires June 1999
DHCP Failover Protocol
<draft-ietf-dhc-failover-03.txt>
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
Europe), ftp.nic.it (Southern Europe), munnari.oz.au (Pacific Rim),
ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).
Abstract
DHCP [RFC 2131] allows for multiple servers to be operating on a
single network. Some sites are interested in running multiple servers
in such a way as to provide redundancy in case of server failure.
In order for this to work reliably, the cooperating primary and
Droms, et. al. [Page 1]
DRAFT November 1998
secondary servers must maintain a consistent database of the lease
information. This implies that servers will need to coordinate any
and all lease activity so that this information is synchronized in
case of failover.
This document defines a protocol to provide this synchronization
between two servers. One server is designated the "Primary" server,
the other is the "Secondary" server. Additionally, this document
describes a protocol for the automatic transfer of control from the
primary to the secondary in the case of server failure (failover) or
network partition.
This document further develops the concepts presented in draft-ietf-
dhc-failover-02.txt.
1. Introduction
As the use of DHCP servers in networked environments grows, the
dependency of those networks on the DHCP server increases. This is
particularly true of the hosts that receive their configuration
information from the DHCP server. Therefore, it is very important to
be able to provide reliable, continuous availability of DHCP ser-
vices.
This specification describes a protocol to support automatic failover
from a primary to its secondary server. The failover mechanism
allows the secondary server to perform DHCP actions while the primary
is down, or when a network failure prevents the primary and secondary
from communicating. The protocol also specifies how reintegration is
achieved when the primary again becomes operational or when the pri-
mary and secondary can again communicate.
In providing the specification for the failover, the protocol speci-
fies how to guarantee reliable delivery of binding changes to the
partner server. This is required to synchronize lease data between
the primary and the secondary. The protocol further specifies a
mechanism to allow either server to determine if it can communicate
with its partner. The secondary will automatically begin to service
DHCP requests whenever it cannot communicate with the primary. When
the primary server becomes available again, the secondary will convey
any changes that occurred since the time of failover back to the pri-
mary.
Through careful control of the difference between the lease times
offered to DHCP clients and the lease time known by the secondary
server, the protocol allows the primary to communicate with the
secondary after the primary has completed communication with the DHCP
client (a technique known as "lazy" update) and still guarantee that
duplicate IP address allocations do not occur. Thus, the protocol
does not directly impact the ability of a DHCP server to respond to
DHCP client requests.
1.1. Requirements Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC 2119].
1.2. DHCP Terminology
This document uses the following terms:
o "DHCP client" or "client"
A DHCP client is an Internet host using DHCP to obtain confi-
guration parameters such as a network address.
o "DHCP server" or "server"
A DHCP server is an Internet host that returns configuration
parameters to DHCP clients.
o "binding"
A binding is a collection of configuration parameters, including
at least an IP address, associated with or "bound to" a DHCP
client. Bindings are managed by DHCP servers.
o "binding database"
The collection of bindings managed by a primary and secondary.
o "subnet address pool"
A subnet address pool is the set of IP addresses associated with a
particular network number and subnet mask. In the
simple case, there is a single network number and subnet mask
and a set of IP addresses. In the more complex case (sometimes
called "secondary subnets", sometimes "superscopes"), several
(apparently unrelated) network number and subnet mask combina-
tions with their associated IP addresses may all be configured
together into one subnet address pool.
o "Primary server" or "Primary"
A DHCP server configured to provide primary service to a set of
DHCP clients for a particular set of subnet address pools.
o "Secondary server" or "Secondary"
A DHCP server configured to act as backup to a primary server
for a particular set of subnet address pools.
o "stable storage"
Every DHCP server is assumed to have some form of what is called
"stable storage". Stable storage is used to hold information
concerning IP address bindings (among other things) so that this
information is not lost in the event of a server failure which
requires restart of the server.
1.3. Requirements for this protocol
The following requirements must be (and are) met by this protocol.
1. Implementations of this protocol must work with existing DHCP
client implementations based on the DHCP protocol [RFC 2131].
2. Implementations of the protocol must work with existing BOOTP
relay implementations.
3. The protocol must provide failover redundancy between servers
that are not located on the same subnet.
1.4. Goals for this protocol
1. Provide for continued service to DHCP clients through an
automated mechanism in the event of failure of the primary
server.
2. Avoid binding an IP address to a client while that binding is
currently valid for another client. In other words, do not
allocate the same IP address to two clients.
3. Minimize any need for manual administrative intervention.
4. Introduce no additional delays in server response time as a
result of the communications required to implement the Fail-
over protocol.
5. Share IP address ranges between primary and secondary servers;
i.e., impose no requirement that the pool of available
addresses be divided between servers.
6. Continue to meet the goals and objectives of this protocol in
the event of server failure or network partition.
7. Provide graceful reintegration of full protocol service after
server failure or network partition.
8. Allow for one computer to act as a secondary server for multiple
primary servers. Other topologies (e.g., mesh) are also
possible. Primary and secondary servers SHOULD be viewed as
"logical" servers and not necessarily physical computers.
9. Ensure that an existing client can keep its existing IP
address binding if it can communicate with either the primary
or secondary DHCP server implementing this protocol, not just
the server that originally offered it the binding.
10. Ensure that a new client can get an IP address from some
server. Ensure that in the face of partition, where servers
continue to run but cannot communicate with each other, the
above goals and requirements may be met. In addition, when the
partition condition is removed, allow graceful automatic re-
integration without requiring human intervention.
11. If either primary or secondary server loses all of the information
that it has stored in stable storage, it should be able
to refresh its stable storage from the other server.
1.5. Limitations of this Protocol
The following are explicit limitations of this protocol.
1. Under normal operation, only one server at a time will hand
out new IP addresses, but client lease renewals are serviced
by both servers; the protocol provides reliability through
redundancy and some degree of load balancing of lease
renewals.
2. This protocol provides only one level of redundancy through a
single secondary server for each primary server.
3. The protocol provides a way to detect when the primary and
secondary server cannot communicate, but once this condition
has been detected, does not (indeed, cannot) provide any way
to further distinguish between network failure and failure of
one of the servers. The protocol allows detection of an ord-
erly shutdown of a participating server.
4. A subset of the address pool is reserved for secondary server
use. In order to handle the failure case where both servers
are able to communicate with DHCP clients, but unable to com-
municate with each other, a subset of the IP address pool must
be set aside as a private address pool for the secondary
server. The secondary can use these to service newly arrived
DHCP clients during such a period. The size of this private
pool SHOULD be based only on the arrival rate of new DHCP
clients and the length of expected down-time, and is not
influenced in any way by the total number of DHCP clients sup-
ported by the server pair.
5. The primary and secondary servers do not respond to client
requests at all while recovering from a failure that could
have resulted in duplicate IP assignments (that is, while
synchronizing in POTENTIAL-CONFLICT state).
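The sizing rule in limitation 4 amounts to simple arithmetic. The following Python sketch is illustrative only: the function name and the choice of hours as the time unit are assumptions, and the draft does not prescribe a formula beyond naming the two inputs.

```python
import math


def private_pool_size(new_clients_per_hour, expected_downtime_hours):
    """Estimate the secondary's private pool size.

    Depends only on the arrival rate of new clients and the expected
    down-time, not on the total number of clients served by the pair.
    """
    return math.ceil(new_clients_per_hour * expected_downtime_hours)


# 2.5 new clients per hour over an 8 hour outage.
pool = private_pool_size(2.5, 8)
```

A site would normally add a safety margin on top of this estimate; the size of that margin is a local policy decision.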
2. Protocol Operations
The protocol features a small number of messages to communicate
binding information and operational status, and to manage various
disconnect-reconnect scenarios between servers.
2.1. Message Addressing and Configuration granularity
When discussing messages, an important question is "to whom are mes-
sages sent" and "from whom are messages sent". What is the address-
able entity from which and to which messages are sent?
At one level, this would seem to be a single DHCP server, but in fact
there are many situations where additional flexibility in configura-
tion is useful. For instance, there might be several servers which
are each primary for a distinct set of address pools, and one server
which is secondary for all of those address pools. The situation
with the primaries is straightforward, but the secondary will need to
maintain a separate failover state, partner state, and communications
up/down status for each of the separate primary servers for which it
is acting as a secondary.
The protocol allows for there to be a unique failover entity per
partner per role (where role is primary or secondary). This failover
entity can take actions and hold unique states. There are thus a
maximum of two failover entities per partner (one for the partner as
a primary and one for that same partner as a secondary.)
Thus, in the case where there are two primary servers A and B each
backed up by a single common secondary server C, there is one fail-
over entity on each of A and B, and two different failover entities
on C. The two different failover entities on C each have unique
states and message xid ranges. As far as the protocol described in
this draft is concerned, they constitute different "servers",
although they are certainly part of one server (as the term is com-
monly used) if they reside in the same process.
It is not the case that there is subnet granularity for each failover
entity. On one server, there is one failover entity per "partner-
role", regardless of how many subnets or address pools are managed by
that combination of partner and role. Conversely, any given subnet
or pool will be associated with exactly one failover entity on a sin-
gle server (but it will also be associated with the corresponding
partner's failover entity.)
When a message is received from the partner, the unique failover
entity to which the message is directed is determined solely by the
IP address of the partner and the setting of the SECONDARY bit in the
'flags' field of the message header.
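The lookup described above can be sketched in Python as follows. The FailoverEntity class, the entities table, and find_entity are hypothetical names used only for illustration; the draft fixes only the lookup key (the partner's IP address plus the SECONDARY bit of the received message). The addresses are documentation examples.

```python
from dataclasses import dataclass

SECONDARY_FLAG = 0x80  # bit 7 (MSB) of the 'flags' header field


@dataclass
class FailoverEntity:
    """Hypothetical per-partner, per-role state holder."""
    partner_ip: str
    local_role: str          # "primary" or "secondary"
    state: str = "STARTUP"


def entity_key(partner_ip, flags):
    # The SECONDARY bit describes the sender's role, so together with the
    # sender's address it selects exactly one local failover entity.
    return (partner_ip, bool(flags & SECONDARY_FLAG))


# Server C backs up primaries A and B: two entities, one per partner.
entities = {
    ("192.0.2.1", False): FailoverEntity("192.0.2.1", "secondary"),
    ("192.0.2.2", False): FailoverEntity("192.0.2.2", "secondary"),
}


def find_entity(partner_ip, flags):
    """Return the entity addressed by a received message, or None."""
    return entities.get(entity_key(partner_ip, flags))
```

A message from 192.0.2.1 with the SECONDARY bit set finds no entity here, because server C holds no primary role toward that partner.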
Throughout this document, the states and actions taken by "servers"
are described. The terms "server", "primary server", and "secondary
server" are commonly used to describe the entity taking these states
and taking actions. This description is wholly accurate only for the
simplest of cases, where all of the address pools on one server are
backed up by all of the address pools on another server. In this
case, there is a "true" primary and secondary server. In all other
cases, the term "server" is used to describe one of the two possible
failover entities per partner.
2.2. Packet transport
All messages sent by this protocol are sent in UDP packets. All mes-
sages are unicast from the sender to the receiver. The next section
discusses the port to use when sending DHCP failover UDP packets.
DISCUSSION:
See section 8, Extended discussion #1, for a discussion of the
reasons to use UDP as the protocol.
2.3. Port usage
Compliant servers SHOULD use port 647 (assigned to dhcp-failover by
IANA) for sending and receiving Failover protocol messages, though
they MAY be configured to use a different port (including ports 67 or
68).
Since the use of port 67 and 68 is allowed, the messages are format-
ted in such a way that they can be distinguished from DHCP or BOOTP
messages by the use of distinct message 'op' codes. Note that send-
ing failover messages on port 67 to servers not designed to support
them may not only not work, but may cause those servers to operate
incorrectly or to crash.
DISCUSSION:
Some implementors have a strong requirement for using a separate
port for the Failover protocol, and the use of the allocated port
647 will accommodate them. Some other implementors seem equally
committed to allowing failover packets to be sent to the standard
DHCP port, port 67. The above language strongly suggests that the
failover port be used (by using SHOULD), but leaves open the pos-
sibility of using the standard DHCP port (or any other) for
servers designed to operate in that fashion.
2.4. Time synchronization between communicating servers
Each Binding update message carries a "sent time stamp" (the time
when the message was sent in GMT). This provides a simple mechanism
to determine any "time drift" between communicating servers.
DISCUSSION:
If a UDP packet is successfully transmitted (i.e., it does not get
lost), the packet travel time is negligible in the framework of
DHCP leases. By providing a GMT "sent time" stamp, the recipient
can compare this with its notion of the current GMT time at the
time it receives the packet. The difference (plus the packet
travel time, which we ignore) is the time drift. The recipient
MUST use this time drift value to bias "absolute time" values it
receives from the sender.
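The drift computation can be sketched in Python as follows. The function names are illustrative assumptions; the arithmetic follows the definition in this section (arrival time minus sent time, with packet travel time ignored).

```python
import time


def time_drift(sent_timestamp, arrive_time=None):
    """Drift = arrival time minus the sender's 'sent' time stamp, in seconds.

    Packet travel time is ignored, as this section assumes it is negligible.
    """
    if arrive_time is None:
        arrive_time = int(time.time())
    return arrive_time - sent_timestamp


def to_local_time(sender_absolute_time, drift):
    """Bias an absolute time received from the partner into local time."""
    return sender_absolute_time + drift


# The partner's clock runs 30 seconds behind the local clock.
drift = time_drift(sent_timestamp=1_000_000, arrive_time=1_000_030)
lease_expiry_local = to_local_time(1_003_600, drift)
```

A negative drift simply means the partner's clock is ahead of the local clock; the same correction applies.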
2.5. Failover Protocol Messages
The Failover protocol messages are sent using UDP and encoded using a
packet format specific to the Failover protocol. To allow easy
recognition of and separation of Failover protocol messages from
BOOTP and DHCP messages, BOOTP packet 'op' field values 3..11 are
used to indicate various Failover protocol message types. A Failover
protocol message is always unicast from the source to the destination
using the port defined in section 2.3. The sender, and never the
recipient, is responsible for retransmission when necessary.
2.6. Failover protocol packet header format
All of the fields in the fixed portion of the packet MUST be filled
with correct data in every message sent.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     op (1)    |    rev (1)    |      payload offset (2)       |
+---------------+---------------+---------------+---------------+
|                            xid (4)                            |
+---------------------------------------------------------------+
|             sending server ID ( IP address ) (4)              |
+---------------------------------------------------------------+
|                        time stamp (4)                         |
+---------------------------------------------------------------+
|   state (1)   |   flags(1)    |         reserved (2)          |
+---------------+---------------+---------------+---------------+
|         0 or more additional header bytes (variable)          |
+---------------------------------------------------------------+
|         Payload Data, formatted as DHCP-style options         |
|         (although using a unique option number space)         |
|                          (variable)                           |
+---------------------------------------------------------------+
op - 1 byte
These values extend the number space of the existing BOOTP message
type "Op" field.
The following message types are defined:
Value   Message Type
-----   ------------
  0     reserved to BOOTP/DHCP, unused by failover
  1     BOOTREQUEST        (reserved to BOOTP/DHCP, unused by failover)
  2     BOOTREPLY          (reserved to BOOTP/DHCP, unused by failover)
  3     DHCPPOOLREQ        request allocation of addresses
  4     DHCPPOOLRESP       respond with allocation count
  5     DHCPBNDUPD         update partner with binding info
  6     DHCPBNDACK         acknowledge receipt of binding update
  7     DHCPPOLL           probe partner for comm. integrity
  8     DHCPPRPL           acknowledge comm. integrity
  9     DHCPUPDATEREQALL   request full transfer of binding info
 10     DHCPUPDATEDONE     ack send and ack of req'd binding info
 11     DHCPUPDATEREQ      req transfer of un-acked binding info
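Because the 'op' values share one number space with BOOTP, a receiver listening on port 67 can separate the two kinds of traffic by the first byte alone. The following Python sketch shows one way to do so; the FailoverOp and classify names are illustrative assumptions, while the numeric values are those in the table above.

```python
from enum import IntEnum


class FailoverOp(IntEnum):
    DHCPPOOLREQ = 3
    DHCPPOOLRESP = 4
    DHCPBNDUPD = 5
    DHCPBNDACK = 6
    DHCPPOLL = 7
    DHCPPRPL = 8
    DHCPUPDATEREQALL = 9
    DHCPUPDATEDONE = 10
    DHCPUPDATEREQ = 11


def classify(packet):
    """Classify a datagram by its first byte (the 'op' field)."""
    if not packet:
        return "invalid"
    op = packet[0]
    if op in (1, 2):
        return "bootp/dhcp"          # BOOTREQUEST or BOOTREPLY
    try:
        return FailoverOp(op).name   # one of the failover message types
    except ValueError:
        return "unknown"             # 0, or anything above 11
```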
rev - 1 byte
Failover protocol version supported. Set to 1 for the Failover
protocol described in this draft. The value 255 is reserved for
experimental implementations. Such implementations SHOULD use the
DHCP Vendor Class option to recognize a partner server which is using
the same vendor's experimental implementation.
payload offset - 2 bytes, network byte order
The byte offset of the Payload area, from the beginning of the
Failover packet header. The value for the current protocol version is
20.
xid - 4 bytes, network byte order
The sender of a Failover protocol packet is responsible for setting
this number, and the receiver of the packet copies the number over
into any response packet, treating it as opaque data. The sender
SHOULD ensure that every packet sent to a particular IP address and
port combination has a unique transaction id unless that packet is a
re-transmission.
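One simple way to meet the uniqueness recommendation is a counter per destination. The Python sketch below is an assumption about implementation strategy, not something the draft requires; the XidAllocator name and the wrap at 32 bits are illustrative choices. A retransmission reuses the xid already assigned to the original packet and does not call the allocator again.

```python
import itertools
from collections import defaultdict


class XidAllocator:
    """Hands out transaction ids that are unique per (IP address, port)."""

    def __init__(self):
        # One independent counter per destination, starting at 1.
        self._counters = defaultdict(lambda: itertools.count(1))

    def next_xid(self, ip, port):
        # Keep the value within the 4-byte 'xid' header field.
        return next(self._counters[(ip, port)]) & 0xFFFFFFFF


allocator = XidAllocator()
first = allocator.next_xid("192.0.2.1", 647)
second = allocator.next_xid("192.0.2.1", 647)
other = allocator.next_xid("192.0.2.2", 647)
```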
sending server ID - 4 bytes, network byte order
The IP address of the sending server. In conjunction with the
setting of the SECONDARY flag, this uniquely determines the failover
entity sending the message as well as that destined to receive the
message.
This is placed in the packet instead of being recovered from the IP
header for security purposes (see section 8).
time stamp - 4 bytes, unsigned, network byte order
A time stamp, indicating the time when the packet was sent. The time
is a 32 bit unsigned long value in network byte order, in units of
seconds (GMT since EPOCH).
It is used to determine the time drift between the sender and the
recipient. The time drift is defined as the difference between
"Arrive Time (GMT)" and "Send Time (GMT)". The actual packet travel
time is assumed to be negligible in this context. All Date-Time
values contained in Failover messages MUST be corrected by the time
drift before being stored by the recipient.
state - 1 byte
This field indicates the state of the sender, at the time the packet
was sent. The field MUST be set in every Failover message. The
server state value can be one of the following:
Value   Server State
-----   -------------------------------------------------------------
  0     NO-STATE                     May only occur in POLL messages.
                                     The partner should reply, but
                                     should not react with any state
                                     transition.
  1     STARTUP                      Startup state (1)
  2     NORMAL                       Normal state
  3     COMMUNICATIONS-INTERRUPTED   Communication interrupted (safe)
  4     PARTNER-DOWN                 Partner down (unsafe mode)
  5     POTENTIAL-CONFLICT           Synchronizing
  6     RECOVER                      Recovering bindings from partner
  7     PAUSED                       Shutting down for a short period.
  8     SHUTDOWN                     Shutting down for an extended
                                     period.
  9     RECOVER-DONE                 Interlock state prior to NORMAL
Note 1: The STARTUP state is never set in the State field of the mes-
sage, but rather is represented by the setting of the STARTUP flag
(see the description of the Flags field immediately below). When the
server is in the STARTUP state, the state transmitted in the State
byte is the PREVIOUS state (usually, but not always, the last
recorded in stable storage prior to a server going down -- see sec-
tion 6.3 for details.)
flags - 1 byte
Currently, bits 7 (MSB), 6, and 5 are defined. All other bits are
reserved, and must be set to 0.
o SECONDARY
Bit 7 is the SECONDARY flag and defines the server role. Bit 7
is 0 if the sender is a primary server, 1 if it is a secondary
server. Note that this role is fixed for the duration of the
relationship between primary and secondary server. In particu-
lar, it does not change when and if the secondary server "takes
over" for the primary server when it enters COMMUNICATIONS-
INTERRUPTED or PARTNER-DOWN state -- each server retains its
role throughout all of its state transitions.
o RESTART
Bit 6 is the RESTART flag. If bit 6 is 1, the sender is res-
tarting. A server MUST set this bit every time it is re-
started, and it MUST clear the bit upon receiving the first
DHCPPRPL to a DHCPPOLL message it has sent with the bit set.
Whenever a DHCPPOLL message is sent with the RESTART bit set in
the 'flags' field, the MCLT Option, Option 235, MUST be
included.
Whenever a message with the RESTART bit is received by a server,
it MUST transition through the communications failed state tran-
sition. The RESTART bit signals that the partner server has
been restarted, and if communications is already considered to
have failed, then nothing need be done. If, however, the
partner server appeared to be operating correctly, then it was
able to restart without the receiving server noticing that it
was ever gone. The communications failed transition is forced
in this case to restart any on-going resynchronization processes
that were operating with the partner server. See section 6.3
for additional information.
Whenever a DHCPPOLL message is sent with the RESTART bit set,
the server SHOULD include a Vendor Class Identifier, Option 60,
in the message to identify the server to its partner.
o STARTUP
Bit 5 is the STARTUP flag. Bit 5 MUST be set to 1 whenever the
server is in STARTUP state, and set to 0 otherwise. (Note that
when in STARTUP state, the state transmitted in the 'state'
field is usually the last recorded state from stable storage,
but see section 6.3 for details.)
reserved - 2 bytes
2 filler bytes, reserved.
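The fixed header described in this section can be encoded and decoded with Python's standard struct module, as sketched below. The function and dictionary key names are illustrative assumptions; the field order, sizes, byte order, and flag bit positions are those given above. The sketch assumes no additional header bytes, so the payload offset is always 20.

```python
import socket
import struct

# op, rev, payload offset, xid, sending server ID, time stamp,
# state, flags, reserved: 1+1+2+4+4+4+1+1+2 = 20 bytes, network byte order.
HEADER = struct.Struct("!BBHI4sIBBH")

FLAG_SECONDARY = 0x80  # bit 7 (MSB)
FLAG_RESTART = 0x40    # bit 6
FLAG_STARTUP = 0x20    # bit 5


def pack_header(op, xid, server_ip, timestamp, state, flags, rev=1):
    """Build the 20-byte fixed header; reserved bytes are sent as zero."""
    return HEADER.pack(op, rev, HEADER.size, xid,
                       socket.inet_aton(server_ip), timestamp,
                       state, flags, 0)


def unpack_header(packet):
    """Decode the fixed header and locate the payload via the offset field."""
    (op, rev, offset, xid, server_id, timestamp,
     state, flags, _reserved) = HEADER.unpack_from(packet)
    return {
        "op": op, "rev": rev, "payload_offset": offset, "xid": xid,
        "server_ip": socket.inet_ntoa(server_id), "timestamp": timestamp,
        "state": state,
        "secondary": bool(flags & FLAG_SECONDARY),
        "restart": bool(flags & FLAG_RESTART),
        "startup": bool(flags & FLAG_STARTUP),
        "payload": packet[offset:],
    }


# A DHCPPOLL (op 7) from a restarting secondary whose state is NORMAL (2).
raw = pack_header(op=7, xid=42, server_ip="192.0.2.1",
                  timestamp=912_000_000, state=2,
                  flags=FLAG_SECONDARY | FLAG_RESTART)
header = unpack_header(raw)
```

Reading the payload through the offset field, rather than assuming 20, keeps the decoder usable if a later protocol version adds header bytes.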
2.7. DHCPPOOLREQ and DHCPPOOLRESP:
A secondary server requests addresses for its unique use from the
primary server by using the DHCPPOOLREQ message. The primary is in
complete charge of how many addresses the secondary receives.
The primary server will allocate IP addresses to the secondary server
upon receipt of a DHCPPOOLREQ message and inform the secondary server
of the number of additional addresses allocated in this allocation
cycle by sending the number in the DHCPPOOLRESP message.
When the primary server gets a DHCPPOOLREQ message, it computes which
addresses should be transferred to the secondary, and queues up
DHCPBNDUPD transactions by setting the Status of the selected
addresses to "BACKUP". Having done this, it sends a DHCPPOOLRESP
message. The DHCPPOOLRESP message carries the "Number of addresses
transferred" as its payload. The primary server does not have to
wait until all the above binding updates have been acknowledged
before it sends the DHCPPOOLRESP message.
The secondary server keeps sending DHCPPOOLREQ messages until it
receives a DHCPPOOLRESP with "Number of addresses transferred" = 0,
or it decides that the partner is not responding.
If the secondary server receives a DHCPPOOLRESP message with "Number
of addresses transferred" > 0, it MUST send another DHCPPOOLREQ mes-
sage, since additional addresses may still be waiting for it. How-
ever, the time at which it sends subsequent DHCPPOOLREQ messages is
implementation dependent. This mechanism makes it possible for the
primary server to pace the transfer (e.g., it could generate all
addresses all at once, or one-by-one) and to some degree for the
secondary to pace their receipt.
The primary server MUST respond to each DHCPPOOLREQ message it
receives. If it has already generated all private addresses, or it
has no available addresses, it MUST send DHCPPOOLRESP with "Number
of addresses transferred" = 0.
The secondary server MAY send a DHCPPOOLREQ message at any time, and
although the primary server is under no obligation to allocate any
additional addresses, it MUST respond with a DHCPPOOLRESP indicating
how many new addresses it has allocated or 0 if no new addresses were
allocated.
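The request loop on the secondary can be sketched in Python as follows. The send_poolreq callable, the max_rounds guard, and the FakePrimary stand-in are all hypothetical; they replace real network I/O so that the control flow can be shown on its own.

```python
def request_private_pool(send_poolreq, max_rounds=100):
    """Secondary side: repeat DHCPPOOLREQ until the primary reports 0.

    `send_poolreq` is a hypothetical callable that sends one DHCPPOOLREQ
    and returns the "Number of addresses transferred" from the matching
    DHCPPOOLRESP, or None if the partner did not respond.
    """
    total = 0
    for _ in range(max_rounds):
        transferred = send_poolreq()
        if transferred is None:      # partner judged unresponsive
            break
        if transferred == 0:         # nothing more to allocate
            break
        total += transferred         # more may be waiting: ask again
    return total


class FakePrimary:
    """Stand-in primary that hands out its private pool in small batches."""

    def __init__(self, available, batch):
        self.available = available
        self.batch = batch

    def handle_poolreq(self):
        # The primary MUST answer every request, with 0 once exhausted.
        granted = min(self.batch, self.available)
        self.available -= granted
        return granted


primary = FakePrimary(available=25, batch=10)
received = request_private_pool(primary.handle_poolreq)
```

The batch size in FakePrimary models the pacing freedom left to the primary; a real secondary would also space out its requests, which the draft leaves implementation dependent.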
2.8. DHCPUPDATEREQ, DHCPUPDATEREQALL and DHCPUPDATEDONE:
Whenever either server wishes to be updated with information the
other server knows but has not yet transmitted, it will send a
DHCPUPDATEREQ or DHCPUPDATEREQALL message.
When either server gets a DHCPUPDATEREQ or DHCPUPDATEREQALL message,
it computes which updates should be transferred to the partner, and
queues up DHCPBNDUPD transactions as appropriate. Once all such
updates have been acknowledged, it sends a DHCPUPDATEDONE message.
If the message that initiated this process was a DHCPUPDATEREQ mes-
sage, the receiving server will transmit only DHCPBNDUPD messages for
IP addresses which its information indicates that its partner has not
acked.
If, however, the message that initiated this process was a DHCPUP-
DATEREQALL message, the receiving server will transmit DHCPBNDUPD
messages for all IP addresses involved in failover with this partner
in this role.
The secondary server periodically re-transmits the DHCPUPDATEREQ mes-
sage, until it receives a DHCPUPDATEDONE message with a matching
'xid' field, or until it decides that the partner is not responding.
This approach is similar to the DHCPPOOLREQ/DHCPPOOLRESP message
exchange, with one critical difference: the DHCPPOOLRESP is sent as
soon as the binding updates are queued up, but the DHCPUPDATEDONE
message is deferred until all of the sender's DHCPBNDUPD messages
have been successfully transmitted and a corresponding DHCPBNDACK
message has been received for each of them.
The server processing a DHCPUPDATEREQ message MUST NOT send a
corresponding DHCPUPDATEDONE message until all of the DHCPBNDUPD mes-
sages have been acked by the partner with a DHCPBNDACK message.
Any retransmissions of the DHCPUPDATEREQ message MUST have the same
transaction ID. Use of a new transaction ID may cause rebuilding of
the outgoing binding update queue or other processing in the server
with a negative effect on performance.
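The interlock between DHCPBNDACK and DHCPUPDATEDONE can be sketched in Python as follows. The UpdateRequestTracker class and its method names are hypothetical; the sketch shows only the bookkeeping on the server that answers a DHCPUPDATEREQ, not the message transport.

```python
class UpdateRequestTracker:
    """Tracks one DHCPUPDATEREQ until every queued DHCPBNDUPD is acked."""

    def __init__(self, request_xid, bndupd_xids):
        self.request_xid = request_xid
        self.pending = set(bndupd_xids)   # xids of un-acked DHCPBNDUPDs
        self.done_sent = False

    def on_bndack(self, xid):
        """Record a DHCPBNDACK; return True when DHCPUPDATEDONE is due."""
        self.pending.discard(xid)
        if not self.pending and not self.done_sent:
            self.done_sent = True
            return True
        return False

    def is_retransmission(self, xid):
        """A retransmitted request carries the same xid: do not requeue."""
        return xid == self.request_xid


tracker = UpdateRequestTracker(request_xid=900, bndupd_xids=[1, 2, 3])
results = [tracker.on_bndack(x) for x in (1, 2, 3)]
```

Only the final acknowledgment releases the DHCPUPDATEDONE, and a duplicate acknowledgment afterwards does not release it a second time.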
2.9. DHCPBNDUPD
One server notifies its partner of a binding state change by using
the DHCPBNDUPD message.
Every DHCPBNDUPD message MUST contain:
o An Assigned IP Address Option (Option 50).
o A DHCP Binding Status (Option X).
o Where the Binding Status is ACTIVE, EXPIRED, RELEASED, or RESET,
it MUST also contain one or both of the Client Identifier
(Option 61) and the Client Hardware Address (Option X+3). In the
case where the Binding Status is ACTIVE, it MUST contain the
Lease Duration, Option 51.
o Where dynamic DNS updates are being used by the sending server,
the Client FQDN Option, Option 81, is used by the sender to
communicate the status of the binding update to its partner.
In response to a binding update, the recipient server MUST respond
with a DHCPBNDACK message.
Multiple binding updates MAY be batched up, and sent in one Failover
protocol message (see section 3.1).
2.10. DHCPBNDACK
This message implements either a positive or negative acknowledgment
of one or more binding updates.
A binding update (or a batch of binding updates sent as one message)
is matched with its associated acknowledgment by having the
same 'xid' field value in the message header.
The server sending a DHCPBNDACK message MAY include any of the
options that are acceptable in a DHCPBNDUPD message when the
DHCPBNDACK message is returned to the sender. It MUST include at
least the Assigned IP Address Option.
If any of this information differs from the information in the
DHCPBNDUPD message, the receiver MUST NOT update its bindings
database with that information upon receipt of the DHCPBNDACK mes-
sage, since the sender will have no way of knowing if the receiver
actually received the message.
The DHCPBNDACK MAY selectively reject one or more updates, by includ-
ing one or more IP address - Reject Reason option pairs in the mes-
sage body.
The DHCPBNDACK implicitly acknowledges any binding updates it replies
to, except those it enumerates using Reject Reason Codes.
Implementations of this protocol MAY send batched updates, and they
MUST be prepared to receive batched updates.
2.11. DHCPPOLL
In the absence of other messages, a DHCPPOLL message is used to
verify the communications integrity of the link between the primary
and secondary servers. It is used by either server whenever there is
some question about either the communications integrity or running
status of the other server.
Since current state and other status information is transmitted in
every DHCPPOLL and in every DHCPPRPL message, the DHCPPOLL and
DHCPPRPL exchange can also be used to signal a change in status by a
server or as a way to request an update of the status of its partner.
Whenever a DHCPPOLL message is generated it MUST have a unique value
in the 'xid' field, unless it is a retransmission of a previously
un-acked DHCPPOLL message.
2.12. DHCPPRPL
This message simply replies to the DHCPPOLL message (PRPL = Poll
reply). Like all messages, it needs to have all of the fixed
portions of the failover packet header filled in, including the state
and the flags fields.
3. Protocol Payload Data Format
Payload data is encoded as a set of flexible DHCP/BOOTP style options
[RFC 2132]. (The usual 1 byte option code, 1 byte length, and
"length" bytes of data). The options are placed after the header,
after skipping PayloadOffset bytes. The payload data options are not
preceded by a "cookie" value.
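The option encoding described above can be sketched in Python as follows. The function names are illustrative assumptions. The sketch treats every option as code, length, data; it does not handle single-byte pad or end options, since the text does not say whether the failover option space uses them.

```python
def encode_option(code, data):
    """Encode one option: 1 byte code, 1 byte length, then the data."""
    if not 0 <= code <= 255 or len(data) > 255:
        raise ValueError("code and length must each fit in one byte")
    return bytes([code, len(data)]) + data


def decode_options(payload):
    """Return a list of (code, data) pairs in wire order.

    `payload` starts at the payload offset; there is no cookie to skip.
    """
    options = []
    pos = 0
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated option header")
        code, length = payload[pos], payload[pos + 1]
        end = pos + 2 + length
        if end > len(payload):
            raise ValueError("truncated option data")
        options.append((code, payload[pos + 2:end]))
        pos = end
    return options


# Assigned IP Address (50) followed by DHCP Binding Status (230) = ACTIVE.
payload = encode_option(50, bytes([192, 0, 2, 10])) + encode_option(230, b"\x02")
decoded = decode_options(payload)
```

Returning the options as an ordered list, rather than a dictionary, preserves the ordering that batched updates depend on.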
Since the packet is NOT a DHCP/BOOTP protocol packet, the options
used here do not conflict with any existing "proper" DHCP/BOOTP
options. In fact, these options are allocated in relationship to the
DHCP option space in the following way.
In cases where the syntax and semantics of a Failover Payload Option
is identical to that of a DHCP/BOOTP option, the same option number
is used. For options unique to the Failover protocol, option numbers
starting at 230 are used.
Thus, all new Failover protocol option numbers are assigned from a
continuous range beginning with 230.
The protocol is permissive in allowing various other DHCP options in
binding updates. As long as the sender wishes to use an option, it
MAY include it. On the other hand, the recipient MUST ignore any
option it is not prepared to process.
3.1. Batching multiple binding updates in one packet
Implementations of this protocol MAY send batched updates, and they
MUST be prepared to receive batched updates.
Multiple DHCPBNDUPD transactions MAY be batched together in one
protocol message. Data sets for individual transactions MUST always
begin with the Assigned IP Address (Option 50). Option ordering
between the Assigned IP Address options is not significant.
If batched updates are sent, they MUST be formatted as follows:
Non-IP Address/Non-client specific options first
Assigned IP address option (50) for the first address
Options pertaining to first address, including
at least DHCP Binding Status (230)
Assigned IP address option (50) for the second address
Options pertaining to second address, including
at least DHCP Binding Status (230)
...
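A receiver might separate a batched message into per-address data
sets along the following lines. This is a sketch under stated
assumptions, not a required implementation: the function name
split_batch and the (code, data) pair representation of parsed
options are hypothetical.

```python
ASSIGNED_IP = 50       # Assigned IP address option
BINDING_STATUS = 230   # DHCP Binding Status option


def split_batch(options):
    """Group parsed (code, data) options into per-address data sets.

    Returns (common, updates). `common` holds the options that precede
    the first Assigned IP Address option. `updates` is a list of
    (address_bytes, {code: data}) entries in the order received.
    """
    common = []
    updates = []
    for code, data in options:
        if code == ASSIGNED_IP:
            # Each Assigned IP Address option starts a new data set.
            updates.append((data, {}))
        elif not updates:
            common.append((code, data))
        else:
            updates[-1][1][code] = data
    for address, opts in updates:
        if BINDING_STATUS not in opts:
            raise ValueError("update lacks DHCP Binding Status (230)")
    return common, updates
```

Storing each data set as a dictionary reflects the rule that option
ordering between Assigned IP Address options is not significant.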
In case an implementation chooses to reject some or all of the IP
address binding information in a DHCPBNDUPD message in a DHCPBNDACK
reply, the DHCPBNDACK message MUST contain one or more Assigned IP
Address (Option 50) / Reject Reason Code pairs to indicate that the
updates for the address(es) were not accepted. The Assigned IP
Address option communicates which updates out of the batch are being
rejected, and the Reject Reason Code indicates why. Any IP addresses
present in the DHCPBNDUPD message without corresponding Option 50/
Reject Reason Code pairs in the DHCPBNDACK message are implicitly
acked by the DHCPBNDACK message. If the DHCPBNDUPD message only con-
tains one binding update and that update is rejected, a DHCPBNDACK
with a single Assigned IP Address / Reject Reason Code pair MUST be
sent.
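The implicit acknowledgement rule can be sketched as follows. The
function name classify_ack and its argument shapes are assumptions
made for this example only.

```python
ASSIGNED_IP = 50      # Assigned IP address option
REJECT_REASON = 234   # Reject Reason Code option


def classify_ack(sent_addresses, ack_options):
    """Return (acked, rejected) for one DHCPBNDUPD/DHCPBNDACK exchange.

    `sent_addresses` lists the 4-byte addresses sent in the DHCPBNDUPD.
    `ack_options` is the DHCPBNDACK payload as (code, data) pairs.
    `rejected` maps each rejected address to its reason code.
    """
    rejected = {}
    current = None
    for code, data in ack_options:
        if code == ASSIGNED_IP:
            current = data
        elif code == REJECT_REASON and current is not None:
            rejected[current] = data[0]
    # Addresses without an Option 50 / Reject Reason Code pair in the
    # DHCPBNDACK are implicitly acked.
    acked = [a for a in sent_addresses if a not in rejected]
    return acked, rejected
```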
3.2. DHCP Binding Status
This option is used to convey the current state of a binding. This
option is mandatory for DHCPBNDUPD messages.
Code Len Type
+-----+-----+-----+
| 230 | 1 | 1-7 |
+-----+-----+-----+
Legal values for this option are:
Value Binding Status
----- ------------------------------------------------
1 FREE Lease has never been used
2 ACTIVE Lease is assigned to a client
3 EXPIRED Lease has expired
4 RELEASED Lease has been released by client
5 ABANDONED A server, or client flagged address as unusable
6 RESET Lease was freed by some external agent
7 BACKUP Lease belongs to secondary's private address pool
3.3. Assigned IP address
Uses identical code and format to DHCP Option 50 (requested IP
address). This option is mandatory for DHCPBNDUPD messages and in
any DHCPBNDACK message where a Reject Reason Code option appears.
Code Len Address
+-----+-----+-----+-----+-----+-----+
| 50 | 4 | a1 | a2 | a3 | a4 |
+-----+-----+-----+-----+-----+-----+
3.4. Absolute time
This absolute time is used for the lease grant time as well as the
partner-down time. When used in a DHCPBNDUPD or DHCPBNDACK
message, it represents the lease grant time. When used in a DHCPPOLL
message, it represents the partner-down time.
An absolute GMT time value can be used for this option, as time
synchronization has already been achieved between the source and the
target server using the time field in the message. The value is
represented as seconds elapsed since Jan 1, 1970 (i.e., the ANSI C
time_t time value representation).
Note that this is (at present) a signed field.
Code Len Time
+------+-----+-----+-----+-----+-----+
| 231 | 4 | t1 | t2 | t3 | t4 |
+------+-----+-----+-----+-----+-----+
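A minimal sketch of encoding and decoding this option is shown below.
It assumes network byte order for the four time bytes and treats the
value as a signed 32-bit quantity, as noted above. The function names
are hypothetical.

```python
import struct

ABSOLUTE_TIME = 231  # option code for Absolute time


def encode_absolute_time(seconds: int) -> bytes:
    # "!" selects network byte order (an assumption of this sketch),
    # "B" is one unsigned byte, "i" is a signed 32-bit integer.
    return struct.pack("!BBi", ABSOLUTE_TIME, 4, seconds)


def decode_absolute_time(option: bytes) -> int:
    code, length, seconds = struct.unpack("!BBi", option)
    if code != ABSOLUTE_TIME or length != 4:
        raise ValueError("not an Absolute time option")
    return seconds
```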
3.5. Number of addresses transferred to Secondary Server
A 32 bit unsigned long in network byte order. Reports the number of
addresses transferred by the primary to the secondary server
(addresses to be used for the secondary server's private address
pool).
Code Len Number of Addresses
+-----+-----+-----+-----+-----+-----+
| 232 | 4 | n1 | n2 | n3 | n4 |
+-----+-----+-----+-----+-----+-----+
3.6. Lease Duration
Uses the format and code of the standard DHCP IP Address Lease Time
option (51). The time is in units of seconds, and is specified as a
32-bit unsigned integer. A Lease Duration of 0xFFFFFFFF indicates an
infinite lease.
Code Len Lease Time
+-----+-----+-----+-----+-----+-----+
| 51 | 4 | t1 | t2 | t3 | t4 |
+-----+-----+-----+-----+-----+-----+
3.7. Client Identifier
The format, code and conventions used are identical to DHCP option
61.
Code Len Type Client-Identifier
+-----+-----+-----+-----+-----+---
| 61 | n | t1 | i1 | i2 | ...
+-----+-----+-----+-----+-----+---
3.8. Client Hardware Address
The format is similar to DHCP option 61. The type field (t1) MUST be
set to the proper ARP hardware address code, as defined in the ARP
section of RFC 1700 (it MUST NOT be zero).
Code Len Type MAC address
+-----+-----+-----+-----+-----+---
| 233 | n | t1 | m1 | m2 | ...
+-----+-----+-----+-----+-----+---
Either Client Id, Client Hardware Address or BOTH MAY be present in
binding update transactions. At least one of them MUST be present.
If both are present, the Client Id MUST be used to uniquely identify
the owner of the binding (exactly as in RFC 2131).
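The selection rule above might be expressed as in the following
sketch. The function name binding_owner and the tagged-tuple return
value are assumptions for illustration. Each argument is the data
portion of the corresponding option, or None when the option is
absent.

```python
def binding_owner(client_id, hardware_address):
    """Pick the identifier that names the owner of a binding.

    `client_id` is the data of option 61 and `hardware_address` is the
    data of option 233 (type byte followed by the address bytes).
    """
    if client_id is None and hardware_address is None:
        raise ValueError("at least one identifier MUST be present")
    if hardware_address is not None and hardware_address[0] == 0:
        raise ValueError("hardware type MUST NOT be zero")
    if client_id is not None:
        # When both are present, the Client Id identifies the owner.
        return ("client-id", client_id)
    return ("hardware", hardware_address)
```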
3.9. Host Name
Uses the format and code of DHCP option 12.
Code Len Host Name
+-----+-----+-----+-----+-----+-----+-----+-----+--
| 12 | n | h1 | h2 | h3 | h4 | h5 | h6 | ...
+-----+-----+-----+-----+-----+-----+-----+-----+--
3.10. Domain Name
Uses the format and code of DHCP option 15.
Code Len Domain Name
+-----+-----+-----+-----+-----+-----+--
| 15 | n | d1 | d2 | d3 | d4 | ...
+-----+-----+-----+-----+-----+-----+--
3.11. Client FQDN
If an implementation supports Dynamic DNS updates, this option can be
used to communicate the DNS name that was set. Uses the format and
code of the Client FQDN option (81) as described in <draft-ietf-dhc-
dhcp-dns-08.txt>.
Code Len Flags Rcode1 Rcode2 Domain Name
+-----+-----+-----+------+------+-----+------
| 81 | n | f | r1 | r2 | d1 | d2...
+-----+-----+-----+------+------+-----+------
3.12. Reject Reason Code
This option is used to selectively reject binding updates. It MAY be
used in a DHCPBNDACK message, always following an option 50. Option 50
contains the IP address of the specific update being rejected.
Note that a Message option, DHCP Option 56, may be included to give a
human readable error indication along with the Reject Reason Code.
Code Len Reason code
+-----+-----+----------+
| 234 | 1 | R1 |
+-----+-----+----------+
Reason codes :
0 Reserved
1 Illegal IP address (not part of any address pool)
2 Fatal conflict exists: address in use by other client.
3 - 253 Reserved for new Reason Codes.
254 Unknown: Error occurred but does not match any reason code
255 Reserved for code expansion
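For illustration, one Assigned IP Address / Reject Reason Code pair
could be built as follows. The names RejectReason and build_reject
are hypothetical, and placing the optional Message option directly
after the reason code is an assumption of this sketch rather than a
requirement stated here.

```python
import ipaddress
from enum import IntEnum


class RejectReason(IntEnum):
    ILLEGAL_IP_ADDRESS = 1   # not part of any address pool
    FATAL_CONFLICT = 2       # address in use by other client
    UNKNOWN = 254            # error matches no other reason code


def build_reject(address: str, reason: RejectReason,
                 message: str = "") -> bytes:
    """Encode option 50, then option 234, then optionally option 56."""
    out = bytes([50, 4]) + ipaddress.IPv4Address(address).packed
    out += bytes([234, 1, int(reason)])
    if message:
        text = message.encode("ascii")
        if len(text) > 255:
            raise ValueError("message too long for one option")
        out += bytes([56, len(text)]) + text
    return out
```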
3.13. Message
This option is used to supply a human readable message. It may be
used in association with the Reject Reason Code to provide a human
readable error message for the reject.
Code Len Text
+-----+-----+------+-----+--
| 56 | n | c1 | c2 | ...
+-----+-----+------+-----+--
3.14. MCLT - Maximum Client Lead Time
Maximum Client Lead Time, in seconds. A 32 bit integer value, in
network byte order. This option MUST be used in DHCPPOLL and DHCPPRPL
messages, when the server is NOT in normal state.
Code Len Time
+------+-----+-----+-----+-----+-----+
| 235 | 4 | t1 | t2 | t3 | t4 |
+------+-----+-----+-----+-----+-----+
3.15. Vendor Class Identifier
A string which identifies the vendor of the failover protocol
implementation.
The code for this option is 60, and its minimum length is 1.
Code Len Vendor Class Identifier
+-----+-----+-----+-----+-----+--
| 60 | n | i1 | i2 | i3 | ...
+-----+-----+-----+-----+-----+--
4. Challenging scenarios for a Failover protocol
There exist a number of failure scenarios which will challenge the
correctness guarantees of the Failover protocol. Two of the
scenarios that the Failover protocol was specifically designed to
handle correctly are detailed in this section in order to motivate
some of the more unusual aspects of the protocol's operations.
4.1. Primary Server crash before "lazy" update:
In the case where the primary server sends a DHCPACK to a client for
a newly allocated IP address and then crashes prior to sending the
corresponding update to the secondary server, the secondary server
will have no record of the IP address allocation. When the secondary
server takes over, it may well try to allocate that IP address to a
different client. In the case where the first client to receive the
IP address is not on the net at the time (yet while there was still
time to run on its lease), an ICMP echo (i.e., ping) will not prevent
the secondary server from allocating that IP address to a different
client.
This is handled in the protocol by having the primary and secondary
allocate addresses for new clients from distinct address pools.
A more likely (in that DHCPRENEWs are presumably more common than
DHCPDISCOVERs) and more subtle version of this problem is where the
primary server crashes after extending a client's lease time, and
before updating the secondary with a new time using a lazy update.
After the secondary takes over, if the client is not connected to the
network the secondary will believe the client's lease has expired
when, in fact, it has not. In this case as well, the IP address
might be reallocated to a different client while the first client is
still using it.
This scenario is handled by the Failover protocol through control of
the lease time and the use of the maximum client lead time (MCLT).
See the next section for details.
4.2. Network partition where servers can't communicate but each can
talk to clients:
Several conditions are required for this situation to occur. First,
due to a network failure, the primary and secondary servers cannot
communicate. As well, some of the DHCP clients must be able to
communicate with the primary server, and some of the clients must now
only be able to communicate with the secondary server. When this
condition occurs, both primary and secondary servers could attempt to
allocate IP addresses for new clients from the same pool of available
addresses. At some point, then, two clients will end up being
allocated the same IP address. This will cause potentially serious
problems when the network failure that created this situation is
corrected.
This is handled in the protocol by having the primary and secondary
servers allocate addresses for new clients from distinct address
pools.
The specifics of how these two scenarios are handled are supplied in
the next section.
5. Duplicate Address Assignment Control
There are several ways that the Failover protocol avoids the possi-
bility of duplicate address assignment.
5.1. Control of lease time
The key problem with lazy update is that when a server fails
after updating a client with a particular lease time and before
updating its partner, the partner will believe that a lease has
expired even though the client still retains a valid lease on that IP
address.
In order to handle this problem, a period of time known as the "Max-
imum Client Lead Time" (MCLT) is defined and must be known to both
the primary and secondary servers. Proper use of this time interval
places an upper bound on the difference allowed between the lease
time provided to a DHCP client by a server and the lease time known
by that server's partner. In order that this is not the maximum
lease time that a server can ever provide to a client, during a lazy
update the updating server typically updates its partner with lease
time information which is longer than the lease time previously given
to the client. This allows that server to give a longer lease time
to the client the next time the client renews its lease.
When moving to the PARTNER-DOWN state (where a server is allowed to
reallocate the partner's IP addresses), a server will wait the Max-
imum Client Lead Time before allocating any IP addresses from its
partner's pool to any new DHCP clients. Thus, any clients which have
a lease on an IP address with a lease time greater than that known by
the server moving into PARTNER-DOWN state will either have contacted
that server during the MCLT period or their leases will have expired.
When a server has transitioned to PARTNER-DOWN state, it MUST NOT
reallocate an IP address from one client to another client until an
additional maximum client lead time interval after the lease on the
first client expires. (Actually, until the maximum client lead time
after what it believes to be the lease expiration time of the first
client.)
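The two waiting periods described above reduce to simple arithmetic
on absolute times. The following sketch uses hypothetical function
names and takes all times as seconds. It treats the end of each
waiting period as the first permitted instant, which is an assumption
about the boundary case.

```python
def partner_pool_usable_at(partner_down_time: int, mclt: int) -> int:
    """Earliest time addresses from the partner's pool may be given
    to new clients after entering PARTNER-DOWN state."""
    return partner_down_time + mclt


def reallocation_allowed_at(believed_expiry: int, mclt: int) -> int:
    """Earliest time an address may move to a different client, based
    on what this server believes the first client's expiry to be."""
    return believed_expiry + mclt


def may_reallocate(now: int, believed_expiry: int, mclt: int) -> bool:
    return now >= reallocation_allowed_at(believed_expiry, mclt)
```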
The fundamental relationship on which much of the correctness of this
protocol depends is that the lease expiration time known to a DHCP
client MUST NOT be more than the maximum client lead time greater
than the lease expiration time known to a server's partner.
The remainder of this section makes the above fundamental relation-
ship more explicit.
This protocol requires a DHCP server to deal with several different
lease intervals and places specific restrictions on their relation-
ships. The purpose of these restrictions is to allow the other server
in the pair to be able to make certain assumptions in the absence of
an ability to communicate between servers.
The different lease times are:
o desired client lease interval
The desired client lease interval is the lease interval that a
DHCP server would like to give to a DHCP client in the absence
of any restrictions imposed by the Failover protocol. Its
determination is outside of the scope of this protocol. Typi-
cally this is the result of external configuration of a DHCP
server.
o actual client lease interval
The actual client lease interval is the lease interval that a
DHCP server gives out to a DHCP client. It may be shorter than
the desired client lease interval (as explained below).
o desired partner server lease interval
The desired partner server lease interval is the lease expira-
tion interval the local server tells to its partner.
o acknowledged partner server lease interval
The acknowledged partner server lease interval is the interval
the partner server has most recently acknowledged.
The key restriction (and guarantee) that any server makes with
respect to lease intervals is that the actual client lease interval
never exceeds the acknowledged partner server lease interval (if any)
by more than a fixed amount. This fixed amount is called the "Max-
imum Client Lead Time" (MCLT).
The MCLT MAY be configurable, but for correct server operation it
MUST be the same and known to both the primary and secondary servers.
It is transmitted from the primary to the secondary in every message
sent with the RESTART bit set, and also in every poll and poll reply
message. The secondary MUST ensure that its value agrees with that
of the primary. See section 3.14 concerning the MCLT Option.
A server MUST record in its stable storage both the local server
lease interval and the most recently acknowledged partner server
lease interval for each IP address binding. It is assumed that the
desired client lease interval can be determined through techniques
outside of the scope of this protocol.
Again, the fundamental relationship among these times which MUST be
maintained is:
actual client lease interval <=
( acknowledged partner lease interval + MCLT )
The "acknowledged partner lease interval" is the acknowledged secon-
dary server lease interval for the primary server, and it would be
the acknowledged primary server lease interval for the secondary
server when it is operating out of contact with the primary server.
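A server could check this relationship before answering a client, as
in the sketch below. The function name is hypothetical. Following the
statement that the actual client lease interval never exceeds the
acknowledged partner server lease interval by more than the MCLT, the
sketch permits equality. Both intervals are taken as seconds
remaining from the present moment.

```python
def lead_is_within_mclt(actual_client_interval: int,
                        acked_partner_interval: int,
                        mclt: int) -> bool:
    """True when the client does not lead the partner by more than
    the MCLT. Use 0 for `acked_partner_interval` when the partner has
    acknowledged nothing for this binding."""
    return actual_client_interval - acked_partner_interval <= mclt
```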
Figure 5.1-1 illustrates an initial lease to a client using the rules
discussed in the example which follows it.
DHCP Primary Secondary
Client Server Server
| | |
| >-DHCPDISCOVER-> | |
| <---DHCPOFFER-< | |
| | |
| >-DHCPREQUEST-> | |
| (selecting) | |
| | |
| <--------DHCPACK-< | |
| ^ (MCLT) | |
| : | >-DHCPBNDUPD--> |
| : | (1/2 MCLT + X ) |
| : | |
| : | <-DHCPBNDACK-< |
| MCLT / 2 | |
... : ... ...
| : | |
| V | |
| >-DHCPREQUEST-> | |
| (renew) | |
| | |
| <--------DHCPACK-< | |
| ^ (X) | |
| : | >-DHCPBNDUPD--> |
| : | ( 1/2 X + X ) |
| : | |
| : | <-DHCPBNDACK-< |
| X / 2 | |
| : | |
... ... ... ...
Figure 5.1-1: Lazy Update Message Traffic
X = Desired Client Lease Interval
DISCUSSION:
This protocol mandates no algorithm concerning these lease
intervals, as long as the above fundamental relationship is preserved.
In the interests of clarity, however, let's examine a specific
example. The MCLT in this case is 1 hour. The desired client
lease interval is 3 days, and its renewal time is half the lease
interval.
The rules for this example are:
o What to tell the client:
Take the remainder of the acknowledged partner server lease
interval. If this is a new lease, then this value will be zero.
If this remainder plus the MCLT is greater than the desired
client lease interval, give the client the desired client lease
interval else give the client the remainder plus the MCLT.
o What to tell the failover partner server:
Take the renewal interval (typically half of the actual client
lease interval), and add to it the desired client lease inter-
val.
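The two rules above can be sketched in Python as follows. This is one
possible rendering of the example rules, not a mandated algorithm.
The function names and the constants HOUR and DAY are assumptions of
the sketch, all values are in seconds, and the renewal interval is
taken to be half of the actual client lease interval.

```python
HOUR = 3600
DAY = 24 * HOUR


def client_lease(desired: int, partner_remaining: int, mclt: int) -> int:
    """What to tell the client.

    `partner_remaining` is the remainder of the acknowledged partner
    server lease interval (zero for a new lease).
    """
    return min(desired, partner_remaining + mclt)


def partner_lease(actual: int, desired: int) -> int:
    """What to tell the failover partner server: the renewal interval
    (half of the actual client lease interval here) plus the desired
    client lease interval."""
    return actual // 2 + desired
```

With an MCLT of 1 hour and a desired client lease interval of 3 days,
client_lease(3 * DAY, 0, HOUR) gives 3600 seconds for a new lease,
and partner_lease(3600, 3 * DAY) gives 261000 seconds, which is 3
days plus 1/2 hour.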
In operation this might work as follows:
When a primary server makes an offer for a new lease on an IP
address to a DHCP client, it determines the desired client lease
interval (in this case, 3 days). It then examines the ack-
nowledged partner lease interval (which in this case is zero) and
determines the remainder of the time left to run, which is also
zero. To this it adds the MCLT. Since the actual client
lease interval cannot be allowed to exceed the remainder of the
current partner lease interval plus the MCLT, the offer made to
the client is for the remainder of the current partner lease
interval (i.e., zero) plus the MCLT. Thus, the actual client
lease interval is 1 hour.
Once the primary server has performed the ACK to the DHCP client,
it will update the secondary server with the lease information.
However, the desired partner server lease interval will be composed
of one half of the current actual client lease interval
added to the desired client lease interval. Thus, the secondary
server is updated with a DHCPBNDUPD with a lease interval of 3
days + 1/2 hour specified in the Lease Duration Option (Option
51).
When the primary server receives an ACK to its update of the
secondary server's (partner's) lease interval, it records that as
the acknowledged partner server lease interval. A server MUST NOT
send a DHCPBNDACK in response to a DHCPBNDUPD message until it is
sure that the information in the DHCPBNDUPD message resides in its
stable storage. Thus, the primary server in this case can be sure
that the secondary server has recorded the desired partner server
lease interval in its stable storage when the primary server
receives a DHCPBNDACK message from the secondary server.
When the DHCP client attempts to renew at T1 (approximately one
half an hour from the start of the lease), the primary server
again determines the desired client lease interval, which is still
3 days. It then compares this with the remaining acknowledged
partner server lease interval (3 days + 1/2 hour) and adjusts for
the time passed since the secondary was last updated (1/2 hour).
Thus the remaining time on the acknowledged partner server lease
interval is 3 days. Adding the MCLT to this yields 3 days plus 1
hour, which is greater than the desired client lease interval of 3
days. So the client is renewed for the desired client lease
interval -- 3 days.
When the primary DHCP server updates the secondary DHCP server
after the DHCP client's renewal ACK is complete, it will calculate
the desired partner server lease interval as the T1 fraction of
the actual client lease interval (1/2 of 3 days this time = 1.5
days). To this it will add the desired client lease interval of 3
days, yielding a total desired partner server lease interval of
4.5 days. In this way, the primary attempts to have the secondary
always "lead" the client in its understanding of the client's
lease interval so as to be able to always offer the client the
desired client lease interval.
Once the initial actual client lease interval of the MCLT is past,
the protocol operates effectively like the DHCP protocol does
today in its behavior concerning lease intervals. However, the
guarantee that the actual client lease interval will never exceed
the remaining acknowledged partner server lease interval by more
than the MCLT allows full recovery from a variety of failures.
5.2. Controlled re-allocation of IP addresses
When in PARTNER-DOWN state (after a period defined in detail in
section 6.5.2 has passed), there are no restrictions on reallocating a
lease from one client to another.
In any other state, a server cannot reallocate an address from one
client to another without first notifying (through a DHCPBNDUPD mes-
sage) and receiving acknowledgement (through a DHCPBNDACK message)
that its partner is aware that that first client is not using the
address.
This could be modeled in the following way (though this specific
implementation is in no way required). An "available" IP address on
a server may be allocated to any client. An IP address which was
leased to a client and which expired or was released by that client
would take on a new state, say "pending-available". When an IP
address became "pending-available", the partner server would be
notified that this IP address was "available" through a DHCPBNDUPD.
When the sending server received the DHCPBNDACK for that IP address
showing it was "available", it would move the IP address from
"pending-available" to "available", and it would be available for
allocation to any clients.
A server MAY reallocate an IP address in "pending-available" state to
the same client with no restrictions.
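The model described above (which, as noted, is in no way required)
might look like the following sketch for a server that is not in
PARTNER-DOWN state. The class name Binding, its method names, and the
state strings are hypothetical.

```python
AVAILABLE = "available"
LEASED = "leased"
PENDING_AVAILABLE = "pending-available"


class Binding:
    def __init__(self):
        self.state = AVAILABLE
        self.client = None  # current or most recent holder

    def lease(self, client) -> bool:
        """Lease to `client` if permitted; return True on success."""
        if self.state == AVAILABLE or client == self.client:
            # A "pending-available" address may go back to the same
            # client, but not to a different one.
            self.state = LEASED
            self.client = client
            return True
        return False

    def expire_or_release(self) -> bool:
        """Return True when a DHCPBNDUPD should be sent to the partner."""
        if self.state == LEASED:
            self.state = PENDING_AVAILABLE
            return True
        return False

    def on_bndack(self):
        """The partner acknowledged that the address is available."""
        if self.state == PENDING_AVAILABLE:
            self.state = AVAILABLE
            self.client = None
```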
5.3. Secondary renewal of leases
When operating in NORMAL state, a secondary server MAY process
DHCPREQUEST messages for renewal or rebinding leases. In this case,
the requirements for control of lease time and re-allocation of IP
addresses are the same as that of the primary server.
6. Server Operation
This section discusses the operation of a server implementing the
Failover protocol using the state transition diagram in Figure 6.2-1.
This is the common state transition diagram for both servers in a
pair.
6.1. Server Initialization
When a server starts, it begins in STARTUP state. See section 6.4
below for details.
6.2. Establishing Communications Integrity
Central to the operation of the Failover protocol is a notion of
"communications okay" or "communications failed". State transitions
are taken in many cases when the status of communications with the
partner changes.
A specific discipline exists for establishing and verifying communi-
cations integrity. Communications is set to "okay" whenever a mes-
sage sent is acked by the partner. After an implementation dependent
length of time from the communications "okay" event the communica-
tions with the partner are deemed to have "failed" if no subsequent
acknowledgments have been received. Whenever a DHCPPRPL, DHCPUP-
DATEDONE, DHCPPOOLRESP or DHCPBNDACK is received this time period is
restarted.
Obviously, as the time period elapses, a server SHOULD send DHCPPOLL
messages in order to elicit a DHCPPRPL message in reply, which will
reset the time period.
While an implementation SHOULD restart this time period on every
DHCPUPDATEDONE, DHCPPOOLRESP, DHCPBNDACK or DHCPPRPL, it MAY choose
to only restart it on a DHCPPRPL.
This technique ensures that two-way communications integrity exists
between the servers. Were the timeout period to be reset on the
receipt of any message from the partner, a network failure where one
server could send messages to its partner but not receive any from it
could lead to failure of the entire redundant DHCP subsystem. For
example, in a
situation where the primary could send but not receive any messages,
the secondary would never take over from the primary and yet DHCP
clients would not receive any service.
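The discipline above can be sketched as a small monitor that restarts
its time period only on acknowledging messages. The class name
CommMonitor, its methods, and the use of caller-supplied timestamps
are assumptions of this sketch. The timeout itself is implementation
dependent.

```python
# Messages that acknowledge something this server sent.
ACK_MESSAGES = {"DHCPPRPL", "DHCPUPDATEDONE", "DHCPPOOLRESP", "DHCPBNDACK"}


class CommMonitor:
    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_ack = None  # time of the last "okay" event, if any

    def on_receive(self, message_type: str, now: float):
        # Only acknowledgments restart the period. Other messages from
        # the partner prove one-way reachability at best.
        if message_type in ACK_MESSAGES:
            self.last_ack = now

    def is_okay(self, now: float) -> bool:
        return (self.last_ack is not None
                and now - self.last_ack < self.timeout)
```

Passing the current time in explicitly keeps the sketch independent
of any particular clock source.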
6.3. Server State Transitions
Figure 6.2-1 is the diagram of the server state transitions. The
remainder of this section contains information important to the
understanding of that diagram.
The server stays in the current state until all of the actions speci-
fied on the state transition are complete. If communications fails
during one of the actions, the server simply stays in the current
state and attempts a transition whenever the conditions for a transi-
tion are later fulfilled.
In the state transition diagram below, the "+" or "-" in the upper
right corner of each state is a notation about whether communication
is ongoing with the other server.
The legend "responsive", "partially-responsive", or "unresponsive" in
each state indicates whether the server is responsive to DHCP client
requests in the respective state. The terms "responsive" and
"unresponsive" have the obvious meanings, while "partially-
responsive" means that a DHCP server may respond to DHCPREQUEST mes-
sages that are RENEWAL or REBINDING, but to no other messages.
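The three legends could be applied to an incoming client message as
in the following sketch. The function name should_answer and the
string values used for message types and request kinds are
hypothetical.

```python
def should_answer(responsiveness: str, message_type: str,
                  request_kind: str = "") -> bool:
    """Decide whether to respond to a DHCP client message.

    `responsiveness` is the legend of the server's current state.
    `request_kind` is only meaningful for DHCPREQUEST messages.
    """
    if responsiveness == "responsive":
        return True
    if responsiveness == "partially-responsive":
        return (message_type == "DHCPREQUEST"
                and request_kind in ("RENEWAL", "REBINDING"))
    # "unresponsive": no client requests are answered.
    return False
```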
In the state transition diagram below, when communication is reesta-
blished between the two servers, each must record the state of the
partner when communication was restored. State transitions on one
server in some cases imply state transitions on the partner server,
so a record of the current state of the partner server must be kept
by each server.
If a message is received from a partner with the state equal to zero
(0), then the receiving server should respond to that message with a
DHCPPRPL if it was a DHCPPOLL, but under no circumstances should it
consider communications to be "okay", nor take any state transitions
based on receipt of that message.
If the state of the partner changes while communicating, a server
moves through the communications-failed transition and into whatever
state results. It then immediately moves through whatever state
transition is appropriate given the current state of the partner
server.
DISCUSSION:
The point of this technique is simplicity, both in explanation of
the protocol and in its implementation. The alternative to this
technique of memory of partner state and automatic state transi-
tion on change of partner state is to have every state in the fol-
lowing diagram have a state transition for every possible state of
the partner. With the approach adopted, only the states in which
communications are reestablished require a state transition for
each possible partner state.
The current state of a server must be recorded in stable storage and
thus be available to the server after a server restart.
+---------------+ V +--------------+
| RECOVER - | | | STARTUP - |
|(unresponsive) | +->|(unresponsive)|
+---------------+ +--------------+
Comm. OK +-----------------+
Other State:-RECOVER | PARTNER DOWN - |<-----+
| | | (responsive) | |
All POTENTIAL- +-----------------+ |
Others CONFLICT------------ | --------+ ^(see |
| Comm. OK | | 6.93) |
UPDATEREQ(ALL) Other State: | +-----+ |
Wait UPDATEDONE | | | Comm. | |
Wait MCLT from fail RECOVER All Others| Failed | |
+--------------+ | V V | | |
|RECOVER-DONE +| +--+ +--------------+ | |
|(unresponsive)| | | POTENTIAL + |<--+ |
+--------------+ Wait for +>| CONFLICT | |
Comm. OK Other | |(unresponsive)|<--- | --+
+--Other State:-+ State: | +--------------+ | |
| | | RECOVER | | | |
| All POTENT. DONE | Resolve Conflict | |
| Others: CONFLICT-- | ----+ (see 6.9) | |
| Wait for V V | |
| Other State: NORMAL +-----------------+ | |
| V | NORMAL + | External | |
| +--+----------+-->|(see 6.72, 6.73) |-Command-->+ |
| ^ ^ +-----------------+ | |
| | | | | |
| Wait for Comm. OK Comm. External |
| Other Other Failed Command |
| State: State: | or | |
|RECOVER-DONE NORMAL Start Safe Safe | |
| | COMM. INT. Period Timer Period | |
| Comm. OK. | V expiration |
| Other State: | +------------------+ | |
| RECOVER +--| COMMUNICATIONS - |-----------+ |
V +-------------| INTERRUPTED | Comm. OK |
RECOVER | (responsive) |--Other State:-+
RECOVER-DONE--------->+------------------+ All Others
Figure 6.2-1: Server state diagram.
6.4. STARTUP state
The STARTUP state affords an opportunity for a server to probe its
partner server, before starting to service DHCP clients.
DISCUSSION:
Without the STARTUP state, a server would likely start in a state
derived from its previously stored state (held in stable storage),
if any. However, this may be inconsistent with the current state
of the partner. The STARTUP state affords the opportunity for a
server to potentially learn the partner's state and determine if
that state is consistent with its derived starting state or
whether some significant state change has occurred at the partner
that forces the server to start in another state. This is
especially critical if significant time has elapsed while the
server was down.
6.4.1. Operation while in STARTUP state
Whenever a server is in STARTUP state, it MUST be unresponsive to
DHCP client requests, and so the time spent in the STARTUP state is
necessarily short, typically on the order of a few seconds to a few
tens of seconds. The exact time spent in the STARTUP state is imple-
mentation dependent, and the primary and secondary server are not
required to spend the same amount of time in the STARTUP state.
Whenever any message is sent to the partner while in STARTUP state
the STARTUP bit MUST be set in the 'flags' field of the message
header.
6.4.2. Transition out of STARTUP state
Each server starts out in STARTUP state every time it initializes
itself, and performs the following algorithm as part of its initiali-
zation:
1. Ensure that the RESTART bit is set in the 'flags' field of the
failover message header. Once set, the RESTART bit must
remain set in all failover messages sent by the server to the
partner until the first acknowledgment of a message is
received from that partner. This is required to assure that
the partner knows that the server has restarted, even if the
partner itself is unreachable for a long while.
Do not send any messages until step 5.
2. Is there any record in stable storage of a previous failover
state? If yes, set previous-state to the last recorded state
in stable storage, and continue with step 3.
Is there any configuration information that indicates that
this server was previously running but lost its stable
storage? Such information must typically come from some
administrative intervention, since it is difficult for a
server to distinguish first startup from a startup after it
has lost its stable storage. If yes, then set the previous-
state to RECOVER, and set the time-of-failure to whatever time
was configured, and go on to step 3. This time-of-failure
will be used in the transition out of the RECOVER state into
the RECOVER-DONE state, below.
If there is no record of any previous failover state in stable
storage nor of any previous operational activity for this
server, then set the previous-state to RECOVER and set the
time-of-failure to a time before the maximum-client-lead-time
before now. If using standard Posix times, 0 would typically
do quite well.
3. Is the previous-state NORMAL? If yes, set the previous-state
to COMMUNICATIONS-INTERRUPTED.
4. Start the STARTUP state timer. The time that a server remains
in the STARTUP state (absent any communications with its
partner) is implementation dependent (and would typically be
configurable). It should be long enough to poll several times
and stand a good chance to receive a response to at least one
poll from a heavily loaded partner across a slow network.
5. Start sending DHCPPOLL messages (with both the RESTART and
STARTUP bits set in the 'flags' field).
6. Wait for "communications okay", i.e., the receipt of a
DHCPPRPL message.
When a DHCPPRPL message is received, clear the RESTART flag,
clear the STARTUP flag, and set the current state to the
previous-state.
If the partner is in PARTNER-DOWN state, and if its partner-
down time (received in the DHCPPRPL message in the Absolute
Time Option) is later than the last recorded time of operation
of this server, then set the current state to RECOVER.
Then, transition to the current state and take the "communica-
tions okay" state transition based on the current state of
this server and the partner.
7. If the STARTUP state timer expires, take an implementation
dependent action: the server MAY go to the previous-state, or
the server MAY wait.
Reasons to go to previous-state and begin processing:
If the current server is the only operational server, then if
it waits, there will be no operational DHCP servers. This
situation could occur very easily where one server fails and
then the other crashes and reboots. If the rebooting server
doesn't start processing DHCP client requests without first
being in communication with the other server, then the level
of DHCP redundancy is not particularly high. This is an
appropriate approach if the possibility of partition is low,
or if the safe period expiration time is well beyond the time
at which an operator would notice and react to a partition
situation. It is also quite appropriate if the safe period
will never expire.
Reasons to wait:
If the current server has been down for longer than the
maximum-client-lead-time, and it is partitioned from the other
server, then when it returns it will attempt to use its own
available addresses to allocate to new DHCP clients, and the
other server may well be in PARTNER-DOWN state and may have
already allocated some of those available addresses to DHCP
clients. In cases where the possibility of partition is high,
and the safe period expiration time is less than the likely
operator reaction time, this is a good approach to use.
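As a non-normative illustration, steps 2, 3 and 6 above can be
sketched in Python as follows. The function and parameter names are
hypothetical and are not defined by this protocol. The sketch assumes
that times are POSIX timestamps and that None stands for "no record
available".

```python
from typing import Optional, Tuple

NORMAL = "NORMAL"
RECOVER = "RECOVER"
PARTNER_DOWN = "PARTNER-DOWN"
COMMUNICATIONS_INTERRUPTED = "COMMUNICATIONS-INTERRUPTED"


def previous_state_at_startup(
    stored_state: Optional[str],
    configured_failure_time: Optional[float],
) -> Tuple[str, Optional[float]]:
    """Steps 2 and 3: return (previous-state, time-of-failure).

    time-of-failure is None when these steps do not set it.
    """
    if stored_state is not None:
        previous, time_of_failure = stored_state, None
    elif configured_failure_time is not None:
        # An administrator indicated that stable storage was lost.
        previous, time_of_failure = RECOVER, configured_failure_time
    else:
        # First startup: any time earlier than (now - MCLT) will do,
        # and 0 satisfies that for POSIX times.
        previous, time_of_failure = RECOVER, 0.0
    if previous == NORMAL:
        previous = COMMUNICATIONS_INTERRUPTED  # step 3
    return previous, time_of_failure


def state_after_first_reply(
    previous_state: str,
    partner_state: str,
    partner_down_time: Optional[float],
    last_operational_time: Optional[float],
) -> str:
    """Step 6: state to adopt when the first DHCPPRPL is received."""
    if (
        partner_state == PARTNER_DOWN
        and partner_down_time is not None
        and last_operational_time is not None
        and partner_down_time > last_operational_time
    ):
        return RECOVER
    return previous_state
```

Treating an unknown last time of operation as "keep the previous-
state" is an assumption of this sketch. A server with no such record
would already have RECOVER as its previous-state under step 2.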
6.5. PARTNER-DOWN state
PARTNER-DOWN state is a state either server can enter. When in this
state, the server does not assume that the other server could still
be operating and servicing a different set of clients, but instead
assumes that it is the only server operating. For this reason, only
one server should be operating in this state at a time.
6.5.1. Upon Entry to PARTNER-DOWN state
When entering PARTNER-DOWN state a server MUST record the time of
entry, and MUST transmit it in every DHCPPOLL message or DHCPPRPL
message sent while in PARTNER-DOWN state.
6.5.2. Operation while in PARTNER-DOWN state
A server in PARTNER-DOWN state MUST respond to DHCP client requests.
It will allow renewal of all outstanding leases on IP addresses, and
will allocate IP addresses from its own pool, and after a fixed
period of time (the MCLT interval) has elapsed from entry into
PARTNER-DOWN state, it will allocate IP addresses from the set of all
available IP addresses.
Once a server has entered NORMAL state, the PARTNER-DOWN state is
entered only on command of an external agency (typically an adminis-
trator of some sort) or after the expiration of an externally config-
ured minimum safe-time after the beginning of COMMUNICATIONS-
INTERRUPTED state.
Any available IP address tagged as belonging to the other server (at
entry to PARTNER-DOWN state) MUST NOT be used until the maximum-
client-lead-time beyond the entry into PARTNER-DOWN state has
elapsed.
A server in PARTNER-DOWN state MUST NOT allocate an IP address to a
DHCP client different from that to which it was allocated at the
entrance to PARTNER-DOWN state until the maximum-client-lead-time
beyond its expiration time has elapsed. If this time would be
earlier than the current time plus the maximum-client-lead-time, then
the current time plus the maximum-client-lead-time is used.
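The two timing rules above reduce to simple arithmetic. The following
Python sketch is illustrative only: the function names are
hypothetical, times are assumed to be POSIX timestamps in seconds,
and "the current time" is read as the moment at which the calculation
is made.

```python
def partner_pool_usable_at(partner_down_entry: float, mclt: float) -> float:
    """Earliest time an available address tagged as belonging to the
    other server may be used."""
    return partner_down_entry + mclt


def reallocation_allowed_at(lease_expiry: float, now: float, mclt: float) -> float:
    """Earliest time an address may be given to a different client.

    This is MCLT beyond the lease expiration, but never earlier than
    the current time plus MCLT.
    """
    return max(lease_expiry + mclt, now + mclt)
```

For example, with an MCLT of 3600 seconds, an address whose lease
expired at time 500 and which is examined at time 1000 may not go to
a different client before time 4600.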
Two options exist for lease times given out while in PARTNER-DOWN
state, with different ramifications flowing from each.
If the server wishes the Failover protocol to protect it from loss of
stable storage in PARTNER-DOWN state, then it should ensure that the
MCLT based lease time restrictions in Section 5.1 are maintained,
even in PARTNER-DOWN state.
If the server wishes to forego the protection of the Failover proto-
col in the event of loss of stable storage, then it need recognize no
restrictions on actual client lease times while in PARTNER-DOWN
state.
A server in PARTNER-DOWN state MUST poll its partner and attempt to
establish communications and synchronization.
While a server is in PARTNER-DOWN state, it MUST send the absolute
time of entry into PARTNER-DOWN using the absolute time option in
every DHCPPOLL and DHCPPRPL message sent.
6.5.3. Transitions out of PARTNER-DOWN state
When a server in PARTNER-DOWN state succeeds in contacting its
partner, its actions are conditional on the state and flags received
in the message from the other server.
If the STARTUP bit is set in the 'flags' field of a received DHCPPOLL
message, the server in PARTNER-DOWN state will send a DHCPPRPL mes-
sage with its current state (and with the absolute PARTNER-DOWN time
in the DHCPPRPL). A server in PARTNER-DOWN state MUST NOT take any
state transitions based on reestablishing communications if the
STARTUP bit is set in the 'flags' field of the messages that reesta-
blished communications.
If the STARTUP bit is not set in the 'flags' field then a server in
PARTNER-DOWN state will move into POTENTIAL-CONFLICT state if the
other server is in the NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-
DOWN, or POTENTIAL-CONFLICT state.
If the STARTUP bit is not set in the 'flags' field, then a server in
PARTNER-DOWN state will stay in PARTNER-DOWN state if it detects that
the other server is in RECOVER state.
If the STARTUP bit is not set in the 'flags' field, then a server in
PARTNER-DOWN state moves into NORMAL state if it detects that the
other server is in RECOVER-DONE state.
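The transitions in this section can be summarized as a single
function. This is a sketch rather than a normative definition; the
function name is hypothetical, and the fallback for partner states
that this section does not mention (for example PAUSED or SHUTDOWN)
is an assumption.

```python
def partner_down_next_state(partner_state: str, startup_bit: bool) -> str:
    """Next state for a server in PARTNER-DOWN that hears from its partner."""
    if startup_bit:
        # Reply with DHCPPRPL, but take no state transition.
        return "PARTNER-DOWN"
    if partner_state in {
        "NORMAL",
        "COMMUNICATIONS-INTERRUPTED",
        "PARTNER-DOWN",
        "POTENTIAL-CONFLICT",
    }:
        return "POTENTIAL-CONFLICT"
    if partner_state == "RECOVER":
        return "PARTNER-DOWN"
    if partner_state == "RECOVER-DONE":
        return "NORMAL"
    # Not covered by this section; staying put is an assumption.
    return "PARTNER-DOWN"
```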
6.6. RECOVER state
This state indicates that the server has no information in its stable
storage or that it is re-integrating with a server in PARTNER-DOWN
state after it has been down. A server in this state will attempt to
refresh its stable storage from the other server.
6.6.1. Operation in RECOVER state
A server in RECOVER state MUST NOT respond to DHCP client requests.
A server in RECOVER state will attempt to reestablish communications
with the other server.
6.6.2. Transitions out of RECOVER state
If the other server is in POTENTIAL-CONFLICT state when communica-
tions are reestablished, then the server in RECOVER state will move
to POTENTIAL-CONFLICT state itself.
If the other server is in RECOVER state, then this server SHOULD sig-
nal an error and halt processing.
If the other server is in any other state, then the server in RECOVER
state will request an update of missing binding information. If the
server has been configured to indicate that it has lost its stable
storage, it will send an UPDATEREQALL message; otherwise it will send
an UPDATEREQ message.
It will wait for an UPDATEDONE message, and upon receipt of that
message it will start a timer whose expiration is set to the time the
server went down (if known) or the current time (if the down-time is
unknown) plus the maximum-client-lead-time. When this timer expires,
the server will go into RECOVER-DONE state. This is to allow any DHCP
clients holding IP addresses that were allocated by this server prior
to the loss of its client binding information in stable storage to
contact the other server or to time out.
See Figure 6.6-1.
DISCUSSION:
The actual requirement on this wait period in RECOVER is that it
start when the recovering server went down, not necessarily when
it came back up. If the time when the recovering server failed is
known, then it could be communicated to the recovering server, and
the wait period could be reduced to the maximum-client-lead-time
less the difference between the current time and the time the
server failed. In this way, the waiting period could be minimized.
If an UPDATEDONE message isn't received within an implementation
dependent amount of time, and no DHCPBNDUPD messages are being
received, then the UPDATEREQ(ALL) message will be re-transmitted.
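The choice of update request and the length of the wait in RECOVER
state can be sketched as follows. This is an illustration only: the
function names are hypothetical, times are assumed to be POSIX
timestamps in seconds, and "now" is taken to be the time at which the
UPDATEDONE message is received.

```python
from typing import Optional


def update_request_type(lost_stable_storage: bool) -> str:
    """Message used to request missing binding information."""
    return "UPDATEREQALL" if lost_stable_storage else "UPDATEREQ"


def recover_done_at(now: float, mclt: float,
                    down_time: Optional[float] = None) -> float:
    """Time at which the server may move to RECOVER-DONE state."""
    base = down_time if down_time is not None else now
    return base + mclt


def remaining_wait(now: float, mclt: float,
                   down_time: Optional[float] = None) -> float:
    """Seconds still to wait; zero if the period has already passed."""
    return max(0.0, recover_done_at(now, mclt, down_time) - now)
```

With an MCLT of 3600 seconds and a server that is known to have gone
down 1000 seconds before the UPDATEDONE arrives, the remaining wait
is 2600 seconds rather than the full 3600.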
            A                                B
         Server                           Server
            |                                |
         RECOVER                       PARTNER-DOWN
            |                                |
            | >--DHCPUPDATEREQ-------------> |
            |                                |
            | <----------------DHCPBNDUPD--< |
            | >--DHCPBNDACK----------------> |
           ...                              ...
            |                                |
            | <----------------DHCPBNDUPD--< |
            | >--DHCPBNDACK----------------> |
            |                                |
            | <------------DHCPUPDATEDONE--< |
            |                                |
  Wait MCLT from last known                  |
    time of operation                        |
            |                                |
      RECOVER-DONE                           |
            |                                |
            | >--DHCPPOLL-(RECOVER-DONE)---> |
            | <------------------DHCPPRPL--< |
            |                                |
            |                             NORMAL
            |                                |
            | <---------(NORMAL)-DHCPPOLL--< |
            | >--DHCPPRPL------------------> |
            |                                |
         NORMAL                              |
            |                                |
            |                                |
      Figure 6.6-1: Transition out of RECOVER state
6.7. NORMAL state
NORMAL state is the state used by a server when it can communicate
with the other server. When in this state, the primary responds to
all DHCP client requests, while the secondary responds only to the
renewal or rebinding requests which it receives. This is one of the
few states where the operation of the primary and secondary servers
is quite different.
6.7.1. Upon Entry to NORMAL state
When entering NORMAL state, a server will send to the other server
all currently unacknowledged DHCPBNDUPD messages.
When the above process is complete, if the server entering NORMAL
state is a secondary server, then it will request IP addresses
for allocation using the DHCPPOOLREQ message and the techniques
described in section 2.5.
6.7.2. Operation in NORMAL state: Primary Server
When in NORMAL state, the primary server takes the following actions
to implement the Failover protocol:
o Lease Time Calculations
As discussed in section 5.1, "Control of lease time", the lease
interval given to a DHCP client can never be more than the
maximum-client-lead-time greater than the acknowledged partner-
server-lease-interval.
As long as the primary server adheres to this constraint, the
specifics of the lease intervals that it gives to either the
DHCP client or the secondary DHCP server are implementation
dependent. One possible approach is shown in section 5.1, but
that particular approach is in no way required by this protocol.
o Lazy Update of Secondary Server
After an ACK of an IP address binding, the primary server
attempts to update the secondary with the binding information.
The lease time used in the update of the secondary MUST be at
least that given to the DHCP client in the DHCPACK. It MAY,
however, be longer.
o Reallocation of IP Addresses Between Clients
Whenever a client binding is released, a DHCPBNDUPD message must
be sent to the secondary server, setting the binding state to
RELEASED. However, until a DHCPBNDACK is received for this mes-
sage, the IP address cannot be allocated to another client. It
can be allocated to the same client again.
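The three constraints on the primary server listed above can be
expressed as small predicates. The sketch below is not a required
implementation; all names are hypothetical, and lease intervals are
assumed to be expressed in seconds.

```python
def client_lease_interval(desired: float,
                          acked_partner_interval: float,
                          mclt: float) -> float:
    """Lease interval for the client: never more than MCLT beyond the
    interval already acknowledged by the partner."""
    return min(desired, acked_partner_interval + mclt)


def update_lease_ok(client_interval: float, update_interval: float) -> bool:
    """The lease time sent to the secondary must be at least the one
    given to the client; it may be longer."""
    return update_interval >= client_interval


def may_reallocate(previous_client: str, requesting_client: str,
                   release_acked: bool) -> bool:
    """A released address may return to the same client at once, but
    may go to another client only after the DHCPBNDACK is received."""
    return requesting_client == previous_client or release_acked
```

For a new binding that the secondary has not yet acknowledged
(acknowledged interval of zero), the first lease is therefore limited
to the MCLT itself.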
6.7.3. Operation in NORMAL state: Secondary Server
In NORMAL state, the secondary server receives binding updates from
the primary server in DHCPBNDUPD messages. It records these in its
client binding database in stable storage and then sends the
corresponding DHCPBNDACK message to the primary server. It MUST
ensure that the information is recorded in stable storage prior to
sending the DHCPBNDACK message back to the primary server.
While in NORMAL state, the secondary server MUST also acquire a
series of IP addresses from the primary server to be used to satisfy
DHCPDISCOVER requests from DHCP clients when in COMMUNICATIONS-
INTERRUPTED state. See section 2.5 for details of this acquisition
process.
The secondary server periodically polls the primary server with the
DHCPPOLL message. If it fails to receive a DHCPPRPL message in reply
after a configured number of retries or some administratively deter-
mined time, the secondary server transitions into COMMUNICATIONS-
INTERRUPTED state. Both the DHCPPOLL and DHCPPRPL messages carry the
current state of the sender.
When in NORMAL state, a secondary server is responsive to DHCP client
requests if they are RENEWAL or REBINDING. Any changes it makes to
any leases based on these responses should be sent to the primary
server using DHCPBNDUPD messages.
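One way for a secondary server to track unanswered polls is sketched
below. The class and method names are hypothetical. Counting the
initial DHCPPOLL plus a configured number of retries before declaring
communications interrupted is an assumption of this sketch, since the
retry policy is left to the implementation.

```python
class PollMonitor:
    """Tracks unanswered DHCPPOLL messages for a server in NORMAL state."""

    def __init__(self, max_retries: int) -> None:
        self.max_retries = max_retries
        self.unanswered = 0
        self.state = "NORMAL"

    def poll_unanswered(self) -> str:
        """Record a DHCPPOLL that drew no DHCPPRPL; return the state."""
        self.unanswered += 1
        if self.unanswered > self.max_retries:
            self.state = "COMMUNICATIONS-INTERRUPTED"
        return self.state

    def reply_received(self) -> None:
        """A DHCPPRPL arrived, so the count of missed replies resets."""
        self.unanswered = 0
```

With max_retries set to 2, the third consecutive unanswered poll
moves the server into COMMUNICATIONS-INTERRUPTED state.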
6.7.4. Transitions out of NORMAL state
If an external command is received by a server in NORMAL state
informing it that its partner is down, then transition into PARTNER-
DOWN state.
If a server in NORMAL state fails to receive acks to any messages
sent to its partner for an implementation dependent period of time,
it will move into COMMUNICATIONS-INTERRUPTED state. (See section
6.2).
If a server in NORMAL state receives any messages from its partner
where the partner has changed state from that expected by the server
in NORMAL state, then the server should transition into
COMMUNICATIONS-INTERRUPTED state and take the appropriate state tran-
sition from there. For example, it would be expected for the partner
to transition from POTENTIAL-CONFLICT into NORMAL state, but not for
the partner to transition from NORMAL into POTENTIAL-CONFLICT state.
6.8. COMMUNICATIONS-INTERRUPTED State
A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
unable to communicate with the other server. Primary and secondary
servers cycle automatically (without administrative intervention)
between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network
connection between them fails and recovers, or as the partner server
cycles between operational and non-operational. No duplicate IP
address allocation can occur while the servers cycle between these
states.
6.8.1. Upon Entry to COMMUNICATIONS-INTERRUPTED state
When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been
configured to support an automatic transition out of COMMUNICATIONS-
INTERRUPTED state and into PARTNER-DOWN state, then a timer MUST be
started for an implementation dependent period.
It is anticipated that some alarm condition would be raised upon the
transition from NORMAL state to COMMUNICATIONS-INTERRUPTED state.
6.8.2. Operation in COMMUNICATIONS-INTERRUPTED State
In this state a server may respond to DHCP client requests. When
allocating new IP addresses, each server allocates from its own IP
address pool. When responding to renewal requests, each server will
allow continued renewal of a DHCP client's current lease on an IP
address, although the renewal period MUST NOT exceed the maximum
client lead time (MCLT) beyond the lease time already acknowledged by
the other server.
A server operates in COMMUNICATIONS-INTERRUPTED state as the primary
server does in NORMAL state.
However, since the server cannot communicate with its partner in this
state, the acknowledged-partner-lease-time will not be updated in any
new bindings. This is likely to eventually cause the actual-client-
lease-times to be the current-time plus the maximum-client-lead-time
(unless this is greater than the desired-client-lease-time).
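The effect described above can be illustrated with a short
calculation. The function below is a sketch under stated assumptions:
its name is hypothetical, times are POSIX timestamps in seconds, and
a partner acknowledgment that lies in the past is treated as "now",
which is one reading of the text above rather than a rule stated by
this protocol.

```python
def renewal_expiry(now: float, desired_interval: float,
                   acked_partner_expiry: float, mclt: float) -> float:
    """Expiration time granted on renewal while the partner is
    unreachable."""
    limit = max(acked_partner_expiry, now) + mclt
    return min(now + desired_interval, limit)
```

Suppose the partner last acknowledged an expiration at time 10000,
the MCLT is 3600 seconds, and clients desire one-day leases. A
renewal at time 0 is limited to 13600. A renewal at time 12000, after
the acknowledged time has passed, is limited to 15600, that is, the
current time plus the MCLT.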
6.8.3. Transition out of COMMUNICATIONS-INTERRUPTED State
If the safe period timer expires while a server is in the
COMMUNICATIONS-INTERRUPTED state, it will go immediately into
PARTNER-DOWN state.
If an external command is received by a server in COMMUNICATIONS-
INTERRUPTED state informing it that its partner is down, it will go
immediately into PARTNER-DOWN state.
If communications are restored with the other server, then the server
in COMMUNICATIONS-INTERRUPTED state will go into another state based
on the state of the partner:
o partner in NORMAL or COMMUNICATIONS-INTERRUPTED
The server will transition into the NORMAL state.
o partner in RECOVER
Stay in COMMUNICATIONS-INTERRUPTED state.
o partner in RECOVER-DONE
Transition into NORMAL state.
o partner in PARTNER-DOWN or POTENTIAL-CONFLICT
Transition into POTENTIAL-CONFLICT state.
o partner in PAUSED
Stay in COMMUNICATIONS-INTERRUPTED state.
o partner in SHUTDOWN
Transition into PARTNER-DOWN state.
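The rules in this section lend themselves to a table-driven
implementation. The following Python sketch is illustrative only; the
event names and the function name are hypothetical.

```python
from typing import Optional

CI = "COMMUNICATIONS-INTERRUPTED"

# Next state when communications are restored, keyed by partner state.
ON_CONTACT = {
    "NORMAL": "NORMAL",
    CI: "NORMAL",
    "RECOVER": CI,
    "RECOVER-DONE": "NORMAL",
    "PARTNER-DOWN": "POTENTIAL-CONFLICT",
    "POTENTIAL-CONFLICT": "POTENTIAL-CONFLICT",
    "PAUSED": CI,
    "SHUTDOWN": "PARTNER-DOWN",
}


def ci_next_state(event: str, partner_state: Optional[str] = None) -> str:
    """Next state for a server in COMMUNICATIONS-INTERRUPTED state."""
    if event in ("safe-period-expired", "partner-down-command"):
        return "PARTNER-DOWN"
    if event == "contact":
        if partner_state not in ON_CONTACT:
            raise ValueError("unknown partner state: %r" % (partner_state,))
        return ON_CONTACT[partner_state]
    raise ValueError("unknown event: %r" % (event,))
```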
         Primary                         Secondary
         Server                           Server
         NORMAL                           NORMAL
            | >--DHCPPOLL--->:               |
            |                :<--DHCPPOLL--< |
            |                :               |
     COMMUNICATIONS          :        COMMUNICATIONS
       INTERRUPTED           :          INTERRUPTED
            |                :               |
            | >--DHCPPOLL------------------> |
            | <------------------DHCPPRPL--< |
         NORMAL                              |
            |                                |
            | >--DHCPBNDUPD----------------> |
            | <----------------DHCPBNDACK--< |
            |                                |
            | <------------------DHCPPOLL--< |
            | >--DHCPPRPL------------------> |
            |                             NORMAL
            |                                |
            | <----------------DHCPBNDUPD--< |
            | >--DHCPBNDACK----------------> |
           ...                              ...
            |                                |
            | <---------------DHCPPOOLREQ--< |
            | >--DHCPPOOLRESP-(2)----------> |
            |                                |
            | >--DHCPBNDUPD-(#1)-----------> |
            | <----------------DHCPBNDACK--< |
            |                                |
            | <---------------DHCPPOOLREQ--< |
            | >--DHCPPOOLRESP-(0)----------> |
            |                                |
            | >--DHCPBNDUPD-(#2)-----------> |
            | <----------------DHCPBNDACK--< |
            |                                |
   Figure 6.8-1: Transition from NORMAL to COMMUNICATIONS-INTERRUPTED
                 and back (example with 2 addresses allocated to
                 secondary)
6.9. POTENTIAL-CONFLICT state
This state indicates that the two servers are attempting to re-
integrate with each other, but at least one of them was running in a
state that did not guarantee automatic reintegration would be
possible. In POTENTIAL-CONFLICT state the servers may determine that
the same IP address has been offered and accepted by two different
DHCP clients.
It is a goal of this protocol to minimize the possibility that
POTENTIAL-CONFLICT state is ever entered.
6.9.1. Upon Entry to POTENTIAL-CONFLICT
When a primary server enters POTENTIAL-CONFLICT state it should
request that the secondary send it all updates of which it is
currently unaware by sending an UPDATEREQ message to the secondary
server.
A secondary server entering POTENTIAL-CONFLICT state will wait for
the primary to send it an UPDATEREQ message.
6.9.2. Operation in POTENTIAL-CONFLICT state
Any server in POTENTIAL-CONFLICT state MUST be unresponsive to incom-
ing DHCP requests.
6.9.3. Transitions out of POTENTIAL-CONFLICT state
If communications fail with the partner while in POTENTIAL-CONFLICT
state, then a primary server will transition to PARTNER-DOWN state
and a secondary server will stay in POTENTIAL-CONFLICT state.
Whenever either server receives an UPDATEDONE message from its
partner, it MUST transition to NORMAL state. This will cause the
primary server to leave POTENTIAL-CONFLICT state prior to the secon-
dary, since the primary sends an UPDATEREQ message and receives an
UPDATEDONE before the secondary sends an UPDATEREQ message and
receives its UPDATEDONE message.
When a secondary server receives an indication that the primary
server has transitioned from POTENTIAL-CONFLICT to NORMAL state, it
SHOULD send an UPDATEREQ message to the primary server.
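The asymmetry between primary and secondary in POTENTIAL-CONFLICT
state can be sketched as follows. The role, event, and function names
are hypothetical and serve only to illustrate the rules above.

```python
def action_on_entry(role: str) -> str:
    """What each server does on entering POTENTIAL-CONFLICT state."""
    if role == "primary":
        return "send UPDATEREQ"
    if role == "secondary":
        return "wait for UPDATEREQ"
    raise ValueError("unknown role: %r" % (role,))


def potential_conflict_next_state(role: str, event: str) -> str:
    """Next state for a server in POTENTIAL-CONFLICT state."""
    if role not in ("primary", "secondary"):
        raise ValueError("unknown role: %r" % (role,))
    if event == "updatedone-received":
        return "NORMAL"
    if event == "communications-failed":
        # Only the primary leaves; the secondary stays where it is.
        return "PARTNER-DOWN" if role == "primary" else "POTENTIAL-CONFLICT"
    raise ValueError("unknown event: %r" % (event,))
```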
         Primary                         Secondary
         Server                           Server
            |                                |
   POTENTIAL-CONFLICT               POTENTIAL-CONFLICT
            |                                |
            | >--DHCPUPDATEREQ-------------> |
            |                                |
            | <----------------DHCPBNDUPD--< |
            | >--DHCPBNDACK----------------> |
           ...                              ...
            |                                |
            | <----------------DHCPBNDUPD--< |
            | >--DHCPBNDACK----------------> |
            |                                |
            | <------------DHCPUPDATEDONE--< |
         NORMAL                              |
            | >--DHCPPOLL-(NORMAL)---------> |
            | <------------------DHCPPRPL--< |
            |                                |
            | <-------------DHCPUPDATEREQ--< |
            |                                |
            | >--DHCPBNDUPD----------------> |
            | <----------------DHCPBNDACK--< |
           ...                              ...
            |                                |
            | >--DHCPBNDUPD----------------> |
            | <----------------DHCPBNDACK--< |
            |                                |
            | >--DHCPUPDATEDONE------------> |
            |                                |
            |                             NORMAL
            |                                |
            | <---------------DHCPPOOLREQ--< |
            | >--DHCPPOOLRESP--------------> |
            |                                |
      Figure 6.9-1: Transition out of POTENTIAL-CONFLICT
6.10. RECOVER-DONE state
This state exists to allow an interlocked transition for one server
from RECOVER state and another server from PARTNER-DOWN or
COMMUNICATIONS-INTERRUPTED state into NORMAL state.
6.10.1. Operation in RECOVER-DONE state
A server in RECOVER-DONE state is responsive only to RENEWAL and
REBINDING DHCP messages.
6.10.2. Transitions out of RECOVER-DONE state
When a server in RECOVER-DONE state determines that its partner
server has entered NORMAL state, then it will transition into NORMAL
state as well.
6.11. PAUSED state
This state exists to allow one server to inform another that it will
be out of service for what is predicted to be a relatively short
time, and to allow the other server to transition to COMMUNICATIONS-
INTERRUPTED state immediately and (if it is a secondary server) to
begin servicing clients with no interruption.
A server which is aware that it is shutting down temporarily SHOULD
send one or more DHCPPOLL messages with the 'state' field containing
PAUSED.
While a server may or may not transition internally into PAUSED
state, the 'previous' state determined when it is restarted MUST be
the state the server was in prior to receiving the command to shut-
down and restart and its entry into the PAUSED state.
6.11.1. Upon entry to PAUSED state
When entering PAUSED state, the server MUST remember the previous
state, and use that state as the previous state when it is restarted.
6.11.2. Transitions out of PAUSED state
A server transitions out of PAUSED state by being restarted. At that
time, the previous state MUST be the state the server was in prior to
entering the PAUSED state.
6.12. SHUTDOWN state
This state exists to allow one server to inform another that it will
be out of service for what is predicted to be a relatively long time,
and to allow the other server to transition immediately to PARTNER-
DOWN state, and take over completely for the server going down.
A server which is aware that it is shutting down SHOULD send one or
more DHCPPOLL messages with the 'state' field containing SHUTDOWN.
While a server may or may not transition internally into SHUTDOWN
state, the 'previous' state determined when it is restarted MUST be
the state active prior to the command to shutdown unless the server
detects that its partner has moved to PARTNER-DOWN, in which case it
MUST be RECOVER.
6.12.1. Upon entry to SHUTDOWN state
When entering SHUTDOWN state, the server MUST record the previous
state in stable storage for use when the server is restarted. It
also MUST record the current time as the last time operational.
A DHCPPOLL message SHOULD be sent to the partner with the 'state'
field containing SHUTDOWN state.
6.12.2. Operation in SHUTDOWN state
A server in SHUTDOWN state MUST be unresponsive to DHCP client input.
If a server receives any message indicating that the partner has
moved to PARTNER-DOWN state while it is in SHUTDOWN state (e.g., in
response to the DHCPPOLL it sent containing SHUTDOWN state), then it
MUST record RECOVER state as the previous state to be used when it is
restarted.
A server SHOULD wait for a few seconds after informing the partner of
entry into SHUTDOWN state (if communications are okay) to determine
whether the partner will enter PARTNER-DOWN state.
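The choice of the state to record for the next restart can be
sketched as below. The function name and its parameters are
hypothetical, and the sketch leaves out the actual write to stable
storage.

```python
def state_to_record_at_shutdown(state_before_shutdown: str,
                                partner_moved_to_partner_down: bool) -> str:
    """Previous-state to store when a server shuts down.

    If the partner is seen to have entered PARTNER-DOWN state, the
    server has to re-integrate through RECOVER when it is restarted.
    """
    if partner_moved_to_partner_down:
        return "RECOVER"
    return state_before_shutdown
```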
6.12.3. Transitions out of SHUTDOWN state
A server transitions out of SHUTDOWN state by being restarted.
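As a summary of sections 6.5 through 6.12, the following sketch
collects the rules on when a server answers DHCP clients. It is
illustrative only. The function name and the request categories
("NEW", "RENEWAL", "REBINDING") are hypothetical, and the STARTUP and
PAUSED states are omitted because the text above does not state a
rule for them.

```python
RENEW_OR_REBIND = frozenset({"RENEWAL", "REBINDING"})


def responds_to(state: str, role: str, request_kind: str) -> bool:
    """Whether a server in the given state answers a client request."""
    if state in ("RECOVER", "POTENTIAL-CONFLICT", "SHUTDOWN"):
        return False
    if state == "RECOVER-DONE":
        return request_kind in RENEW_OR_REBIND
    if state == "NORMAL":
        # The primary answers everything; the secondary only renewals
        # and rebinds.
        return role == "primary" or request_kind in RENEW_OR_REBIND
    if state in ("PARTNER-DOWN", "COMMUNICATIONS-INTERRUPTED"):
        # PARTNER-DOWN MUST respond; COMMUNICATIONS-INTERRUPTED may.
        return True
    raise ValueError("no rule stated for state: %r" % (state,))
```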
7. Safe Period
Due to the restrictions imposed on each server while in
COMMUNICATIONS-INTERRUPTED state, long-term operation in this state
is not feasible for either server. One reason that this state
exists at all is to allow the servers to easily survive transient
network communications failures of a few minutes to a few days
(although the actual time periods will depend a great deal on the
DHCP activity of the network in terms of arrival and departure of
DHCP clients on the network).
Eventually, when the servers are unable to communicate, they will
have to move into a state where they no longer can re-integrate
without some possibility of a duplicate IP address allocation.
There are two ways that they can move into this state (known as
PARTNER-DOWN).
They can be informed by external command that, indeed, the
partner server is down. In this case, there is no difficulty in mov-
ing into the PARTNER-DOWN state since it is an accurate reflection of
reality and the protocol has been designed to operate correctly (even
during reintegration) if, when in PARTNER-DOWN state the partner is,
indeed, down.
The more difficult scenario is when the servers are running unat-
tended for extended periods, and in this case an option is provided
to configure something called a "safe-period" into each server. This
OPTIONAL safe-period is the period after which either the primary or
secondary server will automatically transition to PARTNER-DOWN from
COMMUNICATIONS-INTERRUPTED state. If this transition is completed
and the partner is not down, then the possibility of duplicate IP
address allocations will exist.
The goal of the "safe-period" is to allow network operations staff
some time to react to a server moving into COMMUNICATIONS-INTERRUPTED
state. During the safe-period the only requirement is that the net-
work operations staff determine if both servers are still running --
and if they are, to either fix the network communications failure
between them, or to take one of the servers down before the expira-
tion of the safe-period.
The length of the safe-period is installation dependent, and depends
in large part on the number of unallocated IP addresses within the
subnet address pool and the expected frequency of arrival of previ-
ously unknown DHCP clients requiring IP addresses. Many environments
should be able to support safe-periods of several days.
During this safe period, either server will allow renewals from any
existing client. The only limitation concerns the need for IP
addresses for the DHCP server to hand out to new DHCP clients and the
need to re-allocate IP addresses to different DHCP clients.
The number of "extra" IP addresses required is equal to the expected
total number of new DHCP clients encountered during the safe period.
This is dependent only on the arrival rate of new DHCP clients, not
the total number of outstanding leases on IP addresses.
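This relationship is simple enough to work through numerically. The
sketch below uses hypothetical function names and assumes a steady
average arrival rate of new DHCP clients, which real networks will
only approximate.

```python
import math


def extra_addresses_needed(new_clients_per_hour: float,
                           safe_period_hours: float) -> int:
    """Addresses needed to serve the new clients expected during the
    safe period."""
    return math.ceil(new_clients_per_hour * safe_period_hours)


def longest_safe_period_hours(free_addresses: int,
                              new_clients_per_hour: float) -> float:
    """Longest safe period that a pool of free addresses can support."""
    if new_clients_per_hour <= 0:
        return math.inf
    return free_addresses / new_clients_per_hour
```

For example, a subnet that sees 2 new clients per hour needs 96 extra
addresses to support a safe period of 48 hours, regardless of how
many leases are already outstanding.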
In the unlikely event that a relatively short safe period of an hour
is all that can be used (given a dearth of IP addresses or a very
high arrival rate of new DHCP clients), even that can provide sub-
stantial benefits in allowing the DHCP subsystem to ride through
minor problems that could occur and be fixed within that hour. In
these cases, no possibility of duplicate IP address allocation
exists, and re-integration after the failure is solved will be
automatic and require no operator intervention.
8. Security
The Failover protocol MAY be secured with a simple shared secret mes-
sage digest which covers each message. Since there are a number of
configuration parameters that must be the same on each server in a
pair, it is not unreasonable to require a shared secret be configured
as well.
Only information within the packet and covered by the message digest
is used for operation of the protocol. It is for this reason that
the IP address of the sending server is sent in the 'sending server
id' field of the fixed header of the failover message when it might
seem that the same information could be recovered from the source
address of the IP packet.
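A keyed digest of this kind might be computed as sketched below.
This section does not name a digest algorithm or a message encoding,
so the use of HMAC with SHA-256 from the Python standard library, the
function names, and the sample message bytes are all assumptions made
for illustration.

```python
import hashlib
import hmac


def message_digest(shared_secret: bytes, message: bytes) -> bytes:
    """Keyed digest covering the whole failover message."""
    return hmac.new(shared_secret, message, hashlib.sha256).digest()


def digest_is_valid(shared_secret: bytes, message: bytes,
                    digest: bytes) -> bool:
    """Check a received digest using a constant-time comparison."""
    expected = message_digest(shared_secret, message)
    return hmac.compare_digest(expected, digest)


# Hypothetical encoding: the sending server id travels inside the
# covered bytes, so it is protected along with the rest of the message.
secret = b"example-shared-secret"
message = b"sending-server-id=192.0.2.1;type=DHCPPOLL;state=NORMAL"
digest = message_digest(secret, message)
```

Because the sending server id is part of the covered bytes, a
receiver need not trust the source address of the IP packet, which
the digest does not cover.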
9. Extended Discussion
Some areas in the draft above warranted more extended discussion than
was feasible to insert directly into the text.
1. UDP or TCP
There has been debate about the utility of using UDP for the
Failover protocol, since it doesn't supply guaranteed
delivery. UDP has been chosen as the protocol of choice for
the failover protocol due to the following factors:
First, it is important to recognize that mere receipt of a
packet by the other server in the pair (e.g., receipt of a
DHCPBNDUPD packet by the secondary server) is not sufficient
for the primary to update its own bindings database with new
information about what the secondary knows. In all cases of
transfers of binding information, the receiver of a DHCPBNDUPD
message MUST update its own stable storage prior to replying
with a DHCPBNDACK message (except in the marginal case where
all of the updates are rejected). An action is required by
the receiving server and an explicit ACK is needed by the
sending server to ensure the integrity of the protocol. So,
just knowing that the other server has received a Failover
protocol packet is not intrinsically interesting.
Second, the DHCP protocol, both the client and server side, is
being implemented in progressively smaller and smaller
machines. While this progression is most evident in DHCP
clients, there exist implementations today of DHCP servers
embedded in devices that are by no stretch of the imagination
traditional "servers" running mainstream operating systems.
In many ways, the Failover protocol is very well suited to
such devices. Adding additional protocol infrastructure
requirements to implement the Failover protocol might prevent
its implementation in devices that in some ways need it most
(devices with limited stable storage of their own).
Third, there are only a few cases where the Failover protocol
requires guaranteed delivery of packets. In particular, the
normal Primary to Secondary DHCPBNDUPD messages do not have to
be delivered reliably. The consequences of lost DHCPBNDUPD
messages are handled by the use of the MCLT, for the simple
reason that since these messages are "lazy", they may not get
delivered because of a server Failover prior to their
transmission. The protocol is robust in the face of loss of
either a DHCPBNDUPD message or a DHCPBNDACK message.
Furthermore, a technique known as "fire and forget" may be
used with this protocol and two cooperating implementations.
If the DHCPBNDACK message contains all of the information ori-
ginally in the DHCPBNDUPD message, then the DHCPBNDUPD message
may be transmitted and forgotten by the sending server (typi-
cally the primary). When and if the secondary receives the
DHCPBNDUPD and replies with a DHCPBNDACK message and the pri-
mary receives it, the primary will update its stable storage
with a new picture of what the secondary knows about the lease
time. If either of these messages is lost, the only downside
is that the DHCP client associated with the binding in ques-
tion may receive a shorter lease for one lease period than it
would otherwise. This "fire and forget" technique could sub-
stantially ease both the complexity of implementation and
memory requirements of an implementation of the Failover pro-
tocol, especially where two servers were communicating over a
very slow link.
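The "fire and forget" technique can be sketched from the sender's
side as follows. The record layout and names are hypothetical. The
point of the sketch is that the sender keeps no table of outstanding
DHCPBNDUPD messages, because the DHCPBNDACK itself carries everything
needed to update the sender's view of the partner.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BindingAck:
    """Hypothetical DHCPBNDACK contents, echoing the DHCPBNDUPD."""
    ip_address: str
    client_id: str
    lease_expiry: float


def on_bndack(acked_by_partner: Dict[str, float], ack: BindingAck) -> None:
    """Record the lease expiry the partner is now known to hold."""
    acked_by_partner[ack.ip_address] = ack.lease_expiry


# No pending-update state is kept between sending and acknowledgment.
acked = {}
on_bndack(acked, BindingAck("192.0.2.10", "client-a", 5000.0))
```

If either message is lost, the table is simply not updated, and the
next lease for that client is computed from the older acknowledged
value.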
10. Acknowledgments
Ralph Droms started it all, by sketching out an initial interserver
draft that embodied ideas from several past IETF meetings. In that
draft, he acknowledged contributions by Jeff Mogul, Greg Minshall,
Rob Stevens, Walt Wimer, Ted Lemon, and the DHC working group.
Kim Kinnear and Bob Cole each extended that draft, separately and
then together, until they created an interserver draft that supported
any number of servers. The complexity of that approach was just too
great, and that draft wasn't greeted with enthusiasm by many, includ-
ing its authors.
It did however lead to a much simpler approach embodied in the first
Failover draft by Greg Rabil, Mike Dooley, Arun Kapur and Ralph
Droms. This draft posited only two servers -- a primary and a secon-
dary.
Kim Kinnear then wrote the Safe Failover draft to layer on top of the
Failover Draft and increase its robustness in the face of certain
rare network failures.
At the spring 1998 IETF meeting in LA, the DHC working group said
that they wanted a merged Failover and Safe Failover draft. Steve
Gonczi and Bernie Volz stepped up and produced the raw material for
such a merged draft, along with a new message format designed around
DHCP options and other extensions and clarifications. Kim Kinnear
edited their work into draft format and made other changes in time
for the summer 1998 IETF meeting in Chicago.
During the summer and fall of 1998, two groups have been working on
separate implementations of the evolving draft. Bernie Volz and
Steve Gonczi constitute one group, and Kim Kinnear, Mark Stapp and
Paul Fox make up the other. These two groups have worked together to
produce considerable changes and simplifications of the protocol
during this period, and Steve Gonczi and Kim Kinnear have edited these
changes into this latest revision in time for submission to the
December 1998 Orlando IETF meeting.
These most recent changes have been reviewed by Ralph Droms, Greg
Rabil, Bernie Volz, Steve Gonczi, Mark Stapp, Paul Fox, and Kim
Kinnear. This does not preclude any of these people from expressing
disagreement with what is contained in this draft at any future time.
Many people have reviewed the various earlier drafts that went into
this result. At American Internet, ideas were contributed by Brad
Parker. At Cisco Systems, Paul Fox and Ellen Garvey have
contributed greatly to the form of the protocol. Glenn Waters of Bay
Networks contributed ideas and enthusiasm to make a Failover protocol
that was both "safe" and "lazy".
11. References
[RFC 2131] Droms, R., "Dynamic Host Configuration Protocol", RFC
2131, March 1997.
[RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", RFC 2119, March 1997.
[RFC 2132] Alexander, S., Droms, R., "DHCP Options and BOOTP Vendor
Extensions", RFC 2132, March 1997.
12. Authors' information
Ralph Droms
323 Dana Engineering
Bucknell University
Lewisburg, PA 17837
Phone: (717) 524-1145
EMail: droms@bucknell.edu
Greg Rabil, Mike Dooley, Arun Kapur
Lucent Technologies (Quadritek)
10 Valley Stream Parkway, Suite 240
Malvern, PA 19355
Phone: (800) 208-2747
EMail: grabil@lucent.com
mdooley@lucent.com
akapur@lucent.com
Kim Kinnear
Mark Stapp
Cisco Systems
250 Apollo Drive
Chelmsford, MA 01824
Phone: (978) 244-8000
EMail: kkinnear@cisco.com
mjs@cisco.com
Steve Gonczi, Bernie Volz
Process Software Corporation
959 Concord St.
Framingham, MA 01701
Phone: (508) 879-6994
EMail: gonczi@process.com
volz@process.com