Network Working Group                                          T. Brisco
Request for Comments: 1794                            Rutgers University
Category: Informational                                       April 1995

                     DNS Support for Load Balancing

Status of this Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

1. Introduction

   This RFC is meant first to chronicle a foray into the IETF DNS
   Working Group, then to discuss other possible alternatives for
   providing or simulating load balancing support in DNS, and finally
   to provide an ultimate, flexible solution for DNS support for
   balancing loads of many types.

2. History

   The history of this probably dates back well before my own time - so
   undoubtedly some holes are here.  Hopefully they can be filled in by
   other authors.

   Initially, "load balancing" was intended to permit the Domain Name
   System (DNS) [1] agents to support the concept of "clusters" (derived
   from the VMS usage) of machines - where all machines were
   functionally similar or the same, and it didn't particularly matter
   which machine was picked - as long as the load of the processing was
   reasonably well distributed across a series of actual different
   hosts.  Around 1986 a number of different schemes started surfacing
   as hacks to the Berkeley Internet Name Domain server (BIND)
   distribution.  Probably the most widely distributed of these were the
   "Shuffle Address" (SA) modifications by Bryan Beecher, or possibly
   Marshall Rose's "Round Robin" code.

   The SA records, however, did a round-robin ordering of the Address
   resource records, and didn't do much with regard to the particular
   loads on the target machines.  Matt Madison (of TGV) implemented some
   changes that used VMS facilities to review the system loads, and
   return A RRs in order from least loaded to most loaded.
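
   The difference between the two orderings can be illustrated with a
   minimal sketch.  This is not the SA or TGV code; the function names,
   the addresses, and the load figures below are all hypothetical and
   serve only to contrast rotating the A RRs with sorting them by load.

```python
from collections import deque


def round_robin(addresses):
    """Yield successive orderings, rotating the list left by one each time.

    Load plays no part here; every address simply takes its turn in front.
    """
    ring = deque(addresses)
    while True:
        yield list(ring)
        ring.rotate(-1)  # move the current first address to the back


def least_loaded_first(loads):
    """Return addresses sorted from least loaded to most loaded.

    `loads` maps address -> load figure (a hypothetical metric).
    """
    return [addr for addr, _ in sorted(loads.items(), key=lambda item: item[1])]


# Illustrative addresses from the documentation range, not real hosts.
answers = round_robin(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
first_reply = next(answers)
second_reply = next(answers)

by_load = least_loaded_first(
    {"192.0.2.1": 0.9, "192.0.2.2": 0.1, "192.0.2.3": 0.5}
)
```

   In the rotation case each query sees a different first address
   regardless of how busy that host is; in the sorted case the first
   address depends only on the load figures supplied.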

   The problem with SAs was that load was not actually a factor, and
   TGV's changes relied on VMS-specific facilities to order the
   records.  The SA RRs required changes to the DNS specification (in
   file syntax and in record processing).  These were both viewed as
   drawbacks and not as general solutions.

   Most of the Internet waited in anticipation of an IETF-approved
   method for simulating "clusters".

   Through a few IETF DNS Working Group sessions (chaired by Rob Austein
   of Epilogue), it was collectively agreed that a number of criteria
   must be met:

       A) Backwards compatibility with the existing DNS RFC.

       B) Information changes frequently.

       C) Multiple addresses should be sent out.

       D) Must interact with other RRs appropriately.

       E) Must be able to represent many types of "loads".

       F) Must be fast.

   (A) would ensure that the installed base of BIND and other DNS
   implementations would continue to operate and interoperate properly.

   (B) would permit very fast update times - to enable modeling of
   real-time data.  Five minutes was thought of as a normal interval,
   though changes as fast as every sixty seconds could be imagined.

   (C) would cover the possibility of a host's address being advertised
   as optimal, yet the machine crashed during the period within the TTL
   of the RR.  The second-most preferable address would be advertised
   second, the third-most preferable third, and so on.  This would allow
   a reasonable stab at recovery during machine failures.
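
   A minimal sketch of how a client might take advantage of such an
   ordered answer follows.  It assumes the client tries the addresses
   in the order received; the `is_up` callable is a hypothetical
   stand-in for whatever connection attempt the client would really
   make, and the addresses are illustrative only.

```python
def pick_address(ordered_addresses, is_up):
    """Return the first address, in advertised order, that responds.

    `ordered_addresses` is the answer as received, most preferable first.
    `is_up` is a hypothetical probe returning True if the host answers.
    Returns None when no advertised address responds.
    """
    for addr in ordered_addresses:
        if is_up(addr):
            return addr
    return None


# The most preferable host has crashed within the TTL of the RR.
advertised = ["192.0.2.2", "192.0.2.3", "192.0.2.1"]
crashed = {"192.0.2.2"}
chosen = pick_address(advertised, lambda addr: addr not in crashed)
```

   Here the client falls through to the second-most preferable address
   without needing a fresh lookup.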

   (D) would ensure correct handling of all ancillary information - such
   as MX, RP, and TXT information, as well as reverse lookup
   information.  It needed to be ensured that such processes as mail
   handling continued to work in an unsurprising and predictable manner.

   (E) would ensure the flexibility that everyone wished.  Various
   members of the DNS Working Group wished to represent a breadth of
   "loads".  Some "loads" were fairly eclectic - such as ordering
   addresses by the RTT to the host; some were pragmatic - such as
   balancing the CPU load evenly across a series of hosts.  All
   represented valid concerns within their own context, and the idea of
   having separate RR types for each was unthinkable (primarily because
   it would violate goal A).

   (F) needed to ensure a few things.  Primarily that the time to
