Vulnerabilities of network control protocols: An example
RFC 789 (July 1981; no errata). Status: Unknown. Stream: Legacy.
Last updated 2013-03-02.
RFC 789
Vulnerabilities of Network Control Protocols: An Example
Eric C. Rosen
Bolt Beranek and Newman Inc.
RFC 789 Bolt Beranek and Newman Inc.
Eric C. Rosen
This paper has appeared in the January 1981 edition of the
SIGSOFT Software Engineering Notes, and will soon appear in the
SIGCOMM Computer Communications Review. It is being circulated
as an RFC because it is thought that it may be of interest to a
wider audience, particularly to the internet community. It is a
case study of a particular kind of problem that can arise in
large distributed systems, and of the approach used in the
ARPANET to deal with one such problem.
On October 27, 1980, there was an unusual occurrence on the
ARPANET. For a period of several hours, the network appeared to
be unusable, due to what was later diagnosed as a high-priority
software process running out of control. Network-wide
disturbances are extremely unusual in the ARPANET (none has
occurred in several years), and as a result, many people have
expressed interest in learning more about the etiology of this
particular incident. The purpose of this note is to explain what
the symptoms of the problem were, what the underlying causes
were, and what lessons can be drawn. As we shall see, the
immediate cause of the problem was a rather freakish hardware
malfunction, not likely to recur, that caused a faulty
sequence of network control packets to be generated. This faulty
sequence of control packets in turn affected the apportionment of
software resources in the IMPs, causing one of the IMP processes
to use an excessive amount of resources, to the detriment of
other IMP processes. Restoring the network to operational
condition was a relatively straightforward task. There was no
damage other than the outage itself, and no residual problems
once the network was restored. Nevertheless, it is quite
interesting to see the way in which unusual (indeed, unique)
circumstances can bring out vulnerabilities in network control
protocols, and that shall be the focus of this paper.
The problem appeared suddenly: we discovered that, with
very few exceptions, no IMP was able to communicate reliably with
any other IMP. Attempts to go from a TIP to a host on some other
IMP only brought forth the "net trouble" error message,
indicating that no physical path existed between the pair of
IMPs. Connections which already existed were summarily broken.
A flood of phone calls to the Network Control Center (NCC) from
all around the country indicated that the problem was not
localized, but rather seemed to be affecting virtually every IMP.
As a first step toward determining the actual state of the
network, we dialed up a number of TIPs around the country.
What we generally found was that the TIPs were up,
but that their lines were down. That is, the TIPs were
communicating properly with the user over the dial-up line, but
no connections to other IMPs were possible.
We tried manually restarting a number of IMPs which are in
our own building (after taking dumps, of course). This procedure
initializes all of the IMPs' dynamic data structures, and will
often clear up problems which arise when, as sometimes happens in
any complex software system, the IMPs' software gets into a
"funny" state. The IMPs which were restarted worked well until
they were connected to the rest of the net, after which they
exhibited the same complex of symptoms as the IMPs which had not
been restarted.
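Why a restart could not help can be illustrated with a toy model. The Python sketch below is not the IMP routing algorithm. It assumes, for illustration only, that each IMP holds a single routing update tagged with an integer version, that an IMP always adopts the highest-version update held by itself or any neighbor, and that IMPs exchange updates in synchronous rounds. The names Imp, flood, build_ring and INITIAL, and the five-IMP ring topology, are hypothetical.

```python
from dataclasses import dataclass, field

# (version, payload) that every IMP holds after a restart (hypothetical model)
INITIAL = (0, "initial")


@dataclass
class Imp:
    """Toy IMP keeping one routing update; not the real IMP data structure."""
    name: str
    neighbors: list[str] = field(default_factory=list)
    update: tuple[int, str] = INITIAL

    def restart(self) -> None:
        # Re-initializing an IMP wipes its dynamic routing state.
        self.update = INITIAL


def flood(imps: dict[str, Imp], rounds: int) -> None:
    """Synchronous rounds: each IMP adopts the highest-version update held
    by itself or any neighbor (an assumed rule, chosen for illustration)."""
    for _ in range(rounds):
        chosen = {}
        for imp in imps.values():
            candidates = [imp.update] + [imps[n].update for n in imp.neighbors]
            chosen[imp.name] = max(candidates, key=lambda u: u[0])
        # Apply all choices at once so every IMP sees the same prior round.
        for name, upd in chosen.items():
            imps[name].update = upd


def build_ring(names: list[str]) -> dict[str, Imp]:
    """Connect the IMPs in a ring so each IMP has exactly two neighbors."""
    imps = {n: Imp(n) for n in names}
    for i, n in enumerate(names):
        imps[n].neighbors = [names[i - 1], names[(i + 1) % len(names)]]
    return imps


if __name__ == "__main__":
    net = build_ring(["A", "B", "C", "D", "E"])
    net["A"].update = (99, "bad")   # a faulty update appears at one IMP
    flood(net, rounds=3)            # it spreads to every IMP in the ring
    print({n: imp.update[1] for n, imp in net.items()})

    net["C"].restart()              # restart one IMP: its state is clean
    print(net["C"].update[1])       # prints "initial"
    flood(net, rounds=1)            # reconnect: one exchange with neighbors
    print(net["C"].update[1])       # prints "bad"
```

In this toy model, restarting C clears its copy of the bad update, but a single exchange with either neighbor brings it back. That matches the pattern described above for the IMPs restarted in our building: re-initializing one IMP cannot help while its neighbors still hold the bad update.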
From the facts so far presented, we were able to draw a
number of conclusions. Any problem which affects all IMPs
throughout the network is usually a routing problem. Restarting
an IMP re-initializes the routing data structures, so the fact
that restarting an IMP did not alleviate the problem in that IMP
suggested that the problem was due to one or more "bad" routing
updates circulating in the network. IMPs which were restarted
would just receive the bad updates from those of their neighbors
which were not restarted. The fact that IMPs seemed unable to
keep their lines up was also a significant clue as to the nature
of the problem. Each pair of neighboring IMPs runs a line