RFC 789                                                  July 1981

    Vulnerabilities of Network Control Protocols: An Example

                          Eric C. Rosen

                  Bolt Beranek and Newman Inc.

RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

     This paper has appeared in the January 1981 edition  of  the

SIGSOFT  Software  Engineering Notes, and will soon appear in the

SIGCOMM Computer Communication Review.  It is  being  circulated

as  an  RFC because it is thought that it may be of interest to a

wider audience, particularly to the internet community.  It is  a

case  study  of  a  particular  kind of problem that can arise in

large distributed systems,  and  of  the  approach  used  in  the

ARPANET to deal with one such problem.

     On  October 27, 1980, there was an unusual occurrence on the

ARPANET.  For a period of several hours, the network appeared  to

be  unusable,  due to what was later diagnosed as a high priority

software  process   running   out   of   control.    Network-wide

disturbances  are  extremely  unusual  in  the  ARPANET (none has

occurred in several years), and as a  result,  many  people  have

expressed  interest  in  learning more about the etiology of this

particular incident.  The purpose of this note is to explain what

the symptoms of the problem  were,  what  the  underlying  causes

were,  and  what  lessons  can  be  drawn.   As we shall see, the

immediate cause of the problem was  a  rather  freakish  hardware

malfunction  (which is not likely to recur) that caused a faulty

sequence of network control packets to be generated.  This faulty

sequence of control packets in turn affected the apportionment of

software resources in the IMPs, causing one of the IMP  processes

to  use  an  excessive  amount  of resources, to the detriment of

other  IMP  processes.   Restoring  the  network  to  operational

condition  was  a  relatively straightforward task.  There was no

damage other than the outage itself,  and  no  residual  problems

once  the  network  was  restored.   Nevertheless,  it  is  quite

interesting to see the way  in  which  unusual  (indeed,  unique)

circumstances  can  bring  out vulnerabilities in network control

protocols, and that shall be the focus of this paper.

     The problem began suddenly when  we  discovered  that,  with

very few exceptions, no IMP was able to communicate reliably with

any other IMP.  Attempts to go from a TIP to a host on some other

IMP   only   brought  forth  the  "net  trouble"  error  message,

indicating that no physical path  existed  between  the  pair  of

IMPs.   Connections  which already existed were summarily broken.

A flood of phone calls to the Network Control Center  (NCC)  from

all  around  the  country  indicated  that  the  problem  was not

localized, but rather seemed to be affecting virtually every IMP.

     As a first step towards trying to find out what the state of

the network actually was, we dialed up a number  of  TIPs  around

the  country.  What we generally found was that the TIPs were up,

but  that  their  lines  were  down.   That  is,  the  TIPs  were

communicating  properly  with the user over the dial-up line, but

no connections to other IMPs were possible.

     We tried manually restarting a number of IMPs which  are  in

our own building (after taking dumps, of course).  This procedure

initializes  all  of  the IMPs' dynamic data structures, and will

often clear up problems which arise when, as sometimes happens in

most complex software systems, the IMPs'  software  gets  into  a

"funny"  state.   The IMPs which were restarted worked well until

they were connected to the rest of  the  net,  after  which  they

exhibited  the same complex of symptoms as the IMPs which had not

been restarted.
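The effect of restarting only some IMPs can be illustrated with a deliberately simplified toy model. This sketch does not model the IMP software. It assumes, purely for illustration, that every IMP adopts and re-floods any "bad" routing update held by a neighbor. The five-node ring, the node names, and the `flood` and `restart` helpers are all hypothetical.

```python
"""Toy model: restarting some nodes does not clear state that neighbors re-flood.

Hypothetical simplification: each node either holds a "bad" routing update
or it does not. In every round, a node adjacent to a holder adopts it
(flooding). Nothing here models the real IMP code.
"""

from typing import Dict, List, Set

# Hypothetical five-node ring topology (adjacency lists).
TOPOLOGY: Dict[str, List[str]] = {
    "A": ["B", "E"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D", "A"],
}


def flood(bad: Set[str], rounds: int) -> Set[str]:
    """Propagate the bad update to neighbors for a fixed number of rounds."""
    current = set(bad)
    for _ in range(rounds):
        # Every neighbor of a node holding the bad update adopts it.
        newly_bad = {n for node in current for n in TOPOLOGY[node]}
        current |= newly_bad
    return current


def restart(bad: Set[str], restarted: Set[str]) -> Set[str]:
    """Restarting re-initializes state, so restarted nodes drop the bad update."""
    return bad - restarted


if __name__ == "__main__":
    # Hypothetical starting point: every node holds the bad update.
    bad = set(TOPOLOGY)
    # Restart two nodes (standing in for the IMPs "in our own building").
    after_restart = restart(bad, {"A", "B"})
    print("after restart:", sorted(after_restart))
    # Reconnect them: one round of flooding from C and E re-infects A and B.
    print("after reconnect:", sorted(flood(after_restart, rounds=1)))
```

Running it prints `after restart: ['C', 'D', 'E']` followed by `after reconnect: ['A', 'B', 'C', 'D', 'E']`. The restarted nodes lose the bad state and then regain it from their unrestarted neighbors after a single round, which mirrors the symptom described in the text. The model says nothing about how such an update could arise in the first place.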

     From the facts so far presented, we  were  able  to  draw  a

number  of  conclusions.   Any  problem  which  affects  all IMPs

throughout the network is usually a routing problem.   Restarting

an  IMP  re-initializes  the routing data structures, so the fact

that restarting an IMP did not alleviate the problem in that  IMP

suggested  that  the problem was due to one or more "bad" routing

updates circulating in the network.  IMPs  which  were  restarted

would  just receive the bad updates from those of their neighbors

which were not restarted.  The fact that IMPs  seemed  unable  to

keep  their lines up was also a significant clue as to the nature

of the problem.  Each  pair  of  neighboring  IMPs  runs  a  line